Monitoring Kubernetes with Prometheus & Grafana

The de facto standard monitoring stack for Kubernetes: Prometheus scrapes metrics and evaluates alert rules, Grafana visualizes the data, and Alertmanager routes the resulting alerts to notification channels.

Architecture Overview

 K8s Workloads         Prometheus Stack              Visualization
 ─────────────         ────────────────              ─────────────
 
 Pod /metrics ──┐
 Node Exporter ─┤──► Prometheus ──► Alertmanager ──► PagerDuty/Slack
 kube-state-   ─┘      │
 metrics               │
                        └──────────────────────────► Grafana Dashboards

Key components:

Component	Role
Prometheus	Scrapes and stores time-series metrics
Alertmanager	Routes alerts to Slack, PagerDuty, email, etc.
Grafana	Dashboards and visualization
node-exporter	Exposes hardware/OS metrics from each node
kube-state-metrics	Exposes K8s object state (pod counts, deployment status)
metrics-server	Lightweight in-cluster resource metrics for HPA/kubectl top (installed separately; not part of kube-prometheus-stack)
Prometheus Operator	Manages Prometheus via CRDs (`PrometheusRule`, `ServiceMonitor`)

Installation via kube-prometheus-stack

The quickest way to get a complete stack running (Prometheus + Grafana + Alertmanager + node-exporter + kube-state-metrics) is the kube-prometheus-stack Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

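# NOTE: admin123 is a placeholder. Use a strong password (or grafana.admin.existingSecret) on real clusters.
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \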
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123 \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --wait

# Verify all pods running
kubectl get pods -n monitoring

# Access Grafana at http://localhost:3000 (user: admin, password: the adminPassword set above)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Access Prometheus UI
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Access Alertmanager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
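
With the Prometheus port-forward running, target health can also be checked from a script rather than the UI. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090; `fetch_targets`, `summarize_targets`, and `unhealthy_targets` are hypothetical helper names, and the payload fields follow the `/api/v1/targets` HTTP API response.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: the Prometheus port-forward shown above is active.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url=PROMETHEUS_URL):
    """Fetch the scrape targets from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Count active targets by health state ("up", "down", "unknown")."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return dict(Counter(t.get("health", "unknown") for t in targets))


def unhealthy_targets(payload):
    """Return (job, scrapeUrl, lastError) for every target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", ""), t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]

# Usage (needs the port-forward):
#   payload = fetch_targets()
#   print(summarize_targets(payload), unhealthy_targets(payload))
```

Keeping the parsing separate from the network call makes the summary logic easy to test against a saved response.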

Prometheus Concepts

Metric Types

Type	Description	Example
Counter	Monotonically increasing value	`http_requests_total`
Gauge	Value that goes up and down	`memory_usage_bytes`
Histogram	Distribution of values in buckets	`http_request_duration_seconds`
Summary	Pre-computed quantiles	`rpc_duration_seconds{quantile="0.9"}`

PromQL — Prometheus Query Language

# ── Basic queries ──────────────────────────────────────────────

# All HTTP requests
http_requests_total

# Filter by label
http_requests_total{job="api", status="200"}

# Current CPU usage per pod (in cores)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)

# Memory usage in MiB
container_memory_usage_bytes / 1024 / 1024

# ── Aggregations ───────────────────────────────────────────────

# Sum requests by service
sum(rate(http_requests_total[5m])) by (service)

# Average memory per namespace
avg(container_memory_usage_bytes) by (namespace)

# Max CPU usage per pod (highest container within each pod)
max(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)

# ── Common K8s metrics ─────────────────────────────────────────

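# Pod CPU usage as % of its CPU limit (grouped by namespace too, since pod names can repeat across namespaces)
100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)
  / sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace, pod)

# Pod memory usage as % of its memory limit
100 * sum(container_memory_usage_bytes{container!=""}) by (namespace, pod)
  / sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod)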

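# Container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# Number of pods not running
# (kube_pod_status_phase has a 0/1 series for every phase, so sum the values instead of counting series)
sum(kube_pod_status_phase{phase=~"Pending|Failed|Unknown"}) by (namespace, phase) > 0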

# Node CPU usage %
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Node memory available %
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100

# HTTP error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# P99 request latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
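
To make the P99 query less of a black box, here is a simplified sketch of the interpolation that `histogram_quantile` performs on cumulative buckets. It is an illustration under stated assumptions, not Prometheus's actual implementation: it assumes 0 < q <= 1, buckets already sorted by `le`, and a lowest bucket that starts at 0. The bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    where the last upper bound is +Inf (the le="+Inf" bucket).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
```

For example, with buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]` the 0.99 quantile lands in the 0.5 to 1.0 bucket and is interpolated to roughly 0.95 seconds. The estimate can never be more precise than the bucket boundaries, which is why bucket layout matters for latency SLOs.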

Instrumenting Your Application

Expose /metrics from your app

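// Go example with prometheus/client_golang
package main

import (
  "log"
  "net/http"

  "github.com/prometheus/client_golang/prometheus"
  "github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequests counts handled requests, partitioned by method and status code.
var httpRequests = prometheus.NewCounterVec(
  prometheus.CounterOpts{
    Name: "http_requests_total",
    Help: "Total HTTP requests",
  },
  []string{"method", "status"},
)

func init() {
  prometheus.MustRegister(httpRequests)
}

func main() {
  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    httpRequests.WithLabelValues(r.Method, "200").Inc()
    w.Write([]byte("ok"))
  })
  http.Handle("/metrics", promhttp.Handler()) // endpoint scraped by Prometheus
  log.Fatal(http.ListenAndServe(":8080", nil))
}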

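# Python example with prometheus_client
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

start_http_server(8080)  # serves /metrics on :8080 in a background thread

# Record one request; in a real app this runs inside the request handler
with REQUEST_LATENCY.time():
    REQUEST_COUNT.labels(method='GET', endpoint='/').inc()
while True:  # keep the process alive so Prometheus can scrape it
    time.sleep(1)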
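
It helps to know what a scrape of /metrics actually returns. The sketch below renders one counter family in the Prometheus text exposition format using only the standard library. It is for illustration and is not a replacement for a client library; `render_counter` is a hypothetical helper, and label-value escaping (backslashes, quotes, newlines) is deliberately left out.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    samples: dict mapping a tuple of (label, value) pairs to the counter value.
    Assumption: label values contain no characters that need escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


print(render_counter(
    "http_requests_total",
    "Total HTTP requests",
    {(("method", "GET"), ("status", "200")): 3},
))
```

Each sample line is a metric name, an optional label set in braces, and a value; this is the payload Prometheus parses on every scrape.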

Tell Prometheus to scrape it — ServiceMonitor

# Prometheus Operator CRD — auto-discovers services to scrape
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match Prometheus selector
spec:
  namespaceSelector:
    matchNames: [production]
  selector:
    matchLabels:
      app: api                       # matches Service labels
  endpoints:
  - port: metrics                    # port name on the Service
    interval: 15s
    path: /metrics

---
# The Service must expose the metrics port
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
  labels:
    app: api
spec:
  selector:
    app: api
  ports:
  - name: http
    port: 80
  - name: metrics      # ← this port is scraped
    port: 8080

Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: api.rules
    interval: 30s
    rules:

    # Alert: high error rate
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
        /
        sum(rate(http_requests_total[5m])) by (service) > 0.05
      for: 2m
      labels:
        severity: critical
        team: backend
      annotations:
        summary: "High error rate on {{ $labels.service }}"
        description: "Error rate is {{ $value | humanizePercentage }} (threshold 5%)"

    # Alert: pod crash looping
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"
        description: "{{ $value }} restarts in the last hour"

    # Alert: high memory usage
    - alert: PodHighMemoryUsage
      expr: |
        container_memory_usage_bytes{container!=""}
        / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} memory > 85%"

    # Alert: node not ready
    - alert: NodeNotReady
      expr: kube_node_status_condition{condition="Ready",status="true"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} is NotReady"

    # Alert: deployment replicas mismatch
    - alert: DeploymentReplicasMismatch
      expr: |
        kube_deployment_spec_replicas != kube_deployment_status_available_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Deployment {{ $labels.deployment }} available replicas do not match the desired count"
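
The `for:` field in the rules above is easy to misread: an alert whose expression is true first becomes pending, and only fires once the condition has held for the full duration. Below is a simplified simulation of that state machine, assuming evenly spaced evaluations and ignoring options such as `keep_firing_for`; `alert_states` is a hypothetical helper.

```python
def alert_states(condition_by_eval, for_evals):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, True when expr returned a result.
    for_evals: number of evaluation intervals the `for:` duration spans
               (for example, for: 2m with interval: 30s gives 4).
    """
    states = []
    active = 0  # consecutive evaluations with the condition true
    for cond in condition_by_eval:
        if not cond:
            # Any evaluation where the condition is false resets the timer.
            active = 0
            states.append("inactive")
            continue
        active += 1
        states.append("firing" if active > for_evals else "pending")
    return states


# HighErrorRate: for: 2m, evaluated every 30s
print(alert_states([True] * 5 + [False], for_evals=4))
```

A brief spike that clears before the `for:` window elapses never leaves the pending state, which is the main tool for suppressing flapping alerts.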

Alertmanager Configuration

# values.yaml for kube-prometheus-stack
alertmanager:
  config:
    global:
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

    route:
      receiver: 'slack-critical'
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - match:
          severity: critical
        receiver: 'pagerduty-critical'
      - match:
          severity: warning
        receiver: 'slack-warning'

    receivers:
    - name: 'slack-critical'
      slack_configs:
      - channel: '#alerts-critical'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
        send_resolved: true

    - name: 'pagerduty-critical'
      pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_KEY'

    - name: 'slack-warning'
      slack_configs:
      - channel: '#alerts-warning'
        send_resolved: true
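
The routing tree above can be reasoned about as a small function: an alert goes to the first child route whose matchers all agree with its labels, and falls back to the parent's receiver otherwise. The sketch below models that behavior on a dictionary that mirrors the config. It is a simplification (exact-match `match` only, no `continue`, no regex matchers), and `pick_receiver` and `group_key` are hypothetical helpers.

```python
ROUTE = {
    "receiver": "slack-critical",
    "group_by": ["alertname", "namespace"],
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "pagerduty-critical"},
        {"match": {"severity": "warning"}, "receiver": "slack-warning"},
    ],
}


def pick_receiver(route, labels):
    """Return the receiver for an alert: first matching child route, else the parent."""
    for child in route.get("routes", []):
        if all(labels.get(k) == v for k, v in child.get("match", {}).items()):
            # A child without its own receiver inherits the parent's.
            inherited = {"receiver": route["receiver"], **child}
            return pick_receiver(inherited, labels)
    return route["receiver"]


def group_key(route, labels):
    """Alerts sharing this key are batched into one notification."""
    return tuple(labels.get(name, "") for name in route.get("group_by", []))


alert = {"alertname": "HighErrorRate", "namespace": "production", "severity": "critical"}
print(pick_receiver(ROUTE, alert), group_key(ROUTE, alert))
```

Note that an alert with no `severity` label, or with a value other than `critical` or `warning`, ends up at the top-level receiver, here `slack-critical`.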

Key Grafana Dashboards

Import these by ID in Grafana → Dashboards → Import:

Dashboard	ID	What it shows
Kubernetes cluster overview	`315`	Nodes, pods, CPU/memory
Kubernetes pod resources	`6781`	Per-pod CPU/memory/network
Node Exporter Full	`1860`	Host-level OS metrics
NGINX Ingress Controller	`9614`	Request rate, latency, errors
Cert-manager	`11001`	TLS cert expiry
ArgoCD	`14584`	GitOps sync status

Common Interview Questions

Q: What is the difference between metrics-server and Prometheus?

metrics-server is lightweight and keeps only the most recent CPU and memory usage in memory; it backs kubectl top and resource-based HPA. Prometheus stores historical time-series data and supports complex queries, alerting, and long-term retention.

Q: What is a ServiceMonitor?

A CRD from the Prometheus Operator. It declares which Services to scrape for metrics, replacing hand-written scrape_configs. The Operator watches for ServiceMonitor objects whose labels match the Prometheus resource's selector and generates the scrape configuration from them.

Q: How does HPA use Prometheus metrics?

Out of the box, HPA scales on CPU and memory served by metrics-server through the metrics.k8s.io API. For custom metrics (e.g., queue depth), you deploy the Prometheus Adapter, which exposes the results of Prometheus queries through the custom.metrics.k8s.io API that HPA reads from.
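
Whichever API the metric comes from, HPA applies the same scaling formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A minimal sketch follows, assuming the default tolerance of 0.1 and ignoring stabilization windows and min/max replica bounds; `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Apply the HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Queue depth example: 3 replicas, 45 messages per pod, target of 30 per pod
print(desired_replicas(3, 45, 30))
```

In the example the ratio is 1.5, so 3 replicas become ceil(4.5) = 5.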

Q: What is the difference between a Counter and a Gauge?

A Counter only ever increases, apart from resetting to zero when the process restarts (like total HTTP requests). A Gauge can go up and down (like current memory usage). Apply rate() or increase() to Counters; query Gauges directly.
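
The reason raw Counter values are rarely useful is that they reset to zero on restart. Functions like rate() compensate by treating any drop as a reset. Here is a simplified sketch of that idea, assuming evenly collected samples; unlike Prometheus, it does no extrapolation to the window boundaries, and both function names are hypothetical.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a restart the counter starts again from 0, so the whole
        # current value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples, window_seconds):
    """Per-second average increase over the window (no extrapolation)."""
    return counter_increase(samples) / window_seconds


# The drop from 125 to 5 is a process restart, not negative traffic.
samples = [100, 110, 125, 5, 20]
print(counter_increase(samples), simple_rate(samples, 60))
```

Subtracting the first sample from the last would give -80 here; accounting for the reset gives an increase of 45, or 0.75 per second over a 60-second window.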