Monitoring Kubernetes with Prometheus & Grafana
The de facto standard monitoring stack for Kubernetes: Prometheus scrapes metrics and evaluates alert rules, Grafana visualizes the data, and Alertmanager routes the resulting alerts to people.
Architecture Overview
K8s Workloads         Prometheus Stack                     Visualization
─────────────         ────────────────                     ─────────────
Pod /metrics   ──┐
Node Exporter  ──┤──► Prometheus ──► Alertmanager ──► PagerDuty/Slack
kube-state-    ──┘        │
metrics                   │
                          └──────────────────────────► Grafana Dashboards
Key components:
| Component | Role |
|---|---|
| Prometheus | Scrapes and stores time-series metrics |
| Alertmanager | Routes alerts to Slack, PagerDuty, email, etc. |
| Grafana | Dashboards and visualization |
| node-exporter | Exposes hardware/OS metrics from each node |
| kube-state-metrics | Exposes K8s object state (pod counts, deployment status) |
| metrics-server | Lightweight in-cluster metrics for HPA/kubectl top |
| Prometheus Operator | Manages Prometheus via CRDs (PrometheusRule, ServiceMonitor) |
Installation via kube-prometheus-stack
The fastest production-ready setup (Prometheus + Grafana + Alertmanager + node-exporter + kube-state-metrics):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123 \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --wait
# Verify all pods running
kubectl get pods -n monitoring
# Access Grafana at http://localhost:3000 (admin/admin123 as set above; choose a strong password for any real deployment)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Access Prometheus UI
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Access Alertmanager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
Prometheus Concepts
Metric Types
| Type | Description | Example |
|---|---|---|
| Counter | Monotonically increasing value | http_requests_total |
| Gauge | Value that goes up and down | memory_usage_bytes |
| Histogram | Distribution of values in buckets | http_request_duration_seconds |
| Summary | Pre-computed quantiles | rpc_duration_seconds{quantile="0.9"} |
PromQL — Prometheus Query Language
# ── Basic queries ──────────────────────────────────────────────
# All HTTP requests
http_requests_total
# Filter by label
http_requests_total{job="api", status="200"}
# CPU usage per container in cores, averaged over 5m
rate(container_cpu_usage_seconds_total[5m])
# Memory usage in MiB
container_memory_usage_bytes / 1024 / 1024
# ── Aggregations ───────────────────────────────────────────────
# Sum requests by service
sum(rate(http_requests_total[5m])) by (service)
# Average memory per namespace
avg(container_memory_usage_bytes) by (namespace)
# Highest per-container CPU rate within each pod
max(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# ── Common K8s metrics ─────────────────────────────────────────
# Pod CPU usage as % of its limit (pods without a CPU limit return no result)
100 * sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)
  / sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace, pod)
# Pod memory usage as % of its limit
100 * sum(container_memory_usage_bytes{container!=""}) by (namespace, pod)
  / sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod)
# Container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])
# Number of pods not running (the metric is 1 for a pod's current phase and 0 otherwise, so sum rather than count)
sum(kube_pod_status_phase{phase!~"Running|Succeeded"}) by (namespace, phase) > 0
# Node CPU usage %
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Node memory available %
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
# HTTP error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# P99 request latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
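The P99 query above relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a minimal Python sketch of that idea, not Prometheus's actual implementation. The function name, the bucket representation, and the sample numbers are illustrative assumptions, and it assumes 0 < q <= 1.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes 0 < q <= 1 and that `buckets` is sorted by upper bound and
    ends with (math.inf, total_count), like the `le` label series.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket:
                # the best available answer is the highest finite bound.
                return prev_bound
            # Linear interpolation inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets (seconds): 50 requests <= 0.1s, 90 <= 0.5s, ...
LATENCY_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

if __name__ == "__main__":
    print(histogram_quantile(0.75, LATENCY_BUCKETS))
```

With these sample buckets the 0.75 quantile comes out at roughly 0.35 seconds. The estimate can never be more precise than the bucket boundaries, which is why bucket layout matters for latency SLOs.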
Instrumenting Your Application
Expose /metrics from your app
// Go example with prometheus/client_golang
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequests counts requests, labelled by HTTP method and status code.
var httpRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests",
	},
	[]string{"method", "status"},
)

func init() {
	prometheus.MustRegister(httpRequests)
}

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil)) // serve /metrics on port 8080
}
# Python example with prometheus_client
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

start_http_server(8080)  # serves /metrics on port 8080 from a background thread
while True:  # keep the process alive; a real app would run its own server loop here
    time.sleep(1)
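It can help to see what a scrape of /metrics actually returns. The client libraries above generate this for you; the standard-library sketch below only imitates the text exposition format for a single counter family so the structure is visible. The function name and the sample values are hypothetical, and label-value escaping is deliberately left out.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a value.
    Simplified: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical request counts, keyed by label set.
    counts = {
        (("method", "GET"), ("status", "200")): 3,
        (("method", "POST"), ("status", "500")): 1,
    }
    print(render_counter("http_requests_total", "Total HTTP requests", counts), end="")
```

Each distinct label combination becomes its own time series, which is why high-cardinality labels such as user IDs are best kept out of metrics.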
Tell Prometheus to scrape it — ServiceMonitor
# Prometheus Operator CRD — auto-discovers services to scrape
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match Prometheus selector
spec:
  namespaceSelector:
    matchNames: [production]
  selector:
    matchLabels:
      app: api  # matches Service labels
  endpoints:
    - port: metrics  # port name on the Service
      interval: 15s
      path: /metrics
# The Service must expose the metrics port
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
  labels:
    app: api
spec:
  selector:
    app: api
  ports:
    - name: http
      port: 80
    - name: metrics  # ← this port is scraped
      port: 8080
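A ServiceMonitor that silently scrapes nothing is usually a mismatch in one of three places: the namespace, the Service labels, or the port name. The Python sketch below mirrors the two manifests above as plain dictionaries and checks those three conditions. It is a simplified illustration of the matching rules, not the Operator's code; the function name and dictionary layout are assumptions, and only matchLabels and matchNames are handled.

```python
def yields_targets(monitor, service):
    """Return True if this ServiceMonitor would produce scrape targets for the Service.

    Simplified: supports namespaceSelector.matchNames, selector.matchLabels,
    and endpoint ports referenced by name.
    """
    namespaces = monitor["namespaceSelector"]["matchNames"]
    wanted_labels = monitor["selector"]["matchLabels"]
    port_names = {port["name"] for port in service["ports"]}
    return (
        service["namespace"] in namespaces
        and all(service["labels"].get(k) == v for k, v in wanted_labels.items())
        and any(ep["port"] in port_names for ep in monitor["endpoints"])
    )


# Dictionaries mirroring the ServiceMonitor and Service manifests above.
API_MONITOR = {
    "namespaceSelector": {"matchNames": ["production"]},
    "selector": {"matchLabels": {"app": "api"}},
    "endpoints": [{"port": "metrics", "interval": "15s", "path": "/metrics"}],
}
API_SERVICE = {
    "namespace": "production",
    "labels": {"app": "api"},
    "ports": [{"name": "http", "port": 80}, {"name": "metrics", "port": 8080}],
}

if __name__ == "__main__":
    print(yields_targets(API_MONITOR, API_SERVICE))
```

Note that the selector matches labels on the Service object, not on the pods behind it.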
Alerting Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: api.rules
      interval: 30s
      rules:
        # Alert: high error rate
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total[5m])) by (service) > 0.05
          for: 2m
          labels:
            severity: critical
            team: backend
          annotations:
            summary: "High error rate on {{ $labels.service }}"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold 5%)"
        # Alert: pod crash looping
        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
            description: "{{ $value }} restarts in the last hour"
        # Alert: high memory usage
        # (containers without a memory limit report a limit of 0 and are filtered out)
        - alert: PodHighMemoryUsage
          expr: |
            container_memory_usage_bytes{container!=""}
              / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} memory > 85%"
        # Alert: node not ready
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is NotReady"
        # Alert: deployment replicas mismatch
        - alert: DeploymentReplicasMismatch
          expr: |
            kube_deployment_spec_replicas != kube_deployment_status_available_replicas
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} available replicas do not match the desired count"
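The `for:` field in the rules above is what keeps a brief spike from paging anyone: an alert stays pending until its expression has been true continuously for that duration, and a single false evaluation resets it. The Python sketch below models that state machine under simplifying assumptions (one series, evenly spaced evaluations); the function name and sample timeline are hypothetical.

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each rule evaluation.

    `evaluations` is a list of (timestamp_seconds, expr_is_true) tuples in
    time order. The alert is "pending" while the expression has been true
    for less than `for_seconds`, "firing" once it has been true that long,
    and "inactive" as soon as the expression is false.
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        elapsed = timestamp - active_since
        states.append("firing" if elapsed >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # 30s evaluation interval with `for: 2m`, as in HighErrorRate above.
    timeline = [(0, True), (30, True), (60, True), (90, True),
                (120, True), (150, False), (180, True)]
    for (ts, _), state in zip(timeline, alert_states(timeline, 120)):
        print(ts, state)
```

In this timeline the alert only starts firing at the 120-second mark, and the false evaluation at 150 seconds sends it back to the start of the pending period.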
Alertmanager Configuration
# values.yaml for kube-prometheus-stack
alertmanager:
  config:
    global:
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    route:
      receiver: 'slack-critical'
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-critical'
        - match:
            severity: warning
          receiver: 'slack-warning'
    receivers:
      - name: 'slack-critical'
        slack_configs:
          - channel: '#alerts-critical'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
            send_resolved: true
      - name: 'pagerduty-critical'
        pagerduty_configs:
          - routing_key: 'YOUR_PAGERDUTY_KEY'
      - name: 'slack-warning'
        slack_configs:
          - channel: '#alerts-warning'
            send_resolved: true
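The routing tree in this configuration is evaluated top-down: child routes are tried in order, the first one whose matchers all agree with the alert's labels wins, and anything unmatched falls back to the parent route's receiver. The Python sketch below walks the same tree as a dictionary. It is a simplified model that ignores `continue`, regex matchers, and grouping, and the function name is hypothetical.

```python
def pick_receiver(route, labels):
    """Return the receiver name an alert with these labels is routed to.

    Simplified: equality matchers only, first matching child route wins,
    and unmatched alerts fall back to the parent route's receiver.
    """
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(labels.get(name) == value for name, value in matchers.items()):
            return pick_receiver(child, labels)
    return route["receiver"]


# The `route` section from the values.yaml above, as a dictionary.
ROUTE = {
    "receiver": "slack-critical",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "pagerduty-critical"},
        {"match": {"severity": "warning"}, "receiver": "slack-warning"},
    ],
}

if __name__ == "__main__":
    for severity in ("critical", "warning", "info"):
        print(severity, "->", pick_receiver(ROUTE, {"severity": severity}))
```

One consequence worth noticing: with this configuration, alerts whose severity is neither critical nor warning land in the default receiver, which here is the critical Slack channel.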
Key Grafana Dashboards
Import these by ID in Grafana → Dashboards → Import:
| Dashboard | ID | What it shows |
|---|---|---|
| Kubernetes cluster overview | 315 | Nodes, pods, CPU/memory |
| Kubernetes pod resources | 6781 | Per-pod CPU/memory/network |
| Node Exporter Full | 1860 | Host-level OS metrics |
| NGINX Ingress Controller | 9614 | Request rate, latency, errors |
| Cert-manager | 11001 | TLS cert expiry |
| ArgoCD | 14584 | GitOps sync status |
Common Interview Questions
Q: What is the difference between metrics-server and Prometheus?
metrics-server is lightweight and keeps only the most recent value for each resource; it backs kubectl top and HPA. Prometheus stores historical time-series data and supports complex queries, alerting, and long-term retention.
Q: What is a ServiceMonitor?
A CRD from the Prometheus Operator. It declares which Services Prometheus should scrape for metrics, replacing hand-written scrape_configs. The Operator discovers ServiceMonitor objects by label selector and generates the matching scrape configuration for Prometheus.
Q: How does HPA use Prometheus metrics?
By default HPA reads CPU and memory usage from metrics-server. For custom metrics (e.g., queue depth), you deploy the Prometheus Adapter, which bridges Prometheus queries to the custom.metrics.k8s.io API that HPA reads from.
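Once a metric is available to HPA, scaling follows the ratio of the current value to the target value. The Python sketch below applies the scaling rule described in the Kubernetes HPA documentation. The function name is hypothetical, the 10% tolerance is the documented default and can differ per cluster, and min/max replica bounds, stabilization windows, and unready pods are ignored.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Approximate HPA's rule: ceil(currentReplicas * currentValue / targetValue).

    No scaling happens while the ratio is within `tolerance` of 1.0.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical queue depth per pod: 30 messages observed, 10 targeted.
    print(desired_replicas(current_replicas=4, current_value=30, target_value=10))
```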
Q: What is the difference between a Counter and a Gauge?
A Counter only ever increases (like total HTTP requests) and returns to zero only when the process restarts. A Gauge can go up and down (like current memory usage). Use rate() on Counters; use Gauges directly.
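The reason rate() is safe on Counters is that it treats any decrease as a reset to zero rather than as a negative change. The Python sketch below shows that reset handling on a list of raw samples. It is a simplification: Prometheus's real rate() and increase() also extrapolate to the edges of the time window, which is omitted here, and the function names and sample values are hypothetical.

```python
def counter_increase(samples):
    """Total increase across consecutive counter samples, tolerating resets.

    When a sample is lower than the previous one, the counter is assumed to
    have restarted from zero, so the new value itself is the increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter reset
    return total


def simple_rate(samples, window_seconds):
    """Average per-second increase over the window (no extrapolation)."""
    return counter_increase(samples) / window_seconds


if __name__ == "__main__":
    # The process restarts between the third and fourth scrape.
    scrapes = [100, 110, 120, 5, 15]
    print(counter_increase(scrapes), simple_rate(scrapes, 60))
```

Applying the same subtraction naively to the raw values would report a large negative jump at the restart, which is why graphing a raw Counter is rarely useful.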