PagerDuty alert 'payments-api pod restarts > 10 in 5m' fired at 02:14 UTC. Grafana showed memory usage hitting the 512Mi limit, OOMKill events every 90 seconds, and the HPA scaling from 4 replicas to its maxReplicas of 20 within 3 minutes. Node CPU utilization spiked to 94% across all three nodes as the 20 pods competed for CPU during startup. p99 latency jumped from 180ms to 12,400ms, and the customer-facing error rate hit 38%.
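To make the alert condition concrete, here is a minimal Python sketch of a sliding-window check for "restarts > 10 in 5m". It is an illustration only: the `RestartAlert` class and the timestamps are hypothetical, and the real alert is presumably evaluated by the monitoring stack rather than by code like this.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class RestartAlert:
    """Hypothetical sliding-window counter for pod restart events."""

    def __init__(self, threshold=10, window=timedelta(minutes=5)):
        self.threshold = threshold  # fire when the count is strictly above this
        self.window = window        # look-back period
        self._events = deque()      # restart timestamps, oldest first

    def record(self, ts):
        """Record one restart at `ts`; return True if the alert condition holds."""
        self._events.append(ts)
        cutoff = ts - self.window
        # Drop restarts that have aged out of the look-back window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    # Synthetic timestamps for illustration, not data from the incident.
    alert = RestartAlert()
    start = datetime(2024, 1, 1, 2, 10, tzinfo=timezone.utc)
    fired = [alert.record(start + timedelta(seconds=20 * i)) for i in range(12)]
    print("alert first fired on restart number", fired.index(True) + 1)
```

The comparison is strict (`>`), matching the alert text, so the tenth restart inside the window does not fire the alert but the eleventh does.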