Advanced Kubernetes Interview Questions
41 questions with detailed answers, code examples, and interview tips.
01. How does the Kubernetes scheduler decide which node to place a Pod on (filtering and scoring phases)?
Category: scheduling. Asked by: Google, Amazon, Microsoft.

02. What happens when etcd loses quorum and how do you recover the cluster?
Category: general. Asked by: Google, Amazon, Microsoft.

03. How do HPA and VPA work together for autoscaling in production?
Category: scheduling. Asked by: Shopify, Netflix, Uber.

04. What are init containers and sidecar containers -- when to use each pattern?
Category: pods.

05. How does CoreDNS work internally in Kubernetes and what DNS records are created?
Category: networking.

06. How does RBAC work -- Roles, ClusterRoles, RoleBindings, and ServiceAccounts?
Category: rbac. Asked by: Capital One, Stripe, Value Momentum.

07. What are taints, tolerations, and node affinity -- how do they control Pod scheduling?
Category: scheduling.

08. How does Kubernetes handle graceful shutdown -- preStop hooks, SIGTERM, and terminationGracePeriod?
Category: pods.

09. What is a StatefulSet and how does it differ from a Deployment for stateful workloads?
Category: storage.

10. How do NetworkPolicies work -- what is the default behavior and how do you implement zero-trust networking?
Category: networking.

11. How do you triage a broad production DevOps incident when the first report only says the application is unhealthy?
Category: general.

12. How do you decide whether to roll back, scale up, or restart services during a Kubernetes production incident?
Category: cicd.

13. What Kubernetes evidence would you collect before diagnosing whether a production outage is caused by the application, the cluster, networking, or the database?
Category: networking.

14. How should a DevOps engineer turn production incident triage into a permanent remediation plan?
Category: monitoring.

15. How do you handle a deployment that passes Dev and UAT but fails in Production, and what systematic approach prevents environment-specific failures?
Category: cicd. Asked by: Value Momentum.

16. How do you troubleshoot a production incident in Kubernetes and maintain RCA documentation to prevent the same issue from recurring?
Category: general. Asked by: Value Momentum.

17. How do you identify whether a pod restart is caused by OOMKilled, a connectivity failure, or an application-level bug?
Category: pods. Asked by: Value Momentum, Netflix, Uber, Shopify.

18. How do you manage access to EKS clusters across L1/L2/Dev/Admin levels, and how do IAM and RBAC connect?
Category: rbac. Asked by: Value Momentum, Capital One, Stripe.

19. How do you embed shift-left security in a Kubernetes CI/CD pipeline using SAST, DAST, and container scanning?
Category: security.

20. How do you implement least-privilege RBAC in EKS for dev teams, CI/CD pipelines, and production access?
Category: rbac. Asked by: Capital One, Stripe, Value Momentum.

21. How do you manage secrets in Kubernetes using HashiCorp Vault with auto-rotation and zero application changes?
Category: security.

22. How do you implement chaos engineering in Kubernetes using LitmusChaos or Gremlin, and how do you control blast radius?
Category: general. Asked by: Netflix, Amazon, Gremlin.

23. How do you design an observability stack with Prometheus, Grafana, and OpenTelemetry for a microservices platform?
Category: monitoring.

24. How do you implement distributed tracing across microservices to diagnose latency in a payments flow?
Category: monitoring.

25. How do you validate multi-AZ and multi-region failover actually works before you need it in production?
Category: networking.

26. How do you define SLOs, SLIs, and error budgets for a Kubernetes-hosted payments service, and how do they drive engineering decisions?
Category: monitoring. Asked by: Google, Netflix, LinkedIn.

27. How do you run blameless post-mortems after a production incident, and what makes them effective vs just a checklist?
Category: general.

28. How do you implement capacity planning and load forecasting for Kubernetes clusters to handle traffic spikes?
Category: scheduling.

29. How do you design CI/CD pipelines that deploy to Kubernetes with approval gates, canary analysis, and automatic rollback?
Category: cicd. Asked by: Amazon, Netflix, Spotify.

30. How do you implement GitOps with Terraform Enterprise and ArgoCD to manage infrastructure and applications together?
Category: cicd. Asked by: Intuit, Tesla, Weaveworks.

31. How do you run Kafka on Kubernetes using Strimzi, and what are the production challenges?
Category: general. Asked by: LinkedIn, Uber, Confluent.

32. How does Kafka enable event-driven microservices, and how do you guarantee message ordering?
Category: general. Asked by: LinkedIn, Uber, Confluent.

33. How do you handle Kafka consumer lag and backpressure in high-throughput payment processing?
Category: general. Asked by: LinkedIn, Uber, Confluent.

34. A pod keeps getting OOMKilled but the application's heap usage looks normal at 60% of the container memory limit. What is happening and how do you debug it?
Category: Kubernetes.

35. Your Kubernetes cluster autoscaler is not scaling up even though pods are stuck in Pending state. Walk through your complete debugging process.
Category: Kubernetes.

36. Your service's p99 latency spiked from 50ms to 500ms but p50 remains at 20ms. The service has not been deployed recently. How do you investigate?
Category: Kubernetes.

37. Amazon EKS Interview: Explain how IAM Roles for Service Accounts (IRSA) works internally, including the OIDC provider trust chain, and describe a production scenario where IRSA misconfiguration caused a security incident.
Category: Kubernetes.

38. You are using Karpenter for node provisioning on EKS and notice that during a traffic spike, new nodes are provisioned but pods remain Pending for 3-4 minutes. What is causing the delay and how do you optimize node startup time?
Category: Kubernetes.

39. A Kubernetes cluster experiences a cascading failure where one microservice's pod crashes cause a chain reaction bringing down 15 dependent services. Describe the failure propagation mechanism and what architectural patterns prevent this.
Category: Kubernetes.

40. Your Kubernetes cluster's TLS certificates for the API server, kubelet, and etcd are expiring in 48 hours. The cluster was bootstrapped with kubeadm. Walk through the complete renewal process and what happens if certificates actually expire.
Category: Kubernetes.

41. You need to implement a custom Kubernetes controller (operator) that automatically provisions cloud databases (RDS instances) when a custom resource is created. Describe the controller architecture, reconciliation loop design, and how you handle eventual consistency and error scenarios.
Category: Kubernetes.