Confluent

5 interview questions · kubernetes

How do you run Kafka on Kubernetes using Strimzi, and what are the production challenges?

advanced · general · kubernetes

Quick Answer

Strimzi provides a Kubernetes Operator that manages Kafka clusters as custom resources. It handles broker deployment, topic management, user authentication, and rolling upgrades. Production challenges include persistent storage sizing, rack awareness for HA, and JVM heap tuning across broker pods.

Detailed Answer

Think of running a post office inside a shopping mall. The mall provides space, power, and security (Kubernetes), but the post office needs its own sorting machines, mailboxes, and delivery routes (Kafka brokers, topics, partitions). Strimzi is the contractor that builds and maintains the post office inside the mall, handling construction, repairs, and expansions without shutting down mail service.

Strimzi is a CNCF project that provides Kubernetes Operators for running Apache Kafka. Instead of manually deploying Kafka brokers as StatefulSets with complex configuration, you declare a Kafka custom resource that specifies replicas, storage, listeners, and authentication. The Strimzi Operator reconciles this into the actual Kubernetes resources: pods for brokers and ZooKeeper (or KRaft controllers), Services for client access, ConfigMaps for broker configuration, and PersistentVolumeClaims for data. Older Strimzi releases managed the pods through StatefulSets; newer releases use Strimzi's own StrimziPodSet resource instead.

Internally, the Strimzi Operator watches for Kafka, KafkaTopic, KafkaUser, and KafkaConnect custom resources. When you create a Kafka CR with 3 replicas and 100Gi storage, the Operator creates 3 broker pods, each with its own PVC holding that broker's Kafka log directories (the message data). It configures broker IDs, advertised listeners with proper DNS names, inter-broker replication, and, when rack awareness is enabled, the broker rack from a node topology label. For client access, Strimzi creates a bootstrap Service and per-broker addresses so clients can discover and connect to specific brokers.

At production scale, the key challenges are storage performance (Kafka is I/O intensive, so it benefits from SSDs with provisioned IOPS), JVM tuning (brokers need careful heap sizing to avoid long GC pauses), rolling upgrades (Strimzi performs rolling restarts, but you must ensure ISR counts stay healthy), and monitoring (JMX metrics exported via the Prometheus JMX Exporter). Teams should configure PodDisruptionBudgets to prevent multiple brokers from going down simultaneously during node maintenance.

The non-obvious gotcha is that Kafka's advertised listeners must be reachable by all clients. A client contacts the bootstrap address only to fetch cluster metadata; after that it connects directly to the advertised address of whichever broker leads each partition. In Kubernetes, if you expose Kafka outside the cluster using NodePort or LoadBalancer listeners, each broker therefore needs its own external address. Strimzi handles this with per-broker Services, but misconfigured DNS or security groups can cause clients to connect to the bootstrap address and then fail when redirected to individual brokers. Always test external client connectivity from outside the cluster before going live.
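
The ISR and PodDisruptionBudget advice above comes down to simple arithmetic. The following is a minimal Python sketch of that arithmetic, not anything Strimzi ships; the function names are hypothetical and it assumes producers use acks=all.

```python
def max_brokers_down(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas of a partition that can be offline while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


def safe_to_restart(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """True if taking one in-sync replica offline still leaves min.insync.replicas in sync."""
    return in_sync_replicas - 1 >= min_insync_replicas


# Replication factor 3 with min.insync.replicas 2 tolerates one broker at a time.
print(max_brokers_down(3, 2))   # 1
print(safe_to_restart(3, 2))    # True: a fully in-sync partition survives one restart
print(safe_to_restart(2, 2))    # False: the ISR has already shrunk, so wait before restarting
```

With these numbers, a PodDisruptionBudget that allows at most one unavailable broker matches what the replication settings can absorb.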

Code Example

# Install Strimzi Operator in the kafka namespace
kubectl create namespace kafka # Dedicated namespace for Kafka components
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka # Deploy Strimzi Operator CRDs and controller (URL quoted so the shell does not interpret the "?")

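# Create a 3-node Kafka cluster with KRaft mode (no ZooKeeper)
# Assumption: a Strimzi release that supports Kafka 3.7.0, where KRaft clusters take replicas and storage from KafkaNodePool resources
# (some older releases also required placeholder replicas and storage under spec.kafka)
apiVersion: kafka.strimzi.io/v1beta2 # Strimzi Kafka API
kind: KafkaNodePool # Replicas, roles, and storage for a set of Kafka nodes
metadata:
  name: kafka # Pool name; pods become payments-kafka-kafka-<id>
  namespace: kafka # Deploy in the kafka namespace
  labels:
    strimzi.io/cluster: payments-kafka # Attach this pool to the Kafka CR below
spec:
  replicas: 3 # Three nodes for HA
  roles: [controller, broker] # Combined roles; larger clusters usually split them
  storage:
    type: persistent-claim # Use PVCs for durability
    size: 100Gi # 100GiB per node
    class: gp3 # AWS gp3 for consistent IOPS (assumes a StorageClass with this name exists)
---
apiVersion: kafka.strimzi.io/v1beta2 # Strimzi Kafka API
kind: Kafka # Custom resource for a Kafka cluster
metadata:
  name: payments-kafka # Cluster name used in Service DNS
  namespace: kafka # Deploy in the kafka namespace
  annotations:
    strimzi.io/kraft: enabled # Run without ZooKeeper
    strimzi.io/node-pools: enabled # Take replicas and storage from the node pool
spec:
  kafka:
    version: 3.7.0 # Kafka version
    listeners:
    - name: plain # Internal plaintext listener
      port: 9092 # Standard Kafka port
      type: internal # ClusterIP access only
      tls: false # No encryption on this listener
    - name: tls # Internal TLS listener
      port: 9093 # TLS port
      type: internal # Encrypted internal traffic
      tls: true # Enable TLS encryption
    config:
      num.partitions: 12 # Default partitions per topic
      default.replication.factor: 3 # Replicate to all brokers
      min.insync.replicas: 2 # Require 2 ISR for acks=all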

# Check broker pod status
kubectl get pods -n kafka -l strimzi.io/cluster=payments-kafka # Shows broker and controller pods

# Check under-replicated partitions
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions # Should return empty if healthy

◈ Architecture Diagram

┌──────────────────────────────┐
│  Strimzi Operator            │
│  (watches Kafka CRs)         │
└──────────┬───────────────────┘
           ↓
┌──────────────────────────────┐
│  Kafka CR → StatefulSet      │
│  ┌────────┐┌────────┐┌──────┐│
│  │Broker 0││Broker 1││Brk 2 ││
│  │100Gi   ││100Gi   ││100Gi ││
│  └────────┘└────────┘└──────┘│
└──────────────────────────────┘

How does Kafka enable event-driven microservices, and how do you guarantee message ordering?

advanced · general · kubernetes

Quick Answer

Kafka decouples producers from consumers through topics and partitions, enabling asynchronous event-driven communication. Message ordering is guaranteed within a single partition by using a consistent partition key. Exactly-once semantics require idempotent producers and transactional writes.

Detailed Answer

Think of a newspaper printing operation. Writers (producers) submit articles to different sections (topics): sports, finance, weather. Each section has multiple columns (partitions). Articles within the same column are printed in order, but articles across different columns may interleave. If you want all articles about the same team in order, you assign them to the same column using the team name as a key.

Kafka enables event-driven architecture by acting as a durable, distributed message log between microservices. Instead of services calling each other directly via HTTP (tight coupling), services publish events to Kafka topics and other services consume them independently. A payments service publishes payment-completed events, and the notification service, analytics service, and fraud detection service each consume those events at their own pace without knowing about each other.

Internally, each Kafka topic is divided into partitions. Producers send messages with an optional key, and the producer's partitioner hashes the key to determine the partition. All messages with the same key go to the same partition (as long as the partition count does not change), and within a partition, messages are strictly ordered by offset. Consumer groups assign partitions to consumers, so each partition is read by exactly one consumer in a group. For exactly-once semantics, Kafka provides idempotent producers (which deduplicate retries using a producer ID and sequence number) and transactional writes (atomic writes across multiple partitions).

At production scale, partition key design is the most important decision. For a payments system, using the account ID as the partition key ensures all events for one account are ordered: payment initiated, payment authorized, payment completed. If you use random partitioning for throughput, you lose ordering guarantees. Consumer group rebalancing during scaling or failures can cause brief processing pauses, so teams should use the cooperative sticky assignor to minimize disruption.

The non-obvious gotcha is that ordering only works within a partition, not across partitions. If you need global ordering across all events, you must use a single partition, but that limits each consumer group to one active consumer. Most systems design for per-entity ordering (all events for account-123 in order) rather than global ordering, which is sufficient for almost all business requirements.
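
To make the key-to-partition idea concrete, here is a small Python sketch. It is an illustration only: `partition_for` and `route` are hypothetical helpers, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. Stand-in hash: real Kafka clients use murmur2, not CRC32."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, payload) event to the log of the partition its key hashes to."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [
    ("account-123", "payment_initiated"),
    ("account-456", "payment_initiated"),
    ("account-123", "payment_authorized"),
    ("account-123", "payment_completed"),
]
routed = route(events, 12)

# Every event for account-123 sits in one partition, in the order it was produced.
p = partition_for("account-123", 12)
ordered = [payload for key, payload in routed[p] if key == "account-123"]
print(ordered)  # ['payment_initiated', 'payment_authorized', 'payment_completed']
```

The same function also shows why changing the partition count is disruptive: a different `num_partitions` sends existing keys to different partitions.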

Code Example

# Create a topic with 12 partitions and replication factor 3
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic payment-events \
  --partitions 12 \
  --replication-factor 3 # 12 partitions for parallel consumption, 3 replicas for durability

# Produce a keyed message (account ID ensures ordering per account)
kubectl exec -it -n kafka payments-kafka-kafka-0 -- bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic payment-events \
  --property parse.key=true \
  --property key.separator=: # Format: account-123:{"event":"payment_completed"}

# Consume with a consumer group
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic payment-events \
  --group settlements-processor \
  --from-beginning # Reads from earliest offset for new group

# Check consumer lag per partition
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group settlements-processor # Shows LAG column per partition

◈ Architecture Diagram

┌──────────┐
│ Producer │
│ key=acct │
└────┬─────┘
     ↓ hash(key)
┌────────────────────────────┐
│ Topic: payment-events      │
│ ┌───────┐┌───────┐┌───────┐│
│ │  P0   ││  P1   ││  P2   ││
│ │ordered││ordered││ordered││
│ └───────┘└───────┘└───────┘│
└────────────────────────────┘
     ↓
┌──────────┐
│ Consumer │
│ Group    │
└──────────┘

How do you handle Kafka consumer lag and backpressure in high-throughput payment processing?

advanced · general · kubernetes

Quick Answer

Monitor consumer lag per partition using kafka-consumer-groups CLI or Prometheus metrics. Scale consumers with KEDA based on lag thresholds. Handle backpressure with dead letter queues for failed messages, circuit breakers for slow downstream services, and partition rebalancing to distribute load evenly.

Detailed Answer

Think of a call center during a product recall. If calls come in faster than agents can answer, the queue grows (consumer lag). You can add more agents (scale consumers), redirect overflow calls to voicemail (dead letter queue), or temporarily stop accepting new calls from the website (backpressure to producers). The key is detecting the queue growth early and responding before callers give up.

Consumer lag is the difference between the latest message offset in a partition and the consumer's current committed offset. A lag of zero means the consumer is caught up. Growing lag means messages are being produced faster than consumed. In a payment processing system, growing lag means transactions are being delayed, which can trigger timeouts, duplicate processing attempts, and compliance violations for settlement deadlines.

The response strategy has three layers. First, scale consumers: use KEDA (Kubernetes Event-driven Autoscaling) with a Kafka trigger that watches consumer group lag and scales the Deployment from 1 to N consumers. Each consumer in the group gets assigned partitions, so the maximum parallelism equals the number of partitions. Second, handle failures: messages that fail processing after retries go to a dead letter topic for manual investigation rather than blocking the entire partition. Third, apply backpressure: if a downstream service (like the fraud detection API) is slow, implement circuit breakers so consumers pause processing rather than overwhelming the failing service.

At production scale, partition count determines your maximum consumer parallelism. If you have 12 partitions and 12 consumers, each handles one partition. Adding a 13th consumer gives it nothing to do. Teams should provision enough partitions upfront (at least 3x the expected peak consumer count) because adding partitions to an existing topic changes which partition each key maps to, which breaks per-key ordering for data already written, and partitions cannot be removed afterwards. Monitor lag per partition, not just aggregate lag, because a single hot partition with a slow consumer can cause localized delays while the aggregate looks healthy.

The non-obvious gotcha is that consumer lag spikes are normal during deployments. When consumers restart during a rolling update, partitions are reassigned and the new consumers start from the last committed offset, which may be slightly behind. This brief lag spike should resolve within minutes. Alert on lag that is continuously growing for 15+ minutes, not on momentary spikes. Also, max.poll.records sets how many messages a consumer receives per poll, and max.poll.interval.ms sets how long it may take before polling again. If processing a batch takes longer than that interval, the consumer is removed from the group and a rebalance starts, so misconfiguring these causes unnecessary rebalances that worsen lag.
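
The lag arithmetic and the partition ceiling on scaling can be sketched in a few lines of Python. This is a rough model under stated assumptions: the helper names are hypothetical, the offsets are made up, and `desired_consumers` only approximates how an average-lag target such as KEDA's lagThreshold turns into a replica count.

```python
import math
from statistics import median


def partition_lag(log_end_offsets, committed_offsets):
    """Lag per partition = latest offset in the partition minus the group's committed offset."""
    return {p: max(log_end_offsets[p] - committed_offsets[p], 0) for p in log_end_offsets}


def desired_consumers(lags, lag_threshold, min_replicas=1, max_replicas=12):
    """Approximate replica count for a target average lag, capped at the partition count."""
    wanted = math.ceil(sum(lags.values()) / lag_threshold)
    ceiling = min(max_replicas, len(lags))  # extra consumers beyond the partition count sit idle
    return max(min_replicas, min(wanted, ceiling))


def hot_partitions(lags, factor=5.0):
    """Partitions whose lag is far above the median, which an aggregate number would hide."""
    baseline = max(median(lags.values()), 1)
    return sorted(p for p, lag in lags.items() if lag > factor * baseline)


# Hypothetical snapshot: 12 partitions, one of them badly behind.
log_end = {p: 10_000 for p in range(12)}
committed = {p: 9_990 for p in range(12)}
committed[3] = 9_100

lags = partition_lag(log_end, committed)
print(sum(lags.values()))            # 1010
print(desired_consumers(lags, 100))  # 11
print(hot_partitions(lags))          # [3]
```

The total of 1010 looks moderate, yet 900 of it sits on partition 3, which is the case for alerting per partition.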

Code Example

# Check consumer lag per partition
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group settlements-processor # Shows LAG per partition

# KEDA ScaledObject to auto-scale consumers based on lag
apiVersion: keda.sh/v1alpha1 # KEDA API
kind: ScaledObject # Auto-scaling configuration
metadata:
  name: settlements-processor-scaler # Scaler for the settlements consumer
  namespace: payments # Application namespace
spec:
  scaleTargetRef:
    name: settlements-processor # Deployment to scale
  minReplicaCount: 1 # Minimum 1 consumer always running
  maxReplicaCount: 12 # Max equals partition count
  triggers:
  - type: kafka # KEDA Kafka trigger
    metadata:
      bootstrapServers: payments-kafka-kafka-bootstrap.kafka:9092 # Kafka bootstrap address
      consumerGroup: settlements-processor # Consumer group to monitor
      topic: payment-events # Topic to watch
      lagThreshold: "100" # Target average lag per consumer replica; KEDA adds replicas when the average exceeds it

# Dead letter topic configuration in consumer application
# If processing fails after 3 retries, send to DLQ
# spring.kafka.consumer.properties.max.poll.records=50
# spring.kafka.consumer.properties.max.poll.interval.ms=300000
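
The retry-then-dead-letter behaviour described in those comments can be sketched without any Kafka client. The Python below is an in-memory illustration only: `consume` and `settle` are hypothetical, a plain list stands in for the dead letter topic, and "3 retries" is interpreted as three attempts in total.

```python
def consume(messages, handler, max_attempts=3):
    """Process messages in order; park a message that keeps failing instead of blocking the rest."""
    processed, dead_letters = [], []
    for message in messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(message))
                break
            except Exception as error:  # a real consumer would catch narrower exception types
                last_error = error
        else:
            # All attempts failed: record the message and the reason for later investigation.
            dead_letters.append({"message": message, "error": str(last_error)})
    return processed, dead_letters


def settle(message):
    """Hypothetical handler that rejects negative amounts."""
    if message["amount"] < 0:
        raise ValueError("negative amount")
    return message["id"]


batch = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}, {"id": 3, "amount": 20}]
processed, dead_letters = consume(batch, settle)
print(processed)                                   # [1, 3]
print([d["message"]["id"] for d in dead_letters])  # [2]
```

Message 3 is still processed after message 2 fails, which is the point of a dead letter topic: one poison message does not stall the partition.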

How do you monitor Kafka on Kubernetes, and what metrics and alerts matter most?

intermediate · monitoring · kubernetes

Quick Answer

Monitor Kafka using JMX metrics exported to Prometheus via the Prometheus JMX Exporter. Critical metrics include under-replicated partitions, consumer group lag, request latency (produce/fetch), ISR shrink rate, and broker disk usage. Alert on under-replicated partitions > 0 and consumer lag growing continuously.

Detailed Answer

Think of monitoring a highway system. You track traffic flow (throughput), lane closures (under-replicated partitions), backup length (consumer lag), and road surface condition (disk usage). A single lane closure might not cause problems, but if multiple lanes close simultaneously, traffic grinds to a halt. The same applies to Kafka: individual metric spikes are normal, but correlated spikes indicate a systemic problem.

Kafka exposes hundreds of JMX metrics covering broker performance, topic throughput, consumer behavior, and replication health. On Kubernetes with Strimzi, the Prometheus JMX Exporter is enabled through the metricsConfig section of the Kafka resource and runs as a Java agent inside each broker's JVM, converting JMX MBeans into Prometheus-compatible metrics on an HTTP endpoint. A PodMonitor or ServiceMonitor resource tells Prometheus to scrape these endpoints, and Grafana dashboards visualize the data.

The most critical metrics fall into four categories. Replication health: kafka.server UnderReplicatedPartitions should always be zero, and any non-zero value means data is at risk. Consumer health: consumer group lag per partition (exposed as kafka_consumergroup_lag when the Kafka Exporter is deployed) shows how far behind consumers are, and growing lag means consumers cannot keep up with producers. Broker performance: kafka.network RequestMetrics for produce and fetch request latency, where p99 above 100ms indicates broker pressure. Resource usage: disk utilization per broker, because Kafka stores messages on disk and running out stops the broker.

At production scale, alerting should follow the symptom-based approach. Alert on under-replicated partitions greater than zero for more than 5 minutes (data durability risk), consumer lag increasing for more than 15 minutes (processing falling behind), produce request p99 latency above 200ms (client impact), and disk usage above 75% (capacity planning trigger). Avoid alerting on individual broker CPU spikes, which are often transient during rebalancing.

The non-obvious gotcha is that lag metrics reported by the consumers themselves are only accurate while those consumers are actively polling. If a consumer crashes and stops polling, its client-side lag metric freezes at the last known value rather than showing increasing lag. Measuring lag from the broker side, by comparing committed offsets with log end offsets, avoids that blind spot. Teams should also monitor consumer group state (Stable, PreparingRebalance, CompletingRebalance, Empty, Dead) and alert on groups stuck in a rebalancing state for more than 5 minutes, which indicates a consumer that keeps crashing during rebalance.
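
The alert thresholds above can be expressed as a small evaluation function. This Python sketch is only a model of the rules, not a replacement for Prometheus alerting: the `Sample` fields, alert names, and one-sample-per-minute cadence are assumptions made for illustration.

```python
from dataclasses import dataclass

URP_WINDOW = 6   # six one-minute samples span 5 minutes
LAG_WINDOW = 16  # sixteen one-minute samples span 15 minutes


@dataclass
class Sample:
    under_replicated: int
    consumer_lag: int
    produce_p99_ms: float
    disk_used_pct: float


def evaluate(samples):
    """Return the alerts that fire for a history of per-minute samples, oldest first."""
    alerts = []
    urp = samples[-URP_WINDOW:]
    if len(urp) == URP_WINDOW and all(s.under_replicated > 0 for s in urp):
        alerts.append("UnderReplicatedPartitions")
    lag = [s.consumer_lag for s in samples[-LAG_WINDOW:]]
    if len(lag) == LAG_WINDOW and all(b > a for a, b in zip(lag, lag[1:])):
        alerts.append("ConsumerLagGrowing")  # sustained growth only, not a momentary spike
    latest = samples[-1]
    if latest.produce_p99_ms > 200:
        alerts.append("ProduceLatencyHigh")
    if latest.disk_used_pct > 75:
        alerts.append("DiskUsageHigh")
    return alerts


growing = [Sample(0, 100 * i, 120.0, 80.0) for i in range(16)]
spike = [Sample(0, lag, 120.0, 40.0) for lag in [0] * 10 + [500, 900, 400, 100, 0, 0]]
print(evaluate(growing))  # ['ConsumerLagGrowing', 'DiskUsageHigh']
print(evaluate(spike))    # []
```

The second history contains a deployment-style lag spike that recovers on its own, so nothing fires.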

Code Example

# Check under-replicated partitions across all topics
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions # Should return empty when healthy

# Check consumer group lag
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group settlements-processor # Shows CURRENT-OFFSET, LOG-END-OFFSET, LAG

# Prometheus alert rule for under-replicated partitions
# groups:
# - name: kafka-alerts
#   rules:
#   - alert: KafkaUnderReplicatedPartitions
#     expr: kafka_server_replicamanager_underreplicatedpartitions > 0
#     for: 5m
#     labels:
#       severity: critical
#     annotations:
#       summary: "Kafka broker {{ $labels.pod }} has under-replicated partitions"
#
#   - alert: KafkaConsumerLagGrowing
#     expr: delta(kafka_consumergroup_lag[15m]) > 1000
#     for: 15m
#     labels:
#       severity: warning
#     annotations:
#       summary: "Consumer group {{ $labels.consumergroup }} lag growing on {{ $labels.topic }}"

◈ Architecture Diagram

┌──────────┐
│ Broker   │
│ JMX      │
└────┬─────┘
     ↓
┌──────────┐
│JMX Export│
│ (agent)  │
└────┬─────┘
     ↓
┌──────────┐
│Prometheus│
└────┬─────┘
     ↓
┌──────────┐
│ Grafana  │
│ + Alerts │
└──────────┘

What changed with Kafka KRaft mode replacing ZooKeeper, and why does it matter?

intermediate · general · kubernetes

Quick Answer

KRaft mode uses Kafka's internal Raft-based consensus protocol for metadata management instead of an external ZooKeeper ensemble. This simplifies operations by removing a separate distributed system to manage, speeds up controller failover from seconds to milliseconds, and reduces the cluster's resource footprint.

Detailed Answer

Think of a company that used to outsource its HR department to an external firm (ZooKeeper). Every hiring decision, org chart change, and payroll update had to go through the external firm, adding latency and a dependency. KRaft mode brings HR in-house: the company manages its own employee records directly, which is faster, simpler, and eliminates the external dependency.

Historically, Kafka relied on Apache ZooKeeper for metadata management: tracking which brokers are alive, which broker is the controller, partition leadership assignments, topic configurations, and ACLs. This meant operating two distributed systems, Kafka and ZooKeeper, each with its own deployment, monitoring, scaling, and failure modes. ZooKeeper required its own ensemble of 3 or 5 nodes, its own storage, and its own expertise to troubleshoot.

KRaft (Kafka Raft) replaces ZooKeeper with a built-in metadata quorum. A subset of Kafka nodes run as controllers using the Raft consensus protocol to manage cluster metadata. The controller quorum elects a leader, and all metadata changes (topic creation, partition reassignment, broker registration) go through this leader. The metadata is stored as a replicated log within Kafka itself, using the same log format as regular Kafka topics. Because standby controllers already hold that log, controller failover can complete in milliseconds instead of the seconds it took when a new controller had to reload state from ZooKeeper.

At production scale, KRaft simplifies operations significantly. You deploy and manage one system instead of two. Cluster startup is faster because brokers do not need to wait for ZooKeeper to be available. Scaling is simpler because you do not need to resize the ZooKeeper ensemble separately. Monitoring is unified, so you no longer need separate dashboards and alerts for ZooKeeper health. Apache Kafka declared KRaft production-ready in 3.3, deprecated ZooKeeper in 3.5, and removed it in 4.0; Strimzi has followed the same path, so new clusters should start on KRaft.

The non-obvious gotcha is that KRaft changes how you think about controller nodes. In ZooKeeper mode, any broker could become the controller. In KRaft mode, you explicitly designate controller nodes (or run combined controller+broker nodes for smaller clusters). For production, dedicated controller nodes are recommended because they avoid resource contention between metadata operations and message processing. Teams migrating from ZooKeeper mode should test KRaft in staging first, because tools and older clients that talk to ZooKeeper directly (for example, scripts that still pass a ZooKeeper connection string) stop working once ZooKeeper is gone.
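
How many controllers to run follows from majority-quorum arithmetic, which applies to any Raft-style quorum. A short Python sketch (the function names are hypothetical):

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of the controller quorum."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Controllers that can be lost while a majority still remains."""
    return voters - quorum_size(voters)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Four controllers tolerate no more failures than three, which is why quorums are normally sized at 3 or 5.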

Code Example

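# Strimzi Kafka CR with KRaft mode (no ZooKeeper)
# Assumption: a Strimzi release that supports Kafka 3.7.0, where KRaft roles, replicas, and storage are set on a KafkaNodePool
# (some older releases also required placeholder replicas and storage under spec.kafka)
apiVersion: kafka.strimzi.io/v1beta2 # Strimzi API
kind: KafkaNodePool # Node pool custom resource
metadata:
  name: kafka # Pool name; pods become payments-kafka-kafka-<id>
  namespace: kafka # Namespace
  labels:
    strimzi.io/cluster: payments-kafka # Attach the pool to the Kafka CR below
spec:
  replicas: 3 # Three nodes
  roles: [controller, broker] # Combined mode for small clusters
  storage:
    type: persistent-claim # Persistent storage
    size: 100Gi # Per-node storage
---
apiVersion: kafka.strimzi.io/v1beta2 # Strimzi API
kind: Kafka # Kafka cluster custom resource
metadata:
  name: payments-kafka # Cluster name
  namespace: kafka # Namespace
  annotations:
    strimzi.io/kraft: enabled # Enable KRaft mode
    strimzi.io/node-pools: enabled # Use the node pool above
spec:
  kafka:
    version: 3.7.0 # Kafka version with stable KRaft
    listeners:
    - name: plain # Internal listener used by the commands below
      port: 9092
      type: internal
      tls: false
    # process.roles and controller.quorum.voters are generated by the operator; Strimzi does not accept them under spec.kafka.config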

# Verify KRaft mode is active (no ZooKeeper pods should exist)
kubectl get pods -n kafka # Should show only kafka-kafka-* pods, no zookeeper pods

# Check controller quorum status
kubectl exec -n kafka payments-kafka-kafka-0 -- bin/kafka-metadata-quorum.sh \
  --bootstrap-server localhost:9092 \
  describe --status # Shows the quorum leader, current voters, and high watermark

◈ Architecture Diagram

┌─ ZooKeeper Mode ──────┐  ┌─ KRaft Mode ─────────┐
│ ZK Ensemble (3 nodes) │  │ No ZooKeeper needed  │
│ + Kafka Brokers (3)   │  │ Kafka nodes handle   │
│ = 6 nodes total       │  │ metadata internally  │
│ Failover: seconds     │  │ = 3 nodes total      │
│ Two systems to manage │  │ Failover: millisecs  │
└───────────────────────┘  └──────────────────────┘