Adobe

2 interview questions · kubernetes

How does Cilium use eBPF to replace kube-proxy and traditional service mesh sidecars, and what architectural tradeoffs should an architect evaluate before adopting it?

Quick Answer

Cilium loads eBPF programs into the Linux kernel to handle packet forwarding, service load balancing, network policy, and L7 observability without iptables rules or per-pod sidecar proxies. Architects must evaluate kernel version requirements, observability maturity via Hubble, CNI migration complexity, and the loss of fine-grained L7 control that a full sidecar proxy provides.

Detailed Answer

Think of a highway toll system. Traditional kube-proxy is like a toll booth where every car stops, gets checked, and is directed to its lane. eBPF with Cilium is like an electronic pass reader embedded in the road surface: the car never stops, the toll is processed at wire speed, and the road itself knows which lane to direct traffic into without a booth.

Cilium replaces the iptables-based kube-proxy and, for many use cases, the per-pod user-space proxy model used by traditional service meshes. Instead of maintaining thousands of iptables rules that the kernel evaluates sequentially, Cilium attaches eBPF programs to network hooks inside the kernel. These programs handle service IP translation, load balancing across endpoints, and L3/L4 network policy enforcement without packets leaving kernel space. That removes the kernel-to-user-space round trips that a per-pod Envoy sidecar adds to every connection. L7 rules are the exception: traffic that matches an L7 policy is redirected to a shared per-node Envoy proxy managed by Cilium rather than being parsed entirely in the kernel.

Internally, Cilium uses several eBPF map types to store service endpoints, identity labels, policy rules, and connection tracking state. When a packet arrives, the eBPF program attached to the network interface or socket looks up the destination service, selects a backend Pod (random selection by default, or Maglev consistent hashing when configured), rewrites headers, and forwards the packet, all within a single in-kernel call chain. Hubble, the observability layer built on top of Cilium, taps into these eBPF data paths to provide flow logs, DNS visibility, and HTTP metrics without injecting a proxy into any Pod.

Cilium is reported to run in more than 5,000 production deployments as of 2025, including platforms at Adobe, Bell Canada, and multiple hyperscalers. Teams should monitor eBPF program load errors, map memory usage, endpoint synchronization latency, dropped flow events in Hubble, and kernel version compatibility. Cilium requires Linux kernel 5.10 or later for full feature support, and some advanced features need even newer kernels; BBR congestion control for Pods, for example, requires kernel 5.18 or later.

The non-obvious gotcha is that Cilium does not fully replicate every L7 feature of Envoy-based meshes. While it handles mutual authentication based on SPIFFE identities, basic HTTP routing, and L7 policy, complex traffic management such as retries with budgets, circuit breaking with outlier detection, or gRPC-aware load balancing may still require a sidecar or gateway proxy. Architects should map their actual L7 requirements before declaring a full service mesh unnecessary, because removing sidecars and then re-adding them later is a painful migration.
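
The kernel requirement mentioned above is easy to check before a migration. The following is a minimal sketch, not an official Cilium tool: the function names `kernel_at_least` and `report_node_kernels` are hypothetical, and the 5.10 threshold is taken from the text above rather than from a specific Cilium release note. It compares only the major and minor numbers, so distribution kernels that backport features under an older version number will be flagged conservatively.

```shell
#!/bin/sh
# Sketch: flag nodes whose kernel is older than a required major.minor version.
# Both function names are illustrative assumptions, not part of any Cilium CLI.

# kernel_at_least RELEASE REQUIRED
# Returns 0 when RELEASE (e.g. "5.15.0-91-generic") is >= REQUIRED (e.g. "5.10").
kernel_at_least() {
  k_release=$1
  k_required=$2
  have_major=${k_release%%.*}        # text before the first dot
  k_rest=${k_release#*.}             # text after the first dot
  have_minor=${k_rest%%[!0-9]*}      # leading digits of the remainder
  want_major=${k_required%%.*}
  want_minor=${k_required#*.}
  if [ "$have_major" -gt "$want_major" ]; then
    return 0
  fi
  if [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; then
    return 0
  fi
  return 1
}

# report_node_kernels REQUIRED
# Reads "<node> <kernel-release>" lines on stdin and prints one verdict per node.
report_node_kernels() {
  min_version=$1
  while read -r node release; do
    if kernel_at_least "$release" "$min_version"; then
      echo "$node $release ok"
    else
      echo "$node $release too-old"
    fi
  done
}

# Usage against a live cluster (needs kubectl access):
#   kubectl get nodes --no-headers \
#     -o custom-columns='NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion' \
#     | report_node_kernels 5.10
```

Reading the node list from stdin keeps the comparison logic usable without a cluster, for example against an inventory export.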

Code Example

# Add the Cilium Helm chart repository (skip if already configured)
helm repo add cilium https://helm.cilium.io/
helm repo update

# Install Cilium with kube-proxy replacement enabled on a fresh cluster
helm install cilium cilium/cilium --version 1.16.4 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=api.payments-cluster.internal \
  --set k8sServicePort=6443 \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Verify Cilium replaced kube-proxy and is handling service translation
# (the in-agent CLI is named cilium-dbg in recent releases)
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose | grep KubeProxyReplacement

# View real-time HTTP flows for the payments namespace using the Hubble CLI in the agent Pod
# (shows flows seen by that node only; HTTP flows appear once L7 visibility or an L7 policy is in place)
kubectl -n kube-system exec ds/cilium -- hubble observe --namespace payments --protocol http

# List the service load-balancing entries held in eBPF maps on one node
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf lb list

# L7 network policy that restricts HTTP methods on the checkout API (save to a file and apply with kubectl apply -f)
apiVersion: cilium.io/v2 # Cilium-specific CRD for extended network policy
kind: CiliumNetworkPolicy # Extends Kubernetes NetworkPolicy with L7 rules
metadata:
  name: checkout-api-l7-policy # Policy name describing its scope
  namespace: payments # Applies to the payments namespace
spec:
  endpointSelector:
    matchLabels:
      app: checkout-api # Targets the checkout API pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web-frontend # Allows traffic only from the frontend
    toPorts:
    - ports:
      - port: "8080" # The checkout API listening port
        protocol: TCP # HTTP runs over TCP
      rules:
        http:
        - method: POST # Allows POST for creating orders
          path: /api/v2/orders # Restricts to the orders endpoint
        - method: GET # Allows GET for reading order status
          path: /api/v2/orders/.*  # Permits path parameters for order lookups
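
After applying a policy like the one above, it helps to confirm what it permits. The sketch below sends one request from the frontend and prints the HTTP status code. It assumes a Deployment named `web-frontend` whose image includes curl and a Service named `checkout-api` listening on port 8080; both names are hypothetical and only mirror the labels used in the policy. The helper name `probe` is also illustrative.

```shell
#!/bin/sh
# Sketch: probe the checkout API from the frontend and print the HTTP status code.
# "web-frontend" (Deployment) and "checkout-api" (Service) are assumed names.
probe() {
  method=$1
  path=$2
  kubectl -n payments exec deploy/web-frontend -- \
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 \
    -X "$method" "http://checkout-api:8080${path}"
}

# Example calls (run against a live cluster):
#   probe POST   /api/v2/orders        # listed in the policy, should reach the API
#   probe GET    /api/v2/orders/1234   # listed in the policy, should reach the API
#   probe DELETE /api/v2/orders/1234   # not listed, Cilium's L7 proxy should answer 403
```

A request from a Pod that does not carry the `app: web-frontend` label fails differently: it is dropped at L3/L4, so curl times out instead of receiving a 403.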

Architecture Diagram

┌──────────┐     ┌──────────┐
│ Pod A    │     │ Pod B    │
└────┬─────┘     └────┬─────┘
     │                │
     ↓                ↓
┌─────────────────────────────┐
│  eBPF (kernel)              │
│  ┌────────┐ ┌────────────┐  │
│  │Svc LB  │ │Net Policy  │  │
│  └────────┘ └────────────┘  │
│  ┌────────┐ ┌────────────┐  │
│  │ConnTrk │ │Hubble Tap  │  │
│  └────────┘ └────────────┘  │
└─────────────────────────────┘

How does Istio ambient mesh eliminate sidecar proxies, and what should architects evaluate when migrating from sidecar mode to ambient mode?

Quick Answer

Istio ambient mesh replaces per-pod Envoy sidecars with two shared components: ztunnel, a per-node L4 proxy handling mTLS and basic routing, and optional waypoint proxies for L7 policy. Architects must evaluate the migration path for existing sidecar workloads, L7 feature parity, multi-cluster ambient support maturity, and the operational tradeoff of shared node-level proxies versus isolated per-pod proxies.

Detailed Answer

Think of an apartment building with two security options. The sidecar model gives every apartment its own security guard who checks visitors at the apartment door: effective but expensive. The ambient model puts a guard at the building entrance who checks IDs for everyone, and only apartments that need advanced screening get a shared floor-level inspector. You get security everywhere with far fewer guards.

Istio ambient mesh reached general availability with Istio 1.24 in November 2024 and has continued to stabilize in production through 2025 and into 2026. It fundamentally changes how the data plane is deployed. Traditional Istio injects an Envoy sidecar into every Pod, which adds memory overhead (typically 50-100 MB per Pod), increases startup latency, and creates operational complexity around sidecar injection, upgrade ordering, and resource accounting. Ambient mesh removes this per-pod overhead by separating L4 and L7 concerns into shared infrastructure.

The architecture has two layers. Ztunnel is a lightweight Rust-based proxy that runs as a DaemonSet on every node. It handles all L4 concerns: mTLS encryption and identity using SPIFFE certificates, TCP-level authorization policy, and basic connection routing. Ztunnel performance has reportedly improved by about 75 percent over recent releases, and the latency it adds is small. For workloads that need L7 features such as HTTP routing, retries, header-based authorization, or traffic splitting, architects deploy waypoint proxies, which are shared Envoy instances typically scoped to a namespace or to specific services rather than injected per Pod.

In a production migration, teams should start by enabling ambient mode on a single namespace using the label istio.io/dataplane-mode=ambient. Existing sidecar workloads can coexist with ambient workloads during migration.

The key evaluation points are: L7 feature gaps between sidecar and waypoint proxy configurations, whether multi-cluster ambient mesh is mature enough (alpha planned for Istio 1.27), how existing Istio AuthorizationPolicy and VirtualService resources translate, and whether a shared ztunnel on a node creates a blast radius concern where a ztunnel crash affects all Pods on that node.

The non-obvious gotcha is that ambient mesh changes the failure domain. In sidecar mode, a proxy crash affects one Pod. In ambient mode, a ztunnel crash can disrupt networking for every mesh-enrolled Pod on that node. This makes ztunnel reliability, resource limits, and upgrade strategy (rolling DaemonSet updates) critical. Architects should also verify that their observability stack captures ztunnel metrics and waypoint proxy metrics in the same dashboards, because the telemetry surface shifts from per-pod to per-node and per-namespace.
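
To put a number on the failure-domain concern described above, one option is to count how many Pods on each node would share a single ztunnel. This is a rough sketch under stated assumptions: the helper name `pods_per_node` is hypothetical, and the usage example counts every Pod in the `payments` namespace, which only approximates the set of workloads actually enrolled in ambient mode.

```shell
#!/bin/sh
# Sketch: estimate the per-node blast radius of a ztunnel failure.
# Reads "<node> <pod>" lines on stdin and prints "<count> <node>",
# busiest node first. Unscheduled Pods (node shown as <none>) are skipped.
pods_per_node() {
  awk '$1 != "<none>" && NF >= 1 { count[$1]++ }
       END { for (n in count) print count[n], n }' |
    sort -k1,1nr -k2,2
}

# Usage against a live cluster (needs kubectl access):
#   kubectl get pods -n payments --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,POD:.metadata.name' \
#     | pods_per_node
```

Nodes at the top of the list are the ones where ztunnel resource limits and upgrade ordering deserve the most attention.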

Code Example

# Enable ambient mesh mode on the payments namespace
kubectl label namespace payments istio.io/dataplane-mode=ambient

# Verify ztunnel is running on every node in the mesh
kubectl get pods -n istio-system -l app=ztunnel -o wide

# Deploy a waypoint proxy for L7 policy and enroll the payments namespace so its traffic uses it
istioctl waypoint apply --namespace payments --name payments-waypoint --enroll-namespace

# Verify the waypoint proxy is programmed (gtw is the short name for the Gateway API resource)
kubectl get gtw payments-waypoint -n payments

# Apply an L7 AuthorizationPolicy that requires the waypoint proxy
apiVersion: security.istio.io/v1 # Istio security API for authorization
kind: AuthorizationPolicy # Controls which requests are allowed
metadata:
  name: checkout-api-auth # Policy name describing its scope
  namespace: payments # Namespace where the waypoint proxy runs
spec:
  targetRefs:
  - kind: Service # Targets a specific Kubernetes Service
    group: "" # Core API group
    name: checkout-api # The service to protect
  action: ALLOW # Permits matching requests
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/payments/sa/web-frontend"] # SPIFFE identity of the caller
    to:
    - operation:
        methods: ["POST"] # Allows only POST requests
        paths: ["/api/v2/orders"] # Restricts to the orders endpoint

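# Check ztunnel connection metrics for one ztunnel Pod
# (port-forward avoids depending on curl being present in the ztunnel image)
kubectl -n istio-system port-forward ds/ztunnel 15020:15020 &
sleep 2
curl -s localhost:15020/metrics | grep istio_tcp_connections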
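
Because sidecar and ambient workloads can coexist during a migration, it is useful to see which mode each namespace is in. The sketch below classifies namespaces from two labels: `istio.io/dataplane-mode` (used above) and `istio-injection` (the standard sidecar injection label). The helper name `classify_namespaces` is hypothetical, and the logic is a simplification: it does not detect revision-based injection through the `istio.io/rev` label or per-Pod overrides.

```shell
#!/bin/sh
# Sketch: classify namespaces as ambient, sidecar, or unmeshed.
# Reads "<namespace> <dataplane-mode> <istio-injection>" lines on stdin,
# where a missing label appears as "<none>" (kubectl's custom-columns default).
classify_namespaces() {
  while read -r ns mode injection; do
    if [ "$mode" = "ambient" ]; then
      echo "$ns ambient"
    elif [ "$injection" = "enabled" ]; then
      echo "$ns sidecar"
    else
      echo "$ns unmeshed"
    fi
  done
}

# Usage against a live cluster (needs kubectl access):
#   kubectl get namespaces --no-headers \
#     -o custom-columns='NS:.metadata.name,MODE:.metadata.labels.istio\.io/dataplane-mode,INJ:.metadata.labels.istio-injection' \
#     | classify_namespaces
```

Running this before and after each migration step gives a quick inventory of what still depends on sidecars.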

Architecture Diagram

┌──────────── Node ──────────┐
│   (no sidecars in Pods)    │
│ ┌────────┐  ┌────────┐     │
│ │ Pod A  │  │ Pod B  │     │
│ └───┬────┘  └───┬────┘     │
│     └─────┬─────┘          │
│     ┌─────┴─────┐          │
│     │ ztunnel   │ (L4)     │
│     │ mTLS+auth │          │
│     └─────┬─────┘          │
└───────────┼────────────────┘
            ↓
     ┌──────────┐
     │ Waypoint │ (L7)
     │ Proxy    │
     └──────────┘