Module 18: Container Networking

Containers have transformed how we deploy applications, but they bring unique networking challenges. When you run an application directly on a server, networking is straightforward: the app binds to a port, the server has an IP, and traffic flows in. With containers, you have dozens or hundreds of isolated processes sharing a single host, each needing their own network identity, their own IP address, and the ability to discover and talk to other containers — potentially on different hosts. This module covers Docker networking, Kubernetes networking, and service mesh architectures, focusing on the mental models that make container networking predictable rather than magical.
Estimated Time: 4-5 hours
Difficulty: Intermediate to Advanced
Prerequisites: Module 10 (NAT), Module 13 (Load Balancing)

18.1 Container Networking Fundamentals

The Challenge

Think of a traditional server as a house with one family. The house has one address, one mailbox, and one phone line. Now imagine turning that house into an apartment building with 50 units. Each apartment needs its own mailbox, its own doorbell, and the ability to receive deliveries independently — all while sharing the same physical building and street address. That is the container networking problem. Each container needs:
  • Its own network namespace (isolated network stack — its own “apartment” with its own routing table, interfaces, and IP address)
  • A unique IP address (its own “unit number”)
  • Ability to communicate with other containers (the “intercom system” between apartments)
  • (Sometimes) Access to the outside world (a shared “front door” to the street)

Linux Network Namespaces

Containers use Linux network namespaces for isolation. A network namespace is a complete, independent copy of the network stack: its own interfaces, routing tables, iptables rules, and even its own localhost. When a process inside a container calls bind(0.0.0.0:80), it binds to port 80 inside its namespace — not on the host. This is why multiple containers can all listen on port 80 without conflicting.
┌─────────────────────────────────────────────────────────┐
│                    Host Machine                          │
│                                                          │
│  ┌─────────────────┐     ┌─────────────────┐            │
│  │  Container A    │     │  Container B    │            │
│  │  ┌───────────┐  │     │  ┌───────────┐  │            │
│  │  │ Network   │  │     │  │ Network   │  │            │
│  │  │ Namespace │  │     │  │ Namespace │  │            │
│  │  │           │  │     │  │           │  │            │
│  │  │ eth0      │  │     │  │ eth0      │  │            │
│  │  │ 172.17.0.2│  │     │  │ 172.17.0.3│  │            │
│  │  └───────────┘  │     │  └───────────┘  │            │
│  └────────┬────────┘     └────────┬────────┘            │
│           │                       │                      │
│           └───────────┬───────────┘                      │
│                       │                                  │
│              ┌────────┴────────┐                        │
│              │  docker0 bridge │                        │
│              │   172.17.0.1    │                        │
│              └────────┬────────┘                        │
│                       │                                  │
│              ┌────────┴────────┐                        │
│              │    Host eth0    │                        │
│              │  192.168.1.100  │                        │
│              └─────────────────┘                        │
└─────────────────────────────────────────────────────────┘

18.2 Docker Networking Modes

1. Bridge Network (Default)

Containers connect to a virtual bridge — a software-defined network switch inside your host. The bridge (docker0) acts like a physical Ethernet switch: containers are “plugged in” via virtual Ethernet pairs (veth pairs), and the bridge forwards frames between them. The host’s NAT layer (via iptables) gives containers access to the outside world, just like a home router gives your devices internet access.
# Create container on default bridge -- Docker assigns it a 172.17.0.x IP automatically
docker run -d nginx

# Inspect network -- shows all containers on this bridge, their IPs, and the gateway
docker network inspect bridge
┌───────────────────────────────────────────────────────────┐
│                     docker0 (172.17.0.1)                   │
│                            │                               │
│    ┌───────────────────────┼───────────────────────┐      │
│    │                       │                       │      │
│    ▼                       ▼                       ▼      │
│ ┌──────┐              ┌──────┐              ┌──────┐     │
│ │nginx │              │redis │              │mysql │     │
│ │.0.2  │              │.0.3  │              │.0.4  │     │
│ └──────┘              └──────┘              └──────┘     │
└───────────────────────────────────────────────────────────┘
Container-to-Container: Direct via bridge
Container-to-Internet: NAT through docker0
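To make the "172.17.0.x" addressing concrete, here is a minimal sketch of a sequential address allocator for a bridge subnet. It is a toy model, not Docker's actual IPAM driver: the class name `BridgeIpam` and its `attach` method are hypothetical, and the only facts taken from the text are the 172.17.0.0/16 range and the gateway at 172.17.0.1.

```python
import ipaddress


class BridgeIpam:
    """Toy address allocator for a bridge network (not Docker's real IPAM)."""

    def __init__(self, cidr="172.17.0.0/16"):
        self.network = ipaddress.ip_network(cidr)
        hosts = iter(self.network.hosts())   # usable host addresses, in order
        self.gateway = next(hosts)           # first usable address goes to the bridge
        self._free = hosts                   # the rest are handed to containers
        self.leases = {}                     # container name -> assigned IP

    def attach(self, container):
        """Assign the next free address to a container."""
        ip = next(self._free)
        self.leases[container] = ip
        return ip


ipam = BridgeIpam()
nginx_ip = ipam.attach("nginx")
redis_ip = ipam.attach("redis")
print(ipam.gateway, nginx_ip, redis_ip)  # 172.17.0.1 172.17.0.2 172.17.0.3
```

Because addresses are handed out in attachment order, a container that is removed and started again later may receive a different IP, which is one reason to avoid hardcoding container IPs.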

2. User-Defined Bridge

Better than the default bridge in almost every way, and what you should use in practice. The critical upgrade is built-in DNS resolution. On the default bridge, containers can reach each other only by IP address, and that address can change when a container is restarted or recreated. On a user-defined bridge, Docker runs an embedded DNS server that resolves container names to IPs automatically.
# Create custom network -- Docker creates a new bridge with its own subnet
docker network create myapp

# Run containers on the custom network -- they join the same bridge
docker run -d --name web --network myapp nginx
docker run -d --name api --network myapp node

# Containers can reach each other by name -- DNS resolves "api" to its IP
docker exec web ping api  # Works! No hardcoded IPs needed

# Bonus: containers on different user-defined networks are isolated from each other
# This gives you network segmentation for free
Practical rule of thumb: Never use the default bridge in production. Always create user-defined networks. They give you DNS resolution and better isolation (containers on different networks cannot communicate unless explicitly connected), and they are what Docker Compose creates by default for multi-service setups.
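The scoping of names to a network can be pictured with a small model. This is only a sketch of the idea, not how Docker's embedded DNS server is implemented; the class `UserDefinedNetwork`, the network name `other`, the container `db`, and all IP addresses are assumptions made up for illustration.

```python
class UserDefinedNetwork:
    """Toy model of one user-defined bridge with its own name table."""

    def __init__(self, name):
        self.name = name
        self._records = {}   # container name -> IP on this network

    def connect(self, container, ip):
        self._records[container] = ip

    def resolve(self, container):
        # Returns None when the name is not attached to this network
        return self._records.get(container)


myapp = UserDefinedNetwork("myapp")
other = UserDefinedNetwork("other")
myapp.connect("web", "172.18.0.2")
myapp.connect("api", "172.18.0.3")
other.connect("db", "172.19.0.2")

print(myapp.resolve("api"))  # 172.18.0.3
print(myapp.resolve("db"))   # None: "db" is only known on the other network
```

Name resolution and reachability follow the same boundary: a container has to be connected to a network before its name resolves there.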

3. Host Network

Container shares the host’s network stack directly — no bridge, no NAT, no network namespace isolation. The container sees the host’s interfaces, the host’s IP, and binds directly to the host’s ports. It is as if you ran the application directly on the host, but in a container for packaging purposes only.
# The container binds directly to the host's port 80 -- no -p flag needed
docker run --network host nginx
┌────────────────────────────────────────┐
│           Host Machine                  │
│                                         │
│   Container uses host's eth0 directly  │
│   No NAT, no bridge                    │
│   Port 80 in container = Port 80 on host│
│                                         │
└────────────────────────────────────────┘
Use Case: Maximum network performance. Host networking eliminates the NAT/bridge overhead (roughly 2-5% throughput improvement and lower latency). Common for monitoring agents, network tools, or high-performance services where every microsecond matters.
Drawback: No port mapping (only one container can use port 80), no network isolation (the container can see all host traffic), and it behaves as described only on Linux. On macOS and Windows, Docker runs in a VM, so “host” networking still has a layer of indirection.

4. None Network

Container has no network connectivity at all — not even localhost access to the host. The container gets a loopback interface (127.0.0.1) and nothing else.
# This container cannot make any network calls -- complete air gap
docker run --network none alpine
Use Case: Maximum isolation for security-sensitive workloads like cryptographic operations, batch data processing, or running untrusted code where you want a guarantee that no data can be exfiltrated over the network.

5. Overlay Network (Multi-Host)

Spans multiple Docker hosts, creating a single logical network across physical machines. Under the hood, overlay networks use VXLAN (Virtual Extensible LAN) tunnels to encapsulate container-to-container traffic inside UDP packets that traverse the underlay network. Think of it as building a private highway system on top of existing roads — the containers think they are on the same local network, but the traffic is actually tunneled across the physical infrastructure.
┌─────────────────┐     ┌─────────────────┐
│   Host 1        │     │   Host 2        │
│                 │     │                 │
│ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │ Container A │ │     │ │ Container B │ │
│ │ 10.0.0.2    │ │     │ │ 10.0.0.3    │ │
│ └──────┬──────┘ │     │ └──────┬──────┘ │
│        │        │     │        │        │
│   Overlay Network (VXLAN tunnels)       │
│        └────────┼─────┼────────┘        │
└─────────────────┘     └─────────────────┘
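To illustrate what "encapsulate inside UDP packets" costs, the sketch below builds the 8-byte VXLAN header described in RFC 7348 and adds up the per-packet overhead on an IPv4 underlay. The VNI value 4097 is an arbitrary example, and the helper names are hypothetical; this is not the code path Docker uses, only the wire format.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header (flags, reserved bits, 24-bit VNI)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


# Outer Ethernet + outer IPv4 + UDP + VXLAN headers added to every inner frame
OVERHEAD = 14 + 20 + 8 + 8

header = vxlan_header(4097)
print(len(header), parse_vni(header))   # 8 4097
print(OVERHEAD, 1500 - OVERHEAD)        # 50 1450
```

The 50 bytes of overhead are why overlay interfaces commonly run with an MTU of 1450 when the underlay MTU is 1500.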

18.3 Port Publishing

Expose container ports to the host. This is how external traffic reaches your containers — by mapping a port on the host to a port inside the container. Under the hood, Docker creates iptables DNAT (Destination NAT) rules that rewrite the destination address of incoming packets from the host IP to the container IP.
# Map host port 8080 to container port 80
# Format: -p <host_port>:<container_port>
docker run -p 8080:80 nginx

# Map to specific interface -- IMPORTANT for security!
# This binds ONLY to localhost, not all interfaces
# Without this, -p 8080:80 binds to 0.0.0.0 (all interfaces, including public)
docker run -p 127.0.0.1:8080:80 nginx

# Random host port -- Docker picks an available port
docker run -p 80 nginx
docker port <container>  # See which host port was assigned
Security pitfall: docker run -p 8080:80 binds to 0.0.0.0, which means ALL network interfaces, including public ones. On a server with a public IP, this exposes the container directly to the internet. Docker also creates iptables rules that bypass UFW and firewalld. Many teams have been surprised to find their Docker containers reachable from the internet despite having a host firewall configured. Always use 127.0.0.1:port:port for internal-only services and put a reverse proxy (NGINX, Traefik) in front for public services.

Port Publishing Flow

External Request → Host:8080

                       ▼ (iptables DNAT)
               Container:80 (172.17.0.2:80)
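The difference between publishing on all interfaces and publishing on loopback only can be sketched as a tiny rule table. This is a simplified model of the DNAT decision, not Docker's real iptables chains (it ignores, for example, the userland proxy); `PublishRule`, `dnat`, port 9090, and the container IP 172.17.0.3 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PublishRule:
    host_ip: str          # "0.0.0.0" means any host interface
    host_port: int
    container_ip: str
    container_port: int


def dnat(rules, dst_ip: str, dst_port: int) -> Optional[Tuple[str, int]]:
    """Return the rewritten destination, or None if no rule matches."""
    for rule in rules:
        ip_matches = rule.host_ip in ("0.0.0.0", dst_ip)
        if ip_matches and rule.host_port == dst_port:
            return (rule.container_ip, rule.container_port)
    return None


# Models "-p 8080:80" (all interfaces) and "-p 127.0.0.1:9090:80" (loopback only)
rules = [
    PublishRule("0.0.0.0", 8080, "172.17.0.2", 80),
    PublishRule("127.0.0.1", 9090, "172.17.0.3", 80),
]

print(dnat(rules, "192.168.1.100", 8080))  # ('172.17.0.2', 80)
print(dnat(rules, "192.168.1.100", 9090))  # None: loopback-only rule does not match
print(dnat(rules, "127.0.0.1", 9090))      # ('172.17.0.3', 80)
```

A packet arriving on the LAN address matches the 0.0.0.0 rule but not the 127.0.0.1 rule, which is the property that makes the loopback binding safe for internal-only services.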

18.4 Kubernetes Networking Model

Kubernetes has specific networking requirements, and they are intentionally opinionated. Rather than prescribing a specific implementation, Kubernetes defines three rules that any networking solution must satisfy. These rules create a flat network where every Pod can reach every other Pod directly — no NAT, no port mapping, no surprises:
  1. All Pods can communicate with all other Pods without NAT — any Pod can send a packet to any other Pod’s IP and it arrives unmodified
  2. All Nodes can communicate with all Pods without NAT — the node (host) can reach any Pod directly, which is essential for health checks and monitoring
  3. The IP a Pod sees itself as is the same IP others see it as — no hidden NAT translations, which means applications do not need to know they are in a container
This flat network model is a deliberate design choice. It eliminates the complexity of port mapping (which Docker's bridge mode requires before external clients can reach a container) and makes service discovery simpler. The trade-off is that it requires a more sophisticated network layer, which is where CNI plugins come in.

Pod Networking

Each Pod gets a unique IP:
┌─────────────────────────────────────────────────────────┐
│                       Node                               │
│                                                          │
│  ┌────────────────────────┐  ┌────────────────────────┐ │
│  │         Pod A          │  │         Pod B          │ │
│  │    ┌───────┬───────┐   │  │    ┌───────────────┐   │ │
│  │    │ nginx │ sidecar│  │  │    │    redis      │   │ │
│  │    └───┬───┴───┬───┘   │  │    └───────┬───────┘   │ │
│  │        │  lo   │        │  │            │           │ │
│  │        └───┬───┘        │  │      ┌─────┘           │ │
│  │      eth0: 10.244.1.5   │  │ eth0: 10.244.1.6      │ │
│  └──────────────┬──────────┘  └──────────┬────────────┘ │
│                 │                        │               │
│                 └──────────┬─────────────┘               │
│                            │                             │
│                    ┌───────┴───────┐                    │
│                    │   CNI Plugin  │                    │
│                    │  (Calico,etc) │                    │
│                    └───────────────┘                    │
└─────────────────────────────────────────────────────────┘
Containers in the same Pod share a network namespace, which means they can communicate over localhost. This is why sidecar containers (like Envoy proxies, log collectors, or monitoring agents) work — they are in the same Pod and talk to the main application over 127.0.0.1, with zero network overhead. Think of it like two processes on the same machine: they share the same network stack, the same IP, and the same ports (so two containers in the same Pod cannot both bind to port 80).
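The port conflict is easy to reproduce without Kubernetes, because any two sockets in one network namespace behave the same way. The sketch below stands in for two containers in one Pod by opening two TCP sockets in a single process; the function name `second_bind_error` is hypothetical, and the result described assumes a Linux host.

```python
import errno
import socket


def second_bind_error(host="127.0.0.1"):
    """Bind two TCP sockets to the same address and port; return the second bind's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))              # port 0: let the kernel pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))      # same namespace, same port
        except OSError as exc:
            return exc.errno
        return 0                           # the second bind unexpectedly succeeded
    finally:
        first.close()
        second.close()


err = second_bind_error()
print(errno.errorcode.get(err, "no error"))
```

On Linux the second bind is expected to fail with EADDRINUSE. Two containers in different Pods do not hit this error because each Pod has its own namespace and therefore its own port space.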

18.5 Kubernetes Services

Services provide stable endpoints for Pods. Here is the core problem they solve: Pods are ephemeral. They get created, destroyed, rescheduled, and assigned new IPs constantly. If your frontend hardcodes a Pod IP to reach the backend, it breaks the moment that Pod restarts. A Service is an abstraction layer — a stable virtual IP (ClusterIP) and DNS name that never changes, backed by a dynamically updated set of Pod IPs. Think of it like a phone company’s customer service number: you always dial the same number, but the call might be routed to any of a dozen agents in the call center.

ClusterIP (Default)

Internal cluster access only:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP          # Default type -- only reachable inside the cluster
  selector:
    app: my-app            # Routes traffic to all Pods with this label
  ports:
    - port: 80             # The port the Service listens on (what clients use)
      targetPort: 8080     # The port the Pod is actually listening on
      # This decoupling is powerful: clients connect to port 80,
      # but the app runs on 8080. You can change the app port
      # without changing every client.
                 ┌───────────────────────────────┐
                 │  Service: my-service          │
                 │  ClusterIP: 10.96.50.100      │
                 │  Port: 80                     │
                 └───────────────┬───────────────┘

              ┌──────────────────┼──────────────────┐
              │                  │                  │
              ▼                  ▼                  ▼
        ┌──────────┐      ┌──────────┐      ┌──────────┐
        │  Pod 1   │      │  Pod 2   │      │  Pod 3   │
        │  :8080   │      │  :8080   │      │  :8080   │
        └──────────┘      └──────────┘      └──────────┘
How it works: kube-proxy runs on every node and configures iptables (or IPVS) rules to DNAT the ClusterIP to actual Pod IPs. When a Pod sends a packet to 10.96.50.100:80, the iptables rules that kube-proxy installed intercept it, randomly select one of the backend Pod IPs, and rewrite the destination address. No Pod or network interface owns the ClusterIP; in iptables mode it is a virtual IP that exists only in those rules. This is also why pinging a ClusterIP usually fails in iptables mode: the rules match only the Service's protocol and port, so nothing answers ICMP echo requests.
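The "randomly select" step is worth a worked example. In iptables mode the backends appear as a chain of rules evaluated in order, each with a match probability, and the probabilities are chosen so that every backend ends up with an equal share. The sketch below reproduces that arithmetic with exact fractions; it models the probabilities only, not kube-proxy itself, and the function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(n_backends):
    """Per-rule match probability for a chain of n rules evaluated in order."""
    # Rule i is only reached if rules 0..i-1 did not match, so it chooses
    # among the remaining (n - i) backends.
    return [Fraction(1, n_backends - i) for i in range(n_backends)]


def effective_share(n_backends):
    """Overall probability that each backend receives a new connection."""
    shares = []
    not_yet_matched = Fraction(1)
    for p in rule_probabilities(n_backends):
        shares.append(not_yet_matched * p)
        not_yet_matched *= 1 - p
    return shares


print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
print([str(s) for s in effective_share(3)])     # ['1/3', '1/3', '1/3']
```

The last rule always matches, so every new connection is assigned to some backend. The choice is made per connection, not per packet, because connection tracking reuses the first decision for the rest of the flow.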

NodePort

Exposes the service on a static port on every node’s IP address. This means external clients can reach the service by hitting any node’s IP on that port, even if the Pod is not running on that specific node. Kube-proxy forwards the traffic to the correct node and Pod.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80             # ClusterIP port (still works inside the cluster)
      targetPort: 8080     # Pod port
      nodePort: 30080      # Port on every node -- must be in range 30000-32767
      # The high port range avoids conflicts with system services.
      # In production, you rarely expose NodePort directly to users;
      # instead, you put a load balancer in front.
External → Node1:30080 ─┐
External → Node2:30080 ─┼──→ Service ──→ Any Pod
External → Node3:30080 ─┘

LoadBalancer

Provisions a cloud provider’s load balancer (AWS ELB/NLB, GCP Cloud Load Balancer, Azure Load Balancer) and wires it to the NodePort automatically. This is the standard way to expose services to the internet in cloud environments.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
Internet → Cloud LB (public IP) → NodePort → Service → Pods
Cost consideration: Each LoadBalancer Service provisions a separate cloud load balancer, which typically costs $15-25/month on AWS. If you have 20 services, that is $300-500/month just for load balancers. This is why Ingress (covered in section 18.8) is preferred for HTTP services: one Ingress controller with one load balancer can route to many services based on hostname or path.

Headless Service

Returns Pod IPs directly via DNS instead of a virtual ClusterIP. No load balancing, no proxy — the client gets a list of IP addresses and decides which one to connect to. This is essential for stateful workloads where the client needs to connect to a specific Pod (like a specific database replica).
apiVersion: v1
kind: Service
metadata:
  name: my-headless
spec:
  clusterIP: None     # Setting to "None" makes this headless -- no virtual IP
  selector:
    app: my-app       # DNS query for my-headless returns IPs of all matching Pods
DNS query nslookup my-headless returns all Pod IPs as A records. When combined with StatefulSets, each Pod gets a predictable DNS name like my-app-0.my-headless, my-app-1.my-headless — which is how databases like PostgreSQL, Cassandra, and Kafka discover their peers.
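The short names above are relative to the Pod's own namespace. As a sketch of the full naming pattern, the helper below expands them to fully qualified names; the function name is hypothetical, and the `default` namespace and `cluster.local` cluster domain are assumed defaults that may differ in your cluster.

```python
def stateful_pod_dns(statefulset, service, replicas,
                     namespace="default", cluster_domain="cluster.local"):
    """Fully qualified per-Pod names for a StatefulSet behind a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


names = stateful_pod_dns("my-app", "my-headless", replicas=3)
for name in names:
    print(name)
# my-app-0.my-headless.default.svc.cluster.local
# my-app-1.my-headless.default.svc.cluster.local
# my-app-2.my-headless.default.svc.cluster.local
```

Because the ordinal is stable across restarts, a peer list built this way stays valid even when the underlying Pod IPs change.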

18.6 CNI (Container Network Interface)

CNI plugins implement the actual networking that makes Kubernetes’ flat network model work. Kubernetes itself does not implement networking — it defines the rules (every Pod gets an IP, every Pod can reach every other Pod) and delegates the implementation to a CNI plugin. Choosing the right CNI plugin is one of the most impactful infrastructure decisions you will make, because it affects performance, security (network policies), observability, and operational complexity.
| Plugin | Features | Best For |
| --- | --- | --- |
| Calico | Network policies, BGP, IPIP tunnels | General purpose, policy-heavy environments |
| Flannel | Simple overlay, VXLAN | Small clusters, getting started quickly |
| Cilium | eBPF-based, advanced observability, service mesh | Large clusters, security-focused teams |
| Weave | Mesh overlay, built-in encryption | Multi-cloud setups needing encryption |
| AWS VPC CNI | Native VPC IPs for Pods (each Pod gets a real VPC IP) | EKS clusters where you need Pods in VPC security groups |
If you are on AWS EKS, the AWS VPC CNI is the default and usually the right choice. Each Pod gets a real VPC IP address from your subnet, which means Pods can be targeted by VPC security groups and NACLs directly — no overlay network overhead. The trade-off: you consume IP addresses from your VPC subnets, so plan your CIDR ranges carefully. A /24 subnet only has 251 usable IPs, and each node reserves several IPs for itself.
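The 251 figure comes from AWS reserving five addresses in every subnet. A quick way to check a planned CIDR range is sketched below; the subnet ranges are made-up examples, and the helper name `usable_ips` is hypothetical.

```python
import ipaddress

# AWS reserves 5 addresses per subnet: network address, VPC router,
# DNS, one held for future use, and the broadcast address.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Addresses left for nodes and Pods in an AWS subnet of the given size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(usable_ips("10.0.1.0/24"))  # 251
print(usable_ips("10.0.0.0/20"))  # 4091
```

With Pods drawing from the same pool as nodes, a /24 can be exhausted by a handful of busy nodes, which is the argument for sizing Pod subnets generously up front.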

Calico Example

┌────────────────────────────────────────────────────────────┐
│                        Cluster                              │
│                                                             │
│   Node 1                              Node 2                │
│   ┌────────────────┐                  ┌────────────────┐   │
│   │ Pod: 10.244.1.5│                  │ Pod: 10.244.2.3│   │
│   └───────┬────────┘                  └───────┬────────┘   │
│           │                                   │             │
│    ┌──────┴──────┐                    ┌──────┴──────┐      │
│    │ Calico Agent│                    │ Calico Agent│      │
│    │  (Felix)    │                    │  (Felix)    │      │
│    └──────┬──────┘                    └──────┬──────┘      │
│           │         BGP peering               │             │
│           └────────────┬──────────────────────┘             │
│                        │                                    │
│             Routes learned via BGP                          │
│   Node1 knows: 10.244.2.0/24 via Node2                     │
│   Node2 knows: 10.244.1.0/24 via Node1                     │
└────────────────────────────────────────────────────────────┘
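The routes in the diagram are ordinary kernel routes, so the forwarding decision on each node is a longest-prefix match. The sketch below models Node 1's view; the route list and the next-hop labels are placeholders for illustration, not output from a real Calico node.

```python
import ipaddress

# (destination prefix, next hop) as Node 1 might hold them; labels are placeholders
NODE1_ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    (ipaddress.ip_network("10.244.2.0/24"), "Node2"),
    (ipaddress.ip_network("10.244.1.5/32"), "local veth of Pod 10.244.1.5"),
]


def next_hop(routes, destination):
    """Longest-prefix match: the most specific route containing the address wins."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in routes if addr in net]
    _, best_hop = max(candidates, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop(NODE1_ROUTES, "10.244.2.3"))  # Node2
print(next_hop(NODE1_ROUTES, "10.244.1.5"))  # local veth of Pod 10.244.1.5
print(next_hop(NODE1_ROUTES, "8.8.8.8"))     # default-gateway
```

Traffic for a Pod on another node follows the /24 learned over BGP, while traffic for a local Pod follows its more specific /32 route to the veth interface.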

18.7 Network Policies

Control traffic between Pods: this is the Kubernetes equivalent of security groups. By default, Kubernetes allows all Pods to communicate with all other Pods (the flat network model). Network Policies let you restrict that. They are allow-lists rather than deny rules: a Pod that no policy selects accepts all traffic, but once any NetworkPolicy selects a Pod, that Pod switches to “default deny” for the directions the policy covers (ingress, egress, or both), and only traffic matching some policy rule is allowed.
Critical caveat: Network Policies require a CNI plugin that supports them. Flannel does NOT enforce network policies. If you write a NetworkPolicy with Flannel as your CNI, the policy object is created successfully (no error), but it has zero effect — all traffic still flows freely. This is a dangerous silent failure. Calico, Cilium, and Weave all support network policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
  namespace: default           # Network policies are namespace-scoped
spec:
  podSelector:
    matchLabels:
      app: database            # This policy applies to Pods labeled "app=database"
  policyTypes:
    - Ingress                  # Restrict who can send traffic TO the database
    - Egress                   # Restrict where the database can send traffic
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend     # ONLY Pods labeled "app=backend" can connect
      ports:
        - protocol: TCP
          port: 5432           # ONLY on the PostgreSQL port
  egress:
    - to:
        - podSelector:
            matchLabels:
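              app: backend     # Database may open new connections only to backend Pods
      # Replies to connections allowed by the ingress rule are permitted automatically.
      # Note: you will usually also need to allow DNS egress (port 53 to kube-dns/CoreDNS)
      # if the database has to resolve service names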
             ┌─────────────────────────────────────────┐
             │         Network Policy: db-policy       │
             └─────────────────────────────────────────┘

                     Only pods with app=backend
                     can connect to port 5432

┌──────────────┐                 │                ┌──────────────┐
│  Frontend    │                 │                │   Backend    │
│  app=frontend│ ───────╳────────┤───────✓────────│  app=backend │
└──────────────┘   Blocked!      │     Allowed    └──────┬───────┘
                                 │                       │
                                 ▼                       ▼
                         ┌───────────────┐       ┌───────────────┐
                         │   Database    │◄──────│   Database    │
                         │ app=database  │       │ app=database  │
                         └───────────────┘       └───────────────┘
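The evaluation logic behind the diagram can be sketched in a few lines. This is a deliberately reduced model (one namespace, ingress only, label equality, port numbers without protocols), not the algorithm a CNI plugin actually runs; the dictionary layout and function names are hypothetical, and the `cache` Pod is an invented example.

```python
def selects(selector, labels):
    """matchLabels semantics: every selector key/value must be present on the Pod."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, dst_labels, src_labels, port):
    """Decide whether src may connect to dst on the given port."""
    applicable = [p for p in policies if selects(p["podSelector"], dst_labels)]
    if not applicable:
        return True                      # no policy selects the Pod: everything is allowed
    for policy in applicable:            # policies are additive allow-lists
        for rule in policy["ingress"]:
            from_ok = any(selects(s, src_labels) for s in rule["from"])
            if from_ok and port in rule["ports"]:
                return True
    return False                         # selected, but no rule allows this traffic


db_policy = {
    "podSelector": {"app": "database"},
    "ingress": [{"from": [{"app": "backend"}], "ports": [5432]}],
}
policies = [db_policy]

print(ingress_allowed(policies, {"app": "database"}, {"app": "backend"}, 5432))   # True
print(ingress_allowed(policies, {"app": "database"}, {"app": "frontend"}, 5432))  # False
print(ingress_allowed(policies, {"app": "cache"}, {"app": "frontend"}, 6379))     # True
```

The third call shows the part that surprises people: a Pod that no policy selects stays fully open, so a namespace is only locked down once every Pod is covered by some policy.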

18.8 Ingress

Manage external access to services at Layer 7 (HTTP/HTTPS). Ingress solves the “one load balancer per service” cost problem by consolidating all external HTTP routing into a single entry point. The Ingress resource defines routing rules (hostname, path), and an Ingress Controller (a running Pod, typically NGINX, Traefik, or AWS ALB) reads those rules and configures itself to route traffic accordingly. Think of it as a smart receptionist at the front desk who reads the visitor’s badge and directs them to the right department.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 80
                    Internet


              ┌─────────────────┐
              │ Ingress Controller│
              │  (nginx, traefik) │
              └────────┬────────┘

         ┌─────────────┼─────────────┐
         │             │             │
    /users         /orders       /products
         │             │             │
         ▼             ▼             ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐
   │user-svc  │  │order-svc │  │product-svc│
   └──────────┘  └──────────┘  └──────────┘
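One detail of `pathType: Prefix` that is easy to miss: matching works on whole path elements, not raw string prefixes. The sketch below shows that rule for the two paths in the manifest above. It is a simplified model, not an Ingress controller's real matcher, and the function names and the `/usersearch` request are hypothetical.

```python
def prefix_matches(rule_path, request_path):
    """pathType: Prefix compares whole path elements split on '/'."""
    rule_parts = [p for p in rule_path.split("/") if p]
    request_parts = [p for p in request_path.split("/") if p]
    return request_parts[:len(rule_parts)] == rule_parts


def pick_backend(rules, request_path):
    """Choose the longest matching prefix rule, or None when nothing matches."""
    matches = [(path, svc) for path, svc in rules if prefix_matches(path, request_path)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


rules = [("/users", "user-service"), ("/orders", "order-service")]

print(pick_backend(rules, "/users/42"))    # user-service
print(pick_backend(rules, "/usersearch"))  # None: "usersearch" is a different element
print(pick_backend(rules, "/orders"))      # order-service
```

Requests that match no rule fall through to the controller's default backend, which in many setups returns a 404.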

18.9 Service Mesh

The Problem

As microservices grow from 5 to 50 to 500 services, networking concerns that used to be “someone else’s problem” suddenly become everyone’s problem:
  • Service discovery: how does Service A find Service B when Pods are created and destroyed constantly?
  • Load balancing: kube-proxy only spreads connections randomly or round-robin (depending on its mode), but you need smarter strategies (weighted, least-connections, retry-aware)
  • Encryption (mTLS): every service-to-service call should be encrypted, but managing certificates across 500 services is a nightmare
  • Observability: which service is calling which? What is the latency breakdown across the call chain? Where are the errors?
  • Resilience: retries, timeouts, circuit breakers, rate limiting, and the other patterns from the microservices playbook
You could implement all of these in application code, but then every team in every language has to get it right. A service mesh extracts these concerns into the infrastructure layer.

Service Mesh Solution

Inject a sidecar proxy into every Pod. All network traffic in and out of the Pod passes through the sidecar (usually Envoy Proxy), which handles encryption, load balancing, retries, and telemetry transparently. The application code does not change — it still makes plain HTTP calls to http://other-service:8080, but the sidecar intercepts the call, encrypts it with mTLS, applies retry policies, collects metrics, and forwards it to the destination’s sidecar.
┌──────────────────────────────────────────────────────────┐
│                         Pod                               │
│                                                           │
│   ┌─────────────┐           ┌─────────────┐              │
│   │    App      │◄─────────►│   Sidecar   │◄────┐       │
│   │  Container  │  localhost │   (Envoy)   │     │       │
│   └─────────────┘           └─────────────┘     │       │
│                                                  │       │
└──────────────────────────────────────────────────┼───────┘

              All external traffic goes through sidecar


                                            Other Pods
| Mesh | Sidecar | Features | Trade-off |
| --- | --- | --- | --- |
| Istio | Envoy | Full-featured: mTLS, traffic splitting, fault injection | Complex to operate, significant resource overhead (~100MB RAM per sidecar) |
| Linkerd | linkerd2-proxy (Rust) | Lightweight, simple, fast to adopt | Fewer advanced features than Istio |
| Consul Connect | Envoy | Integrates with HashiCorp Vault, Terraform, Nomad | Best if you are already in the HashiCorp ecosystem |
| AWS App Mesh | Envoy | Native AWS integration, managed control plane | AWS lock-in, less community support |
Honest opinion on service mesh adoption: A service mesh adds significant operational complexity and resource overhead. If you have fewer than 20 microservices, you probably do not need one: Kubernetes Services, Ingress, and application-level retry libraries (like resilience4j or axios-retry) will get you far. Adopt a service mesh when you genuinely need mTLS everywhere, you are drowning in observability gaps, or you need advanced traffic management (canary deployments with automatic rollback based on error rates). Start with Linkerd if you want simplicity; choose Istio if you need the full feature set and have the team to operate it.

Istio Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Control Plane (Istiod)                   │
│                                                              │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│   │    Pilot     │  │    Citadel   │  │    Galley    │     │
│   │   (config)   │  │   (certs)    │  │  (validation)│     │
│   └──────────────┘  └──────────────┘  └──────────────┘     │
└────────────────────────────┬────────────────────────────────┘
                             │  Config + Certs

┌────────────────────────────────────────────────────────────┐
│                        Data Plane                           │
│                                                             │
│   ┌─────────────────┐            ┌─────────────────┐       │
│   │      Pod A      │            │      Pod B      │       │
│   │  ┌─────┐ ┌────┐ │    mTLS    │ ┌────┐ ┌─────┐  │       │
│   │  │ App │←│Envoy│◄────────────►│Envoy│→│ App │  │       │
│   │  └─────┘ └────┘ │            │ └────┘ └─────┘  │       │
│   └─────────────────┘            └─────────────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

18.10 Debugging Container Networks

Docker

# Inspect container network -- shows IP, gateway, MAC, attached networks
# This is your first stop when "the container cannot connect to anything"
docker inspect <container> | jq '.[0].NetworkSettings'

# Exec into container for debugging -- but many prod images lack tools
docker exec -it <container> sh
# Inside: ping, curl, nc, etc.
# Problem: minimal images (Alpine, distroless) often lack these tools

# View all containers on the bridge and their IPs
docker network inspect bridge

# The nuclear option: spin up a debug container with ALL network tools preinstalled
# nicolaka/netshoot includes ping, curl, dig, nslookup, tcpdump, iptables, and more
docker run --rm --network container:<target-container> nicolaka/netshoot bash
# Using --network container:<name> puts you in the target's network namespace
# so you can debug from that container's network perspective

Kubernetes

# Get pod IPs and which node each Pod is on -- essential for cross-node debugging
kubectl get pods -o wide

# Show the Endpoints object -- this lists the actual Pod IPs behind a Service
# If this is empty, your selector labels do not match any running Pods
kubectl describe svc <service-name>
kubectl get endpoints <service-name>

# Spin up a temporary debug Pod with full network tools
# --rm deletes it when you exit, so it does not linger in the cluster
kubectl run debug --image=nicolaka/netshoot -it --rm -- bash

# Inside the debug pod -- systematic troubleshooting:
nslookup my-service                 # Does DNS resolve? If not, check kube-dns/CoreDNS
curl -v my-service:80               # Does the Service respond? -v shows connection details
ping <pod-ip>                       # Can you reach the Pod directly? If yes, Service is the issue
wget -qO- --timeout=2 <pod-ip>:8080  # Bypass the Service entirely to test the Pod

# Check CNI plugin logs -- if Pods cannot get IPs, the CNI is usually the problem
kubectl logs -n kube-system -l k8s-app=calico-node
# For Cilium:
kubectl logs -n kube-system -l k8s-app=cilium
Debugging mental model: When a Kubernetes service is unreachable, work through these layers in order: (1) Is the Pod running and healthy? (2) Is the Pod listening on the correct port? (3) Does DNS resolve the Service name? (4) Does the Service have Endpoints? (5) Can you reach the Pod IP directly? (6) Are Network Policies blocking traffic? This systematic approach prevents you from chasing ghosts in the wrong layer.
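The layered order above can be sketched as a small decision helper. This is a minimal illustration, not a real diagnostic tool: the `Observations` fields and the messages are hypothetical names introduced here, and you would fill them in by hand from the `kubectl` and netshoot checks shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    """Results of the manual checks, in the order the mental model lists them."""
    pod_running: bool
    pod_listening: bool
    dns_resolves: bool
    service_has_endpoints: bool
    pod_ip_reachable: bool
    network_policy_allows: bool


# (field name, what to investigate if the check failed), ordered by layer.
LAYERS = [
    ("pod_running", "Pod is not running or not healthy"),
    ("pod_listening", "Pod is not listening on the expected port"),
    ("dns_resolves", "DNS does not resolve the Service name"),
    ("service_has_endpoints", "Service has no Endpoints (check selector labels)"),
    ("pod_ip_reachable", "Pod IP is not reachable directly (check CNI and nodes)"),
    ("network_policy_allows", "A Network Policy is blocking the traffic"),
]


def first_failing_layer(obs: Observations) -> Optional[str]:
    """Return the first layer that failed, or None if every check passed."""
    for field_name, message in LAYERS:
        if not getattr(obs, field_name):
            return message
    return None


if __name__ == "__main__":
    obs = Observations(True, True, True, False, True, True)
    print(first_failing_layer(obs))
```

Stopping at the first failing layer is the point of the ordering: a missing Endpoints list explains a timeout on its own, so there is no reason to inspect Network Policies until that is fixed.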

18.11 Key Takeaways

Pods Get IPs

Every Kubernetes Pod gets a unique IP address that other Pods can reach without NAT.

Services Abstract Pods

Services provide stable endpoints; Pods come and go.

CNI Does the Work

CNI plugins implement the actual networking: they assign Pod IPs and make them routable across nodes, through an overlay or native routing.

Service Mesh for Complex Needs

Use a service mesh for mTLS, observability, and traffic management.

Course Completion

Congratulations! You’ve completed the Networking Mastery course. You now have deep knowledge of:
  • IP addressing, subnetting, and CIDR
  • NAT and how private networks access the internet
  • Routing protocols and how packets find their way
  • DNS and domain name resolution
  • Load balancing and reverse proxies
  • Network troubleshooting tools
  • VPNs and secure tunneling
  • Firewalls and security groups
  • Container and Kubernetes networking

Practice Resources

  • Set up a home lab with VMs/containers
  • Get hands-on with AWS VPC
  • Deploy a Kubernetes cluster and explore networking
  • Capture and analyze packets with Wireshark

Interview Deep-Dive

Strong Answer: Let me trace the full path. Assume Pod A (10.244.1.5) on Node 1 wants to reach Pod B (10.244.2.3) on Node 2 using Calico with BGP as the CNI.

Step 1: Pod A’s application makes a network call to 10.244.2.3. The packet leaves the application and enters the Pod’s network namespace. The Pod’s routing table has a default route pointing to the veth pair interface that connects it to the host namespace.

Step 2: The packet crosses the veth pair from the Pod’s network namespace into the host (Node 1) network namespace. At this point, the host kernel’s routing table takes over.

Step 3: On Node 1, Calico’s Felix agent has programmed routes into the kernel. There is a route that says “10.244.2.0/24 via Node2-IP”; this was learned via BGP peering between Calico agents on Node 1 and Node 2. The kernel knows to forward the packet to Node 2.

Step 4: Depending on the Calico mode, the packet is either sent directly (if nodes are on the same L2 network and can route natively) or encapsulated in an IPIP or VXLAN tunnel. With IPIP, the original packet becomes the payload of a new IP packet with source=Node1-IP and destination=Node2-IP. This outer packet traverses the physical network between nodes (through switches, routers, whatever the underlay provides).

Step 5: The packet arrives at Node 2. If IPIP encapsulation was used, the kernel decapsulates it, revealing the original packet destined for 10.244.2.3. Node 2’s routing table has a route for 10.244.2.3 pointing to the veth pair of Pod B.

Step 6: The packet crosses the veth pair into Pod B’s network namespace, arriving at eth0 inside the Pod. The application receives the packet with the original source IP (10.244.1.5), so Pod B can respond directly.

With the AWS VPC CNI, the flow is different and simpler: each Pod gets a real VPC IP attached to the node’s ENI as a secondary IP address. The packet goes directly through VPC routing, with no overlay and no encapsulation. The VPC route tables and security groups apply natively, which is why AWS VPC CNI Pods can be targeted by VPC security groups.

Follow-up: “What happens to this path when you have a Kubernetes Service in between instead of direct Pod-to-Pod communication?”

When Pod A connects to a Service ClusterIP (say 10.96.50.100:80), kube-proxy’s iptables rules on Node 1 intercept the packet before it leaves the node. The iptables DNAT rule randomly selects one of the Service’s backend Pod IPs (say 10.244.2.3) and rewrites the destination address. From this point, the packet follows the same Node-to-Node path I described above, but with the destination already rewritten to the Pod IP. The response from Pod B goes directly back to Pod A’s IP (not through the Service ClusterIP), because the conntrack table on Node 1 remembers the DNAT mapping and performs the reverse translation on the return packet. With IPVS mode (instead of iptables), kube-proxy uses the kernel’s IPVS load balancer, which is more efficient at scale: iptables rules are O(n) with the number of Services, while IPVS is O(1).
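The DNAT and conntrack behaviour described in the follow-up can be modelled in a few lines. This is a toy simulation under stated assumptions, not how kube-proxy or netfilter is implemented: `NodeNat`, its method names, and the backend port 8080 are hypothetical, and real conntrack entries also track protocol and state.

```python
import random
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class NodeNat:
    """Toy model of Service DNAT plus the conntrack entry that reverses it."""

    def __init__(self, services: Dict[Endpoint, List[Endpoint]],
                 rng: Optional[random.Random] = None) -> None:
        self.services = services
        self.rng = rng or random.Random()
        # Forward direction: (client, service) -> chosen backend.
        self.forward: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}
        # Reply direction: (client, backend) -> original service address.
        self.reverse: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite the destination if dst is a Service ClusterIP:port."""
        backends = self.services.get(dst)
        if backends is None:
            return src, dst  # plain Pod-to-Pod traffic, nothing to rewrite
        # An existing flow keeps its backend; only a new flow picks one.
        backend = self.forward.get((src, dst))
        if backend is None:
            backend = self.rng.choice(backends)
            self.forward[(src, dst)] = backend
            self.reverse[(src, backend)] = dst
        return src, backend

    def reply(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Undo the DNAT on the return packet so the client sees the Service."""
        original = self.reverse.get((dst, src))
        if original is None:
            return src, dst
        return original, dst


if __name__ == "__main__":
    nat = NodeNat({("10.96.50.100", 80): [("10.244.2.3", 8080)]})
    client = ("10.244.1.5", 40000)
    print(nat.outbound(client, ("10.96.50.100", 80)))
    print(nat.reply(("10.244.2.3", 8080), client))
```

The two dictionaries mirror the idea that the translation is decided once per connection and then replayed in both directions, which is why the client never notices that a Pod IP answered instead of the ClusterIP.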
Strong Answer: I would work through this systematically, layer by layer, starting from the simplest explanations.

First, verify the basics. I would run kubectl get pods -o wide to confirm both Pods are Running and have IPs. Then kubectl get svc to verify the database Service exists and has the correct type and port. Then kubectl get endpoints db-service to check if the Service has endpoints. If this is empty, the Service selector labels do not match the database Pod labels, which is the most common cause of this issue. I have seen teams spend hours debugging networking when the problem was a typo in a label selector.

Second, test connectivity from inside the application Pod. I would exec into the microservice Pod (or deploy a netshoot debug Pod in the same namespace) and try: nslookup db-service to verify DNS resolution, then curl -v db-service:5432 or nc -zv db-service 5432 to test TCP connectivity. If DNS fails, the problem is CoreDNS (check kubectl logs -n kube-system -l k8s-app=kube-dns). If DNS works but the connection times out, the problem is network-level.

Third, check Network Policies. This is the most likely culprit in a cluster that has any network policies. I would run kubectl get networkpolicy -n <namespace> to see if there are policies affecting either the microservice or the database. If a NetworkPolicy with an ingress section selects the database Pod, all ingress that is not explicitly allowed is denied, so there must be an explicit ingress rule allowing traffic from the microservice Pod. I would check the policy’s podSelector, namespaceSelector, and port configuration. A common mistake: the network policy allows traffic from app=backend but the new microservice is labeled app=api, so the labels do not match.

Fourth, bypass the Service and test direct Pod-to-Pod connectivity. I would ping or curl the database Pod IP directly from the microservice Pod. If this works but the Service does not, the issue is in kube-proxy or the Service configuration (wrong targetPort, wrong selector). If even direct Pod-to-Pod connectivity fails, the issue is at the CNI or node level.

Fifth, if cross-node communication fails but same-node works, I would check the CNI plugin. Are the Calico/Cilium/Flannel pods healthy? Are the node-to-node routes correct? Is there a firewall or security group on the cloud provider blocking traffic between nodes on the required ports (Calico BGP uses TCP port 179, VXLAN uses UDP 4789)?

Follow-up: “The endpoints list is correct and DNS resolves, but the connection still times out. What next?”

I would check whether the database Pod is actually listening on the expected port. Exec into the database Pod and run ss -tlnp to see what ports are open. A common issue: the database is configured to listen on 127.0.0.1:5432 (localhost only) instead of 0.0.0.0:5432 (all interfaces). In a container, if the process binds to localhost, only processes in the same Pod (same network namespace) can connect, because traffic from other Pods arrives on the eth0 interface, not loopback. The fix is changing the database bind address to 0.0.0.0 or the Pod’s IP. If the port is correct, I would run tcpdump inside the database Pod to see if packets are arriving. If they arrive but get no response, the issue is the application itself (wrong authentication, connection limit reached, etc.).
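The localhost-binding check from the follow-up can be automated with a short script. This is a sketch under assumptions: `SAMPLE_SS_OUTPUT` is made-up text in the usual `ss -tln` column layout rather than output captured from a real Pod, and the function names are hypothetical. It treats any non-loopback bind address as reachable, which is a simplification.

```python
import ipaddress
from typing import List

# Hypothetical output in the column layout that `ss -tln` prints.
SAMPLE_SS_OUTPUT = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     127.0.0.1:5432      0.0.0.0:*
LISTEN  0       128     0.0.0.0:8080        0.0.0.0:*
LISTEN  0       128     [::1]:6379          [::]:*
"""


def reachable_from_other_pods(local_address: str) -> bool:
    """True if the bind address accepts traffic arriving on eth0.

    Simplification: any non-loopback address counts as reachable.
    """
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]").split("%")[0]  # drop IPv6 brackets and scope id
    if host == "*":
        return True  # wildcard listener
    return not ipaddress.ip_address(host).is_loopback


def loopback_only_listeners(ss_output: str) -> List[str]:
    """Return the listening sockets that only same-Pod processes can reach."""
    flagged = []
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "LISTEN":
            if not reachable_from_other_pods(fields[3]):
                flagged.append(fields[3])
    return flagged


if __name__ == "__main__":
    for address in loopback_only_listeners(SAMPLE_SS_OUTPUT):
        print(f"{address} is bound to loopback; other Pods cannot connect")
```

In the sample, the listener on 127.0.0.1:5432 is flagged while 0.0.0.0:8080 is not, matching the database scenario in the answer.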
Strong Answer: Docker’s default bridge network was designed for single-host simplicity. Every container gets an IP on a private 172.17.0.0/16 subnet, connected through a Linux bridge. Containers on the same host can communicate via the bridge, but reaching the outside world or other hosts requires NAT. To expose a service, you publish ports with -p 8080:80, which creates iptables DNAT rules. This works fine for a single developer machine with 5-10 containers, but it creates serious problems at scale.

The problems with Docker’s bridge model for orchestration: First, port conflicts: if two containers need port 80, you have to map them to different host ports (8080, 8081), and every client needs to know which host port corresponds to which service. Second, NAT obscures container identity: when a container makes an outbound connection, the source IP is the host’s IP (due to SNAT), not the container’s IP. This breaks audit logging, rate limiting, and any policy based on source identity. Third, multi-host communication requires complex overlay setup or manual port forwarding.

Kubernetes’ flat network model eliminates all of these problems by design. Every Pod gets a unique, routable IP. No NAT between Pods means the source IP is preserved: if Pod A calls Pod B, Pod B sees Pod A’s real IP. No port mapping means every Pod can use whatever port it wants (even port 80) without conflicts, because each Pod has its own IP address. Service discovery becomes a simple DNS lookup instead of “which host, which port.”

The trade-off is complexity in the network layer. The CNI plugin has to solve a harder problem: giving every Pod a routable IP across all nodes. This requires either an overlay network (VXLAN/IPIP tunnels, which add encapsulation overhead) or native routing integration (like AWS VPC CNI allocating real VPC IPs, or Calico using BGP to distribute routes). Docker’s bridge model is simpler to implement but does not scale; Kubernetes’ flat model is harder to implement but scales to thousands of nodes.

The design decision reflects Kubernetes’ philosophy: push complexity into the infrastructure layer (CNI plugins, written by networking experts) so that application developers get a simple, predictable model (every Pod has an IP, every Pod can reach every other Pod). Google’s internal Borg system used a similar flat network, and the lessons learned there directly informed Kubernetes’ networking requirements.

Follow-up: “When would you still use Docker’s bridge networking instead of Kubernetes?”

For local development, CI/CD pipelines, and small single-host deployments where the operational overhead of Kubernetes is not justified. Docker Compose with user-defined bridges gives you DNS-based service discovery, network isolation between projects, and simple port publishing: everything you need for a development environment or a small production deployment with 5-10 services on a single server. The break-even point where Kubernetes networking starts paying for its complexity is roughly when you need multi-host orchestration, auto-scaling, or zero-downtime deployments.
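The port-conflict argument comes down to which (IP, port) pairs must be unique. The sketch below models that with a plain dictionary; `ListenerTable`, the host address 192.0.2.10, and the workload names are hypothetical and only illustrate the bookkeeping, not Docker or Kubernetes internals.

```python
from typing import Dict, Tuple


class PortConflict(Exception):
    """Raised when an (ip, port) pair is already claimed."""


class ListenerTable:
    """Tracks which workload owns each (ip, port) pair."""

    def __init__(self) -> None:
        self._owners: Dict[Tuple[str, int], str] = {}

    def claim(self, ip: str, port: int, owner: str) -> None:
        key = (ip, port)
        if key in self._owners:
            raise PortConflict(f"{ip}:{port} already used by {self._owners[key]}")
        self._owners[key] = owner


if __name__ == "__main__":
    # Bridge model with published ports: every container shares the host IP,
    # so the second service cannot also publish host port 80.
    host = ListenerTable()
    host.claim("192.0.2.10", 80, "web-a")
    try:
        host.claim("192.0.2.10", 80, "web-b")
    except PortConflict as err:
        print("bridge model:", err)
        host.claim("192.0.2.10", 8081, "web-b")  # clients must now learn 8081

    # Flat model: each Pod has its own IP, so both can listen on port 80.
    cluster = ListenerTable()
    cluster.claim("10.244.1.5", 80, "web-a")
    cluster.claim("10.244.1.6", 80, "web-b")
    print("flat model: both Pods listen on port 80")
```

With one shared host IP the port number is the only distinguishing key, so it becomes a scarce resource; giving each Pod its own IP moves the uniqueness into the address and frees the port.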
Strong Answer: A service mesh is an infrastructure layer that handles service-to-service communication by injecting a sidecar proxy (typically Envoy) into every Pod. All traffic flows through the sidecar, which provides mTLS encryption, load balancing, retries, circuit breaking, observability (distributed traces, metrics, access logs), and traffic management (canary deployments, traffic splitting) without any changes to application code.

When to adopt: I use three concrete criteria. First, you need mTLS everywhere and managing certificates manually is unsustainable. If you have 50+ services across multiple teams and compliance requires encrypted inter-service communication, a service mesh automates certificate issuance, rotation, and verification. Second, you have observability gaps: you cannot answer “which service is calling which, what is the latency at each hop, and where do errors originate” without instrumenting every application individually. A service mesh gives you this without per-application instrumentation because the sidecar sees all traffic. Third, you need advanced traffic management, such as canary deployments where you shift 5% of traffic to a new version and automatically roll back if the error rate exceeds a threshold. This is nearly impossible to do correctly at the application level across multiple languages and frameworks.

When it is overkill: If you have fewer than 15-20 services, a single team managing them, and one primary programming language. In that case, a shared HTTP client library with built-in retries, circuit breaking (Resilience4j for Java, Polly for .NET), and a tracing SDK (OpenTelemetry) gives you 80% of the benefits at 20% of the complexity. If your mTLS needs are limited, you can use cert-manager with SPIFFE for certificate automation without a full mesh.

The operational cost is real. Each sidecar consumes 50-100MB of memory and adds 1-3ms of latency per hop. For a cluster with 200 Pods, that is 10-20GB of additional RAM just for sidecars. The control plane (Istiod) needs to be highly available and adds another operational burden. Debugging becomes harder because every network issue has an additional layer: “is the sidecar the problem?” becomes a question you ask on every incident.

My recommendation: start with Linkerd if you decide you need a mesh. It is simpler to install, has lower resource overhead (its Rust-based proxy uses about 10MB per sidecar versus Envoy’s 50-100MB), and covers the core use cases (mTLS, observability, retries) without the configuration complexity of Istio. Graduate to Istio only if you need its advanced features like sophisticated traffic routing, fault injection testing, or multi-cluster mesh federation.

Follow-up: “How does Cilium’s service mesh approach differ from Istio’s sidecar model?”

Cilium takes a fundamentally different approach. Instead of injecting a sidecar proxy into every Pod, Cilium uses eBPF programs running directly in the Linux kernel to implement service mesh features. There is no sidecar, so there is no additional memory overhead per Pod, and the latency is significantly lower because traffic does not need to cross between the application process, the sidecar proxy, and the kernel multiple times. Cilium’s eBPF-based approach handles L3/L4 policies directly in the kernel and uses an optional per-node Envoy proxy (not per-Pod) for L7 features like HTTP routing. The trade-off: eBPF requires a modern Linux kernel (5.4+), the L7 features are less mature than Istio’s, and the debugging toolchain is different (you need to understand eBPF to troubleshoot deeply). For new clusters in 2025-2026, Cilium is increasingly the default choice because it combines CNI, network policy, and service mesh into a single component.
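The sidecar overhead figures in this answer are simple multiplication, and it can help to plug in your own cluster size. A minimal sketch follows, using only the per-sidecar numbers quoted above (50-100MB for Envoy, about 10MB for Linkerd's proxy, 1-3ms per hop); the function names are hypothetical, and decimal gigabytes (1GB = 1000MB) are assumed to match the text's rounding.

```python
def sidecar_memory_gb(pods: int, mb_per_sidecar: float) -> float:
    """Total sidecar memory in decimal GB, assuming one sidecar per Pod."""
    return pods * mb_per_sidecar / 1000


def added_latency_ms(hops: int, ms_per_hop: float) -> float:
    """Extra latency for a request that crosses `hops` service-to-service hops."""
    return hops * ms_per_hop


if __name__ == "__main__":
    pods = 200
    low = sidecar_memory_gb(pods, 50)
    high = sidecar_memory_gb(pods, 100)
    print(f"Envoy sidecars:   {low:.0f}-{high:.0f} GB")
    print(f"Linkerd sidecars: {sidecar_memory_gb(pods, 10):.0f} GB")
    # A request that fans through 4 hops at 1-3 ms each:
    print(f"Added latency:    {added_latency_ms(4, 1):.0f}-{added_latency_ms(4, 3):.0f} ms")
```

For 200 Pods this reproduces the 10-20GB estimate for Envoy and gives roughly 2GB for a 10MB proxy. The four-hop request is an illustrative assumption, not a figure from the text.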
Strong Answer: Intermittent failures with healthy Pods point to a subset problem: something is wrong with some of the backend Pods, or some of the network paths, but not all of them. Let me walk through my investigation.

First, I would determine if the failures correlate to specific Pods. I would check the Service’s endpoints (kubectl get endpoints) and then look at per-Pod metrics. If I have an Ingress controller or service mesh, the access logs show which backend Pod handled each request. If 10% of requests fail and I have 10 Pods, there is a good chance one Pod is the culprit and every request routed to that Pod fails.

Second, I would check if the failing Pod is actually healthy or if the health check is too shallow. A common scenario: the readiness probe checks /health which returns 200, but the Pod’s database connection pool is exhausted, so actual requests fail. The health check passes (the endpoint responds) but the Pod cannot serve real traffic. The fix is a deeper health check that verifies downstream dependencies, or better yet, implementing dependency-aware readiness that marks the Pod not-ready when its connection pool is saturated.

Third, I would check for a rolling deployment in progress. If a new version is being deployed and the new Pods have a bug, the 10% failure rate might correspond to the fraction of new Pods that have rolled out. I would check kubectl rollout status deployment/<name> and kubectl get pods to see if some Pods are running a different image version.

Fourth, I would investigate network-level issues. If the failures are timeouts (not application errors), one possible cause is a node with network problems. If the failing Pod is on a node with a degraded network link, all requests routed to that Pod time out. I would correlate the failed request destinations with node placement. Another possibility: the CNI plugin has issues on one node, for example Calico Felix crashed, routes are missing, and Pods on that node are partially unreachable.

Fifth, I would check for resource limits. If a Pod hits its CPU limit, it gets throttled and responds slowly (which can trigger timeouts and appear as 5xx). If it hits its memory limit, it gets OOM-killed and restarted; during the restart, requests to that Pod fail. kubectl top pods and kubectl describe pod (look for OOMKilled restart reasons) help here.

The systematic approach: add request-level logging at the Ingress or service mesh layer that records which backend Pod served each request, the response code, and the latency. Then correlate failed requests to specific Pods, specific nodes, or specific time windows (deployment events, autoscaling events). In my experience, 80% of intermittent failures in Kubernetes are caused by one of three things: a single unhealthy Pod that passes health checks, a rolling deployment with a buggy new version, or resource limits causing throttling or OOM kills.

Follow-up: “How would you implement a more robust health check to catch this scenario?”

I would implement three levels of health checks. The liveness probe stays simple: TCP check or a basic HTTP GET that verifies the process is running and not deadlocked. The readiness probe becomes deeper: it checks the database connection pool (are connections available?), checks circuit breaker state (are downstream services healthy?), and checks internal queue depth (is the Pod overwhelmed?). If any of these fail, the Pod is marked not-ready and removed from the Service’s endpoint list, so kube-proxy stops sending traffic to it. I would also add a startup probe with a longer timeout for applications that take time to initialize (JVM warmup, cache loading). This prevents the liveness probe from killing Pods that are still starting up.
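The split between a shallow liveness check and a dependency-aware readiness check can be sketched as two small functions. This is an illustration under assumptions: `PodState`, its fields, and the queue threshold of 100 are hypothetical, and a real service would read these values from its connection pool, circuit breaker, and work queue and expose the results over the HTTP paths the probes call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PodState:
    """Snapshot of the signals the readiness probe inspects (hypothetical)."""
    free_db_connections: int
    downstream_breaker_open: bool
    queue_depth: int
    max_queue_depth: int = 100


def liveness(state: PodState) -> int:
    """Shallow check: if this code runs at all, the process is alive."""
    return 200


def readiness(state: PodState) -> Tuple[int, List[str]]:
    """Deep check: 200 only if the Pod can actually serve real traffic."""
    problems: List[str] = []
    if state.free_db_connections <= 0:
        problems.append("database connection pool exhausted")
    if state.downstream_breaker_open:
        problems.append("downstream circuit breaker open")
    if state.queue_depth >= state.max_queue_depth:
        problems.append("internal queue full")
    # The kubelet treats HTTP 200-399 as success, so 503 marks the Pod not-ready.
    return (200 if not problems else 503), problems


if __name__ == "__main__":
    saturated = PodState(free_db_connections=0,
                         downstream_breaker_open=False,
                         queue_depth=5)
    print("liveness:", liveness(saturated))
    print("readiness:", readiness(saturated))
```

For the exhausted-pool scenario in the answer, liveness still reports success, so the Pod is not restarted, while readiness fails and the Pod drops out of the Service's endpoints until connections free up.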