Chapter 4: Container Networking

Isolated containers need to communicate — with each other and the outside world. Let’s implement Docker-style bridge networking! Container networking can feel like the most confusing part of Docker, but it builds on concepts you already know from physical networking, just virtualized in software. A veth pair is a virtual cable with two ends: one plugged into the container, one plugged into a bridge on the host. The bridge is a virtual switch that forwards packets between containers. NAT (via iptables) translates addresses so containers with private IPs can reach the internet. If you have ever set up a home router, you have used these same concepts — your router does NAT, your ethernet switch is a bridge, and each device connected to it is like a container with its own IP.
Prerequisites: Chapter 3: Filesystem
Further Reading: AWS Networking
Time: 3-4 hours
Outcome: Containers with network connectivity

Container Networking Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONTAINER NETWORKING                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   INTERNET                                                                  │
│       ↑                                                                     │
│       │                                                                     │
│   ┌───┴───┐                                                                 │
│   │ eth0  │ 192.168.1.100 (Host physical interface)                        │
│   └───┬───┘                                                                 │
│       │                                                                     │
│       │ NAT (iptables MASQUERADE)                                          │
│       │                                                                     │
│   ┌───┴───────────────────────────────────────────────────────────────┐    │
│   │                   docker0 BRIDGE (172.17.0.1)                      │    │
│   │                   Virtual network switch                           │    │
│   └───┬───────────────────────┬───────────────────────┬───────────────┘    │
│       │                       │                       │                     │
│   ┌───┴───┐               ┌───┴───┐               ┌───┴───┐                │
│   │ veth1 │               │ veth2 │               │ veth3 │   Host side   │
│   └───┬───┘               └───┬───┘               └───┬───┘                │
│       │                       │                       │                     │
│ ══════╪═══════════════════════╪═══════════════════════╪════════════════    │
│       │   Network NS          │   Network NS          │   Network NS       │
│   ┌───┴───┐               ┌───┴───┐               ┌───┴───┐                │
│   │ eth0  │               │ eth0  │               │ eth0  │   Container   │
│   │.17.0.2│               │.17.0.3│               │.17.0.4│   side        │
│   └───────┘               └───────┘               └───────┘                │
│   Container A             Container B             Container C              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Networking Concepts

┌─────────────────────────────────────────────────────────────────────────────┐
│                      NETWORKING BUILDING BLOCKS                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   VETH PAIR (Virtual Ethernet)                                              │
│   ─────────────────────────────                                             │
│   Two virtual interfaces connected like a pipe:                             │
│                                                                              │
│   ┌─────────┐                           ┌─────────┐                         │
│   │  veth0  │ ◄────── packets ──────► │  veth1  │                         │
│   │ (host)  │                           │(container)│                       │
│   └─────────┘                           └─────────┘                         │
│                                                                              │
│   Whatever goes in one end comes out the other!                             │
│                                                                              │
│   ─────────────────────────────────────────────────────────────────────     │
│                                                                              │
│   BRIDGE                                                                    │
│   ──────                                                                    │
│   Virtual network switch that connects multiple interfaces:                 │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────┐          │
│   │                        docker0 bridge                        │          │
│   │  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐     │          │
│   │  │port1│  │port2│  │port3│  │port4│  │port5│  │port6│     │          │
│   └──┴─────┴──┴─────┴──┴─────┴──┴─────┴──┴─────┴──┴─────┴─────┘          │
│       ↑        ↑        ↑        ↑        ↑        ↑                       │
│      veth    veth     veth     veth     veth     veth                      │
│      pair    pair     pair     pair     pair     pair                      │
│                                                                              │
│   ─────────────────────────────────────────────────────────────────────     │
│                                                                              │
│   IPTABLES (NAT / Port Forwarding)                                         │
│   ────────────────────────────────                                          │
│                                                                              │
│   PREROUTING:  Modify packets BEFORE routing (DNAT for port forwarding)    │
│   FORWARD:     Allow packets to pass through host                           │
│   POSTROUTING: Modify packets AFTER routing (SNAT/MASQUERADE for NAT)      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
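The "virtual switch" behaviour in the diagram above can be sketched as a toy learning bridge. This is only an illustrative model, not how the kernel bridge is implemented: the class name `LearningBridge`, the integer port numbers, and the string MAC addresses are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a learning bridge (hypothetical, for illustration only). */
final class LearningBridge {
    private final int portCount;
    // MAC address -> port on which that address was last seen
    private final Map<String, Integer> macTable = new HashMap<>();

    LearningBridge(int portCount) {
        this.portCount = portCount;
    }

    /** Returns the list of ports the frame is sent out of. */
    List<Integer> receive(int inPort, String srcMac, String dstMac) {
        // Learn: the sender is reachable through the port the frame arrived on
        macTable.put(srcMac, inPort);

        List<Integer> out = new ArrayList<>();
        Integer known = macTable.get(dstMac);
        if (known != null) {
            // Known destination: forward to exactly one port
            // (or drop if it lives on the port the frame came from)
            if (known != inPort) {
                out.add(known);
            }
            return out;
        }
        // Unknown destination: flood to every port except the incoming one
        for (int port = 0; port < portCount; port++) {
            if (port != inPort) {
                out.add(port);
            }
        }
        return out;
    }
}
```

The first frame to an unknown address is flooded to every other port. Once the destination has sent a frame of its own, the bridge forwards to a single port, which is the behaviour each veth attached to the bridge relies on.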

Part 1: Network Manager

src/main/java/com/minidocker/network/NetworkManager.java
package com.minidocker.network;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Manages container networking.
 * 
 * Uses Linux networking tools (ip, iptables) to create:
 * - Bridge network
 * - Veth pairs
 * - NAT rules
 * - Port forwarding
 */
public class NetworkManager {
    
    private static final String BRIDGE_NAME = "minidocker0";
    private static final String BRIDGE_IP = "172.18.0.1";
    private static final String NETWORK_CIDR = "172.18.0.0/16";
    private static final int BRIDGE_PREFIX = 16;
    
    private final AtomicInteger ipCounter = new AtomicInteger(2);  // Start at .2
    
    /**
     * Initializes the bridge network (run once at startup).
     */
    public void initializeBridge() throws IOException {
        System.out.println("Initializing bridge network...");
        
        // Check if bridge already exists
        if (interfaceExists(BRIDGE_NAME)) {
            System.out.println("Bridge already exists");
            return;
        }
        
        // Create bridge
        exec("ip", "link", "add", BRIDGE_NAME, "type", "bridge");
        
        // Assign IP address
        exec("ip", "addr", "add", BRIDGE_IP + "/" + BRIDGE_PREFIX, "dev", BRIDGE_NAME);
        
        // Bring up
        exec("ip", "link", "set", BRIDGE_NAME, "up");
        
        // Enable IP forwarding
        exec("sysctl", "-w", "net.ipv4.ip_forward=1");
        
        // Setup NAT for outbound traffic
        exec("iptables", "-t", "nat", "-A", "POSTROUTING",
             "-s", NETWORK_CIDR, "-j", "MASQUERADE");
        
        // Allow forwarding from/to bridge
        exec("iptables", "-A", "FORWARD", "-i", BRIDGE_NAME, "-j", "ACCEPT");
        exec("iptables", "-A", "FORWARD", "-o", BRIDGE_NAME, "-j", "ACCEPT");
        
        System.out.println("✓ Bridge " + BRIDGE_NAME + " created with IP " + BRIDGE_IP);
    }
    
    /**
     * Sets up networking for a container.
     * 
     * @param containerId Container identifier
     * @param pid Container's PID (to move interface into namespace)
     * @return Assigned IP address
     */
    public String setupContainerNetwork(String containerId, int pid) throws IOException {
        // Short ID prefix keeps names under the 15-character interface name limit
        String shortId = containerId.substring(0, Math.min(6, containerId.length()));
        String vethHost = "veth" + shortId;
        // Temporary peer name: creating the peer as "eth0" in the host namespace
        // would fail if the host already has an interface called eth0
        String vethPeer = "vpeer" + shortId;
        String vethContainer = "eth0";
        
        // This simple counter only hands out 172.18.0.2 through 172.18.0.254
        int hostPart = ipCounter.getAndIncrement();
        if (hostPart > 254) {
            throw new IOException("No free addresses left in 172.18.0.x");
        }
        String containerIp = "172.18.0." + hostPart;
        
        System.out.println("Setting up network for container " + containerId);
        
        // Create veth pair
        exec("ip", "link", "add", vethHost, "type", "veth", "peer", "name", vethPeer);
        
        // Attach host side to bridge
        exec("ip", "link", "set", vethHost, "master", BRIDGE_NAME);
        exec("ip", "link", "set", vethHost, "up");
        
        // Move container side into container's network namespace
        exec("ip", "link", "set", vethPeer, "netns", String.valueOf(pid));
        
        // Rename and configure container interface (must run inside container's netns)
        execInNetns(pid, "ip", "link", "set", vethPeer, "name", vethContainer);
        execInNetns(pid, "ip", "addr", "add", containerIp + "/" + BRIDGE_PREFIX, "dev", vethContainer);
        execInNetns(pid, "ip", "link", "set", vethContainer, "up");
        execInNetns(pid, "ip", "link", "set", "lo", "up");
        execInNetns(pid, "ip", "route", "add", "default", "via", BRIDGE_IP);
        
        System.out.println("✓ Container network configured:");
        System.out.println("  - Interface: " + vethContainer + " (" + vethHost + " on host)");
        System.out.println("  - IP: " + containerIp);
        System.out.println("  - Gateway: " + BRIDGE_IP);
        
        return containerIp;
    }
    
    /**
     * Adds a port forwarding rule.
     * 
     * @param hostPort Port on host
     * @param containerIp Container IP address
     * @param containerPort Port in container
     */
    public void addPortForward(int hostPort, String containerIp, int containerPort) 
            throws IOException {
        
        // DNAT: Redirect incoming traffic to container
        exec("iptables", "-t", "nat", "-A", "PREROUTING",
             "-p", "tcp", "--dport", String.valueOf(hostPort),
             "-j", "DNAT", "--to-destination", containerIp + ":" + containerPort);
        
        // Also for local connections (from host itself)
        exec("iptables", "-t", "nat", "-A", "OUTPUT",
             "-p", "tcp", "--dport", String.valueOf(hostPort),
             "-j", "DNAT", "--to-destination", containerIp + ":" + containerPort);
        
        System.out.println("✓ Port forward: " + hostPort + " -> " + 
                          containerIp + ":" + containerPort);
    }
    
    /**
     * Removes port forwarding rules for a container.
     */
    public void removePortForward(int hostPort, String containerIp, int containerPort) 
            throws IOException {
        
        exec("iptables", "-t", "nat", "-D", "PREROUTING",
             "-p", "tcp", "--dport", String.valueOf(hostPort),
             "-j", "DNAT", "--to-destination", containerIp + ":" + containerPort);
        
        exec("iptables", "-t", "nat", "-D", "OUTPUT",
             "-p", "tcp", "--dport", String.valueOf(hostPort),
             "-j", "DNAT", "--to-destination", containerIp + ":" + containerPort);
    }
    
    /**
     * Cleans up container networking.
     */
    public void cleanup(String containerId) throws IOException {
        // Same naming rule as setupContainerNetwork; tolerate IDs shorter than 6 characters
        String vethHost = "veth" + containerId.substring(0, Math.min(6, containerId.length()));
        
        // Removing host side of veth automatically removes container side too
        if (interfaceExists(vethHost)) {
            exec("ip", "link", "del", vethHost);
            System.out.println("✓ Cleaned up network for: " + containerId);
        }
    }
    
    /**
     * Cleans up the bridge (run at shutdown).
     */
    public void destroyBridge() throws IOException {
        if (!interfaceExists(BRIDGE_NAME)) {
            return;
        }
        
        // Remove iptables rules
        try {
            exec("iptables", "-t", "nat", "-D", "POSTROUTING",
                 "-s", NETWORK_CIDR, "-j", "MASQUERADE");
            exec("iptables", "-D", "FORWARD", "-i", BRIDGE_NAME, "-j", "ACCEPT");
            exec("iptables", "-D", "FORWARD", "-o", BRIDGE_NAME, "-j", "ACCEPT");
        } catch (IOException e) {
            // Rules might not exist
        }
        
        // Bring down and delete bridge
        exec("ip", "link", "set", BRIDGE_NAME, "down");
        exec("ip", "link", "del", BRIDGE_NAME);
        
        System.out.println("✓ Bridge " + BRIDGE_NAME + " destroyed");
    }
    
    private boolean interfaceExists(String name) throws IOException {
        try {
            exec("ip", "link", "show", name);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
    
    private void exec(String... command) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        
        Process p = pb.start();
        
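        // Drain the output so the child cannot block on a full pipe,
        // and keep it so that failures are easier to diagnose
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        
        try {
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new IOException("Command failed: " + String.join(" ", command) + 
                                     " (exit code: " + exitCode + ")\n" + output);
            }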
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Command interrupted", e);
        }
    }
    
    private void execInNetns(int pid, String... command) throws IOException {
        String[] nsenterCmd = new String[command.length + 4];
        nsenterCmd[0] = "nsenter";
        nsenterCmd[1] = "-t";
        nsenterCmd[2] = String.valueOf(pid);
        nsenterCmd[3] = "-n";  // Network namespace only
        System.arraycopy(command, 0, nsenterCmd, 4, command.length);
        
        exec(nsenterCmd);
    }
}
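NetworkManager declares a /16 network but builds addresses as "172.18.0." plus a counter, so it can only use the first 253 host addresses and never reuses an address after a container exits. A minimal sketch of an allocator that covers the whole 172.18.0.0/16 range and recycles released addresses might look like the following. The class name `IpAllocator` and its methods are hypothetical and not part of the chapter's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical allocator for 172.18.0.0/16; a sketch, not the chapter's implementation. */
final class IpAllocator {
    private static final int NETWORK = (172 << 24) | (18 << 16); // 172.18.0.0
    private static final int FIRST_HOST = 2;      // 172.18.0.1 is the bridge
    private static final int LAST_HOST = 0xFFFE;  // 172.18.255.255 is the broadcast address

    private final Deque<Integer> released = new ArrayDeque<>();
    private int next = FIRST_HOST;

    /** Hands out a released address if one exists, otherwise the next unused one. */
    synchronized String allocate() {
        int host;
        if (!released.isEmpty()) {
            host = released.pop();
        } else if (next <= LAST_HOST) {
            host = next++;
        } else {
            throw new IllegalStateException("Address pool exhausted");
        }
        return format(NETWORK | host);
    }

    /** Returns an address to the pool. Double releases are not detected in this sketch. */
    synchronized void release(String ip) {
        int addr = parse(ip);
        if ((addr & 0xFFFF0000) != NETWORK) {
            throw new IllegalArgumentException("Not in 172.18.0.0/16: " + ip);
        }
        released.push(addr & 0xFFFF);
    }

    /** Converts a 32-bit address to dotted-quad notation. */
    static String format(int addr) {
        return ((addr >>> 24) & 0xFF) + "." + ((addr >>> 16) & 0xFF) + "."
             + ((addr >>> 8) & 0xFF) + "." + (addr & 0xFF);
    }

    /** Converts dotted-quad notation to a 32-bit address. */
    static int parse(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        int addr = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Not an IPv4 address: " + ip);
            }
            addr = (addr << 8) | octet;
        }
        return addr;
    }
}
```

With something like this, setupContainerNetwork could call allocate() instead of incrementing a counter, and cleanup could call release() with the container's address.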

Part 2: Port Mapping

src/main/java/com/minidocker/network/PortMapping.java
package com.minidocker.network;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Represents a port mapping (host:container).
 * 
 * Supports formats:
 * - "8080:80"       -> host 8080 to container 80
 * - "8080:80/tcp"   -> TCP only
 * - "8080:80/udp"   -> UDP only
 */
public class PortMapping {
    
    private static final Pattern PATTERN = Pattern.compile(
        "(\\d+):(\\d+)(?:/(tcp|udp))?"
    );
    
    private final int hostPort;
    private final int containerPort;
    private final Protocol protocol;
    
    public enum Protocol {
        TCP, UDP, BOTH
    }
    
    public PortMapping(int hostPort, int containerPort, Protocol protocol) {
        this.hostPort = hostPort;
        this.containerPort = containerPort;
        this.protocol = protocol;
    }
    
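    public static PortMapping parse(String spec) {
        Matcher matcher = PATTERN.matcher(spec);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid port mapping: " + spec);
        }
        
        int hostPort = parsePort(matcher.group(1), spec);
        int containerPort = parsePort(matcher.group(2), spec);
        
        // No "/tcp" or "/udp" suffix means both protocols
        Protocol protocol = Protocol.BOTH;
        if (matcher.group(3) != null) {
            protocol = Protocol.valueOf(matcher.group(3).toUpperCase());
        }
        
        return new PortMapping(hostPort, containerPort, protocol);
    }
    
    // Rejects values outside the valid port range, e.g. "99999:80"
    private static int parsePort(String text, String spec) {
        // Very long digit runs throw NumberFormatException, which is an IllegalArgumentException
        int port = Integer.parseInt(text);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range (1-65535) in: " + spec);
        }
        return port;
    }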
    
    public static List<PortMapping> parseAll(String... specs) {
        List<PortMapping> mappings = new ArrayList<>();
        for (String spec : specs) {
            mappings.add(parse(spec));
        }
        return mappings;
    }
    
    public int getHostPort() { return hostPort; }
    public int getContainerPort() { return containerPort; }
    public Protocol getProtocol() { return protocol; }
    
    @Override
    public String toString() {
        String protoStr = protocol == Protocol.BOTH ? "" : "/" + protocol.name().toLowerCase();
        return hostPort + ":" + containerPort + protoStr;
    }
}
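PortMapping.parse defaults to Protocol.BOTH, while NetworkManager.addPortForward only installs TCP rules. One way to connect the two is to expand a mapping into one DNAT argument list per protocol and chain. The sketch below only builds the argument lists, so it can be inspected without root. The class `DnatRules` and its `build` method are hypothetical names, and the nested `Protocol` enum merely mirrors PortMapping.Protocol so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that expands a port mapping into iptables argument lists. */
final class DnatRules {
    enum Protocol { TCP, UDP, BOTH }  // mirrors PortMapping.Protocol

    /**
     * @param action "-A" to add the rules, "-D" to delete them
     * @return one argument list per (protocol, chain) combination
     */
    static List<List<String>> build(String action, int hostPort, String containerIp,
                                    int containerPort, Protocol protocol) {
        List<String> protos = new ArrayList<>();
        if (protocol != Protocol.UDP) {
            protos.add("tcp");
        }
        if (protocol != Protocol.TCP) {
            protos.add("udp");
        }

        List<List<String>> rules = new ArrayList<>();
        for (String proto : protos) {
            // PREROUTING handles traffic arriving from outside, OUTPUT handles host-local traffic
            for (String chain : List.of("PREROUTING", "OUTPUT")) {
                rules.add(List.of("iptables", "-t", "nat", action, chain,
                        "-p", proto, "--dport", String.valueOf(hostPort),
                        "-j", "DNAT", "--to-destination", containerIp + ":" + containerPort));
            }
        }
        return rules;
    }
}
```

Each inner list could be passed to a ProcessBuilder, as exec does in NetworkManager. Building the same lists with "-D" gives matching delete commands, which keeps add and remove in sync.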

Part 3: Container-to-Container Communication

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTAINER TO CONTAINER                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Container A (172.18.0.2)              Container B (172.18.0.3)            │
│   ┌─────────────────────┐                ┌─────────────────────┐            │
│   │                     │                │                     │            │
│   │  Web App            │  ──────────►   │  Database           │            │
│   │  curl 172.18.0.3    │                │  Port 5432          │            │
│   │                     │                │                     │            │
│   └──────────┬──────────┘                └──────────┬──────────┘            │
│              │ eth0                                  │ eth0                  │
│              │                                       │                       │
│   ═══════════╪═══════════════════════════════════════╪═══════════════════   │
│              │ veth                                  │ veth                  │
│              │                                       │                       │
│         ┌────┴────────────────────────────────────────┴────┐                │
│         │                minidocker0 bridge                 │                │
│         │                   172.18.0.1                      │                │
│         └───────────────────────────────────────────────────┘                │
│                                                                              │
│   The bridge acts as a switch - containers on the same bridge               │
│   can communicate directly using their IP addresses!                        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Part 4: DNS Resolution (Container Names)

src/main/java/com/minidocker/network/DnsResolver.java
package com.minidocker.network;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Simple DNS resolver that maps container names to IPs.
 * 
 * Docker uses an embedded DNS server; we'll use /etc/hosts for simplicity.
 */
public class DnsResolver {
    
    private final Map<String, String> nameToIp = new ConcurrentHashMap<>();
    
    /**
     * Registers a container name.
     */
    public void register(String name, String ip) {
        nameToIp.put(name, ip);
    }
    
    /**
     * Unregisters a container name.
     */
    public void unregister(String name) {
        nameToIp.remove(name);
    }
    
    /**
     * Resolves a container name to IP.
     */
    public String resolve(String name) {
        return nameToIp.get(name);
    }
    
    /**
     * Writes /etc/hosts inside container with all container IPs.
     */
    public void writeHostsFile(Path containerEtc, String ownHostname, String ownIp) 
            throws IOException {
        
        StringBuilder hosts = new StringBuilder();
        
        // Standard entries
        hosts.append("127.0.0.1\tlocalhost\n");
        hosts.append("::1\tlocalhost ip6-localhost ip6-loopback\n");
        
        // Container's own entry
        hosts.append(ownIp).append("\t").append(ownHostname).append("\n");
        
        // Other containers
        for (Map.Entry<String, String> entry : nameToIp.entrySet()) {
            if (!entry.getKey().equals(ownHostname)) {
                hosts.append(entry.getValue()).append("\t").append(entry.getKey()).append("\n");
            }
        }
        
        Path hostsFile = containerEtc.resolve("hosts");
        Files.writeString(hostsFile, hosts.toString());
    }
    
    /**
     * Writes /etc/resolv.conf inside container.
     */
    public void writeResolvConf(Path containerEtc, String dnsServer) throws IOException {
        String content = "nameserver " + dnsServer + "\n";
        Path resolvConf = containerEtc.resolve("resolv.conf");
        Files.writeString(resolvConf, content);
    }
}
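Because writeHostsFile runs once when a container starts, containers started earlier never learn the names of containers started later. A possible refinement, sketched here under assumptions, is to render the file with a pure function and rewrite every running container's file whenever the registry changes. `HostsFiles`, `render`, and `writeAtomically` are hypothetical names; the entries are sorted only to make the output deterministic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for regenerating hosts files; a sketch, not the chapter's code. */
final class HostsFiles {

    /** Builds the hosts file content for one container from the full name registry. */
    static String render(String ownName, String ownIp, Map<String, String> nameToIp) {
        StringBuilder hosts = new StringBuilder();
        hosts.append("127.0.0.1\tlocalhost\n");
        hosts.append("::1\tlocalhost ip6-localhost ip6-loopback\n");
        hosts.append(ownIp).append('\t').append(ownName).append('\n');
        // TreeMap gives a stable, alphabetical order for the other containers
        for (Map.Entry<String, String> entry : new TreeMap<>(nameToIp).entrySet()) {
            if (!entry.getKey().equals(ownName)) {
                hosts.append(entry.getValue()).append('\t').append(entry.getKey()).append('\n');
            }
        }
        return hosts.toString();
    }

    /**
     * Writes to a temporary file in the same directory and renames it into place,
     * so a reader never sees a half-written file. Assumes hostsFile has a parent
     * directory; whether an existing target is replaced by ATOMIC_MOVE is
     * platform-specific (on Linux the rename replaces it).
     */
    static void writeAtomically(Path hostsFile, String content) throws IOException {
        Path tmp = Files.createTempFile(hostsFile.getParent(), "hosts", ".tmp");
        Files.writeString(tmp, content);
        Files.move(tmp, hostsFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

After each register or unregister call, the runtime could loop over the running containers and call writeAtomically with render's output for each of them.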

Part 5: Integrated Example

// Full container with networking
public class Container {
    
    private final NetworkManager network;
    private final DnsResolver dns;
    private final List<PortMapping> portMappings;
    
    private String containerIp;
    
    public void run() throws Exception {
        // ... previous setup code ...
        
        // Setup networking
        containerIp = network.setupContainerNetwork(id, pid);
        
        // Register with DNS
        dns.register(hostname, containerIp);
        
        // Setup port forwarding
        for (PortMapping pm : portMappings) {
            network.addPortForward(pm.getHostPort(), containerIp, pm.getContainerPort());
        }
        
        // Write /etc/hosts and /etc/resolv.conf
        dns.writeHostsFile(rootfs.resolve("etc"), hostname, containerIp);
        dns.writeResolvConf(rootfs.resolve("etc"), "8.8.8.8");
        
        // ... continue with fork and exec ...
    }
    
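    private void cleanup() {
        // ... previous cleanup ...
        
        // Cleanup networking. Each step is best-effort (IOException is
        // java.io.IOException) so one failure does not leak the remaining resources.
        for (PortMapping pm : portMappings) {
            try {
                network.removePortForward(pm.getHostPort(), containerIp, pm.getContainerPort());
            } catch (IOException e) {
                System.err.println("Failed to remove port forward " + pm + ": " + e.getMessage());
            }
        }
        try {
            network.cleanup(id);
        } catch (IOException e) {
            System.err.println("Failed to clean up network: " + e.getMessage());
        }
        dns.unregister(hostname);
    }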
}
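The setup in run() acquires resources in a fixed order (veth pair, DNS entry, port forwards), and cleanup should release them in reverse even if one step fails. One common way to structure this, shown here as a sketch with hypothetical names (`CleanupStack`, `Step`), is to push an undo action after each successful setup step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical LIFO stack of undo actions; a sketch, not the chapter's code. */
final class CleanupStack {

    /** An undo action that is allowed to throw. */
    interface Step {
        void run() throws Exception;
    }

    private final Deque<Step> steps = new ArrayDeque<>();

    /** Registers an undo action; call this right after the matching setup step succeeds. */
    void push(Step step) {
        steps.push(step);
    }

    /** Runs every action in reverse registration order and collects failures instead of stopping. */
    List<Exception> runAll() {
        List<Exception> failures = new ArrayList<>();
        while (!steps.isEmpty()) {
            try {
                steps.pop().run();
            } catch (Exception e) {
                failures.add(e);
            }
        }
        return failures;
    }
}
```

In the Container class this could look like pushing `() -> network.cleanup(id)` after setupContainerNetwork returns, then calling runAll() from cleanup() and logging whatever it returns.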

Testing Networking

# Start container with port mapping
java Container --port 8080:80 mywebserver /usr/sbin/nginx

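# Test from host. If localhost does not connect, try the host's own IP
# (e.g., 192.168.1.100): DNAT of loopback traffic typically also needs the
# route_localnet sysctl and an extra MASQUERADE rule.
curl http://localhost:8080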

# Start another container
java Container mydb /usr/bin/postgres

# From web container, connect to database
# The containers can communicate via bridge!
ping 172.18.0.3

Network Modes Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                         NETWORK MODES                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   BRIDGE (Default - what we implemented)                                    │
│   ──────                                                                    │
│   - Containers get isolated network namespace                               │
│   - Connected to bridge, can talk to each other                             │
│   - NAT for outbound, port forwarding for inbound                          │
│   - Best for: Most containers, web apps, microservices                     │
│                                                                              │
│   HOST                                                                      │
│   ────                                                                      │
│   - Container shares host's network namespace                               │
│   - No network isolation                                                    │
│   - Best performance (no NAT overhead)                                      │
│   - Best for: Performance-critical apps, network tools                     │
│                                                                              │
│   NONE                                                                      │
│   ────                                                                      │
│   - Container has only loopback interface                                   │
│   - Complete network isolation                                              │
│   - Best for: Security-sensitive apps, batch jobs                          │
│                                                                              │
│   CONTAINER:<id>                                                            │
│   ─────────────                                                             │
│   - Share network namespace with another container                          │
│   - Containers see same interfaces, ports, etc.                            │
│   - Best for: Sidecar patterns (like Kubernetes pods)                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
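The four modes above could be represented in the runtime by a small value type parsed from a command-line option. The sketch below assumes a Docker-like syntax ("bridge", "host", "none", "container:<id>"). The class `NetworkMode` and its methods are hypothetical and are not part of the code built in this chapter.

```java
import java.util.Locale;

/** Hypothetical value type for a network mode option; names are illustrative. */
final class NetworkMode {
    enum Kind { BRIDGE, HOST, NONE, CONTAINER }

    final Kind kind;
    final String targetContainerId; // set only for CONTAINER, otherwise null

    private NetworkMode(Kind kind, String targetContainerId) {
        this.kind = kind;
        this.targetContainerId = targetContainerId;
    }

    /** Parses a mode string; null or empty falls back to the default bridge mode. */
    static NetworkMode parse(String spec) {
        if (spec == null || spec.isEmpty()) {
            return new NetworkMode(Kind.BRIDGE, null);
        }
        String prefix = "container:";
        if (spec.toLowerCase(Locale.ROOT).startsWith(prefix)) {
            String id = spec.substring(prefix.length());
            if (id.isEmpty()) {
                throw new IllegalArgumentException("Missing container ID in: " + spec);
            }
            return new NetworkMode(Kind.CONTAINER, id);
        }
        switch (spec.toLowerCase(Locale.ROOT)) {
            case "bridge": return new NetworkMode(Kind.BRIDGE, null);
            case "host":   return new NetworkMode(Kind.HOST, null);
            case "none":   return new NetworkMode(Kind.NONE, null);
            default:
                throw new IllegalArgumentException("Unknown network mode: " + spec);
        }
    }

    /** BRIDGE and NONE get a fresh namespace; HOST and CONTAINER reuse an existing one. */
    boolean createsNetworkNamespace() {
        return kind == Kind.BRIDGE || kind == Kind.NONE;
    }

    /** Only bridge mode needs a veth pair attached to the bridge. */
    boolean needsVethPair() {
        return kind == Kind.BRIDGE;
    }
}
```

The two boolean helpers capture the differences listed in the diagram: none mode still creates a namespace but leaves it with loopback only, while host and container modes skip namespace creation entirely.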

Exercises

Add host networking mode:
// Skip network namespace creation
// Container uses host's network stack directly
// No veth pairs, no bridge attachment
// Performance benefit but no isolation

Add basic network policies:
// Block traffic between specific containers
// Use iptables FORWARD chain with specific rules
// Example: Container A can reach B but not C

Extend port forwarding for UDP:
// Add iptables rules for UDP protocol
// iptables -t nat -A PREROUTING -p udp ...
// Test with DNS (port 53 UDP)

Key Takeaways

Veth Pairs

Virtual ethernet pairs connect container to host network

Bridge

Virtual switch connects multiple containers together

NAT

iptables MASQUERADE enables outbound connectivity

Port Forwarding

DNAT rules expose container ports on host

What’s Next?

In Chapter 5: Images, we’ll implement:
  • OCI image format
  • Image layers and manifests
  • Pulling images from registries
  • Building images

Interview Deep-Dive

Strong Answer:
  • The application inside the container sends a packet to its default gateway (the bridge IP, e.g., 172.17.0.1) via its eth0 interface, which is actually one end of a veth pair. The packet travels through the veth pipe to the host-side interface, which is attached to the docker0 bridge.
  • The bridge acts as a Layer 2 switch and forwards frames by MAC address. Because the destination IP is outside the bridge subnet, the container addresses the frame to its gateway’s MAC address, so the bridge delivers it to the bridge interface itself (172.17.0.1), which is a Layer 3 endpoint on the host.
  • The host kernel’s routing table selects the host’s physical interface (e.g., eth0) as the way out. Just before the packet leaves, it passes through the iptables POSTROUTING chain, where a MASQUERADE rule rewrites the source IP from the container’s private IP (172.17.0.2) to the address of the outgoing interface (the host’s IP). This is Source NAT (SNAT). Without it, replies would be addressed to a private IP that routers outside the host cannot reach.
  • The response packet arrives at the host’s physical interface, and the conntrack module (which tracks connection state) recognizes it belongs to the NATed connection. It rewrites the destination IP back to the container’s private IP and forwards it through the bridge and veth pair back to the container.
  • The key insight is that this is the same NAT mechanism that home routers use. Every container behind a bridge is analogous to every device behind your home router — they share a public IP and rely on connection tracking to demultiplex return traffic.
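The SNAT and connection-tracking steps described above can be sketched as a toy translation table. This is a deliberately simplified model: real conntrack keys on the full 5-tuple and usually tries to keep the original source port. The class `MasqueradeTable`, the starting port 40000, and the "ip:port" string keys are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of MASQUERADE plus connection tracking (hypothetical, for illustration only). */
final class MasqueradeTable {
    private final String hostIp;
    private int nextHostPort = 40000; // arbitrary starting port for the sketch

    // "containerIp:port" -> host port chosen for that flow
    private final Map<String, Integer> outbound = new HashMap<>();
    // host port -> "containerIp:port", used to route replies back
    private final Map<Integer, String> inbound = new HashMap<>();

    MasqueradeTable(String hostIp) {
        this.hostIp = hostIp;
    }

    /** Rewrites the source of an outgoing packet and remembers the mapping. */
    String translateOutbound(String containerIp, int sourcePort) {
        String key = containerIp + ":" + sourcePort;
        Integer hostPort = outbound.get(key);
        if (hostPort == null) {
            hostPort = nextHostPort++;
            outbound.put(key, hostPort);
            inbound.put(hostPort, key);
        }
        return hostIp + ":" + hostPort;
    }

    /** Finds the container endpoint a reply belongs to, or null if no flow is tracked. */
    String translateReply(int hostPort) {
        return inbound.get(hostPort);
    }
}
```

Two containers that use the same source port still end up with distinct host ports, which is how return traffic is demultiplexed behind a single host IP.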
Follow-up: What happens when two containers on the same bridge communicate? Does traffic leave the host?
No. The bridge handles it entirely in kernel space. Container A sends a packet to Container B’s IP. The bridge learns Container B’s MAC address from its veth port and forwards the frame directly, so it never touches the host’s physical interface or the NAT rules. This is pure Layer 2 switching, which is why same-bridge container communication has extremely low latency (sub-millisecond). However, when the br_netfilter module is loaded (net.bridge.bridge-nf-call-iptables=1), bridged frames are also passed through the iptables FORWARD chain, so its rules must allow this traffic. That is why Docker adds iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT during bridge setup.
Strong Answer:
  • Port forwarding uses iptables DNAT (Destination NAT) rules in the PREROUTING chain. When you run docker run -p 8080:80, Docker adds a rule: packets arriving at host port 8080 get their destination rewritten to the container’s IP at port 80. The OUTPUT chain gets a similar rule for traffic originating from the host itself.
  • The performance cost is real but often overstated. Each packet traverses the iptables NAT table (PREROUTING or OUTPUT), gets its header rewritten, passes through the bridge, and traverses the veth pair. This adds roughly 5-15 microseconds of latency per packet compared to host networking. For most web applications, this is negligible. For high-throughput services processing millions of packets per second, the overhead accumulates.
  • Host networking (--network host) eliminates all of this. The container shares the host’s network namespace directly — no veth pairs, no bridge, no NAT. The container binds to the host’s actual network interfaces. Latency is identical to bare metal. The trade-off is zero network isolation: the container can see and bind to any host port, and port conflicts between containers become your problem.
  • In Kubernetes, this maps to the hostNetwork: true pod spec, which is commonly used for CNI plugins, ingress controllers, and monitoring agents that need direct access to the host’s network stack. For application pods, Kubernetes uses CNI plugins that implement pod networking through veth pairs and overlay networks, similar to Docker’s bridge model but cluster-wide.
Follow-up: At massive scale with thousands of containers, what networking problems emerge that you would not see at small scale?
The biggest issue is iptables rule explosion. Docker adds several iptables rules per container (NAT, filter, and sometimes mangle). With thousands of containers, the iptables rule set becomes enormous, and rule evaluation is linear: every packet is checked against each rule in turn until one matches. This can add milliseconds of latency per packet. The solution is IPVS (IP Virtual Server), which uses hash tables instead of linear rule chains, or eBPF-based solutions like Cilium that bypass iptables entirely. The second issue is ARP table exhaustion: thousands of containers on a single bridge generate thousands of ARP entries, which can overflow the kernel’s neighbor table (tuned via net.ipv4.neigh.default.gc_thresh3). The third is bridge MAC table overflow, causing the bridge to flood packets to all ports instead of switching to the correct one.
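The difference between a linear rule chain and a hash-based lookup can be illustrated with a small model. This is not how iptables or IPVS are implemented internally; it only counts how many rules a lookup has to examine. The class `RuleLookup`, the port range starting at 30000, and the generated target addresses are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of linear rule matching and hash lookup (hypothetical model). */
final class RuleLookup {

    /** A simplified DNAT rule: match on destination port, rewrite to target. */
    static final class Rule {
        final int dport;
        final String target;

        Rule(int dport, String target) {
            this.dport = dport;
            this.target = target;
        }
    }

    /** Generates one rule per published port, starting at port 30000. */
    static List<Rule> generate(int count) {
        List<Rule> rules = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rules.add(new Rule(30000 + i, "10.0." + (i / 256) + "." + (i % 256) + ":80"));
        }
        return rules;
    }

    /** Walks the chain in order and returns how many rules were examined. */
    static int linearExamined(List<Rule> chain, int dport) {
        int examined = 0;
        for (Rule rule : chain) {
            examined++;
            if (rule.dport == dport) {
                break;
            }
        }
        return examined;
    }

    /** Builds a hash index so a lookup cost does not grow with the number of rules. */
    static Map<Integer, String> index(List<Rule> chain) {
        Map<Integer, String> byPort = new HashMap<>();
        for (Rule rule : chain) {
            byPort.put(rule.dport, rule.target);
        }
        return byPort;
    }
}
```

With 5000 generated rules, a packet for the last published port walks the entire chain in the linear model, while the indexed lookup is a single hash probe regardless of how many rules exist.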
Strong Answer:
  • Docker’s bridge model gives each container a private IP within a single host. Containers on different hosts cannot communicate by IP without additional overlay networking. Port forwarding is required for external access, and container IPs are ephemeral and host-scoped.
  • Kubernetes mandates a flat networking model: every pod gets a cluster-routable IP, and every pod can reach every other pod by IP without NAT. This eliminates the port-mapping complexity entirely — a service listening on port 80 in a pod is accessible at that pod’s IP on port 80 from anywhere in the cluster.
  • The problem this solves is service discovery and load balancing at scale. In Docker’s model, you need to track which host a container is on and which port it is mapped to. In Kubernetes, you address pods by IP (or more commonly, by Service DNS name), and the networking layer handles the rest. This is what makes horizontal scaling trivial — spin up more pods, and the Service load balancer automatically includes them.
  • The implementation varies by CNI plugin: Calico uses BGP to distribute pod routes across nodes; Flannel uses VXLAN overlays; Cilium uses eBPF for packet forwarding. Each has different performance characteristics, but they all provide the same flat-network abstraction.
Follow-up: What is the overhead of overlay networks like VXLAN compared to native routing?
VXLAN encapsulates Layer 2 frames inside UDP packets, adding 50 bytes of header overhead per packet and requiring encapsulation and decapsulation at the tunnel endpoints (the sending and receiving nodes). This adds roughly 10-20% throughput reduction and 100-200 microseconds of latency compared to native routing. Calico with BGP avoids this by distributing routes natively: packets are routed at Layer 3 without encapsulation, achieving near-bare-metal performance. The trade-off is that BGP may require network infrastructure support (for example, routers that peer with the nodes when they span multiple subnets), while VXLAN works on any IP network. For latency-sensitive applications, the choice of CNI plugin can be as impactful as the choice of programming language.
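The 50-byte figure can be reproduced with simple arithmetic, assuming an IPv4 underlay with no VLAN tags and ignoring Ethernet preamble and FCS. The class `VxlanOverhead` and its method names are hypothetical.

```java
/** Header arithmetic for VXLAN over IPv4 (a sketch under the assumptions stated above). */
final class VxlanOverhead {
    static final int OUTER_ETHERNET = 14;
    static final int OUTER_IPV4 = 20;
    static final int OUTER_UDP = 8;
    static final int VXLAN_HEADER = 8;
    static final int INNER_ETHERNET = 14;

    /** Extra bytes on the wire for every encapsulated frame. */
    static int perPacketOverhead() {
        return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER;
    }

    /**
     * Largest inner IP packet that fits without fragmentation. The outer IP packet
     * must fit in the underlay MTU and carries the UDP header, the VXLAN header,
     * and the inner Ethernet header in addition to the inner IP packet.
     */
    static int innerMtu(int underlayMtu) {
        return underlayMtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET);
    }

    /** Extra wire bytes relative to sending the same IP packet in a plain Ethernet frame. */
    static double extraWireFraction(int innerIpPacketBytes) {
        int nativeFrame = INNER_ETHERNET + innerIpPacketBytes;
        return (double) perPacketOverhead() / nativeFrame;
    }
}
```

On a standard 1500-byte underlay this leaves 1450 bytes for the inner IP packet, which is why overlay interfaces are commonly configured with an MTU of 1450. The relative cost is only a few percent for full-size packets but grows quickly for small ones, so header bytes alone do not account for the whole throughput reduction quoted above.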