Linux Namespaces - The foundation of container isolation

Namespaces Deep Dive

Linux namespaces are the core technology enabling container isolation. Understanding them deeply is essential for infrastructure engineers working with Docker, Kubernetes, or any other container-based system.

Interview Frequency: Very High (especially at infrastructure companies)
Key Topics: Namespace types, creation mechanisms, container implementation
Time to Master: 12-14 hours

What Are Namespaces?

Namespaces partition kernel resources so that processes see isolated views of the system.

Namespace Types

Namespace	Flag	What it Isolates	Kernel Version
Mount	`CLONE_NEWNS`	Mount points	2.4.19 (2002)
UTS	`CLONE_NEWUTS`	Hostname, domain name	2.6.19 (2006)
IPC	`CLONE_NEWIPC`	System V IPC objects, POSIX message queues	2.6.19 (2006)
PID	`CLONE_NEWPID`	Process IDs	2.6.24 (2008)
Network	`CLONE_NEWNET`	Network stack	2.6.29 (2009)
User	`CLONE_NEWUSER`	User and group IDs	3.8 (2013)
Cgroup	`CLONE_NEWCGROUP`	Cgroup root	4.6 (2016)
Time	`CLONE_NEWTIME`	System clocks	5.6 (2020)
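
The "Kernel Version" column can be turned into a quick availability check. The sketch below is only an illustration: the function names `version_ge` and `namespaces_for_kernel` are made up for this example, and the version numbers are copied from the table above rather than queried from the kernel.

```shell
# version_ge A B: succeed if dotted version A >= B (first three numeric fields)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }' </dev/null
}

# namespaces_for_kernel VERSION: print the namespace types that the table
# above lists as available in that kernel version
namespaces_for_kernel() {
    kver=$1
    out=""
    while read -r name min; do
        if version_ge "$kver" "$min"; then
            out="$out${out:+ }$name"
        fi
    done <<'EOF'
mnt 2.4.19
uts 2.6.19
ipc 2.6.19
pid 2.6.24
net 2.6.29
user 3.8
cgroup 4.6
time 5.6
EOF
    printf '%s\n' "$out"
}
```

For the running kernel, `namespaces_for_kernel "$(uname -r)"` gives the list. Distribution kernels may backport or disable features, so the entries under `/proc/self/ns/` remain the authoritative answer.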

PID Namespace

Each PID namespace has its own PID numbering, starting from 1.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         PID NAMESPACE HIERARCHY                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Root PID Namespace (Host)                                                  │
│   ┌────────────────────────────────────────────────────────────────────┐    │
│   │  PID 1: systemd                                                     │    │
│   │  PID 100: sshd                                                      │    │
│   │  PID 5000: containerd                                               │    │
│   │  PID 5001: container init ──────────────────────────────────────┐  │    │
│   │  PID 5002: container app                                        │  │    │
│   │  PID 5050: container init ────────────────────────────────┐     │  │    │
│   │  PID 5051: container app                                  │     │  │    │
│   └───────────────────────────────────────────────────────────│─────│──┘    │
│                                                               │     │        │
│   Container B PID Namespace                     Container A PID Namespace   │
│   ┌────────────────────────────┐              ┌────────────────────────────┐│
│   │  PID 1: /bin/sh (5050)     │              │  PID 1: /bin/sh (5001)     ││
│   │  PID 2: app (5051)         │              │  PID 2: app (5002)         ││
│   │                            │              │                            ││
│   │  Sees only PIDs 1, 2       │              │  Sees only PIDs 1, 2       ││
│   │  Cannot see host PIDs      │              │  Cannot see host PIDs      ││
│   └────────────────────────────┘              └────────────────────────────┘│
│                                                                              │
│   Key Properties:                                                            │
│   - Nested hierarchy (parent can see child PIDs)                            │
│   - PID 1 in namespace is "init" (reaps orphans)                            │
│   - Signals from parent namespace use host PIDs                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

PID 1 Responsibilities

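// PID 1 in a container must:
// 1. Reap zombie processes
// 2. Forward signals appropriately
// 3. Not exit (or the container dies)

// Simple init process (reaps children; signal forwarding is omitted for brevity)
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void sigchld_handler(int sig) {
    (void)sig;
    int saved_errno = errno;  // waitpid() may overwrite errno
    // Reap every child that has already exited, without blocking
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

int main(int argc, char **argv) {
    (void)argc;
    signal(SIGCHLD, sigchld_handler);

    // Fork the actual application
    pid_t child = fork();
    if (child == -1) {
        perror("fork");
        return 1;
    }
    if (child == 0) {
        execv("/app", argv);
        perror("execv");  // only reached if execv fails
        _exit(127);
    }

    // Wait forever, reaping children
    while (1) {
        pause();
    }
}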
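
The init above covers reaping but not responsibility 2, signal forwarding. As a rough sketch of that missing piece, written in shell for brevity, the hypothetical `run_with_forwarding` function below starts a command, relays SIGTERM to it, and returns the command's exit status. The function name is illustrative and not part of any tool.

```shell
# run_with_forwarding CMD [ARGS...]
# Start CMD in the background, forward SIGTERM to it, and return its status.
run_with_forwarding() {
    "$@" &
    child=$!
    trap 'kill -TERM "$child" 2>/dev/null' TERM

    # wait returns early (status > 128) when a trapped signal arrives,
    # so keep waiting until the child is really gone
    while :; do
        wait "$child"
        status=$?
        kill -0 "$child" 2>/dev/null || break
    done

    trap - TERM
    return "$status"
}
```

Used as a container entrypoint, the last line of the script would be something like `run_with_forwarding /app`, so that a stop request sent to PID 1 reaches the application instead of being silently dropped.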

PID Namespace Demo

# Create new PID namespace
sudo unshare --pid --fork --mount-proc bash

# Inside new namespace:
ps aux  # Only shows processes in this namespace
echo $$  # PID is 1!

# View from host:
# The bash process has a different PID in host namespace
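
To see the "different PID on the host" concretely, kernels from Linux 4.1 onward list a process's PID in every nested PID namespace on the `NSpid:` line of `/proc/<pid>/status`. The following is a minimal sketch that parses that line; the function names are illustrative.

```shell
# Read a /proc/<pid>/status file on stdin and print the PIDs from its
# NSpid line: outermost namespace first, innermost last.
nspid_list() {
    awk '/^NSpid:/ {
        for (i = 2; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "\n")
    }'
}

# Print only the innermost PID (the one the process sees for itself).
nspid_innermost() {
    awk '/^NSpid:/ { print $NF }'
}
```

On the host, `nspid_list < /proc/<host-pid>/status` for the unshared bash would print two numbers: its host PID followed by `1`.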

Network Namespace

Isolates the entire network stack: interfaces, routing, iptables, sockets.

┌─────────────────────────────────────────────────────────────────────────────┐
│                        NETWORK NAMESPACE ISOLATION                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Host Network Namespace                                                     │
│   ┌────────────────────────────────────────────────────────────────────┐    │
│   │                                                                     │    │
│   │  Physical Interface: eth0 (192.168.1.100)                          │    │
│   │  Bridge: docker0 (172.17.0.1)                                      │    │
│   │  Routes: default via 192.168.1.1                                   │    │
│   │  iptables: NAT rules for containers                                │    │
│   │                                                                     │    │
│   │  ┌──────────────────┐                                              │    │
│   │  │  veth-host-a     │ ←─────────────────────────────────┐          │    │
│   │  └──────────────────┘                                   │          │    │
│   │  ┌──────────────────┐                                   │          │    │
│   │  │  veth-host-b     │ ←─────────────────────────────┐   │          │    │
│   │  └──────────────────┘                               │   │          │    │
│   │                                                      │   │          │    │
│   └──────────────────────────────────────────────────────│───│──────────┘    │
│                                                          │   │               │
│   Container A Network NS                    Container B Network NS           │
│   ┌────────────────────────────┐          ┌────────────────────────────┐    │
│   │                            │          │                            │    │
│   │  Interface: eth0           │ ←──(veth pair)──→ veth-host-a        │    │
│   │  IP: 172.17.0.2           │          │  Interface: eth0           │    │
│   │  Routes: default via      │          │  IP: 172.17.0.3           │    │
│   │          172.17.0.1       │          │  Routes: default via      │    │
│   │  Loopback: lo             │          │          172.17.0.1       │    │
│   │                            │          │                            │    │
│   │  Own iptables rules        │          │  Own iptables rules        │    │
│   │  Own routing table         │          │  Own routing table         │    │
│   │  Own sockets               │          │  Own sockets               │    │
│   │                            │          │                            │    │
│   └────────────────────────────┘          └────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Creating Network Namespaces

# Create network namespace
sudo ip netns add container1

# List network namespaces
ip netns list

# Execute command in namespace
sudo ip netns exec container1 ip addr

# Create veth pair
sudo ip link add veth-host type veth peer name veth-container

# Move one end to container namespace
sudo ip link set veth-container netns container1

# Configure interfaces
sudo ip addr add 10.0.0.1/24 dev veth-host
sudo ip link set veth-host up

sudo ip netns exec container1 ip addr add 10.0.0.2/24 dev veth-container
sudo ip netns exec container1 ip link set veth-container up
sudo ip netns exec container1 ip link set lo up

# Test connectivity
sudo ip netns exec container1 ping -c 3 10.0.0.1

# Enable NAT for internet access (the host must also forward IPv4 packets)
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
sudo ip netns exec container1 ip route add default via 10.0.0.1
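
The walkthrough above leaves a namespace and a NAT rule behind. As a hedged sketch of the teardown, the hypothetical helper `netns_cleanup_plan` below only prints the commands, so they can be reviewed before being piped to a root shell. Deleting the namespace also destroys the veth pair, because a veth device is removed together with its peer.

```shell
# netns_cleanup_plan NAME SUBNET
# Print (do not run) the commands that undo the setup above.
netns_cleanup_plan() {
    # Removing the namespace destroys veth-container and, with it, veth-host
    printf 'ip netns del %s\n' "$1"
    # -D deletes the rule that matches the specification used with -A
    printf 'iptables -t nat -D POSTROUTING -s %s -j MASQUERADE\n' "$2"
}
```

Typical use would be `netns_cleanup_plan container1 10.0.0.0/24 | sudo sh`.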

Mount Namespace

Isolates mount points - each namespace can have different filesystem views.

┌─────────────────────────────────────────────────────────────────────────────┐
│                        MOUNT NAMESPACE ISOLATION                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Host Mount Namespace                                                       │
│   /                                                                          │
│   ├── bin/                                                                   │
│   ├── etc/                                                                   │
│   ├── home/                                                                  │
│   ├── var/                                                                   │
│   │   └── lib/                                                              │
│   │       └── docker/                                                       │
│   │           └── overlay2/                                                 │
│   └── ...                                                                   │
│                                                                              │
│   Container Mount Namespace (via overlay filesystem)                         │
│   /                                  ← overlay mount (merged view)           │
│   ├── bin/ → ubuntu:latest          ← image layer (read-only)              │
│   ├── etc/ → merged                 ← overlay of layers                     │
│   ├── var/ → container layer        ← writable layer                        │
│   ├── proc → new procfs             ← container-specific                    │
│   ├── sys → new sysfs               ← container-specific                    │
│   ├── dev → device subset           ← limited devices                       │
│   └── app/ → bind mount             ← volume from host                      │
│                                                                              │
│   Mount Propagation:                                                         │
│   - private: mounts not visible outside namespace                           │
│   - shared: mounts propagate to peer namespaces                             │
│   - slave: receives but doesn't send mount events                           │
│   - unbindable: cannot be bind mounted                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
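
The propagation type of each mount can be read from the optional fields of `/proc/self/mountinfo`: `shared:N` marks a shared mount, `master:N` a slave, `unbindable` an unbindable one, and a mount with none of these is private. Here is a small sketch of a classifier (the function name is illustrative):

```shell
# Read mountinfo lines on stdin and print "<mount point> <propagation>".
# Optional fields start at field 7 and end at the "-" separator.
mount_propagation() {
    awk '{
        prop = ""
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         prop = prop (prop ? "," : "") "shared"
            else if ($i ~ /^master:/)    prop = prop (prop ? "," : "") "slave"
            else if ($i == "unbindable") prop = prop (prop ? "," : "") "unbindable"
        }
        if (prop == "") prop = "private"
        print $5, prop
    }'
}
```

Running `mount_propagation < /proc/self/mountinfo` before and after `mount --make-rprivate /` inside a new mount namespace shows the entries change from `shared` to `private`.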

Container Root Filesystem

# Create container-like filesystem isolation
sudo unshare --mount --fork bash

# Make mounts private (don't leak to host)
mount --make-rprivate /

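# Create new root (.oldroot will hold the old root during pivot_root)
mkdir -p /tmp/newroot/{bin,lib,lib64,proc,sys,dev,.oldroot}

# pivot_root requires the new root to be a mount point
mount --bind /tmp/newroot /tmp/newroot

# Copy busybox for a minimal root (it must be statically linked, since no libraries are copied)
cp /bin/busybox /tmp/newroot/bin/

# Mount special filesystems
mount -t proc proc /tmp/newroot/proc
mount -t sysfs sys /tmp/newroot/sys

# Change root, then detach the old root using the new root's busybox
cd /tmp/newroot
pivot_root . .oldroot
/bin/busybox umount -l /.oldroot

# Now we're in new root
/bin/busybox sh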

User Namespace

Maps user/group IDs between namespace and host. Enables rootless containers.

┌─────────────────────────────────────────────────────────────────────────────┐
│                        USER NAMESPACE MAPPING                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Host                              Container                                │
│   ┌────────────────────────┐       ┌────────────────────────┐               │
│   │                        │       │                        │               │
│   │  UID 0: root           │       │  UID 0: root ──────────│──┐            │
│   │  UID 1000: alice ──────│───────│→ UID 0 (mapped!)       │  │            │
│   │  UID 1001: bob         │       │  UID 1: app            │  │            │
│   │  ...                   │       │  ...                   │  │            │
│   │                        │       │                        │  │            │
│   │  GID 1000: alice ──────│───────│→ GID 0                 │  │            │
│   │                        │       │                        │  │            │
│   └────────────────────────┘       └────────────────────────┘  │            │
│                                                                 │            │
│   Mapping file (/proc/PID/uid_map):                            │            │
│   ┌─────────────────────────────────────────────────────────┐  │            │
│   │  0 1000 1     ← Container UID 0 = Host UID 1000        │  │            │
│   │  1 100000 65536  ← Container 1-65536 = Host 100000+    │──┘            │
│   └─────────────────────────────────────────────────────────┘               │
│                                                                              │
│   Effect:                                                                    │
│   - "root" in container is unprivileged user on host                        │
│   - Files created with UID 0 appear as UID 1000 on host                    │
│   - Enables rootless containers (Podman, Docker rootless)                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
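
Each line of a `uid_map` file has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The translation in the diagram can be reproduced with a short sketch; `map_uid` is a hypothetical helper name.

```shell
# map_uid ID: read uid_map-style lines ("inside outside length") on stdin
# and print the outside ID that the inside ID maps to.
# Exits with status 1 if the ID is not covered by any range.
map_uid() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

With the two mapping lines from the diagram on stdin, `map_uid 0` prints `1000` and `map_uid 5` prints `100004`. Inside a user namespace, `map_uid 0 < /proc/self/uid_map` shows which host UID "root" really is.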

User Namespace Demo

# Create user namespace (as unprivileged user)
unshare --user --map-root-user bash

# Check identity
id  # Shows uid=0(root) gid=0(root)
whoami  # Shows root

# But we're not really root on the host
cat /proc/self/uid_map
# 0  1000  1  (container 0 = host 1000)

# Try to access host files
cat /etc/shadow  # Permission denied (not real root)

Rootless Containers

# Run rootless container with Podman
podman run -it --rm alpine sh

# Inside container: appears as root
id  # uid=0(root)

# On host: process runs as your user
ps aux | grep alpine  # Shows your username, not root

Creating Namespaces

System Calls

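#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Method 1: clone() with namespace flags
int child_fn(void *arg) {
    (void)arg;
    // New process in new namespace(s)
    char *const sh_argv[] = {"/bin/sh", NULL};
    execv("/bin/sh", sh_argv);
    perror("execv");  // only reached if execv fails
    return 1;
}

int main() {
    // Child stack; 8 KB is too small for most programs, so use 1 MB
    static char stack[1024 * 1024];

    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | SIGCHLD;

    // The stack grows down, so pass its top; needs CAP_SYS_ADMIN (run as root)
    pid_t child = clone(child_fn, stack + sizeof(stack), flags, NULL);
    if (child == -1) {
        perror("clone");
        return 1;
    }
    waitpid(child, NULL, 0);

    return 0;
}

// Method 2: unshare() - move current process to new namespace
// (CLONE_NEWPID is the exception: the caller keeps its PID namespace
// and only the children it creates afterwards enter the new one)
if (unshare(CLONE_NEWPID | CLONE_NEWNS) == -1) {
    perror("unshare");
}

// Method 3: setns() - join existing namespace
int fd = open("/proc/1234/ns/net", O_RDONLY);
if (fd == -1) {
    perror("open");
} else {
    if (setns(fd, CLONE_NEWNET) == -1) {
        perror("setns");
    }
    close(fd);
}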

Namespace Files

# View namespace of a process
ls -la /proc/$$/ns/
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 ipc -> 'ipc:[4026531839]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 mnt -> 'mnt:[4026531840]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 net -> 'net:[4026531992]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 pid -> 'pid:[4026531836]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 user -> 'user:[4026531837]'
# lrwxrwxrwx 1 user user 0 Nov 29 10:00 uts -> 'uts:[4026531838]'

# Compare namespaces
readlink /proc/$$/ns/net
readlink /proc/1/ns/net  # Different if in different namespace

# Enter namespace (requires root)
sudo nsenter --target 1234 --net --pid bash
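
The number inside the brackets of each link is the namespace's inode number, and two processes share a namespace exactly when their links match. A small sketch of that comparison follows; both function names are made up for illustration.

```shell
# ns_inode 'net:[4026531992]' -> 4026531992
# Extract the inode number from a namespace link target.
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9][0-9]*\)]$/\1/p'
}

# same_namespace PID1 PID2 TYPE
# Succeed if both processes are in the same namespace of the given type
# (for example net, pid, mnt). Returns 2 if a link cannot be read.
same_namespace() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_namespace $$ 1 net && echo shared` reports whether the current shell shares the network namespace of PID 1. Reading another user's `/proc/<pid>/ns` entries generally needs root.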

Container Implementation

How Docker/runc actually creates containers:

┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONTAINER CREATION FLOW                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   docker run ubuntu bash                                                     │
│        │                                                                     │
│        ▼                                                                     │
│   1. Docker daemon receives request                                          │
│        │                                                                     │
│        ▼                                                                     │
│   2. containerd prepares container                                           │
│      - Pull image if needed                                                  │
│      - Prepare overlay filesystem                                            │
│      - Generate OCI runtime spec                                            │
│        │                                                                     │
│        ▼                                                                     │
│   3. runc creates container:                                                 │
│      a. Create namespaces:                                                   │
│         clone(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | ...)             │
│                                                                              │
│      b. Set up cgroups:                                                      │
│         mkdir /sys/fs/cgroup/mycgroup                                       │
│         echo limits > memory.max, cpu.max, etc.                             │
│                                                                              │
│      c. Set up rootfs:                                                       │
│         mount("overlay", "/", ...)                                          │
│         pivot_root(newroot, oldroot)                                        │
│                                                                              │
│      d. Set up networking:                                                   │
│         ip link add veth pair                                               │
│         ip link set to namespace                                            │
│                                                                              │
│      e. Apply security:                                                      │
│         seccomp(filter)                                                      │
│         drop capabilities                                                    │
│         apply AppArmor/SELinux                                              │
│                                                                              │
│      f. Execute entrypoint:                                                  │
│         execve("/bin/bash", ...)                                            │
│        │                                                                     │
│        ▼                                                                     │
│   4. Container running!                                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Minimal Container Runtime

// minimal_container.c - Educational container runtime
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char child_stack[STACK_SIZE];

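int child_fn(void *arg) {
    char **argv = (char **)arg;

    // Set hostname (only affects the new UTS namespace)
    if (sethostname("container", 9) == -1) {
        perror("sethostname");
        return 1;
    }

    // Make all mounts private so the proc mount below does not
    // propagate back to the host (systemd mounts / as shared)
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
        perror("mount private");
        return 1;
    }

    // Mount proc so tools like ps see the new PID namespace
    if (mount("proc", "/proc", "proc", 0, NULL) == -1) {
        perror("mount proc");
        return 1;
    }

    // Execute command
    execvp(argv[0], argv);
    perror("execvp");
    return 1;
}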

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <command>\n", argv[0]);
        return 1;
    }
    
    int flags = CLONE_NEWPID | CLONE_NEWUTS | CLONE_NEWNS | SIGCHLD;
    
    pid_t child = clone(
        child_fn,
        child_stack + STACK_SIZE,
        flags,
        &argv[1]
    );
    
    if (child == -1) {
        perror("clone");
        return 1;
    }
    
    waitpid(child, NULL, 0);
    return 0;
}

Lab Exercises

Lab 1: PID Namespace Exploration

Objective: Understand PID namespace hierarchy

# Create nested PID namespaces
# (without --mount-proc, ps still reads the host's /proc and lists host processes)
sudo unshare --pid --fork bash -c '
    echo "Level 1 PID: $$"
    unshare --pid --fork bash -c "
        echo \"Level 2 PID: \$$\"
        ps aux
        sleep infinity
    " &
    ps aux
    wait
'

# View from host (run in a second terminal while sleep is still running)
ps aux | grep sleep
# Shows the host PID (not the PID seen inside the namespace)

# Check namespace relationship
sudo ls -la /proc/<outer-pid>/ns/
sudo ls -la /proc/<inner-pid>/ns/

Lab 2: Network Namespace Networking

Objective: Build container networking from scratch

# Create two "containers" that can communicate

# Create namespaces
sudo ip netns add container1
sudo ip netns add container2

# Create bridge
sudo ip link add br0 type bridge
sudo ip addr add 10.0.0.1/24 dev br0
sudo ip link set br0 up

# Create veth pairs
sudo ip link add veth1 type veth peer name veth1-br
sudo ip link add veth2 type veth peer name veth2-br

# Move to namespaces
sudo ip link set veth1 netns container1
sudo ip link set veth2 netns container2

# Connect to bridge
sudo ip link set veth1-br master br0
sudo ip link set veth2-br master br0
sudo ip link set veth1-br up
sudo ip link set veth2-br up

# Configure container1
sudo ip netns exec container1 ip addr add 10.0.0.2/24 dev veth1
sudo ip netns exec container1 ip link set veth1 up
sudo ip netns exec container1 ip link set lo up

# Configure container2
sudo ip netns exec container2 ip addr add 10.0.0.3/24 dev veth2
sudo ip netns exec container2 ip link set veth2 up
sudo ip netns exec container2 ip link set lo up

# Test connectivity
sudo ip netns exec container1 ping -c 3 10.0.0.3

# Cleanup
sudo ip netns del container1
sudo ip netns del container2
sudo ip link del br0

Lab 3: Build a Minimal Container

Objective: Create container-like isolation

#!/bin/bash
# mini-container.sh

set -e

ROOTFS="/tmp/container-root"
CONTAINER_NAME="mini-container"

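# Create rootfs using busybox (debootstrap would also work)
mkdir -p "$ROOTFS"
if ! [ -f "$ROOTFS/bin/sh" ]; then
    # Use busybox for minimal root (must be statically linked; no libraries are copied)
    mkdir -p "$ROOTFS"/{bin,proc,sys,dev,tmp}
    cp /bin/busybox "$ROOTFS/bin/"
    # umount and rmdir are needed after pivot_root, when only the new root is reachable
    for cmd in sh ls ps cat echo mount umount rmdir; do
        ln -sf busybox "$ROOTFS/bin/$cmd"
    done
fi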

# Run container
sudo unshare \
    --mount \
    --uts \
    --ipc \
    --pid \
    --fork \
    /bin/bash -c "
        # Set hostname
        hostname $CONTAINER_NAME
        
        # pivot_root needs the new root to be a mount point
        mount --bind $ROOTFS $ROOTFS

        # Mount proc and sys
        mount -t proc proc $ROOTFS/proc
        mount -t sysfs sys $ROOTFS/sys
        
        # Change root
        cd $ROOTFS
        mkdir -p .oldroot
        pivot_root . .oldroot
        
        # Unmount old root
        umount -l /.oldroot
        rmdir /.oldroot
        
        # Run shell
        exec /bin/sh
    "

Interview Questions

Q1: How do containers differ from VMs in terms of isolation?

Answer:

Aspect	Containers	VMs
Kernel	Shared with host	Separate kernel per VM
Isolation	Namespaces + cgroups	Hardware virtualization
Overhead	~1-2%	~5-15%
Boot time	Milliseconds	Seconds to minutes
Isolation strength	Process-level	Hardware-level

Key differences:

Containers share kernel (same syscalls, same vulnerabilities)
VMs have complete kernel isolation
Container escape = host access; VM escape = hypervisor access
Containers are lighter but less isolated

When to use VMs: Multi-tenant, untrusted workloads, different kernel requirements

When to use containers: Single-tenant, microservices, CI/CD

Q2: Explain how rootless containers work

Answer:

Mechanism:

User namespace maps container root to unprivileged host user
No actual root privileges on host
Uses subordinate UIDs/GIDs from /etc/subuid, /etc/subgid

Implementation:

Host user alice (UID 1000)
Container root (UID 0) → mapped to host UID 1000 (alice herself)
Container UID 1 → mapped to host UID 100000 (first subordinate UID from /etc/subuid)

Benefits:

Container compromise = unprivileged access
No root daemon required
Better security posture

Limitations:

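Can’t bind to ports < 1024 by default (unless net.ipv4.ip_unprivileged_port_start is lowered)
Some syscalls restricted
Networking needs a user-mode stack such as slirp4netns, since an unprivileged user cannot create veth pairs on the host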
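
The subordinate ID ranges mentioned under "Mechanism" are stored one per line as `name:start:count`. As a quick sketch, a hypothetical helper `subid_range` can look up a user's range:

```shell
# subid_range USER: read /etc/subuid-style lines (name:start:count) on stdin
# and print "start count" for the first range assigned to USER.
# Exits with status 1 if the user has no entry.
subid_range() {
    awk -F: -v user="$1" '
        $1 == user { print $2, $3; found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

For the current user this would be `subid_range "$(id -un)" < /etc/subuid`. A missing entry is a common reason rootless containers fail to start.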

Q3: What happens to orphaned processes in a PID namespace?

Answer:

In regular Linux:

Orphaned process is adopted by PID 1 (init)
Init reaps zombie when process exits

In PID namespace:

Orphaned process adopted by namespace’s PID 1
If PID 1 doesn’t reap, zombies accumulate
If PID 1 exits, all processes in namespace killed

Docker behavior:

Use tini or the --init flag to run a proper init as PID 1
The init reaps zombies and forwards signals
Without an init: potential zombie accumulation

Why it matters:

Zombie processes consume PID table entries
Application might not handle SIGCHLD
Proper init is critical for long-running containers
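
Zombie accumulation is easy to check for, because zombies show a process state beginning with `Z`. This is a minimal sketch that assumes a procps-style `ps` supporting `-eo stat=`; the function name is illustrative.

```shell
# Read process state codes on stdin (one per line, as printed by
# `ps -eo stat=`) and print how many belong to zombie processes.
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

Inside a container, `ps -eo stat= | count_zombies` printing a number that keeps growing suggests that PID 1 is not reaping its children.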

Q4: How does Docker networking work under the hood?

Answer:

Bridge networking (default):

Create bridge: docker0 bridge interface on host
Per container:
- Create veth pair
- One end in container namespace (eth0)
- Other end connected to docker0
IP assignment: Docker’s IPAM assigns from subnet
Routing:
- Container default route via docker0
- Host NATs outgoing (iptables MASQUERADE)
- Port mapping via DNAT rules

iptables rules:

-A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE
-A DOCKER -p tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80

Host networking: No network namespace isolation; the container uses the host stack directly

None networking: Only loopback, no external connectivity

Key Takeaways

Namespace Types

8 namespace types isolate different kernel resources

PID 1 Matters

Container’s init process must reap zombies and handle signals

User Namespaces

Enable rootless containers by mapping UIDs

Network Isolation

veth pairs and bridges connect isolated network stacks
