Containers & Virtualization

Virtualization and containerization are foundational technologies for modern cloud computing. Understanding these concepts is essential for senior engineers designing scalable, isolated, and efficient systems.
Interview Frequency: Very High
Key Topics: Namespaces, cgroups, Docker internals, hypervisors
Time to Master: 12-15 hours

Virtualization Overview

[Figure: Virtualization Spectrum]

Linux Namespaces

Namespaces provide isolation of global system resources. Each namespace type isolates a different aspect of the system.
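
To make the per-type isolation concrete, the sketch below lists the namespace handles that the kernel exposes under /proc/<PID>/ns. Two processes share a namespace of a given type exactly when the inode number in the handle matches. The helper names `ns_inode` and `list_namespaces` are made up for this example.

```shell
#!/bin/sh
# Sketch: show which namespaces a process belongs to.
# ns_inode and list_namespaces are hypothetical helper names, not standard tools.

# Extract the inode number from a link target such as "pid:[4026531836]".
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9][0-9]*\)]$/\1/p'
}

# Print "<type> <inode>" for every namespace of the given PID (default: this shell).
list_namespaces() {
    pid="${1:-$$}"
    for link in /proc/"$pid"/ns/*; do
        [ -L "$link" ] || continue          # skip if /proc/<PID>/ns is missing
        target=$(readlink "$link") || continue
        printf '%s %s\n' "$(basename "$link")" "$(ns_inode "$target")"
    done
}

list_namespaces "$$"
```

Running `list_namespaces` for a host process and for a containerized process, then comparing the two listings, shows which namespace types the container actually unshared.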

Namespace Types

PID Namespace

[Figure: PID Namespace Isolation]

Network Namespace

[Figure: Network Namespace Isolation]

Mount Namespace

# Each container has its own mount namespace
# Container sees only its own filesystem tree

# View mount namespace
ls -la /proc/self/ns/mnt

# Create new mount namespace (requires root)
unshare --mount /bin/bash

# In the new namespace, mounts are private
mkdir -p /mnt/test
mount -t tmpfs tmpfs /mnt/test  # Only visible in this namespace

Creating Namespaces

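#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // Create new PID, network, mount, and UTS namespaces (requires root).
    // CLONE_NEWUTS is required: without it, sethostname() below renames the host.
    if (unshare(CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS) == -1) {
        perror("unshare");
        return 1;
    }

    // unshare(CLONE_NEWPID) does not move the caller itself;
    // the first child it forks becomes PID 1 in the new PID namespace
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        // Child is now PID 1 in new PID namespace
        sethostname("container", 9);
        execl("/bin/bash", "bash", (char *)NULL);
        perror("execl");  // Reached only if exec fails
        _exit(127);
    }

    // Wait for the shell so the parent does not exit underneath it
    waitpid(pid, NULL, 0);
    return 0;
}

# Command-line namespace creation
# Create and enter new namespaces
# (--mount-proc remounts /proc so ps shows only the new PID namespace)
unshare --pid --net --mount --uts --fork --mount-proc /bin/bash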

# Enter existing namespace
nsenter --target <PID> --pid --net --mount /bin/bash

# List namespaces
lsns

# View process's namespaces
ls -la /proc/<PID>/ns/

Using cgroups

# cgroup v2 examples

# Create a cgroup
mkdir /sys/fs/cgroup/mygroup

# Enable controllers for children
echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control

# Set CPU limit (100ms per 1 second = 10% of one CPU)
echo "100000 1000000" > /sys/fs/cgroup/mygroup/cpu.max

# Set memory limit (500 MiB; the M suffix is a power-of-two unit)
echo "500M" > /sys/fs/cgroup/mygroup/memory.max

# Add process to cgroup
echo $$ > /sys/fs/cgroup/mygroup/cgroup.procs

# View current usage
cat /sys/fs/cgroup/mygroup/memory.current
cat /sys/fs/cgroup/mygroup/cpu.stat
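
The cpu.max file holds two numbers, quota and period, both in microseconds; the ratio of the two is the share of CPU time the group may use. As a small sketch of that arithmetic, the function below (the name `cpu_max_percent` is made up for this example) converts a cpu.max value into a percentage of one CPU.

```shell
#!/bin/sh
# Sketch: interpret a cgroup v2 cpu.max value ("<quota> <period>", microseconds).
# cpu_max_percent is a hypothetical helper name.

cpu_max_percent() {
    # Split the argument into quota and period on whitespace.
    set -- $1
    quota=$1
    period=${2:-100000}   # cgroup v2 default period is 100000 microseconds
    if [ "$quota" = "max" ]; then
        echo "unlimited"
    else
        # Integer percent of ONE CPU; values above 100 mean more than one CPU.
        echo "$(( quota * 100 / period ))%"
    fi
}

cpu_max_percent "100000 1000000"   # prints 10%
cpu_max_percent "100000 100000"    # prints 100%
cpu_max_percent "max 100000"       # prints unlimited
```

To inspect a live group, pass the file contents, for example `cpu_max_percent "$(cat /sys/fs/cgroup/mygroup/cpu.max)"`.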

systemd and cgroups

# systemd uses cgroups extensively
systemctl status docker.service

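# View the cgroup hierarchy (add -u docker.service to show a single unit)
systemd-cgls

# Resource limits in service file
# (unit files do not allow trailing comments, so each note sits on its own line)
[Service]
# Limit to 50% of one CPU
CPUQuota=50%
# Limit to 1 GiB of RAM
MemoryMax=1G
# Relative I/O weight (1-10000, default 100)
IOWeight=100
# Max tasks (processes and threads)
TasksMax=100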

# View resource usage
systemctl show docker.service --property=CPUUsageNSec
systemctl show docker.service --property=MemoryCurrent

# Real-time resource view
systemd-cgtop

Container Architecture

How Containers Work

Container Runtime Stack

Building a Container from Scratch

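// Minimal container implementation (run as root)
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];

static int child_fn(void *arg) {
    (void)arg;

    // Set hostname (affects only the new UTS namespace)
    sethostname("container", 9);

    // Mark all mounts private so later mounts do not propagate to the host
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
        perror("mount private");
        return 1;
    }

    // Switch to the new root filesystem
    // (chroot keeps the example short; production runtimes use pivot_root)
    if (chroot("/path/to/rootfs") == -1 || chdir("/") == -1) {
        perror("chroot");
        return 1;
    }

    // Mount proc (needed for ps, top, etc.)
    if (mount("proc", "/proc", "proc", 0, NULL) == -1) {
        perror("mount proc");
        return 1;
    }

    // Execute shell
    char *args[] = {"/bin/sh", NULL};
    execv("/bin/sh", args);
    perror("execv");  // Reached only if exec fails
    return 1;
}

int main(void) {
    int flags = CLONE_NEWPID    // New PID namespace
              | CLONE_NEWNET    // New network namespace
              | CLONE_NEWNS     // New mount namespace
              | CLONE_NEWUTS    // New UTS namespace
              | CLONE_NEWIPC;   // New IPC namespace

    // The stack grows downward, so pass the top of the buffer
    pid_t child_pid = clone(child_fn,
                            child_stack + STACK_SIZE,
                            flags | SIGCHLD,
                            NULL);

    if (child_pid == -1) {
        perror("clone");
        exit(1);
    }

    waitpid(child_pid, NULL, 0);
    return 0;
}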

Virtual Machines

Hypervisor Types

KVM (Kernel-based Virtual Machine)

[Figure: KVM Architecture]

Hardware Virtualization Extensions

[Figure: CPU Virtualization]

Memory Virtualization

Containers vs VMs

When to Use Each

Use Case | Recommendation
Microservices | Containers
Dev/Test environments | Containers
CI/CD pipelines | Containers
Multi-tenant with untrusted code | VMs
Running Windows on Linux | VMs
Legacy applications | VMs
Maximum isolation | VMs
Serverless functions | Containers or microVMs

Hybrid: MicroVMs and Kata Containers

[Figure: MicroVM Architecture]

Interview Questions

Answer: Containers use multiple Linux kernel features:
  1. Namespaces - Isolate system resources:
    • PID: Separate process tree
    • Network: Own network stack
    • Mount: Own filesystem view
    • User: Separate UID/GID mapping
  2. cgroups - Limit resources:
    • CPU time
    • Memory
    • I/O bandwidth
    • Number of processes
  3. Seccomp - Filter system calls
  4. Capabilities - Fine-grained privileges
  5. OverlayFS - Layered filesystem
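
Several of the features listed above leave visible traces in /proc/<PID>/status: the `Seccomp` field gives the seccomp mode, `CapEff` the effective capability mask, and `NSpid` the process ID as seen from each nested PID namespace. The sketch below reads those fields for the current shell; the helper names `status_field` and `seccomp_mode_name` are made up for this example.

```shell
#!/bin/sh
# Sketch: read container-relevant fields from /proc/<PID>/status.
# status_field and seccomp_mode_name are hypothetical helper names.

# Print the value of field "$1" from status-formatted text on stdin.
status_field() {
    awk -v key="$1:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }'
}

# Translate the numeric Seccomp field into a readable mode.
seccomp_mode_name() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

if [ -r "/proc/$$/status" ]; then
    mode=$(status_field Seccomp < "/proc/$$/status")
    echo "seccomp:      $(seccomp_mode_name "$mode")"
    echo "capabilities: $(status_field CapEff < "/proc/$$/status")"
    echo "pid per ns:   $(status_field NSpid < "/proc/$$/status")"
fi
```

Inside a typical container, one would expect the seccomp mode to be `filter` and `NSpid` to list more than one PID, though the exact values depend on the runtime's configuration.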

Key differences:
Aspect | Container | VM
Isolation | Process-level (shared kernel) | Hardware-level (own kernel)
Boot time | Seconds | Minutes
Overhead | Minimal | 2-10%
Size | MBs | GBs
Security | Weaker isolation | Stronger isolation
Containers share the host kernel; VMs each run their own kernel. Use VMs when you need a different OS or stronger isolation. Use containers for fast, lightweight deployment.

Answer: Docker uses network namespaces and virtual ethernet (veth) pairs:
  1. Bridge mode (default):
    • docker0 bridge on host
    • Each container gets a veth pair
    • One end in container, one on bridge
    • NAT for external access
  2. Host mode:
    • Container shares host network namespace
    • No network isolation, but no bridge or NAT overhead
  3. Overlay:
    • Multi-host networking
    • VXLAN encapsulation
    • Used by Docker Swarm; Kubernetes network plugins such as Flannel can use VXLAN in the same way
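
The bridge-mode steps above can be reproduced by hand with iproute2, which is a useful way to see what the default network driver sets up. This is a sketch under assumptions, not Docker's actual code: the names `br-demo`, `veth-host`, `veth-ctr`, the namespace `demo`, and the 10.200.0.0/24 subnet are all invented for illustration. By default the script only prints the commands; set APPLY=1 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: Docker-style bridge networking built manually.
# All interface names, the namespace name, and the subnet are made up.
# Dry run by default: commands are printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_bridge_network() {
    # 1. Bridge on the host (plays the role of docker0)
    run ip link add br-demo type bridge
    run ip addr add 10.200.0.1/24 dev br-demo
    run ip link set br-demo up

    # 2. Network namespace standing in for the container
    run ip netns add demo

    # 3. veth pair: one end on the bridge, the other inside the namespace
    run ip link add veth-host type veth peer name veth-ctr
    run ip link set veth-ctr netns demo
    run ip link set veth-host master br-demo
    run ip link set veth-host up

    # 4. Address and default route inside the namespace
    run ip netns exec demo ip addr add 10.200.0.2/24 dev veth-ctr
    run ip netns exec demo ip link set veth-ctr up
    run ip netns exec demo ip link set lo up
    run ip netns exec demo ip route add default via 10.200.0.1

    # 5. NAT for external access
    run sysctl -w net.ipv4.ip_forward=1
    run iptables -t nat -A POSTROUTING -s 10.200.0.0/24 ! -o br-demo -j MASQUERADE
}

setup_bridge_network
```

After applying it for real, `ip netns exec demo ping 10.200.0.1` should reach the bridge address; `ip netns del demo` and `ip link del br-demo` undo most of the setup.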

Answer: cgroups (control groups) limit and account for resource usage:
  1. Hierarchy: Tree structure of process groups
  2. Controllers: cpu, memory, io, pids, etc.
  3. Limits: Set via pseudo-filesystem (/sys/fs/cgroup)
Example: Limit a container to one CPU and 512 MiB of RAM:
echo "100000 100000" > /sys/fs/cgroup/mycontainer/cpu.max
echo "512M" > /sys/fs/cgroup/mycontainer/memory.max
cgroup v2 unified hierarchy is now preferred over v1’s multiple hierarchies.
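
Since the file names above exist only on cgroup v2, it helps to check which version a host mounts before writing limits. A common approach, sketched here, is to look at the filesystem type of /sys/fs/cgroup; the helper name `cgroup_version` is made up, and `stat -fc %T` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: detect whether the host mounts cgroup v1 or v2.
# cgroup_version is a hypothetical helper name.

# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;              # unified hierarchy
        tmpfs)     echo "v1 (or hybrid)" ;;  # per-controller hierarchies under a tmpfs
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem; %T prints its type name (GNU coreutils).
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null) || fstype=""
echo "cgroup version: $(cgroup_version "$fstype")"
```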

Summary

[Figure: Containers vs Virtualization Summary]