Chapter 2: Control Groups (cgroups)

While namespaces provide isolation, cgroups provide resource limits. Without cgroups, a container could consume all system resources. Let’s implement resource limiting!

Prerequisites: Chapter 1: Namespaces
Further Reading: Operating Systems: Resource Management
Time: 3-4 hours
Outcome: Containers with CPU, memory, and process limits

What Are Cgroups?

┌─────────────────────────────────────────────────────────────────────────────┐
│                         WITHOUT CGROUPS                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   SYSTEM: 8 CPU cores, 16GB RAM                                             │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │                                                                      │  │
│   │   Container A        Container B        Container C                 │  │
│   │   ┌──────────┐      ┌──────────┐       ┌──────────┐                │  │
│   │   │ Uses 6   │      │ Uses 1.5 │       │ Uses 0.5 │                │  │
│   │   │ CPUs,    │      │ CPUs,    │       │ CPUs,    │                │  │
│   │   │ 14GB RAM │      │ 1GB RAM  │       │ 1GB RAM  │                │  │
│   │   └──────────┘      └──────────┘       └──────────┘                │  │
│   │                                                                      │  │
│   │   Container A is a "noisy neighbor" - starving others!              │  │
│   │                                                                      │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                           WITH CGROUPS                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   SYSTEM: 8 CPU cores, 16GB RAM                                             │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │                                                                      │  │
│   │   Container A        Container B        Container C                 │  │
│   │   ┌──────────┐      ┌──────────┐       ┌──────────┐                │  │
│   │   │ LIMIT:   │      │ LIMIT:   │       │ LIMIT:   │                │  │
│   │   │ 2 CPUs   │      │ 2 CPUs   │       │ 2 CPUs   │                │  │
│   │   │ 4GB RAM  │      │ 4GB RAM  │       │ 4GB RAM  │                │  │
│   │   │ 100 procs│      │ 100 procs│       │ 100 procs│                │  │
│   │   └──────────┘      └──────────┘       └──────────┘                │  │
│   │                                                                      │  │
│   │   Fair resource allocation! Predictable performance!                │  │
│   │                                                                      │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cgroup v2 Hierarchy

┌─────────────────────────────────────────────────────────────────────────────┐
│                      CGROUP V2 FILESYSTEM                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   /sys/fs/cgroup/                         ← Cgroup root                     │
│   ├── cgroup.controllers                  ← Available controllers           │
│   ├── cgroup.subtree_control              ← Enabled controllers             │
│   ├── cgroup.procs                        ← PIDs in the root cgroup         │
│   ├── cpu.stat                            ← CPU usage statistics            │
│   │                                                                         │
│   ├── minidocker/                         ← Our container group            │
│   │   ├── cgroup.controllers                                                │
│   │   ├── cgroup.procs                    ← PIDs in this group             │
│   │   ├── cpu.max                         ← CPU limit (quota period)       │
│   │   ├── cpu.weight                      ← CPU shares                     │
│   │   ├── memory.max                      ← Memory limit (bytes)           │
│   │   ├── memory.current                  ← Current memory usage           │
│   │   ├── pids.max                        ← Max processes                  │
│   │   └── pids.current                    ← Current process count          │
│   │                                                                         │
│   └── docker/                             ← Docker's groups                 │
│       ├── container-abc123/                                                 │
│       └── container-def456/                                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
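Before creating any groups, it helps to confirm that the host really mounts cgroup v2 and exposes the controllers this chapter relies on. The sketch below is one way to check, assuming the unified hierarchy lives at /sys/fs/cgroup. The class name CgroupProbe and its methods are illustrative and not part of the minidocker code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: inspects a cgroup v2 directory. */
final class CgroupProbe {

    /** On a cgroup v2 mount, the root directory contains cgroup.controllers. */
    static boolean isCgroupV2(Path root) {
        return Files.isRegularFile(root.resolve("cgroup.controllers"));
    }

    /** Reads the space-separated controller list from cgroup.controllers. */
    static Set<String> availableControllers(Path cgroupDir) throws IOException {
        String content = Files.readString(cgroupDir.resolve("cgroup.controllers")).trim();
        Set<String> result = new LinkedHashSet<>();
        if (!content.isEmpty()) {
            result.addAll(Arrays.asList(content.split("\\s+")));
        }
        return result;
    }

    /** Returns the required controllers that are not in the available set. */
    static Set<String> missing(Set<String> available, String... required) {
        Set<String> missing = new LinkedHashSet<>(Arrays.asList(required));
        missing.removeAll(available);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of("/sys/fs/cgroup");
        if (!isCgroupV2(root)) {
            System.out.println("cgroup v2 is not mounted at " + root);
            return;
        }
        Set<String> available = availableControllers(root);
        System.out.println("Available: " + available);
        System.out.println("Missing:   " + missing(available, "cpu", "memory", "pids", "io"));
    }
}
```

If "Missing" is not empty, the limits for those controllers cannot be set on this host, and writes to the corresponding files will fail.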

Part 1: Cgroup Manager

src/main/java/com/minidocker/cgroup/CgroupManager.java

package com.minidocker.cgroup;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Manages cgroup v2 resource limits.
 * 
 * Cgroups control:
 * - CPU: How much CPU time a container can use
 * - Memory: Maximum memory allocation
 * - PIDs: Maximum number of processes
 * - I/O: Disk bandwidth limits
 */
public class CgroupManager {
    
    private static final Path CGROUP_ROOT = Path.of("/sys/fs/cgroup");
    private static final String CGROUP_NAME = "minidocker";
    
    private final String containerId;
    private final Path cgroupPath;
    
    public CgroupManager(String containerId) {
        this.containerId = containerId;
        this.cgroupPath = CGROUP_ROOT.resolve(CGROUP_NAME).resolve(containerId);
    }
    
    /**
     * Creates the cgroup for this container.
     */
    public void create() throws IOException {
        // Create the minidocker parent group and the per-container group
        Files.createDirectories(cgroupPath);
        System.out.println("✓ Created cgroup: " + cgroupPath);
        
        // A controller is only usable in a cgroup if every ancestor enables it
        // in cgroup.subtree_control, so enable it in the root first and then
        // in the minidocker parent group
        String controllers = "+cpu +memory +pids +io";
        for (Path dir : new Path[] { CGROUP_ROOT, cgroupPath.getParent() }) {
            Path subtreeControl = dir.resolve("cgroup.subtree_control");
            if (Files.exists(subtreeControl)) {
                Files.writeString(subtreeControl, controllers);
                System.out.println("✓ Enabled controllers in " + dir + ": cpu, memory, pids, io");
            }
        }
    }
    
    /**
     * Sets CPU limit.
     * 
     * @param cpuPercent CPU percentage (100 = 1 core, 200 = 2 cores)
     */
    public void setCpuLimit(int cpuPercent) throws IOException {
        // cpu.max format: "$QUOTA $PERIOD"
        // QUOTA: microseconds of CPU time per PERIOD
        // PERIOD: typically 100000 (100ms)
        
        int period = 100000;  // 100ms
        int quota = period * cpuPercent / 100;
        
        Path cpuMaxPath = cgroupPath.resolve("cpu.max");
        String value = quota + " " + period;
        Files.writeString(cpuMaxPath, value);
        
        System.out.println("✓ CPU limit set: " + cpuPercent + "% (" + value + ")");
    }
    
    /**
     * Sets memory limit.
     * 
     * @param memoryBytes Maximum memory in bytes
     */
    public void setMemoryLimit(long memoryBytes) throws IOException {
        Path memoryMaxPath = cgroupPath.resolve("memory.max");
        Files.writeString(memoryMaxPath, String.valueOf(memoryBytes));
        
        // Also set swap limit to prevent swap usage
        Path memorySwapMaxPath = cgroupPath.resolve("memory.swap.max");
        if (Files.exists(memorySwapMaxPath)) {
            Files.writeString(memorySwapMaxPath, "0");
        }
        
        String humanReadable = formatBytes(memoryBytes);
        System.out.println("✓ Memory limit set: " + humanReadable);
    }
    
    /**
     * Sets process (PID) limit.
     * 
     * Prevents fork bombs!
     * 
     * @param maxPids Maximum number of processes
     */
    public void setPidsLimit(int maxPids) throws IOException {
        Path pidsMaxPath = cgroupPath.resolve("pids.max");
        Files.writeString(pidsMaxPath, String.valueOf(maxPids));
        
        System.out.println("✓ PIDs limit set: " + maxPids);
    }
    
    /**
     * Sets I/O bandwidth limits.
     * 
     * @param deviceMajorMinor Device ID (e.g., "8:0" for /dev/sda)
     * @param rbps Read bytes per second
     * @param wbps Write bytes per second
     */
    public void setIOLimit(String deviceMajorMinor, long rbps, long wbps) throws IOException {
        Path ioMaxPath = cgroupPath.resolve("io.max");
        String value = deviceMajorMinor + " rbps=" + rbps + " wbps=" + wbps;
        Files.writeString(ioMaxPath, value);
        
        System.out.println("✓ I/O limit set for " + deviceMajorMinor + 
                          ": read=" + formatBytes(rbps) + "/s, write=" + formatBytes(wbps) + "/s");
    }
    
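    /**
     * Adds a process to this cgroup.
     * 
     * @param pid Process ID to add
     */
    public void addProcess(long pid) throws IOException {
        Path procsPath = cgroupPath.resolve("cgroup.procs");
        Files.writeString(procsPath, String.valueOf(pid));
        
        System.out.println("✓ Added PID " + pid + " to cgroup");
    }
    
    /**
     * Adds the current process to this cgroup.
     */
    public void addCurrentProcess() throws IOException {
        // Writing 0 to cgroup.procs moves the writing process itself. This avoids
        // ProcessHandle.current(), which can still report the parent's PID in a
        // forked child because the JVM caches the value.
        Files.writeString(cgroupPath.resolve("cgroup.procs"), "0");
        
        System.out.println("✓ Added current process to cgroup");
    }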
    
    /**
     * Gets current resource usage.
     */
    public ResourceUsage getUsage() throws IOException {
        long memoryBytes = 0;
        int pids = 0;
        long cpuUsage = 0;
        
        Path memoryCurrent = cgroupPath.resolve("memory.current");
        if (Files.exists(memoryCurrent)) {
            memoryBytes = Long.parseLong(Files.readString(memoryCurrent).trim());
        }
        
        Path pidsCurrent = cgroupPath.resolve("pids.current");
        if (Files.exists(pidsCurrent)) {
            pids = Integer.parseInt(Files.readString(pidsCurrent).trim());
        }
        
        Path cpuStat = cgroupPath.resolve("cpu.stat");
        if (Files.exists(cpuStat)) {
            String stats = Files.readString(cpuStat);
            for (String line : stats.split("\n")) {
                if (line.startsWith("usage_usec")) {
                    cpuUsage = Long.parseLong(line.split(" ")[1]);
                }
            }
        }
        
        return new ResourceUsage(memoryBytes, pids, cpuUsage);
    }
    
    /**
     * Removes the cgroup (cleanup).
     */
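    public void destroy() throws IOException {
        // First, kill all processes in the cgroup
        Path procsPath = cgroupPath.resolve("cgroup.procs");
        if (Files.exists(procsPath)) {
            String procs = Files.readString(procsPath);
            for (String pid : procs.split("\n")) {
                if (!pid.isBlank()) {
                    // SIGKILL: a plain destroy() sends SIGTERM, which can be ignored
                    ProcessHandle.of(Long.parseLong(pid.trim()))
                                 .ifPresent(ProcessHandle::destroyForcibly);
                }
            }
            // The kernel refuses to remove a cgroup that still has members,
            // so wait (up to ~5s) for the killed processes to exit
            for (int i = 0; i < 50 && !Files.readString(procsPath).isBlank(); i++) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        
        // Then remove the directory
        Files.deleteIfExists(cgroupPath);
        System.out.println("✓ Destroyed cgroup: " + cgroupPath);
    }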
    
    private static String formatBytes(long bytes) {
        if (bytes < 1024) return bytes + " B";
        if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
        if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
        return (bytes / (1024 * 1024 * 1024)) + " GB";
    }
    
    /**
     * Resource usage statistics.
     */
    public record ResourceUsage(long memoryBytes, int pids, long cpuUsageMicros) {
        @Override
        public String toString() {
            return String.format("Memory: %s, PIDs: %d, CPU: %.2fms",
                formatBytes(memoryBytes), pids, cpuUsageMicros / 1000.0);
        }
    }
}
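The quota arithmetic in setCpuLimit is easier to follow with concrete numbers. The sketch below works directly from a core count rather than a percentage. The class name CpuQuota is made up for illustration, and the 1000 µs floor is an assumption based on the kernel's minimum quota of 1 ms.

```java
/** Hypothetical helper: builds the "<quota> <period>" string written to cpu.max. */
final class CpuQuota {

    static final long DEFAULT_PERIOD_MICROS = 100_000; // 100 ms
    static final long MIN_QUOTA_MICROS = 1_000;        // assumed kernel minimum (1 ms)

    /**
     * quota = cpus * period, in microseconds.
     * 0.5 cores over a 100 ms period means 50 ms of CPU time per period.
     */
    static String cpuMaxValue(double cpus, long periodMicros) {
        if (cpus <= 0) {
            throw new IllegalArgumentException("cpus must be positive");
        }
        long quota = Math.max(MIN_QUOTA_MICROS, Math.round(cpus * periodMicros));
        return quota + " " + periodMicros;
    }

    /** Writing "max" as the quota removes the bandwidth limit. */
    static String unlimited(long periodMicros) {
        return "max " + periodMicros;
    }

    public static void main(String[] args) {
        System.out.println(cpuMaxValue(0.5, DEFAULT_PERIOD_MICROS)); // 50000 100000
        System.out.println(cpuMaxValue(2.0, DEFAULT_PERIOD_MICROS)); // 200000 100000
        System.out.println(unlimited(DEFAULT_PERIOD_MICROS));        // max 100000
    }
}
```

A quota larger than the period is valid: it lets the group's threads run on several cores at once, which is how "2 cores" is expressed.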

Part 2: Resource Limits Configuration

src/main/java/com/minidocker/cgroup/ResourceLimits.java

package com.minidocker.cgroup;

/**
 * Container resource limits configuration.
 * 
 * Similar to Docker's --cpus, --memory, --pids-limit flags.
 */
public class ResourceLimits {
    
    private int cpuPercent = 100;           // 100 = 1 core
    private long memoryBytes = 512 * 1024 * 1024;  // 512MB default
    private int maxPids = 100;              // Max processes
    
    public static ResourceLimits defaults() {
        return new ResourceLimits();
    }
    
    public ResourceLimits withCpu(double cpus) {
        // Round instead of truncating: 0.29 * 100 is 28.999... in floating point
        this.cpuPercent = (int) Math.round(cpus * 100);
        return this;
    }
    
    public ResourceLimits withMemory(String memory) {
        this.memoryBytes = parseMemory(memory);
        return this;
    }
    
    public ResourceLimits withMemoryBytes(long bytes) {
        this.memoryBytes = bytes;
        return this;
    }
    
    public ResourceLimits withMaxPids(int pids) {
        this.maxPids = pids;
        return this;
    }
    
    public int getCpuPercent() { return cpuPercent; }
    public long getMemoryBytes() { return memoryBytes; }
    public int getMaxPids() { return maxPids; }
    
    private static long parseMemory(String memory) {
        memory = memory.trim().toUpperCase();
        
        long multiplier = 1;
        if (memory.endsWith("K")) {
            multiplier = 1024;
            memory = memory.substring(0, memory.length() - 1);
        } else if (memory.endsWith("M")) {
            multiplier = 1024 * 1024;
            memory = memory.substring(0, memory.length() - 1);
        } else if (memory.endsWith("G")) {
            multiplier = 1024 * 1024 * 1024;
            memory = memory.substring(0, memory.length() - 1);
        }
        
        return Long.parseLong(memory) * multiplier;
    }
    
    @Override
    public String toString() {
        return String.format("ResourceLimits{cpu=%.2f cores, memory=%dMB, pids=%d}",
            cpuPercent / 100.0,
            memoryBytes / (1024 * 1024),
            maxPids);
    }
}

Part 3: Integrating with Container

src/main/java/com/minidocker/Container.java

package com.minidocker;

import com.minidocker.cgroup.CgroupManager;
import com.minidocker.cgroup.ResourceLimits;
import com.minidocker.linux.LibC;
import com.minidocker.namespace.NamespaceManager;
import com.minidocker.namespace.NamespaceOptions;

import java.util.UUID;

/**
 * Container with namespace isolation AND resource limits.
 */
public class Container {
    
    private final LibC libc = LibC.INSTANCE;
    private final NamespaceManager namespaces = new NamespaceManager();
    
    private final String id;
    private final String hostname;
    private final String[] command;
    private final ResourceLimits limits;
    
    private CgroupManager cgroup;
    
    public Container(String hostname, String[] command, ResourceLimits limits) {
        this.id = UUID.randomUUID().toString().substring(0, 12);
        this.hostname = hostname;
        this.command = command;
        this.limits = limits;
    }
    
    public void run() throws Exception {
        System.out.println("=== Starting Container " + id + " ===");
        System.out.println("Hostname: " + hostname);
        System.out.println("Limits: " + limits);
        System.out.println();
        
        // Step 1: Create cgroup
        cgroup = new CgroupManager(id);
        cgroup.create();
        
        // Step 2: Set resource limits
        cgroup.setCpuLimit(limits.getCpuPercent());
        cgroup.setMemoryLimit(limits.getMemoryBytes());
        cgroup.setPidsLimit(limits.getMaxPids());
        
        // Step 3: Create namespaces
        NamespaceOptions nsOptions = NamespaceOptions.builder()
            .withPid()
            .withMount()
            .withUts()
            .withIpc()
            .build();
        
        namespaces.createNamespaces(nsOptions);
        namespaces.setHostname(hostname);
        
        // Step 4: Fork
        int pid = libc.fork();
        
        if (pid == 0) {
            // Child: add to cgroup and run
            cgroup.addCurrentProcess();
            runContainerInit();
        } else if (pid > 0) {
            // Parent: wait and monitor
            monitorContainer(pid);
        } else {
            throw new RuntimeException("Fork failed");
        }
    }
    
    private void monitorContainer(int childPid) {
        // Start monitoring thread
        Thread monitor = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(1000);
                    CgroupManager.ResourceUsage usage = cgroup.getUsage();
                    System.out.println("[Monitor] " + usage);
                }
            } catch (Exception e) {
                // Container exited
            }
        });
        monitor.setDaemon(true);
        monitor.start();
        
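        // Wait for child
        int[] status = new int[1];
        libc.waitpid(childPid, status, 0);
        monitor.interrupt();  // stop the monitoring loop
        
        // waitpid returns an encoded status: the low 7 bits hold the terminating
        // signal (0 for a normal exit), bits 8-15 hold the exit code
        int signal = status[0] & 0x7F;
        if (signal == 0) {
            System.out.println("Container exited with code: " + ((status[0] >> 8) & 0xFF));
        } else {
            System.out.println("Container killed by signal: " + signal);
        }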
        
        // Cleanup
        try {
            cgroup.destroy();
        } catch (Exception e) {
            System.err.println("Failed to cleanup cgroup: " + e.getMessage());
        }
    }
    
    private void runContainerInit() {
        try {
            System.out.println("\n=== Container Init (PID " + libc.getpid() + ") ===");
            
            if (command.length > 0) {
                String[] argv = new String[command.length + 1];
                System.arraycopy(command, 0, argv, 0, command.length);
                argv[command.length] = null;
                
                libc.execv(command[0], argv);
                System.err.println("Failed to execute: " + command[0]);
                System.exit(1);
            }
        } catch (Exception e) {
            System.err.println("Container init failed: " + e.getMessage());
            System.exit(1);
        }
    }
    
    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: java Container [options] <hostname> <command...>");
            System.out.println("       --cpus=<n>     CPU cores (e.g., 0.5, 2)");
            System.out.println("       --memory=<n>   Memory limit (e.g., 256M, 1G)");
            System.out.println("       --pids=<n>     Max processes");
            System.exit(1);
        }
        
        ResourceLimits limits = ResourceLimits.defaults();
        String hostname = null;
        String[] command = null;
        
        // Parse arguments
        int cmdStart = 0;
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("--cpus=")) {
                limits.withCpu(Double.parseDouble(args[i].substring(7)));
            } else if (args[i].startsWith("--memory=")) {
                limits.withMemory(args[i].substring(9));
            } else if (args[i].startsWith("--pids=")) {
                limits.withMaxPids(Integer.parseInt(args[i].substring(7)));
            } else if (hostname == null) {
                hostname = args[i];
            } else {
                cmdStart = i;
                break;
            }
        }
        
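        // Without this check a missing command would reach the container as null
        if (hostname == null || cmdStart == 0) {
            System.err.println("Error: a hostname and a command are required");
            System.exit(1);
        }
        command = new String[args.length - cmdStart];
        System.arraycopy(args, cmdStart, command, 0, command.length);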
        
        try {
            Container container = new Container(hostname, command, limits);
            container.run();
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Part 4: Testing Resource Limits

Testing CPU Limits

# Run a CPU-intensive workload with 50% CPU limit
sudo java Container --cpus=0.5 testhost /bin/sh -c "while true; do :; done"

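# In another terminal, observe CPU usage (the shell loop should sit at ~50% of one core)
# pgrep can match several PIDs, so join them with commas for top
top -p "$(pgrep -d, -f 'while true')"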
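top shows the effect of the limit, but the kernel also records how often it had to throttle the group. With the cpu controller enabled, the group's cpu.stat file carries nr_periods, nr_throttled, and throttled_usec alongside usage_usec. The sketch below parses that flat "key value" format; the class name CpuStat is illustrative, and the path to the file is passed in rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads throttling counters from a cgroup's cpu.stat. */
final class CpuStat {

    /** Parses "key value" lines into a map; malformed lines are skipped. */
    static Map<String, Long> parse(String content) {
        Map<String, Long> stats = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                try {
                    stats.put(parts[0], Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // not a numeric entry, skip it
                }
            }
        }
        return stats;
    }

    /** Fraction of enforcement periods in which the group hit its quota. */
    static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java CpuStat.java <path-to-cpu.stat>");
            return;
        }
        Map<String, Long> stats = parse(Files.readString(Path.of(args[0])));
        System.out.printf("throttled in %.0f%% of periods (%d us total)%n",
            100 * throttledRatio(stats), stats.getOrDefault("throttled_usec", 0L));
    }
}
```

For the busy loop above, nr_throttled should keep growing while the container runs; a group that never reaches its quota stays at zero.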

Testing Memory Limits

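// MemoryHog.java - For testing memory limits
// Run with a heap larger than the cgroup limit, e.g. java -Xmx2g MemoryHog.
// Otherwise a container-aware JVM sizes its heap from memory.max and throws
// OutOfMemoryError before the cgroup limit is reached.
// With a large enough heap, the container is OOM-killed at the memory limit!
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MemoryHog {
    public static void main(String[] args) throws Exception {
        List<byte[]> allocations = new ArrayList<>();
        
        while (true) {
            // Allocate 10MB chunks and touch every page so the memory is really used
            byte[] chunk = new byte[10 * 1024 * 1024];
            Arrays.fill(chunk, (byte) 1);
            allocations.add(chunk);
            System.out.println("Allocated: " + (allocations.size() * 10) + "MB");
            Thread.sleep(100);
        }
    }
}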

Testing PID Limits

# Fork bomb protection!
# Without a pids limit, this can exhaust the host's process table:
:(){ :|:& };:

# With the pids limit set to 100, the fork bomb is contained:
# once pids.current reaches pids.max, further fork() calls fail with EAGAIN

Understanding Cgroup Controllers

┌─────────────────────────────────────────────────────────────────────────────┐
│                      CGROUP CONTROLLERS                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   CONTROLLER    FILES               DESCRIPTION                             │
│   ──────────    ─────               ───────────                             │
│                                                                              │
│   cpu           cpu.max             Bandwidth limit (quota/period)          │
│                 cpu.weight          Relative CPU shares (1-10000)           │
│                 cpu.stat            Usage statistics                        │
│                                                                              │
│   memory        memory.max          Hard limit (OOM kill if exceeded)       │
│                 memory.high         Soft limit (throttling starts)          │
│                 memory.current      Current usage                           │
│                 memory.swap.max     Swap limit                              │
│                 memory.stat         Detailed statistics                     │
│                                                                              │
│   pids          pids.max            Maximum processes                       │
│                 pids.current        Current process count                   │
│                                                                              │
│   io            io.max              Bandwidth/IOPS limits per device        │
│                 io.weight           Relative I/O priority                   │
│                 io.stat             I/O statistics                          │
│                                                                              │
│   cpuset        cpuset.cpus         Allowed CPU cores (e.g., "0-3")         │
│                 cpuset.mems         Allowed NUMA nodes                      │
│                                                                              │
│   hugetlb       hugetlb.X.max       Huge page limits                        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Exercises

Exercise 1: Implement I/O Throttling

Add disk I/O limits:

// 1. Find the device major:minor for the disk
// cat /sys/block/sda/dev

// 2. Set io.max
// "8:0 rbps=1048576 wbps=1048576" = 1MB/s read/write

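// 3. Test with dd (oflag=direct bypasses the page cache, so the limit shows up
//    immediately; the target file must be on the throttled disk, not on tmpfs):
// dd if=/dev/zero of=/tmp/test bs=1M count=100 oflag=direct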

Exercise 2: Implement CPU Pinning

Pin container to specific CPU cores:

// Use cpuset controller
// Write to cpuset.cpus: "0,2" (only use cores 0 and 2)
// Write to cpuset.mems: "0" (NUMA node 0)

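// Pinning keeps the container's threads on fixed cores, which improves cache
// locality; the cores are only exclusive if no other cgroup is allowed on them
// Great for latency-sensitive workloads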
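cpuset.cpus takes a comma-separated list of core numbers and ranges, so "0,2" and "0-3,5" are both valid. One possible way to build that string from a set of core IDs is sketched below; the class name CpusetList is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: formats core IDs in the list syntax used by cpuset.cpus. */
final class CpusetList {

    /** Collapses consecutive core IDs into ranges, e.g. [0,1,2,3,5] becomes "0-3,5". */
    static String format(Collection<Integer> cpus) {
        // TreeSet sorts the IDs and drops duplicates
        List<Integer> sorted = new ArrayList<>(new TreeSet<>(cpus));
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < sorted.size()) {
            int start = sorted.get(i);
            int end = start;
            // Extend the range while the next ID is exactly one higher
            while (i + 1 < sorted.size() && sorted.get(i + 1) == end + 1) {
                end = sorted.get(++i);
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(start == end ? String.valueOf(start) : start + "-" + end);
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(List.of(0, 2)));          // 0,2
        System.out.println(format(List.of(3, 1, 0, 2, 5))); // 0-3,5
    }
}
```

The cpuset controller has to be enabled in the parent's cgroup.subtree_control, just like cpu, memory, and pids, before cpuset.cpus appears in the container's group.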

Exercise 3: Add Resource Monitoring

Implement real-time resource monitoring:

// 1. Read cpu.stat periodically for CPU usage
// 2. Read memory.current and memory.stat for memory
// 3. Read io.stat for disk I/O
// 4. Calculate deltas to get rates
// 5. Display like 'docker stats'
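Step 4 is the part that usually trips people up: usage_usec is a running total, so a rate only appears once you subtract two samples and divide by the time between them. A minimal sketch of that calculation follows, with a hypothetical class name CpuRate and the elapsed time taken in nanoseconds so it can come straight from System.nanoTime().

```java
/** Hypothetical helper: turns two usage_usec samples into a CPU percentage. */
final class CpuRate {

    /**
     * Returns CPU usage as a percentage of one core (100 = one core fully busy,
     * 200 = two cores), based on the CPU time consumed between two samples.
     */
    static double cpuPercent(long usageUsecBefore, long usageUsecAfter, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        double usedMicros = usageUsecAfter - usageUsecBefore;
        double elapsedMicros = elapsedNanos / 1_000.0;
        return 100.0 * usedMicros / elapsedMicros;
    }

    public static void main(String[] args) {
        // 0.5 s of CPU time consumed during 1 s of wall-clock time
        System.out.println(cpuPercent(1_000_000, 1_500_000, 1_000_000_000L)); // 50.0
        // 2 s of CPU time during 1 s: two cores fully busy
        System.out.println(cpuPercent(0, 2_000_000, 1_000_000_000L));         // 200.0
    }
}
```

The same delta-over-time pattern applies to the byte counters in io.stat; memory.current is already an instantaneous value and needs no subtraction.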

Key Takeaways

Hierarchical

Cgroups form a tree - a child can never use more than the limits set on its ancestors

Controller-Based

Different controllers for CPU, memory, I/O, PIDs

Kernel Enforcement

Limits are enforced by kernel, not by honor system

OOM Killer

Exceeding memory limit triggers OOM killer

What’s Next?

In Chapter 3: Filesystem, we’ll implement:

Overlay filesystems
Copy-on-write layers
Container root filesystem setup

Next: Filesystem

Build the layered filesystem
