# Chapter 2: Control Groups (cgroups)

> Implement resource limits using Linux cgroups for CPU, memory, and process control

While namespaces provide **isolation** (what a container can *see*), cgroups provide **resource limits** (what a container can *use*). Without cgroups, a container could consume all system resources and starve everything else on the host. Let's implement resource limiting!

Here is a real-world analogy: namespaces are like giving each tenant in an apartment building their own mailbox and doorbell, so they don't interfere with each other. Cgroups are like the lease agreement that limits how much water and electricity each tenant can use. Without the lease, one tenant could run every faucet at full blast and leave the rest of the building dry.

In cloud computing, cgroups are what make multi-tenant systems safe and fair. Every major cloud provider (AWS, GCP, Azure) uses cgroups under the hood to enforce the resource limits you pay for.

**Prerequisites**: [Chapter 1: Namespaces](/courses/build-your-own-x/docker-1-namespaces)\
**Further Reading**: [Operating Systems: Resource Management](/operating-systems/scheduling)\
**Time**: 3-4 hours\
**Outcome**: Containers with CPU, memory, and process limits

***

## What Are Cgroups?
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                               WITHOUT CGROUPS                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   SYSTEM: 8 CPU cores, 16GB RAM                                             │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                     │   │
│   │     Container A           Container B           Container C         │   │
│   │     ┌──────────┐          ┌──────────┐          ┌──────────┐        │   │
│   │     │ Uses 6   │          │ Uses 1.5 │          │ Uses 0.5 │        │   │
│   │     │ CPUs,    │          │ CPUs,    │          │ CPUs,    │        │   │
│   │     │ 14GB RAM │          │ 1GB RAM  │          │ 1GB RAM  │        │   │
│   │     └──────────┘          └──────────┘          └──────────┘        │   │
│   │                                                                     │   │
│   │     Container A is a "noisy neighbor" - starving others!            │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                                WITH CGROUPS                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   SYSTEM: 8 CPU cores, 16GB RAM                                             │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                     │   │
│   │     Container A           Container B           Container C         │   │
│   │     ┌──────────┐          ┌──────────┐          ┌──────────┐        │   │
│   │     │ LIMIT:   │          │ LIMIT:   │          │ LIMIT:   │        │   │
│   │     │ 2 CPUs   │          │ 2 CPUs   │          │ 2 CPUs   │        │   │
│   │     │ 4GB RAM  │          │ 4GB RAM  │          │ 4GB RAM  │        │   │
│   │     │ 100 procs│          │ 100 procs│          │ 100 procs│        │   │
│   │     └──────────┘          └──────────┘          └──────────┘        │   │
│   │                                                                     │   │
│   │     Fair resource allocation! Predictable performance!              │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Cgroup v2 Hierarchy

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            CGROUP V2 FILESYSTEM                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   /sys/fs/cgroup/                  ← Cgroup root                            │
│   ├── cgroup.controllers           ← Available controllers                  │
│   ├── cgroup.subtree_control       ← Enabled controllers                    │
│   ├── cpu.stat                     ← Root CPU statistics                    │
│   │   (cpu.max and memory.max exist only in non-root cgroups)               │
│   │                                                                         │
│   ├── minidocker/                  ← Our container group                    │
│   │   ├── cgroup.controllers                                                │
│   │   ├── cgroup.procs             ← PIDs in this group                     │
│   │   ├── cpu.max                  ← CPU limit (quota period)               │
│   │   ├── cpu.weight               ← CPU shares                             │
│   │   ├── memory.max               ← Memory limit (bytes)                   │
│   │   ├── memory.current           ← Current memory usage                   │
│   │   ├── pids.max                 ← Max processes                          │
│   │   └── pids.current             ← Current process count                  │
│   │                                                                         │
│   └── docker/                      ← Docker's groups                        │
│       ├── container-abc123/                                                 │
│       └── container-def456/                                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Part 1: Cgroup Manager

```java src/main/java/com/minidocker/cgroup/CgroupManager.java theme={null}
package com.minidocker.cgroup;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Manages cgroup v2 resource limits.
 *
 * Cgroups control:
 * - CPU: How much CPU time a container can use
 * - Memory: Maximum memory allocation
 * - PIDs: Maximum number of processes
 * - I/O: Disk bandwidth limits
 */
public class CgroupManager {

    private static final Path CGROUP_ROOT = Path.of("/sys/fs/cgroup");
    private static final String CGROUP_NAME = "minidocker";

    private final String containerId;
    private final Path cgroupPath;

    public CgroupManager(String containerId) {
        this.containerId = containerId;
        this.cgroupPath = CGROUP_ROOT.resolve(CGROUP_NAME).resolve(containerId);
    }

    /**
     * Creates the cgroup for this container.
     */
    public void create() throws IOException {
        // Create the cgroup directory
        Files.createDirectories(cgroupPath);
        System.out.println("✓ Created cgroup: " + cgroupPath);

        // Enable controllers in the parent (minidocker) cgroup.
        // This only takes effect if the same controllers are already enabled
        // in the root cgroup's cgroup.subtree_control.
        Path parentSubtreeControl = cgroupPath.getParent().resolve("cgroup.subtree_control");
        if (Files.exists(parentSubtreeControl)) {
            String controllers = "+cpu +memory +pids +io";
            Files.writeString(parentSubtreeControl, controllers);
            System.out.println("✓ Enabled controllers: cpu, memory, pids, io");
        }
    }

    /**
     * Sets CPU limit.
     *
     * @param cpuPercent CPU percentage (100 = 1 core, 200 = 2 cores)
     *
     * How this works under the hood: the kernel scheduler refills the cgroup's
     * CPU budget every PERIOD microseconds. If the cgroup has consumed more than
     * QUOTA microseconds of CPU time in that period, all processes in the group
     * are throttled (paused) until the next period begins. So "100000 100000"
     * means "use at most 100ms of CPU every 100ms" -- exactly one core.
     * "200000 100000" means "200ms per 100ms" -- two cores.
     *
     * Common pitfall: a QUOTA of 0 does NOT mean "unlimited"; the kernel is
     * expected to reject a quota that small as invalid.
     * Use "max 100000" for unlimited CPU.
     */
    public void setCpuLimit(int cpuPercent) throws IOException {
        // cpu.max format: "$QUOTA $PERIOD"
        // QUOTA: microseconds of CPU time per PERIOD
        // PERIOD: typically 100000 (100ms)
        int period = 100000; // 100ms
        int quota = period * cpuPercent / 100;

        Path cpuMaxPath = cgroupPath.resolve("cpu.max");
        String value = quota + " " + period;
        Files.writeString(cpuMaxPath, value);
        System.out.println("✓ CPU limit set: " + cpuPercent + "% (" + value + ")");
    }

    /**
     * Sets memory limit.
     *
     * @param memoryBytes Maximum memory in bytes
     */
    public void setMemoryLimit(long memoryBytes) throws IOException {
        Path memoryMaxPath = cgroupPath.resolve("memory.max");
        Files.writeString(memoryMaxPath, String.valueOf(memoryBytes));

        // Also set swap limit to prevent swap usage
        Path memorySwapMaxPath = cgroupPath.resolve("memory.swap.max");
        if (Files.exists(memorySwapMaxPath)) {
            Files.writeString(memorySwapMaxPath, "0");
        }

        String humanReadable = formatBytes(memoryBytes);
        System.out.println("✓ Memory limit set: " + humanReadable);
    }

    /**
     * Sets process (PID) limit.
     *
     * This is your defense against fork bombs -- a malicious or buggy process
     * that recursively spawns children until the system runs out of PIDs or
     * memory. Without this limit, a single container could crash the entire
     * host by exhausting the kernel's process table. Container runtimes and
     * orchestrators commonly cap this at a few thousand PIDs per container.
     *
     * @param maxPids Maximum number of processes
     */
    public void setPidsLimit(int maxPids) throws IOException {
        Path pidsMaxPath = cgroupPath.resolve("pids.max");
        Files.writeString(pidsMaxPath, String.valueOf(maxPids));
        System.out.println("✓ PIDs limit set: " + maxPids);
    }

    /**
     * Sets I/O bandwidth limits.
     *
     * @param deviceMajorMinor Device ID (e.g., "8:0" for /dev/sda)
     * @param rbps Read bytes per second
     * @param wbps Write bytes per second
     */
    public void setIOLimit(String deviceMajorMinor, long rbps, long wbps) throws IOException {
        Path ioMaxPath = cgroupPath.resolve("io.max");
        String value = deviceMajorMinor + " rbps=" + rbps + " wbps=" + wbps;
        Files.writeString(ioMaxPath, value);
        System.out.println("✓ I/O limit set for " + deviceMajorMinor +
            ": read=" + formatBytes(rbps) + "/s, write=" + formatBytes(wbps) + "/s");
    }

    /**
     * Adds a process to this cgroup.
     *
     * @param pid Process ID to add (long, because ProcessHandle.pid() returns long)
     */
    public void addProcess(long pid) throws IOException {
        Path procsPath = cgroupPath.resolve("cgroup.procs");
        Files.writeString(procsPath, String.valueOf(pid));
        System.out.println("✓ Added PID " + pid + " to cgroup");
    }

    /**
     * Adds the current process to this cgroup.
     */
    public void addCurrentProcess() throws IOException {
        addProcess(ProcessHandle.current().pid());
    }

    /**
     * Gets current resource usage.
     */
    public ResourceUsage getUsage() throws IOException {
        long memoryBytes = 0;
        int pids = 0;
        long cpuUsage = 0;

        Path memoryCurrent = cgroupPath.resolve("memory.current");
        if (Files.exists(memoryCurrent)) {
            memoryBytes = Long.parseLong(Files.readString(memoryCurrent).trim());
        }

        Path pidsCurrent = cgroupPath.resolve("pids.current");
        if (Files.exists(pidsCurrent)) {
            pids = Integer.parseInt(Files.readString(pidsCurrent).trim());
        }

        // cpu.stat is a flat "key value" file; usage_usec is total CPU time in microseconds
        Path cpuStat = cgroupPath.resolve("cpu.stat");
        if (Files.exists(cpuStat)) {
            String stats = Files.readString(cpuStat);
            for (String line : stats.split("\n")) {
                if (line.startsWith("usage_usec")) {
                    cpuUsage = Long.parseLong(line.split(" ")[1].trim());
                }
            }
        }

        return new ResourceUsage(memoryBytes, pids, cpuUsage);
    }

    /**
     * Removes the cgroup (cleanup).
     */
    public void destroy() throws IOException {
        // First, kill all processes in the cgroup
        Path procsPath = cgroupPath.resolve("cgroup.procs");
        if (Files.exists(procsPath)) {
            String procs = Files.readString(procsPath);
            for (String pid : procs.split("\n")) {
                if (!pid.isEmpty()) {
                    ProcessHandle.of(Long.parseLong(pid))
                        .ifPresent(ProcessHandle::destroy);
                }
            }
        }

        // Then remove the directory. The kernel refuses to remove a cgroup
        // that still contains processes, so this can fail if the processes
        // signalled above have not exited yet.
        Files.deleteIfExists(cgroupPath);
        System.out.println("✓ Destroyed cgroup: " + cgroupPath);
    }

    private static String formatBytes(long bytes) {
        if (bytes < 1024) return bytes + " B";
        if (bytes < 1024 * 1024) return (bytes / 1024) + " KB";
        if (bytes < 1024 * 1024 * 1024) return (bytes / (1024 * 1024)) + " MB";
        return (bytes / (1024 * 1024 * 1024)) + " GB";
    }

    /**
     * Resource usage statistics.
     */
    public record ResourceUsage(long memoryBytes, int pids, long cpuUsageMicros) {
        @Override
        public String toString() {
            return String.format("Memory: %s, PIDs: %d, CPU: %.2fms",
                formatBytes(memoryBytes), pids, cpuUsageMicros / 1000.0);
        }
    }
}
```

***

## Part 2: Resource Limits Configuration

```java src/main/java/com/minidocker/cgroup/ResourceLimits.java theme={null}
package com.minidocker.cgroup;

/**
 * Container resource limits configuration.
 *
 * Similar to Docker's --cpus, --memory, --pids-limit flags.
 */
public class ResourceLimits {

    private int cpuPercent = 100;                  // 100 = 1 core
    private long memoryBytes = 512 * 1024 * 1024;  // 512MB default
    private int maxPids = 100;                     // Max processes

    public static ResourceLimits defaults() {
        return new ResourceLimits();
    }

    public ResourceLimits withCpu(double cpus) {
        this.cpuPercent = (int) (cpus * 100);
        return this;
    }

    public ResourceLimits withMemory(String memory) {
        this.memoryBytes = parseMemory(memory);
        return this;
    }

    public ResourceLimits withMemoryBytes(long bytes) {
        this.memoryBytes = bytes;
        return this;
    }

    public ResourceLimits withMaxPids(int pids) {
        this.maxPids = pids;
        return this;
    }

    public int getCpuPercent() { return cpuPercent; }
    public long getMemoryBytes() { return memoryBytes; }
    public int getMaxPids() { return maxPids; }

    // Parses values such as "256M" or "1G"; a bare number is treated as bytes
    private static long parseMemory(String memory) {
        memory = memory.trim().toUpperCase();
        long multiplier = 1;

        if (memory.endsWith("K")) {
            multiplier = 1024;
            memory = memory.substring(0, memory.length() - 1);
        } else if (memory.endsWith("M")) {
            multiplier = 1024 * 1024;
            memory = memory.substring(0, memory.length() - 1);
        } else if (memory.endsWith("G")) {
            multiplier = 1024 * 1024 * 1024;
            memory = memory.substring(0, memory.length() - 1);
        }

        return Long.parseLong(memory) * multiplier;
    }

    @Override
    public String toString() {
        return String.format("ResourceLimits{cpu=%.2f cores, memory=%dMB, pids=%d}",
            cpuPercent / 100.0, memoryBytes / (1024 * 1024), maxPids);
    }
}
```

***

## Part 3: Integrating with Container

```java src/main/java/com/minidocker/Container.java theme={null}
package com.minidocker;

import com.minidocker.cgroup.CgroupManager;
import com.minidocker.cgroup.ResourceLimits;
import com.minidocker.linux.LibC;
import com.minidocker.namespace.NamespaceManager;
import com.minidocker.namespace.NamespaceOptions;

import java.util.UUID;

/**
 * Container with namespace isolation AND resource limits.
 */
public class Container {

    private final LibC libc = LibC.INSTANCE;
    private final NamespaceManager namespaces = new NamespaceManager();

    private final String id;
    private final String hostname;
    private final String[] command;
    private final ResourceLimits limits;
    private CgroupManager cgroup;

    public Container(String hostname, String[] command, ResourceLimits limits) {
        this.id = UUID.randomUUID().toString().substring(0, 12);
        this.hostname = hostname;
        this.command = command;
        this.limits = limits;
    }

    public void run() throws Exception {
        System.out.println("=== Starting Container " + id + " ===");
        System.out.println("Hostname: " + hostname);
        System.out.println("Limits: " + limits);
        System.out.println();

        // Step 1: Create cgroup
        cgroup = new CgroupManager(id);
        cgroup.create();

        // Step 2: Set resource limits
        cgroup.setCpuLimit(limits.getCpuPercent());
        cgroup.setMemoryLimit(limits.getMemoryBytes());
        cgroup.setPidsLimit(limits.getMaxPids());

        // Step 3: Create namespaces
        NamespaceOptions nsOptions = NamespaceOptions.builder()
            .withPid()
            .withMount()
            .withUts()
            .withIpc()
            .build();
        namespaces.createNamespaces(nsOptions);
        namespaces.setHostname(hostname);

        // Step 4: Fork
        int pid = libc.fork();

        if (pid == 0) {
            // Child: add to cgroup and run.
            // getpid() asks the kernel directly; ProcessHandle.current() may
            // return a PID the JVM cached before the fork.
            cgroup.addProcess(libc.getpid());
            runContainerInit();
        } else if (pid > 0) {
            // Parent: wait and monitor
            monitorContainer(pid);
        } else {
            throw new RuntimeException("Fork failed");
        }
    }

    private void monitorContainer(int childPid) {
        // Start monitoring thread
        Thread monitor = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(1000);
                    CgroupManager.ResourceUsage usage = cgroup.getUsage();
                    System.out.println("[Monitor] " + usage);
                }
            } catch (Exception e) {
                // Container exited
            }
        });
        monitor.setDaemon(true);
        monitor.start();

        // Wait for child
        int[] status = new int[1];
        libc.waitpid(childPid, status, 0);
        System.out.println("Container exited with status: " + status[0]);

        // Cleanup
        try {
            cgroup.destroy();
        } catch (Exception e) {
            System.err.println("Failed to cleanup cgroup: " + e.getMessage());
        }
    }

    private void runContainerInit() {
        try {
            System.out.println("\n=== Container Init (PID " + libc.getpid() + ") ===");

            if (command.length > 0) {
                // execv expects a NULL-terminated argv array
                String[] argv = new String[command.length + 1];
                System.arraycopy(command, 0, argv, 0, command.length);
                argv[command.length] = null;

                // execv only returns if it failed
                libc.execv(command[0], argv);
                System.err.println("Failed to execute: " + command[0]);
                System.exit(1);
            }
        } catch (Exception e) {
            System.err.println("Container init failed: " + e.getMessage());
            System.exit(1);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: java Container [options] <hostname> <command> [args...]");
            System.out.println("  --cpus=<n>       CPU cores (e.g., 0.5, 2)");
            System.out.println("  --memory=<size>  Memory limit (e.g., 256M, 1G)");
            System.out.println("  --pids=<n>       Max processes");
            System.exit(1);
        }

        ResourceLimits limits = ResourceLimits.defaults();
        String hostname = null;

        // Parse arguments: options, then the hostname, then the command
        int cmdStart = 0;
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("--cpus=")) {
                limits.withCpu(Double.parseDouble(args[i].substring(7)));
            } else if (args[i].startsWith("--memory=")) {
                limits.withMemory(args[i].substring(9));
            } else if (args[i].startsWith("--pids=")) {
                limits.withMaxPids(Integer.parseInt(args[i].substring(7)));
            } else if (hostname == null) {
                hostname = args[i];
            } else {
                cmdStart = i;
                break;
            }
        }

        // Without this guard a missing command would surface later as a NullPointerException
        if (hostname == null || cmdStart == 0) {
            System.err.println("Error: both a hostname and a command are required");
            System.exit(1);
        }
        String[] command = new String[args.length - cmdStart];
        System.arraycopy(args, cmdStart, command, 0, command.length);

        try {
            Container container = new Container(hostname, command, limits);
            container.run();
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
            System.exit(1);
        }
    }
}
```

***

## Part 4: Testing Resource Limits

### Testing CPU Limits

```bash theme={null}
# Run a CPU-intensive workload with 50% CPU limit
sudo java Container --cpus=0.5 testhost /bin/sh -c "while true; do :; done"

# In another terminal, observe CPU usage (should be ~50% of one core)
# -d, joins multiple matching PIDs with commas, the list format top expects
top -p "$(pgrep -d, -f 'while true')"
```

### Testing Memory Limits

```java theme={null}
// MemoryHog.java - For testing memory limits
import java.util.ArrayList;
import java.util.List;

public class MemoryHog {
    public static void main(String[] args) throws Exception {
        List<byte[]> allocations = new ArrayList<>();

        // The container will be OOM-killed when it hits the memory limit.
        // Note: the JVM may throw OutOfMemoryError first if its maximum heap
        // (-Xmx) is smaller than the cgroup limit.
        while (true) {
            // Allocate 10MB chunks
            byte[] chunk = new byte[10 * 1024 * 1024];
            allocations.add(chunk);
            System.out.println("Allocated: " + (allocations.size() * 10) + "MB");
            Thread.sleep(100);
        }
    }
}
```

### Testing PID Limits

```bash theme={null}
# Fork bomb protection!
# Without pids limit, this would crash the system.
# Run it ONLY inside the container, never on the host:
:(){ :|:& };:

# With pids limit set to 100, the fork bomb is contained
```

***

## Understanding Cgroup Controllers

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             CGROUP CONTROLLERS                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   CONTROLLER    FILES               DESCRIPTION                             │
│   ──────────    ─────               ───────────                             │
│                                                                             │
│   cpu           cpu.max             Bandwidth limit (quota/period)          │
│                 cpu.weight          Relative CPU shares (1-10000)           │
│                 cpu.stat            Usage statistics                        │
│                                                                             │
│   memory        memory.max          Hard limit (OOM kill if exceeded)       │
│                 memory.high         Soft limit (throttling starts)          │
│                 memory.current      Current usage                           │
│                 memory.swap.max     Swap limit                              │
│                 memory.stat         Detailed statistics                     │
│                                                                             │
│   pids          pids.max            Maximum processes                       │
│                 pids.current        Current process count                   │
│                                                                             │
│   io            io.max              Bandwidth/IOPS limits per device        │
│                 io.weight           Relative I/O priority                   │
│                 io.stat             I/O statistics                          │
│                                                                             │
│   cpuset        cpuset.cpus         Allowed CPU cores (e.g., "0-3")         │
│                 cpuset.mems         Allowed NUMA nodes                      │
│                                                                             │
│   hugetlb       hugetlb.X.max       Huge page limits                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Exercises

Add disk I/O limits:

```java theme={null}
// 1. Find the device major:minor for the disk
//    cat /sys/block/sda/dev

// 2. Set io.max
//    "8:0 rbps=1048576 wbps=1048576" = 1MB/s read/write

// 3. Test with dd:
//    dd if=/dev/zero of=/tmp/test bs=1M count=100
```

Pin container to specific CPU cores:

```java theme={null}
// Use cpuset controller
// Write to cpuset.cpus: "0,2" (only use cores 0 and 2)
// Write to cpuset.mems: "0" (NUMA node 0)

// Giving each container its own cores reduces cache contention between them
// Great for latency-sensitive workloads
```

Implement real-time resource monitoring:

```java theme={null}
// 1. Read cpu.stat periodically for CPU usage
// 2. Read memory.current and memory.stat for memory
// 3. Read io.stat for disk I/O
// 4. Calculate deltas to get rates
// 5. Display like 'docker stats'
```

***

## Key Takeaways

* Cgroups form a tree - children cannot exceed their parent's limits
* Different controllers for CPU, memory, I/O, PIDs
* Limits are enforced by kernel, not by honor system
* Exceeding memory limit triggers OOM killer

***

## What's Next?

In [Chapter 3: Filesystem](/courses/build-your-own-x/docker-3-filesystem), we'll implement:

* Overlay filesystems
* Copy-on-write layers
* Container root filesystem setup

***

## Interview Deep-Dive

**Strong Answer:**

* Cgroups v1 had a per-controller hierarchy: each controller (cpu, memory, pids) had its own independent tree, and a process could be in different groups for different controllers. This created incoherent resource accounting -- a process could be in cgroup A for CPU but cgroup B for memory, making it impossible to get a unified view of resource consumption.
* Cgroups v2 uses a single unified hierarchy. A process belongs to exactly one cgroup, and all controllers apply to that cgroup. This makes resource accounting consistent and simplifies management tooling. It also enables cross-controller coordination -- for example, the memory controller can influence the I/O controller's behavior for a process, which was impossible in v1.
* V2 also introduced the "no internal processes" rule: a non-root cgroup that enables controllers for its child cgroups cannot itself contain processes.
  This eliminates ambiguity about which level of the hierarchy should be charged for resource usage.
* The migration took years because systemd, Docker, and Kubernetes all had deep dependencies on v1 semantics. The practical interview relevance is that debugging resource limits in production requires knowing which cgroup version the host is running, because the filesystem paths and file formats differ significantly.

**Follow-up: How do you check which cgroup version a system is running?**

Check with `stat -fc %T /sys/fs/cgroup/` -- if it returns `cgroup2fs`, the system uses v2; `tmpfs` indicates v1. In v1, you find CPU limits at `/sys/fs/cgroup/cpu/docker//cpu.cfs_quota_us`. In v2 (with the systemd cgroup driver), it is `/sys/fs/cgroup/system.slice/docker-.scope/cpu.max`. The file format also differs: v1 uses separate files for quota and period; v2 combines them into one file (`cpu.max`).

**Strong Answer:**

* When a cgroup hits its `memory.max` limit and the kernel cannot reclaim enough memory, the OOM killer activates for that cgroup specifically (it does not kill processes outside the cgroup). It selects the victim by calculating an `oom_score` for each process based on RSS, swap usage, and the `oom_score_adj` value.
* For production tuning, set `memory.high` to a value below `memory.max` (e.g., high at 900MB, max at 1GB). When usage crosses `memory.high`, the kernel throttles memory allocation and reclaims aggressively instead of killing, giving the application a chance to shed load or run garbage collection. This turns a hard crash into a soft slowdown.
* Also configure `oom_score_adj` to steer which process dies if OOM is unavoidable. Setting the main process to `-500` and helper sidecars to `0` makes the sidecars the preferred victims. And critically, monitor `memory.events` to alert on `oom_kill` events before they cascade.
* A subtle point: application-level memory metrics (like JVM heap usage) only show user-space allocations. The kernel counts *all* memory charged to the cgroup, including page cache, tmpfs mounts, and kernel stack pages.
  A container writing heavily to tmpfs can trigger OOM even though the application reports low heap usage.

**Follow-up: What is the difference between a cgroup OOM and a host-level OOM? Which is worse operationally?**

A cgroup OOM is scoped and controlled -- only processes in that cgroup die. A host-level OOM means the entire machine's memory is exhausted, and the kernel's global OOM killer activates, often killing a process you would not have chosen. Host OOM is operationally catastrophic because it cascades across unrelated services. This is precisely why cgroup memory limits exist -- they convert a global disaster into a local, contained failure.

**Strong Answer:**

* CFS bandwidth control gives each cgroup a quota of CPU time per period (default 100ms). If `cpu.max` is `50000 100000`, the container gets 50ms of CPU time per 100ms period.
* The latency spike occurs when the container uses its entire 50ms quota in a burst at the start of the period. For the rest of the period, its threads are suspended even if the CPU is idle. A request arriving during the throttled window waits for the next period to begin: up to 50ms here, and closer to the full 100ms when several threads burn the quota in parallel.
* This is measurable via `cpu.stat` -- look for `nr_throttled` and `throttled_usec`. A container with low average CPU usage can still show thousands of throttle events if usage is bursty.
* Mitigation strategies: increase the CPU limit for burst headroom, reduce the CFS period (smaller periods like 10ms distribute throttling more evenly but increase scheduling overhead), or pin the container to dedicated cores with `cpuset.cpus` and leave `cpu.max` unlimited, so the bandwidth controller never throttles it.

**Follow-up: Why do some Kubernetes operators recommend removing CPU limits entirely?**

The argument is that CFS throttling causes more damage to latency-sensitive services than noisy neighbors do. If the cluster has accurate CPU requests and nodes are not overprovisioned, the scheduler places pods so total requested CPU does not exceed node capacity.
Without limits, a pod can burst above its request when CPU is available. The counterargument is that without limits, a misbehaving pod can starve neighbors during peak load. The right answer depends on workload characteristics: for steady-state services with predictable CPU profiles, removing limits works well; for batch jobs or untrusted workloads, limits are necessary.
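
The throttling signal discussed in this section (`nr_throttled` and `throttled_usec` in `cpu.stat`) is easy to turn into a single number worth alerting on. The following is a minimal sketch, not part of the chapter's code: the class name `ThrottleStats` and its two methods are hypothetical helpers, and the sample values in `main` are made up for illustration. It assumes the flat `key value` layout of a cgroup v2 `cpu.stat` file, with the `nr_periods` field present (it appears when a CPU limit is in effect).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: summarizes CFS throttling from the text of a cgroup v2 cpu.stat file. */
public class ThrottleStats {

    /** Parses flat "key value" lines into a map; malformed lines are skipped. */
    public static Map<String, Long> parse(String cpuStat) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : cpuStat.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                stats.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return stats;
    }

    /** Fraction of enforcement periods in which the cgroup was throttled (0.0 if no periods elapsed). */
    public static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) {
        // Made-up sample; on a real host, read the text of the container's cpu.stat file instead.
        String sample = String.join("\n",
            "usage_usec 1500000",
            "user_usec 1200000",
            "system_usec 300000",
            "nr_periods 1000",
            "nr_throttled 250",
            "throttled_usec 4000000");

        Map<String, Long> stats = parse(sample);
        long throttledMs = stats.getOrDefault("throttled_usec", 0L) / 1000;
        System.out.println(String.format(Locale.ROOT,
            "throttled in %.1f%% of periods, %d ms total",
            throttledRatio(stats) * 100, throttledMs));
    }
}
```

A ratio computed this way over successive samples (taking deltas between reads) shows whether a service with low average CPU usage is still being throttled in bursts, which is the situation the limits-versus-no-limits debate is about.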