Learn from real-world examples of OS concepts applied in production systems. These case studies demonstrate how theory meets practice.
- **Purpose**: Connect theory to real systems
- **Target**: Senior engineers preparing for system design
- **Approach**: Analysis of actual production incidents and design decisions
```
┌─────────────────────────────────────────────────────────────────┐
│                        PATHFINDER TASKS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  High Priority: bc_dist                                         │
│    - Bus distribution task                                      │
│    - Must run frequently                                        │
│    - Uses shared bus via mutex                                  │
│                                                                 │
│  Medium Priority: Various tasks                                 │
│    - Image processing                                           │
│    - Data logging                                               │
│                                                                 │
│  Low Priority: Meteorological data collection                   │
│    - Holds the bus mutex for a long time                        │
│    - Reads sensors                                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Timeline of bug:
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Time   Action                                                  │
│  ─────  ──────────────────────────────────────────────────────  │
│  T+0    Low priority (L) acquires bus mutex                     │
│  T+1    High priority (H) wakes up, needs mutex, BLOCKS         │
│  T+2    Medium priority (M) wakes up, preempts L                │
│  T+3    M runs... and runs... and runs...                       │
│  T+4    H is still waiting (for L, which can't run)             │
│  T+5    Watchdog timer fires → SYSTEM RESET!                    │
│                                                                 │
│  Problem: H is waiting for L, but M (lower than H) runs         │
│           This is PRIORITY INVERSION                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
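```c
// VxWorks RTOS (used on Pathfinder)
#include <vxWorks.h>
#include <semLib.h>

// The fix was a configuration flag that was OFF by default!
// Enable priority inheritance on the mutex (function name is illustrative):
SEM_ID createBusMutex(void)
{
    return semMCreate(SEM_Q_PRIORITY | SEM_INVERSION_SAFE);
    //                                 ^^^^^^^^^^^^^^^^^^ This was missing!
    // (SEM_INVERSION_SAFE requires the SEM_Q_PRIORITY queuing option.)
}
```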
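```lua
-- Before: no protection against catastrophic backtracking
local match, err = ngx.re.match(input, pattern)

-- After: cap PCRE backtracking. ngx.re.match has no limit argument (its 5th
-- parameter is res_table); set it in the http {} block of nginx.conf instead:
--   lua_regex_match_limit 1000;   # PCRE match_limit: max backtracking
local match, err = ngx.re.match(input, pattern, "jo")
if err then
    -- hitting the limit makes the match fail with an error instead of spinning
    ngx.log(ngx.ERR, "regex match aborted: ", err)
end
```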
Scenario: Production database server running out of memory.
```bash
# dmesg output:
[10854.231] Out of memory: Killed process 8234 (postgres) total-vm:7234512kB, anon-rss:6891234kB, file-rss:1234kB

# What happened:
# 1. A runaway query consumed excessive memory
# 2. The system could no longer satisfy allocations for other processes
# 3. The OOM killer chose postgres (highest oom_score, which roughly tracks memory usage)
# 4. The database was terminated, causing a service outage
```
```
┌─────────────────────────────────────────────────────────────────┐
│                    PROPER MEMORY MANAGEMENT                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Application-level limits                                    │
│     • PostgreSQL: shared_buffers, work_mem limits               │
│     • JVM: -Xmx heap limit                                      │
│     • Go: GOMEMLIMIT                                            │
│                                                                 │
│  2. Container/cgroup limits                                     │
│     • Kubernetes: resources.limits.memory                       │
│     • Docker: --memory flag                                     │
│                                                                 │
│  3. Systemd service limits                                      │
│     • MemoryMax=8G in unit file                                 │
│                                                                 │
│  4. Graceful degradation                                        │
│     • Reject new connections at 80%                             │
│     • Evict in-process caches at 90%                            │
│     • Circuit breaker at 95%                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Lesson

**Don't rely on the OOM killer**: it's a last resort. Instead:

- Set appropriate memory limits
- Monitor and alert
- Design for graceful degradation

---

## Case Study 5: End-to-End Web Request (Linux Server)

### Background

A typical web request on a Linux server exercises **almost every OS subsystem** covered in this course. Understanding the end-to-end path helps cement the relationships between chapters.

### Request Lifecycle

1. **Packet arrival**:
   - The NIC receives an Ethernet frame carrying an IP/TCP packet.
   - DMA transfers the frame into RAM; the NIC raises an interrupt.
2. **Interrupt handling and networking stack**:
   - The interrupt handler schedules NAPI polling; packets are pulled from the device's receive ring.
   - The kernel's network stack parses Ethernet/IP/TCP headers and validates checksums.
   - The payload is placed into the appropriate socket's receive queue.
3. **Scheduler and application wake-up**:
   - A worker thread blocked in `epoll_wait` / `io_uring_enter` / `read` is woken.
   - The **scheduler** chooses a CPU, considering runnable threads and affinities.
4. **System call and process context**:
   - The thread issues `read()` or similar; control transitions to the kernel.
   - Data is copied from kernel buffers into user-space memory.
5. **Application processing**:
   - User space parses HTTP, runs business logic, and may query a database.
   - This triggers further syscalls: `connect`, `send`, `read`, file I/O, etc.
6. **Response send**:
   - The application writes the HTTP response (possibly via `sendfile` or `writev`).
   - The kernel queues data into the socket's send buffer; TCP handles retransmissions and congestion control.
7. **Scheduling, I/O, and completion**:
   - The scheduler multiplexes the CPU among many connections.
   - The storage stack and file system serve static assets from the page cache, or from disk on a cache miss.

### OS Concepts Applied

- **Networking**: NIC, DMA rings, NAPI, TCP/IP stack, socket buffers.
- **Scheduling**: CFS/EEVDF deciding which request handler runs.
- **Virtual Memory**: Page cache for static assets; working set of application code and data.
- **File Systems & I/O**: Serving static content via page cache, `sendfile`, `io_uring`.
- **Synchronization**: Worker pools, connection queues, logging, shared caches.
- **Security**: Process isolation, capabilities, seccomp profiles for the web server.

### Lesson

Every "simple" web request is a tour through **CPU, memory, scheduler, I/O, networking, and security**. When debugging latency or throughput, trace the request along this path and map each symptom to the relevant OS chapter.

---

## Case Study 5: Docker Fork Bomb Prevention

### Problem

A container runs a fork bomb, potentially taking down the host:

```bash
# Classic fork bomb:
:(){ :|:& };:
# Each copy spawns two more, so the process count roughly
# doubles every generation (2^n after n generations)
# Can exhaust PIDs, file descriptors, memory