Operating Systems Interview Preparation

This guide covers the most frequently asked OS interview questions at top tech companies, organized by topic with detailed answers and common follow-ups.
Target: Senior/Staff Engineering Interviews
Companies: FAANG, startups, systems companies
Preparation Time: 20-30 hours across all topics

Interview Question Patterns

┌─────────────────────────────────────────────────────────────────┐
│                    OS INTERVIEW CATEGORIES                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. CONCEPTUAL (40%)                                            │
│     "Explain how virtual memory works"                          │
│     "What happens when you call malloc?"                        │
│                                                                  │
│  2. DESIGN (30%)                                                │
│     "Design a thread pool"                                      │
│     "Implement a simple scheduler"                              │
│                                                                  │
│  3. DEBUGGING (20%)                                             │
│     "This program deadlocks. Why?"                              │
│     "Why is this server slow?"                                  │
│                                                                  │
│  4. LINUX-SPECIFIC (10%)                                        │
│     "Explain cgroups"                                           │
│     "What is /proc?"                                            │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Learning Tracks and Progression

Use this section as a roadmap for practicing OS questions based on your goals.

Track 1: Generalist / Application Engineer

Focus on being dangerously good at fundamentals:
  • Read: Processes & Threads, Virtual Memory, Synchronization, Scheduling.
  • Practice:
    • Explain process vs thread vs coroutine in your own words.
    • Walk through malloc → page tables → page faults.
    • Debug simple deadlock and starvation examples.
  • Goal: Comfortably answer most “Top 20” OS questions in this file.

Track 2: Systems / Infra / Backend Engineer

You own services in production and need to debug OS-level issues:
  • Read: Everything in Track 1 plus File Systems, I/O Systems, Networking, Deadlocks, Linux Internals.
  • Practice:
    • Trace a slow web request through CPU, memory, and I/O using tools (strace, perf, iostat, ss).
    • Design a thread pool and explain scheduler interactions.
    • Reason about cgroups, namespaces, and container isolation.
  • Goal: Confidently debug “server is slow / high CPU / high IO wait” incidents.

Track 3: Kernel / Low-level Engineer

You want to work on kernels, drivers, or high-performance runtimes:
  • Read: All OS chapters, especially CPU Architectures, Kernel Memory, Linux Internals, Device Drivers, Storage Stack, Security, eBPF.
  • Practice:
    • Read and explain small sections of real kernel code (e.g., do_page_fault, tcp_recvmsg).
    • Implement simple kernel modules, experiment with scheduling and memory policies.
    • Design lock hierarchies and reason about RCU and lock-free algorithms.
  • Goal: Handle deep-dive interviews where you whiteboard OS internals and read code live.
Use these tracks to decide which sections to master first and which to treat as later extensions.

Top 20 OS Interview Questions

Process & Threading

Answer:
Aspect         | Process                | Thread
Memory         | Separate address space | Shared address space
Creation       | Expensive (fork)       | Cheap
Context switch | Expensive (TLB flush)  | Cheap
Communication  | IPC needed             | Shared memory
Crash impact   | Isolated               | Affects all threads
When to use each:
  • Processes: Isolation needed (security), different languages, crash isolation
  • Threads: Shared state, low latency communication, same codebase
Follow-up: “How does fork() work?”
  • Creates a copy of the parent’s address space (copy-on-write, so pages are duplicated only when one side writes)
  • In the child, fork() returns 0; the child gets its own new PID
  • In the parent, fork() returns the child’s PID (or -1 on failure)
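The two return values and the private-copy behavior can be seen in a few lines. This is a minimal sketch, not part of the original answer; the function name fork_cow_demo is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the parent's copy of `value` after the child has overwritten its own
// copy. Returns -1 if fork() or waitpid() fails.
int fork_cow_demo(void) {
    int value = 1;
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: fork() returned 0. This write only affects the child's
        // private (copy-on-write) view of memory.
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }
    // Parent: fork() returned the child's PID.
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return value;  // unchanged by the child's write
}
```

The parent still sees its original value because the two processes stopped sharing that memory logically at the moment of fork(), even though the kernel defers the physical copy.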
Answer (step-by-step):
  1. Shell parses command, finds executable in PATH
  2. fork(): Create child process
    • Copy page tables (COW)
    • Copy file descriptors
    • New PID, same code
  3. execve(): Replace child with new program
    • Load ELF headers
    • Set up new address space
    • Map code, data sections
    • Set up stack with args/env
    • Jump to _start (entry point)
  4. Dynamic linking: ld.so loads shared libraries
  5. main(): C runtime calls your main()
  6. exit(): Cleanup, return status to parent
  7. wait(): Parent reaps child, gets exit status
Key syscalls: fork, execve, wait4, exit_group
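Steps 2, 3, 6, and 7 above can be sketched as a tiny "shell core". The helper name run_command is hypothetical, and execvp() is used here as a convenience wrapper that performs the PATH search before calling execve().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs argv[0] with the NULL-terminated argument vector argv and returns its
// exit status, or -1 on failure or abnormal termination.
int run_command(char *const argv[]) {
    pid_t pid = fork();               // step 2: create the child
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);        // step 3: replace the child's image
        _exit(127);                   // only reached if exec failed
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {   // step 7: parent reaps the child
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with {"/bin/sh", "-c", "exit 7", NULL} should return 7, which is the status the child passed to exit().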
Answer:
What is saved:
  • CPU registers (general purpose, PC, SP)
  • Floating point/SIMD registers
  • Kernel stack pointer
  • Page table pointer (CR3 on x86)
Steps:
1. Save current task's registers to its task_struct
2. Save stack pointer
3. Select next task (scheduler)
4. Restore next task's stack pointer
5. Restore next task's registers
6. If different process: switch page tables (flushes the TLB, unless entries are tagged with PCID/ASID)
7. Return to user mode
Cost:
  • Thread switch: ~1-2 μs
  • Process switch: ~5-10 μs (TLB flush)
Why it matters: Too many context switches = poor performance
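One way to check whether a process is switching too often is to read its own counters with getrusage(). The sketch below is an illustration only; the function names are hypothetical, and the 1 ms sleep is just a convenient way to give up the CPU voluntarily.

```c
#define _DEFAULT_SOURCE
#include <sys/resource.h>
#include <time.h>

// Fills in the voluntary and involuntary context-switch counts for the
// calling process. Returns 0 on success, -1 on error.
int context_switches(long *voluntary, long *involuntary) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    *voluntary = ru.ru_nvcsw;     // blocked or yielded the CPU
    *involuntary = ru.ru_nivcsw;  // preempted by the scheduler
    return 0;
}

// Sleeps briefly and returns how many voluntary switches were recorded
// across the sleep, or -1 on error.
long switches_caused_by_sleep(void) {
    long v0, i0, v1, i1;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };  // 1 ms
    if (context_switches(&v0, &i0) != 0) {
        return -1;
    }
    nanosleep(&ts, NULL);
    if (context_switches(&v1, &i1) != 0) {
        return -1;
    }
    return v1 - v0;
}
```

The same counters are visible from the shell in /proc/<pid>/status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches.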

Memory Management

Answer:
Concept: Each process sees a private address space (the illusion of having all memory to itself)
How it works:
Virtual Address → MMU → Page Tables → Physical Address

┌────────────────┐     ┌─────────────────┐     ┌────────────────┐
│ Process A      │     │   Page Table A  │     │ Physical RAM   │
│ addr: 0x1000   │────►│ 0x1→phy 0x8000 │────►│ Frame 0x8000   │
└────────────────┘     └─────────────────┘     └────────────────┘

┌────────────────┐     ┌─────────────────┐     
│ Process B      │     │   Page Table B  │     Same physical
│ addr: 0x1000   │────►│ 0x1→phy 0x9000 │────►frame or different
└────────────────┘     └─────────────────┘     
Benefits:
  • Process isolation
  • Memory overcommit
  • Demand paging (don’t load until needed)
  • Shared libraries (one physical copy, many mappings)
  • Easy relocation
Components:
  • Page tables (software)
  • MMU (hardware)
  • TLB (cache for page table lookups)
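The translation in the diagram can be mimicked with a toy single-level table. This is a simplified sketch under stated assumptions (4 KiB pages, one flat table, hypothetical names toy_pte_t and toy_translate); real hardware walks multi-level tables and caches results in the TLB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          // 4 KiB pages
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

typedef struct {
    bool      present;     // is the page mapped?
    uintptr_t frame_base;  // physical address of the backing frame
} toy_pte_t;

// Translates vaddr using a flat table of n entries. Returns false when the
// page is not mapped, which is where real hardware would raise a page fault.
bool toy_translate(const toy_pte_t *table, size_t n,
                   uintptr_t vaddr, uintptr_t *paddr) {
    uintptr_t vpn = vaddr >> PAGE_SHIFT;         // virtual page number
    uintptr_t offset = vaddr & (PAGE_SIZE - 1);  // byte within the page
    if (vpn >= n || !table[vpn].present) {
        return false;
    }
    *paddr = table[vpn].frame_base + offset;
    return true;
}
```

With page 0x1 mapped to frame 0x8000 as in Process A above, virtual address 0x1234 translates to physical 0x8234: the page number is swapped and the offset is carried over unchanged.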
Answer:
Types:
  1. Minor fault: Page in memory, just needs mapping
  2. Major fault: Page on disk, I/O required
  3. Invalid: Segmentation fault
Flow:
1. CPU accesses address
2. MMU checks page table → entry invalid/not present
3. CPU raises page fault exception
4. Kernel handler runs:
   - Find VMA for address
   - Check permissions
   - If valid:
     - Minor: Update page table
     - Major: Read from disk, then update
   - If invalid: SIGSEGV
5. Return to faulting instruction (retry)
Common causes:
  • Demand paging (first access)
  • Copy-on-write
  • Stack growth
  • Swapped out page
  • Null pointer dereference (→ SIGSEGV)
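Demand paging can be observed from user space by counting minor faults around the first touch of freshly mapped memory. A minimal sketch, assuming Linux-style MAP_ANONYMOUS; the function names are hypothetical and the exact count depends on the kernel (for example, huge pages reduce it).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

// Minor-fault count for the calling process, or -1 on error.
static long minor_faults(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return ru.ru_minflt;
}

// Maps npages of anonymous memory, touches each page once, and returns how
// many minor faults were recorded while doing so (-1 on error).
long faults_from_first_touch(size_t npages) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0) {
        return -1;
    }
    size_t len = npages * (size_t)page;
    volatile unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return -1;
    }
    long before = minor_faults();
    for (size_t i = 0; i < npages; i++) {
        mem[i * (size_t)page] = 1;  // first write to each page faults it in
    }
    long after = minor_faults();
    munmap((void *)mem, len);
    if (before < 0 || after < 0) {
        return -1;
    }
    return after - before;
}
```

mmap() itself only creates the VMA; the physical frames are allocated inside the fault handler when each page is first written.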
Answer:
Layers:
malloc(100)           ← User request
     │
     ▼
User allocator        ← e.g., glibc ptmalloc, or jemalloc/tcmalloc
(maintains free list, caches)
     │
     ▼ brk()/mmap() if needed
Kernel
(VMA management, demand paging)
Small allocations (less than 128KB typically):
  • Come from pre-allocated heap
  • Free list management
  • May use sbrk() to extend heap
Large allocations (>128KB):
  • Use mmap() directly
  • Return to OS on free
Allocator strategies:
  • ptmalloc: Per-thread arenas, reduces contention
  • jemalloc: Better for multi-threaded, used by Firefox
  • tcmalloc: Thread-caching allocator from Google; Go’s runtime allocator was originally based on its design
Follow-up: “What happens when you free()?”
  • Small: Returns to free list (not to OS)
  • Large (mmap’d): munmap() returns to OS
  • May trigger coalescing of free blocks

Synchronization

Answer:
Four Coffman conditions (all must hold):
  1. Mutual exclusion: Resource can’t be shared
  2. Hold and wait: Holding one, waiting for another
  3. No preemption: Can’t forcibly take resource
  4. Circular wait: A→B→C→A waiting cycle
Prevention strategies:
Strategy      | Method                       | Tradeoff
Lock ordering | Always acquire in same order | Requires discipline
Lock timeout  | Give up after timeout        | May fail needlessly
Try-lock      | Non-blocking acquire         | Retry logic needed
Single lock   | One lock for all             | Poor concurrency
Lock-free     | Atomic operations only       | Complex to implement
Example fix:
// BAD: Thread 1 locks A then B, Thread 2 locks B then A

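// GOOD: Always lock in one global order (here: by address)
// Needs <stdint.h>; casting to uintptr_t avoids a relational comparison of unrelated pointers
void transfer(Account *from, Account *to) {
    if (from == to) return;  // same account: locking it twice would self-deadlock
    Account *first = ((uintptr_t)from < (uintptr_t)to) ? from : to;
    Account *second = (first == from) ? to : from;
    lock(first);
    lock(second);
    // transfer...
    unlock(second);
    unlock(first);
}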
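The "Try-lock" row of the table can be sketched with pthread_mutex_trylock(). This is one possible shape, not the only one; lock_both and unlock_both are hypothetical helper names, and the caller is assumed to pass two distinct mutexes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>

// Acquires both mutexes without ever blocking while holding one of them.
// Assumes a and b are different mutexes.
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    for (;;) {
        pthread_mutex_lock(a);
        if (pthread_mutex_trylock(b) == 0) {
            return;                 // got both
        }
        // Could not get b: release a so we never "hold and wait".
        pthread_mutex_unlock(a);
        sched_yield();              // let the other thread make progress
    }
}

void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(b);
    pthread_mutex_unlock(a);
}
```

Backing off breaks the hold-and-wait condition, so a deadlock cannot form, but two threads can in principle keep colliding (livelock). Lock ordering avoids that and is usually the simpler choice when a stable order exists.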
Answer:
Primitive | Purpose            | Count | Use Case
Mutex     | Mutual exclusion   | 0/1   | Protect critical section
Semaphore | Resource counting  | 0-N   | Limit concurrent access
Cond Var  | Wait for condition | N/A   | Producer-consumer
Mutex:
pthread_mutex_lock(&mutex);
// Critical section - only one thread
pthread_mutex_unlock(&mutex);
Semaphore:
sem_wait(&sem);  // Decrement, block if 0
// Access resource (up to N concurrent)
sem_post(&sem);  // Increment
Condition Variable:
pthread_mutex_lock(&mutex);
while (!condition) {
    pthread_cond_wait(&cond, &mutex);  // Releases mutex while waiting
}
// Condition is now true
pthread_mutex_unlock(&mutex);

// In another thread:
pthread_mutex_lock(&mutex);
condition = true;
pthread_cond_signal(&cond);  // Wake one waiter
pthread_mutex_unlock(&mutex);
Key insight: Condition variables are always used with a mutex and a predicate (while loop).
Answer:
Spinlock: Busy-wait for the lock instead of sleeping
// C11 <stdatomic.h>; lock is an atomic_flag initialized with ATOMIC_FLAG_INIT
while (atomic_flag_test_and_set(&lock)) {
    // Spin! CPU is busy waiting
}
// Have lock
// ... critical section ...
atomic_flag_clear(&lock);
When to use:
  • Very short critical sections (less than 1μs)
  • Interrupt handlers (can’t sleep)
  • Lock held time shorter than context switch time
  • Known low contention
When NOT to use:
  • Long critical sections
  • High contention
  • User space (usually)
  • Single CPU system
Comparison:
Aspect         | Spinlock           | Mutex
Waiting        | Busy-wait          | Sleep
CPU use        | 100% while waiting | 0% while waiting
Context switch | None               | Yes
Best for       | Very short holds   | Longer holds
Kernel use     | Very common        | Less common
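A slightly more careful variant is the test-and-test-and-set spinlock, sketched below with C11 atomics. The type and function names are hypothetical; production locks typically add a CPU pause hint and some form of backoff or queueing.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} spinlock_t;

#define SPINLOCK_INIT { false }

// Tries once. Returns true if the lock was acquired.
bool spin_trylock(spinlock_t *l) {
    // exchange returns the previous value: false means we took the lock
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void spin_lock(spinlock_t *l) {
    while (!spin_trylock(l)) {
        // Spin on a plain load so waiting cores read a shared cache line
        // instead of repeatedly writing to it.
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
        }
    }
}

void spin_unlock(spinlock_t *l) {
    // release ordering publishes the critical section's writes
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The acquire/release pair is what makes writes inside the critical section visible to the next lock holder; the inner relaxed loop is purely a cache-traffic optimization.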

Scheduling

Answer:
Algorithm   | Description             | Pros                  | Cons
FCFS        | First come first served | Simple                | Convoy effect
SJF         | Shortest job first      | Optimal avg wait      | Need to know times
Round Robin | Time slices             | Fair                  | Context switch overhead
Priority    | Based on priority       | Important tasks first | Starvation
Multilevel  | Multiple queues         | Flexible              | Complex
CFS         | Fair share of CPU       | Fair, no starvation   | Overhead
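Linux CFS (Completely Fair Scheduler; the default until EEVDF replaced it in Linux 6.6):
  • Each task tracks “virtual runtime” (vruntime)
  • Lower vruntime = hasn’t had fair share = run it next
  • Red-black tree ordered by vruntime: O(log n) insert/remove, and the cached leftmost node is the next task to run
  • Nice values map to weights that scale how fast vruntime advances, so they change CPU share rather than imposing strict priority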
Key metrics:
  • Throughput: Jobs completed per time
  • Turnaround: Submit to completion
  • Wait time: Time in ready queue
  • Response time: Submit to first run
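The vruntime idea fits in a few lines if the red-black tree is replaced by a linear scan. This is a toy model, not kernel code: the names are hypothetical, weights are assumed to be non-zero, and 1024 is used as the baseline weight (the weight Linux assigns to nice 0).

```c
#include <stdint.h>

#define NICE_0_WEIGHT 1024u

typedef struct {
    uint64_t vruntime_ns;  // weighted CPU time consumed so far
    uint32_t weight;       // larger weight = larger CPU share (must be > 0)
} toy_task_t;

// Picks the task with the smallest vruntime (ties go to the lowest index).
// A real scheduler keeps tasks in a tree instead of scanning.
int pick_next(const toy_task_t *tasks, int n) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (tasks[i].vruntime_ns < tasks[best].vruntime_ns) {
            best = i;
        }
    }
    return best;
}

// Charges ran_ns of real time to a task, scaled inversely by its weight.
void account(toy_task_t *t, uint64_t ran_ns) {
    t->vruntime_ns += ran_ns * NICE_0_WEIGHT / t->weight;
}

// Runs `ticks` fixed-length slices and records how often each task ran.
void simulate(toy_task_t *tasks, int n, int ticks, uint64_t slice_ns, int *runs) {
    for (int i = 0; i < n; i++) {
        runs[i] = 0;
    }
    for (int t = 0; t < ticks; t++) {
        int next = pick_next(tasks, n);
        account(&tasks[next], slice_ns);
        runs[next]++;
    }
}
```

With two tasks weighted 1024 and 2048, 300 slices split 100 to 200: the heavier task's vruntime advances half as fast, so it is picked twice as often.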
Answer:
Problem:
Low priority task (L) holds lock
High priority task (H) needs lock → blocks
Medium priority task (M) preempts L

Result: H waits for M, even though H > M priority!
Solutions:
  1. Priority Inheritance:
    • L temporarily gets H’s priority while holding lock
    • L runs, releases lock, H continues
    • Used in Linux (rt_mutex)
  2. Priority Ceiling:
    • Lock has ceiling = max priority of any user
    • Acquiring task gets ceiling priority
    • Prevents other tasks from preempting
Real example: Mars Pathfinder (1997)
  • Low-priority task held bus mutex
  • High-priority task blocked
  • Watchdog timer triggered reset
  • Fixed by enabling priority inheritance
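In user space, POSIX exposes priority inheritance as a mutex attribute. The sketch below shows how it can be requested; init_pi_mutex is a hypothetical helper name, and inheritance only has a visible effect when the threads run under a real-time scheduling policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

// Initializes *m as a priority-inheritance mutex.
// Returns 0 on success or an errno-style error code.
int init_pi_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    // While a higher-priority thread waits on the mutex, the holder is
    // boosted to that thread's priority.
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

PTHREAD_PRIO_PROTECT selects the priority-ceiling protocol instead, with the ceiling set through pthread_mutexattr_setprioceiling().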

File Systems & I/O

Answer:
System call path:
read(fd, buffer, size)
     │
     ▼
VFS (Virtual File System)
- Translates to inode operations
     │
     ▼
Page Cache (Check if cached)
─────► HIT: Copy to user buffer, return
─────► MISS: Continue...
     │
     ▼
Filesystem (ext4, xfs)
- Map file offset → block numbers
     │
     ▼
Block Layer
- I/O scheduling, merging
     │
     ▼
Device Driver
- Issue commands to disk
     │
     ▼
Hardware (SSD/HDD)
- DMA data to memory
     │
     ▼
Complete I/O
- Wake up process
- Copy to user buffer
Optimizations:
  • Page cache (avoid disk for repeated reads)
  • Read-ahead (prefetch next blocks)
  • I/O merging (combine adjacent requests)
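An application can nudge the read-ahead behavior with posix_fadvise(). The following is a minimal sketch (the function name read_sequential is hypothetical); the hint is advisory, so the kernel is free to ignore it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

// Reads the whole file at `path` and returns the number of bytes read,
// or -1 on error.
long read_sequential(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    // Tell the kernel we will read front to back so it can prefetch more
    // aggressively. Failure of the hint is not an error for the caller.
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        total += n;  // page-cache hits return without touching the disk
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

Running it twice on the same large file and timing both runs is a quick way to see the page cache at work: the second pass is usually served from memory.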
Answer:
Synchronous I/O:
// Process blocks until I/O completes
ssize_t n = read(fd, buffer, size);
// Control returns here after disk I/O
Asynchronous I/O:
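// Submit request, continue immediately (POSIX AIO, <aio.h>)
struct aiocb cb = {
    .aio_fildes = fd,
    .aio_buf    = buffer,
    .aio_nbytes = size,
    .aio_offset = 0,
};
aio_read(&cb);
// Do other work...
// Later: poll aio_error(&cb) until it is no longer EINPROGRESS, then call aio_return(&cb)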
Modern: io_uring (Linux 5.1+):
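// Submit multiple I/O requests (liburing; ring was set up with io_uring_queue_init())
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buffer, size, offset);
io_uring_submit(&ring);

// Check completions (batched)
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
// cqe->res holds the byte count, or a negative errno value
io_uring_cqe_seen(&ring, cqe);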
Comparison:
Aspect       | Sync      | Async (AIO) | io_uring
Blocking     | Yes       | No          | No
System calls | 1 per I/O | 2 per I/O   | Batched
Complexity   | Simple    | Complex     | Moderate
Performance  | OK        | Better      | Best

Linux Specific

Answer:
Aspect       | select             | poll           | epoll
Max FDs      | 1024 (FD_SETSIZE)  | Unlimited      | Unlimited
Passing FDs  | Copy each call     | Copy each call | Register once
Checking     | O(n) scan          | O(n) scan      | O(ready FDs) via ready list
Edge trigger | No                 | No             | Yes
select:
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(fd, &readfds);
select(fd+1, &readfds, NULL, NULL, NULL);
if (FD_ISSET(fd, &readfds)) { ... }
poll:
struct pollfd fds[1];
fds[0].fd = fd;
fds[0].events = POLLIN;
poll(fds, 1, -1);
if (fds[0].revents & POLLIN) { ... }
epoll:
int epfd = epoll_create1(0);
struct epoll_event ev = {.events = EPOLLIN, .data.fd = fd};
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

struct epoll_event events[10];
int n = epoll_wait(epfd, events, 10, -1);
// Only returns ready FDs!
Use epoll for:
  • High-performance servers
  • Many connections
  • Production systems
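The epoll calls above can be exercised end to end with a pipe. This sketch is Linux-only, and wait_readable is a hypothetical helper; it creates and destroys an epoll instance per call for brevity, whereas a real server creates one instance and reuses it.

```c
#include <sys/epoll.h>
#include <unistd.h>

// Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
// and -1 on error.
int wait_readable(int fd, int timeout_ms) {
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        return -1;
    }
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        struct epoll_event ready;
        int n = epoll_wait(epfd, &ready, 1, timeout_ms);
        if (n >= 0) {
            result = (n == 1 && (ready.events & EPOLLIN)) ? 1 : 0;
        }
    }
    close(epfd);
    return result;
}
```

An empty pipe times out; after one byte is written to the write end, the read end is reported ready.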
Answer:
Namespaces (isolation):
Namespace | Isolates
PID       | Process IDs (PID 1 inside container)
NET       | Network stack, interfaces
MNT       | Filesystem mounts
UTS       | Hostname
IPC       | Shared memory, semaphores
USER      | UID/GID mapping
CGROUP    | Cgroup root
// Create new namespace
clone(..., CLONE_NEWPID | CLONE_NEWNET, ...);

// Or join existing
setns(fd, CLONE_NEWPID);
Cgroups (limits):
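Controller             | Limits
memory                 | RAM usage
cpu                    | CPU time
blkio (v1) / io (v2)   | Disk I/O
pids                   | Number of processes
# Create cgroup (cgroup v2 unified hierarchy; memory.max is a v2 file)
mkdir /sys/fs/cgroup/mycontainer
echo 100M > /sys/fs/cgroup/mycontainer/memory.max
echo $$ > /sys/fs/cgroup/mycontainer/cgroup.procs
# On cgroup v1 the equivalent is /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes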
Docker = namespaces + cgroups + union filesystem + OCI runtime
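A process can check which namespaces it belongs to by reading the symbolic links under /proc/self/ns. The sketch below assumes a Linux system with /proc mounted; namespace_id is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

// Writes the namespace identifier for `type` ("pid", "net", "mnt", ...)
// into buf, in the form "pid:[<inode>]". Returns its length, or -1 on error.
long namespace_id(const char *type, char *buf, size_t len) {
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/ns/%s", type);
    if (n < 0 || (size_t)n >= sizeof path || len == 0) {
        return -1;
    }
    ssize_t got = readlink(path, buf, len - 1);  // readlink does not add '\0'
    if (got < 0) {
        return -1;
    }
    buf[got] = '\0';
    return (long)got;
}
```

Two processes are in the same namespace exactly when these identifiers match, which is a handy way to confirm that a containerized process really has its own PID or network namespace.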

Common Design Questions

Requirements:
  • Fixed number of worker threads
  • Task queue
  • Graceful shutdown
Implementation:
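// A unit of work: function pointer plus its argument
typedef struct {
    void (*func)(void *);
    void *arg;
} task_t;

typedef struct {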
    pthread_t *threads;
    int num_threads;
    
    // Task queue
    task_t *queue;
    int head, tail;
    int queue_size;
    
    // Synchronization
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
    
    bool shutdown;
} thread_pool_t;

void *worker(void *arg) {
    thread_pool_t *pool = arg;
    
    while (1) {
        pthread_mutex_lock(&pool->lock);
        
        while (pool->head == pool->tail && !pool->shutdown) {
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        }
        
        // Graceful shutdown: exit only once the queue has been drained
        if (pool->shutdown && pool->head == pool->tail) {
            pthread_mutex_unlock(&pool->lock);
            break;
        }
        
        task_t task = pool->queue[pool->head];
        pool->head = (pool->head + 1) % pool->queue_size;
        
        pthread_cond_signal(&pool->not_full);
        pthread_mutex_unlock(&pool->lock);
        
        // Execute task
        task.func(task.arg);
    }
    return NULL;
}

void submit(thread_pool_t *pool, void (*func)(void*), void *arg) {
    pthread_mutex_lock(&pool->lock);
    
    while ((pool->tail + 1) % pool->queue_size == pool->head) {
        pthread_cond_wait(&pool->not_full, &pool->lock);
    }
    
    pool->queue[pool->tail] = (task_t){func, arg};
    pool->tail = (pool->tail + 1) % pool->queue_size;
    
    pthread_cond_signal(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
}
Follow-ups:
  • How to handle task priorities? Use priority queue
  • How to handle task cancellation? Cancellation tokens
  • How to tune thread count? CPU cores, task type (I/O vs CPU)
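For the thread-count follow-up, a common starting heuristic is cores × (1 + wait time / compute time). The sketch below encodes it; the function names are hypothetical, the timings are assumed to come from profiling, and the result is a first guess to be validated under load rather than a final answer.

```c
#include <unistd.h>

// Number of online CPU cores (at least 1).
long online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

// Suggested pool size for tasks that spend wait_ms blocked (I/O, locks)
// for every compute_ms on the CPU. Returns 1 for invalid input.
long suggested_threads(long cores, double wait_ms, double compute_ms) {
    if (cores < 1 || compute_ms <= 0.0 || wait_ms < 0.0) {
        return 1;
    }
    double n = (double)cores * (1.0 + wait_ms / compute_ms);
    return (long)(n + 0.5);  // round to nearest
}
```

Pure CPU work (no waiting) on 8 cores gives 8 threads; tasks that wait 90 ms for every 10 ms of compute give 80.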
Simple bump allocator:
typedef struct {
    void *base;
    size_t size;
    size_t offset;
} arena_t;

void *arena_alloc(arena_t *arena, size_t size) {
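    // Round up to a multiple of 8 bytes; reject sizes where the rounding would wrap
    if (size > (size_t)-1 - 7) {
        return NULL;
    }
    size = (size + 7) & ~(size_t)7;
    
    // Compare against the remaining space so the sum cannot overflow
    if (size > arena->size - arena->offset) {
        return NULL;
    }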
    
    void *ptr = (char*)arena->base + arena->offset;
    arena->offset += size;
    return ptr;
}

void arena_reset(arena_t *arena) {
    arena->offset = 0;
}
Free list allocator:
typedef struct block {
    size_t size;
    struct block *next;
} block_t;

block_t *free_list = NULL;

void *my_malloc(size_t size) {
    block_t **prev = &free_list;
    block_t *curr = free_list;
    
    // First fit
    while (curr) {
        if (curr->size >= size) {
            *prev = curr->next;
            return (char*)curr + sizeof(block_t);
        }
        prev = &curr->next;
        curr = curr->next;
    }
    
    // No fit: grow the heap (sbrk returns (void *)-1 on failure)
    block_t *new = sbrk(size + sizeof(block_t));
    if (new == (void *)-1) {
        return NULL;
    }
    new->size = size;
    return (char*)new + sizeof(block_t);
}

void my_free(void *ptr) {
    if (!ptr) return;  // like free(NULL), a no-op
    block_t *block = (block_t*)((char*)ptr - sizeof(block_t));
    block->next = free_list;
    free_list = block;
}
Real allocators add:
  • Coalescing free blocks
  • Size classes (slab-like)
  • Per-thread caches
  • Best fit or other strategies
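Size classes are the simplest of these additions to illustrate. The sketch below rounds requests up to powers of two with a 16-byte minimum; both choices are assumptions made for illustration, and real allocators use finer-grained class tables.

```c
#include <stddef.h>

#define MIN_CLASS 16u

// Returns the smallest power-of-two size class (at least MIN_CLASS) that can
// hold n bytes, or 0 if n is too large to round up.
size_t size_class(size_t n) {
    size_t c = MIN_CLASS;
    while (c < n) {
        if (c > (size_t)-1 / 2) {
            return 0;  // doubling would overflow
        }
        c *= 2;
    }
    return c;
}
```

Keeping one free list per class makes allocation a constant-time pop and avoids the first-fit scan, at the cost of internal fragmentation (a 65-byte request occupies a 128-byte block).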
Token bucket algorithm:
typedef struct {
    double tokens;
    double max_tokens;
    double refill_rate;  // tokens per second
    struct timespec last_refill;
    pthread_mutex_t lock;
} rate_limiter_t;

bool acquire(rate_limiter_t *rl, double tokens) {
    pthread_mutex_lock(&rl->lock);
    
    // Refill tokens based on elapsed time
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    
    double elapsed = (now.tv_sec - rl->last_refill.tv_sec) +
                    (now.tv_nsec - rl->last_refill.tv_nsec) / 1e9;
    
    rl->tokens += elapsed * rl->refill_rate;
    if (rl->tokens > rl->max_tokens) {
        rl->tokens = rl->max_tokens;
    }
    rl->last_refill = now;
    
    // Try to acquire
    if (rl->tokens >= tokens) {
        rl->tokens -= tokens;
        pthread_mutex_unlock(&rl->lock);
        return true;
    }
    
    pthread_mutex_unlock(&rl->lock);
    return false;
}
Sliding window:
// Track requests in time window
// More accurate for bursts but more memory
typedef struct {
    int *requests;      // Circular buffer
    int window_size;    // Seconds
    int max_requests;
    struct timespec window_start;
} sliding_window_t;
Distributed rate limiting:
  • Use Redis for shared state
  • Approximate algorithms (e.g., sliding window counter, which estimates the rate from two fixed-window counts instead of logging every request)
  • Accept some inconsistency for performance
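The sliding window counter approximation can be sketched in a few lines. It keeps only two counters and weights the previous window by how much of it still overlaps the sliding window. The names below are hypothetical, the clock is passed in as a parameter to keep the example deterministic, and time is assumed never to go backwards.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t window_ms;      // window length
    int     limit;          // max requests per window
    int64_t curr_start_ms;  // start of the current fixed window
    int     curr_count;     // requests accepted in the current window
    int     prev_count;     // requests accepted in the previous window
} window_counter_t;

// Returns true and records the request if it fits under the limit.
bool window_allow(window_counter_t *w, int64_t now_ms) {
    int64_t elapsed = now_ms - w->curr_start_ms;
    if (elapsed >= 2 * w->window_ms) {
        // Idle for over a full window: both counters are stale.
        w->prev_count = 0;
        w->curr_count = 0;
        w->curr_start_ms = now_ms - (elapsed % w->window_ms);
    } else if (elapsed >= w->window_ms) {
        // Roll forward by one window.
        w->prev_count = w->curr_count;
        w->curr_count = 0;
        w->curr_start_ms += w->window_ms;
    }
    elapsed = now_ms - w->curr_start_ms;
    // Fraction of the previous window still inside the sliding window.
    double prev_weight = 1.0 - (double)elapsed / (double)w->window_ms;
    double estimate = w->prev_count * prev_weight + w->curr_count;
    if (estimate >= (double)w->limit) {
        return false;
    }
    w->curr_count++;
    return true;
}
```

With a limit of 2 per second, two requests at t=0 and t=100 ms pass, a third at 200 ms is rejected, one at exactly 1000 ms is still rejected (the previous window counts in full), and one at 1500 ms passes because only half of the previous window still counts.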

Debugging Scenarios

Approach:
  1. Attach debugger / get stack traces:
    $ gdb -p <pid>
    (gdb) thread apply all bt
    
    # Or without GDB:
    $ pstack <pid>
    
  2. Check for deadlock:
    # Look for mutex wait in stack traces
    # Multiple threads waiting on locks = deadlock candidate
    
    # Check lock order: Does thread 1 hold A, wait for B?
    #                   Does thread 2 hold B, wait for A?
    
  3. Check for infinite loop:
    $ top -H -p <pid>
    # Look for thread at 100% CPU
    
    $ perf top -p <pid>
    # What function is hot?
    
  4. Check for I/O block:
    $ cat /proc/<pid>/stack
    # Look for I/O wait syscalls
    
    $ strace -p <pid>
    # What syscall is it stuck on?
    
  5. Check for network:
    $ ss -tunapee | grep <pid>
    # Stuck connections?
    
    $ netstat -an | grep CLOSE_WAIT
    # Leaked sockets?
    
Common causes:
  • Deadlock (lock order violation)
  • Network timeout too high
  • Database connection exhausted
  • Full disk / slow I/O
  • Signal handling issue
Systematic approach:
  1. CPU check:
    $ top
    $ mpstat -P ALL 1
    
    # High CPU? Which process?
    # Low CPU but slow? Waiting on I/O?
    
  2. Memory check:
    $ free -h
    $ cat /proc/meminfo
    $ vmstat 1
    
    # Swapping? OOM pressure?
    
  3. Disk I/O check:
    $ iostat -x 1
    $ iotop
    
    # Disk at 100% utilization?
    # High await times?
    
  4. Network check:
    $ ss -s
    $ netstat -i
    $ sar -n DEV 1
    
    # High retransmits? Connection errors?
    
  5. Application profiling:
    $ perf record -g -p <pid> sleep 30
    $ perf report
    
    # Where is time spent?
    
  6. System calls:
    $ strace -c -p <pid>
    
    # Which syscalls are slow?
    
Common bottlenecks:
  • Lock contention
  • Database queries
  • Network latency
  • GC pauses (Java, Go)
  • Disk I/O (logging, temp files)
  • Connection exhaustion

Quick Reference Cheat Sheet

Key Numbers to Know

Metric             | Approximate Value
L1 cache access    | 1 ns
L2 cache access    | 4 ns
L3 cache access    | 12 ns
RAM access         | 100 ns
SSD read           | 100 μs
HDD seek           | 10 ms
Context switch     | 1-10 μs
Page fault (minor) | 5-10 μs
Page fault (major) | 1-10 ms
System call        | 100-200 ns
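These figures vary by hardware and kernel, so it is worth measuring at least one yourself. The sketch below times a trivial system call on Linux; avg_syscall_ns is a hypothetical name, and syscall(SYS_getpid) is used so that the call is not served from a user-space cache.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Returns the average wall-clock cost in nanoseconds of one getpid system
// call over `iterations` calls, or -1.0 on error.
double avg_syscall_ns(long iterations) {
    if (iterations <= 0) {
        return -1.0;
    }
    struct timespec start, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        return -1.0;
    }
    for (long i = 0; i < iterations; i++) {
        syscall(SYS_getpid);  // enters the kernel on every iteration
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1.0;
    }
    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                      (double)(end.tv_nsec - start.tv_nsec);
    return total_ns / (double)iterations;
}
```

The result includes loop overhead and depends on mitigations such as KPTI, so treat it as an order-of-magnitude check against the table rather than a benchmark.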

System Call Quick Reference

// Process
fork()      // Create child process
execve()    // Replace with new program
wait()      // Wait for child
exit()      // Terminate process

// Files
open()      // Open file
read()      // Read from fd
write()     // Write to fd
close()     // Close fd
stat()      // Get file metadata

// Memory
mmap()      // Map memory/files
munmap()    // Unmap
brk()       // Extend heap

// Network
socket()    // Create socket
bind()      // Bind to address
listen()    // Listen for connections
accept()    // Accept connection
connect()   // Connect to remote

Common File Paths

/proc/meminfo      # Memory statistics
/proc/cpuinfo      # CPU information
/proc/<pid>/status # Process status
/proc/<pid>/maps   # Memory mappings
/proc/<pid>/fd/    # Open file descriptors
/sys/class/net/    # Network interfaces
/sys/block/        # Block devices
/etc/passwd        # User accounts
/etc/fstab         # Mount configuration

Study Plan (Original 5-Week Outline)

Week 1: Fundamentals
  • Processes/threads, memory management, virtual memory

Week 2: Synchronization
  • Mutexes, deadlocks, condition variables, lock-free

Week 3: Scheduling & I/O
  • CPU scheduling, file systems, disk I/O

Week 4: Linux Internals
  • System calls, kernel modules, containers

Week 5: Practice
  • Implement thread pool, allocator, rate limiter

How to Study OS Interviews — Detailed 30-Day Plan

The 5-week outline above is the skeleton. The 30-day plan below is the muscle: what to actually do each day so that on interview day, you have not just read the material, you have practiced it. The structure mirrors how staff engineers at FAANG actually learn this material — in layers, with deliberate practice, not by passively re-reading textbooks.

Week 1: Foundations (Days 1-7)

Goal: Build the bedrock mental model. By end of week, you should be able to draw user-space vs kernel-space, processes vs threads, and the syscall path on a whiteboard from memory.
  • Day 1-2: Read the OS Fundamentals and Processes chapters of this course. Re-read the xv6 book chapters on processes (Ch 2-3). Draw the user/kernel boundary diagram from memory three times.
  • Day 3: Practice explaining “what happens on fork()” out loud, to a wall, in 3 minutes. Record yourself. Listen back. Note where you stumble.
  • Day 4-5: Read Threads and Synchronization chapters. Implement a thread pool from scratch in C or your language of choice. Do not look up reference implementations until you have a working version.
  • Day 6: Pick 5 questions from the “Top 20” list above. Answer each one out loud, in writing, and on a whiteboard. Compare your answers to the references.
  • Day 7: Rest day or catch-up day. Review what felt shaky.
End-of-week check: If you cannot explain the difference between a thread and a process at the kernel level (task_struct, clone() flags), do NOT proceed to week 2. Go back.

Week 2: Memory and Processes (Days 8-14)

Goal: Master virtual memory. By end of week, you should be able to walk a virtual address through 4-level page tables, explain TLB shootdowns, and describe what happens during malloc() end-to-end.
  • Day 8: Virtual Memory chapter, plus the Kernel Memory chapter. Draw the four-level page table walk (PML4 -> PDPT -> PD -> PT -> page) on paper. Memorize the entry sizes (8 bytes per entry, 512 entries per table).
  • Day 9: Memory Management chapter. Implement a simple bump allocator, then a free-list allocator. Hand-trace the heap state through 5 mallocs and 3 frees.
  • Day 10: Read Brendan Gregg’s chapter on memory in “Systems Performance” (or watch his free YouTube talks). Run pmap on a real process. Decode every section.
  • Day 11: Page faults: minor, major, COW, demand paging. Trace what the kernel does for each. Implement a userfaultfd toy to feel the mechanism.
  • Day 12: Containers and cgroups: read the Containers/Virtualization chapter. Build a Docker-style container by hand using unshare, mount, and cgcreate. Do not skip this; knowing namespaces and cgroups by hand impresses interviewers immediately.
  • Day 13-14: Practice all “memory” questions from this guide. Record yourself answering. Watch back, identify weak spots.
Common stuck point: “What is the difference between virtual memory and swap?” If you cannot answer this in 30 seconds, you are not ready for week 3.
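For the Day 8 page-table exercise, it helps to see how a 48-bit x86-64 virtual address splits into four 9-bit table indices plus a 12-bit page offset. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

typedef struct {
    unsigned pml4;    // bits 47-39
    unsigned pdpt;    // bits 38-30
    unsigned pd;      // bits 29-21
    unsigned pt;      // bits 20-12
    unsigned offset;  // bits 11-0
} va_parts_t;

// Splits a virtual address into its 4-level page-table indices.
// Each index is 9 bits because a 4 KiB table holds 512 entries of 8 bytes.
va_parts_t split_va(uint64_t va) {
    va_parts_t p;
    p.offset = (unsigned)(va & 0xFFF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    return p;
}
```

The arithmetic is 9 + 9 + 9 + 9 + 12 = 48 bits, which is why 4-level paging covers a 256 TiB virtual address space.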

Week 3: I/O, Storage, and Filesystems (Days 15-21)

Goal: Understand the full I/O path from read() to disk. By end of week, you should be able to compare epoll, io_uring, and AIO; explain what a filesystem journal does; and diagnose disk-bound performance.
  • Day 15: I/O Systems chapter and Storage Stack chapter. Trace a read() syscall from user space through VFS, the filesystem, and the block layer down to the device driver.
  • Day 16: epoll, select, poll, io_uring — the four major Linux I/O multiplexing mechanisms. Compare in writing: when each is appropriate, what their syscall signatures look like, what edge-vs-level-triggered means.
  • Day 17: File Systems chapter. Read about ext4 (extents, journaling), XFS (allocation groups), and at least one COW filesystem (btrfs or ZFS). Compare design choices.
  • Day 18: Networking chapter. Trace a TCP connection establishment from connect() syscall through the network stack to the wire. Memorize the TCP state diagram.
  • Day 19: Performance and debugging. Practice using iostat, iotop, strace, ftrace, and at least one eBPF tool (biolatency). On a real system, generate disk load and watch the metrics.
  • Day 20: Practice 5 “I/O is slow” type questions. Walk through diagnosis methodology out loud.
  • Day 21: Rest or catch-up. Schedule a mock interview with a friend who knows OS material.
Self-check: If asked “explain io_uring vs epoll,” can you answer in under 2 minutes with concrete differences (submission queue, completion queue, async ops, kernel polling)? If not, re-read.

Week 4: Distributed, Security, and Linux Internals (Days 22-30)

Goal: Tie OS concepts to distributed systems and security. By end of week, you should be able to discuss how Linux features support container security, how distributed systems abuse OS primitives, and the trade-offs of microkernel vs monolithic kernel design.
  • Day 22-23: Linux Internals chapter (this course). Practice the “what happens when you boot Linux” question and “trace a packet through the kernel” question.
  • Day 24: Security chapter — capabilities, namespaces, seccomp, SELinux/AppArmor. Read the Spectre/Meltdown papers (or LWN summaries). Understand KPTI.
  • Day 25: Synchronization deep-dive: futexes, RCU, lock-free programming. Read Paul McKenney’s “What is RCU, Fundamentally?” series. Implement a SPSC ring buffer in your language of choice.
  • Day 26: Case studies: read the OS Case Studies chapter. Pick three (Chrome, Pathfinder, Cloudflare) and rehearse summarizing each in 90 seconds with the key OS lesson.
  • Day 27: Modern features (eBPF, io_uring, kernel TLS, etc.). Skim the Modern Features chapter. Try writing a simple bpftrace one-liner.
  • Day 28: Mock interview day 1. Have a friend ask 5 random questions from this guide. Time-box answers to 5 minutes each. Review weaknesses.
  • Day 29: Mock interview day 2. Practice the “system design” angle: “design a thread-safe rate limiter,” “design a userspace TCP stack,” “design an OS scheduler for a real-time workload.”
  • Day 30: Light review. Rest before interview. Re-read your own notes and the cheat sheet. Do NOT cram new material.
Final readiness check: Pick a topic at random, set a 5-minute timer, give a complete answer with diagram and trade-offs. If you can do this for any topic in this guide, you are interview-ready.
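For the Day 25 exercise, one possible shape of a single-producer/single-consumer ring buffer in C11 is sketched below. The names and the fixed capacity of 4 are assumptions made for illustration; it is only correct with exactly one producer thread and one consumer thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 4u  // power of two, so index wrap-around stays consistent

typedef struct {
    _Atomic size_t head;         // next slot to read (written by consumer)
    _Atomic size_t tail;         // next slot to write (written by producer)
    int slots[RING_CAPACITY];
} spsc_ring_t;

// Producer side. Returns false if the ring is full.
bool ring_push(spsc_ring_t *r, int value) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY) {
        return false;
    }
    r->slots[tail % RING_CAPACITY] = value;
    // release: the slot write above becomes visible before the new tail
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

// Consumer side. Returns false if the ring is empty.
bool ring_pop(spsc_ring_t *r, int *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = r->slots[head % RING_CAPACITY];
    // release: the slot read above completes before the slot is reusable
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

No lock is needed because each index has exactly one writer; the acquire/release pairs are what order the slot accesses against the index updates. A good interview follow-up is why head and tail should sit on separate cache lines.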

Caveats and Common Pitfalls (Interview Strategy)

The biggest reason strong engineers bomb OS interviews is not lack of knowledge — it is bad strategy. The pitfalls below have killed more candidates than weak technical depth.
Strategic mistakes that sink even technically strong candidates:
  1. Memorizing answers without first-principles understanding. A candidate who memorized “fork copies the page tables with COW” can answer the surface question but falls apart on follow-ups: “what triggers the COW fault?” “what does the page-fault handler actually do?” “what if both processes write to the same page simultaneously?” If you do not understand the underlying mechanism, the second-level question exposes you. Always be able to derive an answer from primitives, not just recall it.
  2. Practicing only on paper, not on a whiteboard or in person. Drawing a clean diagram while talking is a separate skill from reading or writing. Many candidates can write a perfect explanation but freeze at a whiteboard. Solution: practice on whiteboards (or large paper, or a tablet) at least 5-10 times before any onsite. Stand up. Talk while you draw. The motor and verbal coordination matter.
  3. Bluffing confidence instead of admitting uncertainty. Interviewers can tell. Bluffing through a wrong answer (“yes, that is correct”) destroys trust. Saying “I am not 100% sure; my mental model is X, but I want to verify by walking through Y” preserves trust. Senior engineers admit gaps; junior engineers hide them. Behave like the senior engineer you want to be.
  4. Not reading the room. Some interviewers want depth on one topic; others want breadth across many. Some love trade-off discussions; others want concrete code. Pay attention to follow-up questions — they tell you what the interviewer cares about. If you spent 8 minutes on a tangent and the interviewer is checking their watch, wrap up.
Solutions and strategic patterns:
  • Build mental models, not flashcards. When studying any concept, ask “why does this exist? what would break without it?” If you can derive clone() from “we need both fork() and pthread_create() and they are 95% the same,” you understand it. If you just memorize “clone has 7 flags,” you do not.
  • Whiteboard-practice with a friend or alone. Physically standing and drawing while explaining trains different muscle memory than typing. Set a timer. Limit yourself to 10 minutes per question. The constraint forces clarity.
  • Use the phrase “I am not certain, but here is my best understanding…” liberally. It signals humility and intellectual honesty. Then proceed with your best answer. This is what actual senior engineers sound like in design reviews.
  • Listen to follow-ups for signals. “Can you go deeper on X?” means “I care about X, give me more.” “What about Y?” means “I think there is a gap related to Y.” React accordingly. Do not just continue with your prepared script.
  • End answers with a synthesis. “So in summary, X is a trade-off between Y and Z, and I would choose X for workloads with characteristic W.” This signals senior judgment, not just recall.

Interview Deep-Dive (Strategy and Approach)

Strong Answer Framework:
  1. Clarify the scope first (1-2 minutes). Do not start designing until you have asked: what is the workload (read-heavy, write-heavy, mixed)? What is the scale (10 QPS, 100K QPS)? What are the failure modes you care about (durability, availability, latency)? What are the constraints (single machine, distributed, embedded)? Many candidates jump straight to a solution and design for the wrong problem.
  2. State your assumptions out loud. “I will assume we are designing for a single Linux machine, multi-core, with NVMe storage, optimizing for low p99 latency on reads.” Now the interviewer can correct you if you have the wrong picture.
  3. Start with the high-level architecture. Draw boxes: client, kernel boundary, your component, dependencies. Label data flow. Do not optimize yet — get the structure right first.
  4. Identify the OS primitives you will use. Threads vs processes? epoll vs io_uring? Shared memory or message passing? Mutex or RCU? State the choice and the reason for each. This is where senior judgment shows.
  5. Walk through the happy path end-to-end. “A request arrives. It hits epoll, the worker thread reads from the socket, processes it, and returns the response.” Trace state changes, syscalls, context switches.
  6. Identify the bottleneck and discuss trade-offs. “The bottleneck will be lock contention on the shared data. I would shard by key, paying memory cost for parallelism.” Always state what you are giving up to gain something.
  7. Failure modes and recovery. What happens if a worker crashes? If memory fills up? If disk fails? An OS systems-design answer that ignores failure modes is incomplete.
  8. Wrap up with the trade-offs. “I optimized for read latency at the cost of write amplification. If the workload were write-heavy I would change to design B.” This shows judgment.
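The "shard by key, paying memory cost for parallelism" trade-off from step 6 can be sketched as follows. This is a minimal illustration, not a production design: the names `ShardedCounter`, `hammer`, and `run_demo` are hypothetical, and in CPython the GIL limits real parallelism, so the sketch shows the structure of the idea rather than a measured speedup.

```python
import threading


class ShardedCounter:
    """Counter map split into independently locked shards (hypothetical helper)."""

    def __init__(self, num_shards=16):
        # One lock and one dict per shard: more memory, less contention.
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._shards = [{} for _ in range(num_shards)]

    def _index(self, key):
        # hash() is stable within a single process, which is all we need here.
        return hash(key) % len(self._shards)

    def incr(self, key, delta=1):
        i = self._index(key)
        with self._locks[i]:  # only writers to this shard wait on each other
            self._shards[i][key] = self._shards[i].get(key, 0) + delta

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)


def hammer(counter, keys, rounds):
    """Worker body: increment every key once per round."""
    for _ in range(rounds):
        for key in keys:
            counter.incr(key)


def run_demo(num_threads=4, rounds=1000):
    counter = ShardedCounter(num_shards=8)
    keys = ["user:%d" % n for n in range(32)]
    threads = [
        threading.Thread(target=hammer, args=(counter, keys, rounds))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

In an interview, state what the sketch gives up: operations that span shards (a global sum, a consistent snapshot) now need to take several locks or tolerate a slightly stale view.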
Real-World Example: In 2020, Google’s Eric Brewer gave a talk at the ACM SIGOPS conference on “How to design a system.” His core advice was almost identical: clarify, assume, structure, primitives, trade-offs. The pattern is industry-canonical because it is how staff engineers actually run design reviews. Reference: ACM Tech Talks archive, also similar material in “Designing Data-Intensive Applications” by Kleppmann.
Senior follow-up 1: “What if I asked you to design this for 100x the scale?” Reveals if you can think about scaling cliffs. The right answer talks about where the current design breaks (lock contention, single-machine memory, single-disk throughput) and what you would change at each cliff (shard, distribute, replicate).
Senior follow-up 2: “What part of this design are you least confident in?” A great question — they want to see if you have self-awareness. Pick a real area (concurrency model, failure handling, memory budget) and explain what you would prototype to validate.
Senior follow-up 3: “How would you measure if this design works?” Reveals if you think about observability from day one. Answer with USE and RED methods, specific metrics (p99 latency, throughput, error rate), and the dashboards/alerts you would set up.
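To make "specific metrics" concrete, here is a small sketch that turns raw request latencies into RED-style numbers (rate, error ratio, duration percentiles). The function names and the nearest-rank percentile definition are choices made for this illustration; real monitoring systems usually compute percentiles from histograms instead of raw samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def red_summary(latencies_ms, errors, window_s):
    """Summarize one observation window as Rate, Errors, Duration."""
    total = len(latencies_ms)
    return {
        "rate_rps": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
    }
```

Reporting p50 next to p99 is deliberate: a large gap between them is often the first hint of queueing or lock contention.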
Common Wrong Answers:
  1. “Let me just sketch a rough design, then we can refine.” Sounds humble but actually means “I am not going to do the structured thinking.” Senior interviewers want to see the framework.
  2. “I would use Kafka and Kubernetes.” Reaching for off-the-shelf tools without justifying primitives is a junior tell. Even if those tools are right, explain why — which OS-level primitives they provide that you need.
Further Reading:
  • Brendan Gregg’s “Systems Performance” — not a design book per se but the framework chapters are exemplary.
  • “Designing Data-Intensive Applications” by Martin Kleppmann — has explicit design frameworks.
  • Google SRE book, “Postmortem Culture” chapter — shows how seniors think about failure modes.
Strong Answer Framework:
  1. Buy a moment without panicking. Phrases that work: “That is a good question, let me think for a second.” “I want to make sure I get this right — can I think out loud?” “I have not encountered this exactly, but let me reason about it.” All of these are honest, professional, and buy 5-15 seconds of thinking time. Saying nothing while you think looks like a freeze.
  2. Reason from primitives. Even if you do not know the specific answer, you almost always know related concepts. Start there: “I do not know the exact answer, but I know X is true and Y is true; that suggests Z…” Reasoning out loud is itself the interview signal. Many interviewers prefer “I do not know the answer but here is how I would derive it” over a memorized correct answer.
  3. Bound your uncertainty. “I think this is true, but I would want to verify against the kernel source / documentation.” Showing you know what you do not know is a senior trait. Pretending to know what you do not is a junior trait that interviewers detect and disqualify on.
  4. Redirect to adjacent strength when honest. “I have not used X in production, but in the related case of Y, I have done Z, and I would expect similar trade-offs.” This is honest redirection, not evasion — you are showing relevant experience.
  5. Admit and offer to learn live. “I do not know — can you share the answer? I would love to understand.” This works occasionally; over-using it makes you look unprepared. Use sparingly for genuinely novel questions.
Real-World Example: In a published mock interview between Brendan Gregg and a candidate (Netflix Tech Talks, around 2017), the candidate was asked about a specific eBPF feature he did not know. He said “I have not used that specific helper, but the BPF helpers I am familiar with work like X, so I would expect this one to behave like Y.” Gregg’s feedback: that answer was better than a correct memorized answer because it showed reasoning ability. The lesson: interviewers are testing thinking, not just recall.
Senior follow-up 1: “What if you genuinely have no idea, even from primitives?” Then say so. “I genuinely do not know — this is outside my experience. Could you point me at a starting reference and I will follow up?” Honest, professional, ends the question without lingering.
Senior follow-up 2: “How do you avoid getting flustered when this happens?” Practice. Mock interviews where someone deliberately throws you a curveball train you to stay calm. Also: remember that not knowing one thing is normal; bluffing is a disqualifier but admitting is just a data point.
Senior follow-up 3: “When does ‘I do not know’ kill your candidacy versus when does it strengthen it?” It kills you if you say it for fundamentals you should know (process vs thread, virtual memory, basic syscalls). It strengthens you if you say it for esoterica (specific kernel commit, niche subsystem) and pair it with reasoning. Know the line.
Common Wrong Answers:
  1. “Hmm, I am not sure but I think it is X…” (then a confident wrong answer). Interviewer detects bluff. Trust destroyed.
  2. “I do not know.” (full stop). True but missed opportunity. Always pair with reasoning or a next-step.
Further Reading:
  • Cal Newport, “Deep Work” — the mental discipline of staying calm under question pressure.
  • “Cracking the Coding Interview” by Gayle McDowell — has a chapter on handling difficult questions.
  • “How Will You Measure Your Life?” by Clayton Christensen — broader, but the principle of intellectual honesty applies directly.
Strong Answer Framework:
  1. Process / Thread / Concurrency model. Almost every interview touches it. Prepare: clone() and what flags do, fork+exec sequence, thread vs process trade-offs, GIL or equivalent in your language, mutex vs spinlock vs RCU. Practice: implement a thread pool from scratch.
  2. Virtual memory and page tables. The classic deep-dive question. Prepare: 4-level page table walk on x86_64, TLB and TLB shootdowns, COW, demand paging, swap vs RAM, mmap vs read/write. Practice: trace malloc(8MB) end-to-end with all the kernel actions.
  3. System calls and the syscall path. Bread-and-butter for systems roles. Prepare: SYSCALL instruction mechanics, ring transitions, vDSO, syscall table, context switching cost. Practice: walk through read() from user code to disk and back.
  4. Synchronization primitives and their trade-offs. Always asked because concurrency bugs are universal. Prepare: spinlock vs mutex vs futex, RCU read-side vs write-side cost, lock-free structures (CAS, atomic), memory ordering (acquire/release/seqcst). Practice: explain why a double-checked lock fails without the right memory barrier.
  5. Performance debugging and observability. Increasingly common, especially for SRE/infra roles. Prepare: USE method, RED method, perf, ftrace, eBPF, flame graphs, off-CPU analysis. Practice: walk through a “p99 latency spike” investigation from scratch.
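For the "implement a thread pool from scratch" practice item, a minimal sketch might look like the following. It assumes a fixed number of workers, an unbounded queue, and a `None` sentinel for shutdown; `ThreadPool` and `wait` are hypothetical names for this illustration, and a real pool would also need backpressure and cancellation.

```python
import queue
import threading


class ThreadPool:
    """Fixed-size pool: workers block on a shared queue of tasks."""

    def __init__(self, num_workers=4):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_workers)
        ]
        for w in self._workers:
            w.start()

    def _worker(self):
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            func, args, result = item
            try:
                result["value"] = func(*args)
            except Exception as exc:  # hand the failure back to the caller
                result["error"] = exc
            finally:
                result["done"].set()

    def submit(self, func, *args):
        """Queue a task and return a handle the caller can wait on."""
        result = {"done": threading.Event()}
        self._tasks.put((func, args, result))
        return result

    def shutdown(self):
        # One sentinel per worker, queued behind any pending tasks.
        for _ in self._workers:
            self._tasks.put(None)
        for w in self._workers:
            w.join()


def wait(result):
    """Block until the task finishes, then return its value or re-raise its error."""
    result["done"].wait()
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Good follow-up points to raise yourself: what happens when the queue grows without bound, and how a task that never returns affects shutdown.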
Real-World Example: Glassdoor and Levels.fyi data over 2020-2024 consistently show these five topics appearing in 80%+ of senior systems interviews at Google, Meta, Amazon, Microsoft, and Apple. Confirmed by published company-specific guides (e.g., Meta’s “Production Engineer interview prep” page) which explicitly call out concurrency, memory, and Linux internals as required topics.
Senior follow-up 1: “What about distributed systems topics like Raft or CRDTs?” Important but usually evaluated in a separate “system design” loop, not the OS-specific loop. The OS loop tests OS primitives; the design loop tests architecture. Prepare both, but study them separately.
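Senior follow-up 2: “Should I learn the Linux kernel source code before interviewing?” Not the full kernel, but reading specific paths (do_page_fault, sys_read, the scheduler) gives you concrete examples for answers. “When I was reading handle_mm_fault in mm/memory.c, I noticed…” is a very strong signal. Get the file right when you cite one: the architecture-specific fault entry point lives under arch/ (arch/x86/mm/fault.c on x86), while the generic handling is in mm/memory.c.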
Senior follow-up 3: “How does the OS interview differ for a kernel role versus an SRE role versus an application role?” Kernel roles: deep mechanics, possibly live code reading. SRE roles: production debugging, performance tools, incident response. Application roles: enough OS to use it correctly (file descriptors, memory, syscalls, threads), less depth on internals.
Common Wrong Answers:
  1. “Just read ‘Operating Systems: Three Easy Pieces’ cover to cover.” Good book, but reading is not preparation. You must practice answering questions out loud, on whiteboards, under time pressure. Reading alone is necessary but not sufficient.
  2. “Memorize a list of 100 questions and answers.” Memorization without first-principles fails on follow-ups. Use questions as practice prompts, not scripts.
Further Reading:
  • “Operating Systems: Three Easy Pieces” (free online) — the textbook that gets the foundations right.
  • “Cracking the Coding Interview” by Gayle McDowell — not OS-specific but the strategy chapters apply.
  • Interview Cake and System Design Primer (free GitHub repo) — have explicit OS sections.

Key Takeaways

Know the Fundamentals

Process vs thread, virtual memory, page faults - these come up constantly.

Practice Debugging

“Why is this slow/hanging?” questions are common. Know your tools.

Design Questions

Be ready to implement a thread pool, an allocator, or a scheduler from scratch.

Know Linux Specifics

epoll vs select, namespaces, cgroups - expected for systems roles.

Next: Case Studies

Interview Deep-Dive

Strong Answer:
  • In Linux, there is no fundamental distinction between a process and a thread at the kernel level. Both are represented by a task_struct. The difference is in what they share. When you call fork(), the kernel creates a new task_struct with a new memory space (new page tables via copy-on-write), new file descriptor table, new signal handlers — a full copy. When you call pthread_create() (which internally uses clone() with specific flags), the kernel creates a new task_struct that SHARES the parent’s memory space, file descriptor table, signal handlers, and more.
  • The clone() syscall is the unified primitive. The flags control what is shared: CLONE_VM (share memory), CLONE_FILES (share file descriptors), CLONE_SIGHAND (share signal handlers), CLONE_THREAD (same thread group). A “process” is clone() with no sharing flags; a “thread” is clone() with all sharing flags.
  • From the scheduler’s perspective, every task_struct is a schedulable entity. The scheduler does not distinguish between processes and threads. Context switching between threads of the same process is faster because the page tables (CR3 register on x86) do not change, avoiding a TLB flush.
  • The practical implication: a crash (segfault) in one thread takes down all threads in the process because they share the same address space. A crash in a child process leaves the parent unaffected because they have separate address spaces. This is the core trade-off: threads give you cheap communication (shared memory) at the cost of fault isolation.
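The sharing difference can be observed directly. The sketch below is a hedged illustration for POSIX systems only (it relies on `os.fork`); `compare_sharing` is a hypothetical name. A thread's write is visible to the creator because they share one address space, while a forked child's write lands in its own copy.

```python
import os
import threading


def compare_sharing():
    """Return (value after thread write, value seen in child, value in parent after fork)."""
    state = {"counter": 0}

    def bump():
        state["counter"] += 1

    # Thread: same address space, so the parent sees the write.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = state["counter"]

    # Process: the child gets a copy-on-write copy of the address space.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        bump()  # modifies only the child's private copy
        os.write(write_fd, str(state["counter"]).encode())
        os._exit(0)  # skip normal interpreter cleanup in the child
    os.close(write_fd)
    in_child = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)

    return after_thread, in_child, state["counter"]
```

The child reports 2 through the pipe, but the parent still sees 1: communication between processes has to go through an explicit channel, which is exactly the isolation that protects the parent from a crashing child.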
Follow-up: What is a TLB flush, and why does switching between threads of the same process avoid it? The TLB caches virtual-to-physical address translations. When you switch to a different process, its page tables are different (different CR3 value), so all TLB entries from the old process are invalid and must be flushed. After a flush, every memory access triggers a page table walk until the TLB is repopulated. This is the “cold TLB” penalty, which can cost thousands of cycles. Switching between threads of the same process does not change CR3 (they share page tables), so the TLB remains valid. Modern CPUs also use PCIDs (Process Context IDs) to tag TLB entries with the process, allowing entries from multiple processes to coexist in the TLB and reducing the cost of process switches.
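The page table walk that a TLB miss triggers can be illustrated by splitting a virtual address into its table indices. This sketch assumes the standard x86_64 4-level layout with 4 KiB pages (four 9-bit indices plus a 12-bit offset); the function name and dictionary keys are labels chosen for this example.

```python
def split_virtual_address(vaddr):
    """Split a 48-bit x86_64 virtual address into 4-level page table indices."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,  # bits 47..39: index into the top-level table
        "pdpt": (vaddr >> 30) & 0x1FF,  # bits 38..30
        "pd": (vaddr >> 21) & 0x1FF,    # bits 29..21
        "pt": (vaddr >> 12) & 0x1FF,    # bits 20..12: index into the last-level table
        "offset": vaddr & 0xFFF,        # bits 11..0: byte offset within the 4 KiB page
    }
```

Each index selects one entry in one table, so a miss costs up to four dependent memory reads before the actual data access. That is the work a TLB hit avoids.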
Strong Answer:
  • CFS (Completely Fair Scheduler) was the default Linux scheduler for over 15 years. It tracks “virtual runtime” (vruntime) for each task: how much CPU time the task has received, scaled by its weight. Runnable tasks are kept in a red-black tree keyed by vruntime, so insertion is O(log n) and the task with the lowest vruntime (the leftmost node, which the kernel caches) runs next. Nice values adjust the weight (not priority), so higher-nice tasks accumulate vruntime faster and get less CPU. CFS provides good fairness and interactive responsiveness for general-purpose workloads.
  • EEVDF (Earliest Eligible Virtual Deadline First) replaced CFS in Linux 6.6. It adds a “virtual deadline” concept: each task has a deadline based on its requested time slice. Among eligible tasks (those whose virtual time allows them to run), the one with the earliest deadline runs next. EEVDF provides better latency guarantees than CFS — it reduces tail latency for interactive tasks by preventing “sleeper bonus” gaming and providing more predictable scheduling.
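  • SCHED_FIFO is a real-time policy. A SCHED_FIFO task runs until it blocks, voluntarily yields, or is preempted by a higher-priority real-time task. It is not fair: a SCHED_FIFO task at priority 99 that never blocks will starve every normal task on its CPU. By default Linux's real-time throttling (sched_rt_runtime_us) reserves a small share of each period for non-real-time tasks, but do not rely on it as a safety net. SCHED_FIFO is used when you need deterministic, bounded latency: audio processing (JACK), industrial control, low-latency trading.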
  • In production: use the default (EEVDF/CFS) for web servers, databases, and general applications. Use SCHED_FIFO for latency-critical real-time components (but be careful — a runaway SCHED_FIFO task can hang the system). Use cgroups with cpu.max to limit how much CPU a group of tasks can consume, regardless of scheduler policy.
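The vruntime mechanism can be sketched as a toy simulation. This is an illustration under simplifying assumptions, not the kernel's algorithm: it uses a binary heap in place of the red-black tree, fixed-length slices, and tasks that never sleep. The weights are a small subset of the kernel's nice-to-weight table; `simulate` is a hypothetical name.

```python
import heapq

NICE_0_WEIGHT = 1024
# Subset of the kernel's nice-to-weight mapping (nice 0 = 1024).
WEIGHTS = {-5: 3121, 0: 1024, 5: 335}


def simulate(tasks, ticks, slice_ns=1_000_000):
    """tasks maps name -> nice value. Returns name -> real CPU time received (ns)."""
    heap = [(0, name) for name in sorted(tasks)]  # (vruntime, name)
    heapq.heapify(heap)
    cpu = {name: 0 for name in tasks}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)  # lowest vruntime runs next
        cpu[name] += slice_ns
        # Heavier (lower-nice) tasks accumulate vruntime more slowly.
        vruntime += slice_ns * NICE_0_WEIGHT // WEIGHTS[tasks[name]]
        heapq.heappush(heap, (vruntime, name))
    return cpu
```

With one nice-0 task and one nice-5 task, the nice-0 task ends up with roughly 1024 / (1024 + 335), or about 75%, of the CPU, which is the "weight, not priority" behavior described above.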
Follow-up: What is the “convoy effect” in FCFS scheduling, and how do modern schedulers avoid it? The convoy effect occurs in First-Come-First-Served scheduling when a long-running CPU-bound task gets the CPU and all shorter, I/O-bound tasks queue behind it. The I/O tasks finish their I/O quickly but wait in the run queue for the CPU-bound task to finish its time on CPU. This dramatically increases average turnaround time. CFS/EEVDF avoid this by giving I/O-bound tasks (which have low vruntime because they sleep a lot) immediate priority when they wake up: they have “earned” CPU time by not using it. Preemptive scheduling with time slices also prevents any single task from monopolizing the CPU indefinitely.
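A quick worked calculation shows how much a convoy costs. The sketch assumes all jobs arrive at time 0 and run to completion without preemption; the burst lengths are made-up illustrative numbers and `average_wait` is a hypothetical helper.

```python
def average_wait(bursts):
    """Average time each job waits before it starts, running jobs in the given order."""
    waited, clock = 0, 0
    for burst in bursts:
        waited += clock  # this job waited for everything scheduled before it
        clock += burst
    return waited / len(bursts)


fcfs_order = [24, 3, 3]              # long CPU-bound job arrived first
shortest_first = sorted(fcfs_order)  # same jobs, short ones first

convoy = average_wait(fcfs_order)        # (0 + 24 + 27) / 3
no_convoy = average_wait(shortest_first)  # (0 + 3 + 6) / 3
```

The same three jobs average 17 time units of waiting behind the long job but only 3 when the short jobs go first, which is the gap that preemption and wakeup-friendly scheduling are meant to close.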
Strong Answer:
  • The terminal emulator captures the keystroke and writes the character to the pseudoterminal master (ptmx). The kernel’s tty layer passes it to the slave pty, which is bash’s stdin. Bash’s read() returns when it gets the newline.
  • Bash parses “ls”, searches $PATH for the executable, and finds /usr/bin/ls (likely cached from a previous hash). Bash calls fork() which creates a child process via clone(). The kernel creates a new task_struct, copies the page tables with COW (copy-on-write), copies the file descriptor table, and assigns a new PID.
  • In the child, bash calls execve("/usr/bin/ls", ...). The kernel loads the ELF binary: parses the ELF header, creates new VMAs (virtual memory areas) for the text, data, and BSS segments, sets up the stack with argv and envp, and points the instruction pointer to the ELF entry point (or to the dynamic linker ld-linux.so if dynamically linked).
  • The dynamic linker runs first: it loads shared libraries (libc.so, libpthread.so, etc.) using mmap(), resolves symbol references, and jumps to the main program.
  • ls calls opendir() -> getdents() syscall, which reads directory entries from the filesystem. The kernel does a path walk through the dentry cache (fast if cached) to the target directory’s inode, reads the directory data blocks, and returns the entries to user space.
  • When it needs metadata (for example with -l, or to color entries by type), ls calls stat() on each entry to get the file type, permissions, and size. It then formats the output and calls write() on stdout (which is the slave pty); the tty layer passes the data to the master pty, and the terminal emulator renders it.
  • ls calls exit(). The kernel sends SIGCHLD to bash (the parent). Bash’s wait4() reaps the child, reads the exit status, and prints a new prompt.
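The fork, exec, and wait steps that bash performs can be sketched in a few lines. This is a simplified, POSIX-only illustration; `run` is a hypothetical name, and a real shell also handles job control, signals, and redirections. The exit code 127 for "command not found" follows the usual shell convention.

```python
import os


def run(argv):
    """Run argv the way a shell does: fork, exec in the child, wait in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)  # searches PATH, like the shell lookup
        except OSError:
            os._exit(127)  # exec failed, for example command not found
    # Parent: block until the child exits, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # killed by a signal
```

Calling `run(["ls", "-l"])` prints the listing on the parent's stdout because the child inherits the same file descriptors across both fork and exec.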
Follow-up: During the fork(), what does copy-on-write actually do, and when does a real copy happen? After fork(), parent and child share the same physical pages, but the kernel marks all writable private pages as read-only in both page tables. When either process writes to a page, the CPU triggers a page fault (write to read-only page). The kernel’s page fault handler recognizes this as a COW fault, allocates a new physical frame, copies the contents of the original page into the new frame, updates the faulting process’s page table to point to the new frame (now writable), and returns. The other process keeps the original page. This means if the child immediately calls execve() (which replaces the entire address space), most pages are never copied; only the stack page(s) touched during execve setup are. This is why fork+exec is efficient despite appearing to copy the entire process.