
Process Management

What is a Process? (From Scratch)

Imagine you want to run a program. You double-click an icon or type a command. What actually happens? The operating system creates a process - a running instance of that program. A process is a program in execution - the fundamental unit of work in an operating system. But what does that really mean?

Program vs Process: The Key Distinction

Program (Static):
  • A file on disk containing instructions
  • Just bytes stored in a file (e.g., /usr/bin/python3)
  • Doesn’t do anything by itself
  • Can be copied, deleted, read
  • Like a recipe in a cookbook
Process (Dynamic):
  • A program that has been loaded into memory and is running
  • Has state: current instruction, memory contents, open files
  • Consumes resources: CPU time, RAM, file descriptors
  • Changes over time as it executes
  • Like actually cooking the recipe - active, using ingredients, producing results
Analogy:
  • Program = A blueprint for a house (static document)
  • Process = Actually building the house (active construction, using materials, changing state)

Why Do We Need Processes?

Without processes:
  • Only one program could run at a time
  • No way to run multiple instances of same program
  • No isolation between programs
  • No way to manage resources per program
With processes:
  • Multiple programs run “simultaneously” (OS switches between them)
  • Each program has its own memory space
  • OS can track and limit resource usage per process
  • One program crash doesn’t kill others

Real-World Example: Running Multiple Programs

When you use your computer, you might have:
  • Web browser (process 1)
  • Text editor (process 2)
  • Music player (process 3)
  • Background tasks (processes 4, 5, 6…)
Each is a separate process. The OS:
  • Gives each process CPU time
  • Gives each process its own memory
  • Tracks which files each has open
  • Can kill one without affecting others

Understanding process management is essential for senior engineering interviews, as it underlies everything from application behavior to container orchestration.
Interview Frequency: Very High (asked in 80%+ of OS interviews)
Key Topics: Process states, fork/exec, context switching, PCB
Time to Master: 8-10 hours

Process vs Program: Deep Dive

Program

  • Static entity stored on disk
  • Contains code and static data
  • Passive — does nothing by itself
  • Example: /usr/bin/python3

Process

  • Dynamic instance in execution
  • Has runtime state (registers, heap, stack)
  • Active — consumes CPU, memory, I/O
  • Example: Running Python interpreter with PID 1234

Detailed Comparison

Program (The File):
Location: /usr/bin/python3
Size: 4,832,456 bytes
Type: Executable binary (ELF format on Linux)
Contents:
  - Machine code instructions
  - Static data (constants, strings)
  - Metadata (entry point, required libraries)
Process (The Running Instance):
Process ID: 1234
State: Running
Memory: 45 MB virtual, 12 MB physical
CPU Time: 2.3 seconds
Open Files: stdin, stdout, stderr, script.py
Parent: Shell (PID 567)
Children: None

The Transformation: Program → Process

Step-by-step: What happens when you run a program?
$ python3 script.py
1. Shell receives command
  • Shell (itself a process) parses the command
  • Identifies program: python3
  • Identifies arguments: script.py
2. Shell calls fork()
  • Creates a copy of itself (child process)
  • Child process will become the Python interpreter
  • Parent (shell) will wait for child to finish
3. Child calls exec()
  • Replaces its memory with Python interpreter program
  • Loads /usr/bin/python3 from disk into memory
  • Sets up initial state (registers, stack, heap)
  • Program has become a process!
4. Process starts executing
  • CPU begins executing Python interpreter code
  • Interpreter reads script.py
  • Interpreter executes Python bytecode
  • Process consumes CPU cycles, uses memory
5. Process terminates
  • Script finishes or error occurs
  • Process calls exit()
  • OS cleans up: frees memory, closes files, removes process table entry
  • Process is gone, program file remains on disk

Multiple Processes from Same Program

Key Insight: You can run the same program multiple times, creating multiple processes:
$ python3 script1.py &    # Process 1 (PID 1001)
$ python3 script2.py &    # Process 2 (PID 1002)
$ python3 script1.py &    # Process 3 (PID 1003) - same program as Process 1!
Each process:
  • Has different PID
  • Has separate memory space
  • Can have different data/state
  • Runs independently
Example: Web Server
Program: /usr/sbin/nginx (one file on disk)
Processes:
  - PID 100: Master process (manages workers)
  - PID 101: Worker process (handles requests)
  - PID 102: Worker process (handles requests)
  - PID 103: Worker process (handles requests)
  
All running the same program, but different processes with different roles!
Interview Insight: “A program becomes a process when loaded into memory and given system resources. Multiple processes can run the same program simultaneously. Each process has its own memory space, file descriptors, and execution state, even if they’re running the same program file.”

Process Lifecycle Story: From Birth to Zombie

To build intuition, follow a single process from creation to termination.

1. Birth: fork() + execve()

Consider running a web server worker:
$ nginx -g 'daemon off;'
  1. The master process starts (PID 100).
  2. It forks several worker processes (PIDs 101, 102, 103…).
  3. Each worker execve()s the same nginx binary but handles its own subset of connections.
At this point, each worker has:
  • Its own PID and PCB.
  • Its own address space (code, heap, stack).
  • Shared open file descriptors inherited from the master (e.g., listening sockets).

2. Life: Running, Ready, and Waiting

Over its lifetime, a worker process moves between classic states:
  • Running: Actively executing on a CPU.
  • Ready (Runnable): Eligible to run but waiting in the scheduler’s queue.
  • Blocked/Waiting: Sleeping on I/O (e.g., read() on a socket) or waiting on a lock.
You can observe this directly:
$ ps -o pid,ppid,state,cmd | grep nginx
  100     1 S nginx: master process
  101   100 S nginx: worker process
  • S = sleeping (waiting on I/O or events).
  • Under load, you may see R (running) when workers are actively handling requests.

3. Aging: Resource Usage and Limits

As the process runs, the kernel tracks:
  • CPU time: utime (user) and stime (system) in the PCB.
  • Memory: RSS, virtual size, page faults.
  • Open files: Counted against per-process and system-wide limits.
Tools to explore:
$ ps -p 101 -o pid,etime,rss,pcpu,cmd
$ cat /proc/101/status
$ cat /proc/101/limits
These reflect the fields you see in the task_struct (PCB) described earlier.

4. Death: Exit and Zombie State

When a process finishes:
  1. It calls exit() (explicitly or by returning from main).
  2. The kernel:
    • Closes file descriptors.
    • Frees the address space.
    • Marks the PCB as a zombie: minimal entry remains so the parent can read the exit status.
A zombie process:
  • Has released almost all resources (no memory, no open files).
  • Still occupies a PID and a small PCB entry.
  • Shows as Z in tools:
$ ps -o pid,ppid,state,cmd | grep Z
  4242  100 Z [my_child] <defunct>

5. Reaping: Orphans and Init

  • The parent must call wait() / waitpid() to reap the child (remove the zombie entry and free the PID).
  • If the parent dies without reaping, the child becomes an orphan and is re-parented to PID 1 (systemd or init), which periodically calls wait() to clean up.
This lifecycle story is what underlies many interview questions about zombies, orphans, and process trees.
Every process has a well-defined memory layout, typically divided into segments. On a 32-bit architecture this totals 4 GB of address space (2^32), usually split into User Space (low memory) and Kernel Space (high memory).

[Diagram: Process Memory Layout]

Memory Segment Details

| Segment | Direction | Contents | Characteristics |
| --- | --- | --- | --- |
| Kernel Space | Top | Kernel code/data | Inaccessible to user mode. Contains PCB, page tables, kernel stack. |
| Stack | Grows Down ↓ | Function calls | Stores local variables, return addresses, stack frames. Auto-managed. |
| Mapping Segment | N/A | Shared libs | Memory-mapped files, shared libraries (e.g., libc.so). |
| Heap | Grows Up ↑ | Dynamic allocation | malloc()/new. Manually managed. Fragmentation risk. |
| BSS | Fixed | Uninitialized globals | "Block Started by Symbol". Initialized to zero by the OS loader. |
| Data | Fixed | Initialized globals | int x = 10;. Read-write static data. |
| Text (Code) | Fixed | Machine code | Read-only to prevent accidental modification. Sharable. |
Stack vs Heap Collision: In legacy systems without guard pages or ample virtual memory, the Stack (growing down) could collide with the Heap (growing up), leading to stack overflow or memory corruption. Modern OSs leave a large gap between the segments and use ASLR (Address Space Layout Randomization) to randomize segment locations for security.

Process Control Block (PCB)

The PCB (or task_struct in Linux) is the kernel’s data structure representing a process:
// Simplified view of Linux task_struct
struct task_struct {
    // Process Identification
    pid_t pid;                    // Process ID
    pid_t tgid;                   // Thread Group ID
    
    // Process State
    volatile long state;          // RUNNING, SLEEPING, etc.
    
    // Scheduling Information
    int prio, static_prio;        // Priority values
    struct sched_entity se;       // Scheduler entity
    
    // Memory Management
    struct mm_struct *mm;         // Memory descriptor
    
    // File System
    struct files_struct *files;   // Open file table
    struct fs_struct *fs;         // Filesystem info
    
    // Credentials
    const struct cred *cred;      // Security credentials
    
    // Parent/Child Relationships
    struct task_struct *parent;   // Parent process
    struct list_head children;    // Child processes
    
    // CPU Context (saved on context switch)
    struct thread_struct thread;  // CPU-specific state
    
    // Signals
    struct signal_struct *signal; // Signal handlers
};

PCB Information Categories

  • PID: Unique process identifier
  • PPID: Parent process ID
  • UID/GID: User and group ownership
  • Session ID: For terminal sessions

Process States

A process transitions through various states during its lifetime.

[Diagram: Process State Diagram]

State Definitions

| State | Description | Linux Representation |
| --- | --- | --- |
| New | Process being created | N/A (transient) |
| Ready | Waiting for CPU | TASK_RUNNING (in run queue) |
| Running | Executing on CPU | TASK_RUNNING (current) |
| Blocked/Waiting | Waiting for I/O or event | TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE |
| Zombie | Terminated, waiting for parent | EXIT_ZOMBIE (TASK_ZOMBIE in older kernels) |
| Terminated | Fully cleaned up | N/A (removed) |
TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE:
  • Interruptible: Process can be woken by signals (common case)
  • Uninterruptible: Must complete I/O first (shows as ‘D’ in ps — often disk I/O)

Process Creation: fork() and exec()

The Unix process model is elegant: fork() creates a copy, exec() transforms it.

Why fork() + exec()? The Design Philosophy

The Problem: How do you run a new program? Naive approach (doesn’t work well):
  • Create new process from scratch
  • Load program into it
  • Set up everything
Problems:
  • What if you want to redirect I/O (e.g., program > output.txt)?
  • What if you want to set environment variables?
  • What if you want to change working directory first?
  • Parent needs to coordinate with child
Unix Solution: fork() + exec()
  • fork(): Create exact copy of current process (inherits everything)
  • Modify the copy: Change I/O, environment, etc. (in the child)
  • exec(): Replace the copy’s program with new program
  • Parent and child can coordinate before exec()
Benefits:
  • Flexible: Parent can set up child environment
  • Simple: fork() just copies, exec() just replaces
  • Powerful: Can create complex process hierarchies

Understanding fork(): Creating a Process Copy

fork() — Creating a Child Process

What fork() Does:
  1. Creates an exact copy of the current process
  2. Both processes continue execution from the next instruction
  3. Returns twice:
    • In parent: returns child’s PID (positive number)
    • In child: returns 0
    • On error: returns -1
Key Point: After fork(), there are two identical processes running the same code. The only difference is the return value of fork().

Step-by-Step Example

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int x = 100;
    
    printf("Before fork: x = %d, PID = %d\n", x, getpid());
    
    pid_t pid = fork();  // ← THE MAGIC HAPPENS HERE
    
    // At this point, we have TWO processes running!
    // Both execute the code below
    
    if (pid < 0) {
        // Error occurred (only reached if fork failed)
        perror("fork failed");
        return 1;
    } 
    else if (pid == 0) {
        // CHILD PROCESS: fork() returned 0
        // This code ONLY runs in the child
        x += 50;
        printf("Child: x = %d, PID = %d, Parent PID = %d\n", 
               x, getpid(), getppid());
        // Child exits here
    } 
    else {
        // PARENT PROCESS: fork() returned child's PID (> 0)
        // This code ONLY runs in the parent
        x -= 50;
        printf("Parent: x = %d, PID = %d, Child PID = %d\n", 
               x, getpid(), pid);
        wait(NULL);  // Wait for child to terminate
        // Parent continues...
    }
    
    return 0;
}
Execution Timeline:
Time    Parent Process (PID 1000)          Child Process (PID 1001)
─────────────────────────────────────────────────────────────────────
T0      x = 100
        printf("Before fork...")
        Output: "Before fork: x = 100, PID = 1000"
        
T1      pid = fork() ────────────────────> fork() returns 0
        fork() returns 1001 (child PID)    Process created!
                                           x = 100 (copy)
                                           
T2      pid == 1001 (true)                pid == 0 (true)
        Enters else block                 Enters else if block
        
T3      x -= 50  (x = 50)                 x += 50  (x = 150)
        printf("Parent...")                printf("Child...")
        Output: "Parent: x = 50..."        Output: "Child: x = 150..."
        
T4      wait(NULL) ──────────────────────> Process exits
        (blocks until child finishes)
        
T5      wait() returns
        return 0
        Process exits
Key Observations:
  1. Two separate processes: Each has its own copy of x
  2. Independent execution: Parent and child can run in any order (scheduling dependent)
  3. Different PIDs: Parent sees child’s PID, child sees 0
  4. Separate memory: Changes to x in one don’t affect the other
Output (order may vary):
Before fork: x = 100, PID = 1000
Parent: x = 50, PID = 1000, Child PID = 1001
Child: x = 150, PID = 1001, Parent PID = 1000
Why the output order might differ:
  • OS scheduler decides which process runs first
  • Both processes are runnable after fork()
  • On multi-core systems, they might run simultaneously

What fork() Actually Does: Under the Hood

Step-by-Step: What happens inside fork()? When you call fork(), the kernel performs these steps:

1. Allocate New Process ID (PID)

Kernel maintains a process table:
Before fork():
  PID 1000: [Parent process data]

After fork():
  PID 1000: [Parent process data]
  PID 1001: [Child process data] ← New entry created

2. Create Process Control Block (PCB)

Parent's PCB:              Child's PCB (copy):
- PID: 1000                - PID: 1001 (new)
- State: Running            - State: Running
- Memory map: [0x1000...]   - Memory map: [0x1000...] (same layout, COW)
- Open files: [0,1,2]      - Open files: [0,1,2] (shared initially)
- Registers: [saved]        - Registers: [copy of parent's]
- Parent: 567              - Parent: 1000 (points to parent)

3. Copy Memory (Copy-on-Write Optimization)

Traditional approach (old systems):
  • Copy all of parent’s memory immediately
  • Expensive! (if parent uses 1GB, fork takes time)
Modern approach (Copy-on-Write):
Step 1: Mark parent's pages as "copy-on-write"
  Parent memory pages: [Read-Write] → [Read-Only, COW]

Step 2: Child's page table points to same physical pages
  Parent virtual 0x1000 → Physical 0x5000
  Child virtual 0x1000  → Physical 0x5000 (same page!)

Step 3: When either process writes:
  - CPU detects write to read-only page
  - Kernel allocates new physical page
  - Kernel copies original page to new page
  - Updates page table to point to new page
  - Allows write to proceed
Why COW is fast:
  • fork() only copies page table entries (metadata), not actual data
  • Most processes fork() then immediately exec() (don’t write to shared pages)
  • Only pages that are actually modified get copied

4. Copy Other Resources

File Descriptors:
Parent has open files:
  fd 0: stdin
  fd 1: stdout  
  fd 2: stderr
  fd 3: /home/user/file.txt

Child gets copies:
  fd 0: stdin (same terminal)
  fd 1: stdout (same terminal)
  fd 2: stderr (same terminal)
  fd 3: /home/user/file.txt (same file, shared offset)
Signal Handlers:
  • Child inherits parent’s signal handlers
  • Child can change them independently later
Environment Variables:
  • Child gets copy of parent’s environment
  • Changes in child don’t affect parent

5. Set Up Parent-Child Relationship

Parent's PCB:
  children: [1001]  ← Points to child

Child's PCB:
  parent: 1000      ← Points to parent
  ppid: 1000         ← Parent PID

6. Add Child to Scheduler

  • Child process added to run queue
  • Both parent and child are now runnable
  • Scheduler will give both CPU time

7. Return to User Space

In Parent:
  • fork() returns child’s PID (e.g., 1001)
  • Parent continues execution
In Child:
  • fork() returns 0
  • Child continues execution from same point
The Return Value is the Only Difference! This is why the code can distinguish parent from child:
if (fork() == 0) {
    // Child: fork returned 0
} else {
    // Parent: fork returned child's PID
}
[Diagram: Fork/Exec Flow]

Copy-on-Write (COW)

Modern systems don’t actually copy all memory immediately:
  1. Initial State: After fork(), parent and child share the same physical pages, marked read-only.
  2. Write Attempt: When either process tries to write, a page fault occurs.
  3. Copy Made: The kernel copies only that specific page for the writer.
  4. Continue: The process continues with its own private copy of that page.
Why COW? Many processes fork() then immediately exec(), so copying all memory would be wasted work. COW makes fork() nearly O(1) in practice.

exec() Family — Replacing Process Image

The exec family of functions replaces the current process execution with a new program. The PID remains the same, but the machine code, data, heap, and stack are replaced. What exec() Does:
  1. Loads new program from disk into memory
  2. Replaces current program - old code/data gone
  3. Sets up new execution environment - new stack, heap, entry point
  4. Preserves some things - PID, open file descriptors (unless explicitly closed), parent process
  5. Starts executing new program - never returns (unless error)
Key Point: exec() replaces the process, it doesn’t create a new one. The process continues with a new identity.

Why exec() Doesn’t Return (Normally)

execvp("ls", args);
printf("This line is NEVER reached if exec succeeds!\n");
Why? The old program’s code is gone. It’s been replaced by the new program. There’s no code to return to! If exec() fails:
  • Returns -1
  • Original program continues
  • Error code in errno
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child: replace with ls command
        // Using execvp (Vector, Path search)
        char *args[] = {"ls", "-la", "/home", NULL};
        execvp("ls", args);
        
        // Only reached if exec fails
        perror("execvp failed");
        return 1;
    }
    
    wait(NULL);
    return 0;
}

Understanding the Variants

The exec function name tells you exactly what arguments it expects:
  • l (list): Arguments are passed as a list of strings (arg0, arg1, ..., NULL).
  • v (vector): Arguments are passed as an array of strings (argv[]).
  • p (path): Searches the $PATH environment variable for the executable.
  • e (environment): Accepts a custom environment variable array.

1. execl() & execv() — Full Path, Default Environment

Use when you have the full path to the binary.
// List version
execl("/bin/ls", "ls", "-l", NULL);

// Vector version
char *args[] = {"ls", "-l", NULL};
execv("/bin/ls", args);

2. execlp() & execvp() — Search $PATH, Default Environment

Use when you want the OS to find the binary (like running a command in a shell).
// Finds 'python3' in $PATH
execlp("python3", "python3", "script.py", NULL);

3. execle() & execve() — Custom Environment

Use when you need to run a process with specific environment variables (security, isolation). execve is the underlying system call on Linux; all others are library wrappers around it.
char *env[] = {"HOME=/usr/home", "LOGNAME=tarzan", NULL};
char *args[] = {"bash", "-c", "env", NULL};
execle("/bin/bash", "bash", "-c", "env", NULL, env);
| Function | Path Lookup | Args Format | Environment | Usage Scenario |
| --- | --- | --- | --- | --- |
| execl | No | List | Inherited | Hardcoded args |
| execlp | Yes | List | Inherited | Shell-like commands |
| execle | No | List | Explicit | Security/custom env |
| execv | No | Array | Inherited | Dynamic args |
| execvp | Yes | Array | Inherited | Shell implementation |
| execve | No | Array | Explicit | Low-level syscall |
Interview Tip: execve is the only true system call on Linux. execl, execlp, etc., are standard C library (libc) wrappers that eventually call execve.

Context Switching

A context switch is the process of saving one process’s state and restoring another’s.

What Gets Saved/Restored

[Diagram: Context Switch]

Context Switch Overhead

Context switches are expensive! A simple switch might take 1-10 microseconds, but the indirect costs can degrade performance by orders of magnitude.

Register Save/Restore (0.1-0.5 μs)

When the kernel switches from Process A to Process B, it must save Process A’s CPU state and load Process B’s CPU state.

What Gets Saved

struct thread_struct {
    unsigned long sp;        // Stack pointer
    unsigned long ip;        // Instruction pointer (where we'll resume)
    unsigned long regs[16];  // General purpose registers (x86-64 has 16)
    unsigned long flags;     // CPU flags (zero, carry, etc.)
    struct fpu_state fpu;    // Floating point registers (can be huge!)
}
On x86-64, that’s typically 30-40 registers worth of data. The kernel literally does:
; Save current process registers
mov [task_A + offset_r0], rax
mov [task_A + offset_r1], rbx
; ... repeat for all registers

; Restore next process registers  
mov rax, [task_B + offset_r0]
mov rbx, [task_B + offset_r1]
; ... repeat for all registers
Why it matters: These are just memory operations, but you’re moving 200-300 bytes. Fast, but not free.

TLB Flush (0.5-2 μs) - The Expensive One

The Translation Lookaside Buffer is a cache of virtual→physical address mappings. Each process has its own address space, so when you switch processes, these mappings become invalid.

The Problem

Process A: virtual address 0x1000 → physical RAM 0x5000
Process B: virtual address 0x1000 → physical RAM 0x8000
Same virtual address, different physical location! The TLB can’t be trusted.

Traditional Solution: Full Flush

// Invalidate ALL TLB entries
flush_tlb_all();  
Now every memory access after the switch will be slow until the TLB repopulates:
1st access: TLB miss → walk page tables (50-200 cycles)
2nd access: TLB miss → walk page tables again
3rd access: TLB miss → walk page tables again
...eventually TLB fills up and things get fast again
This is why a TLB flush costs 0.5-2 μs: you're looking at hundreds of slow memory accesses until the TLB repopulates.

Modern Solution: ASID (Address Space Identifiers)

Instead of flushing, tag each TLB entry with which process it belongs to:
TLB Entry:
  Virtual: 0x1000
  Physical: 0x5000
  ASID: 42  ← Process A's identifier

TLB Entry:
  Virtual: 0x1000
  Physical: 0x8000
  ASID: 57  ← Process B's identifier
Now both mappings coexist! When Process B runs, the CPU only uses entries tagged ASID=57. No flush needed, massive speedup.

Cache Effects (10-100+ μs) - The Silent Killer

This is about your L1/L2/L3 CPU caches going cold.

Before Context Switch (Process A running)

L1 Cache (32 KB): Full of Process A's hot data
L2 Cache (256 KB): More of Process A's working set
L3 Cache (8 MB): Even more Process A data
Every memory access hits L1 cache → 3-4 cycles latency.

After Context Switch (Process B starts)

L1 Cache: Still has Process A's data (useless!)
Process B accesses memory:
Access: 0x2000 → L1 miss (Process A's data here)
             → L2 miss (Process A's data here too)
             → L3 miss (yep, still Process A)
             → Main RAM: 200+ cycles latency
Process B gradually evicts Process A’s data from cache, replacing it with its own. But for those first microseconds (or milliseconds for big working sets), everything is slow.

Real Numbers

  • Cache hit: 3-4 cycles (~1 ns)
  • Cache miss to RAM: 200-300 cycles (~100 ns)
If your code does 1000 memory accesses and they all miss cache, you just burned 100 μs instead of 1 μs.

Scheduler Decision (0.1-1 μs)

The kernel must pick which process runs next. This involves:
// Simplified version of what Linux does
struct task_struct *pick_next_task(struct rq *runqueue) {
    // 1. Check priority queues
    for (int prio = 0; prio < 140; prio++) {
        if (!list_empty(&runqueue->tasks[prio])) {
            return list_first_entry(&runqueue->tasks[prio]);
        }
    }
    
    // 2. Check CFS (Completely Fair Scheduler) red-black tree
    struct task_struct *next = rb_first(&runqueue->cfs_tasks);
    
    // 3. Update statistics, handle real-time constraints
    update_curr(runqueue);
    
    return next;
}
Why it costs time: Walking data structures, comparing priorities, updating runtime statistics. On a system with 100+ runnable processes, this isn’t instant.

Mitigation Strategies Explained

1. CPU Pinning - Cache Locality

# Pin process to CPU 0
taskset -c 0 ./my_app
Why it helps: If your process always runs on CPU 0, that CPU’s cache stays warm with your data. No cold cache penalty on every switch. Trade-off: Less flexible load balancing.

2. Larger Time Slices - Amortize the Cost

Small slice (10ms):  1000 context switches/sec
Large slice (100ms):  100 context switches/sec
If each switch costs 20 μs total overhead:
  • Small: 1000 × 20 μs = 20 ms wasted/sec (2% overhead)
  • Large: 100 × 20 μs = 2 ms wasted/sec (0.2% overhead)
Trade-off: Higher latency for interactive tasks. Your mouse might feel sluggish.

3. User-Space Threading (Green Threads)

Languages like Go use goroutines that switch without kernel involvement:
// These don't trigger context switches!
go task1()  
go task2()
The Go runtime multiplexes thousands of goroutines onto a few OS threads. Switching between goroutines:
  • No TLB flush (same process)
  • No cache flush (same process)
  • No kernel involvement (no syscall overhead)
  • Just save/restore a tiny bit of state
A goroutine switch might be 50-100 ns vs 2-5 μs for a full context switch.

The Big Picture

Context switches aren’t slow because of one thing—it’s death by a thousand cuts:
Register save:    0.3 μs
TLB flush:        1.5 μs  (with ASID: 0 μs!)
Scheduler logic:  0.5 μs
Cache warmup:    50.0 μs  (the real killer)
─────────────────────────
Total:          ~52 μs per switch
At 1000 switches/second, you’re burning 5% of your CPU just on context switching overhead. This is why high-performance systems obsess over reducing context switches.

Zombie and Orphan Processes

Zombie Process

A zombie is a terminated process whose parent hasn’t yet called wait():
#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child exits immediately
        printf("Child exiting\n");
        return 0;
    }
    
    // Parent doesn't call wait() - child becomes zombie
    printf("Parent sleeping... child is now a zombie\n");
    sleep(60);  // During this time, child is zombie
    
    return 0;
}
Check with ps:
$ ps aux | grep Z
user  1001  0.0  0.0  0  0  ?  Z  12:00  0:00 [a.out] <defunct>
Problem: Zombies consume PID entries. A system can run out of PIDs if too many zombies accumulate. Solution: The parent must call wait() or waitpid(), or install a SIGCHLD handler.

Orphan Process

An orphan is a child whose parent terminated first:
#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    
    if (pid > 0) {
        // Parent exits immediately
        printf("Parent exiting, child will be orphaned\n");
        return 0;
    }
    
    // Child continues running
    sleep(5);
    printf("Orphan child: my new parent is %d\n", getppid());
    
    return 0;
}
Output:
Parent exiting, child will be orphaned
Orphan child: my new parent is 1
Orphans are “adopted” by init (PID 1) or a subreaper process, which will properly reap them when they terminate.

Fork Variants

vfork()

vfork() is optimized for the fork-then-exec pattern:
pid_t pid = vfork();

if (pid == 0) {
    // Child: MUST call exec() or _exit() immediately
    // Parent is SUSPENDED until child does so
    execl("/bin/ls", "ls", NULL);
    _exit(1);  // Not exit() — avoid flushing parent's buffers
}
| Aspect | fork() | vfork() |
| --- | --- | --- |
| Address space | Copied (COW) | Shared with parent |
| Parent execution | Continues | Suspended until exec/_exit |
| Safety | Safe for any use | Dangerous — child can corrupt parent |
| Use case | General | fork + immediate exec |

clone() — Linux’s Swiss Army Knife

The clone() system call provides fine-grained control over resource sharing:
#include <sched.h>

// Create new thread (shares everything)
clone(fn, stack, CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, arg);

// Create new process (like fork)
clone(fn, stack, SIGCHLD, arg);

// Create process with new namespace (containers)
clone(fn, stack, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET, arg);
Common Clone Flags:
| Flag | Effect |
| --- | --- |
| CLONE_VM | Share virtual memory |
| CLONE_FS | Share filesystem info (cwd, root) |
| CLONE_FILES | Share file descriptor table |
| CLONE_SIGHAND | Share signal handlers |
| CLONE_THREAD | Same thread group (for pthreads) |
| CLONE_NEWPID | New PID namespace (containers) |
| CLONE_NEWNS | New mount namespace |

PCB Management: How the Kernel Tracks Processes

The kernel doesn’t just store a task_struct for every process; it must be able to find, create, and destroy them efficiently. This is done through several kernel data structures.

1. The Process Table (The Global Registry)

In early operating systems, the process table was a fixed-size array. If the array had 64 slots, you could only run 64 processes. Modern kernels like Linux use a more dynamic approach:
  • Circular Doubly Linked List: All task_struct objects are linked together. This allows the kernel to iterate through every process in the system (e.g., for the ps command).
  • PID Hash Table: Iterating through a list to find a specific PID would be slow (O(N)). Instead, the kernel maintains a hash table that maps a PID to a pointer to its task_struct, allowing O(1) lookups.

2. The PID Allocator

When you call fork(), the kernel needs to give the new process a unique ID.
  • PID Namespace: Each container (like Docker) can have its own PID 1, but globally they have different PIDs.
  • Bitmap Management: The kernel often uses a bitmap where each bit represents a PID. To find a free PID, it looks for the first 0 bit.
  • PID Wrap-around: When PIDs reach the maximum value (e.g., 32768 by default on Linux), the kernel wraps around and starts looking for unused low numbers.

Detailed Process State Transitions

A process is almost never just “Running” or “Ready.” It spends most of its time in complex waiting states.

The Lifecycle of a Request

  1. Ready → Running: The Scheduler picks the process. The CPU context is loaded.
  2. Running → Blocked (Waiting): The process makes a blocking system call (e.g., read() from a slow disk).
    • The kernel moves the process from the Run Queue to a Wait Queue associated with that specific disk device.
    • The process state changes to TASK_INTERRUPTIBLE.
  3. Blocked → Ready: The disk finishes reading. The disk controller triggers a Hardware Interrupt.
    • The kernel’s interrupt handler runs.
    • It identifies which process was waiting for this data.
    • It moves that process from the Wait Queue back to the Run Queue.
    • The state changes to TASK_RUNNING (Ready).
  4. Running → Ready (Preemption): The process has used its entire “Time Slice” (e.g., 10ms).
    • The Timer Interrupt fires.
    • The kernel decides this process has had enough time.
    • It saves the context and puts the process at the end of the Ready queue.

Why “Uninterruptible” (TASK_UNINTERRUPTIBLE) Exists

You may have seen processes in ps with state D. These are in “Deep Sleep.”
  • TASK_INTERRUPTIBLE: The process can be woken up by a signal (like Ctrl+C).
  • TASK_UNINTERRUPTIBLE: The process cannot be woken up by any signal until the I/O finishes.
  • Why? Some kernel operations (like writing critical metadata to disk) are so sensitive that interrupting them halfway would leave the kernel or file system in an inconsistent state. This is why you sometimes can’t kill -9 a process that is stuck waiting for a failing network drive.

The Mechanics of a Context Switch: A Hardware Perspective

A context switch is the most critical “magic trick” an OS performs. Let’s look at what happens at the assembly level during a switch from Process A to Process B.

Step 1: Entering the Kernel

A context switch usually starts with an Interrupt (Timer) or a System Call.
  1. The CPU saves the User Stack Pointer (RSP) and Instruction Pointer (RIP).
  2. The CPU switches to the Kernel Stack of Process A.
  3. The kernel’s entry code saves all general-purpose registers (RAX, RBX, etc.) onto Process A’s kernel stack.

Step 2: The Switch Call

The scheduler decides to run Process B. It calls a function (in Linux, __switch_to).
  1. Save Floating Point State: If Process A was using the FPU or SIMD units for heavy math, the large XMM/YMM registers (SSE/AVX) must be saved. This is expensive, so kernels often use “Lazy FPU Switching.”
  2. Switch Page Tables (CR3): The kernel writes the physical address of Process B’s Page Global Directory into the CR3 register.
    • Effect: The CPU’s Memory Management Unit (MMU) now sees a completely different world. Addresses that meant “Process A’s data” now mean “Process B’s data.”
  3. Switch Kernel Stacks: The kernel changes its internal “Current Task” pointer to Process B. It loads Process B’s saved Kernel Stack Pointer into the CPU’s RSP register.

Step 3: Returning to User Space

  1. The kernel pops Process B’s saved registers from its kernel stack.
  2. The kernel executes the sysret or iret instruction.
  3. The CPU hardware restores the User RIP and User RSP from the stack.
  4. Result: The CPU is now executing Process B’s code exactly where it left off.

Signal Management: Communication via Interruption

Signals are the “software interrupts” of the OS. They allow the kernel or other processes to notify a process of an event.

How Signals are Delivered

Each process has two bitmasks in its PCB:
  • Pending Mask: Which signals have arrived but haven’t been handled yet?
  • Blocked Mask: Which signals is the process currently ignoring?
The Delivery Flow:
  1. Process A calls kill(PID_B, SIGTERM).
  2. The kernel sets the SIGTERM bit in Process B’s Pending Mask.
  3. The kernel checks if B is currently running. If not, it marks B as “Ready” so it can wake up and handle the signal.
  4. When Process B is about to return from the kernel to user mode (after its next time slice or syscall), the kernel checks the Pending mask.
  5. If a signal is pending and not blocked, the kernel hijacks the process’s execution:
    • It pushes a “Signal Frame” onto the user stack.
    • It changes the Instruction Pointer (RIP) to the address of the Signal Handler function.
  6. The user’s handler runs. When it finishes, it calls a special sigreturn syscall to tell the kernel to restore the original execution state.

Process Groups, Sessions, and Job Control

Operating systems organize processes into hierarchies for management (especially in terminal sessions).
  • Process Group: A collection of related processes (e.g., cat file | grep "str"). All processes in a pipeline share a Process Group ID (PGID). This allows you to send a signal (like SIGINT via Ctrl+C) to the entire group at once.
  • Session: A collection of process groups. Usually, one terminal window = one session.
  • Foreground vs. Background: Only one process group in a session can be the “Foreground” group. It is the only one that can read from the keyboard. If a background process tries to read from the terminal, the kernel sends it a SIGTTIN signal, which suspends it.

Summary: The Cost of a Process

When you create a process, you are allocating:
  1. Memory: A new Page Table, unique Stack, and unique Heap.
  2. Kernel Objects: A task_struct, entries in the PID hash table, and an Open File Table.
  3. Time: The overhead of fork() (COW management) and the ongoing cost of context switching.
This high cost is why modern high-performance applications (like web servers or databases) often use Threads or Asynchronous I/O to handle many tasks within a single process.

Interview Deep Dive Questions

Q1: Walk through what happens, step by step, when you type ls in a shell and press Enter.
Complete Answer:
  1. Shell (bash) reads input “ls” from stdin
  2. Shell parses the command and arguments
  3. Shell calls fork() to create child process
    • COW creates lightweight copy
  4. Child process calls execvp("ls", args)
    • Kernel loads /bin/ls executable
    • New code, data, heap, stack are set up
    • File descriptors 0,1,2 remain (inherited)
  5. Parent shell calls waitpid() and blocks
  6. ls process runs, writes to stdout (fd 1)
  7. ls calls exit(0), becomes zombie
  8. Parent’s waitpid() returns, zombie is reaped
  9. Shell displays next prompt
Q2: fork() uses copy-on-write, so why is it still considered expensive?
Answer: Even with COW, fork() still must:
  • Allocate new PID and PCB
  • Copy page table entries (not data, but metadata)
  • Copy file descriptor table
  • Copy signal handlers and other process state
  • Set up memory mappings
For a process with 10GB of virtual memory, copying the page table entries alone can be significant.
Alternatives:
  • vfork(): Suspends parent, shares address space
  • posix_spawn(): Single call that does fork+exec atomically
  • Clone with minimal sharing for containers
Q3: Can you kill -9 a zombie process?
Answer: No. A zombie is already dead: it is not running any code, so it can never execute a signal handler, and signals sent to it are simply discarded.
A zombie exists only because:
  • Its exit status hasn’t been collected by parent
  • Its PCB entry and PID are retained for this purpose
To eliminate zombies:
  • Parent calls wait()/waitpid()
  • Kill the parent — orphaned zombies are adopted by init and reaped
  • Use SIGCHLD handler to auto-reap
// Auto-reap children
signal(SIGCHLD, SIG_IGN);
Q4: How does the cost of a process context switch compare to a thread switch?
Answer:
Aspect            Process Switch              Thread Switch
Address space     Changes                     Same
Page table        Switched (TLB flush)        Not changed
CPU registers     Saved/restored              Saved/restored
Kernel overhead   Higher                      Lower
Cache effects     Worse (different memory)    Better (shared data)
Typical cost      1-10 μs + cache misses      0.1-1 μs
Thread switches within the same process are much cheaper because:
  • No page table switch needed
  • Shared memory means cached data stays valid
  • Only thread-local state needs saving
Q5: How would you design a server to handle 10,000 concurrent connections?
Answer:
Process-per-connection (not recommended):
  • 10,000 processes = massive memory overhead
  • Context switch overhead kills performance
Thread-per-connection:
  • Better but still problematic at 10K
  • Stack memory: 10K × 8MB = 80GB virtual memory
  • Thread switching overhead
Event-driven (epoll/io_uring):
  • Single thread handles many connections
  • Use epoll_wait() to multiplex I/O
  • Non-blocking I/O for all sockets
Hybrid:
  • Multiple worker processes (CPU count)
  • Each uses event loop for many connections
  • Examples: Nginx, Node.js cluster
// epoll example (sketch): one thread multiplexes many connections
#define MAX_EVENTS 64
struct epoll_event events[MAX_EVENTS];

int epfd = epoll_create1(0);
// register each socket: epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

while (1) {
    // Block until at least one registered fd is ready
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < n; i++) {
        handle_event(events[i]);  // handlers must not block
    }
}

Practice Exercises

Exercise 1: Fork Chain

Write a program that creates a chain of N processes (each child creates one grandchild). Print the process tree.

Exercise 2: Zombie Factory

Create a program that generates zombies, then use ps to observe them. Implement proper cleanup.

Exercise 3: Measure Context Switch

Use pipes between two processes to measure context switch time by rapidly passing a token back and forth.

Exercise 4: Custom Shell

Implement a simple shell that can run commands, handle pipes, and manage background processes.

Hands-on Lab: Exploring Processes with fork, exec, wait

These exercises help you see the kernel internals through real system calls.

Lab 1: Basic fork/exec/wait

// lab_fork.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    printf("Parent PID: %d\n", getpid());

    pid_t child = fork();
    if (child == 0) {
        // Child: replace ourselves with /bin/ls
        printf("Child PID: %d, Parent: %d\n", getpid(), getppid());
        execlp("ls", "ls", "-la", "/proc/self", NULL);
        perror("exec failed");
        exit(1);
    }

    // Parent: wait for child
    int status;
    waitpid(child, &status, 0);
    printf("Child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
Compile and run: gcc -o lab_fork lab_fork.c && ./lab_fork

Lab 2: Inspect /proc while running

Run a long-lived process:
// lab_proc.c
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("PID: %d — inspect /proc/%d/* in another terminal\n", getpid(), getpid());
    pause();  // sleep forever
    return 0;
}
In another terminal:
PID=<the printed pid>
cat /proc/$PID/status      # state, memory, signals
cat /proc/$PID/maps        # memory mappings
ls -l /proc/$PID/fd/       # open file descriptors and what they point to
cat /proc/$PID/stack       # kernel stack (may need root)

Lab 3: Observing Zombies

// lab_zombie.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    pid_t child = fork();
    if (child == 0) {
        printf("Child exiting now\n");
        exit(0);
    }
    // Parent does NOT call wait()
    printf("Parent sleeping... child is now a zombie\n");
    sleep(60);  // In another terminal: ps aux | grep Z
    return 0;
}
While sleeping, run ps aux | grep Z to see the zombie. Then kill the parent and watch the zombie disappear (reaped by init).

Lab 4: Measuring fork() cost

// lab_fork_time.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <time.h>

#define ITERATIONS 1000

int main() {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < ITERATIONS; i++) {
        pid_t child = fork();
        if (child == 0) _exit(0);
        waitpid(child, NULL, 0);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Avg fork+wait: %.2f µs\n", (elapsed / ITERATIONS) * 1e6);
    return 0;
}
This gives you a real measurement of fork overhead on your system.

Key Takeaways

Process = Execution Context

PCB contains everything kernel needs: state, memory, files, credentials

Fork + Exec

Unix model: copy then transform. COW makes fork cheap.

Context Switch Cost

Direct cost + cache/TLB effects. Minimize switches for performance.

Zombie/Orphan Handling

Always reap children. Orphans adopted by init.

Next: Threads & Concurrency