Process Management

What is a Process? (From Scratch)

Imagine you want to run a program. You double-click an icon or type a command. What actually happens? The operating system creates a process - a running instance of that program. A process is a program in execution - the fundamental unit of work in an operating system. But what does that really mean?

Program vs Process: The Key Distinction

Program (Static):

A file on disk containing instructions
Just bytes stored in a file (e.g., /usr/bin/python3)
Doesn’t do anything by itself
Can be copied, deleted, read
Like a recipe in a cookbook

Process (Dynamic):

A program that has been loaded into memory and is running
Has state: current instruction, memory contents, open files
Consumes resources: CPU time, RAM, file descriptors
Changes over time as it executes
Like actually cooking the recipe - active, using ingredients, producing results

Analogy:

Program = A blueprint for a house (static document)
Process = Actually building the house (active construction, using materials, changing state)

Why Do We Need Processes?

Without processes:

Only one program could run at a time
No way to run multiple instances of same program
No isolation between programs
No way to manage resources per program

With processes:

Multiple programs run “simultaneously” (OS switches between them)
Each program has its own memory space
OS can track and limit resource usage per process
One program crash doesn’t kill others

Real-World Example: Running Multiple Programs

When you use your computer, you might have:

Web browser (process 1)
Text editor (process 2)
Music player (process 3)
Background tasks (processes 4, 5, 6…)

Each is a separate process. The OS:

Gives each process CPU time
Gives each process its own memory
Tracks which files each has open
Can kill one without affecting others

Understanding process management is essential for senior engineering interviews, because it underlies everything from application behavior to container orchestration.

Interview Frequency: Very High (asked in 80%+ of OS interviews)
Key Topics: Process states, fork/exec, context switching, PCB
Time to Master: 8-10 hours

Process vs Program: Deep Dive

Program

Static entity stored on disk
Contains code and static data
Passive — does nothing by itself
Example: /usr/bin/python3

Process

Dynamic instance in execution
Has runtime state (registers, heap, stack)
Active — consumes CPU, memory, I/O
Example: Running Python interpreter with PID 1234

Detailed Comparison

Program (The File):

Location: /usr/bin/python3
Size: 4,832,456 bytes
Type: Executable binary (ELF format on Linux)
Contents:
  - Machine code instructions
  - Static data (constants, strings)
  - Metadata (entry point, required libraries)

Process (The Running Instance):

Process ID: 1234
State: Running
Memory: 45 MB virtual, 12 MB physical
CPU Time: 2.3 seconds
Open Files: stdin, stdout, stderr, script.py
Parent: Shell (PID 567)
Children: None

The Transformation: Program → Process

Step-by-step: What happens when you run a program?

$ python3 script.py

1. Shell receives command

Shell (itself a process) parses the command
Identifies program: python3
Identifies arguments: script.py

2. Shell calls fork()

Creates a copy of itself (child process)
Child process will become the Python interpreter
Parent (shell) will wait for child to finish

3. Child calls exec()

Replaces its memory with Python interpreter program
Loads /usr/bin/python3 from disk into memory
Sets up initial state (registers, stack, heap)
Program has become a process!

4. Process starts executing

CPU begins executing Python interpreter code
Interpreter reads script.py
Interpreter executes Python bytecode
Process consumes CPU cycles, uses memory

5. Process terminates

Script finishes or error occurs
Process calls exit()
OS cleans up: frees memory and closes files; the process table entry is removed once the parent has collected the exit status
Process is gone, program file remains on disk

Multiple Processes from Same Program

Key Insight: You can run the same program multiple times, creating multiple processes:

$ python3 script1.py &    # Process 1 (PID 1001)
$ python3 script2.py &    # Process 2 (PID 1002)
$ python3 script1.py &    # Process 3 (PID 1003) - same program as Process 1!

Each process:

Has different PID
Has separate memory space
Can have different data/state
Runs independently

Example: Web Server

Program: /usr/sbin/nginx (one file on disk)
Processes:
  - PID 100: Master process (manages workers)
  - PID 101: Worker process (handles requests)
  - PID 102: Worker process (handles requests)
  - PID 103: Worker process (handles requests)
  
All running the same program, but different processes with different roles!

Interview Insight: “A program becomes a process when loaded into memory and given system resources. Multiple processes can run the same program simultaneously. Each process has its own memory space, file descriptors, and execution state, even if they’re running the same program file.”

Process Lifecycle Story: From Birth to Zombie

To build intuition, follow a single process from creation to termination.

1. Birth: `fork()` + `execve()`

Consider running a web server worker:

$ nginx -g 'daemon off;'

The master process starts (PID 100).
It forks several worker processes (PIDs 101, 102, 103…).
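Each worker keeps running the nginx code it inherited from the master (no second execve() is needed; the execve() happened when the shell launched the master) but handles its own subset of connections.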

At this point, each worker has:

Its own PID and PCB.
Its own address space (code, heap, stack).
Shared open file descriptors inherited from the master (e.g., listening sockets).

2. Life: Running, Ready, and Waiting

Over its lifetime, a worker process moves between classic states:

Running: Actively executing on a CPU.
Ready (Runnable): Eligible to run but waiting in the scheduler’s queue.
Blocked/Waiting: Sleeping on I/O (e.g., read() on a socket) or waiting on a lock.

You can observe this directly:

$ ps -eo pid,ppid,state,cmd | grep nginx
  100     1 S nginx: master process
  101   100 S nginx: worker process

S = sleeping (waiting on I/O or events).
Under load, you may see R (running) when workers are actively handling requests.

3. Aging: Resource Usage and Limits

As the process runs, the kernel tracks:

CPU time: utime (user) and stime (system) in the PCB.
Memory: RSS, virtual size, page faults.
Open files: Counted against per-process and system-wide limits.

Tools to explore:

$ ps -p 101 -o pid,etime,rss,pcpu,cmd
$ cat /proc/101/status
$ cat /proc/101/limits

These reflect fields kept in the task_struct (PCB) described in the Process Control Block section.

4. Death: Exit and Zombie State

When a process finishes:

It calls exit() (explicitly or by returning from main).
The kernel:
- Closes file descriptors.
- Frees the address space.
- Marks the PCB as a zombie: minimal entry remains so the parent can read the exit status.

A zombie process:

Has released almost all resources (no memory, no open files).
Still occupies a PID and a small PCB entry.
Shows as Z in tools:

$ ps -eo pid,ppid,state,cmd | awk '$3 == "Z"'
  4242  100 Z [my_child] <defunct>

5. Reaping: Orphans and Init

The parent must call wait() / waitpid() to reap the child (remove the zombie entry and free the PID).
If the parent dies without reaping, the child becomes an orphan and is re-parented to PID 1 (systemd or init), which periodically calls wait() to clean up.

This lifecycle story is what underlies many interview questions about zombies, orphans, and process trees.

Process Memory Layout

Every process has a well-defined memory layout, typically divided into segments. In a 32-bit architecture, this totals 4GB of address space (2^32), usually split into User Space (low memory) and Kernel Space (high memory).

Memory Segment Details

Segment	Direction	Contents	Characteristics
Kernel Space	Top	Kernel code/data	Inaccessible to user mode. Contains PCB, page tables, kernel stack.
Stack	Grows Down ↓	Function calls	Stores local variables, return addresses, stack frames. Auto-managed.
Mapping Segment	N/A	Shared libs	Memory mapped files, shared libraries (e.g., libc.so).
Heap	Grows Up ↑	Dynamic allocation	`malloc()`/`new`. Manually managed. Fragmentation risk.
BSS	Fixed	Uninitialized globals	“Block Started by Symbol”. Initialized to zero by OS loader.
Data	Fixed	Initialized globals	`int x = 10;`. Read-write static data.
Text (Code)	Fixed	Machine code	Read-only to prevent accidental modification. Sharable.

Stack vs Heap Collision: On legacy systems with small address spaces, the stack (growing down) could collide with the heap (growing up), causing silent memory corruption. Modern OSs place unmapped guard pages below the stack, so runaway stack growth triggers a fault (a stack overflow) instead, and 64-bit address spaces leave far more room between the two. Separately, ASLR (Address Space Layout Randomization) randomizes segment locations for security.

Process Control Block (PCB)

The PCB (or task_struct in Linux) is the kernel’s data structure representing a process:

// Simplified view of Linux task_struct
struct task_struct {
    // Process Identification
    pid_t pid;                    // Process ID
    pid_t tgid;                   // Thread Group ID
    
    // Process State
    volatile long state;          // RUNNING, SLEEPING, etc.
    
    // Scheduling Information
    int prio, static_prio;        // Priority values
    struct sched_entity se;       // Scheduler entity
    
    // Memory Management
    struct mm_struct *mm;         // Memory descriptor
    
    // File System
    struct files_struct *files;   // Open file table
    struct fs_struct *fs;         // Filesystem info
    
    // Credentials
    const struct cred *cred;      // Security credentials
    
    // Parent/Child Relationships
    struct task_struct *parent;   // Parent process
    struct list_head children;    // Child processes
    
    // CPU Context (saved on context switch)
    struct thread_struct thread;  // CPU-specific state
    
    // Signals
    struct signal_struct *signal; // Shared signal state (handlers live in a separate sighand_struct)
};

PCB Information Categories

The fields of the PCB fall into four groups: identification, CPU state, memory, and I/O and files. The identification fields are:

PID: Unique process identifier
PPID: Parent process ID
UID/GID: User and group ownership
Session ID: For terminal sessions

Process States

A process transitions through various states during its lifetime:

State Definitions

State	Description	Linux Representation
New	Process being created	N/A (transient)
Ready	Waiting for CPU	TASK_RUNNING (in run queue)
Running	Executing on CPU	TASK_RUNNING (current)
Blocked/Waiting	Waiting for I/O or event	TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE
Zombie	Terminated, waiting for parent	EXIT_ZOMBIE (TASK_ZOMBIE in older kernels)
Terminated	Fully cleaned up	N/A (removed)

TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE:

Interruptible: Process can be woken by signals (common case)
Uninterruptible: Must complete I/O first (shows as ‘D’ in ps — often disk I/O)

Process Creation: fork() and exec()

The Unix process model is elegant: fork() creates a copy, exec() transforms it.

Why fork() + exec()? The Design Philosophy

The Problem: How do you run a new program? Naive approach (doesn’t work well):

Create new process from scratch
Load program into it
Set up everything

Problems:

What if you want to redirect I/O (e.g., program > output.txt)?
What if you want to set environment variables?
What if you want to change working directory first?
Parent needs to coordinate with child

Unix Solution: fork() + exec()

fork(): Create exact copy of current process (inherits everything)
Modify the copy: Change I/O, environment, etc. (in the child)
exec(): Replace the copy’s program with new program
Parent and child can coordinate before exec()

Benefits:

Flexible: Parent can set up child environment
Simple: fork() just copies, exec() just replaces
Powerful: Can create complex process hierarchies

Understanding fork(): Creating a Process Copy

fork() — Creating a Child Process

What fork() Does:

Creates an exact copy of the current process
Both processes continue execution from the next instruction
Returns twice:
- In parent: returns child’s PID (positive number)
- In child: returns 0
- On error: returns -1

Key Point: After fork(), there are two nearly identical processes running the same code. They differ in PID and parent PID, but the return value of fork() is what lets the code tell which one it is running in.

Step-by-Step Example

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int x = 100;
    
    printf("Before fork: x = %d, PID = %d\n", x, getpid());
    fflush(stdout);  // Flush now, or buffered output is duplicated into the child
    
    pid_t pid = fork();  // ← THE MAGIC HAPPENS HERE
    
    // At this point, we have TWO processes running!
    // Both execute the code below
    
    if (pid < 0) {
        // Error occurred (only reached if fork failed)
        perror("fork failed");
        return 1;
    } 
    else if (pid == 0) {
        // CHILD PROCESS: fork() returned 0
        // This code ONLY runs in the child
        x += 50;
        printf("Child: x = %d, PID = %d, Parent PID = %d\n", 
               x, getpid(), getppid());
        // Child exits here
    } 
    else {
        // PARENT PROCESS: fork() returned child's PID (> 0)
        // This code ONLY runs in the parent
        x -= 50;
        printf("Parent: x = %d, PID = %d, Child PID = %d\n", 
               x, getpid(), pid);
        wait(NULL);  // Wait for child to terminate
        // Parent continues...
    }
    
    return 0;
}

Execution Timeline:

Time    Parent Process (PID 1000)          Child Process (PID 1001)
─────────────────────────────────────────────────────────────────────
T0      x = 100
        printf("Before fork...")
        Output: "Before fork: x = 100, PID = 1000"
        
T1      pid = fork() ────────────────────> fork() returns 0
        fork() returns 1001 (child PID)    Process created!
                                           x = 100 (copy)
                                           
T2      pid == 1001 (true)                pid == 0 (true)
        Enters else block                 Enters else if block
        
T3      x -= 50  (x = 50)                 x += 50  (x = 150)
        printf("Parent...")                printf("Child...")
        Output: "Parent: x = 50..."        Output: "Child: x = 150..."
        
T4      wait(NULL) ──────────────────────> Process exits
        (blocks until child finishes)
        
T5      wait() returns
        return 0
        Process exits

Key Observations:

Two separate processes: Each has its own copy of x
Independent execution: Parent and child can run in any order (scheduling dependent)
Different return values: fork() returns the child’s PID in the parent and 0 in the child
Separate memory: Changes to x in one don’t affect the other

Output (order may vary):

Before fork: x = 100, PID = 1000
Parent: x = 50, PID = 1000, Child PID = 1001
Child: x = 150, PID = 1001, Parent PID = 1000

Why the output order might differ:

OS scheduler decides which process runs first
Both processes are runnable after fork()
On multi-core systems, they might run simultaneously

What fork() Actually Does: Under the Hood

Step-by-Step: What happens inside fork()? When you call fork(), the kernel performs these steps:

1. Allocate New Process ID (PID)

Kernel maintains a process table:
Before fork():
  PID 1000: [Parent process data]

After fork():
  PID 1000: [Parent process data]
  PID 1001: [Child process data] ← New entry created

2. Create Process Control Block (PCB)

Parent's PCB:               Child's PCB (copy):
- PID: 1000                 - PID: 1001 (new)
- State: Running            - State: Ready (runnable, not yet scheduled)
- Memory: page tables A     - Memory: page tables B (same virtual layout, pages shared copy-on-write)
- Open files: [0,1,2]       - Open files: [0,1,2] (copied table, same underlying open files)
- Registers: [saved]        - Registers: [copy of parent's, except fork()'s return value]
- Parent: 567               - Parent: 1000 (points to parent)

3. Copy Memory (Copy-on-Write Optimization)

Traditional approach (old systems):

Copy all of parent’s memory immediately
Expensive! (if parent uses 1GB, fork takes time)

Modern approach (Copy-on-Write):

Step 1: Mark parent's pages as "copy-on-write"
  Parent memory pages: [Read-Write] → [Read-Only, COW]

Step 2: Child's page table points to same physical pages
  Parent virtual 0x1000 → Physical 0x5000
  Child virtual 0x1000  → Physical 0x5000 (same page!)

Step 3: When either process writes:
  - CPU detects write to read-only page
  - Kernel allocates new physical page
  - Kernel copies original page to new page
  - Updates page table to point to new page
  - Allows write to proceed

Why COW is fast:

fork() only copies page table entries (metadata), not actual data
Most processes fork() then immediately exec() (don’t write to shared pages)
Only pages that are actually modified get copied

4. Copy Other Resources

File Descriptors:

Parent has open files:
  fd 0: stdin
  fd 1: stdout  
  fd 2: stderr
  fd 3: /home/user/file.txt

Child gets copies:
  fd 0: stdin (same terminal)
  fd 1: stdout (same terminal)
  fd 2: stderr (same terminal)
  fd 3: /home/user/file.txt (same file, shared offset)

Signal Handlers:

Child inherits parent’s signal handlers
Child can change them independently later

Environment Variables:

Child gets copy of parent’s environment
Changes in child don’t affect parent

5. Set Up Parent-Child Relationship

Parent's PCB:
  children: [1001]  ← Points to child

Child's PCB:
  parent: 1000      ← Points to parent
  ppid: 1000         ← Parent PID

6. Add Child to Scheduler

Child process added to run queue
Both parent and child are now runnable
Scheduler will give both CPU time

7. Return to User Space

In Parent:

fork() returns child’s PID (e.g., 1001)
Parent continues execution

In Child:

fork() returns 0
Child continues execution from same point

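The return value is what distinguishes them: both processes resume at the same instruction with the same memory contents, so the code checks what fork() returned to tell parent from child:

pid_t pid = fork();
if (pid == 0) {
    // Child: fork returned 0
} else if (pid > 0) {
    // Parent: fork returned child's PID
} else {
    // fork failed: no child was created
}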

Copy-on-Write (COW)

Modern systems don’t actually copy all memory immediately:

Initial state: After fork(), parent and child share the same physical pages, marked read-only.
Write attempt: When either process tries to write, a page fault occurs.
Copy made: The kernel copies only that specific page for the writer.
Continue: The process continues with its own private copy of that page.

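Why COW? Many processes fork() then immediately exec(), so copying all memory would be wasted work. With COW, the cost of fork() scales with the size of the parent’s page tables rather than with the amount of data it holds, so it stays cheap for most processes (though it is not constant-time for very large address spaces).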

exec() Family — Replacing Process Image

The exec family of functions replaces the program running in the current process with a new one. The PID stays the same, but the machine code, data, heap, and stack are replaced.

What exec() Does:

Loads new program from disk into memory
Replaces current program - old code/data gone
Sets up new execution environment - new stack, heap, entry point
Preserves some things - PID, parent process, open file descriptors (except those marked close-on-exec with FD_CLOEXEC)
Starts executing new program - never returns (unless error)

Key Point: exec() replaces the process, it doesn’t create a new one. The process continues with a new identity.

Why exec() Doesn’t Return (Normally)

execvp("ls", args);
printf("This line is NEVER reached if exec succeeds!\n");

Why? The old program’s code is gone. It’s been replaced by the new program. There’s no code to return to!

If exec() fails:

Returns -1
Original program continues
Error code in errno

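#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    
    if (pid < 0) {
        perror("fork failed");
        return 1;
    }
    
    if (pid == 0) {
        // Child: replace with ls command
        // Using execvp (Vector, Path search)
        char *args[] = {"ls", "-la", "/home", NULL};
        execvp("ls", args);
        
        // Only reached if exec fails
        perror("execvp failed");
        _exit(127);  // Leave without running the parent's atexit handlers
    }
    
    wait(NULL);  // Parent: reap the child
    return 0;
}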

Understanding the Variants

The exec function name tells you exactly what arguments it expects:

l (list): Arguments are passed as a list of strings (arg0, arg1, ..., NULL).
v (vector): Arguments are passed as an array of strings (argv[]).
p (path): Searches the $PATH environment variable for the executable.
e (environment): Accepts a custom environment variable array.

1. execl() & execv() — Full Path, Default Environment

Use when you have the full path to the binary.

// List version
execl("/bin/ls", "ls", "-l", NULL);

// Vector version
char *args[] = {"ls", "-l", NULL};
execv("/bin/ls", args);

2. execlp() & execvp() — Path Search

Use when you want the binary located through $PATH, the way a shell finds commands (the C library performs the search, not the kernel).

// Finds 'python3' in $PATH
execlp("python3", "python3", "script.py", NULL);

3. execle() & execve() — Custom Environment

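Use when you need to run a process with specific environment variables (security, isolation). execve is the underlying system call on Linux; the others are library wrappers around it.

char *env[] = {"HOME=/usr/home", "LOGNAME=tarzan", NULL};
char *args[] = {"bash", "-c", "env", NULL};

// List version: arguments inline, environment after the NULL terminator
execle("/bin/bash", "bash", "-c", "env", (char *)NULL, env);

// Vector version: the same call with the argument array
execve("/bin/bash", args, env);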

Function	Path Lookups	Args Format	Environment	Usage Scenario
`execl`	No	List	Inherited	Hardcoded args
`execlp`	Yes	List	Inherited	Shell-like commands
`execle`	No	List	Explicit	Security/Custom Env
`execv`	No	Array	Inherited	Dynamic args
`execvp`	Yes	Array	Inherited	Shell implementation
`execve`	No	Array	Explicit	Low-level Syscall

Interview Tip: execve is the system call that does the real work on Linux (alongside its close relative execveat, which takes a directory file descriptor). execl, execlp, etc., are standard C library (libc) wrappers that eventually call execve.

Context Switching

A context switch is the process of saving one process’s state and restoring another’s.

Context Switch Overhead

Context switches are expensive! The direct cost of a switch is typically on the order of 1-10 microseconds, but the indirect costs (a cold TLB and cold caches afterwards) are often much larger.

Register Save/Restore (0.1-0.5 μs)

When the kernel switches from Process A to Process B, it must save Process A’s CPU state and load Process B’s CPU state.

What Gets Saved

// Illustrative layout; the real Linux thread_struct is architecture-specific
struct thread_struct {
    unsigned long sp;        // Stack pointer
    unsigned long ip;        // Instruction pointer (where we'll resume)
    unsigned long regs[16];  // General purpose registers (x86-64 has 16)
    unsigned long flags;     // CPU flags (zero, carry, etc.)
    struct fpu_state fpu;    // Floating point registers (can be huge!)
};

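On x86-64, that’s typically 30-40 registers worth of data. Conceptually, the kernel does the following (real kernels push registers onto the kernel stack and swap stack pointers rather than copying each one into the task structure):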

; Save current process registers
mov [task_A + offset_r0], rax
mov [task_A + offset_r1], rbx
; ... repeat for all registers

; Restore next process registers  
mov rax, [task_B + offset_r0]
mov rbx, [task_B + offset_r1]
; ... repeat for all registers

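Why it matters: These are just memory operations, but the integer registers alone come to a few hundred bytes, and FPU/SIMD state adds anywhere from 512 bytes to a few kilobytes when it has to be saved. Fast, but not free.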

TLB Flush (0.5-2 μs) - The Expensive One

The Translation Lookaside Buffer is a cache of virtual→physical address mappings. Each process has its own address space, so when you switch processes, these mappings become invalid.

The Problem

Process A: virtual address 0x1000 → physical RAM 0x5000
Process B: virtual address 0x1000 → physical RAM 0x8000

Same virtual address, different physical location! The TLB can’t be trusted.

Traditional Solution: Full Flush

// Invalidate ALL TLB entries
flush_tlb_all();  

Now every memory access after the switch will be slow until the TLB repopulates:

1st access: TLB miss → walk page tables (50-200 cycles)
2nd access: TLB miss → walk page tables again
3rd access: TLB miss → walk page tables again
...eventually TLB fills up and things get fast again

This is where the “0.5-2 μs” estimate comes from: you’re looking at hundreds of slow memory accesses.

Modern Solution: ASID (Address Space Identifiers)

Instead of flushing, tag each TLB entry with an identifier for the address space it belongs to (ARM calls this an ASID; x86-64 calls it a PCID):

TLB Entry:
  Virtual: 0x1000
  Physical: 0x5000
  ASID: 42  ← Process A's identifier

TLB Entry:
  Virtual: 0x1000
  Physical: 0x8000
  ASID: 57  ← Process B's identifier

Now both mappings coexist! When Process B runs, the CPU only uses entries tagged ASID=57. No flush needed, massive speedup.

Cache Effects (10-100+ μs) - The Silent Killer

This is about your L1/L2/L3 CPU caches going cold.

Before Context Switch (Process A running)

L1 Cache (32 KB): Full of Process A's hot data
L2 Cache (256 KB): More of Process A's working set
L3 Cache (8 MB): Even more Process A data

Most memory accesses hit L1 cache → 3-4 cycles latency.

After Context Switch (Process B starts)

L1 Cache: Still has Process A's data (useless!)

Process B accesses memory:

Access: 0x2000 → L1 miss (Process A's data here)
             → L2 miss (Process A's data here too)
             → L3 miss (yep, still Process A)
             → Main RAM: 200+ cycles latency

Process B gradually evicts Process A’s data from cache, replacing it with its own. But for those first microseconds (or milliseconds for big working sets), everything is slow.

Real Numbers

Cache hit: 3-4 cycles (~1 ns)
Cache miss to RAM: 200-300 cycles (~100 ns)

If your code does 1000 memory accesses and they all miss cache, you just burned 100 μs instead of 1 μs.

Scheduler Decision (0.1-1 μs)

The kernel must pick which process runs next. This involves:

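// Illustrative pseudocode loosely modeled on Linux; not actual kernel source.
// first_task() and leftmost_task() are hypothetical helpers.
struct task_struct *pick_next_task(struct rq *runqueue) {
    // 1. Real-time tasks first: scan priorities 0-99, highest priority wins
    for (int prio = 0; prio < 100; prio++) {
        if (!list_empty(&runqueue->rt_tasks[prio])) {
            return first_task(&runqueue->rt_tasks[prio]);
        }
    }
    
    // 2. Otherwise take the leftmost node of the CFS (Completely Fair
    //    Scheduler) red-black tree: the task with the least virtual runtime
    struct task_struct *next = leftmost_task(&runqueue->cfs_tasks);
    
    // 3. Update runtime statistics for the task being switched out
    update_curr(runqueue);
    
    return next;
}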

Why it costs time: Walking data structures, comparing priorities, updating runtime statistics. On a system with 100+ runnable processes, this isn’t instant.

Mitigation Strategies Explained

1. CPU Pinning - Cache Locality

# Pin process to CPU 0
taskset -c 0 ./my_app

Why it helps: If your process always runs on CPU 0, that CPU’s cache stays warm with your data. No cold cache penalty on every switch. Trade-off: Less flexible load balancing.

2. Larger Time Slices - Amortize the Cost

Small slice (1 ms):  1000 context switches/sec
Large slice (10 ms):  100 context switches/sec

If each switch costs 20 μs total overhead:

Small: 1000 × 20 μs = 20 ms wasted/sec (2% overhead)
Large: 100 × 20 μs = 2 ms wasted/sec (0.2% overhead)

Trade-off: Higher latency for interactive tasks. Your mouse might feel sluggish.

3. User-Space Threading (Green Threads)

Languages like Go use goroutines that switch without kernel involvement:

// These don't trigger context switches!
go task1()  
go task2()

The Go runtime multiplexes thousands of goroutines onto a few OS threads. Switching between goroutines:

No TLB flush (same process)
No cache flush (same process)
No kernel involvement (no syscall overhead)
Just save/restore a tiny bit of state

A goroutine switch might be 50-100 ns vs 2-5 μs for a full context switch.

The Big Picture

Context switches aren’t slow because of one thing—it’s death by a thousand cuts:

Register save:    0.3 μs
TLB flush:        1.5 μs  (with ASID: 0 μs!)
Scheduler logic:  0.5 μs
Cache warmup:    50.0 μs  (the real killer)
─────────────────────────
Total:          ~52 μs per switch

At 1000 switches/second, you’re burning 5% of your CPU just on context switching overhead. This is why high-performance systems obsess over reducing context switches.

Zombie and Orphan Processes

Zombie Process

A zombie is a terminated process whose parent hasn’t yet called wait():

#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child exits immediately
        printf("Child exiting\n");
        return 0;
    }
    
    // Parent doesn't call wait() - child becomes zombie
    printf("Parent sleeping... child is now a zombie\n");
    sleep(60);  // During this time, child is zombie
    
    return 0;
}

Check with ps:

$ ps aux | awk '$8 ~ /^Z/'
user  1001  0.0  0.0  0  0  ?  Z  12:00  0:00 [a.out] <defunct>

Problem: Zombies consume PID entries. A system can run out of PIDs if too many zombies accumulate.

Solution: Parent must call wait() or waitpid(), or use a SIGCHLD handler.

Orphan Process

An orphan is a child whose parent terminated first:

#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    
    if (pid > 0) {
        // Parent exits immediately
        printf("Parent exiting, child will be orphaned\n");
        return 0;
    }
    
    // Child continues running
    sleep(5);
    printf("Orphan child: my new parent is %d\n", getppid());
    
    return 0;
}

Output:

Parent exiting, child will be orphaned
Orphan child: my new parent is 1

Orphans are “adopted” by init (PID 1) or a subreaper process, which will properly reap them when they terminate.

Fork Variants

vfork()

A vfork() is optimized for the fork-then-exec pattern:

pid_t pid = vfork();

if (pid == 0) {
    // Child: MUST call exec() or _exit() immediately
    // Parent is SUSPENDED until child does so
    execl("/bin/ls", "ls", NULL);
    _exit(1);  // Not exit() — avoid flushing parent's buffers
}

Aspect	fork()	vfork()
Address space	Copied (COW)	Shared with parent
Parent execution	Continues	Suspended until exec/_exit
Safety	Safe for any use	Dangerous — child can corrupt parent
Use case	General	fork + immediate exec

clone() — Linux’s Swiss Army Knife

The clone() system call provides fine-grained control over resource sharing:

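#define _GNU_SOURCE   // Required for clone() in glibc
#include <sched.h>
#include <signal.h>

// fn has type int (*)(void *); stack_top is the highest address of the child's stack

// Create a thread-like task (shares memory, filesystem info, files, signal handlers)
clone(fn, stack_top, CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, arg);

// Create new process (like fork)
clone(fn, stack_top, SIGCHLD, arg);

// Create process in new namespaces (containers); requires CAP_SYS_ADMIN
clone(fn, stack_top, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | SIGCHLD, arg);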

Common Clone Flags:

Flag	Effect
`CLONE_VM`	Share virtual memory
`CLONE_FS`	Share filesystem info (cwd, root)
`CLONE_FILES`	Share file descriptor table
`CLONE_SIGHAND`	Share signal handlers
`CLONE_THREAD`	Same thread group (for pthreads)
`CLONE_NEWPID`	New PID namespace (containers)
`CLONE_NEWNS`	New mount namespace

PCB Management: How the Kernel Tracks Processes

The kernel doesn’t just store a task_struct for every process; it must be able to find, create, and destroy them efficiently. This is done through several kernel data structures.

1. The Process Table (The Global Registry)

In early operating systems, the process table was a fixed-size array. If the array had 64 slots, you could only run 64 processes. Modern kernels like Linux use a more dynamic approach:

Circular Doubly Linked List: All task_struct objects are linked together. This allows the kernel to iterate through every process in the system (e.g., for the ps command).
PID Hash Table: Iterating through a list to find a specific PID would be slow (O(N)). Instead, the kernel maintains an index that maps a PID to its task_struct (a hash table historically, a radix-tree-based IDR in newer kernels), so lookups take roughly constant time.

2. The PID Allocator

When you call fork(), the kernel needs to give the new process a unique ID.

PID Namespace: Each container (like Docker) can have its own PID 1, but globally they have different PIDs.
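Bitmap Management: Kernels have traditionally used a bitmap in which each bit represents a PID; to allocate one, the kernel searches for a 0 bit, starting just after the last PID it handed out. (Newer Linux kernels use the IDR allocator for the same job.)
PID Wrap-around: When PIDs reach the maximum value (e.g., 32768 by default on Linux, configurable via /proc/sys/kernel/pid_max), the kernel wraps around and starts looking for unused low numbers.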

Detailed Process State Transitions

A process is almost never just “Running” or “Ready.” It spends most of its time in complex waiting states.

The Lifecycle of a Request

Ready → Running: The Scheduler picks the process. The CPU context is loaded.
Running → Blocked (Waiting): The process makes a blocking system call (e.g., read() from a slow disk).
- The kernel moves the process from the Run Queue to a Wait Queue associated with that specific disk device.
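- The process state changes to TASK_UNINTERRUPTIBLE for disk I/O (TASK_INTERRUPTIBLE is used for waits that a signal may safely cut short, such as reading from a socket, pipe, or terminal).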
Blocked → Ready: The disk finishes reading. The disk controller triggers a Hardware Interrupt.
- The kernel’s interrupt handler runs.
- It identifies which process was waiting for this data.
- It moves that process from the Wait Queue back to the Run Queue.
- The state changes to TASK_RUNNING (Ready).
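Running → Ready (Preemption): The process has used up its “Time Slice” (on the order of milliseconds; the exact length depends on the scheduler and the current load).
- The Timer Interrupt fires.
- The scheduler decides that another process should run.
- The kernel saves the context and puts the process back on the Run Queue (at the tail in a simple round-robin scheduler; Linux’s fair scheduler reinserts it into a tree ordered by how much CPU time each task has received).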

Why “Uninterruptible” (TASK_UNINTERRUPTIBLE) Exists

You may have seen processes in ps with state D. These are in “Deep Sleep.”

TASK_INTERRUPTIBLE: The process can be woken up by a signal (like Ctrl+C).
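TASK_UNINTERRUPTIBLE: The process cannot be woken up by any signal, not even SIGKILL. Signals stay pending until the I/O finishes.
Why? Some kernel operations (like writing critical metadata to disk) are so sensitive that interrupting them halfway would leave the kernel or file system in an inconsistent state. This is why kill -9 sometimes has no effect on a process that is stuck waiting for a failing network drive: the SIGKILL stays pending until the wait ends. Linux also has TASK_KILLABLE, a variant of uninterruptible sleep that fatal signals can still wake.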

The Mechanics of a Context Switch: A Hardware Perspective

A context switch is the most critical “magic trick” an OS performs. Let’s look at what happens at the assembly level during a switch from Process A to Process B.

Step 1: Entering the Kernel

A context switch usually starts with an Interrupt (Timer) or a System Call.

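On an interrupt, the CPU switches to the Kernel Stack of Process A (its address comes from the TSS) and pushes the User Stack Pointer (RSP), the flags, and the Instruction Pointer (RIP) onto it.
On a syscall instruction, the CPU saves the user RIP in RCX and the flags in R11; the kernel’s entry code then switches to the Kernel Stack itself.
The kernel’s entry code saves all general-purpose registers (RAX, RBX, etc.) onto Process A’s kernel stack.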

Step 2: The Switch Call

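The scheduler decides to run Process B. It calls the context-switch routine (in Linux, context_switch(), which ends up in the architecture-specific __switch_to).

Save Floating Point State: If Process A was using floating-point or SIMD instructions, the large XMM/YMM registers (SSE/AVX) must be saved. This is expensive, so older kernels used “Lazy FPU Switching” and saved the state only when the next process actually touched the FPU. Current x86 Linux kernels save it eagerly with XSAVE-family instructions, in part for security reasons.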
Switch Page Tables (CR3): The kernel writes the physical address of Process B’s Page Global Directory into the CR3 register.
- Effect: The CPU’s Memory Management Unit (MMU) now sees a completely different world. Addresses that meant “Process A’s data” now mean “Process B’s data.”
Switch Kernel Stacks: The kernel changes its internal “Current Task” pointer to Process B. It loads Process B’s saved Kernel Stack Pointer into the CPU’s RSP register.

Step 3: Returning to User Space

The kernel pops Process B’s saved registers from its kernel stack.
The kernel executes the sysret or iret instruction.
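With iret, the CPU hardware restores the User RIP and User RSP from the stack; with sysret, the RIP comes from RCX and the kernel restores the User RSP just before executing the instruction.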
Result: The CPU is now executing Process B’s code exactly where it left off.

Signal Management: Communication via Interruption

Signals are the “software interrupts” of the OS. They allow the kernel or other processes to notify a process of an event.

How Signals are Delivered

Each process has two bitmasks in its PCB:

Pending Mask: Which signals have arrived but haven’t been handled yet?
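Blocked Mask: Which signals has the process asked the kernel to hold back for now? A blocked signal is not ignored; it stays pending and is delivered once the process unblocks it.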

The Delivery Flow:

Process A calls kill(PID_B, SIGTERM).
The kernel sets the SIGTERM bit in Process B’s Pending Mask.
The kernel checks if B is currently running. If B is sleeping in an interruptible wait, the kernel marks it as “Ready” so it can wake up and handle the signal.
When Process B is about to return from the kernel to user mode (after its next time slice or syscall), the kernel checks the Pending mask.
If a signal is pending and not blocked, the kernel hijacks the process’s execution:
- It pushes a “Signal Frame” onto the user stack.
- It changes the Instruction Pointer (RIP) to the address of the Signal Handler function.
The user’s handler runs. When it returns, a small trampoline (set up by the C library or the kernel) invokes the special sigreturn syscall, which tells the kernel to restore the original execution state from the Signal Frame.

Process Groups, Sessions, and Job Control

Operating systems organize processes into hierarchies for management (especially in terminal sessions).

Process Group: A collection of related processes (e.g., cat file | grep "str"). All processes in a pipeline share a Process Group ID (PGID). This allows you to send a signal (like SIGINT via Ctrl+C) to the entire group at once.
Session: A collection of process groups. Usually, one terminal window = one session.
Foreground vs. Background: Only one process group in a session can be the “Foreground” group. It is the only one that can read from the keyboard. If a background process tries to read from the terminal, the kernel sends it a SIGTTIN signal, which suspends it.

Summary: The Cost of a Process

When you create a process, you are allocating:

Memory: A new Page Table, plus a private Stack and Heap (after fork(), these start out shared copy-on-write with the parent).
Kernel Objects: A task_struct, a PID entry in the kernel’s PID lookup structure, and a File Descriptor Table.
Time: The overhead of fork() (COW management) and the ongoing cost of context switching.

This high cost is why modern high-performance applications (like web servers or databases) often use Threads or Asynchronous I/O to handle many tasks within a single process.

Interview Deep Dive Questions

Q1: Explain what happens when you type 'ls' in a shell

Complete Answer:

Shell (bash) reads input “ls” from stdin
Shell parses the command and arguments
Shell calls fork() to create child process
- COW creates lightweight copy
Child process calls execvp("ls", args)
- Kernel loads /bin/ls executable
- New code, data, heap, stack are set up
- File descriptors 0,1,2 remain (inherited)
Parent shell calls waitpid() and blocks
ls process runs, writes to stdout (fd 1)
ls calls exit(0), becomes zombie
Parent’s waitpid() returns, zombie is reaped
Shell displays next prompt

Q2: Why is fork() before exec() expensive?

Answer: Even with COW, fork() still must:

Allocate new PID and PCB
Copy page table entries (not data, but metadata)
Copy file descriptor table
Copy signal handlers and other process state
Set up memory mappings

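For a process with 10GB virtual memory, copying page table entries alone can be significant.

Alternatives:

vfork(): Suspends the parent until the child calls exec() or _exit(); the child borrows the parent’s address space
posix_spawn(): Single call that creates the child and executes the new program, so the library can avoid a full fork() copy
clone(): Linux-specific call whose flags select what is shared or isolated (for example, CLONE_VM to share memory, or CLONE_NEWPID for containers)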

Q3: Can a zombie process be killed with kill -9?

Answer: No. A zombie is already dead: it is not running any code, so a signal has nothing left to act on.

A zombie exists only because:

Its exit status hasn’t been collected by parent
Its PCB entry and PID are retained for this purpose

To eliminate zombies:

Parent calls wait()/waitpid()
Kill the parent — orphaned zombies are adopted by init and reaped
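Set SIGCHLD to SIG_IGN (or install a SIGCHLD handler that calls waitpid()) to auto-reap

// Auto-reap children: with SIGCHLD ignored, exited children never become zombies,
// but the parent can no longer collect their exit status
signal(SIGCHLD, SIG_IGN);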

Q4: What's the difference between process and thread context switch?

Answer:

Aspect	Process Switch	Thread Switch
Address space	Changes	Same
Page table	Switched (TLB flushed unless PCID/ASID tagging is used)	Not changed
CPU registers	Saved/restored	Saved/restored
Kernel overhead	Higher	Lower
Cache effects	Worse (different memory)	Better (shared data)
Typical cost	1-10 μs + cache misses	0.1-1 μs

Thread switches within the same process are much cheaper because:

No page table switch needed
Shared memory means cached data stays valid
Only thread-local state needs saving

Q5: Design a system to handle 10,000 concurrent connections

Answer:

Process-per-connection (not recommended):

10,000 processes = massive memory overhead
Context switch overhead kills performance

Thread-per-connection:

Better but still problematic at 10K
Stack memory: 10K × 8MB = 80GB virtual memory
Thread switching overhead

Event-driven (epoll/io_uring):

Single thread handles many connections
Use epoll_wait() to multiplex I/O
Non-blocking I/O for all sockets

Hybrid:

Multiple worker processes (CPU count)
Each uses event loop for many connections
Examples: Nginx, Node.js cluster

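// epoll example (sketch: socket setup and handle_event() are left to the application)
#include <sys/epoll.h>
#define MAX_EVENTS 64

struct epoll_event events[MAX_EVENTS];
int epfd = epoll_create1(0);
// Register each non-blocking socket first with epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev)
while (1) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);  // sleep until at least one fd is ready
    for (int i = 0; i < n; i++) {
        handle_event(events[i]);  // must not block: one slow handler stalls every connection
    }
}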

Practice Exercises

Fork Chain

Write a program that creates a chain of N processes (each child creates one grandchild). Print the process tree.

Zombie Factory

Create a program that generates zombies, then use ps to observe them. Implement proper cleanup.

Measure Context Switch

Use pipes between two processes to measure context switch time by rapidly passing a token back and forth.

Custom Shell

Implement a simple shell that can run commands, handle pipes, and manage background processes.

Hands-on Lab: Exploring Processes with fork, exec, wait

These exercises help you see the kernel internals through real system calls.

Lab 1: Basic fork/exec/wait

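// lab_fork.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    printf("Parent PID: %d\n", getpid());
    fflush(stdout);  // flush before fork so buffered output is not duplicated in the child

    pid_t child = fork();
    if (child == -1) {
        perror("fork failed");
        return 1;
    }
    if (child == 0) {
        // Child: replace ourselves with ls (located via PATH)
        printf("Child PID: %d, Parent: %d\n", getpid(), getppid());
        fflush(stdout);  // exec discards any unflushed stdio buffer
        execlp("ls", "ls", "-la", "/proc/self", NULL);
        perror("exec failed");  // reached only if exec fails
        _exit(1);
    }

    // Parent: wait for child
    int status;
    waitpid(child, &status, 0);
    if (WIFEXITED(status))
        printf("Child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}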

Compile and run: gcc -o lab_fork lab_fork.c && ./lab_fork

Lab 2: Inspect /proc while running

Run a long-lived process:

// lab_proc.c
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("PID: %d — inspect /proc/%d/* in another terminal\n", getpid(), getpid());
    pause();  // sleep forever
    return 0;
}

In another terminal:

PID=<the printed pid>
cat /proc/$PID/status      # state, memory, signals
cat /proc/$PID/maps        # memory mappings
ls -l /proc/$PID/fd/       # open file descriptors and what they point to (fd is a directory, so cat fails)
sudo cat /proc/$PID/stack  # kernel stack (usually needs root)

Lab 3: Observing Zombies

// lab_zombie.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    pid_t child = fork();
    if (child == 0) {
        printf("Child exiting now\n");
        exit(0);
    }
    // Parent does NOT call wait()
    printf("Parent sleeping... child is now a zombie\n");
    sleep(60);  // In another terminal: ps -eo pid,ppid,stat,comm | grep -w Z
    return 0;
}

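While the parent sleeps, run ps -eo pid,ppid,stat,comm | grep -w Z to see the zombie (a plain grep Z also matches unrelated lines, including the grep command itself). Then kill the parent and watch the zombie disappear (reaped by init).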

Lab 4: Measuring fork() cost

// lab_fork_time.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <time.h>

#define ITERATIONS 1000

int main() {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < ITERATIONS; i++) {
        pid_t child = fork();
        if (child == 0) _exit(0);
        waitpid(child, NULL, 0);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Avg fork+wait: %.2f µs\n", (elapsed / ITERATIONS) * 1e6);
    return 0;
}

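This gives you a real measurement of fork overhead on your system. Each iteration also includes the child’s exit and the parent’s waitpid(), and because this test program maps very little memory, a larger process will take longer to fork.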

Key Takeaways

Process = Execution Context

PCB contains everything kernel needs: state, memory, files, credentials

Fork + Exec

Unix model: copy then transform. COW makes fork cheaper, but page tables and kernel state are still copied.

Context Switch Cost

Direct cost + cache/TLB effects. Minimize switches for performance.

Zombie/Orphan Handling

Always reap children. Orphans adopted by init.
