Process Management
What is a Process? (From Scratch)
Imagine you want to run a program. You double-click an icon or type a command. What actually happens? The operating system creates a process - a running instance of that program. A process is a program in execution - the fundamental unit of work in an operating system. But what does that really mean?
Program vs Process: The Key Distinction
Program (Static):
- A file on disk containing instructions
- Just bytes stored in a file (e.g., /usr/bin/python3)
- Doesn’t do anything by itself
- Can be copied, deleted, read
- Like a recipe in a cookbook

Process (Dynamic):
- A program that has been loaded into memory and is running
- Has state: current instruction, memory contents, open files
- Consumes resources: CPU time, RAM, file descriptors
- Changes over time as it executes
- Like actually cooking the recipe - active, using ingredients, producing results

Analogy:
- Program = A blueprint for a house (static document)
- Process = Actually building the house (active construction, using materials, changing state)
Why Do We Need Processes?
Without processes:
- Only one program could run at a time
- No way to run multiple instances of the same program
- No isolation between programs
- No way to manage resources per program

With processes:
- Multiple programs run “simultaneously” (the OS switches between them)
- Each program has its own memory space
- The OS can track and limit resource usage per process
- One program crash doesn’t kill others
Real-World Example: Running Multiple Programs
When you use your computer, you might have:
- Web browser (process 1)
- Text editor (process 2)
- Music player (process 3)
- Background tasks (processes 4, 5, 6…)

The OS:
- Gives each process CPU time
- Gives each process its own memory
- Tracks which files each has open
- Can kill one without affecting others
A process is a program in execution — the fundamental unit of work in an operating system. Understanding process management is essential for senior engineering interviews, as it underlies everything from application behavior to container orchestration.
Interview Frequency: Very High (asked in 80%+ of OS interviews)
Key Topics: Process states, fork/exec, context switching, PCB
Time to Master: 8-10 hours
Process vs Program: Deep Dive
Program
- Static entity stored on disk
- Contains code and static data
- Passive — does nothing by itself
- Example: /usr/bin/python3
Process
- Dynamic instance in execution
- Has runtime state (registers, heap, stack)
- Active — consumes CPU, memory, I/O
- Example: Running Python interpreter with PID 1234
The Transformation: Program → Process
Step-by-step: What happens when you run a program (e.g., python3 script.py)?

1. The shell parses the command
- The shell (itself a process) parses the command line
- Identifies the program: python3
- Identifies the arguments: script.py
2. fork()
- The shell creates a copy of itself (the child process)
- The child process will become the Python interpreter
- The parent (shell) will wait for the child to finish
3. exec()
- The child replaces its memory with the Python interpreter program
- Loads /usr/bin/python3 from disk into memory
- Sets up the initial state (registers, stack, heap)
4. Execution
- The program has become a process!
- The CPU begins executing Python interpreter code
- The interpreter reads script.py
- The interpreter executes Python bytecode
- The process consumes CPU cycles and uses memory
5. Termination
- The script finishes or an error occurs
- The process calls exit()
- The OS cleans up: frees memory, closes files, removes the process table entry
- The process is gone, but the program file remains on disk
Multiple Processes from Same Program
Key Insight: You can run the same program multiple times, creating multiple processes. Each instance:
- Has a different PID
- Has a separate memory space
- Can have different data/state
- Runs independently
Interview Insight: “A program becomes a process when loaded into memory and given system resources. Multiple processes can run the same program simultaneously. Each process has its own memory space, file descriptors, and execution state, even if they’re running the same program file.”
Process Lifecycle Story: From Birth to Zombie
To build intuition, follow a single process from creation to termination.
1. Birth: fork() + execve()
Consider running a web server worker:
- The master process starts (PID 100).
- It forks several worker processes (PIDs 101, 102, 103…).
- Each worker execve()s the same nginx binary but handles its own subset of connections.

Each worker gets:
- Its own PID and PCB.
- Its own address space (code, heap, stack).
- Shared open file descriptors inherited from the master (e.g., listening sockets).
2. Life: Running, Ready, and Waiting
Over its lifetime, a worker process moves between classic states:
- Running: Actively executing on a CPU.
- Ready (Runnable): Eligible to run but waiting in the scheduler’s queue.
- Blocked/Waiting: Sleeping on I/O (e.g., read() on a socket) or waiting on a lock.

In ps output, idle workers typically show state S = sleeping (waiting on I/O or events). Under load, you may see R (running) when workers are actively handling requests.
3. Aging: Resource Usage and Limits
As the process runs, the kernel tracks:
- CPU time: utime (user) and stime (system) in the PCB.
- Memory: RSS, virtual size, page faults.
- Open files: Counted against per-process and system-wide limits.

These counters live in the task_struct (PCB), described in detail below.
4. Death: Exit and Zombie State
When a process finishes:
- It calls exit() (explicitly or by returning from main).
- The kernel:
  - Closes file descriptors.
  - Frees the address space.
  - Marks the PCB as a zombie: a minimal entry remains so the parent can read the exit status.

A zombie:
- Has released almost all resources (no memory, no open files).
- Still occupies a PID and a small PCB entry.
- Shows as Z in tools like ps and top.
5. Reaping: Orphans and Init
- The parent must call wait()/waitpid() to reap the child (remove the zombie entry and free the PID).
- If the parent dies without reaping, the child becomes an orphan and is re-parented to PID 1 (systemd or init), which periodically calls wait() to clean up.
Process Memory Layout
Every process has a well-defined memory layout, typically divided into segments. On a 32-bit architecture this totals 4GB of address space (2^32), usually split into User Space (low memory) and Kernel Space (high memory).
Memory Segment Details
| Segment | Direction | Contents | Characteristics |
|---|---|---|---|
| Kernel Space | Top | Kernel code/data | Inaccessible to user mode. Contains PCB, page tables, kernel stack. |
| Stack | Grows Down ↓ | Function calls | Stores local variables, return addresses, stack frames. Auto-managed. |
| Mapping Segment | N/A | Shared libs | Memory mapped files, shared libraries (e.g., libc.so). |
| Heap | Grows Up ↑ | Dynamic allocation | malloc()/new. Manually managed. Fragmentation risk. |
| BSS | Fixed | Uninitialized globals | “Block Started by Symbol”. Initialized to zero by the OS loader. |
| Data | Fixed | Initialized globals | int x = 10;. Read-write static data. |
| Text (Code) | Fixed | Machine code | Read-only to prevent accidental modification. Sharable. |
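To make the layout concrete, here is a small illustrative C program (not part of the original material) that prints an address from each region. On a typical Linux build, text/data/BSS sit at low addresses, the heap above them, and the stack much higher; exact values vary with ASLR.

```c
#include <stdio.h>
#include <stdlib.h>

int global_initialized = 42;      /* Data segment */
int global_uninitialized;         /* BSS segment  */

int main(void) {
    int local = 0;                            /* Stack */
    int *heap = malloc(sizeof(int));          /* Heap  */

    printf("Text  (code):  %p\n", (void *)main);
    printf("Data  (init):  %p\n", (void *)&global_initialized);
    printf("BSS (uninit):  %p\n", (void *)&global_uninitialized);
    printf("Heap:          %p\n", (void *)heap);
    printf("Stack:         %p\n", (void *)&local);

    free(heap);
    return 0;
}
```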
Process Control Block (PCB)
The PCB (or task_struct in Linux) is the kernel’s data structure representing a process.
PCB Information Categories
- Identification
- CPU State
- Memory
- I/O & Files

Identification fields include:
- PID: Unique process identifier
- PPID: Parent process ID
- UID/GID: User and group ownership
- Session ID: For terminal sessions
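As a rough illustration of these categories, here is a simplified PCB-like struct. This is a sketch, not the real Linux task_struct (which has hundreds of fields); the names and sizes are invented for clarity.

```c
/* Simplified, illustrative PCB mirroring the categories above. */
typedef struct pcb {
    /* Identification */
    int pid;
    int ppid;
    int uid, gid;

    /* CPU state (saved/restored on a context switch) */
    unsigned long registers[16];
    unsigned long program_counter;
    unsigned long stack_pointer;

    /* Memory */
    void *page_table_root;          /* root of this process's page tables */
    unsigned long heap_start, heap_end;

    /* I/O & files */
    int open_fds[256];              /* fixed-size table for simplicity */

    /* Scheduling / lifecycle */
    int state;                      /* READY, RUNNING, BLOCKED, ZOMBIE... */
    struct pcb *parent;
} pcb_t;
```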
Process States
A process transitions through various states during its lifetime.
State Definitions
| State | Description | Linux Representation |
|---|---|---|
| New | Process being created | N/A (transient) |
| Ready | Waiting for CPU | TASK_RUNNING (in run queue) |
| Running | Executing on CPU | TASK_RUNNING (current) |
| Blocked/Waiting | Waiting for I/O or event | TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE |
| Zombie | Terminated, waiting for parent | TASK_ZOMBIE |
| Terminated | Fully cleaned up | N/A (removed) |
TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE:
- Interruptible: Process can be woken by signals (common case)
- Uninterruptible: Must complete I/O first (shows as ‘D’ in ps — often disk I/O)
Process Creation: fork() and exec()
The Unix process model is elegant: fork() creates a copy, exec() transforms it.
Why fork() + exec()? The Design Philosophy
The Problem: How do you run a new program?

Naive approach (doesn’t work well):
- Create a new process from scratch
- Load the program into it
- Set up everything

Problems with this approach:
- What if you want to redirect I/O (e.g., program > output.txt)?
- What if you want to set environment variables?
- What if you want to change the working directory first?
- The parent needs to coordinate with the child

The Unix solution:
- fork(): Create an exact copy of the current process (inherits everything)
- Modify the copy: Change I/O, environment, etc. (in the child)
- exec(): Replace the copy’s program with the new program

Benefits:
- Parent and child can coordinate before exec()
- Flexible: The parent can set up the child’s environment
- Simple: fork() just copies, exec() just replaces
- Powerful: Can create complex process hierarchies
Understanding fork(): Creating a Process Copy
fork() — Creating a Child Process
What fork() Does:
- Creates an exact copy of the current process
- Both processes continue execution from the next instruction
- Returns twice:
- In parent: returns child’s PID (positive number)
- In child: returns 0
- On error: returns -1
Step-by-Step Example
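The example program itself is not reproduced here, so the following is a minimal sketch of the kind of code the observations below refer to. The variable x comes from the text; the printed values are illustrative.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 100;
    pid_t pid = fork();          /* returns twice: child's PID in parent, 0 in child */

    if (pid < 0) {
        perror("fork");
        return 1;
    } else if (pid == 0) {
        x += 1;                  /* modifies the child's copy only */
        printf("Child:  pid=%d, x=%d\n", getpid(), x);
    } else {
        x -= 1;                  /* modifies the parent's copy only */
        printf("Parent: pid=%d, child=%d, x=%d\n", getpid(), pid, x);
        wait(NULL);              /* reap the child */
    }
    return 0;
}
```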
Key observations:
- Two separate processes: Each has its own copy of x
- Independent execution: Parent and child can run in any order (scheduling dependent)
- Different PIDs: The parent sees the child’s PID, the child sees 0
- Separate memory: Changes to x in one don’t affect the other

Which process runs first?
- The OS scheduler decides which process runs first
- Both processes are runnable after fork()
- On multi-core systems, they might run simultaneously
What fork() Actually Does: Under the Hood
Step-by-Step: What happens inside fork()? When you call fork(), the kernel performs these steps:
1. Allocate New Process ID (PID)
2. Create Process Control Block (PCB)
3. Copy Memory (Copy-on-Write Optimization)
Traditional approach (old systems):
- Copy all of the parent’s memory immediately
- Expensive! (if the parent uses 1GB, fork takes a long time)

Copy-on-write approach (modern systems):
- fork() only copies page table entries (metadata), not the actual data
- Most processes fork() then immediately exec() (they never write to the shared pages)
- Only pages that are actually modified get copied
4. Copy Other Resources
File Descriptors:
- Child gets copies of the parent’s file descriptors (they refer to the same open files)

Signal Handlers:
- Child inherits the parent’s signal handlers
- Child can change them independently later

Environment:
- Child gets a copy of the parent’s environment
- Changes in the child don’t affect the parent
5. Set Up Parent-Child Relationship
6. Add Child to Scheduler
- Child process added to run queue
- Both parent and child are now runnable
- Scheduler will give both CPU time
7. Return to User Space
In Parent:
- fork() returns the child’s PID (e.g., 1001)
- Parent continues execution

In Child:
- fork() returns 0
- Child continues execution from the same point
Copy-on-Write (COW)
Modern systems don’t actually copy all memory immediately. As described under step 3 above, parent and child initially share the same physical pages marked read-only; the first write to a shared page triggers a fault and the kernel copies just that page.
exec() Family — Replacing Process Image
The exec family of functions replaces the current process image with a new program. The PID remains the same, but the machine code, data, heap, and stack are replaced.
What exec() Does:
- Loads new program from disk into memory
- Replaces current program - old code/data gone
- Sets up new execution environment - new stack, heap, entry point
- Preserves some things - PID, open file descriptors (unless explicitly closed), parent process
- Starts executing new program - never returns (unless error)
Why exec() Doesn’t Return (Normally)
On success, exec() never returns: the calling program’s code has been replaced in memory, so there is nothing to return to. exec() only returns if it fails, in which case:
- It returns -1
- The original program continues
- The error code is in errno
Understanding the Variants
The exec function name tells you exactly what arguments it expects:
- l (list): Arguments are passed as a list of strings (arg0, arg1, ..., NULL).
- v (vector): Arguments are passed as an array of strings (argv[]).
- p (path): Searches the $PATH environment variable for the executable.
- e (environment): Accepts a custom environment variable array.
1. execl() & execv() — Full Path, Default Environment
Use when you have the full path to the binary.
2. execlp() & execvp() — Path Search
Use when you want the OS to find the binary (like running a command in a shell).
3. execle() & execve() — Custom Environment
Use when you need to run a process with specific environment variables (security, isolation). execve is the underlying system call on Linux; all the others are library wrappers around it.
| Function | Path Lookups | Args Format | Environment | Usage Scenario |
|---|---|---|---|---|
| execl | No | List | Inherited | Hardcoded args |
| execlp | Yes | List | Inherited | Shell-like commands |
| execle | No | List | Explicit | Security/Custom Env |
| execv | No | Array | Inherited | Dynamic args |
| execvp | Yes | Array | Inherited | Shell implementation |
| execve | No | Array | Explicit | Low-level Syscall |
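As an illustration of the difference between the variants, here is a minimal sketch contrasting execl and execvp (it assumes ls is on the PATH; error handling is kept minimal):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Variant 1: execl - full path, arguments as a NULL-terminated list:
     *   execl("/bin/ls", "ls", "-l", (char *)NULL);
     */

    /* Variant 2: execvp - searches $PATH, arguments as an array */
    char *argv[] = {"ls", "-l", NULL};
    execvp("ls", argv);

    /* Only reached if exec fails */
    perror("execvp");
    return 1;
}
```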
Context Switching
A context switch is the process of saving one process’s state and restoring another’s.
What Gets Saved/Restored
- CPU registers: general-purpose registers, program counter, stack pointer
- Floating-point/SIMD state
- Memory-management state: the page table base register (CR3 on x86)
- Kernel bookkeeping: the “current task” pointer and kernel stack

Context Switch Overhead
Register Save/Restore (0.1-0.5 μs)
When the kernel switches from Process A to Process B, it must save Process A’s CPU state and load Process B’s CPU state: the general-purpose registers, program counter, stack pointer, and (lazily) the floating-point registers.
TLB Flush (0.5-2 μs) - The Expensive One
The Translation Lookaside Buffer is a cache of virtual→physical address mappings. Each process has its own address space, so when you switch processes, these cached mappings become invalid.

Traditional Solution: Full Flush
Invalidate the entire TLB on every process switch, then pay for page-table walks as the new process touches memory.

Modern Solution: ASID (Address Space Identifiers)
Instead of flushing, tag each TLB entry with the address space it belongs to: for example, Process A’s entries carry one ASID while Process B’s carry ASID=57. The hardware only matches entries whose tag equals the current ASID, so no flush is needed, a massive speedup.
Cache Effects (10-100+ μs) - The Silent Killer
This is about your L1/L2/L3 CPU caches going cold.
Before Context Switch (Process A running)
The caches are warm: they hold Process A’s hot data and instructions, so most memory accesses hit in cache.
After Context Switch (Process B starts)
The caches are cold for B: its early accesses miss and must go to RAM, and by the time A runs again much of its data has been evicted.
Real Numbers
- Cache hit: 3-4 cycles (~1 ns)
- Cache miss to RAM: 200-300 cycles (~100 ns)
Scheduler Decision (0.1-1 μs)
The kernel must pick which process runs next. This involves updating run-queue data structures, recalculating priorities/vruntime, and selecting the best runnable task.
Mitigation Strategies Explained
1. CPU Pinning - Cache Locality
Pinning a process (or thread) to a specific CPU keeps its working set in that CPU’s caches and avoids migrations that would make the caches cold again, as sketched below.
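On Linux, pinning is exposed through the sched_setaffinity() system call (the taskset command wraps the same mechanism). A minimal sketch, with CPU 2 chosen arbitrarily:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);   /* allow this process to run only on CPU 2 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("Pinned PID %d to CPU 2\n", getpid());
    /* ... cache-sensitive work would go here ... */
    return 0;
}
```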
2. Larger Time Slices - Amortize the Cost
- Small time slices: 1,000 switches/sec × 20 μs = 20 ms wasted/sec (2% overhead)
- Large time slices: 100 switches/sec × 20 μs = 2 ms wasted/sec (0.2% overhead)
3. User-Space Threading (Green Threads)
Languages like Go use goroutines that switch without kernel involvement:
- No TLB flush (same process)
- No cache flush (same process)
- No kernel involvement (no syscall overhead)
- Just save/restore a tiny bit of state
The Big Picture
Context switches aren’t slow because of one thing—it’s death by a thousand cuts: register saves, TLB and cache effects, and scheduler work all add up.
Zombie and Orphan Processes
Zombie Process
A zombie is a terminated process whose parent hasn’t yet called wait(). It shows up with state Z in ps.
Orphan Process
An orphan is a child whose parent terminated first. Orphans are “adopted” by init (PID 1) or a subreaper process, which will properly reap them when they terminate.
Fork Variants
vfork()
vfork() is optimized for the fork-then-exec pattern:
| Aspect | fork() | vfork() |
|---|---|---|
| Address space | Copied (COW) | Shared with parent |
| Parent execution | Continues | Suspended until exec/_exit |
| Safety | Safe for any use | Dangerous — child can corrupt parent |
| Use case | General | fork + immediate exec |
clone() — Linux’s Swiss Army Knife
The clone() system call provides fine-grained control over resource sharing:
| Flag | Effect |
|---|---|
| CLONE_VM | Share virtual memory |
| CLONE_FS | Share filesystem info (cwd, root) |
| CLONE_FILES | Share file descriptor table |
| CLONE_SIGHAND | Share signal handlers |
| CLONE_THREAD | Same thread group (for pthreads) |
| CLONE_NEWPID | New PID namespace (containers) |
| CLONE_NEWNS | New mount namespace |
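A minimal sketch of the glibc clone() wrapper creating a thread-like child that shares memory, files, and filesystem state with its parent. The flag combination and stack size are illustrative; real thread libraries do considerably more.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int worker(void *arg) {
    printf("child: pid=%d, arg=%s\n", getpid(), (char *)arg);
    return 0;
}

int main(void) {
    char *stack = malloc(STACK_SIZE);
    if (!stack) { perror("malloc"); return 1; }

    /* Share memory, files, and fs info with the parent (thread-like child).
     * SIGCHLD lets the parent wait() for it like a normal child. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;

    /* The stack grows down on x86, so pass the top of the buffer. */
    pid_t pid = clone(worker, stack + STACK_SIZE, flags, (void *)"hello");
    if (pid == -1) { perror("clone"); return 1; }

    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```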
PCB Management: How the Kernel Tracks Processes
The kernel doesn’t just store a task_struct for every process; it must be able to find, create, and destroy them efficiently. This is done through several kernel data structures.
1. The Process Table (The Global Registry)
In early operating systems, the process table was a fixed-size array. If the array had 64 slots, you could only run 64 processes. Modern kernels like Linux use a more dynamic approach:
- Circular Doubly Linked List: All task_struct objects are linked together. This allows the kernel to iterate through every process in the system (e.g., for the ps command).
- PID Hash Table: Iterating through a list to find a specific PID would be slow (O(n)). Instead, the kernel maintains a hash table that maps a PID to a pointer to its task_struct, allowing for O(1) lookups.
2. The PID Allocator
When you call fork(), the kernel needs to give the new process a unique ID.
- PID Namespace: Each container (like Docker) can have its own PID 1, but globally they have different PIDs.
- Bitmap Management: The kernel often uses a bitmap where each bit represents a PID. To find a free PID, it looks for the first 0 bit.
- PID Wrap-around: When PIDs reach the maximum value (e.g., 32768 by default on Linux), the kernel wraps around and starts looking for unused low numbers.
Detailed Process State Transitions
A process is almost never just “Running” or “Ready.” It spends most of its time in complex waiting states.
The Lifecycle of a Request
- Ready → Running: The Scheduler picks the process. The CPU context is loaded.
- Running → Blocked (Waiting): The process makes a blocking system call (e.g., read() from a slow disk).
  - The kernel moves the process from the Run Queue to a Wait Queue associated with that specific disk device.
  - The process state changes to TASK_INTERRUPTIBLE.
- Blocked → Ready: The disk finishes reading. The disk controller triggers a Hardware Interrupt.
  - The kernel’s interrupt handler runs.
  - It identifies which process was waiting for this data.
  - It moves that process from the Wait Queue back to the Run Queue.
  - The state changes to TASK_RUNNING (Ready).
- Running → Ready (Preemption): The process has used its entire “Time Slice” (e.g., 10 ms).
  - The Timer Interrupt fires.
  - The kernel decides this process has had enough time.
  - It saves the context and puts the process at the end of the Ready queue.
Why “Uninterruptible” (TASK_UNINTERRUPTIBLE) Exists
You may have seen processes in ps with state D. These are in “Deep Sleep.”
- TASK_INTERRUPTIBLE: The process can be woken up by a signal (like Ctrl+C).
- TASK_UNINTERRUPTIBLE: The process cannot be woken up by any signal until the I/O finishes.
- Why? Some kernel operations (like writing critical metadata to disk) are so sensitive that interrupting them halfway would leave the kernel or file system in an inconsistent state. This is why you sometimes can’t kill -9 a process that is stuck waiting for a failing network drive.
The Mechanics of a Context Switch: A Hardware Perspective
A context switch is the most critical “magic trick” an OS performs. Let’s look at what happens at the assembly level during a switch from Process A to Process B.
Step 1: Entering the Kernel
A context switch usually starts with an Interrupt (Timer) or a System Call.
- The CPU saves the User Stack Pointer (RSP) and Instruction Pointer (RIP).
- The CPU switches to the Kernel Stack of Process A.
- The kernel’s entry code saves all general-purpose registers (RAX, RBX, etc.) onto Process A’s kernel stack.
Step 2: The Switch Call
The scheduler decides to run Process B. It calls a function (in Linux, __switch_to).
- Save Floating Point State: If Process A was using the FPU or doing heavy SIMD math, the large XMM/YMM registers (SSE/AVX) must be saved. This is expensive, so kernels often use “Lazy FPU Switching.”
- Switch Page Tables (CR3): The kernel writes the physical address of Process B’s Page Global Directory into the CR3 register.
  - Effect: The CPU’s Memory Management Unit (MMU) now sees a completely different world. Addresses that meant “Process A’s data” now mean “Process B’s data.”
- Switch Kernel Stacks: The kernel changes its internal “Current Task” pointer to Process B. It loads Process B’s saved Kernel Stack Pointer into the CPU’s RSP register.
Step 3: Returning to User Space
- The kernel pops Process B’s saved registers from its kernel stack.
- The kernel executes the sysret or iret instruction.
- The CPU hardware restores the User RIP and User RSP from the stack.
- Result: The CPU is now executing Process B’s code exactly where it left off.
Signal Management: Communication via Interruption
Signals are the “software interrupts” of the OS. They allow the kernel or other processes to notify a process of an event.
How Signals are Delivered
Each process has two bitmasks in its PCB:
- Pending Mask: Which signals have arrived but haven’t been handled yet?
- Blocked Mask: Which signals is the process currently ignoring?
Delivery flow (e.g., SIGTERM):
- Process A calls kill(PID_B, SIGTERM).
- The kernel sets the SIGTERM bit in Process B’s Pending Mask.
- The kernel checks if B is currently running. If not, it marks B as “Ready” so it can wake up and handle the signal.
- When Process B is about to return from the kernel to user mode (after its next time slice or syscall), the kernel checks the Pending mask.
- If a signal is pending and not blocked, the kernel hijacks the process’s execution:
- It pushes a “Signal Frame” onto the user stack.
- It changes the Instruction Pointer (RIP) to the address of the Signal Handler function.
- The user’s handler runs. When it finishes, it calls a special sigreturn syscall to tell the kernel to restore the original execution state.
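For concreteness, here is a minimal sketch of installing a handler with sigaction() so the hijack-and-sigreturn sequence above has something to run. SIGTERM is chosen arbitrarily.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigterm = 0;

static void handle_sigterm(int signo) {
    (void)signo;
    got_sigterm = 1;           /* only async-signal-safe work in a handler */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);  /* no extra signals blocked while handling */

    if (sigaction(SIGTERM, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }

    printf("PID %d waiting for SIGTERM (try: kill %d)\n", getpid(), getpid());
    while (!got_sigterm)
        pause();               /* sleep until any signal arrives */

    printf("SIGTERM received, exiting cleanly\n");
    return 0;
}
```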
Process Groups, Sessions, and Job Control
Operating systems organize processes into hierarchies for management (especially in terminal sessions).
- Process Group: A collection of related processes (e.g., cat file | grep "str"). All processes in a pipeline share a Process Group ID (PGID). This allows you to send a signal (like SIGINT via Ctrl+C) to the entire group at once.
- Session: A collection of process groups. Usually, one terminal window = one session.
- Foreground vs. Background: Only one process group in a session can be the “Foreground” group. It is the only one that can read from the keyboard. If a background process tries to read from the terminal, the kernel sends it a SIGTTIN signal, which suspends it.
Summary: The Cost of a Process
When you create a process, you are allocating:
- Memory: A new page table, a unique stack, and a unique heap.
- Kernel Objects: A task_struct, entries in the PID hash table, and an open file table.
- Time: The overhead of fork() (COW management) and the ongoing cost of context switching.
Interview Deep Dive Questions
Q1: Explain what happens when you type 'ls' in a shell
Complete Answer:
1. The shell (bash) reads the input “ls” from stdin
2. The shell parses the command and arguments
3. The shell calls fork() to create a child process
   - COW creates a lightweight copy
4. The child process calls execvp("ls", args)
   - The kernel loads the /bin/ls executable
   - New code, data, heap, and stack are set up
   - File descriptors 0, 1, 2 remain (inherited)
5. The parent shell calls waitpid() and blocks
6. The ls process runs and writes to stdout (fd 1)
7. ls calls exit(0) and becomes a zombie
8. The parent’s waitpid() returns and the zombie is reaped
9. The shell displays the next prompt
Q2: Why is fork() before exec() expensive?
Answer:
Even with COW, fork() still must:
- Allocate a new PID and PCB
- Copy page table entries (not the data, but the metadata)
- Copy the file descriptor table
- Copy signal handlers and other process state
- Set up memory mappings

Alternatives:
- vfork(): Suspends the parent and shares the address space
- posix_spawn(): A single call that does fork+exec atomically
- clone() with minimal sharing (used for containers)
Q3: Can a zombie process be killed with kill -9?
Answer:
No. A zombie is already dead — it’s not running any code, and kill sends signals to running processes.

A zombie exists only because:
- Its exit status hasn’t been collected by its parent
- Its PCB entry and PID are retained for this purpose

The ways to get rid of it:
- The parent calls wait()/waitpid()
- Kill the parent — orphaned zombies are adopted by init and reaped
- Use a SIGCHLD handler to auto-reap
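A minimal sketch of the SIGCHLD auto-reap approach; the WNOHANG loop handles the case where several children exit before the handler runs.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void reap_children(int signo) {
    (void)signo;
    /* Reap every child that has exited; WNOHANG keeps this non-blocking. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sa.sa_flags = SA_RESTART;            /* restart interrupted syscalls */
    sigaction(SIGCHLD, &sa, NULL);

    for (int i = 0; i < 3; i++) {
        if (fork() == 0)
            _exit(0);                    /* children exit immediately */
    }
    sleep(1);                            /* children get reaped by the handler */
    printf("no zombies left (check with: ps aux | grep Z)\n");
    return 0;
}
```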
Q4: What's the difference between process and thread context switch?
Answer:
Thread switches within the same process are much cheaper because:
| Aspect | Process Switch | Thread Switch |
|---|---|---|
| Address space | Changes | Same |
| Page table | Switched (TLB flush) | Not changed |
| CPU registers | Saved/restored | Saved/restored |
| Kernel overhead | Higher | Lower |
| Cache effects | Worse (different memory) | Better (shared data) |
| Typical cost | 1-10 μs + cache misses | 0.1-1 μs |
- No page table switch needed
- Shared memory means cached data stays valid
- Only thread-local state needs saving
Q5: Design a system to handle 10,000 concurrent connections
Answer:
Process-per-connection (not recommended):
- 10,000 processes = massive memory overhead
- Context switch overhead kills performance

Thread-per-connection:
- Better, but still problematic at 10K
- Stack memory: 10K × 8MB = 80GB of virtual memory
- Thread switching overhead

Event-driven, single process:
- A single thread handles many connections
- Uses epoll_wait() to multiplex I/O
- Non-blocking I/O for all sockets

Hybrid: event loops across multiple workers (recommended):
- Multiple worker processes (one per CPU)
- Each uses an event loop for many connections
- Examples: Nginx, Node.js cluster
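A minimal single-process sketch of the event-driven approach. This is illustrative only: port 8080 is arbitrary, error handling is mostly omitted, and the multi-worker setup used by Nginx is not shown.

```c
/* Event-loop sketch: one process, non-blocking sockets, epoll. */
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 1024

static void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, SOMAXCONN);
    set_nonblocking(listen_fd);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);  /* block until activity */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                /* Accept every pending connection and register it */
                int client;
                while ((client = accept(listen_fd, NULL, NULL)) >= 0) {
                    set_nonblocking(client);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
                }
            } else {
                /* Echo whatever the client sent; close on EOF or error */
                char buf[4096];
                ssize_t len = read(fd, buf, sizeof(buf));
                if (len <= 0)
                    close(fd);             /* kernel drops the fd from epoll */
                else
                    write(fd, buf, (size_t)len);
            }
        }
    }
}
```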
Practice Exercises
Fork Chain
Write a program that creates a chain of N processes (each child creates one grandchild). Print the process tree.
Zombie Factory
Create a program that generates zombies, then use ps to observe them. Implement proper cleanup.
Measure Context Switch
Use pipes between two processes to measure context switch time by rapidly passing a token back and forth.
Hands-on Lab: Exploring Processes with fork, exec, wait
These exercises help you see the kernel internals through real system calls.
Lab 1: Basic fork/exec/wait
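The lab’s source file is not included here; the following is a minimal sketch of what lab_fork.c could contain (fork a child, exec ls, wait for it), then compile and run it with the command below.

```c
/* lab_fork.c - sketch: fork a child, exec "ls -l", wait for it. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* Child: replace this process image with ls */
        char *argv[] = {"ls", "-l", NULL};
        execvp("ls", argv);
        perror("execvp");          /* only reached if exec fails */
        _exit(127);
    }

    /* Parent: block until the child terminates, then report its status */
    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
        printf("child %d exited with status %d\n", pid, WEXITSTATUS(status));
    return 0;
}
```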
gcc -o lab_fork lab_fork.c && ./lab_fork
Lab 2: Inspect /proc while running
Run a long-lived process (e.g., sleep 1000 &), then inspect its entry under /proc/<PID>, for example /proc/<PID>/status, /proc/<PID>/fd, and /proc/<PID>/maps.
Lab 3: Observing Zombies
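The zombie-producing program is not shown here; a minimal sketch: the child exits immediately while the parent sleeps without ever calling wait().

```c
/* zombie.c - sketch: child exits right away; parent never reaps it,
 * so the child lingers as a zombie until the parent exits. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);                      /* child terminates immediately */
    }
    printf("parent=%d child=%d (child is now a zombie)\n", getpid(), pid);
    sleep(60);                         /* parent stays alive, never calls wait() */
    return 0;
}
```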
Run it, then use ps aux | grep Z to see the zombie. Then kill the parent and watch the zombie disappear (reaped by init).
Lab 4: Measuring fork() cost
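The measurement program is not included here; a rough sketch that times many fork()+_exit()+waitpid() round trips. Note this measures the whole round trip rather than fork() alone, and results vary widely by machine.

```c
/* fork_bench.c - sketch: time many fork()+exit()+wait() round trips. */
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const int iterations = 1000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                  /* child does nothing and exits */
        waitpid(pid, NULL, 0);         /* parent reaps immediately */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_us = (end.tv_sec - start.tv_sec) * 1e6 +
                      (end.tv_nsec - start.tv_nsec) / 1e3;
    printf("average fork+exit+wait: %.1f microseconds\n", total_us / iterations);
    return 0;
}
```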
Key Takeaways
Process = Execution Context
PCB contains everything kernel needs: state, memory, files, credentials
Fork + Exec
Unix model: copy then transform. COW makes fork cheap.
Context Switch Cost
Direct cost + cache/TLB effects. Minimize switches for performance.
Zombie/Orphan Handling
Always reap children. Orphans adopted by init.
Next: Threads & Concurrency →