Virtual Memory
Virtual memory is the abstraction that gives each process its own private address space, decoupled from physical memory. It’s one of the most important OS concepts and is heavily tested in senior interviews.
Interview Frequency: Very High
Key Topics: Paging, TLB, page faults, replacement algorithms
Time to Master: 12-14 hours
Why Virtual Memory?
Isolation
Each process has a private address space. One process can’t corrupt another.
Simplification
Every process thinks it has contiguous memory starting at 0. No coordination needed.
Overcommit
Virtual address space can exceed physical memory, using disk (swap) as an extension.
Sharing
Libraries and code pages can be shared between processes, even when mapped at different virtual addresses.
Address Translation
The Big Picture
The MMU translates every virtual address to a physical address by walking the page tables; the TLB caches recent translations so most accesses skip the walk.
Virtual Address Structure (x86-64, 4KB pages)
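On x86-64 with 4KB pages, a 48-bit virtual address splits into four 9-bit table indices (PML4, PDPT, PD, PT) plus a 12-bit page offset. A minimal sketch of that decomposition, using an arbitrary example address:

```c
#include <stdint.h>
#include <stdio.h>

/* x86-64, 4KB pages: 48-bit VA = four 9-bit indices + 12-bit offset */
int main(void) {
    uint64_t va = 0x7fff12345678ULL;     /* arbitrary example address */
    unsigned pml4 = (va >> 39) & 0x1ff;  /* bits 47:39 -> PML4 index  */
    unsigned pdpt = (va >> 30) & 0x1ff;  /* bits 38:30 -> PDPT index  */
    unsigned pd   = (va >> 21) & 0x1ff;  /* bits 29:21 -> PD index    */
    unsigned pt   = (va >> 12) & 0x1ff;  /* bits 20:12 -> PT index    */
    unsigned off  = va & 0xfff;          /* bits 11:0  -> page offset */
    printf("PML4=%u PDPT=%u PD=%u PT=%u offset=0x%x\n",
           pml4, pdpt, pd, pt, off);
    return 0;
}
```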
Paging Deep Dive
Page Table Entry (PTE)
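A PTE packs the physical frame number together with control bits. A minimal decoder; the flag positions below are the architecturally defined x86-64 bits, while the sample PTE value is made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT  (1ULL << 0)   /* mapping is valid */
#define PTE_WRITABLE (1ULL << 1)   /* writes allowed */
#define PTE_USER     (1ULL << 2)   /* user-mode access allowed */
#define PTE_ACCESSED (1ULL << 5)   /* set by the CPU on any access */
#define PTE_DIRTY    (1ULL << 6)   /* set by the CPU on a write */
#define PTE_NX       (1ULL << 63)  /* no-execute */
#define PTE_FRAME    0x000ffffffffff000ULL  /* bits 51:12: frame address */

int main(void) {
    uint64_t pte = 0x8000000012345063ULL;   /* made-up sample value */
    printf("frame=0x%llx P=%d W=%d U=%d A=%d D=%d NX=%d\n",
           (unsigned long long)(pte & PTE_FRAME),
           !!(pte & PTE_PRESENT), !!(pte & PTE_WRITABLE),
           !!(pte & PTE_USER), !!(pte & PTE_ACCESSED),
           !!(pte & PTE_DIRTY), !!(pte & PTE_NX));
    return 0;
}
```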
Multi-Level Page Tables
- Sparse address spaces: Only allocate tables for used regions
- A 48-bit address space with a single-level table would need 512GB of PTEs (2^36 pages × 8 bytes each)!
- Multi-level: Most entries are NULL pointers, saving memory
Translation Lookaside Buffer (TLB)
The TLB caches recent virtual→physical translations; a microbenchmark sketch follows the characteristics table below.
TLB Characteristics
| Property | Typical Value |
|---|---|
| Size | 64-2048 entries |
| Associativity | Fully associative or 4-8 way |
| Access time | 0.5-1 cycle |
| Miss penalty | 10-100+ cycles |
| Hierarchy | L1 (instruction), L1 (data), L2 (unified) |
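One way to feel the miss penalty is to do the same number of memory touches over a page set that fits in the TLB versus one that does not. A rough sketch (results are machine-dependent; the buffer size and touch counts are arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE 4096

/* Touch one byte per page, cycling through `pages` distinct pages. */
static double touch(volatile char *buf, size_t pages, size_t iters) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        for (size_t p = 0; p < pages; p++)
            buf[p * PAGE] += 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    size_t pages = 65536;               /* 256MB: far beyond TLB reach */
    size_t total = 64000000;            /* equal touches in both runs */
    char *buf = malloc(pages * PAGE);
    if (!buf) return 1;
    for (size_t p = 0; p < pages; p++)  /* pre-fault all pages */
        buf[p * PAGE] = 0;
    printf("64 pages (TLB-resident): %.3fs\n", touch(buf, 64, total / 64));
    printf("65536 pages (thrashing): %.3fs\n", touch(buf, pages, total / pages));
    free(buf);
    return 0;
}
```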
TLB Shootdown
When a page mapping changes (e.g., munmap), the stale translations must be invalidated in every CPU’s TLB:
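A user-space way to observe this cost (a sketch; the magnitude varies by kernel and CPU): time an mmap/munmap loop alone, then with extra busy threads, since threads running on other CPUs put those CPUs on the list that must receive shootdown IPIs. Compile with -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

static atomic_int stop;
static void *spin(void *arg) { (void)arg; while (!stop) ; return NULL; }

/* Map, touch, and unmap a page repeatedly; each munmap may trigger
   shootdown IPIs to every CPU that has run a thread of this process. */
static double loop(int iters) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) break;
        p[0] = 1;            /* populate the mapping (loads a TLB entry) */
        munmap(p, 4096);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("single-threaded:     %.3fs\n", loop(200000));
    pthread_t th[4];
    for (int i = 0; i < 4; i++) pthread_create(&th[i], NULL, spin, NULL);
    printf("with 4 busy threads: %.3fs\n", loop(200000));
    stop = 1;
    for (int i = 0; i < 4; i++) pthread_join(th[i], NULL);
    return 0;
}
```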
Page Faults
A page fault occurs when the virtual address can’t be translated.
Page Fault Handler Flow
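A user-space analogue of the fix-up path: catch SIGSEGV, repair the mapping, and let the CPU retry the faulting instruction. A deliberately simplified demo (calling printf and mprotect from a signal handler is not strictly async-signal-safe, but is fine for a synchronous fault like this):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>

static char *page;

/* The "handler flow": identify the fault, repair the mapping, retry. */
static void on_segv(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    printf("fault at %p\n", si->si_addr);
    mprotect(page, 4096, PROT_READ | PROT_WRITE);  /* fix permissions */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_segv;
    sigaction(SIGSEGV, &sa, NULL);

    page = mmap(NULL, 4096, PROT_READ,             /* read-only mapping */
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    page[0] = 42;       /* write faults; handler fixes it; CPU retries */
    printf("after fix-up: %d\n", page[0]);
    return 0;
}
```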
Page Replacement Algorithms
When physical memory is full, which page should be evicted?
FIFO (First-In-First-Out)
LRU (Least Recently Used)
Clock Algorithm (Second Chance)
Approximates LRU with just a reference bit; a minimal sketch follows the comparison table below.
Comparison
| Algorithm | Faults (example) | Complexity | Notes |
|---|---|---|---|
| OPT | Minimum possible | N/A | Requires future knowledge |
| LRU | Near optimal | O(n) or O(1) with hardware | Stack property (no Belady’s anomaly) |
| Clock | Slightly worse than LRU | O(1) | Practical approximation |
| FIFO | Often poor | O(1) | Simple but Belady’s anomaly |
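A minimal clock sketch over a small frame table (the frame count and access trace are arbitrary): on a miss, the hand sweeps forward, clearing reference bits until it finds an unreferenced victim.

```c
#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 3

/* One clock frame: which page it holds and its reference bit. */
struct frame { int page; bool ref; };

static struct frame frames[NFRAMES];
static int hand;  /* clock hand: next frame to inspect */

/* Returns true on a page fault. */
static bool access_page(int page) {
    for (int i = 0; i < NFRAMES; i++)
        if (frames[i].page == page) { frames[i].ref = true; return false; }
    /* Miss: sweep, clearing reference bits until an unreferenced victim. */
    while (frames[hand].ref) {
        frames[hand].ref = false;          /* second chance */
        hand = (hand + 1) % NFRAMES;
    }
    frames[hand].page = page;
    frames[hand].ref = true;
    hand = (hand + 1) % NFRAMES;
    return true;
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++) frames[i].page = -1;
    int trace[] = {1, 2, 3, 1, 4, 1, 2};
    int faults = 0;
    for (unsigned i = 0; i < sizeof trace / sizeof *trace; i++)
        faults += access_page(trace[i]);
    printf("faults: %d\n", faults);
    return 0;
}
```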
Demand Paging and Copy-on-Write
Demand Paging
Pages are loaded only when first accessed; the initial mapping is empty, and each first touch faults the page in from disk or a zero page.
Copy-on-Write (COW)
Defer copying until modification: fork() marks the pages of both processes read-only, and the first write faults and copies only that page.
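A small demonstration that fork() shares pages until written (the 256MB size is arbitrary): fork returns quickly despite the large heap, and the two copies diverge only at the written page.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    size_t size = 256 * 1024 * 1024;            /* 256MB heap */
    char *buf = malloc(size);
    if (!buf) return 1;
    memset(buf, 1, size);                       /* fault in all pages */

    pid_t pid = fork();                         /* fast: pages shared COW */
    if (pid == 0) {
        buf[0] = 99;                            /* first write copies 1 page */
        printf("child sees:  %d\n", buf[0]);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees: %d\n", buf[0]);        /* still 1: copies diverged */
    free(buf);
    return 0;
}
```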
CPU Extensions for Memory Management
Modern CPUs provide hardware extensions to enhance memory management capabilities:
PAE (Physical Address Extension)
Allows 32-bit CPUs to access more than 4GB of physical memory.
PSE (Page Size Extensions)
Enables page sizes larger than the standard 4KB (e.g., 4MB pages on 32-bit x86).
NX Bit (No-Execute / DEP)
Prevents code execution from data pages, critical for security.
SMEP & SMAP (Supervisor Mode Protections)
Advanced kernel protections: SMEP stops the kernel from executing user-mode code, and SMAP stops it from inadvertently reading or writing user memory.
PCID (Process-Context Identifiers)
Tags TLB entries with an address-space ID so context switches don’t require a full TLB flush.
Huge Pages
Larger pages reduce TLB misses: one 2MB entry covers what would otherwise take 512 4KB entries. An allocation sketch follows the trade-offs below.
Using Huge Pages in Linux
- Pros: Fewer TLB misses, better for large working sets
- Cons: More internal fragmentation
- Cons: Memory compaction overhead (for THP)
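A minimal allocation sketch, assuming Linux: MAP_HUGETLB needs huge pages pre-reserved (e.g., via /proc/sys/vm/nr_hugepages), so the sketch falls back to hinting transparent huge pages with madvise:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 2 * 1024 * 1024;   /* one 2MB huge page */

    /* Explicit huge page: fails unless the hugetlb pool is configured. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        /* Fallback: normal mapping, hint the kernel to use THP. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;
        madvise(p, len, MADV_HUGEPAGE);
    }
    ((char *)p)[0] = 1;             /* touch to actually allocate */
    printf("mapped %zu bytes at %p\n", len, p);
    return 0;
}
```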
Memory-Mapped Files
Map files directly into the virtual address space; a minimal example follows the comparison table.
mmap vs read/write
| Aspect | read()/write() | mmap() |
|---|---|---|
| Copies | Kernel→User buffer | Zero-copy (usually) |
| Random access | Seek + read | Direct pointer access |
| Large files | Memory efficient | Maps entire region |
| Sharing | Requires IPC | MAP_SHARED automatic |
| Small I/O | More efficient | Page fault overhead |
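A minimal file-mapping sketch (counting newlines is just a stand-in workload): the whole file is mapped read-only, and pages are faulted in on demand as the pointer sweeps them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);

    /* Map the whole file; pages are read on demand via page faults. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                       /* mapping stays valid after close */

    long nl = 0;                     /* direct pointer access, no read() */
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') nl++;
    printf("%ld lines\n", nl);
    munmap(data, st.st_size);
    return 0;
}
```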
Interview Deep Dive Questions
Q1: Walk through a memory access from virtual to physical address
Answer (assuming TLB miss, 4KB pages, x86-64; a toy simulation follows the steps):
- CPU generates virtual address (e.g., 0x7fff12345678)
- Check TLB: Not found (TLB miss)
- Page table walk:
- Read CR3 to get PML4 base
- Extract bits [47:39] → index into PML4 → get PDPT address
- Extract bits [38:30] → index into PDPT → get PD address
- Extract bits [29:21] → index into PD → get PT address
- Extract bits [20:12] → index into PT → get PTE
- Each step = 1 memory read (4 total)
- Check PTE:
- Present bit = 1? Continue
- Present bit = 0? Page fault
- Permissions match access type? Continue
- Calculate physical address:
- Physical frame from PTE + offset [11:0] from VA
- Fill TLB with this translation
- Access physical memory through cache hierarchy
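A toy simulation of this walk over heap-allocated tables (the address and frame number are invented; each level is one dependent load, just as in hardware):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define IDX(va, shift) (((va) >> (shift)) & 0x1ff)

/* Each toy table is 512 slots holding the address of the next level
   (0 would mean "not present"). */
static uint64_t *new_table(void) { return calloc(512, sizeof(uint64_t)); }

int main(void) {
    uint64_t *pml4 = new_table(), *pdpt = new_table();
    uint64_t *pd = new_table(), *pt = new_table();
    uint64_t va = 0x7fff12345678ULL;

    /* Install a mapping for va -> pretend physical frame 0xabcd000. */
    pml4[IDX(va, 39)] = (uintptr_t)pdpt;
    pdpt[IDX(va, 30)] = (uintptr_t)pd;
    pd[IDX(va, 21)]   = (uintptr_t)pt;
    pt[IDX(va, 12)]   = 0xabcd000;

    /* The walk: four dependent memory reads, like the MMU performs. */
    uint64_t *l3 = (uint64_t *)(uintptr_t)pml4[IDX(va, 39)];
    uint64_t *l2 = (uint64_t *)(uintptr_t)l3[IDX(va, 30)];
    uint64_t *l1 = (uint64_t *)(uintptr_t)l2[IDX(va, 21)];
    uint64_t pa  = l1[IDX(va, 12)] | (va & 0xfff);  /* frame + offset */

    printf("VA 0x%" PRIx64 " -> PA 0x%" PRIx64 "\n", va, pa);
    return 0;
}
```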
Q2: How does the kernel know if a page fault is valid or should SIGSEGV?
Answer: The kernel maintains Virtual Memory Areas (VMAs), one per contiguous mapping, each recording its address range, permissions, and backing object. On a page fault (a toy classifier follows the list):
- Search VMA tree for faulting address
- No VMA found → Invalid access → SIGSEGV
- VMA found, but permissions wrong → SIGSEGV
- VMA found, permissions OK:
- Anonymous page? Allocate zero frame
- File-backed? Read from disk
- Copy-on-write? Make private copy
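A toy version of that decision (the kernel keeps VMAs in a maple tree or red-black tree rather than an array; the ranges here are invented):

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified VMA record: address range plus write permission. */
struct vma { unsigned long start, end; bool write_ok; };

static struct vma vmas[] = {
    { 0x400000, 0x401000, false },   /* text: read/execute only */
    { 0x601000, 0x602000, true  },   /* heap: read/write */
};

static const char *classify_fault(unsigned long addr, bool is_write) {
    for (unsigned i = 0; i < sizeof vmas / sizeof *vmas; i++) {
        if (addr >= vmas[i].start && addr < vmas[i].end) {
            if (is_write && !vmas[i].write_ok)
                return "SIGSEGV (permission)";
            return "valid: allocate / COW / read page in";
        }
    }
    return "SIGSEGV (no VMA)";
}

int main(void) {
    printf("%s\n", classify_fault(0x400500, true));   /* write to text   */
    printf("%s\n", classify_fault(0x601800, true));   /* write to heap   */
    printf("%s\n", classify_fault(0x900000, false));  /* unmapped region */
    return 0;
}
```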
Q3: Why is TLB shootdown expensive and how can it be avoided?
Answer: Why it’s expensive:
- IPI (Inter-Processor Interrupt): Must interrupt all CPUs
- Synchronization: Initiating CPU waits for acknowledgment
- TLB flush: Target CPUs invalidate entries
- Memory barrier: Ensure ordering
How to avoid or reduce it:
- ASID (Address Space Identifiers):
- Tag TLB entries with process ID
- Context switch doesn’t need flush
- Shootdown only for entries with matching ASID
- Lazy shootdown:
- Mark pages invalid but don’t send IPI
- Other CPUs will fault and update themselves
- Works for some scenarios (e.g., unmap)
- Batch invalidations:
- Collect multiple unmaps, single shootdown
- Linux: the mmu_gather structure
- Huge pages:
- Fewer TLB entries to invalidate
- One shootdown covers 2MB instead of 4KB
- Design for locality:
- Keep threads on same NUMA node
- Reduce cross-CPU sharing
Q4: Design a memory allocator for a database buffer pool
Answer (a structural sketch in C follows the lists below):
Requirements:
- Fixed-size pages (e.g., 8KB)
- Fast allocation/deallocation
- Track dirty pages
- LRU-like eviction
Operations:
- Allocate: Pop from free list (O(1))
- Free: Push to free list (O(1))
- Access: Move to LRU head (O(1))
- Evict: Take from LRU tail, write if dirty
- Pin/Unpin: Increment/decrement reference count
Optimizations:
- Background writer for dirty pages
- Clock algorithm instead of strict LRU
- NUMA-aware allocation
- Huge pages for buffer pool
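A compact sketch combining these pieces (sizes, the disk I/O stubs, and names like fetch/unpin are invented for illustration; eviction uses the clock scheme from earlier rather than strict LRU):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NFRAMES 4
#define PAGE_SZ 8192

struct frame {
    int page_id;              /* -1 = free */
    int pin_count;            /* >0: in use, not evictable */
    bool dirty;               /* must be written back before reuse */
    bool ref;                 /* clock reference bit */
    char data[PAGE_SZ];
};

static struct frame pool[NFRAMES];
static int hand;

/* Pick a victim with the clock algorithm; skip pinned frames. */
static int evict(void) {
    for (int scanned = 0; scanned < 2 * NFRAMES; scanned++) {
        struct frame *f = &pool[hand];
        int victim = hand;
        hand = (hand + 1) % NFRAMES;
        if (f->pin_count > 0) continue;
        if (f->ref) { f->ref = false; continue; }   /* second chance */
        if (f->dirty) { /* write f->data back to disk here */ f->dirty = false; }
        return victim;
    }
    return -1;  /* everything pinned */
}

/* Return a pinned frame holding page_id, fetching (stubbed) on miss. */
static struct frame *fetch(int page_id) {
    for (int i = 0; i < NFRAMES; i++)
        if (pool[i].page_id == page_id) {
            pool[i].ref = true; pool[i].pin_count++; return &pool[i];
        }
    int v = evict();
    if (v < 0) return NULL;
    /* read page_id from disk into pool[v].data here */
    pool[v].page_id = page_id; pool[v].ref = true;
    pool[v].pin_count = 1; pool[v].dirty = false;
    return &pool[v];
}

static void unpin(struct frame *f, bool dirtied) {
    f->pin_count--;
    if (dirtied) f->dirty = true;
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++) pool[i].page_id = -1;
    struct frame *f = fetch(42);
    memcpy(f->data, "hello", 6);
    unpin(f, true);
    printf("page %d dirty=%d\n", f->page_id, f->dirty);
    return 0;
}
```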
Q5: Explain memory overcommit and the OOM killer
Answer (a short demo follows): Memory overcommit:
- Linux allows allocating more virtual memory than physical
- malloc(1GB) succeeds even with 512MB RAM
- Actual physical pages are allocated on first access
Why it exists:
- Most allocations are never fully used
- fork() would fail otherwise (COW makes it cheap)
- Allows efficient sparse arrays
Modes (/proc/sys/vm/overcommit_memory):
- 0: Heuristic (default; refuses only clearly excessive requests)
- 1: Always allow (never fail malloc)
- 2: Strict (fail once commits exceed swap + RAM × overcommit_ratio)
When memory is truly exhausted:
- Kernel invokes the OOM killer
- Calculates “badness” score for each process:
- Memory usage (primary factor)
- Process age, nice value
- oom_score_adj tuning
- Kills process with highest score
- Reclaims memory
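A small demonstration, assuming a 64-bit Linux box with default overcommit (the 4GB figure is arbitrary, and the allocation may fail under strict mode 2): the virtual allocation succeeds immediately, while resident memory grows only as pages are touched.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

static long rss_kb(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;            /* peak RSS in KB on Linux */
}

int main(void) {
    size_t size = 4UL * 1024 * 1024 * 1024;   /* 4GB of virtual memory */
    char *p = malloc(size);                   /* likely succeeds: overcommit */
    if (!p) { puts("malloc failed"); return 1; }
    printf("after malloc: rss=%ld KB\n", rss_kb());

    memset(p, 1, 256 * 1024 * 1024);          /* touch only 256MB of it */
    printf("after touch:  rss=%ld KB\n", rss_kb());
    free(p);
    return 0;
}
```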
Practice Exercises
1
Page Table Simulator
Implement a two-level page table with TLB. Measure TLB hit rates for different access patterns.
2
Page Replacement
Implement FIFO, LRU, and Clock algorithms. Compare fault rates on trace files.
3
COW Fork
Write a program demonstrating COW. Use /proc/self/pagemap to observe physical pages.
4
Memory-Mapped IPC
Implement producer-consumer using mmap with MAP_SHARED.
Key Takeaways
Multi-Level Saves Memory
Sparse address spaces only need page tables for used regions.
TLB is Critical
TLB misses cost 100+ cycles. Huge pages reduce misses dramatically.
Clock ≈ LRU in Practice
Reference bit approximation is simple and nearly as effective as LRU.
COW Enables Cheap fork()
Defer copying until write. Most pages never copied.
Next: File Systems →