
Memory Management

Memory management controls how physical memory is allocated, tracked, and protected. This is foundational knowledge for understanding virtual memory and system performance.
Interview Frequency: High
Key Topics: Allocation strategies, fragmentation, memory protection, buddy system
Time to Master: 10-12 hours

Memory Hierarchy

Fast memory is expensive and scarce; cheap memory is slow and abundant. The OS manages this hierarchy to give the illusion of large, fast memory.

Numbers Everyone Should Know

Understanding the relative latencies is crucial for performance tuning:
| Component | Latency (Time) | Latency (Scaled to Human Scale) |
|---|---|---|
| L1 Cache | 1 ns | 1 second |
| L2 Cache | 4 ns | 4 seconds |
| Main Memory (RAM) | 100 ns | ~2 minutes |
| SSD I/O | 16,000 ns (16 μs) | ~4.5 hours |
| Network (Data Center) | 500,000 ns (0.5 ms) | ~6 days |
| Disk I/O (HDD) | 4,000,000 ns (4 ms) | ~1.5 months |
| Ping (Internet) | 150,000,000 ns (150 ms) | ~5 years |
Cache Hits Matter: A cache miss that falls from L1 to RAM is a 100x penalty. A page fault that falls from RAM to an HDD is a ~40,000x penalty!

Address Binding

Programs deal with logical addresses (virtual). The hardware deals with physical addresses. Binding is the mapping process.

When are addresses determined?

| Binding Time | Description | Example |
|---|---|---|
| Compile time | Fixed physical address. | Embedded systems, bootloaders. If the process moves, it must be recompiled. |
| Load time | Address determined when loaded. | Relocatable code. Compiler generates offsets; loader adds the base address. |
| Run time | Address can change during execution. | Modern OS. Needs hardware support (MMU / base-limit registers). |

Address Translation Example (Base + Limit)

Simple hardware protection using two registers: Base (start of physical) and Limit (size).
CPU Register State:
Base:  14000
Limit:  1000  (Valid range: 14000 - 14999)

Instruction: LOAD [350]  (Logical Address 350)

1. Check Limit:
   Is 350 < 1000?  YES. (If NO -> Trap: Segmentation Violation)

2. Calculate Physical:
   Physical = Base + Logical
            = 14000 + 350
            = 14350

3. Access Memory:
   Send address 14350 to bus.

Contiguous Memory Allocation

Fixed Partitioning

Memory is divided into a fixed number of partitions at boot; each holds exactly one process. Simple to implement, but a process smaller than its partition wastes the remainder (internal fragmentation), and the partition count caps the degree of multiprogramming.

Variable Partitioning

Partitions are created on demand, sized exactly to each process. This eliminates internal fragmentation but leaves holes between processes as they come and go (external fragmentation).

Allocation Strategies

First Fit

// Find first hole large enough
void *first_fit(size_t size) {
    for (block_t *b = free_list; b != NULL; b = b->next) {
        if (b->size >= size) {
            return allocate_from_block(b, size);
        }
    }
    return NULL;  // No fit
}
Pros: Fast
Cons: Fragments front of memory

Best Fit

// Find smallest hole that fits
void *best_fit(size_t size) {
    block_t *best = NULL;
    
    for (block_t *b = free_list; b != NULL; b = b->next) {
        if (b->size >= size) {
            if (best == NULL || b->size < best->size) {
                best = b;
            }
        }
    }
    
    return best ? allocate_from_block(best, size) : NULL;
}
Pros: Leaves larger holes intact
Cons: Slow (searches all), creates tiny unusable holes

Worst Fit

// Find largest hole
void *worst_fit(size_t size) {
    block_t *worst = NULL;
    
    for (block_t *b = free_list; b != NULL; b = b->next) {
        if (b->size >= size) {
            if (worst == NULL || b->size > worst->size) {
                worst = b;
            }
        }
    }
    
    return worst ? allocate_from_block(worst, size) : NULL;
}
Pros: Remaining holes more usable
Cons: Fragments large holes quickly

Comparison

| Strategy | Speed | Fragmentation | Use Case |
|---|---|---|---|
| First Fit | Fast | Moderate | General purpose |
| Best Fit | Slow | Many tiny holes | When memory is tight |
| Worst Fit | Slow | Fragments large holes | Rarely used |
| Next Fit | Fast | Spreads fragmentation | Circular allocation |

Fragmentation

Internal Fragmentation

Internal fragmentation is memory wasted inside allocated blocks, e.g., a 100-byte request served from a 128-byte block.

External Fragmentation

External fragmentation is free memory scattered between allocated blocks: the total may be sufficient, but no single hole is.

Solutions to External Fragmentation

  1. Compaction: Move processes to consolidate free space
    • Expensive, requires relocation support
  2. Paging: Non-contiguous allocation (covered in virtual memory)
    • Most common modern solution
  3. Segmentation: Variable-sized logical units
    • Still has external fragmentation

Buddy System Allocator

Used in Linux kernel for page allocation. It mitigates external fragmentation by merging adjacent free blocks (“buddies”).

How It Works

Memory is conceptually a large power-of-2 block.
  1. Splitting: If a request has size S, find the smallest chunk size 2^k ≥ S. If the available chunk is larger, split it into two equal buddies; repeat until a chunk of size 2^k remains.
  2. Coalescing: When freeing a block, check if its buddy is free. If yes, merge them into a larger block. Repeat up the chain.

Example Walkthrough (1MB Total Memory)

Process A requests 70KB. Smallest power-of-2 fitting 70KB is 128KB.
  1. Initial: One 1MB free block.
  2. Split 1MB -> 512KB + 512KB
  3. Split 512KB -> 256KB + 256KB
  4. Split 256KB -> 128KB + 128KB
  5. Allocate: First 128KB is given to Process A.
State:
[ Process A (128KB) ] [ Free (128KB) ] [ Free (256KB) ] [ Free (512KB) ]
Merge Phase: If Process A frees its 128KB:
  1. Check buddy (next 128KB) -> It’s FREE. Merge -> 256KB.
  2. Check buddy (next 256KB) -> It’s FREE. Merge -> 512KB.
  3. Check buddy (next 512KB) -> It’s FREE. Merge -> 1MB. Result: Back to original 1MB contiguous block.

Implementation

#define MAX_ORDER 11  // orders 0..10; largest block = 2^10 pages = 4 MB (4 KB pages)

struct free_area {
    struct list_head free_list;
    unsigned long nr_free;
};

struct zone {
    struct free_area free_area[MAX_ORDER];
};

// Allocate 2^order pages
struct page *alloc_pages(unsigned int order) {
    for (int current = order; current < MAX_ORDER; current++) {
        if (!list_empty(&zone->free_area[current].free_list)) {
            // Found a block
            struct page *page = list_first_entry(...);
            
            // Split if necessary
            while (current > order) {
                current--;
                // Add buddy to free list
                struct page *buddy = page + (1 << current);
                list_add(&buddy->lru, 
                         &zone->free_area[current].free_list);
            }
            
            return page;
        }
    }
    return NULL;  // Out of memory
}

// Free 2^order pages
void free_pages(struct page *page, unsigned int order) {
    while (order < MAX_ORDER - 1) {
        struct page *buddy = find_buddy(page, order);
        
        if (!buddy_is_free(buddy, order)) {
            break;  // Can't merge
        }
        
        // Remove buddy from free list
        list_del(&buddy->lru);
        
        // Merge: use lower address as new block
        if (buddy < page) {
            page = buddy;
        }
        order++;
    }
    
    // Add merged block to free list
    list_add(&page->lru, &zone->free_area[order].free_list);
}

// Find buddy address (XOR with block size)
struct page *find_buddy(struct page *page, unsigned int order) {
    unsigned long page_idx = page_to_pfn(page);  // page frame number
    unsigned long buddy_idx = page_idx ^ (1 << order);
    return pfn_to_page(buddy_idx);
}

Buddy System Properties

| Property | Value |
|---|---|
| Allocation sizes | Powers of 2 only |
| Internal fragmentation | Up to ~50% |
| External fragmentation | Low (coalescing) |
| Speed | O(log n) |
| Used by | Linux kernel page allocator |

Slab Allocator

The slab allocator serves frequently allocated fixed-size kernel objects (e.g., task_struct, inode) from per-type caches of preinitialized objects.

SLUB (Modern Linux)

// Create a cache for a specific object type
struct kmem_cache *task_cache = kmem_cache_create(
    "task_struct",           // Name
    sizeof(struct task_struct),  // Object size
    __alignof__(struct task_struct),  // Alignment
    SLAB_PANIC,              // Flags
    init_task_struct         // Constructor
);

// Allocate an object
struct task_struct *task = kmem_cache_alloc(task_cache, GFP_KERNEL);

// Free an object
kmem_cache_free(task_cache, task);

Memory Protection

Base and Limit Registers

The simplest scheme: the OS loads the base and limit registers on each context switch, and the hardware checks every access, trapping on any address outside [base, base + limit).

Segmentation with Protection

Each segment descriptor carries a base, a limit, and protection bits (read/write/execute), so the hardware can enforce different access rights per logical unit (code, data, stack).

Page-Level Protection (Modern)

// Page table entry flags
#define PTE_PRESENT   (1 << 0)   // Page is in memory
#define PTE_WRITABLE  (1 << 1)   // Write allowed
#define PTE_USER      (1 << 2)   // User mode access allowed
#define PTE_NX        (1ULL << 63)  // No execute (x86-64)

// Example: Read-only code page
pte = physical_address | PTE_PRESENT | PTE_USER;  // No PTE_WRITABLE

// Example: Stack page (no execute)
pte = physical_address | PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX;

Swapping

Classic swapping moves entire processes between memory and disk when memory pressure is high.

Modern: Demand Paging vs Swapping

| Aspect | Swapping | Demand Paging |
|---|---|---|
| Granularity | Entire process | Single pages |
| Latency | Very high | Lower |
| Flexibility | Coarse | Fine-grained |
| Used by | Older systems | Modern OS |

Memory-Mapped Files

Map file contents directly to memory:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDWR);
    if (fd < 0)
        return 1;
    
    struct stat st;
    fstat(fd, &st);
    size_t size = st.st_size;
    
    // Map file into memory
    void *addr = mmap(NULL, size,
                      PROT_READ | PROT_WRITE,  // Permissions
                      MAP_SHARED,               // Changes written to file
                      fd, 0);                   // File offset
    if (addr == MAP_FAILED)
        return 1;
    
    // Access file like memory
    char *data = (char *)addr;
    data[0] = 'X';  // This modifies the file!
    
    // Force write to disk
    msync(addr, size, MS_SYNC);
    
    // Cleanup
    munmap(addr, size);
    close(fd);
    
    return 0;
}

Use Cases

  1. Shared libraries: .so files mapped read-only
  2. Large files: Database files, video editing
  3. IPC: Multiple processes map same file
  4. Copy-on-write: Fork efficiency

IOMMU and DMA Remapping

Modern systems provide hardware-assisted memory protection for DMA-capable devices via an IOMMU.

IOMMU Architecture

The IOMMU sits between DMA devices and main memory, translating device-visible I/O virtual addresses (IOVAs) into physical addresses. It plays the same role for devices that the MMU plays for the CPU.

DMA Remapping Use Cases

Device isolation (a faulty or malicious device can only reach pages explicitly mapped for it), VM device passthrough (e.g., VFIO), and letting 32-bit devices address memory above 4GB.

Linux IOMMU API

#include <linux/iommu.h>

// Check if IOMMU is available
if (iommu_present(&pci_bus_type)) {
    // IOMMU is enabled
}

// Allocate IOMMU domain (address space for device)
struct iommu_domain *domain = iommu_domain_alloc(&pci_bus_type);

// Attach device to domain
iommu_attach_device(domain, &pdev->dev);

// Map a page for DMA
dma_addr_t iova = 0x1000;
phys_addr_t paddr = page_to_phys(page);
iommu_map(domain, iova, paddr, PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);

// Device can now DMA to IOVA 0x1000
// IOMMU translates to physical page

// Unmap when done
iommu_unmap(domain, iova, PAGE_SIZE);

// Detach device
iommu_detach_device(domain, &pdev->dev);
iommu_domain_free(domain);

IOMMU Performance Considerations

Every DMA passes through the IOTLB, the IOMMU's translation cache. Frequent map/unmap calls and the IOTLB invalidations they require add overhead, so drivers batch mappings or use deferred (lazy) invalidation, trading strict protection for throughput.

Interview Deep Dive Questions

Q: What is the difference between internal and external fragmentation?
Answer: Internal Fragmentation:
  • Memory wasted inside an allocated block
  • Occurs with fixed-size allocation
  • Example: Allocating 128 bytes for a 100-byte request
┌────────────────────────────────────┐
│   Used (100)    │   Wasted (28)    │  ← Internal
└────────────────────────────────────┘
External Fragmentation:
  • Free memory between allocated blocks
  • Total free may be enough, but not contiguous
  • Occurs with variable-size allocation
┌───────┬────────┬───────┬────────┬───────┐
│ Used  │ Free   │ Used  │ Free   │ Used  │  ← External
│       │ (30K)  │       │ (50K)  │       │
└───────┴────────┴───────┴────────┴───────┘
Can't allocate 70K even though 80K total is free
Solutions:
  • Internal: Better size classes, slab allocator
  • External: Compaction, paging (most effective)
Q: How does the buddy system prevent external fragmentation?
Answer: The buddy system uses coalescing to prevent fragmentation.
Key Mechanisms:
  1. Power-of-2 sizes only:
    • All blocks are 2^n bytes
    • When split, creates two equal “buddies”
  2. Buddy address calculation:
    // Buddy of a block at offset A (from the region start), size 2^k:
    buddy_offset = A XOR (1 << k)
    
  3. Coalescing on free:
    • When freeing, check if buddy is also free
    • If yes, merge into larger block
    • Repeat until buddy not free or max size
Example:
Allocate 100KB from 1024KB block:

1024KB → split → 512KB + 512KB
512KB  → split → 256KB + 256KB  
256KB  → split → 128KB + 128KB ← Use one 128KB block

Free the 128KB block:
- Check buddy (adjacent 128KB): Free!
- Merge → 256KB
- Check buddy (adjacent 256KB): Free!
- Merge → 512KB
- Check buddy (adjacent 512KB): In use
- Stop. Now have 512KB free block.
Tradeoff: Up to 50% internal fragmentation for small requests
Q: Why does the Linux kernel use both a buddy allocator and a slab allocator?
Answer: Different allocation patterns need different strategies.
Buddy System (Page Allocator):
  • Allocates pages (4KB units)
  • Good for: DMA buffers, large allocations
  • Issues: Internal fragmentation for small objects
  • Used by: alloc_pages(), get_free_pages()
Slab Allocator (SLUB):
  • Allocates objects of fixed sizes
  • Good for: Frequently allocated kernel structures
  • Benefits: No fragmentation, object caching
  • Used by: kmalloc(), kmem_cache_alloc()
Hierarchy:
Application: malloc()
        │
        ▼
User Space: glibc allocator (ptmalloc)
        │
        ▼
Kernel: brk()/mmap() ─────────────────────┐
     │                                     │
     ▼                                     ▼
Slab Allocator (SLUB)              Direct page allocation
for small objects                   for large allocations
     │                                     │
     └──────────────┬──────────────────────┘
                    │
                    ▼
     Buddy System (Page Allocator)
                    │
                    ▼
          Physical Memory
Examples:
  • struct task_struct: Slab cache (same size, frequent)
  • kmalloc(4096): Could use buddy directly
  • Video buffer: Buddy system for contiguous pages
Q: What happens when Linux runs out of memory?
Answer: Linux OOM Handling:
  1. Memory Pressure Detection:
    • Buddy system can’t find pages
    • kswapd daemon activated
  2. Reclaim Attempts:
    Try to free memory:
    1. Drop clean page cache
    2. Write dirty pages to disk
    3. Swap out anonymous pages
    4. Drop slab caches (reclaimable)
    
  3. OOM Killer Activated (if reclaim fails):
    // Score each process (higher = more likely killed)
    oom_score = (memory_usage / total_memory) * 1000
    
    // Adjustments:
    // - Root processes: lower score
    // - oom_score_adj: admin override (-1000 to +1000)
    // - Long-running: slight penalty
    
  4. Victim Selection:
    • Highest oom_score wins
    • Process receives SIGKILL
    • Kernel frees its memory
Preventing OOM Kill:
# Make process immune
echo -1000 > /proc/<pid>/oom_score_adj

# Make process a target
echo 1000 > /proc/<pid>/oom_score_adj
Common Causes:
  • Memory leak in application
  • Insufficient swap
  • Overcommit settings too aggressive
  • Fork bomb
Q: What is memory overcommit, and how does Linux control it?
Answer: Overcommit allows allocating more virtual memory than physical RAM + swap.
Why?
  • Fork creates complete copy of address space
  • Most pages never modified (COW)
  • Sparse arrays allocate but don’t use
Modes (/proc/sys/vm/overcommit_memory):
| Value | Mode | Behavior |
|---|---|---|
| 0 | Heuristic | Kernel estimates; rejects "obvious" overcommit |
| 1 | Always | Never fail malloc (dangerous!) |
| 2 | Never | Strict: CommitLimit = RAM × overcommit_ratio + Swap |
Example:
System: 8GB RAM, 4GB swap
overcommit_ratio: 50 (default)

CommitLimit = (RAM * overcommit_ratio / 100) + Swap
            = (8GB * 0.5) + 4GB = 8GB

Total allocations across all processes
cannot exceed 8GB of committed virtual memory
View current state:
$ cat /proc/meminfo | grep Commit
CommitLimit:    12345678 kB
Committed_AS:    8765432 kB  # Currently allocated
Tradeoffs:
  • Overcommit (mode 0/1): More flexible, risk of OOM
  • Strict (mode 2): Safe for databases, may waste resources
Recommendation:
  • Databases: Mode 2 (predictable)
  • Development: Mode 0 (default)
  • Mode 1: never in production

Practice Exercises

1. Implement First-Fit Allocator: build a simple memory allocator with a first-fit strategy and coalescing.
2. Buddy System Simulation: simulate buddy allocation and visualize the splitting/merging process.
3. Fragmentation Analysis: write a program to measure internal vs. external fragmentation under different workloads.
4. OOM Score Calculator: implement Linux's OOM scoring algorithm.

Key Takeaways

Fragmentation Types

Internal (inside blocks) vs External (between blocks). Paging solves external.

Buddy System

Power-of-2 allocation with coalescing. Used for kernel page allocation.

Slab Allocator

Fixed-size object caching. Zero fragmentation for same-size objects.

Protection

Base/limit, segmentation, paging all provide memory isolation.

Next: Inter-Process Communication