C Process Memory Layout

Memory Layout & Segments

Understanding how a C program uses memory is fundamental to systems programming. It’s the difference between writing code that “just works” and code that is efficient, secure, and robust.

Why is Memory Segmented?

The Design Rationale

Before examining the memory layout, understand why it’s organized this way.

The problem: A flat memory model would be inefficient and insecure. If code and data were mixed, a bug could overwrite instructions, or an attacker could inject malicious code.

The solution: Separate segments with different properties:
  1. Text segment (read-only, executable):
    • Prevents code modification (security)
    • Allows sharing between processes (efficiency)
    • Example: 100 Chrome tabs share one copy of Chrome code
  2. Data/BSS segments (read-write, fixed size):
    • Globals live here (known at compile time)
    • BSS is zero-initialized automatically (efficiency: no need to store zeros in binary)
    • Example: Global configuration, static lookup tables
  3. Heap (read-write, grows up):
    • Dynamic allocation for variable-sized data
    • Grows toward higher addresses
    • Example: malloc’d data, runtime-sized structures
  4. Stack (read-write, grows down):
    • Fast allocation (just move stack pointer)
    • Automatic cleanup (pop on return)
    • Grows toward lower addresses
    • Example: Local variables, function call frames
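Why opposite growth directions? It maximizes available space. In the classic layout, the stack and heap grow toward each other through the same free gap, so neither needs a fixed share of it in advance. On modern systems the stack usually hits its size limit (a stack overflow) long before it could reach the heap.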

Process Memory Layout

This diagram shows how a C program looks in memory (virtual address space) when it’s running.

[Figure: C memory layout]

Examining Memory Segments

To understand the code below, you need to know what lives where:
  1. Text Segment: Read-only code instructions. Shared between processes running the same binary.
  2. Data Segment: Initialized global and static variables (e.g., int x = 10;).
  3. BSS Segment: Uninitialized global and static variables. Automatically zeroed by the kernel on startup.
  4. Heap: Dynamic memory (malloc). Grows upward (towards higher addresses).
  5. Stack: Local variables and function call frames. Grows downward (towards lower addresses).
#include <stdio.h>
#include <stdlib.h>

// TEXT segment (code)
void function(void) {
    printf("I'm in the text segment\n");
}

// DATA segment (initialized globals)
int global_init = 42;
static int static_init = 100;

// BSS segment (uninitialized globals)
int global_uninit;
static int static_uninit;

// RODATA (initialized, read-only; typically .rodata, mapped without write permission)
const int global_const = 999;
const char *string_literal = "Hello"; // Pointer in DATA, string in RODATA

int main(void) {
    // Stack segment
    int local = 10;
    int local_array[100];
    
    // Heap segment
    int *heap_ptr = malloc(1000 * sizeof(int));
    
    printf("=== Memory Segment Addresses ===\n\n");
    
    printf("TEXT (code):\n");
    printf("  function():        %p\n", (void*)function);
    printf("  main():            %p\n", (void*)main);
    
    printf("\nDATA (initialized):\n");
    printf("  global_init:       %p\n", (void*)&global_init);
    printf("  static_init:       %p\n", (void*)&static_init);
    
    printf("\nRODATA (read-only):\n");
    printf("  global_const:      %p\n", (void*)&global_const);
    printf("  string_literal:    %p\n", (void*)string_literal);
    
    printf("\nBSS (uninitialized):\n");
    printf("  global_uninit:     %p\n", (void*)&global_uninit);
    printf("  static_uninit:     %p\n", (void*)&static_uninit);
    
    printf("\nHEAP:\n");
    printf("  heap_ptr:          %p\n", (void*)heap_ptr);
    
    printf("\nSTACK:\n");
    printf("  local:             %p\n", (void*)&local);
    printf("  local_array:       %p\n", (void*)local_array);
    printf("  heap_ptr (var):    %p\n", (void*)&heap_ptr);
    
    free(heap_ptr);
    return 0;
}

Viewing with Linux Tools

# Compile
gcc -g program.c -o program

# View segment sizes
size program
#    text    data     bss     dec     hex filename
#    2048     624      16    2688     a80 program

# View detailed sections
objdump -h program

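# View memory map at runtime.
# The process must still be alive when you read its map; the demo above
# exits almost immediately, so pause it first (for example under a debugger).
./program &
cat /proc/$!/maps

# Or with pmap
pmap -x <pid>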

The Stack

The stack is the most critical segment for program flow. It manages function calls, local variables, and return addresses.

Stack vs Heap: When to Use Which?

Before diving into stack mechanics, understand when to use stack vs heap.
Use Stack when:
  • Size is known at compile time
  • Size is small (under a few KB per object, and well under 1MB in total)
  • Lifetime matches function scope
  • Performance is critical (stack allocation is often one to two orders of magnitude cheaper than a malloc call)
Use Heap when:
  • Size is determined at runtime
  • Size is large (> few KB)
  • Lifetime extends beyond function
  • Sharing data between functions
  • Building dynamic data structures
Performance comparison:
// Stack allocation: ~1-5 CPU cycles (rough order of magnitude)
// The compiler just subtracts from the stack pointer (one instruction).
// No searching, no locking, no metadata bookkeeping.
int stack_buffer[100];

// Heap allocation: roughly 100-500 CPU cycles on a slow path
// malloc may have to acquire a lock, search the free list, possibly split a block,
// update metadata, and return an aligned pointer. Modern allocators keep per-thread
// caches to reduce lock contention, but the bookkeeping cost remains.
int *heap_buffer = malloc(100 * sizeof(int));
Practical tip: If you know the size at compile time and it is small (under a few KB), prefer the stack. The speed difference is real: in a tight loop that allocates and frees thousands of small objects, switching from heap to stack allocation can yield large speedups (10-50x is plausible, but measure your own workload).

Function Call Mechanism

When a function is called, a new “frame” is pushed onto the stack. This frame contains everything the function needs to run.

[Figure: Function call mechanism]

Stack Frame Structure

A stack frame (or activation record) typically contains:
  1. Function Arguments: Parameters passed to the function (on x86-64 the first few are passed in registers; the rest go on the stack).
  2. Return Address: Where to jump back to when the function finishes.
  3. Saved Base Pointer (RBP): To restore the caller’s stack frame.
  4. Local Variables: Variables declared inside the function.
[Figure: C stack frame layout]

Stack in Action

#include <stdio.h>

void level3(void) {
    int local3 = 3;
    printf("level3: &local3 = %p\n", (void*)&local3);
}

void level2(void) {
    int local2 = 2;
    printf("level2: &local2 = %p\n", (void*)&local2);
    level3();
}

void level1(void) {
    int local1 = 1;
    printf("level1: &local1 = %p\n", (void*)&local1);
    level2();
}

int main(void) {
    int local0 = 0;
    printf("main:   &local0 = %p\n", (void*)&local0);
    level1();
    return 0;
}

// Example output shows decreasing addresses (stack grows down); exact values vary per run:
// main:   &local0 = 0x7fff5000
// level1: &local1 = 0x7fff4fd0  (lower)
// level2: &local2 = 0x7fff4fa0  (lower)
// level3: &local3 = 0x7fff4f70  (lower)

Stack Overflow

The stack has a limited size (the default soft limit is usually 8MB on Linux). If you recurse too deep or allocate huge arrays, you’ll hit the limit.
#include <stdio.h>

// This will crash with stack overflow
void infinite_recursion(int n) {
    int buffer[1000];  // 4KB per call
    printf("Call %d, buffer at %p\n", n, (void*)buffer);
    infinite_recursion(n + 1);
}

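// Check stack limit
#include <sys/resource.h>

void print_stack_limit(void) {
    struct rlimit limit;
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        perror("getrlimit");
        return;
    }
    // rlim_t is not guaranteed to be unsigned long, so cast explicitly.
    // RLIM_INFINITY shows up as a very large number.
    printf("Stack limit: %llu bytes (soft), %llu bytes (hard)\n",
           (unsigned long long)limit.rlim_cur,
           (unsigned long long)limit.rlim_max);
}

// Increase stack limit (the soft limit cannot exceed the hard limit)
void increase_stack_limit(void) {
    struct rlimit limit;
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        perror("getrlimit");
        return;
    }
    rlim_t wanted = 64 * 1024 * 1024;  // 64 MB
    if (limit.rlim_max != RLIM_INFINITY && wanted > limit.rlim_max) {
        wanted = limit.rlim_max;  // Clamp to what the hard limit allows
    }
    limit.rlim_cur = wanted;
    if (setrlimit(RLIMIT_STACK, &limit) != 0) {
        perror("setrlimit");
    }
}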

Variable-Length Arrays (VLAs)

#include <stdio.h>
#include <stdlib.h>  // malloc, free (used by process_safe)

void process(int n) {
    // VLA - allocated on stack at runtime (C99)
    int arr[n];  // DANGER: Large n = stack overflow!
    
    printf("VLA of %d ints at %p\n", n, (void*)arr);
    
    // VLAs cannot take an initializer list (before C23), so fill them manually
    for (int i = 0; i < n; i++) {
        arr[i] = i;
    }
}

// Safer alternative: use heap for large/variable sizes
void process_safe(int n) {
    if (n <= 0) {
        return;  // Reject non-positive sizes before converting to size_t
    }
    int *arr = malloc((size_t)n * sizeof *arr);
    if (!arr) {
        // Handle allocation failure
        return;
    }
    // ... use arr ...
    free(arr);
}
VLAs are dangerous for large or user-controlled sizes. They were made optional in C11 for good reason. Prefer heap allocation for variable-size arrays.

The Heap

How malloc Works (Simplified)

The heap is managed by the C standard library (glibc). It requests memory from the OS using brk (for small allocations) or mmap (for large ones).
// malloc uses brk/sbrk for small allocations
// and mmap for large allocations

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void examine_brk(void) {
    void *initial = sbrk(0);  // Current program break
    printf("Initial program break: %p\n", initial);
    
    void *p1 = malloc(1000);
    printf("After malloc(1000): program break = %p\n", sbrk(0));
    
    void *p2 = malloc(1000);
    printf("After malloc(1000): program break = %p\n", sbrk(0));
    
    // Large allocation uses mmap, not brk
    void *p3 = malloc(1024 * 1024);  // 1 MB
    printf("After malloc(1MB): program break = %p (unchanged for mmap)\n", sbrk(0));
    
    free(p1);
    free(p2);
    free(p3);
}

Heap Fragmentation

Fragmentation is like a parking lot where cars of different sizes come and go. After a while, you might have 20 empty spots scattered across the lot, but none of them are adjacent — so a bus (large allocation) cannot park even though there is plenty of total free space. This is external fragmentation, and it is one of the biggest problems with long-running C programs like servers and databases.
#include <stdio.h>
#include <stdlib.h>

void demonstrate_fragmentation(void) {
    void *ptrs[1000];
    
    // Allocate many small blocks
    for (int i = 0; i < 1000; i++) {
        ptrs[i] = malloc(100);
    }
    
    // Free every other one (creates holes)
    for (int i = 0; i < 1000; i += 2) {
        free(ptrs[i]);
    }
    
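    // Now we have 500 free 100-byte holes (about 50KB in total), but no two are adjacent.
    // A 40KB request cannot be served from those holes even though 50KB is free;
    // the allocator has to grow the heap to find contiguous space, and fails if it cannot.
    void *big = malloc(40000);
    if (!big) {
        printf("Fragmentation: can't allocate contiguous 40KB\n");
    }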
    
    // Clean up
    for (int i = 1; i < 1000; i += 2) {
        free(ptrs[i]);
    }
    free(big);
}

Memory Alignment

CPUs access memory most efficiently when data is aligned to its size (e.g., 4-byte integers on 4-byte boundaries).
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>    // uintptr_t
#include <stdalign.h>  // C11
#include <stdlib.h>

struct Aligned {
    char a;      // 1 byte
                 // 3 bytes padding
    int b;       // 4 bytes (requires 4-byte alignment)
    char c;      // 1 byte
                 // 7 bytes padding
    double d;    // 8 bytes (requires 8-byte alignment)
};  // Total: 24 bytes (aligned to 8)

int main(void) {
    printf("sizeof(struct Aligned) = %zu\n", sizeof(struct Aligned));
    printf("alignof(struct Aligned) = %zu\n", alignof(struct Aligned));
    
    printf("\nOffset of a: %zu\n", offsetof(struct Aligned, a));
    printf("Offset of b: %zu\n", offsetof(struct Aligned, b));
    printf("Offset of c: %zu\n", offsetof(struct Aligned, c));
    printf("Offset of d: %zu\n", offsetof(struct Aligned, d));
    
    // Aligned allocation
    void *p = aligned_alloc(64, 1024);  // 64-byte aligned, 1024 bytes
    printf("\naligned_alloc(64, 1024) = %p\n", p);
    printf("Properly aligned: %s\n", ((uintptr_t)p % 64 == 0) ? "yes" : "no");
    free(p);
    
    return 0;
}

Why Alignment Matters

Unaligned access can be slow or even crash the program on some architectures (like ARM).
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 10000000

// Unaligned access (slow on some architectures, a fault on others).
// Dereferencing a misaligned int* is undefined behavior in standard C;
// this is for demonstration only.
void unaligned_access(void) {
    // Heap-allocate: a 10MB local array would overflow the default 8MB stack
    char *buffer = malloc(SIZE + 8);
    if (!buffer) {
        return;
    }
    
    // Misaligned int pointer
    int *p = (int*)(buffer + 1);  // Intentionally misaligned
    
    clock_t start = clock();
    for (int i = 0; i < SIZE / 4; i++) {
        p[i] = i;
    }
    clock_t end = clock();
    
    printf("Unaligned: %f seconds\n", 
           (double)(end - start) / CLOCKS_PER_SEC);
    
    free(buffer);
}

// Aligned access (fast)
void aligned_access(void) {
    int *p = malloc(SIZE);  // malloc returns suitably aligned memory
    if (!p) {
        return;
    }
    
    clock_t start = clock();
    for (int i = 0; i < SIZE / 4; i++) {
        p[i] = i;
    }
    clock_t end = clock();
    
    printf("Aligned:   %f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    
    free(p);
}

Static and Thread-Local Storage

#include <stdio.h>
#include <pthread.h>

// Global (DATA/BSS segment, shared across threads)
int global_counter = 0;

// Static local (DATA/BSS segment, persists between calls)
void count_calls(void) {
    static int call_count = 0;  // Initialized once, persists
    call_count++;
    printf("Called %d times\n", call_count);
}

// Thread-local storage (each thread gets its own copy)
_Thread_local int thread_counter = 0;  // C11
// Or: __thread int thread_counter = 0;  // GCC extension

void* thread_func(void *arg) {
    int thread_id = *(int*)arg;
    
    for (int i = 0; i < 5; i++) {
        thread_counter++;  // Each thread has separate counter
        printf("Thread %d: thread_counter = %d\n", thread_id, thread_counter);
    }
    
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    
    pthread_create(&t1, NULL, thread_func, &id1);
    pthread_create(&t2, NULL, thread_func, &id2);
    
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    
    // Each thread's counter reached 5 independently
    
    return 0;
}

Memory-Mapped Files

Memory mapping (mmap) allows you to treat a file on disk as if it were in memory. This is how the OS loads executables and libraries.
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    const char *filename = "testfile.txt";
    
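    // Create and write to file
    int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    const char *text = "Hello, memory-mapped world!";
    if (write(fd, text, strlen(text)) == -1) {
        perror("write");
        close(fd);
        return 1;
    }
    
    // Get file size
    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        return 1;
    }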
    
    // Memory-map the file
    char *mapped = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    
    if (mapped == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    
    // Access file contents through memory
    printf("Mapped content: %.*s\n", (int)sb.st_size, mapped);
    
    // Modify through memory (writes to file!)
    mapped[0] = 'J';  // Changes "Hello" to "Jello"
    
    // Sync changes to disk
    msync(mapped, sb.st_size, MS_SYNC);
    
    // Unmap
    munmap(mapped, sb.st_size);
    close(fd);
    
    return 0;
}

Exercises

  1. Memory Map Explorer: Write a program that prints the address of variables in each segment (text, data, bss, heap, stack) and verifies they’re in the expected order.
  2. Stack Size Probe: Write a recursive function that measures roughly how much stack space is available before a stack overflow.
  3. Alignment Checker: Write a function bool is_aligned(void *ptr, size_t alignment) that checks if a pointer is properly aligned.
  4. Memory Visualizer: Use /proc/self/maps to write a program that displays its own memory regions with human-readable labels.

Next Up

Dynamic Memory Management

Master malloc, custom allocators, and memory patterns

Interview Deep-Dive

Strong Answer:
  • Virtual memory and physical memory are not the same thing. When malloc requests memory via mmap, the kernel only records the mapping in the process’s virtual address space; it does not allocate physical pages or fill in page table entries yet. Physical pages are only assigned on first write (demand paging / copy-on-write). The 4GB is virtual address space commitment; the 200MB is the actual physical pages that have been touched.
  • Additionally, the kernel may have swapped some pages to disk. Pages that were once resident but have not been accessed recently get evicted to swap, reducing RSS without reducing virtual size.
  • calloc is a special case: it maps zero-filled pages, and the kernel uses a single shared zero page for all untouched pages. Only when you write to a calloc’d page does the kernel allocate a real physical page (copy-on-write). A calloc(1, 1GB) might consume almost zero physical memory until you write to it.
  • This is also why overcommit exists on Linux: the kernel allows processes to allocate more virtual memory than physically available, betting that most of it will never be touched. This is configurable via /proc/sys/vm/overcommit_memory.
Follow-up: How does ASLR (Address Space Layout Randomization) affect the memory layout, and why does it matter for security?
Follow-up Answer:
  • ASLR randomizes the base addresses of the stack, heap, and shared libraries on each program execution. The executable’s own text and data are randomized as well, but only when the binary is built as a Position Independent Executable (PIE) with -fPIE -pie. Without ASLR, an attacker who discovers a buffer overflow can hardcode the address of their shellcode or a useful gadget (like system()). With ASLR, those addresses change every run, turning a deterministic exploit into a probabilistic one. On 64-bit systems, the randomization entropy is large enough (on the order of 28 bits or more, depending on the region and kernel configuration) that brute-force guessing is impractical.
Strong Answer:
  • This is a deliberate design choice to maximize available space. With the stack starting at a high address and growing downward, and the heap starting at a low address and growing upward, the two regions grow toward each other, using the entire gap between them as available space. If they grew in the same direction, one would hit the other’s starting address much sooner.
  • In a modern 64-bit system with virtual memory, the two regions will never literally collide because the virtual address space is enormous (48-bit or 57-bit). The stack has a fixed limit (typically 8MB, configurable via ulimit -s), and exceeding it triggers a SIGSEGV (stack overflow). The heap can grow until the system runs out of virtual address space or the kernel denies the mmap/brk request.
  • On 32-bit embedded systems without an MMU, the collision is a real risk. The stack and heap share a flat memory space, and overflow in either direction silently corrupts the other, causing mysterious crashes far from the actual bug. This is why embedded systems often forbid heap allocation entirely and size the stack conservatively at compile time.
Follow-up: How would you detect a stack overflow in a production system?
Follow-up Answer:
  • The kernel places a guard page (a page with no permissions) at the bottom of the stack. If the stack grows into it, the hardware triggers a page fault that becomes SIGSEGV. You can install a signal handler for SIGSEGV using sigaltstack (which provides an alternate stack for the handler itself, since the main stack is blown). GCC’s -fstack-protector-strong places canary values between local variables and the return address; if a buffer overflow overwrites the canary, the runtime detects it and aborts. For deeper monitoring, use getrlimit(RLIMIT_STACK) to check limits and log warnings when recursion depth approaches critical levels.
Strong Answer:
  • mmap asks the kernel to map a region of virtual address space. For file-backed mappings, the file’s contents are lazily loaded into memory on first access (page faults trigger disk reads). For anonymous mappings (MAP_ANONYMOUS), the kernel provides zero-filled pages on demand. In both cases, the actual memory consumption is proportional to the pages you touch, not the size you request.
  • malloc already uses mmap internally for large allocations (typically above 128KB in glibc). The advantage of calling mmap directly is control: you can specify alignment, use MAP_HUGETLB for huge pages (reducing TLB misses for very large datasets), use MAP_POPULATE to pre-fault all pages (avoiding latency spikes during access), and use madvise to tell the kernel about your access pattern (MADV_SEQUENTIAL for sequential scans, MADV_RANDOM for random access).
  • The key difference from malloc: munmap immediately returns memory to the OS. With malloc/free, glibc may hold onto freed memory in its free list for future allocations, so RSS does not decrease even after freeing. For a process that does a large computation and then wants to release that memory, mmap/munmap gives you that guarantee.
Follow-up: You are building a database engine that needs to access a 100GB file. How would you use mmap, and what are the pitfalls?
Follow-up Answer:
  • Map the entire file with mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0). The OS pages data in and out transparently using the page cache. Use madvise(addr, len, MADV_RANDOM) for B-tree traversals (random access pattern) or MADV_SEQUENTIAL for full table scans. Pitfalls: (1) On 32-bit systems, you cannot map files larger than ~3GB. (2) Page faults on cold pages stall the thread for milliseconds (disk I/O), making latency unpredictable — this is why some databases like InnoDB manage their own buffer pool instead of relying on mmap. (3) The kernel’s page eviction policy may not match your workload — it might evict hot index pages to make room for a sequential scan. (4) Error handling is awkward: a disk read error during a page fault delivers SIGBUS, not an errno. You need a SIGBUS handler to recover gracefully.