Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Project: Memory Allocator

Implement a custom memory allocator that replaces the system malloc. This is the ultimate C project — you will understand exactly how dynamic memory works at the lowest level. Every time you call malloc(64) in any C program, an allocator must decide: which chunk of memory to hand back, how to track what is free and what is not, and how to do this fast enough that it does not become a bottleneck. Real-world allocators like glibc’s ptmalloc, Google’s tcmalloc, and Facebook’s jemalloc are engineering marvels that balance speed, fragmentation, thread scalability, and cache behavior. In this project, you will build a simplified version from scratch to understand the fundamental design decisions they all share.

Allocator Design

Think of the allocator as managing a parking lot. Cars (allocations) come and go at unpredictable times, leaving gaps of various sizes. The manager must decide where to park each new car (allocation strategy), when to re-stripe the lot to consolidate empty spaces (coalescing), and how to handle rush hour without creating a traffic jam (thread safety).
1

System Interface

Get raw memory pages from the kernel with sbrk (extend the data segment) or mmap (request a new memory region). This is like acquiring land for the parking lot — you ask the OS for a plot.
2

Block Management

Track which regions are free and which are allocated using metadata headers. Each block carries a small header that records its size, status, and links to neighboring blocks.
3

Fragmentation

Handle splitting (cutting a large block to satisfy a smaller request) and coalescing (merging adjacent free blocks back together). Without coalescing, your heap becomes Swiss cheese — lots of small holes, none big enough to use.
4

Performance

Optimize for speed and memory efficiency. First-fit is simple but slow for large heaps. Segregated free lists give O(1) allocation for common sizes — the same approach used by glibc’s ptmalloc and tcmalloc.

Block Structure

// malloc.h
#ifndef MY_MALLOC_H
#define MY_MALLOC_H

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

// Block header - stored immediately before each user allocation.
// When the user calls malloc(64), we actually allocate HEADER_SIZE + 64 bytes.
// The header sits at the front, invisible to the caller. The pointer returned
// to the user points just past the header. Think of it like a shipping label
// attached to a package: the recipient never sees the label, but the postal
// system needs it to route the package correctly.
typedef struct block_header {
    size_t size;                  // Size of user data (not including header)
    bool is_free;                 // Is this block currently available for reuse?
    struct block_header *next;    // Next block in the global linked list of all blocks
    struct block_header *prev;    // Previous block (needed for O(1) coalescing)
    uint32_t magic;               // Canary value (0xDEADBEEF) to detect heap corruption
} block_header_t;

#define HEADER_SIZE sizeof(block_header_t)
#define BLOCK_MAGIC 0xDEADBEEF
#define MIN_BLOCK_SIZE 16

// Alignment for all allocations
#define ALIGNMENT 16
#define ALIGN(size) (((size) + (ALIGNMENT-1)) & ~(ALIGNMENT-1))

// Public interface
void *my_malloc(size_t size);
void my_free(void *ptr);
void *my_calloc(size_t nmemb, size_t size);
void *my_realloc(void *ptr, size_t size);

// Debug
void my_malloc_stats(void);

#endif

Basic Implementation

// malloc.c
#define _GNU_SOURCE
#include "malloc.h"
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <pthread.h>

// Global state -- the allocator maintains a doubly-linked list of ALL blocks
// (both free and allocated). heap_start points to the first block ever created,
// heap_end points to the most recent. This list is the allocator's "map" of
// the entire heap -- walking it lets us find free blocks and coalesce neighbors.
static block_header_t *heap_start = NULL;
static block_header_t *heap_end = NULL;
// A single global mutex protects all allocator state. This is the simplest
// thread-safety approach, but also the biggest bottleneck: every malloc/free
// from any thread must acquire this lock. Production allocators like tcmalloc
// solve this with per-thread caches (thread-local free lists that avoid the
// global lock for common allocation sizes).
static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;

// Statistics
static size_t total_allocated = 0;
static size_t total_freed = 0;
static size_t num_allocations = 0;
static size_t num_frees = 0;

// Get more memory from the kernel.
// There are two system calls for this, and production allocators use both:
//
// sbrk(n): Extends the program's data segment by n bytes. Simple and fast
//   but sequential -- you can only grow the heap from one end. Good for
//   small allocations because the memory is contiguous and cache-friendly.
//
// mmap(): Requests a new virtual memory region from the kernel. Can be placed
//   anywhere in the address space and individually returned to the OS via munmap().
//   Better for large allocations because freeing them actually releases memory
//   back to the OS (sbrk can only release from the top of the heap).
static void *request_memory(size_t size) {
    size_t total = HEADER_SIZE + size;
    
    if (total < 128 * 1024) {  // < 128KB: use sbrk for locality
        void *ptr = sbrk(total);
        if (ptr == (void*)-1) return NULL;
        return ptr;
    } else {
        // Large allocation: use mmap so it can be individually freed
        void *ptr = mmap(NULL, total,
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS,  // No file backing, private to this process
                         -1, 0);
        if (ptr == MAP_FAILED) return NULL;
        return ptr;
    }
}

// Find a free block of sufficient size.
// This uses "first-fit": return the first free block that is large enough.
// Alternatives include best-fit (less fragmentation, slower search) and
// segregated lists (O(1) allocation for common sizes -- see Advanced section).
static block_header_t *find_free_block(size_t size) {
    block_header_t *current = heap_start;
    
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;
        }
        current = current->next;
    }
    
    return NULL;  // No suitable free block -- caller must request more from the kernel
}

// Split a block if it is much larger than needed.
// Without splitting, requesting 16 bytes from a 4KB free block wastes 4080 bytes.
// We carve out exactly what the caller needs and turn the remainder into a new
// free block that can satisfy future allocations.
static void split_block(block_header_t *block, size_t size) {
    size_t remaining = block->size - size - HEADER_SIZE;
    
    // Only split if the remainder can hold at least MIN_BLOCK_SIZE of user data.
    // Creating a free block smaller than MIN_BLOCK_SIZE wastes more space on the
    // header than it saves -- better to just give the caller the extra bytes.
    if (remaining < MIN_BLOCK_SIZE) return;
    
    // Create new block after this one
    block_header_t *new_block = (block_header_t*)((char*)block + HEADER_SIZE + size);
    new_block->size = remaining;
    new_block->is_free = true;
    new_block->magic = BLOCK_MAGIC;
    new_block->next = block->next;
    new_block->prev = block;
    
    if (block->next) {
        block->next->prev = new_block;
    } else {
        heap_end = new_block;
    }
    
    block->next = new_block;
    block->size = size;
}

// Coalesce adjacent free blocks
static void coalesce(block_header_t *block) {
    // Coalesce with next block
    if (block->next && block->next->is_free) {
        block->size += HEADER_SIZE + block->next->size;
        block->next = block->next->next;
        if (block->next) {
            block->next->prev = block;
        } else {
            heap_end = block;
        }
    }
    
    // Coalesce with previous block
    if (block->prev && block->prev->is_free) {
        block->prev->size += HEADER_SIZE + block->size;
        block->prev->next = block->next;
        if (block->next) {
            block->next->prev = block->prev;
        } else {
            heap_end = block->prev;
        }
    }
}

void *my_malloc(size_t size) {
    if (size == 0) return NULL;
    
    // Align size
    size = ALIGN(size);
    
    pthread_mutex_lock(&malloc_lock);
    
    // Try to find a free block
    block_header_t *block = find_free_block(size);
    
    if (block) {
        // Found a free block
        block->is_free = false;
        split_block(block, size);
    } else {
        // Need to request more memory
        block = request_memory(size);
        if (!block) {
            pthread_mutex_unlock(&malloc_lock);
            return NULL;
        }
        
        block->size = size;
        block->is_free = false;
        block->magic = BLOCK_MAGIC;
        block->next = NULL;
        block->prev = heap_end;
        
        if (heap_end) {
            heap_end->next = block;
        }
        heap_end = block;
        
        if (!heap_start) {
            heap_start = block;
        }
    }
    
    // Update stats
    total_allocated += block->size;
    num_allocations++;
    
    pthread_mutex_unlock(&malloc_lock);
    
    // Return pointer to user data (after header)
    return (void*)((char*)block + HEADER_SIZE);
}

void my_free(void *ptr) {
    if (!ptr) return;
    
    pthread_mutex_lock(&malloc_lock);
    
    // Get block header
    block_header_t *block = (block_header_t*)((char*)ptr - HEADER_SIZE);
    
    // Validate block
    if (block->magic != BLOCK_MAGIC) {
        fprintf(stderr, "my_free: corrupt heap detected!\n");
        pthread_mutex_unlock(&malloc_lock);
        return;
    }
    
    if (block->is_free) {
        fprintf(stderr, "my_free: double free detected!\n");
        pthread_mutex_unlock(&malloc_lock);
        return;
    }
    
    // Mark as free
    block->is_free = true;
    
    // Update stats
    total_freed += block->size;
    num_frees++;
    
    // Coalesce adjacent free blocks
    coalesce(block);
    
    pthread_mutex_unlock(&malloc_lock);
}

void *my_calloc(size_t nmemb, size_t size) {
    // Check for overflow
    size_t total = nmemb * size;
    if (nmemb != 0 && total / nmemb != size) {
        return NULL;  // Overflow
    }
    
    void *ptr = my_malloc(total);
    if (ptr) {
        memset(ptr, 0, total);
    }
    return ptr;
}

void *my_realloc(void *ptr, size_t size) {
    if (!ptr) return my_malloc(size);
    if (size == 0) {
        my_free(ptr);
        return NULL;
    }
    
    block_header_t *block = (block_header_t*)((char*)ptr - HEADER_SIZE);
    
    // If current block is big enough, just return it
    if (block->size >= size) {
        return ptr;
    }
    
    // Need to allocate new block
    void *new_ptr = my_malloc(size);
    if (!new_ptr) return NULL;
    
    memcpy(new_ptr, ptr, block->size);
    my_free(ptr);
    
    return new_ptr;
}

void my_malloc_stats(void) {
    pthread_mutex_lock(&malloc_lock);
    
    printf("=== Malloc Statistics ===\n");
    printf("Total allocated: %zu bytes\n", total_allocated);
    printf("Total freed: %zu bytes\n", total_freed);
    printf("In use: %zu bytes\n", total_allocated - total_freed);
    printf("Allocations: %zu\n", num_allocations);
    printf("Frees: %zu\n", num_frees);
    
    // Walk the heap
    size_t free_blocks = 0, used_blocks = 0;
    size_t free_bytes = 0, used_bytes = 0;
    
    block_header_t *block = heap_start;
    while (block) {
        if (block->is_free) {
            free_blocks++;
            free_bytes += block->size;
        } else {
            used_blocks++;
            used_bytes += block->size;
        }
        block = block->next;
    }
    
    printf("\nHeap blocks:\n");
    printf("  Free: %zu blocks, %zu bytes\n", free_blocks, free_bytes);
    printf("  Used: %zu blocks, %zu bytes\n", used_blocks, used_bytes);
    printf("  Fragmentation: %.2f%%\n", 
           free_bytes > 0 ? (double)free_blocks / (free_blocks + used_blocks) * 100 : 0);
    
    pthread_mutex_unlock(&malloc_lock);
}

Advanced: Best-Fit and Segregated Lists

The basic allocator scans a single linked list to find a free block. With thousands of blocks, this becomes painfully slow (O(n) per allocation). Segregated free lists solve this by maintaining separate lists for different size ranges — like a hardware store with separate bins for screws, bolts, and washers instead of one giant pile. When a 32-byte request arrives, you only search the 32-byte bin, which is typically very short. This is the core idea behind glibc’s ptmalloc: small allocations go to “fastbins” (singly-linked, no coalescing), medium ones go to “smallbins” (doubly-linked, sorted), and large ones go to “largebins” (sorted by size with skip lists). Our simplified version captures the key insight without all the production complexity.
// advanced_malloc.c
#include "malloc.h"

// Segregated free lists: one list per size class.
// Instead of searching ALL free blocks, we jump directly to the list
// that contains blocks of the right size. This turns O(n) into O(1)
// for the common case where a block of the exact size class is available.
#define NUM_SIZE_CLASSES 8
static block_header_t *free_lists[NUM_SIZE_CLASSES];

// Size class boundaries: 16, 32, 64, 128, 256, 512, 1024, >1024
static size_t size_class_boundaries[] = {16, 32, 64, 128, 256, 512, 1024, SIZE_MAX};

static int get_size_class(size_t size) {
    for (int i = 0; i < NUM_SIZE_CLASSES; i++) {
        if (size <= size_class_boundaries[i]) {
            return i;
        }
    }
    return NUM_SIZE_CLASSES - 1;
}

static void add_to_free_list(block_header_t *block) {
    int class = get_size_class(block->size);
    
    // Insert at head
    block->next = free_lists[class];
    if (free_lists[class]) {
        free_lists[class]->prev = block;
    }
    block->prev = NULL;
    free_lists[class] = block;
}

static void remove_from_free_list(block_header_t *block) {
    int class = get_size_class(block->size);
    
    if (block->prev) {
        block->prev->next = block->next;
    } else {
        free_lists[class] = block->next;
    }
    
    if (block->next) {
        block->next->prev = block->prev;
    }
}

static block_header_t *find_best_fit(size_t size) {
    int class = get_size_class(size);
    
    // Search in appropriate size class and larger
    for (int c = class; c < NUM_SIZE_CLASSES; c++) {
        block_header_t *best = NULL;
        block_header_t *current = free_lists[c];
        
        while (current) {
            if (current->size >= size) {
                if (!best || current->size < best->size) {
                    best = current;
                    // Exact fit - stop searching
                    if (best->size == size) break;
                }
            }
            current = current->next;
        }
        
        if (best) {
            remove_from_free_list(best);
            return best;
        }
    }
    
    return NULL;
}

Interposition: Replace System malloc

One of the most powerful testing techniques in systems programming: you can replace the system malloc with your own implementation without recompiling the target program. Linux’s dynamic linker resolves symbols at load time, and LD_PRELOAD forces it to look in your library first. This is the same mechanism that tools like Valgrind, Electric Fence, and tcmalloc use to inject themselves.
Pitfall: Your allocator must handle the bootstrapping problem. Some libc functions called during startup (like dlsym or printf) may themselves call malloc. If your allocator is not ready yet, you get an infinite recursion. Production interposition libraries use a small static bootstrap buffer for early allocations.
// Replace system malloc with our implementation.
// Compile as shared library: gcc -shared -fPIC -o libmymalloc.so malloc.c -lpthread
// Inject into any program: LD_PRELOAD=./libmymalloc.so ./your_program
// The target program's malloc/free calls will be routed to our implementation.

// Override standard functions
void *malloc(size_t size) {
    return my_malloc(size);
}

void free(void *ptr) {
    my_free(ptr);
}

void *calloc(size_t nmemb, size_t size) {
    return my_calloc(nmemb, size);
}

void *realloc(void *ptr, size_t size) {
    return my_realloc(ptr, size);
}

// Also needed for C++ programs
void *__libc_malloc(size_t size) { return my_malloc(size); }
void __libc_free(void *ptr) { my_free(ptr); }

Testing

// test_malloc.c
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "malloc.h"

void test_basic(void) {
    printf("Testing basic allocation...\n");
    
    int *p = my_malloc(sizeof(int));
    assert(p != NULL);
    *p = 42;
    assert(*p == 42);
    my_free(p);
    
    printf("  PASS\n");
}

void test_multiple(void) {
    printf("Testing multiple allocations...\n");
    
    void *ptrs[100];
    for (int i = 0; i < 100; i++) {
        ptrs[i] = my_malloc(64);
        assert(ptrs[i] != NULL);
        memset(ptrs[i], i, 64);
    }
    
    // Verify contents
    for (int i = 0; i < 100; i++) {
        char *p = ptrs[i];
        for (int j = 0; j < 64; j++) {
            assert(p[j] == (char)i);
        }
    }
    
    // Free odd indices
    for (int i = 1; i < 100; i += 2) {
        my_free(ptrs[i]);
        ptrs[i] = NULL;
    }
    
    // Allocate again - should reuse freed blocks
    for (int i = 1; i < 100; i += 2) {
        ptrs[i] = my_malloc(64);
        assert(ptrs[i] != NULL);
    }
    
    // Free all
    for (int i = 0; i < 100; i++) {
        my_free(ptrs[i]);
    }
    
    printf("  PASS\n");
}

void test_large(void) {
    printf("Testing large allocations...\n");
    
    void *p = my_malloc(1024 * 1024);  // 1MB
    assert(p != NULL);
    memset(p, 0xAB, 1024 * 1024);
    my_free(p);
    
    printf("  PASS\n");
}

void test_realloc(void) {
    printf("Testing realloc...\n");
    
    char *p = my_malloc(10);
    strcpy(p, "hello");
    
    p = my_realloc(p, 100);
    assert(strcmp(p, "hello") == 0);
    strcat(p, " world");
    
    my_free(p);
    
    printf("  PASS\n");
}

void test_calloc(void) {
    printf("Testing calloc...\n");
    
    int *arr = my_calloc(100, sizeof(int));
    for (int i = 0; i < 100; i++) {
        assert(arr[i] == 0);
    }
    my_free(arr);
    
    printf("  PASS\n");
}

int main(void) {
    printf("=== Memory Allocator Tests ===\n\n");
    
    test_basic();
    test_multiple();
    test_large();
    test_realloc();
    test_calloc();
    
    printf("\n=== All tests passed! ===\n");
    my_malloc_stats();
    
    return 0;
}

Performance Benchmarks

// benchmark.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "malloc.h"

#define ITERATIONS 100000

void benchmark_malloc(const char *name, void *(*alloc)(size_t), void (*dealloc)(void*)) {
    void *ptrs[ITERATIONS];
    
    clock_t start = clock();
    
    for (int i = 0; i < ITERATIONS; i++) {
        ptrs[i] = alloc(64);
    }
    
    for (int i = 0; i < ITERATIONS; i++) {
        dealloc(ptrs[i]);
    }
    
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    
    printf("%s: %.4f seconds (%d ops/sec)\n", 
           name, seconds, (int)(ITERATIONS * 2 / seconds));
}

int main(void) {
    printf("=== Malloc Benchmark ===\n");
    
    benchmark_malloc("System malloc", malloc, free);
    benchmark_malloc("My malloc", my_malloc, my_free);
    
    return 0;
}

Debugging Features

// Add memory debugging features

typedef struct {
    const char *file;
    int line;
    size_t size;
} alloc_info_t;

// Track allocations for leak detection
#define MAX_TRACKED 10000
static alloc_info_t tracked[MAX_TRACKED];
static size_t tracked_count = 0;

void *my_malloc_debug(size_t size, const char *file, int line) {
    void *ptr = my_malloc(size);
    if (ptr && tracked_count < MAX_TRACKED) {
        tracked[tracked_count].file = file;
        tracked[tracked_count].line = line;
        tracked[tracked_count].size = size;
        tracked_count++;
    }
    return ptr;
}

#define my_malloc(size) my_malloc_debug(size, __FILE__, __LINE__)

void my_malloc_check_leaks(void) {
    printf("=== Memory Leak Report ===\n");
    for (size_t i = 0; i < tracked_count; i++) {
        printf("Leaked %zu bytes at %s:%d\n",
               tracked[i].size, tracked[i].file, tracked[i].line);
    }
    if (tracked_count == 0) {
        printf("No leaks detected!\n");
    }
}

Common Pitfalls

Alignment bugs: Returning unaligned pointers causes SIGBUS on some architectures and silent performance degradation on x86. Always round allocation sizes up to the alignment boundary (typically 16 bytes on 64-bit systems). If your test suite passes but benchmarks show unexpectedly slow performance, check alignment first.Forgetting to coalesce: Without coalescing, your allocator suffers from “external fragmentation” — plenty of total free memory, but no single block large enough for a request. After 10,000 allocation/free cycles, a non-coalescing allocator can waste 50%+ of heap space.Thread safety oversights: The most common bug is forgetting to hold the lock during coalescing or splitting. Both operations modify multiple blocks’ pointers, and a concurrent allocation that sees a half-updated linked list will corrupt the heap.sbrk is not thread-safe: On older Linux kernels, sbrk is not safe to call from multiple threads. Prefer mmap for new allocations in threaded programs, or protect sbrk calls with the global lock (as our implementation does).Header corruption goes undetected: Without the magic number check, a buffer overflow in user code can silently overwrite a block header, causing the allocator to follow corrupted linked list pointers on the next free. This produces crashes that look completely unrelated to the actual bug. The magic canary catches this early.

Learning Outcomes

  • How heap memory actually works at the system call level
  • Block headers, metadata overhead, and alignment requirements
  • Memory fragmentation (internal vs external) and coalescing strategies
  • Multiple allocation strategies (first-fit, best-fit, segregated lists)
  • Thread-safe allocator design and the performance cost of global locks
  • System calls: sbrk for contiguous growth, mmap for large/independent regions
  • Performance measurement and the real cost of malloc in tight loops
  • Memory debugging, leak detection, and corruption canaries

Extensions

Arena Allocator

Fast bump allocation with bulk free

Pool Allocator

Fixed-size allocations for objects

Thread-Local Caching

Per-thread free lists like tcmalloc

Memory Pools

Pre-allocated pools for game engines

Next Up

Build an HTTP Server

Create a concurrent web server from scratch