Implement a custom memory allocator that replaces the system malloc. This is the ultimate C project — you will understand exactly how dynamic memory works at the lowest level.Every time you call malloc(64) in any C program, an allocator must decide: which chunk of memory to hand back, how to track what is free and what is not, and how to do this fast enough that it does not become a bottleneck. Real-world allocators like glibc’s ptmalloc, Google’s tcmalloc, and Facebook’s jemalloc are engineering marvels that balance speed, fragmentation, thread scalability, and cache behavior. In this project, you will build a simplified version from scratch to understand the fundamental design decisions they all share.
Think of the allocator as managing a parking lot. Cars (allocations) come and go at unpredictable times, leaving gaps of various sizes. The manager must decide where to park each new car (allocation strategy), when to re-stripe the lot to consolidate empty spaces (coalescing), and how to handle rush hour without creating a traffic jam (thread safety).
1
System Interface
Get raw memory pages from the kernel with sbrk (extend the data segment) or mmap (request a new memory region). This is like acquiring land for the parking lot — you ask the OS for a plot.
2
Block Management
Track which regions are free and which are allocated using metadata headers. Each block carries a small header that records its size, status, and links to neighboring blocks.
3
Fragmentation
Handle splitting (cutting a large block to satisfy a smaller request) and coalescing (merging adjacent free blocks back together). Without coalescing, your heap becomes Swiss cheese — lots of small holes, none big enough to use.
4
Performance
Optimize for speed and memory efficiency. First-fit is simple but slow for large heaps. Segregated free lists give O(1) allocation for common sizes — the same approach used by glibc’s ptmalloc and tcmalloc.
// malloc.h#ifndef MY_MALLOC_H#define MY_MALLOC_H#include <stddef.h>#include <stdint.h>#include <stdbool.h>// Block header - stored immediately before each user allocation.// When the user calls malloc(64), we actually allocate HEADER_SIZE + 64 bytes.// The header sits at the front, invisible to the caller. The pointer returned// to the user points just past the header. Think of it like a shipping label// attached to a package: the recipient never sees the label, but the postal// system needs it to route the package correctly.typedef struct block_header { size_t size; // Size of user data (not including header) bool is_free; // Is this block currently available for reuse? struct block_header *next; // Next block in the global linked list of all blocks struct block_header *prev; // Previous block (needed for O(1) coalescing) uint32_t magic; // Canary value (0xDEADBEEF) to detect heap corruption} block_header_t;#define HEADER_SIZE sizeof(block_header_t)#define BLOCK_MAGIC 0xDEADBEEF#define MIN_BLOCK_SIZE 16// Alignment for all allocations#define ALIGNMENT 16#define ALIGN(size) (((size) + (ALIGNMENT-1)) & ~(ALIGNMENT-1))// Public interfacevoid *my_malloc(size_t size);void my_free(void *ptr);void *my_calloc(size_t nmemb, size_t size);void *my_realloc(void *ptr, size_t size);// Debugvoid my_malloc_stats(void);#endif
// malloc.c#define _GNU_SOURCE#include "malloc.h"#include <unistd.h>#include <sys/mman.h>#include <string.h>#include <stdio.h>#include <pthread.h>// Global state -- the allocator maintains a doubly-linked list of ALL blocks// (both free and allocated). heap_start points to the first block ever created,// heap_end points to the most recent. This list is the allocator's "map" of// the entire heap -- walking it lets us find free blocks and coalesce neighbors.static block_header_t *heap_start = NULL;static block_header_t *heap_end = NULL;// A single global mutex protects all allocator state. This is the simplest// thread-safety approach, but also the biggest bottleneck: every malloc/free// from any thread must acquire this lock. Production allocators like tcmalloc// solve this with per-thread caches (thread-local free lists that avoid the// global lock for common allocation sizes).static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;// Statisticsstatic size_t total_allocated = 0;static size_t total_freed = 0;static size_t num_allocations = 0;static size_t num_frees = 0;// Get more memory from the kernel.// There are two system calls for this, and production allocators use both://// sbrk(n): Extends the program's data segment by n bytes. Simple and fast// but sequential -- you can only grow the heap from one end. Good for// small allocations because the memory is contiguous and cache-friendly.//// mmap(): Requests a new virtual memory region from the kernel. Can be placed// anywhere in the address space and individually returned to the OS via munmap().// Better for large allocations because freeing them actually releases memory// back to the OS (sbrk can only release from the top of the heap).static void *request_memory(size_t size) { size_t total = HEADER_SIZE + size; if (total < 128 * 1024) { // < 128KB: use sbrk for locality void *ptr = sbrk(total); if (ptr == (void*)-1) return NULL; return ptr; } else { // Large allocation: use mmap so it can be individually freed void *ptr = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, // No file backing, private to this process -1, 0); if (ptr == MAP_FAILED) return NULL; return ptr; }}// Find a free block of sufficient size.// This uses "first-fit": return the first free block that is large enough.// Alternatives include best-fit (less fragmentation, slower search) and// segregated lists (O(1) allocation for common sizes -- see Advanced section).static block_header_t *find_free_block(size_t size) { block_header_t *current = heap_start; while (current) { if (current->is_free && current->size >= size) { return current; } current = current->next; } return NULL; // No suitable free block -- caller must request more from the kernel}// Split a block if it is much larger than needed.// Without splitting, requesting 16 bytes from a 4KB free block wastes 4080 bytes.// We carve out exactly what the caller needs and turn the remainder into a new// free block that can satisfy future allocations.static void split_block(block_header_t *block, size_t size) { size_t remaining = block->size - size - HEADER_SIZE; // Only split if the remainder can hold at least MIN_BLOCK_SIZE of user data. // Creating a free block smaller than MIN_BLOCK_SIZE wastes more space on the // header than it saves -- better to just give the caller the extra bytes. if (remaining < MIN_BLOCK_SIZE) return; // Create new block after this one block_header_t *new_block = (block_header_t*)((char*)block + HEADER_SIZE + size); new_block->size = remaining; new_block->is_free = true; new_block->magic = BLOCK_MAGIC; new_block->next = block->next; new_block->prev = block; if (block->next) { block->next->prev = new_block; } else { heap_end = new_block; } block->next = new_block; block->size = size;}// Coalesce adjacent free blocksstatic void coalesce(block_header_t *block) { // Coalesce with next block if (block->next && block->next->is_free) { block->size += HEADER_SIZE + block->next->size; block->next = block->next->next; if (block->next) { block->next->prev = block; } else { heap_end = block; } } // Coalesce with previous block if (block->prev && block->prev->is_free) { block->prev->size += HEADER_SIZE + block->size; block->prev->next = block->next; if (block->next) { block->next->prev = block->prev; } else { heap_end = block->prev; } }}void *my_malloc(size_t size) { if (size == 0) return NULL; // Align size size = ALIGN(size); pthread_mutex_lock(&malloc_lock); // Try to find a free block block_header_t *block = find_free_block(size); if (block) { // Found a free block block->is_free = false; split_block(block, size); } else { // Need to request more memory block = request_memory(size); if (!block) { pthread_mutex_unlock(&malloc_lock); return NULL; } block->size = size; block->is_free = false; block->magic = BLOCK_MAGIC; block->next = NULL; block->prev = heap_end; if (heap_end) { heap_end->next = block; } heap_end = block; if (!heap_start) { heap_start = block; } } // Update stats total_allocated += block->size; num_allocations++; pthread_mutex_unlock(&malloc_lock); // Return pointer to user data (after header) return (void*)((char*)block + HEADER_SIZE);}void my_free(void *ptr) { if (!ptr) return; pthread_mutex_lock(&malloc_lock); // Get block header block_header_t *block = (block_header_t*)((char*)ptr - HEADER_SIZE); // Validate block if (block->magic != BLOCK_MAGIC) { fprintf(stderr, "my_free: corrupt heap detected!\n"); pthread_mutex_unlock(&malloc_lock); return; } if (block->is_free) { fprintf(stderr, "my_free: double free detected!\n"); pthread_mutex_unlock(&malloc_lock); return; } // Mark as free block->is_free = true; // Update stats total_freed += block->size; num_frees++; // Coalesce adjacent free blocks coalesce(block); pthread_mutex_unlock(&malloc_lock);}void *my_calloc(size_t nmemb, size_t size) { // Check for overflow size_t total = nmemb * size; if (nmemb != 0 && total / nmemb != size) { return NULL; // Overflow } void *ptr = my_malloc(total); if (ptr) { memset(ptr, 0, total); } return ptr;}void *my_realloc(void *ptr, size_t size) { if (!ptr) return my_malloc(size); if (size == 0) { my_free(ptr); return NULL; } block_header_t *block = (block_header_t*)((char*)ptr - HEADER_SIZE); // If current block is big enough, just return it if (block->size >= size) { return ptr; } // Need to allocate new block void *new_ptr = my_malloc(size); if (!new_ptr) return NULL; memcpy(new_ptr, ptr, block->size); my_free(ptr); return new_ptr;}void my_malloc_stats(void) { pthread_mutex_lock(&malloc_lock); printf("=== Malloc Statistics ===\n"); printf("Total allocated: %zu bytes\n", total_allocated); printf("Total freed: %zu bytes\n", total_freed); printf("In use: %zu bytes\n", total_allocated - total_freed); printf("Allocations: %zu\n", num_allocations); printf("Frees: %zu\n", num_frees); // Walk the heap size_t free_blocks = 0, used_blocks = 0; size_t free_bytes = 0, used_bytes = 0; block_header_t *block = heap_start; while (block) { if (block->is_free) { free_blocks++; free_bytes += block->size; } else { used_blocks++; used_bytes += block->size; } block = block->next; } printf("\nHeap blocks:\n"); printf(" Free: %zu blocks, %zu bytes\n", free_blocks, free_bytes); printf(" Used: %zu blocks, %zu bytes\n", used_blocks, used_bytes); printf(" Fragmentation: %.2f%%\n", free_bytes > 0 ? (double)free_blocks / (free_blocks + used_blocks) * 100 : 0); pthread_mutex_unlock(&malloc_lock);}
The basic allocator scans a single linked list to find a free block. With thousands of blocks, this becomes painfully slow (O(n) per allocation). Segregated free lists solve this by maintaining separate lists for different size ranges — like a hardware store with separate bins for screws, bolts, and washers instead of one giant pile. When a 32-byte request arrives, you only search the 32-byte bin, which is typically very short.This is the core idea behind glibc’s ptmalloc: small allocations go to “fastbins” (singly-linked, no coalescing), medium ones go to “smallbins” (doubly-linked, sorted), and large ones go to “largebins” (sorted by size with skip lists). Our simplified version captures the key insight without all the production complexity.
// advanced_malloc.c#include "malloc.h"// Segregated free lists: one list per size class.// Instead of searching ALL free blocks, we jump directly to the list// that contains blocks of the right size. This turns O(n) into O(1)// for the common case where a block of the exact size class is available.#define NUM_SIZE_CLASSES 8static block_header_t *free_lists[NUM_SIZE_CLASSES];// Size class boundaries: 16, 32, 64, 128, 256, 512, 1024, >1024static size_t size_class_boundaries[] = {16, 32, 64, 128, 256, 512, 1024, SIZE_MAX};static int get_size_class(size_t size) { for (int i = 0; i < NUM_SIZE_CLASSES; i++) { if (size <= size_class_boundaries[i]) { return i; } } return NUM_SIZE_CLASSES - 1;}static void add_to_free_list(block_header_t *block) { int class = get_size_class(block->size); // Insert at head block->next = free_lists[class]; if (free_lists[class]) { free_lists[class]->prev = block; } block->prev = NULL; free_lists[class] = block;}static void remove_from_free_list(block_header_t *block) { int class = get_size_class(block->size); if (block->prev) { block->prev->next = block->next; } else { free_lists[class] = block->next; } if (block->next) { block->next->prev = block->prev; }}static block_header_t *find_best_fit(size_t size) { int class = get_size_class(size); // Search in appropriate size class and larger for (int c = class; c < NUM_SIZE_CLASSES; c++) { block_header_t *best = NULL; block_header_t *current = free_lists[c]; while (current) { if (current->size >= size) { if (!best || current->size < best->size) { best = current; // Exact fit - stop searching if (best->size == size) break; } } current = current->next; } if (best) { remove_from_free_list(best); return best; } } return NULL;}
One of the most powerful testing techniques in systems programming: you can replace the system malloc with your own implementation without recompiling the target program. Linux’s dynamic linker resolves symbols at load time, and LD_PRELOAD forces it to look in your library first. This is the same mechanism that tools like Valgrind, Electric Fence, and tcmalloc use to inject themselves.
Pitfall: Your allocator must handle the bootstrapping problem. Some libc functions called during startup (like dlsym or printf) may themselves call malloc. If your allocator is not ready yet, you get an infinite recursion. Production interposition libraries use a small static bootstrap buffer for early allocations.
// Replace system malloc with our implementation.// Compile as shared library: gcc -shared -fPIC -o libmymalloc.so malloc.c -lpthread// Inject into any program: LD_PRELOAD=./libmymalloc.so ./your_program// The target program's malloc/free calls will be routed to our implementation.// Override standard functionsvoid *malloc(size_t size) { return my_malloc(size);}void free(void *ptr) { my_free(ptr);}void *calloc(size_t nmemb, size_t size) { return my_calloc(nmemb, size);}void *realloc(void *ptr, size_t size) { return my_realloc(ptr, size);}// Also needed for C++ programsvoid *__libc_malloc(size_t size) { return my_malloc(size); }void __libc_free(void *ptr) { my_free(ptr); }
Alignment bugs: Returning unaligned pointers causes SIGBUS on some architectures and silent performance degradation on x86. Always round allocation sizes up to the alignment boundary (typically 16 bytes on 64-bit systems). If your test suite passes but benchmarks show unexpectedly slow performance, check alignment first.Forgetting to coalesce: Without coalescing, your allocator suffers from “external fragmentation” — plenty of total free memory, but no single block large enough for a request. After 10,000 allocation/free cycles, a non-coalescing allocator can waste 50%+ of heap space.Thread safety oversights: The most common bug is forgetting to hold the lock during coalescing or splitting. Both operations modify multiple blocks’ pointers, and a concurrent allocation that sees a half-updated linked list will corrupt the heap.sbrk is not thread-safe: On older Linux kernels, sbrk is not safe to call from multiple threads. Prefer mmap for new allocations in threaded programs, or protect sbrk calls with the global lock (as our implementation does).Header corruption goes undetected: Without the magic number check, a buffer overflow in user code can silently overwrite a block header, causing the allocator to follow corrupted linked list pointers on the next free. This produces crashes that look completely unrelated to the actual bug. The magic canary catches this early.