Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Dynamic Memory Management
The heap is where long-lived, variable-sized data lives. Think of the stack as your desk — fast to use, but limited space, and everything gets cleared when you stand up (function returns). The heap is like a warehouse: you can store anything of any size for as long as you want, but you have to explicitly request space, keep track of where you put things, and return the space when you are done. Forget to return it and the warehouse fills up (memory leak). Return the same space twice and the inventory system breaks (double free).Understanding malloc Overhead
The Hidden Costs
Before usingmalloc, understand its costs:
The hidden work malloc does:
- Find free block: Search free list (O(n) in worst case)
- Split block: If block is larger than requested
- Update metadata: Track allocation size, free list pointers
- Alignment: Ensure proper alignment (adds padding)
- Thread safety: Lock/unlock mutex (multi-threaded programs)
- Time: ~100-500ns per allocation (200-1000 CPU cycles)
- Space: 8-16 bytes of metadata per allocation
- Fragmentation: Can waste 10-30% of heap over time
- Game engines (allocate every frame, 60 FPS = 16ms budget)
- Network servers (allocate per request, thousands/second)
- Real-time systems (unpredictable latency)
- High-frequency trading (microseconds matter)
The malloc Family
Basic Usage
realloc Gotchas
aligned_alloc (C11)
Memory Safety Patterns
Defensive Allocation
Ownership and RAII Patterns
Defer Pattern (Manual)
Custom Allocators
Why Custom Allocators?
Each allocator solves a specific problem that malloc handles poorly: Arena Allocator - Use when:- Problem: Many small allocations with same lifetime
- malloc issue: Overhead of tracking each allocation individually
- Solution: Allocate from bump pointer, free all at once
- Example: Per-request allocations in web server, temporary parsing data, compiler AST nodes
- Speedup: 10-100x faster than malloc
- Tradeoff: Can’t free individual allocations
- Problem: Many same-sized objects allocated/freed frequently
- malloc issue: Fragmentation from mixed-size allocations, search overhead
- Solution: Pre-allocate array of fixed-size blocks
- Example: Game entities, network packets, database records, object pools
- Speedup: 5-50x faster, zero fragmentation
- Tradeoff: Wasted space if pool size is wrong
- Problem: Kernel needs many instances of same struct type
- malloc issue: Initialization overhead, poor cache locality
- Solution: Caches of pre-initialized objects
- Example: Linux kernel (task_struct, inode, dentry), high-performance servers
- Benefit: Reduces initialization overhead, better cache locality
- Tradeoff: More complex implementation
Arena Allocator
Pool Allocator
Slab Allocator
Memory-Mapped I/O
Anonymous Memory Mapping
Debugging Memory Issues
Custom Debug Allocator
Exercises
Safe String Library
Implement
char *safe_strcat(const char *s1, const char *s2) that allocates a new string with the concatenation.Growing Array
Implement a dynamic array that doubles capacity when full, with
push, pop, get, and set operations.Arena with Chunks
Extend the arena allocator to allocate new chunks when current one is full, instead of failing.
Next Up
Preprocessor Mastery
Master macros, conditional compilation, and code generation
Interview Deep-Dive
Walk me through exactly what happens inside malloc when you call malloc(64). What data structures does it maintain?
Walk me through exactly what happens inside malloc when you call malloc(64). What data structures does it maintain?
Strong Answer:
- The specifics depend on the allocator (glibc ptmalloc2, tcmalloc, jemalloc), but the general flow in glibc is: (1) Acquire the arena lock (each thread tries to use its own arena to reduce contention). (2) Determine the size class — glibc rounds up to the nearest “bin” size and adds metadata overhead (typically 8-16 bytes for the chunk header containing size and status flags). (3) Check the thread’s tcache (thread-local cache of recently freed chunks of the same size class) — if a match is found, return it immediately with no locking. (4) Check the appropriate bin (small bins for exact sizes, unsorted bin as a fast cache, large bins sorted by size for larger allocations). (5) If no suitable free chunk exists, extend the heap via
sbrk(for small requests) ormmap(for requests above 128KB). (6) If the found chunk is significantly larger than needed, split it and return the remainder to the free list. (7) Write the chunk header (size, in-use flag, previous-chunk-size for coalescing) and return a pointer to the user data area (just past the header). - The key insight: every allocation carries hidden metadata. A
malloc(1)actually consumes at least 32 bytes (16 bytes of header + 16 bytes minimum chunk size in glibc). This is why millions of tiny allocations waste enormous amounts of memory.
free know how many bytes to release when you do not pass a size?Follow-up Answer:- The size is stored in the chunk header, which lives at a negative offset from the user pointer. When you call
free(ptr), the allocator computes(chunk_header*)(ptr - HEADER_SIZE)and reads the size field. This is also why passing a non-malloc’d pointer or a corrupted pointer tofreecauses heap corruption — it reads garbage as the size, which corrupts the free list data structures. Debug allocators add magic numbers (canaries) to the header so they can detect corruption at free time.
Describe the arena allocator pattern. When is it dramatically better than malloc, and what is the fundamental tradeoff?
Describe the arena allocator pattern. When is it dramatically better than malloc, and what is the fundamental tradeoff?
Strong Answer:
- An arena (also called a bump allocator or linear allocator) is the simplest possible allocator: maintain a pointer to the next free byte, and “allocate” by bumping that pointer forward. Deallocation is all-or-nothing — you reset the pointer to the beginning, freeing everything at once. There is no per-object free, no free list, no coalescing, no fragmentation.
- It is dramatically better when allocations share a common lifetime. The canonical example is per-request processing in a web server: during request handling, you allocate a parse tree, string buffers, intermediate results. At the end of the request, all of that is garbage. With malloc, you would need to track and individually free every allocation. With an arena, you call
arena_reset()— one pointer assignment frees everything instantly. - The fundamental tradeoff: you cannot free individual allocations. If you allocate A, B, C in an arena and need to free only B, you cannot. This makes arenas unsuitable for long-lived heterogeneous data (like a general-purpose heap). But for batch-lifetime data, arenas are 10-100x faster than malloc and produce zero fragmentation.
- Real-world users: compilers (AST nodes allocated per-compilation-unit, freed together), game engines (per-frame allocations reset every 16ms), and the Go runtime (goroutine stacks use arena-like allocation).
- Give each thread its own arena. Since arenas are only written to (bump the pointer) and reset by the same thread that allocates, no synchronization is needed. The parent thread allocates the arena buffers at startup and assigns one to each worker thread. This is exactly how request-scoped allocation works in multithreaded servers: each request handler thread has its own arena, allocates freely without contention, and resets at request end. If you truly need a shared arena, use
atomic_fetch_addon the offset for lock-free bump allocation — but this is rarely necessary.
What is heap fragmentation, and how would you diagnose and fix it in a long-running server?
What is heap fragmentation, and how would you diagnose and fix it in a long-running server?
Strong Answer:
- External fragmentation occurs when free memory is broken into many small non-contiguous chunks. You might have 100MB free total, but the largest contiguous block is only 1MB, so a 2MB allocation fails. Internal fragmentation is wasted space within allocated blocks due to alignment, minimum sizes, and size-class rounding.
- Diagnosis: Use
malloc_info(0, stderr)(glibc) orjemalloc’smalloc_stats_printto inspect heap state. Look at the ratio of free bytes to the largest free block — a high ratio means severe fragmentation. Monitor RSS growth over time; if RSS grows but your application’s logical data size is stable, fragmentation is likely. - Fixes: (1) Use pool allocators for same-sized objects that are frequently allocated and freed — this eliminates fragmentation for that size class entirely. (2) Use arena allocators for batch-lifetime data. (3) Switch to jemalloc or tcmalloc, which have better fragmentation characteristics than glibc’s allocator (jemalloc uses size-segregated pages, which bounds fragmentation). (4) Periodically compact data structures by allocating a new copy and freeing the old one (this works for caches, not for data with external pointers). (5) Redesign allocation patterns — the most common cause of fragmentation is interleaving allocations of different lifetimes and sizes.
free not always reduce the process’s RSS?Follow-up Answer:- glibc’s
freereturns memory to its internal free list but does not return it to the OS unless the freed block is at the very top of thebrkheap (and even then, only if a threshold is met). Memory in the middle of the heap cannot be returned viabrk. Formmap’d allocations (typically above 128KB),freedoes callmunmap, which immediately returns pages to the OS. This is a common source of confusion in production: a server frees 90% of its data but RSS barely changes, because the remaining 10% is scattered throughout the heap, preventingbrkfrom shrinking. Solutions: usemalloc_trim(0)to force glibc to release unused pages, switch to jemalloc (which usesmadvise(MADV_DONTNEED)to release pages within active regions), or usemmapdirectly for large allocations you know will be freed.