This module prepares you to read, understand, and contribute to the PostgreSQL codebase. You’ll learn the code structure, key patterns, and how to set up a development environment.

Practical context: The PostgreSQL codebase is approximately 1.3 million lines of C, developed over 35+ years by hundreds of contributors. It is widely regarded as one of the best-structured large C codebases in existence. The code is heavily commented, follows consistent conventions, and uses patterns that were battle-tested long before “clean code” was a popular concept. If you can read PostgreSQL source, you can read virtually any C codebase.
Estimated Time: 14-16 hours
Difficulty: Expert
Prerequisite: Complete Modules 7-9 (internals)
Outcome: Ready for first contribution
Practical tip: Never use -O0 for performance testing — the optimizer makes a dramatic difference in PostgreSQL’s performance (often 2-3x). Use -O0 -g3 for debugging with GDB/LLDB where you need accurate variable inspection, and -O2 -g for realistic performance profiling. Assertions (--enable-cassert) add about 10-15% overhead, so disable them for benchmarks but always enable them during development — they catch subtle bugs that would otherwise manifest as data corruption in production.
Option            Purpose
--enable-debug    Include debugging symbols (no runtime cost if using -O2)
/* PostgreSQL uses memory contexts for allocation */
MemoryContext oldcontext;

/* Switch to appropriate context */
oldcontext = MemoryContextSwitchTo(CacheMemoryContext);

/* Allocate memory (auto-freed when context is reset/deleted) */
ptr = palloc(size);
str = pstrdup("string");

/* Restore previous context */
MemoryContextSwitchTo(oldcontext);

/* Reset context (free all memory in it) */
MemoryContextReset(mycontext);
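To make the switch/allocate/reset pattern concrete without a PostgreSQL build, here is a minimal toy model in plain C. It is a sketch only, not PostgreSQL's allocator: every `toy_` name and the `ToyContext` type are hypothetical stand-ins, and the real implementation allocates in blocks rather than one `malloc` per chunk.

```c
/* Toy model of memory contexts. NOT PostgreSQL's implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Per-allocation header; the union keeps the payload suitably aligned. */
typedef union ToyChunk {
    union ToyChunk *next;    /* next chunk owned by the same context */
    max_align_t     align;   /* forces worst-case alignment */
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;        /* linked list of live allocations */
    size_t    nchunks;       /* number of live allocations */
} ToyContext;

/* Plays the role of CurrentMemoryContext (hypothetical name). */
static ToyContext *toy_current = NULL;

/* Make ctx current and return the previous one, like MemoryContextSwitchTo. */
ToyContext *toy_switch_to(ToyContext *ctx)
{
    ToyContext *old = toy_current;
    toy_current = ctx;
    return old;
}

/* Allocate from whichever context is current, like palloc. */
void *toy_alloc(size_t size)
{
    assert(toy_current != NULL);
    ToyChunk *chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        abort();
    chunk->next = toy_current->chunks;
    toy_current->chunks = chunk;
    toy_current->nchunks++;
    return chunk + 1;        /* payload starts right after the header */
}

/* Duplicate a string into the current context, like pstrdup. */
char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = toy_alloc(len);
    memcpy(copy, s, len);
    return copy;
}

/* Free everything the context owns in one call, like MemoryContextReset. */
void toy_reset(ToyContext *ctx)
{
    while (ctx->chunks != NULL) {
        ToyChunk *next = ctx->chunks->next;
        free(ctx->chunks);
        ctx->chunks = next;
    }
    ctx->nchunks = 0;
}
```

The point of the design is that callers never free individual allocations: they pick the context whose lifetime matches the data, and a single reset releases everything at once.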
Error Handling
/* ereport for errors and logging */
ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("invalid value for parameter \"%s\": %d", name, value),
         errdetail("Value must be between %d and %d.", min, max),
         errhint("Try using a smaller value.")));

/* Log levels: DEBUG5-DEBUG1, LOG, INFO, NOTICE, WARNING, ERROR */
elog(DEBUG1, "entering function %s", __func__);
elog(LOG, "checkpoint starting");
elog(WARNING, "skipping invalid entry");

/* ERROR aborts transaction, FATAL terminates connection */
/* PANIC crashes the server (only for catastrophic issues) */
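The key behavioral difference between the levels is that a report at ERROR or above does not return to its caller. PostgreSQL implements this with a non-local jump back to an error handler. The following self-contained sketch imitates that control flow with standard `setjmp`/`longjmp`; the `toy_` names and `ToyLevel` enum are hypothetical and much simpler than the real machinery.

```c
/* Toy model of "ERROR does not return". NOT PostgreSQL's elog.c. */
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

typedef enum { TOY_LOG, TOY_WARNING, TOY_ERROR } ToyLevel;

static jmp_buf toy_handler;       /* where TOY_ERROR jumps to */
static int     toy_warnings = 0;  /* count of WARNING reports seen */

/*
 * Report a message. Below TOY_ERROR the function returns normally.
 * At TOY_ERROR it jumps to the handler and never returns, so it must
 * only be raised while toy_run_statement() is active.
 */
void toy_report(ToyLevel level, const char *msg)
{
    const char *tag = level == TOY_ERROR ? "ERROR"
                    : level == TOY_WARNING ? "WARNING" : "LOG";

    fprintf(stderr, "%s: %s\n", tag, msg);
    if (level >= TOY_ERROR)
        longjmp(toy_handler, 1);
    if (level == TOY_WARNING)
        toy_warnings++;
}

/* Hypothetical unit of work: rejects values outside [min, max]. */
static int toy_set_param(int value, int min, int max)
{
    if (value < min || value > max)
        toy_report(TOY_ERROR, "invalid value for parameter");
    return value;   /* reached only when the value was accepted */
}

/* Run one "statement"; returns 1 if it completed, 0 if it was aborted. */
int toy_run_statement(int value)
{
    if (setjmp(toy_handler) != 0)
        return 0;   /* arrived via longjmp: the statement was aborted */
    toy_set_param(value, 1, 10);
    return 1;
}
```

Because control leaves the failing function abruptly, cleanup cannot rely on code placed after the report call. That is one reason memory contexts matter: anything allocated in a per-query context is reclaimed when the context is reset, regardless of where the error was raised.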
Understanding these three structures is like learning the nouns of the PostgreSQL language. Nearly every function you will read in the codebase accepts, returns, or manipulates a Relation, a HeapTuple, or a Buffer. Master these and the rest of the code becomes dramatically more readable.
Practical tip: A HeapTuple is a pointer to tuple data, not a copy of it. If the buffer containing the tuple is released, the HeapTuple becomes a dangling pointer. When you need a tuple to survive after the buffer pin is released, use heap_copytuple() to create an independent copy in the current memory context.
/* HeapTuple = pointer to tuple data + header */
typedef struct HeapTupleData {
    uint32          t_len;      /* Length of tuple */
    ItemPointerData t_self;     /* TID (block, offset) */
    Oid             t_tableOid; /* Table OID */
    HeapTupleHeader t_data;     /* -> tuple header+data */
} HeapTupleData;

typedef HeapTupleData *HeapTuple;

/* Access tuple attributes */
bool isnull;
Datum value = heap_getattr(tuple, attnum, tupdesc, &isnull);

/* Build new tuple */
HeapTuple newtup = heap_form_tuple(tupdesc, values, nulls);
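The dangling-pointer hazard described in the tip above can be reproduced in a few lines of plain C. This is a toy sketch, not PostgreSQL code: `ToyTuple` and `toy_copytuple` are hypothetical names, and a local `char` array stands in for a shared buffer page. The copy places header and body in a single allocation, which is also the general approach heap_copytuple() takes.

```c
/* Toy model of "tuple points into a page" vs. "tuple owns its data". */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyTuple {
    size_t len;    /* length of the tuple body in bytes */
    char  *data;   /* points at the body; may point INTO a page */
} ToyTuple;

/* Copy header and body into one allocation so the result owns its data. */
ToyTuple *toy_copytuple(const ToyTuple *src)
{
    ToyTuple *copy = malloc(sizeof(ToyTuple) + src->len);
    if (copy == NULL)
        abort();
    copy->len = src->len;
    copy->data = (char *) copy + sizeof(ToyTuple);  /* body follows header */
    memcpy(copy->data, src->data, src->len);
    return copy;
}
```

A tuple whose `data` field points into the page sees whatever the page holds next, while the copy keeps the bytes it was created from until it is freed.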
/* Buffer = index into shared buffer pool */
typedef int Buffer;

/*
 * Buffer operations follow a strict protocol:
 * 1. ReadBuffer - bring the page into shared buffers (or find it if cached)
 * 2. LockBuffer - acquire a content lock (SHARE for reads, EXCLUSIVE for writes)
 * 3. Access the page via BufferGetPage
 * 4. If modified: MarkBufferDirty (BEFORE releasing lock)
 * 5. LockBuffer UNLOCK - release the content lock
 * 6. ReleaseBuffer - decrement the pin count
 *
 * Forgetting MarkBufferDirty means the modified page may never be written
 * to disk, so the change can be silently lost.
 * Forgetting ReleaseBuffer leaks a pin; pinned pages cannot be evicted, which
 * can eventually cause "no unpinned buffers available" errors.
 */

/* Read path */
Buffer buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_SHARE);      /* Shared lock: concurrent readers OK */
Page page = BufferGetPage(buf);
/* ... access page data ... */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);     /* Release content lock first */
ReleaseBuffer(buf);                      /* Then release pin */

/* Write path: exclusive lock, and mark dirty BEFORE unlocking */
buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buf);
/* ... modify page data (real code also WAL-logs the change) ... */
MarkBufferDirty(buf);                    /* Tells checkpointer this page needs flushing */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);
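The ordering rules in the protocol above are easier to remember once you see what each call tracks. Below is a small, self-contained toy model of pin counts and content locks. It is only a sketch under simplifying assumptions: all `toy_` names are hypothetical, there is one lock mode instead of two, and there is no eviction or I/O.

```c
/* Toy model of the buffer pin/lock protocol. NOT PostgreSQL's bufmgr. */
#include <assert.h>
#include <stdbool.h>

#define TOY_NBUFFERS 4

typedef struct ToyBufferDesc {
    int  pins;     /* pin count: page may not be evicted while > 0 */
    bool locked;   /* content lock currently held */
    bool dirty;    /* modified since it was last written out */
} ToyBufferDesc;

static ToyBufferDesc toy_pool[TOY_NBUFFERS];

/* Pin the buffer that holds blocknum and return its index. */
int toy_read_buffer(int blocknum)
{
    assert(blocknum >= 0);
    int buf = blocknum % TOY_NBUFFERS;   /* trivial block-to-buffer mapping */
    toy_pool[buf].pins++;
    return buf;
}

/* Take the content lock; only legal while the buffer is pinned. */
void toy_lock(int buf)
{
    assert(toy_pool[buf].pins > 0);
    assert(!toy_pool[buf].locked);
    toy_pool[buf].locked = true;
}

/* Mark dirty; only legal while the content lock is still held. */
void toy_mark_dirty(int buf)
{
    assert(toy_pool[buf].locked);
    toy_pool[buf].dirty = true;
}

/* Release the content lock but keep the pin. */
void toy_unlock(int buf)
{
    assert(toy_pool[buf].locked);
    toy_pool[buf].locked = false;
}

/* Drop one pin. */
void toy_release(int buf)
{
    assert(toy_pool[buf].pins > 0);
    toy_pool[buf].pins--;
}

/* Total pins still held: nonzero after all work is done means a leak. */
int toy_leaked_pins(void)
{
    int total = 0;
    for (int i = 0; i < TOY_NBUFFERS; i++)
        total += toy_pool[i].pins;
    return total;
}
```

The assertions encode the same ordering as the numbered steps: lock requires a pin, marking dirty requires the lock, and a pin that outlives its work shows up as a leak.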
Practical tip: Do not attach GDB to the postmaster process. Instead, connect with psql first, then attach GDB to the specific backend process serving your session. Run SELECT pg_backend_pid(); in psql to find your PID, then gdb -p <pid>. This avoids accidentally freezing the postmaster (which would block all new connections) and lets you debug a single session in isolation.
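# Option 1: Start a fresh server under GDB (single-user mode)
# --single runs one standalone backend; "postgres" is the database name
gdb --args $HOME/pg-dev/bin/postgres --single -D $HOME/pg-dev/data postgres

# Set breakpoints
(gdb) break heap_insert
(gdb) break costsize.c:200

# Run
(gdb) run

# Single-user mode accepts no psql connections, so type SQL
# at the backend> prompt in the same terminal
backend> INSERT INTO test VALUES (1)

# Back in GDB (variable names as declared in heap_insert)
(gdb) backtrace                # Show call stack
(gdb) print *tup               # Inspect HeapTuple
(gdb) p *relation->rd_rel      # Inspect relation metadata
(gdb) next                     # Step over
(gdb) step                     # Step into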
The best way to learn a codebase is to make a small change to it. These two examples — adding a configuration parameter and adding a SQL function — are the PostgreSQL equivalents of “Hello, World.” They touch just enough of the infrastructure to teach you the patterns without requiring deep domain expertise.
Practical tip: Always run make check before submitting a patch. The PostgreSQL community takes regressions very seriously: a patch that introduces even one test failure will be rejected immediately. Run make check-world for the most thorough validation, but be aware it can take 10-30 minutes depending on your hardware. For iterative development, make check-tests TESTS="your_test" gives fast feedback on just the tests relevant to your change.

# Full regression test suite (runs in ~2-5 minutes)
make check

# Specific test file (fast iteration during development)
# TESTS is honored by the check-tests target; plain "make check" runs the full schedule.
# On recent versions you may need to list test_setup first, since most tests depend on it.
make check-tests TESTS="select"

# Parallel tests across ALL test suites including contrib (thorough but slow)
make check-world -j4

# TAP tests for command-line utilities (Perl required)
make check -C src/bin/psql

# Isolation tests for concurrency behavior (uses isolationtester)
make check -C src/test/isolation

# Code coverage report (useful for ensuring your patch tests all code paths)
./configure --enable-coverage
make check
make coverage-html
# Open coverage/index.html to see which lines your tests exercised
Practical tip: Start with tcop/postgres.c and exec_simple_query(). This is the “main loop” of query processing: every SELECT, INSERT, UPDATE, and DELETE sent through the simple query protocol (which psql uses by default) flows through here. Once you understand this function, you can follow any query from entry point to result. Read it with the call trace from Section 13.5 open in a second window. After postgres.c, costsize.c is the next most rewarding file to study because it reveals exactly how the planner assigns costs, and that knowledge directly improves your ability to tune queries in production.