Module 13: PostgreSQL Source Code

This module prepares you to read, understand, and contribute to the PostgreSQL codebase. You’ll learn the code structure, key patterns, and how to set up a development environment.

Practical context: The PostgreSQL codebase is approximately 1.3 million lines of C, developed over 35+ years by hundreds of contributors. It is widely regarded as one of the best-structured large C codebases in existence. The code is heavily commented, follows consistent conventions, and uses patterns that were battle-tested long before “clean code” was a popular concept. If you can read PostgreSQL source, you can read virtually any C codebase.

Estimated Time: 14-16 hours
Difficulty: Expert
Prerequisite: Complete Modules 7-9 (internals)
Outcome: Ready for first contribution

13.1 Repository Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                    POSTGRESQL SOURCE STRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   postgresql/                                                                │
│   ├── src/                        # Main source code                        │
│   │   ├── backend/               # Server-side code (THE CORE)              │
│   │   │   ├── access/            # Table/index access methods               │
│   │   │   ├── catalog/           # System catalog handling                  │
│   │   │   ├── commands/          # SQL command implementations              │
│   │   │   ├── executor/          # Query executor                           │
│   │   │   ├── optimizer/         # Query planner/optimizer                  │
│   │   │   ├── parser/            # SQL parser                               │
│   │   │   ├── postmaster/        # Process management                       │
│   │   │   ├── replication/       # Streaming/logical replication            │
│   │   │   ├── storage/           # Buffer pool, locks, IPC                  │
│   │   │   ├── tcop/              # Traffic cop (query dispatch)             │
│   │   │   └── utils/             # Utilities, memory, caching               │
│   │   ├── include/               # Header files                             │
│   │   ├── interfaces/            # Client libraries (libpq)                 │
│   │   ├── bin/                   # Command-line tools                       │
│   │   │   ├── psql/              # Interactive terminal                     │
│   │   │   ├── pg_dump/           # Backup utility                           │
│   │   │   └── initdb/            # Database initialization                  │
│   │   ├── pl/                    # Procedural languages                     │
│   │   └── test/                  # Regression tests                         │
│   ├── contrib/                   # Optional modules/extensions              │
│   ├── doc/                       # Documentation (SGML)                     │
│   └── config/                    # Build configuration                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Directories Deep Dive

SQL Parsing
parser/
├── scan.l           # Flex lexer (tokenization)
├── gram.y           # Bison grammar (syntax)
├── parser.c         # Parser entry point
├── analyze.c        # Semantic analysis
├── parse_expr.c     # Expression parsing
├── parse_clause.c   # FROM, WHERE, etc.
├── parse_func.c     # Function call resolution
├── parse_relation.c # Table reference resolution
└── parse_type.c     # Type handling

13.2 Development Environment Setup

Building from Source

# Clone repository
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql

# Configure build (debug mode for development)
./configure \
    --enable-debug \
    --enable-cassert \
    --enable-tap-tests \
    --prefix=$HOME/pg-dev \
    CFLAGS="-O0 -g3"

# Build (use available cores)
make -j$(nproc)

# Install
make install

# Initialize database cluster
$HOME/pg-dev/bin/initdb -D $HOME/pg-dev/data

# Start server
$HOME/pg-dev/bin/pg_ctl -D $HOME/pg-dev/data -l logfile start

# Verify
$HOME/pg-dev/bin/psql -d postgres -c "SELECT version();"

Build Options Explained

Practical tip: Never use -O0 for performance testing — the optimizer makes a dramatic difference in PostgreSQL’s performance (often 2-3x). Use -O0 -g3 for debugging with GDB/LLDB where you need accurate variable inspection, and -O2 -g for realistic performance profiling. Assertions (--enable-cassert) add about 10-15% overhead, so disable them for benchmarks but always enable them during development — they catch subtle bugs that would otherwise manifest as data corruption in production.
| Option | Purpose |
| --- | --- |
| --enable-debug | Include debugging symbols (no runtime cost if using -O2) |
| --enable-cassert | Enable assertion checks (catches bugs early; ~10-15% overhead) |
| --enable-tap-tests | Enable Perl TAP tests (required for running the full test suite) |
| CFLAGS="-O0" | Disable optimization (variables visible in debugger, but 2-3x slower) |
| CFLAGS="-g3" | Maximum debug info (includes macro definitions in debug symbols) |

IDE Setup

// .vscode/c_cpp_properties.json
{
    "configurations": [{
        "name": "PostgreSQL",
        "includePath": [
            "${workspaceFolder}/src/include/**",
            "${workspaceFolder}/src/backend/**"
        ],
        // FRONTEND is omitted: PostgreSQL defines it only when building client-side (frontend) code
        "defines": [
            "HAVE_CONFIG_H"
        ],
        "compilerPath": "/usr/bin/gcc",
        "cStandard": "c11"
    }]
}

13.3 Code Conventions

Naming Conventions

/* Types: CamelCase */
typedef struct BufferDesc { ... } BufferDesc;
typedef struct RelationData *Relation;

/* Functions: lowercase_with_underscores or CamelCase, depending on the subsystem */
extern void heap_insert(Relation relation, HeapTuple tup, ...);
extern void BufferGetTag(Buffer buffer, RelFileNode *rnode, ...);

/* Macros: UPPER_CASE, though some constant-like macros use CamelCase */
#define BUFFER_LOCK_SHARE 1
#define InvalidOid ((Oid) 0)

/* Global variables: lowercase_with_underscores for settings, CamelCase elsewhere (e.g. CurrentMemoryContext) */
bool enable_seqscan = true;
int work_mem = 4096;

/* Local variables: short, lowercase */
int i, j, ntuples;
char *ptr;

Common Patterns

/* PostgreSQL uses memory contexts for allocation */
MemoryContext oldcontext;

/* Switch to appropriate context */
oldcontext = MemoryContextSwitchTo(CacheMemoryContext);

/* Allocate memory (auto-freed when context is reset/deleted) */
ptr = palloc(size);
str = pstrdup("string");

/* Restore previous context */
MemoryContextSwitchTo(oldcontext);

/* Reset context (free all memory in it) */
MemoryContextReset(mycontext);
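
To make the "switch, allocate, reset" idea concrete, here is a minimal standalone sketch of a context-style allocator. It is not PostgreSQL's implementation: ToyContext, toy_switch_to, toy_alloc, and toy_reset are hypothetical names invented for this illustration, and the sketch does one malloc per allocation, whereas the real allocators carve chunks out of larger blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; not PostgreSQL's MemoryContext implementation. */
typedef struct ToyChunk {
    struct ToyChunk *next;      /* next chunk owned by the same context */
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;           /* singly linked list of live allocations */
    size_t    nchunks;          /* number of live allocations */
} ToyContext;

/* Plays the role of CurrentMemoryContext. */
ToyContext *toy_current = NULL;

/* Make ctx current and return the previous context (cf. MemoryContextSwitchTo). */
ToyContext *
toy_switch_to(ToyContext *ctx)
{
    ToyContext *old = toy_current;

    toy_current = ctx;
    return old;
}

/* Allocate from whichever context is current (cf. palloc). */
void *
toy_alloc(size_t size)
{
    ToyChunk *chunk;

    assert(toy_current != NULL);
    chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        abort();                /* the toy has no error reporting */
    chunk->next = toy_current->chunks;
    toy_current->chunks = chunk;
    toy_current->nchunks++;
    return chunk + 1;           /* usable memory starts right after the header */
}

/* Free every allocation made in ctx with one call (cf. MemoryContextReset). */
void
toy_reset(ToyContext *ctx)
{
    while (ctx->chunks != NULL)
    {
        ToyChunk *next = ctx->chunks->next;

        free(ctx->chunks);
        ctx->chunks = next;
    }
    ctx->nchunks = 0;
}
```

The point of the design is that callers never free individual allocations: whoever owns the context resets it, so an early exit in the middle of a function cannot leak memory that was allocated in it.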

/* ereport for errors and logging */
ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("invalid value for parameter \"%s\": %d",
                name, value),
         errdetail("Value must be between %d and %d.",
                  min, max),
         errhint("Try using a smaller value.")));

/* Log levels: DEBUG5-DEBUG1, LOG, INFO, NOTICE, WARNING, ERROR */
elog(DEBUG1, "entering function %s", __func__);
elog(LOG, "checkpoint starting");
elog(WARNING, "skipping invalid entry");

/* ERROR aborts transaction, FATAL terminates connection */
/* PANIC crashes the server (only for catastrophic issues) */

/* All parse/plan nodes inherit from Node */
typedef struct Node {
    NodeTag type;
} Node;

/* Check node type */
if (IsA(node, SeqScan))
    process_seqscan((SeqScan *) node);

/* Checked cast: castNode asserts the node tag, but only in assert-enabled builds */
SeqScan *scan = castNode(SeqScan, node);

/* Copy nodes (deep copy) */
Node *copy = copyObject(original);

/* Node comparison */
if (equal(node1, node2))
    ...
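
The tag-based "inheritance" used by Node can be reproduced in a few lines of standalone C. The sketch below is an illustration only: ToyNode, ToySeqScan, ToyIndexScan, ToyIsA, and toy_scanrelid are hypothetical names, and it assumes (as PostgreSQL's node structs do) that every struct starts with the tag field so that a pointer to it can be read as a pointer to the base type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tags; the real tags live in nodes/nodes.h. */
typedef enum ToyTag { T_ToySeqScan, T_ToyIndexScan } ToyTag;

/* Base "class": every node starts with its tag. */
typedef struct ToyNode {
    ToyTag type;
} ToyNode;

typedef struct ToySeqScan {
    ToyTag type;        /* must be first so the struct can be viewed as a ToyNode */
    int    scanrelid;
} ToySeqScan;

typedef struct ToyIndexScan {
    ToyTag type;        /* must be first, same reason */
    int    scanrelid;
    int    indexid;
} ToyIndexScan;

/* Read the tag of any node, then compare it against T_<kind>. */
#define toyTag(ptr)       (((const ToyNode *) (ptr))->type)
#define ToyIsA(ptr, kind) (toyTag(ptr) == T_##kind)

/* Dispatch on the tag before downcasting, as code using IsA() does. */
int
toy_scanrelid(const ToyNode *node)
{
    if (ToyIsA(node, ToySeqScan))
        return ((const ToySeqScan *) node)->scanrelid;
    if (ToyIsA(node, ToyIndexScan))
        return ((const ToyIndexScan *) node)->scanrelid;
    return -1;          /* unknown node type */
}
```

Because the macro pastes the type name onto a T_ prefix, the struct name and the enum value have to stay in sync; that naming discipline is what makes the one-line type check possible.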

/* PostgreSQL uses its own List type rather than raw C arrays */
List *mylist = NIL;  /* Empty list */

/* Add to list */
mylist = lappend(mylist, item);        /* Append */
mylist = lcons(item, mylist);          /* Prepend */
mylist = list_concat(list1, list2);    /* Concatenate */

/* Iterate */
ListCell *lc;
foreach(lc, mylist) {
    Node *item = lfirst(lc);
    /* or lfirst_int(lc) for int lists */
}

/* Get by index */
Node *third = list_nth(mylist, 2);

/* List length */
int len = list_length(mylist);

13.4 Key Data Structures

Understanding these three structures is like learning the nouns of the PostgreSQL language. Nearly every function you will read in the codebase accepts, returns, or manipulates a Relation, a HeapTuple, or a Buffer. Master these and the rest of the code becomes dramatically more readable.

Relation (Table)

/* RelationData represents an open relation (table/index) */
typedef struct RelationData {
    RelFileNode rd_node;        /* Physical file identifier */
    Form_pg_class rd_rel;       /* pg_class tuple */
    TupleDesc rd_att;           /* Tuple descriptor */
    Oid rd_id;                  /* Relation OID */
    
    /* Index info (if index) */
    Form_pg_index rd_index;
    
    /* Cached info */
    bytea *rd_options;          /* Parsed reloptions */
    
    /* ... many more fields */
} RelationData;

typedef RelationData *Relation;

/*
 * Open/close relations -- ALWAYS pair these.
 * The lock mode determines what concurrent operations are allowed.
 * AccessShareLock is the lightest (it conflicts only with AccessExclusiveLock).
 * Missing a table_close leaks a relcache reference, which PostgreSQL reports
 * as a "relcache reference leak" warning when the transaction commits.
 */
Relation rel = table_open(relid, AccessShareLock);
/* ... use relation ... */
table_close(rel, AccessShareLock);

HeapTuple

Practical tip: A HeapTuple is a pointer to tuple data, not a copy of it. If the buffer containing the tuple is released, the HeapTuple becomes a dangling pointer. When you need a tuple to survive beyond the current buffer lock, use heap_copytuple() to create an independent copy in the current memory context.
/* HeapTuple = pointer to tuple data + header */
typedef struct HeapTupleData {
    uint32 t_len;               /* Length of tuple */
    ItemPointerData t_self;     /* TID (block, offset) */
    Oid t_tableOid;             /* Table OID */
    HeapTupleHeader t_data;     /* -> tuple header+data */
} HeapTupleData;

typedef HeapTupleData *HeapTuple;

/* Access tuple attributes */
bool isnull;
Datum value = heap_getattr(tuple, attnum, tupdesc, &isnull);

/* Build new tuple */
HeapTuple newtup = heap_form_tuple(tupdesc, values, nulls);

Buffer

/* Buffer = index into shared buffer pool */
typedef int Buffer;

/*
 * Buffer operations follow a strict protocol:
 * 1. ReadBuffer - bring the page into shared buffers (or find it if cached)
 * 2. LockBuffer - acquire a content lock (SHARE for reads, EXCLUSIVE for writes)
 * 3. Access the page via BufferGetPage
 * 4. If modified: MarkBufferDirty (BEFORE releasing lock)
 * 5. LockBuffer UNLOCK - release the content lock
 * 6. ReleaseBuffer - decrement the pin count
 *
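 * Forgetting MarkBufferDirty means the change may never reach disk, because
 * the checkpointer and background writer only flush pages marked dirty.
 * Forgetting ReleaseBuffer leaks a pin. Pinned buffers cannot be evicted, so
 * enough leaked pins can lead to "no unpinned buffers available" errors.
 */
Buffer buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_SHARE);   /* Shared lock: concurrent readers OK */

Page page = BufferGetPage(buf);
/* ... access page data ... */

LockBuffer(buf, BUFFER_LOCK_UNLOCK);  /* Release content lock first */
ReleaseBuffer(buf);                    /* Then release pin */

/* For writes: take an exclusive lock and mark dirty BEFORE unlocking */
buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
/* ... modify the page (real code also WAL-logs the change) ... */
MarkBufferDirty(buf);  /* Tells checkpointer this page needs flushing */
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);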
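
The ordering rules in the protocol are easier to remember once you see them enforced by assertions. The following standalone sketch models pins, an exclusive content lock, and the dirty flag for a tiny pool. Everything in it (ToyBuffer, toy_pool, and the toy_* functions) is hypothetical and single-threaded; it assumes non-negative block numbers and maps them directly onto pool slots, which the real buffer manager does not do.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBUFFERS 4

/* Hypothetical toy descriptor; the real one is BufferDesc. */
typedef struct ToyBuffer {
    int  pins;        /* how many holders have pinned this buffer */
    bool exclusive;   /* is an exclusive content lock held? */
    bool dirty;       /* does the page need to be written out? */
    int  value;       /* stands in for the page contents */
} ToyBuffer;

ToyBuffer toy_pool[TOY_NBUFFERS];

/* Step 1: pin the buffer holding blocknum (assumed non-negative). */
int
toy_read_buffer(int blocknum)
{
    int buf = blocknum % TOY_NBUFFERS;

    toy_pool[buf].pins++;
    return buf;
}

/* Step 2: take the content lock; a pin must already be held. */
void
toy_lock_exclusive(int buf)
{
    assert(toy_pool[buf].pins > 0);
    assert(!toy_pool[buf].exclusive);   /* single-threaded toy: never waits */
    toy_pool[buf].exclusive = true;
}

/* Step 4: only legal while the exclusive lock is still held. */
void
toy_mark_dirty(int buf)
{
    assert(toy_pool[buf].exclusive);
    toy_pool[buf].dirty = true;
}

/* Step 5: drop the content lock. */
void
toy_unlock(int buf)
{
    toy_pool[buf].exclusive = false;
}

/* Step 6: drop the pin, after the lock has been released. */
void
toy_release_buffer(int buf)
{
    assert(!toy_pool[buf].exclusive);
    assert(toy_pool[buf].pins > 0);
    toy_pool[buf].pins--;
}

/* One complete write that follows all six steps in order. */
void
toy_increment(int blocknum)
{
    int buf = toy_read_buffer(blocknum);

    toy_lock_exclusive(buf);
    toy_pool[buf].value++;              /* step 3: modify the "page" */
    toy_mark_dirty(buf);
    toy_unlock(buf);
    toy_release_buffer(buf);
}

/* Total outstanding pins; nonzero after an operation indicates a leak. */
int
toy_outstanding_pins(void)
{
    int total = 0;

    for (int i = 0; i < TOY_NBUFFERS; i++)
        total += toy_pool[i].pins;
    return total;
}
```

Calling toy_mark_dirty after toy_unlock, or toy_release_buffer before it, trips an assertion in this toy, which mirrors the kind of checks an --enable-cassert build performs on the real functions.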

13.5 Tracing Code Flow

Example: SELECT Query Path

/* Entry point: tcop/postgres.c */
exec_simple_query(query_string)
    │
    ├── pg_parse_query()           /* parser/parser.c */
    │   └── raw_parser()
    │       └── base_yyparse()     /* gram.y generated */
    │
    ├── pg_analyze_and_rewrite()
    │   ├── parse_analyze()        /* parser/analyze.c */
    │   └── pg_rewrite_query()     /* rewrite/rewriteHandler.c */
    │
    ├── pg_plan_queries()
    │   └── planner()              /* optimizer/plan/planner.c */
    │       ├── subquery_planner()
    │       └── create_plan()
    │
    └── PortalRun()
        └── ExecutorRun()          /* executor/execMain.c */
            └── ExecutePlan()
                └── ExecProcNode() /* executor/execProcnode.c */

Using GDB

Practical tip: Do not attach GDB to the postmaster process. Instead, connect with psql first, then attach GDB to the specific backend process serving your session. Run SELECT pg_backend_pid(); in psql to find your PID, then gdb -p <pid>. This avoids accidentally freezing the postmaster (which would block all new connections) and lets you debug a single session in isolation.
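# Option 1 (recommended): attach to the backend serving your psql session
# In psql, run: SELECT pg_backend_pid();
gdb -p <pid>

# Set breakpoints
(gdb) break heap_insert
(gdb) break costsize.c:200

# Resume the backend
(gdb) continue

# In the same psql session, run SQL
INSERT INTO test VALUES (1);

# Back in GDB, once the breakpoint is hit
(gdb) backtrace             # Show call stack
(gdb) print *tup            # Inspect the HeapTuple argument of heap_insert
(gdb) p *relation->rd_rel   # Inspect relation metadata
(gdb) next                  # Step over
(gdb) step                  # Step into

# Option 2: run a standalone backend under GDB (single-user mode).
# Stop the regular server first; SQL is typed at the backend's own prompt.
gdb --args $HOME/pg-dev/bin/postgres --single -D $HOME/pg-dev/data postgres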

13.6 Adding a Simple Feature

The best way to learn a codebase is to make a small change to it. These two examples — adding a configuration parameter and adding a SQL function — are the PostgreSQL equivalents of “Hello, World.” They touch just enough of the infrastructure to teach you the patterns without requiring deep domain expertise.

Example: Add New GUC Parameter

/* Step 1: Define in src/backend/utils/misc/guc.c (guc_tables.c in PostgreSQL 16 and later) */
int my_new_setting = 100;  /* default value; not static, because Step 3 declares it extern */

/* Step 2: Add to ConfigureNamesInt array */
{
    {"my_new_setting", PGC_USERSET, CUSTOM_OPTIONS,
        gettext_noop("Description of my setting."),
        NULL
    },
    &my_new_setting,
    100,        /* default */
    0,          /* min */
    10000,      /* max */
    NULL, NULL, NULL
},

/* Step 3: Declare extern in src/include/utils/guc.h */
extern int my_new_setting;

/* Step 4: Use in code */
if (my_new_setting > threshold) {
    /* do something */
}
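
The table entry in Step 2 is what lets the server validate a new value against the declared bounds before storing it through the pointer. A minimal standalone sketch of that table-driven idea follows. It is not the GUC machinery: ToyIntSetting, toy_settings, toy_new_setting, and toy_set_int are hypothetical names, and the sketch ignores contexts such as PGC_USERSET, check/assign hooks, and units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced analogue of one integer setting's table entry. */
typedef struct ToyIntSetting {
    const char *name;       /* name used in SET / the config file */
    int        *variable;   /* where the current value is stored */
    int         boot_val;   /* default */
    int         min;        /* lowest accepted value */
    int         max;        /* highest accepted value */
} ToyIntSetting;

int toy_new_setting = 100;

ToyIntSetting toy_settings[] = {
    {"toy_new_setting", &toy_new_setting, 100, 0, 10000},
};

/*
 * Look the setting up by name and assign value if it is within bounds.
 * Returns false for unknown names and out-of-range values, leaving the
 * variable untouched in both cases.
 */
bool
toy_set_int(const char *name, int value)
{
    size_t n = sizeof(toy_settings) / sizeof(toy_settings[0]);

    for (size_t i = 0; i < n; i++)
    {
        ToyIntSetting *s = &toy_settings[i];

        if (strcmp(s->name, name) != 0)
            continue;
        if (value < s->min || value > s->max)
            return false;
        *s->variable = value;
        return true;
    }
    return false;
}
```

Code that reads the setting just uses the plain C variable, as in Step 4; all validation happens once, at assignment time.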

Example: Add New SQL Function

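/* Step 1: Create function in src/backend/utils/adt/myfunc.c */
#include "postgres.h"
#include "fmgr.h"

/*
 * Built-in functions do not use PG_FUNCTION_INFO_V1; that macro is only for
 * dynamically loaded extension modules. The prototype is generated from
 * pg_proc.dat into utils/fmgrprotos.h.
 */
#include "utils/fmgrprotos.h"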

Datum
my_add_one(PG_FUNCTION_ARGS)
{
    int32 input = PG_GETARG_INT32(0);
    PG_RETURN_INT32(input + 1);
}

/* Step 2: Add to src/include/catalog/pg_proc.dat */
{ oid => '9999', proname => 'my_add_one',
  prorettype => 'int4', proargtypes => 'int4',
  prosrc => 'my_add_one' },

/* Step 3: Add to Makefile */
OBJS += myfunc.o

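/* Step 4: Rebuild and run initdb to load the new catalog entry, or register it by hand: */
/* (STRICT matches the pg_proc.dat default, so a NULL argument returns NULL) */
CREATE FUNCTION my_add_one(int) RETURNS int AS 'my_add_one' LANGUAGE internal STRICT;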
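
The PG_FUNCTION_ARGS, PG_GETARG_INT32, and PG_RETURN_INT32 macros hide a simple calling convention: every SQL-callable function receives one pointer to a call-info struct and returns one generic word. The standalone sketch below imitates that convention so you can see what the macros expand to in spirit. ToyDatum, ToyCallInfo, the TOY_* macros, and toy_call_int32 are hypothetical simplifications; the real definitions are in fmgr.h and postgres.h and also carry null flags, collation, and context.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Datum: one pointer-sized generic value. */
typedef uintptr_t ToyDatum;

/* Hypothetical stand-in for the call-info struct behind PG_FUNCTION_ARGS. */
typedef struct ToyCallInfo {
    int      nargs;
    ToyDatum args[4];
} ToyCallInfo;

typedef ToyDatum (*ToyFunction) (ToyCallInfo *fcinfo);

/* Like the real macros, these rely on a parameter named fcinfo. */
#define TOY_FUNCTION_ARGS    ToyCallInfo *fcinfo
#define TOY_GETARG_INT32(n)  ((int32_t) fcinfo->args[n])
#define TOY_RETURN_INT32(x)  return (ToyDatum) (int32_t) (x)

/* Same shape as my_add_one, written against the toy macros. */
ToyDatum
toy_add_one(TOY_FUNCTION_ARGS)
{
    int32_t input = TOY_GETARG_INT32(0);

    TOY_RETURN_INT32(input + 1);
}

/* What a caller does: pack the argument, call through the pointer, unpack. */
int32_t
toy_call_int32(ToyFunction fn, int32_t arg)
{
    ToyCallInfo fcinfo = {1, {(ToyDatum) arg}};

    return (int32_t) fn(&fcinfo);
}
```

Because every function has the same C signature, the server can store function pointers in one lookup table and call any of them without knowing the SQL-level argument types at compile time.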

13.7 Running Tests

Practical tip: Always run make check before submitting a patch. The PostgreSQL community takes regressions very seriously, and a patch that breaks an existing test will not be accepted until it is fixed. Run make check-world for the most thorough validation, but be aware it can take 10-30 minutes depending on your hardware. For iterative development, make check-tests TESTS="your_test" gives fast feedback on just the tests relevant to your change (plain make check ignores TESTS and runs the whole schedule).
# Core regression test suite (runs in ~2-5 minutes)
make check

# Specific test file (fast iteration during development);
# test_setup creates the shared tables most tests rely on (PostgreSQL 15 and later)
make check-tests TESTS="test_setup select"

# Parallel tests across ALL test suites including contrib (thorough but slow)
make check-world -j4

# TAP tests for command-line utilities (Perl required)
make check -C src/bin/psql

# Isolation tests for concurrency behavior (uses isolationtester)
make check -C src/test/isolation

# Code coverage report (useful for ensuring your patch tests all code paths)
./configure --enable-coverage
make check
make coverage-html
# Open coverage/index.html to see which lines your tests exercised

Writing Tests

-- src/test/regress/sql/mytest.sql
-- Test my_add_one function
SELECT my_add_one(5);
SELECT my_add_one(-1);
SELECT my_add_one(NULL);

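-- Expected output in src/test/regress/expected/mytest.out
-- (pg_regress echoes each statement and comment before its result)

-- Test my_add_one function
SELECT my_add_one(5);
 my_add_one
------------
          6
(1 row)

SELECT my_add_one(-1);
 my_add_one
------------
          0
(1 row)

SELECT my_add_one(NULL);
 my_add_one
------------
           
(1 row)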

13.8 Key Source Files to Study

Practical tip: Start with tcop/postgres.c and exec_simple_query(). This is the “main loop” of query processing — every SELECT, INSERT, UPDATE, and DELETE flows through here. Once you understand this function, you can follow any query from entry point to result. Read it with the call trace from Section 13.5 open in a second window. After postgres.c, costsize.c is the next most rewarding file to study because it reveals exactly how the planner assigns costs — knowledge that directly improves your ability to tune queries in production.
| Area | Files | Purpose |
| --- | --- | --- |
| Query Entry | tcop/postgres.c | Main query loop; start here |
| Parsing | parser/gram.y, scan.l | SQL syntax definition |
| Planning | optimizer/plan/planner.c | Plan generation entry point |
| Cost Model | optimizer/path/costsize.c | Cost estimation formulas |
| Execution | executor/execMain.c | Query execution entry point |
| Buffer Pool | storage/buffer/bufmgr.c | Page caching and replacement |
| WAL | access/transam/xlog.c | Write-ahead log (one of the largest files in the tree) |
| Transactions | access/transam/xact.c | Transaction lifecycle management |
| Heap Access | access/heap/heapam.c | Table tuple operations (CRUD) |
| Index Access | access/nbtree/nbtree.c | B-tree index operations |

13.9 Practice Exercise

Goal: Add “Planning Time: X ms” output to regular EXPLAIN (not just ANALYZE)
Steps:
  1. Find where EXPLAIN output is generated (commands/explain.c)
  2. Study ExplainOnePlan() function
  3. Add timing capture before/after planning in ExplainOneQuery()
  4. Output the timing in ExplainPrintPlan()
  5. Write regression tests
  6. Submit patch to pgsql-hackers
Files to modify:
  • src/backend/commands/explain.c
  • src/test/regress/sql/explain.sql
  • src/test/regress/expected/explain.out

Next Module

Module 14: Contributing to PostgreSQL

Submit your first patch to PostgreSQL