
Module 13: PostgreSQL Source Code

This module prepares you to read, understand, and contribute to the PostgreSQL codebase. You’ll learn the code structure, key patterns, and how to set up a development environment.
Estimated Time: 14-16 hours
Difficulty: Expert
Prerequisite: Complete Modules 7-9 (internals)
Outcome: Ready for first contribution

13.1 Repository Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                    POSTGRESQL SOURCE STRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   postgresql/                                                                │
│   ├── src/                        # Main source code                        │
│   │   ├── backend/               # Server-side code (THE CORE)              │
│   │   │   ├── access/            # Table/index access methods               │
│   │   │   ├── catalog/           # System catalog handling                  │
│   │   │   ├── commands/          # SQL command implementations              │
│   │   │   ├── executor/          # Query executor                           │
│   │   │   ├── optimizer/         # Query planner/optimizer                  │
│   │   │   ├── parser/            # SQL parser                               │
│   │   │   ├── postmaster/        # Process management                       │
│   │   │   ├── replication/       # Streaming/logical replication            │
│   │   │   ├── storage/           # Buffer pool, smgr, locks                 │
│   │   │   ├── tcop/              # Traffic cop (query dispatch)             │
│   │   │   └── utils/             # Utilities, memory, caching               │
│   │   ├── include/               # Header files                             │
│   │   ├── interfaces/            # Client libraries (libpq)                 │
│   │   ├── bin/                   # Command-line tools                       │
│   │   │   ├── psql/              # Interactive terminal                     │
│   │   │   ├── pg_dump/           # Backup utility                           │
│   │   │   └── initdb/            # Database initialization                  │
│   │   ├── pl/                    # Procedural languages                     │
│   │   └── test/                  # Regression tests                         │
│   ├── contrib/                   # Optional modules/extensions              │
│   ├── doc/                       # Documentation (SGML)                     │
│   └── config/                    # Build configuration                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Directories Deep Dive

SQL Parsing
parser/
├── scan.l           # Flex lexer (tokenization)
├── gram.y           # Bison grammar (syntax)
├── parser.c         # Parser entry point
├── analyze.c        # Semantic analysis
├── parse_expr.c     # Expression parsing
├── parse_clause.c   # FROM, WHERE, etc.
├── parse_func.c     # Function call resolution
├── parse_relation.c # Table reference resolution
└── parse_type.c     # Type handling
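
To give a feel for the tokenization step that scan.l performs, here is a minimal hand-written sketch in standalone C. It is not PostgreSQL's lexer (which is generated by Flex and handles quoting, comments, operators, and much more); the names `DemoToken`, `demo_next_token`, and `demo_count_tokens` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum DemoTokenType
{
    DEMO_TOK_EOF,
    DEMO_TOK_IDENT,     /* keywords and identifiers */
    DEMO_TOK_NUMBER,    /* unsigned integer literals */
    DEMO_TOK_SYMBOL     /* any other single character */
} DemoTokenType;

typedef struct DemoToken
{
    DemoTokenType type;
    const char *start;  /* first character of the token */
    size_t      len;    /* token length in bytes */
} DemoToken;

/* Scan one token starting at *cursor and advance the cursor past it. */
static DemoToken
demo_next_token(const char **cursor)
{
    const char *p = *cursor;
    DemoToken   tok;

    while (isspace((unsigned char) *p))
        p++;
    tok.start = p;

    if (*p == '\0')
        tok.type = DEMO_TOK_EOF;
    else if (isalpha((unsigned char) *p) || *p == '_')
    {
        while (isalnum((unsigned char) *p) || *p == '_')
            p++;
        tok.type = DEMO_TOK_IDENT;
    }
    else if (isdigit((unsigned char) *p))
    {
        while (isdigit((unsigned char) *p))
            p++;
        tok.type = DEMO_TOK_NUMBER;
    }
    else
    {
        p++;
        tok.type = DEMO_TOK_SYMBOL;
    }

    tok.len = (size_t) (p - tok.start);
    *cursor = p;
    return tok;
}

/* Count the tokens in a statement, not including end-of-input. */
static int
demo_count_tokens(const char *sql)
{
    int         count = 0;

    while (demo_next_token(&sql).type != DEMO_TOK_EOF)
        count++;
    return count;
}
```

In the real parser, the token stream produced by the lexer is consumed by the Bison grammar in gram.y, which builds the raw parse tree.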

13.2 Development Environment Setup

Building from Source

# Clone repository
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql

# Configure build (debug mode for development)
./configure \
    --enable-debug \
    --enable-cassert \
    --enable-tap-tests \
    --prefix=$HOME/pg-dev \
    CFLAGS="-O0 -g3"

# Build (use available cores)
make -j$(nproc)

# Install
make install

# Initialize database cluster
$HOME/pg-dev/bin/initdb -D $HOME/pg-dev/data

# Start server
$HOME/pg-dev/bin/pg_ctl -D $HOME/pg-dev/data -l logfile start

# Verify
$HOME/pg-dev/bin/psql -d postgres -c "SELECT version();"

Build Options Explained

Option               Purpose
--enable-debug       Include debugging symbols
--enable-cassert     Enable assertion checks
--enable-tap-tests   Enable Perl TAP tests
CFLAGS="-O0"         Disable optimization (better debugging)
CFLAGS="-g3"         Maximum debug info

IDE Setup

// .vscode/c_cpp_properties.json
{
    "configurations": [{
        "name": "PostgreSQL",
        "includePath": [
            "${workspaceFolder}/src/include/**",
            "${workspaceFolder}/src/backend/**"
        ],
        // Leave FRONTEND undefined when browsing the backend: it marks
        // client-side code and hides backend-only declarations.
        "defines": [],
        "compilerPath": "/usr/bin/gcc",
        "cStandard": "c11"
    }]
}

13.3 Code Conventions

Naming Conventions

/* Types: CamelCase */
typedef struct BufferDesc { ... } BufferDesc;
typedef struct RelationData *Relation;

/* Functions: lowercase_with_underscores or CamelCase, depending on the subsystem */
extern void heap_insert(Relation relation, HeapTuple tup, ...);
extern void BufferGetTag(Buffer buffer, RelFileNode *rnode, ...);

/* Macros: usually UPPER_CASE, though some long-standing ones are CamelCase */
#define BUFFER_LOCK_SHARE 1
#define InvalidOid ((Oid) 0)

/* Global variables: CamelCase (e.g. MyProcPid) or, for GUC-backed settings, lowercase_with_underscores */
bool enable_seqscan = true;
int work_mem = 4096;

/* Local variables: short, lowercase */
int i, j, ntuples;
char *ptr;

Common Patterns

/* PostgreSQL uses memory contexts for allocation */
MemoryContext oldcontext;

/* Switch to appropriate context */
oldcontext = MemoryContextSwitchTo(CacheMemoryContext);

/* Allocate memory (auto-freed when context is reset/deleted) */
ptr = palloc(size);
str = pstrdup("string");

/* Restore previous context */
MemoryContextSwitchTo(oldcontext);

/* Reset context (free all memory in it) */
MemoryContextReset(mycontext);
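
To illustrate why context-based allocation removes the need for individual frees, here is a toy arena in standalone C. It is only a sketch of the idea: PostgreSQL's real allocator is far more elaborate, and every name prefixed with `demo_` or `Demo` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Chunk header; the union keeps the memory after it suitably aligned. */
typedef union DemoChunk
{
    union DemoChunk *next;
    max_align_t align;
} DemoChunk;

/* Toy stand-in for a memory context: tracks every chunk it hands out. */
typedef struct DemoContext
{
    DemoChunk  *chunks;     /* linked list of live allocations */
    size_t      nchunks;    /* number of live allocations */
} DemoContext;

/* Stand-in for CurrentMemoryContext. */
static DemoContext *demo_current = NULL;

/* Make ctx current and return the previous one, like MemoryContextSwitchTo(). */
static DemoContext *
demo_switch_to(DemoContext *ctx)
{
    DemoContext *old = demo_current;

    demo_current = ctx;
    return old;
}

/* Allocate from the current context, like palloc(). */
static void *
demo_alloc(size_t size)
{
    DemoChunk  *chunk;

    assert(demo_current != NULL);
    chunk = malloc(sizeof(DemoChunk) + size);
    if (chunk == NULL)
        abort();
    chunk->next = demo_current->chunks;
    demo_current->chunks = chunk;
    demo_current->nchunks++;
    return chunk + 1;       /* usable memory starts after the header */
}

/* Free everything allocated in ctx, like MemoryContextReset(). */
static void
demo_reset(DemoContext *ctx)
{
    while (ctx->chunks != NULL)
    {
        DemoChunk  *next = ctx->chunks->next;

        free(ctx->chunks);
        ctx->chunks = next;
    }
    ctx->nchunks = 0;
}
```

The point of the design is that callers never free individual chunks: resetting the context that owns them releases everything at once, which makes cleanup after an error straightforward.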

/* ereport for errors and logging */
ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("invalid value for parameter \"%s\": %d",
                name, value),
         errdetail("Value must be between %d and %d.",
                  min, max),
         errhint("Try using a smaller value.")));

/* Log levels: DEBUG5-DEBUG1, LOG, INFO, NOTICE, WARNING, ERROR */
elog(DEBUG1, "entering function %s", __func__);
elog(LOG, "checkpoint starting");
elog(WARNING, "skipping invalid entry");

/* ERROR aborts transaction, FATAL terminates connection */
/* PANIC crashes the server (only for catastrophic issues) */
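
A detail worth internalizing is that reporting at ERROR level does not return to the caller: control jumps to the nearest error handler, which PostgreSQL implements with sigsetjmp/siglongjmp. The standalone sketch below imitates that control flow with plain setjmp/longjmp. All `demo_` names are hypothetical, and the real machinery (PG_TRY, error data, memory cleanup) is much richer.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical severity levels; the real ones live in utils/elog.h. */
typedef enum DemoLevel
{
    DEMO_LOG,
    DEMO_WARNING,
    DEMO_ERROR
} DemoLevel;

static jmp_buf demo_handler;        /* where DEMO_ERROR unwinds to */
static int  demo_messages = 0;      /* number of messages reported */

/* Report a message; DEMO_ERROR never returns to its caller. */
static void
demo_report(DemoLevel level, const char *msg)
{
    demo_messages++;
    fprintf(stderr, "%s: %s\n",
            level == DEMO_ERROR ? "ERROR" :
            level == DEMO_WARNING ? "WARNING" : "LOG", msg);
    if (level == DEMO_ERROR)
        longjmp(demo_handler, 1);
}

/* Validate a value; out-of-range input raises DEMO_ERROR. */
static int
demo_check_range(int value, int min, int max)
{
    if (value < min || value > max)
        demo_report(DEMO_ERROR, "value out of range");
    return value;           /* reached only when no error was raised */
}

/* Run the check under a handler: 1 on success, 0 if it errored out. */
static int
demo_try_check(int value, int min, int max)
{
    if (setjmp(demo_handler) != 0)
        return 0;           /* arrived here via longjmp */
    demo_check_range(value, min, max);
    return 1;
}
```

This is why code following an `ereport(ERROR, ...)` call needs no explicit `return`, and why resources should be tied to memory contexts or resource owners rather than freed manually on every error path.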

/* All parse/plan nodes inherit from Node */
typedef struct Node {
    NodeTag type;
} Node;

/* Check node type */
if (IsA(node, SeqScan))
    process_seqscan((SeqScan *) node);

/* Checked cast: castNode() verifies the node tag with an assertion (assert-enabled builds) */
SeqScan *scan = castNode(SeqScan, node);

/* Copy nodes (deep copy) */
Node *copy = copyObject(original);

/* Node comparison */
if (equal(node1, node2))
    ...
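
The "inheritance" here is just a convention: every node struct begins with a tag field, so code can inspect the tag before casting. A minimal standalone sketch of that convention follows; the `Demo` types and macros are hypothetical stand-ins for Node, nodeTag(), and IsA(), not copies of them.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoTag
{
    T_DemoSeqScan = 1,
    T_DemoLimit
} DemoTag;

/* Every node type starts with a tag, so the tag can be read from any node. */
typedef struct DemoSeqScan
{
    DemoTag     type;       /* must be the first field */
    int         relid;
} DemoSeqScan;

typedef struct DemoLimit
{
    DemoTag     type;       /* must be the first field */
    long        count;
} DemoLimit;

/* Read the tag through a pointer to the first member. */
#define demoNodeTag(ptr)        (*(const DemoTag *) (ptr))
#define DemoIsA(ptr, _type_)    (demoNodeTag(ptr) == T_##_type_)

/* Return the scanned relation id, or -1 if the node is not a scan. */
static int
demo_scan_relid(const void *node)
{
    if (DemoIsA(node, DemoSeqScan))
        return ((const DemoSeqScan *) node)->relid;
    return -1;
}
```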

/* PostgreSQL uses its own List type (pg_list.h) rather than raw C arrays */
List *mylist = NIL;  /* Empty list */

/* Add to list */
mylist = lappend(mylist, item);        /* Append */
mylist = lcons(item, mylist);          /* Prepend */
mylist = list_concat(list1, list2);    /* Concatenate */

/* Iterate */
ListCell *lc;
foreach(lc, mylist) {
    Node *item = lfirst(lc);
    /* or lfirst_int(lc) for int lists */
}

/* Get by index */
Node *third = list_nth(mylist, 2);

/* List length */
int len = list_length(mylist);
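
Since PostgreSQL 13 the List type is backed by a growable array rather than linked cells, which is why list_nth() is cheap. The sketch below shows that general shape in standalone C. It is an illustration only: `DemoList` and the `demo_` functions are hypothetical and omit most of what pg_list.h provides.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoList
{
    int         length;     /* elements in use */
    int         capacity;   /* allocated slots */
    void      **elements;   /* growable array of item pointers */
} DemoList;

#define DEMO_NIL ((DemoList *) NULL)    /* the empty list, like NIL */

/* Append item, creating or growing the list as needed. */
static DemoList *
demo_lappend(DemoList *list, void *item)
{
    if (list == DEMO_NIL)
    {
        list = calloc(1, sizeof(DemoList));
        if (list == NULL)
            abort();
    }
    if (list->length == list->capacity)
    {
        int         newcap = list->capacity ? list->capacity * 2 : 4;
        void      **grown = realloc(list->elements, newcap * sizeof(void *));

        if (grown == NULL)
            abort();
        list->elements = grown;
        list->capacity = newcap;
    }
    list->elements[list->length++] = item;
    return list;
}

/* Length of the list; the empty list has length zero. */
static int
demo_list_length(const DemoList *list)
{
    return list ? list->length : 0;
}

/* Zero-based element access. */
static void *
demo_list_nth(const DemoList *list, int n)
{
    assert(list != DEMO_NIL && n >= 0 && n < list->length);
    return list->elements[n];
}
```

As with the real API, the append function returns the list pointer, so callers must always assign the result back.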

13.4 Key Data Structures

Relation (Table)

/* RelationData represents an open relation (table/index); simplified excerpt */
typedef struct RelationData {
    RelFileNode rd_node;        /* Physical file identifier (RelFileLocator rd_locator in PostgreSQL 16+) */
    Form_pg_class rd_rel;       /* pg_class tuple */
    TupleDesc rd_att;           /* Tuple descriptor */
    Oid rd_id;                  /* Relation OID */
    
    /* Index info (if index) */
    Form_pg_index rd_index;
    
    /* Cached info */
    bytea *rd_options;          /* Parsed reloptions */
    
    /* ... many more fields */
} RelationData;

typedef RelationData *Relation;

/* Open/close relations */
Relation rel = table_open(relid, AccessShareLock);
/* ... use relation ... */
table_close(rel, AccessShareLock);

HeapTuple

/* HeapTuple = pointer to tuple data + header */
typedef struct HeapTupleData {
    uint32 t_len;               /* Length of tuple */
    ItemPointerData t_self;     /* TID (block, offset) */
    Oid t_tableOid;             /* Table OID */
    HeapTupleHeader t_data;     /* -> tuple header+data */
} HeapTupleData;

typedef HeapTupleData *HeapTuple;

/* Access tuple attributes */
bool isnull;
Datum value = heap_getattr(tuple, attnum, tupdesc, &isnull);

/* Build new tuple */
HeapTuple newtup = heap_form_tuple(tupdesc, values, nulls);

Buffer

/* Buffer = index into shared buffer pool */
typedef int Buffer;

/* Buffer operations */
Buffer buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_SHARE);

Page page = BufferGetPage(buf);
/* ... access page data ... */

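LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);

/*
 * For writes, lock with BUFFER_LOCK_EXCLUSIVE instead and call
 * MarkBufferDirty() while the pin and lock are still held:
 */
MarkBufferDirty(buf);   /* before LockBuffer(buf, BUFFER_LOCK_UNLOCK) */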
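
The ordering rule for writes (a buffer may only be marked dirty while it is pinned and exclusively locked) can be made concrete with a toy model. The standalone sketch below encodes the rule as assertions. `DemoBuffer` and the `demo_` functions are hypothetical and do not reflect how bufmgr.c is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy buffer: one page plus the bookkeeping the ordering rules depend on. */
typedef struct DemoBuffer
{
    int         pins;               /* how many callers hold the buffer */
    bool        locked_exclusive;   /* content lock held for writing */
    bool        dirty;              /* page differs from its on-disk copy */
    char        page[64];
} DemoBuffer;

static void
demo_pin(DemoBuffer *buf)
{
    buf->pins++;
}

static void
demo_unpin(DemoBuffer *buf)
{
    assert(buf->pins > 0);
    buf->pins--;
}

static void
demo_lock_exclusive(DemoBuffer *buf)
{
    assert(buf->pins > 0 && !buf->locked_exclusive);
    buf->locked_exclusive = true;
}

static void
demo_unlock(DemoBuffer *buf)
{
    assert(buf->locked_exclusive);
    buf->locked_exclusive = false;
}

/* Only legal while the buffer is pinned and exclusively locked. */
static void
demo_mark_dirty(DemoBuffer *buf)
{
    assert(buf->pins > 0 && buf->locked_exclusive);
    buf->dirty = true;
}

/* Full write sequence: pin, lock, modify, mark dirty, unlock, unpin. */
static void
demo_write_page(DemoBuffer *buf, const char *data)
{
    demo_pin(buf);
    demo_lock_exclusive(buf);
    strncpy(buf->page, data, sizeof(buf->page) - 1);
    buf->page[sizeof(buf->page) - 1] = '\0';
    demo_mark_dirty(buf);
    demo_unlock(buf);
    demo_unpin(buf);
}
```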

13.5 Tracing Code Flow

Example: SELECT Query Path

/* 1. Entry point: postgres.c */
exec_simple_query(query_string)

    ├── pg_parse_query()           /* parser/parser.c */
    │   └── raw_parser()
    │       └── base_yyparse()     /* gram.y generated */

    ├── pg_analyze_and_rewrite()
    │   ├── parse_analyze()        /* parser/analyze.c */
    │   └── pg_rewrite_query()     /* rewrite/rewriteHandler.c */

    ├── pg_plan_queries()
    │   └── planner()              /* optimizer/plan/planner.c */
    │       ├── subquery_planner()
    │       └── create_plan()

    └── PortalRun()
        └── ExecutorRun()          /* executor/execMain.c */
            └── ExecutePlan()
                └── ExecProcNode() /* executor/execProcnode.c */
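
The bottom of this call tree follows a pull-based iterator model: each plan node exposes a "return the next tuple" routine, and parents pull from children one row at a time. Here is a minimal standalone sketch of that model with a scan node feeding a limit node. All `Demo` names are hypothetical, and real plan state nodes carry far more information.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoPlanState DemoPlanState;
typedef const int *(*DemoExecProc) (DemoPlanState *node);

struct DemoPlanState
{
    DemoExecProc exec;      /* per-node-type "get next row" routine */
    DemoPlanState *child;   /* input node, or NULL for a leaf */
    const int  *rows;       /* scan only: source rows */
    size_t      nrows;      /* scan only: number of source rows */
    size_t      pos;        /* scan: next row; limit: rows returned so far */
    size_t      limit;      /* limit only: maximum rows to return */
};

/* Dispatch to the node's own routine, in the spirit of ExecProcNode(). */
static const int *
demo_exec_node(DemoPlanState *node)
{
    return node->exec(node);
}

/* Leaf node: return source rows in order, then NULL at end of data. */
static const int *
demo_scan_next(DemoPlanState *node)
{
    if (node->pos >= node->nrows)
        return NULL;
    return &node->rows[node->pos++];
}

/* Limit node: pass through child rows until the limit is reached. */
static const int *
demo_limit_next(DemoPlanState *node)
{
    const int  *row;

    if (node->pos >= node->limit)
        return NULL;        /* stop without pulling from the child */
    row = demo_exec_node(node->child);
    if (row != NULL)
        node->pos++;
    return row;
}

/* Top-level loop: pull rows until the plan is exhausted, summing them. */
static int
demo_run(DemoPlanState *top)
{
    int         sum = 0;
    const int  *row;

    while ((row = demo_exec_node(top)) != NULL)
        sum += *row;
    return sum;
}
```

Because rows are pulled on demand, the limit node stops the scan early: with a limit of 2 over four rows, the scan is asked for only two.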

Using GDB

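# Each client connection is served by its own backend process, so attach
# GDB to the backend of an open session rather than to the postmaster.

# Terminal 1: open a session and note its backend PID
$HOME/pg-dev/bin/psql -d postgres
postgres=# SELECT pg_backend_pid();

# Terminal 2: attach to that PID
gdb -p BACKEND_PID        # PID printed by pg_backend_pid()

# Set breakpoints, then let the backend continue
(gdb) break heap_insert
(gdb) break costsize.c:200
(gdb) continue

# Terminal 1: run SQL in the same session
postgres=# INSERT INTO test VALUES (1);

# Back in GDB (stopped in heap_insert)
(gdb) backtrace           # Show call stack
(gdb) print *tup          # Inspect the HeapTuple argument
(gdb) p *relation->rd_rel # Inspect relation metadata
(gdb) next                # Step over
(gdb) step                # Step into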

13.6 Adding a Simple Feature

Example: Add New GUC Parameter

/* Step 1: Define in src/backend/utils/misc/guc.c (guc_tables.c in PostgreSQL 16+) */
int my_new_setting = 100;  /* default value; not static, because Step 3 declares it extern */

/* Step 2: Add to ConfigureNamesInt array */
{
    {"my_new_setting", PGC_USERSET, CUSTOM_OPTIONS,
        gettext_noop("Description of my setting."),
        NULL
    },
    &my_new_setting,
    100,        /* default */
    0,          /* min */
    10000,      /* max */
    NULL, NULL, NULL
},

/* Step 3: Declare extern in src/include/utils/guc.h */
extern int my_new_setting;

/* Step 4: Use in code */
if (my_new_setting > threshold) {
    /* do something */
}
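
Conceptually, the table entry binds a setting name to a C variable together with its allowed range. The standalone sketch below shows that idea with a lookup-and-validate function. It is a simplification with hypothetical `demo_` names; the real GUC machinery also handles contexts, units, hooks, and transactional rollback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Variables backing the settings (hypothetical names). */
static int  demo_work_mem = 4096;
static int  demo_new_setting = 100;

/* One table entry: setting name, backing variable, and allowed range. */
typedef struct DemoConfigInt
{
    const char *name;
    int        *variable;
    int         min;
    int         max;
} DemoConfigInt;

static const DemoConfigInt demo_config_ints[] = {
    {"demo_work_mem", &demo_work_mem, 64, 1 << 20},
    {"demo_new_setting", &demo_new_setting, 0, 10000},
};

/* Assign value to the named setting; false if unknown or out of range. */
static bool
demo_set_config_int(const char *name, int value)
{
    size_t      n = sizeof(demo_config_ints) / sizeof(demo_config_ints[0]);

    for (size_t i = 0; i < n; i++)
    {
        const DemoConfigInt *conf = &demo_config_ints[i];

        if (strcmp(conf->name, name) != 0)
            continue;
        if (value < conf->min || value > conf->max)
            return false;   /* rejected: variable keeps its old value */
        *conf->variable = value;
        return true;
    }
    return false;           /* no such setting */
}
```

Code elsewhere simply reads the backing variable, which is why adding a setting mostly consists of adding a table entry.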

Example: Add New SQL Function

/* Step 1: Create function in src/backend/utils/adt/myfunc.c */
#include "postgres.h"
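#include "fmgr.h"
#include "utils/fmgrprotos.h"   /* prototypes generated from pg_proc.dat */

/* Built-in functions do not need PG_FUNCTION_INFO_V1; that macro is for loadable modules */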

Datum
my_add_one(PG_FUNCTION_ARGS)
{
    int32 input = PG_GETARG_INT32(0);
    PG_RETURN_INT32(input + 1);
}

/* Step 2: Add to src/include/catalog/pg_proc.dat */
{ oid => '9999', proname => 'my_add_one',
  prorettype => 'int4', proargtypes => 'int4',
  prosrc => 'my_add_one' },

/* Step 3: Add to Makefile */
OBJS += myfunc.o

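/* Step 4: Rebuild and re-run initdb so the new pg_proc row is loaded.
   On a cluster already running the rebuilt server, an equivalent entry
   can be created in SQL; STRICT makes NULL input return NULL: */
CREATE FUNCTION my_add_one(int) RETURNS int AS 'my_add_one' LANGUAGE internal STRICT;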
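
Two ideas sit behind this calling convention: arguments and results travel as generic Datum values inside a call-info struct, and a strict function is never invoked when an argument is NULL. The standalone sketch below models both. `DemoDatum`, `DemoCallInfo`, and the `demo_` functions are hypothetical simplifications of Datum, FunctionCallInfo, and the caller-side strictness check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t DemoDatum;    /* stand-in for Datum */

/* Stand-in for the call-info struct, restricted to one argument. */
typedef struct DemoCallInfo
{
    DemoDatum   arg;        /* argument value */
    bool        argnull;    /* true if the argument is SQL NULL */
    bool        isnull;     /* set to true when the result is SQL NULL */
} DemoCallInfo;

typedef DemoDatum (*DemoFunc) (DemoCallInfo *fcinfo);

/* The function body: fetch the int32 argument, return it plus one. */
static DemoDatum
demo_add_one(DemoCallInfo *fcinfo)
{
    int32_t     input = (int32_t) fcinfo->arg;

    return (DemoDatum) (input + 1);     /* no overflow check in this sketch */
}

/* Call fn the way a strict function is called: NULL in, NULL out. */
static DemoDatum
demo_call_strict(DemoFunc fn, DemoCallInfo *fcinfo)
{
    fcinfo->isnull = false;
    if (fcinfo->argnull)
    {
        fcinfo->isnull = true;  /* fn is skipped entirely */
        return (DemoDatum) 0;
    }
    return fn(fcinfo);
}
```

Because the NULL check happens in the caller, a strict function body can read its arguments without testing for NULL itself.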

13.7 Running Tests

# Full regression test suite
make check

# Specific test file (from src/test/regress, where the check-tests target honors TESTS)
make check-tests TESTS="select"

# Every test suite in the tree (core, contrib, TAP, isolation), using 4 make jobs
make check-world -j4

# TAP tests (for utilities)
make check -C src/bin/psql

# Isolation tests (concurrency)
make check -C src/test/isolation

# Code coverage
./configure --enable-coverage
make check
make coverage-html
# View coverage/index.html

Writing Tests

-- src/test/regress/sql/mytest.sql
-- Test my_add_one function
SELECT my_add_one(5);
SELECT my_add_one(-1);
SELECT my_add_one(NULL);

-- Expected output in src/test/regress/expected/mytest.out
-- (the real file also echoes each statement before its result)
 my_add_one
------------
          6
(1 row)

 my_add_one
------------
          0
(1 row)

 my_add_one
------------
           
(1 row)

13.8 Key Source Files to Study

Area            Files                        Purpose
Query Entry     tcop/postgres.c              Main query loop
Parsing         parser/gram.y, scan.l        SQL syntax
Planning        optimizer/plan/planner.c     Plan generation
Cost Model      optimizer/path/costsize.c    Cost estimation
Execution       executor/execMain.c          Query execution
Buffer Pool     storage/buffer/bufmgr.c      Page caching
WAL             access/transam/xlog.c        Write-ahead log
Transactions    access/transam/xact.c        Transaction mgmt
Heap Access     access/heap/heapam.c         Table operations
Index Access    access/nbtree/nbtree.c       B-tree index

13.9 Practice Exercise

Goal: Add “Planning Time: X ms” output to regular EXPLAIN (not just ANALYZE)
Steps:
  1. Find where EXPLAIN output is generated (commands/explain.c)
  2. Study ExplainOnePlan() function
  3. Add timing capture before/after planning in ExplainOneQuery()
  4. Output the timing in ExplainPrintPlan()
  5. Write regression tests
  6. Submit patch to pgsql-hackers
Files to modify:
  • src/backend/commands/explain.c
  • src/test/regress/sql/explain.sql
  • src/test/regress/expected/explain.out
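
As a warm-up for the timing step, here is a standalone sketch of capturing elapsed time around a call and converting it to milliseconds. It assumes a POSIX system with `clock_gettime`. `DemoInstrTime` and the `demo_` functions are hypothetical stand-ins; in the PostgreSQL tree the instr_time facilities in src/include/portability/instr_time.h play this role.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

typedef struct timespec DemoInstrTime;

/* Record the current monotonic time. */
static void
demo_time_set_current(DemoInstrTime *t)
{
    int         rc = clock_gettime(CLOCK_MONOTONIC, t);

    assert(rc == 0);
    (void) rc;              /* silence unused warning when NDEBUG is set */
}

/* Elapsed time from start to end, in milliseconds. */
static double
demo_time_diff_ms(const DemoInstrTime *start, const DemoInstrTime *end)
{
    return (double) (end->tv_sec - start->tv_sec) * 1000.0 +
        (double) (end->tv_nsec - start->tv_nsec) / 1000000.0;
}

/* Placeholder for the work being measured (the planner call, in the exercise). */
static void
demo_fake_planner(void)
{
    volatile unsigned long work = 0;

    for (unsigned long i = 0; i < 100000; i++)
        work += i;
}

/* Time one invocation of plan_fn and return the duration in milliseconds. */
static double
demo_time_planning(void (*plan_fn) (void))
{
    DemoInstrTime start;
    DemoInstrTime end;

    demo_time_set_current(&start);
    plan_fn();
    demo_time_set_current(&end);
    return demo_time_diff_ms(&start, &end);
}
```

A monotonic clock is used so that the measured interval cannot go negative if the system clock is adjusted during planning.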

Next Module

Module 14: Contributing to PostgreSQL

Submit your first patch to PostgreSQL