Module 13: PostgreSQL Source Code

This module prepares you to read, understand, and contribute to the PostgreSQL codebase. You’ll learn the code structure, key patterns, and how to set up a development environment.

Estimated Time: 14-16 hours
Difficulty: Expert
Prerequisite: Complete Modules 7-9 (internals)
Outcome: Ready for first contribution

13.1 Repository Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                    POSTGRESQL SOURCE STRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   postgresql/                                                                │
│   ├── src/                        # Main source code                        │
│   │   ├── backend/               # Server-side code (THE CORE)              │
│   │   │   ├── access/            # Table/index access methods               │
│   │   │   ├── catalog/           # System catalog handling                  │
│   │   │   ├── commands/          # SQL command implementations              │
│   │   │   ├── executor/          # Query executor                           │
│   │   │   ├── optimizer/         # Query planner/optimizer                  │
│   │   │   ├── parser/            # SQL parser                               │
│   │   │   ├── postmaster/        # Process management                       │
│   │   │   ├── replication/       # Streaming/logical replication            │
│   │   │   ├── storage/           # Buffer pool, locks, IPC                  │
│   │   │   ├── tcop/              # Traffic cop (query dispatch)             │
│   │   │   └── utils/             # Utilities, memory, caching               │
│   │   ├── include/               # Header files                             │
│   │   ├── interfaces/            # Client libraries (libpq)                 │
│   │   ├── bin/                   # Command-line tools                       │
│   │   │   ├── psql/              # Interactive terminal                     │
│   │   │   ├── pg_dump/           # Backup utility                           │
│   │   │   └── initdb/            # Database initialization                  │
│   │   ├── pl/                    # Procedural languages                     │
│   │   └── test/                  # Regression tests                         │
│   ├── contrib/                   # Optional modules/extensions              │
│   ├── doc/                       # Documentation (SGML)                     │
│   └── config/                    # Build configuration                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Key Directories Deep Dive

src/backend/parser
src/backend/optimizer
src/backend/executor
src/backend/storage

SQL Parsing

parser/
├── scan.l           # Flex lexer (tokenization)
├── gram.y           # Bison grammar (syntax)
├── parser.c         # Parser entry point
├── analyze.c        # Semantic analysis
├── parse_expr.c     # Expression parsing
├── parse_clause.c   # FROM, WHERE, etc.
├── parse_func.c     # Function call resolution
├── parse_relation.c # Table reference resolution
└── parse_type.c     # Type handling

Query Planning

optimizer/
├── plan/
│   ├── planner.c       # Main planner entry
│   ├── createplan.c    # Convert paths to plans
│   └── subselect.c     # Subquery handling
├── path/
│   ├── allpaths.c      # Path generation
│   ├── indxpath.c      # Index paths
│   ├── joinpath.c      # Join paths
│   └── costsize.c      # Cost estimation
└── util/
    ├── clauses.c       # Expression utilities
    └── pathnode.c      # Path node creation

Query Execution

executor/
├── execMain.c          # Executor entry point
├── execProcnode.c      # Node dispatch
├── nodeSeqscan.c       # Sequential scan
├── nodeIndexscan.c     # Index scan
├── nodeBitmapHeapscan.c # Bitmap heap scan
├── nodeHashjoin.c      # Hash join
├── nodeMergejoin.c     # Merge join
├── nodeNestloop.c      # Nested loop join
├── nodeAgg.c           # Aggregation
└── nodeSort.c          # Sorting

Storage Layer

storage/
├── buffer/
│   ├── bufmgr.c       # Buffer manager
│   └── freelist.c     # Buffer replacement
├── file/
│   └── fd.c           # File descriptor mgmt
├── ipc/
│   └── shmem.c        # Shared memory
├── lmgr/
│   └── lock.c         # Lock manager
└── smgr/
    └── md.c           # Magnetic disk storage

13.2 Development Environment Setup

Building from Source

# Clone repository
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql

# Configure build (debug mode for development)
./configure \
    --enable-debug \
    --enable-cassert \
    --enable-tap-tests \
    --prefix=$HOME/pg-dev \
    CFLAGS="-O0 -g3"

# Build (use available cores)
make -j$(nproc)

# Install
make install

# Initialize database cluster
$HOME/pg-dev/bin/initdb -D $HOME/pg-dev/data

# Start server
$HOME/pg-dev/bin/pg_ctl -D $HOME/pg-dev/data -l logfile start

# Verify
$HOME/pg-dev/bin/psql -d postgres -c "SELECT version();"

Build Options Explained

Option	Purpose
`--enable-debug`	Include debugging symbols
`--enable-cassert`	Enable assertion checks
`--enable-tap-tests`	Enable Perl TAP tests
`CFLAGS="-O0"`	Disable optimization (better debugging)
`CFLAGS="-g3"`	Maximum debug info

IDE Setup

// .vscode/c_cpp_properties.json
{
    "configurations": [{
        "name": "PostgreSQL",
        "includePath": [
            "${workspaceFolder}/src/include/**",
            "${workspaceFolder}/src/backend/**"
        ],
        "defines": [
            "HAVE_CONFIG_H"
        ],
        "compilerPath": "/usr/bin/gcc",
        "cStandard": "c11"
    }]
}

13.3 Code Conventions

Naming Conventions

/* Types: CamelCase */
typedef struct BufferDesc { ... } BufferDesc;
typedef struct RelationData *Relation;

/* Functions: lowercase_with_underscores or CamelCase, depending on the subsystem */
extern void heap_insert(Relation relation, HeapTuple tup, ...);
extern void BufferGetTag(Buffer buffer, RelFileNode *rnode, ...);

/* Macros: usually UPPER_CASE; some constants use CamelCase */
#define BUFFER_LOCK_SHARE 1
#define InvalidOid ((Oid) 0)

/* Global variables: lowercase_with_underscores for settings, CamelCase elsewhere (e.g. CurrentMemoryContext) */
bool enable_seqscan = true;
int work_mem = 4096;

/* Local variables: short, lowercase */
int i, j, ntuples;
char *ptr;

Common Patterns

Memory Contexts

/* PostgreSQL uses memory contexts for allocation */
MemoryContext oldcontext;

/* Switch to appropriate context */
oldcontext = MemoryContextSwitchTo(CacheMemoryContext);

/* Allocate memory (auto-freed when context is reset/deleted) */
ptr = palloc(size);
str = pstrdup("string");

/* Restore previous context */
MemoryContextSwitchTo(oldcontext);

/* Reset context (free all memory in it) */
MemoryContextReset(mycontext);
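
To make the switch/allocate/reset cycle concrete, here is a minimal sketch in plain C. It is a toy model, not PostgreSQL code: `ToyContext`, `toy_switch_to`, `toy_alloc`, and `toy_reset` are hypothetical names that only mimic the roles of `MemoryContextSwitchTo`, `palloc`, and `MemoryContextReset`, and the real allocator is considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation (toy model, not PostgreSQL's). */
typedef union ToyChunk {
    union ToyChunk *next;   /* next live allocation in the same context */
    max_align_t     align;  /* keeps the payload suitably aligned */
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;       /* linked list of live allocations */
    size_t    nchunks;      /* number of live allocations */
} ToyContext;

/* Plays the role of CurrentMemoryContext. */
static ToyContext *toy_current = NULL;

/* Like MemoryContextSwitchTo(): install a context, return the previous one. */
static ToyContext *toy_switch_to(ToyContext *cxt)
{
    ToyContext *old = toy_current;

    toy_current = cxt;
    return old;
}

/* Like palloc(): allocate from whichever context is current. */
static void *toy_alloc(size_t size)
{
    ToyChunk *chunk;

    assert(toy_current != NULL);
    chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        abort();
    chunk->next = toy_current->chunks;
    toy_current->chunks = chunk;
    toy_current->nchunks++;
    return chunk + 1;       /* payload starts right after the header */
}

/* Like MemoryContextReset(): free everything allocated in the context. */
static void toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->nchunks = 0;
}
```

The point of the design is that callers never free individual chunks: whoever owns the context releases everything at once, which is why code that allocates in a long-lived context has to be careful about what it leaves behind.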

Error Handling

/* ereport for errors and logging */
ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("invalid value for parameter \"%s\": %d",
                name, value),
         errdetail("Value must be between %d and %d.",
                  min, max),
         errhint("Try using a smaller value.")));

/* Log levels: DEBUG5-DEBUG1, LOG, INFO, NOTICE, WARNING, ERROR, FATAL, PANIC */
elog(DEBUG1, "entering function %s", __func__);
elog(LOG, "checkpoint starting");
elog(WARNING, "skipping invalid entry");

/* ERROR aborts transaction, FATAL terminates connection */
/* PANIC crashes the server (only for catastrophic issues) */
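
Reporting at ERROR level does not return to the caller: control jumps back to an enclosing handler, which is how the current statement gets abandoned. The sketch below models that control flow in plain C with `setjmp`/`longjmp`. It is an illustration under simplifying assumptions, not the backend's implementation: `toy_report`, `toy_run_statement`, and the `ToyLevel` enum are hypothetical names, and the real machinery adds error stacks, memory-context cleanup, and transaction abort.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Toy severity levels, ordered like the real ones (lower = less severe). */
enum ToyLevel { TOY_LOG, TOY_WARNING, TOY_ERROR };

static jmp_buf toy_handler;          /* where an ERROR jumps to */
static int     toy_handler_set = 0;  /* is a handler currently installed? */
static int     toy_messages = 0;     /* number of messages reported so far */

/* Report a message; at TOY_ERROR this function does not return. */
static void toy_report(enum ToyLevel level, const char *msg)
{
    const char *label = (level == TOY_ERROR) ? "ERROR" :
                        (level == TOY_WARNING) ? "WARNING" : "LOG";

    toy_messages++;
    fprintf(stderr, "%s: %s\n", label, msg);
    if (level >= TOY_ERROR)
    {
        assert(toy_handler_set);
        longjmp(toy_handler, 1);     /* unwind to the enclosing handler */
    }
}

/* Run one "statement"; returns 0 on success, -1 if it raised an error. */
static int toy_run_statement(int value)
{
    toy_handler_set = 1;
    if (setjmp(toy_handler) != 0)
    {
        /* Arrived here via longjmp: the statement was aborted. */
        toy_handler_set = 0;
        return -1;
    }

    toy_report(TOY_LOG, "statement starting");
    if (value < 0)
        toy_report(TOY_ERROR, "value must not be negative");

    toy_handler_set = 0;
    return 0;
}
```

Calling `toy_run_statement(-1)` returns -1 after two messages, while a later call with a valid value succeeds, mirroring how a session survives a failed statement.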

Node System

/* All parse/plan nodes inherit from Node */
typedef struct Node {
    NodeTag type;
} Node;

/* Check node type */
if (IsA(node, SeqScan))
    process_seqscan((SeqScan *) node);

/* Cast with a node-tag check (asserted only in --enable-cassert builds) */
SeqScan *scan = castNode(SeqScan, node);

/* Copy nodes (deep copy) */
Node *copy = copyObject(original);

/* Node comparison */
if (equal(node1, node2))
    ...
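
The tag-in-the-first-field idea can be tried outside the backend. The following is a minimal sketch in plain C, assuming made-up node types (`ToyConst`, `ToyAdd`) and hypothetical macros `toy_tag` and `toy_is_a` that imitate `nodeTag` and `IsA`; none of these names exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* One tag value per concrete node type. */
typedef enum ToyTag { T_ToyConst = 1, T_ToyAdd } ToyTag;

/* The common header: every node starts with its tag. */
typedef struct ToyNode {
    ToyTag type;
} ToyNode;

/* Concrete nodes embed the header as their first member. */
typedef struct ToyConst {
    ToyNode node;
    int     value;
} ToyConst;

typedef struct ToyAdd {
    ToyNode  node;
    ToyNode *left;
    ToyNode *right;
} ToyAdd;

/* Imitations of nodeTag() and IsA(); the T_ prefix is pasted on by the macro. */
#define toy_tag(ptr)      (((const ToyNode *) (ptr))->type)
#define toy_is_a(ptr, T)  (toy_tag(ptr) == T_##T)

/* Dispatch on the tag, the way executor and planner code switches on node type. */
static int toy_eval(const ToyNode *node)
{
    if (toy_is_a(node, ToyConst))
        return ((const ToyConst *) node)->value;

    assert(toy_is_a(node, ToyAdd));
    const ToyAdd *add = (const ToyAdd *) node;
    return toy_eval(add->left) + toy_eval(add->right);
}
```

Because the header is the first member, a pointer to any concrete node can be passed around as a `ToyNode *` and safely inspected before being cast back, which is the property the real `Node` hierarchy relies on.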

List Operations

/* PostgreSQL uses its own List type (an expansible array of cells since PostgreSQL 13) */
List *mylist = NIL;  /* Empty list */

/* Add to list */
mylist = lappend(mylist, item);        /* Append */
mylist = lcons(item, mylist);          /* Prepend */
mylist = list_concat(list1, list2);    /* Concatenate */

/* Iterate */
ListCell *lc;
foreach(lc, mylist) {
    Node *item = lfirst(lc);
    /* or lfirst_int(lc) for int lists */
}

/* Get by index */
Node *third = list_nth(mylist, 2);

/* List length */
int len = list_length(mylist);
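
Two conventions in the calls above are easy to miss: the empty list is a NULL pointer, and append functions return the list, so the result must always be assigned back. A minimal sketch of an integer list with the same conventions follows. It is a toy in plain C, with hypothetical names (`ToyList`, `toy_lappend`, `toy_list_nth`, `toy_list_length`, `toy_list_free`); the real `List` also handles pointer and OID cells and allocates in memory contexts.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy array-backed list of ints; a NULL pointer is the empty list. */
typedef struct ToyList {
    int  length;      /* cells in use */
    int  max_length;  /* cells allocated */
    int *cells;
} ToyList;

/* Append a value; returns the list, which is newly created if it was NULL. */
static ToyList *toy_lappend(ToyList *list, int value)
{
    if (list == NULL)
    {
        list = malloc(sizeof(ToyList));
        if (list == NULL)
            abort();
        list->length = 0;
        list->max_length = 4;
        list->cells = malloc(list->max_length * sizeof(int));
        if (list->cells == NULL)
            abort();
    }
    else if (list->length == list->max_length)
    {
        /* Out of room: double the cell array. */
        int *bigger = realloc(list->cells,
                              2 * list->max_length * sizeof(int));

        if (bigger == NULL)
            abort();
        list->cells = bigger;
        list->max_length *= 2;
    }
    list->cells[list->length++] = value;
    return list;
}

/* Length of the list; the empty (NULL) list has length 0. */
static int toy_list_length(const ToyList *list)
{
    return list ? list->length : 0;
}

/* Zero-based element access, with a bounds check. */
static int toy_list_nth(const ToyList *list, int n)
{
    assert(list != NULL && n >= 0 && n < list->length);
    return list->cells[n];
}

/* Release the list and its cell array. */
static void toy_list_free(ToyList *list)
{
    if (list != NULL)
    {
        free(list->cells);
        free(list);
    }
}
```

Forgetting the assignment (`toy_lappend(list, x);` instead of `list = toy_lappend(list, x);`) silently loses the first element appended to an empty list, and the same mistake applies to the real `lappend`.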

13.4 Key Data Structures

Relation (Table)

/* RelationData represents an open relation (table/index) */
typedef struct RelationData {
    RelFileNode rd_node;        /* Physical file identifier (RelFileLocator rd_locator in PostgreSQL 16+) */
    Form_pg_class rd_rel;       /* pg_class tuple */
    TupleDesc rd_att;           /* Tuple descriptor */
    Oid rd_id;                  /* Relation OID */
    
    /* Index info (if index) */
    Form_pg_index rd_index;
    
    /* Cached info */
    bytea *rd_options;          /* Parsed reloptions */
    
    /* ... many more fields */
} RelationData;

typedef RelationData *Relation;

/* Open/close relations */
Relation rel = table_open(relid, AccessShareLock);
/* ... use relation ... */
table_close(rel, AccessShareLock);

HeapTuple

/* HeapTuple = pointer to tuple data + header */
typedef struct HeapTupleData {
    uint32 t_len;               /* Length of tuple */
    ItemPointerData t_self;     /* TID (block, offset) */
    Oid t_tableOid;             /* Table OID */
    HeapTupleHeader t_data;     /* -> tuple header+data */
} HeapTupleData;

typedef HeapTupleData *HeapTuple;

/* Access tuple attributes */
bool isnull;
Datum value = heap_getattr(tuple, attnum, tupdesc, &isnull);

/* Build new tuple */
HeapTuple newtup = heap_form_tuple(tupdesc, values, nulls);

Buffer

/* Buffer = index into shared buffer pool */
typedef int Buffer;

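/* Read path: pin the page, then take a shared content lock */
Buffer buf = ReadBuffer(rel, blocknum);
LockBuffer(buf, BUFFER_LOCK_SHARE);

Page page = BufferGetPage(buf);
/* ... access page data ... */

LockBuffer(buf, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buf);

/*
 * Write path: take BUFFER_LOCK_EXCLUSIVE instead, and call
 * MarkBufferDirty(buf) after modifying the page while the buffer is
 * still pinned and locked (that is, before unlocking and releasing it).
 */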
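
The pin and content-lock discipline around buffer access can be modeled in a few lines. This is a single-process toy in plain C, not the buffer manager: `ToyBuffer` and the `toy_*` functions are hypothetical names, and the rule encoded in `toy_mark_dirty` (a buffer may only be dirtied while pinned and exclusively locked) is the assumption being illustrated.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_UNLOCKED, TOY_SHARE, TOY_EXCLUSIVE } ToyLockMode;

/* Toy buffer state: a pin count, a content lock mode, and a dirty flag. */
typedef struct ToyBuffer {
    int         pin_count;
    ToyLockMode lock;
    bool        dirty;
} ToyBuffer;

/* Pinning keeps the page from being evicted while we look at it. */
static void toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

/* Unpinning requires that the content lock was released first. */
static void toy_unpin(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    assert(buf->lock == TOY_UNLOCKED);
    buf->pin_count--;
}

/* Changing the content lock is only legal on a pinned buffer. */
static void toy_lock(ToyBuffer *buf, ToyLockMode mode)
{
    assert(buf->pin_count > 0);
    buf->lock = mode;
}

/* Returns false, and changes nothing, unless pinned and exclusively locked. */
static bool toy_mark_dirty(ToyBuffer *buf)
{
    if (buf->pin_count == 0 || buf->lock != TOY_EXCLUSIVE)
        return false;
    buf->dirty = true;
    return true;
}
```

In this model a writer follows the order pin, exclusive lock, modify, mark dirty, unlock, unpin; attempting to mark the buffer dirty under a shared lock or after release is rejected.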

13.5 Tracing Code Flow

Example: SELECT Query Path

/* 1. Entry point: postgres.c */
exec_simple_query(query_string)
    │
    ├── pg_parse_query()           /* tcop/postgres.c */
    │   └── raw_parser()           /* parser/parser.c */
    │       └── base_yyparse()     /* gram.y generated */
    │
    ├── pg_analyze_and_rewrite()
    │   ├── parse_analyze()        /* parser/analyze.c */
    │   └── pg_rewrite_query()     /* rewrite/rewriteHandler.c */
    │
    ├── pg_plan_queries()
    │   └── planner()              /* optimizer/plan/planner.c */
    │       ├── subquery_planner()
    │       └── create_plan()
    │
    └── PortalRun()
        └── ExecutorRun()          /* executor/execMain.c */
            └── ExecutePlan()
                └── ExecProcNode() /* executor/execProcnode.c */

Using GDB

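# The postmaster forks one backend per connection, so attach GDB to the
# backend serving your session instead of starting postgres under GDB.

# In a psql session, find that backend's process ID
$HOME/pg-dev/bin/psql -d postgres
postgres=# SELECT pg_backend_pid();

# In another terminal, attach GDB to the backend
gdb -p <pid>

# Set breakpoints
(gdb) break heap_insert
(gdb) break costsize.c:200

# Let the backend resume
(gdb) continue

# Back in the same psql session, run SQL
postgres=# INSERT INTO test VALUES (1);

# Back in GDB (stopped in heap_insert)
(gdb) backtrace            # Show call stack
(gdb) print *tup           # Inspect HeapTuple
(gdb) p *relation->rd_rel  # Inspect relation metadata
(gdb) next                 # Step over
(gdb) step                 # Step into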

13.6 Adding a Simple Feature

Example: Add New GUC Parameter

/* Step 1: Define in src/backend/utils/misc/guc.c (guc_tables.c in PostgreSQL 16+) */
int my_new_setting = 100;  /* default value; not static, because Step 3 declares it extern */

/* Step 2: Add to ConfigureNamesInt array */
{
    {"my_new_setting", PGC_USERSET, CUSTOM_OPTIONS,
        gettext_noop("Description of my setting."),
        NULL
    },
    &my_new_setting,
    100,        /* default */
    0,          /* min */
    10000,      /* max */
    NULL, NULL, NULL
},

/* Step 3: Declare extern in src/include/utils/guc.h */
extern int my_new_setting;

/* Step 4: Use in code */
if (my_new_setting > threshold) {
    /* do something */
}

Example: Add New SQL Function

/* Step 1: Create function in src/backend/utils/adt/myfunc.c */
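#include "postgres.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"   /* generated prototypes for built-in functions */

/* Built-in functions need no PG_FUNCTION_INFO_V1(); that macro is for loadable modules */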

Datum
my_add_one(PG_FUNCTION_ARGS)
{
    int32 input = PG_GETARG_INT32(0);
    PG_RETURN_INT32(input + 1);
}

/* Step 2: Add to src/include/catalog/pg_proc.dat */
{ oid => '9999', proname => 'my_add_one',
  prorettype => 'int4', proargtypes => 'int4',
  prosrc => 'my_add_one' },

/* Step 3: Add to Makefile */
OBJS += myfunc.o

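/* Step 4: Rebuild and run initdb to load the new catalog entry, or register the
   function manually in an existing cluster (STRICT matches the pg_proc.dat
   default, so a NULL argument returns NULL): */
CREATE FUNCTION my_add_one(int) RETURNS int AS 'my_add_one' LANGUAGE internal STRICT;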

13.7 Running Tests

# Full regression test suite
make check

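# Specific test file (PostgreSQL 15+: list test_setup first, since most tests depend on it)
make check-tests TESTS="select"

# All test suites in the tree, using parallel make
make check-world -j4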

# TAP tests (for utilities)
make check -C src/bin/psql

# Isolation tests (concurrency)
make check -C src/test/isolation

# Code coverage
./configure --enable-coverage
make check
make coverage-html
# View coverage/index.html

Writing Tests

-- src/test/regress/sql/mytest.sql
-- Test my_add_one function
SELECT my_add_one(5);
SELECT my_add_one(-1);
SELECT my_add_one(NULL);

-- Expected output in src/test/regress/expected/mytest.out
-- (the real file also echoes each SQL statement before its result)
 my_add_one
------------
          6
(1 row)

 my_add_one
------------
          0
(1 row)

 my_add_one
------------
           
(1 row)

13.8 Key Source Files to Study

Area	Files	Purpose
Query Entry	`tcop/postgres.c`	Main query loop
Parsing	`parser/gram.y`, `parser/scan.l`	SQL syntax
Planning	`optimizer/plan/planner.c`	Plan generation
Cost Model	`optimizer/path/costsize.c`	Cost estimation
Execution	`executor/execMain.c`	Query execution
Buffer Pool	`storage/buffer/bufmgr.c`	Page caching
WAL	`access/transam/xlog.c`	Write-ahead log
Transactions	`access/transam/xact.c`	Transaction mgmt
Heap Access	`access/heap/heapam.c`	Table operations
Index Access	`access/nbtree/nbtree.c`	B-tree index

13.9 Practice Exercise

Exercise: Add Planning Time to EXPLAIN

Goal: Add “Planning Time: X ms” output to regular EXPLAIN (not just EXPLAIN ANALYZE)

Steps:

1. Find where EXPLAIN output is generated (commands/explain.c)
2. Study the ExplainOnePlan() function
3. Add timing capture before/after planning in ExplainOneQuery()
4. Output the timing in ExplainPrintPlan()
5. Write regression tests
6. Submit the patch to pgsql-hackers

Files to modify:

src/backend/commands/explain.c
src/test/regress/sql/explain.sql
src/test/regress/expected/explain.out

Next Module

Module 14: Contributing to PostgreSQL

Submit your first patch to PostgreSQL
