Catalog System Deep Dive

This module explores PostgreSQL’s system catalog — the metadata that describes all database objects. Understanding the catalog is essential for extension development, performance tuning, and contributing to PostgreSQL.

Target Audience: Extension developers, DBAs, core contributors
Prerequisites: SQL foundations, basic PostgreSQL internals
Source Directories: src/backend/catalog/, src/include/catalog/
Interview Relevance: Staff+ database roles, extension development

Part 1: Catalog Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SYSTEM CATALOG ARCHITECTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   System catalogs are regular tables stored in pg_catalog schema.           │
│   They describe ALL database objects: tables, columns, functions, etc.      │
│                                                                              │
│   Key Catalogs:                                                              │
│   ┌───────────────────────────────────────────────────────────────────────┐ │
│   │                                                                       │ │
│   │   pg_class ───────► Describes tables, indexes, views, sequences      │ │
│   │       │                                                               │ │
│   │       └───► pg_attribute ───► Columns of each relation               │ │
│   │       │                                                               │ │
│   │       └───► pg_index ───► Index information                          │ │
│   │                                                                       │ │
│   │   pg_type ───────► All data types                                    │ │
│   │       │                                                               │ │
│   │       └───► pg_operator ───► Operators for types                     │ │
│   │                                                                       │ │
│   │   pg_proc ───────► Functions and procedures                          │ │
│   │       │                                                               │ │
│   │       └───► pg_aggregate ───► Aggregate function details             │ │
│   │                                                                       │ │
│   │   pg_namespace ──► Schemas                                           │ │
│   │   pg_database ───► Databases                                         │ │
│   │   pg_tablespace ─► Tablespaces                                       │ │
│   │   pg_authid ─────► Roles/users                                       │ │
│   │                                                                       │ │
│   └───────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
│   Catalogs → Cache → Backend Access                                         │
│   ─────────────────────────────────────────────────────────────────────────  │
│   Direct catalog access is slow (disk I/O).                                 │
│   PostgreSQL caches catalog data in memory:                                 │
│                                                                              │
│   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐        │
│   │   pg_class      │───►│  RelCache       │───►│  Relation       │        │
│   │   (on disk)     │    │  (per-backend)  │    │  (in memory)    │        │
│   └─────────────────┘    └─────────────────┘    └─────────────────┘        │
│                                                                              │
│   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐        │
│   │   pg_type       │───►│  TypeCache      │───►│  TypeCacheEntry │        │
│   │   pg_operator   │    │  (per-backend)  │    │  (in memory)    │        │
│   └─────────────────┘    └─────────────────┘    └─────────────────┘        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Part 2: Key System Catalogs

2.1 pg_class — Relations

-- pg_class stores ALL "relations": tables, indexes, views, sequences, etc.

SELECT 
    oid,                    -- Object ID (unique identifier)
    relname,                -- Relation name
    relnamespace,           -- Schema OID (pg_namespace.oid)
    reltype,                -- Row type OID (pg_type.oid)
    reloftype,              -- Typed table's type (0 if not typed)
    relowner,               -- Owner (pg_authid.oid)
    relam,                  -- Access method (pg_am.oid for indexes)
    relfilenode,            -- Physical file name
    reltablespace,          -- Tablespace (0 = default)
    relpages,               -- Size in pages (estimated by ANALYZE)
    reltuples,              -- Row count (estimated by ANALYZE)
    relallvisible,          -- Pages marked all-visible
    reltoastrelid,          -- TOAST table OID (if any)
    relhasindex,            -- Has indexes?
    relisshared,            -- Shared across all databases?
    relpersistence,         -- 'p'=permanent, 'u'=unlogged, 't'=temp
    relkind,                -- 'r'=table, 'i'=index, 'v'=view, etc.
    relnatts,               -- Number of columns
    relchecks,              -- Number of CHECK constraints
    relhasrules,            -- Has rules?
    relhastriggers,         -- Has triggers?
    relhassubclass,         -- Has child tables?
    relrowsecurity,         -- RLS enabled?
    relforcerowsecurity,    -- Force RLS even for owner?
    relreplident,           -- Replica identity
    relispartition,         -- Is a partition?
    relrewrite,             -- During ALTER TABLE rewrite
    relfrozenxid,           -- Oldest XID for vacuum
    relminmxid,             -- Oldest MultiXact
    relacl,                 -- Access privileges
    reloptions              -- Storage options (WITH clause)
FROM pg_class
WHERE relname = 'users';

-- relkind values:
-- 'r' = ordinary table
-- 'i' = index
-- 'S' = sequence
-- 't' = TOAST table
-- 'v' = view
-- 'm' = materialized view
-- 'c' = composite type
-- 'f' = foreign table
-- 'p' = partitioned table
-- 'I' = partitioned index

2.2 pg_attribute — Columns

-- pg_attribute describes columns of ALL relations

SELECT 
    attrelid,           -- Table OID (pg_class.oid)
    attname,            -- Column name
    atttypid,           -- Data type OID (pg_type.oid)
    attstattarget,      -- Statistics target (-1 = use default)
    attlen,             -- Type length (-1 = variable)
    attnum,             -- Column number (1-based for user cols)
    attndims,           -- Array dimensions (0 if not array)
    attcacheoff,        -- Cached offset (filled at runtime)
    atttypmod,          -- Type modifier (e.g., varchar(100))
    attbyval,           -- Passed by value? (else by reference)
    attstorage,         -- Storage strategy (TOAST)
    attalign,           -- Alignment requirement
    attnotnull,         -- NOT NULL constraint?
    atthasdef,          -- Has DEFAULT?
    atthasmissing,      -- Has missing value for ALTER?
    attidentity,        -- 'a'=always, 'd'=by default, ''=none
    attgenerated,       -- 's'=stored generated, ''=none
    attisdropped,       -- Column was dropped (invisible)
    attislocal,         -- Defined locally (not inherited)
    attinhcount,        -- Number of inheritance ancestors
    attcollation,       -- Collation OID
    attacl,             -- Column-level privileges
    attoptions,         -- Attribute options
    attfdwoptions       -- FDW-specific options
FROM pg_attribute
WHERE attrelid = 'users'::regclass
  AND attnum > 0        -- Skip system columns
  AND NOT attisdropped  -- Skip dropped columns
ORDER BY attnum;

-- System columns (attnum < 0):
-- -1 = ctid (tuple ID)
-- -2 = xmin (insert transaction)
-- -3 = cmin (insert command)
-- -4 = xmax (delete transaction)
-- -5 = cmax (delete command)
-- -6 = tableoid (table OID)

2.3 pg_type — Data Types

-- pg_type defines ALL data types

SELECT 
    oid,
    typname,            -- Type name
    typnamespace,       -- Schema
    typowner,           -- Owner
    typlen,             -- Size (-1=variable, -2=cstring)
    typbyval,           -- Pass by value?
    typtype,            -- 'b'=base, 'c'=composite, 'd'=domain, etc.
    typcategory,        -- Category for parser (S=string, N=numeric, etc.)
    typispreferred,     -- Preferred in category?
    typisdefined,       -- Is defined?
    typdelim,           -- Array delimiter
    typrelid,           -- For composite: pg_class OID
    typelem,            -- For array: element type
    typarray,           -- Array type of this type
    typinput,           -- Input function
    typoutput,          -- Output function
    typreceive,         -- Binary input
    typsend,            -- Binary output
    typmodin,           -- Type modifier input
    typmodout,          -- Type modifier output
    typanalyze,         -- Custom analyze function
    typalign,           -- Alignment requirement
    typstorage,         -- TOAST strategy
    typnotnull,         -- For domains: NOT NULL?
    typbasetype,        -- For domains: base type
    typtypmod,          -- For domains: typmod
    typndims,           -- For domains: array dimensions
    typcollation,       -- Collation
    typdefaultbin,      -- Default expression (nodeToString)
    typdefault,         -- Default as text
    typacl              -- Access privileges
FROM pg_type
WHERE typname = 'int4';

-- typtype values:
-- 'b' = base type
-- 'c' = composite type (row type)
-- 'd' = domain
-- 'e' = enum
-- 'p' = pseudo-type (any, record, trigger, etc.)
-- 'r' = range type
-- 'm' = multirange type

2.4 pg_proc — Functions

-- pg_proc stores functions, procedures, and aggregates

SELECT 
    oid,
    proname,            -- Function name
    pronamespace,       -- Schema
    proowner,           -- Owner
    prolang,            -- Language (pg_language.oid)
    procost,            -- Estimated execution cost
    prorows,            -- Estimated rows (for set-returning)
    provariadic,        -- Variadic argument type (0=none)
    prosupport,         -- Planner support function
    prokind,            -- 'f'=function, 'p'=procedure, 'a'=aggregate, 'w'=window
    prosecdef,          -- Security definer?
    proleakproof,       -- Leak-proof?
    proisstrict,        -- Strict (NULL in = NULL out)?
    proretset,          -- Returns set?
    provolatile,        -- 'i'=immutable, 's'=stable, 'v'=volatile
    proparallel,        -- 's'=safe, 'r'=restricted, 'u'=unsafe
    pronargs,           -- Number of arguments
    pronargdefaults,    -- Number of arguments with defaults
    prorettype,         -- Return type OID
    proargtypes,        -- Argument types (oidvector)
    proallargtypes,     -- All arg types (including OUT)
    proargmodes,        -- Argument modes (i=IN, o=OUT, b=INOUT, v=VARIADIC)
    proargnames,        -- Argument names
    proargdefaults,     -- Default expressions
    protrftypes,        -- Transform types
    prosrc,             -- Source code or link symbol
    probin,             -- Shared library path
    proconfig,          -- proconfig options
    proacl              -- Access privileges
FROM pg_proc
WHERE proname = 'array_agg';

Part 3: Catalog Caching

3.1 System Cache (SysCache)

/* src/backend/utils/cache/syscache.c */

/*
 * SysCache provides hash-based lookup of catalog tuples.
 * Each cache is keyed by one or more columns.
 */

/* Cache identifiers */
typedef enum SysCacheIdentifier
{
    AMNAME,             /* pg_am by name */
    AMOID,              /* pg_am by OID */
    ATTNAME,            /* pg_attribute by (relid, name) */
    ATTNUM,             /* pg_attribute by (relid, attnum) */
    AUTHMEMROLEMEM,     /* pg_auth_members by (role, member) */
    AUTHNAME,           /* pg_authid by name */
    AUTHOID,            /* pg_authid by OID */
    CLAOID,             /* pg_class by OID */
    CLARELNAME,         /* pg_class by (namespace, name) */
    PROCNAMEARGSNSP,    /* pg_proc by (name, argtypes, namespace) */
    PROCOID,            /* pg_proc by OID */
    TYPEOID,            /* pg_type by OID */
    TYPENAME,           /* pg_type by (namespace, name) */
    /* ... many more ... */
} SysCacheIdentifier;

/* Looking up in syscache */
HeapTuple
SearchSysCache1(int cacheId, Datum key1)
{
    /* Returns cached copy of catalog tuple */
    /* Caller must call ReleaseSysCache() when done */
}

/* Example: Look up type by OID */
HeapTuple tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typeId));
if (HeapTupleIsValid(tuple))
{
    Form_pg_type typeForm = (Form_pg_type) GETSTRUCT(tuple);
    char *typname = NameStr(typeForm->typname);
    /* ... use type info ... */
    ReleaseSysCache(tuple);  /* Must release! */
}

/* Convenience macros */
Form_pg_class classForm = (Form_pg_class) GETSTRUCT(
    SearchSysCache1(CLAOID, ObjectIdGetDatum(relid)));

/* Or use GetSysCacheOid which returns just the OID */
Oid typeId = GetSysCacheOid2(TYPENAME,
                             Anum_pg_type_oid,
                             CStringGetDatum(typname),
                             ObjectIdGetDatum(namespaceId));

3.2 Relation Cache (RelCache)

/* src/backend/utils/cache/relcache.c */

/*
 * RelCache stores open relation descriptors.
 * Much more than just pg_class tuple - includes all relation metadata.
 */

typedef struct RelationData
{
    /* Basic info */
    Oid         rd_id;              /* Relation OID */
    Form_pg_class rd_rel;           /* COPY of pg_class row */
    TupleDesc   rd_att;             /* Tuple descriptor */
    
    /* Storage info */
    RelFileNode rd_node;            /* Physical storage identifiers */
    SMgrRelation rd_smgr;           /* Storage manager handle */
    
    /* Indexes */
    Oid        *rd_indexlist;       /* OIDs of indexes on relation */
    int         rd_indexvalid;      /* Is index list valid? */
    
    /* Constraints */
    Bitmapset  *rd_keyattr;         /* Key column numbers */
    
    /* Partition info */
    PartitionKey rd_partkey;        /* Partition key */
    PartitionDesc rd_partdesc;      /* Partition descriptor */
    
    /* Trigger info */
    TriggerDesc *trigdesc;          /* Trigger data */
    
    /* RLS info */
    RowSecurityDesc *rd_rsdesc;     /* Row security policies */
    
    /* Statistics */
    int32       rd_rel->relpages;   /* Size estimate */
    float4      rd_rel->reltuples;  /* Row estimate */
    
    /* ... many more fields ... */
} RelationData;

/* Opening a relation */
Relation
table_open(Oid relationId, LOCKMODE lockmode)
{
    Relation    rel;
    
    /* Look up in relcache */
    rel = RelationIdGetRelation(relationId);
    
    if (!RelationIsValid(rel))
        ereport(ERROR, ...);
    
    /* Acquire the requested lock */
    LockRelationOid(relationId, lockmode);
    
    return rel;
}

/* Closing a relation */
void
table_close(Relation relation, LOCKMODE lockmode)
{
    /* Release lock if we acquired one */
    if (lockmode != NoLock)
        UnlockRelationOid(relation->rd_id, lockmode);
    
    /* Decrement reference count */
    RelationClose(relation);
}

3.3 Cache Invalidation

/*
 * When catalog changes, all backends must invalidate cached copies.
 * PostgreSQL uses "sinval" (shared invalidation) messages.
 */

/* Invalidation message types */
typedef enum
{
    SHAREDINVALCATALOG_ID,      /* Catalog tuple changed */
    SHAREDINVALRELCACHE_ID,     /* Relation cache entry invalid */
    SHAREDINVALSMGR_ID,         /* Physical file changed */
    SHAREDINVALRELMAP_ID,       /* Relfilenode mapping changed */
    SHAREDINVALSNAPSHOT_ID      /* Snapshot-related invalidation */
} SharedInvalidationMessageType;

/* Sending invalidation */
void
CacheInvalidateHeapTuple(Relation relation, HeapTuple oldtuple, HeapTuple newtuple)
{
    /* Called when a catalog tuple is modified */
    
    /* Build invalidation message */
    SharedInvalidationMessage msg;
    msg.id = SHAREDINVALCATALOG_ID;
    msg.cat.catId = relation->rd_id;
    msg.cat.hashValue = GetTupleHashValue(...);
    
    /* Queue message for other backends */
    SendSharedInvalidMessages(&msg, 1);
}

/* Receiving invalidation */
void
InvalidateSystemCaches(void)
{
    /* Called at safe points (query boundaries, etc.) */
    
    /* Process pending invalidation messages */
    AcceptInvalidationMessages();
    
    /* Each message triggers cache cleanup */
    /* e.g., syscache entries removed, relcache entries rebuilt */
}

/*
 * Invalidation timing:
 *
 * 1. Transaction modifies pg_class
 * 2. At COMMIT: invalidation message queued
 * 3. Other backends check for messages at:
 *    - Start of each command
 *    - Lock acquisition
 *    - Explicit AcceptInvalidationMessages()
 * 4. Backend processes message, invalidates cache
 * 5. Next access rebuilds cache from disk
 */

Part 4: Catalog Access Patterns

4.1 Reading Catalog Information

-- Using information_schema (SQL standard, but slow)
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users';

-- Using pg_catalog (faster, more complete)
SELECT 
    a.attname,
    pg_catalog.format_type(a.atttypid, a.atttypmod) as data_type,
    NOT a.attnotnull as is_nullable
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'users'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped;

-- Type resolution with regtype
SELECT 'int4'::regtype::oid;           -- 23
SELECT 23::regtype;                     -- integer
SELECT 'int4[]'::regtype;               -- integer[]

-- Relation resolution with regclass
SELECT 'users'::regclass::oid;          -- Relation OID
SELECT 'public.users'::regclass;        -- Schema-qualified

-- Function resolution with regprocedure
SELECT 'array_agg(anynonarray)'::regprocedure::oid;

4.2 Catalog Modification

/* Extension creating a new type */

/* Step 1: Build tuple for pg_type */
Datum values[Natts_pg_type];
bool nulls[Natts_pg_type];

memset(nulls, false, sizeof(nulls));

values[Anum_pg_type_typname - 1] = NameGetDatum(&typeName);
values[Anum_pg_type_typnamespace - 1] = ObjectIdGetDatum(namespace);
values[Anum_pg_type_typowner - 1] = ObjectIdGetDatum(GetUserId());
values[Anum_pg_type_typlen - 1] = Int16GetDatum(sizeof(MyType));
values[Anum_pg_type_typbyval - 1] = BoolGetDatum(true);
values[Anum_pg_type_typtype - 1] = CharGetDatum(TYPTYPE_BASE);
/* ... set all required fields ... */

/* Step 2: Insert into pg_type */
Relation typeRel = table_open(TypeRelationId, RowExclusiveLock);

HeapTuple tuple = heap_form_tuple(RelationGetDescr(typeRel),
                                  values, nulls);

/* Assign OID */
Oid typeOid = GetNewOidWithIndex(typeRel, TypeOidIndexId,
                                  Anum_pg_type_oid);
HeapTupleSetOid(tuple, typeOid);

CatalogTupleInsert(typeRel, tuple);

table_close(typeRel, RowExclusiveLock);

/* Step 3: Create dependency records */
ObjectAddress myself, referenced;

myself.classId = TypeRelationId;
myself.objectId = typeOid;
myself.objectSubId = 0;

referenced.classId = NamespaceRelationId;
referenced.objectId = namespace;
referenced.objectSubId = 0;

recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);

4.3 Using Catalog for Query Planning

/* Planner accessing catalog info */

/* Get relation statistics */
void
get_relation_info(PlannerInfo *root, Oid relationOid,
                  bool inhparent, RelOptInfo *rel)
{
    Relation    relation;
    
    relation = table_open(relationOid, NoLock);  /* Already locked */
    
    /* Get size estimates */
    rel->pages = relation->rd_rel->relpages;
    rel->tuples = relation->rd_rel->reltuples;
    rel->allvisfrac = (double) relation->rd_rel->relallvisible / 
                      (double) relation->rd_rel->relpages;
    
    /* Get index info */
    if (relation->rd_rel->relhasindex)
    {
        List *indexoidlist = RelationGetIndexList(relation);
        
        foreach(l, indexoidlist)
        {
            Oid indexoid = lfirst_oid(l);
            /* Build IndexOptInfo for each index */
            get_index_info(root, indexoid, inhparent, rel);
        }
    }
    
    table_close(relation, NoLock);
}

/* Get column statistics */
void
examine_variable(PlannerInfo *root, Node *node, int varRelid,
                 VariableStatData *vardata)
{
    /* Look up pg_statistic for the column */
    HeapTuple statstuple = SearchSysCache3(STATRELATTINH,
                                           ObjectIdGetDatum(relid),
                                           Int16GetDatum(attnum),
                                           BoolGetDatum(false));
    
    if (HeapTupleIsValid(statstuple))
    {
        /* Extract selectivity estimates, histograms, etc. */
        Form_pg_statistic stats = (Form_pg_statistic) GETSTRUCT(statstuple);
        vardata->stadistinct = stats->stadistinct;
        /* ... */
        ReleaseSysCache(statstuple);
    }
}

Part 5: Statistics (pg_statistic)

5.1 Understanding pg_statistic

-- pg_statistic stores column statistics for query planning

-- Raw access (complex structure)
SELECT * FROM pg_statistic WHERE starelid = 'users'::regclass;

-- Better: Use pg_stats view
SELECT 
    attname,
    null_frac,          -- Fraction of NULL values
    avg_width,          -- Average column width in bytes
    n_distinct,         -- Number of distinct values (or ratio if negative)
    most_common_vals,   -- Most common values
    most_common_freqs,  -- Frequencies of most common values
    histogram_bounds,   -- Bounds for histogram buckets
    correlation         -- Physical vs logical ordering correlation
FROM pg_stats
WHERE tablename = 'users';

-- n_distinct interpretation:
-- > 0: Estimated number of distinct values
-- -1: All values are distinct (unique)
-- < 0: abs(n_distinct) is fraction of rows that are distinct

-- correlation interpretation:
-- 1.0: Perfect correlation (values in physical order)
-- 0.0: No correlation (random order)
-- -1.0: Perfect anti-correlation (reverse order)
-- High correlation means index range scans are efficient

5.2 Extended Statistics

-- For correlated columns, single-column stats aren't enough

-- Create extended statistics for correlation
CREATE STATISTICS users_city_country_stats (dependencies, ndistinct, mcv)
ON city, country FROM users;

-- Force ANALYZE to compute
ANALYZE users;

-- View extended statistics
SELECT 
    stxname,
    stxkeys,            -- Column numbers
    stxkind,            -- Statistic kinds
    stxndistinct,       -- N-distinct for column combinations
    stxdependencies,    -- Functional dependencies
    stxexprs            -- Expression stats
FROM pg_statistic_ext_data sed
JOIN pg_statistic_ext se ON sed.stxoid = se.oid
WHERE se.stxrelid = 'users'::regclass;

-- Why extended stats matter:
-- Query: WHERE city = 'Paris' AND country = 'France'
-- Without extended stats: Planner assumes independence
--   selectivity = sel(city='Paris') * sel(country='France')
--   Might estimate 0.01 * 0.05 = 0.0005 (too low!)
-- With extended stats: Planner knows they're correlated
--   Correctly estimates rows where city=Paris implies country=France

Part 6: Dependency Tracking

6.1 pg_depend

-- pg_depend tracks dependencies between database objects

SELECT 
    classid::regclass as dependent_type,
    objid,
    objsubid,           -- 0=whole object, >0=column number
    refclassid::regclass as referenced_type,
    refobjid,
    refobjsubid,
    deptype             -- Dependency type
FROM pg_depend
WHERE refobjid = 'users'::regclass::oid
LIMIT 10;

-- deptype values:
-- 'n' = NORMAL: Drop of referenced obj drops dependent obj
-- 'a' = AUTO: Automatic dependency (created by system)
-- 'i' = INTERNAL: Dependent obj is component of referenced
-- 'P' = PARTITION_PRI: Partition primary dependency
-- 'S' = PARTITION_SEC: Partition secondary dependency
-- 'e' = EXTENSION: Dependent obj is member of extension

-- Find what depends on a table
SELECT 
    CASE classid
        WHEN 'pg_constraint'::regclass THEN 'constraint'
        WHEN 'pg_trigger'::regclass THEN 'trigger'
        WHEN 'pg_rewrite'::regclass THEN 'view/rule'
        WHEN 'pg_class'::regclass THEN 'index/sequence'
        ELSE classid::regclass::text
    END as type,
    objid::regclass::text as dependent_object
FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'users'::regclass::oid
  AND deptype = 'n';

6.2 Using Dependencies in Code

/* Recording a dependency */
void
recordDependencyOn(const ObjectAddress *depender,
                   const ObjectAddress *referenced,
                   DependencyType behavior)
{
    /* Insert into pg_depend */
    Relation dependDesc = table_open(DependRelationId, RowExclusiveLock);
    
    Datum values[Natts_pg_depend];
    bool nulls[Natts_pg_depend];
    
    values[Anum_pg_depend_classid - 1] = ObjectIdGetDatum(depender->classId);
    values[Anum_pg_depend_objid - 1] = ObjectIdGetDatum(depender->objectId);
    values[Anum_pg_depend_objsubid - 1] = Int32GetDatum(depender->objectSubId);
    values[Anum_pg_depend_refclassid - 1] = ObjectIdGetDatum(referenced->classId);
    values[Anum_pg_depend_refobjid - 1] = ObjectIdGetDatum(referenced->objectId);
    values[Anum_pg_depend_refobjsubid - 1] = Int32GetDatum(referenced->objectSubId);
    values[Anum_pg_depend_deptype - 1] = CharGetDatum((char) behavior);
    
    CatalogTupleInsert(dependDesc, heap_form_tuple(...));
    table_close(dependDesc, RowExclusiveLock);
}

/* Checking dependencies before DROP */
ObjectAddresses *
findDependentObjects(const ObjectAddress *object,
                     int flags,
                     ObjectAddresses *pendingObjects,
                     const ObjectAddresses *alreadyDeleted)
{
    /* Recursively find all dependent objects */
    /* Returns list for cascade delete or error for restrict */
}

/* Example: Why DROP TABLE CASCADE works */
/*
 * 1. Find all objects depending on table
 * 2. For each NORMAL dependency: plan to drop
 * 3. Recursively find dependencies of those
 * 4. Sort by dependency order
 * 5. Drop in order (dependents first)
 */

Part 7: Catalog Access in Extensions

7.1 Safe Catalog Reading

/* Extension reading catalog safely */

#include "catalog/pg_type.h"
#include "utils/syscache.h"
#include "utils/lsyscache.h"

/* Use lsyscache.h convenience functions when possible */
char *
get_type_name(Oid typid)
{
    /* Convenience function - handles cache internally */
    return format_type_be(typid);
}

/* Or use syscache directly */
Oid
get_type_input_function(Oid typid)
{
    HeapTuple   tp;
    Form_pg_type typtup;
    Oid         result;
    
    tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
    if (!HeapTupleIsValid(tp))
        elog(ERROR, "cache lookup failed for type %u", typid);
    
    typtup = (Form_pg_type) GETSTRUCT(tp);
    result = typtup->typinput;
    
    ReleaseSysCache(tp);  /* Don't forget! */
    
    return result;
}

/* Scanning a catalog table */
void
scan_user_tables(void)
{
    Relation    rel;
    TableScanDesc scan;
    HeapTuple   tuple;
    
    rel = table_open(RelationRelationId, AccessShareLock);
    scan = table_beginscan_catalog(rel, 0, NULL);
    
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
        
        if (classForm->relkind == RELKIND_RELATION &&
            classForm->relnamespace != PG_CATALOG_NAMESPACE)
        {
            elog(NOTICE, "Found table: %s", NameStr(classForm->relname));
        }
    }
    
    table_endscan(scan);
    table_close(rel, AccessShareLock);
}

7.2 Catalog Modification in Extensions

/* Creating a custom catalog table (advanced) */

/* 
 * Extensions typically use regular tables, not catalogs.
 * But some need true catalog tables for:
 * - Integration with DROP CASCADE
 * - Visibility in pg_dump
 * - Participation in cache invalidation
 */

void
CreateMyExtensionCatalog(void)
{
    Relation    rel;
    TupleDesc   tupdesc;
    Oid         relid;
    
    /* Define columns */
    tupdesc = CreateTemplateTupleDesc(3);
    TupleDescInitEntry(tupdesc, 1, "id", OIDOID, -1, 0);
    TupleDescInitEntry(tupdesc, 2, "name", TEXTOID, -1, 0);
    TupleDescInitEntry(tupdesc, 3, "value", INT4OID, -1, 0);
    
    /* Create the table */
    relid = heap_create_with_catalog(
        "my_catalog",           /* Table name */
        PG_CATALOG_NAMESPACE,   /* Put in pg_catalog */
        InvalidOid,             /* Tablespace */
        InvalidOid,             /* Type OID (auto-assign) */
        InvalidOid,             /* Array type OID (auto-assign) */
        GetUserId(),            /* Owner */
        TableAmOid,             /* Heap access method */
        tupdesc,                /* Tuple descriptor */
        NIL,                    /* Column defaults */
        RELKIND_RELATION,       /* Relation kind */
        RELPERSISTENCE_PERMANENT,
        false,                  /* Not shared */
        false,                  /* Not mapped */
        ONCOMMIT_NOOP,          /* On commit action */
        NULL,                   /* Reloptions */
        false,                  /* Use user's acl */
        true,                   /* Allow system columns */
        false,                  /* Is internal */
        InvalidOid,             /* relrewrite */
        NULL                    /* typaddress */
    );
    
    /* Create index on id */
    /* ... */
    
    /* Record in pg_extension_config_dump for pg_dump */
    /* ... */
}

Part 8: Interview Questions

Q: What happens when you ALTER TABLE ADD COLUMN?

Answer:Catalog changes:

Insert new row into pg_attribute with new column info
Update pg_class.relnatts (increment column count)
If column has default: update pg_attrdef
If column has constraint: update pg_constraint

Cache invalidation: 5. Send invalidation message for the relation 6. All backends will rebuild their RelCache entryData changes (depends on column type):

No default, nullable: No data changes! Just metadata.
With constant default (PG11+): Store default in catalog, compute on read
With volatile default or old versions: Rewrite entire table

The key insight: Adding a nullable column without default is nearly instant regardless of table size, because it’s metadata-only.

Q: How does PostgreSQL know what to drop when you DROP TABLE CASCADE?

Answer:

Query pg_depend: Find all objects with NORMAL dependency on the table

SELECT * FROM pg_depend 
WHERE refobjid = 'mytable'::regclass 
AND deptype = 'n';

Recursive expansion: For each dependent object, find its dependents
Topological sort: Order objects so dependents are dropped before what they depend on
Execute drops: Drop in sorted order

Dependency types:

Views depending on table
Foreign key constraints
Triggers on the table
Indexes
Policies (RLS)
Sequences (if SERIAL)

Without CASCADE: Report error listing dependents

Q: Why does PostgreSQL cache catalog data per-backend instead of using shared memory?

Answer:Per-backend caching advantages:

No locking overhead: Reading cached data requires no locks. Shared cache would need read locks for every access.
Transaction isolation: Each backend sees catalog as of its snapshot. Shared cache would need version management.
Simplicity: No complex shared data structure. Just hash tables in local memory.
Cache warmth: Each backend caches what it needs. Working sets often differ.

Trade-offs:

Memory usage: Same data cached multiple times
- Mitigated: Catalog data is small compared to data pages
Cache invalidation: Must propagate changes
- Solved: Shared invalidation message queue (sinval)
Cold start: New backends start with empty cache
- Mitigated: Relation cache init file preloads common entries

The key insight: For catalog data, avoiding lock contention is worth the memory duplication.

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Catalog System Deep Dive

​Part 1: Catalog Architecture Overview

​Part 2: Key System Catalogs

​2.1 pg_class — Relations

​2.2 pg_attribute — Columns

​2.3 pg_type — Data Types

​2.4 pg_proc — Functions

​Part 3: Catalog Caching

​3.1 System Cache (SysCache)

​3.2 Relation Cache (RelCache)

​3.3 Cache Invalidation

​Part 4: Catalog Access Patterns

​4.1 Reading Catalog Information

​4.2 Catalog Modification

​4.3 Using Catalog for Query Planning

​Part 5: Statistics (pg_statistic)

​5.1 Understanding pg_statistic

​5.2 Extended Statistics

​Part 6: Dependency Tracking

​6.1 pg_depend

​6.2 Using Dependencies in Code

​Part 7: Catalog Access in Extensions

​7.1 Safe Catalog Reading

​7.2 Catalog Modification in Extensions

​Part 8: Interview Questions

​Next Steps

Extension Development

PostgreSQL Source Code

Catalog System Deep Dive

Part 1: Catalog Architecture Overview

Part 2: Key System Catalogs

2.1 pg_class — Relations

2.2 pg_attribute — Columns

2.3 pg_type — Data Types

2.4 pg_proc — Functions

Part 3: Catalog Caching

3.1 System Cache (SysCache)

3.2 Relation Cache (RelCache)

3.3 Cache Invalidation

Part 4: Catalog Access Patterns

4.1 Reading Catalog Information

4.2 Catalog Modification

4.3 Using Catalog for Query Planning

Part 5: Statistics (pg_statistic)

5.1 Understanding pg_statistic

5.2 Extended Statistics

Part 6: Dependency Tracking

6.1 pg_depend

6.2 Using Dependencies in Code

Part 7: Catalog Access in Extensions

7.1 Safe Catalog Reading

7.2 Catalog Modification in Extensions

Part 8: Interview Questions

Next Steps