Understanding how PostgreSQL works internally transforms you from someone who uses databases to someone who truly understands them. This knowledge helps you make better design decisions and debug complex issues.
Hands-On Lab 1: Inspecting PostgreSQL Processes and Shared Memory
Goal: Master the tools for observing PostgreSQL’s internal process architecture and memory structures.
Examine the Process Hierarchy:
Copy
-- View all PostgreSQL processesSELECT pid, usename AS username, application_name, client_addr, backend_start, state, backend_type, queryFROM pg_stat_activityORDER BY backend_type, backend_start;-- Identify process types:-- - client backend: Your query connections-- - autovacuum launcher/worker: Cleanup processes-- - logical replication launcher/worker: Replication processes-- - background writer: Writes dirty buffers-- - checkpointer: Performs checkpoints-- - walwriter: Writes WAL records
Investigate Shared Memory Configuration:
Copy
-- Check shared memory settingsSELECT name, setting, unit, context, short_descFROM pg_settingsWHERE name IN ( 'shared_buffers', 'wal_buffers', 'max_connections', 'max_prepared_transactions', 'max_locks_per_transaction', 'shared_memory_size')ORDER BY name;-- Calculate total shared memory usageSELECT pg_size_pretty( current_setting('shared_buffers')::bigint * (SELECT setting FROM pg_settings WHERE name = 'block_size')::bigint) AS shared_buffers_size;
Monitor Buffer Pool Activity:
Copy
-- Install pg_buffercache extension (if not already installed)CREATE EXTENSION IF NOT EXISTS pg_buffercache;-- See what tables are cached in shared buffersSELECT c.relname AS table_name, COUNT(*) AS buffer_count, pg_size_pretty(COUNT(*) * 8192) AS size_in_cache, ROUND(100.0 * COUNT(*) / (SELECT setting FROM pg_settings WHERE name='shared_buffers')::int, 2) AS percent_of_cacheFROM pg_buffercache bJOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)WHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database()) AND c.relkind IN ('r', 'i') -- tables and indexesGROUP BY c.relnameORDER BY buffer_count DESCLIMIT 20;-- Check buffer cache hit ratioSELECT SUM(heap_blks_hit) AS heap_read_hits, SUM(heap_blks_read) AS heap_disk_reads, ROUND(100.0 * SUM(heap_blks_hit) / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2) AS cache_hit_ratioFROM pg_statio_user_tables;-- Target: > 99% for well-tuned systems
Analyze WAL Activity:
Copy
-- Check current WAL positionSELECT pg_current_wal_lsn() AS current_wal_position, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) AS total_wal_generated;-- View WAL writer statisticsSELECT wal_records, wal_fpi, -- Full Page Images wal_bytes, pg_size_pretty(wal_bytes) AS wal_bytes_pretty, wal_buffers_full, wal_write, wal_sync, stats_resetFROM pg_stat_wal;
Monitor Background Workers:
Copy
-- Check background writer statsSELECT checkpoints_timed, -- Scheduled checkpoints checkpoints_req, -- Requested checkpoints (may indicate tuning needed) buffers_checkpoint, -- Buffers written during checkpoint buffers_clean, -- Buffers cleaned by bgwriter buffers_backend, -- Buffers written by backends (should be low) buffers_alloc, -- Buffers allocated stats_resetFROM pg_stat_bgwriter;-- If buffers_backend is high, consider increasing bgwriter activity
Challenge Exercises:
Create a table with 1 million rows and observe how it affects the buffer cache
Run a full table scan and watch the buffer pool fill up
Compare cache hit ratios before and after the scan
Identify which auxiliary process is consuming the most CPU using pg_stat_activity
Calculate the percentage of shared memory used by each major component
When you can confidently navigate PostgreSQL’s internal monitoring views and interpret the results, you’ve mastered process and memory inspection.
Hands-On Lab 2: Observing MVCC Behavior and VACUUM Effects
Goal: Understand how MVCC creates tuple versions and how VACUUM reclaims space.
Set Up a Test Environment:
Copy
-- Create a test tableCREATE TABLE mvcc_test ( id SERIAL PRIMARY KEY, data TEXT, updated_at TIMESTAMP DEFAULT NOW());-- Insert initial dataINSERT INTO mvcc_test (data)SELECT 'Initial data ' || iFROM generate_series(1, 10000) AS i;-- Check initial table sizeSELECT pg_size_pretty(pg_table_size('mvcc_test')) AS table_size, pg_size_pretty(pg_total_relation_size('mvcc_test')) AS total_size_with_indexes;
Observe Tuple Versioning:
Copy
-- Enable pageinspect extensionCREATE EXTENSION IF NOT EXISTS pageinspect;-- View tuple details including xmin and xmaxSELECT ctid, -- Physical location (page, offset) xmin, -- Transaction ID that created this tuple xmax, -- Transaction ID that deleted/updated (0 if current) id, dataFROM mvcc_testWHERE id <= 5;-- Note the xmin values - they're the transaction IDs that created these rows
Create Dead Tuples Through Updates:
Copy
-- Update every row to create dead tuplesUPDATE mvcc_test SET data = 'Updated once', updated_at = NOW();-- Check table statsSELECT n_live_tup AS live_tuples, n_dead_tup AS dead_tuples, n_tup_upd AS total_updates, n_tup_del AS total_deletes, last_vacuum, last_autovacuumFROM pg_stat_user_tablesWHERE relname = 'mvcc_test';-- Check table size (should have increased significantly)SELECT pg_size_pretty(pg_table_size('mvcc_test')) AS table_size_after_update;
Observe Dead Tuples with Multiple Updates:
Copy
-- Do several more updates to accumulate dead tuplesUPDATE mvcc_test SET data = 'Updated twice' WHERE id % 2 = 0;UPDATE mvcc_test SET data = 'Updated thrice' WHERE id % 3 = 0;UPDATE mvcc_test SET data = 'Updated fourth' WHERE id % 4 = 0;-- Check bloat accumulationSELECT relname AS table_name, n_live_tup, n_dead_tup, ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_percent, pg_size_pretty(pg_table_size(relname::regclass)) AS table_size, last_vacuum, last_autovacuumFROM pg_stat_user_tablesWHERE relname = 'mvcc_test';-- You should see significant dead tuple accumulation
Run VACUUM and Observe Effects:
Copy
-- Run VACUUM (not FULL)VACUUM VERBOSE mvcc_test;-- Check resultsSELECT n_live_tup, n_dead_tup, pg_size_pretty(pg_table_size('mvcc_test')) AS table_sizeFROM pg_stat_user_tablesWHERE relname = 'mvcc_test';-- Note: Table size may not shrink much because VACUUM doesn't return space to OS-- It marks space as reusable within the table
Experiment with VACUUM FULL:
Copy
-- VACUUM FULL rewrites entire table (locks table exclusively)VACUUM FULL VERBOSE mvcc_test;-- Check size reductionSELECT pg_size_pretty(pg_table_size('mvcc_test')) AS table_size_after_vacuum_full;-- Should see significant size reduction
Observe Transaction ID Visibility:
Copy
-- In one session, start a transactionBEGIN;SELECT txid_current() AS my_transaction_id;INSERT INTO mvcc_test (data) VALUES ('Uncommitted data');-- Note the transaction IDSELECT ctid, xmin, xmax, id, dataFROM mvcc_test WHERE data = 'Uncommitted data';-- In another session (different connection), try to see this rowSELECT * FROM mvcc_test WHERE data = 'Uncommitted data';-- Result: Empty! This demonstrates MVCC isolation-- Back in first session, commitCOMMIT;-- Now second session can see itSELECT * FROM mvcc_test WHERE data = 'Uncommitted data';
Monitor Autovacuum Activity:
Copy
-- Check autovacuum configurationSELECT name, setting, unit, short_descFROM pg_settingsWHERE name LIKE 'autovacuum%'ORDER BY name;-- View recent autovacuum activitySELECT schemaname, relname, last_autovacuum, autovacuum_count, n_live_tup, n_dead_tupFROM pg_stat_user_tablesWHERE autovacuum_count > 0ORDER BY last_autovacuum DESC NULLS LAST;
Advanced Challenge:
Create a workload with 50,000 updates and measure bloat accumulation
Configure autovacuum to run more aggressively and observe the difference
Use pageinspect to examine the internal structure of a page before and after VACUUM
Simulate a long-running transaction and observe how it prevents VACUUM from reclaiming space
Calculate the “bloat ratio” for all tables in your database
When you can explain how MVCC creates multiple tuple versions, how dead tuples accumulate, and how VACUUM reclaims space (with and without returning it to the OS), you’ve mastered PostgreSQL’s core concurrency mechanism.
-- See system columns for any rowSELECT ctid, -- Physical location (page, offset) xmin, -- Transaction that created xmax, -- Transaction that deleted (0 if live) cmin, -- Command ID for insert cmax, -- Command ID for delete tableoid, -- OID of the table *FROM users WHERE id = 1;-- Result:-- ctid | (0,1) -- Page 0, tuple 1-- xmin | 12345 -- Created by txn 12345-- xmax | 0 -- Not deleted-- cmin | 0 -- First command in its txn-- cmax | 0
-- Simplified visibility logic (actual code is more complex)-- A tuple is visible if:-- 1. xmin is committed AND-- 2. xmin started before our snapshot AND-- 3. (xmax is 0 OR xmax is aborted OR xmax started after our snapshot)-- Check transaction statusSELECT * FROM pg_stat_activity WHERE pid = pg_backend_pid();-- View commit log statusSELECT pg_xact_status(xmin) as xmin_status, pg_xact_status(xmax) as xmax_statusFROM users WHERE id = 1;-- Returns: 'committed', 'aborted', 'in progress', or NULL
-- REPEATABLE READ takes a snapshot at first queryBEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;SELECT * FROM accounts WHERE id = 1; -- Snapshot taken HERE-- Returns balance = 1000-- Another transaction updates and commits...SELECT * FROM accounts WHERE id = 1;-- Still returns balance = 1000! (same snapshot)COMMIT;-- New transaction sees the updateSELECT * FROM accounts WHERE id = 1;-- Returns balance = 1200
┌─────────────────────────────────────────────────────────────────────────────┐│ THE DEAD TUPLE PROBLEM │├─────────────────────────────────────────────────────────────────────────────┤│ ││ After many updates, pages fill with dead tuples: ││ ││ Page: ││ ┌──────────────────────────────────────────────────────────────────────┐ ││ │ [DEAD] [LIVE] [DEAD] [DEAD] [LIVE] [DEAD] [LIVE] [DEAD] [DEAD] │ ││ │ │ ││ │ 70% dead tuples = wasted space + slow scans! │ ││ └──────────────────────────────────────────────────────────────────────┘ ││ ││ Problems: ││ • Table bloat: 10GB of data uses 100GB on disk ││ • Slow sequential scans: must read dead tuples too ││ • Index bloat: indexes point to dead tuples ││ • Transaction ID wraparound risk ││ ││ After VACUUM: ││ ┌──────────────────────────────────────────────────────────────────────┐ ││ │ [FREE] [LIVE] [FREE] [FREE] [LIVE] [FREE] [LIVE] [FREE] [FREE] │ ││ │ │ ││ │ Space marked as free for reuse (but not returned to OS) │ ││ └──────────────────────────────────────────────────────────────────────┘ ││ ││ After VACUUM FULL (rewrites table): ││ ┌────────────────────────────────────────────────────────────────────┐ ││ │ [LIVE] [LIVE] [LIVE] │ ││ │ │ ││ │ Compacted! Space returned to OS. But table is locked during. │ ││ └────────────────────────────────────────────────────────────────────┘ ││ │└─────────────────────────────────────────────────────────────────────────────┘
-- View current settingsSELECT name, setting, unit, short_descFROM pg_settingsWHERE name LIKE '%autovacuum%';-- Per-table settings for high-traffic tablesALTER TABLE orders SET ( autovacuum_vacuum_threshold = 1000, -- Min dead tuples autovacuum_vacuum_scale_factor = 0.01, -- + 1% of table autovacuum_analyze_threshold = 500, autovacuum_analyze_scale_factor = 0.005, autovacuum_vacuum_cost_delay = 2 -- Less aggressive);-- Check if autovacuum is runningSELECT pid, datname, usename, state, query, backend_start, xact_startFROM pg_stat_activity WHERE query LIKE '%autovacuum%';
-- View WAL settingsSHOW wal_level; -- minimal, replica, logicalSHOW max_wal_size; -- Maximum WAL size before checkpointSHOW min_wal_size; -- Minimum WAL to keepSHOW wal_buffers; -- WAL buffer size-- View current WAL positionSELECT pg_current_wal_lsn();-- View WAL file infoSELECT * FROM pg_stat_wal;-- Force a checkpoint (flushes all data to disk)CHECKPOINT;-- View checkpoint statsSELECT * FROM pg_stat_bgwriter;
-- Buffer cache hit ratioSELECT sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as cache_hit_ratioFROM pg_statio_user_tables;-- Should be > 0.99 for well-tuned system-- Per-table buffer statsSELECT relname, heap_blks_read, -- Blocks read from disk heap_blks_hit, -- Blocks found in cache round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) as hit_ratioFROM pg_statio_user_tablesORDER BY heap_blks_read DESCLIMIT 20;-- What's in the buffer cache? (requires pg_buffercache extension)CREATE EXTENSION pg_buffercache;SELECT c.relname, count(*) AS buffers, pg_size_pretty(count(*) * 8192) AS size_in_cacheFROM pg_buffercache bJOIN pg_class c ON b.relfilenode = c.relfilenodeWHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())GROUP BY c.relnameORDER BY count(*) DESCLIMIT 20;
-- Setup: Create a test scenarioCREATE TABLE visibility_test (id SERIAL PRIMARY KEY, value TEXT);-- Session 1BEGIN;INSERT INTO visibility_test (value) VALUES ('first');-- Note the xmin-- Session 2 (before Session 1 commits)SELECT ctid, xmin, xmax, * FROM visibility_test;-- What do you see?-- Session 1COMMIT;-- Session 2SELECT ctid, xmin, xmax, * FROM visibility_test;-- What changed?
-- Create and bloat a tableCREATE TABLE bloat_test ASSELECT generate_series(1, 100000) AS id, 'data' AS value;-- Update every row multiple timesUPDATE bloat_test SET value = 'updated1';UPDATE bloat_test SET value = 'updated2';UPDATE bloat_test SET value = 'updated3';-- Check bloatSELECT pg_size_pretty(pg_table_size('bloat_test')) AS table_size, n_live_tup, n_dead_tup, last_vacuumFROM pg_stat_user_tables WHERE relname = 'bloat_test';-- Run vacuum and compareVACUUM bloat_test;-- Check sizes again