Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Module 9: PostgreSQL Architecture
Understanding PostgreSQL’s architecture is essential for debugging production issues, performance tuning, and contributing to the codebase. This module covers the complete system design.Estimated Time: 10-12 hours
Difficulty: Advanced
OSS Relevance: Critical — understanding overall system
Interview Value: Staff/Principal level discussions
Difficulty: Advanced
OSS Relevance: Critical — understanding overall system
Interview Value: Staff/Principal level discussions
9.1 Process Model
Process Details
| Process | Purpose | Key Behavior |
|---|---|---|
| Postmaster | Supervisor | Forks backends, manages startup/shutdown |
| Backend | Client queries | One per connection, runs queries |
| Background Writer | Dirty page flush | Writes dirty buffers to disk gradually |
| Checkpointer | Checkpoints | Periodic full sync of dirty pages |
| WAL Writer | WAL flush | Flushes WAL to disk periodically |
| Autovacuum Launcher | Vacuum scheduling | Spawns autovacuum workers |
| Stats Collector | Statistics | Aggregates activity statistics |
| Archiver | WAL archiving | Archives WAL files (if enabled) |
9.2 Memory Architecture
Memory Configuration Best Practices
9.3 Buffer Pool Management
The buffer pool is the heart of PostgreSQL’s I/O system. It sits between your queries and the disk, caching frequently-accessed 8KB pages in RAM so the database avoids slow disk reads. Real-world analogy: The buffer pool is like the counter at a busy pharmacy. The pharmacist (backend process) doesn’t walk to the warehouse (disk) for every prescription. Instead, the most requested medicines sit on the counter (buffer pool). When a medicine is requested, the pharmacist checks the counter first. If it’s there (a “buffer hit”), they hand it over instantly. If not (a “buffer miss”), they fetch it from the warehouse and place it on the counter for next time — bumping a less-popular item off the limited counter space.Buffer Architecture
Clock-Sweep Replacement
PostgreSQL uses a clock-sweep algorithm (approximating LRU) for buffer replacement.9.4 WAL (Write-Ahead Logging)
WAL ensures durability and crash recovery.WAL Principles
WAL Structure
LSN (Log Sequence Number)
9.5 Checkpoint & Recovery
Checkpoint Process
9.6 Vacuum & Autovacuum
Why VACUUM is Needed
Autovacuum Configuration
9.7 Monitoring Architecture Health
9.8 Interview Deep Dives
Q: Explain PostgreSQL's process model and why it uses forking
Q: Explain PostgreSQL's process model and why it uses forking
Expected Answer:
- PostgreSQL uses multi-process (not multi-threaded) model
- Each client gets dedicated backend process (fork from postmaster)
- Advantages: Process isolation (one crash doesn’t affect others), simpler shared memory model, works well on all UNIX systems
- Disadvantages: Higher per-connection overhead (~10MB), fork can be slow
- Historical: predates pthreads standardization, proven reliability
- Modern mitigations: connection pooling (PgBouncer)
Q: What is WAL and why is it important?
Q: What is WAL and why is it important?
Expected Answer:
- Write-Ahead Logging: all changes logged before applying to data files
- Durability: committed transactions survive crashes (redo from WAL)
- Performance: sequential WAL writes faster than random data page writes
- Replication: streaming replication sends WAL to standbys
- PITR: point-in-time recovery using WAL archives
- Key insight: can lose dirty buffers, but WAL enables full recovery
Q: How does PostgreSQL handle memory management?
Q: How does PostgreSQL handle memory management?
Expected Answer:
- Shared memory (shared_buffers): page cache for all backends, buffer descriptors
- Local memory (work_mem): per-backend for sorts/hashes, careful with connections
- Memory contexts: hierarchical palloc/pfree for easy cleanup
- Sizing: shared_buffers ~25% RAM, work_mem × max_connections < 50% RAM
- Buffer replacement: clock-sweep algorithm (approximate LRU)
9.9 Source Code Reference
Next Module
Module 10: Replication & HA
Build highly available PostgreSQL systems
Interview Deep-Dive (Conversational)
You said shared_buffers should be 25% of RAM. What happens if you set it to 75%? And why is not 'more cache = better' true here?
You said shared_buffers should be 25% of RAM. What happens if you set it to 75%? And why is not 'more cache = better' true here?
Explain the autovacuum subsystem. Why does it exist, what happens when it falls behind, and how do you tune it for a high-write workload?
Explain the autovacuum subsystem. Why does it exist, what happens when it falls behind, and how do you tune it for a high-write workload?
Strong Answer:
- Autovacuum exists because PostgreSQL’s MVCC creates dead tuples on every UPDATE and DELETE. Without cleanup, tables grow without bound, indexes bloat, and eventually the 32-bit transaction ID wraps around, forcing the database to shut down. Autovacuum is not optional — it is a core component of PostgreSQL’s correctness guarantee.
- When it falls behind: the symptoms cascade. Dead tuples accumulate, causing table bloat (sequential scans read more pages), index bloat (index lookups traverse more pages), and eventually the
autovacuum_freeze_max_agethreshold triggers an aggressive “anti-wraparound” vacuum that cannot be cancelled, causing I/O storms. At its worst, the database refuses new transactions withWARNING: database must be vacuumed within X transactions. - Tuning for high-write workloads: (1) Increase
autovacuum_max_workersfrom 3 to 5-6. (2) Reduceautovacuum_vacuum_scale_factorto 0.01 on high-churn tables (vacuum after 1% of rows are dead instead of 20%). (3) Increaseautovacuum_vacuum_cost_limitfrom 200 to 1000-2000 to let vacuum do more work before sleeping. (4) Reduceautovacuum_naptimefrom 1min to 15s for faster reaction. (5) Set per-table settings on the busiest tables:ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.005, autovacuum_vacuum_cost_limit = 2000). - Monitor: watch
n_dead_tupandlast_autovacuumin pg_stat_user_tables. Alert when n_dead_tup exceeds n_live_tup on any table, or when last_autovacuum is more than 1 hour ago on a high-write table.
The checkpointer process just triggered a 'checkpoint too frequent' warning. What is happening and what do you do?
The checkpointer process just triggered a 'checkpoint too frequent' warning. What is happening and what do you do?
Strong Answer:
- This warning means checkpoints are being triggered by WAL volume (max_wal_size) rather than the configured interval (checkpoint_timeout). The database is generating WAL faster than the checkpoint interval can accommodate, so the system forces early checkpoints to prevent running out of WAL space.
- Impact: each checkpoint flushes ALL dirty buffers to disk. Frequent checkpoints mean frequent full I/O flushes, causing latency spikes. They also mean full_page_writes (which happen after each checkpoint) trigger more often, inflating WAL volume further in a vicious cycle.
- Fix: increase
max_wal_sizeto accommodate the WAL volume generated within one checkpoint_timeout interval. Calculate: checkpg_stat_bgwriter.checkpoints_req(checkpoints triggered by WAL size) vscheckpoints_timed(triggered by timeout). Ifcheckpoints_reqdominates, double max_wal_size and re-evaluate. A typical production setting for write-heavy workloads is max_wal_size = 4GB-16GB with checkpoint_timeout = 15min. - Also ensure
checkpoint_completion_target = 0.9so checkpoint I/O is spread over 90% of the interval rather than hitting all at once. And verify that your storage can handle the checkpoint write burst — if underlying disk I/O is the bottleneck, no amount of tuning will help.