Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Module 8: Query Processing Pipeline
Understanding how PostgreSQL processes queries is essential for optimization and OSS contribution. This module traces the complete journey from SQL text to result set. Real-world analogy: The query processing pipeline is like a restaurant kitchen. Your SQL query is the customer’s order. The parser is the waiter who writes down the order in a structured format. The analyzer checks that the dishes exist on the menu and resolves any ambiguity (“the special” becomes “grilled salmon”). The rewriter applies house rules (“all burgers come with fries” is like view expansion). The planner is the head chef deciding the most efficient sequence to prepare the meal. And the executor is the kitchen staff actually cooking and plating the dishes.Difficulty: Advanced
OSS Relevance: High — core source code areas
Interview Value: Senior/Staff level deep dives
8.1 Query Processing Overview
8.2 The Parser
Lexical Analysis (Lexer)
The lexer (scan.l) breaks SQL text into tokens.
Syntax Analysis (Parser)
The parser (gram.y) builds a raw parse tree from tokens.
Parse Tree Visualization
8.3 The Analyzer
The analyzer performs semantic analysis — converting the raw parse tree to a Query tree with resolved references. While the parser only checks that the SQL is grammatically valid (correct syntax), the analyzer checks that it actually makes sense: Do the referenced tables exist? Do the columns belong to those tables? Are the data types compatible? Practical tip: Most SQL errors that developers encounter in daily work are caught at the analyzer stage, not the parser. “Column not found,” “operator does not exist,” and “ambiguous column reference” are all analyzer errors. Understanding this distinction helps when reading PostgreSQL error messages — parser errors mean broken syntax, analyzer errors mean semantically invalid (but syntactically correct) SQL.Key Tasks
- Name Resolution
- Type Checking
- Function Resolution
Query Tree Structure
8.4 The Rewriter
The rewriter transforms query trees based on rules — primarily for view expansion. This is a relatively thin stage for most queries, but it is architecturally important because it is the mechanism that makes views work transparently. Practical tip: The rewriter is also the reason that a query against a view and the equivalent query against the base table produce identical execution plans. There is no performance penalty for using views — by the time the planner sees the query, the view has already been inlined. The exception is materialized views, which skip the rewriter entirely and read from their own stored table.View Expansion
Rule System
8.5 The Planner/Optimizer
The most complex component — generates an efficient execution plan.CBO vs. RBO: The Architectural Shift
Modern databases primarily use Cost-Based Optimization (CBO), but understanding Rule-Based Optimization (RBO) is essential for legacy systems and specific query rewrites.| Feature | Rule-Based (RBO) | Cost-Based (CBO) |
|---|---|---|
| Logic | Fixed hierarchy of rules (e.g., “Always use Index if available”) | Evaluates multiple paths based on statistics |
| Data Awareness | No knowledge of data distribution or size | Deeply aware of row counts, histograms, and I/O costs |
| Flexibility | Predictable but often suboptimal | Adaptive and finds complex optimizations |
| Performance | Fast planning, potentially slow execution | Slower planning (search space), faster execution |
ANALYZE) to calculate the “cost” of various execution strategies.
The Three Stages of Planning
-
Preprocessing (Simplification):
- Expression Simplification: Constant folding (e.g.,
2+2becomes4). - Subquery Flattening: Converting
IN (SELECT ...)to semi-joins or inner joins when possible. - Predicate Pushdown: Moving filters as deep as possible into the plan tree.
- Expression Simplification: Constant folding (e.g.,
-
Path Generation & Join Ordering:
- The planner generates Paths (access methods like SeqScan, IndexScan, BitmapScan).
- It uses Dynamic Programming to find the cheapest join order for up to
geqo_thresholdtables (default 12). - Genetic Query Optimizer (GEQO): For complex queries with many joins, PostgreSQL switches to a heuristic (genetic) algorithm to prevent the search space from exploding factorially ().
-
Costing & Selection:
- Every path is assigned a cost based on:
seq_page_cost: Cost of a sequential page read.random_page_cost: Cost of a random page read (critical for HDD vs SSD tuning).cpu_tuple_cost: Processing overhead per row.
- The cheapest path is converted into a Plan Tree.
- Every path is assigned a cost based on:
Join Planning Internals
Cost Model Tuning
Path vs Plan
Real-world analogy: Think of Paths as flight itinerary options and Plans as your boarding pass. When you search for flights from NYC to Tokyo, the system generates dozens of Paths: direct flights, one-stop via LAX, two-stop via London and Singapore. Each Path has an estimated cost (price + time). You compare them and pick the cheapest. That chosen Path is then converted into a Plan — the actual boarding pass with gate numbers, seat assignments, and departure times that the airline will execute.8.6 The Executor
The executor runs the plan tree, producing result tuples. Real-world analogy: The Volcano model works like a vending machine chain. The top node (say, a Sort) asks the node below it (a Filter) for the next item. The Filter in turn asks the node below it (a SeqScan) for the next item. Each node only does work when asked by the node above it. This “demand-driven” or “pull-based” approach means that a LIMIT 10 query can stop the entire pipeline after 10 rows — the lower nodes never even produce the remaining rows. It is inherently lazy, and that laziness is a major performance advantage. Practical tip: The Volcano model also explains why Sort and HashAggregate nodes are “pipeline breakers” — they cannot emit their first output tuple until they have consumed all input tuples. A Sort must see every row before it can tell you which one comes first. When you see high startup cost in EXPLAIN output, it usually indicates a pipeline breaker. This is why adding ORDER BY to a query that already has LIMIT can dramatically change performance characteristics.Execution Model
PostgreSQL uses iterator (volcano) model — each node is an iterator.Pull-Based Execution
Plan Node Types
| Node Type | Description | Example |
|---|---|---|
| SeqScan | Full table scan | Seq Scan on users |
| IndexScan | Scan using index | Index Scan using users_pkey |
| BitmapHeapScan | Bitmap index scan | Multiple indexes combined |
| NestLoop | Nested loop join | Small outer, any inner |
| HashJoin | Hash join | Equality joins |
| MergeJoin | Merge join | Pre-sorted inputs |
| Sort | Sort tuples | ORDER BY |
| Aggregate | Aggregation | GROUP BY, COUNT, SUM |
| Limit | Limit rows | LIMIT/OFFSET |
8.7 Expression Evaluation & JIT
Expression Evaluation
JIT Compilation
PostgreSQL can JIT-compile expressions using LLVM for better performance. Instead of interpreting the expression tree node by node at runtime, JIT converts frequently evaluated expressions into native machine code — the same optimization that makes Java’s HotSpot or JavaScript’s V8 fast for hot loops. Practical tip: JIT is not free. The compilation step itself takes time (typically 10-100ms), so it only pays off for queries that process many rows through complex expressions. This is why PostgreSQL gates JIT behind cost thresholds — for a simpleSELECT * FROM users WHERE id = 5, JIT compilation would take longer than just running the query. The default thresholds are tuned for OLAP workloads; for pure OLTP, you can safely leave JIT at its default settings or even disable it entirely with SET jit = off to avoid the overhead.
8.8 Practical: Trace Query Execution
Using Debug Output
Practical tip: These debug settings produce enormous amounts of output. Never enable them on a production server — they are strictly for development builds. On a local development instance, they are invaluable for understanding exactly how much time the parser, planner, and executor each consume for a given query.Examining Parse Trees
Interview Questions
Q: What happens when you run SELECT * FROM users?
Q: What happens when you run SELECT * FROM users?
- Parser: Tokenizes SQL, validates syntax, creates raw parse tree
- Analyzer: Resolves
usersto table OID,*to column list, type checks - Rewriter: Expands if
usersis a view - Planner: Considers SeqScan vs IndexScan, estimates costs, picks cheapest
- Executor: Initializes SeqScan node, iterates through heap pages, returns tuples
Q: Why might the planner choose a sequential scan over an index scan?
Q: Why might the planner choose a sequential scan over an index scan?
- Table is small (index overhead not worth it)
- Query returns large % of rows (random I/O worse than sequential)
- Statistics are stale (run ANALYZE)
- Index isn’t usable (function on column, wrong operator class)
enable_indexscan = off(diagnostic only!)
Q: Explain the difference between Path and Plan in PostgreSQL
Q: Explain the difference between Path and Plan in PostgreSQL
- Path: Represents a possible access strategy with estimated cost
- Multiple paths generated for each relation (seq scan, index scans, etc.)
- Paths are compared by cost to find the cheapest
- Plan: The chosen path converted to executable form
- Plan has actual operator implementation details
- Only one plan is executed
8.9 OSS Contribution: Source Code Areas
Key Files to Study
Next Module
Module 9: PostgreSQL Architecture
Interview Deep-Dive
The PostgreSQL planner chose a Nested Loop Join for a query that should obviously use a Hash Join. What went wrong and how do you investigate?
The PostgreSQL planner chose a Nested Loop Join for a query that should obviously use a Hash Join. What went wrong and how do you investigate?
- The most likely cause is a cardinality mis-estimate. The planner chose Nested Loop because it estimated the outer side would return a small number of rows (say 10), making Nested Loop efficient — O(10 * log m) with an index on the inner side. But the actual outer side returned 100,000 rows, turning the Nested Loop into O(100,000 * log m) when a Hash Join at O(n + m) would have been far cheaper.
- Investigation: run
EXPLAIN ANALYZEand compareestimated rowsvsactual rowsat every node. The problematic node is where the estimate diverges from reality. Common causes: stale statistics (run ANALYZE), correlated columns that the planner assumes are independent (use extended statistics:CREATE STATISTICS), skewed data distribution, or a function call that prevents the planner from estimating selectivity. - Fix hierarchy: (1)
ANALYZEthe table. (2) Increasedefault_statistics_targetor per-column targets. (3) Create extended statistics for correlated columns. (4) As a last resort, useSET enable_nestloop = offfor the session, but document why and file a ticket for a permanent fix.
WHERE city = 'NYC' AND zip_code = '10001' because it multiplies the individual selectivities. CREATE STATISTICS stats_city_zip (dependencies, ndistinct) ON city, zip_code FROM addresses tells the planner about this correlation, fixing the estimate.How does PostgreSQL's cost-based optimizer estimate the cost of an Index Scan vs a Sequential Scan, and when does it get the decision wrong?
How does PostgreSQL's cost-based optimizer estimate the cost of an Index Scan vs a Sequential Scan, and when does it get the decision wrong?
- The planner uses a cost model with several key parameters:
seq_page_cost(cost of reading one page sequentially, default 1.0),random_page_cost(cost of reading a random page, default 4.0),cpu_tuple_cost(CPU cost of processing one tuple), andcpu_index_tuple_cost(CPU cost of processing one index entry). - For a Sequential Scan: cost = number_of_pages * seq_page_cost + number_of_rows * cpu_tuple_cost. It reads every page sequentially and checks every row against the filter.
- For an Index Scan: cost = index_pages * random_page_cost + matching_index_entries * cpu_index_tuple_cost + matching_rows * random_page_cost (for heap fetches) + matching_rows * cpu_tuple_cost. The random_page_cost reflects that index and heap fetches are random I/O.
- The decision goes wrong in two common scenarios: (1) On SSDs, random reads are nearly as fast as sequential reads, but the default
random_page_cost = 4.0was calibrated for spinning disks. The planner overestimates index scan cost and chooses Seq Scan. Fix: setrandom_page_cost = 1.1for SSD. (2) When most of the table is cached in shared_buffers or OS cache, disk I/O cost is irrelevant but the planner does not know what is cached.effective_cache_sizeis a hint but imprecise.
WHERE status = 'active', it checks if ‘active’ appears in the MCV list and uses its frequency directly. For values not in the MCV list, it assumes uniform distribution among the remaining fraction. For range predicates, it interpolates within histogram buckets. This is why ANALYZE is critical — without fresh statistics, the planner is estimating based on stale or default assumptions (default selectivity for unknown predicates is typically 0.5% for equality, which can be wildly wrong).Explain JIT compilation in PostgreSQL. When does it help, when does it hurt, and how do you control it?
Explain JIT compilation in PostgreSQL. When does it help, when does it hurt, and how do you control it?
- JIT (Just-In-Time compilation, added in PostgreSQL 11) uses LLVM to compile query expressions and tuple deforming into native machine code at runtime. Instead of interpreting expression evaluation (walk a tree of function pointers for each tuple), the JIT compiler generates a specialized function that evaluates the entire expression in a tight loop.
- When it helps: analytical queries that process millions of rows through complex expressions, aggregations, or WHERE clauses. The per-tuple overhead reduction compounds over millions of iterations. A SUM over 100M rows with a complex CASE expression can see 2-5x speedup.
- When it hurts: short OLTP queries where the JIT compilation time (typically 5-50ms) exceeds the total execution time. If your query runs in 2ms, spending 20ms compiling it is a net loss. Also, the first execution of a prepared statement pays the JIT cost, creating a latency spike.
- Control:
jit = on/off(global toggle),jit_above_cost = 100000(only JIT queries with estimated cost above this threshold),jit_inline_above_cost = 500000(enable function inlining above this cost),jit_optimize_above_cost = 500000(enable LLVM optimizations above this cost). For OLTP workloads, I setjit_above_costhigh (500000+) to ensure only truly expensive analytical queries trigger JIT. For dedicated analytics instances, lower thresholds are appropriate.
SET LOCAL jit = off; within a transaction disables JIT for that transaction only. Or use SET jit_above_cost = 1000000000; to effectively disable it by setting an unreachably high threshold. This is the approach I use for OLTP connection pools where individual query latency matters more than throughput.