Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Module 5: Query Optimization
Every developer eventually faces the dreaded slow query. This module teaches you a systematic approach to diagnose performance issues and apply the right fixes. The most important mindset shift in query optimization: Stop guessing. The difference between a junior and a senior engineer facing a slow query is that the junior starts adding indexes, while the senior runsEXPLAIN ANALYZE first. Over 80% of query performance problems become obvious once you read the execution plan. The plan tells you exactly where time is being spent — you just need to learn the language it speaks.
Estimated Time: 10-12 hours
Hands-On: Real-world slow query analysis
Key Skill: Reading execution plans and applying targeted optimizations
Hands-On: Real-world slow query analysis
Key Skill: Reading execution plans and applying targeted optimizations
5.1 The Query Optimization Process
Follow this systematic approach for any slow query:5.2 Finding Slow Queries
Using pg_stat_statements
Query Logging
5.3 Reading Execution Plans Like a Pro
EXPLAIN Options
EXPLAIN (ANALYZE, BUFFERS) rather than plain EXPLAIN ANALYZE. The BUFFERS output is what separates a superficial analysis from a real diagnosis. Without it, you know a node took 50ms, but you do not know whether that was 50ms of CPU computation or 50ms of waiting for disk reads.
Anatomy of a Plan
Common Plan Nodes
Seq Scan
Seq Scan
Reads entire table, checking each row against filter.When it’s OK: Small tables, selecting most rows
When it’s bad: Large tables, selecting few rows → Add index
Index Scan
Index Scan
Uses index to find rows, then fetches from table.Good for: Selective queries (small % of rows)
Watch for: High
Buffers: read means data not cachedIndex Only Scan
Index Only Scan
All data comes from the index itself — no heap (table) access needed. This is the fastest scan type because it avoids the random I/O of fetching rows from the table.Best case scenario!
Requires: A covering index (includes all columns the query needs) AND a recently vacuumed table (so the visibility map is up-to-date).Common pitfall: If
Heap Fetches is close to the row count, you are losing most of the benefit of an index-only scan. This usually means VACUUM has not run recently enough. Check pg_stat_user_tables.n_dead_tup for the table.
Watch for: Heap Fetches > 0 means some table access neededBitmap Scan
Bitmap Scan
Two-phase: build bitmap from index, then fetch matching rows.Good for: Medium selectivity, OR conditions
Better than: Multiple Index Scans combined
Nested Loop Join
Nested Loop Join
For each row in outer, scan inner for matches.Best for: Small outer result, indexed inner
Beware: O(n × m) if both sides are large!
Hash Join
Hash Join
Build hash table from smaller side, probe with larger.Best for: No index on join column, larger result sets
Watch for:
Batches > 1 means spilling to diskMerge Join
Merge Join
Sort both sides, merge like a zipper.Best for: Both sides already sorted (or have sorted index)
Cost: O(n log n + m log m + n + m)
5.4 Common Performance Problems
Problem 1: Missing Index
Problem 2: Poor Statistics
Problem 3: Inefficient Join Order
Problem 4: Memory Spilling to Disk
Problem 5: N+1 Queries
5.5 Query Rewriting Techniques
Simplifying Subqueries
EXISTS vs IN vs JOIN
Pagination Optimization
Aggregation Optimization
5.6 Configuration Tuning
Key Performance Parameters
Dynamic Configuration
5.7 Query Plan Caching
Understanding Prepared Statements
Plan Instability
5.8 Practice: Optimize These Queries
Query 1: Dashboard Stats
Optimized Solution
Optimized Solution
Query 2: Product Search
Optimized Solution
Optimized Solution
Next Module
Module 6: PostgreSQL Internals
Understand how PostgreSQL works under the hood
Interview Deep-Dive
A query that was running in 50ms yesterday is now taking 30 seconds. Nothing in the application changed. Walk me through your diagnosis.
A query that was running in 50ms yesterday is now taking 30 seconds. Nothing in the application changed. Walk me through your diagnosis.
Strong Answer:
- This is a classic plan regression scenario. The query itself did not change, but something the planner depends on did. My diagnosis checklist, in order of likelihood:
- (1) Statistics changed. Run
EXPLAIN ANALYZEand compare estimated rows vs. actual rows. If the estimates are wildly off (e.g., estimated 100 rows, actual 500,000), the planner chose a nested loop where it should have chosen a hash join, or an index scan where it should have chosen a sequential scan. Fix: runANALYZEon the affected tables. If a specific column has skewed distribution, increase its statistics target:ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000; ANALYZE orders; - (2) An index was dropped, became invalid, or bloated. Check
pg_stat_user_indexesfor the relevant indexes. Check if a concurrent REINDEX failed partway, leaving an invalid index. Check index size vs. table size ratio — if the index is larger than the table, it is severely bloated and needs REINDEX CONCURRENTLY. - (3) Table bloat. If autovacuum has been struggling (check
n_dead_tupinpg_stat_user_tables), the table may have doubled in size with dead tuples. Sequential scans now read 2x the data. The planner’s cost estimates may also be wrong becausepg_class.relpagesreflects the bloated size. - (4) Memory pressure. If another workload consumed shared_buffers or the OS page cache was evicted (perhaps a backup ran), the same query now hits disk instead of cache. Check
EXPLAIN (ANALYZE, BUFFERS)— ifBuffers: readis high where it was previously allshared hit, this is the cause. - (5) Lock contention. The query itself is fast, but it is waiting for a lock held by another transaction. Check
pg_stat_activityforwait_event_type = 'Lock'.
plan_cache_mode = force_custom_plan for queries that are sensitive to parameter values. (2) Monitor plan changes using pg_stat_statements — track the mean_exec_time over time and alert when it jumps by more than 3x. (3) For the most critical queries, maintain a test suite that runs EXPLAIN on production-like data volumes and asserts on the plan shape (e.g., “this query must use Index Scan, not Seq Scan”). This catches regressions before deployment.Explain the difference between a Hash Join, Merge Join, and Nested Loop Join. When does the planner choose each, and when does it get it wrong?
Explain the difference between a Hash Join, Merge Join, and Nested Loop Join. When does the planner choose each, and when does it get it wrong?
Strong Answer:
- Nested Loop: For each row in the outer relation, scan the inner relation for matches. O(n * m) in the worst case, but with an index on the inner relation’s join column, the inner scan is O(log m) per row, making it O(n * log m). The planner chooses this when the outer relation is small (a few hundred rows or fewer) and the inner relation has a good index. It gets it wrong when the planner underestimates the outer relation’s cardinality — estimating 100 rows but getting 100,000 turns a fast plan into a catastrophe.
- Hash Join: Build a hash table from the smaller relation (the “build” side), then probe it with each row from the larger relation (the “probe” side). O(n + m) with O(min(n,m)) memory. The planner chooses this for equi-joins when neither side has a useful sort order and at least one side fits in work_mem. Watch for
Batches: 4in EXPLAIN — this means the hash table did not fit in work_mem and spilled to disk, dramatically slowing the join. Fix: increase work_mem for the session, or add an index to enable a nested loop instead. - Merge Join: Sort both relations on the join key, then merge them like a zipper. O(n log n + m log m) for the sorts, then O(n + m) for the merge. The planner chooses this when both sides are already sorted (from an index scan or a preceding sort) or when the result itself needs to be sorted. Merge joins are rare in OLTP but common in analytics and data warehouse queries with large sorted result sets.
- The planner gets it wrong most often on nested loops: a cardinality underestimate on the outer side turns what should be a hash join into a devastating nested loop. If you see a nested loop with
loops=500000in the inner child, that is almost certainly a mis-estimate. Run ANALYZE, increase statistics targets on the relevant columns, or in extreme cases useSET enable_nestloop = offfor that specific query session to force the planner away from the bad plan.
SET enable_hashjoin = off, SET enable_mergejoin = off, etc. This is a blunt instrument — I would only use it as a temporary workaround with a comment explaining why and a ticket to fix the underlying statistics issue. A better approach is pg_hint_plan extension which lets you annotate individual queries with hints like /*+ HashJoin(orders users) */ without affecting the global planner behavior. The principle: fix the inputs (statistics, indexes) rather than overriding the outputs (plan choices).How do you optimize a query that aggregates data across 500 million rows for a real-time dashboard?
How do you optimize a query that aggregates data across 500 million rows for a real-time dashboard?
Strong Answer:
- You do not. Real-time aggregation over 500M rows is fundamentally incompatible with sub-second response times on a single PostgreSQL instance. The correct approach is pre-aggregation at an appropriate granularity.
- Strategy 1: Materialized views with periodic refresh. Create a materialized view that pre-computes daily/hourly aggregates. Refresh it every N minutes with
REFRESH MATERIALIZED VIEW CONCURRENTLY. The dashboard queries the materialized view (which has thousands of rows, not millions) and responds in milliseconds. The tradeoff is data staleness bounded by the refresh interval. - Strategy 2: Summary tables maintained by triggers. For true real-time needs, maintain a summary table that is updated incrementally. An AFTER INSERT trigger on the events table increments the counter in the summary table. The dashboard reads the summary table. The tradeoff is write overhead (every insert triggers a summary update) and complexity in handling UPDATE/DELETE on the source table.
- Strategy 3: BRIN-indexed partitioned table with partition pruning. If the dashboard always filters by time range (e.g., last 30 days), partition the table by month. The query only scans 1-2 partitions instead of the full 500M rows. Combined with a BRIN index on the timestamp, this can reduce the scan to a manageable subset.
- Strategy 4: External OLAP engine. For truly complex real-time analytics, replicate the data to ClickHouse, Apache Druid, or TimescaleDB which are purpose-built for analytical aggregations. PostgreSQL’s logical replication can feed these systems with minimal lag.