Query Execution Plans Mastery

Module Duration: 8-10 hours
Difficulty: Intermediate to Advanced
Hands-On: 20+ real-world query analysis exercises
Outcome: Read execution plans like a database internals engineer

Why Execution Plans Matter

The Core Problem: Your query is slow. But WHY?

-- This query takes 5 seconds. What's wrong?
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC;

-- Without execution plans: Pure guesswork
-- - Add an index? Which column?
-- - Rewrite the query? How?
-- - Increase memory? By how much?

-- With execution plans: Surgical precision
-- - See exactly what's slow (Seq Scan on orders: 4.8s)
-- - Know the exact fix (Index on orders.user_id)
-- - Verify improvement (Now 50ms)

What You’ll Master:

Read and interpret every node type (Seq Scan, Index Scan, Hash Join, etc.)
Identify performance bottlenecks instantly
Understand cost calculations and row estimates
Optimize queries systematically, not randomly
Compare execution across PostgreSQL, MySQL, SQL Server

This is not just EXPLAIN syntax. This is learning to think like the query planner, predict its decisions, and guide it to the fastest execution path.

Part 1: Execution Plan Fundamentals

What is an Execution Plan?

An execution plan is the step-by-step recipe the database uses to execute your query.

Analogy: Cooking a Meal

Recipe (Execution Plan):
1. Boil water (Table Scan)
2. Chop vegetables (Filter)
3. Combine ingredients (Join)
4. Sort portions (Sort)
5. Serve on plates (Return Results)

Just like:
- Different chefs (planners) might prepare differently
- Some methods are faster (index vs scan)
- Order matters (join order affects speed)
- Resources matter (memory = better performance)

The Two Commands: EXPLAIN vs EXPLAIN ANALYZE

-- EXPLAIN: Shows the PLAN (doesn't execute)
EXPLAIN
SELECT * FROM users WHERE email = 'alice@example.com';

-- Output:
Index Scan using users_email_idx on users  (cost=0.42..8.44 rows=1 width=40)
  Index Cond: (email = 'alice@example.com'::text)

-- EXPLAIN ANALYZE: Shows PLAN + ACTUAL EXECUTION
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alice@example.com';

-- Output:
Index Scan using users_email_idx on users  (cost=0.42..8.44 rows=1 width=40)
                                            (actual time=0.025..0.027 rows=1 loops=1)
  Index Cond: (email = 'alice@example.com'::text)
Planning Time: 0.123 ms
Execution Time: 0.052 ms

Key Differences:

Aspect	EXPLAIN	EXPLAIN ANALYZE
Executes query?	❌ No	✅ Yes
Shows actual times?	❌ No	✅ Yes
Shows actual rows?	❌ No	✅ Yes
Safe for production?	✅ Yes	⚠️ Caution (runs query!)
Use when	Checking plan	Diagnosing slow query

EXPLAIN ANALYZE actually runs the query! For writes (INSERT/UPDATE/DELETE), wrap it in a transaction and roll back:

BEGIN;
EXPLAIN ANALYZE DELETE FROM huge_table WHERE ...;
ROLLBACK;  -- Prevent actual deletion

Understanding the Output Format

EXPLAIN (FORMAT TEXT)  -- Default, human-readable tree
EXPLAIN (FORMAT JSON)  -- Programmatic parsing
EXPLAIN (FORMAT YAML)  -- Structured alternative
EXPLAIN (FORMAT XML)   -- For tool integration
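
FORMAT JSON is the easiest form to process in a script. The sketch below walks a hand-written sample shaped like PostgreSQL's JSON output (a list with one object holding a "Plan" key, child nodes under "Plans") and lists each node with its total cost. The sample plan and the helper name `walk_plan` are illustrative assumptions, not output captured from a real database.

```python
import json

# Hand-written sample shaped like EXPLAIN (FORMAT JSON) output.
SAMPLE = """
[
  {
    "Plan": {
      "Node Type": "Nested Loop",
      "Total Cost": 1000.0,
      "Plan Rows": 100,
      "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "users",
         "Total Cost": 100.0, "Plan Rows": 10},
        {"Node Type": "Index Scan", "Relation Name": "orders",
         "Total Cost": 80.0, "Plan Rows": 10}
      ]
    }
  }
]
"""

def walk_plan(node, depth=0):
    """Yield (depth, node type, relation, total cost) for every plan node."""
    yield depth, node["Node Type"], node.get("Relation Name"), node["Total Cost"]
    for child in node.get("Plans", []):
        yield from walk_plan(child, depth + 1)

root = json.loads(SAMPLE)[0]["Plan"]
nodes = list(walk_plan(root))
for depth, node_type, relation, cost in nodes:
    target = f" on {relation}" if relation else ""
    print(f"{'  ' * depth}{node_type}{target} (total cost {cost})")
```

The same traversal works for any depth of nesting, which makes it a reasonable starting point for scripts that flag expensive nodes.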

We’ll use TEXT format for learning (most common):

Nested Loop  (cost=0.00..1000.00 rows=100 width=50)
  ->  Seq Scan on users  (cost=0.00..100.00 rows=10 width=30)
        Filter: (created_at > '2024-01-01')
  ->  Index Scan on orders  (cost=0.42..80.00 rows=10 width=20)
        Index Cond: (user_id = users.id)

Reading the Tree:

- Top-level node (Nested Loop) finishes LAST; it returns the final result
- Indented nodes (children) feed rows to their parent, so they start producing FIRST
- Read the plan bottom-to-top to follow the data flow
- Left-to-right for siblings at same level
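
To make the reading order concrete, here is a minimal sketch that takes the TEXT plan above and lists the nodes in the order they complete (children before parents, siblings top to bottom). The helper names `parse_nodes` and `completion_order` are hypothetical, and the parsing assumes the simple layout shown here, where every child line contains `->`.

```python
PLAN_TEXT = """\
Nested Loop  (cost=0.00..1000.00 rows=100 width=50)
  ->  Seq Scan on users  (cost=0.00..100.00 rows=10 width=30)
        Filter: (created_at > '2024-01-01')
  ->  Index Scan on orders  (cost=0.42..80.00 rows=10 width=20)
        Index Cond: (user_id = users.id)
"""

def parse_nodes(plan_text):
    """Return (indent, name) for each node line; detail lines are skipped."""
    nodes = []
    for i, line in enumerate(plan_text.splitlines()):
        arrow = line.find("->")
        if i == 0:
            indent, rest = -1, line          # the root has no arrow
        elif arrow >= 0:
            indent, rest = arrow, line[arrow + 2:]
        else:
            continue                         # Filter:, Index Cond:, etc.
        nodes.append((indent, rest.split("  (")[0].strip()))
    return nodes

def completion_order(plan_text):
    """List node names in the order they finish: children before parents."""
    order, stack = [], []
    for indent, name in parse_nodes(plan_text):
        # A node at the same or shallower indent closes every deeper node.
        while stack and stack[-1][0] >= indent:
            order.append(stack.pop()[1])
        stack.append((indent, name))
    while stack:
        order.append(stack.pop()[1])
    return order

print(completion_order(PLAN_TEXT))
```

For this plan the two scans are listed before the Nested Loop that consumes them.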

Part 2: Decoding the Numbers

Cost Explained

Index Scan using users_pkey on users  (cost=0.42..8.44 rows=1 width=40)
                                            ^^^^    ^^^^
                                        startup cost  total cost

What is “cost”?

Not real time (not milliseconds!)
Abstract units based on I/O operations
Default: 1 sequential page read = 1.0 cost
Used to compare execution paths

Cost Components:

# PostgreSQL cost calculation (simplified)
cost = (pages_read * seq_page_cost) +
       (index_pages * random_page_cost) +
       (tuples_processed * cpu_tuple_cost) +
       (operators_applied * cpu_operator_cost)

# Default values (postgresql.conf):
seq_page_cost = 1.0          # Sequential disk read
random_page_cost = 4.0       # Random disk read (4x worse!)
cpu_tuple_cost = 0.01        # Process one row
cpu_operator_cost = 0.0025   # Apply one filter/operator

Example Calculation:

-- Table: 10,000 rows, 1,000 pages
-- Query: Sequential scan with filter

Seq Scan cost calculation:
= (1000 pages * 1.0 seq_page_cost) +     -- Read all pages
  (10000 rows * 0.01 cpu_tuple_cost) +   -- Process all rows
  (10000 * 0.0025 cpu_operator_cost)     -- Apply filter
= 1000 + 100 + 25
= 1125.00
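
The arithmetic above can be checked with a few lines of Python. This is a sketch of the simplified formula only, using the default cost constants listed earlier; the function name `seq_scan_cost` and its parameters are illustrative, and the real planner accounts for more factors.

```python
# Planner cost constants at their PostgreSQL defaults.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, rows, filter_operators=1):
    """Simplified total cost of a sequential scan with a filter."""
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + filter_operators * CPU_OPERATOR_COST)
    return io_cost + cpu_cost

# Table from the example: 1,000 pages, 10,000 rows, one filter operator.
print(f"{seq_scan_cost(pages=1000, rows=10000):.2f}")
```

Changing `filter_operators` shows how each extra condition in the WHERE clause adds CPU cost per row while the I/O term stays fixed.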

Startup Cost vs Total Cost

Sort  (cost=1000.00..1050.00 rows=100 width=50)
           ^^^^^^^^^ ^^^^^^^^
           startup   total

Startup Cost: Work before first row returned

Useful for queries with LIMIT
Matters for interactive applications

Total Cost: Complete execution

What you care about for batch jobs

Example:

-- Sort has high startup (must collect all rows first)
EXPLAIN
SELECT * FROM users ORDER BY created_at LIMIT 10;

Sort  (cost=5000.00..5500.00 ...)  -- High startup
  ->  Seq Scan on users (cost=0.00..100.00 ...)

-- vs Index Scan (low startup, returns rows immediately)
EXPLAIN
SELECT * FROM users WHERE id = 5 LIMIT 10;

Index Scan (cost=0.42..8.44 ...)  -- Low startup, immediate results

Row Estimates vs Actual Rows

Seq Scan on orders  (cost=0.00..1000.00 rows=50 width=100)
                                        ^^^^
                                    ESTIMATED rows

EXPLAIN ANALYZE output:
Seq Scan on orders  (cost=0.00..1000.00 rows=50 width=100)
                    (actual time=0.1..10.5 rows=5000 loops=1)
                                           ^^^^^^^^^
                                        ACTUAL rows (100x off!)
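
A quick way to spot misestimates is to compare the two `rows=` values of a node. The sketch below does that for one node of EXPLAIN ANALYZE text; `misestimate_factor` is a hypothetical helper, and it assumes the first `rows=` is the estimate and the second is the actual count (which PostgreSQL reports as a per-loop average).

```python
import re

def misestimate_factor(node_text):
    """Return actual_rows / estimated_rows for one EXPLAIN ANALYZE node."""
    estimated, actual = (int(n) for n in re.findall(r"rows=(\d+)", node_text)[:2])
    return actual / max(estimated, 1)  # guard against a zero estimate

node = """Seq Scan on orders  (cost=0.00..1000.00 rows=50 width=100)
                    (actual time=0.1..10.5 rows=5000 loops=1)"""

print(misestimate_factor(node))
```

Factors far from 1 in either direction are worth investigating, since underestimates and overestimates can both push the planner toward a poor join method.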

Why Row Estimates Matter:

Bad estimate (planner thinks 50 rows, actually 5000):
  ↓
Chooses wrong join method (Nested Loop instead of Hash Join)
  ↓
Slow query (10 seconds instead of 100ms)

Common Causes of Bad Estimates:

Outdated statistics: Run ANALYZE
Correlated columns: Planner assumes independence
Complex WHERE conditions: Planner guesses conservatively
Functions in WHERE: Planner has no statistics for expressions such as lower(email) and falls back to a default selectivity guess

Fixing Bad Estimates:

-- 1. Update statistics
ANALYZE users;

-- 2. Increase statistics target for important columns
ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
ANALYZE users;

-- 3. Create expression index for function-based queries
--    (ANALYZE then collects statistics on the indexed expression)
CREATE INDEX users_lower_email_idx ON users(lower(email));
ANALYZE users;

-- 4. Use extended statistics for correlated columns
CREATE STATISTICS users_city_state_stats (dependencies)
ON city, state FROM users;
ANALYZE users;

Width

Seq Scan on users  (cost=0.00..100.00 rows=1000 width=40)
                                                 ^^^^^^^^
                                              Average row size in bytes

What it means:

Estimated average bytes per row output by this node
Used for memory calculations (work_mem, hash tables)
Includes only the columns the node outputs (for a simple query, the SELECT list)

Example:

-- All columns
SELECT * FROM users;  -- width=200 (name + email + address + ...)

-- Specific columns
SELECT id, name FROM users;  -- width=40 (just id + name)

Part 3: Scan Methods Deep Dive

Sequential Scan (Seq Scan)

When Used: Reading most/all rows from table.

EXPLAIN ANALYZE
SELECT * FROM users WHERE created_at > '2020-01-01';
-- Returns 95% of rows

Seq Scan on users  (cost=0.00..1000.00 rows=9500 width=100)
                   (actual time=0.05..15.2 rows=9500 loops=1)
  Filter: (created_at > '2020-01-01'::date)
  Rows Removed by Filter: 500

How It Works:

1. Read table pages sequentially from disk
2. For each page:
   - Load into buffer pool
   - Scan all tuples
   - Apply filter
   - Return matching rows
3. Continue until end of table

Performance Characteristics:

Advantages:
✅ Efficient for reading large % of table (>5-10%)
✅ Utilizes disk sequential read (fast)
✅ No index overhead

Disadvantages:
❌ Reads entire table (wasteful if few matches)
❌ Slow for selective queries

Example - When Seq Scan is Optimal:

-- Table: 1 million rows
-- Query returns 500k rows (50%)

-- Seq Scan: Read 1M rows, return 500k
Cost: 10,000 (sequential page reads)

-- Index Scan alternative: Access 500k rows via index
Cost: 500,000 * 4.0 (random page reads) = 2,000,000
      ↑ 200x worse!

Planner correctly chooses Seq Scan
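
The comparison above can be reproduced with a deliberately naive model: a sequential scan pays one sequential page read per table page, and an index scan pays one random page read per matching row. Both functions below are simplifying assumptions for illustration (the real planner also models caching, correlation, and CPU cost).

```python
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0

def naive_seq_scan_cost(table_pages):
    """Every page of the table is read once, sequentially."""
    return table_pages * SEQ_PAGE_COST

def naive_index_scan_cost(matching_rows):
    """Pessimistic: every matching row costs one random page read."""
    return matching_rows * RANDOM_PAGE_COST

def cheaper_scan(table_pages, matching_rows):
    """Return the cheaper method and its cost under this naive model."""
    seq = naive_seq_scan_cost(table_pages)
    idx = naive_index_scan_cost(matching_rows)
    return ("Seq Scan", seq) if seq <= idx else ("Index Scan", idx)

# 10,000-page table: half the rows match vs. only 100 rows match.
print(cheaper_scan(table_pages=10_000, matching_rows=500_000))
print(cheaper_scan(table_pages=10_000, matching_rows=100))
```

Under this model the crossover sits where matching rows times 4 equals the page count, which is why selectivity drives the choice.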

Index Scan

When Used: Reading specific rows via index.

CREATE INDEX users_email_idx ON users(email);

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alice@example.com';

Index Scan using users_email_idx on users
  (cost=0.42..8.44 rows=1 width=100)
  (actual time=0.025..0.027 rows=1 loops=1)
  Index Cond: (email = 'alice@example.com'::text)

How It Works:

1. Search B-tree index for 'alice@example.com'
2. Find leaf node with pointer to heap tuple
3. Fetch heap page containing the row
4. Return row

For multiple rows:
  Repeat steps 2-4 for each matching index entry

Performance:

Advantages:
✅ Fast for selective queries (<5% of table)
✅ Returns rows in index order (helps ORDER BY)

Disadvantages:
❌ Random I/O (slower than sequential)
❌ Heap lookups add overhead

Cost Breakdown:

-- Startup cost (0.42):
- B-tree descent (log N)
- Typically 3-4 levels

-- Total cost (8.44):
- Index read + Heap tuple fetch

Index Only Scan

When Used: Index contains all needed columns (covering index).

CREATE INDEX users_email_name_idx ON users(email, name);

EXPLAIN ANALYZE
SELECT name FROM users WHERE email = 'alice@example.com';

Index Only Scan using users_email_name_idx on users
  (cost=0.42..4.44 rows=1 width=15)
  (actual time=0.015..0.016 rows=1 loops=1)
  Index Cond: (email = 'alice@example.com'::text)
  Heap Fetches: 0  ← No heap access!

How It Works:

Search index for 'alice@example.com'
Read 'name' directly from index leaf
Return result
   (No heap tuple fetch needed!)

Massive Performance Win:

Index Scan:        Index read + Heap fetch = ~8.44 cost
Index Only Scan:   Index read only = ~4.44 cost
                   ↑ Roughly half the estimated cost

Caveat - Visibility Map:

Index Only Scan ... (Heap Fetches: 342)
                                   ^^^^
                                 Had to check heap for some rows

Why?
- PostgreSQL MVCC: Need to check row visibility
- Visibility map tracks "all tuples on page visible"
- Old/dead tuples → Heap fetch required

Solution: VACUUM regularly
VACUUM users;
-- Updates visibility map → More index-only scans

Bitmap Index Scan

When Used: Combining multiple indexes OR fetching many rows.

CREATE INDEX users_city_idx ON users(city);
CREATE INDEX users_age_idx ON users(age);

EXPLAIN ANALYZE
SELECT * FROM users WHERE city = 'NYC' AND age > 30;

Bitmap Heap Scan on users  (cost=100.00..500.00 rows=500)
  Recheck Cond: ((city = 'NYC') AND (age > 30))
  ->  BitmapAnd  (cost=100.00..100.00 rows=500)
        ->  Bitmap Index Scan on users_city_idx
        ->  Bitmap Index Scan on users_age_idx

How It Works:

1. Scan users_city_idx → Create bitmap of matching TIDs (tuple IDs)
   Bitmap: [page 5, page 12, page 23, ...]

2. Scan users_age_idx → Create second bitmap
   Bitmap: [page 3, page 12, page 45, ...]

3. AND bitmaps together
   Result: [page 12, ...] (pages in both bitmaps)

4. Sort pages by physical location
   [page 12] → Sequential access!

5. Fetch heap pages in sorted order
6. Recheck conditions on pages where the bitmap went lossy
   (it tracks whole pages instead of single tuples once it outgrows work_mem)
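
The steps above can be sketched with plain Python sets. This toy version treats both bitmaps as page-level (lossy), so every row on a candidate page is rechecked; the page numbers come from the example, while the `heap` contents and function names are made up for illustration.

```python
def bitmap_and(*page_lists):
    """Intersect page-level bitmaps and return pages in physical order."""
    common = set(page_lists[0]).intersection(*page_lists[1:])
    return sorted(common)

def bitmap_heap_scan(heap, pages, predicate):
    """Visit candidate pages in ascending order, rechecking every row."""
    return [row for page in pages for row in heap[page] if predicate(row)]

city_pages = [5, 12, 23]   # pages holding at least one city = 'NYC' row
age_pages = [3, 12, 45]    # pages holding at least one age > 30 row

# Toy heap: only page 12 is needed, and only one of its rows passes both tests.
heap = {12: [{"city": "NYC", "age": 41}, {"city": "NYC", "age": 25}]}

pages = bitmap_and(city_pages, age_pages)
rows = bitmap_heap_scan(heap, pages, lambda r: r["city"] == "NYC" and r["age"] > 30)
print(pages, rows)
```

The recheck matters here: page 12 appears in both bitmaps, yet one of its rows fails the age condition and is filtered out.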

Why Bitmap?

Problem with regular Index Scan on multiple rows:
  - Fetches rows in index order
  - Random heap access
  - Slow

Bitmap solution:
  - Buffers TIDs in memory
  - Sorts by heap location
  - Sequential heap access
  - Much faster for moderate result sets (100-10000 rows)

Bitmap vs Index Scan Decision:

Few rows (<100):     Index Scan (direct access)
Moderate (100-10k):  Bitmap Index Scan (sorted access)
Many (>10k):         Seq Scan (read entire table)

Part 4: Join Methods Explained

Nested Loop Join

Concept: For each row in outer table, scan inner table.

EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.id = 5;

Nested Loop  (cost=0.84..16.88 rows=1)
  ->  Index Scan on users u  (cost=0.42..8.44 rows=1)
        Index Cond: (id = 5)
  ->  Index Scan on orders o  (cost=0.42..8.44 rows=1)
        Index Cond: (user_id = 5)

Pseudocode:

result = []
for outer_row in outer_table:  # users
    for inner_row in inner_table:  # orders
        if outer_row.id == inner_row.user_id:
            result.append((outer_row, inner_row))
return result

Performance:

Cost: O(outer_rows * inner_rows) when the inner side has no usable index

Best case: Inner table has index on join key
  - Index Scan for each outer row
  - Cost: outer_rows * index_lookup_cost

Worst case: No index on inner table
  - Seq Scan for EACH outer row
  - Cost: outer_rows * table_size
  - Catastrophic!
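
The gap between the best and worst case is easy to measure by counting how many inner rows each variant touches. In this sketch a Python dict stands in for the B-tree index on the join key; the data and function names are invented for illustration.

```python
from collections import defaultdict

def nested_loop_no_index(outer, inner):
    """Scan all of inner for each outer row; returns (matches, rows examined)."""
    examined, matches = 0, []
    for o in outer:
        for i in inner:
            examined += 1
            if o["id"] == i["user_id"]:
                matches.append((o, i))
    return matches, examined

def nested_loop_with_index(outer, inner):
    """Look up inner rows through a dict standing in for a B-tree index."""
    index = defaultdict(list)
    for i in inner:
        index[i["user_id"]].append(i)
    examined, matches = 0, []
    for o in outer:
        for i in index.get(o["id"], []):
            examined += 1
            matches.append((o, i))
    return matches, examined

users = [{"id": n} for n in range(1, 11)]                  # 10 outer rows
orders = [{"user_id": 1 + n % 10} for n in range(1000)]    # 1,000 inner rows

slow_matches, slow_examined = nested_loop_no_index(users, orders)
fast_matches, fast_examined = nested_loop_with_index(users, orders)
print(slow_examined, fast_examined)
```

Both variants return the same pairs; only the amount of work differs, and the difference grows with the number of outer rows.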

When Optimal:

✅ Small outer table (few rows)
✅ Index exists on inner table join key
✅ Selective WHERE clause on outer table

Example:
- Outer: 1 user (WHERE u.id = 5)
- Inner: Orders with index on user_id
- Result: 1 index lookup = Fast!

When Terrible:

❌ Large outer table
❌ No index on inner table
❌ Cartesian product (no join condition)

Example:
- Outer: 10,000 users
- Inner: 1 million orders, NO index
- Result: 10,000 full table scans (10 billion row comparisons) = minutes to hours!

Hash Join

Concept: Build hash table on one input, probe with other.

EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id;

Hash Join  (cost=500.00..5000.00 rows=100000)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (cost=0.00..2000.00 rows=100000)
  ->  Hash  (cost=100.00..100.00 rows=10000)
        ->  Seq Scan on users u  (cost=0.00..100.00 rows=10000)

Algorithm:

# Phase 1: Build hash table (smaller table)
hash_table = {}
for row in users:
    hash_table[row.id] = row

# Phase 2: Probe (larger table)
result = []
for row in orders:
    if row.user_id in hash_table:
        result.append((hash_table[row.user_id], row))
return result

Performance:

Build phase:   O(smaller_table_rows)
Probe phase:   O(larger_table_rows)
Total:         O(N + M) - Linear!

Memory requirement:
  Hash table size ≈ smaller_table_rows * average_row_width
  Must fit in work_mem (default 4MB)

When Optimal:

✅ Large tables without indexes
✅ Equi-joins (=, not <, >, !=)
✅ Smaller table fits in work_mem
✅ Both tables scanned anyway (no indexes help)

Example:
- 10,000 users × 100,000 orders
- Build hash on users (smaller)
- Probe with orders
- Time: Seconds (vs hours with Nested Loop)

Memory Spills:

Hash Join  (...) (actual time=1000..5000 ...)
  Buckets: 16384  Batches: 4  Memory Usage: 8192kB
                          ^
                    Spilled to disk! (work_mem too small)

When hash table > work_mem:
  - Splits into batches
  - Writes batches to temp files on disk
  - Slower (disk I/O)

Fix: Increase work_mem (it applies per sort/hash operation, so prefer a session-level change over a global one)
SET work_mem = '256MB';

Merge Join

Concept: Sort both inputs, merge like merge sort.

EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
ORDER BY u.id;

Merge Join  (cost=2000.00..3000.00 rows=10000)
  Merge Cond: (u.id = o.user_id)
  ->  Index Scan using users_pkey on users u
  ->  Sort  (cost=1500.00..1750.00 rows=100000)
        Sort Key: o.user_id
        ->  Seq Scan on orders o

Algorithm:

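# Both inputs sorted by join key.
# Assumes users.id is unique (true for a primary key). Duplicate keys on both
# sides would require rewinding the inner input, which this sketch omits.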
users_sorted = sorted(users, key=lambda r: r.id)
orders_sorted = sorted(orders, key=lambda r: r.user_id)

i, j = 0, 0
result = []
while i < len(users_sorted) and j < len(orders_sorted):
    if users_sorted[i].id == orders_sorted[j].user_id:
        result.append((users_sorted[i], orders_sorted[j]))
        j += 1
    elif users_sorted[i].id < orders_sorted[j].user_id:
        i += 1
    else:
        j += 1
return result

Performance:

Best case: Both inputs pre-sorted (indexes)
  Cost: O(N + M) - Linear scan

Worst case: Must sort both inputs
  Cost: O(N log N + M log M + N + M)
  Still better than Nested Loop for large datasets

When Optimal:

✅ Both inputs already sorted (indexes on join keys)
✅ Query includes ORDER BY on join key
✅ Large datasets (competitive with Hash Join)

Example:
SELECT * FROM users u
JOIN orders o ON u.id = o.user_id
ORDER BY u.id;

- users has primary key on id (sorted)
- orders has index on user_id (sorted)
- Merge join reads both in order
- Result already sorted for ORDER BY
- No additional sort step needed!

Join Method Comparison

┌──────────────────────────────────────────────────────────────────────────┐
│                    JOIN METHOD DECISION TREE                             │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Small outer table (<100 rows) + Index on inner?                        │
│     YES → NESTED LOOP                                                    │
│                                                                          │
│  Both inputs pre-sorted?                                                 │
│     YES → MERGE JOIN                                                     │
│                                                                          │
│  Smaller table fits in work_mem?                                        │
│     YES → HASH JOIN                                                      │
│     NO  → MERGE JOIN (or increase work_mem)                             │
│                                                                          │
│  Very large datasets (billions of rows)?                                 │
│     → MERGE JOIN (most stable memory usage)                             │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
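
The decision tree can be written as a small function, which makes the order of the checks explicit. This is a sketch of the rules of thumb in the box only, not of the planner's cost-based logic; `choose_join` and its parameters are hypothetical names, and the 4MB default mirrors the work_mem default mentioned earlier.

```python
def choose_join(outer_rows, inner_has_index, inputs_sorted,
                build_side_bytes, work_mem_bytes=4 * 1024 * 1024):
    """Apply the decision tree top to bottom and return a join method."""
    if outer_rows < 100 and inner_has_index:
        return "Nested Loop"
    if inputs_sorted:
        return "Merge Join"
    if build_side_bytes <= work_mem_bytes:
        return "Hash Join"
    return "Merge Join"  # or raise work_mem so the hash table fits

# A selective lookup, a mid-sized unsorted join, and one too big for work_mem.
print(choose_join(10, True, False, 10**9))
print(choose_join(10_000, False, False, 1_000_000))
print(choose_join(10_000, False, False, 10**9))
```

The real planner estimates a cost for every candidate and picks the cheapest, so treat these thresholds as intuition rather than fixed cutoffs.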

Advanced Join Internals

Hash Join: Memory & Partitioning

When the build-side relation exceeds work_mem, PostgreSQL uses Hybrid Hash Join:

Partitioning: Both relations are partitioned into $2^k$ batches based on the hash key.
Batch Processing: Batch 0 is processed in memory. Batches 1..N are spilled to temporary files.
Recursive Processing: Each batch is then loaded and hashed. If a batch is still too large, it is partitioned again.

Performance Tip: If you see Batches: > 1 in EXPLAIN ANALYZE, increasing work_mem can convert the join to a single-pass in-memory operation, which often yields a substantial speedup because the temp-file I/O disappears.
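
The batching idea can be sketched in memory. The function below partitions both inputs with the same hash function and joins one batch at a time, so only one batch's hash table exists at once. It is a simplified illustration under stated assumptions (no temp files, no recursive repartitioning, Python's built-in `hash` as the hash function), and all names are hypothetical.

```python
def partitioned_hash_join(build_rows, probe_rows, key_build, key_probe, num_batches):
    """Join two row lists batch by batch, as a disk-spilling hash join would."""
    # Partition both inputs with the same hash function so matching keys
    # always land in the same batch. Real systems write batches 1..N to disk.
    build_batches = [[] for _ in range(num_batches)]
    probe_batches = [[] for _ in range(num_batches)]
    for row in build_rows:
        build_batches[hash(row[key_build]) % num_batches].append(row)
    for row in probe_rows:
        probe_batches[hash(row[key_probe]) % num_batches].append(row)

    result = []
    for build_batch, probe_batch in zip(build_batches, probe_batches):
        # Build a hash table for this batch only, then probe it.
        table = {}
        for row in build_batch:
            table.setdefault(row[key_build], []).append(row)
        for row in probe_batch:
            for match in table.get(row[key_probe], []):
                result.append((match, row))
    return result

users = [{"id": n, "name": f"user{n}"} for n in range(1, 9)]
orders = [{"order_id": n, "user_id": 1 + n % 10} for n in range(40)]

one_pass = partitioned_hash_join(users, orders, "id", "user_id", num_batches=1)
batched = partitioned_hash_join(users, orders, "id", "user_id", num_batches=4)
print(len(one_pass), len(batched))
```

Both calls produce the same set of pairs; batching changes only how much must be held in memory at a time, at the price of an extra pass over the data.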

Merge Join: The Power of Index-Only Merges

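Merge Join produces its output ordered by the join key, so when indexes on both sides already deliver sorted input, the plan needs no explicit sort step for the join or for a matching ORDER BY. (A Nested Loop also preserves the order of its outer input, but a Hash Join does not preserve join-key order.)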

-- Query benefitting from Merge Join
SELECT * FROM orders o 
JOIN order_items i ON o.id = i.order_id 
ORDER BY o.id;
-- Planner uses Merge Join + Index Scan on both sides.
-- No Sort nodes in the plan!

Real-World Example:

-- Same query, different row counts

-- Scenario 1: 10 users, 100 orders
Nested Loop  ← Optimal (few outer rows, indexed inner)

-- Scenario 2: 10K users, 100K orders, no indexes
Hash Join    ← Optimal (linear time, fits in memory)

-- Scenario 3: 10M users, 100M orders
Merge Join   ← Optimal (stable memory, sorts needed anyway)

Part 5: Aggregation & Sorting

GROUP BY Execution

EXPLAIN ANALYZE
SELECT city, COUNT(*)
FROM users
GROUP BY city;

HashAggregate  (cost=1500.00..1550.00 rows=1000)
  Group Key: city
  ->  Seq Scan on users  (cost=0.00..1000.00 rows=10000)

Two Methods:

1. HashAggregate (default for unsorted input):

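def hash_aggregate(users):
    # Build hash table: one counter per distinct city
    groups = {}
    for row in users:
        groups[row.city] = groups.get(row.city, 0) + 1

    # Keep the group key in the output, not just the counts
    return [{'city': city, 'count': count} for city, count in groups.items()]

# Memory: O(distinct_groups)
# Time: O(N)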

2. GroupAggregate (for sorted input):

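def group_aggregate(users):
    # Input must arrive sorted by city (sorted explicitly here)
    sorted_users = sorted(users, key=lambda r: r.city)

    results = []
    current_city = None
    current_count = 0

    for row in sorted_users:
        if current_count and row.city == current_city:
            current_count += 1
        else:
            if current_count:
                # City changed: emit the finished group
                results.append({'city': current_city, 'count': current_count})
            current_city = row.city
            current_count = 1

    if current_count:
        # Emit the final group, which no later row would trigger
        results.append({'city': current_city, 'count': current_count})
    return results

# Memory: O(1) - Streaming!
# Time: O(N)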

When Each is Used:

-- HashAggregate (unsorted input)
SELECT city, COUNT(*) FROM users GROUP BY city;

-- GroupAggregate (sorted by index)
CREATE INDEX users_city_idx ON users(city);
SELECT city, COUNT(*) FROM users GROUP BY city;

GroupAggregate  (cost=0.00..500.00 rows=1000)
  Group Key: city
  ->  Index Only Scan using users_city_idx on users
        ↑ Already sorted!

Memory Issues:

-- Too many groups → HashAggregate spills to disk (PostgreSQL 13+)
HashAggregate  (Batches: 5  Memory Usage: 65536kB)
                     ^^^
                  Disk spills

-- Solution 1: Increase work_mem
SET work_mem = '256MB';

-- Solution 2: Make GroupAggregate attractive with an index
CREATE INDEX ON users(city);
-- The planner can now read rows pre-sorted from the index

Sorting

EXPLAIN ANALYZE
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;

Limit  (cost=500.00..500.02 rows=10)
  ->  Sort  (cost=500.00..525.00 rows=10000)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Seq Scan on users  (cost=0.00..100.00 rows=10000)

Sort Methods:

1. Quicksort (general purpose):

Used when: Sorting all rows
Memory: All rows must fit in work_mem
Time: O(N log N)

2. Top-N Heapsort (for LIMIT):

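from heapq import heappush, heapreplace

def top_n_newest(users, n=10):
    # Min-heap of (created_at, tiebreak, row): heap[0] is the oldest row kept.
    # The tiebreak index keeps rows themselves from ever being compared.
    heap = []
    for i, row in enumerate(users):
        entry = (row.created_at, i, row)
        if len(heap) < n:
            heappush(heap, entry)
        elif row.created_at > heap[0][0]:
            heapreplace(heap, entry)

    # Newest first, matching ORDER BY created_at DESC
    return [row for _, _, row in sorted(heap, reverse=True)]

# Memory: O(N) where N = LIMIT
# Time: O(M log N) where M = total rows, N = LIMIT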

3. External Sort (disk-based):

When: Data > work_mem
Process:
  1. Read work_mem chunks
  2. Sort each chunk
  3. Write to temp files
  4. Merge sorted chunks

Slow! Avoid by increasing work_mem
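
The four steps can be sketched with `heapq.merge`, which performs the final k-way merge lazily. In this minimal illustration the sorted runs stay in Python lists where a real implementation would write them to temp files; `external_sort` and `chunk_size` (standing in for work_mem) are hypothetical names.

```python
import heapq

def external_sort(rows, chunk_size):
    """Sort rows using bounded-size runs, then merge the runs."""
    runs = []
    for start in range(0, len(rows), chunk_size):
        # Steps 1-3: read a chunk that fits in memory, sort it, set it aside.
        # A real implementation would write each run to a temp file.
        runs.append(sorted(rows[start:start + chunk_size]))
    # Step 4: k-way merge reads the runs side by side, one row at a time.
    return list(heapq.merge(*runs))

data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(external_sort(data, chunk_size=3))
```

Only one chunk is sorted at a time, and the merge holds just the head of each run, which is why memory stays bounded while every row is written and read an extra time.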

Checking for Disk Sorts:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM huge_table ORDER BY created_at;

Sort  (...) (actual time=5000..10000 ...)
  Sort Method: external merge  Disk: 204800kB
                    ^^^^^^^^^^^
                 Spilled to disk!

-- Fix:
SET work_mem = '512MB';
-- Now:
Sort Method: quicksort  Memory: 204800kB
                        ^^^
                     In-memory!

Avoiding Sorts with Indexes:

-- Without index: Full table sort
EXPLAIN SELECT * FROM users ORDER BY created_at;
Sort + Seq Scan

-- With index: No sort needed!
CREATE INDEX users_created_at_idx ON users(created_at);
EXPLAIN SELECT * FROM users ORDER BY created_at;
Index Scan using users_created_at_idx
  ↑ Returns rows in sorted order

Part 6: Advanced Plan Analysis

Nested Loops: Understanding the Multiplier Effect

EXPLAIN ANALYZE
SELECT *
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE u.active = true;

Nested Loop  (cost=0.84..50000.00 rows=100)
             (actual time=0.1..2500.5 rows=100 loops=1)
  ->  Nested Loop  (cost=0.42..10000.00 rows=100)
                   (actual time=0.05..500.2 rows=100 loops=1)
        ->  Seq Scan on users u  (cost=0.00..100.00 rows=10)
                                (actual time=0.01..5.0 rows=10 loops=1)
              Filter: (active = true)
              Rows Removed by Filter: 90
        ->  Index Scan on orders o  (cost=0.42..980.00 rows=10)
                                    (actual time=0.02..48.0 rows=10 loops=10)
                                                                   ^^^^^^^^
                                                                Ran 10 times!
              Index Cond: (user_id = u.id)
  ->  Index Scan on products p  (cost=0.42..398.00 rows=1)
                                (actual time=0.01..20.0 rows=1 loops=100)
                                                                ^^^^^^^^
                                                             Ran 100 times!
        Index Cond: (id = o.product_id)

Understanding loops:

loops=1:   Operation ran once
loops=10:  Operation ran 10 times (once per outer row)
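loops=100: Operation ran 100 times (once per row produced by the inner Nested Loop)

actual time and rows are per-loop averages:
Total time ≈ actual time × loops

Example:
- Index Scan on products: 20.0 ms × loops=100
- Total: ~2,000 ms of the 2,500 ms query
- If the planner expected far fewer loops than actually ran, the node costs that many times more than estimated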
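
EXPLAIN ANALYZE reports actual time as a per-loop average, so a node that looks cheap can dominate once it is multiplied by its loop count. The sketch below applies that multiplication to the three scan nodes of the plan above; the dictionary simply copies the upper bound of each node's actual time and its loops value.

```python
# (per-loop actual time in ms, loops) copied from the plan above
NODES = {
    "Seq Scan on users": (5.0, 1),
    "Index Scan on orders": (48.0, 10),
    "Index Scan on products": (20.0, 100),
}

def total_time_ms(per_loop_ms, loops):
    """Scale a per-loop average up to the node's total time."""
    return per_loop_ms * loops

totals = {name: total_time_ms(ms, loops) for name, (ms, loops) in NODES.items()}
slowest = max(totals, key=totals.get)
print(slowest, totals[slowest])
```

The products scan has the smallest per-loop time of the two index scans, yet it accounts for most of the query once loops are taken into account.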

Optimizing High-Loop Nested Joins:

-- Problem: Inner scans repeated many times
Nested Loop (loops=10000)
  -> Large outer scan
  -> Index scan repeated 10000 times

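-- Solution 1: Test whether a Hash Join is faster
SET enable_nestloop = off;
-- Discourages Nested Loop for this session so the planner picks another method.
-- Use for diagnosis only, then: RESET enable_nestloop;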

-- Solution 2: Better WHERE clause selectivity
WHERE users.active = true  -- Reduce outer rows
AND orders.status = 'pending'  -- Reduce inner rows per outer

-- Solution 3: Materialization
CREATE TEMP TABLE filtered_users AS
SELECT * FROM users WHERE active = true;

SELECT * FROM filtered_users u
JOIN orders o ON u.id = o.user_id;
-- Now outer is small → Fewer loops

Interpreting Buffers

Using EXPLAIN (ANALYZE, BUFFERS) reveals the I/O cost of your query.

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM large_table WHERE id = 123;
-- Output:
-- Index Scan using large_table_pkey on large_table (actual time=0.05..0.06 rows=1 loops=1)
--   Buffers: shared hit=4 read=1

shared hit: Pages found in the PostgreSQL Buffer Cache (Fast).
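shared read: Pages not found in shared buffers and requested from the OS (served from the OS page cache or from disk; slower).
shared dirtied: Previously clean pages modified by this query.
shared written: Dirty pages this backend had to write out itself during the query to free buffer space.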

Principal Observation: A high shared read count on the first run that disappears on the second run indicates a “cold cache” problem. If shared hit is always high but the query is still slow, you are likely CPU-bound or facing lock contention.
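
A cache hit ratio makes the cold-versus-warm comparison easy to track across runs. The helper below extracts it from a single "Buffers:" line; `shared_hit_ratio` is a hypothetical name, and the parsing assumes the text layout shown above, where comma-separated sections such as "shared" and "temp" each carry `name=value` counters.

```python
import re

def shared_hit_ratio(buffers_line):
    """Return shared hit / (hit + read) from a 'Buffers:' line, or None."""
    sections = buffers_line.split(":", 1)[1].split(",")
    shared = next((s for s in sections if s.strip().startswith("shared")), "")
    counts = {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", shared)}
    hit, read = counts.get("hit", 0), counts.get("read", 0)
    return hit / (hit + read) if hit + read else None

print(shared_hit_ratio("Buffers: shared hit=4 read=1"))
print(shared_hit_ratio("Buffers: shared hit=10, temp read=5 written=5"))
```

A ratio that climbs toward 1.0 on a second run points at a cold cache, while a ratio that is already high on a slow query points elsewhere.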

Join Tree Shapes: Left-Deep vs. Bushy

PostgreSQL plans are often Left-Deep Trees, which suit pipelined execution (Volcano model). The planner also considers Bushy Trees (joining the results of two joins), which can be more efficient for very large joins, though the search space is much larger.

Subquery Execution

-- Subquery in WHERE
EXPLAIN ANALYZE
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- Three possible plans:

-- Plan A: Semi Join (efficient)
Hash Semi Join  (cost=500.00..1000.00 rows=100)
  Hash Cond: (users.id = orders.user_id)
  ->  Seq Scan on users
  ->  Hash
        ->  Seq Scan on orders
              Filter: (total > 1000)

-- Plan B: Hashed SubPlan (usually less efficient than a semi join)
Seq Scan on users  (cost=0.00..1000000.00 rows=100)
  Filter: (hashed SubPlan 1)
  SubPlan 1
    ->  Seq Scan on orders  (cost=0.00..500.00)
          Filter: (total > 1000)
          ↑ Subquery executed ONCE, results hashed

-- Plan C: Correlated Subplan (catastrophic)
Seq Scan on users  (cost=0.00..10000000.00 rows=100)
  Filter: (SubPlan 1)
  SubPlan 1
    ->  Seq Scan on orders  (cost=0.00..500.00 rows=1)
          Filter: ((user_id = users.id) AND (total > 1000))
          ↑ Subquery executed FOR EACH USER!

Rewriting Correlated Subqueries:

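-- Correlated EXISTS
SELECT u.name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.total > 1000
);
-- PostgreSQL normally flattens this into a Semi Join;
-- it is NOT executed once per user

-- Uncorrelated IN
SELECT u.name
FROM users u
WHERE u.id IN (
    SELECT user_id FROM orders WHERE total > 1000
);
-- Also planned as a Hash Semi Join (single hash table)

-- Explicit JOIN alternative
SELECT DISTINCT u.name
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 1000;
-- Hash Join plus a DISTINCT step to remove duplicates.
-- Caution: DISTINCT u.name also merges different users who share a name,
-- so this is not strictly equivalent and is rarely faster than the semi join.

-- Always check the plan: a per-row SubPlan appears only when the planner
-- cannot flatten the subquery (for example, a scalar subquery in the SELECT list).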

CTEs: Optimization Fence (Pre-PostgreSQL 12)

-- PostgreSQL 11 and earlier:
WITH expensive_cte AS (
    SELECT * FROM huge_table WHERE complex_condition
)
SELECT * FROM expensive_cte WHERE simple_condition;

-- Plan (PostgreSQL 11):
CTE Scan on expensive_cte
  Filter: simple_condition
  CTE expensive_cte
    ->  Seq Scan on huge_table
          Filter: complex_condition

-- Problem: CTE materialized BEFORE outer WHERE applied
-- Result: Processes all huge_table rows even if only need few

-- PostgreSQL 12+: Inlines CTEs that are non-recursive, side-effect-free, and referenced once
Seq Scan on huge_table
  Filter: (complex_condition AND simple_condition)
  ↑ Optimized! Applies both filters together

-- Force materialization if needed:
WITH expensive_cte AS MATERIALIZED (...)

Part 7: Cross-Database Comparison

PostgreSQL vs MySQL vs SQL Server

Getting Execution Plans:

-- PostgreSQL
EXPLAIN ANALYZE
SELECT ...;

-- MySQL
EXPLAIN FORMAT=JSON
SELECT ...;

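-- SQL Server
SET STATISTICS XML ON;   -- Returns the actual execution plan
SET STATISTICS IO ON;    -- Per-table I/O counts
SET STATISTICS TIME ON;  -- CPU and elapsed time
GO
SELECT ...;
GO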

Plan Output Format Comparison:

PostgreSQL (Tree, bottom-up):

Nested Loop
  ->  Seq Scan on users
  ->  Index Scan on orders

MySQL (JSON, top-down):

{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {"table": "users", "access_type": "ALL"},
      {"table": "orders", "access_type": "ref", "key": "user_id_idx"}
    ]
  }
}

SQL Server (Graphical + XML):

<ShowPlanXML>
  <QueryPlan>
    <RelOp NodeId="0" LogicalOp="Inner Join" PhysicalOp="Nested Loops">
      <RelOp NodeId="1" PhysicalOp="Clustered Index Scan">
        <Object Table="users" />
      </RelOp>
      <RelOp NodeId="2" PhysicalOp="Index Seek">
        <Object Table="orders" Index="IX_user_id" />
      </RelOp>
    </RelOp>
  </QueryPlan>
</ShowPlanXML>

Scan Methods: Terminology Differences

Concept	PostgreSQL	MySQL	SQL Server
Full table read	Seq Scan	Table Scan	Table Scan / Clustered Index Scan
Index-based read	Index Scan	Index Range Scan	Index Seek
Read index only	Index Only Scan	Covering Index	Index Seek (with includes)
Multiple index combo	Bitmap Index Scan	Index Merge	Index Intersection

Example: Finding via Index

PostgreSQL:

Index Scan using users_email_idx on users
  Index Cond: (email = 'alice@example.com')

MySQL:

{
  "table": "users",
  "access_type": "ref",
  "key": "users_email_idx",
  "key_length": "767",
  "rows_examined_per_scan": 1
}

SQL Server:

<IndexSeek>
  <Object Database="mydb" Schema="dbo" Table="users" Index="IX_users_email"/>
  <SeekPredicates>
    <SeekKeys>
      <Prefix ScanType="EQ">
        <RangeColumns>
          <ColumnReference Column="email"/>
        </RangeColumns>
        <RangeExpressions>
          <ScalarOperator>
            <Const ConstValue="'alice@example.com'"/>
          </ScalarOperator>
        </RangeExpressions>
      </Prefix>
    </SeekKeys>
  </SeekPredicates>
</IndexSeek>

Join Algorithms

PostgreSQL: Nested Loop, Hash Join, Merge Join

MySQL:

Nested Loop Join (only method until MySQL 8.0.18)
Hash Join (MySQL 8.0.18+)
Block Nested Loop (optimization of Nested Loop; replaced by Hash Join from MySQL 8.0.20)

SQL Server:

Nested Loops
Hash Match
Merge Join
Adaptive Join (SQL Server 2017+, chooses at runtime!)

Example: Hash Join

PostgreSQL:

Hash Join  (cost=500..1000 rows=100)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o
  ->  Hash
        ->  Seq Scan on users u

MySQL 8.0.18+ (EXPLAIN FORMAT=TREE):

-> Inner hash join (o.user_id = u.id)
    -> Table scan on o
    -> Hash
        -> Table scan on u

SQL Server (simplified):

<RelOp PhysicalOp="Hash Match" LogicalOp="Inner Join">
  <!-- Build keys: u.id; probe keys: o.user_id -->
  <RelOp PhysicalOp="Table Scan">
    <Object Table="users" Alias="u"/>
  </RelOp>
  <RelOp PhysicalOp="Table Scan">
    <Object Table="orders" Alias="o"/>
  </RelOp>
</RelOp>

Cost Metrics

PostgreSQL:

cost=0.42..8.44
   ^^^^    ^^^
 startup  total

Abstract units
Based on page costs (seq_page_cost, random_page_cost)
Relative comparison only

MySQL:

{
  "cost_info": {
    "read_cost": "10.50",
    "eval_cost": "2.00",
    "prefix_cost": "12.50",
    "data_read_per_join": "8K"
  }
}

Cost in “cost units” (not milliseconds)
prefix_cost = cumulative cost
Often inaccurate in older versions

SQL Server:

<RelOp EstimatedTotalSubtreeCost="0.0065" EstimateCPU="0.0001" EstimateIO="0.003" />

Breaks down CPU vs I/O
More granular than PostgreSQL
Shows actual vs estimated row counts (in actual execution plans)

Actual Execution Stats

PostgreSQL:

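EXPLAIN (ANALYZE, BUFFERS)
...
(actual time=0.025..0.027 rows=1 loops=1)
Buffers: shared hit=4

actual time: startup..total in milliseconds, averaged per loop
Buffers: hit = pages found in shared buffers, read = pages fetched from the OS cache or disk (zero counters are omitted)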
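A quick way to turn a Buffers line into a single number is to compute the shared-buffer hit ratio. The sketch below is a hypothetical helper (not part of PostgreSQL or any client library). It assumes the text layout shown in this module and treats a missing counter as zero.

```python
import re


def shared_buffer_stats(buffers_line):
    """Return (hit, read) page counts for the 'shared' part of a Buffers line."""
    _, _, body = buffers_line.partition("Buffers:")
    for segment in body.split(","):
        segment = segment.strip()
        if segment.startswith("shared "):
            counts = dict(re.findall(r"(\w+)=(\d+)", segment))
            return int(counts.get("hit", 0)), int(counts.get("read", 0))
    return 0, 0


def hit_ratio(buffers_line):
    """Fraction of shared pages served from cache, or None if no pages were touched."""
    hit, read = shared_buffer_stats(buffers_line)
    total = hit + read
    return hit / total if total else None


print(hit_ratio("Buffers: shared hit=4 read=0"))
```

A low ratio on a frequently run query suggests the working set does not fit in shared buffers, or that the plan touches far more pages than it needs to.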

MySQL:

EXPLAIN ANALYZE
SELECT ...;

-> Nested loop inner join
    -> Table scan on u (actual time=0.1..10.5 rows=100 loops=1)
    -> Index lookup on o (actual time=0.01..0.02 rows=5 loops=100)

EXPLAIN ANALYZE (MySQL 8.0.18+) runs the query and reports actual times in TREE format
EXPLAIN FORMAT=TREE on its own shows only the estimated plan
Older versions: No actual execution stats in EXPLAIN

SQL Server:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT ...;

-- Output:
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
   CPU time = 15 ms, elapsed time = 42 ms.

Table 'users'. Scan count 1, logical reads 5, physical reads 0.
Table 'orders'. Scan count 100, logical reads 500, physical reads 10.

Separate timing output
Detailed I/O stats per table

Part 8: Hands-On Practice (20 Exercises)

Exercise 1: Basic Scan Analysis

Setup:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    category TEXT,
    price DECIMAL(10, 2),
    stock INT
);

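INSERT INTO products (id, name, category, price, stock)
SELECT
    i,
    'Product ' || i,
    (ARRAY['Electronics', 'Clothing', 'Food'])[1 + mod(i, 3)],
    random() * 1000,
    floor(random() * 100)
FROM generate_series(1, 100000) i;

-- Refresh planner statistics so row estimates reflect the new data
ANALYZE products;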

Query:

EXPLAIN ANALYZE
SELECT * FROM products WHERE price > 500;

Questions:

What scan method is used?
What’s the estimated vs actual row count?
How many rows were filtered?
What’s the execution time?

Expected Answer:

Seq Scan on products  (cost=0.00..2834.00 rows=50000 width=...)
                      (actual time=0.015..25.123 rows=49823 loops=1)
  Filter: (price > '500'::numeric)
  Rows Removed by Filter: 50177
Planning Time: 0.142 ms
Execution Time: 27.456 ms

Analysis:
1. Seq Scan (no index on price)
2. Estimate: 50,000 rows; Actual: 49,823 (very close!)
3. Filtered: 50,177 rows
4. Time: ~27ms
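The estimated-versus-actual comparison can be automated. The sketch below is a hypothetical helper that pulls both `rows=` values out of a single plan node and reports their ratio. It assumes the node text contains the estimate first and the actual count second, as in EXPLAIN ANALYZE text output.

```python
import re

ROWS_RE = re.compile(r"rows=(\d+)")


def row_estimate_ratio(plan_node_text):
    """Return (estimated, actual, actual/estimated) for one EXPLAIN ANALYZE node."""
    found = ROWS_RE.findall(plan_node_text)
    if len(found) < 2:
        raise ValueError("expected both an estimated and an actual rows= value")
    estimated, actual = int(found[0]), int(found[1])
    # Guard against a zero estimate so the ratio stays defined.
    return estimated, actual, actual / max(estimated, 1)


node = (
    "Seq Scan on products  (cost=0.00..2834.00 rows=50000 width=40)\n"
    "                      (actual time=0.015..25.123 rows=49823 loops=1)"
)
print(row_estimate_ratio(node))
```

A ratio close to 1 means the statistics are healthy. A ratio far from 1 in either direction is the cue to look at stale or insufficient statistics.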

Optimization Challenge:

-- Add index
CREATE INDEX products_price_idx ON products(price);

-- Re-run
EXPLAIN ANALYZE
SELECT * FROM products WHERE price > 500;

-- Still Seq Scan! Why?
-- Answer: Query returns ~50% of rows
-- Seq Scan cheaper than 50,000 index lookups

-- Try more selective query:
EXPLAIN ANALYZE
SELECT * FROM products WHERE price > 900;

-- Now the index is used (through a Bitmap Heap Scan)!
Bitmap Heap Scan on products  (cost=542.84..2156.12 rows=9952)
  Recheck Cond: (price > '900'::numeric)
  ->  Bitmap Index Scan on products_price_idx

Exercise 2: Join Method Investigation

Setup:

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name TEXT,
    city TEXT
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total DECIMAL(10, 2)
);

INSERT INTO customers
SELECT i, 'Customer ' || i, (ARRAY['NYC', 'LA', 'Chicago'])[1 + mod(i, 3)]
FROM generate_series(1, 10000) i;

INSERT INTO orders
SELECT
    i,
    1 + floor(random() * 10000)::INT,
    CURRENT_DATE - floor(random() * 365)::INT,
    random() * 1000
FROM generate_series(1, 100000) i;

ANALYZE customers, orders;

Query 1: Small result set:

EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.id = 42;

Expected:

Nested Loop  (cost=0.84..120.00 rows=10)
  ->  Index Scan using customers_pkey on customers c
        Index Cond: (id = 42)
  ->  Seq Scan on orders o
        Filter: (customer_id = 42)

Why Nested Loop?
- Outer: 1 row (highly selective)
- Inner: Sequential scan (no index on customer_id yet)
- Only scans orders once

Optimization:

CREATE INDEX orders_customer_id_idx ON orders(customer_id);

EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.id = 42;

-- Now:
Nested Loop  (cost=0.84..25.00 rows=10)
  ->  Index Scan using customers_pkey on customers c
  ->  Index Scan using orders_customer_id_idx on orders o
        Index Cond: (customer_id = 42)

-- Much faster! Index on inner relation

Query 2: Large result set:

EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

Expected:

Hash Join  (cost=3000.00..8000.00 rows=10000)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o
  ->  Hash
        ->  Seq Scan on customers c

Why Hash Join?
- Large result set (all customers)
- Both tables fully scanned
- Hash Join optimal for this pattern
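The difference between the two join methods in this exercise can be sketched in a few lines. This is a toy in-memory model, not how PostgreSQL implements its executor. Rows are plain dictionaries, and the function names are made up for illustration.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare every outer row with every inner row: len(outer) * len(inner) checks."""
    return [(o, i) for o in outer for i in inner if o[outer_key] == i[inner_key]]


def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it once per row of the other."""
    table = {}
    for b in build:
        table.setdefault(b[build_key], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[probe_key], [])]


customers = [{"id": i, "name": f"Customer {i}"} for i in range(1, 4)]
orders = [{"id": n, "customer_id": 1 + n % 2} for n in range(6)]

nl_pairs = nested_loop_join(customers, orders, "id", "customer_id")
hj_pairs = hash_join(customers, orders, "id", "customer_id")
print(len(nl_pairs), len(hj_pairs))
```

Both functions return the same pairs. The nested loop does work proportional to the product of the input sizes, which is fine when the outer side is one row (Query 1). The hash join reads each input once, which wins when both sides are large (Query 2).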

Exercise 3: Aggregation Performance

Query:

EXPLAIN ANALYZE
SELECT category, AVG(price), COUNT(*)
FROM products
GROUP BY category;

Without Index:

HashAggregate  (cost=2834.00..2834.03 rows=3)
  Group Key: category
  ->  Seq Scan on products

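With Index:

-- The index must contain every column the query reads (category and price);
-- an index on category alone cannot produce an Index Only Scan here
CREATE INDEX products_category_price_idx ON products(category, price);
VACUUM ANALYZE products;  -- refresh the visibility map so the scan can skip the heap

EXPLAIN ANALYZE
SELECT category, AVG(price), COUNT(*)
FROM products
GROUP BY category;

-- Possible plan:
GroupAggregate  (cost=0.42..5000.00 rows=3)
  Group Key: category
  ->  Index Only Scan using products_category_price_idx on products

-- Streaming aggregation (less memory)
-- With only 3 groups the planner may still prefer HashAggregate over a Seq Scan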
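The two aggregation strategies differ in what they need from their input. The sketch below is a simplified Python model with hypothetical function names. It shows a hash-based aggregate that accepts rows in any order, and a streaming aggregate that relies on rows arriving sorted by the group key.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Unsorted input: keep one (sum, count) accumulator per group in a dict."""
    acc = {}
    for category, price in rows:
        total, count = acc.get(category, (0.0, 0))
        acc[category] = (total + price, count + 1)
    return {c: (total / count, count) for c, (total, count) in acc.items()}


def group_aggregate(sorted_rows):
    """Input sorted by category: emit each group as soon as the key changes."""
    for category, group in groupby(sorted_rows, key=itemgetter(0)):
        total, count = 0.0, 0
        for _, price in group:
            total += price
            count += 1
        yield category, (total / count, count)


rows = [("Food", 10.0), ("Clothing", 20.0), ("Food", 30.0)]
print(hash_aggregate(rows))
print(dict(group_aggregate(sorted(rows))))
```

The hash version holds every group in memory at once. The streaming version holds only the current group, but it only works if something upstream (an index scan or a Sort node) delivers the rows in order.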

Exercise 4: Subquery Optimization

Bad Query:

EXPLAIN ANALYZE
SELECT name, price
FROM products
WHERE id IN (
    SELECT product_id
    FROM order_items
    WHERE quantity > 10
);

If the planner cannot flatten the subquery and runs it as a correlated SubPlan (bad):

Seq Scan on products
  Filter: (SubPlan 1)
  SubPlan 1
    ->  Seq Scan on order_items
          Filter: ((product_id = products.id) AND (quantity > 10))
-- Disaster! Subquery runs for EACH product

Optimized (Semi Join):

Hash Semi Join  (cost=1000.00..2000.00 rows=500)
  Hash Cond: (products.id = order_items.product_id)
  ->  Seq Scan on products
  ->  Hash
        ->  Seq Scan on order_items
              Filter: (quantity > 10)
-- Single hash table, much faster

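Alternative Rewrite:

SELECT DISTINCT p.name, p.price
FROM products p
JOIN order_items oi ON p.id = oi.product_id
WHERE oi.quantity > 10;

-- Hash Join followed by a DISTINCT step
-- Rarely faster than the semi join: the join emits one row per matching
-- order item and DISTINCT has to remove the duplicates afterwards
-- Note: DISTINCT on (name, price) also merges different products that share both values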
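To make the difference between a semi join and a plain join concrete, here is a small in-memory sketch. It is only an illustration with made-up helper names and sample rows, and the `order_items` shape is assumed from the query above.

```python
def hash_semi_join(products, order_items, min_quantity):
    """Return each product at most once if any qualifying order item exists."""
    matching_ids = {
        oi["product_id"] for oi in order_items if oi["quantity"] > min_quantity
    }
    return [p for p in products if p["id"] in matching_ids]


def inner_join(products, order_items, min_quantity):
    """Plain join: one output row per matching (product, order item) pair."""
    return [
        p
        for p in products
        for oi in order_items
        if oi["product_id"] == p["id"] and oi["quantity"] > min_quantity
    ]


products = [
    {"id": 1, "name": "Keyboard", "price": 50},
    {"id": 2, "name": "Mouse", "price": 20},
]
order_items = [
    {"product_id": 1, "quantity": 11},
    {"product_id": 1, "quantity": 12},
    {"product_id": 2, "quantity": 1},
]

semi_rows = hash_semi_join(products, order_items, 10)
join_rows = inner_join(products, order_items, 10)
print(len(semi_rows), len(join_rows))
```

The semi join stops caring about a product after its first match. The plain join repeats the product for every matching order item, so a rewrite based on JOIN needs an extra de-duplication step to return the same answer.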

Exercise 5: Sorting vs Index

Query 1: Sort Required:

EXPLAIN ANALYZE
SELECT * FROM products
ORDER BY price DESC
LIMIT 10;

Without Index:

Limit  (cost=5250.00..5250.03 rows=10)
  ->  Sort  (cost=5250.00..5500.00 rows=100000)
        Sort Key: price DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Seq Scan on products

-- Top-N heapsort (efficient for LIMIT)
-- Only keeps top 10 in memory

With Index:

CREATE INDEX products_price_desc_idx ON products(price DESC);

EXPLAIN ANALYZE
SELECT * FROM products
ORDER BY price DESC
LIMIT 10;

-- Now:
Limit  (cost=0.42..0.85 rows=10)
  ->  Index Scan using products_price_desc_idx on products

-- No sort needed! The index is already ordered by price DESC
-- Just read the first 10 entries
-- (A plain index on price also works: PostgreSQL can scan a B-tree backward)
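The idea behind a top-N heapsort can be sketched with Python's `heapq` module. Instead of sorting all rows, it keeps a bounded heap of the best N seen so far. This is a simplified model of the technique, not PostgreSQL's implementation, and `top_n_by_price` is a hypothetical helper.

```python
import heapq


def top_n_by_price(rows, n):
    """Keep only the n most expensive (id, price) rows while streaming the input."""
    heap = []  # min-heap of (price, id): the cheapest kept row sits at heap[0]
    for row_id, price in rows:
        if len(heap) < n:
            heapq.heappush(heap, (price, row_id))
        elif (price, row_id) > heap[0]:
            # New row beats the current cheapest: swap it in, heap size stays n.
            heapq.heapreplace(heap, (price, row_id))
    return [(row_id, price) for price, row_id in sorted(heap, reverse=True)]


rows = [(i, float((i * 37) % 101)) for i in range(1, 101)]
print(top_n_by_price(rows, 3))
```

Memory use depends on N rather than on the table size, which is why the Sort node above reports only a few kilobytes even though it reads 100,000 rows.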

Part 9: Production Debugging Scenarios

Scenario 1: Unexpectedly Slow Query

Symptom:

-- This query suddenly became slow (was fast last week)
SELECT * FROM orders WHERE status = 'pending';

-- Takes 30 seconds now (was 100ms before)

Diagnosis:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';

Seq Scan on orders  (cost=0.00..50000.00 rows=1000000)
                    (actual time=0.01..15234.56 rows=999500)
  Filter: (status = 'pending'::text)
  Rows Removed by Filter: 500

-- Problem: Index exists but not used!
-- Estimated 1,000,000 rows, actual 999,500 rows

Root Cause:

-- Check index
\d orders
-- Index "orders_status_idx" BTREE (status)

-- Check statistics
SELECT
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- Result:
n_distinct: 3  (only 3 distinct values)
most_common_vals: {pending, completed, cancelled}
most_common_freqs: {0.9995, 0.0003, 0.0002}

-- Ah! "pending" is 99.95% of rows
-- Planner correctly chose Seq Scan (index would be slower)

Why it Changed:

Last week:
- 1M orders, 10K pending (1%) → Index Scan
- Fast!

This week:
- 1M orders, 999.5K pending (99.95%) → Seq Scan
- Slow (but optimal given data distribution)
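The planner's row estimate for an equality filter on a common value comes straight from the most-common-values list. The sketch below is a rough approximation with a hypothetical helper name, and it only handles values that appear in the list.

```python
def estimate_rows(total_rows, most_common_vals, most_common_freqs, value):
    """Estimate rows matching `column = value` using the MCV frequencies.

    Returns None when the value is not in the MCV list; the real planner
    falls back to other statistics in that case.
    """
    freqs = dict(zip(most_common_vals, most_common_freqs))
    if value not in freqs:
        return None
    return round(total_rows * freqs[value])


vals = ["pending", "completed", "cancelled"]
freqs = [0.9995, 0.0003, 0.0002]
print(estimate_rows(1_000_000, vals, freqs, "pending"))
print(estimate_rows(1_000_000, vals, freqs, "completed"))
```

When the estimate approaches the full table size, reading the table sequentially is cheaper than visiting nearly every row through an index, so the same index that helps `completed` is ignored for `pending`.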

Solutions:

-- Solution 1: Index on selective status
CREATE INDEX orders_completed_idx ON orders(id)
WHERE status = 'completed';

-- Now:
SELECT * FROM orders WHERE status = 'completed';
-- Uses partial index (fast)

SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan (correct, most rows)

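-- Solution 2: Split the table manually (declarative partitioning by status is the built-in alternative)
CREATE TABLE orders_pending (LIKE orders INCLUDING ALL);
CREATE TABLE orders_completed (LIKE orders INCLUDING ALL);

-- Move old completed orders out of the main table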

Scenario 2: Join Performance Regression

Symptom:

-- Join query degraded from 1s to 60s
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Diagnosis:

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Hash Left Join  (cost=5000.00..100000.00 rows=10000)
                (actual time=1000..58234 rows=10000 loops=1)
  Hash Cond: (u.id = o.user_id)
  Buffers: shared hit=50000 read=150000
          ^^^^^^^^^^^^^^^^^^
       Massive disk reads!
  ->  Seq Scan on users u
  ->  Hash  (cost=3000.00..3000.00 rows=5000000)
            (actual time=50000..50000 rows=5000000 loops=1)
        Buckets: 32768  Batches: 64  Memory Usage: 4096kB
                                ^^^
                         Spilled to disk!
        ->  Seq Scan on orders o

Root Cause:

- Orders table grew from 1M to 5M rows
- Hash table > work_mem (4MB default)
- Spilled to disk in 64 batches
- Each batch: Read from disk, hash, probe
- Result: 60x slowdown
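The relationship between hash table size, work_mem, and batch count can be approximated as follows. This is a deliberately simplified sketch. It assumes the batch count is the smallest power of two that lets one batch fit in work_mem, and it ignores per-tuple overhead and settings such as hash_mem_multiplier. The 256MB hash table size is an assumption, picked because it is consistent with 64 batches at 4MB.

```python
import math

MB = 1024 * 1024


def estimated_batches(hash_table_bytes, work_mem_bytes):
    """Smallest power-of-two batch count so that one batch fits in work_mem."""
    if hash_table_bytes <= work_mem_bytes:
        return 1
    needed = math.ceil(hash_table_bytes / work_mem_bytes)
    # Round up to the next power of two.
    return 1 << (needed - 1).bit_length()


print(estimated_batches(256 * MB, 4 * MB))
print(estimated_batches(256 * MB, 256 * MB))
```

Every batch beyond the first is written to temporary files and read back, so the fix is either to give the hash more memory or to make the hashed input smaller.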

Solution:

-- Temporary fix: Increase work_mem for this session
SET work_mem = '256MB';

-- Verify:
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Hash Left Join  (...)
  Hash  (...)
        Buckets: 32768  Batches: 1  Memory Usage: 256000kB
                                ^
                          In-memory now!
-- Back to 1s execution time

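-- Longer-term: raise the default for this database (applies to new sessions)
-- work_mem is granted per sort/hash operation, so size it conservatively
ALTER DATABASE mydb SET work_mem = '256MB';

-- OR: Index optimization (INCLUDE requires PostgreSQL 11+)
CREATE INDEX orders_user_id_covering_idx
ON orders(user_id) INCLUDE (id);

-- May allow Merge Join instead of Hash Join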

Scenario 3: Cardinality Mis-estimation

Symptom:

-- Query uses wrong join method
SELECT *
FROM products p
JOIN categories c ON p.category_id = c.id
WHERE p.price > 100 AND c.active = true;

-- Expected: Few rows
-- Actual: Takes forever

Diagnosis:

EXPLAIN ANALYZE
SELECT *
FROM products p
JOIN categories c ON p.category_id = c.id
WHERE p.price > 100 AND c.active = true;

Nested Loop  (cost=0.00..5000.00 rows=10)
             (actual time=0.1..15000.5 rows=50000 loops=1)
  ->  Seq Scan on categories c  (cost=0.00..10.00 rows=1)
                                (actual time=0.01..0.15 rows=5 loops=1)
        Filter: (active = true)
  ->  Index Scan on products p  (cost=0.42..490.00 rows=10)
                                (actual time=0.02..2800.0 rows=10000 loops=5)
                                                                     ^^^^^^^
                                                                  Ran 5 times!
        Index Cond: (category_id = c.id)
        Filter: (price > '100'::numeric)

-- Problem: Planner expected 10 rows, actually 50,000
-- Chose Nested Loop → Disaster: 5 inner index scans × 10,000 rows each (2.8s per loop, ~14s total)
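When a node reports loops greater than 1, its actual time and rows are per-loop averages, so the totals have to be multiplied out. The helper below is a hypothetical illustration of that arithmetic, using the inner Index Scan from the plan above.

```python
def node_totals(actual_time_ms, rows, loops):
    """Convert per-loop EXPLAIN ANALYZE figures into totals for the whole node."""
    return {
        "total_time_ms": actual_time_ms * loops,
        "total_rows": rows * loops,
    }


# Inner Index Scan on products: actual time=..2800.0 rows=10000 loops=5
print(node_totals(actual_time_ms=2800.0, rows=10_000, loops=5))
```

Multiplying out shows that this single node accounts for nearly all of the 15 seconds reported at the top of the plan.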

Root Cause:

-- Check statistics
SELECT
    tablename,
    attname,
    n_distinct,
    correlation
FROM pg_stats
WHERE tablename = 'products' AND attname IN ('price', 'category_id');

-- price column:
n_distinct: 1000  (Actually 10,000 distinct values)
correlation: 0.1  (Outdated)

-- Statistics stale!

Solution:

-- Update statistics
ANALYZE products;

-- Increase statistics detail for important columns
ALTER TABLE products ALTER COLUMN price SET STATISTICS 1000;
ALTER TABLE products ALTER COLUMN category_id SET STATISTICS 1000;
ANALYZE products;

-- Re-run query:
Hash Join  (cost=1000.00..8000.00 rows=50000)
  Hash Cond: (p.category_id = c.id)
  ->  Seq Scan on products p
        Filter: (price > '100'::numeric)
  ->  Hash
        ->  Seq Scan on categories c
              Filter: (active = true)

-- Now chooses Hash Join (correct!)
-- Execution time: 1s (was 15s)

Part 10: Best Practices & Checklist

Pre-Production Query Review Checklist

Before deploying a new query to production:

□ Run EXPLAIN ANALYZE on production-sized data
  - Use realistic data volumes
  - Test with actual distribution (not uniform test data)

□ Check for Seq Scans on large tables
  - Acceptable if: Returns >5% of rows
  - Red flag if: Returns <1% of rows
  - Fix: Add appropriate index

□ Verify row estimates are reasonable
  - If actual and estimated rows differ by more than 10x (in either direction): Run ANALYZE
  - If still off: Consider extended statistics

□ Check for memory spills
  - Look for "Batches > 1" in Hash nodes
  - Look for "external merge" in Sort nodes
  - Fix: Increase work_mem or reduce data

□ Look for high loop counts in joins
  - Nested Loop with loops=10000+ → Consider Hash Join
  - Fix: Add index or rewrite query

□ Validate ORDER BY optimization
  - Sort node present? Consider index
  - LIMIT present? Look for "Sort Method: top-N heapsort"

□ Test with concurrent load
  - Queries fast alone but slow under load?
  - May indicate lock contention or cache pressure

□ Set reasonable statement timeout
  SET statement_timeout = '30s';
  - Prevents runaway queries
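Parts of this checklist can be scripted. The sketch below scans EXPLAIN ANALYZE text output for three of the red flags listed above: hash spills, sort spills, and high loop counts. It is a hypothetical helper based on simple pattern matching over the text format shown in this module, so treat it as a starting point rather than a complete plan analyzer.

```python
import re


def find_red_flags(plan_text, max_loops=10_000):
    """Return a list of human-readable warnings found in EXPLAIN ANALYZE text."""
    flags = []
    for line in plan_text.splitlines():
        batches = re.search(r"Batches:\s*(\d+)", line)
        if batches and int(batches.group(1)) > 1:
            flags.append(f"hash spilled to disk ({batches.group(1)} batches)")
        if "external merge" in line:
            flags.append("sort spilled to disk (external merge)")
        loops = re.search(r"loops=(\d+)", line)
        if loops and int(loops.group(1)) >= max_loops:
            flags.append(f"high loop count ({loops.group(1)})")
    return flags


sample_plan = "\n".join([
    "Hash  (actual time=50000..50000 rows=5000000 loops=1)",
    "  Buckets: 32768  Batches: 64  Memory Usage: 4096kB",
    "  Sort Method: external merge  Disk: 102400kB",
    "  ->  Index Scan on orders o (actual time=0.01..0.02 rows=1 loops=25000)",
])
for flag in find_red_flags(sample_plan):
    print(flag)
```

A check like this fits well in a code-review or CI step that runs EXPLAIN ANALYZE against a production-sized staging database.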

Performance Tuning Parameters

work_mem (per sort or hash operation, not per query):

-- Default: 4MB (often too small)
-- Affects: Hash joins, sorts, bitmap operations

-- Finding optimal value:
-- 1. Run EXPLAIN (ANALYZE, BUFFERS)
-- 2. Check for "Batches > 1" or "external merge"
-- 3. Increase work_mem gradually

-- For specific query:
SET work_mem = '256MB';

-- Globally (careful! per operation, can multiply):
ALTER DATABASE mydb SET work_mem = '64MB';

-- Caution: 10 concurrent queries × 5 hash joins each × 64MB = 3.2GB memory!
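The multiplication in that caution is worth doing explicitly before changing a global setting. The helpers below are hypothetical and model only the simple worst case, where every operation uses its full work_mem allowance at the same time.

```python
def worst_case_work_mem_mb(concurrent_queries, ops_per_query, work_mem_mb):
    """Upper bound on memory used if every sort/hash node fills work_mem at once."""
    return concurrent_queries * ops_per_query * work_mem_mb


def safe_work_mem_mb(memory_budget_mb, concurrent_queries, ops_per_query):
    """Largest work_mem (in MB) that keeps the worst case within the budget."""
    return memory_budget_mb // (concurrent_queries * ops_per_query)


print(worst_case_work_mem_mb(10, 5, 64))
print(safe_work_mem_mb(2048, 10, 5))
```

Working backward from a memory budget gives a defensible global default. Individual heavy queries can still raise work_mem for their own session.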

effective_cache_size (hint to planner):

-- Tells planner how much RAM available for caching
-- Doesn't allocate memory, just informs decisions

-- Rule of thumb: 50-75% of system RAM
-- 16GB RAM server:
ALTER SYSTEM SET effective_cache_size = '12GB';

-- Makes planner favor index scans (knows data likely cached)

random_page_cost:

-- Default: 4.0 (assumes spinning disk)
-- For SSD: 1.1 - 1.5 (random ≈ sequential)

-- SSDs:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Makes planner favor index scans on SSDs

Statistics targets:

-- Default: 100 (ANALYZE samples 300 × target = ~30,000 rows)
-- For important columns: 1000+ (samples ~300,000 rows)

ALTER TABLE products ALTER COLUMN category_id SET STATISTICS 1000;
ANALYZE products;

-- Better histograms → Better estimates → Better plans

Common Pitfalls

1. OFFSET for pagination:

-- BAD: OFFSET scales linearly
SELECT * FROM products
ORDER BY id
LIMIT 10 OFFSET 100000;
-- Must scan 100,010 rows, return 10

-- GOOD: Keyset pagination (100000 is the last id seen on the previous page)
SELECT * FROM products
WHERE id > 100000
ORDER BY id
LIMIT 10;
-- Reads 10 rows via the primary key index
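On the application side, keyset pagination means remembering the last id of each page and passing it to the next query. The sketch below shows that loop using Python's built-in `sqlite3` module and an in-memory table, purely so it runs without a server. The table contents and function names are assumptions for illustration, and the same pattern applies to a PostgreSQL driver.

```python
import sqlite3


def build_demo_db(row_count=25):
    """Create an in-memory products table with ids 1..row_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(i, f"Product {i}") for i in range(1, row_count + 1)],
    )
    return conn


def fetch_page(conn, last_seen_id, page_size):
    """Return the next page of (id, name) rows after last_seen_id."""
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


def iterate_pages(conn, page_size):
    """Yield pages until the table is exhausted, carrying the last id forward."""
    last_seen_id = 0
    while True:
        page = fetch_page(conn, last_seen_id, page_size)
        if not page:
            return
        yield page
        last_seen_id = page[-1][0]


demo_conn = build_demo_db()
print([len(page) for page in iterate_pages(demo_conn, 10)])
```

Each request costs the same regardless of how deep the reader has paged, because the WHERE clause lets the index jump directly to the starting row.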

2. OR in WHERE clause:

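-- BAD: Can force a Seq Scan
SELECT * FROM users
WHERE email = 'alice@example.com' OR phone = '555-1234';
-- PostgreSQL can sometimes combine both indexes with a BitmapOr; when it cannot, neither index is used

-- GOOD: UNION ALL
SELECT * FROM users WHERE email = 'alice@example.com'
UNION ALL
SELECT * FROM users WHERE phone = '555-1234'
  AND email IS DISTINCT FROM 'alice@example.com';
-- Each branch can use its own index
-- IS DISTINCT FROM keeps rows whose email is NULL (email != ... would drop them)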

3. Function calls prevent index use:

-- BAD: Can't use index on email
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- GOOD: Expression index
CREATE INDEX users_lower_email_idx ON users(lower(email));
-- Now index can be used

-- OR: Store normalized (must be kept in sync on every write, e.g. with a trigger or a generated column)
ALTER TABLE users ADD COLUMN email_lower TEXT;
UPDATE users SET email_lower = lower(email);
CREATE INDEX users_email_lower_idx ON users(email_lower);
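The effect of an expression index can be observed without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made for portability. SQLite also supports indexes on expressions, and its `EXPLAIN QUERY PLAN` output differs from PostgreSQL's, but the before/after behavior illustrates the same principle.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


def expression_index_demo():
    """Show the plan for a lower(email) lookup before and after indexing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES ('Alice@Example.com')")
    sql = "SELECT id FROM users WHERE lower(email) = ?"
    params = ("alice@example.com",)

    before = query_plan(conn, sql, params)
    conn.execute("CREATE INDEX users_lower_email_idx ON users(lower(email))")
    after = query_plan(conn, sql, params)
    return before, after


plan_before, plan_after = expression_index_demo()
print("before:", plan_before)
print("after: ", plan_after)
```

The index is only considered when the query uses the same expression that was indexed, so `lower(email)` in the WHERE clause has to match the index definition.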

Summary

You’ve now mastered:

✅ Reading execution plans fluently
✅ Understanding cost calculations and row estimates
✅ Identifying performance bottlenecks (scans, joins, sorts)
✅ Optimizing queries systematically
✅ Debugging production issues
✅ Cross-database plan comparisons (PostgreSQL, MySQL, SQL Server)

Key Takeaways:

Always EXPLAIN ANALYZE before optimizing
Row estimates matter - stale stats cause bad plans
Indexes aren’t magic - wrong query, wrong index, or low selectivity (the predicate matches most rows) = no help
Join method matters - Nested Loop vs Hash vs Merge depends on data
Memory spills kill performance - tune work_mem appropriately
Think like the planner - predict its choices, guide it
