Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Indexing Deep Dive - B-tree structure and index types

Module 4: Indexing Deep Dive

Indexing is the single most impactful skill for database performance. A well-placed index can turn a query from minutes to milliseconds. This module gives you deep understanding of how indexes work and when to use them.
Estimated Time: 12-15 hours
Hands-On: Index analysis labs with real query plans
Key Skill: Knowing exactly which index to create for any query

4.1 How Indexes Work: Understanding the Fundamentals

What is an Index? (The Big Picture)

Real-World Analogy: Book Index vs. Table of Contents Think of a database table like a book:
  • Without an index: To find a topic, you’d read every page from start to finish (full table scan)
  • With an index: You look at the index at the back, find the page number, and jump directly there
Database Index = Book Index:
  • The index is a separate structure that stores values in sorted order
  • Each value points to where the actual data lives (like page numbers)
  • Finding data becomes much faster!
Another Analogy: Phone Book A phone book is essentially an index:
  • Names are sorted alphabetically (the index)
  • Each name points to a phone number and address (the data)
  • To find “Smith, John”, you don’t read every entry - you jump to the “S” section!
In Databases:
  • An index is a separate data structure that stores column values in sorted order
  • Each value points to the actual row location (like a page number)
  • Queries can use the index to find rows quickly instead of scanning everything

Why Do We Need Indexes?

The Performance Problem: Imagine a table with 10 million users:
users table (10,000,000 rows):
┌────┬──────────┬─────────────────────┐
│ id │   name   │       email         │
├────┼──────────┼─────────────────────┤
│  1 │ Alice    │ alice@example.com   │
│  2 │ Bob      │ bob@example.com    │
│  3 │ Charlie  │ charlie@example.com│
│ ...│ ...      │ ...                 │
│ 10M│ Zoe      │ zoe@example.com    │
└────┴──────────┴─────────────────────┘
Query: Find user with email = ‘john@example.com Without Index (Full Table Scan):
  1. Database starts at row 1
  2. Checks: Is email = ‘john@example.com’? No → Move to row 2
  3. Checks: Is email = ‘john@example.com’? No → Move to row 3
  4. … continues checking all 10 million rows …
  5. Finally finds it at row 5,234,567
  6. Time: 5-10 seconds (must check millions of rows!)
With Index:
  1. Database looks at the index (sorted by email)
  2. Uses binary search to find ‘john@example.com
  3. Finds it in ~25 comparisons (log₂(10,000,000) ≈ 24)
  4. Index says: “The row is at location X”
  5. Database jumps directly to location X
  6. Time: 1-5 milliseconds (only ~25 comparisons!)
The Difference:
  • Without index: O(n) - Linear time (check every row)
  • With index: O(log n) - Logarithmic time (binary search)
For 10 million rows:
  • Without index: Up to 10 million checks
  • With index: ~25 checks

The Problem: Full Table Scans

Without indexes, every query must scan every row:
-- Without index: scans 10 million rows
SELECT * FROM users WHERE email = 'john@example.com';
-- Time: 5-10 seconds

-- With index: jumps directly to the row
-- Time: 1-5 milliseconds
┌─────────────────────────────────────────────────────────────────────────┐
│                    FULL TABLE SCAN vs INDEX LOOKUP                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  SEQUENTIAL SCAN (No Index)              B-TREE INDEX LOOKUP            │
│  ───────────────────────────             ─────────────────              │
│                                                                         │
│  ┌─────┬─────┬─────┬─────┐              ┌─────────────────┐            │
│  │Row 1│Row 2│Row 3│ ... │              │    Root Node    │            │
│  └──┬──┴──┬──┴──┬──┴─────┘              │   [M values]    │            │
│     │     │     │                        └────────┬────────┘            │
│     ▼     ▼     ▼                                 │                     │
│   Check Check Check                       ┌───────┴───────┐             │
│   each  each  each                        ▼               ▼             │
│   row   row   row                    ┌─────────┐    ┌─────────┐        │
│     │     │     │                    │ Branch  │    │ Branch  │        │
│     ▼     ▼     ▼                    └────┬────┘    └────┬────┘        │
│  10 million comparisons                   │              │              │
│                                           ▼              ▼              │
│  O(n) - Linear time              ┌──────────┐    ┌──────────┐          │
│                                  │   Leaf   │    │   Leaf   │          │
│                                  │ (Target!)│    │          │          │
│                                  └──────────┘    └──────────┘          │
│                                                                         │
│                                  O(log n) - Logarithmic time            │
│                                  ~25 comparisons for 10M rows           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Understanding B-Tree: The Most Common Index Type

What is a B-Tree? (Before the technical details) A B-Tree is like a multi-level directory structure: Real-World Analogy: Library Organization Think of a library:
  • Top level (Root): “Books A-H” go left, “Books I-P” go middle, “Books Q-Z” go right
  • Middle level (Branches): “Books A-C” go left, “Books D-F” go middle, “Books G-H” go right
  • Bottom level (Leaves): Actual books sorted alphabetically
To find “Database Design” book:
  1. Root: “D” is in “A-H” section → Go left
  2. Branch: “D” is in “D-F” section → Go middle
  3. Leaf: Find “Database Design” → Done!
In Databases:
  • Root node: Contains ranges that point to branch nodes
  • Branch nodes: Contain ranges that point to leaf nodes
  • Leaf nodes: Contain actual values with pointers to table rows
Why B-Tree?
  • Balanced: All paths from root to leaves are the same length (guaranteed fast lookups)
  • Sorted: Values are in order (enables range queries like “find all prices between 10and10 and 20”)
  • Wide: Each node can hold many values (reduces tree height, fewer disk reads)
Visual Example (Simplified): Imagine indexing user IDs from 1 to 100:
                    [Root: 50]
                   /          \
          [Branch: 25]      [Branch: 75]
          /        \         /        \
    [1-12] [13-24] [26-37] [38-49] [51-62] [63-74] [76-87] [88-100]
     ↓       ↓       ↓       ↓       ↓       ↓       ↓        ↓
   Actual user rows with those IDs
To find user ID 23:
  1. Root: 23 < 50 → Go left
  2. Branch: 23 > 25? No, 23 is in left branch → Go to [13-24] leaf
  3. Leaf: Find 23 → Get pointer to actual row
  4. Total: 3 steps (instead of checking all 100 rows!)

B-Tree Index Structure

PostgreSQL’s default index type. Perfect for equality and range queries. Key Properties Explained:
  1. Balanced: All leaf nodes are at the same depth
    • Why? Ensures every lookup takes the same number of steps
    • Example: Finding ID 1 takes same steps as finding ID 1,000,000
  2. Sorted: Values are stored in order
    • Why? Enables range queries (“find all orders between Jan 1 and Jan 31”)
    • Example: Leaf nodes are linked, so you can scan ranges efficiently
  3. Dense: Every value in the table has an entry in the index
    • Why? Can find any value quickly
    • Trade-off: Index takes up space (but it’s worth it for speed!)
  4. Multi-level: Tree has multiple levels (root → branches → leaves)
    • Why? Keeps tree height small (fewer disk reads)
    • Example: 10 million rows might only need 3-4 levels
B-Tree Index Structure Visualization
┌─────────────────────────────────────────────────────────────────────────┐
│                         B-TREE STRUCTURE                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│                          ┌─────────────────────┐                        │
│                          │   [30]    [60]      │  Root Node             │
│                          └──┬─────┬─────┬──────┘                        │
│                             │     │     │                               │
│              ┌──────────────┘     │     └──────────────┐                │
│              ▼                    ▼                    ▼                │
│     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐         │
│     │ [10] [20]    │     │ [40] [50]    │     │ [70] [80]    │         │
│     └──┬──┬──┬─────┘     └──┬──┬──┬─────┘     └──┬──┬──┬─────┘         │
│        │  │  │              │  │  │              │  │  │                │
│        ▼  ▼  ▼              ▼  ▼  ▼              ▼  ▼  ▼                │
│   Leaf Nodes: Contain actual data pointers (ctid: page, offset)        │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────┐      │
│   │  [5,8,10] ──▶ [12,15,20] ──▶ [25,28,30] ──▶ ...            │      │
│   └─────────────────────────────────────────────────────────────┘      │
│   Leaf nodes are linked for efficient range scans                      │
│                                                                         │
│   Properties:                                                           │
│   • Balanced: All leaves at same depth                                  │
│   • Sorted: Keys in order (enables range queries)                       │
│   • Dense: Each leaf contains multiple keys                            │
│   • Linked: Leaves form a linked list                                  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

What Indexes Support

-- B-Tree supports these operators:
-- <, <=, =, >=, >, BETWEEN, IN, IS NULL, IS NOT NULL

-- Equality lookup: O(log n)
SELECT * FROM users WHERE email = 'john@example.com';

-- Range scan: O(log n + k) where k is result size
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- Sorting (if index order matches): No extra sort needed!
SELECT * FROM products ORDER BY price DESC LIMIT 10;

-- PREFIX pattern matching (starts with)
SELECT * FROM users WHERE name LIKE 'John%';  -- Uses index!
SELECT * FROM users WHERE name LIKE '%John';  -- Cannot use index!

4.2 Creating and Managing Indexes

Basic Index Creation

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Multi-column (composite) index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Index with included columns (covering index)
CREATE INDEX idx_orders_user_covering ON orders(user_id) 
INCLUDE (status, total);

-- Partial index (index only some rows)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';

-- Expression index (index on computed value)
CREATE INDEX idx_users_lower_email ON users(lower(email));

-- Concurrent creation (doesn't lock table)
CREATE INDEX CONCURRENTLY idx_products_name ON products(name);

Index Naming Convention

idx_<table>_<columns>_<type>

Examples:
- idx_users_email          -- Single column
- idx_orders_user_date     -- Multiple columns
- idx_users_email_unique   -- Unique index
- idx_orders_pending_part  -- Partial index
- idx_users_email_gin      -- Specific type (GIN, GiST, etc.)

Dropping and Rebuilding

-- Drop index
DROP INDEX idx_users_email;

-- Drop concurrently (doesn't lock)
DROP INDEX CONCURRENTLY idx_users_email;

-- Rebuild index (useful after lots of updates/deletes)
REINDEX INDEX idx_users_email;

-- Rebuild concurrently (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Rebuild all indexes on a table
REINDEX TABLE users;

4.3 Multi-Column Indexes

Understanding column order in composite indexes is crucial.

The Leftmost Prefix Rule

CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);
This index supports queries on:
  • (user_id) - First column
  • (user_id, status) - First two columns
  • (user_id, status, created_at) - All columns
  • (status) - Not leftmost column!
  • (status, created_at) - Not leftmost!
  • (user_id, created_at) - Skips status
┌─────────────────────────────────────────────────────────────────────────┐
│                    COMPOSITE INDEX STRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
```sql
CREATE INDEX idx_sessions_token ON sessions USING hash(token);

-- Only supports =
SELECT * FROM sessions WHERE token = 'abc123xyz';  -- Fast!
SELECT * FROM sessions WHERE token > 'abc';         -- Cannot use!
Hash indexes are rarely needed. B-Tree handles equality almost as fast and is more versatile. Use Hash only when you’re certain you’ll never need range queries.

GIN (Generalized Inverted Index)

Best for: Array containment, JSONB, full-text search
-- Array operations
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];
SELECT * FROM posts WHERE tags && ARRAY['sql', 'nosql'];  -- Overlap

-- JSONB operations
CREATE INDEX idx_products_attrs ON products USING gin(attributes);
SELECT * FROM products WHERE attributes @> '{"color": "red"}';
SELECT * FROM products WHERE attributes ? 'size';

-- Full-text search
CREATE INDEX idx_articles_search ON articles 
USING gin(to_tsvector('english', title || ' ' || body));

SELECT * FROM articles 
WHERE to_tsvector('english', title || ' ' || body) 
      @@ to_tsquery('english', 'postgresql & performance');
┌─────────────────────────────────────────────────────────────────────────┐
│                         GIN INDEX STRUCTURE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Data:                                                                  │
│  ┌────────────────────────────────────────────────────┐                │
│  │ id │ tags                                          │                │
│  ├────┼───────────────────────────────────────────────┤                │
│  │ 1  │ ['postgresql', 'database', 'sql']             │                │
│  │ 2  │ ['python', 'postgresql']                      │                │
│  │ 3  │ ['javascript', 'react']                       │                │
│  │ 4  │ ['database', 'nosql', 'mongodb']              │                │
│  └────┴───────────────────────────────────────────────┘                │
│                                                                         │
│  GIN Index (Inverted):                                                  │
│  ┌─────────────────────────────────────────────────────┐               │
│  │ Key          │ Posting List (row IDs)              │               │
│  ├──────────────┼─────────────────────────────────────┤               │
│  │ database     │ [1, 4]                              │               │
│  │ javascript   │ [3]                                 │               │
│  │ mongodb      │ [4]                                 │               │
│  │ nosql        │ [4]                                 │               │
│  │ postgresql   │ [1, 2]                              │               │
│  │ python       │ [2]                                 │               │
│  │ react        │ [3]                                 │               │
│  │ sql          │ [1]                                 │               │
│  └──────────────┴─────────────────────────────────────┘               │
│                                                                         │
│  Query: tags @> ARRAY['postgresql']                                    │
│  → Look up 'postgresql' → [1, 2] → Fetch rows 1 and 2                  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

GiST (Generalized Search Tree)

Best for: Geometric data, ranges, full-text with phrases
-- Range types
CREATE INDEX idx_events_duration ON events USING gist(during);
SELECT * FROM events 
WHERE during && '[2024-01-01, 2024-01-31]'::daterange;  -- Overlaps

-- Geometric (with PostGIS)
CREATE INDEX idx_locations_point ON locations USING gist(geom);
SELECT * FROM locations 
WHERE ST_DWithin(geom, ST_MakePoint(-73.99, 40.73)::geography, 1000);

-- Full-text with phrase search
CREATE INDEX idx_docs_search ON documents 
USING gist(to_tsvector('english', content));

BRIN (Block Range Index)

Best for: Very large tables with naturally ordered data (time series)
-- Perfect for append-only time series data
CREATE INDEX idx_logs_time ON logs USING brin(created_at);

-- Much smaller than B-Tree: stores min/max per block range
-- Table: 100GB → B-Tree: 2GB, BRIN: 2MB
┌─────────────────────────────────────────────────────────────────────────┐
│                         BRIN INDEX STRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Table blocks (pages):                                                  │
│  ┌─────────────┬─────────────┬─────────────┬─────────────┐             │
│  │ Block 0-127 │ Block 128-255│ Block 256-383│ Block 384+  │             │
│  │ Jan 1-7     │ Jan 8-14    │ Jan 15-21   │ Jan 22+     │             │
│  └─────────────┴─────────────┴─────────────┴─────────────┘             │
│                                                                         │
│  BRIN Index (tiny!):                                                    │
│  ┌───────────────────────────────────────────────────────┐             │
│  │ Block Range  │ Min Date    │ Max Date                 │             │
│  ├──────────────┼─────────────┼──────────────────────────┤             │
│  │ 0-127        │ Jan 1       │ Jan 7                    │             │
│  │ 128-255      │ Jan 8       │ Jan 14                   │             │
│  │ 256-383      │ Jan 15      │ Jan 21                   │             │
│  │ 384-511      │ Jan 22      │ Jan 28                   │             │
│  └──────────────┴─────────────┴──────────────────────────┘             │
│                                                                         │
│  Query: WHERE created_at = 'Jan 10'                                    │
│  → Check BRIN: Jan 10 could be in blocks 128-255                       │
│  → Scan only those blocks (not entire table!)                          │
│                                                                         │
│  Requirements:                                                          │
│  • Data must be physically clustered by index column                   │
│  • Works best for append-only tables                                   │
│  • Loses effectiveness if data is scattered                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

4.5 Advanced Index Techniques

Partial Indexes

Index only the rows you actually query.
-- Only index active users (most queries filter for active)
CREATE INDEX idx_users_email_active ON users(email)
WHERE deleted_at IS NULL;

-- Only index pending orders (statuses you filter often)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';

-- Index high-value orders
CREATE INDEX idx_orders_high_value ON orders(user_id, created_at)
WHERE total > 1000;

-- Size comparison
-- Full index: 500MB
-- Partial index: 50MB (if only 10% of rows match condition)

Expression Indexes

Index computed values.
-- Case-insensitive email lookup
CREATE INDEX idx_users_lower_email ON users(lower(email));

-- Now this uses the index:
SELECT * FROM users WHERE lower(email) = 'john@example.com';

-- Extract year for date queries
CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM created_at));

-- JSONB field extraction
CREATE INDEX idx_products_brand ON products((attributes->>'brand'));
SELECT * FROM products WHERE attributes->>'brand' = 'Apple';

Covering Indexes (Index-Only Scans)

Include extra columns to avoid table lookups entirely.
-- Regular index
CREATE INDEX idx_orders_user ON orders(user_id);

-- Query needs status and total:
SELECT status, total FROM orders WHERE user_id = 123;
-- 1. Lookup user_id in index → get row pointer
-- 2. Fetch row from table → get status, total
-- 2 operations!

-- Covering index (PostgreSQL 11+)
CREATE INDEX idx_orders_user_covering ON orders(user_id) 
INCLUDE (status, total);

-- Now:
-- 1. Lookup user_id in index → status, total already there!
-- Index-only scan (faster!)

-- Use EXPLAIN to verify:
EXPLAIN SELECT status, total FROM orders WHERE user_id = 123;
-- "Index Only Scan using idx_orders_user_covering"

Unique Indexes for Constraints

-- Primary key creates unique index automatically
CREATE TABLE users (
    id SERIAL PRIMARY KEY  -- Creates unique index on id
);

-- UNIQUE constraint creates unique index
ALTER TABLE users ADD CONSTRAINT unique_email UNIQUE (email);
-- Creates idx_users_email (unique)

-- Unique with condition (partial unique)
CREATE UNIQUE INDEX idx_users_email_active ON users(email)
WHERE deleted_at IS NULL;
-- Only active users need unique emails

-- Unique across multiple columns
CREATE UNIQUE INDEX idx_product_variants ON products(product_id, size, color);
-- Each combination must be unique

4.4 Internal Mechanics: B-Trees vs. LSM-Trees

While B-Trees are the standard for most relational databases, modern systems often choose between B-Trees and LSM-Trees depending on the read/write profile.

B-Trees: The Read-Optimized Classic

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. B-Tree Leaf Splits (The Write Penalty): When you insert a new key into a full node (typically 8KB or 16KB page), the database must perform a leaf split:
  1. A new page is allocated.
  2. Half the keys from the full node are moved to the new page.
  3. The parent node is updated to include a pointer to the new page.
  4. If the parent is also full, the split propagates up the tree.
Performance Impact: Leaf splits are expensive because they require extra I/O and locks on the tree structure, potentially causing latency spikes during heavy write bursts.

LSM-Trees: The Write-Optimized Modernist

Log-Structured Merge-Trees (LSM-Trees) are used in write-heavy systems like Cassandra, Bigtable, and RocksDB (and as an option in some relational extensions).
  1. Memtable: All writes are first buffered in an in-memory sorted structure (the memtable). This is fast because it’s purely in-RAM.
  2. SSTables: Once the memtable is full, it is flushed to disk as a sorted, immutable file called an SSTable (Sorted String Table).
  3. Compaction: Background processes merge smaller SSTables into larger ones, removing deleted entries and resolving updates.
Trade-offs:
  • Write Performance: Superior to B-Trees because writes are always sequential flushes.
  • Read Performance: Potentially slower than B-Trees because a single read might have to check multiple SSTables (Read Amplification).
FeatureB-Tree (PostgreSQL/MySQL)LSM-Tree (Bigtable/Cassandra)
Write TypeRandom / In-place updateSequential / Append-only
Write EfficiencyLower (Leaf splits/Locking)Very High
Read EfficiencyVery High (Direct path)Lower (May scan multiple files)
Space EfficiencyLower (Fragmented pages)Higher (Compressed SSTables)

4.5 Advanced Indexing Strategies

Partial Indexes for High Cardinality

A partial index is built over a subset of a table defined by a conditional expression. This is exceptionally powerful for optimizing queries that target a specific lifecycle state of a record.
-- Index only active accounts
-- Huge space savings if 90% of accounts are 'deactivated'
CREATE INDEX idx_active_accounts ON accounts(user_id)
WHERE status = 'active';

Functional and Expression Indexes

You can index the result of a function or expression, rather than just column values. This is essential when the query logic involves transformations.
-- Indexing a truncated date for reporting
CREATE INDEX idx_orders_day ON orders (date_trunc('day', created_at));

-- Indexing a specific key in a JSONB blob
CREATE INDEX idx_item_sku ON order_items ((details->>'sku'));

Covering Indexes and the Visibility Map

A Covering Index uses the INCLUDE clause to add payload columns to a B-Tree’s leaf nodes. This enables Index-Only Scans, but there’s a catch: The Visibility Map. For an index-only scan to be truly fast, the database must verify that the data in the index is “visible” to the current transaction without checking the table heap.
  • If the page is marked as “all-visible” in the Visibility Map, the DB skips the heap.
  • If the page has recently changed and isn’t marked yet, the DB must still fetch the row from disk, negating the speed benefit.
Pro-tip: Regular VACUUM (or aggressive autovacuum) is required to keep the Visibility Map fresh and ensure Index-Only Scans stay fast.

4.6 Reading Execution Plans

Understanding EXPLAIN output is crucial for index optimization.

EXPLAIN Basics

-- Basic explain (estimated costs only)
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- With actual execution stats
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';

-- Full details
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT * FROM users WHERE email = 'john@example.com';

Reading the Plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Limit  (cost=1500.00..1500.03 rows=10 width=40) (actual time=45.2..45.3 rows=10 loops=1)
  Buffers: shared hit=1200 read=50
  ->  Sort  (cost=1500.00..1510.00 rows=4000 width=40) (actual time=45.2..45.2 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1200.00..1250.00 rows=4000 width=40) (actual time=35.5..40.1 rows=4000 loops=1)
              Group Key: u.id
              Batches: 1  Memory Usage: 600kB
              ->  Hash Join  (cost=100.00..1000.00 rows=20000 width=32) (actual time=5.2..25.4 rows=20000 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..500.00 rows=30000 width=8) (actual time=0.01..8.5 rows=30000 loops=1)
                    ->  Hash  (cost=80.00..80.00 rows=4000 width=24) (actual time=4.8..4.8 rows=4000 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 250kB
                          ->  Index Scan using idx_users_created on users u  (cost=0.29..80.00 rows=4000 width=24)
                                Index Cond: (created_at > '2024-01-01')
                                Buffers: shared hit=100
Planning Time: 0.5 ms
Execution Time: 45.5 ms

Key Metrics to Watch

MetricGood SignBad Sign
Index Scan✅ Using index
Seq Scan on large table❌ Missing index
Buffers: shared hit✅ Data in cache
Buffers: read⚠️ Disk I/O
actual rowsestimated rows✅ Good statistics
actual rows >> estimated rows❌ Run ANALYZE
Sort Method: quicksort Memory✅ Fits in memory
Sort Method: external merge Disk❌ Needs more work_mem

Common Scan Types

┌─────────────────────────────────────────────────────────────────────────┐
│                         SCAN TYPE COMPARISON                            │
├──────────────────────┬──────────────────────────────────────────────────┤
│ Seq Scan             │ Reads entire table, row by row                  │
│                      │ O(n) - Use when: no index, or selecting most    │
│                      │ of the table                                     │
├──────────────────────┼──────────────────────────────────────────────────┤
│ Index Scan           │ Traverse index, then fetch rows from table      │
│                      │ O(log n + k) - Use when: selecting few rows     │
├──────────────────────┼──────────────────────────────────────────────────┤
│ Index Only Scan      │ All data from index, no table access            │
│                      │ Fastest! Requires covering index + visible rows │
├──────────────────────┼──────────────────────────────────────────────────┤
│ Bitmap Index Scan    │ Build bitmap of matching rows, then fetch       │
│                      │ Good for: medium selectivity, multiple indexes  │
│                      │ combining with OR                                │
└──────────────────────┴──────────────────────────────────────────────────┘

4.7 Index Maintenance

Monitoring Index Usage

-- Find unused indexes
SELECT 
    schemaname,
    relname AS table,
    indexrelname AS index,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find most used indexes
SELECT 
    relname AS table,
    indexrelname AS index,
    idx_scan AS times_used,
    idx_tup_read AS rows_read,
    idx_tup_fetch AS rows_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;

-- Tables with too many sequential scans (may need indexes)
SELECT 
    relname AS table,
    seq_scan,
    idx_scan,
    CASE WHEN idx_scan > 0 THEN 
        ROUND(100.0 * seq_scan / (seq_scan + idx_scan), 2)
    END AS seq_scan_pct,
    pg_size_pretty(pg_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC
LIMIT 20;

Index Bloat

Indexes accumulate dead entries over time, becoming larger and slower.
-- Check index size vs table size
SELECT 
    relname AS table,
    pg_size_pretty(pg_table_size(relid)) AS table_size,
    pg_size_pretty(pg_indexes_size(relid)) AS indexes_size,
    ROUND(100.0 * pg_indexes_size(relid) / NULLIF(pg_table_size(relid), 0), 2) AS index_ratio
FROM pg_stat_user_tables
ORDER BY pg_indexes_size(relid) DESC;

-- Rebuild bloated index
REINDEX INDEX CONCURRENTLY idx_orders_user;

-- Or create replacement and swap
CREATE INDEX CONCURRENTLY idx_orders_user_new ON orders(user_id);
DROP INDEX idx_orders_user;
ALTER INDEX idx_orders_user_new RENAME TO idx_orders_user;

Keeping Statistics Fresh

-- Update statistics for a table
ANALYZE users;

-- Update for entire database
ANALYZE;

-- Check when statistics were last updated
SELECT 
    relname,
    last_analyze,
    last_autoanalyze,
    n_live_tup,
    n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Configure autovacuum for busy tables
ALTER TABLE orders SET (
    autovacuum_analyze_scale_factor = 0.01,  -- Analyze at 1% change
    autovacuum_vacuum_scale_factor = 0.01    -- Vacuum at 1% dead tuples
);

4.8 Practice Exercises

Exercise 1: Index Design

Given these queries, design optimal indexes:
-- Query 1: User lookup by email
SELECT * FROM users WHERE email = 'john@example.com';

-- Query 2: User's recent orders
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at DESC 
LIMIT 10;

-- Query 3: Pending orders older than 1 hour
SELECT * FROM orders 
WHERE status = 'pending' 
AND created_at < NOW() - INTERVAL '1 hour';

-- Query 4: Product search by category and price range
SELECT * FROM products 
WHERE category_id = 5 
AND price BETWEEN 100 AND 500 
ORDER BY price;

-- Query 5: Count of orders by status per user (dashboard)
SELECT user_id, status, COUNT(*) 
FROM orders 
GROUP BY user_id, status;
-- Query 1: Simple unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Query 2: Composite with sort order
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

-- Query 3: Partial index on pending orders
CREATE INDEX idx_orders_pending_created ON orders(created_at)
WHERE status = 'pending';

-- Query 4: Composite for filter + sort
CREATE INDEX idx_products_category_price ON products(category_id, price);

-- Query 5: Covering index for aggregation
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Or include count columns if other columns needed

Exercise 2: Analyze Slow Query

-- This query is slow. Analyze and fix it.
EXPLAIN ANALYZE
SELECT 
    p.name,
    p.price,
    c.name as category_name,
    AVG(r.rating) as avg_rating,
    COUNT(r.id) as review_count
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN reviews r ON p.id = r.product_id
WHERE lower(p.name) LIKE '%laptop%'
AND p.price > 500
AND p.status = 'active'
GROUP BY p.id, p.name, p.price, c.name
HAVING COUNT(r.id) > 10
ORDER BY avg_rating DESC
LIMIT 20;
-- Problems identified:
-- 1. lower(p.name) LIKE '%laptop%' - Cannot use index (function + leading wildcard)
-- 2. No covering index for the main filter conditions
-- 3. Reviews aggregation might be slow

-- Solution 1: Full-text search for name
ALTER TABLE products ADD COLUMN name_search tsvector 
    GENERATED ALWAYS AS (to_tsvector('english', name)) STORED;
CREATE INDEX idx_products_name_search ON products USING gin(name_search);

-- Solution 2: Index for filters
CREATE INDEX idx_products_active_price ON products(price)
WHERE status = 'active';

-- Solution 3: Index for reviews aggregation
CREATE INDEX idx_reviews_product ON reviews(product_id);

-- Optimized query:
SELECT 
    p.name,
    p.price,
    c.name as category_name,
    AVG(r.rating) as avg_rating,
    COUNT(r.id) as review_count
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN reviews r ON p.id = r.product_id
WHERE p.name_search @@ to_tsquery('english', 'laptop')  -- Use full-text search
AND p.price > 500
AND p.status = 'active'
GROUP BY p.id, p.name, p.price, c.name
HAVING COUNT(r.id) > 10
ORDER BY avg_rating DESC
LIMIT 20;

Next Module

Module 5: Query Optimization

Learn systematic approaches to optimize any slow query

Interview Deep-Dive

Strong Answer:
  • The index can be partially used. PostgreSQL will use the B-Tree to navigate to the correct user_id value (the leading column), but then it must scan all entries for that user_id and apply the status filter as a “filter condition” rather than an “index condition.” The created_at column in the middle breaks the ability to skip directly to the status values.
  • This is the “leftmost prefix” rule in action. A B-Tree index on (A, B, C) is effectively three indexes: (A), (A, B), and (A, B, C). It is NOT usable as (A, C) with a skip scan — PostgreSQL does not currently implement B-Tree skip scans (though MySQL 8.0 does in limited cases, and it is under discussion for PostgreSQL).
  • In EXPLAIN output, you will see the query use an Index Scan with Index Cond: (user_id = 123) and Filter: (status = 'active'). The “Filter” line means rows are being fetched from the index and then discarded — wasted I/O. If the user has 100,000 entries but only 500 are active, you are reading 100,000 index entries to return 500 rows.
  • The fix depends on query patterns: if filtering by (user_id, status) is common, create a separate index on (user_id, status) or reorder the composite to (user_id, status, created_at). If both patterns are common, the cost of maintaining two indexes may be justified.
**Follow-up: How would you approach index design for a table with 15 different query patterns? You cannot create 15 indexes.This is where index consolidation becomes an art. I group queries by their leading filter column and sort requirements. Often 3-4 well-designed composite indexes can cover 12-15 query patterns. I prioritize by query frequency (pg_stat_statements tells me which queries run most often), then design indexes that satisfy the highest-frequency queries perfectly and the lower-frequency ones acceptably. I also look for opportunities to use partial indexes — if 80% of queries filter WHERE status = ‘active’, a partial index on the active rows is smaller and faster than a full index. The monitoring loop is: deploy, check pg_stat_user_indexes for idx_scan counts, drop any index with zero usage after 30 days.**
Strong Answer:
  • BRIN (Block Range Index) stores the minimum and maximum values for a range of physical table blocks (default 128 blocks per range). Instead of indexing every row, it indexes block ranges. This makes BRIN indexes extremely small — a BRIN index on a 100GB table might be 200KB versus 2GB for a B-Tree.
  • BRIN is ideal when the physical order of rows on disk strongly correlates with the indexed column’s logical order. The canonical example is a time-series table where rows are always inserted in chronological order. The BRIN index can say “blocks 0-127 contain timestamps from Jan 1-7, blocks 128-255 contain Jan 8-14” and skip irrelevant blocks entirely.
  • The failure mode is data that is not physically ordered. If you do bulk loads out of order, or if UPDATE operations create new tuple versions on random pages (because PostgreSQL’s MVCC appends updated tuples wherever there is free space), the BRIN ranges become very wide. A range that should say “Jan 1-7” instead says “Jan 1 to Dec 31” because an updated row landed in that block range. At that point, BRIN cannot eliminate any blocks and degrades to a sequential scan.
  • To maintain BRIN effectiveness: use append-only tables (INSERT only, no UPDATE), CLUSTER the table periodically if updates are unavoidable, and set a smaller pages_per_range for finer granularity at the cost of a slightly larger index.
Follow-up: You mentioned CLUSTER. What does it do, and why is it not commonly used in production?CLUSTER physically reorders the table rows on disk to match the order of a specified index. After CLUSTER, a BRIN index on the clustered column becomes maximally effective. The problem is that CLUSTER takes an ACCESS EXCLUSIVE lock on the table for the entire duration of the rewrite — for a 100GB table, that could be hours of total downtime. It also requires roughly 2x the table’s disk space temporarily. There is no CLUSTER CONCURRENTLY. For this reason, most production systems use pg_repack instead, which does an online table rewrite with minimal locking. Alternatively, if the table is partitioned by time, you can CLUSTER individual old partitions during off-peak hours since they receive no new writes.
Strong Answer:
  • Over-indexing has three costs that compound: (1) Write amplification — every INSERT must update all 12 indexes, every UPDATE to an indexed column must update the affected indexes, and every DELETE must mark entries in all indexes for cleanup. If the table has high write throughput, the cumulative index maintenance can easily double or triple the write latency. (2) Vacuum overhead — autovacuum must clean dead tuples from the table AND all 12 indexes. If vacuum cannot keep up, index bloat causes further degradation in a vicious cycle. (3) Planner overhead — the query planner must evaluate all available indexes for every query, increasing planning time. With 12 indexes and complex queries, planning time can exceed execution time.
  • The diagnostic process: query pg_stat_user_indexes to find indexes with idx_scan = 0 (never used) or very low scan counts. Drop those first. Then identify redundant indexes — an index on (user_id) is redundant if you also have (user_id, created_at), since the composite index covers single-column lookups. Use the pg_stat_user_tables view to check if n_dead_tup is growing faster than autovacuum can clean it.
  • The target: a well-tuned table typically has 3-5 indexes. Each index should justify its existence with measurable query performance improvement that outweighs its write cost. I use the rule: if an index is not used in at least 1% of the queries hitting that table (measured over 30 days), it should be dropped.
Follow-up: How do you safely drop an index in production without risking a query regression?First, mark the index as invalid without dropping it: there is no native “disable index” in PostgreSQL, but you can use UPDATE pg_index SET indisvalid = false WHERE indexrelid = 'idx_name'::regclass. This tells the planner to stop using the index, but the index still exists and is still maintained (so you can re-enable it quickly). Monitor for 1-2 weeks. If no query regressions appear, drop the index with DROP INDEX CONCURRENTLY idx_name. The CONCURRENTLY flag avoids taking an ACCESS EXCLUSIVE lock on the table. If you see regressions, re-enable with UPDATE pg_index SET indisvalid = true — instant rollback.