Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Query Optimization Concept

Module 5: Query Optimization

Every developer eventually faces the dreaded slow query. This module teaches you a systematic approach to diagnose performance issues and apply the right fixes. The most important mindset shift in query optimization: Stop guessing. The difference between a junior and a senior engineer facing a slow query is that the junior starts adding indexes, while the senior runs EXPLAIN ANALYZE first. Over 80% of query performance problems become obvious once you read the execution plan. The plan tells you exactly where time is being spent — you just need to learn the language it speaks.
Estimated Time: 10-12 hours
Hands-On: Real-world slow query analysis
Key Skill: Reading execution plans and applying targeted optimizations

5.1 The Query Optimization Process

Follow this systematic approach for any slow query:
┌─────────────────────────────────────────────────────────────────────────┐
│                    QUERY OPTIMIZATION WORKFLOW                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  1. MEASURE                                                             │
│     └──▶ Identify slow queries (pg_stat_statements, logs)              │
│          Get baseline execution time                                    │
│                                                                         │
│  2. EXPLAIN ANALYZE                                                     │
│     └──▶ Get execution plan with actual timings                        │
│          Identify the slowest operations                               │
│                                                                         │
│  3. DIAGNOSE                                                            │
│     └──▶ Sequential scans on large tables? → Add index                 │
│          Poor row estimates? → Run ANALYZE                             │
│          Memory spills to disk? → Increase work_mem                    │
│          Too many rows processed? → Rewrite query                      │
│                                                                         │
│  4. APPLY FIX                                                           │
│     └──▶ Add/modify index                                              │
│          Rewrite query                                                 │
│          Adjust configuration                                          │
│          Restructure schema                                            │
│                                                                         │
│  5. VERIFY                                                              │
│     └──▶ EXPLAIN ANALYZE again                                         │
│          Measure improvement                                           │
│          Test with production-like data                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

5.2 Finding Slow Queries

Using pg_stat_statements

-- Enable the extension (requires server restart or shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT 
    substring(query, 1, 100) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with highest average time
SELECT 
    substring(query, 1, 100) AS query_preview,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total
FROM pg_stat_statements
WHERE calls > 10  -- Filter out rare queries
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Queries with high variance (inconsistent performance)
SELECT 
    substring(query, 1, 100) AS query_preview,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    round(min_exec_time::numeric, 2) AS min_ms,
    round(max_exec_time::numeric, 2) AS max_ms
FROM pg_stat_statements
WHERE calls > 100 
AND stddev_exec_time > mean_exec_time  -- High variance
ORDER BY stddev_exec_time DESC
LIMIT 10;

Query Logging

-- Log queries slower than 1 second
ALTER SYSTEM SET log_min_duration_statement = '1000';  -- milliseconds
SELECT pg_reload_conf();

-- Log all queries (careful in production - lots of logs!)
ALTER SYSTEM SET log_statement = 'all';

-- View recent slow queries from logs
-- (Use your log viewer or tail the log files)

5.3 Reading Execution Plans Like a Pro

EXPLAIN Options

-- Basic plan (estimates only -- does NOT run the query)
-- Use this when you want to see the plan without side effects (e.g., DELETE/UPDATE)
EXPLAIN SELECT * FROM orders WHERE user_id = 123;

-- With actual execution (RUNS the query! Be careful with INSERT/UPDATE/DELETE)
-- Wrap destructive queries in a transaction and ROLLBACK:
--   BEGIN; EXPLAIN ANALYZE DELETE FROM orders WHERE ...; ROLLBACK;
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;

-- Full analysis with buffer stats (the gold standard for production debugging)
-- BUFFERS shows how many 8KB pages were read from cache vs disk
-- This is often more useful than timing for diagnosing I/O problems
EXPLAIN (ANALYZE, BUFFERS, TIMING, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123;

-- JSON format for programmatic analysis (useful for tools like pganalyze, dalibo)
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 123;
Practical tip: Always use EXPLAIN (ANALYZE, BUFFERS) rather than plain EXPLAIN ANALYZE. The BUFFERS output is what separates a superficial analysis from a real diagnosis. Without it, you know a node took 50ms, but you do not know whether that was 50ms of CPU computation or 50ms of waiting for disk reads.

Anatomy of a Plan

Limit  (cost=0.43..8.45 rows=10 width=152) (actual time=0.025..0.042 rows=10 loops=1)
  ^^^^   ^^^^^^^^^^^^     ^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^
    |         |             |              |                               |
    |         |             |              |                               |
  Node      Cost          Rows          Actual execution                 Loops
  Type    estimate       estimate          stats

  cost=startup..total   - Estimated cost units (not milliseconds!)
  rows=10               - Estimated rows returned
  actual time=0.025     - Real time for first row (ms)
  actual time=..0.042   - Real time for all rows (ms)
  rows=10               - Actual rows returned
  loops=1               - How many times this node executed

Common Plan Nodes

Reads entire table, checking each row against filter.
Seq Scan on orders  (cost=0.00..15406.00 rows=500000 width=100)
  Filter: (status = 'pending'::text)
  Rows Removed by Filter: 480000
When it’s OK: Small tables, selecting most rows When it’s bad: Large tables, selecting few rows → Add index
Uses index to find rows, then fetches from table.
Index Scan using idx_orders_user on orders  (cost=0.42..8.44 rows=1 width=100)
  Index Cond: (user_id = 123)
Good for: Selective queries (small % of rows) Watch for: High Buffers: read means data not cached
All data comes from the index itself — no heap (table) access needed. This is the fastest scan type because it avoids the random I/O of fetching rows from the table.
Index Only Scan using idx_orders_user_status on orders  (cost=0.42..4.44 rows=1 width=16)
  Index Cond: (user_id = 123)
  Heap Fetches: 0     -- 0 means the visibility map was current (table recently vacuumed)
Best case scenario! Requires: A covering index (includes all columns the query needs) AND a recently vacuumed table (so the visibility map is up-to-date).Common pitfall: If Heap Fetches is close to the row count, you are losing most of the benefit of an index-only scan. This usually means VACUUM has not run recently enough. Check pg_stat_user_tables.n_dead_tup for the table. Watch for: Heap Fetches > 0 means some table access needed
Two-phase: build bitmap from index, then fetch matching rows.
Bitmap Heap Scan on orders  (cost=10.00..100.00 rows=500 width=100)
  Recheck Cond: (status = 'pending'::text)
  ->  Bitmap Index Scan on idx_orders_status  (cost=0.00..9.88 rows=500 width=0)
        Index Cond: (status = 'pending'::text)
Good for: Medium selectivity, OR conditions Better than: Multiple Index Scans combined
For each row in outer, scan inner for matches.
Nested Loop  (cost=0.43..123.45 rows=10 width=200)
  ->  Index Scan on users  (cost=0.42..8.44 rows=1 width=100)
  ->  Index Scan on orders  (cost=0.01..115.00 rows=10 width=100)
        Index Cond: (user_id = users.id)
Best for: Small outer result, indexed inner Beware: O(n × m) if both sides are large!
Build hash table from smaller side, probe with larger.
Hash Join  (cost=100.00..500.00 rows=10000 width=200)
  Hash Cond: (orders.user_id = users.id)
  ->  Seq Scan on orders  (cost=0.00..300.00 rows=10000 width=100)
  ->  Hash  (cost=80.00..80.00 rows=1000 width=100)
        Buckets: 1024  Memory Usage: 80kB
        ->  Seq Scan on users  (cost=0.00..80.00 rows=1000 width=100)
Best for: No index on join column, larger result sets Watch for: Batches > 1 means spilling to disk
Sort both sides, merge like a zipper.
Merge Join  (cost=500.00..600.00 rows=10000 width=200)
  Merge Cond: (orders.user_id = users.id)
  ->  Sort  (cost=300.00..325.00 rows=10000)
        Sort Key: orders.user_id
  ->  Sort  (cost=200.00..210.00 rows=5000)
        Sort Key: users.id
Best for: Both sides already sorted (or have sorted index) Cost: O(n log n + m log m + n + m)

5.4 Common Performance Problems

Problem 1: Missing Index

-- Symptom: Seq Scan on large table for selective query
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;

--    Seq Scan on orders (cost=0.00..25000.00 rows=100 width=100) 
--                       (actual time=150.5..1523.2 rows=95 loops=1)
--      Filter: (user_id = 12345)
--      Rows Removed by Filter: 999905

-- Fix: Add index
CREATE INDEX idx_orders_user ON orders(user_id);

-- After:
--    Index Scan using idx_orders_user on orders 
--      (cost=0.42..95.25 rows=100 width=100) 
--      (actual time=0.03..0.52 rows=95 loops=1)

Problem 2: Poor Statistics

-- Symptom: Estimated rows wildly different from actual
EXPLAIN ANALYZE SELECT * FROM users WHERE status = 'vip';

--    Seq Scan on users (cost=0.00..1000.00 rows=50000 width=100) 
--                      (actual time=0.02..5.23 rows=50 loops=1)
--                                     ^^^^^^^            ^^
--                                   Estimated 50000, actual 50!

-- Fix: Update statistics
ANALYZE users;

-- Or for specific column:
ALTER TABLE users ALTER COLUMN status SET STATISTICS 1000;
ANALYZE users;

Problem 3: Inefficient Join Order

-- PostgreSQL usually picks good join order, but sometimes...
-- Symptom: Small table scanned many times in Nested Loop

EXPLAIN ANALYZE
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.country = 'US';

-- If planner gets it wrong, force join order:
SET join_collapse_limit = 1;  -- Respect explicit join order
SELECT * FROM 
    (SELECT * FROM users WHERE country = 'US') u  -- Filter first
    JOIN orders o ON o.user_id = u.id
    JOIN products p ON o.product_id = p.id;

Problem 4: Memory Spilling to Disk

-- Symptom: Sort or Hash spills to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at DESC;

--    Sort (cost=50000.00..52500.00 rows=1000000 width=100)
--      Sort Method: external merge  Disk: 120000kB  ← Problem!
--                   ^^^^^^^^^^^^^^
--    Better: Sort Method: quicksort  Memory: 25kB

-- Fix 1: Increase work_mem for this session
SET work_mem = '256MB';

-- Fix 2: Add index to avoid sort
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);

Problem 5: N+1 Queries

-- Symptom: Same query pattern repeated many times in logs
-- Application does:
-- 1. SELECT * FROM orders WHERE status = 'pending'
-- 2. For each order: SELECT * FROM users WHERE id = <user_id>

-- Fix: Single query with JOIN
SELECT o.*, u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending';

-- Or batch lookup
SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5, ...);

5.5 Query Rewriting Techniques

Simplifying Subqueries

-- Subquery in WHERE (often inefficient)
SELECT * FROM products
WHERE id IN (
    SELECT product_id FROM order_items
    WHERE order_id IN (
        SELECT id FROM orders WHERE user_id = 123
    )
);

-- Rewritten with JOINs (usually faster)
SELECT DISTINCT p.*
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.user_id = 123;

EXISTS vs IN vs JOIN

-- Find users who have placed orders

-- Method 1: IN (subquery executes first, can be slow with large results)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

-- Method 2: EXISTS (often faster, stops at first match)
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- Method 3: JOIN (depends on data, may return duplicates)
SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.id = o.user_id;

-- Method 4: Semi-join with LATERAL (PostgreSQL-specific, very flexible)
SELECT u.*
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id LIMIT 1
);

Pagination Optimization

-- OFFSET-based (slow for large offsets)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;
-- Must scan and discard 10000 rows!

-- Keyset/Cursor pagination (fast, O(log n))
SELECT * FROM orders 
WHERE id > 12345  -- Last seen ID
ORDER BY id 
LIMIT 20;

-- For complex sorts, use composite cursor
SELECT * FROM orders
WHERE (created_at, id) < ('2024-01-15 10:00:00', 54321)
ORDER BY created_at DESC, id DESC
LIMIT 20;

Aggregation Optimization

-- Counting with filters (scans filtered rows)
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- If you need approximate count, use table stats (instant!)
SELECT reltuples::bigint AS approximate_count
FROM pg_class WHERE relname = 'orders';

-- Pre-aggregated summary tables for dashboards
CREATE MATERIALIZED VIEW order_stats AS
SELECT 
    date_trunc('day', created_at) AS day,
    status,
    COUNT(*) AS count,
    SUM(total) AS revenue
FROM orders
GROUP BY 1, 2;

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY order_stats;

5.6 Configuration Tuning

Key Performance Parameters

-- Memory settings
shared_buffers = '4GB'          -- 25% of RAM for dedicated server
effective_cache_size = '12GB'   -- 75% of RAM (includes OS cache)
work_mem = '256MB'              -- Memory per operation (careful!)
maintenance_work_mem = '1GB'    -- For VACUUM, CREATE INDEX

-- Planner settings
random_page_cost = 1.1          -- Lower for SSD (default 4.0 for HDD)
effective_io_concurrency = 200  -- Higher for SSD
default_statistics_target = 100 -- Increase for better estimates

-- Parallelism
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_tuple_cost = 0.01
parallel_setup_cost = 100

Dynamic Configuration

-- Change for current session only
SET work_mem = '512MB';

-- Change for current transaction
SET LOCAL work_mem = '512MB';

-- Change system-wide (requires reload or restart)
ALTER SYSTEM SET shared_buffers = '8GB';
SELECT pg_reload_conf();

-- View current settings
SHOW work_mem;
SHOW ALL;

-- View non-default settings
SELECT name, setting, unit, source
FROM pg_settings
WHERE source != 'default';

5.7 Query Plan Caching

Understanding Prepared Statements

-- Generic plan (cached, parameter agnostic)
PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1;
EXECUTE get_user(123);

-- PostgreSQL uses generic plan after 5 executions
-- or earlier if custom plan cost is similar

-- Force custom plan (uses actual parameter values for planning)
SET plan_cache_mode = 'force_custom_plan';

-- View cached plans
SELECT * FROM pg_prepared_statements;

Plan Instability

-- Problem: Same query, different plans based on statistics

-- Check if your query gets different plans
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Run multiple times, compare plans

-- If plans are unstable, consider:
-- 1. More frequent ANALYZE
-- 2. Increase statistics target
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- 3. Use pg_hint_plan extension to force specific plan

5.8 Practice: Optimize These Queries

Query 1: Dashboard Stats

-- Current query takes 30 seconds
SELECT 
    COUNT(*) FILTER (WHERE status = 'pending') AS pending_count,
    COUNT(*) FILTER (WHERE status = 'processing') AS processing_count,
    COUNT(*) FILTER (WHERE status = 'shipped') AS shipped_count,
    SUM(total) FILTER (WHERE status = 'completed') AS revenue
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';
-- Problem: Full scan of orders table, counting multiple times

-- Solution 1: Materialized view (best for dashboards)
CREATE MATERIALIZED VIEW daily_order_stats AS
SELECT 
    date_trunc('day', created_at) AS day,
    status,
    COUNT(*) AS count,
    SUM(total) AS total
FROM orders
GROUP BY 1, 2;

CREATE UNIQUE INDEX ON daily_order_stats(day, status);

-- Query becomes instant:
SELECT 
    SUM(count) FILTER (WHERE status = 'pending') AS pending_count,
    SUM(count) FILTER (WHERE status = 'processing') AS processing_count,
    SUM(count) FILTER (WHERE status = 'shipped') AS shipped_count,
    SUM(total) FILTER (WHERE status = 'completed') AS revenue
FROM daily_order_stats
WHERE day > CURRENT_DATE - INTERVAL '30 days';

-- Solution 2: Partial indexes + single pass
CREATE INDEX idx_orders_30d ON orders(created_at, status) 
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';
-- Note: Partial index condition must match query exactly

-- Solution 3: Use status-specific indexes for counts
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
-- Then count each status separately with UNION ALL
-- Takes 5+ seconds
SELECT p.*, c.name as category_name, 
       COUNT(r.id) as review_count, 
       AVG(r.rating) as avg_rating
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
LEFT JOIN reviews r ON r.product_id = p.id
WHERE lower(p.name) LIKE '%wireless%'
  AND p.price BETWEEN 50 AND 200
  AND p.status = 'active'
GROUP BY p.id, c.name
HAVING AVG(r.rating) >= 4.0 OR COUNT(r.id) = 0
ORDER BY avg_rating DESC NULLS LAST, review_count DESC
LIMIT 20;
-- Problems:
-- 1. lower(p.name) LIKE '%wireless%' can't use index
-- 2. Aggregating reviews for every matched product
-- 3. HAVING filter after expensive GROUP BY

-- Solution: Full-text search + pre-aggregated stats

-- Add full-text search column
ALTER TABLE products ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (to_tsvector('english', name || ' ' || COALESCE(description, ''))) STORED;
CREATE INDEX idx_products_search ON products USING gin(search_vector);

-- Pre-aggregate review stats (update via trigger or periodic job)
ALTER TABLE products ADD COLUMN review_count INTEGER DEFAULT 0;
ALTER TABLE products ADD COLUMN avg_rating DECIMAL(3,2);

-- Create index for the filter conditions
CREATE INDEX idx_products_active_price ON products(price)
WHERE status = 'active';

-- Optimized query:
SELECT p.*, c.name as category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
WHERE p.search_vector @@ to_tsquery('english', 'wireless')
  AND p.price BETWEEN 50 AND 200
  AND p.status = 'active'
  AND (p.avg_rating >= 4.0 OR p.review_count = 0)
ORDER BY p.avg_rating DESC NULLS LAST, p.review_count DESC
LIMIT 20;

-- Went from 5s to under 50ms

Next Module

Module 6: PostgreSQL Internals

Understand how PostgreSQL works under the hood

Interview Deep-Dive

Strong Answer:
  • This is a classic plan regression scenario. The query itself did not change, but something the planner depends on did. My diagnosis checklist, in order of likelihood:
  • (1) Statistics changed. Run EXPLAIN ANALYZE and compare estimated rows vs. actual rows. If the estimates are wildly off (e.g., estimated 100 rows, actual 500,000), the planner chose a nested loop where it should have chosen a hash join, or an index scan where it should have chosen a sequential scan. Fix: run ANALYZE on the affected tables. If a specific column has skewed distribution, increase its statistics target: ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000; ANALYZE orders;
  • (2) An index was dropped, became invalid, or bloated. Check pg_stat_user_indexes for the relevant indexes. Check if a concurrent REINDEX failed partway, leaving an invalid index. Check index size vs. table size ratio — if the index is larger than the table, it is severely bloated and needs REINDEX CONCURRENTLY.
  • (3) Table bloat. If autovacuum has been struggling (check n_dead_tup in pg_stat_user_tables), the table may have doubled in size with dead tuples. Sequential scans now read 2x the data. The planner’s cost estimates may also be wrong because pg_class.relpages reflects the bloated size.
  • (4) Memory pressure. If another workload consumed shared_buffers or the OS page cache was evicted (perhaps a backup ran), the same query now hits disk instead of cache. Check EXPLAIN (ANALYZE, BUFFERS) — if Buffers: read is high where it was previously all shared hit, this is the cause.
  • (5) Lock contention. The query itself is fast, but it is waiting for a lock held by another transaction. Check pg_stat_activity for wait_event_type = 'Lock'.
Follow-up: How would you prevent plan regressions from happening in the first place?Three strategies: (1) Pin critical query plans using prepared statements with plan_cache_mode = force_custom_plan for queries that are sensitive to parameter values. (2) Monitor plan changes using pg_stat_statements — track the mean_exec_time over time and alert when it jumps by more than 3x. (3) For the most critical queries, maintain a test suite that runs EXPLAIN on production-like data volumes and asserts on the plan shape (e.g., “this query must use Index Scan, not Seq Scan”). This catches regressions before deployment.
Strong Answer:
  • Nested Loop: For each row in the outer relation, scan the inner relation for matches. O(n * m) in the worst case, but with an index on the inner relation’s join column, the inner scan is O(log m) per row, making it O(n * log m). The planner chooses this when the outer relation is small (a few hundred rows or fewer) and the inner relation has a good index. It gets it wrong when the planner underestimates the outer relation’s cardinality — estimating 100 rows but getting 100,000 turns a fast plan into a catastrophe.
  • Hash Join: Build a hash table from the smaller relation (the “build” side), then probe it with each row from the larger relation (the “probe” side). O(n + m) with O(min(n,m)) memory. The planner chooses this for equi-joins when neither side has a useful sort order and at least one side fits in work_mem. Watch for Batches: 4 in EXPLAIN — this means the hash table did not fit in work_mem and spilled to disk, dramatically slowing the join. Fix: increase work_mem for the session, or add an index to enable a nested loop instead.
  • Merge Join: Sort both relations on the join key, then merge them like a zipper. O(n log n + m log m) for the sorts, then O(n + m) for the merge. The planner chooses this when both sides are already sorted (from an index scan or a preceding sort) or when the result itself needs to be sorted. Merge joins are rare in OLTP but common in analytics and data warehouse queries with large sorted result sets.
  • The planner gets it wrong most often on nested loops: a cardinality underestimate on the outer side turns what should be a hash join into a devastating nested loop. If you see a nested loop with loops=500000 in the inner child, that is almost certainly a mis-estimate. Run ANALYZE, increase statistics targets on the relevant columns, or in extreme cases use SET enable_nestloop = off for that specific query session to force the planner away from the bad plan.
Follow-up: Can you force a specific join type in PostgreSQL? Should you?You can disable join types using SET enable_hashjoin = off, SET enable_mergejoin = off, etc. This is a blunt instrument — I would only use it as a temporary workaround with a comment explaining why and a ticket to fix the underlying statistics issue. A better approach is pg_hint_plan extension which lets you annotate individual queries with hints like /*+ HashJoin(orders users) */ without affecting the global planner behavior. The principle: fix the inputs (statistics, indexes) rather than overriding the outputs (plan choices).
Strong Answer:
  • You do not. Real-time aggregation over 500M rows is fundamentally incompatible with sub-second response times on a single PostgreSQL instance. The correct approach is pre-aggregation at an appropriate granularity.
  • Strategy 1: Materialized views with periodic refresh. Create a materialized view that pre-computes daily/hourly aggregates. Refresh it every N minutes with REFRESH MATERIALIZED VIEW CONCURRENTLY. The dashboard queries the materialized view (which has thousands of rows, not millions) and responds in milliseconds. The tradeoff is data staleness bounded by the refresh interval.
  • Strategy 2: Summary tables maintained by triggers. For true real-time needs, maintain a summary table that is updated incrementally. An AFTER INSERT trigger on the events table increments the counter in the summary table. The dashboard reads the summary table. The tradeoff is write overhead (every insert triggers a summary update) and complexity in handling UPDATE/DELETE on the source table.
  • Strategy 3: BRIN-indexed partitioned table with partition pruning. If the dashboard always filters by time range (e.g., last 30 days), partition the table by month. The query only scans 1-2 partitions instead of the full 500M rows. Combined with a BRIN index on the timestamp, this can reduce the scan to a manageable subset.
  • Strategy 4: External OLAP engine. For truly complex real-time analytics, replicate the data to ClickHouse, Apache Druid, or TimescaleDB which are purpose-built for analytical aggregations. PostgreSQL’s logical replication can feed these systems with minimal lag.
Follow-up: What is the difference between REFRESH MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW CONCURRENTLY?Without CONCURRENTLY, the refresh takes an ACCESS EXCLUSIVE lock on the materialized view — all readers block until the refresh completes. For a large aggregation, that could be minutes of downtime for the dashboard. With CONCURRENTLY, PostgreSQL builds the new data in a temporary location and then swaps it in atomically, allowing readers to continue querying the old data during the rebuild. The cost: CONCURRENTLY requires a unique index on the materialized view and takes roughly 2x the time because it must diff the old and new versions. Always use CONCURRENTLY in production.