
Performance Engineering: The Bottleneck Analysis Guide

Performance engineering is not about “tuning parameters”—it is a systematic process of identifying the bottleneck in a complex, layered system and applying the minimum change required to shift that bottleneck to a more scalable resource.
Target Audience: Staff/Principal Database Engineers
Core Methodology: The USE Method + Wait Event Profiling
Key Goal: Move from “guessing” to “evidential analysis”

The Performance Engineering Mindset

A principal engineer view of performance follows three laws:
  1. Amdahl's Law: The speedup of a system is limited by its serial components.
  2. Little's Law: The average number of items in a system = arrival rate × average time spent in the system (L = λW); a worked example follows this list.
  3. The Law of Bottlenecks: Performance is always limited by the most saturated resource. Optimizing anything else is a waste of time.
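
Little's Law can be sanity-checked directly against live statistics; a minimal sketch, assuming pg_stat_statements is installed and PostgreSQL 13+ column names:
-- Little's Law: L = λW, where λ ≈ calls/sec and W ≈ mean execution time.
-- Example: at λ = 2,000 queries/sec and W = 25 ms, L = 2,000 × 0.025 = 50 backends busy on average.
SELECT
    sum(calls)                                                        AS total_calls,
    round(sum(total_exec_time)::numeric, 0)                           AS total_exec_ms,
    round((sum(total_exec_time) / nullif(sum(calls), 0))::numeric, 2) AS avg_ms_per_call
FROM pg_stat_statements;
-- Sample twice, diff the counters to get λ and W, then compare the implied L
-- with count(*) FROM pg_stat_activity WHERE state = 'active'.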

Phase 0: Triage & The Four Golden Signals

Before diving into profiles, you must identify the symptom. We use the “Four Golden Signals” to determine if the system is healthy.
Signal | Metric | Principal Question
Latency | P99 Response Time | Is the database slow, or is it just a few "outlier" queries?
Throughput | Queries Per Second (QPS) | Are we handling the load, or is the engine stalling?
Errors | Rollback/Conflict Rate | Are we failing due to deadlocks or serialization issues?
Saturation | Resource % (CPU/IO) | How much "headroom" is left before a total collapse?

Triage Query: The 30-Second Snapshot

-- Identify the "Current Reality" of the engine
SELECT 
    state,
    wait_event_type,
    wait_event,
    count(*) as count,
    round(100.0 * count(*) / sum(count(*)) over(), 2) as pct
FROM pg_stat_activity
WHERE state != 'idle'
GROUP BY 1, 2, 3
ORDER BY 4 DESC;
If wait_event_type is NULL for most sessions, you are CPU Bound. If it is ‘IO’, you are Disk Bound. If it is ‘LWLock’, you are Contention Bound.

Phase 1: Resource Saturation (The USE Method)

Developed by Brendan Gregg, the USE Method checks Utilization, Saturation, and Errors for every resource (CPU, Memory, Disk, Network).

1.1 CPU: Computation vs. Context Switching

Utilization: top or mpstat. High utilization (>80%) is expected in high-performance databases.
Saturation: Look at the run queue length (load average). If load > number of cores, tasks are waiting for a CPU slot.
Principal Observation: High CPU with low throughput often indicates Spinlock Contention or Excessive Context Switching.
# Check context switches per second
vmstat 1
# 'cs' column > 100k/sec usually indicates a problem (e.g., too many connections)

1.2 Memory: The Buffer Hit Ratio Myth

Utilization: free -m.
Saturation: Scanning/paging; look at si/so (swap in/out) in vmstat.
Staff Engineer Insight: A "99% Buffer Cache Hit Ratio" can be a trap. If you have 1,000,000 hits and 10,000 reads, your ratio is 99%. But if those 10,000 reads are random I/Os on a slow disk, your latency will still be terrible. Focus on the absolute number of reads/sec, not the ratio.
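
A minimal sketch of that approach, sampling the standard pg_stat_database counters (the 60-second interval is just an example):
SELECT datname, blks_read, blks_hit, now() AS sampled_at
FROM pg_stat_database
WHERE datname = current_database();
-- Run it again 60 s later: (blks_read_2 - blks_read_1) / 60 = physical reads/sec.
-- 10,000 random 8 kB reads/sec on slow storage hurts even behind a "99%" hit ratio.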

1.3 Disk: IOPS vs. Throughput

Utilization: %util in iostat. Saturation: await (Average Wait Time).
iostat -xy 1
# If await > 5ms on an NVMe drive, the storage is saturated.
# If %util is 100% but rkB/s is low, you have a Random I/O bottleneck.

Phase 2: Database Internal Waits

If resources are not saturated but latency is high, the database is likely waiting on internal synchronization.

2.1 The Hierarchy of Waits

Wait Class | Principal Archetype | Root Cause
LWLock | buffer_content | Shared Buffer Contention. Too many backends trying to access the same page (Hot Block).
LWLock | WALWriteLock | WAL Serializer Bottleneck. Too many small transactions; slow fsync to the WAL disk.
Lock | relation | Catalog Contention. An uncommitted DDL (e.g., ALTER TABLE) is blocking all DML.
IO | DataFileRead | Cache Miss. The query is pulling cold data from disk into the buffer pool.

2.2 Advanced: ProcArrayLock Contention

In high-concurrency systems (1000+ TPS), ProcArrayLock often becomes the global bottleneck. This lock is used when a transaction needs to compute a “Snapshot” of the system (visibility). Staff Engineer Insight: If you see ProcArrayLock as a top wait event, scaling CPU won’t help. You must reduce the transaction rate, use a connection pooler (PgBouncer) to reduce backend count, or move to a version of Postgres (14+) with snapshot caching optimizations.
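
A quick check for this condition; note that the LWLock wait-event name varies by version, so both spellings are matched here:
SELECT count(*) AS procarray_waiters
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event IN ('ProcArray', 'ProcArrayLock');  -- 'ProcArray' on PostgreSQL 13+, 'ProcArrayLock' on older versions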

Phase 3: Code-Level Profiling (Flamegraphs)

When you are CPU Bound and cannot find a simple query fix, you must look at what the CPU is actually doing inside the Postgres binary.

3.1 Capturing the Flamegraph

# Profile all postgres backends for 60 seconds
perf record -F 99 -g -p $(pgrep -d',' postgres) -- sleep 60

# Generate the SVG
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > db_profile.svg

3.2 Interpreting “Staff-Level” Patterns

Pattern | Code Path | Meaning
Wide Plateau | hash_search | The hash table (for joins or aggregates) is huge and the engine is spending its time in bucket lookups. Consider work_mem tuning.
Wide Plateau | slot_deform_tuple | The tuple-deforming tax. Postgres spends massive CPU "deforming" rows to reach a column at the end of a wide table. Move heavily accessed columns to the front (see the sketch after this table).
Jagged Towers | ExecQual | Massive expression evaluation. Likely a complex WHERE clause with non-indexed functions.
Wide Tower | LWLockAcquire | Lock Contention. The engine is spending more time waiting for metadata than doing work.
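
To make the slot_deform_tuple row concrete, here is an illustrative sketch; the table names are hypothetical and the "wide" table is abbreviated:
-- To read a column, the executor must step over every preceding column of the tuple,
-- so a hot column at the end of a wide row maximizes deforming cost.
CREATE TABLE orders_wide (
    id       bigint,
    payload  jsonb,    -- imagine dozens of rarely-read columns here
    status   text      -- hot column buried at the end: expensive to reach
);

-- Preferred layout: hot columns near the front.
CREATE TABLE orders_hot_first (
    id       bigint,
    status   text,     -- hot column first
    payload  jsonb
);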

Phase 4: Query-Level Optimization (The 80/20 Rule)

You cannot optimize every query. A principal engineer identifies the 20% of queries causing 80% of the load.

4.1 The pg_stat_statements Strategy

-- Identify queries with the highest I/O impact
SELECT 
    query,
    calls,
    round(total_exec_time::numeric, 2) as total_ms,
    shared_blks_read + shared_blks_written as io_blocks,
    round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as hit_ratio
FROM pg_stat_statements
ORDER BY io_blocks DESC
LIMIT 10;
Staff Engineer Insight: Look for queries with high calls but low rows. This indicates index-thrashing or unnecessary repeat work. A query called 10,000 times a second that takes 1ms is often more dangerous than a 10s query called once an hour.
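
A minimal companion query for spotting that pattern (calls and rows are standard pg_stat_statements columns):
SELECT
    left(query, 80) AS query_preview,
    calls,
    rows,
    round(rows::numeric / nullif(calls, 0), 2) AS rows_per_call
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;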

4.2 The “Temp File” Trap

If work_mem is too small, Postgres spills sorts and hashes to disk. This is a silent performance killer.
SELECT query, temp_blks_read, temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_read > 0
ORDER BY 2 DESC;
Solution: Increase work_mem for specific users or queries, or optimize the join to reduce result set size.
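
A sketch of how to scope the increase; the role name analytics is hypothetical:
ALTER ROLE analytics SET work_mem = '256MB';  -- applies to new sessions of that role
SET work_mem = '256MB';                       -- or per session, right before the heavy query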

Phase 5: Concurrency & Lock Contention

5.1 The Blocker Tree

Use this recursive query to find the “Root Blocker” causing a pile-up.
WITH RECURSIVE blockers AS (
  SELECT 
    pid, 
    pg_blocking_pids(pid) AS pids 
  FROM pg_stat_activity 
  WHERE cardinality(pg_blocking_pids(pid)) > 0
  UNION ALL
  SELECT 
    s.pid, 
    pg_blocking_pids(s.pid) 
  FROM pg_stat_activity s 
  JOIN blockers b ON s.pid = ANY(b.pids)
)
SELECT 
  pid, 
  state, 
  wait_event, 
  query 
FROM pg_stat_activity 
WHERE pid IN (SELECT unnest(pids) FROM blockers)
  AND pid NOT IN (SELECT pid FROM blockers);

5.2 Common Contention Archetypes

  1. The Hot Row: Millions of updates to a single row (e.g., a counter). Solution: Use batching or CRDT-style additive counters (see the sketch after this list).
  2. The Foreign Key Lock: Inserting or updating a child row takes a row-level share lock (FOR KEY SHARE) on the referenced parent row. Combine that with frequent parent updates and the parent row becomes a contention point.
  3. The Index Build: Creating an index without CONCURRENTLY locks the whole table for writes.
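
A sketch of the additive-counter pattern from item 1; the table and values are illustrative:
-- Instead of UPDATE counters SET n = n + 1 WHERE id = 1 (one hot tuple),
-- append small deltas and aggregate on read or in a periodic roll-up job.
CREATE TABLE counter_shards (
    counter_id int,
    shard      int,     -- spread writers across shards, e.g. 0..15
    delta      bigint
);

INSERT INTO counter_shards VALUES (1, floor(random() * 16)::int, 1);

SELECT sum(delta) AS current_value
FROM counter_shards
WHERE counter_id = 1;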

Phase 6: Production Troubleshooting Runbook

Symptom | Initial Check | Probable Root Cause
Spiking Latency | pg_stat_activity | Lock pile-up or a "plan flip" (the planner chose a bad plan due to stale statistics).
High Load Avg | vmstat / top | High context switching or spinlock contention.
I/O Saturation | iostat | Checkpoint spike or a huge sequential scan.
Out of Memory | dmesg | work_mem is set too high for the connection count.

Summary: The Principal’s Workflow

  1. Start at the OS: Is there a resource (CPU/IO) at 100%? (USE Method)
  2. Look at the Waits: Is the database waiting on itself? (pg_stat_activity)
  3. Find the Culprit: Which queries are causing the most impact? (pg_stat_statements)
  4. Profile if Needed: If CPU is high but queries look fine, use perf and Flamegraphs.
  5. Optimize Evidentially: Change one variable, measure the throughput shift, and repeat.

5.2 PgBouncer Configuration

# pgbouncer.ini for high performance
# Note: keep comments on their own lines in pgbouncer.ini

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
# Pooling mode: transaction pooling suits most workloads
pool_mode = transaction

# Pool sizing
# default_pool_size: server connections per user/database pair
# min_pool_size: keep some connections warm
# reserve_pool_size / reserve_pool_timeout: burst headroom and how long to wait before using it
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3

# Connection limits
# max_client_conn: max client connections to pgbouncer
# max_db_connections: max connections to the actual database
max_client_conn = 1000
max_db_connections = 50

# Timeouts
# query_timeout: cancel queries running longer than 30 s
# client_idle_timeout: close idle clients after 10 min
# server_idle_timeout: close idle server connections after 60 s
query_timeout = 30
client_idle_timeout = 600
server_idle_timeout = 60

# TCP keepalives
tcp_keepalive = 1
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3

# Logging: reduce log spam in production
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1

# Admin
admin_users = pgbouncer_admin
stats_users = pgbouncer_stats

5.3 Memory Configuration

-- Memory settings for different workload types

-- OLTP workload (many small queries)
shared_buffers = '8GB'              -- 25% of RAM (up to ~8GB)
effective_cache_size = '24GB'       -- 75% of RAM
work_mem = '64MB'                   -- Conservative for many connections
maintenance_work_mem = '2GB'        -- For vacuum, index builds
wal_buffers = '64MB'

-- OLAP workload (few large queries)
shared_buffers = '32GB'             -- Can be larger for analytics
effective_cache_size = '96GB'       
work_mem = '2GB'                    -- Larger for complex queries
maintenance_work_mem = '4GB'
max_parallel_workers_per_gather = 8

-- Work_mem calculation for hash joins:
/*
 * Memory needed = (inner_rows × tuple_width × hash_overhead)
 * 
 * For 1M rows, 100 bytes/row:
 * Memory = 1,000,000 × 100 × 1.5 ≈ 150MB
 * 
 * Set work_mem = 256MB to fit with margin
 * 
 * Warning: work_mem is PER OPERATION
 * Complex query with 5 sorts/hashes = 5 × work_mem
 * With 100 connections: potential 5 × 256MB × 100 = 128GB!
 */

Part 6: Lock Contention Analysis

6.1 Lock Monitoring

-- Current locks (who's blocking whom)
SELECT 
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity 
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks 
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity 
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

-- Lock types and their conflicts
/*
LOCK CONFLICT MATRIX
(columns = requested mode, rows = mode already held; X = conflict/must wait, - = compatible)

                          AS   RS   RE   SUE  S    SRE  E    AE
ACCESS SHARE (AS)         -    -    -    -    -    -    -    X
ROW SHARE (RS)            -    -    -    -    -    -    X    X
ROW EXCLUSIVE (RE)        -    -    -    -    X    X    X    X
SHARE UPDATE EXCL (SUE)   -    -    -    X    X    X    X    X
SHARE (S)                 -    -    X    X    -    X    X    X
SHARE ROW EXCL (SRE)      -    -    X    X    X    X    X    X
EXCLUSIVE (E)             -    X    X    X    X    X    X    X
ACCESS EXCLUSIVE (AE)     X    X    X    X    X    X    X    X

Common operations and their locks:
• SELECT: ACCESS SHARE
• INSERT/UPDATE/DELETE: ROW EXCLUSIVE
• CREATE INDEX CONCURRENTLY: SHARE UPDATE EXCLUSIVE
• CREATE INDEX (without CONCURRENTLY): SHARE
• VACUUM FULL, REINDEX: ACCESS EXCLUSIVE
*/

6.2 Deadlock Analysis

-- Enable deadlock logging
-- In postgresql.conf:
-- log_lock_waits = on
-- deadlock_timeout = 1s

-- Check logs for deadlock patterns:
/*
LOG:  process 12345 detected deadlock while waiting for ShareLock on 
      transaction 987654 after 1000.123 ms
DETAIL:  Process holding the lock: 12346. Wait queue: .
CONTEXT:  while updating tuple (0,42) in relation "orders"
STATEMENT:  UPDATE orders SET status = 'shipped' WHERE id = 100

LOG:  process 12345 still waiting for ShareLock on transaction 987654 
      after 1000.456 ms
DETAIL:  Process holding the lock: 12346. Wait queue: 12345.
CONTEXT:  while updating tuple (0,42) in relation "orders"
STATEMENT:  UPDATE orders SET status = 'shipped' WHERE id = 100
*/

-- Analyze deadlock from logs:
/*
 Deadlock scenario:
 
 Transaction A:                  Transaction B:
 UPDATE orders WHERE id=100      UPDATE orders WHERE id=200
 (holds lock on row 100)         (holds lock on row 200)
 ...                             ...
 UPDATE orders WHERE id=200      UPDATE orders WHERE id=100
 (waits for B's lock)            (waits for A's lock)
 
 ↓ DEADLOCK
 
 Solution: Consistent lock ordering
 - Always acquire locks in same order (e.g., by id ASC)
 - Or use SELECT ... FOR UPDATE with explicit ordering
*/
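
A sketch of the consistent-ordering fix described above, reusing the two example rows:
BEGIN;
-- Lock both rows in a deterministic order (ascending id) before either is updated
SELECT id FROM orders WHERE id IN (100, 200) ORDER BY id FOR UPDATE;
UPDATE orders SET status = 'shipped' WHERE id = 100;
UPDATE orders SET status = 'shipped' WHERE id = 200;
COMMIT;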

Part 7: Production Troubleshooting Runbook

7.1 High CPU Usage

-- Step 1: Identify CPU-heavy queries
SELECT 
    pid,
    usename,
    state,
    wait_event_type,
    wait_event,
    left(query, 100)
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type IS NULL  -- Not waiting = actually running
ORDER BY backend_start;

-- Step 2: Check pg_stat_statements
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Step 3: Profile with perf
-- perf record -g -p <pid> -- sleep 30
-- perf report

-- Common causes:
-- • Missing index (ExecSeqScan in profile)
-- • Complex expression evaluation (ExecEvalExpr)
-- • Hash table operations (hash_search)
-- • Sort operations (tuplesort_*)

7.2 High I/O Wait

-- Step 1: Check wait events
SELECT 
    wait_event_type,
    wait_event,
    count(*)
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type IN ('IO', 'BufferPin')
GROUP BY wait_event_type, wait_event;

-- Step 2: Check buffer cache hit ratio
SELECT 
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit) as heap_hit,
    round(100.0 * sum(heap_blks_hit) / 
        nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) as hit_ratio_pct
FROM pg_statio_user_tables;
-- Should be > 99% for OLTP

-- Step 3: Check checkpoint frequency
SELECT 
    checkpoints_timed,
    checkpoints_req,  -- requested (forced) checkpoints; frequent ones are bad
    buffers_checkpoint,
    buffers_backend   -- backend writes; should stay close to 0
FROM pg_stat_bgwriter;

-- Step 4: System level
-- iostat -xz 1
-- iotop

-- Common causes:
-- • shared_buffers too small
-- • Checkpoint too aggressive (checkpoint_completion_target low)
-- • seq scans on large tables
-- • Bloated tables need VACUUM

7.3 Lock Contention

-- Step 1: Check current locks
SELECT 
    locktype,
    relation::regclass,
    mode,
    granted,
    pid,
    left(query, 50) as query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT granted
ORDER BY relation;

-- Step 2: Find who is blocking a given waiting PID
SELECT unnest(pg_blocking_pids(12345)) AS blocking_pid;  -- Replace 12345 with the waiting PID

-- Step 3: Check for long-running transactions
SELECT 
    pid,
    now() - xact_start as xact_duration,
    now() - query_start as query_duration,
    state,
    left(query, 60)
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- Step 4: Kill if necessary
SELECT pg_terminate_backend(12345);  -- Last resort

-- Prevention:
-- • Use statement_timeout and idle_in_transaction_session_timeout
-- • Use lock_timeout
-- • Avoid long transactions in OLTP
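
-- A minimal sketch of those guards; values are illustrative and mydb is the example database used earlier:
ALTER DATABASE mydb SET statement_timeout = '30s';
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '60s';
ALTER DATABASE mydb SET lock_timeout = '5s';
-- These apply to new sessions; existing sessions keep their current settings.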

Part 8: Interview Questions

Staff/Principal Level

Systematic Approach:
  1. Immediate checks (< 1 min)
    • pg_stat_activity: What’s running, what’s waiting?
    • pg_locks: Any blocking?
    • System metrics: CPU, I/O, memory via monitoring
  2. Compare to baseline
    • pg_stat_statements: Which queries slowed down?
    • Was there a deploy? Schema change?
    • Traffic pattern change?
  3. Drill down
    • EXPLAIN ANALYZE the slow queries
    • Check for plan changes (pg_stat_plans if available)
    • Statistics stale? Run ANALYZE
  4. System level if needed
    • iostat: I/O saturation?
    • vmstat: Memory pressure?
    • perf/flamegraph: CPU profile
  5. Common root causes
    • Autovacuum not keeping up
    • Statistics out of date → bad plans
    • Connection saturation
    • Lock contention from long transaction
    • Checkpoint storm
Architecture:
  1. Metrics Collection
    • pg_stat_* views polled every 10-60s
    • pg_stat_statements for query patterns
    • OS metrics (node_exporter)
    • Custom metrics via pgbouncer stats
  2. Storage
    • Time-series database (Prometheus, InfluxDB, TimescaleDB)
    • Retention: 15 days high-res, 1 year aggregated
  3. Dashboards
    • Golden signals: latency, throughput, errors, saturation
    • Per-query latency percentiles
    • Replication lag
    • Vacuum progress
    • Connection pool stats
  4. Alerting
    • Latency P99 > threshold
    • Replication lag > 30s
    • Connections > 80% max
    • Disk usage > 80%
    • Long-running transactions > 10 min
  5. Query Insights
    • Track query fingerprints over time
    • Detect plan regressions
    • Identify optimization opportunities
Strategy:
  1. Read Scaling
    • Read replicas (streaming replication)
    • Load balancer (HAProxy, pgpool) with read/write splitting
    • Connection pooling (PgBouncer) in front of each node
    • Caching layer (Redis) for hot data
  2. Write Scaling (harder)
    • Vertical scaling first (bigger instance)
    • Application-level sharding
    • Citus for distributed PostgreSQL
    • Partitioning for time-series data
  3. Connection Management
    • PgBouncer with transaction pooling
    • Aim for ~50 actual connections per node
    • 1000s of clients → PgBouncer → 50 DB connections
  4. Query Optimization
    • Prepared statements (reduce parsing)
    • Efficient indexes
    • Minimize lock contention
  5. Infrastructure
    • NVMe storage for IOPS
    • Sufficient RAM for working set
    • Network: 10+ Gbps between components
Reality check: 1M QPS usually requires:
  • Many read replicas (10-50+)
  • Aggressive caching
  • Sharding for writes
  • Not all queries hit the database

Production Troubleshooting Scenario 1: High-CPU Query Analysis

Incident Report

Symptoms: Database server showing 95%+ CPU utilization. Application reporting 5x normal query latency.
Impact: User-facing API timing out, customer complaints escalating.
Time to Resolve: Target < 15 minutes

Investigation Workflow

Step 1: Identify CPU-Intensive Queries (2 minutes)
-- Find queries consuming the most CPU time
SELECT 
    left(query, 100) as query_preview,
    calls,
    total_exec_time::numeric(12,2) as total_exec_ms,
    mean_exec_time::numeric(10,2) as mean_exec_ms,
    (total_exec_time / sum(total_exec_time) OVER ()) * 100 as pct_total_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Check currently running expensive queries
SELECT 
    pid,
    usename,
    application_name,
    now() - query_start as duration,
    state,
    left(query, 100) as query_preview
FROM pg_stat_activity
WHERE state = 'active' 
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start;
What to Look For:
  • Queries with high total_exec_time but reasonable mean_exec_time → high call frequency issue
  • Queries with high mean_exec_time → individual query performance issue
  • Long-running queries that started recently → may be holding locks
Step 2: Analyze the Problematic Query (3 minutes)
-- Get the full query text if truncated
SELECT query FROM pg_stat_statements WHERE queryid = <identified_queryid>;

-- Check execution plan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) 
<paste the problematic query here>;
Common High-CPU Patterns:
  1. Full Table Scans on Large Tables
    • Look for Seq Scan on tables with millions of rows
    • Missing index or statistics out of date
  2. Nested Loop Joins with High Row Estimates
    • Nested Loop on large result sets is CPU-intensive
    • May need hash join or merge join instead
  3. Complex Aggregations Without Indexes
    • GROUP BY on non-indexed columns
    • Window functions scanning entire tables
  4. Regular Expression or String Operations
    • LIKE '%pattern%' (can’t use indexes)
    • regexp_match on large text fields
Step 3: Check for Statistics Issues (2 minutes)
-- Check when tables were last analyzed
SELECT 
    schemaname,
    relname,
    last_analyze,
    last_autoanalyze,
    n_live_tup,
    n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN (
    -- List tables from the problematic query
    'users', 'orders', 'products'
)
ORDER BY last_analyze NULLS FIRST;

-- If statistics are stale, run ANALYZE
ANALYZE users;
ANALYZE orders;
-- Check if plan improves
Step 4: Immediate Mitigation (3 minutes)
Option A: Kill the Offending Queries
-- Identify specific PIDs
SELECT 
    pid,
    usename,
    left(query, 80),
    now() - query_start as duration
FROM pg_stat_activity
WHERE query ILIKE '%<pattern from step 1>%'
  AND state = 'active';

-- Terminate them
SELECT pg_terminate_backend(pid) FROM pg_stat_activity 
WHERE query ILIKE '%<pattern>%' AND state = 'active';
Option B: Add Missing Index
-- If Seq Scan is the issue and WHERE clause is clear
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at);
-- CONCURRENTLY avoids blocking writes during the build (it still takes a SHARE UPDATE EXCLUSIVE lock)
Option C: Query Rewrite (if possible)
-- Example: split a multi-value predicate so each branch gets a narrow plan
-- Original:
SELECT * FROM orders WHERE status IN ('pending', 'processing');

-- Alternative:
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE status = 'processing';
-- Note: with an index on status, Postgres can usually use it for the IN list as well;
-- the UNION ALL rewrite mainly helps when it lets the planner pick a better plan per branch
Step 5: Monitor and Verify (5 minutes)
-- Watch CPU and query stats
SELECT 
    sum(total_exec_time) as total_time,
    sum(calls) as total_calls
FROM pg_stat_statements;

-- Check every 30 seconds, verify total_time growth rate decreased
-- Monitor system CPU with `top` or monitoring dashboard

Post-Incident Analysis

Root Cause Categories:
  1. Missing Index: Application added new query pattern without proper indexing
  2. Statistics Drift: Large data changes made planner choose wrong plan
  3. Query Regression: Application deployed with inefficient query
  4. Load Spike: Sudden traffic increase exposed existing slow query
Preventive Measures:
  • Enable auto_explain with log_min_duration to catch slow queries before they become incidents (see the example config after this list)
  • Set up alerts on CPU >80% sustained for >2 minutes
  • Run ANALYZE after bulk data imports
  • Review new queries in development with EXPLAIN ANALYZE
  • Implement query review process before production deployment
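
An illustrative auto_explain setup for the first item above; the thresholds are examples, not recommendations:
-- In postgresql.conf:
-- shared_preload_libraries = 'auto_explain'   # requires a restart
-- auto_explain.log_min_duration = '500ms'     # log plans of statements slower than this
-- auto_explain.log_analyze = on               # adds per-node timing overhead; enable with care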

Production Troubleshooting Scenario 2: I/O-Bound Workload with Low Cache Hit Ratio

Incident Report

Symptoms: Response times degraded 10x. Disk I/O is saturated at 100%. Cache hit ratio dropped from 99% to 85%.
Impact: All database-dependent services experiencing timeouts.
Time to Resolve: Target < 20 minutes

Investigation Workflow

Step 1: Confirm I/O Saturation (1 minute)
# Check disk I/O utilization (Linux)
iostat -x 1 5

# Look for:
# - %util near 100% (disk saturated)
# - await > 10ms (high latency)
# - r/s and w/s (read/write operations per second)

# PostgreSQL wait events
SELECT 
    wait_event_type,
    wait_event,
    count(*) as waiting_backends
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;

-- Look for: DataFileRead, DataFileWrite, WALWrite
Step 2: Identify Cache-Unfriendly Queries (3 minutes)
-- Queries with worst buffer hit ratios
SELECT 
    left(query, 100) as query_preview,
    calls,
    shared_blks_read as disk_reads,
    shared_blks_hit as cache_hits,
    shared_blks_read + shared_blks_hit as total_blks,
    round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_pct
FROM pg_stat_statements
WHERE shared_blks_read > 1000  -- Significant disk activity
ORDER BY shared_blks_read DESC
LIMIT 20;

-- Identify specific tables being read from disk
SELECT 
    schemaname,
    relname as table_name,
    heap_blks_read as disk_reads,
    heap_blks_hit as cache_hits,
    heap_blks_read + heap_blks_hit as total_accesses,
    round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) as hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_read DESC
LIMIT 20;
Step 3: Analyze Query Plans for I/O Issues (4 minutes)
-- Get explain plan with buffer statistics
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
<paste problematic query>;

-- Look for:
-- • "Buffers: shared read=XXXXX" - high numbers mean disk reads
-- • Seq Scans on large tables
-- • Index scans that read too many pages
-- • Sorts or aggregations spilling to disk
Common I/O Problem Patterns:
  1. Large Sequential Scans
    • Reading entire large tables from disk
    • Solution: Add appropriate indexes or partition data
  2. Index Scans with Poor Selectivity
    • Index scan fetching >10% of table (seq scan might be better)
    • Solution: Consider covering index or change query
  3. Work Memory Exhaustion
    • Sorts/hashes spilling to temp files
    • Look for “external merge disk” in plan
    • Solution: Increase work_mem
  4. Table Bloat
    • Dead tuples force reading more pages
    • Solution: Run VACUUM or VACUUM FULL
Step 4: Check for Bloat and Fragmentation (3 minutes)
-- Identify bloated tables
SELECT 
    schemaname,
    relname as table_name,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) as total_size,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_tup_pct,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

-- Check if autovacuum is running
SELECT * FROM pg_stat_progress_vacuum;
Step 5: Immediate Mitigation (5 minutes)
Option A: Reduce Query Load
-- Kill or throttle the I/O-heavy queries
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query ILIKE '%<identified pattern>%'
  AND state = 'active'
  AND backend_type = 'client backend';

-- Or cap query runtime at the database level (applies to new sessions)
ALTER DATABASE mydb SET statement_timeout = '5s';
Option B: Emergency VACUUM
-- If bloat is the issue
VACUUM (ANALYZE, VERBOSE) problematic_table;

-- Monitor progress
SELECT * FROM pg_stat_progress_vacuum;
Option C: Add Targeted Index
-- If missing index is clear from EXPLAIN
CREATE INDEX CONCURRENTLY idx_orders_user_created 
ON orders(user_id, created_at) 
INCLUDE (status, total);  -- Covering index to avoid heap lookups
Option D: Temporarily Increase Shared Buffers
-- If you have available RAM and can restart
-- postgresql.conf:
-- shared_buffers = 8GB  (increase from current value)
-- effective_cache_size = 24GB  (hint for planner)

-- Or, for immediate session-level relief (note: this raises work_mem, not shared_buffers):
SET work_mem = '256MB';  -- For current session only
Step 6: Monitor Recovery (4 minutes)
-- Watch cache hit ratio improve
SELECT 
    sum(heap_blks_hit) as cache_hits,
    sum(heap_blks_read) as disk_reads,
    round(100.0 * sum(heap_blks_hit) / 
          nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) as hit_ratio
FROM pg_statio_user_tables;

-- Check every 30 seconds, target >99%

-- Monitor I/O wait events
SELECT 
    wait_event_type,
    count(*) as waiting
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
GROUP BY wait_event_type;
-- Should decrease significantly

Post-Incident Analysis

Root Cause Categories:
  1. Working Set Exceeds Cache: Data accessed frequently doesn’t fit in shared_buffers
  2. Sudden Query Change: New query pattern reads cold data from disk
  3. Bloat Accumulation: VACUUM not keeping up, forcing reads of many dead-tuple pages
  4. Cache Eviction Storm: Massive seq scan pushed frequently-used data out of cache
Preventive Measures:
  • Right-size shared_buffers: Aim for 25-40% of system RAM
  • Monitor bloat proactively: Set up alerts when dead_tup_pct >20%
  • Tune autovacuum aggressively: Lower autovacuum_vacuum_scale_factor (see the example after this list)
  • Implement query review: Catch full table scans in development
  • Set up I/O alerts: Alert when disk await >10ms sustained or cache hit ratio under 95%
  • Consider read replicas: Offload analytical queries to dedicated instances
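
An example of the per-table autovacuum override mentioned above; the thresholds are illustrative and orders is just the example table:
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,   -- vacuum after ~2% dead tuples instead of the 20% default
    autovacuum_vacuum_threshold    = 1000
);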

Module Mastery Checklist

Complete this checklist to confirm you’ve mastered Performance Engineering:

Performance Investigation Framework

  • Apply the 6-layer performance stack to diagnose issues systematically
  • Implement the four golden signals for database monitoring
  • Conduct root cause analysis using appropriate metrics at each layer
  • Triage production incidents within the first 5 minutes
  • Communicate performance findings to technical and non-technical stakeholders

Statistics and Monitoring

  • Configure and query pg_stat_statements for query performance analysis
  • Interpret wait event statistics to identify resource bottlenecks
  • Use pg_stat_activity to monitor active connections and queries
  • Analyze pg_stat_user_tables for table-level performance metrics
  • Build custom monitoring queries for production dashboards

CPU Profiling and System Analysis

  • Use perf and flamegraphs to profile PostgreSQL processes
  • Identify hot code paths and CPU-intensive operations
  • Interpret flamegraph patterns for different workload types
  • Analyze system-level CPU metrics and context switches
  • Apply profiling insights to optimize queries

I/O Performance

  • Monitor and optimize buffer cache hit ratios
  • Identify queries causing excessive physical I/O
  • Diagnose disk saturation using iostat and wait events
  • Configure shared_buffers and effective_cache_size appropriately
  • Implement strategies to reduce I/O amplification

Query Optimization

  • Read and interpret EXPLAIN and EXPLAIN ANALYZE output
  • Identify suboptimal scan types and join algorithms
  • Recognize when statistics are stale and run ANALYZE
  • Force better query plans when necessary
  • Optimize queries to use appropriate indexes

Concurrency and Locking

  • Diagnose lock contention using pg_locks and wait events
  • Identify and resolve deadlocks
  • Understand lock mode compatibility
  • Optimize transaction scope to minimize lock duration
  • Debug serialization failures in high-concurrency scenarios

Production Troubleshooting

  • Diagnose and resolve high-CPU query workloads
  • Fix I/O-bound performance degradation
  • Respond to sudden query slowdowns systematically
  • Debug connection pool exhaustion
  • Identify and remediate bloat-related performance issues

Configuration Tuning

  • Tune shared_buffers based on workload and available RAM
  • Configure work_mem to prevent disk spills
  • Optimize checkpoint settings for write performance
  • Tune autovacuum for production workloads
  • Apply connection pooling strategies effectively

Practical Application

  • Complete production troubleshooting scenarios
  • Build a monitoring dashboard for PostgreSQL
  • Conduct performance reviews of new queries before deployment
  • Implement proactive alerting for performance metrics
  • Document performance baselines and SLOs for production systems

Next Steps