Performance Engineering Deep Dive
This module covers the tools and techniques used by database engineers at companies like PlanetScale, Supabase, Neon, and CockroachDB to diagnose and fix performance issues at scale.

Target Audience: Engineers joining distributed database teams
Prerequisites: Query Engine, Data Structures modules
Tools Covered: perf, flamegraphs, pg_stat_*, BPF
Interview Relevance: Staff+ performance interviews
Part 1: The Performance Investigation Framework
1.1 The Performance Stack
┌─────────────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE INVESTIGATION LAYERS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Layer 6: Application │
│ ├── Query patterns, connection management, ORM behavior │
│ │ │
│ Layer 5: PostgreSQL Query │
│ ├── Query plans, statistics, locks, wait events │
│ │ │
│ Layer 4: PostgreSQL Internals │
│ ├── Buffer pool, WAL, vacuum, checkpointer │
│ │ │
│ Layer 3: Operating System │
│ ├── I/O scheduler, memory management, CPU scheduling │
│ │ │
│ Layer 2: Hardware │
│ ├── SSD/NVMe, RAM, CPU cache, network │
│ │ │
│ Layer 1: Cloud Infrastructure │
│ └── EBS limits, network bandwidth, IOPS credits │
│ │
│ Investigation Order: │
│ 1. Check cloud/hardware limits first (IOPS, bandwidth) │
│ 2. OS-level metrics (iostat, vmstat) │
│ 3. PostgreSQL wait events and statistics │
│ 4. Query-level analysis │
│ 5. Application patterns │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
1.2 The Four Golden Signals
-- 1. LATENCY: How long do queries take?
SELECT
query,
calls,
mean_exec_time::numeric(10,2) as avg_ms,
max_exec_time::numeric(10,2) as max_ms,
stddev_exec_time::numeric(10,2) as stddev_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- 2. THROUGHPUT: How many queries per second?
SELECT
sum(calls) /
EXTRACT(EPOCH FROM (now() - stats_reset)) as qps
FROM pg_stat_statements, pg_stat_statements_info;
-- 3. ERRORS: What's failing?
SELECT
datname,
xact_rollback,
conflicts,
deadlocks
FROM pg_stat_database;
-- 4. SATURATION: Are resources maxed out?
-- Check wait events to see what's saturated
SELECT
wait_event_type,
wait_event,
count(*) as waiting_sessions
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
Part 2: PostgreSQL Statistics Deep Dive
2.1 pg_stat_statements Mastery
-- Enable pg_stat_statements with all options
-- In postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all
-- pg_stat_statements.track_utility = on
-- pg_stat_statements.track_planning = on
-- Top queries by total time (production bottlenecks)
SELECT
left(query, 80) as query_preview,
calls,
total_exec_time::numeric(12,2) as total_ms,
mean_exec_time::numeric(10,2) as avg_ms,
rows,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)
as hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Queries with worst cache hit ratio (I/O bound)
SELECT
left(query, 80),
calls,
shared_blks_read,
shared_blks_hit,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)
as hit_ratio
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 1000
ORDER BY hit_ratio ASC NULLS LAST
LIMIT 20;
-- Queries with high planning time (complex plans)
SELECT
left(query, 80),
calls,
mean_plan_time::numeric(10,2) as avg_plan_ms,
mean_exec_time::numeric(10,2) as avg_exec_ms,
mean_plan_time / nullif(mean_exec_time, 0) as plan_exec_ratio
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_plan_time DESC
LIMIT 20;
-- Queries with high temp file usage (work_mem issues)
SELECT
left(query, 80),
calls,
temp_blks_read,
temp_blks_written,
(temp_blks_read + temp_blks_written) * 8 / 1024 as temp_mb
FROM pg_stat_statements
WHERE temp_blks_read + temp_blks_written > 0
ORDER BY temp_blks_read + temp_blks_written DESC;
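When measuring before and after a change, it often helps to reset these counters so they cover only the comparison window; a small sketch (pg_stat_statements_reset requires superuser or an explicit GRANT):
-- Reset counters before a measurement window so the statistics
-- reflect only the workload you are comparing
SELECT pg_stat_statements_reset();
-- ... run the workload, then re-run the queries above ...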
2.2 Wait Event Analysis
-- Current wait events
SELECT
pid,
usename,
application_name,
state,
wait_event_type,
wait_event,
left(query, 60) as query_preview
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY wait_event_type, wait_event;
-- Wait event categories explained:
/*
┌─────────────────────────────────────────────────────────────────────────────┐
│ WAIT EVENT CATEGORIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Category Common Events Meaning │
│ ─────────────────────────────────────────────────────────────────────────── │
│ LWLock buffer_content Buffer pool contention │
│ buffer_mapping Finding buffer in pool │
│ WALWriteLock WAL write serialization │
│ lock_manager Lock table access │
│ │
│ Lock relation Waiting for table lock │
│ tuple Waiting for row lock │
│ transactionid Waiting for xact to complete │
│ virtualxid Waiting for virtual xact │
│ │
│ BufferPin BufferPin Another backend has buffer pinned │
│ │
│ IO DataFileRead Reading from disk │
│ DataFileWrite Writing to disk │
│ WALWrite Writing WAL │
│ WALSync Fsyncing WAL │
│ │
│ Client ClientRead Waiting for client to send data │
│ ClientWrite Waiting to send to client │
│ │
│ IPC BgWorkerStartup Waiting for background worker │
│ ParallelFinish Waiting for parallel workers │
│ ProcArrayLock Transaction visibility check │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
*/
-- Aggregate wait events over time (sample periodically)
CREATE TABLE wait_event_samples (
sample_time timestamptz,
wait_event_type text,
wait_event text,
count int
);
-- Sampling query (run periodically, e.g., every few seconds via cron or pg_cron)
INSERT INTO wait_event_samples
SELECT
now(),
wait_event_type,
wait_event,
count(*)
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event;
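Once samples accumulate, aggregate them over a window to see which waits dominate; a minimal query against the sampling table above:
-- Top wait events over the last 10 minutes, from the samples above
SELECT
    wait_event_type,
    wait_event,
    sum(count) as total_waiting_samples
FROM wait_event_samples
WHERE sample_time > now() - interval '10 minutes'
GROUP BY wait_event_type, wait_event
ORDER BY total_waiting_samples DESC
LIMIT 10;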
2.3 Index Statistics
-- Index usage statistics
SELECT
schemaname,
relname as table_name,
indexrelname as index_name,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Unused indexes (candidates for removal)
SELECT
schemaname || '.' || relname as table,
indexrelname as index,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
SELECT conindid FROM pg_constraint -- Keep constraint indexes
)
ORDER BY pg_relation_size(indexrelid) DESC;
-- Index bloat estimation
SELECT
    i.schemaname,
    i.relname as table_name,
    i.indexrelname as index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size,
    pg_size_pretty(
        pg_relation_size(i.indexrelid) -
        (c.reltuples * 40)::bigint -- Rough estimate: 40 bytes per index entry
    ) as estimated_bloat
FROM pg_stat_user_indexes i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.reltuples > 10000
ORDER BY pg_relation_size(i.indexrelid) DESC;
-- Index efficiency (fetches vs reads)
SELECT
indexrelname,
idx_tup_read,
idx_tup_fetch,
CASE WHEN idx_tup_read > 0
THEN round(100.0 * idx_tup_fetch / idx_tup_read, 2)
ELSE 0 END as fetch_ratio
FROM pg_stat_user_indexes
WHERE idx_tup_read > 1000
ORDER BY fetch_ratio ASC;
-- Low fetch ratio = index reads many tuples but few are actually needed
-- Consider more selective index or query optimization
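A common remedy is a partial index matched to the selective predicate. A hypothetical example (table, column, and index names are illustrative):
-- Most queries only touch the rare 'pending' rows, so a partial index
-- reads far fewer tuples per fetch than a full index on status
CREATE INDEX CONCURRENTLY orders_pending_created_idx
    ON orders (created_at)
    WHERE status = 'pending';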
2.4 Table Statistics
-- Table access patterns
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2)
as dead_tup_percent,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC;
-- Tables needing vacuum
SELECT
schemaname || '.' || relname as table,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup, 0), 2) as dead_percent,
last_autovacuum,
pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- HOT update ratio (in-place updates, no index update needed)
SELECT
schemaname,
relname,
n_tup_upd,
n_tup_hot_upd,
round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 2) as hot_update_percent
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY hot_update_percent ASC;
-- Low HOT ratio = indexes being updated frequently = consider fillfactor
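One lever for raising the HOT ratio is a lower heap fillfactor, which leaves room on each page for updated row versions. An illustrative sketch (accounts is a hypothetical table):
-- Reserve 20% free space per heap page so updates can stay on the
-- same page (HOT) instead of touching every index
ALTER TABLE accounts SET (fillfactor = 80);
-- Only affects newly written pages; rewrite existing ones to apply:
VACUUM FULL accounts;  -- takes ACCESS EXCLUSIVE; pg_repack avoids the lock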
Part 3: System-Level Profiling
3.1 Linux perf Basics
# Install perf (Ubuntu/Debian)
apt-get install linux-tools-common linux-tools-$(uname -r)
# Basic CPU profiling of PostgreSQL
# Find postgres backend PID handling your query
psql -c "SELECT pg_backend_pid();" # e.g., 12345
# Profile that backend
perf record -g -p 12345 -- sleep 30
# Generate report
perf report --stdio
# Example output:
# 35.00% postgres [kernel] [k] copy_user_enhanced_fast_string
# 22.00% postgres postgres [.] hash_search_with_hash_value
# 15.00% postgres postgres [.] ExecScan
# 10.00% postgres postgres [.] slot_deform_heap_tuple
# 8.00% postgres postgres [.] ExecQual
# Profile all postgres processes
perf record -g -p $(pgrep -d',' postgres) -- sleep 30
3.2 Flame Graphs
# Clone flamegraph tools
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
# Capture profile
perf record -F 99 -g -p $(pgrep -d',' postgres) -- sleep 60
# Generate flame graph
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > pg-flamegraph.svg
# View in browser
# Open pg-flamegraph.svg
┌─────────────────────────────────────────────────────────────────────────────┐
│ READING FLAME GRAPHS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │████████████████████████████ hash_search ████████████████████████████│ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────┐ ┌──────────────────────────────────────┐ │
│ │██████ ExecHashJoin █████████│ │███████████ ExecSeqScan ████████████│ │
│ └────────────────────────────┘ └──────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │████████████████████████ ExecProcNode ██████████████████████████████│ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │██████████████████████████ ExecutorRun █████████████████████████████│ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ Reading Guide: │
│ • Y-axis: Stack depth (bottom = entry point) │
│ • X-axis: Time spent (width = % of samples) │
│ • Color: Random, just for differentiation │
│ • Plateaus: Functions that consume CPU time │
│ • Towers: Deep call stacks │
│ │
│ What to look for: │
│ • Wide plateaus at top = hot functions to optimize │
│ • hash_search = possible hash table sizing issue │
│ • ExecSeqScan = missing index │
│ • slot_deform_heap_tuple = consider column ordering │
│ • LWLockAcquire = contention issue │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.3 Off-CPU Analysis (I/O Waits)
# Off-CPU flame graph shows where processes are WAITING, not running
# Requires BPF/eBPF
# Using bcc-tools
apt-get install bpfcc-tools
# Trace off-CPU events
offcputime-bpfcc -p $(pgrep -d',' postgres) -f 30 > offcpu.stacks
# Generate off-CPU flame graph
./flamegraph.pl --color=io --countname=us < offcpu.stacks > offcpu.svg
# Common off-CPU patterns:
# • pread/pwrite → Disk I/O
# • epoll_wait → Waiting for client
# • futex → Lock contention
# • nanosleep → Intentional sleep (background workers)
3.4 I/O Analysis
# Basic I/O stats
iostat -xz 1
# Key columns:
# r/s, w/s - Reads/writes per second
# rkB/s, wkB/s - KB read/written per second
# await - Average I/O wait time (ms)
# %util - Device utilization
# Example concerning output:
# Device r/s w/s rkB/s wkB/s await %util
# nvme0n1 5000 2000 400000 160000 12.5 99.8
# ↑ ↑
# High wait Saturated!
# Per-process I/O with iotop
iotop -oPa
# BPF-based I/O latency histogram
biolatency-bpfcc -D 10
# Example output shows latency distribution:
# usecs : count distribution
# 0 -> 1 : 0 | |
# 2 -> 3 : 0 | |
# 4 -> 7 : 12 |* |
# 8 -> 15 : 156 |************ |
# 16 -> 31 : 298 |***********************|
# 32 -> 63 : 842 |************************| ← Most I/O here
# 64 -> 127 : 234 |****************** |
# 128 -> 255: 89 |******* |
# 256 -> 511: 12 |* | ← Tail latency
Part 4: Benchmarking
4.1 pgbench Fundamentals
# Initialize test database
pgbench -i -s 100 testdb # Scale factor 100 = 10M rows
# Basic TPC-B style benchmark
pgbench -c 10 -j 2 -T 60 testdb
# -c 10: 10 concurrent clients
# -j 2: 2 worker threads
# -T 60: Run for 60 seconds
# Output interpretation:
# tps = 1234.567890 (including connections establishing)
# tps = 1245.678901 (excluding connections establishing)
# ↑ This is your throughput
# With detailed timing
pgbench -c 10 -j 2 -T 60 -r testdb
# -r: Report per-statement latencies
# Example output:
# statement latencies in milliseconds:
# 0.002 \set aid random(1, 100000 * :scale)
# 0.001 \set bid random(1, 1 * :scale)
# 0.001 \set tid random(1, 10 * :scale)
# 0.001 \set delta random(-5000, 5000)
# 0.156 BEGIN;
# 0.234 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
# 0.089 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
# 0.067 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
# 0.078 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
# 0.045 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
# 0.823 END;
4.2 Custom pgbench Scripts
-- Custom benchmark script: point_read.sql
-- Point query (80% of workload)
\set user_id random(1, 10000000)
SELECT * FROM users WHERE id = :user_id;
-- Run the scripts with weights:
-- pgbench -f point_read.sql@80 -f range_scan.sql@15 -f write.sql@5 -T 60 testdb
-- Range scan workload: range_scan.sql
\set start_id random(1, 9900000)
\set end_id :start_id + 1000
SELECT * FROM users WHERE id BETWEEN :start_id AND :end_id ORDER BY id;
-- Write workload: write.sql
\set user_id random(1, 10000000)
\set new_balance random(0, 100000)
UPDATE users SET balance = :new_balance WHERE id = :user_id;
4.3 Latency Histogram Analysis
# Capture per-transaction latency log
pgbench -c 20 -j 4 -T 300 -L 100 --log testdb
# -L 100: Latency limit 100ms (counts transactions that exceed it)
# --log: Write per-transaction logs; field 3 is latency in microseconds
# (don't combine with --aggregate-interval, which switches the log to
#  per-interval summary rows and breaks the analysis below)
# Analyze with percentiles (converting microseconds to ms)
cat pgbench_log.* | awk '{print $3 / 1000.0}' | sort -n | \
awk 'BEGIN{c=0} {a[c++]=$1} END{
print "P50:", a[int(c*0.50)], "ms";
print "P95:", a[int(c*0.95)], "ms";
print "P99:", a[int(c*0.99)], "ms";
print "P99.9:", a[int(c*0.999)], "ms";
}'
# Or use HdrHistogram for accurate percentiles
# https://github.com/HdrHistogram/HdrHistogram_c
4.4 Benchmark Methodology
┌─────────────────────────────────────────────────────────────────────────────┐
│ BENCHMARK BEST PRACTICES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Warm Up Phase │
│ • Run benchmark for 5-10 minutes before measuring │
│ • Ensures buffer pool is populated │
│ • JIT compilation has occurred │
│ • Statistics are accurate │
│ │
│ 2. Measurement Phase │
│ • Run for at least 5-10 minutes for stable results │
│ • Multiple runs (at least 3) to check variance │
│ • Report median, not mean (avoids outlier skew) │
│ │
│ 3. Control Variables │
│ • Same hardware configuration │
│ • Same data size and distribution │
│ • Same PostgreSQL configuration │
│ • Same OS tuning │
│ • Quiesce other workloads │
│ │
│ 4. Metrics to Capture │
│ • Throughput (TPS or QPS) │
│ • Latency percentiles (P50, P95, P99, P99.9) │
│ • CPU utilization │
│ • I/O utilization │
│ • Memory usage │
│ • Error rate │
│ │
│ 5. Common Mistakes │
│ ✗ Running too short (< 1 minute) │
│ ✗ Not warming up │
│ ✗ Comparing averages instead of percentiles │
│ ✗ Ignoring variance between runs │
│ ✗ Testing with unrealistic data sizes │
│ ✗ Not monitoring system resources during test │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
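To capture database-side resource metrics for a run, one approach is to snapshot PostgreSQL's cumulative counters before and after the measurement phase and report deltas. A minimal sketch; the bench_snapshots table and labels are inventions of this example:
-- Snapshot table for before/after comparisons
CREATE TABLE IF NOT EXISTS bench_snapshots (
    taken_at    timestamptz DEFAULT now(),
    label       text,
    xact_commit bigint,
    blks_read   bigint,
    blks_hit    bigint,
    temp_bytes  bigint
);
-- Run once with 'before' and once with 'after' around the measurement phase:
INSERT INTO bench_snapshots (label, xact_commit, blks_read, blks_hit, temp_bytes)
SELECT 'before', xact_commit, blks_read, blks_hit, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
-- Deltas for the run:
SELECT
    a.xact_commit - b.xact_commit as commits,
    a.blks_read - b.blks_read as blocks_read,
    a.temp_bytes - b.temp_bytes as temp_bytes_written
FROM bench_snapshots a, bench_snapshots b
WHERE a.label = 'after' AND b.label = 'before';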
Part 5: Connection and Memory Tuning
5.1 Connection Pool Sizing
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONNECTION POOL FORMULA │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Optimal pool size (per database node): │
│ │
│ connections = (cores * 2) + effective_spindle_count │
│ │
│ For SSD/NVMe (no spindles): │
│ connections = (cores * 2) + 1 │
│ │
│ Example: 8-core server with NVMe: │
│ connections = (8 * 2) + 1 = 17 │
│ │
│ Key insight: More connections != more throughput │
│ After optimal point, contention increases, throughput decreases │
│ │
│ Symptoms of too many connections: │
│ • Increased latency │
│ • Context switching overhead │
│ • Lock contention │
│ • Memory pressure (each backend uses ~10MB+) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
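A quick way to sanity-check headroom against this formula is to compare live client connections to the configured ceiling; a simple sketch:
-- Client connections vs. configured maximum
SELECT
    count(*) FILTER (WHERE backend_type = 'client backend') as client_connections,
    current_setting('max_connections')::int as max_connections
FROM pg_stat_activity;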
5.2 PgBouncer Configuration
# pgbouncer.ini for high performance
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
# Connection pooling mode
pool_mode = transaction # Best for most workloads
# Pool sizing
default_pool_size = 20 # Connections per user/database pair
min_pool_size = 5 # Keep some connections warm
reserve_pool_size = 5 # Extra for bursts
reserve_pool_timeout = 3 # How long before using reserve
# Connection limits
max_client_conn = 1000 # Max client connections to pgbouncer
max_db_connections = 50 # Max connections to actual database
# Timeouts
query_timeout = 30 # Kill query if longer than 30s
client_idle_timeout = 600 # Close idle clients after 10min
server_idle_timeout = 60 # Close idle server connections
# Performance
tcp_keepalive = 1
tcp_keepidle = 60
tcp_keepintvl = 10
tcp_keepcnt = 3
# Logging
log_connections = 0 # Reduce log spam in production
log_disconnections = 0
log_pooler_errors = 1
# Admin
admin_users = pgbouncer_admin
stats_users = pgbouncer_stats
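To verify the pool behaves as configured, query PgBouncer's admin console over psql; a brief sketch using the stats user defined above and assuming the default listen_port of 6432:
# psql -h 127.0.0.1 -p 6432 -U pgbouncer_stats pgbouncer
SHOW POOLS;   -- cl_active/cl_waiting (clients) vs sv_active/sv_idle (server conns)
SHOW STATS;   -- per-database totals and average query/transaction times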
5.3 Memory Configuration
-- Memory settings for different workload types
-- OLTP workload (many small queries)
shared_buffers = '8GB' -- 25% of RAM (up to ~8GB)
effective_cache_size = '24GB' -- 75% of RAM
work_mem = '64MB' -- Conservative for many connections
maintenance_work_mem = '2GB' -- For vacuum, index builds
wal_buffers = '64MB'
-- OLAP workload (few large queries)
shared_buffers = '32GB' -- Can be larger for analytics
effective_cache_size = '96GB'
work_mem = '2GB' -- Larger for complex queries
maintenance_work_mem = '4GB'
max_parallel_workers_per_gather = 8
-- Work_mem calculation for hash joins:
/*
* Memory needed = (inner_rows × tuple_width × hash_overhead)
*
* For 1M rows, 100 bytes/row:
* Memory = 1,000,000 × 100 × 1.5 ≈ 150MB
*
* Set work_mem = 256MB to fit with margin
*
* Warning: work_mem is PER OPERATION
* Complex query with 5 sorts/hashes = 5 × work_mem
 * With 100 connections: potential 5 × 256MB × 100 = 125GB!
*/
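Because of this multiplication, prefer scoping a larger work_mem to the queries that need it instead of raising it server-wide; a minimal sketch:
-- Scope the larger setting to one transaction instead of the whole server
BEGIN;
SET LOCAL work_mem = '256MB';  -- matches the calculation above
-- ... run the one hash-join-heavy report query here ...
COMMIT;  -- work_mem reverts automatically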
Part 6: Lock Contention Analysis
6.1 Lock Monitoring
-- Current locks (who's blocking whom)
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- Lock types and their conflicts
/*
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOCK CONFLICT MATRIX │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Requested → ACCESS ROW ROW SHARE SHARE SHARE EXCL ACCESS │
│ Held ↓ SHARE SHARE EXCL UPDATE ROW UPDATE EXCL │
│ EXCL EXCL │
│ ─────────────────────────────────────────────────────────────────────────── │
│  ACCESS SHARE        -     -     -     -     -     -     -     X           │
│  ROW SHARE           -     -     -     -     -     -     X     X           │
│  ROW EXCLUSIVE       -     -     -     -     X     X     X     X           │
│  SHARE UPDATE EX     -     -     -     X     X     X     X     X           │
│  SHARE               -     -     X     X     -     X     X     X           │
│  SHARE ROW EXCL      -     -     X     X     X     X     X     X           │
│  EXCLUSIVE           -     X     X     X     X     X     X     X           │
│  ACCESS EXCL         X     X     X     X     X     X     X     X           │
│ │
│ X = Conflict (must wait) │
│ - = No conflict (can proceed) │
│ │
│ Common operations and their locks: │
│ • SELECT: ACCESS SHARE │
│ • INSERT/UPDATE/DELETE: ROW EXCLUSIVE │
│ • CREATE INDEX CONCURRENTLY: SHARE UPDATE EXCLUSIVE │
│ • VACUUM FULL, REINDEX: ACCESS EXCLUSIVE │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
*/
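To see which of these modes are currently held or requested on a specific table (orders here is illustrative):
-- Lock modes on one table, held vs. waiting
SELECT mode, granted, count(*)
FROM pg_locks
WHERE relation = 'orders'::regclass
GROUP BY mode, granted
ORDER BY count(*) DESC;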
6.2 Deadlock Analysis
-- Enable deadlock logging
-- In postgresql.conf:
-- log_lock_waits = on
-- deadlock_timeout = 1s
-- Check logs for deadlock patterns:
/*
LOG: process 12345 detected deadlock while waiting for ShareLock on
transaction 987654 after 1000.123 ms
DETAIL: Process holding the lock: 12346. Wait queue: .
CONTEXT: while updating tuple (0,42) in relation "orders"
STATEMENT: UPDATE orders SET status = 'shipped' WHERE id = 100
LOG: process 12345 still waiting for ShareLock on transaction 987654
after 1000.456 ms
DETAIL: Process holding the lock: 12346. Wait queue: 12345.
CONTEXT: while updating tuple (0,42) in relation "orders"
STATEMENT: UPDATE orders SET status = 'shipped' WHERE id = 100
*/
-- Analyze deadlock from logs:
/*
Deadlock scenario:
Transaction A: Transaction B:
UPDATE orders WHERE id=100 UPDATE orders WHERE id=200
(holds lock on row 100) (holds lock on row 200)
... ...
UPDATE orders WHERE id=200 UPDATE orders WHERE id=100
(waits for B's lock) (waits for A's lock)
↓ DEADLOCK
Solution: Consistent lock ordering
- Always acquire locks in same order (e.g., by id ASC)
- Or use SELECT ... FOR UPDATE with explicit ordering
*/
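A sketch of the fix using the orders example above: take both row locks in a consistent order up front, so neither transaction can wait on a lock the other holds:
BEGIN;
-- Acquire both row locks in id order before updating either row
SELECT id FROM orders WHERE id IN (100, 200) ORDER BY id FOR UPDATE;
UPDATE orders SET status = 'shipped' WHERE id = 200;
UPDATE orders SET status = 'shipped' WHERE id = 100;
COMMIT;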
Part 7: Production Troubleshooting Runbook
7.1 High CPU Usage
-- Step 1: Identify CPU-heavy queries
SELECT
pid,
usename,
state,
wait_event_type,
wait_event,
left(query, 100)
FROM pg_stat_activity
WHERE state = 'active'
AND wait_event_type IS NULL -- Not waiting = actually running
ORDER BY backend_start;
-- Step 2: Check pg_stat_statements
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Step 3: Profile with perf
-- perf record -g -p <pid> -- sleep 30
-- perf report
-- Common causes:
-- • Missing index (ExecSeqScan in profile)
-- • Complex expression evaluation (ExecEvalExpr)
-- • Hash table operations (hash_search)
-- • Sort operations (tuplesort_*)
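To test the "missing index" hypothesis without a profiler, look for large tables with heavy sequential scans; a quick sketch:
-- Large tables read mostly via seq scans are index candidates
SELECT
    relname,
    seq_scan,
    seq_tup_read / nullif(seq_scan, 0) as avg_rows_per_seq_scan,
    pg_size_pretty(pg_relation_size(relid)) as table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;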
7.2 High I/O Wait
-- Step 1: Check wait events
SELECT
wait_event_type,
wait_event,
count(*)
FROM pg_stat_activity
WHERE state = 'active'
AND wait_event_type IN ('IO', 'BufferPin')
GROUP BY wait_event_type, wait_event;
-- Step 2: Check buffer cache hit ratio
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) /
nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
FROM pg_statio_user_tables;
-- Should be > 0.99 for OLTP
-- Step 3: Check checkpoint frequency
SELECT
checkpoints_timed,
checkpoints_req, -- Frequent requested checkpoints = WAL pressure; raise max_wal_size
buffers_checkpoint,
buffers_backend -- Backend writes = buffer pool pressure; should be near 0
FROM pg_stat_bgwriter;
-- (PostgreSQL 17+ moved the checkpoint counters to pg_stat_checkpointer)
-- Step 4: System level
-- iostat -xz 1
-- iotop
-- Common causes:
-- • shared_buffers too small
-- • Checkpoint too aggressive (checkpoint_completion_target low)
-- • seq scans on large tables
-- • Bloated tables need VACUUM
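If checkpoints turn out to be the cause, review the relevant settings first; the values in the comments below are illustrative starting points, not firm recommendations:
-- Current checkpoint-related configuration
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'checkpoint_timeout',
               'checkpoint_completion_target', 'max_wal_size');
-- Typical adjustments to spread checkpoint I/O (illustrative):
-- checkpoint_timeout = '15min'
-- checkpoint_completion_target = 0.9
-- max_wal_size = '16GB'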
7.3 Lock Contention
-- Step 1: Check current locks
SELECT
locktype,
relation::regclass,
mode,
granted,
pid,
left(query, 50) as query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT granted
ORDER BY relation;
-- Step 2: Find blocking queries
SELECT unnest(pg_blocking_pids(12345)); -- Replace 12345 with the waiting PID
-- Step 3: Check for long-running transactions
SELECT
pid,
now() - xact_start as xact_duration,
now() - query_start as query_duration,
state,
left(query, 60)
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
-- Step 4: Kill if necessary
SELECT pg_terminate_backend(12345); -- Last resort
-- Prevention:
-- • Use statement_timeout and idle_in_transaction_session_timeout
-- • Use lock_timeout
-- • Avoid long transactions in OLTP
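A sketch of those preventive settings applied at the database level (mydb and all values are illustrative; tune per workload):
ALTER DATABASE mydb SET statement_timeout = '30s';
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '60s';
ALTER DATABASE mydb SET lock_timeout = '5s';
-- New sessions pick these up; existing sessions keep their old values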
Part 8: Interview Questions
Staff/Principal Level
Q: How would you diagnose a sudden 10x latency increase in production?
Systematic Approach:
1. Immediate checks (< 1 min) (see the triage query after this list)
   - pg_stat_activity: What's running, what's waiting?
   - pg_locks: Any blocking?
   - System metrics: CPU, I/O, memory via monitoring
2. Compare to baseline
   - pg_stat_statements: Which queries slowed down?
   - Was there a deploy? Schema change?
   - Traffic pattern change?
3. Drill down
   - EXPLAIN ANALYZE the slow queries
   - Check for plan changes (pg_stat_plans if available)
   - Statistics stale? Run ANALYZE
4. System level if needed
   - iostat: I/O saturation?
   - vmstat: Memory pressure?
   - perf/flamegraph: CPU profile
5. Common root causes
   - Autovacuum not keeping up
   - Statistics out of date → bad plans
   - Connection saturation
   - Lock contention from a long transaction
   - Checkpoint storm
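A minimal "first minute" triage query in the spirit of step 1 (a sketch; adjust the state filter to taste):
-- What is running and what it is waiting on, oldest first
SELECT
    pid,
    now() - query_start as runtime,
    state,
    wait_event_type,
    wait_event,
    left(query, 80) as query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;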
Q: Design a monitoring system for PostgreSQL at scale
Architecture:
1. Metrics Collection
   - pg_stat_* views polled every 10-60s
   - pg_stat_statements for query patterns
   - OS metrics (node_exporter)
   - Custom metrics via PgBouncer stats
2. Storage
   - Time-series database (Prometheus, InfluxDB, TimescaleDB)
   - Retention: 15 days high-res, 1 year aggregated
3. Dashboards
   - Golden signals: latency, throughput, errors, saturation
   - Per-query latency percentiles
   - Replication lag
   - Vacuum progress
   - Connection pool stats
4. Alerting
   - Latency P99 > threshold
   - Replication lag > 30s
   - Connections > 80% of max
   - Disk usage > 80%
   - Long-running transactions > 10 min
5. Query Insights
   - Track query fingerprints over time
   - Detect plan regressions
   - Identify optimization opportunities
Q: How would you scale PostgreSQL to handle 1M QPS?
Strategy:
1. Read Scaling
   - Read replicas (streaming replication)
   - Load balancer (HAProxy, pgpool) with read/write splitting
   - Connection pooling (PgBouncer) in front of each node
   - Caching layer (Redis) for hot data
2. Write Scaling (harder)
   - Vertical scaling first (bigger instance)
   - Application-level sharding
   - Citus for distributed PostgreSQL
   - Partitioning for time-series data
3. Connection Management
   - PgBouncer with transaction pooling
   - Aim for ~50 actual connections per node
   - 1000s of clients → PgBouncer → 50 DB connections
4. Query Optimization
   - Prepared statements (reduce parsing)
   - Efficient indexes
   - Minimize lock contention
5. Infrastructure
   - NVMe storage for IOPS
   - Sufficient RAM for the working set
   - Network: 10+ Gbps between components
Bottom line: 1M QPS is reached with many read replicas (10-50+), aggressive caching, sharding for writes, and by ensuring not all queries hit the database.