Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Module 8: Scaling Strategies

When your application grows beyond what a single server can handle, you need scaling strategies. This module covers horizontal scaling, sharding, and partitioning. The most important scaling principle: Do not shard until you absolutely must. Sharding adds irreversible complexity — cross-shard joins, distributed transactions, and operational overhead that will slow your team down for years. Most PostgreSQL instances can handle far more load than teams realize, especially with proper indexing, connection pooling, and read replicas. The order of operations should be: optimize queries, add indexes, use connection pooling, add read replicas, partition tables, and only then consider sharding.
Estimated Time: 12-14 hours
Hands-On: Implement table partitioning
Key Skill: Choosing the right scaling strategy
PostgreSQL scaling playbook from single node to sharding

8.1 Scaling Fundamentals

Vertical vs Horizontal Scaling

┌─────────────────────────────────────────────────────────────────────────────┐
│                    VERTICAL VS HORIZONTAL SCALING                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  VERTICAL SCALING (Scale Up):                                               │
│  ─────────────────────────────                                               │
│                                                                              │
│    Before:              After:                                               │
│    ┌─────────┐          ┌─────────────────┐                                 │
│    │ 4 cores │   ──▶    │    64 cores     │                                 │
│    │  16GB   │          │     512GB       │                                 │
│    │  1TB    │          │      20TB       │                                 │
│    └─────────┘          └─────────────────┘                                 │
│                                                                              │
│    Pros: Simple, no code changes                                            │
│    Cons: Hardware limits, expensive, single point of failure               │
│                                                                              │
│  HORIZONTAL SCALING (Scale Out):                                            │
│  ────────────────────────────────                                            │
│                                                                              │
│    Before:              After:                                               │
│    ┌─────────┐          ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐    │
│    │ 4 cores │   ──▶    │ 4 cores │ │ 4 cores │ │ 4 cores │ │ 4 cores │    │
│    │  16GB   │          │  16GB   │ │  16GB   │ │  16GB   │ │  16GB   │    │
│    └─────────┘          └─────────┘ └─────────┘ └─────────┘ └─────────┘    │
│                                                                              │
│    Pros: Linear scaling, fault tolerant, cost effective                     │
│    Cons: Complex, requires application changes, CAP theorem                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

When to Scale

SignalWhat It MeansAction
CPU > 80% sustainedQuery processing bottleneckRead replicas, optimize queries
Memory pressureWorking set > RAMMore RAM, or shard data
Disk I/O saturatedToo many reads/writesFaster disks, caching, sharding
Connection limitsToo many clientsConnection pooling
Response time degradationAll of the aboveProfile and optimize

8.2 Table Partitioning

What is Partitioning?

┌─────────────────────────────────────────────────────────────────────────────┐
│                       TABLE PARTITIONING                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Without Partitioning:                                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         orders                                      │    │
│  │  500 million rows - all in one table                               │    │
│  │  Every query scans entire table (even with indexes)               │    │
│  │  Maintenance operations lock everything                            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  With Partitioning:                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    orders (parent table)                           │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────────────────┤    │
│  │ orders_2023 │ orders_2024 │ orders_2025 │         ...            │    │
│  │   50M rows  │   150M rows │   200M rows │                         │    │
│  └─────────────┴─────────────┴─────────────┴─────────────────────────┘    │
│                                                                              │
│  Benefits:                                                                   │
│  • Query "WHERE created_at >= '2024-01-01'" only scans 2024/2025           │
│  • VACUUM, REINDEX operate on individual partitions                        │
│  • Can drop old partitions instantly (vs DELETE + VACUUM)                  │
│  • Different storage for hot/cold data                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Partition Types

Best for: Time-series data, continuous values
-- Create partitioned table
CREATE TABLE orders (
    id BIGINT,
    customer_id INT,
    order_date DATE,
    total DECIMAL(10,2),
    status VARCHAR(20)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- PostgreSQL automatically routes inserts
INSERT INTO orders VALUES (1, 100, '2024-06-15', 99.99, 'completed');
-- Goes to orders_2024

-- Query only scans relevant partitions
EXPLAIN SELECT * FROM orders WHERE order_date >= '2024-06-01';

Partition Maintenance

-- Add new partition for next year
CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

-- Detach old partition (keeps data, removes from parent)
ALTER TABLE orders DETACH PARTITION orders_2020;

-- Then archive or drop
-- Option 1: Archive to cold storage
\copy orders_2020 TO 'orders_2020.csv' CSV HEADER
DROP TABLE orders_2020;

-- Option 2: Drop immediately
DROP TABLE orders_2020;

-- Automatic partition creation with pg_partman
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
    p_parent_table := 'public.orders',
    p_control := 'order_date',
    p_type := 'native',
    p_interval := 'monthly'
);

8.3 Database Sharding

Sharding Concepts

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATABASE SHARDING                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Partitioning: Split data within ONE database                               │
│  Sharding: Split data across MULTIPLE databases                             │
│                                                                              │
│                          ┌───────────────┐                                   │
│                          │   Application │                                   │
│                          └───────┬───────┘                                   │
│                                  │                                           │
│                          ┌───────▼───────┐                                   │
│                          │ Shard Router  │                                   │
│                          └───────┬───────┘                                   │
│                                  │                                           │
│           ┌──────────────────────┼──────────────────────┐                   │
│           │                      │                      │                   │
│           ▼                      ▼                      ▼                   │
│   ┌───────────────┐      ┌───────────────┐      ┌───────────────┐          │
│   │   Shard 0     │      │   Shard 1     │      │   Shard 2     │          │
│   │ users 0-999K  │      │ users 1M-2M   │      │ users 2M-3M   │          │
│   │  Server: A    │      │  Server: B    │      │  Server: C    │          │
│   └───────────────┘      └───────────────┘      └───────────────┘          │
│                                                                              │
│  Each shard is a complete PostgreSQL instance with full functionality      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Sharding Strategies

# Application determines shard
def get_shard(user_id):
    return user_id % NUM_SHARDS

# All user data for user 12345
shard = get_shard(12345)  # e.g., shard 1
conn = get_connection(shard)
Pros: Even distribution, simple Cons: Resharding is expensive (data migration)
# User IDs 0-1M on shard 0, 1M-2M on shard 1, etc.
def get_shard(user_id):
    if user_id < 1_000_000:
        return 0
    elif user_id < 2_000_000:
        return 1
    else:
        return 2
Pros: Range queries possible, easy to add shards Cons: Hotspots if ranges accessed unevenly
# Lookup table maps keys to shards
# shard_directory table:
# | user_id_range | shard_id |
# | 0-500000      | 0        |
# | 500001-1000000| 1        |

def get_shard(user_id):
    return lookup_service.get_shard_for_user(user_id)
Pros: Flexible, can rebalance easily Cons: Lookup adds latency, directory is SPOF

Cross-Shard Challenges

This table is the reason experienced engineers avoid sharding as long as possible. Each row represents a problem that does not exist in a single-node database and has no perfect solution in a sharded architecture. Read this table as a cost-benefit analysis, not a feature list.
ChallengeProblemSolutionHidden Cost
JoinsCan’t join across shardsDenormalize, or application-level joinsData duplication, stale reads, application complexity
TransactionsNo cross-shard ACIDTwo-phase commit, or eventual consistency2PC is slow and fragile; eventual consistency means reasoning about conflicts
AggregationsSUM/COUNT across shardsScatter-gather queriesLatency is bounded by the slowest shard; fan-out amplifies load
SequencesIDs must be unique globallyUUID, or dedicated ID serviceUUIDs are 16 bytes (vs 8 for bigint) and fragment B-Tree indexes; ID services are a new SPOF
RebalancingShards become uneven over timeLive migration between shardsRequires careful coordination; risk of data loss or downtime during migration

8.4 Caching Strategies

Caching Layers

Real-world analogy: Database caching works like a series of increasingly distant warehouses. Your application checks the closest warehouse (Redis) first, then the regional warehouse (PostgreSQL buffer pool), then the national warehouse (OS page cache), and finally the factory (disk). Each layer is larger but slower. The art of cache engineering is keeping the data most likely to be needed in the closest warehouse.
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CACHING ARCHITECTURE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                          ┌─────────────┐                                     │
│                          │    Client   │                                     │
│                          └──────┬──────┘                                     │
│                                 │                                            │
│                          ┌──────▼──────┐                                     │
│  CDN Cache ─────────────▶│     CDN     │◀──── Static assets, API responses  │
│                          └──────┬──────┘                                     │
│                                 │                                            │
│                          ┌──────▼──────┐                                     │
│  App Cache ─────────────▶│    Redis    │◀──── Session, hot data, results    │
│                          └──────┬──────┘                                     │
│                                 │ Miss                                       │
│                          ┌──────▼──────┐                                     │
│  Query Cache ────────────▶│ PostgreSQL │                                     │
│                          │   Buffer    │◀──── Recently accessed pages        │
│                          │    Pool     │                                     │
│                          └──────┬──────┘                                     │
│                                 │                                            │
│                          ┌──────▼──────┐                                     │
│  OS Cache ──────────────▶│  OS Page    │◀──── Filesystem cache               │
│                          │   Cache     │                                     │
│                          └──────┬──────┘                                     │
│                                 │                                            │
│                          ┌──────▼──────┐                                     │
│                          │    Disk     │                                     │
│                          └─────────────┘                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cache Patterns

Application manages cache explicitly.
def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss - fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # 3. Populate cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    
    return user

def update_user(user_id, data):
    # 1. Update database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    
    # 2. Invalidate cache
    redis.delete(f"user:{user_id}")

Cache Invalidation Strategies

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton. Cache invalidation is genuinely the hardest part of caching. The patterns below represent different tradeoffs between staleness tolerance and implementation complexity. Choose the simplest one that meets your staleness requirements.
-- PostgreSQL LISTEN/NOTIFY for cache invalidation.
-- This approach pushes invalidation events to subscribers in real-time,
-- eliminating the polling overhead of TTL-based approaches.
CREATE OR REPLACE FUNCTION notify_cache_invalidation()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('cache_invalidation', 
        json_build_object(
            'table', TG_TABLE_NAME,
            'id', NEW.id,
            'operation', TG_OP
        )::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_cache_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION notify_cache_invalidation();
# Python listener
import psycopg2
import select

conn = psycopg2.connect(...)
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN cache_invalidation;")

while True:
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        data = json.loads(notify.payload)
        redis.delete(f"{data['table']}:{data['id']}")

8.5 Connection Scaling

Connection Pooling with PgBouncer

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MULTI-TIER CONNECTION POOLING                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Tier 1: Application Pool (e.g., HikariCP)                                  │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  App Server 1      App Server 2      App Server N                 │     │
│  │  ┌───────────┐     ┌───────────┐     ┌───────────┐                │     │
│  │  │ 10 conns  │     │ 10 conns  │     │ 10 conns  │                │     │
│  │  └─────┬─────┘     └─────┬─────┘     └─────┬─────┘                │     │
│  └────────┼─────────────────┼───────────────┼────────────────────────┘     │
│           │                 │               │                               │
│           └────────────┬────┴───────────────┘                              │
│                        │                                                    │
│  Tier 2: PgBouncer (1000 client conns → 50 server conns)                   │
│  ┌─────────────────────┼────────────────────────────────────────────┐      │
│  │                ┌────▼────┐                                       │      │
│  │                │PgBouncer│                                       │      │
│  │                │pool=50  │                                       │      │
│  │                └────┬────┘                                       │      │
│  └─────────────────────┼────────────────────────────────────────────┘      │
│                        │                                                    │
│                        ▼                                                    │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │                     PostgreSQL                                    │      │
│  │                  (50 backends)                                    │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Optimizing PostgreSQL for Connections

The golden rule of PostgreSQL connections: Fewer connections usually means higher throughput. This is counterintuitive but well-established. Beyond approximately 2 * CPU_cores + disk_spindles active connections, performance degrades due to context switching, lock contention in ProcArray, and cache thrashing. A server with 16 cores will often perform better with 40 active connections than with 400.
-- postgresql.conf

# Connection limits
max_connections = 200              -- Keep low! Use a pooler for client-facing connections.
superuser_reserved_connections = 3 -- Always keep a few reserved for admin access during emergencies.

# Memory per connection
work_mem = 64MB                    -- DANGER: this is per-operation, not per-connection.
                                   -- A query with 3 sorts = 3 x 64MB = 192MB for that one query.
                                   -- With 200 connections: worst case = 200 x 192MB = 37GB.
maintenance_work_mem = 256MB       -- Used by VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY.

# Connection overhead
# Each connection uses:
# - ~10MB for backend process (fixed cost)
# - work_mem per sort/hash operation (variable cost)
# - Session state (catalogs, prepared statements)
#
# Rule of thumb:
# (max_connections * average_work_mem_per_connection) < 50% of available RAM
# Leave the other 50% for shared_buffers and OS page cache.

8.6 Global Distribution

Multi-Region Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MULTI-REGION DATABASE                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                          US-EAST                                     │    │
│  │  ┌─────────────┐           ┌─────────────┐                          │    │
│  │  │   Primary   │──sync────▶│   Standby   │                          │    │
│  │  │  (writes)   │           │  (failover) │                          │    │
│  │  └──────┬──────┘           └─────────────┘                          │    │
│  │         │                                                            │    │
│  └─────────┼────────────────────────────────────────────────────────────┘    │
│            │                                                                 │
│      async │ replication                                                    │
│            │                                                                 │
│  ┌─────────┼────────────────────────────────────────────────────────────┐    │
│  │         ▼                    EU-WEST                                 │    │
│  │  ┌─────────────┐                                                     │    │
│  │  │   Replica   │◀─────────────── EU users (reads)                   │    │
│  │  │ (read-only) │                                                     │    │
│  │  └─────────────┘                                                     │    │
│  └──────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐    │
│  │                        ASIA-PACIFIC                                  │    │
│  │  ┌─────────────┐                                                     │    │
│  │  │   Replica   │◀─────────────── APAC users (reads)                 │    │
│  │  │ (read-only) │                                                     │    │
│  │  └─────────────┘                                                     │    │
│  └──────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  Challenge: Write latency for non-US users (cross-Atlantic round trip)     │
│  Solutions: Accept latency, or use multi-master with conflict resolution   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Geo-Sharding

Practical context: Geo-sharding is often driven by data residency regulations (GDPR, CCPA, data sovereignty laws) as much as by performance requirements. If your EU users’ data must stay in EU data centers, geo-sharding is not optional — it is a compliance requirement.
# Route users to their nearest shard based on region
REGION_SHARDS = {
    'us': 'postgres://shard-us.db.example.com/app',
    'eu': 'postgres://shard-eu.db.example.com/app',
    'asia': 'postgres://shard-asia.db.example.com/app',
}

def get_db_connection(user):
    region = determine_user_region(user)
    return connect(REGION_SHARDS[region])

# User data stays in their region (data residency compliance).
# Cross-region queries require a federation layer.
#
# Key design decision: what happens when a US user interacts
# with an EU user? You need either (a) a reference table replicated
# to all regions, or (b) an application-level fan-out that queries
# multiple shards and merges results. Neither is free.

8.7 Scaling Decision Framework

Decision Tree

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SCALING DECISION FRAMEWORK                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. Is your database actually the bottleneck?                               │
│     ├── No → Fix application code first                                    │
│     └── Yes ↓                                                               │
│                                                                              │
│  2. Have you optimized queries and indexes?                                 │
│     ├── No → Do that first (cheapest fix)                                  │
│     └── Yes ↓                                                               │
│                                                                              │
│  3. Is it a read-heavy workload?                                            │
│     ├── Yes → Add read replicas + caching                                  │
│     └── No (write-heavy) ↓                                                 │
│                                                                              │
│  4. Can you scale vertically?                                               │
│     ├── Yes → Upgrade hardware (simpler)                                   │
│     └── No (maxed out) ↓                                                   │
│                                                                              │
│  5. Can you partition within one database?                                  │
│     ├── Yes → Table partitioning                                           │
│     └── No (too big) ↓                                                     │
│                                                                              │
│  6. Can you shard by a natural key?                                         │
│     ├── Yes → Implement sharding                                           │
│     └── No → Consider different database architecture                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Scaling Complexity Ladder

StageSizeStrategyComplexity
1< 10GBSingle server, good indexes
210-100GBVertical scaling, query optimization⭐⭐
3100GB-1TBRead replicas, connection pooling⭐⭐⭐
41-10TBTable partitioning, caching layer⭐⭐⭐⭐
5> 10TBSharding, dedicated infrastructure⭐⭐⭐⭐⭐

8.8 Practice: Implement Partitioning

Lab Exercise

Create a time-series orders table with monthly partitions and automated maintenance.
-- 1. Create partitioned orders table
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    customer_id INT NOT NULL,
    order_date TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    total DECIMAL(12,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    items JSONB,
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- 2. Create monthly partitions for 2024
DO $$
DECLARE
    start_date DATE := '2024-01-01';
    end_date DATE;
    partition_name TEXT;
BEGIN
    FOR i IN 0..11 LOOP
        start_date := DATE '2024-01-01' + (i || ' months')::INTERVAL;
        end_date := start_date + '1 month'::INTERVAL;
        partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');
        
        EXECUTE format(
            'CREATE TABLE %I PARTITION OF orders 
             FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );
    END LOOP;
END $$;

-- 3. Create indexes on partitions (automatically applied to all)
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);

-- 4. Insert test data
INSERT INTO orders (customer_id, order_date, total, status, items)
SELECT 
    (random() * 10000)::INT,
    '2024-01-01'::TIMESTAMPTZ + (random() * 365 * INTERVAL '1 day'),
    (random() * 1000)::DECIMAL(12,2),
    (ARRAY['pending', 'processing', 'shipped', 'delivered'])[1 + (random()*3)::INT],
    jsonb_build_object('item_count', (random() * 10)::INT)
FROM generate_series(1, 1000000);

-- 5. Verify partition pruning
EXPLAIN (ANALYZE, COSTS OFF) 
SELECT * FROM orders 
WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01';
-- Should only scan orders_2024_06

-- 6. Maintenance: Drop old partition
ALTER TABLE orders DETACH PARTITION orders_2024_01;
DROP TABLE orders_2024_01;

-- 7. Add new partition for next period
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Summary

You’ve learned how to scale PostgreSQL from a single instance to a globally distributed system. Key takeaways:

Optimize First

Indexes and queries before hardware

Partition Early

Plan for growth from the start

Cache Strategically

Right data, right layer

Course Completion

Congratulations! You’ve completed the Database Engineering course.
You now know how to:
  • Write efficient SQL and design schemas
  • Understand transactions and isolation
  • Optimize queries and indexes
  • Navigate PostgreSQL internals
  • Build highly available systems
  • Scale to millions of users

Next Steps

Get Certified

Take the certification exam

Join Community

Connect with other database engineers

Interview Deep-Dive

Strong Answer:
  • The scaling ladder in order: (1) Query optimization via pg_stat_statements — find top 10 queries by total_exec_time, fix missing indexes and N+1 patterns (often buys 2-5x headroom with zero infrastructure changes). (2) Configuration tuning — verify shared_buffers at 25% RAM, random_page_cost at 1.1 for SSD. (3) PgBouncer in transaction mode to reduce max_connections from 500 to 100. (4) Read replicas for read-heavy workloads. (5) Table partitioning for the largest tables. (6) Vertical scaling (modern instances offer 96+ cores, 768GB RAM). (7) Sharding only after exhausting all of the above.
Follow-up: When do you know sharding is truly necessary?The signal is write throughput saturation on a single node. If WAL generation rate is saturating disk I/O after all optimizations, you have outgrown single-node. Read scaling never requires sharding — replicas handle that. It is always write scaling that forces the sharding decision.
Strong Answer:
  • Partitioning splits a table within one instance — all SQL features work normally. Sharding splits across multiple instances — requires routing layer, breaks cross-shard JOINs and transactions. Partition when single-node hardware is sufficient but table size causes maintenance issues. Shard when write throughput exceeds single-node capacity.
  • Common mistakes: (1) Too many partitions — daily partitions on 10-year data creates 3650 partitions, causing planner overhead and catalog bloat. Use monthly or quarterly. (2) Queries missing the partition key in WHERE, causing full partition scans. (3) Not automating future partition creation — inserts fail when data falls outside existing ranges. Use pg_partman.
Follow-up: Can you partition an existing large table without downtime?Yes. Create a new partitioned table, use logical replication or trigger-based sync to replicate data from old to new, then perform a brief lock to swap table names. pg_partman and pgloader assist. Alternatively, ATTACH existing tables as partitions if data already conforms to partition boundaries.
Strong Answer:
  • PostgreSQL’s buffer pool is already a cache. If the working set fits in RAM, PostgreSQL serves data at sub-millisecond latency. Adding Redis only helps when: (1) the same small result is read thousands of times/sec (hot key — Redis handles 100K+ reads/sec per key), (2) session/ephemeral data without durability needs, (3) expensive computed results that tolerate bounded staleness, or (4) rate limiting and atomic counters.
  • Cache invalidation: use cache-aside with TTL for simplicity. For stronger consistency, use PostgreSQL LISTEN/NOTIFY to push invalidation events. Never use write-behind for critical data.
  • The biggest mistake: caching individual row lookups when the real bottleneck is an expensive aggregation query. Cache the aggregation result, not individual rows.
Follow-up: How does pg_prewarm fit in?pg_prewarm loads tables/indexes into shared_buffers proactively. After a restart, the buffer pool is cold. Combined with autoprewarm (PG 11+), which saves buffer contents on shutdown and reloads on startup, you maintain warm cache across restarts — reducing the need for external caching to handle cold-start.