> ## Documentation Index > Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt > Use this file to discover all available pages before exploring further. # Scaling Strategies > Scale databases from thousands to millions of users # Module 8: Scaling Strategies When your application grows beyond what a single server can handle, you need scaling strategies. This module covers horizontal scaling, sharding, and partitioning. **The most important scaling principle:** Do not shard until you absolutely must. Sharding adds irreversible complexity -- cross-shard joins, distributed transactions, and operational overhead that will slow your team down for years. Most PostgreSQL instances can handle far more load than teams realize, especially with proper indexing, connection pooling, and read replicas. The order of operations should be: optimize queries, add indexes, use connection pooling, add read replicas, partition tables, and only then consider sharding. **Estimated Time**: 12-14 hours\ **Hands-On**: Implement table partitioning\ **Key Skill**: Choosing the right scaling strategy PostgreSQL scaling playbook from single node to sharding

PostgreSQL scaling playbook from single node to sharding

*** ## 8.1 Scaling Fundamentals ### Vertical vs Horizontal Scaling ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ VERTICAL VS HORIZONTAL SCALING │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ VERTICAL SCALING (Scale Up): │ │ ───────────────────────────── │ │ │ │ Before: After: │ │ ┌─────────┐ ┌─────────────────┐ │ │ │ 4 cores │ ──▶ │ 64 cores │ │ │ │ 16GB │ │ 512GB │ │ │ │ 1TB │ │ 20TB │ │ │ └─────────┘ └─────────────────┘ │ │ │ │ Pros: Simple, no code changes │ │ Cons: Hardware limits, expensive, single point of failure │ │ │ │ HORIZONTAL SCALING (Scale Out): │ │ ──────────────────────────────── │ │ │ │ Before: After: │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │ 4 cores │ ──▶ │ 4 cores │ │ 4 cores │ │ 4 cores │ │ 4 cores │ │ │ │ 16GB │ │ 16GB │ │ 16GB │ │ 16GB │ │ 16GB │ │ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │ │ │ Pros: Linear scaling, fault tolerant, cost effective │ │ Cons: Complex, requires application changes, CAP theorem │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### When to Scale | Signal | What It Means | Action | | ------------------------- | --------------------------- | ------------------------------- | | CPU > 80% sustained | Query processing bottleneck | Read replicas, optimize queries | | Memory pressure | Working set > RAM | More RAM, or shard data | | Disk I/O saturated | Too many reads/writes | Faster disks, caching, sharding | | Connection limits | Too many clients | Connection pooling | | Response time degradation | All of the above | Profile and optimize | *** ## 8.2 Table Partitioning ### What is Partitioning? ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ TABLE PARTITIONING │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ Without Partitioning: │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ │ orders │ │ │ │ 500 million rows - all in one table │ │ │ │ Every query scans entire table (even with indexes) │ │ │ │ Maintenance operations lock everything │ │ │ └─────────────────────────────────────────────────────────────────────┘ │ │ │ │ With Partitioning: │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ │ orders (parent table) │ │ │ ├─────────────┬─────────────┬─────────────┬─────────────────────────┤ │ │ │ orders_2023 │ orders_2024 │ orders_2025 │ ... │ │ │ │ 50M rows │ 150M rows │ 200M rows │ │ │ │ └─────────────┴─────────────┴─────────────┴─────────────────────────┘ │ │ │ │ Benefits: │ │ • Query "WHERE created_at >= '2024-01-01'" only scans 2024/2025 │ │ • VACUUM, REINDEX operate on individual partitions │ │ • Can drop old partitions instantly (vs DELETE + VACUUM) │ │ • Different storage for hot/cold data │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Partition Types Best for: Time-series data, continuous values ```sql theme={null} -- Create partitioned table CREATE TABLE orders ( id BIGINT, customer_id INT, order_date DATE, total DECIMAL(10,2), status VARCHAR(20) ) PARTITION BY RANGE (order_date); -- Create partitions CREATE TABLE orders_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'); CREATE TABLE orders_2024 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'); CREATE TABLE orders_2025 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2026-01-01'); -- PostgreSQL automatically routes inserts INSERT INTO orders VALUES (1, 100, '2024-06-15', 99.99, 'completed'); -- Goes to orders_2024 -- Query only scans relevant partitions EXPLAIN SELECT * FROM orders WHERE order_date >= '2024-06-01'; ``` Best for: Discrete categories, regions ```sql theme={null} CREATE TABLE customers ( id BIGINT, name TEXT, email TEXT, region VARCHAR(10) ) PARTITION BY LIST (region); CREATE TABLE customers_na PARTITION OF customers FOR VALUES IN ('US', 'CA', 'MX'); CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT'); CREATE TABLE customers_asia PARTITION OF customers FOR VALUES IN ('JP', 'CN', 'IN', 'SG'); -- Default partition for unknown regions CREATE TABLE customers_other PARTITION OF customers DEFAULT; ``` Best for: Even distribution, no natural partition key ```sql theme={null} CREATE TABLE events ( id BIGINT, user_id INT, event_type TEXT, data JSONB ) PARTITION BY HASH (user_id); -- Create 4 partitions CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0); CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1); CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2); CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3); -- user_id % 4 determines partition ``` ### Partition Maintenance ```sql theme={null} -- Add new partition for next year CREATE TABLE orders_2026 PARTITION OF orders FOR VALUES FROM ('2026-01-01') TO ('2027-01-01'); -- Detach old partition (keeps data, removes from parent) ALTER TABLE orders DETACH PARTITION orders_2020; -- Then archive or drop -- Option 1: Archive to cold storage \copy orders_2020 TO 'orders_2020.csv' CSV HEADER DROP TABLE orders_2020; -- Option 2: Drop immediately DROP TABLE orders_2020; -- Automatic partition creation with pg_partman CREATE EXTENSION pg_partman; SELECT partman.create_parent( p_parent_table := 'public.orders', p_control := 'order_date', p_type := 'native', p_interval := 'monthly' ); ``` *** ## 8.3 Database Sharding ### Sharding Concepts ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ DATABASE SHARDING │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ Partitioning: Split data within ONE database │ │ Sharding: Split data across MULTIPLE databases │ │ │ │ ┌───────────────┐ │ │ │ Application │ │ │ └───────┬───────┘ │ │ │ │ │ ┌───────▼───────┐ │ │ │ Shard Router │ │ │ └───────┬───────┘ │ │ │ │ │ ┌──────────────────────┼──────────────────────┐ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │ │ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │ │ │ users 0-999K │ │ users 1M-2M │ │ users 2M-3M │ │ │ │ Server: A │ │ Server: B │ │ Server: C │ │ │ └───────────────┘ └───────────────┘ └───────────────┘ │ │ │ │ Each shard is a complete PostgreSQL instance with full functionality │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Sharding Strategies ```python theme={null} # Application determines shard def get_shard(user_id): return user_id % NUM_SHARDS # All user data for user 12345 shard = get_shard(12345) # e.g., shard 1 conn = get_connection(shard) ``` **Pros**: Even distribution, simple **Cons**: Resharding is expensive (data migration) ```python theme={null} # User IDs 0-1M on shard 0, 1M-2M on shard 1, etc. def get_shard(user_id): if user_id < 1_000_000: return 0 elif user_id < 2_000_000: return 1 else: return 2 ``` **Pros**: Range queries possible, easy to add shards **Cons**: Hotspots if ranges accessed unevenly ```python theme={null} # Lookup table maps keys to shards # shard_directory table: # | user_id_range | shard_id | # | 0-500000 | 0 | # | 500001-1000000| 1 | def get_shard(user_id): return lookup_service.get_shard_for_user(user_id) ``` **Pros**: Flexible, can rebalance easily **Cons**: Lookup adds latency, directory is SPOF ### Cross-Shard Challenges **This table is the reason experienced engineers avoid sharding as long as possible.** Each row represents a problem that does not exist in a single-node database and has no perfect solution in a sharded architecture. Read this table as a cost-benefit analysis, not a feature list. | Challenge | Problem | Solution | Hidden Cost | | ---------------- | ------------------------------ | ----------------------------------------- | -------------------------------------------------------------------------------------------- | | **Joins** | Can't join across shards | Denormalize, or application-level joins | Data duplication, stale reads, application complexity | | **Transactions** | No cross-shard ACID | Two-phase commit, or eventual consistency | 2PC is slow and fragile; eventual consistency means reasoning about conflicts | | **Aggregations** | SUM/COUNT across shards | Scatter-gather queries | Latency is bounded by the slowest shard; fan-out amplifies load | | **Sequences** | IDs must be unique globally | UUID, or dedicated ID service | UUIDs are 16 bytes (vs 8 for bigint) and fragment B-Tree indexes; ID services are a new SPOF | | **Rebalancing** | Shards become uneven over time | Live migration between shards | Requires careful coordination; risk of data loss or downtime during migration | *** ## 8.4 Caching Strategies ### Caching Layers **Real-world analogy:** Database caching works like a series of increasingly distant warehouses. Your application checks the closest warehouse (Redis) first, then the regional warehouse (PostgreSQL buffer pool), then the national warehouse (OS page cache), and finally the factory (disk). Each layer is larger but slower. The art of cache engineering is keeping the data most likely to be needed in the closest warehouse. ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ CACHING ARCHITECTURE │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ │ │ │ Client │ │ │ └──────┬──────┘ │ │ │ │ │ ┌──────▼──────┐ │ │ CDN Cache ─────────────▶│ CDN │◀──── Static assets, API responses │ │ └──────┬──────┘ │ │ │ │ │ ┌──────▼──────┐ │ │ App Cache ─────────────▶│ Redis │◀──── Session, hot data, results │ │ └──────┬──────┘ │ │ │ Miss │ │ ┌──────▼──────┐ │ │ Query Cache ────────────▶│ PostgreSQL │ │ │ │ Buffer │◀──── Recently accessed pages │ │ │ Pool │ │ │ └──────┬──────┘ │ │ │ │ │ ┌──────▼──────┐ │ │ OS Cache ──────────────▶│ OS Page │◀──── Filesystem cache │ │ │ Cache │ │ │ └──────┬──────┘ │ │ │ │ │ ┌──────▼──────┐ │ │ │ Disk │ │ │ └─────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Cache Patterns Application manages cache explicitly. ```python theme={null} def get_user(user_id): # 1. Check cache cached = redis.get(f"user:{user_id}") if cached: return json.loads(cached) # 2. Cache miss - fetch from DB user = db.query("SELECT * FROM users WHERE id = %s", user_id) # 3. Populate cache redis.setex(f"user:{user_id}", 3600, json.dumps(user)) return user def update_user(user_id, data): # 1. Update database db.execute("UPDATE users SET ... WHERE id = %s", user_id) # 2. Invalidate cache redis.delete(f"user:{user_id}") ``` Cache is updated with every write. ```python theme={null} def update_user(user_id, data): # 1. Update cache redis.setex(f"user:{user_id}", 3600, json.dumps(data)) # 2. Update database db.execute("UPDATE users SET ... WHERE id = %s", user_id) ``` **Risk**: Cache and DB can get out of sync if DB write fails Writes go to cache, async flush to DB. ```python theme={null} def update_user(user_id, data): # 1. Write to cache only (fast!) redis.setex(f"user:{user_id}", 3600, json.dumps(data)) redis.lpush("write_queue", json.dumps({ "table": "users", "id": user_id, "data": data })) # Background worker def flush_writes(): while item := redis.brpop("write_queue"): db.execute(...) # Batch writes to DB ``` **Risk**: Data loss if cache fails before flush ### Cache Invalidation Strategies **"There are only two hard things in Computer Science: cache invalidation and naming things."** -- Phil Karlton. Cache invalidation is genuinely the hardest part of caching. The patterns below represent different tradeoffs between staleness tolerance and implementation complexity. Choose the simplest one that meets your staleness requirements. ```sql theme={null} -- PostgreSQL LISTEN/NOTIFY for cache invalidation. -- This approach pushes invalidation events to subscribers in real-time, -- eliminating the polling overhead of TTL-based approaches. CREATE OR REPLACE FUNCTION notify_cache_invalidation() RETURNS TRIGGER AS $$ BEGIN PERFORM pg_notify('cache_invalidation', json_build_object( 'table', TG_TABLE_NAME, 'id', NEW.id, 'operation', TG_OP )::text ); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER user_cache_trigger AFTER INSERT OR UPDATE OR DELETE ON users FOR EACH ROW EXECUTE FUNCTION notify_cache_invalidation(); ``` ```python theme={null} # Python listener import psycopg2 import select conn = psycopg2.connect(...) conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) cur = conn.cursor() cur.execute("LISTEN cache_invalidation;") while True: if select.select([conn], [], [], 5) == ([], [], []): continue conn.poll() while conn.notifies: notify = conn.notifies.pop(0) data = json.loads(notify.payload) redis.delete(f"{data['table']}:{data['id']}") ``` *** ## 8.5 Connection Scaling ### Connection Pooling with PgBouncer ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ MULTI-TIER CONNECTION POOLING │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ Tier 1: Application Pool (e.g., HikariCP) │ │ ┌────────────────────────────────────────────────────────────────────┐ │ │ │ App Server 1 App Server 2 App Server N │ │ │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ │ │ │ 10 conns │ │ 10 conns │ │ 10 conns │ │ │ │ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │ │ └────────┼─────────────────┼───────────────┼────────────────────────┘ │ │ │ │ │ │ │ └────────────┬────┴───────────────┘ │ │ │ │ │ Tier 2: PgBouncer (1000 client conns → 50 server conns) │ │ ┌─────────────────────┼────────────────────────────────────────────┐ │ │ │ ┌────▼────┐ │ │ │ │ │PgBouncer│ │ │ │ │ │pool=50 │ │ │ │ │ └────┬────┘ │ │ │ └─────────────────────┼────────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ PostgreSQL │ │ │ │ (50 backends) │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Optimizing PostgreSQL for Connections **The golden rule of PostgreSQL connections:** Fewer connections usually means higher throughput. This is counterintuitive but well-established. Beyond approximately `2 * CPU_cores + disk_spindles` active connections, performance degrades due to context switching, lock contention in ProcArray, and cache thrashing. A server with 16 cores will often perform better with 40 active connections than with 400. ```sql theme={null} -- postgresql.conf # Connection limits max_connections = 200 -- Keep low! Use a pooler for client-facing connections. superuser_reserved_connections = 3 -- Always keep a few reserved for admin access during emergencies. # Memory per connection work_mem = 64MB -- DANGER: this is per-operation, not per-connection. -- A query with 3 sorts = 3 x 64MB = 192MB for that one query. -- With 200 connections: worst case = 200 x 192MB = 37GB. maintenance_work_mem = 256MB -- Used by VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY. # Connection overhead # Each connection uses: # - ~10MB for backend process (fixed cost) # - work_mem per sort/hash operation (variable cost) # - Session state (catalogs, prepared statements) # # Rule of thumb: # (max_connections * average_work_mem_per_connection) < 50% of available RAM # Leave the other 50% for shared_buffers and OS page cache. ``` *** ## 8.6 Global Distribution ### Multi-Region Architecture ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ MULTI-REGION DATABASE │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ │ US-EAST │ │ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │ │ │ Primary │──sync────▶│ Standby │ │ │ │ │ │ (writes) │ │ (failover) │ │ │ │ │ └──────┬──────┘ └─────────────┘ │ │ │ │ │ │ │ │ └─────────┼────────────────────────────────────────────────────────────┘ │ │ │ │ │ async │ replication │ │ │ │ │ ┌─────────┼────────────────────────────────────────────────────────────┐ │ │ │ ▼ EU-WEST │ │ │ │ ┌─────────────┐ │ │ │ │ │ Replica │◀─────────────── EU users (reads) │ │ │ │ │ (read-only) │ │ │ │ │ └─────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌──────────────────────────────────────────────────────────────────────┐ │ │ │ ASIA-PACIFIC │ │ │ │ ┌─────────────┐ │ │ │ │ │ Replica │◀─────────────── APAC users (reads) │ │ │ │ │ (read-only) │ │ │ │ │ └─────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────┘ │ │ │ │ Challenge: Write latency for non-US users (cross-Atlantic round trip) │ │ Solutions: Accept latency, or use multi-master with conflict resolution │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Geo-Sharding **Practical context:** Geo-sharding is often driven by data residency regulations (GDPR, CCPA, data sovereignty laws) as much as by performance requirements. If your EU users' data must stay in EU data centers, geo-sharding is not optional -- it is a compliance requirement. ```python theme={null} # Route users to their nearest shard based on region REGION_SHARDS = { 'us': 'postgres://shard-us.db.example.com/app', 'eu': 'postgres://shard-eu.db.example.com/app', 'asia': 'postgres://shard-asia.db.example.com/app', } def get_db_connection(user): region = determine_user_region(user) return connect(REGION_SHARDS[region]) # User data stays in their region (data residency compliance). # Cross-region queries require a federation layer. # # Key design decision: what happens when a US user interacts # with an EU user? You need either (a) a reference table replicated # to all regions, or (b) an application-level fan-out that queries # multiple shards and merges results. Neither is free. ``` *** ## 8.7 Scaling Decision Framework ### Decision Tree ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ SCALING DECISION FRAMEWORK │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ 1. Is your database actually the bottleneck? │ │ ├── No → Fix application code first │ │ └── Yes ↓ │ │ │ │ 2. Have you optimized queries and indexes? │ │ ├── No → Do that first (cheapest fix) │ │ └── Yes ↓ │ │ │ │ 3. Is it a read-heavy workload? │ │ ├── Yes → Add read replicas + caching │ │ └── No (write-heavy) ↓ │ │ │ │ 4. Can you scale vertically? │ │ ├── Yes → Upgrade hardware (simpler) │ │ └── No (maxed out) ↓ │ │ │ │ 5. Can you partition within one database? │ │ ├── Yes → Table partitioning │ │ └── No (too big) ↓ │ │ │ │ 6. Can you shard by a natural key? │ │ ├── Yes → Implement sharding │ │ └── No → Consider different database architecture │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### Scaling Complexity Ladder | Stage | Size | Strategy | Complexity | | ----- | --------- | ------------------------------------ | ---------- | | **1** | \< 10GB | Single server, good indexes | ⭐ | | **2** | 10-100GB | Vertical scaling, query optimization | ⭐⭐ | | **3** | 100GB-1TB | Read replicas, connection pooling | ⭐⭐⭐ | | **4** | 1-10TB | Table partitioning, caching layer | ⭐⭐⭐⭐ | | **5** | > 10TB | Sharding, dedicated infrastructure | ⭐⭐⭐⭐⭐ | *** ## 8.8 Practice: Implement Partitioning ### Lab Exercise Create a time-series orders table with monthly partitions and automated maintenance. ```sql theme={null} -- 1. Create partitioned orders table CREATE TABLE orders ( id BIGINT GENERATED ALWAYS AS IDENTITY, customer_id INT NOT NULL, order_date TIMESTAMPTZ NOT NULL DEFAULT NOW(), total DECIMAL(12,2) NOT NULL, status VARCHAR(20) DEFAULT 'pending', items JSONB, PRIMARY KEY (id, order_date) ) PARTITION BY RANGE (order_date); -- 2. Create monthly partitions for 2024 DO $$ DECLARE start_date DATE := '2024-01-01'; end_date DATE; partition_name TEXT; BEGIN FOR i IN 0..11 LOOP start_date := DATE '2024-01-01' + (i || ' months')::INTERVAL; end_date := start_date + '1 month'::INTERVAL; partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM'); EXECUTE format( 'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)', partition_name, start_date, end_date ); END LOOP; END $$; -- 3. Create indexes on partitions (automatically applied to all) CREATE INDEX idx_orders_customer ON orders(customer_id); CREATE INDEX idx_orders_status ON orders(status); -- 4. Insert test data INSERT INTO orders (customer_id, order_date, total, status, items) SELECT (random() * 10000)::INT, '2024-01-01'::TIMESTAMPTZ + (random() * 365 * INTERVAL '1 day'), (random() * 1000)::DECIMAL(12,2), (ARRAY['pending', 'processing', 'shipped', 'delivered'])[1 + (random()*3)::INT], jsonb_build_object('item_count', (random() * 10)::INT) FROM generate_series(1, 1000000); -- 5. Verify partition pruning EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM orders WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01'; -- Should only scan orders_2024_06 -- 6. Maintenance: Drop old partition ALTER TABLE orders DETACH PARTITION orders_2024_01; DROP TABLE orders_2024_01; -- 7. Add new partition for next period CREATE TABLE orders_2025_01 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2025-02-01'); ``` *** ## Summary You've learned how to scale PostgreSQL from a single instance to a globally distributed system. Key takeaways: Indexes and queries before hardware Plan for growth from the start Right data, right layer *** ## Course Completion Congratulations! You've completed the Database Engineering course. **You now know how to:** * Write efficient SQL and design schemas * Understand transactions and isolation * Optimize queries and indexes * Navigate PostgreSQL internals * Build highly available systems * Scale to millions of users ### Next Steps Take the certification exam Connect with other database engineers *** ## Interview Deep-Dive **Strong Answer:** * The scaling ladder in order: (1) Query optimization via pg\_stat\_statements -- find top 10 queries by total\_exec\_time, fix missing indexes and N+1 patterns (often buys 2-5x headroom with zero infrastructure changes). (2) Configuration tuning -- verify shared\_buffers at 25% RAM, random\_page\_cost at 1.1 for SSD. (3) PgBouncer in transaction mode to reduce max\_connections from 500 to 100. (4) Read replicas for read-heavy workloads. (5) Table partitioning for the largest tables. (6) Vertical scaling (modern instances offer 96+ cores, 768GB RAM). (7) Sharding only after exhausting all of the above. **Follow-up: When do you know sharding is truly necessary?** The signal is write throughput saturation on a single node. If WAL generation rate is saturating disk I/O after all optimizations, you have outgrown single-node. Read scaling never requires sharding -- replicas handle that. It is always write scaling that forces the sharding decision. **Strong Answer:** * Partitioning splits a table within one instance -- all SQL features work normally. Sharding splits across multiple instances -- requires routing layer, breaks cross-shard JOINs and transactions. Partition when single-node hardware is sufficient but table size causes maintenance issues. Shard when write throughput exceeds single-node capacity. * Common mistakes: (1) Too many partitions -- daily partitions on 10-year data creates 3650 partitions, causing planner overhead and catalog bloat. Use monthly or quarterly. (2) Queries missing the partition key in WHERE, causing full partition scans. (3) Not automating future partition creation -- inserts fail when data falls outside existing ranges. Use pg\_partman. **Follow-up: Can you partition an existing large table without downtime?** Yes. Create a new partitioned table, use logical replication or trigger-based sync to replicate data from old to new, then perform a brief lock to swap table names. pg\_partman and pgloader assist. Alternatively, ATTACH existing tables as partitions if data already conforms to partition boundaries. **Strong Answer:** * PostgreSQL's buffer pool is already a cache. If the working set fits in RAM, PostgreSQL serves data at sub-millisecond latency. Adding Redis only helps when: (1) the same small result is read thousands of times/sec (hot key -- Redis handles 100K+ reads/sec per key), (2) session/ephemeral data without durability needs, (3) expensive computed results that tolerate bounded staleness, or (4) rate limiting and atomic counters. * Cache invalidation: use cache-aside with TTL for simplicity. For stronger consistency, use PostgreSQL LISTEN/NOTIFY to push invalidation events. Never use write-behind for critical data. * The biggest mistake: caching individual row lookups when the real bottleneck is an expensive aggregation query. Cache the aggregation result, not individual rows. **Follow-up: How does pg\_prewarm fit in?** pg\_prewarm loads tables/indexes into shared\_buffers proactively. After a restart, the buffer pool is cold. Combined with autoprewarm (PG 11+), which saves buffer contents on shutdown and reloads on startup, you maintain warm cache across restarts -- reducing the need for external caching to handle cold-start.