Module 8: Scaling Strategies
When your application grows beyond what a single server can handle, you need scaling strategies. This module covers horizontal scaling, sharding, and partitioning.
Estimated Time: 12-14 hours
Hands-On: Implement table partitioning
Key Skill: Choosing the right scaling strategy
8.1 Scaling Fundamentals
Vertical vs Horizontal Scaling
┌─────────────────────────────────────────────────────────────────────────────┐
│ VERTICAL VS HORIZONTAL SCALING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ VERTICAL SCALING (Scale Up): │
│ ───────────────────────────── │
│ │
│ Before: After: │
│ ┌─────────┐ ┌─────────────────┐ │
│ │ 4 cores │ ──▶ │ 64 cores │ │
│ │ 16GB │ │ 512GB │ │
│ │ 1TB │ │ 20TB │ │
│ └─────────┘ └─────────────────┘ │
│ │
│ Pros: Simple, no code changes │
│ Cons: Hardware limits, expensive, single point of failure │
│ │
│ HORIZONTAL SCALING (Scale Out): │
│ ──────────────────────────────── │
│ │
│ Before: After: │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ 4 cores │ ──▶ │ 4 cores │ │ 4 cores │ │ 4 cores │ │ 4 cores │ │
│ │ 16GB │ │ 16GB │ │ 16GB │ │ 16GB │ │ 16GB │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Pros: Linear scaling, fault tolerant, cost effective │
│ Cons: Complex, requires application changes, CAP theorem │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
When to Scale
| Signal | What It Means | Action |
|---|---|---|
| CPU > 80% sustained | Query processing bottleneck | Read replicas, optimize queries |
| Memory pressure | Working set > RAM | More RAM, or shard data |
| Disk I/O saturated | Too many reads/writes | Faster disks, caching, sharding |
| Connection limits | Too many clients | Connection pooling |
| Response time degradation | All of the above | Profile and optimize |
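Before reaching for a new architecture, confirm the signal from the database itself. A quick, illustrative check against the built-in statistics views (pg_stat_activity, pg_settings, pg_stat_database) shows connection headroom and the buffer-cache hit ratio; the thresholds you act on are workload-dependent:
-- Connection usage and cache hit ratio (illustrative health check)
SELECT
    (SELECT count(*) FROM pg_stat_activity) AS current_connections,
    (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
    round(100.0 * sum(blks_hit) / NULLIF(sum(blks_hit) + sum(blks_read), 0), 2) AS cache_hit_pct
FROM pg_stat_database;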
8.2 Table Partitioning
What is Partitioning?
┌─────────────────────────────────────────────────────────────────────────────┐
│ TABLE PARTITIONING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Without Partitioning: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ orders │ │
│ │ 500 million rows - all in one table │ │
│ │ Every query scans entire table (even with indexes) │ │
│ │ Maintenance operations lock everything │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ With Partitioning: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ orders (parent table) │ │
│ ├─────────────┬─────────────┬─────────────┬─────────────────────────┤ │
│ │ orders_2023 │ orders_2024 │ orders_2025 │ ... │ │
│ │ 50M rows │ 150M rows │ 200M rows │ │ │
│ └─────────────┴─────────────┴─────────────┴─────────────────────────┘ │
│ │
│ Benefits: │
│ • Query "WHERE created_at >= '2024-01-01'" only scans 2024/2025 │
│ • VACUUM, REINDEX operate on individual partitions │
│ • Can drop old partitions instantly (vs DELETE + VACUUM) │
│ • Different storage for hot/cold data │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Partition Types
PostgreSQL supports three partitioning methods: range, list, and hash.
Range Partitioning
Best for: Time-series data, continuous values
-- Create partitioned table
CREATE TABLE orders (
    id BIGINT,
    customer_id INT,
    order_date DATE,
    total DECIMAL(10,2),
    status VARCHAR(20)
) PARTITION BY RANGE (order_date);
-- Create partitions
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- PostgreSQL automatically routes inserts
INSERT INTO orders VALUES (1, 100, '2024-06-15', 99.99, 'completed');
-- Goes to orders_2024
-- Query only scans relevant partitions
EXPLAIN SELECT * FROM orders WHERE order_date >= '2024-06-01';
List Partitioning
Best for: Discrete categories, regions
CREATE TABLE customers (
    id BIGINT,
    name TEXT,
    email TEXT,
    region VARCHAR(10)
) PARTITION BY LIST (region);
CREATE TABLE customers_na PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT');
CREATE TABLE customers_asia PARTITION OF customers
FOR VALUES IN ('JP', 'CN', 'IN', 'SG');
-- Default partition for unknown regions
CREATE TABLE customers_other PARTITION OF customers
DEFAULT;
Hash Partitioning
Best for: Even distribution, no natural partition key
CREATE TABLE events (
    id BIGINT,
    user_id INT,
    event_type TEXT,
    data JSONB
) PARTITION BY HASH (user_id);
-- Create 4 partitions
CREATE TABLE events_p0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- user_id % 4 determines partition
Partition Maintenance
-- Add new partition for next year
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
-- Detach old partition (keeps data, removes from parent)
ALTER TABLE orders DETACH PARTITION orders_2020;
-- Then archive or drop
-- Option 1: Archive to cold storage
\copy orders_2020 TO 'orders_2020.csv' CSV HEADER
DROP TABLE orders_2020;
-- Option 2: Drop immediately
DROP TABLE orders_2020;
-- Automatic partition creation with pg_partman
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
    p_parent_table := 'public.orders',
    p_control := 'order_date',
    p_type := 'native',
    p_interval := 'monthly'
);
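pg_partman only pre-creates a limited number of future partitions, so its maintenance routine has to run on a schedule (cron, pg_cron, or its background worker) to keep new partitions appearing ahead of the data. A minimal sketch, assuming the extension's objects live in the partman schema as in the call above; check your pg_partman version's documentation for the exact function name and signature:
-- Run periodically to create upcoming partitions (and retire old ones per config)
SELECT partman.run_maintenance();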
8.3 Database Sharding
Sharding Concepts
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATABASE SHARDING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Partitioning: Split data within ONE database │
│ Sharding: Split data across MULTIPLE databases │
│ │
│ ┌───────────────┐ │
│ │ Application │ │
│ └───────┬───────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ Shard Router │ │
│ └───────┬───────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │
│ │ users 0-999K │ │ users 1M-2M │ │ users 2M-3M │ │
│ │ Server: A │ │ Server: B │ │ Server: C │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
│ Each shard is a complete PostgreSQL instance with full functionality │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Sharding Strategies
Key-Based (Hash) Sharding
# Application determines shard
NUM_SHARDS = 4  # e.g., four shards

def get_shard(user_id):
    return user_id % NUM_SHARDS

# All user data for user 12345 lives on one shard
shard = get_shard(12345)  # 12345 % 4 = 1, so shard 1
conn = get_connection(shard)
Range-Based Sharding
# User IDs 0-1M on shard 0, 1M-2M on shard 1, etc.
def get_shard(user_id):
    if user_id < 1_000_000:
        return 0
    elif user_id < 2_000_000:
        return 1
    else:
        return 2
Directory-Based Sharding
# Lookup table maps keys to shards
# shard_directory table:
# | user_id_range | shard_id |
# | 0-500000 | 0 |
# | 500001-1000000| 1 |
def get_shard(user_id):
    return lookup_service.get_shard_for_user(user_id)
Cross-Shard Challenges
| Challenge | Problem | Solution |
|---|---|---|
| Joins | Can’t join across shards | Denormalize, or application-level joins |
| Transactions | No cross-shard ACID | Two-phase commit, or eventual consistency |
| Aggregations | SUM/COUNT across shards | Scatter-gather queries |
| Sequences | IDs must be unique globally | UUID, or dedicated ID service |
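For aggregations in particular, the usual workaround is a scatter-gather query: run the same aggregate on every shard, then combine the partial results in the application. A minimal sketch; the SHARD_DSNS list and the orders schema here are illustrative, not part of the course setup:
# Scatter the same query to every shard, then gather and combine the partial sums
import psycopg2

SHARD_DSNS = [
    "postgres://shard0.db.example.com/app",
    "postgres://shard1.db.example.com/app",
    "postgres://shard2.db.example.com/app",
]

def total_revenue(day):
    total = 0
    for dsn in SHARD_DSNS:
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE order_date = %s",
                (day,),
            )
            total += cur.fetchone()[0]
    return total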
8.4 Caching Strategies
Caching Layers
┌─────────────────────────────────────────────────────────────────────────────┐
│ CACHING ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ Client │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ CDN Cache ─────────────▶│ CDN │◀──── Static assets, API responses │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ App Cache ─────────────▶│ Redis │◀──── Session, hot data, results │
│ └──────┬──────┘ │
│ │ Miss │
│ ┌──────▼──────┐ │
│ Query Cache ────────────▶│ PostgreSQL │ │
│ │ Buffer │◀──── Recently accessed pages │
│ │ Pool │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ OS Cache ──────────────▶│ OS Page │◀──── Filesystem cache │
│ │ Cache │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Disk │ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Cache Patterns
Three common patterns: cache-aside, write-through, and write-behind.
Cache-Aside
The application manages the cache explicitly.
def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss - fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # 3. Populate cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

def update_user(user_id, data):
    # 1. Update database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # 2. Invalidate cache
    redis.delete(f"user:{user_id}")
Write-Through
The cache is updated with every write.
Risk: Cache and DB can get out of sync if the DB write fails.
def update_user(user_id, data):
    # 1. Update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    # 2. Update database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
Write-Behind
Writes go to the cache, with an asynchronous flush to the DB.
Risk: Data loss if the cache fails before the flush.
def update_user(user_id, data):
    # 1. Write to cache only (fast!)
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    redis.lpush("write_queue", json.dumps({
        "table": "users",
        "id": user_id,
        "data": data
    }))

# Background worker
def flush_writes():
    while item := redis.brpop("write_queue"):
        db.execute(...)  # Batch writes to DB
Cache Invalidation Strategies
-- PostgreSQL LISTEN/NOTIFY for cache invalidation
CREATE OR REPLACE FUNCTION notify_cache_invalidation()
RETURNS TRIGGER AS $$
DECLARE
    affected RECORD;
BEGIN
    -- NEW is null for DELETE, so use OLD to get the deleted row's id
    IF TG_OP = 'DELETE' THEN
        affected := OLD;
    ELSE
        affected := NEW;
    END IF;
    PERFORM pg_notify('cache_invalidation',
        json_build_object(
            'table', TG_TABLE_NAME,
            'id', affected.id,
            'operation', TG_OP
        )::text
    );
    RETURN affected;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_cache_trigger
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION notify_cache_invalidation();
# Python listener
import json
import select

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

conn = psycopg2.connect(...)
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN cache_invalidation;")

while True:
    # Wait up to 5 seconds for a notification, then loop again
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        data = json.loads(notify.payload)
        redis.delete(f"{data['table']}:{data['id']}")
8.5 Connection Scaling
Connection Pooling with PgBouncer
┌─────────────────────────────────────────────────────────────────────────────┐
│ MULTI-TIER CONNECTION POOLING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Tier 1: Application Pool (e.g., HikariCP) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ App Server 1 App Server 2 App Server N │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ 10 conns │ │ 10 conns │ │ 10 conns │ │ │
│ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │
│ └────────┼─────────────────┼───────────────┼────────────────────────┘ │
│ │ │ │ │
│ └────────────┬────┴───────────────┘ │
│ │ │
│ Tier 2: PgBouncer (1000 client conns → 50 server conns) │
│ ┌─────────────────────┼────────────────────────────────────────────┐ │
│ │ ┌────▼────┐ │ │
│ │ │PgBouncer│ │ │
│ │ │pool=50 │ │ │
│ │ └────┬────┘ │ │
│ └─────────────────────┼────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL │ │
│ │ (50 backends) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
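A PgBouncer tier like the one in the diagram is configured in pgbouncer.ini. The values below are an illustrative sketch (transaction pooling is the usual choice for this pattern); tune them to your workload:
; pgbouncer.ini (illustrative values)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; server connection released after each transaction
max_client_conn = 1000       ; client connections PgBouncer will accept
default_pool_size = 50       ; server connections per database/user pair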
Optimizing PostgreSQL for Connections
# postgresql.conf
# Connection limits
max_connections = 200 # Keep low! Use pooler
superuser_reserved_connections = 3
# Memory per connection
work_mem = 64MB # Careful: per sort/hash operation
maintenance_work_mem = 256MB
# Connection overhead
# Each connection uses:
# - ~10MB for backend process
# - work_mem per operation
# - Session state
# Rule of thumb:
# max_connections × work_mem < 50% of available RAM
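# Example: 200 connections × 64MB work_mem = 12.8GB, so budget at least ~26GB of RAM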
8.6 Global Distribution
Multi-Region Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ MULTI-REGION DATABASE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ US-EAST │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Primary │──sync────▶│ Standby │ │ │
│ │ │ (writes) │ │ (failover) │ │ │
│ │ └──────┬──────┘ └─────────────┘ │ │
│ │ │ │ │
│ └─────────┼────────────────────────────────────────────────────────────┘ │
│ │ │
│ async │ replication │
│ │ │
│ ┌─────────┼────────────────────────────────────────────────────────────┐ │
│ │ ▼ EU-WEST │ │
│ │ ┌─────────────┐ │ │
│ │ │ Replica │◀─────────────── EU users (reads) │ │
│ │ │ (read-only) │ │ │
│ │ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ ASIA-PACIFIC │ │
│ │ ┌─────────────┐ │ │
│ │ │ Replica │◀─────────────── APAC users (reads) │ │
│ │ │ (read-only) │ │ │
│ │ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ Challenge: Write latency for non-US users (cross-Atlantic round trip) │
│ Solutions: Accept latency, or use multi-master with conflict resolution │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
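With this layout the application routes writes to the primary and reads to the closest replica. A minimal routing sketch; the DSNs and region names below are placeholders:
# Writes always go to the primary; reads go to the nearest read-only replica
import psycopg2

PRIMARY_DSN = "postgres://primary.us-east.db.example.com/app"
READ_REPLICAS = {
    "us-east": PRIMARY_DSN,
    "eu-west": "postgres://replica.eu-west.db.example.com/app",
    "apac": "postgres://replica.apac.db.example.com/app",
}

def get_connection(region, readonly=True):
    dsn = READ_REPLICAS.get(region, PRIMARY_DSN) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)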
Geo-Sharding
# Route users to their nearest shard based on region
REGION_SHARDS = {
'us': 'postgres://shard-us.db.example.com/app',
'eu': 'postgres://shard-eu.db.example.com/app',
'asia': 'postgres://shard-asia.db.example.com/app',
}
def get_db_connection(user):
    region = determine_user_region(user)
    return connect(REGION_SHARDS[region])
# User data stays in their region (data residency compliance)
# Cross-region queries require federation layer
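One way to build that federation layer inside PostgreSQL is the postgres_fdw extension, which exposes a table on a remote regional shard as a local foreign table for the occasional cross-region query. A minimal sketch; the server name, host, and credentials are placeholders:
-- Federation via postgres_fdw (for occasional cross-region reads)
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard_eu
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard-eu.db.example.com', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER shard_eu
    OPTIONS (user 'app_reader', password 'secret');

CREATE FOREIGN TABLE customers_eu_remote (
    id BIGINT,
    name TEXT,
    email TEXT,
    region VARCHAR(10)
) SERVER shard_eu OPTIONS (schema_name 'public', table_name 'customers');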
8.7 Scaling Decision Framework
Decision Tree
┌─────────────────────────────────────────────────────────────────────────────┐
│ SCALING DECISION FRAMEWORK │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Is your database actually the bottleneck? │
│ ├── No → Fix application code first │
│ └── Yes ↓ │
│ │
│ 2. Have you optimized queries and indexes? │
│ ├── No → Do that first (cheapest fix) │
│ └── Yes ↓ │
│ │
│ 3. Is it a read-heavy workload? │
│ ├── Yes → Add read replicas + caching │
│ └── No (write-heavy) ↓ │
│ │
│ 4. Can you scale vertically? │
│ ├── Yes → Upgrade hardware (simpler) │
│ └── No (maxed out) ↓ │
│ │
│ 5. Can you partition within one database? │
│ ├── Yes → Table partitioning │
│ └── No (too big) ↓ │
│ │
│ 6. Can you shard by a natural key? │
│ ├── Yes → Implement sharding │
│ └── No → Consider different database architecture │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Scaling Complexity Ladder
| Stage | Size | Strategy | Complexity |
|---|---|---|---|
| 1 | < 10GB | Single server, good indexes | ⭐ |
| 2 | 10-100GB | Vertical scaling, query optimization | ⭐⭐ |
| 3 | 100GB-1TB | Read replicas, connection pooling | ⭐⭐⭐ |
| 4 | 1-10TB | Table partitioning, caching layer | ⭐⭐⭐⭐ |
| 5 | > 10TB | Sharding, dedicated infrastructure | ⭐⭐⭐⭐⭐ |
8.8 Practice: Implement Partitioning
Lab Exercise
Create a time-series orders table with monthly partitions and automated maintenance.
Complete Solution
-- 1. Create partitioned orders table
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    customer_id INT NOT NULL,
    order_date TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    total DECIMAL(12,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    items JSONB,
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);
-- 2. Create monthly partitions for 2024
DO $$
DECLARE
    start_date DATE := '2024-01-01';
    end_date DATE;
    partition_name TEXT;
BEGIN
    FOR i IN 0..11 LOOP
        start_date := DATE '2024-01-01' + (i || ' months')::INTERVAL;
        end_date := start_date + '1 month'::INTERVAL;
        partition_name := 'orders_' || TO_CHAR(start_date, 'YYYY_MM');
        EXECUTE format(
            'CREATE TABLE %I PARTITION OF orders
             FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );
    END LOOP;
END $$;
-- 3. Create indexes on partitions (automatically applied to all)
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
-- 4. Insert test data
INSERT INTO orders (customer_id, order_date, total, status, items)
SELECT
    (random() * 10000)::INT,
    '2024-01-01'::TIMESTAMPTZ + (random() * 365 * INTERVAL '1 day'),
    (random() * 1000)::DECIMAL(12,2),
    (ARRAY['pending', 'processing', 'shipped', 'delivered'])[1 + (random()*3)::INT],
    jsonb_build_object('item_count', (random() * 10)::INT)
FROM generate_series(1, 1000000);
-- 5. Verify partition pruning
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM orders
WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01';
-- Should only scan orders_2024_06
-- 6. Maintenance: Drop old partition
ALTER TABLE orders DETACH PARTITION orders_2024_01;
DROP TABLE orders_2024_01;
-- 7. Add new partition for next period
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Summary
You’ve learned how to scale PostgreSQL from a single instance to a globally distributed system. Key takeaways:
- Optimize First: Indexes and queries before hardware
- Partition Early: Plan for growth from the start
- Cache Strategically: Right data, right layer
Course Completion
Congratulations! You’ve completed the Database Engineering course. You can now:
- Write efficient SQL and design schemas
- Understand transactions and isolation
- Optimize queries and indexes
- Navigate PostgreSQL internals
- Build highly available systems
- Scale to millions of users