Module 7: Replication & High Availability
Production databases must survive hardware failures, network issues, and data center outages. This module teaches you how to build highly available PostgreSQL deployments.
Estimated Time: 10-12 hours
Hands-On: Set up streaming replication
Key Skill: Designing for 99.99% uptime
7.1 High Availability Concepts
The Need for Replication
┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY YOU NEED REPLICATION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Without Replication: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────────┐ │ │
│ │ │ App │────────▶│ Database │ │ │
│ │ │ Server │ │ (Primary) │ │ │
│ │ └────────────┘ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ FAILURE │ │
│ │ │ │
│ │ Result: Complete outage until server is repaired │ │
│ │ Risk of data loss if disk fails │ │
│ │ Recovery time: hours to days │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ With Replication: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────────┐ │ │
│ │ │ App │────────▶│ Primary DB │ │ │
│ │ │ Server │ └───────┬────────┘ │ │
│ │ └─────┬──────┘ │ WAL Stream │ │
│ │ │ ▼ │ │
│ │ │ ┌────────────────┐ │ │
│ │ └───(failover)──▶│ Standby DB │ │ │
│ │ └────────────────┘ │ │
│ │ │ │
│ │ Result: Automatic failover in seconds │ │
│ │ Zero data loss (with sync replication) │ │
│ │ Read queries can use standby │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Replication Types
Physical (Streaming) Replication
Replicates WAL bytes - an exact binary copy of the primary.

Primary ─────WAL─────▶ Standby
            (byte stream)

Pros:
- Complete replica (all databases, schemas)
- Simple setup
- Low overhead on primary
Cons:
- Same PostgreSQL major version required
- Cannot select specific tables
- Standby is read-only

Logical Replication
Replicates row changes - decoded from WAL.

Primary ─────CHANGES─────▶ Subscriber
        (INSERT, UPDATE, DELETE)

Pros:
- Cross-version replication
- Selective table replication
- Subscriber can have different schema/indexes
- Subscriber can be writable
Cons:
- Higher overhead (decoding)
- Doesn't replicate DDL
- More complex to set up
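Which type a server can support is governed by the wal_level setting. A minimal check, assuming a psql session on the server in question (the ALTER SYSTEM line is only needed if you intend to use logical replication, and it requires a restart):

-- 'replica' is enough for streaming replication; 'logical' is required for logical replication
SHOW wal_level;
-- Raise the level for logical replication (takes effect after a restart)
ALTER SYSTEM SET wal_level = 'logical';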
7.2 Streaming Replication Setup
Primary Server Configuration
# postgresql.conf on primary
wal_level = replica                  # Enable WAL detail needed for replication
max_wal_senders = 5                  # Max replication connections
max_replication_slots = 5            # Prevent WAL removal before standby catches up
synchronous_commit = on              # Wait for local WAL flush before commit returns

# For synchronous replication (zero data loss)
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# pg_hba.conf - allow replication connections
# host  replication  replicator  192.168.1.0/24  scram-sha-256
-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
-- Create replication slot (prevents WAL cleanup before standby catches up)
SELECT pg_create_physical_replication_slot('standby1_slot');
-- View slots
SELECT * FROM pg_replication_slots;
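Before taking the base backup, it is worth confirming that the primary's settings and pg_hba.conf rules are actually in place; a verification sketch, assuming the configuration shown above:

-- A restart is required for wal_level / max_wal_senders changes to take effect
SHOW wal_level;
SHOW max_wal_senders;
-- Confirm pg_hba.conf has a replication entry and parsed without errors
SELECT line_number, type, database, user_name, address, error
FROM pg_hba_file_rules
WHERE 'replication' = ANY(database) OR error IS NOT NULL;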
Standby Server Setup
# Take base backup from primary
pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data \
    -P -X stream -R -S standby1_slot
# The -R flag creates standby.signal and configures primary_conninfo
# postgresql.auto.conf on standby (created by pg_basebackup -R)
primary_conninfo = 'host=primary_host port=5432 user=replicator password=xxx'
primary_slot_name = 'standby1_slot'

# postgresql.conf on standby
hot_standby = on                     # Allow read queries on standby
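After starting the standby, a quick sanity check confirms it is in recovery and actively streaming from the primary (run on the standby):

-- Returns true on a standby, false on a primary
SELECT pg_is_in_recovery();
-- 'streaming' means the WAL receiver is connected to the primary
SELECT status, sender_host FROM pg_stat_wal_receiver;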
Monitoring Replication
-- On primary: View connected standbys
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

-- On standby: View replication status
SELECT
    status,
    written_lsn,
    flushed_lsn,
    latest_end_lsn,
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn(),
    pg_last_xact_replay_timestamp()
FROM pg_stat_wal_receiver;

-- Calculate lag in seconds
SELECT
    EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds
FROM pg_stat_wal_receiver;
Failover Process Deep Dive
Automatic failover:
- Faster recovery (seconds to minutes)
- Risk of split-brain during a network partition
- Requires robust failure detection
Manual failover:
- Slower (minutes to hours)
- Human verification prevents mistakes
- Better for planned maintenance
Failover checklist:
- ✓ Verify the primary is truly dead (not just slow)
- ✓ Choose the most up-to-date standby (see the sketch after this list)
- ✓ Promote that standby to primary
- ✓ Update DNS/load balancer
- ✓ Reconfigure the other standbys
- ✓ Monitor for split-brain
- ✓ Investigate the root cause
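To choose the most up-to-date standby, compare how far each candidate has received and replayed WAL; the one with the highest LSNs loses the least data. A sketch to run on each standby candidate:

-- Run on every standby candidate; promote the one with the highest LSNs
SELECT
    pg_last_wal_receive_lsn() AS last_received,
    pg_last_wal_replay_lsn() AS last_replayed;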
7.3 Synchronous vs Asynchronous
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYNCHRONOUS vs ASYNCHRONOUS REPLICATION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ASYNCHRONOUS (default): │
│ ──────────────────────── │
│ │
│ Client Primary Standby │
│ │ │
│ │──COMMIT──▶│ │
│ │ │──WAL FLUSH──▶DISK │
│ │◀──OK──────│ │
│ │ │ │ │
│ │ │──WAL──────▶│ (async, may lag) │
│ │ │ │──REPLAY──▶ │
│ │
│ Pros: Lowest latency, primary doesn't wait │
│ Cons: Standby can be behind, data loss risk on failover │
│ │
│ SYNCHRONOUS: │
│ ───────────── │
│ │
│ Client Primary Standby │
│ │ │
│ │──COMMIT──▶│ │
│ │ │──WAL FLUSH──▶DISK │
│ │ │──WAL──────────────────▶│ │
│ │ │ │──FLUSH──▶DISK │
│ │ │◀───────────OK──────────│ │
│ │◀──OK──────│ │
│ │
│ Pros: Zero data loss (WAL on both disks before commit returns) │
│ Cons: Higher latency (network round trip per commit) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Configuration Options
# Asynchronous replication (default)
synchronous_commit = on
synchronous_standby_names = ''

# Synchronous with a named standby
synchronous_standby_names = 'standby1'

# FIRST N: the N highest-priority standbys from the list are synchronous
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# ANY N: any N standbys from the list (quorum)
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

-- Per-transaction control (SQL, not postgresql.conf)
SET synchronous_commit = off;        -- This transaction doesn't wait
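After changing synchronous_standby_names (and reloading the configuration), you can confirm which connected standbys the primary actually treats as synchronous; sync_state is one of async, potential, sync, or quorum:

-- Run on the primary
SELECT application_name, state, sync_priority, sync_state
FROM pg_stat_replication;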
7.4 Failover and Switchover
Planned Switchover
# Step 1: Stop writes to primary
# (Application should connect to new primary after switch)
# Step 2: Ensure standby is caught up
psql -h primary -c "SELECT pg_current_wal_lsn();"
psql -h standby -c "SELECT pg_last_wal_replay_lsn();"
# Both should match
# Step 3: Promote standby
pg_ctl promote -D /var/lib/postgresql/data
# Or
psql -c "SELECT pg_promote();"
# Step 4: Old primary becomes standby (optional)
# Configure recovery settings and restart
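For step 4, pg_rewind is one way to resynchronize the old primary's data directory with the new primary instead of taking a fresh base backup; a sketch with an illustrative host name, assuming the old primary was shut down cleanly and had wal_log_hints = on (or data checksums) enabled:

# On the old primary, after it has been stopped
pg_rewind --target-pgdata=/var/lib/postgresql/data \
    --source-server='host=new_primary port=5432 user=postgres dbname=postgres'
# Then turn it into a standby: create standby.signal, set primary_conninfo, and start it
touch /var/lib/postgresql/data/standby.signal
pg_ctl start -D /var/lib/postgresql/data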
Automatic Failover with Patroni
# patroni.yml
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.1:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret
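Once Patroni is running on each node, failover and switchover are normally driven through patronictl rather than pg_ctl; a few commonly used commands (the config path is illustrative, and the cluster name matches the scope above):

# Show cluster members, their roles, and replication lag
patronictl -c /etc/patroni.yml list
# Planned switchover to another member
patronictl -c /etc/patroni.yml switchover postgres-cluster
# Restart a member (e.g. after changing PostgreSQL parameters)
patronictl -c /etc/patroni.yml restart postgres-cluster node1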
7.5 Logical Replication
Publisher Setup
-- On publisher
CREATE PUBLICATION my_publication
FOR TABLE users, orders, products;
-- Or all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;
-- View publications
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
Subscriber Setup
-- On subscriber (tables must exist with same schema)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher port=5432 dbname=mydb user=replicator'
PUBLICATION my_publication;
-- View subscriptions
SELECT * FROM pg_subscription;
SELECT * FROM pg_stat_subscription;
-- Pause/resume
ALTER SUBSCRIPTION my_subscription DISABLE;
ALTER SUBSCRIPTION my_subscription ENABLE;
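Because DDL is not replicated, adding a new table takes an extra step on each side: create the table on both servers, add it to the publication, then refresh the subscription. A sketch using a hypothetical invoices table and the names above:

-- On the publisher (after creating the table there)
ALTER PUBLICATION my_publication ADD TABLE invoices;
-- On the subscriber (after creating the same table there)
ALTER SUBSCRIPTION my_subscription REFRESH PUBLICATION;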
Use Cases for Logical Replication
- Cross-Version Migration: Replicate from PG 14 to PG 16, then switch over
- Selective Replication: Only sync specific tables to an analytics database
- Multi-Master: Write to different tables on different servers
- Data Consolidation: Combine data from multiple sources
7.6 Connection Pooling
The Problem
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONNECTION OVERHEAD │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Each PostgreSQL connection: │
│ • Spawns a backend process (~5-10 MB RAM) │
│ • Has connection establishment overhead (~100ms) │
│ • Maintains session state │
│ │
│ 100 app servers × 20 connections each = 2000 connections! │
│ │
│ Without Pooler: │
│ ┌────────┐ │
│ │ App 1 │──conn1──┐ │
│ │ │──conn2──┤ │
│ └────────┘ │ │
│ ┌────────┐ ├──▶ PostgreSQL (2000 backends = 20GB RAM!) │
│ │ App 2 │──conn3──┤ │
│ │ │──conn4──┤ │
│ └────────┘ │ │
│ ... │ │
│ │
│ With Connection Pooler: │
│ ┌────────┐ │
│ │ App 1 │──┐ ┌──────────┐ │
│ │ │──┤ │ │ │
│ └────────┘ ├────────▶│ PgBouncer│───(20 conns)──▶ PostgreSQL │
│ ┌────────┐ │ │ │ (200MB RAM) │
│ │ App 2 │──┤ └──────────┘ │
│ │ │──┘ │
│ └────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
PgBouncer Setup
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
# Pool mode
pool_mode = transaction # Recommended for most apps
# Pool size
default_pool_size = 20 # Connections to PostgreSQL per db/user
max_client_conn = 1000 # Total client connections
reserve_pool_size = 5 # Extra for burst
# Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
query_timeout = 0
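The auth_file above holds one "username" "password-hash" pair per line. One way to populate it is to copy the stored SCRAM verifier straight out of PostgreSQL; a sketch, assuming superuser access and a hypothetical app_user role:

# Append app_user's stored password hash to userlist.txt in the expected quoted format
psql -h localhost -p 5432 -U postgres -Atc \
    "SELECT format('\"%s\" \"%s\"', usename, passwd) FROM pg_shadow WHERE usename = 'app_user'" \
    >> /etc/pgbouncer/userlist.txt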
Pool Modes
| Mode | Description | Best For |
|---|---|---|
| session | Connection held for entire session | Session state needed (SET, temp tables) |
| transaction | Connection held for transaction | Most applications |
| statement | Connection released after each query | Simple queries, max sharing |
-- Monitor PgBouncer: connect to the special "pgbouncer" admin database
-- (shell: psql -h localhost -p 6432 pgbouncer)
SHOW POOLS;
SHOW SERVERS;
SHOW CLIENTS;
SHOW STATS;
7.7 Read Replicas and Load Balancing
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ READ REPLICA LOAD BALANCING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ HAProxy/ │ │
│ │ PgPool │ │
│ └──────┬──────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Primary │ │ Replica 1 │ │ Replica 2 │ │
│ │ (Read/Write)│ │ (Read Only) │ │ (Read Only) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ │───WAL─────────▶│ │ │
│ │───────────WAL─────────────────▶│ │
│ │
│ Routing Rules: │
│ • All writes → Primary │
│ • Reads → Round-robin across replicas │
│ • Heavy reports → Dedicated replica │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Application-Level Routing
# Django example
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'mydb',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'mydb',
    }
}

# Database router
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
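The router only takes effect once it is registered in settings; with a hypothetical module path, that looks like the line below. Keep in mind that reads routed to the replica can be slightly stale because of replication lag.

# settings.py (module path is illustrative)
DATABASE_ROUTERS = ['myproject.routers.PrimaryReplicaRouter']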
7.8 Disaster Recovery
Backup Strategies
pg_dump (Logical Backup)
# Full database backup
pg_dump -Fc mydb > backup.dump
# Specific tables
pg_dump -Fc -t users -t orders mydb > partial.dump
# Restore
pg_restore -d mydb backup.dump
pg_basebackup (Physical Backup)
# Full physical backup with WAL
pg_basebackup -h primary -U replicator -D /backup/base \
    -Fp -X stream -P -R

# Compressed tar to stdout (WAL must be fetched, not streamed, when writing to stdout)
pg_basebackup -h primary -U replicator -D - -Ft -X fetch -z > backup.tar.gz
Continuous Archiving (PITR)
# postgresql.conf
archive_mode = on
archive_command = 'rsync -a %p backup:/wal_archive/%f'
# Recovery to a specific point in time
# Settings go in postgresql.conf (PostgreSQL 12+); also create an empty recovery.signal file
restore_command = 'cp /wal_archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
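Putting PITR together: a restore means laying down a base backup in an empty data directory, adding the recovery settings above to postgresql.conf, and creating recovery.signal so PostgreSQL enters archive recovery on startup. A sketch with illustrative paths, assuming a tar-format base backup:

# Stop PostgreSQL and restore the base backup into an empty data directory
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
tar -xf /backup/base.tar -C /var/lib/postgresql/data
# Request recovery up to recovery_target_time (settings shown above)
touch /var/lib/postgresql/data/recovery.signal
pg_ctl start -D /var/lib/postgresql/data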
3-2-1 Backup Rule
┌─────────────────────────────────────────────────────────────────────────────┐
│ 3-2-1 BACKUP RULE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 3 copies of your data: │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Primary │ │ Replica │ │ Backup │ │
│ │ (Live) │ │ (Live) │ │ (Archive) │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ 2 different storage types: │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Local SSD │ │ Object Storage │ │
│ │ (Primary + Replica)│ │ (S3, GCS, etc.) │ │
│ └────────────────────┘ └────────────────────┘ │
│ │
│ 1 offsite location: │
│ ┌────────────────────┐ │
│ │ Different Region │ │
│ │ or Cloud Provider │ │
│ └────────────────────┘ │
│ │
│ Test your backups regularly! │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
7.9 Practice: Set Up Replication
Lab Exercise
- Set up two PostgreSQL instances (use Docker)
- Configure streaming replication
- Test failover
- Set up PgBouncer in front
Docker Compose Setup
version: '3.8'

services:
  primary:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: primary_pass
      POSTGRES_USER: postgres
    volumes:
      - ./primary-data:/var/lib/postgresql/data
      - ./primary-init:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"
    command: >
      postgres
      -c wal_level=replica
      -c max_wal_senders=5
      -c max_replication_slots=5

  standby:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: standby_pass
      PGUSER: replicator
      PGPASSWORD: repl_pass
    volumes:
      - ./standby-data:/var/lib/postgresql/data
    ports:
      - "5433:5432"
    depends_on:
      - primary

  pgbouncer:
    image: edoburu/pgbouncer
    environment:
      DATABASE_URL: "postgres://postgres:primary_pass@primary:5432/postgres"
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 100
    ports:
      - "6432:5432"
    depends_on:
      - primary
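The compose file only starts the containers; the standby still has to be seeded from the primary before it will stream. One possible sequence, sketched against the service names above (the pg_hba.conf entry is added because the image's default rules typically do not allow remote replication connections; adjust to taste):

# Start the primary, create the replication role, and allow replication connections
docker compose up -d primary
docker compose exec primary psql -U postgres \
    -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl_pass';"
docker compose exec primary bash -c \
    "echo 'host replication replicator all scram-sha-256' >> /var/lib/postgresql/data/pg_hba.conf"
docker compose exec primary psql -U postgres -c "SELECT pg_reload_conf();"
# Seed the standby's data directory with a base backup, then start everything
docker compose run --rm standby bash -c \
    "rm -rf /var/lib/postgresql/data/* && pg_basebackup -h primary -U replicator -D /var/lib/postgresql/data -R -X stream -P"
docker compose up -d standby pgbouncer
# Verify the standby is in recovery
docker compose exec standby psql -U postgres -c "SELECT pg_is_in_recovery();"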
Next Module
Module 8: Scaling Strategies
Scale from thousands to millions of users