
Module 7: Replication & High Availability

Production databases must survive hardware failures, network issues, and data center outages. This module teaches you how to build highly available PostgreSQL deployments.
Estimated Time: 10-12 hours
Hands-On: Set up streaming replication
Key Skill: Designing for 99.99% uptime
[Figure: PostgreSQL replication and high availability topologies]

7.1 High Availability Concepts

The Need for Replication

┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY YOU NEED REPLICATION                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Without Replication:                                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                                                                     │    │
│  │    ┌────────────┐         ┌────────────────┐                       │    │
│  │    │    App     │────────▶│   Database     │                       │    │
│  │    │   Server   │         │   (Primary)    │                       │    │
│  │    └────────────┘         └───────┬────────┘                       │    │
│  │                                   │                                 │    │
│  │                                   ▼                                 │    │
│  │                              FAILURE                                │    │
│  │                                                                     │    │
│  │    Result: Complete outage until server is repaired                │    │
│  │            Risk of data loss if disk fails                         │    │
│  │            Recovery time: hours to days                            │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  With Replication:                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                                                                     │    │
│  │    ┌────────────┐         ┌────────────────┐                       │    │
│  │    │    App     │────────▶│   Primary DB   │                       │    │
│  │    │   Server   │         └───────┬────────┘                       │    │
│  │    └─────┬──────┘                 │ WAL Stream                     │    │
│  │          │                        ▼                                 │    │
│  │          │                ┌────────────────┐                       │    │
│  │          └───(failover)──▶│   Standby DB   │                       │    │
│  │                           └────────────────┘                       │    │
│  │                                                                     │    │
│  │    Result: Automatic failover in seconds                           │    │
│  │            Zero data loss (with sync replication)                  │    │
│  │            Read queries can use standby                            │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Replication Types

Streaming (physical) replication replicates WAL bytes - an exact binary copy of the primary. (Logical replication, which copies row-level changes for selected tables, is covered in section 7.5.)
Primary ─────WAL─────▶ Standby
         (byte stream)
Pros:
  • Complete replica (all databases, schemas)
  • Simple setup
  • Low overhead on primary
Cons:
  • Same PostgreSQL major version required
  • Cannot select specific tables
  • Standby is read-only (see the check below)
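
Because a streaming standby is read-only, it helps to confirm which role a server currently holds before pointing traffic at it. A minimal check in plain SQL, runnable on any instance:

-- Returns false on a primary, true on a standby that is replaying WAL
SELECT pg_is_in_recovery();

-- Writes attempted on a standby are rejected with a "read-only transaction" error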

7.2 Streaming Replication Setup

Primary Server Configuration

# postgresql.conf on primary
wal_level = replica                    # WAL detail needed for replication
max_wal_senders = 5                    # Max replication connections
max_replication_slots = 5              # Max replication slots
synchronous_commit = on                # Wait for local WAL flush (default)

# For synchronous replication (zero data loss)
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# pg_hba.conf - allow replication connections
# host  replication  replicator  192.168.1.0/24  scram-sha-256

-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- Create replication slot (prevents WAL cleanup before standby catches up)
SELECT pg_create_physical_replication_slot('standby1_slot');

-- View slots
SELECT * FROM pg_replication_slots;
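
A slot that no standby is consuming retains WAL indefinitely and can eventually fill the disk. A quick health check, plus cleanup of an abandoned slot (a sketch; the slot name is the one created above):

-- How much WAL each slot is holding back
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Drop a slot only when its standby is gone for good
SELECT pg_drop_replication_slot('standby1_slot');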

Standby Server Setup

# Take base backup from primary
pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data \
  -P -Xs -R -S standby1_slot

# The -R flag creates standby.signal and configures primary_conninfo

# postgresql.auto.conf on standby (created by pg_basebackup -R)
primary_conninfo = 'host=primary_host port=5432 user=replicator password=xxx'
primary_slot_name = 'standby1_slot'

# postgresql.conf on standby
hot_standby = on                       # Allow read queries on standby (default)

Monitoring Replication

-- On primary: View connected standbys
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

-- On standby: View replication status
SELECT 
    status,
    written_lsn,
    flushed_lsn,
    latest_end_lsn,
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn(),
    pg_last_xact_replay_timestamp()
FROM pg_stat_wal_receiver;

-- On standby: approximate replay lag in seconds
-- (Overstates lag when the primary is idle, because nothing new arrives to replay)
SELECT 
    EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds;
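
Since PostgreSQL 10, pg_stat_replication also exposes interval-typed lag columns, which are often easier to alert on than raw LSN arithmetic (they can be NULL while the standby is idle or fully caught up):

-- On primary: per-standby lag as intervals
SELECT 
    application_name,
    state,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;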

Failover Process Deep Dive

Automatic vs Manual Failover

Automatic (PostgreSQL with Patroni/Stolon):
  • Faster recovery (seconds to minutes)
  • Risk of split-brain during a network partition
  • Requires robust failure detection
Manual (Traditional):
  • Slower (minutes to hours)
  • Human verification prevents mistakes
  • Better for planned maintenance
Failover Checklist:
  1. ✓ Verify leader is truly dead (not just slow)
  2. ✓ Choose the most up-to-date follower (see the LSN check after this list)
  3. ✓ Promote follower to leader
  4. ✓ Update DNS/load balancer
  5. ✓ Reconfigure other followers
  6. ✓ Monitor for split-brain
  7. ✓ Investigate root cause
Testing Failover: Use chaos engineering tools like Chaos Monkey to regularly test your failover process in production.
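
For step 2 of the checklist, the most up-to-date standby is the one that has received and replayed the most WAL. A minimal comparison, run on each surviving standby (compare the returned LSNs and promote the highest):

-- Run on each candidate standby
SELECT 
    pg_last_wal_receive_lsn()       AS received_lsn,
    pg_last_wal_replay_lsn()        AS replayed_lsn,
    pg_last_xact_replay_timestamp() AS last_replayed_commit;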

7.3 Synchronous vs Asynchronous

┌─────────────────────────────────────────────────────────────────────────────┐
│            SYNCHRONOUS vs ASYNCHRONOUS REPLICATION                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ASYNCHRONOUS (default):                                                     │
│  ────────────────────────                                                    │
│                                                                              │
│    Client       Primary                  Standby                            │
│       │                                                                      │
│       │──COMMIT──▶│                                                         │
│       │           │──WAL FLUSH──▶DISK                                       │
│       │◀──OK──────│                                                         │
│       │           │           │                                              │
│       │           │──WAL──────▶│  (async, may lag)                          │
│       │           │           │──REPLAY──▶                                   │
│                                                                              │
│    Pros: Lowest latency, primary doesn't wait                               │
│    Cons: Standby can be behind, data loss risk on failover                 │
│                                                                              │
│  SYNCHRONOUS:                                                                │
│  ─────────────                                                               │
│                                                                              │
│    Client       Primary                  Standby                            │
│       │                                                                      │
│       │──COMMIT──▶│                                                         │
│       │           │──WAL FLUSH──▶DISK                                       │
│       │           │──WAL──────────────────▶│                                │
│       │           │                        │──FLUSH──▶DISK                  │
│       │           │◀───────────OK──────────│                                │
│       │◀──OK──────│                                                         │
│                                                                              │
│    Pros: Zero data loss (WAL on both disks before commit returns)          │
│    Cons: Higher latency (network round trip per commit)                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Configuration Options

# Asynchronous (default)
synchronous_commit = on
synchronous_standby_names = ''

# Synchronous with a named standby
synchronous_standby_names = 'standby1'  # Wait for standby1

# FIRST N: priority order (standby1 is synchronous; standby2 takes over if it is unavailable)
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# ANY N: quorum (any 2 of the 3 must confirm)
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

-- Per-transaction control (SQL)
SET synchronous_commit = off;  -- This transaction doesn't wait for any standby
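
synchronous_commit accepts more levels than just on/off; the stricter levels trade commit latency for durability and visibility guarantees. A per-session sketch (the orders table is hypothetical):

-- off          : don't wait for local WAL flush (recent commits can be lost on crash)
-- local        : wait for the local flush only, ignore synchronous standbys
-- remote_write : wait until the standby has written (not yet flushed) the WAL
-- on           : wait until the standby has flushed the WAL to disk
-- remote_apply : wait until the standby has replayed the WAL (read-your-writes on standbys)
SET synchronous_commit = remote_apply;
INSERT INTO orders (status) VALUES ('paid');  -- hypothetical table, for illustration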

7.4 Failover and Switchover

Planned Switchover

# Step 1: Stop writes to primary
# (Application should connect to new primary after switch)

# Step 2: Ensure the standby is caught up
psql -h primary -c "SELECT pg_current_wal_lsn();"
psql -h standby -c "SELECT pg_last_wal_replay_lsn();"
# Once writes have stopped, the two LSNs should match

# Step 3: Promote standby
pg_ctl promote -D /var/lib/postgresql/data
# Or
psql -c "SELECT pg_promote();"

# Step 4: Old primary becomes standby (optional)
# Configure recovery settings and restart
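
One way to handle step 1 is to make new transactions read-only at the server level. This is only a sketch, not a complete write fence: sessions can still override the setting, and in-flight transactions must finish or be terminated.

-- On the old primary: make new transactions read-only by default
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- Optionally terminate remaining client sessions (be selective in production)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();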

Automatic Failover with Patroni

# patroni.yml
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.1:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret

7.5 Logical Replication

Publisher Setup

-- On publisher
CREATE PUBLICATION my_publication 
FOR TABLE users, orders, products;

-- Or all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;

-- View publications
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
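
For UPDATE and DELETE to be published, each table needs a replica identity; by default this is the primary key. Tables without one need an identity declared explicitly (a sketch; events and events_uid_idx are hypothetical names):

-- Use a unique index as the identity ...
ALTER TABLE events REPLICA IDENTITY USING INDEX events_uid_idx;

-- ... or fall back to logging the whole old row (larger WAL, slower apply)
ALTER TABLE events REPLICA IDENTITY FULL;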

Subscriber Setup

-- On subscriber (tables must exist with same schema)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher port=5432 dbname=mydb user=replicator'
PUBLICATION my_publication;

-- View subscriptions
SELECT * FROM pg_subscription;
SELECT * FROM pg_stat_subscription;

-- Pause/resume
ALTER SUBSCRIPTION my_subscription DISABLE;
ALTER SUBSCRIPTION my_subscription ENABLE;
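
Right after CREATE SUBSCRIPTION, each table goes through an initial copy before live changes stream in. The per-table state is visible on the subscriber:

-- srsubstate: i = initialize, d = copying data, f = copy finished,
--             s = synchronized, r = ready (streaming changes)
SELECT 
    srrelid::regclass AS table_name,
    srsubstate
FROM pg_subscription_rel;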

Use Cases for Logical Replication

Cross-Version Migration: replicate from PG 14 to PG 16, then switch over.

Selective Replication: sync only specific tables to an analytics database.

Multi-Master: write to different tables on different servers and replicate them to each other.

Data Consolidation: combine data from multiple sources.

7.6 Connection Pooling

The Problem

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONNECTION OVERHEAD                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Each PostgreSQL connection:                                                 │
│  • Spawns a backend process (~5-10 MB RAM)                                  │
│  • Has connection establishment overhead (~100ms)                           │
│  • Maintains session state                                                   │
│                                                                              │
│  100 app servers × 20 connections each = 2000 connections!                  │
│                                                                              │
│  Without Pooler:                                                             │
│  ┌────────┐                                                                 │
│  │ App 1  │──conn1──┐                                                       │
│  │        │──conn2──┤                                                       │
│  └────────┘         │                                                       │
│  ┌────────┐         ├──▶ PostgreSQL (2000 backends = 20GB RAM!)            │
│  │ App 2  │──conn3──┤                                                       │
│  │        │──conn4──┤                                                       │
│  └────────┘         │                                                       │
│  ...                 │                                                       │
│                                                                              │
│  With Connection Pooler:                                                     │
│  ┌────────┐                                                                 │
│  │ App 1  │──┐         ┌──────────┐                                        │
│  │        │──┤         │          │                                        │
│  └────────┘  ├────────▶│ PgBouncer│───(20 conns)──▶ PostgreSQL            │
│  ┌────────┐  │         │          │        (200MB RAM)                     │
│  │ App 2  │──┤         └──────────┘                                        │
│  │        │──┘                                                              │
│  └────────┘                                                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

PgBouncer Setup

# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

# Pool mode
pool_mode = transaction    # Recommended for most apps

# Pool size
default_pool_size = 20     # Connections to PostgreSQL per db/user
max_client_conn = 1000     # Total client connections
reserve_pool_size = 5      # Extra for burst

# Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
query_timeout = 0
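
With auth_type = scram-sha-256, userlist.txt must contain each user's SCRAM verifier rather than a plaintext password. One common approach is to read the verifiers from pg_shadow on the server and paste them into userlist.txt as "username" "verifier" pairs (a sketch; app_user is a hypothetical role):

-- Run as a superuser on PostgreSQL; passwd holds the SCRAM-SHA-256 verifier
SELECT usename, passwd
FROM pg_shadow
WHERE usename IN ('postgres', 'app_user');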

Pool Modes

Mode          Description                               Best For
session       Connection held for entire session        Session state needed (SET, temp tables)
transaction   Connection held for each transaction      Most applications
statement     Connection released after each query      Simple queries, max sharing

# Connect to the PgBouncer admin console
psql -h localhost -p 6432 pgbouncer

SHOW POOLS;
SHOW SERVERS;
SHOW CLIENTS;
SHOW STATS;

7.7 Read Replicas and Load Balancing

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│               READ REPLICA LOAD BALANCING                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                        ┌─────────────┐                                       │
│                        │   HAProxy/  │                                       │
│                        │   PgPool    │                                       │
│                        └──────┬──────┘                                       │
│                               │                                              │
│              ┌────────────────┼────────────────┐                            │
│              │                │                │                            │
│              ▼                ▼                ▼                            │
│      ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                     │
│      │   Primary   │  │  Replica 1  │  │  Replica 2  │                     │
│      │ (Read/Write)│  │ (Read Only) │  │ (Read Only) │                     │
│      └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                     │
│             │                │                │                             │
│             │───WAL─────────▶│                │                             │
│             │───────────WAL─────────────────▶│                             │
│                                                                              │
│   Routing Rules:                                                            │
│   • All writes → Primary                                                    │
│   • Reads → Round-robin across replicas                                     │
│   • Heavy reports → Dedicated replica                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Application-Level Routing

# Django example
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'mydb',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'mydb',
    }
}

# Database router
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'
    
    def db_for_write(self, model, **hints):
        return 'default'
    
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'

7.8 Disaster Recovery

Backup Strategies

Logical backup (pg_dump)

# Full database backup
pg_dump -Fc mydb > backup.dump

# Specific tables
pg_dump -Fc -t users -t orders mydb > partial.dump

# Restore
pg_restore -d mydb backup.dump

Pros: Portable, selective, human-readable (with -Fp)
Cons: Slow for large databases; restores only to the moment the dump was taken

Physical backup (pg_basebackup)

# Full physical backup with WAL
pg_basebackup -h primary -U replicator -D /backup/base \
    -Fp -Xs -P -R

# Compressed
pg_basebackup -h primary -U replicator -D - -Ft -z > backup.tar.gz

Pros: Fast, consistent, can be used to seed new replicas
Cons: Same PostgreSQL major version only; full backups only

Continuous WAL archiving (PITR)

# postgresql.conf
archive_mode = on
archive_command = 'rsync -a %p backup:/wal_archive/%f'

# Recovery to a specific point in time
# PostgreSQL 12+: set these in postgresql.conf and create an empty recovery.signal file
restore_command = 'cp /wal_archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'

Enables: Recovery to any point in time covered by the archived WAL
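
Archiving only protects you if it is actually succeeding. pg_stat_archiver shows whether archive_command is keeping up; a growing failed_count or a stale last_archived_time means there is a gap in PITR coverage:

-- Check WAL archiving health
SELECT 
    archived_count,
    last_archived_wal,
    last_archived_time,
    failed_count,
    last_failed_wal,
    last_failed_time
FROM pg_stat_archiver;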

3-2-1 Backup Rule

┌─────────────────────────────────────────────────────────────────────────────┐
│                          3-2-1 BACKUP RULE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   3 copies of your data:                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐                           │
│   │  Primary   │  │  Replica   │  │   Backup   │                           │
│   │   (Live)   │  │   (Live)   │  │  (Archive) │                           │
│   └────────────┘  └────────────┘  └────────────┘                           │
│                                                                              │
│   2 different storage types:                                                │
│   ┌────────────────────┐  ┌────────────────────┐                           │
│   │    Local SSD       │  │    Object Storage  │                           │
│   │  (Primary + Replica)│  │    (S3, GCS, etc.) │                           │
│   └────────────────────┘  └────────────────────┘                           │
│                                                                              │
│   1 offsite location:                                                       │
│   ┌────────────────────┐                                                    │
│   │  Different Region  │                                                    │
│   │  or Cloud Provider │                                                    │
│   └────────────────────┘                                                    │
│                                                                              │
│   Test your backups regularly!                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

7.9 Practice: Set Up Replication

Lab Exercise

  1. Set up two PostgreSQL instances (use Docker)
  2. Configure streaming replication
  3. Test failover
  4. Set up PgBouncer in front
# docker-compose.yml (starter setup)
version: '3.8'

services:
  primary:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: primary_pass
      POSTGRES_USER: postgres
    volumes:
      - ./primary-data:/var/lib/postgresql/data
      - ./primary-init:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"
    command: >
      postgres 
      -c wal_level=replica 
      -c max_wal_senders=5 
      -c max_replication_slots=5

  standby:
    # Starts as a fresh, independent instance. For lab step 2, replace its data
    # directory with a pg_basebackup taken from the primary (using -R) so it
    # comes up as a streaming standby.
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: standby_pass
      PGUSER: replicator
      PGPASSWORD: repl_pass
    volumes:
      - ./standby-data:/var/lib/postgresql/data
    ports:
      - "5433:5432"
    depends_on:
      - primary

  pgbouncer:
    image: edoburu/pgbouncer
    environment:
      DATABASE_URL: "postgres://postgres:primary_pass@primary:5432/postgres"
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 100
    ports:
      - "6432:5432"
    depends_on:
      - primary
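
Once both instances are streaming, steps 2 and 3 of the lab can be verified with plain SQL against ports 5432 (primary) and 5433 (standby); the table name below is just for the exercise:

-- On the primary (port 5432): create some data
CREATE TABLE failover_test (id int PRIMARY KEY, noted_at timestamptz DEFAULT now());
INSERT INTO failover_test (id) VALUES (1);

-- On the standby (port 5433): the row should appear, and writes should be rejected
SELECT * FROM failover_test;
SELECT pg_is_in_recovery();   -- true while it is a standby

-- Failover test: promote the standby, then confirm it accepts writes
SELECT pg_promote();
SELECT pg_is_in_recovery();   -- now false
INSERT INTO failover_test (id) VALUES (2);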

Next Module

Module 8: Scaling Strategies

Scale from thousands to millions of users