> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Replication & High Availability

> Build database systems that survive failures

<Frame>
  <img src="https://mintcdn.com/devweeekends/0kwJwOL2KCwg2YYu/images/courses/database-engineering/replication-concept.svg?fit=max&auto=format&n=0kwJwOL2KCwg2YYu&q=85&s=0ad1888b646bddb1ea597013abee7aef" alt="Replication & High Availability Concept" width="1080" height="1080" data-path="images/courses/database-engineering/replication-concept.svg" />
</Frame>

# Module 7: Replication & High Availability

Production databases must survive hardware failures, network issues, and data center outages. This module teaches you how to build highly available PostgreSQL deployments.

**Real-world analogy:** Replication is like having a live backup singer who knows every word to every song. If the lead vocalist (primary) loses their voice mid-concert (server crash), the backup (standby) can take over immediately without the audience (users) noticing more than a brief pause. The key engineering challenge is keeping the backup singer perfectly in sync -- and deciding whether you are willing to slow down the lead singer to guarantee synchronization.

<Info>
  **Estimated Time**: 10-12 hours\
  **Hands-On**: Set up streaming replication\
  **Key Skill**: Designing for 99.99% uptime
</Info>

<img src="https://mintcdn.com/devweeekends/0kwJwOL2KCwg2YYu/images/courses/database-engineering/replication-ha.svg?fit=max&auto=format&n=0kwJwOL2KCwg2YYu&q=85&s=f71e6150a580a29e7c1300938a34bc0c" alt="PostgreSQL replication and high availability topologies" width="1080" height="1080" data-path="images/courses/database-engineering/replication-ha.svg" />

***

## 7.1 High Availability Concepts

### The Need for Replication

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY YOU NEED REPLICATION                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Without Replication:                                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                                                                     │    │
│  │    ┌────────────┐         ┌────────────────┐                       │    │
│  │    │    App     │────────▶│   Database     │                       │    │
│  │    │   Server   │         │   (Primary)    │                       │    │
│  │    └────────────┘         └───────┬────────┘                       │    │
│  │                                   │                                 │    │
│  │                                   ▼                                 │    │
│  │                              FAILURE                                │    │
│  │                                                                     │    │
│  │    Result: Complete outage until server is repaired                │    │
│  │            Risk of data loss if disk fails                         │    │
│  │            Recovery time: hours to days                            │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  With Replication:                                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                                                                     │    │
│  │    ┌────────────┐         ┌────────────────┐                       │    │
│  │    │    App     │────────▶│   Primary DB   │                       │    │
│  │    │   Server   │         └───────┬────────┘                       │    │
│  │    └─────┬──────┘                 │ WAL Stream                     │    │
│  │          │                        ▼                                 │    │
│  │          │                ┌────────────────┐                       │    │
│  │          └───(failover)──▶│   Standby DB   │                       │    │
│  │                           └────────────────┘                       │    │
│  │                                                                     │    │
│  │    Result: Automatic failover in seconds                           │    │
│  │            Zero data loss (with sync replication)                  │    │
│  │            Read queries can use standby                            │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Replication Types

<Tabs>
  <Tab title="Physical (Streaming)">
    **Replicates WAL bytes** - exact binary copy of primary.

    ```
    Primary ─────WAL─────▶ Standby
             (byte stream)
    ```

    **Pros**:

    * Complete replica (all databases, schemas)
    * Simple setup
    * Low overhead on primary

    **Cons**:

    * Same PostgreSQL major version required
    * Cannot select specific tables
    * Standby is read-only
  </Tab>

  <Tab title="Logical">
    **Replicates row changes** - decoded from WAL.

    ```
    Primary ─────CHANGES─────▶ Subscriber
             (INSERT, UPDATE, DELETE)
    ```

    **Pros**:

    * Cross-version replication
    * Selective table replication
    * Subscriber can have different schema/indexes
    * Subscriber can be writable

    **Cons**:

    * Higher overhead (decoding)
    * Doesn't replicate DDL
    * More complex to set up
  </Tab>
</Tabs>

***

## 7.2 Streaming Replication Setup

### Primary Server Configuration

```sql theme={null}
-- postgresql.conf on primary
wal_level = replica                    -- Enable replication
max_wal_senders = 5                    -- Max replication connections
max_replication_slots = 5              -- Prevent WAL removal before standby catches up
synchronous_commit = on                -- Wait for WAL flush

-- For synchronous replication (zero data loss)
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

-- pg_hba.conf - allow replication connections
-- host  replication  replicator  192.168.1.0/24  scram-sha-256
```

```sql theme={null}
-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- Create replication slot (prevents WAL cleanup before standby catches up)
SELECT pg_create_physical_replication_slot('standby1_slot');

-- View slots
SELECT * FROM pg_replication_slots;
```

### Standby Server Setup

```bash theme={null}
# Take base backup from primary
pg_basebackup -h primary_host -U replicator -D /var/lib/postgresql/data \
  -P -Xs -R -S standby1_slot

# The -R flag creates standby.signal and configures primary_conninfo
```

```sql theme={null}
-- postgresql.auto.conf on standby (created by pg_basebackup -R)
primary_conninfo = 'host=primary_host port=5432 user=replicator password=xxx'
primary_slot_name = 'standby1_slot'

-- postgresql.conf on standby
hot_standby = on                       -- Allow read queries on standby
```

### Monitoring Replication

**Practical tip:** Replication lag is not one number -- it is four numbers, each measuring a different stage. Understanding the difference between `sent_lsn`, `write_lsn`, `flush_lsn`, and `replay_lsn` is essential for diagnosing replication problems. If `sent` is far ahead of `write`, you have a network bottleneck. If `write` is ahead of `flush`, the standby's disk is slow. If `flush` is ahead of `replay`, the standby's CPU cannot keep up with applying changes.

```sql theme={null}
-- On primary: View connected standbys
SELECT 
    client_addr,
    state,
    sent_lsn,      -- WAL sent over the network to the standby
    write_lsn,     -- WAL written to standby's disk (but not fsynced)
    flush_lsn,     -- WAL fsynced to standby's disk (durable)
    replay_lsn,    -- WAL actually applied (queries on standby see this data)
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

-- On standby: View replication status
SELECT 
    status,
    received_lsn,
    latest_end_lsn,
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn(),
    pg_last_xact_replay_timestamp()
FROM pg_stat_wal_receiver;

-- Calculate lag in seconds
SELECT 
    EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds
FROM pg_stat_wal_receiver;
```

### Failover Process Deep Dive

<img src="https://mintcdn.com/devweeekends/0kwJwOL2KCwg2YYu/images/courses/database-failover.svg?fit=max&auto=format&n=0kwJwOL2KCwg2YYu&q=85&s=a6189bcc741aa2c232265b95ccaeef89" alt="Database Failover" width="1080" height="1080" data-path="images/courses/database-failover.svg" />

**Automatic vs Manual Failover**:

**Automatic** (PostgreSQL with Patroni/Stolon):

* Faster recovery (seconds to minutes)
* Risk of split-brain if network partition
* Requires robust failure detection

**Manual** (Traditional):

* Slower (minutes to hours)
* Human verification prevents mistakes
* Better for planned maintenance

**Common pitfall -- split-brain:** Split-brain occurs when both the primary and the standby believe they are the leader, accepting writes independently. This creates two divergent copies of your data that are extremely difficult to reconcile. It is the single most dangerous failure mode in database replication. Patroni avoids this by using a distributed consensus store (etcd/ZooKeeper) as the source of truth for leadership -- a node is only the leader if the consensus store says so.

**Failover Checklist**:

1. ✓ Verify leader is truly dead (not just slow)
2. ✓ Choose most up-to-date follower
3. ✓ Promote follower to leader
4. ✓ Update DNS/load balancer
5. ✓ Reconfigure other followers
6. ✓ Monitor for split-brain
7. ✓ Investigate root cause

**Testing Failover**: Use chaos engineering tools like Chaos Monkey to regularly test your failover process in production.

***

## 7.3 Synchronous vs Asynchronous

```
┌─────────────────────────────────────────────────────────────────────────────┐
│            SYNCHRONOUS vs ASYNCHRONOUS REPLICATION                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ASYNCHRONOUS (default):                                                     │
│  ────────────────────────                                                    │
│                                                                              │
│    Client       Primary                  Standby                            │
│       │                                                                      │
│       │──COMMIT──▶│                                                         │
│       │           │──WAL FLUSH──▶DISK                                       │
│       │◀──OK──────│                                                         │
│       │           │           │                                              │
│       │           │──WAL──────▶│  (async, may lag)                          │
│       │           │           │──REPLAY──▶                                   │
│                                                                              │
│    Pros: Lowest latency, primary doesn't wait                               │
│    Cons: Standby can be behind, data loss risk on failover                 │
│                                                                              │
│  SYNCHRONOUS:                                                                │
│  ─────────────                                                               │
│                                                                              │
│    Client       Primary                  Standby                            │
│       │                                                                      │
│       │──COMMIT──▶│                                                         │
│       │           │──WAL FLUSH──▶DISK                                       │
│       │           │──WAL──────────────────▶│                                │
│       │           │                        │──FLUSH──▶DISK                  │
│       │           │◀───────────OK──────────│                                │
│       │◀──OK──────│                                                         │
│                                                                              │
│    Pros: Zero data loss (WAL on both disks before commit returns)          │
│    Cons: Higher latency (network round trip per commit)                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Configuration Options

```sql theme={null}
-- Asynchronous (default)
synchronous_commit = on
synchronous_standby_names = ''

-- Synchronous with named standbys
synchronous_standby_names = 'standby1'  -- Wait for standby1

-- First N of list (any 1 from list)
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

-- Any N of list (quorum)
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

-- Per-transaction control (this is a powerful production technique)
SET synchronous_commit = off;  -- This transaction doesn't wait for standby
-- Use this for non-critical writes like logging, analytics, or notifications.
-- Keep synchronous_commit = on for financial transactions and order processing.
-- This gives you the best of both worlds: strong durability where it matters,
-- high throughput where eventual consistency is acceptable.
```

***

## 7.4 Failover and Switchover

### Planned Switchover

```bash theme={null}
# Step 1: Stop writes to primary
# (Application should connect to new primary after switch)

# Step 2: Ensure standby is caught up
psql -h primary -c "SELECT pg_current_wal_lsn();"
psql -h standby -c "SELECT pg_last_wal_replay_lsn();"
# Both should match

# Step 3: Promote standby
pg_ctl promote -D /var/lib/postgresql/data
# Or
psql -c "SELECT pg_promote();"

# Step 4: Old primary becomes standby (optional)
# Configure recovery settings and restart
```

### Automatic Failover with Patroni

```yaml theme={null}
# patroni.yml
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.1:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret
```

***

## 7.5 Logical Replication

### Publisher Setup

```sql theme={null}
-- On publisher
CREATE PUBLICATION my_publication 
FOR TABLE users, orders, products;

-- Or all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;

-- View publications
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
```

### Subscriber Setup

```sql theme={null}
-- On subscriber (tables must exist with same schema)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher port=5432 dbname=mydb user=replicator'
PUBLICATION my_publication;

-- View subscriptions
SELECT * FROM pg_subscription;
SELECT * FROM pg_stat_subscription;

-- Pause/resume
ALTER SUBSCRIPTION my_subscription DISABLE;
ALTER SUBSCRIPTION my_subscription ENABLE;
```

### Use Cases for Logical Replication

<CardGroup cols={2}>
  <Card title="Cross-Version Migration" icon="arrow-up-right-from-square">
    Replicate from PG 14 to PG 16, then switch over
  </Card>

  <Card title="Selective Replication" icon="filter">
    Only sync specific tables to analytics database
  </Card>

  <Card title="Multi-Master" icon="copy">
    Write to different tables on different servers
  </Card>

  <Card title="Data Consolidation" icon="merge">
    Combine data from multiple sources
  </Card>
</CardGroup>

***

## 7.6 Connection Pooling

### The Problem

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONNECTION OVERHEAD                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Each PostgreSQL connection:                                                 │
│  • Spawns a backend process (~5-10 MB RAM)                                  │
│  • Has connection establishment overhead (~100ms)                           │
│  • Maintains session state                                                   │
│                                                                              │
│  100 app servers × 20 connections each = 2000 connections!                  │
│                                                                              │
│  Without Pooler:                                                             │
│  ┌────────┐                                                                 │
│  │ App 1  │──conn1──┐                                                       │
│  │        │──conn2──┤                                                       │
│  └────────┘         │                                                       │
│  ┌────────┐         ├──▶ PostgreSQL (2000 backends = 20GB RAM!)            │
│  │ App 2  │──conn3──┤                                                       │
│  │        │──conn4──┤                                                       │
│  └────────┘         │                                                       │
│  ...                 │                                                       │
│                                                                              │
│  With Connection Pooler:                                                     │
│  ┌────────┐                                                                 │
│  │ App 1  │──┐         ┌──────────┐                                        │
│  │        │──┤         │          │                                        │
│  └────────┘  ├────────▶│ PgBouncer│───(20 conns)──▶ PostgreSQL            │
│  ┌────────┐  │         │          │        (200MB RAM)                     │
│  │ App 2  │──┤         └──────────┘                                        │
│  │        │──┘                                                              │
│  └────────┘                                                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### PgBouncer Setup

```ini theme={null}
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

# Pool mode
pool_mode = transaction    # Recommended for most apps

# Pool size
default_pool_size = 20     # Connections to PostgreSQL per db/user
max_client_conn = 1000     # Total client connections
reserve_pool_size = 5      # Extra for burst

# Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
query_timeout = 0
```

### Pool Modes

**Common pitfall:** Choosing the wrong pool mode is the number one source of PgBouncer problems. In `transaction` mode, session-level state (SET commands, prepared statements, temp tables, LISTEN/NOTIFY) is lost between transactions because each transaction may land on a different backend connection. If your application uses Django or Rails with prepared statements, you may need `session` mode -- but that significantly reduces pooling efficiency.

| Mode            | Description                          | Best For                                | Gotcha                                               |
| --------------- | ------------------------------------ | --------------------------------------- | ---------------------------------------------------- |
| **session**     | Connection held for entire session   | Session state needed (SET, temp tables) | Minimal pooling benefit -- one backend per client    |
| **transaction** | Connection held for transaction      | Most applications (stateless queries)   | Breaks prepared statements, SET, LISTEN, temp tables |
| **statement**   | Connection released after each query | Simple queries, max sharing             | Completely breaks multi-statement transactions       |

```sql theme={null}
-- Monitor PgBouncer
psql -h localhost -p 6432 pgbouncer

SHOW POOLS;
SHOW SERVERS;
SHOW CLIENTS;
SHOW STATS;
```

***

## 7.7 Read Replicas and Load Balancing

### Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│               READ REPLICA LOAD BALANCING                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                        ┌─────────────┐                                       │
│                        │   HAProxy/  │                                       │
│                        │   PgPool    │                                       │
│                        └──────┬──────┘                                       │
│                               │                                              │
│              ┌────────────────┼────────────────┐                            │
│              │                │                │                            │
│              ▼                ▼                ▼                            │
│      ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                     │
│      │   Primary   │  │  Replica 1  │  │  Replica 2  │                     │
│      │ (Read/Write)│  │ (Read Only) │  │ (Read Only) │                     │
│      └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                     │
│             │                │                │                             │
│             │───WAL─────────▶│                │                             │
│             │───────────WAL─────────────────▶│                             │
│                                                                              │
│   Routing Rules:                                                            │
│   • All writes → Primary                                                    │
│   • Reads → Round-robin across replicas                                     │
│   • Heavy reports → Dedicated replica                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Application-Level Routing

```python theme={null}
# Django example
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'mydb',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'mydb',
    }
}

# Database router
class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'
    
    def db_for_write(self, model, **hints):
        return 'default'
    
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
```

***

## 7.8 Disaster Recovery

### Backup Strategies

<AccordionGroup>
  <Accordion title="pg_dump (Logical Backup)" icon="file-export">
    ```bash theme={null}
    # Full database backup
    pg_dump -Fc mydb > backup.dump

    # Specific tables
    pg_dump -Fc -t users -t orders mydb > partial.dump

    # Restore
    pg_restore -d mydb backup.dump
    ```

    **Pros**: Portable, selective, human-readable (with -Fp)
    **Cons**: Slow for large databases, point-in-time only
  </Accordion>

  <Accordion title="pg_basebackup (Physical Backup)" icon="hard-drive">
    ```bash theme={null}
    # Full physical backup with WAL
    pg_basebackup -h primary -U replicator -D /backup/base \
        -Fp -Xs -P -R

    # Compressed
    pg_basebackup -h primary -U replicator -D - -Ft -z > backup.tar.gz
    ```

    **Pros**: Fast, consistent, can be used for new replicas
    **Cons**: Same PG version only, full backup only
  </Accordion>

  <Accordion title="Continuous Archiving (PITR)" icon="clock-rotate-left">
    ```sql theme={null}
    -- postgresql.conf
    archive_mode = on
    archive_command = 'rsync -a %p backup:/wal_archive/%f'
    ```

    ```bash theme={null}
    # Recovery to specific point in time
    # recovery.conf / postgresql.conf
    restore_command = 'cp /wal_archive/%f %p'
    recovery_target_time = '2024-01-15 14:30:00'
    ```

    **Enables**: Recover to any point in time
  </Accordion>
</AccordionGroup>

### 3-2-1 Backup Rule

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          3-2-1 BACKUP RULE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   3 copies of your data:                                                    │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐                           │
│   │  Primary   │  │  Replica   │  │   Backup   │                           │
│   │   (Live)   │  │   (Live)   │  │  (Archive) │                           │
│   └────────────┘  └────────────┘  └────────────┘                           │
│                                                                              │
│   2 different storage types:                                                │
│   ┌────────────────────┐  ┌────────────────────┐                           │
│   │    Local SSD       │  │    Object Storage  │                           │
│   │  (Primary + Replica)│  │    (S3, GCS, etc.) │                           │
│   └────────────────────┘  └────────────────────┘                           │
│                                                                              │
│   1 offsite location:                                                       │
│   ┌────────────────────┐                                                    │
│   │  Different Region  │                                                    │
│   │  or Cloud Provider │                                                    │
│   └────────────────────┘                                                    │
│                                                                              │
│   Test your backups regularly!                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## 7.9 Practice: Set Up Replication

### Lab Exercise

1. Set up two PostgreSQL instances (use Docker)
2. Configure streaming replication
3. Test failover
4. Set up PgBouncer in front

<Accordion title="Docker Compose Setup">
  ```yaml theme={null}
  version: '3.8'

  services:
    primary:
      image: postgres:16
      environment:
        POSTGRES_PASSWORD: primary_pass
        POSTGRES_USER: postgres
      volumes:
        - ./primary-data:/var/lib/postgresql/data
        - ./primary-init:/docker-entrypoint-initdb.d
      ports:
        - "5432:5432"
      command: >
        postgres 
        -c wal_level=replica 
        -c max_wal_senders=5 
        -c max_replication_slots=5

    standby:
      image: postgres:16
      environment:
        POSTGRES_PASSWORD: standby_pass
        PGUSER: replicator
        PGPASSWORD: repl_pass
      volumes:
        - ./standby-data:/var/lib/postgresql/data
      ports:
        - "5433:5432"
      depends_on:
        - primary

    pgbouncer:
      image: edoburu/pgbouncer
      environment:
        DATABASE_URL: "postgres://postgres:primary_pass@primary:5432/postgres"
        POOL_MODE: transaction
        MAX_CLIENT_CONN: 100
      ports:
        - "6432:5432"
      depends_on:
        - primary
  ```
</Accordion>

***

## Next Module

<Card title="Module 8: Scaling Strategies" icon="arrow-right" href="/courses/database-engineering/scaling">
  Scale from thousands to millions of users
</Card>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="You are designing a database architecture with 99.99% uptime. Walk through your approach with PostgreSQL.">
    **Strong Answer:**

    * 99.99% means max 52 minutes downtime per year, covering planned maintenance and unplanned failures. Architecture: primary + 2 synchronous standbys managed by Patroni with etcd consensus. Use `synchronous_standby_names = 'ANY 1 (standby1, standby2)'` so the primary needs only one standby ACK to commit. PgBouncer in front absorbs connection storms during failover. HAProxy for read replica load balancing.
    * Zero-downtime maintenance: pg\_upgrade via logical replication for major versions, `CREATE INDEX CONCURRENTLY` for schema changes, never VACUUM FULL during business hours. Budget: 30 min for 2 unplanned failovers/year, 22 min for planned maintenance.

    **Follow-up: How do you prevent split-brain during failover?**

    Patroni uses etcd as the source of truth for leadership. A node is only the leader if it holds the etcd leader key (requires quorum). If the old primary is partitioned, its lease expires and Patroni demotes it. The new primary can only be promoted after acquiring the leader key. This prevents both nodes from accepting writes simultaneously.
  </Accordion>

  <Accordion title="When would you choose logical replication over physical streaming replication?">
    **Strong Answer:**

    * **Cross-version upgrades**: Physical requires identical major versions. Logical works across versions (PG 14 to PG 16), enabling zero-downtime upgrades.
    * **Selective replication**: Physical copies everything. Logical can publish specific tables to specific subscribers.
    * **Writable subscribers**: Physical standbys are read-only. Logical subscribers can have their own tables, indexes, and materialized views.
    * **Tradeoffs**: Logical has higher overhead (WAL decoding), does not replicate DDL, and minimal conflict resolution (INSERT conflicts stop replication until manually resolved). For pure HA failover, physical is simpler and faster.

    **Follow-up: What happens if the subscriber is down for a day?**

    It catches up automatically via replication slots, which prevent the publisher from recycling unconsumed WAL. The danger: a long-disconnected subscriber causes unbounded WAL retention on the publisher. Monitor `pg_replication_slots` for `active = false` and set `max_slot_wal_keep_size` to cap retention.
  </Accordion>

  <Accordion title="Your read replica shows 30 seconds of replication lag during peak hours. Diagnose and fix it.">
    **Strong Answer:**

    * Check `pg_stat_replication` on the primary and compare the four LSN values to identify the bottleneck stage:
    * **sent\_lsn >> write\_lsn**: Network bottleneck. Fix: faster link, enable wal\_compression (30-50% WAL reduction), or co-locate replica.
    * **write\_lsn >> flush\_lsn**: Replica disk cannot keep up with fsync. Fix: faster storage (NVMe).
    * **flush\_lsn >> replay\_lsn**: Replica CPU cannot apply WAL fast enough. Common when primary generates WAL from many parallel backends but replay is single-threaded. Fix: reduce indexes on replica, or leverage PG 15+ parallel recovery improvements.
    * **Bonus**: Long-running queries on replica holding back replay due to high `max_standby_streaming_delay`. Fix: lower the delay (accept query cancellation) or route analytics to a dedicated replica.

    **Follow-up: Can you achieve zero lag while allowing reads on the replica?**

    Yes. `synchronous_commit = remote_apply` ensures the standby has fully replayed WAL before the primary acknowledges the commit. Reads on the standby are always consistent with committed primary data. The cost is increased commit latency on the primary.
  </Accordion>
</AccordionGroup>
