Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Database Sharding Strategies

SQL vs NoSQL

This is one of the most common and most misunderstood decisions in system design. The real answer is almost never “one or the other” — most production systems at scale use both, each for what it does best. Think of SQL databases as spreadsheets with strict rules: every row follows the same schema, relationships between tables are enforced, and complex queries across multiple tables are a first-class feature. NoSQL databases are more like filing cabinets: each drawer (document/key) can hold whatever you put in it, you can add new drawers trivially, and looking up a specific drawer by its label is extremely fast — but correlating information across drawers requires more manual effort.

When to Use SQL (Relational)

ACID Required

Financial transactions, inventory management

Complex Queries

Reporting, analytics with JOINs

Structured Data

Fixed schema, clear relationships

Data Integrity

Foreign keys, constraints matter

When to Use NoSQL

Flexible Schema

Evolving data models, semi-structured data

High Scale

Horizontal scaling, massive data volume

Low Latency

Key-value lookups, caching layer

High Write Volume

Time series, event logging

NoSQL Types

┌─────────────────────────────────────────────────────────────────┐
│                          NoSQL Types                            │
├─────────────────┬─────────────────┬─────────────────┬───────────┤
│   Key-Value     │    Document     │   Column-Family │   Graph   │
├─────────────────┼─────────────────┼─────────────────┼───────────┤
│ Redis           │ MongoDB         │ Cassandra       │ Neo4j     │
│ Memcached       │ CouchDB         │ HBase           │ Amazon    │
│ DynamoDB        │ DynamoDB        │ BigTable        │ Neptune   │
├─────────────────┼─────────────────┼─────────────────┼───────────┤
│ Simple lookups  │ Flexible docs   │ Wide columns    │ Relations │
│ Caching         │ Content mgmt    │ Time series     │ Social    │
│ Session store   │ User profiles   │ Analytics       │ Recommend │
└─────────────────┴─────────────────┴─────────────────┴───────────┘

Database Sharding

Distribute data across multiple databases for horizontal scaling. Sharding is the database equivalent of splitting a library’s catalog across multiple buildings by subject: each building handles a fraction of the total collection, so no single building becomes overcrowded. The catch is that if someone wants to search across all subjects at once, they now need to visit every building and combine the results. Sharding is a last resort, not a first choice. It introduces significant operational complexity: cross-shard queries become expensive, transactions spanning multiple shards require distributed coordination (like 2PC or Saga patterns), and rebalancing data when you add or remove shards is painful. Exhaust vertical scaling, read replicas, and caching before you reach for sharding.

Sharding Strategies

1. Range-Based Sharding

┌─────────────────────────────────────────────────────────────┐
│                    User ID Range Sharding                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  User IDs 1-1M      User IDs 1M-2M     User IDs 2M-3M      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │   Shard 1   │    │   Shard 2   │    │   Shard 3   │     │
│  └─────────────┘    └─────────────┘    └─────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Pros: Simple, range queries work
Cons: Hotspots (new users all hit last shard)

2. Hash-Based Sharding

def get_shard(user_id, num_shards):
    return hash(user_id) % num_shards

# User 12345 → Shard (12345 % 4) = Shard 1
# User 67890 → Shard (67890 % 4) = Shard 2
┌─────────────────────────────────────────────────────────────┐
│                    Hash-Based Sharding                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│     hash(id) % 4                                           │
│     ┌────────────────────────────────────────┐             │
│     │              Shard Key                  │             │
│     └────────────────────────────────────────┘             │
│          │         │         │         │                   │
│          ▼         ▼         ▼         ▼                   │
│     ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐           │
│     │Shard 0 │ │Shard 1 │ │Shard 2 │ │Shard 3 │           │
│     └────────┘ └────────┘ └────────┘ └────────┘           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Pros: Even distribution
Cons: Range queries need all shards, resharding is hard

3. Consistent Hashing

                    ┌───────────────────┐
                   ╱                     ╲
                  │                       │
             Node A    ●───────────●    Node B
                 ●                  ●
                │                    │
               │      Hash Ring       │
               │                      │
                ●                    ●
             Node D    ●───────────●    Node C
                  │                       │
                   ╲                     ╱
                    └───────────────────┘

• Keys hash to positions on ring
• Assigned to next node clockwise
• Adding/removing nodes only affects neighbors

Sharding Challenges

ChallengeSolutionReal-World Impact
Cross-shard queriesApplication-level joins, denormalizationA query that took 5ms on a single DB may take 50-200ms when scatter-gathering across 10 shards
Transactions2PC, Saga pattern, design to avoidDistributed transactions are 10-100x slower than local transactions; design your shard key to keep related data together
ReshardingConsistent hashing, virtual shardsWithout virtual shards, adding one node can require migrating 1/N of all data; with virtual shards, you only move specific virtual partitions
HotspotsBetter shard key, splitting hot shardsInstagram famously had to re-shard because celebrity accounts created hot partitions; the shard key choice is the most consequential sharding decision
Interview Pattern: When discussing sharding, always mention the shard key selection first. The best shard key matches your most common access pattern and distributes load evenly. A strong answer sounds like: “I would shard by user_id because our primary access pattern is per-user queries. This keeps all of a user’s data co-located on one shard, avoiding cross-shard joins for the hot path. The risk is celebrity users creating hot shards, which I would mitigate with a salt suffix for high-traffic accounts.”

Database Replication

Synchronous vs Asynchronous

Synchronous Replication           Asynchronous Replication
                                  
Client ──► Master ──► Replica     Client ──► Master ─ ─► Replica
              │                                │
              ▼                                ▼
         Wait for ACK                   Return immediately
              │                                │
              ▼                                ▼
         Return to client             Replica catches up later
         
Pros: Strong consistency            Pros: Lower latency
Cons: Higher latency               Cons: Possible data loss

Replication Topologies

Single Leader                    Multi-Leader
                                 
   ┌────────┐                   ┌────────┐   ┌────────┐
   │ Leader │                   │Leader A│◄─►│Leader B│
   └────┬───┘                   └────┬───┘   └────┬───┘
        │                            │            │
   ┌────┴────┐                  ┌────┴───┐  ┌────┴───┐
   │         │                  │Follower│  │Follower│
┌──┴──┐   ┌──┴──┐              └────────┘  └────────┘
│Foll1│   │Foll2│              
└─────┘   └─────┘              Use: Multi-datacenter
                               Challenge: Conflict resolution


Leaderless (Dynamo-style)
                                 
   ┌──────┐   ┌──────┐   ┌──────┐
   │Node A│◄─►│Node B│◄─►│Node C│
   └──────┘   └──────┘   └──────┘
       ▲                     ▲
       └─────────────────────┘
       
Write to W nodes, Read from R nodes
W + R > N ensures consistency

Indexing

B-Tree Index (Default)

                    ┌───────────────────┐
                    │    Root Node      │
                    │  [10, 20, 30]     │
                    └─────────┬─────────┘

        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
┌───────▼───────┐   ┌─────────▼─────────┐   ┌───────▼───────┐
│   [5, 8, 9]   │   │  [12, 15, 18]     │   │ [25, 28, 35]  │
└───────┬───────┘   └─────────┬─────────┘   └───────┬───────┘
        │                     │                     │
        ▼                     ▼                     ▼
    [Leaf nodes with actual row pointers]
    
• Balanced tree structure
• O(log n) lookups
• Good for range queries
• Default for most databases

Index Types

-- Single Column Index
CREATE INDEX idx_email ON users(email);

-- Composite Index (order matters!)
CREATE INDEX idx_name ON users(last_name, first_name);
-- Helps: WHERE last_name = 'Smith'
-- Helps: WHERE last_name = 'Smith' AND first_name = 'John'
-- Does NOT help: WHERE first_name = 'John'

-- Covering Index (includes columns)
CREATE INDEX idx_covering ON users(email) INCLUDE (name, status);
-- Query reads only from index, no table lookup

-- Partial Index
CREATE INDEX idx_active ON users(email) WHERE status = 'active';
-- Smaller index, only for specific queries

-- Full-Text Index
CREATE INDEX idx_content ON articles USING GIN(to_tsvector('english', content));
-- For text search

Index Trade-offs

BenefitCost
Faster readsSlower writes (each write must update the index)
Faster sortingStorage space (indexes can be 10-30% of table size)
Faster joinsMemory usage (hot indexes should fit in RAM)
Index maintenance (fragmentation, vacuuming)
Practical Sizing Rule: A well-indexed query on PostgreSQL can serve 1,000-5,000 QPS on a single server for simple lookups. Without the right index, the same query might only manage 10-50 QPS because it triggers a full table scan. In interviews, when you mention a database, always specify what you would index and why — it shows operational maturity. The composite index column order follows the “leftmost prefix” rule: an index on (A, B, C) supports queries on A, (A, B), and (A, B, C), but not B alone or (B, C).

Practical Database Patterns

Connection Pooling

Managing database connections efficiently is critical for performance.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session
from contextlib import contextmanager

class DatabasePool:
    """Production-ready database connection pool"""
    
    def __init__(self, database_url: str):
        self.engine = create_engine(
            database_url,
            pool_size=20,           # Persistent connections
            max_overflow=30,        # Extra connections under load
            pool_timeout=30,        # Wait time for connection
            pool_recycle=3600,      # Recycle connections hourly
            pool_pre_ping=True,     # Verify connection before use
            echo=False              # Set True for SQL logging
        )
        
        self._session_factory = sessionmaker(bind=self.engine)
        self.Session = scoped_session(self._session_factory)
    
    @contextmanager
    def get_session(self):
        """Context manager for database sessions"""
        session = self.Session()
        try:
            yield session
            session.commit()
        except Exception as e:
            session.rollback()
            raise e
        finally:
            session.close()
    
    def execute_with_retry(self, query, params=None, max_retries=3):
        """Execute query with automatic retry on transient failures"""
        import time
        
        for attempt in range(max_retries):
            try:
                with self.get_session() as session:
                    result = session.execute(query, params or {})
                    return result.fetchall()
            except Exception as e:
                if "connection" in str(e).lower() and attempt < max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                    continue
                raise


# Usage
db = DatabasePool("postgresql://user:pass@localhost/mydb")

with db.get_session() as session:
    users = session.query(User).filter(User.active == True).all()

Read/Write Splitting

Route reads to replicas and writes to primary for better scalability.
import random
from datetime import datetime, timedelta
from typing import Optional, List, Any
from enum import Enum

class QueryType(Enum):
    READ = "read"
    WRITE = "write"

class ReadWriteRouter:
    """
    Routes queries to primary or replicas based on query type
    and session consistency requirements.
    """
    
    def __init__(self, primary_pool, replica_pools: list):
        self.primary = primary_pool
        self.replicas = replica_pools
        self.user_writes = {}  # Track recent writes per user
        self.replication_lag_threshold = 5  # seconds
    
    def get_connection(self, query_type: QueryType, 
                       user_id: Optional[str] = None):
        """
        Get appropriate connection based on query type and user session.
        Ensures read-your-writes consistency.
        """
        if query_type == QueryType.WRITE:
            # All writes go to primary
            self._record_write(user_id)
            return self.primary
        
        # For reads, check if user recently wrote
        if user_id and self._should_read_from_primary(user_id):
            return self.primary
        
        # Otherwise, load balance across replicas
        return self._get_healthy_replica()
    
    def _record_write(self, user_id: Optional[str]):
        if user_id:
            self.user_writes[user_id] = datetime.utcnow()
    
    def _should_read_from_primary(self, user_id: str) -> bool:
        """Read from primary if user wrote recently"""
        last_write = self.user_writes.get(user_id)
        if not last_write:
            return False
        
        time_since_write = (datetime.utcnow() - last_write).total_seconds()
        return time_since_write < self.replication_lag_threshold
    
    def _get_healthy_replica(self):
        """Get a healthy replica with lowest lag"""
        healthy = [r for r in self.replicas if self._is_healthy(r)]
        
        if not healthy:
            # Fallback to primary if no healthy replicas
            return self.primary
        
        # Random selection (could be weighted by lag)
        return random.choice(healthy)
    
    def _is_healthy(self, replica) -> bool:
        try:
            # Check replication lag
            result = replica.execute("SELECT pg_last_xact_replay_timestamp()")
            lag = (datetime.utcnow() - result).total_seconds()
            return lag < 30  # Healthy if lag < 30 seconds
        except:
            return False


# ORM Integration Example
class UserRepository:
    def __init__(self, router: ReadWriteRouter):
        self.router = router
    
    def find_by_id(self, user_id: str, session_user: str = None) -> dict:
        """Read operation - goes to replica"""
        conn = self.router.get_connection(QueryType.READ, session_user)
        return conn.execute(
            "SELECT * FROM users WHERE id = %s", 
            (user_id,)
        ).fetchone()
    
    def update(self, user_id: str, data: dict, session_user: str) -> bool:
        """Write operation - goes to primary"""
        conn = self.router.get_connection(QueryType.WRITE, session_user)
        conn.execute(
            "UPDATE users SET name = %s WHERE id = %s",
            (data['name'], user_id)
        )
        return True


# Usage
router = ReadWriteRouter(primary_pool, [replica1, replica2])
repo = UserRepository(router)

# This write goes to primary
repo.update("user_123", {"name": "Alice"}, session_user="user_123")

# This read goes to primary (recent write by same user)
user = repo.find_by_id("user_123", session_user="user_123")

# This read goes to replica (different user)
user = repo.find_by_id("user_123", session_user="user_456")

Sharding Implementation

Sharding Strategies
import hashlib
from typing import Dict, Any, List

class ShardRouter:
    """
    Routes queries to appropriate shard based on shard key.
    Uses consistent hashing for even distribution.
    """
    
    def __init__(self, shards: Dict[int, Any]):
        self.shards = shards  # shard_id -> connection pool
        self.num_shards = len(shards)
    
    def get_shard(self, shard_key: str) -> Any:
        """Get shard connection for a given key"""
        shard_id = self._compute_shard_id(shard_key)
        return self.shards[shard_id]
    
    def _compute_shard_id(self, key: str) -> int:
        """Consistent hash to determine shard"""
        hash_value = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return hash_value % self.num_shards
    
    def execute_on_shard(self, shard_key: str, query: str, params: tuple):
        """Execute query on appropriate shard"""
        shard = self.get_shard(shard_key)
        return shard.execute(query, params)
    
    def scatter_gather(self, query: str, params: tuple = None) -> List[Any]:
        """Execute query on ALL shards and combine results"""
        results = []
        for shard_id, shard in self.shards.items():
            try:
                result = shard.execute(query, params or ())
                results.extend(result.fetchall())
            except Exception as e:
                print(f"Shard {shard_id} failed: {e}")
        return results


class ShardedUserRepository:
    """User repository with sharding by user_id"""
    
    def __init__(self, shard_router: ShardRouter):
        self.router = shard_router
    
    def create_user(self, user_id: str, data: dict) -> bool:
        """Create user on appropriate shard"""
        query = """
            INSERT INTO users (id, name, email, created_at)
            VALUES (%s, %s, %s, NOW())
        """
        self.router.execute_on_shard(
            shard_key=user_id,
            query=query,
            params=(user_id, data['name'], data['email'])
        )
        return True
    
    def find_by_id(self, user_id: str) -> dict:
        """Find user - knows exactly which shard"""
        query = "SELECT * FROM users WHERE id = %s"
        result = self.router.execute_on_shard(user_id, query, (user_id,))
        return result.fetchone()
    
    def find_by_email(self, email: str) -> dict:
        """Find by email - must scatter-gather all shards"""
        query = "SELECT * FROM users WHERE email = %s"
        results = self.router.scatter_gather(query, (email,))
        return results[0] if results else None
    
    def get_total_count(self) -> int:
        """Count across all shards"""
        query = "SELECT COUNT(*) as count FROM users"
        results = self.router.scatter_gather(query)
        return sum(r['count'] for r in results)


# Setup
shards = {
    0: create_pool("postgresql://localhost:5432/users_shard_0"),
    1: create_pool("postgresql://localhost:5433/users_shard_1"),
    2: create_pool("postgresql://localhost:5434/users_shard_2"),
    3: create_pool("postgresql://localhost:5435/users_shard_3"),
}

router = ShardRouter(shards)
repo = ShardedUserRepository(router)

# Usage
repo.create_user("user_123", {"name": "Alice", "email": "alice@example.com"})
user = repo.find_by_id("user_123")  # Goes to one shard
user = repo.find_by_email("alice@example.com")  # Queries ALL shards

Query Optimization

EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE user_id = 123 
AND created_at > '2024-01-01';

-- Output:
-- Index Scan using idx_orders_user_date on orders
--   Index Cond: ((user_id = 123) AND (created_at > '2024-01-01'))
--   Rows Returned: 45
--   Actual Time: 0.023..0.089 ms

Common Optimizations

-- [Bad] Function on column prevents index use
SELECT * FROM users WHERE YEAR(created_at) = 2024;

-- [Good] Range query uses index
SELECT * FROM users 
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- [Bad] SELECT * fetches unnecessary data
SELECT * FROM users WHERE id = 123;

-- [Good] Select only needed columns
SELECT id, name, email FROM users WHERE id = 123;

-- [Bad] N+1 queries
for user in users:
    orders = query("SELECT * FROM orders WHERE user_id = ?", user.id)

-- [Good] Single query with JOIN
SELECT u.*, o.* FROM users u 
LEFT JOIN orders o ON u.id = o.user_id 
WHERE u.status = 'active';

Database Selection Guide

RequirementRecommended
Complex transactionsPostgreSQL, MySQL
Simple key-valueRedis, DynamoDB
Document storageMongoDB
Time seriesInfluxDB, TimescaleDB
Full-text searchElasticsearch
Graph relationsNeo4j
AnalyticsClickHouse, BigQuery
Global scaleCockroachDB, Spanner
Design Tip: Start with PostgreSQL for most applications. It handles JSON, full-text search, and scales well. Add specialized databases only when needed.

Interview Questions

Strong answer:
  • The way I think about this is: SQL is the default, and you need a compelling reason to reach for NoSQL. The compelling reasons are (a) your data is genuinely unstructured or polymorphic, like user-generated content where every document has different fields, (b) you need horizontal write scalability beyond what a single Postgres node can handle — typically north of 50K writes/sec sustained, or (c) your access pattern is almost entirely key-value lookups with no cross-entity queries.
  • In practice, most teams I have seen pick MongoDB because they want “flexible schema,” but what they really mean is they do not want to write migrations. That is a terrible reason — you just move schema enforcement from the database to every single application query, which is strictly worse. Schemaless does not mean schema-free; it means the schema lives in your application code instead of the database, and every reader has to handle every historical shape of the data.
  • The real trade-off is operational: with SQL, you get transactions, foreign keys, and mature tooling for backup/restore, point-in-time recovery, and EXPLAIN plans. With NoSQL, you trade those guarantees for horizontal scalability and flexible data models. At a company processing 500M events/day for analytics, Cassandra was the right call because the write volume was enormous and we never needed cross-partition joins. But for the core user/account/billing system, Postgres was non-negotiable because we needed ACID.
  • What most people miss is that modern Postgres handles JSON columns (jsonb) extremely well — you can have a relational schema with flexible sub-documents, getting 80% of the NoSQL flexibility with full ACID guarantees.
Red flag answer: “NoSQL is better because it scales horizontally and SQL doesn’t scale.” This ignores that most applications never outgrow a single well-tuned Postgres instance, and that horizontal scaling introduces massive complexity around transactions and consistency.Follow-ups:
  1. You mentioned Cassandra for high write volume. How does its write path differ from Postgres internally, and why does that make it faster for writes?
  2. If a team is already on MongoDB and experiencing data consistency bugs from the lack of schema enforcement, what migration strategy would you propose to move the critical path to Postgres without downtime?
Strong answer:
  • A composite index on (A, B, C) is essentially a sorted tree where entries are first sorted by A, then by B within each A value, then by C within each (A, B) pair. This means the index can efficiently serve queries that filter on A, or (A, B), or (A, B, C), but not B alone or (B, C) — because without knowing A first, the index entries for a given B value are scattered across the entire tree.
  • The way I decide column order is: put the column with the highest selectivity that appears in equality conditions first. If your query is WHERE status = 'active' AND created_at > '2024-01-01', and status has 5 distinct values while created_at has millions, you still put status first if it is an equality filter. Equality columns always come before range columns in the index definition, because once you hit a range condition, the index cannot use subsequent columns efficiently.
  • A real-world example: on an e-commerce orders table, an index on (user_id, status, created_at) serves WHERE user_id = 123 AND status = 'shipped' AND created_at > '2024-01-01' perfectly. But if you put created_at first, the query planner would scan a huge date range and then filter in-memory for user_id and status — orders of magnitude slower.
  • The gotcha here is covering indexes: if your query only needs columns that are all in the index (including INCLUDE columns), the database never touches the heap table at all. That is an index-only scan and it is dramatically faster — I have seen queries go from 50ms to 0.5ms just by adding an INCLUDE clause.
Red flag answer: “You should just index every column that appears in a WHERE clause.” This shows no understanding of how multi-column indexes work and would bloat write performance and storage without actually helping query performance.Follow-ups:
  1. You have a query that filters on WHERE B = 5 AND C = 10 but your only composite index is (A, B, C). What happens at the query plan level, and what would you do about it?
  2. How would you decide between creating a new composite index versus adding INCLUDE columns to an existing one? What is the write amplification trade-off?
Strong answer:
  • The shard key decision is the single most consequential choice in a sharded architecture because it is nearly impossible to change later without a full data migration. The key principles are: (1) the shard key should match your dominant access pattern so that most queries hit exactly one shard, (2) it should distribute data evenly to avoid hotspots, and (3) it should be immutable — you never want a row to need to move between shards.
  • For a multi-tenant SaaS app, tenant_id is the natural first choice. Most queries are scoped to a single tenant — “show me tenant X’s dashboards, users, settings.” This keeps all of a tenant’s data co-located on one shard, so the hot path never crosses shards.
  • The risk is tenant size skew. If you have 10,000 small tenants and 3 enterprise tenants that each have 100x the data, those 3 tenants become hot shards. The mitigation is a tiered approach: small tenants share shards (multi-tenant per shard), while large tenants get dedicated shards. Some teams use a shard mapping table — tenant_id -> shard_id — so you can move a growing tenant to its own shard without changing the application logic.
  • What most people miss is cross-tenant queries. Your admin dashboard that shows “total users across all tenants” now requires a scatter-gather across every shard. The solution is to maintain a separate analytics datastore (like ClickHouse or BigQuery) that aggregates cross-tenant data asynchronously, so your operational shards stay fast.
Red flag answer: “I would shard by auto-increment ID using modulo.” This ignores the access pattern entirely and guarantees that every tenant-scoped query becomes a scatter-gather across all shards.Follow-ups:
  1. A tenant that was on a shared shard has grown 50x and is now causing latency for the other tenants on that shard. How do you migrate them to a dedicated shard with zero downtime?
  2. How would your shard key decision change if the application also needed to support queries like “find all tenants in the healthcare industry” or global search across tenants?
Strong answer:
  • With synchronous replication, the primary waits for at least one replica to acknowledge the write before returning success to the client. This guarantees that if the primary dies, no committed data is lost — but it adds network round-trip latency to every write. With async replication, the primary returns success immediately and the replica catches up in the background, which is faster but means the replica can be seconds behind — if the primary dies, those un-replicated writes are gone.
  • In practice, I would use synchronous replication for anything involving money, legal records, or data that is expensive or impossible to reconstruct. Financial transactions at a payments company, for example — losing even one committed transaction is unacceptable.
  • I would accept async replication for data that is high-volume but reconstructable or where slight staleness is tolerable. Examples: analytics events (worst case, you lose a few seconds of events and your dashboard is marginally off), session data (user logs in again), or cached materialized views that get rebuilt periodically.
  • The nuanced middle ground is semi-synchronous replication, which Postgres supports as “synchronous commit to at least N replicas.” You can set synchronous_standby_names to require one of two replicas to ACK, so if one replica is slow, the other picks up the slack. This gets you durability without the worst-case latency of waiting for the slowest replica. In MySQL, rpl_semi_sync_master_wait_for_slave_count does the same thing. The trade-off is that if both replicas are down, writes either block or you have to manually failover to async mode.
Red flag answer: “Always use synchronous replication because data loss is never acceptable.” This shows no understanding of real-world trade-offs — synchronous replication across data centers can add 50-100ms per write, which would destroy throughput for high-volume systems.Follow-ups:
  1. You are running async replication and the primary fails with 3 seconds of un-replicated writes. How do you detect what was lost and what is your recovery procedure?
  2. How does replication lag affect your read/write splitting strategy, and what mechanisms would you put in place to handle it?
Strong answer:
  • The most common reason is that the query planner estimates the query will return a large fraction of the table. Postgres’s cost-based optimizer decides that a sequential scan reading the entire table in order is cheaper than an index scan that requires random I/O for each row. This typically happens when the WHERE clause matches more than roughly 10-20% of the table. The fix depends on whether the planner’s estimate is correct.
  • If the estimate is wrong — for example, stale statistics — run ANALYZE on the table to refresh the planner’s statistics. I have seen cases where a table grew from 1M to 50M rows but ANALYZE had not run, so the planner thought the table was still small. In Postgres, autovacuum handles this, but on tables with very high write rates, autovacuum can fall behind.
  • If the query genuinely matches a large percentage of rows, an index will not help — it is actually faster to do a sequential scan. The fix is to redesign the query: add more selective filters, use a partial index that only covers the subset you care about (e.g., CREATE INDEX idx_active ON orders(created_at) WHERE status = 'active'), or use a covering index to enable an index-only scan.
  • Other causes: the function wrapping the indexed column prevents index usage (e.g., WHERE YEAR(created_at) = 2024 cannot use an index on created_at — rewrite as a range condition), type mismatch between the column and the parameter (e.g., comparing a varchar column to an integer), or the index was created with a different collation or operator class. A sneaky one in Postgres: if the column is nullable and the query uses IS NOT NULL, the B-tree index does include NULLs by default, but the planner may still prefer a seq scan if most rows are non-null.
Red flag answer: “The index must be corrupted, I would drop and recreate it.” Jumping to index corruption without checking EXPLAIN output, statistics, or query structure shows no systematic debugging approach.Follow-ups:
  1. How would you distinguish between “planner chose seq scan because it is genuinely faster” versus “planner chose seq scan because of stale statistics”? What specific numbers in the EXPLAIN output would you look at?
  2. You have a query with WHERE status IN ('pending', 'processing') AND created_at > NOW() - INTERVAL '7 days' on a 200M-row table. Design the optimal index and explain why.
Strong answer:
  • Consistent hashing maps both data keys and server nodes onto a circular hash space (a “ring” from 0 to 2^32). Each key is assigned to the first node encountered when walking clockwise from the key’s position on the ring. The critical advantage over simple hash(key) % N is what happens when N changes.
  • With hash(key) % N, adding or removing a single server changes the modulo result for nearly every key, meaning you need to migrate almost all data. With consistent hashing, only keys between the removed node and its predecessor (going counter-clockwise) need to move — roughly 1/N of all keys. This is the difference between migrating 100% of data versus migrating 7% when going from 15 to 16 nodes.
  • In practice, naive consistent hashing creates uneven distribution because nodes are placed at arbitrary points on the ring. The fix is virtual nodes (vnodes) — each physical server gets 100-200 positions on the ring instead of one. This smooths out the distribution and also makes rebalancing more granular. When a node goes down, its load is distributed across many other nodes instead of just the next one on the ring. Cassandra, DynamoDB, and Riak all use this approach.
  • The operational gotcha is that consistent hashing only helps with data placement — it does not solve the replication or data transfer problem. When you add a node, the data that maps to it still needs to be physically copied from the nodes that currently own it. In Cassandra, this is the “streaming” process and it can saturate the network for hours on large clusters. You need to throttle it to avoid impacting live traffic.
Red flag answer: “Consistent hashing just means using a hash function to pick a server.” This misses the entire point — it is about minimizing data movement during topology changes, not just about hashing.Follow-ups:
  1. How do virtual nodes interact with replication? If a key’s primary is vnode V on physical node A, how do you ensure replicas land on different physical nodes rather than other vnodes on the same machine?
  2. You are using consistent hashing and one node is consistently hotter than others despite vnodes. What debugging steps would you take, and what might be the root cause?
Strong answer:
  • The N+1 problem occurs when code loads a list of N entities and then, for each one, fires a separate query to load a related entity. So you get 1 query for the list plus N queries for the relations — hence N+1. On a page showing 50 orders with their products, that is 51 database round-trips instead of 1 or 2. Each round-trip adds latency (typically 0.5-2ms per query on a local network), so N+1 can silently turn a 5ms page load into a 100ms one.
  • Detection in production: the best tool is query logging with aggregation. In Postgres, pg_stat_statements groups identical query patterns and shows total execution count and time — if you see SELECT * FROM products WHERE id = $1 called 10,000 times per minute with identical patterns, that is N+1. APM tools like Datadog or New Relic can also show “queries per request” as a metric, and any request with more than ~10 queries is suspicious. Django has django-debug-toolbar and nplusone library; Rails has bullet gem.
  • Solutions beyond JOIN: (1) Batch loading — collect all the IDs and do WHERE id IN (...) in one query. ORMs call this “eager loading” (select_related/prefetch_related in Django, .includes() in Rails). (2) DataLoader pattern (from GraphQL) — batch and deduplicate database calls within a single request lifecycle. (3) Denormalization — embed the related data in the parent record if it is read-heavy and rarely changes. (4) Caching the related entities — if products rarely change, a cache hit avoids the query entirely.
  • The trade-off with JOINs at scale is that large JOINs can be expensive too — a JOIN across two 100M-row tables with poor indexes can be worse than N+1 with good caching. The real answer is always: measure first, optimize the specific bottleneck.
Red flag answer: “Just add eager loading everywhere to prevent it.” Blindly eager-loading all relations leads to loading massive amounts of unused data, wasting memory and bandwidth. You should only eager-load relations you actually use.Follow-ups:
  1. You are working with a GraphQL API that has deeply nested resolvers. How does the DataLoader pattern solve N+1 at the resolver level, and what are its limitations?
  2. You have identified an N+1 problem in a critical path, but the ORM’s eager loading generates a JOIN that is even slower due to Cartesian product explosion with multiple has-many relations. What do you do?
Strong answer:
  • A connection pool maintains a set of pre-established database connections that application threads borrow and return, avoiding the overhead of creating a new TCP connection and completing the authentication handshake for every query. Creating a Postgres connection takes 5-20ms (TCP handshake + TLS + auth), so a pool eliminates that latency for the steady state.
  • When the pool is exhausted, the behavior depends on configuration: most pools will queue the request and wait up to a timeout (e.g., 30 seconds) for a connection to be returned. If the timeout expires, the application gets an exception. In production, pool exhaustion is usually a symptom, not the root cause — the real problem is slow queries holding connections too long, a missing index causing queries to take 5 seconds instead of 5 milliseconds, or a transaction left open without committing.
  • For sizing, I use this formula as a starting point: pool_size = (number_of_cores * 2) + effective_spindle_count. For a 4-core machine with SSD storage, that is roughly 9-10 connections. The key insight is that more connections is not better — Postgres performance degrades beyond a certain point because each connection consumes about 5-10MB of RAM and the OS context-switches between them. I have seen cases where reducing the pool from 200 to 30 connections actually improved throughput because the database stopped thrashing.
  • In a microservices architecture, pool sizing gets harder because you have N services each with their own pool. If you have 10 services with 20 connections each, that is 200 connections to one Postgres instance. The solution is PgBouncer or a similar connection pooler in front of the database, which multiplexes hundreds of application-side connections into a smaller number of actual database connections (say, 50). PgBouncer in transaction-mode pooling is the standard recommendation.
Red flag answer: “Set the pool size to something large like 500 so you never run out.” This shows no understanding that databases have finite connection capacity, and that each connection consumes memory and CPU on the database server.Follow-ups:
  1. What is the difference between PgBouncer’s session-mode, transaction-mode, and statement-mode pooling? When would you use each, and what breaks in transaction mode?
  2. You see ERROR: remaining connection slots are reserved for superuser connections in production. Walk me through your immediate triage and the longer-term fix.