Module 16: System Design Case Studies

Apply everything you’ve learned to real-world database design challenges. Each case study covers requirements, schema design, queries, scaling, and trade-offs.
Estimated Time: 10-12 hours
Format: 6 comprehensive case studies
Interview Use: Practice for 45-60 min design rounds
Skill Level: Senior/Staff

Case Study 1: E-Commerce Platform

Requirements

  • Product catalog with categories and attributes
  • User accounts with addresses
  • Shopping cart and checkout
  • Order management with status tracking
  • Inventory management with stock levels
  • Review and rating system

Schema Design

-- Core entities
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE addresses (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT REFERENCES users(id),
    type VARCHAR(20), -- 'shipping', 'billing'
    street TEXT NOT NULL,
    city VARCHAR(100),
    country VARCHAR(2),
    postal_code VARCHAR(20),
    is_default BOOLEAN DEFAULT false
);

-- Product catalog
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT REFERENCES categories(id),
    path ltree  -- requires ltree extension
);

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    category_id INT REFERENCES categories(id),
    base_price DECIMAL(10,2) NOT NULL,
    attributes JSONB DEFAULT '{}',  -- flexible attributes
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE product_variants (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT REFERENCES products(id),
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100),  -- "Red, Large"
    price_modifier DECIMAL(10,2) DEFAULT 0,
    attributes JSONB NOT NULL  -- {"color": "red", "size": "L"}
);

-- Inventory
CREATE TABLE inventory (
    variant_id BIGINT PRIMARY KEY REFERENCES product_variants(id),
    quantity INT NOT NULL DEFAULT 0,
    reserved INT NOT NULL DEFAULT 0,
    CHECK (quantity >= 0),
    CHECK (reserved >= 0),
    CHECK (reserved <= quantity)
);

-- Orders
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT REFERENCES users(id),
    status VARCHAR(20) DEFAULT 'pending',
    total DECIMAL(12,2) NOT NULL,
    shipping_address_id BIGINT REFERENCES addresses(id),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE order_items (
    id BIGSERIAL PRIMARY KEY,
    order_id BIGINT REFERENCES orders(id),
    variant_id BIGINT REFERENCES product_variants(id),
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    CHECK (quantity > 0)
);

-- Reviews
CREATE TABLE reviews (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT REFERENCES products(id),
    user_id BIGINT REFERENCES users(id),
    rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
    title VARCHAR(200),
    body TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(product_id, user_id)
);

Key Queries

-- Product listing with category tree
WITH RECURSIVE cat_tree AS (
    SELECT id, name, parent_id, ARRAY[name] AS path
    FROM categories WHERE id = $category_id
    UNION ALL
    SELECT c.id, c.name, c.parent_id, ct.path || c.name
    FROM categories c
    JOIN cat_tree ct ON c.parent_id = ct.id
)
SELECT p.*, array_agg(pv.id) AS variant_ids
FROM products p
JOIN cat_tree ct ON p.category_id = ct.id
LEFT JOIN product_variants pv ON pv.product_id = p.id
GROUP BY p.id;

-- Checkout with inventory reservation
BEGIN;
SELECT * FROM inventory
WHERE variant_id = $variant_id
FOR UPDATE;  -- Lock the row (optional: the UPDATE below takes the same lock,
             -- but an explicit lock lets the app inspect stock first)

UPDATE inventory
SET reserved = reserved + $quantity
WHERE variant_id = $variant_id
  AND (quantity - reserved) >= $quantity;

-- If the UPDATE affected 0 rows, the item is out of stock:
-- ROLLBACK instead of committing
COMMIT;
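
Completing the reservation lifecycle (a sketch under the same schema): at payment capture the hold converts into a sale, and abandoned carts release it. The CHECK constraints on inventory keep both paths consistent.

-- Convert a reservation into a sale at payment capture
UPDATE inventory
SET quantity = quantity - $quantity,
    reserved = reserved - $quantity
WHERE variant_id = $variant_id
  AND reserved >= $quantity;

-- Release an expired reservation (e.g., abandoned-cart timeout)
UPDATE inventory
SET reserved = reserved - $quantity
WHERE variant_id = $variant_id
  AND reserved >= $quantity;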

-- Product with aggregated reviews
SELECT 
    p.*,
    COUNT(r.id) AS review_count,
    AVG(r.rating) AS avg_rating,
    jsonb_build_object(
        '5', COUNT(*) FILTER (WHERE r.rating = 5),
        '4', COUNT(*) FILTER (WHERE r.rating = 4),
        '3', COUNT(*) FILTER (WHERE r.rating = 3),
        '2', COUNT(*) FILTER (WHERE r.rating = 2),
        '1', COUNT(*) FILTER (WHERE r.rating = 1)
    ) AS rating_distribution
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
WHERE p.id = $product_id
GROUP BY p.id;

Scaling Strategy

┌─────────────────────────────────────────────────────────────────────────────┐
│                    E-COMMERCE SCALING                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Read Path (90% of traffic):                                               │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                  │
│   │   CDN       │────▶│   Redis     │────▶│   Replica   │                  │
│   │ (static)    │     │ (products)  │     │  (queries)  │                  │
│   └─────────────┘     └─────────────┘     └─────────────┘                  │
│                                                                              │
│   Write Path (10% of traffic):                                              │
│   ┌─────────────┐     ┌─────────────┐                                       │
│   │    App      │────▶│   Primary   │                                       │
│   │  (orders)   │     │  (writes)   │                                       │
│   └─────────────┘     └─────────────┘                                       │
│                                                                              │
│   Indexing:                                                                  │
│   • products(category_id, created_at DESC) - listings                       │
│   • products USING GIN(attributes) - attribute search                       │
│   • inventory(variant_id) - stock checks                                    │
│   • orders(user_id, created_at DESC) - order history                        │
│                                                                              │
│   Partitioning:                                                              │
│   • orders by created_at (monthly) - historical data                        │
│   • reviews by product_id hash - even distribution                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
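
The indexes called out in the diagram, written as DDL (index names are illustrative):

CREATE INDEX idx_products_category_created ON products (category_id, created_at DESC);
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
-- inventory(variant_id) is already covered by its PRIMARY KEY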

Case Study 2: Real-Time Analytics

Requirements

  • Ingest millions of events per day
  • Sub-second dashboard queries
  • Time-series aggregations (hourly, daily, weekly)
  • Retention: raw data 30 days, aggregates forever

Schema Design

-- Time-series events with partitioning
CREATE TABLE events (
    id BIGSERIAL,
    event_time TIMESTAMPTZ NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    user_id BIGINT,
    properties JSONB,
    PRIMARY KEY (event_time, id)
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- ... more partitions

-- Pre-aggregated summaries
CREATE TABLE event_stats_hourly (
    hour TIMESTAMPTZ NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    count BIGINT DEFAULT 0,
    unique_users BIGINT DEFAULT 0,
    PRIMARY KEY (hour, event_type)
);

CREATE TABLE event_stats_daily (
    date DATE NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    count BIGINT DEFAULT 0,
    unique_users BIGINT DEFAULT 0,
    properties_agg JSONB,  -- aggregated metrics
    PRIMARY KEY (date, event_type)
);

-- HyperLogLog for unique counts (requires extension)
CREATE EXTENSION IF NOT EXISTS hll;

ALTER TABLE event_stats_hourly 
ADD COLUMN user_hll hll DEFAULT hll_empty();

Ingestion Pattern

-- Batch insert with conflict handling (note: with a server-generated
-- BIGSERIAL id, conflicts never occur; supply client-side event ids
-- if ingestion must be idempotent across retries)
INSERT INTO events (event_time, event_type, user_id, properties)
VALUES 
    ($1, $2, $3, $4),
    ($5, $6, $7, $8),
    -- ... batch of 1000
ON CONFLICT DO NOTHING;

-- Continuous aggregation (run every hour)
INSERT INTO event_stats_hourly (hour, event_type, count, unique_users, user_hll)
SELECT 
    date_trunc('hour', event_time) AS hour,
    event_type,
    COUNT(*) AS count,
    COUNT(DISTINCT user_id) AS unique_users,
    hll_add_agg(hll_hash_bigint(user_id)) AS user_hll
FROM events
WHERE event_time >= $last_processed_hour
  AND event_time < $current_hour
GROUP BY 1, 2
ON CONFLICT (hour, event_type) DO UPDATE SET
    count = event_stats_hourly.count + EXCLUDED.count,
    user_hll = hll_union(event_stats_hourly.user_hll, EXCLUDED.user_hll),
    -- distinct counts don't add across batches; re-derive from the merged HLL
    unique_users = hll_cardinality(
        hll_union(event_stats_hourly.user_hll, EXCLUDED.user_hll))::BIGINT;
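
The hourly rows roll up into the daily table with the same upsert pattern; a sketch, assuming the job runs once per completed day:

INSERT INTO event_stats_daily (date, event_type, count, unique_users)
SELECT 
    hour::date AS date,
    event_type,
    SUM(count) AS count,
    hll_cardinality(hll_union_agg(user_hll))::BIGINT AS unique_users
FROM event_stats_hourly
WHERE hour >= $day AND hour < $day + INTERVAL '1 day'
GROUP BY 1, 2
ON CONFLICT (date, event_type) DO UPDATE SET
    count = EXCLUDED.count,
    unique_users = EXCLUDED.unique_users;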

Query Patterns

-- Dashboard query (uses pre-aggregates)
SELECT 
    date_trunc('day', hour) AS date,
    event_type,
    SUM(count) AS total_events,
    hll_cardinality(hll_union_agg(user_hll)) AS unique_users
FROM event_stats_hourly
WHERE hour >= NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY 1, 2;

-- Drill-down to raw events
SELECT 
    event_time,
    event_type,
    properties
FROM events
WHERE event_time >= $start AND event_time < $end
  AND event_type = $type
ORDER BY event_time DESC
LIMIT 100;

Data Lifecycle

-- Drop old partitions (automated)
DROP TABLE events_2023_01;  -- Near-instant; takes only a brief lock on the parent

-- Compress old data with TimescaleDB (optional)
-- Or move to columnar storage (Citus columnar)
SELECT alter_table_set_access_method('events_2023_12', 'columnar');
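
Partition rotation is easy to automate; a minimal sketch, assuming the monthly naming scheme above and roughly 30-day raw retention:

DO $$
DECLARE
    part text := format('events_%s',
        to_char(date_trunc('month', NOW() - INTERVAL '2 months'), 'YYYY_MM'));
BEGIN
    EXECUTE format('DROP TABLE IF EXISTS %I', part);
END $$;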

Case Study 3: Social Media Feed

Requirements

  • Users follow other users
  • Home feed shows posts from followed users
  • Support for 100M users, 10M DAU
  • Feed reads must return in < 100 ms

Schema Design

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);

CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    author_id BIGINT NOT NULL REFERENCES users(id),
    content TEXT NOT NULL,
    media_urls TEXT[],
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Fan-out timeline (for < 10K followers)
CREATE TABLE timelines (
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL REFERENCES posts(id),
    author_id BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    -- DESC isn't valid inside a PRIMARY KEY; the btree is simply
    -- scanned backwards for newest-first feed reads
    PRIMARY KEY (user_id, created_at, post_id)
);

-- Celebrity posts (for > 10K followers)
-- Not fanned out, merged at read time
CREATE INDEX idx_posts_author_time ON posts(author_id, created_at DESC);

Fan-Out Strategy

-- On new post from regular user (< 10K followers)
INSERT INTO timelines (user_id, post_id, author_id, created_at)
SELECT 
    follower_id,
    $post_id,
    $author_id,
    $created_at
FROM follows
WHERE following_id = $author_id;

-- Async worker for large fan-outs
-- Use message queue (RabbitMQ, SQS) for batching
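
If the queue is kept in the database instead, SKIP LOCKED gives safe competing workers; a sketch with a hypothetical fanout_jobs table:

CREATE TABLE fanout_jobs (
    id BIGSERIAL PRIMARY KEY,
    post_id BIGINT NOT NULL,
    author_id BIGINT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Worker loop body: claim one job with SKIP LOCKED so concurrent
-- workers never block each other, then fan the post out
WITH job AS (
    DELETE FROM fanout_jobs
    WHERE id = (
        SELECT id FROM fanout_jobs
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING post_id, author_id, created_at
)
INSERT INTO timelines (user_id, post_id, author_id, created_at)
SELECT f.follower_id, j.post_id, j.author_id, j.created_at
FROM job j
JOIN follows f ON f.following_id = j.author_id;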

Feed Query (Hybrid)
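
The query below relies on a follower-count lookup. A minimal sketch of that table (user_stats is not defined above; in practice it would be maintained by a trigger on follows or a periodic job):

CREATE TABLE user_stats (
    user_id BIGINT PRIMARY KEY REFERENCES users(id),
    follower_count BIGINT NOT NULL DEFAULT 0
);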

-- Get user's celebrity followings
WITH celebrities AS (
    SELECT following_id 
    FROM follows f
    JOIN user_stats us ON us.user_id = f.following_id
    WHERE f.follower_id = $user_id 
      AND us.follower_count > 10000
),
-- Get pre-materialized timeline
materialized AS (
    SELECT post_id, author_id, created_at
    FROM timelines
    WHERE user_id = $user_id
      AND created_at > NOW() - INTERVAL '7 days'
    ORDER BY created_at DESC
    LIMIT 100
),
-- Get celebrity posts
celebrity_posts AS (
    SELECT id AS post_id, author_id, created_at
    FROM posts
    WHERE author_id IN (SELECT following_id FROM celebrities)
      AND created_at > NOW() - INTERVAL '7 days'
    ORDER BY created_at DESC
    LIMIT 100
)
-- Merge and return
SELECT p.*
FROM posts p
JOIN (
    SELECT * FROM materialized
    UNION ALL
    SELECT * FROM celebrity_posts
) feed ON feed.post_id = p.id
ORDER BY feed.created_at DESC
LIMIT 20;

Case Study 4: Multi-Tenant SaaS

Requirements

  • Thousands of tenants (companies)
  • Complete data isolation
  • Some tenants are very large (1M+ rows)
  • Compliance requirements (GDPR)

Schema Design (Shared with RLS)

-- Enable RLS
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    plan VARCHAR(20) DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    email VARCHAR(255) NOT NULL,
    role VARCHAR(20) DEFAULT 'member',
    UNIQUE(tenant_id, email)
);

CREATE TABLE projects (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE tasks (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    project_id BIGINT REFERENCES projects(id),
    title VARCHAR(255) NOT NULL,
    status VARCHAR(20) DEFAULT 'open',
    assignee_id BIGINT REFERENCES users(id)
);

-- Row Level Security
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_projects ON projects
    USING (tenant_id = current_setting('app.tenant_id')::UUID);

CREATE POLICY tenant_isolation_tasks ON tasks
    USING (tenant_id = current_setting('app.tenant_id')::UUID);

-- Set tenant context (in application)
-- SET app.tenant_id = 'uuid-here';
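
Since every RLS-filtered query carries the tenant_id predicate, composite indexes should lead with it; illustrative examples:

CREATE INDEX idx_projects_tenant ON projects (tenant_id, created_at DESC);
CREATE INDEX idx_tasks_tenant_status ON tasks (tenant_id, status);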

Connection Management

# Application middleware
from contextlib import contextmanager

@contextmanager
def set_tenant_context(tenant_id):
    """Set tenant context for RLS (parameterized: never interpolate into SQL)."""
    with get_db_connection() as conn:
        conn.execute("SELECT set_config('app.tenant_id', %s, false)", (tenant_id,))
        try:
            yield conn
        finally:
            conn.execute("RESET app.tenant_id")

# PgBouncer with tenant routing (advanced)
# Route large tenants to dedicated pools

Tenant Isolation Testing

-- Test: Ensure tenant A can't see tenant B's data
SET app.tenant_id = 'tenant-a-uuid';

SELECT * FROM projects;  -- Should only see tenant A

-- Attempting to read tenant B's rows is silently filtered by RLS
SELECT * FROM projects WHERE tenant_id = 'tenant-b-uuid';
-- Returns 0 rows

-- Bypass for admin operations
SET ROLE admin_role;  -- Must have BYPASSRLS
SELECT * FROM projects;  -- Sees all tenants
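
For the GDPR requirement, tenant erasure is a cascade of deletes once RLS is bypassed; a sketch (the order respects the foreign keys, since tasks references both users and projects):

BEGIN;
DELETE FROM tasks WHERE tenant_id = $tenant_id;
DELETE FROM projects WHERE tenant_id = $tenant_id;
DELETE FROM users WHERE tenant_id = $tenant_id;
DELETE FROM tenants WHERE id = $tenant_id;
COMMIT;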

Case Study 5: Event Sourcing System

Requirements

  • All state changes stored as events (audit trail)
  • Ability to rebuild state at any point in time
  • Support for projections (read models)
  • Event replay for debugging

Schema Design

-- Event store (append-only)
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    stream_id UUID NOT NULL,
    stream_type VARCHAR(100) NOT NULL,  -- 'Order', 'User', etc.
    event_type VARCHAR(100) NOT NULL,   -- 'OrderCreated', 'ItemAdded'
    event_data JSONB NOT NULL,
    metadata JSONB DEFAULT '{}',
    version INT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(stream_id, version)  -- Optimistic concurrency
);

-- Example events (stream_id must be a valid UUID)
INSERT INTO events (stream_id, stream_type, event_type, event_data, version)
VALUES 
    ('7d9f8b3a-1c2e-4f5a-9b6d-123456789abc', 'Order', 'OrderCreated', 
     '{"customer_id": 456, "items": []}', 1),
    ('7d9f8b3a-1c2e-4f5a-9b6d-123456789abc', 'Order', 'ItemAdded', 
     '{"product_id": 789, "quantity": 2}', 2),
    ('7d9f8b3a-1c2e-4f5a-9b6d-123456789abc', 'Order', 'OrderSubmitted', 
     '{"submitted_at": "2024-01-15T10:30:00Z"}', 3);

-- Projection: Materialized current state
CREATE TABLE order_projections (
    order_id UUID PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    items JSONB NOT NULL DEFAULT '[]',
    status VARCHAR(20) NOT NULL,
    total DECIMAL(12,2),
    last_event_id BIGINT NOT NULL,  -- For replay
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Subscription checkpoints
CREATE TABLE subscription_checkpoints (
    subscription_id VARCHAR(100) PRIMARY KEY,
    last_processed_id BIGINT NOT NULL,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

Event Replay

-- Rebuild projection from events
CREATE OR REPLACE FUNCTION rebuild_order_projection(p_order_id UUID)
RETURNS void AS $$
DECLARE
    evt RECORD;
    last_id BIGINT;
    current_state JSONB := '{}';
BEGIN
    -- Delete existing projection
    DELETE FROM order_projections WHERE order_id = p_order_id;
    
    -- Replay all events (apply_event is the stream's reducer; a sketch follows below)
    FOR evt IN 
        SELECT * FROM events 
        WHERE stream_id = p_order_id 
        ORDER BY version
    LOOP
        current_state := apply_event(current_state, evt.event_type, evt.event_data);
        last_id := evt.id;
    END LOOP;
    
    IF last_id IS NULL THEN
        RETURN;  -- no events for this stream
    END IF;
    
    -- Insert rebuilt projection
    INSERT INTO order_projections (order_id, customer_id, items, status, last_event_id)
    VALUES (
        p_order_id,
        (current_state->>'customer_id')::BIGINT,
        current_state->'items',
        current_state->>'status',
        last_id
    );
END;
$$ LANGUAGE plpgsql;
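
apply_event above is the domain-specific reducer; a minimal illustrative version for the Order events shown earlier:

CREATE OR REPLACE FUNCTION apply_event(state JSONB, evt_type VARCHAR, data JSONB)
RETURNS JSONB AS $$
BEGIN
    CASE evt_type
        WHEN 'OrderCreated' THEN
            RETURN state || data || '{"status": "created"}'::jsonb;
        WHEN 'ItemAdded' THEN
            RETURN jsonb_set(state, '{items}',
                COALESCE(state->'items', '[]'::jsonb) || jsonb_build_array(data));
        WHEN 'OrderSubmitted' THEN
            RETURN jsonb_set(state, '{status}', '"submitted"'::jsonb);
        ELSE
            RETURN state;  -- unknown events are ignored
    END CASE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;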

Subscription Processing

# Python async event processor; get_checkpoint/save_checkpoint persist to
# subscription_checkpoints, db is an asyncpg-style connection (both assumed)
import asyncio

async def process_events():
    last_id = get_checkpoint('order_projection')
    
    while True:
        events = await db.fetch("""
            SELECT * FROM events 
            WHERE id > $1 
            ORDER BY id 
            LIMIT 100
        """, last_id)
        
        for event in events:
            await update_projection(event)
            last_id = event['id']
        
        if events:
            await save_checkpoint('order_projection', last_id)
        await asyncio.sleep(0.1)

Case Study 6: Global Application

Requirements

  • Users in US, EU, Asia
  • Data residency (EU data stays in EU)
  • Low latency reads globally
  • Strong consistency for writes

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    GLOBAL DATABASE ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   US-EAST (Primary for US users)                                            │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  ┌─────────────┐     ┌─────────────┐                               │   │
│   │  │   Primary   │────▶│   Replica   │ (local reads)                 │   │
│   │  │  (writes)   │     │             │                               │   │
│   │  └──────┬──────┘     └─────────────┘                               │   │
│   └─────────┼───────────────────────────────────────────────────────────┘   │
│             │ async replication (reference data only)                       │
│   ┌─────────┼───────────────────────────────────────────────────────────┐   │
│   │         ▼           EU-WEST (Primary for EU users)                 │   │
│   │  ┌─────────────┐     ┌─────────────┐                               │   │
│   │  │   Primary   │────▶│   Replica   │                               │   │
│   │  │  (writes)   │     │             │                               │   │
│   │  └─────────────┘     └─────────────┘                               │   │
│   │                                                                     │   │
│   │  EU user data ONLY stored here (GDPR)                              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   Data Routing:                                                             │
│   • User data: Route to home region based on user.region                   │
│   • Reference data: Replicated globally (products, configs)                │
│   • Cross-region queries: Federation layer or application-level           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation

-- Users partitioned by region
CREATE TABLE users (
    id BIGSERIAL,
    region VARCHAR(10) NOT NULL,  -- 'us', 'eu', 'asia'
    email VARCHAR(255) NOT NULL,
    data JSONB,
    PRIMARY KEY (region, id)
) PARTITION BY LIST (region);

CREATE TABLE users_us PARTITION OF users FOR VALUES IN ('us');
CREATE TABLE users_eu PARTITION OF users FOR VALUES IN ('eu');
CREATE TABLE users_asia PARTITION OF users FOR VALUES IN ('asia');

-- Reference data (replicated everywhere)
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL  -- no region-specific columns
);

-- Foreign data wrapper for cross-region queries
CREATE EXTENSION postgres_fdw;

CREATE SERVER eu_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'eu-db.example.com', port '5432', dbname 'app');

CREATE USER MAPPING FOR app_user
SERVER eu_server
OPTIONS (user 'app_user', password 'secret');

CREATE FOREIGN TABLE users_eu_remote (
    id BIGINT,
    region VARCHAR(10),
    email VARCHAR(255)
) SERVER eu_server OPTIONS (table_name 'users_eu');
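
With the foreign table in place, a cross-region lookup from the US node is a plain UNION; a sketch:

SELECT 'us' AS home_region, id, email FROM users_us WHERE email = $email
UNION ALL
SELECT 'eu' AS home_region, id, email FROM users_eu_remote WHERE email = $email;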

Summary

You’ve now seen how to apply database design principles to real-world systems:

  • E-Commerce: inventory, orders, flexible attributes
  • Analytics: time-series, pre-aggregation, partitioning
  • Social Feed: fan-out, hybrid approach, caching
  • Multi-Tenant: RLS, isolation, connection management
  • Event Sourcing: append-only, projections, replay
  • Global: geo-routing, data residency, federation

Course Completion 🎉

Congratulations! You’ve completed the Database Engineering Mastery course.
You’re now prepared for:
  • Senior/Staff engineering interviews
  • PostgreSQL open-source contributions
  • Production database architecture and operations

What’s Next?