Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Module 16: System Design Case Studies
Apply everything you’ve learned to real-world database design challenges. Each case study covers requirements, schema design, queries, scaling, and trade-offs. Think of these as the difference between knowing how to play individual chess pieces and actually playing a full game — this is where isolated skills become integrated engineering judgment.Format: 6 comprehensive case studies
Interview Use: Practice for 45-60 min design rounds
Skill Level: Senior/Staff
Case Study 1: E-Commerce Platform
Requirements
- Product catalog with categories and attributes
- User accounts with addresses
- Shopping cart and checkout
- Order management with status tracking
- Inventory management with stock levels
- Review and rating system
Schema Design
Key Queries
Scaling Strategy
Case Study 2: Real-Time Analytics
Requirements
- Ingest millions of events per day
- Sub-second dashboard queries
- Time-series aggregations (hourly, daily, weekly)
- Retention: raw data 30 days, aggregates forever
Schema Design
Ingestion Pattern
Query Patterns
Data Lifecycle
Case Study 3: Social Media Feed
Requirements
- Users follow other users
- Home feed shows posts from followed users
- Support for 100M users, 10M DAU
- Feed must be under 100ms
Schema Design
Fan-Out Strategy
Feed Query (Hybrid)
Case Study 4: Multi-Tenant SaaS
Requirements
- Thousands of tenants (companies)
- Complete data isolation
- Some tenants are very large (1M+ rows)
- Compliance requirements (GDPR)
Schema Design (Shared with RLS)
Connection Management
Tenant Isolation Testing
Case Study 5: Event Sourcing System
Requirements
- All state changes stored as events (audit trail)
- Ability to rebuild state at any point in time
- Support for projections (read models)
- Event replay for debugging
Schema Design
Event Replay
Subscription Processing
Case Study 6: Global Application
Requirements
- Users in US, EU, Asia
- Data residency (EU data stays in EU)
- Low latency reads globally
- Strong consistency for writes
Architecture
Implementation
How to Use These Case Studies in Interviews
System design interviews for database engineering roles typically follow a 45-60 minute format. Here’s how to leverage these case studies for maximum interview success:Interview Structure and Timing
Phase 1: Requirements Clarification (5-10 minutes) This is where most candidates fail before they even start. Jumping into schema design without understanding constraints is like writing code without reading the requirements. The interviewer is evaluating your engineering judgment, not your typing speed.- Ask about scale: How many users? Queries per second? Data volume?
- Identify read vs write ratio (often 90:10 or higher)
- Clarify consistency requirements: Strong vs eventual consistency?
- Understand geographic distribution and latency requirements
- Confirm data retention and compliance needs
- Start with a simple single-server design
- Identify the main entities and their relationships
- Draw a basic ER diagram or schema outline
- Explain your normalization choices
- Discuss primary access patterns and query shapes
- Schema Design: Walk through 2-3 core tables in detail
- Show primary keys, foreign keys, indexes
- Explain constraint choices
- Justify data types and column sizes
- Query Patterns: Write 2-3 critical queries
- Show how indexes support these queries
- Discuss query plan expectations
- Mention potential optimization techniques
- Scaling Bottlenecks: Identify the first scaling constraint
- Connection limits? I/O? CPU?
- Propose specific solutions (not just “add caching”)
- Progression from single server to scaled architecture
- Read scaling: Replicas, caching layers, CDN
- Write scaling: Partitioning, sharding strategies
- Handle trade-offs explicitly:
- “If we denormalize here, we gain X but lose Y”
- “Caching improves reads but introduces consistency challenges”
- Discuss monitoring and observability needs
- Mention backup and disaster recovery
- Consider operational complexity
- Address interviewer’s specific concerns
What Interviewers Look For
✅ Strong Signals:- Clear, structured thinking with explicit phases
- Asking clarifying questions before jumping to solutions
- Justifying design decisions with concrete trade-offs
- Concrete SQL examples, not just hand-waving
- Understanding when to denormalize and why
- Awareness of operational concerns (monitoring, backups)
- Scaling incrementally (not over-engineering initially)
- Immediately jumping to complex distributed systems
- Generic answers (“we’ll use microservices” without specifics)
- Not considering query patterns when designing schema
- Ignoring constraints and data integrity
- Proposing technologies without understanding their trade-offs
- Unable to write actual SQL queries
- No discussion of indexes or query optimization
Practice Strategies
1. Time Yourself: Set a 45-minute timer and work through a full case study 2. Write Actual SQL: Don’t just describe—write CREATE TABLE statements 3. Draw Diagrams: Practice drawing ER diagrams and architecture diagrams quickly 4. Explain Out Loud: Verbalize your thinking process as if explaining to an interviewer 5. Compare Your Solution: After completing, review the provided solution and note differences 6. Focus on Trade-offs: For every design choice, practice stating the pros and consCommon Interview Questions to Prepare
- “How would you handle hot partition problems in sharding?”
- “What’s your strategy for zero-downtime migrations?”
- “How do you ensure data consistency across microservices?”
- “Explain your approach to managing database schema changes in production.”
- “How would you optimize this query?” (given EXPLAIN output)
- “What monitoring metrics would you track for this system?”
Case Study to Interview Mapping
- E-Commerce: Tests inventory management, transactions, and consistency
- Analytics Platform: Tests time-series optimization and aggregation strategies
- Social Feed: Tests fan-out approaches and caching patterns
- Multi-Tenant SaaS: Tests isolation strategies and connection management
- Event Sourcing: Tests immutable data patterns and projections
- Global Application: Tests geo-distribution and data residency
Design Template: Build Your Own Case Study
Use this template to practice system design for any application domain:1. Requirements Analysis
Functional Requirements:- List 5-7 core features the system must support
- Identify the main entities and their relationships
- Define critical user workflows (e.g., checkout flow, post creation)
- Scale: How many users? Requests/second? Data volume?
- Read:Write Ratio: Typically 90:10, 95:5, or 50:50?
- Latency: P95/P99 targets (e.g., less than 100ms for reads, less than 500ms for writes)
- Consistency: Strong, eventual, or per-operation?
- Availability: Target uptime (e.g., 99.9%, 99.99%)?
2. Data Model Design
Core Entities Template:- All tables have explicit primary keys
- Foreign keys defined with appropriate ON DELETE behavior
- Unique constraints on natural keys (email, username, etc.)
- Check constraints for data validation
- Appropriate indexes for query patterns (list them explicitly)
- Consider partitioning strategy if time-series or very large tables
- JSONB columns only where schema flexibility is truly needed
3. Access Patterns and Queries
Identify Top 5 Queries by Frequency:- [Query description] - Expected QPS: ___
- [Query description] - Expected QPS: ___
- [Query description] - Expected QPS: ___
- [Query description] - Expected QPS: ___
- [Query description] - Expected QPS: ___
- Transactional writes (e.g., orders, payments)
- Bulk imports (e.g., data ingestion)
- Background jobs (e.g., aggregation, cleanup)
4. Indexing Strategy
5. Scaling Plan
Phase 1: Single Server (0-10K users)- Single PostgreSQL instance
- Application handles connection pooling
- Basic monitoring and backups
- Add 2-3 read replicas
- Implement read/write splitting in application
- Add Redis for session and hot data caching
- CDN for static assets
- Vertical scaling of primary (larger instance)
- Implement connection pooling (PgBouncer)
- Consider partitioning large tables by date/hash
- Asynchronous processing for non-critical writes
- Sharding strategy: [by user_id, by geographic region, etc.]
- Cross-shard query handling approach
- Multi-region deployment with data residency
- Event-driven architecture for eventual consistency
6. Trade-offs and Considerations
Consistency vs Performance:- Where can you use eventual consistency?
- Which operations require strong consistency?
- How do you handle consistency across shards?
- Which tables/columns will you denormalize? Why?
- What’s your strategy for keeping denormalized data in sync?
- Is sharding premature optimization for your scale?
- Can simpler solutions (caching, indexing) solve the problem?
7. Operational Considerations
Monitoring:- Key metrics to track (latency, throughput, error rate, saturation)
- Query performance tracking (pg_stat_statements)
- Replication lag monitoring
- Connection pool utilization
- Backup strategy and RPO (Recovery Point Objective)
- Restore and RTO (Recovery Time Objective)
- WAL archiving and PITR (Point-in-Time Recovery)
- How will you handle schema migrations with zero downtime?
- Versioning strategy for breaking changes
Self-Evaluation Rubric
Use this rubric to assess your case study solutions and identify areas for improvement:Requirements Understanding (Score: /10)
Excellent (9-10):- Asked clarifying questions about scale, consistency, and latency requirements
- Identified all implicit requirements (e.g., multi-tenancy, geographic distribution)
- Made reasonable assumptions where information was missing
- Documented assumptions explicitly
- Asked some clarifying questions
- Covered most critical requirements
- Made some assumptions but didn’t document all
- Jumped directly to solution without clarifying requirements
- Missed critical non-functional requirements
- Made unrealistic assumptions
Schema Design (Score: /20)
Excellent (17-20):- All entities properly normalized (or intentionally denormalized with justification)
- Appropriate data types and constraints
- Foreign keys with correct ON DELETE behavior
- Indexes match query patterns precisely
- Considered partitioning where appropriate
- JSONB used judiciously
- Mostly normalized schema
- Some indexes missing or unnecessary indexes included
- Minor data type issues
- Constraints mostly appropriate
- Normalization violations without justification
- Missing foreign keys or constraints
- No indexes or all indexes wrong
- Inappropriate use of JSONB for structured data
- Missing primary keys
Query Design (Score: /15)
Excellent (13-15):- All queries are efficient and use indexes
- Proper use of JOINs, CTEs, and window functions
- Transactions used appropriately for consistency
- Handles edge cases (NULL values, empty results)
- Queries are functional but not optimally efficient
- Some missed opportunities for index usage
- Minor correctness issues
- Queries don’t use available indexes
- N+1 query patterns
- Missing WHERE clauses or overly broad queries
- Incorrect JOIN types or conditions
Scaling Strategy (Score: /20)
Excellent (17-20):- Clear progression from simple to complex
- Specific solutions for read and write scaling
- Justified when to introduce each component
- Considered operational complexity
- Addressed specific bottlenecks (connection limits, I/O, CPU)
- Reasonable scaling approach
- Some components mentioned without detail
- Missing specific thresholds for scaling triggers
- Generic “use microservices” without specifics
- Over-engineered from the start
- No consideration of bottlenecks
- Proposed solutions don’t match requirements
Trade-off Analysis (Score: /15)
Excellent (13-15):- Every major decision includes explicit trade-offs
- Quantified trade-offs where possible (e.g., “10% write overhead for 50% read improvement”)
- Considered alternative approaches
- Addressed operational implications
- Some trade-offs discussed
- Mostly qualitative analysis
- Missing some alternative approaches
- No trade-off discussion
- Claims solutions are “best” without justification
- Doesn’t consider downsides
Operational Awareness (Score: /10)
Excellent (9-10):- Detailed monitoring plan
- Backup and disaster recovery strategy
- Schema migration approach
- Considered operational complexity and team skills
- Mentioned monitoring and backups
- Missing some operational details
- No discussion of operations
- Proposes complex system without considering maintenance
Communication Clarity (Score: /10)
Excellent (9-10):- Clear, structured presentation
- Used diagrams effectively
- Explained rationale for decisions
- Easy to follow progression
- Mostly clear
- Some organization issues
- Could use more diagrams
- Difficult to follow
- Jumps around randomly
- No visual aids
Total Score: ___/100
90-100: Excellent - Ready for staff+ interviews 75-89: Good - Solid senior-level understanding 60-74: Fair - Has foundational knowledge but needs more practice Below 60: Needs Improvement - Review fundamentals and practice moreAction Items Based on Your Score
If you scored below 70:- Review the fundamentals in earlier modules (especially Database Design and Performance)
- Practice writing actual SQL, not just describing solutions
- Focus on one case study at a time and compare your solution to the provided one
- Deepdive into areas where you scored lowest
- Practice explaining trade-offs out loud
- Time yourself on case studies to improve pacing
- Write more detailed schemas and queries
- Practice more complex variations
- Focus on communication and presentation
- Review edge cases and failure scenarios
- Consider mentoring others to solidify your knowledge
Capstone Project: Build a Complete Database-Backed System
Put everything together by building a production-ready system from scratch:Project Overview
Design, implement, and document a complete database-backed application that demonstrates mastery of all course concepts. Duration: 2-4 weeksOutcome: Portfolio piece suitable for interviews
Format: Working application + comprehensive documentation
Project Requirements
Choose ONE of these domains:- Multi-Tenant SaaS Analytics Platform: Track events, generate reports, support multiple customers
- Real-Time Collaborative Tool: Like Google Docs or Figma, with conflict resolution
- E-Learning Platform: Courses, enrollments, progress tracking, assessments
- Inventory Management System: Multi-warehouse, real-time stock, order fulfillment
- Social Media Feed Engine: Posts, follows, timeline generation, notifications
- OR propose your own (must be approved based on complexity)
Implementation Checklist
Phase 1: Design and Planning (Week 1)
- Write a requirements document (functional and non-functional)
- Create detailed ER diagram with cardinality
- Design complete schema with all constraints
- Identify top 10 query patterns
- Plan indexing strategy
- Document normalization decisions and intentional denormalizations
Phase 2: Core Implementation (Week 1-2)
- Set up PostgreSQL database (local or cloud)
- Implement complete schema with migrations
- Seed database with realistic test data (use faker libraries)
- Implement all core queries with EXPLAIN ANALYZE results
- Add appropriate indexes and verify performance
- Write SQL-level tests for critical queries
Phase 3: Application Layer (Week 2-3)
- Build simple API or web app that uses the database
- Implement connection pooling (PgBouncer or application-level)
- Add basic authentication and authorization
- Implement at least one complex transaction (e.g., checkout, enrollment)
- Handle concurrent access scenarios
- Add error handling and validation
Phase 4: Performance and Scaling (Week 3)
- Set up pg_stat_statements for query monitoring
- Profile top queries and optimize
- Implement caching layer (Redis or similar) for hot data
- Add read replica (if using cloud provider)
- Load test with realistic traffic patterns
- Document performance baselines
Phase 5: Production Readiness (Week 4)
- Set up automated backups
- Implement monitoring dashboards (Grafana or similar)
- Add alerts for critical metrics
- Create runbook for common operational tasks
- Implement graceful degradation for failures
- Document disaster recovery procedures
Deliverables
1. Code Repository (GitHub/GitLab)- Schema migrations (versioned)
- Seed data scripts
- Application code
- Test suite
- README with setup instructions
- Requirements and assumptions
- ER diagrams and schema design
- Query patterns and access patterns
- Indexing strategy with justifications
- Scaling plan (current + future)
- Trade-off analysis
- Query performance benchmarks (EXPLAIN ANALYZE outputs)
- Load testing results (throughput, latency percentiles)
- Bottleneck analysis
- Optimization decisions and their impact
- Monitoring setup and key metrics
- Backup and restore procedures
- Common troubleshooting scenarios
- Schema migration workflow
- Disaster recovery plan
Evaluation Criteria
Technical Depth (40%)- Schema design quality and normalization
- Query optimization and index usage
- Transaction handling and concurrency
- Performance under load
- Monitoring and observability
- Error handling and resilience
- Backup and recovery capability
- Security considerations
- Clear architecture and design decisions
- Trade-off analysis
- Operational procedures
- Code comments and README
- All phases completed
- Realistic data and scenarios
- Working end-to-end
Success Examples
Excellent Projects Include:- Thoughtful denormalization with documented trade-offs
- Creative use of PostgreSQL features (JSONB, CTEs, window functions)
- Real performance optimization with before/after metrics
- Production-quality error handling and logging
- Comprehensive monitoring setup
- Using MongoDB or MySQL when PostgreSQL was required
- No indexes or all wrong indexes
- Hand-waving instead of actual implementation
- No performance testing
- Generic solutions without domain-specific optimization
Bonus Challenges (Optional)
For extra depth, consider adding:- Multi-region deployment with data residency
- Sharding implementation for a specific table
- Custom PostgreSQL extension
- Advanced full-text search with ranking
- Event sourcing pattern for audit trail
- GraphQL API with optimized data loader
- Real-time subscriptions using PostgreSQL LISTEN/NOTIFY
Getting Help
- Post questions in course Discord
- Schedule 1:1 code reviews (if available)
- Share your design doc for feedback before full implementation
- Present your project to peers for practice
Interview Deep-Dive
Design a database schema for a ride-sharing app like Uber. Walk me through your schema, the top 3 queries, and how you would handle the transition from a single city to 50 cities globally.
Design a database schema for a ride-sharing app like Uber. Walk me through your schema, the top 3 queries, and how you would handle the transition from a single city to 50 cities globally.
driver_locations table stores current GPS coordinates and is updated every 5-10 seconds per active driver — this is a high-write, high-read table that benefits from being kept small (only active drivers, with completed rides archived). I would use a PostGIS geography column for location data, enabling spatial index queries like “find all available drivers within 2km of this pickup point” using ST_DWithin.The rides table tracks the lifecycle: requested, matched, in_progress, completed, cancelled. A key design decision is whether to store the full route as a JSONB array of coordinates or in a separate ride_waypoints table. For most queries (fare calculation, ETA), the start and end points plus the polyline are sufficient — store the full waypoint trace in a separate table or object storage for replay/dispute resolution.For the top 3 queries: (1) “Find available drivers near pickup” — spatial query on driver_locations with a GiST index, filtered by status = 'available'. (2) “Get ride history for user” — index on (rider_id, created_at DESC) with pagination using keyset/cursor, not OFFSET. (3) “Calculate driver earnings for pay period” — pre-aggregated in a driver_earnings_daily materialized view, summing completed ride fares by driver and day.For scaling from 1 to 50 cities: start with a single database, partitioned by city or region. As you grow, shard by geographic region — rides in Tokyo never join with rides in London, so cross-shard queries are extremely rare. Driver locations can be handled by a Redis-backed geospatial index for real-time matching, with PostgreSQL as the system of record for completed rides. The key insight is that ride-sharing has strong geographic locality, making it one of the cleanest sharding candidates in system design.Follow-up: How do you handle the scenario where a driver is near a city boundary and could serve riders in either city?Geographic boundaries should use overlapping zones, not hard borders. When matching drivers near a boundary, query both city shards (or partition zones). Since this affects a small percentage of rides (drivers near boundaries), the cross-shard cost is acceptable. In practice, companies like Uber use hexagonal grid systems (H3) that can overlap administrative boundaries. The shard key is the hex cell, not the city name, and boundary cells are replicated to both regions. This adds write amplification for a small number of cells but eliminates the boundary problem entirely.You are designing a multi-tenant analytics platform. One tenant generates 80% of the total data volume. How does this affect your schema design, partitioning strategy, and connection management?
You are designing a multi-tenant analytics platform. One tenant generates 80% of the total data volume. How does this affect your schema design, partitioning strategy, and connection management?
statement_timeout and work_mem limits — the large tenant may need higher work_mem for their complex aggregations, but you do not want their queries consuming all available memory.The key trade-off: operational complexity increases with hybrid isolation. You now have two deployment models, two monitoring configurations, and two migration paths. This is worth it when the alternative is all tenants suffering degraded performance because of one whale. In my experience, the breakpoint is around 10x data disparity — if one tenant has more than 10x the data of the median tenant, hybrid isolation pays for itself.Follow-up: The large tenant now wants real-time dashboards while other tenants are fine with 15-minute-old data. How does this change the architecture?This is a read-path bifurcation problem. For tenants accepting stale data, serve their dashboards from read replicas with a controlled replication lag — even add a materialized view refresh cycle of 15 minutes. For the large tenant demanding real-time, their queries must hit the primary or a synchronous replica. But you cannot route all their dashboard traffic to the primary without impacting write throughput. The solution: use logical replication to stream only the large tenant’s tables to a dedicated analytics replica. This replica gets real-time data for that tenant without carrying the full database load. You are trading replication infrastructure complexity for workload isolation — a reasonable trade at this scale.Your team is designing an event sourcing system for a financial services application. Walk me through how you would handle schema evolution when event shapes change over time, and how you prevent the event store from becoming a performance bottleneck.
Your team is designing an event sourcing system for a financial services application. Walk me through how you would handle schema evolution when event shapes change over time, and how you prevent the event store from becoming a performance bottleneck.
OrderCreated event from v1 (which had price as an integer in cents) to v2 (which uses a money object with currency). Old events with the old shape will exist forever.The standard approach is event upcasting: store a schema_version field on every event, and when reading events, apply an upcasting pipeline that transforms older versions to the latest shape. For example, an upcaster for OrderCreated v1-to-v2 would transform {"price": 1999} into {"money": {"amount": 1999, "currency": "USD"}}, with USD as a default because v1 was single-currency. These upcasters are pure functions that run at read time, meaning you never modify stored events. The projection builder applies the upcaster chain before processing. This is similar to how database migrations work, except the source data (events) is never altered.For the event store performance: the fundamental scaling challenge is that the event store is append-only and grows forever. At financial services scale (millions of events per day), the table will reach billions of rows within a year. Partition by time (monthly or weekly), and index on (stream_id, version) for stream replay and (id) for subscription processing (global ordering). The id column should be a BIGSERIAL, not a UUID, because subscription processors need sequential ordering, and UUID ordering is meaningless.Projection rebuilds are the hidden performance risk. If a projection bug requires replaying all events for a stream with 500,000 events, that is a long-running read query. Introduce snapshot checkpoints: every N events (e.g., every 1,000), serialize the current aggregate state as a snapshot. Rebuilding then starts from the latest snapshot and only replays events since the snapshot. This turns an O(N) rebuild into O(1000) in the worst case.For write throughput, the append-only nature of the event store is actually an advantage — no updates means no row-level locking contention. The bottleneck is usually the unique constraint check on (stream_id, version) for optimistic concurrency. At very high throughput, this B-tree index insertion can become a hotspot. The mitigation is to shard the event store by stream_id, so concurrent writes to different streams hit different index pages. In PostgreSQL, hash partitioning on stream_id achieves this naturally.Follow-up: A regulator requires you to delete all data for a specific customer (GDPR right to erasure). But events are immutable. How do you handle this?This is the most philosophically challenging intersection of event sourcing and privacy law. You have three options, each with trade-offs. First, crypto-shredding: encrypt each customer’s event data with a customer-specific key. To “delete” a customer, destroy the key. The events remain but are unreadable. This is the most elegant solution and is widely accepted by GDPR auditors because the data is effectively destroyed. Second, tombstone events: append a CustomerDataErased event and modify projections to ignore that customer’s prior events. The raw events still exist, which some regulators may not accept. Third, event rewriting: create a new version of the event store with the customer’s events redacted, then swap. This is the nuclear option — expensive, risky, but unambiguous. I recommend crypto-shredding as the default strategy, decided at system design time, not after the first deletion request arrives.Compare the trade-offs of three different approaches to implementing a product search feature in an e-commerce database: full-text search in PostgreSQL, a dedicated search engine like Elasticsearch, and denormalized search tables. When would you choose each?
Compare the trade-offs of three different approaches to implementing a product search feature in an e-commerce database: full-text search in PostgreSQL, a dedicated search engine like Elasticsearch, and denormalized search tables. When would you choose each?
tsvector/tsquery with GIN indexes) is the right choice when search requirements are straightforward (keyword matching, basic ranking), the product catalog is under 10 million items, and you want to minimize infrastructure complexity. The advantages are transactional consistency (search results reflect the latest data immediately), no synchronization pipeline to maintain, and one less service to monitor at 3 AM. The disadvantages: PostgreSQL’s ranking algorithm is basic compared to Elasticsearch’s BM25 with custom scoring, faceted search (filter by brand, price range, color simultaneously) requires manual implementation with CTEs and conditional aggregation, and at scale the GIN index updates become a write amplification concern. A GIN index on a tsvector column can be 2-3x the size of the indexed data.Elasticsearch (or OpenSearch) is the right choice when search is a core differentiator (autocomplete, typo tolerance, relevance tuning, faceted navigation), the catalog exceeds 10 million items, and you can tolerate sub-second staleness. Elasticsearch excels at multi-faceted filtering (show me red shirts in size L under $50 with 4+ star ratings, sorted by relevance), fuzzy matching (“iphone” matches “iPhone”), and custom relevance scoring. The trade-off is operational: you now maintain a separate cluster, a synchronization pipeline (CDC or dual-write), and you must handle the eventual consistency window where a product is in PostgreSQL but not yet in the search index. Dual-writes are a consistency trap — use CDC (Change Data Capture) via Debezium or logical replication to feed Elasticsearch.Denormalized search tables are the right choice when you need specific, well-defined search patterns and want to avoid both full-text search complexity and a separate service. For example, a product_search materialized view that joins products, categories, brands, and pre-computed average ratings into a flat table with indexes matching your exact filter patterns. This gives you sub-millisecond exact-match queries without maintaining a search engine. The trade-off: no fuzzy matching, no relevance scoring, and the materialized view refresh adds latency. This is the “good enough” solution that many successful e-commerce sites use for their first few years.My recommendation: start with PostgreSQL full-text search (lowest operational cost), monitor search quality and latency metrics, and migrate to Elasticsearch only when users complain about search quality or performance degrades past your SLO. Premature introduction of Elasticsearch is one of the most common over-engineering decisions in e-commerce startups.Follow-up: You chose PostgreSQL full-text search initially. Six months later, product managers want autocomplete suggestions as users type. Does this change your recommendation?Autocomplete (prefix matching as the user types) is specifically where PostgreSQL full-text search struggles. tsquery operates on complete tokens, not prefixes. You can use LIKE 'iph%' with a text_pattern_ops index, or pg_trgm with a GIN trigram index for fuzzy prefix matching, but the performance and relevance quality degrade as the catalog grows. Autocomplete also requires extremely low latency (under 50ms) because it fires on every keystroke. At this point, I would introduce a lightweight in-memory solution like Redis with sorted sets containing product name prefixes, rather than jumping to Elasticsearch. The autocomplete index is small (product names only, not full documents), refreshes infrequently, and Redis delivers sub-millisecond responses. Reserve Elasticsearch for when you also need faceted search, synonyms, or complex relevance tuning — do not adopt it solely for autocomplete.You are tasked with migrating a 5TB PostgreSQL database from a monolithic schema to a sharded architecture with zero downtime. Walk me through your approach, including how you handle the cutover, data consistency during migration, and rollback strategy.
You are tasked with migrating a 5TB PostgreSQL database from a monolithic schema to a sharded architecture with zero downtime. Walk me through your approach, including how you handle the cutover, data consistency during migration, and rollback strategy.
pg_stat_statements — if more than 5% would require cross-shard joins, reconsider your shard key.Phase 2 — Data copy (1-2 weeks): Use pg_dump with --jobs for the initial bulk copy to each shard, filtering by shard key. For a 5TB database, this can take 12-24 hours even with parallel jobs. During this time, the monolith continues serving all traffic. After the initial copy, set up logical replication from the monolith to each shard, filtering by the shard key range. This keeps the shards up-to-date as new writes arrive on the monolith.Phase 3 — Shadow reads (1-2 weeks): Route a percentage of read traffic to the shards while continuing to serve from the monolith. Compare results for consistency. Log any discrepancies. This is your validation phase — if you find mismatches, fix the replication pipeline before proceeding. Shadow reads should be async (fire-and-forget comparisons) so they do not add latency to user requests.Phase 4 — Cutover writes (the critical step): Enable dual-writes: every write goes to both the monolith and the appropriate shard. The monolith remains the source of truth. Reads can now shift to the shards. Once you are confident the shards are consistent (zero discrepancies for 48-72 hours), flip the source of truth: writes go to shards first, and are replicated back to the monolith. The monolith is now the follower.Phase 5 — Decommission monolith: Once shards have been the source of truth for 2-4 weeks with no issues, stop replicating to the monolith. Keep it around (read-only) for another month as a safety net, then decommission.Rollback strategy: at every phase, the monolith is still receiving all data (either directly or via reverse replication). To roll back, simply point all traffic back to the monolith. The only phase where rollback is complex is Phase 4 after flipping the source of truth — at that point, any writes that went only to the shards must be replicated back to the monolith before rollback. This is why the dual-write period before the source-of-truth flip is critical.The biggest risk: schema migrations during the migration window. Freeze schema changes (or coordinate them extremely carefully) during phases 2-4. A schema migration that changes the shard key column is catastrophic during migration.Follow-up: The migration is 80% complete, but you discover that 3% of queries require cross-shard joins that you did not anticipate. What do you do?Do not proceed with the original plan. 3% cross-shard joins is high enough to cause significant latency regressions for real users. You have three options: (1) Denormalize the data needed for those joins into each shard, accepting the write amplification and consistency lag. (2) Introduce a query routing layer that detects cross-shard queries and fans them out to multiple shards, merging results — this adds latency and complexity but preserves the sharding model. (3) Re-evaluate the shard key. If the cross-shard queries are all driven by a secondary access pattern (e.g., organization-level reports while sharding by user_id), consider a secondary index shard or a read-replica that aggregates across shards for those specific query patterns. Option 3 is often the right answer — you do not need to solve cross-shard queries generically. You need to solve the specific 5-10 query patterns that cross shards.Summary
You’ve now seen how to apply database design principles to real-world systems:E-Commerce
Analytics
Social Feed
Multi-Tenant
Event Sourcing
Global
Course Completion 🎉
- Senior/Staff engineering interviews
- PostgreSQL open-source contributions
- Production database architecture and operations