
How to Use This Guide

This guide provides realistic mock interview scenarios with:
  • Typical interviewer questions and follow-ups
  • Expected discussion points at each stage
  • Common mistakes to avoid
  • Evaluation criteria
Interview Format: Most system design interviews are 45-60 minutes. Aim to spend:
  • 5 min: Requirements clarification
  • 5 min: High-level design
  • 25-30 min: Deep dive
  • 5-10 min: Wrap-up and questions

Mock Interview 1: Design TinyURL

Setup

Interviewer: "Let's design a URL shortening service like TinyURL 
or bit.ly. Users should be able to create short URLs and be 
redirected to the original URL."

Phase 1: Requirements (5 min)

Your questions should include:
✓ "What's the expected scale? How many URLs per day?"
  → "Let's say 100M new URLs per day, 10B redirects"

✓ "What's the expected URL length?"
  → "As short as possible, let's say 7 characters"

✓ "Should URLs expire?"
  → "Yes, default 1 year, customizable"

✓ "Do we need analytics?"
  → "Yes, basic click counts and geographic data"

✓ "Custom short URLs?"
  → "Nice to have, but focus on auto-generated first"

Phase 2: Capacity Estimation (3 min)

Show your math:

Write: 100M URLs/day
     = 100M / 86,400 ≈ 1,200 URLs/sec

Read: 10B redirects/day
    = 10B / 86,400 ≈ 115,000 redirects/sec
    → 100:1 read-heavy

Storage (5 years):
    100M × 365 × 5 = 182.5B URLs
    Each URL: ~500 bytes average
    Total: 182.5B × 500B = ~90 TB

Bandwidth:
    Write: 1,200 × 500B = 600 KB/s
    Read: 115,000 × 500B = 58 MB/s
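
A quick way to sanity-check this arithmetic is to script it. The snippet below (illustrative only) reproduces the numbers above:

SECONDS_PER_DAY = 86_400
new_urls_per_day = 100e6
redirects_per_day = 10e9
avg_url_size_bytes = 500

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY        # ≈ 1,157 → round to ~1,200
reads_per_sec = redirects_per_day / SECONDS_PER_DAY        # ≈ 115,741 → ~115,000
urls_in_5_years = new_urls_per_day * 365 * 5               # 182.5B URLs
storage_tb = urls_in_5_years * avg_url_size_bytes / 1e12   # ≈ 91 TB
write_bw_kb_s = writes_per_sec * avg_url_size_bytes / 1e3  # ≈ 580 KB/s (~0.6 MB/s)
read_bw_mb_s = reads_per_sec * avg_url_size_bytes / 1e6    # ≈ 58 MB/s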

Phase 3: High-Level Design (5 min)

┌─────────────────────────────────────────────────────────┐
│                                                         │
│   ┌────────┐      ┌────────────┐      ┌────────────┐  │
│   │ Client │─────►│    API     │─────►│   Cache    │  │
│   └────────┘      │  Gateway   │      │  (Redis)   │  │
│                   └─────┬──────┘      └────────────┘  │
│                         │                    │         │
│           ┌─────────────┼─────────────┐     │         │
│           │             │             │     │         │
│      ┌────▼────┐  ┌────▼────┐  ┌────▼────┐ │         │
│      │  URL    │  │ Redirect│  │Analytics│  │         │
│      │ Creator │  │ Service │  │ Service │  │         │
│      └────┬────┘  └────┬────┘  └────┬────┘  │         │
│           │            │            │        │         │
│           └────────────┼────────────┘        │         │
│                        │                     │         │
│                  ┌─────▼─────┐               │         │
│                  │  Database │◄──────────────┘         │
│                  │  (Sharded)│                         │
│                  └───────────┘                         │
│                                                         │
└─────────────────────────────────────────────────────────┘

Phase 4: Deep Dive Questions

Q: “How do you generate the short URL?”
Approach 1: Base62 encoding of auto-increment ID
- IDs: 1, 2, 3, ...
- Encode: 62^7 = 3.5 trillion combinations
- Pros: Simple, no collisions
- Cons: Predictable, single point (ID generator)

Approach 2: Random generation + collision check
- Generate random 7 chars
- Check database
- Retry on collision
- Cons: Extra DB lookup on every write; collision probability grows as the keyspace fills

Approach 3: Pre-generated key service
- Background service generates unused keys
- Store in key pool
- URL service fetches from pool
- Pros: No collision check at write time
- Cons: Extra service, key management

My recommendation: Approach 3 for this scale
- Handles 1,200 writes/sec easily
- No collision overhead
- Keys distributed across instances
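
For reference, the Base62 step in Approach 1 is only a few lines. This is a minimal illustrative encoder, not a production key generator:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num, length=7):
    """Encode an auto-increment ID as a fixed-length Base62 string."""
    chars = []
    while num > 0:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    # Pad to the fixed length (7 here) so every short URL looks the same.
    return "".join(reversed(chars)).rjust(length, ALPHABET[0])

# base62_encode(61) -> '000000Z'   (62^7 ≈ 3.5 trillion possible keys)
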
Q: “How do you handle the 115K reads/second?”
1. Caching Strategy:
   - Redis/Memcached cluster
   - LRU eviction
   - Cache size: ~20% of daily URLs (hot set) ≈ 20M entries
   - Expected hit rate: 90%+

2. Cache key: short_url
   Cache value: {long_url, created_at, expires_at}
   TTL: 24 hours (refresh on access)

3. Read path:
   cache_hit → redirect
   cache_miss → DB lookup → cache write → redirect
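
The read path above maps to a cache-aside lookup. In this sketch, redis_client and url_db are hypothetical stand-ins for the real Redis and database clients:

import time

CACHE_TTL_SECONDS = 24 * 3600  # 24 hours, refreshed on access

def resolve(short_url):
    """Return the long URL for a redirect, cache-aside style."""
    long_url = redis_client.get(short_url)
    if long_url is not None:
        redis_client.expire(short_url, CACHE_TTL_SECONDS)  # refresh TTL on access
        return long_url                                    # cache hit → redirect

    # Cache miss: fall back to the sharded database, then warm the cache.
    row = url_db.lookup(short_url)
    if row is None or row.expires_at < time.time():
        return None                                        # unknown or expired → 404/410
    redis_client.set(short_url, row.long_url, ex=CACHE_TTL_SECONDS)
    return row.long_url
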
Q: “How do you shard the database?”
Shard by short URL hash:
- Simple version: shard = hash(short_url) % num_shards
- Better: consistent hashing with virtual nodes, so adding or removing a shard only remaps ~1/N of keys (see the sketch below)
- Each shard handles ~1/N of traffic

Why not by user_id?
- Redirect requests don't have user context
- Need to lookup by short URL

Replication:
- 3 replicas per shard
- Async replication (eventual consistency OK)
- Promote a replica on primary failure
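
A consistent hash ring with virtual nodes fits in a short class. This is a sketch for intuition, not what a production partitioner looks like:

import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes: adding or removing a shard
    only remaps ~1/N of the keys instead of reshuffling everything."""

    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, short_url):
        idx = bisect.bisect(self._keys, self._hash(short_url)) % len(self._keys)
        return self._ring[idx][1]

# ring = HashRing(["shard-0", "shard-1", "shard-2"])
# ring.shard_for("abc1234")  -> e.g. "shard-1"
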
Q: “How do you handle expiration?”
Options:
1. TTL in database (native support)
2. Background cleanup job
3. Lazy deletion (check on read)

Hybrid approach:
- Check expiration on read (immediate)
- Background job for cleanup (storage)
- Tombstone records for analytics
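
The background half of the hybrid approach could look like the sketch below; url_db, analytics_store, and redis_client are hypothetical clients:

def cleanup_expired(batch_size=1000):
    """Background job: remove expired URLs in batches, keeping tombstones for analytics."""
    while True:
        rows = url_db.find_expired(limit=batch_size)
        if not rows:
            break
        for row in rows:
            analytics_store.write_tombstone(row.short_url, row.expires_at)  # keep click history
            redis_client.delete(row.short_url)                              # evict from cache
            url_db.delete(row.short_url)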

Evaluation Criteria

Aspect       | Junior               | Senior                | Staff
Requirements | Asks basic questions | Identifies edge cases | Challenges assumptions
Scale        | Basic math           | Accurate estimates    | Cost optimization
Design       | Working solution     | Handles failures      | Extensible architecture
Trade-offs   | Mentions one         | Compares options      | Quantifies impact

Mock Interview 2: Design Twitter Timeline

Setup

Interviewer: "Let's design the home timeline feature for Twitter.
Users should see tweets from people they follow, in chronological
order or ranked by relevance."

Requirements Clarification

✓ "How many users? Active users?"
  → "300M monthly active, 150M daily active"

✓ "Average follows per user?"
  → "Average 200, some celebrities with millions"

✓ "Tweets per user per day?"
  → "Average 2 tweets/day"

✓ "Timeline page size?"
  → "20 tweets per page"

✓ "Real-time updates needed?"
  → "Yes, new tweets should appear within seconds"

✓ "Chronological or ranked?"
  → "Support both, ranked is default"

Capacity Estimation

Users: 150M DAU
Follows: 200 average
Tweets: 2/day/user

Write load:
    150M × 2 = 300M tweets/day
    = 3,500 tweets/second

Timeline reads:
    150M users × 10 timeline refreshes/day
    = 1.5B timeline reads/day
    = 17,000 timeline reads/second

Fan-out consideration:
    If user has 1M followers:
    1 tweet → 1M timeline updates
    100 tweets from celebs/day → 100M updates
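
The same back-of-the-envelope check, scripted (illustrative only):

SECONDS_PER_DAY = 86_400
dau = 150e6

tweets_per_sec = dau * 2 / SECONDS_PER_DAY             # ≈ 3,472 → ~3,500 tweets/sec
timeline_reads_per_sec = dau * 10 / SECONDS_PER_DAY    # ≈ 17,361 → ~17,000 reads/sec
celebrity_fanout = 100 * 1_000_000                     # 100 celeb tweets/day × 1M followers = 100M updates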

High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                Twitter Timeline Architecture                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Write Path (Tweet Creation):                               │
│  ┌──────┐    ┌─────────┐    ┌───────────┐                  │
│  │Client│───►│Tweet Svc│───►│Tweet Store│                  │
│  └──────┘    └────┬────┘    └───────────┘                  │
│                   │                                          │
│                   ▼                                          │
│            ┌──────────────┐                                  │
│            │ Fan-out Svc  │                                  │
│            └──────┬───────┘                                  │
│                   │                                          │
│     ┌─────────────┼─────────────┐                           │
│     │             │             │                            │
│     ▼             ▼             ▼                            │
│  ┌──────┐     ┌──────┐     ┌──────┐                        │
│  │User A│     │User B│     │User C│  Timeline Caches       │
│  │Cache │     │Cache │     │Cache │                        │
│  └──────┘     └──────┘     └──────┘                        │
│                                                              │
│  Read Path (Timeline Fetch):                                │
│  ┌──────┐    ┌───────────┐    ┌──────────┐                 │
│  │Client│───►│Timeline   │───►│ Cache or │                 │
│  └──────┘    │Service    │    │ Fan-in   │                 │
│              └───────────┘    └──────────┘                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Key Design Decisions

Q: “Fan-out on write vs fan-out on read?”
Fan-out on Write:
- When user tweets, push to all followers' timelines
- Pre-compute timelines
- Fast reads (just fetch from cache)
- Problem: Celebrities with 50M followers

Fan-out on Read:
- When user opens timeline, fetch from followees
- Compute on demand
- Slow for users following many people
- Avoids the write amplification problem for celebrity accounts

Hybrid Approach (Twitter's solution):
- Regular users: Fan-out on write
- Celebrities (>10K followers): Fan-out on read
- At read time: Merge cached timeline + celebrity tweets
- Best of both worlds
Q: “How do you rank the timeline?”
Ranking Signals:
├── Recency (newer = higher)
├── Engagement (likes, retweets, replies)
├── User affinity (interaction history)
├── Content type (media, links, text)
├── Author factors (verified, follower count)
└── Negative signals (muted words, reported)

Implementation:
1. Feature extraction at write time
2. Store features with tweet
3. ML model scores at read time
4. Sort by score
5. A/B test ranking changes
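
A toy stand-in for the read-time scorer: a real ranker is a trained ML model, and every weight and helper here (contains_muted_words, interaction_count) is made up for illustration:

def score_tweet(tweet, viewer, now):
    """Toy linear combination of the ranking signals listed above."""
    age_hours = (now - tweet.created_at) / 3600
    recency = 1.0 / (1.0 + age_hours)                    # newer → higher
    engagement = 0.5 * tweet.likes + 1.0 * tweet.retweets + 1.5 * tweet.replies
    affinity = viewer.interaction_count.get(tweet.author_id, 0)
    penalty = 10.0 if contains_muted_words(tweet, viewer) else 0.0
    return 2.0 * recency + 0.001 * engagement + 0.1 * affinity - penalty
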
Q: “How do you handle celebrities?”
Celebrity Detection:
- Follower count > 10,000 (configurable)
- Flag in user table

Write Path for Celebrities:
- Don't fan out at write time
- Tweet goes to celebrity tweet store only

Read Path Merge:
1. Fetch user's cached timeline (fan-out on write)
2. Fetch celebrity tweets (users they follow)
3. Merge and rank
4. Return top N

Cache Strategy:
- Cache merged timeline briefly (1 minute)
- Invalidate on new celebrity tweet
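
The read-path merge above can be sketched with a k-way merge. timeline_cache, follow_graph, and celebrity_store are placeholder clients, and every source is assumed to be stored newest-first:

import heapq
import itertools

def home_timeline(user_id, page_size=20):
    """Merge the precomputed (fan-out-on-write) timeline with celebrity tweets at read time."""
    cached = timeline_cache.get(user_id)                  # newest-first list of tweets
    celeb_streams = [celebrity_store.recent(c)            # each stream newest-first
                     for c in follow_graph.celebrities_followed_by(user_id)]

    # A k-way merge keeps the result newest-first without sorting the whole union;
    # the ranked view would re-score this page afterwards.
    merged = heapq.merge(cached, *celeb_streams,
                         key=lambda t: t.created_at, reverse=True)
    return list(itertools.islice(merged, page_size))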

Mock Interview 3: Design Uber

Setup

Interviewer: "Design the core ride-matching system for Uber.
Focus on matching riders with nearby drivers efficiently."

Requirements

✓ Scale: 1M active drivers, 10M rides/day
✓ Matching latency: < 30 seconds
✓ Geographic scope: Global, 600+ cities
✓ Driver location updates: Every 4 seconds
✓ ETA calculation needed

Key Challenges

1. Real-time location tracking at scale
2. Efficient nearby driver search
3. Supply-demand matching
4. ETA estimation
5. Handling peak hours

High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                  Uber Matching System                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Driver Location Updates:                                   │
│  ┌──────────┐    ┌─────────┐    ┌───────────────┐          │
│  │ Driver   │───►│ Location│───►│   Geospatial  │          │
│  │   App    │    │ Service │    │     Index     │          │
│  └──────────┘    └─────────┘    │   (QuadTree)  │          │
│                                 └───────┬───────┘          │
│                                         │                   │
│  Ride Request:                          │                   │
│  ┌──────────┐    ┌─────────┐    ┌───────▼───────┐          │
│  │  Rider   │───►│ Ride    │───►│   Matching    │          │
│  │   App    │    │ Service │    │    Engine     │          │
│  └──────────┘    └─────────┘    └───────┬───────┘          │
│                                         │                   │
│                                  ┌──────▼──────┐           │
│                                  │   Dispatch  │           │
│                                  │    to       │           │
│                                  │   Driver    │           │
│                                  └─────────────┘           │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Deep Dive: Geospatial Indexing

Option 1: Geohash
- Encode lat/lng to string: "9q8yy"
- Prefix search for nearby: "9q8y*"
- Simple, works with regular indexes
- Edge case: points near a cell boundary can be close yet share no prefix

Option 2: QuadTree
- Recursively divide space into quadrants
- Dynamic depth based on density
- Efficient range queries
- Better for non-uniform distribution

Option 3: H3 (Uber's solution)
- Hexagonal hierarchical grid
- Multiple resolutions
- Consistent neighbor relationships
- Handles polar regions well

Implementation:
- H3 resolution 9: ~0.1 km² hexagons (≈175 m edge length)
- Store driver_id → h3_cell mapping
- For pickup at H3 cell X:
  - Find drivers in cell X
  - Expand to ring-1 neighbors if needed
  - Sort by ETA, filter by availability

Matching Algorithm

import h3  # assumes the h3-py v3 bindings

# driver_index, routing_service, calculate_match_score, and dispatch_ride are
# assumed to be injected service clients / helpers.

def match_rider_to_driver(rider_location, ride_type):
    """
    Find the best available driver for a rider request.
    """
    # 1. Get nearby drivers: the rider's H3 cell plus 3 rings of neighbors.
    #    (k_ring already includes the origin cell.)
    rider_cell = h3.geo_to_h3(
        rider_location.lat,
        rider_location.lng,
        resolution=9
    )
    search_cells = h3.k_ring(rider_cell, k=3)

    candidates = []
    for cell in search_cells:
        candidates.extend(driver_index.get_drivers(cell))

    # 2. Filter by availability and vehicle type
    available = [d for d in candidates
                 if d.status == 'available'
                 and d.vehicle_type == ride_type]
    if not available:
        return None  # no nearby driver; caller retries or widens the search

    # 3. Calculate ETA for each candidate via the routing service
    for driver in available:
        driver.eta = routing_service.get_eta(
            driver.location,
            rider_location
        )

    # 4. Score and rank candidates (ETA, rating, acceptance rate)
    scored = []
    for driver in available:
        score = calculate_match_score(
            driver,
            rider_location,
            eta=driver.eta,
            driver_rating=driver.rating,
            acceptance_rate=driver.acceptance_rate
        )
        scored.append((driver, score))

    scored.sort(key=lambda x: x[1], reverse=True)

    # 5. Dispatch to the top candidate
    best_driver = scored[0][0]
    dispatch_ride(best_driver, rider_location)

    return best_driver

Mock Interview 4: Design Netflix

Setup

Interviewer: "Design Netflix's video streaming service.
Focus on the playback experience and content delivery."

Key Components to Discuss

1. Content Ingestion & Processing
   - Transcoding pipeline
   - Multiple quality levels
   - DRM protection

2. Content Delivery
   - CDN architecture
   - Edge caching
   - Adaptive bitrate streaming

3. Recommendation System
   - Personalization
   - Content ranking
   - Thumbnails A/B testing

4. Playback Service
   - Session management
   - Resume playback
   - Multi-device sync

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                 Netflix Streaming Architecture               │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Content Preparation:                                       │
│  ┌────────┐    ┌──────────┐    ┌───────────────┐           │
│  │ Upload │───►│Transcoder│───►│ CDN Origin    │           │
│  │ (4K)   │    │ Pipeline │    │ (S3/GCS)      │           │
│  └────────┘    └──────────┘    └───────┬───────┘           │
│                                         │                    │
│  Playback:                              ▼                    │
│  ┌────────┐    ┌──────────┐    ┌───────────────┐           │
│  │ Client │◄──►│ Playback │◄──►│  CDN Edge     │           │
│  │  App   │    │ Service  │    │  (Netflix     │           │
│  └────────┘    └──────────┘    │   Open        │           │
│       │                        │   Connect)    │           │
│       │                        └───────────────┘           │
│       │                                                     │
│       ▼                                                     │
│  ┌────────────────────────────────────────────┐            │
│  │            Adaptive Streaming              │            │
│  │  Based on bandwidth, buffer, device        │            │
│  └────────────────────────────────────────────┘            │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Adaptive Bitrate Discussion Points

Quality Levels:
- 240p (0.3 Mbps) - Mobile on 3G
- 480p (1 Mbps) - Mobile on 4G  
- 720p (3 Mbps) - Tablet/Laptop
- 1080p (5 Mbps) - TV
- 4K HDR (15 Mbps) - Premium 4K TV

Switching Logic:
- Buffer-based: Switch based on buffer fullness
- Throughput-based: Switch based on measured bandwidth
- Hybrid: Consider both + prediction
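
A minimal hybrid switcher, assuming the bitrate ladder above; the thresholds and 80% safety factor are illustrative, not Netflix's actual values:

LADDER_MBPS = [0.3, 1.0, 3.0, 5.0, 15.0]   # 240p ... 4K, from the quality levels above

def pick_bitrate(throughput_mbps, buffer_seconds,
                 safety=0.8, low_buffer=10, high_buffer=30):
    """Hybrid ABR: measured throughput sets a ceiling, buffer level nudges the choice."""
    ceiling = throughput_mbps * safety
    affordable = [b for b in LADDER_MBPS if b <= ceiling] or [LADDER_MBPS[0]]
    idx = LADDER_MBPS.index(affordable[-1])

    if buffer_seconds < low_buffer and idx > 0:
        idx -= 1        # buffer nearly drained: step down to avoid rebuffering
    elif buffer_seconds > high_buffer and idx < len(LADDER_MBPS) - 1:
        idx += 1        # buffer comfortably full: probe one level higher
    return LADDER_MBPS[idx]

# pick_bitrate(throughput_mbps=6.0, buffer_seconds=35) -> 5.0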

Metrics to Track:
- Rebuffering ratio
- Time to first frame
- Average bitrate delivered
- Quality switches per session

Common Mistakes to Avoid

Red Flags in System Design Interviews:
  1. Jumping to solution without understanding requirements
  2. Single point of failure in design
  3. Ignoring scale implications
  4. Over-engineering for simple problems
  5. Not discussing trade-offs between options
  6. Forgetting about failures and edge cases
  7. Vague hand-waving instead of concrete solutions
  8. Not asking clarifying questions

Interview Checklist

Before the interview:
□ Review system design patterns
□ Practice estimation (powers of 2, QPS calculations)
□ Know CAP theorem and consistency models
□ Understand caching, databases, message queues

During the interview:
□ Clarify requirements (functional + non-functional)
□ Estimate scale and capacity
□ Draw high-level design
□ Identify and address bottlenecks
□ Discuss trade-offs
□ Consider failure scenarios
□ Mention monitoring and alerting

Communication:
□ Think out loud
□ Explain your reasoning
□ Acknowledge uncertainty
□ Ask for feedback
□ Be open to hints

Practice Problems

Week 1-2: Core Systems
  • URL Shortener
  • Rate Limiter
  • Key-Value Store
Week 3-4: Social Systems
  • Twitter Feed
  • Facebook News Feed
  • Instagram
Week 5-6: Infrastructure
  • Web Crawler
  • Notification System
  • Task Scheduler
Week 7-8: Complex Systems
  • Uber/Lyft
  • YouTube/Netflix
  • Google Search