Why Estimation Matters

In system design interviews, you’re expected to:
  • Size your system - How much storage? How many servers?
  • Identify bottlenecks - Where will the system break?
  • Make trade-offs - Is this worth the complexity?
  • Validate assumptions - Does this approach even work?
Don’t aim for precision. Round numbers aggressively. The goal is order of magnitude, not exact values. 86,400 seconds ≈ 100,000 is perfectly fine.

Essential Numbers to Memorize

Time & Scale

Duration     Seconds         Rounded
1 second     1               1
1 minute     60              ~100
1 hour       3,600           ~4,000
1 day        86,400          ~100,000
1 month      2,592,000       ~2.5 million
1 year       31,536,000      ~30 million

Data Units

Unit     Bytes           Power of 2
1 KB     1,000           2^10 ≈ 1,000
1 MB     1,000,000       2^20 ≈ 1 million
1 GB     10^9            2^30 ≈ 1 billion
1 TB     10^12           2^40 ≈ 1 trillion
1 PB     10^15           2^50 ≈ 1 quadrillion

Latency Numbers

┌─────────────────────────────────────────────────────────────────┐
│                    Latency Comparison (2024)                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  L1 cache reference                    0.5 ns   ■               │
│  L2 cache reference                    7 ns     ■               │
│  RAM reference                         100 ns   ██              │
│  SSD random read                       150 µs   ████████        │
│  HDD seek                              10 ms    ████████████    │
│  Network (same datacenter)             0.5 ms   ████            │
│  Network (cross-continent)             150 ms   █████████████   │
│                                                                 │
│  Rule of thumb:                                                 │
│  • Memory is ~1,000x faster than SSD (random reads)             │
│  • SSD is ~100x faster than HDD                                 │
│  • Same DC network is ~300x faster than cross-continent         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
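
These ratios fall straight out of the chart above; if you want to recompute them, convert everything to nanoseconds:

ram_ns = 100
ssd_ns = 150_000                  # 150 µs
hdd_ns = 10_000_000               # 10 ms
same_dc_ns = 500_000              # 0.5 ms
cross_continent_ns = 150_000_000  # 150 ms

print(ssd_ns / ram_ns)                  # ≈ 1,500x: RAM vs SSD random read
print(hdd_ns / ssd_ns)                  # ≈ 67x:   SSD vs HDD
print(cross_continent_ns / same_dc_ns)  # = 300x:  cross-continent vs same-DC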

Availability Numbers

Availability          Downtime/Year    Downtime/Month    Downtime/Week
99% (two 9s)          3.65 days        7.3 hours         1.68 hours
99.9% (three 9s)      8.76 hours       43.8 min          10.1 min
99.99% (four 9s)      52.6 min         4.38 min          1.01 min
99.999% (five 9s)     5.26 min         26.3 sec          6.05 sec
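
The downtime budget is just (1 − availability) × period; a quick way to regenerate the yearly column:

SECONDS_PER_YEAR = 31_536_000

for availability in [0.99, 0.999, 0.9999, 0.99999]:
    downtime_hours = (1 - availability) * SECONDS_PER_YEAR / 3600
    print(f"{availability:.3%} -> {downtime_hours:.2f} hours/year")
# 99%: 87.6 h (≈ 3.65 days), 99.9%: 8.76 h, 99.99%: ≈ 52.6 min, 99.999%: ≈ 5.3 min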

Common Calculation Patterns

Pattern 1: QPS from Daily Active Users

┌─────────────────────────────────────────────────────────────────┐
│                 DAU → QPS Calculation                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Given:                                                         │
│  • 100 million DAU (Daily Active Users)                        │
│  • Average user makes 10 requests per day                      │
│                                                                 │
│  Total requests per day:                                        │
│  = 100M × 10 = 1 billion requests/day                          │
│                                                                 │
│  Average QPS (Queries Per Second):                             │
│  = 1B / 86,400 seconds                                         │
│  = 1B / 100K (rounded)                                         │
│  = 10,000 QPS                                                  │
│                                                                 │
│  Peak QPS (typically 2-3x average):                            │
│  = 10,000 × 2.5 = 25,000 QPS                                  │
│                                                                 │
│  ─────────────────────────────────────────────────────────────  │
│                                                                 │
│  Quick Formula:                                                 │
│  QPS ≈ DAU × requests_per_user / 100,000                       │
│  Peak QPS ≈ QPS × 2.5                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
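
The same arithmetic in a few lines of Python (the 2.5x peak factor is the rule of thumb from the box):

dau = 100_000_000
requests_per_user = 10

daily_requests = dau * requests_per_user   # 1 billion requests/day
average_qps = daily_requests / 100_000     # ≈ 10,000 QPS (86,400 s/day rounded to 100K)
peak_qps = average_qps * 2.5               # ≈ 25,000 QPS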

Pattern 2: Storage Estimation

┌─────────────────────────────────────────────────────────────────┐
│                 Storage Calculation                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Example: Twitter-like service                                  │
│                                                                 │
│  Given:                                                         │
│  • 500 million users                                           │
│  • 20% are daily active (100M DAU)                             │
│  • Average 2 tweets per active user per day                    │
│  • Tweet = 140 chars (280 bytes) + 200 bytes metadata          │
│                                                                 │
│  Daily tweet storage:                                           │
│  = 100M users × 2 tweets × 500 bytes                           │
│  = 200M × 500 bytes                                            │
│  = 100 GB/day                                                  │
│                                                                 │
│  Annual storage (text only):                                   │
│  = 100 GB × 365                                                │
│  = 36.5 TB/year                                                │
│                                                                 │
│  With media (assume 10% of tweets have 2MB image):            │
│  = 100M × 2 × 0.1 × 2MB = 40 TB/day                           │
│  = 40 TB × 365 = 14.6 PB/year                                 │
│                                                                 │
│  ─────────────────────────────────────────────────────────────  │
│                                                                 │
│  Key insight: Media dominates text storage by 100x+            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
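
And the storage math in code form (the 10% media share and 2 MB image size are the assumptions stated in the box):

dau = 100_000_000
tweets_per_user = 2
tweet_bytes = 500                                   # 280 bytes of text + ~200 bytes metadata, rounded

daily_text_bytes = dau * tweets_per_user * tweet_bytes        # = 100 GB/day
yearly_text_tb = daily_text_bytes * 365 / 1e12                # ≈ 36.5 TB/year

daily_media_bytes = dau * tweets_per_user * 0.1 * 2_000_000   # 10% of tweets carry a 2 MB image → 40 TB/day
yearly_media_pb = daily_media_bytes * 365 / 1e15              # ≈ 14.6 PB/year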

Pattern 3: Bandwidth Estimation

┌─────────────────────────────────────────────────────────────────┐
│                 Bandwidth Calculation                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Example: Video streaming service (Netflix-like)               │
│                                                                 │
│  Given:                                                         │
│  • 200 million subscribers                                     │
│  • 50% watch something daily (100M daily viewers)              │
│  • Average 2 hours of video per viewer                         │
│  • Video bitrate: 5 Mbps (1080p average)                       │
│                                                                 │
│  Peak concurrent viewers (assume 10% of daily):                │
│  = 100M × 0.1 = 10 million concurrent                          │
│                                                                 │
│  Peak bandwidth:                                                │
│  = 10M viewers × 5 Mbps                                        │
│  = 50 million Mbps                                             │
│  = 50 Tbps (Terabits per second)                               │
│                                                                 │
│  Daily data transfer:                                           │
│  = 100M viewers × 2 hours × 3600 sec × 5 Mbps                  │
│  = 100M × 7200 × 5 Mb                                          │
│  = 3.6 × 10^12 Mb = 3.6 exabits ≈ 450 PB/day                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
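
The same bandwidth numbers as a quick sketch (the 10% peak concurrency and 5 Mbps bitrate come from the assumptions in the box):

daily_viewers = 100_000_000
concurrent_viewers = daily_viewers * 0.1                  # 10 million concurrent at peak
peak_bandwidth_tbps = concurrent_viewers * 5 / 1_000_000  # 5 Mbps each → 50 Tbps

bits_per_day = daily_viewers * 2 * 3600 * 5 * 1_000_000   # 2 hours/viewer at 5 Mbps
daily_transfer_pb = bits_per_day / 8 / 1e15               # ≈ 450 PB/day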

Pattern 4: Server Capacity

┌─────────────────────────────────────────────────────────────────┐
│                 Server Capacity Estimation                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Rule of thumb for web servers:                                │
│  • 1 server can handle ~1000 concurrent connections            │
│  • 1 server can handle ~500-1000 QPS for simple API           │
│  • 1 server can handle ~100-200 QPS for complex operations    │
│                                                                 │
│  Example: 50,000 QPS API service                               │
│                                                                 │
│  Servers needed:                                                │
│  = 50,000 QPS / 500 QPS per server                             │
│  = 100 servers                                                  │
│                                                                 │
│  With 3x capacity buffer (for spikes + failures):              │
│  = 100 × 3 = 300 servers                                       │
│                                                                 │
│  ─────────────────────────────────────────────────────────────  │
│                                                                 │
│  Memory sizing:                                                 │
│  • 100K concurrent users, 10KB session data each               │
│  • Memory = 100K × 10KB = 1GB                                  │
│  • Per server (assume 10 servers): 100MB each                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
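
In code, using the rules of thumb from the box (500 QPS per server, 3x buffer):

peak_qps = 50_000
qps_per_server = 500

base_servers = peak_qps / qps_per_server   # = 100 servers
with_buffer = base_servers * 3             # = 300 servers (spikes + failures)

concurrent_users = 100_000
session_bytes = 10_000                                            # 10 KB per session
total_session_memory_gb = concurrent_users * session_bytes / 1e9  # = 1 GB total, ~100 MB per server across 10 servers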

Pattern 5: Cache Sizing (80/20 Rule)

┌─────────────────────────────────────────────────────────────────┐
│                 Cache Sizing with 80/20 Rule                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Principle: 20% of data serves 80% of requests                 │
│                                                                 │
│  Example: E-commerce product catalog                           │
│                                                                 │
│  Given:                                                         │
│  • 10 million products                                         │
│  • Average product data: 5 KB                                  │
│  • Total catalog: 10M × 5KB = 50 GB                            │
│                                                                 │
│  Cache 20% of products:                                        │
│  = 50 GB × 0.2 = 10 GB cache                                   │
│                                                                 │
│  Expected cache hit rate: ~80%                                 │
│                                                                 │
│  ─────────────────────────────────────────────────────────────  │
│                                                                 │
│  Alternative: Time-based estimation                            │
│                                                                 │
│  • 1 million requests/day to product pages                     │
│  • 70% are repeat views (dedup = 300K unique)                  │
│  • Cache last 24 hours of views                                │
│  • Cache size = 300K × 5KB = 1.5 GB                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
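
Both sizing approaches in a few lines (the 80/20 split and the 30% unique-view figure are the assumptions from the box):

# 80/20 approach
catalog_gb = 10_000_000 * 5_000 / 1e9    # 10M products × 5 KB = 50 GB
cache_gb = catalog_gb * 0.2              # cache the hot 20% → 10 GB, expect ~80% hit rate

# Time-based approach
unique_daily_views = 1_000_000 * 0.3             # 30% of 1M daily views are unique = 300K
cache_gb_alt = unique_daily_views * 5_000 / 1e9  # = 1.5 GB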

Complete Example: URL Shortener

Let’s walk through a complete estimation for a URL shortener like bit.ly.

Requirements & Assumptions

┌─────────────────────────────────────────────────────────────────┐
│                 URL Shortener Estimation                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Functional:                                                    │
│  • Shorten long URLs → short URLs                              │
│  • Redirect short URLs → original URLs                         │
│  • Analytics (optional)                                        │
│                                                                 │
│  Non-functional:                                                │
│  • 100 million new URLs per month                              │
│  • 10:1 read-to-write ratio                                    │
│  • 5 year data retention                                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Traffic Estimation

# Write traffic
new_urls_per_month = 100_000_000
new_urls_per_second = 100_000_000 / (30 * 24 * 3600)
# ≈ 100M / 2.5M = 40 URLs/second

# Read traffic (10:1 ratio)
redirects_per_second = 40 * 10    # = 400 QPS

# Peak traffic (3x average)
peak_write_qps = 40 * 3    # = 120 QPS
peak_read_qps = 400 * 3    # = 1,200 QPS

Storage Estimation

# URL data
original_url_size = 500  # bytes average
short_url_size = 7       # characters (base62)
metadata_size = 100      # bytes (created_at, user_id, etc.)
total_per_url = 500 + 7 + 100   # ≈ 600 bytes

# Monthly storage
monthly_storage = 100_000_000 * 600   # = 60 GB/month

# 5 year storage
total_storage = 60 * 12 * 5   # GB → 3,600 GB = 3.6 TB

# With 2x replication
replicated_storage = 3.6 * 2   # = 7.2 TB

Short URL Length

# How many URLs can we encode?
# Using base62 (a-z, A-Z, 0-9)

# 62^6 = 56 billion combinations
# 62^7 = 3.5 trillion combinations

# We need: 100M/month × 12 × 5 = 6 billion URLs

# 7 characters is sufficient (3.5 trillion >> 6 billion)
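
To double-check the character count in code: solve 62^n ≥ total URLs for n, using the same 6 billion figure as above.

import math

total_urls = 100_000_000 * 12 * 5                          # 6 billion over 5 years
min_chars = math.ceil(math.log(total_urls) / math.log(62))
# min_chars = 6: 62^6 ≈ 56 billion already covers 6 billion;
# 7 characters (≈ 3.5 trillion) adds ~580x headroom, which is why the design uses 7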

Bandwidth Estimation

# Write bandwidth
write_bandwidth = 40 * 600    # bytes/sec = 24 KB/s

# Read bandwidth
# Redirect is small response (Location header)
read_bandwidth = 400 * 200    # bytes/sec = 80 KB/s

# Minimal bandwidth, not a concern

Memory (Cache) Estimation

# Cache hot URLs for fast redirects
# 80/20 rule: 20% URLs get 80% traffic

daily_reads = 400 * 86_400    # ≈ 34.5 million reads
# Assume 30% unique URLs accessed daily
unique_daily = 34_500_000 * 0.3    # ≈ 10 million URLs

# Cache size
cache_size = 10_000_000 * 600    # bytes = 6 GB

# Redis can easily handle this

Summary

┌─────────────────────────────────────────────────────────────────┐
│                 URL Shortener - Final Numbers                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Traffic:                                                       │
│  • Write: 40 QPS (peak: 120 QPS)                               │
│  • Read: 400 QPS (peak: 1200 QPS)                              │
│                                                                 │
│  Storage:                                                       │
│  • 3.6 TB over 5 years                                         │
│  • 7.2 TB with replication                                     │
│                                                                 │
│  Cache:                                                         │
│  • 6 GB Redis cache                                            │
│                                                                 │
│  Key Design Decisions:                                          │
│  • 7 character short URLs (base62)                             │
│  • Read-heavy → cache aggressively                             │
│  • Single database instance is sufficient                      │
│  • 2-3 application servers for redundancy                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Complete Example: Twitter Timeline

Requirements

┌─────────────────────────────────────────────────────────────────┐
│                 Twitter Timeline Estimation                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Given:                                                         │
│  • 500 million total users                                     │
│  • 200 million DAU                                             │
│  • Average user follows 200 people                             │
│  • 10% of users post daily (20M tweets/day)                    │
│  • Average user checks timeline 5 times/day                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Timeline Generation QPS

# Timeline reads
timeline_reads_per_day = 200_000_000 * 5    # = 1 billion/day
timeline_qps = 1_000_000_000 / 86_400       # ≈ 11,600 QPS

# Tweet writes
tweet_writes_per_day = 20_000_000
tweet_qps = 20_000_000 / 86_400             # ≈ 230 QPS

# Ratio: 50:1 (heavily read-oriented)

Fan-out Calculation

┌─────────────────────────────────────────────────────────────────┐
│                 Fan-out on Write vs Read                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Fan-out on Write (push model):                                │
│  ────────────────────────────                                   │
│  • New tweet → push to all followers' timelines                │
│  • 230 QPS × 200 followers = 46,000 cache writes/sec           │
│                                                                 │
│  Problem: Celebrity with 50M followers                         │
│  • 1 tweet = 50 million cache writes!                          │
│  • Takes minutes to propagate                                  │
│                                                                 │
│  Solution: Hybrid approach                                      │
│  • Small accounts: Fan-out on write                            │
│  • Celebrities (>10K followers): Fan-out on read               │
│                                                                 │
│  Fan-out on Read (pull model):                                 │
│  ───────────────────────────                                    │
│  • User opens app → query all followees for recent tweets      │
│  • 11,600 QPS × 200 followees = 2.3M queries/sec              │
│  • Too expensive at read time!                                 │
│                                                                 │
│  Hybrid is the answer                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
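
To make the hybrid concrete, here is a minimal in-memory sketch. The 10K-follower threshold comes from the box above; timeline_cache and celebrity_posts are stand-ins for what would really be Redis structures.

from collections import defaultdict

FANOUT_THRESHOLD = 10_000            # push below this, pull above it

timeline_cache = defaultdict(list)   # follower_id -> recent tweet IDs (stand-in for Redis lists)
celebrity_posts = defaultdict(list)  # author_id -> tweet IDs, merged into timelines at read time

def deliver_tweet(author_id, tweet_id, follower_ids):
    """Hybrid fan-out: write-time push for small accounts, read-time pull for celebrities."""
    if len(follower_ids) <= FANOUT_THRESHOLD:
        for follower_id in follower_ids:              # fan-out on write
            timeline_cache[follower_id].append(tweet_id)
    else:
        celebrity_posts[author_id].append(tweet_id)   # fan-out on read

def read_timeline(user_id, followed_celebrities):
    """Merge the pre-computed timeline with celebrity posts at read time."""
    tweets = list(timeline_cache[user_id])
    for celeb_id in followed_celebrities:
        tweets.extend(celebrity_posts[celeb_id])
    return sorted(tweets, reverse=True)               # assuming time-sortable tweet IDs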

Storage for Timelines

# Pre-computed timeline storage (fan-out on write)
# Store last 800 tweet IDs per user

timeline_size = 800 * 8                        # bytes per user (8-byte tweet IDs) = 6.4 KB
total_timeline_storage = 500_000_000 * 6_400   # bytes = 3.2 TB

# Redis cluster with 3.2 TB RAM
# Or multiple Redis instances (10 × 320 GB)

Estimation Cheat Sheet

┌─────────────────────────────────────────────────────────────────┐
│                 Quick Reference Formulas                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DAU to QPS:                                                    │
│  QPS = DAU × requests_per_user / 100,000                       │
│                                                                 │
│  Peak QPS:                                                      │
│  Peak = Average × 2.5 (or ×3 for social)                       │
│                                                                 │
│  Storage:                                                       │
│  Daily = DAU × actions × data_size                             │
│  Yearly = Daily × 365                                          │
│                                                                 │
│  Bandwidth:                                                     │
│  BW = QPS × response_size                                      │
│                                                                 │
│  Servers:                                                       │
│  Count = QPS / QPS_per_server × 3 (buffer)                     │
│                                                                 │
│  Cache:                                                         │
│  Size = working_set_size × 0.2 (80/20 rule)                    │
│                                                                 │
│  URL length (base62):                                          │
│  62^n > total_items_expected                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Interview Tips

  • Show your work: Write down assumptions clearly. State “assuming 100K seconds in a day” before calculating.
  • Round aggressively: Use powers of 10. 86,400 → 100,000 is fine.
  • Sanity check: Does the answer make sense? 1 million GB is suspicious.
  • Ask about scale: “Are we designing for 1M or 100M users?” This changes everything.
  • Know your powers: 2^10 ≈ 1000, 2^20 ≈ 1M, 2^30 ≈ 1B, 2^40 ≈ 1T

Capacity Planning Calculator

Use these utility classes for quick estimations in interviews or actual capacity planning:

from dataclasses import dataclass, field
from typing import Dict, Optional, List
from enum import Enum
import math

class DataUnit(Enum):
    BYTES = 1
    KB = 1024
    MB = 1024 ** 2
    GB = 1024 ** 3
    TB = 1024 ** 4
    PB = 1024 ** 5

class TimeUnit(Enum):
    SECOND = 1
    MINUTE = 60
    HOUR = 3600
    DAY = 86400
    WEEK = 604800
    MONTH = 2592000
    YEAR = 31536000

@dataclass
class SystemEstimate:
    """Complete system capacity estimation"""
    
    # Input parameters
    total_users: int
    dau_percentage: float = 0.2  # 20% DAU by default
    requests_per_user_per_day: int = 10
    write_to_read_ratio: float = 0.1  # 10% writes
    data_per_record_bytes: int = 1000
    retention_years: int = 5
    
    # Computed values (filled by calculate())
    dau: int = 0
    daily_requests: int = 0
    average_qps: float = 0
    peak_qps: float = 0
    write_qps: float = 0
    read_qps: float = 0
    daily_storage_bytes: int = 0
    yearly_storage_bytes: int = 0
    total_storage_bytes: int = 0
    cache_size_bytes: int = 0
    estimated_servers: int = 0
    
    def calculate(self) -> 'SystemEstimate':
        """Calculate all derived metrics"""
        
        # User metrics
        self.dau = int(self.total_users * self.dau_percentage)
        
        # Traffic metrics
        self.daily_requests = self.dau * self.requests_per_user_per_day
        self.average_qps = self.daily_requests / TimeUnit.DAY.value
        self.peak_qps = self.average_qps * 3  # 3x for peak
        
        self.write_qps = self.average_qps * self.write_to_read_ratio
        self.read_qps = self.average_qps * (1 - self.write_to_read_ratio)
        
        # Storage metrics
        write_requests = self.daily_requests * self.write_to_read_ratio
        self.daily_storage_bytes = int(write_requests * self.data_per_record_bytes)
        self.yearly_storage_bytes = self.daily_storage_bytes * 365
        self.total_storage_bytes = self.yearly_storage_bytes * self.retention_years
        
        # Cache (20% of hot data - 80/20 rule)
        self.cache_size_bytes = int(self.total_storage_bytes * 0.2)
        
        # Server estimation (500 QPS per server, 3x buffer)
        self.estimated_servers = max(3, int((self.peak_qps / 500) * 3))
        
        return self
    
    def format_bytes(self, num_bytes: float) -> str:
        """Convert a byte count to a human-readable string"""
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if abs(num_bytes) < 1024:
                return f"{num_bytes:.1f} {unit}"
            num_bytes /= 1024
        return f"{num_bytes:.1f} PB"
    
    def summary(self) -> str:
        """Generate readable summary"""
        return f"""
╔══════════════════════════════════════════════════════════╗
║                 SYSTEM CAPACITY ESTIMATE                  ║
╠══════════════════════════════════════════════════════════╣
║ TRAFFIC                                                   ║
║   Total Users:        {self.total_users:,}
║   Daily Active Users: {self.dau:,}
║   Daily Requests:     {self.daily_requests:,}
║   Average QPS:        {self.average_qps:.1f}
║   Peak QPS:           {self.peak_qps:.1f}
║   Write QPS:          {self.write_qps:.1f}
║   Read QPS:           {self.read_qps:.1f}
╠══════════════════════════════════════════════════════════╣
║ STORAGE                                                   ║
║   Daily Storage:      {self.format_bytes(self.daily_storage_bytes)}
║   Yearly Storage:     {self.format_bytes(self.yearly_storage_bytes)}
║   Total ({self.retention_years} years):   {self.format_bytes(self.total_storage_bytes)}
║   Cache Size (20%):   {self.format_bytes(self.cache_size_bytes)}
╠══════════════════════════════════════════════════════════╣
║ INFRASTRUCTURE                                            ║
║   Estimated Servers:  {self.estimated_servers}
╚══════════════════════════════════════════════════════════╝
"""

# ============== URL Shortener Calculator ==============
@dataclass  
class URLShortenerEstimate:
    """Specialized calculator for URL shortener systems"""
    
    new_urls_per_month: int = 100_000_000
    read_write_ratio: int = 10
    retention_years: int = 5
    average_url_length: int = 500
    
    def calculate(self):
        # Traffic
        self.write_qps = self.new_urls_per_month / TimeUnit.MONTH.value
        self.read_qps = self.write_qps * self.read_write_ratio
        self.peak_write_qps = self.write_qps * 3
        self.peak_read_qps = self.read_qps * 3
        
        # Storage
        self.record_size = self.average_url_length + 7 + 100  # URL + short + meta
        self.monthly_storage = self.new_urls_per_month * self.record_size
        self.yearly_storage = self.monthly_storage * 12
        self.total_storage = self.yearly_storage * self.retention_years
        
        # Short URL length calculation
        total_urls = self.new_urls_per_month * 12 * self.retention_years
        self.short_url_length = math.ceil(math.log(total_urls * 10, 62))  # ×10 headroom on the keyspace
        
        # Cache (hot URLs)
        self.cache_size = int(self.total_storage * 0.05)  # 5% is hot
        
        return self

# ============== Video Streaming Calculator ==============
@dataclass
class VideoStreamingEstimate:
    """Calculator for video streaming services"""
    
    total_subscribers: int = 200_000_000
    daily_active_percentage: float = 0.5
    concurrent_percentage: float = 0.1
    hours_per_viewer: float = 2
    bitrate_mbps: float = 5  # 1080p average
    
    def calculate(self):
        self.daily_viewers = int(self.total_subscribers * self.daily_active_percentage)
        self.concurrent_viewers = int(self.daily_viewers * self.concurrent_percentage)
        
        # Bandwidth
        self.peak_bandwidth_mbps = self.concurrent_viewers * self.bitrate_mbps
        self.peak_bandwidth_tbps = self.peak_bandwidth_mbps / 1_000_000
        
        # Daily data transfer
        seconds_watched = self.daily_viewers * self.hours_per_viewer * 3600
        bits_transferred = seconds_watched * self.bitrate_mbps * 1_000_000
        self.daily_data_pb = bits_transferred / 8 / (1024 ** 5)
        
        # CDN edge servers needed (assuming 10Gbps per server)
        self.edge_servers = int(self.peak_bandwidth_mbps / 10000 * 2)  # 2x buffer
        
        return self

# ============== Social Media Calculator ==============
@dataclass
class SocialMediaEstimate:
    """Calculator for social media platforms (Twitter-like)"""
    
    total_users: int = 500_000_000
    dau: int = 200_000_000
    avg_following: int = 200
    posts_per_active_user_per_day: float = 0.1
    timeline_checks_per_day: int = 5
    
    def calculate(self):
        # Post metrics
        daily_posts = self.dau * self.posts_per_active_user_per_day
        self.post_qps = daily_posts / TimeUnit.DAY.value
        
        # Timeline read metrics
        timeline_reads = self.dau * self.timeline_checks_per_day
        self.timeline_qps = timeline_reads / TimeUnit.DAY.value
        
        # Fan-out analysis
        self.fanout_writes_per_post = self.avg_following
        self.total_fanout_qps = self.post_qps * self.fanout_writes_per_post
        
        # Timeline storage (800 tweet IDs per user)
        timeline_size_bytes = 800 * 8  # 8 bytes per ID
        self.timeline_cache_tb = (self.total_users * timeline_size_bytes) / (1024 ** 4)
        
        return self

# ============== Usage Examples ==============
if __name__ == "__main__":
    # E-commerce platform
    ecommerce = SystemEstimate(
        total_users=50_000_000,
        dau_percentage=0.1,
        requests_per_user_per_day=20,
        write_to_read_ratio=0.05,
        data_per_record_bytes=2000,
        retention_years=3
    ).calculate()
    print(ecommerce.summary())
    
    # URL Shortener
    url_shortener = URLShortenerEstimate(
        new_urls_per_month=100_000_000,
        read_write_ratio=10
    ).calculate()
    
    print(f"URL Shortener:")
    print(f"  Write QPS: {url_shortener.write_qps:.1f}")
    print(f"  Read QPS: {url_shortener.read_qps:.1f}")
    print(f"  Short URL Length: {url_shortener.short_url_length}")
    
    # Video Streaming
    streaming = VideoStreamingEstimate(
        total_subscribers=200_000_000
    ).calculate()
    
    print(f"\nVideo Streaming:")
    print(f"  Peak Bandwidth: {streaming.peak_bandwidth_tbps:.1f} Tbps")
    print(f"  Daily Data: {streaming.daily_data_pb:.1f} PB")
    print(f"  Edge Servers Needed: {streaming.edge_servers}")