Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Interview Practice Problems

Staff+ distributed systems interviews are among the most challenging in the industry. This module provides comprehensive practice problems with detailed solutions, covering the exact topics asked at Google, Meta, Amazon, and other top companies. Think of these problems the way a pilot thinks about flight simulators: the scenarios are designed to trigger the exact failure modes and trade-off decisions you will face in a real interview. Working through them builds the muscle memory so that under pressure, the right architectural instincts surface automatically.
Problem Count: 15+ detailed problems
Difficulty: Staff/Principal level
Format: Problem → Hints → Full Solution → Follow-up Questions

Problem 1: Design a Distributed Rate Limiter

Difficulty: Medium-Hard
Companies: Stripe, Cloudflare, AWS
Time: 45 minutes

Problem Statement

Design a rate limiting system that:
  • Limits requests per user to 100 requests per minute
  • Works across multiple API servers (horizontally scaled)
  • Has low latency (< 5ms overhead)
  • Handles millions of users

Hints

Consider Token Bucket, Leaky Bucket, Sliding Window, or Fixed Window algorithms.
  • Token Bucket: Good for burst tolerance
  • Sliding Window: More accurate, more complex
  • Fixed Window: Simple but has edge case issues
Where do you store rate limit state?
  • Local memory: Fast but not shared
  • Redis: Shared but adds latency
  • Hybrid: Local + periodic sync
Multiple servers checking/updating same counter simultaneously. Consider atomic operations or Lua scripts in Redis.

Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED RATE LIMITER ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ALGORITHM: Sliding Window Log (most accurate for distributed)              │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                         API SERVERS                                │     │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │     │
│  │  │ Server 1 │  │ Server 2 │  │ Server 3 │  │ Server N │          │     │
│  │  │          │  │          │  │          │  │          │          │     │
│  │  │ Rate     │  │ Rate     │  │ Rate     │  │ Rate     │          │     │
│  │  │ Limiter  │  │ Limiter  │  │ Limiter  │  │ Limiter  │          │     │
│  │  │ Client   │  │ Client   │  │ Client   │  │ Client   │          │     │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘          │     │
│  │       │             │             │             │                 │     │
│  └───────┼─────────────┼─────────────┼─────────────┼─────────────────┘     │
│          │             │             │             │                        │
│          └─────────────┴──────┬──────┴─────────────┘                        │
│                               │                                             │
│                               ▼                                             │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                      REDIS CLUSTER                                 │     │
│  │                                                                    │     │
│  │   Key: rate_limit:user_123                                        │     │
│  │   Type: Sorted Set                                                │     │
│  │   Members: [timestamp1, timestamp2, timestamp3, ...]              │     │
│  │   Score: timestamp (for range queries)                            │     │
│  │                                                                    │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  SLIDING WINDOW ALGORITHM:                                                  │
│  ─────────────────────────                                                  │
│  1. Remove all entries older than (now - window_size)                       │
│  2. Count remaining entries                                                 │
│  3. If count < limit, add current timestamp and allow                       │
│  4. Else, reject                                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Redis Lua Script (Atomic Operation):
-- KEYS[1] = rate limit key (e.g., "rate_limit:user_123")
-- ARGV[1] = window size in seconds (e.g., 60 for per-minute limiting)
-- ARGV[2] = max requests allowed (e.g., 100)
-- ARGV[3] = current timestamp in seconds (with fractional precision)
--
-- Why a Lua script? Redis executes Lua atomically -- no other command
-- can interleave between our "read count" and "add entry" steps.
-- Without this, two servers could both read count=99 and both allow
-- their request, exceeding the limit.

local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_start = now - window

-- Step 1: Evict stale entries outside the sliding window.
-- The sorted set score IS the timestamp, so this range removal
-- efficiently prunes everything older than our window boundary.
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Step 2: Count how many requests remain in the current window.
local count = redis.call('ZCARD', key)

if count < limit then
    -- Step 3a: Under the limit -- record this request.
    -- We append math.random() to the member value to guarantee uniqueness
    -- (two requests in the same microsecond would otherwise collide).
    redis.call('ZADD', key, now, now .. '-' .. math.random())
    -- Set TTL so Redis auto-cleans keys for inactive users,
    -- preventing memory from growing unbounded.
    redis.call('EXPIRE', key, window)
    return 1  -- Allowed
else
    -- Step 3b: Over the limit -- reject without recording.
    return 0  -- Denied
end
Distributed pitfall: Clock skew between your API servers means the now timestamp passed to this script may differ by milliseconds across servers. For a 60-second window this is negligible, but for sub-second windows (e.g., 10 requests per 100ms), use Redis server time via redis.call('TIME') instead of client-supplied timestamps.
Follow-up Questions:
  1. How do you handle Redis failures?
    • Fail open (allow requests) or fail closed (deny)?
    • Local cache with eventual sync?
  2. How do you prevent gaming the system?
    • User changes IP? Use account ID, not IP
    • Distributed attacks? Per-IP limits too
  3. How do you handle different rate limits per user tier?
    • Store tier in user metadata
    • Different limits for free vs paid

Problem 2: Design a Distributed Lock Service

Difficulty: Hard
Companies: Google, Amazon, Uber
Time: 45 minutes

Problem Statement

Design a distributed lock service similar to Zookeeper or etcd that:
  • Provides mutual exclusion
  • Handles node failures
  • Prevents deadlocks
  • Supports lock TTL (lease-based)

Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED LOCK SERVICE DESIGN                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  KEY COMPONENTS:                                                            │
│                                                                              │
│  1. CONSENSUS LAYER (Raft/Paxos)                                           │
│     ┌──────────────────────────────────────────────────────────────┐        │
│     │                                                              │        │
│     │  Leader ◄────────► Follower ◄────────► Follower             │        │
│     │                                                              │        │
│     │  All lock operations go through Raft log                    │        │
│     │  Majority must agree before lock is granted                 │        │
│     │                                                              │        │
│     └──────────────────────────────────────────────────────────────┘        │
│                                                                              │
│  2. LOCK DATA STRUCTURE                                                     │
│     ┌──────────────────────────────────────────────────────────────┐        │
│     │  Lock {                                                      │        │
│     │    key: "/locks/resource-123"                               │        │
│     │    owner: "client-abc"        // Who holds the lock         │        │
│     │    lease_id: 12345            // For TTL/renewal            │        │
│     │    fence_token: 789           // Monotonically increasing   │        │
│     │    expires_at: 1699900000     // When lease expires         │        │
│     │  }                                                          │        │
│     └──────────────────────────────────────────────────────────────┘        │
│                                                                              │
│  3. FENCING TOKEN (Critical for correctness!)                              │
│                                                                              │
│     Without fencing:                                                        │
│     ┌────────────────────────────────────────────────────────────────┐      │
│     │  Client A acquires lock                                        │      │
│     │  Client A pauses (GC, network)                                 │      │
│     │  Lock expires, Client B acquires lock                          │      │
│     │  Client A resumes, thinks it still has lock                    │      │
│     │  BOTH clients write! Data corruption!                          │      │
│     └────────────────────────────────────────────────────────────────┘      │
│                                                                              │
│     With fencing:                                                           │
│     ┌────────────────────────────────────────────────────────────────┐      │
│     │  Client A acquires lock, fence_token = 34                      │      │
│     │  Client A pauses (GC, network)                                 │      │
│     │  Lock expires, Client B acquires, fence_token = 35             │      │
│     │  Client A resumes, sends token 34 to storage                   │      │
│     │  Storage rejects: 34 < current (35)                            │      │
│     │  Only Client B's writes succeed!                               │      │
│     └────────────────────────────────────────────────────────────────┘      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Lock Acquisition Algorithm:
class DistributedLock:
    """
    A lease-based distributed lock backed by a consensus service (etcd, ZooKeeper).
    
    Think of this like a hotel room key card: you get the card (lease) for a
    fixed duration. If you don't renew at the front desk before checkout time,
    the room is given to the next guest -- even if you're still in the shower.
    The fence token is like a monotonically increasing guest number printed on
    the card that the minibar checks before serving you.
    """
    def __init__(self, lock_service, resource_key):
        self.lock_service = lock_service
        self.resource_key = resource_key
        self.lease_id = None
        self.fence_token = None
    
    async def acquire(self, timeout_seconds: float = 30) -> bool:
        """
        Acquire the lock with retries.
        Returns True if acquired, False if timeout.
        
        Important: The caller MUST use the fence_token when writing to
        downstream storage. Without it, a stale lock holder (paused by GC
        or network delay) can corrupt data after its lease silently expires.
        """
        deadline = time.time() + timeout_seconds
        
        while time.time() < deadline:
            # Try to create lock in consensus layer.
            # The consensus layer (Raft/Paxos) ensures only one client
            # sees success for a given key at any point in time.
            result = await self.lock_service.try_acquire(
                key=self.resource_key,
                owner=self.client_id,
                ttl_seconds=30  # Lease duration -- choose based on your
                                # longest expected critical section + margin
            )
            
            if result.success:
                self.lease_id = result.lease_id
                self.fence_token = result.fence_token
                
                # Start background lease renewal so the lock doesn't
                # expire while we're still doing work. Renew at 1/3 of TTL
                # to leave margin for transient network hiccups.
                self._start_renewal_loop()
                return True
            
            # Lock held by someone else -- back off before retrying.
            # A fixed sleep here is fine for low contention. For high
            # contention, consider exponential backoff or a watch/notify
            # mechanism (etcd Watch, ZooKeeper getData with watcher).
            await asyncio.sleep(0.1)
        
        return False
    
    async def release(self):
        """
        Release the lock explicitly. Always do this in a finally block.
        If the process crashes before release, the TTL ensures the lock
        is eventually freed -- that's the whole point of lease-based locking.
        """
        if self.lease_id:
            await self.lock_service.release(
                key=self.resource_key,
                lease_id=self.lease_id
            )
            self._stop_renewal_loop()
    
    def _start_renewal_loop(self):
        """
        Renew lease periodically to prevent expiry while the lock holder
        is still active. The renewal interval (10s) is roughly TTL/3,
        giving us two retry opportunities before the 30s lease expires.
        """
        async def renew():
            while self.lease_id:
                try:
                    await self.lock_service.renew_lease(self.lease_id)
                    await asyncio.sleep(10)  # Renew every 10s (TTL is 30s)
                except Exception:
                    # Lost connection to lock service -- lease will expire.
                    # The fence token ensures stale writes are rejected
                    # even if we continue operating under the false belief
                    # that we still hold the lock.
                    break
        
        asyncio.create_task(renew())
Follow-up Questions:
  1. What happens during network partition?
    • Clients on minority side can’t acquire new locks
    • Existing leases expire, preventing split-brain
  2. How do you handle leader election for the lock service itself?
    • Raft consensus with quorum reads/writes
  3. Redlock controversy: Why is Redis Redlock problematic?
    • Martin Kleppmann’s analysis: Clock assumptions too strong — Redlock assumes bounded clock drift across independent Redis nodes, but NTP jumps and VM clock corrections can violate this
    • Antirez’s rebuttal: argues real-world clocks are “good enough” with proper monitoring
    • The bottom line: If you need the lock for correctness (not just performance), fencing tokens are non-negotiable regardless of the locking mechanism. If you need the lock only for efficiency (avoiding duplicate work), Redlock is acceptable

Problem 3: Design a Global Payment System

Difficulty: Very Hard
Companies: Stripe, Square, Adyen
Time: 60 minutes

Problem Statement

Design a payment processing system that:
  • Handles credit card charges globally
  • Guarantees exactly-once processing
  • Supports refunds and chargebacks
  • Processes millions of transactions per day

Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PAYMENT SYSTEM ARCHITECTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                           ┌─────────────────┐                               │
│                           │   API Gateway   │                               │
│                           │ (Idempotency)   │                               │
│                           └────────┬────────┘                               │
│                                    │                                        │
│                           ┌────────▼────────┐                               │
│                           │ Payment Service │                               │
│                           │   (Orchestrator)│                               │
│                           └────────┬────────┘                               │
│                                    │                                        │
│     ┌──────────────────────────────┼──────────────────────────────┐         │
│     │                              │                              │         │
│     ▼                              ▼                              ▼         │
│ ┌────────────┐            ┌────────────────┐           ┌────────────────┐  │
│ │  Fraud     │            │  Authorization │           │  Risk Engine   │  │
│ │  Detection │            │  Service       │           │                │  │
│ └────────────┘            └───────┬────────┘           └────────────────┘  │
│                                   │                                         │
│                    ┌──────────────┼──────────────┐                          │
│                    ▼              ▼              ▼                          │
│               ┌─────────┐   ┌─────────┐   ┌─────────┐                      │
│               │  Visa   │   │Mastercard│  │  AMEX   │ (Payment Networks)  │
│               └─────────┘   └─────────┘   └─────────┘                      │
│                                                                              │
│  STATE MACHINE FOR PAYMENT:                                                 │
│  ──────────────────────────                                                 │
│                                                                              │
│  ┌──────────┐    Auth     ┌────────────┐   Capture   ┌───────────┐         │
│  │ PENDING  │ ──────────► │ AUTHORIZED │ ──────────► │ CAPTURED  │         │
│  └──────────┘             └────────────┘             └───────────┘         │
│       │                         │                          │               │
│       │                         │ Void                     │ Refund       │
│       │                         ▼                          ▼               │
│       │                   ┌───────────┐             ┌───────────┐          │
│       └──────────────────►│  FAILED   │             │ REFUNDED  │          │
│          Decline          └───────────┘             └───────────┘          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Exactly-Once Processing: Think of this like a post office with registered mail: each letter (payment) gets a unique tracking number (idempotency key). If the sender asks “did my letter arrive?”, the post office can check the tracking number and say “yes, here’s the confirmation” — without delivering the letter a second time.
class PaymentProcessor:
    async def process_payment(
        self,
        idempotency_key: str,
        amount: int,         # Always in smallest currency unit (cents) to avoid
        currency: str,       # floating-point rounding issues in financial systems
        card_token: str
    ) -> PaymentResult:
        """
        Process payment with exactly-once guarantee.
        
        The idempotency_key is CLIENT-generated (typically a UUID). This is
        critical -- if the server generated it, the client couldn't safely
        retry without risking a duplicate charge, because the first request
        might have succeeded but the response was lost in transit.
        """
        
        # Step 1: Check idempotency store (Redis or DB with TTL).
        # This is our "deduplication gate" -- the single most important
        # line in the entire payment flow.
        existing = await self.idempotency_store.get(idempotency_key)
        if existing:
            if existing.status == "completed":
                return existing.result  # Return cached result -- safe replay
            elif existing.status == "in_progress":
                # Another server is processing this exact request right now.
                # Return 409 so the client retries after a delay.
                raise ConflictError("Payment in progress")
        
        # Step 2: Create payment record (source of truth)
        payment_id = generate_id()
        
        async with self.db.transaction():
            # Insert payment
            await self.db.insert("payments", {
                "id": payment_id,
                "idempotency_key": idempotency_key,
                "amount": amount,
                "currency": currency,
                "status": "PENDING"
            })
            
            # Insert into outbox for event
            await self.db.insert("outbox", {
                "event": "payment.created",
                "payload": {"payment_id": payment_id}
            })
        
        try:
            # Step 3: Call payment network
            auth_result = await self.call_payment_network(
                amount=amount,
                currency=currency,
                card_token=card_token
            )
            
            # Step 4: Update payment status
            async with self.db.transaction():
                await self.db.update(
                    "payments",
                    {"id": payment_id},
                    {"status": "AUTHORIZED" if auth_result.success else "FAILED",
                     "network_reference": auth_result.reference}
                )
                
                await self.db.insert("outbox", {
                    "event": "payment.authorized" if auth_result.success else "payment.failed",
                    "payload": {"payment_id": payment_id}
                })
            
            result = PaymentResult(
                payment_id=payment_id,
                success=auth_result.success,
                decline_code=auth_result.decline_code
            )
            
            # Step 5: Cache in idempotency store
            await self.idempotency_store.set(
                idempotency_key,
                {"status": "completed", "result": result},
                ttl=24 * 3600  # 24 hours
            )
            
            return result
            
        except Exception as e:
            # Network failure - payment status uncertain
            # Keep as PENDING, reconciliation job will resolve
            logger.error(f"Payment network error: {e}")
            raise
Key Design Decisions:
DecisionRationale
Transactional outboxDatabase and event publishing in same transaction
Idempotency keysClient-provided, prevents duplicate charges
State machineClear payment lifecycle, no invalid transitions
ReconciliationAsync job compares our state with payment networks
Saga for refundsMulti-step with compensation on failure

Problem 4: Design a Distributed Task Scheduler

Difficulty: Hard
Companies: Uber (Cadence), Airbnb, LinkedIn
Time: 45 minutes

Problem Statement

Design a system that:
  • Schedules tasks to run at specific times (cron-like)
  • Executes one-time delayed tasks
  • Guarantees at-least-once execution
  • Scales horizontally

Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED TASK SCHEDULER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ARCHITECTURE:                                                              │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                        SCHEDULER LAYER                             │     │
│  │                                                                    │     │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │     │
│  │  │ Scheduler 1  │  │ Scheduler 2  │  │ Scheduler N  │             │     │
│  │  │  (Leader)    │  │ (Follower)   │  │ (Follower)   │             │     │
│  │  └──────┬───────┘  └──────────────┘  └──────────────┘             │     │
│  │         │                                                          │     │
│  │         │ Leader election via consensus                            │     │
│  │         │ (Only leader schedules tasks)                            │     │
│  │         │                                                          │     │
│  └─────────┼──────────────────────────────────────────────────────────┘     │
│            │                                                                 │
│            ▼                                                                 │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                        TASK QUEUE                                  │     │
│  │                                                                    │     │
│  │   Ready Queue (sorted by execution_time):                         │     │
│  │   ┌─────────────────────────────────────────────────────────────┐ │     │
│  │   │ task_1 (run at 10:00) │ task_2 (10:01) │ task_3 (10:05) │   │ │     │
│  │   └─────────────────────────────────────────────────────────────┘ │     │
│  │                                                                    │     │
│  │   Delay Queue (Redis ZSET or dedicated delay queue):              │     │
│  │   ┌─────────────────────────────────────────────────────────────┐ │     │
│  │   │ task_4 (due 11:00) │ task_5 (due 12:00) │ ...              │ │     │
│  │   └─────────────────────────────────────────────────────────────┘ │     │
│  │                                                                    │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│            │                                                                 │
│            ▼                                                                 │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                        WORKER LAYER                                │     │
│  │                                                                    │     │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │     │
│  │  │  Worker 1    │  │  Worker 2    │  │  Worker N    │             │     │
│  │  │              │  │              │  │              │             │     │
│  │  │ - Pulls task │  │ - Pulls task │  │ - Pulls task │             │     │
│  │  │ - Executes   │  │ - Executes   │  │ - Executes   │             │     │
│  │  │ - Acks/Fails │  │ - Acks/Fails │  │ - Acks/Fails │             │     │
│  │  └──────────────┘  └──────────────┘  └──────────────┘             │     │
│  │                                                                    │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  TASK DATA MODEL:                                                           │
│  ────────────────                                                           │
│  {                                                                          │
│    "id": "task-123",                                                        │
│    "type": "send_email",                                                    │
│    "payload": {"to": "user@example.com", ...},                             │
│    "schedule_time": 1699900000,                                            │
│    "status": "PENDING|RUNNING|COMPLETED|FAILED",                           │
│    "retry_count": 0,                                                       │
│    "max_retries": 3,                                                       │
│    "created_at": 1699800000                                                │
│  }                                                                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Handling Exactly-Once Semantics:
class TaskWorker:
    async def process_task(self, task: Task):
        """
        Process task with at-least-once guarantee.
        Idempotency is caller's responsibility.
        """
        
        # Set visibility timeout (task invisible to other workers)
        await self.queue.set_visibility_timeout(task.id, seconds=300)
        
        try:
            # Execute the task
            handler = self.handlers[task.type]
            await handler(task.payload)
            
            # Mark as complete
            await self.store.update_task(task.id, status="COMPLETED")
            await self.queue.ack(task.id)
            
        except RetryableError as e:
            if task.retry_count < task.max_retries:
                # Reschedule with exponential backoff
                next_run = datetime.now() + timedelta(
                    seconds=2 ** task.retry_count * 60
                )
                await self.store.update_task(
                    task.id,
                    status="PENDING",
                    schedule_time=next_run,
                    retry_count=task.retry_count + 1
                )
                await self.queue.nack(task.id)
            else:
                # Max retries exceeded
                await self.store.update_task(
                    task.id, 
                    status="FAILED",
                    error=str(e)
                )
                await self.queue.ack(task.id)
                
        except Exception as e:
            # Non-retryable error
            await self.store.update_task(
                task.id,
                status="FAILED",
                error=str(e)
            )
            await self.queue.ack(task.id)

Problem 5: Design a Distributed Unique ID Generator

Difficulty: Medium
Companies: Twitter (Snowflake), Instagram
Time: 30 minutes

Problem Statement

Design a system that generates globally unique IDs with:
  • 64-bit IDs
  • Sortable by time (roughly)
  • No coordination between nodes
  • Millions of IDs per second

Solution

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SNOWFLAKE ID FORMAT (64 bits)                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────┬───────────────────────────┬────────────┬──────────────────────┐    │
│  │  0  │       41 bits            │  10 bits   │      12 bits         │    │
│  │     │    Timestamp (ms)        │  Node ID   │   Sequence Number    │    │
│  │     │  (69 years from epoch)   │ (1024 nodes)│  (4096 per ms)      │    │
│  └─────┴───────────────────────────┴────────────┴──────────────────────┘    │
│                                                                              │
│  CAPACITY:                                                                  │
│  ─────────                                                                  │
│  - 1024 nodes                                                               │
│  - 4096 IDs per millisecond per node                                        │
│  - 4.1 million IDs per second per node                                      │
│  - 69 years before timestamp overflow                                       │
│                                                                              │
│  EXAMPLE:                                                                   │
│  ────────                                                                   │
│  Timestamp: 1699900000000 (41 bits)                                         │
│  Node ID: 42 (10 bits)                                                      │
│  Sequence: 0 (12 bits)                                                      │
│                                                                              │
│  ID = (1699900000000 << 22) | (42 << 12) | 0                               │
│     = 7130486325248696320                                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Implementation: The elegance of Snowflake is that each node is a self-contained ID factory — like giving each post office its own stamp machine with a unique prefix. No two machines ever produce the same stamp, and you can read the timestamp and origin directly from the ID itself.
import time
import threading

class SnowflakeGenerator:
    def __init__(self, node_id: int, epoch: int = 1288834974657):
        """
        Initialize Snowflake ID generator.
        
        Args:
            node_id: Unique identifier for this generator (0-1023).
                     Must be assigned once and never reused concurrently.
            epoch: Custom epoch in milliseconds (Twitter's is 2010-11-04).
                   Using a custom epoch gives you more years before the
                   41-bit timestamp field overflows -- starting from Unix
                   epoch 1970 wastes ~40 years of range.
        """
        if not 0 <= node_id < 1024:
            raise ValueError("Node ID must be between 0 and 1023")
        
        self.node_id = node_id
        self.epoch = epoch
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()
    
    def _current_time_millis(self) -> int:
        return int(time.time() * 1000)
    
    def generate(self) -> int:
        """
        Generate a unique Snowflake ID. Thread-safe via mutex.
        
        In high-throughput systems, this lock can become a bottleneck.
        Alternatives: per-thread generators, or lock-free CAS loops.
        """
        with self._lock:
            timestamp = self._current_time_millis()
            
            if timestamp < self.last_timestamp:
                # Clock went backwards! Wait for it to catch up.
                raise ClockMovedBackwardsError(
                    f"Clock moved backwards by {self.last_timestamp - timestamp}ms"
                )
            
            if timestamp == self.last_timestamp:
                # Same millisecond, increment sequence
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                
                if self.sequence == 0:
                    # Sequence exhausted, wait for next millisecond
                    timestamp = self._wait_next_millis(timestamp)
            else:
                # New millisecond, reset sequence
                self.sequence = 0
            
            self.last_timestamp = timestamp
            
            # Construct the ID
            id = (
                ((timestamp - self.epoch) << 22) |
                (self.node_id << 12) |
                self.sequence
            )
            
            return id
    
    def _wait_next_millis(self, last_timestamp: int) -> int:
        """Spin until we get to the next millisecond"""
        timestamp = self._current_time_millis()
        while timestamp <= last_timestamp:
            timestamp = self._current_time_millis()
        return timestamp
    
    @staticmethod
    def parse(id: int) -> dict:
        """Parse a Snowflake ID into its components"""
        return {
            "timestamp": (id >> 22) + 1288834974657,
            "node_id": (id >> 12) & 0x3FF,
            "sequence": id & 0xFFF
        }
Follow-up Questions:
  1. How do you assign node IDs?
    • Zookeeper/etcd for coordination
    • Use MAC address hash
    • Use Kubernetes pod ordinal
  2. What if clock goes backwards?
    • Throw exception (let operator fix NTP)
    • Wait for clock to catch up
    • Use logical clock component
  3. How do you handle datacenter failover?
    • Split node ID: 5 bits datacenter + 5 bits machine
    • Different epoch per datacenter

Problem 6: Design a Distributed Message Queue

Difficulty: Very Hard
Companies: Confluent, AWS (SQS), LinkedIn
Time: 60 minutes

Problem Statement

Design a message queue like Kafka that:
  • Handles millions of messages per second
  • Provides ordering guarantees (per partition)
  • Supports multiple consumers
  • Persists messages for replay

Key Design Points

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED MESSAGE QUEUE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TOPIC: orders (3 partitions, replication factor 3)                        │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                                                                    │     │
│  │   Partition 0          Partition 1          Partition 2           │     │
│  │   (Leader: B1)         (Leader: B2)         (Leader: B3)          │     │
│  │                                                                    │     │
│  │   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐      │     │
│  │   │[0][1][2][3]  │     │[0][1][2][3]  │     │[0][1][2][3]  │      │     │
│  │   └──────────────┘     └──────────────┘     └──────────────┘      │     │
│  │   Broker 1 (L)         Broker 2 (L)         Broker 3 (L)          │     │
│  │   Broker 2 (F)         Broker 3 (F)         Broker 1 (F)          │     │
│  │   Broker 3 (F)         Broker 1 (F)         Broker 2 (F)          │     │
│  │                                                                    │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  CONSUMER GROUP: order-processors                                           │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │                                                                    │     │
│  │   Consumer 1           Consumer 2           Consumer 3            │     │
│  │   (Partition 0)        (Partition 1)        (Partition 2)         │     │
│  │                                                                    │     │
│  │   Each partition assigned to exactly ONE consumer in group        │     │
│  │   If Consumer 2 dies, Partition 1 reassigned to 1 or 3           │     │
│  │                                                                    │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  GUARANTEES:                                                                │
│  ───────────                                                                │
│  - Ordering: Within a partition only                                        │
│  - Delivery: At-least-once (consumer commits offset after processing)      │
│  - Durability: Message persisted to N replicas before ack                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

More Practice Problems

Design Uber/Lyft

Topics: Real-time location, matching, surge pricing Key challenges: Geospatial indexing, low-latency dispatch

Design Twitter Timeline

Topics: Fan-out, feed ranking, real-time updates Key challenges: Hot users, read-heavy workload

Design Google Docs

Topics: CRDT, operational transformation, collaboration Key challenges: Conflict resolution, real-time sync

Design Netflix

Topics: CDN, video encoding, recommendations Key challenges: Global scale, adaptive streaming

Interview Tips

  1. Clarify requirements (5 min)
    • Scale: Users, requests/second, data size
    • Consistency: Strong or eventual?
    • Latency: What’s acceptable?
  2. High-level design (10 min)
    • Draw boxes and arrows
    • Identify core components
    • Explain data flow
  3. Deep dive (20 min)
    • Pick 2-3 most critical components
    • Discuss algorithms, data structures
    • Explain trade-offs
  4. Handle edge cases (10 min)
    • Failures: Node crashes, network partitions
    • Scale: Hot spots, bottlenecks
    • Consistency: Race conditions
  • Breadth: Can you identify all components?
  • Depth: Can you go deep on key areas?
  • Trade-offs: Do you understand CAP, consistency vs availability?
  • Practicality: Is your design buildable?
  • Communication: Can you explain clearly?
  • Starting to code before understanding the problem — spend at least 3-5 minutes on requirements
  • Not asking clarifying questions — interviewers deliberately leave gaps to see if you probe
  • Ignoring scale constraints — “works for 100 users” and “works for 100 million users” are fundamentally different systems
  • Over-engineering simple problems — if the interviewer says “single region, moderate scale,” do not design a globally distributed system
  • Forgetting failure modes — always ask yourself: “What happens when this component goes down?”
  • Not discussing trade-offs — saying “I would use Kafka” without explaining why (vs. SQS, RabbitMQ, etc.) signals shallow understanding
  • Treating consistency as binary — many candidates say “eventually consistent” without specifying what guarantees their users actually need (read-your-writes, monotonic reads, etc.)