Interview Practice Problems
Staff+ distributed systems interviews are among the most challenging in the industry. This module provides comprehensive practice problems with detailed solutions, covering topics commonly asked at Google, Meta, Amazon, and other top companies.
Problem Count: 15+ detailed problems
Difficulty: Staff/Principal level
Format: Problem → Hints → Full Solution → Follow-up Questions
Problem 1: Design a Distributed Rate Limiter
Difficulty: Medium-Hard
Companies: Stripe, Cloudflare, AWS
Time: 45 minutes
Problem Statement
Design a rate limiting system that:
Limits requests per user to 100 requests per minute
Works across multiple API servers (horizontally scaled)
Has low latency (< 5ms overhead)
Handles millions of users
Hints
Consider Token Bucket, Leaky Bucket, Sliding Window, or Fixed Window algorithms.
Token Bucket: Good for burst tolerance
Sliding Window: More accurate, more complex
Fixed Window: Simple but has edge case issues
Where do you store rate limit state?
Local memory: Fast but not shared
Redis: Shared but adds latency
Hybrid: Local + periodic sync
Race conditions: multiple servers may check and update the same counter simultaneously.
Consider atomic operations or Lua scripts in Redis.
Solution
┌─────────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED RATE LIMITER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ALGORITHM: Sliding Window Log (most accurate for distributed) │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ API SERVERS │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Server 1 │ │ Server 2 │ │ Server 3 │ │ Server N │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ Rate │ │ Rate │ │ Rate │ │ Rate │ │ │
│ │ │ Limiter │ │ Limiter │ │ Limiter │ │ Limiter │ │ │
│ │ │ Client │ │ Client │ │ Client │ │ Client │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ └───────┼─────────────┼─────────────┼─────────────┼─────────────────┘ │
│ │ │ │ │ │
│ └─────────────┴──────┬──────┴─────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ REDIS CLUSTER │ │
│ │ │ │
│ │ Key: rate_limit:user_123 │ │
│ │ Type: Sorted Set │ │
│ │ Members: [timestamp1, timestamp2, timestamp3, ...] │ │
│ │ Score: timestamp (for range queries) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ SLIDING WINDOW ALGORITHM: │
│ ───────────────────────── │
│ 1. Remove all entries older than (now - window_size) │
│ 2. Count remaining entries │
│ 3. If count < limit, add current timestamp and allow │
│ 4. Else, reject │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Redis Lua Script (Atomic Operation):
-- KEYS[1] = rate limit key
-- ARGV[1] = window size in seconds
-- ARGV[2] = max requests allowed
-- ARGV[3] = current timestamp
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_start = now - window

-- Remove entries that fell out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Count entries still inside the window
local count = redis.call('ZCARD', key)

if count < limit then
    -- Record this request; the random suffix keeps members unique
    redis.call('ZADD', key, now, now .. '-' .. math.random())
    redis.call('EXPIRE', key, window)
    return 1  -- Allowed
else
    return 0  -- Denied
end
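A minimal client-side sketch of invoking this script atomically, assuming the redis-py library (the key naming, script location, and is_allowed helper are illustrative):

import time
import redis

r = redis.Redis(host="localhost", port=6379)

# The Lua script above, loaded from disk (location is an assumption)
RATE_LIMIT_LUA = open("rate_limit.lua").read()
rate_limit = r.register_script(RATE_LIMIT_LUA)  # registered once, then run via EVALSHA

def is_allowed(user_id: str, window_seconds: int = 60, limit: int = 100) -> bool:
    """Return True if this request is within the user's rate limit."""
    result = rate_limit(
        keys=[f"rate_limit:{user_id}"],
        args=[window_seconds, limit, time.time()],
    )
    return result == 1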
Follow-up Questions:
How do you handle Redis failures?
Fail open (allow requests) or fail closed (deny)?
Local cache with eventual sync?
How do you prevent gaming the system?
User changes IP? Use account ID, not IP
Distributed attacks? Per-IP limits too
How do you handle different rate limits per user tier?
Store tier in user metadata
Different limits for free vs paid
Problem 2: Design a Distributed Lock Service
Difficulty: Hard
Companies: Google, Amazon, Uber
Time: 45 minutes
Problem Statement
Design a distributed lock service similar to Zookeeper or etcd that:
Provides mutual exclusion
Handles node failures
Prevents deadlocks
Supports lock TTL (lease-based)
Solution
┌─────────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED LOCK SERVICE DESIGN │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ KEY COMPONENTS: │
│ │
│ 1. CONSENSUS LAYER (Raft/Paxos) │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Leader ◄────────► Follower ◄────────► Follower │ │
│ │ │ │
│ │ All lock operations go through Raft log │ │
│ │ Majority must agree before lock is granted │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ 2. LOCK DATA STRUCTURE │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Lock { │ │
│ │ key: "/locks/resource-123" │ │
│ │ owner: "client-abc" // Who holds the lock │ │
│ │ lease_id: 12345 // For TTL/renewal │ │
│ │ fence_token: 789 // Monotonically increasing │ │
│ │ expires_at: 1699900000 // When lease expires │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ 3. FENCING TOKEN (Critical for correctness!) │
│ │
│ Without fencing: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Client A acquires lock │ │
│ │ Client A pauses (GC, network) │ │
│ │ Lock expires, Client B acquires lock │ │
│ │ Client A resumes, thinks it still has lock │ │
│ │ BOTH clients write! Data corruption! │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ With fencing: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Client A acquires lock, fence_token = 34 │ │
│ │ Client A pauses (GC, network) │ │
│ │ Lock expires, Client B acquires, fence_token = 35 │ │
│ │ Client A resumes, sends token 34 to storage │ │
│ │ Storage rejects: 34 < current (35) │ │
│ │ Only Client B's writes succeed! │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
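The fencing diagram implies a check on the storage side; below is a minimal in-memory sketch of that check (the class and method names are illustrative, not part of any real lock service API):

class FencedStorage:
    """Storage that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token_seen = 0
        self.data = {}

    def apply_write(self, fence_token: int, key: str, value: bytes) -> bool:
        # Reject writes whose token is older than one we have already seen,
        # e.g. from a client that resumed after a long GC pause.
        if fence_token < self.highest_token_seen:
            return False
        self.highest_token_seen = fence_token
        self.data[key] = value
        return True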
Lock Acquisition Algorithm:
import asyncio
import time

# Assumes a lock_service client exposing try_acquire/release/renew_lease.
class DistributedLock:
    def __init__(self, lock_service, resource_key, client_id):
        self.lock_service = lock_service
        self.resource_key = resource_key
        self.client_id = client_id
        self.lease_id = None
        self.fence_token = None
        self._renewal_task = None

    async def acquire(self, timeout_seconds: float = 30) -> bool:
        """
        Acquire the lock with retries.
        Returns True if acquired, False if timeout.
        """
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            # Try to create the lock in the consensus layer
            result = await self.lock_service.try_acquire(
                key=self.resource_key,
                owner=self.client_id,
                ttl_seconds=30  # Lease duration
            )
            if result.success:
                self.lease_id = result.lease_id
                self.fence_token = result.fence_token
                # Start background lease renewal
                self._start_renewal_loop()
                return True
            # Lock held by someone else; wait and retry
            await asyncio.sleep(0.1)
        return False

    async def release(self):
        """Release the lock."""
        if self.lease_id:
            await self.lock_service.release(
                key=self.resource_key,
                lease_id=self.lease_id
            )
            self.lease_id = None
            self._stop_renewal_loop()

    def _start_renewal_loop(self):
        """Renew the lease periodically to prevent expiry."""
        async def renew():
            while self.lease_id:
                try:
                    await self.lock_service.renew_lease(self.lease_id)
                    await asyncio.sleep(10)  # Renew every 10s (TTL is 30s)
                except Exception:
                    # Lost connection; the lease will expire on its own
                    break
        self._renewal_task = asyncio.create_task(renew())

    def _stop_renewal_loop(self):
        if self._renewal_task:
            self._renewal_task.cancel()
            self._renewal_task = None
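Usage might look like the sketch below; passing the fence token with every write (here to the FencedStorage sketch above) is the step that makes the lock safe:

async def update_resource(lock_service, storage, client_id):
    lock = DistributedLock(lock_service, "/locks/resource-123", client_id)
    if await lock.acquire(timeout_seconds=10):
        try:
            # Every write carries the fence token so stale holders are rejected
            storage.apply_write(lock.fence_token, "resource-123", b"new-value")
        finally:
            await lock.release()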
Follow-up Questions:
What happens during network partition?
Clients on minority side can’t acquire new locks
Existing leases expire, preventing split-brain
How do you handle leader election for the lock service itself?
Raft consensus with quorum reads/writes
Redlock controversy: Why is Redis Redlock problematic?
Martin Kleppmann’s analysis: Clock assumptions too strong
Fencing tokens still needed for safety
Problem 3: Design a Global Payment System
Difficulty: Very Hard
Companies: Stripe, Square, Adyen
Time: 60 minutes
Problem Statement
Design a payment processing system that:
Handles credit card charges globally
Guarantees exactly-once processing
Supports refunds and chargebacks
Processes millions of transactions per day
Solution
┌─────────────────────────────────────────────────────────────────────────────┐
│ PAYMENT SYSTEM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ API Gateway │ │
│ │ (Idempotency) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Payment Service │ │
│ │ (Orchestrator)│ │
│ └────────┬────────┘ │
│ │ │
│ ┌──────────────────────────────┼──────────────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Fraud │ │ Authorization │ │ Risk Engine │ │
│ │ Detection │ │ Service │ │ │ │
│ └────────────┘ └───────┬────────┘ └────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Visa │ │Mastercard│ │ AMEX │ (Payment Networks) │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ STATE MACHINE FOR PAYMENT: │
│ ────────────────────────── │
│ │
│ ┌──────────┐ Auth ┌────────────┐ Capture ┌───────────┐ │
│ │ PENDING │ ──────────► │ AUTHORIZED │ ──────────► │ CAPTURED │ │
│ └──────────┘ └────────────┘ └───────────┘ │
│ │ │ │ │
│ │ │ Void │ Refund │
│ │ ▼ ▼ │
│ │ ┌───────────┐ ┌───────────┐ │
│ └──────────────────►│ FAILED │ │ REFUNDED │ │
│ Decline └───────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
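Enforcing the state machine in code prevents invalid transitions under concurrency; here is a minimal sketch (the transition table mirrors the diagram above, with Void mapped onto FAILED for brevity):

# Allowed transitions derived from the diagram above (illustrative)
VALID_TRANSITIONS = {
    "PENDING": {"AUTHORIZED", "FAILED"},
    "AUTHORIZED": {"CAPTURED", "FAILED"},  # Void -> FAILED
    "CAPTURED": {"REFUNDED"},
    "FAILED": set(),
    "REFUNDED": set(),
}

def transition(current: str, target: str) -> str:
    """Validate and perform a payment state transition."""
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current} -> {target}")
    return target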
Exactly-Once Processing:
# Assumes surrounding definitions: PaymentResult, ConflictError,
# generate_id(), logger, and the db / idempotency_store clients.
class PaymentProcessor:
    async def process_payment(
        self,
        idempotency_key: str,
        amount: int,
        currency: str,
        card_token: str
    ) -> PaymentResult:
        """
        Process payment with exactly-once guarantee.
        """
        # Step 1: Check idempotency store
        existing = await self.idempotency_store.get(idempotency_key)
        if existing:
            if existing.status == "completed":
                return existing.result  # Return cached result
            elif existing.status == "in_progress":
                raise ConflictError("Payment in progress")

        # Step 2: Create payment record (source of truth)
        payment_id = generate_id()
        async with self.db.transaction():
            # Insert payment
            await self.db.insert("payments", {
                "id": payment_id,
                "idempotency_key": idempotency_key,
                "amount": amount,
                "currency": currency,
                "status": "PENDING"
            })
            # Insert into outbox for event publication
            await self.db.insert("outbox", {
                "event": "payment.created",
                "payload": {"payment_id": payment_id}
            })

        try:
            # Step 3: Call payment network
            auth_result = await self.call_payment_network(
                amount=amount,
                currency=currency,
                card_token=card_token
            )
            # Step 4: Update payment status
            async with self.db.transaction():
                await self.db.update(
                    "payments",
                    {"id": payment_id},
                    {"status": "AUTHORIZED" if auth_result.success else "FAILED",
                     "network_reference": auth_result.reference}
                )
                await self.db.insert("outbox", {
                    "event": "payment.authorized" if auth_result.success else "payment.failed",
                    "payload": {"payment_id": payment_id}
                })

            result = PaymentResult(
                payment_id=payment_id,
                success=auth_result.success,
                decline_code=auth_result.decline_code
            )

            # Step 5: Cache in idempotency store
            await self.idempotency_store.set(
                idempotency_key,
                {"status": "completed", "result": result},
                ttl=24 * 3600  # 24 hours
            )
            return result

        except Exception as e:
            # Network failure - payment status uncertain.
            # Keep as PENDING; a reconciliation job will resolve it.
            logger.error(f"Payment network error: {e}")
            raise
Key Design Decisions:

Decision               Rationale
─────────────────────  ────────────────────────────────────────────────────────
Transactional outbox   Database write and event publish in the same transaction
Idempotency keys       Client-provided; prevents duplicate charges
State machine          Clear payment lifecycle; no invalid transitions
Reconciliation         Async job compares our state with the payment networks
Saga for refunds       Multi-step with compensation on failure
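The transactional outbox only helps if a relay ships committed rows to the message bus; a hypothetical polling relay might look like this (the table schema, $1 placeholder style, and publisher interface are assumptions):

import asyncio

async def outbox_relay(db, publisher, poll_interval: float = 1.0):
    """Poll the outbox table and publish pending events in insertion order."""
    while True:
        rows = await db.query(
            "SELECT id, event, payload FROM outbox "
            "WHERE sent = false ORDER BY id LIMIT 100"
        )
        for row in rows:
            # Publish first, then mark sent: a crash in between causes a
            # duplicate publish, which is why consumers must be idempotent.
            await publisher.publish(row["event"], row["payload"])
            await db.execute("UPDATE outbox SET sent = true WHERE id = $1", row["id"])
        await asyncio.sleep(poll_interval)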
Problem 4: Design a Distributed Task Scheduler
Difficulty: Hard
Companies: Uber (Cadence), Airbnb, LinkedIn
Time: 45 minutes
Problem Statement
Design a system that:
Schedules tasks to run at specific times (cron-like)
Executes one-time delayed tasks
Guarantees at-least-once execution
Scales horizontally
Solution
┌─────────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED TASK SCHEDULER │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ARCHITECTURE: │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ SCHEDULER LAYER │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Scheduler 1 │ │ Scheduler 2 │ │ Scheduler N │ │ │
│ │ │ (Leader) │ │ (Follower) │ │ (Follower) │ │ │
│ │ └──────┬───────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │ │
│ │ │ Leader election via consensus │ │
│ │ │ (Only leader schedules tasks) │ │
│ │ │ │ │
│ └─────────┼──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ TASK QUEUE │ │
│ │ │ │
│ │ Ready Queue (sorted by execution_time): │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ task_1 (run at 10:00) │ task_2 (10:01) │ task_3 (10:05) │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Delay Queue (Redis ZSET or dedicated delay queue): │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ task_4 (due 11:00) │ task_5 (due 12:00) │ ... │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ WORKER LAYER │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Worker 1 │ │ Worker 2 │ │ Worker N │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ - Pulls task │ │ - Pulls task │ │ - Pulls task │ │ │
│ │ │ - Executes │ │ - Executes │ │ - Executes │ │ │
│ │ │ - Acks/Fails │ │ - Acks/Fails │ │ - Acks/Fails │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ TASK DATA MODEL: │
│ ──────────────── │
│ { │
│ "id": "task-123", │
│ "type": "send_email", │
│ "payload": {"to": "[email protected] ", ...}, │
│ "schedule_time": 1699900000, │
│ "status": "PENDING|RUNNING|COMPLETED|FAILED", │
│ "retry_count": 0, │
│ "max_retries": 3, │
│ "created_at": 1699800000 │
│ } │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
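Since the diagram suggests a Redis ZSET for the delay queue, the leader's promotion loop might look like this sketch (redis-py calls; the queue names are assumptions):

import time
import redis

r = redis.Redis()

def promote_due_tasks(delay_queue: str = "delay_queue",
                      ready_queue: str = "ready_queue") -> int:
    """Move tasks whose due time has passed from the delay ZSET to the ready list."""
    now = time.time()
    due = r.zrangebyscore(delay_queue, 0, now)  # members are scored by due timestamp
    for task_id in due:
        # One pipeline per task keeps each move atomic enough for a single
        # leader; a Lua script would make it fully atomic.
        pipe = r.pipeline()
        pipe.zrem(delay_queue, task_id)
        pipe.rpush(ready_queue, task_id)
        pipe.execute()
    return len(due)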
Handling At-Least-Once Semantics:
from datetime import datetime, timedelta

# Assumes surrounding definitions: Task, RetryableError, and the
# queue / store clients.
class TaskWorker:
    async def process_task(self, task: Task):
        """
        Process a task with an at-least-once guarantee.
        Idempotency is the handler's responsibility.
        """
        # Set visibility timeout (task invisible to other workers)
        await self.queue.set_visibility_timeout(task.id, seconds=300)
        try:
            # Execute the task
            handler = self.handlers[task.type]
            await handler(task.payload)
            # Mark as complete
            await self.store.update_task(task.id, status="COMPLETED")
            await self.queue.ack(task.id)
        except RetryableError as e:
            if task.retry_count < task.max_retries:
                # Reschedule with exponential backoff
                next_run = datetime.now() + timedelta(
                    seconds=2 ** task.retry_count * 60
                )
                await self.store.update_task(
                    task.id,
                    status="PENDING",
                    schedule_time=next_run,
                    retry_count=task.retry_count + 1
                )
                await self.queue.nack(task.id)
            else:
                # Max retries exceeded
                await self.store.update_task(
                    task.id,
                    status="FAILED",
                    error=str(e)
                )
                await self.queue.ack(task.id)
        except Exception as e:
            # Non-retryable error
            await self.store.update_task(
                task.id,
                status="FAILED",
                error=str(e)
            )
            await self.queue.ack(task.id)
Problem 5: Design a Distributed Unique ID Generator
Difficulty: Medium
Companies: Twitter (Snowflake), Instagram
Time: 30 minutes
Problem Statement
Design a system that generates globally unique IDs with:
64-bit IDs
Sortable by time (roughly)
No coordination between nodes
Millions of IDs per second
Solution
┌─────────────────────────────────────────────────────────────────────────────┐
│ SNOWFLAKE ID FORMAT (64 bits) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────┬───────────────────────────┬────────────┬──────────────────────┐ │
│ │ 0 │ 41 bits │ 10 bits │ 12 bits │ │
│ │ │ Timestamp (ms) │ Node ID │ Sequence Number │ │
│ │ │ (69 years from epoch) │ (1024 nodes)│ (4096 per ms) │ │
│ └─────┴───────────────────────────┴────────────┴──────────────────────┘ │
│ │
│ CAPACITY: │
│ ───────── │
│ - 1024 nodes │
│ - 4096 IDs per millisecond per node │
│ - 4.1 million IDs per second per node │
│ - 69 years before timestamp overflow │
│ │
│  EXAMPLE:                                                                   │
│  ────────                                                                   │
│  Timestamp: 1699900000000 ms   (41 bits, after subtracting the epoch)       │
│  Node ID:   42                 (10 bits)                                    │
│  Sequence:  0                  (12 bits)                                    │
│                                                                             │
│  ID = ((1699900000000 - 1288834974657) << 22) | (42 << 12) | 0              │
│     = 1724131680056418304      (using Twitter's epoch)                      │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Implementation:
import time
import threading

class ClockMovedBackwardsError(Exception):
    """Raised when the system clock moves backwards between generations."""

class SnowflakeGenerator:
    def __init__(self, node_id: int, epoch: int = 1288834974657):
        """
        Initialize Snowflake ID generator.

        Args:
            node_id: Unique identifier for this generator (0-1023)
            epoch: Custom epoch in ms (Twitter's is 2010-11-04)
        """
        if not 0 <= node_id < 1024:
            raise ValueError("Node ID must be between 0 and 1023")
        self.node_id = node_id
        self.epoch = epoch
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()

    def _current_time_millis(self) -> int:
        return int(time.time() * 1000)

    def generate(self) -> int:
        """Generate a unique Snowflake ID."""
        with self._lock:
            timestamp = self._current_time_millis()
            if timestamp < self.last_timestamp:
                # Clock went backwards! Refuse to generate until it catches up.
                raise ClockMovedBackwardsError(
                    f"Clock moved backwards by {self.last_timestamp - timestamp}ms"
                )
            if timestamp == self.last_timestamp:
                # Same millisecond, increment sequence
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # Sequence exhausted, wait for next millisecond
                    timestamp = self._wait_next_millis(timestamp)
            else:
                # New millisecond, reset sequence
                self.sequence = 0
            self.last_timestamp = timestamp
            # Construct the ID: timestamp | node | sequence
            return (
                ((timestamp - self.epoch) << 22) |
                (self.node_id << 12) |
                self.sequence
            )

    def _wait_next_millis(self, last_timestamp: int) -> int:
        """Spin until we reach the next millisecond."""
        timestamp = self._current_time_millis()
        while timestamp <= last_timestamp:
            timestamp = self._current_time_millis()
        return timestamp

    @staticmethod
    def parse(id: int) -> dict:
        """Parse a Snowflake ID into its components (assumes Twitter's epoch)."""
        return {
            "timestamp": (id >> 22) + 1288834974657,
            "node_id": (id >> 12) & 0x3FF,
            "sequence": id & 0xFFF
        }
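Quick usage check (the parsed timestamp assumes the default Twitter epoch):

gen = SnowflakeGenerator(node_id=42)
new_id = gen.generate()
print(new_id)                            # a 63-bit, roughly time-sortable integer
print(SnowflakeGenerator.parse(new_id))  # {'timestamp': ..., 'node_id': 42, 'sequence': ...}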
Follow-up Questions:
How do you assign node IDs?
Zookeeper/etcd for coordination
Use MAC address hash
Use Kubernetes pod ordinal
What if clock goes backwards?
Throw exception (let operator fix NTP)
Wait for clock to catch up
Use logical clock component
How do you handle datacenter failover?
Split node ID: 5 bits datacenter + 5 bits machine
Different epoch per datacenter
Problem 6: Design a Distributed Message Queue
Difficulty: Very Hard
Companies: Confluent, AWS (SQS), LinkedIn
Time: 60 minutes
Problem Statement
Design a message queue like Kafka that:
Handles millions of messages per second
Provides ordering guarantees (per partition)
Supports multiple consumers
Persists messages for replay
Key Design Points
┌─────────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED MESSAGE QUEUE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ TOPIC: orders (3 partitions, replication factor 3) │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Partition 0 Partition 1 Partition 2 │ │
│ │ (Leader: B1) (Leader: B2) (Leader: B3) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │[0][1][2][3] │ │[0][1][2][3] │ │[0][1][2][3] │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ Broker 1 (L) Broker 2 (L) Broker 3 (L) │ │
│ │ Broker 2 (F) Broker 3 (F) Broker 1 (F) │ │
│ │ Broker 3 (F) Broker 1 (F) Broker 2 (F) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ CONSUMER GROUP: order-processors │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Consumer 1 Consumer 2 Consumer 3 │ │
│ │ (Partition 0) (Partition 1) (Partition 2) │ │
│ │ │ │
│ │ Each partition assigned to exactly ONE consumer in group │ │
│ │ If Consumer 2 dies, Partition 1 reassigned to 1 or 3 │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ GUARANTEES: │
│ ─────────── │
│ - Ordering: Within a partition only │
│ - Delivery: At-least-once (consumer commits offset after processing) │
│ - Durability: Message persisted to N replicas before ack │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
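A sketch of the consumer loop those guarantees imply; committing the offset only after processing yields at-least-once delivery (the broker methods here are placeholders for the wire protocol, not a real client API):

class PartitionConsumer:
    """At-least-once consumer: commit the offset only after processing."""

    def __init__(self, broker, topic: str, partition: int, group: str):
        self.broker = broker
        self.topic = topic
        self.partition = partition
        self.group = group

    def run(self, handler):
        # Resume from the group's last committed offset
        offset = self.broker.fetch_committed_offset(
            self.group, self.topic, self.partition)
        while True:
            batch = self.broker.fetch(
                self.topic, self.partition, offset, max_messages=500)
            for msg in batch:
                handler(msg)  # may re-run after a crash, so it must be idempotent
                offset = msg.offset + 1
            # Commit AFTER processing: a crash before this line replays the batch
            self.broker.commit_offset(
                self.group, self.topic, self.partition, offset)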
More Practice Problems
Design Uber/Lyft
Topics: Real-time location, matching, surge pricing
Key challenges: Geospatial indexing, low-latency dispatch
Design Twitter Timeline
Topics: Fan-out, feed ranking, real-time updates
Key challenges: Hot users, read-heavy workload
Design Google Docs
Topics: CRDT, operational transformation, collaboration
Key challenges: Conflict resolution, real-time sync
Design Netflix
Topics: CDN, video encoding, recommendations
Key challenges: Global scale, adaptive streaming
Interview Tips
How to Structure Your Answer
Clarify requirements (5 min)
Scale: Users, requests/second, data size
Consistency: Strong or eventual?
Latency: What’s acceptable?
High-level design (10 min)
Draw boxes and arrows
Identify core components
Explain data flow
Deep dive (20 min)
Pick 2-3 most critical components
Discuss algorithms, data structures
Explain trade-offs
Handle edge cases (10 min)
Failures: Node crashes, network partitions
Scale: Hot spots, bottlenecks
Consistency: Race conditions
What Interviewers Look For
Breadth: Can you identify all components?
Depth: Can you go deep on key areas?
Trade-offs: Do you understand CAP, consistency vs availability?
Practicality: Is your design buildable?
Communication: Can you explain clearly?
Common Mistakes to Avoid
Starting to code before understanding the problem
Not asking clarifying questions
Ignoring scale constraints
Over-engineering simple problems
Forgetting failure modes
Not discussing trade-offs