Overview

Every large-scale system is built from common building blocks. Understanding these components and when to use them is essential for system design.

Load Balancers

Distribute traffic across multiple servers for scalability and reliability. Think of a load balancer as a restaurant host: customers (requests) arrive at the door, and the host decides which waiter (server) to assign them to based on who is least busy, who covers which section, or simply who is next in rotation. Without the host, all customers would crowd around the first waiter they see, leaving others idle.

Load Balancing Algorithms

| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Rotate through servers | Equal server capacity |
| Weighted Round Robin | Based on server capacity | Different server specs |
| Least Connections | Send to least busy | Variable request duration |
| IP Hash | Same client → same server | Session affinity |
| Least Response Time | Fastest server | Performance-critical |
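
To make the table concrete, here is a minimal sketch of four of these algorithms as plain Python selectors. The class names, the `pick`/`acquire`/`release` methods, and the server labels are illustrative assumptions, not the API of any real load balancer, and the weighted variant uses the naive "repeat each server by its weight" approach rather than a smoothed schedule.

```python
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Cycle through servers in a fixed order."""

    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighting: each server appears `weight` times per rotation."""

    def __init__(self, weights: Dict[str, int]):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest open connections."""

    def __init__(self, servers: List[str]):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a stable server (session affinity)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that plain modulo hashing remaps most clients whenever the server list changes; consistent hashing is the usual remedy when that matters.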

Layer 4 vs Layer 7

Layer 7 (Application)              Layer 4 (Transport)
┌─────────────────────┐           ┌─────────────────────┐
│ HTTP/HTTPS aware    │           │ TCP/UDP only        │
│ URL-based routing   │           │ IP + Port routing   │
│ SSL termination     │           │ Faster (less work)  │
│ Content switching   │           │ No content awareness│
│ Request modification│           │ Simple forwarding   │
└─────────────────────┘           └─────────────────────┘
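
The difference comes down to what the balancer can see when it makes its decision. The sketch below contrasts the two under simplifying assumptions: `route_l4` only receives the connection 4-tuple, while `route_l7` receives the request path. Both function names and the pool names in the usage are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def route_l4(conn: Tuple[str, int, str, int], backends: List[str]) -> str:
    """Layer 4: only source/destination address and port are visible."""
    key = "|".join(map(str, conn)).encode()
    return backends[zlib.crc32(key) % len(backends)]


def route_l7(path: str, routes: Dict[str, str], default: str) -> str:
    """Layer 7: the URL is visible, so the longest matching prefix wins."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)] if matches else default
```

For example, `route_l7("/api/orders/1", {"/api": "api-pool", "/api/orders": "orders-pool"}, "web-pool")` selects `orders-pool`, a decision a Layer 4 balancer cannot make because it never parses the request.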

Caching

Store frequently accessed data in fast storage to reduce latency and database load. Caching is the system design equivalent of keeping your most-used cooking ingredients on the counter instead of fetching them from the pantry every time. It is the single most impactful optimization you can make in most systems — a well-placed cache can reduce database load by 90% or more and drop response times from hundreds of milliseconds to single digits. The fundamental trade-off: you are trading memory (expensive, limited) for speed, and you are accepting the risk that cached data might be stale. Every caching decision is really a question of “how stale can this data be before it causes a problem?”

Cache Layers

┌─────────────────────────────────────────────────────────────┐
│                      Application                            │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────▼─────────────────────────────┐
│                    L1: In-Memory Cache                     │
│                    (Application RAM)                       │
│                    Latency: <1ms                          │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                   L2: Distributed Cache                    │
│                   (Redis/Memcached)                        │
│                   Latency: ~5ms                           │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                      L3: Database                          │
│                   Latency: ~50-100ms                       │
└─────────────────────────────────────────────────────────────┘
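
The lookup order in the diagram can be sketched as follows. This is a simplified illustration: plain dictionaries stand in for the in-process cache and for Redis/Memcached, and `TieredCache` and `load_from_db` are hypothetical names.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class TieredCache:
    """Check L1, then L2, then the database, filling faster tiers on the way back."""

    def __init__(self, load_from_db: Callable[[str], Optional[Any]]):
        self.l1: Dict[str, Any] = {}  # stand-in for application RAM
        self.l2: Dict[str, Any] = {}  # stand-in for Redis/Memcached
        self.load_from_db = load_from_db

    def get(self, key: str) -> Tuple[Optional[Any], str]:
        """Return (value, tier that answered)."""
        if key in self.l1:
            return self.l1[key], "L1"
        if key in self.l2:
            self.l1[key] = self.l2[key]  # promote to the faster tier
            return self.l1[key], "L2"
        value = self.load_from_db(key)
        if value is not None:
            self.l2[key] = value
            self.l1[key] = value
        return value, "DB"
```

A real L1 would also need a size bound and an expiry policy, since each application server holds its own copy and copies can drift apart.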

Caching Strategies

Cache-Aside (Lazy Loading)

The most common pattern. Application manages the cache explicitly.
import redis
import json
from typing import Optional, Any
from functools import wraps

class CacheAsideService:
    """Cache-Aside pattern implementation with Redis"""
    
    def __init__(self, redis_client: redis.Redis, db, default_ttl: int = 3600):
        self.cache = redis_client
        self.db = db
        self.default_ttl = default_ttl
    
    def get_user(self, user_id: str) -> Optional[dict]:
        cache_key = f"user:{user_id}"
        
        # 1. Try cache first
        cached = self.cache.get(cache_key)
        if cached:
            print(f"Cache HIT for {cache_key}")
            return json.loads(cached)
        
        # 2. Cache miss - query database
        print(f"Cache MISS for {cache_key}")
        user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
        
        if user is None:
            return None
        
        # 3. Populate cache
        self.cache.setex(
            cache_key,
            self.default_ttl,
            json.dumps(user)
        )
        
        return user
    
    def update_user(self, user_id: str, data: dict) -> bool:
        """Update DB and invalidate cache (not update!)"""
        cache_key = f"user:{user_id}"
        
        # Update database
        self.db.update("UPDATE users SET name=%s WHERE id=%s", 
                       (data['name'], user_id))
        
        # Invalidate cache (next read will refresh it)
        self.cache.delete(cache_key)
        return True


# Decorator version for any function
def cache_aside(ttl: int = 3600):
    """Decorator for cache-aside pattern"""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
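            # Include kwargs so keyword calls do not share a key with other arguments
            key_parts = [*map(str, args), *(f"{k}={v}" for k, v in sorted(kwargs.items()))]
            cache_key = f"{func.__name__}:{':'.join(key_parts)}"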
            
            cached = self.cache.get(cache_key)
            if cached:
                return json.loads(cached)
            
            result = func(self, *args, **kwargs)
            
            if result is not None:
                self.cache.setex(cache_key, ttl, json.dumps(result))
            
            return result
        return wrapper
    return decorator


# Usage with decorator
class ProductService:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
    
    @cache_aside(ttl=1800)  # Cache for 30 minutes
    def get_product(self, product_id: str) -> dict:
        return self.db.query("SELECT * FROM products WHERE id = %s", (product_id,))

Write-Through

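Write to the database and the cache in the same operation, so reads after a successful write see fresh data. The two stores do not share a transaction, so consistency is best-effort rather than guaranteed.
import json

class WriteThroughCache:
    """Write-Through: update DB and cache together on every write"""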
    
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
    
    def update_user(self, user_id: str, data: dict) -> bool:
        cache_key = f"user:{user_id}"
        
        # Redis cannot join the DB transaction, so order the writes such that
        # uncommitted data never reaches the cache.
        try:
            # 1. Update and commit the database first (source of truth)
            self.db.begin_transaction()
            self.db.update("UPDATE users SET data = %s WHERE id = %s",
                          (json.dumps(data), user_id))
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise
        
        try:
            # 2. Write the committed value to the cache
            self.cache.setex(cache_key, 3600, json.dumps(data))
        except Exception:
            # Best effort: drop the key so readers fall back to the database
            self.cache.delete(cache_key)
            raise
        return True
    
    def get_user(self, user_id: str) -> dict:
        cache_key = f"user:{user_id}"
        
        # Always check cache first (it's always up-to-date)
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)
        
        # Cache miss (cold start or expired)
        user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
        if user:
            self.cache.setex(cache_key, 3600, json.dumps(user))
        return user

Write-Behind (Write-Back)

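Write to cache immediately, persist to database asynchronously. This gives the fastest writes, but anything still in the buffer is lost if the process dies before the next flush.
import asyncio
import json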
from collections import deque
from typing import Dict, Any
import time

class WriteBehindCache:
    """
    Write-Behind: Cache writes immediately, DB writes async.
    Great for high-write scenarios (analytics, counters).
    """
    
    def __init__(self, cache, db, flush_interval: int = 5):
        self.cache = cache
        self.db = db
        self.write_buffer: deque = deque()
        self.flush_interval = flush_interval
        self._start_background_writer()
    
    async def update(self, key: str, value: Any) -> bool:
        """Write to cache immediately, queue DB write"""
        # 1. Update cache (instant!)
        self.cache.set(key, json.dumps(value))
        
        # 2. Add to write buffer
        self.write_buffer.append({
            'key': key,
            'value': value,
            'timestamp': time.time()
        })
        
        return True  # Returns immediately!
    
    async def increment_counter(self, key: str, amount: int = 1) -> int:
        """Perfect for view counts, like counts, etc."""
        new_value = self.cache.incrby(key, amount)
        
        # Batch counter updates (write once per interval)
        self.write_buffer.append({
            'key': key,
            'value': new_value,
            'timestamp': time.time(),
            'type': 'counter'
        })
        
        return new_value
    
    def _start_background_writer(self):
        """Background task that flushes writes to DB.

        Requires a running event loop, so create this object inside a coroutine.
        """
        async def flush_loop():
            while True:
                await asyncio.sleep(self.flush_interval)
                await self._flush_to_db()
        
        # Keep a reference so the task is not garbage-collected while running
        self._flush_task = asyncio.create_task(flush_loop())
    
    async def _flush_to_db(self):
        """Batch write pending updates to database"""
        if not self.write_buffer:
            return
        
        batch = []
        while self.write_buffer:
            batch.append(self.write_buffer.popleft())
        
        # Coalesce: only the latest value per key needs to reach the DB
        latest = {item['key']: item for item in batch}
        
        # Batch insert/update for efficiency
        try:
            await self.db.execute_batch(
                "INSERT INTO cache_data (key, value) VALUES (%s, %s) "
                "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
                [(item['key'], json.dumps(item['value'])) for item in latest.values()]
            )
            print(f"Flushed {len(latest)} writes to database")
        except Exception as e:
            # Put failed writes back at the front so newer writes stay behind them
            self.write_buffer.extendleft(reversed(batch))
            print(f"Flush failed, items re-queued: {e}")


# Usage: View counter (the class needs a running event loop)
async def main():
    cache = WriteBehindCache(redis_client, db, flush_interval=10)
    await cache.increment_counter("views:article:123")  # Instant!
    # DB is updated every 10 seconds with batched writes

Cache Invalidation

| Strategy | Description | Trade-off |
|---|---|---|
| TTL | Expire after time | Simple but may serve stale |
| Event-based | Invalidate on write | Complex but consistent |
| Version tags | Change key on update | Wastes memory |
Cache Invalidation Best Practices:
  1. Prefer invalidation over updating (avoids race conditions)
  2. Use events/pub-sub for distributed cache invalidation
  3. Set reasonable TTLs as a safety net
  4. Monitor cache hit rates (target >90%)
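
The TTL and version-tag strategies can be combined, as in the sketch below. It is an in-memory illustration only: `VersionedTTLCache`, its method names, and the injectable `clock` parameter are assumptions made for the example, and the dictionary stands in for a real cache server.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class VersionedTTLCache:
    """Version tags for explicit invalidation, TTL as the safety net."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.versions: Dict[str, int] = {}
        self.store: Dict[str, Tuple[float, Any]] = {}

    def _key(self, name: str) -> str:
        return f"{name}:v{self.versions.get(name, 0)}"

    def put(self, name: str, value: Any) -> None:
        self.store[self._key(name)] = (self.clock() + self.ttl, value)

    def get(self, name: str) -> Optional[Any]:
        key = self._key(name)
        entry = self.store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # TTL expired
            del self.store[key]
            return None
        return value

    def invalidate(self, name: str) -> None:
        """Bump the version; the old entry becomes unreachable but still uses memory."""
        self.versions[name] = self.versions.get(name, 0) + 1
```

The orphaned old-version entry is exactly the "wastes memory" trade-off from the table; in a real cache it lingers until its TTL or the eviction policy removes it.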

Message Queues

Enable asynchronous communication and decouple components. Think of a message queue like a post office: the sender drops off a letter (message) and goes about their day without waiting for the recipient to read it. The post office (queue) holds the letter until the recipient is ready to pick it up. This decoupling means the sender and recipient do not need to be available at the same time, and a surge of incoming mail does not overwhelm the recipient; it just queues up.

This is one of the most powerful patterns for handling traffic spikes. Instead of your web server waiting synchronously for a slow operation (sending an email, resizing an image, generating a report), it drops a message on the queue and responds to the user immediately. A pool of background workers processes the queue at its own pace.

Producer ──► Queue ──► Consumer

┌──────────┐     ┌─────────────────┐     ┌──────────┐
│ Service  │────►│   Message       │────►│  Worker  │
│    A     │     │   Queue         │     │    1     │
└──────────┘     │                 │     └──────────┘
                 │  ┌───┬───┬───┐  │     ┌──────────┐
                 │  │msg│msg│msg│  │────►│  Worker  │
                 │  └───┴───┴───┘  │     │    2     │
                 └─────────────────┘     └──────────┘

When to Use

Async Processing

Email sending, image processing, report generation

Load Leveling

Handle traffic spikes by queuing requests

Decoupling

Services don’t need to know about each other

Reliability

Messages persist if consumer is down

Queue Types

| Type | Description | Use Case |
|---|---|---|
| Point-to-Point | One consumer per message | Task queues |
| Pub/Sub | Multiple consumers | Event broadcasting |
| Priority Queue | Process by priority | Critical tasks first |
| Dead Letter Queue | Failed messages | Error handling |
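
The delivery semantics of the first three types differ in a way that is easy to see in a few lines. The following in-process sketch uses only the standard library; the class and method names are hypothetical and no broker is involved.

```python
import heapq
import itertools
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional, Tuple


class PointToPointQueue:
    """Each message is handed to exactly one consumer."""

    def __init__(self) -> None:
        self._items: Deque[Any] = deque()

    def send(self, message: Any) -> None:
        self._items.append(message)

    def receive(self) -> Optional[Any]:
        return self._items.popleft() if self._items else None


class PubSub:
    """Every subscriber of a topic gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: Any) -> int:
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of deliveries


class PriorityQueue:
    """Highest priority first; FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()

    def send(self, message: Any, priority: int) -> None:
        # Negate priority because heapq is a min-heap; seq breaks ties in arrival order
        heapq.heappush(self._heap, (-priority, next(self._seq), message))

    def receive(self) -> Optional[Any]:
        return heapq.heappop(self._heap)[2] if self._heap else None
```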

Message Queue Implementation Examples

import redis
import json
import time
import threading
from typing import Callable, Any, Optional
from dataclasses import dataclass
from enum import Enum

class MessagePriority(Enum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3

@dataclass
class Message:
    id: str
    payload: dict
    priority: MessagePriority = MessagePriority.NORMAL
    attempts: int = 0
    max_attempts: int = 3
    created_at: Optional[float] = None  # filled in by __post_init__ when omitted
    
    def __post_init__(self):
        if self.created_at is None:
            self.created_at = time.time()

class MessageQueue:
    """Simplified priority queue on Redis (a teaching sketch, not production-hardened)"""
    
    def __init__(self, redis_client: redis.Redis, queue_name: str):
        self.redis = redis_client
        self.queue_name = queue_name
        self.processing_queue = f"{queue_name}:processing"
        self.dlq = f"{queue_name}:dlq"  # Dead Letter Queue
    
    def publish(self, message: Message) -> bool:
        """Add message to queue with priority"""
        msg_data = json.dumps({
            'id': message.id,
            'payload': message.payload,
            'priority': message.priority.value,
            'attempts': message.attempts,
            'max_attempts': message.max_attempts,
            'created_at': message.created_at
        })
        
        # Use sorted set for priority queue
        # Higher priority = higher score = processed first
        score = message.priority.value * 1e10 + (1e10 - message.created_at)
        self.redis.zadd(self.queue_name, {msg_data: score})
        return True
    
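    def consume(self, handler: Callable[[dict], bool], 
                batch_size: int = 1) -> None:
        """Consume messages, retrying failed ones up to max_attempts.

        Note: a worker that crashes mid-handler loses its message, because
        nothing re-queues expired in-flight markers. True at-least-once
        delivery needs a reaper for those.
        """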
        while True:
            # Get highest priority message
            messages = self.redis.zpopmax(self.queue_name, batch_size)
            
            if not messages:
                time.sleep(0.1)  # No messages, wait
                continue
            
            for msg_data, score in messages:
                message = json.loads(msg_data)
                message['attempts'] += 1
                
                # Record an in-flight marker that expires after the timeout
                self.redis.setex(
                    f"{self.processing_queue}:{message['id']}", 
                    30,  # 30 second processing timeout
                    msg_data
                )
                
                try:
                    success = handler(message['payload'])
                    
                    if success:
                        # Acknowledge: remove from processing
                        self.redis.delete(f"{self.processing_queue}:{message['id']}")
                    else:
                        self._retry_or_dlq(message)
                        
                except Exception as e:
                    print(f"Error processing message: {e}")
                    self._retry_or_dlq(message)
    
    def _retry_or_dlq(self, message: dict):
        """Retry with backoff or send to Dead Letter Queue"""
        self.redis.delete(f"{self.processing_queue}:{message['id']}")
        
        if message['attempts'] >= message['max_attempts']:
            # Move to Dead Letter Queue
            self.redis.lpush(self.dlq, json.dumps(message))
            print(f"Message {message['id']} sent to DLQ after {message['attempts']} attempts")
        else:
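            # Exponential backoff retry. Note: this sleep blocks the whole
            # consumer loop; a delayed-retry set would avoid stalling other messages.
            delay = 2 ** message['attempts']
            time.sleep(delay)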
            
            msg = Message(
                id=message['id'],
                payload=message['payload'],
                priority=MessagePriority(message['priority']),
                attempts=message['attempts'],
                max_attempts=message['max_attempts']
            )
            self.publish(msg)


# Usage Example: Email Service
redis_client = redis.Redis(host='localhost', port=6379)
email_queue = MessageQueue(redis_client, 'emails')

# Producer: Queue an email
email_queue.publish(Message(
    id='email-123',
    payload={
        'to': 'user@example.com',
        'subject': 'Welcome!',
        'body': 'Thanks for signing up...'
    },
    priority=MessagePriority.HIGH
))

# Consumer: Process emails
def send_email(payload: dict) -> bool:
    print(f"Sending email to {payload['to']}")
    # actual email sending logic
    return True

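# Run consumer in a background thread (consume() loops forever)
worker = threading.Thread(target=email_queue.consume, args=(send_email,), daemon=True)
worker.start()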

Content Delivery Network (CDN)

Distribute content globally for faster access. The speed of light is a hard physical limit: a round trip from Tokyo to a US-based origin server takes on the order of 100-150ms, most of it propagation delay that no amount of code optimization can remove. CDNs solve this by placing copies of your content in data centers around the world, so a user in Tokyo hits a server in Tokyo instead. It is like a franchise model for your data: the headquarters (origin) has the master copy, but local branches (edge PoPs) serve most customers directly.
                          ┌─────────────┐
                          │   Origin    │
                          │   Server    │
                          └──────┬──────┘

     ┌───────────────────────────┼───────────────────────────┐
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│  Edge   │                │   Edge    │               │   Edge    │
│  (USA)  │                │  (Europe) │               │  (Asia)   │
└────┬────┘                └─────┬─────┘               └─────┬─────┘
     │                           │                           │
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│US Users │                │ EU Users  │               │Asia Users │
│ ~20ms   │                │  ~20ms    │               │  ~20ms    │
└─────────┘                └───────────┘               └───────────┘

Without CDN: All users → Origin (100-300ms for distant users)

CDN Strategies

| Strategy | Description | Best For |
|---|---|---|
| Push | Upload to CDN proactively | Static content, known files |
| Pull | CDN fetches on first request | Dynamic content, large catalogs |
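
The two strategies differ only in when the edge obtains its copy. A minimal sketch, assuming a dictionary stands in for the origin server and `EdgeCache` is a hypothetical name for one edge location:

```python
from typing import Dict


class EdgeCache:
    """One CDN edge location that can be filled by push or by pull."""

    def __init__(self, origin: Dict[str, bytes]):
        self.origin = origin               # stand-in for the origin server
        self.cached: Dict[str, bytes] = {}
        self.origin_fetches = 0            # how often a user request reached the origin

    def push(self, path: str) -> None:
        """Push: upload a known file to the edge before anyone asks for it."""
        self.cached[path] = self.origin[path]

    def get(self, path: str) -> bytes:
        """Pull: on a miss, fetch from the origin once and keep the copy."""
        if path not in self.cached:
            self.origin_fetches += 1
            self.cached[path] = self.origin[path]
        return self.cached[path]
```

With pull, the first visitor in each region pays the origin round trip; with push, nobody does, at the cost of uploading files that may never be requested. A real edge would also expire entries, which this sketch omits.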

API Gateway

Single entry point for all client requests. In a microservices architecture, an API gateway acts like the front desk of a large hotel — guests (clients) do not wander the hallways looking for housekeeping, room service, or the concierge individually. Instead, they make one call to the front desk, which routes the request to the right department, handles authentication (“Are you a guest here?”), enforces rate limits (“Please do not call us 100 times per minute”), and can even aggregate responses from multiple departments into a single reply.
┌──────────────────────────────────────────────────────────────┐
│                       API Gateway                            │
├──────────────────────────────────────────────────────────────┤
│  • Authentication         • Rate Limiting                    │
│  • SSL Termination        • Request Routing                  │
│  • Load Balancing         • Response Caching                 │
│  • API Versioning         • Request/Response Transform       │
│  • Analytics & Logging    • Circuit Breaking                 │
└──────────────────────────────────────────────────────────────┘
           │              │              │              │
    ┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
    │   Users     │ │  Orders   │ │  Products │ │  Payments │
    │   Service   │ │  Service  │ │  Service  │ │  Service  │
    └─────────────┘ └───────────┘ └───────────┘ └───────────┘
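
Three of the responsibilities in the box (authentication, rate limiting, request routing) fit in a short sketch. Everything here is illustrative: `Gateway`, `TokenBucket`, the API-key check, and the prefix-based routes are assumptions for the example, and backend services are modeled as plain callables.

```python
import time
from typing import Callable, Dict, Set, Tuple


class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Authenticate, rate-limit per API key, then route by path prefix."""

    def __init__(self, routes: Dict[str, Callable[[str], str]], api_keys: Set[str],
                 new_bucket: Callable[[], TokenBucket]):
        self.routes = routes
        self.api_keys = api_keys
        self.new_bucket = new_bucket
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, api_key: str, path: str) -> Tuple[int, str]:
        if api_key not in self.api_keys:             # authentication
            return 401, "unauthorized"
        if api_key not in self.buckets:
            self.buckets[api_key] = self.new_bucket()
        if not self.buckets[api_key].allow():        # rate limiting
            return 429, "rate limit exceeded"
        for prefix, service in self.routes.items():  # request routing
            if path.startswith(prefix):
                return 200, service(path)
        return 404, "no route"
```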

Database Replication

Copy data across multiple servers for availability and read scaling. Replication is your insurance policy against data loss and your scaling strategy for read-heavy workloads. The core trade-off is between consistency and latency: synchronous replication guarantees every replica has the latest data but slows down every write (the write is only confirmed after all replicas acknowledge it); asynchronous replication keeps writes fast but means replicas might serve slightly stale data for a brief window.

Master-Slave Replication

     Writes                           Reads
        │                               │
        ▼                               ▼
   ┌─────────┐                   ┌───────────┐
   │ Master  │                   │  Slave 1  │
   │  (RW)   │──── Replication ──│   (RO)    │
   └─────────┘         │         └───────────┘

                       │         ┌───────────┐
                       └─────────│  Slave 2  │
                                 │   (RO)    │
                                 └───────────┘
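
The read/write split and the staleness window of asynchronous replication can be sketched with dictionaries standing in for database nodes. `ReplicatedStore` and its methods are hypothetical names; the sketch calls the nodes primary and replicas, which correspond to master and slaves in the diagram.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple


class ReplicatedStore:
    """Writes go to the primary; reads rotate across read-only replicas."""

    def __init__(self, replica_count: int):
        self.primary: Dict[str, Any] = {}
        self.replicas: List[Dict[str, Any]] = [{} for _ in range(replica_count)]
        self._pending: List[Tuple[str, Any]] = []  # changes not yet replicated
        self._next = itertools.cycle(range(replica_count))

    def write(self, key: str, value: Any, synchronous: bool = False) -> None:
        self.primary[key] = value
        self._pending.append((key, value))
        if synchronous:
            # Synchronous: confirm only after every replica has the change
            self.replicate()

    def replicate(self) -> None:
        """Apply pending changes to all replicas (runs later in async mode)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def read(self, key: str) -> Optional[Any]:
        return self.replicas[next(self._next)].get(key)
```

After an asynchronous `write`, a `read` returns the old value (or `None`) until `replicate` runs, which is the stale-read window described above; a synchronous write closes that window at the cost of a slower write.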

Multi-Master Replication

   ┌─────────┐            ┌─────────┐
   │ Master  │◄──────────►│ Master  │
   │   1     │  Sync      │   2     │
   │  (RW)   │            │  (RW)   │
   └─────────┘            └─────────┘
       │                       │
       ▼                       ▼
   ┌─────────┐            ┌─────────┐
   │ Slave 1 │            │ Slave 2 │
   └─────────┘            └─────────┘

Comparison Summary

| Component | Purpose | When to Use |
|---|---|---|
| Load Balancer | Distribute traffic | Multiple servers |
| Cache | Speed up reads | Hot data, expensive queries |
| Message Queue | Async processing | Decoupling, spike handling |
| CDN | Global content delivery | Static assets, global users |
| API Gateway | Single entry point | Microservices, security |
| DB Replication | Availability & reads | High availability, read-heavy |

Design Tip: Don’t add components just because they’re common. Each adds complexity. Start simple and add components as specific problems arise. In interviews, this translates to: start with the simplest architecture that could work, then say “As we scale to X users, the bottleneck will be Y, so I would add Z.” This shows the interviewer you understand why each component exists, not just that you memorized a reference architecture.
Scalability Mental Model: When reasoning about which building blocks to introduce, think in orders of magnitude. At 100 QPS, a single server with a database is fine. At 1,000 QPS, you likely need a cache and a load balancer. At 10,000 QPS, you need database replication and possibly a CDN. At 100,000+ QPS, you are looking at sharding, message queues for async processing, and multi-region deployment. Each jump in scale introduces a new bottleneck that the next building block addresses.
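
As a rough illustration, the thresholds in the mental model can be written down as a lookup. The cut-offs are the ones quoted above and are rules of thumb rather than hard limits; `suggested_components` is a hypothetical helper name.

```python
from typing import List, Tuple

# (QPS threshold, building blocks that typically become necessary at that scale)
TIERS: List[Tuple[int, List[str]]] = [
    (1_000, ["cache", "load balancer"]),
    (10_000, ["database replication", "CDN (possibly)"]),
    (100_000, ["sharding", "message queues", "multi-region deployment"]),
]


def suggested_components(qps: int) -> List[str]:
    """Start from the simplest setup and add blocks as each threshold is crossed."""
    components = ["single app server", "database"]
    for threshold, additions in TIERS:
        if qps >= threshold:
            components.extend(additions)
    return components
```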

Interview Deep-Dive

Strong Answer: I think about this in orders of magnitude, and at each stage the bottleneck shifts to a different component.

Stage 1: 0 to 1,000 QPS (launch to ~1M DAU). Single app server, single Postgres database, Nginx as a reverse proxy. This handles a surprising amount of traffic: Postgres comfortably serves 1,000-5,000 simple queries per second. Total cost: maybe $200/month on a decent cloud instance. Do not add complexity prematurely.

Stage 2: 1,000 to 10,000 QPS (~1M to 10M DAU). The database becomes the bottleneck. First add a Redis cache in front of Postgres for the product catalog, user sessions, and cart data. A single Redis instance handles 100K+ reads/sec. A cache hit ratio of 90% means Postgres only sees 100-1,000 QPS. Next, add a load balancer with 3-5 stateless app servers. Add read replicas to Postgres (1 primary, 2 replicas) and route read traffic to replicas. Total cost: ~$2,000-5,000/month.

Stage 3: 10,000 to 100,000 QPS (~10M to 100M DAU). Network and bandwidth become factors. Add a CDN for static assets (product images, CSS, JS). This offloads 60-70% of total bandwidth from your servers. Add a message queue (Kafka or SQS) for async processing: order confirmation emails, inventory updates, analytics events. This prevents synchronous processing from blocking the checkout flow. Consider database sharding if your dataset exceeds what a single Postgres instance can hold (usually around 1-5TB with good indexing). Total cost: ~$20,000-50,000/month.

Stage 4: 100,000+ QPS (100M+ DAU). Multi-region deployment. API gateway for routing, auth, and rate limiting. Shard the database by user_id or region. Multiple Kafka clusters. A caching tier with both local (in-process) and distributed (Redis cluster) layers. This is where you also need a dedicated search service (Elasticsearch) because Postgres full-text search cannot keep up. Total cost: $200,000+/month.

The key insight: at each stage, you are solving the current bottleneck, not preemptively adding every component.

Follow-up: You are at Stage 2, and your Redis cache goes down. What happens, and how do you prevent a cascading failure?

Without protection, all 10,000 QPS suddenly hit Postgres, which can handle maybe 3,000 QPS. Response times spike, the connection pool exhausts, app servers start timing out, and the load balancer marks them unhealthy. Total outage in 30-60 seconds.

Prevention: First, implement circuit breakers: when the cache miss rate exceeds 50%, start returning stale cached data from a local in-memory fallback (even 30-second-old data is better than a timeout). Second, use Redis Sentinel or Redis Cluster for automatic failover, so a replica promotes to primary within seconds. Third, rate-limit the database connection pool so Postgres degrades gracefully (slower responses) instead of crashing (connection refused). Fourth, pre-warm the cache on recovery. Do not rely on lazy loading, because the “thundering herd” of 10K cache misses all hitting Postgres simultaneously will just crash it again.
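
The circuit-breaker idea with a stale local fallback can be sketched as below. To keep it short, this version trips on consecutive remote-cache failures rather than on a miss-rate threshold, which is a deliberate simplification; `CacheCircuitBreaker`, `fetch`, and the parameter names are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional


class CacheCircuitBreaker:
    """Stop calling a failing cache and serve last-known-good values instead."""

    def __init__(self, fetch: Callable[[str], Any], failure_threshold: int = 3,
                 reset_after: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch                     # remote cache call; may raise
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.local: Dict[str, Any] = {}        # stale in-memory fallback

    def get(self, key: str) -> Optional[Any]:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return self.local.get(key)     # open: skip the remote call entirely
            self.opened_at = None              # half-open: allow one trial call
        try:
            value = self.fetch(key)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return self.local.get(key)
        self.failures = 0                      # success closes the breaker
        self.local[key] = value
        return value
```

While the breaker is open, requests cost nothing on the failing dependency, which is what stops the failure from cascading to the database behind it.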
Strong Answer: For this use case, the message queue is clearly the right choice, and the numbers make it obvious.

With direct HTTP calls: The order service makes 3 synchronous calls per order (email, inventory, analytics). At 5,000 orders/min = 83 orders/sec, that is 249 HTTP calls/sec. If each downstream service takes 200ms to respond, each order takes 600ms for the downstream calls alone. But here is the real problem: if the email service goes down for 2 minutes, 10,000 orders fail or time out. The checkout flow is now coupled to whether the email service is healthy, and your customer does not care about email; they care about completing their purchase.

With a message queue: The order service writes 3 messages per order to the queue (83 events/sec * 3 = 249 messages/sec, which is trivial for Kafka or even SQS). The order service responds to the customer in ~50ms (just the database write + queue publish). Each downstream service consumes at its own pace. If email is down for 2 minutes, the 10,000 email messages sit in the queue and are processed when it comes back. Zero orders fail.

The numbers that matter:
  • Queue latency: Kafka publish takes ~2ms. SQS takes ~10ms. Either is negligible compared to 200ms HTTP calls.
  • Throughput headroom: Kafka handles 10K+ messages/sec per partition. You are at 249/sec. You have 40x headroom before you even need to think about scaling.
  • Storage for buffering: 5,000 orders/min * 1KB/message * 3 topics * 60 min of buffer = 900 MB. Essentially free.

When I would NOT use a queue: If the downstream response is needed to complete the order (e.g., a real-time fraud check that must return “approved” before the order can proceed). That is inherently synchronous and should remain an HTTP call, but with a circuit breaker and a fallback policy (auto-approve if the fraud service is down, with post-hoc review).

Follow-up: You use Kafka. Six months later, the analytics consumer falls behind by 2 hours due to a processing bug. Orders keep flowing. What are the implications and how do you handle it?

This is one of the beautiful properties of Kafka: the order service and email/inventory consumers are completely unaffected. The analytics consumer has its own consumer group offset, so its lag is isolated. The 2 hours of messages (5,000/min * 120 min * 1KB = ~600 MB) are sitting safely in the Kafka topic’s retention window (typically 7 days). Fix the analytics bug, redeploy, and the consumer catches up by processing the backlog at full speed. Monitor consumer lag as a key metric, and alert when any consumer group falls behind by more than 15 minutes. The architectural insight: this failure mode is exactly why queues exist. With direct HTTP calls, those analytics events would simply be lost.
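
The arithmetic in this answer is easy to re-run with different inputs. The short script below reproduces it from the figures quoted above; the constant names are illustrative, and the unrounded message rate comes out at about 250/sec where the text rounds 83 × 3 to 249.

```python
ORDERS_PER_MIN = 5_000
DOWNSTREAM_SERVICES = 3          # email, inventory, analytics
CALL_LATENCY_MS = 200            # per synchronous downstream call
MESSAGE_KB = 1
KAFKA_MSGS_PER_PARTITION = 10_000

orders_per_sec = ORDERS_PER_MIN / 60                              # about 83.3
messages_per_sec = orders_per_sec * DOWNSTREAM_SERVICES           # about 250
sync_latency_ms = DOWNSTREAM_SERVICES * CALL_LATENCY_MS           # sequential calls
orders_hit_by_outage = ORDERS_PER_MIN * 2                         # 2-minute email outage
headroom = KAFKA_MSGS_PER_PARTITION / messages_per_sec            # single partition
buffer_mb = ORDERS_PER_MIN * 60 * DOWNSTREAM_SERVICES * MESSAGE_KB / 1000  # 60 min, 3 topics
backlog_mb = ORDERS_PER_MIN * 120 * MESSAGE_KB / 1000             # 2 h of one topic

print(f"{messages_per_sec:.0f} msgs/sec, {headroom:.0f}x headroom")
print(f"sync path adds {sync_latency_ms} ms per order")
print(f"{orders_hit_by_outage} orders affected by a 2-minute outage")
print(f"buffer: {buffer_mb:.0f} MB, analytics backlog: {backlog_mb:.0f} MB")
```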
Strong Answer: Let me work through this systematically.

Total request volume: 200M DAU * 5 feed views = 1 billion feed requests/day. QPS: 1B / 86,400 = ~11,500 QPS. Peak (3x): ~35,000 QPS.

Unique feeds: Not every request is unique. 200M users each have a unique feed, but the feed changes slowly (maybe 10-50 new posts per day for an average user). If we cache feeds with a 5-minute TTL, a user who refreshes several times within one session gets a cache hit on most of those refreshes.

Cache sizing using the 80/20 rule: 20% of users generate 80% of feed views (power users). 200M * 20% = 40M feeds to cache. At 50KB each: 40M * 50KB = 2 TB. That is a lot. But we can be smarter.

Refined approach: Cache the computed feed (list of post IDs + metadata) separately from the post content. The feed index is ~2KB (200 post IDs with timestamps). Post content is cached separately and shared across feeds. 40M feed indexes * 2KB = 80 GB, which fits comfortably in a Redis cluster. The actual post content: if 10M unique posts are “hot” at any time, at 5KB each = 50 GB. Total: ~130 GB for the hot set.

Target cache hit ratio: For feed reads, target 95%+. At 11,500 QPS with 95% cache hits, only 575 QPS hit the database, well within Postgres capacity. At 80% cache hits, 2,300 QPS hit the database, which is manageable but means 4x more database load.

Cost math: A Redis cluster with 130 GB of memory = roughly 5 r6g.2xlarge instances (64 GB RAM each, with overhead) at ~$0.50/hr each = ~$1,800/month. The alternative, serving 11,500 QPS from the database directly, would require multiple read replicas at roughly $5,000-10,000/month. The cache pays for itself 3-5x over.

Follow-up: A celebrity with 50M followers posts something. 10% of their followers check their feed in the next 5 minutes. What happens to your cache?

5 million feed cache entries need to be updated or invalidated within 5 minutes. If you use a “fan-out on write” approach (pre-computing feeds), that is 5M cache writes in 300 seconds = 16,700 cache writes/sec, which is achievable for a Redis cluster but a significant spike. The better approach for celebrities: use “fan-out on read.” Do not pre-compute feeds for celebrity followers. Instead, when a user reads their feed, merge their pre-computed feed (from normal users they follow) with a real-time lookup of recent posts from celebrities they follow. Cache the celebrity’s recent posts once (not per-follower), and the 5M users each read from that single cached entry. This converts 5M cache writes into 1 cache write + 5M cache reads, which is far more efficient.
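
The merge step of fan-out on read can be sketched as follows. It assumes each post is a `(timestamp, post_id)` tuple and that `celebrity_posts` is the single shared cache of recent celebrity posts; `read_feed` and all parameter names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Post = Tuple[float, str]  # (timestamp, post_id)


def read_feed(precomputed: List[Post],
              followed_celebrities: List[str],
              celebrity_posts: Dict[str, List[Post]],
              limit: int = 10) -> List[str]:
    """Merge a user's pre-computed feed with shared celebrity posts, newest first."""
    candidates = list(precomputed)
    for celebrity in followed_celebrities:
        # One cached list per celebrity, shared by every follower who reads it
        candidates.extend(celebrity_posts.get(celebrity, []))
    newest = heapq.nlargest(limit, candidates)
    return [post_id for _, post_id in newest]
```

When the celebrity posts, only `celebrity_posts` changes; no per-follower feed entry is rewritten, which is where the saving over fan-out on write comes from.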