# System Design Building Blocks

> Essential components for designing scalable systems

## Overview

Every large-scale system is built from common building blocks. Understanding these components and when to use them is essential for system design.

## Load Balancers

Distribute traffic across multiple servers for scalability and reliability. Think of a load balancer as a restaurant host: customers (requests) arrive at the door, and the host decides which waiter (server) to assign them to based on who is least busy, who specializes in what section, or simply the next one in rotation. Without the host, all customers would crowd around the first waiter they see, leaving others idle.

<img src="https://mintcdn.com/devweeekends/2f8Rfaato9LS1FSq/images/system-design/load-balancer-strategies.svg?fit=max&auto=format&n=2f8Rfaato9LS1FSq&q=85&s=cc03ca365a15f6b61a5e8665ad2a1f79" alt="Load Balancer Strategies" width="1080" height="1080" data-path="images/system-design/load-balancer-strategies.svg" />

### Load Balancing Algorithms

| Algorithm                | Description               | Use Case                  |
| ------------------------ | ------------------------- | ------------------------- |
| **Round Robin**          | Rotate through servers    | Equal server capacity     |
| **Weighted Round Robin** | Based on server capacity  | Different server specs    |
| **Least Connections**    | Send to least busy        | Variable request duration |
| **IP Hash**              | Same client → same server | Session affinity          |
| **Least Response Time**  | Fastest server            | Performance-critical      |
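
The selection rules in the table can be sketched in a few lines each. This is a minimal illustration, not how any particular load balancer implements them; the class names and the `pick` / `release` methods are hypothetical and exist only for this example.

```python
import hashlib
import itertools


class RoundRobin:
    """Rotate through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class WeightedRoundRobin:
    """Repeat each server in the rotation according to its weight."""

    def __init__(self, weights):  # e.g. {"big": 3, "small": 1}
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a fixed server (session affinity)."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that the simple modulo in `IPHash` remaps most clients whenever the server list changes, which is one reason real deployments often reach for consistent hashing instead.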

### Layer 4 vs Layer 7

```
Layer 7 (Application)              Layer 4 (Transport)
┌─────────────────────┐           ┌─────────────────────┐
│ HTTP/HTTPS aware    │           │ TCP/UDP only        │
│ URL-based routing   │           │ IP + Port routing   │
│ SSL termination     │           │ Faster (less work)  │
│ Content switching   │           │ No content awareness│
│ Request modification│           │ Simple forwarding   │
└─────────────────────┘           └─────────────────────┘
```
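
The practical difference is what information the balancer can use when it decides. The sketch below contrasts the two under simplifying assumptions: the pool names, path prefixes, and function names are made up for illustration, and a real Layer 4 balancer works on packets and connections rather than Python strings.

```python
import zlib

# Hypothetical backend pools keyed by URL prefix
POOLS = {
    "/api/": ["api-1", "api-2"],
    "/static/": ["static-1"],
}
DEFAULT_POOL = ["web-1", "web-2"]


def route_l7(path):
    """Layer 7: inspect the HTTP path and choose a pool by prefix."""
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def route_l4(src_ip, src_port, backends):
    """Layer 4: only the connection tuple is visible, never the payload."""
    key = f"{src_ip}:{src_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


print(route_l7("/api/users/42"))
print(route_l4("203.0.113.7", 51234, ["web-1", "web-2", "web-3"]))
```

`route_l7` needs a parsed request, which is why Layer 7 balancers terminate the connection and do more work per request; `route_l4` can decide from the first packet.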

## Caching

Store frequently accessed data in fast storage to reduce latency and database load. Caching is the system design equivalent of keeping your most-used cooking ingredients on the counter instead of fetching them from the pantry every time. It is the single most impactful optimization you can make in most systems -- a well-placed cache can reduce database load by 90% or more and drop response times from hundreds of milliseconds to single digits.

The fundamental trade-off: you are trading memory (expensive, limited) for speed, and you are accepting the risk that cached data might be stale. Every caching decision is really a question of "how stale can this data be before it causes a problem?"
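
A back-of-the-envelope model makes the payoff concrete. The sketch below assumes every read pays for a cache lookup and a miss additionally pays for the database query; the 5 ms and 100 ms figures are illustrative inputs, not measurements.

```python
def average_read_latency_ms(hit_rate, cache_ms, db_ms):
    """Expected read latency: always the cache lookup, plus the DB on a miss."""
    return cache_ms + (1 - hit_rate) * db_ms


def db_load_fraction(hit_rate):
    """Share of reads that still reach the database."""
    return 1 - hit_rate


for hit_rate in (0.0, 0.5, 0.9, 0.99):
    latency = average_read_latency_ms(hit_rate, cache_ms=5, db_ms=100)
    load = db_load_fraction(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {latency:.1f} ms average, "
          f"{load:.0%} of reads reach the database")
```

With these inputs, a 90% hit rate brings the average from 105 ms (no hits) down to about 15 ms, and only one read in ten reaches the database.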

### Cache Layers

```
┌─────────────────────────────────────────────────────────────┐
│                      Application                            │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                    L1: In-Memory Cache                     │
│                    (Application RAM)                       │
│                    Latency: <1ms                          │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                   L2: Distributed Cache                    │
│                   (Redis/Memcached)                        │
│                   Latency: ~5ms                           │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                      L3: Database                          │
│                   Latency: ~50-100ms                       │
└─────────────────────────────────────────────────────────────┘
```
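
The lookup order in the diagram can be sketched as follows. This is a simplified illustration: plain dictionaries stand in for both the in-process cache and the distributed cache, `LayeredCache` and `load_from_db` are hypothetical names, and eviction and TTLs are left out.

```python
class LayeredCache:
    """Check L1 (process memory), then L2 (shared cache), then the database."""

    def __init__(self, l2, load_from_db):
        self.l1 = {}                      # private to this process
        self.l2 = l2                      # stand-in for Redis/Memcached
        self.load_from_db = load_from_db  # callable: key -> value or None

    def get(self, key):
        """Return (value, layer_that_answered)."""
        if key in self.l1:
            return self.l1[key], "L1"
        if key in self.l2:
            value = self.l2[key]
            self.l1[key] = value          # promote so the next read stays local
            return value, "L2"
        value = self.load_from_db(key)
        if value is not None:
            self.l2[key] = value          # fill both layers on the way back
            self.l1[key] = value
        return value, "DB"


database = {"user:1": {"name": "Ada"}}
shared = {}
app_server_a = LayeredCache(shared, database.get)
app_server_b = LayeredCache(shared, database.get)

print(app_server_a.get("user:1"))  # first read goes to the database
print(app_server_a.get("user:1"))  # same process: served from L1
print(app_server_b.get("user:1"))  # other process: served from shared L2
```

The two `LayeredCache` instances model two application servers: each has its own L1, but both benefit from a value that either one has placed in the shared L2.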

### Caching Strategies

<img src="https://mintcdn.com/devweeekends/2f8Rfaato9LS1FSq/images/system-design/caching-strategies.svg?fit=max&auto=format&n=2f8Rfaato9LS1FSq&q=85&s=a98de7116073b8f063a448483448efbc" alt="Caching Strategies" width="1080" height="720" data-path="images/system-design/caching-strategies.svg" />

#### Cache-Aside (Lazy Loading)

The most common pattern. Application manages the cache explicitly.

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import redis
    import json
    from typing import Optional, Any
    from functools import wraps

    class CacheAsideService:
        """Cache-Aside pattern implementation with Redis"""
        
        def __init__(self, redis_client: redis.Redis, db, default_ttl: int = 3600):
            self.cache = redis_client
            self.db = db
            self.default_ttl = default_ttl
        
        def get_user(self, user_id: str) -> Optional[dict]:
            cache_key = f"user:{user_id}"
            
            # 1. Try cache first
            cached = self.cache.get(cache_key)
            if cached:
                print(f"Cache HIT for {cache_key}")
                return json.loads(cached)
            
            # 2. Cache miss - query database
            print(f"Cache MISS for {cache_key}")
            user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
            
            if user is None:
                return None
            
            # 3. Populate cache
            self.cache.setex(
                cache_key,
                self.default_ttl,
                json.dumps(user)
            )
            
            return user
        
        def update_user(self, user_id: str, data: dict) -> bool:
            """Update DB and invalidate cache (not update!)"""
            cache_key = f"user:{user_id}"
            
            # Update database
            self.db.update("UPDATE users SET name=%s WHERE id=%s", 
                           (data['name'], user_id))
            
            # Invalidate cache (next read will refresh it)
            self.cache.delete(cache_key)
            return True


    # Decorator version for any function
    def cache_aside(ttl: int = 3600):
        """Decorator for cache-aside pattern"""
        def decorator(func):
            @wraps(func)
            def wrapper(self, *args, **kwargs):
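                # Include keyword args in the key so different calls never share an entry
                key_parts = [*map(str, args), *(f"{k}={v}" for k, v in sorted(kwargs.items()))]
                cache_key = f"{func.__name__}:{':'.join(key_parts)}"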
                
                cached = self.cache.get(cache_key)
                if cached:
                    return json.loads(cached)
                
                result = func(self, *args, **kwargs)
                
                if result is not None:
                    self.cache.setex(cache_key, ttl, json.dumps(result))
                
                return result
            return wrapper
        return decorator


    # Usage with decorator
    class ProductService:
        def __init__(self, cache, db):
            self.cache = cache
            self.db = db
        
        @cache_aside(ttl=1800)  # Cache for 30 minutes
        def get_product(self, product_id: str) -> dict:
            return self.db.query("SELECT * FROM products WHERE id = %s", (product_id,))
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
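    const Redis = require('ioredis');
    const redis = new Redis();  // Shared client used by the middleware below (defaults to localhost:6379)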

    class CacheAsideService {
      constructor(redisClient, db, defaultTTL = 3600) {
        this.cache = redisClient;
        this.db = db;
        this.defaultTTL = defaultTTL;
      }

      async getUser(userId) {
        const cacheKey = `user:${userId}`;
        
        // 1. Try cache first
        const cached = await this.cache.get(cacheKey);
        if (cached) {
          console.log(`Cache HIT for ${cacheKey}`);
          return JSON.parse(cached);
        }
        
        // 2. Cache miss - query database
        console.log(`Cache MISS for ${cacheKey}`);
        const user = await this.db.query(
          'SELECT * FROM users WHERE id = $1',
          [userId]
        );
        
        if (!user) return null;
        
        // 3. Populate cache with TTL
        await this.cache.setex(
          cacheKey,
          this.defaultTTL,
          JSON.stringify(user)
        );
        
        return user;
      }

      async updateUser(userId, data) {
        const cacheKey = `user:${userId}`;
        
        // Update database first
        await this.db.query(
          'UPDATE users SET name = $1 WHERE id = $2',
          [data.name, userId]
        );
        
        // Invalidate cache (not update - avoids race conditions)
        await this.cache.del(cacheKey);
        return true;
      }
    }

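    // Decorator pattern for cache-aside.
    // This is the legacy decorator signature: using it as `@cacheAside()` requires
    // TypeScript's `experimentalDecorators` or Babel's legacy decorators plugin.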
    function cacheAside(ttl = 3600) {
      return function(target, propertyKey, descriptor) {
        const originalMethod = descriptor.value;
        
        descriptor.value = async function(...args) {
          const cacheKey = `${propertyKey}:${args.join(':')}`;
          
          const cached = await this.cache.get(cacheKey);
          if (cached) {
            return JSON.parse(cached);
          }
          
          const result = await originalMethod.apply(this, args);
          
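          // Skip null and undefined: JSON.stringify(undefined) is not a valid cache value
          if (result !== null && result !== undefined) {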
            await this.cache.setex(cacheKey, ttl, JSON.stringify(result));
          }
          
          return result;
        };
        
        return descriptor;
      };
    }

    // Usage with Express.js middleware
    const cacheMiddleware = (ttl = 300) => async (req, res, next) => {
      const cacheKey = `route:${req.originalUrl}`;
      
      try {
        const cached = await redis.get(cacheKey);
        if (cached) {
          return res.json(JSON.parse(cached));
        }
        
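        // Wrap res.json so the response body is cached on its way out
        const originalJson = res.json.bind(res);
        
        res.json = (data) => {
          // Cache successful responses only, and never let a failed
          // cache write block or break the response
          if (res.statusCode < 400) {
            redis.setex(cacheKey, ttl, JSON.stringify(data)).catch(() => {});
          }
          return originalJson(data);
        };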
        
        next();
      } catch (error) {
        next();  // On cache error, continue without cache
      }
    };

    // Use in routes (assumes an Express `app` and a `getProduct` handler defined elsewhere)
    app.get('/api/products/:id', cacheMiddleware(600), getProduct);
    ```
  </Tab>
</Tabs>

#### Write-Through

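Write to the cache and the database as part of the same synchronous write, so a read that follows a successful write sees fresh data. The cost is slower writes, since each one waits for both stores, and the two stores are kept in step on a best-effort basis rather than by a shared transaction.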

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import json

    class WriteThroughCache:
        """Write-Through: update the DB and the cache together on every write"""
        
        def __init__(self, cache, db):
            self.cache = cache
            self.db = db
        
        def update_user(self, user_id: str, data: dict) -> bool:
            cache_key = f"user:{user_id}"
            
            # Redis is not part of the DB transaction, so this gives best-effort
            # consistency rather than true atomicity
            try:
                # 1. Update database first (source of truth)
                self.db.begin_transaction()
                self.db.update("UPDATE users SET data = %s WHERE id = %s",
                              (json.dumps(data), user_id))
                
                # 2. Update cache before committing, so a cache failure aborts the DB write
                self.cache.setex(cache_key, 3600, json.dumps(data))
                
                # 3. Commit only if both succeed
                self.db.commit()
                return True
                
            except Exception:
                self.db.rollback()
                self.cache.delete(cache_key)  # Drop the key so readers fall back to the DB
                raise
        
        def get_user(self, user_id: str) -> dict:
            cache_key = f"user:{user_id}"
            
            # Check cache first (every write refreshes it)
            cached = self.cache.get(cache_key)
            if cached:
                return json.loads(cached)
            
            # Cache miss (cold start or expired)
            user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
            if user:
                self.cache.setex(cache_key, 3600, json.dumps(user))
            return user
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    class WriteThroughCache {
      constructor(cache, db) {
        this.cache = cache;
        this.db = db;
      }

      async updateUser(userId, data) {
        const cacheKey = `user:${userId}`;
        
        const client = await this.db.connect();
        
        try {
          await client.query('BEGIN');
          
          // 1. Update database
          await client.query(
            'UPDATE users SET data = $1 WHERE id = $2',
            [JSON.stringify(data), userId]
          );
          
          // 2. Update cache
          await this.cache.setex(cacheKey, 3600, JSON.stringify(data));
          
          // 3. Commit transaction
          await client.query('COMMIT');
          return true;
          
        } catch (error) {
          await client.query('ROLLBACK');
          await this.cache.del(cacheKey);  // Ensure consistency
          throw error;
        } finally {
          client.release();
        }
      }

      async getUser(userId) {
        const cacheKey = `user:${userId}`;
        
        // Every write refreshes the cache, so check it first
        const cached = await this.cache.get(cacheKey);
        if (cached) {
          return JSON.parse(cached);
        }
        
        // Populate cache on miss
        const result = await this.db.query(
          'SELECT * FROM users WHERE id = $1',
          [userId]
        );
        
        if (result.rows[0]) {
          await this.cache.setex(cacheKey, 3600, JSON.stringify(result.rows[0]));
          return result.rows[0];
        }
        
        return null;
      }
    }
    ```
  </Tab>
</Tabs>

#### Write-Behind (Write-Back)

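Write to cache immediately, persist to database asynchronously. This gives the best write performance of the three strategies, at the cost of durability: writes still sitting in the buffer are lost if the process crashes before the next flush.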

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import asyncio
    import json
    from collections import deque
    from typing import Dict, Any
    import time

    class WriteBehindCache:
        """
        Write-Behind: Cache writes immediately, DB writes async.
        Great for high-write scenarios (analytics, counters).
        """
        
        def __init__(self, cache, db, flush_interval: int = 5):
            self.cache = cache
            self.db = db
            self.write_buffer: deque = deque()
            self.flush_interval = flush_interval
            self._start_background_writer()
        
        async def update(self, key: str, value: Any) -> bool:
            """Write to cache immediately, queue DB write"""
            # 1. Update cache (instant!)
            self.cache.set(key, json.dumps(value))
            
            # 2. Add to write buffer
            self.write_buffer.append({
                'key': key,
                'value': value,
                'timestamp': time.time()
            })
            
            return True  # Returns immediately!
        
        async def increment_counter(self, key: str, amount: int = 1) -> int:
            """Perfect for view counts, like counts, etc."""
            new_value = self.cache.incrby(key, amount)
            
            # Queue the new total; the database sees it at the next flush
            self.write_buffer.append({
                'key': key,
                'value': new_value,
                'timestamp': time.time(),
                'type': 'counter'
            })
            
            return new_value
        
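        def _start_background_writer(self):
            """Background task that flushes writes to DB.

            Needs a running event loop, so create this object from async code.
            """
            async def flush_loop():
                while True:
                    await asyncio.sleep(self.flush_interval)
                    await self._flush_to_db()
            
            # Keep a reference so the task is not garbage-collected mid-flight
            self._flush_task = asyncio.create_task(flush_loop())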
        
        async def _flush_to_db(self):
            """Batch write pending updates to database"""
            if not self.write_buffer:
                return
            
            batch = []
            while self.write_buffer:
                batch.append(self.write_buffer.popleft())
            
            # Keep only the newest value per key (later entries overwrite earlier ones)
            latest = {item['key']: item for item in batch}
            
            # Batch insert/update for efficiency
            try:
                await self.db.execute_batch(
                    "INSERT INTO cache_data (key, value) VALUES (%s, %s) "
                    "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
                    [(item['key'], json.dumps(item['value'])) for item in latest.values()]
                )
                print(f"Flushed {len(latest)} writes to database")
            except Exception as e:
                # Put failed writes back at the front, ahead of anything queued
                # during the flush, so older values never overwrite newer ones
                self.write_buffer.extendleft(reversed(batch))
                print(f"Flush failed, items re-queued: {e}")


    # Usage: View counter (run inside an event loop, because __init__ starts a background task)
    async def main():
        cache = WriteBehindCache(redis_client, db, flush_interval=10)
        await cache.increment_counter("views:article:123")  # Instant!
        # DB is updated every 10 seconds with batched writes
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    class WriteBehindCache {
      constructor(cache, db, flushInterval = 5000) {
        this.cache = cache;
        this.db = db;
        this.writeBuffer = [];
        this.flushInterval = flushInterval;
        this.startBackgroundWriter();
      }

      async update(key, value) {
        // 1. Update cache immediately
        await this.cache.set(key, JSON.stringify(value));
        
        // 2. Queue for async DB write
        this.writeBuffer.push({
          key,
          value,
          timestamp: Date.now()
        });
        
        return true;  // Returns instantly!
      }

      async incrementCounter(key, amount = 1) {
        // Atomic increment in Redis
        const newValue = await this.cache.incrby(key, amount);
        
        // Batch counter updates
        this.writeBuffer.push({
          key,
          value: newValue,
          timestamp: Date.now(),
          type: 'counter'
        });
        
        return newValue;
      }

      startBackgroundWriter() {
        setInterval(async () => {
          await this.flushToDB();
        }, this.flushInterval);
      }

      async flushToDB() {
        if (this.writeBuffer.length === 0) return;
        
        // Take all pending writes
        const batch = [...this.writeBuffer];
        this.writeBuffer = [];
        
        // Keep only the newest value per key: Postgres rejects an upsert
        // that touches the same row twice in a single statement
        const latest = [...new Map(batch.map(item => [item.key, item])).values()];
        
        try {
          // Batch upsert for efficiency
          const values = latest.map((item, i) => 
            `($${i * 2 + 1}, $${i * 2 + 2})`
          ).join(', ');
          
          const params = latest.flatMap(item => [item.key, JSON.stringify(item.value)]);
          
          await this.db.query(`
            INSERT INTO cache_data (key, value)
            VALUES ${values}
            ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
          `, params);
          
          console.log(`Flushed ${latest.length} writes to database`);
        } catch (error) {
          // Re-queue failed writes
          this.writeBuffer = [...batch, ...this.writeBuffer];
          console.error('Flush failed, items re-queued:', error);
        }
      }
    }

    // Usage: Real-time view counter
    const cache = new WriteBehindCache(redis, db, 10000);

    // Express route - super fast!
    app.post('/api/articles/:id/view', async (req, res) => {
      const views = await cache.incrementCounter(`views:${req.params.id}`);
      res.json({ views });  // Responds in ~1ms
    });
    ```
  </Tab>
</Tabs>

### Cache Invalidation

| Strategy         | Description          | Trade-off                             |
| ---------------- | -------------------- | ------------------------------------- |
| **TTL**          | Expire after time    | Simple, but may serve stale data      |
| **Event-based**  | Invalidate on write  | Complex, but consistent               |
| **Version tags** | Change key on update | Old versions use memory until evicted |

<Tip>
  **Cache Invalidation Best Practices**:

  1. Prefer invalidation over updating (avoids race conditions)
  2. Use events/pub-sub for distributed cache invalidation
  3. Set reasonable TTLs as a safety net
  4. Monitor cache hit rates (target >90%)
</Tip>
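
Two of the strategies above, TTL expiry and version tags, are easy to show in isolation. The sketch below is an in-process illustration only: `TTLCache` and `versioned_key` are hypothetical names, and the injectable `clock` parameter exists so that expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Entries expire a fixed number of seconds after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily drop the expired entry
            return None
        return value

    def delete(self, key):
        """Event-based invalidation: call this when the source data changes."""
        self._store.pop(key, None)


def versioned_key(entity, entity_id, version):
    """Version tags: a new version produces a new key, so old entries are never read."""
    return f"{entity}:{entity_id}:v{version}"


cache = TTLCache(ttl_seconds=60)
cache.set(versioned_key("user", 42, version=3), {"name": "Ada"})
print(cache.get(versioned_key("user", 42, version=3)))  # current version: hit
print(cache.get(versioned_key("user", 42, version=4)))  # after an update: miss
```

Combining the two is common: version tags or explicit deletes handle known writes, while the TTL bounds how long anything missed by those paths can stay stale.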

## Message Queues

Enable asynchronous communication and decouple components. Think of a message queue like a post office: the sender drops off a letter (message) and goes about their day without waiting for the recipient to read it. The post office (queue) holds the letter until the recipient is ready to pick it up. This decoupling means the sender and recipient do not need to be available at the same time, and a surge of incoming mail does not overwhelm the recipient -- it just queues up.

This is one of the most powerful patterns for handling traffic spikes. Instead of your web server waiting synchronously for a slow operation (sending an email, resizing an image, generating a report), it drops a message on the queue and responds to the user immediately. A pool of background workers processes the queue at its own pace.

<img src="https://mintcdn.com/devweeekends/2f8Rfaato9LS1FSq/images/system-design/message-queue-detailed.svg?fit=max&auto=format&n=2f8Rfaato9LS1FSq&q=85&s=38202d5f758fca8e137b82e45827aa55" alt="Message Queue Architecture" width="1080" height="800" data-path="images/system-design/message-queue-detailed.svg" />

```
Producer ──► Queue ──► Consumer

┌──────────┐     ┌─────────────────┐     ┌──────────┐
│ Service  │────►│   Message       │────►│  Worker  │
│    A     │     │   Queue         │     │    1     │
└──────────┘     │                 │     └──────────┘
                 │  ┌───┬───┬───┐  │     ┌──────────┐
                 │  │msg│msg│msg│  │────►│  Worker  │
                 │  └───┴───┴───┘  │     │    2     │
                 └─────────────────┘     └──────────┘
```
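
The producer, queue, and worker pool in the diagram can be imitated inside one process with Python's standard library. This is only a sketch of the shape of the pattern: a real deployment would use a broker such as the Redis-backed queues shown later, and `run_workers` is a hypothetical helper name.

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=2):
    """Enqueue jobs, let a pool of worker threads drain them, return (results, errors)."""
    q = queue.Queue()
    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            try:
                if job is None:           # sentinel: no more work for this worker
                    return
                outcome = handler(job)
                with lock:
                    results.append(outcome)
            except Exception as exc:      # keep the worker alive if one job fails
                with lock:
                    errors.append((job, exc))
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for job in jobs:                      # producer: enqueue and move on
        q.put(job)
    for _ in threads:                     # one sentinel per worker
        q.put(None)

    q.join()                              # wait until every item is processed
    for t in threads:
        t.join()
    return results, errors


results, errors = run_workers(range(10), lambda n: n * n, num_workers=3)
print(sorted(results), errors)
```

The producer loop finishes as soon as the jobs are enqueued; how fast they complete depends only on the number of workers, which is the load-leveling property described above.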

### When to Use

<CardGroup cols={2}>
  <Card title="Async Processing" icon="clock">
    Email sending, image processing, report generation
  </Card>

  <Card title="Load Leveling" icon="chart-line">
    Handle traffic spikes by queuing requests
  </Card>

  <Card title="Decoupling" icon="unlink">
    Services don't need to know about each other
  </Card>

  <Card title="Reliability" icon="shield">
    Messages persist if consumer is down
  </Card>
</CardGroup>

### Queue Types

| Type                  | Description              | Use Case             |
| --------------------- | ------------------------ | -------------------- |
| **Point-to-Point**    | One consumer per message | Task queues          |
| **Pub/Sub**           | Multiple consumers       | Event broadcasting   |
| **Priority Queue**    | Process by priority      | Critical tasks first |
| **Dead Letter Queue** | Failed messages          | Error handling       |
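
The delivery semantics in the table differ in small but important ways, which the in-memory sketch below tries to make visible. All class and function names here are hypothetical, and nothing is persisted or shared across processes as it would be with a real broker.

```python
import heapq
from collections import deque


class PointToPointQueue:
    """Each message is handed to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, msg):
        self._messages.append(msg)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Each message is delivered to every current subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, msg):
        for callback in self._subscribers:
            callback(msg)


class PriorityQueue:
    """Higher priority first; FIFO among messages of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker that preserves arrival order

    def send(self, msg, priority):
        heapq.heappush(self._heap, (-priority, self._seq, msg))
        self._seq += 1

    def receive(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def process_with_dlq(messages, handler, max_attempts=3):
    """Retry each message up to max_attempts, then park it in a dead letter list."""
    dead_letters = []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                break
            except Exception:
                if attempt == max_attempts:
                    dead_letters.append(msg)
    return dead_letters
```

A dead letter queue does not fix failures by itself; it keeps poison messages from blocking the main queue so they can be inspected and replayed later.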

### Message Queue Implementation Examples

<Tabs>
  <Tab title="Python (with Redis)">
    ```python theme={null}
    import redis
    import json
    import time
    import threading
    from typing import Callable, Any
    from dataclasses import dataclass
    from enum import Enum

    class MessagePriority(Enum):
        LOW = 0
        NORMAL = 1
        HIGH = 2
        CRITICAL = 3

    @dataclass
    class Message:
        id: str
        payload: dict
        priority: MessagePriority = MessagePriority.NORMAL
        attempts: int = 0
        max_attempts: int = 3
        created_at: float = None
        
        def __post_init__(self):
            if self.created_at is None:
                self.created_at = time.time()

    class MessageQueue:
        """Simplified priority queue on a Redis sorted set (a teaching sketch, not production-hardened)"""
        
        def __init__(self, redis_client: redis.Redis, queue_name: str):
            self.redis = redis_client
            self.queue_name = queue_name
            self.processing_queue = f"{queue_name}:processing"
            self.dlq = f"{queue_name}:dlq"  # Dead Letter Queue
        
        def publish(self, message: Message) -> bool:
            """Add message to queue with priority"""
            msg_data = json.dumps({
                'id': message.id,
                'payload': message.payload,
                'priority': message.priority.value,
                'attempts': message.attempts,
                'max_attempts': message.max_attempts,
                'created_at': message.created_at
            })
            
            # Use sorted set for priority queue
            # Higher priority = higher score = processed first
            score = message.priority.value * 1e10 + (1e10 - message.created_at)
            self.redis.zadd(self.queue_name, {msg_data: score})
            return True
        
        def consume(self, handler: Callable[[dict], bool], 
                    batch_size: int = 1) -> None:
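            """Consume messages, retrying failed handlers.

            This is not true at-least-once delivery: if a worker crashes
            mid-message, the processing key simply expires and the message is lost.
            """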
            while True:
                # Get highest priority message
                messages = self.redis.zpopmax(self.queue_name, batch_size)
                
                if not messages:
                    time.sleep(0.1)  # No messages, wait
                    continue
                
                for msg_data, score in messages:
                    message = json.loads(msg_data)
                    message['attempts'] += 1
                    
                    # Track the in-flight message for 30s (nothing re-queues it on expiry)
                    self.redis.setex(
                        f"{self.processing_queue}:{message['id']}", 
                        30,  # 30 second processing timeout
                        msg_data
                    )
                    
                    try:
                        success = handler(message['payload'])
                        
                        if success:
                            # Acknowledge: remove from processing
                            self.redis.delete(f"{self.processing_queue}:{message['id']}")
                        else:
                            self._retry_or_dlq(message)
                            
                    except Exception as e:
                        print(f"Error processing message: {e}")
                        self._retry_or_dlq(message)
        
        def _retry_or_dlq(self, message: dict):
            """Retry with backoff or send to Dead Letter Queue"""
            self.redis.delete(f"{self.processing_queue}:{message['id']}")
            
            if message['attempts'] >= message['max_attempts']:
                # Move to Dead Letter Queue
                self.redis.lpush(self.dlq, json.dumps(message))
                print(f"Message {message['id']} sent to DLQ after {message['attempts']} attempts")
            else:
                # Exponential backoff retry. The sleep blocks this consumer;
                # real brokers schedule a delayed re-delivery instead.
                delay = 2 ** message['attempts']
                time.sleep(delay)
                
                msg = Message(
                    id=message['id'],
                    payload=message['payload'],
                    priority=MessagePriority(message['priority']),
                    attempts=message['attempts'],
                    max_attempts=message['max_attempts']
                )
                self.publish(msg)


    # Usage Example: Email Service
    redis_client = redis.Redis(host='localhost', port=6379)
    email_queue = MessageQueue(redis_client, 'emails')

    # Producer: Queue an email
    email_queue.publish(Message(
        id='email-123',
        payload={
            'to': 'user@example.com',
            'subject': 'Welcome!',
            'body': 'Thanks for signing up...'
        },
        priority=MessagePriority.HIGH
    ))

    # Consumer: Process emails
    def send_email(payload: dict) -> bool:
        print(f"Sending email to {payload['to']}")
        # actual email sending logic
        return True

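    # Run consumer in a background thread (consume() loops forever)
    worker = threading.Thread(target=email_queue.consume, args=(send_email,), daemon=True)
    worker.start()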
    ```
  </Tab>

  <Tab title="JavaScript (with Bull/Redis)">
    ```javascript theme={null}
    const Queue = require('bull');
    const Redis = require('ioredis');

    // Create queue with Redis
    const emailQueue = new Queue('email-queue', {
      redis: { host: 'localhost', port: 6379 },
      defaultJobOptions: {
        attempts: 3,
        backoff: {
          type: 'exponential',
          delay: 2000  // Start with 2 second delay
        },
        removeOnComplete: 100,  // Keep last 100 completed
        removeOnFail: 1000      // Keep last 1000 failed
      }
    });

    // === PRODUCER ===
    class EmailProducer {
      async sendWelcomeEmail(userId, email) {
        await emailQueue.add('welcome', {
          userId,
          to: email,
          subject: 'Welcome to Our Platform!',
          template: 'welcome'
        }, {
          priority: 1,  // Higher priority
          delay: 0
        });
      }

      async sendNewsletterBatch(subscribers) {
        // Bulk add for efficiency
        const jobs = subscribers.map(sub => ({
          name: 'newsletter',
          data: { to: sub.email, subject: 'Weekly Update' },
          opts: { priority: 10 }  // Lower priority than welcome emails
        }));
        
        await emailQueue.addBulk(jobs);
        console.log(`Queued ${jobs.length} newsletter emails`);
      }

      async scheduleReminder(userId, email, sendAt) {
        // Delayed job
        const delay = sendAt.getTime() - Date.now();
        
        await emailQueue.add('reminder', {
          userId,
          to: email,
          subject: 'Don\'t forget!'
        }, {
          delay,  // Send at specific time
          jobId: `reminder-${userId}`  // Prevent duplicates
        });
      }
    }

    // === CONSUMER ===
    class EmailConsumer {
      constructor() {
        this.setupProcessors();
        this.setupEventHandlers();
      }

      setupProcessors() {
        // Process different job types
        emailQueue.process('welcome', 5, async (job) => {
          console.log(`Processing welcome email for ${job.data.to}`);
          await this.sendEmail(job.data);
          return { sent: true, timestamp: new Date() };
        });

        emailQueue.process('newsletter', 10, async (job) => {
          // Higher concurrency for newsletters
          await this.sendEmail(job.data);
          return { sent: true };
        });

        emailQueue.process('reminder', 3, async (job) => {
          await this.sendEmail(job.data);
          return { sent: true };
        });
      }

      async sendEmail(data) {
        // Actual email sending logic
        console.log(`Sending email to ${data.to}: ${data.subject}`);
        // await nodemailer.send(data);
      }

      setupEventHandlers() {
        emailQueue.on('completed', (job, result) => {
          console.log(`Job ${job.id} completed:`, result);
        });

        emailQueue.on('failed', (job, err) => {
          console.error(`Job ${job.id} failed:`, err.message);
          
          if (job.attemptsMade >= job.opts.attempts) {
            // Send to monitoring/alerting
            this.notifyFailure(job, err);
          }
        });

        emailQueue.on('stalled', (job) => {
          console.warn(`Job ${job.id} stalled - will be retried`);
        });
      }

      notifyFailure(job, error) {
        // Alert ops team about permanent failure
        console.error(`ALERT: Email job ${job.id} permanently failed`);
      }
    }

    // === MONITORING ===
    async function getQueueStats() {
      const [waiting, active, completed, failed, delayed] = await Promise.all([
        emailQueue.getWaitingCount(),
        emailQueue.getActiveCount(),
        emailQueue.getCompletedCount(),
        emailQueue.getFailedCount(),
        emailQueue.getDelayedCount()
      ]);

      return { waiting, active, completed, failed, delayed };
    }

    // Express endpoint for queue health (assumes an Express `app` defined elsewhere)
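    app.get('/api/queue/health', async (req, res) => {
      try {
        const stats = await getQueueStats();
        res.json(stats);
      } catch (err) {
        // Express 4 does not catch rejections from async handlers on its own
        res.status(503).json({ error: 'Queue stats unavailable' });
      }
    });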

    // Start consumer
    const consumer = new EmailConsumer();

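    // Usage (wrapped in an async function so `await` also works in CommonJS modules)
    (async () => {
      const producer = new EmailProducer();
      await producer.sendWelcomeEmail('user-123', 'new@user.com');
    })().catch(console.error);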
    ```
  </Tab>
</Tabs>

## Content Delivery Network (CDN)

Distribute content globally for faster access. The speed of light is a hard physical limit: a round trip from Tokyo to a US-based origin server typically takes 100-150ms, and no amount of code optimization can fix physics. CDNs solve this by placing copies of your content in data centers around the world, so a user in Tokyo hits a server in Tokyo instead. It is like a franchise model for your data -- the headquarters (origin) has the master copy, but local branches (edge PoPs) serve most customers directly.

```
                          ┌─────────────┐
                          │   Origin    │
                          │   Server    │
                          └──────┬──────┘
                                 │
     ┌───────────────────────────┼───────────────────────────┐
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│  Edge   │                │   Edge    │               │   Edge    │
│  (USA)  │                │  (Europe) │               │  (Asia)   │
└────┬────┘                └─────┬─────┘               └─────┬─────┘
     │                           │                           │
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│US Users │                │ EU Users  │               │Asia Users │
│ ~20ms   │                │  ~20ms    │               │  ~20ms    │
└─────────┘                └───────────┘               └───────────┘

Without CDN: All users → Origin (100-300ms for distant users)
```
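
A rough way to sanity-check the latency figures above is to compute the physical floor for one round trip. The sketch below assumes light travels through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and uses approximate distances; the function name and both distances are illustrative assumptions, not measurements.

```javascript
// Approximate speed of light in optical fiber (assumption: ~2/3 of c).
const FIBER_KM_PER_MS = 200; // 200,000 km/s = 200 km per millisecond

// Lower bound on round-trip time for a given one-way distance.
// Real routes are longer than the straight-line path and add router,
// TLS, and server processing time on top of this floor.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm) / FIBER_KM_PER_MS;
}

// Illustrative distances (approximate, assumed for this example).
const tokyoToUsWestCoastKm = 8800;
const tokyoToLocalEdgeKm = 50;

console.log(minRoundTripMs(tokyoToUsWestCoastKm)); // 88
console.log(minRoundTripMs(tokyoToLocalEdgeKm));   // 0.5
```

A page load usually needs several round trips (DNS, TCP, TLS, then the request itself), so the gap between a distant origin and a nearby edge is multiplied several times over.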

### CDN Strategies

| Strategy | Description                  | Best For                        |
| -------- | ---------------------------- | ------------------------------- |
| **Push** | Upload to CDN proactively    | Static content, known files     |
| **Pull** | CDN fetches on first request | Dynamic content, large catalogs |
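
The difference between the two strategies can be shown with a small in-memory model of one edge node. This is a minimal sketch under stated assumptions: `EdgeCache`, `fetchFromOrigin`, and the TTL value are hypothetical names for illustration, and a real CDN also handles cache headers, eviction, and invalidation.

```javascript
// Minimal edge cache. `fetchFromOrigin` is a hypothetical stand-in for
// a real HTTP request to the origin server.
class EdgeCache {
  constructor(fetchFromOrigin, ttlMs) {
    this.fetchFromOrigin = fetchFromOrigin;
    this.ttlMs = ttlMs;
    this.store = new Map(); // path -> { body, expiresAt }
  }

  // Pull: serve from the edge if fresh, otherwise fetch and cache.
  get(path, now = Date.now()) {
    const entry = this.store.get(path);
    if (entry && entry.expiresAt > now) {
      return { body: entry.body, cache: 'HIT' };
    }
    const body = this.fetchFromOrigin(path);
    this.store.set(path, { body, expiresAt: now + this.ttlMs });
    return { body, cache: 'MISS' };
  }

  // Push: the publisher uploads content before anyone requests it.
  push(path, body, now = Date.now()) {
    this.store.set(path, { body, expiresAt: now + this.ttlMs });
  }
}

let originHits = 0;
const edge = new EdgeCache((path) => {
  originHits += 1;
  return `contents of ${path}`;
}, 60000);

const first = edge.get('/img/logo.png', 0);        // MISS: goes to origin
const second = edge.get('/img/logo.png', 1000);    // HIT: served at the edge
const expired = edge.get('/img/logo.png', 61000);  // MISS: TTL elapsed
edge.push('/css/site.css', 'body{}', 0);
const pushed = edge.get('/css/site.css', 1000);    // HIT without an origin trip
```

With pull, the first user in each region pays the origin round trip; with push, nobody does, but you have to know in advance which files to upload.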

## API Gateway

Single entry point for all client requests. In a microservices architecture, an API gateway acts like the front desk of a large hotel -- guests (clients) do not wander the hallways looking for housekeeping, room service, or the concierge individually. Instead, they make one call to the front desk, which routes the request to the right department, handles authentication ("Are you a guest here?"), enforces rate limits ("Please do not call us 100 times per minute"), and can even aggregate responses from multiple departments into a single reply.

```
┌──────────────────────────────────────────────────────────────┐
│                       API Gateway                            │
├──────────────────────────────────────────────────────────────┤
│  • Authentication         • Rate Limiting                    │
│  • SSL Termination        • Request Routing                  │
│  • Load Balancing         • Response Caching                 │
│  • API Versioning         • Request/Response Transform       │
│  • Analytics & Logging    • Circuit Breaking                 │
└──────────────────────────────────────────────────────────────┘
           │              │              │              │
    ┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
    │   Users     │ │  Orders   │ │  Products │ │  Payments │
    │   Service   │ │  Service  │ │  Service  │ │  Service  │
    └─────────────┘ └───────────┘ └───────────┘ └───────────┘
```
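
Three of the responsibilities in the diagram (authentication, rate limiting, request routing) can be sketched as a small request pipeline. This is an in-memory illustration, not how any particular gateway product works: the route table, the token set, and the fixed-window limiter are all hypothetical and chosen only to keep the example self-contained.

```javascript
// Hypothetical route table: path prefix -> backend service name.
const routes = [
  { prefix: '/users', service: 'users-service' },
  { prefix: '/orders', service: 'orders-service' },
  { prefix: '/products', service: 'products-service' },
  { prefix: '/payments', service: 'payments-service' },
];

// Fixed-window rate limiter keyed by client ID (illustrative only).
function createRateLimiter(limit, windowMs) {
  const windows = new Map(); // clientId -> { start, count }
  return (clientId, now) => {
    const w = windows.get(clientId);
    if (!w || now - w.start >= windowMs) {
      windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}

// Gateway pipeline: authenticate, rate-limit, then route.
function createGateway({ validTokens, allow }) {
  return (request, now = Date.now()) => {
    if (!validTokens.has(request.token)) return { status: 401 };
    if (!allow(request.token, now)) return { status: 429 };
    const route = routes.find((r) => request.path.startsWith(r.prefix));
    if (!route) return { status: 404 };
    return { status: 200, forwardTo: route.service };
  };
}

const gateway = createGateway({
  validTokens: new Set(['token-abc']),
  allow: createRateLimiter(2, 60000), // 2 requests per minute per client
});

console.log(gateway({ token: 'token-abc', path: '/orders/9' }, 0));
// { status: 200, forwardTo: 'orders-service' }
```

Because every request passes through the same pipeline, the backend services do not each need their own copy of the auth and rate-limit logic.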

## Database Replication

Copy data across multiple servers for availability and read scaling. Replication is your insurance policy against data loss and your scaling strategy for read-heavy workloads. The core trade-off is between consistency and latency: synchronous replication guarantees every replica has the latest data but slows down every write (the write is only confirmed after all replicas acknowledge it); asynchronous replication keeps writes fast but means replicas might serve slightly stale data for a brief window.

### Master-Slave Replication

```
     Writes                           Reads
        │                               │
        ▼                               ▼
   ┌─────────┐                   ┌───────────┐
   │ Master  │                   │  Slave 1  │
   │  (RW)   │──── Replication ──│   (RO)    │
   └─────────┘         │         └───────────┘
                       │
                       │         ┌───────────┐
                       └─────────│  Slave 2  │
                                 │   (RO)    │
                                 └───────────┘
```
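
The consistency trade-off between synchronous and asynchronous replication can be seen in a small in-memory model. This is a sketch under simplifying assumptions: `ReplicatedStore` is a hypothetical class, replication is triggered by an explicit `replicate()` call rather than over a network, and reads go to replicas in round-robin order.

```javascript
// In-memory model of one primary with read replicas (illustrative).
class ReplicatedStore {
  constructor(replicaCount, mode) {
    this.mode = mode; // 'sync' or 'async'
    this.primary = new Map();
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // replication log not yet applied to replicas
    this.nextReplica = 0;
  }

  // All writes go to the primary.
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
    // Sync mode: do not acknowledge until every replica has the write.
    if (this.mode === 'sync') this.replicate();
  }

  // Apply the pending log to every replica (runs "later" in async mode).
  replicate() {
    for (const [key, value] of this.pending) {
      for (const replica of this.replicas) replica.set(key, value);
    }
    this.pending = [];
  }

  // Reads are spread across replicas in round-robin order.
  read(key) {
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica.get(key);
  }
}

const asyncStore = new ReplicatedStore(2, 'async');
asyncStore.write('stock:42', 10);
const staleRead = asyncStore.read('stock:42'); // undefined: replica is behind
asyncStore.replicate();
const freshRead = asyncStore.read('stock:42'); // 10 once replication catches up

const syncStore = new ReplicatedStore(2, 'sync');
syncStore.write('stock:42', 10);
const syncRead = syncStore.read('stock:42');   // 10 immediately
```

In the async case the write returns before replicas are updated, which is where the stale-read window comes from; in the sync case the extra work happens inside `write`, which is where the added write latency comes from.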

### Multi-Master Replication

```
   ┌─────────┐            ┌─────────┐
   │ Master  │◄──────────►│ Master  │
   │   1     │  Sync      │   2     │
   │  (RW)   │            │  (RW)   │
   └─────────┘            └─────────┘
       │                       │
       ▼                       ▼
   ┌─────────┐            ┌─────────┐
   │ Slave 1 │            │ Slave 2 │
   └─────────┘            └─────────┘
```

## Comparison Summary

| Component          | Purpose                 | When to Use                   |
| ------------------ | ----------------------- | ----------------------------- |
| **Load Balancer**  | Distribute traffic      | Multiple servers              |
| **Cache**          | Speed up reads          | Hot data, expensive queries   |
| **Message Queue**  | Async processing        | Decoupling, spike handling    |
| **CDN**            | Global content delivery | Static assets, global users   |
| **API Gateway**    | Single entry point      | Microservices, security       |
| **DB Replication** | Availability & reads    | High availability, read-heavy |

<Tip>
  **Design Tip**: Don't add components just because they're common. Each adds complexity. Start simple and add components as specific problems arise. In interviews, this translates to: start with the simplest architecture that could work, then say "As we scale to X users, the bottleneck will be Y, so I would add Z." This shows the interviewer you understand *why* each component exists, not just that you memorized a reference architecture.
</Tip>

<Note>
  **Scalability Mental Model**: When reasoning about which building blocks to introduce, think in orders of magnitude. At 100 QPS, a single server with a database is fine. At 1,000 QPS, you likely need a cache and a load balancer. At 10,000 QPS, you need database replication and possibly a CDN. At 100,000+ QPS, you are looking at sharding, message queues for async processing, and multi-region deployment. Each jump in scale introduces a new bottleneck that the next building block addresses.
</Note>
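
The thresholds in the note can be written down as a simple lookup. This is only a restatement of those rules of thumb in code; the `tiers` table and `buildingBlocksFor` are hypothetical names, and real systems hit these bottlenecks at different points depending on workload.

```javascript
// Thresholds taken from the note above; they are rules of thumb.
const tiers = [
  { minQps: 0, add: ['single server', 'database'] },
  { minQps: 1000, add: ['cache', 'load balancer'] },
  { minQps: 10000, add: ['database replication', 'CDN'] },
  { minQps: 100000, add: ['sharding', 'message queue', 'multi-region'] },
];

// Returns every building block introduced at or below the given load.
function buildingBlocksFor(qps) {
  return tiers.filter((t) => qps >= t.minQps).flatMap((t) => t.add);
}

console.log(buildingBlocksFor(100));
// [ 'single server', 'database' ]
console.log(buildingBlocksFor(25000));
// [ 'single server', 'database', 'cache', 'load balancer',
//   'database replication', 'CDN' ]
```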

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="You are designing an e-commerce platform. Starting from zero, walk me through which building blocks you would add and at what scale thresholds. Be specific about the numbers.">
    **Strong Answer:**

    I think about this in orders of magnitude, and at each stage the bottleneck shifts to a different component.

    **Stage 1: 0 to 1,000 QPS (launch to \~1M DAU)**. Single app server, single Postgres database, Nginx as a reverse proxy. This handles a surprising amount of traffic -- Postgres comfortably serves 1,000-5,000 simple queries per second. Total cost: maybe \$200/month on a decent cloud instance. Do not add complexity prematurely.

    **Stage 2: 1,000 to 10,000 QPS (\~1M to 10M DAU)**. The database becomes the bottleneck. First add a Redis cache in front of Postgres -- product catalog, user sessions, cart data. A single Redis instance handles 100K+ reads/sec. Cache hit ratio of 90% means Postgres only sees 100-1,000 QPS. Next, add a load balancer with 3-5 stateless app servers. Add read replicas to Postgres (1 primary, 2 replicas) and route read traffic to replicas. Total cost: \~\$2,000-5,000/month.

    **Stage 3: 10,000 to 100,000 QPS (\~10M to 100M DAU)**. Network and bandwidth become factors. Add a CDN for static assets (product images, CSS, JS). This offloads 60-70% of total bandwidth from your servers. Add a message queue (Kafka or SQS) for async processing -- order confirmation emails, inventory updates, analytics events. This prevents synchronous processing from blocking the checkout flow. Consider database sharding if your dataset exceeds what a single Postgres instance can hold (usually around 1-5TB with good indexing). Total cost: \~\$20,000-50,000/month.

    **Stage 4: 100,000+ QPS (100M+ DAU)**. Multi-region deployment. API gateway for routing, auth, and rate limiting. Shard the database by user\_id or region. Multiple Kafka clusters. A caching tier with both local (in-process) and distributed (Redis cluster) layers. This is where you also need a dedicated search service (Elasticsearch) because Postgres full-text search cannot keep up. Total cost: \$200,000+/month.

    The key insight: at each stage, you are solving the *current* bottleneck, not preemptively adding every component.

    **Follow-up: You are at Stage 2, and your Redis cache goes down. What happens, and how do you prevent a cascading failure?**

    Without protection, all 10,000 QPS suddenly hit Postgres, which can handle maybe 3,000 QPS. Response times spike, connection pool exhausts, app servers start timing out, and the load balancer marks them unhealthy. Total outage in 30-60 seconds. Prevention: First, implement circuit breakers -- when cache miss rate exceeds 50%, start returning stale cached data from a local in-memory fallback (even 30-second-old data is better than a timeout). Second, use Redis Sentinel or Redis Cluster for automatic failover -- a replica promotes to primary within seconds. Third, rate-limit the database connection pool so Postgres degrades gracefully (slower responses) instead of crashing (connection refused). Fourth, pre-warm the cache on recovery -- do not rely on lazy loading because the "thundering herd" of 10K cache misses all hitting Postgres simultaneously will just crash it again.
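
    The first prevention step can be sketched as a reader that keeps a local copy of every value it serves and falls back to that copy when the cache is unreachable. This is a simplified illustration: it trips on consecutive cache errors rather than on miss rate, it has no cool-down to retry the cache later, and `remoteGet`, `loadFromDb`, and `failureThreshold` are hypothetical parameters.

    ```javascript
    // Reads through a remote cache, remembers every value it returns, and
    // serves the remembered (possibly stale) copy when the cache is down.
    function createGuardedReader({ remoteGet, loadFromDb, failureThreshold }) {
      const local = new Map(); // in-process fallback copy
      let consecutiveFailures = 0;

      return function read(key) {
        // Breaker closed: try the remote cache first.
        if (consecutiveFailures < failureThreshold) {
          let cached;
          let reachable = true;
          try {
            cached = remoteGet(key);
            consecutiveFailures = 0;
          } catch (err) {
            reachable = false;
            consecutiveFailures += 1;
          }
          if (reachable) {
            const hit = cached !== undefined;
            const value = hit ? cached : loadFromDb(key); // ordinary miss
            local.set(key, value);
            return { value, source: hit ? 'cache' : 'db' };
          }
        }
        // Cache unreachable or breaker open: prefer stale data over the DB.
        if (local.has(key)) return { value: local.get(key), source: 'stale' };
        return { value: loadFromDb(key), source: 'db' };
      };
    }
    ```

    Once the breaker is open, the reader stops calling the cache at all, so a dead Redis no longer adds a timeout to every request, and only keys with no local copy reach Postgres.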
  </Accordion>

  <Accordion title="Your team is debating whether to use a message queue or direct HTTP calls between your order service and email/inventory/analytics services. The order service processes 5,000 orders per minute. Make the case for one approach.">
    **Strong Answer:**

    For this use case, the message queue is clearly the right choice, and the numbers make it obvious.

    **With direct HTTP calls**: The order service makes 3 synchronous calls per order (email, inventory, analytics). At 5,000 orders/min = 83 orders/sec, that is 249 HTTP calls/sec. If each downstream service takes 200ms to respond and the calls run sequentially, each order spends 600ms on downstream calls alone. But here is the real problem: if the email service goes down for 2 minutes, 10,000 orders fail or time out. The checkout flow is now coupled to whether the email service is healthy -- and your customer does not care about email, they care about completing their purchase.

    **With a message queue**: The order service writes 3 messages per order to the queue (83 events/sec \* 3 = 249 messages/sec -- trivial for Kafka or even SQS). The order service responds to the customer in \~50ms (just the database write + queue publish). Each downstream service consumes at its own pace. If email is down for 2 minutes, the 10,000 email messages sit in the queue and are processed when it comes back. Zero orders fail.

    **The numbers that matter**:

    * Queue latency: Kafka publish takes \~2ms. SQS takes \~10ms. Either is negligible compared to 200ms HTTP calls.
    * Throughput headroom: Kafka handles 10K+ messages/sec per partition. You are at 249/sec. You have 40x headroom before you even need to think about scaling.
    * Storage for buffering: 5,000 orders/min \* 1KB/message \* 3 topics \* 60 min of buffer = 900 MB. Essentially free.
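
    The arithmetic behind these numbers is short enough to check in a few lines. The sketch below uses the figures from the scenario; the per-partition Kafka rate and the 1 KB message size are the same assumptions the bullets make, and decimal units (1 MB = 1,000 KB) are used throughout.

    ```javascript
    const ordersPerMin = 5000;
    const downstreamServices = 3; // email, inventory, analytics

    // Exact value is 250/sec; the text rounds 83 orders/sec * 3 to 249.
    const messagesPerSec = (ordersPerMin * downstreamServices) / 60;

    // Time spent on downstream calls if they run one after another.
    const sequentialHttpMs = downstreamServices * 200; // 600

    // Orders placed while a dependency is down for 2 minutes.
    const ordersDuringOutage = ordersPerMin * 2; // 10000

    // Headroom against an assumed 10K messages/sec per Kafka partition.
    const headroom = 10000 / messagesPerSec; // 40

    // One hour of buffered 1 KB messages across 3 topics, in MB.
    const bufferMb = (ordersPerMin * 1 * downstreamServices * 60) / 1000; // 900

    console.log({ messagesPerSec, sequentialHttpMs, ordersDuringOutage, headroom, bufferMb });
    ```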

    **When I would NOT use a queue**: If the downstream response is needed to complete the order (e.g., a real-time fraud check that must return "approved" before the order can proceed). That is inherently synchronous and should remain an HTTP call -- but with a circuit breaker and a fallback policy (auto-approve if fraud service is down, with post-hoc review).

    **Follow-up: You use Kafka. Six months later, the analytics consumer falls behind by 2 hours due to a processing bug. Orders keep flowing. What are the implications and how do you handle it?**

    This is one of the beautiful properties of Kafka: the order service and email/inventory consumers are completely unaffected. The analytics consumer has its own consumer group offset, so its lag is isolated. The 2 hours of messages (5,000/min \* 120 min \* 1KB = \~600 MB) are sitting safely in the Kafka topic's retention window (typically 7 days). Fix the analytics bug, redeploy, and the consumer catches up by processing the backlog at full speed. Monitor consumer lag as a key metric -- alert when any consumer group falls behind by more than 15 minutes. The architectural insight: this failure mode is exactly why queues exist. With direct HTTP calls, those analytics events would simply be lost.
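
    How long the catch-up takes depends on how much faster the consumer can process than new messages arrive. The sketch below works this out for the scenario; the post-fix consumer throughput of 20,000 messages/min is an assumed figure for illustration, not something the scenario specifies.

    ```javascript
    // Time for a lagging consumer to drain its backlog while new messages
    // keep arriving. Rates are in messages per minute.
    function catchUpMinutes(backlog, produceRate, consumeRate) {
      if (consumeRate <= produceRate) return Infinity; // never catches up
      return backlog / (consumeRate - produceRate);
    }

    const produceRate = 5000;               // orders per minute, from the scenario
    const backlog = produceRate * 120;      // 2 hours behind = 600,000 messages
    const backlogMb = (backlog * 1) / 1000; // 600 MB at 1 KB per message

    // Assumed consumer throughput after the fix: 4x the produce rate.
    const minutes = catchUpMinutes(backlog, produceRate, 20000);

    console.log({ backlog, backlogMb, minutes }); // minutes: 40
    ```

    If the consumer can only just keep pace with production, the backlog never shrinks, which is why consumers are usually provisioned with throughput well above the steady-state rate.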
  </Accordion>

  <Accordion title="Estimate the cache sizing for a social media platform with 200M DAU where each user views their feed 5 times per day. Each feed response is 50KB. What is the right cache size, and what cache hit ratio should you target?">
    **Strong Answer:**

    Let me work through this systematically.

    **Total request volume**: 200M DAU \* 5 feed views = 1 billion feed requests/day. QPS: 1B / 86,400 = \~11,500 QPS. Peak (3x): \~35,000 QPS.

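    **Unique feeds**: Not every request is unique. 200M users each have a unique feed, but the feed changes slowly (maybe 10-50 new posts per day for an average user). With a 5-minute TTL, only refreshes that land within the same 5-minute window are cache hits; feed views spread hours apart each miss. Reaching a high hit ratio therefore requires either a longer TTL with invalidation when new posts arrive, or pre-computed feeds that stay in the cache and are updated on write.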

    **Cache sizing using the 80/20 rule**: 20% of users generate 80% of feed views (power users). 200M \* 20% = 40M feeds to cache. At 50KB each: 40M \* 50KB = 2 TB. That is a lot. But we can be smarter.

    **Refined approach**: Cache the computed feed (list of post IDs + metadata) separately from the post content. The feed index is \~2KB (200 post IDs with timestamps). Post content is cached separately and shared across feeds. 40M feed indexes \* 2KB = 80 GB -- fits comfortably in a Redis cluster. The actual post content: if 10M unique posts are "hot" at any time, at 5KB each = 50 GB. Total: \~130 GB for the hot set.

    **Target cache hit ratio**: For feed reads, target 95%+. At 11,500 QPS with 95% cache hits, only 575 QPS hit the database -- well within Postgres capacity. At 80% cache hits, 2,300 QPS hit the database, which is manageable but means 4x more database load.

    **Cost math**: A Redis cluster with 130 GB of memory = roughly 5 r6g.2xlarge instances (64 GB RAM each, with overhead) at \~\$0.50/hr each = \~\$1,800/month. The alternative -- serving 11,500 QPS from the database directly -- would require multiple read replicas at roughly \$5,000-10,000/month. The cache pays for itself 3-5x over.
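
    The whole estimate can be reproduced in a few lines, which makes it easy to rerun with different assumptions. The sketch below uses decimal units (1 KB = 1,000 bytes), the figures stated in the answer, and an assumed 730 hours per month; the variable names are illustrative.

    ```javascript
    const KB = 1e3, GB = 1e9, TB = 1e12; // decimal units, as in the estimate

    const dau = 200e6;
    const requestsPerDay = dau * 5;                     // 1 billion
    const avgQps = Math.round(requestsPerDay / 86400);  // 11574 (text rounds to 11,500)
    const peakQps = avgQps * 3;                         // 34722 (text rounds to 35,000)

    const hotUsers = dau / 5;                           // 20% of users = 40M
    const naiveTb = (hotUsers * 50 * KB) / TB;          // full 50 KB feeds: 2 TB
    const indexGb = (hotUsers * 2 * KB) / GB;           // 2 KB feed indexes: 80 GB
    const postsGb = (10e6 * 5 * KB) / GB;               // 10M hot posts at 5 KB: 50 GB
    const hotSetGb = indexGb + postsGb;                 // 130 GB

    // Database load for a given hit ratio, using the rounded 11,500 QPS.
    const dbQps = (qps, hitRatio) => Math.round(qps * (1 - hitRatio));

    // 5 instances at an assumed $0.50/hr, 730 hours per month.
    const monthlyCost = 5 * 0.5 * 730;                  // 1825

    console.log({ avgQps, peakQps, naiveTb, hotSetGb, monthlyCost });
    console.log(dbQps(11500, 0.95), dbQps(11500, 0.8)); // 575 2300
    ```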

    **Follow-up: A celebrity with 50M followers posts something. 10% of their followers check their feed in the next 5 minutes. What happens to your cache?**

    At least 5 million feed cache entries need to be updated or invalidated within 5 minutes (a pure "fan-out on write" design would eventually touch all 50M followers' feeds, not only the active ones). Counting just the 5M active followers, pre-computing feeds means 5M cache writes in 300 seconds = 16,700 cache writes/sec -- achievable for a Redis cluster but a significant spike. The better approach for celebrities: use "fan-out on read." Do not pre-compute feeds for celebrity followers. Instead, when a user reads their feed, merge their pre-computed feed (from normal users they follow) with a real-time lookup of recent posts from celebrities they follow. Cache the celebrity's recent posts once (not per-follower), and the 5M users each read from that single cached entry. This converts 5M cache writes into 1 cache write + 5M cache reads, which is far more efficient.
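
    The read-time merge can be sketched as follows. This is a minimal in-memory illustration: the post shape (`id`, `authorId`, `ts`), the 20-post cap per celebrity, and the function names are hypothetical, and a real system would read both inputs from a distributed cache.

    ```javascript
    // One cache entry per celebrity, shared by all of their followers.
    const celebrityPostCache = new Map(); // celebrityId -> recent posts

    // A celebrity post costs a single cache write, regardless of follower count.
    function publishCelebrityPost(post) {
      const posts = celebrityPostCache.get(post.authorId) || [];
      celebrityPostCache.set(post.authorId, [post, ...posts].slice(0, 20));
    }

    // Merge the user's pre-computed feed with celebrity posts at read time.
    function readFeed(precomputedFeed, followedCelebrityIds, limit = 50) {
      const celebrityPosts = followedCelebrityIds.flatMap(
        (id) => celebrityPostCache.get(id) || []
      );
      return [...precomputedFeed, ...celebrityPosts]
        .sort((a, b) => b.ts - a.ts) // newest first
        .slice(0, limit);
    }

    publishCelebrityPost({ id: 'p1', authorId: 'celeb', ts: 300 });

    // Two followers with different pre-computed feeds read the same entry.
    const feedA = readFeed([{ id: 'a1', authorId: 'u1', ts: 100 }], ['celeb']);
    const feedB = readFeed([{ id: 'b1', authorId: 'u2', ts: 200 }], ['celeb']);

    console.log(feedA.map((p) => p.id)); // [ 'p1', 'a1' ]
    console.log(feedB.map((p) => p.id)); // [ 'p1', 'b1' ]
    ```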
  </Accordion>
</AccordionGroup>
