Overview

Every large-scale system is built from common building blocks. Understanding these components and when to use them is essential for system design.

Load Balancers

Distribute traffic across multiple servers for scalability and reliability.

Load Balancing Algorithms

Algorithm            | Description                | Use Case
---------------------|----------------------------|---------------------------
Round Robin          | Rotate through servers     | Equal server capacity
Weighted Round Robin | Based on server capacity   | Different server specs
Least Connections    | Send to least busy         | Variable request duration
IP Hash              | Same client → same server  | Session affinity
Least Response Time  | Fastest server             | Performance-critical
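
To make these strategies concrete, here is a minimal, framework-agnostic sketch of two of them; the server addresses are illustrative.
import itertools

class RoundRobinBalancer:
    """Rotate through servers in a fixed order (assumes roughly equal capacity)."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def choose(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Pick the server with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def choose(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request completes so the counts stay accurate
        self.active[server] -= 1

# Usage
lb = LeastConnectionsBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
server = lb.choose()  # route the request here, then lb.release(server) when done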

Layer 4 vs Layer 7

Layer 7 (Application)              Layer 4 (Transport)
┌─────────────────────┐           ┌─────────────────────┐
│ HTTP/HTTPS aware    │           │ TCP/UDP only        │
│ URL-based routing   │           │ IP + Port routing   │
│ SSL termination     │           │ Faster (less work)  │
│ Content switching   │           │ No content awareness│
│ Request modification│           │ Simple forwarding   │
└─────────────────────┘           └─────────────────────┘

Caching

Store frequently accessed data in fast storage to reduce latency and database load.

Cache Layers

┌─────────────────────────────────────────────────────────────┐
│                      Application                            │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                    L1: In-Memory Cache                     │
│                    (Application RAM)                       │
│                    Latency: ~1ms                          │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                   L2: Distributed Cache                    │
│                   (Redis/Memcached)                        │
│                   Latency: ~5ms                           │
└─────────────────────────────┬─────────────────────────────┘
                              │ Miss
┌─────────────────────────────▼─────────────────────────────┐
│                      L3: Database                          │
│                   Latency: ~50-100ms                       │
└─────────────────────────────────────────────────────────────┘
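
A read that walks these layers is essentially a chained look-aside. The sketch below is illustrative only: a plain dict stands in for L1, a Redis client for L2, and db.query is a placeholder for the real data-access layer.
import json

def get_with_cache_layers(key, local_cache, redis_client, db):
    """Check L1 (process memory), then L2 (Redis), then the database."""
    # L1: in-process memory (fastest, per-instance)
    if key in local_cache:
        return local_cache[key]

    # L2: distributed cache shared by all app instances
    cached = redis_client.get(key)
    if cached is not None:
        value = json.loads(cached)
        local_cache[key] = value  # promote to L1
        return value

    # L3: database (slowest path); repopulate the caches on the way back
    value = db.query("SELECT * FROM items WHERE id = %s", (key,))
    if value is not None:
        redis_client.setex(key, 300, json.dumps(value))  # populate L2
        local_cache[key] = value                          # populate L1
    return value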

Caching Strategies

Cache-Aside (Lazy Loading)

The most common caching pattern: the application manages the cache explicitly, checking it first and falling back to the database on a miss.
import redis
import json
from typing import Optional, Any
from functools import wraps

class CacheAsideService:
    """Cache-Aside pattern implementation with Redis"""
    
    def __init__(self, redis_client: redis.Redis, db, default_ttl: int = 3600):
        self.cache = redis_client
        self.db = db
        self.default_ttl = default_ttl
    
    def get_user(self, user_id: str) -> Optional[dict]:
        cache_key = f"user:{user_id}"
        
        # 1. Try cache first
        cached = self.cache.get(cache_key)
        if cached:
            print(f"Cache HIT for {cache_key}")
            return json.loads(cached)
        
        # 2. Cache miss - query database
        print(f"Cache MISS for {cache_key}")
        user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
        
        if user is None:
            return None
        
        # 3. Populate cache
        self.cache.setex(
            cache_key,
            self.default_ttl,
            json.dumps(user)
        )
        
        return user
    
    def update_user(self, user_id: str, data: dict) -> bool:
        """Update DB and invalidate cache (not update!)"""
        cache_key = f"user:{user_id}"
        
        # Update database
        self.db.update("UPDATE users SET name=%s WHERE id=%s", 
                       (data['name'], user_id))
        
        # Invalidate cache (next read will refresh it)
        self.cache.delete(cache_key)
        return True


# Decorator version for any function
def cache_aside(ttl: int = 3600):
    """Decorator for cache-aside pattern"""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            cache_key = f"{func.__name__}:{':'.join(map(str, args))}"
            
            cached = self.cache.get(cache_key)
            if cached:
                return json.loads(cached)
            
            result = func(self, *args, **kwargs)
            
            if result is not None:
                self.cache.setex(cache_key, ttl, json.dumps(result))
            
            return result
        return wrapper
    return decorator


# Usage with decorator
class ProductService:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
    
    @cache_aside(ttl=1800)  # Cache for 30 minutes
    def get_product(self, product_id: str) -> dict:
        return self.db.query("SELECT * FROM products WHERE id = %s", (product_id,))

Write-Through

Write to the cache and the database on every write, so the cache stays consistent with the database.
import json

class WriteThroughCache:
    """Write-Through: keep the cache and the DB in sync on every write"""
    
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
    
    def update_user(self, user_id: str, data: dict) -> bool:
        cache_key = f"user:{user_id}"
        
        # Wrap the DB write in a transaction; the cache write is not part of
        # the DB transaction, so the except block below cleans it up on failure
        try:
            # 1. Update database first (source of truth)
            self.db.begin_transaction()
            self.db.update("UPDATE users SET data = %s WHERE id = %s",
                           (json.dumps(data), user_id))

            # 2. Update cache alongside the DB write
            self.cache.setex(cache_key, 3600, json.dumps(data))
            
            # 3. Commit only if both succeed
            self.db.commit()
            return True
            
        except Exception as e:
            self.db.rollback()
            self.cache.delete(cache_key)  # Ensure consistency
            raise e
    
    def get_user(self, user_id: str) -> dict:
        cache_key = f"user:{user_id}"
        
        # Always check cache first (it's always up-to-date)
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)
        
        # Cache miss (cold start or expired)
        user = self.db.query("SELECT * FROM users WHERE id = %s", (user_id,))
        if user:
            self.cache.setex(cache_key, 3600, json.dumps(user))
        return user

Write-Behind (Write-Back)

Write to the cache immediately and persist to the database asynchronously. This gives maximum write throughput, at the risk of losing buffered writes if the process dies before a flush.
import asyncio
import json
import time
from collections import deque
from typing import Any

class WriteBehindCache:
    """
    Write-Behind: Cache writes immediately, DB writes async.
    Great for high-write scenarios (analytics, counters).
    """
    
    def __init__(self, cache, db, flush_interval: int = 5):
        self.cache = cache
        self.db = db
        self.write_buffer: deque = deque()
        self.flush_interval = flush_interval
        self._start_background_writer()
    
    async def update(self, key: str, value: Any) -> bool:
        """Write to cache immediately, queue DB write"""
        # 1. Update cache (instant!)
        self.cache.set(key, json.dumps(value))
        
        # 2. Add to write buffer
        self.write_buffer.append({
            'key': key,
            'value': value,
            'timestamp': time.time()
        })
        
        return True  # Returns immediately!
    
    async def increment_counter(self, key: str, amount: int = 1) -> int:
        """Perfect for view counts, like counts, etc."""
        new_value = self.cache.incrby(key, amount)
        
        # Batch counter updates (write once per interval)
        self.write_buffer.append({
            'key': key,
            'value': new_value,
            'timestamp': time.time(),
            'type': 'counter'
        })
        
        return new_value
    
    def _start_background_writer(self):
        """Start the background task that flushes buffered writes to the DB.

        asyncio.create_task needs a running event loop, so instantiate this
        class from within async code.
        """
        async def flush_loop():
            while True:
                await asyncio.sleep(self.flush_interval)
                await self._flush_to_db()

        asyncio.create_task(flush_loop())
    
    async def _flush_to_db(self):
        """Batch write pending updates to database"""
        if not self.write_buffer:
            return
        
        batch = []
        while self.write_buffer:
            batch.append(self.write_buffer.popleft())
        
        # Batch insert/update for efficiency
        try:
            await self.db.execute_batch(
                "INSERT INTO cache_data (key, value) VALUES (%s, %s) "
                "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
                [(item['key'], json.dumps(item['value'])) for item in batch]
            )
            print(f"Flushed {len(batch)} writes to database")
        except Exception as e:
            # Put failed writes back in queue
            for item in batch:
                self.write_buffer.append(item)
            print(f"Flush failed, items re-queued: {e}")


# Usage: view counter (run inside async code, since __init__ starts an asyncio task)
cache = WriteBehindCache(redis_client, db, flush_interval=10)
await cache.increment_counter("views:article:123")  # returns immediately
# The database is updated every 10 seconds with batched writes

Cache Invalidation

Strategy     | Description           | Trade-off
-------------|-----------------------|---------------------------
TTL          | Expire after time     | Simple but may serve stale
Event-based  | Invalidate on write   | Complex but consistent
Version tags | Change key on update  | Wastes memory

Cache Invalidation Best Practices:
  1. Prefer invalidation over updating (avoids race conditions)
  2. Use events/pub-sub for distributed cache invalidation
  3. Set reasonable TTLs as a safety net
  4. Monitor cache hit rates (target >90%)
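
Practice 2 can be sketched with Redis pub/sub, so that every application node drops its local copy when any node writes; the channel name and the local_cache dict below are illustrative, not a fixed convention.
def invalidate(redis_client, key):
    """Writer side: drop the shared entry and notify every other node."""
    redis_client.delete(key)
    redis_client.publish("cache-invalidations", key)

def invalidation_listener(redis_client, local_cache):
    """Runs on each app node: evicts the local (L1) copy on each event."""
    pubsub = redis_client.pubsub()
    pubsub.subscribe("cache-invalidations")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip subscribe confirmations
        key = message["data"]
        if isinstance(key, bytes):  # redis-py returns bytes unless decode_responses=True
            key = key.decode()
        local_cache.pop(key, None)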

Message Queues

Enable asynchronous communication and decouple components.

Message Queue Architecture

Producer ──► Queue ──► Consumer

┌──────────┐     ┌─────────────────┐     ┌──────────┐
│ Service  │────►│   Message       │────►│  Worker  │
│    A     │     │   Queue         │     │    1     │
└──────────┘     │                 │     └──────────┘
                 │  ┌───┬───┬───┐  │     ┌──────────┐
                 │  │msg│msg│msg│  │────►│  Worker  │
                 │  └───┴───┴───┘  │     │    2     │
                 └─────────────────┘     └──────────┘

When to Use

Async Processing

Email sending, image processing, report generation

Load Leveling

Handle traffic spikes by queuing requests

Decoupling

Services don’t need to know about each other

Reliability

Messages persist if consumer is down

Queue Types

Type              | Description              | Use Case
------------------|--------------------------|----------------------
Point-to-Point    | One consumer per message | Task queues
Pub/Sub           | Multiple consumers       | Event broadcasting
Priority Queue    | Process by priority      | Critical tasks first
Dead Letter Queue | Failed messages          | Error handling

Message Queue Implementation Examples

import redis
import json
import time
from typing import Callable, Optional
from dataclasses import dataclass
from enum import Enum

class MessagePriority(Enum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3

@dataclass
class Message:
    id: str
    payload: dict
    priority: MessagePriority = MessagePriority.NORMAL
    attempts: int = 0
    max_attempts: int = 3
    created_at: Optional[float] = None
    
    def __post_init__(self):
        if self.created_at is None:
            self.created_at = time.time()

class MessageQueue:
    """Production-ready message queue with Redis"""
    
    def __init__(self, redis_client: redis.Redis, queue_name: str):
        self.redis = redis_client
        self.queue_name = queue_name
        self.processing_queue = f"{queue_name}:processing"
        self.dlq = f"{queue_name}:dlq"  # Dead Letter Queue
    
    def publish(self, message: Message) -> bool:
        """Add message to queue with priority"""
        msg_data = json.dumps({
            'id': message.id,
            'payload': message.payload,
            'priority': message.priority.value,
            'attempts': message.attempts,
            'max_attempts': message.max_attempts,
            'created_at': message.created_at
        })
        
        # Use sorted set for priority queue
        # Higher priority = higher score = processed first
        score = message.priority.value * 1e10 + (1e10 - message.created_at)
        self.redis.zadd(self.queue_name, {msg_data: score})
        return True
    
    def consume(self, handler: Callable[[dict], bool], 
                batch_size: int = 1) -> None:
        """Consume messages with at-least-once delivery"""
        while True:
            # Get highest priority message
            messages = self.redis.zpopmax(self.queue_name, batch_size)
            
            if not messages:
                time.sleep(0.1)  # No messages, wait
                continue
            
            for msg_data, score in messages:
                message = json.loads(msg_data)
                message['attempts'] += 1
                
                # Move to processing queue (visibility timeout)
                self.redis.setex(
                    f"{self.processing_queue}:{message['id']}", 
                    30,  # 30 second processing timeout
                    msg_data
                )
                
                try:
                    success = handler(message['payload'])
                    
                    if success:
                        # Acknowledge: remove from processing
                        self.redis.delete(f"{self.processing_queue}:{message['id']}")
                    else:
                        self._retry_or_dlq(message)
                        
                except Exception as e:
                    print(f"Error processing message: {e}")
                    self._retry_or_dlq(message)
    
    def _retry_or_dlq(self, message: dict):
        """Retry with backoff or send to Dead Letter Queue"""
        self.redis.delete(f"{self.processing_queue}:{message['id']}")
        
        if message['attempts'] >= message['max_attempts']:
            # Move to Dead Letter Queue
            self.redis.lpush(self.dlq, json.dumps(message))
            print(f"Message {message['id']} sent to DLQ after {message['attempts']} attempts")
        else:
            # Exponential backoff retry
            delay = 2 ** message['attempts']
            time.sleep(delay)
            
            msg = Message(
                id=message['id'],
                payload=message['payload'],
                priority=MessagePriority(message['priority']),
                attempts=message['attempts'],
                max_attempts=message['max_attempts']
            )
            self.publish(msg)


# Usage Example: Email Service
redis_client = redis.Redis(host='localhost', port=6379)
email_queue = MessageQueue(redis_client, 'emails')

# Producer: Queue an email
email_queue.publish(Message(
    id='email-123',
    payload={
        'to': 'user@example.com',
        'subject': 'Welcome!',
        'body': 'Thanks for signing up...'
    },
    priority=MessagePriority.HIGH
))

# Consumer: Process emails
def send_email(payload: dict) -> bool:
    print(f"Sending email to {payload['to']}")
    # actual email sending logic
    return True

# Run the consumer (this call blocks; in practice run it in a separate process or thread)
email_queue.consume(send_email)

Content Delivery Network (CDN)

Distribute content globally for faster access.
                          ┌─────────────┐
                          │   Origin    │
                          │   Server    │
                          └──────┬──────┘

     ┌───────────────────────────┼───────────────────────────┐
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│  Edge   │                │   Edge    │               │   Edge    │
│  (USA)  │                │  (Europe) │               │  (Asia)   │
└────┬────┘                └─────┬─────┘               └─────┬─────┘
     │                           │                           │
     │                           │                           │
┌────▼────┐                ┌─────▼─────┐               ┌─────▼─────┐
│US Users │                │ EU Users  │               │Asia Users │
│ ~20ms   │                │  ~20ms    │               │  ~20ms    │
└─────────┘                └───────────┘               └───────────┘

Without CDN: All users → Origin (100-300ms for distant users)

CDN Strategies

Strategy | Description                  | Best For
---------|------------------------------|---------------------------------
Push     | Upload to CDN proactively    | Static content, known files
Pull     | CDN fetches on first request | Dynamic content, large catalogs
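
In code terms the difference is who initiates the copy. A toy sketch, using plain dicts as stand-ins for the edge and origin stores:
def push_deploy(assets: dict, edges: list) -> None:
    """Push: copy every asset to every edge before any request arrives."""
    for edge in edges:
        edge.update(assets)

def pull_get(edge: dict, origin: dict, path: str):
    """Pull: the edge fetches from the origin only on its first miss."""
    if path not in edge:
        edge[path] = origin[path]  # the first request pays the origin round-trip
    return edge[path]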

API Gateway

Single entry point for all client requests.
┌──────────────────────────────────────────────────────────────┐
│                       API Gateway                            │
├──────────────────────────────────────────────────────────────┤
│  • Authentication         • Rate Limiting                    │
│  • SSL Termination        • Request Routing                  │
│  • Load Balancing         • Response Caching                 │
│  • API Versioning         • Request/Response Transform       │
│  • Analytics & Logging    • Circuit Breaking                 │
└──────────────────────────────────────────────────────────────┘
           │              │              │              │
    ┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
    │   Users     │ │  Orders   │ │  Products │ │  Payments │
    │   Service   │ │  Service  │ │  Service  │ │  Service  │
    └─────────────┘ └───────────┘ └───────────┘ └───────────┘
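
As a rough sketch of these responsibilities (not any particular gateway product), the toy gateway below authenticates, applies a per-client rate limit, and routes by path prefix; the route table and token check are placeholders.
import time
from collections import defaultdict

ROUTES = {
    "/users": "http://users-service",
    "/orders": "http://orders-service",
    "/products": "http://products-service",
    "/payments": "http://payments-service",
}

class SimpleGateway:
    """Toy gateway: authenticate, rate-limit per client, then route."""
    def __init__(self, limit_per_minute: int = 60):
        self.limit = limit_per_minute
        self.requests = defaultdict(list)  # client_id -> recent request timestamps

    def handle(self, client_id: str, token: str, path: str):
        if not self._authenticated(token):
            return 401, "Unauthorized"
        if not self._within_rate_limit(client_id):
            return 429, "Too Many Requests"
        for prefix, backend in ROUTES.items():
            if path.startswith(prefix):
                return 200, f"forwarded to {backend}{path}"
        return 404, "No route"

    def _authenticated(self, token: str) -> bool:
        return bool(token)  # placeholder for real token validation

    def _within_rate_limit(self, client_id: str) -> bool:
        now = time.time()
        recent = [t for t in self.requests[client_id] if now - t < 60]
        recent.append(now)
        self.requests[client_id] = recent
        return len(recent) <= self.limit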

Database Replication

Copy data across multiple servers for availability and read scaling.

Master-Slave Replication

     Writes                           Reads
        │                               │
        ▼                               ▼
   ┌─────────┐                   ┌───────────┐
   │ Master  │                   │  Slave 1  │
   │  (RW)   │──── Replication ──│   (RO)    │
   └─────────┘         │         └───────────┘

                       │         ┌───────────┐
                       └─────────│  Slave 2  │
                                 │   (RO)    │
                                 └───────────┘
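
On the application side, replication usually surfaces as read/write splitting. A minimal sketch, assuming master and replicas are connection objects that expose an execute(sql, params) method:
import random

class ReplicatedDatabase:
    """Send writes to the master; spread reads across read-only replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def execute_write(self, sql, params=()):
        # All writes go to the single writable node
        return self.master.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Reads fan out across replicas; a just-committed write may not be
        # visible here yet because of replication lag
        return random.choice(self.replicas).execute(sql, params)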

Multi-Master Replication

   ┌─────────┐            ┌─────────┐
   │ Master  │◄──────────►│ Master  │
   │   1     │  Sync      │   2     │
   │  (RW)   │            │  (RW)   │
   └─────────┘            └─────────┘
       │                       │
       ▼                       ▼
   ┌─────────┐            ┌─────────┐
   │ Slave 1 │            │ Slave 2 │
   └─────────┘            └─────────┘

Comparison Summary

Component      | Purpose                 | When to Use
---------------|-------------------------|------------------------------
Load Balancer  | Distribute traffic      | Multiple servers
Cache          | Speed up reads          | Hot data, expensive queries
Message Queue  | Async processing        | Decoupling, spike handling
CDN            | Global content delivery | Static assets, global users
API Gateway    | Single entry point      | Microservices, security
DB Replication | Availability & reads    | High availability, read-heavy

Design Tip: Don’t add components just because they’re common. Each adds complexity. Start simple and add components as specific problems arise.