Distributed Caching

Caching is often the difference between a system that handles 100 QPS and one that handles 1 million QPS. This module covers everything from cache fundamentals to advanced distributed caching patterns.

Track Duration: 10-14 hours
Key Topics: Cache Strategies, Consistency, Invalidation, Redis, Memcached, Cache Stampede
Interview Focus: Cache consistency, invalidation strategies, hot key handling

Why Caching Matters

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LATENCY COMPARISON                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Operation                              Time                                │
│  ─────────────────────────────────────  ─────────────                       │
│  L1 cache reference                     0.5 ns                              │
│  L2 cache reference                     7 ns                                │
│  Main memory reference                  100 ns                              │
│  ──────────────────────────────────────────────────                         │
│  In-memory cache (Redis on localhost)  100-500 μs                           │
│  In-memory cache (Redis same DC)       1-5 ms                               │
│  SSD random read                        150 μs                              │
│  ──────────────────────────────────────────────────                         │
│  Database query (simple)               1-10 ms                              │
│  Database query (complex)              10-100 ms                            │
│  Network round trip (same DC)          0.5-1 ms                             │
│  Network round trip (cross-region)     50-150 ms                            │
│                                                                              │
│  IMPACT OF CACHING:                                                         │
│  ──────────────────                                                         │
│  Without cache: 100ms DB query × 1000 QPS = 100 DB connections busy        │
│  With 99% cache hit: 1ms cache × 990 + 100ms DB × 10 = 2 DB connections    │
│                                                                              │
│  CACHING IS A FORCE MULTIPLIER                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cache Strategies

Read Strategies

Cache-Aside (Lazy Loading)
Read-Through
Refresh-Ahead

Application manages cache explicitly. Most common pattern.

┌──────────┐     1. Read      ┌───────────┐
│  Client  │ ───────────────► │   Cache   │
│          │ ◄─────────────── │  (Redis)  │
└────┬─────┘   2. Miss/Hit    └───────────┘
     │                              │
     │ 3. On miss,                  │
     │    read from DB              │
     ▼                              │
┌──────────┐                        │
│ Database │                        │
└────┬─────┘                        │
     │                              │
     └───── 4. Store in cache ──────┘

async def get_user(user_id: str) -> User:
    # 1. Try cache first
    cached = await cache.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)
    
    # 2. Cache miss - read from database
    user = await db.query("SELECT * FROM users WHERE id = ?", user_id)
    
    # 3. Populate cache for next time
    await cache.set(f"user:{user_id}", user.to_json(), ttl=3600)
    
    return user

Pros: Simple, only caches what’s needed, cache failures don’t break reads Cons: Cache miss = slow first read, potential for stale data

Cache itself fetches from database on miss.

┌──────────┐     Read       ┌───────────┐
│  Client  │ ─────────────► │   Cache   │
│          │ ◄───────────── │  (Layer)  │
└──────────┘                └─────┬─────┘
                                  │
                                  │ On miss,
                                  │ cache fetches
                                  ▼
                            ┌──────────┐
                            │ Database │
                            └──────────┘

class ReadThroughCache:
    def __init__(self, cache, loader):
        self.cache = cache
        self.loader = loader
    
    async def get(self, key: str):
        cached = await self.cache.get(key)
        if cached is not None:
            return cached
        
        # Cache handles loading
        value = await self.loader(key)
        await self.cache.set(key, value)
        return value

# Usage
cache = ReadThroughCache(
    redis_client,
    loader=lambda key: db.get_user(key.split(":")[1])
)
user = await cache.get("user:123")

Pros: Simpler application code, centralized caching logic Cons: Cache must understand data model, less flexible

Proactively refresh cache before expiration.

┌──────────────────────────────────────────────┐
│  Timeline for cached item (TTL = 60s)        │
│                                              │
│  0s        45s            60s                │
│  ├──────────┼──────────────┼                 │
│  │ CACHED   │ REFRESH ZONE │ EXPIRED         │
│  │          │              │                 │
│  └──────────┴──────────────┴                 │
│                                              │
│  When accessed in refresh zone:              │
│  1. Return current cached value immediately  │
│  2. Async: Fetch fresh value, update cache   │
│                                              │
└──────────────────────────────────────────────┘

async def get_with_refresh_ahead(key: str, ttl: int = 60, refresh_at: int = 45):
    cached, remaining_ttl = await cache.get_with_ttl(key)
    
    if cached is None:
        # Cache miss - fetch and cache
        value = await fetch_from_db(key)
        await cache.set(key, value, ttl=ttl)
        return value
    
    # Check if we're in the refresh zone
    if remaining_ttl < (ttl - refresh_at):
        # Trigger async refresh
        asyncio.create_task(refresh_cache(key, ttl))
    
    # Return cached value immediately
    return cached

async def refresh_cache(key: str, ttl: int):
    value = await fetch_from_db(key)
    await cache.set(key, value, ttl=ttl)

Pros: Always fast reads, data stays fresh Cons: Complex, may refresh data that’s never read again

Write Strategies

Write-Through
Write-Behind (Write-Back)
Write-Around

Write to cache AND database synchronously.

┌──────────┐     Write      ┌───────────┐     Write     ┌──────────┐
│  Client  │ ─────────────► │   Cache   │ ─────────────►│ Database │
│          │ ◄───────────── │  (Sync)   │ ◄─────────────│          │
└──────────┘      ACK       └───────────┘      ACK      └──────────┘

async def update_user(user_id: str, data: dict):
    # Write to database first (source of truth)
    await db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    
    # Then update cache
    user = await db.get_user(user_id)
    await cache.set(f"user:{user_id}", user.to_json(), ttl=3600)
    
    return user

Pros: Cache always consistent, no stale data Cons: Higher write latency, both must succeed

Write to cache immediately, persist to DB asynchronously.

┌──────────┐     Write      ┌───────────┐
│  Client  │ ─────────────► │   Cache   │
│          │ ◄───────────── │           │
└──────────┘      ACK       └─────┬─────┘
                                  │
                                  │ Async (batched)
                                  ▼
                            ┌──────────┐
                            │ Database │
                            └──────────┘

class WriteBehindCache:
    def __init__(self, cache, db, batch_size=100, flush_interval=1.0):
        self.cache = cache
        self.db = db
        self.pending_writes = []
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        asyncio.create_task(self._flush_loop())
    
    async def write(self, key: str, value: any):
        # Write to cache immediately
        await self.cache.set(key, value)
        
        # Queue for database write
        self.pending_writes.append((key, value))
        
        if len(self.pending_writes) >= self.batch_size:
            await self._flush()
    
    async def _flush(self):
        if not self.pending_writes:
            return
        
        batch = self.pending_writes
        self.pending_writes = []
        
        # Batch write to database
        await self.db.batch_upsert(batch)
    
    async def _flush_loop(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()

Pros: Very fast writes, reduced DB load (batching) Cons: Data loss risk if cache fails before persist, complex

Write directly to database, don’t update cache.

┌──────────┐                 ┌───────────┐
│  Client  │                 │   Cache   │
└────┬─────┘                 └───────────┘
     │
     │ Write directly
     ▼
┌──────────┐
│ Database │
└──────────┘

async def update_user(user_id: str, data: dict):
    # Write to database only
    await db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    
    # Either invalidate cache (let it refetch)
    await cache.delete(f"user:{user_id}")
    
    # Or let TTL expire naturally
    # (depends on staleness tolerance)

Pros: Simple, good for write-heavy infrequently-read data Cons: First read after write is slow

Cache Invalidation

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton

Invalidation Strategies

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE INVALIDATION APPROACHES                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. TTL (Time To Live)                                                      │
│  ─────────────────────                                                      │
│  Set expiration time on each entry                                          │
│  Simple but may serve stale data until expiry                               │
│                                                                              │
│  cache.set("user:123", data, ttl=3600)  # Expires in 1 hour                │
│                                                                              │
│  2. EXPLICIT INVALIDATION                                                   │
│  ────────────────────────                                                   │
│  Delete from cache when data changes                                        │
│  Must track all places that cache data                                      │
│                                                                              │
│  await db.update_user(user_id, data)                                       │
│  await cache.delete(f"user:{user_id}")                                     │
│                                                                              │
│  3. EVENT-DRIVEN INVALIDATION                                               │
│  ────────────────────────────                                               │
│  Publish events when data changes                                           │
│  Cache subscribers invalidate on events                                     │
│                                                                              │
│  Database → Event Bus → Cache Invalidator                                   │
│     (CDC)     (Kafka)     (Consumer)                                        │
│                                                                              │
│  4. VERSION-BASED INVALIDATION                                              │
│  ─────────────────────────────                                              │
│  Include version in cache key                                               │
│  Bump version to invalidate all entries                                     │
│                                                                              │
│  cache.get(f"user:{user_id}:v{version}")                                   │
│  # Change version to invalidate all user caches                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

The Delete vs Update Debate

# APPROACH 1: Delete on write (RECOMMENDED)
async def update_user(user_id: str, data: dict):
    await db.update_user(user_id, data)
    await cache.delete(f"user:{user_id}")  # Let next read populate
    
# APPROACH 2: Update on write
async def update_user(user_id: str, data: dict):
    updated_user = await db.update_user(user_id, data)
    await cache.set(f"user:{user_id}", updated_user, ttl=3600)

Why Delete is Usually Better:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DELETE vs UPDATE                                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  RACE CONDITION with UPDATE:                                                │
│  ────────────────────────────                                               │
│                                                                              │
│  Thread 1                              Thread 2                             │
│  ────────                              ────────                             │
│  1. Read user (version 1)                                                   │
│                                        2. Read user (version 1)             │
│  3. Write user (version 2)                                                  │
│                                        4. Write user (version 3)            │
│  5. Update cache (version 2)                                                │
│                                        6. Update cache (version 3)          │
│                                                                              │
│  If steps complete in order: 1,2,3,5,4,6 → Cache has version 3 (correct)   │
│  If steps complete as: 1,2,3,4,5,6 → Cache has version 2 (WRONG!)          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════    │
│                                                                              │
│  DELETE is SAFER:                                                           │
│  ────────────────                                                           │
│                                                                              │
│  Thread 1: Write version 2, DELETE cache                                   │
│  Thread 2: Write version 3, DELETE cache                                   │
│                                                                              │
│  Cache is empty, next read gets version 3 (correct!)                       │
│                                                                              │
│  EXCEPTION: Update is OK if you use compare-and-swap                       │
│  (check version before updating cache)                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cache Stampede (Thundering Herd)

When a popular cache key expires, many requests simultaneously hit the database.

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE STAMPEDE                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Timeline:                                                                  │
│                                                                              │
│  t=0     Key "popular" cached (TTL 60s)                                    │
│  t=59    1000 requests/second hitting cache ✓                              │
│  t=60    Cache expires! All 1000 requests now hit DB simultaneously!       │
│          DB overwhelmed → slow response → more requests pile up            │
│                                                                              │
│  Requests:  ──────────────────────────█████████████████████████            │
│  Cache:     ████████████████████████──────────────────────────             │
│  Database:  ───────────────────────███████████████████████████             │
│                                     ↑                                       │
│                                     Stampede!                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Stampede Prevention Strategies

Locking (Single Flight)
Probabilistic Early Expiry
External Lock (Redis Lock)
Stale-While-Revalidate

Only one request fetches, others wait.

import asyncio
from typing import Dict

class SingleFlight:
    """
    Deduplicates concurrent requests for the same key.
    Only one fetch happens, all waiters get the result.
    """
    
    def __init__(self):
        self._in_flight: Dict[str, asyncio.Future] = {}
    
    async def do(self, key: str, fetch_func) -> any:
        if key in self._in_flight:
            # Wait for in-flight request
            return await self._in_flight[key]
        
        # We're the first, create a future
        future = asyncio.Future()
        self._in_flight[key] = future
        
        try:
            result = await fetch_func()
            future.set_result(result)
            return result
        except Exception as e:
            future.set_exception(e)
            raise
        finally:
            del self._in_flight[key]

# Usage
single_flight = SingleFlight()

async def get_user(user_id: str):
    cached = await cache.get(f"user:{user_id}")
    if cached:
        return cached
    
    # All concurrent requests for same user will share one DB query
    user = await single_flight.do(
        f"user:{user_id}",
        lambda: db.get_user(user_id)
    )
    
    await cache.set(f"user:{user_id}", user, ttl=3600)
    return user

Pros: Prevents duplicate DB queries Cons: All waiters blocked on one request

Refresh before expiry with some probability.

import random
import math

async def get_with_probabilistic_refresh(
    key: str,
    fetch_func,
    ttl: int = 3600,
    beta: float = 1.0  # Controls refresh aggressiveness
):
    cached, remaining_ttl = await cache.get_with_ttl(key)
    
    if cached is None:
        # Cache miss
        value = await fetch_func()
        await cache.set(key, value, ttl=ttl)
        return value
    
    # Calculate probability of refresh
    # Higher probability as TTL approaches 0
    expiry_gap = remaining_ttl / ttl
    random_factor = random.random()
    should_refresh = random_factor > math.exp(-beta * math.log(1 / expiry_gap))
    
    if should_refresh:
        # Refresh in background
        asyncio.create_task(_refresh(key, fetch_func, ttl))
    
    return cached

async def _refresh(key: str, fetch_func, ttl: int):
    value = await fetch_func()
    await cache.set(key, value, ttl=ttl)

Pros: Spreads refreshes over time, no locks Cons: May still have some concurrent refreshes

Use distributed lock for refresh.

async def get_with_lock(key: str, fetch_func, ttl: int = 3600):
    cached = await cache.get(key)
    if cached:
        return cached
    
    lock_key = f"lock:{key}"
    
    # Try to acquire lock
    acquired = await cache.set_nx(lock_key, "1", ttl=10)  # 10s lock timeout
    
    if acquired:
        try:
            # We have the lock, fetch and cache
            value = await fetch_func()
            await cache.set(key, value, ttl=ttl)
            return value
        finally:
            await cache.delete(lock_key)
    else:
        # Another process is fetching, wait and retry
        for _ in range(10):  # Max 10 retries
            await asyncio.sleep(0.1)  # 100ms
            cached = await cache.get(key)
            if cached:
                return cached
        
        # Give up waiting, fetch ourselves
        return await fetch_func()

Pros: Only one fetch, works across processes Cons: Lock overhead, potential deadlocks if not careful

Serve stale data while refreshing in background.

async def get_stale_while_revalidate(
    key: str,
    fetch_func,
    ttl: int = 3600,
    stale_ttl: int = 7200  # Serve stale for 2 hours
):
    cached, remaining_ttl = await cache.get_with_ttl(key)
    
    if cached is None:
        # Complete miss
        value = await fetch_func()
        await cache.set(key, value, ttl=stale_ttl)
        await cache.set(f"{key}:fresh_until", time.time() + ttl, ttl=stale_ttl)
        return value
    
    fresh_until = await cache.get(f"{key}:fresh_until") or 0
    
    if time.time() > fresh_until:
        # Data is stale, refresh in background
        asyncio.create_task(_refresh(key, fetch_func, ttl, stale_ttl))
    
    # Return cached data (possibly stale)
    return cached

async def _refresh(key: str, fetch_func, ttl: int, stale_ttl: int):
    value = await fetch_func()
    await cache.set(key, value, ttl=stale_ttl)
    await cache.set(f"{key}:fresh_until", time.time() + ttl, ttl=stale_ttl)

Pros: Always fast, never blocks on refresh Cons: May serve stale data, more complex

Hot Key Problem

When one key gets disproportionate traffic.

┌─────────────────────────────────────────────────────────────────────────────┐
│                    HOT KEY PROBLEM                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SCENARIO:                                                                  │
│  Celebrity tweet goes viral, "tweet:12345" gets 100K QPS                   │
│  All requests go to ONE Redis node (key-based sharding)                    │
│  That node becomes bottleneck, even if others are idle                     │
│                                                                              │
│  ┌─────────────┐                                                            │
│  │ Redis Node 1│ ← 100K QPS (overwhelmed!)                                 │
│  │ tweet:12345 │                                                            │
│  └─────────────┘                                                            │
│  ┌─────────────┐                                                            │
│  │ Redis Node 2│ ← 1K QPS (idle)                                           │
│  │ other keys  │                                                            │
│  └─────────────┘                                                            │
│  ┌─────────────┐                                                            │
│  │ Redis Node 3│ ← 1K QPS (idle)                                           │
│  │ other keys  │                                                            │
│  └─────────────┘                                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Hot Key Solutions

Local Cache (L1)
Key Replication
Request Coalescing

Add in-process cache before Redis.

from functools import lru_cache
import time

class TwoLevelCache:
    """
    L1: In-process (fast, small, per-instance)
    L2: Distributed (slower, large, shared)
    """
    
    def __init__(self, l2_client, l1_max_size=1000, l1_ttl=10):
        self.l2 = l2_client
        self.l1_ttl = l1_ttl
        self.l1 = {}  # {key: (value, expire_at)}
    
    async def get(self, key: str):
        # Check L1 first
        if key in self.l1:
            value, expire_at = self.l1[key]
            if time.time() < expire_at:
                return value
            del self.l1[key]
        
        # L1 miss, check L2
        value = await self.l2.get(key)
        
        if value is not None:
            # Store in L1 for next time
            self.l1[key] = (value, time.time() + self.l1_ttl)
        
        return value

# Hot keys served from memory, never hit Redis!

Pros: Extremely fast, reduces L2 load Cons: Inconsistency between instances, memory usage

Spread hot key across multiple shards.

import random

class ReplicatedKeyCache:
    """
    Replicate hot keys across multiple logical slots.
    Reads pick a random replica, writes update all.
    """
    
    def __init__(self, cache, replicas=10):
        self.cache = cache
        self.replicas = replicas
        self.hot_keys = set()  # Track known hot keys
    
    def _get_replica_key(self, key: str, replica: int = None) -> str:
        if replica is None:
            replica = random.randint(0, self.replicas - 1)
        return f"{key}:replica:{replica}"
    
    async def get(self, key: str):
        if key in self.hot_keys:
            # Pick random replica
            replica_key = self._get_replica_key(key)
            return await self.cache.get(replica_key)
        return await self.cache.get(key)
    
    async def set_hot(self, key: str, value: any, ttl: int = 3600):
        """Set a known hot key across all replicas"""
        self.hot_keys.add(key)
        
        # Write to all replicas
        tasks = [
            self.cache.set(self._get_replica_key(key, i), value, ttl=ttl)
            for i in range(self.replicas)
        ]
        await asyncio.gather(*tasks)
    
    async def delete_hot(self, key: str):
        """Delete hot key from all replicas"""
        self.hot_keys.discard(key)
        
        tasks = [
            self.cache.delete(self._get_replica_key(key, i))
            for i in range(self.replicas)
        ]
        await asyncio.gather(*tasks)

Pros: Spreads load across cluster Cons: More storage, complex invalidation

Batch concurrent requests for same key.

from collections import defaultdict
import asyncio

class RequestCoalescer:
    """
    Coalesce concurrent requests for the same key.
    Instead of N cache hits, do 1 cache hit and broadcast result.
    """
    
    def __init__(self, cache, batch_window_ms=5):
        self.cache = cache
        self.batch_window = batch_window_ms / 1000
        self.pending = defaultdict(list)  # key -> [(event, result_holder)]
        self._lock = asyncio.Lock()
    
    async def get(self, key: str):
        async with self._lock:
            if key in self.pending:
                # Join existing batch
                event = asyncio.Event()
                result_holder = {'value': None, 'error': None}
                self.pending[key].append((event, result_holder))
                
                # Wait outside lock
                async with self._lock:
                    pass
                await event.wait()
                
                if result_holder['error']:
                    raise result_holder['error']
                return result_holder['value']
            
            # Start a new batch
            self.pending[key] = []
        
        # Wait for more requests to join
        await asyncio.sleep(self.batch_window)
        
        # Get waiters and clear
        async with self._lock:
            waiters = self.pending.pop(key, [])
        
        # Fetch once
        try:
            value = await self.cache.get(key)
            
            # Notify all waiters
            for event, result_holder in waiters:
                result_holder['value'] = value
                event.set()
            
            return value
        
        except Exception as e:
            for event, result_holder in waiters:
                result_holder['error'] = e
                event.set()
            raise

Pros: Reduces cache hits by order of magnitude Cons: Adds small latency (batch window), complex

Advanced: Sidecar Caching & The “Mesh” Pattern

At Staff/Principal level, you must move beyond the “application-manages-cache” (Cache-Aside) model. In a microservices architecture, managing cache logic in every service leads to duplication and inconsistency.

1. Sidecar Caching (e.g., Pelikan, Mcrouter)

Instead of the application connecting directly to Redis, it connects to a Local Sidecar Proxy.

Benefits:
- Language Agnostic: The sidecar handles complex logic (consistent hashing, retries, failover).
- Connection Pooling: Reduces the number of open connections to the central Redis cluster.
- Request Coalescing: The sidecar can merge multiple identical requests from the local app into one upstream request.

2. EVCache (The Netflix Pattern)

EVCache is a distributed, sharded, replicated in-memory caching system based on Memcached.

Sidecar-based: Every EC2 instance has an EVCache sidecar.
Global Replication: Writes are replicated asynchronously to other AWS regions, allowing for Local Reads globally.

3. Layered Cache Hierarchy (EVCache vs. Redis)

Tier	Name	Latency	Scope
L1	Local Memory	< 1μs	Per-Process (In-memory)
L2	Sidecar Cache	< 1ms	Per-Node (Unix Domain Socket)
L3	Regional Cache	1-5ms	Per-Region (Shared Redis)
L4	Global Cache	50ms+	Cross-Region (Replicated)

[ App ] --- (Unix Socket) ---> [ Sidecar Proxy ] --- (TCP) ---> [ Redis Cluster ]
                                      │
                                      └─ (Async Replication) ─► [ Remote Region ]

Staff Tip: Sidecar caching is the only way to scale caching in a Polyglot Microservices environment. It decouples the “How to cache” (retries, hashing) from the “What to cache” (business logic).

Distributed Cache Architectures

Redis Cluster

┌─────────────────────────────────────────────────────────────────────────────┐
│                    REDIS CLUSTER ARCHITECTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  16,384 HASH SLOTS distributed across nodes:                                │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        Hash Slots                                   │    │
│  │  0      5460    5461    10922   10923    16383                     │    │
│  │  ├───────┼───────┼────────┼───────┼────────┤                       │    │
│  │  │ Node A│       │ Node B │       │ Node C │                       │    │
│  │  └───────┴───────┴────────┴───────┴────────┘                       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  REPLICATION FOR HIGH AVAILABILITY:                                         │
│                                                                              │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                       │
│  │   Node A    │   │   Node B    │   │   Node C    │                       │
│  │   (Master)  │   │   (Master)  │   │   (Master)  │                       │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘                       │
│         │                 │                 │                               │
│         ▼                 ▼                 ▼                               │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                       │
│  │   Node A'   │   │   Node B'   │   │   Node C'   │                       │
│  │   (Replica) │   │   (Replica) │   │   (Replica) │                       │
│  └─────────────┘   └─────────────┘   └─────────────┘                       │
│                                                                              │
│  KEY ROUTING:                                                               │
│  ────────────                                                               │
│  slot = CRC16(key) % 16384                                                  │
│  Route to node owning that slot                                             │
│                                                                              │
│  HASH TAGS (force keys to same slot):                                       │
│  {user:123}:profile  ─┐                                                     │
│  {user:123}:settings ─┼─ Same slot (hash computed on "user:123")           │
│  {user:123}:orders   ─┘                                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Memcached vs Redis

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MEMCACHED vs REDIS COMPARISON                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  FEATURE              MEMCACHED           REDIS                             │
│  ───────              ─────────           ─────                             │
│  Data structures      Strings only        Strings, Lists, Sets,            │
│                                           Hashes, Sorted Sets,              │
│                                           Streams, HyperLogLog              │
│                                                                              │
│  Persistence          None                RDB snapshots, AOF                │
│                       (pure cache)        (can be persistent)               │
│                                                                              │
│  Replication          None built-in       Master-replica                    │
│                                           (async by default)                │
│                                                                              │
│  Clustering           Client-side         Redis Cluster                     │
│                       sharding            (automatic sharding)              │
│                                                                              │
│  Memory efficiency    Better for          More overhead per key            │
│                       small values                                          │
│                                                                              │
│  Threading            Multi-threaded      Single-threaded (mostly)          │
│                       (better multi-core) (simpler, no locks)               │
│                                                                              │
│  Eviction             LRU, random         LRU, LFU, volatile-lru,          │
│                                           volatile-ttl, etc.                │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════    │
│                                                                              │
│  USE MEMCACHED WHEN:                                                        │
│  • Pure caching (ephemeral data)                                            │
│  • Simple key-value with large fan-out                                      │
│  • Need multi-threaded performance                                          │
│                                                                              │
│  USE REDIS WHEN:                                                            │
│  • Need complex data structures                                             │
│  • Pub/sub, streams, or queues                                              │
│  • Need persistence or replication                                          │
│  • Lua scripting for atomic operations                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Cache Consistency Patterns

Eventual Consistency

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE CONSISTENCY APPROACHES                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. CACHE-ASIDE (Common, Eventually Consistent)                             │
│     ───────────────────────────────────────────                             │
│     Write: Update DB, delete cache                                          │
│     Read:  Check cache, miss → read DB → populate cache                     │
│                                                                              │
│     Consistency gap: Between DB write and cache delete                      │
│     Another request might read stale data                                   │
│                                                                              │
│  2. READ-YOUR-WRITES CONSISTENCY                                            │
│     ────────────────────────────────                                        │
│     After writing, user always sees their own write                         │
│                                                                              │
│     Implementation: Session-local cache or sticky sessions                  │
│                                                                              │
│  3. DOUBLE-DELETE PATTERN                                                   │
│     ──────────────────────                                                  │
│     1. Delete cache                                                         │
│     2. Update database                                                      │
│     3. Wait (e.g., 500ms)                                                   │
│     4. Delete cache again                                                   │
│                                                                              │
│     Why? Prevents race condition where stale read re-populates cache       │
│                                                                              │
│  4. CDC (Change Data Capture)                                               │
│     ──────────────────────────                                              │
│     Database → Debezium → Kafka → Cache Invalidator                        │
│                                                                              │
│     Reliable, ordered, but adds latency                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Double-Delete Implementation

async def update_user_with_double_delete(user_id: str, data: dict):
    """
    Double-delete pattern for better cache consistency.
    
    Timeline:
    1. Delete cache (invalidate current stale entry)
    2. Update database
    3. Wait for replication lag + processing time
    4. Delete cache again (catch any reads that slipped through)
    """
    cache_key = f"user:{user_id}"
    
    # First delete
    await cache.delete(cache_key)
    
    # Update database
    await db.update_user(user_id, data)
    
    # Schedule second delete
    asyncio.create_task(_delayed_delete(cache_key, delay=0.5))

async def _delayed_delete(key: str, delay: float):
    await asyncio.sleep(delay)
    await cache.delete(key)

Advanced Design Scenarios

Scenario 1: Multi-Level Caching for Global APIs

Your company exposes a global read-heavy API (product catalog, content, or configuration) with users across multiple regions. You want sub-100ms p99 latencies while still keeping a single source-of-truth database.

┌─────────────────────────────────────────────────────────────────────┐
│                    MULTI-LEVEL CACHE HIERARCHY                      │
├─────────────────────────────────────────────────────────────────────┤
│  Client (Browser/Mobile)                                           │
│    │                                                               │
│    ▼                                                               │
│  CDN / Edge Cache (Level 0)   ← cache static + some JSON          │
│    │                                                               │
│    ▼                                                               │
│  API Gateway / Reverse Proxy (Level 1 - regional)                  │
│    │  In-memory + Redis/Envoy cache                                │
│    ▼                                                               │
│  App Service (Level 2 - per-instance L1)                           │
│    │                                                               │
│    ▼                                                               │
│  Primary Database (source of truth)                                │
└─────────────────────────────────────────────────────────────────────┘

Core ideas:

Cache as many layers as possible:
- Edge (CDN): cache public, cacheable responses with long TTLs and ETag/Last-Modified for validation.
- Regional gateway (L1.5): shared Redis or Envoy cache per region for JSON APIs.
- App instances (L2): small in-process L1 caches for the hottest keys.
Key design:
- Include versioning in keys for global invalidation: product:{id}:v{schema_version}.
- For per-tenant data: tenant:{tenant_id}:product:{id}:v{version}.
Invalidation:
- For rarely-changing data, use long TTLs + explicit purge when changes happen.
- On update, clear from inner layers first (DB → regional Redis → CDN purge API).

Failure modes & strategies:

CDN down or misbehaving: have monitoring that can bypass CDN and hit regional gateway directly.
Regional cache cluster degraded: instances fall back to L1 in-process cache + direct DB reads with tighter rate limits.
Schema changes: bump schema_version in cache keys to avoid mixed old/new value formats.

Scenario 2: Configuration and Feature Flag Caching

You have a central configuration/feature flag service that many microservices depend on. Latency spikes or downtime in this service must not take down the entire fleet. Requirements:

Services should start even if the config service is temporarily unavailable.
Each service instance should cache configuration locally and refresh in the background.
Changes should propagate within seconds, not minutes.

Architecture:

Source of truth: configuration stored in a durable DB (e.g., Postgres) behind a config service.
Distributed cache: Redis/Etcd used by the config service to cache hot config blobs (config:service_name).
Service-local cache: each microservice maintains an in-process cache of parsed config.
Notification channel: a message bus (Kafka, SNS/SQS, Redis Pub/Sub) carries “config changed” events.

# In the config service
async def update_feature_flags(payload: dict):
    await db.update_flags(payload)
    await cache.set("config:features", json.dumps(payload), ttl=300)
    await event_bus.publish("config.updated", {"keys": ["features"]})

# In each service
async def config_watcher():
    async for event in event_bus.subscribe("config.updated"):
        for key in event["keys"]:
            await refresh_config_key(key)

async def refresh_config_key(key: str):
    raw = await cache.get(f"config:{key}")
    if raw is None:
        raw = await config_http_client.get(f"/config/{key}")
    LOCAL_CONFIG[key] = json.loads(raw)

Design notes:

On startup, services block briefly to fetch essential config; thereafter they serve from local memory.
If the config service and cache are both down, services can keep using the last known good config in memory.
For safety-sensitive flags, include expiry timestamps so services can disable risky behavior if config has not refreshed in too long.

Scenario 3: Bounded-Staleness Caching on Top of a Strong Database

Sometimes you want strictly serializable writes but are happy to serve data up to (T) seconds old for reads. This is common for dashboards, search results, and secondary views. Goal: “Reads may be stale by at most T seconds, never more. Writes must always see their own effect.” Design:

Keep strong consistency in the primary DB (e.g., Spanner/CockroachDB/Postgres with serializable isolation).
Layer a cache in front with staleness metadata per key.

@dataclass
class CachedValue:
    value: dict
    last_write_time: float  # Unix timestamp

MAX_STALENESS_SEC = 5.0

async def get_with_bounded_staleness(key: str) -> dict:
    now = time.time()
    wrapped: CachedValue | None = await cache.get(key)
    
    if wrapped and now - wrapped.last_write_time <= MAX_STALENESS_SEC:
        return wrapped.value
    
    # Too stale or missing: read from DB
    row = await db.fetch_row(key)
    wrapped = CachedValue(value=row, last_write_time=now)
    await cache.set(key, wrapped, ttl=60)
    return row

Write path:

On write, delete cache (or write a fresh CachedValue with current last_write_time).
If you have a write-heavy workload, combine this with double-delete or CDC-backed invalidation to avoid stale repopulation races.

Trade-offs:

You can tighten or loosen MAX_STALENESS_SEC per endpoint: 1–2s for user-facing dashboards, 30–60s for admin analytics.
Reads that cannot tolerate any staleness bypass cache entirely and always hit the DB.

Interview Questions

Q: How would you design a caching layer for a social media feed?

Q: How do you handle cache stampede for a popular product page?

Solutions in order of sophistication:

Single-flight: Only one DB query, others wait
Lock with fallback: If can’t get lock, serve stale
Probabilistic refresh: Spread expiry over time
Stale-while-revalidate: Always serve stale, refresh async

Best approach: Combine single-flight + stale-while-revalidate

Always fast response (stale is OK for products)
Background refresh prevents stampede
Single-flight prevents duplicate refreshes

Q: When would you use cache-aside vs write-through?

Cache-Aside (lazy loading):

Read-heavy workloads
When not all data needs to be cached
When cache failure shouldn’t break writes
Most common for web applications

Write-Through:

When cache consistency is critical
Write-heavy workloads where you need fast subsequent reads
When you can tolerate higher write latency

Write-Behind:

Write-heavy, read-light workloads
When you can batch database writes
When you can tolerate potential data loss
Gaming leaderboards, view counts, etc.

Key Takeaways

Cache Strategically

Not everything needs caching. Cache hot paths where latency matters most.

Plan for Invalidation

Design invalidation strategy upfront. Delete is usually safer than update.

Handle the Stampede

Popular keys + expiration = danger. Use locking, probabilities, or stale serving.

Accept Eventual Consistency

For most use cases, eventual is fine. Build UX around it.

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Distributed Caching

​Why Caching Matters

​Cache Strategies

​Read Strategies

​Write Strategies

​Cache Invalidation

​Invalidation Strategies

​The Delete vs Update Debate

​Cache Stampede (Thundering Herd)

​Stampede Prevention Strategies

​Hot Key Problem

​Hot Key Solutions

​Advanced: Sidecar Caching & The “Mesh” Pattern

​1. Sidecar Caching (e.g., Pelikan, Mcrouter)

​2. EVCache (The Netflix Pattern)

​3. Layered Cache Hierarchy (EVCache vs. Redis)

​Distributed Cache Architectures

​Redis Cluster

​Memcached vs Redis

​Cache Consistency Patterns

​Eventual Consistency

​Double-Delete Implementation

​Advanced Design Scenarios

​Scenario 1: Multi-Level Caching for Global APIs

​Scenario 2: Configuration and Feature Flag Caching

​Scenario 3: Bounded-Staleness Caching on Top of a Strong Database

​Interview Questions

​Key Takeaways

Cache Strategically

Plan for Invalidation

Handle the Stampede

Accept Eventual Consistency

Distributed Caching

Why Caching Matters

Cache Strategies

Read Strategies

Write Strategies

Cache Invalidation

Invalidation Strategies

The Delete vs Update Debate

Cache Stampede (Thundering Herd)

Stampede Prevention Strategies

Hot Key Problem

Hot Key Solutions

Advanced: Sidecar Caching & The “Mesh” Pattern

1. Sidecar Caching (e.g., Pelikan, Mcrouter)

2. EVCache (The Netflix Pattern)

3. Layered Cache Hierarchy (EVCache vs. Redis)

Distributed Cache Architectures

Redis Cluster

Memcached vs Redis

Cache Consistency Patterns

Eventual Consistency

Double-Delete Implementation

Advanced Design Scenarios

Scenario 1: Multi-Level Caching for Global APIs

Scenario 2: Configuration and Feature Flag Caching

Scenario 3: Bounded-Staleness Caching on Top of a Strong Database

Interview Questions

Key Takeaways