> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Distributed Caching

> Cache architectures, consistency strategies, and patterns for high-performance distributed systems

# Distributed Caching

Caching is often the difference between a system that handles 100 QPS and one that handles 1 million QPS. This module covers everything from cache fundamentals to advanced distributed caching patterns.

The fundamental insight behind caching is straightforward: if you ask the same question repeatedly, remembering the answer is faster than re-computing it. It is the same reason you keep your phone number in contacts instead of looking it up in the phone book every time. But distributed caching introduces a twist that makes it one of the hardest problems in systems design: when the underlying answer changes, how do you make sure every cached copy is updated? This is the cache invalidation problem, and it is where most production cache bugs live.

<Info>
  **Track Duration**: 10-14 hours\
  **Key Topics**: Cache Strategies, Consistency, Invalidation, Redis, Memcached, Cache Stampede\
  **Interview Focus**: Cache consistency, invalidation strategies, hot key handling
</Info>

***

## Why Caching Matters

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    LATENCY COMPARISON                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Operation                              Time                                │
│  ─────────────────────────────────────  ─────────────                       │
│  L1 cache reference                     0.5 ns                              │
│  L2 cache reference                     7 ns                                │
│  Main memory reference                  100 ns                              │
│  ──────────────────────────────────────────────────                         │
│  In-memory cache (Redis on localhost)  100-500 μs                           │
│  In-memory cache (Redis same DC)       1-5 ms                               │
│  SSD random read                        150 μs                              │
│  ──────────────────────────────────────────────────                         │
│  Database query (simple)               1-10 ms                              │
│  Database query (complex)              10-100 ms                            │
│  Network round trip (same DC)          0.5-1 ms                             │
│  Network round trip (cross-region)     50-150 ms                            │
│                                                                              │
│  IMPACT OF CACHING:                                                         │
│  ──────────────────                                                         │
│  Without cache: 100ms DB query × 1000 QPS = 100 DB connections busy        │
│  With 99% cache hit: 1ms cache × 990 + 100ms DB × 10 = 2 DB connections    │
│                                                                              │
│  CACHING IS A FORCE MULTIPLIER                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Cache Strategies

### Read Strategies

<Tabs>
  <Tab title="Cache-Aside (Lazy Loading)">
    Application manages cache explicitly. Most common pattern.

    ```
    ┌──────────┐     1. Read      ┌───────────┐
    │  Client  │ ───────────────► │   Cache   │
    │          │ ◄─────────────── │  (Redis)  │
    └────┬─────┘   2. Miss/Hit    └───────────┘
         │                              │
         │ 3. On miss,                  │
         │    read from DB              │
         ▼                              │
    ┌──────────┐                        │
    │ Database │                        │
    └────┬─────┘                        │
         │                              │
         └───── 4. Store in cache ──────┘
    ```

    ```python theme={null}
    async def get_user(user_id: str) -> User:
        # Step 1: Check the cache first (fast path -- sub-millisecond)
        cached = await cache.get(f"user:{user_id}")
        if cached:
            return User.from_json(cached)
        
        # Step 2: Cache miss -- fall through to the database (slow path -- 5-50ms)
        # This only happens on the FIRST request for this user or after TTL expires
        user = await db.query("SELECT * FROM users WHERE id = ?", user_id)
        
        # Step 3: Populate cache so the NEXT request for this user is fast
        # TTL of 3600s (1 hour) means we tolerate up to 1 hour of staleness
        await cache.set(f"user:{user_id}", user.to_json(), ttl=3600)
        
        return user
    ```

    **Pros**: Simple, only caches what's needed, cache failures don't break reads
    **Cons**: Cache miss = slow first read, potential for stale data
  </Tab>

  <Tab title="Read-Through">
    Cache itself fetches from database on miss.

    ```
    ┌──────────┐     Read       ┌───────────┐
    │  Client  │ ─────────────► │   Cache   │
    │          │ ◄───────────── │  (Layer)  │
    └──────────┘                └─────┬─────┘
                                      │
                                      │ On miss,
                                      │ cache fetches
                                      ▼
                                ┌──────────┐
                                │ Database │
                                └──────────┘
    ```

    ```python theme={null}
    class ReadThroughCache:
        def __init__(self, cache, loader):
            self.cache = cache
            self.loader = loader
        
        async def get(self, key: str):
            cached = await self.cache.get(key)
            if cached is not None:
                return cached
            
            # Cache handles loading
            value = await self.loader(key)
            await self.cache.set(key, value)
            return value

    # Usage
    cache = ReadThroughCache(
        redis_client,
        loader=lambda key: db.get_user(key.split(":")[1])
    )
    user = await cache.get("user:123")
    ```

    **Pros**: Simpler application code, centralized caching logic
    **Cons**: Cache must understand data model, less flexible
  </Tab>

  <Tab title="Refresh-Ahead">
    Proactively refresh cache before expiration.

    ```
    ┌──────────────────────────────────────────────┐
    │  Timeline for cached item (TTL = 60s)        │
    │                                              │
    │  0s        45s            60s                │
    │  ├──────────┼──────────────┼                 │
    │  │ CACHED   │ REFRESH ZONE │ EXPIRED         │
    │  │          │              │                 │
    │  └──────────┴──────────────┴                 │
    │                                              │
    │  When accessed in refresh zone:              │
    │  1. Return current cached value immediately  │
    │  2. Async: Fetch fresh value, update cache   │
    │                                              │
    └──────────────────────────────────────────────┘
    ```

    ```python theme={null}
    async def get_with_refresh_ahead(key: str, ttl: int = 60, refresh_at: int = 45):
        cached, remaining_ttl = await cache.get_with_ttl(key)
        
        if cached is None:
            # Cache miss - fetch and cache
            value = await fetch_from_db(key)
            await cache.set(key, value, ttl=ttl)
            return value
        
        # Check if we're in the refresh zone
        if remaining_ttl < (ttl - refresh_at):
            # Trigger async refresh
            asyncio.create_task(refresh_cache(key, ttl))
        
        # Return cached value immediately
        return cached

    async def refresh_cache(key: str, ttl: int):
        value = await fetch_from_db(key)
        await cache.set(key, value, ttl=ttl)
    ```

    **Pros**: Always fast reads, data stays fresh
    **Cons**: Complex, may refresh data that's never read again
  </Tab>
</Tabs>

### Write Strategies

<Tabs>
  <Tab title="Write-Through">
    Write to cache AND database synchronously.

    ```
    ┌──────────┐     Write      ┌───────────┐     Write     ┌──────────┐
    │  Client  │ ─────────────► │   Cache   │ ─────────────►│ Database │
    │          │ ◄───────────── │  (Sync)   │ ◄─────────────│          │
    └──────────┘      ACK       └───────────┘      ACK      └──────────┘
    ```

    ```python theme={null}
    async def update_user(user_id: str, data: dict):
        # Write to database first (source of truth)
        await db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
        
        # Then update cache
        user = await db.get_user(user_id)
        await cache.set(f"user:{user_id}", user.to_json(), ttl=3600)
        
        return user
    ```

    **Pros**: Cache always consistent, no stale data
    **Cons**: Higher write latency, both must succeed
  </Tab>

  <Tab title="Write-Behind (Write-Back)">
    Write to cache immediately, persist to DB asynchronously.

    ```
    ┌──────────┐     Write      ┌───────────┐
    │  Client  │ ─────────────► │   Cache   │
    │          │ ◄───────────── │           │
    └──────────┘      ACK       └─────┬─────┘
                                      │
                                      │ Async (batched)
                                      ▼
                                ┌──────────┐
                                │ Database │
                                └──────────┘
    ```

    ```python theme={null}
    class WriteBehindCache:
        def __init__(self, cache, db, batch_size=100, flush_interval=1.0):
            self.cache = cache
            self.db = db
            self.pending_writes = []
            self.batch_size = batch_size
            self.flush_interval = flush_interval
            asyncio.create_task(self._flush_loop())
        
        async def write(self, key: str, value: any):
            # Write to cache immediately
            await self.cache.set(key, value)
            
            # Queue for database write
            self.pending_writes.append((key, value))
            
            if len(self.pending_writes) >= self.batch_size:
                await self._flush()
        
        async def _flush(self):
            if not self.pending_writes:
                return
            
            batch = self.pending_writes
            self.pending_writes = []
            
            # Batch write to database
            await self.db.batch_upsert(batch)
        
        async def _flush_loop(self):
            while True:
                await asyncio.sleep(self.flush_interval)
                await self._flush()
    ```

    **Pros**: Very fast writes, reduced DB load (batching)
    **Cons**: Data loss risk if cache fails before persist, complex
  </Tab>

  <Tab title="Write-Around">
    Write directly to database, don't update cache.

    ```
    ┌──────────┐                 ┌───────────┐
    │  Client  │                 │   Cache   │
    └────┬─────┘                 └───────────┘
         │
         │ Write directly
         ▼
    ┌──────────┐
    │ Database │
    └──────────┘
    ```

    ```python theme={null}
    async def update_user(user_id: str, data: dict):
        # Write to database only
        await db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
        
        # Either invalidate cache (let it refetch)
        await cache.delete(f"user:{user_id}")
        
        # Or let TTL expire naturally
        # (depends on staleness tolerance)
    ```

    **Pros**: Simple, good for write-heavy infrequently-read data
    **Cons**: First read after write is slow
  </Tab>
</Tabs>

***

## Cache Invalidation

<Warning>
  **"There are only two hard things in Computer Science: cache invalidation and naming things."** — Phil Karlton
</Warning>

### Invalidation Strategies

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE INVALIDATION APPROACHES                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. TTL (Time To Live)                                                      │
│  ─────────────────────                                                      │
│  Set expiration time on each entry                                          │
│  Simple but may serve stale data until expiry                               │
│                                                                              │
│  cache.set("user:123", data, ttl=3600)  # Expires in 1 hour                │
│                                                                              │
│  2. EXPLICIT INVALIDATION                                                   │
│  ────────────────────────                                                   │
│  Delete from cache when data changes                                        │
│  Must track all places that cache data                                      │
│                                                                              │
│  await db.update_user(user_id, data)                                       │
│  await cache.delete(f"user:{user_id}")                                     │
│                                                                              │
│  3. EVENT-DRIVEN INVALIDATION                                               │
│  ────────────────────────────                                               │
│  Publish events when data changes                                           │
│  Cache subscribers invalidate on events                                     │
│                                                                              │
│  Database → Event Bus → Cache Invalidator                                   │
│     (CDC)     (Kafka)     (Consumer)                                        │
│                                                                              │
│  4. VERSION-BASED INVALIDATION                                              │
│  ─────────────────────────────                                              │
│  Include version in cache key                                               │
│  Bump version to invalidate all entries                                     │
│                                                                              │
│  cache.get(f"user:{user_id}:v{version}")                                   │
│  # Change version to invalidate all user caches                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### The Delete vs Update Debate

```python theme={null}
# APPROACH 1: Delete on write (RECOMMENDED)
async def update_user(user_id: str, data: dict):
    await db.update_user(user_id, data)
    await cache.delete(f"user:{user_id}")  # Let next read populate
    
# APPROACH 2: Update on write
async def update_user(user_id: str, data: dict):
    updated_user = await db.update_user(user_id, data)
    await cache.set(f"user:{user_id}", updated_user, ttl=3600)
```

**Why Delete is Usually Better** (this is one of the most important cache design decisions you will make):

<Tip>
  **The rule of thumb a senior engineer follows**: Invalidate (delete) on write, populate on read. This is simpler, safer against race conditions, and wastes less cache space on data that might never be read again. Only use "update on write" if you have measured that the cache-miss penalty after deletion is unacceptable AND you can guarantee ordering (e.g., via compare-and-swap).
</Tip>

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DELETE vs UPDATE                                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  RACE CONDITION with UPDATE:                                                │
│  ────────────────────────────                                               │
│                                                                              │
│  Thread 1                              Thread 2                             │
│  ────────                              ────────                             │
│  1. Read user (version 1)                                                   │
│                                        2. Read user (version 1)             │
│  3. Write user (version 2)                                                  │
│                                        4. Write user (version 3)            │
│  5. Update cache (version 2)                                                │
│                                        6. Update cache (version 3)          │
│                                                                              │
│  If steps complete in order: 1,2,3,5,4,6 → Cache has version 3 (correct)   │
│  If steps complete as: 1,2,3,4,5,6 → Cache has version 2 (WRONG!)          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════    │
│                                                                              │
│  DELETE is SAFER:                                                           │
│  ────────────────                                                           │
│                                                                              │
│  Thread 1: Write version 2, DELETE cache                                   │
│  Thread 2: Write version 3, DELETE cache                                   │
│                                                                              │
│  Cache is empty, next read gets version 3 (correct!)                       │
│                                                                              │
│  EXCEPTION: Update is OK if you use compare-and-swap                       │
│  (check version before updating cache)                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Cache Stampede (Thundering Herd)

When a popular cache key expires, many requests simultaneously hit the database.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE STAMPEDE                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Timeline:                                                                  │
│                                                                              │
│  t=0     Key "popular" cached (TTL 60s)                                    │
│  t=59    1000 requests/second hitting cache ✓                              │
│  t=60    Cache expires! All 1000 requests now hit DB simultaneously!       │
│          DB overwhelmed → slow response → more requests pile up            │
│                                                                              │
│  Requests:  ──────────────────────────█████████████████████████            │
│  Cache:     ████████████████████████──────────────────────────             │
│  Database:  ───────────────────────███████████████████████████             │
│                                     ↑                                       │
│                                     Stampede!                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Stampede Prevention Strategies

<Tabs>
  <Tab title="Locking (Single Flight)">
    Only one request fetches, others wait.

    ```python theme={null}
    import asyncio
    from typing import Dict

    class SingleFlight:
        """
        Deduplicates concurrent requests for the same key.
        Only one fetch happens, all waiters get the result.
        """
        
        def __init__(self):
            self._in_flight: Dict[str, asyncio.Future] = {}
        
        async def do(self, key: str, fetch_func) -> any:
            if key in self._in_flight:
                # Wait for in-flight request
                return await self._in_flight[key]
            
            # We're the first, create a future
            future = asyncio.Future()
            self._in_flight[key] = future
            
            try:
                result = await fetch_func()
                future.set_result(result)
                return result
            except Exception as e:
                future.set_exception(e)
                raise
            finally:
                del self._in_flight[key]

    # Usage
    single_flight = SingleFlight()

    async def get_user(user_id: str):
        cached = await cache.get(f"user:{user_id}")
        if cached:
            return cached
        
        # All concurrent requests for same user will share one DB query
        user = await single_flight.do(
            f"user:{user_id}",
            lambda: db.get_user(user_id)
        )
        
        await cache.set(f"user:{user_id}", user, ttl=3600)
        return user
    ```

    **Pros**: Prevents duplicate DB queries
    **Cons**: All waiters blocked on one request
  </Tab>

  <Tab title="Probabilistic Early Expiry">
    Refresh before expiry with some probability.

    ```python theme={null}
    import random
    import math

    async def get_with_probabilistic_refresh(
        key: str,
        fetch_func,
        ttl: int = 3600,
        beta: float = 1.0  # Controls refresh aggressiveness
    ):
        cached, remaining_ttl = await cache.get_with_ttl(key)
        
        if cached is None:
            # Cache miss
            value = await fetch_func()
            await cache.set(key, value, ttl=ttl)
            return value
        
        # Calculate probability of refresh
        # Higher probability as TTL approaches 0
        expiry_gap = remaining_ttl / ttl
        random_factor = random.random()
        should_refresh = random_factor > math.exp(-beta * math.log(1 / expiry_gap))
        
        if should_refresh:
            # Refresh in background
            asyncio.create_task(_refresh(key, fetch_func, ttl))
        
        return cached

    async def _refresh(key: str, fetch_func, ttl: int):
        value = await fetch_func()
        await cache.set(key, value, ttl=ttl)
    ```

    **Pros**: Spreads refreshes over time, no locks
    **Cons**: May still have some concurrent refreshes
  </Tab>

  <Tab title="External Lock (Redis Lock)">
    Use distributed lock for refresh.

    ```python theme={null}
    async def get_with_lock(key: str, fetch_func, ttl: int = 3600):
        cached = await cache.get(key)
        if cached:
            return cached
        
        lock_key = f"lock:{key}"
        
        # Try to acquire lock
        acquired = await cache.set_nx(lock_key, "1", ttl=10)  # 10s lock timeout
        
        if acquired:
            try:
                # We have the lock, fetch and cache
                value = await fetch_func()
                await cache.set(key, value, ttl=ttl)
                return value
            finally:
                await cache.delete(lock_key)
        else:
            # Another process is fetching, wait and retry
            for _ in range(10):  # Max 10 retries
                await asyncio.sleep(0.1)  # 100ms
                cached = await cache.get(key)
                if cached:
                    return cached
            
            # Give up waiting, fetch ourselves
            return await fetch_func()
    ```

    **Pros**: Only one fetch, works across processes
    **Cons**: Lock overhead, potential deadlocks if not careful
  </Tab>

  <Tab title="Stale-While-Revalidate">
    Serve stale data while refreshing in background.

    ```python theme={null}
    async def get_stale_while_revalidate(
        key: str,
        fetch_func,
        ttl: int = 3600,
        stale_ttl: int = 7200  # Serve stale for 2 hours
    ):
        cached, remaining_ttl = await cache.get_with_ttl(key)
        
        if cached is None:
            # Complete miss
            value = await fetch_func()
            await cache.set(key, value, ttl=stale_ttl)
            await cache.set(f"{key}:fresh_until", time.time() + ttl, ttl=stale_ttl)
            return value
        
        fresh_until = await cache.get(f"{key}:fresh_until") or 0
        
        if time.time() > fresh_until:
            # Data is stale, refresh in background
            asyncio.create_task(_refresh(key, fetch_func, ttl, stale_ttl))
        
        # Return cached data (possibly stale)
        return cached

    async def _refresh(key: str, fetch_func, ttl: int, stale_ttl: int):
        value = await fetch_func()
        await cache.set(key, value, ttl=stale_ttl)
        await cache.set(f"{key}:fresh_until", time.time() + ttl, ttl=stale_ttl)
    ```

    **Pros**: Always fast, never blocks on refresh
    **Cons**: May serve stale data, more complex
  </Tab>
</Tabs>

***

## Hot Key Problem

When one key gets disproportionate traffic.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    HOT KEY PROBLEM                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SCENARIO:                                                                  │
│  Celebrity tweet goes viral, "tweet:12345" gets 100K QPS                   │
│  All requests go to ONE Redis node (key-based sharding)                    │
│  That node becomes bottleneck, even if others are idle                     │
│                                                                              │
│  ┌─────────────┐                                                            │
│  │ Redis Node 1│ ← 100K QPS (overwhelmed!)                                 │
│  │ tweet:12345 │                                                            │
│  └─────────────┘                                                            │
│  ┌─────────────┐                                                            │
│  │ Redis Node 2│ ← 1K QPS (idle)                                           │
│  │ other keys  │                                                            │
│  └─────────────┘                                                            │
│  ┌─────────────┐                                                            │
│  │ Redis Node 3│ ← 1K QPS (idle)                                           │
│  │ other keys  │                                                            │
│  └─────────────┘                                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Hot Key Solutions

<Tabs>
  <Tab title="Local Cache (L1)">
    Add in-process cache before Redis.

    ```python theme={null}
    from functools import lru_cache
    import time

    class TwoLevelCache:
        """
        L1: In-process (fast, small, per-instance)
        L2: Distributed (slower, large, shared)
        """
        
        def __init__(self, l2_client, l1_max_size=1000, l1_ttl=10):
            self.l2 = l2_client
            self.l1_ttl = l1_ttl
            self.l1 = {}  # {key: (value, expire_at)}
        
        async def get(self, key: str):
            # Check L1 first
            if key in self.l1:
                value, expire_at = self.l1[key]
                if time.time() < expire_at:
                    return value
                del self.l1[key]
            
            # L1 miss, check L2
            value = await self.l2.get(key)
            
            if value is not None:
                # Store in L1 for next time
                self.l1[key] = (value, time.time() + self.l1_ttl)
            
            return value

    # Hot keys served from memory, never hit Redis!
    ```

    **Pros**: Extremely fast, reduces L2 load
    **Cons**: Inconsistency between instances, memory usage
  </Tab>

  <Tab title="Key Replication">
    Spread hot key across multiple shards.

    ```python theme={null}
    import random

    class ReplicatedKeyCache:
        """
        Replicate hot keys across multiple logical slots.
        Reads pick a random replica, writes update all.
        """
        
        def __init__(self, cache, replicas=10):
            self.cache = cache
            self.replicas = replicas
            self.hot_keys = set()  # Track known hot keys
        
        def _get_replica_key(self, key: str, replica: int = None) -> str:
            if replica is None:
                replica = random.randint(0, self.replicas - 1)
            return f"{key}:replica:{replica}"
        
        async def get(self, key: str):
            if key in self.hot_keys:
                # Pick random replica
                replica_key = self._get_replica_key(key)
                return await self.cache.get(replica_key)
            return await self.cache.get(key)
        
        async def set_hot(self, key: str, value: any, ttl: int = 3600):
            """Set a known hot key across all replicas"""
            self.hot_keys.add(key)
            
            # Write to all replicas
            tasks = [
                self.cache.set(self._get_replica_key(key, i), value, ttl=ttl)
                for i in range(self.replicas)
            ]
            await asyncio.gather(*tasks)
        
        async def delete_hot(self, key: str):
            """Delete hot key from all replicas"""
            self.hot_keys.discard(key)
            
            tasks = [
                self.cache.delete(self._get_replica_key(key, i))
                for i in range(self.replicas)
            ]
            await asyncio.gather(*tasks)
    ```

    **Pros**: Spreads load across cluster
    **Cons**: More storage, complex invalidation
  </Tab>

  <Tab title="Request Coalescing">
    Batch concurrent requests for same key.

    ```python theme={null}
    from collections import defaultdict
    import asyncio

    class RequestCoalescer:
        """
        Coalesce concurrent requests for the same key.
        Instead of N cache hits, do 1 cache hit and broadcast result.
        """
        
        def __init__(self, cache, batch_window_ms=5):
            self.cache = cache
            self.batch_window = batch_window_ms / 1000
            self.pending = defaultdict(list)  # key -> [(event, result_holder)]
            self._lock = asyncio.Lock()
        
        async def get(self, key: str):
            async with self._lock:
                if key in self.pending:
                    # Join existing batch
                    event = asyncio.Event()
                    result_holder = {'value': None, 'error': None}
                    self.pending[key].append((event, result_holder))
                    
                    # Wait outside lock
                    async with self._lock:
                        pass
                    await event.wait()
                    
                    if result_holder['error']:
                        raise result_holder['error']
                    return result_holder['value']
                
                # Start a new batch
                self.pending[key] = []
            
            # Wait for more requests to join
            await asyncio.sleep(self.batch_window)
            
            # Get waiters and clear
            async with self._lock:
                waiters = self.pending.pop(key, [])
            
            # Fetch once
            try:
                value = await self.cache.get(key)
                
                # Notify all waiters
                for event, result_holder in waiters:
                    result_holder['value'] = value
                    event.set()
                
                return value
            
            except Exception as e:
                for event, result_holder in waiters:
                    result_holder['error'] = e
                    event.set()
                raise
    ```

    **Pros**: Reduces cache hits by order of magnitude
    **Cons**: Adds small latency (batch window), complex
  </Tab>
</Tabs>

***

## Advanced: Sidecar Caching & The "Mesh" Pattern

At **Staff/Principal level**, you must move beyond the "application-manages-cache" (Cache-Aside) model. In a microservices architecture, managing cache logic in every service leads to duplication and inconsistency.

### 1. Sidecar Caching (e.g., Pelikan, Mcrouter)

Instead of the application connecting directly to Redis, it connects to a **Local Sidecar Proxy**.

* **Benefits**:
  * **Language Agnostic**: The sidecar handles complex logic (consistent hashing, retries, failover).
  * **Connection Pooling**: Reduces the number of open connections to the central Redis cluster.
  * **Request Coalescing**: The sidecar can merge multiple identical requests from the local app into one upstream request.

### 2. EVCache (The Netflix Pattern)

EVCache is a distributed, sharded, replicated in-memory caching system based on Memcached.

* **Sidecar-based**: Every EC2 instance has an EVCache sidecar.
* **Global Replication**: Writes are replicated asynchronously to other AWS regions, allowing for **Local Reads** globally.

### 3. Layered Cache Hierarchy (EVCache vs. Redis)

| Tier   | Name               | Latency | Scope                         |
| :----- | :----------------- | :------ | :---------------------------- |
| **L1** | **Local Memory**   | \< 1μs  | Per-Process (In-memory)       |
| **L2** | **Sidecar Cache**  | \< 1ms  | Per-Node (Unix Domain Socket) |
| **L3** | **Regional Cache** | 1-5ms   | Per-Region (Shared Redis)     |
| **L4** | **Global Cache**   | 50ms+   | Cross-Region (Replicated)     |

```text theme={null}
[ App ] --- (Unix Socket) ---> [ Sidecar Proxy ] --- (TCP) ---> [ Redis Cluster ]
                                      │
                                      └─ (Async Replication) ─► [ Remote Region ]
```

**Staff Tip:** Sidecar caching is the only way to scale caching in a **Polyglot Microservices** environment. It decouples the "How to cache" (retries, hashing) from the "What to cache" (business logic).

***

***

## Distributed Cache Architectures

### Redis Cluster

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    REDIS CLUSTER ARCHITECTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  16,384 HASH SLOTS distributed across nodes:                                │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        Hash Slots                                   │    │
│  │  0      5460    5461    10922   10923    16383                     │    │
│  │  ├───────┼───────┼────────┼───────┼────────┤                       │    │
│  │  │ Node A│       │ Node B │       │ Node C │                       │    │
│  │  └───────┴───────┴────────┴───────┴────────┘                       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  REPLICATION FOR HIGH AVAILABILITY:                                         │
│                                                                              │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                       │
│  │   Node A    │   │   Node B    │   │   Node C    │                       │
│  │   (Master)  │   │   (Master)  │   │   (Master)  │                       │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘                       │
│         │                 │                 │                               │
│         ▼                 ▼                 ▼                               │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                       │
│  │   Node A'   │   │   Node B'   │   │   Node C'   │                       │
│  │   (Replica) │   │   (Replica) │   │   (Replica) │                       │
│  └─────────────┘   └─────────────┘   └─────────────┘                       │
│                                                                              │
│  KEY ROUTING:                                                               │
│  ────────────                                                               │
│  slot = CRC16(key) % 16384                                                  │
│  Route to node owning that slot                                             │
│                                                                              │
│  HASH TAGS (force keys to same slot):                                       │
│  {user:123}:profile  ─┐                                                     │
│  {user:123}:settings ─┼─ Same slot (hash computed on "user:123")           │
│  {user:123}:orders   ─┘                                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Memcached vs Redis

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    MEMCACHED vs REDIS COMPARISON                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  FEATURE              MEMCACHED           REDIS                             │
│  ───────              ─────────           ─────                             │
│  Data structures      Strings only        Strings, Lists, Sets,            │
│                                           Hashes, Sorted Sets,              │
│                                           Streams, HyperLogLog              │
│                                                                              │
│  Persistence          None                RDB snapshots, AOF                │
│                       (pure cache)        (can be persistent)               │
│                                                                              │
│  Replication          None built-in       Master-replica                    │
│                                           (async by default)                │
│                                                                              │
│  Clustering           Client-side         Redis Cluster                     │
│                       sharding            (automatic sharding)              │
│                                                                              │
│  Memory efficiency    Better for          More overhead per key            │
│                       small values                                          │
│                                                                              │
│  Threading            Multi-threaded      Single-threaded (mostly)          │
│                       (better multi-core) (simpler, no locks)               │
│                                                                              │
│  Eviction             LRU, random         LRU, LFU, volatile-lru,          │
│                                           volatile-ttl, etc.                │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════    │
│                                                                              │
│  USE MEMCACHED WHEN:                                                        │
│  • Pure caching (ephemeral data)                                            │
│  • Simple key-value with large fan-out                                      │
│  • Need multi-threaded performance                                          │
│                                                                              │
│  USE REDIS WHEN:                                                            │
│  • Need complex data structures                                             │
│  • Pub/sub, streams, or queues                                              │
│  • Need persistence or replication                                          │
│  • Lua scripting for atomic operations                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Cache Consistency Patterns

### Eventual Consistency

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE CONSISTENCY APPROACHES                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. CACHE-ASIDE (Common, Eventually Consistent)                             │
│     ───────────────────────────────────────────                             │
│     Write: Update DB, delete cache                                          │
│     Read:  Check cache, miss → read DB → populate cache                     │
│                                                                              │
│     Consistency gap: Between DB write and cache delete                      │
│     Another request might read stale data                                   │
│                                                                              │
│  2. READ-YOUR-WRITES CONSISTENCY                                            │
│     ────────────────────────────────                                        │
│     After writing, user always sees their own write                         │
│                                                                              │
│     Implementation: Session-local cache or sticky sessions                  │
│                                                                              │
│  3. DOUBLE-DELETE PATTERN (also called "delayed invalidation")               │
│     ──────────────────────                                                  │
│     1. Delete cache                                                         │
│     2. Update database                                                      │
│     3. Wait (e.g., 500ms -- long enough for any in-flight reads to finish) │
│     4. Delete cache again                                                   │
│                                                                              │
│     Why? Between steps 1 and 2, a concurrent reader may fetch the old       │
│     value from the DB and re-populate the cache. The second delete clears  │
│     this stale entry. The wait duration should exceed your DB replication   │
│     lag plus typical read latency.                                          │
│                                                                              │
│  4. CDC (Change Data Capture)                                               │
│     ──────────────────────────                                              │
│     Database → Debezium → Kafka → Cache Invalidator                        │
│                                                                              │
│     Reliable, ordered, but adds latency                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Double-Delete Implementation

```python theme={null}
async def update_user_with_double_delete(user_id: str, data: dict):
    """
    Double-delete pattern for better cache consistency.
    
    Timeline:
    1. Delete cache (invalidate current stale entry)
    2. Update database
    3. Wait for replication lag + processing time
    4. Delete cache again (catch any reads that slipped through)
    """
    cache_key = f"user:{user_id}"
    
    # First delete
    await cache.delete(cache_key)
    
    # Update database
    await db.update_user(user_id, data)
    
    # Schedule second delete
    asyncio.create_task(_delayed_delete(cache_key, delay=0.5))

async def _delayed_delete(key: str, delay: float):
    await asyncio.sleep(delay)
    await cache.delete(key)
```

***

## Advanced Design Scenarios

### Scenario 1: Multi-Level Caching for Global APIs

Your company exposes a global read-heavy API (product catalog, content, or configuration) with users across multiple regions. You want sub-100ms p99 latencies while still keeping a single source-of-truth database.

```
┌─────────────────────────────────────────────────────────────────────┐
│                    MULTI-LEVEL CACHE HIERARCHY                      │
├─────────────────────────────────────────────────────────────────────┤
│  Client (Browser/Mobile)                                           │
│    │                                                               │
│    ▼                                                               │
│  CDN / Edge Cache (Level 0)   ← cache static + some JSON          │
│    │                                                               │
│    ▼                                                               │
│  API Gateway / Reverse Proxy (Level 1 - regional)                  │
│    │  In-memory + Redis/Envoy cache                                │
│    ▼                                                               │
│  App Service (Level 2 - per-instance L1)                           │
│    │                                                               │
│    ▼                                                               │
│  Primary Database (source of truth)                                │
└─────────────────────────────────────────────────────────────────────┘
```

**Core ideas**:

* **Cache as many layers as possible**:
  * **Edge (CDN)**: cache public, cacheable responses with long TTLs and `ETag`/`Last-Modified` for validation.
  * **Regional gateway (L1.5)**: shared Redis or Envoy cache per region for JSON APIs.
  * **App instances (L2)**: small in-process L1 caches for the hottest keys.
* **Key design**:
  * Include **versioning** in keys for global invalidation: `product:{id}:v{schema_version}`.
  * For per-tenant data: `tenant:{tenant_id}:product:{id}:v{version}`.
* **Invalidation**:
  * For rarely-changing data, use **long TTLs + explicit purge** when changes happen.
  * On update, clear from inner layers first (DB → regional Redis → CDN purge API).

**Failure modes & strategies**:

* **CDN down or misbehaving**: have monitoring that can bypass CDN and hit regional gateway directly.
* **Regional cache cluster degraded**: instances fall back to L1 in-process cache + direct DB reads with **tighter rate limits**.
* **Schema changes**: bump `schema_version` in cache keys to avoid mixed old/new value formats.

***

### Scenario 2: Configuration and Feature Flag Caching

You have a central configuration/feature flag service that many microservices depend on. Latency spikes or downtime in this service must not take down the entire fleet.

**Requirements**:

* Services should **start** even if the config service is temporarily unavailable.
* Each service instance should **cache configuration locally** and refresh in the background.
* Changes should propagate within **seconds**, not minutes.

**Architecture**:

* **Source of truth**: configuration stored in a durable DB (e.g., Postgres) behind a config service.
* **Distributed cache**: Redis/Etcd used by the config service to cache hot config blobs (`config:service_name`).
* **Service-local cache**: each microservice maintains an in-process cache of parsed config.
* **Notification channel**: a message bus (Kafka, SNS/SQS, Redis Pub/Sub) carries "config changed" events.

```python theme={null}
# In the config service
async def update_feature_flags(payload: dict):
    await db.update_flags(payload)
    await cache.set("config:features", json.dumps(payload), ttl=300)
    await event_bus.publish("config.updated", {"keys": ["features"]})

# In each service
async def config_watcher():
    async for event in event_bus.subscribe("config.updated"):
        for key in event["keys"]:
            await refresh_config_key(key)

async def refresh_config_key(key: str):
    raw = await cache.get(f"config:{key}")
    if raw is None:
        raw = await config_http_client.get(f"/config/{key}")
    LOCAL_CONFIG[key] = json.loads(raw)
```

**Design notes**:

* On startup, services **block briefly** to fetch essential config; thereafter they serve from local memory.
* If the config service and cache are both down, services can keep using the **last known good** config in memory.
* For safety-sensitive flags, include **expiry timestamps** so services can disable risky behavior if config has not refreshed in too long.

***

### Scenario 3: Bounded-Staleness Caching on Top of a Strong Database

Sometimes you want **strictly serializable writes** but are happy to serve data up to (T) seconds old for reads. This is common for dashboards, search results, and secondary views.

**Goal**: "Reads may be stale by at most **T seconds**, never more. Writes must always see their own effect."

**Design**:

* Keep strong consistency in the primary DB (e.g., Spanner/CockroachDB/Postgres with serializable isolation).
* Layer a cache in front with **staleness metadata** per key.

```python theme={null}
@dataclass
class CachedValue:
    value: dict
    last_write_time: float  # Unix timestamp

MAX_STALENESS_SEC = 5.0

async def get_with_bounded_staleness(key: str) -> dict:
    now = time.time()
    wrapped: CachedValue | None = await cache.get(key)
    
    if wrapped and now - wrapped.last_write_time <= MAX_STALENESS_SEC:
        return wrapped.value
    
    # Too stale or missing: read from DB
    row = await db.fetch_row(key)
    wrapped = CachedValue(value=row, last_write_time=now)
    await cache.set(key, wrapped, ttl=60)
    return row
```

**Write path**:

* On write, **delete** cache (or write a fresh `CachedValue` with current `last_write_time`).
* If you have a write-heavy workload, combine this with **double-delete** or CDC-backed invalidation to avoid stale repopulation races.

**Trade-offs**:

* You can tighten or loosen `MAX_STALENESS_SEC` per endpoint: 1–2s for user-facing dashboards, 30–60s for admin analytics.
* Reads that cannot tolerate any staleness bypass cache entirely and always hit the DB.

***

## Interview Questions

<AccordionGroup>
  <Accordion title="Q: How would you design a caching layer for a social media feed?" icon="rss">
    **Key considerations**:

    1. **Cache structure**:
       * User's feed: `feed:{user_id}` → List of post IDs
       * Individual posts: `post:{post_id}` → Post data
       * Two-level: Feed IDs (small) + Post content (larger, reusable)

    2. **Invalidation strategy**:
       * New post: Fan-out to update all follower feeds
       * For large fan-out (celebrities): Don't fan-out, pull on read
       * Post edit/delete: Delete post cache, let feed IDs remain

    3. **Hot keys (celebrity posts)**:
       * Local L1 cache + remote L2
       * Replicate across shards
       * Consider CDN for very popular content

    4. **Consistency**:
       * Eventual is usually acceptable for feeds
       * Author should see own posts immediately (read-your-writes)
  </Accordion>

  <Accordion title="Q: How do you handle cache stampede for a popular product page?" icon="cart-shopping">
    **Solutions in order of sophistication**:

    1. **Single-flight**: Only one DB query, others wait
    2. **Lock with fallback**: If can't get lock, serve stale
    3. **Probabilistic refresh**: Spread expiry over time
    4. **Stale-while-revalidate**: Always serve stale, refresh async

    **Best approach**: Combine single-flight + stale-while-revalidate

    * Always fast response (stale is OK for products)
    * Background refresh prevents stampede
    * Single-flight prevents duplicate refreshes
  </Accordion>

  <Accordion title="Q: When would you use cache-aside vs write-through?" icon="pen">
    **Cache-Aside (lazy loading)**:

    * Read-heavy workloads
    * When not all data needs to be cached
    * When cache failure shouldn't break writes
    * Most common for web applications

    **Write-Through**:

    * When cache consistency is critical
    * Write-heavy workloads where you need fast subsequent reads
    * When you can tolerate higher write latency

    **Write-Behind**:

    * Write-heavy, read-light workloads
    * When you can batch database writes
    * When you can tolerate potential data loss
    * Gaming leaderboards, view counts, etc.
  </Accordion>
</AccordionGroup>

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Cache Strategically" icon="bullseye">
    Not everything needs caching. Cache hot paths where latency matters most.
  </Card>

  <Card title="Plan for Invalidation" icon="trash">
    Design invalidation strategy upfront. Delete is usually safer than update.
  </Card>

  <Card title="Handle the Stampede" icon="users">
    Popular keys + expiration = danger. Use locking, probabilities, or stale serving.
  </Card>

  <Card title="Accept Eventual Consistency" icon="clock-rotate-left">
    For most use cases, eventual is fine. Build UX around it.
  </Card>
</CardGroup>