# 20. Caching Strategies

> Master distributed caching patterns, cache invalidation, and Redis implementations for high-performance microservices

Caching is critical for microservices performance, but it drags in cache invalidation, one of the two hardest problems in computer science (the others being naming things and off-by-one errors). In a monolith, cache invalidation is already tricky. In microservices, Service A caches data that Service B can modify -- and now you are dealing with distributed cache invalidation, which is considerably harder because no single process sees every write. This chapter covers not just how to cache, but how to cache safely, how to fail gracefully when your cache goes down, and how to prevent the thundering herd problem that has caused more outages than most people realize.

<Info>
  **Learning Objectives:**

  * Implement various caching patterns
  * Design cache invalidation strategies
  * Use Redis for distributed caching
  * Handle cache failures gracefully
  * Optimize cache hit rates
</Info>

***

## Why Caching Matters

Before we dive into patterns, let us be precise about *why* caching is worth the complexity. Every cache you add to a system is a promise you are making: "I can serve this data faster than the source of truth, and I will accept some risk of staleness in exchange." That trade-off is nearly always worth it for read-heavy workloads because databases are slow compared to in-memory stores (milliseconds vs microseconds) and expensive to scale horizontally. But if you do not understand the trade-off, you will cache things that should not be cached (live pricing, security tokens, personalized feeds) and end up with bugs that are nearly impossible to reproduce.

In microservices specifically, caching takes on a new dimension: it reduces coupling. If Service A caches a user's profile from Service B for five minutes, Service A can continue functioning even if Service B is briefly unavailable. The cache acts as a buffer of eventual consistency that lets services tolerate each other's failures. This is why caching is not just an optimization -- it is a resilience pattern.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHING IMPACT                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  WITHOUT CACHING:                                                            │
│  ───────────────────                                                        │
│                                                                              │
│  Client ──▶ API Gateway ──▶ Order Service ──▶ Database                      │
│                                    │              │                          │
│                                    │              │ 50ms                     │
│                                    ├──▶ User Service ──▶ Database            │
│                                    │                        │ 30ms           │
│                                    └──▶ Product Service ──▶ Database         │
│                                                                │ 40ms        │
│                                                                              │
│  Total Latency: ~120ms (serial) or ~50ms (parallel)                         │
│  Database Load: Every request hits DB                                       │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════════│
│                                                                              │
│  WITH CACHING:                                                               │
│  ─────────────                                                              │
│                                                                              │
│  Client ──▶ API Gateway ──▶ Order Service                                   │
│                                    │                                         │
│                                    ├──▶ Redis Cache ──▶ (hit) Return         │
│                                    │        │ 1ms                            │
│                                    │        └──▶ (miss) ──▶ Database         │
│                                    │                           │ 50ms        │
│                                                                              │
│  Cache Hit Latency: ~5ms                                                    │
│  Cache Miss Latency: ~55ms                                                  │
│  Database Load: Only cache misses hit DB                                    │
│                                                                              │
│  With 90% cache hit rate: Average latency = 0.9×5 + 0.1×55 = 10ms           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```
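
The averaging in the diagram generalizes to any hit rate. The sketch below is illustrative only: the function names are hypothetical, and the 5 ms and 55 ms figures are the rough numbers from the diagram, not measurements.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency as a hit-rate-weighted mix of hit and miss latency."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def db_load_multiplier_on_outage(hit_rate: float) -> float:
    """How much database traffic grows if the cache disappears entirely."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    # Only misses reach the database, so normal load is (1 - hit_rate) of traffic.
    return 1.0 / (1.0 - hit_rate)


# Figures from the diagram: ~5 ms on a hit, ~55 ms on a miss.
for rate in (0.5, 0.9, 0.99):
    latency = expected_latency_ms(rate, 5, 55)
    multiplier = db_load_multiplier_on_outage(rate)
    print(f"hit rate {rate:.0%}: {latency:.1f} ms average, "
          f"~{multiplier:.0f}x DB load if the cache is lost")
```

The 90% case reproduces the 10 ms average from the diagram. The second function is the same arithmetic seen from the database's side: the higher the hit rate, the larger the jump in database traffic when the cache is unavailable.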

***

## Caching Patterns

<Warning>
  **Caveats and Common Pitfalls with Caching Patterns**

  Caching bugs are among the hardest to reproduce, because they depend on timing, traffic shape, and order of events. The patterns below cover most of what breaks:

  * **Cache stampede (thundering herd) on popular keys.** A popular key (home page, featured product, celebrity user profile) has its TTL expire. In the next 50 milliseconds, 5,000 concurrent requests all miss, all query the database, and the database melts. Naive cache-aside does not prevent this; the default behavior actually causes it. Teams discover this only during traffic spikes, when the cache they added "for performance" becomes the reason production is down.
  * **Cache poisoning.** A malformed request or a bug causes a wrong value to be cached under a legitimate key. Every subsequent request returns the poisoned value until the TTL expires. If the TTL is an hour, you serve an hour of wrong data to every user. If the bug writes under many keys, you poison the whole cache. Cache writes must be as validated as database writes.
  * **Personalization bleed / wrong-user data.** Service caches a response under a key that is not user-scoped ("home\_page\_blocks"), but the response was actually personalized for user 1234. Next user gets user 1234's data. This leaks PII and, depending on the data, may be a security incident. It happens constantly in HTTP caches and CDNs where cache keys do not include the auth context.
  * **Ignoring the cost of a cache miss.** Teams size the database for the cached traffic rate and forget that a cache outage takes the traffic back to the uncached rate -- often 10-50x higher. When Redis fails over, the database is immediately overloaded and crashes, cascading into full platform downtime.
</Warning>
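
The personalization-bleed pitfall above usually comes down to how the key is built. One way to make omissions harder is to build every key through a single helper that takes the axes of variation explicitly. This is a minimal sketch: `build_cache_key` and the `name=value` layout are assumptions for illustration, not a convention from any particular library.

```python
from typing import Mapping
from urllib.parse import quote


def build_cache_key(namespace: str, dimensions: Mapping[str, object]) -> str:
    """Build a deterministic key that spells out every axis of variation."""
    parts = [namespace]
    for name in sorted(dimensions):  # sorted so argument order never changes the key
        value = dimensions[name]
        if value is None:
            # A missing user_id or tenant must fail loudly instead of silently
            # producing a key that is shared across users.
            raise ValueError(f"cache key dimension {name!r} is missing")
        # Percent-encode values so a ':' inside a value cannot collide with
        # the separator used between parts.
        parts.append(f"{name}={quote(str(value), safe='')}")
    return ":".join(parts)


# Personalized data: the user and locale are part of the key.
personal = build_cache_key("home_page_blocks", {"user_id": 1234, "locale": "en-US"})

# Genuinely shared data: no dimensions, and that is visible at the call site.
shared = build_cache_key("products:featured", {})
```

Routing key construction through one function also gives reviewers a single place to ask "what else does this response vary on?".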

<Tip>
  **Solutions and Patterns for Safe Caching**

  * **Single-flight / request coalescing for stampede.** When multiple concurrent requests miss for the same key, only one of them queries the source. The rest wait on a shared promise or channel and get the same result. Libraries: Go's `singleflight`, Node's `promise-memoize`, Java's Caffeine with `AsyncLoadingCache`. This is the single most important stampede defense.
  * **Probabilistic early expiration (XFetch algorithm).** Instead of refreshing exactly at TTL, start refreshing probabilistically as TTL approaches. A small fraction of requests renew the cache "early," so by the time real expiration hits, the value is already fresh. Prevents the cliff-edge stampede entirely.
  * **Validate everything you cache.** Schema-check the response before caching it. A 500 error with a cache-control header is a classic poisoning vector -- you must not cache error responses unless you are explicitly caching negative results with a short TTL.
  * **Cache key discipline.** Every cache key must include every axis of variation: user\_id for personalized data, locale, tenant, feature flag state, auth scope. When in doubt, add it. A good heuristic: write a one-sentence description of exactly what this cache entry contains; every noun in the sentence is part of the key.
  * **Always design for the cache-off scenario.** What happens when the cache is 0 percent healthy? If the answer is "the database catches fire," you do not have a cache, you have a mandatory prefetch layer with no failover. Load-test with the cache disabled. Build local memory fallbacks and circuit breakers to shed load before the database dies.
</Tip>
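
The single-flight idea from the list above can be sketched in a few lines of `asyncio`. This is an in-process illustration only: `SingleFlight` and `demo` are hypothetical names, and coalescing across multiple service instances would need something extra (for example a short-lived distributed lock), which is not shown here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """Coalesce concurrent loads of the same key into one call (per process)."""

    def __init__(self) -> None:
        self._in_flight: Dict[str, asyncio.Future] = {}

    async def do(self, key: str, loader: Callable[[], Awaitable[Any]]) -> Any:
        existing = self._in_flight.get(key)
        if existing is not None:
            # Another caller is already loading this key. shield() keeps a
            # cancelled waiter from cancelling the shared future.
            return await asyncio.shield(existing)

        future = asyncio.get_running_loop().create_future()
        self._in_flight[key] = future
        try:
            result = await loader()
            future.set_result(result)
            return result
        except Exception as exc:
            future.set_exception(exc)
            future.exception()  # mark as retrieved so asyncio does not warn if nobody waited
            raise
        finally:
            self._in_flight.pop(key, None)
            if not future.done():
                future.cancel()  # the leader was cancelled; release any waiters


async def demo() -> tuple:
    calls = 0

    async def load_product() -> dict:
        nonlocal calls
        calls += 1                 # counts trips to the "database"
        await asyncio.sleep(0.01)  # simulated query latency
        return {"id": "42", "name": "demo"}

    flights = SingleFlight()
    results = await asyncio.gather(
        *(flights.do("product:42", load_product) for _ in range(50))
    )
    return calls, results
```

Running `asyncio.run(demo())` issues 50 concurrent requests for the same key, and the loader runs once. In a cache-aside read path, the `loader` would be the "query the database and populate the cache" step that runs after a miss.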

The biggest mistake teams make with caching is treating "add a cache" as a single decision. It is not -- there are at least five distinct patterns, each with different consistency, performance, and failure-mode characteristics. Picking the wrong one creates bugs that do not show up in testing but manifest as weird production behavior: "why did this user see stale data for 10 minutes?" or "why did Redis crashing lose us \$50,000 in orders?" Think of these patterns as answers to two questions: (1) when do reads populate the cache, and (2) when do writes update the cache?

The patterns below trade safety, freshness, and write throughput in different ways. Cache-aside is the safest and the default for most teams. Write-through guarantees read-your-writes consistency at the cost of write latency. Write-behind is dangerous but fast. Read-through and refresh-ahead are niche patterns you reach for when the first three are not enough.

### Pattern Comparison

| Pattern           | How It Works                                                | Consistency                       | Performance                         | Risk                                     | Best For                                                 |
| ----------------- | ----------------------------------------------------------- | --------------------------------- | ----------------------------------- | ---------------------------------------- | -------------------------------------------------------- |
| **Cache-Aside**   | App checks cache, falls back to DB, writes to cache on miss | Stale reads possible (TTL window) | First request slow (cache miss)     | Low (cache failure = slower, not broken) | Most read-heavy workloads; safest default                |
| **Write-Through** | App writes to cache AND DB together                         | Read-your-writes on cached keys   | Writes slower (two writes)          | Medium (write failure needs handling)    | Data that is read immediately after write                |
| **Write-Behind**  | App writes to cache, async flush to DB                      | Weak (cache is ahead of DB)       | Fastest writes                      | High (cache crash = data loss)           | Analytics, counters, data you can afford to lose         |
| **Read-Through**  | Cache itself fetches from DB on miss                        | Same as cache-aside               | Same as cache-aside                 | Low                                      | When you want the cache layer to be transparent          |
| **Refresh-Ahead** | Cache proactively refreshes before TTL expires              | Bounded staleness (refresh gap)   | Best (no cache misses for hot keys) | Low                                      | Highly predictable access patterns (e.g., homepage data) |

**Decision framework:**

* **Default choice:** Cache-aside. It is the simplest, safest, and works for 80% of use cases.
* **User just updated their profile and sees stale data:** Switch to write-through for that specific entity.
* **Dashboard counters, view counts, analytics:** Write-behind is fine; losing a few counts is acceptable.
* **Homepage hero content that should almost never be served stale:** Refresh-ahead with a short TTL.
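
Refresh-ahead is the one pattern in the table without a dedicated listing in this part of the chapter. A lightweight way to approximate it is probabilistic early expiration (XFetch), where each read decides whether to refresh early based on how long the last recompute took. The sketch below uses an in-memory dict as a stand-in for Redis; `Entry`, `should_refresh_early`, and `xfetch_get` are hypothetical names, and this is a swapped-in approximation, not a full background refresh-ahead implementation.

```python
import math
import random
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    delta: float   # seconds the last recompute took
    expiry: float  # absolute time at which the entry expires


def should_refresh_early(
    entry: Entry,
    now: float,
    beta: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """XFetch test: refresh when now - delta * beta * ln(u) >= expiry, u in (0, 1]."""
    u = 1.0 - rng()  # random.random() is in [0, 1), so u is in (0, 1]
    return now - entry.delta * beta * math.log(u) >= entry.expiry


def xfetch_get(
    store: Dict[str, Entry],
    key: str,
    ttl: float,
    recompute: Callable[[], Any],
    beta: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    entry = store.get(key)
    if entry is None or should_refresh_early(entry, clock(), beta):
        started = clock()
        value = recompute()
        finished = clock()
        # Remember how long the recompute took: slow values refresh earlier.
        entry = Entry(value, delta=finished - started, expiry=finished + ttl)
        store[key] = entry
    return entry.value
```

Because `-ln(u)` is usually small and occasionally large, most reads far from expiry leave the entry alone, while reads close to expiry refresh with growing probability. Raising `beta` above 1 makes early refreshes more likely.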

### Cache-Aside (Lazy Loading)

Cache-aside is the most common pattern because it is the safest: if your cache goes down entirely, your application still works (it just hits the database for every request). The application is in full control of what gets cached and when. The downside is that the first request for any piece of data is always slow (cache miss), and there is a window between a database write and cache invalidation where stale data can be served.

Why this pattern dominates real-world microservices: it decouples the cache from your write path completely. The database is always the source of truth. If Redis is down, writes continue unchanged -- they just do not invalidate anything. If Redis is slow, your reads degrade gracefully back to database speed. Compare this to write-through, where a cache outage can block writes entirely. In a distributed system where failures are expected, cache-aside's "cache is an optimization, not a dependency" mindset is usually what you want.

A subtle gotcha: if your invalidation step fails (you updated the DB but `cache.del()` threw), you are now serving stale data until the TTL expires. Always set a TTL, even when you have invalidation -- TTL is your safety net against missed invalidations. Some teams also capture invalidation failures in a dead-letter queue and retry them asynchronously.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Most common pattern - application manages cache

    class ProductService {
      constructor(cache, repository) {
        this.cache = cache;  // Redis client
        this.repository = repository;  // Database
        this.ttl = 3600;  // 1 hour
      }

      async getProduct(productId) {
        const cacheKey = `product:${productId}`;
        
        // 1. Try to get from cache
        const cached = await this.cache.get(cacheKey);
        if (cached) {
          console.log(`Cache HIT for ${cacheKey}`);
          return JSON.parse(cached);
        }
        
        console.log(`Cache MISS for ${cacheKey}`);
        
        // 2. Get from database
        const product = await this.repository.findById(productId);
        if (!product) return null;
        
        // 3. Store in cache
        await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(product));
        
        return product;
      }

      async updateProduct(productId, updates) {
        // Update database
        const product = await this.repository.update(productId, updates);
        
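        // Invalidate the entry and any list caches that embed it.
        // A cache failure must not fail the write: the database is already
        // updated, and the TTL bounds how long a stale entry can survive.
        try {
          await this.cache.del(`product:${productId}`);
          await this.cache.del(`products:category:${product.categoryId}`);
          await this.cache.del('products:featured');
        } catch (err) {
          console.error(`Cache invalidation failed for product:${productId}`, err);
        }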
        
        return product;
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Cache-aside pattern in Python - idiomatic async with redis.asyncio
    import json
    import logging
    from typing import Optional
    from redis.asyncio import Redis
    from pydantic import BaseModel

    logger = logging.getLogger(__name__)


    class Product(BaseModel):
        id: str
        name: str
        price: float
        category_id: str


    class ProductService:
        def __init__(self, cache: Redis, repository, ttl: int = 3600):
            self.cache = cache
            self.repository = repository
            self.ttl = ttl  # 1 hour

        async def get_product(self, product_id: str) -> Optional[Product]:
            cache_key = f"product:{product_id}"

            # 1. Try to get from cache
            cached = await self.cache.get(cache_key)
            if cached:
                logger.info("Cache HIT for %s", cache_key)
                return Product.model_validate_json(cached)

            logger.info("Cache MISS for %s", cache_key)

            # 2. Get from database
            product = await self.repository.find_by_id(product_id)
            if product is None:
                return None

            # 3. Store in cache (TTL acts as safety net)
            await self.cache.setex(
                cache_key,
                self.ttl,
                product.model_dump_json(),
            )
            return product

        async def update_product(self, product_id: str, updates: dict) -> Product:
            # Update database (source of truth)
            product = await self.repository.update(product_id, updates)

            # Invalidate related cache entries - one pipelined round trip (batched, not atomic)
            async with self.cache.pipeline(transaction=False) as pipe:
                pipe.delete(f"product:{product_id}")
                pipe.delete(f"products:category:{product.category_id}")
                pipe.delete("products:featured")
                await pipe.execute()

            return product
    ```
  </Tab>
</Tabs>
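
Two points from the cache-aside discussion are worth sketching separately: reads should degrade to a miss when the cache is unreachable, and failed invalidations should be captured for retry instead of being dropped. The wrapper below is a minimal illustration under stated assumptions: `FailOpenCache` is a hypothetical name, the wrapped object only needs async `get` and `delete` methods shaped like those on `redis.asyncio.Redis`, and the in-memory deque stands in for a durable dead-letter queue.

```python
import logging
from collections import deque
from typing import Deque, Iterable, Optional, Protocol

logger = logging.getLogger(__name__)


class Cache(Protocol):
    async def get(self, key: str) -> Optional[str]: ...
    async def delete(self, *keys: str) -> int: ...


class FailOpenCache:
    """Treat cache errors as misses and remember invalidations that failed."""

    def __init__(self, cache: Cache) -> None:
        self.cache = cache
        # Assumption: an in-memory queue. A real dead-letter queue would be durable.
        self.pending_invalidations: Deque[str] = deque()

    async def get(self, key: str) -> Optional[str]:
        try:
            return await self.cache.get(key)
        except Exception:  # in production, narrow this to the client's error types
            logger.warning("cache read failed for %s; treating as a miss", key)
            return None

    async def invalidate(self, keys: Iterable[str]) -> bool:
        keys = list(keys)
        try:
            await self.cache.delete(*keys)
            return True
        except Exception:
            logger.warning("invalidation failed for %s; queued for retry", keys)
            self.pending_invalidations.extend(keys)
            return False

    async def retry_pending(self) -> int:
        """Retry queued invalidations in order; stop at the first failure."""
        retried = 0
        while self.pending_invalidations:
            key = self.pending_invalidations[0]
            try:
                await self.cache.delete(key)
            except Exception:
                break  # cache still unhealthy; try again on the next run
            self.pending_invalidations.popleft()
            retried += 1
        return retried
```

A service would call `retry_pending()` from a periodic task. Until a retry succeeds, the TTL on the stale entry remains the safety net described above.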

### Write-Through

Write-through flips the safety trade-off: every write goes to both the cache and the database synchronously, so reads are always consistent with the last write. This is the pattern you want when users expect "read-your-writes" semantics -- after I update my profile picture, I should see it immediately, not in 5 minutes when the TTL expires. But the cost is real: every write now pays for two round trips (database plus cache), and if the cache is down, writes either fail or need a fallback path.

Where write-through shines: user profiles, shopping cart state, session data, and any "last write wins" scenarios where the write volume is moderate. Where it hurts: high-write-volume data like analytics events, where doubling every write blows up your Redis. The honest truth is that most teams who say they use write-through actually use write-around-with-invalidation and just do not know the difference -- and in most cases that is fine.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Write to cache and database together

    class UserService {
      constructor(cache, repository) {
        this.cache = cache;
        this.repository = repository;
        this.ttl = 7200;  // 2 hours
      }

      async createUser(userData) {
        // 1. Write to database
        const user = await this.repository.create(userData);
        
        // 2. Write to cache immediately
        const cacheKey = `user:${user.id}`;
        await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
        
        // Also cache by email for lookup
        await this.cache.setEx(`user:email:${user.email}`, this.ttl, String(user.id));
        
        return user;
      }

      async updateUser(userId, updates) {
        // Update database
        const user = await this.repository.update(userId, updates);
        
        // Update cache
        const cacheKey = `user:${userId}`;
        await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
        
        return user;
      }

      async getUser(userId) {
        const cacheKey = `user:${userId}`;
        
        // Try cache first
        const cached = await this.cache.get(cacheKey);
        if (cached) {
          return JSON.parse(cached);
        }
        
        // Fallback to database
        const user = await this.repository.findById(userId);
        if (user) {
          await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
        }
        
        return user;
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Write-through with Pydantic models and async Redis
    from typing import Optional
    from redis.asyncio import Redis
    from pydantic import BaseModel, EmailStr


    class User(BaseModel):
        id: str
        email: EmailStr
        name: str


    class UserService:
        def __init__(self, cache: Redis, repository, ttl: int = 7200):
            self.cache = cache
            self.repository = repository
            self.ttl = ttl  # 2 hours

        async def create_user(self, user_data: dict) -> User:
            # 1. Write to database
            user = await self.repository.create(user_data)

            # 2. Write to cache immediately (pipeline both writes)
            async with self.cache.pipeline(transaction=False) as pipe:
                pipe.setex(f"user:{user.id}", self.ttl, user.model_dump_json())
                pipe.setex(f"user:email:{user.email}", self.ttl, user.id)
                await pipe.execute()

            return user

        async def update_user(self, user_id: str, updates: dict) -> User:
            # Update database
            user = await self.repository.update(user_id, updates)

            # Sync cache with new state (no stale-read window)
            await self.cache.setex(
                f"user:{user.id}",
                self.ttl,
                user.model_dump_json(),
            )
            return user

        async def get_user(self, user_id: str) -> Optional[User]:
            cache_key = f"user:{user_id}"

            cached = await self.cache.get(cache_key)
            if cached:
                return User.model_validate_json(cached)

            # Fallback to database
            user = await self.repository.find_by_id(user_id)
            if user is not None:
                await self.cache.setex(
                    cache_key,
                    self.ttl,
                    user.model_dump_json(),
                )
            return user
    ```
  </Tab>
</Tabs>
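
The listings above assume the cache write succeeds. One possible fallback path for the "cache is down" case is sketched here: write the database first, try to update the cache, fall back to deleting the key, and only then either raise or accept staleness until the TTL. The function name, the callable-based signature, and the returned status strings are all assumptions made to keep the sketch self-contained.

```python
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


class CacheUnavailable(RuntimeError):
    """Raised in strict mode when the cache could not be brought in sync."""


async def write_through(
    key: str,
    payload: str,
    ttl: int,
    *,
    db_write: Callable[[], Awaitable[object]],
    cache_setex: Callable[[str, int, str], Awaitable[object]],
    cache_delete: Callable[[str], Awaitable[object]],
    strict: bool = False,
) -> str:
    # Source of truth first; a database failure propagates to the caller.
    await db_write()

    try:
        await cache_setex(key, ttl, payload)
        return "synced"
    except Exception:  # in production, narrow this to the client's error types
        logger.warning("cache update failed for %s; trying to invalidate", key)

    try:
        # Deleting is enough for correctness: the next read repopulates the key.
        await cache_delete(key)
        return "invalidated"
    except Exception:
        logger.warning("cache invalidation failed for %s", key)

    if strict:
        # Caller asked for read-your-writes or nothing.
        raise CacheUnavailable(key)
    # Otherwise the old entry may be served until its TTL expires.
    return "stale-until-ttl"
```

With `redis.asyncio`, `cache_setex` and `cache_delete` could simply be the client's `setex` and `delete` methods. Whether `strict` should be on is a product decision: it trades write availability for the read-your-writes guarantee.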

### Write-Behind (Write-Back)

Write-behind is the most dangerous pattern here because it risks data loss: if Redis crashes before the queued write reaches the database, that data is gone. Use this only for data you can afford to lose (analytics counters, view counts) or where you have other durability guarantees in place.

Think about what you are buying with write-behind: raw write throughput. If you are tracking page views on a popular product, every page view is one increment to a counter. Doing that synchronously against PostgreSQL would generate thousands of transactions per second for a single row -- guaranteed lock contention and write amplification. With write-behind, you update Redis in memory (microseconds), and a background job batches updates to the database every second or minute. You have traded durability for throughput, and for counters, that is a good trade.

The architectural trap: teams use write-behind for "temporary" data, then other services start relying on that data being durable in the database. Now you have an implicit contract that write-behind cannot fulfill. Document clearly that write-behind caches are not durable, and put a monitoring alert on queue depth -- a growing backlog means more unflushed writes would be lost if Redis or the queue went down.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Write to cache first, async write to database

    class InventoryService {
      constructor(cache, repository, queue) {
        this.cache = cache;
        this.repository = repository;
        this.queue = queue;  // For async DB writes
      }

      async updateStock(productId, quantity) {
        const cacheKey = `inventory:${productId}`;
        
        // 1. Atomic increment in Redis: INCRBY avoids the read-modify-write
        //    race that a separate GET + SET has under concurrent callers.
        //    Assumes the key was already hydrated from the DB (see getStock);
        //    INCRBY on a missing key starts from 0.
        const newStock = await this.cache.incrBy(cacheKey, quantity);
        
        // 2. Queue database write (async)
        await this.queue.add('inventory-update', {
          productId,
          quantity,
          newStock,
          timestamp: Date.now()
        });
        
        return { productId, stock: newStock };
      }

      async getStock(productId) {
        const cacheKey = `inventory:${productId}`;
        
        const stock = await this.cache.get(cacheKey);
        if (stock !== null) {
          return parseInt(stock, 10);
        }
        
        // Load from DB if not in cache
        const inventory = await this.repository.findByProductId(productId);
        if (inventory) {
          await this.cache.set(cacheKey, inventory.stock.toString());
          return inventory.stock;
        }
        
        return 0;
      }
    }

    // Background worker processes the queue
    class InventoryWriter {
      constructor(repository) {
        this.repository = repository;
      }

      async processUpdate(job) {
        const { productId, newStock } = job.data;
        
        // Writes one job's absolute stock value; real batching would coalesce
        // several jobs per product before touching the database
        await this.repository.updateStock(productId, newStock);
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Write-behind with Redis atomic INCRBY and a Celery/RQ-style queue
    import time
    from typing import Protocol
    from redis.asyncio import Redis


    class Queue(Protocol):
        async def enqueue(self, task_name: str, payload: dict) -> None: ...


    class InventoryService:
        def __init__(self, cache: Redis, repository, queue: Queue):
            self.cache = cache
            self.repository = repository
            self.queue = queue

        async def update_stock(self, product_id: str, delta: int) -> dict:
            cache_key = f"inventory:{product_id}"

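            # 1. Atomic increment in Redis (survives concurrent callers)
            # INCRBY returns the new value atomically - no read-modify-write race.
            # Caveat: INCRBY on a missing key starts from 0, so this assumes the
            # key was already hydrated from the DB (see get_stock).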
            new_stock = await self.cache.incrby(cache_key, delta)

            # 2. Queue durable write (background worker flushes to DB)
            await self.queue.enqueue(
                "inventory-update",
                {
                    "product_id": product_id,
                    "delta": delta,
                    "new_stock": new_stock,
                    "timestamp": time.time(),
                },
            )
            return {"product_id": product_id, "stock": new_stock}

        async def get_stock(self, product_id: str) -> int:
            cache_key = f"inventory:{product_id}"

            stock = await self.cache.get(cache_key)
            if stock is not None:
                return int(stock)

            # Hydrate from DB on first access
            inventory = await self.repository.find_by_product_id(product_id)
            if inventory is not None:
                await self.cache.set(cache_key, str(inventory.stock))
                return inventory.stock
            return 0


    # Background worker (e.g., running under Celery/Arq/RQ)
    class InventoryWriter:
        def __init__(self, repository):
            self.repository = repository

        async def process_update(self, job: dict) -> None:
            # Writes one job's absolute stock value; real batching would coalesce
            # several jobs per product before touching the database
            await self.repository.update_stock(
                product_id=job["product_id"],
                stock=job["new_stock"],
            )
    ```
  </Tab>
</Tabs>
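
The workers in the listings above write one job at a time. The batching described earlier, where a background job flushes accumulated updates every second or minute, can be sketched as a writer that coalesces deltas per product before touching the database. This is a variation, not the listing's method: it applies summed deltas instead of absolute `new_stock` values, and `BatchingInventoryWriter`, `coalesce_deltas`, and the `apply_deltas` callable are hypothetical names.

```python
from collections import defaultdict
from typing import Awaitable, Callable, Dict, Iterable, List


def coalesce_deltas(jobs: Iterable[dict]) -> Dict[str, int]:
    """Sum the deltas per product so each product is written at most once."""
    totals: Dict[str, int] = defaultdict(int)
    for job in jobs:
        totals[job["product_id"]] += job["delta"]
    # Drop products whose updates cancelled out.
    return {pid: total for pid, total in totals.items() if total != 0}


class BatchingInventoryWriter:
    def __init__(
        self,
        apply_deltas: Callable[[Dict[str, int]], Awaitable[None]],
    ) -> None:
        # Assumption: apply_deltas performs one batched database write,
        # for example a single UPDATE ... SET stock = stock + delta per product.
        self._apply_deltas = apply_deltas
        self._pending: List[dict] = []

    @property
    def depth(self) -> int:
        """Number of unflushed jobs; export this as the queue-depth metric."""
        return len(self._pending)

    def add(self, job: dict) -> None:
        self._pending.append(job)

    async def flush(self) -> int:
        """Flush everything collected so far; returns the number of jobs flushed."""
        batch, self._pending = self._pending, []
        deltas = coalesce_deltas(batch)
        try:
            if deltas:
                await self._apply_deltas(deltas)
        except Exception:
            # Put the batch back so a later flush can retry it.
            self._pending = batch + self._pending
            raise
        return len(batch)
```

A scheduler would call `flush()` on a fixed interval. Anything still counted by `depth` when the process dies is lost, which is the durability trade-off this pattern makes.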

### Read-Through

Read-through looks identical to cache-aside from the outside, but the key difference is where the "load on miss" logic lives. In cache-aside, the application knows about the cache and the database separately. In read-through, the cache itself knows how to load data -- the application just asks the cache and does not care where the data came from. This is appealing because it centralizes the cache-load logic, but it also couples the cache to your data sources, which is why it is less common in practice.

When would you pick this? When you have many different consumers of the same data and you want all of them to share the same cache-loading logic. If five services all need to load a user by ID on cache miss, do you want five copies of that logic or one? Read-through centralizes it. The trade-off: the cache layer becomes stateful and service-like, which complicates deployment and testing.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Cache handles loading data

    class ReadThroughCache {
      constructor(redisClient, dataLoader) {
        this.redis = redisClient;
        this.dataLoader = dataLoader;  // Function to load from DB
      }

      async get(key, options = {}) {
        const { ttl = 3600, loader } = options;
        
        // Try cache
        const cached = await this.redis.get(key);
        if (cached) {
          return JSON.parse(cached);
        }
        
        // Load using provided loader or default
        const loadFn = loader || this.dataLoader;
        const data = await loadFn(key);
        
        if (data !== null && data !== undefined) {
          await this.redis.setEx(key, ttl, JSON.stringify(data));
        }
        
        return data;
      }
    }

    // Usage
    const cache = new ReadThroughCache(redis, async (key) => {
      // Default loader extracts ID from key
      const [type, id] = key.split(':');
      
      switch (type) {
        case 'user':
          return userRepository.findById(id);
        case 'product':
          return productRepository.findById(id);
        default:
          return null;
      }
    });

    // Get user (automatically loads if not cached)
    const user = await cache.get('user:123');
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Read-through cache using Python's rich callable/async-callable support
    import json
    from typing import Any, Awaitable, Callable, Optional
    from redis.asyncio import Redis

    Loader = Callable[[str], Awaitable[Any]]


    class ReadThroughCache:
        def __init__(self, redis: Redis, default_loader: Loader):
            self.redis = redis
            self.default_loader = default_loader

        async def get(
            self,
            key: str,
            *,
            ttl: int = 3600,
            loader: Optional[Loader] = None,
        ) -> Any:
            cached = await self.redis.get(key)
            if cached:
                return json.loads(cached)

            # Delegate to provided loader or fall back to default
            load_fn = loader or self.default_loader
            data = await load_fn(key)

            if data is not None:
                await self.redis.setex(key, ttl, json.dumps(data, default=str))
            return data


    # Usage
    async def default_loader(key: str):
        type_, id_ = key.split(":", 1)
        if type_ == "user":
            return await user_repository.find_by_id(id_)
        if type_ == "product":
            return await product_repository.find_by_id(id_)
        return None


    cache = ReadThroughCache(redis, default_loader)
    # Inside an async function (or an asyncio REPL):
    user = await cache.get("user:123")
    ```
  </Tab>
</Tabs>

***

## Redis Implementation

Redis has become the de facto standard for distributed caching in microservices, and it is worth understanding *why* that happened rather than just accepting it. Redis wins because it executes commands on a single thread (so no lock contention within a node), keeps the working set in memory (sub-millisecond reads), supports rich data structures (hashes, sorted sets, streams) beyond simple key-value, and offers clustering for horizontal scaling. The alternatives -- Memcached is simpler but less capable; Hazelcast and Apache Ignite are richer but heavier -- each win on specific axes, but Redis hits the sweet spot for most teams.

A word of caution: Redis is not a durable database. When you see teams using Redis as a "fast database" for critical data, that is usually a mistake waiting to happen. AOF and RDB persistence help but do not eliminate data loss on crash. Treat Redis as a cache and rebuild state from the source of truth when needed. When people try to use Redis as both the hot path and the durability layer, they end up with a system that loses data during failovers and is slower to recover than it would be with proper database-backed architecture.

### Connection and Configuration

The connection logic below handles three production-critical concerns: reconnection with backoff (your Redis cluster will fail over at some point), cluster mode (for horizontal scaling beyond a single node), and error handling that does not crash your application. A naive Redis client that throws on every connection error will cascade into a full application restart -- which is exactly what you do not want in a distributed system. Instead, treat Redis errors as transient and let the fallback path (hit the database) handle them.
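
To make the "reconnection with backoff" idea concrete, here is the delay schedule as a small pure function. It mirrors the numbers used by the Node.js listing in this section (100 ms per attempt, a 3000 ms cap, giving up after 10 retries); `reconnect_delay_ms` itself is a hypothetical helper, not part of any Redis client.

```python
from typing import Optional


def reconnect_delay_ms(
    retries: int,
    step_ms: int = 100,
    cap_ms: int = 3000,
    max_retries: int = 10,
) -> Optional[int]:
    """Linear backoff with a cap; None means stop retrying."""
    if retries > max_retries:
        return None
    return min(retries * step_ms, cap_ms)


# Print the schedule a client would follow with the default settings.
for attempt in range(1, 13):
    print(attempt, reconnect_delay_ms(attempt))
```

With these defaults the cap is never reached, since the tenth retry waits only 1000 ms. The cap starts to matter if `max_retries` is raised above 30, and adding random jitter is a common refinement so that many clients do not reconnect in lockstep.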

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // cache/redis-client.js
    const { createClient, createCluster } = require('redis');

    class RedisCache {
      constructor(config = {}) {
        this.config = {
          url: process.env.REDIS_URL || 'redis://localhost:6379',
          cluster: process.env.REDIS_CLUSTER === 'true',
          ...config
        };
        
        this.client = null;
        this.isConnected = false;
      }

      async connect() {
        if (this.config.cluster) {
          // Redis Cluster for production
          this.client = createCluster({
            rootNodes: [
              { url: process.env.REDIS_NODE_1 },
              { url: process.env.REDIS_NODE_2 },
              { url: process.env.REDIS_NODE_3 }
            ],
            defaults: {
              socket: {
                connectTimeout: 5000,
                keepAlive: 5000
              }
            }
          });
        } else {
          // Single node for development
          this.client = createClient({
            url: this.config.url,
            socket: {
              connectTimeout: 5000,
              keepAlive: 5000,
              reconnectStrategy: (retries) => {
                if (retries > 10) {
                  console.error('Redis: Max reconnection attempts reached');
                  return new Error('Max reconnection attempts');
                }
                return Math.min(retries * 100, 3000);
              }
            }
          });
        }

        this.client.on('error', (err) => {
          console.error('Redis error:', err);
          this.isConnected = false;
        });

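        // 'ready' fires once the client can accept commands;
        // 'connect' only signals that a connection attempt has started
        this.client.on('ready', () => {
          console.log('Redis connected');
          this.isConnected = true;
        });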

        this.client.on('reconnecting', () => {
          console.log('Redis reconnecting...');
        });

        await this.client.connect();
        return this;
      }

      async get(key) {
        try {
          return await this.client.get(key);
        } catch (error) {
          console.error(`Redis GET error for ${key}:`, error);
          return null;
        }
      }

      async set(key, value, options = {}) {
        try {
          const stringValue = typeof value === 'object' 
            ? JSON.stringify(value) 
            : value;
          
          if (options.ttl) {
            await this.client.setEx(key, options.ttl, stringValue);
          } else {
            await this.client.set(key, stringValue);
          }
          return true;
        } catch (error) {
          console.error(`Redis SET error for ${key}:`, error);
          return false;
        }
      }

      async setEx(key, seconds, value) {
        return this.set(key, value, { ttl: seconds });
      }

      async del(key) {
        try {
          await this.client.del(key);
          return true;
        } catch (error) {
          console.error(`Redis DEL error for ${key}:`, error);
          return false;
        }
      }

      async mget(keys) {
        try {
          return await this.client.mGet(keys);
        } catch (error) {
          console.error('Redis MGET error:', error);
          return keys.map(() => null);
        }
      }

      async mset(keyValues, ttl = null) {
        try {
          const pipeline = this.client.multi();
          
          for (const [key, value] of Object.entries(keyValues)) {
            const stringValue = typeof value === 'object' 
              ? JSON.stringify(value) 
              : value;
            
            if (ttl) {
              pipeline.setEx(key, ttl, stringValue);
            } else {
              pipeline.set(key, stringValue);
            }
          }
          
          await pipeline.exec();
          return true;
        } catch (error) {
          console.error('Redis MSET error:', error);
          return false;
        }
      }

      async disconnect() {
        if (this.client) {
          await this.client.quit();
        }
      }
    }

    module.exports = { RedisCache };
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # cache/redis_client.py - production-ready async Redis wrapper
    import json
    import logging
    import os
    from typing import Any, Optional
    from redis.asyncio import Redis, ConnectionPool
    from redis.asyncio.cluster import RedisCluster
    from redis.exceptions import RedisError
    from redis.backoff import ExponentialBackoff
    # The asyncio client awaits its retry helper, so import the async Retry
    from redis.asyncio.retry import Retry

    logger = logging.getLogger(__name__)


    class RedisCache:
        """Resilient Redis wrapper. Errors never raise to callers -
        return None/False so the application can fall back to DB."""

        def __init__(self, url: Optional[str] = None, cluster: bool = False):
            self.url = url or os.getenv("REDIS_URL", "redis://localhost:6379")
            self.cluster = cluster or os.getenv("REDIS_CLUSTER") == "true"
            self.client: Optional[Redis] = None
            self.is_connected = False

        async def connect(self) -> "RedisCache":
            # ExponentialBackoff with retry on network errors
            retry = Retry(ExponentialBackoff(cap=3, base=0.1), retries=10)

            if self.cluster:
                self.client = RedisCluster.from_url(
                    self.url,
                    decode_responses=True,
                    socket_connect_timeout=5,
                    socket_keepalive=True,
                    retry=retry,
                )
            else:
                pool = ConnectionPool.from_url(
                    self.url,
                    decode_responses=True,
                    socket_connect_timeout=5,
                    socket_keepalive=True,
                    max_connections=50,
                    retry=retry,
                )
                self.client = Redis(connection_pool=pool)

            # Verify connectivity
            await self.client.ping()
            self.is_connected = True
            logger.info("Redis connected: %s (cluster=%s)", self.url, self.cluster)
            return self

        async def get(self, key: str) -> Optional[str]:
            try:
                return await self.client.get(key)
            except RedisError as exc:
                logger.error("Redis GET error for %s: %s", key, exc)
                return None  # Fall through to DB, never propagate

        async def set(
            self,
            key: str,
            value: Any,
            ttl: Optional[int] = None,
        ) -> bool:
            try:
                payload = (
                    json.dumps(value, default=str)
                    if not isinstance(value, (str, bytes, int, float))
                    else value
                )
                if ttl:
                    await self.client.setex(key, ttl, payload)
                else:
                    await self.client.set(key, payload)
                return True
            except RedisError as exc:
                logger.error("Redis SET error for %s: %s", key, exc)
                return False

        async def setex(self, key: str, seconds: int, value: Any) -> bool:
            return await self.set(key, value, ttl=seconds)

        async def delete(self, key: str) -> bool:
            try:
                await self.client.delete(key)
                return True
            except RedisError as exc:
                logger.error("Redis DEL error for %s: %s", key, exc)
                return False

        async def mget(self, keys: list[str]) -> list[Optional[str]]:
            try:
                return await self.client.mget(keys)
            except RedisError as exc:
                logger.error("Redis MGET error: %s", exc)
                return [None] * len(keys)

        async def mset(
            self,
            pairs: dict[str, Any],
            ttl: Optional[int] = None,
        ) -> bool:
            try:
                async with self.client.pipeline(transaction=False) as pipe:
                    for key, value in pairs.items():
                        payload = (
                            json.dumps(value, default=str)
                            if not isinstance(value, (str, bytes, int, float))
                            else value
                        )
                        if ttl:
                            pipe.setex(key, ttl, payload)
                        else:
                            pipe.set(key, payload)
                    await pipe.execute()
                return True
            except RedisError as exc:
                logger.error("Redis MSET error: %s", exc)
                return False

        async def disconnect(self) -> None:
            if self.client is not None:
                await self.client.aclose()
    ```
  </Tab>
</Tabs>
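
To make the fail-soft behavior concrete, here is a minimal sketch that runs without a Redis server. `FlakyClient`, `SafeCache`, and `load_user_from_db` are hypothetical stand-ins invented for this illustration; they are not part of the wrapper above. The point is only that a cache outage looks like a miss to the caller.

```python
import asyncio
from typing import Optional


class FlakyClient:
    """Hypothetical stand-in for a Redis client whose server is unreachable."""

    async def get(self, key: str) -> Optional[str]:
        raise ConnectionError("redis unavailable")

    async def set(self, key: str, value: str) -> None:
        raise ConnectionError("redis unavailable")


class SafeCache:
    """Swallows cache errors so callers only ever see a hit or a miss."""

    def __init__(self, client) -> None:
        self.client = client

    async def get(self, key: str) -> Optional[str]:
        try:
            return await self.client.get(key)
        except ConnectionError:
            return None  # treat the failure as a cache miss

    async def set(self, key: str, value: str) -> bool:
        try:
            await self.client.set(key, value)
            return True
        except ConnectionError:
            return False  # cache writes are best-effort


async def load_user_from_db(user_id: str) -> str:
    # Placeholder for the real source-of-truth query (assumption).
    return f"user-{user_id}-from-db"


async def get_user(cache: SafeCache, user_id: str) -> str:
    cached = await cache.get(f"user:{user_id}")
    if cached is not None:
        return cached
    value = await load_user_from_db(user_id)
    await cache.set(f"user:{user_id}", value)  # a failure here is ignored
    return value


result = asyncio.run(get_user(SafeCache(FlakyClient()), "123"))
print(result)
```

The request still succeeds, just more slowly. In a real service you would also count these swallowed errors in a metric, because a silent fallback that lasts for hours moves the full read load onto the database.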

### Caching Service

A thin wrapper around the raw Redis client is one of those patterns that seems unnecessary at first but pays off enormously over time. By centralizing key prefixing, serialization, TTL defaults, and pattern-based invalidation in one place, you avoid the "twelve services, twelve slightly different cache key formats" problem. When you need to debug a cache entry, having a consistent `service-name:entity:id` prefix makes grep-ing Redis trivial. When you need to migrate cache shapes, you change one file instead of hunting through every service.
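
As a small illustration of that key convention, a helper along these lines keeps every key in the same `service-name:entity:id` shape. The function names `build_key` and `parse_key` are made up for this sketch, and it assumes that IDs never contain a colon.

```python
from typing import Tuple


def build_key(service: str, entity: str, entity_id: object) -> str:
    """Build a `service:entity:id` key, rejecting parts that would break the format."""
    parts = [service, entity, str(entity_id)]
    for part in parts:
        if not part or ":" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid key part: {part!r}")
    return ":".join(parts)


def parse_key(key: str) -> Tuple[str, str, str]:
    """Split a key back into (service, entity, id) for debugging tools."""
    service, entity, entity_id = key.split(":", 2)
    return service, entity, entity_id


key = build_key("orders", "user", 123)
print(key, parse_key(key))
```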

Pay attention to `invalidatePattern` specifically. The naive way to implement this is `KEYS pattern*` -- and it will work fine in development with 100 keys. In production with 10 million keys, `KEYS` blocks the Redis server for seconds and causes an outage. `SCAN` is cursor-based and non-blocking, processing keys in small batches. This is one of the most common production mistakes I see in Redis deployments: `KEYS` used anywhere in application code.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // cache/caching-service.js

    class CachingService {
      constructor(redis) {
        this.redis = redis;
        this.defaultTTL = 3600;  // 1 hour
        this.prefix = process.env.SERVICE_NAME || 'app';
      }

      _key(key) {
        return `${this.prefix}:${key}`;
      }

      async get(key) {
        const data = await this.redis.get(this._key(key));
        return data ? JSON.parse(data) : null;
      }

      async set(key, value, ttl = this.defaultTTL) {
        return this.redis.setEx(this._key(key), ttl, JSON.stringify(value));
      }

      async getOrSet(key, fetchFn, ttl = this.defaultTTL) {
        // Try cache first
        let data = await this.get(key);
        
        if (data !== null) {
          return { data, cached: true };
        }
        
        // Fetch from source
        data = await fetchFn();
        
        if (data !== null && data !== undefined) {
          await this.set(key, data, ttl);
        }
        
        return { data, cached: false };
      }

      async invalidate(key) {
        return this.redis.del(this._key(key));
      }

      async invalidatePattern(pattern) {
        // Use SCAN to find matching keys (don't use KEYS in production)
        let deleted = 0;
        let cursor = 0;
        
        do {
          const result = await this.redis.client.scan(cursor, {
            MATCH: this._key(pattern),
            COUNT: 100
          });
          
          cursor = result.cursor;
          
          // Delete per batch so neither the key list nor a single DEL
          // grows with the number of matching keys
          if (result.keys.length > 0) {
            deleted += await this.redis.client.del(result.keys);
          }
        } while (cursor !== 0);
        
        return deleted;
      }

      // Hash operations for objects
      async hget(key, field) {
        const data = await this.redis.client.hGet(this._key(key), field);
        return data ? JSON.parse(data) : null;
      }

      async hset(key, field, value, ttl = null) {
        await this.redis.client.hSet(this._key(key), field, JSON.stringify(value));
        
        if (ttl) {
          await this.redis.client.expire(this._key(key), ttl);
        }
      }

      async hgetall(key) {
        const data = await this.redis.client.hGetAll(this._key(key));
        
        const result = {};
        for (const [field, value] of Object.entries(data)) {
          result[field] = JSON.parse(value);
        }
        
        return result;
      }

      // Sorted sets for leaderboards, time-based data
      async zadd(key, score, member) {
        return this.redis.client.zAdd(this._key(key), { score, value: member });
      }

      async zrange(key, start, stop, options = {}) {
        return this.redis.client.zRange(this._key(key), start, stop, options);
      }

      // List operations for queues, recent items
      async lpush(key, ...values) {
        return this.redis.client.lPush(this._key(key), values);
      }

      async lrange(key, start, stop) {
        return this.redis.client.lRange(this._key(key), start, stop);
      }

      async ltrim(key, start, stop) {
        return this.redis.client.lTrim(this._key(key), start, stop);
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # cache/caching_service.py
    import json
    import os
    from typing import Any, Awaitable, Callable, Optional
    from redis.asyncio import Redis


    class CachingService:
        """High-level cache API with key prefixing, serialization,
        and pattern invalidation using SCAN (never KEYS)."""

        def __init__(self, redis: Redis, default_ttl: int = 3600):
            self.redis = redis
            self.default_ttl = default_ttl
            self.prefix = os.getenv("SERVICE_NAME", "app")

        def _key(self, key: str) -> str:
            return f"{self.prefix}:{key}"

        async def get(self, key: str) -> Any:
            data = await self.redis.get(self._key(key))
            return json.loads(data) if data else None

        async def set(self, key: str, value: Any, ttl: Optional[int] = None) -> bool:
            ttl = ttl or self.default_ttl
            return bool(
                await self.redis.setex(
                    self._key(key),
                    ttl,
                    json.dumps(value, default=str),
                )
            )

        async def get_or_set(
            self,
            key: str,
            fetch_fn: Callable[[], Awaitable[Any]],
            ttl: Optional[int] = None,
        ) -> tuple[Any, bool]:
            """Returns (data, was_cached). Core helper for cache-aside."""
            data = await self.get(key)
            if data is not None:
                return data, True

            data = await fetch_fn()
            if data is not None:
                await self.set(key, data, ttl=ttl)
            return data, False

        async def invalidate(self, key: str) -> bool:
            return bool(await self.redis.delete(self._key(key)))

        async def invalidate_pattern(self, pattern: str) -> int:
            """Use SCAN for non-blocking pattern matching - NEVER KEYS in prod."""
            full_pattern = self._key(pattern)
            deleted = 0
            batch: list[str] = []

            # scan_iter issues SCAN calls in batches and yields keys one at a time
            async for key in self.redis.scan_iter(match=full_pattern, count=100):
                batch.append(key)
                # Flush batches to avoid oversized DEL commands
                if len(batch) >= 500:
                    deleted += await self.redis.delete(*batch)
                    batch.clear()

            if batch:
                deleted += await self.redis.delete(*batch)
            return deleted

        # --- Hash operations for field-level caching ---
        async def hget(self, key: str, field: str) -> Any:
            data = await self.redis.hget(self._key(key), field)
            return json.loads(data) if data else None

        async def hset(
            self,
            key: str,
            field: str,
            value: Any,
            ttl: Optional[int] = None,
        ) -> None:
            full_key = self._key(key)
            await self.redis.hset(full_key, field, json.dumps(value, default=str))
            if ttl:
                await self.redis.expire(full_key, ttl)

        async def hgetall(self, key: str) -> dict[str, Any]:
            data = await self.redis.hgetall(self._key(key))
            return {field: json.loads(value) for field, value in data.items()}

        # --- Sorted sets for leaderboards / time-ranges ---
        async def zadd(self, key: str, score: float, member: str) -> int:
            return await self.redis.zadd(self._key(key), {member: score})

        async def zrange(
            self,
            key: str,
            start: int,
            stop: int,
            *,
            withscores: bool = False,
        ) -> list:
            return await self.redis.zrange(
                self._key(key), start, stop, withscores=withscores
            )

        # --- Lists for recent items / queues ---
        async def lpush(self, key: str, *values: str) -> int:
            return await self.redis.lpush(self._key(key), *values)

        async def lrange(self, key: str, start: int, stop: int) -> list[str]:
            return await self.redis.lrange(self._key(key), start, stop)

        async def ltrim(self, key: str, start: int, stop: int) -> bool:
            return bool(await self.redis.ltrim(self._key(key), start, stop))
    ```
  </Tab>
</Tabs>
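
If the cursor loop behind `SCAN` feels abstract, the following sketch simulates it against an in-memory keyspace. `FakeKeyspace` is a hypothetical stand-in: its cursor is a plain offset into a snapshot, whereas real Redis returns an opaque cursor. What carries over is the contract: start at cursor 0, process each small batch, and stop when the server hands back 0 again.

```python
import fnmatch
from typing import Dict, List, Tuple


class FakeKeyspace:
    """In-memory stand-in for Redis, for illustration only."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self._snapshot: List[str] = []

    def scan(self, cursor: int, match: str, count: int) -> Tuple[int, List[str]]:
        if cursor == 0:
            self._snapshot = sorted(self.data)  # freeze order for this iteration
        window = self._snapshot[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._snapshot):
            next_cursor = 0  # 0 signals that the iteration is complete
        return next_cursor, [k for k in window if fnmatch.fnmatchcase(k, match)]

    def delete(self, *keys: str) -> int:
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


def invalidate_pattern(store: FakeKeyspace, pattern: str, count: int = 2) -> Tuple[int, int]:
    """Delete matching keys batch by batch; returns (deleted, scan_calls)."""
    deleted = scan_calls = 0
    cursor = 0
    while True:
        cursor, keys = store.scan(cursor, match=pattern, count=count)
        scan_calls += 1
        if keys:  # a batch may contain no matching keys at all
            deleted += store.delete(*keys)
        if cursor == 0:
            break
    return deleted, scan_calls


store = FakeKeyspace({
    "app:product:1": "a", "app:product:2": "b", "app:product:3": "c",
    "app:user:1": "d", "app:user:2": "e",
})
deleted, scan_calls = invalidate_pattern(store, "app:product:*")
print(deleted, scan_calls, sorted(store.data))
```

Each call touches at most `count` keys, so no single command holds the server for long. The price is more round trips, which is the right trade for anything running against a production keyspace.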

***

## Cache Invalidation

<Warning>
  **Caveats and Common Pitfalls with Cache Invalidation in Microservices**

  Invalidation in a single-service world is hard. In a microservices fan-out world, it compounds in ways most teams never model:

  * **Invalidation fan-out grows combinatorially.** When a single User update should invalidate caches in Order Service, Recommendations Service, Search Service, Feed Service, and a CDN edge, you now have five services whose invalidation logic must agree. A bug in any one of them leaves stale data somewhere. As services multiply, the invalidation graph outgrows any human's ability to hold it in their head.
  * **Lost events mean permanently stale cache.** If the "UserUpdated" event is lost in transit (Kafka consumer crash, retention expired, network partition), the cache entries that should have been invalidated stay until their natural TTL -- or worse, refresh themselves from a stale read model and stay wrong. Event-driven invalidation without a safety net is a slow-motion bug factory.
  * **Nobody owns the invalidation contract.** Producer changes the data; which consumer is responsible for the cache? Often every consumer implements its own invalidation logic, and when the cache schema changes (new key structure, added dimensions), each consumer must update independently. In practice, they do not, and you serve inconsistent versions of the same data from different services.
  * **Personalization plus caching is a correctness trap.** "Top 10 recommendations for user X" cached under `recs:user:X` gets invalidated when user X's preferences change -- but what about when *another user* triggers a model retrain that changes every user's recommendations? Suddenly you have 100 million cache entries that should all be invalidated. Teams ship this feature without noticing until users complain about seeing stale recommendations for weeks.
</Warning>

<Tip>
  **Solutions and Patterns for Sane Cache Invalidation**

  * **Always pair event-driven invalidation with a TTL safety net.** Even if you invalidate on every change event, set a maximum TTL (hours, not days) so that missed events eventually self-heal. The TTL is not your primary invalidation mechanism; it is the backstop that prevents permanent staleness.
  * **Publish a versioned cache-key contract.** The owning service publishes a schema that describes its cache key structure and the events that invalidate each pattern. Consumers subscribe to this contract, not to individual events. When the schema changes, every consumer is notified and can update deterministically.
  * **Use tag-based invalidation for combinatorial cases.** Instead of tracking individual cache keys, tag entries ("product:123", "category:shoes", "brand:nike"). When an event says "everything in category:shoes is stale," you invalidate by tag. Redis has no built-in tagging, but a set per tag works as a secondary index from tag to keys; CDN vendors (Fastly, Cloudflare) support surrogate keys. Tag-based invalidation is how you avoid enumerating every affected key.
  * **Consider stale-while-revalidate.** For data where immediate correctness is not critical, serve the stale value while asynchronously refreshing it in the background. The user gets fast, slightly stale data, and the cache self-heals as soon as the refresh completes. This is the HTTP `stale-while-revalidate` directive applied inside your own caches.
  * **Run a periodic "cache verifier."** A background job samples cache entries, compares them to the source of truth, and flags drift. This catches the long tail of missed invalidations that TTL alone would not surface for hours. Treat cache drift as a real SLO, not an afterthought.
</Tip>
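
Serving stale data while a refresh runs in the background can be sketched in a few dozen lines. This is an in-memory, single-process illustration, not a Redis implementation: the `SWRCache` class, its `fresh_for` and `stale_for` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

Loader = Callable[[], Awaitable[Any]]


class SWRCache:
    """In-memory stale-while-revalidate cache (illustrative only)."""

    def __init__(self, fresh_for: float, stale_for: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fresh_for = fresh_for    # seconds an entry is served as-is
        self.stale_for = stale_for    # extra seconds it may be served while refreshing
        self.clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Dict[str, asyncio.Task] = {}

    async def get(self, key: str, loader: Loader) -> Any:
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self.clock() - stored_at
            if age <= self.fresh_for:
                return value
            if age <= self.fresh_for + self.stale_for:
                if key not in self._refreshing:  # at most one refresh per key
                    self._refreshing[key] = asyncio.create_task(
                        self._background_refresh(key, loader)
                    )
                return value  # stale, but returned without waiting
        return await self._load(key, loader)  # miss or too old: wait for the loader

    async def _load(self, key: str, loader: Loader) -> Any:
        value = await loader()
        self._entries[key] = (value, self.clock())
        return value

    async def _background_refresh(self, key: str, loader: Loader) -> None:
        try:
            await self._load(key, loader)
        except Exception:
            pass  # keep the stale value; a later read schedules another attempt
        finally:
            self._refreshing.pop(key, None)


async def demo() -> tuple:
    now = [0.0]          # fake clock so the example is deterministic
    calls = []

    async def loader() -> str:
        calls.append(now[0])
        return f"v{len(calls)}"

    cache = SWRCache(fresh_for=10, stale_for=50, clock=lambda: now[0])
    first = await cache.get("k", loader)   # miss: waits for the loader
    now[0] = 5
    second = await cache.get("k", loader)  # fresh: no load
    now[0] = 20
    third = await cache.get("k", loader)   # stale: old value, refresh scheduled
    await asyncio.sleep(0.01)              # give the background task time to finish
    fourth = await cache.get("k", loader)  # refreshed value
    return first, second, third, fourth, len(calls)


results = asyncio.run(demo())
print(results)
```

The third read is the interesting one: it returns the old value immediately and only the fourth read sees the refreshed one. The hard upper bound (`fresh_for + stale_for`) plays the role of the TTL safety net, since entries older than that are never served.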

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton. This quote is repeated endlessly, but most engineers do not internalize *why* cache invalidation is hard until they have shipped a bug caused by it. The difficulty is not the mechanism (deleting a key from Redis is trivial); it is knowing *when* to invalidate and *which* keys depend on which data. In a microservices system where Service B might change data that Services A, C, and D all cache, working out the complete invalidation graph is a hard problem with no general solution.

The strategies below sit on a spectrum of precision versus complexity. TTL is imprecise but simple: accept, say, 5 minutes of staleness and do nothing else. Event-based is precise but requires messaging infrastructure and careful dependency tracking. The right answer depends on your tolerance for staleness and your operational maturity.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE INVALIDATION STRATEGIES                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. TIME-BASED (TTL)                                                        │
│     ─────────────────                                                       │
│     • Set expiration time on cache entries                                  │
│     • Simple, but may serve stale data                                      │
│     • Good for: product catalogs, user profiles                             │
│                                                                              │
│  2. EVENT-BASED                                                             │
│     ────────────────                                                        │
│     • Invalidate on write events                                            │
│     • Real-time freshness                                                   │
│     • Good for: inventory, prices, orders                                   │
│                                                                              │
│  3. VERSION-BASED                                                           │
│     ─────────────────                                                       │
│     • Include version in cache key                                          │
│     • Change version to invalidate                                          │
│     • Good for: configuration, templates                                    │
│                                                                              │
│  4. TAG-BASED                                                               │
│     ─────────────────                                                       │
│     • Tag entries with categories                                           │
│     • Invalidate by tag                                                     │
│     • Good for: related data (all products in category)                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```
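
Of the four strategies, version-based invalidation is the least obvious, so here is a minimal sketch of it. The `VersionedCache` class and its key layout are assumptions for illustration, and a plain dict stands in for Redis. The idea is that the namespace version is part of every key, so bumping the version makes all old entries unreachable in one step.

```python
from typing import Any, Dict, Optional


class VersionedCache:
    """Version-in-key cache; a dict stands in for Redis (illustrative only)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def _version(self, namespace: str) -> int:
        return self._store.get(f"version:{namespace}", 1)

    def _key(self, namespace: str, key: str) -> str:
        return f"{namespace}:v{self._version(namespace)}:{key}"

    def get(self, namespace: str, key: str) -> Optional[Any]:
        return self._store.get(self._key(namespace, key))

    def set(self, namespace: str, key: str, value: Any) -> None:
        self._store[self._key(namespace, key)] = value

    def invalidate_namespace(self, namespace: str) -> int:
        """Bump the version; entries written under the old version are orphaned."""
        new_version = self._version(namespace) + 1
        self._store[f"version:{namespace}"] = new_version
        return new_version


cache = VersionedCache()
cache.set("config", "theme", "dark")
before = cache.get("config", "theme")    # found under config:v1:theme
cache.invalidate_namespace("config")
after = cache.get("config", "theme")     # looked up under config:v2:theme
print(before, after)
```

Two caveats if you port this to Redis. The version bump should be atomic (the `INCR` command does that), and the orphaned entries are not deleted, so they need a TTL to be reclaimed. Every read also costs an extra lookup for the current version unless you cache that locally for a short time.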

### Event-Based Invalidation

Event-based invalidation is the most architecturally clean way to solve the cross-service cache problem. The idea: when data changes, the owning service publishes an event. Any service that caches this data subscribes to the event and invalidates its local cache. The cache becomes an artifact of eventual consistency -- it catches up when the event arrives, typically within milliseconds on a well-tuned Kafka or RabbitMQ setup.

What can go wrong: events can be lost. If your event bus drops a message (network issue, consumer crash, topic retention expired), the affected entries stay stale until they expire, or indefinitely if they have no TTL. Defense-in-depth matters here: always pair event-based invalidation with a TTL safety net so that missed invalidations eventually self-heal. Also consider a "cache verification" job that periodically samples cached entries and cross-checks them against the source of truth.
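
The verification job can be quite small. The sketch below samples cached keys, compares each against the source of truth, and reports the ones that drifted. Plain dicts stand in for both the cache and the database, and the `verify_cache` name and signature are assumptions for this example.

```python
import random
from typing import Any, Callable, Dict, List


def verify_cache(
    cache: Dict[str, Any],
    load_truth: Callable[[str], Any],
    sample_size: int,
    rng: random.Random,
) -> List[str]:
    """Return the sampled keys whose cached value differs from the source of truth."""
    keys = sorted(cache)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return sorted(k for k in sample if cache[k] != load_truth(k))


# Stand-ins: `source` plays the database, `cached` plays Redis.
source = {"product:1": 10, "product:2": 25, "product:3": 30, "product:4": 40}
cached = {"product:1": 10, "product:2": 20, "product:3": 30, "product:4": 40}

drifted = verify_cache(cached, source.__getitem__, sample_size=4, rng=random.Random(0))
drift_rate = len(drifted) / 4

for key in drifted:
    cached.pop(key)  # repair by invalidating; the next read repopulates

print(drifted, drift_rate)
```

In production you would sample a small fraction of keys per run and export `drift_rate` as a metric. A rate that climbs after a deploy is usually the first visible sign of a lost or mishandled invalidation event.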

Another trap: invalidation cascades. If updating a product invalidates its cache, the category cache, the search results cache, and the homepage featured products cache, a single write now triggers four cache operations. Multiplied across frequent writes, this can overwhelm Redis. Batch invalidations where possible and set up metrics on invalidation rate to catch pathological cases early.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Event-driven cache invalidation across services

    // Event consumer for cache invalidation
    class CacheInvalidator {
      constructor(cache, eventBus) {
        this.cache = cache;
        this.eventBus = eventBus;
        
        this.setupListeners();
      }

      setupListeners() {
        // Product events
        this.eventBus.subscribe('product.updated', this.onProductUpdated.bind(this));
        this.eventBus.subscribe('product.deleted', this.onProductDeleted.bind(this));
        
        // Price events
        this.eventBus.subscribe('price.changed', this.onPriceChanged.bind(this));
        
        // Inventory events
        this.eventBus.subscribe('inventory.updated', this.onInventoryUpdated.bind(this));
        
        // Category events
        this.eventBus.subscribe('category.updated', this.onCategoryUpdated.bind(this));
      }

      async onProductUpdated(event) {
        const { productId, categoryId } = event;
        
        // Invalidate specific product
        await this.cache.invalidate(`product:${productId}`);
        
        // Invalidate product lists
        await this.cache.invalidatePattern(`products:list:*`);
        await this.cache.invalidate(`products:category:${categoryId}`);
        await this.cache.invalidate('products:featured');
        
        console.log(`Cache invalidated for product ${productId}`);
      }

      async onProductDeleted(event) {
        const { productId, categoryId } = event;
        
        await this.cache.invalidate(`product:${productId}`);
        await this.cache.invalidatePattern(`products:*`);
        
        console.log(`Cache invalidated for deleted product ${productId}`);
      }

      async onPriceChanged(event) {
        const { productId, oldPrice, newPrice } = event;
        
        // Invalidate product cache
        await this.cache.invalidate(`product:${productId}`);
        
        // Invalidate any price-based lists
        await this.cache.invalidate('products:deals');
        await this.cache.invalidatePattern('products:price-range:*');
        
        console.log(`Price cache invalidated for product ${productId}`);
      }

      async onInventoryUpdated(event) {
        const { productId, oldStock, newStock } = event;
        
        // Invalidate inventory cache
        await this.cache.invalidate(`inventory:${productId}`);
        
        // If went out of stock or back in stock
        if ((oldStock > 0 && newStock === 0) || (oldStock === 0 && newStock > 0)) {
          await this.cache.invalidatePattern(`products:*`);
        }
      }

      async onCategoryUpdated(event) {
        const { categoryId } = event;
        
        // Invalidate category and all products in it
        await this.cache.invalidate(`category:${categoryId}`);
        await this.cache.invalidate(`products:category:${categoryId}`);
        await this.cache.invalidatePattern('categories:*');
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Event-driven cache invalidator - subscribe to a message broker
    # (Kafka via aiokafka, RabbitMQ via aio-pika, or Redis Streams)
    import logging
    from typing import Protocol
    from pydantic import BaseModel

    logger = logging.getLogger(__name__)


    # --- Event schemas (Pydantic v2) ---
    class ProductUpdated(BaseModel):
        product_id: str
        category_id: str


    class PriceChanged(BaseModel):
        product_id: str
        old_price: float
        new_price: float


    class InventoryUpdated(BaseModel):
        product_id: str
        old_stock: int
        new_stock: int


    class CategoryUpdated(BaseModel):
        category_id: str


    class EventBus(Protocol):
        async def subscribe(self, topic: str, handler) -> None: ...


    class CacheInvalidator:
        def __init__(self, cache, event_bus: EventBus):
            self.cache = cache
            self.event_bus = event_bus

        async def start(self) -> None:
            # Wire up topic -> handler map
            handlers = {
                "product.updated": (ProductUpdated, self.on_product_updated),
                "product.deleted": (ProductUpdated, self.on_product_deleted),
                "price.changed": (PriceChanged, self.on_price_changed),
                "inventory.updated": (InventoryUpdated, self.on_inventory_updated),
                "category.updated": (CategoryUpdated, self.on_category_updated),
            }
            for topic, (schema, handler) in handlers.items():
                await self.event_bus.subscribe(
                    topic,
                    lambda raw, s=schema, h=handler: h(s.model_validate(raw)),
                )

        async def on_product_updated(self, event: ProductUpdated) -> None:
            # Invalidate direct entity + dependent list views
            await self.cache.invalidate(f"product:{event.product_id}")
            await self.cache.invalidate_pattern("products:list:*")
            await self.cache.invalidate(f"products:category:{event.category_id}")
            await self.cache.invalidate("products:featured")
            logger.info("Cache invalidated for product %s", event.product_id)

        async def on_product_deleted(self, event: ProductUpdated) -> None:
            await self.cache.invalidate(f"product:{event.product_id}")
            await self.cache.invalidate_pattern("products:*")
            logger.info("Cache invalidated for deleted product %s", event.product_id)

        async def on_price_changed(self, event: PriceChanged) -> None:
            await self.cache.invalidate(f"product:{event.product_id}")
            await self.cache.invalidate("products:deals")
            await self.cache.invalidate_pattern("products:price-range:*")

        async def on_inventory_updated(self, event: InventoryUpdated) -> None:
            await self.cache.invalidate(f"inventory:{event.product_id}")
            # Stock transition (in/out of stock) affects list views
            if (event.old_stock > 0 and event.new_stock == 0) or (
                event.old_stock == 0 and event.new_stock > 0
            ):
                await self.cache.invalidate_pattern("products:*")

        async def on_category_updated(self, event: CategoryUpdated) -> None:
            await self.cache.invalidate(f"category:{event.category_id}")
            await self.cache.invalidate(f"products:category:{event.category_id}")
            await self.cache.invalidate_pattern("categories:*")
    ```
  </Tab>
</Tabs>

### Tag-Based Invalidation

Tag-based invalidation solves a problem that both key-based and pattern-based invalidation struggle with: "invalidate everything related to X." If you cache products, and each product belongs to a category, and each category belongs to a department, how do you invalidate all products in a department when the department changes? Key-based invalidation requires you to know every cached key. Pattern-based invalidation requires every key name to encode the relationship up front. Tag-based invalidation lets you attach arbitrary labels to cache entries and invalidate all entries with a given label.

The mechanism: every cached value stores which tags it belongs to, and every tag stores a reverse index of which keys are tagged with it. Adding a cache entry costs a few extra Redis operations (two per tag, plus two for the per-key tag list in the implementations below); invalidating a tag involves reading the tag's key set and deleting all referenced entries. Memory overhead is real (the tag indexes), so cap the cardinality of tags -- do not tag every entry with every conceivable attribute.
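As a Redis-free model of that mechanism, the sketch below keeps the two indexes in plain dictionaries. The class name `InMemoryTagIndex` and the sample keys are hypothetical, and TTLs are deliberately left out; it is meant only to make the bookkeeping visible, not to replace a Redis-backed implementation.

```python
from collections import defaultdict


class InMemoryTagIndex:
    """Toy model of tag-based invalidation (single process, no TTLs)."""

    def __init__(self):
        self.values = {}                   # key -> cached value
        self.key_tags = {}                 # forward index: key -> set of tags
        self.tag_keys = defaultdict(set)   # reverse index: tag -> set of keys

    def set(self, key, value, tags=()):
        # Drop old tag memberships first so re-tagging a key stays consistent.
        self.delete(key)
        self.values[key] = value
        self.key_tags[key] = set(tags)
        for tag in tags:
            self.tag_keys[tag].add(key)

    def delete(self, key):
        self.values.pop(key, None)
        for tag in self.key_tags.pop(key, set()):
            self.tag_keys[tag].discard(key)
            if not self.tag_keys[tag]:
                del self.tag_keys[tag]     # do not keep empty tag sets around

    def invalidate_by_tag(self, tag):
        keys = list(self.tag_keys.get(tag, ()))
        for key in keys:
            self.delete(key)
        return len(keys)


index = InMemoryTagIndex()
index.set("product:1", {"name": "Laptop"}, tags=("category:electronics", "featured"))
index.set("product:2", {"name": "Phone"}, tags=("category:electronics",))
index.set("product:3", {"name": "Desk"}, tags=("category:furniture", "featured"))

removed = index.invalidate_by_tag("category:electronics")
print(removed)                # 2
print(sorted(index.values))   # ['product:3']
```

Note that invalidating one tag also removes the affected keys from every other tag's reverse index (here, `product:1` leaves `featured`), which is the cleanup step the per-key tag list exists for.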

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Tagging cache entries for group invalidation

    class TaggedCache {
      constructor(redis) {
        this.redis = redis;
      }

      async set(key, value, options = {}) {
        const { ttl = 3600, tags = [] } = options;
        
        // Store the value
        await this.redis.setEx(key, ttl, JSON.stringify(value));
        
        // Store key in tag sets
        for (const tag of tags) {
          await this.redis.client.sAdd(`tag:${tag}`, key);
          await this.redis.client.expire(`tag:${tag}`, ttl * 2);  // Tags live longer
        }
        
        // Store tags for this key
        if (tags.length > 0) {
          await this.redis.client.sAdd(`key-tags:${key}`, tags);
          await this.redis.client.expire(`key-tags:${key}`, ttl);
        }
      }

      async get(key) {
        const data = await this.redis.get(key);
        return data ? JSON.parse(data) : null;
      }

      async invalidateByTag(tag) {
        // Get all keys with this tag
        const keys = await this.redis.client.sMembers(`tag:${tag}`);
        
        if (keys.length === 0) return 0;
        
        // Delete all cached values
        await this.redis.client.del(keys);
        
        // Clean up tag sets
        for (const key of keys) {
          const keyTags = await this.redis.client.sMembers(`key-tags:${key}`);
          for (const t of keyTags) {
            await this.redis.client.sRem(`tag:${t}`, key);
          }
          await this.redis.client.del(`key-tags:${key}`);
        }
        
        // Delete the tag set
        await this.redis.client.del(`tag:${tag}`);
        
        return keys.length;
      }

      async invalidateByTags(tags) {
        let count = 0;
        for (const tag of tags) {
          count += await this.invalidateByTag(tag);
        }
        return count;
      }
    }

    // Usage
    const cache = new TaggedCache(redis);

    // Cache a product with tags
    await cache.set(`product:123`, product, {
      ttl: 3600,
      tags: [`category:${product.categoryId}`, 'products', 'featured']
    });

    // Later: invalidate all products in a category
    await cache.invalidateByTag('category:electronics');

    // Or invalidate all featured products
    await cache.invalidateByTag('featured');
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Tag-based cache with reverse index in Redis sets
    import json
    from typing import Any, Iterable, Optional
    from redis.asyncio import Redis


    class TaggedCache:
        def __init__(self, redis: Redis):
            self.redis = redis

        async def set(
            self,
            key: str,
            value: Any,
            *,
            ttl: int = 3600,
            tags: Iterable[str] = (),
        ) -> None:
            tags = list(tags)
            async with self.redis.pipeline(transaction=False) as pipe:
                # 1. Store the value
                pipe.setex(key, ttl, json.dumps(value, default=str))

                # 2. For each tag, add this key to the tag's reverse index
                for tag in tags:
                    pipe.sadd(f"tag:{tag}", key)
                    pipe.expire(f"tag:{tag}", ttl * 2)  # Tags outlive values

                # 3. Also store the tag list per key (for cleanup)
                if tags:
                    pipe.sadd(f"key-tags:{key}", *tags)
                    pipe.expire(f"key-tags:{key}", ttl)

                await pipe.execute()

        async def get(self, key: str) -> Optional[Any]:
            data = await self.redis.get(key)
            return json.loads(data) if data else None

        async def invalidate_by_tag(self, tag: str) -> int:
            """Delete every cache entry tagged with `tag`."""
            tag_key = f"tag:{tag}"
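            # Assumes the client was created with decode_responses=True so set
            # members come back as str; with bytes members the f-strings below
            # would build the wrong "key-tags:..." names.
            keys = await self.redis.smembers(tag_key)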
            if not keys:
                return 0

            async with self.redis.pipeline(transaction=False) as pipe:
                # Nuke the cached values themselves
                pipe.delete(*keys)

                # Clean up reverse indexes for each invalidated key
                for key in keys:
                    key_tags = await self.redis.smembers(f"key-tags:{key}")
                    for other_tag in key_tags:
                        pipe.srem(f"tag:{other_tag}", key)
                    pipe.delete(f"key-tags:{key}")

                pipe.delete(tag_key)
                await pipe.execute()

            return len(keys)

        async def invalidate_by_tags(self, tags: Iterable[str]) -> int:
            total = 0
            for tag in tags:
                total += await self.invalidate_by_tag(tag)
            return total


    # Usage
    cache = TaggedCache(redis)

    await cache.set(
        "product:123",
        product.model_dump(),
        ttl=3600,
        tags=[f"category:{product.category_id}", "products", "featured"],
    )

    # Invalidate all products in the electronics category
    await cache.invalidate_by_tag("category:electronics")
    ```
  </Tab>
</Tabs>

***

## Cache Patterns for Microservices

### Multi-Level Caching

Multi-level caching (also called tiered caching) recognizes that not all caches are equal. An in-process cache like a Python dict or Node.js `Map` has microsecond access times but is per-instance -- if you have 10 pods, you have 10 separate caches. A distributed cache like Redis has sub-millisecond access times but is shared across all instances. The database takes milliseconds but is durable. Each tier serves a different purpose, and combining them gives you the best of all worlds: L1 absorbs the hottest keys at memory speed, L2 handles the long tail across all instances, and the database is the source of truth.

The tricky part is keeping L1 coherent across instances. If Service Instance A invalidates a key in Redis but Service Instance B has that key in its local cache, B will serve stale data until its local TTL expires. The solution is pub/sub: when any instance invalidates a key, it publishes an invalidation message on a Redis channel, and all instances listen and invalidate their L1. This is the pattern below.

Why this matters for microservices: network calls to Redis are still network calls. At high request rates (10,000+ requests per second per instance), even 1ms of Redis latency adds up. An L1 cache with a 90% hit rate eliminates 90% of those Redis round trips. The cost is coherence complexity, which is manageable if you accept very short L1 TTLs (30-60 seconds) and tolerate minor staleness.
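The arithmetic behind that claim can be sketched as a small expected-latency calculation. The function name and all numbers except the 1ms Redis latency and the 90% L1 hit rate are illustrative assumptions (1 microsecond for L1, 10ms for the database, a 95% Redis hit rate), so treat the output as a shape rather than a benchmark.

```python
def expected_latency_us(l1_hit, l2_hit, l1_us, l2_us, db_us):
    """Expected per-request latency for an L1 -> L2 -> DB lookup chain.

    l1_hit is the fraction of all requests served by L1; l2_hit is the
    fraction of L1 misses served by L2. A miss pays for every tier it
    passed through on the way down.
    """
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    return (
        l1_hit * l1_us
        + l1_miss * l2_hit * (l1_us + l2_us)
        + l1_miss * l2_miss * (l1_us + l2_us + db_us)
    )


# Assumed numbers: see the paragraph above this block.
redis_only = expected_latency_us(0.0, 0.95, l1_us=0.0, l2_us=1000.0, db_us=10_000.0)
with_l1 = expected_latency_us(0.9, 0.95, l1_us=1.0, l2_us=1000.0, db_us=10_000.0)

rps = 10_000
redis_calls_without_l1 = rps                  # every request goes to Redis
redis_calls_with_l1 = round(rps * (1 - 0.9))  # only L1 misses go to Redis

print(round(redis_only), round(with_l1))            # 1500 151
print(redis_calls_without_l1, redis_calls_with_l1)  # 10000 1000
```

Under these assumptions the L1 tier cuts both the Redis call rate and the mean latency by roughly a factor of ten, which is why short L1 TTLs are usually an acceptable price.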

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // L1: Local in-memory cache (fastest)
    // L2: Distributed Redis cache (shared)
    // L3: Database (source of truth)

    const NodeCache = require('node-cache');

    class MultiLevelCache {
      constructor(redis) {
        // L1: Local cache (per instance, fast but not shared)
        this.l1 = new NodeCache({
          stdTTL: 60,  // 1 minute
          checkperiod: 30,
          maxKeys: 1000
        });
        
        // L2: Redis (shared across instances)
        this.l2 = redis;
        
        this.l2TTL = 3600;  // 1 hour
      }

      async get(key, fetchFn) {
        // Try L1 (local memory)
        let data = this.l1.get(key);
        if (data !== undefined) {
          console.log(`L1 HIT: ${key}`);
          return data;
        }

        // Try L2 (Redis)
        const cached = await this.l2.get(key);
        if (cached) {
          console.log(`L2 HIT: ${key}`);
          data = JSON.parse(cached);
          
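          // Populate L1. node-cache throws once maxKeys is reached,
          // so a full L1 must not fail the read.
          try {
            this.l1.set(key, data);
          } catch (err) {
            console.warn(`L1 full, skipping local cache for ${key}`);
          }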
          return data;
        }

        // L3: Fetch from source
        console.log(`CACHE MISS: ${key}`);
        data = await fetchFn();
        
        if (data !== null && data !== undefined) {
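          // Populate both caches. node-cache throws once maxKeys is reached,
          // so guard the L1 write and still populate L2.
          try {
            this.l1.set(key, data);
          } catch (err) {
            console.warn(`L1 full, skipping local cache for ${key}`);
          }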
          await this.l2.setEx(key, this.l2TTL, JSON.stringify(data));
        }

        return data;
      }

      async invalidate(key) {
        // Invalidate both levels
        this.l1.del(key);
        await this.l2.del(key);
      }

      async invalidateLocal(key) {
        // Invalidate only local cache (for pub/sub invalidation)
        this.l1.del(key);
      }
    }

    // Cross-instance invalidation with Redis Pub/Sub
    class CacheCoordinator {
      constructor(multiLevelCache, redis) {
        this.cache = multiLevelCache;
        this.redis = redis;
        this.channel = 'cache-invalidation';
        this.instanceId = process.env.INSTANCE_ID || require('crypto').randomUUID();
      }

      async setup() {
        // Subscribe to invalidation messages
        const subscriber = this.redis.duplicate();
        await subscriber.connect();
        
        await subscriber.subscribe(this.channel, (message) => {
          const { key, sourceInstance } = JSON.parse(message);
          
          // Don't process our own messages
          if (sourceInstance === this.instanceId) return;
          
          // Invalidate local cache
          this.cache.invalidateLocal(key);
          console.log(`Received invalidation for ${key} from ${sourceInstance}`);
        });
      }

      async invalidate(key) {
        // Invalidate locally and in Redis
        await this.cache.invalidate(key);
        
        // Notify other instances
        await this.redis.client.publish(this.channel, JSON.stringify({
          key,
          sourceInstance: this.instanceId
        }));
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Multi-level cache: L1 (cachetools.TTLCache) -> L2 (Redis) -> L3 (DB)
    import asyncio
    import json
    import logging
    import os
    import uuid
    from typing import Any, Awaitable, Callable, Optional
    from cachetools import TTLCache
    from redis.asyncio import Redis

    logger = logging.getLogger(__name__)


    class MultiLevelCache:
        def __init__(
            self,
            redis: Redis,
            l1_maxsize: int = 1000,
            l1_ttl: int = 60,
            l2_ttl: int = 3600,
        ):
            # L1: bounded-size TTLCache - safe for memory-limited containers
            self.l1: TTLCache = TTLCache(maxsize=l1_maxsize, ttl=l1_ttl)
            self.l2 = redis
            self.l2_ttl = l2_ttl

        async def get(
            self,
            key: str,
            fetch_fn: Callable[[], Awaitable[Any]],
        ) -> Any:
            # L1 lookup (in-process, microseconds)
            if key in self.l1:
                logger.debug("L1 HIT: %s", key)
                return self.l1[key]

            # L2 lookup (Redis, sub-millisecond)
            cached = await self.l2.get(key)
            if cached:
                logger.debug("L2 HIT: %s", key)
                data = json.loads(cached)
                self.l1[key] = data  # Warm L1
                return data

            # L3: source of truth
            logger.debug("CACHE MISS: %s", key)
            data = await fetch_fn()

            if data is not None:
                self.l1[key] = data
                await self.l2.setex(key, self.l2_ttl, json.dumps(data, default=str))
            return data

        async def invalidate(self, key: str) -> None:
            self.l1.pop(key, None)
            await self.l2.delete(key)

        def invalidate_local(self, key: str) -> None:
            """For pub/sub-driven L1 invalidation only."""
            self.l1.pop(key, None)


    class CacheCoordinator:
        """Cross-instance L1 coherence via Redis pub/sub.
        Each instance publishes invalidations and listens for others."""

        def __init__(self, cache: MultiLevelCache, redis: Redis):
            self.cache = cache
            self.redis = redis
            self.channel = "cache-invalidation"
            self.instance_id = os.getenv("INSTANCE_ID", str(uuid.uuid4()))
            self._task: Optional[asyncio.Task] = None

        async def start(self) -> None:
            # Dedicated pubsub connection - don't mix with request traffic
            pubsub = self.redis.pubsub()
            await pubsub.subscribe(self.channel)
            self._task = asyncio.create_task(self._listen(pubsub))
            logger.info("CacheCoordinator started (instance=%s)", self.instance_id)

        async def _listen(self, pubsub) -> None:
            async for message in pubsub.listen():
                if message["type"] != "message":
                    continue
                try:
                    payload = json.loads(message["data"])
                except (ValueError, TypeError):
                    continue

                # Ignore self-published messages
                if payload.get("source_instance") == self.instance_id:
                    continue

                self.cache.invalidate_local(payload["key"])
                logger.debug(
                    "L1 invalidated for %s from %s",
                    payload["key"],
                    payload.get("source_instance"),
                )

        async def invalidate(self, key: str) -> None:
            # Local + Redis + broadcast to other instances
            await self.cache.invalidate(key)
            await self.redis.publish(
                self.channel,
                json.dumps({"key": key, "source_instance": self.instance_id}),
            )

        async def stop(self) -> None:
            if self._task:
                self._task.cancel()
    ```
  </Tab>
</Tabs>

### Request Coalescing (Thundering Herd Prevention)

The thundering herd problem is one of those issues that rarely shows up in testing but can take down production. Here is the scenario: a popular cache key expires. In the milliseconds before the cache is repopulated, 500 concurrent requests all see a cache miss and all hit the database simultaneously. Your database connection pool is exhausted, queries start timing out, and now you have a cascading failure. Request coalescing ensures that only one of those 500 requests actually fetches from the database -- the other 499 wait for that one result.

The insight: if 500 requests are asking the database for the same thing at the same time, you only need to ask the database once. Everyone else can share the answer. This is sometimes called "singleflight" (borrowed from Go's `golang.org/x/sync/singleflight`). In Python, the natural primitive is an `asyncio.Future` or `asyncio.Event` that later arrivals await on while the first arrival does the actual work.

Why this matters for microservices specifically: thundering herds cascade. If your product service gets hammered, its database gets hammered, which makes its queries slower, which makes the initial requests time out, which causes retries, which amplifies the problem. Request coalescing breaks this cycle at the source -- the database sees one query instead of 500, and the 499 other callers experience a slight delay rather than a failure.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Prevent multiple identical requests from hitting the database

    class CoalescingCache {
      constructor(cache) {
        this.cache = cache;
        this.pending = new Map();  // In-flight requests
      }

      async get(key, fetchFn, ttl = 3600) {
        // Check cache first
        const cached = await this.cache.get(key);
        if (cached) {
          return JSON.parse(cached);
        }

        // Check if request is already in flight
        if (this.pending.has(key)) {
          console.log(`Coalescing request for ${key}`);
          return this.pending.get(key);
        }

        // Create new request
        const promise = this.fetchAndCache(key, fetchFn, ttl);
        this.pending.set(key, promise);

        try {
          const result = await promise;
          return result;
        } finally {
          this.pending.delete(key);
        }
      }

      async fetchAndCache(key, fetchFn, ttl) {
        const data = await fetchFn();
        
        if (data !== null && data !== undefined) {
          await this.cache.setEx(key, ttl, JSON.stringify(data));
        }
        
        return data;
      }
    }

    // Usage - even with 100 concurrent requests, only 1 DB query
    const cache = new CoalescingCache(redis);

    // Simulate 100 concurrent requests for same product
    const requests = Array(100).fill().map(() =>
      cache.get('product:123', () => productRepository.findById('123'))
    );

    const results = await Promise.all(requests);
    // Only 1 database query was made!
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Request coalescing with asyncio - the "singleflight" pattern
    import asyncio
    import json
    from typing import Any, Awaitable, Callable, Optional


    class CoalescingCache:
        """Collapses concurrent cache misses for the same key into a single
        fetch. Prevents thundering herd against the DB."""

        def __init__(self, cache):
            self.cache = cache
            # key -> Future that resolves to the fetched value
            self._pending: dict[str, asyncio.Future] = {}
            # Guard the _pending map itself from racy access
            self._lock = asyncio.Lock()

        async def get(
            self,
            key: str,
            fetch_fn: Callable[[], Awaitable[Any]],
            ttl: int = 3600,
        ) -> Any:
            # Fast path: check cache first
            cached = await self.cache.get(key)
            if cached:
                return json.loads(cached)

            async with self._lock:
                # Re-check cache inside lock (double-checked locking)
                cached = await self.cache.get(key)
                if cached:
                    return json.loads(cached)

                # Is a fetch already in flight for this key?
                if key in self._pending:
                    future = self._pending[key]
                else:
                    # Kick off the single fetch; other callers await this future
                    future = asyncio.ensure_future(
                        self._fetch_and_cache(key, fetch_fn, ttl)
                    )
                    self._pending[key] = future
                    # Schedule cleanup when the fetch completes
                    future.add_done_callback(
                        lambda _f, k=key: self._pending.pop(k, None)
                    )

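            # Await outside the lock - don't hold the lock during the fetch.
            # shield() keeps one cancelled caller (e.g. a client disconnect)
            # from cancelling the shared fetch that other callers are awaiting.
            return await asyncio.shield(future)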

        async def _fetch_and_cache(
            self,
            key: str,
            fetch_fn: Callable[[], Awaitable[Any]],
            ttl: int,
        ) -> Any:
            data = await fetch_fn()
            if data is not None:
                await self.cache.setex(key, ttl, json.dumps(data, default=str))
            return data


    # Usage: 100 concurrent requests -> exactly 1 DB query
    cache = CoalescingCache(redis_client)

    results = await asyncio.gather(
        *[
            cache.get("product:123", lambda: product_repo.find_by_id("123"))
            for _ in range(100)
        ]
    )
    ```
  </Tab>
</Tabs>
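To see the effect without Redis or a database, here is a minimal in-memory sketch of the same singleflight idea. The `SingleFlight` class, the `slow_db_lookup` stand-in, and the sample payload are hypothetical; the sketch omits the cache layer entirely and only counts how many times the underlying fetch runs.

```python
import asyncio


class SingleFlight:
    """Collapse concurrent calls for the same key into one in-flight task."""

    def __init__(self):
        self._inflight = {}  # key -> asyncio.Task

    async def do(self, key, fetch_fn):
        task = self._inflight.get(key)
        if task is None:
            # No await between the lookup and the insert, so on a single
            # event loop no other coroutine can run in between them.
            task = asyncio.create_task(fetch_fn())
            self._inflight[key] = task
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # shield() stops one cancelled caller from cancelling the shared task.
        return await asyncio.shield(task)


async def demo():
    db_queries = 0

    async def slow_db_lookup():
        nonlocal db_queries
        db_queries += 1
        await asyncio.sleep(0.05)  # stand-in for a slow query
        return {"id": "123", "name": "Laptop"}

    flight = SingleFlight()
    results = await asyncio.gather(
        *[flight.do("product:123", slow_db_lookup) for _ in range(500)]
    )
    return db_queries, results


db_queries, results = asyncio.run(demo())
print(db_queries, len(results))  # 1 500
```

This only coalesces requests inside one process. With ten instances the database can still see up to ten concurrent queries for the same key, which is usually an acceptable bound.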

### Cache Warming

Cold caches are a silent killer of newly deployed services. Your system tests passed with a warm cache, so latency looks like 5ms. You deploy. For the first few minutes, the cache is empty, every request is a miss, and latency is 100ms. If that first cold-start traffic is heavy, the database gets hammered and cascades into broader failure. Cache warming preempts this: before the service accepts production traffic, pre-load the cache with data you *know* will be requested.

What to warm: the Pareto principle applies. A small fraction of cache keys (often on the order of 10-20%) typically serves the large majority of traffic. Identify those hot keys (featured products, popular categories, top-viewed items from yesterday's logs) and pre-populate them. You do not need to warm everything -- the long tail can be populated on demand. Some teams go further with "continuous warming": a background job periodically refreshes the hot keys so they never expire.

Cache warming is also part of your deployment strategy. If you use Kubernetes rolling updates, new pods should warm their local cache during their readiness check, before traffic routes to them. This way, the load balancer only sends traffic to a pod after its cache is useful. Otherwise, the first pod to receive traffic after a deploy will have a 100% miss rate and degrade user experience until it warms up naturally.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Pre-populate cache before traffic hits

    class CacheWarmer {
      constructor(cache, repository) {
        this.cache = cache;
        this.repository = repository;
      }

      async warmAll() {
        console.log('Starting cache warming...');
        
        await Promise.all([
          this.warmFeaturedProducts(),
          this.warmCategories(),
          this.warmPopularProducts(),
          this.warmConfigurations()
        ]);
        
        console.log('Cache warming complete');
      }

      async warmFeaturedProducts() {
        const products = await this.repository.getFeaturedProducts();
        
        for (const product of products) {
          await this.cache.set(`product:${product.id}`, product, 3600);
        }
        
        await this.cache.set('products:featured', products, 3600);
        console.log(`Warmed ${products.length} featured products`);
      }

      async warmCategories() {
        const categories = await this.repository.getAllCategories();
        
        for (const category of categories) {
          await this.cache.set(`category:${category.id}`, category, 7200);
          
          // Also warm products per category
          const products = await this.repository.getProductsByCategory(category.id);
          await this.cache.set(`products:category:${category.id}`, products, 3600);
        }
        
        await this.cache.set('categories:all', categories, 7200);
        console.log(`Warmed ${categories.length} categories`);
      }

      async warmPopularProducts() {
        // Top 100 most viewed products
        const products = await this.repository.getPopularProducts(100);
        
        for (const product of products) {
          await this.cache.set(`product:${product.id}`, product, 3600);
        }
        
        console.log(`Warmed ${products.length} popular products`);
      }

      async warmConfigurations() {
        const configs = await this.repository.getAllConfigurations();
        
        for (const [key, value] of Object.entries(configs)) {
          await this.cache.set(`config:${key}`, value, 86400);  // 24 hours
        }
        
        console.log(`Warmed ${Object.keys(configs).length} configurations`);
      }
    }

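    // Create the warmer at module scope so both callbacks below can see it
    const warmer = new CacheWarmer(cache, repository);

    // Run on startup
    app.on('ready', async () => {
      await warmer.warmAll();
    });

    // Also run periodically
    setInterval(async () => {
      try {
        await warmer.warmPopularProducts();
      } catch (error) {
        console.error('Periodic warm failed:', error.message);
      }
    }, 30 * 60 * 1000);  // Every 30 minutes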
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Cache warming - run at FastAPI startup and on a background schedule
    import asyncio
    import json
    import logging
    from contextlib import asynccontextmanager
    from fastapi import FastAPI

    logger = logging.getLogger(__name__)


    class CacheWarmer:
        def __init__(self, cache, repository):
            self.cache = cache
            self.repository = repository

        async def warm_all(self) -> None:
            logger.info("Starting cache warming...")
            # Run warmers concurrently - they touch different keys
            await asyncio.gather(
                self.warm_featured_products(),
                self.warm_categories(),
                self.warm_popular_products(),
                self.warm_configurations(),
            )
            logger.info("Cache warming complete")

        async def warm_featured_products(self) -> None:
            products = await self.repository.get_featured_products()
            async with self.cache.redis.pipeline(transaction=False) as pipe:
                for product in products:
                    pipe.setex(f"product:{product.id}", 3600, product.model_dump_json())
                pipe.setex(
                    "products:featured",
                    3600,
                    json.dumps([p.model_dump() for p in products], default=str),
                )
                await pipe.execute()
            logger.info("Warmed %d featured products", len(products))

        async def warm_categories(self) -> None:
            categories = await self.repository.get_all_categories()
            for category in categories:
                await self.cache.set(f"category:{category.id}", category.model_dump(), ttl=7200)
                products = await self.repository.get_products_by_category(category.id)
                await self.cache.set(
                    f"products:category:{category.id}",
                    [p.model_dump() for p in products],
                    ttl=3600,
                )
            await self.cache.set(
                "categories:all",
                [c.model_dump() for c in categories],
                ttl=7200,
            )
            logger.info("Warmed %d categories", len(categories))

        async def warm_popular_products(self) -> None:
            # Warm only the hottest keys (top 100 by views); the long tail fills on demand
            products = await self.repository.get_popular_products(limit=100)
            for product in products:
                await self.cache.set(f"product:{product.id}", product.model_dump(), ttl=3600)
            logger.info("Warmed %d popular products", len(products))

        async def warm_configurations(self) -> None:
            configs = await self.repository.get_all_configurations()
            for key, value in configs.items():
                await self.cache.set(f"config:{key}", value, ttl=86400)  # 24h
            logger.info("Warmed %d configurations", len(configs))


    # FastAPI lifespan hook - warm on startup, schedule periodic re-warm
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        warmer = CacheWarmer(app.state.cache, app.state.repository)
        # Warm before readiness probe passes
        await warmer.warm_all()

        # Periodic refresh of hot keys
        async def periodic_warm():
            while True:
                await asyncio.sleep(30 * 60)  # Every 30 minutes
                try:
                    await warmer.warm_popular_products()
                except Exception:
                    logger.exception("Periodic warm failed")

        task = asyncio.create_task(periodic_warm())
        yield
        task.cancel()


    app = FastAPI(lifespan=lifespan)
    ```
  </Tab>
</Tabs>
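The readiness-probe point above can be sketched as a small gate that reports "not ready" until warming has finished. `ReadinessGate`, `probe_status`, and the timeout value are hypothetical names and numbers, and opening the gate after a failed or slow warm-up is an assumed policy (serve cold rather than block the rollout); a real service would wire `probe_status()` into whatever endpoint its readiness probe calls.

```python
import asyncio


class ReadinessGate:
    """Reports 503 until startup warming has finished (or given up)."""

    def __init__(self):
        self.ready = False

    async def warm_then_open(self, warm_fn, timeout_s):
        try:
            await asyncio.wait_for(warm_fn(), timeout=timeout_s)
        except Exception:
            # Assumed policy: a failed or slow warm-up should not block the
            # rollout forever, so fall through and serve with a cold cache.
            pass
        finally:
            self.ready = True

    def probe_status(self):
        """HTTP status a readiness endpoint would return."""
        return 200 if self.ready else 503


async def demo():
    gate = ReadinessGate()
    statuses = [gate.probe_status()]       # before warming

    async def fake_warm():
        await asyncio.sleep(0.01)          # stand-in for CacheWarmer.warm_all()

    await gate.warm_then_open(fake_warm, timeout_s=1.0)
    statuses.append(gate.probe_status())   # after warming
    return statuses


statuses = asyncio.run(demo())
print(statuses)  # [503, 200]
```

The timeout matters: without it, a slow repository call during warm-up would keep the pod out of rotation indefinitely and stall the rolling update.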

***

## Cache Failure Handling

<Warning>
  **Caveats and Common Pitfalls with Cache Failure**

  Cache failure is a frequent trigger for large-scale outages, even at companies with mature microservice architectures. The pitfalls:

  * **"Fail open" becomes "fail catastrophic."** The intuitive default is "if cache is down, go to the database." In a low-traffic system this works. At scale, it can convert a cache outage into a database outage in under a minute. Your system assumed a 95 percent cache hit rate and a database sized for 5 percent of total traffic; now the database receives 100 percent, a 20x jump in load, and is overwhelmed.
  * **Cache warming on recovery triggers a second stampede.** Redis comes back after a 10-minute outage. Suddenly every service rushes to repopulate, sending a flood of reads to the database and a flood of writes to Redis simultaneously. The recovering cache cluster can actually go down *again* under the write pressure of being re-warmed. This is a real, repeated failure mode.
  * **Local fallback caches grow unbounded and cause OOM kills.** Under Redis outage, services fall back to in-process caches. Without size limits, these caches grow until the JVM/Node process hits its memory limit and gets OOM-killed by Kubernetes. The "fallback" makes the outage worse by killing pods.
  * **Circuit breaker tuning is invisible until it matters.** Teams set the circuit breaker to "open after 50 percent error rate over 10 seconds" and never revisit. During a partial Redis degradation (say, 15 percent error rate), the breaker never opens, and the service accumulates connection timeouts until the connection pool exhausts.
</Warning>

<Tip>
  **Solutions and Patterns for Cache Failure Resilience**

  * **Design for the cache-off scenario up front.** Load test your system with the cache completely disabled. Whatever latency and throughput you measure is your true floor. Capacity-plan the database to handle at least a significant fraction of peak traffic without the cache, not just the cache-miss rate. This is not free, but it is what keeps cache failure from becoming database failure.
  * **Multi-tier caching with bounded local fallback.** L1 is a size-bounded in-process LRU cache (typically tens of megabytes, tens of thousands of entries). L2 is Redis. Even during a full Redis outage, L1 absorbs the hot keys and dramatically flattens the spike to the database. Use libraries with explicit size limits: Caffeine (Java), lru-cache (Node), cachetools (Python).
  * **Staggered cache warming on recovery.** Instead of warming the entire cache at once, warm by key-popularity percentile with jitter. Top 1 percent of keys warmed first over 30 seconds with random delay, next 10 percent over the following minute, and so on. This prevents the recovery thundering herd.
  * **Circuit breaker on the cache client, not just the database.** When Redis latency climbs past a threshold, stop calling Redis entirely for some fraction of requests and go straight to L1 or the database. Half-open the breaker periodically to test recovery. This prevents slow-but-alive Redis from degrading the service worse than fully-dead Redis would.
  * **Treat cache as a capacity multiplier, not a dependency.** Every feature that "requires" the cache to function is a latent outage. If a feature cannot work without cache hits, either make the underlying database path faster (indexes, read replicas) or accept that the feature will degrade visibly when the cache fails. Hiding the degradation is worse than degrading visibly.
</Tip>
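The staggered-warming idea from the list above can be sketched as a schedule generator: each key gets a start delay drawn from the time window of its popularity tier. The function name, the third tier (remaining keys over 300 seconds), and the sample key names are assumptions for illustration; the first two tiers follow the percentages and windows mentioned above.

```python
import random


def staggered_schedule(keys_by_popularity, tiers, rng=None):
    """Assign each key a start delay (seconds) based on its popularity tier.

    keys_by_popularity: keys sorted hottest-first.
    tiers: list of (fraction_of_keys, window_seconds); each tier's window
           starts where the previous one ended. The last tier takes
           whatever keys remain.
    Returns a list of (delay_seconds, key) sorted by delay.
    """
    rng = rng if rng is not None else random.Random()
    total = len(keys_by_popularity)
    schedule = []
    start = 0            # index of the first key in the current tier
    window_start = 0.0   # when the current tier's window opens
    for i, (fraction, window) in enumerate(tiers):
        is_last = i == len(tiers) - 1
        end = total if is_last else min(total, start + max(1, round(total * fraction)))
        for key in keys_by_popularity[start:end]:
            # Uniform jitter spreads the tier's keys across its window
            schedule.append((window_start + rng.uniform(0, window), key))
        start = end
        window_start += window
    return sorted(schedule)


keys = [f"product:{i}" for i in range(1000)]  # hottest first (hypothetical ranking)
plan = staggered_schedule(
    keys,
    tiers=[(0.01, 30), (0.10, 60), (0.89, 300)],
    rng=random.Random(42),
)
delays = {key: delay for delay, key in plan}
print(len(plan))  # 1000
```

A warmer would then walk `plan` in order, sleeping until each entry's delay before fetching and caching that key, so the database sees a ramp instead of a spike.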

When Redis dies, what happens to your service? The honest answer for most teams is "we find out." Designing for cache failure up front costs a bit more complexity, but it means your service survives cache outages instead of cascading into a full outage. The goal is graceful degradation: when Redis is healthy, serve from cache (fast). When Redis is unhealthy, serve from the database or a local fallback (slower, but still working). Never let a cache outage become an application outage.

The resilient cache below uses three techniques working together. First, health monitoring: a background ping tells us when Redis comes and goes. Second, circuit-breaker-style behavior: when Redis is known to be down, do not even try to call it -- skip straight to the fallback. Third, a local memory fallback: keep a small in-process cache of recently-seen values, so that during a Redis outage, popular keys continue to be served from local memory while the long tail falls through to the database.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // Graceful degradation when cache fails

    class ResilientCache {
      constructor(redis, options = {}) {
        this.redis = redis;
        this.fallbackTTL = options.fallbackTTL || 60;  // 1 minute local fallback
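        // NOTE: a plain Map is unbounded. In production, prefer a size-bounded
        // LRU (for example the lru-cache package) so an outage cannot cause OOM.
        this.localFallback = new Map();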
        this.isHealthy = true;
        
        // Monitor health
        this.startHealthCheck();
      }

      startHealthCheck() {
        setInterval(async () => {
          try {
            await this.redis.client.ping();
            if (!this.isHealthy) {
              console.log('Redis connection restored');
              this.isHealthy = true;
              this.localFallback.clear();  // Clear stale fallback data
            }
          } catch (error) {
            if (this.isHealthy) {
              console.error('Redis connection lost:', error.message);
              this.isHealthy = false;
            }
          }
        }, 5000);  // Check every 5 seconds
      }

      async get(key) {
        if (!this.isHealthy) {
          // Use local fallback
          const fallback = this.localFallback.get(key);
          if (fallback && fallback.expires > Date.now()) {
            return fallback.value;
          }
          return null;
        }

        try {
          const data = await this.redis.get(key);
          
          if (data) {
            // Update local fallback
            this.localFallback.set(key, {
              value: data,
              expires: Date.now() + (this.fallbackTTL * 1000)
            });
          }
          
          return data;
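        } catch (error) {
          console.error(`Cache get error for ${key}:`, error.message);
          this.isHealthy = false;
          
          // Try local fallback, honoring its expiry like the unhealthy path above
          const fallback = this.localFallback.get(key);
          if (fallback && fallback.expires > Date.now()) {
            return fallback.value;
          }
          return null;
        }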
      }

      async set(key, value, ttl = 3600) {
        if (!this.isHealthy) {
          // Store in local fallback only
          this.localFallback.set(key, {
            value,
            expires: Date.now() + (Math.min(ttl, this.fallbackTTL) * 1000)
          });
          return false;
        }

        try {
          await this.redis.setEx(key, ttl, value);
          
          // Also update local fallback
          this.localFallback.set(key, {
            value,
            expires: Date.now() + (this.fallbackTTL * 1000)
          });
          
          return true;
        } catch (error) {
          console.error(`Cache set error for ${key}:`, error.message);
          this.isHealthy = false;
          
          // Store in local fallback
          this.localFallback.set(key, {
            value,
            expires: Date.now() + (this.fallbackTTL * 1000)
          });
          
          return false;
        }
      }

      getHealth() {
        return {
          healthy: this.isHealthy,
          fallbackSize: this.localFallback.size
        };
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # Resilient cache with health check + local fallback (cachetools.TTLCache)
    import asyncio
    import logging
    from typing import Any, Optional
    from cachetools import TTLCache
    from redis.asyncio import Redis
    from redis.exceptions import RedisError

    logger = logging.getLogger(__name__)


    class ResilientCache:
        """Never raises to callers. During Redis outage, serves from bounded
        in-memory TTLCache; returns None if the key is not in the fallback."""

        def __init__(
            self,
            redis: Redis,
            fallback_size: int = 10_000,
            fallback_ttl: int = 60,
        ):
            self.redis = redis
            self.fallback: TTLCache = TTLCache(
                maxsize=fallback_size,
                ttl=fallback_ttl,
            )
            self.is_healthy = True
            self._health_task: Optional[asyncio.Task] = None

        async def start(self) -> None:
            self._health_task = asyncio.create_task(self._health_loop())

        async def _health_loop(self) -> None:
            while True:
                await asyncio.sleep(5)
                try:
                    await self.redis.ping()
                    if not self.is_healthy:
                        logger.info("Redis connection restored")
                        self.is_healthy = True
                        self.fallback.clear()  # Drop potentially-stale fallback
                except RedisError as exc:
                    if self.is_healthy:
                        logger.error("Redis connection lost: %s", exc)
                        self.is_healthy = False

        async def get(self, key: str) -> Optional[Any]:
            if not self.is_healthy:
                # Short-circuit: serve from local fallback if present
                return self.fallback.get(key)

            try:
                data = await self.redis.get(key)
                if data is not None:
                    self.fallback[key] = data  # Warm the safety net
                return data
            except RedisError as exc:
                logger.error("Cache get error for %s: %s", key, exc)
                self.is_healthy = False
                return self.fallback.get(key)

        async def set(self, key: str, value: Any, ttl: int = 3600) -> bool:
            if not self.is_healthy:
                # Accept the write into fallback only - will be lost on restart
                self.fallback[key] = value
                return False

            try:
                await self.redis.setex(key, ttl, value)
                self.fallback[key] = value
                return True
            except RedisError as exc:
                logger.error("Cache set error for %s: %s", key, exc)
                self.is_healthy = False
                self.fallback[key] = value
                return False

        def get_health(self) -> dict:
            return {
                "healthy": self.is_healthy,
                "fallback_size": len(self.fallback),
                "fallback_maxsize": self.fallback.maxsize,
            }

        async def stop(self) -> None:
            if self._health_task:
                self._health_task.cancel()


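    # FastAPI decorator style with fastapi-cache2 for simpler use cases.
    # Assumes `redis_client` (a redis.asyncio client) and `product_repository`
    # are defined elsewhere in the application.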
    from fastapi import FastAPI
    from fastapi_cache import FastAPICache
    from fastapi_cache.backends.redis import RedisBackend
    from fastapi_cache.decorator import cache

    app = FastAPI()


    @app.on_event("startup")
    async def on_startup():
        FastAPICache.init(RedisBackend(redis_client), prefix="api-cache")


    @app.get("/products/{product_id}")
    @cache(expire=3600)  # Clean decorator-based caching
    async def get_product(product_id: str):
        return await product_repository.find_by_id(product_id)
    ```
  </Tab>
</Tabs>
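
The `isHealthy` flag above is a coarse, all-or-nothing breaker driven by a ping. A latency-aware breaker with a half-open state might look like the following minimal sketch. `CacheCircuitBreaker`, its thresholds, and the injected `clock` are all hypothetical names and values chosen for illustration, and the sketch skips every call while open rather than only a fraction of them.

```python theme={null}
import time
from typing import Callable, Optional


class CacheCircuitBreaker:
    """Opens after repeated failed or slow cache calls; probes again later."""

    def __init__(
        self,
        failure_threshold: int = 5,
        slow_call_seconds: float = 0.05,
        open_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.slow_call_seconds = slow_call_seconds
        self.open_seconds = open_seconds
        self._clock = clock
        self._consecutive_bad = 0
        self._opened_at: Optional[float] = None  # None means closed
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        """True if the caller should try the cache, False to go to the fallback."""
        if self._opened_at is None:
            return True
        cooled_down = self._clock() - self._opened_at >= self.open_seconds
        if cooled_down and not self._probe_in_flight:
            self._probe_in_flight = True  # half-open: admit a single probe
            return True
        return False

    def record(self, duration_seconds: float, ok: bool) -> None:
        """Report the outcome of a cache call that allow_request() admitted."""
        bad = (not ok) or duration_seconds > self.slow_call_seconds
        if self._probe_in_flight:
            self._probe_in_flight = False
            if bad:
                self._opened_at = self._clock()  # probe failed: restart cool-down
            else:
                self._opened_at = None  # probe succeeded: close the breaker
                self._consecutive_bad = 0
            return
        if bad:
            self._consecutive_bad += 1
            if self._consecutive_bad >= self.failure_threshold:
                self._opened_at = self._clock()
        else:
            self._consecutive_bad = 0
```

Counting slow calls as failures is what lets the breaker trip on a slow-but-alive Redis, which a ping-based health check alone would report as healthy.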

***

## Interview Questions

<AccordionGroup>
  <Accordion title="Q1: What is cache-aside pattern and when to use it?">
    **Answer:**

    **Cache-Aside (Lazy Loading):**

    1. Application checks cache first
    2. On miss, fetch from database
    3. Store in cache for future requests

    ```javascript theme={null}
    let data = await cache.get(key);  // `let`, because it is reassigned on a miss
    if (!data) {
      data = await db.fetch(key);
      await cache.set(key, data);
    }
    return data;
    ```

    **Pros:**

    * Only cache what's needed
    * Cache failures don't break reads
    * Simple to implement

    **Cons:**

    * First request is slow (cache miss)
    * Potential stale data
    * Three round trips on miss

    **Use for:** Most read-heavy scenarios
  </Accordion>

  <Accordion title="Q2: How do you handle cache invalidation in microservices?">
    **Answer:**

    **Strategies:**

    1. **TTL-based**: Set expiration, accept staleness
    2. **Event-based**: Publish events on writes, subscribers invalidate
    3. **Tag-based**: Group related entries, invalidate by tag
    4. **Version-based**: Include version in key, change on update

    **Best Practice for Microservices:**

    * Publish events when data changes
    * Each service invalidates its own cache
    * Use short TTLs as safety net

    ```javascript theme={null}
    // On update
    await db.update(product);
    await eventBus.publish('product.updated', { productId });

    // Consumers invalidate their caches
    eventBus.on('product.updated', (e) => cache.del(`product:${e.productId}`));
    ```
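
    Strategy 4 (version-based keys) can be sketched in Python with plain dicts standing in for the cache and the version store. `VersionedCache`, the `product:` key prefix, and the method names are hypothetical and only illustrate the idea that bumping a version makes the old entry unreachable.

    ```python theme={null}
    from typing import Any, Callable, Dict


    class VersionedCache:
        """Version-in-key invalidation: readers never see entries of old versions."""

        def __init__(self) -> None:
            self._store: Dict[str, Any] = {}     # stands in for Redis
            self._versions: Dict[str, int] = {}  # entity id -> current version

        def _key(self, entity_id: str) -> str:
            version = self._versions.get(entity_id, 0)
            return f"product:{entity_id}:v{version}"

        def get_or_load(self, entity_id: str, loader: Callable[[str], Any]) -> Any:
            key = self._key(entity_id)
            if key not in self._store:
                self._store[key] = loader(entity_id)
            return self._store[key]

        def invalidate(self, entity_id: str) -> None:
            # Bumping the version orphans the old entry instead of deleting it.
            # In a real cache the orphan should carry a TTL so it ages out.
            self._versions[entity_id] = self._versions.get(entity_id, 0) + 1
    ```

    The trade-off is an extra lookup for the current version on every read, in exchange for never having to find and delete every stale key.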
  </Accordion>

  <Accordion title="Q3: What is the thundering herd problem and how to prevent it?">
    **Answer:**

    **Problem:** When a hot cache entry expires, many concurrent requests miss at once and hit the database simultaneously.

    **Solutions:**

    1. **Request Coalescing**: Single request, others wait

    ```javascript theme={null}
    if (pending.has(key)) return pending.get(key);
    const promise = fetchFromDB(key).finally(() => pending.delete(key));  // clear when settled
    pending.set(key, promise);
    return promise;
    ```

    2. **Probabilistic Early Expiration**: Randomly refresh before TTL

    3. **Background Refresh**: Refresh in background before expiry

    4. **Locking**: Only one process refreshes cache

    ```javascript theme={null}
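    if (await lock.acquire(key)) {
      try {
        const data = await fetchFromDB(key);
        await cache.set(key, data);
      } finally {
        lock.release(key);  // always release, even if the fetch throws
      }
    }
    // Callers that did not get the lock wait briefly and re-read the cache.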
    ```
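
    Solution 2 (probabilistic early expiration) can be sketched in Python using the decision rule from the XFetch paper: refresh when `now - delta * beta * ln(u) >= expiry`, where `delta` is how long the last recomputation took and `u` is uniform in (0, 1]. The function name and the injectable `now` and `rng` parameters are assumptions made here for testability.

    ```python theme={null}
    import math
    import random
    import time
    from typing import Callable, Optional


    def should_refresh_early(
        expiry: float,
        delta: float,
        beta: float = 1.0,
        now: Optional[float] = None,
        rng: Callable[[], float] = random.random,
    ) -> bool:
        """Decide whether this request should recompute the value before it expires.

        expiry: absolute expiry time of the cached entry (seconds).
        delta:  how long the last recomputation took (seconds).
        beta:   values above 1.0 favor earlier refreshes, below 1.0 later ones.
        """
        current = time.time() if now is None else now
        u = 1.0 - rng()  # random.random() is in [0, 1), so u is in (0, 1]
        # log(u) <= 0, so subtracting it pushes "current" forward by a random gap
        # that grows with delta and beta.
        return current - delta * beta * math.log(u) >= expiry
    ```

    A request for which this returns `True` recomputes and rewrites the entry while other requests keep reading the still-valid cached value.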
  </Accordion>

  <Accordion title="Q4: When would you use write-through vs write-behind caching?">
    **Answer:**

    **Write-Through:**

    * Write to cache AND database in same operation
    * Cache and database stay in sync after every write (barring partial failures)
    * Slower writes (two writes)

    **Use for:** Data that must be consistent, user profiles

    **Write-Behind (Write-Back):**

    * Write to cache first, async to database
    * Fast writes
    * Risk of data loss if cache fails

    **Use for:** Analytics, metrics, high-frequency updates

    **Decision factors:**

    * Consistency requirements
    * Write frequency
    * Tolerance for data loss
  </Accordion>

  <Accordion title="Q5: How do you handle cache in a multi-instance deployment?">
    **Answer:**

    **Challenges:**

    * Each instance has local cache
    * Invalidation must reach all instances
    * Data inconsistency between instances

    **Solutions:**

    1. **Distributed Cache Only** (Redis)
       * No local cache
       * Always consistent
       * Higher latency

    2. **Multi-Level Cache + Pub/Sub**
       * L1: Local (fast)
       * L2: Redis (shared)
       * Invalidate via pub/sub

    3. **Short TTL for Local Cache**
       * Local cache: 60s
       * Redis cache: 1 hour
       * Accept brief inconsistency

    ```javascript theme={null}
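    // Pub/sub invalidation (node-redis needs a dedicated connection for
    // subscriptions, e.g. one created with client.duplicate())
    redis.subscribe('cache-invalidate', (key) => {
      localCache.del(key);
    });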
    ```
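
    Solution 3 (short TTL for the local tier) can be sketched in Python as follows. A dict-like `shared` object stands in for Redis, and `TwoTierCache`, `local_ttl`, and the injected `clock` are hypothetical names used only for illustration.

    ```python theme={null}
    import time
    from typing import Any, Callable, Dict, Optional, Tuple


    class TwoTierCache:
        """L1: per-instance dict with a short TTL. L2: shared store (Redis stand-in)."""

        def __init__(
            self,
            shared: Dict[str, Any],
            local_ttl: float = 60.0,
            clock: Callable[[], float] = time.monotonic,
        ) -> None:
            self._shared = shared
            self._local_ttl = local_ttl
            self._clock = clock
            self._local: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

        def get(self, key: str) -> Optional[Any]:
            entry = self._local.get(key)
            if entry is not None and entry[0] > self._clock():
                return entry[1]  # L1 hit, possibly up to local_ttl seconds stale
            value = self._shared.get(key)
            if value is not None:
                self._local[key] = (self._clock() + self._local_ttl, value)
            else:
                self._local.pop(key, None)
            return value

        def invalidate_local(self, key: str) -> None:
            """Call this from a pub/sub handler to drop the L1 copy immediately."""
            self._local.pop(key, None)
    ```

    Without pub/sub, the worst-case inconsistency between instances is bounded by `local_ttl`; with pub/sub, `invalidate_local` shrinks it to message propagation time.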
  </Accordion>
</AccordionGroup>

***

## Chapter Summary

<Info>
  **Key Takeaways:**

  * Choose the right caching pattern (cache-aside, write-through, write-behind)
  * Implement robust invalidation strategies
  * Handle cache failures gracefully with fallbacks
  * Use multi-level caching for performance
  * Prevent thundering herd with request coalescing
  * Warm caches proactively for predictable latency
</Info>

**Next Chapter:** Load Balancing - Distribution strategies for microservices.

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="'Your Redis cache cluster goes down during peak traffic. Within 30 seconds, your PostgreSQL database is overwhelmed and the entire system crashes. What happened and how do you prevent it?'">
    **Strong Answer:**

    This is the thundering herd problem (also called cache stampede). When the cache is unavailable, every request that would have been a cache hit becomes a database query. If your cache hit rate is 95%, you suddenly have 20x the database load (going from 5% to 100% of requests hitting the database). The database cannot handle 20x its normal load and collapses.
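
    The 20x figure follows directly from the hit rate. As a small Python sketch of that arithmetic (the function name and the optional `fallback_hit_rate` parameter, which models a local in-process tier, are assumptions for illustration):

    ```python theme={null}
    def db_load_multiplier(hit_rate: float, fallback_hit_rate: float = 0.0) -> float:
        """How many times the normal database load arrives when the shared cache is gone.

        Normal database load is the miss fraction (1 - hit_rate). During the
        outage every request reaches the database except those absorbed by a
        local fallback cache.
        """
        if not 0.0 <= hit_rate < 1.0:
            raise ValueError("hit_rate must be in [0, 1)")
        if not 0.0 <= fallback_hit_rate <= 1.0:
            raise ValueError("fallback_hit_rate must be in [0, 1]")
        return (1.0 - fallback_hit_rate) / (1.0 - hit_rate)
    ```

    For a 95% hit rate this gives roughly 20, and for 99% roughly 100, which is why very effective caches make cache outages more dangerous, not less.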

    Prevention requires multiple layers. First, circuit breaker on the database: if database connection pool utilization exceeds 80%, start rejecting new queries with a 503 rather than queuing them until the database runs out of connections and crashes. This is the "shed load gracefully" principle -- some failed requests are better than a total system crash.

    Second, fallback to stale cache data. When Redis goes down, do not immediately go to the database. Use a local in-memory cache (LRU cache in the process) as a second tier. The local cache has a shorter TTL and smaller capacity, but it absorbs the most frequent queries. Even a 50% local hit rate halves the database load during a Redis outage.

    Third, request coalescing (also called "single flight"). When multiple concurrent requests ask for the same cache key and all get a miss, only one request should query the database. The others wait for the first request to complete and share its result. This prevents 100 concurrent requests for the same popular product from all hitting the database simultaneously.
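
    A minimal in-process single-flight sketch in Python with `asyncio` is shown below. `SingleFlight` and its `do` method are hypothetical names (borrowed from the Go naming), and the sketch coalesces only within one process, not across service instances.

    ```python theme={null}
    import asyncio
    from typing import Any, Awaitable, Callable, Dict


    class SingleFlight:
        """Ensures only one loader call per key is in flight at a time."""

        def __init__(self) -> None:
            self._in_flight: Dict[str, "asyncio.Future[Any]"] = {}

        async def do(self, key: str, loader: Callable[[], Awaitable[Any]]) -> Any:
            existing = self._in_flight.get(key)
            if existing is not None:
                # Follower: share the leader's result. shield() keeps a cancelled
                # follower from cancelling the shared load for everyone else.
                return await asyncio.shield(existing)
            task = asyncio.ensure_future(loader())
            self._in_flight[key] = task
            # Forget the key once the load settles so later misses load again.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
            return await asyncio.shield(task)
    ```

    With this in front of the database call, 100 concurrent misses for the same product result in a single query, and a loader exception is propagated to every waiter.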

    Fourth, cache warming on recovery. When Redis comes back, do not wait for natural traffic to repopulate the cache. Run a warming script that pre-loads the top 1,000 most frequently accessed keys from the database. This prevents a second thundering herd when traffic shifts back from the local cache to the recovering Redis.

    **Follow-up: "How do you design the local in-memory cache to avoid memory issues in a containerized environment?"**

    The local cache must be size-bounded (not TTL-bounded) to prevent OOM kills. I use an LRU (Least Recently Used) cache with a fixed maximum number of entries. For a Node.js service with 512MB memory limit, I allocate at most 50MB for the local cache (roughly 10,000-50,000 entries depending on value size). The cache uses a `Map` with a doubly-linked list for LRU eviction. Libraries like `lru-cache` in Node.js handle this efficiently. Monitoring is critical: I expose the cache hit rate and size as Prometheus metrics so I can tell whether the local cache is actually helping during Redis outages.
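
    The same size-bounded idea can be sketched in Python with `collections.OrderedDict`, which already provides the ordered-map behavior needed for LRU eviction. `BoundedLRU` and its `hits` and `misses` counters are illustrative names; in practice the counters would be exported as metrics.

    ```python theme={null}
    from collections import OrderedDict
    from typing import Any, Hashable, Optional


    class BoundedLRU:
        """LRU cache bounded by entry count, with hit/miss counters for metrics."""

        def __init__(self, max_entries: int) -> None:
            if max_entries <= 0:
                raise ValueError("max_entries must be positive")
            self._max = max_entries
            self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
            self.hits = 0
            self.misses = 0

        def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
            if key not in self._data:
                self.misses += 1
                return default
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

        def put(self, key: Hashable, value: Any) -> None:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._max:
                self._data.popitem(last=False)  # evict least recently used

        def __len__(self) -> int:
            return len(self._data)
    ```

    Bounding by entry count only caps memory if value sizes are roughly uniform; for widely varying values, a byte-size budget is the safer limit.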
  </Accordion>

  <Accordion title="'Service A caches user profiles with a 1-hour TTL. Service B can update user profiles. How do you ensure Service A does not serve stale data for up to an hour after a profile update?'">
    **Strong Answer:**

    This is the distributed cache invalidation problem, and it has four solutions with different trade-off profiles.

    Option one: event-driven invalidation. When Service B updates a profile, it publishes a UserProfileUpdated event to Kafka. Service A subscribes to this event and invalidates (or updates) the corresponding cache entry. Staleness window: seconds (the time for the event to propagate through Kafka). This is the approach I recommend for most cases because it is decoupled and reliable.

    Option two: write-through from Service B to Service A's cache. When Service B updates a profile, it also writes the updated data directly to Service A's Redis cache (or sends a cache invalidation message). Staleness window: milliseconds. But it creates coupling -- Service B needs to know about Service A's caching strategy.

    Option three: shorter TTL. Reduce the cache TTL from 1 hour to 30 seconds. Staleness window: 30 seconds maximum. Simple, but it increases database load by up to 120x for actively read profiles (one refresh every 30 seconds instead of one per hour, and 3600 / 30 = 120). Only viable if the dataset is small or the database can handle the load.

    Option four: cache-aside with version checking. Service A stores the cache entry with a version number. On every read, Service A makes a lightweight "version check" call to Service B (just returns the current version, not the full profile). If the version matches, use the cache. If not, fetch fresh data. This is a hybrid between caching and fresh reads.

    For user profiles specifically, I use option one (event-driven invalidation) because profile updates are infrequent (most users update their profile rarely), so the event volume is manageable, and the seconds of propagation delay is acceptable. For pricing data (which changes frequently and staleness has financial impact), I would use option four with version checking.

    **Follow-up: "What if the Kafka event is delayed or lost? Service A still serves stale data."**

    I add a maximum TTL as a safety net. Even with event-driven invalidation, the cache entry has a TTL of 1 hour. If the invalidation event is lost, the worst case is 1 hour of stale data, not infinite staleness. For critical data where even 1 hour is too long, I add a "cache verification" background job that periodically samples cached entries and verifies them against the source. Any stale entries are invalidated. This belt-and-suspenders approach ensures that event loss does not lead to permanently stale cache entries.
  </Accordion>

  <Accordion title="'Compare cache-aside, write-through, and write-behind patterns. When would you use each in a microservices architecture?'">
    **Strong Answer:**

    Cache-aside (lazy loading): the application checks the cache first, fetches from the database on miss, and populates the cache. The application manages the cache explicitly. I use this for most read-heavy workloads because it is the safest: if the cache goes down, the application still works (it just hits the database). The downside is the first request for any data is always slow (cache miss), and there is a consistency window between a write and the next cache read.

    Write-through: the application writes to both the cache and the database simultaneously (or the cache writes to the database on behalf of the application). Every write immediately updates the cache, so reads are always fresh. I use this for data that is read immediately after writing -- for example, after a user updates their profile, the next page load should show the updated profile. The downside: write latency increases because you are writing to two stores, and if the cache is down, writes fail (unless you add fallback logic).

    Write-behind (write-back): the application writes to the cache, and the cache asynchronously writes to the database later. This gives the lowest write latency because the application only waits for the cache write (microseconds) not the database write (milliseconds). I use this for high-throughput writes where some data loss is acceptable -- for example, page view counters, analytics events, or session updates. The risk: if the cache crashes before flushing to the database, those writes are lost. This is not acceptable for financial data.

    In a microservices architecture, I use cache-aside for 80% of caching needs (product catalogs, user profiles, configuration). Write-through for the session service (session updates must be immediately visible across all service instances). Write-behind for the analytics service (high write volume, eventual consistency is fine, losing a few page view events is acceptable).

    The pattern I explicitly avoid: query caching inside the database itself (like the MySQL query cache). It adds complexity inside the database, is difficult to invalidate correctly, and was removed in MySQL 8.0 for good reason. Application-level caching with Redis gives you more control over invalidation, TTL, and monitoring.

    **Follow-up: "How do you handle cache warming for a new service deployment that starts with an empty cache?"**

    Cold start is a real problem -- the first few minutes after deployment have 0% cache hit rate, which means 100% database load. I use two strategies. First, pre-warming: during the deployment's readiness check, the service loads the top N most-accessed keys from a pre-computed list (based on yesterday's access patterns). Second, gradual traffic shift: using Kubernetes rolling updates with `maxSurge: 1`, new pods receive traffic gradually while old pods (with warm caches) still handle most requests. By the time the old pods are terminated, the new pods have warmed up through natural traffic.
  </Accordion>
</AccordionGroup>

***

## Interview Questions with Structured Answers

<AccordionGroup>
  <Accordion title="Your cache hit rate drops from 95 percent to 40 percent immediately after a deploy, and latency doubles. Walk me through your debugging process and the first three hypotheses you investigate.">
    **Strong Answer Framework**

    1. **Stabilize first, debug second.** If latency is user-visible and the database is near capacity, roll back the deploy immediately. Debugging a live incident on a cold cache is a bad trade. Only investigate on the running-but-rolled-back state or in a reproduction environment.
    2. **Pull the three obvious signals in parallel.** Cache keyspace size (did total keys drop or spike?), cache hit rate broken down by key pattern (is it one pattern or everything?), and database query mix (which queries are hitting the database that were previously cached?). These three answer "what changed" in under five minutes.
    3. **Hypothesis one: cache key format changed.** The most common cause. The new deploy includes a refactor that changed how keys are constructed -- maybe added a user locale, a feature flag, or a version prefix. Every old cache entry is now orphaned; every new request is a miss. Diagnosis: grep recent PRs for cache key construction, diff the key format with `redis-cli --scan --pattern` samples, verify that new keys look different from old keys.
    4. **Hypothesis two: TTL was accidentally shortened.** Someone changed `300` to `30` in a config or env var. Every entry expires in 30 seconds instead of 5 minutes, so the hit rate craters. Diagnosis: check recent config changes, log cache-set calls with their TTL, compare current TTL distribution to pre-deploy baseline.
    5. **Hypothesis three: serialization changed.** The deploy moved from JSON to Protobuf, or changed a schema version prefix. New writers encode the new format; old readers cannot deserialize it and treat it as a miss. Diagnosis: look for deserialize errors in logs, check for a recent library upgrade that changed default serialization, verify round-trip for a single known key.
    6. **Hypothesis four (often missed): cache sharding / routing changed.** A deploy updated the Redis client library and the consistent-hashing algorithm subtly changed. Keys that used to live on shard A now live on shard B; shard B has empty cache; every lookup misses. Diagnosis: check the Redis cluster slot distribution before and after, and verify the hash output for a sample key (CRC16 mod 16384 for Redis Cluster slots, or whatever hash function the client-side sharding uses).
    7. **Finally, long-tail hypotheses.** Eviction policy change (`allkeys-lru` to `volatile-lru`), memory limit lowered so entries get evicted aggressively, new traffic pattern (a new feature generates high-cardinality keys that blow out the cache), or a bug where the service is writing but never reading (cache key used on read path differs from write path).
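
    For the routing hypothesis, it helps to compute where a key should live independently of the client library. The following Python sketch computes a Redis Cluster hash slot (CRC16/XMODEM of the key, or of its hash tag, modulo 16384) using only the standard library. It applies to Redis Cluster specifically; client-side sharding schemes use their own hash functions, and `redis_cluster_slot` is a name chosen here.

    ```python theme={null}
    import binascii


    def redis_cluster_slot(key: str) -> int:
        """Return the Redis Cluster hash slot (0..16383) for a key.

        If the key contains a non-empty {hash tag}, only the tag is hashed,
        which is how related keys are pinned to the same slot.
        """
        data = key.encode("utf-8")
        start = data.find(b"{")
        if start != -1:
            end = data.find(b"}", start + 1)
            if end != -1 and end != start + 1:  # ignore an empty "{}"
                data = data[start + 1:end]
        # binascii.crc_hqx with initial value 0 is CRC-16/XMODEM (polynomial 0x1021).
        return binascii.crc_hqx(data, 0) % 16384
    ```

    Comparing this value with the output of `CLUSTER KEYSLOT <key>` and with the node the client actually contacted shows quickly whether routing, rather than key format, is what changed.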

    **Real-World Example**

    Facebook had a widely-discussed incident around 2019 where a memcached client library upgrade silently changed key hashing. Hit rates across multiple services dropped to roughly their steady-state miss rates, and databases were overwhelmed within minutes. The fix was to revert the library. The postmortem lesson was that cache key transformations must be treated as breaking changes with explicit migration paths, not invisible implementation details. Similar stories have been published by LinkedIn (kafka-related cache invalidation), Twitter (timeline cache re-sharding), and Shopify (Rails cache key fingerprints changing across framework upgrades).

    **Senior Follow-up Questions**

    <Note>
      **Q: "What if the hit rate drop is not uniform, but only affects a subset of users?"**

      This is a strong signal for a cache-key or serialization bug scoped to a specific user cohort. Check whether the affected users share some attribute: locale, feature flag state, account tier, geographic region. If EU users are affected but US users are not, the bug is probably in the EU-specific code path or in a flag that routes EU traffic differently. Split the hit rate metric by the suspected dimension and confirm before fixing. Partial-impact bugs often indicate a feature flag rollout that was assumed to be safe but changed cache semantics.
    </Note>

    <Note>
      **Q: "Your database is dying while you debug. What do you do right now?"**

      Shed load. Temporarily disable the feature that is generating the uncached traffic if possible, or serve stale-but-acceptable data from a secondary cache tier, or rate-limit the affected endpoint at the API gateway. The principle: reduce the load the database is seeing before you fix the root cause. A failed endpoint serving 429s is better than an overloaded database that cascades into platform-wide failure. Once the database is stable, debug the cache issue offline. Many engineers freeze here because shedding load "feels wrong"; doing nothing is worse.
    </Note>

    <Note>
      **Q: "You roll back the deploy and hit rate recovers. How do you prevent this specific class of bug in future deploys?"**

      Two controls. First, make cache-key and cache-serialization changes require explicit migration code that can run against existing entries, and gate deploys that would orphan more than some threshold of cached keys behind a manual approval. Second, add a canary for cache hit rate as a first-class deploy guardrail: if the new version running at 1 percent traffic shows a more-than-a-few-percent drop in hit rate versus the baseline, the deploy auto-rolls back before reaching 100 percent. Treating cache hit rate as a release-blocking SLO catches these bugs during canary, not during full rollout.
    </Note>

    **Common Wrong Answers**

    * **"I would increase the cache size to compensate."** This misdiagnoses the symptom as the cause. If keys are being generated differently or serialized incorrectly, a bigger cache does not help -- you will fill it with the wrong things. Scaling up before diagnosing wastes money and delays the real fix.
    * **"I would lower the TTL to refresh everything sooner."** Lowering the TTL makes the problem worse. The hit rate is low because entries are missing or malformed; refreshing more often means more database load, not less. The correct instinct under a hit-rate crash is to stabilize and diagnose, not tune.

    **Further Reading**

    * "Scaling Memcache at Facebook" (Nishtala et al., NSDI 2013) -- foundational paper on operating cache at scale and the failure modes to expect.
    * "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web" (Karger et al., 1997) -- the algorithmic underpinnings of why a client-library upgrade can silently break your cache topology.
    * Netflix Tech Blog posts on EVCache -- practical coverage of cross-region cache consistency and the debugging stories behind their current architecture.
  </Accordion>

  <Accordion title="You have a product page cached for 5 minutes. Two users A and B view it concurrently at the moment the cache expires. What can go wrong, and how do you prevent every failure mode?">
    **Strong Answer Framework**

    1. **Name the failure modes up front.** At the moment of TTL expiration, multiple concurrent misses create three classical bugs: thundering herd on the source database, cache stampede compounding under load, and (if writes are involved) a race where one user's write overwrites another user's fresh cache entry.
    2. **Address thundering herd with request coalescing.** Implement single-flight. When N concurrent requests miss for the same key, only one queries the database. Others wait on a shared promise or channel. N becomes 1 regardless of concurrency level. Libraries: Go `singleflight.Group`, Node `promise-memoize`, Python `aiocache` (its `cached_stampede` decorator).
    3. **Layer on probabilistic early expiration.** Before natural TTL, start refreshing probabilistically (the XFetch algorithm). A small random fraction of requests triggers an async refresh while still serving the current cached value. By the time natural expiration hits, the value has usually been refreshed already.
    4. **Address the concurrent-write race with cache versioning.** If the page content can be updated between A's miss and B's miss, A's slow computation could overwrite B's fresh value. Attach a version (ETag, monotonic counter) to each cache write. `SET key value EX ttl NX` (Redis) only writes when the key is absent, so a slow writer cannot clobber an entry that has already been refilled; to reject writes older than the cached version, use a compare-and-swap operation such as a Lua script that compares versions before setting.
    5. **Address cache poisoning.** If the source returns a 500 error or malformed response, do not cache it. Validate the response shape before caching. For transient errors, consider a short negative-cache TTL (seconds, not minutes) so a broken backend does not poison reads for the whole TTL window.
    6. **Plan the failure mode.** What happens if the source is slow (10 seconds instead of 50 ms)? The single-flight request blocks; all other waiters block with it. Add a timeout on the source call and a fallback: if the source is slow, serve the stale cache entry past its TTL (`stale-while-revalidate`) while logging the backend degradation.
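
    The timeout-and-fallback half of step 6 can be sketched in Python with `asyncio.wait_for`. This is only a sketch: `fetch_with_stale_fallback` is a hypothetical helper, the stale value is passed in by the caller, and a full `stale-while-revalidate` implementation would also refresh the entry in the background.

    ```python theme={null}
    import asyncio
    from typing import Any, Awaitable, Callable, Optional, Tuple


    async def fetch_with_stale_fallback(
        loader: Callable[[], Awaitable[Any]],
        stale_value: Optional[Any],
        timeout_seconds: float,
    ) -> Tuple[Any, bool]:
        """Return (value, served_stale).

        Tries the source with a deadline. If the source is too slow and an
        expired-but-usable value exists, serve that instead of blocking.
        """
        try:
            fresh = await asyncio.wait_for(loader(), timeout=timeout_seconds)
            return fresh, False
        except asyncio.TimeoutError:
            if stale_value is None:
                raise  # nothing to fall back to: surface the timeout
            return stale_value, True
    ```

    The `served_stale` flag gives the caller a hook for logging or metrics, so backend degradation stays visible even though users keep getting responses.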

    **Real-World Example**

    Fastly and Cloudflare both publicly document their `stale-while-revalidate` and origin-shielding behavior for exactly this case. When a popular URL expires at the edge, a shield PoP can coalesce origin requests from the edge PoPs so the origin typically sees a single request instead of one per edge location. This is the same pattern applied at the CDN layer. Redis itself, at the application cache layer, does not do this for you -- you have to implement coalescing in the client.

    **Senior Follow-up Questions**

    <Note>
      **Q: "What if the cached value is personalized per user? Does coalescing still work?"**

      Coalescing only works when the cache key is shared. For per-user data, each user has a distinct key, so coalescing across users does not apply. Instead, the relevant protection is against a single user's rapid retries -- if a user refreshes the same page 20 times in a second while their cache is missing, coalesce their own concurrent requests. For truly per-user cold starts, the volume is bounded by user count, and the thundering herd concern is lower. Thundering herd is mostly a problem for *shared* hot keys.
    </Note>

    <Note>
      **Q: "Under probabilistic early expiration, you are doing more backend work than strictly necessary. How do you tune the parameter?"**

      The XFetch algorithm has a parameter (often called beta) that controls how aggressively to pre-refresh. Too low: you refresh exactly at TTL and get stampedes. Too high: you refresh every request and defeat the point of caching. In practice, you want the expected number of pre-refreshes per key per TTL window to be around 1-2. Tune by observing: if you see stampede symptoms (latency spike at exact TTL expiration), increase beta. If you see the backend doing unnecessarily high work during steady state, decrease it. A beta around 1.0 is a typical starting point.
    </Note>

    <Note>
      **Q: "What if the data being cached changes mid-computation -- say, a product's price updates while you are mid-fetch?"**

      This is the cache invalidation race. The defensive pattern: include the source's version or last-modified timestamp in the cached value. On write, use compare-and-swap: only write if the version you just fetched is newer than the version currently cached. If the current cached version is newer (because someone else updated in between), discard your value and re-read. Redis Lua scripts are a good fit because they give you atomicity across the get-check-set sequence. Without this, you can end up with the cache permanently one version behind reality.
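
      The compare-and-swap rule can be sketched in Python with an in-memory store, where a lock plays the role that a Lua script's atomicity plays in Redis. `VersionedStore` and `set_if_newer` are hypothetical names.

      ```python theme={null}
      import threading
      from typing import Any, Dict, Optional, Tuple


      class VersionedStore:
          """Accepts a write only if its version is newer than the cached one."""

          def __init__(self) -> None:
              self._data: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)
              self._lock = threading.Lock()

          def set_if_newer(self, key: str, version: int, value: Any) -> bool:
              # The lock makes get-check-set atomic, like a Lua script in Redis.
              with self._lock:
                  current = self._data.get(key)
                  if current is not None and current[0] >= version:
                      return False  # a newer (or equal) version is already cached
                  self._data[key] = (version, value)
                  return True

          def get(self, key: str) -> Optional[Any]:
              with self._lock:
                  entry = self._data.get(key)
                  return None if entry is None else entry[1]
      ```

      A slow fetch that finishes late simply has its write rejected, so the newer price written in the meantime survives.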
    </Note>

    **Common Wrong Answers**

    * **"I would use a shorter TTL so the window of inconsistency is smaller."** Shortening the TTL increases the frequency of cache misses and therefore increases the thundering herd problem. The issue is not the TTL length; it is the concurrent behavior at expiration. Coalescing and early expiration address the real bug.
    * **"I would lock the key in Redis for the duration of the backend fetch."** Distributed locking has its own failure modes (lock holder dies, TTL on lock expires mid-fetch, network partitions). In-process single-flight (coalescing within one service instance) is usually enough, and it does not introduce a new distributed correctness problem. Redlock is rarely justified for cache-fill races.

    **Further Reading**

    * "Optimal Probabilistic Cache Stampede Prevention" (Vattani, Chierichetti, Lowenstein) -- the foundational XFetch paper.
    * Fastly's documentation on `stale-while-revalidate` and origin shielding -- a worked example of stampede prevention at the CDN layer.
    * "Caches: the price of admission" on martinfowler.com -- broader treatment of caching trade-offs with stampede prevention as one of several concerns.
  </Accordion>

  <Accordion title="Your recommendation service caches the top-10 items per user. One day you notice users are occasionally seeing recommendations that were computed for a different user. How did this happen, and how do you fix it?">
    **Strong Answer Framework**

    1. **Name this clearly: personalization bleed / cache key collision.** Two logically different values ended up under the same cache key, and one user's request returned another user's data. This is simultaneously a correctness bug, a privacy issue (if recommendations reveal anything about the other user), and potentially a regulated incident under GDPR or CCPA.
    2. **Escalate appropriately.** If any PII or sensitive behavioral data is being leaked across users, trigger the security incident process. Even "innocent" recommendation data can encode protected categories (religion, health) via what the user has viewed. Do not debug silently; inform the security and privacy teams.
    3. **Hypothesize the root cause.** Most likely: the cache key did not include the user\_id, or included only a hashed session-scoped value that collided. Possible variations: the key used a tenant id and forgot the user, or used a user id that happens to reset on logout and be reused.
    4. **Stop the immediate leak.** Flip a feature flag to bypass the cache and serve all recommendations directly from the source. Latency will increase; correctness is restored. Do this *before* continuing diagnosis. Never continue to serve wrong data while you debug.
    5. **Audit the cache contents.** Scan a sample of entries. For each, compare "who this was computed for" (from the original request context, traceable via logs) versus "who this is keyed under" (the cache key). Any mismatch is an instance of the bug.
    6. **Fix the cache key permanently.** The cache key must include every dimension of variation. For personalized data, that always includes the user identifier (ideally a stable, server-issued user id, not a session id). Consider adding a namespace prefix per service and a schema version, so accidental collisions across services or across key-schema changes are impossible.
    7. **Add guardrails.** Validate cache responses against the request context at read time: the cached payload includes a "for\_user\_id" field, and the read path rejects it if it does not match the requesting user. This is belt-and-suspenders defense against future cache-key regressions.
    8. **Close out with a postmortem.** Publish what happened, which users were affected, what data was exposed, what the fix is, and what detection (canary, monitoring, contract test) will prevent a recurrence.
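
    Steps 4 and 7 can be sketched together as a read path with a cache-bypass switch and a read-time ownership check. This is a minimal illustration under stated assumptions: the `CachedRecs` type, the `get_recommendations` function, the `cache_enabled` flag, the `on_mismatch` hook, and the key format are all hypothetical, and a plain `dict` stands in for the real cache.

    ```python
    from dataclasses import dataclass
    from typing import Callable, Dict, Optional, Tuple


    @dataclass(frozen=True)
    class CachedRecs:
        for_user_id: str        # who the payload was computed for
        items: Tuple[str, ...]


    def get_recommendations(
        user_id: str,
        cache: Dict[str, CachedRecs],
        compute: Callable[[str], Tuple[str, ...]],
        cache_enabled: bool = True,
        on_mismatch: Optional[Callable[[str], None]] = None,
    ) -> Tuple[str, ...]:
        key = f"recs:v1:user:{user_id}"  # namespace, schema version, user id
        if cache_enabled:
            hit = cache.get(key)
            if hit is not None:
                if hit.for_user_id == user_id:
                    return hit.items
                # Guardrail: payload belongs to someone else. Drop it and report it.
                cache.pop(key, None)
                if on_mismatch is not None:
                    on_mismatch(key)
        items = compute(user_id)  # source of truth
        if cache_enabled:
            cache[key] = CachedRecs(for_user_id=user_id, items=items)
        return items
    ```

    Setting `cache_enabled=False` is the feature-flag bypass from step 4: requests go straight to the source and the cache is neither read nor written. The `on_mismatch` hook is where a counter or alert would go, so a key regression shows up in monitoring instead of in a user report.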

    **Real-World Example**

    On December 25, 2015, Steam had a cache misconfiguration at the CDN layer that caused users to see other users' account pages during a holiday traffic spike. The root cause was a caching layer treating logged-in content as cacheable and not including the auth context in the cache key. The incident was publicly acknowledged, and the fix involved both the immediate cache flush and a deeper change to mark authenticated endpoints as no-cache at the CDN layer. The same class of bug has hit multiple CDN-backed e-commerce sites over the years -- it is not a rare failure mode.

    **Senior Follow-up Questions**

    <Note>
      **Q: "How do you prevent this class of bug at design time, before you need to debug it?"**

      Make cache keys explicit first-class objects in your codebase. Instead of `cache.get("recs_" + userId)`, have a `RecsCacheKey(userId, locale, featureFlagVersion)` class with a single construction path. Every field is required; the compiler or type system enforces it. Then write a contract test that asserts "different users produce different keys" and run it on every build. Combine this with cache-response validation (the payload carries the `for_user_id`, and the read path rejects it if mismatched). Together, these turn personalization bleed from a bug you hope to catch in review into one the build and the read path both block.
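
      One way this could look, as a sketch: a frozen dataclass with required fields and a single `render()` method as the only way to produce the key string. The field names mirror the `RecsCacheKey(userId, locale, featureFlagVersion)` example above; the namespace, schema version, separator, and rendered format are assumptions.

      ```python
      from dataclasses import dataclass

      NAMESPACE = "recs"   # per-service prefix (assumed name)
      SCHEMA_VERSION = 1   # bump when the key layout changes
      SEPARATOR = ":"


      @dataclass(frozen=True)
      class RecsCacheKey:
          # No defaults: omitting any dimension raises TypeError at construction.
          user_id: str
          locale: str
          feature_flag_version: int

          def __post_init__(self) -> None:
              for name in ("user_id", "locale"):
                  value = getattr(self, name)
                  if not value:
                      raise ValueError(f"{name} must be non-empty")
                  if SEPARATOR in value:
                      # A separator inside a field could make two distinct keys render alike.
                      raise ValueError(f"{name} must not contain {SEPARATOR!r}")

          def render(self) -> str:
              return SEPARATOR.join([
                  NAMESPACE,
                  f"v{SCHEMA_VERSION}",
                  f"user={self.user_id}",
                  f"locale={self.locale}",
                  f"flags={self.feature_flag_version}",
              ])


      def test_different_users_produce_different_keys() -> None:
          # Contract test: run on every build.
          a = RecsCacheKey("u1", "en-US", 3).render()
          b = RecsCacheKey("u2", "en-US", 3).render()
          assert a != b
      ```

      Python only enforces the required fields at construction time; in a statically typed language the same shape would be checked at compile time.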
    </Note>

    <Note>
      **Q: "What if the leak only happens under high load, not in normal traffic? Where do you look?"**

      High-load-only bugs in caching usually come from shared mutable state in the request-handling path. A common culprit: an object reused across requests, such as a pre-allocated buffer or a `ThreadLocal` in a thread-pool executor that is not cleared, so pooled threads carry state from one user's request to the next. Another: the cache client library returning a shared mutable reference rather than a defensive copy, and one request mutating the object while another reads it. Under low load, requests rarely overlap in time, so these races stay hidden; under high load, they overlap constantly. Diagnose with load testing under thread contention and memory analysis tools.
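
      A small sketch of the shared-reference variant, using a hypothetical in-process `LocalCache`. It is deterministic rather than threaded: it shows the mutation that, under load, interleaves between two users' requests, and the defensive-copy fix.

      ```python
      import copy
      from typing import Any, Dict, Optional


      class LocalCache:
          def __init__(self, defensive_copy: bool) -> None:
              self._store: Dict[str, Any] = {}
              self._defensive_copy = defensive_copy

          def put(self, key: str, value: Any) -> None:
              # Copy on the way in so later mutations by the caller do not reach the cache.
              self._store[key] = copy.deepcopy(value) if self._defensive_copy else value

          def get(self, key: str) -> Optional[Any]:
              value = self._store.get(key)
              if value is None or not self._defensive_copy:
                  return value  # unsafe mode: every caller shares one mutable object
              return copy.deepcopy(value)  # safe mode: each caller gets a private copy


      def personalize(recs: Dict[str, Any], user_id: str) -> Dict[str, Any]:
          # A handler that mutates whatever the cache handed it.
          recs["items"].append(f"because-{user_id}-viewed")
          return recs
      ```

      With `defensive_copy=False`, one request calling `personalize(cache.get("popular"), "alice")` changes what the next `cache.get("popular")` returns for a different user. With `defensive_copy=True`, the cached entry stays untouched. Immutable payloads (tuples, frozen dataclasses) avoid the copy cost while giving the same protection.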
    </Note>

    <Note>
      **Q: "You fix the cache key but your logs show the bug was live for three weeks. How do you handle the retroactive privacy implications?"**

      Scope the impact. Query logs to enumerate every user whose requests returned mismatched data (if request-level logging exists) or at least the total number of cache lookups that served cross-user data. Engage legal and privacy early. Depending on jurisdiction, you may be legally required to notify regulators and affected users: GDPR Article 33 requires notifying the supervisory authority within 72 hours of becoming aware of a breach where feasible, and Article 34 covers notifying affected individuals when the risk to them is high. Do not wait to be told. A clean, proactive disclosure is better than being discovered later. Internally, use this as the forcing function to implement the architectural fixes (explicit key objects, response validation) that prevent the bug class entirely.
    </Note>

    **Common Wrong Answers**

    * **"I would just add the user id to the key and deploy the fix."** This fixes the immediate bug but does not address the systemic weakness. There are many other "dimensions of variation" (locale, tenant, A/B test arm) that could produce the same class of bug tomorrow. A senior answer recognizes this as a pattern, not an instance, and proposes structural fixes.
    * **"I would flush the cache and it will be fine."** Flushing removes the currently-poisoned entries but the key-construction bug is still in the code; new requests will repopulate the cache with the same collision. Flushing is hygiene, not a fix.

    **Further Reading**

    * "You Cannot Trust the Browser" patterns and CDN cache-control best practices (documented by Fastly, Cloudflare, Akamai) -- personalization-bleed is most dangerous at CDN layers where cache keys are URL-based.
    * "GDPR Article 33: Notification of a personal data breach to the supervisory authority" -- the specific legal timeline that scopes how fast you must respond.
    * OWASP Web Security Testing Guide, section on cache poisoning and cache deception -- adversarial framings of this same class of bug.
  </Accordion>
</AccordionGroup>
