Caching Strategies

Caching is critical for microservices performance, but it introduces one of computer science’s two hardest problems (the other being naming things, and off-by-one errors). In a monolith, cache invalidation is already tricky. In microservices, Service A caches data that Service B can modify — and now you are dealing with distributed cache invalidation, which is exponentially harder. This chapter covers not just how to cache, but how to cache safely, how to fail gracefully when your cache goes down, and how to prevent the thundering herd problem that has caused more outages than most people realize.
Learning Objectives:
  • Implement various caching patterns
  • Design cache invalidation strategies
  • Use Redis for distributed caching
  • Handle cache failures gracefully
  • Optimize cache hit rates

Why Caching Matters

Before we dive into patterns, let us be precise about why caching is worth the complexity. Every cache you add to a system is a promise you are making: “I can serve this data faster than the source of truth, and I will accept some risk of staleness in exchange.” That trade-off is nearly always worth it for read-heavy workloads because databases are slow compared to in-memory stores (milliseconds vs microseconds) and expensive to scale horizontally. But if you do not understand the trade-off, you will cache things that should not be cached (live pricing, security tokens, personalized feeds) and end up with bugs that are nearly impossible to reproduce. In microservices specifically, caching takes on a new dimension: it reduces coupling. If Service A caches a user’s profile from Service B for five minutes, Service A can continue functioning even if Service B is briefly unavailable. The cache acts as a buffer of eventual consistency that lets services tolerate each other’s failures. This is why caching is not just an optimization — it is a resilience pattern.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHING IMPACT                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  WITHOUT CACHING:                                                            │
│  ───────────────────                                                        │
│                                                                              │
│  Client ──▶ API Gateway ──▶ Order Service ──▶ Database                      │
│                                    │              │                          │
│                                    │              │ 50ms                     │
│                                    ├──▶ User Service ──▶ Database            │
│                                    │                        │ 30ms           │
│                                    └──▶ Product Service ──▶ Database         │
│                                                                │ 40ms        │
│                                                                              │
│  Total Latency: ~120ms (serial) or ~50ms (parallel)                         │
│  Database Load: Every request hits DB                                       │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════════│
│                                                                              │
│  WITH CACHING:                                                               │
│  ─────────────                                                              │
│                                                                              │
│  Client ──▶ API Gateway ──▶ Order Service                                   │
│                                    │                                         │
│                                    ├──▶ Redis Cache ──▶ (hit) Return         │
│                                    │        │ 1ms                            │
│                                    │        └──▶ (miss) ──▶ Database         │
│                                    │                           │ 50ms        │
│                                                                              │
│  Cache Hit Latency: ~5ms                                                    │
│  Cache Miss Latency: ~55ms                                                  │
│  Database Load: Only cache misses hit DB                                    │
│                                                                              │
│  With 90% cache hit rate: Average latency = 0.9×5 + 0.1×55 = 10ms           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
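The averages in the diagram follow from a simple weighted sum. The sketch below reproduces that arithmetic and also shows how much database traffic grows if the cache disappears. The function names are illustrative, and the 5 ms and 55 ms figures are the diagram's example numbers, not measurements.

```javascript
// Expected latency for a given hit rate: hits and misses weighted by how
// often each occurs.
function expectedLatencyMs(hitRate, hitMs, missMs) {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// How much database read traffic grows if the cache is lost entirely.
// At a 90% hit rate only 10% of reads reach the database, so losing the
// cache multiplies database reads by 1 / (1 - hitRate).
function cacheOffLoadMultiplier(hitRate) {
  return 1 / (1 - hitRate);
}

for (const hitRate of [0.5, 0.9, 0.99]) {
  const latency = expectedLatencyMs(hitRate, 5, 55).toFixed(1);
  const multiplier = cacheOffLoadMultiplier(hitRate).toFixed(0);
  console.log(`hit rate ${hitRate}: ~${latency} ms average, ${multiplier}x DB reads if the cache is lost`);
}
```

The second function is the reason a higher hit rate is not purely good news: the better the cache works, the larger the gap between normal database load and cache-off database load.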

Caching Patterns

Caveats and Common Pitfalls with Caching Patterns

Caching bugs are among the hardest to reproduce, because they depend on timing, traffic shape, and order of events. The patterns below cover most of what breaks:
  • Cache stampede (thundering herd) on popular keys. A popular key (home page, featured product, celebrity user profile) has its TTL expire. In the next 50 milliseconds, 5,000 concurrent requests all miss, all query the database, and the database melts. Naive cache-aside does not prevent this; the default behavior actually causes it. Teams discover this only during traffic spikes, when the cache they added “for performance” becomes the reason production is down.
  • Cache poisoning. A malformed request or a bug causes a wrong value to be cached under a legitimate key. Every subsequent request returns the poisoned value until the TTL expires. If the TTL is an hour, you serve an hour of wrong data to every user. If the bug writes under many keys, you poison the whole cache. Cache writes must be as validated as database writes.
  • Personalization bleed / wrong-user data. Service caches a response under a key that is not user-scoped (“home_page_blocks”), but the response was actually personalized for user 1234. Next user gets user 1234’s data. This leaks PII and, depending on the data, may be a security incident. It happens constantly in HTTP caches and CDNs where cache keys do not include the auth context.
  • Ignoring the cost of a cache miss. Teams size the database for the cached traffic rate and forget that a cache outage takes the traffic back to the uncached rate — often 10-50x higher. When Redis fails over, the database is immediately overloaded and crashes, cascading into full platform downtime.
Solutions and Patterns for Safe Caching
  • Single-flight / request coalescing for stampede. When multiple concurrent requests miss for the same key, only one of them queries the source. The rest wait on a shared promise or channel and get the same result. Libraries: Go’s singleflight, Node’s promise-memoize, Java’s Caffeine with AsyncLoadingCache. This is the single most important stampede defense.
  • Probabilistic early expiration (XFetch algorithm). Instead of refreshing exactly at TTL, start refreshing probabilistically as TTL approaches. A small fraction of requests renew the cache “early,” so by the time real expiration hits, the value is already fresh. Prevents the cliff-edge stampede entirely.
  • Validate everything you cache. Schema-check the response before caching it. A 500 error with a cache-control header is a classic poisoning vector — you must not cache error responses unless you are explicitly caching negative results with a short TTL.
  • Cache key discipline. Every cache key must include every axis of variation: user_id for personalized data, locale, tenant, feature flag state, auth scope. When in doubt, add it. A good heuristic: write a one-sentence description of exactly what this cache entry contains; every noun in the sentence is part of the key.
  • Always design for the cache-off scenario. What happens when the cache is 0 percent healthy? If the answer is “the database catches fire,” you do not have a cache, you have a mandatory prefetch layer with no failover. Load-test with the cache disabled. Build local memory fallbacks and circuit breakers to shed load before the database dies.
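The single-flight idea from the list above can be sketched in a few lines. This is a minimal in-process version, assuming one Node.js process; the class name `SingleFlight` and its `do` method are illustrative, not a library API. It does not coalesce requests across multiple service instances, which would need something like a distributed lock.

```javascript
// Minimal single-flight helper: concurrent callers for the same key share
// one in-flight promise instead of each querying the source.
class SingleFlight {
  constructor() {
    this.inFlight = new Map(); // key -> Promise of the pending load
  }

  do(key, fn) {
    const existing = this.inFlight.get(key);
    if (existing) return existing;

    // Start the load and remove the entry once it settles (success or
    // failure), so the next miss after completion triggers a fresh load.
    const promise = Promise.resolve()
      .then(fn)
      .finally(() => this.inFlight.delete(key));

    this.inFlight.set(key, promise);
    return promise;
  }
}
```

In a cache-aside read path, the database load on a miss would be wrapped as `flight.do(cacheKey, () => loadFromDatabase(id))`, where `loadFromDatabase` is a hypothetical loader. Note that a failed load rejects for every waiting caller, which is usually what you want under overload.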
The biggest mistake teams make with caching is treating “add a cache” as a single decision. It is not — there are at least five distinct patterns, each with different consistency, performance, and failure-mode characteristics. Picking the wrong one creates bugs that do not show up in testing but manifest as weird production behavior: “why did this user see stale data for 10 minutes?” or “why did Redis crashing lose us $50,000 in orders?”. Think of these patterns as answers to two questions: (1) when do reads populate the cache? and (2) when do writes update the cache? The patterns below trade safety, freshness, and write throughput in different ways. Cache-aside is the safest and the default for most teams. Write-through guarantees read-your-writes consistency at the cost of write latency. Write-behind is dangerous but fast. Read-through and refresh-ahead are niche patterns you reach for when the first three are not enough.

Pattern Comparison

| Pattern | How It Works | Consistency | Performance | Risk | Best For |
|---|---|---|---|---|---|
| Cache-Aside | App checks cache, falls back to DB, writes to cache on miss | Stale reads possible (TTL window) | First request slow (cache miss) | Low (cache failure = slower, not broken) | Most read-heavy workloads; safest default |
| Write-Through | App writes to cache AND DB together | Strong (cache always fresh) | Writes slower (two writes) | Medium (write failure needs handling) | Data that is read immediately after write |
| Write-Behind | App writes to cache, async flush to DB | Weak (cache is ahead of DB) | Fastest writes | High (cache crash = data loss) | Analytics, counters, data you can afford to lose |
| Read-Through | Cache itself fetches from DB on miss | Same as cache-aside | Same as cache-aside | Low | When you want the cache layer to be transparent |
| Refresh-Ahead | Cache proactively refreshes before TTL expires | Strong (always fresh) | Best (no cache misses for hot keys) | Low | Highly predictable access patterns (e.g., homepage data) |
Decision framework:
  • Default choice: Cache-aside. It is the simplest, safest, and works for 80% of use cases.
  • User just updated their profile and sees stale data: Switch to write-through for that specific entity.
  • Dashboard counters, view counts, analytics: Write-behind is fine; losing a few counts is acceptable.
  • Homepage hero content that must never be stale: Refresh-ahead with a short TTL.
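One way to approximate refresh-ahead without a scheduler is the probabilistic early expiration (XFetch) check mentioned in the pitfalls section. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the store is an in-memory `Map` standing in for a real cache, and `beta = 1` is just a common starting value.

```javascript
// Probabilistic early expiration (XFetch-style check).
// deltaMs: how long the last recomputation took.
// beta: values above 1 refresh earlier, below 1 refresh later.
// rand is injectable so the decision can be tested deterministically.
function shouldRefreshEarly({ nowMs, expiresAtMs, deltaMs, beta = 1, rand = Math.random }) {
  // Guard against rand() returning exactly 0, where Math.log is -Infinity.
  const r = rand() || Number.MIN_VALUE;
  // Math.log(r) is <= 0, so this pushes "now" forward by a random amount
  // that is usually small and occasionally large.
  const jitterMs = -deltaMs * beta * Math.log(r);
  return nowMs + jitterMs >= expiresAtMs;
}

// Read path: serve the cached value unless the entry is missing, expired,
// or this request was picked to refresh early.
async function getWithEarlyRefresh(store, key, loader, ttlMs, { now = Date.now, rand = Math.random } = {}) {
  const entry = store.get(key); // { value, expiresAtMs, deltaMs } or undefined
  if (entry) {
    const refresh = shouldRefreshEarly({
      nowMs: now(),
      expiresAtMs: entry.expiresAtMs,
      deltaMs: entry.deltaMs,
      rand
    });
    if (!refresh) return entry.value;
  }

  const startedMs = now();
  const value = await loader();
  const finishedMs = now();
  store.set(key, { value, expiresAtMs: finishedMs + ttlMs, deltaMs: finishedMs - startedMs });
  return value;
}
```

Because the jitter scales with how long the last load took, slow-to-compute keys start refreshing earlier than cheap ones, which is the property that spreads refreshes out instead of stacking them at the expiry instant.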

Cache-Aside (Lazy Loading)

Cache-aside is the most common pattern because it is the safest: if your cache goes down entirely, your application still works (it just hits the database for every request). The application is in full control of what gets cached and when. The downside is that the first request for any piece of data is always slow (cache miss), and there is a window between a database write and cache invalidation where stale data can be served. Why this pattern dominates real-world microservices: it decouples the cache from your write path completely. The database is always the source of truth. If Redis is down, writes continue unchanged — they just do not invalidate anything. If Redis is slow, your reads degrade gracefully back to database speed. Compare this to write-through, where a cache outage can block writes entirely. In a distributed system where failures are expected, cache-aside’s “cache is an optimization, not a dependency” mindset is usually what you want. A subtle gotcha: if your invalidation step fails (you updated the DB but cache.del() threw), you are now serving stale data until the TTL expires. Always set a TTL, even when you have invalidation — TTL is your safety net against missed invalidations. Some teams also capture invalidation failures in a dead-letter queue and retry them asynchronously.
// Most common pattern - application manages cache

class ProductService {
  constructor(cache, repository) {
    this.cache = cache;  // Redis client
    this.repository = repository;  // Database
    this.ttl = 3600;  // 1 hour
  }

  async getProduct(productId) {
    const cacheKey = `product:${productId}`;
    
    // 1. Try to get from cache
    const cached = await this.cache.get(cacheKey);
    if (cached) {
      console.log(`Cache HIT for ${cacheKey}`);
      return JSON.parse(cached);
    }
    
    console.log(`Cache MISS for ${cacheKey}`);
    
    // 2. Get from database
    const product = await this.repository.findById(productId);
    if (!product) return null;
    
    // 3. Store in cache
    await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(product));
    
    return product;
  }

  async updateProduct(productId, updates) {
    // Update database
    const product = await this.repository.update(productId, updates);
    
    // Invalidate cache
    await this.cache.del(`product:${productId}`);
    
    // Also invalidate any list caches
    await this.cache.del(`products:category:${product.categoryId}`);
    await this.cache.del('products:featured');
    
    return product;
  }
}
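The "invalidation failed after the database write" gotcha can be handled by parking failed deletes for a later retry. The sketch below assumes a `cache.del` that throws on failure (as a raw Redis client does) and uses a plain array as a stand-in for a durable dead-letter queue; both function names are hypothetical.

```javascript
// Invalidate a key; if the delete fails, record it for a later retry.
// retryQueue is an in-memory array here, standing in for a durable queue.
async function invalidateOrEnqueue(cache, retryQueue, key) {
  try {
    await cache.del(key);
    return true;
  } catch (err) {
    retryQueue.push({ key, failedAt: Date.now(), attempts: 1 });
    return false;
  }
}

// Drain the retry queue once. Entries that fail again are re-queued until
// maxAttempts total attempts have been made; after that the TTL is the
// only remaining backstop. Returns how many entries are still queued.
async function retryInvalidations(cache, retryQueue, maxAttempts = 5) {
  const pending = retryQueue.splice(0, retryQueue.length);
  for (const item of pending) {
    try {
      await cache.del(item.key);
    } catch (err) {
      if (item.attempts + 1 < maxAttempts) {
        retryQueue.push({ ...item, attempts: item.attempts + 1 });
      }
    }
  }
  return retryQueue.length;
}
```

Retrying a delete is safe because deleting an already-missing key is a no-op. The TTL still matters: it bounds staleness for entries that exhaust their retries.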

Write-Through

Write-through flips the safety trade-off: every write goes to both the cache and the database synchronously, so reads are always consistent with the last write. This is the pattern you want when users expect “read-your-writes” semantics: after I update my profile picture, I should see it immediately, not in 5 minutes when the TTL expires. But the cost is real: every write now pays for two round trips (database plus cache) instead of one, and if the cache is down, writes either fail or need a fallback path. Where write-through shines: user profiles, shopping cart state, session data, and any “last write wins” scenarios where the write volume is moderate. Where it hurts: high-write-volume data like analytics events, where mirroring every write into Redis adds load and memory pressure for entries that may never be read. The honest truth is that most teams who say they use write-through actually use write-around-with-invalidation (write to the database, then delete the cache entry) and just do not know the difference. In most cases that is fine.
// Write to cache and database together

class UserService {
  constructor(cache, repository) {
    this.cache = cache;
    this.repository = repository;
    this.ttl = 7200;  // 2 hours
  }

  async createUser(userData) {
    // 1. Write to database
    const user = await this.repository.create(userData);
    
    // 2. Write to cache immediately
    const cacheKey = `user:${user.id}`;
    await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
    
    // Also cache by email for lookup
    // Redis values must be strings, so convert a numeric ID explicitly.
    await this.cache.setEx(`user:email:${user.email}`, this.ttl, String(user.id));
    
    return user;
  }

  async updateUser(userId, updates) {
    // Update database
    const user = await this.repository.update(userId, updates);
    
    // Update cache
    const cacheKey = `user:${userId}`;
    await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
    
    return user;
  }

  async getUser(userId) {
    const cacheKey = `user:${userId}`;
    
    // Try cache first
    const cached = await this.cache.get(cacheKey);
    if (cached) {
      return JSON.parse(cached);
    }
    
    // Fallback to database
    const user = await this.repository.findById(userId);
    if (user) {
      await this.cache.setEx(cacheKey, this.ttl, JSON.stringify(user));
    }
    
    return user;
  }
}
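The "fallback path" for a cache outage during a write can be sketched as follows: if the cache update fails, fall back to deleting the entry, and if that fails too, let the write succeed anyway. This is one possible policy rather than the only one; the function name and the returned `cache` status field are illustrative.

```javascript
// Write-through that degrades to invalidation when the cache write fails,
// so a cache outage never fails the user's write.
async function writeThrough({ repository, cache, key, ttlSeconds, id, updates }) {
  // The database stays the source of truth and is written first.
  const record = await repository.update(id, updates);

  try {
    await cache.setEx(key, ttlSeconds, JSON.stringify(record));
    return { record, cache: 'updated' };
  } catch (setErr) {
    try {
      // Could not refresh the entry; at least try to drop the stale one.
      await cache.del(key);
      return { record, cache: 'invalidated' };
    } catch (delErr) {
      // Cache unreachable: the old entry (if any) lives until its TTL.
      return { record, cache: 'unavailable' };
    }
  }
}
```

Returning the cache status lets the caller log or count degraded writes, which makes "the cache was down for ten minutes" visible instead of silent.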

Write-Behind (Write-Back)

Write-behind is the most dangerous pattern here because it risks data loss: if Redis crashes before the queued write reaches the database, that data is gone. Use this only for data you can afford to lose (analytics counters, view counts) or where you have other durability guarantees in place. Think about what you are buying with write-behind: raw write throughput. If you are tracking page views on a popular product, every page view is one increment to a counter. Doing that synchronously against PostgreSQL would generate thousands of transactions per second against a single row, which means lock contention and write amplification. With write-behind, you update Redis in memory (microseconds), and a background job batches updates to the database every second or minute. You have traded durability for throughput, and for counters, that is a good trade. The architectural trap: teams use write-behind for “temporary” data, then other services start relying on that data being durable in the database. Now you have an implicit contract that write-behind cannot fulfill. Document clearly that write-behind caches are not durable, and put a monitoring alert on queue depth: a growing queue means more unflushed writes that would be lost if the cache or queue crashed.
// Write to cache first, async write to database

class InventoryService {
  constructor(cache, repository, queue) {
    this.cache = cache;
    this.repository = repository;
    this.queue = queue;  // For async DB writes
  }

  async updateStock(productId, quantity) {
    const cacheKey = `inventory:${productId}`;
    
    // 1. Update cache immediately (fast response).
    // INCRBY is atomic, so concurrent updates cannot overwrite each other
    // the way a separate GET followed by SET can. Assumes the cache client
    // exposes incrBy (as node-redis does) and that quantity is an integer.
    const newStock = await this.cache.incrBy(cacheKey, quantity);
    
    // 2. Queue database write (async)
    await this.queue.add('inventory-update', {
      productId,
      quantity,
      newStock,
      timestamp: Date.now()
    });
    
    return { productId, stock: newStock };
  }

  async getStock(productId) {
    const cacheKey = `inventory:${productId}`;
    
    const stock = await this.cache.get(cacheKey);
    if (stock !== null) {
      return parseInt(stock);
    }
    
    // Load from DB if not in cache
    const inventory = await this.repository.findByProductId(productId);
    if (inventory) {
      await this.cache.set(cacheKey, inventory.stock.toString());
      return inventory.stock;
    }
    
    return 0;
  }
}

// Background worker processes the queue
class InventoryWriter {
  constructor(repository) {
    this.repository = repository;
  }

  async processUpdate(job) {
    const { productId, newStock } = job.data;
    
    // Persist the latest absolute stock value. Jobs are applied one at a
    // time here; batching would need to be added separately.
    await this.repository.updateStock(productId, newStock);
  }
}
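The batching described for counters can be sketched as a small accumulator that coalesces increments and flushes them together. To stay self-contained, this version keeps pending deltas in process memory rather than in Redis, so the same durability caveat applies: anything not yet flushed is lost on a crash. The class name and the `flushFn` contract are assumptions for illustration.

```javascript
// Coalesce counter increments in memory and flush them in batches.
class WriteBehindCounter {
  constructor(flushFn) {
    this.flushFn = flushFn;    // async (Array<[key, delta]>) => void
    this.pending = new Map();  // key -> accumulated delta since last flush
  }

  increment(key, delta = 1) {
    this.pending.set(key, (this.pending.get(key) || 0) + delta);
  }

  // Write all accumulated deltas as one batch. On failure, merge the batch
  // back into pending so the increments are retried on the next flush.
  async flush() {
    if (this.pending.size === 0) return 0;
    const batch = this.pending;
    this.pending = new Map();
    try {
      await this.flushFn([...batch.entries()]);
      return batch.size;
    } catch (err) {
      for (const [key, delta] of batch) this.increment(key, delta);
      throw err;
    }
  }
}
```

A caller would typically run `flush()` on a timer, for example once per second, and alert on the size of `pending`. A thousand page views of one product then become a single `+1000` update instead of a thousand row updates.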

Read-Through

Read-through looks identical to cache-aside from the outside, but the key difference is where the “load on miss” logic lives. In cache-aside, the application knows about the cache and the database separately. In read-through, the cache itself knows how to load data — the application just asks the cache and does not care where the data came from. This is appealing because it centralizes the cache-load logic, but it also couples the cache to your data sources, which is why it is less common in practice. When would you pick this? When you have many different consumers of the same data and you want all of them to share the same cache-loading logic. If five services all need to load a user by ID on cache miss, do you want five copies of that logic or one? Read-through centralizes it. The trade-off: the cache layer becomes stateful and service-like, which complicates deployment and testing.
// Cache handles loading data

class ReadThroughCache {
  constructor(redisClient, dataLoader) {
    this.redis = redisClient;
    this.dataLoader = dataLoader;  // Function to load from DB
  }

  async get(key, options = {}) {
    const { ttl = 3600, loader } = options;
    
    // Try cache
    const cached = await this.redis.get(key);
    if (cached) {
      return JSON.parse(cached);
    }
    
    // Load using provided loader or default
    const loadFn = loader || this.dataLoader;
    const data = await loadFn(key);
    
    if (data !== null && data !== undefined) {
      await this.redis.setEx(key, ttl, JSON.stringify(data));
    }
    
    return data;
  }
}

// Usage
const cache = new ReadThroughCache(redis, async (key) => {
  // Default loader extracts ID from key
  const [type, id] = key.split(':');
  
  switch (type) {
    case 'user':
      return userRepository.findById(id);
    case 'product':
      return productRepository.findById(id);
    default:
      return null;
  }
});

// Get user (automatically loads if not cached)
const user = await cache.get('user:123');

Redis Implementation

Redis has become the de facto standard for distributed caching in microservices, and it is worth understanding why that happened rather than just accepting it. Redis wins because it is single-threaded (so no lock contention within a node), stores everything in memory (sub-millisecond reads), supports rich data structures (hashes, sorted sets, streams) beyond simple key-value, and has robust clustering for horizontal scaling. The alternatives — Memcached is simpler but less capable; Hazelcast and Apache Ignite are richer but heavier — each win on specific axes, but Redis hits the sweet spot for most teams. A word of caution: Redis is not a durable database. When you see teams using Redis as a “fast database” for critical data, that is usually a mistake waiting to happen. AOF and RDB persistence help but do not eliminate data loss on crash. Treat Redis as a cache and rebuild state from the source of truth when needed. When people try to use Redis as both the hot path and the durability layer, they end up with a system that loses data during failovers and is slower to recover than it would be with proper database-backed architecture.

Connection and Configuration

The connection logic below handles three production-critical concerns: reconnection with backoff (your Redis cluster will fail over at some point), cluster mode (for horizontal scaling beyond a single node), and error handling that does not crash your application. A naive Redis client that throws on every connection error will cascade into a full application restart — which is exactly what you do not want in a distributed system. Instead, treat Redis errors as transient and let the fallback path (hit the database) handle them.
// cache/redis-client.js
const { createClient, createCluster } = require('redis');

class RedisCache {
  constructor(config = {}) {
    this.config = {
      url: process.env.REDIS_URL || 'redis://localhost:6379',
      cluster: process.env.REDIS_CLUSTER === 'true',
      ...config
    };
    
    this.client = null;
    this.isConnected = false;
  }

  async connect() {
    if (this.config.cluster) {
      // Redis Cluster for production
      this.client = createCluster({
        rootNodes: [
          { url: process.env.REDIS_NODE_1 },
          { url: process.env.REDIS_NODE_2 },
          { url: process.env.REDIS_NODE_3 }
        ],
        defaults: {
          socket: {
            connectTimeout: 5000,
            keepAlive: 5000
          }
        }
      });
    } else {
      // Single node for development
      this.client = createClient({
        url: this.config.url,
        socket: {
          connectTimeout: 5000,
          keepAlive: 5000,
          reconnectStrategy: (retries) => {
            if (retries > 10) {
              console.error('Redis: Max reconnection attempts reached');
              return new Error('Max reconnection attempts');
            }
            return Math.min(retries * 100, 3000);
          }
        }
      });
    }

    this.client.on('error', (err) => {
      console.error('Redis error:', err);
      this.isConnected = false;
    });

    this.client.on('connect', () => {
      console.log('Redis connected');
      this.isConnected = true;
    });

    this.client.on('reconnecting', () => {
      console.log('Redis reconnecting...');
    });

    await this.client.connect();
    return this;
  }

  async get(key) {
    try {
      return await this.client.get(key);
    } catch (error) {
      console.error(`Redis GET error for ${key}:`, error);
      return null;
    }
  }

  async set(key, value, options = {}) {
    try {
      const stringValue = typeof value === 'object' 
        ? JSON.stringify(value) 
        : value;
      
      if (options.ttl) {
        await this.client.setEx(key, options.ttl, stringValue);
      } else {
        await this.client.set(key, stringValue);
      }
      return true;
    } catch (error) {
      console.error(`Redis SET error for ${key}:`, error);
      return false;
    }
  }

  async setEx(key, seconds, value) {
    return this.set(key, value, { ttl: seconds });
  }

  async del(key) {
    try {
      await this.client.del(key);
      return true;
    } catch (error) {
      console.error(`Redis DEL error for ${key}:`, error);
      return false;
    }
  }

  async mget(keys) {
    try {
      return await this.client.mGet(keys);
    } catch (error) {
      console.error('Redis MGET error:', error);
      return keys.map(() => null);
    }
  }

  async mset(keyValues, ttl = null) {
    try {
      const pipeline = this.client.multi();
      
      for (const [key, value] of Object.entries(keyValues)) {
        const stringValue = typeof value === 'object' 
          ? JSON.stringify(value) 
          : value;
        
        if (ttl) {
          pipeline.setEx(key, ttl, stringValue);
        } else {
          pipeline.set(key, stringValue);
        }
      }
      
      await pipeline.exec();
      return true;
    } catch (error) {
      console.error('Redis MSET error:', error);
      return false;
    }
  }

  async disconnect() {
    if (this.client) {
      await this.client.quit();
    }
  }
}

module.exports = { RedisCache };
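The wrapper above turns Redis errors into cache misses. A call that hangs is a related case that a try/catch does not cover. The sketch below bounds how long any cache call may take; `withTimeout` is a hypothetical helper, and the right timeout depends on your latency budget.

```javascript
// Bound how long a cache call may take. On timeout or error the fallback
// value is returned (null by default, which callers treat as a miss).
function withTimeout(promise, ms, fallback = null) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise.catch(() => fallback), timeout])
    .finally(() => clearTimeout(timer));
}

// Usage (hypothetical): treat a slow cache like a miss and go to the source.
// const cached = await withTimeout(cache.get(key), 50);
```

The underlying Redis command is not cancelled; the caller simply stops waiting for it. That is usually acceptable for reads, since the point is to keep request latency bounded when the cache is unhealthy.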

Caching Service

A thin wrapper around the raw Redis client is one of those patterns that seems unnecessary at first but pays off enormously over time. By centralizing key prefixing, serialization, TTL defaults, and pattern-based invalidation in one place, you avoid the “twelve services, twelve slightly different cache key formats” problem. When you need to debug a cache entry, having a consistent service-name:entity:id prefix makes grep-ing Redis trivial. When you need to migrate cache shapes, you change one file instead of hunting through every service. Pay attention to invalidatePattern specifically. The naive way to implement this is KEYS pattern* — and it will work fine in development with 100 keys. In production with 10 million keys, KEYS blocks the Redis server for seconds and causes an outage. SCAN is cursor-based and non-blocking, processing keys in small batches. This is one of the most common production mistakes I see in Redis deployments: KEYS used anywhere in application code.
// cache/caching-service.js

class CachingService {
  constructor(redis) {
    this.redis = redis;
    this.defaultTTL = 3600;  // 1 hour
    this.prefix = process.env.SERVICE_NAME || 'app';
  }

  _key(key) {
    return `${this.prefix}:${key}`;
  }

  async get(key) {
    const data = await this.redis.get(this._key(key));
    return data ? JSON.parse(data) : null;
  }

  async set(key, value, ttl = this.defaultTTL) {
    return this.redis.setEx(this._key(key), ttl, JSON.stringify(value));
  }

  async getOrSet(key, fetchFn, ttl = this.defaultTTL) {
    // Try cache first
    let data = await this.get(key);
    
    if (data !== null) {
      return { data, cached: true };
    }
    
    // Fetch from source
    data = await fetchFn();
    
    if (data !== null && data !== undefined) {
      await this.set(key, data, ttl);
    }
    
    return { data, cached: false };
  }

  async invalidate(key) {
    return this.redis.del(this._key(key));
  }

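  async invalidatePattern(pattern) {
    // Use SCAN to find matching keys (don't use KEYS in production).
    // SCAN walks a single node; in cluster mode each master must be
    // scanned separately.
    let cursor = 0;
    let deleted = 0;
    
    do {
      const result = await this.redis.client.scan(cursor, {
        MATCH: this._key(pattern),
        COUNT: 100
      });
      
      cursor = result.cursor;
      
      // Delete each batch as it arrives, so we never hold every matching
      // key in memory or send one huge DEL that blocks the server.
      if (result.keys.length > 0) {
        deleted += await this.redis.client.del(result.keys);
      }
      // Compare as a string in case the client returns the cursor as '0'.
    } while (String(cursor) !== '0');
    
    return deleted;
  }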

  // Hash operations for objects
  async hget(key, field) {
    const data = await this.redis.client.hGet(this._key(key), field);
    return data ? JSON.parse(data) : null;
  }

  async hset(key, field, value, ttl = null) {
    await this.redis.client.hSet(this._key(key), field, JSON.stringify(value));
    
    if (ttl) {
      await this.redis.client.expire(this._key(key), ttl);
    }
  }

  async hgetall(key) {
    const data = await this.redis.client.hGetAll(this._key(key));
    
    const result = {};
    for (const [field, value] of Object.entries(data)) {
      result[field] = JSON.parse(value);
    }
    
    return result;
  }

  // Sorted sets for leaderboards, time-based data
  async zadd(key, score, member) {
    return this.redis.client.zAdd(this._key(key), { score, value: member });
  }

  async zrange(key, start, stop, options = {}) {
    return this.redis.client.zRange(this._key(key), start, stop, options);
  }

  // List operations for queues, recent items
  async lpush(key, ...values) {
    return this.redis.client.lPush(this._key(key), values);
  }

  async lrange(key, start, stop) {
    return this.redis.client.lRange(this._key(key), start, stop);
  }

  async ltrim(key, start, stop) {
    return this.redis.client.lTrim(this._key(key), start, stop);
  }
}
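The consistent `service-name:entity:id` convention is easier to keep when keys are built by one function rather than by string templates scattered across services. The helper below is a sketch with a hypothetical name and signature; it also appends any extra axes of variation (tenant, locale, and so on) in a stable order.

```javascript
// Build cache keys from a fixed, ordered list of parts so every service
// formats them the same way. Parts are URI-encoded so a ':' inside an ID
// cannot be mistaken for a separator.
function buildCacheKey(service, entity, id, variations = {}) {
  const base = [service, entity, id].map((part) => encodeURIComponent(String(part)));

  // Sort variation names so { locale, tenant } and { tenant, locale }
  // produce the same key.
  const extras = Object.keys(variations)
    .sort()
    .map((name) => `${encodeURIComponent(name)}=${encodeURIComponent(String(variations[name]))}`);

  return [...base, ...extras].join(':');
}

console.log(buildCacheKey('orders', 'user', 42, { tenant: 'acme', locale: 'en-US' }));
// prints "orders:user:42:locale=en-US:tenant=acme"
```

Making the variations an explicit argument forces the question "what else does this value depend on?" at the point where the key is created, which is where personalization leaks are cheapest to prevent.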

Cache Invalidation

Caveats and Common Pitfalls with Cache Invalidation in Microservices

Invalidation in a single-service world is hard. In a microservices fan-out world, it compounds in ways most teams never model:
  • Invalidation fan-out grows combinatorially. When a single User update should invalidate caches in Order Service, Recommendations Service, Search Service, Feed Service, and a CDN edge, you now have five services whose invalidation logic must agree. A bug in any one of them leaves stale data somewhere. As services multiply, the invalidation graph outgrows any human’s ability to hold it in their head.
  • Lost events mean permanently stale cache. If the “UserUpdated” event is lost in transit (Kafka consumer crash, retention expired, network partition), the cache entries that should have been invalidated stay until their natural TTL — or worse, refresh themselves from a stale read model and stay wrong. Event-driven invalidation without a safety net is a slow-motion bug factory.
  • Nobody owns the invalidation contract. Producer changes the data; which consumer is responsible for the cache? Often every consumer implements its own invalidation logic, and when the cache schema changes (new key structure, added dimensions), each consumer must update independently. In practice, they do not, and you serve inconsistent versions of the same data from different services.
  • Personalization plus caching is a correctness trap. “Top 10 recommendations for user X” cached under recs:user:X gets invalidated when user X’s preferences change — but what about when another user triggers a model retrain that changes every user’s recommendations? Suddenly you have 100 million cache entries that should all be invalidated. Teams ship this feature without noticing until users complain about seeing stale recommendations for weeks.
Solutions and Patterns for Sane Cache Invalidation
  • Always pair event-driven invalidation with a TTL safety net. Even if you invalidate on every change event, set a maximum TTL (hours, not days) so that missed events eventually self-heal. The TTL is not your primary invalidation mechanism; it is the backstop that prevents permanent staleness.
  • Publish a versioned cache-key contract. The owning service publishes a schema that describes its cache key structure and the events that invalidate each pattern. Consumers subscribe to this contract, not to individual events. When the schema changes, every consumer is notified and can update deterministically.
  • Use tag-based invalidation for combinatorial cases. Instead of tracking individual cache keys, tag entries (“product:123”, “category:shoes”, “brand:nike”). When an event says “everything in category:shoes is stale,” you invalidate by tag. Redis has no built-in tag invalidation, but you can build it by keeping a set of cache keys per tag; CDN vendors (Fastly, Cloudflare) support surrogate keys or cache tags. Tag-based invalidation is how you avoid having each producer enumerate every affected key.
  • Consider accept-stale-while-revalidating. For data where immediate correctness is not critical, serve the stale value while asynchronously refreshing in the background. The user gets fast, slightly stale data; the cache self-heals within milliseconds. This is the HTTP stale-while-revalidate directive applied inside your own caches.
  • Run a periodic “cache verifier.” A background job samples cache entries, compares them to the source of truth, and flags drift. This catches the long tail of missed invalidations that TTL alone would not surface for hours. Treat cache drift as a real SLO, not an afterthought.
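The stale-while-revalidate bullet above can be sketched in-process as follows. This is a minimal illustration under stated assumptions, not a production cache: `SwrCache`, `freshMs`, `staleMs`, and the injectable `now` clock are hypothetical names, and a plain Map stands in for Redis.

```javascript
// Minimal stale-while-revalidate sketch (in-memory; all names are illustrative).
class SwrCache {
  constructor({ freshMs = 30_000, staleMs = 300_000, now = () => Date.now() } = {}) {
    this.freshMs = freshMs;      // entries younger than this are served as-is
    this.staleMs = staleMs;      // entries up to this age are served stale, with a refresh
    this.now = now;              // injectable clock, handy for tests
    this.entries = new Map();    // key -> { value, storedAt }
    this.refreshing = new Map(); // key -> in-flight refresh promise
  }

  async get(key, fetchFn) {
    const entry = this.entries.get(key);
    const age = entry ? this.now() - entry.storedAt : Infinity;

    if (age <= this.freshMs) return entry.value; // fresh hit

    if (age <= this.staleMs) {
      // Stale hit: answer immediately and refresh in the background.
      // On refresh failure the stale value simply stays in place.
      this.refresh(key, fetchFn).catch(() => {});
      return entry.value;
    }

    return this.refresh(key, fetchFn); // miss or too old: caller waits
  }

  refresh(key, fetchFn) {
    // At most one refresh per key at a time.
    if (!this.refreshing.has(key)) {
      const inFlight = Promise.resolve()
        .then(fetchFn)
        .then((value) => {
          this.entries.set(key, { value, storedAt: this.now() });
          return value;
        })
        .finally(() => this.refreshing.delete(key));
      this.refreshing.set(key, inFlight);
    }
    return this.refreshing.get(key);
  }
}
```

The two windows are the design choice to tune: `freshMs` bounds how often the source is queried, while `staleMs` bounds the worst-case staleness a caller can observe.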
“There are only two hard things in Computer Science: cache invalidation and naming things.” (Phil Karlton)
This quote is repeated endlessly, but most engineers do not internalize why cache invalidation is hard until they have shipped a bug caused by it. The difficulty is not the mechanism (deleting a key from Redis is trivial); it is knowing when to invalidate and which keys depend on which data. In a microservices system where Service B might change data that Services A, C, and D all cache, figuring out the complete invalidation graph is a genuinely unsolved problem.
The strategies below exist on a spectrum of precision vs complexity. TTL is imprecise but simple: accept 5 minutes of staleness and do nothing else. Event-based is precise but requires messaging infrastructure and careful dependency tracking. The right answer depends on your tolerance for staleness and your operational maturity.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CACHE INVALIDATION STRATEGIES                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. TIME-BASED (TTL)                                                        │
│     ─────────────────                                                       │
│     • Set expiration time on cache entries                                  │
│     • Simple, but may serve stale data                                      │
│     • Good for: product catalogs, user profiles                             │
│                                                                              │
│  2. EVENT-BASED                                                             │
│     ────────────────                                                        │
│     • Invalidate on write events                                            │
│     • Real-time freshness                                                   │
│     • Good for: inventory, prices, orders                                   │
│                                                                              │
│  3. VERSION-BASED                                                           │
│     ─────────────────                                                       │
│     • Include version in cache key                                          │
│     • Change version to invalidate                                          │
│     • Good for: configuration, templates                                    │
│                                                                              │
│  4. TAG-BASED                                                               │
│     ─────────────────                                                       │
│     • Tag entries with categories                                           │
│     • Invalidate by tag                                                     │
│     • Good for: related data (all products in category)                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
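Strategy 3 in the box above (version-based) has no example elsewhere in this chapter, so here is a minimal sketch. It assumes a per-namespace version counter stored next to the data; `VersionedCache` and its method names are hypothetical, and a Map stands in for the shared cache.

```javascript
// Version-based invalidation sketch (in-memory stand-in for a shared cache).
class VersionedCache {
  constructor(store = new Map()) {
    this.store = store; // holds both values and per-namespace version counters
  }

  // Current version for a namespace, defaulting to 1.
  version(namespace) {
    return this.store.get(`version:${namespace}`) ?? 1;
  }

  // Build the real key, e.g. "config:v3:feature-flags".
  key(namespace, id) {
    return `${namespace}:v${this.version(namespace)}:${id}`;
  }

  get(namespace, id) {
    return this.store.get(this.key(namespace, id)) ?? null;
  }

  set(namespace, id, value) {
    this.store.set(this.key(namespace, id), value);
  }

  // Invalidate every entry in the namespace by moving to a new version.
  // Old entries are never read again; they are not deleted here.
  invalidateNamespace(namespace) {
    this.store.set(`version:${namespace}`, this.version(namespace) + 1);
  }
}

const configCache = new VersionedCache();
configCache.set('config', 'feature-flags', { darkMode: true });
configCache.invalidateNamespace('config');
console.log(configCache.get('config', 'feature-flags')); // null
```

Because old versions are orphaned rather than deleted, a real deployment would give every versioned entry a TTL so the abandoned keys age out on their own.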

Event-Based Invalidation

Event-based invalidation is the most architecturally clean way to solve the cross-service cache problem. The idea: when data changes, the owning service publishes an event. Any service that caches this data subscribes to the event and invalidates its local cache. The cache becomes an artifact of eventual consistency: it catches up when the event arrives, typically within milliseconds on a well-tuned Kafka or RabbitMQ setup.
What can go wrong: events can be lost. If your event bus drops a message (network issue, consumer crash, topic retention expired), you permanently serve stale data. Defense-in-depth matters here: always pair event-based invalidation with a TTL safety net. The TTL catches missed invalidations. Also consider a “cache verification” job that periodically samples cached entries and cross-checks them against the source of truth.
Another trap: invalidation cascades. If updating a product invalidates its cache, the category cache, the search results cache, and the homepage featured products cache, a single write now triggers four cache operations. Multiplied across frequent writes, this can overwhelm Redis. Batch invalidations where possible and set up metrics on invalidation rate to catch pathological cases early.
// Event-driven cache invalidation across services

// Event consumer for cache invalidation
class CacheInvalidator {
  constructor(cache, eventBus) {
    this.cache = cache;
    this.eventBus = eventBus;
    
    this.setupListeners();
  }

  setupListeners() {
    // Product events
    this.eventBus.subscribe('product.updated', this.onProductUpdated.bind(this));
    this.eventBus.subscribe('product.deleted', this.onProductDeleted.bind(this));
    
    // Price events
    this.eventBus.subscribe('price.changed', this.onPriceChanged.bind(this));
    
    // Inventory events
    this.eventBus.subscribe('inventory.updated', this.onInventoryUpdated.bind(this));
    
    // Category events
    this.eventBus.subscribe('category.updated', this.onCategoryUpdated.bind(this));
  }

  async onProductUpdated(event) {
    const { productId, categoryId } = event;
    
    // Invalidate specific product
    await this.cache.invalidate(`product:${productId}`);
    
    // Invalidate product lists
    await this.cache.invalidatePattern(`products:list:*`);
    await this.cache.invalidate(`products:category:${categoryId}`);
    await this.cache.invalidate('products:featured');
    
    console.log(`Cache invalidated for product ${productId}`);
  }

  async onProductDeleted(event) {
    const { productId, categoryId } = event;
    
    await this.cache.invalidate(`product:${productId}`);
    await this.cache.invalidatePattern(`products:*`);
    
    console.log(`Cache invalidated for deleted product ${productId}`);
  }

  async onPriceChanged(event) {
    const { productId, oldPrice, newPrice } = event;
    
    // Invalidate product cache
    await this.cache.invalidate(`product:${productId}`);
    
    // Invalidate any price-based lists
    await this.cache.invalidate('products:deals');
    await this.cache.invalidatePattern('products:price-range:*');
    
    console.log(`Price cache invalidated for product ${productId}`);
  }

  async onInventoryUpdated(event) {
    const { productId, oldStock, newStock } = event;
    
    // Invalidate inventory cache
    await this.cache.invalidate(`inventory:${productId}`);
    
    // If went out of stock or back in stock
    if ((oldStock > 0 && newStock === 0) || (oldStock === 0 && newStock > 0)) {
      await this.cache.invalidatePattern(`products:*`);
    }
  }

  async onCategoryUpdated(event) {
    const { categoryId } = event;
    
    // Invalidate category and all products in it
    await this.cache.invalidate(`category:${categoryId}`);
    await this.cache.invalidate(`products:category:${categoryId}`);
    await this.cache.invalidatePattern('categories:*');
  }
}
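The advice above to batch invalidations can be sketched as a small collector that deduplicates keys for a short window and flushes them in one call. This is an illustration under assumptions: `InvalidationBatcher`, `deleteMany`, and `windowMs` are hypothetical names, and `deleteMany` stands for whatever single multi-key delete your cache client offers.

```javascript
// Collects invalidations for a short window and flushes them in one call.
class InvalidationBatcher {
  constructor(deleteMany, { windowMs = 50 } = {}) {
    this.deleteMany = deleteMany; // async (keys) => void, e.g. one multi-key delete
    this.windowMs = windowMs;
    this.queued = new Set();      // a Set deduplicates repeated keys within a window
    this.timer = null;
    this.flushedKeys = 0;         // simple counter to export as an invalidation-rate metric
  }

  invalidate(key) {
    this.queued.add(key);
    if (!this.timer) {
      this.timer = setTimeout(() => {
        this.flush().catch((err) => console.error('Invalidation flush failed:', err.message));
      }, this.windowMs);
    }
  }

  async flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.queued.size === 0) return 0;

    const keys = [...this.queued];
    this.queued.clear();
    await this.deleteMany(keys);
    this.flushedKeys += keys.length;
    return keys.length;
  }
}
```

The trade-off is a staleness window of up to `windowMs` per key, and a failed flush drops its batch, which is one more reason to keep the TTL safety net in place.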

Tag-Based Invalidation

Tag-based invalidation solves a problem that both key-based and pattern-based invalidation struggle with: “invalidate everything related to X.” If you cache products, and each product belongs to a category, and each category belongs to a department, how do you invalidate all products in a department when the department changes? Key-based invalidation requires you to know every cached key. Pattern-based invalidation requires a naming scheme that every writer follows. Tag-based invalidation lets you attach arbitrary labels to cache entries and invalidate all entries with a given label.
The mechanism: every cached value stores which tags it belongs to, and every tag stores a reverse index of which keys are tagged with it. Adding a cache entry costs two or three extra Redis operations; invalidating a tag involves reading the tag’s key set and deleting all referenced entries. Memory overhead is real (the tag indexes), so cap the cardinality of tags: do not tag every entry with every conceivable attribute.
// Tagging cache entries for group invalidation

class TaggedCache {
  constructor(redis) {
    this.redis = redis;
  }

  async set(key, value, options = {}) {
    const { ttl = 3600, tags = [] } = options;
    
    // Store the value
    await this.redis.setEx(key, ttl, JSON.stringify(value));
    
    // Store key in tag sets
    for (const tag of tags) {
      await this.redis.client.sAdd(`tag:${tag}`, key);
      await this.redis.client.expire(`tag:${tag}`, ttl * 2);  // Tags live longer
    }
    
    // Store tags for this key
    if (tags.length > 0) {
      await this.redis.client.sAdd(`key-tags:${key}`, tags);
      await this.redis.client.expire(`key-tags:${key}`, ttl);
    }
  }

  async get(key) {
    const data = await this.redis.get(key);
    return data ? JSON.parse(data) : null;
  }

  async invalidateByTag(tag) {
    // Get all keys with this tag
    const keys = await this.redis.client.sMembers(`tag:${tag}`);
    
    if (keys.length === 0) return 0;
    
    // Delete all cached values
    await this.redis.client.del(keys);
    
    // Clean up tag sets
    for (const key of keys) {
      const keyTags = await this.redis.client.sMembers(`key-tags:${key}`);
      for (const t of keyTags) {
        await this.redis.client.sRem(`tag:${t}`, key);
      }
      await this.redis.client.del(`key-tags:${key}`);
    }
    
    // Delete the tag set
    await this.redis.client.del(`tag:${tag}`);
    
    return keys.length;
  }

  async invalidateByTags(tags) {
    let count = 0;
    for (const tag of tags) {
      count += await this.invalidateByTag(tag);
    }
    return count;
  }
}

// Usage
const cache = new TaggedCache(redis);

// Cache a product with tags
await cache.set(`product:123`, product, {
  ttl: 3600,
  tags: [`category:${product.categoryId}`, 'products', 'featured']
});

// Later: invalidate all products in a category
await cache.invalidateByTag('category:electronics');

// Or invalidate all featured products
await cache.invalidateByTag('featured');

Cache Patterns for Microservices

Multi-Level Caching

Multi-level caching (also called tiered caching) recognizes that not all caches are equal. An in-process cache like a Python dict or Node.js Map has microsecond access times but is per-instance: if you have 10 pods, you have 10 separate caches. A distributed cache like Redis has sub-millisecond access times but is shared across all instances. The database takes milliseconds but is durable. Each tier serves a different purpose, and combining them gives you the best of all worlds: L1 absorbs the hottest keys at memory speed with no network hop, L2 handles the long tail across all instances, and the database is the source of truth.
The tricky part is keeping L1 coherent across instances. If Service Instance A invalidates a key in Redis but Service Instance B has that key in its local cache, B will serve stale data until its local TTL expires. The solution is pub/sub: when any instance invalidates a key, it publishes an invalidation message on a Redis channel, and all instances listen and invalidate their L1. This is the pattern below.
Why this matters for microservices: network calls to Redis are still network calls. At high QPS (10,000+ RPS per instance), even 1ms of Redis latency adds up. An L1 cache with 90% hit rate eliminates 90% of those Redis round trips. The cost is coherence complexity, which is manageable if you accept very short L1 TTLs (30-60 seconds) and tolerate minor staleness.
// L1: Local in-memory cache (fastest)
// L2: Distributed Redis cache (shared)
// L3: Database (source of truth)

const NodeCache = require('node-cache');

class MultiLevelCache {
  constructor(redis) {
    // L1: Local cache (per instance, fast but not shared)
    this.l1 = new NodeCache({
      stdTTL: 60,  // 1 minute
      checkperiod: 30,
      maxKeys: 1000
    });
    
    // L2: Redis (shared across instances)
    this.l2 = redis;
    
    this.l2TTL = 3600;  // 1 hour
  }

  async get(key, fetchFn) {
    // Try L1 (local memory)
    let data = this.l1.get(key);
    if (data !== undefined) {
      console.log(`L1 HIT: ${key}`);
      return data;
    }

    // Try L2 (Redis)
    const cached = await this.l2.get(key);
    if (cached) {
      console.log(`L2 HIT: ${key}`);
      data = JSON.parse(cached);
      
      // Populate L1
      this.setLocal(key, data);
      return data;
    }

    // L3: Fetch from source
    console.log(`CACHE MISS: ${key}`);
    data = await fetchFn();
    
    if (data !== null && data !== undefined) {
      // Populate both caches
      this.setLocal(key, data);
      await this.l2.setEx(key, this.l2TTL, JSON.stringify(data));
    }

    return data;
  }

  setLocal(key, data) {
    // node-cache throws once maxKeys is reached (it does not evict),
    // so a full L1 must not turn a successful read into an error
    try {
      this.l1.set(key, data);
    } catch (error) {
      console.warn(`L1 full, skipping local cache for ${key}`);
    }
  }

  async invalidate(key) {
    // Invalidate both levels
    this.l1.del(key);
    await this.l2.del(key);
  }

  async invalidateLocal(key) {
    // Invalidate only local cache (for pub/sub invalidation)
    this.l1.del(key);
  }
}

// Cross-instance invalidation with Redis Pub/Sub
class CacheCoordinator {
  constructor(multiLevelCache, redis) {
    this.cache = multiLevelCache;
    this.redis = redis;
    this.channel = 'cache-invalidation';
    this.instanceId = process.env.INSTANCE_ID || require('crypto').randomUUID();
  }

  async setup() {
    // Subscribe to invalidation messages
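    // Pub/sub needs a dedicated connection; duplicate the same raw client
    // that invalidate() uses for publish()
    const subscriber = this.redis.client.duplicate();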
    await subscriber.connect();
    
    await subscriber.subscribe(this.channel, (message) => {
      const { key, sourceInstance } = JSON.parse(message);
      
      // Don't process our own messages
      if (sourceInstance === this.instanceId) return;
      
      // Invalidate local cache
      this.cache.invalidateLocal(key);
      console.log(`Received invalidation for ${key} from ${sourceInstance}`);
    });
  }

  async invalidate(key) {
    // Invalidate locally and in Redis
    await this.cache.invalidate(key);
    
    // Notify other instances
    await this.redis.client.publish(this.channel, JSON.stringify({
      key,
      sourceInstance: this.instanceId
    }));
  }
}

Request Coalescing (Thundering Herd Prevention)

The thundering herd problem is one of those issues that never shows up in testing but can take down production. Here is the scenario: a popular cache key expires. In the milliseconds before the cache is repopulated, 500 concurrent requests all see a cache miss and all hit the database simultaneously. Your database connection pool is exhausted, queries start timing out, and now you have a cascading failure. Request coalescing ensures that only one of those 500 requests actually fetches from the database; the other 499 wait for that one result.
The insight: if 500 requests are asking the database for the same thing at the same time, you only need to ask the database once. Everyone else can share the answer. This is sometimes called “singleflight” (borrowed from Go’s golang.org/x/sync/singleflight). In Python, the natural primitive is an asyncio.Future or asyncio.Event that later arrivals await on while the first arrival does the actual work. In Node.js, it is a shared Promise stored in a Map under the cache key.
Why this matters for microservices specifically: thundering herds cascade. If your product service gets hammered, its database gets hammered, which makes its queries slower, which makes the initial requests time out, which causes retries, which amplifies the problem. Request coalescing breaks this cycle at the source: the database sees one query per service instance instead of 500, and the 499 other callers experience a slight delay rather than a failure.
// Prevent multiple identical requests from hitting the database

class CoalescingCache {
  constructor(cache) {
    this.cache = cache;
    this.pending = new Map();  // In-flight requests
  }

  async get(key, fetchFn, ttl = 3600) {
    // Check cache first
    const cached = await this.cache.get(key);
    if (cached) {
      return JSON.parse(cached);
    }

    // Check if request is already in flight
    if (this.pending.has(key)) {
      console.log(`Coalescing request for ${key}`);
      return this.pending.get(key);
    }

    // Create new request
    const promise = this.fetchAndCache(key, fetchFn, ttl);
    this.pending.set(key, promise);

    try {
      const result = await promise;
      return result;
    } finally {
      this.pending.delete(key);
    }
  }

  async fetchAndCache(key, fetchFn, ttl) {
    const data = await fetchFn();
    
    if (data !== null && data !== undefined) {
      await this.cache.setEx(key, ttl, JSON.stringify(data));
    }
    
    return data;
  }
}

// Usage - even with 100 concurrent requests, only 1 DB query
const cache = new CoalescingCache(redis);

// Simulate 100 concurrent requests for same product
const requests = Array(100).fill().map(() =>
  cache.get('product:123', () => productRepository.findById('123'))
);

const results = await Promise.all(requests);
// Only 1 database query was made!

Cache Warming

Cold caches are a silent killer of newly deployed services. Your system tests passed with a warm cache, so latency looks like 5ms. You deploy. For the first few minutes, the cache is empty, every request is a miss, and latency is 100ms. If that first cold-start traffic is heavy, the database gets hammered and cascades into broader failure. Cache warming preempts this: before the service accepts production traffic, pre-load the cache with data you know will be requested.
What to warm: the Pareto principle applies. A small fraction of cache keys (often 10 to 20 percent) typically serves the large majority of traffic. Identify those hot keys (featured products, popular categories, top-viewed items from yesterday’s logs) and pre-populate them. You do not need to warm everything; the long tail can be populated on demand. Some teams go further with “continuous warming”: a background job periodically refreshes the hot keys so they never expire.
Cache warming is also part of your deployment strategy. If you use Kubernetes rolling updates, new pods should warm their local cache before their readiness check passes, so that no traffic routes to them until then. This way, the load balancer only sends traffic to a pod after its cache is useful. Otherwise, the first pod to receive traffic after a deploy will have a 100% miss rate and degrade user experience until it warms up naturally.
// Pre-populate cache before traffic hits

class CacheWarmer {
  constructor(cache, repository) {
    this.cache = cache;
    this.repository = repository;
  }

  async warmAll() {
    console.log('Starting cache warming...');
    
    await Promise.all([
      this.warmFeaturedProducts(),
      this.warmCategories(),
      this.warmPopularProducts(),
      this.warmConfigurations()
    ]);
    
    console.log('Cache warming complete');
  }

  async warmFeaturedProducts() {
    const products = await this.repository.getFeaturedProducts();
    
    for (const product of products) {
      await this.cache.set(`product:${product.id}`, product, 3600);
    }
    
    await this.cache.set('products:featured', products, 3600);
    console.log(`Warmed ${products.length} featured products`);
  }

  async warmCategories() {
    const categories = await this.repository.getAllCategories();
    
    for (const category of categories) {
      await this.cache.set(`category:${category.id}`, category, 7200);
      
      // Also warm products per category
      const products = await this.repository.getProductsByCategory(category.id);
      await this.cache.set(`products:category:${category.id}`, products, 3600);
    }
    
    await this.cache.set('categories:all', categories, 7200);
    console.log(`Warmed ${categories.length} categories`);
  }

  async warmPopularProducts() {
    // Top 100 most viewed products
    const products = await this.repository.getPopularProducts(100);
    
    for (const product of products) {
      await this.cache.set(`product:${product.id}`, product, 3600);
    }
    
    console.log(`Warmed ${products.length} popular products`);
  }

  async warmConfigurations() {
    const configs = await this.repository.getAllConfigurations();
    
    for (const [key, value] of Object.entries(configs)) {
      await this.cache.set(`config:${key}`, value, 86400);  // 24 hours
    }
    
    console.log(`Warmed ${Object.keys(configs).length} configurations`);
  }
}

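// Create the warmer once so both the startup hook and the timer can use it
const warmer = new CacheWarmer(cache, repository);

// Run on startup
app.on('ready', async () => {
  await warmer.warmAll();
});

// Also run periodically
setInterval(() => {
  warmer.warmPopularProducts().catch((err) => {
    console.error('Periodic cache warming failed:', err.message);
  });
}, 30 * 60 * 1000);  // Every 30 minutes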
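The readiness-check idea from the warming discussion can be sketched as a small gate that reports "not ready" until warming has finished. This is a hedged illustration: `createReadiness` and the `/ready` path are hypothetical, and the handler only relies on the standard Node.js `res.writeHead` and `res.end` methods.

```javascript
// Readiness gate sketch: report "not ready" until cache warming has finished.
function createReadiness(warmFn) {
  let ready = false;

  // Kick off warming once; flip the flag only when it succeeds.
  const warming = Promise.resolve()
    .then(warmFn)
    .then(() => { ready = true; })
    .catch((err) => {
      // Assumption: a failed warm-up keeps the pod out of rotation.
      console.error('Cache warming failed:', err.message);
    });

  // Handler for a readiness probe endpoint (for example GET /ready).
  function handler(req, res) {
    res.writeHead(ready ? 200 : 503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ready }));
  }

  return { handler, warming, isReady: () => ready };
}
```

A Kubernetes readinessProbe pointed at that endpoint would then keep the pod out of the Service until `warmFn` (for example `() => warmer.warmAll()`) resolves. Whether a failed warm-up should block readiness forever or time out is a policy choice this sketch does not make for you.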

Cache Failure Handling

Caveats and Common Pitfalls with Cache Failure
Cache failure is one of the most common causes of large-scale outages at companies with mature microservice architectures. The pitfalls:
  • “Fail open” becomes “fail catastrophic.” The intuitive default is “if cache is down, go to the database.” In a low-traffic system this works. At scale, it converts a cache outage into a database outage in under 60 seconds. Your system assumed a 95 percent cache hit rate and a database sized for 5 percent of total traffic; now the database gets 100 percent and capacity-panics.
  • Cache warming on recovery triggers a second stampede. Redis comes back after a 10-minute outage. Suddenly every service rushes to repopulate, sending a flood of reads to the database and a flood of writes to Redis simultaneously. The recovering cache cluster can actually go down again under the write pressure of being re-warmed. This is a real, repeated failure mode.
  • Local fallback caches grow unbounded and cause OOM kills. Under Redis outage, services fall back to in-process caches. Without size limits, these caches grow until the JVM/Node process hits its memory limit and gets OOM-killed by Kubernetes. The “fallback” makes the outage worse by killing pods.
  • Circuit breaker tuning is invisible until it matters. Teams set the circuit breaker to “open after 50 percent error rate over 10 seconds” and never revisit. During a partial Redis degradation (say, 15 percent error rate), the breaker never opens, and the service accumulates connection timeouts until the connection pool exhausts.
Solutions and Patterns for Cache Failure Resilience
  • Design for the cache-off scenario up front. Load test your system with the cache completely disabled. Whatever latency and throughput you measure is your true floor. Capacity-plan the database to handle at least a significant fraction of peak traffic without the cache, not just the cache-miss rate. This is not free, but it is what keeps cache failure from becoming database failure.
  • Multi-tier caching with bounded local fallback. L1 is a size-bounded in-process LRU cache (typically tens of megabytes, tens of thousands of entries). L2 is Redis. Even during a full Redis outage, L1 absorbs the hot keys and dramatically flattens the spike to the database. Use libraries with explicit size limits: Caffeine (Java), lru-cache (Node), cachetools (Python).
  • Staggered cache warming on recovery. Instead of warming the entire cache at once, warm by key-popularity percentile with jitter. Top 1 percent of keys warmed first over 30 seconds with random delay, next 10 percent over the following minute, and so on. This prevents the recovery thundering herd.
  • Circuit breaker on the cache client, not just the database. When Redis latency climbs past a threshold, stop calling Redis entirely for some fraction of requests and go straight to L1 or the database. Half-open the breaker periodically to test recovery. This prevents slow-but-alive Redis from degrading the service worse than fully-dead Redis would.
  • Treat cache as a capacity multiplier, not a dependency. Every feature that “requires” the cache to function is a latent outage. If a feature cannot work without cache hits, either make the underlying database path faster (indexes, read replicas) or accept that the feature will degrade visibly when the cache fails. Hiding the degradation is worse than degrading visibly.
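The "circuit breaker on the cache client" bullet above can be sketched as follows. This is a simplified illustration, not a tuned implementation: `CacheCircuitBreaker` and its options are hypothetical, it counts consecutive failures rather than an error rate, it skips the cache for all requests while open (not a fraction), and in the half-open state it lets every caller through rather than a single trial call.

```javascript
// Circuit breaker sketch for cache calls: closed -> open -> half-open -> closed.
class CacheCircuitBreaker {
  constructor({ failureThreshold = 5, slowMs = 50, openMs = 10_000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.slowMs = slowMs;                     // calls slower than this count as failures
    this.openMs = openMs;                     // how long to stay open before trial calls
    this.now = now;                           // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;                     // null means the breaker is closed
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.openMs ? 'half-open' : 'open';
  }

  // Runs cacheCall unless the breaker is open; uses fallback on skip or error.
  async call(cacheCall, fallback) {
    if (this.state === 'open') return fallback();

    const startedAt = this.now();
    try {
      const result = await cacheCall();
      // A slow success still returns its result but counts against the cache.
      if (this.now() - startedAt > this.slowMs) this.recordFailure();
      else this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      return fallback();
    }
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null; // a successful trial call closes the breaker
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}
```

Usage would look like `breaker.call(() => redis.get(key), () => localCache.get(key))`, where both callbacks are placeholders for your own clients. Counting slow calls as failures is what lets the breaker react to a slow-but-alive Redis, which a pure error-rate threshold would miss.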
When Redis dies, what happens to your service? The honest answer for most teams is “we find out.” Designing for cache failure up front costs a bit more complexity, but it means your service survives cache outages instead of cascading into a full outage. The goal is graceful degradation: when Redis is healthy, serve from cache (fast). When Redis is unhealthy, serve from the database or a local fallback (slower, but still working). Never let a cache outage become an application outage.
The resilient cache below uses three techniques working together. First, health monitoring: a background ping tells us when Redis comes and goes. Second, circuit-breaker-style behavior: when Redis is known to be down, do not even try to call it; skip straight to the fallback. Third, a local memory fallback: keep a small in-process cache of recently-seen values, so that during a Redis outage, popular keys continue to be served from local memory while the long tail falls through to the database.
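// Graceful degradation when cache fails

// Map with a hard entry limit: evicts the oldest entry when full,
// so the local fallback cannot grow until the process is OOM-killed
class BoundedMap extends Map {
  constructor(maxEntries) {
    super();
    this.maxEntries = maxEntries;
  }

  set(key, value) {
    if (this.has(key)) {
      this.delete(key);  // Re-insert so the key counts as newest
    } else if (this.size >= this.maxEntries) {
      this.delete(this.keys().next().value);  // Map iterates oldest-first
    }
    return super.set(key, value);
  }
}

class ResilientCache {
  constructor(redis, options = {}) {
    this.redis = redis;
    this.fallbackTTL = options.fallbackTTL || 60;  // 1 minute local fallback
    this.localFallback = new BoundedMap(options.maxFallbackEntries || 10000);
    this.isHealthy = true;
    
    // Monitor health
    this.startHealthCheck();
  }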

  startHealthCheck() {
    setInterval(async () => {
      try {
        await this.redis.client.ping();
        if (!this.isHealthy) {
          console.log('Redis connection restored');
          this.isHealthy = true;
          this.localFallback.clear();  // Clear stale fallback data
        }
      } catch (error) {
        if (this.isHealthy) {
          console.error('Redis connection lost:', error.message);
          this.isHealthy = false;
        }
      }
    }, 5000);  // Check every 5 seconds
  }

  async get(key) {
    if (!this.isHealthy) {
      // Use local fallback
      const fallback = this.localFallback.get(key);
      if (fallback && fallback.expires > Date.now()) {
        return fallback.value;
      }
      return null;
    }

    try {
      const data = await this.redis.get(key);
      
      if (data) {
        // Update local fallback
        this.localFallback.set(key, {
          value: data,
          expires: Date.now() + (this.fallbackTTL * 1000)
        });
      }
      
      return data;
    } catch (error) {
      console.error(`Cache get error for ${key}:`, error.message);
      this.isHealthy = false;
      
      // Try local fallback, honoring its expiry like the unhealthy path does
      const fallback = this.localFallback.get(key);
      return fallback && fallback.expires > Date.now() ? fallback.value : null;
    }
  }

  async set(key, value, ttl = 3600) {
    if (!this.isHealthy) {
      // Store in local fallback only
      this.localFallback.set(key, {
        value,
        expires: Date.now() + (Math.min(ttl, this.fallbackTTL) * 1000)
      });
      return false;
    }

    try {
      await this.redis.setEx(key, ttl, value);
      
      // Also update local fallback
      this.localFallback.set(key, {
        value,
        expires: Date.now() + (this.fallbackTTL * 1000)
      });
      
      return true;
    } catch (error) {
      console.error(`Cache set error for ${key}:`, error.message);
      this.isHealthy = false;
      
      // Store in local fallback
      this.localFallback.set(key, {
        value,
        expires: Date.now() + (this.fallbackTTL * 1000)
      });
      
      return false;
    }
  }

  getHealth() {
    return {
      healthy: this.isHealthy,
      fallbackSize: this.localFallback.size
    };
  }
}

Interview Questions

Answer: Cache-Aside (Lazy Loading):
  1. Application checks cache first
  2. On miss, fetch from database
  3. Store in cache for future requests
let data = await cache.get(key);  // let, because a miss reassigns it
if (!data) {
  data = await db.fetch(key);
  await cache.set(key, data);
}
return data;
Pros:
  • Only cache what’s needed
  • Cache failures don’t break reads
  • Simple to implement
Cons:
  • First request is slow (cache miss)
  • Potential stale data
  • Three round trips on miss
Use for: Most read-heavy scenarios
Answer: Strategies:
  1. TTL-based: Set expiration, accept staleness
  2. Event-based: Publish events on writes, subscribers invalidate
  3. Tag-based: Group related entries, invalidate by tag
  4. Version-based: Include version in key, change on update
Best Practice for Microservices:
  • Publish events when data changes
  • Each service invalidates its own cache
  • Use short TTLs as safety net
// On update
await db.update(product);
await eventBus.publish('product.updated', { productId });

// Consumers invalidate their caches
eventBus.on('product.updated', (e) => cache.del(`product:${e.productId}`));
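Answer: Problem: When a cache entry expires, many concurrent requests hit the database simultaneously.
Solutions:
  1. Request Coalescing: Single request, others wait
if (pending.has(key)) return pending.get(key);
// Remove the entry once the fetch settles so later misses fetch again
const promise = fetchFromDB(key).finally(() => pending.delete(key));
pending.set(key, promise);
return promise;
  2. Probabilistic Early Expiration: Randomly refresh before TTL
  3. Background Refresh: Refresh in background before expiry
  4. Locking: Only one process refreshes cache
if (await lock.acquire(key)) {
  try {
    const data = await fetchFromDB(key);
    await cache.set(key, data);
  } finally {
    // Always release, even if the fetch or the cache write throws
    await lock.release(key);
  }
}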
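Probabilistic early expiration is listed above without an example, so here is a minimal sketch. It uses a commonly cited formulation in which an entry is refreshed when `now - delta * beta * ln(rand) >= expiresAt`, where `delta` is how long the last recomputation took. `EarlyExpiryCache`, `beta`, and the injectable `now` and `random` are hypothetical names, and the store is an in-memory Map.

```javascript
// Probabilistic early expiration sketch (in-memory; all names are illustrative).
class EarlyExpiryCache {
  constructor({ beta = 1, now = () => Date.now(), random = Math.random } = {}) {
    this.beta = beta;         // > 1 refreshes earlier, < 1 refreshes later
    this.now = now;
    this.random = random;     // injectable for deterministic tests
    this.entries = new Map(); // key -> { value, expiresAt, delta }
  }

  shouldRefresh(entry) {
    // 1 - random() lies in (0, 1], so the logarithm is finite and <= 0,
    // which makes the jitter a non-negative number of milliseconds.
    const jitter = -entry.delta * this.beta * Math.log(1 - this.random());
    return this.now() + jitter >= entry.expiresAt;
  }

  async get(key, fetchFn, ttlMs) {
    const entry = this.entries.get(key);
    if (entry && !this.shouldRefresh(entry)) return entry.value;

    const startedAt = this.now();
    const value = await fetchFn();
    this.entries.set(key, {
      value,
      expiresAt: this.now() + ttlMs,
      delta: this.now() - startedAt, // how long the recomputation took
    });
    return value;
  }
}
```

The closer an entry gets to expiry, and the slower it is to recompute, the more likely a given request refreshes it early, so refreshes spread out instead of piling up at the TTL boundary. This sketch does not coalesce concurrent refreshes; combining it with request coalescing covers that case.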
Answer: Write-Through:
  • Write to cache AND database in same operation
  • Strong consistency between cache and database
  • Slower writes (two writes)
Use for: Data that must be consistent, user profiles
Write-Behind (Write-Back):
  • Write to cache first, async to database
  • Fast writes
  • Risk of data loss if cache fails
Use for: Analytics, metrics, high-frequency updates
Decision factors:
  • Consistency requirements
  • Write frequency
  • Tolerance for data loss
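The two write paths compared above can be sketched side by side. This is an illustration under assumptions: `cache` and `db` are hypothetical stores that expose an async `set(key, value)`, and writing the database before the cache in the write-through path is one reasonable ordering, not the only one.

```javascript
// Write-through vs write-behind sketch. `cache` and `db` are hypothetical
// stores exposing async set(key, value); swap in real clients as needed.
class WriteThroughStore {
  constructor(cache, db) {
    this.cache = cache;
    this.db = db;
  }

  async write(key, value) {
    await this.db.set(key, value);    // source of truth first
    await this.cache.set(key, value); // then the cache, before acknowledging
  }
}

class WriteBehindStore {
  constructor(cache, db) {
    this.cache = cache;
    this.db = db;
    this.dirty = new Map(); // key -> latest value not yet persisted
  }

  async write(key, value) {
    await this.cache.set(key, value); // acknowledge after the cache write only
    this.dirty.set(key, value);       // repeated writes to a key collapse into one
  }

  // Call on a timer or when the buffer grows. Anything still buffered
  // is lost if the process dies before the flush.
  async flush() {
    const batch = [...this.dirty];
    this.dirty.clear();
    for (const [key, value] of batch) {
      try {
        await this.db.set(key, value);
      } catch (err) {
        // Re-queue unless a newer value arrived while we were flushing.
        if (!this.dirty.has(key)) this.dirty.set(key, value);
      }
    }
    return batch.length;
  }
}
```

The `dirty` buffer is where the data-loss risk lives: the write was acknowledged to the caller, but the database has not seen it yet. That is acceptable for counters and metrics and usually not for orders or payments.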
Answer: Challenges:
  • Each instance has local cache
  • Invalidation must reach all instances
  • Data inconsistency between instances
Solutions:
  1. Distributed Cache Only (Redis)
    • No local cache
    • Always consistent
    • Higher latency
  2. Multi-Level Cache + Pub/Sub
    • L1: Local (fast)
    • L2: Redis (shared)
    • Invalidate via pub/sub
  3. Short TTL for Local Cache
    • Local cache: 60s
    • Redis cache: 1 hour
    • Accept brief inconsistency
// Pub/sub invalidation
redis.subscribe('cache-invalidate', (key) => {
  localCache.del(key);
});

Chapter Summary

Key Takeaways:
  • Choose the right caching pattern (cache-aside, write-through, write-behind)
  • Implement robust invalidation strategies
  • Handle cache failures gracefully with fallbacks
  • Use multi-level caching for performance
  • Prevent thundering herd with request coalescing
  • Warm caches proactively for predictable latency
Next Chapter: Load Balancing - Distribution strategies for microservices.

Interview Deep-Dive

Strong Answer: This is the thundering herd problem (also called cache stampede). When the cache is unavailable, every request that would have been a cache hit becomes a database query. If your cache hit rate is 95%, you suddenly have 20x the database load (going from 5% to 100% of requests hitting the database). The database cannot handle 20x its normal load and collapses.
Prevention requires multiple layers.
First, circuit breaker on the database: if database connection pool utilization exceeds 80%, start rejecting new queries with a 503 rather than queuing them until the database runs out of connections and crashes. This is the “shed load gracefully” principle: some failed requests are better than a total system crash.
Second, fallback to stale cache data. When Redis goes down, do not immediately go to the database. Use a local in-memory cache (LRU cache in the process) as a second tier. The local cache has a shorter TTL and smaller capacity, but it absorbs the most frequent queries. Even a 50% local hit rate halves the database load during a Redis outage.
Third, request coalescing (also called “single flight”). When multiple concurrent requests ask for the same cache key and all get a miss, only one request should query the database. The others wait for the first request to complete and share its result. This prevents 100 concurrent requests for the same popular product from all hitting the database simultaneously.
Fourth, cache warming on recovery. When Redis comes back, do not wait for natural traffic to repopulate the cache. Run a warming script that pre-loads the top 1,000 most frequently accessed keys from the database. This prevents a second thundering herd when traffic shifts back from the local cache to the recovering Redis.
Follow-up: “How do you design the local in-memory cache to avoid memory issues in a containerized environment?”
The local cache must be size-bounded (not TTL-bounded) to prevent OOM kills. I use an LRU (Least Recently Used) cache with a fixed maximum number of entries. For a Node.js service with 512MB memory limit, I allocate at most 50MB for the local cache (roughly 10,000-50,000 entries depending on value size). The cache uses a Map with a doubly-linked list for LRU eviction. Libraries like lru-cache in Node.js handle this efficiently. Monitoring is critical: I expose the cache hit rate and size as Prometheus metrics so I can tell whether the local cache is actually helping during Redis outages.
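
As a rough illustration of the size-bounded local tier described above, here is a minimal sketch in plain JavaScript. It relies on the insertion order of the built-in Map instead of an explicit doubly-linked list, which is a simplification. The class name BoundedLruCache and its stats() method are hypothetical and are not the API of the lru-cache package.

```javascript
// Size-bounded LRU cache: evicts the least recently used entry once
// maxEntries is exceeded, so memory stays bounded regardless of TTL.
class BoundedLruCache {
  constructor(maxEntries) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError("maxEntries must be a positive integer");
    }
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map iterates in insertion order (oldest first)
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    if (!this.entries.has(key)) {
      this.misses += 1;
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value);
    this.hits += 1;
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  // Numbers worth exporting as metrics (hit rate and current size).
  stats() {
    const lookups = this.hits + this.misses;
    return { size: this.entries.size, hitRate: lookups === 0 ? 0 : this.hits / lookups };
  }
}
```

Bounding by entry count only approximates a memory budget. If value sizes vary widely, a byte-based limit is the safer choice.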
Strong Answer: This is the distributed cache invalidation problem, and it has four solutions with different trade-off profiles.
Option one: event-driven invalidation. When Service B updates a profile, it publishes a UserProfileUpdated event to Kafka. Service A subscribes to this event and invalidates (or updates) the corresponding cache entry. Staleness window: seconds (the time for the event to propagate through Kafka). This is the approach I recommend for most cases because it is decoupled and reliable.
Option two: write-through from Service B to Service A’s cache. When Service B updates a profile, it also writes the updated data directly to Service A’s Redis cache (or sends a cache invalidation message). Staleness window: milliseconds. But it creates coupling: Service B needs to know about Service A’s caching strategy.
Option three: shorter TTL. Reduce the cache TTL from 1 hour to 30 seconds. Staleness window: 30 seconds maximum. Simple but increases database load by 120x (one query every 30 seconds versus every hour per user). Only viable if the dataset is small or the database can handle the load.
Option four: cache-aside with version checking. Service A stores the cache entry with a version number. On every read, Service A makes a lightweight “version check” call to Service B (just returns the current version, not the full profile). If the version matches, use the cache. If not, fetch fresh data. This is a hybrid between caching and fresh reads.
For user profiles specifically, I use option one (event-driven invalidation) because profile updates are infrequent (most users update their profile rarely), so the event volume is manageable, and the seconds of propagation delay is acceptable. For pricing data (which changes frequently and staleness has financial impact), I would use option four with version checking.
Follow-up: “What if the Kafka event is delayed or lost? Service A still serves stale data.”
I add a maximum TTL as a safety net. Even with event-driven invalidation, the cache entry has a TTL of 1 hour. If the invalidation event is lost, the worst case is 1 hour of stale data, not infinite staleness. For critical data where even 1 hour is too long, I add a “cache verification” background job that periodically samples cached entries and verifies them against the source. Any stale entries are invalidated. This belt-and-suspenders approach ensures that event loss does not lead to permanently stale cache entries.
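
The combination of event-driven invalidation with a maximum TTL can be sketched roughly as follows. This is an in-memory stand-in, not a Redis or Kafka integration. The ProfileCache class, the injectable clock, and the `{ userId }` event shape are assumptions made for illustration.

```javascript
// Cache that combines event-driven invalidation with a maximum TTL,
// so a lost event causes bounded staleness instead of permanent staleness.
class ProfileCache {
  constructor(maxTtlMs, now = () => Date.now()) {
    this.maxTtlMs = maxTtlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // userId -> { profile, expiresAt }
  }

  put(userId, profile) {
    this.entries.set(userId, { profile, expiresAt: this.now() + this.maxTtlMs });
  }

  get(userId) {
    const entry = this.entries.get(userId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Safety net: expire even if no invalidation event ever arrived.
      this.entries.delete(userId);
      return undefined;
    }
    return entry.profile;
  }

  // Handler for a UserProfileUpdated event; the payload shape is assumed.
  onUserProfileUpdated(event) {
    this.entries.delete(event.userId);
  }
}
```

In a real service the handler would be wired to the message consumer, and the TTL would be set on the Redis entry itself rather than tracked in process memory.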
Strong Answer: Cache-aside (lazy loading): the application checks the cache first, fetches from the database on miss, and populates the cache. The application manages the cache explicitly. I use this for most read-heavy workloads because it is the safest: if the cache goes down, the application still works (it just hits the database). The downside is the first request for any data is always slow (cache miss), and there is a consistency window between a write and the next cache read.
Write-through: the application writes to both the cache and the database simultaneously (or the cache writes to the database on behalf of the application). Every write immediately updates the cache, so reads are always fresh. I use this for data that is read immediately after writing. For example, after a user updates their profile, the next page load should show the updated profile. The downside: write latency increases because you are writing to two stores, and if the cache is down, writes fail (unless you add fallback logic).
Write-behind (write-back): the application writes to the cache, and the cache asynchronously writes to the database later. This gives the lowest write latency because the application only waits for the cache write (microseconds) not the database write (milliseconds). I use this for high-throughput writes where some data loss is acceptable, for example page view counters, analytics events, or session updates. The risk: if the cache crashes before flushing to the database, those writes are lost. This is not acceptable for financial data.
In a microservices architecture, I use cache-aside for 80% of caching needs (product catalogs, user profiles, configuration). Write-through for the session service (session updates must be immediately visible across all service instances). Write-behind for the analytics service (high write volume, eventual consistency is fine, losing a few page view events is acceptable).
The pattern I explicitly avoid: read-through caching at the database level (like MySQL query cache). It adds complexity inside the database, is difficult to invalidate correctly, and was removed from MySQL 8.0 for good reason. Application-level caching with Redis gives you more control over invalidation, TTL, and monitoring.
Follow-up: “How do you handle cache warming for a new service deployment that starts with an empty cache?”
Cold start is a real problem: the first few minutes after deployment have 0% cache hit rate, which means 100% database load. I use two strategies. First, pre-warming: during the deployment’s readiness check, the service loads the top N most-accessed keys from a pre-computed list (based on yesterday’s access patterns). Second, gradual traffic shift: using Kubernetes rolling updates with maxSurge: 1, new pods receive traffic gradually while old pods (with warm caches) still handle most requests. By the time the old pods are terminated, the new pods have warmed up through natural traffic.
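
To make the difference between the three patterns concrete, here is a small sketch that uses in-memory Maps as stand-ins for Redis and the database. All function names are hypothetical, and the synchronous calls hide the network latency a real system would have.

```javascript
// In-memory stand-ins for Redis (cache) and the database (db).
function createStores() {
  return { cache: new Map(), db: new Map(), pendingWrites: new Map() };
}

// Cache-aside: read the cache, fall back to the database, then populate.
function readCacheAside(stores, key) {
  if (stores.cache.has(key)) return stores.cache.get(key);
  const value = stores.db.get(key); // slow path on a miss
  if (value !== undefined) stores.cache.set(key, value);
  return value;
}

// Write-through: update the database and the cache in the same operation.
function writeThrough(stores, key, value) {
  stores.db.set(key, value);
  stores.cache.set(key, value);
}

// Write-behind: update the cache now and queue the database write for later.
function writeBehind(stores, key, value) {
  stores.cache.set(key, value);
  stores.pendingWrites.set(key, value); // lost if the process dies before a flush
}

// Flush queued writes to the database (would run on a timer in practice).
function flushWriteBehind(stores) {
  for (const [key, value] of stores.pendingWrites) stores.db.set(key, value);
  const flushed = stores.pendingWrites.size;
  stores.pendingWrites.clear();
  return flushed;
}
```

The window between writeBehind and flushWriteBehind is exactly where the data-loss risk mentioned above lives.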

Interview Questions with Structured Answers

Strong Answer Framework
  1. Stabilize first, debug second. If latency is user-visible and the database is near capacity, roll back the deploy immediately. Debugging a live incident on a cold cache is a bad trade. Only investigate on the running-but-rolled-back state or in a reproduction environment.
  2. Pull the three obvious signals in parallel. Cache keyspace size (did total keys drop or spike?), cache hit rate broken down by key pattern (is it one pattern or everything?), and database query mix (which queries are hitting the database that were previously cached?). These three answer “what changed” in under five minutes.
  3. Hypothesis one: cache key format changed. The most common cause. The new deploy includes a refactor that changed how keys are constructed — maybe added a user locale, a feature flag, or a version prefix. Every old cache entry is now orphaned; every new request is a miss. Diagnosis: grep recent PRs for cache key construction, diff the key format with redis-cli --scan --pattern samples, verify that new keys look different from old keys.
  4. Hypothesis two: TTL was accidentally shortened. Someone changed 300 to 30 in a config or env var. Every entry expires in 30 seconds instead of 5 minutes, so the hit rate craters. Diagnosis: check recent config changes, log cache-set calls with their TTL, compare current TTL distribution to pre-deploy baseline.
  5. Hypothesis three: serialization changed. The deploy moved from JSON to Protobuf, or changed a schema version prefix. New writers encode the new format; old readers cannot deserialize it and treat it as a miss. Diagnosis: look for deserialize errors in logs, check for a recent library upgrade that changed default serialization, verify round-trip for a single known key.
  6. Hypothesis four (often missed): cache sharding / routing changed. A deploy updated the Redis client library and the consistent-hashing algorithm subtly changed. Keys that used to live on shard A now live on shard B; shard B has empty cache; every lookup misses. Diagnosis: check the Redis cluster slot distribution before and after, and verify the hash output for a sample key (CRC16 for Redis Cluster hash slots, CRC32 or similar for client-side sharding).
  7. Finally, long-tail hypotheses. Eviction policy change (allkeys-lru to volatile-lru), memory limit lowered so entries get evicted aggressively, new traffic pattern (a new feature generates high-cardinality keys that blow out the cache), or a bug where the service is writing but never reading (cache key used on read path differs from write path).
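
For the routing hypothesis, one way to check a sample key by hand is to compute its slot directly. Redis Cluster maps a key to one of 16384 slots using CRC16 of the key modulo 16384. The sketch below ignores hash tags (the `{...}` syntax), and the movedKeys helper is a hypothetical convenience for comparing two routing functions.

```javascript
// CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), the checksum
// Redis Cluster uses for key-to-slot mapping.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
      crc &= 0xffff; // keep the register at 16 bits
    }
  }
  return crc;
}

// Slot for a key, ignoring hash tags ({...}) for simplicity.
function clusterSlot(key) {
  return crc16(Buffer.from(key, "utf8")) % 16384;
}

// Return the sample keys that two routing functions place differently.
function movedKeys(sampleKeys, oldRoute, newRoute) {
  return sampleKeys.filter((key) => oldRoute(key) !== newRoute(key));
}
```

Running movedKeys over a few hundred real keys, with the old and new client's routing functions, gives a quick estimate of how much of the cache a library upgrade would orphan.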
Real-World Example
Facebook had a widely-discussed incident around 2019 where a memcached client library upgrade silently changed key hashing. Hit rates across multiple services dropped to roughly their steady-state miss rates, and databases were overwhelmed within minutes. The fix was to revert the library. The postmortem lesson was that cache key transformations must be treated as breaking changes with explicit migration paths, not invisible implementation details. Similar stories have been published by LinkedIn (Kafka-related cache invalidation), Twitter (timeline cache re-sharding), and Shopify (Rails cache key fingerprints changing across framework upgrades).
Senior Follow-up Questions
Q: “What if the hit rate drop is not uniform, but only affects a subset of users?”
This is a strong signal for a cache-key or serialization bug scoped to a specific user cohort. Check whether the affected users share some attribute: locale, feature flag state, account tier, geographic region. If EU users are affected but US users are not, the bug is probably in the EU-specific code path or in a flag that routes EU traffic differently. Split the hit rate metric by the suspected dimension and confirm before fixing. Partial-impact bugs often indicate a feature flag rollout that was assumed to be safe but changed cache semantics.
Q: “Your database is dying while you debug. What do you do right now?”
Shed load. Temporarily disable the feature that is generating the uncached traffic if possible, or serve stale-but-acceptable data from a secondary cache tier, or rate-limit the affected endpoint at the API gateway. The principle: reduce the load the database is seeing before you fix the root cause. A failed endpoint serving 429s is better than an overloaded database that cascades into platform-wide failure. Once the database is stable, debug the cache issue offline. Many engineers freeze here because shedding load “feels wrong”; doing nothing is worse.
Q: “You roll back the deploy and hit rate recovers. How do you prevent this specific class of bug in future deploys?”
Two controls. First, make cache-key and cache-serialization changes require explicit migration code that can run against existing entries, and gate deploys that would orphan more than some threshold of cached keys behind a manual approval. Second, add a canary for cache hit rate as a first-class deploy guardrail: if the new version running at 1 percent traffic shows a more-than-a-few-percent drop in hit rate versus the baseline, the deploy auto-rolls back before reaching 100 percent. Treating cache hit rate as a release-blocking SLO catches these bugs during canary, not during full rollout.
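
The canary guardrail could be expressed as a small comparison like the one below. The 3-point threshold and the 1,000-lookup minimum are assumed policy values, and both function names are hypothetical. A real deploy pipeline would read these counters from its metrics system.

```javascript
// Fraction of lookups that were hits; 0 when there is no traffic yet.
function hitRate(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// Decide whether the canary's hit rate has regressed enough to roll back.
// maxDropPoints is the tolerated drop in percentage points (assumed policy).
function shouldRollBack(baseline, canary, maxDropPoints = 3, minLookups = 1000) {
  const canaryLookups = canary.hits + canary.misses;
  if (canaryLookups < minLookups) return false; // not enough data to judge yet
  const baselineRate = hitRate(baseline.hits, baseline.misses);
  const canaryRate = hitRate(canary.hits, canary.misses);
  return (baselineRate - canaryRate) * 100 > maxDropPoints;
}
```

The minimum-lookup check matters because a freshly started canary always begins with a cold cache, so its first few seconds of traffic would otherwise trigger a false rollback.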
Common Wrong Answers
  • “I would increase the cache size to compensate.” This misdiagnoses the symptom as the cause. If keys are being generated differently or serialized incorrectly, a bigger cache does not help — you will fill it with the wrong things. Scaling up before diagnosing wastes money and delays the real fix.
  • “I would lower the TTL to refresh everything sooner.” Lowering the TTL makes the problem worse. The hit rate is low because entries are missing or malformed; refreshing more often means more database load, not less. The correct instinct under a hit-rate crash is to stabilize and diagnose, not tune.
Further Reading
  • “Scaling Memcache at Facebook” (Nishtala et al., NSDI 2013) — foundational paper on operating cache at scale and the failure modes to expect.
  • “On Consistent Hashing and Random Trees” (Karger et al., 1997) — the algorithmic underpinnings of why a client-library upgrade can silently break your cache topology.
  • Netflix Tech Blog posts on EVCache — practical coverage of cross-region cache consistency and the debugging stories behind their current architecture.
Strong Answer Framework
  1. Name the failure modes up front. At the moment of TTL expiration, multiple concurrent misses create three classical bugs: thundering herd on the source database, cache stampede compounding under load, and (if writes are involved) a race where one user’s write overwrites another user’s fresh cache entry.
  2. Address thundering herd with request coalescing. Implement single-flight. When N concurrent requests miss for the same key, only one queries the database. Others wait on a shared promise or channel. With in-process coalescing, database queries per key drop from N to 1 per service instance, regardless of concurrency level. Libraries: Go singleflight.Group (golang.org/x/sync/singleflight), Node promise-memoize, Python aiocache with its cached_stampede decorator.
  3. Layer on probabilistic early expiration. Before natural TTL, start refreshing probabilistically (the XFetch algorithm). A small random fraction of requests triggers an async refresh while still serving the current cached value. By the time natural expiration hits, the value has usually been refreshed already.
  4. Address the concurrent-write race with cache versioning. If the page content can be updated between A’s miss and B’s miss, A’s slow computation could overwrite B’s fresh value. Attach a version (ETag, monotonic counter) to each cache write. SET key value EX ttl NX (Redis) only writes when the key is absent, which stops a slow fill from clobbering an entry that is already there. To reject writes that are older than the cached version, use a compare-and-swap operation that checks the version before writing.
  5. Address cache poisoning. If the source returns a 500 error or malformed response, do not cache it. Validate the response shape before caching. For transient errors, consider a short negative-cache TTL (seconds, not minutes) so a broken backend does not poison reads for the whole TTL window.
  6. Plan the failure mode. What happens if the source is slow (10 seconds instead of 50 ms)? The single-flight request blocks; all other waiters block with it. Add a timeout on the source call and a fallback: if the source is slow, serve the stale cache entry past its TTL (stale-while-revalidate) while logging the backend degradation.
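
The request-coalescing step from the list above can be sketched in a few lines of JavaScript. This is an in-process version only, so it coalesces requests within one service instance. The SingleFlight class name and its run method are hypothetical.

```javascript
// Single-flight: concurrent callers asking for the same key share one
// in-flight promise, so the loader runs once per key at a time.
class SingleFlight {
  constructor() {
    this.inFlight = new Map(); // key -> Promise for the pending load
  }

  run(key, loader) {
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const promise = Promise.resolve()
      .then(() => loader(key))
      .finally(() => this.inFlight.delete(key)); // allow a fresh load next time
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

A caller would use it on the miss path, for example `flights.run("product:42", loadProductFromDb)`, where loadProductFromDb is whatever async function queries the source. Because all waiters share one promise, a rejected load rejects for every waiter, so a timeout on the loader is worth adding in practice.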
Real-World Example
Fastly and Cloudflare both publicly document their stale-while-revalidate and origin-shielding behavior for exactly this case. When a popular URL expires at the edge, Fastly’s shield PoP coalesces origin requests across all edge PoPs so the origin sees exactly one request regardless of global concurrency. This is the same pattern applied at the CDN layer. Redis itself, at the application cache layer, does not do this for you; you have to implement coalescing in the client.
Senior Follow-up Questions
Q: “What if the cached value is personalized per user? Does coalescing still work?”
Coalescing only works when the cache key is shared. For per-user data, each user has a distinct key, so coalescing across users does not apply. Instead, the relevant protection is against a single user’s rapid retries: if a user refreshes the same page 20 times in a second while their cache is missing, coalesce their own concurrent requests. For truly per-user cold starts, the volume is bounded by user count, and the thundering herd concern is lower. Thundering herd is mostly a problem for shared hot keys.
Q: “Under probabilistic early expiration, you are doing more backend work than strictly necessary. How do you tune the parameter?”
The XFetch algorithm has a parameter (often called beta) that controls how aggressively to pre-refresh. Too low: you refresh exactly at TTL and get stampedes. Too high: you refresh every request and defeat the point of caching. In practice, you want the expected number of pre-refreshes per key per TTL window to be around 1-2. Tune by observing: if you see stampede symptoms (latency spike at exact TTL expiration), increase beta. If you see the backend doing unnecessarily high work during steady state, decrease it. A beta around 1.0 is a typical starting point.
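
The XFetch decision rule is compact enough to write out. The sketch below follows the published rule (refresh when now - delta * beta * ln(rand) >= expiry), where delta is how long the last recomputation took. The function name, the parameter object, and the injectable random source are assumptions for illustration.

```javascript
// XFetch-style probabilistic early expiration.
// deltaMs: duration of the last recomputation; beta: aggressiveness knob.
function shouldRefreshEarly({ nowMs, expiresAtMs, deltaMs, beta = 1.0, random = Math.random }) {
  // 1 - random() lies in (0, 1], which keeps Math.log finite.
  const u = 1 - random();
  // -log(u) is >= 0, so this adds a random head start that grows
  // with both the recomputation cost and beta.
  return nowMs - deltaMs * beta * Math.log(u) >= expiresAtMs;
}
```

With beta at 0 the function only returns true at or after expiry, which is the stampede-prone behavior described above. Raising beta widens the window before expiry in which a request may volunteer to refresh.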
Q: “What if the data being cached changes mid-computation, say a product’s price updates while you are mid-fetch?”
This is the cache invalidation race. The defensive pattern: include the source’s version or last-modified timestamp in the cached value. On write, use compare-and-swap: only write if the version you just fetched is newer than the version currently cached. If the current cached version is newer (because someone else updated in between), discard your value and re-read. Redis Lua scripts are a good fit because they give you atomicity across the get-check-set sequence. Without this, you can end up with the cache permanently one version behind reality.
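
The version check itself is simple; the hard part in production is making it atomic. The sketch below shows only the comparison logic against an in-memory Map. The VersionedCache class is hypothetical, and in Redis the same get-check-set sequence would have to run atomically on the server.

```javascript
// Versioned cache write: accept a value only if it is newer than what is cached.
class VersionedCache {
  constructor() {
    this.entries = new Map(); // key -> { version, value }
  }

  // Returns true if the write was applied, false if it was stale.
  setIfNewer(key, version, value) {
    const current = this.entries.get(key);
    if (current && current.version >= version) return false; // stale write, discard
    this.entries.set(key, { version, value });
    return true;
  }

  get(key) {
    const entry = this.entries.get(key);
    return entry ? entry.value : undefined;
  }
}
```

A false return is the signal to discard the computed value and re-read, as described above.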
Common Wrong Answers
  • “I would use a shorter TTL so the window of inconsistency is smaller.” Shortening the TTL increases the frequency of cache misses and therefore increases the thundering herd problem. The issue is not the TTL length; it is the concurrent behavior at expiration. Coalescing and early expiration address the real bug.
  • “I would lock the key in Redis for the duration of the backend fetch.” Distributed locking has its own failure modes (lock holder dies, TTL on lock expires mid-fetch, network partitions). In-process single-flight (coalescing within one service instance) is usually enough, and it does not introduce a new distributed correctness problem. Redlock is rarely justified for cache-fill races.
Further Reading
  • “Optimal Probabilistic Cache Stampede Prevention” (Vattani, Chierichetti, Lowenstein) — the foundational XFetch paper.
  • Fastly’s documentation on stale-while-revalidate and origin shielding, a worked example of stampede prevention at the CDN layer.
  • “Caches: the price of admission” on martinfowler.com — broader treatment of caching trade-offs with stampede prevention as one of several concerns.
Strong Answer Framework
  1. Name this clearly: personalization bleed / cache key collision. Two logically different values ended up under the same cache key, and one user’s request returned another user’s data. This is simultaneously a correctness bug, a privacy issue (if recommendations reveal anything about the other user), and potentially a regulated incident under GDPR or CCPA.
  2. Escalate appropriately. If any PII or sensitive behavioral data is being leaked across users, trigger the security incident process. Even “innocent” recommendation data can encode protected categories (religion, health) via what the user has viewed. Do not debug silently; inform the security and privacy teams.
  3. Hypothesize the root cause. Most likely: the cache key did not include the user_id, or included only a hashed session-scoped value that collided. Possible variations: the key used a tenant id and forgot the user, or used a user id that happens to reset on logout and be reused.
  4. Stop the immediate leak. Flip a feature flag to bypass the cache and serve all recommendations directly from the source. Latency will increase; correctness is restored. Do this before continuing diagnosis. Never continue to serve wrong data while you debug.
  5. Audit the cache contents. Scan a sample of entries. For each, compare “who this was computed for” (from the original request context, traceable via logs) versus “who this is keyed under” (the cache key). Any mismatch is an instance of the bug.
  6. Fix the cache key permanently. The cache key must include every dimension of variation. For personalized data, that always includes the user identifier (ideally a stable, server-issued user id, not a session id). Consider adding a namespace prefix per service and a schema version, so accidental collisions across services or across key-schema changes are impossible.
  7. Add guardrails. Validate cache responses against the request context at read time: the cached payload includes a “for_user_id” field, and the read path rejects it if it does not match the requesting user. This is belt-and-suspenders defense against future cache-key regressions.
  8. Close out with a postmortem. Publish what happened, which users were affected, what data was exposed, what the fix is, and what detection (canary, monitoring, contract test) will prevent a recurrence.
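
The read-time guardrail from the list above might look roughly like this. The for_user_id field comes from the text; the key format, the function names, and the onMismatch callback are assumptions, and a Map stands in for the real cache client.

```javascript
// Build the key from a namespace, a schema version, and the user id.
function recsKey(userId) {
  if (typeof userId !== "string" || userId.length === 0) {
    throw new TypeError("userId is required to build a recommendations key");
  }
  return `recs-service:v1:user:${userId}`;
}

// Store the payload together with the user it was computed for.
function writeRecs(cache, userId, items) {
  cache.set(recsKey(userId), { for_user_id: userId, items });
}

// Read-path guardrail: a payload computed for someone else is treated
// as a miss, reported, and removed.
function readRecs(cache, userId, onMismatch = () => {}) {
  const key = recsKey(userId);
  const payload = cache.get(key);
  if (!payload) return undefined;
  if (payload.for_user_id !== userId) {
    onMismatch({ key, expected: userId, found: payload.for_user_id });
    cache.delete(key); // never serve another user's data
    return undefined;
  }
  return payload.items;
}
```

The onMismatch hook is where an alert or metric would go, since any mismatch means a key-construction regression has reached production.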
Real-World Example
On December 25, 2015, Steam had a cache misconfiguration at the CDN layer that caused users to see other users’ account pages during a holiday traffic spike. The root cause was a caching layer treating logged-in content as cacheable and not including the auth context in the cache key. The incident was publicly acknowledged, and the fix involved both the immediate cache flush and a deeper change to mark authenticated endpoints as no-cache at the CDN layer. The same class of bug has hit multiple CDN-backed e-commerce sites over the years; it is not a rare failure mode.
Senior Follow-up Questions
Q: “How do you prevent this class of bug at design time, before you need to debug it?”
Make cache keys explicit first-class objects in your codebase. Instead of cache.get("recs_" + userId), have a RecsCacheKey(userId, locale, featureFlagVersion) class with a single construction path. Every field is required; the compiler or type system enforces it. Then write a contract test that shows “different users produce different keys” and run it on every build. Combine this with cache-response validation (the payload carries the for_user_id, rejected if mismatched). Together, these make personalization bleed structurally impossible rather than just unlikely.
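
A minimal sketch of such a key object follows. Plain JavaScript has no compiler to enforce required fields, so this version checks them at runtime; a typed language would move that check to compile time. The key string format and the keysAreDistinct helper are assumptions.

```javascript
// Cache key as a first-class object with one construction path.
class RecsCacheKey {
  constructor({ userId, locale, featureFlagVersion }) {
    for (const [name, value] of Object.entries({ userId, locale, featureFlagVersion })) {
      if (value === undefined || value === null || value === "") {
        throw new TypeError(`RecsCacheKey: ${name} is required`);
      }
    }
    this.userId = String(userId);
    this.locale = String(locale);
    this.featureFlagVersion = String(featureFlagVersion);
    Object.freeze(this); // keys are immutable once built
  }

  toString() {
    // encodeURIComponent stops a ":" inside a field from colliding with the separator.
    const parts = [this.userId, this.locale, this.featureFlagVersion].map(encodeURIComponent);
    return `recs:${parts.join(":")}`;
  }
}

// Contract check: distinct inputs must never produce the same key string.
function keysAreDistinct(keys) {
  return new Set(keys.map(String)).size === keys.length;
}
```

The contract test then builds keys for several users, locales, and flag versions and asserts that keysAreDistinct returns true for the whole set.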
Q: “What if the leak only happens under high load, not in normal traffic? Where do you look?”
High-load-only bugs in caching usually come from shared mutable state in the request-handling path. A common culprit: an object reused across requests, such as a pre-allocated buffer or a ThreadLocal that was not cleared in a thread-pool executor, where pooled threads carry state between users. Another: the cache client library returning a shared mutable reference rather than a defensive copy, and one request mutating the object while another reads it. Under low load, requests rarely overlap on the same pool thread; under high load, they do. Diagnose with load testing under thread contention and memory analysis tools.
Q: “You fix the cache key but your logs show the bug was live for three weeks. How do you handle the retroactive privacy implications?”
Scope the impact. Query logs to enumerate every user whose requests returned mismatched data (if request-level logging exists) or at least the total number of cache lookups that were serving cross-user data. Engage legal and privacy early. Depending on jurisdiction, you may be legally required to notify regulators and affected users: GDPR Article 33 requires notifying the supervisory authority within 72 hours of becoming aware of a breach, and Article 34 covers notifying affected individuals when the risk to them is high. Do not wait to be told. A clean, proactive disclosure is better than being discovered later. Internally, use this as the forcing function to implement the architectural fixes (explicit key objects, response validation) that prevent the bug class entirely.
Common Wrong Answers
  • “I would just add the user id to the key and deploy the fix.” This fixes the immediate bug but does not address the systemic weakness. There are many other “dimensions of variation” (locale, tenant, A/B test arm) that could produce the same class of bug tomorrow. A senior answer recognizes this as a pattern, not an instance, and proposes structural fixes.
  • “I would flush the cache and it will be fine.” Flushing removes the currently-poisoned entries but the key-construction bug is still in the code; new requests will repopulate the cache with the same collision. Flushing is hygiene, not a fix.
Further Reading
  • “You Cannot Trust the Browser” patterns and CDN cache-control best practices (documented by Fastly, Cloudflare, Akamai) — personalization-bleed is most dangerous at CDN layers where cache keys are URL-based.
  • “GDPR Article 33: Notification of a personal data breach to the supervisory authority” — the specific legal timeline that scopes how fast you must respond.
  • OWASP Web Security Testing Guide, section on cache poisoning and cache deception — adversarial framings of this same class of bug.