Caching Strategies
Caching is critical for microservices performance, but it introduces one of computer science’s two hardest problems (the other being naming things, and off-by-one errors). In a monolith, cache invalidation is already tricky. In microservices, Service A caches data that Service B can modify, and now you are dealing with distributed cache invalidation, which is considerably harder. This chapter covers not just how to cache, but how to cache safely, how to fail gracefully when your cache goes down, and how to prevent the thundering herd problem that has caused more outages than most people realize.
- Implement various caching patterns
- Design cache invalidation strategies
- Use Redis for distributed caching
- Handle cache failures gracefully
- Optimize cache hit rates
Why Caching Matters
Before we dive into patterns, let us be precise about why caching is worth the complexity. Every cache you add to a system is a promise you are making: “I can serve this data faster than the source of truth, and I will accept some risk of staleness in exchange.” That trade-off is nearly always worth it for read-heavy workloads because databases are slow compared to in-memory stores (milliseconds vs microseconds) and expensive to scale horizontally. But if you do not understand the trade-off, you will cache things that should not be cached (live pricing, security tokens, personalized feeds) and end up with bugs that are nearly impossible to reproduce.
In microservices specifically, caching takes on a new dimension: it reduces coupling. If Service A caches a user’s profile from Service B for five minutes, Service A can continue functioning even if Service B is briefly unavailable. The cache acts as a buffer of eventual consistency that lets services tolerate each other’s failures. This is why caching is not just an optimization; it is a resilience pattern.
Caching Patterns
The biggest mistake teams make with caching is treating “add a cache” as a single decision. It is not: there are at least five distinct patterns, each with different consistency, performance, and failure-mode characteristics. Picking the wrong one creates bugs that do not show up in testing but manifest as weird production behavior: “why did this user see stale data for 10 minutes?” or “why did Redis crashing lose us $50,000 in orders?”
Think of these patterns as answers to two questions: (1) when do reads populate the cache? and (2) when do writes update the cache? The patterns below trade safety, freshness, and write throughput in different ways. Cache-aside is the safest and the default for most teams. Write-through guarantees read-your-writes consistency at the cost of write latency. Write-behind is dangerous but fast. Read-through and refresh-ahead are niche patterns you reach for when the first three are not enough.
Pattern Comparison
| Pattern | How It Works | Consistency | Performance | Risk | Best For |
|---|---|---|---|---|---|
| Cache-Aside | App checks cache, falls back to DB, writes to cache on miss | Stale reads possible (TTL window) | First request slow (cache miss) | Low (cache failure = slower, not broken) | Most read-heavy workloads; safest default |
| Write-Through | App writes to cache AND DB together | Strong (cache always fresh) | Writes slower (two writes) | Medium (write failure needs handling) | Data that is read immediately after write |
| Write-Behind | App writes to cache, async flush to DB | Weak (cache is ahead of DB) | Fastest writes | High (cache crash = data loss) | Analytics, counters, data you can afford to lose |
| Read-Through | Cache itself fetches from DB on miss | Same as cache-aside | Same as cache-aside | Low | When you want the cache layer to be transparent |
| Refresh-Ahead | Cache proactively refreshes before TTL expires | Fresh for hot keys, but a write can stay invisible until the next refresh | Best (no cache misses for hot keys) | Low | Highly predictable access patterns (e.g., homepage data) |
- Default choice: Cache-aside. It is the simplest, safest, and works for 80% of use cases.
- User just updated their profile and sees stale data: Switch to write-through for that specific entity.
- Dashboard counters, view counts, analytics: Write-behind is fine; losing a few counts is acceptable.
- Homepage hero content that must never be stale: Refresh-ahead with a short TTL.
Cache-Aside (Lazy Loading)
Cache-aside is the most common pattern because it is the safest: if your cache goes down entirely, your application still works (it just hits the database for every request). The application is in full control of what gets cached and when. The downside is that the first request for any piece of data is always slow (cache miss), and there is a window between a database write and cache invalidation where stale data can be served.
Why this pattern dominates real-world microservices: it decouples the cache from your write path completely. The database is always the source of truth. If Redis is down, writes continue unchanged; they just do not invalidate anything. If Redis is slow, your reads degrade gracefully back to database speed, provided the client uses a short timeout. Compare this to write-through, where a cache outage can block writes entirely. In a distributed system where failures are expected, cache-aside’s “cache is an optimization, not a dependency” mindset is usually what you want.
A subtle gotcha: if your invalidation step fails (you updated the DB but `cache.del()` threw), you are now serving stale data until the TTL expires. Always set a TTL, even when you have invalidation: the TTL is your safety net against missed invalidations. Some teams also capture invalidation failures in a dead-letter queue and retry them asynchronously.
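The read and write paths described above can be sketched as follows. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in-memory stand-in for Redis, the database is a plain dict, and the `get_user` / `update_user` names are assumptions made for the example.

```python
import json
import time


class TTLCache:
    """In-memory stand-in for a cache client such as Redis (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (expires_at, serialized value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def delete(self, key):
        self._data.pop(key, None)


def get_user(cache, db, user_id, ttl_seconds=300.0):
    """Read path: check the cache, fall back to the database, populate on miss."""
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # cache outage: behave like a miss
    if cached is not None:
        return json.loads(cached)
    user = db.get(user_id)  # the database stays the source of truth
    if user is not None:
        try:
            cache.set(key, json.dumps(user), ttl_seconds)  # always set a TTL
        except Exception:
            pass  # failing to cache must not fail the read
    return user


def update_user(cache, db, user_id, fields):
    """Write path: update the database, then invalidate (not update) the cache."""
    db[user_id] = {**db.get(user_id, {}), **fields}
    try:
        cache.delete(f"user:{user_id}")
    except Exception:
        pass  # missed invalidation: the TTL bounds how long stale data survives
```

Deleting the key on write, rather than rewriting it, keeps the write path free of serialization logic and lets the next read repopulate the entry from the source of truth.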
Write-Through
Write-through flips the safety trade-off: every write goes to both the cache and the database synchronously, so reads are always consistent with the last write. This is the pattern you want when users expect “read-your-writes” semantics: after I update my profile picture, I should see it immediately, not in 5 minutes when the TTL expires. But the cost is real: every write now pays the latency of two systems instead of one, and if the cache is down, writes either fail or need a fallback path.
Where write-through shines: user profiles, shopping cart state, session data, and any “last write wins” scenarios where the write volume is moderate. Where it hurts: high-write-volume data like analytics events, where doubling every write overloads your Redis. The honest truth is that most teams who say they use write-through actually use write-around with invalidation (write to the database, then delete the cache key) and just do not know the difference, and in most cases that is fine.
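A minimal sketch of the synchronous double write, assuming plain dicts stand in for both Redis and the database. The class name `WriteThroughStore` and the choice to write the database first are assumptions of this example; other orderings are possible.

```python
import json


class WriteThroughStore:
    """Write-through sketch: dicts stand in for the cache and the database."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def save(self, key, value):
        # Database first: if this raises, the cache is never touched,
        # so the cache can never be ahead of the source of truth.
        self.db[key] = value
        try:
            self.cache[key] = json.dumps(value)
        except Exception:
            # The cache write failed after a successful DB write. Try to drop
            # the old entry so readers fall through to the database.
            self.cache.pop(key, None)

    def load(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = json.dumps(value)
        return value
```

In a real outage the fallback delete may fail as well, which is one more reason to keep a TTL on every entry.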
Write-Behind (Write-Back)
Write-behind is the most dangerous pattern here because it risks data loss: if Redis crashes before the queued write reaches the database, that data is gone. Use this only for data you can afford to lose (analytics counters, view counts) or where you have other durability guarantees in place.
Think about what you are buying with write-behind: raw write throughput. If you are tracking page views on a popular product, every page view is one increment to a counter. Doing that synchronously against PostgreSQL would generate thousands of transactions per second for a single row, which guarantees lock contention and write amplification. With write-behind, you update Redis in memory (microseconds), and a background job batches updates to the database every second or minute. You have traded durability for throughput, and for counters, that is a good trade.
The architectural trap: teams use write-behind for “temporary” data, then other services start relying on that data being durable in the database. Now you have an implicit contract that write-behind cannot fulfill. Document clearly that write-behind caches are not durable, and put a monitoring alert on queue depth: a growing queue means more unflushed writes would be lost if the cache crashed.
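The counter example can be sketched like this. It is an in-process illustration only: the pending buffer plays the role of Redis, the dict plays the role of PostgreSQL, and `WriteBehindCounter` with its `flush` and `pending_depth` methods are hypothetical names. In a real service `flush` would be called by a background job on a timer.

```python
import threading
from collections import Counter


class WriteBehindCounter:
    """Buffers increments in memory and flushes them to the database in batches."""

    def __init__(self, db):
        self._db = db
        self._pending = Counter()
        self._lock = threading.Lock()

    def increment(self, key, amount=1):
        # Fast path: only touches memory, never the database.
        with self._lock:
            self._pending[key] += amount

    def pending_depth(self):
        # Number of buffered increments that would be lost on a crash;
        # this is the value worth exporting as a metric and alerting on.
        with self._lock:
            return sum(self._pending.values())

    def flush(self):
        # Swap the buffer under the lock, then write outside it so that
        # increments are not blocked while the database is being updated.
        with self._lock:
            batch, self._pending = self._pending, Counter()
        for key, delta in batch.items():
            self._db[key] = self._db.get(key, 0) + delta
        return len(batch)
```

Three page views become one database write per flush, which is the throughput gain; everything counted by `pending_depth` is the durability you gave up.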
Read-Through
Read-through looks identical to cache-aside from the outside, but the key difference is where the “load on miss” logic lives. In cache-aside, the application knows about the cache and the database separately. In read-through, the cache itself knows how to load data: the application just asks the cache and does not care where the data came from. This is appealing because it centralizes the cache-load logic, but it also couples the cache to your data sources, which is why it is less common in practice.
When would you pick this? When you have many different consumers of the same data and you want all of them to share the same cache-loading logic. If five services all need to load a user by ID on cache miss, do you want five copies of that logic or one? Read-through centralizes it. The trade-off: the cache layer becomes stateful and service-like, which complicates deployment and testing.
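To make the difference from cache-aside concrete, here is a small sketch in which the cache owns the loader. The `ReadThroughCache` class, its `loader` callback, and the injectable `clock` are assumptions made for illustration; callers only ever call `get`.

```python
import time


class ReadThroughCache:
    """Cache that knows how to load missing values itself."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # the one shared copy of the load-on-miss logic
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        value = self._loader(key)  # the caller never sees the data source
        if value is not None:
            self._entries[key] = (self._clock() + self._ttl, value)
        return value
```

A caller would construct it once, for example `users = ReadThroughCache(load_user_from_db)` with a hypothetical `load_user_from_db` function, and then use `users.get(user_id)` everywhere.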
Redis Implementation
Redis has become the de facto standard for distributed caching in microservices, and it is worth understanding why that happened rather than just accepting it. Redis wins because it is single-threaded (so no lock contention within a node), stores everything in memory (sub-millisecond reads), supports rich data structures (hashes, sorted sets, streams) beyond simple key-value, and has robust clustering for horizontal scaling. The alternatives each win on specific axes (Memcached is simpler but less capable; Hazelcast and Apache Ignite are richer but heavier), but Redis hits the sweet spot for most teams.
A word of caution: Redis is not a durable database. When you see teams using Redis as a “fast database” for critical data, that is usually a mistake waiting to happen. AOF and RDB persistence help but do not eliminate data loss on crash. Treat Redis as a cache and rebuild state from the source of truth when needed. When people try to use Redis as both the hot path and the durability layer, they end up with a system that loses data during failovers and is slower to recover than it would be with proper database-backed architecture.
Connection and Configuration
Production connection logic has to handle three concerns: reconnection with backoff (your Redis cluster will fail over at some point), cluster mode (for horizontal scaling beyond a single node), and error handling that does not crash your application. A naive Redis client that throws on every connection error will cascade into a full application restart, which is exactly what you do not want in a distributed system. Instead, treat Redis errors as transient and let the fallback path (hit the database) handle them.
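Two of these ideas can be shown without any client library: a capped exponential backoff schedule with jitter for reconnect attempts, and a read helper that treats cache errors as transient. This is a sketch under stated assumptions: `backoff_delays` and `get_with_fallback` are hypothetical helpers, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exception types your Redis client actually raises. Real clients usually ship their own retry configuration, which should be preferred where available.

```python
import random


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield reconnect delays in seconds: exponential growth, capped, with jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point below the ceiling so that many
        # instances do not all reconnect at the same instant after a failover.
        yield rng() * ceiling


def get_with_fallback(cache_get, db_get, key,
                      transient=(ConnectionError, TimeoutError)):
    """Read from the cache, but never let a cache error fail the request."""
    try:
        value = cache_get(key)
    except transient:
        value = None  # treat the error like a miss and fall through
    if value is not None:
        return value
    return db_get(key)
```

The jitter matters as much as the cap: without it, every pod that lost its connection at the same moment retries on the same schedule.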
Caching Service
A thin wrapper around the raw Redis client is one of those patterns that seems unnecessary at first but pays off enormously over time. By centralizing key prefixing, serialization, TTL defaults, and pattern-based invalidation in one place, you avoid the “twelve services, twelve slightly different cache key formats” problem. When you need to debug a cache entry, having a consistent `service-name:entity:id` prefix makes grep-ing Redis trivial. When you need to migrate cache shapes, you change one file instead of hunting through every service.
Pay attention to `invalidatePattern` specifically. The naive way to implement this is `KEYS pattern*`, and it will work fine in development with 100 keys. In production with 10 million keys, `KEYS` blocks the Redis server for seconds and causes an outage. `SCAN` is cursor-based and non-blocking, processing keys in small batches. This is one of the most common production mistakes I see in Redis deployments: `KEYS` used anywhere in application code.
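A sketch of such a wrapper, showing the cursor loop that replaces `KEYS`. The `CacheService` class is hypothetical. It expects a client exposing `get`, `set(name, value, ex=...)`, `delete(*names)`, and `scan(cursor=..., match=..., count=...)` returning `(next_cursor, keys)`, which is the shape of the redis-py methods of the same names. `FakeRedis` is a tiny in-memory stand-in written for this example so the code runs without a server; it ignores expiry.

```python
import fnmatch
import json


class FakeRedis:
    """In-memory stand-in for the handful of client methods used below."""

    def __init__(self):
        self.store = {}
        self._snapshot = []

    def get(self, name):
        return self.store.get(name)

    def set(self, name, value, ex=None):
        self.store[name] = value  # expiry is ignored in this stand-in

    def delete(self, *names):
        return sum(1 for n in names if self.store.pop(n, None) is not None)

    def scan(self, cursor=0, match=None, count=10):
        if cursor == 0:
            self._snapshot = sorted(self.store)  # start a new iteration
        page = self._snapshot[cursor:cursor + count]
        end = cursor + count
        next_cursor = end if end < len(self._snapshot) else 0
        keys = [k for k in page if k in self.store
                and (match is None or fnmatch.fnmatchcase(k, match))]
        return next_cursor, keys


class CacheService:
    """Centralizes key prefixing, JSON serialization, TTL defaults, invalidation."""

    def __init__(self, client, service, default_ttl=300):
        self.client = client
        self.service = service
        self.default_ttl = default_ttl

    def _key(self, entity, entity_id):
        return f"{self.service}:{entity}:{entity_id}"

    def get(self, entity, entity_id):
        raw = self.client.get(self._key(entity, entity_id))
        return None if raw is None else json.loads(raw)

    def set(self, entity, entity_id, value, ttl=None):
        self.client.set(self._key(entity, entity_id), json.dumps(value),
                        ex=ttl or self.default_ttl)

    def invalidate_pattern(self, pattern, batch=100):
        """Delete matching keys one SCAN page at a time instead of using KEYS."""
        cursor, deleted = 0, 0
        while True:
            cursor, keys = self.client.scan(
                cursor=cursor, match=f"{self.service}:{pattern}", count=batch)
            if keys:
                deleted += self.client.delete(*keys)
            if cursor == 0:  # a zero cursor means the iteration is complete
                return deleted
```

Because each page is small, the server can interleave other commands between pages, which is the property `KEYS` lacks.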
Cache Invalidation
“There are only two hard things in Computer Science: cache invalidation and naming things.” (Phil Karlton)
This quote is repeated endlessly, but most engineers do not internalize why cache invalidation is hard until they have shipped a bug caused by it. The difficulty is not the mechanism (deleting a key from Redis is trivial); it is knowing when to invalidate and which keys depend on which data. In a microservices system where Service B might change data that Services A, C, and D all cache, figuring out the complete invalidation graph is a genuinely unsolved problem.
Strategies below exist on a spectrum of precision vs complexity. TTL is imprecise but simple: accept 5 minutes of staleness and do nothing else. Event-based is precise but requires messaging infrastructure and careful dependency tracking. The right answer depends on your tolerance for staleness and your operational maturity.
Event-Based Invalidation
Event-based invalidation is the most architecturally clean way to solve the cross-service cache problem. The idea: when data changes, the owning service publishes an event. Any service that caches this data subscribes to the event and invalidates its local cache. The cache becomes an artifact of eventual consistency: it catches up when the event arrives, typically within milliseconds on a well-tuned Kafka or RabbitMQ setup.
What can go wrong: events can be lost. If your event bus drops a message (network issue, consumer crash, topic retention expired), you serve stale data indefinitely unless something else expires the entry. Defense-in-depth matters here: always pair event-based invalidation with a TTL safety net. The TTL catches missed invalidations. Also consider a “cache verification” job that periodically samples cached entries and cross-checks them against the source of truth.
Another trap: invalidation cascades. If updating a product invalidates its cache, the category cache, the search results cache, and the homepage featured products cache, a single write now triggers four cache operations. Multiplied across frequent writes, this can overwhelm Redis. Batch invalidations where possible and set up metrics on invalidation rate to catch pathological cases early.
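The publish-and-invalidate flow can be sketched in a single process. Everything here is illustrative: `EventBus` is a synchronous stand-in for Kafka or RabbitMQ, the `user.updated` topic name is made up, and `ProfileCache` and `update_profile` represent code that would live in two different services.

```python
from collections import defaultdict


class EventBus:
    """Synchronous in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in list(self._handlers[topic]):
            handler(event)


class ProfileCache:
    """Lives in the consuming service (Service A): caches profiles it does not own."""

    def __init__(self, bus):
        self.entries = {}
        bus.subscribe("user.updated", self.on_user_updated)

    def on_user_updated(self, event):
        # Invalidate rather than update: the next read refetches from the owner.
        self.entries.pop(f"user:{event['user_id']}", None)


def update_profile(db, bus, user_id, fields):
    """Lives in the owning service (Service B): write first, then announce."""
    db[user_id] = {**db.get(user_id, {}), **fields}
    bus.publish("user.updated", {"user_id": user_id})
```

With a real broker the handler runs asynchronously and can be skipped entirely if a message is lost, so cached entries should still carry a TTL.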
Tag-Based Invalidation
Tag-based invalidation solves a problem that both key-based and pattern-based invalidation struggle with: “invalidate everything related to X.” If you cache products, and each product belongs to a category, and each category belongs to a department, how do you invalidate all products in a department when the department changes? Key-based requires you to know every cached key. Pattern-based requires cooperating naming. Tag-based lets you attach arbitrary labels to cache entries and invalidate all entries with a given label.
The mechanism: every cached value stores which tags it belongs to, and every tag stores a reverse index of which keys are tagged with it. Adding a cache entry costs two or three extra Redis operations; invalidating a tag involves reading the tag’s key set and deleting all referenced entries. Memory overhead is real (the tag indexes), so cap the cardinality of tags: do not tag every entry with every conceivable attribute.
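The forward and reverse indexes can be sketched with plain dicts and sets. `TaggedCache` is a hypothetical in-memory class; in Redis the per-tag index would typically be a set per tag, and the extra operations mentioned above correspond to maintaining those sets.

```python
class TaggedCache:
    """In-memory sketch of tag-based invalidation with a reverse index."""

    def __init__(self):
        self._values = {}    # key -> cached value
        self._key_tags = {}  # key -> set of tags attached to that key
        self._tag_keys = {}  # tag -> set of keys carrying that tag

    def set(self, key, value, tags=()):
        self.delete(key)  # drop stale tag links from any previous version
        self._values[key] = value
        self._key_tags[key] = set(tags)
        for tag in tags:
            self._tag_keys.setdefault(tag, set()).add(key)

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        self._values.pop(key, None)
        for tag in self._key_tags.pop(key, set()):
            members = self._tag_keys.get(tag)
            if members is not None:
                members.discard(key)
                if not members:
                    del self._tag_keys[tag]  # do not keep empty indexes around

    def invalidate_tag(self, tag):
        """Delete every entry carrying the tag; returns how many were removed."""
        keys = self._tag_keys.pop(tag, set())
        for key in keys:
            self.delete(key)
        return len(keys)
```

Usage follows the department example: tag each product with its department and category, then call `invalidate_tag("dept:electronics")` when the department changes.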
Cache Patterns for Microservices
Multi-Level Caching
Multi-level caching (also called tiered caching) recognizes that not all caches are equal. An in-process cache like a Python dict or Node.js `Map` has microsecond access times but is per-instance: if you have 10 pods, you have 10 separate caches. A distributed cache like Redis has sub-millisecond access times but is shared across all instances. The database is milliseconds but durable. Each tier serves a different purpose, and combining them gives you the best of all worlds: L1 absorbs the hottest keys at in-process memory speed, L2 handles the long tail across all instances, and the database is the source of truth.
The tricky part is keeping L1 coherent across instances. If Service Instance A invalidates a key in Redis but Service Instance B has that key in its local cache, B will serve stale data until its local TTL expires. The solution is pub/sub: when any instance invalidates a key, it publishes an invalidation message on a Redis channel, and all instances listen and invalidate their L1. This is the pattern this section uses.
Why this matters for microservices: network calls to Redis are still network calls. At high QPS (10,000+ RPS per instance), even 1ms of Redis latency adds up. An L1 cache with 90% hit rate eliminates 90% of those Redis round trips. The cost is coherence complexity, which is manageable if you accept very short L1 TTLs (30-60 seconds) and tolerate minor staleness.
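A minimal sketch of L1 plus L2 with broadcast invalidation. The names are assumptions: `InvalidationChannel` is an in-process stand-in for a Redis pub/sub channel, a shared dict stands in for Redis as L2, and each `TwoLevelCache` object represents one service instance.

```python
import time


class InvalidationChannel:
    """In-process stand-in for a pub/sub channel shared by all instances."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, key):
        for listener in list(self._listeners):
            listener(key)


class TwoLevelCache:
    """One service instance: private L1 dict in front of a shared L2."""

    def __init__(self, l2, channel, l1_ttl=30.0):
        self._l1 = {}          # key -> (expires_at, value), private to this instance
        self._l2 = l2          # shared across instances (Redis in production)
        self._channel = channel
        self._l1_ttl = l1_ttl  # kept short so missed messages heal quickly
        channel.subscribe(self._evict_local)

    def _evict_local(self, key):
        self._l1.pop(key, None)

    def get(self, key, load):
        entry = self._l1.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]               # L1 hit: no network call
        value = self._l2.get(key)
        if value is None:
            value = load(key)             # both tiers missed: source of truth
            if value is not None:
                self._l2[key] = value
        if value is not None:
            self._l1[key] = (time.monotonic() + self._l1_ttl, value)
        return value

    def invalidate(self, key):
        self._l2.pop(key, None)
        self._channel.publish(key)        # every instance drops its L1 copy
```

Note that the publishing instance evicts its own L1 through the same channel, so there is a single eviction path to reason about.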
Request Coalescing (Thundering Herd Prevention)
The thundering herd problem is one of those issues that never shows up in testing but can take down production. Here is the scenario: a popular cache key expires. In the milliseconds before the cache is repopulated, 500 concurrent requests all see a cache miss and all hit the database simultaneously. Your database connection pool is exhausted, queries start timing out, and now you have a cascading failure. Request coalescing ensures that only one of those 500 requests actually fetches from the database; the other 499 wait for that one result.
The insight: if 500 requests are asking the database for the same thing at the same time, you only need to ask the database once. Everyone else can share the answer. This is sometimes called “singleflight” (borrowed from Go’s `golang.org/x/sync/singleflight`). In Python, the natural primitive is an `asyncio.Future` or `asyncio.Event` that later arrivals await on while the first arrival does the actual work.
Why this matters for microservices specifically: thundering herds cascade. If your product service gets hammered, its database gets hammered, which makes its queries slower, which makes the initial requests time out, which causes retries, which amplifies the problem. Request coalescing breaks this cycle at the source — the database sees one query instead of 500, and the 499 other callers experience a slight delay rather than a failure.
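A compact asyncio sketch of the idea, using a shared `asyncio.Task` (a `Future` subclass) per key. The `SingleFlight` class and its `do` method are hypothetical names modeled loosely on the Go package, and this version only coalesces requests within one process.

```python
import asyncio


class SingleFlight:
    """Coalesces concurrent calls for the same key into one underlying fetch."""

    def __init__(self):
        self._in_flight = {}  # key -> asyncio.Task currently fetching that key

    async def do(self, key, fetch):
        """`fetch` is a zero-argument coroutine function that loads the value."""
        task = self._in_flight.get(key)
        if task is None:
            # First arrival: start the fetch and register it for later arrivals.
            task = asyncio.create_task(fetch())
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # shield() keeps one cancelled waiter from cancelling the shared fetch.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def fetch_product():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated database latency
        return {"id": 1, "name": "widget"}

    flights = SingleFlight()
    results = await asyncio.gather(
        *(flights.do("product:1", fetch_product) for _ in range(500)))
    return calls, results
```

Running `asyncio.run(demo())` issues 500 concurrent lookups that share a single fetch. If the fetch raises, every waiter receives the same exception, so callers still need their own error handling.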
Cache Warming
Cold caches are a silent killer of newly deployed services. Your system tests passed with a warm cache, so latency looks like 5ms. You deploy. For the first few minutes, the cache is empty, every request is a miss, and latency is 100ms. If that first cold-start traffic is heavy, the database gets hammered and cascades into broader failure. Cache warming preempts this: before the service accepts production traffic, pre-load the cache with data you know will be requested.
What to warm: the Pareto principle applies. Typically 10% of cache keys serve 80% of traffic. Identify those hot keys (featured products, popular categories, top-viewed items from yesterday’s logs) and pre-populate them. You do not need to warm everything; the long tail can be populated on demand. Some teams go further with “continuous warming”: a background job periodically refreshes the hot keys so they never expire.
Cache warming is also part of your deployment strategy. If you use Kubernetes rolling updates, new pods should warm their local cache during their readiness check, before traffic routes to them. This way, the load balancer only sends traffic to a pod after its cache is useful. Otherwise, the first pod to receive traffic after a deploy will have a 100% miss rate and degrade user experience until it warms up naturally.
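The two steps, picking hot keys and pre-loading them, can be sketched as below. The helper names `hot_keys` and `warm_cache` are assumptions, the access log is assumed to be a simple list of requested keys, and a dict stands in for the cache.

```python
from collections import Counter


def hot_keys(access_log, top_n):
    """Return the most frequently requested keys from a list of past accesses."""
    return [key for key, _count in Counter(access_log).most_common(top_n)]


def warm_cache(cache, load, keys):
    """Pre-populate the cache; returns how many keys were successfully warmed."""
    warmed = 0
    for key in keys:
        try:
            value = load(key)
        except Exception:
            continue  # warming is best-effort: one bad key must not block startup
        if value is not None:
            cache[key] = value
            warmed += 1
    return warmed
```

A readiness check could then report the pod as ready only once `warm_cache` has returned, optionally requiring a minimum number of warmed keys.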
Cache Failure Handling
When Redis dies, what happens to your service? The honest answer for most teams is “we find out.” Designing for cache failure up front costs a bit more complexity, but it means your service survives cache outages instead of cascading into a full outage. The goal is graceful degradation: when Redis is healthy, serve from cache (fast). When Redis is unhealthy, serve from the database or a local fallback (slower, but still working). Never let a cache outage become an application outage.
A resilient cache combines three techniques. First, health monitoring: a background ping tells us when Redis comes and goes. Second, circuit-breaker-style behavior: when Redis is known to be down, do not even try to call it; skip straight to the fallback. Third, a local memory fallback: keep a small in-process cache of recently-seen values, so that during a Redis outage, popular keys continue to be served from local memory while the long tail falls through to the database.
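A sketch of the breaker plus local fallback. To stay self-contained it swaps the background ping for a simpler passive approach: the first failed call opens the breaker, and Redis is retried after a cooldown. `ResilientCache` and its parameters are hypothetical; `remote_get` and `remote_set` are whatever callables talk to your cache.

```python
import time
from collections import OrderedDict


class ResilientCache:
    """Cache wrapper that degrades to local memory and the database."""

    def __init__(self, remote_get, remote_set, cooldown=5.0,
                 local_max=1000, clock=time.monotonic):
        self._remote_get = remote_get
        self._remote_set = remote_set
        self._cooldown = cooldown
        self._local_max = local_max
        self._clock = clock
        self._down_until = 0.0        # breaker is open while clock < this value
        self._local = OrderedDict()   # small LRU of recently seen values

    def _remember(self, key, value):
        self._local[key] = value
        self._local.move_to_end(key)
        while len(self._local) > self._local_max:
            self._local.popitem(last=False)  # evict the least recently stored

    def _trip(self):
        self._down_until = self._clock() + self._cooldown

    def get(self, key, load):
        if self._clock() >= self._down_until:
            try:
                value = self._remote_get(key)
            except Exception:
                self._trip()  # open the breaker and use the degraded path
            else:
                if value is None:
                    value = load(key)
                    if value is not None:
                        try:
                            self._remote_set(key, value)
                        except Exception:
                            self._trip()
                if value is not None:
                    self._remember(key, value)
                return value
        # Degraded path: the remote cache is down or has just failed.
        if key in self._local:
            return self._local[key]
        value = load(key)
        if value is not None:
            self._remember(key, value)
        return value
```

While the breaker is open, requests skip the remote call entirely, which avoids paying a connection timeout on every request during an outage.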
Interview Questions
Q1: What is cache-aside pattern and when to use it?
How it works:
- Application checks cache first
- On miss, fetch from database
- Store in cache for future requests
Advantages:
- Only cache what’s needed
- Cache failures don’t break reads
- Simple to implement
Drawbacks:
- First request is slow (cache miss)
- Potential stale data
- Three round trips on miss (cache read, database read, cache write)
Q2: How do you handle cache invalidation in microservices?
Strategies:
- TTL-based: Set expiration, accept staleness
- Event-based: Publish events on writes, subscribers invalidate
- Tag-based: Group related entries, invalidate by tag
- Version-based: Include version in key, change on update
In microservices:
- Publish events when data changes
- Each service invalidates its own cache
- Use short TTLs as safety net
Q3: What is the thundering herd problem and how to prevent it?
- Request Coalescing: Single request, others wait
- Probabilistic Early Expiration: Randomly refresh before TTL
- Background Refresh: Refresh in background before expiry
- Locking: Only one process refreshes cache
Q4: When would you use write-through vs write-behind caching?
Write-through:
- Write to cache AND database in same operation
- Strong consistency
- Slower writes (two writes)
Write-behind:
- Write to cache first, async to database
- Fast writes
- Risk of data loss if cache fails
Decision factors:
- Consistency requirements
- Write frequency
- Tolerance for data loss
Q5: How do you handle cache in a multi-instance deployment?
The challenge:
- Each instance has local cache
- Invalidation must reach all instances
- Data inconsistency between instances
Options:
- Distributed Cache Only (Redis)
  - No local cache
  - Always consistent
  - Higher latency
- Multi-Level Cache + Pub/Sub
  - L1: Local (fast)
  - L2: Redis (shared)
  - Invalidate via pub/sub
- Short TTL for Local Cache
  - Local cache: 60s
  - Redis cache: 1 hour
  - Accept brief inconsistency
Chapter Summary
- Choose the right caching pattern (cache-aside, write-through, write-behind)
- Implement robust invalidation strategies
- Handle cache failures gracefully with fallbacks
- Use multi-level caching for performance
- Prevent thundering herd with request coalescing
- Warm caches proactively for predictable latency
Interview Deep-Dive
'Your Redis cache cluster goes down during peak traffic. Within 30 seconds, your PostgreSQL database is overwhelmed and the entire system crashes. What happened and how do you prevent it?'
`Map` with a doubly-linked list for LRU eviction. Libraries like `lru-cache` in Node.js handle this efficiently. Monitoring is critical: I expose the cache hit rate and size as Prometheus metrics so I can tell whether the local cache is actually helping during Redis outages.
'Service A caches user profiles with a 1-hour TTL. Service B can update user profiles. How do you ensure Service A does not serve stale data for up to an hour after a profile update?'
'Compare cache-aside, write-through, and write-behind patterns. When would you use each in a microservices architecture?'
`maxSurge: 1`, new pods receive traffic gradually while old pods (with warm caches) still handle most requests. By the time the old pods are terminated, the new pods have warmed up through natural traffic.
Interview Questions with Structured Answers
Your cache hit rate drops from 95 percent to 40 percent immediately after a deploy, and latency doubles. Walk me through your debugging process and the first three hypotheses you investigate.
- Stabilize first, debug second. If latency is user-visible and the database is near capacity, roll back the deploy immediately. Debugging a live incident on a cold cache is a bad trade. Only investigate on the running-but-rolled-back state or in a reproduction environment.
- Pull the three obvious signals in parallel. Cache keyspace size (did total keys drop or spike?), cache hit rate broken down by key pattern (is it one pattern or everything?), and database query mix (which queries are hitting the database that were previously cached?). These three answer “what changed” in under five minutes.
- Hypothesis one: cache key format changed. The most common cause. The new deploy includes a refactor that changed how keys are constructed: maybe it added a user locale, a feature flag, or a version prefix. Every old cache entry is now orphaned; every new request is a miss. Diagnosis: grep recent PRs for cache key construction, diff the key format with `redis-cli --scan --pattern` samples, verify that new keys look different from old keys.
- Hypothesis two: TTL was accidentally shortened. Someone changed `300` to `30` in a config or env var. Every entry expires in 30 seconds instead of 5 minutes, so the hit rate craters. Diagnosis: check recent config changes, log cache-set calls with their TTL, compare current TTL distribution to pre-deploy baseline.
- Hypothesis three: serialization changed. The deploy moved from JSON to Protobuf, or changed a schema version prefix. New writers encode the new format; old readers cannot deserialize it and treat it as a miss. Diagnosis: look for deserialize errors in logs, check for a recent library upgrade that changed default serialization, verify round-trip for a single known key.
- Hypothesis four (often missed): cache sharding / routing changed. A deploy updated the Redis client library and the consistent-hashing algorithm subtly changed. Keys that used to live on shard A now live on shard B; shard B has empty cache; every lookup misses. Diagnosis: check the Redis cluster slot distribution before and after, verify CRC32 or similar hash output for a sample key.
- Finally, long-tail hypotheses. Eviction policy change (`allkeys-lru` to `volatile-lru`), memory limit lowered so entries get evicted aggressively, new traffic pattern (a new feature generates high-cardinality keys that blow out the cache), or a bug where the service is writing but never reading (cache key used on read path differs from write path).
- “I would increase the cache size to compensate.” This misdiagnoses the symptom as the cause. If keys are being generated differently or serialized incorrectly, a bigger cache does not help — you will fill it with the wrong things. Scaling up before diagnosing wastes money and delays the real fix.
- “I would lower the TTL to refresh everything sooner.” Lowering the TTL makes the problem worse. The hit rate is low because entries are missing or malformed; refreshing more often means more database load, not less. The correct instinct under a hit-rate crash is to stabilize and diagnose, not tune.
- “Scaling Memcache at Facebook” (Nishtala et al., NSDI 2013) — foundational paper on operating cache at scale and the failure modes to expect.
- “On Consistent Hashing and Random Trees” (Karger et al., 1997) — the algorithmic underpinnings of why a client-library upgrade can silently break your cache topology.
- Netflix Tech Blog posts on EVCache — practical coverage of cross-region cache consistency and the debugging stories behind their current architecture.
You have a product page cached for 5 minutes. Two users A and B view it concurrently at the moment the cache expires. What can go wrong, and how do you prevent every failure mode?
- Name the failure modes up front. At the moment of TTL expiration, multiple concurrent misses create three classical bugs: thundering herd on the source database, cache stampede compounding under load, and (if writes are involved) a race where one user’s write overwrites another user’s fresh cache entry.
- Address thundering herd with request coalescing. Implement single-flight. When N concurrent requests miss for the same key, only one queries the database. Others wait on a shared promise or channel. N becomes 1 regardless of concurrency level. Libraries: Go `singleflight.Group`, Node `promise-memoize`, Python `aiocache` with lock parameter.
- Layer on probabilistic early expiration. Before natural TTL, start refreshing probabilistically (the XFetch algorithm). A small random fraction of requests triggers an async refresh while still serving the current cached value. By the time natural expiration hits, the value has usually been refreshed already.
- Address the concurrent-write race with cache versioning. If the page content can be updated between A’s miss and B’s miss, A’s slow computation could overwrite B’s fresh value. Attach a version (ETag, monotonic counter) to each cache write. Note that `SET key value EX ttl NX` (Redis) only writes when the key is absent, so it lets the first filler win but does not compare versions; to reject writes that arrive after a newer version, use a compare-and-swap operation that checks the stored version before writing.
- Address cache poisoning. If the source returns a 500 error or malformed response, do not cache it. Validate the response shape before caching. For transient errors, consider a short negative-cache TTL (seconds, not minutes) so a broken backend does not poison reads for the whole TTL window.
- Plan the failure mode. What happens if the source is slow (10 seconds instead of 50 ms)? The single-flight request blocks; all other waiters block with it. Add a timeout on the source call and a fallback: if the source is slow, serve the stale cache entry past its TTL (`stale-while-revalidate`) while logging the backend degradation.
`stale-while-revalidate` and origin-shielding behavior for exactly this case. When a popular URL expires at the edge, Fastly’s shield PoP coalesces origin requests across all edge PoPs so the origin sees exactly one request regardless of global concurrency. This is the same pattern applied at the CDN layer. Redis itself, at the application cache layer, does not do this for you: you have to implement coalescing in the client.
Senior Follow-up Questions
- “I would use a shorter TTL so the window of inconsistency is smaller.” Shortening the TTL increases the frequency of cache misses and therefore increases the thundering herd problem. The issue is not the TTL length; it is the concurrent behavior at expiration. Coalescing and early expiration address the real bug.
- “I would lock the key in Redis for the duration of the backend fetch.” Distributed locking has its own failure modes (lock holder dies, TTL on lock expires mid-fetch, network partitions). In-process single-flight (coalescing within one service instance) is usually enough, and it does not introduce a new distributed correctness problem. Redlock is rarely justified for cache-fill races.
- “Optimal Probabilistic Cache Stampede Prevention” (Vattani, Chierichetti, Lowenstein) — the foundational XFetch paper.
- Fastly’s documentation on `stale-while-revalidate` and origin shielding: a worked example of stampede prevention at the CDN layer.
- “Caches: the price of admission” on martinfowler.com: broader treatment of caching trade-offs with stampede prevention as one of several concerns.
Your recommendation service caches the top-10 items per user. One day you notice users are occasionally seeing recommendations that were computed for a different user. How did this happen, and how do you fix it?
- Name this clearly: personalization bleed / cache key collision. Two logically different values ended up under the same cache key, and one user’s request returned another user’s data. This is simultaneously a correctness bug, a privacy issue (if recommendations reveal anything about the other user), and potentially a regulated incident under GDPR or CCPA.
- Escalate appropriately. If any PII or sensitive behavioral data is being leaked across users, trigger the security incident process. Even “innocent” recommendation data can encode protected categories (religion, health) via what the user has viewed. Do not debug silently; inform the security and privacy teams.
- Hypothesize the root cause. Most likely: the cache key did not include the user_id, or included only a hashed session-scoped value that collided. Possible variations: the key used a tenant id and forgot the user, or used a user id that happens to reset on logout and be reused.
- Stop the immediate leak. Flip a feature flag to bypass the cache and serve all recommendations directly from the source. Latency will increase; correctness is restored. Do this before continuing diagnosis. Never continue to serve wrong data while you debug.
- Audit the cache contents. Scan a sample of entries. For each, compare “who this was computed for” (from the original request context, traceable via logs) versus “who this is keyed under” (the cache key). Any mismatch is an instance of the bug.
- Fix the cache key permanently. The cache key must include every dimension of variation. For personalized data, that always includes the user identifier (ideally a stable, server-issued user id, not a session id). Consider adding a namespace prefix per service and a schema version, so accidental collisions across services or across key-schema changes are impossible.
- Add guardrails. Validate cache responses against the request context at read time: the cached payload includes a “for_user_id” field, and the read path rejects it if it does not match the requesting user. This is belt-and-suspenders defense against future cache-key regressions.
- Close out with a postmortem. Publish what happened, which users were affected, what data was exposed, what the fix is, and what detection (canary, monitoring, contract test) will prevent a recurrence.
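The key-fix, guardrail, and audit steps above can be sketched together. This is a minimal illustration that uses a plain `dict` in place of the real cache; `KEY_PREFIX`, `recs_key`, `put_recs`, `get_recs`, and `find_mismatches` are hypothetical names, and the audit helper assumes user ids never contain a colon.

```python
import json

# Hypothetical service namespace plus key-schema version.
KEY_PREFIX = "recs-svc:v2"


def recs_key(user_id: str) -> str:
    if not user_id:
        raise ValueError("user_id is required to build a recommendations key")
    return f"{KEY_PREFIX}:user:{user_id}"


def put_recs(cache: dict, user_id: str, items: list) -> None:
    # The payload records who it was computed for, independently of the key.
    cache[recs_key(user_id)] = json.dumps({"for_user_id": user_id, "items": items})


def get_recs(cache: dict, user_id: str):
    key = recs_key(user_id)
    raw = cache.get(key)
    if raw is None:
        return None
    payload = json.loads(raw)
    if payload.get("for_user_id") != user_id:
        # Guardrail: never return another user's data; evict and report a miss.
        cache.pop(key, None)
        return None
    return payload["items"]


def find_mismatches(cache: dict) -> list:
    """Audit helper: keys whose payload was computed for a different user."""
    bad = []
    for key, raw in cache.items():
        key_user = key.rsplit(":", 1)[-1]
        if json.loads(raw).get("for_user_id") != key_user:
            bad.append(key)
    return bad
```

Treating a mismatch as a miss keeps the failure mode safe: the user pays some latency while the entry is recomputed, and the eviction count is a useful signal to alert on.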
cache.get("recs_" + userId), have a RecsCacheKey(userId, locale, featureFlagVersion) class with a single construction path. Every field is required; the compiler or type system enforces it. Then write a contract test that shows “different users produce different keys” and run it on every build. Combine this with cache-response validation (the payload carries the for_user_id, rejected if mismatched). Together, these make personalization bleed structurally impossible rather than just unlikely.
- “I would just add the user id to the key and deploy the fix.” This fixes the immediate bug but does not address the systemic weakness. There are many other “dimensions of variation” (locale, tenant, A/B test arm) that could produce the same class of bug tomorrow. A senior answer recognizes this as a pattern, not an instance, and proposes structural fixes.
- “I would flush the cache and it will be fine.” Flushing removes the currently-poisoned entries but the key-construction bug is still in the code; new requests will repopulate the cache with the same collision. Flushing is hygiene, not a fix.
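One way the typed-key idea could look in Python is sketched below. The field names follow the `RecsCacheKey(userId, locale, featureFlagVersion)` shape mentioned above; the `recs:v1:` prefix, the `render` method, and the percent-encoding of fields are assumptions made for this example. Python enforces the required fields at construction time rather than at compile time, so the contract test carries more weight here than it would in a statically typed language.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RecsCacheKey:
    """Single construction path for recommendation cache keys."""

    user_id: str
    locale: str
    feature_flag_version: str

    def __post_init__(self):
        # Reject empty or non-string fields so a missing dimension fails loudly.
        for name, value in vars(self).items():
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")

    def render(self) -> str:
        # Percent-encode each field so a "|" inside a value cannot shift
        # field boundaries and make two different keys render identically.
        fields = (self.user_id, self.locale, self.feature_flag_version)
        return "recs:v1:" + "|".join(quote(f, safe="") for f in fields)


def test_different_users_produce_different_keys():
    a = RecsCacheKey("u1", "en-US", "ff7").render()
    b = RecsCacheKey("u2", "en-US", "ff7").render()
    assert a != b
```

Flushing the cache still makes sense after deploying a change like this, but as cleanup of already-poisoned entries rather than as the fix.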
- “You Cannot Trust the Browser” patterns and CDN cache-control best practices (documented by Fastly, Cloudflare, Akamai) — personalization-bleed is most dangerous at CDN layers where cache keys are URL-based.
- “GDPR Article 33: Notification of a personal data breach to the supervisory authority” — the specific legal timeline that scopes how fast you must respond.
- OWASP Web Security Testing Guide, section on cache poisoning and cache deception — adversarial framings of this same class of bug.