Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Design LRU Cache
1. Requirements
Functional Requirements
| Operation | Description | Time Complexity |
|---|---|---|
get(key) | Return value if key exists, else -1 | O(1) |
put(key, value) | Insert or update key-value pair | O(1) |
| Eviction | Remove least recently used item when capacity exceeded | O(1) |
Constraints
- Fixed capacity (specified at creation)
- Thread-safe (for production use)
- Generic key and value types
2. The Problem with Simple Approaches
Approach 1: Array/List
Approach 2: HashMap Only
The Solution: HashMap + Doubly Linked List
3. Data Structure Design
- Head: Most recently used (MRU)
- Tail: Least recently used (LRU) - eviction candidate
- HashMap: O(1) access to any node
4. Implementation
4.1 Node Class
4.2 Doubly Linked List
4.3 LRU Cache
5. Thread-Safe Version
For production use, we need thread safety:6. LeetCode Style Implementation
For interviews, a more compact version:7. Python’s Built-in OrderedDict
Python’sOrderedDict can simplify LRU cache implementation:
8. Usage Example
9. Complexity Analysis
| Operation | Time | Space |
|---|---|---|
get(key) | O(1) | - |
put(key, value) | O(1) | - |
delete(key) | O(1) | - |
| Overall Space | - | O(capacity) |
- HashMap: O(1) lookup to find node
- DLL add/remove: O(1) with direct node reference
- No iteration needed for eviction
10. Variations
10.1 LRU with TTL (Time-To-Live)
10.2 LFU Cache (Least Frequently Used)
11. Class Diagram
12. Key Takeaways
| Concept | Details |
|---|---|
| Data Structure | HashMap + Doubly Linked List |
| Time Complexity | O(1) for all operations |
| Space Complexity | O(capacity) |
| Key Insight | HashMap for lookup, DLL for ordering |
| Eviction | Remove from tail (LRU position) |
| Access | Move to head (MRU position) |
13. Interview Tips
Why doubly linked list instead of singly?
Why doubly linked list instead of singly?
- Singly linked list: O(n) to remove a node because you need to find the previous node to update its
nextpointer, and a singly linked list only has forward pointers - Doubly linked list: O(1) to remove because you have direct access to both
prevandnext, so you can splice the node out in constant time - This is a great example of choosing data structures based on operation requirements. When an interviewer asks “why doubly linked?” the answer is: “because our eviction policy requires O(1) removal from an arbitrary position, and only a doubly linked list provides that.”
Why store key in the Node?
Why store key in the Node?
What about cache invalidation?
What about cache invalidation?
- Add TTL per entry
- Lazy invalidation: check TTL on access
- Active invalidation: background thread cleans expired entries
Interview Deep-Dive Questions
Q1: Why does an LRU cache require both a HashMap and a doubly linked list? Could you achieve O(1) with just one of them?
Q1: Why does an LRU cache require both a HashMap and a doubly linked list? Could you achieve O(1) with just one of them?
- The HashMap alone gives us O(1) key lookups but has no concept of ordering — you cannot efficiently determine which entry was accessed least recently. You would need to scan every entry and compare timestamps, making eviction O(n).
- The doubly linked list alone gives us O(1) insertion at the head, O(1) removal from the tail for eviction, and O(1) node relocation if you already have a pointer to the node. But finding a node by key in a linked list is O(n) because you must traverse.
- The combination is what makes every operation O(1): the HashMap maps keys directly to node references in the DLL, so
getis a HashMap lookup (O(1)) followed by amove_to_fronton the DLL (O(1) since we have the node pointer). Eviction is removingtail.prevfrom the DLL (O(1)) and deleting the corresponding HashMap entry using the key stored in the evicted node (O(1)). - The reason the list must be doubly linked and not singly linked: when we remove a node from the middle of the list, we need to update
node.prev.nextandnode.next.prev. With a singly linked list, we do not havenode.prev, so removal requires O(n) traversal to find the predecessor. - A subtle but important detail: the Node must store the key (not just the value) because during eviction from the tail, we need to know which key to delete from the HashMap. Without the key in the node, we would need an O(n) reverse lookup.
- What would happen if you used a singly linked list with a
prevpointer hack (storing the previous node’s reference in the HashMap alongside the node itself)? What are the trade-offs? - Python’s
OrderedDictuses a different internal implementation than a raw HashMap + DLL. How doesOrderedDictachievemove_to_endin O(1), and why might an interviewer still ask you to implement the DLL version manually?
Q2: Walk me through what happens internally when you call `put` on a full LRU cache with a brand new key. Every pointer change, every HashMap operation.
Q2: Walk me through what happens internally when you call `put` on a full LRU cache with a brand new key. Every pointer change, every HashMap operation.
- Starting state: cache is at capacity, say 3 entries. The DLL looks like
HEAD <-> A <-> B <-> C <-> TAILand the HashMap has{keyA: nodeA, keyB: nodeB, keyC: nodeC}. - Step 1 — Eviction: We remove the node at
tail.prev, which is node C. This means: setC.prev.next = C.next(which isB.next = TAIL) andC.next.prev = C.prev(which isTAIL.prev = B). Then null out C’s pointers. Decrement the DLL size. Then useC.keyto dodel cache[keyC]from the HashMap. This is why the key is stored in the node. - Step 2 — Create new node: Allocate
Node(newKey, newValue). - Step 3 — Add to front: Set
node.prev = HEAD,node.next = HEAD.next(which is A). Then updateHEAD.next.prev = node(soA.prev = node) andHEAD.next = node. The DLL is nowHEAD <-> new <-> A <-> B <-> TAIL. Increment size. - Step 4 — Update HashMap:
cache[newKey] = node. - Total: 4 pointer updates for eviction, 4 pointer updates for insertion, 1 HashMap delete, 1 HashMap insert. All O(1). No iteration, no searching.
- The dummy head and tail nodes are critical here — they eliminate edge cases. Without them, adding to an empty list or removing the only node requires special-case code for null checks.
- What happens if
putis called with a key that already exists in the cache? How does the operation differ? - What would go wrong if you forgot to null out the evicted node’s
prevandnextpointers? Could this cause a memory leak or a subtle bug?
Q3: The thread-safe version uses a single RLock around every operation. What are the performance problems with this approach, and how would you improve concurrency?
Q3: The thread-safe version uses a single RLock around every operation. What are the performance problems with this approach, and how would you improve concurrency?
- A single global lock serializes all operations. If you have 100 threads doing
getcalls, only one can proceed at a time — even though reads that do not modify the same node could theoretically run concurrently. Under high throughput (say 500K ops/sec on a web server’s local cache), this becomes a bottleneck. - Improvement 1 — Read-write lock (
ReadWriteLock): Allow multiple concurrent readers, exclusive access for writers. But for LRU cache this is tricky because evengetis a write operation (it moves the node to the front). So a naive RWLock does not help much. - Improvement 2 — Sharded/segmented locks: Split the cache into N segments (like Java’s
ConcurrentHashMappre-Java 8). Each segment has its own lock.key.hash() % Ndetermines the segment. This reduces contention by a factor of N. Trade-off: eviction becomes more complex since you need a global LRU ordering across segments, or you accept per-segment LRU (which is an approximation). - Improvement 3 — Lock-free approaches: Use a
ConcurrentHashMapfor lookups and a probabilistic/approximate LRU policy. For example, Redis does not maintain a true LRU linked list. It samples K random keys and evicts the one with the oldest access time. This is O(K) per eviction but avoids all linked list contention. In benchmarks, K=10 gives an eviction accuracy very close to true LRU. - Improvement 4 — Batched reordering: Instead of moving to front on every
get, buffer the “recently accessed” keys in a lock-free queue and periodically flush them to update the DLL order in a single batch under one lock acquisition. Caffeine (Java’s high-performance cache) uses this approach with a ring buffer. - The choice of
RLock(reentrant) vsLockmatters:RLockallows the same thread to acquire the lock multiple times (useful ifputinternally callsget), but it has slightly higher overhead than a plainLockdue to tracking the owning thread and recursion count.
get is a read operation” — this misses that get mutates the DLL.Follow-ups:- In the sharded lock approach, how would you handle the case where you need to evict the globally least-recently-used entry, not just the least-recently-used within a single shard?
- How does Redis actually implement its LRU eviction? Why did the Redis team choose approximate LRU over true LRU?
Q4: Compare LRU, LFU, and ARC eviction policies. When would you pick each one in a production system?
Q4: Compare LRU, LFU, and ARC eviction policies. When would you pick each one in a production system?
- LRU (Least Recently Used): Evicts the entry that has not been accessed for the longest time. Best when recent access is a good predictor of future access (temporal locality). Simple to implement. Weakness: a single full-table scan (e.g., a batch analytics query reading every row) can flush the entire hot working set from the cache. This is called “cache pollution.”
- LFU (Least Frequently Used): Evicts the entry with the lowest access count. Better when some items are consistently popular over time (frequency locality). Used by CDNs for static assets — a viral video accessed 10M times should not be evicted just because it was not accessed in the last 5 seconds. Weakness: “frequency aging” — an item that was popular yesterday but irrelevant today can linger because its count is high. You need a decay mechanism (e.g., halve all counts periodically, or use a time-windowed frequency).
- ARC (Adaptive Replacement Cache): Maintains two LRU lists — one for entries accessed once (recency) and one for entries accessed multiple times (frequency). Dynamically adjusts the size ratio between them based on which list is producing more hits. Used by ZFS and IBM DB2. It adapts to workload changes without manual tuning. Downside: more memory overhead (tracks “ghost entries” for recently evicted items) and more complex implementation.
- W-TinyLFU (used by Caffeine, the gold standard Java cache): Uses a small LRU “window” admission filter + a segmented LRU main space, with a frequency sketch (Count-Min Sketch) to decide whether a new entry should replace an existing one. Near-optimal hit rates across diverse workloads. This is what you would use in a modern JVM application.
- Rule of thumb: LRU for general-purpose caching (web sessions, database query results). LFU for CDN and content caching. ARC or W-TinyLFU when you need best-in-class hit rates and can afford the implementation complexity.
- How would you implement a frequency decay mechanism for LFU so that stale-but-once-popular items eventually get evicted?
- You are building a cache for a social media feed service where some posts go viral for 2 hours then drop off completely. Which eviction policy would you choose, and why?
Q5: How would you extend a single-node LRU cache into a distributed cache system? What are the key design decisions?
Q5: How would you extend a single-node LRU cache into a distributed cache system? What are the key design decisions?
- Moving from single-node to distributed introduces several fundamental challenges: data partitioning, consistency, replication, and cache coherence.
- Partitioning (sharding): Use consistent hashing to distribute keys across N cache nodes. Each key hashes to a position on the ring, and the first node clockwise from that position owns the key. When a node is added or removed, only ~1/N of keys need to be remapped instead of all of them (which would happen with simple
hash(key) % N). Virtual nodes (each physical node gets 100-200 positions on the ring) improve balance. - Replication: For availability, replicate each key to the next R nodes on the hash ring. Decide on consistency model: strong consistency (all replicas must agree before returning — slow but safe) vs eventual consistency (write to primary, async replicate — fast but stale reads possible). Most production caches (Memcached, Redis) choose eventual consistency or single-leader replication because cache misses can always fall through to the source of truth.
- Cache coherence: When data is updated in the source of truth, how do you invalidate/update the cache? Options: (a) write-through — update cache and DB together, (b) write-behind — update cache immediately, batch-write to DB later (risk of data loss), (c) cache-aside — application manages cache manually (read: check cache, miss -> query DB -> populate cache; write: update DB, invalidate cache). Cache-aside is the most common pattern because it is simple and the cache layer does not need to know about the DB.
- Hot key problem: Some keys (e.g., a celebrity’s profile, a trending product) receive disproportionate traffic. If one node owns that key, it becomes a hotspot. Solutions: local caching (each app server caches hot keys locally with short TTL), key replication (replicate hot keys to multiple nodes), or key splitting (split
celebrity:123intocelebrity:123:shard0throughcelebrity:123:shard9). - Failure handling: What happens when a cache node dies? With consistent hashing, its keys redistribute to the next node. If using replication, the replica takes over. Without replication, those keys experience cache misses and the DB sees a “thundering herd” of requests. Mitigation: request coalescing (only one request per key goes to DB, others wait), probabilistic early expiration (each client independently decides to refresh a key slightly before TTL expires, spreading the load).
- Explain the thundering herd problem in detail. You have a cache key that 10,000 requests per second depend on, and it just expired. What happens, and how do you prevent it?
- Compare Redis Cluster’s approach to sharding with Memcached’s client-side consistent hashing. What are the trade-offs of server-side vs client-side partitioning?
Q6: In the implementation, dummy head and tail nodes are used. Why? What bugs would occur without them?
Q6: In the implementation, dummy head and tail nodes are used. Why? What bugs would occur without them?
- Dummy (sentinel) nodes eliminate all null-check edge cases in linked list operations. Without them, every operation (
add_to_front,remove,remove_last) needs special-case code for: (a) empty list, (b) single-element list, (c) removing the head, (d) removing the tail. - For example,
add_to_frontwithout sentinels: if the list is empty, you must set bothhead = nodeandtail = node. If the list is not empty, you setnode.next = head,head.prev = node,head = node. That is two code paths. With sentinels, it is always the same four pointer assignments regardless of list state. removewithout sentinels: if the node is the head, you must updatehead = node.next. If the node is the tail, updatetail = node.prev. If the node is in the middle, updatenode.prev.nextandnode.next.prev. With sentinels, it is always justnode.prev.next = node.nextandnode.next.prev = node.prev— the sentinels absorb the edge cases.- The bugs that occur without sentinels are classic:
NullPointerException(orAttributeErrorin Python) when removing the last element and then trying to accesshead.next, or silently corrupting the list by not updatingtailwhen removing the tail node. These bugs are insidious because they only trigger in edge cases (empty list, single-element list) that are easy to miss in testing. - Sentinels cost two extra node allocations (trivial memory) but eliminate an entire class of bugs. In a production cache handling millions of operations, those edge-case bugs would eventually surface as data corruption or crashes.
- If you were implementing this in a language without garbage collection (C/C++), what additional concerns would the sentinel nodes introduce around memory management?
- Can you think of other data structures that use sentinel or dummy nodes for the same reason? How does this pattern generalize?
Q7: You deploy your LRU cache in production and notice the hit rate is only 40% when you expected 85%. How do you diagnose and fix this?
Q7: You deploy your LRU cache in production and notice the hit rate is only 40% when you expected 85%. How do you diagnose and fix this?
- Step 1 — Instrument the cache: Add metrics for hit rate, miss rate, eviction rate, and current size. If the eviction rate is very high (close to the request rate), the cache is too small for the working set. The fix is straightforward: increase capacity.
- Step 2 — Analyze the access pattern: Log a sample of keys being accessed. Look at the key distribution. If it follows a Zipfian distribution (a small number of keys are accessed very frequently), LRU should work well. If the distribution is nearly uniform (many keys each accessed rarely), no cache will help much — the working set is larger than the cache.
- Step 3 — Check for cache pollution: Look for periodic full scans or batch operations that access many keys once and then never again. For example, a nightly analytics job reading every user record would flush the hot working set. Solutions: use a separate cache for batch workloads, or switch to an eviction policy that resists scan pollution (like LRU-K, 2Q, or ARC which require an item to be accessed K times before it enters the main cache).
- Step 4 — Check key cardinality: If the key space is too large relative to cache size (e.g., caching per-user data for 10M users in a cache that holds 100K entries), the hit rate will be low because the “long tail” of users constantly evicts each other. Solutions: cache only for active users (determined by a Bloom filter or activity window), or segment users by activity tier.
- Step 5 — Check for TTL issues: If entries have short TTLs, they may be expiring before they are re-accessed. Analyze the gap between consecutive accesses to the same key and adjust TTL accordingly. A TTL that is shorter than the median inter-access time for popular keys will destroy hit rate.
- Step 6 — Check for key fragmentation: If the same logical data is cached under multiple keys (e.g.,
user:123andUSER:123anduser:123:v2), you are wasting capacity on duplicates. Normalize keys.
- How would you implement a hit-rate monitoring system for the cache that has minimal performance overhead? What metrics would you track and alert on?
- You discover the low hit rate is caused by a batch job running every hour that scans 500K keys. The cache capacity is 100K. How do you solve this without increasing cache size?
Q8: How does Python's `OrderedDict` work internally, and why is it not the same as your HashMap + DLL implementation even though it achieves the same interface?
Q8: How does Python's `OrderedDict` work internally, and why is it not the same as your HashMap + DLL implementation even though it achieves the same interface?
- Python’s
OrderedDictis implemented as a regular dict (hash table) combined with a doubly linked list that tracks insertion order. In CPython, since Python 3.7, even regulardictmaintains insertion order, butOrderedDictadditionally providesmove_to_end()andpopitem(last=True/False)for O(1) reordering, which regulardictdoes not. - Internally,
OrderedDictuses a circular doubly linked list. Each entry in the dict has a corresponding link node. Themove_to_end(key)operation does exactly what ourmove_to_frontdoes: unlinks the node from its current position and relinks it at the end (or beginning iflast=False). - So at the data structure level,
OrderedDictis essentially HashMap + DLL — the same combination. The difference is abstraction level and purpose. When you useOrderedDictin an interview, you are using a library that hides the data structure. The interviewer wants to see that you understand why this combination works, how the pointers are manipulated, and what the complexity is. UsingOrderedDictproves you know the Python standard library; implementing from scratch proves you understand the data structure. - There are also subtle behavioral differences:
OrderedDictequality comparison considers order (OrderedDict(a=1, b=2) != OrderedDict(b=2, a=1)), while regulardictsince 3.7 does not.OrderedDictalso has higher memory overhead than a regulardictbecause of the additional linked list nodes. - In production Python code, you would actually use
functools.lru_cache(for function memoization) or a library likecachetoolsrather than rolling your own.functools.lru_cacheuses a circular doubly linked list internally — same principle, battle-tested implementation.
OrderedDict is just a dict that remembers order, so you can use it as an LRU cache directly.” This skips the how and why. It also misses that OrderedDict without the move_to_end calls is insertion-ordered, not access-ordered — you must explicitly call move_to_end on every get to make it behave as LRU.Follow-ups:- What is the difference between
functools.lru_cacheand a manually implemented LRU cache? When would you use each? - If you were implementing an LRU cache in a language without an
OrderedDictequivalent (say, C++), what standard library data structures would you use, and what is the equivalent combination?