What is a HashMap?
HashMap (or HashTable/Dictionary) provides O(1) average-case lookup, insertion, and deletion by mapping keys to values using a hash function. Think of a library where every book has a unique call number. Instead of scanning every shelf (O(n)), you look up the call number in the catalog, which tells you the exact shelf and position (O(1)). A hash function is that catalog: it converts a key into an index where the value is stored.
Under the hood, a HashMap maintains an array of "buckets." The hash function maps each key to a bucket index. When two keys hash to the same bucket (a collision), the bucket stores both entries in a linked list or a balanced tree (Java 8+ converts a bucket to a red-black tree once its chain grows past 8 entries). This is why the average case is O(1) but the worst case is O(n): if every key collides into one bucket, lookup degrades to scanning a list.
Why HashMap is so important for interviews: it is the single most versatile tool for trading space for time. An O(n^2) brute-force solution that uses a nested loop can almost always be reduced to O(n) by replacing the inner loop with a HashMap lookup. Two Sum is the canonical example.
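To make the bucket picture concrete, here is a minimal sketch of a chaining hash table in Python. The class name and fixed bucket count are illustrative; real implementations also resize as the load factor grows.

```python
class ToyHashMap:
    """Illustrative chaining hash table: an array of buckets, each a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash function maps a key to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key (or collision): append to the chain

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:                  # scan the chain; O(1) on average, O(n) if everything collides
                return v
        return default

m = ToyHashMap()
m.put("call-number-QA76", "Intro to Algorithms")
print(m.get("call-number-QA76"))  # Intro to Algorithms
```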
Pattern Recognition Cheatsheet
Reusable Template Code
Ninety percent of the HashMap problems you will see in interviews collapse into one of three skeletons. Memorize these and you will write them in your sleep; a sketch of each is below.
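A sketch of the three skeletons in Python. The function names are illustrative, and the assumed formulations are frequency counting, complement lookup, and grouping by a computed key.

```python
from collections import Counter, defaultdict

# Skeleton 1: frequency counting
def count_frequencies(items):
    return Counter(items)              # element -> number of occurrences

# Skeleton 2: complement lookup (Two Sum style)
def find_pair(nums, target):
    seen = {}                          # value -> index where it was seen
    for i, num in enumerate(nums):
        if target - num in seen:       # check before storing
            return [seen[target - num], i]
        seen[num] = i
    return None

# Skeleton 3: grouping by a computed key
def group_by(items, key_fn):
    groups = defaultdict(list)         # key -> list of items sharing that key
    for item in items:
        groups[key_fn(item)].append(item)
    return groups
```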
Pattern Recognition Checklist
Use HashMap When
- Need O(1) lookup by key
- Counting frequencies of elements
- Finding pairs/complements (like Two Sum)
- Grouping elements by property
- Caching/Memoization
Don't Use When
- Need sorted order (use TreeMap)
- Range queries required
- Space is very constrained
- Keys are unhashable (like lists)
When to Use
- Frequency Counting
- Two Sum Variants
- Caching/Memoization
- Grouping/Indexing
Pattern Variations
1. Two Sum (Classic HashMap)
The problem that launched a thousand interviews. The brute-force approach checks every pair in O(n^2). The HashMap solution turns this into O(n) by storing each number’s index as you scan. For each new number, you compute its complement (target - num) and check if the complement is already in the map.
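A minimal sketch of the one-pass approach just described, assuming the usual formulation of returning the two indices:

```python
def two_sum(nums, target):
    seen = {}                              # value -> index of where we saw it
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:             # check the map before inserting the current element
            return [seen[complement], i]
        seen[num] = i
    return []                              # no pair found

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 3], 6))          # [0, 1] -- duplicates handled by check-then-store
```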
Why this pattern is so powerful: It replaces the inner loop of “search for the matching element” with an O(1) lookup. This same trick applies to Three Sum (fix one, Two Sum the rest), Four Sum, and many subarray problems.
Time: O(n). Space: O(n).
Edge case: What if the same element is used twice? The code handles this correctly because we check the map before inserting the current element, so [3, 3] with target 6 works — we find 3 in the map from index 0 when processing index 1.
2. Frequency Counter
Counting element occurrences is the second most common HashMap pattern (after Two Sum). It appears in Top K Frequent, Valid Anagram, First Unique Character, Majority Element, and dozens more. The pattern: build a frequency map in O(n), then query it in O(1) per lookup. Python's `Counter` and Java's `getOrDefault` make this concise.
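A short sketch of the pattern, shown here on First Unique Character, with `Counter` alongside for comparison:

```python
from collections import Counter

def first_unique_char(s):
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1          # build the frequency map in O(n)
    for i, ch in enumerate(s):
        if freq[ch] == 1:                       # O(1) query per character
            return i
    return -1

print(first_unique_char("leetcode"))       # 0
print(Counter("aabbbc").most_common(1))    # [('b', 3)] -- Counter makes Top K Frequent one call away
```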
Anagram check insight: Two strings are anagrams if and only if they have the same character frequencies. Instead of sorting both strings (O(n log n)), compare their frequency maps (O(n)).
Time: O(n). Space: O(k) where k is the number of distinct elements (at most 26 for lowercase letters).
3. Group By (Anagrams)
Group elements by a computed key. For anagrams, the key is a canonical form that is identical for all anagrams of the same word. Two approaches: (1) sort the characters (simple, O(k log k) per word), or (2) count characters into a tuple (O(k) per word, faster for long strings). Example: "eat", "tea", "ate" all sort to "aet", so they share the same key and land in the same bucket. Time: O(n * k log k) where n is the number of strings and k is the maximum string length. Space: O(n * k) for storing all strings in the map.
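A minimal sketch of the grouping pattern using the sorted-characters key (the count-tuple key would be a drop-in replacement for the `key =` line):

```python
from collections import defaultdict

def group_anagrams(words):
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))   # canonical form: "eat", "tea", "ate" -> "aet"
        groups[key].append(word)
    return list(groups.values())

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```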
4. Prefix Sum with HashMap
This is one of the most important combined patterns in interviews. The core idea: `sum(nums[i..j]) = prefix[j] - prefix[i-1]`. So if you want subarrays with sum exactly k, you need `prefix[j] - prefix[i-1] = k`, which means `prefix[i-1] = prefix[j] - k`. Use a HashMap to count how many times each prefix sum has occurred.
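A minimal sketch of the counting version described here (note the `{0: 1}` initialization discussed next):

```python
def subarray_sum(nums, k):
    count = 0
    prefix_sum = 0
    seen = {0: 1}                                  # the empty prefix: handles subarrays starting at index 0
    for num in nums:
        prefix_sum += num
        count += seen.get(prefix_sum - k, 0)       # each earlier matching prefix ends a valid subarray here
        seen[prefix_sum] = seen.get(prefix_sum, 0) + 1
    return count

print(subarray_sum([1, 2, 3], 3))  # 2  ([1, 2] and [3])
```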
Why {0: 1} in the initial map? This handles subarrays starting at index 0. If the prefix sum at some point equals exactly k, then prefix_sum - k = 0, and we need to know that “a prefix sum of 0 occurred once” (representing the empty prefix before the array starts).
Worked example: nums = [1, 2, 3], k = 3. Prefix sums: 1, 3, 6. At prefix_sum=3: 3 - 3 = 0 is in the map (count 1), so one subarray sums to 3 ([1,2]). At prefix_sum=6: 6 - 3 = 3 is in the map (count 1), so another subarray ([3]). Total = 2.
Time: O(n). Space: O(n).
5. Longest Consecutive Sequence
A beautiful problem that looks like it requires sorting (O(n log n)) but can be solved in O(n) with a HashSet. The trick: for each number, check if it is the start of a sequence (i.e., `num - 1` is NOT in the set). If it is a start, count upward as far as possible. If it is not a start, skip it; it will be counted when we process the actual start.
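A minimal sketch of the set-based approach (variable names are illustrative):

```python
def longest_consecutive(nums):
    num_set = set(nums)
    best = 0
    for num in num_set:
        if num - 1 not in num_set:          # only start counting from the head of a sequence
            length = 1
            while num + length in num_set:  # extend as far as the set allows
                length += 1
            best = max(best, length)
    return best

print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4  (the run 1, 2, 3, 4)
```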
Why this is O(n) and not O(n^2): Each number is visited at most twice — once when we iterate through the set, and at most once when we extend a sequence from its start. Numbers that are not sequence starts are skipped immediately.
Common interview mistake: Starting a count from every number (not just sequence starts). This turns the solution into O(n^2) because sequences are re-counted from every element.
Time: O(n). Space: O(n).
6. LRU Cache (HashMap + Doubly Linked List)
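This pattern pairs a HashMap (`key -> node`) for O(1) lookup with a doubly linked list for O(1) recency updates. Below is a minimal sketch with sentinel head and tail nodes; the class and method names follow the usual LeetCode 146 interface, but the details are illustrative.

```python
class Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                       # key -> node, for O(1) lookup
        self.head = Node()                  # sentinel on the least-recently-used side
        self.tail = Node()                  # sentinel on the most-recently-used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_tail(self, node):
        node.prev = self.tail.prev
        node.next = self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)                  # move to the most-recently-used position
        self._add_to_tail(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])
        node = Node(key, value)             # the node stores the key so eviction can delete the map entry
        self.map[key] = node
        self._add_to_tail(node)
        if len(self.map) > self.capacity:
            lru = self.head.next            # least recently used real node
            self._remove(lru)
            del self.map[lru.key]

cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))   # 1
cache.put(3, 3)       # evicts key 2
print(cache.get(2))   # -1
```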
Classic Problems
1. Two Sum
2. Valid Anagram
3. Contains Duplicate
4. First Unique Character
5. Subarray Sum Equals K
Worked LeetCode Problems: Brute Force vs Optimized
The single biggest mistake candidates make is jumping to the optimized HashMap solution without first naming the brute-force baseline. Interviewers want to see the trade-off explicitly: "brute force is O(n^2), HashMap trades O(n) space for O(n) time." Walk through these four canonical problems and you will have a template for every HashMap question.
LC 1: Two Sum (Easy)
Problem. Given `nums` and `target`, return indices of the two numbers such that they add up to `target`. Each input has exactly one solution; you may not use the same element twice.
Brute force (O(n^2) time, O(1) space). Two nested loops. Concede this version, then immediately pivot.
Optimized (O(n) time, O(n) space). Store a map from value to index and on each new number ask: "have I seen the complement already?" If yes, return both indices. The check-then-store ordering naturally rejects using the same element twice.
LC 49: Group Anagrams (Medium)
Problem. Given an array of strings, group anagrams together.
Brute force (O(n^2 * k)). Compare every pair of strings using sort-and-compare. Painfully slow at scale.
Optimized: sorted key (O(n * k log k)). For each string, sort its characters to get a canonical form. Use that as the HashMap key.
LC 242: Valid Anagram (Easy)
Problem. Given two strings `s` and `t`, return True if `t` is an anagram of `s`.
Brute force (O(n log n)). Sort both, compare. Acceptable answer, but you can do better.
Optimized counting (O(n) time, O(1) space for fixed alphabet). Use a single 26-element array. Increment for chars in s, decrement for chars in t. If any cell is non-zero at the end, not an anagram.
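A minimal sketch of the single-array version (assuming lowercase English letters only):

```python
def is_anagram(s, t):
    if len(s) != len(t):
        return False
    counts = [0] * 26
    for ch_s, ch_t in zip(s, t):
        counts[ord(ch_s) - ord("a")] += 1   # increment for characters of s
        counts[ord(ch_t) - ord("a")] -= 1   # decrement for characters of t
    return all(c == 0 for c in counts)      # every cell must cancel out

print(is_anagram("anagram", "nagaram"))  # True
print(is_anagram("rat", "car"))          # False
```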
A Pythonic alternative: `Counter(s) == Counter(t)`, still O(n), now general-purpose.
LC 128: Longest Consecutive Sequence (Medium)
Problem. Given an unsorted `nums`, return the length of the longest consecutive elements sequence. Required: O(n) time.
Brute force (O(n log n)). Sort, then scan. Easy but violates the time requirement.
Optimized (O(n)). Convert to set. For each number, only start a count if num - 1 is NOT in the set (this guards against re-counting the same sequence). Then walk forward as far as the set allows.
The crucial detail: without the `num - 1 not in num_set` guard, you re-walk the same streak from every element and the solution silently becomes O(n^2). This is the single most-asked follow-up: "why is this O(n) and not O(n^2)?"
LC 560: Subarray Sum Equals K (Medium)
Problem. Count the number of contiguous subarrays whose sum equals k.
Brute force (O(n^2)). Nested loop summing every subarray. Works, but cannot pass large inputs.
Optimized prefix-sum plus map (O(n) time, O(n) space). Track running prefix sum. For each position, count how many earlier prefix sums equal prefix - k — each such match defines a subarray ending at the current index summing to k.
The `{0: 1}` interview trap. Initialize with `{0: 1}` so that whenever the prefix itself equals k, the lookup `prefix - k = 0` finds an "empty prefix" and counts it. Forget this and you under-count by missing every subarray that starts at index 0.
Common Mistakes
Curated LeetCode Practice List
The classic HashMap problem set, grouped by difficulty and annotated with which sub-pattern each one teaches. Solve them in order within each tier; later problems build on earlier insights.
Easy
| # | Problem | Pattern Variant |
|---|---|---|
| 1 | Two Sum | Complement lookup — the canonical |
| 217 | Contains Duplicate | HashSet for "have I seen this?" |
| 242 | Valid Anagram | Frequency-count comparison |
| 387 | First Unique Character | Two-pass count then scan |
| 383 | Ransom Note | Subset frequency check |
| 219 | Contains Duplicate II | Map with index for k-distance check |
| 290 | Word Pattern | Two-map bijection |
Medium
| # | Problem | Pattern Variant |
|---|---|---|
| 49 | Group Anagrams | Canonical-key grouping |
| 560 | Subarray Sum Equals K | Prefix sum plus map |
| 128 | Longest Consecutive Sequence | Set with sequence-head guard |
| 146 | LRU Cache | HashMap plus doubly linked list |
| 347 | Top K Frequent Elements | Counter plus heap or bucket sort |
| 3 | Longest Substring Without Repeating | Sliding window plus map |
| 454 | 4Sum II | Two-map split for n^2 instead of n^4 |
| 525 | Contiguous Array | Prefix-sum trick (treat 0 as -1) |
Hard
| # | Problem | Pattern Variant |
|---|---|---|
| 76 | Minimum Window Substring | Sliding window plus need-have count |
| 460 | LFU Cache | Three coordinated maps |
| 30 | Substring with Concat of All Words | Multi-map sliding window |
| 41 | First Missing Positive | In-place hashing via index marking |
| 432 | All O(1) Data Structure | HashMap plus doubly linked list of buckets |
Complexity Quick Reference
| Operation | Average | Worst Case | Note |
|---|---|---|---|
| Insert | O(1) | O(n) | Worst case with bad hash |
| Lookup | O(1) | O(n) | Worst case with collisions |
| Delete | O(1) | O(n) | Worst case with chaining |
| Iterate | O(n) | O(n) | All keys/values |
Interview Problems by Company
| Problem | Company | Key Concept |
|---|---|---|
| Two Sum | All FAANG | Complement lookup |
| Valid Anagram | Amazon, Meta | Character frequency |
| Contains Duplicate | All | HashSet for seen |
| First Unique Char | Amazon | Frequency count |
| Ransom Note | Microsoft | Character counting |
Interview Tips
How to Explain Your Approach
- “I’ll use a HashMap to get O(1) lookup time.”
- “I’ll store [key] mapping to [value].”
- “This trades O(n) space for O(1) lookup time.”
- “For each element, I check if [complement/target] exists.”
- “Total time is O(n) with O(n) space.”
When Interviewer Says...
| Interviewer Says | You Should Think |
|---|---|
| “Can you do it in O(n)?” | HashMap to avoid nested loop |
| “Count occurrences” | Frequency counter |
| “Find pair that sums to…” | Store complements |
| “Group by property” | Key = property, value = list |
| “Check if seen before” | HashSet |
HashMap vs HashSet
Use a HashMap when each key needs an associated value:
- Two Sum (value → index)
- Frequency counting (element → count)
Use a HashSet when you only need membership:
- Contains Duplicate (seen set)
- Unique elements (deduplication)
Practice Problems
| Problem | Difficulty | Link |
|---|---|---|
| Two Sum | Easy | LeetCode 1 |
| Valid Anagram | Easy | LeetCode 242 |
| Group Anagrams | Medium | LeetCode 49 |
| Subarray Sum Equals K | Medium | LeetCode 560 |
| LRU Cache | Medium | LeetCode 146 |
Practice Roadmap
Day 2: Frequency Counting
- Solve: Valid Anagram, First Unique Character
- Focus: Counter/frequency map usage
Interview Questions
Deep-dive questions that test real understanding of the HashMap pattern: how hashing works under the hood, when it breaks, and why you reach for it over other structures. Expect interviewers to push past your initial answer.
Q1: Walk me through how a HashMap achieves O(1) lookup. What actually happens when you do map[key]?
- The key is passed through a hash function that produces an integer. This integer is then mapped to a bucket index via modulo: `index = hash(key) % num_buckets`. The value is stored in (or retrieved from) that bucket.
- Hash function quality matters. A good hash distributes keys uniformly across buckets. A bad one clusters keys into a few buckets, creating chains and degrading to O(n).
- Collision resolution comes in two flavors: chaining (each bucket holds a linked list or tree of entries) and open addressing (probe to the next empty slot using linear probing, quadratic probing, or double hashing). Java's HashMap uses chaining with a red-black tree upgrade when a bucket exceeds 8 entries (since Java 8). Python's `dict` uses open addressing with a custom probe sequence.
- Load factor and resizing. When the ratio of entries to buckets exceeds a threshold (0.75 in Java, ~2/3 in Python), the table is resized, typically doubled, and all entries are rehashed. This is an O(n) operation, but it happens infrequently enough that inserts are O(1) amortized.
- Example: In Java, `"hello".hashCode()` returns a 32-bit integer. HashMap then applies a secondary spread function (`h ^ (h >>> 16)`) to reduce collision rates, then masks the result to the bucket array size.
Follow-ups:
- What happens if two different keys produce the same hash? Walk me through the full lookup path with a collision. (Answer: both entries end up in the same bucket. On lookup, the bucket is traversed, comparing keys via `.equals()`, until the target key is found or the chain is exhausted.)
- Java's HashMap converts chains to red-black trees at 8 entries. Why 8 specifically, and what does this do to worst-case lookup? (Answer: 8 is chosen because under random hashing the probability of a chain reaching 8 is extremely low, about 1 in 10 million at load factor 0.75. The tree upgrade changes worst-case from O(n) to O(log n) per bucket, defending against pathological hash collisions or hash-flooding attacks.)
Q2: Solve Two Sum. Then tell me -- why is this problem so important? What broader pattern does it teach?
- The brute force is O(n^2): for each element, scan the rest for its complement. The HashMap approach stores each element as you go and checks if `target - current` already exists. One pass, O(n) time, O(n) space.
- Why it matters: Two Sum teaches the complement lookup pattern. Instead of searching forward for a match, you store what you have seen and ask "does my complement exist?" This exact pattern appears in: subarray sum equals K (prefix sum complements), 3Sum (reduce to Two Sum), four-sum variants, and even some graph problems.
- The space-time trade-off is the core lesson. You burn O(n) space to avoid O(n) work per element. This principle — “can I precompute or cache something to avoid repeated work?” — is the single most common HashMap insight across all interview problems.
- Key detail: You store elements as you iterate, not all at once upfront. This naturally handles the “cannot use the same element twice” constraint and guarantees you return the first valid pair.
- What if the array has duplicates and there are multiple valid pairs? How does your approach handle that? (Answer: it returns the first pair found because we store-then-check in order. If you need all pairs, you would need to store a list of indices per value instead of a single index.)
- How would you extend this to Three Sum? Does the HashMap approach still win? (Answer: for 3Sum, you sort and use two pointers for the inner loop, giving O(n^2). Using a HashMap for the inner pair does not improve asymptotic complexity and is harder to deduplicate. The two-pointer approach is cleaner for 3Sum.)
Q3: Explain the prefix sum + HashMap technique for counting subarrays with sum K. Why does it work, and why is the initial {0: 1} entry critical?
Interviewers want to see that you can derive the identity (`prefix[j] - prefix[i] = K`), not just pattern matching. The `{0: 1}` initialization is the #1 bug source; interviewers watch for whether you explain it or just memorize it.
Strong answer:
- Core insight: If `prefix_sum[j] - prefix_sum[i] = K`, then the subarray from index `i+1` to `j` sums to K. So for each position `j`, we need to count how many earlier positions `i` have `prefix_sum[i] = prefix_sum[j] - K`.
- This is Two Sum in disguise. Instead of looking for two numbers that sum to a target, we look for two prefix sums whose difference equals K. The HashMap stores `prefix_sum -> count` of times we have seen this prefix sum.
- Why `{0: 1}` is critical. It handles subarrays that start from index 0. If the prefix sum at position `j` equals K, then the subarray `[0..j]` is a valid answer. Without `{0: 1}`, the lookup for `prefix_sum - K = 0` would find nothing, and you would miss every valid subarray starting from the beginning.
- Works with negative numbers. Unlike sliding window, this technique handles negatives because prefix sums are monotonically computed (no shrink step to invalidate).
- Example: Array `[1, 2, 3]`, K=3. Prefix sums: 0, 1, 3, 6. At prefix 3, we look for `3 - 3 = 0`, which exists once in our map (the initial entry). At prefix 6, we look for `6 - 3 = 3`, which also exists once. Total: 2 subarrays (`[1,2]` and `[3]`).
Red flags: forgetting to initialize the map with `{0: 1}`. Another red flag: trying to use sliding window for this problem (sliding window fails with negative numbers because the sum does not monotonically increase as you expand).
Follow-ups:
- What if I change the problem to "longest subarray with sum K" instead of counting? How does your HashMap usage change? (Answer: instead of storing counts, store the first index where each prefix sum was seen. When you find `prefix_sum - K` in the map, the length is `current_index - stored_index`. By keeping only the first occurrence, you maximize length.)
- What if K = 0? Does the approach still work? (Answer: yes, perfectly. You are looking for `prefix_sum[j] == prefix_sum[i]`, meaning two positions with the same prefix sum. The map naturally counts how many times each prefix sum has appeared.)
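A minimal sketch of the "longest subarray with sum K" variant from the first follow-up, storing the first index at which each prefix sum appears (function name and test values are illustrative):

```python
def longest_subarray_with_sum(nums, k):
    first_seen = {0: -1}                    # prefix sum -> earliest index where it occurred
    prefix_sum = 0
    best = 0
    for i, num in enumerate(nums):
        prefix_sum += num
        if prefix_sum - k in first_seen:
            best = max(best, i - first_seen[prefix_sum - k])
        if prefix_sum not in first_seen:    # keep only the first occurrence to maximize length
            first_seen[prefix_sum] = i
    return best

print(longest_subarray_with_sum([1, -1, 5, -2, 3], 3))  # 4  ([1, -1, 5, -2])
```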
Q4: Design an LRU Cache with O(1) get and put. Why do you need both a HashMap and a linked list? Why not just one?
- The problem has two O(1) requirements that conflict. You need O(1) key lookup (HashMap territory) AND O(1) insertion/removal at arbitrary positions plus O(1) access to the least-recently-used item (doubly linked list territory). Neither structure alone satisfies both.
- HashMap stores `key -> node_reference`. This gives O(1) lookup to find any node instantly. The doubly linked list maintains recency order: most-recently-used at the tail, least-recently-used at the head. Doubly linked because you need O(1) removal of a node from its current position (you need the `prev` pointer).
- On `get(key)`: Look up the node via the HashMap (O(1)), remove it from its current position in the list (O(1) with prev/next pointers), re-insert it at the tail (O(1)). Return the value.
- On `put(key, value)`: If the key exists, update its value and move it to the tail. If not, create a new node, add it to the HashMap and the tail of the list. If capacity is exceeded, remove the head node from both the list and the HashMap. Critical detail: the list node must store the key so that when evicting from the list, you know which HashMap entry to delete.
- Sentinel nodes trick. Use dummy `head` and `tail` nodes to avoid null checks on every insert/remove. This eliminates four edge-case branches and halves the number of bugs in implementation.
Red flag: reaching for Python's `OrderedDict` or Java's `LinkedHashMap` and calling it done. That is not what the interviewer is asking; they want to see you build the data structure. Another red flag: forgetting to store the key in the node (you will not be able to evict from the HashMap).
Follow-ups:
- How would you make this thread-safe? What is the concurrency bottleneck? (Answer: the naive approach is a global lock on every `get`/`put`, but reads massively outnumber writes in most cache workloads. A striped lock approach partitions the HashMap into segments, each with its own lock. The linked list is the harder part: Java's `ConcurrentLinkedDeque` uses CAS operations, but maintaining strict LRU ordering under contention is expensive. Production caches like Caffeine use a write buffer to batch list updates.)
- What changes if I ask for an LFU (Least Frequently Used) cache instead? (Answer: LFU needs a third structure: a frequency-to-list mapping. You maintain a HashMap from frequency -> doubly-linked list of nodes at that frequency, plus a pointer to the minimum frequency. On access, move the node from its current frequency list to the next. This is significantly more complex: three interacting data structures instead of two.)
Q5: You have a stream of integers and need to check if any two numbers in the stream sum to a target. But now the stream is infinite -- you cannot store everything. What do you do?
- First, clarify the requirements. "Are we checking for any pair ever seen in the entire stream, or only within a recent window? Can we tolerate false positives? These choices radically change the solution."
- Bounded window approach. If you only need to check within the last N elements, use a HashMap of size N with a sliding window. When the window slides, evict the oldest entry. This is O(1) per element and O(N) space, which is manageable.
- Approximate approach with Bloom filters. If you need the entire history but cannot store everything, use a Bloom filter. For each new element `x`, query the filter for `target - x`. If it says yes, a pair probably exists (with a configurable false positive rate). Then insert `x` into the filter. Space drops from O(n) to O(m) where m is the filter size, a small fraction of what a full HashMap would need for billions of elements. Trade-off: you get false positives but never false negatives.
- Probabilistic data structures matter here. A Bloom filter with 1% false positive rate uses about 10 bits per element. For 1 billion elements, that is ~1.2 GB instead of ~8 GB for a full HashMap.
- If exact answers are required with an infinite stream: you need to relax another constraint, either bound the window, shard across machines (distributed HashMap), or use external storage (LSM tree / sorted file with binary search).
Follow-ups:
- What is the false positive rate of a Bloom filter and how do you tune it? (Answer: for m bits, k hash functions, and n inserted elements, the false positive rate is approximately (1 - e^(-kn/m))^k. You typically choose k = (m/n) * ln(2) to minimize it. Doubling the bit array roughly halves the false positive rate.)
- Could you use Count-Min Sketch instead of a Bloom filter here? Why or why not? (Answer: Count-Min Sketch estimates frequency counts, not membership. It would tell you approximately how many times `target - x` has appeared, which gives you more information than a Bloom filter but uses more space per element. Overkill for a simple "exists?" check, but useful if you also need to know how many valid pairs exist.)
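A small sketch of the sizing math from the follow-up answer. The false-positive and optimal-k formulas are the ones quoted above; the bit-count formula m = -n * ln(p) / (ln 2)^2 is the standard inversion of them and is an assumption added here.

```python
import math

def bloom_parameters(n_items, target_fpr):
    # Assumed standard sizing: m = -n * ln(p) / (ln 2)^2 bits, then k = (m/n) * ln(2) hash functions.
    m = math.ceil(-n_items * math.log(target_fpr) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    actual_fpr = (1 - math.exp(-k * n_items / m)) ** k   # the formula quoted in the answer above
    return m, k, actual_fpr

bits, hashes, fpr = bloom_parameters(1_000_000_000, 0.01)
print(round(bits / 8 / 1e9, 2), "GB,", hashes, "hash functions,", round(fpr, 4), "false positive rate")
# roughly 1.2 GB and 7 hash functions for one billion elements at ~1% FPR
```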
Q6: Group Anagrams asks you to group strings by anagram equivalence. What key do you use in the HashMap, and what are the trade-offs of different key choices?
- Approach 1: Sorted string as key. Sort each string alphabetically and use the sorted version as the key. `"eat"` and `"tea"` both become `"aet"`. Time per string: O(k log k) where k is string length. Simple and readable.
- Approach 2: Character frequency tuple as key. Count character frequencies and create a tuple like `(1,0,0,...,1,0,...,1,...)` for 26 letters. `"eat"` becomes `(1,0,0,0,1,0,...,1,0,...)`. Time per string: O(k), faster than sorting for long strings. But the key is larger in memory (26 integers vs. k characters).
- Approach 3: Prime number product. Assign each letter a unique prime number (a=2, b=3, c=5, ...) and multiply them. Anagrams produce the same product by the fundamental theorem of arithmetic. O(k) time. Gotcha: integer overflow for long strings. `"z"` maps to the 26th prime (101), and a 20-character string could overflow a 64-bit integer.
- Which to choose? For interview problems with short strings (k under 20), sorting is cleanest and least error-prone. For production with long strings, frequency counting wins. The prime approach is clever but fragile: mention it to show depth, then explain why you would not use it in practice.
- Overall complexity: O(n * k log k) for sorting approach, O(n * k) for frequency counting, where n is the number of strings.
- What if strings contain Unicode characters, not just lowercase English letters? How does each approach change? (Answer: sorting still works perfectly. Frequency counting now needs a HashMap instead of a 26-element array, which increases overhead. The prime approach becomes impractical because you would need a unique prime for each Unicode code point.)
- If you had 10 million strings with average length 100, which approach would you use and why? (Answer: frequency counting with a tuple key. At length 100, sorting costs 100 * log(100) ~ 660 operations per string. Frequency counting costs exactly 100. At 10 million strings, that savings is significant — roughly 5.6 billion operations saved. Memory-wise both are comparable.)
Q7: Longest Consecutive Sequence asks for O(n) time. Why does the HashSet solution achieve this even though there is a while loop inside the for loop?
- The trick is the `if num - 1 not in num_set` guard. This ensures the inner `while` loop only runs when `num` is the start of a consecutive sequence. If `num - 1` exists, we skip it entirely because some earlier start will cover it.
- Amortized argument: Every element in the array is touched at most twice: once in the outer `for` loop, and at most once by the `while` loop (as part of extending some sequence). No element is "re-visited" by multiple sequences because the guard ensures each element belongs to exactly one sequence expansion.
- Concrete example: Array `[100, 4, 200, 1, 3, 2]`. The set is `{1, 2, 3, 4, 100, 200}`. Only `1`, `100`, and `200` pass the guard (`0`, `99`, `199` are not in the set). The `while` loop runs: 4 times for the sequence starting at 1 (1->2->3->4), 1 time for 100, 1 time for 200. Total inner iterations: 6 = n. Total work: O(n) + O(n) = O(n).
- This is the same amortized reasoning used for: two-pointer techniques (each pointer moves at most n times total), sliding window (each element added/removed once), and Union-Find with path compression.
- Could you solve this with sorting instead? What would the complexity be and why might you still prefer the HashSet approach? (Answer: sorting gives O(n log n) time and O(1) extra space (or O(n) with a new sorted array). The HashSet approach is O(n) time but O(n) space. You prefer HashSet when n is large and you can afford the memory, or when the data is streaming in and you cannot sort it.)
- What if the array contains duplicates? Does your solution still work? (Answer: yes, because we use a HashSet which automatically deduplicates. Duplicate values do not create duplicate sequence starts or extend sequences incorrectly. This is actually a subtle advantage of converting to a set first.)
Q8: When should you NOT use a HashMap? Give me three situations where it is the wrong choice despite seeming like a natural fit.
- Situation 1: You need sorted/ordered traversal. If you need to iterate keys in order, find the closest key, or answer range queries like "all keys between 5 and 10", a HashMap is the wrong choice. Use a TreeMap/BST (O(log n) ops but ordered) or a sorted array with binary search. Example: building a leaderboard where you need "find all players ranked 50-100"; a TreeMap gives you `subMap(50, 100)` in O(log n + k), while a HashMap would require sorting all keys every time.
- Situation 2: Memory-constrained environments or very large datasets. HashMaps have significant overhead: each entry in Java's HashMap allocates a `Node` object (32+ bytes of overhead beyond the key/value). For 100 million integer pairs, a HashMap uses ~3.2 GB while a flat array or open-addressing table uses ~800 MB. In embedded systems, caches, or when your dataset approaches available RAM, this 4x overhead matters. Example: mobile apps caching user data; an `ArrayMap` (Android) or flat arrays are often better.
- Situation 3: Hash-flooding / adversarial input. If an attacker can control your keys (e.g., HTTP request parameters, user-supplied data), they can craft keys that all hash to the same bucket, degrading O(1) to O(n) and enabling denial-of-service. This has happened in production; the 2011 HashDoS attack affected PHP, Java, Python, and Ruby web frameworks simultaneously. Mitigations include randomized hash seeds (Python 3.3+, Ruby), using a cryptographic hash (SipHash), or switching to a balanced tree on collision (Java 8+ HashMap).
- Bonus, Situation 4: Keys are small integers in a known range. If keys are integers from 0 to 10,000, a simple array `arr[key] = value` is faster (no hashing, no collision resolution, cache-friendly memory layout) and uses less memory. Example: counting character frequencies; use `int[26]` or `int[128]`, not a HashMap.
- You mentioned Java 8 converts chains to trees. Does Python have a similar defense? (Answer: Python uses a different strategy — SipHash as its hash function since Python 3.4, with a random seed per process. This makes it computationally infeasible for an attacker to craft colliding keys without knowing the seed. Python also uses open addressing, so there are no chains to degrade — but clustering under many collisions still degrades performance.)
- In a system design interview, when would you choose Redis (essentially a remote HashMap) versus a B-tree database like PostgreSQL? (Answer: Redis when you need sub-millisecond key-value lookups with simple access patterns — session stores, rate limiters, caches. PostgreSQL when you need range queries, joins, transactions, or durable storage with complex query patterns. The HashMap vs. tree trade-off scales up to the infrastructure level.)
Q9: You are debugging a service and notice that HashMap operations that should be O(1) are taking 50ms. What could be going wrong and how do you investigate?
- Cause 1: Hash collisions creating long chains. If many keys hash to the same bucket, lookups degrade from O(1) to O(n). Check your key distribution. In Java, add logging to measure bucket chain lengths. Common culprit: a custom `hashCode()` that returns the same value for many objects (e.g., `return 0;` or hashing only on a low-cardinality field like `status`).
- Cause 2: Frequent resizing. If the HashMap is initialized with a small capacity and receives millions of insertions, it resizes repeatedly (doubling and rehashing everything). Each resize is O(n). If you know the approximate size upfront, initialize with `new HashMap<>(expectedSize * 4/3)` to avoid resizing entirely.
- Cause 3: GC pauses from HashMap churn. If you are creating and discarding millions of HashMap entries, the garbage collector is doing massive work. In Java, each `put` allocates a `Node` object, and each resize creates a new array and orphans the old one. Profile with GC logs (`-verbose:gc`) and look for long stop-the-world pauses during HashMap operations.
- Cause 4: Expensive `hashCode()` or `equals()`. If your key is a complex object with a slow `hashCode()` (e.g., hashing a large byte array or a deeply nested object), every single lookup triggers this computation. Profile the `hashCode()` method directly. Fix: cache the hash code in the object or use a cheaper hash.
- Cause 5: Thread-safety issues. If a `HashMap` (not `ConcurrentHashMap`) is accessed by multiple threads without synchronization, the internal array can get corrupted. In older Java versions, this could cause an infinite loop during `get()` due to a cycle in the chain. In newer versions, it causes data loss or `ConcurrentModificationException`. Check for multi-threaded access to a non-synchronized HashMap.
- Debugging approach: Measure, do not guess. Use a profiler (async-profiler, JFR) to see where time is spent. Check bucket distribution via reflection or custom instrumentation. Look at GC logs. Check thread contention.
Follow-ups:
- How would you detect the infinite-loop bug caused by concurrent HashMap access in production? (Answer: the thread gets stuck in `get()` and the CPU spikes. A thread dump shows the thread spinning in `HashMap.get()` traversing a cycle. The fix: switch to `ConcurrentHashMap` or add synchronization. Prevention: static analysis tools like ErrorProne can flag unsynchronized `HashMap` usage.)
- If you need to store 500 million key-value pairs in memory, what alternatives to a standard HashMap would you consider? (Answer: off-heap solutions like Chronicle Map (Java) or memory-mapped files to avoid GC pressure. Open-addressing tables like Eclipse Collections' primitive maps or Koloboke reduce per-entry overhead. If keys are integers, a flat array is ideal. At this scale, even the choice between chaining and open addressing matters for cache-line utilization.)
Q10: Compare HashMap, TreeMap, and LinkedHashMap. When would you choose each? Give me a real scenario for each.
- HashMap: O(1) average for get/put, no ordering guarantees. Use when you only need fast key-value lookups and do not care about iteration order. Scenario: building a frequency counter for words in a document. You only care about counts, not the order you encounter words.
- TreeMap (Red-Black Tree): O(log n) for get/put, keys maintained in sorted order. Supports `firstKey()`, `lastKey()`, `subMap(from, to)`, `floorKey()`, `ceilingKey()`. Use when you need sorted traversal or range queries. Scenario: implementing a stock price tracker where you need "the closest price at or above $100" and "all prices between $100 and $200"; `ceilingKey(100)` and `subMap(100, 200)` give you this in O(log n).
- LinkedHashMap: O(1) average for get/put, maintains insertion order (or optionally access order). Use when you need HashMap speed but also need predictable iteration order. Scenario: implementing an LRU cache in Java; `LinkedHashMap` with `accessOrder=true` and overriding `removeEldestEntry()` gives you a working LRU cache in about 5 lines. Also useful for serialization where you want JSON keys in a consistent order.
| Feature | HashMap | TreeMap | LinkedHashMap |
|---|---|---|---|
| Get/Put | O(1) avg | O(log n) | O(1) avg |
| Ordering | None | Sorted by key | Insertion or access order |
| Null keys | 1 allowed | Not allowed (comparison fails) | 1 allowed |
| Memory | Lowest | Higher (tree nodes) | Higher (linked list pointers) |
| Best for | Pure lookup | Range queries, sorted | Ordered iteration, LRU |
Follow-ups:
- In Python, `dict` maintains insertion order since 3.7. Does that make it equivalent to Java's LinkedHashMap? (Answer: partially. Python's `dict` preserves insertion order, similar to LinkedHashMap's default mode. But it does NOT support access-order mode, so you cannot use it as an LRU cache directly. For that, use `collections.OrderedDict`, which has `move_to_end()`.)
- What if you need both O(1) lookup by key AND sorted iteration? Can you have both? (Answer: not in a single data structure. You would maintain both a HashMap and a TreeMap (or a sorted list), updating both on insert/delete. This is a common pattern in order books for trading systems: a HashMap for O(1) price lookup and a TreeMap for sorted price levels. The trade-off is double the memory and insert cost.)
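A minimal sketch of the `OrderedDict`-based LRU mentioned in the first follow-up (class name is illustrative; in an interview you are usually still expected to build the HashMap-plus-linked-list version by hand):

```python
from collections import OrderedDict

class OrderedDictLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry

cache = OrderedDictLRU(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                  # "a" is now most recently used
cache.put("c", 3)               # evicts "b"
print(list(cache.data.keys()))  # ['a', 'c']
```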
Interview Q&A Rapid-Fire (Rich Format)
The four problems below are the ones interviewers love to use as opening rounds. For each one we give a strong answer framework, a real LeetCode example with full code, three senior follow-ups, common wrong answers, and further reading.
Q1: Two Sum -- walk me through it
- State the brute force first. “Brute force is O(n^2) — nested loop checking every pair. We can do better.”
- Identify the trade-off. “Trading O(n) space for O(n) time using a HashMap of value-to-index.”
- Walk through one pass. For each element, compute `complement = target - num`. If the complement is in the map, return both indices; otherwise store the current element.
- Justify the store-after-check ordering. "We check before storing so the same element cannot be used twice."
- Handle edge cases. Duplicates with target equal to 2 * value work because we check, then store, in that order.
Follow-up: what if there are multiple valid pairs? Change the map to `int -> List<int>` of indices. On a hit, append every stored index for that complement. Time stays O(n), space is O(n) for the result.
Common wrong answers:
- "Sort and use two pointers" when the problem asks for original indices. Sorting destroys index information.
- "Store before checking the complement": breaks the "no same element twice" constraint when target = 2 * num.
- "Brute force is fine, it is O(n^2)": correct but lazy. The interviewer is testing whether you can pivot to the optimized solution.
Q2: Group Anagrams -- design the right key
- Frame as bucketing. “I need to put strings into buckets such that anagrams land in the same bucket. The HashMap key is the bucket label.”
- Compare three key strategies. Sorted string (O(k log k) per string, simple). Frequency tuple (O(k), faster for long strings). Prime-product (O(k), but overflow-prone — mention to show depth, then dismiss).
- Pick the sorted-string approach for clarity unless asked to optimize.
- Code it concisely with `defaultdict(list)`.
- Acknowledge the Unicode caveat. Sorted-string still works; the 26-array count does not.
Implementation notes: `tuple(sorted(s))` is hashable; using a list directly raises `TypeError: unhashable type: 'list'`. For long strings, build a `[0] * 26` count and use `tuple(count)` as the key: O(k) per string instead of O(k log k). At k = 10000, that is 10000 vs roughly 130000 operations per string, about 13x faster. Alternatively, build a `Counter` and convert it to a `frozenset` of (char, count) tuples for the key.
Common wrong answers:
- "Use a list as the key": `TypeError: unhashable type: 'list'`. Lists are mutable, hence unhashable.
- "Compare every pair to detect anagrams": O(n^2 * k log k); fails the size constraint.
- "Use a prime-product key without mentioning overflow": works for short strings but silently fails on long ones.
Q3: Subarray Sum Equals K -- the prefix-sum trick
- State the brute force. "O(n^2): enumerate every subarray and sum it. Too slow for large inputs."
- Introduce the prefix-sum identity. `sum(i..j) = prefix[j] - prefix[i-1]`. So we want pairs (i, j) with `prefix[j] - prefix[i-1] = k`, i.e., `prefix[i-1] = prefix[j] - k`.
- Apply the Two Sum trick. As we compute prefix sums, for each new prefix, ask: how many earlier prefixes equal `prefix - k`? Each such match defines a valid subarray ending here.
- Justify the `{0: 1}` initialization. "Handles subarrays starting at index 0. Without it, we miss every subarray whose prefix at some point equals exactly k."
- Note the negative-numbers gotcha. Sliding window does not work because the running sum is not monotonic; prefix-sum + map handles negatives natively.
Follow-up: for "longest subarray with sum K", change the map from `prefix -> count` to `prefix -> first_index_seen`. When you find `prefix - k` in the map, the length is `current_index - stored_index`. Keep only the first occurrence to maximize length.
Common wrong answers:
- "Use sliding window": breaks on negative numbers.
- "Forget `{0: 1}`": silently under-counts; the most common bug in this problem.
- "Brute force with two nested loops": O(n^2), fails the constraint at n = 20000.
Q4: Longest Consecutive Sequence in O(n)
- State the easy O(n log n) approach. Sort, then scan. Acknowledge it works but the problem requires O(n).
- Introduce the HashSet trick. Convert array to set for O(1) lookups.
- Add the "only-from-head" guard. For each number, only attempt to count if `n - 1` is NOT in the set; this guarantees we only walk each consecutive streak from its leftmost element.
- Walk forward. Once at a head, increment as long as `n + 1` is in the set.
- Prove the O(n) amortization. Each number is visited once in the outer loop and at most once during a forward walk, 2n operations total. Without the guard, we re-walk streaks from every position and the algorithm degrades to O(n^2).
Follow-up: to return the actual sequence rather than its length, track (start, length) of the best streak so far instead of just the length. On a new best, record the start. After the loop, return `range(start, start + length)` or equivalent.
Common wrong answers:
- "Sort and scan": correct but O(n log n), violates the constraint.
- "Walk forward from every element": without the guard, this is O(n^2). This is the most common implementation bug.
- "Use a Union-Find": works but overkill; O(n * alpha(n)), which is essentially O(n) but with much higher constants.