HashMap Pattern

What is HashMap?

HashMap (or HashTable/Dictionary) provides O(1) average-case lookup, insertion, and deletion by mapping keys to values using a hash function. Think of a library where every book has a unique call number: instead of scanning every shelf (O(n)), you look up the call number in the catalog, which tells you the exact shelf and position (O(1)). The hash function is that catalog — it converts a key into an index where the value is stored.

Under the hood, a HashMap maintains an array of “buckets.” The hash function maps each key to a bucket index. When two keys hash to the same bucket (a collision), the bucket stores both entries in a linked list or a balanced tree (Java 8+ converts a bucket's list to a tree once it holds more than 8 entries). This is why the average case is O(1) but the worst case is O(n): if every key collides into one bucket, lookup degrades to scanning a list.

Why HashMap is so important for interviews: it is the single most versatile tool for trading space for time. An O(n^2) brute-force solution that uses a nested loop can almost always be reduced to O(n) by replacing the inner loop with a HashMap lookup. Two Sum is the canonical example.
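A minimal sketch of the bucket-index idea (the bucket count and keys here are made up for illustration; real implementations choose table sizes and collision strategies more carefully):

num_buckets = 8
for key in ["apple", "banana", "cherry"]:
    index = hash(key) % num_buckets   # the "catalog lookup": key -> bucket index
    print(key, "-> bucket", index)
# If two keys land on the same index, that is a collision: the bucket then chains
# both entries (or the table probes to another slot). Note that CPython randomizes
# string hashes per process, so the printed indices vary between runs.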
Quick Recognition: If you need O(1) lookup, frequency counting, or want to trade space for time, HashMap is your friend!

Pattern Recognition Cheatsheet

Reach for a HashMap the moment you see any of these phrases in a problem statement. Train your pattern-matching reflexes on these triggers and you will recognize the right approach for 90 percent of HashMap problems within 30 seconds of reading them.
Keyword / Shape in Problem | What HashMap Trick to Use
"count occurrences", "frequency of", "how many times" | Counter(arr) or dict[k] = dict.get(k,0)+1
"find two numbers that sum to", "two-sum-ish" | Store target - x complement; check on iterate
"is anagram", "permutation of", "rearrange to form" | Compare frequency maps or sorted-string keys
"duplicates", "first repeating", "contains duplicate within k" | seen set or seen[val] = index map
"longest substring with K distinct", "characters at most" | Sliding window plus frequency map
"subarray sum equals K", "count subarrays with sum" | Prefix sum plus {0: 1} initial map
"group X by Y" (anagrams, strings, isomorphic) | Map canonical key to list of items
"first non-repeating", "unique character" | Two-pass: count, then scan for count == 1
"longest consecutive sequence" | Set plus only-start-from-sequence-head guard
"design LRU/LFU cache" | HashMap plus doubly linked list (or two maps)
"isomorphic strings", "word pattern" | Two parallel maps for bijective check
Mental shortcut: if the brute-force is “for each element, search the rest” and you can describe a complement or running aggregate you would store, the HashMap solution is one line of code away.

Reusable Template Code

Ninety percent of the HashMap problems you will see in interviews collapse into one of three skeletons. Memorize these and you will write them in your sleep.
# Skeleton 1: Complement lookup (Two Sum, Pair with Difference, etc.)
def complement_lookup(nums, target):
    seen = {}                          # value -> index (or count)
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i                  # Store AFTER check to avoid using same element
    return []

# Skeleton 2: Frequency counter (Anagram, Top K, First Unique, etc.)
from collections import Counter, defaultdict
def frequency_pattern(arr):
    freq = Counter(arr)                # one-liner build
    # ...query freq[x] in O(1) afterwards
    return freq

# Skeleton 3: Prefix sum plus map (Subarray Sum K, Continuous Subarray, etc.)
def prefix_sum_map(nums, k):
    count, prefix = 0, 0
    sum_count = {0: 1}                 # CRITICAL: handles subarrays starting at index 0
    for num in nums:
        prefix += num
        count += sum_count.get(prefix - k, 0)
        sum_count[prefix] = sum_count.get(prefix, 0) + 1
    return count

Pattern Recognition Checklist

Use HashMap When

  • Need O(1) lookup by key
  • Counting frequencies of elements
  • Finding pairs/complements (like Two Sum)
  • Grouping elements by property
  • Caching/Memoization

Don't Use When

  • Need sorted order (use TreeMap)
  • Range queries required
  • Space is very constrained
  • Keys are unhashable (like lists)

When to Use

Frequency Counting

Count occurrences of elements

Two Sum Variants

Find pairs/groups with specific properties

Caching/Memoization

Store computed results

Grouping/Indexing

Group elements by property

Pattern Variations

1. Two Sum (Classic HashMap)

The problem that launched a thousand interviews. The brute-force approach checks every pair in O(n^2). The HashMap solution turns this into O(n) by storing each number’s index as you scan: for each new number, you compute its complement (target - num) and check if the complement is already in the map.

Why this pattern is so powerful: it replaces the inner loop of “search for the matching element” with an O(1) lookup. This same trick applies to Three Sum (fix one, Two Sum the rest), Four Sum, and many subarray problems. Time: O(n). Space: O(n).

Edge case: What if the same element is used twice? The code handles this correctly because we check the map before inserting the current element, so [3, 3] with target 6 works — we find 3 in the map from index 0 when processing index 1.
def two_sum(nums, target):
    """Find indices of two numbers that sum to target"""
    seen = {}  # value -> index mapping
    
    for i, num in enumerate(nums):
        complement = target - num       # What value do we need?
        if complement in seen:          # Have we seen it before?
            return [seen[complement], i]
        seen[num] = i                   # Store current number for future lookups
    
    return []  # No pair found

2. Frequency Counter

Counting element occurrences is the second most common HashMap pattern (after Two Sum). It appears in Top K Frequent, Valid Anagram, First Unique Character, Majority Element, and dozens more. The pattern: Build a frequency map in O(n), then query it in O(1) per lookup. Python’s Counter and Java’s getOrDefault make this concise. Anagram check insight: Two strings are anagrams if and only if they have the same character frequencies. Instead of sorting both strings (O(n log n)), compare their frequency maps (O(n)). Time: O(n). Space: O(k) where k is the number of distinct elements (at most 26 for lowercase letters).
from collections import Counter

def top_k_frequent(nums, k):
    """Return k most frequent elements"""
    count = Counter(nums)
    return [num for num, _ in count.most_common(k)]

def is_anagram(s, t):
    """Check if t is anagram of s"""
    return Counter(s) == Counter(t)

3. Group By (Anagrams)

Group elements by a computed key. For anagrams, the key is a canonical form that is identical for all anagrams of the same word. Two approaches: (1) sort the characters (simple, O(k log k) per word), or (2) count characters into a tuple (O(k) per word, faster for long strings). Example: “eat”, “tea”, “ate” all sort to “aet”, so they share the same key and land in the same bucket. Time: O(n * k log k) where n is the number of strings and k is the maximum string length. Space: O(n * k) for storing all strings in the map.
from collections import defaultdict

def group_anagrams(strs):
    """Group strings that are anagrams of each other"""
    groups = defaultdict(list)
    
    for s in strs:
        key = tuple(sorted(s))  # or count-based key
        groups[key].append(s)
    
    return list(groups.values())

4. Prefix Sum with HashMap

This is one of the most important combined patterns in interviews. The core idea: sum(nums[i..j]) = prefix[j] - prefix[i-1]. So if you want subarrays with sum exactly k, you need prefix[j] - prefix[i-1] = k, which means prefix[i-1] = prefix[j] - k. Use a HashMap to count how many times each prefix sum has occurred.

Why {0: 1} in the initial map? This handles subarrays starting at index 0. If the prefix sum at some point equals exactly k, then prefix_sum - k = 0, and we need to know that “a prefix sum of 0 occurred once” (representing the empty prefix before the array starts).

Worked example: nums = [1, 2, 3], k = 3. Prefix sums: 1, 3, 6. At prefix_sum=3: 3 - 3 = 0 is in the map (count 1), so one subarray sums to 3 ([1,2]). At prefix_sum=6: 6 - 3 = 3 is in the map (count 1), so another subarray ([3]). Total = 2. Time: O(n). Space: O(n).
def subarray_sum_equals_k(nums, k):
    """Count subarrays with sum exactly k"""
    count = 0
    prefix_sum = 0
    sum_count = {0: 1}  # prefix_sum -> frequency (0 appears once: empty prefix)
    
    for num in nums:
        prefix_sum += num
        
        # How many previous prefix sums equal (prefix_sum - k)?
        # Each one defines a subarray ending here with sum exactly k
        if prefix_sum - k in sum_count:
            count += sum_count[prefix_sum - k]
        
        # Record this prefix sum for future subarrays to reference
        sum_count[prefix_sum] = sum_count.get(prefix_sum, 0) + 1
    
    return count

5. Longest Consecutive Sequence

A beautiful problem that looks like it requires sorting (O(n log n)) but can be solved in O(n) with a HashSet. The trick: for each number, check if it is the start of a sequence (i.e., num - 1 is NOT in the set). If it is a start, count upward as far as possible. If it is not a start, skip it — it will be counted when we process the actual start.

Why this is O(n) and not O(n^2): each number is visited at most twice — once when we iterate through the set, and at most once when we extend a sequence from its start. Numbers that are not sequence starts are skipped immediately.

Common interview mistake: starting a count from every number (not just sequence starts). This turns the solution into O(n^2) because sequences are re-counted from every element. Time: O(n). Space: O(n).
def longest_consecutive(nums):
    """Find length of longest consecutive sequence in O(n)"""
    num_set = set(nums)      # O(1) lookups
    max_length = 0
    
    for num in num_set:
        # KEY INSIGHT: only start counting from the beginning of a sequence
        if num - 1 not in num_set:   # num is a sequence start
            current = num
            length = 1
            
            while current + 1 in num_set:  # Extend the sequence
                current += 1
                length += 1
            
            max_length = max(max_length, length)
    
    return max_length

6. LRU Cache (HashMap + Doubly Linked List)

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
    
    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
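
A quick usage check of the OrderedDict version above (capacity and values chosen arbitrarily). This is the quick interview shorthand; a manual doubly-linked-list version is sketched in the Interview Questions section below.

cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))    # 1 -- key 1 is now most recently used
cache.put(3, 3)        # capacity exceeded: evicts key 2, the least recently used
print(cache.get(2))    # -1 -- key 2 was evicted
print(cache.get(3))    # 3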

Classic Problems

Two Sum. Pattern: store complement, check on each iteration. Key: HashMap trades space for time, O(n^2) to O(n).
Valid Anagram. Pattern: compare character frequency counts. Variants: counting sort (26 letters), Counter equality.
Contains Duplicate. Pattern: HashSet for seen elements. Variants: within k indices, within t value difference.
First Unique Character. Pattern: count frequency, find first with count 1. Key: two passes (count, then find).
Subarray Sum Equals K. Pattern: prefix sum with HashMap. Key: store prefix_sum frequencies, look for (current - k).

Worked LeetCode Problems: Brute Force vs Optimized

The single biggest mistake candidates make is jumping to the optimized HashMap solution without first naming the brute-force baseline. Interviewers want to see the trade-off explicitly: “brute force is O(n^2), HashMap trades O(n) space for O(n) time.” Walk through these four canonical problems and you will have a template for every HashMap question.

LC 1: Two Sum (Easy)

Problem. Given nums and target, return indices of the two numbers such that they add up to target. Each input has exactly one solution; you may not use the same element twice. Brute force (O(n^2) time, O(1) space). Two nested loops. Concede this version, then immediately pivot.
def two_sum_brute(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
Optimized (O(n) time, O(n) space). One pass with a complement map. Store value to index and on each new number ask: “have I seen the complement already?” If yes, return both indices. Checking before storing the current number naturally rejects using the same element twice.
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
Why this beats brute force. You replaced “scan the array for the complement” with “ask a hash table.” That is the entire HashMap thesis in one problem.

LC 49: Group Anagrams (Medium)

Problem. Given an array of strings, group anagrams together. Brute force (O(n^2 * k log k)). Compare every pair of strings by sorting and comparing. Painfully slow at scale. Optimized — sorted key (O(n * k log k)). For each string, sort its characters to get a canonical form. Use that as the HashMap key.
from collections import defaultdict
def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        groups[tuple(sorted(s))].append(s)   # tuple is hashable; list is not
    return list(groups.values())
Optimized — count key (O(n * k)). For lowercase-only strings, build a 26-tuple of letter counts as the key. Faster for long strings.
def group_anagrams_count(strs):
    groups = defaultdict(list)
    for s in strs:
        key = [0] * 26
        for c in s:
            key[ord(c) - ord('a')] += 1
        groups[tuple(key)].append(s)         # tuple, not list, for hashability
    return list(groups.values())
Trade-off note. Sorted key is shorter and easier to write under interview pressure. Count key wins when strings are long. Mention both; pick sorted unless asked to optimize.

LC 242: Valid Anagram (Easy)

Problem. Given two strings s and t, return True if t is an anagram of s. Brute force (O(n log n)). Sort both, compare. Acceptable answer, but you can do better. Optimized counting (O(n) time, O(1) space for fixed alphabet). Use a single 26-element array. Increment for chars in s, decrement for chars in t. If any cell is non-zero at the end, not an anagram.
def is_anagram(s, t):
    if len(s) != len(t):
        return False
    count = [0] * 26
    for cs, ct in zip(s, t):
        count[ord(cs) - ord('a')] += 1
        count[ord(ct) - ord('a')] -= 1
    return all(c == 0 for c in count)
Senior twist. If asked to handle Unicode, the 26-array does not work. Switch to Counter(s) == Counter(t) — still O(n), now general-purpose.

LC 128: Longest Consecutive Sequence (Medium)

Problem. Given an unsorted nums, return the length of the longest consecutive elements sequence. Required: O(n) time. Brute force (O(n log n)). Sort, then scan. Easy but violates the time requirement. Optimized (O(n)). Convert to set. For each number, only start a count if num - 1 is NOT in the set (this guards against re-counting the same sequence). Then walk forward as far as the set allows.
def longest_consecutive(nums):
    num_set = set(nums)
    longest = 0
    for n in num_set:
        if n - 1 not in num_set:        # only start at the head of a streak
            curr, length = n, 1
            while curr + 1 in num_set:
                curr += 1
                length += 1
            longest = max(longest, length)
    return longest
The trap. Without the n - 1 not in num_set guard, you re-walk the same streak from every element and the solution silently becomes O(n^2). This is the single most-asked follow-up: “why is this O(n) and not O(n^2)?”

LC 560: Subarray Sum Equals K (Medium)

Problem. Count the number of contiguous subarrays whose sum equals k. Brute force (O(n^2)). Nested loop summing every subarray. Works, but cannot pass large inputs. Optimized prefix-sum plus map (O(n) time, O(n) space). Track running prefix sum. For each position, count how many earlier prefix sums equal prefix - k — each such match defines a subarray ending at the current index summing to k.
def subarray_sum(nums, k):
    sum_count = {0: 1}                  # the {0: 1} handles subarrays starting at index 0
    prefix = total = 0
    for num in nums:
        prefix += num
        total += sum_count.get(prefix - k, 0)
        sum_count[prefix] = sum_count.get(prefix, 0) + 1
    return total
The {0: 1} interview trap. Initialize with {0: 1} so that whenever the prefix itself equals k, the lookup prefix - k = 0 finds an “empty prefix” and counts it. Forget this and you under-count by missing every subarray that starts at index 0.

Common Mistakes

Caveats and Traps — Bugs Every HashMap Problem Hides:
  1. Mutating a dict while iterating it. Python raises RuntimeError: dictionary changed size during iteration; Java raises ConcurrentModificationException. Iterate over a copy (list(d.items())) or collect changes and apply them after the loop.
  2. Hash-collision attacks on adversarial input. Untrusted user-supplied keys (HTTP params, JSON keys) can be crafted to all hash to one bucket, turning O(1) into O(n) and causing DoS. CPython (3.3+) defends with randomized string-hash seeds and Java 8+ falls back to tree buckets, but a custom hashCode() you write yourself has no such protection.
  3. Tuples as keys only if every element is hashable. (1, 2, frozenset({3})) works; (1, 2, [3]) raises TypeError: unhashable type: 'list'. Convert nested lists to tuples or frozenset before keying.
  4. defaultdict swallows missing keys silently. Accessing dd[key] for a non-existent key creates an empty entry. This silently mutates the dict and produces phantom entries when you only meant to read. Use dd.get(key) or check key in dd for read-only access.
  5. Using == instead of .equals() on Java boxed Integers. Outside the cached range of -128 to 127, Integer == Integer compares references, not values. Always .equals() or unbox via intValue() when comparing HashMap-stored integers.
  6. Forgetting {0: 1} initialization in prefix-sum problems. This single missing line causes more wrong answers in LC 560 than every other bug combined.
  7. Mutating an object that is already a HashMap key. Its hash changes, but the map still indexes it under the old bucket — the entry becomes silently unreachable. Use immutable keys (strings, tuples, frozen records) only.
Solutions and Idioms — Paired with the Warnings Above:
  1. Safe mutation while iterating: snapshot keys with for k in list(d): or build a to_delete list and apply removals afterward. (See the sketch after this list.)
  2. Defend against hash-collision DoS: stick with the standard library’s HashMap (which already randomizes). For custom keys, never write a hashCode() that returns a constant or a low-cardinality field.
  3. Make complex keys hashable: wrap nested lists in tuples; convert sets to frozenset. For dataclasses, mark them @dataclass(frozen=True) so they are hashable.
  4. Avoid silent defaultdict creation: read with .get(key) or if key in dd: — only use bracket access when you intend to insert.
  5. Java integer equality: always map.get(key).equals(value) or map.get(key).intValue() == primitiveValue.
  6. Prefix-sum mantra: “I init my map with {0: 1} because the empty prefix sums to zero” — say it out loud before coding.
  7. Immutable keys only: if you find yourself wanting to mutate a key, you have a design bug. Use a separate value object or replace the entry entirely (del d[old_key]; d[new_key] = v).
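
A minimal sketch of idioms 1 and 3 above (the dict contents and the dataclass are made-up examples):

from dataclasses import dataclass

scores = {"alice": 10, "bob": 0, "carol": 7}
for name in list(scores):          # snapshot the keys before mutating
    if scores[name] == 0:
        del scores[name]           # safe: we iterate the copy, not the dict itself

@dataclass(frozen=True)            # frozen=True makes instances hashable
class Point:
    x: int
    y: int

visited = {Point(0, 0): "start"}   # usable as a key; a mutable list would raise TypeError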

Curated LeetCode Practice List

The classic HashMap problem set, grouped by difficulty and annotated with which sub-pattern each one teaches. Solve them in order within each tier; later problems build on earlier insights.

Easy

# | Problem | Pattern Variant
1 | Two Sum | Complement lookup — the canonical
217 | Contains Duplicate | HashSet for "have I seen this?"
242 | Valid Anagram | Frequency-count comparison
387 | First Unique Character | Two-pass count then scan
383 | Ransom Note | Subset frequency check
219 | Contains Duplicate II | Map with index for k-distance check
290 | Word Pattern | Two-map bijection

Medium

# | Problem | Pattern Variant
49 | Group Anagrams | Canonical-key grouping
560 | Subarray Sum Equals K | Prefix sum plus map
128 | Longest Consecutive Sequence | Set with sequence-head guard
146 | LRU Cache | HashMap plus doubly linked list
347 | Top K Frequent Elements | Counter plus heap or bucket sort
3 | Longest Substring Without Repeating | Sliding window plus map
454 | 4Sum II | Two-map split for n^2 instead of n^4
525 | Contiguous Array | Prefix-sum trick (treat 0 as -1)

Hard

# | Problem | Pattern Variant
76 | Minimum Window Substring | Sliding window plus need-have count
460 | LFU Cache | Three coordinated maps
30 | Substring with Concat of All Words | Multi-map sliding window
41 | First Missing Positive | In-place hashing via index marking
432 | All O(1) Data Structure | HashMap plus doubly linked list of buckets
Avoid These Pitfalls:
  1. Unhashable keys in Python: Lists and sets cannot be dictionary keys because they are mutable. Convert lists to tuples (tuple(sorted(s)) for anagram grouping) or use frozenset. This is the most common runtime error in HashMap-based solutions.
  2. KeyError when accessing missing keys: Use dict.get(key, default) or collections.defaultdict instead of direct access. In Java, map.getOrDefault(key, 0) serves the same purpose.
  3. Forgetting the {0: 1} initialization in prefix sum: Without this, subarrays starting at index 0 that sum to k are not counted. This single missing initialization is responsible for more wrong answers than any other HashMap bug.
  4. Hash collisions degrading performance: In adversarial inputs, all keys can hash to the same bucket, turning O(1) lookups into O(n). Java 8+ mitigates this by converting long chains to balanced trees. In competitive programming, attackers craft inputs to exploit weak hash functions — use a randomized hash if needed.
  5. Mutating objects used as keys: If you change an object after using it as a key, its hash changes but the HashMap still stores it at the old hash’s bucket. The key becomes unreachable — effectively a memory leak and a silent logic bug.
  6. Using HashMap when a fixed-size array suffices: For problems with a small, known key space (26 lowercase letters, 128 ASCII characters), a fixed array is faster and simpler. int[26] beats HashMap<Character, Integer> in both speed and clarity.

Complexity Quick Reference

Operation | Average | Worst Case | Note
Insert | O(1) | O(n) | Worst case with bad hash
Lookup | O(1) | O(n) | Worst case with collisions
Delete | O(1) | O(n) | Worst case with chaining
Iterate | O(n) | O(n) | All keys/values

Interview Problems by Company

Problem | Company | Key Concept
Two Sum | All FAANG | Complement lookup
Valid Anagram | Amazon, Meta | Character frequency
Contains Duplicate | All | HashSet for seen
First Unique Char | Amazon | Frequency count
Ransom Note | Microsoft | Character counting

Interview Tips

Script for interviews:
  1. “I’ll use a HashMap to get O(1) lookup time.”
  2. “I’ll store [key] mapping to [value].”
  3. “This trades O(n) space for O(1) lookup time.”
  4. “For each element, I check if [complement/target] exists.”
  5. “Total time is O(n) with O(n) space.”
Interviewer Says | You Should Think
"Can you do it in O(n)?" | HashMap to avoid nested loop
"Count occurrences" | Frequency counter
"Find pair that sums to…" | Store complements
"Group by property" | Key = property, value = list
"Check if seen before" | HashSet
HashMap: When you need key-value pairs
  • Two Sum (value → index)
  • Frequency counting (element → count)
HashSet: When you only need presence check
  • Contains Duplicate (seen set)
  • Unique elements (deduplication)

Practice Problems

Problem | Difficulty | Link
Two Sum | Easy | LeetCode 1
Valid Anagram | Easy | LeetCode 242
Group Anagrams | Medium | LeetCode 49
Subarray Sum Equals K | Medium | LeetCode 560
LRU Cache | Medium | LeetCode 146

Practice Roadmap


Day 1: Basic Lookup

  • Solve: Two Sum, Contains Duplicate
  • Focus: Complement pattern, seen set

Day 2: Frequency Counting

  • Solve: Valid Anagram, First Unique Character
  • Focus: Counter/frequency map usage

Day 3: Grouping

  • Solve: Group Anagrams, Word Pattern
  • Focus: Key design for grouping

Day 4: Advanced

  • Solve: LRU Cache, Subarray Sum = K
  • Focus: HashMap + other data structures
Interview Tip: When you need O(1) lookup or counting frequencies, HashMap is almost always the answer.

Interview Questions

Deep-dive questions that test real understanding of the HashMap pattern — how hashing works under the hood, when it breaks, and why you reach for it over other structures. Expect interviewers to push past your initial answer.
Difficulty: Foundational
What interviewers are really testing: Whether you understand the mechanism behind the magic, not just the API. Candidates who treat HashMaps as a black box will struggle when asked about collisions, resizing, or performance degradation. This separates rote memorizers from people who can debug hash-related performance issues in production.
Strong answer:
  • The key is passed through a hash function that produces an integer. This integer is then mapped to a bucket index via modulo: index = hash(key) % num_buckets. The value is stored in (or retrieved from) that bucket.
  • Hash function quality matters. A good hash distributes keys uniformly across buckets. A bad one clusters keys into a few buckets, creating chains and degrading to O(n).
  • Collision resolution comes in two flavors: chaining (each bucket holds a linked list or tree of entries) and open addressing (probe to the next empty slot using linear probing, quadratic probing, or double hashing). Java’s HashMap uses chaining with a red-black tree upgrade when a bucket exceeds 8 entries (since Java 8). Python’s dict uses open addressing with a custom probe sequence. (A toy chaining implementation is sketched after this list.)
  • Load factor and resizing. When the ratio of entries to buckets exceeds a threshold (0.75 in Java, ~2/3 in Python), the table is resized — typically doubled — and all entries are rehashed. This is an O(n) operation, but it happens infrequently enough that inserts are O(1) amortized.
  • Example: In Java, "hello".hashCode() returns a 32-bit integer. HashMap then applies a secondary spread function (h ^ (h >>> 16)) to reduce collision rates, then masks the result to the bucket array size.
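To make the mechanics concrete, here is a toy chained hash table (bucket count, load-factor threshold, and method names are this sketch's own choices; production maps add tree fallback, better hashing, and open-addressing variants):

class ToyHashMap:
    """Separate chaining with resizing -- illustrative only."""
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0

    def _index(self, key):
        return hash(key) % len(self.buckets)      # key -> bucket index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                           # key exists: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))                # collision -> the chain grows
        self.size += 1
        if self.size / len(self.buckets) > 0.75:   # load factor exceeded
            self._resize()

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:   # walk the chain
            if k == key:
                return v
        return default

    def _resize(self):
        old = [entry for bucket in self.buckets for entry in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        self.size = 0
        for k, v in old:                           # O(n) rehash, amortized over inserts
            self.put(k, v)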
Red flag answer: “It hashes the key and stores it.” No mention of collision handling, load factor, or what happens during resizing. Another red flag: saying lookup is always O(1) without qualifying “average case” or “amortized.”
Follow-ups:
  1. What happens if two different keys produce the same hash? Walk me through the full lookup path with a collision. (Answer: both entries end up in the same bucket. On lookup, the bucket is traversed — comparing keys via .equals() — until the target key is found or the chain is exhausted.)
  2. Java’s HashMap converts chains to red-black trees at 8 entries. Why 8 specifically, and what does this do to worst-case lookup? (Answer: 8 is chosen because under random hashing the probability of a chain reaching 8 is extremely low — about 1 in 10 million at load factor 0.75. The tree upgrade changes worst-case from O(n) to O(log n) per bucket, defending against pathological hash collisions or hash-flooding attacks.)
Difficulty: Foundational
What interviewers are really testing: Two Sum is not just an easy warm-up — it is the canonical example of the complement lookup pattern and the space-time trade-off. Interviewers want to see if you recognize the underlying principle that applies to dozens of harder problems, not just the specific solution.
Strong answer:
  • The brute force is O(n^2): for each element, scan the rest for its complement. The HashMap approach stores each element as you go and checks if target - current already exists. One pass, O(n) time, O(n) space.
  • Why it matters: Two Sum teaches the complement lookup pattern — instead of searching forward for a match, you store what you have seen and ask “does my complement exist?” This exact pattern appears in: subarray sum equals K (prefix sum complements), 3Sum (reduce to Two Sum), four-sum variants, and even some graph problems.
  • The space-time trade-off is the core lesson. You burn O(n) space to avoid O(n) work per element. This principle — “can I precompute or cache something to avoid repeated work?” — is the single most common HashMap insight across all interview problems.
  • Key detail: You store elements as you iterate, not all at once upfront. This naturally handles the “cannot use the same element twice” constraint and guarantees you return the first valid pair.
def two_sum(nums, target):
    seen = {}  # value -> index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
Red flag answer: Solving it but calling it “just an easy warm-up” without articulating the underlying pattern. Another red flag: sorting first and using two pointers when the problem asks for indices (sorting destroys index information — you would need to track original indices, adding complexity for no benefit).
Follow-ups:
  1. What if the array has duplicates and there are multiple valid pairs? How does your approach handle that? (Answer: it returns the first pair found because we store-then-check in order. If you need all pairs, you would need to store a list of indices per value instead of a single index.)
  2. How would you extend this to Three Sum? Does the HashMap approach still win? (Answer: for 3Sum, you sort and use two pointers for the inner loop, giving O(n^2). Using a HashMap for the inner pair does not improve asymptotic complexity and is harder to deduplicate. The two-pointer approach is cleaner for 3Sum.)
Difficulty: Intermediate
What interviewers are really testing: This is where most candidates who know Two Sum start to struggle. The prefix sum + HashMap combo requires understanding a mathematical invariant (prefix[j] - prefix[i] = K), not just pattern matching. The {0: 1} initialization is the #1 bug source — interviewers watch for whether you explain it or just memorize it.
Strong answer:
  • Core insight: If prefix_sum[j] - prefix_sum[i] = K, then the subarray from index i+1 to j sums to K. So for each position j, we need to count how many earlier positions i have prefix_sum[i] = prefix_sum[j] - K.
  • This is Two Sum in disguise. Instead of looking for two numbers that sum to a target, we look for two prefix sums whose difference equals K. The HashMap stores prefix_sum -> count of times we have seen this prefix sum.
  • Why {0: 1} is critical. It handles subarrays that start from index 0. If the prefix sum at position j equals K, then the subarray [0..j] is a valid answer. Without {0: 1}, the lookup for prefix_sum - K = 0 would find nothing, and you would miss every valid subarray starting from the beginning.
  • Works with negative numbers. Unlike sliding window, this technique handles negatives because prefix sums are monotonically computed (no shrink step to invalidate).
  • Example: Array [1, 2, 3], K=3. Prefix sums: 0, 1, 3, 6. At prefix 3, we look for 3 - 3 = 0, which exists once in our map (the initial entry). At prefix 6, we look for 6 - 3 = 3, which also exists once. Total: 2 subarrays ([1,2] and [3]).
Red flag answer: Memorizing the code but not being able to explain why you initialize the map with {0: 1}. Another red flag: trying to use sliding window for this problem (sliding window fails with negative numbers because the sum does not monotonically increase as you expand).
Follow-ups:
  1. What if I change the problem to “longest subarray with sum K” instead of counting? How does your HashMap usage change? (Answer: instead of storing counts, store the first index where each prefix sum was seen. When you find prefix_sum - K in the map, the length is current_index - stored_index. By keeping only the first occurrence, you maximize length. A sketch of this variant follows below.)
  2. What if K = 0? Does the approach still work? (Answer: yes, perfectly. You are looking for prefix_sum[j] == prefix_sum[i], meaning two positions with the same prefix sum. The map naturally counts how many times each prefix sum has appeared.)
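A hedged sketch of the “longest subarray with sum K” variant from follow-up 1 (the function name and interface are this sketch's own assumptions):

def longest_subarray_sum_k(nums, k):
    """Length of the longest subarray summing to exactly k."""
    first_index = {0: -1}                    # prefix sum 0 occurs before the array starts
    prefix = 0
    best = 0
    for i, num in enumerate(nums):
        prefix += num
        if prefix - k in first_index:
            best = max(best, i - first_index[prefix - k])
        if prefix not in first_index:        # keep only the FIRST occurrence to maximize length
            first_index[prefix] = i
    return best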
Difficulty: Senior
What interviewers are really testing: LRU Cache is the classic “combine two data structures” design question. It tests whether you understand that no single data structure gives you O(1) for both access-by-key and recency-ordering, and whether you can compose them cleanly. This is also a test of your ability to manage pointer/reference manipulation without bugs.
Strong answer:
  • The problem has two O(1) requirements that conflict. You need O(1) key lookup (HashMap territory) AND O(1) insertion/removal at arbitrary positions plus O(1) access to the least-recently-used item (doubly linked list territory). Neither structure alone satisfies both.
  • HashMap stores key -> node_reference. This gives O(1) lookup to find any node instantly. The doubly linked list maintains recency order: most-recently-used at the tail, least-recently-used at the head. Doubly linked because you need O(1) removal of a node from its current position (you need the prev pointer).
  • On get(key): Look up the node via the HashMap (O(1)), remove it from its current position in the list (O(1) with prev/next pointers), re-insert it at the tail (O(1)). Return the value.
  • On put(key, value): If the key exists, update its value and move it to the tail. If not, create a new node, add it to the HashMap and the tail of the list. If capacity is exceeded, remove the head node from both the list and the HashMap. Critical detail: The list node must store the key so that when evicting from the list, you know which HashMap entry to delete.
  • Sentinel nodes trick. Use dummy head and tail nodes to avoid null checks on every insert/remove. This eliminates four edge-case branches and halves the number of bugs in implementation.
# Why the node stores BOTH key and value:
# When evicting the LRU node, you need the key to delete from the HashMap
class Node:
    def __init__(self, key, value):
        self.key = key      # Needed for HashMap eviction
        self.value = value
        self.prev = None
        self.next = None
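
A fuller sketch that builds on that Node, using dummy head/tail sentinels as described above (one reasonable layout; helper names like _remove and _add_to_tail are this sketch's own choices):

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                      # key -> Node
        self.head = Node(0, 0)             # dummy head (LRU side)
        self.tail = Node(0, 0)             # dummy tail (MRU side)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):               # unlink a node in O(1)
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_tail(self, node):          # insert just before the dummy tail
        node.prev = self.tail.prev
        node.next = self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)                 # refresh recency
        self._add_to_tail(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])
        node = Node(key, value)
        self.map[key] = node
        self._add_to_tail(node)
        if len(self.map) > self.capacity:
            lru = self.head.next           # least recently used sits after the dummy head
            self._remove(lru)
            del self.map[lru.key]          # the key stored in the node enables this delete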
Red flag answer: Using Python’s OrderedDict or Java’s LinkedHashMap and calling it done. That is not what the interviewer is asking — they want to see you build the data structure. Another red flag: forgetting to store the key in the node (you will not be able to evict from the HashMap).
Follow-ups:
  1. How would you make this thread-safe? What is the concurrency bottleneck? (Answer: naive approach is a global lock on every get/put, but reads massively outnumber writes in most cache workloads. A striped lock approach partitions the HashMap into segments, each with its own lock. The linked list is the harder part — Java’s ConcurrentLinkedDeque uses CAS operations, but maintaining strict LRU ordering under contention is expensive. Production caches like Caffeine use a write buffer to batch list updates.)
  2. What changes if I ask for an LFU (Least Frequently Used) cache instead? (Answer: LFU needs a third structure: a frequency-to-list mapping. You maintain a HashMap from frequency -> doubly-linked-list of nodes at that frequency, plus a pointer to the minimum frequency. On access, move the node from its current frequency list to the next. This is significantly more complex — three interacting data structures instead of two.)
Difficulty: Senior
What interviewers are really testing: Whether you can adapt the classic Two Sum HashMap pattern to a constrained environment. This tests your understanding of space-time trade-offs when space is not unlimited, and whether you know approximate data structures like Bloom filters. It also tests your ability to define and negotiate requirements when a problem is ambiguous.
Strong answer:
  • First, clarify the requirements. “Are we checking for any pair ever seen in the entire stream, or only within a recent window? Can we tolerate false positives? These choices radically change the solution.”
  • Bounded window approach. If you only need to check within the last N elements, use a HashMap of size N with a sliding window. When the window slides, evict the oldest entry. This is O(1) per element and O(N) space — manageable. (Sketched after this list.)
  • Approximate approach with Bloom filters. If you need the entire history but cannot store everything, use a Bloom filter. For each new element x, query the filter for target - x. If it says yes, probably a pair exists (with configurable false positive rate). Then insert x into the filter. Space drops from O(n) to O(m) where m is the filter size — you can handle billions of elements in a few MB. Trade-off: you get false positives but never false negatives.
  • Probabilistic data structures matter here. A Bloom filter with 1% false positive rate uses about 10 bits per element. For 1 billion elements, that is ~1.2 GB instead of ~8 GB for a full HashMap.
  • If exact answers are required with infinite stream: You need to relax another constraint — either bound the window, shard across machines (distributed HashMap), or use external storage (LSM tree / sorted file with binary search).
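A minimal sketch of the bounded-window variant (the window size, generator interface, and function name are assumptions for illustration):

from collections import deque

def has_recent_pair(stream, target, window=1000):
    """Yield True/False per element: does it pair with anything in the last `window` items?"""
    seen = {}                  # value -> count within the current window
    order = deque()            # values in arrival order, for eviction
    for x in stream:
        hit = seen.get(target - x, 0) > 0   # check before adding x, so x never pairs with itself
        order.append(x)
        seen[x] = seen.get(x, 0) + 1
        if len(order) > window:             # window slides: evict the oldest value
            old = order.popleft()
            seen[old] -= 1
            if seen[old] == 0:
                del seen[old]
        yield hit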
Red flag answer: “Just use a HashMap” without addressing the infinite/memory constraint. Another red flag: not asking clarifying questions about false positive tolerance or window bounds — jumping straight into a solution for an ambiguous problem shows a lack of senior-level thinking.
Follow-ups:
  1. What is the false positive rate of a Bloom filter and how do you tune it? (Answer: for m bits, k hash functions, and n inserted elements, the false positive rate is approximately (1 - e^(-kn/m))^k. You typically choose k = (m/n) * ln(2) to minimize it. Doubling the bit array halves the false positive rate roughly.)
  2. Could you use Count-Min Sketch instead of a Bloom filter here? Why or why not? (Answer: Count-Min Sketch estimates frequency counts, not membership. It would tell you approximately how many times target - x has appeared, which gives you more information than a Bloom filter but uses more space per element. Overkill for a simple “exists?” check, but useful if you also need to know how many valid pairs exist.)
Difficulty: Intermediate
What interviewers are really testing: Key design is one of the most underrated HashMap skills. This problem has at least three valid key strategies, each with different time/space trade-offs. Interviewers want to see if you can analyze and compare approaches rather than just defaulting to the first one that works.
Strong answer:
  • Approach 1: Sorted string as key. Sort each string alphabetically and use the sorted version as the key. "eat" and "tea" both become "aet". Time per string: O(k log k) where k is string length. Simple and readable.
  • Approach 2: Character frequency tuple as key. Count character frequencies and create a tuple like (1,0,0,...,1,0,...,1,...) for 26 letters. "eat" becomes (1,0,0,0,1,0,...,1,0,...). Time per string: O(k) — faster than sorting for long strings. But the key is larger in memory (26 integers vs. k characters).
  • Approach 3: Prime number product. Assign each letter a unique prime number (a=2, b=3, c=5, …) and multiply them. Anagrams produce the same product by the fundamental theorem of arithmetic. O(k) time. Gotcha: integer overflow for long strings. "z" maps to the 26th prime (101), and a 20-character string could overflow a 64-bit integer. (A small sketch follows this list.)
  • Which to choose? For interview problems with short strings (k under 20), sorting is cleanest and least error-prone. For production with long strings, frequency counting wins. The prime approach is clever but fragile — mention it to show depth, then explain why you would not use it in practice.
  • Overall complexity: O(n * k log k) for sorting approach, O(n * k) for frequency counting, where n is the number of strings.
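A small sketch of the prime-product key from approach 3. Python integers never overflow, so the gotcha above applies to fixed-width types (Java long, C++ int64); the sketch just shows the idea for lowercase ASCII input:

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
          53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]   # one prime per letter a..z

def prime_key(s):
    key = 1
    for c in s:
        key *= PRIMES[ord(c) - ord('a')]   # same multiset of letters -> same product
    return key

assert prime_key("eat") == prime_key("tea") == prime_key("ate")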
Red flag answer: Only knowing the sorting approach and not considering alternatives. Using a mutable list as a dictionary key in Python (lists are unhashable — you must convert to a tuple). Not being able to explain why the sorted-string approach is correct (relies on the fact that anagrams are permutations).
Follow-ups:
  1. What if strings contain Unicode characters, not just lowercase English letters? How does each approach change? (Answer: sorting still works perfectly. Frequency counting now needs a HashMap instead of a 26-element array, which increases overhead. The prime approach becomes impractical because you would need a unique prime for each Unicode code point.)
  2. If you had 10 million strings with average length 100, which approach would you use and why? (Answer: frequency counting with a tuple key. At length 100, sorting costs 100 * log(100) ~ 660 operations per string. Frequency counting costs exactly 100. At 10 million strings, that savings is significant — roughly 5.6 billion operations saved. Memory-wise both are comparable.)
Difficulty: Intermediate
What interviewers are really testing: Amortized analysis. Many candidates see a loop inside a loop and say O(n^2). The key insight is that the inner loop only fires for sequence starts, and each element is visited at most twice total. If you cannot explain this, you cannot reason about the complexity of most real-world algorithms.
Strong answer:
  • The trick is the if num - 1 not in num_set guard. This ensures the inner while loop only runs when num is the start of a consecutive sequence. If num - 1 exists, we skip it entirely because some earlier start will cover it.
  • Amortized argument: Every element in the array is touched at most twice: once in the outer for loop, and at most once by the while loop (as part of extending some sequence). No element is “re-visited” by multiple sequences because the guard ensures each element belongs to exactly one sequence expansion.
  • Concrete example: Array [100, 4, 200, 1, 3, 2]. The set is {1, 2, 3, 4, 100, 200}. Only 1, 100, and 200 pass the guard (0, 99, 199 are not in the set). The while loop runs: 4 times for the sequence starting at 1 (1->2->3->4), 1 time for 100, 1 time for 200. Total inner iterations: 6 = n. Total work: O(n) + O(n) = O(n).
  • This is the same amortized reasoning used for: two-pointer techniques (each pointer moves at most n times total), sliding window (each element added/removed once), and Union-Find with path compression.
for num in num_set:
    if num - 1 not in num_set:  # THIS is the key -- only start from sequence beginnings
        current = num
        length = 1
        while current + 1 in num_set:
            current += 1
            length += 1
Red flag answer: Saying the time complexity is O(n^2) because of the nested loop. Alternatively, saying it is O(n) but not being able to explain why — just asserting it because “the solution says so.” Both show a lack of understanding of amortized analysis.
Follow-ups:
  1. Could you solve this with sorting instead? What would the complexity be and why might you still prefer the HashSet approach? (Answer: sorting gives O(n log n) time and O(1) extra space (or O(n) with a new sorted array). The HashSet approach is O(n) time but O(n) space. You prefer HashSet when n is large and you can afford the memory, or when the data is streaming in and you cannot sort it.)
  2. What if the array contains duplicates? Does your solution still work? (Answer: yes, because we use a HashSet which automatically deduplicates. Duplicate values do not create duplicate sequence starts or extend sequences incorrectly. This is actually a subtle advantage of converting to a set first.)
Difficulty: Senior
What interviewers are really testing: Judgment. Every junior engineer reaches for a HashMap. Senior engineers know its limitations. This question tests whether you understand the boundaries of the tool, not just its strengths. It also reveals whether you have hit real-world performance problems caused by HashMap misuse.
Strong answer:
  • Situation 1: You need sorted/ordered traversal. If you need to iterate keys in order, find the closest key, or answer range queries like “all keys between 5 and 10”, a HashMap is the wrong choice. Use a TreeMap/BST (O(log n) ops but ordered) or a sorted array with binary search. Example: building a leaderboard where you need “find all players ranked 50-100” — a TreeMap gives you subMap(50, 100) in O(log n + k), while a HashMap would require sorting all keys every time. (A Python stand-in is sketched after this list.)
  • Situation 2: Memory-constrained environments or very large datasets. HashMaps have significant overhead: each entry in Java’s HashMap allocates a Node object (32+ bytes of overhead beyond the key/value). For 100 million integer pairs, a HashMap uses ~3.2 GB while a flat array or open-addressing table uses ~800 MB. In embedded systems, caches, or when your dataset approaches available RAM, this 4x overhead matters. Example: mobile apps caching user data — an ArrayMap (Android) or flat arrays are often better.
  • Situation 3: Hash-flooding / adversarial input. If an attacker can control your keys (e.g., HTTP request parameters, user-supplied data), they can craft keys that all hash to the same bucket, degrading O(1) to O(n) and enabling denial-of-service. This has happened in production — the 2011 HashDoS attack affected PHP, Java, Python, and Ruby web frameworks simultaneously. Mitigations include randomized hash seeds (Python 3.3+, Ruby), using a cryptographic hash (SipHash), or switching to a balanced tree on collision (Java 8+ HashMap).
  • Bonus — Situation 4: Keys are small integers in a known range. If keys are integers from 0 to 10,000, a simple array arr[key] = value is faster (no hashing, no collision resolution, cache-friendly memory layout) and uses less memory. Example: counting character frequencies — use int[26] or int[128], not a HashMap.
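Python has no built-in TreeMap, so as a rough stand-in for the range-query scenario in situation 1, one option is to pair a dict with a separately maintained sorted key list via bisect (insertion becomes O(n) because of the list insert, which is exactly the cost a real balanced tree avoids):

import bisect

values = {}            # key -> value, O(1) lookup
sorted_keys = []       # kept sorted for range queries

def insert(key, value):
    if key not in values:
        bisect.insort(sorted_keys, key)        # O(n) list insert keeps the keys ordered
    values[key] = value

def range_query(lo, hi):
    """All (key, value) pairs with lo <= key <= hi, in key order."""
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return [(k, values[k]) for k in sorted_keys[i:j]]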
Red flag answer: “I always use HashMap, it is always O(1).” Not mentioning hash collisions, memory overhead, or ordering limitations. Another red flag: not knowing about HashDoS or the adversarial input problem — this is a common production concern.
Follow-ups:
  1. You mentioned Java 8 converts chains to trees. Does Python have a similar defense? (Answer: Python uses a different strategy — SipHash as its hash function since Python 3.4, with a random seed per process. This makes it computationally infeasible for an attacker to craft colliding keys without knowing the seed. Python also uses open addressing, so there are no chains to degrade — but clustering under many collisions still degrades performance.)
  2. In a system design interview, when would you choose Redis (essentially a remote HashMap) versus a B-tree database like PostgreSQL? (Answer: Redis when you need sub-millisecond key-value lookups with simple access patterns — session stores, rate limiters, caches. PostgreSQL when you need range queries, joins, transactions, or durable storage with complex query patterns. The HashMap vs. tree trade-off scales up to the infrastructure level.)
Difficulty: Senior
What interviewers are really testing: Production debugging instincts. This is not an algorithm question — it is a “have you actually operated systems that use HashMaps at scale?” question. It tests whether you can go from a symptom (slow lookups) to root causes systematically, and whether you know the common failure modes that do not show up in textbooks.
Strong answer:
  • Cause 1: Hash collisions creating long chains. If many keys hash to the same bucket, lookups degrade from O(1) to O(n). Check your key distribution. In Java, add logging to measure bucket chain lengths. Common culprit: a custom hashCode() that returns the same value for many objects (e.g., return 0; or hashing only on a low-cardinality field like status).
  • Cause 2: Frequent resizing. If the HashMap is initialized with a small capacity and receives millions of insertions, it resizes repeatedly (doubling and rehashing everything). Each resize is O(n). If you know the approximate size upfront, initialize with new HashMap<>(expectedSize * 4/3) to avoid resizing entirely.
  • Cause 3: GC pauses from HashMap churn. If you are creating and discarding millions of HashMap entries, the garbage collector is doing massive work. In Java, each put allocates a Node object, and each resize creates a new array and orphans the old one. Profile with GC logs (-verbose:gc) and look for long stop-the-world pauses during HashMap operations.
  • Cause 4: Expensive hashCode() or equals(). If your key is a complex object with a slow hashCode() (e.g., hashing a large byte array or a deeply nested object), every single lookup triggers this computation. Profile the hashCode() method directly. Fix: cache the hash code in the object or use a cheaper hash.
  • Cause 5: Thread-safety issues. If a HashMap (not ConcurrentHashMap) is accessed by multiple threads without synchronization, the internal array can get corrupted. In older Java versions, this could cause an infinite loop during get() due to a cycle in the chain. In newer versions, it causes data loss or ConcurrentModificationException. Check for multi-threaded access to a non-synchronized HashMap.
  • Debugging approach: Measure, do not guess. Use a profiler (async-profiler, JFR) to see where time is spent. Check bucket distribution via reflection or custom instrumentation. Look at GC logs. Check thread contention.
Red flag answer: Only mentioning hash collisions and stopping there. Not mentioning GC, resizing, or thread-safety. Another red flag: suggesting “just switch to ConcurrentHashMap” without diagnosing the root cause first.
Follow-ups:
  1. How would you detect the infinite-loop bug caused by concurrent HashMap access in production? (Answer: the thread gets stuck in get() and the CPU spikes. A thread dump shows the thread spinning in HashMap.get() traversing a cycle. The fix: switch to ConcurrentHashMap or add synchronization. Prevention: static analysis tools like ErrorProne can flag unsynchronized HashMap usage.)
  2. If you need to store 500 million key-value pairs in memory, what alternatives to a standard HashMap would you consider? (Answer: off-heap solutions like Chronicle Map (Java) or memory-mapped files to avoid GC pressure. Open-addressing tables like Eclipse Collections’ primitive maps or Koloboke reduce per-entry overhead. If keys are integers, a flat array is ideal. At this scale, even the choice between chaining and open addressing matters for cache-line utilization.)
Difficulty: Intermediate
What interviewers are really testing: Whether you understand that “map” is an interface with multiple implementations, each optimized for different access patterns. This tests breadth of knowledge and — more importantly — the ability to select the right tool based on requirements rather than defaulting to the most common one.
Strong answer:
  • HashMap: O(1) average for get/put, no ordering guarantees. Use when you only need fast key-value lookups and do not care about iteration order. Scenario: building a frequency counter for words in a document. You only care about counts, not the order you encounter words.
  • TreeMap (Red-Black Tree): O(log n) for get/put, keys maintained in sorted order. Supports firstKey(), lastKey(), subMap(from, to), floorKey(), ceilingKey(). Use when you need sorted traversal or range queries. Scenario: implementing a stock price tracker where you need “find the closest price to $150” or “list all prices between $100 and $200” — ceilingKey(150) and subMap(100, 200) give you this in O(log n).
  • LinkedHashMap: O(1) average for get/put, maintains insertion order (or optionally access order). Use when you need HashMap speed but also need predictable iteration order. Scenario: implementing an LRU cache in Java — LinkedHashMap with accessOrder=true and overriding removeEldestEntry() gives you a working LRU cache in about 5 lines. Also useful for serialization where you want JSON keys in a consistent order.
Feature | HashMap | TreeMap | LinkedHashMap
Get/Put | O(1) avg | O(log n) | O(1) avg
Ordering | None | Sorted by key | Insertion or access order
Null keys | 1 allowed | Not allowed (comparison fails) | 1 allowed
Memory | Lowest | Higher (tree nodes) | Higher (linked list pointers)
Best for | Pure lookup | Range queries, sorted | Ordered iteration, LRU
Red flag answer: Not knowing TreeMap or LinkedHashMap exist. Saying “always use HashMap because it is fastest” without considering ordered access patterns. Another red flag: confusing TreeMap’s sorted-by-key with LinkedHashMap’s insertion-order.
Follow-ups:
  1. In Python, dict maintains insertion order since 3.7. Does that make it equivalent to Java’s LinkedHashMap? (Answer: partially. Python’s dict preserves insertion order, similar to LinkedHashMap’s default mode. But it does NOT support access-order mode, so you cannot use it as an LRU cache directly. For that, use collections.OrderedDict which has move_to_end().)
  2. What if you need both O(1) lookup by key AND sorted iteration? Can you have both? (Answer: not in a single data structure. You would maintain both a HashMap and a TreeMap (or a sorted list), updating both on insert/delete. This is a common pattern in order books for trading systems — a HashMap for O(1) price lookup and a TreeMap for sorted price levels. The trade-off is double the memory and insert cost.)

Interview Q&A Rapid-Fire (Rich Format)

The four problems below are the ones interviewers love to use as opening rounds. For each one we give a strong answer framework, a real LeetCode example with full code, three senior follow-ups, and common wrong answers.
Strong Answer Framework:
  1. State the brute force first. “Brute force is O(n^2) — nested loop checking every pair. We can do better.”
  2. Identify the trade-off. “Trading O(n) space for O(n) time using a HashMap of value-to-index.”
  3. Walk through one pass. For each element, compute complement = target - num. If complement is in the map, return both indices; otherwise store current.
  4. Justify the store-after-check ordering. “We check before storing so the same element cannot be used twice.”
  5. Handle edge cases. Duplicates with target equal to 2 * value work because we check the map before storing the current element, so an element never matches itself.
Real LeetCode Example — LC 1: Two Sum
def two_sum(nums, target):
    seen = {}                         # value -> index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i                 # store AFTER check
    return []
Time O(n), space O(n). One pass.
Senior Follow-up 1: “What if the input is sorted?” — Switch to two pointers (one at each end, move based on sum vs target). Now O(n) time and O(1) space — strictly better. This is LC 167 (Two Sum II - Input Array Is Sorted). Mention sorting destroys index info, so if the question asks for original indices, you would need to track them.
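A quick sketch of that sorted-input variant (returning 1-based indices per LC 167's convention):

def two_sum_sorted(numbers, target):
    lo, hi = 0, len(numbers) - 1
    while lo < hi:
        s = numbers[lo] + numbers[hi]
        if s == target:
            return [lo + 1, hi + 1]        # LC 167 expects 1-based indices
        elif s < target:
            lo += 1                        # need a bigger sum
        else:
            hi -= 1                        # need a smaller sum
    return []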
Senior Follow-up 2: “What if duplicates are allowed and you need ALL pairs?” — Switch the map’s value type from int to List<int> of indices. On a hit, append every stored index for that complement. Time stays O(n), space is O(n) for the result.
Senior Follow-up 3: “What if the input is an infinite stream?” — You cannot store everything. Options: (a) sliding window of size N for a “recent pair” check; (b) Bloom filter for approximate membership at constant memory cost; (c) shard across machines with consistent hashing. Discussing trade-offs (false positives vs memory bound) is the senior signal.
Common Wrong Answers:
  • “Sort and use two pointers” when the problem asks for original indices. Sorting destroys index information.
  • “Store before checking the complement” — breaks the “no same element twice” constraint when target = 2 * num.
  • “Brute force is fine, it is O(n^2)” — correct but lazy. The interviewer is testing whether you can pivot to the optimized solution.
Strong Answer Framework:
  1. Frame as bucketing. “I need to put strings into buckets such that anagrams land in the same bucket. The HashMap key is the bucket label.”
  2. Compare three key strategies. Sorted string (O(k log k) per string, simple). Frequency tuple (O(k), faster for long strings). Prime-product (O(k), but overflow-prone — mention to show depth, then dismiss).
  3. Pick the sorted-string approach for clarity unless asked to optimize.
  4. Code it concisely with defaultdict(list).
  5. Acknowledge the Unicode caveat. Sorted-string still works; 26-array does not.
Real LeetCode Example — LC 49: Group Anagrams
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))            # tuple is hashable; list is not
        groups[key].append(s)
    return list(groups.values())
Time O(n * k log k), space O(n * k). The tuple(sorted(s)) is hashable — using a list directly raises TypeError: unhashable type: 'list'.
Senior Follow-up 1: “What if strings are very long, say 10000 characters each?” — Switch to the frequency-tuple key: build a [0] * 26 count, then tuple(count). O(k) per string instead of O(k log k). At k = 10000, that is 10000 vs 130000 operations per string — 13x faster.
Senior Follow-up 2: “What about Unicode (CJK characters, emoji)?” — The 26-array approach fails. Sorted-string still works (Python sorts code points). For frequency, use a Counter and convert to a frozenset of (char, count) tuples for the key.
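A small sketch of that Unicode-safe frequency key (the helper name is made up):

from collections import Counter

def unicode_anagram_key(s):
    # Works for any alphabet; a frozenset of (char, count) pairs is hashable, a Counter is not
    return frozenset(Counter(s).items())

assert unicode_anagram_key("héllo") == unicode_anagram_key("olléh")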
Senior Follow-up 3: “How would you parallelize this for a 100M-string corpus?” — Map-reduce. Map step: each worker computes the canonical key for its strings. Shuffle by key. Reduce step: workers receive all strings sharing a key and emit them together. Hadoop’s word-count generalization.
Common Wrong Answers:
  • “Use a list as the key” — TypeError: unhashable type: 'list'. Lists are mutable, hence unhashable.
  • “Compare every pair to detect anagrams” — O(n^2 * k log k); fails the size constraint.
  • “Use a prime-product key without mentioning overflow” — works for short strings but silently fails on long ones.
Strong Answer Framework:
  1. State the brute force. “O(n^2) — enumerate every subarray and sum it. Too slow for large inputs.”
  2. Introduce the prefix-sum identity. sum(i..j) = prefix[j] - prefix[i-1]. So we want pairs (i, j) with prefix[j] - prefix[i-1] = k, i.e., prefix[i-1] = prefix[j] - k.
  3. Apply the Two Sum trick. As we compute prefix sums, for each new prefix, ask: how many earlier prefixes equal prefix - k? Each such match defines a valid subarray ending here.
  4. Justify the {0: 1} initialization. “Handles subarrays starting at index 0. Without it, we miss every subarray whose prefix at some point equals exactly k.”
  5. Note the negative-numbers gotcha. Sliding window does not work because the running sum is not monotonic; prefix-sum + map handles negatives natively.
Real LeetCode Example — LC 560: Subarray Sum Equals K
def subarray_sum(nums, k):
    sum_count = {0: 1}                    # CRITICAL: handles prefix == k
    prefix = total = 0
    for num in nums:
        prefix += num
        total += sum_count.get(prefix - k, 0)
        sum_count[prefix] = sum_count.get(prefix, 0) + 1
    return total
Time O(n), space O(n).
Senior Follow-up 1: “Now find the LONGEST subarray with sum exactly K instead of counting.” — Change the map from prefix -> count to prefix -> first_index_seen. When you find prefix - k in the map, the length is current_index - stored_index. Keep only the first occurrence to maximize length.
Senior Follow-up 2: “What if K = 0?” — Works perfectly. You are now looking for two positions with the same prefix sum (their difference is 0). Common variant: LC 525 (Contiguous Array) — find longest subarray with equal 0s and 1s, by treating 0 as -1 and looking for prefix-sum 0.
Senior Follow-up 3: “Why does sliding window fail here, even though K = 0 sounds simple?” — Sliding window assumes the running sum is monotonic with respect to expanding/shrinking. With negatives, growing the window can decrease the sum. Prefix-sum + map sidesteps this because we are not “shrinking” — we are looking up across all earlier positions.
Common Wrong Answers:
  • “Use sliding window” — breaks on negative numbers.
  • “Forget {0: 1}” — silently under-counts; the most common bug in this problem.
  • “Brute force with two nested loops” — O(n^2), fails the constraint at n = 20000.
Strong Answer Framework:
  1. State the easy O(n log n) approach. Sort, then scan. Acknowledge it works but the problem requires O(n).
  2. Introduce the HashSet trick. Convert array to set for O(1) lookups.
  3. Add the “only-from-head” guard. For each number, only attempt to count if n - 1 is NOT in the set — this guarantees we only walk each consecutive streak from its leftmost element.
  4. Walk forward. Once at a head, increment as long as n + 1 is in the set.
  5. Prove the O(n) amortization. Each number is visited once in the outer loop and at most once during a forward walk — 2n operations total. Without the guard, we re-walk streaks from every position and the algorithm degrades to O(n^2).
Real LeetCode Example — LC 128: Longest Consecutive Sequence
def longest_consecutive(nums):
    num_set = set(nums)
    longest = 0
    for n in num_set:
        if n - 1 not in num_set:          # only start at streak heads
            curr, length = n, 1
            while curr + 1 in num_set:
                curr += 1
                length += 1
            longest = max(longest, length)
    return longest
Time O(n), space O(n). The guard is what makes this O(n) and not O(n^2).
Senior Follow-up 1: “Why is this O(n) and not O(n^2) — there is a while loop inside a for loop?” — Amortized analysis. The while loop only runs from streak heads. Each element is visited at most twice across the whole run: once in the outer loop and at most once during a forward walk. Total operations bounded by 2n.
Senior Follow-up 2: “What if the array contains duplicates?” — Works automatically because we convert to a set first, which deduplicates. Duplicates do not extend or break streaks — they are simply absorbed.
Senior Follow-up 3: “How would you adapt this if you need the actual sequence, not just the length?” — Track (start, length) of the best streak so far instead of just the length. On a new best, record the start. After the loop, return range(start, start + length) or equivalent.
Common Wrong Answers:
  • “Sort and scan” — correct but O(n log n), violates the constraint.
  • “Walk forward from every element” — without the guard, this is O(n^2). This is the most common implementation bug.
  • “Use a Union-Find” — works but overkill; O(n * alpha(n)) which is essentially O(n) but with much higher constants.