What is a Trie?
Trie (pronounced “try,” derived from retrieval) is a tree-like data structure for storing strings: each edge represents a character, and each path from the root to a marked node spells a complete word. Think of it like a phone book organized as a decision tree: to look up “cat,” you follow the ‘c’ branch, then the ‘a’ branch, then the ‘t’ branch. Every word that starts with “ca” shares the same first two edges, so shared prefixes are stored only once.

This prefix-sharing property is what makes tries special. A HashMap can tell you “does this word exist?” in O(1), but it cannot efficiently answer “what words start with ‘pre’?” — that requires scanning every key. A trie answers prefix queries in O(m), where m is the prefix length, regardless of how many words are stored. This makes tries the backbone of autocomplete systems, spell checkers, IP routing tables, and any application where prefix-based lookup matters.

How it differs from a binary search tree: in a BST, each node stores a complete key and you compare whole keys at each level. In a trie, each node stores a single character (or edge label), and the path from root to node is the key. This means all keys with a common prefix share the same initial path, saving both time and space.

Pattern Recognition Cheatsheet (LeetCode Triggers)
Pattern Recognition Checklist
Use Trie When
- Prefix-based search (autocomplete)
- Word dictionary with wildcard search
- Word search in 2D grid (multiple words)
- Longest common prefix
- Spell checking
- IP routing (binary tries)
Don't Use When
- Exact match only (use HashSet)
- Single word search (use HashMap)
- Memory is very constrained
- Small dictionary (overhead not worth it)
Trie Complexity Reference
| Operation | Time | Space |
|---|---|---|
| Insert | O(m) | O(m) per word |
| Search | O(m) | - |
| Prefix search | O(m) | - |
| Delete | O(m) | - |
| Build trie | O(n * m) | O(n * m * 26) worst case |
When to Use
- Autocomplete
- Spell Checker
- Word Search
- Longest Prefix
Basic Implementation
Every trie node holds two things: a map of children (character to child node) and a boolean is_end marking whether a complete word terminates at this node. The root is an empty node representing the empty prefix. The distinction between search (requires is_end == True) and startsWith (only requires the path to exist) is the single most common source of bugs in trie implementations.
Children representation trade-off: A HashMap gives O(1) average lookup and handles arbitrary character sets (Unicode). A fixed array of 26 gives O(1) worst-case and better cache locality, but wastes memory when the trie is sparse. In an interview, HashMap is the safer default unless you are told the charset is lowercase English only.
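To make the trade-off concrete, here is a minimal sketch of the two node layouts (class names are illustrative, not from any standard library):

```python
class ArrayTrieNode:
    """Fixed-alphabet node: one slot per lowercase letter, even if unused."""
    def __init__(self):
        self.children = [None] * 26   # index 0 = 'a', ..., 25 = 'z'
        self.is_end = False

class MapTrieNode:
    """Sparse node: allocates an entry only for characters actually seen."""
    def __init__(self):
        self.children = {}            # char -> MapTrieNode
        self.is_end = False
```

The array node always pays for 26 slots; the map node pays per-entry overhead but only for children that exist.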
Trie Templates You Should Have Memorized
Three templates that cover roughly 80 percent of trie LeetCode. Type them from blank in under three minutes each.
Template 1: TrieNode + Insert + Search + StartsWith
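A minimal sketch of Template 1 in Python (hash-map children, iterative walks; method names follow LC 208):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # does a complete word end here?

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:        # O(m)
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:        # O(m); must check is_end
        node = self._walk(word)
        return node is not None and node.is_end

    def startsWith(self, prefix: str) -> bool:  # O(m); path existing is enough
        return self._walk(prefix) is not None

    def _walk(self, s: str):
        """Follow s through the trie; return the final node or None."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node
```

After inserting only "apple", search("app") is False while startsWith("app") is True — the is_end distinction described above.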
The reference implementation. LC 208 in canonical form.
Template 2: Count Words and Count Prefixes
A common variant (LC 1804, LC 1023): each node stores the number of words ending here and the number of words passing through here. Decrement on erase, query in O(m). pass_count tells you how many inserted words have the current node on their path, which is exactly the answer for startsWith-with-count queries.
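A sketch of the counting variant; field names (end_count, pass_count) and methods are modeled on LC 1804's API and are assumptions, not prescribed code:

```python
class CountNode:
    def __init__(self):
        self.children = {}
        self.end_count = 0    # words ending exactly at this node
        self.pass_count = 0   # words whose path runs through this node

class CountTrie:
    def __init__(self):
        self.root = CountNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, CountNode())
            node.pass_count += 1
        node.end_count += 1

    def countWordsEqualTo(self, word: str) -> int:
        node = self._walk(word)
        return node.end_count if node else 0

    def countWordsStartingWith(self, prefix: str) -> int:
        node = self._walk(prefix)
        return node.pass_count if node else 0

    def erase(self, word: str) -> None:
        """Assumes word was previously inserted; decrement along the path."""
        node = self.root
        for ch in word:
            node = node.children[ch]
            node.pass_count -= 1
        node.end_count -= 1

    def _walk(self, s: str):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node
```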
Template 3: Wildcard Search (DFS Branching on ’.’)
The LC 211 pattern. The standard trie walk is iterative; the moment you hit a wildcard you must recurse and branch. "a.b.c" is fast; "....." is not.
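A sketch of the wildcard search using nested dicts with a "$" end marker — a common compact trie representation, chosen here as an assumption rather than anything prescribed above:

```python
class WordDictionary:
    def __init__(self):
        self.root = {}                      # nested dicts; "$" marks word end

    def addWord(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True

    def search(self, word: str) -> bool:
        def dfs(node, i):
            if i == len(word):
                return "$" in node          # must be a complete word
            ch = word[i]
            if ch == ".":
                # Wildcard: branch into every real child at this position.
                return any(dfs(child, i + 1)
                           for key, child in node.items() if key != "$")
            return ch in node and dfs(node[ch], i + 1)
        return dfs(self.root, 0)
```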
Pattern Variations
1. Word Search II (Backtracking + Trie)
This is the classic hard trie problem and a FAANG favorite. The brute-force approach searches for each word independently via DFS, costing O(words * rows * cols * 4^L). A trie lets you search for all words simultaneously in a single DFS pass: at each cell, you check whether the current path exists in the trie and prune immediately if not. This can be 100x faster for large word lists. Critical optimization: After finding a word, set is_end = False on that node (or prune the node entirely if childless). Without this, the same word can be found from multiple starting cells, and the trie never shrinks.
Complexity: O(rows * cols * 4^L) worst case, but trie pruning makes real performance dramatically better.
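A sketch of the combined trie + backtracking DFS described above, including the dedup-by-removal optimization. The nested-dict trie with "$" storing the full word is an implementation choice, not the only way:

```python
def find_words(board, words):
    # Build the trie; store the full word at its end node for O(1) retrieval.
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w

    rows, cols = len(board), len(board[0])
    found = []

    def dfs(r, c, node):
        ch = board[r][c]
        child = node.get(ch)
        if child is None:
            return                      # prune: no word continues this way
        word = child.pop("$", None)     # popping "$" also deduplicates
        if word is not None:
            found.append(word)
        board[r][c] = "#"               # mark cell as visited on this path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != "#":
                dfs(nr, nc, child)
        board[r][c] = ch                # restore the cell
        if not child:
            node.pop(ch)                # prune exhausted trie branch

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, root)
    return found
```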
2. Autocomplete System
A system design + data structures hybrid. The trie handles prefix matching while frequency counts enable ranking. The stateful input() method accumulates characters across calls — each keystroke narrows the prefix and refines suggestions. When the user types '#', the current sentence is saved.
Performance insight: The naive approach (DFS on every keystroke) is fine for small dictionaries. For production-scale autocomplete (millions of sentences), precompute and store the top-3 results at each trie node during insertion, making each input() call O(prefix_length) instead of O(prefix_length + all_sentences_under_prefix).
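As a baseline, the naive per-keystroke approach that the text calls fine for small dictionaries can be sketched without a trie at all — a flat map scanned with startswith; a trie would replace the linear scan for large dictionaries. Class and method names are assumptions modeled on LC 642:

```python
import heapq

class Autocomplete:
    def __init__(self, sentences, times):
        self.counts = dict(zip(sentences, times))  # sentence -> frequency
        self.prefix = ""                           # state across input() calls

    def input(self, c):
        if c == "#":                               # commit the typed sentence
            self.counts[self.prefix] = self.counts.get(self.prefix, 0) + 1
            self.prefix = ""
            return []
        self.prefix += c
        # Rank: frequency descending, then lexicographic ascending.
        matches = [(-cnt, s) for s, cnt in self.counts.items()
                   if s.startswith(self.prefix)]
        return [s for _, s in heapq.nsmallest(3, matches)]
```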
Classic Problems
| Problem | Pattern | Key Insight |
|---|---|---|
| Implement Trie | Basic ops | HashMap children |
| Word Search II | Trie + DFS | Prune with trie |
| Autocomplete | Trie + Heap | Top-k by frequency |
| Longest Word | BFS/DFS | Check prefixes exist |
| Word Dictionary | Trie + ’.’ | DFS for wildcards |
Interview Problems by Company
| Problem | Company | Key Concept |
|---|---|---|
| Implement Trie | All FAANG | Basic trie ops |
| Word Dictionary | Meta, Amazon | Wildcard DFS |
| Replace Words | Amazon | Prefix replacement |
| Map Sum Pairs | | Trie with values |
Interview Tips
How to Explain Your Approach
- “I’ll use a Trie for efficient prefix-based operations.”
- “Each node has children (HashMap or array of 26).”
- “I’ll mark end-of-word with a boolean flag.”
- “Insert and search are both O(word length).”
- “For Word Search II, I’ll combine Trie with DFS backtracking.”
Trie Node Choices
| Implementation | Pros | Cons |
|---|---|---|
| HashMap children | Flexible, sparse-friendly | Slightly slower |
| Array[26] | Faster access | Wastes memory for sparse |
| With count | Supports frequency | Extra space |
| With word | Store full word at end | Easier retrieval |
Common Optimizations
- Remove word from trie after finding (Word Search II)
- Store word at end node instead of just boolean
- Track count for frequency-based problems
- Prune empty branches after deletion
Practice Problems
- Implement Trie
- Word Search II
- Design Search Autocomplete
- Replace Words
Practice Roadmap
Day 1: Basic Trie
- Solve: Implement Trie, Longest Common Prefix
- Focus: Insert, search, startsWith operations
Day 2: Pattern Matching
- Solve: Word Dictionary, Replace Words
- Focus: Wildcard handling, prefix matching
Worked LeetCode Problems
Five canonical trie problems, each walked through end to end. The point is not just the answer but the decision — why a trie, what each node carries, what the termination rule is, and which trap to avoid.
LC 208: Implement Trie (Prefix Tree) (Medium)
Decision: Standard trie with hash-map children. Use Template 1 verbatim. The interview hook is articulating the difference between search (requires is_end) and startsWith (path exists, regardless of is_end).
See Template 1 above for the full code. A single subtle bug to avoid: do not return True from search just because the walk succeeded — you must check is_end on the final node. About 30 percent of candidates miss this on first attempt.
LC 211: Design Add and Search Words (Medium)
Decision: Standard trie + recursive DFS for the wildcard. Template 3 verbatim. See Template 3 above. The interview signal here is recognizing that an iterative walk works for plain queries but the wildcard forces recursion the moment you hit a dot. State the worst-case complexity out loud: O(26^m) for a pure-dot query of length m. Mention that real tries are sparse, so the practical cost is closer to O(b^m) for a branching factor b around 4 to 5 on English text.
LC 212: Word Search II (Hard)
Decision: Build a trie from the word list, then DFS from every cell, walking the trie in lockstep with the grid. The trie prunes catastrophic branches.
LC 14: Longest Common Prefix (Easy)
Decision: A trie is overkill compared to the textbook two-pointer or vertical-scan solution, but it is the right answer when the same set of words is queried repeatedly or when the words arrive as a stream.
LC 648: Replace Words (Medium)
Decision: Build a trie from the dictionary roots. For each word in the sentence, walk the trie and return at the first is_end — that is the shortest root.
Stopping at the first is_end is the whole interview signal. Compare with longest prefix matching (used in IP routing), where you continue past is_end and remember the most recent end seen — different traversal rule, same trie.
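The stop-at-first-is_end walk can be sketched as follows (nested-dict trie; storing the root string at its end node is a convenience, not a requirement):

```python
def replace_words(roots, sentence):
    # Build a trie of dictionary roots; "$" holds the root string itself.
    trie = {}
    for root in roots:
        node = trie
        for ch in root:
            node = node.setdefault(ch, {})
        node["$"] = root

    def shortest_root(word):
        node = trie
        for ch in word:
            if ch not in node:
                return word              # no root is a prefix of this word
            node = node[ch]
            if "$" in node:
                return node["$"]         # first is_end seen = shortest root
        return word

    return " ".join(shortest_root(w) for w in sentence.split())
```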
Caveats and Traps
Curated LeetCode List
A focused trie grind list. Each problem teaches a distinct lesson; do not skip the “why this is here” column.
| LC # | Problem | Difficulty | Why this problem |
|---|---|---|---|
| 14 | Longest Common Prefix | Easy | Trie is overkill but conceptually clean; learn the two-pointer alternative |
| 720 | Longest Word in Dictionary | Easy | Trie + DFS with the “every prefix must be a word” constraint |
| 1268 | Search Suggestions System | Medium | Real-world autocomplete; combine trie with sorted top-3 storage |
| 208 | Implement Trie | Medium | The reference implementation; if you cannot do this in 5 minutes you are not ready |
| 211 | Design Add and Search Words | Medium | Wildcard DFS; the canonical “trie + recursion” pattern |
| 648 | Replace Words | Medium | "Stop at first is_end" — shortest-root traversal |
| 677 | Map Sum Pairs | Medium | Trie that aggregates an integer along the path; foundation for autocomplete-with-counts |
| 1023 | Camelcase Matching | Medium | Wildcard-like matching with a constraint; tests adapting the basic walk |
| 212 | Word Search II | Hard | The trie + DFS classic; pruning and dedup are senior signals |
| 642 | Design Search Autocomplete System | Hard | Stateful input across calls; ranking by frequency then lexicographic |
| 472 | Concatenated Words | Hard | Trie + DP; recognize that every prefix that exists is a recursion candidate |
| 421 | Maximum XOR of Two Numbers | Hard | Binary trie; greedy “take the opposite bit” traversal |
| 1707 | Maximum XOR With an Element From Array | Hard | Offline trie + sorting; queries answered as you build the trie |
| 745 | Prefix and Suffix Search | Hard | Two tries (one for prefix, one for reversed suffix) joined by a weight |
| 336 | Palindrome Pairs | Hard | Trie + palindrome checks; the most fiddly but rewarding trie problem |
Interview Questions
Q1: Implement a Trie that supports insert, search, and startsWith. Walk me through the data structure and each operation.
- Node structure. “Each TrieNode holds a children map (character to child node) and an is_end boolean marking whether a complete word terminates here. The root is an empty node — it represents the empty prefix.”
- Insert O(m). Walk character by character. If the child for the current character does not exist, create a new node. Move to the child. After the last character, set is_end = True. You are literally building a path through the tree for each word.
- Search vs startsWith — the critical distinction. Both traverse the trie identically. The difference: search returns True only if the final node has is_end = True (complete word), while startsWith returns True as long as the traversal does not fail (the prefix path exists, regardless of is_end). This is the single most common bug — candidates forget the is_end check in search and treat every prefix as a word.
- Children representation trade-off. HashMap gives O(1) average lookup and handles arbitrary character sets (Unicode, not just a-z). A fixed array of 26 gives O(1) worst-case and better cache locality, but wastes memory when the trie is sparse. In an interview, HashMap is safer unless told the charset is lowercase English only.
- Complexity: O(m) time for all three operations where m is the word/prefix length. O(n * m) total space in the worst case where n is the number of words and m is the average length, though shared prefixes reduce this significantly in practice.
Common mistakes: forgetting the is_end check in the node. Confusing search and startsWith behavior. Using recursion for insert/search when iteration is simpler and avoids stack depth issues. Not being able to articulate the space trade-off between HashMap and array children.
Follow-ups:
- How would you implement delete for a trie? What makes it trickier than insert? (Answer: you need to remove nodes bottom-up, but only if they have no other children and are not the end of another word. A recursive approach works naturally: delete the character path, and on the way back up, remove nodes that are now childless and not word-endings. The tricky part is not accidentally deleting shared prefixes.)
- If your trie stores 10 million English words, roughly how much memory does it use with HashMap children versus array-of-26 children? (Answer: with HashMap, each node is roughly 50-80 bytes depending on the language runtime — a HashMap object plus overhead. With array-of-26, each node is 26 pointers, which is about 208 bytes on a 64-bit system, but access is faster. For 10M words averaging 8 characters with, say, 30% prefix sharing, you are looking at roughly 56M nodes. HashMap approach: ~3-4 GB. Array approach: ~11 GB. This is why compressed tries and radix trees exist for production use cases.)
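The recursive delete described in the first follow-up can be sketched with a minimal node type (names illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def delete(root, word):
    """Remove word; prune nodes bottom-up only when safe."""
    def _delete(node, i):
        # Returns True if the link to `node` can be removed by its parent.
        if i == len(word):
            if not node.is_end:
                return False             # word was never in the trie
            node.is_end = False
            return len(node.children) == 0
        child = node.children.get(word[i])
        if child is None:
            return False                 # word not present; nothing to do
        if _delete(child, i + 1):
            del node.children[word[i]]
            # Prune this node too only if nothing else depends on it.
            return len(node.children) == 0 and not node.is_end
        return False
    _delete(root, 0)
```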
Q2: Given a list of words, find the longest word that can be built one character at a time by other words in the list. For example, given ['a', 'ap', 'app', 'appl', 'apple', 'bat'], return 'apple' because a -> ap -> app -> appl -> apple are all valid words.
What they're testing: the requirement that every prefix be a complete word (is_end) creates a secondary constraint on top of basic trie traversal.
Strong Answer:
- Key insight. “The answer is the longest word where every single prefix (length 1, 2, …, k) is also a word in the list. This is not just longest common prefix — it is the longest chain of valid words.”
- Approach with Trie + BFS/DFS. Insert all words into a trie. Then do BFS (or DFS) starting from the root, but only visit child nodes where is_end = True. This enforces the constraint that every prefix must be a complete word. Track the longest path found.
- Why BFS is cleaner here. BFS naturally explores shorter words before longer ones. If we process level by level, we can guarantee that when we reach a node, all its prefix ancestors were valid words. Among words of the same length, BFS gives us lexicographic order for free if we process children in sorted order.
- Alternative: sort + HashSet. Sort words by length (then lexicographically). For each word, check if the word minus its last character exists in the set of already-validated words. If yes, add it to the set and update the answer. This is O(n * m * log n) but simpler to code. The trie approach is O(n * m) and more interview-impressive.
- Edge case. If no single-character word exists in the list, the answer is empty — you cannot start the chain.
Follow-ups:
- What if the problem asks for the longest word that can be formed by concatenating other words in the list (like “catdog” from “cat” + “dog”)? How does the approach change? (Answer: this becomes the Concatenated Words problem. You need trie + DP: for each word, check if it can be split into two or more valid words using a DP pass where you try every prefix that exists in the trie and recursively check the suffix.)
- How would you handle it if the word list is streaming in — you do not have all words upfront? (Answer: insert each word as it arrives. After each insertion, re-check whether the newly inserted word extends any existing chain. Maintain a running “longest chain” answer. The trie makes this efficient because you only need to check whether the new word’s prefix of length m-1 was already a valid chain endpoint.)
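The sort + HashSet alternative mentioned above is short enough to sketch in full. It relies on the fact that a proper prefix always sorts lexicographically before its extensions:

```python
def longest_word(words):
    valid = {""}                     # the empty prefix starts every chain
    best = ""
    for w in sorted(words):          # prefixes appear before their extensions
        if w[:-1] in valid:          # every shorter prefix already validated
            valid.add(w)
            if len(w) > len(best):   # strict > keeps the lexicographically
                best = w             # smallest word among equal lengths
    return best
```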
Q3: Design a data structure that supports adding words and searching for words where the search query can contain the dot character '.' which matches any single letter. This is LeetCode 211 -- Word Dictionary.
What they're testing: the '.' wildcard forces you to branch into all children at that position, turning a single O(m) traversal into a DFS/backtracking problem. They want to see if you understand where the complexity changes and whether your recursive thinking is clean.
Strong Answer:
- Insert is unchanged. Standard trie insert — the wildcard only affects search.
- Search with '.' requires branching. When you hit a '.' in the query, you cannot follow a single child. You must try every existing child at that position and return True if any branch leads to a match. This is essentially DFS/backtracking on the trie.
- Implementation. Use a recursive _search(word, index, node) helper. If word[index] is a letter, follow that specific child. If it is '.', iterate over all children and recurse on each. Base case: if index == len(word), return node.is_end.
- Complexity analysis. Worst case for a query like "..." (all dots, length m) is O(26^m) because every path in the trie is explored. In practice, it is much faster because real tries are sparse — most nodes have far fewer than 26 children. For a trie built from an English dictionary, the average branching factor is around 4-5, so the real cost is closer to O(5^m).
- Why not regex? You could compile the pattern to a regex, but that has its own overhead. The trie-based DFS is more direct and avoids regex engine complexity. Plus, the trie gives you prefix pruning for free.
Follow-ups:
- What if the wildcard is '*' meaning zero or more of any character, not just one? How does the complexity change? (Answer: this is dramatically harder. At each '*', you must either match zero characters (skip the star) or consume one character (advance in the trie while keeping the star active). This is similar to regex matching, and with memoization the complexity becomes O(n * m) where n is the trie depth and m is the pattern length.)
- If you are getting millions of search queries per second, most without any wildcards, how would you optimize? (Answer: check upfront whether the query contains any dots. If not, use the standard O(m) trie search — no recursion needed. Only fall back to the DFS path for wildcard queries. This handles the common case in O(m) and reserves the expensive path for rare queries. You could also maintain a parallel HashMap for exact-match lookups.)
Q4: You are given a 2D board of characters and a list of words. Find all words that exist in the board where each word can be formed by sequentially adjacent cells (up, down, left, right) without reusing a cell. This is Word Search II -- LeetCode 212.
- Why trie beats brute force. “Searching each word independently with DFS is O(words * rows * cols * 4^L) where L is word length. With a trie, we do DFS once from each cell and simultaneously check all words. The trie lets us prune: if no word in the dictionary starts with the current path, we backtrack immediately.”
- Algorithm. Build a trie from the word list. For each cell (r, c), start a DFS passing the trie root. At each step, check if board[r][c] exists as a child in the current trie node. If not, prune (return immediately). If yes, move to that child node. If the child has is_end = True, add the word to results.
- Visited tracking. Mark the current cell (e.g., replace with '#') before recursing into neighbors, then restore it after. This prevents reusing a cell in the same path.
- Critical optimization: remove found words. After finding a word, set is_end = False on that node to avoid adding duplicates. Better yet, prune the trie node if it has no remaining children — this progressively shrinks the trie and speeds up subsequent DFS calls. Without this, the same word can be found from multiple starting cells and the trie never gets smaller.
- Complexity: O(rows * cols * 4^L) worst case, but trie pruning makes real performance dramatically better. Space is O(n * m) for the trie where n = number of words, m = average length.
Follow-ups:
- Your solution is too slow for a 50x50 board with 30,000 words. What optimizations can you apply beyond basic trie pruning? (Answer: (a) Store the complete word at the end node instead of reconstructing it from the path — saves string concatenation overhead. (b) Prune leaf nodes from the trie as words are found, so future DFS calls hit dead ends faster. (c) Start DFS only from cells whose character exists as a direct child of the root — skip cells that cannot begin any word. (d) If many words share long prefixes, the trie already amortizes the search, but removing exhausted branches is the biggest win.)
- How would you parallelize this across multiple cores? (Answer: each starting cell (r, c) is an independent DFS, so you could partition the grid into regions and run DFS from each region’s starting cells in parallel. The tricky part is the shared trie — concurrent reads are fine, but if you are pruning found words, you need synchronization. A practical approach: use a thread-local results set and merge at the end, with a concurrent set for deduplication.)
Q5: You are building an autocomplete system. Given a set of sentences with their historical frequencies, implement a function that takes one character at a time and returns the top 3 most frequent sentences matching the current prefix. When the user types '#', the current sentence is saved.
- Data structure. “Trie where each leaf (end-of-sentence node) stores a frequency count. The trie handles prefix matching, and we use sorting or a heap for top-K retrieval.”
- Input handling. Maintain a current_input string across calls. Each input(c) appends c to current_input and searches the trie. If c == '#', insert current_input into the trie with count +1 and reset.
- Search. Traverse the trie to the node matching current_input. Then DFS from that node to collect all sentence endings. Sort by frequency descending, then lexicographically ascending. Return top 3.
- Optimization: precomputed top-3 at each node. Instead of DFS on every keystroke, store the top-3 results at every node in the trie during insertion. Each input() call becomes an O(1) lookup at the prefix node. The trade-off: insertion becomes more expensive because you must update top-3 lists along the insertion path, and memory increases. But reads become O(prefix_length) instead of O(prefix_length + total_sentences_under_prefix).
- Complexity without optimization: O(p + n * m * log(n * m)) per input call where p = prefix length and n * m = total characters under the prefix node. With precomputed top-3: O(p) per call, O(p * n) for inserts.
Follow-ups:
- In production, your trie has 100 million sentences. DFS on every keystroke is too slow. How do you optimize beyond precomputed top-3? (Answer: store a min-heap of top-K results at each internal node, updated during insertion. Alternatively, use a “hot path” approach: for the first 1-2 characters, precompute results and cache them (the branching is small). For deeper prefixes, the number of matching sentences drops rapidly, making DFS fast. You can also shard the trie by first character across multiple machines.)
- How would you handle the trie growing too large for memory? (Answer: several approaches. (a) LRU eviction of infrequently accessed subtrees. (b) Move to a disk-backed trie using memory-mapped files or a B-trie structure. (c) Use a compressed trie (radix tree/Patricia trie) where single-child chains are collapsed into a single node with a string label instead of one node per character — this alone can reduce node count by 50-70% for natural language data.)
Q6: What is the difference between a standard trie, a compressed trie (radix tree / Patricia trie), and a ternary search tree? When would you pick each one?
- Standard Trie. One node per character, children as HashMap or array. Simple to implement, O(m) operations. Downside: massive memory overhead when many nodes have only one child (think long unique suffixes like “internationalization” — 20 nodes for a single chain). Good for: interview problems, small dictionaries, when implementation simplicity matters.
- Compressed Trie (Radix Tree / Patricia Trie). Merges single-child chains into one node with a string label. “inter” + “national” + “ization” becomes three nodes instead of twenty. Same O(m) time complexity but dramatically less memory — typically 50-70% fewer nodes for natural language data. Used in: Linux kernel routing tables (radix_tree), Redis key storage, HTTP routers (many Go routers like httprouter use radix trees). Downside: insertion requires splitting nodes, which adds code complexity.
- Ternary Search Tree (TST). Each node has three children: less-than, equal-to, greater-than (like a BST at each level). Much more memory-efficient than a standard trie because you only allocate nodes for characters that actually exist — no empty slots. O(m * log k) search where k is the alphabet size at each level. Used when the alphabet is large (Unicode) and memory is critical. Downside: slower than a trie for small alphabets; benefits diminish when most of the 26 letters are used at each level.
- Decision framework. Interview or small dataset: standard trie. Production with English text: radix tree. Large or variable alphabet (e.g., URL routing, Unicode): radix tree or TST depending on whether you need prefix traversal (radix) or balanced lookups (TST).
Follow-ups:
- How does a radix tree handle insertion when you need to split an existing compressed edge? Walk me through the mechanics. (Answer: suppose a node has edge label “apple” and you insert “app”. You split the “apple” edge into “app” (which becomes a node with is_end = True) and “le” (a child of that new node pointing to the original terminal node). The split is O(m) and involves creating one new node and adjusting string labels.)
- Why do most HTTP routers use radix trees instead of hash maps for route matching? (Answer: HTTP routes have inherent prefix structure (/api/v1/users, /api/v1/orders). A radix tree shares the /api/v1/ prefix and supports parameterized routes like /users/:id naturally as wildcard nodes. A flat HashMap cannot match parameterized routes, would need separate entries for every possible URL, and cannot support prefix-based middleware attachment.)
Q7: Given an array of integers, find the maximum XOR of any two elements. How would you use a trie for this?
- Key insight. “XOR is maximized when bits differ. For each number, I want to find the number in the array that differs from it at the most significant bits possible. A binary trie lets me make the greedy choice at each bit level.”
- Binary trie construction. Insert each number as a 32-bit (or 64-bit) binary string from the most significant bit to the least significant bit. Each node has at most two children: 0 and 1. This is a trie over a binary alphabet.
- Greedy XOR search. For each number, traverse the trie trying to go the opposite direction at each bit. If the current bit is 1, try to go to the 0-child (and vice versa) because opposite bits produce a 1 in XOR. If the opposite child does not exist, follow the same-bit child. This greedy approach works because higher bits contribute exponentially more to the XOR value.
- Why greedy is optimal. A difference at bit position k contributes 2^k to the XOR. A difference at bit k is always worth more than all possible differences at bits 0 through k-1 combined (2^k > 2^(k-1) + 2^(k-2) + … + 1). So the greedy choice at each level is globally optimal.
- Complexity: O(n * 32) = O(n) time for building the trie and querying each element. O(n * 32) = O(n) space. Dramatically better than the O(n^2) brute force.
Follow-ups:
- What if the problem asks for the maximum XOR of a subarray (not just any two elements)? (Answer: use prefix XOR. XOR(i, j) = prefix[j] XOR prefix[i-1]. So the problem reduces to: given an array of prefix XOR values, find the maximum XOR of any two elements — which is exactly the binary trie problem. Build the trie incrementally as you compute prefix XOR values.)
- What if numbers are being added to the array dynamically and you need to support “find max XOR with existing numbers” queries? (Answer: the binary trie naturally supports this. Insert each new number into the trie, and for each query, traverse the trie greedily. Both insert and query are O(32) = O(1). This makes it an online algorithm, unlike sorting-based approaches.)
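A sketch of the binary trie with the greedy opposite-bit walk described above (32-bit width assumed; nested dicts keyed by bit value):

```python
def find_maximum_xor(nums):
    BITS = 32
    root = {}

    def insert(x):
        node = root
        for i in range(BITS - 1, -1, -1):     # most significant bit first
            node = node.setdefault((x >> i) & 1, {})

    def best_xor(x):
        node, result = root, 0
        for i in range(BITS - 1, -1, -1):
            b = (x >> i) & 1
            want = 1 - b                      # opposite bit maximizes this position
            if want in node:
                result |= (1 << i)
                node = node[want]
            else:
                node = node[b]                # forced to follow the same bit
        return result

    insert(nums[0])
    ans = 0
    for x in nums[1:]:                        # each pair considered exactly once
        ans = max(ans, best_xor(x))
        insert(x)
    return ans
```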
Q8: You have a dictionary of 1 million words and need to replace every word in a sentence with its shortest root from the dictionary. For example, if the dictionary has 'cat' and the sentence has 'cattle', replace 'cattle' with 'cat'. This is LeetCode 648 -- Replace Words.
What they're testing: the key is stopping at the first is_end node — you do not need to find the longest match. This tests whether you understand that trie traversal can be terminated early based on problem requirements.
Strong Answer:
- Key insight. “I need the shortest root that is a prefix of each word. A trie gives me this naturally: insert all dictionary roots, then for each word in the sentence, traverse the trie character by character and stop at the first is_end node.”
- Approach. (1) Build a trie from the dictionary roots. (2) For each word in the sentence, traverse the trie. If at any point the current node has is_end = True, return the path so far (the shortest root). If the traversal fails (character not found), return the original word unchanged.
- Why shortest root? Because we return at the first is_end encountered during traversal. If the dictionary has both “cat” and “cattle”, “cat” is found first (at depth 3), so “cattle” in the sentence becomes “cat”, not “cattle”.
- Alternative without trie. Sort roots by length, and for each word, check if any root is a prefix using startswith(). This is O(d * w * m) where d = dictionary size, w = words in sentence, m = average length. The trie approach is O(d * m + w * m) — much faster.
- Complexity: O(d * m) to build the trie where d = dictionary words, m = average root length. O(w * m) to process the sentence where w = number of words. Total: O((d + w) * m). Space: O(d * m) for the trie.
- What if you wanted the longest root instead of the shortest? How does the traversal change? (Answer: do not stop at the first
is_end. Continue traversing until you can go no further, and track the lastis_endnode you saw. Return the path up to that last end marker. This is also how longest-prefix-match works in IP routing tables.) - If the dictionary is updated frequently (roots added and removed) and sentences arrive as a stream, how do you handle this? (Answer: the trie supports O(m) insertion natively. For deletion, you remove the
is_endflag and optionally prune childless nodes bottom-up. Sentences can be processed as they arrive since each word lookup is independent. The trie serves as a living index that evolves with the dictionary.)
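The approach above fits in a short sketch — a minimal illustration, assuming the dict-of-children node shape used throughout this page (the helper name `shortest_root` is ours):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks the end of a dictionary root

def replace_words(roots, sentence):
    # Build the trie from the dictionary roots: O(d * m).
    root = TrieNode()
    for r in roots:
        node = root
        for ch in r:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def shortest_root(word):
        node = root
        for i, ch in enumerate(word):
            if ch not in node.children:
                return word          # no root is a prefix: keep the word
            node = node.children[ch]
            if node.is_end:
                return word[:i + 1]  # first is_end hit = shortest root
        return word

    return " ".join(shortest_root(w) for w in sentence.split())
```

Because the walk stops at the first end marker, "cattle" collapses to "cat" even when both are in the dictionary.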
Q9: How would you implement a spell checker that suggests corrections for misspelled words? What role does a trie play versus other approaches?
- The trie’s role. “A trie is the backbone for storing the valid dictionary. But spell checking is not just prefix lookup — a misspelled word might have characters inserted, deleted, substituted, or transposed anywhere. The trie helps in two ways: (1) verifying if a word is valid in O(m), and (2) enabling a modified DFS that explores words within a given edit distance.”
- Edit distance on a trie. Run a parallel traversal: walk the trie while tracking a row of the Levenshtein distance DP table. At each trie node, compute the edit distance to the query word so far. If the minimum value in the current DP row exceeds your threshold (e.g., edit distance > 2), prune that entire subtree. This is dramatically faster than computing edit distance against every word in the dictionary.
- Complexity. Naive approach: O(d * m * n) where d = dictionary size, m = average word length, n = query length (one full edit-distance table per word). Trie + DP approach: depends on the threshold, but for threshold = 2 on an English dictionary of 200K words, you typically visit only a few thousand trie nodes instead of computing a table against all 200K words. Empirically 10-100x faster.
- Ranking suggestions. Edit distance alone is not enough. Weight by: (1) word frequency (common words should rank higher), (2) keyboard proximity (typing ‘r’ instead of ‘e’ is more likely than typing ‘z’), (3) phonetic similarity (Soundex or Metaphone). Production spell checkers like those in Google Search combine all of these with machine learning models.
- Alternatives to pure trie. BK-trees organize words by edit distance for efficient range queries. SymSpell precomputes all deletions within a threshold and uses a HashMap for O(1) lookup at the cost of more memory. For production, a hybrid approach is typical.
- SymSpell claims O(1) spell checking. How does it work and what is the trade-off? (Answer: precompute all possible deletions of each dictionary word up to a max edit distance and store them in a HashMap. For a query, generate its deletions and look them up. The trade-off is memory: for 200K words with max edit distance 2, you might store 5-10 million deletion entries. It is fast at query time but expensive to build and memory-hungry.)
- How would you extend your spell checker to handle multi-word corrections, like “teh quikc” to “the quick”? (Answer: process words independently first, then use a language model (n-gram or neural) to re-rank suggestions based on context. “quikc” could correct to “quick” or “quiche” — the preceding word “the” makes “the quick” far more likely. This is why production spell checkers are not pure edit-distance systems.)
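The trie-plus-DP-row idea described above can be sketched as follows — an illustrative fuzzy search, not a production spell checker; the names `build_trie` and `fuzzy_search` are ours:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set to the full word on terminal nodes

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root

def fuzzy_search(root, query, max_dist):
    """Collect (word, distance) pairs within max_dist edits of query."""
    results = []
    # First row of the Levenshtein table: distance from "" to each prefix.
    first_row = list(range(len(query) + 1))

    def dfs(node, ch, prev_row):
        # Compute the next DP row for the trie path ending in ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            insert_cost = row[i - 1] + 1
            delete_cost = prev_row[i] + 1
            replace_cost = prev_row[i - 1] + (query[i - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        if min(row) <= max_dist:  # prune: this subtree can never get cheaper
            for next_ch, child in node.children.items():
                dfs(child, next_ch, row)

    for ch, child in root.children.items():
        dfs(child, ch, first_row)
    return results
```

The `min(row) <= max_dist` check is the subtree pruning: one comparison discards every word below the current node.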
Q10: Compare using a Trie versus a HashMap for prefix-based operations. When is a Trie actually better, and when is a HashMap sufficient or even superior?
- HashMap wins when you only need exact match. If the problem is “does this exact word exist?” a HashMap gives O(1) average lookup. A trie gives O(m). The trie’s per-character traversal with pointer chasing is slower in practice due to poor cache locality. For 200K English words with average length 8, a HashMap lookup touches 1-2 cache lines; a trie traversal touches 8+ nodes scattered in memory.
- Trie wins when you need prefix operations. "Find all words starting with `'pre'`" requires scanning the entire HashMap (O(n * m)), but a trie traverses to the `'pre'` node in O(3) and then a DFS collects only matching words. The asymptotic and practical difference is enormous for large dictionaries.
- Trie wins when you need ordered traversal. A trie naturally supports lexicographic iteration — DFS on the trie yields words in sorted order. A HashMap cannot do this without sorting (O(n log n)).
- Trie wins for shared-prefix space optimization. If your dictionary has 10,000 words all starting with “micro” (microservice, microchip, etc.), the trie stores “micro” once. A HashMap stores it in every key. For highly prefix-redundant data (URLs, file paths, IP addresses), tries can use significantly less total space.
- HashMap wins when memory layout matters. Modern CPUs are cache-line optimized. A HashMap storing strings contiguously is far more cache-friendly than a trie with scattered node allocations. For latency-critical systems, the HashMap’s cache behavior often matters more than the trie’s algorithmic advantage.
- Decision rule. “If the problem only needs exact lookup and no prefix operations, use a HashMap. If it needs prefix search, ordered iteration, or deals with highly prefix-redundant data, use a trie. If you need both, use a trie with a parallel HashMap for exact lookups.”
- Your service needs to serve both exact lookups and prefix-based autocomplete on the same dictionary. How would you structure the storage? (Answer: use both. A HashMap for O(1) exact lookups (used for spell verification, membership checks) and a trie (or radix tree) for prefix-based autocomplete. The memory overhead of maintaining two structures is usually acceptable given the performance benefit. Alternatively, a sorted array with binary search can serve both — O(log n) exact match and O(log n + k) prefix enumeration — as a simpler middle ground.)
- In what scenario would you pick a sorted array over both a trie and a HashMap? (Answer: when the dictionary is static (no inserts/deletes), memory is constrained, and you need both exact and prefix operations. A sorted array supports binary search for exact match O(log n), lower_bound for prefix search O(log n + k), and uses minimal memory with excellent cache locality. The downside is O(n) insertion. Redis uses sorted sets for similar reasons in its internal structures.)
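The sorted-array alternative discussed above is short enough to sketch with Python's `bisect` — illustrative only; it assumes a non-empty prefix whose last character is not the maximal code point:

```python
import bisect

def prefix_range(sorted_words, prefix):
    """Return every word with the given prefix: O(log n + k)."""
    lo = bisect.bisect_left(sorted_words, prefix)
    # Smallest string greater than everything starting with `prefix`:
    # the prefix with its last character bumped by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect.bisect_left(sorted_words, upper)
    return sorted_words[lo:hi]

words = sorted(["micro", "microchip", "microservice", "monolith", "map"])
print(prefix_range(words, "micro"))  # all three "micro*" words
```

Exact match falls out of the same machinery: `words[lo] == query` after one `bisect_left`. This is the "simpler middle ground" for static dictionaries.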
Interview Q&A: Structured Whiteboard Drills
Three high-frequency trie interview questions, each rehearsed with the structure your interviewer wants to hear: framework first, then code, then crisp follow-ups.
Q1: Implement Trie with insert, search, startsWith (LC 208)
- State the node shape. "Each node has a children map (character to child node) and an `is_end` boolean."
- State the root. "The root is an empty node representing the empty prefix. It has no character associated with it."
- Insert. Walk character by character, creating children that do not exist. Set `is_end = True` on the final node.
- Search. Walk character by character. Return false if any character is missing. Return true only if `is_end` is true on the final node.
- StartsWith. Same walk as search, but return true as long as the path exists — do not check `is_end`.
- State complexity. All three operations are O(m) where m is the word length. Space O(N * m) worst case where N is the number of words, much less in practice due to shared prefixes.
- Mention the array-vs-hashmap tradeoff. Hash map for arbitrary character sets and sparse tries; array of 26 for lowercase English with predictable layout and faster access.
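Assembled into code, the steps above look like this — a minimal sketch using the dict-of-children layout; the `_walk` helper is our own factoring, not part of the LC 208 API:

```python
class Trie:
    def __init__(self):
        self.children = {}   # char -> Trie (each node is itself a Trie)
        self.is_end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def _walk(self, s):
        # Follow the path for s; return the final node or None.
        node = self
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end  # the is_end check matters

    def startsWith(self, prefix):
        return self._walk(prefix) is not None
```

Sharing `_walk` between `search` and `startsWith` makes the only difference between them explicit: the `is_end` check.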
- How would you implement `delete(word)`? Recursive bottom-up. For each character along the path, recursively delete from the subtree. After the recursive call returns, check whether the current node has any remaining children and whether it itself is the end of another word. If neither, prune it from its parent. The trick is that deletion of "car" must not break "care" — the 'c', 'a', 'r' nodes are shared. You only delete nodes that are no longer on any other word's path.
- How much memory would this trie use at scale? With `__slots__` and the `is_end` flag, a node can drop to 50 to 60 bytes. With around 30 percent prefix sharing on natural English, 10 million words averaging 8 characters yields roughly 56 million nodes. Total: 3 to 5 GB. This is why production systems use radix trees (collapse single-child chains) or store the dictionary outside memory in a serialized format like a DAWG (Directed Acyclic Word Graph).
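A sketch of that bottom-up delete — it assumes TrieNode-style nodes with `children` and `is_end`; the `insert` and `contains` helpers are included only to make the example self-contained:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end

def delete(root, word):
    """Remove word, pruning nodes that no other word needs."""
    def _delete(node, i):
        # Returns True if `node` can be pruned from its parent.
        if i == len(word):
            node.is_end = False        # clear the end marker
        else:
            child = node.children.get(word[i])
            if child is None:
                return False           # word not present: nothing to do
            if _delete(child, i + 1):
                del node.children[word[i]]  # child serves no other word
        # Prune only if this node ends no word and has no children left.
        return not node.is_end and not node.children
    _delete(root, 0)
```

Deleting "car" from a trie that also holds "care" only clears the `is_end` flag on the shared 'r' node; the path itself survives.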
- "`search` returns true whenever the walk succeeds." Misses the `is_end` check. `search("ca")` would wrongly return true when the trie has only `"cat"`.
- "Use a single hash set of all words for `search` and a separate trie for `startsWith`." Works but doubles the memory and asks the interviewer to evaluate two data structures. The trie alone solves both; that is the point of the data structure.
- "Recursion for insert and search." Works but adds stack depth equal to the word length. Iterative is simpler, faster, and avoids recursion-limit issues for very long inputs.
- LC 1804 Implement Trie II — adds count-words and count-prefixes operations.
- LC 720 Longest Word in Dictionary — the natural follow-up; same trie, BFS traversal.
- Sedgewick, Algorithms (4th ed.), section 5.2 — the textbook treatment of tries with array-of-R children.
Q2: Word Search II -- why trie + DFS beats individual word search (LC 212)
- Establish the brute force. “For each word, do a DFS from every cell. With W words, R rows, C cols, average length L, this is O(W * R * C * 4^L). At W = 30,000 this is hopeless.”
- Identify the wasted work. Two words sharing the prefix `"app"` both walk the same first three cells in their DFS. The DFS work for shared prefixes is duplicated W times.
- Propose the trie. Build a trie from the word list. Now a single DFS from each cell, walking the trie in lockstep with the grid, simultaneously tests all words with the current path as prefix.
- State the pruning rule. When the DFS enters cell `(r, c)`, look up the cell's character in the current trie node's children. If it is missing, return immediately — no word in the dictionary starts with this path. This is the killer optimization.
- State termination. When the trie node has `is_end` (or stores the word), record a hit. Continue the DFS afterwards because longer words may also match through this cell.
- State the senior optimizations. Store the full word at the terminal node (no string concatenation during DFS). Set `word = None` after finding (deduplication). Prune empty trie branches on backtrack so future DFS calls hit dead ends faster.
- State complexity. O(R * C * 4^L) worst case where L is the longest word. The trie does not change the worst case but dramatically reduces constants on real inputs.
- The `word` field at the terminal node. Storing the full word means you push the result with `found.append(nxt.word)` instead of reconstructing from the path — this matters when L is large.
- `nxt.word = None` after finding. Without this, the same word can be added multiple times if it is reachable from multiple starting cells. With it, each word is recorded exactly once.
- Optional empty-branch pruning. `if not nxt.children: node.children.pop(ch, None)`. Once a trie subtree has no remaining words, deleting it means future DFS calls hit a dead end one step earlier. Empirically, this cuts runtime by 40 to 60 percent on dense word lists.
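Put together, the DFS with all three optimizations looks like this — a sketch assuming a `word` field on terminal nodes, as described above:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # full word stored at the terminal node

def find_words(board, words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w

    rows, cols = len(board), len(board[0])
    found = []

    def dfs(r, c, node):
        ch = board[r][c]
        nxt = node.children.get(ch)
        if nxt is None:
            return                      # pruning: no word continues this path
        if nxt.word is not None:
            found.append(nxt.word)      # record hit, keep searching deeper
            nxt.word = None             # dedupe: each word reported once
        board[r][c] = '#'               # mark visited with a sentinel
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
                dfs(nr, nc, nxt)
        board[r][c] = ch                # restore on backtrack
        if not nxt.children:            # empty-branch pruning
            node.children.pop(ch, None)

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, root)
    return found
```

Note that the pruning step destroys the trie as matches are found, so the structure is single-use per board.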
- How would you parallelize this? Each starting cell `(r, c)` spawns an independent DFS. Partition the grid into regions and dispatch a worker per region. The shared trie can be read concurrently; only the pruning step (deleting visited terminal nodes) requires synchronization. The cleanest design: workers append to thread-local result lists, a final merge happens at the end, and pruning is disabled during parallel execution (or done with atomic compare-and-swap on the children map). Speedup is sub-linear because the trie pruning — the biggest single optimization — is harder to apply across threads.
- Variant: searching a general graph instead of a grid. The grid idiom is `board[r][c] = '#'` marking and restoration. Without revisit avoidance, infinite loops are possible if any cycle in the graph is reachable, so you bound the search depth by the longest word in the trie. Critically, the trie pruning becomes even more important because the search space explodes; without pruning the search is effectively unbounded. This variant rarely appears in LeetCode but does appear in word-game search problems.
- "Run a separate DFS per word." Functionally correct but TLEs for any non-trivial dictionary. The whole point of the question is the trie.
- “Store the path string and compare against a hash set of words at each step.” Works but checks every prefix against a hash set instead of pruning when no word starts with the prefix. The trie is strictly stronger because it answers “does any word start with this path” in O(1) per character.
- "Mark cells visited with a separate `visited` matrix instead of mutating the board." Works but doubles the memory and worsens cache behavior. Mutating the board with a sentinel like `'#'` is the standard idiom.
- LC 79 Word Search — the single-word version; useful warmup.
- LC 425 Word Squares — generalizes trie + DFS to a 2D word-grid construction problem.
- The classic Boggle solver write-ups (Knuth, Sedgewick) — foundational for the trie + DFS pattern.
Q3: Autocomplete System -- design considerations (LC 642)
- Restate the problem. "Each `input(c)` call appends a character to the current query and returns the top 3 most frequent historical sentences with that prefix. `'#'` ends the current sentence and adds it to history with frequency 1."
- State the data structure. Trie keyed by character. Each node has children, an `is_end` flag, and a `count` integer. Counts live only on terminal nodes (the end of full sentences).
- State the lookup algorithm. Walk the trie to the current prefix node. From there, DFS the entire subtree to collect every `(sentence, count)` pair. Sort by `(-count, sentence)` and return the first 3.
- State the complexity tradeoff. The naive approach is O(p + m log m) per `input` call, where p is the prefix length and m is the number of sentences under the prefix. For a popular short prefix like `"the"`, m can be huge.
- Propose the production optimization. Store a precomputed top-3 list at each trie node, updated on every insert. Each `input` call becomes O(p) — one walk to the prefix node and a list read. Insert stays O(p) with a small constant factor, because you must update the size-3 lists along the entire insertion path.
- Discuss memory. With 100 million sentences and roughly 5 new trie nodes per sentence on average, you have on the order of 500 million nodes, each carrying a top-3 list of pointers. This requires sharding across machines or a disk-backed trie.
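A sketch of the naive-DFS version with the position cache (the precomputed top-3 optimization is omitted for brevity; class and field names mirror LC 642's interface):

```python
class Node:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 only on terminal nodes (full sentences)

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = Node()
        self.prefix = ""
        self.cur = self.root  # position cache: no re-walk per keystroke
        for s, t in zip(sentences, times):
            self._add(s, t)

    def _add(self, sentence, count):
        node = self.root
        for ch in sentence:
            node = node.children.setdefault(ch, Node())
        node.count += count

    def input(self, c):
        if c == '#':
            self._add(self.prefix, 1)       # commit sentence to history
            self.prefix, self.cur = "", self.root
            return []
        self.prefix += c
        if self.cur is not None:
            self.cur = self.cur.children.get(c)
        if self.cur is None:
            return []                       # dead prefix until next '#'
        # DFS the subtree under the prefix node, collecting every hit.
        hits = []
        def dfs(node, path):
            if node.count:
                hits.append((path, node.count))
            for ch in sorted(node.children):
                dfs(node.children[ch], path + ch)
        dfs(self.cur, self.prefix)
        hits.sort(key=lambda p: (-p[1], p[0]))
        return [s for s, _ in hits[:3]]
```

The `self.cur` field is the cache discussed below: each keystroke extends the walk by one node instead of restarting from the root.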
- The `cur_node` cache is what makes this O(1) per keystroke after the first character. We do not re-walk from the root each time — we track our position in the trie across calls. When the user types a character that does not extend the current path, we set `cur_node = None` and stay there until the next `'#'`.
- How do you scale beyond one machine? (a) Shard the trie by first character — `'a'` goes to shard 1, etc. Each shard handles roughly 1/26th of the volume. (b) Move the trie off-heap — either to a memory-mapped file using a serialized format like SuRF (Succinct Range Filter) or to a disk-backed structure like RocksDB with prefix scans. The top-3 cache lives in memory; the full sentence storage lives on disk. (c) Write-behind for the count updates — batches of insert traffic queued and applied periodically rather than synchronously, since exact real-time top-3 is rarely required.
- "DFS the entire subtree on every keystroke." Correct but slow for popular prefixes. For `"a"` in a million-sentence trie, you DFS hundreds of thousands of nodes per keystroke. Production systems precompute top-K at each node.
- "Use a SQL `LIKE 'prefix%' ORDER BY count DESC LIMIT 3` query." Works for small datasets and is genuinely the right answer when you have a database already and the dataset is under a few million rows. But the question is asking you to design the data structure, not delegate to a database.
- "Maintain a global sorted list of all sentences and binary-search the prefix range." O(log n + k) per query and easy to implement, but updates are O(n) due to shifting. Falls apart for high write throughput.
- LC 1268 Search Suggestions System — the static, simpler version of LC 642.
- LC 1804 Implement Trie II — adds count-words and count-prefixes, a stepping-stone toward autocomplete-with-counts.
- Google “Suggest” engineering talks (e.g., the 2010 talk by Daniel Russell) — production architecture of large-scale autocomplete.