
Trie Pattern

What is a Trie?

Trie (pronounced “try,” derived from retrieval) is a tree-like data structure for storing strings where each edge represents a character and each path from root to a marked node represents a complete word. Think of it like a phone book organized as a decision tree: to look up “cat,” you go to the ‘c’ branch, then the ‘a’ branch, then the ‘t’ branch. Every word that starts with “ca” shares the same first two edges, so shared prefixes are stored only once.

This prefix-sharing property is what makes tries special. A HashMap can tell you “does this word exist?” in O(1), but it cannot efficiently answer “what words start with ‘pre’?” — that requires scanning every key. A trie answers prefix queries in O(m) where m is the prefix length, regardless of how many words are stored. This makes tries the backbone of autocomplete systems, spell checkers, IP routing tables, and any application where prefix-based lookup matters.

How it differs from a binary search tree: in a BST, each node stores a complete key and you compare keys at each level. In a trie, each node stores a single character (or edge label), and the path from root to node is the key. This means all keys with a common prefix share the same initial path, saving both time and space.
Quick Recognition: Problems involving prefix matching, autocomplete, word dictionaries, or searching multiple words efficiently. Keywords: “prefix”, “dictionary”, “word search”, “autocomplete”. If the problem involves searching for multiple patterns simultaneously, a trie lets you do it in a single pass rather than one pass per pattern.

Pattern Recognition Cheatsheet (LeetCode Triggers)

If the problem says any of these, reach for a trie before you reach for a hash map:
  • “prefix lookup” / “starts with” — standard trie with is_end flag. LC 208, 1268.
  • “autocomplete” / “search suggestions” — trie + frequency or top-K cache per node. LC 642, 1268, 1804.
  • “word search in grid” — trie + DFS, prune when the current path exits the trie. LC 212.
  • “longest common prefix” — single chain in the trie until first branch. LC 14.
  • “wildcards in word match” / “dot matches any character” — trie + recursive DFS that branches on every child when it hits a dot. LC 211.
  • “replace word with shortest root” — trie traversal that returns at the first is_end. LC 648.
  • “max XOR of two numbers” — binary trie over bits, greedy “take the opposite bit” traversal. LC 421, 1707.
  • “concatenated words” / “word break with multiple matches” — trie + DP, every prefix that exists is a recursion candidate. LC 472, 139.
  • “stream of incoming words / queries” — trie supports online insert and search natively, no rebuild needed.
Anti-patterns — do NOT use a trie when:
  • You only need exact-word membership and no prefix queries — a hash set is faster (O(1) average versus O(m)) and uses far less memory.
  • The dictionary is small (under a few hundred items) — hash sets and sorted arrays beat the trie’s pointer-chasing overhead.
  • Memory is tight and your strings have few shared prefixes — a standard trie pays the per-character overhead with no compression benefit. Consider a radix tree if shared prefixes exist; otherwise stick with a hash set.
  • You need ordered lookups by an arbitrary attribute (frequency, length) — a trie sorts only lexicographically along its paths.

Pattern Recognition Checklist

Use Trie When

  • Prefix-based search (autocomplete)
  • Word dictionary with wildcard search
  • Word search in 2D grid (multiple words)
  • Longest common prefix
  • Spell checking
  • IP routing (binary tries)

Don't Use When

  • Exact match only (use HashSet)
  • Single word search (use HashMap)
  • Memory is very constrained
  • Small dictionary (overhead not worth it)

Trie Complexity Reference

Operation     | Time     | Space
Insert        | O(m)     | O(m) per word
Search        | O(m)     | —
Prefix search | O(m)     | —
Delete        | O(m)     | —
Build trie    | O(n * m) | O(n * m * 26) worst case (with array-of-26 children)

Where m = word length, n = number of words.

When to Use

  • Autocomplete — suggest words by prefix
  • Spell checker — find words with similar prefixes
  • Word search — search for words in a grid
  • Longest prefix — find the longest common prefix

Basic Implementation

Every trie node holds two things: a map of children (character to child node) and a boolean is_end marking whether a complete word terminates at this node. The root is an empty node representing the empty prefix. The distinction between search (requires is_end == True) and startsWith (only requires the path to exist) is the single most common source of bugs in trie implementations.

Children representation trade-off: a HashMap gives O(1) average lookup and handles arbitrary character sets (Unicode). A fixed array of 26 gives O(1) worst-case and better cache locality, but wastes memory when the trie is sparse. In an interview, HashMap is the safer default unless you are told the charset is lowercase English only.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # True if a complete word ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()  # Empty root represents the empty prefix
    
    def insert(self, word):
        """Insert word into trie - O(m) where m = len(word).
        
        Walkthrough: insert("cat") then insert("car")
          insert("cat"):
            root -> 'c' (create) -> 'a' (create) -> 't' (create, is_end=True)
          insert("car"):
            root -> 'c' (exists) -> 'a' (exists) -> 'r' (create, is_end=True)
          Now 'c' and 'a' nodes are shared between "cat" and "car".
        """
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()  # Create path if missing
            node = node.children[char]             # Move to child
        node.is_end = True  # Mark the end of a complete word
    
    def search(self, word):
        """Check if exact word exists - O(m).
        
        Critical: search("ca") returns False even though the path exists,
        because node 'a' has is_end=False. Only "cat" and "car" are words.
        """
        node = self._traverse(word)
        return node is not None and node.is_end  # Must be a complete word
    
    def startsWith(self, prefix):
        """Check if any word starts with prefix - O(m).
        
        startsWith("ca") returns True because the path c->a exists,
        even though "ca" itself is not a word.
        """
        return self._traverse(prefix) is not None  # Path existing is enough
    
    def _traverse(self, s):
        """Follow the path for string s. Returns the final node, or None
        if any character along the path does not exist."""
        node = self.root
        for char in s:
            if char not in node.children:
                return None  # Path does not exist
            node = node.children[char]
        return node
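A quick sanity check of the search versus startsWith distinction, using the class above:

t = Trie()
t.insert("cat")
t.insert("car")

print(t.search("cat"))      # True  - complete word
print(t.search("ca"))       # False - path exists but is_end is False
print(t.startsWith("ca"))   # True  - path existing is all startsWith needs
print(t.startsWith("dog"))  # False - no such path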

Trie Templates You Should Have Memorized

Three templates that cover roughly 80 percent of trie LeetCode. Type them from blank in under three minutes each.

Template 1: TrieNode + Insert + Search + StartsWith

The reference implementation. LC 208 in canonical form.
class TrieNode:
    __slots__ = ('children', 'is_end')           # cuts memory ~30% in CPython
    def __init__(self):
        self.children = {}                        # char -> TrieNode
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end   # complete word, not just a prefix

    def startsWith(self, prefix: str) -> bool:
        return self._walk(prefix) is not None      # path existing is enough

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

Template 2: Count Words and Count Prefixes

A common variant (LC 1804): each node stores the number of words ending here and the number of words passing through here. Decrement on erase, query in O(m).
class CountTrieNode:
    __slots__ = ('children', 'end_count', 'pass_count')
    def __init__(self):
        self.children = {}
        self.end_count = 0     # how many words end exactly at this node
        self.pass_count = 0    # how many inserted words traverse this node

class CountTrie:
    def __init__(self):
        self.root = CountTrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = CountTrieNode()
            node = node.children[ch]
            node.pass_count += 1
        node.end_count += 1

    def countWordsEqualTo(self, word):
        node = self._walk(word)
        return node.end_count if node else 0

    def countWordsStartingWith(self, prefix):
        node = self._walk(prefix)
        return node.pass_count if node else 0

    def erase(self, word):
        # Assumes the word is present (as LC 1804 guarantees);
        # erasing an absent word would drive the counters negative.
        node = self.root
        for ch in word:
            node = node.children[ch]
            node.pass_count -= 1
        node.end_count -= 1

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node
The two counters are independent. pass_count tells you how many inserted words have the current node on their path, which is exactly the answer for startsWith-with-count queries.
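A short usage sketch of the counting trie, mirroring the LC 1804 API:

ct = CountTrie()
ct.insert("apple")
ct.insert("apple")
ct.insert("app")

print(ct.countWordsEqualTo("apple"))     # 2
print(ct.countWordsStartingWith("app"))  # 3 - "apple" x2 and "app" all pass through
ct.erase("apple")
print(ct.countWordsEqualTo("apple"))     # 1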

Template 3: Wildcard Search (DFS Branching on ’.’)

The LC 211 pattern. The standard trie walk is iterative; the moment you hit a wildcard you must recurse and branch.
class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word):
        return self._dfs(self.root, word, 0)

    def _dfs(self, node, word, i):
        if i == len(word):
            return node.is_end
        ch = word[i]
        if ch == '.':
            # branch into every existing child
            for child in node.children.values():
                if self._dfs(child, word, i + 1):
                    return True
            return False
        if ch not in node.children:
            return False
        return self._dfs(node.children[ch], word, i + 1)
The complexity for a query of length m with d dots is roughly O(branching_factor^d * m). On real English-language tries the average branching factor is 4 to 5, so a query like "a.b.c" is fast; "....." is not.
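A usage sketch matching the LC 211 example (assumes the Template 1 TrieNode is in scope):

wd = WordDictionary()
wd.addWord("bad")
wd.addWord("dad")
wd.addWord("mad")

print(wd.search("pad"))  # False - no such word
print(wd.search("bad"))  # True  - exact match
print(wd.search(".ad"))  # True  - '.' branches into 'b', 'd', 'm'
print(wd.search("b.."))  # True  - matches "bad"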

Pattern Variations

1. Word Search II (Backtracking + Trie)

This is the classic hard trie problem and a FAANG favorite. The brute-force approach searches for each word independently via DFS, costing O(words * rows * cols * 4^L). A trie lets you search for all words simultaneously in a single DFS pass: at each cell, you check whether the current path exists in the trie and prune immediately if not. This can be 100x faster for large word lists.

Critical optimization: after finding a word, set is_end = False on that node (or prune the node entirely if childless). Without this, the same word can be found from multiple starting cells, and the trie never shrinks.

Complexity: O(rows * cols * 4^L) worst case, but trie pruning makes real performance dramatically better.
def find_words(board, words):
    """Find all words from dictionary in grid.
    
    Strategy: Build trie from word list, then DFS from every cell.
    The trie prunes impossible paths -- if no word starts with the
    current path, we backtrack immediately instead of exploring further.
    """
    # Build trie from words
    trie = Trie()
    for word in words:
        trie.insert(word)
    
    rows, cols = len(board), len(board[0])
    result = set()
    
    def dfs(r, c, node, path):
        char = board[r][c]
        if char not in node.children:
            return
        
        node = node.children[char]
        path += char
        
        if node.is_end:
            result.add(path)
        
        board[r][c] = '#'  # Mark visited
        
        for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
                dfs(nr, nc, node, path)
        
        board[r][c] = char  # Restore
    
    for r in range(rows):
        for c in range(cols):
            dfs(r, c, trie.root, "")
    
    return list(result)
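A usage sketch on the canonical LC 212 example (results are sorted here because set order is arbitrary):

board = [
    ["o", "a", "a", "n"],
    ["e", "t", "a", "e"],
    ["i", "h", "k", "r"],
    ["i", "f", "l", "v"],
]
words = ["oath", "pea", "eat", "rain"]
print(sorted(find_words(board, words)))  # ['eat', 'oath']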

2. Autocomplete System

A system design + data structures hybrid. The trie handles prefix matching while frequency counts enable ranking. The stateful input() method accumulates characters across calls — each keystroke narrows the prefix and refines suggestions. When the user types #, the current sentence is saved.

Performance insight: the naive approach (DFS on every keystroke) is fine for small dictionaries. For production-scale autocomplete (millions of sentences), precompute and store the top-3 results at each trie node during insertion, making each input() call O(prefix_length) instead of O(prefix_length + all_sentences_under_prefix).
class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = TrieNode()
        self.current_input = ""
        
        for sentence, count in zip(sentences, times):
            self._insert(sentence, count)
    
    def _insert(self, sentence, count):
        node = self.root
        for char in sentence:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        # Requires the basic TrieNode (no __slots__), since count is
        # attached dynamically to end-of-sentence nodes only.
        node.count = getattr(node, 'count', 0) + count
    
    def input(self, c):
        if c == '#':
            self._insert(self.current_input, 1)
            self.current_input = ""
            return []
        
        self.current_input += c
        return self._search(self.current_input)
    
    def _search(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        
        # DFS to find all sentences with this prefix
        results = []
        self._dfs(node, prefix, results)
        
        # Sort by count (desc), then lexicographically
        results.sort(key=lambda x: (-x[1], x[0]))
        return [s for s, _ in results[:3]]
    
    def _dfs(self, node, path, results):
        if node.is_end:
            results.append((path, node.count))
        for char, child in node.children.items():
            self._dfs(child, path + char, results)
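A usage sketch mirroring the LC 642 example:

ac = AutocompleteSystem(
    ["i love you", "island", "ironman", "i love leetcode"],
    [5, 3, 2, 2],
)
print(ac.input('i'))   # ['i love you', 'island', 'i love leetcode']
print(ac.input(' '))   # ['i love you', 'i love leetcode']
print(ac.input('a'))   # []
print(ac.input('#'))   # [] - saves "i a" as a new sentence with count 1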

Classic Problems

Problem         | Pattern     | Key Insight
Implement Trie  | Basic ops   | HashMap children
Word Search II  | Trie + DFS  | Prune with trie
Autocomplete    | Trie + Heap | Top-k by frequency
Longest Word    | BFS/DFS     | Check prefixes exist
Word Dictionary | Trie + '.'  | DFS for wildcards

Interview Problems by Company

Problem         | Company      | Key Concept
Implement Trie  | All FAANG    | Basic trie ops
Word Dictionary | Meta, Amazon | Wildcard DFS
Replace Words   | Amazon       | Prefix replacement
Map Sum Pairs   | Google       | Trie with values

Interview Tips

Script for interviews:
  1. “I’ll use a Trie for efficient prefix-based operations.”
  2. “Each node has children (HashMap or array of 26).”
  3. “I’ll mark end-of-word with a boolean flag.”
  4. “Insert and search are both O(word length).”
  5. “For Word Search II, I’ll combine Trie with DFS backtracking.”
Implementation   | Pros                                      | Cons
HashMap children | Flexible, sparse-friendly                 | Slightly slower
Array[26]        | Faster access                             | Wastes memory for sparse
With count       | Supports frequency                        | Extra space
With word        | Store full word at end; easier retrieval  | —

Common optimizations:
  1. Remove word from trie after finding (Word Search II)
  2. Store word at end node instead of just a boolean
  3. Track count for frequency-based problems
  4. Prune empty branches after deletion

Practice Problems

  • Implement Trie — basic trie operations
  • Word Search II — backtracking with trie
  • Design Search Autocomplete — trie with frequency
  • Replace Words — prefix replacement

Practice Roadmap

Day 1: Basic Trie
  • Solve: Implement Trie, Longest Common Prefix
  • Focus: Insert, search, startsWith operations

Day 2: Pattern Matching
  • Solve: Word Dictionary, Replace Words
  • Focus: Wildcard handling, prefix matching

Day 3: Trie + DFS
  • Solve: Word Search II
  • Focus: Combining trie with backtracking

Day 4: Advanced Trie
  • Solve: Autocomplete System, Palindrome Pairs
  • Focus: Ranking, complex traversals
Interview Tip: When you see multiple string searches or prefix operations, think Trie. It is especially powerful when combined with DFS for grid problems. A HashMap can match a trie for exact lookups, but the moment the problem says “prefix,” “autocomplete,” or “find all words starting with,” the trie pulls ahead. Also consider tries for XOR maximization problems — a binary trie lets you greedily pick opposite bits at each level.

Worked LeetCode Problems

Five canonical trie problems, each walked through end to end. The point is not just the answer but the decision — why a trie, what each node carries, what the termination rule is, and which trap to avoid.

LC 208: Implement Trie (Prefix Tree) (Medium)

Decision: Standard trie with hash-map children. Use Template 1 verbatim. The interview hook is articulating the difference between search (requires is_end) and startsWith (path exists, regardless of is_end). See Template 1 above for the full code. A single subtle bug to avoid: do not return True from search just because the walk succeeded — you must check is_end on the final node. About 30 percent of candidates miss this on first attempt.

LC 211: Design Add and Search Words (Medium)

Decision: Standard trie + recursive DFS for the wildcard. Template 3 verbatim. See Template 3 above. The interview signal here is recognizing that an iterative walk works for plain queries but the wildcard forces recursion the moment you hit a dot. State the worst-case complexity out loud: O(26^m) for a pure-dot query of length m. Mention that real tries are sparse so the practical cost is closer to O(b^m) for branching factor b around 4 to 5 on English text.

LC 212: Word Search II (Hard)

Decision: Build a trie from the word list, then DFS from every cell, walking the trie in lockstep with the grid. The trie prunes catastrophic branches.
class TrieNode:
    __slots__ = ('children', 'word')
    def __init__(self):
        self.children = {}
        self.word = None         # store the full word at the terminal node

def findWords(board, words):
    root = TrieNode()
    for w in words:                                 # build trie
        n = root
        for ch in w:
            if ch not in n.children:
                n.children[ch] = TrieNode()
            n = n.children[ch]
        n.word = w

    rows, cols = len(board), len(board[0])
    found = []

    def dfs(r, c, node):
        ch = board[r][c]
        nxt = node.children.get(ch)
        if not nxt:
            return                                  # trie says: no word starts with this path
        if nxt.word:
            found.append(nxt.word)
            nxt.word = None                         # de-dup; same word found twice is one entry
        board[r][c] = '#'
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
                dfs(nr, nc, nxt)
        board[r][c] = ch
        # Optional pruning: if nxt has no children left, delete from parent
        if not nxt.children:
            node.children.pop(ch, None)

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, root)
    return found
Why this beats searching each word individually: with W words and L average length, the per-word DFS approach is roughly O(W * rows * cols * 4^L). The trie approach is O(rows * cols * 4^L) total. For a 50x50 board with 30,000 words this is the difference between “finishes” and “times out.” The two senior optimizations are visible in the code: (1) store the word at the terminal node so you do not concatenate strings during DFS, (2) prune empty trie branches when you backtrack so future DFS calls hit dead ends faster.

LC 14: Longest Common Prefix (Easy)

Decision: a trie is overkill compared with the textbook two-pointer or vertical-scan solution, but it is the right answer when the same set of words is queried repeatedly or when the words arrive as a stream.
def longestCommonPrefix(strs):
    if not strs:
        return ""
    root = TrieNode()   # basic node from Template 1: children map + is_end flag
    for s in strs:                               # insert all
        node = root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    # walk down while there is exactly one child and we have not hit an end
    prefix, node = [], root
    while len(node.children) == 1 and not node.is_end:
        ch, child = next(iter(node.children.items()))
        prefix.append(ch)
        node = child
    return ''.join(prefix)
The trie’s longest-common-prefix is the path from the root to the first node that either branches or ends. Mention vertical scanning (compare characters column by column across all strings) as the simpler O(n * m) one-shot answer.
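For contrast, the vertical-scan one-shot answer mentioned above (a sketch; the function name is hypothetical):

def longestCommonPrefix_vertical(strs):
    if not strs:
        return ""
    for i, ch in enumerate(strs[0]):
        for s in strs[1:]:
            if i == len(s) or s[i] != ch:  # a string ran out, or column mismatch
                return strs[0][:i]
    return strs[0]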

LC 648: Replace Words (Medium)

Decision: Build a trie from the dictionary roots. For each word in the sentence, walk the trie and return at the first is_end — that is the shortest root.
def replaceWords(roots, sentence):
    root = TrieNode()   # Template 1 node again: children map + is_end flag
    for r in roots:
        node = root
        for ch in r:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def shortest_root(word):
        node = root
        prefix = []
        for ch in word:
            if ch not in node.children:
                return word                         # no root applies
            prefix.append(ch)
            node = node.children[ch]
            if node.is_end:
                return ''.join(prefix)              # FIRST end -> shortest root
        return word

    return ' '.join(shortest_root(w) for w in sentence.split())
The “stop at first is_end” is the whole interview signal. Compare with longest prefix matching (used in IP routing) where you continue past is_end and remember the most recent end seen — different traversal rule, same trie.
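The contrast described above, as code: longest-prefix matching (the IP-routing rule) keeps walking past is_end and remembers the deepest end seen. A minimal sketch against the same node shape (longest_root is a hypothetical helper):

def longest_root(trie_root, word):
    node = trie_root
    best = None                   # deepest is_end seen so far
    for i, ch in enumerate(word):
        if ch not in node.children:
            break
        node = node.children[ch]
        if node.is_end:
            best = word[:i + 1]   # remember it, but keep walking
    return best if best is not None else word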

Caveats and Traps

Trie traps that bite even strong candidates:
  1. Confusing search with startsWith. The single most common bug. search("ca") must return False when the trie contains only "cat" and "car", because no word ends at node 'a'. startsWith("ca") returns True for the same input. If your search does not check is_end on the final node, every prefix is wrongly reported as a word.
  2. Memory blowup with many short, dissimilar strings. A standard trie pays a per-character allocation cost. Storing 1 million unique 4-letter strings with no shared prefixes is dramatically more memory-hungry than a hash set of those same strings. For prefix-poor data, use a radix tree (compressed trie) or fall back to a hash set.
  3. Case sensitivity. Trie.insert("Apple") and Trie.search("apple") will disagree because the children map is case-sensitive. Normalize on insert and on query (s.lower()) unless the problem explicitly requires case sensitivity.
  4. Tries are not thread-safe. A standard trie has no synchronization. Concurrent inserts race on the children map and can lose writes or crash. For multi-threaded production use, either lock around mutations, use a concurrent map for children, or shard tries by first-character.
  5. Building the path string by repeated concatenation in DFS. path += ch creates a new string in Python, Java, and JavaScript. For deep tries this is O(m) per step and O(m^2) overall. Either pass a mutable buffer (list in Python, StringBuilder in Java) or store the full word at the terminal node so you never reconstruct it.
  6. Forgetting to mark visited cells in grid + trie problems. LC 212 reuses cells if you do not mark board[r][c] = '#' before recursing. Always mark on entry, restore on exit.
  7. Not deduplicating found words in Word Search II. The same word can be reachable from multiple starting cells. Either set the terminal word = None after finding (the cleanest approach) or use a set for the result.
  8. Choosing array-of-26 children when the alphabet is not exactly lowercase English. Unicode, mixed case, digits, punctuation — any of these breaks the array assumption. Default to a hash map unless the problem statement explicitly restricts the alphabet.
Patterns that pair with the warnings above:
  • Use __slots__ in Python TrieNodes. Cuts per-node memory by roughly 30 percent in CPython by eliminating the per-instance __dict__. Significant when the trie has millions of nodes.
  • Store the full word at terminal nodes, not a boolean. Saves O(m) string reconstruction during DFS and makes deduplication a one-liner (node.word = None after first match).
  • Normalize once at the boundary (the public insert/search methods) rather than scattering .lower() calls. One source of truth for the case-folding rule.
  • Compressed (radix) trie when shared prefixes are deep but branching is rare. Linux kernel routing, HTTP routers like httprouter, and Redis key storage all use radix trees for this reason.
  • Parallel hash set for exact-match speedups. When 95 percent of queries are exact match and 5 percent are prefix, keep both data structures. The hash set serves the common case in O(1); the trie serves the prefix case in O(m). Memory cost is acceptable for the latency win.
  • Lazy delete by clearing is_end instead of removing nodes. True deletion needs a bottom-up sweep that respects shared prefixes. For most problems, simply marking the node as no-longer-a-word-ending is sufficient and avoids the bookkeeping.

Curated LeetCode List

A focused trie grind list. Each problem teaches a distinct lesson; do not skip the “why this problem” column.
LC # | Problem                                | Difficulty | Why this problem
14   | Longest Common Prefix                  | Easy       | Trie is overkill but conceptually clean; learn the two-pointer alternative
720  | Longest Word in Dictionary             | Easy       | Trie + DFS with the “every prefix must be a word” constraint
1268 | Search Suggestions System              | Medium     | Real-world autocomplete; combine trie with sorted top-3 storage
208  | Implement Trie                         | Medium     | The reference implementation; if you cannot do this in 5 minutes you are not ready
211  | Design Add and Search Words            | Medium     | Wildcard DFS; the canonical “trie + recursion” pattern
648  | Replace Words                          | Medium     | “Stop at first is_end” — shortest-root traversal
677  | Map Sum Pairs                          | Medium     | Trie that aggregates an integer along the path; foundation for autocomplete-with-counts
1023 | Camelcase Matching                     | Medium     | Wildcard-like matching with a constraint; tests adapting the basic walk
212  | Word Search II                         | Hard       | The trie + DFS classic; pruning and dedup are senior signals
642  | Design Search Autocomplete System      | Hard       | Stateful input across calls; ranking by frequency then lexicographic
472  | Concatenated Words                     | Hard       | Trie + DP; recognize that every prefix that exists is a recursion candidate
421  | Maximum XOR of Two Numbers             | Hard       | Binary trie; greedy “take the opposite bit” traversal
1707 | Maximum XOR With an Element From Array | Hard       | Offline trie + sorting; queries answered as you build the trie
745  | Prefix and Suffix Search               | Hard       | Two tries (one for prefix, one for reversed suffix) joined by a weight
336  | Palindrome Pairs                       | Hard       | Trie + palindrome checks; the most fiddly but rewarding trie problem

Interview Questions

Q1: Implement a Trie (LC 208)

What interviewers are really testing: Can you build the most fundamental tree-based string data structure from scratch? They want to see if you understand the node structure, how characters map to edges, and the critical difference between “this prefix exists” and “this complete word exists.” This is the baseline — if your trie implementation has bugs, nothing else in the interview matters.

Strong Answer:
  • Node structure. “Each TrieNode holds a children map (character to child node) and an is_end boolean marking whether a complete word terminates here. The root is an empty node — it represents the empty prefix.”
  • Insert O(m). Walk character by character. If the child for the current character does not exist, create a new node. Move to the child. After the last character, set is_end = True. You are literally building a path through the tree for each word.
  • Search vs startsWith — the critical distinction. Both traverse the trie identically. The difference: search returns True only if the final node has is_end = True (complete word), while startsWith returns True as long as the traversal does not fail (the prefix path exists, regardless of is_end). This is the single most common bug — candidates forget the is_end check in search and treat every prefix as a word.
  • Children representation trade-off. HashMap gives O(1) average lookup and handles arbitrary character sets (Unicode, not just a-z). A fixed array of 26 gives O(1) worst-case and better cache locality, but wastes memory when the trie is sparse. In an interview, HashMap is safer unless told the charset is lowercase English only.
  • Complexity: O(m) time for all three operations where m is the word/prefix length. O(n * m) total space in the worst case where n is the number of words and m is the average length, though shared prefixes reduce this significantly in practice.
Red flag answer: Not including is_end in the node. Confusing search and startsWith behavior. Using recursion for insert/search when iteration is simpler and avoids stack depth issues. Not being able to articulate the space trade-off between HashMap and array children.

Follow-ups:
  1. How would you implement delete for a trie? What makes it trickier than insert? (Answer: you need to remove nodes bottom-up, but only if they have no other children and are not the end of another word. A recursive approach works naturally: delete the character path, and on the way back up, remove nodes that are now childless and not word-endings. The tricky part is not accidentally deleting shared prefixes.)
  2. If your trie stores 10 million English words, roughly how much memory does it use with HashMap children versus array-of-26 children? (Answer: with HashMap, each node is roughly 50-80 bytes depending on the language runtime — a HashMap object plus overhead. With array-of-26, each node is 26 pointers which is about 208 bytes on a 64-bit system, but access is faster. For 10M words averaging 8 characters with, say, 30% prefix sharing, you are looking at roughly 56M nodes. HashMap approach: ~3-4 GB. Array approach: ~11 GB. This is why compressed tries and radix trees exist for production use cases.)
Q2: Longest Word in Dictionary (LC 720)

What interviewers are really testing: Can you combine trie construction with a traversal strategy (BFS or DFS) that enforces an additional constraint — every prefix of the answer must also be a complete word? This tests whether you understand how is_end creates a secondary constraint on top of basic trie traversal.

Strong Answer:
  • Key insight. “The answer is the longest word where every single prefix (length 1, 2, …, k) is also a word in the list. This is not just longest common prefix — it is longest chain of valid words.”
  • Approach with Trie + BFS/DFS. Insert all words into a trie. Then do BFS (or DFS) starting from the root, but only visit child nodes where is_end = True. This enforces the constraint that every prefix must be a complete word. Track the longest path found.
  • Why BFS is cleaner here. BFS naturally explores shorter words before longer ones. If we process level by level, we can guarantee that when we reach a node, all its prefix ancestors were valid words. Among words of the same length, BFS gives us lexicographic order for free if we process children in sorted order.
  • Alternative: sort + HashSet. Sort words by length (then lexicographically). For each word, check if the word minus its last character exists in the set of already-validated words. If yes, add it to the set and update the answer. This is O(n * m * log n) but simpler to code. The trie approach is O(n * m) and more interview-impressive. (A sketch of this alternative appears after the follow-ups below.)
  • Edge case. If no single-character word exists in the list, the answer is empty — you cannot start the chain.
Red flag answer: Just finding the longest word in the trie without checking the prefix chain constraint. Confusing this with “longest common prefix” between two words. Not handling the tie-breaking rule (lexicographically smallest among same-length answers).

Follow-ups:
  1. What if the problem asks for the longest word that can be formed by concatenating other words in the list (like “catdog” from “cat” + “dog”)? How does the approach change? (Answer: this becomes the Concatenated Words problem. You need trie + DP: for each word, check if it can be split into two or more valid words using a DP pass where you try every prefix that exists in the trie and recursively check the suffix.)
  2. How would you handle it if the word list is streaming in — you do not have all words upfront? (Answer: insert each word as it arrives. After each insertion, re-check whether the newly inserted word extends any existing chain. Maintain a running “longest chain” answer. The trie makes this efficient because you only need to check whether the new word’s prefix of length m-1 was already a valid chain endpoint.)
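The sort + HashSet alternative, sketched (longestWord is a hypothetical name matching LC 720's signature):

def longestWord(words):
    # Sort by length, then lexicographically, so shorter words (and the
    # lexicographically smaller of equal-length words) are validated first.
    valid = {""}        # every chain starts from the empty word
    best = ""
    for w in sorted(words, key=lambda w: (len(w), w)):
        if w[:-1] in valid:     # the word minus its last character must be valid
            valid.add(w)
            if len(w) > len(best):
                best = w        # strictly longer only, so ties keep the smaller word
    return best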
Q3: Design Add and Search Words with Wildcards (LC 211)

What interviewers are really testing: Can you modify standard trie search to handle wildcards? The '.' wildcard forces you to branch into all children at that position, turning a single O(m) traversal into a DFS/backtracking problem. They want to see if you understand where the complexity changes and whether your recursive thinking is clean.

Strong Answer:
  • Insert is unchanged. Standard trie insert — the wildcard only affects search.
  • Search with '.' requires branching. When you hit a '.' in the query, you cannot follow a single child. You must try every existing child at that position and return True if any branch leads to a match. This is essentially DFS/backtracking on the trie.
  • Implementation. Use a recursive _search(word, index, node) helper. If word[index] is a letter, follow that specific child. If it is '.', iterate over all children and recurse on each. Base case: if index == len(word), return node.is_end.
  • Complexity analysis. Worst case for a query like "..." (all dots, length m) is O(26^m) because every path in the trie is explored. In practice, it is much faster because real tries are sparse — most nodes have far fewer than 26 children. For a trie built from an English dictionary, average branching factor is around 4-5, so the real cost is closer to O(5^m).
  • Why not regex? You could compile the pattern to a regex, but that has its own overhead. The trie-based DFS is more direct and avoids regex engine complexity. Plus, the trie gives you prefix pruning for free.
Red flag answer: Trying to expand all possible words from the wildcard pattern before searching (exponential in the number of dots). Not using recursion and trying to handle dots iteratively (gets messy fast). Claiming the search is still O(m) with wildcards — it is not.

Follow-ups:
  1. What if the wildcard is '*' meaning zero or more of any character, not just one? How does the complexity change? (Answer: this is dramatically harder. At each '*', you must try matching zero characters (skip the star), one character (advance in trie, keep the star), or one character (advance in trie, advance past the star). This is similar to regex matching and the complexity becomes O(n * m) with memoization where n is the trie depth and m is the pattern length.)
  2. If you are getting millions of search queries per second, most without any wildcards, how would you optimize? (Answer: check upfront whether the query contains any dots. If not, use the standard O(m) trie search — no recursion needed. Only fall back to the DFS path for wildcard queries. This handles the common case in O(m) and reserves the expensive path for rare queries. You could also maintain a parallel HashMap for exact-match lookups.)
Q4: Word Search II (LC 212)

What interviewers are really testing: This is the classic Trie + Backtracking fusion problem and one of the most frequently asked hard problems at Google and Amazon. They want to see three things: (1) why a trie is better than searching each word individually, (2) how you prune the search space using the trie, and (3) whether you know the optimization of removing found words from the trie.

Strong Answer:
  • Why trie beats brute force. “Searching each word independently with DFS is O(words * rows * cols * 4^L) where L is word length. With a trie, we do DFS once from each cell and simultaneously check all words. The trie lets us prune: if no word in the dictionary starts with the current path, we backtrack immediately.”
  • Algorithm. Build a trie from the word list. For each cell (r, c), start a DFS passing the trie root. At each step, check if board[r][c] exists as a child in the current trie node. If not, prune (return immediately). If yes, move to that child node. If the child has is_end = True, add the word to results.
  • Visited tracking. Mark the current cell (e.g., replace with '#') before recursing into neighbors, then restore it after. This prevents reusing a cell in the same path.
  • Critical optimization: remove found words. After finding a word, set is_end = False on that node to avoid adding duplicates. Better yet, prune the trie node if it has no remaining children — this progressively shrinks the trie and speeds up subsequent DFS calls. Without this, the same word can be found from multiple starting cells and the trie never gets smaller.
  • Complexity: O(rows * cols * 4^L) worst case, but trie pruning makes real performance dramatically better. Space is O(n * m) for the trie where n = number of words, m = average length.
Red flag answer: Searching each word separately without a trie. Not pruning with the trie (doing full DFS from every cell regardless of trie state). Forgetting to mark cells as visited during DFS. Not handling duplicate words in results.

Follow-ups:
  1. Your solution is too slow for a 50x50 board with 30,000 words. What optimizations can you apply beyond basic trie pruning? (Answer: (a) Store the complete word at the end node instead of reconstructing it from the path — saves string concatenation overhead. (b) Prune leaf nodes from the trie as words are found, so future DFS calls hit dead ends faster. (c) Start DFS only from cells whose character exists as a direct child of the root — skip cells that cannot begin any word. (d) If many words share long prefixes, the trie already amortizes the search, but removing exhausted branches is the biggest win.)
  2. How would you parallelize this across multiple cores? (Answer: each starting cell (r, c) is an independent DFS, so you could partition the grid into regions and run DFS from each region’s starting cells in parallel. The tricky part is the shared trie — concurrent reads are fine, but if you are pruning found words, you need synchronization. A practical approach: use a thread-local results set and merge at the end, with a concurrent set for deduplication.)
Q5: Design a Search Autocomplete System (LC 642)

What interviewers are really testing: This is a system design + data structures hybrid. They want to see if you can combine a trie with ranking (heap or sort), handle stateful input across multiple calls, and think about the production implications — memory, latency, update frequency.

Strong Answer:
  • Data structure. “Trie where each leaf (end-of-sentence node) stores a frequency count. The trie handles prefix matching, and we use sorting or a heap for top-K retrieval.”
  • Input handling. Maintain a current_input string across calls. Each input(c) appends c to current_input and searches the trie. If c == '#', insert current_input into the trie with count +1 and reset.
  • Search. Traverse the trie to the node matching current_input. Then DFS from that node to collect all sentence endings. Sort by frequency descending, then lexicographically ascending. Return top 3.
  • Optimization: precomputed top-3 at each node. Instead of DFS on every keystroke, store the top-3 results at every node in the trie during insertion. Each input() call becomes O(1) lookup. The trade-off: insertion becomes more expensive because you must update top-3 lists along the insertion path, and memory increases. But reads become O(prefix_length) instead of O(prefix_length + total_sentences_under_prefix).
  • Complexity without optimization: O(p + n * m * log(n * m)) per input call where p = prefix length, n*m = total characters under the prefix node. With precomputed top-3: O(p) per call, O(p * n) for inserts.
Red flag answer: Not maintaining state across input calls (treating each call independently). Sorting the entire sentence list on every keystroke. Not handling the ’#’ termination correctly. Ignoring the frequency tie-breaking rule.

Follow-ups:
  1. In production, your trie has 100 million sentences. DFS on every keystroke is too slow. How do you optimize beyond precomputed top-3? (Answer: store a min-heap of top-K results at each internal node, updated during insertion. Alternatively, use a “hot path” approach: for the first 1-2 characters, precompute results and cache them (the branching is small). For deeper prefixes, the number of matching sentences drops rapidly, making DFS fast. You can also shard the trie by first character across multiple machines.)
  2. How would you handle the trie growing too large for memory? (Answer: several approaches. (a) LRU eviction of infrequently accessed subtrees. (b) Move to a disk-backed trie using memory-mapped files or a B-trie structure. (c) Use a compressed trie (radix tree/Patricia trie) where single-child chains are collapsed into a single node with a string label instead of one node per character — this alone can reduce node count by 50-70% for natural language data.)
Q6: Trie Variants: Standard Trie vs. Radix Tree vs. Ternary Search Tree

What interviewers are really testing: Do you know tries beyond the textbook implementation? This separates candidates who have only solved LeetCode trie problems from those who have thought about tries in production systems where memory and performance matter. It also tests your ability to reason about trade-offs across multiple dimensions.

Strong Answer:
  • Standard Trie. One node per character, children as HashMap or array. Simple to implement, O(m) operations. Downside: massive memory overhead when many nodes have only one child (think long unique suffixes like “internationalization” — 20 nodes for a single chain). Good for: interview problems, small dictionaries, when implementation simplicity matters.
  • Compressed Trie (Radix Tree / Patricia Trie). Merges single-child chains into one node with a string label. “inter” + “national” + “ization” becomes three nodes instead of twenty. Same O(m) time complexity but dramatically less memory — typically 50-70% fewer nodes for natural language data. Used in: Linux kernel routing tables (radix_tree), Redis key storage, HTTP routers (many Go routers like httprouter use radix trees). Downside: insertion requires splitting nodes, which adds code complexity.
  • Ternary Search Tree (TST). Each node has three children: less-than, equal-to, greater-than (like a BST at each level). Much more memory-efficient than a standard trie because you only allocate nodes for characters that actually exist — no empty slots. O(m * log k) search where k is the alphabet size at each level. Used when the alphabet is large (Unicode) and memory is critical. Downside: slower than a trie for small alphabets; benefits diminish when most of the 26 letters are used at each level.
  • Decision framework. Interview or small dataset: standard trie. Production with English text: radix tree. Large or variable alphabet (e.g., URL routing, Unicode): radix tree or TST depending on whether you need prefix traversal (radix) or balanced lookups (TST).
Red flag answer: Only knowing the standard trie. Saying “compressed trie” without being able to describe how node merging works. Confusing a TST with a regular BST. Not being able to articulate a concrete scenario where you would choose one over another.

Follow-ups:
  1. How does a radix tree handle insertion when you need to split an existing compressed edge? Walk me through the mechanics. (Answer: suppose a node has edge label “apple” and you insert “app”. You split the “apple” edge into “app” (which becomes a node with is_end = True) and “le” (a child of that new node pointing to the original terminal node). The split is O(m) and involves creating one new node and adjusting string labels.)
  2. Why do most HTTP routers use radix trees instead of hash maps for route matching? (Answer: HTTP routes have inherent prefix structure (/api/v1/users, /api/v1/orders). A radix tree shares the /api/v1/ prefix and supports parameterized routes like /users/:id naturally as wildcard nodes. A flat HashMap cannot match parameterized routes, would need separate entries for every possible URL, and cannot support prefix-based middleware attachment.)
Q7: Maximum XOR of Two Numbers (LC 421)

What interviewers are really testing: Can you apply tries to a non-string domain? The binary trie for XOR maximization is a classic pattern that tests whether you understand tries at a structural level (a path through a tree) rather than just as a “string data structure.” It also tests bit manipulation fundamentals.

Strong Answer:
  • Key insight. “XOR is maximized when bits differ. For each number, I want to find the number in the array that differs from it at the most significant bits possible. A binary trie lets me make the greedy choice at each bit level.”
  • Binary trie construction. Insert each number as a 32-bit (or 64-bit) binary string from the most significant bit to the least significant bit. Each node has at most two children: 0 and 1. This is a trie over a binary alphabet.
  • Greedy XOR search. For each number, traverse the trie trying to go the opposite direction at each bit. If the current bit is 1, try to go to the 0-child (and vice versa) because opposite bits produce a 1 in XOR. If the opposite child does not exist, follow the same-bit child. This greedy approach works because higher bits contribute exponentially more to the XOR value.
  • Why greedy is optimal. A difference at bit position k contributes 2^k to the XOR. A difference at bit k is always worth more than all possible differences at bits 0 through k-1 combined (2^k > 2^(k-1) + 2^(k-2) + … + 1). So the greedy choice at each level is globally optimal.
  • Complexity: O(n * 32) = O(n) time for building the trie and querying each element. O(n * 32) = O(n) space. Dramatically better than the O(n^2) brute force.
Red flag answer: Not recognizing that tries can work with bits, only thinking of character strings. Trying to brute-force all pairs. Not being able to explain why the greedy approach (take the opposite bit) is optimal. Forgetting to fix the bit width (must pad all numbers to the same bit length).

Follow-ups:
  1. What if the problem asks for the maximum XOR of a subarray (not just any two elements)? (Answer: use prefix XOR. XOR(i, j) = prefix[j] XOR prefix[i-1]. So the problem reduces to: given an array of prefix XOR values, find the maximum XOR of any two elements — which is exactly the binary trie problem. Build the trie incrementally as you compute prefix XOR values.)
  2. What if numbers are being added to the array dynamically and you need to support “find max XOR with existing numbers” queries? (Answer: the binary trie naturally supports this. Insert each new number into the trie, and for each query, traverse the trie greedily. Both insert and query are O(32) = O(1). This makes it an online algorithm, unlike sorting-based approaches.)
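A minimal sketch of the binary-trie approach described above, assuming 32-bit non-negative integers (the LC 421 constraints); dict-of-dicts nodes keep it compact:

def findMaximumXOR(nums):
    BITS = 32                        # fixed width: pad every number to 32 bits
    root = {}                        # nested dicts: bit (0 or 1) -> child
    for x in nums:                   # build the binary trie, MSB first
        node = root
        for i in range(BITS - 1, -1, -1):
            node = node.setdefault((x >> i) & 1, {})
    best = 0
    for x in nums:                   # greedy query for each number
        node, acc = root, 0
        for i in range(BITS - 1, -1, -1):
            b = (x >> i) & 1
            if 1 - b in node:        # opposite bit exists: take it, gain 2^i
                acc |= (1 << i)
                node = node[1 - b]
            else:
                node = node[b]
        best = max(best, acc)
    return best

print(findMaximumXOR([3, 10, 5, 25, 2, 8]))  # 28 (5 XOR 25)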
Q8: Replace Words (LC 648)

What interviewers are really testing: Can you leverage the trie’s prefix-searching property for a practical text transformation problem? The key insight is that you stop traversal as soon as you hit the first is_end node — you do not need to find the longest match. This tests whether you understand that trie traversal can be terminated early based on problem requirements.

Strong Answer:
  • Key insight. “I need the shortest root that is a prefix of each word. A trie gives me this naturally: insert all dictionary roots, then for each word in the sentence, traverse the trie character by character and stop at the first is_end node.”
  • Approach. (1) Build a trie from the dictionary roots. (2) For each word in the sentence, traverse the trie. If at any point the current node has is_end = True, return the path so far (the shortest root). If the traversal fails (character not found), return the original word unchanged.
  • Why shortest root? Because we return at the first is_end encountered during traversal. If the dictionary has both “cat” and “cattle”, “cat” is found first (at depth 3), so “cattle” in the sentence becomes “cat”, not “cattle”.
  • Alternative without trie. Sort roots by length, and for each word, check if any root is a prefix using startswith(). This is O(d * w * m) where d = dictionary size, w = words in sentence, m = average length. The trie approach is O(d * m + w * m) — much faster.
  • Complexity: O(d * m) to build the trie where d = dictionary words, m = average root length. O(w * m) to process the sentence where w = number of words. Total: O((d + w) * m). Space: O(d * m) for the trie.
Red flag answer: Finding the longest matching prefix instead of the shortest (misreading the problem — but also not understanding early termination). Using a HashSet and checking all prefix lengths for each word (works but misses the elegance and efficiency of the trie approach). Not being able to articulate why the trie finds the shortest root naturally.

Follow-ups:
  1. What if you wanted the longest root instead of the shortest? How does the traversal change? (Answer: do not stop at the first is_end. Continue traversing until you can go no further, and track the last is_end node you saw. Return the path up to that last end marker. This is also how longest-prefix-match works in IP routing tables.)
  2. If the dictionary is updated frequently (roots added and removed) and sentences arrive as a stream, how do you handle this? (Answer: the trie supports O(m) insertion natively. For deletion, you remove the is_end flag and optionally prune childless nodes bottom-up. Sentences can be processed as they arrive since each word lookup is independent. The trie serves as a living index that evolves with the dictionary.)
Q9: Design a Spell Checker

What interviewers are really testing: This is a system design question disguised as a data structures question. They want to see if you can combine tries with edit distance concepts, think about ranking suggestions, and reason about real-world constraints like latency and dictionary size. It tests breadth — connecting tries to Levenshtein distance, BK-trees, and probabilistic approaches.

Strong Answer:
  • The trie’s role. “A trie is the backbone for storing the valid dictionary. But spell checking is not just prefix lookup — a misspelled word might have characters inserted, deleted, substituted, or transposed anywhere. The trie helps in two ways: (1) verifying if a word is valid in O(m), and (2) enabling a modified DFS that explores words within a given edit distance.”
  • Edit distance on a trie. Run a parallel traversal: walk the trie while tracking a row of the Levenshtein distance DP table. At each trie node, compute the edit distance to the query word so far. If the minimum value in the current DP row exceeds your threshold (e.g., edit distance > 2), prune that entire subtree. This is dramatically faster than computing edit distance against every word in the dictionary.
  • Complexity. Naive approach: O(d * m * n) where d = dictionary size. Trie + DP approach: depends on the threshold, but for threshold=2 on an English dictionary of 200K words, you typically visit only a few thousand nodes instead of all 200K words. Empirically 10-100x faster.
  • Ranking suggestions. Edit distance alone is not enough. Weight by: (1) word frequency (common words should rank higher), (2) keyboard proximity (typing ‘r’ instead of ‘e’ is more likely than typing ‘z’), (3) phonetic similarity (Soundex or Metaphone). Production spell checkers like those in Google Search combine all of these with machine learning models.
  • Alternatives to pure trie. BK-trees organize words by edit distance for efficient range queries. SymSpell precomputes all deletions within a threshold and uses a HashMap for O(1) lookup at the cost of more memory. For production, a hybrid approach is typical.
Red flag answer: Saying “just compute edit distance against every word.” Not knowing what edit distance is. Thinking a trie alone solves spell checking (it only solves the storage and prefix-pruning parts). Not mentioning ranking or frequency weighting.

Follow-ups:
  1. SymSpell claims O(1) spell checking. How does it work and what is the trade-off? (Answer: precompute all possible deletions of each dictionary word up to a max edit distance and store them in a HashMap. For a query, generate its deletions and look them up. The trade-off is memory: for 200K words with max edit distance 2, you might store 5-10 million deletion entries. It is fast at query time but expensive to build and memory-hungry.)
  2. How would you extend your spell checker to handle multi-word corrections, like “teh quikc” to “the quick”? (Answer: process words independently first, then use a language model (n-gram or neural) to re-rank suggestions based on context. “quikc” could correct to “quick” or “quiche” — the preceding word “the” makes “the quick” far more likely. This is why production spell checkers are not pure edit-distance systems.)
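A sketch of the trie-walk-with-DP-row approach described above, assuming Template 1's TrieNode; the helper names are hypothetical:

def fuzzy_search(root, word, max_dist):
    """Return (word, distance) pairs within max_dist edits of `word`."""
    results = []
    first_row = list(range(len(word) + 1))  # Levenshtein row for the empty prefix
    for ch, child in root.children.items():
        _fuzzy(child, ch, word, first_row, ch, results, max_dist)
    return results

def _fuzzy(node, ch, word, prev_row, prefix, results, max_dist):
    cols = len(word) + 1
    row = [prev_row[0] + 1]                  # first column: all deletions
    for c in range(1, cols):
        row.append(min(
            row[c - 1] + 1,                          # insertion
            prev_row[c] + 1,                         # deletion
            prev_row[c - 1] + (word[c - 1] != ch),   # substitution or match
        ))
    if node.is_end and row[-1] <= max_dist:
        results.append((prefix, row[-1]))
    if min(row) <= max_dist:                 # otherwise the whole subtree is hopeless
        for nxt, child in node.children.items():
            _fuzzy(child, nxt, word, row, prefix + nxt, results, max_dist)

# Usage: build a Template-1 trie of dictionary words, then e.g.
# fuzzy_search(trie.root, "wrod", 2) might return [("word", 2), ...]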
Q10: Trie vs. HashMap: When to Use Which

What interviewers are really testing: This is a judgment and trade-offs question. Many candidates default to “use a trie for anything involving strings” without questioning whether simpler structures would work. Interviewers want to see that you can reason about when the overhead of a trie is justified versus when a HashMap with clever key management does the job.

Strong Answer:
  • HashMap wins when you only need exact match. If the problem is “does this exact word exist?” a HashMap gives O(1) average lookup. A trie gives O(m). The trie’s per-character traversal with pointer chasing is slower in practice due to poor cache locality. For 200K English words with average length 8, a HashMap lookup touches 1-2 cache lines; a trie traversal touches 8+ nodes scattered in memory.
  • Trie wins when you need prefix operations. “Find all words starting with 'pre'” requires scanning the entire HashMap (O(n * m)), but a trie traverses to the 'pre' node in O(3) and then DFS collects only matching words. The asymptotic and practical difference is enormous for large dictionaries.
  • Trie wins when you need ordered traversal. A trie naturally supports lexicographic iteration — DFS on the trie yields words in sorted order. A HashMap cannot do this without sorting (O(n log n)).
  • Trie wins for shared-prefix space optimization. If your dictionary has 10,000 words all starting with “micro” (microservice, microchip, etc.), the trie stores “micro” once. A HashMap stores it in every key. For highly prefix-redundant data (URLs, file paths, IP addresses), tries can use significantly less total space.
  • HashMap wins when memory layout matters. Modern CPUs are cache-line optimized. A HashMap storing strings contiguously is far more cache-friendly than a trie with scattered node allocations. For latency-critical systems, the HashMap’s cache behavior often matters more than the trie’s algorithmic advantage.
  • Decision rule. “If the problem only needs exact lookup and no prefix operations, use a HashMap. If it needs prefix search, ordered iteration, or deals with highly prefix-redundant data, use a trie. If you need both, use a trie with a parallel HashMap for exact lookups.”
Red flag answer: Saying “trie is always better for string problems.” Not mentioning cache locality or memory overhead. Not being able to give a concrete scenario where a HashMap beats a trie. Treating algorithmic complexity as the only factor and ignoring constant factors and hardware realities.

Follow-ups:
  1. Your service needs to serve both exact lookups and prefix-based autocomplete on the same dictionary. How would you structure the storage? (Answer: use both. A HashMap for O(1) exact lookups (used for spell verification, membership checks) and a trie (or radix tree) for prefix-based autocomplete. The memory overhead of maintaining two structures is usually acceptable given the performance benefit. Alternatively, a sorted array with binary search can serve both — O(log n) exact match and O(log n + k) prefix enumeration — as a simpler middle ground.)
  2. In what scenario would you pick a sorted array over both a trie and a HashMap? (Answer: when the dictionary is static (no inserts/deletes), memory is constrained, and you need both exact and prefix operations. A sorted array supports binary search for exact match O(log n), lower_bound for prefix search O(log n + k), and uses minimal memory with excellent cache locality. The downside is O(n) insertion. Redis, for instance, encodes small sorted sets as compact contiguous listpacks for exactly these cache-locality reasons.) A bisect-based sketch of this approach follows.
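A sketch of the sorted-array middle ground from follow-up 2, using the standard library's bisect module (make_index, contains, and with_prefix are illustrative names):
import bisect

def make_index(words):
    return sorted(words)                       # build once; static dictionary

def contains(index, word):
    i = bisect.bisect_left(index, word)        # O(log n) exact match
    return i < len(index) and index[i] == word

def with_prefix(index, prefix):
    out = []
    for w in index[bisect.bisect_left(index, prefix):]:
        if not w.startswith(prefix):
            break                              # sorted order: the prefix range is contiguous
        out.append(w)
    return out                                 # O(log n + k) for k matches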

Interview Q&A: Structured Whiteboard Drills

Three high-frequency trie interview questions, each rehearsed with the structure your interviewer wants to hear: framework first, then code, then crisp follow-ups.
Question 1: LC 208 Implement Trie (Prefix Tree)
Strong Answer Framework:
  1. State the node shape. “Each node has a children map (character to child node) and an is_end boolean.”
  2. State the root. “The root is an empty node representing the empty prefix. It has no character associated with it.”
  3. Insert. Walk character by character, creating children that do not exist. Set is_end = True on the final node.
  4. Search. Walk character by character. Return false if any character is missing. Return true only if is_end is true on the final node.
  5. StartsWith. Same walk as search, but return true as long as the path exists — do not check is_end.
  6. State complexity. All three operations are O(m) where m is the word length. Space O(N * m) worst case where N is the number of words, much less in practice due to shared prefixes.
  7. Mention the array-vs-hashmap tradeoff. Hash map for arbitrary character sets and sparse tries; array of 26 for lowercase English with predictable layout and faster access.
Real LeetCode Example — LC 208 full solution:
class Trie:
    def __init__(self):
        self.root = {}                              # nested dict; "$" key marks end-of-word

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True                             # sentinel marking end of word

    def search(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and '$' in node

    def startsWith(self, prefix: str) -> bool:
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node:
                return None
            node = node[ch]
        return node
The dict-of-dicts version is the most compact way to write this in Python and is genuinely faster than the class-based version because it avoids attribute lookups. Mention that the class-based version (Template 1) is more readable and is what you would write in production code.
Senior follow-up 1: How would you implement delete(word)?
Recursive bottom-up. For each character along the path, recursively delete from the subtree. After the recursive call returns, check whether the current node has any remaining children and whether it itself is the end of another word. If neither, prune it from its parent. The trick is that deletion of "car" must not break "care" — the 'c', 'a', 'r' nodes are shared. You only delete nodes that are no longer on any other word’s path.
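A sketch of that recursive delete, assuming the class-based node shape from the framework above (a children dict plus an is_end flag):
def delete(root, word):
    def _delete(node, i):
        # Return True when the parent may prune the edge leading to this node
        if i == len(word):
            if not node.is_end:
                return False                   # word is not in the trie; nothing changes
            node.is_end = False
        else:
            child = node.children.get(word[i])
            if child is None:
                return False
            if _delete(child, i + 1):
                del node.children[word[i]]     # prune the emptied subtree
        # Keep this node if it ends another word or still leads to one
        return not node.is_end and not node.children
    _delete(root, 0)
Deleting "car" leaves "care" intact because the shared 'r' node still has the 'e' child, so its prune check returns False.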
Senior follow-up 2: Memory cost for 10 million English words?
Each TrieNode in CPython is roughly 60 to 80 bytes (a dict plus PyObject overhead). With __slots__ and the is_end flag, you can drop to 50 to 60 bytes. With around 30 percent prefix sharing on natural English, 10 million words averaging 8 characters yields roughly 56 million nodes. Total: 3 to 5 GB. This is why production systems use radix trees (collapse single-child chains) or store the dictionary outside memory in a serialized format like a DAWG (Directed Acyclic Word Graph).
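The arithmetic behind that estimate, as a quick sanity check (the 30 percent sharing figure and 60-80 bytes per node are the assumptions from the answer above):
words, avg_len, sharing = 10_000_000, 8, 0.30
nodes = words * avg_len * (1 - sharing)        # ~56 million nodes survive prefix sharing
low, high = nodes * 60, nodes * 80             # bytes per node, per the CPython estimate
print(f"{nodes/1e6:.0f}M nodes, {low/1e9:.1f}-{high/1e9:.1f} GB")   # 56M nodes, 3.4-4.5 GB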
Senior follow-up 3: What if you need ordered iteration (lexicographic)?
A trie naturally supports this: DFS in sorted-children order yields words in lexicographic order. With a hash map of children you must sort the keys at each node, costing O(b log b) per node for branching factor b. With an array of 26, the order is implicit — iterate 0 to 25 and yield in order. Use the array layout when ordered iteration matters and the alphabet is fixed.
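A sketch of that sorted DFS, again assuming the class-based node (children dict, is_end flag):
def iter_sorted(node, path=''):
    if node.is_end:
        yield path                             # a prefix word sorts before its extensions
    for ch in sorted(node.children):           # the O(b log b) sort; implicit with array-of-26
        yield from iter_sorted(node.children[ch], path + ch)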
Common Wrong Answers:
  1. “search returns true whenever the walk succeeds.” Misses the is_end check. search("ca") would wrongly return true when the trie has only "cat".
  2. “Use a single hash set of all words for search and a separate trie for startsWith.” Works but doubles the memory and forces you to build and justify two data structures. The trie alone solves both; that is the point of the data structure.
  3. “Recursion for insert and search.” Works but adds stack depth equal to word length. Iterative is simpler, faster, and avoids recursion limit issues for very long inputs.
Further Reading:
  • LC 1804 Implement Trie II — adds count-words and count-prefixes operations.
  • LC 720 Longest Word in Dictionary — the natural follow-up; same trie, BFS traversal.
  • Sedgewick, Algorithms (4th ed.), section 5.2 — the textbook treatment of tries with array-of-R children.
Question 2: LC 212 Word Search II
Strong Answer Framework:
  1. Establish the brute force. “For each word, do a DFS from every cell. With W words, R rows, C cols, average length L, this is O(W * R * C * 4^L). At W = 30,000 this is hopeless.”
  2. Identify the wasted work. Two words sharing the prefix "app" both walk the same first three cells in their DFS. The DFS work for shared prefixes is duplicated W times.
  3. Propose the trie. Build a trie from the word list. Now a single DFS from each cell, walking the trie in lockstep with the grid, simultaneously tests all words with the current path as prefix.
  4. State the pruning rule. When the DFS enters cell (r, c), look up the cell’s character in the current trie node’s children. If it is missing, return immediately — no word in the dictionary starts with this path. This is the killer optimization.
  5. State termination. When the trie node has is_end (or stores the word), record a hit. Continue the DFS afterwards because longer words may also match through this cell.
  6. State the senior optimizations. Store the full word at the terminal node (no string concatenation during DFS). Set word = None after finding (deduplication). Prune empty trie branches on backtrack so future DFS calls hit dead ends faster.
  7. State complexity. O(R * C * 4^L) worst case where L is the longest word. The trie does not change worst-case but dramatically reduces constants on real inputs.
Real LeetCode Example — LC 212 full solution:
See the worked-problems section above for the complete code; a condensed sketch of the DFS core follows the list below. Walk the interviewer through three details:
  • The word field at the terminal node. Storing the full word means you push the result with found.append(nxt.word) instead of reconstructing from the path — this matters when L is large.
  • nxt.word = None after finding. Without this, the same word can be added multiple times if it is reachable from multiple starting cells. With it, each word is recorded exactly once.
  • Optional empty-branch pruning. if not nxt.children: node.children.pop(ch, None). Once a trie subtree has no remaining words, deleting it means future DFS calls hit a dead end one step earlier. Empirically, this cuts runtime by 40 to 60 percent on dense word lists.
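The condensed sketch of the DFS core, showing all three details. It assumes a TrieNode with a children dict and a word field (the full word stored at the terminal node), per the framework above:
def dfs(board, r, c, node, found):
    ch = board[r][c]
    nxt = node.children.get(ch)
    if nxt is None:
        return                                 # pruning rule: no word continues this path
    if nxt.word is not None:
        found.append(nxt.word)                 # full word stored at the terminal node
        nxt.word = None                        # dedupe: each word recorded exactly once
    board[r][c] = '#'                          # mark visited in place
    for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
        if 0 <= nr < len(board) and 0 <= nc < len(board[0]) and board[nr][nc] != '#':
            dfs(board, nr, nc, nxt, found)
    board[r][c] = ch                           # restore on backtrack
    if not nxt.children:
        node.children.pop(ch, None)            # optional empty-branch pruning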
Senior follow-up 1: Your solution still TLEs on a 50x50 board with 30,000 words. What else can you do?
Three structural options. (a) Pre-filter the trie: drop any word that contains a character not present anywhere in the grid — one O(W * L) pass that often eliminates a third of the dictionary. (b) Start DFS only from cells whose character exists as a direct child of the root: this skips entire useless DFS calls. (c) Bidirectional pruning: after marking found, walk back up the trie path and prune nodes that no longer lead to any unfound word. The combined effect on adversarial test cases can be an order of magnitude.
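Pre-filter (a) as a sketch (the function name prefilter is illustrative); filter (b) is a one-line check in the driver loop, shown as a comment:
def prefilter(board, words):
    """(a) One O(W * L) pass: drop words using characters absent from the grid."""
    grid_chars = {ch for row in board for ch in row}
    return [w for w in words if set(w) <= grid_chars]

# (b) In the driver loop, start DFS only where some word can actually begin:
#     if board[r][c] in root.children: dfs(board, r, c, root, found)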
Senior follow-up 2: Parallelize across cores.
Each starting cell (r, c) spawns an independent DFS. Partition the grid into regions and dispatch a worker per region. The shared trie can be read concurrently; only the pruning step (deleting visited terminal nodes) requires synchronization. The cleanest design: workers append to thread-local result lists, a final merge happens at the end, and pruning is disabled during parallel execution (or done with atomic compare-and-swap on the children map). Speedup is sub-linear because the trie pruning — the biggest single optimization — is harder to apply across threads.
Senior follow-up 3: What changes if cells can be reused (no “do not revisit” rule)?
The DFS no longer needs the board[r][c] = '#' marking and restoration. Without revisit avoidance, infinite loops are possible if any cycle in the graph is reachable, so you bound the search depth by the longest word in the trie. Critically, the trie pruning becomes even more important because the search space explodes; without pruning the search is effectively unbounded. This variant rarely appears in LeetCode but does appear in word-game search problems.
Common Wrong Answers:
  1. “Run a separate DFS per word.” Functionally correct but TLEs for any non-trivial dictionary. The whole point of the question is the trie.
  2. “Store the path string and compare against a hash set of words at each step.” Works but checks every prefix against a hash set instead of pruning when no word starts with the prefix. The trie is strictly stronger because it answers “does any word start with this path” in O(1) per character.
  3. “Mark cells visited with a separate visited matrix instead of mutating the board.” Works but doubles the memory and triples the cache misses. Mutating the board with a sentinel like '#' is the standard idiom.
Further Reading:
  • LC 79 Word Search — the single-word version; useful warmup.
  • LC 425 Word Squares — generalizes trie + DFS to a 2D word-grid construction problem.
  • The classic Boggle solver write-ups (Knuth, Sedgewick) — foundational for the trie + DFS pattern.
Question 3: LC 642 Design Search Autocomplete System
Strong Answer Framework:
  1. Restate the problem. “Each input(c) call appends a character to the current query and returns the top 3 most frequent historical sentences with that prefix. '#' ends the current sentence and adds it to history with frequency 1.”
  2. State the data structure. Trie keyed by character. Each node has children, an is_end flag, and a count integer. Counts live only on terminal nodes (the end of full sentences).
  3. State the lookup algorithm. Walk the trie to the current prefix node. From there, DFS the entire subtree to collect every (sentence, count) pair. Sort by (-count, sentence) and return the first 3.
  4. State the complexity tradeoff. Naive approach is O(p + m) per input call where p is the prefix length and m is the number of sentences under the prefix. For a popular short prefix like "the", m can be huge.
  5. Propose the production optimization. Store a precomputed top-3 list at each trie node, updated on every insert. Each input call becomes O(p) — one walk to the prefix node and a list read. Insert stays O(p): you update the top-3 list at every node along the insertion path, but each update handles at most four entries, a constant.
  6. Discuss memory. With 100 million sentences and roughly 5 new trie nodes per sentence after prefix sharing, you have on the order of 500 million nodes, each carrying a top-3 list of pointers. This requires sharding across machines or a disk-backed trie.
Real LeetCode Example — LC 642 with the precomputed-top-3 optimization:
class TrieNode:
    __slots__ = ('children', 'sentence', 'count', 'top3')
    def __init__(self):
        self.children = {}
        self.sentence = None
        self.count = 0
        self.top3 = []                     # list of (sentence, count) pairs

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = TrieNode()
        self.cur_input = []
        self.cur_node = self.root
        for s, t in zip(sentences, times):
            self._insert(s, t)

    def _insert(self, sentence, count):
        node = self.root
        path = [self.root]
        for ch in sentence:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
            path.append(node)
        node.sentence = sentence
        node.count += count
        # Update top-3 along the path
        for n in path:
            self._update_top3(n, sentence, node.count)

    def _update_top3(self, node, sentence, count):
        # Keep top-3 by count desc, then sentence lex asc
        entries = {s: c for s, c in node.top3}
        entries[sentence] = count
        ranked = sorted(entries.items(), key=lambda x: (-x[1], x[0]))
        node.top3 = ranked[:3]

    def input(self, c):
        if c == '#':
            self._insert(''.join(self.cur_input), 1)
            self.cur_input = []
            self.cur_node = self.root
            return []

        self.cur_input.append(c)
        if self.cur_node is not None:
            self.cur_node = self.cur_node.children.get(c)
        if self.cur_node is None:
            return []
        return [s for s, _ in self.cur_node.top3]
The cur_node cache is what makes each keystroke O(1): we do not re-walk from the root on every call — we track our position in the trie across calls. When the user types a character that does not extend any stored path, we set cur_node = None and stay there until the next '#'.
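Tracing the LC 642 sample input through this implementation:
ac = AutocompleteSystem(
    ["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2])
print(ac.input('i'))   # ['i love you', 'island', 'i love leetcode']
print(ac.input(' '))   # ['i love you', 'i love leetcode']
print(ac.input('a'))   # [] -- no sentence starts with "i a"
print(ac.input('#'))   # [] -- "i a" recorded with count 1 for future queries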
Senior follow-up 1: How does this scale to 1 billion sentences?
Three layers of work. (a) Shard the trie by first character across N machines: traffic for queries starting with 'a' goes to shard 1, etc. Each shard handles roughly 1/26th of the volume. (b) Move the trie off-heap — either to a memory-mapped file using a serialized format like SuRF (Succinct Range Filter) or to a disk-backed structure like RocksDB with prefix scans. The top-3 cache lives in memory; the full sentence storage lives on disk. (c) Write-behind for the count updates — batches of insert traffic queued and applied periodically rather than synchronously, since exact real-time top-3 is rarely required.
Senior follow-up 2: Cold start — the system has just launched and has no historical data.
Two strategies, often combined. (a) Bootstrap from a static dictionary of common queries (an English n-gram corpus, query logs from a similar product). Initial autocomplete suggestions are reasonable from day one. (b) Online learning — as users type and submit queries, the trie grows in real time. After a week of traffic, the bootstrap dictionary’s weight is decayed and the live distribution dominates. Production search engines like Bing and DuckDuckGo run a hybrid forever — the bootstrap acts as a fallback for queries with no history.
Senior follow-up 3: The problem changes — now suggestions must be personalized per user.
Per-user tries are infeasible at scale. The standard approach is a global trie + per-user reranker. The global trie returns the top 50 candidates; a downstream model (often gradient-boosted trees or a small neural network) re-ranks based on per-user features (clicked sentences, location, time of day). The trie does the recall step in microseconds; the reranker does the precision step in milliseconds. This is roughly the architecture of Google search-suggest as published in their engineering talks.
Common Wrong Answers:
  1. “DFS the entire subtree on every keystroke.” Correct but slow for popular prefixes. For "a" in a million-sentence trie, you DFS hundreds of thousands of nodes per keystroke. Production systems precompute top-K at each node.
  2. “Use a SQL LIKE 'prefix%' ORDER BY count DESC LIMIT 3 query.” Works for small datasets and is genuinely the right answer when you have a database already and the dataset is under a few million rows. But the question is asking you to design the data structure, not delegate to a database.
  3. “Maintain a global sorted list of all sentences and binary-search the prefix range.” O(log n + k) per query and easy to implement, but updates are O(n) due to shifting. Falls apart for high write throughput.
Further Reading:
  • LC 1268 Search Suggestions System — the static, simpler version of LC 642.
  • LC 1804 Implement Trie II — adds count-words and count-prefixes, a stepping-stone toward autocomplete-with-counts.
  • Google “Suggest” engineering talks (e.g., the 2010 talk by Daniel Russell) — production architecture of large-scale autocomplete.