> ## Documentation Index > Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt > Use this file to discover all available pages before exploring further. # Depth-First Search (DFS) > Master DFS for trees, graphs, and recursive exploration problems DFS Pattern

## What is DFS? **Depth-First Search** explores as far as possible along each branch before backtracking. It uses a stack (explicit or via recursion) to track the path and is ideal for problems requiring complete path exploration. Imagine you are exploring a cave system. Instead of checking every tunnel at the entrance first (that would be BFS), you pick one tunnel and walk as deep as you can. When you hit a dead end, you backtrack to the last fork and try the next tunnel. That is DFS -- you go deep before you go wide. Under the hood, DFS is powered by a **stack**. Recursive DFS uses the call stack implicitly; iterative DFS uses an explicit stack data structure. Both produce the same traversal order. The practical difference is that recursion is cleaner for tree problems but can cause stack overflow on very deep graphs (Python's default recursion limit is around 1000), while iterative DFS handles any depth safely. **Complexity**: O(V + E) time for graphs (every vertex and edge visited once), O(n) for trees. Space is O(h) where h is the maximum depth -- this is O(log n) for balanced trees but O(n) for skewed trees or linear graphs. **Quick Recognition**: If you need to explore **all paths**, find **connected components**, detect **cycles**, or perform **tree traversals**, think DFS! **Pattern Recognition Cheatsheet — When the LeetCode prompt says X, reach for DFS:** | Phrase in the Problem | What to Reach For | | ------------------------------------------------------------------ | ---------------------------------------------------------------------------- | | "find all paths from A to B" | DFS with backtracking — push to path on entry, pop on exit | | "count connected components" / "number of islands" / "provinces" | DFS flood-fill from each unvisited node | | "topological sort" / "valid course ordering" | DFS postorder, reverse the result (or Kahn's BFS) | | "detect cycle in a directed graph" | DFS with three-color marking (WHITE / GRAY / BLACK) | | "detect cycle in an undirected graph" | DFS with visited set, skip the parent node | | "tree problem" — preorder, inorder, postorder, max depth, path sum | Recursive tree DFS, choose order based on when you need parent vs child info | | "clone a graph" / "deep copy" | DFS with a `node -> clone` hashmap | | "surrounded regions" / "regions captured on borders" | DFS from boundary inward, mark "safe" cells, then sweep | | "word search on a grid" | DFS with backtracking — mark cell on entry, restore on exit | | "all subsets / permutations / combinations" | DFS with backtracking (the canonical backtracking template) | | "Pacific Atlantic Water Flow" / "reach from multiple sources" | Two DFS sweeps from each ocean's border, intersect the reachable sets | If the problem says "shortest path" or "minimum steps," DFS is wrong — use BFS. If it says "shortest weighted path," neither — use Dijkstra. ## Pattern Recognition Checklist * Need to explore **all paths** * Finding **connected components** * Detecting **cycles** in graphs * **Tree traversals** (pre/in/post order) * **Backtracking** problems * Finding **shortest path** (use BFS) * **Level-order** traversal needed * Path length matters more than existence * Graph is very **deep** (stack overflow risk) ## When to Use Preorder, inorder, postorder traversals Finding all paths, checking path existence Counting islands, detecting cycles Permutations, combinations, subset generation ## Pattern Variations ### 1. Tree DFS (Recursive) The three traversal orders -- preorder, inorder, postorder -- differ only in **when** you process the current node relative to its children. The recursive structure is identical; only the placement of "do work here" changes. * **Preorder** (root, left, right): Use when you need to process a node before its subtree (e.g., serialization, copying a tree). * **Inorder** (left, root, right): Use for BSTs -- it visits nodes in sorted order. * **Postorder** (left, right, root): Use when you need child results before processing the parent (e.g., computing subtree sizes, deleting a tree). ```python Python theme={null} def tree_dfs(root): """Basic tree DFS traversal showing all three orders""" if not root: return # PREORDER: process BEFORE visiting children # Use for: serialization, tree copying, prefix expression evaluation print(root.val) tree_dfs(root.left) # INORDER: process BETWEEN children # Use for: BST sorted-order traversal, converting BST to sorted array tree_dfs(root.right) # POSTORDER: process AFTER children # Use for: subtree aggregation, tree deletion, postfix evaluation ``` ```java Java theme={null} public void treeDfs(TreeNode root) { // Basic tree DFS traversal if (root == null) { return; } // Preorder: process before children System.out.println(root.val); treeDfs(root.left); // Inorder: process between children (BST gives sorted order) treeDfs(root.right); // Postorder: process after children } ``` ```cpp C++ theme={null} void treeDfs(TreeNode* root) { // Basic tree DFS traversal if (root == nullptr) { return; } // Preorder: process before children cout << root->val << endl; treeDfs(root->left); // Inorder: process between children (BST gives sorted order) treeDfs(root->right); // Postorder: process after children } ``` ### 2. Graph DFS with Visited Set ```python Python theme={null} def graph_dfs(graph, start): """DFS on adjacency list graph""" visited = set() result = [] def dfs(node): if node in visited: return visited.add(node) result.append(node) for neighbor in graph[node]: dfs(neighbor) dfs(start) return result ``` ```java Java theme={null} public List graphDfs(Map> graph, int start) { // DFS on adjacency list graph Set visited = new HashSet<>(); List result = new ArrayList<>(); dfs(graph, start, visited, result); return result; } private void dfs(Map> graph, int node, Set visited, List result) { if (visited.contains(node)) { return; } visited.add(node); result.add(node); for (int neighbor : graph.getOrDefault(node, new ArrayList<>())) { dfs(graph, neighbor, visited, result); } } ``` ```cpp C++ theme={null} vector graphDfs(unordered_map>& graph, int start) { // DFS on adjacency list graph unordered_set visited; vector result; function dfs = [&](int node) { if (visited.count(node)) { return; } visited.insert(node); result.push_back(node); for (int neighbor : graph[node]) { dfs(neighbor); } }; dfs(start); return result; } ``` ### 3. Grid DFS (Island Problems) The "Number of Islands" pattern: iterate through every cell, and when you find unvisited land ('1'), increment the island count and use DFS to "sink" the entire connected island by marking all reachable land cells as visited. This avoids counting the same island multiple times. **Visited tracking**: Here we modify the grid in place (changing '1' to '0') as the visited marker. If mutating the input is not allowed, use a separate `visited` matrix. **Time: O(m \* n)** -- each cell visited at most once. **Space: O(m \* n)** worst case for the recursion stack on a grid that is entirely land. **Stack overflow warning**: For large grids (e.g., 300x300), the recursive call depth can hit 90,000. In Python, you need `sys.setrecursionlimit()` or should convert to iterative DFS with an explicit stack. ```python Python theme={null} def count_islands(grid): """Count connected components in a grid""" if not grid: return 0 rows, cols = len(grid), len(grid[0]) count = 0 def dfs(r, c): # BOUNDARY CHECK: out of bounds or already visited/water if r < 0 or r >= rows or c < 0 or c >= cols: return if grid[r][c] != '1': return grid[r][c] = '0' # MARK VISITED by sinking this cell # Explore all 4 adjacent cells dfs(r + 1, c) # Down dfs(r - 1, c) # Up dfs(r, c + 1) # Right dfs(r, c - 1) # Left for r in range(rows): for c in range(cols): if grid[r][c] == '1': # Found unvisited land count += 1 # New island discovered dfs(r, c) # Sink the entire island return count ``` ```java Java theme={null} public int countIslands(char[][] grid) { // Count connected components in a grid if (grid == null || grid.length == 0) { return 0; } int rows = grid.length, cols = grid[0].length; int count = 0; for (int r = 0; r < rows; r++) { for (int c = 0; c < cols; c++) { if (grid[r][c] == '1') { count++; dfs(grid, r, c); } } } return count; } private void dfs(char[][] grid, int r, int c) { int rows = grid.length, cols = grid[0].length; if (r < 0 || r >= rows || c < 0 || c >= cols) { return; } if (grid[r][c] != '1') { return; } grid[r][c] = '0'; // Mark visited // Explore 4 directions dfs(grid, r + 1, c); dfs(grid, r - 1, c); dfs(grid, r, c + 1); dfs(grid, r, c - 1); } ``` ```cpp C++ theme={null} int countIslands(vector>& grid) { // Count connected components in a grid if (grid.empty()) { return 0; } int rows = grid.size(), cols = grid[0].size(); int count = 0; function dfs = [&](int r, int c) { if (r < 0 || r >= rows || c < 0 || c >= cols) { return; } if (grid[r][c] != '1') { return; } grid[r][c] = '0'; // Mark visited // Explore 4 directions dfs(r + 1, c); dfs(r - 1, c); dfs(r, c + 1); dfs(r, c - 1); }; for (int r = 0; r < rows; r++) { for (int c = 0; c < cols; c++) { if (grid[r][c] == '1') { count++; dfs(r, c); } } } return count; } ``` ### 4. DFS with Path Tracking When you need **all paths** (not just reachability), DFS becomes a backtracking problem. You build the path as you recurse, and you **un-visit** nodes when you backtrack so they can appear in other paths. **Critical difference from standard DFS**: In standard graph DFS, once you mark a node visited, it stays visited forever. In path-tracking DFS, you **remove** the node from the visited set (or the path) after exploring its subtree. Without this, paths that share intermediate nodes get incorrectly pruned. **Walked example**: graph with edges A->B, A->C, B->C, B->D, C->D. Paths from A to D: \[A,B,D], \[A,B,C,D], \[A,C,D]. Standard DFS (permanent visited) would find only \[A,B,D] and miss \[A,C,D] because C was already visited via B. **Time**: O(V! \* V) in the worst case (exponentially many paths). **Space**: O(V) for the recursion stack and path. ```python Python theme={null} def find_all_paths(graph, start, end): """Find all paths from start to end""" all_paths = [] def dfs(node, path): if node == end: all_paths.append(path[:]) return for neighbor in graph[node]: if neighbor not in path: # Avoid cycles path.append(neighbor) dfs(neighbor, path) path.pop() # Backtrack dfs(start, [start]) return all_paths ``` ```java Java theme={null} public List> findAllPaths(Map> graph, int start, int end) { // Find all paths from start to end List> allPaths = new ArrayList<>(); List path = new ArrayList<>(); path.add(start); dfs(graph, start, end, path, allPaths); return allPaths; } private void dfs(Map> graph, int node, int end, List path, List> allPaths) { if (node == end) { allPaths.add(new ArrayList<>(path)); return; } for (int neighbor : graph.getOrDefault(node, new ArrayList<>())) { if (!path.contains(neighbor)) { // Avoid cycles path.add(neighbor); dfs(graph, neighbor, end, path, allPaths); path.remove(path.size() - 1); // Backtrack } } } ``` ```cpp C++ theme={null} vector> findAllPaths(unordered_map>& graph, int start, int end) { // Find all paths from start to end vector> allPaths; vector path = {start}; unordered_set visited = {start}; function dfs = [&](int node) { if (node == end) { allPaths.push_back(path); return; } for (int neighbor : graph[node]) { if (!visited.count(neighbor)) { // Avoid cycles visited.insert(neighbor); path.push_back(neighbor); dfs(neighbor); path.pop_back(); // Backtrack visited.erase(neighbor); } } }; dfs(start); return allPaths; } ``` ### 5. Iterative DFS with Stack The iterative version replaces the call stack with an explicit stack data structure. This avoids stack overflow for deep graphs (Python default recursion limit is \~1000) and gives you fine-grained control over traversal. **Subtle difference from recursive DFS**: A stack is LIFO, so neighbors pushed left-to-right are popped right-to-left. To visit neighbors in the same order as recursive DFS, push them in **reverse** order. **When to use iterative over recursive**: When graph depth could exceed the call stack limit (e.g., a 1000x1000 grid has max depth 1,000,000), when you need explicit control over the traversal order, or when performance-critical code needs to avoid function-call overhead. **Time: O(V + E). Space: O(V)** for the stack and visited set. ```python Python theme={null} def iterative_dfs(graph, start): """DFS using explicit stack""" visited = set() stack = [start] result = [] while stack: node = stack.pop() if node in visited: continue visited.add(node) result.append(node) # Add neighbors in reverse order for correct traversal for neighbor in reversed(graph[node]): if neighbor not in visited: stack.append(neighbor) return result ``` ```java Java theme={null} public List iterativeDfs(Map> graph, int start) { // DFS using explicit stack Set visited = new HashSet<>(); Stack stack = new Stack<>(); List result = new ArrayList<>(); stack.push(start); while (!stack.isEmpty()) { int node = stack.pop(); if (visited.contains(node)) { continue; } visited.add(node); result.add(node); List neighbors = graph.getOrDefault(node, new ArrayList<>()); for (int i = neighbors.size() - 1; i >= 0; i--) { if (!visited.contains(neighbors.get(i))) { stack.push(neighbors.get(i)); } } } return result; } ``` ```cpp C++ theme={null} vector iterativeDfs(unordered_map>& graph, int start) { // DFS using explicit stack unordered_set visited; stack stk; vector result; stk.push(start); while (!stk.empty()) { int node = stk.top(); stk.pop(); if (visited.count(node)) { continue; } visited.insert(node); result.push_back(node); // Add neighbors in reverse order vector& neighbors = graph[node]; for (int i = neighbors.size() - 1; i >= 0; i--) { if (!visited.count(neighbors[i])) { stk.push(neighbors[i]); } } } return result; } ``` ## Classic Problems **Pattern**: Grid DFS marking visited cells **Key**: Sink the island as you explore it **Pattern**: DFS with HashMap for visited/cloned nodes **Key**: Create clone first, then connect neighbors **Pattern**: Cycle detection with coloring (white/gray/black) **Key**: Gray nodes indicate cycle in current path **Pattern**: Grid DFS with backtracking **Key**: Mark cell temporarily, restore after exploration ## Cycle Detection The three-color algorithm (white/gray/black) is the standard approach for directed graphs. The colors represent: * **WHITE (0)**: Not yet discovered. * **GRAY (1)**: Currently in the recursion stack -- we are exploring its descendants. * **BLACK (2)**: Fully processed -- all descendants have been explored. A cycle exists if and only if we encounter a **GRAY** node during DFS -- meaning we have found a **back edge** (an edge from a descendant back to an ancestor in the current DFS path). **Why not just use a visited set?** A simple visited boolean cannot distinguish between a back edge (cycle) and a cross edge (no cycle) in directed graphs. The three-color scheme solves this. For **undirected** graphs, a simple visited set is sufficient -- just skip the parent node. **Time: O(V + E). Space: O(V).** ```python Python theme={null} def has_cycle_directed(graph, n): """Detect cycle in directed graph using DFS coloring""" WHITE, GRAY, BLACK = 0, 1, 2 color = [WHITE] * n def dfs(node): color[node] = GRAY # Mark: currently exploring this path for neighbor in graph[node]: if color[neighbor] == GRAY: # BACK EDGE: cycle detected! return True if color[neighbor] == WHITE and dfs(neighbor): return True # Cycle found deeper in the tree color[node] = BLACK # Mark: fully explored, no cycle here return False # Must try starting DFS from every unvisited node (graph may be disconnected) for i in range(n): if color[i] == WHITE and dfs(i): return True return False ``` ```java Java theme={null} public boolean hasCycleDirected(List> graph, int n) { // Detect cycle in directed graph using DFS coloring int[] color = new int[n]; // 0=WHITE, 1=GRAY, 2=BLACK for (int i = 0; i < n; i++) { if (color[i] == 0 && dfs(graph, i, color)) { return true; } } return false; } private boolean dfs(List> graph, int node, int[] color) { color[node] = 1; // GRAY - currently exploring for (int neighbor : graph.get(node)) { if (color[neighbor] == 1) { // Back edge = cycle return true; } if (color[neighbor] == 0 && dfs(graph, neighbor, color)) { return true; } } color[node] = 2; // BLACK - fully explored return false; } ``` ```cpp C++ theme={null} bool hasCycleDirected(vector>& graph, int n) { // Detect cycle in directed graph using DFS coloring vector color(n, 0); // 0=WHITE, 1=GRAY, 2=BLACK function dfs = [&](int node) -> bool { color[node] = 1; // GRAY - currently exploring for (int neighbor : graph[node]) { if (color[neighbor] == 1) { // Back edge = cycle return true; } if (color[neighbor] == 0 && dfs(neighbor)) { return true; } } color[node] = 2; // BLACK - fully explored return false; }; for (int i = 0; i < n; i++) { if (color[i] == 0 && dfs(i)) { return true; } } return false; } ``` ## Common Mistakes **Avoid These Pitfalls:** 1. **Stack overflow on deep graphs**: Python defaults to a recursion limit of \~1000. A grid of 500x500 can have DFS depth of 250,000. For grid problems, either increase the limit with `sys.setrecursionlimit()` or convert to iterative DFS with an explicit stack. In Java/C++, the default stack is larger but can still overflow on extreme inputs. 2. **Forgetting the visited check**: Without marking nodes as visited, DFS on a graph with cycles loops forever. On trees this is not an issue (no cycles), but on general graphs it is the most common infinite-loop bug. 3. **Not restoring state in backtracking DFS**: When you need to explore ALL paths (not just reachability), you must un-mark a node after exploring its subtree so other paths can reuse it. Forgetting this causes you to miss valid paths. This is different from standard DFS where you mark once and never unmark. 4. **Confusing preorder and postorder**: When computing properties that depend on subtree results (e.g., subtree sum, height), you must use postorder -- process children first, then the parent. Using preorder by mistake means child results are not yet computed when you try to use them. 5. **Modifying the input grid without permission**: Many grid DFS solutions overwrite cells to mark them visited. In an interview, always ask if modifying the input is acceptable. If not, use a separate visited matrix. ## Debugging Checklist When your DFS solution fails: Are you marking nodes as visited? Using set or boolean array? What stops the recursion? Null node? Visited node? Are you restoring state after exploring each path? Directed or undirected? Affects cycle detection logic. Preorder, inorder, or postorder? Order matters for result. ## DFS vs BFS Quick Reference | Aspect | DFS | BFS | | ---------------- | ------------------- | ---------------- | | Data Structure | Stack / Recursion | Queue | | Path Finding | All paths | Shortest path | | Space | O(h) height | O(w) width | | Cycle Detection | Yes (with coloring) | Yes (undirected) | | Topological Sort | Yes | Yes (Kahn's) | | Best For | Deep exploration | Level-by-level | ## Interview Problems by Company | Problem | Company | Key Concept | | ------------------ | ------- | ----------------- | | Max Depth of Tree | All | Basic tree DFS | | Same Tree | Amazon | Parallel DFS | | Path Sum | Meta | Root-to-leaf DFS | | Invert Binary Tree | Google | Tree modification | | Symmetric Tree | Amazon | Mirror DFS | | Problem | Company | Key Concept | | ----------------- | ------------ | ------------------- | | Number of Islands | All FAANG | Grid DFS | | Clone Graph | Meta | Graph DFS + HashMap | | Course Schedule | All FAANG | Cycle detection | | Word Search | Amazon, Meta | Grid backtracking | | Path Sum II | Amazon | Collect all paths | | Problem | Company | Key Concept | | -------------------- | ------------ | --------------------- | | Word Ladder II | Google | BFS + DFS combination | | Alien Dictionary | Meta, Google | Topological sort | | Critical Connections | Amazon | Tarjan's algorithm | | Word Search II | Amazon | Trie + DFS | ## Interview Tips **Script for interviews:** 1. "I'll use DFS to explore all possibilities." 2. "I'll maintain a visited set to avoid cycles." 3. "At each node, I'll \[do work] before/after visiting children." 4. "Base case is when \[condition] is met." 5. "Time is O(V+E) for graphs, O(n) for trees." | Interviewer Says | You Should Think | | --------------------------- | ---------------------- | | "Find all paths" | DFS with path tracking | | "Check if path exists" | Simple DFS | | "Find connected components" | DFS on each unvisited | | "Detect cycle" | DFS with coloring | | "Topological order" | DFS postorder reversal | **Use Recursive When**: * Tree problems (max depth is log n or limited) * Code clarity matters * Natural recursive structure **Use Iterative When**: * Graph can be very deep (avoid stack overflow) * Need explicit control over stack * Performance is critical ## Worked LeetCode Problems These five cover the DFS interview spectrum: grid flood-fill (200), area aggregation (695), directed cycle detection (207), boundary-aware flood (130), and multi-source from edges (417). If you can solve all five fluently, you have \~75 percent of DFS interview territory covered. ### LC 200 — Number of Islands (Medium, DFS variant) The DFS prototype. Iterate every cell; whenever you find unvisited land, increment the count and DFS-flood the entire island. The flood marks every reachable land cell as visited so the outer loop never double-counts. ```python theme={null} def numIslands(grid): if not grid: return 0 rows, cols = len(grid), len(grid[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1': return grid[r][c] = '0' # Mark visited (sink) dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1) count = 0 for r in range(rows): for c in range(cols): if grid[r][c] == '1': count += 1 dfs(r, c) return count ``` Time O(m\*n), space O(m\*n) recursion depth worst case (entire grid is land). On a 500x500 all-land grid Python's default recursion limit will explode — use BFS or `sys.setrecursionlimit(...)` defensively, or just convert to iterative DFS with an explicit stack. ### LC 695 — Max Area of Island (Medium) Same shape as 200, except the DFS returns the area of the connected component instead of just visiting it. The pattern — DFS returning an aggregated value — generalizes to "longest path in a tree," "sum of all subtree values," and many tree DP problems. ```python theme={null} def maxAreaOfIsland(grid): rows, cols = len(grid), len(grid[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != 1: return 0 grid[r][c] = 0 return 1 + dfs(r+1, c) + dfs(r-1, c) + dfs(r, c+1) + dfs(r, c-1) best = 0 for r in range(rows): for c in range(cols): if grid[r][c] == 1: best = max(best, dfs(r, c)) return best ``` ### LC 207 — Course Schedule (Medium, directed cycle detection) Build the prerequisite graph; the question reduces to "does this graph contain a directed cycle?" Use the three-color DFS. WHITE = unvisited, GRAY = on current recursion stack, BLACK = fully processed. Hitting GRAY = back edge = cycle. ```python theme={null} def canFinish(numCourses, prerequisites): graph = [[] for _ in range(numCourses)] for course, prereq in prerequisites: graph[prereq].append(course) WHITE, GRAY, BLACK = 0, 1, 2 color = [WHITE] * numCourses def dfs(node): color[node] = GRAY for nb in graph[node]: if color[nb] == GRAY: # Back edge -> cycle return False if color[nb] == WHITE and not dfs(nb): return False color[node] = BLACK return True for i in range(numCourses): if color[i] == WHITE and not dfs(i): return False return True ``` Why a simple `visited` set fails on directed graphs: a "visited" node could be either an ancestor (cycle!) or a fully-processed sibling (cross edge, harmless). The three colors disambiguate. ### LC 130 — Surrounded Regions (Medium, boundary DFS) "Capture all O regions completely surrounded by X." The trick: instead of figuring out which regions are surrounded, find which are NOT — those connected to a border O. Run DFS from every border O, mark them safe (e.g., as `'#'`). Then sweep the grid: remaining O = surrounded -> X; safe `'#'` -> O. ```python theme={null} def solve(board): if not board: return rows, cols = len(board), len(board[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or board[r][c] != 'O': return board[r][c] = '#' # Safe sentinel dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1) for r in range(rows): dfs(r, 0); dfs(r, cols-1) for c in range(cols): dfs(0, c); dfs(rows-1, c) for r in range(rows): for c in range(cols): if board[r][c] == 'O': board[r][c] = 'X' # Surrounded -> capture elif board[r][c] == '#': board[r][c] = 'O' # Safe -> restore ``` The "search from the complement" trick recurs constantly: 1020 (Number of Enclaves), 1254 (Closed Islands), 417 (Pacific Atlantic). ### LC 417 — Pacific Atlantic Water Flow (Medium) Find all cells from which water can flow to BOTH oceans. Naive: BFS/DFS from every cell — O(m^2 \* n^2). Smart: invert. DFS from each Pacific border cell, flowing UPHILL (or equal), mark every cell that can reach the Pacific. Same from Atlantic. Intersect the two sets. ```python theme={null} def pacificAtlantic(heights): if not heights: return [] rows, cols = len(heights), len(heights[0]) pacific, atlantic = set(), set() def dfs(r, c, visited, prev_height): if (r, c) in visited or r < 0 or r >= rows or c < 0 or c >= cols: return if heights[r][c] < prev_height: # Water cannot flow up here return visited.add((r, c)) for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: dfs(r+dr, c+dc, visited, heights[r][c]) for r in range(rows): dfs(r, 0, pacific, heights[r][0]) dfs(r, cols-1, atlantic, heights[r][cols-1]) for c in range(cols): dfs(0, c, pacific, heights[0][c]) dfs(rows-1, c, atlantic, heights[rows-1][c]) return [[r, c] for r, c in pacific & atlantic] ``` The mental flip: instead of asking each cell "can you reach the ocean?", let each ocean ask "what cells can reach me?" That collapses the complexity from O((m\*n)^2) to O(m\*n). ## Caveats and Traps **Trap 1 — Recursion depth limit (Python \~1000) on large graphs.** A 500x500 all-land grid has DFS depth 250000. Python default recursion limit is around 1000 — instant `RecursionError`. Even raising the limit risks a C-level stack overflow segfault. **Solution:** Either (a) convert to iterative DFS with an explicit `stack = [start]` and `pop()` loop, or (b) use BFS, which is naturally iterative. For interviews, mention this risk: "I'll use recursive DFS for clarity, but for grids beyond 30x30 in Python I would switch to iterative or BFS." **Trap 2 — Directed cycle detection with a simple visited set.** A boolean `visited` cannot tell you whether a node is on the CURRENT DFS path (back edge = cycle) or on a different completed branch (cross edge = harmless). On a directed graph this gives false positives. **Solution:** Use three-color marking. WHITE = not yet seen, GRAY = on current DFS stack, BLACK = fully processed. Cycle iff DFS encounters a GRAY neighbor. For UNDIRECTED graphs a simple `visited` set works — just skip the parent so you do not flag the edge you came in on as a cycle. **Trap 3 — Forgetting to backtrack (un-mark) on path-enumeration problems.** In standard DFS you mark visited once and never unmark. In path-enumeration / backtracking, the "visited" status is path-specific: a node visited on one path must be available on another. Forgetting to `path.pop()` and `visited.remove(node)` after the recursive call gives wrong (truncated) results. **Solution:** Always pair `path.append(node)` with `path.pop()` and `visited.add(node)` with `visited.remove(node)` around the recursive call. This is the defining structure of backtracking. **Trap 4 — Modifying input grid without permission.** Many DFS solutions sink visited cells (`'1'` -> `'0'`). In an interview, mutating the input is technically a side effect — the caller may need the grid afterward. Stating it without asking signals immaturity. **Solution:** Ask: "May I modify the input to save space, or should I use a separate visited matrix?" Both are valid; choosing knowingly is the senior move. For production code, default to a separate visited array unless the problem explicitly allows mutation. **Trap 5 — Using preorder when the problem requires postorder.** Computing subtree-dependent properties (subtree sum, height, balanced check) needs CHILD results before processing the parent. Doing the work in preorder means the children have not been computed yet — your recursion sees garbage. **Solution:** Memorize: preorder = "do work on arrival" (serialization, copying), inorder = "do work between children" (BST sorted traversal), postorder = "do work on departure" (subtree aggregations, deletion, balanced check, height). When in doubt, ask: do I need the parent's info before children, or children's info before parent? ## Curated LeetCode List | Difficulty | Problem | LC # | Why It Matters | | ---------- | --------------------------------- | ---- | ------------------------------------------ | | Easy | Maximum Depth of Binary Tree | 104 | Cleanest postorder example | | Easy | Same Tree | 100 | Parallel DFS on two trees | | Easy | Symmetric Tree | 101 | Mirror DFS, recursive structure | | Easy | Path Sum | 112 | Root-to-leaf DFS with running sum | | Easy | Invert Binary Tree | 226 | DFS modifying tree structure | | Medium | Number of Islands | 200 | Grid DFS prototype | | Medium | Max Area of Island | 695 | DFS returning an aggregate | | Medium | Course Schedule | 207 | Directed cycle detection | | Medium | Course Schedule II | 210 | Topological sort via DFS postorder | | Medium | Surrounded Regions | 130 | Boundary DFS, complement trick | | Medium | Pacific Atlantic Water Flow | 417 | Multi-source DFS from borders | | Medium | Clone Graph | 133 | DFS with hashmap of clones | | Medium | Word Search | 79 | DFS with backtracking on grid | | Medium | Number of Closed Islands | 1254 | Boundary trick (similar to 130) | | Medium | All Paths From Source to Target | 797 | DFS path enumeration on a DAG | | Medium | Number of Provinces | 547 | DFS or Union-Find for components | | Hard | Word Search II | 212 | DFS plus Trie pruning | | Hard | Alien Dictionary | 269 | Topological sort, careful char graph build | | Hard | Critical Connections | 1192 | Tarjan's bridges algorithm | | Hard | Longest Increasing Path in Matrix | 329 | DFS plus memoization (DP on DAG) | Solve order: every Easy first (one sitting), then 200 -> 695 -> 130 -> 417 -> 207 -> 210 -> 133 (a week of grid + graph DFS), then 79 -> 797 -> 547 (backtracking + path enum), then a hard each weekend. ## DFS Interview Q\&A **Strong Answer Framework:** 1. Recognize: connected components in a grid -> flood-fill from each unvisited land cell. 2. Outer loop: scan every cell. When you find `'1'`, increment count and flood. 3. Flood-fill is interchangeable: DFS (recursive or iterative) or BFS — pick based on grid size. 4. Mark visited inline (sink the cell to `'0'`) or use a separate matrix. 5. Time O(m\*n). Space depends on flood method: DFS recursion is O(m\*n) worst case; BFS queue is O(min(m, n)). **Real LeetCode Example — LC 200 Number of Islands.** DFS version: ```python theme={null} def numIslands(grid): if not grid: return 0 rows, cols = len(grid), len(grid[0]) def dfs(r, c): if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1': return grid[r][c] = '0' dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1) count = 0 for r in range(rows): for c in range(cols): if grid[r][c] == '1': count += 1 dfs(r, c) return count ``` BFS version (preferred for large grids in Python — no recursion limit): ```python theme={null} from collections import deque def numIslands(grid): if not grid: return 0 rows, cols = len(grid), len(grid[0]) count = 0 for r in range(rows): for c in range(cols): if grid[r][c] == '1': count += 1 queue = deque([(r, c)]) grid[r][c] = '0' while queue: cr, cc = queue.popleft() for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: nr, nc = cr+dr, cc+dc if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1': grid[nr][nc] = '0' queue.append((nr, nc)) return count ``` **Senior follow-up 1 — What if islands can be added one at a time and you need the count after each addition?** This is LC 305 Number of Islands II. Pure DFS/BFS recomputes from scratch each time — O(K \* m \* n). Use **Union-Find with offline coordinate flattening**: each new land cell is a new node. Union with each of its 4 land neighbors. Track the component count. Each addition is O(alpha(n)) — effectively constant. Total: O(K \* alpha(K)). **Senior follow-up 2 — What if the grid is 10000x10000?** DFS recursion blows up Python's call stack. Either iterative DFS with an explicit stack or BFS. BFS gives a tighter memory bound (queue holds the wavefront, not the full path). At this scale also consider memory layout: a numpy boolean array for the visited mask is 10x faster than a Python set of tuples. **Senior follow-up 3 — What if we want to count islands that touch the border (escape islands)?** Modify the flood: track a flag during the DFS — set to True if any cell of the component is on the border. Increment "escape count" only if the flag is True at the end of the flood. This pattern shows up in LC 1254 (Number of Closed Islands) and LC 1020 (Number of Enclaves). **Common Wrong Answers:** 1. *Forgetting the bounds check inside DFS.* Out-of-bounds access crashes or silently wraps in some languages. 2. *Counting visited cells instead of components.* Confuses "size of largest island" with "number of islands." 3. *Using DFS recursion on huge grids in Python without raising the recursion limit.* TLE or RecursionError. **Further Reading:** LC 200, LC 695, LC 305 (UF version), LC 1254. **Strong Answer Framework:** 1. Recognize: prerequisites form a directed graph. "Can we finish all courses?" = "is the graph acyclic?" 2. Build adjacency list from prerequisite pairs. 3. Run DFS from every unvisited node with three-color marking. 4. Hitting a GRAY neighbor = back edge = cycle = return False. 5. If all DFS calls finish without GRAY hits, no cycle = return True. 6. Alternative: Kahn's algorithm (BFS on in-degree-zero nodes). Both are O(V + E). **Real LeetCode Example — LC 207 Course Schedule:** ```python theme={null} def canFinish(numCourses, prerequisites): graph = [[] for _ in range(numCourses)] for course, prereq in prerequisites: graph[prereq].append(course) WHITE, GRAY, BLACK = 0, 1, 2 color = [WHITE] * numCourses def dfs(node): color[node] = GRAY for nb in graph[node]: if color[nb] == GRAY: # Back edge return False if color[nb] == WHITE and not dfs(nb): return False color[node] = BLACK return True for i in range(numCourses): if color[i] == WHITE and not dfs(i): return False return True ``` Time O(V + E), space O(V) for color array plus recursion stack. **Senior follow-up 1 — Return one valid course ordering, not just yes/no.** This is LC 210. Use DFS POSTORDER: when a node finishes (all descendants processed), append it to a result list. After all DFS calls, REVERSE the result. This is the DFS-based topological sort. Or use Kahn's algorithm — the order in which nodes are dequeued IS a valid topological order, no reverse needed. **Senior follow-up 2 — What if there are 10 million courses and 100 million prerequisites?** The graph itself is fine — adjacency list is O(V + E) = \~110M entries. The recursion stack is the worry: a worst-case chain of 10M courses would stack-overflow in any language. Convert to iterative DFS with an explicit stack, or use Kahn's BFS — naturally iterative, predictable memory. **Senior follow-up 3 — Find ALL valid orderings (lexicographically sorted).** Hard problem. Use Kahn's, but at each step process the smallest available node when ties exist (use a min-heap instead of a queue). This finds the lexicographically smallest single ordering in O((V + E) log V). To enumerate ALL orderings, you have to backtrack on every choice point — exponential in the worst case. The number of topological orderings can be huge. **Common Wrong Answers:** 1. *Using a simple `visited` set.* False positives on cross edges in directed graphs. 2. *Forgetting to loop over all unvisited nodes.* If the graph is disconnected and the first DFS finishes without finding a cycle, you must still check the other components. 3. *Building the adjacency list with the wrong direction.* `prerequisites[i] = [course, prereq]` means there is an edge `prereq -> course`, not the other way around. Reversing this silently produces wrong results. **Further Reading:** LC 207, LC 210 (Course Schedule II), LC 269 (Alien Dictionary), LC 802 (Find Eventual Safe States). **Strong Answer Framework:** 1. Recognize: "cells from which water reaches BOTH oceans" — naive is O((m\*n)^2), too slow. 2. Invert: instead of asking each cell "can you reach the ocean?", let each ocean ask "what cells can reach me?" 3. DFS from every Pacific-border cell flowing UPHILL (or flat). Track in `pacific` set. 4. Same from Atlantic-border. Track in `atlantic` set. 5. Answer is the intersection. 6. Time O(m\*n), space O(m\*n). **Real LeetCode Example — LC 417 Pacific Atlantic Water Flow:** ```python theme={null} def pacificAtlantic(heights): if not heights: return [] rows, cols = len(heights), len(heights[0]) pacific, atlantic = set(), set() def dfs(r, c, visited, prev_height): if (r, c) in visited or r < 0 or r >= rows or c < 0 or c >= cols: return if heights[r][c] < prev_height: # Cannot flow uphill return visited.add((r, c)) for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: dfs(r+dr, c+dc, visited, heights[r][c]) for r in range(rows): dfs(r, 0, pacific, heights[r][0]) dfs(r, cols-1, atlantic, heights[r][cols-1]) for c in range(cols): dfs(0, c, pacific, heights[0][c]) dfs(rows-1, c, atlantic, heights[rows-1][c]) return [[r, c] for r, c in pacific & atlantic] ``` **Senior follow-up 1 — How is this different from BFS multi-source?** BFS from all Pacific cells simultaneously also works and gives O(m\*n). The DFS version is conceptually identical — both are "frontier expansion from a set of sources." The choice is style: DFS recursion is shorter; BFS is safer for huge grids in Python (no recursion limit). For competitive programming I would use BFS for predictability. **Senior follow-up 2 — Generalize to N oceans / N source sets.** Run N DFS sweeps, one per source set. Track an N-bit mask per cell — bit i set if cell can reach source i. Cells with all bits set are the answer. For N up to 32, use a single int as the bitmask. For larger N, use a Python `set` of source-IDs per cell. **Senior follow-up 3 — What if the grid has different terrain types (water, land, mountain) and water can only flow through certain types?** Add a passability check inside DFS: skip neighbors whose terrain forbids water flow. The structure is unchanged; only the boundary condition is more nuanced. This is also how "shortest bridge" (LC 934) and "minimum knight moves with obstacles" generalize. **Common Wrong Answers:** 1. *Naive O((m\*n)^2) — DFS from every cell to check if it reaches both oceans.* TLE on a 200x200 grid. 2. *Forgetting the `>= prev_height` check.* Water can flow downhill or stay flat; we are tracing UPHILL or flat, so use `<`. 3. *Returning `pacific - atlantic` or `pacific | atlantic` instead of `pacific & atlantic`.* The problem asks for both. **Further Reading:** LC 417, LC 934 (Shortest Bridge), LC 1020 (Number of Enclaves). ## Practice Roadmap * Solve: Max Depth, Path Sum, Same Tree * Focus: Basic recursive tree traversal * Solve: Number of Islands, Flood Fill * Focus: 4-directional traversal pattern * Solve: Clone Graph, Course Schedule * Focus: Visited tracking and cycle detection * Solve: Word Search, All Paths from Source * Focus: Backtracking in DFS **Interview Tip**: When exploring all possibilities or finding paths, think DFS. When finding shortest path, think BFS. ## Interview Questions **What interviewers are really testing:** Whether you understand the difference between back edges and cross edges in directed graphs -- a subtle but critical distinction that separates candidates who have memorized DFS from those who truly understand graph theory. **Strong Answer:** * In a directed graph, a simple visited boolean only tells you "I have seen this node before." It cannot distinguish between two very different situations: (a) encountering a node that is an **ancestor** in the current DFS path (a back edge, which means a cycle), and (b) encountering a node that was **fully processed** in a previous, separate DFS branch (a cross edge, which does NOT mean a cycle). * The three-color scheme solves this. WHITE means unvisited. GRAY means "currently in the recursion stack -- I started exploring this node but haven't finished." BLACK means "fully explored -- all descendants processed." * A cycle exists if and only if you encounter a GRAY node. This means you have found a path from a node back to itself (a back edge). Encountering a BLACK node is harmless -- it just means you found an edge to an already-completed subtree. * Concrete example: consider nodes A -> B -> C and A -> C. When DFS processes A -> B -> C, it colors C black. Then when it processes A -> C, C is black (cross edge, no cycle). But with a simple `visited` set, C would be marked as "visited" after the first path, and encountering it again from A would falsely look like a cycle -- except it is not, because C is not on the current path from A. * For **undirected** graphs, a simple visited set suffices because there are no "cross edges" in the directed sense. You just need to skip the parent node to avoid falsely detecting the edge you came from as a cycle. **Red flag answer:** "Just use a visited set and if you see a visited node, there's a cycle" -- this is correct for undirected graphs but wrong for directed graphs, and confusing the two is a fundamental error. **Follow-ups:** 1. How would you modify this algorithm to not only detect a cycle but also return the actual nodes in the cycle? 2. What is a topological sort, and how does cycle detection relate to it? **What interviewers are really testing:** Production awareness and coding maturity -- whether you think about side effects, input mutation, and space optimization trade-offs, not just correctness. **Strong Answer:** * **In-place modification** (changing `'1'` to `'0'`): zero extra space for visited tracking, which is optimal. But it mutates the input data. In an interview, you must ask the interviewer whether modifying the input is allowed. In production code, mutating input data is usually a bad practice -- the caller may need the original grid afterward, or other threads may be reading it concurrently. * **Separate visited matrix**: uses O(m \* n) extra space but preserves the original input. This is the safer, more "production-ready" approach. You can also use a hash set of `(row, col)` tuples, but a 2D boolean array is faster due to O(1) array access without hashing overhead. * **Hybrid approach**: if you need to preserve the input but want to avoid extra space, you can modify the grid during DFS (e.g., change `'1'` to `'2'`) and then restore it afterward with a second pass -- O(m \* n) time for restoration but O(1) extra space. * Practical consideration: for a 300x300 grid in Python, a separate visited matrix is \~90,000 booleans (\~90 KB) -- negligible. The space savings of in-place modification only matter for truly massive grids where you are close to the memory limit. * Interview tip: state your choice explicitly. Say "I'll modify the grid in-place to save space, but in production I'd use a visited array to avoid side effects. Is that acceptable here?" **Red flag answer:** Modifying the input without acknowledging it, or not knowing that `list.pop(0)` in Python is O(n) when asked about iterative alternatives. **Follow-ups:** 1. What happens to recursive DFS on a 500x500 grid that is entirely land? How would you handle it? 2. If the grid cells could have more than two states (e.g., land, water, mountain), how would you adapt the visited tracking? **What interviewers are really testing:** Whether you understand the practical limitations of recursion (stack overflow) and can implement DFS both ways -- and whether you know the subtle behavioral difference between the two. **Strong Answer:** * **Choose iterative DFS when**: the maximum depth could exceed the call stack limit. Python's default recursion limit is \~1000 (configurable but risky to raise too high). Java's default stack is \~512 KB to 1 MB. A grid of 1000x1000 can have DFS depth of 1,000,000 -- guaranteed stack overflow with recursion. * **Choose recursive DFS when**: the depth is bounded (e.g., balanced binary tree with depth log(n)), code clarity is prioritized, and the problem naturally maps to recursion (like tree traversals or backtracking). * **Key implementation difference**: recursive DFS processes nodes in a strict preorder when you add work before the recursive calls. Iterative DFS with a stack processes neighbors in **reverse order** unless you push them in reverse. This is because the stack is LIFO -- if you push neighbors left-to-right, you pop right-to-left. To match recursive DFS order, push neighbors in reverse. * **Backtracking difference**: recursive DFS gets automatic state restoration when a function returns (local variables are on the call stack frame). With iterative DFS, you must manually manage state -- pushing and popping from your own data structures. This makes iterative backtracking significantly more complex to implement correctly. * **Performance**: iterative DFS avoids function call overhead (frame allocation, register saving). In practice the difference is marginal for most problems, but in tight competitive programming scenarios it can matter. **Red flag answer:** "They are exactly the same, just syntax differences" -- this misses the stack depth issue and the backtracking complexity difference, both of which are critical in practice. **Follow-ups:** 1. Can you convert a postorder traversal to iterative form? Why is it harder than converting preorder? 2. What is the space complexity of iterative DFS versus recursive DFS? Are they the same? **What interviewers are really testing:** Whether you understand the semantic meaning of each traversal order and can connect them to real algorithmic applications -- not just recite the definition. **Strong Answer:** * **Preorder (root, left, right)**: you process a node BEFORE its children. Use case: **serializing a binary tree**. When you serialize in preorder and record null markers, you can reconstruct the tree uniquely by reading the serialized sequence left to right. This is also the order used in tree copying (clone the node, then recursively clone children). * **Inorder (left, root, right)**: you process a node BETWEEN its children. Use case: **extracting sorted elements from a BST**. Because a BST maintains the invariant `left < root < right`, inorder traversal visits nodes in ascending sorted order. This is the foundation of BST iterator implementations and BST validation (check that inorder sequence is strictly increasing). * **Postorder (left, right, root)**: you process a node AFTER its children. Use case: **calculating subtree-dependent properties** like directory sizes in a file system. You cannot know the total size of a directory until you have summed up all its files and subdirectories. Similarly, computing tree height, deleting a tree (free children before parent), or evaluating a postfix expression all require postorder. * A practical way to remember: preorder is "do work on arrival," inorder is "do work between children," postorder is "do work on departure." The choice depends on whether you need parent information before processing children or child results before processing the parent. * Important nuance: for non-binary trees (e.g., n-ary trees), inorder is not well-defined (there is no single "between" when you have 5 children). Pre- and postorder generalize naturally. **Red flag answer:** Giving the definitions without any real use cases, or confusing which order visits BST nodes in sorted order. **Follow-ups:** 1. If you are given the preorder and inorder traversals of a binary tree, can you reconstruct the tree? What about preorder and postorder? 2. In Morris traversal, how do you achieve inorder traversal in O(1) space without recursion or an explicit stack? **What interviewers are really testing:** Whether you can map a real-world problem to cycle detection in a directed graph and implement it cleanly -- this is one of the most common medium-level interview questions. **Strong Answer:** * This is a **cycle detection** problem on a directed graph. If there is a cycle in the prerequisite graph, it means a set of courses circularly depend on each other, making it impossible to complete any of them. If there is no cycle, a valid ordering exists (topological sort). * DFS approach: use the three-color algorithm. Build an adjacency list from the prerequisite pairs. For each unvisited course, run DFS. If you ever encounter a GRAY node (back edge), return false -- a cycle exists. If all DFS calls complete without finding a cycle, return true. * BFS approach (Kahn's algorithm): compute the in-degree of every node. Add all nodes with in-degree 0 to the queue. Process each node by removing it and decrementing the in-degree of its neighbors. If a neighbor's in-degree drops to 0, add it to the queue. At the end, if all nodes were processed, there is no cycle. If some nodes remain unprocessed (still have non-zero in-degree), a cycle exists. * Trade-off: DFS with coloring is more natural if you are already comfortable with DFS and need just cycle detection. Kahn's algorithm is better when you also need the topological ordering (the order in which nodes are dequeued IS a valid topological order). * Edge cases: disconnected graph (must check all components), self-loops (a course requiring itself), and duplicate edges in the input. **Red flag answer:** Trying to solve this with BFS shortest path, or not recognizing that this is a cycle detection problem at all. Another red flag is using a simple visited set instead of three-color marking for the DFS approach. **Follow-ups:** 1. How would you modify your solution to return one valid course ordering (not just whether it is possible)? 2. If there are multiple valid orderings, how would you find the lexicographically smallest one? **What interviewers are really testing:** Precision in complexity analysis -- many candidates say "O(n)" for everything without understanding why trees and graphs differ. **Strong Answer:** * **Tree DFS**: Time is O(n) where n is the number of nodes. Space is O(h) where h is the height -- this is the maximum depth of the recursion stack. For a balanced binary tree, h = O(log n). For a skewed tree (essentially a linked list), h = O(n). There is no "visited" set needed because trees have no cycles -- you naturally never revisit a node. * **Graph DFS**: Time is O(V + E) where V is vertices and E is edges. You visit each vertex once and examine each edge once (in directed graphs) or twice (in undirected graphs, once from each endpoint). Space is O(V) for the visited set plus O(V) for the recursion stack in the worst case, so O(V) total. * Why the difference: a tree with n nodes has exactly n-1 edges, so O(V + E) = O(n + n-1) = O(n). But a general graph can have up to O(V^2) edges, so E dominates. You also need the visited set for graphs to prevent infinite loops -- trees do not require this because they are acyclic by definition. * Practical implication: on a dense graph with 10,000 nodes and 50 million edges, the O(E) term dominates. On a tree with 10,000 nodes, you have exactly 9,999 edges. The same algorithm behaves very differently depending on the structure. * The space analysis is critical for interview scoring. Many candidates forget that the recursion stack can be O(V) deep for graphs (imagine a long chain) and incorrectly state O(1) extra space. **Red flag answer:** Saying "DFS is always O(n)" without distinguishing between trees and general graphs, or forgetting to account for the recursion stack in space complexity. **Follow-ups:** 1. What is the space complexity of iterative DFS with an explicit stack? Is it different from recursive DFS? 2. For a graph with 1 million nodes and 100 million edges, roughly how much memory would the visited set consume? **What interviewers are really testing:** Judgment and pattern-matching ability -- knowing when NOT to use a tool is as important as knowing how to use it. **Strong Answer:** * **Shortest path in unweighted graphs**: DFS may find A path but not the SHORTEST path. It dives deep into one branch and might reach the target via a long route. BFS guarantees shortest path because it explores all nodes at distance d before distance d+1. Example: in a maze, DFS might find a path of length 50 while BFS finds the optimal path of length 10. * **Finding the shortest weighted path**: neither DFS nor BFS works correctly. You need Dijkstra's algorithm (non-negative weights) or Bellman-Ford (with negative weights). * **Level-order processing**: if you need to process nodes level by level (e.g., find the average value at each depth of a tree), DFS requires extra bookkeeping to track which level each node is on. BFS handles this naturally with its queue structure. * **Minimum spanning tree**: DFS traversal of a graph does not produce a minimum spanning tree. You need Prim's or Kruskal's algorithm. * **When graph is too deep for recursion**: a linear graph of 100,000 nodes will cause stack overflow with recursive DFS. Use iterative DFS or BFS instead. * The meta-principle: DFS is optimal for **exhaustive exploration** (find all paths, detect all cycles, visit all components) but suboptimal for **optimization** problems (shortest, cheapest, minimum) on graphs with structure that BFS or priority-queue-based algorithms can exploit. **Red flag answer:** "DFS always works, you just need to check all paths" -- this technically finds the optimal answer but at exponential cost. A strong candidate knows when a polynomial-time algorithm (BFS, Dijkstra) exists and avoids brute force. **Follow-ups:** 1. Can you think of an optimization problem where DFS IS the right choice? (Hint: think about problems where you must explore all possibilities regardless.) 2. What about A\* search -- when is it better than both BFS and DFS? **What interviewers are really testing:** System design thinking applied to algorithm selection -- can you connect textbook algorithms to real engineering decisions with real constraints like politeness, memory, and failure handling? **Strong Answer:** * **BFS is generally preferred** for web crawling because it explores pages in order of link distance from the seed URL. This means you discover the most important, highly-linked pages first (pages linked directly from the homepage are typically more important than pages buried 10 links deep). This is also called "breadth-first crawling" and is what search engines like Googlebot historically used as a baseline. * **DFS is problematic** because it would follow one chain of links as deep as possible. The internet has effectively infinite depth (think of paginated results, dynamic content, spider traps). DFS would get stuck in one obscure corner of a site while high-value pages remain undiscovered. * **Practical considerations** beyond the algorithm choice: * **Politeness**: you must rate-limit requests to each domain. A pure BFS queue might hammer one domain with 1000 requests before moving to the next. Solution: use per-domain queues and round-robin across them. * **Memory**: the frontier (queue of URLs to visit) can grow to billions of URLs. You need to store it on disk or use a distributed queue (e.g., Kafka, Redis). * **Deduplication**: the visited set for billions of URLs cannot fit in memory. Use a Bloom filter or a distributed hash table. * **Failure handling**: if a URL times out or returns an error, you need retry logic with backoff, not infinite retries that block the queue. * **Priority**: in practice, neither pure BFS nor DFS is used. Crawlers use priority-based scheduling where URL importance (PageRank estimate, freshness, domain authority) determines what to crawl next. **Red flag answer:** Picking DFS without considering the infinite-depth trap, or picking BFS without discussing the practical constraints (memory, politeness, scale). **Follow-ups:** 1. How would you design the URL frontier (the queue) for a crawler that needs to handle 1 billion URLs? 2. How does a Bloom filter help with visited-URL tracking, and what happens when it gives a false positive? ## Interview Deep-Dive **Strong Answer:** * **Choose DFS when** the problem requires exhaustive path exploration, cycle detection in directed graphs, topological sorting, or finding all connected components. DFS naturally fits problems that say "find all paths," "detect if a cycle exists," or "explore every reachable node." * **Choose BFS when** the problem requires shortest path (unweighted), level-order processing, or nearest-target finding. If the problem says "minimum steps," "shortest distance," or "closest," BFS is almost always correct. * **The space trade-off:** DFS uses O(h) space where h is the maximum depth. BFS uses O(w) space where w is the maximum width. For balanced binary trees, DFS is O(log n) vs BFS is O(n/2). For deep, narrow graphs (like a chain), DFS is O(n) vs BFS is O(1). Choose based on the graph's shape. * **DFS advantages:** simpler to implement recursively (especially for tree problems), naturally supports backtracking, and uses less memory on wide graphs. * **DFS disadvantages:** does not find shortest paths, can stack-overflow on deep graphs, and the order of exploration depends on implementation details. * **Decision framework:** (1) Need shortest path? BFS. (2) Need all paths or cycle detection? DFS. (3) Need level-order? BFS. (4) Need topological sort? DFS (postorder) or BFS (Kahn's). (5) Memory constrained with wide graph? DFS. (6) Memory constrained with deep graph? BFS. **Follow-up: For the Number of Islands problem, both BFS and DFS work. Is there any practical reason to prefer one over the other?** * For typical grid sizes (up to 300x300), both work fine. DFS is slightly easier to code. BFS avoids stack overflow on very large islands -- a 500x500 all-land grid has DFS depth 250,000, which crashes Python's default recursion limit. For robustness on large inputs, BFS is safer. **Strong Answer:** * The three colors are WHITE (unvisited), GRAY (in the current recursion stack), and BLACK (fully processed). * **A cycle exists if and only if DFS encounters a GRAY node** -- a back edge from a descendant to an ancestor in the current path. * **Why a simple boolean fails:** It cannot distinguish a **back edge** (cycle) from a **cross edge** (no cycle). A boolean only says "seen before" without indicating whether the node is an ancestor on the current path or was fully processed in a different branch. * **Concrete false-positive example:** Graph: A -> B -> C and A -> C. DFS from A visits B, then C (marking both visited). When DFS returns to A and follows A -> C, C is already "visited." With a simple boolean, this looks like a cycle. But C was fully processed -- it is a cross edge (diamond shape), not a cycle. The three-color scheme correctly marks C as BLACK after the first path, so the A -> C edge is recognized as harmless. * **For undirected graphs**, a simple visited set suffices -- just skip the parent node to avoid falsely detecting the traversal edge. * **Time: O(V + E). Space: O(V).** **Follow-up: How would you modify cycle detection to also return the actual nodes forming the cycle?** * When back edge `u -> v` is detected (v is GRAY), the cycle consists of all nodes on the DFS path from v to u. Maintain a `parent` array and trace `u -> parent[u] -> ... -> v` to collect cycle nodes. This is O(cycle\_length) additional work. **Strong Answer:** * **Maximum depth:** For an M x N grid that is entirely traversable, DFS can visit every cell, giving recursion depth M \* N. For a 500x500 grid, that is 250,000 frames. * **Python:** Default recursion limit \~1000. Fix: `sys.setrecursionlimit(300000)` (risks segfault), convert to iterative DFS, or use BFS instead. * **Java:** Default stack \~512 KB to 1 MB, supporting \~5,000-15,000 frames. Fix: `-Xss4m` flag or iterative conversion. * **C++:** Stack varies by OS (1-8 MB). Fix: `ulimit -s unlimited` on Linux, or iterative conversion. * **Best practice for interviews:** Mention the risk explicitly: "I will implement this recursively for clarity, but for grids larger than \~30x30 in Python, I would switch to iterative DFS or BFS to avoid stack overflow." * **Iterative DFS conversion:** Replace the call stack with `stack = [(start_r, start_c)]`. Pop, mark visited, push unvisited neighbors. Traversal order differs slightly but correctness is preserved. **Follow-up: Converting postorder traversal to iterative form is harder than preorder. Why, and how do you do it?** * Postorder processes a node after all children are done. With an explicit stack, you lack the "return" signal. Two approaches: (1) Two-stack method -- push to first stack, pop to second while pushing children; second stack's pop order is postorder. (2) Visited-flag method -- push with visited=False, pop and re-push with visited=True plus children; process when popping visited=True. **Strong Answer:** * The number of simple paths in a DAG can be **exponential** -- a diamond lattice with n layers yields O(2^(n/2)) paths. * **Time complexity:** O(V + E + P \* L) where P is the number of paths and L is average path length. The V + E term is the traversal overhead. The P \* L term is the output size -- unavoidable because you must write each path. * **Why this matters:** If an interviewer asks "what is the complexity?" and you say "O(V + E)," you are wrong. For 20 nodes in a binary lattice, there are C(20,10) = 184,756 paths, each of length \~20. The output dominates. * **Space:** O(V) for the recursion stack and current path, plus O(P \* L) for the output. * **Backtracking is essential:** You must un-mark nodes after exploring each path so other paths can reuse them. * **For counting only:** Use DP on topological order: `count[v] = sum(count[u])` for predecessors u. This is O(V + E) with no exponential blowup. **Follow-up: If the graph has cycles (not a DAG), can you still enumerate all simple paths?** * Yes, but the visited set becomes path-specific (mark on entry, unmark on exit). The number of simple paths in a general graph can be O(n!) in the worst case. Enumeration is inherently exponential.