What is DFS?
Depth-First Search explores as far as possible along each branch before backtracking. It uses a stack (explicit or via recursion) to track the path and is ideal for problems requiring complete path exploration.

Imagine you are exploring a cave system. Instead of checking every tunnel at the entrance first (that would be BFS), you pick one tunnel and walk as deep as you can. When you hit a dead end, you backtrack to the last fork and try the next tunnel. That is DFS — you go deep before you go wide.

Under the hood, DFS is powered by a stack. Recursive DFS uses the call stack implicitly; iterative DFS uses an explicit stack data structure. Both produce the same traversal order. The practical difference is that recursion is cleaner for tree problems but can cause stack overflow on very deep graphs (Python’s default recursion limit is around 1000), while iterative DFS handles any depth safely.

Complexity: O(V + E) time for graphs (every vertex and edge visited once), O(n) for trees. Space is O(h) where h is the maximum depth — this is O(log n) for balanced trees but O(n) for skewed trees or linear graphs.

Pattern Recognition Checklist
Use DFS When
- Need to explore all paths
- Finding connected components
- Detecting cycles in graphs
- Tree traversals (pre/in/post order)
- Backtracking problems
Don't Use When
- Finding shortest path (use BFS)
- Level-order traversal needed
- Path length matters more than existence
- Graph is very deep (stack overflow risk)
When to Use
Tree Traversals
Path Finding
Connected Components
Backtracking
Pattern Variations
1. Tree DFS (Recursive)
The three traversal orders — preorder, inorder, postorder — differ only in when you process the current node relative to its children. The recursive structure is identical; only the placement of “do work here” changes.

- Preorder (root, left, right): Use when you need to process a node before its subtree (e.g., serialization, copying a tree).
- Inorder (left, root, right): Use for BSTs — it visits nodes in sorted order.
- Postorder (left, right, root): Use when you need child results before processing the parent (e.g., computing subtree sizes, deleting a tree).
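A minimal sketch of the three orders; the TreeNode class and function names here are illustrative, not from any particular library:

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def preorder(node, out):
    if node is None:
        return
    out.append(node.val)          # work on arrival
    preorder(node.left, out)
    preorder(node.right, out)

def inorder(node, out):
    if node is None:
        return
    inorder(node.left, out)
    out.append(node.val)          # work between children (sorted order for a BST)
    inorder(node.right, out)

def postorder(node, out):
    if node is None:
        return
    postorder(node.left, out)
    postorder(node.right, out)
    out.append(node.val)          # work on departure (child results available)

# A tiny BST:   2
#              / \
#             1   3
root = TreeNode(2, TreeNode(1), TreeNode(3))
```

On this BST, inorder yields the sorted sequence 1, 2, 3, while preorder and postorder yield 2, 1, 3 and 1, 3, 2 respectively.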
2. Graph DFS with Visited Set
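A minimal sketch of this pattern, assuming the graph is an adjacency-list dict (names are illustrative):

```python
def dfs(graph, start):
    """Return the set of nodes reachable from start.

    graph: dict mapping node -> list of neighbors.
    """
    visited = set()

    def explore(node):
        visited.add(node)
        for neighbor in graph.get(node, []):
            # The visited check is what prevents infinite loops on cycles.
            if neighbor not in visited:
                explore(neighbor)

    explore(start)
    return visited

# A cyclic graph: A -> B -> C -> A, plus C -> D.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
```

Without the visited set, the A -> B -> C -> A cycle would recurse forever.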
3. Grid DFS (Island Problems)
The “Number of Islands” pattern: iterate through every cell, and when you find unvisited land (‘1’), increment the island count and use DFS to “sink” the entire connected island by marking all reachable land cells as visited. This avoids counting the same island multiple times.

Visited tracking: Here we modify the grid in place (changing ‘1’ to ‘0’) as the visited marker. If mutating the input is not allowed, use a separate visited matrix.
Time: O(m * n) — each cell visited at most once. Space: O(m * n) worst case for the recursion stack on a grid that is entirely land.
Stack overflow warning: For large grids (e.g., 300x300), the recursive call depth can hit 90,000. In Python, you need sys.setrecursionlimit() or should convert to iterative DFS with an explicit stack.
4. DFS with Path Tracking
When you need all paths (not just reachability), DFS becomes a backtracking problem. You build the path as you recurse, and you un-visit nodes when you backtrack so they can appear in other paths.

Critical difference from standard DFS: In standard graph DFS, once you mark a node visited, it stays visited forever. In path-tracking DFS, you remove the node from the visited set (or the path) after exploring its subtree. Without this, paths that share intermediate nodes get incorrectly pruned.

Walked example: graph with edges A->B, A->C, B->C, B->D, C->D. Paths from A to D: [A,B,D], [A,B,C,D], [A,C,D]. Standard DFS (permanent visited) would find only [A,B,D] and miss [A,C,D] because C was already visited via B.

Time: O(V! * V) in the worst case (exponentially many paths). Space: O(V) for the recursion stack and path.

5. Iterative DFS with Stack
The iterative version replaces the call stack with an explicit stack data structure. This avoids stack overflow for deep graphs (Python default recursion limit is ~1000) and gives you fine-grained control over traversal.

Subtle difference from recursive DFS: A stack is LIFO, so neighbors pushed left-to-right are popped right-to-left. To visit neighbors in the same order as recursive DFS, push them in reverse order.

When to use iterative over recursive: When graph depth could exceed the call stack limit (e.g., a 1000x1000 grid has max depth 1,000,000), when you need explicit control over the traversal order, or when performance-critical code needs to avoid function-call overhead.

Time: O(V + E). Space: O(V) for the stack and visited set.

Classic Problems
1. Number of Islands
2. Clone Graph
3. Course Schedule
4. Word Search
Cycle Detection
The three-color algorithm (white/gray/black) is the standard approach for directed graphs. The colors represent:

- WHITE (0): Not yet discovered.
- GRAY (1): Currently in the recursion stack — we are exploring its descendants.
- BLACK (2): Fully processed — all descendants have been explored.
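A sketch of the three-color detection on an adjacency-list dict (function name illustrative):

```python
WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    """Detect a directed cycle. graph: dict node -> list of neighbors."""
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY                      # on the current recursion path
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:              # back edge to an ancestor: cycle
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK                     # fully processed; safe to revisit
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)
```

The diamond A->B->C plus A->C is acyclic (the second arrival at C finds it BLACK); adding C->A makes it cyclic (DFS reaches a GRAY A).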
Common Mistakes
Debugging Checklist
When your DFS solution fails:

DFS vs BFS Quick Reference
| Aspect | DFS | BFS |
|---|---|---|
| Data Structure | Stack / Recursion | Queue |
| Path Finding | All paths | Shortest path |
| Space | O(h) height | O(w) width |
| Cycle Detection | Yes (with coloring) | Yes (undirected) |
| Topological Sort | Yes | Yes (Kahn’s) |
| Best For | Deep exploration | Level-by-level |
Interview Problems by Company
| Problem | Company | Key Concept |
|---|---|---|
| Max Depth of Tree | All | Basic tree DFS |
| Same Tree | Amazon | Parallel DFS |
| Path Sum | Meta | Root-to-leaf DFS |
| Invert Binary Tree | | Tree modification |
| Symmetric Tree | Amazon | Mirror DFS |
Interview Tips
How to Explain Your Approach
- “I’ll use DFS to explore all possibilities.”
- “I’ll maintain a visited set to avoid cycles.”
- “At each node, I’ll [do work] before/after visiting children.”
- “Base case is when [condition] is met.”
- “Time is O(V+E) for graphs, O(n) for trees.”
When Interviewer Says...
| Interviewer Says | You Should Think |
|---|---|
| “Find all paths” | DFS with path tracking |
| “Check if path exists” | Simple DFS |
| “Find connected components” | DFS on each unvisited |
| “Detect cycle” | DFS with coloring |
| “Topological order” | DFS postorder reversal |
Recursive vs Iterative

Use recursive when:
- Tree problems (max depth is log n or limited)
- Code clarity matters
- Natural recursive structure

Use iterative when:
- Graph can be very deep (avoid stack overflow)
- Need explicit control over stack
- Performance is critical
Worked LeetCode Problems
These five cover the DFS interview spectrum: grid flood-fill (200), area aggregation (695), directed cycle detection (207), boundary-aware flood (130), and multi-source from edges (417). If you can solve all five fluently, you have ~75 percent of DFS interview territory covered.

LC 200 — Number of Islands (Medium, DFS variant)
The DFS prototype. Iterate every cell; whenever you find unvisited land, increment the count and DFS-flood the entire island. The flood marks every reachable land cell as visited so the outer loop never double-counts. In Python, call sys.setrecursionlimit(...) defensively, or just convert to iterative DFS with an explicit stack.
LC 695 — Max Area of Island (Medium)
Same shape as 200, except the DFS returns the area of the connected component instead of just visiting it. The pattern — DFS returning an aggregated value — generalizes to “longest path in a tree,” “sum of all subtree values,” and many tree DP problems.

LC 207 — Course Schedule (Medium, directed cycle detection)
Build the prerequisite graph; the question reduces to “does this graph contain a directed cycle?” Use the three-color DFS. WHITE = unvisited, GRAY = on current recursion stack, BLACK = fully processed. Hitting GRAY = back edge = cycle. A plain visited set fails on directed graphs: a “visited” node could be either an ancestor (cycle!) or a fully-processed sibling (cross edge, harmless). The three colors disambiguate.
LC 130 — Surrounded Regions (Medium, boundary DFS)
“Capture all O regions completely surrounded by X.” The trick: instead of figuring out which regions are surrounded, find which are NOT — those connected to a border O. Run DFS from every border O, mark them safe (e.g., as '#'). Then sweep the grid: remaining O = surrounded -> X; safe '#' -> O.
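A sketch of the two-pass approach just described, using an explicit stack for the border flood (function names illustrative):

```python
def solve(board):
    """LC 130: capture 'O' regions not connected to a border 'O'. Mutates board."""
    if not board:
        return
    rows, cols = len(board), len(board[0])

    def mark_safe(r, c):
        # Flood from a border 'O', relabeling the whole region as safe '#'.
        stack = [(r, c)]
        board[r][c] = "#"
        while stack:
            cr, cc = stack.pop()
            for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] == "O":
                    board[nr][nc] = "#"
                    stack.append((nr, nc))

    # Pass 1: DFS from every border 'O'.
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == "O" and (r in (0, rows - 1) or c in (0, cols - 1)):
                mark_safe(r, c)
    # Pass 2: remaining 'O' is surrounded; '#' reverts to 'O'.
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == "O":
                board[r][c] = "X"
            elif board[r][c] == "#":
                board[r][c] = "O"
```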
LC 417 — Pacific Atlantic Water Flow (Medium)
Find all cells from which water can flow to BOTH oceans. Naive: BFS/DFS from every cell — O(m^2 * n^2). Smart: invert. DFS from each Pacific border cell, flowing UPHILL (or equal), mark every cell that can reach the Pacific. Same from Atlantic. Intersect the two sets.

Caveats and Traps
Curated LeetCode List
| Difficulty | Problem | LC # | Why It Matters |
|---|---|---|---|
| Easy | Maximum Depth of Binary Tree | 104 | Cleanest postorder example |
| Easy | Same Tree | 100 | Parallel DFS on two trees |
| Easy | Symmetric Tree | 101 | Mirror DFS, recursive structure |
| Easy | Path Sum | 112 | Root-to-leaf DFS with running sum |
| Easy | Invert Binary Tree | 226 | DFS modifying tree structure |
| Medium | Number of Islands | 200 | Grid DFS prototype |
| Medium | Max Area of Island | 695 | DFS returning an aggregate |
| Medium | Course Schedule | 207 | Directed cycle detection |
| Medium | Course Schedule II | 210 | Topological sort via DFS postorder |
| Medium | Surrounded Regions | 130 | Boundary DFS, complement trick |
| Medium | Pacific Atlantic Water Flow | 417 | Multi-source DFS from borders |
| Medium | Clone Graph | 133 | DFS with hashmap of clones |
| Medium | Word Search | 79 | DFS with backtracking on grid |
| Medium | Number of Closed Islands | 1254 | Boundary trick (similar to 130) |
| Medium | All Paths From Source to Target | 797 | DFS path enumeration on a DAG |
| Medium | Number of Provinces | 547 | DFS or Union-Find for components |
| Hard | Word Search II | 212 | DFS plus Trie pruning |
| Hard | Alien Dictionary | 269 | Topological sort, careful char graph build |
| Hard | Critical Connections | 1192 | Tarjan’s bridges algorithm |
| Hard | Longest Increasing Path in Matrix | 329 | DFS plus memoization (DP on DAG) |
DFS Interview Q&A
Number of Islands — both BFS and DFS
- Recognize: connected components in a grid -> flood-fill from each unvisited land cell.
- Outer loop: scan every cell. When you find '1', increment count and flood.
- Flood-fill is interchangeable: DFS (recursive or iterative) or BFS — pick based on grid size.
- Mark visited inline (sink the cell to '0') or use a separate matrix.
- Time O(m*n). Space depends on flood method: DFS recursion is O(m*n) worst case; BFS queue is O(min(m, n)).
- Forgetting the bounds check inside DFS. Out-of-bounds access crashes or silently wraps in some languages.
- Counting visited cells instead of components. Confuses “size of largest island” with “number of islands.”
- Using DFS recursion on huge grids in Python without raising the recursion limit. TLE or RecursionError.
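A BFS flood sketch of the same counting loop, using collections.deque so the pop is O(1) (list.pop(0) would be O(n)); the function name is illustrative:

```python
from collections import deque

def num_islands_bfs(grid):
    """Count '1' components with a BFS flood. Mutates grid in place."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "1":
                count += 1
                q = deque([(r, c)])
                grid[r][c] = "0"          # sink when enqueuing, not when dequeuing
                while q:
                    cr, cc = q.popleft()  # O(1) with deque
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "1":
                            grid[nr][nc] = "0"
                            q.append((nr, nc))
    return count
```

The only structural change from the DFS version is the FIFO queue, which also sidesteps any recursion-limit concern.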
Course Schedule — detect cycle in a directed graph
- Recognize: prerequisites form a directed graph. “Can we finish all courses?” = “is the graph acyclic?”
- Build adjacency list from prerequisite pairs.
- Run DFS from every unvisited node with three-color marking.
- Hitting a GRAY neighbor = back edge = cycle = return False.
- If all DFS calls finish without GRAY hits, no cycle = return True.
- Alternative: Kahn’s algorithm (BFS on in-degree-zero nodes). Both are O(V + E).
- Using a simple visited set. False positives on cross edges in directed graphs.
- Forgetting to loop over all unvisited nodes. If the graph is disconnected and the first DFS finishes without finding a cycle, you must still check the other components.
- Building the adjacency list with the wrong direction. prerequisites[i] = [course, prereq] means there is an edge prereq -> course, not the other way around. Reversing this silently produces wrong results.
Pacific Atlantic Water Flow — the inversion trick
- Recognize: “cells from which water reaches BOTH oceans” — naive is O((m*n)^2), too slow.
- Invert: instead of asking each cell “can you reach the ocean?”, let each ocean ask “what cells can reach me?”
- DFS from every Pacific-border cell flowing UPHILL (or flat). Track in a pacific set.
- Same from Atlantic-border. Track in an atlantic set (or keep a set of source-IDs per cell).
- Answer is the intersection.
- Time O(m*n), space O(m*n).
- Naive O((m*n)^2) — DFS from every cell to check if it reaches both oceans. TLE on a 200x200 grid.
- Forgetting the height >= prev_height check. Water flows downhill or flat; since we trace in reverse (uphill or flat), only move to neighbors whose height is >= the current cell's.
- Returning pacific - atlantic or pacific | atlantic instead of pacific & atlantic. The problem asks for both.
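A sketch of the inversion trick with both floods sharing one helper (function names illustrative):

```python
def pacific_atlantic(heights):
    """LC 417: return sorted (row, col) cells that can reach both oceans."""
    rows, cols = len(heights), len(heights[0])

    def flood(starts):
        # DFS uphill-or-flat from border cells; returns the reachable set.
        seen = set(starts)
        stack = list(starts)
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen
                        and heights[nr][nc] >= heights[r][c]):  # trace uphill or flat
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        return seen

    pacific = flood([(0, c) for c in range(cols)] + [(r, 0) for r in range(rows)])
    atlantic = flood([(rows - 1, c) for c in range(cols)] + [(r, cols - 1) for r in range(rows)])
    return sorted(pacific & atlantic)   # the intersection, per the answer above
```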
Practice Roadmap
Interview Questions
Q1: Explain the three-color (white/gray/black) cycle detection algorithm for directed graphs. Why can't you just use a simple visited boolean?
- In a directed graph, a simple visited boolean only tells you “I have seen this node before.” It cannot distinguish between two very different situations: (a) encountering a node that is an ancestor in the current DFS path (a back edge, which means a cycle), and (b) encountering a node that was fully processed in a previous, separate DFS branch (a cross edge, which does NOT mean a cycle).
- The three-color scheme solves this. WHITE means unvisited. GRAY means “currently in the recursion stack — I started exploring this node but haven’t finished.” BLACK means “fully explored — all descendants processed.”
- A cycle exists if and only if you encounter a GRAY node. This means you have found a path from a node back to itself (a back edge). Encountering a BLACK node is harmless — it just means you found an edge to an already-completed subtree.
- Concrete example: consider nodes A -> B -> C and A -> C. When DFS processes A -> B -> C, it colors C black. Then when it processes A -> C, C is black (cross edge, no cycle). But with a simple visited set, C would be marked as “visited” after the first path, and encountering it again from A would falsely look like a cycle — except it is not, because C is not on the current path from A.
- For undirected graphs, a simple visited set suffices because there are no “cross edges” in the directed sense. You just need to skip the parent node to avoid falsely detecting the edge you came from as a cycle.
- How would you modify this algorithm to not only detect a cycle but also return the actual nodes in the cycle?
- What is a topological sort, and how does cycle detection relate to it?
Q2: In a grid DFS problem like 'Number of Islands,' you mark cells visited by changing '1' to '0' in the grid itself. What are the trade-offs of this approach versus using a separate visited matrix?
- In-place modification (changing '1' to '0'): zero extra space for visited tracking, which is optimal. But it mutates the input data. In an interview, you must ask the interviewer whether modifying the input is allowed. In production code, mutating input data is usually a bad practice — the caller may need the original grid afterward, or other threads may be reading it concurrently.
- Separate visited matrix: uses O(m * n) extra space but preserves the original input. This is the safer, more “production-ready” approach. You can also use a hash set of (row, col) tuples, but a 2D boolean array is faster due to O(1) array access without hashing overhead.
- Hybrid approach: if you need to preserve the input but want to avoid extra space, you can modify the grid during DFS (e.g., change '1' to '2') and then restore it afterward with a second pass — O(m * n) time for restoration but O(1) extra space.
- Practical consideration: for a 300x300 grid in Python, a separate visited matrix is ~90,000 booleans (~90 KB) — negligible. The space savings of in-place modification only matter for truly massive grids where you are close to the memory limit.
- Interview tip: state your choice explicitly. Say “I’ll modify the grid in-place to save space, but in production I’d use a visited array to avoid side effects. Is that acceptable here?”
- If asked about iterative alternatives, note that list.pop(0) in Python is O(n); prefer collections.deque for the queue.

Follow-ups:

- What happens to recursive DFS on a 500x500 grid that is entirely land? How would you handle it?
- If the grid cells could have more than two states (e.g., land, water, mountain), how would you adapt the visited tracking?
Q3: When would you choose iterative DFS with an explicit stack over recursive DFS? What are the implementation differences?
- Choose iterative DFS when: the maximum depth could exceed the call stack limit. Python’s default recursion limit is ~1000 (configurable but risky to raise too high). Java’s default stack is ~512 KB to 1 MB. A grid of 1000x1000 can have DFS depth of 1,000,000 — guaranteed stack overflow with recursion.
- Choose recursive DFS when: the depth is bounded (e.g., balanced binary tree with depth log(n)), code clarity is prioritized, and the problem naturally maps to recursion (like tree traversals or backtracking).
- Key implementation difference: recursive DFS processes nodes in a strict preorder when you add work before the recursive calls. Iterative DFS with a stack processes neighbors in reverse order unless you push them in reverse. This is because the stack is LIFO — if you push neighbors left-to-right, you pop right-to-left. To match recursive DFS order, push neighbors in reverse.
- Backtracking difference: recursive DFS gets automatic state restoration when a function returns (local variables are on the call stack frame). With iterative DFS, you must manually manage state — pushing and popping from your own data structures. This makes iterative backtracking significantly more complex to implement correctly.
- Performance: iterative DFS avoids function call overhead (frame allocation, register saving). In practice the difference is marginal for most problems, but in tight competitive programming scenarios it can matter.
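A sketch of iterative DFS that reproduces recursive preorder by pushing neighbors in reverse (names illustrative, graph as an adjacency-list dict):

```python
def dfs_iterative(graph, start):
    """Return visit order matching recursive preorder DFS."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue                  # a node can be pushed more than once
        visited.add(node)
        order.append(node)
        # Reverse so the leftmost neighbor is on top and pops first,
        # matching the order a recursive implementation would visit.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

Without reversed(), this graph would be visited A, C, D, B instead of the recursive order A, B, D, C.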
- Can you convert a postorder traversal to iterative form? Why is it harder than converting preorder?
- What is the space complexity of iterative DFS versus recursive DFS? Are they the same?
Q4: Explain the difference between preorder, inorder, and postorder traversals. Give a concrete use case for each that is NOT just 'traversing a tree.'
- Preorder (root, left, right): you process a node BEFORE its children. Use case: serializing a binary tree. When you serialize in preorder and record null markers, you can reconstruct the tree uniquely by reading the serialized sequence left to right. This is also the order used in tree copying (clone the node, then recursively clone children).
- Inorder (left, root, right): you process a node BETWEEN its children. Use case: extracting sorted elements from a BST. Because a BST maintains the invariant
left < root < right, inorder traversal visits nodes in ascending sorted order. This is the foundation of BST iterator implementations and BST validation (check that inorder sequence is strictly increasing). - Postorder (left, right, root): you process a node AFTER its children. Use case: calculating subtree-dependent properties like directory sizes in a file system. You cannot know the total size of a directory until you have summed up all its files and subdirectories. Similarly, computing tree height, deleting a tree (free children before parent), or evaluating a postfix expression all require postorder.
- A practical way to remember: preorder is “do work on arrival,” inorder is “do work between children,” postorder is “do work on departure.” The choice depends on whether you need parent information before processing children or child results before processing the parent.
- Important nuance: for non-binary trees (e.g., n-ary trees), inorder is not well-defined (there is no single “between” when you have 5 children). Pre- and postorder generalize naturally.
- If you are given the preorder and inorder traversals of a binary tree, can you reconstruct the tree? What about preorder and postorder?
- In Morris traversal, how do you achieve inorder traversal in O(1) space without recursion or an explicit stack?
Q5: You are given a directed graph representing course prerequisites (Course Schedule problem). How do you determine if it is possible to finish all courses?
- This is a cycle detection problem on a directed graph. If there is a cycle in the prerequisite graph, it means a set of courses circularly depend on each other, making it impossible to complete any of them. If there is no cycle, a valid ordering exists (topological sort).
- DFS approach: use the three-color algorithm. Build an adjacency list from the prerequisite pairs. For each unvisited course, run DFS. If you ever encounter a GRAY node (back edge), return false — a cycle exists. If all DFS calls complete without finding a cycle, return true.
- BFS approach (Kahn’s algorithm): compute the in-degree of every node. Add all nodes with in-degree 0 to the queue. Process each node by removing it and decrementing the in-degree of its neighbors. If a neighbor’s in-degree drops to 0, add it to the queue. At the end, if all nodes were processed, there is no cycle. If some nodes remain unprocessed (still have non-zero in-degree), a cycle exists.
- Trade-off: DFS with coloring is more natural if you are already comfortable with DFS and need just cycle detection. Kahn’s algorithm is better when you also need the topological ordering (the order in which nodes are dequeued IS a valid topological order).
- Edge cases: disconnected graph (must check all components), self-loops (a course requiring itself), and duplicate edges in the input.
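A sketch of the Kahn's-algorithm approach described above (function name illustrative; note the prereq -> course edge direction):

```python
from collections import deque

def can_finish(num_courses, prerequisites):
    """LC 207 via Kahn's algorithm. prerequisites[i] = [course, prereq]."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)      # edge direction: prereq -> course
        indegree[course] += 1

    # Seed the queue with all courses that have no prerequisites.
    q = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while q:
        c = q.popleft()
        taken += 1                        # dequeue order is a valid topological order
        for nxt in graph[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                q.append(nxt)
    return taken == num_courses           # unprocessed nodes imply a cycle
```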
- How would you modify your solution to return one valid course ordering (not just whether it is possible)?
- If there are multiple valid orderings, how would you find the lexicographically smallest one?
Q6: What is the time and space complexity of DFS on a tree versus on a general graph? Why are they different?
- Tree DFS: Time is O(n) where n is the number of nodes. Space is O(h) where h is the height — this is the maximum depth of the recursion stack. For a balanced binary tree, h = O(log n). For a skewed tree (essentially a linked list), h = O(n). There is no “visited” set needed because trees have no cycles — you naturally never revisit a node.
- Graph DFS: Time is O(V + E) where V is vertices and E is edges. You visit each vertex once and examine each edge once (in directed graphs) or twice (in undirected graphs, once from each endpoint). Space is O(V) for the visited set plus O(V) for the recursion stack in the worst case, so O(V) total.
- Why the difference: a tree with n nodes has exactly n-1 edges, so O(V + E) = O(n + n-1) = O(n). But a general graph can have up to O(V^2) edges, so E dominates. You also need the visited set for graphs to prevent infinite loops — trees do not require this because they are acyclic by definition.
- Practical implication: on a dense graph with 10,000 nodes and 50 million edges, the O(E) term dominates. On a tree with 10,000 nodes, you have exactly 9,999 edges. The same algorithm behaves very differently depending on the structure.
- The space analysis is critical for interview scoring. Many candidates forget that the recursion stack can be O(V) deep for graphs (imagine a long chain) and incorrectly state O(1) extra space.
- What is the space complexity of iterative DFS with an explicit stack? Is it different from recursive DFS?
- For a graph with 1 million nodes and 100 million edges, roughly how much memory would the visited set consume?
Q7: Describe a situation where DFS would give incorrect or suboptimal results, and what you would use instead.
- Shortest path in unweighted graphs: DFS may find A path but not the SHORTEST path. It dives deep into one branch and might reach the target via a long route. BFS guarantees shortest path because it explores all nodes at distance d before distance d+1. Example: in a maze, DFS might find a path of length 50 while BFS finds the optimal path of length 10.
- Finding the shortest weighted path: neither DFS nor BFS works correctly. You need Dijkstra’s algorithm (non-negative weights) or Bellman-Ford (with negative weights).
- Level-order processing: if you need to process nodes level by level (e.g., find the average value at each depth of a tree), DFS requires extra bookkeeping to track which level each node is on. BFS handles this naturally with its queue structure.
- Minimum spanning tree: DFS traversal of a graph does not produce a minimum spanning tree. You need Prim’s or Kruskal’s algorithm.
- When graph is too deep for recursion: a linear graph of 100,000 nodes will cause stack overflow with recursive DFS. Use iterative DFS or BFS instead.
- The meta-principle: DFS is optimal for exhaustive exploration (find all paths, detect all cycles, visit all components) but suboptimal for optimization problems (shortest, cheapest, minimum) on graphs with structure that BFS or priority-queue-based algorithms can exploit.
- Can you think of an optimization problem where DFS IS the right choice? (Hint: think about problems where you must explore all possibilities regardless.)
- What about A* search — when is it better than both BFS and DFS?
Q8: You are implementing a web crawler. Would you use DFS or BFS? Why? What practical considerations arise?
- BFS is generally preferred for web crawling because it explores pages in order of link distance from the seed URL. This means you discover the most important, highly-linked pages first (pages linked directly from the homepage are typically more important than pages buried 10 links deep). This is also called “breadth-first crawling” and is what search engines like Googlebot historically used as a baseline.
- DFS is problematic because it would follow one chain of links as deep as possible. The internet has effectively infinite depth (think of paginated results, dynamic content, spider traps). DFS would get stuck in one obscure corner of a site while high-value pages remain undiscovered.
- Practical considerations beyond the algorithm choice:
- Politeness: you must rate-limit requests to each domain. A pure BFS queue might hammer one domain with 1000 requests before moving to the next. Solution: use per-domain queues and round-robin across them.
- Memory: the frontier (queue of URLs to visit) can grow to billions of URLs. You need to store it on disk or use a distributed queue (e.g., Kafka, Redis).
- Deduplication: the visited set for billions of URLs cannot fit in memory. Use a Bloom filter or a distributed hash table.
- Failure handling: if a URL times out or returns an error, you need retry logic with backoff, not infinite retries that block the queue.
- Priority: in practice, neither pure BFS nor DFS is used. Crawlers use priority-based scheduling where URL importance (PageRank estimate, freshness, domain authority) determines what to crawl next.
- How would you design the URL frontier (the queue) for a crawler that needs to handle 1 billion URLs?
- How does a Bloom filter help with visited-URL tracking, and what happens when it gives a false positive?
Interview Deep-Dive
When would you choose DFS over BFS for a graph problem, and what specific problem properties make that decision clear?
- Choose DFS when the problem requires exhaustive path exploration, cycle detection in directed graphs, topological sorting, or finding all connected components. DFS naturally fits problems that say “find all paths,” “detect if a cycle exists,” or “explore every reachable node.”
- Choose BFS when the problem requires shortest path (unweighted), level-order processing, or nearest-target finding. If the problem says “minimum steps,” “shortest distance,” or “closest,” BFS is almost always correct.
- The space trade-off: DFS uses O(h) space where h is the maximum depth. BFS uses O(w) space where w is the maximum width. For balanced binary trees, DFS is O(log n) vs BFS is O(n/2). For deep, narrow graphs (like a chain), DFS is O(n) vs BFS is O(1). Choose based on the graph’s shape.
- DFS advantages: simpler to implement recursively (especially for tree problems), naturally supports backtracking, and uses less memory on wide graphs.
- DFS disadvantages: does not find shortest paths, can stack-overflow on deep graphs, and the order of exploration depends on implementation details.
- Decision framework: (1) Need shortest path? BFS. (2) Need all paths or cycle detection? DFS. (3) Need level-order? BFS. (4) Need topological sort? DFS (postorder) or BFS (Kahn’s). (5) Memory constrained with wide graph? DFS. (6) Memory constrained with deep graph? BFS.
- For typical grid sizes (up to 300x300), both work fine. DFS is slightly easier to code. BFS avoids stack overflow on very large islands — a 500x500 all-land grid has DFS depth 250,000, which crashes Python’s default recursion limit. For robustness on large inputs, BFS is safer.
Explain the three-color cycle detection algorithm for directed graphs. Why does a simple visited boolean fail, and can you construct a specific example where it gives a false positive?
- The three colors are WHITE (unvisited), GRAY (in the current recursion stack), and BLACK (fully processed).
- A cycle exists if and only if DFS encounters a GRAY node — a back edge from a descendant to an ancestor in the current path.
- Why a simple boolean fails: It cannot distinguish a back edge (cycle) from a cross edge (no cycle). A boolean only says “seen before” without indicating whether the node is an ancestor on the current path or was fully processed in a different branch.
- Concrete false-positive example: Graph: A -> B -> C and A -> C. DFS from A visits B, then C (marking both visited). When DFS returns to A and follows A -> C, C is already “visited.” With a simple boolean, this looks like a cycle. But C was fully processed — it is a cross edge (diamond shape), not a cycle. The three-color scheme correctly marks C as BLACK after the first path, so the A -> C edge is recognized as harmless.
- For undirected graphs, a simple visited set suffices — just skip the parent node to avoid falsely detecting the traversal edge.
- Time: O(V + E). Space: O(V).
- When back edge u -> v is detected (v is GRAY), the cycle consists of all nodes on the DFS path from v to u. Maintain a parent array and trace u -> parent[u] -> ... -> v to collect cycle nodes. This is O(cycle_length) additional work.
What is the maximum recursion depth for DFS on a grid, and how do you handle the stack overflow risk in different languages?
- Maximum depth: For an M x N grid that is entirely traversable, DFS can visit every cell, giving recursion depth M * N. For a 500x500 grid, that is 250,000 frames.
- Python: Default recursion limit ~1000. Fix: sys.setrecursionlimit(300000) (risks segfault), convert to iterative DFS, or use BFS instead.
- Java: Default stack ~512 KB to 1 MB, supporting ~5,000-15,000 frames. Fix: -Xss4m flag or iterative conversion.
- C++: Stack varies by OS (1-8 MB). Fix: ulimit -s unlimited on Linux, or iterative conversion.
- Best practice for interviews: Mention the risk explicitly: “I will implement this recursively for clarity, but for grids larger than ~30x30 in Python, I would switch to iterative DFS or BFS to avoid stack overflow.”
- Iterative DFS conversion: Replace the call stack with stack = [(start_r, start_c)]. Pop, mark visited, push unvisited neighbors. Traversal order differs slightly but correctness is preserved.
- Postorder processes a node after all children are done. With an explicit stack, you lack the “return” signal. Two approaches: (1) Two-stack method — push to first stack, pop to second while pushing children; second stack’s pop order is postorder. (2) Visited-flag method — push with visited=False, pop and re-push with visited=True plus children; process when popping visited=True.
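A sketch of the visited-flag method from the bullet above (class and function names illustrative):

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def postorder_iterative(root):
    """Visited-flag method: each node is pushed twice; process on the second pop."""
    out = []
    stack = [(root, False)]
    while stack:
        node, ready = stack.pop()
        if node is None:
            continue
        if ready:
            out.append(node.val)          # children already emitted: safe to process
        else:
            stack.append((node, True))    # re-push below the children so it pops after them
            stack.append((node.right, False))
            stack.append((node.left, False))
    return out
```

The (node, True) entry acts as the missing “return” signal: by the time it pops, both subtrees have been fully processed.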
You need to find all paths from source to target in a DAG. What is the time complexity, and why does the answer depend on the number of paths rather than just V + E?
- The number of simple paths in a DAG can be exponential — a diamond lattice with n layers yields O(2^(n/2)) paths.
- Time complexity: O(V + E + P * L) where P is the number of paths and L is average path length. The V + E term is the traversal overhead. The P * L term is the output size — unavoidable because you must write each path.
- Why this matters: If an interviewer asks “what is the complexity?” and you say “O(V + E),” you are wrong. For 20 nodes in a binary lattice, there are C(20,10) = 184,756 paths, each of length ~20. The output dominates.
- Space: O(V) for the recursion stack and current path, plus O(P * L) for the output.
- Backtracking is essential: You must un-mark nodes after exploring each path so other paths can reuse them.
- For counting only: Use DP on topological order: count[v] = sum(count[u]) for predecessors u. This is O(V + E) with no exponential blowup.
- Follow-up (general graphs with cycles): Yes, but the visited set becomes path-specific (mark on entry, unmark on exit). The number of simple paths in a general graph can be O(n!) in the worst case. Enumeration is inherently exponential.
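The enumeration-with-backtracking pattern discussed in this answer can be sketched as follows, using the diamond-shaped DAG from the path-tracking section earlier (function name illustrative):

```python
def all_paths(graph, source, target):
    """Enumerate every simple path from source to target with DFS + backtracking."""
    paths = []
    path = [source]

    def explore(node):
        if node == target:
            paths.append(path.copy())     # record a snapshot, not the mutable list
            return
        for nxt in graph.get(node, []):
            if nxt not in path:           # path doubles as a path-specific visited set
                path.append(nxt)
                explore(nxt)
                path.pop()                # un-visit on backtrack: the key difference
                                          # from standard single-visit DFS

    explore(source)
    return paths

# Edges A->B, A->C, B->C, B->D, C->D; three paths from A to D.
graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}
```

Removing the path.pop() line would permanently mark C as visited after the first path through it, losing the [A, C, D] path.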