What is BFS?
Breadth-First Search explores all neighbors at the current depth before moving to nodes at the next depth level. It uses a queue and guarantees the shortest path in unweighted graphs.

Think of dropping a stone into a still pond. Ripples spread outward in concentric circles — each ring reaches everything at the same distance from the center before moving further out. BFS works exactly like those ripples: it visits all nodes 1 step away, then all nodes 2 steps away, and so on. This is why BFS naturally finds the shortest path in unweighted graphs — the first time it reaches a node is guaranteed to be via the shortest route.

The data structure behind the ripple is a queue (FIFO). You process the node at the front, enqueue its unvisited neighbors at the back, and repeat. Because older nodes (closer to the source) are always processed before newer ones (further away), distance ordering is preserved automatically.

Complexity: O(V + E) time (every vertex and edge visited once), O(V) space for the queue and visited set. Space can be the bottleneck — the queue can hold an entire level at once, which for a balanced binary tree means up to n/2 nodes at the widest point.

Pattern Recognition Checklist
Use BFS When
- Finding shortest path (unweighted)
- Level-order tree traversal
- Finding nearest target
- State space exploration (puzzles)
- Multi-source shortest distances
Don't Use When
- Weighted graph (use Dijkstra)
- Need to explore deep paths first
- Memory is very constrained
- Preorder/inorder/postorder needed
BFS vs DFS Quick Reference
| Aspect | BFS | DFS |
|---|---|---|
| Data Structure | Queue (FIFO) | Stack/Recursion (LIFO) |
| Shortest Path | Yes (unweighted) | No |
| Space Complexity | O(width) | O(height) |
| Use Case | Nearest, level-order | Paths, cycles, connectivity |
| Traversal Order | Level by level | Deep first |
When to Use
Shortest Path
Level-Order Traversal
Nearest Neighbor
State Space Search
Pattern Variations
1. Tree Level-Order Traversal
The most common BFS variant. The trick is snapshotting the queue size at the start of each level so you know exactly how many nodes belong to the current level before their children are added. Time: O(n) — every node visited once. Space: O(w) where w is the maximum tree width (up to n/2 for a complete binary tree).

2. Graph BFS (Shortest Path)
The canonical use of BFS. A critical detail that trips up many candidates: mark nodes as visited when enqueuing, not when dequeuing. If you wait until dequeue time, multiple copies of the same node can pile up in the queue, wasting time and memory — and in the worst case turning O(V+E) into O(V^2). Time: O(V + E). Space: O(V).

3. Grid BFS
4. Multi-Source BFS
Instead of starting from one source, you seed the queue with all sources simultaneously. The BFS wavefront expands from every source at once, so the first wave to reach any cell automatically carries the shortest distance. This is a favorite pattern in problems like “Rotting Oranges” and “01 Matrix.” Why it works: think of lighting multiple candles in a dark room at the same time. Each candle’s light spreads at the same rate. The first light to reach a point is from the nearest candle. Time: O(m * n). Space: O(m * n) worst case.

5. Word Ladder (State Space BFS)
Classic Problems
1. Binary Tree Level Order Traversal
2. Rotting Oranges
3. Word Ladder
4. 01 Matrix
Common Mistakes
Interview Problems by Company
| Problem | Company | Key Concept |
|---|---|---|
| Level Order Traversal | All FAANG | Tree BFS |
| Rotting Oranges | Amazon, Meta | Multi-source BFS |
| Number of Islands | All FAANG | Grid BFS/DFS |
| Open the Lock | | State space BFS |
| Shortest Path in Matrix | Meta | Grid BFS |
Interview Tips
How to Explain Your Approach
- “I need shortest path in unweighted graph, so I’ll use BFS.”
- “I’ll use a queue starting from the source.”
- “For each level, I process all current nodes before moving deeper.”
- “I mark nodes visited when enqueuing to avoid duplicates.”
- “Time is O(V + E), space is O(V) for the queue.”
BFS Template
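A minimal sketch of the template the surrounding sections describe (the adjacency-dict shape and function name are illustrative, not from the original page):

```python
from collections import deque

def bfs_shortest_dist(graph, source):
    """Shortest hop-distance from source to every reachable node.

    graph: dict mapping node -> iterable of neighbors.
    """
    dist = {source: 0}               # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:      # mark visited at ENQUEUE time
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_shortest_dist(g, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

Using `dist` as both the distance map and the visited set keeps the template short; a separate `visited` set works identically.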
Multi-Source BFS
- Add ALL sources to initial queue
- Process level by level
- All sources expand simultaneously
- First to reach a cell = shortest distance
Worked LeetCode Problems
The five problems below cover roughly 80 percent of BFS interview territory. Solve them in this order: 102 to lock down the level-size snapshot, 200 to nail grid BFS, 994 for multi-source, 542 for the “distance to nearest” pattern, then 127 for state-space BFS.

LC 102 — Binary Tree Level Order Traversal (Medium)
The canonical tree BFS problem. The trick that trips candidates: capturing `len(queue)` BEFORE the inner loop, not during. If you read the size inside the loop, it grows as you enqueue children and you bleed across levels.
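A sketch of the level-size snapshot in code (the `Node` class here is a minimal stand-in for LeetCode's `TreeNode`):

```python
from collections import deque

def level_order(root):
    """Return node values grouped by depth (LC 102 pattern)."""
    if not root:
        return []
    result, queue = [], deque([root])
    while queue:
        level_size = len(queue)      # snapshot BEFORE the inner loop
        level = []
        for _ in range(level_size):  # exactly the current level's nodes
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level)
    return result

class Node:  # stand-in for LeetCode's TreeNode
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

tree = Node(3, Node(9), Node(20, Node(15), Node(7)))
print(level_order(tree))  # [[3], [9, 20], [15, 7]]
```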
LC 200 — Number of Islands (Medium, BFS variant)
Iterate the grid; whenever you find a `'1'`, increment the island count and BFS-flood the entire island, marking each cell visited. Why BFS instead of DFS here: on a 300x300 all-land grid Python’s recursion limit blows up with DFS; iterative BFS handles any grid size.
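A sketch of the flood-fill approach, assuming the LeetCode input shape (a grid of `'1'`/`'0'` characters); it marks visited by mutating the grid in place:

```python
from collections import deque

def num_islands(grid):
    """Count islands by BFS-flooding each unvisited land cell (LC 200).
    Flips '1' -> '0' in place instead of keeping a visited set."""
    if not grid:
        return 0
    rows, cols, count = len(grid), len(grid[0]), 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                grid[r][c] = '0'          # mark at enqueue time
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = cr + dr, cc + dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'
                            queue.append((nr, nc))
    return count

g = [list("11000"), list("11000"), list("00100"), list("00011")]
print(num_islands(g))  # 3
```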
LC 994 — Rotting Oranges (Medium, multi-source BFS)
The textbook multi-source BFS. Seed the queue with every rotten orange before the main loop starts; track minutes via the `level_size` trick.
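A sketch of the seeding-plus-level-counting approach, assuming the LeetCode input shape (grid of ints: 0 empty, 1 fresh, 2 rotten; mutates the grid):

```python
from collections import deque

def oranges_rotting(grid):
    """Minutes until no fresh orange remains, or -1 (LC 994)."""
    rows, cols = len(grid), len(grid[0])
    queue, fresh = deque(), 0
    for r in range(rows):                 # seed ALL sources up front
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))
            elif grid[r][c] == 1:
                fresh += 1
    minutes = 0
    while queue and fresh:
        for _ in range(len(queue)):       # one level == one minute
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2      # mark rotten at enqueue time
                    fresh -= 1
                    queue.append((nr, nc))
        minutes += 1
    return minutes if fresh == 0 else -1

print(oranges_rotting([[2, 1, 1], [1, 1, 0], [0, 1, 1]]))  # 4
```

Note the `while queue and fresh` condition: it also handles the no-fresh-oranges edge case by returning 0 without entering the loop.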
LC 542 — 01 Matrix (Medium, multi-source BFS)
“For each cell, distance to the nearest 0.” Naive: BFS from every 1. Smart: invert the problem — multi-source BFS from every 0. The first wave to hit a 1 gives its shortest distance.

LC 127 — Word Ladder (Hard, state-space BFS)
Each word is a state; two words are adjacent if they differ in one character. The performance trap: do NOT compare every pair of words upfront (O(N^2 * L)). Generate neighbors by iterating each character position and trying all 26 replacements (O(26 * L) per word).

Caveats and Traps
Curated LeetCode List
| Difficulty | Problem | LC # | Why It Matters |
|---|---|---|---|
| Easy | Binary Tree Level Order Traversal | 102 | Locks in the level-size snapshot pattern |
| Easy | Average of Levels in Binary Tree | 637 | Same template, different aggregation |
| Easy | N-ary Tree Level Order | 429 | BFS on non-binary children list |
| Easy | Cousins in Binary Tree | 993 | Track parent and depth during BFS |
| Easy | Maximum Depth of Binary Tree | 104 | BFS counts levels, DFS counts recursion depth |
| Medium | Number of Islands | 200 | Grid BFS / DFS prototype |
| Medium | Rotting Oranges | 994 | Multi-source BFS prototype |
| Medium | 01 Matrix | 542 | Multi-source distance-to-nearest |
| Medium | Walls and Gates | 286 | Same as 01 Matrix, gates are sources |
| Medium | Open the Lock | 752 | State-space BFS, dead-ends as constraints |
| Medium | Shortest Path in Binary Matrix | 1091 | 8-directional grid BFS |
| Medium | Keys and Rooms | 841 | Connectivity via BFS or DFS |
| Medium | Pacific Atlantic Water Flow | 417 | Multi-source from both oceans |
| Medium | Bus Routes | 815 | Bipartite BFS — bus is a node, stop is a node |
| Hard | Word Ladder | 127 | State-space BFS, on-the-fly neighbor gen |
| Hard | Word Ladder II | 126 | BFS to find distance, then DFS to enumerate paths |
| Hard | Sliding Puzzle | 773 | State-space BFS, encode board as string |
| Hard | Shortest Path Visiting All Nodes | 847 | BFS with bitmask state |
| Hard | Minimum Number of Days to Disconnect Island | 1568 | BFS plus articulation points |
BFS Interview Q&A
Shortest path in a maze with obstacles — full walkthrough
- Recognize: unweighted shortest path on a grid -> standard BFS.
- Define state: `(row, col)` plus the running distance.
- Seed the queue with the start cell; mark visited at enqueue.
- Pop, return immediately when you reach the target.
- For each of 4 (or 8) directions, bounds-check, obstacle-check, visited-check, then enqueue.
Example: the path runs from `(0,0)` to `(n-1, n-1)`; its length = number of cells visited.
Common mistakes:
- Using DFS. “I’ll DFS and track the minimum.” Correct in principle but exponential in practice. DFS explores all paths; on a 30x30 grid that is more states than atoms in your laptop.
- Marking visited at dequeue time. Same node enters the queue from each neighbor; queue balloons; TLE.
- Forgetting the `n == 1` edge case. `(0,0)` is both start and end. The answer is 1, not 0 and not -1.
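The steps above can be sketched as follows, in the LC 1091 style (0 = open, 1 = obstacle, 8-directional moves, path length counted in cells; the function name is illustrative):

```python
from collections import deque

def shortest_path_binary_matrix(grid):
    """Length of shortest clear path from (0,0) to (n-1,n-1), in cells;
    -1 if blocked. Mutates the grid to mark visited cells."""
    n = len(grid)
    if grid[0][0] or grid[n - 1][n - 1]:
        return -1
    queue = deque([(0, 0, 1)])            # (row, col, path length so far)
    grid[0][0] = 1                        # reuse grid as visited marker
    while queue:
        r, c, dist = queue.popleft()
        if r == n - 1 and c == n - 1:     # handles n == 1: answer is 1
            return dist
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] == 0:
                    grid[nr][nc] = 1      # mark visited at enqueue
                    queue.append((nr, nc, dist + 1))
    return -1

print(shortest_path_binary_matrix([[0, 1], [1, 0]]))  # 2
print(shortest_path_binary_matrix([[0]]))             # 1
```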
Word Ladder explained — why state-space BFS, and how to avoid TLE
- Recognize: shortest transformation sequence -> shortest path -> BFS.
- Each word is a state. Two states are adjacent if they differ by exactly one character.
- The graph is implicit — never build it upfront. Generate neighbors on the fly.
- Use a `set` of valid words for O(1) membership checks.
- Mark visited (or remove from word set) at enqueue; return when you pop the target.
Example: with `beginWord = "hit"`, `endWord = "cog"`, `wordList = ["hot","dot","dog","lot","log","cog"]`, the shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”, length 5.
Bidirectional optimization: maintain `begin_set` and `end_set`. At each step expand the smaller one. When a generated word lives in the other set, you have found the meeting point.
Pattern-map optimization: replace each character position with `*` (so “hot” maps to “*ot”, “h*t”, “ho*”). Two words share a pattern -> they differ in that position. Group words by patterns. Now neighbor lookup is O(L) instead of O(26 * L). Memory cost: O(N * L) for the pattern map. Beats the on-the-fly approach when you run many ladder queries on a fixed dictionary.
Common mistakes:
- Building an explicit graph by comparing every word pair. O(N^2 * L) preprocessing — TLE on dictionaries of 5000+ words.
- Using DFS with backtracking. Finds A path but not the SHORTEST. DFS Word Ladder is famously broken.
- Forgetting to check `endWord in wordList`. If the end word is not in the dictionary, the answer is 0, not 1.
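A sketch of the approach above (on-the-fly neighbor generation, visited-by-removal from the word set):

```python
from collections import deque
from string import ascii_lowercase

def ladder_length(begin_word, end_word, word_list):
    """Length of the shortest transformation sequence, or 0 (LC 127)."""
    words = set(word_list)                  # O(1) membership checks
    if end_word not in words:
        return 0
    queue = deque([(begin_word, 1)])
    words.discard(begin_word)
    while queue:
        word, length = queue.popleft()
        if word == end_word:
            return length
        for i in range(len(word)):          # generate neighbors on the fly
            for ch in ascii_lowercase:      # 26 * L candidates, not N * L
                nxt = word[:i] + ch + word[i + 1:]
                if nxt in words:
                    words.remove(nxt)       # removal == mark visited at enqueue
                    queue.append((nxt, length + 1))
    return 0

print(ladder_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
```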
BFS vs DFS — when to choose which
- Default decision rule: if the problem says “shortest” or “minimum steps” -> BFS. If it says “all paths,” “cycle,” or “topological order” -> DFS.
- Memory shape matters: BFS uses O(width), DFS uses O(height). Wide tree -> DFS wins memory; deep tree -> BFS wins.
- Recursion limit: deep graphs can overflow the call stack with recursive DFS. BFS is iterative by default.
- Implementation cost: tree problems map naturally to recursion (DFS feels easier). Grid shortest-path problems map naturally to BFS.
- “BFS is always faster.” No — BFS is faster for shortest paths in unweighted graphs. For “all paths,” “cycle detection,” or “topological order,” DFS is at least as fast and usually simpler.
- “DFS is faster on memory.” True for narrow-deep graphs; false for wide-shallow ones. The honest answer is “it depends on the shape.”
- “They’re interchangeable.” They are not. BFS gives shortest path; DFS does not. DFS detects cycles in directed graphs cleanly with three-coloring; BFS needs Kahn’s algorithm (indegree-based topological sort) instead.
Practice Roadmap
Day 2: Grid BFS
- Solve: Number of Islands, Rotting Oranges
- Focus: 4-directional movement, multi-source
Day 3: State Space BFS
- Solve: Open the Lock, Minimum Genetic Mutation
- Focus: State as node, transformation as edge
Interview Questions
Q1: Why does BFS guarantee the shortest path in unweighted graphs but DFS does not?
- BFS explores nodes in order of increasing distance from the source. It processes all nodes at distance `d` before any node at distance `d+1`. This is a direct consequence of the FIFO queue: nodes discovered earlier (closer to the source) are always dequeued and processed before nodes discovered later (farther away).
- The first time BFS reaches any node, it has arrived via a path of minimum length. No shorter path can exist because all shorter paths would have been explored in a previous level.
- DFS, by contrast, commits to one branch and may reach a node via a long, winding path before ever exploring a direct shortcut. It has no distance ordering — the stack (LIFO) processes the most recently discovered node, not the closest one.
- Concrete example: in a social network graph, BFS from person A finds all 1st-degree connections, then all 2nd-degree, etc. DFS from A might tunnel through a chain of 50 people before finding someone who was actually a direct friend-of-friend.
- The caveat: this guarantee holds only for unweighted graphs (or graphs where all edges have equal weight). For weighted graphs, BFS breaks down — you need Dijkstra’s algorithm, which is essentially BFS with a priority queue replacing the plain queue, so it processes by cumulative weight rather than hop count.
- What would happen if you ran BFS on a graph with edge weights of 1 and 2? Would it still give shortest paths? Why or why not?
- Can you modify BFS to handle a graph where edges have weights of only 0 and 1? (Hint: 0-1 BFS with a deque.)
Q2: Explain the difference between marking a node visited at enqueue time versus dequeue time in BFS. Why does it matter?
- When you mark visited at enqueue time, you prevent the same node from being added to the queue multiple times. Each node enters the queue exactly once, so total work is O(V + E).
- When you mark visited at dequeue time, multiple neighbors can independently enqueue the same node before any of those copies gets dequeued. In a dense graph, this can cause the queue to balloon. Worst case, the same node appears O(V) times in the queue, turning the algorithm into O(V^2) time and space.
- Correctness is preserved either way for shortest-path queries in unweighted graphs — the first dequeue of a node still carries the correct shortest distance. But performance degrades significantly with dequeue-time marking.
- Practical example: on a 1000x1000 grid (1 million cells), dequeue-time marking can cause the queue to hold millions of duplicate entries. This is the single most common cause of Time Limit Exceeded on BFS grid problems on LeetCode.
- The fix is one line: move `visited.add(neighbor)` from after `queue.popleft()` to immediately before or after `queue.append(neighbor)`.
- Is there any scenario where you would intentionally mark visited at dequeue time rather than enqueue time?
- In the multi-source BFS pattern (e.g., Rotting Oranges), how do you ensure all sources are properly marked visited before the main loop begins?
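The two marking strategies side by side, on a small illustrative graph. Both return the same traversal here, but the dequeue-time version can enqueue the same node many times:

```python
from collections import deque

def bfs_enqueue_marking(graph, source):
    """Correct: each node enters the queue at most once -> O(V + E)."""
    visited = {source}
    queue, order = deque([source]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)        # marked at ENQUEUE time
                queue.append(nxt)
    return order

def bfs_dequeue_marking(graph, source):
    """Same output, but duplicates pile up in the queue on dense graphs."""
    visited = set()
    queue, order = deque([source]), []
    while queue:
        node = queue.popleft()
        if node in visited:             # marked at DEQUEUE time
            continue
        visited.add(node)
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                queue.append(nxt)       # same node may be appended many times
    return order

# A 4-clique: every node sees every other, so dequeue marking enqueues duplicates.
g = {n: [m for m in range(4) if m != n] for n in range(4)}
print(bfs_enqueue_marking(g, 0) == bfs_dequeue_marking(g, 0))  # True
```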
Q3: Walk me through how you would solve the 'Rotting Oranges' problem. What pattern does it use and why?
- This is a multi-source BFS problem. You start by seeding the queue with all rotten oranges simultaneously, then expand outward one time step at a time. Each fresh orange that gets reached is “infected” and joins the queue for the next round.
- Why multi-source rather than running BFS from each rotten orange separately: with k rotten oranges on an m*n grid, separate BFS would be O(k * m * n). Multi-source BFS is O(m * n) because the wavefronts merge — once a cell is visited by any wavefront, it is never revisited.
- Implementation steps: (1) Scan the grid, enqueue all rotten orange positions, and count fresh oranges. (2) BFS level-by-level, where each level represents one minute. For each rotten orange dequeued, infect its 4-directional fresh neighbors, mark them rotten, decrement the fresh count, and enqueue them. (3) After BFS finishes, if fresh count is 0, return the number of levels processed (minutes). Otherwise return -1.
- Edge cases: grid with no fresh oranges (return 0), grid with fresh oranges unreachable from any rotten one (return -1), grid with no rotten oranges but some fresh (return -1).
- The “time” dimension is tracked by processing level-by-level using the `level_size = len(queue)` snapshot, exactly like tree level-order traversal.
- How would you modify this solution if different rotten oranges rotted at different speeds (e.g., some take 1 minute, others take 2)?
- What if the grid were 3D (a cube of oranges)? What changes in your approach?
Q4: When would you use BFS on a state space (like Word Ladder or Open the Lock) rather than on an explicit graph? How do you model the states and transitions?
- State-space BFS treats each configuration of the problem as a node and each valid transformation as an edge. You do not build the graph upfront; instead, you generate neighbors on the fly.
- For Word Ladder: each word is a state, and two words are connected (edge) if they differ by exactly one character. BFS finds the shortest transformation sequence. To generate neighbors efficiently, you iterate over each character position and try all 26 replacements, checking membership in the word set — this is O(26 * L) per word where L is word length, which is much faster than comparing against every word in the dictionary (O(N * L)).
- For Open the Lock: each 4-digit combination is a state (10,000 total states). Each digit can be turned up or down, giving 8 neighbors per state. BFS from “0000” finds the minimum turns to reach the target, skipping deadend states.
- The key insight is that BFS still guarantees shortest path because all edges have equal weight (one transformation = one step). The “graph” is implicit but logically identical to an explicit adjacency list.
- Performance consideration: the state space can be enormous. Word Ladder with 5-letter words has a state space of up to 26^5 = ~12 million, but the word set typically constrains this to a few thousand. For problems where the state space is truly huge, consider bidirectional BFS (search from both start and end, meeting in the middle) to reduce the explored space from O(b^d) to O(b^(d/2)).
- How does bidirectional BFS work, and when is it a clear win over standard BFS?
- In Word Ladder, why do we remove words from the word set as we visit them, rather than using a separate visited set? What is the trade-off?
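A sketch of the Open the Lock state-space BFS described above (each state has 8 neighbors: 4 wheels, each turned up or down):

```python
from collections import deque

def open_lock(deadends, target):
    """Minimum wheel turns from '0000' to target, -1 if impossible (LC 752)."""
    dead = set(deadends)
    if "0000" in dead:
        return -1
    visited = {"0000"}
    queue = deque([("0000", 0)])
    while queue:
        state, turns = queue.popleft()
        if state == target:
            return turns
        for i in range(4):                  # each wheel...
            digit = int(state[i])
            for move in (1, -1):            # ...turned up or down (wraps 9<->0)
                nxt = state[:i] + str((digit + move) % 10) + state[i + 1:]
                if nxt not in dead and nxt not in visited:
                    visited.add(nxt)        # mark at enqueue time
                    queue.append((nxt, turns + 1))
    return -1

print(open_lock(["0201", "0101", "0102", "1212", "2002"], "0202"))  # 6
```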
Q5: Compare BFS and Dijkstra's algorithm. When does BFS break down and Dijkstra become necessary?
- BFS assumes all edges have equal weight (or weight 1). Under this assumption, the FIFO queue naturally processes nodes in order of increasing distance. The first time you reach a node is via the shortest path.
- Dijkstra generalizes BFS to non-negative weighted graphs by replacing the FIFO queue with a min-heap (priority queue). Instead of processing by hop count, it processes by cumulative edge weight, always expanding the node with the smallest tentative distance.
- BFS breaks down with unequal weights because a node 2 hops away via heavy edges could be farther than a node 5 hops away via light edges. BFS would visit the 2-hop node first and potentially record a wrong shortest distance.
- Concrete example: routing network packets where links have different latencies. BFS would say the 2-hop route is shorter, but if those 2 hops have 500ms latency each (1000ms total) and the 5-hop route has 50ms per hop (250ms total), BFS gives the wrong answer.
- Complexity comparison: BFS is O(V + E). Dijkstra with a binary heap is O((V + E) log V). For unweighted graphs, BFS is strictly faster — never use Dijkstra when BFS suffices.
- Special case — 0-1 BFS: if edge weights are only 0 or 1, you can use a deque instead of a priority queue. Push weight-0 edges to the front, weight-1 edges to the back. This achieves O(V + E), same as BFS.
- Can Dijkstra handle negative edge weights? If not, what algorithm would you use?
- What is 0-1 BFS, and can you think of a real problem where it applies?
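A sketch of 0-1 BFS with a deque, assuming a `(neighbor, weight)` adjacency-list shape (the graph and names are illustrative):

```python
from collections import deque

def zero_one_bfs(graph, source):
    """Shortest distances when every edge weight is 0 or 1.

    graph: dict node -> list of (neighbor, weight), weight in {0, 1}.
    """
    INF = float("inf")
    dist = {node: INF for node in graph}
    dist[source] = 0
    dq = deque([source])
    while dq:
        node = dq.popleft()
        for nxt, w in graph[node]:
            if dist[node] + w < dist[nxt]:
                dist[nxt] = dist[node] + w
                if w == 0:
                    dq.appendleft(nxt)   # weight-0 edge: front of deque
                else:
                    dq.append(nxt)       # weight-1 edge: back of deque
    return dist

g = {"a": [("b", 0), ("c", 1)], "b": [("c", 0)], "c": []}
print(zero_one_bfs(g, "a"))  # {'a': 0, 'b': 0, 'c': 0}
```

The deque preserves the BFS invariant that distances in the queue are non-decreasing, so no priority queue (and no log factor) is needed.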
Q6: You are given an unweighted graph and need to find shortest paths from every node to every other node. How would you approach this?
- For an unweighted graph, the most straightforward approach is to run BFS from every node. Each BFS is O(V + E), so total time is O(V * (V + E)). For sparse graphs where E is O(V), this is O(V^2). For dense graphs where E is O(V^2), this is O(V^3).
- The alternative is Floyd-Warshall, which computes all-pairs shortest paths in O(V^3) time and O(V^2) space, regardless of graph density. But Floyd-Warshall is designed for weighted graphs — for unweighted graphs it does unnecessary work compared to BFS on sparse graphs.
- Decision rule: if the graph is sparse (E much less than V^2), V runs of BFS is faster. If the graph is dense, both approaches are O(V^3) and Floyd-Warshall may be simpler to implement.
- Practical consideration: BFS from all nodes is trivially parallelizable — each source BFS is independent. Floyd-Warshall’s dynamic programming has sequential dependencies between iterations, making it harder to parallelize.
- Space: BFS approach needs O(V) per run (reusable), or O(V^2) if you store all results. Floyd-Warshall always needs O(V^2).
- What if you only need shortest paths between a specific subset of S nodes, not all V nodes? How does that change your approach?
- If the graph is unweighted and undirected, can you exploit any symmetry to reduce work?
Q7: How would you implement BFS on a graph that does not fit in memory? Describe the high-level approach.
- When the graph does not fit in memory, you need external-memory BFS (sometimes called out-of-core BFS). The core idea is to process the BFS level by level, reading and writing the frontier to disk at each step.
- Approach: store the graph on disk (as an edge list or adjacency list in files). Maintain two files — current frontier and next frontier. At each level, stream through the current frontier, look up each node’s neighbors from the disk-based graph, and write unvisited neighbors to the next frontier file. Then swap the files and repeat.
- The visited set is the biggest challenge. If V is too large for a hash set in memory, you can use a Bloom filter (probabilistic, may miss some nodes), a bitmap if node IDs are dense integers, or an external sort-based approach where you periodically sort and deduplicate the visited list on disk.
- Systems used in practice: graph processing frameworks like Apache Giraph, GraphX (Spark), or Google’s Pregel model essentially do distributed BFS using this level-synchronous approach, where each “superstep” processes one BFS level across a cluster.
- Real-world example: computing degrees of separation in a social graph with billions of nodes. Facebook’s social graph has ~3 billion nodes and hundreds of billions of edges — single-machine BFS is impossible, so they use distributed graph systems.
- What trade-offs does a Bloom filter introduce when used as the visited set? How would you handle false positives?
- How does the Pregel/BSP model relate to level-synchronous BFS?
Q8: You run BFS on a large grid and it passes all test cases but gets Time Limit Exceeded. Walk me through how you would debug and optimize.
- Step 1: Check visited marking. The most common cause of TLE in BFS is marking visited at dequeue time instead of enqueue time. This causes duplicate entries in the queue, potentially multiplying work by a factor of the average degree. Fix: move the visited check to enqueue time.
- Step 2: Check data structure choice. In Python, using `list.pop(0)` instead of `collections.deque.popleft()` turns every dequeue from O(1) into O(n). On a 1000x1000 grid with ~1 million cells, this alone can cause a 500x slowdown. Fix: always use `deque`.
- Step 4: Check visited set implementation. For grids, a 2D boolean array is faster than a hash set of tuples. `visited[r][c] = True` is O(1) with no hashing, versus the overhead of hashing a tuple for every cell.
- Step 5: Consider bidirectional BFS if the problem has a single start and single end. This can reduce explored nodes from O(b^d) to O(b^(d/2)), which for branching factor 4 and depth 20 is the difference between 4^20 (~1 trillion) and 2 * 4^10 (~2 million).
- Step 6: Profile if the above do not fix it. Measure queue size at peak, total nodes processed, and time per operation.
- When would A* search be preferable to plain BFS on a grid? What heuristic would you use?
- How would you estimate whether your BFS solution will pass within the time limit before submitting?
Interview Deep-Dive
When would you choose BFS over DFS, Dijkstra, or A*? What are the precise conditions where BFS is optimal?
- BFS is optimal when you need shortest path in an unweighted graph (or a graph where all edges have equal weight). The FIFO queue ensures nodes are processed in order of increasing hop count, so the first time you reach any node is via the shortest path.
- BFS over DFS: Use BFS when path length matters. DFS finds a path but not necessarily the shortest. DFS is better for exhaustive exploration (all paths, cycle detection, topological sort) where you do not care about distance.
- BFS over Dijkstra: Dijkstra generalizes BFS to weighted graphs using a priority queue, adding an O(log V) factor per operation. On unweighted graphs, Dijkstra’s priority queue degenerates to a plain queue — so BFS gives O(V + E) versus Dijkstra’s O((V + E) log V). Never use Dijkstra when BFS suffices.
- BFS over A*: A* uses a heuristic to guide search toward the goal, potentially visiting far fewer nodes than BFS. But A* requires a valid heuristic (admissible and consistent), adds heap overhead, and is harder to implement. Use BFS when the graph is small enough that exploring all nodes at each distance is acceptable, or when no good heuristic exists.
- The 0-1 BFS special case: When edge weights are 0 or 1, use a deque instead of a priority queue. Weight-0 edges push to the front, weight-1 edges push to the back. This achieves O(V + E) — same as plain BFS — while handling two weight classes correctly.
- Time: O(V + E). Space: O(V) for the queue and visited set. Space can be the bottleneck — the queue can hold an entire level at once.
- Follow-up (edge weights of 1 and 2): plain BFS fails because it processes by hop count, not cumulative cost. A corridor of cost 2 should “count double.”
- Option 1: Dijkstra with a binary heap — O((V + E) log V). Correct but adds logarithmic overhead.
- Option 2: “Virtual node expansion” — split every cost-2 edge into two cost-1 edges by inserting a virtual intermediate node. Now standard BFS works on the expanded graph. The graph grows by at most E nodes, so complexity is still O(V + E). This trick works when the number of distinct weights is small.
- Option 3: If weights are only 1 and 2, use a 2-bucket BFS (a generalization of 0-1 BFS with two queues). Process all distance-d nodes before distance-(d+1) nodes, using bucket assignment by edge weight.
What is the time and space complexity of BFS on a grid of size M x N? Walk me through why space can be the real bottleneck, not time.
- Time: O(M * N). Each cell is enqueued and dequeued at most once (assuming visited-at-enqueue). Processing each cell involves checking 4 neighbors — constant work per cell.
- Space: O(M * N) in the worst case for the visited array. The queue itself holds at most one full “level” of the BFS wavefront. For a grid, the maximum wavefront width depends on the shape: for a square grid of side S, the wavefront can hold up to O(S) = O(sqrt(MN)) cells. But in pathological cases (a thin corridor), the wavefront is O(1) — so queue space ranges from O(1) to O(MN).
- Why space is often the real bottleneck: For a 10,000 x 10,000 grid (100 million cells), the visited array alone requires 100 MB as a boolean matrix. The queue at peak can hold up to 40,000 cells (the diagonal wavefront), each stored as a tuple of (row, col, distance) — roughly 1 MB. The visited array dominates.
- Optimization 1 — In-place visited marking. If you can modify the grid (change ‘0’ to ‘2’), you eliminate the visited array entirely, reducing space from O(M*N) to O(queue_size).
- Optimization 2 — Bitset for visited. Use a bitarray instead of a boolean array. Each cell uses 1 bit instead of 1 byte, reducing visited storage by 8x. For a 10,000 x 10,000 grid, that is ~12 MB instead of ~100 MB.
- Optimization 3 — Bidirectional BFS. For single-source-to-single-target problems, BFS from both ends halves the explored radius, reducing visited cells from O(r^2) to O(2 * (r/2)^2) = O(r^2 / 2). For grids with long shortest paths, this cuts space and time roughly in half.
- Follow-up (a grid with 10^12 cells): the grid cannot be stored in RAM. You need an external-memory BFS or implicit graph approach.
- If the grid is sparse (mostly walls with few open cells), represent it as an adjacency list of only the open cells. BFS operates on this compact representation.
- If the grid is dense, process it tile by tile. Load one tile into memory, compute the BFS wavefront within it, write the frontier to disk, load the next tile, and continue. This is level-synchronous external-memory BFS.
- In practice, grids this large arise in GIS applications (satellite imagery, terrain analysis). Tools like GRASS GIS use disk-based raster processing with similar tile-based approaches.
Explain multi-source BFS. Why is it O(M * N) total instead of O(K * M * N) where K is the number of sources?
- Multi-source BFS seeds the queue with all source nodes simultaneously, then expands the wavefront as if all sources started at time 0. The key property: the first wavefront to reach any cell carries the shortest distance from the nearest source.
- Why O(M * N) and not O(K * M * N): Every cell is visited at most once, regardless of how many sources exist. When a cell is reached by one wavefront, it is marked visited and never processed again by any other wavefront. The wavefronts merge — they do not duplicate work.
- Analogy: Imagine lighting K candles in a dark room simultaneously. Each candle’s light spreads at the same speed. The lights merge where they meet, and no point in the room is “re-illuminated.” Total work is proportional to the room size, not the number of candles.
- Contrast with running K separate BFS runs: Each independent BFS is O(M * N), and running K of them is O(K * M * N). On a 1000x1000 grid with 10,000 sources, that is 10^10 operations (TLE) versus 10^6 for multi-source BFS.
- Classic problems that use this: Rotting Oranges (all rotten oranges are sources), 01 Matrix (all 0-cells are sources), Walls and Gates (all gates are sources).
- Implementation detail: Before the BFS loop, iterate through the entire grid and enqueue every source. Mark all sources as visited. Then run standard BFS — the queue already contains the full “level 0” wavefront.
- Track minutes by processing the queue level-by-level using the `level_size = len(queue)` snapshot technique. Each level corresponds to one minute. Increment the minute counter after processing each level.
- The answer is -1 when at least one fresh orange is unreachable from any rotten orange — it sits in a region cut off from every source. After BFS completes, check if any fresh oranges remain. If yes, return -1.
- Edge case: if there are no fresh oranges initially, the answer is 0 (not -1). If there are no rotten oranges but some fresh ones exist, the answer is -1.
What is bidirectional BFS? When does it provide a meaningful speedup over standard BFS, and when does it not help?
- Bidirectional BFS runs two BFS instances simultaneously: one from the source and one from the target. When their frontiers meet (share a common node), the shortest path is the sum of the distances from each side to the meeting point.
- Speedup analysis: Standard BFS explores all nodes within distance d, visiting O(b^d) nodes where b is the branching factor. Bidirectional BFS explores O(b^(d/2)) from each side, totaling O(2 * b^(d/2)). For b = 4 and d = 20: standard = 4^20 = 10^12, bidirectional = 2 * 4^10 = 2 * 10^6. That is a 500,000x speedup.
- When it helps most: Large, sparse graphs with a single source and single target where the shortest path is long (large d). Social network degree-of-separation queries are the classic use case.
- When it does NOT help:
- Multi-target problems (e.g., nearest hospital from a patient) — you cannot run BFS “from the target” if there are many targets. Use multi-source BFS from all targets instead.
- Very short paths (d is small) — the constant factor of maintaining two queues and checking for frontier intersection outweighs the savings.
- Directed graphs where reverse edges are expensive or unknown — backward BFS requires traversing edges in reverse, which may not be available in the graph representation.
- State-space BFS with complex states — tracking which states belong to which frontier and detecting intersection becomes error-prone.
- Implementation detail: At each step, expand the smaller frontier (fewer nodes) to minimize total work. Check intersection by looking up nodes from one frontier’s visited set in the other’s.
- Maintain two sets: `begin_set` (words reachable from `beginWord`) and `end_set` (words reachable from `endWord`). At each step, expand the smaller set. For each word in the smaller set, generate all single-character transformations and check if any appear in the other set (meeting point found) or in the word dictionary (add to the next level).
- The key trick: swap `begin_set` and `end_set` so you always expand the smaller one. This keeps the frontier sizes balanced and prevents one side from exploding.
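The bidirectional scheme above, sketched for Word Ladder (frontier swap so the smaller side always expands; the function name is illustrative):

```python
from string import ascii_lowercase

def ladder_length_bidirectional(begin_word, end_word, word_list):
    """Word Ladder (LC 127) with bidirectional BFS."""
    words = set(word_list)
    if end_word not in words:
        return 0
    begin_set, end_set = {begin_word}, {end_word}
    length = 1                                  # words in the chain so far
    while begin_set and end_set:
        if len(begin_set) > len(end_set):       # always expand the smaller frontier
            begin_set, end_set = end_set, begin_set
        next_level = set()
        for word in begin_set:
            for i in range(len(word)):
                for ch in ascii_lowercase:
                    nxt = word[:i] + ch + word[i + 1:]
                    if nxt in end_set:          # frontiers meet
                        return length + 1
                    if nxt in words:
                        words.remove(nxt)       # removal acts as the visited set
                        next_level.add(nxt)
        begin_set = next_level
        length += 1
    return 0

print(ladder_length_bidirectional("hit", "cog",
                                  ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
```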