
BFS Pattern

What is BFS?

Breadth-First Search explores all neighbors at the current depth before moving to nodes at the next depth level. It uses a queue and guarantees the shortest path in unweighted graphs. Think of dropping a stone into a still pond: ripples spread outward in concentric circles — each ring reaches everything at the same distance from the center before moving further out. BFS works exactly like those ripples: it visits all nodes 1 step away, then all nodes 2 steps away, and so on. This is why BFS naturally finds the shortest path in unweighted graphs — the first time it reaches a node is guaranteed to be via a shortest route.

The data structure behind the ripple is a queue (FIFO). You process the node at the front, enqueue its unvisited neighbors at the back, and repeat. Because older nodes (closer to the source) are always processed before newer ones (further away), distance ordering is preserved automatically.

Complexity: O(V + E) time (every vertex and edge visited once), O(V) space for the queue and visited set. Space can be the bottleneck — the queue can hold an entire level at once, which for a balanced binary tree means up to n/2 nodes at the widest point.
Quick Recognition: Shortest path in unweighted graphs, level-order traversal, finding nearest target. Keywords: “minimum steps”, “shortest path”, “level by level”.
Pattern Recognition Cheatsheet — When the LeetCode prompt says X, reach for BFS:
| Phrase in the Problem | What to Reach For |
| --- | --- |
| "shortest path in unweighted graph / grid" | Standard BFS with visited set, track distance in tuple |
| "level order" / "by level" / "level by level" | Tree BFS with the level_size = len(queue) snapshot trick |
| "minimum number of steps / moves / mutations / transformations" | State-space BFS — state is a node, transformation is an edge |
| "shortest path between two words / locks / configurations" | State-space BFS (Word Ladder, Open the Lock) |
| "is the graph bipartite?" / "two-coloring" | BFS coloring — alternate colors at each level, conflict means non-bipartite |
| "rotting oranges" / "infect all" / "spread from multiple sources simultaneously" | Multi-source BFS — seed the queue with every source before the loop starts |
| "distance from each cell to the nearest X" | Multi-source BFS — sources are the X cells, expand outward |
| "all nodes K distance from target" | Tree BFS treating tree as graph (build parent pointers first) |
| "minimum knight moves" / "shortest path on infinite chessboard" | BFS with custom move set (knight directions) |
| "shortest bridge" / "connect two islands" | DFS to find first island, then multi-source BFS expanding outward |
If the problem mentions weights that are not all equal, BFS is wrong — use Dijkstra. If weights are 0 or 1 only, use 0-1 BFS with a deque.

Pattern Recognition Checklist

Use BFS When

  • Finding shortest path (unweighted)
  • Level-order tree traversal
  • Finding nearest target
  • State space exploration (puzzles)
  • Multi-source shortest distances

Don't Use When

  • Weighted graph (use Dijkstra)
  • Need to explore deep paths first
  • Memory is very constrained
  • Preorder/inorder/postorder needed

BFS vs DFS Quick Reference

| Aspect | BFS | DFS |
| --- | --- | --- |
| Data Structure | Queue (FIFO) | Stack/Recursion (LIFO) |
| Shortest Path | Yes (unweighted) | No |
| Space Complexity | O(width) | O(height) |
| Use Case | Nearest, level-order | Paths, cycles, connectivity |
| Traversal Order | Level by level | Deep first |

When to Use

Shortest Path

Finding minimum steps in unweighted graphs

Level-Order Traversal

Processing tree/graph by levels

Nearest Neighbor

Finding closest target from source

State Space Search

Puzzle solving, game states

Pattern Variations

1. Tree Level-Order Traversal

The most common BFS variant. The trick is snapshotting the queue size at the start of each level so you know exactly how many nodes belong to the current level before their children are added. Time: O(n) — every node visited once. Space: O(w) where w is the maximum tree width (up to n/2 for a complete binary tree).
from collections import deque

def level_order(root):
    """Return nodes level by level"""
    if not root:
        return []
    
    result = []
    queue = deque([root])
    
    while queue:
        level_size = len(queue)    # Snapshot: how many nodes are in this level
        current_level = []
        
        for _ in range(level_size):       # Process exactly this many nodes
            node = queue.popleft()        # Dequeue front (FIFO order)
            current_level.append(node.val)
            
            if node.left:                 # Enqueue children for the NEXT level
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        
        result.append(current_level)      # One complete level collected
    
    return result

2. Graph BFS (Shortest Path)

The canonical use of BFS. A critical detail that trips up many candidates: mark nodes as visited when enqueuing, not when dequeuing. If you wait until dequeue time, multiple copies of the same node can pile up in the queue, wasting time and memory — and in the worst case turning O(V+E) into O(V^2). Time: O(V + E). Space: O(V).
from collections import deque

def shortest_path(graph, start, end):
    """Find shortest path in unweighted graph"""
    if start == end:
        return 0
    
    queue = deque([(start, 0)])  # (node, distance)
    visited = {start}            # Mark visited AT ENQUEUE TIME
    
    while queue:
        node, dist = queue.popleft()
        
        for neighbor in graph[node]:
            if neighbor == end:          # Found target -- return immediately
                return dist + 1
            
            if neighbor not in visited:
                visited.add(neighbor)    # Mark BEFORE pushing to queue
                queue.append((neighbor, dist + 1))
    
    return -1  # No path found -- graph is disconnected or target unreachable

3. Grid BFS

Same queue-and-visited template as graph BFS, but the graph is implicit: each cell's neighbors are its 4 adjacent cells, and the visited set stores (row, col) tuples. Time: O(m * n). Space: O(m * n).

from collections import deque

def shortest_path_grid(grid):
    """Find shortest path from top-left to bottom-right"""
    if not grid or grid[0][0] == 1:
        return -1
    
    rows, cols = len(grid), len(grid[0])
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    
    queue = deque([(0, 0, 1)])  # (row, col, distance)
    visited = {(0, 0)}
    
    while queue:
        r, c, dist = queue.popleft()
        
        if r == rows - 1 and c == cols - 1:
            return dist
        
        for dr, dc in directions:
            nr, nc = r + dr, c + dc
            
            if 0 <= nr < rows and 0 <= nc < cols:
                if grid[nr][nc] == 0 and (nr, nc) not in visited:
                    visited.add((nr, nc))
                    queue.append((nr, nc, dist + 1))
    
    return -1

4. Multi-Source BFS

Instead of starting from one source, you seed the queue with all sources simultaneously. The BFS wavefront expands from every source at once, so the first wave to reach any cell automatically carries the shortest distance. This is a favorite pattern in problems like “Rotting Oranges” and “01 Matrix.” Why it works: Think of lighting multiple candles in a dark room at the same time. Each candle’s light spreads at the same rate. The first light to reach a point is from the nearest candle. Time: O(m * n). Space: O(m * n) worst case.
from collections import deque

def walls_and_gates(rooms):
    """Fill each empty room with distance to nearest gate"""
    if not rooms:
        return
    
    INF = 2147483647
    GATE = 0
    rows, cols = len(rooms), len(rooms[0])
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    
    # Seed the queue with ALL gates -- this is what makes it multi-source
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if rooms[r][c] == GATE:
                queue.append((r, c))
    
    while queue:
        r, c = queue.popleft()
        
        for dr, dc in directions:
            nr, nc = r + dr, c + dc
            
            if 0 <= nr < rows and 0 <= nc < cols:
                if rooms[nr][nc] == INF:             # Unvisited empty room
                    rooms[nr][nc] = rooms[r][c] + 1  # Distance = parent + 1
                    queue.append((nr, nc))

5. Word Ladder (State Space BFS)

Each word is a state; two states are adjacent if they differ in exactly one character. Neighbors are generated on the fly instead of building the word graph upfront. Time: O(N * L * 26), where N is the word list size and L is the word length. Space: O(N * L).

from collections import deque

def word_ladder(begin_word, end_word, word_list):
    """Find shortest transformation sequence length"""
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    
    queue = deque([(begin_word, 1)])
    visited = {begin_word}
    
    while queue:
        word, length = queue.popleft()
        
        if word == end_word:
            return length
        
        # Try changing each character
        for i in range(len(word)):
            for c in 'abcdefghijklmnopqrstuvwxyz':
                new_word = word[:i] + c + word[i+1:]
                
                if new_word in word_set and new_word not in visited:
                    visited.add(new_word)
                    queue.append((new_word, length + 1))
    
    return 0

Classic Problems

  • Pattern: Process nodes level by level. Key: Track level size before processing.
  • Pattern: Multi-source BFS with time tracking. Key: Start from all rotten oranges simultaneously.
  • Pattern: State space BFS. Key: Each word is a state, edges are single-char changes.
  • Pattern: Multi-source BFS from target cells. Key: Start from 0s to find distance to nearest 0.

Common Mistakes

Avoid These Pitfalls:
  1. Marking visited at dequeue instead of enqueue: This is the most common BFS bug. If you check visited when you pop from the queue, the same node can be enqueued multiple times from different neighbors. Mark it as visited immediately when you add it to the queue. This can be the difference between passing and TLE on large inputs.
  2. Forgetting to track distance: Either include distance as part of the queue tuple (node, dist), or process level-by-level using the level_size = len(queue) snapshot technique. Without explicit distance tracking, you have no way to return the shortest path length.
  3. Not handling disconnected components: BFS from a single source only reaches nodes in the same connected component. If the problem requires visiting all nodes (e.g., counting components), you must loop over all unvisited nodes and start a new BFS from each.
  4. Using a list instead of a deque in Python: list.pop(0) is O(n). deque.popleft() is O(1). On grids with millions of cells, this mistake alone can cause TLE.
  5. Forgetting the initial visited check for multi-source BFS: When seeding the queue with multiple sources, make sure all sources are marked as visited before the main loop begins. Otherwise the BFS can “revisit” source nodes.
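The list-vs-deque gap in mistake 4 is easy to demonstrate. A minimal timing sketch (sizes are illustrative; absolute numbers depend on the machine):

```python
from collections import deque
import time

def drain_list(n):
    q = list(range(n))
    while q:
        q.pop(0)            # O(n): shifts every remaining element left

def drain_deque(n):
    q = deque(range(n))
    while q:
        q.popleft()         # O(1): pointer move at the left end

n = 20_000
t0 = time.perf_counter(); drain_list(n);  t1 = time.perf_counter()
t2 = time.perf_counter(); drain_deque(n); t3 = time.perf_counter()
print(f"list.pop(0):     {t1 - t0:.4f}s")   # quadratic total work
print(f"deque.popleft(): {t3 - t2:.4f}s")   # linear total work
```

The gap widens quadratically with queue size, which is exactly why it surfaces as TLE on large grids rather than on the small sample inputs.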

Interview Problems by Company

| Problem | Company | Key Concept |
| --- | --- | --- |
| Level Order Traversal | All FAANG | Tree BFS |
| Rotting Oranges | Amazon, Meta | Multi-source BFS |
| Number of Islands | All FAANG | Grid BFS/DFS |
| Open the Lock | Google | State space BFS |
| Shortest Path in Matrix | Meta | Grid BFS |

Interview Tips

Script for interviews:
  1. “I need shortest path in unweighted graph, so I’ll use BFS.”
  2. “I’ll use a queue starting from the source.”
  3. “For each level, I process all current nodes before moving deeper.”
  4. “I mark nodes visited when enqueuing to avoid duplicates.”
  5. “Time is O(V + E), space is O(V) for the queue.”
from collections import deque

def bfs(start, target, graph):
    queue = deque([(start, 0)])  # (node, distance)
    visited = {start}
    
    while queue:
        node, dist = queue.popleft()
        
        if node == target:
            return dist
        
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)  # Mark before enqueue!
                queue.append((neighbor, dist + 1))
    
    return -1  # Not found
For problems like “Rotting Oranges” or “01 Matrix”:
  1. Add ALL sources to initial queue
  2. Process level by level
  3. All sources expand simultaneously
  4. First to reach a cell = shortest distance

Worked LeetCode Problems

The five problems below cover roughly 80 percent of BFS interview territory. Solve them in this order: 102 to lock down the level-size snapshot, 200 to nail grid BFS, 994 for multi-source, 542 for the “distance to nearest” pattern, then 127 for state-space BFS.

LC 102 — Binary Tree Level Order Traversal (Medium)

The canonical tree BFS problem. The trick that trips candidates: capturing len(queue) BEFORE the inner loop, not during. If you read the size inside the loop, it grows as you enqueue children and you bleed across levels.
from collections import deque

def levelOrder(root):
    if not root:
        return []
    result, queue = [], deque([root])
    while queue:
        size = len(queue)              # Snapshot BEFORE inner loop
        level = []
        for _ in range(size):          # Process exactly this many
            node = queue.popleft()
            level.append(node.val)
            if node.left:  queue.append(node.left)
            if node.right: queue.append(node.right)
        result.append(level)
    return result
Time O(n), space O(w) where w is max width (up to n/2). The same template generalizes to “right side view” (just take last element of each level), “average of levels,” “zigzag traversal” (reverse alternate levels), and “level with maximum sum.”
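As a sketch of the first generalization mentioned above, the "right side view" variant keeps only the last value of each level; everything else is the same template (the minimal TreeNode class here is illustrative — any node with val/left/right fields works):

```python
from collections import deque

class TreeNode:
    """Minimal node for illustration: val / left / right fields."""
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def right_side_view(root):
    """Rightmost value of each level, via the level-size snapshot."""
    if not root:
        return []
    view, queue = [], deque([root])
    while queue:
        size = len(queue)              # snapshot BEFORE the inner loop
        for i in range(size):
            node = queue.popleft()
            if i == size - 1:          # last node of this level = rightmost
                view.append(node.val)
            if node.left:  queue.append(node.left)
            if node.right: queue.append(node.right)
    return view
```

Swapping the `i == size - 1` line for a sum, a reversal, or a max is all it takes to cover the other variants listed above.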

LC 200 — Number of Islands (Medium, BFS variant)

Iterate the grid; whenever you find a '1', increment the island count and BFS-flood the entire island, marking each cell visited. Why BFS instead of DFS here: on a 300x300 all-land grid, recursive DFS blows past Python's recursion limit; iterative BFS handles any grid size.
from collections import deque

def numIslands(grid):
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                queue = deque([(r, c)])
                grid[r][c] = '0'                       # Mark at enqueue
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                        nr, nc = cr+dr, cc+dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'         # Mark BEFORE pushing
                            queue.append((nr, nc))
    return count
Time O(m*n), space O(min(m,n)) for the queue (max wavefront on a grid).

LC 994 — Rotting Oranges (Medium, multi-source BFS)

The textbook multi-source BFS. Seed the queue with every rotten orange before the main loop starts; track minutes via the level_size trick.
from collections import deque

def orangesRotting(grid):
    rows, cols = len(grid), len(grid[0])
    queue, fresh = deque(), 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2: queue.append((r, c))
            elif grid[r][c] == 1: fresh += 1
    if fresh == 0: return 0                            # Edge case
    minutes = 0
    while queue and fresh > 0:
        minutes += 1
        for _ in range(len(queue)):                    # One full minute
            r, c = queue.popleft()
            for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                nr, nc = r+dr, c+dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    fresh -= 1
                    queue.append((nr, nc))
    return minutes if fresh == 0 else -1
Why multi-source matters: with k rotten oranges, running BFS from each separately is O(k*m*n). Multi-source merges the wavefronts and is O(m*n) regardless of k.

LC 542 — 01 Matrix (Medium, multi-source BFS)

“For each cell, distance to the nearest 0.” Naive: BFS from every 1. Smart: invert the problem — multi-source BFS from every 0. The first wave to hit a 1 gives its shortest distance.
from collections import deque

def updateMatrix(mat):
    rows, cols = len(mat), len(mat[0])
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if mat[r][c] == 0:
                queue.append((r, c))
            else:
                mat[r][c] = -1                         # Sentinel for unvisited
    while queue:
        r, c = queue.popleft()
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < rows and 0 <= nc < cols and mat[nr][nc] == -1:
                mat[nr][nc] = mat[r][c] + 1
                queue.append((nr, nc))
    return mat
The mental flip — “instead of asking each 1 where the nearest 0 is, ask each 0 to spread outward” — is the most reused trick in BFS interviews.

LC 127 — Word Ladder (Hard, state-space BFS)

Each word is a state; two words are adjacent if they differ in one character. The performance trap: do NOT compare every pair of words upfront (O(N^2 * L)). Generate neighbors by iterating each character position and trying all 26 replacements (O(26 * L) per word).
from collections import deque

def ladderLength(beginWord, endWord, wordList):
    word_set = set(wordList)
    if endWord not in word_set: return 0
    queue = deque([(beginWord, 1)])
    visited = {beginWord}
    while queue:
        word, length = queue.popleft()
        if word == endWord: return length
        for i in range(len(word)):
            for ch in 'abcdefghijklmnopqrstuvwxyz':
                new_word = word[:i] + ch + word[i+1:]
                if new_word in word_set and new_word not in visited:
                    visited.add(new_word)
                    queue.append((new_word, length + 1))
    return 0
For very large word lists, bidirectional BFS (search from both ends, meet in the middle) reduces explored nodes from O(b^d) to O(2 * b^(d/2)). On a typical Word Ladder test case that drops 30 seconds to under 1 second.

Caveats and Traps

Trap 1 — Marking visited at dequeue time leads to TLE. The single most common BFS bug. If you only check visited after popping from the queue, every neighbor that gets re-enqueued from multiple parents piles up duplicates. On a dense graph this turns O(V+E) into O(V^2). On a 1000x1000 grid you can have millions of duplicate queue entries — instant TLE on LeetCode.
Solution: Always mark visited at enqueue time. Move visited.add(neighbor) to be IMMEDIATELY before or after queue.append(neighbor), never inside the popped-node block. The fix is one line. The performance difference can be 100x on large inputs.
Trap 2 — BFS does not give shortest path in weighted graphs. BFS counts hops, not weights. If edges have weights 1, 2, 5, the 2-hop path could cost 10 while a 5-hop path costs 5. BFS will declare the 2-hop path “shorter” and be wrong.
Solution: Use Dijkstra (non-negative weights) or Bellman-Ford (any weights). Special case — if weights are only 0 or 1, use 0-1 BFS with a deque: push 0-weight edges to the front, 1-weight edges to the back. That gives O(V+E) without the heap overhead of Dijkstra.
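A minimal sketch of that 0-1 BFS, assuming an adjacency list `graph[u] = [(v, w), ...]` with weights restricted to 0 or 1 (function name and representation are illustrative):

```python
from collections import deque

def zero_one_bfs(graph, start, n):
    """0-1 BFS: a deque replaces Dijkstra's heap. Relaxing a 0-weight
    edge pushes to the FRONT (same distance), a 1-weight edge to the
    BACK (distance + 1), so the deque stays sorted by distance."""
    INF = float('inf')
    dist = [INF] * n
    dist[start] = 0
    dq = deque([start])
    while dq:
        u = dq.popleft()
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)   # process before everything queued
                else:
                    dq.append(v)       # process after the current layer
    return dist
```

Each node can enter the deque more than once, but total work is still O(V + E) because every edge is relaxed at most a constant number of times.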
Trap 3 — Capturing queue size DURING the inner loop instead of before it. Wrong: while i < len(queue): .... As you enqueue children inside the loop, len(queue) grows and you bleed the next level into the current one. Levels merge; you lose the level boundary.
Solution: Always snapshot level_size = len(queue) BEFORE the inner for _ in range(level_size): loop. The size you read at that moment is the only number you trust.
Trap 4 — Using list.pop(0) in Python instead of deque.popleft(). list.pop(0) is O(n) — it shifts every remaining element left. On a million-cell BFS that is a 500x slowdown for no reason.
Solution: from collections import deque and use deque.popleft(), which is O(1). This single change has saved more LeetCode submissions than any other Python trick.
Trap 5 — Multi-source BFS without marking all sources visited up front. If you seed the queue with sources but only mark the first one visited (or none of them), source nodes can be re-entered from each other and counted multiple times.
Solution: Loop over the grid once, enqueue every source AND mark each as visited (or use a sentinel like changing 0 to -1 in 01 Matrix) before the main BFS loop begins.

Curated LeetCode List

| Difficulty | Problem | LC # | Why It Matters |
| --- | --- | --- | --- |
| Medium | Binary Tree Level Order Traversal | 102 | Locks in the level-size snapshot pattern |
| Easy | Average of Levels in Binary Tree | 637 | Same template, different aggregation |
| Medium | N-ary Tree Level Order | 429 | BFS on non-binary children list |
| Easy | Cousins in Binary Tree | 993 | Track parent and depth during BFS |
| Easy | Maximum Depth of Binary Tree | 104 | BFS counts levels, DFS counts recursion depth |
| Medium | Number of Islands | 200 | Grid BFS / DFS prototype |
| Medium | Rotting Oranges | 994 | Multi-source BFS prototype |
| Medium | 01 Matrix | 542 | Multi-source distance-to-nearest |
| Medium | Walls and Gates | 286 | Same as 01 Matrix, gates are sources |
| Medium | Open the Lock | 752 | State-space BFS, dead-ends as constraints |
| Medium | Shortest Path in Binary Matrix | 1091 | 8-directional grid BFS |
| Medium | Keys and Rooms | 841 | Connectivity via BFS or DFS |
| Medium | Pacific Atlantic Water Flow | 417 | Multi-source from both oceans |
| Medium | Bus Routes | 815 | Bipartite BFS — bus is a node, stop is a node |
| Hard | Word Ladder | 127 | State-space BFS, on-the-fly neighbor gen |
| Hard | Word Ladder II | 126 | BFS to find distance, then DFS to enumerate paths |
| Hard | Sliding Puzzle | 773 | State-space BFS, encode board as string |
| Hard | Shortest Path Visiting All Nodes | 847 | BFS with bitmask state |
| Hard | Minimum Number of Days to Disconnect Island | 1568 | BFS plus articulation points |
Solve in this order: every Easy first (one sitting), then 200 / 994 / 542 / 752 / 1091 (the medium core, ~3 days), then 127 / 126 (hard state-space, plan a weekend).

BFS Interview Q&A

Strong Answer Framework:
  1. Recognize: unweighted shortest path on a grid -> standard BFS.
  2. Define state: (row, col) plus the running distance.
  3. Seed the queue with the start cell; mark visited at enqueue.
  4. Pop, return immediately when you reach the target.
  5. For each of 4 (or 8) directions, bounds-check, obstacle-check, visited-check, then enqueue.
Real LeetCode Example — LC 1091 Shortest Path in Binary Matrix: Move 8-directionally on an n x n grid of 0s (passable) and 1s (blocked). Find the shortest path from (0,0) to (n-1, n-1). Length = number of cells visited.
from collections import deque

def shortestPathBinaryMatrix(grid):
    n = len(grid)
    if grid[0][0] != 0 or grid[n-1][n-1] != 0:
        return -1                                  # Start or end blocked
    if n == 1:
        return 1                                   # Single-cell edge case
    directions = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
    queue = deque([(0, 0, 1)])                     # (r, c, distance)
    visited = {(0, 0)}
    while queue:
        r, c, dist = queue.popleft()
        for dr, dc in directions:
            nr, nc = r+dr, c+dc
            if nr == n-1 and nc == n-1:
                return dist + 1                    # Found target
            if (0 <= nr < n and 0 <= nc < n
                and grid[nr][nc] == 0
                and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc, dist + 1))
    return -1
Time O(n^2), space O(n^2).
Senior follow-up 1 — What if the grid is 10 million by 10 million? The visited set alone would be 100 trillion cells. You cannot allocate that. Options: (a) if the obstacle density is very high, represent the grid as a sparse adjacency list of only passable cells; (b) tile the grid and run BFS one tile at a time, persisting the frontier to disk between tiles; (c) if you only need approximate answers, bidirectional BFS plus an A* heuristic (Chebyshev distance for 8-directional moves) to prune dramatically.
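Option (c) above, A* with the Chebyshev heuristic, can be sketched for the LC 1091 setup as follows (a hedged sketch, not a tuned implementation — the stale-entry check stands in for a decrease-key operation):

```python
import heapq

def a_star_8dir(grid):
    """A* on an n x n grid of 0s (passable) and 1s (blocked), 8-directional
    moves, path length counted in cells visited. Chebyshev distance is an
    admissible heuristic because one diagonal step covers one unit on both
    axes at once."""
    n = len(grid)
    if grid[0][0] != 0 or grid[n - 1][n - 1] != 0:
        return -1

    def h(r, c):                        # Chebyshev distance to the goal
        return max(n - 1 - r, n - 1 - c)

    dist = {(0, 0): 1}                  # start cell counts as 1
    heap = [(1 + h(0, 0), 0, 0)]        # (g + h, row, col)
    while heap:
        f, r, c = heapq.heappop(heap)
        if (r, c) == (n - 1, n - 1):
            return dist[(r, c)]
        if f > dist[(r, c)] + h(r, c):  # stale entry: a shorter g was found
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] == 0:
                    g = dist[(r, c)] + 1
                    if g < dist.get((nr, nc), float('inf')):
                        dist[(nr, nc)] = g
                        heapq.heappush(heap, (g + h(nr, nc), nr, nc))
    return -1
```

With h = 0 this degrades to uniform-cost search; the heuristic only changes the expansion order, so the answer matches plain BFS while typically expanding far fewer cells.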
Senior follow-up 2 — What if obstacles can be added or removed in real time? Pure BFS recomputes from scratch each query — too slow. Use dynamic graph algorithms like the link-cut tree for connectivity, or accept eventually-consistent answers and recompute periodically. For pathfinding specifically, D* Lite is the algorithm: it incrementally repairs the previous shortest-path tree when edges change, much faster than rerunning Dijkstra/BFS from scratch.
Senior follow-up 3 — What if memory is constrained to a few MB? Replace the visited set with a bitset (1 bit per cell instead of 1 byte) — 8x reduction. If even that is too much, use iterative deepening BFS (IDDFS): run BFS with depth limit 1, then 2, then 3. Each run uses only O(frontier) memory. Worst case is O(b^d) total work but constant memory.
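The bitset idea can be sketched with a bytearray — 1 bit per cell instead of 1 byte, indexed as i = r * cols + c (helper names are illustrative):

```python
def make_bitset(n):
    """Packed visited set: ceil(n / 8) bytes for n cells."""
    return bytearray((n + 7) // 8)

def bit_test(bits, i):
    """True if bit i is set."""
    return bool(bits[i >> 3] & (1 << (i & 7)))

def bit_set(bits, i):
    """Set bit i."""
    bits[i >> 3] |= (1 << (i & 7))
```

Inside the BFS loop you would replace `(nr, nc) in visited` with `bit_test(visited, nr * cols + nc)` and `visited.add(...)` with `bit_set(...)`; the traversal logic is unchanged.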
Common Wrong Answers:
  1. Using DFS. “I’ll DFS and track the minimum.” Correct in principle but exponential in practice. DFS explores all paths; on a 30x30 grid that is more states than atoms in your laptop.
  2. Marking visited at dequeue time. Same node enters the queue from each neighbor; queue balloons; TLE.
  3. Forgetting the n == 1 edge case. (0,0) is both start and end. The answer is 1, not 0 and not -1.
Further Reading: LC 1091, LC 994, LC 542.
Strong Answer Framework:
  1. Recognize: shortest transformation sequence -> shortest path -> BFS.
  2. Each word is a state. Two states are adjacent if they differ by exactly one character.
  3. The graph is implicit — never build it upfront. Generate neighbors on the fly.
  4. Use a set of valid words for O(1) membership checks.
  5. Mark visited (or remove from word set) at enqueue, return when you pop the target.
Real LeetCode Example — LC 127 Word Ladder: Given beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"], the shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", length 5.
from collections import deque

def ladderLength(beginWord, endWord, wordList):
    word_set = set(wordList)
    if endWord not in word_set:
        return 0
    queue = deque([(beginWord, 1)])
    visited = {beginWord}
    while queue:
        word, length = queue.popleft()
        if word == endWord:
            return length
        for i in range(len(word)):
            for ch in 'abcdefghijklmnopqrstuvwxyz':
                if ch == word[i]:
                    continue
                new_word = word[:i] + ch + word[i+1:]
                if new_word in word_set and new_word not in visited:
                    visited.add(new_word)
                    queue.append((new_word, length + 1))
    return 0
Time O(N * L * 26) where N is the word list size and L is the word length. Space O(N * L) for the queue and visited set.
Senior follow-up 1 — Bidirectional BFS optimization. Standard BFS explores O(b^d) nodes. Bidirectional BFS searches from both ends simultaneously and meets in the middle, exploring only O(2 * b^(d/2)). For Word Ladder with branching factor ~10 and depth 10, that drops 10^10 to 2 * 10^5 — five orders of magnitude. The trick: maintain begin_set and end_set. At each step expand the smaller one. When a generated word lives in the other set, you have found the meeting point.
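A sketch of that bidirectional scheme for Word Ladder (function name is illustrative; the key lines are the swap to the smaller frontier and the membership check against the opposite frontier):

```python
def ladder_length_bidir(begin_word, end_word, word_list):
    """Bidirectional BFS: expand one full level of the smaller frontier
    each round; when a generated word lands in the opposite frontier,
    the two searches have met."""
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    begin_set, end_set = {begin_word}, {end_word}
    visited = {begin_word, end_word}
    length = 1
    while begin_set and end_set:
        if len(begin_set) > len(end_set):       # always grow the smaller side
            begin_set, end_set = end_set, begin_set
        next_set = set()
        for word in begin_set:
            for i in range(len(word)):
                for ch in 'abcdefghijklmnopqrstuvwxyz':
                    new_word = word[:i] + ch + word[i + 1:]
                    if new_word in end_set:     # frontiers met
                        return length + 1
                    if new_word in word_set and new_word not in visited:
                        visited.add(new_word)
                        next_set.add(new_word)
        begin_set = next_set
        length += 1
    return 0
```

Because each side only ever grows to roughly depth d/2, the explored state count matches the O(2 * b^(d/2)) figure above.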
Senior follow-up 2 — How would you precompute neighbor relationships if there are many queries on the same dictionary? Build a “wildcard pattern” map: for each word, generate L patterns by replacing each character with *. Two words share a pattern -> they differ in that position. Group words by patterns. Now neighbor lookup is O(L) instead of O(26 * L). Memory cost: O(N * L) for the pattern map. Beats the on-the-fly approach when you run many ladder queries on a fixed dictionary.
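The wildcard pattern map can be sketched in a few lines (helper names are illustrative):

```python
from collections import defaultdict

def build_pattern_map(words):
    """'hot' -> patterns '*ot', 'h*t', 'ho*'. Two words share a pattern
    exactly when they differ in that one position."""
    patterns = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            patterns[word[:i] + '*' + word[i + 1:]].append(word)
    return patterns

def neighbors(word, patterns):
    """O(L) pattern lookups instead of generating 26 * L candidates."""
    out = set()
    for i in range(len(word)):
        out.update(patterns[word[:i] + '*' + word[i + 1:]])
    out.discard(word)          # a word is not its own neighbor
    return out
```

Build the map once per dictionary; every subsequent ladder query reuses it.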
Senior follow-up 3 — What if word length L is 100, not 5? On-the-fly neighbor generation is 26 * 100 = 2600 per word, vs comparing against every other word at L = 100 (potentially N = 10000 words * 100 chars = 1M). On-the-fly still wins. But if N is also huge, the wildcard pattern map becomes the right call. For L = 1000 and N = 1, the problem trivially reduces to character-by-character diff.
Common Wrong Answers:
  1. Building an explicit graph by comparing every word pair. O(N^2 * L) preprocessing — TLE on dictionaries of 5000+ words.
  2. Using DFS with backtracking. Finds A path but not the SHORTEST. DFS Word Ladder is famously broken.
  3. Forgetting to check endWord in wordList. If the end word is not in the dictionary, the answer is 0, not 1.
Further Reading: LC 127, LC 126 (return all shortest paths), LC 752 (Open the Lock).
Strong Answer Framework:
  1. Default decision rule: if the problem says “shortest” or “minimum steps” -> BFS. If it says “all paths,” “cycle,” or “topological order” -> DFS.
  2. Memory shape matters: BFS uses O(width), DFS uses O(height). Wide tree -> DFS wins memory; deep tree -> BFS wins.
  3. Recursion limit: deep graphs can overflow the call stack with recursive DFS. BFS is iterative by default.
  4. Implementation cost: tree problems map naturally to recursion (DFS feels easier). Grid shortest-path problems map naturally to BFS.
Real LeetCode Example — LC 200 Number of Islands. Both work. DFS is shorter to write but can stack-overflow on a 500x500 all-land grid in Python. BFS is more code but never overflows.
# DFS (recursive)
def numIslands_dfs(grid):
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1':
            return
        grid[r][c] = '0'
        dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1)
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                dfs(r, c)
    return count
# BFS (iterative, safer for large grids)
from collections import deque
def numIslands_bfs(grid):
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                count += 1
                queue = deque([(r, c)])
                grid[r][c] = '0'
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
                        nr, nc = cr+dr, cc+dc
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
                            grid[nr][nc] = '0'
                            queue.append((nr, nc))
    return count
Senior follow-up 1 — The grid is 10000x10000 entirely land. Which version do you choose? Recursive DFS is dead — Python's recursion limit is ~1000, and the recursion depth here could reach 100 million. Either iterative DFS with an explicit stack or BFS. BFS gives more predictable memory because the queue holds only the wavefront (one anti-diagonal, at most ~10000 cells), while an iterative DFS stack can grow toward O(m * n) in pathological grids. I would pick BFS.
Senior follow-up 2 — You need to find ALL paths from A to B, not just shortest. Now DFS wins decisively. BFS is built around shortest-path distance — using it for “all paths” forces you to track the full path in every queue entry, which exponentially blows up memory. DFS naturally builds a path on the call stack, backtracks via path.pop(), and explores systematically.
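The backtracking shape described above, as a minimal sketch (graph as an adjacency dict; the `not in path` cycle check is linear in path length, which is fine at interview scale):

```python
def all_paths(graph, start, end):
    """Enumerate every simple path from start to end. One shared list
    holds the current path: append on the way down, pop on the way back."""
    paths, path = [], [start]

    def dfs(node):
        if node == end:
            paths.append(path[:])        # snapshot the current path
            return
        for neighbor in graph[node]:
            if neighbor not in path:     # skip nodes already on this path
                path.append(neighbor)
                dfs(neighbor)
                path.pop()               # backtrack

    dfs(start)
    return paths
```

The call stack carries the path for free — exactly what a BFS queue entry would have to copy explicitly, which is why BFS loses here.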
Senior follow-up 3 — Topological sort: BFS or DFS? Both work. Kahn’s algorithm uses BFS on in-degree-zero nodes — simpler to reason about, and naturally produces a valid order or detects cycles when the result has fewer than N nodes. DFS-based topo sort uses postorder reversal — more elegant but harder to debug. Default to Kahn’s for interviews.
Common Wrong Answers:
  1. “BFS is always faster.” No — BFS is faster for shortest paths in unweighted graphs. For “all paths,” “cycle detection,” or “topological order,” DFS is at least as fast and usually simpler.
  2. “DFS is faster on memory.” True for narrow-deep graphs; false for wide-shallow ones. The honest answer is “it depends on the shape.”
  3. “They’re interchangeable.” They are not. BFS gives shortest path; DFS does not. DFS detects cycles in directed graphs cleanly with three-coloring; BFS needs Kahn’s.
Further Reading: LC 200 (Number of Islands, grid BFS), LC 207 (Course Schedule, DFS cycle detection), LC 102 (Binary Tree Level Order Traversal, BFS).

Practice Roadmap

Day 1: Tree BFS

  • Solve: Level Order Traversal, Right Side View
  • Focus: Level-by-level processing

Day 2: Grid BFS

  • Solve: Number of Islands, Rotting Oranges
  • Focus: 4-directional movement, multi-source

Day 3: State Space BFS

  • Solve: Open the Lock, Minimum Genetic Mutation
  • Focus: State as node, transformation as edge

Day 4: Advanced BFS

  • Solve: Word Ladder, Sliding Puzzle
  • Focus: Complex state representations
Interview Tip: When you need shortest path in unweighted graph or level-by-level processing, BFS is the answer.

Interview Questions

What interviewers are really testing: Whether you understand the fundamental mechanics of BFS’s layer-by-layer expansion versus DFS’s deep-first exploration — and can articulate why the guarantee holds, not just that it does.
Strong Answer:
  • BFS explores nodes in order of increasing distance from the source. It processes all nodes at distance d before any node at distance d+1. This is a direct consequence of the FIFO queue: nodes discovered earlier (closer to the source) are always dequeued and processed before nodes discovered later (farther away).
  • The first time BFS reaches any node, it has arrived via a path of minimum length. No shorter path can exist because all shorter paths would have been explored in a previous level.
  • DFS, by contrast, commits to one branch and may reach a node via a long, winding path before ever exploring a direct shortcut. It has no distance ordering — the stack (LIFO) processes the most recently discovered node, not the closest one.
  • Concrete example: in a social network graph, BFS from person A finds all 1st-degree connections, then all 2nd-degree, etc. DFS from A might tunnel through a chain of 50 people before finding someone who was actually a direct friend-of-friend.
  • The caveat: this guarantee holds only for unweighted graphs (or graphs where all edges have equal weight). For weighted graphs, BFS breaks down — you need Dijkstra’s algorithm, which is essentially BFS with a priority queue replacing the plain queue, so it processes by cumulative weight rather than hop count.
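The distance-ordering argument above can be sketched in a few lines (names are illustrative; the adjacency structure is assumed to be a dict of lists):

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop count from source to every reachable node (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()          # FIFO: closest nodes come out first
        for neighbor in adj[node]:
            if neighbor not in dist:    # first visit = shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist
```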
Red flag answer: “BFS is just faster” or “BFS finds paths and DFS doesn’t” — these miss the core mechanism entirely. Another red flag is claiming BFS finds shortest paths in weighted graphs.
Follow-ups:
  1. What would happen if you ran BFS on a graph with edge weights of 1 and 2? Would it still give shortest paths? Why or why not?
  2. Can you modify BFS to handle a graph where edges have weights of only 0 and 1? (Hint: 0-1 BFS with a deque.)
What interviewers are really testing: This is a classic correctness-versus-efficiency trap. Many candidates write BFS that works but has a subtle performance bug. Interviewers want to know if you have actually debugged a TLE caused by this.
Strong Answer:
  • When you mark visited at enqueue time, you prevent the same node from being added to the queue multiple times. Each node enters the queue exactly once, so total work is O(V + E).
  • When you mark visited at dequeue time, multiple neighbors can independently enqueue the same node before any of those copies gets dequeued. In a dense graph, this can cause the queue to balloon. Worst case, the same node appears O(V) times in the queue, turning the algorithm into O(V^2) time and space.
  • Correctness is preserved either way for shortest-path queries in unweighted graphs — the first dequeue of a node still carries the correct shortest distance. But performance degrades significantly with dequeue-time marking.
  • Practical example: on a 1000x1000 grid (1 million cells), dequeue-time marking can cause the queue to hold millions of duplicate entries. This is the single most common cause of Time Limit Exceeded on BFS grid problems on LeetCode.
  • The fix is one line: move visited.add(neighbor) from after queue.popleft() to immediately before or after queue.append(neighbor).
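A minimal sketch of the enqueue-time convention (names are illustrative):

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Unweighted shortest path; visited is marked at ENQUEUE time."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor in visited:
                continue
            if neighbor == target:
                return dist + 1
            visited.add(neighbor)       # mark here, not after popleft()
            queue.append((neighbor, dist + 1))
    return -1                           # target unreachable
```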
Red flag answer: “It doesn’t matter, both work the same” — this indicates the candidate has never dealt with BFS performance issues on large inputs.
Follow-ups:
  1. Is there any scenario where you would intentionally mark visited at dequeue time rather than enqueue time?
  2. In the multi-source BFS pattern (e.g., Rotting Oranges), how do you ensure all sources are properly marked visited before the main loop begins?
What interviewers are really testing: Whether you can recognize multi-source BFS, articulate why it is more efficient than running BFS from each source independently, and correctly track the time/distance dimension.
Strong Answer:
  • This is a multi-source BFS problem. You start by seeding the queue with all rotten oranges simultaneously, then expand outward one time step at a time. Each fresh orange that gets reached is “infected” and joins the queue for the next round.
  • Why multi-source rather than running BFS from each rotten orange separately: with k rotten oranges on an m*n grid, separate BFS would be O(k * m * n). Multi-source BFS is O(m * n) because the wavefronts merge — once a cell is visited by any wavefront, it is never revisited.
  • Implementation steps: (1) Scan the grid, enqueue all rotten orange positions, and count fresh oranges. (2) BFS level-by-level, where each level represents one minute. For each rotten orange dequeued, infect its 4-directional fresh neighbors, mark them rotten, decrement the fresh count, and enqueue them. (3) After BFS finishes, if fresh count is 0, return the number of levels processed (minutes). Otherwise return -1.
  • Edge cases: grid with no fresh oranges (return 0), grid with fresh oranges unreachable from any rotten one (return -1), grid with no rotten oranges but some fresh (return -1).
  • The “time” dimension is tracked by processing level-by-level using the level_size = len(queue) snapshot, exactly like tree level-order traversal.
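The steps above can be sketched roughly as follows (assuming the usual LC 994 encoding: 0 empty, 1 fresh, 2 rotten; the function name is illustrative):

```python
from collections import deque

def oranges_rotting(grid):
    """Minutes until every fresh orange rots, or -1 if some are unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))    # every rotten orange is a source
            elif grid[r][c] == 1:
                fresh += 1
    minutes = 0
    while queue and fresh:
        for _ in range(len(queue)):     # one level snapshot = one minute
            r, c = queue.popleft()
            for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2    # infect and mark visited in place
                    fresh -= 1
                    queue.append((nr, nc))
        minutes += 1
    return minutes if fresh == 0 else -1
```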
Red flag answer: Trying to solve this with DFS, or running a separate BFS from each rotten orange and taking the minimum — both show a misunderstanding of the simultaneous-spread nature of the problem.
Follow-ups:
  1. How would you modify this solution if different rotten oranges rotted at different speeds (e.g., some take 1 minute, others take 2)?
  2. What if the grid were 3D (a cube of oranges)? What changes in your approach?
What interviewers are really testing: Whether you can abstract a problem into a graph search even when no explicit graph is given — a skill that separates candidates who memorize templates from those who understand the underlying model.
Strong Answer:
  • State-space BFS treats each configuration of the problem as a node and each valid transformation as an edge. You do not build the graph upfront; instead, you generate neighbors on the fly.
  • For Word Ladder: each word is a state, and two words are connected (edge) if they differ by exactly one character. BFS finds the shortest transformation sequence. To generate neighbors efficiently, you iterate over each character position and try all 26 replacements, checking membership in the word set — this is O(26 * L) per word where L is word length, which is much faster than comparing against every word in the dictionary (O(N * L)).
  • For Open the Lock: each 4-digit combination is a state (10,000 total states). Each digit can be turned up or down, giving 8 neighbors per state. BFS from “0000” finds the minimum turns to reach the target, skipping deadend states.
  • The key insight is that BFS still guarantees shortest path because all edges have equal weight (one transformation = one step). The “graph” is implicit but logically identical to an explicit adjacency list.
  • Performance consideration: the state space can be enormous. Word Ladder with 5-letter words has a state space of up to 26^5 = ~12 million, but the word set typically constrains this to a few thousand. For problems where the state space is truly huge, consider bidirectional BFS (search from both start and end, meeting in the middle) to reduce the explored space from O(b^d) to O(b^(d/2)).
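A sketch of the on-the-fly neighbor generation for Word Ladder (function name is illustrative; it follows the convention of returning 0 when no ladder exists):

```python
from collections import deque
import string

def ladder_length(begin_word, end_word, word_list):
    """Shortest transformation sequence length, generating neighbors on the fly."""
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    queue = deque([(begin_word, 1)])
    while queue:
        word, steps = queue.popleft()
        if word == end_word:
            return steps
        for i in range(len(word)):
            for ch in string.ascii_lowercase:   # O(26 * L) neighbors per word
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in word_set:
                    word_set.remove(candidate)  # removing = marking visited
                    queue.append((candidate, steps + 1))
    return 0
```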
Red flag answer: Building an explicit adjacency list of all word pairs before BFS — this is O(N^2 * L) and will TLE on large word lists when the on-the-fly approach is O(N * 26 * L).
Follow-ups:
  1. How does bidirectional BFS work, and when is it a clear win over standard BFS?
  2. In Word Ladder, why do we remove words from the word set as we visit them, rather than using a separate visited set? What is the trade-off?
What interviewers are really testing: Depth of understanding about why BFS works and its precise limitations — and whether you can bridge from BFS to more general shortest-path algorithms.
Strong Answer:
  • BFS assumes all edges have equal weight (or weight 1). Under this assumption, the FIFO queue naturally processes nodes in order of increasing distance. The first time you reach a node is via the shortest path.
  • Dijkstra generalizes BFS to non-negative weighted graphs by replacing the FIFO queue with a min-heap (priority queue). Instead of processing by hop count, it processes by cumulative edge weight, always expanding the node with the smallest tentative distance.
  • BFS breaks down with unequal weights because a node 2 hops away via heavy edges could be farther than a node 5 hops away via light edges. BFS would visit the 2-hop node first and potentially record a wrong shortest distance.
  • Concrete example: routing network packets where links have different latencies. BFS would say the 2-hop route is shorter, but if those 2 hops have 500ms latency each (1000ms total) and the 5-hop route has 50ms per hop (250ms total), BFS gives the wrong answer.
  • Complexity comparison: BFS is O(V + E). Dijkstra with a binary heap is O((V + E) log V). For unweighted graphs, BFS is strictly faster — never use Dijkstra when BFS suffices.
  • Special case — 0-1 BFS: if edge weights are only 0 or 1, you can use a deque instead of a priority queue. Push weight-0 edges to the front, weight-1 edges to the back. This achieves O(V + E), same as BFS.
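A minimal 0-1 BFS sketch, assuming `adj[u]` holds `(neighbor, weight)` pairs with weights in {0, 1} (names are illustrative):

```python
from collections import deque

def zero_one_bfs(adj, source, n):
    """Shortest distances when every edge weight is 0 or 1."""
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    dq = deque([source])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)    # weight-0 edge: same level, push front
                else:
                    dq.append(v)        # weight-1 edge: next level, push back
    return dist
```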
Red flag answer: “Dijkstra is just a better BFS, always use Dijkstra” — this ignores the log factor overhead and shows the candidate does not understand when the simpler tool is the right choice.
Follow-ups:
  1. Can Dijkstra handle negative edge weights? If not, what algorithm would you use?
  2. What is 0-1 BFS, and can you think of a real problem where it applies?
What interviewers are really testing: Whether you can think beyond single-source BFS and reason about the trade-offs between running BFS V times versus using Floyd-Warshall, and whether you understand the complexity implications.
Strong Answer:
  • For an unweighted graph, the most straightforward approach is to run BFS from every node. Each BFS is O(V + E), so total time is O(V * (V + E)). For sparse graphs where E is O(V), this is O(V^2). For dense graphs where E is O(V^2), this is O(V^3).
  • The alternative is Floyd-Warshall, which computes all-pairs shortest paths in O(V^3) time and O(V^2) space, regardless of graph density. But Floyd-Warshall is designed for weighted graphs — for unweighted graphs it does unnecessary work compared to BFS on sparse graphs.
  • Decision rule: if the graph is sparse (E much less than V^2), V runs of BFS is faster. If the graph is dense, both approaches are O(V^3) and Floyd-Warshall may be simpler to implement.
  • Practical consideration: BFS from all nodes is trivially parallelizable — each source BFS is independent. Floyd-Warshall’s dynamic programming has sequential dependencies between iterations, making it harder to parallelize.
  • Space: BFS approach needs O(V) per run (reusable), or O(V^2) if you store all results. Floyd-Warshall always needs O(V^2).
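The repeated-BFS approach might look like this (illustrative names; results are stored in a V x V matrix, the O(V^2) option noted above):

```python
from collections import deque

def all_pairs_unweighted(adj):
    """All-pairs shortest hop counts via one BFS per source: O(V * (V + E))."""
    n = len(adj)
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == INF:   # not yet visited in this run
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist
```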
Red flag answer: Immediately jumping to Floyd-Warshall without considering that BFS is simpler and faster for sparse unweighted graphs. Or suggesting Dijkstra from each node — correct but unnecessarily slow (O(V * (V + E) log V) vs O(V * (V + E))).
Follow-ups:
  1. What if you only need shortest paths between a specific subset of S nodes, not all V nodes? How does that change your approach?
  2. If the graph is unweighted and undirected, can you exploit any symmetry to reduce work?
What interviewers are really testing: Staff-level thinking about scaling algorithms beyond textbook assumptions. This tests whether you can reason about systems constraints, not just algorithmic correctness.
Strong Answer:
  • When the graph does not fit in memory, you need external-memory BFS (sometimes called out-of-core BFS). The core idea is to process the BFS level by level, reading and writing the frontier to disk at each step.
  • Approach: store the graph on disk (as an edge list or adjacency list in files). Maintain two files — current frontier and next frontier. At each level, stream through the current frontier, look up each node’s neighbors from the disk-based graph, and write unvisited neighbors to the next frontier file. Then swap the files and repeat.
  • The visited set is the biggest challenge. If V is too large for a hash set in memory, you can use a Bloom filter (probabilistic, may miss some nodes), a bitmap if node IDs are dense integers, or an external sort-based approach where you periodically sort and deduplicate the visited list on disk.
  • Systems used in practice: graph processing frameworks like Apache Giraph, GraphX (Spark), or Google’s Pregel model essentially do distributed BFS using this level-synchronous approach, where each “superstep” processes one BFS level across a cluster.
  • Real-world example: computing degrees of separation in a social graph with billions of nodes. Facebook’s social graph has ~3 billion nodes and hundreds of billions of edges — single-machine BFS is impossible, so they use distributed graph systems.
Red flag answer: “You just need more RAM” or having no idea how to reason about the problem once the assumption of in-memory computation breaks down.
Follow-ups:
  1. What trade-offs does a Bloom filter introduce when used as the visited set? How would you handle false positives?
  2. How does the Pregel/BSP model relate to level-synchronous BFS?
What interviewers are really testing: Practical debugging skills and optimization intuition — not just knowing the algorithm, but knowing what goes wrong in practice and how to fix it.
Strong Answer:
  • Step 1: Check visited marking. The most common cause of TLE in BFS is marking visited at dequeue time instead of enqueue time. This causes duplicate entries in the queue, potentially multiplying work by a factor of the average degree. Fix: move the visited check to enqueue time.
  • Step 2: Check data structure choice. In Python, using list.pop(0) instead of collections.deque.popleft() turns every dequeue from O(1) to O(n). On a 1000x1000 grid with ~1 million cells, this alone can cause a 500x slowdown. Fix: always use deque.
  • Step 3: Check neighbor generation. Are you creating new objects or tuples unnecessarily? In tight loops on large grids, object creation overhead matters. Pre-compute the directions array outside the loop. Avoid string concatenation for state keys — use tuples or integers.
  • Step 4: Check visited set implementation. For grids, a 2D boolean array is faster than a hash set of tuples. visited[r][c] = True is O(1) with no hashing, versus the overhead of hashing a tuple for every cell.
  • Step 5: Consider bidirectional BFS if the problem has a single start and single end. This can reduce explored nodes from O(b^d) to O(b^(d/2)), which for branching factor 4 and depth 20 is the difference between 4^20 (~1 trillion) and 2 * 4^10 (~2 million).
  • Step 6: Profile if the above do not fix it. Measure queue size at peak, total nodes processed, and time per operation.
Red flag answer: Jumping to a completely different algorithm (like A*) without first checking for implementation bugs. The vast majority of BFS TLEs are caused by the visited-marking bug or the wrong data structure, not by BFS being the wrong algorithm.
Follow-ups:
  1. When would A* search be preferable to plain BFS on a grid? What heuristic would you use?
  2. How would you estimate whether your BFS solution will pass within the time limit before submitting?

Interview Deep-Dive

Strong Answer:
  • BFS is optimal when you need shortest path in an unweighted graph (or a graph where all edges have equal weight). The FIFO queue ensures nodes are processed in order of increasing hop count, so the first time you reach any node is via the shortest path.
  • BFS over DFS: Use BFS when path length matters. DFS finds a path but not necessarily the shortest. DFS is better for exhaustive exploration (all paths, cycle detection, topological sort) where you do not care about distance.
  • BFS over Dijkstra: Dijkstra generalizes BFS to weighted graphs using a priority queue, adding an O(log V) factor per operation. On unweighted graphs, Dijkstra’s priority queue degenerates to a plain queue — so BFS gives O(V + E) versus Dijkstra’s O((V + E) log V). Never use Dijkstra when BFS suffices.
  • BFS over A*: A* uses a heuristic to guide search toward the goal, potentially visiting far fewer nodes than BFS. But A* requires a valid heuristic (admissible and consistent), adds heap overhead, and is harder to implement. Use BFS when the graph is small enough that exploring all nodes at each distance is acceptable, or when no good heuristic exists.
  • The 0-1 BFS special case: When edge weights are 0 or 1, use a deque instead of a priority queue. Weight-0 edges push to the front, weight-1 edges push to the back. This achieves O(V + E) — same as plain BFS — while handling two weight classes correctly.
  • Time: O(V + E). Space: O(V) for the queue and visited set. Space can be the bottleneck — the queue can hold an entire level at once.
Follow-up: You are solving a maze problem where some corridors have cost 1 and some have cost 2. Can you still use BFS? What modifications are needed?
  • Plain BFS fails because it processes by hop count, not cumulative cost. A corridor of cost 2 should “count double.”
  • Option 1: Dijkstra with a binary heap — O((V + E) log V). Correct but adds logarithmic overhead.
  • Option 2: “Virtual node expansion” — split every cost-2 edge into two cost-1 edges by inserting a virtual intermediate node. Now standard BFS works on the expanded graph. The graph grows by at most E nodes, so complexity is still O(V + E). This trick works when the number of distinct weights is small.
  • Option 3: If weights are only 1 and 2, use a 2-bucket BFS (a generalization of 0-1 BFS with two queues). Process all distance-d nodes before distance-(d+1) nodes, using bucket assignment by edge weight.
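Option 2, the virtual-node expansion, might be sketched like this (undirected edges assumed; names are illustrative):

```python
from collections import deque

def shortest_with_weights_1_2(n, edges, source):
    """Shortest distances for undirected (u, v, w) edges with w in {1, 2},
    by splitting each cost-2 edge with a virtual midpoint, then plain BFS."""
    adj = [[] for _ in range(n)]
    next_id = n
    for u, v, w in edges:
        if w == 1:
            adj[u].append(v)
            adj[v].append(u)
        else:
            adj.append([])              # new virtual node next_id
            adj[u].append(next_id)
            adj[next_id].extend([u, v])
            adj[v].append(next_id)
            next_id += 1
    INF = float('inf')
    dist = [INF] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == INF:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[:n]                     # drop virtual nodes from the answer
```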
Strong Answer:
  • Time: O(M * N). Each cell is enqueued and dequeued at most once (assuming visited-at-enqueue). Processing each cell involves checking 4 neighbors — constant work per cell.
  • Space: O(M * N) in the worst case for the visited array. The queue itself holds at most one full “level” of the BFS wavefront. For a grid, the maximum wavefront width depends on the shape: for a square grid of side S, the wavefront can hold up to O(S) = O(sqrt(MN)) cells. But in pathological cases (a thin corridor), the wavefront is O(1) — so queue space ranges from O(1) to O(MN).
  • Why space is often the real bottleneck: For a 10,000 x 10,000 grid (100 million cells), the visited array alone requires ~100 MB as a one-byte-per-cell boolean matrix. The queue at peak holds only the wavefront: tens of thousands of cells, each a (row, col, distance) tuple, a few MB at most. The visited array dominates.
  • Optimization 1 — In-place visited marking. If you can modify the grid (change ‘0’ to ‘2’), you eliminate the visited array entirely, reducing space from O(M*N) to O(queue_size).
  • Optimization 2 — Bitset for visited. Use a bitarray instead of a boolean array. Each cell uses 1 bit instead of 1 byte, reducing visited storage by 8x. For a 10,000 x 10,000 grid, that is ~12 MB instead of ~100 MB.
  • Optimization 3 — Bidirectional BFS. For single-source-to-single-target problems, BFS from both ends halves the explored radius, reducing visited cells from O(r^2) to O(2 * (r/2)^2) = O(r^2 / 2). For grids with long shortest paths, this cuts space and time roughly in half.
Follow-up: If the grid does not fit in memory (say 1 million x 1 million), how would you approach BFS?
  • The grid has 10^12 cells — it cannot be stored in RAM. You need an external-memory BFS or implicit graph approach.
  • If the grid is sparse (mostly walls with few open cells), represent it as an adjacency list of only the open cells. BFS operates on this compact representation.
  • If the grid is dense, process it tile by tile. Load one tile into memory, compute the BFS wavefront within it, write the frontier to disk, load the next tile, and continue. This is level-synchronous external-memory BFS.
  • In practice, grids this large arise in GIS applications (satellite imagery, terrain analysis). Tools like GRASS GIS use disk-based raster processing with similar tile-based approaches.
Strong Answer:
  • Multi-source BFS seeds the queue with all source nodes simultaneously, then expands the wavefront as if all sources started at time 0. The key property: the first wavefront to reach any cell carries the shortest distance from the nearest source.
  • Why O(M * N) and not O(K * M * N): Every cell is visited at most once, regardless of how many sources exist. When a cell is reached by one wavefront, it is marked visited and never processed again by any other wavefront. The wavefronts merge — they do not duplicate work.
  • Analogy: Imagine lighting K candles in a dark room simultaneously. Each candle’s light spreads at the same speed. The lights merge where they meet, and no point in the room is “re-illuminated.” Total work is proportional to the room size, not the number of candles.
  • Contrast with running K separate BFS runs: Each independent BFS is O(M * N), and running K of them is O(K * M * N). On a 1000x1000 grid with 10,000 sources, that is 10^10 operations (TLE) versus 10^6 for multi-source BFS.
  • Classic problems that use this: Rotting Oranges (all rotten oranges are sources), 01 Matrix (all 0-cells are sources), Walls and Gates (all gates are sources).
  • Implementation detail: Before the BFS loop, iterate through the entire grid and enqueue every source. Mark all sources as visited. Then run standard BFS — the queue already contains the full “level 0” wavefront.
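Putting those steps together for the 01 Matrix variant mentioned above (every 0-cell is a source; names are illustrative):

```python
from collections import deque

def distance_to_nearest_zero(grid):
    """Distance from each cell to its nearest 0, via multi-source BFS."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                dist[r][c] = 0          # every 0-cell seeds level 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```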
Follow-up: In the Rotting Oranges problem, how do you track the number of minutes elapsed? What edge case causes the answer to be -1?
  • Track minutes by processing the queue level-by-level using the level_size = len(queue) snapshot technique. Each level corresponds to one minute. Increment the minute counter after processing each level.
  • The answer is -1 when at least one fresh orange is unreachable from any rotten orange — it is in a disconnected region (surrounded by walls or absent rotten neighbors). After BFS completes, check if any fresh oranges remain. If yes, return -1.
  • Edge case: if there are no fresh oranges initially, the answer is 0 (not -1). If there are no rotten oranges but some fresh ones exist, the answer is -1.
Strong Answer:
  • Bidirectional BFS runs two BFS instances simultaneously: one from the source and one from the target. When their frontiers meet (share a common node), the shortest path is the sum of the distances from each side to the meeting point.
  • Speedup analysis: Standard BFS explores all nodes within distance d, visiting O(b^d) nodes where b is the branching factor. Bidirectional BFS explores O(b^(d/2)) from each side, totaling O(2 * b^(d/2)). For b = 4 and d = 20: standard = 4^20 = 10^12, bidirectional = 2 * 4^10 = 2 * 10^6. That is a 500,000x speedup.
  • When it helps most: Large, sparse graphs with a single source and single target where the shortest path is long (large d). Social network degree-of-separation queries are the classic use case.
  • When it does NOT help:
    1. Multi-target problems (e.g., nearest hospital from a patient) — you cannot run BFS “from the target” if there are many targets. Use multi-source BFS from all targets instead.
    2. Very short paths (d is small) — the constant factor of maintaining two queues and checking for frontier intersection outweighs the savings.
    3. Directed graphs where reverse edges are expensive or unknown — backward BFS requires traversing edges in reverse, which may not be available in the graph representation.
    4. State-space BFS with complex states — tracking which states belong to which frontier and detecting intersection becomes error-prone.
  • Implementation detail: At each step, expand the smaller frontier (fewer nodes) to minimize total work. Check intersection by looking up nodes from one frontier’s visited set in the other’s.
Follow-up: In the Word Ladder problem, bidirectional BFS is a common optimization. How do you implement it, and what is the practical speedup?
  • Maintain two sets: begin_set (words reachable from beginWord) and end_set (words reachable from endWord). At each step, expand the smaller set. For each word in the smaller set, generate all single-character transformations and check if any appear in the other set (meeting point found) or in the word dictionary (add to the next level).
  • Practical speedup on a dictionary of 5000 five-letter words with a path length of 10: standard BFS visits roughly 26 * 5 * (number of words at each level) for 10 levels. Bidirectional BFS does 5 levels from each side. For branching factor ~10, that is 10^10 vs 2 * 10^5 — a dramatic reduction.
  • The key trick: swap begin_set and end_set so you always expand the smaller one. This keeps the frontier sizes balanced and prevents one side from exploding.
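The swap-the-smaller-set trick might be sketched like this (names are illustrative; it follows the convention of returning 0 when no ladder exists):

```python
import string

def bidirectional_ladder_length(begin_word, end_word, word_list):
    """Word Ladder via bidirectional BFS, always expanding the smaller frontier."""
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    begin_set, end_set = {begin_word}, {end_word}
    steps = 1
    while begin_set and end_set:
        if len(begin_set) > len(end_set):       # expand the smaller side
            begin_set, end_set = end_set, begin_set
        next_level = set()
        for word in begin_set:
            for i in range(len(word)):
                for ch in string.ascii_lowercase:
                    candidate = word[:i] + ch + word[i + 1:]
                    if candidate in end_set:    # the two frontiers meet
                        return steps + 1
                    if candidate in word_set:
                        word_set.remove(candidate)  # visited = removed
                        next_level.add(candidate)
        begin_set = next_level
        steps += 1
    return 0
```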