

Graph Algorithms

Overview

Graph algorithms solve problems involving nodes and edges. Master these core algorithms to tackle most graph problems. The key to graph algorithm mastery is knowing which algorithm to reach for based on the problem constraints. Think of it as a decision tree:
  • Unweighted shortest path? Use BFS (covered in the BFS chapter).
  • Weighted, non-negative edges? Use Dijkstra.
  • Negative edges possible? Use Bellman-Ford.
  • All-pairs shortest paths? Use Floyd-Warshall.
  • Minimum cost to connect everything? Use Kruskal or Prim (MST).
  • Ordering with dependencies? Use Topological Sort.
Each algorithm has a specific “lane” — using the wrong one either gives wrong results (Dijkstra with negative edges) or wastes time (Floyd-Warshall when you only need single-source).
Pattern Recognition Cheatsheet — When the LeetCode prompt says X, reach for Y:
| Phrase in the Problem | What to Reach For |
| --- | --- |
| "shortest path" + unweighted edges | BFS — O(V + E) |
| "shortest path" + non-negative weights | Dijkstra with min-heap — O((V + E) log V) |
| "shortest path" + possibly negative edges | Bellman-Ford — O(V * E), also detects negative cycles |
| "shortest path with at most K stops/edges" | Bellman-Ford with K rounds (LC 787 Cheapest Flights) |
| "all-pairs shortest paths" | Floyd-Warshall — O(V^3), simple triple loop |
| "minimum cost to connect all points/nodes" | MST: Kruskal (sort edges + Union-Find) or Prim (heap) |
| "minimum cost to make graph connected" | MST plus careful edge filtering |
| "topological sort" / "course ordering" / "task scheduling" | Kahn's BFS on in-degrees, or DFS postorder |
| "alien dictionary" / "build word order from rules" | Build directed graph, then topological sort |
| "0/1 weighted edges" | 0-1 BFS with deque (push 0-edges to front, 1-edges to back) |
| "swim in rising water" / "minimax path" / "bottleneck path" | Modified Dijkstra: minimize the MAX edge weight along the path |
| "shortest path with min effort" | Modified Dijkstra: minimize the MAX absolute height difference |
| "minimum number of edges to add for full connectivity" | Count components via Union-Find; answer is (components - 1) |
Picking the wrong algorithm is the #1 way to fail a graph interview question. Memorize this table.

When to Use

Shortest Path

Dijkstra, Bellman-Ford, Floyd-Warshall

MST

Kruskal, Prim for minimum spanning tree

Topological Sort

Task ordering, course prerequisites

Cycle Detection

DFS three-color marking or Union-Find

Key Algorithms

1. Dijkstra’s Algorithm

Dijkstra finds the shortest path from a single source to all other nodes in a graph with non-negative edge weights. It works like a “greedy BFS” — always process the unvisited node with the smallest known distance.

How it works: Maintain a priority queue (min-heap) of (distance, node). Pop the smallest, relax all its neighbors (update their distance if a shorter path is found). The if dist > distances[node]: continue line is a lazy deletion optimization — it skips stale entries instead of decreasing keys in the heap.

Critical constraint: Dijkstra fails with negative edge weights. Once a node is popped from the heap, Dijkstra assumes its distance is final. A negative edge could invalidate that assumption. Use Bellman-Ford instead.

Time: O((V + E) log V) with a binary heap. Space: O(V + E).
import heapq
from collections import defaultdict

def dijkstra(graph, start):
    """Shortest paths from start to all nodes - O((V+E) log V)"""
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]   # Min-heap: (distance, node)
    
    while pq:
        dist, node = heapq.heappop(pq)  # Process closest unvisited node
        
        if dist > distances[node]:       # LAZY DELETION: skip stale entries
            continue
        
        for neighbor, weight in graph[node]:
            new_dist = dist + weight     # RELAX: can we reach neighbor faster via node?
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    
    return distances

2. Bellman-Ford

Bellman-Ford handles negative edge weights and can detect negative cycles (a cycle whose total weight is negative, making shortest paths undefined).

How it works: Relax every edge V-1 times. After V-1 rounds, all shortest paths (which have at most V-1 edges) are finalized. Then do one more pass: if any distance can still be reduced, a negative cycle exists.

Why V-1 iterations? The shortest path between any two nodes uses at most V-1 edges (absent negative cycles, a shortest path never repeats a node). Each iteration guarantees at least one more node gets its correct distance.

When to use over Dijkstra: When edges can be negative, when you need to detect negative cycles, or when you need “shortest path with at most K edges” (just run K iterations instead of V-1 — this is the “Cheapest Flights Within K Stops” pattern).

Time: O(V * E). Space: O(V).
def bellman_ford(n, edges, start):
    """Handles negative weights, detects negative cycles - O(VE)"""
    distances = [float('inf')] * n
    distances[start] = 0
    
    # Relax all edges V-1 times -- each iteration extends paths by one edge
    for _ in range(n - 1):
        for u, v, w in edges:
            if distances[u] != float('inf') and distances[u] + w < distances[v]:
                distances[v] = distances[u] + w
    
    # NEGATIVE CYCLE CHECK: one more pass -- if any distance improves,
    # a negative cycle is reachable from the source
    for u, v, w in edges:
        if distances[u] != float('inf') and distances[u] + w < distances[v]:
            return None  # Negative cycle exists
    
    return distances
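The detection pass is easy to see inline on a made-up three-edge graph whose cycle totals -1 (edge values chosen purely for illustration):

```python
INF = float('inf')
edges = [(0, 1, 1), (1, 2, -3), (2, 0, 1)]   # cycle 0 -> 1 -> 2 -> 0 totals -1
n = 3
dist = [INF] * n
dist[0] = 0
for _ in range(n - 1):                        # the standard V-1 relaxation rounds
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
# One extra pass: any further improvement means a negative cycle is reachable
has_negative_cycle = any(
    dist[u] != INF and dist[u] + w < dist[v] for u, v, w in edges
)
print(has_negative_cycle)  # True
```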

3. Topological Sort (Kahn’s BFS)

Topological sort orders vertices of a DAG (directed acyclic graph) so that for every edge u -> v, u comes before v. Think of it as task scheduling where some tasks depend on others.

Kahn’s algorithm (BFS-based): Start with all nodes that have no incoming edges (in-degree 0). Process them, remove their outgoing edges (decrement neighbors’ in-degrees), and add any neighbor whose in-degree drops to 0. If the result includes all n nodes, the ordering is valid; otherwise a cycle exists.

Why Kahn’s over DFS-based? Kahn’s is iterative (no recursion), gives a natural BFS-level ordering, and makes cycle detection trivial (just check len(result) == n). The DFS approach uses postorder reversal and three-color marking.

Time: O(V + E). Space: O(V + E).
from collections import deque, defaultdict

def topological_sort(n, prerequisites):
    """Return valid ordering or empty list if cycle exists"""
    graph = defaultdict(list)
    in_degree = [0] * n
    
    # Build adjacency list and compute in-degrees
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1
    
    # Seed queue with all nodes that have no prerequisites (in-degree 0)
    queue = deque([i for i in range(n) if in_degree[i] == 0])
    result = []
    
    while queue:
        node = queue.popleft()
        result.append(node)
        
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1       # "Remove" this edge
            if in_degree[neighbor] == 0:   # All prerequisites satisfied
                queue.append(neighbor)
    
    # If result is shorter than n, a cycle prevented some nodes from being processed
    return result if len(result) == n else []

4. Kruskal’s MST

Kruskal’s builds a Minimum Spanning Tree by greedily picking the cheapest edge that does not form a cycle. It uses Union-Find (disjoint set) to efficiently detect whether two nodes are already connected.

How it works: Sort all edges by weight. Iterate through them in order. For each edge, if it connects two different components (Union-Find says their roots differ), add it to the MST and merge the components. Stop when the MST has V-1 edges.

Union-Find internals: Two optimizations make it near-O(1) per operation: path compression (during find, point every node directly to the root) and union by rank (attach the shorter tree under the taller one). Together they achieve amortized O(alpha(n)) per operation, where alpha is the inverse Ackermann function — effectively constant.

Time: O(E log E) dominated by sorting. Space: O(V) for Union-Find.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # Each node is its own root initially
        self.rank = [0] * n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # PATH COMPRESSION
        return self.parent[x]
    
    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # Already in the same component -- adding edge would create cycle
        # UNION BY RANK: attach smaller tree under larger tree
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

def kruskal(n, edges):
    """Find minimum spanning tree - O(E log E)"""
    # GREEDY: sort edges by weight (cheapest first)
    edges.sort(key=lambda x: x[2])
    
    uf = UnionFind(n)
    mst = []
    total_weight = 0
    
    for u, v, weight in edges:
        if uf.union(u, v):              # Only add if it connects two different components
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == n - 1:       # MST has exactly V-1 edges
                break
    
    return mst, total_weight

Common Mistakes

Avoid These Pitfalls:
  1. Using Dijkstra with negative edges: Dijkstra assumes that once a node is finalized, no shorter path exists. Negative edges break this assumption and produce incorrect results. Always check: if negative weights are possible, use Bellman-Ford.
  2. Forgetting the lazy deletion check in Dijkstra: Without if dist > distances[node]: continue, you process stale entries and waste time. On dense graphs, this can cause TLE.
  3. Topological sort on a graph with cycles: Topological ordering only exists for DAGs. If the graph has a cycle, Kahn’s algorithm will process fewer than n nodes — always check len(result) == n.
  4. Union-Find without path compression or union by rank: Without these optimizations, Union-Find degrades to O(n) per operation. Both are easy to implement and critical for performance.
  5. Bellman-Ford overflow: When distances[u] is infinity (INT_MAX), distances[u] + w can overflow in Java/C++. Always guard: if distances[u] != INF and distances[u] + w < distances[v].

Complexity Comparison

| Algorithm | Time | Space | Use Case |
| --- | --- | --- | --- |
| Dijkstra | O((V+E) log V) | O(V) | Non-negative weights, single source |
| Bellman-Ford | O(VE) | O(V) | Negative weights, negative cycle detection |
| Floyd-Warshall | O(V^3) | O(V^2) | All-pairs shortest paths (small V) |
| Kruskal | O(E log E) | O(V) | MST, works well with sparse graphs |
| Prim | O(E log V) | O(V) | MST, works well with dense graphs |
| Topological Sort | O(V + E) | O(V + E) | DAG ordering, dependency resolution |

Worked LeetCode Problems

These five cover the bulk of weighted-graph interview territory: pure Dijkstra (743), constrained shortest path that breaks Dijkstra (787), Dijkstra-on-bottleneck (1631), MST (1584), and topological sort with custom graph construction (269).

LC 743 — Network Delay Time (Medium, Dijkstra)

A signal originates at node k and propagates along directed edges with travel times. Return the time for ALL nodes to receive the signal — the maximum over the shortest paths from k.
import heapq
from collections import defaultdict

def networkDelayTime(times, n, k):
    graph = defaultdict(list)
    for u, v, w in times:
        graph[u].append((v, w))
    dist = {i: float('inf') for i in range(1, n+1)}
    dist[k] = 0
    pq = [(0, k)]                                      # (distance, node)
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist[node]:                             # Lazy deletion
            continue
        for nb, w in graph[node]:
            nd = d + w
            if nd < dist[nb]:
                dist[nb] = nd
                heapq.heappush(pq, (nd, nb))
    ans = max(dist.values())
    return ans if ans != float('inf') else -1
Time O((V + E) log V), space O(V + E). The if d > dist[node] line is critical — it skips stale heap entries instead of trying to decrease-key (which Python’s heapq does not support).

LC 787 — Cheapest Flights Within K Stops (Medium, Bellman-Ford)

Find the cheapest path with AT MOST k stops. Standard Dijkstra fails: a node could be reached cheaply but with too many stops, blocking the discovery of a pricier path that uses fewer stops and can therefore still be extended. Bellman-Ford with exactly K+1 rounds is the natural fit.
def findCheapestPrice(n, flights, src, dst, k):
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    for _ in range(k + 1):                             # K stops = K+1 edges
        new_dist = dist[:]                             # CRITICAL: snapshot
        for u, v, w in flights:
            if dist[u] != INF and dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
        dist = new_dist
    return dist[dst] if dist[dst] != INF else -1
Why the snapshot matters: in standard Bellman-Ford you relax in-place and chained relaxations within a round are fine — they just converge faster. Here, chaining within one round means you used MORE edges than the round number, breaking the K-edge guarantee. Always copy dist per round.

LC 1631 — Path With Minimum Effort (Medium, modified Dijkstra)

Move on a 2D grid; “effort” is the MAX absolute height difference along the path. Minimize the max — a bottleneck path. Dijkstra still works if you redefine “distance”: instead of summing edge weights, take the MAX along the path.
import heapq
def minimumEffortPath(heights):
    rows, cols = len(heights), len(heights[0])
    effort = [[float('inf')] * cols for _ in range(rows)]
    effort[0][0] = 0
    pq = [(0, 0, 0)]                                   # (effort, r, c)
    while pq:
        e, r, c = heapq.heappop(pq)
        if r == rows-1 and c == cols-1:
            return e
        if e > effort[r][c]:
            continue
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < rows and 0 <= nc < cols:
                ne = max(e, abs(heights[nr][nc] - heights[r][c]))
                if ne < effort[nr][nc]:
                    effort[nr][nc] = ne
                    heapq.heappush(pq, (ne, nr, nc))
    return 0
The pattern — Dijkstra with a custom “distance” combiner (MAX instead of SUM) — generalizes to “swim in rising water” (LC 778) and “path with minimum bandwidth.”

LC 1584 — Min Cost to Connect All Points (Medium, MST via Kruskal)

Connect all points with minimum total Manhattan-distance edges. Classic MST. Build all O(n^2) edges, sort by weight, Kruskal with Union-Find.
def minCostConnectPoints(points):
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i+1, n):
            d = abs(points[i][0]-points[j][0]) + abs(points[i][1]-points[j][1])
            edges.append((d, i, j))
    edges.sort()
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]              # Path compression
            x = parent[x]
        return x
    total, used = 0, 0
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used += 1
            if used == n - 1:
                break
    return total
Time O(n^2 log n) dominated by edge sort. For dense graphs Prim with a heap is O(E log V) = O(n^2 log n) — same. For sparse graphs Kruskal wins because you skip generating non-existent edges.

LC 269 — Alien Dictionary (Hard, topological sort)

Given a list of words from an alien language, sorted lexicographically by their unknown alphabet, derive a possible alphabet ordering.
from collections import defaultdict, deque

def alienOrder(words):
    graph = defaultdict(set)
    in_degree = {ch: 0 for word in words for ch in word}
    for w1, w2 in zip(words, words[1:]):
        for c1, c2 in zip(w1, w2):
            if c1 != c2:
                if c2 not in graph[c1]:
                    graph[c1].add(c2)
                    in_degree[c2] += 1
                break
        else:
            if len(w1) > len(w2):                      # Invalid: prefix violates
                return ""
    queue = deque([c for c, d in in_degree.items() if d == 0])
    result = []
    while queue:
        c = queue.popleft()
        result.append(c)
        for nb in graph[c]:
            in_degree[nb] -= 1
            if in_degree[nb] == 0:
                queue.append(nb)
    return ''.join(result) if len(result) == len(in_degree) else ""
Two trap inputs: (1) ["abc", "ab"]: "abc" cannot come before "ab" since "ab" is a prefix of "abc". Return "". (2) A cycle in the inferred graph (e.g., ["a", "b", "a"] forces both a -> b and b -> a). Detect via len(result) != len(in_degree).

Caveats and Traps

Trap 1 — Dijkstra fails with negative edge weights. Dijkstra’s greedy invariant — “once popped, distance is final” — only holds for non-negative edges. A negative edge from an unfinalized node could retroactively shorten a finalized node’s distance, but Dijkstra never revisits.
Solution: Use Bellman-Ford if any edge can be negative — O(V * E), and as a bonus it detects negative cycles. If the graph has special structure (a DAG), topological-sort + relaxation is O(V + E) and handles negative edges fine.
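The DAG topological-sort + relaxation approach mentioned above can be sketched as follows (a sketch; the function name and edge-list convention are mine):

```python
from collections import defaultdict, deque

def dag_shortest_paths(n, edges, start):
    """Single-source shortest paths on a DAG - O(V + E), negative edges OK."""
    graph = defaultdict(list)
    in_degree = [0] * n
    for u, v, w in edges:
        graph[u].append((v, w))
        in_degree[v] += 1
    # Kahn's pass to get a topological order
    queue = deque(i for i in range(n) if in_degree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    # Relax edges in topological order: by the time we reach a node,
    # every path into it has already been considered, so its distance is final
    INF = float('inf')
    dist = [INF] * n
    dist[start] = 0
    for u in order:
        if dist[u] == INF:
            continue                       # unreachable from start
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```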
Trap 2 — Forgetting lazy deletion in Dijkstra. Python’s heapq has no decrease-key. When you push a shorter distance for a node already in the heap, the OLD entry is still there. Without if d > dist[node]: continue, you process stale entries and waste time relaxing edges from already-finalized nodes.
Solution: Always include the lazy-deletion guard at the top of the pop loop. This single line is the difference between a clean O((V+E) log V) Dijkstra and a buggy one that times out.
Trap 3 — Bellman-Ford “K stops” without snapshotting distances. If you relax in-place with a single dist array, chained relaxations within one round can use multiple edges — breaking the “at most K edges” constraint. The cheapest flights problem WILL fail intermittently.
Solution: Make a copy of dist at the start of every round. Read from the previous-round dist, write to the new one. Swap at the end of the round. This guarantees each round adds exactly one edge to the path.
Trap 4 — Kruskal without Union-Find / using BFS to detect cycles. Some candidates implement Kruskal by, for each edge, doing a BFS from u to check if v is reachable. That is O(E * (V + E)) — too slow. Union-Find makes the cycle check O(alpha(n)) per query, giving O(E log E) overall (dominated by sort).
Solution: Always pair Kruskal with Union-Find (path compression + union by rank). The cycle check becomes “do u and v have the same root?” — near-O(1).
Trap 5 — Picking Kruskal vs Prim incorrectly for graph density. Kruskal is O(E log E) — best for sparse graphs. Prim with a binary heap is O(E log V); Prim with a Fibonacci heap is O(E + V log V) — best for dense graphs.
Solution: Sparse (E close to V): Kruskal. Dense (E close to V^2): Prim. In interviews, Kruskal is usually easier to implement (sort + UF), so default to it unless the graph is very dense and Prim’s heap-based loop is more natural.
Trap 6 — Floyd-Warshall loop order. The intermediate-vertex loop k must be the OUTERMOST. If you put k inside, the DP recurrence reads partially-computed values and produces wrong answers.
Solution: Memorize: for k: for i: for j: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). The order is k, i, j. Always.
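Floyd-Warshall is referenced throughout the chapter but never shown; a minimal sketch consistent with the k-outermost rule (the edge-list input format is my assumption):

```python
def floyd_warshall(n, edges):
    """All-pairs shortest paths - O(V^3). edges: (u, v, w) triples."""
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)    # keep the cheapest parallel edge
    # k MUST be the outermost loop: after round k, dist[i][j] is the shortest
    # path using only intermediate vertices 0..k
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```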

Curated LeetCode List

| Difficulty | Problem | LC # | Pattern |
| --- | --- | --- | --- |
| Easy | Find the Town Judge | 997 | In-degree / out-degree counting |
| Easy | Find Center of Star Graph | 1791 | Trivial degree analysis |
| Easy | Find if Path Exists in Graph | 1971 | BFS / DFS / Union-Find |
| Medium | Network Delay Time | 743 | Dijkstra |
| Medium | Cheapest Flights Within K Stops | 787 | Bellman-Ford with K rounds |
| Medium | Path With Minimum Effort | 1631 | Dijkstra with MAX combiner |
| Medium | Swim in Rising Water | 778 | Dijkstra-style or binary search + BFS |
| Medium | Min Cost to Connect All Points | 1584 | MST (Kruskal or Prim) |
| Medium | Course Schedule | 207 | Topological sort / cycle detection |
| Medium | Course Schedule II | 210 | Topological sort returning order |
| Medium | Number of Connected Components | 323 | Union-Find or DFS |
| Medium | Reconstruct Itinerary | 332 | Hierholzer’s Eulerian path |
| Medium | Evaluate Division | 399 | Weighted Union-Find or DFS with weights |
| Hard | Alien Dictionary | 269 | Topological sort, careful char graph |
| Hard | Word Ladder II | 126 | BFS for distance + DFS for paths |
| Hard | Critical Connections in a Network | 1192 | Tarjan’s bridges algorithm |
| Hard | Reachable Nodes In Subdivided Graph | 882 | Modified Dijkstra |
| Hard | Minimum Cost to Make at Least One Valid Path | 1368 | 0-1 BFS with deque |
Solve order: pure Dijkstra (743), then variants (1631, 778), then Bellman-Ford constrained (787), then MST (1584), then topo sort (207, 210, 269), then advanced (332, 1192).

Graph Algorithms Interview Q&A

Strong Answer Framework:
  1. Default to Dijkstra when all edges are non-negative — it is faster: O((V + E) log V) vs Bellman-Ford’s O(V * E).
  2. Switch to Bellman-Ford when any edge can be negative, or when you need to detect a negative cycle.
  3. Switch to Bellman-Ford for the “shortest path with at most K edges” variant — run only K rounds instead of V-1.
  4. Special case: 0/1 weighted edges -> 0-1 BFS with a deque, O(V + E).
  5. Special case: DAG -> topological sort + relaxation, O(V + E), handles negatives.
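The 0-1 BFS special case above can be sketched like this (function name and edge-list format are mine). Pushing a 0-edge neighbor to the front keeps it at the same distance "level", so the deque stays sorted like a two-bucket priority queue:

```python
from collections import defaultdict, deque

def zero_one_bfs(n, edges, start):
    """Shortest paths when every edge weight is 0 or 1 - O(V + E)."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
    INF = float('inf')
    dist = [INF] * n
    dist[start] = 0
    dq = deque([start])
    while dq:
        u = dq.popleft()
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                # 0-edges go to the front (same level), 1-edges to the back
                if w == 0:
                    dq.appendleft(v)
                else:
                    dq.append(v)
    return dist
```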
Real LeetCode Example — LC 743 Network Delay Time (Dijkstra):
import heapq
from collections import defaultdict
def networkDelayTime(times, n, k):
    graph = defaultdict(list)
    for u, v, w in times:
        graph[u].append((v, w))
    dist = {i: float('inf') for i in range(1, n+1)}
    dist[k] = 0
    pq = [(0, k)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist[node]: continue
        for nb, w in graph[node]:
            nd = d + w
            if nd < dist[nb]:
                dist[nb] = nd
                heapq.heappush(pq, (nd, nb))
    ans = max(dist.values())
    return ans if ans != float('inf') else -1
LC 787 Cheapest Flights (Bellman-Ford):
def findCheapestPrice(n, flights, src, dst, k):
    INF = float('inf')
    dist = [INF]*n
    dist[src] = 0
    for _ in range(k + 1):
        new_dist = dist[:]                             # Snapshot
        for u, v, w in flights:
            if dist[u] != INF and dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
        dist = new_dist
    return dist[dst] if dist[dst] != INF else -1
Senior follow-up 1 — Construct a graph where Dijkstra gives the wrong answer. 3 nodes, edges A->B weight 1, A->C weight 2, C->B weight -2. Finalize-on-pop Dijkstra settles B at distance 1, then pops C at distance 2; relaxing C->B would give B a distance of 0, but B is already finalized, so Dijkstra reports 1 while the true shortest path A->C->B costs 0. (A lazy-deletion implementation happens to recover here by re-pushing B, at the cost of reprocessing nodes, exponentially many times in the worst case.) Now add edge B->A weight -100: the cycle A->B->A has total weight -99, so no finite shortest paths exist at all, and Dijkstra cannot detect this. Bellman-Ford catches it: after V-1 rounds, doing one more pass and finding any further relaxation indicates a negative cycle.
Senior follow-up 2 — Johnson’s algorithm — what is it and when is it the right call? For all-pairs shortest paths in a sparse graph WITH negative edges. Step 1: add a virtual source s with weight-0 edges to every node. Step 2: run Bellman-Ford from s to compute h(v) for all v. Step 3: reweight every edge: w’(u,v) = w(u,v) + h(u) - h(v) — guaranteed non-negative. Step 4: run Dijkstra from each source on the reweighted graph. Total: O(VE + V(V+E) log V) — better than Floyd-Warshall’s O(V^3) on sparse graphs.
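The four Johnson's steps above, sketched (a sketch only; it reuses the Bellman-Ford and lazy-deletion Dijkstra patterns from earlier sections, and the function name is mine):

```python
import heapq

def johnson(n, edges):
    """All-pairs shortest paths with negative edges, no negative cycles.
    Returns None if a negative cycle exists. O(VE + V(V+E) log V)."""
    # Steps 1-2: virtual source n with 0-weight edges to every node;
    # Bellman-Ford from it yields the potential h(v).
    aug = edges + [(n, v, 0) for v in range(n)]
    h = [float('inf')] * (n + 1)
    h[n] = 0
    for _ in range(n):                     # augmented graph has n+1 nodes
        for u, v, w in aug:
            if h[u] + w < h[v]:            # inf + w stays inf in Python floats
                h[v] = h[u] + w
    if any(h[u] + w < h[v] for u, v, w in aug):
        return None                        # negative cycle detected
    # Step 3: reweight -- w'(u,v) = w + h[u] - h[v] is always >= 0
    graph = [[] for _ in range(n)]
    for u, v, w in edges:
        graph[u].append((v, w + h[u] - h[v]))
    # Step 4: Dijkstra from each source, then undo the potentials
    dist = [[float('inf')] * n for _ in range(n)]
    for s in range(n):
        d = [float('inf')] * n
        d[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:                  # lazy deletion
                continue
            for v, w in graph[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (du + w, v))
        for v in range(n):
            if d[v] < float('inf'):
                dist[s][v] = d[v] - h[s] + h[v]
    return dist
```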
Senior follow-up 3 — How would you parallelize Dijkstra at scale? Pure Dijkstra is sequential — the next node to process depends on the current heap state. Parallel variants exist (Delta-stepping, parallel SSSP) that batch-process nodes within a “delta band” of distances. Used in real systems like Pregel for billion-node graphs. For interview purposes: “Dijkstra is hard to parallelize; for massive scale I would use Bellman-Ford or Delta-stepping with Pregel-style supersteps.”
Common Wrong Answers:
  1. Using Dijkstra for the K-stops problem. Dijkstra finalizes nodes greedily; once it commits to “shortest distance” for a node, it cannot revisit even if more stops would yield a cheaper route. You get wrong answers.
  2. Saying “Bellman-Ford is just slower Dijkstra.” It handles negative weights; Dijkstra does not. They solve different problems.
  3. Forgetting that Bellman-Ford detects negative cycles as a bonus. This is a unique capability — using it interchangeably with Dijkstra misses the point.
Further Reading: LC 743, LC 787, LC 1786 (Number of Restricted Paths).
Strong Answer Framework:
  1. Both produce a Minimum Spanning Tree of the same minimum total weight — the total is identical, though the trees themselves can differ when edge weights tie.
  2. Kruskal: sort all edges by weight, scan in order, use Union-Find to skip cycle-creating edges. O(E log E).
  3. Prim: start from any node, maintain a heap of edges crossing the cut, pop the lightest. O(E log V) with binary heap, O(E + V log V) with Fibonacci heap.
  4. Kruskal wins on sparse graphs (E close to V) — sorting fewer edges is cheaper.
  5. Prim wins on dense graphs (E close to V^2) — avoids sorting all edges.
  6. Both rely on the cut property: the lightest edge crossing any cut is in some MST.
Real LeetCode Example — LC 1584 Min Cost to Connect All Points (Kruskal):
def minCostConnectPoints(points):
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i+1, n):
            d = abs(points[i][0]-points[j][0]) + abs(points[i][1]-points[j][1])
            edges.append((d, i, j))
    edges.sort()
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total, used = 0, 0
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used += 1
            if used == n - 1: break
    return total
Same problem with Prim:
import heapq
def minCostConnectPoints_prim(points):
    n = len(points)
    in_mst = [False]*n
    min_edge = [float('inf')]*n
    min_edge[0] = 0
    total = 0
    pq = [(0, 0)]
    while pq:
        w, u = heapq.heappop(pq)
        if in_mst[u]: continue
        in_mst[u] = True
        total += w
        for v in range(n):
            if not in_mst[v]:
                d = abs(points[u][0]-points[v][0]) + abs(points[u][1]-points[v][1])
                if d < min_edge[v]:
                    min_edge[v] = d
                    heapq.heappush(pq, (d, v))
    return total
Senior follow-up 1 — An edge weight in the graph decreases by 5 after the MST is built. How do you efficiently update the MST? If the edge is already in the tree, the MST stays optimal; its total simply drops by 5. If it is a non-tree edge, adding it creates exactly one cycle. Find and remove the heaviest edge in that cycle. If the heaviest is the new edge itself, the MST does not change. Finding the heaviest edge on a tree path is O(V) naively, O(log V) with link-cut trees. Decremental MST is a well-studied problem.
Senior follow-up 2 — Prove that the lightest edge in the graph is always in some MST. Cut property: take any spanning tree T that does NOT contain the lightest edge e=(u,v). Adding e creates a cycle. Removing any edge e’ on that cycle other than e gives a spanning tree of strictly lower weight (since e is lightest). Contradiction with T being minimum unless T already contains e. So every MST contains e (when weights are unique).
Senior follow-up 3 — Minimum spanning forest for a graph with multiple components. Kruskal handles this naturally: after processing all edges, you have the minimum spanning forest. The number of trees in the forest is V minus the number of edges added. Prim needs explicit handling: run Prim from each unvisited node and union the results.
Common Wrong Answers:
  1. Using BFS or DFS for cycle detection in Kruskal. O(V+E) per edge -> O(E(V+E)) total. Use Union-Find for O(alpha(n)) per check.
  2. Picking Prim for sparse graphs because “it sounds fancier.” On sparse graphs E is close to V, so Kruskal’s O(E log E) sort is cheap and its implementation (sort + Union-Find) is simpler.
  3. Confusing MST with shortest path tree. MST minimizes TOTAL weight; SPT minimizes distance from a single source. They are different trees.
Further Reading: LC 1584, LC 1135 (Connecting Cities With Minimum Cost), LC 1168 (Optimize Water Distribution).
Strong Answer Framework:
  1. Recognize: ordering with dependencies -> topological sort on a DAG.
  2. Kahn’s BFS: compute in-degrees, seed queue with all in-degree-zero nodes, process and decrement neighbors’ in-degrees, push when in-degree hits zero.
  3. DFS postorder: run DFS from each unvisited node, append node to result when finished, reverse the result at the end.
  4. Cycle detection: Kahn’s — if len(result) < V, cycle exists. DFS — three-color marking, GRAY hit means cycle.
  5. Default to Kahn’s: simpler, naturally iterative, fewer bugs.
Real LeetCode Example — LC 207 / 210 Course Schedule II (Kahn’s):
from collections import defaultdict, deque
def findOrder(numCourses, prerequisites):
    graph = defaultdict(list)
    in_degree = [0]*numCourses
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1
    queue = deque([i for i in range(numCourses) if in_degree[i] == 0])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            in_degree[nb] -= 1
            if in_degree[nb] == 0:
                queue.append(nb)
    return order if len(order) == numCourses else []
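For comparison, the DFS-postorder alternative mentioned in the framework, with three-color cycle detection (a sketch; the color constants are my naming):

```python
def findOrder_dfs(numCourses, prerequisites):
    """Topological order via DFS postorder; [] if a cycle exists."""
    graph = [[] for _ in range(numCourses)]
    for course, prereq in prerequisites:
        graph[prereq].append(course)
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited / on the stack / done
    color = [WHITE] * numCourses
    order = []

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:           # back edge to the stack -> cycle
                return False
            if color[v] == WHITE and not dfs(v):
                return False
        color[u] = BLACK
        order.append(u)                    # postorder: append when finished
        return True

    for u in range(numCourses):
        if color[u] == WHITE and not dfs(u):
            return []
    return order[::-1]                     # reverse postorder = topo order
```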
Senior follow-up 1 — Find the lexicographically smallest valid ordering. Replace the BFS queue with a min-heap. Whenever multiple nodes have in-degree 0 simultaneously, the heap pops the smallest first. Time becomes O((V + E) log V). This works because the topo-order constraint is preserved regardless of which in-degree-zero node you pick first.
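That min-heap swap can be sketched directly on the Kahn's template (function name is mine; edges are (before, after) pairs):

```python
import heapq
from collections import defaultdict

def smallest_topo_order(n, edges):
    """Lexicographically smallest topological order - O((V+E) log V)."""
    graph = defaultdict(list)
    in_degree = [0] * n
    for u, v in edges:                     # edge u -> v: u must come before v
        graph[u].append(v)
        in_degree[v] += 1
    heap = [i for i in range(n) if in_degree[i] == 0]
    heapq.heapify(heap)                    # min-heap instead of FIFO queue
    order = []
    while heap:
        u = heapq.heappop(heap)            # smallest currently-available node
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                heapq.heappush(heap, v)
    return order if len(order) == n else []
```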
Senior follow-up 2 — Maximum parallelism in a build system. Run Kahn’s level by level; at each step the number of nodes with in-degree zero is the parallelism available for that step. Track the max. More formally, the maximum possible parallelism is the width of the DAG (the size of its largest antichain) — by Dilworth’s theorem, equal to the minimum number of chains needed to cover the DAG. Computable in polynomial time via min-cut on a bipartite reduction.
Senior follow-up 3 — All valid topological orderings. Kahn’s gives one valid order; enumerating all is much harder. Backtrack: at each step, try every available in-degree-zero node, recurse, restore. The number of valid orderings can be exponential — for a DAG that is just N independent nodes, there are N! orderings. For large DAGs you typically want one canonical order, not enumeration.
Common Wrong Answers:
  1. Using BFS shortest-path style. Topological sort is not a shortest-path problem; the distance metric is irrelevant.
  2. Returning the result without checking for cycle. If len(result) < V, the input has a cycle — return empty / report invalid.
  3. Building edges in the wrong direction. For prerequisites, [a, b] means b is a prerequisite of a -> edge b -> a. Reversing this silently flips the entire ordering.
Further Reading: LC 207, LC 210, LC 269 (Alien Dictionary), LC 444 (Sequence Reconstruction).

Interview Questions

What interviewers are really testing: Do you actually understand the greedy invariant that makes Dijkstra correct, or have you just memorized the code? Explaining the negative weight failure is the litmus test for genuine understanding versus template recall.
Strong Answer:
  • Core idea. “Dijkstra is a greedy single-source shortest path algorithm. It processes nodes in order of their current shortest known distance, using a min-heap. Once a node is popped from the heap, its distance is finalized — we will never find a shorter path to it.”
  • Why the greedy works. With non-negative weights, any path that goes through an unvisited node must be at least as long as the direct distance we already have. If I have finalized node A at distance 5, any path to A through some unvisited node B must cost at least dist(B) + edge(B, A), and since dist(B) >= 5 (it is still in the heap) and the edge is non-negative, the total is >= 5. So the greedy choice is safe.
  • How negatives break it. With negative edges, a node finalized at distance 5 might later be reachable at distance 3 via a path that includes a negative edge from an unfinalized node. The greedy invariant — “once popped, it is settled” — collapses. Concretely: node A popped at distance 5, then later we discover edge B->A with weight -10 and B was at distance 12, giving A a new distance of 2. But we already declared A done.
  • What to use instead. Bellman-Ford handles negative weights by relaxing all edges V-1 times, costing O(VE) but correctly handling negative edges and detecting negative cycles.
  • Complexity reminder. O((V+E) log V) with a binary heap, O(V^2) with a simple array (which is actually better for dense graphs).
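The pop-and-finalize loop described above can be sketched with a binary heap and lazy deletion (a sketch, not a hardened implementation; `adj[u]` is assumed to be a list of (v, w) pairs with w >= 0):

```python
import heapq

def dijkstra(n, adj, src):
    # Min-heap Dijkstra with lazy deletion: stale heap entries are
    # skipped on pop. A node's distance is final the first time it
    # is popped -- the invariant that non-negative weights guarantee.
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:            # stale entry: u already finalized
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Python's `heapq` has no decrease-key, so lazy deletion (push duplicates, skip stale pops) is the idiomatic workaround.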
Red flag answer: “Dijkstra does not work with negative weights because… it just does not.” Not being able to give a concrete example of how negative weights cause an incorrect result. Saying Dijkstra uses BFS (it does not — it uses a priority queue, which is a critical distinction).
Follow-ups:
  1. Can you construct a small graph (3-4 nodes) where Dijkstra gives the wrong answer due to a negative edge? Walk me through step by step. (Tests whether they can actually demonstrate the failure, not just state it.)
  2. What about negative edges but no negative cycles? Is there a way to adapt Dijkstra to handle that case? (Answer: yes, Johnson’s algorithm re-weights all edges to be non-negative using a potential function derived from a Bellman-Ford run, then runs Dijkstra from each source. O(VE + V(V+E) log V).)
What interviewers are really testing: This is a cleverness and breadth test. Candidates who only know Dijkstra will use it and get O((V+E) log V). Strong candidates recognize that 0-1 BFS with a deque gives O(V+E) — the same complexity as plain BFS. This separates people who understand why algorithms work from those who pattern-match.
Strong Answer:
  • The trick: 0-1 BFS. “When edge weights are only 0 or 1, I use a deque instead of a priority queue. Zero-weight edges push to the front of the deque, one-weight edges push to the back. This maintains the sorted order that Dijkstra needs, but without the log V overhead of a heap.”
  • Why it works. The deque naturally maintains the invariant that elements are in non-decreasing order of distance. When we add a node via a 0-weight edge, its distance equals the current node’s distance — so it belongs at the front. When we add via a 1-weight edge, its distance is current + 1 — so it belongs at the back. This is exactly a BFS-like level ordering.
  • Complexity. O(V + E) time, O(V) space. Compared to Dijkstra’s O((V+E) log V), we eliminate the log factor entirely.
  • When to recognize this pattern. Any graph whose edge weights take only two values {0, w}: divide by w and you have 0/1 weights. (Two nonzero values such as {2, 3} do not normalize to 0/1 in general, since different paths use different edge counts.) Also works for problems like “minimum flips” in a grid where each move has cost 0 or 1.
  • Real example. Grid problems where moving in the same direction is free (cost 0) but turning has cost 1.
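A sketch of the deque trick (function name illustrative; `adj[u]` assumed to be a list of (v, w) pairs with w in {0, 1}):

```python
from collections import deque

def zero_one_bfs(n, adj, src):
    # Single-source shortest paths when every edge weight is 0 or 1.
    # Deque invariant: distances in the deque are non-decreasing, so
    # popping from the front always yields a minimal-distance node.
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)   # same distance: front
                else:
                    dq.append(v)       # distance + 1: back
    return dist
```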
Red flag answer: Saying “just use Dijkstra” without recognizing the optimization opportunity. Not knowing what a deque-based BFS is. Saying “use BFS” without explaining how to handle the 0-weight edges differently from 1-weight edges.
Follow-ups:
  1. Can you extend this to weights of 0, 1, and 2? What about 0 through K? (Answer: for weights 0 to K, use a bucket queue with K+1 circular buckets (Dial’s algorithm). Complexity becomes O(E + K*V). For small K, this beats Dijkstra.)
  2. How would you implement this for a grid where moving forward costs 0 but turning costs 1, to find the minimum number of turns from corner to corner? (Tests whether they can apply the 0-1 BFS idea to an actual problem encoding.)
What interviewers are really testing: Trade-off reasoning and algorithm selection judgment. Interviewers want to see that you do not just know these algorithms in isolation — you understand the decision matrix for choosing between them and know the full landscape of shortest path options.
Strong Answer:
  • Dijkstra when: All edge weights are non-negative. You need single-source shortest paths efficiently. O((V+E) log V) with a binary heap makes it the default choice for road networks, navigation systems, and most “find the shortest path” interview problems.
  • Bellman-Ford when: The graph has negative edge weights. You need to detect negative cycles. The edge list representation is more natural than adjacency lists (Bellman-Ford iterates over edges, not neighbors). Also critical for the “cheapest flights within K stops” variant, where you run only K relaxation rounds instead of V-1.
  • Neither is best when: You need all-pairs shortest paths — use Floyd-Warshall at O(V^3) for dense graphs or run Dijkstra from each vertex for sparse graphs. For unweighted graphs, plain BFS at O(V+E) beats both. For DAGs, topological sort + relaxation gives O(V+E) even with negative weights.
  • The underrated option. For DAGs, you can compute shortest paths in O(V+E) by processing nodes in topological order and relaxing edges. This works with negative weights and is faster than both Dijkstra and Bellman-Ford. Most candidates forget this exists.
  • SPFA note. SPFA (Shortest Path Faster Algorithm) is a queue-based optimization of Bellman-Ford that is often faster in practice (average O(E)) but has the same O(VE) worst case. Some competitive programmers default to SPFA, but it can be killed by adversarial inputs.
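The DAG shortest-path option from the bullets above is short enough to sketch in full (function name illustrative; `adj[u]` assumed to be (v, w) pairs and the graph assumed acyclic):

```python
from collections import deque

def dag_shortest_paths(n, adj, src):
    # Single-source shortest paths on a DAG in O(V + E): compute a
    # topological order with Kahn's, then relax each node's out-edges
    # once in that order. Negative weights are fine -- no cycles exist.
    indeg = [0] * n
    for u in range(n):
        for v, _ in adj[u]:
            indeg[v] += 1
    q = deque(u for u in range(n) if indeg[u] == 0)
    topo = []
    while q:
        u = q.popleft()
        topo.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    for u in topo:                    # relax in topological order
        if dist[u] == INF:
            continue                  # unreachable: nothing to relax
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```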
Red flag answer: Only knowing Dijkstra and Bellman-Ford without mentioning BFS for unweighted graphs or topological sort for DAGs. Saying “Bellman-Ford is always slower so just use Dijkstra.” Not mentioning negative cycle detection as a unique capability of Bellman-Ford.
Follow-ups:
  1. You are building a navigation system for a city with 10 million intersections. Which algorithm do you use and what data structure optimizations would you consider? (Answer: Dijkstra with a Fibonacci heap for O(V log V + E) theoretically, though in practice a binary heap or a 4-ary heap is faster. Also consider A* with a good heuristic, bidirectional Dijkstra, or contraction hierarchies for repeated queries.)
  2. How does the “Cheapest Flights Within K Stops” problem modify Bellman-Ford, and why can you not just use Dijkstra with a visited-with-stops check? (Answer: you run Bellman-Ford for only K+1 rounds instead of V-1. Dijkstra with a stops counter does not work cleanly because a node can be reached in fewer stops at a higher cost, and you need to keep both options open.)
What interviewers are really testing: Topological sort is used constantly in build systems, task schedulers, and dependency resolution. Interviewers want to know if you understand the concept beyond course scheduling examples — can you reason about uniqueness, cycle detection, and real applications?
Strong Answer:
  • What it is. “A topological ordering of a DAG is a linear sequence of vertices such that for every directed edge u->v, u appears before v. It is only possible on directed acyclic graphs.”
  • Two approaches. Kahn’s algorithm (BFS-based): compute in-degrees, start with nodes at in-degree 0, process them and decrement neighbors’ in-degrees. DFS-based: run DFS and add nodes to a stack in post-order (when you finish processing all of a node’s neighbors), then reverse the stack. Both are O(V+E).
  • Multiple valid orderings? Yes, whenever there are multiple nodes with in-degree 0 at the same time in Kahn’s algorithm. For example, if courses A and B have no prerequisites, either can come first. The number of valid topological orderings can be exponentially large.
  • Detecting impossibility (cycles). With Kahn’s: if the result contains fewer than V nodes, a cycle exists — some nodes never reached in-degree 0 because they are in a cycle. With DFS: detect a back edge (an edge to a node currently on the recursion stack, typically tracked with a “gray” color in the white/gray/black coloring scheme).
  • Uniqueness check. The topological order is unique if and only if the queue in Kahn’s algorithm never contains more than one node at any step. This implies a Hamiltonian path exists in the DAG. You can check this during the BFS.
Red flag answer: Confusing topological sort with graph traversal order from BFS/DFS. Saying a topological sort is “just DFS.” Not knowing how to detect cycles as a byproduct of the algorithm. Not being able to explain the in-degree approach.
Follow-ups:
  1. You have a build system with 50,000 packages and their dependency graph. How do you determine the maximum parallelism — the most packages you can build simultaneously? (Answer: run a level-synchronized Kahn’s and track the largest set of simultaneously in-degree-zero nodes; that is the peak parallelism of a level-by-level schedule. More formally, the answer is the size of the largest antichain in the DAG — its width — which by Dilworth’s theorem equals the minimum chain cover and can be computed via a bipartite matching reduction.)
  2. Can you find the longest path in a DAG? How? (Answer: process nodes in topological order, relax edges to compute the longest distance instead of shortest. O(V+E). This is impossible efficiently on general graphs — longest path is NP-hard on graphs with cycles.)
What interviewers are really testing: Most candidates can recite the steps of Kruskal’s algorithm. The real test is whether you can explain why the greedy choice is correct — this requires understanding the cut property or the cycle property of MSTs. This separates memorizers from people who actually understand graph theory.
Strong Answer:
  • The algorithm. “Sort all edges by weight. Process them in order from lightest to heaviest. For each edge, if it connects two different components (check with Union-Find), add it to the MST. Otherwise skip it. Stop when we have V-1 edges.”
  • Why the greedy works — the cut property. For any cut (partition of vertices into two sets S and V-S), the minimum weight edge crossing the cut must be in every MST (assuming unique weights). Kruskal implicitly uses this: when it picks the cheapest edge connecting two different components, that edge is the minimum weight edge across the cut between those two components.
  • Why Union-Find is the right data structure. Each “is this edge connecting two components?” query is a find() operation, and each “merge components” is a union() operation. With path compression and union by rank, each operation is nearly O(1) — amortized O(alpha(n)) where alpha is the inverse Ackermann function, effectively constant.
  • Complexity. O(E log E) dominated by the sort. The Union-Find operations total O(E * alpha(V)) which is essentially O(E). Since E can be at most V^2, O(E log E) = O(E log V^2) = O(2E log V) = O(E log V).
  • When Kruskal beats Prim. Kruskal is better for sparse graphs (E close to V) because sorting fewer edges is cheap. Prim’s with a binary heap is O(E log V) as well, but Prim’s with a Fibonacci heap achieves O(E + V log V) which wins on dense graphs.
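A compact sketch of the algorithm with Union-Find (function name illustrative; `edges` assumed to be (w, u, v) triples so `sorted` orders by weight first):

```python
def kruskal(n, edges):
    # Sort edges by weight; add each edge that joins two different
    # components (Union-Find with path halving + union by size).
    # Returns (total_weight, mst_edges), or None if disconnected.
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                    # same component: cycle
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        return True

    total, mst = 0, []
    for w, u, v in sorted(edges):           # lightest edge first
        if union(u, v):
            total += w
            mst.append((u, v, w))
    return (total, mst) if len(mst) == n - 1 else None
```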
Red flag answer: Saying “it works because greedy always works” without invoking the cut property or cycle property. Not knowing what Union-Find is or why it is needed. Confusing Kruskal’s with Prim’s algorithm.
Follow-ups:
  1. What if some edges have equal weights? Can the MST be non-unique? How does this affect Kruskal’s? (Answer: yes, the MST can be non-unique when ties exist. Different tie-breaking orders in the sort can produce different MSTs, but all will have the same total weight. The MST is unique if and only if all edge weights are distinct.)
  2. After building an MST, one edge weight in the original graph decreases. How do you efficiently update the MST without recomputing from scratch? (Answer: add the edge to the MST, which creates exactly one cycle. Find and remove the heaviest edge in that cycle. This can be done in O(V) with a path query on the MST. For repeated updates, link-cut trees give O(log V) per update.)
What interviewers are really testing: Cycle detection is deceptively different for directed vs. undirected graphs, and many candidates apply the wrong technique. This tests whether you understand why back edges mean different things in each case and whether you know multiple detection strategies.
Strong Answer:
  • Undirected graphs — Union-Find or DFS. With Union-Find: for each edge (u, v), check if u and v are already in the same component. If yes, adding this edge creates a cycle. With DFS: if we visit a neighbor that is already visited and it is not the parent of the current node, we found a cycle. The parent check is critical — without it, the edge we just traversed from would be falsely flagged.
  • Directed graphs — DFS with three colors. White (unvisited), Gray (on the current recursion stack), Black (fully processed). A cycle exists if and only if we encounter a Gray node during DFS — this is a back edge, meaning we have found a path from that node back to itself. A cross edge to a Black node is fine — it does not indicate a cycle.
  • Why the distinction matters. In an undirected graph, edge (A, B) is traversed both ways, so we need the parent check to avoid false positives. In a directed graph, edge A->B does not imply B->A, so cross edges (to fully processed nodes in a different subtree) are harmless. Only back edges (to ancestors in the current DFS path) create cycles.
  • Alternative for directed graphs. Kahn’s topological sort: if the result has fewer than V nodes, a cycle exists. This is sometimes cleaner than DFS coloring when you are already computing a topological order.
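The three-color DFS for directed graphs can be sketched as follows (function name illustrative; adjacency lists, recursion assumed to fit the stack):

```python
def has_cycle_directed(n, adj):
    # White = unvisited, Gray = on the current recursion stack,
    # Black = fully processed. A cycle exists iff DFS ever reaches
    # a Gray node (a back edge). Edges to Black nodes are harmless.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:            # back edge: cycle found
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK                    # fully processed
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n))
```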
Red flag answer: Using the same cycle detection approach for both directed and undirected graphs. Saying “just check if any node is visited twice” (this is insufficient for directed graphs — visiting a node twice via different paths is not a cycle). Not knowing the three-color DFS technique.
Follow-ups:
  1. Can you detect the length of the shortest cycle in an undirected graph? What is the complexity? (Answer: run BFS from each vertex; when BFS sees an edge (u, v) whose endpoint v is already visited and is not u’s BFS parent, a cycle of length at most dist[u] + dist[v] + 1 exists. Taking the minimum over all start vertices gives the answer. Total: O(V(V+E)). The length of the shortest cycle is called the “girth.”)
  2. In a directed graph, how would you find all nodes that are part of some cycle? (Answer: compute strongly connected components using Tarjan’s or Kosaraju’s algorithm. Any SCC with more than one node contains a cycle, and all nodes in it are part of a cycle. O(V+E).)
What interviewers are really testing: Do you understand the all-pairs shortest path problem space? Can you reason about when O(V^3) is acceptable versus running V instances of Dijkstra? This also tests whether you understand the DP formulation behind Floyd-Warshall, which is elegant and non-obvious.
Strong Answer:
  • What it is. “Floyd-Warshall computes shortest paths between all pairs of vertices in O(V^3) time and O(V^2) space. It uses a dynamic programming approach: for each intermediate vertex k, update dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). The key insight is iterating over intermediate vertices in the outer loop.”
  • The DP formulation. dist_k[i][j] = shortest path from i to j using only vertices {0, 1, ..., k} as intermediates. The recurrence: either the shortest path through {0..k} uses vertex k (then it is dist_{k-1}[i][k] + dist_{k-1}[k][j]) or it does not (then it is dist_{k-1}[i][j]). We can update in-place because dist[i][k] and dist[k][j] do not change when we are processing vertex k.
  • Floyd-Warshall vs. V times Dijkstra. Floyd-Warshall: O(V^3), handles negative weights (no negative cycles), extremely simple to implement (three nested loops). V times Dijkstra: O(V(V+E) log V) with a binary heap. For dense graphs (E ~ V^2), Dijkstra gives O(V^3 log V) — worse than Floyd-Warshall. For sparse graphs (E ~ V), Dijkstra gives O(V^2 log V) — much better. Rule of thumb: Floyd-Warshall for dense graphs or graphs with negative weights, repeated Dijkstra for sparse graphs.
  • Negative cycle detection. After running Floyd-Warshall, if any dist[i][i] < 0, a negative cycle exists through vertex i.
  • The simplicity advantage. Floyd-Warshall is 5 lines of code. In an interview or contest, when you need all-pairs and the graph is not huge, it is often the pragmatic choice.
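The triple loop in full (a sketch; function name illustrative, `edges` as (u, v, w) triples, and the k loop outermost as required):

```python
def floyd_warshall(n, edges):
    # All-pairs shortest paths. After iteration k of the outer loop,
    # dist[i][j] is optimal over all paths whose intermediate
    # vertices are drawn from {0..k}.
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)     # keep the lightest parallel edge
    for k in range(n):                      # intermediate vertex: outermost
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```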
Red flag answer: Not knowing Floyd-Warshall exists or confusing it with Bellman-Ford. Getting the loop order wrong (the k loop must be outermost). Saying “just run Dijkstra from every node” without considering negative weights or graph density.
Follow-ups:
  1. Why must the k-loop (intermediate vertex loop) be the outermost loop? What goes wrong if you put it inside? (Answer: the DP recurrence requires that when computing paths through vertex k, all paths through vertices 0..k-1 are already finalized. If k is not the outer loop, you use partially computed values and get wrong answers.)
  2. Floyd-Warshall uses O(V^2) space. Can you reconstruct the actual shortest path, not just the distance? How? (Answer: maintain a next[i][j] matrix. Initialize next[i][j] = j for each edge. When updating dist[i][j] through k, set next[i][j] = next[i][k]. To reconstruct the path from i to j, follow next[i][j], next[next[i][j]][j], etc. Same O(V^2) space.)
What interviewers are really testing: This is a system design question disguised as a graph algorithm question. It tests whether you can bridge the gap from textbook algorithms to real-world constraints: the graph does not fit in memory, you cannot run Dijkstra on 500M nodes per query, and latency matters. This is a staff-level question.
Strong Answer:
  • The naive approach fails. Standard BFS on a graph with 500M nodes and perhaps 100B edges (average 200 friends per user) requires traversing potentially millions of nodes per query. At scale, this is too slow for real-time responses.
  • Bidirectional BFS. Start BFS from both the source and the target simultaneously. When the two frontiers meet, you have found the shortest path. This reduces the search space from O(b^d) to O(2 * b^(d/2)) where b is the branching factor and d is the distance. For a social network with average 200 friends and 6 degrees of separation, this drops from 200^6 (~64 trillion) to 2 * 200^3 (~16 million) — a massive improvement.
  • Graph partitioning. The full graph does not fit on one machine. Partition users across shards (e.g., by geography or by community detection). Store each partition’s subgraph on a separate server. Cross-partition edges are handled with remote calls. Tools like Pregel, Apache Giraph, or custom graph databases (like Facebook’s TAO) handle this.
  • Pre-computation and caching. For “degrees of separation” queries, pre-compute BFS layers for high-degree nodes (celebrities, influencers). Cache shortest distances for frequently queried pairs. Use approximate answers: “within 4 hops” is often good enough.
  • The LinkedIn approach. LinkedIn computes “degrees of connection” using a combination of pre-computed graph indices and bidirectional BFS at query time, with aggressive caching of intermediate results. They serve this in under 200ms for their 900M+ member graph.
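A single-machine sketch of bidirectional BFS (function name illustrative; `adj` is an adjacency dict, and a production version would need sharded storage and more careful frontier bookkeeping):

```python
def bidirectional_bfs(adj, src, dst):
    # Degrees of separation in an unweighted graph. Expand the
    # smaller frontier each round; when a newly generated node is
    # already known to the other side, the frontiers have met.
    if src == dst:
        return 0
    seen_s, seen_t = {src: 0}, {dst: 0}
    front_s, front_t = {src}, {dst}
    while front_s and front_t:
        if len(front_s) > len(front_t):     # grow the cheaper side
            front_s, front_t = front_t, front_s
            seen_s, seen_t = seen_t, seen_s
        nxt, best = set(), None
        for u in front_s:
            for v in adj.get(u, ()):
                if v in seen_t:             # meeting point found
                    cand = seen_s[u] + 1 + seen_t[v]
                    best = cand if best is None else min(best, cand)
                elif v not in seen_s:
                    seen_s[v] = seen_s[u] + 1
                    nxt.add(v)
        if best is not None:
            return best                     # min over the whole wave
        front_s = nxt
    return -1                               # disconnected
```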
Red flag answer: Saying “just run Dijkstra” without considering the scale. Not mentioning bidirectional BFS. Treating this purely as an algorithm problem without discussing distributed storage, caching, or approximate answers. Not recognizing that the graph is unweighted (so BFS is sufficient, no need for Dijkstra).
Follow-ups:
  1. What if you need to find not just the shortest path but the top 5 shortest paths? How does this change your approach? (Answer: Yen’s algorithm finds K shortest simple paths but is expensive — O(KV(V+E)). At scale, you would approximate: run bidirectional BFS, find the first few meeting points, and reconstruct paths through those points. Exact K-shortest paths at 500M scale likely requires offline pre-computation.)
  2. How do you handle the graph changing in real-time — users adding and removing friends? (Answer: eventual consistency is acceptable for social graphs. Maintain a write-ahead log for graph mutations, batch-apply updates to the graph index periodically, and serve slightly stale data. Real-time freshness for “are we connected?” is less critical than for, say, a feed ranking system.)
What interviewers are really testing: SCCs are a topic that separates candidates who have studied graph theory seriously from those who only know BFS/DFS/Dijkstra. This tests depth of knowledge and the ability to explain a non-trivial two-pass algorithm clearly.
Strong Answer:
  • Definition. “A strongly connected component (SCC) of a directed graph is a maximal set of vertices such that there is a directed path from every vertex to every other vertex in the set. In other words, within an SCC, you can reach any node from any other node.”
  • Kosaraju’s algorithm (two-pass DFS). Pass 1: Run DFS on the original graph, recording the finish order of nodes (push to a stack when done). Pass 2: Transpose the graph (reverse all edges). Pop nodes from the stack and run DFS on the transposed graph. Each DFS in pass 2 discovers one SCC. O(V+E) total.
  • Why it works. The first DFS orders nodes so that the “root” of each SCC’s DFS tree finishes last. Reversing edges means that starting from these roots in reverse finish order, the DFS can only reach nodes within the same SCC — the reversed edges prevent crossing into other SCCs.
  • Tarjan’s algorithm (single-pass). Uses a single DFS with a stack and tracking of “low-link” values. When a node’s low-link equals its discovery time, it is the root of an SCC — pop everything down to it from the stack. Also O(V+E) but in a single pass with more complex bookkeeping.
  • Real-world use cases. Compiler optimization: finding cycles in call graphs to determine which functions are mutually recursive. Web crawling: SCC analysis of web links reveals tightly connected clusters of pages. Dependency analysis: in a package manager, an SCC in the dependency graph means circular dependencies that must be resolved together.
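A recursive sketch of Kosaraju's two passes (function name illustrative; fine for small graphs, but large inputs would need an iterative DFS to avoid Python's recursion limit):

```python
def kosaraju_scc(n, adj):
    # Pass 1: DFS on the original graph, recording postorder finish
    # times. Pass 2: DFS on the transposed graph in reverse finish
    # order; each tree found is one SCC. Returns comp[u] = SCC id.
    order, seen = [], [False] * n

    def dfs1(u):
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)            # u finishes after all descendants

    for u in range(n):
        if not seen[u]:
            dfs1(u)

    radj = [[] for _ in range(n)]  # transpose: reverse every edge
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)

    comp = [-1] * n

    def dfs2(u, c):
        comp[u] = c
        for v in radj[u]:
            if comp[v] == -1:
                dfs2(v, c)

    c = 0
    for u in reversed(order):      # SCC roots come off last-first
        if comp[u] == -1:
            dfs2(u, c)
            c += 1
    return comp
```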
Red flag answer: Confusing SCCs with connected components in undirected graphs. Not knowing any SCC algorithm. Describing the algorithm steps without being able to explain why reversing edges and processing in reverse finish order works.
Follow-ups:
  1. After finding all SCCs, if you contract each SCC into a single node, what kind of graph do you get? (Answer: a directed acyclic graph — the “condensation” of the original graph. This is extremely useful because you can now run topological sort and DP on the condensation. For example, finding the longest path in a directed graph with cycles reduces to finding SCCs, condensing, then doing longest-path DP on the DAG.)
  2. Can you find SCCs in a graph that does not fit in memory? (Answer: this is an active research area. External-memory SCC algorithms exist but are significantly more complex. One approach: partition the graph, compute SCCs within partitions, then iteratively merge across partitions. In practice, systems like MapReduce-based graph processing frameworks handle this with multiple rounds of computation.)
What interviewers are really testing: This is the “Cheapest Flights Within K Stops” problem (LeetCode 787), and it is a favorite because it breaks standard Dijkstra in a subtle way. It tests whether you truly understand why Dijkstra’s visited-set optimization fails when there is an additional constraint dimension, and whether you can adapt Bellman-Ford cleanly.
Strong Answer:
  • Why standard Dijkstra fails. Dijkstra pops a node and marks it as finalized. But with a K-edge constraint, a node might be reachable in 2 edges at cost 100 and in 5 edges at cost 50. If K >= 5, we want the second path. If K = 3, we want the first. Standard Dijkstra would finalize the node at cost 50 (via 5 edges) and never consider the 2-edge path. Running Dijkstra over expanded (node, edges_used) states does work, but the state space grows to V * K and the implementation is fiddlier than Bellman-Ford’s K rounds.
  • Bellman-Ford approach. Run exactly K rounds of Bellman-Ford (instead of V-1). After i rounds, dist[v] contains the shortest path to v using at most i edges. The key detail: on each round, use the distances from the previous round (make a copy) to avoid “chaining” relaxations within the same round, which could use more than one additional edge.
  • The copy trick. In standard Bellman-Ford, you relax in-place and it does not matter if relaxations chain within a round (it converges in V-1 rounds regardless). But for the K-edge variant, chaining within a round means a single round could use multiple edges. So you must relax from a snapshot of the previous round’s distances.
  • DP formulation. dp[k][v] = minimum cost to reach v using at most k edges. Transition: dp[k][v] = min(dp[k-1][v], min over all edges (u,v) of dp[k-1][u] + weight(u,v)). Space-optimized to two arrays (current and previous round).
  • Complexity. O(K * E) time, O(V) space (with the rolling array optimization).
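The K-round variant with the copy trick, sketched for LC 787 (function name illustrative; `flights` as (u, v, w) triples, k counted as stops):

```python
def cheapest_with_k_stops(n, flights, src, dst, k):
    # Bellman-Ford limited to k+1 relaxation rounds (k stops means
    # k+1 edges). Each round relaxes against a snapshot of the
    # previous round's distances so one round never chains two edges.
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    for _ in range(k + 1):
        prev = dist[:]                  # the "copy trick"
        for u, v, w in flights:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist[dst] if dist[dst] < INF else -1
```

Replacing `prev[u]` with `dist[u]` reintroduces in-place chaining and produces the wrong-answer bug the red flags below describe.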
Red flag answer: Saying “just use Dijkstra with a visited set.” Not recognizing the need to copy distances between rounds. Implementing Bellman-Ford with in-place relaxation and getting wrong answers on K-constrained inputs. Confusing “K stops” with “K edges” (K stops means K+1 edges).
Follow-ups:
  1. What if K is very large (close to V)? Is Bellman-Ford still the best approach? (Answer: if K >= V-1, the constraint is effectively lifted and standard Dijkstra or full Bellman-Ford works. For K close to V, Bellman-Ford at O(KE) is similar to its standard O(VE). Dijkstra at O((V+E) log V) would be faster if there are no negative weights.)
  2. Can you solve this problem with a modified BFS if all weights are equal to 1? (Answer: yes. With unit weights, “shortest path with at most K edges” is just “is the BFS distance at most K?” Standard BFS in O(V+E) handles it directly. The problem is only interesting with varying weights.)

Interview Deep-Dive

Strong Answer:
  • BFS: Unweighted graph (or all edges weight 1). Single-source. O(V + E). The fastest possible shortest-path algorithm for this case — never use Dijkstra when BFS suffices.
  • Dijkstra: Weighted graph with non-negative edges. Single-source. O((V + E) log V) with binary heap. The default choice for most weighted shortest-path problems. Fails silently with negative edges.
  • Bellman-Ford: Weighted graph, negative edges allowed but no negative cycles. Single-source. O(VE). Also detects negative cycles (if any distance decreases in the Vth relaxation round). Use for “cheapest flights within K stops” variant by running only K rounds.
  • Floyd-Warshall: All-pairs shortest paths. O(V^3). Handles negative edges (no negative cycles). Best for dense graphs or when you need every pair’s distance. For sparse graphs, V runs of Dijkstra is faster: O(V(V+E) log V).
  • 0-1 BFS: Edges with weights 0 or 1 only. Single-source. O(V + E) using a deque. Push weight-0 edges to front, weight-1 edges to back.
  • DAG shortest path: Topological sort + relaxation. O(V + E). Works with negative edges. Fastest single-source shortest path for DAGs.
  • Decision tree: Unweighted -> BFS. Weighted, non-negative -> Dijkstra. Negative edges -> Bellman-Ford. All-pairs -> Floyd-Warshall (dense) or V x Dijkstra (sparse). DAG -> topological sort.
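For completeness, the BFS baseline from the first bullet (a minimal sketch; function name illustrative, adjacency lists, unreachable nodes reported as -1):

```python
from collections import deque

def bfs_shortest(adj, src):
    # Unweighted single-source shortest paths in O(V + E): each node
    # is settled the first time it is dequeued.
    dist = [-1] * len(adj)
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```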
Follow-up: You are building a navigation system for a city with 10 million intersections. Which algorithm do you use, and what optimizations beyond the basic algorithm are needed?
  • Dijkstra is the starting point. But at 10M nodes, even Dijkstra is too slow per query for real-time routing. Production systems use contraction hierarchies (preprocess shortcut edges) or A* search (heuristic-guided; use Euclidean distance as the heuristic for road networks). Google Maps uses a variant of contraction hierarchies that precomputes a hierarchy of important roads, reducing query time from seconds to milliseconds.
Strong Answer:
  • The invariant Dijkstra relies on: Once a node is popped from the min-heap, its distance is final. No shorter path to it can be discovered later. This is true when all edges are non-negative because any path through an unvisited node must cost at least as much as the direct path.
  • Concrete 3-node counterexample: Nodes A, B, C. Edges: A->B (weight 2), A->C (weight 3), C->B (weight -2). Run standard Dijkstra with a visited set:
    • Initialize: dist = [A:0, B:inf, C:inf]. Heap: [(0, A)].
    • Pop A (dist 0). Relax A->B: dist[B] = 2. Relax A->C: dist[C] = 3. Heap: [(2, B), (3, C)].
    • Pop B (dist 2). B is finalized: it joins the visited set and its distance will never be reconsidered.
    • Pop C (dist 3). Relax C->B: 3 + (-2) = 1 < 2, but B is already finalized, so the improvement is discarded.
    • Dijkstra reports dist[B] = 2. The true shortest path A->C->B costs 1. Wrong answer, and no negative cycle was needed.
  • A subtlety worth naming: a “lazy” Dijkstra that allows finalized nodes to be re-inserted and re-relaxed will stumble onto the correct answer when there are no negative cycles, but it degenerates into a label-correcting algorithm with exponential worst-case time, and it loops forever on a negative cycle. Either way, the O((V+E) log V) guarantee is lost.
  • The fundamental issue: Negative edges violate the greedy invariant. Dijkstra assumes “settled” means “optimal,” but a negative edge from an unsettled node can retroactively improve a settled node’s distance.
Follow-up: If the graph has negative edges but no negative cycles, can you adapt Dijkstra? How?
  • Yes, via Johnson’s algorithm. Run Bellman-Ford from a virtual source node connected to all vertices with weight-0 edges. This computes a potential function h(v). Reweight each edge: w’(u,v) = w(u,v) + h(u) - h(v). All reweighted edges are non-negative. Then run Dijkstra from each source on the reweighted graph. Total: O(VE + V(V+E) log V).
Strong Answer:
  • Kruskal’s: Sort all edges by weight. Process edges from lightest to heaviest. If an edge connects two different components (checked via Union-Find), add it to the MST. Stop at V-1 edges. Time: O(E log E) dominated by sorting. Union-Find operations are O(E * alpha(V)), effectively O(E).
  • Prim’s: Start from any node. Maintain a min-heap of edges crossing the cut between MST nodes and non-MST nodes. Pop the lightest crossing edge, add the new node to the MST, and push its edges. Time: O(E log V) with binary heap, O(E + V log V) with Fibonacci heap.
  • When Kruskal wins: Sparse graphs (E close to V). Sorting fewer edges is cheap. Also, Kruskal naturally works with edge lists and does not need an adjacency list.
  • When Prim wins: Dense graphs (E close to V^2). Prim with an adjacency matrix is O(V^2), which beats Kruskal’s O(E log E) = O(V^2 log V) on dense graphs. With Fibonacci heap, Prim achieves O(E + V log V), which is optimal.
  • Correctness proof: Both rely on the cut property — for any cut of the graph, the minimum weight edge crossing the cut must be in every MST (assuming unique weights). Kruskal implicitly applies this by always choosing the globally cheapest edge that does not create a cycle. Prim applies it by always choosing the cheapest edge crossing the MST/non-MST boundary.
  • MST uniqueness: If all edge weights are distinct, the MST is unique. If some weights are equal, multiple MSTs may exist, but all have the same total weight.
Follow-up: After building an MST, one edge weight in the original graph decreases. How do you efficiently update the MST without recomputing from scratch?
  • Add the updated (cheaper) edge to the MST. This creates exactly one cycle. Find and remove the heaviest edge in that cycle. If the removed edge is the updated edge itself (it was already the heaviest in the cycle), the MST is unchanged. Otherwise, you have a new, lighter MST. Finding the heaviest edge in the cycle can be done in O(V) with a path query, or O(log V) with link-cut trees.