The Core Idea: Represent each set as a tree. Each element points to its parent, and the root is the set’s representative.Path Compression: When we find the root of an element, we make all nodes along the path point directly to the root. This flattens the tree, making future queries faster.Union by Rank: When merging two sets, attach the shorter tree under the taller one. This keeps trees balanced.Visual Example:
Before find(4): After path compression: 1 1 | / | \ 2 2 3 4 | 3 | 4Finding root of 4: 4→3→2→1 (root is 1)After compression: 4→1, 3→1, 2→1 (all point to root)
Why O(α(n))? The inverse Ackermann function α(n) grows incredibly slowly—for any practical n (even n = 10^80), α(n) ≤ 4. So it’s effectively constant time!
class DSU {public: vector<int> parent, rank_; int components; DSU(int n) : parent(n + 1), rank_(n + 1, 0), components(n) { iota(parent.begin(), parent.end(), 0); // parent[i] = i } int find(int x) { if (parent[x] != x) { parent[x] = find(parent[x]); // Path compression } return parent[x]; } bool unite(int x, int y) { int px = find(x), py = find(y); if (px == py) return false; // Already in same set // Union by rank if (rank_[px] < rank_[py]) swap(px, py); parent[py] = px; if (rank_[px] == rank_[py]) rank_[px]++; components--; return true; } bool connected(int x, int y) { return find(x) == find(y); }};
Problem: Find Minimum Spanning Tree of a weighted graph.Why Kruskal + DSU? Kruskal’s algorithm sorts edges by weight and greedily adds each edge if it does not form a cycle. The cycle check is exactly the DSU “are these in the same set?” query. Sort takes O(E log E), and E union/find operations take O(E * alpha(V)), so total is O(E log E).Key insight: An MST of n nodes always has exactly n-1 edges. If fewer edges are used after processing all edges, the graph is disconnected (no spanning tree exists).
struct Edge { int u, v, w; bool operator<(const Edge& other) const { return w < other.w; }};long long kruskal(int n, vector<Edge>& edges) { sort(edges.begin(), edges.end()); // Sort by weight ascending DSU dsu(n); long long mstWeight = 0; int edgesUsed = 0; for (auto& [u, v, w] : edges) { if (dsu.unite(u, v)) { // If u and v are in different components mstWeight += w; // Add this edge to MST edgesUsed++; if (edgesUsed == n - 1) break; // MST complete (n-1 edges) } // If unite returns false, adding this edge would form a cycle -- skip } // If edgesUsed < n-1, graph is disconnected return mstWeight;}
Problem: Answer “are u and v connected?” after edge additions.
// Process edges onlinevoid solve() { int n, m, q; cin >> n >> m >> q; DSU dsu(n); // Add initial edges for (int i = 0; i < m; i++) { int u, v; cin >> u >> v; dsu.unite(u, v); } // Answer queries for (int i = 0; i < q; i++) { int type, u, v; cin >> type >> u >> v; if (type == 1) { dsu.unite(u, v); } else { cout << (dsu.connected(u, v) ? "YES" : "NO") << "\n"; } }}
class WeightedDSU {public: vector<int> parent; vector<long long> dist; // Distance from node to parent WeightedDSU(int n) : parent(n + 1), dist(n + 1, 0) { iota(parent.begin(), parent.end(), 0); } pair<int, long long> find(int x) { if (parent[x] == x) return {x, 0}; auto [root, d] = find(parent[x]); parent[x] = root; dist[x] += d; return {root, dist[x]}; } void unite(int x, int y, long long w) { // w = dist[y] - dist[x] (relative distance) auto [px, dx] = find(x); auto [py, dy] = find(y); if (px == py) return; parent[py] = px; dist[py] = dx - dy + w; } long long getDist(int x, int y) { auto [px, dx] = find(x); auto [py, dy] = find(y); if (px != py) return LLONG_MAX; // Not connected return dy - dx; }};
Application: “A is 5 units more than B” type constraints. Also used in problems like “determine if relationships are consistent” (e.g., “if X says A is heavier than B by 3 kg, and Y says B is heavier than C by 2 kg, what is the difference between A and C?”).
Contest tip: Weighted DSU is commonly tested in problems that give pairwise relative information and ask you to detect contradictions or compute absolute values. If you see “relative differences” or “potentials” between elements in the same group, think weighted DSU.
Problem: Process queries where edges can be added AND removed.Solution: Use union by size (no path compression) to enable rollback.Why no path compression? Path compression is irreversible — it flattens the tree structure, making it impossible to “undo” a union. Without path compression, each union only changes one parent pointer, which is easy to save and restore.When to use: Offline dynamic connectivity, divide and conquer on queries, and problems where you need to “undo” merges (e.g., process edges that exist only during a time interval).
class RollbackDSU {public: vector<int> parent, size_; vector<pair<int, int>> history; // {node, old_parent} RollbackDSU(int n) : parent(n + 1), size_(n + 1, 1) { iota(parent.begin(), parent.end(), 0); } int find(int x) { while (parent[x] != x) x = parent[x]; // No path compression! return x; } bool unite(int x, int y) { int px = find(x), py = find(y); if (px == py) return false; if (size_[px] < size_[py]) swap(px, py); history.push_back({py, py}); // Save state history.push_back({px, size_[px]}); // Actually store size parent[py] = px; size_[px] += size_[py]; return true; } void rollback(int savedState) { while (history.size() > savedState) { // Restore previous state // Implementation depends on what you're tracking } } int saveState() { return history.size(); }};
// Problem: For each node, count distinct values in subtreevector<set<int>> nodeSet(n + 1);vector<int> answer(n + 1);void dfs(int u, int p) { nodeSet[u].insert(val[u]); for (int v : adj[u]) { if (v != p) { dfs(v, u); // Merge smaller into larger (small-to-large) if (nodeSet[u].size() < nodeSet[v].size()) { swap(nodeSet[u], nodeSet[v]); } for (int x : nodeSet[v]) { nodeSet[u].insert(x); } } } answer[u] = nodeSet[u].size();}