Distributed Snapshots
How do you take a picture of a system that never stops moving? In a distributed system, there is no global clock and no global memory. Capturing a “consistent” global state is one of the most fundamental challenges.Module Duration: 6-8 hours
Key Topics: Global State, Consistent Cuts, Chandy-Lamport Algorithm, Termination Detection
Interview Focus: How to debug distributed state, checkpointing, and recovery
Key Topics: Global State, Consistent Cuts, Chandy-Lamport Algorithm, Termination Detection
Interview Focus: How to debug distributed state, checkpointing, and recovery
1. The Challenge of Global State
In a single machine, you can stop the world and dump memory. In a distributed system:- No Global Clock: We can’t say “everyone record your state at exactly 10:00:00”.
- Messages in Flight: The state isn’t just what’s on the nodes; it’s also the messages currently traveling on the network.
1.1 The “Money Transfer” Problem
Imagine two banks, A and B, each with 200.- A sends $50 to B.
- If you snapshot A after it sends, and B before it receives, the total looks like 50 vanished!
- If you snapshot B after it receives, and A before it sends, the total looks like 50 was created!
2. Consistent Cuts
A Cut is a line drawn across the execution of a distributed system, dividing events into “past” and “future”.2.1 Consistent vs. Inconsistent Cuts
- Consistent Cut: If an event is in the cut (past), then every event that happened-before must also be in the cut.
- Inconsistent Cut: A cut where an effect is recorded but its cause is not. (e.g., recording a message receipt but not its sending).
3. The Chandy-Lamport Algorithm
The classic algorithm for capturing a consistent global state in a system with FIFO channels.3.1 The Protocol: Markers and Recording Rules
The algorithm uses a special Marker message to coordinate the snapshot. Crucially, markers don’t stop the application; they “sweep” across the system.- Marker Sending Rule:
- Process records its state and sends a Marker on all outgoing channels before sending any more application messages.
- Marker Receiving Rule (Process receives Marker from ):
- Case A: First Marker received:
- records its state immediately.
- marks the channel from to as empty.
- starts recording all incoming messages on all other channels.
- sends Marker on all its outgoing channels.
- Case B: Already recorded state:
- stops recording the channel from to .
- The state of that channel is the sequence of messages recorded since first took its own snapshot.
- Case A: First Marker received:
3.2 Why it Works: The Proof of Consistency
A Chandy-Lamport snapshot represents a state that could have happened. Even if the snapshot doesn’t correspond to any specific “instant” in real time, it is Consistent. The Logic: If message was received by before took its snapshot, but sent by after took its snapshot, the cut would be inconsistent. However, the protocol prevents this: sends the Marker before any message sent after its snapshot. Since channels are FIFO, the Marker must reach before any subsequent message . Thus, would have already taken its snapshot.4. Advanced Variant: Asynchronous Barrier Snapshotting (ABS)
Modern stream processors like Apache Flink use a specialized version of Chandy-Lamport designed for high-throughput DAGs.4.1 Flink’s Barrier Mechanism
Instead of markers on every channel, Flink injects Barriers into the data stream at the sources.- Alignment: When an operator has multiple input streams, it must wait (buffer) messages from faster streams until it receives the barrier from all input streams.
- Snapshot: Once all barriers arrive, the operator snapshots its state to a durable store (like S3/HDFS) and forwards the barrier.
- Optimization: Unaligned Checkpoints allow operators to snapshot immediately upon seeing the first barrier, by including the buffered data from other inputs in the snapshot itself. This reduces “backpressure” at the cost of larger snapshot sizes.
5. Evaluating Global Predicates
Taking a snapshot is only half the battle. Once you have it, how do you use it to detect distributed properties?5.1 Stable vs. Unstable Predicates
- Stable Predicates: Once true, they stay true (e.g., “The system is deadlocked”, “The computation has terminated”).
- Unstable Predicates: Can be true at one instant and false the next (e.g., “Queue size > 100”).
- Staff Tip: You can only reliably detect Stable predicates using Chandy-Lamport. Detecting unstable predicates requires capturing all possible consistent cuts (lattice of global states), which is complex.
5.2 Distributed Deadlock Detection
To find a deadlock, you take a snapshot of the Wait-For Graph (WFG). If the snapshot contains a cycle, the system is deadlocked. Without Chandy-Lamport, you might see a “phantom deadlock” because you caught the tail of one message and the head of another.6. Real-World Applications
4.1 Distributed Debugging
Finding “zombie” processes or deadlocks requires a global view of who is waiting on whom.4.2 Checkpointing and Recovery
Large-scale processing systems (like Apache Flink) use Chandy-Lamport (or variants like Asynchronous Barrier Snapshotting) to save state. If a node fails, the system rolls back to the last consistent global snapshot.4.3 Garbage Collection
Identifying objects that are no longer reachable across a network requires a consistent snapshot of the “object graph”.5. Interview Questions
Q: Why can't we just use NTP to take a snapshot?
Q: Why can't we just use NTP to take a snapshot?
Answer: Even with NTP, clock skew exists (typically 1-10ms). In that window, thousands of messages can be sent and received. A snapshot triggered by physical time would likely be inconsistent because it wouldn’t account for messages “in flight”. Chandy-Lamport uses causality (markers) rather than physical time to ensure consistency.
Q: What are the assumptions of Chandy-Lamport?
Q: What are the assumptions of Chandy-Lamport?
Answer:
- FIFO Channels: Messages on a channel must be delivered in the order they were sent.
- Reliable Delivery: No messages are lost.
- No Process Crashes: During the snapshot period (though modern variants handle this).
6. Key Takeaways
Global State = Nodes + Channels
You must record both the local memory and the messages currently on the wires.
Causality Over Time
Consistent snapshots rely on logical ordering (happened-before) rather than wall-clock time.