Time & Clock Synchronization

Time is the foundation upon which all distributed systems reasoning is built. Understanding clocks is crucial for designing systems that maintain consistency and order.

Track Duration: 10-14 hours
Key Topics: Physical Clocks, NTP, PTP, Logical Clocks, Vector Clocks, HLC, TrueTime
Interview Focus: Google Spanner’s TrueTime, causality tracking, clock synchronization trade-offs

The Fundamental Problem

The Hard Truth: There is no global clock in distributed systems. Every node has its own view of time, and they never perfectly agree.

┌─────────────────────────────────────────────────────────────────────────────┐
│                   WHY PERFECT TIME SYNC IS IMPOSSIBLE                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PHYSICAL LIMITATIONS:                                                      │
│  ─────────────────────                                                      │
│  ▸ Speed of light: ~200km/ms in fiber                                      │
│  ▸ NYC ↔ London: minimum ~28ms one-way                                     │
│  ▸ Message delivery time is variable                                        │
│  ▸ Network congestion adds unpredictable delay                              │
│                                                                              │
│  CLOCK HARDWARE:                                                            │
│  ───────────────                                                            │
│  ▸ Quartz oscillators drift: 10-100 ppm                                    │
│  ▸ 50 ppm = 4.32 seconds/day drift                                         │
│  ▸ Temperature affects crystal frequency                                    │
│  ▸ Each machine drifts differently                                          │
│                                                                              │
│  SYNCHRONIZATION PARADOX:                                                   │
│  ────────────────────────                                                   │
│  To sync clocks, you need to send messages                                  │
│  Message delay is unknown and variable                                       │
│  Therefore, you can never know the exact time another node has              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Module 33: Clock Synchronization Protocols

Physical Clock Synchronization

Network Time Protocol (NTP)

NTP is the most widely used clock synchronization protocol on the internet.

┌─────────────────────────────────────────────────────────────────────────────┐
│                      NTP ARCHITECTURE                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                    ┌───────────────┐                                        │
│                    │  Stratum 0    │ ← Atomic clocks, GPS receivers        │
│                    │  (Reference)  │   NIST, USNO primary standards        │
│                    └───────┬───────┘                                        │
│                            │                                                 │
│              ┌─────────────┼─────────────┐                                  │
│              ▼             ▼             ▼                                  │
│        ┌──────────┐  ┌──────────┐  ┌──────────┐                            │
│        │Stratum 1 │  │Stratum 1 │  │Stratum 1 │  ← Direct connection       │
│        │  Server  │  │  Server  │  │  Server  │    to reference clocks     │
│        └────┬─────┘  └────┬─────┘  └────┬─────┘                            │
│             │             │             │                                   │
│        ┌────┴────────────┴──────────────┴────┐                              │
│        ▼             ▼             ▼         ▼                              │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ...                            │
│   │Stratum 2 │  │Stratum 2 │  │Stratum 2 │       ← Most enterprise servers │
│   │  Server  │  │  Server  │  │  Server  │                                 │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘                                 │
│        │             │             │                                        │
│        ▼             ▼             ▼                                        │
│   [Stratum 3+]  [Your servers]  [End users]                                │
│                                                                              │
│  ACCURACY BY STRATUM:                                                       │
│  ────────────────────                                                       │
│  Stratum 0: < 1 microsecond (reference)                                     │
│  Stratum 1: < 10 microseconds                                               │
│  Stratum 2: 1-10 milliseconds (typical datacenter)                          │
│  Stratum 3+: 10-100+ milliseconds                                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

NTP Synchronization Algorithm

┌─────────────────────────────────────────────────────────────────────────────┐
│                    NTP MESSAGE EXCHANGE                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Client                                              Server                │
│     │                                                   │                   │
│     │  T1 = client send time                           │                   │
│     │────────────── Request ──────────────────────────>│                   │
│     │                                                   │ T2 = server recv  │
│     │                                                   │ T3 = server send  │
│     │<────────────── Response ─────────────────────────│                   │
│     │  T4 = client recv time                           │                   │
│                                                                              │
│   CALCULATIONS:                                                             │
│   ─────────────                                                             │
│   Round-trip delay: δ = (T4 - T1) - (T3 - T2)                               │
│   Clock offset: θ = ((T2 - T1) + (T3 - T4)) / 2                            │
│                                                                              │
│   EXAMPLE:                                                                  │
│   ────────                                                                  │
│   T1 = 10:00:00.000 (client thinks)                                        │
│   T2 = 10:00:00.050 (server records)                                        │
│   T3 = 10:00:00.051 (server sends)                                          │
│   T4 = 10:00:00.101 (client receives)                                       │
│                                                                              │
│   δ = (0.101) - (0.001) = 0.100s (100ms round-trip)                        │
│   θ = ((0.050) + (-0.050)) / 2 = 0s                                        │
│                                                                              │
│   If θ ≠ 0, client adjusts its clock                                        │
│                                                                              │
│   CLOCK ADJUSTMENT MODES:                                                   │
│   ───────────────────────                                                   │
│   • Small offset (< 128ms): Slew (gradual adjustment)                      │
│   • Medium offset: Step (jump, may break applications!)                     │
│   • Large offset (> 1000s): Panic (refuse to adjust)                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Precision Time Protocol (PTP - IEEE 1588)

For applications requiring sub-microsecond accuracy:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PTP vs NTP COMPARISON                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  FEATURE             NTP                    PTP                             │
│  ───────             ───                    ───                             │
│  Accuracy            1-10 ms                < 1 μs (with hardware)          │
│  Hardware support    Not required           Required for best accuracy      │
│  Network type        Any IP network         Usually LAN only                │
│  Complexity          Simple                 More complex                    │
│  Cost                Free                   Hardware timestamping NICs      │
│                                                                              │
│  PTP COMPONENTS:                                                            │
│  ───────────────                                                            │
│  • Grandmaster clock (time source)                                          │
│  • Boundary clocks (synchronize segments)                                   │
│  • Ordinary clocks (end devices)                                            │
│  • Transparent clocks (switches that compensate for delay)                  │
│                                                                              │
│  USE CASES:                                                                 │
│  ──────────                                                                 │
│  • High-frequency trading (nanoseconds matter)                              │
│  • Telecom (5G requires tight sync)                                         │
│  • Industrial automation                                                    │
│  • Scientific experiments                                                   │
│                                                                              │
│  DISTRIBUTED DATABASES:                                                     │
│  ──────────────────────                                                     │
│  Most use NTP + additional mechanisms (HLC, TrueTime)                       │
│  CockroachDB: NTP + Hybrid Logical Clocks                                   │
│  Spanner: GPS + atomic clocks (TrueTime)                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Clock Anomalies and Edge Cases

Real-World Clock Problems

Leap Seconds

Earth’s rotation is gradually slowing, requiring occasional corrections.

LEAP SECOND EVENT:
─────────────────
Normal:  23:59:58 → 23:59:59 → 00:00:00
Leap:    23:59:58 → 23:59:59 → 23:59:60 → 00:00:00

PROBLEMS CAUSED:
────────────────
• 2012 leap second: Crashed Reddit, LinkedIn, Gawker, Mozilla
• Linux kernel bug: Livelock due to clock_gettime() returning 60
• Cassandra outages: Time comparisons broke

SOLUTIONS:
──────────
1. Leap smear (Google): Spread the second over 24 hours
   Add/subtract 11.6μs per second throughout the day

2. Step at midnight: Risk of duplicate timestamps

3. Ignore (dangerous): Systems may crash or produce errors

Interview Tip: Explain how your timestamp-dependent features handle leap seconds.

Clock Jumps (Step Changes)

When NTP decides the clock is too far off, it jumps:

BEFORE JUMP:
───────────
Event A at t=100
Event B at t=101
Event C at t=102

NTP JUMP: Clock was 50ms behind, jumps forward

AFTER JUMP:
───────────
Event D at t=152 (jumped from 102 to 152)

PROBLEM: If you're calculating durations:
Duration C→D = 152 - 102 = 50 seconds
Actual time elapsed: ~0 seconds!

MITIGATION:
───────────
• Use monotonic clocks for duration measurement
• CLOCK_MONOTONIC (Linux): Only moves forward, no jumps
• System.nanoTime() (Java): Monotonic
• time.monotonic() (Python): Monotonic

NTP Slewing

For small corrections, NTP gradually adjusts the clock rate:

SLEW ADJUSTMENT:
────────────────
Clock is 100ms behind
Slew rate: 500μs per second
Time to correct: 100ms / 500μs = 200 seconds (~3.3 minutes)

DURING SLEW:
────────────
Your clock runs 0.05% faster
1 second in real time = 1.0005 seconds on your clock

IMPLICATIONS:
─────────────
• Timeouts may be slightly longer/shorter
• Rate calculations affected
• Usually not a problem in practice

VM Clock Issues

Virtual machines have additional clock challenges:

VM CLOCK PROBLEMS:
──────────────────
1. VM Pause:
   - VM paused for migration/snapshot
   - Guest OS clock stops
   - Resumes hours later, still thinks old time

2. Steal Time:
   - Hypervisor schedules other VMs
   - Guest gets less CPU than expected
   - Time-based assumptions break

3. Clock Source:
   - kvm-clock, Hyper-V clock, VMware tools
   - Each has different accuracy/overhead

AWS RECOMMENDATIONS:
────────────────────
• Use Amazon Time Sync Service (chrony)
• Enable enhanced networking
• Consider dedicated hosts for timing-critical apps

GCP/AZURE:
──────────
• Similar managed NTP services available
• Same VM clock challenges apply

Module 34: Logical and Vector Clocks

Logical Clocks Deep Dive

Lamport Timestamps

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LAMPORT TIMESTAMPS IN ACTION                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Three processes exchanging messages:                                       │
│                                                                              │
│  Process A      Process B      Process C                                    │
│     │              │              │                                         │
│   (1)─────────────────────────────────────── Event: A starts                │
│     │              │              │                                         │
│   (2)────send────>(3)             │          A sends to B                   │
│     │              │              │                                         │
│     │            (4)────send────>(5)         B sends to C                   │
│     │              │              │                                         │
│   (3)              │            (6)          A, C do local events           │
│     │              │              │                                         │
│   (4)<─────────────────────────(7)           C sends to A                   │
│     │              │              │                                         │
│                                                                              │
│  LAMPORT CLOCK VALUES:                                                      │
│  ─────────────────────                                                      │
│  A: 1 → 2 → 3 → 8 (receives C's 7+1)                                       │
│  B: 3 → 4                                                                   │
│  C: 5 → 6 → 7                                                               │
│                                                                              │
│  PROPERTY:                                                                  │
│  ─────────                                                                  │
│  If A → B (A happened before B), then L(A) < L(B)                          │
│                                                                              │
│  BUT NOT THE CONVERSE!                                                      │
│  L(A) < L(B) does NOT mean A → B                                           │
│                                                                              │
│  Example: A's event (3) and C's event (6)                                  │
│  L(A:3) = 3, L(C:6) = 6, but they are CONCURRENT                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation:

class LamportClock:
    """Thread-safe Lamport clock implementation"""
    
    def __init__(self, node_id: str):
        self._counter = 0
        self._lock = threading.Lock()
        self.node_id = node_id
    
    def tick(self) -> int:
        """Increment clock for local event"""
        with self._lock:
            self._counter += 1
            return self._counter
    
    def send_timestamp(self) -> int:
        """Get timestamp for outgoing message"""
        with self._lock:
            self._counter += 1
            return self._counter
    
    def receive_timestamp(self, received: int) -> int:
        """Update clock on message receipt"""
        with self._lock:
            self._counter = max(self._counter, received) + 1
            return self._counter
    
    def current(self) -> int:
        """Get current clock value (no increment)"""
        with self._lock:
            return self._counter
    
    def compare(self, ts1: int, ts2: int) -> int:
        """Compare timestamps, break ties with node_id"""
        if ts1 != ts2:
            return ts1 - ts2
        return 0  # In practice, also compare node_ids

Vector Clocks

Vector clocks capture causality - they can tell you if two events are concurrent:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    VECTOR CLOCKS DEEP DIVE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Each process maintains a vector of counters (one per process):             │
│                                                                              │
│  Process A: [A:0, B:0, C:0]                                                 │
│  Process B: [A:0, B:0, C:0]                                                 │
│  Process C: [A:0, B:0, C:0]                                                 │
│                                                                              │
│  RULES:                                                                     │
│  ──────                                                                     │
│  1. Before event: Increment own counter                                     │
│  2. Send: Attach current vector to message                                  │
│  3. Receive: Merge vectors (take max), then increment own                   │
│                                                                              │
│  EXAMPLE EXECUTION:                                                         │
│  ──────────────────                                                         │
│                                                                              │
│  A: [1,0,0]──────send──────>[1,1,0] (B)                                    │
│      │                        │                                             │
│    [2,0,0]                  [1,2,0]──send──>[1,2,1] (C)                     │
│      │                        │               │                             │
│    [3,0,0]                    │           [1,2,2]                           │
│      │                        │               │                             │
│    [4,2,2]<─────────────────────────────[1,2,3]                             │
│                                                                              │
│  COMPARISON RULES:                                                          │
│  ─────────────────                                                          │
│  V1 < V2  iff  ∀i: V1[i] ≤ V2[i] AND ∃j: V1[j] < V2[j]                    │
│  V1 || V2 (concurrent) iff neither V1 < V2 nor V2 < V1                     │
│                                                                              │
│  EXAMPLES:                                                                  │
│  [1,0,0] < [1,1,0] ✓ (A's first event happened before B's second)          │
│  [2,0,0] || [1,2,0]  (A's [2,0,0] and B's [1,2,0] are concurrent)         │
│  [4,2,2] > [1,2,3] ✗ (Not comparable! 4>1 but 0<3)                         │
│  [4,2,2] || [1,2,3]  (These are concurrent)                                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation with Conflict Detection:

from enum import Enum
from typing import Dict, Tuple
import copy

class Ordering(Enum):
    BEFORE = -1
    AFTER = 1
    CONCURRENT = 0
    EQUAL = 2

class VectorClock:
    """Vector clock implementation for distributed systems"""
    
    def __init__(self, node_id: str, nodes: list[str]):
        self.node_id = node_id
        self.clock: Dict[str, int] = {n: 0 for n in nodes}
    
    def tick(self) -> Dict[str, int]:
        """Increment for local event"""
        self.clock[self.node_id] += 1
        return self.copy()
    
    def send(self) -> Dict[str, int]:
        """Get timestamp for outgoing message"""
        self.clock[self.node_id] += 1
        return self.copy()
    
    def receive(self, received: Dict[str, int]) -> Dict[str, int]:
        """Merge incoming clock and increment"""
        for node, time in received.items():
            if node in self.clock:
                self.clock[node] = max(self.clock[node], time)
            else:
                self.clock[node] = time
        self.clock[self.node_id] += 1
        return self.copy()
    
    def copy(self) -> Dict[str, int]:
        return copy.deepcopy(self.clock)
    
    @staticmethod
    def compare(vc1: Dict[str, int], vc2: Dict[str, int]) -> Ordering:
        """Compare two vector clocks"""
        all_nodes = set(vc1.keys()) | set(vc2.keys())
        
        less = False
        greater = False
        
        for node in all_nodes:
            v1 = vc1.get(node, 0)
            v2 = vc2.get(node, 0)
            
            if v1 < v2:
                less = True
            elif v1 > v2:
                greater = True
        
        if less and not greater:
            return Ordering.BEFORE
        elif greater and not less:
            return Ordering.AFTER
        elif not less and not greater:
            return Ordering.EQUAL
        else:
            return Ordering.CONCURRENT
    
    @staticmethod
    def merge(vc1: Dict[str, int], vc2: Dict[str, int]) -> Dict[str, int]:
        """Merge two vector clocks (for conflict resolution)"""
        all_nodes = set(vc1.keys()) | set(vc2.keys())
        return {n: max(vc1.get(n, 0), vc2.get(n, 0)) for n in all_nodes}

Vector Clock in DynamoDB/Riak

┌─────────────────────────────────────────────────────────────────────────────┐
│                    VECTOR CLOCKS FOR CONFLICT DETECTION                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SCENARIO: Shopping cart updates from multiple devices                      │
│                                                                              │
│  Initial: cart = ["shirt"], vc = [A:1]                                      │
│                                                                              │
│  Device 1 (Node A):                     Device 2 (Node B):                  │
│  Read cart ["shirt"], vc=[A:1]          Read cart ["shirt"], vc=[A:1]       │
│       │                                       │                             │
│       ▼                                       ▼                             │
│  Add "pants"                            Add "hat"                           │
│  cart = ["shirt","pants"]               cart = ["shirt","hat"]              │
│  vc = [A:2]                             vc = [A:1, B:1]                     │
│       │                                       │                             │
│       └────────────┬──────────────────────────┘                             │
│                    ▼                                                        │
│              CONFLICT DETECTED!                                             │
│              [A:2] || [A:1, B:1]                                            │
│                                                                              │
│  RESOLUTION OPTIONS:                                                        │
│  ───────────────────                                                        │
│  1. Last-Write-Wins (LWW): Pick one, lose data                             │
│  2. Return both: Let application merge                                      │
│  3. CRDT merge: ["shirt", "pants", "hat"]                                  │
│                                                                              │
│  AMAZON DYNAMO APPROACH:                                                    │
│  ───────────────────────                                                    │
│  • Return all conflicting versions (siblings)                               │
│  • Application must merge                                                   │
│  • New write includes merged vector clock                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Module 35: Hybrid Logical Clocks (HLC)

Used by CockroachDB, MongoDB, and many modern databases:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    HYBRID LOGICAL CLOCKS                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  MOTIVATION:                                                                │
│  ───────────                                                                │
│  • Lamport clocks: No relation to physical time (hard to debug)            │
│  • Physical clocks: No causality guarantees                                 │
│  • HLC: Best of both worlds!                                                │
│                                                                              │
│  HLC STRUCTURE:                                                             │
│  ──────────────                                                             │
│  HLC = (physical_time, logical_counter)                                     │
│        └──────┬──────┘ └──────┬───────┘                                    │
│               │               │                                             │
│     Close to wall clock   Breaks ties when                                 │
│                           physical time is same                             │
│                                                                              │
│  PROPERTIES:                                                                │
│  ───────────                                                                │
│  1. l.pt ≤ pt (HLC physical component ≤ actual physical time)              │
│  2. If e1 → e2, then l.e1 < l.e2 (causality preserved)                     │
│  3. Bounded divergence from physical time                                   │
│                                                                              │
│  ALGORITHM:                                                                 │
│  ──────────                                                                 │
│  Send/Local event:                                                          │
│    l.pt = max(l.pt, pt)  // Take max of HLC and physical time              │
│    if l.pt == pt:                                                           │
│      l.c = 0             // Physical time advanced, reset counter          │
│    else:                                                                    │
│      l.c = l.c + 1       // Same physical time, increment counter          │
│                                                                              │
│  Receive(m):                                                                │
│    l.pt = max(l.pt, m.pt, pt)                                               │
│    if l.pt == l.pt == m.pt:                                                 │
│      l.c = max(l.c, m.c) + 1                                               │
│    elif l.pt == l.pt:                                                       │
│      l.c = l.c + 1                                                          │
│    elif l.pt == m.pt:                                                       │
│      l.c = m.c + 1                                                          │
│    else:                                                                    │
│      l.c = 0                                                                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation:

import time
from dataclasses import dataclass
from threading import Lock

@dataclass
class HLCTimestamp:
    physical: int  # Milliseconds
    logical: int   # Counter
    
    def __lt__(self, other: 'HLCTimestamp') -> bool:
        if self.physical != other.physical:
            return self.physical < other.physical
        return self.logical < other.logical
    
    def __eq__(self, other: 'HLCTimestamp') -> bool:
        return self.physical == other.physical and self.logical == other.logical
    
    def __repr__(self) -> str:
        return f"HLC({self.physical}.{self.logical})"

class HybridLogicalClock:
    """
    Hybrid Logical Clock implementation.
    Used by CockroachDB, MongoDB, and others.
    """
    
    def __init__(self, max_drift_ms: int = 500):
        self._physical = 0
        self._logical = 0
        self._lock = Lock()
        self._max_drift = max_drift_ms
    
    def _now_ms(self) -> int:
        return int(time.time() * 1000)
    
    def now(self) -> HLCTimestamp:
        """Get current HLC timestamp for local event or send"""
        with self._lock:
            pt = self._now_ms()
            
            if pt > self._physical:
                self._physical = pt
                self._logical = 0
            else:
                self._logical += 1
            
            return HLCTimestamp(self._physical, self._logical)
    
    def receive(self, remote: HLCTimestamp) -> HLCTimestamp:
        """Update HLC on receiving a message"""
        with self._lock:
            pt = self._now_ms()
            
            # Check for excessive drift
            if remote.physical - pt > self._max_drift:
                raise ValueError(f"Remote clock too far ahead: {remote.physical - pt}ms")
            
            if pt > self._physical and pt > remote.physical:
                self._physical = pt
                self._logical = 0
            elif self._physical > remote.physical:
                self._logical += 1
            elif remote.physical > self._physical:
                self._physical = remote.physical
                self._logical = remote.logical + 1
            else:  # Equal physical times
                self._logical = max(self._logical, remote.logical) + 1
            
            return HLCTimestamp(self._physical, self._logical)
    
    def current(self) -> HLCTimestamp:
        """Get current timestamp without incrementing"""
        with self._lock:
            return HLCTimestamp(self._physical, self._logical)

Module 36: TrueTime and Atomic Clocks

Google TrueTime

The gold standard for distributed time synchronization:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    GOOGLE TRUETIME ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRUETIME API:                                                              │
│  ─────────────                                                              │
│  TT.now() → returns TTinterval: [earliest, latest]                          │
│  TT.after(t) → true if t is definitely in the past                         │
│  TT.before(t) → true if t is definitely in the future                      │
│                                                                              │
│  KEY INSIGHT:                                                               │
│  ────────────                                                               │
│  Instead of claiming "it's exactly 10:00:00.000"                            │
│  TrueTime says "it's between 10:00:00.000 and 10:00:00.007"                │
│                                                                              │
│  HARDWARE INFRASTRUCTURE:                                                   │
│  ────────────────────────                                                   │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────┐           │
│   │               DATACENTER TIME MASTERS                       │           │
│   │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │           │
│   │  │  GPS        │   │  Atomic      │   │  GPS        │    │           │
│   │  │  Receiver   │   │  Clock       │   │  Receiver   │    │           │
│   │  │  (Armageddon│   │  (Rubidium)  │   │  (Armageddon│    │           │
│   │  │   Master)   │   │              │   │   Master)   │    │           │
│   │  └──────┬──────┘   └──────┬───────┘   └──────┬──────┘    │           │
│   │         │                 │                   │           │           │
│   │         └────────────────┬───────────────────┘           │           │
│   │                          ▼                               │           │
│   │              ┌────────────────────┐                      │           │
│   │              │   Time Master      │                      │           │
│   │              │   Daemon           │                      │           │
│   │              └─────────┬──────────┘                      │           │
│   │                        │                                 │           │
│   └────────────────────────┼─────────────────────────────────┘           │
│                            │                                             │
│            ┌───────────────┼───────────────┐                             │
│            ▼               ▼               ▼                             │
│   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐                    │
│   │  Spanserver  │ │  Spanserver  │ │  Spanserver  │                    │
│   │  (has local  │ │  (has local  │ │  (has local  │                    │
│   │   TT daemon) │ │   TT daemon) │ │   TT daemon) │                    │
│   └──────────────┘ └──────────────┘ └──────────────┘                    │
│                                                                          │
│  UNCERTAINTY SOURCES:                                                    │
│  ────────────────────                                                    │
│  1. Communication delay to time masters: ~1ms                            │
│  2. Local clock drift between syncs: ~200μs/sec                         │
│  3. Typical ε (uncertainty): 1-7ms                                       │
│                                                                          │
│  WAIT-OUT STRATEGY:                                                     │
│  ──────────────────                                                      │
│  To guarantee s1 < s2 for commits:                                       │
│  After getting commit timestamp, wait until TT.after(s1) is true        │
│  This ensures no one else can get a timestamp ≤ s1                      │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────────┘

Spanner’s External Consistency

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SPANNER COMMIT PROTOCOL                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  EXTERNAL CONSISTENCY: If T1 commits before T2 starts,                      │
│  then T1's timestamp < T2's timestamp (globally!)                           │
│                                                                              │
│  COMMIT WAIT ALGORITHM:                                                     │
│  ──────────────────────                                                     │
│                                                                              │
│  1. Acquire locks on all participants                                       │
│  2. Get commit timestamp s = TT.now().latest                                │
│  3. Wait until TT.after(s) is true                                          │
│  4. Release locks and return to client                                      │
│                                                                              │
│  WHY THIS WORKS:                                                            │
│  ───────────────                                                            │
│                                                                              │
│  Transaction T1:                                                            │
│     TT.now() = [100, 107]                                                   │
│     s1 = 107                                                                │
│     Wait until TT.after(107)                                                │
│     Commit at real time ~114                                                │
│                                                                              │
│  Transaction T2 (starts after T1 commits):                                  │
│     User sees T1 committed at real time 114                                 │
│     T2 starts at real time 115                                              │
│     TT.now() = [108, 115]  (at least!)                                     │
│     s2 ≥ 115 > 107 = s1                                                    │
│                                                                              │
│  LATENCY IMPACT:                                                            │
│  ───────────────                                                            │
│  Commit wait adds ε (1-7ms) to every transaction                           │
│  This is why Google invests heavily in reducing ε                          │
│  GPS receivers on every datacenter roof!                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Fault-Tolerant Clock Synchronization

In large-scale systems, some nodes might have faulty clocks or even malicious intent (Byzantine failures).

Marzullo’s Algorithm

Marzullo’s algorithm is used to select a confidence interval from a set of noisy time sources.

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MARZULLO'S ALGORITHM (INTERSECTION)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  INPUT: A set of intervals [c - e, c + e] where c is clock and e is error.  │
│                                                                              │
│  GOAL: Find the smallest interval that contains at least M correct clocks.  │
│                                                                              │
│  STEPS:                                                                     │
│  1. For each interval [low, high], create two tuples:                       │
│     (low, -1) and (high, +1)                                                │
│  2. Sort all tuples by value.                                               │
│  3. Iterate through sorted tuples, maintaining a counter:                   │
│     - If type is -1 (low), counter++                                        │
│     - If type is +1 (high), counter--                                       │
│  4. The intersection is where counter is maximum.                           │
│                                                                              │
│  EXAMPLE:                                                                   │
│  A: [10, 12], B: [11, 13], C: [10.5, 11.5]                                  │
│  Sorted: (10, -1), (10.5, -1), (11, -1), (11.5, +1), (12, +1), (13, +1)      │
│  Counter:    1         2         3          2         1         0           │
│  Max counter (3) is between 11 and 11.5.                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

The Uncertainty Window & Intersection

In high-accuracy systems (like PTP or TrueTime), the clock is never a point, but an interval:

I = [t - \epsilon, t + \epsilon]

The Overlap Rule

For two events

e_1

and

e_2

to be definitively ordered (

e_1 < e_2

), their uncertainty intervals must not overlap:

e_{1}.high < e_{2}.low

If they overlap, the system cannot determine the true order based on physical time alone. This is the Causality Gap.

Multi-Source Intersection (The “Master” Interval)

When a node queries

N

time sources, it uses Marzullo’s (or the Improved Marzullo/Intersection algorithm) to find the “True” interval. If the sources disagree significantly, the interval grows (increasing uncertainty) or the node enters a “Panic” state. Staff Tip: When designing systems that rely on time for consistency (like Spanner), you must explicitly handle the “Overlap Case” by either:

Waiting: Wait until the uncertainty window of $e_1$ has passed before starting $e_2$ (Commit-Wait).
Versioning: Use a logical counter (HLC) to break ties during the overlap.

Byzantine Clock Synchronization

f

nodes are Byzantine (malicious), we need at least

3f+1

total nodes to synchronize clocks correctly.

Lynch-Welch Algorithm: Nodes exchange their clock values. Each node discards the $f$ highest and $f$ lowest values and takes the average of the remaining $n-2f$ values.
Clock Drift Bounds: In a Byzantine environment, the maximum skew between correct clocks is bounded by $\Delta \approx \epsilon + \rho R$ , where $\epsilon$ is message delay uncertainty, $\rho$ is drift rate, and $R$ is synchronization interval.

Practical Guidelines

Choosing the Right Clock

Use Physical Time When...
Use Logical Clocks When...
Use HLC When...
Use TrueTime When...

Generating user-facing timestamps
Debugging and logging
TTL (time-to-live) calculations
Scheduling future events
Audit trails and compliance

Caveat: Accept that ordering might be wrong across machines

Clock Best Practices

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CLOCK BEST PRACTICES                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. USE MONOTONIC CLOCKS FOR DURATIONS                                      │
│     ✗ end_time - start_time (wall clock, can go backward!)                 │
│     ✓ Use monotonic clock APIs                                              │
│                                                                              │
│  2. DON'T TRUST TIMESTAMPS ACROSS MACHINES                                  │
│     ✗ if remote_timestamp > local_timestamp                                │
│     ✓ Use vector clocks or HLC for ordering                                │
│                                                                              │
│  3. ALWAYS INCLUDE UNCERTAINTY                                              │
│     ✗ "Event happened at exactly 10:00:00.000"                             │
│     ✓ "Event happened between 10:00:00.000 and 10:00:00.010"               │
│                                                                              │
│  4. USE UTC EVERYWHERE INTERNALLY                                           │
│     ✗ Store local times, convert on display                                │
│     ✓ Store UTC, convert to local only for display                         │
│                                                                              │
│  5. HANDLE LEAP SECONDS                                                     │
│     ✗ Assume 86400 seconds per day                                         │
│     ✓ Use libraries that handle leap seconds, or leap smearing             │
│                                                                              │
│  6. SYNC YOUR SERVERS                                                       │
│     ✗ Default OS NTP settings                                              │
│     ✓ Configure chrony/ntpd properly, monitor clock drift                  │
│                                                                              │
│  7. LOG CLOCK SKEW                                                          │
│     Monitor and alert on excessive clock drift between nodes               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Interview Questions

Q: How does Google Spanner achieve external consistency?

Answer:

TrueTime API: Returns time interval [earliest, latest] instead of single value
GPS + Atomic Clocks: Hardware in every datacenter for accurate time
Commit Wait: After getting timestamp, wait until uncertainty period passes
Guarantee: If T1 commits before T2 starts, T1’s timestamp < T2’s timestamp

The wait adds latency (1-7ms) but provides global ordering without coordination.

Q: Why can't we just use physical timestamps for ordering?

Answer:

Clock Skew: Different machines have different times (milliseconds to seconds)
Clock Drift: Clocks run at slightly different speeds
NTP Jumps: Clocks can jump forward or backward during sync
No Causality: Physical time doesn’t capture happened-before relationships

Example: Machine A (behind) writes at “10:00:00.100”, Machine B (ahead) writes at “10:00:00.050” - B’s write appears earlier despite happening after A’s!

Q: When would you use vector clocks vs HLC?

Answer:Vector Clocks:

When you need precise conflict detection
Systems like DynamoDB/Riak that return conflicting versions
When number of nodes is bounded and small

HLC:

When you need approximate physical time for debugging
Systems with many nodes (vector clock size = O(n))
Databases like CockroachDB, MongoDB
When snapshot isolation is needed

Key difference: Vector clocks track all participants, HLC bounds clock size.

Q: Design a distributed system that orders events across data centers

Answer approach:

Understand requirements: Strong ordering (expensive) vs causal ordering (cheaper)
For causal ordering:
- Use HLC at each data center
- Propagate timestamps with messages
- Compare HLC timestamps for ordering
For strong ordering:
- Central sequencer (single point of failure)
- OR distributed consensus (high latency)
- OR TrueTime-like approach (hardware investment)
Practical trade-off:
- Use causal ordering where possible
- Strong ordering only where required (e.g., financial transactions)

Key Takeaways

Perfect Time is Impossible

Accept uncertainty. Design systems that handle clock skew gracefully.

Causality ≠ Physical Time

Use logical or hybrid clocks when you need to reason about event ordering.

Measure and Monitor

Track clock drift between nodes. Alert when skew exceeds thresholds.

Choose the Right Tool

Physical for display, logical for causality, HLC for both. TrueTime if you’re Google.

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Time & Clock Synchronization

​The Fundamental Problem

​Module 33: Clock Synchronization Protocols

​Physical Clock Synchronization

​Network Time Protocol (NTP)

​NTP Synchronization Algorithm

​Precision Time Protocol (PTP - IEEE 1588)

​Clock Anomalies and Edge Cases

​Real-World Clock Problems

​Module 34: Logical and Vector Clocks

​Logical Clocks Deep Dive

​Lamport Timestamps

​Vector Clocks

​Vector Clock in DynamoDB/Riak

​Module 35: Hybrid Logical Clocks (HLC)

​Module 36: TrueTime and Atomic Clocks

​Google TrueTime

​Spanner’s External Consistency

​Fault-Tolerant Clock Synchronization

​Marzullo’s Algorithm

​The Uncertainty Window & Intersection

​The Overlap Rule

​Multi-Source Intersection (The “Master” Interval)

​Byzantine Clock Synchronization

​Practical Guidelines

​Choosing the Right Clock

​Clock Best Practices

​Interview Questions

​Key Takeaways

Perfect Time is Impossible

Causality ≠ Physical Time

Measure and Monitor

Choose the Right Tool

Time & Clock Synchronization

The Fundamental Problem

Module 33: Clock Synchronization Protocols

Physical Clock Synchronization

Network Time Protocol (NTP)

NTP Synchronization Algorithm

Precision Time Protocol (PTP - IEEE 1588)

Clock Anomalies and Edge Cases

Real-World Clock Problems

Module 34: Logical and Vector Clocks

Logical Clocks Deep Dive

Lamport Timestamps

Vector Clocks

Vector Clock in DynamoDB/Riak

Module 35: Hybrid Logical Clocks (HLC)

Module 36: TrueTime and Atomic Clocks

Google TrueTime

Spanner’s External Consistency

Fault-Tolerant Clock Synchronization

Marzullo’s Algorithm

The Uncertainty Window & Intersection

The Overlap Rule

Multi-Source Intersection (The “Master” Interval)

Byzantine Clock Synchronization

Practical Guidelines

Choosing the Right Clock

Clock Best Practices

Interview Questions

Key Takeaways