Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Consistency Models

Understanding consistency models is fundamental to designing distributed systems. This module covers the entire consistency spectrum, from the strongest (linearizability) to the weakest (eventual consistency). Here is the single most useful analogy for this entire chapter: think of consistency models as a spectrum of “how much does this system behave like a single computer?” Linearizability means “exactly like a single computer” — every operation appears atomic and instantaneous to every observer. Eventual consistency means “it will eventually act like a single computer, but right now different observers may see different states.” Everything in between is a carefully negotiated trade-off between how real the illusion of a single computer is and how fast and available you need the system to be.
Module Duration: 12-16 hours
Key Topics: Linearizability, Serializability, Causal Consistency, Eventual Consistency, Session Guarantees
Interview Focus: Trade-offs between consistency levels, real-world examples

The Consistency Spectrum

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONSISTENCY SPECTRUM                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STRONGEST ◄──────────────────────────────────────────────► WEAKEST        │
│                                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │Linearizable  │  │ Sequential   │  │   Causal     │  │  Eventual    │    │
│  │              │  │ Consistency  │  │ Consistency  │  │ Consistency  │    │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
│        ▲                  ▲                 ▲                 ▲            │
│        │                  │                 │                 │            │
│  Real-time         Global order       Preserves         Eventually        │
│  ordering          (some order)       causality         converges         │
│                                                                              │
│  ────────────────────────────────────────────────────────────────────────   │
│  TRADE-OFF:                                                                 │
│  ────────────────────────────────────────────────────────────────────────   │
│                                                                              │
│  Consistency  ─────────────────────────────────────────► Higher             │
│  Availability ◄───────────────────────────────────────── Lower              │
│  Latency      ─────────────────────────────────────────► Higher             │
│  Throughput   ◄───────────────────────────────────────── Lower              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Linearizability (Strict Consistency)

The Gold Standard: Linearizability is the strongest single-object consistency model. Systems behave as if there’s a single copy of data.

Definition

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LINEARIZABILITY                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEFINITION:                                                                │
│  ───────────                                                                │
│  Every operation appears to execute instantaneously at some point           │
│  between its invocation and response (linearization point).                 │
│                                                                              │
│  PROPERTIES:                                                                │
│  ───────────                                                                │
│  1. Operations are atomic (happen at a single point in time)                │
│  2. Real-time ordering is preserved                                         │
│  3. All clients see the same order of operations                            │
│                                                                              │
│  EXAMPLE - Linearizable:                                                    │
│  ────────────────────────                                                   │
│  Time ────────────────────────────────────────────────────►                 │
│                                                                              │
│  Client A: [──write(x=1)──]                                                 │
│                       ↓ linearization point                                 │
│  Client B:              [──read(x)──] → returns 1 ✓                        │
│                                                                              │
│  After A's write completes, B must see x=1                                 │
│                                                                              │
│  EXAMPLE - NOT Linearizable:                                                │
│  ──────────────────────────                                                 │
│  Time ────────────────────────────────────────────────────►                 │
│                                                                              │
│  Client A: [──write(x=1)──]                                                 │
│  Client B:              [──read(x)──] → returns 0 ✗                        │
│  Client C:                  [──read(x)──] → returns 1                       │
│                                                                              │
│  B reads after A completes but sees old value - VIOLATION!                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Real-World Linearizable Systems

Linearizable writes:
- All writes go through the leader
- Leader assigns sequential zxid
- Write is linearized when majority acknowledge

Reads:
- sync() + read for linearizable reads
- Regular reads may be stale (sequential consistency)

Cost of Linearizability

┌─────────────────────────────────────────────────────────────────────────────┐
│                    COST OF LINEARIZABILITY                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  LATENCY:                                                                   │
│  ────────                                                                   │
│  - Writes must be acknowledged by majority                                  │
│  - Reads may need to contact leader or do sync                             │
│  - Cross-region: 50-200ms per operation                                    │
│                                                                              │
│  AVAILABILITY:                                                              │
│  ─────────────                                                              │
│  - Network partition → one side cannot make progress                       │
│  - Leader failure → election delay (seconds)                               │
│  - CAP theorem: Cannot have C and A during partition                       │
│                                                                              │
│  THROUGHPUT:                                                                │
│  ───────────                                                                │
│  - All operations serialized through leader                                │
│  - Consensus overhead per write                                            │
│  - Read scalability limited without compromising consistency               │
│                                                                              │
│  WHEN TO USE:                                                               │
│  ─────────────                                                              │
│  ✓ Leader election                                                         │
│  ✓ Distributed locks                                                       │
│  ✓ Unique ID generation                                                    │
│  ✓ Configuration management                                                │
│  ✓ Financial transactions                                                  │
│                                                                              │
│  WHEN TO AVOID:                                                             │
│  ──────────────                                                             │
│  ✗ High-throughput read-heavy workloads                                    │
│  ✗ Geographically distributed with low-latency requirements                │
│  ✗ Systems where availability is paramount                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Sequential Consistency

Definition and Examples

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SEQUENTIAL CONSISTENCY                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEFINITION:                                                                │
│  ───────────                                                                │
│  - Operations from each client appear in program order                     │
│  - All clients see the SAME global order                                   │
│  - BUT: Order may not respect real-time                                    │
│                                                                              │
│  DIFFERENCE FROM LINEARIZABLE:                                              │
│  ─────────────────────────────                                              │
│                                                                              │
│  Linearizable:  Real-time order matters                                    │
│  Sequential:    Only program order per client matters                      │
│                                                                              │
│  EXAMPLE - Sequentially Consistent but NOT Linearizable:                   │
│  ─────────────────────────────────────────────────────────                  │
│                                                                              │
│  Real-time:                                                                 │
│  Client A: write(x=1) ────────────────────────────────────                 │
│  Client B: ──────────────────── read(x) → 0                                │
│                                                                              │
│  Possible sequential order: B.read(x), A.write(x=1)                        │
│  This is valid because B's read started before A's write completed!        │
│                                                                              │
│  For LINEARIZABILITY: B must see x=1 if read starts after write ends      │
│                                                                              │
│  USE CASES:                                                                 │
│  ───────────                                                                │
│  - Multi-core CPU memory models                                            │
│  - Zookeeper reads (without sync)                                          │
│  - Some distributed databases                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Serializability

Database Focus: Serializability is about transactions, not individual operations. It’s the isolation property in ACID.

Definition

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SERIALIZABILITY                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEFINITION:                                                                │
│  ───────────                                                                │
│  Transactions appear to execute in SOME serial order.                      │
│  The actual order may differ from real-time order.                         │
│                                                                              │
│  EXAMPLE:                                                                   │
│  ─────────                                                                  │
│  T1: R(A) W(A)           Concurrent execution:                             │
│  T2: R(A) W(A)                                                              │
│                          T1: ──R(A)────────W(A)──                          │
│                          T2: ────────R(A)────────W(A)                      │
│                                                                              │
│  Serializable if equivalent to: T1→T2 or T2→T1                             │
│                                                                              │
│  SERIALIZABLE vs LINEARIZABLE:                                              │
│  ─────────────────────────────                                              │
│  ┌────────────────┬──────────────────┬──────────────────┐                  │
│  │                │ Linearizable     │ Serializable     │                  │
│  ├────────────────┼──────────────────┼──────────────────┤                  │
│  │ Scope          │ Single object    │ Multi-object txn │                  │
│  │ Real-time      │ Yes              │ No               │                  │
│  │ Ordering       │ Real-time        │ Any serial order │                  │
│  │ Domain         │ Distributed sys  │ Databases        │                  │
│  └────────────────┴──────────────────┴──────────────────┘                  │
│                                                                              │
│  STRICT SERIALIZABILITY:                                                   │
│  ────────────────────────                                                   │
│  = Serializable + Linearizable                                             │
│  = Transactions respect real-time order                                    │
│  = What Spanner provides                                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Isolation Levels

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SQL ISOLATION LEVELS                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  WEAKEST ◄─────────────────────────────────────────────► STRONGEST          │
│                                                                              │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐           │
│  │ Read        │ │ Read        │ │ Repeatable  │ │ Serializable│           │
│  │ Uncommitted │ │ Committed   │ │ Read        │ │             │           │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘           │
│                                                                              │
│  ANOMALIES PREVENTED:                                                       │
│  ─────────────────────                                                      │
│  ┌─────────────────────┬───────┬───────┬───────┬───────────────┐           │
│  │ Anomaly             │ RU    │ RC    │ RR    │ Serializable  │           │
│  ├─────────────────────┼───────┼───────┼───────┼───────────────┤           │
│  │ Dirty Read          │ ✗     │ ✓     │ ✓     │ ✓             │           │
│  │ Non-repeatable Read │ ✗     │ ✗     │ ✓     │ ✓             │           │
│  │ Phantom Read        │ ✗     │ ✗     │ ✗     │ ✓             │           │
│  │ Write Skew          │ ✗     │ ✗     │ ✗     │ ✓             │           │
│  └─────────────────────┴───────┴───────┴───────┴───────────────┘           │
│                                                                              │
│  DIRTY READ:                                                                │
│  ────────────                                                               │
│  T1: W(x=10)         ← not committed                                       │
│  T2: R(x)=10         ← reads uncommitted data                              │
│  T1: ROLLBACK        ← T2 has phantom data!                                │
│                                                                              │
│  WRITE SKEW:                                                                │
│  ────────────                                                               │
│  Constraint: x + y >= 0                                                    │
│  Initial: x=5, y=5                                                         │
│  T1: if (x+y >= 5) then x = x - 5                                         │
│  T2: if (x+y >= 5) then y = y - 5                                         │
│  Both read x+y=10, both proceed, result: x=0, y=0, violates constraint!    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Causal Consistency

Definition and Intuition

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CAUSAL CONSISTENCY                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEFINITION:                                                                │
│  ───────────                                                                │
│  If operation A causally precedes operation B,                              │
│  then all nodes see A before B.                                            │
│                                                                              │
│  CAUSALITY RULES:                                                           │
│  ────────────────                                                           │
│  1. Program order: In same thread, A before B → A causes B                 │
│  2. Message passing: Send before receive → Send causes receive             │
│  3. Transitivity: A→B and B→C implies A→C                                  │
│                                                                              │
│  EXAMPLE - Causally Consistent:                                            │
│  ───────────────────────────────                                            │
│                                                                              │
│  Alice posts: "I got the job!"                                             │
│  Bob (seeing Alice's post): "Congratulations!"                             │
│                                                                              │
│  Carol must see Alice's post before Bob's reply (causal order)             │
│  Carol may see unrelated posts in any order (concurrent = no causal order) │
│                                                                              │
│  CAUSALLY RELATED vs CONCURRENT:                                           │
│  ────────────────────────────────                                           │
│                                                                              │
│  Client A: W(x=1) ────────────────────────────                             │
│                  ↘                                                          │
│  Client B:         R(x=1) ── W(y=2)                                        │
│                              ↓ causally related (B read A's write)         │
│                                                                              │
│  Client A: W(x=1) ────────────────────────────                             │
│  Client B: W(y=2) ────────────────────────────                             │
│            ↑ concurrent (no communication)                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Eventual Consistency

Definition and Variations

┌─────────────────────────────────────────────────────────────────────────────┐
│                    EVENTUAL CONSISTENCY                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  DEFINITION:                                                                │
│  ───────────                                                                │
│  If no new updates are made, eventually all replicas converge              │
│  to the same value.                                                        │
│                                                                              │
│  THE WEAKEST USEFUL GUARANTEE:                                             │
│  ─────────────────────────────                                              │
│  - No guarantee about WHEN convergence happens                             │
│  - No guarantee about order of updates                                     │
│  - During updates, different replicas may return different values          │
│                                                                              │
│  TIMELINE:                                                                  │
│  ─────────                                                                  │
│  Update ─────────────────────────────────────► Convergence                 │
│          ↑ inconsistency window ↑                                          │
│                                                                              │
│  Replica A: [x=1] ─────────────────── [x=5] ─── [x=5]                      │
│  Replica B: [x=1] ── [x=5] ──────────────────── [x=5]                      │
│  Replica C: [x=1] ──────────────────────── [x=5][x=5]                      │
│                                                                              │
│  During inconsistency window:                                              │
│  - Read from A: x=1                                                        │
│  - Read from B: x=5                                                        │
│  - Application must handle this!                                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Session Guarantees

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SESSION GUARANTEES                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SESSION GUARANTEES provide consistency within a single client session.    │
│  Weaker than linearizability but practical for many applications.          │
│                                                                              │
│  1. READ YOUR WRITES (RYW)                                                 │
│     ─────────────────────                                                   │
│     After a write, subsequent reads by same client see the write.          │
│                                                                              │
│  2. MONOTONIC READS                                                        │
│     ────────────────────                                                    │
│     If client reads value v, subsequent reads won't return older values.  │
│                                                                              │
│  3. MONOTONIC WRITES                                                       │
│     ─────────────────────                                                   │
│     Writes by a client are applied in order on all replicas.              │
│                                                                              │
│  4. WRITES FOLLOW READS                                                    │
│     ────────────────────────                                                │
│     A write following a read is ordered after that read on all replicas.  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

8.1 Implementing Session Guarantees

In high-scale systems (like DynamoDB or Cassandra), session guarantees are often implemented using Client-Side Metadata or Version Vectors.

Read Your Writes (RYW)

To ensure a client sees their own update immediately, even if the database is eventually consistent:
  1. Write: The client performs a write and receives a Version Token (or Timestamp/LSN) in the response.
  2. Storage: The client stores this token in their session (e.g., a cookie or local storage).
  3. Read: When reading, the client sends this token. The server ensures the read replica has at least reached that version before responding. If the replica is lagging, the server can:
    • Wait for the replica to catch up.
    • Route the request to a fresher replica.
    • Read from the Leader.

Monotonic Reads

To prevent the “Time Travel” bug (where a user sees a post, refreshes, and it’s gone because they hit a lagging replica):
  1. The client tracks the Max Version it has seen so far.
  2. Every read request includes this min_version filter.
  3. The load balancer or database proxy ensures that the request only hits replicas that are at or beyond this version.
class SessionClient:
    """
    Pseudo-code for a client providing Read-Your-Writes 
    and Monotonic Reads.
    
    Think of last_version_seen as a bookmark: "I have seen the story
    up to this page." Every read says "don't show me anything older
    than my bookmark," and every write advances the bookmark forward.
    This prevents the disorienting experience of time travel -- seeing
    your own update, refreshing, and having it disappear because
    the load balancer routed you to a lagging replica.
    """
    def __init__(self, db_cluster):
        self.db = db_cluster
        self.last_version_seen = 0  # The "high-water mark" for this session

    def write(self, key, value):
        response = self.db.put(key, value)
        # Advance the bookmark: we now know the system is at least at this version
        self.last_version_seen = max(self.last_version_seen, response.version)
        return response

    def read(self, key):
        # Tell the server: "Only serve this read from a replica that is at
        # least as fresh as my bookmark." This is the key to read-your-writes.
        response = self.db.get(key, min_version=self.last_version_seen)
        # Advance the bookmark on reads too, ensuring monotonic reads
        self.last_version_seen = max(self.last_version_seen, response.version)
        return response.data
Staff Tip: While session guarantees improve the user experience, they can create Hot Spots. If a single user is extremely active, they may effectively “pin” themselves to a single replica, preventing the load balancer from distributing traffic effectively.

9. Advanced Theoretical Frameworks

To truly master consistency, one must look beyond simple definitions and understand the mathematical foundations.

9.1 The CALM Theorem

CALM stands for Consistency As Logical Monotonicity. This theorem provides a formal boundary for the CAP theorem: it identifies exactly which programs can be consistent and available without coordination.

The Fundamental Insight

Distributed consistency is hard because nodes disagree on the order and absence of events. The CALM theorem states that:
Consistency can be achieved without coordination if and only if the program is logically monotonic.

Monotonic vs. Non-Monotonic Operations

TypeDescriptionExamplesCoordination?
MonotonicAdding information never invalidates previous conclusions. “More is better.”Set union, Reachability, Maximum, Logical OR/ANDNo (Coordination-free)
Non-MonotonicAdding information can change a previous “True” to “False.” “Absence matters.”Set difference, Negation (NOT), Aggregation (Count/Sum), Garbage CollectionYes (Requires locks/consensus)

Why Monotonicity Matters

  1. Order Independence: In a monotonic system, messages can arrive in any order, be delayed, or be duplicated, and the final result will be the same. This is why CRDTs (Module 15) work—they are mathematically monotonic.
  2. Deterministic Convergence: Multiple replicas receiving different subsets of updates will always “eventually converge” to the same state as they receive more information.
  3. Availability: Monotonic programs are AP (Available under Partitions) because they don’t need to ask other nodes “Do you know anything that would make my current conclusion false?”

Practical Application: Garbage Collection

Consider a distributed system where you want to delete a file.
  • Problem: Deletion is non-monotonic. If Node A deletes file1, but Node B hasn’t seen the delete yet and re-replicates it, file1 “resurrects.”
  • Solution: Use a monotonic approximation. Instead of deleting, add a “Tombstone” (a record that says ‘this is deleted’). Adding a tombstone is monotonic (you are adding info). The actual space reclamation (purging) is non-monotonic and requires coordination or a background GC process with a grace period.
Staff+ Tip: When designing a high-scale system, always ask: “Can I refactor this non-monotonic operation into a monotonic one?” If you can, you remove the need for expensive Paxos/Raft coordination. Practical example: Consider a distributed “like” counter. Counting is non-monotonic (you need to know the total, which requires knowing about absences — “has everyone reported in?”). But if you reframe it as a G-Counter CRDT where each node tracks its own increment count and the global count is the sum, you have a monotonic operation. Each node only adds; the sum only grows. No coordination needed, and the system converges naturally.

9.2 Linearizability vs. Sequential Consistency (The Proof of Non-Composability)

A critical (and often asked) property of linearizability is that it is composable.
  • If object XX is linearizable and object YY is linearizable, then the combined system (X,Y)(X, Y) is also linearizable.
  • Sequential consistency is NOT composable. You can have two sequentially consistent objects that, when used together, violate sequential consistency. This is why multi-core memory models (which are often sequentially consistent) are so difficult to reason about at scale.

9.3 Formal Verification with TLA+

How do we know a consistency model is actually implemented correctly?
  • TLA+ (Temporal Logic of Actions): A language for modeling concurrent systems.
  • You define your system’s state and allowed transitions (actions).
  • You define Safety Invariants (e.g., “no two nodes are leader in the same term”).
  • The model checker explores all possible interleavings to find violations.

10. Testing Consistency in the Wild: The Jepsen Framework

Created by Kyle Kingsbury, Jepsen is the industry standard for testing distributed systems.

10.1 How Jepsen Works

  1. Setup: Spins up a cluster of nodes (e.g., 5 nodes).
  2. Client: A set of clients perform operations (reads, writes, CAS) on the cluster.
  3. Nemesis: A special process that causes “havoc”:
    • Network partitions (iptables drops).
    • Clock skew (ntpdate jumps).
    • Process crashes (kill -9).
  4. Checker: After the test, Jepsen analyzes the history of operations to see if they violate the claimed consistency model (e.g., using Knossos for linearizability).

10.2 Famous Jepsen Findings

  • MongoDB: Found that “Strong Consistency” wasn’t actually strong in many edge cases (later fixed with WiredTiger and Raft).
  • Cassandra: Found that lightweight transactions (LWT) could lose data during partitions.
  • Redis (Redlock): Kingsbury’s critique of Redlock showed that without fencing tokens, distributed locks are not safe under clock skew.

11. Consistency in Practice: The Decision Matrix

ModelCoordinationAvailabilityLatencyTypical Use Case
LinearizableHigh (Quorum)Low (CP)HighLeader Election, Locks
SequentialModerateLow (CP)MediumMemory models, CPU caches
CausalLow (Metadata)High (AP)LowSocial feeds, comments
EventualNoneHigh (AP)Ultra-lowAnalytics, background jobs
SEC (CRDTs)NoneHigh (AP)LowCollaborative editing

12. Interview Playbook: “The Deep Dive”

“When discussing consistency, I distinguish between single-object models like linearizability and multi-object models like serializability. Linearizability provides the strongest recency guarantee but at the cost of availability during partitions—a trade-off described by the CAP theorem. For high-availability systems, I look towards Causal Consistency, which is the strongest model achievable without global coordination. I also apply the CALM theorem to identify non-monotonic operations that strictly require coordination. Finally, I verify these systems using frameworks like Jepsen to ensure that under network partitions or clock skew, the safety invariants of the chosen model still hold.”

13. Key Takeaways

Consistency is a Spectrum

From linearizable (strongest) to eventual (weakest). Choose based on your requirements.

Stronger = Slower

Strong consistency requires coordination, which adds latency and reduces availability.

CAP is About Partitions

During partitions, choose consistency (reject writes) or availability (accept divergence).

Session Guarantees Help

Read-your-writes and monotonic reads provide practical consistency within a session.

14. Next Steps

Consensus Protocols

Learn Paxos, Raft, and how consensus enables strong consistency

Replication Strategies

Understand how data is replicated and conflicts resolved

Interview Deep-Dive

Strong Answer:
  • Linearizability is a single-object, real-time guarantee. It says: every operation on a single register (or key) appears to take effect instantaneously at some point between its invocation and its response, and all observers agree on the order. It is about recency — if a write completes before a read starts, the read must see that write.
  • Serializability is a multi-object, transaction-level guarantee. It says: the result of executing a set of transactions concurrently is equivalent to executing them in some serial order. Crucially, that serial order does not have to match real-time order. Serializability is the “I” in ACID (Isolation).
  • The confusion arises because both involve “ordering,” but they operate at different levels. A system can be serializable but not linearizable: transactions execute as-if serial, but individual reads within a transaction might see stale data because the serial order does not respect wall-clock time. A system can be linearizable but not serializable: each individual key is strongly consistent, but multi-key transactions might see inconsistent snapshots.
  • Strict serializability (or external consistency) combines both: transactions are serializable AND the serial order respects real-time. This is the strongest guarantee, provided by Google Spanner.
Follow-up: Why is linearizability composable but sequential consistency is not? Why does this matter in practice?Linearizability is composable because if object X is individually linearizable and object Y is individually linearizable, the combined system (X, Y) is also linearizable. This is because linearizability’s real-time ordering constraint is global — any operation that completes before another starts must be ordered before it, regardless of which object it operates on. Sequential consistency is not composable because it only requires a consistent global order per process, without respecting real-time. Two sequentially consistent objects can each maintain valid per-process orderings that, when combined, create impossible global orderings. In practice, this matters for multi-core CPU programming: each core may provide sequential consistency for its own memory operations, but the combined behavior across cores can violate sequential consistency. It is also why building linearizable systems from sequentially consistent components requires additional coordination (memory barriers, fences). For distributed systems designers, the takeaway is: if you are composing multiple independent consensus groups, linearizability of each group gives you linearizability of the whole, which is a powerful compositional guarantee.
Strong Answer:
  • Write skew is an anomaly where two transactions each read the same data, make independent decisions based on what they read, and then write to different objects — resulting in a state that violates a cross-object invariant. The classic example: a hospital requires at least one doctor on call. Both Dr. Alice and Dr. Bob read “2 doctors on call” and each decides it is safe to go off call. They each write their own record (different objects). Result: 0 doctors on call, violating the constraint.
  • Repeatable read prevents dirty reads, non-repeatable reads, and phantom reads within the same object. But it does not prevent write skew because the two transactions read the same data but write to different rows. There is no write-write conflict that repeatable read would catch — the writes are to different keys.
  • To prevent write skew in a distributed database, you need serializable isolation. There are three common approaches: (1) Actual serial execution — run one transaction at a time, which is what Redis and VoltDB do. (2) Two-phase locking (2PL) — acquire shared locks on all reads and exclusive locks on all writes, preventing concurrent access. This catches write skew because both transactions would try to acquire shared locks on the “doctors on call” query, and the conflict would be detected. (3) Serializable Snapshot Isolation (SSI), used by CockroachDB and PostgreSQL 9.1+ — optimistically execute transactions but detect dangerous read-write conflicts at commit time and abort one of the conflicting transactions.
Follow-up: SSI detects write skew at commit time and aborts one transaction. What are the production implications of high abort rates?High abort rates turn SSI from an optimization into a liability. Each abort wastes the work done by the aborted transaction and forces a retry, which consumes additional resources. If the workload has frequent cross-object invariant checks (like the doctor example, or inventory reservations), the abort rate can become a significant fraction of throughput. The mitigations are: first, design your data model to reduce the need for cross-object invariants — if you can denormalize the constraint into a single record, you eliminate the write skew possibility entirely. Second, use advisory locks or explicit “SELECT FOR UPDATE” to eagerly acquire write intent, which prevents the optimistic conflict from occurring. Third, monitor the abort rate as a first-class metric; if it exceeds 5-10%, investigate whether the workload pattern is inherently contention-heavy and consider switching to 2PL for those specific transactions.
Strong Answer:
  • For the user’s own profile updates (name, bio, avatar): read-your-writes consistency. After a user changes their bio, they must immediately see the change on their own screen. But other users seeing a 5-second-stale bio is perfectly acceptable. I would implement this with session-sticky routing or a version token that the client passes on subsequent reads.
  • For the news feed / timeline: eventual consistency is fine. A post appearing 2-3 seconds late in a friend’s feed is invisible to the user. Causal consistency is a nice-to-have: if Alice posts “I got a promotion” and Bob replies “Congratulations!”, viewers should see Alice’s post before Bob’s reply. This can be achieved with causal metadata (tracking that Bob’s reply depends on Alice’s post).
  • For likes/reactions: eventual consistency with CRDTs. A G-Counter CRDT allows each replica to independently count likes and merge later. The like count might be slightly inaccurate for a few seconds, but it will converge. No coordination needed.
  • For direct messages: causal consistency at minimum, with read-your-writes. Messages within a conversation must appear in causal order (replies after the messages they reply to). The sender must see their own message immediately.
  • For account settings (payment info, privacy settings): linearizability. If a user changes their account to “private,” that change must take effect immediately and globally. A stale read that shows the old privacy setting could expose private content.
Follow-up: How would you implement causal consistency for the news feed without paying the cost of vector clocks across millions of users?I would not use per-user vector clocks for the news feed — the vector would be enormous. Instead, I would use a lightweight causal tracking mechanism. Each post carries a “depends-on” list of post IDs that causally precede it (e.g., Bob’s “Congratulations” depends on Alice’s promotion post). When rendering a feed, the client-side or server-side renderer checks that all dependencies are satisfied before displaying a post. If a dependency is missing (the causal predecessor has not yet arrived at this replica), the post is buffered until the predecessor arrives or a timeout expires. For implementation, I would use Kafka with per-user partitioning: all posts relevant to a user’s feed are ordered within the partition, and the “depends-on” metadata is a simple list of message offsets. This gives causal ordering within a feed without global vector clocks. The trade-off: cross-feed causality (Alice sees a post on Feed A and shares it to Feed B) requires additional coordination, which I would handle with explicit “share” events that carry the causal dependency.
Strong Answer:
  • CALM stands for Consistency As Logical Monotonicity. It provides a formal answer to a deep question: which computations can be made both consistent and available without any coordination? The answer: exactly the monotonic ones.
  • A computation is monotonic if adding new information never invalidates previous conclusions. Set union is monotonic — adding an element to a set never removes existing elements. Set difference is non-monotonic — learning about a new element could change a subtraction result. Aggregations like COUNT are non-monotonic because the count changes with every new element and you need to know about absences (has everyone reported in?).
  • The practical implication is a design principle: if you can refactor your operation to be monotonic, you can implement it without coordination (no Paxos, no Raft, no distributed locks). This is exactly why CRDTs work — they are mathematically monotonic data structures. A G-Counter only grows, an OR-Set only adds (tombstones are adds, not deletes), and these structures converge without coordination.
  • The design heuristic I use: when I encounter a non-monotonic operation in a high-availability system, I ask “Can I make this monotonic?” Deletion becomes “add a tombstone.” Decrement becomes “add a negative event to a PN-Counter.” COUNT becomes “each node tracks its own count, merge by summing.” If I cannot make it monotonic, I know I need coordination for that specific operation, and I can minimize the coordination scope.
Follow-up: Give me a concrete example where you would refactor a non-monotonic operation into a monotonic one in a production system.Consider a distributed “unique username” check. Naively, checking uniqueness is non-monotonic: you need to know that no other node has registered this username, which requires knowledge of absence. The coordination-heavy approach is a distributed lock or a consensus protocol for every registration. The monotonic refactoring: pre-partition the username space across nodes (e.g., usernames starting with A-F go to Node 1, G-L to Node 2, etc.). Now each node can independently and monotonically add usernames to its partition. Uniqueness within a partition is a local check requiring no coordination. The username “alice” always maps to the same node, so two concurrent registrations of “alice” will both hit the same node and be serialized locally. You have turned a global coordination problem into a local monotonic operation by exploiting the partitioning structure. The trade-off: you lose flexibility in partition assignment, and a hot partition (many popular usernames starting with the same letter) becomes a bottleneck. But you eliminate the need for global consensus on every registration.