Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Data Management Patterns

Managing data across microservices is one of the hardest challenges. This chapter covers patterns for data ownership, distributed transactions, and consistency.
Learning Objectives:
  • Implement database per service pattern
  • Handle distributed transactions with Saga pattern
  • Understand event sourcing and its trade-offs
  • Implement CQRS for complex queries
  • Design for eventual consistency

Database Per Service

Each microservice owns its data and exposes it only through APIs.

Why This Pattern Exists

In a monolith, every module can reach into any table with a simple JOIN. That convenience is exactly what kills scalability and team autonomy at scale. When ten teams share one database, a schema change made by one team breaks queries in services owned by three others, and nobody can deploy independently because the shared schema becomes a coordination bottleneck. The database per service pattern draws a hard boundary: each service owns its persistence layer, period. If you want another service’s data, you call its API or consume its events. This trades query convenience (no more cross-service JOINs) for deployment independence, polyglot persistence (MongoDB for users, PostgreSQL for orders, Redis for inventory cache), and blast-radius containment when a schema migration goes wrong. What happens if you ignore this? You get the “distributed monolith” anti-pattern: services that look independent on the surface but share a database underneath. A single bad migration locks tables across five services. A slow query in one service starves connections for all the others. You have all the complexity of microservices with none of the benefits. The key tradeoff to watch: you lose ACID transactions across service boundaries. An order cannot atomically deduct inventory and charge the customer if those tables live in different databases. This is why the rest of this chapter exists — Saga, Outbox, Event Sourcing, and CQRS are all responses to “we broke the database apart, now how do we keep the business logic consistent?”
┌─────────────────────────────────────────────────────────────────────────────┐
│                      DATABASE PER SERVICE                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ ANTI-PATTERN: Shared Database                                           │
│  ───────────────────────────────────────                                    │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                               │
│  │  Service  │  │  Service  │  │  Service  │                               │
│  │     A     │  │     B     │  │     C     │                               │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘                               │
│        │              │              │                                       │
│        └──────────────┼──────────────┘                                      │
│                       ▼                                                      │
│              ┌─────────────────┐                                            │
│              │ SHARED DATABASE │  ← Coupling, schema conflicts              │
│              └─────────────────┘                                            │
│                                                                              │
│  ✅ CORRECT: Database Per Service                                           │
│  ───────────────────────────────────                                        │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                               │
│  │  Service  │  │  Service  │  │  Service  │                               │
│  │     A     │  │     B     │  │     C     │                               │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘                               │
│        │              │              │                                       │
│        ▼              ▼              ▼                                       │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                               │
│  │ MongoDB   │  │ PostgreSQL│  │   Redis   │  ← Best fit for each         │
│  └───────────┘  └───────────┘  └───────────┘                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation Example

Below, three services each pick a database that fits their access pattern. The User Service uses MongoDB because user profiles have flexible, nested attributes (preferences, addresses, metadata) that evolve often — a schema-rigid relational store would require frequent migrations. The Order Service uses PostgreSQL because orders have strong relational structure (order-to-items), need ACID guarantees on financial data, and benefit from rich query capability. The Inventory Service combines PostgreSQL (source of truth) with Redis (hot read cache), because stock lookups happen thousands of times per second but mutations are comparatively rare. If you used one database for all three, you would either over-engineer the simple cases (MongoDB for simple orders) or under-engineer the complex ones (SQL for rapidly-evolving user preferences). Polyglot persistence is one of the genuine wins of microservices — but only if each team is prepared to operate the database they picked.
// user-service/src/database.js
const mongoose = require('mongoose');

// User service uses MongoDB
const connectUserDB = async () => {
  await mongoose.connect(process.env.USER_DB_URI, {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });
  console.log('User DB connected');
};

const UserSchema = new mongoose.Schema({
  email: { type: String, unique: true, required: true },
  name: { type: String, required: true },
  passwordHash: String,
  status: { type: String, enum: ['active', 'inactive', 'suspended'] },
  preferences: {
    language: String,
    timezone: String,
    notifications: Boolean
  },
  createdAt: { type: Date, default: Date.now }
});

const User = mongoose.model('User', UserSchema);

// order-service/src/database.js
const { Pool } = require('pg');
const { PrismaClient } = require('@prisma/client');

// Order service uses PostgreSQL with Prisma
const prisma = new PrismaClient();

// schema.prisma
/*
model Order {
  id          String      @id @default(uuid())
  customerId  String      // Reference to user service
  status      OrderStatus
  items       OrderItem[]
  totalAmount Decimal
  currency    String      @default("USD")
  createdAt   DateTime    @default(now())
  updatedAt   DateTime    @updatedAt
}

model OrderItem {
  id        String  @id @default(uuid())
  orderId   String
  order     Order   @relation(fields: [orderId], references: [id])
  productId String  // Reference to product service
  quantity  Int
  unitPrice Decimal
}

enum OrderStatus {
  PENDING
  CONFIRMED
  PAID
  SHIPPED
  DELIVERED
  CANCELLED
}
*/

// inventory-service/src/database.js
const Redis = require('ioredis');
const { Pool } = require('pg');

// Inventory uses Redis for fast reads, PostgreSQL for persistence
class InventoryDatabase {
  constructor() {
    this.redis = new Redis(process.env.REDIS_URL);
    this.pg = new Pool({ connectionString: process.env.INVENTORY_DB_URL });
  }

  async getStock(productId) {
    // Try cache first
    let stock = await this.redis.get(`stock:${productId}`);
    
    if (stock === null) {
      // Fallback to database
      const result = await this.pg.query(
        'SELECT quantity FROM inventory WHERE product_id = $1',
        [productId]
      );
      stock = result.rows[0]?.quantity || 0;
      
      // Cache for 1 minute
      await this.redis.setex(`stock:${productId}`, 60, stock);
    }
    
    return parseInt(stock);
  }

  async updateStock(productId, delta) {
    const client = await this.pg.connect();
    
    try {
      await client.query('BEGIN');
      
      await client.query(
        `UPDATE inventory 
         SET quantity = quantity + $1, updated_at = NOW()
         WHERE product_id = $2`,
        [delta, productId]
      );
      
      await client.query('COMMIT');
      
      // Invalidate cache
      await this.redis.del(`stock:${productId}`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }
  }
}

Cross-Service Data Access

The rule is simple but frequently violated: never reach into another service’s database. The moment Service A runs a query against Service B’s tables, you have secretly coupled their schemas. Now any refactor in Service B — renaming a column, splitting a table, migrating to a different database engine — silently breaks Service A. The correct mental model: treat every other service’s data as if it were on a different company’s servers. You would never query a third-party vendor’s database directly; you would call their API. Apply the same discipline internally. APIs are contracts; database schemas are implementation details. The cost is real. An API call is slower than a JOIN (milliseconds vs microseconds), adds a network dependency, and requires you to think about failure modes like “what if the user service is down?” But the benefit is also real: each team can evolve its storage independently, and a bad migration in one service cannot corrupt another. If you find yourself wanting cross-service JOINs frequently, that’s a design smell. Either your service boundaries are wrong (and the two services should merge) or you need a dedicated read model via CQRS (covered later in this chapter).
// ❌ WRONG: Direct database access
class OrderService {
  async createOrder(orderData) {
    // DON'T DO THIS - accessing user database directly
    const user = await userDatabase.users.findById(orderData.userId);
    // ...
  }
}

// ✅ CORRECT: API-based access
class OrderService {
  constructor(userServiceClient, inventoryServiceClient) {
    this.userClient = userServiceClient;
    this.inventoryClient = inventoryServiceClient;
  }

  async createOrder(orderData) {
    // Validate user via API
    const user = await this.userClient.getUser(orderData.userId);
    if (!user) {
      throw new Error('User not found');
    }

    // Check inventory via API
    for (const item of orderData.items) {
      const available = await this.inventoryClient.checkStock(
        item.productId,
        item.quantity
      );
      if (!available) {
        throw new Error(`Insufficient stock for ${item.productId}`);
      }
    }

    // Create order in our database
    const order = await this.orderRepository.create({
      customerId: user.id,
      items: orderData.items,
      totalAmount: this.calculateTotal(orderData.items),
      status: 'PENDING'
    });

    return order;
  }
}

Saga Pattern

Manage distributed transactions across multiple services.

Why Sagas Exist

Once you split a database per service, the classic ACID transaction dies. You cannot wrap “reserve inventory, charge payment, create order” in a single BEGIN...COMMIT because those tables live in different databases owned by different teams. The academic answer is two-phase commit (2PC), but 2PC is blocking (locks held across network round-trips), fragile (coordinator failure wedges the system), and unsupported by most modern NoSQL stores. Nobody uses 2PC in production microservices. The Saga pattern replaces atomicity with a sequence of local transactions, each of which has a corresponding compensating transaction to undo it. Think of it as “eventual consistency with explicit rollback.” If step 3 of a 5-step saga fails, you run the compensations for steps 2 and 1 in reverse order. The system converges to a consistent state — just not instantly. If you did this naively (fire and forget, no compensation), you get stuck orders: payment charged, inventory reserved, but the order record was never created because the service crashed between steps. Money disappears, stock is locked forever, customer support is on fire. The tradeoff: sagas accept temporary inconsistency in exchange for availability and decoupling. A saga in mid-flight is a real business state — “order pending inventory” is a thing the UI must handle. This is where many teams go wrong: they build sagas but treat the intermediate states as internal implementation details rather than first-class business states visible to users and operators.

Choreography vs Orchestration: The Fundamental Choice

There are two styles: choreography (services react to events with no central brain) and orchestration (a single orchestrator calls each step in order). The choice shapes everything downstream: debuggability, coupling, observability. Choreography feels elegant at small scale — just services reacting to events, very “decoupled.” But as the saga grows to 5+ steps, nobody can answer the question “what’s the current flow?” without tracing events across five services. You get implicit coupling (Service B must know which event Service A emits, which event to emit next, and who else might listen). Orchestration centralizes the flow in one place: you can read the orchestrator’s code and see the full saga. The downside is the orchestrator becomes a critical service and a potential coupling point — it needs to know about every step. For most production systems, I prefer orchestration once the saga has more than three steps, because the observability win outweighs the “central coordinator” concern.

Choreography-Based Saga

Services communicate through events without a central coordinator.
┌─────────────────────────────────────────────────────────────────────────────┐
│                     CHOREOGRAPHY SAGA: ORDER FLOW                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  HAPPY PATH:                                                                 │
│  ───────────                                                                 │
│                                                                              │
│  Order       ──(1) OrderCreated──▶  Inventory   ──(2) StockReserved──▶      │
│  Service                            Service                                  │
│                                                                              │
│  Payment     ◀──(3) StockReserved──  Inventory  ──(4) PaymentSuccess──▶     │
│  Service                             Service                                 │
│                                                                              │
│  Order       ◀──(5) PaymentSuccess── Payment    ──(6) OrderConfirmed──▶     │
│  Service                             Service                                 │
│                                                                              │
│                                                                              │
│  COMPENSATION (Payment Failed):                                              │
│  ──────────────────────────────                                             │
│                                                                              │
│  Payment     ──(1) PaymentFailed──▶  Inventory  ──(2) StockReleased──▶      │
│  Service                             Service                                 │
│                                                                              │
│  Order       ◀──(3) StockReleased──  Inventory  ──(4) OrderCancelled──▶     │
│  Service                             Service                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
In the choreography code below, notice how every service is both a publisher and subscriber of events, and the saga logic is spread across multiple files. The sagaState field on the Order record is a breadcrumb — it tells any operator examining the database “what step are we on?” Without this state tracking, a failure mid-saga becomes nearly impossible to diagnose: you’d have to reconstruct the flow from distributed event logs.
// order-service/src/sagas/orderSaga.js
class OrderChoreographySaga {
  constructor(eventBus, orderRepository) {
    this.eventBus = eventBus;
    this.orderRepository = orderRepository;
    this.setupHandlers();
  }

  setupHandlers() {
    // Listen for saga-related events
    this.eventBus.subscribe('inventory.stock_reserved', this.onStockReserved.bind(this));
    this.eventBus.subscribe('inventory.stock_reservation_failed', this.onStockReservationFailed.bind(this));
    this.eventBus.subscribe('payment.success', this.onPaymentSuccess.bind(this));
    this.eventBus.subscribe('payment.failed', this.onPaymentFailed.bind(this));
    this.eventBus.subscribe('inventory.stock_released', this.onStockReleased.bind(this));
  }

  // Step 1: Create order and publish event
  async createOrder(orderData) {
    const order = await this.orderRepository.create({
      ...orderData,
      status: 'PENDING',
      sagaState: 'STARTED'
    });

    await this.eventBus.publish('order.created', {
      orderId: order.id,
      customerId: order.customerId,
      items: order.items,
      totalAmount: order.totalAmount
    });

    return order;
  }

  // Step 2: Stock reserved → Proceed to payment
  async onStockReserved(event) {
    const { orderId, reservationId } = event.data;
    
    await this.orderRepository.update(orderId, {
      sagaState: 'STOCK_RESERVED',
      inventoryReservationId: reservationId
    });

    // Trigger payment
    const order = await this.orderRepository.findById(orderId);
    await this.eventBus.publish('order.ready_for_payment', {
      orderId,
      customerId: order.customerId,
      amount: order.totalAmount
    });
  }

  // Stock reservation failed → Cancel order
  async onStockReservationFailed(event) {
    const { orderId, reason } = event.data;
    
    await this.orderRepository.update(orderId, {
      status: 'CANCELLED',
      sagaState: 'FAILED',
      failureReason: reason
    });

    await this.eventBus.publish('order.cancelled', {
      orderId,
      reason: 'Insufficient stock'
    });
  }

  // Step 3: Payment success → Confirm order
  async onPaymentSuccess(event) {
    const { orderId, paymentId, amount } = event.data;
    
    await this.orderRepository.update(orderId, {
      status: 'CONFIRMED',
      sagaState: 'COMPLETED',
      paymentId
    });

    await this.eventBus.publish('order.confirmed', {
      orderId,
      paymentId
    });
  }

  // Payment failed → Trigger compensation
  async onPaymentFailed(event) {
    const { orderId, reason } = event.data;
    
    await this.orderRepository.update(orderId, {
      sagaState: 'COMPENSATING',
      paymentFailureReason: reason
    });

    // Trigger inventory release (compensation)
    const order = await this.orderRepository.findById(orderId);
    await this.eventBus.publish('order.payment_failed', {
      orderId,
      reservationId: order.inventoryReservationId
    });
  }

  // Compensation complete → Mark order as cancelled
  async onStockReleased(event) {
    const { orderId } = event.data;
    
    const order = await this.orderRepository.findById(orderId);
    if (order.sagaState === 'COMPENSATING') {
      await this.orderRepository.update(orderId, {
        status: 'CANCELLED',
        sagaState: 'COMPENSATED'
      });

      await this.eventBus.publish('order.cancelled', {
        orderId,
        reason: 'Payment failed'
      });
    }
  }
}

// inventory-service/src/handlers/sagaHandler.js
class InventorySagaHandler {
  constructor(eventBus, inventoryService) {
    this.eventBus = eventBus;
    this.inventoryService = inventoryService;
    this.setupHandlers();
  }

  setupHandlers() {
    this.eventBus.subscribe('order.created', this.onOrderCreated.bind(this));
    this.eventBus.subscribe('order.payment_failed', this.onPaymentFailed.bind(this));
  }

  async onOrderCreated(event) {
    const { orderId, items } = event.data;

    try {
      const reservationId = await this.inventoryService.reserveStock(orderId, items);
      
      await this.eventBus.publish('inventory.stock_reserved', {
        orderId,
        reservationId
      });
    } catch (error) {
      await this.eventBus.publish('inventory.stock_reservation_failed', {
        orderId,
        reason: error.message
      });
    }
  }

  async onPaymentFailed(event) {
    const { orderId, reservationId } = event.data;

    await this.inventoryService.releaseReservation(reservationId);
    
    await this.eventBus.publish('inventory.stock_released', {
      orderId,
      reservationId
    });
  }
}

Orchestration-Based Saga

A central orchestrator controls the saga flow. In the orchestrator pattern, the saga’s full sequence lives in one file. You can read startOrderSaga top-to-bottom and see every step, every compensation, in order. Compare that to the choreography version where the flow was spread across five event handlers in three services. The critical design choice here is storing compensations in a stack as each step succeeds. When something fails at step 4, you pop compensations off the stack and execute them in reverse order. This mirrors how a database rollback works — undo the most recent operation first. If you compensated in forward order instead, you would leave the system in bizarre intermediate states. Pay attention to the executeStep helper: it persists the current step and the compensation function together. If the orchestrator itself crashes mid-saga, a restart can read the state store, see “we were at step PROCESS_PAYMENT with compensations [cancelOrder, releaseInventory],” and either retry or compensate. Without this durable state, a crash in the orchestrator leaves sagas permanently stuck.
// saga-orchestrator/src/sagas/orderSaga.js
class OrderSagaOrchestrator {
  constructor(eventBus, stateStore, serviceClients) {
    this.eventBus = eventBus;
    this.stateStore = stateStore;
    this.services = serviceClients;
  }

  async startOrderSaga(orderData) {
    const sagaId = generateId();
    
    // Initialize saga state
    await this.stateStore.create(sagaId, {
      type: 'ORDER_SAGA',
      status: 'STARTED',
      step: 'CREATE_ORDER',
      data: orderData,
      compensations: []
    });

    try {
      // Step 1: Create Order
      const order = await this.executeStep(sagaId, 'CREATE_ORDER', async () => {
        return this.services.order.createOrder(orderData);
      }, async (order) => {
        await this.services.order.cancelOrder(order.id);
      });

      // Step 2: Reserve Inventory
      await this.executeStep(sagaId, 'RESERVE_INVENTORY', async () => {
        return this.services.inventory.reserve(order.id, order.items);
      }, async (reservationId) => {
        await this.services.inventory.release(reservationId);
      });

      // Step 3: Process Payment
      await this.executeStep(sagaId, 'PROCESS_PAYMENT', async () => {
        return this.services.payment.charge(order.id, order.totalAmount);
      }, async (paymentId) => {
        await this.services.payment.refund(paymentId);
      });

      // Step 4: Confirm Order
      await this.executeStep(sagaId, 'CONFIRM_ORDER', async () => {
        return this.services.order.confirm(order.id);
      }, null); // No compensation needed

      await this.stateStore.update(sagaId, {
        status: 'COMPLETED',
        completedAt: new Date()
      });

      return order;
    } catch (error) {
      await this.compensate(sagaId, error);
      throw error;
    }
  }

  async executeStep(sagaId, stepName, action, compensation) {
    await this.stateStore.update(sagaId, { step: stepName });

    const result = await action();

    if (compensation) {
      // Store compensation for potential rollback
      const saga = await this.stateStore.get(sagaId);
      saga.compensations.unshift({ step: stepName, compensation, result });
      await this.stateStore.update(sagaId, { compensations: saga.compensations });
    }

    return result;
  }

  async compensate(sagaId, error) {
    await this.stateStore.update(sagaId, {
      status: 'COMPENSATING',
      error: error.message
    });

    const saga = await this.stateStore.get(sagaId);

    for (const { step, compensation, result } of saga.compensations) {
      try {
        console.log(`Compensating step: ${step}`);
        await compensation(result);
      } catch (compError) {
        console.error(`Compensation failed for ${step}:`, compError);
        // Store for manual intervention
        await this.stateStore.update(sagaId, {
          status: 'COMPENSATION_FAILED',
          failedCompensation: step
        });
        return;
      }
    }

    await this.stateStore.update(sagaId, {
      status: 'COMPENSATED',
      compensatedAt: new Date()
    });
  }
}

Saga Caveats and Interview Deep-Dive

Saga pattern traps practitioners fall into:
  1. Compensating transactions are not idempotent. The most common bug: your saga retries a compensation after a transient network error, and the second call double-refunds the customer or releases stock twice. Compensations get executed more than once in production — plan for it from day one.
  2. Compensation order matters and is often wrong. If steps were A, B, C, compensations must run C-reverse, then B-reverse, then A-reverse. Running them in any other order can leave data in an invalid state (e.g., releasing inventory before cancelling the payment reservation leaves a “paid order with no items”).
  3. No timeout on saga execution. A saga step that never responds leaves the saga stuck in PENDING forever. Without a saga-level deadline plus automatic compensation on timeout, you accumulate zombie sagas that require manual cleanup.
  4. No visibility into in-flight sagas. When the payment step of 10,000 sagas is stuck waiting on a slow downstream, nobody knows until customer support starts getting calls. Build a saga dashboard that shows state distribution, average age per state, and stuck sagas. This is not optional in production.
Solutions and patterns:
  • Idempotency keys on every step and every compensation. Pass a stable saga_id + step_id on each call; the receiving service stores a short-lived record (Redis with a 7-day TTL works) and rejects duplicates with “already processed.” This turns at-least-once delivery into effectively exactly-once business effect.
  • Explicit saga state machine, persisted after every transition. Model the saga as rows in an orders_saga table with columns saga_state, last_event, retries, next_deadline. Every state transition is a DB UPDATE that must succeed before the next event is emitted. Crash in the middle? On restart, the recovery worker reads rows where next_deadline < NOW() and resumes.
  • Fail-forward vs fail-back. Not every failure needs to compensate. Some steps (sending a confirmation email, updating analytics) should be retried indefinitely rather than rolled back — the business cost of “we cancelled the order because the email didn’t send” is far worse than “we sent a duplicate email.” Classify each step as rollback-on-failure or retry-forever during design.
  • Correlation IDs propagated end-to-end. Every log line, event, and HTTP header carries the same saga_id. When a saga goes wrong at 3 AM, a single Kibana search finds all 17 related log lines in 5 seconds instead of 45 minutes.
Strong Answer Framework:
  1. Stop and diagnose before acting. First question: did the partial state happen because a service call failed, or because a service succeeded but the response was lost? These look identical but have different fixes. Check idempotency: if the downstream stored an idempotency_key, was it stored? If yes, the call succeeded — you just lost the response.
  2. Read the persisted saga state. Your saga row tells you exactly which step was last committed. If state is INVENTORY_RESERVED but PAYMENT_CHARGED never got written, you know payment is the uncertain step.
  3. Query downstreams by business key, not saga memory. Ask the Payment service “do you have a charge for order X?” using the order ID as the idempotency key. If yes, advance the saga. If no, retry the payment step.
  4. Trigger compensation only after confirming failure. Never compensate on timeout alone — query first. Compensating a step that actually succeeded doubles the damage.
  5. Emit the new saga state and continue. Either roll forward (retry the failed step) or roll backward (compensate prior steps). Log the decision and the evidence that led to it.
  6. Surface the incident. Any saga that requires recovery should page a human after N retries. Silent self-healing is good; silent self-failing is a disaster.
Real-World Example: In 2022, a well-known fintech had a saga that charged the card, failed to create the ledger entry due to a DB timeout, and triggered compensation. The compensation refunded the card. But the ledger entry had actually been written — the timeout was on the response. Net result: customer was debited, refunded, but also credited with the product. Post-incident fix: every saga step now queries the downstream by idempotency_key before deciding to compensate.Senior Follow-up Questions:
  1. “What if the saga state store itself is down when you try to recover?” The saga state must be persisted in the same transaction as the business operation — this is the Outbox pattern. If the state store is part of the primary DB (same Postgres as orders), it is either up together or down together, which simplifies reasoning. If the state store is external, you need an escape hatch: dump saga rows to a file log every N seconds, and recover from the file if DB is gone.
  2. “How do you test compensation code? It only runs on failure.” Build a chaos-testing harness that randomly aborts saga steps in a staging environment. Also: every compensation must be callable from a CLI tool that operators can invoke manually in an incident. Write integration tests that drive the saga to each state and then kill it, verifying compensation runs correctly. Teams that skip this discover compensation bugs during real outages — the worst possible time.
  3. “What’s your policy for compensations that fail?” Retry with exponential backoff for N attempts. After that, move the saga to a MANUAL_INTERVENTION state and alert. Compensation failures are rare but real (what if the refund API is down for 6 hours?) — the system needs a path where a human acknowledges and resolves. Pretending compensations always succeed is how teams end up with billions in stuck funds.
Common Wrong Answers:
  1. “I’d wrap the whole saga in a try/catch and roll back on exception.” This reveals the candidate does not understand distributed transactions. You cannot roll back a remote service with a try/catch — the database on the other side has already committed. This is exactly the problem that sagas exist to solve.
  2. “Use 2PC/XA transactions instead.” Shows lack of production experience. 2PC requires distributed locks held across network round-trips, kills throughput, and is not supported by most modern data stores (DynamoDB, Cassandra, most SaaS APIs). Nobody runs 2PC in production microservices for a reason.
Further Reading:
  • Chris Richardson, Microservices Patterns — chapters on Saga and Outbox patterns.
  • Caitie McCaffrey, “Distributed Sagas” talk (QCon) — the foundational modern treatment.
  • AWS Step Functions documentation — a production-grade saga orchestrator you can learn from.
Strong Answer Framework:
  1. Idempotency key per compensation call. The key is typically {saga_id}:{step_id}:compensate so replays are recognized.
  2. Store the effect, not just the intent. Before executing, write a record saying “compensation X is being applied.” If you crash and retry, the retry checks the record and either completes or skips based on what was already done.
  3. Use database constraints to prevent double-effect. A unique index on (saga_id, step_id, direction='compensate') in the audit table makes a second insert fail cleanly rather than running the compensation logic twice.
  4. Design compensations as state-setting, not delta-applying. Instead of “subtract 1 from inventory” (dangerous if replayed), write “set inventory reservation status = RELEASED” (safe, same result on replay).
  5. Acknowledge the edge case where compensation cannot be made idempotent. Some external APIs (email sends, SMS, legacy ERPs) are inherently non-idempotent. For these, accept that double-compensation may happen and either (a) add a human-approval gate or (b) log the double-effect explicitly so support can follow up.
Real-World Example: Stripe’s API enforces idempotency keys on every mutating call for exactly this reason. A retry with the same key returns the original response rather than charging twice. When you build compensations, mirror this contract: every compensation call accepts and honors an idempotency key.Senior Follow-up Questions:
  1. “How do you garbage-collect idempotency records?” TTL of 7-14 days is typical. Long enough to catch all realistic retries, short enough that the table does not grow unbounded. Use a separate partitioned table with time-based partitions and drop old partitions rather than deleting rows — much faster.
  2. “What if the downstream service doesn’t support idempotency keys?” Wrap the call in an idempotency proxy: a small stateful layer that you own, which stores the key and the response, and only calls the downstream on first attempt. This is the same pattern API gateways use for exactly-once semantics on non-idempotent backends.
  3. “Idempotency sounds expensive — every call now has a DB lookup. How much does it cost?” With proper indexing, the lookup is under 1ms on Postgres or Redis. For a service doing 10K RPS, that’s less than 1% of CPU. Real cost is developer discipline: getting idempotency right requires every endpoint author to remember to check the key. Use middleware or decorators to make it automatic.
Common Wrong Answers:
  1. “Our message broker guarantees exactly-once delivery, so we don’t need idempotency.” Dangerous misconception. Even Kafka’s “exactly-once semantics” only applies within Kafka — the moment you call an external service or DB, you are back to at-least-once. Idempotency is required regardless of broker.
  2. “We use SERIALIZABLE isolation to prevent double-apply.” Confuses local transaction isolation with distributed effects. SERIALIZABLE protects one database, not the remote service you’re compensating.
Further Reading:
  • Stripe Engineering Blog, “Designing robust and predictable APIs with idempotency.”
  • Martin Kleppmann, Designing Data-Intensive Applications, Chapter 8 (The Trouble with Distributed Systems).
  • AWS Well-Architected: Reliability Pillar, idempotency patterns.

Event Sourcing

Store state as a sequence of events instead of current state.

Why Event Sourcing Exists

Traditional storage keeps only the current state. An order row says “status: SHIPPED, total: $99.” But how did it get there? Was it ever CANCELLED and then re-created? Did the total change after a discount? The history is lost the moment you run UPDATE. Event sourcing flips this around: the events are the database. You never UPDATE a row — you append an event like ItemAdded, DiscountApplied, OrderShipped. The current state is a derived view computed by replaying events from the beginning. This is how your bank account works: there is no “balance” field in the ledger, just a list of deposits and withdrawals. The wins are real and hard-won: perfect audit trail (regulatory gold for finance and healthcare), time-travel debugging (replay to any point in history), and natural fit for domains with rich state transitions (insurance claims, legal workflows, shipping logistics). The costs are equally real. Event schema evolution is hard — events are immutable, so renaming a field means versioning the schema and writing upcasters that translate old events to the new shape. Querying is hard — “show me all orders over $100” requires replaying every event or maintaining a separate read model (see CQRS). Storage grows unbounded unless you snapshot periodically. And teams new to the pattern consistently underestimate the mental shift: you can no longer reason about “the current state of X” without thinking about which events produced it. Do not adopt event sourcing because it sounds cool. Adopt it when the business requires full history (finance, audit-heavy domains) or when you already have many state transitions that are painful to model in CRUD. For a simple CRUD app, it is dramatic overkill.
┌─────────────────────────────────────────────────────────────────────────────┐
│                          EVENT SOURCING                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRADITIONAL APPROACH:                                                       │
│  ─────────────────────                                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Order Table                                                         │    │
│  │  id: 123, status: SHIPPED, total: $99.99, shipped_at: 2024-01-15    │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│  ▲ We only see current state, history is lost                               │
│                                                                              │
│  EVENT SOURCING APPROACH:                                                    │
│  ────────────────────────                                                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Event Store for Order 123                                           │    │
│  │  ────────────────────────────────────────────────────────────────   │    │
│  │  [1] OrderCreated { items: [...], total: $99.99 }      2024-01-10   │    │
│  │  [2] PaymentReceived { paymentId: "pay_abc" }          2024-01-10   │    │
│  │  [3] OrderConfirmed { }                                 2024-01-10   │    │
│  │  [4] ItemsShipped { trackingNo: "1Z999" }              2024-01-15   │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│  ▲ Full history, can rebuild any past state                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation

The Event Store below enforces optimistic concurrency via the version number. Two concurrent users acting on the same aggregate will both read version=5, both try to append their changes expecting version=5, but only one of them wins — the other gets a ConcurrencyError and must re-read the events, re-apply their change, and retry. This is how you prevent the classic lost-update problem without using pessimistic locks that would kill throughput. The Order aggregate deserves careful attention. It has two ways to get into a state: create (factory for new orders) and fromEvents (replay past events to rebuild state). Every business method — addItem, submit, ship — does two things: validates the business rule, then applies an event. The event is both the record-of-truth and the mechanism that mutates in-memory state. This dual purpose is the whole point: your business logic becomes a function from (current state, command) to events, with no hidden side effects. If you did this differently — say, mutated state directly and then “also published an event” — you would risk divergence between state and events. Bug in the publisher? State changes, event missed. Replay from events? The rebuilt state does not match what you saw earlier. By making events the single source of truth, you eliminate that class of bug entirely.
// event-store/src/eventStore.js
class EventStore {
  constructor(database) {
    this.db = database;
  }

  async appendEvents(aggregateId, events, expectedVersion) {
    const client = await this.db.connect();
    
    try {
      await client.query('BEGIN');
      
      // Optimistic concurrency check
      const result = await client.query(
        'SELECT MAX(version) as current_version FROM events WHERE aggregate_id = $1',
        [aggregateId]
      );
      
      const currentVersion = result.rows[0]?.current_version || 0;
      
      if (expectedVersion !== currentVersion) {
        throw new ConcurrencyError(
          `Expected version ${expectedVersion}, but current is ${currentVersion}`
        );
      }
      
      // Append events
      let version = currentVersion;
      for (const event of events) {
        version++;
        await client.query(
          `INSERT INTO events (aggregate_id, aggregate_type, event_type, event_data, version, timestamp)
           VALUES ($1, $2, $3, $4, $5, $6)`,
          [
            aggregateId,
            event.aggregateType,
            event.eventType,
            JSON.stringify(event.data),
            version,
            new Date()
          ]
        );
      }
      
      await client.query('COMMIT');
      return version;
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }
  }

  async getEvents(aggregateId, fromVersion = 0) {
    const result = await this.db.query(
      `SELECT * FROM events 
       WHERE aggregate_id = $1 AND version > $2 
       ORDER BY version`,
      [aggregateId, fromVersion]
    );
    
    return result.rows.map(row => ({
      aggregateId: row.aggregate_id,
      aggregateType: row.aggregate_type,
      eventType: row.event_type,
      data: row.event_data,
      version: row.version,
      timestamp: row.timestamp
    }));
  }

  async getEventsByType(eventType, fromTimestamp, limit = 1000) {
    const result = await this.db.query(
      `SELECT * FROM events 
       WHERE event_type = $1 AND timestamp > $2 
       ORDER BY timestamp LIMIT $3`,
      [eventType, fromTimestamp, limit]
    );
    return result.rows;
  }
}

// order-service/src/aggregates/Order.js
class Order {
  constructor() {
    this.id = null;
    this.customerId = null;
    this.items = [];
    this.status = null;
    this.totalAmount = 0;
    this.version = 0;
    this.pendingEvents = [];
  }

  // Factory method to create new order
  static create(orderId, customerId, items) {
    const order = new Order();
    order.apply(new OrderCreatedEvent(orderId, customerId, items));
    return order;
  }

  // Reconstitute from events
  static fromEvents(events) {
    const order = new Order();
    for (const event of events) {
      order.applyEvent(event);
      order.version = event.version;
    }
    return order;
  }

  // Business methods that generate events
  addItem(productId, quantity, price) {
    if (this.status !== 'DRAFT') {
      throw new Error('Cannot add items to non-draft order');
    }
    this.apply(new ItemAddedEvent(this.id, productId, quantity, price));
  }

  submit() {
    if (this.items.length === 0) {
      throw new Error('Cannot submit empty order');
    }
    this.apply(new OrderSubmittedEvent(this.id));
  }

  confirmPayment(paymentId) {
    if (this.status !== 'PENDING_PAYMENT') {
      throw new Error('Order not pending payment');
    }
    this.apply(new PaymentConfirmedEvent(this.id, paymentId));
  }

  ship(trackingNumber) {
    if (this.status !== 'CONFIRMED') {
      throw new Error('Order not confirmed');
    }
    this.apply(new OrderShippedEvent(this.id, trackingNumber));
  }

  // Apply new event (adds to pending)
  apply(event) {
    this.applyEvent(event);
    this.pendingEvents.push(event);
  }

  // Apply event to state
  applyEvent(event) {
    switch (event.eventType) {
      case 'OrderCreated':
        this.id = event.data.orderId;
        this.customerId = event.data.customerId;
        this.items = event.data.items;
        this.totalAmount = this.calculateTotal();
        this.status = 'DRAFT';
        break;
        
      case 'ItemAdded':
        this.items.push({
          productId: event.data.productId,
          quantity: event.data.quantity,
          price: event.data.price
        });
        this.totalAmount = this.calculateTotal();
        break;
        
      case 'OrderSubmitted':
        this.status = 'PENDING_PAYMENT';
        break;
        
      case 'PaymentConfirmed':
        this.status = 'CONFIRMED';
        this.paymentId = event.data.paymentId;
        break;
        
      case 'OrderShipped':
        this.status = 'SHIPPED';
        this.trackingNumber = event.data.trackingNumber;
        break;
    }
  }

  calculateTotal() {
    return this.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  }

  getPendingEvents() {
    return this.pendingEvents;
  }

  clearPendingEvents() {
    this.pendingEvents = [];
  }
}

// order-service/src/repositories/OrderRepository.js
class OrderRepository {
  constructor(eventStore) {
    this.eventStore = eventStore;
  }

  async getById(orderId) {
    const events = await this.eventStore.getEvents(orderId);
    
    if (events.length === 0) {
      return null;
    }
    
    return Order.fromEvents(events);
  }

  async save(order) {
    const pendingEvents = order.getPendingEvents();
    
    if (pendingEvents.length === 0) {
      return;
    }
    
    const newVersion = await this.eventStore.appendEvents(
      order.id,
      pendingEvents,
      order.version
    );
    
    order.version = newVersion;
    order.clearPendingEvents();
    
    // Publish events for other services
    for (const event of pendingEvents) {
      await this.eventBus.publish(event.eventType, event);
    }
  }
}

Snapshotting for Performance

If your event stream grows to thousands of events per aggregate, loading it becomes painfully slow: every read replays the entire history. Snapshots are checkpoints — periodic “here is the aggregate state at version N” records. When loading, you grab the latest snapshot, then replay only the events after it. The tradeoff: snapshots introduce a new failure mode. If your aggregate logic changes (bug fix, new business rule), an old snapshot reflects the old behavior. You must either invalidate old snapshots on code changes or be certain snapshots capture only data, never derived calculations that might evolve. A common rule: snapshot every 100 events, keep only the latest, and be prepared to rebuild snapshots from scratch when aggregate logic changes.
// Snapshot to avoid replaying all events
class SnapshotStore {
  constructor(database) {
    this.db = database;
    this.snapshotFrequency = 100; // Snapshot every 100 events
  }

  async saveSnapshot(aggregateId, snapshot, version) {
    await this.db.query(
      `INSERT INTO snapshots (aggregate_id, snapshot_data, version, created_at)
       VALUES ($1, $2, $3, $4)
       ON CONFLICT (aggregate_id) DO UPDATE SET snapshot_data = $2, version = $3`,
      [aggregateId, JSON.stringify(snapshot), version, new Date()]
    );
  }

  async getSnapshot(aggregateId) {
    const result = await this.db.query(
      'SELECT * FROM snapshots WHERE aggregate_id = $1',
      [aggregateId]
    );
    return result.rows[0];
  }
}

class OrderRepositoryWithSnapshots {
  async getById(orderId) {
    // Try to load from snapshot
    const snapshot = await this.snapshotStore.getSnapshot(orderId);
    
    let order;
    let fromVersion = 0;
    
    if (snapshot) {
      order = Order.fromSnapshot(snapshot.snapshot_data);
      order.version = snapshot.version;
      fromVersion = snapshot.version;
    } else {
      order = new Order();
    }
    
    // Load events since snapshot
    const events = await this.eventStore.getEvents(orderId, fromVersion);
    
    for (const event of events) {
      order.applyEvent(event);
      order.version = event.version;
    }
    
    return order;
  }

  async save(order) {
    await super.save(order);
    
    // Create snapshot if needed
    if (order.version % this.snapshotStore.snapshotFrequency === 0) {
      await this.snapshotStore.saveSnapshot(
        order.id,
        order.toSnapshot(),
        order.version
      );
    }
  }
}

Event Sourcing Caveats and Interview Deep-Dive

Event sourcing traps practitioners fall into:
  1. Event store unbounded growth. Teams adopt event sourcing without a retention or archival strategy. Two years later, the event store is 4 TB, rebuilding an aggregate takes 45 seconds, and every query hits storage cost pain. You need snapshots (covered above) plus cold-tier archival (move events older than N days to S3 Glacier or equivalent).
  2. Breaking schema changes to events. Events are immutable — the ones on disk stay forever in the shape they were written. A junior engineer “fixes” the OrderCreated event schema by renaming total to total_amount and the next replay of old data crashes because the new aggregate expects a field that does not exist in 2022’s events. You need upcasters: version-aware transforms that translate old event shapes into the current model at replay time.
  3. Using event sourcing for the wrong domain. CRUD apps do not benefit from event sourcing. If your “events” are all EntityUpdated with no real business semantics, you have built a glorified audit log with 10x the complexity. Event sourcing earns its keep only when you have rich state transitions the business actually cares about (legal workflows, finance, insurance claims, shipping).
  4. Projections and aggregates drifting out of sync. The read model (projection) is eventually consistent with the event stream. If a projection bug silently corrupts a read row, nobody notices until a user complains months later. You need projection checksums and periodic “rebuild and compare” jobs that verify the read model still matches a fresh replay.
Solutions and patterns:
  • Event versioning from day one. Every event carries a version field. Never mutate an existing event type — create OrderCreatedV2 with the new schema and write upcasters that read V1 events and produce V2 in memory. Greg Young’s “Versioning in an Event Sourced System” is the definitive reference.
  • Snapshot strategy per aggregate type. Snapshot frequency depends on event rate. A shopping cart with an average 8 events per aggregate does not need snapshots. An insurance policy with 5,000 events over its lifetime must snapshot every 100-500 events.
  • Replayable projections with checkpointing. Projections store the last-processed event offset so they can resume after a crash. Rebuilding a projection from scratch must be a routine operation, not a panic button — practice it monthly.
  • Subscribe-and-catch-up on new read models. When adding a new projection, the consumer starts at offset 0 and reads the full history. This is the magic of event sourcing: new views can be built from historical data without needing “recent snapshot” plumbing.
Strong Answer Framework:
  1. Measure first, cut second. Which aggregates dominate the storage? Which events are rarely replayed? Put numbers on it before touching anything.
  2. Introduce snapshotting if not already present. Snapshots cap replay cost. If you have 1M events per aggregate but a snapshot every 1K events, you replay at most 1K events to rebuild state. That is 1000x faster.
  3. Tier storage by age. Events older than 90 days move to cheap object storage (S3 Glacier, GCS Coldline). Online queries go against the hot tier. Regulatory or audit queries go against the cold tier (slower, fine).
  4. Partition by aggregate ID. Events for aggregate A never need to touch aggregate B’s partition. Sharding by aggregate key reduces the per-query working set dramatically.
  5. Consider projections as the primary read path. Direct event-store queries should be rare — most reads should hit denormalized projections. If you are hitting the event store for user-facing reads, that is the problem, not the size.
  6. Archive dead aggregates. Orders completed 5 years ago do not need online access. Migrate them out with a verified archive and restore process.
Real-World Example: In 2020, a large European bank using EventStoreDB hit 40 TB of events across their account-processing domain. Their fix: (a) snapshots every 500 events, (b) monthly rollover to S3 for events older than 12 months, and (c) a separate query path that reads from projections for anything user-facing. Replay times dropped from minutes to sub-second.Senior Follow-up Questions:
  1. “How do you handle snapshot corruption?” Every snapshot stores the event version it represents. On load, you rebuild from the snapshot, then replay events after that version. If the rebuilt state from the snapshot does not match a fresh replay from event 0 (checked periodically), the snapshot is corrupt — delete it and regenerate. Snapshots are cache, not truth. The event stream is truth.
  2. “What if you need to query events by business attribute (e.g., find all orders with total over $1K)?” Do not query the event store for this — it is not indexed by business fields and scanning is O(N). Build a projection indexed by the attribute you need. The event store answers “give me events for this aggregate”; projections answer everything else.
  3. “How do you delete personal data for GDPR compliance if events are immutable?” Two options. Option A: crypto-shredding — each user’s events are encrypted with a user-specific key stored separately. To “delete” the user, destroy the key; events become unreadable noise. Option B: a pseudonymization projection that replaces personal fields with hashes during replay. Both keep the event store immutable while making the data effectively unavailable.
Common Wrong Answers:
  1. “Delete old events.” Event sourcing’s core invariant is immutability. Deleting events destroys the audit trail, breaks replay, and means projections can never be rebuilt. This answer disqualifies the candidate for any serious event-sourcing role.
  2. “Just add more storage.” Misses the point. Storage is cheap, query latency is not. The problem is working-set size, not disk capacity.
Further Reading:
  • Greg Young, “Versioning in an Event Sourced System” (free PDF) — the canonical reference.
  • EventStoreDB documentation, projections chapter.
  • “Event Sourcing and CQRS with Axon Framework” — deep dive on snapshotting patterns.
Strong Answer Framework:
  1. Never mutate existing event types. Old events on disk are frozen. If you need a new shape, introduce EventTypeV2.
  2. Write upcasters. A function that takes a V1 event and returns a V2 event in memory. The disk stays V1 forever; the aggregate always sees V2. This is the single most important pattern in event schema evolution.
  3. Treat schema changes as additive when possible. New optional fields with defaults are backward-compatible. Renames, removals, and type changes require upcasters.
  4. Version the event envelope, not just the payload. The envelope carries event_type, event_version, timestamp, correlation_id. Upcasters key off event_version to decide how to translate.
  5. Test upcasters against real historical data. A synthetic test is not enough. Replay a recent production event stream through the new upcaster chain before deploying.
Real-World Example: Adobe’s Document Cloud event-sourced system processes billions of events across years of PDF lifecycle data. Their schema has evolved roughly 40 times since launch. Their rule: every event type has an explicit migration chain (V1 -> V2 -> V3), stored as code, runs at every load. No event ever gets “rewritten on disk.”Senior Follow-up Questions:
  1. “What if the upcaster chain is slow? Every replay runs all upcasters.” True, but the cost is usually small (tens of nanoseconds per event). If measurements show otherwise, collapse the chain: at snapshot time, store the snapshot in the current format, so only events since the snapshot need upcasting. This amortizes the upcaster cost.
  2. “How do you handle event deletion requirements (GDPR, regulation)?” Covered above — crypto-shredding or pseudonymization projections. The key insight: event sourcing does not conflict with GDPR, it just requires intentional design. Most teams bolt this on late and regret it.
  3. “What if you need to merge two event streams (e.g., after an account merge)?” Introduce a StreamMerged event in the target stream that references the source. Do not copy events — copying loses identity and breaks idempotency. The target projection reads both streams and reconciles, or you build a merged projection from both sources.
Common Wrong Answers:
  1. “I’d run a migration script to rewrite old events.” This breaks the immutability contract and makes audit impossible. A regulator or auditor asking “what did the system record on this date?” would get a different answer depending on when you ask, which is exactly the problem event sourcing is supposed to solve.
  2. “Use schema registry like Avro/Protobuf.” Schema registries help with wire format but do not solve semantic evolution. Renaming customer_name to buyer_name requires reinterpreting the field, not just reformatting it. Upcasters are what do that.
Further Reading:
  • Greg Young, “Versioning in an Event Sourced System.”
  • Martin Kleppmann, Designing Data-Intensive Applications, Chapter 4 (Encoding and Evolution).
  • “Practical Event Sourcing” blog series from the EventStore team.

CQRS Pattern

Command Query Responsibility Segregation - separate read and write models.

Why CQRS Exists

The classic relational model assumes reads and writes share a schema. But in practice, they have opposite requirements: writes need normalization (no data duplication, referential integrity), while reads want denormalized, pre-joined views optimized for specific queries. Trying to serve both from one schema means either slow reads (expensive JOINs on every query) or polluted writes (denormalized write paths that invite data drift). CQRS splits them. Commands go to the write model (typically normalized, fully consistent, source of truth). Queries hit a read model (denormalized, often in a different database like Elasticsearch or MongoDB, updated by subscribing to write-side events). Reads and writes scale independently. You can have five read replicas feeding a search index while the write side stays on a single strongly-consistent Postgres instance. What breaks if you skip this at scale? Think Amazon’s product page. Joining products, inventory, pricing, reviews, and recommendations on every page load would grind the database into dust. The search team needs aggregated facets. The reporting team needs OLAP-style rollups. Without separation, every team is fighting over the same tables, adding indexes that help their queries and hurt everyone else’s writes. The tradeoff is significant: eventual consistency between write and read. A user updates their profile, hits the query API, and sees stale data for a few hundred milliseconds until the read model catches up. This feels wrong to users who expect read-your-writes consistency. You handle it by (a) accepting the staleness for queries where it is fine (search, listings), and (b) reading from the write side for queries that must see fresh data (your own profile, the order you just placed). Do not adopt CQRS for simple CRUD apps — the overhead of maintaining two models and a projection pipeline is wasted when your read and write schemas are essentially identical.
┌─────────────────────────────────────────────────────────────────────────────┐
│                             CQRS PATTERN                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                            ┌─────────────┐                                  │
│                            │    CLIENT   │                                  │
│                            └──────┬──────┘                                  │
│                                   │                                          │
│                    ┌──────────────┴──────────────┐                          │
│                    │                             │                          │
│            Commands (Write)               Queries (Read)                    │
│                    │                             │                          │
│                    ▼                             ▼                          │
│            ┌─────────────┐               ┌─────────────┐                    │
│            │   COMMAND   │               │    QUERY    │                    │
│            │   HANDLER   │               │   HANDLER   │                    │
│            └──────┬──────┘               └──────┬──────┘                    │
│                   │                             │                           │
│                   ▼                             ▼                           │
│            ┌─────────────┐               ┌─────────────┐                    │
│            │    WRITE    │──── Events ──▶│    READ     │                    │
│            │    MODEL    │               │    MODEL    │                    │
│            │ (Normalized)│               │(Denormalized)│                   │
│            └─────────────┘               └─────────────┘                    │
│                   │                             │                           │
│                   ▼                             ▼                           │
│            ┌─────────────┐               ┌─────────────┐                    │
│            │ PostgreSQL  │               │ Elasticsearch│                   │
│            │  (Source)   │               │   (Cache)    │                   │
│            └─────────────┘               └─────────────┘                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation

The implementation below has three distinct pieces. Command handlers load aggregates, invoke domain logic, and persist results. Read model updaters subscribe to events and maintain the denormalized view (Elasticsearch here, but could be Redis, MongoDB, or a denormalized Postgres table). Query handlers hit only the read model — they never touch the write side. Notice what the read model does differently from the write model. When OrderCreated arrives, it doesn’t just copy the event data — it also calls getCustomerName() to enrich the record with data from another service, pre-computes itemCount and totalAmount, and indexes by multiple fields for fast search. This is the whole value of CQRS: the read model is a view purpose-built for how users query the data, not constrained by normalization rules. The alternative — trying to answer “show me all shipped orders in the last 30 days with their customer names and item counts” from the normalized write side — requires JOINs across multiple tables (or service calls) on every query. Fine at 100 QPS, fatal at 100,000.
// Write side - Commands
class OrderCommandHandler {
  constructor(orderRepository, eventBus) {
    this.orderRepository = orderRepository;
    this.eventBus = eventBus;
  }

  async handleCreateOrder(command) {
    const order = Order.create(
      command.orderId,
      command.customerId,
      command.items
    );
    
    await this.orderRepository.save(order);
    
    // Events are published to update read models
    return order.id;
  }

  async handleSubmitOrder(command) {
    const order = await this.orderRepository.getById(command.orderId);
    
    if (!order) {
      throw new Error('Order not found');
    }
    
    order.submit();
    await this.orderRepository.save(order);
  }
}

// Read side - Query models
class OrderReadModel {
  constructor(elasticsearch) {
    this.es = elasticsearch;
  }

  // Update handlers - called when events occur
  async onOrderCreated(event) {
    await this.es.index({
      index: 'orders',
      id: event.data.orderId,
      body: {
        orderId: event.data.orderId,
        customerId: event.data.customerId,
        customerName: await this.getCustomerName(event.data.customerId),
        items: event.data.items,
        itemCount: event.data.items.length,
        totalAmount: this.calculateTotal(event.data.items),
        status: 'DRAFT',
        createdAt: event.timestamp,
        updatedAt: event.timestamp
      }
    });
  }

  async onOrderSubmitted(event) {
    await this.es.update({
      index: 'orders',
      id: event.data.orderId,
      body: {
        doc: {
          status: 'PENDING_PAYMENT',
          submittedAt: event.timestamp,
          updatedAt: event.timestamp
        }
      }
    });
  }

  async onOrderShipped(event) {
    await this.es.update({
      index: 'orders',
      id: event.data.orderId,
      body: {
        doc: {
          status: 'SHIPPED',
          trackingNumber: event.data.trackingNumber,
          shippedAt: event.timestamp,
          updatedAt: event.timestamp
        }
      }
    });
  }
}

// Query handlers
class OrderQueryHandler {
  constructor(elasticsearch) {
    this.es = elasticsearch;
  }

  async getOrderById(orderId) {
    const result = await this.es.get({
      index: 'orders',
      id: orderId
    });
    return result._source;
  }

  async searchOrders(criteria) {
    const query = {
      bool: {
        must: []
      }
    };

    if (criteria.customerId) {
      query.bool.must.push({ term: { customerId: criteria.customerId } });
    }
    
    if (criteria.status) {
      query.bool.must.push({ term: { status: criteria.status } });
    }
    
    if (criteria.dateRange) {
      query.bool.must.push({
        range: {
          createdAt: {
            gte: criteria.dateRange.from,
            lte: criteria.dateRange.to
          }
        }
      });
    }

    const result = await this.es.search({
      index: 'orders',
      body: {
        query,
        sort: [{ createdAt: 'desc' }],
        from: criteria.offset || 0,
        size: criteria.limit || 20
      }
    });

    return {
      orders: result.hits.hits.map(h => h._source),
      total: result.hits.total.value
    };
  }

  async getOrderStatistics(customerId) {
    const result = await this.es.search({
      index: 'orders',
      body: {
        query: {
          term: { customerId }
        },
        aggs: {
          total_spent: { sum: { field: 'totalAmount' } },
          order_count: { value_count: { field: 'orderId' } },
          avg_order_value: { avg: { field: 'totalAmount' } },
          by_status: {
            terms: { field: 'status' }
          }
        },
        size: 0
      }
    });

    return {
      totalSpent: result.aggregations.total_spent.value,
      orderCount: result.aggregations.order_count.value,
      avgOrderValue: result.aggregations.avg_order_value.value,
      byStatus: result.aggregations.by_status.buckets
    };
  }
}

CQRS Caveats and Interview Deep-Dive

CQRS traps practitioners fall into:
  1. Read model staleness surprising users. A user edits their profile, navigates to their profile page, and sees the old name. They think the save failed and click Save again. You now have duplicate events, duplicate emails, and a support ticket. Read-model lag of even 200 ms can destroy user trust in the app.
  2. Projection lag invisible until production. Developers test in an environment where the projection updates in under 5 ms because the load is tiny. In production with real traffic, the projection queue backs up to 30 seconds during bursts. Nobody sees this until customers start complaining.
  3. Projection crashes that require manual rebuild. An unhandled exception in the projection handler poisons the queue. Every retry fails on the same event. The projection falls further and further behind. Without a poison-pill queue and dead-letter handling, a single bad event can wedge the entire read side.
  4. Two-way sync between write and read models. Teams sometimes “helpfully” update the read model directly during a write transaction to reduce lag. This defeats the entire point of CQRS — now the read model is no longer derived purely from events, and a replay from events would produce a different result. Always rebuild the read model from the event stream, never from direct writes.
Solutions and patterns:
  • Read-your-writes consistency where it matters. For queries like “my own profile” or “the order I just placed,” read from the write side directly or include the expected event offset in the query. The client holds the offset from the last write and the query waits (up to a short bound) for the projection to catch up.
  • Projection lag monitoring with SLO alarms. Every projection exposes a “lag in seconds” metric. SLO is “p99 lag under 5 seconds.” Page on-call when breached. This single metric catches 80% of projection problems before users see them.
  • Idempotent projection handlers. Event replay must produce the same result whether run once or 100 times. Use INSERT ... ON CONFLICT DO UPDATE semantics or version-based optimistic writes. This means a handler crash followed by retry cannot corrupt the read model.
  • Separate hot and cold projections. Not every read path needs the same projection. Search uses Elasticsearch, user profile lookup uses Redis, analytics uses BigQuery. Each projection can run at its own freshness budget.
  • Communicate eventual consistency in the UI. If a save is in flight, show a “syncing…” indicator. This manages user expectation and prevents the “save twice” problem above.
Strong Answer Framework:
  1. Quantify the lag. Measure projection lag p50 and p99. If p99 is 2 seconds, users will often see stale data. If it is 50 ms, mostly they will not.
  2. Route the specific query to the write side. “My own profile” reads directly from the write model, bypassing the projection. Other users’ profiles still read from the projection (where staleness is acceptable). This is the “read-your-writes” pattern.
  3. Or: return the updated row from the write endpoint. When the profile-update API returns, include the new profile state in the response. The client caches it and displays it until the next fetch. No projection round-trip needed.
  4. Or: token-based consistency. The write returns a version token. Subsequent reads pass the token; the query layer waits (up to a short bound, say 500 ms) for the projection to reach that version before responding. If the projection is still behind after the bound, read from the write side as fallback.
  5. Communicate in the UI. If none of the above fully eliminate lag, show a “saving…” indicator until the write is confirmed. This is user-experience work, not just backend work.
Real-World Example: Facebook encountered this in 2013 with their news feed projections. Their fix: every post you create is added to your own feed synchronously from the write-side, while feeds of other users are updated asynchronously. When you refresh, you see your own post immediately. Friends see it within a second or two. This is the read-your-writes pattern at massive scale.Senior Follow-up Questions:
  1. “What if the write side cannot serve the read efficiently? It is normalized and the read needs denormalized data.” Cache the denormalized view in-process on the write side, invalidated on every write. Or materialize a “hot” projection that is synchronous within the same transaction as the write. The tradeoff: synchronous projection adds write latency but eliminates read lag. Pick based on which matters more for this query.
  2. “How do you handle projection rebuilds without downtime?” Build the new projection alongside the old, with both consuming events. Dual-write from events, single-read from the old projection. Once the new projection catches up and passes integrity checks, flip the read path to the new projection. Delete the old projection after a safety window. This is the same pattern as online schema migrations.
  3. “What is the correct way to version a read model schema?” Include the schema version in the projection table name or an index name (users_profile_v3). Projections are cheap to rebuild, so create a new projection for a new schema, dual-write for a migration window, then switch readers. This avoids in-place ALTERs on multi-terabyte projections.
Common Wrong Answers:
  1. “Just update the read model synchronously inside the write transaction.” Defeats the scalability benefits of CQRS and couples the write path to the read storage’s availability. If the read store is down, writes fail. You might as well not split them.
  2. “Poll the API from the client until the data appears.” Wastes bandwidth, puts load on the projection, and creates a race with user interactions. Token-based consistency or read-your-writes are better.
Further Reading:
  • Martin Kleppmann, Designing Data-Intensive Applications, Chapter 5 (Replication).
  • Pat Helland, “Immutability Changes Everything” (paper).
  • Netflix Tech Blog posts on read-model projections and freshness SLOs.
Strong Answer Framework:
  1. Stop the bleeding first. Pause the projection so it is not retrying the poison event 10 times per second and flooding logs.
  2. Route the offending event to a dead-letter queue. The projection skips it and advances to the next event. The DLQ preserves the event for investigation without blocking the stream.
  3. Alert the on-call. A DLQ’d event means either a projection bug or an unexpected event shape. Both require human attention.
  4. Investigate and fix. Either patch the projection handler to handle the event, or produce a correction event that supersedes the bad one.
  5. Replay from the DLQ. Once the fix is deployed, re-queue the DLQ’d events and let the projection process them.
  6. Verify projection integrity. After catch-up, run a reconciliation job: rebuild a sample of projection rows from the event stream and compare to the live projection. Any discrepancy is a latent bug.
Real-World Example: Uber’s dispatch system uses Kafka projections heavily. In a 2018 incident, a malformed driver-location event (negative speed value) crashed their ETA projection. Their recovery: DLQ the event, patch the handler to validate and clamp values, replay the DLQ, and add input validation on all event-producing services. Post-mortem: added pre-deployment schema compatibility tests to catch this class of bug before it reaches production.Senior Follow-up Questions:
  1. “How do you design a handler so it does not crash on unexpected data?” Treat events like untrusted input. Validate with Pydantic / JSON Schema at ingress. If validation fails, emit a metric and skip — do not throw. Log the full event payload for forensics. The projection should be resilient by default; crashing is a last resort.
  2. “What if the projection falls behind because of a traffic spike, not a bug?” Separate problem, separate fix. Scale out the projection workers (partition by aggregate key so ordering is preserved within a partition). Monitor consumer lag per partition. If a single partition is hot, consider repartitioning with a better key.
  3. “How do you test projection recovery?” Game-day exercises. Once a quarter, intentionally break a projection in staging (corrupt an event, kill the worker, block the DB write) and time the recovery. Teams that do not practice this are surprised by how long it takes in production — when a projection 10 minutes behind snowballs into customer-facing incidents.
Common Wrong Answers:
  1. “Delete the bad event from Kafka.” Kafka events are immutable once committed. Even if you could delete, you’d break replay for any future projection or any existing consumer. DLQ is the right answer.
  2. “Rebuild the entire projection from scratch.” Works but takes hours to days for a multi-billion-event stream. DLQ + patch + replay is faster and less disruptive. Full rebuild is the last resort, not the first.
Further Reading:
  • Confluent Blog, “Handling Bad Data in Kafka with Dead Letter Queues.”
  • Jay Kreps, “The Log: What every software engineer should know about real-time data’s unifying abstraction.”
  • Uber Engineering Blog, post-mortems on streaming pipeline incidents.

Interview Questions

Answer:Options:
  1. Saga Pattern: Series of local transactions with compensating actions
  2. Two-Phase Commit (2PC): Coordinator-based, but slow and blocking
  3. Outbox Pattern: Store events in same transaction as data changes
Saga is preferred because:
  • Non-blocking
  • Each service has autonomy
  • Scales better
  • Handles failures gracefully
Trade-off: Eventual consistency, need to handle compensation
Answer:Use when:
  • Need full audit trail
  • Business requires “what happened” history
  • Complex domain with many state transitions
  • Need to rebuild state at any point in time
  • Event-driven architecture already in place
Avoid when:
  • Simple CRUD operations
  • Low complexity domain
  • Team unfamiliar with pattern
  • Query performance is critical (use with CQRS)
Challenges: Event versioning, storage growth, eventual consistency
Answer:Benefits:
  • Independent Scaling: Scale reads (often 90%+) separately from writes
  • Optimized Models: Read model denormalized for query performance
  • Different Databases: Use best database for each (PostgreSQL for writes, Elasticsearch for reads)
  • Caching: Read model can be cached aggressively
Trade-offs:
  • Eventual consistency between read/write
  • More complex architecture
  • Need to handle stale reads
Answer:Problem: Publishing events after DB commit can fail, causing inconsistency.Solution:
  1. In same transaction: save data AND write event to outbox table
  2. Separate process reads outbox and publishes to message broker
  3. Mark event as published after successful publish
BEGIN TRANSACTION;
  INSERT INTO orders (...);
  INSERT INTO outbox (event_type, payload) VALUES (...);
COMMIT;
Benefits: Guaranteed at-least-once delivery, atomic with business logic

Summary

Key Takeaways

  • Each service owns its data
  • Use Saga for distributed transactions
  • Event Sourcing for audit/history needs
  • CQRS separates read/write concerns
  • Accept eventual consistency

Next Steps

In the next chapter, we’ll cover Resilience Patterns - circuit breakers, retries, and bulkheads.

Interview Deep-Dive

Strong Answer:This is the classic saga compensation problem. The failure scenario exposes a gap in the saga’s compensation logic — specifically, the payment was committed before inventory was confirmed, and there is no automatic rollback.The correct design uses an orchestrated saga with explicit compensation steps. When inventory reservation fails, the orchestrator should immediately trigger a payment refund as the compensation action. The order status should be set to “Failed - Refund Processing” and the customer should receive a notification explaining what happened.But the deeper design fix is to change the saga sequence. In most e-commerce systems, I would reserve inventory before charging payment, not after. The reason: inventory reservation is easily reversible (just release the hold), while payment charges involve real money, potential fees, and customer anxiety. So the correct flow is: Create Order (pending) -> Reserve Inventory -> Charge Payment -> Confirm Order. If inventory fails, you just cancel the order — no money was touched. If payment fails after inventory is reserved, you release the inventory hold.For the outbox pattern implementation: the Order Service writes both the order record and the saga event to the same database in one transaction. A separate poller (or CDC via Debezium) reads the outbox and publishes events to Kafka. This guarantees that if the order is created, the saga event is also published — no split-brain between database state and message queue state.Follow-up: “How do you handle the case where the refund itself fails?”You need a reconciliation process. The refund attempt goes into a retry queue with exponential backoff. If it fails 3 times, it gets flagged for manual review by the finance team. I also run a daily reconciliation job that compares successful payments against confirmed orders — any payment without a matching confirmed order triggers an automatic refund attempt. This belt-and-suspenders approach catches edge cases that the real-time saga misses.
Strong Answer:The problem with “write to DB then publish event” is that these are two separate operations that can partially fail. If the database write succeeds but the message publish fails (Kafka is temporarily down), you have committed state without notifying other services. If you publish first and the database write fails, other services act on an event for data that does not exist. There is no way to make a database transaction and a message queue publish atomic — they are different systems.The Outbox Pattern solves this by storing the event in the same database as the business data, in the same transaction. You have an “outbox” table where you insert a row with the event payload alongside your business write. Since both writes are in the same ACID transaction, they either both succeed or both fail. A separate process (relay/poller or CDC tool like Debezium) reads the outbox table and publishes events to the message broker. Once published, the outbox row is marked as sent or deleted.The implementation has two approaches. Polling publisher: a cron job that runs every 100ms, queries for unsent outbox rows, publishes them, and marks them sent. Simple but adds latency equal to the poll interval. CDC-based: Debezium reads the PostgreSQL WAL (write-ahead log) and streams changes from the outbox table directly to Kafka in near real-time. More complex to set up but sub-second latency and no polling overhead.The key guarantee: events are published at least once. You must design consumers to be idempotent because the relay might crash after publishing but before marking the outbox row as sent, resulting in a re-publish on recovery.Follow-up: “At what scale does the outbox table become a bottleneck, and what do you do about it?”The outbox table becomes a bottleneck when you are writing thousands of events per second, because the poller queries compete with business writes. At that scale, I switch from polling to CDC (Debezium reading the WAL), which has zero impact on the business table. I also partition the outbox table by date and automatically drop old partitions. For extreme throughput, I use a dedicated outbox database separate from the business database, connected via the same transaction using a “transactional outbox” library that coordinates distributed commits — though at that point, you might question whether the outbox pattern is still the right tool versus a native event sourcing approach where events are the primary store.
Strong Answer:CQRS (Command Query Responsibility Segregation) separates the write model from the read model. I use it when the read and write patterns are fundamentally different: the write side needs strong consistency and validation, while the read side needs denormalized, pre-computed views optimized for specific queries.The classic example: an e-commerce product page. The write model is normalized — product details, inventory counts, pricing rules, and reviews are in separate tables (or services). The read model is a denormalized document in Elasticsearch or MongoDB that contains everything a product page needs in one query: product info, current price, stock status, average rating, and top reviews. Without CQRS, you either do expensive JOINs on every read or maintain this denormalization ad-hoc.The operational challenges teams underestimate:
  • Eventual consistency lag. When a seller updates a product price, there is a delay (seconds to minutes) before the read model reflects the change. Users might see stale prices. You need to decide per use case whether this is acceptable. For product pages, a few seconds of staleness is fine. For inventory counts near zero, you might need a synchronous check before allowing “Add to Cart.”
  • Projection rebuild complexity. When you change the read model schema, you need to rebuild the projection from the event history. For a table with 100 million rows, this can take hours. You need a strategy for running the old and new projections in parallel during migration.
  • Monitoring for drift. How do you know the read model is correct? I run a daily consistency checker that samples records from the write side and verifies they match the read side. Any drift triggers an alert and a targeted re-projection.
I would not use CQRS for simple CRUD applications where the read and write models are essentially identical. The overhead of maintaining two models, a projection mechanism, and consistency monitoring is not justified when a single database with good indexes serves both needs.Follow-up: “How does CQRS interact with the Saga pattern when you need to query data mid-saga?”This is a real trap. During a saga, the write side might be updated but the read side projection has not caught up yet. If the next saga step queries the read model, it sees stale data and makes wrong decisions. The solution: saga steps should query the write model (command side) directly, not the read model. The read model is for external queries (API, UI), not for internal workflow decisions. This means the saga orchestrator needs direct access to the authoritative data source, which is one more reason I prefer orchestrated sagas over choreographed ones — the orchestrator can maintain its own state without depending on eventually-consistent read models.