Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Distributed Messaging & Event Systems

Messaging systems are the nervous system of distributed architectures, enabling asynchronous communication, decoupling, and event-driven processing. If databases are the “memory” of your system, messaging is the “nervous system” — it carries signals between components without requiring the sender to wait for the receiver to process them, just as your brain can send a nerve impulse to your hand without pausing all other activity until the hand responds.
Module Duration: 14-18 hours
Key Topics: Kafka, RabbitMQ, Event Sourcing, CQRS, Exactly-Once Semantics, Stream Processing
Interview Focus: Kafka internals, delivery guarantees, event-driven architecture

Why Messaging Systems?

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SYNCHRONOUS vs ASYNCHRONOUS                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SYNCHRONOUS (RPC):                                                         │
│  ──────────────────                                                         │
│  Client ─────► Service A ─────► Service B ─────► Service C                 │
│     │◄─────────────────────────────────────────────────────│               │
│     │                                                                       │
│  Problems:                                                                  │
│  • Caller blocked until all complete                                       │
│  • One service down → entire chain fails                                   │
│  • Tight coupling between services                                         │
│  • Hard to add/remove services                                             │
│                                                                              │
│  ASYNCHRONOUS (Messaging):                                                  │
│  ─────────────────────────                                                  │
│  Client ─────► Message Queue                                               │
│     │◄───────    │                                                         │
│  (returns)       ├──────► Service A                                        │
│                  ├──────► Service B                                        │
│                  └──────► Service C                                        │
│                                                                              │
│  Benefits:                                                                  │
│  • Caller returns immediately                                              │
│  • Services process independently                                          │
│  • Loose coupling, easy to scale                                           │
│  • Built-in retry, replay capability                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Message Queue Patterns

Point-to-Point (Work Queue)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    POINT-TO-POINT (WORK QUEUE)                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Each message consumed by exactly one consumer.                             │
│                                                                              │
│                     ┌─────────────────┐                                     │
│                     │     QUEUE       │                                     │
│                     │                 │                                     │
│  Producer ─────────►│ [M1][M2][M3][M4]│                                     │
│                     │                 │                                     │
│                     └────────┬────────┘                                     │
│                              │                                              │
│              ┌───────────────┼───────────────┐                              │
│              ▼               ▼               ▼                              │
│         Consumer A      Consumer B      Consumer C                          │
│            M1              M2              M3                               │
│                                                                              │
│  USE CASES:                                                                 │
│  ──────────                                                                 │
│  • Task distribution (processing jobs)                                     │
│  • Load balancing work across workers                                      │
│  • Order processing, email sending                                         │
│                                                                              │
│  ACKNOWLEDGMENT:                                                            │
│  ───────────────                                                            │
│  Consumer receives message → processes → ACKs                              │
│  If consumer dies before ACK → message redelivered                         │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Publish-Subscribe (Fan-out)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PUBLISH-SUBSCRIBE (FAN-OUT)                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Each message delivered to ALL subscribers.                                 │
│                                                                              │
│                     ┌─────────────────┐                                     │
│                     │     TOPIC       │                                     │
│                     │                 │                                     │
│  Publisher ────────►│    Message      │                                     │
│                     │                 │                                     │
│                     └────────┬────────┘                                     │
│                              │                                              │
│              ┌───────────────┼───────────────┐                              │
│              │               │               │                              │
│              ▼               ▼               ▼                              │
│       ┌───────────┐   ┌───────────┐   ┌───────────┐                        │
│       │Subscriber │   │Subscriber │   │Subscriber │                        │
│       │  (Email)  │   │ (Analytics│   │  (Audit)  │                        │
│       └───────────┘   └───────────┘   └───────────┘                        │
│                                                                              │
│  All three receive the same message                                        │
│                                                                              │
│  USE CASES:                                                                 │
│  ──────────                                                                 │
│  • Event broadcasting                                                      │
│  • Real-time updates                                                       │
│  • Logging, monitoring, analytics                                          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Consumer Groups (Kafka Style)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONSUMER GROUPS                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Combines work queue (within group) and pub-sub (across groups).           │
│                                                                              │
│                     ┌─────────────────────────────┐                         │
│                     │     TOPIC (4 partitions)    │                         │
│                     │  ┌────┬────┬────┬────┐     │                         │
│                     │  │ P0 │ P1 │ P2 │ P3 │     │                         │
│                     │  └────┴────┴────┴────┘     │                         │
│                     └─────────────┬───────────────┘                         │
│                                   │                                          │
│          ┌────────────────────────┼────────────────────────┐                │
│          ▼                        ▼                        ▼                │
│  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐     │
│  │  Consumer Group A │   │  Consumer Group B │   │  Consumer Group C │     │
│  │  ┌───┐ ┌───┐     │   │  ┌───┐            │   │  ┌───┐            │     │
│  │  │C1 │ │C2 │     │   │  │C1 │            │   │  │C1 │            │     │
│  │  │P0 │ │P1 │     │   │  │P0 │            │   │  │P0 │            │     │
│  │  │P2 │ │P3 │     │   │  │P1 │            │   │  │P1 │            │     │
│  │  └───┘ └───┘     │   │  │P2 │            │   │  │P2 │            │     │
│  │                   │   │  │P3 │            │   │  │P3 │            │     │
│  └───────────────────┘   │  └───┘            │   │  └───┘            │     │
│                           └───────────────────┘   └───────────────────┘     │
│                                                                              │
│  RULES:                                                                     │
│  ──────                                                                     │
│  • Each partition → one consumer in a group                                │
│  • Multiple groups → each gets all messages                                │
│  • Consumer count > partition count → some idle                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Apache Kafka Deep Dive

Must-Know: Kafka is the de facto standard for event streaming. Expect detailed questions about partitions, replication, and exactly-once semantics.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    KAFKA ARCHITECTURE                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                         ┌─────────────────┐                                 │
│                         │    PRODUCERS    │                                 │
│                         └────────┬────────┘                                 │
│                                  │                                          │
│                                  ▼                                          │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                      KAFKA CLUSTER                                     ││
│  │                                                                        ││
│  │  ┌─────────────────────────────────────────────────────────────────┐  ││
│  │  │                      TOPIC: orders                              │  ││
│  │  │                                                                 │  ││
│  │  │  Partition 0    Partition 1    Partition 2    Partition 3       │  ││
│  │  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐        │  ││
│  │  │  │ Broker 1│   │ Broker 2│   │ Broker 3│   │ Broker 1│        │  ││
│  │  │  │ (Leader)│   │ (Leader)│   │ (Leader)│   │ (Leader)│        │  ││
│  │  │  └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘        │  ││
│  │  │       │             │             │             │              │  ││
│  │  │  ┌────▼────┐   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐        │  ││
│  │  │  │ Broker 2│   │ Broker 3│   │ Broker 1│   │ Broker 2│        │  ││
│  │  │  │(Replica)│   │(Replica)│   │(Replica)│   │(Replica)│        │  ││
│  │  │  └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘        │  ││
│  │  │       │             │             │             │              │  ││
│  │  │  ┌────▼────┐   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐        │  ││
│  │  │  │ Broker 3│   │ Broker 1│   │ Broker 2│   │ Broker 3│        │  ││
│  │  │  │(Replica)│   │(Replica)│   │(Replica)│   │(Replica)│        │  ││
│  │  │  └─────────┘   └─────────┘   └─────────┘   └─────────┘        │  ││
│  │  │                                                                 │  ││
│  │  │  Replication Factor = 3                                        │  ││
│  │  └─────────────────────────────────────────────────────────────────┘  ││
│  │                                                                        ││
│  │  ┌──────────────────────────────────────────────────────┐             ││
│  │  │              ZOOKEEPER / KRAFT                       │             ││
│  │  │  • Cluster membership                                │             ││
│  │  │  • Leader election                                   │             ││
│  │  │  • Topic/partition metadata                          │             ││
│  │  └──────────────────────────────────────────────────────┘             ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                  │                                          │
│                                  ▼                                          │
│                         ┌─────────────────┐                                 │
│                         │    CONSUMERS    │                                 │
│                         └─────────────────┘                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Partition Internals

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PARTITION STRUCTURE                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PARTITION = Ordered, immutable sequence of records                        │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     PARTITION 0                                      │   │
│  │                                                                      │   │
│  │  Offset: 0    1    2    3    4    5    6    7    8    9   ...      │   │
│  │         ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐        │   │
│  │         │ M0 │ M1 │ M2 │ M3 │ M4 │ M5 │ M6 │ M7 │ M8 │ M9 │        │   │
│  │         └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘        │   │
│  │                                                                      │   │
│  │         ▲                              ▲              ▲              │   │
│  │         │                              │              │              │   │
│  │   Log Start                     Consumer          High-Water       │   │
│  │   (retention)                   Offset             Mark             │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  SEGMENTS:                                                                  │
│  ─────────                                                                  │
│  Partition split into segment files for efficient cleanup                  │
│                                                                              │
│  /kafka-logs/topic-partition-0/                                            │
│    ├── 00000000000000000000.log    (segment 1: offsets 0-999)             │
│    ├── 00000000000000000000.index  (offset → position)                    │
│    ├── 00000000000000001000.log    (segment 2: offsets 1000-1999)         │
│    ├── 00000000000000001000.index                                         │
│    └── ...                                                                 │
│                                                                              │
│  RETENTION:                                                                 │
│  ──────────                                                                 │
│  • Time-based: Delete segments older than X days                          │
│  • Size-based: Delete when partition exceeds X GB                         │
│  • Compaction: Keep only latest value per key                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Producer Delivery Guarantees

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PRODUCER ACKNOWLEDGMENTS                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ACKS = 0 (Fire and forget)                                                │
│  ─────────────────────────                                                  │
│  Producer ───► Broker                                                       │
│  (returns immediately, doesn't wait for ACK)                               │
│                                                                              │
│  • Fastest, lowest latency                                                 │
│  • No delivery guarantee                                                   │
│  • Use for: Metrics, logs where loss is acceptable                        │
│                                                                              │
│  ACKS = 1 (Leader acknowledgment)                                          │
│  ────────────────────────────────                                           │
│  Producer ───► Leader ───► ACK                                             │
│  (returns after leader writes)                                             │
│                                                                              │
│  • Good balance of speed and safety                                        │
│  • Leader crash before replication = data loss                             │
│  • Default setting for most use cases                                      │
│                                                                              │
│  ACKS = -1/ALL (All ISR acknowledgment)                                    │
│  ──────────────────────────────────────                                     │
│  Producer ───► Leader ───► Replicas (all ISR) ───► ACK                    │
│  (returns after all in-sync replicas have it)                              │
│                                                                              │
│  • Strongest durability guarantee                                          │
│  • Higher latency                                                          │
│  • Combined with min.insync.replicas for safety                           │
│                                                                              │
│  IDEMPOTENT PRODUCER:                                                       │
│  ────────────────────                                                       │
│  enable.idempotence = true                                                 │
│  • Producer assigns sequence number to each message                        │
│  • Broker rejects duplicates                                               │
│  • Exactly-once within a partition                                         │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Consumer Offset Management

Think of Kafka offsets like a bookmark in a book. If you lose the bookmark, you have to re-read from wherever you last remember — potentially re-reading pages (reprocessing messages). The critical question is: do you move the bookmark before or after you read the page?
from kafka import KafkaConsumer, TopicPartition

# Manual offset management -- the "I control the bookmark" approach.
# This is essential for at-least-once delivery in production systems.
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='order-processor',
    enable_auto_commit=False,  # CRITICAL: Never use auto-commit for
                                # anything where duplicates matter.
                                # Auto-commit moves the bookmark on a timer,
                                # not when you've actually finished processing.
    auto_offset_reset='earliest'  # On first join or offset expiry, start
                                   # from the beginning -- not 'latest',
                                   # which would skip unprocessed messages.
)

for message in consumer:
    try:
        # Process message first -- this is the at-least-once pattern.
        process_order(message.value)
        
        # Commit offset AFTER successful processing.
        # Note: offset + 1 because we're committing the NEXT offset to read,
        # not the one we just processed. This off-by-one is a common source
        # of bugs in Kafka consumer implementations.
        consumer.commit({
            TopicPartition(message.topic, message.partition): 
                OffsetAndMetadata(message.offset + 1, None)
        })
        
    except Exception as e:
        # Don't commit -- message will be redelivered on next poll.
        # This is the safety net: if processing fails, we retry.
        log.error(f"Failed to process: {e}")
        # Optionally: seek back explicitly for immediate retry
        # rather than waiting for the next consumer restart.
        consumer.seek(
            TopicPartition(message.topic, message.partition),
            message.offset
        )


# The three delivery patterns and their failure modes:
#
# AT-LEAST-ONCE (process then commit) -- shown above
#   Crash after process, before commit --> message redelivered
#   Your handler MUST be idempotent to handle this safely
#
# AT-MOST-ONCE (commit then process)
#   Crash after commit, before process --> message lost forever
#   Only acceptable for non-critical data (metrics, logs)
#
# EXACTLY-ONCE (transactional consume-transform-produce)
#   Kafka transactions wrap offset commit + output writes atomically
#   Highest overhead, but no duplicates and no data loss
Production pitfall: If your consumer group has more instances than partitions, the extra consumers sit idle — they cannot share a partition. Plan your partition count at topic creation time, because increasing it later changes the partitioning of keyed messages and can break ordering guarantees for in-flight data.

Kafka Transactions (Exactly-Once)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    KAFKA TRANSACTIONS                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRANSACTIONAL PRODUCER:                                                    │
│  ────────────────────────                                                   │
│  Atomically write to multiple partitions                                   │
│                                                                              │
│  ```python                                                                  │
│  producer = KafkaProducer(                                                 │
│      bootstrap_servers=['localhost:9092'],                                 │
│      transactional_id='my-producer'                                        │
│  )                                                                          │
│                                                                              │
│  producer.init_transactions()                                              │
│                                                                              │
│  producer.begin_transaction()                                              │
│  try:                                                                       │
│      producer.send('topic-a', value=b'message1')                           │
│      producer.send('topic-b', value=b'message2')                           │
│      producer.commit_transaction()                                         │
│  except:                                                                    │
│      producer.abort_transaction()                                          │
│  ```                                                                        │
│                                                                              │
│  READ_COMMITTED CONSUMERS:                                                  │
│  ──────────────────────────                                                 │
│  Only see committed transactional messages                                 │
│                                                                              │
│  isolation.level = "read_committed"                                        │
│                                                                              │
│  CONSUME-TRANSFORM-PRODUCE:                                                │
│  ───────────────────────────                                                │
│  Exactly-once stream processing                                            │
│                                                                              │
│  Input ───► Process ───► Output                                            │
│                │                                                            │
│                └──► Offset Commit                                          │
│                                                                              │
│  All three operations in ONE transaction!                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Event Sourcing

Pattern Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    EVENT SOURCING                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRADITIONAL (State-based):                                                │
│  ───────────────────────────                                                │
│  Store CURRENT state only                                                  │
│                                                                              │
│  Account { id: 123, balance: 500 }                                         │
│                                                                              │
│  Problem: No history, hard to debug, can't replay                          │
│                                                                              │
│  EVENT SOURCING:                                                            │
│  ───────────────                                                            │
│  Store sequence of EVENTS that led to current state                        │
│                                                                              │
│  Event 1: AccountCreated { id: 123, owner: "Alice" }                       │
│  Event 2: MoneyDeposited { id: 123, amount: 1000 }                         │
│  Event 3: MoneyWithdrawn { id: 123, amount: 300 }                          │
│  Event 4: MoneyWithdrawn { id: 123, amount: 200 }                          │
│                                                                              │
│  Current state = replay(events) → balance: 500                             │
│                                                                              │
│  BENEFITS:                                                                  │
│  ─────────                                                                  │
│  ✓ Complete audit trail                                                    │
│  ✓ Can rebuild state at any point                                          │
│  ✓ Debug by replaying events                                               │
│  ✓ Temporal queries (what was state on Jan 1?)                             │
│  ✓ Easy to add new projections                                             │
│                                                                              │
│  CHALLENGES:                                                                │
│  ───────────                                                                │
│  ✗ Schema evolution for events                                             │
│  ✗ Replay can be slow                                                      │
│  ✗ Eventual consistency                                                    │
│  ✗ More complex than CRUD                                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Event Store Implementation

from dataclasses import dataclass
from datetime import datetime
from typing import List
import json

@dataclass
class Event:
    aggregate_id: str
    event_type: str
    data: dict
    timestamp: datetime
    version: int

class EventStore:
    """
    Simple event store backed by a database.
    Production: Use EventStoreDB, Kafka, or DynamoDB Streams.
    
    The event store is analogous to an accounting ledger: you never
    erase or modify past entries -- you only append new ones. To know
    the current balance, you replay the ledger from the beginning
    (or from the last checkpoint/snapshot).
    """
    
    def __init__(self, db):
        self.db = db
    
    def append(self, aggregate_id: str, events: List[Event], expected_version: int):
        """
        Append events with optimistic concurrency control.
        
        The expected_version check is our guard against lost updates:
        if two requests try to append simultaneously, only the first
        succeeds -- the second sees a version mismatch and must retry
        with fresh state. This is the event-sourcing equivalent of a
        compare-and-swap (CAS) operation.
        """
        # Check current version
        current_version = self._get_version(aggregate_id)
        if current_version != expected_version:
            raise ConcurrencyError(
                f"Expected version {expected_version}, got {current_version}"
            )
        
        # Append events
        for i, event in enumerate(events):
            event.version = expected_version + i + 1
            self.db.execute(
                """
                INSERT INTO events (aggregate_id, event_type, data, timestamp, version)
                VALUES (?, ?, ?, ?, ?)
                """,
                (aggregate_id, event.event_type, json.dumps(event.data),
                 event.timestamp, event.version)
            )
        
        # Publish to subscribers
        self._publish(events)
    
    def get_events(self, aggregate_id: str, from_version: int = 0) -> List[Event]:
        """Get all events for an aggregate from a given version."""
        rows = self.db.execute(
            """
            SELECT * FROM events 
            WHERE aggregate_id = ? AND version > ?
            ORDER BY version
            """,
            (aggregate_id, from_version)
        )
        return [self._row_to_event(row) for row in rows]


class BankAccount:
    """
    Event-sourced aggregate.
    """
    
    def __init__(self, account_id: str):
        self.id = account_id
        self.balance = 0
        self.is_closed = False
        self.version = 0
        self._pending_events = []
    
    # Command handlers
    def deposit(self, amount: float):
        if self.is_closed:
            raise ValueError("Account is closed")
        if amount <= 0:
            raise ValueError("Amount must be positive")
        
        self._apply(MoneyDeposited(self.id, amount))
    
    def withdraw(self, amount: float):
        if self.is_closed:
            raise ValueError("Account is closed")
        if amount > self.balance:
            raise ValueError("Insufficient funds")
        
        self._apply(MoneyWithdrawn(self.id, amount))
    
    # Event handlers
    def _apply(self, event):
        self._pending_events.append(event)
        self._handle(event)
    
    def _handle(self, event):
        if isinstance(event, AccountCreated):
            self.balance = 0
        elif isinstance(event, MoneyDeposited):
            self.balance += event.amount
        elif isinstance(event, MoneyWithdrawn):
            self.balance -= event.amount
        elif isinstance(event, AccountClosed):
            self.is_closed = True
        
        self.version += 1
    
    # Rehydration
    @classmethod
    def from_events(cls, account_id: str, events: List[Event]) -> 'BankAccount':
        account = cls(account_id)
        for event in events:
            account._handle(event)
        return account


# Usage
def transfer_money(from_id: str, to_id: str, amount: float, event_store: EventStore):
    """
    Saga pattern for cross-aggregate operation.
    """
    from_events = event_store.get_events(from_id)
    from_account = BankAccount.from_events(from_id, from_events)
    
    to_events = event_store.get_events(to_id)
    to_account = BankAccount.from_events(to_id, to_events)
    
    try:
        from_account.withdraw(amount)
        to_account.deposit(amount)
        
        # Save both in transaction (or use saga with compensation)
        event_store.append(from_id, from_account._pending_events, from_account.version)
        event_store.append(to_id, to_account._pending_events, to_account.version)
        
    except Exception as e:
        # Compensation logic if needed
        raise

CQRS (Command Query Responsibility Segregation)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CQRS PATTERN                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  TRADITIONAL:                                                               │
│  ────────────                                                               │
│  Same model for reads and writes                                           │
│                                                                              │
│  API ───► Service ───► Database                                            │
│   │                       ▲                                                 │
│   └───────────────────────┘                                                 │
│                                                                              │
│  CQRS:                                                                      │
│  ─────                                                                      │
│  Separate models for commands (writes) and queries (reads)                 │
│                                                                              │
│       WRITE PATH                       READ PATH                            │
│  ┌──────────────────┐             ┌──────────────────┐                     │
│  │    Commands      │             │     Queries      │                     │
│  │ (Create, Update) │             │  (List, Search)  │                     │
│  └────────┬─────────┘             └────────┬─────────┘                     │
│           │                                 │                               │
│           ▼                                 ▼                               │
│  ┌──────────────────┐             ┌──────────────────┐                     │
│  │  Command Handler │             │  Query Handler   │                     │
│  │  (Domain Logic)  │             │  (Simple reads)  │                     │
│  └────────┬─────────┘             └────────┬─────────┘                     │
│           │                                 │                               │
│           ▼                                 ▼                               │
│  ┌──────────────────┐             ┌──────────────────┐                     │
│  │   Write Model    │────────────►│    Read Model    │                     │
│  │   (Normalized)   │   Events/   │  (Denormalized)  │                     │
│  │                  │   Projections│                  │                     │
│  └──────────────────┘             └──────────────────┘                     │
│                                                                              │
│  BENEFITS:                                                                  │
│  ─────────                                                                  │
│  • Optimize read/write models independently                                │
│  • Scale reads and writes separately                                       │
│  • Different storage for each (SQL writes, Elasticsearch reads)            │
│  • Complex queries without affecting write performance                     │
│                                                                              │
│  PROJECTIONS (Event → Read Model):                                         │
│  ──────────────────────────────────                                         │
│  Events from write model → Update denormalized read model                  │
│  Example: OrderPlaced → Update UserOrderHistory, ProductSalesCount         │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Projection Example

class OrderProjection:
    """
    Builds read model from order events.
    """
    
    def __init__(self, read_db, event_store):
        self.read_db = read_db
        self.event_store = event_store
    
    def handle(self, event):
        """Route event to appropriate handler."""
        handler = getattr(self, f'on_{event.event_type}', None)
        if handler:
            handler(event)
    
    def on_OrderPlaced(self, event):
        # Insert into orders read model
        self.read_db.execute(
            """
            INSERT INTO order_summaries (id, user_id, total, status, created_at)
            VALUES (?, ?, ?, 'placed', ?)
            """,
            (event.data['order_id'], event.data['user_id'], 
             event.data['total'], event.timestamp)
        )
        
        # Update user statistics
        self.read_db.execute(
            """
            UPDATE user_stats 
            SET order_count = order_count + 1,
                total_spent = total_spent + ?
            WHERE user_id = ?
            """,
            (event.data['total'], event.data['user_id'])
        )
    
    def on_OrderShipped(self, event):
        self.read_db.execute(
            """
            UPDATE order_summaries 
            SET status = 'shipped', shipped_at = ?
            WHERE id = ?
            """,
            (event.timestamp, event.data['order_id'])
        )
    
    def rebuild(self):
        """Rebuild entire read model from events."""
        self.read_db.execute("TRUNCATE order_summaries")
        self.read_db.execute("TRUNCATE user_stats")
        
        for event in self.event_store.get_all_events():
            self.handle(event)

Message Queue Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MESSAGE QUEUE COMPARISON                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────┬─────────────┬─────────────┬─────────────┬─────────────┐    │
│  │ Feature    │ Kafka       │ RabbitMQ    │ AWS SQS     │ Pulsar      │    │
│  ├────────────┼─────────────┼─────────────┼─────────────┼─────────────┤    │
│  │ Model      │ Log         │ Queue       │ Queue       │ Log         │    │
│  │ Ordering   │ Per-partition│ Per-queue  │ Best-effort │ Per-partition│    │
│  │ Retention  │ Configurable│ Until ACK   │ 14 days max │ Configurable│    │
│  │ Replay     │ Yes         │ No          │ No          │ Yes         │    │
│  │ Throughput │ Very high   │ Moderate    │ High        │ Very high   │    │
│  │ Latency    │ Low         │ Very low    │ Moderate    │ Low         │    │
│  │ Exactly-once│ Yes (EOS)  │ No          │ No          │ Yes         │    │
│  │ Multi-tenancy│ Topics    │ vHosts      │ Queues      │ Tenants     │    │
│  └────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                              │
│  WHEN TO USE:                                                               │
│  ─────────────                                                              │
│                                                                              │
│  KAFKA:                                                                     │
│  • Event streaming, log aggregation                                        │
│  • High throughput (millions/sec)                                          │
│  • Need replay capability                                                  │
│  • Event sourcing, CQRS                                                    │
│                                                                              │
│  RABBITMQ:                                                                  │
│  • Complex routing (exchanges, bindings)                                   │
│  • Low latency messaging                                                   │
│  • RPC patterns                                                            │
│  • Traditional message queue patterns                                      │
│                                                                              │
│  SQS:                                                                       │
│  • AWS-native workloads                                                    │
│  • Fully managed, no operations                                            │
│  • Simple queue patterns                                                   │
│  • Lambda triggers                                                         │
│                                                                              │
│  PULSAR:                                                                    │
│  • Multi-tenancy requirements                                              │
│  • Geo-replication                                                         │
│  • Unified queuing and streaming                                           │
│  • Functions (serverless)                                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Dead Letter Queues

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DEAD LETTER QUEUES (DLQ)                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PROBLEM: What happens to messages that can't be processed?                │
│                                                                              │
│  Without DLQ:                                                               │
│  • Poison message blocks queue                                             │
│  • Infinite retry loop                                                     │
│  • Messages lost if discarded                                              │
│                                                                              │
│  With DLQ:                                                                  │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                                                                      │  │
│  │  Main Queue ───► Consumer ──┬──► Success                            │  │
│  │                             │                                        │  │
│  │                      Retry  │  Failure (after N retries)            │  │
│  │                      ↓      │                                        │  │
│  │                   ┌──┴──────▼────────────┐                          │  │
│  │                   │   Dead Letter Queue  │                          │  │
│  │                   │                      │                          │  │
│  │                   │  Failed messages for │                          │  │
│  │                   │  inspection/replay   │                          │  │
│  │                   └──────────────────────┘                          │  │
│  │                                                                      │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  IMPLEMENTATION:                                                            │
│  ───────────────                                                            │
│                                                                              │
│  ```python                                                                  │
│  MAX_RETRIES = 3                                                           │
│                                                                              │
│  def process_message(message):                                             │
│      retry_count = message.headers.get('retry_count', 0)                   │
│                                                                              │
│      try:                                                                   │
│          handle(message)                                                   │
│          ack(message)                                                      │
│      except TransientError:                                                │
│          if retry_count < MAX_RETRIES:                                     │
│              requeue_with_delay(message, retry_count + 1)                  │
│          else:                                                             │
│              send_to_dlq(message)                                          │
│      except PermanentError:                                                │
│          send_to_dlq(message)  # Don't retry                              │
│  ```                                                                        │
│                                                                              │
│  DLQ HANDLING:                                                              │
│  ─────────────                                                              │
│  • Manual inspection and fix                                               │
│  • Automated replay after fix deployed                                     │
│  • Alerts for DLQ depth                                                    │
│  • Retention policy for old messages                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Advanced Design Scenarios

Scenario 1: End-to-End Exactly-Once Pipeline

You need to build a consume → transform → produce pipeline that processes financial events exactly once, even across restarts and network failures. Requirements:
  • Input events from Kafka transactions.raw
  • Processed events written to Kafka transactions.cleared and a database
  • No duplicate effects (no double-charging, no missing debits)
Design:
  • Use idempotent + transactional producer for the output topic
  • Use read-committed consumers and manual offset commit
  • Wrap DB writes and Kafka offset/produce in a single transaction (per-partition)
Key pattern:
  • Maintain an idempotency key (e.g., event ID) table in the DB
  • On each event:
    • If key seen before → skip (already processed)
    • Else → apply business logic, insert key, produce output message, commit transaction and offset
This gives a practical, implementation-ready story you can walk through at the whiteboard.

Scenario 2: Handling Backpressure and Slow Consumers

You run a high-throughput Kafka topic with occasionally slow consumers. Symptoms:
  • Consumer lag grows during spikes
  • Retention window is threatened
  • Downstream services start timing out
Strategies:
  • Horizontal scale: Increase consumer instances within the group (up to partitions count)
  • Backpressure-aware processing: Use bounded thread pools and queues in consumers
  • Prioritization: Route high-priority messages to a separate topic/consumer group
  • Graceful degradation: Drop non-critical messages or aggregate at coarser granularity under load
Be prepared to discuss lag monitoring, alerting, and how you would safely catch up (e.g., temporary relaxed SLAs, extra consumer capacity, or time-bounded replay).

Scenario 3: Region Outage and Replay

Your system runs active-active in two regions, both consuming from and producing to regional topics. Requirements:
  • If one region fails for 2 hours, you must replay missed events when it comes back
  • No double-processing and no lost events
Design outline:
  • Use per-region topics (e.g., events.us, events.eu) plus a global replication pipeline
  • Keep checkpoints per consumer group and region (last processed offset / timestamp)
  • On recovery:
    • Start consumers from last committed checkpoint
    • Apply the same idempotent processing pattern as Scenario 1
This scenario connects messaging to disaster recovery and business continuity planning.

Interview Practice

Question: Design a system that sends notifications (push, email, SMS) for an e-commerce platform.Design:
Architecture:

Event Sources ───► Kafka ───► Notification Service ───► Channels
(orders, user)              (router, templates)      (Push/Email/SMS)

Components:

1. Event Ingestion
   - Kafka topic: notifications.events
   - Events: OrderPlaced, OrderShipped, PriceDropped, etc.
   
2. Notification Router
   - Reads events from Kafka
   - Applies user preferences
   - Routes to appropriate channel
   
3. Channel Workers
   - Separate consumer groups per channel
   - Push: Firebase/APNS
   - Email: SES/SendGrid
   - SMS: Twilio
   
4. Template Service
   - Render message from event + template
   - i18n support
   
5. Rate Limiting
   - Don't spam users
   - Per-user, per-channel limits
   
6. Dead Letter Queue
   - Failed deliveries for retry
   - Alerting on high DLQ depth
Question: When would you choose Kafka over RabbitMQ?Choose Kafka when:
  • High throughput needed (100K+ msg/sec)
  • Need to replay messages
  • Event sourcing / audit log
  • Stream processing (Kafka Streams, ksqlDB)
  • Multiple consumers reading same messages
  • Long-term message retention
Choose RabbitMQ when:
  • Complex routing needed (fanout, topic, headers)
  • Low latency is critical (< 1ms)
  • Need RPC patterns (request/reply)
  • Traditional work queues
  • Smaller scale, simpler operations
  • Need message priority
Key Insight: Kafka is a log, RabbitMQ is a queue. Different mental models! Think of Kafka as a newspaper archive — everyone can read any issue at any time, and old issues stick around based on retention policy. RabbitMQ is more like a postal service — once the letter is delivered and acknowledged, it is gone.
Question: How do you achieve exactly-once message processing?Answer:True exactly-once is impossible (Two Generals Problem). We achieve “effectively exactly-once”:Option 1: Idempotent Consumer
def process(message):
    if already_processed(message.id):
        return  # Deduplicate
    
    with transaction:
        do_work(message)
        mark_processed(message.id)
Option 2: Kafka Transactions (EOS)
- Producer: enable.idempotence = true
- Consumer: isolation.level = read_committed
- Consume-transform-produce in single transaction
Option 3: Outbox Pattern
1. Write to DB + outbox table (same transaction)
2. Separate process reads outbox, publishes to Kafka
3. Delete from outbox after publish confirmed
Option 4: Change Data Capture (CDC)
- Debezium reads DB transaction log
- Publishes changes to Kafka
- Inherently ordered and exactly-once

Key Takeaways

Kafka is a Log, Not a Queue

Messages persist, can be replayed, and multiple consumers read independently.

Partitions Enable Parallelism

More partitions = more consumers = higher throughput. Choose partition key wisely.

Event Sourcing Provides Audit Trail

Store events, derive state. Enables debugging, replay, and temporal queries.

Exactly-Once = Idempotency

At-least-once delivery + idempotent processing = effectively exactly-once.

Next Steps

Data Systems

Learn about Kafka’s storage, compaction, and stream processing

Transactions

Understand distributed transactions and the Saga pattern