> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# 24. Migration Patterns

> Master the art of migrating from monolith to microservices using Strangler Fig, Branch by Abstraction, and other proven patterns

# Migration Patterns

Migrating from a monolith to microservices is a journey, not a destination. The number one cause of failed migrations is not technical -- it is impatience. Teams attempt "big bang" rewrites, run two systems in parallel for too long, or extract the wrong service first. The patterns in this chapter (Strangler Fig, Branch by Abstraction, CDC-based data sync) all share one philosophy: make the migration reversible at every step. If you cannot roll back any individual change, you are taking on risk that no pattern can save you from.

<Info>
  **Learning Objectives:**

  * Understand why incremental migration beats big-bang rewrites
  * Master the Strangler Fig pattern
  * Implement Branch by Abstraction
  * Learn database migration strategies
  * Handle dual-writes and data synchronization
</Info>

***

## Why Migrations Fail

Before reaching for any pattern, understand why most migrations go sideways. The failure modes below are not rare edge cases -- they are the default outcome unless you actively steer away from them. Every experienced migration leader has scars from at least one of these. The common thread: each anti-pattern is born from an entirely reasonable-looking decision that compounds into disaster over months. Big-bang rewrites feel clean. Splitting by technical layer feels organized. Migrating data first feels like "getting the hard part over with." All three are traps.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    MIGRATION ANTI-PATTERNS                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ BIG BANG REWRITE                                                        │
│  ─────────────────────                                                      │
│                                                                              │
│  "Let's rewrite everything from scratch"                                    │
│                                                                              │
│  Problems:                                                                  │
│  • Takes years (Netscape took 3 years, lost market share)                  │
│  • No value delivered until complete                                        │
│  • Original system still needs maintenance                                  │
│  • Requirements change during rewrite                                       │
│  • Team burnout                                                             │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ❌ WRONG SERVICE BOUNDARIES                                                │
│  ──────────────────────────────                                             │
│                                                                              │
│  "Let's split by technical layer"                                           │
│                                                                              │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                   │
│  │ UI Service  │ ──▶ │ API Service │ ──▶ │ DB Service  │                   │
│  └─────────────┘     └─────────────┘     └─────────────┘                   │
│                                                                              │
│  Problems:                                                                  │
│  • One change requires updating all three                                   │
│  • Tight coupling disguised as services                                    │
│  • Distributed monolith                                                     │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ❌ MIGRATING DATA TOO EARLY                                                │
│  ────────────────────────────                                               │
│                                                                              │
│  "Let's first migrate all the data, then the code"                         │
│                                                                              │
│  Problems:                                                                  │
│  • Data coupling remains                                                    │
│  • Complex synchronization                                                  │
│  • Risk of data inconsistency                                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

<Warning>
  **The Distributed Monolith Trap.** Splitting by technical layer (UI / API / DB) or by "shared entities" produces a system that looks like microservices on paper but behaves like a monolith with network latency added. You know you are in this trap when: (a) deploying one service requires coordinating releases with other services, (b) a single feature touches 4+ services, (c) one team owns code in many services. The cure is to split by **business capability** -- who owns the decisions, not who stores the data.
</Warning>

***

## The Strangler Fig Pattern

<Info>
  Named after strangler fig plants that grow around host trees, eventually replacing them entirely. Similarly, we gradually replace monolith functionality with microservices.
</Info>

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    STRANGLER FIG PATTERN                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PHASE 1: INTERCEPT                                                         │
│  ──────────────────────                                                     │
│                                                                              │
│          ┌───────────────────┐                                              │
│  Client ─▶│   Facade/Proxy   │──────────────────────▶ Monolith              │
│          └───────────────────┘                                              │
│                                                                              │
│  All traffic goes through facade (no code changes yet)                      │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 2: EXTRACT                                                           │
│  ──────────────────                                                         │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   Facade/Proxy   │────────▶│  New User Svc    │                │
│          │                   │         └──────────────────┘                │
│          │                   │                                              │
│          └───────────────────┘──────────────────────▶ Monolith              │
│                │                                        (minus users)       │
│                │                                                            │
│                └──▶ /users/* → New Service                                 │
│                     /* → Monolith                                          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 3: REPEAT                                                            │
│  ─────────────────                                                          │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   Facade/Proxy   │────────▶│  User Service    │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Order Service   │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Payment Service │                │
│          └───────────────────┘         └──────────────────┘                │
│                │                                                            │
│                └──▶ Monolith (shrinking)                                   │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 4: RETIRE                                                            │
│  ──────────────────                                                         │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   API Gateway    │────────▶│  User Service    │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Order Service   │                │
│          │                   │         ├──────────────────┤                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Payment Service │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Catalog Service │                │
│          └───────────────────┘         └──────────────────┘                │
│                                                                              │
│  Monolith is dead! 🎉                                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

<Warning>
  **Caveats & Common Pitfalls: Big Bang Cutover.**

  * **"Let us just rewrite it over the weekend."** Big bang cutovers fail because you cannot test every behavior of a legacy system until real traffic hits it. Edge cases you did not know existed surface as outages. Netscape lost years of market share doing this; do not repeat it.
  * **Freeze during rewrite.** Feature development halts during a big bang, which is politically unsustainable. Engineers leave, stakeholders lose patience, and the project gets cancelled at 70% done.
  * **Rollback is binary.** With big bang, rollback means reverting 6 months of DB migrations and code changes at once. Mid-cutover failures usually become multi-day outages because "just roll it back" is not actually an option.
  * **Data migration order dependency.** Tables with foreign keys must migrate in dependency order, and legacy code may still write to tables you thought were frozen. Big bang assumes a clean cutover moment that almost never exists.
</Warning>

<Tip>
  **Solutions & Patterns: Incremental Cutover.**

  * **Strangler Fig is mandatory for anything above 100K lines.** Extract one endpoint at a time, validate with real traffic before ramping, and keep the monolith as fallback until the new service has at least 3 months of production stability.
  * **Every extraction is reversible for at least one release cycle.** Keep the monolith's old code path behind a feature flag. If you remove it immediately, you have no rollback when the new service misbehaves weeks later.
  * **Run shadow mode before any traffic cutover.** Send identical requests to both systems, compare results, log divergences. You will find bugs in the new implementation that would otherwise surface as production incidents.
  * **Always extract a leaf module first.** Pick a module that nothing else in the monolith depends on (notifications, search, reporting). Module-with-many-dependents is your last extraction, not your first.
</Tip>

### Implementation

The strangler facade is the safest primitive in the migration toolbox because it is fundamentally additive. In Phase 1 you introduce a reverse proxy with every route still pointing at the monolith -- behavior is byte-for-byte identical, but you now own the routing layer. Phase 2 begins the real migration: a single endpoint is routed to a new service behind a feature flag, starting at 1% of traffic. If error rates or latency degrade, you flip the flag back to 0% and traffic returns to the monolith in seconds -- no deploy, no rollback ceremony. You then ramp to 5%, 25%, 50%, 100% over days or weeks, watching dashboards and error budgets at each step. **Key decision points to proceed:** new-service error rate is within 1x of monolith, p99 latency is within 20%, business metrics (conversion, checkout success) are flat or better. **Decision points to roll back:** any hard error spike, data divergence detected in shadow comparisons, or an on-call page during ramp. Do not skip percentages and do not rush -- the whole point of this pattern is that the ramp is slow enough to catch problems cheaply.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // strangler-facade.js - The routing layer that enables migration

    const express = require('express');
    const { createProxyMiddleware } = require('http-proxy-middleware');

    const app = express();

    // Configuration: which routes go where
    const routingConfig = {
      // New microservices
      microservices: {
        '/api/users': 'http://user-service:3001',
        '/api/orders': 'http://order-service:3002',
        '/api/payments': 'http://payment-service:3003'
      },
      // Everything else goes to monolith
      monolith: 'http://monolith:8080'
    };

    // Feature flags for gradual rollout
    const featureFlags = {
      'users-service': { enabled: true, percentage: 100 },
      'orders-service': { enabled: true, percentage: 50 },  // 50% traffic
      'payments-service': { enabled: false, percentage: 0 }  // Still in monolith
    };

    // Determine if request should go to microservice
    function shouldRouteToMicroservice(serviceName, req) {
      const flag = featureFlags[serviceName];
      if (!flag || !flag.enabled) return false;
      
      // Percentage rollout based on user ID or random
      if (flag.percentage < 100) {
        const userId = req.headers['x-user-id'] || req.ip;
        const hash = hashCode(userId);
        return (hash % 100) < flag.percentage;
      }
      
      return true;
    }

    function hashCode(str) {
      let hash = 0;
      for (let i = 0; i < str.length; i++) {
        hash = ((hash << 5) - hash) + str.charCodeAt(i);
        hash |= 0;
      }
      return Math.abs(hash);
    }

    // Dynamic routing middleware
    app.use((req, res, next) => {
      const path = req.path;
      
      // Find matching microservice route
      for (const [route, target] of Object.entries(routingConfig.microservices)) {
        if (path.startsWith(route)) {
          const serviceName = route.split('/').pop() + '-service';
          
          if (shouldRouteToMicroservice(serviceName, req)) {
            // Route to microservice
            req.routingDecision = { target, type: 'microservice' };
          } else {
            // Route to monolith
            req.routingDecision = { target: routingConfig.monolith, type: 'monolith' };
          }
          break;
        }
      }
      
      // Default to monolith
      if (!req.routingDecision) {
        req.routingDecision = { target: routingConfig.monolith, type: 'monolith' };
      }
      
      // Log for analysis
      console.log(`[Strangler] ${req.method} ${path} -> ${req.routingDecision.type}`);
      
      next();
    });

    // Create proxy for each target
    app.use('/api/users', (req, res, next) => {
      if (req.routingDecision.type === 'microservice') {
        createProxyMiddleware({
          target: routingConfig.microservices['/api/users'],
          changeOrigin: true,
          pathRewrite: { '^/api/users': '' },
          onError: (err, req, res) => {
            // Fallback to monolith on error
            console.error('[Strangler] Microservice error, falling back:', err.message);
            createProxyMiddleware({
              target: routingConfig.monolith,
              changeOrigin: true
            })(req, res, next);
          }
        })(req, res, next);
      } else {
        createProxyMiddleware({
          target: routingConfig.monolith,
          changeOrigin: true
        })(req, res, next);
      }
    });

    // Catch-all for monolith
    app.use(createProxyMiddleware({
      target: routingConfig.monolith,
      changeOrigin: true
    }));

    app.listen(3000);
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # strangler_facade.py - FastAPI reverse proxy for monolith-to-microservices migration

    import hashlib
    import logging
    import os
    from typing import Optional

    import httpx
    from fastapi import FastAPI, Request, Response
    from pydantic import BaseModel

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("strangler")

    app = FastAPI(title="Strangler Fig Facade")


    class ServiceTarget(BaseModel):
        """Routing target for a specific service."""

        prefix: str
        url: str
        service_name: str


    class FeatureFlag(BaseModel):
        """Feature flag controlling gradual rollout."""

        enabled: bool
        percentage: int  # 0-100


    # Configuration: which routes go where
    MICROSERVICES: list[ServiceTarget] = [
        ServiceTarget(prefix="/api/users", url="http://user-service:3001", service_name="users-service"),
        ServiceTarget(prefix="/api/orders", url="http://order-service:3002", service_name="orders-service"),
        ServiceTarget(prefix="/api/payments", url="http://payment-service:3003", service_name="payments-service"),
    ]
    MONOLITH_URL = os.getenv("MONOLITH_URL", "http://monolith:8080")


    def load_feature_flags() -> dict[str, FeatureFlag]:
        """Read feature flag state from environment variables.

        Example env vars:
            FF_USERS_SERVICE_ENABLED=true
            FF_USERS_SERVICE_PERCENTAGE=100
        """
        flags: dict[str, FeatureFlag] = {}
        for svc in MICROSERVICES:
            key = svc.service_name.upper().replace("-", "_")
            flags[svc.service_name] = FeatureFlag(
                enabled=os.getenv(f"FF_{key}_ENABLED", "false").lower() == "true",
                percentage=int(os.getenv(f"FF_{key}_PERCENTAGE", "0")),
            )
        return flags


    FEATURE_FLAGS = load_feature_flags()


    def stable_hash(value: str) -> int:
        """Deterministic 0-99 bucket for a given user id.

        Using a stable hash (not Python's built-in hash which is randomized)
        ensures a user stays in the same cohort across facade restarts.
        """
        digest = hashlib.md5(value.encode()).hexdigest()
        return int(digest, 16) % 100


    def should_route_to_microservice(service_name: str, request: Request) -> bool:
        """Decide whether this request should hit the new microservice."""
        flag = FEATURE_FLAGS.get(service_name)
        if flag is None or not flag.enabled:
            return False

        if flag.percentage >= 100:
            return True
        if flag.percentage <= 0:
            return False

        identity = request.headers.get("x-user-id") or (request.client.host if request.client else "anon")
        return stable_hash(identity) < flag.percentage


    def match_microservice(path: str) -> Optional[ServiceTarget]:
        """Return the microservice target that owns this path, if any."""
        for svc in MICROSERVICES:
            if path.startswith(svc.prefix):
                return svc
        return None


    # Shared async HTTP client; reuse connections across requests
    client = httpx.AsyncClient(timeout=httpx.Timeout(10.0, connect=2.0))


    async def forward(target_url: str, path: str, request: Request) -> Response:
        """Forward the incoming request to target_url preserving method/headers/body."""
        body = await request.body()
        # Strip hop-by-hop headers; httpx will set Host for us
        forward_headers = {k: v for k, v in request.headers.items() if k.lower() != "host"}
        upstream = await client.request(
            method=request.method,
            url=f"{target_url}{path}",
            content=body,
            headers=forward_headers,
            params=request.query_params,
        )
        return Response(
            content=upstream.content,
            status_code=upstream.status_code,
            headers={k: v for k, v in upstream.headers.items() if k.lower() not in {"content-encoding", "transfer-encoding"}},
        )


    @app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE"])
    async def strangler_proxy(path: str, request: Request) -> Response:
        full_path = f"/{path}"
        svc = match_microservice(full_path)

        # Paths that aren't microservice candidates go straight to the monolith
        if svc is None:
            logger.info("%s %s -> monolith (no match)", request.method, full_path)
            return await forward(MONOLITH_URL, full_path, request)

        if should_route_to_microservice(svc.service_name, request):
            # Strip the prefix so the microservice receives a clean path
            stripped = full_path[len(svc.prefix):] or "/"
            try:
                logger.info("%s %s -> microservice %s", request.method, full_path, svc.service_name)
                return await forward(svc.url, stripped, request)
            except httpx.HTTPError as exc:
                # Fallback to monolith if the new service is unhealthy.
                # This is THE key safety property of strangler fig.
                logger.error("Microservice %s failed (%s); falling back to monolith", svc.service_name, exc)
                return await forward(MONOLITH_URL, full_path, request)

        logger.info("%s %s -> monolith (flag off)", request.method, full_path)
        return await forward(MONOLITH_URL, full_path, request)


    @app.on_event("shutdown")
    async def shutdown() -> None:
        await client.aclose()
    ```
  </Tab>
</Tabs>

***

## Branch by Abstraction

<Info>
  Use when you need to replace a component **inside** the monolith before extracting it. Create an abstraction layer, swap implementations, then extract. This is complementary to Strangler Fig -- Strangler routes traffic externally at the API level, while Branch by Abstraction swaps implementations internally at the code level. Use Strangler for extracting entire endpoints; use Branch by Abstraction for replacing internal dependencies (like swapping an in-app email sender for a notification microservice client).
</Info>

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    BRANCH BY ABSTRACTION                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: IDENTIFY COMPONENT                                                 │
│  ─────────────────────────────                                              │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code A ──┬──▶ Notification Module ◀──┬── Code C                   │   │
│  │            │                            │                            │   │
│  │   Code B ──┘    (direct dependency)     └── Code D                   │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  STEP 2: CREATE ABSTRACTION                                                 │
│  ────────────────────────────                                               │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code A ──┬──▶ NotificationInterface ◀──┬── Code C                 │   │
│  │            │            │                 │                          │   │
│  │   Code B ──┘            ▼                 └── Code D                 │   │
│  │                  OldNotificationImpl                                 │   │
│  │                 (existing code)                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  STEP 3: IMPLEMENT NEW VERSION                                              │
│  ────────────────────────────────                                           │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                              ┌──────────────────┐                   │   │
│  │   Code ──▶ NotificationInterface                │                   │   │
│  │                    │         │ toggle           │                   │   │
│  │            ┌───────┴───────┐ ▼                  ▼                   │   │
│  │            │               │                                         │   │
│  │    OldNotificationImpl   NewNotificationImpl                        │   │
│  │                          (calls microservice)                       │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                        │                                                    │
│                        └────────────────▶ Notification Microservice        │
│                                                                              │
│  STEP 4: MIGRATE AND REMOVE                                                 │
│  ──────────────────────────────                                             │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code ──▶ NotificationClient ─────────▶ Notification Microservice  │   │
│  │                                                                       │   │
│  │   (Old implementation deleted)                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

<Warning>
  **Caveats & Common Pitfalls: Strangler Fig Stalled Mid-Migration.**

  * **The half-migrated state becomes the new permanent state.** Teams extract 3 services, declare victory on the quarterly review, and then never touch the monolith again. Two years later you have a monolith plus 3 microservices -- strictly worse than either pure architecture.
  * **Strangler facade becomes a new monolith.** Every new feature routes through the facade, which accumulates business logic, becomes the central point of coordination, and recreates the exact coupling you tried to eliminate.
  * **Rollback impossible after shared DB removed.** Phase 4 of data migration (dropping tables from monolith) is the point of no return. Many teams do this prematurely, before the new service has proven itself, and then cannot roll back when issues emerge.
  * **The 80/20 trap.** Extracting 20% of the monolith captures 80% of the value, which tempts teams to stop. The remaining 20% is often the most coupled, most load-bearing code, which means you still have all the monolith's fragility on the 20% that matters most.
</Warning>

<Tip>
  **Solutions & Patterns: Preventing Migration Stall.**

  * **Tie extractions to OKRs, not projects.** A rolling quarterly OKR ("reduce monolith LOC by 15% this quarter") prevents the "we extracted 3 services, we are done" reflex.
  * **Set a migration end-date and honor it.** Commit publicly to "monolith retired by Q4 2027." Slipping is allowed; abandoning is not. The deadline forces continuous progress.
  * **Keep the facade stupid.** Routing rules, no business logic, no data transformation. If you need transformation, put it in the calling service or the called service, never the facade.
  * **Define the "done" state precisely.** "Monolith retired" means: zero production traffic, zero cron jobs, database shut down, repository archived. Anything less is still half-migrated.
</Tip>

### Implementation Example

Branch by Abstraction is safer than Strangler Fig for internal components because the swap happens in-process -- no network, no distributed transactions, no database-per-service problem during the transition. The migration has five measured phases, and you must not skip any of them. **Phase 1 (Abstract):** extract an interface around the existing implementation without changing callers' behavior -- this is a pure refactor, all tests pass. **Phase 2 (Dual implementation):** add the new implementation behind the same interface but keep the old one wired up -- you now have two versions of the same contract. **Phase 3 (Shadow):** run both implementations for every call, use the legacy result as the source of truth, and log any divergence. Do this for at least a week in production before trusting the new implementation. **Phase 4 (Cutover):** feature-flag a percentage of traffic to use the new implementation as the primary. Ramp 1% -> 10% -> 50% -> 100%. **Phase 5 (Cleanup):** delete the old implementation and the feature flag -- and yes, actually delete it, do not leave it "just in case" because dead code rots and becomes a liability. **Decision to roll back:** any shadow-compare divergence you cannot explain, or a jump in error rate during the cutover ramp. **Decision to proceed:** zero unexplained divergences and stable latency for one full week at each ramp step. This pattern avoids the distributed monolith trap precisely because the new implementation lives in the same process -- you only introduce the network hop once you are certain the logic is right.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // branch-by-abstraction.js

    // STEP 1: Original code (tightly coupled)
    // Before: Direct usage of notification module
    class OrderService_Before {
      async placeOrder(order) {
        // ... order logic ...
        
        // Direct dependency - hard to change
        const emailer = require('./email-sender');
        emailer.send(order.userId, 'Order placed', orderDetails);
        
        const sms = require('./sms-sender');
        sms.send(order.phone, 'Your order is placed');
      }
    }

    // =========================================================================

    // STEP 2: Create abstraction interface
    class NotificationService {
      async sendOrderConfirmation(order) {
        throw new Error('Not implemented');
      }
      
      async sendShippingUpdate(order, trackingInfo) {
        throw new Error('Not implemented');
      }
      
      async sendDeliveryNotification(order) {
        throw new Error('Not implemented');
      }
    }

    // STEP 3: Implement with old code (same behavior)
    class LegacyNotificationService extends NotificationService {
      constructor() {
        super();
        this.emailer = require('./email-sender');
        this.sms = require('./sms-sender');
      }

      async sendOrderConfirmation(order) {
        await this.emailer.send(
          order.userId,
          'Order Confirmation',
          this.formatOrderEmail(order)
        );
        
        if (order.phone) {
          await this.sms.send(order.phone, `Order ${order.id} confirmed!`);
        }
      }

      async sendShippingUpdate(order, trackingInfo) {
        await this.emailer.send(
          order.userId,
          'Your order has shipped',
          this.formatShippingEmail(order, trackingInfo)
        );
      }
    }

    // STEP 4: Implement with new microservice
    class MicroserviceNotificationService extends NotificationService {
      constructor(serviceUrl) {
        super();
        this.baseUrl = serviceUrl;
      }

      async sendOrderConfirmation(order) {
        await fetch(`${this.baseUrl}/notifications/order-confirmation`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            userId: order.userId,
            orderId: order.id,
            items: order.items,
            total: order.total
          })
        });
      }

      async sendShippingUpdate(order, trackingInfo) {
        await fetch(`${this.baseUrl}/notifications/shipping`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            userId: order.userId,
            orderId: order.id,
            trackingNumber: trackingInfo.number,
            carrier: trackingInfo.carrier
          })
        });
      }
    }

    // STEP 5: Feature flag to switch implementations
    class NotificationServiceFactory {
      static create() {
        const useNewService = process.env.USE_NOTIFICATION_MICROSERVICE === 'true';
        
        if (useNewService) {
          return new MicroserviceNotificationService(
            process.env.NOTIFICATION_SERVICE_URL
          );
        }
        
        return new LegacyNotificationService();
      }
    }

    // STEP 6: Updated order service using abstraction
    // Notice how OrderService no longer knows or cares whether notifications go through
    // the legacy email sender or a microservice HTTP call. That decision lives in the factory.
    // This is the power of the pattern: the consuming code is completely unaware of the migration.
    class OrderService {
      constructor() {
        this.notificationService = NotificationServiceFactory.create();
      }

      async placeOrder(order) {
        // ... order logic ...
        
        // Now uses abstraction - can switch implementations
        await this.notificationService.sendOrderConfirmation(order);
      }
    }

    // Gradual rollout with comparison
    class ComparingNotificationService extends NotificationService {
      constructor() {
        super();
        this.legacy = new LegacyNotificationService();
        this.modern = new MicroserviceNotificationService(
          process.env.NOTIFICATION_SERVICE_URL
        );
      }

      async sendOrderConfirmation(order) {
        // Run both, compare results, use legacy as source of truth
        const [legacyResult, modernResult] = await Promise.allSettled([
          this.legacy.sendOrderConfirmation(order),
          this.modern.sendOrderConfirmation(order)
        ]);

        // Log differences for analysis
        if (modernResult.status === 'rejected') {
          console.error('Modern notification failed:', modernResult.reason);
        }

        // Return legacy result (source of truth during migration)
        if (legacyResult.status === 'rejected') {
          throw legacyResult.reason;
        }
        
        return legacyResult.value;
      }
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # branch_by_abstraction.py

    from __future__ import annotations

    import asyncio
    import logging
    import os
    from abc import ABC, abstractmethod
    from typing import Any

    import httpx
    from pydantic import BaseModel, Field

    logger = logging.getLogger(__name__)


    # ---- Domain models (pydantic v2) -----------------------------------------

    class OrderItem(BaseModel):
        sku: str
        quantity: int
        unit_price: float


    class Order(BaseModel):
        id: str
        user_id: str
        phone: str | None = None
        items: list[OrderItem] = Field(default_factory=list)
        total: float


    class TrackingInfo(BaseModel):
        number: str
        carrier: str


    # ---- STEP 1: Original tightly coupled code (for reference) ---------------

    class OrderServiceBefore:
        """The 'before' version: callers reach directly for the sender modules.

        Hard to test, hard to replace, and impossible to run two implementations
        side-by-side. This is what branch-by-abstraction fixes.
        """

        async def place_order(self, order: Order) -> None:
            # Direct dependency - hard to change
            from legacy import email_sender, sms_sender  # noqa: F401 (illustrative)
            await email_sender.send(order.user_id, "Order placed", "...")
            if order.phone:
                await sms_sender.send(order.phone, "Your order is placed")


    # ---- STEP 2: The abstraction (the 'branch' that both impls share) --------

    class NotificationService(ABC):
        """Abstract contract every implementation must satisfy."""

        @abstractmethod
        async def send_order_confirmation(self, order: Order) -> None: ...

        @abstractmethod
        async def send_shipping_update(self, order: Order, tracking: TrackingInfo) -> None: ...

        @abstractmethod
        async def send_delivery_notification(self, order: Order) -> None: ...


    # ---- STEP 3: Legacy implementation wraps the existing code ---------------

    class LegacyNotificationService(NotificationService):
        """Implements the interface using the monolith's in-process senders."""

        def __init__(self, email_sender: Any, sms_sender: Any) -> None:
            self._email = email_sender
            self._sms = sms_sender

        async def send_order_confirmation(self, order: Order) -> None:
            await self._email.send(order.user_id, "Order Confirmation", self._format_order(order))
            if order.phone:
                await self._sms.send(order.phone, f"Order {order.id} confirmed!")

        async def send_shipping_update(self, order: Order, tracking: TrackingInfo) -> None:
            body = f"Tracking {tracking.number} via {tracking.carrier}"
            await self._email.send(order.user_id, "Your order has shipped", body)

        async def send_delivery_notification(self, order: Order) -> None:
            await self._email.send(order.user_id, "Delivered", f"Order {order.id} delivered.")

        def _format_order(self, order: Order) -> str:
            lines = [f"{item.quantity}x {item.sku} @ {item.unit_price}" for item in order.items]
            return "\n".join(lines) + f"\nTotal: {order.total}"


    # ---- STEP 4: New implementation calls the microservice -------------------

    class MicroserviceNotificationService(NotificationService):
        """Implements the same interface by calling the new notification microservice."""

        def __init__(self, base_url: str, client: httpx.AsyncClient | None = None) -> None:
            self._base_url = base_url.rstrip("/")
            self._client = client or httpx.AsyncClient(timeout=5.0)

        async def send_order_confirmation(self, order: Order) -> None:
            await self._post("/notifications/order-confirmation", {
                "user_id": order.user_id,
                "order_id": order.id,
                "items": [item.model_dump() for item in order.items],
                "total": order.total,
            })

        async def send_shipping_update(self, order: Order, tracking: TrackingInfo) -> None:
            await self._post("/notifications/shipping", {
                "user_id": order.user_id,
                "order_id": order.id,
                "tracking_number": tracking.number,
                "carrier": tracking.carrier,
            })

        async def send_delivery_notification(self, order: Order) -> None:
            await self._post("/notifications/delivery", {
                "user_id": order.user_id,
                "order_id": order.id,
            })

        async def _post(self, path: str, body: dict[str, Any]) -> None:
            response = await self._client.post(f"{self._base_url}{path}", json=body)
            response.raise_for_status()


    # ---- STEP 5: Factory driven by a feature flag (env var) ------------------

    def notification_service_factory() -> NotificationService:
        """Return the active notification implementation based on env feature flag.

        Setting USE_NOTIFICATION_MICROSERVICE=true flips the whole system over.
        Unset/false keeps the legacy path, which is the safe default during rollout.
        """
        use_new = os.getenv("USE_NOTIFICATION_MICROSERVICE", "false").lower() == "true"
        if use_new:
            url = os.environ["NOTIFICATION_SERVICE_URL"]
            return MicroserviceNotificationService(base_url=url)

        # Import deferred so the new path does not pay import cost for legacy code.
        from legacy import email_sender, sms_sender  # type: ignore[import-not-found]
        return LegacyNotificationService(email_sender=email_sender, sms_sender=sms_sender)


    # ---- STEP 6: Consumer is blissfully unaware of which impl is active ------

    class OrderService:
        """The order service talks only to the abstraction.

        Whether notifications go through an in-process sender or over HTTP to a
        microservice is an operational decision -- not a code change here.
        """

        def __init__(self, notifications: NotificationService | None = None) -> None:
            self._notifications = notifications or notification_service_factory()

        async def place_order(self, order: Order) -> None:
            # ... order persistence/business logic ...
            await self._notifications.send_order_confirmation(order)


    # ---- Shadow-compare implementation: the safest rollout step --------------

    class ComparingNotificationService(NotificationService):
        """Run both impls, return the legacy result, log any divergence.

        Use this for at least a week in production before trusting the new impl.
        The legacy result remains the source of truth -- if the new impl fails,
        the customer never notices.
        """

        def __init__(self, legacy: NotificationService, modern: NotificationService) -> None:
            self._legacy = legacy
            self._modern = modern

        async def send_order_confirmation(self, order: Order) -> None:
            legacy_task = asyncio.create_task(self._legacy.send_order_confirmation(order))
            modern_task = asyncio.create_task(self._modern.send_order_confirmation(order))

            legacy_result, modern_result = await asyncio.gather(
                legacy_task, modern_task, return_exceptions=True,
            )

            if isinstance(modern_result, BaseException):
                # New path failed but legacy succeeded -- record for analysis
                logger.error("Modern notification failed: %s", modern_result)

            if isinstance(legacy_result, BaseException):
                # Legacy is source of truth; propagate its failure
                raise legacy_result

        async def send_shipping_update(self, order: Order, tracking: TrackingInfo) -> None:
            await asyncio.gather(
                self._legacy.send_shipping_update(order, tracking),
                self._modern.send_shipping_update(order, tracking),
                return_exceptions=True,
            )

        async def send_delivery_notification(self, order: Order) -> None:
            await asyncio.gather(
                self._legacy.send_delivery_notification(order),
                self._modern.send_delivery_notification(order),
                return_exceptions=True,
            )
    ```
  </Tab>
</Tabs>

***

## Database Migration Strategies

Data is where migrations go to die. The code-level patterns above (Strangler, Branch by Abstraction) are solved problems -- they are well-understood, low-risk, and reversible. Database migrations are the opposite: they are high-risk, often irreversible, and every team discovers new failure modes the hard way. The four patterns below form a ladder from "quick but coupled" to "fully decoupled but expensive." You will climb the ladder over the course of the migration, not jump straight to the top. **Key decision points on the ladder:** move up a rung when the current pattern's coupling is blocking a release or degrading reliability; do not move up a rung just because the target pattern is more "architecturally pure." A team stuck on Pattern 1 (shared DB) for nine months is usually fine; a team that jumped to Pattern 4 (DB per service) in month one is often drowning in eventual-consistency bugs. **Do not migrate data ownership before code ownership** -- that is the distributed monolith trap in its purest form: two services writing to one database, each assuming ownership, corrupting each other's invariants.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DATABASE MIGRATION PATTERNS                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PATTERN 1: SHARED DATABASE (TEMPORARY)                                     │
│  ──────────────────────────────────────                                     │
│                                                                              │
│    ┌───────────┐      ┌───────────┐                                        │
│    │ Monolith  │      │ New Svc   │                                        │
│    └─────┬─────┘      └─────┬─────┘                                        │
│          │                  │                                               │
│          └────────┬─────────┘                                               │
│                   ▼                                                         │
│          ┌───────────────┐                                                 │
│          │ Shared DB     │  ← Both read/write                              │
│          └───────────────┘                                                 │
│                                                                              │
│    Quick start, but creates coupling. Use as stepping stone only.          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 2: DATABASE VIEW (READ-ONLY)                                       │
│  ────────────────────────────────────                                       │
│                                                                              │
│    ┌───────────┐      ┌───────────┐                                        │
│    │ Monolith  │      │ New Svc   │                                        │
│    └─────┬─────┘      └─────┬─────┘                                        │
│          │ R/W              │ Read-only                                     │
│          ▼                  ▼                                               │
│    ┌─────────────┐   ┌─────────────┐                                       │
│    │ Main DB     │──▶│    View     │                                       │
│    └─────────────┘   └─────────────┘                                       │
│                                                                              │
│    New service reads from view, writes go to monolith via API.             │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 3: DATA SYNCHRONIZATION                                            │
│  ─────────────────────────────────                                          │
│                                                                              │
│    ┌───────────┐  events   ┌───────────┐                                   │
│    │ Monolith  │──────────▶│ New Svc   │                                   │
│    └─────┬─────┘           └─────┬─────┘                                   │
│          │                       │                                          │
│          ▼                       ▼                                          │
│    ┌─────────────┐         ┌─────────────┐                                 │
│    │ Main DB     │  sync   │ Service DB  │                                 │
│    └─────────────┘ ─────▶  └─────────────┘                                 │
│                                                                              │
│    Change Data Capture (CDC) keeps databases in sync.                      │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 4: DATABASE PER SERVICE (TARGET STATE)                             │
│  ───────────────────────────────────────────────                            │
│                                                                              │
│    ┌───────────┐           ┌───────────┐                                   │
│    │ Service A │  events   │ Service B │                                   │
│    └─────┬─────┘◀─────────▶└─────┬─────┘                                   │
│          │                       │                                          │
│          ▼                       ▼                                          │
│    ┌─────────────┐         ┌─────────────┐                                 │
│    │ Database A  │         │ Database B  │                                 │
│    └─────────────┘         └─────────────┘                                 │
│                                                                              │
│    Complete data ownership. Communication via APIs/events.                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

<Warning>
  **Caveats & Common Pitfalls: Data Migration Order.**

  * **Migrating data before code ownership creates a distributed monolith.** Two services writing to one database with unclear ownership silently corrupt each other's invariants. The bugs do not appear in tests; they appear in production as data drift.
  * **Dropping shared DB makes rollback impossible.** Once the monolith's tables are dropped, you cannot revert to "just use the monolith." Every subsequent incident forces you forward, not back. Do this only after months of new-service stability.
  * **Foreign key dependencies violated during extraction.** If Orders has a foreign key to Customers, and you extract Customers first, Orders breaks. If you extract Orders first, Customers breaks. Plan the extraction order with the actual FK graph.
  * **CDC lag masquerading as consistency.** The new service's database is 2 seconds behind the source. A read immediately after a write returns stale data. Customers see "your order was not created" even though it was, they click again, and now you have duplicate orders.
</Warning>

<Tip>
  **Solutions & Patterns: Safe Data Migration.**

  * **Migrate code ownership first; data follows.** The service owns the writes before the data moves. Both services write to the same DB during transition, but only one owns schema changes.
  * **Extract leaves of the FK graph first.** Tables with no incoming FKs (reporting logs, audit tables, product images) are safe to extract early. Core transactional tables (customers, orders) come last.
  * **Keep the source DB intact for months after cutover.** Even if the new service is primary, retaining the old DB for emergency rollback during the first 90 days is cheap insurance.
  * **Measure CDC lag in user-visible terms.** "P99 CDC lag under 5 seconds" is an SLO. Track it, alert on it, and build read-your-writes semantics (read from source DB for post-write reads) until CDC lag is provably under user tolerance.
</Tip>

### Change Data Capture Implementation

CDC is how mature migrations keep two databases in sync without asking application code to participate. The insight: databases already have a transaction log (WAL in PostgreSQL, binlog in MySQL) that records every change durably before it is visible to queries. CDC tools (Debezium being the industry standard) tail that log, translate each change into an event, and publish it to a message broker. Downstream consumers -- including your new microservice's database -- subscribe to the relevant streams and apply changes. **Why this is safe:** the source database's transactions remain the single point of truth; CDC is a side-effect-free reader. If the new service falls behind, the monolith keeps working. If the new service's database gets corrupted, you replay from the WAL. **Why this can still go wrong:** schema changes in the source break the CDC pipeline if your consumers are not forward-compatible; out-of-order events during rebalancing cause ghost rows; and the "last-write-wins" semantics can silently overwrite newer data when clocks drift. **Key decision point to proceed to the next phase:** CDC lag stays under 5 seconds at p99 for two weeks under production load, and a reconciliation job reports zero divergence between source and replica. **Key decision to roll back:** any unexplained lag spike or any divergence in the reconciliation job -- do not ramp traffic until the root cause is understood.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // cdc-sync.js - Database synchronization using CDC

    const { Kafka } = require('kafkajs');
    const { Pool } = require('pg');

    // Debezium-style CDC connector (simplified)
    class ChangeDataCaptureSync {
      constructor(config) {
        this.sourceDb = new Pool(config.source);
        this.targetDb = new Pool(config.target);
        this.kafka = new Kafka(config.kafka);
        this.producer = this.kafka.producer();
        this.consumer = this.kafka.consumer({ groupId: 'cdc-sync' });
        this.lastPosition = null;
      }

      async startCapture() {
        await this.producer.connect();
        
        // Poll for changes (in production, use Debezium or similar)
        setInterval(async () => {
          await this.captureChanges();
        }, 1000);
      }

      async captureChanges() {
        // Using PostgreSQL logical replication slot
        const changes = await this.sourceDb.query(`
          SELECT * FROM pg_logical_slot_get_changes(
            'cdc_slot',
            NULL,
            NULL,
            'include-xids', '1',
            'include-timestamp', '1'
          )
        `);

        for (const change of changes.rows) {
          const parsed = this.parseChange(change.data);
          
          if (parsed) {
            await this.producer.send({
              topic: `cdc.${parsed.table}`,
              messages: [{
                key: parsed.key,
                value: JSON.stringify({
                  operation: parsed.operation,
                  before: parsed.before,
                  after: parsed.after,
                  timestamp: parsed.timestamp
                })
              }]
            });
          }
        }
      }

      async startSync() {
        await this.consumer.connect();
        await this.consumer.subscribe({ topic: /cdc\..*/ });

        await this.consumer.run({
          eachMessage: async ({ topic, message }) => {
            const tableName = topic.replace('cdc.', '');
            const change = JSON.parse(message.value.toString());
            
            await this.applyChange(tableName, change);
          }
        });
      }

      async applyChange(tableName, change) {
        const { operation, before, after } = change;

        try {
          switch (operation) {
            case 'INSERT':
              await this.targetDb.query(
                `INSERT INTO ${tableName} SELECT * FROM json_populate_record(NULL::${tableName}, $1)`,
                [after]
              );
              break;

            case 'UPDATE':
              const setClauses = Object.keys(after)
                .map((key, i) => `${key} = $${i + 1}`)
                .join(', ');
              const values = Object.values(after);
              
              await this.targetDb.query(
                `UPDATE ${tableName} SET ${setClauses} WHERE id = $${values.length + 1}`,
                [...values, after.id]
              );
              break;

            case 'DELETE':
              await this.targetDb.query(
                `DELETE FROM ${tableName} WHERE id = $1`,
                [before.id]
              );
              break;
          }
        } catch (error) {
          console.error(`Failed to apply change to ${tableName}:`, error);
          // Send to dead letter queue for retry
          await this.sendToDeadLetter(tableName, change, error);
        }
      }
    }

    // Data Migration with verification
    class DataMigrator {
      constructor(sourceDb, targetDb) {
        this.sourceDb = sourceDb;
        this.targetDb = targetDb;
      }

      async migrateTable(tableName, options = {}) {
        const batchSize = options.batchSize || 1000;
        let offset = 0;
        let totalMigrated = 0;

        console.log(`Starting migration of ${tableName}...`);

        while (true) {
          // Fetch batch from source
          const batch = await this.sourceDb.query(
            `SELECT * FROM ${tableName} ORDER BY id LIMIT $1 OFFSET $2`,
            [batchSize, offset]
          );

          if (batch.rows.length === 0) break;

          // Transform if needed
          const transformed = options.transform 
            ? batch.rows.map(options.transform)
            : batch.rows;

          // Insert into target
          for (const row of transformed) {
            await this.targetDb.query(
              `INSERT INTO ${tableName} (${Object.keys(row).join(', ')}) 
               VALUES (${Object.keys(row).map((_, i) => `$${i + 1}`).join(', ')})
               ON CONFLICT (id) DO UPDATE SET 
               ${Object.keys(row).map((k, i) => `${k} = $${i + 1}`).join(', ')}`,
              Object.values(row)
            );
          }

          totalMigrated += batch.rows.length;
          offset += batchSize;
          
          console.log(`Migrated ${totalMigrated} rows from ${tableName}`);
        }

        // Verify migration
        await this.verify(tableName);
        
        return totalMigrated;
      }

      async verify(tableName) {
        const sourceCount = await this.sourceDb.query(
          `SELECT COUNT(*) FROM ${tableName}`
        );
        const targetCount = await this.targetDb.query(
          `SELECT COUNT(*) FROM ${tableName}`
        );

        const sourceTotal = parseInt(sourceCount.rows[0].count);
        const targetTotal = parseInt(targetCount.rows[0].count);

        if (sourceTotal !== targetTotal) {
          throw new Error(
            `Verification failed for ${tableName}: ` +
            `source=${sourceTotal}, target=${targetTotal}`
          );
        }

        console.log(`✓ Verified ${tableName}: ${targetTotal} rows`);
        
        // Sample verification
        const sampleRows = await this.sourceDb.query(
          `SELECT * FROM ${tableName} ORDER BY RANDOM() LIMIT 10`
        );

        for (const sourceRow of sampleRows.rows) {
          const targetRow = await this.targetDb.query(
            `SELECT * FROM ${tableName} WHERE id = $1`,
            [sourceRow.id]
          );

          if (targetRow.rows.length === 0) {
            throw new Error(`Missing row ${sourceRow.id} in target ${tableName}`);
          }

          // Deep comparison
          const diff = this.compareRows(sourceRow, targetRow.rows[0]);
          if (diff.length > 0) {
            console.warn(`Row ${sourceRow.id} differs:`, diff);
          }
        }

        console.log(`✓ Sample verification passed for ${tableName}`);
      }

      compareRows(source, target) {
        const diffs = [];
        for (const key of Object.keys(source)) {
          if (JSON.stringify(source[key]) !== JSON.stringify(target[key])) {
            diffs.push({ field: key, source: source[key], target: target[key] });
          }
        }
        return diffs;
      }
    }

    module.exports = { ChangeDataCaptureSync, DataMigrator };
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # cdc_sync.py - CDC pipeline using SQLAlchemy + aiokafka

    from __future__ import annotations

    import asyncio
    import json
    import logging
    from dataclasses import dataclass
    from typing import Any, Callable

    from aiokafka import AIOKafkaConsumer, AIOKafkaProducer
    from pydantic import BaseModel
    from sqlalchemy import MetaData, Table, text
    from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

    logger = logging.getLogger(__name__)


    # ---- Change event model (pydantic v2) ------------------------------------

    class ChangeEvent(BaseModel):
        """One row-level change captured from the source WAL."""

        operation: str  # INSERT | UPDATE | DELETE
        table: str
        before: dict[str, Any] | None = None
        after: dict[str, Any] | None = None
        timestamp: str


    @dataclass
    class CDCConfig:
        source_dsn: str
        target_dsn: str
        kafka_brokers: str
        replication_slot: str = "cdc_slot"
        poll_interval_seconds: float = 1.0


    # ---- Capture side: reads WAL, publishes to Kafka -------------------------

    class ChangeDataCaptureSync:
        """Polls PostgreSQL logical replication slot and publishes to Kafka.

        In production you would use Debezium rather than polling yourself, but the
        shape of the pipeline is the same and shown here for clarity.
        """

        def __init__(self, config: CDCConfig) -> None:
            self._config = config
            self._source: AsyncEngine = create_async_engine(config.source_dsn)
            self._target: AsyncEngine = create_async_engine(config.target_dsn)
            self._producer: AIOKafkaProducer | None = None
            self._running = False

        async def start_capture(self) -> None:
            self._producer = AIOKafkaProducer(bootstrap_servers=self._config.kafka_brokers)
            await self._producer.start()
            self._running = True
            try:
                while self._running:
                    await self._capture_changes()
                    await asyncio.sleep(self._config.poll_interval_seconds)
            finally:
                await self._producer.stop()

        async def stop(self) -> None:
            self._running = False

        async def _capture_changes(self) -> None:
            assert self._producer is not None
            async with self._source.connect() as conn:
                result = await conn.execute(
                    text(
                        """
                        SELECT data
                        FROM pg_logical_slot_get_changes(
                            :slot, NULL, NULL,
                            'include-xids', '1',
                            'include-timestamp', '1'
                        )
                        """
                    ),
                    {"slot": self._config.replication_slot},
                )
                rows = result.fetchall()

            for row in rows:
                event = self._parse_change(row.data)
                if event is None:
                    continue
                payload = event.model_dump_json().encode()
                # Keying by aggregate id preserves per-row order across partitions
                key = self._event_key(event).encode()
                await self._producer.send_and_wait(
                    topic=f"cdc.{event.table}",
                    key=key,
                    value=payload,
                )

        def _parse_change(self, raw: str) -> ChangeEvent | None:
            """Parse one WAL entry. Returns None for control records we skip."""
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                return None
            # Simplified: a real implementation would decode pgoutput / wal2json
            return ChangeEvent(**parsed)

        def _event_key(self, event: ChangeEvent) -> str:
            # Prefer "after" for inserts/updates, "before" for deletes
            record = event.after or event.before or {}
            return str(record.get("id", ""))


    # ---- Apply side: consumes Kafka, applies to target DB --------------------

    class CDCApplier:
        """Consumes CDC events and applies them to the target database.

        Idempotency is non-negotiable: replays must produce the same final state.
        Kafka may deliver a message twice if a consumer crashes mid-commit.
        """

        def __init__(self, config: CDCConfig) -> None:
            self._config = config
            self._target: AsyncEngine = create_async_engine(config.target_dsn)
            self._consumer: AIOKafkaConsumer | None = None
            self._metadata = MetaData()
            self._table_cache: dict[str, Table] = {}

        async def start_sync(self) -> None:
            self._consumer = AIOKafkaConsumer(
                bootstrap_servers=self._config.kafka_brokers,
                group_id="cdc-sync",
                enable_auto_commit=False,  # We commit only after apply succeeds
                auto_offset_reset="earliest",
            )
            await self._consumer.start()
            try:
                self._consumer.subscribe(pattern="cdc\\..*")
                async for message in self._consumer:
                    table_name = message.topic.removeprefix("cdc.")
                    event = ChangeEvent.model_validate_json(message.value)
                    try:
                        await self._apply_change(table_name, event)
                        await self._consumer.commit()
                    except Exception as exc:
                        logger.exception("Failed to apply change to %s: %s", table_name, exc)
                        await self._send_to_dlq(table_name, event, exc)
            finally:
                await self._consumer.stop()

        async def _apply_change(self, table_name: str, event: ChangeEvent) -> None:
            async with self._target.begin() as conn:
                if event.operation == "INSERT" and event.after:
                    await self._upsert(conn, table_name, event.after)
                elif event.operation == "UPDATE" and event.after:
                    await self._upsert(conn, table_name, event.after)
                elif event.operation == "DELETE" and event.before:
                    await conn.execute(
                        text(f"DELETE FROM {table_name} WHERE id = :id"),
                        {"id": event.before["id"]},
                    )

        async def _upsert(self, conn: Any, table_name: str, row: dict[str, Any]) -> None:
            """Idempotent INSERT ... ON CONFLICT UPDATE."""
            columns = ", ".join(row.keys())
            placeholders = ", ".join(f":{k}" for k in row.keys())
            updates = ", ".join(f"{k} = EXCLUDED.{k}" for k in row.keys() if k != "id")
            sql = (
                f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders}) "
                f"ON CONFLICT (id) DO UPDATE SET {updates}"
            )
            await conn.execute(text(sql), row)

        async def _send_to_dlq(self, table_name: str, event: ChangeEvent, exc: Exception) -> None:
            # Stub: in production you would write to a dead-letter topic/table for retry.
            logger.error("DLQ %s: %s -> %s", table_name, event, exc)


    # ---- Bulk backfill with verification -------------------------------------

    RowTransform = Callable[[dict[str, Any]], dict[str, Any]]


    class DataMigrator:
        """Copies existing rows from source to target in batches, then verifies.

        Use this once, before starting CDC, to seed the target database with the
        historical rows. CDC then keeps it up to date.
        """

        def __init__(self, source: AsyncEngine, target: AsyncEngine) -> None:
            self._source = source
            self._target = target

        async def migrate_table(
            self,
            table_name: str,
            batch_size: int = 1000,
            transform: RowTransform | None = None,
        ) -> int:
            offset = 0
            total_migrated = 0
            logger.info("Starting migration of %s", table_name)

            while True:
                async with self._source.connect() as src:
                    result = await src.execute(
                        text(f"SELECT * FROM {table_name} ORDER BY id LIMIT :lim OFFSET :off"),
                        {"lim": batch_size, "off": offset},
                    )
                    batch = [dict(row._mapping) for row in result]

                if not batch:
                    break

                rows = [transform(r) for r in batch] if transform else batch

                async with self._target.begin() as dst:
                    for row in rows:
                        columns = ", ".join(row.keys())
                        placeholders = ", ".join(f":{k}" for k in row.keys())
                        updates = ", ".join(f"{k} = EXCLUDED.{k}" for k in row.keys() if k != "id")
                        sql = (
                            f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders}) "
                            f"ON CONFLICT (id) DO UPDATE SET {updates}"
                        )
                        await dst.execute(text(sql), row)

                total_migrated += len(batch)
                offset += batch_size
                logger.info("Migrated %d rows from %s", total_migrated, table_name)

            await self.verify(table_name)
            return total_migrated

        async def verify(self, table_name: str) -> None:
            async with self._source.connect() as src, self._target.connect() as dst:
                src_count = (await src.execute(text(f"SELECT COUNT(*) FROM {table_name}"))).scalar_one()
                dst_count = (await dst.execute(text(f"SELECT COUNT(*) FROM {table_name}"))).scalar_one()

            if src_count != dst_count:
                raise RuntimeError(
                    f"Verification failed for {table_name}: source={src_count}, target={dst_count}"
                )
            logger.info("Verified %s: %d rows", table_name, dst_count)

            # Sample verification: pick 10 random rows and compare field-by-field
            async with self._source.connect() as src:
                sample = (
                    await src.execute(
                        text(f"SELECT * FROM {table_name} ORDER BY RANDOM() LIMIT 10")
                    )
                ).fetchall()

            for src_row in sample:
                src_dict = dict(src_row._mapping)
                async with self._target.connect() as dst:
                    dst_row = (
                        await dst.execute(
                            text(f"SELECT * FROM {table_name} WHERE id = :id"),
                            {"id": src_dict["id"]},
                        )
                    ).fetchone()
                if dst_row is None:
                    raise RuntimeError(f"Missing row {src_dict['id']} in target {table_name}")
                diffs = self._compare_rows(src_dict, dict(dst_row._mapping))
                if diffs:
                    logger.warning("Row %s differs: %s", src_dict["id"], diffs)

            logger.info("Sample verification passed for %s", table_name)

        def _compare_rows(self, source: dict[str, Any], target: dict[str, Any]) -> list[dict[str, Any]]:
            diffs: list[dict[str, Any]] = []
            for key, src_val in source.items():
                if json.dumps(src_val, default=str) != json.dumps(target.get(key), default=str):
                    diffs.append({"field": key, "source": src_val, "target": target.get(key)})
            return diffs
    ```
  </Tab>
</Tabs>

***

## Dual-Write Patterns

Dual-writes are where well-intentioned migrations silently corrupt data. The naive version -- "write to old DB, then write to new DB" -- has no atomicity: the process can crash between the two writes, the second DB can be temporarily unreachable, or a retry can double-apply the second write. The result is drift that nobody notices for weeks because both databases independently look healthy. **The safe alternative is to write to exactly one store transactionally, and let something else propagate the change.** The Transactional Outbox pattern does this by writing the row and an event record in the same DB transaction, then relying on a separate process to read the outbox and publish events. CDC does it by letting the database's own WAL be the source of events. Both avoid the split-brain problem because there is only ever one authoritative write. **Key decision point:** if your data is non-financial and tolerates minutes of inconsistency, Outbox is usually enough and is simpler to operate. If you need sub-second sync, or cannot modify application code to write to the outbox table, use CDC. **Never use naive dual-writes in production**, even "just for now" -- the inconsistencies compound faster than you can detect them, and the debugging burden falls on the on-call engineer at 3 AM.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DUAL-WRITE DURING MIGRATION                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ NAIVE DUAL-WRITE (DON'T DO THIS)                                       │
│  ──────────────────────────────────                                         │
│                                                                              │
│    ┌──────────────────────────────────────────────────────────────────┐    │
│    │  async function saveOrder(order) {                                │    │
│    │    await monolithDb.save(order);  // What if this succeeds...    │    │
│    │    await newServiceDb.save(order); // ...but this fails?         │    │
│    │  }                                                                │    │
│    └──────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│    Problem: No atomicity = data inconsistency                              │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ✅ TRANSACTIONAL OUTBOX PATTERN                                           │
│  ──────────────────────────────                                             │
│                                                                              │
│    ┌──────────────────────────────────────────────────────────────────┐    │
│    │  BEGIN TRANSACTION                                                │    │
│    │    INSERT INTO orders (...);                                      │    │
│    │    INSERT INTO outbox (event_type, payload);  ← Same transaction │    │
│    │  COMMIT                                                           │    │
│    └──────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│    Background job reads outbox → publishes events → marks processed        │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ✅ CHANGE DATA CAPTURE (CDC)                                               │
│  ─────────────────────────────                                              │
│                                                                              │
│    ┌───────────────────────────────────────────────────────────────────┐   │
│    │  Monolith DB ──▶ CDC (Debezium) ──▶ Kafka ──▶ New Service DB     │   │
│    └───────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│    Changes captured from DB transaction log. Guaranteed delivery.          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Safe Dual-Write Implementation

The Outbox implementation below illustrates why shadow mode is the safest starting place: you write to the legacy system, then also try writing to the new one, and compare results asynchronously. The customer's request succeeds if the legacy write succeeds -- the new service's failure is invisible to them. This lets you accumulate a week of real production data showing the new implementation matches the legacy's behavior before you trust it with real traffic. **Phases of a dual-write migration:** (1) Shadow mode -- legacy is primary, new is silent; (2) Primary-legacy dual-write -- both receive real writes, legacy is source of truth; (3) Primary-new dual-write -- both receive writes, new is source of truth, legacy is backup; (4) New-only -- legacy is decommissioned. **Key decision to advance phases:** zero divergence in reconciliation for 7+ days under production load. **Key decision to roll back a phase:** any divergence you cannot explain within 24 hours, any customer-visible inconsistency, or loss of on-call confidence during business hours. The comparison logic must be reviewed -- many teams declare victory because their "diff" function did not flag anything, when actually it was silently ignoring the one field that mattered.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // safe-dual-write.js - Using Outbox Pattern

    const { Pool } = require('pg');

    class OutboxDualWrite {
      constructor(db) {
        this.db = db;
      }

      // Safe dual-write using outbox
      async saveOrder(order) {
        const client = await this.db.connect();
        
        try {
          await client.query('BEGIN');
          
          // 1. Save to main table
          const result = await client.query(
            `INSERT INTO orders (customer_id, items, total, status)
             VALUES ($1, $2, $3, $4)
             RETURNING *`,
            [order.customerId, JSON.stringify(order.items), order.total, 'pending']
          );
          
          const savedOrder = result.rows[0];
          
          // 2. Write to outbox (same transaction!)
          await client.query(
            `INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload, created_at)
             VALUES ($1, $2, $3, $4, NOW())`,
            [
              'Order',
              savedOrder.id,
              'OrderCreated',
              JSON.stringify({
                orderId: savedOrder.id,
                customerId: savedOrder.customer_id,
                items: order.items,
                total: order.total
              })
            ]
          );
          
          await client.query('COMMIT');
          
          return savedOrder;
        } catch (error) {
          await client.query('ROLLBACK');
          throw error;
        } finally {
          client.release();
        }
      }
    }

    // Outbox processor - publishes events to message broker
    class OutboxProcessor {
      constructor(db, eventPublisher) {
        this.db = db;
        this.eventPublisher = eventPublisher;
        this.running = false;
      }

      async start() {
        this.running = true;
        
        while (this.running) {
          await this.processOutbox();
          await this.sleep(100); // Poll interval
        }
      }

      async processOutbox() {
        const client = await this.db.connect();
        
        try {
          // Lock and fetch unprocessed events
          await client.query('BEGIN');
          
          const events = await client.query(
            `SELECT * FROM outbox 
             WHERE processed_at IS NULL 
             ORDER BY created_at 
             LIMIT 100
             FOR UPDATE SKIP LOCKED`
          );

          for (const event of events.rows) {
            try {
              // Publish to message broker
              await this.eventPublisher.publish({
                topic: `${event.aggregate_type.toLowerCase()}.events`,
                key: event.aggregate_id.toString(),
                value: {
                  eventId: event.id,
                  eventType: event.event_type,
                  aggregateType: event.aggregate_type,
                  aggregateId: event.aggregate_id,
                  payload: event.payload,
                  timestamp: event.created_at
                }
              });

              // Mark as processed
              await client.query(
                `UPDATE outbox SET processed_at = NOW() WHERE id = $1`,
                [event.id]
              );
            } catch (error) {
              console.error(`Failed to process outbox event ${event.id}:`, error);
              // Will be retried on next poll
            }
          }
          
          await client.query('COMMIT');
        } catch (error) {
          await client.query('ROLLBACK');
          console.error('Outbox processing error:', error);
        } finally {
          client.release();
        }
      }

      stop() {
        this.running = false;
      }

      sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
      }
    }

    // Migration-specific dual-write with comparison
    class MigrationDualWrite {
      constructor(legacyService, newService) {
        this.legacy = legacyService;
        this.new = newService;
        this.mode = 'shadow'; // shadow | primary-legacy | primary-new | new-only
      }

      async write(data) {
        switch (this.mode) {
          case 'shadow':
            // Write to legacy (primary), shadow write to new, compare
            return this.shadowWrite(data);
            
          case 'primary-legacy':
            // Write to both, legacy is source of truth
            return this.dualWriteLegacyPrimary(data);
            
          case 'primary-new':
            // Write to both, new is source of truth
            return this.dualWriteNewPrimary(data);
            
          case 'new-only':
            // Only write to new service
            return this.new.write(data);
        }
      }

      async shadowWrite(data) {
        // Primary write
        const legacyResult = await this.legacy.write(data);
        
        // Shadow write (fire and forget, with logging)
        this.new.write(data)
          .then(newResult => {
            // Compare results
            const diff = this.compare(legacyResult, newResult);
            if (diff.length > 0) {
              console.warn('Shadow write difference detected:', diff);
              this.metrics.recordDiff('write', diff);
            }
          })
          .catch(error => {
            console.error('Shadow write failed:', error);
            this.metrics.recordError('shadow-write', error);
          });
        
        return legacyResult;
      }

      async read(id) {
        switch (this.mode) {
          case 'shadow':
          case 'primary-legacy':
            // Read from legacy, shadow read from new for comparison
            const legacyData = await this.legacy.read(id);
            
            this.new.read(id).then(newData => {
              const diff = this.compare(legacyData, newData);
              if (diff.length > 0) {
                console.warn(`Read difference for ${id}:`, diff);
              }
            }).catch(() => {});
            
            return legacyData;
            
          case 'primary-new':
          case 'new-only':
            return this.new.read(id);
        }
      }

      compare(legacy, current) {
        const diffs = [];
        const keys = new Set([...Object.keys(legacy), ...Object.keys(current)]);
        
        for (const key of keys) {
          if (JSON.stringify(legacy[key]) !== JSON.stringify(current[key])) {
            diffs.push({
              field: key,
              legacy: legacy[key],
              new: current[key]
            });
          }
        }
        
        return diffs;
      }
    }

    module.exports = { OutboxDualWrite, OutboxProcessor, MigrationDualWrite };
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # safe_dual_write.py - Transactional Outbox + shadow-compare dual write

    from __future__ import annotations

    import asyncio
    import json
    import logging
    from enum import Enum
    from typing import Any, Protocol

    from pydantic import BaseModel
    from sqlalchemy import text
    from sqlalchemy.ext.asyncio import AsyncEngine

    logger = logging.getLogger(__name__)


    # ---- Domain types --------------------------------------------------------

    class OrderInput(BaseModel):
        customer_id: str
        items: list[dict[str, Any]]
        total: float


    class SavedOrder(BaseModel):
        id: int
        customer_id: str
        items: list[dict[str, Any]]
        total: float
        status: str


    # ---- Transactional Outbox ------------------------------------------------

    class OutboxDualWrite:
        """Writes the business row and the outbox event in a single DB transaction.

        Because both inserts share one transaction, either both land or neither
        does. That single commit is the only fact the caller must trust. A
        separate worker is responsible for getting the event out to the broker.
        """

        def __init__(self, engine: AsyncEngine) -> None:
            self._engine = engine

        async def save_order(self, order: OrderInput) -> SavedOrder:
            async with self._engine.begin() as conn:
                result = await conn.execute(
                    text(
                        """
                        INSERT INTO orders (customer_id, items, total, status)
                        VALUES (:customer_id, :items, :total, 'pending')
                        RETURNING id, customer_id, items, total, status
                        """
                    ),
                    {
                        "customer_id": order.customer_id,
                        "items": json.dumps(order.items),
                        "total": order.total,
                    },
                )
                row = result.one()
                saved = SavedOrder(
                    id=row.id,
                    customer_id=row.customer_id,
                    items=json.loads(row.items) if isinstance(row.items, str) else row.items,
                    total=row.total,
                    status=row.status,
                )

                # Same transaction -> atomic with the business write
                await conn.execute(
                    text(
                        """
                        INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload, created_at)
                        VALUES (:agg_type, :agg_id, :event_type, :payload, NOW())
                        """
                    ),
                    {
                        "agg_type": "Order",
                        "agg_id": saved.id,
                        "event_type": "OrderCreated",
                        "payload": json.dumps({
                            "order_id": saved.id,
                            "customer_id": saved.customer_id,
                            "items": saved.items,
                            "total": saved.total,
                        }),
                    },
                )

            return saved


    # ---- Outbox relay: reads outbox, publishes, marks processed --------------

    class EventPublisher(Protocol):
        async def publish(self, topic: str, key: str, value: dict[str, Any]) -> None: ...


    class OutboxProcessor:
        """Polls the outbox table and publishes pending events to the broker.

        FOR UPDATE SKIP LOCKED lets multiple relay workers run safely in parallel
        without fighting for the same rows.
        """

        def __init__(self, engine: AsyncEngine, publisher: EventPublisher, poll_ms: int = 100) -> None:
            self._engine = engine
            self._publisher = publisher
            self._poll_seconds = poll_ms / 1000
            self._running = False

        async def start(self) -> None:
            self._running = True
            while self._running:
                try:
                    await self._process_once()
                except Exception as exc:  # pragma: no cover - operational logging
                    logger.exception("Outbox processing error: %s", exc)
                await asyncio.sleep(self._poll_seconds)

        def stop(self) -> None:
            self._running = False

        async def _process_once(self) -> None:
            async with self._engine.begin() as conn:
                result = await conn.execute(
                    text(
                        """
                        SELECT id, aggregate_type, aggregate_id, event_type, payload, created_at
                        FROM outbox
                        WHERE processed_at IS NULL
                        ORDER BY created_at
                        LIMIT 100
                        FOR UPDATE SKIP LOCKED
                        """
                    )
                )
                events = result.fetchall()

                for event in events:
                    try:
                        await self._publisher.publish(
                            topic=f"{event.aggregate_type.lower()}.events",
                            key=str(event.aggregate_id),
                            value={
                                "event_id": event.id,
                                "event_type": event.event_type,
                                "aggregate_type": event.aggregate_type,
                                "aggregate_id": event.aggregate_id,
                                "payload": event.payload,
                                "timestamp": event.created_at.isoformat(),
                            },
                        )
                        await conn.execute(
                            text("UPDATE outbox SET processed_at = NOW() WHERE id = :id"),
                            {"id": event.id},
                        )
                    except Exception as exc:
                        # Do not mark processed; the next poll will retry.
                        logger.error("Failed to publish outbox event %s: %s", event.id, exc)


    # ---- Migration-specific shadow / dual-write orchestrator -----------------

    class DualWriteMode(str, Enum):
        SHADOW = "shadow"
        PRIMARY_LEGACY = "primary-legacy"
        PRIMARY_NEW = "primary-new"
        NEW_ONLY = "new-only"


    class WriteBackend(Protocol):
        async def write(self, data: dict[str, Any]) -> dict[str, Any]: ...
        async def read(self, entity_id: str) -> dict[str, Any]: ...


    class MigrationDualWrite:
        """Routes writes/reads across legacy and new backends according to phase.

        SHADOW:          legacy serves the request, new runs silently for compare
        PRIMARY_LEGACY:  write to both; legacy is source of truth
        PRIMARY_NEW:     write to both; new is source of truth
        NEW_ONLY:        legacy retired; new serves everything
        """

        def __init__(
            self,
            legacy: WriteBackend,
            new: WriteBackend,
            mode: DualWriteMode = DualWriteMode.SHADOW,
        ) -> None:
            self._legacy = legacy
            self._new = new
            self._mode = mode

        async def write(self, data: dict[str, Any]) -> dict[str, Any]:
            if self._mode == DualWriteMode.SHADOW:
                return await self._shadow_write(data)
            if self._mode == DualWriteMode.PRIMARY_LEGACY:
                return await self._dual_write_legacy_primary(data)
            if self._mode == DualWriteMode.PRIMARY_NEW:
                return await self._dual_write_new_primary(data)
            # NEW_ONLY
            return await self._new.write(data)

        async def read(self, entity_id: str) -> dict[str, Any]:
            if self._mode in (DualWriteMode.SHADOW, DualWriteMode.PRIMARY_LEGACY):
                legacy_data = await self._legacy.read(entity_id)
                asyncio.create_task(self._compare_read(entity_id, legacy_data))
                return legacy_data
            return await self._new.read(entity_id)

        async def _shadow_write(self, data: dict[str, Any]) -> dict[str, Any]:
            """Primary write to legacy. Fire-and-forget to new. Compare later.

            The customer only waits for the legacy write. The shadow write's
            latency and failure mode do not affect the user experience, which
            is exactly the property we want during early ramp-up.
            """
            legacy_result = await self._legacy.write(data)
            asyncio.create_task(self._shadow_compare_write(data, legacy_result))
            return legacy_result

        async def _shadow_compare_write(
            self, data: dict[str, Any], legacy_result: dict[str, Any]
        ) -> None:
            try:
                new_result = await self._new.write(data)
            except Exception as exc:
                logger.error("Shadow write failed: %s", exc)
                return
            diffs = self._compare(legacy_result, new_result)
            if diffs:
                logger.warning("Shadow write difference detected: %s", diffs)

        async def _dual_write_legacy_primary(self, data: dict[str, Any]) -> dict[str, Any]:
            """Both writes are awaited; legacy success determines request success.

            If the new write fails, we record the divergence for a reconciliation
            job to fix up -- we do not fail the customer's request.
            """
            legacy_result = await self._legacy.write(data)
            try:
                await self._new.write(data)
            except Exception as exc:
                logger.error("New-side write failed in primary-legacy mode: %s", exc)
            return legacy_result

        async def _dual_write_new_primary(self, data: dict[str, Any]) -> dict[str, Any]:
            new_result = await self._new.write(data)
            try:
                await self._legacy.write(data)
            except Exception as exc:
                logger.error("Legacy-side write failed in primary-new mode: %s", exc)
            return new_result

        async def _compare_read(self, entity_id: str, legacy_data: dict[str, Any]) -> None:
            try:
                new_data = await self._new.read(entity_id)
            except Exception:
                return
            diffs = self._compare(legacy_data, new_data)
            if diffs:
                logger.warning("Read difference for %s: %s", entity_id, diffs)

        @staticmethod
        def _compare(legacy: dict[str, Any], current: dict[str, Any]) -> list[dict[str, Any]]:
            diffs: list[dict[str, Any]] = []
            for key in set(legacy) | set(current):
                lv = legacy.get(key)
                cv = current.get(key)
                if json.dumps(lv, sort_keys=True, default=str) != json.dumps(cv, sort_keys=True, default=str):
                    diffs.append({"field": key, "legacy": lv, "new": cv})
            return diffs
    ```
  </Tab>
</Tabs>

***

## Interview Questions

<AccordionGroup>
  <Accordion title="Q1: What is the Strangler Fig pattern?">
    **Answer:**

    A migration pattern where new functionality is built around the existing system, gradually replacing it.

    **Steps:**

    1. Add a facade/proxy in front of monolith
    2. Extract one feature to microservice
    3. Route that feature's traffic to new service
    4. Repeat until monolith is empty
    5. Remove monolith

    **Key benefits:**

    * Zero downtime migration
    * Gradual, low-risk
    * Can stop/pause anytime
    * Immediate value from extracted services

    **Named after:** Strangler fig plants that grow around host trees.
  </Accordion>

  <Accordion title="Q2: How do you handle database during migration?">
    **Answer:**

    **Progressive approach:**

    1. **Shared database (temporary)**
       * Quick start
       * Both read/write to same DB

    2. **Database view**
       * New service reads from view
       * Writes via API to monolith

    3. **CDC synchronization**
       * Capture changes from source
       * Replay to new database
       * Eventually consistent

    4. **Database per service**
       * Full data ownership
       * Communication via APIs/events

    **Key principle:** Migrate ownership, not just data.
  </Accordion>

  <Accordion title="Q3: What's wrong with dual-writes?">
    **Answer:**

    **Problem:** No atomicity between two writes.

    ```javascript theme={null}
    await db1.save(data);  // Succeeds
    await db2.save(data);  // Fails! Data inconsistent
    ```

    **Solutions:**

    1. **Outbox pattern:**
       * Single DB transaction
       * Write data + event to outbox
       * Background processor publishes events

    2. **Change Data Capture (CDC):**
       * Capture changes from DB transaction log
       * Stream to message broker
       * New service consumes and applies

    3. **Saga pattern:**
       * Compensating transactions
       * Eventually consistent
  </Accordion>

  <Accordion title="Q4: What is Branch by Abstraction?">
    **Answer:**

    A technique to replace a component inside a system safely.

    **Steps:**

    1. Create abstraction interface
    2. Implement interface with existing code
    3. Change callers to use interface
    4. Create new implementation
    5. Switch implementations (feature flag)
    6. Remove old implementation

    **Use when:**

    * Replacing internal components
    * Need to run old/new in parallel
    * Want to compare implementations

    **Difference from Strangler:**

    * Strangler: External facade, route traffic
    * Branch: Internal abstraction, swap implementations
  </Accordion>
</AccordionGroup>

***

## Chapter Summary

<Info>
  **Key Takeaways:**

  * Never do big-bang rewrites - use incremental patterns
  * Strangler Fig: Route traffic gradually to new services
  * Branch by Abstraction: Swap internal implementations safely
  * Database migration is the hardest part - plan carefully
  * Use CDC or Outbox pattern, never naive dual-writes
  * Always have rollback capability
</Info>

**Next Chapter:** Event Sourcing Deep Dive - Advanced event storage and CQRS patterns.

***

## Interview Questions with Structured Answers

<AccordionGroup>
  <Accordion title="You have a 500K-line PHP monolith. The CTO wants microservices. Design the 18-month migration plan without stopping feature work.">
    **Strong Answer Framework:**

    1. **Establish baseline metrics first.** Deploy frequency, deploy failure rate, incident frequency, mean time to recovery, engineer productivity signals (PR merge time, merge conflict rate). These become your success criteria.
    2. **Create a steering committee.** CTO, 2-3 senior engineers, product lead. Meets monthly. Has authority to halt the migration if metrics regress.
    3. **Invest months 1-3 in platform foundations.** CI/CD that supports multiple deployable artifacts, observability (tracing across services), service template, and on-call tooling. Without these, the first extraction is a disaster.
    4. **Pick the pilot service carefully.** A leaf module (notifications, image resizing, reporting) with clear boundaries, moderate business value, and a team willing to own it. Extract it using Strangler Fig with shadow mode for 2 weeks before any traffic cutover.
    5. **Extract 1-2 more services in months 6-12.** Apply lessons from the pilot. Do not accelerate; each extraction should feel easier than the last, not harder.
    6. **Feature work continues in parallel.** Feature teams continue to ship in the monolith. Only the migration team works on extractions. New features that fit the extracted domain go into the new service.
    7. **Month 12-18: decision point.** Review metrics against baseline. If deploy frequency is up, incident rate is flat or down, and team productivity is up, continue the migration. If not, either fix the root cause or halt.

    **Real-World Example:** Etsy (roughly 2016-2019). Etsy had a large PHP monolith and migrated incrementally to a services architecture over several years. They explicitly documented that feature work continued throughout, and that the migration team was a specific set of engineers rather than everyone being expected to migrate their own code. They reported deploying hundreds of times per day by the end, versus once-per-day in monolith days.

    **Senior Follow-up Questions:**

    <Note>
      **Q: "How do you handle the monolith's shared database during the 18 months?"**

      A: Four phases per extracted domain. Phase 1: the new service reads and writes to the monolith's DB via the monolith's existing models (no new DB yet). This validates the code extraction without the data risk. Phase 2: new service writes to its own DB and shadow-writes to the monolith DB, with monolith DB as source of truth. Phase 3: CDC pipeline keeps the monolith DB updated from the new service's DB for any remaining monolith readers. Phase 4 (only after 3 months of stability): drop the tables from the monolith DB. Each phase is reversible.
    </Note>

    <Note>
      **Q: "The CTO wants to see progress at 6 months. What do you show?"**

      A: Concrete metrics, not counts of services extracted. Show: (1) the pilot service is handling X% of traffic for its domain with equal or better p99 latency; (2) the monolith has a documented bounded context map that the team agrees on; (3) platform foundations (tracing, deployment pipeline, service template) are operational; (4) the pilot team reports faster feature delivery in the new service. Avoid showing "we extracted 3 services" without productivity or reliability data; extraction count without outcomes is vanity.
    </Note>

    <Note>
      **Q: "Halfway through, the migration team is burned out and services are harder to extract than the pilot suggested. What do you do?"**

      A: Halt. Hold a retrospective. Likely the remaining code is more coupled than the leaf pilot, and naive extraction attempts are failing. Options: (a) invest 1-2 quarters in improving the monolith's internal modularity (Branch by Abstraction) to make future extractions cheaper; (b) reduce the migration scope -- maybe the monolith's core 200K lines stay as a modular monolith forever, and only 5-6 services get extracted; (c) reassess whether the migration is actually solving the original pain points and consider stopping. Finishing the migration is not the goal; solving the original pain is.
    </Note>

    **Common Wrong Answers:**

    * **"Rewrite everything from scratch in Go over 18 months."** Big bang anti-pattern. Feature work stops, scope inflates, and the project fails. Interviewers asking this question are testing whether you recognize the anti-pattern.
    * **"Extract 20 services in the first 6 months."** Too aggressive. You do not yet have the platform foundations to operate 20 services. Each extraction that outpaces operational readiness creates incidents.

    **Further Reading:**

    * "Monolith to Microservices" by Sam Newman -- the definitive pattern catalog for migrations
    * "Migrating from a Monolith to Microservices at Etsy" -- multiple engineering blog posts and conference talks
    * "Working Effectively with Legacy Code" by Michael Feathers -- still the best book on safely refactoring existing systems
  </Accordion>

  <Accordion title="Your strangler fig migration has stalled at 40%. The monolith still handles the hard business logic and no one wants to extract more. What do you do?">
    **Strong Answer Framework:**

    1. **Acknowledge the economic reality.** The first 40% was the easy 40%. The remaining 60% is harder, and the team has discovered that microservices have real costs. The stall is rational.
    2. **Evaluate whether to continue at all.** Would a modular monolith with 3-5 extracted services be better than pushing for 100% extraction? Often yes.
    3. **Identify the specific blockers.** Is it technical (deep coupling), organizational (no team owns the hardest module), or political (stakeholders do not see value)?
    4. **Propose three options.** Continue (with remediation for the specific blockers), pause (declare the hybrid state the target, document it), or retreat (re-integrate some services back into the monolith if the extraction did not deliver value).
    5. **Make the decision with data.** Measure current-state productivity and reliability. If the hybrid state is working, stopping there is a valid outcome.

    **Real-World Example:** Segment (2018). After migrating to 140+ microservices, Segment famously reversed course and consolidated back to a monolith. They wrote the "Goodbye Microservices" postmortem explaining why: their team of \~10 engineers could not operate the services, each service had low utilization making costs high, and inter-service complexity exceeded the complexity of a well-structured monolith. The pivot back took effort but was correct; they shipped more after the consolidation than during the microservices era.

    **Senior Follow-up Questions:**

    <Note>
      **Q: "Is it ever OK to stop a migration at 40%?"**

      A: Yes, if the stopping point is a stable architecture, not a half-finished project. A "modular monolith plus 5 extracted services" can be a valid long-term architecture if: the extracted services are the ones that genuinely benefit from separate deployment (different scale, different team ownership), the remaining monolith is well-modularized internally, and the boundaries between them are stable. The failure mode to avoid is "we stopped but the half-state is chaos" -- that is not a stopping point, it is abandonment.
    </Note>

    <Note>
      **Q: "How do you re-integrate a microservice back into a monolith if extraction was a mistake?"**

      A: Reverse strangler fig. Step 1: the monolith gains a copy of the service's code (or equivalent logic). Step 2: feature flag routes traffic to the monolith's copy, starting at 1%, ramping to 100%. Step 3: the microservice is decommissioned. Data migration is the hard part: if the microservice has its own DB, that data must be imported back into the monolith's DB. Segment documented this process; it takes weeks to months but is straightforward if done incrementally.
    </Note>

    <Note>
      **Q: "The hardest module to extract is the one that every other module depends on. How do you handle it?"**

      A: Usually you do not extract it. A module that everything depends on (often user/auth, or some core domain concept) is either (a) something that should remain a shared library consumed by all services, (b) should be wrapped in a thin API service with minimal logic, not extracted in full. The "everything depends on it" property means extracting it creates a huge blast radius for any failure in the new service. Leave it in the monolith or turn it into a library.
    </Note>

    **Common Wrong Answers:**

    * **"Push through and finish the migration; do not lose momentum."** Sunk cost fallacy. The right answer is to evaluate current state versus end-state honestly, not to push because you already started.
    * **"Fire the team and hire consultants."** Addresses neither the technical nor organizational root cause. The stall is usually caused by the migration plan being wrong, not the team being incompetent.

    **Further Reading:**

    * "Goodbye Microservices: From 100s of problem children to 1 superstar" -- Segment, 2018
    * "Monolith Decomposition Patterns" by Sam Newman -- covers when to stop as well as when to extract
    * "The Majestic Monolith" -- DHH's essay on why modular monoliths are underrated
  </Accordion>

  <Accordion title="You are 6 months into extracting a customer service. You just realized the monolith and the new service are writing to overlapping customer fields in the same database. How bad is this and how do you fix it?">
    **Strong Answer Framework:**

    1. **Diagnose the severity.** Are the fields disjoint per-row (service A owns customers 1-1000, service B owns 1001-2000) or interleaved (both services write to all customers)? Interleaved is the distributed monolith nightmare.
    2. **Measure actual drift.** Write a reconciliation job that compares writes over 48 hours. How often do conflicting writes occur? Is it 1 in 10,000 or 1 in 10?
    3. **Identify the "unified owner."** For each overlapping field, decide which service is the source of truth going forward. Every write of that field from the other service becomes an API call instead.
    4. **Introduce the ownership layer before removing duplicates.** Both services route writes through a shared DB procedure, Kafka topic, or API gateway that enforces "only one writer per field." You cannot stop duplicate writes instantaneously; you must funnel them through a chokepoint first.
    5. **Plan the cleanup.** Once all writes flow through the chokepoint and data is converging, remove the chokepoint and let the service own its fields directly.
    6. **Pause new features in this area.** New features that touch customer fields must wait until ownership is clear. Otherwise every new feature adds to the mess.

    **Real-World Example:** Salesforce (approximately 2017). Salesforce documented a multi-year effort to untangle overlapping writes between their monolith and extracted services. The fix involved creating "ownership adapters" that enforced which service could write which field, with the adapter layer logging violations so the team could find leftover code paths. The cleanup took 18 months after the original extraction.

    **Senior Follow-up Questions:**

    <Note>
      **Q: "What if you cannot pause feature work in this area during the cleanup?"**

      A: Then you accept a longer cleanup period with more divergence risk. Mitigate by: (a) running the reconciliation job continuously and alerting on drift above a threshold; (b) requiring all new feature code to go through the new ownership layer (enforced via code review and CI checks); (c) scheduling a "drift cleanup" sprint every quarter to catch up. The cleanup will take 2-3x longer but it will happen. Alternative: frame the continued feature work as actively making the migration worse, and use that to negotiate a feature freeze -- leadership usually agrees once they see the cost.
    </Note>

    <Note>
      **Q: "The monolith writes customer.email directly to the database. The new service writes customer.email via the monolith's API. How do you migrate the monolith writes to go through the new service?"**

      A: Branch by abstraction inside the monolith. Step 1: find every place the monolith writes customer.email directly. Step 2: introduce a `CustomerEmailWriter` interface with two implementations: `LegacyDirectWriter` (current behavior) and `ServiceApiWriter` (calls new service). Step 3: switch the monolith to use the interface, controlled by feature flag. Step 4: ramp the feature flag from 0% to 100% while monitoring for drift. Step 5: remove `LegacyDirectWriter` and the flag. Same approach for every field that needs ownership migrated.
    </Note>

    <Note>
      **Q: "Would event-driven architecture have prevented this?"**

      A: Partially. Events enforce that state changes are observable, which makes drift easier to detect. But events do not enforce single ownership -- two services can both emit `CustomerEmailUpdated` events, and consumers have to pick which to trust. The ownership discipline is still required. Event-driven architecture + clear aggregate ownership (only one service is allowed to emit a given event type) is the combination that prevents this problem. Events alone are not enough.
    </Note>

    **Common Wrong Answers:**

    * **"Just use distributed transactions (2PC) to keep them in sync."** 2PC is poorly supported in modern microservice stacks, blocks on the slowest participant, and does not solve the ownership question. The problem is ownership, not synchronization.
    * **"The database will resolve it with last-write-wins."** Last-write-wins silently drops the losing write. If both writes are valid from their respective services' perspectives, you are losing business data.

    **Further Reading:**

    * "Saga Pattern" chapter in "Microservices Patterns" by Chris Richardson
    * "Data Management in a Microservice Architecture" -- Chris Richardson's microservices.io site
    * "Event-Driven Microservices" by Adam Bellemare -- covers ownership patterns with events
  </Accordion>
</AccordionGroup>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="'Your company has a 500,000-line monolith. You have been tasked with leading the migration to microservices. What is your first step, and what is your 12-month plan?'">
    **Strong Answer:**

    My first step is not writing any code. It is mapping the monolith's domain boundaries by analyzing three things: the codebase structure (which modules import from which), the database schema (which tables are joined together in queries), and the team structure (which team owns which features and where the merge conflicts happen).

    Month 1-2: Domain analysis. I run an Event Storming workshop with domain experts to identify bounded contexts. I analyze git logs to find modules that change together (high coupling) and modules that change independently (good extraction candidates). I produce a migration priority matrix: each candidate is scored on extraction difficulty (data coupling, API surface area) and business value (deployment independence, scaling needs).

    Month 3-4: Extract the first service. I pick the module with the lowest coupling and highest independent value -- often something like Notification or Search. I use the Strangler Fig pattern: deploy the new service alongside the monolith, route traffic through an API proxy, and gradually shift requests to the new service. The monolith retains a stub that forwards to the new service, so the extraction is transparent to consumers.

    Month 5-6: Stabilize and learn. The first extraction teaches you everything you do not know about your infrastructure: do you have CI/CD that handles multiple services? Can your monitoring track cross-service requests? Do your tests cover the integration points? I fix every gap before extracting the second service.

    Month 7-12: Extract 2-3 more services, applying the lessons from the first. By month 12, I expect to have 3-5 extracted services, a proven migration playbook, and a clear understanding of the remaining work. The monolith still exists and handles 70-80% of the functionality -- that is fine. The goal at 12 months is not completion; it is momentum and confidence.

    What I explicitly do NOT do: hire a separate team to rewrite the monolith in parallel. That is the big-bang rewrite anti-pattern. The migration team is the same team that owns the monolith.

    **Follow-up: "How do you keep the monolith maintainable during the 2-3 year migration period when some functionality is in the monolith and some is in services?"**

    This is the hardest part of any migration. The monolith becomes a hybrid system: some features are native, some are forwarded to external services. I keep it maintainable by: (1) introducing an anti-corruption layer inside the monolith that abstracts the external service calls behind the same interfaces the monolith already uses -- existing code does not need to know whether it is calling a local module or an external service; (2) writing feature flags that allow instant rollback from the external service to the monolith's original implementation if the service has issues; (3) maintaining clear documentation of which features live where, updated as part of every extraction PR. The worst outcome is an undocumented hybrid where nobody knows which codepath is active.
  </Accordion>

  <Accordion title="'Explain the Strangler Fig pattern in detail. How do you handle the data migration part, which is usually the hardest piece?'">
    **Strong Answer:**

    The Strangler Fig pattern extracts functionality incrementally by routing requests through a proxy that directs traffic to either the monolith or the new service. Over time, more routes go to the new service until the monolith feature is completely replaced.

    The traffic routing phase is well-understood: put an API gateway or reverse proxy in front of the monolith, configure it to route specific paths to the new service, and gradually migrate paths. The real challenge is data migration -- the new service needs its own database, but the monolith's database has the data.

    I use a four-phase data migration approach. Phase one: the new service reads from its own database but writes to both its database AND the monolith's database (via an API call or shared event). The monolith's database remains the source of truth. This is the "shadow write" phase where you validate that the new service's writes are correct by comparing them against the monolith's data.

    Phase two: the new service becomes the source of truth for writes. It writes to its own database, and a CDC pipeline (Debezium) synchronizes changes back to the monolith's database for any remaining monolith features that read this data. The monolith's database is now a read replica.

    Phase three: migrate all read traffic to the new service. Update any remaining monolith features that read this data to call the new service's API instead of querying the local database. Once all readers are migrated, the CDC synchronization can be disabled.

    Phase four: drop the tables from the monolith's database. This is the "clean up" phase that teams often forget, leaving orphaned tables that confuse future developers.

    The key principle: at every phase, you can roll back. If the new service fails during phase two, you re-enable writes to the monolith's database and restore it as the source of truth. Reversibility is non-negotiable for data migrations.

    **Follow-up: "What about referential integrity? The monolith has foreign keys between the tables you are extracting and tables that remain in the monolith."**

    You break the foreign keys. This is uncomfortable for teams used to relational integrity guarantees, but it is unavoidable in microservices. The extracted service stores an ID reference (customer\_id) without a foreign key constraint. Referential integrity is maintained through application-level checks (verify the customer exists via API before creating the order) and eventual consistency monitoring (a reconciliation job that checks for orphaned references). The foreign keys in the monolith's database are dropped as part of the extraction, and replaced with soft references that are validated by the application layer.
  </Accordion>

  <Accordion title="'What is Branch by Abstraction, and when would you use it instead of Strangler Fig?'">
    **Strong Answer:**

    Branch by Abstraction is a technique for replacing an internal implementation within the monolith without creating a separate service. You introduce an abstraction (interface) around the component you want to replace, implement the new version behind the same interface, and use a feature flag to switch between old and new implementations. Once the new implementation is validated, you remove the old one.

    Strangler Fig works at the API/routing level -- you redirect external traffic from the monolith to a new service. Branch by Abstraction works at the code level -- you swap an internal implementation within the same codebase.

    I use Branch by Abstraction when: the component being replaced is internal (not exposed as an API), the replacement is a technology change rather than a service extraction (swapping a homegrown email sender for SendGrid, replacing a custom caching layer with Redis, migrating from one ORM to another), or when the team is not ready for the operational complexity of a separate service but needs to modernize the internals.

    The process: Step one, identify the component and all its callers within the monolith. Step two, create an interface that abstracts the component's API (if one does not exist). Step three, refactor all callers to use the interface. Step four, create the new implementation behind the interface. Step five, use a feature flag to route some traffic to the new implementation. Step six, monitor and gradually increase the new implementation's traffic share. Step seven, remove the old implementation and the feature flag.

    The critical advantage over Strangler Fig: no data migration. Since both implementations live in the same application, they can share the same database during the transition. This makes Branch by Abstraction significantly lower risk for internal component replacements.

    **Follow-up: "Can you combine Branch by Abstraction and Strangler Fig in the same migration?"**

    Yes, and this is actually the most common pattern in practice. Phase one: use Branch by Abstraction to introduce clean interfaces around the component within the monolith. This decouples the component from the rest of the codebase without any infrastructure changes. Phase two: use Strangler Fig to extract the component (now behind a clean interface) into its own service. The interface becomes the service's API contract. The combination works because Branch by Abstraction handles the code-level decoupling (hardest to do safely) while Strangler Fig handles the infrastructure-level separation (straightforward once the code boundaries are clean).
  </Accordion>
</AccordionGroup>
