Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Migration Patterns

Migrating from a monolith to microservices is a journey, not a destination. The number one cause of failed migrations is not technical — it is impatience. Teams attempt “big bang” rewrites, run two systems in parallel for too long, or extract the wrong service first. The patterns in this chapter (Strangler Fig, Branch by Abstraction, CDC-based data sync) all share one philosophy: make the migration reversible at every step. If you cannot roll back any individual change, you are taking on risk that no pattern can save you from.
Learning Objectives:
  • Understand why incremental migration beats big-bang rewrites
  • Master the Strangler Fig pattern
  • Implement Branch by Abstraction
  • Learn database migration strategies
  • Handle dual-writes and data synchronization

Why Migrations Fail

Before reaching for any pattern, understand why most migrations go sideways. The failure modes below are not rare edge cases — they are the default outcome unless you actively steer away from them. Every experienced migration leader has scars from at least one of these. The common thread: each anti-pattern is born from an entirely reasonable-looking decision that compounds into disaster over months. Big-bang rewrites feel clean. Splitting by technical layer feels organized. Migrating data first feels like “getting the hard part over with.” All three are traps.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    MIGRATION ANTI-PATTERNS                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ BIG BANG REWRITE                                                        │
│  ─────────────────────                                                      │
│                                                                              │
│  "Let's rewrite everything from scratch"                                    │
│                                                                              │
│  Problems:                                                                  │
│  • Takes years (Netscape took 3 years, lost market share)                  │
│  • No value delivered until complete                                        │
│  • Original system still needs maintenance                                  │
│  • Requirements change during rewrite                                       │
│  • Team burnout                                                             │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ❌ WRONG SERVICE BOUNDARIES                                                │
│  ──────────────────────────────                                             │
│                                                                              │
│  "Let's split by technical layer"                                           │
│                                                                              │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                   │
│  │ UI Service  │ ──▶ │ API Service │ ──▶ │ DB Service  │                   │
│  └─────────────┘     └─────────────┘     └─────────────┘                   │
│                                                                              │
│  Problems:                                                                  │
│  • One change requires updating all three                                   │
│  • Tight coupling disguised as services                                    │
│  • Distributed monolith                                                     │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ❌ MIGRATING DATA TOO EARLY                                                │
│  ────────────────────────────                                               │
│                                                                              │
│  "Let's first migrate all the data, then the code"                         │
│                                                                              │
│  Problems:                                                                  │
│  • Data coupling remains                                                    │
│  • Complex synchronization                                                  │
│  • Risk of data inconsistency                                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
The Distributed Monolith Trap. Splitting by technical layer (UI / API / DB) or by “shared entities” produces a system that looks like microservices on paper but behaves like a monolith with network latency added. You know you are in this trap when: (a) deploying one service requires coordinating releases with other services, (b) a single feature touches 4+ services, (c) one team owns code in many services. The cure is to split by business capability — who owns the decisions, not who stores the data.

The Strangler Fig Pattern

Named after strangler fig plants that grow around host trees, eventually replacing them entirely. Similarly, we gradually replace monolith functionality with microservices.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    STRANGLER FIG PATTERN                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PHASE 1: INTERCEPT                                                         │
│  ──────────────────────                                                     │
│                                                                              │
│          ┌───────────────────┐                                              │
│  Client ─▶│   Facade/Proxy   │──────────────────────▶ Monolith              │
│          └───────────────────┘                                              │
│                                                                              │
│  All traffic goes through facade (no code changes yet)                      │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 2: EXTRACT                                                           │
│  ──────────────────                                                         │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   Facade/Proxy   │────────▶│  New User Svc    │                │
│          │                   │         └──────────────────┘                │
│          │                   │                                              │
│          └───────────────────┘──────────────────────▶ Monolith              │
│                │                                        (minus users)       │
│                │                                                            │
│                └──▶ /users/* → New Service                                 │
│                     /* → Monolith                                          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 3: REPEAT                                                            │
│  ─────────────────                                                          │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   Facade/Proxy   │────────▶│  User Service    │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Order Service   │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Payment Service │                │
│          └───────────────────┘         └──────────────────┘                │
│                │                                                            │
│                └──▶ Monolith (shrinking)                                   │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PHASE 4: RETIRE                                                            │
│  ──────────────────                                                         │
│                                                                              │
│          ┌───────────────────┐         ┌──────────────────┐                │
│  Client ─▶│   API Gateway    │────────▶│  User Service    │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Order Service   │                │
│          │                   │         ├──────────────────┤                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Payment Service │                │
│          │                   │         ├──────────────────┤                │
│          │                   │────────▶│  Catalog Service │                │
│          └───────────────────┘         └──────────────────┘                │
│                                                                              │
│  Monolith is dead! 🎉                                                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Caveats & Common Pitfalls: Big Bang Cutover.
  • “Let us just rewrite it over the weekend.” Big bang cutovers fail because you cannot test every behavior of a legacy system until real traffic hits it. Edge cases you did not know existed surface as outages. Netscape lost years of market share doing this; do not repeat it.
  • Freeze during rewrite. Feature development halts during a big bang, which is politically unsustainable. Engineers leave, stakeholders lose patience, and the project gets cancelled at 70% done.
  • Rollback is binary. With big bang, rollback means reverting 6 months of DB migrations and code changes at once. Mid-cutover failures usually become multi-day outages because “just roll it back” is not actually an option.
  • Data migration order dependency. Tables with foreign keys must migrate in dependency order, and legacy code may still write to tables you thought were frozen. Big bang assumes a clean cutover moment that almost never exists.
Solutions & Patterns: Incremental Cutover.
  • Strangler Fig is mandatory for anything above 100K lines. Extract one endpoint at a time, validate with real traffic before ramping, and keep the monolith as fallback until the new service has at least 3 months of production stability.
  • Every extraction is reversible for at least one release cycle. Keep the monolith’s old code path behind a feature flag. If you remove it immediately, you have no rollback when the new service misbehaves weeks later.
  • Run shadow mode before any traffic cutover. Send identical requests to both systems, compare results, log divergences. You will find bugs in the new implementation that would otherwise surface as production incidents.
  • Always extract a leaf module first. Pick a module that nothing else in the monolith depends on (notifications, search, reporting). Module-with-many-dependents is your last extraction, not your first.

Implementation

The strangler facade is the safest primitive in the migration toolbox because it is fundamentally additive. In Phase 1 you introduce a reverse proxy with every route still pointing at the monolith — behavior is byte-for-byte identical, but you now own the routing layer. Phase 2 begins the real migration: a single endpoint is routed to a new service behind a feature flag, starting at 1% of traffic. If error rates or latency degrade, you flip the flag back to 0% and traffic returns to the monolith in seconds — no deploy, no rollback ceremony. You then ramp to 5%, 25%, 50%, 100% over days or weeks, watching dashboards and error budgets at each step. Key decision points to proceed: new-service error rate is within 1x of monolith, p99 latency is within 20%, business metrics (conversion, checkout success) are flat or better. Decision points to roll back: any hard error spike, data divergence detected in shadow comparisons, or an on-call page during ramp. Do not skip percentages and do not rush — the whole point of this pattern is that the ramp is slow enough to catch problems cheaply.
// strangler-facade.js - The routing layer that enables migration

const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Configuration: which routes go where
const routingConfig = {
  // New microservices
  microservices: {
    '/api/users': 'http://user-service:3001',
    '/api/orders': 'http://order-service:3002',
    '/api/payments': 'http://payment-service:3003'
  },
  // Everything else goes to monolith
  monolith: 'http://monolith:8080'
};

// Feature flags for gradual rollout
const featureFlags = {
  'users-service': { enabled: true, percentage: 100 },
  'orders-service': { enabled: true, percentage: 50 },  // 50% traffic
  'payments-service': { enabled: false, percentage: 0 }  // Still in monolith
};

// Determine if request should go to microservice
function shouldRouteToMicroservice(serviceName, req) {
  const flag = featureFlags[serviceName];
  if (!flag || !flag.enabled) return false;
  
  // Percentage rollout based on user ID or random
  if (flag.percentage < 100) {
    const userId = req.headers['x-user-id'] || req.ip;
    const hash = hashCode(userId);
    return (hash % 100) < flag.percentage;
  }
  
  return true;
}

function hashCode(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash |= 0;
  }
  return Math.abs(hash);
}

// Dynamic routing middleware
app.use((req, res, next) => {
  const path = req.path;
  
  // Find matching microservice route
  for (const [route, target] of Object.entries(routingConfig.microservices)) {
    if (path.startsWith(route)) {
      const serviceName = route.split('/').pop() + '-service';
      
      if (shouldRouteToMicroservice(serviceName, req)) {
        // Route to microservice
        req.routingDecision = { target, type: 'microservice' };
      } else {
        // Route to monolith
        req.routingDecision = { target: routingConfig.monolith, type: 'monolith' };
      }
      break;
    }
  }
  
  // Default to monolith
  if (!req.routingDecision) {
    req.routingDecision = { target: routingConfig.monolith, type: 'monolith' };
  }
  
  // Log for analysis
  console.log(`[Strangler] ${req.method} ${path} -> ${req.routingDecision.type}`);
  
  next();
});

// Create proxy for each target
app.use('/api/users', (req, res, next) => {
  if (req.routingDecision.type === 'microservice') {
    createProxyMiddleware({
      target: routingConfig.microservices['/api/users'],
      changeOrigin: true,
      pathRewrite: { '^/api/users': '' },
      onError: (err, req, res) => {
        // Fallback to monolith on error
        console.error('[Strangler] Microservice error, falling back:', err.message);
        createProxyMiddleware({
          target: routingConfig.monolith,
          changeOrigin: true
        })(req, res, next);
      }
    })(req, res, next);
  } else {
    createProxyMiddleware({
      target: routingConfig.monolith,
      changeOrigin: true
    })(req, res, next);
  }
});

// Catch-all for monolith
app.use(createProxyMiddleware({
  target: routingConfig.monolith,
  changeOrigin: true
}));

app.listen(3000);

Branch by Abstraction

Use when you need to replace a component inside the monolith before extracting it. Create an abstraction layer, swap implementations, then extract. This is complementary to Strangler Fig — Strangler routes traffic externally at the API level, while Branch by Abstraction swaps implementations internally at the code level. Use Strangler for extracting entire endpoints; use Branch by Abstraction for replacing internal dependencies (like swapping an in-app email sender for a notification microservice client).
┌─────────────────────────────────────────────────────────────────────────────┐
│                    BRANCH BY ABSTRACTION                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STEP 1: IDENTIFY COMPONENT                                                 │
│  ─────────────────────────────                                              │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code A ──┬──▶ Notification Module ◀──┬── Code C                   │   │
│  │            │                            │                            │   │
│  │   Code B ──┘    (direct dependency)     └── Code D                   │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  STEP 2: CREATE ABSTRACTION                                                 │
│  ────────────────────────────                                               │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code A ──┬──▶ NotificationInterface ◀──┬── Code C                 │   │
│  │            │            │                 │                          │   │
│  │   Code B ──┘            ▼                 └── Code D                 │   │
│  │                  OldNotificationImpl                                 │   │
│  │                 (existing code)                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  STEP 3: IMPLEMENT NEW VERSION                                              │
│  ────────────────────────────────                                           │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                              ┌──────────────────┐                   │   │
│  │   Code ──▶ NotificationInterface                │                   │   │
│  │                    │         │ toggle           │                   │   │
│  │            ┌───────┴───────┐ ▼                  ▼                   │   │
│  │            │               │                                         │   │
│  │    OldNotificationImpl   NewNotificationImpl                        │   │
│  │                          (calls microservice)                       │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                        │                                                    │
│                        └────────────────▶ Notification Microservice        │
│                                                                              │
│  STEP 4: MIGRATE AND REMOVE                                                 │
│  ──────────────────────────────                                             │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  MONOLITH                                                            │   │
│  │                                                                       │   │
│  │   Code ──▶ NotificationClient ─────────▶ Notification Microservice  │   │
│  │                                                                       │   │
│  │   (Old implementation deleted)                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Caveats & Common Pitfalls: Strangler Fig Stalled Mid-Migration.
  • The half-migrated state becomes the new permanent state. Teams extract 3 services, declare victory on the quarterly review, and then never touch the monolith again. Two years later you have a monolith plus 3 microservices — strictly worse than either pure architecture.
  • Strangler facade becomes a new monolith. Every new feature routes through the facade, which accumulates business logic, becomes the central point of coordination, and recreates the exact coupling you tried to eliminate.
  • Rollback impossible after shared DB removed. Phase 4 of data migration (dropping tables from monolith) is the point of no return. Many teams do this prematurely, before the new service has proven itself, and then cannot roll back when issues emerge.
  • The 80/20 trap. Extracting 20% of the monolith captures 80% of the value, which tempts teams to stop. The remaining 20% is often the most coupled, most load-bearing code, which means you still have all the monolith’s fragility on the 20% that matters most.
Solutions & Patterns: Preventing Migration Stall.
  • Tie extractions to OKRs, not projects. A rolling quarterly OKR (“reduce monolith LOC by 15% this quarter”) prevents the “we extracted 3 services, we are done” reflex.
  • Set a migration end-date and honor it. Commit publicly to “monolith retired by Q4 2027.” Slipping is allowed; abandoning is not. The deadline forces continuous progress.
  • Keep the facade stupid. Routing rules, no business logic, no data transformation. If you need transformation, put it in the calling service or the called service, never the facade.
  • Define the “done” state precisely. “Monolith retired” means: zero production traffic, zero cron jobs, database shut down, repository archived. Anything less is still half-migrated.

Implementation Example

Branch by Abstraction is safer than Strangler Fig for internal components because the swap happens in-process — no network, no distributed transactions, no database-per-service problem during the transition. The migration has five measured phases, and you must not skip any of them. Phase 1 (Abstract): extract an interface around the existing implementation without changing callers’ behavior — this is a pure refactor, all tests pass. Phase 2 (Dual implementation): add the new implementation behind the same interface but keep the old one wired up — you now have two versions of the same contract. Phase 3 (Shadow): run both implementations for every call, use the legacy result as the source of truth, and log any divergence. Do this for at least a week in production before trusting the new implementation. Phase 4 (Cutover): feature-flag a percentage of traffic to use the new implementation as the primary. Ramp 1% -> 10% -> 50% -> 100%. Phase 5 (Cleanup): delete the old implementation and the feature flag — and yes, actually delete it, do not leave it “just in case” because dead code rots and becomes a liability. Decision to roll back: any shadow-compare divergence you cannot explain, or a jump in error rate during the cutover ramp. Decision to proceed: zero unexplained divergences and stable latency for one full week at each ramp step. This pattern avoids the distributed monolith trap precisely because the new implementation lives in the same process — you only introduce the network hop once you are certain the logic is right.
// branch-by-abstraction.js

// STEP 1: Original code (tightly coupled)
// Before: Direct usage of notification module
class OrderService_Before {
  async placeOrder(order) {
    // ... order logic ...
    
    // Direct dependency - hard to change
    const emailer = require('./email-sender');
    emailer.send(order.userId, 'Order placed', orderDetails);
    
    const sms = require('./sms-sender');
    sms.send(order.phone, 'Your order is placed');
  }
}

// =========================================================================

// STEP 2: Create abstraction interface
class NotificationService {
  async sendOrderConfirmation(order) {
    throw new Error('Not implemented');
  }
  
  async sendShippingUpdate(order, trackingInfo) {
    throw new Error('Not implemented');
  }
  
  async sendDeliveryNotification(order) {
    throw new Error('Not implemented');
  }
}

// STEP 3: Implement with old code (same behavior)
class LegacyNotificationService extends NotificationService {
  constructor() {
    super();
    this.emailer = require('./email-sender');
    this.sms = require('./sms-sender');
  }

  async sendOrderConfirmation(order) {
    await this.emailer.send(
      order.userId,
      'Order Confirmation',
      this.formatOrderEmail(order)
    );
    
    if (order.phone) {
      await this.sms.send(order.phone, `Order ${order.id} confirmed!`);
    }
  }

  async sendShippingUpdate(order, trackingInfo) {
    await this.emailer.send(
      order.userId,
      'Your order has shipped',
      this.formatShippingEmail(order, trackingInfo)
    );
  }
}

// STEP 4: Implement with new microservice
class MicroserviceNotificationService extends NotificationService {
  constructor(serviceUrl) {
    super();
    this.baseUrl = serviceUrl;
  }

  async sendOrderConfirmation(order) {
    await fetch(`${this.baseUrl}/notifications/order-confirmation`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        userId: order.userId,
        orderId: order.id,
        items: order.items,
        total: order.total
      })
    });
  }

  async sendShippingUpdate(order, trackingInfo) {
    await fetch(`${this.baseUrl}/notifications/shipping`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        userId: order.userId,
        orderId: order.id,
        trackingNumber: trackingInfo.number,
        carrier: trackingInfo.carrier
      })
    });
  }
}

// STEP 5: Feature flag to switch implementations
class NotificationServiceFactory {
  static create() {
    const useNewService = process.env.USE_NOTIFICATION_MICROSERVICE === 'true';
    
    if (useNewService) {
      return new MicroserviceNotificationService(
        process.env.NOTIFICATION_SERVICE_URL
      );
    }
    
    return new LegacyNotificationService();
  }
}

// STEP 6: Updated order service using abstraction
// Notice how OrderService no longer knows or cares whether notifications go through
// the legacy email sender or a microservice HTTP call. That decision lives in the factory.
// This is the power of the pattern: the consuming code is completely unaware of the migration.
class OrderService {
  constructor() {
    this.notificationService = NotificationServiceFactory.create();
  }

  async placeOrder(order) {
    // ... order logic ...
    
    // Now uses abstraction - can switch implementations
    await this.notificationService.sendOrderConfirmation(order);
  }
}

// Gradual rollout with comparison
class ComparingNotificationService extends NotificationService {
  constructor() {
    super();
    this.legacy = new LegacyNotificationService();
    this.modern = new MicroserviceNotificationService(
      process.env.NOTIFICATION_SERVICE_URL
    );
  }

  async sendOrderConfirmation(order) {
    // Run both, compare results, use legacy as source of truth
    const [legacyResult, modernResult] = await Promise.allSettled([
      this.legacy.sendOrderConfirmation(order),
      this.modern.sendOrderConfirmation(order)
    ]);

    // Log differences for analysis
    if (modernResult.status === 'rejected') {
      console.error('Modern notification failed:', modernResult.reason);
    }

    // Return legacy result (source of truth during migration)
    if (legacyResult.status === 'rejected') {
      throw legacyResult.reason;
    }
    
    return legacyResult.value;
  }
}

Database Migration Strategies

Data is where migrations go to die. The code-level patterns above (Strangler, Branch by Abstraction) are solved problems — they are well-understood, low-risk, and reversible. Database migrations are the opposite: they are high-risk, often irreversible, and every team discovers new failure modes the hard way. The four patterns below form a ladder from “quick but coupled” to “fully decoupled but expensive.” You will climb the ladder over the course of the migration, not jump straight to the top. Key decision points on the ladder: move up a rung when the current pattern’s coupling is blocking a release or degrading reliability; do not move up a rung just because the target pattern is more “architecturally pure.” A team stuck on Pattern 1 (shared DB) for nine months is usually fine; a team that jumped to Pattern 4 (DB per service) in month one is often drowning in eventual-consistency bugs. Do not migrate data ownership before code ownership — that is the distributed monolith trap in its purest form: two services writing to one database, each assuming ownership, corrupting each other’s invariants.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DATABASE MIGRATION PATTERNS                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PATTERN 1: SHARED DATABASE (TEMPORARY)                                     │
│  ──────────────────────────────────────                                     │
│                                                                              │
│    ┌───────────┐      ┌───────────┐                                        │
│    │ Monolith  │      │ New Svc   │                                        │
│    └─────┬─────┘      └─────┬─────┘                                        │
│          │                  │                                               │
│          └────────┬─────────┘                                               │
│                   ▼                                                         │
│          ┌───────────────┐                                                 │
│          │ Shared DB     │  ← Both read/write                              │
│          └───────────────┘                                                 │
│                                                                              │
│    Quick start, but creates coupling. Use as stepping stone only.          │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 2: DATABASE VIEW (READ-ONLY)                                       │
│  ────────────────────────────────────                                       │
│                                                                              │
│    ┌───────────┐      ┌───────────┐                                        │
│    │ Monolith  │      │ New Svc   │                                        │
│    └─────┬─────┘      └─────┬─────┘                                        │
│          │ R/W              │ Read-only                                     │
│          ▼                  ▼                                               │
│    ┌─────────────┐   ┌─────────────┐                                       │
│    │ Main DB     │──▶│    View     │                                       │
│    └─────────────┘   └─────────────┘                                       │
│                                                                              │
│    New service reads from view, writes go to monolith via API.             │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 3: DATA SYNCHRONIZATION                                            │
│  ─────────────────────────────────                                          │
│                                                                              │
│    ┌───────────┐  events   ┌───────────┐                                   │
│    │ Monolith  │──────────▶│ New Svc   │                                   │
│    └─────┬─────┘           └─────┬─────┘                                   │
│          │                       │                                          │
│          ▼                       ▼                                          │
│    ┌─────────────┐         ┌─────────────┐                                 │
│    │ Main DB     │  sync   │ Service DB  │                                 │
│    └─────────────┘ ─────▶  └─────────────┘                                 │
│                                                                              │
│    Change Data Capture (CDC) keeps databases in sync.                      │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  PATTERN 4: DATABASE PER SERVICE (TARGET STATE)                             │
│  ───────────────────────────────────────────────                            │
│                                                                              │
│    ┌───────────┐           ┌───────────┐                                   │
│    │ Service A │  events   │ Service B │                                   │
│    └─────┬─────┘◀─────────▶└─────┬─────┘                                   │
│          │                       │                                          │
│          ▼                       ▼                                          │
│    ┌─────────────┐         ┌─────────────┐                                 │
│    │ Database A  │         │ Database B  │                                 │
│    └─────────────┘         └─────────────┘                                 │
│                                                                              │
│    Complete data ownership. Communication via APIs/events.                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
Caveats & Common Pitfalls: Data Migration Order.
  • Migrating data before code ownership creates a distributed monolith. Two services writing to one database with unclear ownership silently corrupt each other’s invariants. The bugs do not appear in tests; they appear in production as data drift.
  • Dropping shared DB makes rollback impossible. Once the monolith’s tables are dropped, you cannot revert to “just use the monolith.” Every subsequent incident forces you forward, not back. Do this only after months of new-service stability.
  • Foreign key dependencies violated during extraction. If Orders has a foreign key to Customers, and you extract Customers first, Orders breaks. If you extract Orders first, Customers breaks. Plan the extraction order with the actual FK graph.
  • CDC lag masquerading as consistency. The new service’s database is 2 seconds behind the source. A read immediately after a write returns stale data. Customers see “your order was not created” even though it was, they click again, and now you have duplicate orders.
Solutions & Patterns: Safe Data Migration.
  • Migrate code ownership first; data follows. The service owns the writes before the data moves. Both services write to the same DB during transition, but only one owns schema changes.
  • Extract leaves of the FK graph first. Tables with no incoming FKs (reporting logs, audit tables, product images) are safe to extract early. Core transactional tables (customers, orders) come last.
  • Keep the source DB intact for months after cutover. Even if the new service is primary, retaining the old DB for emergency rollback during the first 90 days is cheap insurance.
  • Measure CDC lag in user-visible terms. “P99 CDC lag under 5 seconds” is an SLO. Track it, alert on it, and build read-your-writes semantics (read from source DB for post-write reads) until CDC lag is provably under user tolerance.

Change Data Capture Implementation

CDC is how mature migrations keep two databases in sync without asking application code to participate. The insight: databases already have a transaction log (WAL in PostgreSQL, binlog in MySQL) that records every change durably before it is visible to queries. CDC tools (Debezium being the industry standard) tail that log, translate each change into an event, and publish it to a message broker. Downstream consumers — including your new microservice’s database — subscribe to the relevant streams and apply changes. Why this is safe: the source database’s transactions remain the single point of truth; CDC is a side-effect-free reader. If the new service falls behind, the monolith keeps working. If the new service’s database gets corrupted, you replay from the WAL. Why this can still go wrong: schema changes in the source break the CDC pipeline if your consumers are not forward-compatible; out-of-order events during rebalancing cause ghost rows; and the “last-write-wins” semantics can silently overwrite newer data when clocks drift. Key decision point to proceed to the next phase: CDC lag stays under 5 seconds at p99 for two weeks under production load, and a reconciliation job reports zero divergence between source and replica. Key decision to roll back: any unexplained lag spike or any divergence in the reconciliation job — do not ramp traffic until the root cause is understood.
// cdc-sync.js - Database synchronization using CDC

const { Kafka } = require('kafkajs');
const { Pool } = require('pg');

// Debezium-style CDC connector (simplified)
class ChangeDataCaptureSync {
  constructor(config) {
    this.sourceDb = new Pool(config.source);
    this.targetDb = new Pool(config.target);
    this.kafka = new Kafka(config.kafka);
    this.producer = this.kafka.producer();
    this.consumer = this.kafka.consumer({ groupId: 'cdc-sync' });
    this.lastPosition = null;
  }

  async startCapture() {
    await this.producer.connect();
    
    // Poll for changes (in production, use Debezium or similar)
    setInterval(async () => {
      await this.captureChanges();
    }, 1000);
  }

  async captureChanges() {
    // Using PostgreSQL logical replication slot
    const changes = await this.sourceDb.query(`
      SELECT * FROM pg_logical_slot_get_changes(
        'cdc_slot',
        NULL,
        NULL,
        'include-xids', '1',
        'include-timestamp', '1'
      )
    `);

    for (const change of changes.rows) {
      const parsed = this.parseChange(change.data);
      
      if (parsed) {
        await this.producer.send({
          topic: `cdc.${parsed.table}`,
          messages: [{
            key: parsed.key,
            value: JSON.stringify({
              operation: parsed.operation,
              before: parsed.before,
              after: parsed.after,
              timestamp: parsed.timestamp
            })
          }]
        });
      }
    }
  }

  async startSync() {
    await this.consumer.connect();
    await this.consumer.subscribe({ topic: /cdc\..*/ });

    await this.consumer.run({
      eachMessage: async ({ topic, message }) => {
        const tableName = topic.replace('cdc.', '');
        const change = JSON.parse(message.value.toString());
        
        await this.applyChange(tableName, change);
      }
    });
  }

  async applyChange(tableName, change) {
    const { operation, before, after } = change;

    try {
      switch (operation) {
        case 'INSERT':
          await this.targetDb.query(
            `INSERT INTO ${tableName} SELECT * FROM json_populate_record(NULL::${tableName}, $1)`,
            [after]
          );
          break;

        case 'UPDATE':
          const setClauses = Object.keys(after)
            .map((key, i) => `${key} = $${i + 1}`)
            .join(', ');
          const values = Object.values(after);
          
          await this.targetDb.query(
            `UPDATE ${tableName} SET ${setClauses} WHERE id = $${values.length + 1}`,
            [...values, after.id]
          );
          break;

        case 'DELETE':
          await this.targetDb.query(
            `DELETE FROM ${tableName} WHERE id = $1`,
            [before.id]
          );
          break;
      }
    } catch (error) {
      console.error(`Failed to apply change to ${tableName}:`, error);
      // Send to dead letter queue for retry
      await this.sendToDeadLetter(tableName, change, error);
    }
  }
}

// Data Migration with verification
class DataMigrator {
  constructor(sourceDb, targetDb) {
    this.sourceDb = sourceDb;
    this.targetDb = targetDb;
  }

  async migrateTable(tableName, options = {}) {
    const batchSize = options.batchSize || 1000;
    let offset = 0;
    let totalMigrated = 0;

    console.log(`Starting migration of ${tableName}...`);

    while (true) {
      // Fetch batch from source
      const batch = await this.sourceDb.query(
        `SELECT * FROM ${tableName} ORDER BY id LIMIT $1 OFFSET $2`,
        [batchSize, offset]
      );

      if (batch.rows.length === 0) break;

      // Transform if needed
      const transformed = options.transform 
        ? batch.rows.map(options.transform)
        : batch.rows;

      // Insert into target
      for (const row of transformed) {
        await this.targetDb.query(
          `INSERT INTO ${tableName} (${Object.keys(row).join(', ')}) 
           VALUES (${Object.keys(row).map((_, i) => `$${i + 1}`).join(', ')})
           ON CONFLICT (id) DO UPDATE SET 
           ${Object.keys(row).map((k, i) => `${k} = $${i + 1}`).join(', ')}`,
          Object.values(row)
        );
      }

      totalMigrated += batch.rows.length;
      offset += batchSize;
      
      console.log(`Migrated ${totalMigrated} rows from ${tableName}`);
    }

    // Verify migration
    await this.verify(tableName);
    
    return totalMigrated;
  }

  async verify(tableName) {
    const sourceCount = await this.sourceDb.query(
      `SELECT COUNT(*) FROM ${tableName}`
    );
    const targetCount = await this.targetDb.query(
      `SELECT COUNT(*) FROM ${tableName}`
    );

    const sourceTotal = parseInt(sourceCount.rows[0].count);
    const targetTotal = parseInt(targetCount.rows[0].count);

    if (sourceTotal !== targetTotal) {
      throw new Error(
        `Verification failed for ${tableName}: ` +
        `source=${sourceTotal}, target=${targetTotal}`
      );
    }

    console.log(`✓ Verified ${tableName}: ${targetTotal} rows`);
    
    // Sample verification
    const sampleRows = await this.sourceDb.query(
      `SELECT * FROM ${tableName} ORDER BY RANDOM() LIMIT 10`
    );

    for (const sourceRow of sampleRows.rows) {
      const targetRow = await this.targetDb.query(
        `SELECT * FROM ${tableName} WHERE id = $1`,
        [sourceRow.id]
      );

      if (targetRow.rows.length === 0) {
        throw new Error(`Missing row ${sourceRow.id} in target ${tableName}`);
      }

      // Deep comparison
      const diff = this.compareRows(sourceRow, targetRow.rows[0]);
      if (diff.length > 0) {
        console.warn(`Row ${sourceRow.id} differs:`, diff);
      }
    }

    console.log(`✓ Sample verification passed for ${tableName}`);
  }

  compareRows(source, target) {
    const diffs = [];
    for (const key of Object.keys(source)) {
      if (JSON.stringify(source[key]) !== JSON.stringify(target[key])) {
        diffs.push({ field: key, source: source[key], target: target[key] });
      }
    }
    return diffs;
  }
}

module.exports = { ChangeDataCaptureSync, DataMigrator };

Dual-Write Patterns

Dual-writes are where well-intentioned migrations silently corrupt data. The naive version — “write to old DB, then write to new DB” — has no atomicity: the process can crash between the two writes, the second DB can be temporarily unreachable, or a retry can double-apply the second write. The result is drift that nobody notices for weeks because both databases independently look healthy. The safe alternative is to write to exactly one store transactionally, and let something else propagate the change. The Transactional Outbox pattern does this by writing the row and an event record in the same DB transaction, then relying on a separate process to read the outbox and publish events. CDC does it by letting the database’s own WAL be the source of events. Both avoid the split-brain problem because there is only ever one authoritative write. Key decision point: if your data is non-financial and tolerates minutes of inconsistency, Outbox is usually enough and is simpler to operate. If you need sub-second sync, or cannot modify application code to write to the outbox table, use CDC. Never use naive dual-writes in production, even “just for now” — the inconsistencies compound faster than you can detect them, and the debugging burden falls on the on-call engineer at 3 AM.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    DUAL-WRITE DURING MIGRATION                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ NAIVE DUAL-WRITE (DON'T DO THIS)                                       │
│  ──────────────────────────────────                                         │
│                                                                              │
│    ┌──────────────────────────────────────────────────────────────────┐    │
│    │  async function saveOrder(order) {                                │    │
│    │    await monolithDb.save(order);  // What if this succeeds...    │    │
│    │    await newServiceDb.save(order); // ...but this fails?         │    │
│    │  }                                                                │    │
│    └──────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│    Problem: No atomicity = data inconsistency                              │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ✅ TRANSACTIONAL OUTBOX PATTERN                                           │
│  ──────────────────────────────                                             │
│                                                                              │
│    ┌──────────────────────────────────────────────────────────────────┐    │
│    │  BEGIN TRANSACTION                                                │    │
│    │    INSERT INTO orders (...);                                      │    │
│    │    INSERT INTO outbox (event_type, payload);  ← Same transaction │    │
│    │  COMMIT                                                           │    │
│    └──────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│    Background job reads outbox → publishes events → marks processed        │
│                                                                              │
│  ═══════════════════════════════════════════════════════════════════════   │
│                                                                              │
│  ✅ CHANGE DATA CAPTURE (CDC)                                               │
│  ─────────────────────────────                                              │
│                                                                              │
│    ┌───────────────────────────────────────────────────────────────────┐   │
│    │  Monolith DB ──▶ CDC (Debezium) ──▶ Kafka ──▶ New Service DB     │   │
│    └───────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│    Changes captured from DB transaction log. Guaranteed delivery.          │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Safe Dual-Write Implementation

The Outbox implementation below illustrates why shadow mode is the safest starting place: you write to the legacy system, then also try writing to the new one, and compare results asynchronously. The customer’s request succeeds if the legacy write succeeds — the new service’s failure is invisible to them. This lets you accumulate a week of real production data showing the new implementation matches the legacy’s behavior before you trust it with real traffic. Phases of a dual-write migration: (1) Shadow mode — legacy is primary, new is silent; (2) Primary-legacy dual-write — both receive real writes, legacy is source of truth; (3) Primary-new dual-write — both receive writes, new is source of truth, legacy is backup; (4) New-only — legacy is decommissioned. Key decision to advance phases: zero divergence in reconciliation for 7+ days under production load. Key decision to roll back a phase: any divergence you cannot explain within 24 hours, any customer-visible inconsistency, or loss of on-call confidence during business hours. The comparison logic must be reviewed — many teams declare victory because their “diff” function did not flag anything, when actually it was silently ignoring the one field that mattered.
// safe-dual-write.js - Using Outbox Pattern

const { Pool } = require('pg');

class OutboxDualWrite {
  constructor(db) {
    this.db = db;
  }

  // Safe dual-write using outbox
  async saveOrder(order) {
    const client = await this.db.connect();
    
    try {
      await client.query('BEGIN');
      
      // 1. Save to main table
      const result = await client.query(
        `INSERT INTO orders (customer_id, items, total, status)
         VALUES ($1, $2, $3, $4)
         RETURNING *`,
        [order.customerId, JSON.stringify(order.items), order.total, 'pending']
      );
      
      const savedOrder = result.rows[0];
      
      // 2. Write to outbox (same transaction!)
      await client.query(
        `INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload, created_at)
         VALUES ($1, $2, $3, $4, NOW())`,
        [
          'Order',
          savedOrder.id,
          'OrderCreated',
          JSON.stringify({
            orderId: savedOrder.id,
            customerId: savedOrder.customer_id,
            items: order.items,
            total: order.total
          })
        ]
      );
      
      await client.query('COMMIT');
      
      return savedOrder;
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }
  }
}

// Outbox processor - publishes events to message broker
class OutboxProcessor {
  constructor(db, eventPublisher) {
    this.db = db;
    this.eventPublisher = eventPublisher;
    this.running = false;
  }

  async start() {
    this.running = true;
    
    while (this.running) {
      await this.processOutbox();
      await this.sleep(100); // Poll interval
    }
  }

  async processOutbox() {
    const client = await this.db.connect();
    
    try {
      // Lock and fetch unprocessed events
      await client.query('BEGIN');
      
      const events = await client.query(
        `SELECT * FROM outbox 
         WHERE processed_at IS NULL 
         ORDER BY created_at 
         LIMIT 100
         FOR UPDATE SKIP LOCKED`
      );

      for (const event of events.rows) {
        try {
          // Publish to message broker
          await this.eventPublisher.publish({
            topic: `${event.aggregate_type.toLowerCase()}.events`,
            key: event.aggregate_id.toString(),
            value: {
              eventId: event.id,
              eventType: event.event_type,
              aggregateType: event.aggregate_type,
              aggregateId: event.aggregate_id,
              payload: event.payload,
              timestamp: event.created_at
            }
          });

          // Mark as processed
          await client.query(
            `UPDATE outbox SET processed_at = NOW() WHERE id = $1`,
            [event.id]
          );
        } catch (error) {
          console.error(`Failed to process outbox event ${event.id}:`, error);
          // Will be retried on next poll
        }
      }
      
      await client.query('COMMIT');
    } catch (error) {
      await client.query('ROLLBACK');
      console.error('Outbox processing error:', error);
    } finally {
      client.release();
    }
  }

  stop() {
    this.running = false;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Migration-specific dual-write with comparison
class MigrationDualWrite {
  constructor(legacyService, newService) {
    this.legacy = legacyService;
    this.new = newService;
    this.mode = 'shadow'; // shadow | primary-legacy | primary-new | new-only
  }

  async write(data) {
    switch (this.mode) {
      case 'shadow':
        // Write to legacy (primary), shadow write to new, compare
        return this.shadowWrite(data);
        
      case 'primary-legacy':
        // Write to both, legacy is source of truth
        return this.dualWriteLegacyPrimary(data);
        
      case 'primary-new':
        // Write to both, new is source of truth
        return this.dualWriteNewPrimary(data);
        
      case 'new-only':
        // Only write to new service
        return this.new.write(data);
    }
  }

  async shadowWrite(data) {
    // Primary write
    const legacyResult = await this.legacy.write(data);
    
    // Shadow write (fire and forget, with logging)
    this.new.write(data)
      .then(newResult => {
        // Compare results
        const diff = this.compare(legacyResult, newResult);
        if (diff.length > 0) {
          console.warn('Shadow write difference detected:', diff);
          this.metrics.recordDiff('write', diff);
        }
      })
      .catch(error => {
        console.error('Shadow write failed:', error);
        this.metrics.recordError('shadow-write', error);
      });
    
    return legacyResult;
  }

  async read(id) {
    switch (this.mode) {
      case 'shadow':
      case 'primary-legacy':
        // Read from legacy, shadow read from new for comparison
        const legacyData = await this.legacy.read(id);
        
        this.new.read(id).then(newData => {
          const diff = this.compare(legacyData, newData);
          if (diff.length > 0) {
            console.warn(`Read difference for ${id}:`, diff);
          }
        }).catch(() => {});
        
        return legacyData;
        
      case 'primary-new':
      case 'new-only':
        return this.new.read(id);
    }
  }

  compare(legacy, current) {
    const diffs = [];
    const keys = new Set([...Object.keys(legacy), ...Object.keys(current)]);
    
    for (const key of keys) {
      if (JSON.stringify(legacy[key]) !== JSON.stringify(current[key])) {
        diffs.push({
          field: key,
          legacy: legacy[key],
          new: current[key]
        });
      }
    }
    
    return diffs;
  }
}

module.exports = { OutboxDualWrite, OutboxProcessor, MigrationDualWrite };

Interview Questions

Answer:A migration pattern where new functionality is built around the existing system, gradually replacing it.Steps:
  1. Add a facade/proxy in front of monolith
  2. Extract one feature to microservice
  3. Route that feature’s traffic to new service
  4. Repeat until monolith is empty
  5. Remove monolith
Key benefits:
  • Zero downtime migration
  • Gradual, low-risk
  • Can stop/pause anytime
  • Immediate value from extracted services
Named after: Strangler fig plants that grow around host trees.
Answer:Progressive approach:
  1. Shared database (temporary)
    • Quick start
    • Both read/write to same DB
  2. Database view
    • New service reads from view
    • Writes via API to monolith
  3. CDC synchronization
    • Capture changes from source
    • Replay to new database
    • Eventually consistent
  4. Database per service
    • Full data ownership
    • Communication via APIs/events
Key principle: Migrate ownership, not just data.
Answer:Problem: No atomicity between two writes.
await db1.save(data);  // Succeeds
await db2.save(data);  // Fails! Data inconsistent
Solutions:
  1. Outbox pattern:
    • Single DB transaction
    • Write data + event to outbox
    • Background processor publishes events
  2. Change Data Capture (CDC):
    • Capture changes from DB transaction log
    • Stream to message broker
    • New service consumes and applies
  3. Saga pattern:
    • Compensating transactions
    • Eventually consistent
Answer:A technique to replace a component inside a system safely.Steps:
  1. Create abstraction interface
  2. Implement interface with existing code
  3. Change callers to use interface
  4. Create new implementation
  5. Switch implementations (feature flag)
  6. Remove old implementation
Use when:
  • Replacing internal components
  • Need to run old/new in parallel
  • Want to compare implementations
Difference from Strangler:
  • Strangler: External facade, route traffic
  • Branch: Internal abstraction, swap implementations

Chapter Summary

Key Takeaways:
  • Never do big-bang rewrites - use incremental patterns
  • Strangler Fig: Route traffic gradually to new services
  • Branch by Abstraction: Swap internal implementations safely
  • Database migration is the hardest part - plan carefully
  • Use CDC or Outbox pattern, never naive dual-writes
  • Always have rollback capability
Next Chapter: Event Sourcing Deep Dive - Advanced event storage and CQRS patterns.

Interview Questions with Structured Answers

Strong Answer Framework:
  1. Establish baseline metrics first. Deploy frequency, deploy failure rate, incident frequency, mean time to recovery, engineer productivity signals (PR merge time, merge conflict rate). These become your success criteria.
  2. Create a steering committee. CTO, 2-3 senior engineers, product lead. Meets monthly. Has authority to halt the migration if metrics regress.
  3. Invest months 1-3 in platform foundations. CI/CD that supports multiple deployable artifacts, observability (tracing across services), service template, and on-call tooling. Without these, the first extraction is a disaster.
  4. Pick the pilot service carefully. A leaf module (notifications, image resizing, reporting) with clear boundaries, moderate business value, and a team willing to own it. Extract it using Strangler Fig with shadow mode for 2 weeks before any traffic cutover.
  5. Extract 1-2 more services in months 6-12. Apply lessons from the pilot. Do not accelerate; each extraction should feel easier than the last, not harder.
  6. Feature work continues in parallel. Feature teams continue to ship in the monolith. Only the migration team works on extractions. New features that fit the extracted domain go into the new service.
  7. Month 12-18: decision point. Review metrics against baseline. If deploy frequency is up, incident rate is flat or down, and team productivity is up, continue the migration. If not, either fix the root cause or halt.
Real-World Example: Etsy (roughly 2016-2019). Etsy had a large PHP monolith and migrated incrementally to a services architecture over several years. They explicitly documented that feature work continued throughout, and that the migration team was a specific set of engineers rather than everyone being expected to migrate their own code. They reported deploying hundreds of times per day by the end, versus once-per-day in monolith days.Senior Follow-up Questions:
Q: “How do you handle the monolith’s shared database during the 18 months?”A: Four phases per extracted domain. Phase 1: the new service reads and writes to the monolith’s DB via the monolith’s existing models (no new DB yet). This validates the code extraction without the data risk. Phase 2: new service writes to its own DB and shadow-writes to the monolith DB, with monolith DB as source of truth. Phase 3: CDC pipeline keeps the monolith DB updated from the new service’s DB for any remaining monolith readers. Phase 4 (only after 3 months of stability): drop the tables from the monolith DB. Each phase is reversible.
Q: “The CTO wants to see progress at 6 months. What do you show?”A: Concrete metrics, not counts of services extracted. Show: (1) the pilot service is handling X% of traffic for its domain with equal or better p99 latency; (2) the monolith has a documented bounded context map that the team agrees on; (3) platform foundations (tracing, deployment pipeline, service template) are operational; (4) the pilot team reports faster feature delivery in the new service. Avoid showing “we extracted 3 services” without productivity or reliability data; extraction count without outcomes is vanity.
Q: “Halfway through, the migration team is burned out and services are harder to extract than the pilot suggested. What do you do?”A: Halt. Hold a retrospective. Likely the remaining code is more coupled than the leaf pilot, and naive extraction attempts are failing. Options: (a) invest 1-2 quarters in improving the monolith’s internal modularity (Branch by Abstraction) to make future extractions cheaper; (b) reduce the migration scope — maybe the monolith’s core 200K lines stay as a modular monolith forever, and only 5-6 services get extracted; (c) reassess whether the migration is actually solving the original pain points and consider stopping. Finishing the migration is not the goal; solving the original pain is.
Common Wrong Answers:
  • “Rewrite everything from scratch in Go over 18 months.” Big bang anti-pattern. Feature work stops, scope inflates, and the project fails. Interviewers asking this question are testing whether you recognize the anti-pattern.
  • “Extract 20 services in the first 6 months.” Too aggressive. You do not yet have the platform foundations to operate 20 services. Each extraction that outpaces operational readiness creates incidents.
Further Reading:
  • “Monolith to Microservices” by Sam Newman — the definitive pattern catalog for migrations
  • “Migrating from a Monolith to Microservices at Etsy” — multiple engineering blog posts and conference talks
  • “Working Effectively with Legacy Code” by Michael Feathers — still the best book on safely refactoring existing systems
Strong Answer Framework:
  1. Acknowledge the economic reality. The first 40% was the easy 40%. The remaining 60% is harder, and the team has discovered that microservices have real costs. The stall is rational.
  2. Evaluate whether to continue at all. Would a modular monolith with 3-5 extracted services be better than pushing for 100% extraction? Often yes.
  3. Identify the specific blockers. Is it technical (deep coupling), organizational (no team owns the hardest module), or political (stakeholders do not see value)?
  4. Propose three options. Continue (with remediation for the specific blockers), pause (declare the hybrid state the target, document it), or retreat (re-integrate some services back into the monolith if the extraction did not deliver value).
  5. Make the decision with data. Measure current-state productivity and reliability. If the hybrid state is working, stopping there is a valid outcome.
Real-World Example: Segment (2018). After migrating to 140+ microservices, Segment famously reversed course and consolidated back to a monolith. They wrote the “Goodbye Microservices” postmortem explaining why: their team of ~10 engineers could not operate the services, each service had low utilization making costs high, and inter-service complexity exceeded the complexity of a well-structured monolith. The pivot back took effort but was correct; they shipped more after the consolidation than during the microservices era.Senior Follow-up Questions:
Q: “Is it ever OK to stop a migration at 40%?”A: Yes, if the stopping point is a stable architecture, not a half-finished project. A “modular monolith plus 5 extracted services” can be a valid long-term architecture if: the extracted services are the ones that genuinely benefit from separate deployment (different scale, different team ownership), the remaining monolith is well-modularized internally, and the boundaries between them are stable. The failure mode to avoid is “we stopped but the half-state is chaos” — that is not a stopping point, it is abandonment.
Q: “How do you re-integrate a microservice back into a monolith if extraction was a mistake?”A: Reverse strangler fig. Step 1: the monolith gains a copy of the service’s code (or equivalent logic). Step 2: feature flag routes traffic to the monolith’s copy, starting at 1%, ramping to 100%. Step 3: the microservice is decommissioned. Data migration is the hard part: if the microservice has its own DB, that data must be imported back into the monolith’s DB. Segment documented this process; it takes weeks to months but is straightforward if done incrementally.
Q: “The hardest module to extract is the one that every other module depends on. How do you handle it?”A: Usually you do not extract it. A module that everything depends on (often user/auth, or some core domain concept) is either (a) something that should remain a shared library consumed by all services, (b) should be wrapped in a thin API service with minimal logic, not extracted in full. The “everything depends on it” property means extracting it creates a huge blast radius for any failure in the new service. Leave it in the monolith or turn it into a library.
Common Wrong Answers:
  • “Push through and finish the migration; do not lose momentum.” Sunk cost fallacy. The right answer is to evaluate current state versus end-state honestly, not to push because you already started.
  • “Fire the team and hire consultants.” Addresses neither the technical nor organizational root cause. The stall is usually caused by the migration plan being wrong, not the team being incompetent.
Further Reading:
  • “Goodbye Microservices: From 100s of problem children to 1 superstar” — Segment, 2018
  • “Monolith Decomposition Patterns” by Sam Newman — covers when to stop as well as when to extract
  • “The Majestic Monolith” — DHH’s essay on why modular monoliths are underrated
Strong Answer Framework:
  1. Diagnose the severity. Are the fields disjoint per-row (service A owns customers 1-1000, service B owns 1001-2000) or interleaved (both services write to all customers)? Interleaved is the distributed monolith nightmare.
  2. Measure actual drift. Write a reconciliation job that compares writes over 48 hours. How often do conflicting writes occur? Is it 1 in 10,000 or 1 in 10?
  3. Identify the “unified owner.” For each overlapping field, decide which service is the source of truth going forward. Every write of that field from the other service becomes an API call instead.
  4. Introduce the ownership layer before removing duplicates. Both services route writes through a shared DB procedure, Kafka topic, or API gateway that enforces “only one writer per field.” You cannot stop duplicate writes instantaneously; you must funnel them through a chokepoint first.
  5. Plan the cleanup. Once all writes flow through the chokepoint and data is converging, remove the chokepoint and let the service own its fields directly.
  6. Pause new features in this area. New features that touch customer fields must wait until ownership is clear. Otherwise every new feature adds to the mess.
Real-World Example: Salesforce (approximately 2017). Salesforce documented a multi-year effort to untangle overlapping writes between their monolith and extracted services. The fix involved creating “ownership adapters” that enforced which service could write which field, with the adapter layer logging violations so the team could find leftover code paths. The cleanup took 18 months after the original extraction.Senior Follow-up Questions:
Q: “What if you cannot pause feature work in this area during the cleanup?”A: Then you accept a longer cleanup period with more divergence risk. Mitigate by: (a) running the reconciliation job continuously and alerting on drift above a threshold; (b) requiring all new feature code to go through the new ownership layer (enforced via code review and CI checks); (c) scheduling a “drift cleanup” sprint every quarter to catch up. The cleanup will take 2-3x longer but it will happen. Alternative: frame the continued feature work as actively making the migration worse, and use that to negotiate a feature freeze — leadership usually agrees once they see the cost.
Q: “The monolith writes customer.email directly to the database. The new service writes customer.email via the monolith’s API. How do you migrate the monolith writes to go through the new service?”A: Branch by abstraction inside the monolith. Step 1: find every place the monolith writes customer.email directly. Step 2: introduce a CustomerEmailWriter interface with two implementations: LegacyDirectWriter (current behavior) and ServiceApiWriter (calls new service). Step 3: switch the monolith to use the interface, controlled by feature flag. Step 4: ramp the feature flag from 0% to 100% while monitoring for drift. Step 5: remove LegacyDirectWriter and the flag. Same approach for every field that needs ownership migrated.
Q: “Would event-driven architecture have prevented this?”A: Partially. Events enforce that state changes are observable, which makes drift easier to detect. But events do not enforce single ownership — two services can both emit CustomerEmailUpdated events, and consumers have to pick which to trust. The ownership discipline is still required. Event-driven architecture + clear aggregate ownership (only one service is allowed to emit a given event type) is the combination that prevents this problem. Events alone are not enough.
Common Wrong Answers:
  • “Just use distributed transactions (2PC) to keep them in sync.” 2PC is poorly supported in modern microservice stacks, blocks on the slowest participant, and does not solve the ownership question. The problem is ownership, not synchronization.
  • “The database will resolve it with last-write-wins.” Last-write-wins silently drops the losing write. If both writes are valid from their respective services’ perspectives, you are losing business data.
Further Reading:
  • “Saga Pattern” chapter in “Microservices Patterns” by Chris Richardson
  • “Data Management in a Microservice Architecture” — Chris Richardson’s microservices.io site
  • “Event-Driven Microservices” by Adam Bellemare — covers ownership patterns with events

Interview Deep-Dive

Strong Answer:My first step is not writing any code. It is mapping the monolith’s domain boundaries by analyzing three things: the codebase structure (which modules import from which), the database schema (which tables are joined together in queries), and the team structure (which team owns which features and where the merge conflicts happen).Month 1-2: Domain analysis. I run an Event Storming workshop with domain experts to identify bounded contexts. I analyze git logs to find modules that change together (high coupling) and modules that change independently (good extraction candidates). I produce a migration priority matrix: each candidate is scored on extraction difficulty (data coupling, API surface area) and business value (deployment independence, scaling needs).Month 3-4: Extract the first service. I pick the module with the lowest coupling and highest independent value — often something like Notification or Search. I use the Strangler Fig pattern: deploy the new service alongside the monolith, route traffic through an API proxy, and gradually shift requests to the new service. The monolith retains a stub that forwards to the new service, so the extraction is transparent to consumers.Month 5-6: Stabilize and learn. The first extraction teaches you everything you do not know about your infrastructure: do you have CI/CD that handles multiple services? Can your monitoring track cross-service requests? Do your tests cover the integration points? I fix every gap before extracting the second service.Month 7-12: Extract 2-3 more services, applying the lessons from the first. By month 12, I expect to have 3-5 extracted services, a proven migration playbook, and a clear understanding of the remaining work. The monolith still exists and handles 70-80% of the functionality — that is fine. The goal at 12 months is not completion; it is momentum and confidence.What I explicitly do NOT do: hire a separate team to rewrite the monolith in parallel. That is the big-bang rewrite anti-pattern. The migration team is the same team that owns the monolith.Follow-up: “How do you keep the monolith maintainable during the 2-3 year migration period when some functionality is in the monolith and some is in services?”This is the hardest part of any migration. The monolith becomes a hybrid system: some features are native, some are forwarded to external services. I keep it maintainable by: (1) introducing an anti-corruption layer inside the monolith that abstracts the external service calls behind the same interfaces the monolith already uses — existing code does not need to know whether it is calling a local module or an external service; (2) writing feature flags that allow instant rollback from the external service to the monolith’s original implementation if the service has issues; (3) maintaining clear documentation of which features live where, updated as part of every extraction PR. The worst outcome is an undocumented hybrid where nobody knows which codepath is active.
Strong Answer:The Strangler Fig pattern extracts functionality incrementally by routing requests through a proxy that directs traffic to either the monolith or the new service. Over time, more routes go to the new service until the monolith feature is completely replaced.The traffic routing phase is well-understood: put an API gateway or reverse proxy in front of the monolith, configure it to route specific paths to the new service, and gradually migrate paths. The real challenge is data migration — the new service needs its own database, but the monolith’s database has the data.I use a four-phase data migration approach. Phase one: the new service reads from its own database but writes to both its database AND the monolith’s database (via an API call or shared event). The monolith’s database remains the source of truth. This is the “shadow write” phase where you validate that the new service’s writes are correct by comparing them against the monolith’s data.Phase two: the new service becomes the source of truth for writes. It writes to its own database, and a CDC pipeline (Debezium) synchronizes changes back to the monolith’s database for any remaining monolith features that read this data. The monolith’s database is now a read replica.Phase three: migrate all read traffic to the new service. Update any remaining monolith features that read this data to call the new service’s API instead of querying the local database. Once all readers are migrated, the CDC synchronization can be disabled.Phase four: drop the tables from the monolith’s database. This is the “clean up” phase that teams often forget, leaving orphaned tables that confuse future developers.The key principle: at every phase, you can roll back. If the new service fails during phase two, you re-enable writes to the monolith’s database and restore it as the source of truth. Reversibility is non-negotiable for data migrations.Follow-up: “What about referential integrity? The monolith has foreign keys between the tables you are extracting and tables that remain in the monolith.”You break the foreign keys. This is uncomfortable for teams used to relational integrity guarantees, but it is unavoidable in microservices. The extracted service stores an ID reference (customer_id) without a foreign key constraint. Referential integrity is maintained through application-level checks (verify the customer exists via API before creating the order) and eventual consistency monitoring (a reconciliation job that checks for orphaned references). The foreign keys in the monolith’s database are dropped as part of the extraction, and replaced with soft references that are validated by the application layer.
Strong Answer:Branch by Abstraction is a technique for replacing an internal implementation within the monolith without creating a separate service. You introduce an abstraction (interface) around the component you want to replace, implement the new version behind the same interface, and use a feature flag to switch between old and new implementations. Once the new implementation is validated, you remove the old one.Strangler Fig works at the API/routing level — you redirect external traffic from the monolith to a new service. Branch by Abstraction works at the code level — you swap an internal implementation within the same codebase.I use Branch by Abstraction when: the component being replaced is internal (not exposed as an API), the replacement is a technology change rather than a service extraction (swapping a homegrown email sender for SendGrid, replacing a custom caching layer with Redis, migrating from one ORM to another), or when the team is not ready for the operational complexity of a separate service but needs to modernize the internals.The process: Step one, identify the component and all its callers within the monolith. Step two, create an interface that abstracts the component’s API (if one does not exist). Step three, refactor all callers to use the interface. Step four, create the new implementation behind the interface. Step five, use a feature flag to route some traffic to the new implementation. Step six, monitor and gradually increase the new implementation’s traffic share. Step seven, remove the old implementation and the feature flag.The critical advantage over Strangler Fig: no data migration. Since both implementations live in the same application, they can share the same database during the transition. This makes Branch by Abstraction significantly lower risk for internal component replacements.Follow-up: “Can you combine Branch by Abstraction and Strangler Fig in the same migration?”Yes, and this is actually the most common pattern in practice. Phase one: use Branch by Abstraction to introduce clean interfaces around the component within the monolith. This decouples the component from the rest of the codebase without any infrastructure changes. Phase two: use Strangler Fig to extract the component (now behind a clean interface) into its own service. The interface becomes the service’s API contract. The combination works because Branch by Abstraction handles the code-level decoupling (hardest to do safely) while Strangler Fig handles the infrastructure-level separation (straightforward once the code boundaries are clean).