Migration Patterns
Migrating from a monolith to microservices is a journey, not a destination. The number one cause of failed migrations is not technical; it is impatience. Teams attempt “big bang” rewrites, run two systems in parallel for too long, or extract the wrong service first. The patterns in this chapter (Strangler Fig, Branch by Abstraction, CDC-based data sync) all share one philosophy: make the migration reversible at every step. If you cannot roll back any individual change, you are taking on risk that no pattern can save you from.
- Understand why incremental migration beats big-bang rewrites
- Master the Strangler Fig pattern
- Implement Branch by Abstraction
- Learn database migration strategies
- Handle dual-writes and data synchronization
Why Migrations Fail
Before reaching for any pattern, understand why most migrations go sideways. The failure modes below are not rare edge cases; they are the default outcome unless you actively steer away from them. Every experienced migration leader has scars from at least one of these. The common thread: each anti-pattern is born from an entirely reasonable-looking decision that compounds into disaster over months. Big-bang rewrites feel clean. Splitting by technical layer feels organized. Migrating data first feels like “getting the hard part over with.” All three are traps.
The Strangler Fig Pattern
Implementation
The strangler facade is the safest primitive in the migration toolbox because it is fundamentally additive. In Phase 1 you introduce a reverse proxy with every route still pointing at the monolith: behavior is byte-for-byte identical, but you now own the routing layer. Phase 2 begins the real migration: a single endpoint is routed to a new service behind a feature flag, starting at 1% of traffic. If error rates or latency degrade, you flip the flag back to 0% and traffic returns to the monolith in seconds, with no deploy and no rollback ceremony. You then ramp to 5%, 25%, 50%, 100% over days or weeks, watching dashboards and error budgets at each step.
Key decision points to proceed: the new service's error rate is no higher than the monolith's, p99 latency is within 20% of the monolith's, and business metrics (conversion, checkout success) are flat or better. Decision points to roll back: any hard error spike, data divergence detected in shadow comparisons, or an on-call page during ramp. Do not skip percentages and do not rush. The whole point of this pattern is that the ramp is slow enough to catch problems cheaply.
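The routing decision described above can be sketched in a few lines. This is a minimal illustration, not the chapter's original implementation: the upstream URLs, the `ROLLOUT` table, and the function names are hypothetical, and a real facade would read percentages from a feature-flag service rather than a module-level dict. Reading "within 1x" as "no higher than the monolith's error rate" is also an assumption.

```python
import hashlib

MONOLITH = "http://monolith.internal"    # hypothetical upstream
NEW_SERVICE = "http://orders.internal"   # hypothetical upstream

# Rollout percentage (0-100) per route prefix. Setting a value back to 0
# sends all traffic for that prefix to the monolith again.
ROLLOUT = {"/orders": 1}


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_upstream(path: str, user_id: str, rollout=ROLLOUT) -> str:
    """Pick the upstream for one request; the monolith is the default."""
    for prefix, percent in rollout.items():
        if path.startswith(prefix) and bucket(user_id) < percent:
            return NEW_SERVICE
    return MONOLITH


def may_ramp(new_error_rate: float, old_error_rate: float,
             new_p99_ms: float, old_p99_ms: float) -> bool:
    """Proceed gate: errors no worse than the monolith, p99 within 20%."""
    return (new_error_rate <= old_error_rate
            and new_p99_ms <= old_p99_ms * 1.20)
```

Hashing the user ID instead of drawing a random number keeps each user on the same side of the flag between requests, and a user who is in the 1% cohort stays in it when the percentage rises to 5% or 25%.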
Branch by Abstraction
Implementation Example
Branch by Abstraction is safer than Strangler Fig for internal components because the swap happens in-process: no network, no distributed transactions, no database-per-service problem during the transition. The migration has five measured phases, and you must not skip any of them.
- Phase 1 (Abstract): extract an interface around the existing implementation without changing callers’ behavior. This is a pure refactor; all tests pass.
- Phase 2 (Dual implementation): add the new implementation behind the same interface but keep the old one wired up. You now have two versions of the same contract.
- Phase 3 (Shadow): run both implementations for every call, use the legacy result as the source of truth, and log any divergence. Do this for at least a week in production before trusting the new implementation.
- Phase 4 (Cutover): feature-flag a percentage of traffic to use the new implementation as the primary. Ramp 1% -> 10% -> 50% -> 100%.
- Phase 5 (Cleanup): delete the old implementation and the feature flag. Actually delete it; do not leave it “just in case,” because dead code rots and becomes a liability.
Decision to roll back: any shadow-compare divergence you cannot explain, or a jump in error rate during the cutover ramp. Decision to proceed: zero unexplained divergences and stable latency for one full week at each ramp step. This pattern avoids the distributed monolith trap precisely because the new implementation lives in the same process. You only introduce the network hop once you are certain the logic is right.
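The phases above can be sketched as follows. This is an illustrative example, not the chapter's original code: the tax calculator domain, the class names, and the `use_new` callable (standing in for a feature-flag lookup) are all hypothetical.

```python
import logging
from typing import Callable, Protocol

log = logging.getLogger("migration")


class TaxCalculator(Protocol):
    """Phase 1: the abstraction that callers depend on."""
    def tax_cents(self, amount_cents: int) -> int: ...


class LegacyTaxCalculator:
    def tax_cents(self, amount_cents: int) -> int:
        return amount_cents * 8 // 100


class NewTaxCalculator:
    """Phase 2: a second implementation of the same contract."""
    def tax_cents(self, amount_cents: int) -> int:
        return (amount_cents * 8) // 100


class MigratingTaxCalculator:
    """Phases 3 and 4: shadow-compare, then flag-controlled cutover."""

    def __init__(self, legacy: TaxCalculator, new: TaxCalculator,
                 use_new: Callable[[], bool]):
        self._legacy = legacy
        self._new = new
        self._use_new = use_new   # hypothetical feature-flag lookup
        self.divergences = 0

    def tax_cents(self, amount_cents: int) -> int:
        if self._use_new():                      # Phase 4: new is primary
            return self._new.tax_cents(amount_cents)
        expected = self._legacy.tax_cents(amount_cents)
        try:                                     # Phase 3: shadow call
            actual = self._new.tax_cents(amount_cents)
            if actual != expected:
                self.divergences += 1
                log.warning("divergence for %s: legacy=%s new=%s",
                            amount_cents, expected, actual)
        except Exception:
            # A failure in the new code must never reach the caller.
            self.divergences += 1
            log.exception("new implementation raised")
        return expected                          # legacy stays the truth
```

Callers only ever see `TaxCalculator`, so Phase 5 reduces to replacing `MigratingTaxCalculator` with `NewTaxCalculator` at the single place where the dependency is wired up and deleting the other two classes.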
Database Migration Strategies
Data is where migrations go to die. The code-level patterns above (Strangler, Branch by Abstraction) are solved problems: they are well-understood, low-risk, and reversible. Database migrations are the opposite: they are high-risk, often irreversible, and every team discovers new failure modes the hard way. The four patterns below form a ladder from “quick but coupled” to “fully decoupled but expensive.” You will climb the ladder over the course of the migration, not jump straight to the top. Key decision points on the ladder: move up a rung when the current pattern’s coupling is blocking a release or degrading reliability; do not move up a rung just because the target pattern is more “architecturally pure.” A team stuck on Pattern 1 (shared DB) for nine months is usually fine; a team that jumped to Pattern 4 (DB per service) in month one is often drowning in eventual-consistency bugs. Do not migrate data ownership before code ownership. That is the distributed monolith trap in its purest form: two services writing to one database, each assuming ownership, corrupting each other’s invariants.
Change Data Capture Implementation
CDC is how mature migrations keep two databases in sync without asking application code to participate. The insight: databases already have a transaction log (WAL in PostgreSQL, binlog in MySQL) that records every change durably before it is visible to queries. CDC tools (Debezium is a widely used open-source choice) tail that log, translate each change into an event, and publish it to a message broker. Downstream consumers, including your new microservice’s database, subscribe to the relevant streams and apply changes.
Why this is safe: the source database’s transactions remain the single source of truth, and CDC only reads the log. If the new service falls behind, the monolith keeps working. If the new service’s database gets corrupted, you can rebuild it by replaying the captured changes.
Why this can still go wrong: schema changes in the source break the CDC pipeline if your consumers are not forward-compatible; out-of-order events during rebalancing cause ghost rows; and “last-write-wins” semantics can silently overwrite newer data when clocks drift.
Key decision point to proceed to the next phase: CDC lag stays under 5 seconds at p99 for two weeks under production load, and a reconciliation job reports zero divergence between source and replica. Key decision to roll back: any unexplained lag spike or any divergence in the reconciliation job. Do not ramp traffic until the root cause is understood.
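The consumer side of this pipeline is where the ghost-row and overwrite risks are handled. The sketch below shows one way to apply change events idempotently, under loud assumptions: the event shape (`op`, `lsn`, `before`, `after`) is a simplified stand-in loosely modelled on a Debezium-style envelope, not the exact format any connector emits; the `customers` table and function names are hypothetical; and SQLite stands in for the new service's database.

```python
import sqlite3


def open_replica() -> sqlite3.Connection:
    """Create the (hypothetical) replica table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers ("
        " id INTEGER PRIMARY KEY, email TEXT,"
        " lsn INTEGER NOT NULL, deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def apply_change(db: sqlite3.Connection, event: dict) -> bool:
    """Apply one change event; return False if it was stale or a duplicate.

    Ordering uses the source log position (lsn), not wall-clock time,
    so clock drift between machines cannot reorder writes.
    """
    op, lsn = event["op"], event["lsn"]
    row = event["before"] if op == "d" else event["after"]
    with db:  # one transaction per event
        seen = db.execute(
            "SELECT lsn FROM customers WHERE id = ?", (row["id"],)
        ).fetchone()
        if seen is not None and lsn <= seen[0]:
            return False  # replayed or out-of-order event: skip it
        # Deletes are kept as tombstones so that a late, older update
        # cannot resurrect the row.
        db.execute(
            "INSERT OR REPLACE INTO customers (id, email, lsn, deleted)"
            " VALUES (?, ?, ?, ?)",
            (row["id"], None if op == "d" else row["email"],
             lsn, 1 if op == "d" else 0),
        )
    return True
```

Readers of the replica filter on `deleted = 0`. Tombstones can be purged once the pipeline is known to have moved past their log position.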
Dual-Write Patterns
Dual-writes are where well-intentioned migrations silently corrupt data. The naive version, “write to old DB, then write to new DB,” has no atomicity: the process can crash between the two writes, the second DB can be temporarily unreachable, or a retry can double-apply the second write. The result is drift that nobody notices for weeks because both databases independently look healthy. The safe alternative is to write to exactly one store transactionally, and let something else propagate the change. The Transactional Outbox pattern does this by writing the row and an event record in the same DB transaction, then relying on a separate process to read the outbox and publish events. CDC does it by letting the database’s own WAL be the source of events. Both avoid the split-brain problem because there is only ever one authoritative write. Key decision point: if your data is non-financial and tolerates minutes of inconsistency, Outbox is usually enough and is simpler to operate. If you need sub-second sync, or cannot modify application code to write to the outbox table, use CDC. Never use naive dual-writes in production, even “just for now.” The inconsistencies compound faster than you can detect them, and the debugging burden falls on the on-call engineer at 3 AM.
Safe Dual-Write Implementation
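As a minimal sketch of the Transactional Outbox pattern described above, the example below writes a business row and its event in one transaction and publishes from a separate relay step. The `orders` and `outbox` tables, the topic name, and the `publish` callback are hypothetical, and SQLite stands in for the production database.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders ("
               " id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
    db.execute("CREATE TABLE outbox ("
               " id INTEGER PRIMARY KEY AUTOINCREMENT,"
               " topic TEXT NOT NULL, payload TEXT NOT NULL,"
               " published INTEGER NOT NULL DEFAULT 0)")
    return db


def place_order(db: sqlite3.Connection, order_id: int, total_cents: int) -> None:
    """Write the row and its event atomically: both commit or neither does."""
    with db:
        db.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)",
                   (order_id, total_cents))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("order.created",
                    json.dumps({"order_id": order_id,
                                "total_cents": total_cents})))


def relay_once(db: sqlite3.Connection, publish) -> int:
    """Publish pending events in insertion order; return how many were sent."""
    rows = db.execute("SELECT id, topic, payload FROM outbox"
                      " WHERE published = 0 ORDER BY id").fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # if this raises, the row stays pending
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                       (row_id,))
        sent += 1
    return sent
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next pass. Delivery is therefore at-least-once, and consumers need to be idempotent.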
Shadow mode is the safest starting place: you write to the legacy system, then also try writing to the new one, and compare results asynchronously. The customer’s request succeeds if the legacy write succeeds; the new service’s failure is invisible to them. This lets you accumulate a week of real production data showing the new implementation matches the legacy’s behavior before you trust it with real traffic.
Phases of a dual-write migration: (1) Shadow mode: legacy is primary, new is silent; (2) Primary-legacy dual-write: both receive real writes, legacy is source of truth; (3) Primary-new dual-write: both receive writes, new is source of truth, legacy is backup; (4) New-only: legacy is decommissioned.
Key decision to advance phases: zero divergence in reconciliation for 7+ days under production load. Key decision to roll back a phase: any divergence you cannot explain within 24 hours, any customer-visible inconsistency, or loss of on-call confidence during business hours. The comparison logic itself must be reviewed. Many teams declare victory because their “diff” function did not flag anything, when it was in fact silently ignoring the one field that mattered.
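Shadow mode and its comparison step can be sketched as follows. This is a simplified illustration with hypothetical names (`ShadowWriter`, `IGNORED_FIELDS`, the `legacy_write` and `new_write` callables); it compares inline for brevity, whereas a production version would likely hand the comparison to a background queue.

```python
from dataclasses import dataclass, field
from typing import Callable

# Fields deliberately excluded from comparison. Keep this list explicit,
# short, and reviewed: every entry is a field the diff cannot catch.
IGNORED_FIELDS = frozenset({"updated_at"})

_MISSING = object()  # distinguishes "key absent" from "value is None"


def diff(legacy: dict, new: dict, ignored=IGNORED_FIELDS) -> dict:
    """Return {field: (legacy_value, new_value)} for every mismatch."""
    keys = (legacy.keys() | new.keys()) - ignored
    return {
        k: (legacy.get(k, _MISSING), new.get(k, _MISSING))
        for k in sorted(keys)
        if legacy.get(k, _MISSING) != new.get(k, _MISSING)
    }


@dataclass
class ShadowWriter:
    legacy_write: Callable[[dict], dict]
    new_write: Callable[[dict], dict]
    divergences: list = field(default_factory=list)

    def write(self, record: dict) -> dict:
        # The legacy write decides success; its errors propagate as usual.
        result = self.legacy_write(record)
        try:
            delta = diff(result, self.new_write(record))
            if delta:
                self.divergences.append(delta)
        except Exception as exc:
            # New-system failures are recorded but never reach the caller.
            self.divergences.append({"error": repr(exc)})
        return result
```

Making the ignore list a named constant rather than logic buried inside `diff` is a deliberate choice: it gives reviewers one place to check that no field that matters is being skipped.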
Interview Questions
Q1: What is the Strangler Fig pattern?
Steps:
- Add a facade/proxy in front of monolith
- Extract one feature to microservice
- Route that feature’s traffic to new service
- Repeat until monolith is empty
- Remove monolith
Benefits:
- Zero downtime migration
- Gradual, low-risk
- Can stop/pause anytime
- Immediate value from extracted services
Q2: How do you handle database during migration?
- Shared database (temporary)
  - Quick start
  - Both read/write to same DB
- Database view
  - New service reads from view
  - Writes via API to monolith
- CDC synchronization
  - Capture changes from source
  - Replay to new database
  - Eventually consistent
- Database per service
  - Full data ownership
  - Communication via APIs/events
Q3: What's wrong with dual-writes?
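Naive dual-writes have no atomicity: a crash, an unreachable second database, or a retry between the two writes leaves the stores inconsistent. Safer alternatives:
- Outbox pattern:
  - Single DB transaction
  - Write data + event to outbox
  - Background processor publishes events
- Change Data Capture (CDC):
  - Capture changes from DB transaction log
  - Stream to message broker
  - New service consumes and applies
- Saga pattern:
  - Compensating transactions
  - Eventually consistent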
Q4: What is Branch by Abstraction?
Steps:
- Create abstraction interface
- Implement interface with existing code
- Change callers to use interface
- Create new implementation
- Switch implementations (feature flag)
- Remove old implementation
When to use:
- Replacing internal components
- Need to run old/new in parallel
- Want to compare implementations
Compared with Strangler Fig:
- Strangler: External facade, route traffic
- Branch: Internal abstraction, swap implementations
Chapter Summary
- Never do big-bang rewrites - use incremental patterns
- Strangler Fig: Route traffic gradually to new services
- Branch by Abstraction: Swap internal implementations safely
- Database migration is the hardest part - plan carefully
- Use CDC or Outbox pattern, never naive dual-writes
- Always have rollback capability
Interview Questions with Structured Answers
You have a 500K-line PHP monolith. The CTO wants microservices. Design the 18-month migration plan without stopping feature work.
- Establish baseline metrics first. Deploy frequency, deploy failure rate, incident frequency, mean time to recovery, engineer productivity signals (PR merge time, merge conflict rate). These become your success criteria.
- Create a steering committee. CTO, 2-3 senior engineers, product lead. Meets monthly. Has authority to halt the migration if metrics regress.
- Invest months 1-3 in platform foundations. CI/CD that supports multiple deployable artifacts, observability (tracing across services), service template, and on-call tooling. Without these, the first extraction is a disaster.
- Pick the pilot service carefully. A leaf module (notifications, image resizing, reporting) with clear boundaries, moderate business value, and a team willing to own it. Extract it using Strangler Fig with shadow mode for 2 weeks before any traffic cutover.
- Extract 1-2 more services in months 6-12. Apply lessons from the pilot. Do not accelerate; each extraction should feel easier than the last, not harder.
- Feature work continues in parallel. Feature teams continue to ship in the monolith. Only the migration team works on extractions. New features that fit the extracted domain go into the new service.
- Month 12-18: decision point. Review metrics against baseline. If deploy frequency is up, incident rate is flat or down, and team productivity is up, continue the migration. If not, either fix the root cause or halt.
- “Rewrite everything from scratch in Go over 18 months.” Big bang anti-pattern. Feature work stops, scope inflates, and the project fails. Interviewers asking this question are testing whether you recognize the anti-pattern.
- “Extract 20 services in the first 6 months.” Too aggressive. You do not yet have the platform foundations to operate 20 services. Each extraction that outpaces operational readiness creates incidents.
- “Monolith to Microservices” by Sam Newman — the definitive pattern catalog for migrations
- “Migrating from a Monolith to Microservices at Etsy” — multiple engineering blog posts and conference talks
- “Working Effectively with Legacy Code” by Michael Feathers — still the best book on safely refactoring existing systems
Your strangler fig migration has stalled at 40%. The monolith still handles the hard business logic and no one wants to extract more. What do you do?
- Acknowledge the economic reality. The first 40% was the easy 40%. The remaining 60% is harder, and the team has discovered that microservices have real costs. The stall is rational.
- Evaluate whether to continue at all. Would a modular monolith with 3-5 extracted services be better than pushing for 100% extraction? Often yes.
- Identify the specific blockers. Is it technical (deep coupling), organizational (no team owns the hardest module), or political (stakeholders do not see value)?
- Propose three options. Continue (with remediation for the specific blockers), pause (declare the hybrid state the target, document it), or retreat (re-integrate some services back into the monolith if the extraction did not deliver value).
- Make the decision with data. Measure current-state productivity and reliability. If the hybrid state is working, stopping there is a valid outcome.
- “Push through and finish the migration; do not lose momentum.” Sunk cost fallacy. The right answer is to evaluate current state versus end-state honestly, not to push because you already started.
- “Fire the team and hire consultants.” Addresses neither the technical nor organizational root cause. The stall is usually caused by the migration plan being wrong, not the team being incompetent.
- “Goodbye Microservices: From 100s of problem children to 1 superstar” — Segment, 2018
- “Monolith Decomposition Patterns” by Sam Newman — covers when to stop as well as when to extract
- “The Majestic Monolith” — DHH’s essay on why modular monoliths are underrated
You are 6 months into extracting a customer service. You just realized the monolith and the new service are writing to overlapping customer fields in the same database. How bad is this and how do you fix it?
- Diagnose the severity. Are the fields disjoint per-row (service A owns customers 1-1000, service B owns 1001-2000) or interleaved (both services write to all customers)? Interleaved is the distributed monolith nightmare.
- Measure actual drift. Write a reconciliation job that compares writes over 48 hours. How often do conflicting writes occur? Is it 1 in 10,000 or 1 in 10?
- Identify the “unified owner.” For each overlapping field, decide which service is the source of truth going forward. Every write of that field from the other service becomes an API call instead.
- Introduce the ownership layer before removing duplicates. Both services route writes through a shared DB procedure, Kafka topic, or API gateway that enforces “only one writer per field.” You cannot stop duplicate writes instantaneously; you must funnel them through a chokepoint first.
- Plan the cleanup. Once all writes flow through the chokepoint and data is converging, remove the chokepoint and let the service own its fields directly.
- Pause new features in this area. New features that touch customer fields must wait until ownership is clear. Otherwise every new feature adds to the mess.
[...] CustomerEmailWriter interface with two implementations: LegacyDirectWriter (current behavior) and ServiceApiWriter (calls new service). Step 3: switch the monolith to use the interface, controlled by feature flag. Step 4: ramp the feature flag from 0% to 100% while monitoring for drift. Step 5: remove LegacyDirectWriter and the flag. Same approach for every field that needs ownership migrated.
[...] CustomerEmailUpdated events, and consumers have to pick which to trust. The ownership discipline is still required. Event-driven architecture + clear aggregate ownership (only one service is allowed to emit a given event type) is the combination that prevents this problem. Events alone are not enough.
- “Just use distributed transactions (2PC) to keep them in sync.” 2PC is poorly supported in modern microservice stacks, blocks on the slowest participant, and does not solve the ownership question. The problem is ownership, not synchronization.
- “The database will resolve it with last-write-wins.” Last-write-wins silently drops the losing write. If both writes are valid from their respective services’ perspectives, you are losing business data.
- “Saga Pattern” chapter in “Microservices Patterns” by Chris Richardson
- “Data Management in a Microservice Architecture” — Chris Richardson’s microservices.io site
- “Event-Driven Microservices” by Adam Bellemare — covers ownership patterns with events
Interview Deep-Dive
'Your company has a 500,000-line monolith. You have been tasked with leading the migration to microservices. What is your first step, and what is your 12-month plan?'
'Explain the Strangler Fig pattern in detail. How do you handle the data migration part, which is usually the hardest piece?'
'What is Branch by Abstraction, and when would you use it instead of Strangler Fig?'