Legacy Monolith Migration: Strangler Fig and CDC
Every organization that has attempted a monolith migration has a version of this story. You spent months building the new system in parallel. The cutover window arrived. Late one evening. Pizza ordered. War room set up. A data sync issue surfaced that nobody could resolve fast enough. You rolled back early the next morning. That was years ago. You’re still on the monolith.
The family moved out. The house was demolished. The new one wasn’t ready. Everyone’s been in a hotel for two years.
DORA’s research shows that incremental migration strategies correlate with better delivery outcomes than big-bang approaches. The pattern is consistent enough that it barely qualifies as an opinion anymore.
- Big-bang cutovers kill monolith migrations. The war room, the pizza, the rollback. Strangler Fig eliminates the cutover risk entirely by routing traffic one room at a time.
- Extract at the routing boundary, not the code boundary. Intercept HTTP requests at the load balancer. Route specific paths to the new service. The monolith doesn’t know it’s being replaced.
- The database is the real bottleneck. Application code is stateless and replaceable. The shared database is what creates coupling that defeats the purpose of extraction.
- CDC keeps both databases synchronized during the transition. The new service writes to its own database while changes stream back to the monolith’s database until final cutover.
- Extract the highest-value, lowest-coupling domain first. Not the easiest. Not the hardest. The one that delivers measurable business impact with the fewest cross-domain foreign key relationships.
The monolith works. Ugly, slow to change, makes everybody nervous. But it processes real transactions for real customers every day. The new system is prettier, faster, and has never survived contact with production traffic. The old house is drafty. The new one has never seen winter. Betting the business on a weekend cutover from the thing that works to the thing that hasn’t been battle-tested is the gamble that kills migration projects.
The approach that reliably works treats migration as an incremental routing problem, not a cutover event. The Strangler Fig pattern combined with Change Data Capture eliminates the cutover risk entirely. Renovate one room at a time. The family never moves out.
Why Big-Bang Rewrites Fail
A big-bang rewrite means building the new system in isolation and attempting a heroic weekend cutover. Demolish the house. Rebuild from scratch. Move everyone back in by Monday. Three structural failure modes show up with remarkable consistency.
Feature freeze. The legacy system stops evolving while the new system catches up to feature parity. Business stakeholders experience a 6-12 month pause in product development. The family can’t use the kitchen for a year while it’s being rebuilt. The business doesn’t care about your architecture. The business cares that it can’t ship features.
Data consistency risk. At the moment of cutover, any data mismatch or schema problem causes production bugs. With databases holding tens of millions of rows and thousands of transactions per hour, mismatches aren’t a risk. They’re a certainty. Finding and fixing them under a live cutover deadline is the scenario most post-mortems identify as the primary failure cause. Moving day. The furniture doesn’t fit through the new doors. You have 12 hours to fix it.
Irreversible rollback. If the new system fails under real traffic, rolling back means accepting that all data written during the cutover window is at risk. You choose between a broken new system and potentially lost transactions. Neither option is acceptable. Both are what you get.
Don’t: Build the new system in complete isolation for 12 months, then attempt a weekend cutover. The new system has never handled production traffic patterns, edge cases, or the years of undocumented behavior the monolith handles by accident. A house that’s never seen weather. The cutover fails because the system hasn’t been tested against reality.
Do: Route production traffic to the new service gradually, starting with reads. The new system proves itself under real load before it handles a single write. Rollback at any point is a gateway config change. Use the new kitchen’s fridge while still cooking on the old stove. If the fridge breaks, the old one is right there.
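The routing logic above can be sketched in application terms. This is a minimal illustration, not a production gateway: real deployments express the same idea as nginx, Envoy, or ALB configuration, and every name here (`ROUTES`, `backend_for`) is hypothetical.

```python
import hashlib

# Illustrative route table: path prefix -> fraction of traffic sent to the
# new service (1.0 = full cutover, 0.0 = instant rollback to the monolith).
ROUTES = {
    "/billing": 0.05,   # 5% canary for the newly extracted billing service
    "/users": 0.0,      # still entirely on the monolith
}

def backend_for(path: str, request_id: str) -> str:
    """Pick a backend deterministically, so the same request id always
    lands on the same side while traffic shifts gradually."""
    for prefix, fraction in ROUTES.items():
        if path.startswith(prefix):
            # Hash the request id into [0, 1); route the lowest slice
            # of the hash space to the new service.
            bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
            return "new-service" if bucket / 10_000 < fraction else "monolith"
    return "monolith"  # unmatched paths never leave the legacy system
```

The rollback story falls out of the data structure: setting a fraction back to `0.0` is the entire revert. A gateway with dynamic configuration does the same thing with one API call.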
The Incremental Migration, Phase by Phase
Instead of a hard cutover, route traffic gradually from the monolith to new microservices, one domain at a time. The Strangler Fig pattern combined with data engineering pipelines keeps data synchronized throughout. One room at a time. Plumbing connected the whole way.
| Phase | Duration | What Happens | Rollback |
|---|---|---|---|
| 1. CDC streaming | 1-2 weeks | Stream changes from monolith DB to event bus | Remove CDC, no impact |
| 2. Domain overlay | 2-4 weeks | New service reads from event stream, builds own state | Stop consumer, no impact |
| 3. Read routing | 1-2 weeks | Route read traffic to new service, writes stay on monolith | Revert gateway route |
| 4. Write cutover | 1-2 weeks | New service handles writes, bidirectional sync to monolith | Revert to monolith writes |
| 5. Decommission | 2-4 weeks | Monitor, then shut down monolith for this domain | Re-enable monolith (sync still running) |
Prerequisites before Phase 1 begins:
- API gateway or load balancer able to do path-based routing with dynamic configuration
- Legacy database supports CDC (PostgreSQL WAL, MySQL binlog, or equivalent transaction log)
- Event streaming infrastructure (Kafka or equivalent) deployed and running
- Domain boundaries mapped, including cross-domain foreign key dependencies
- Monitoring confirms CDC lag stays under 2 seconds during normal and peak load
Phase 1: Capturing Continuous Changes
A CDC connector attaches to the legacy database’s replication log. Debezium reads PostgreSQL’s WAL or MySQL’s binlog, streaming every insert, update, and delete into Kafka. No changes to legacy code. No schema modifications. No meaningful performance impact. The monolith doesn’t know the migration is happening. A pipe connecting the old plumbing to the new. The old house doesn’t feel a thing.
This is the phase where teams discover how much implicit domain knowledge lives in the schema. Billing isn’t cleanly separated from users. Foreign keys cross domain boundaries. A “customer” record is referenced by billing, shipping, user management, and analytics tables. Load-bearing walls where you expected partition walls. Mapping these dependencies is the real work of Phase 1, and it determines everything that follows. Don’t rush it. The dependency map is the most valuable artifact of the entire migration.
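The dependency map itself can be small and mechanical. A sketch under stated assumptions: the table-to-domain assignment is built by hand during Phase 1, and the foreign key pairs would come from querying the database catalog (in PostgreSQL, `information_schema`); both structures here are illustrative.

```python
# Hypothetical domain assignment for each table, built by hand in Phase 1.
DOMAIN = {
    "customers": "users",
    "billing_invoices": "billing",
    "shipments": "shipping",
    "page_views": "analytics",
}

# Foreign keys as (from_table, to_table) pairs. In PostgreSQL these come
# from information_schema.table_constraints / key_column_usage queries.
FOREIGN_KEYS = [
    ("billing_invoices", "customers"),
    ("shipments", "customers"),
    ("page_views", "customers"),
]

def cross_domain_fks(fks, domains):
    """Return the FKs that cross a domain boundary: the load-bearing walls."""
    return [(src, dst) for src, dst in fks if domains[src] != domains[dst]]
```

Every pair this function returns is a wall you have to reroute around before extraction, which is exactly why the map determines the order of everything that follows.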
Phase 2: Building the Domain Overlay
The new microservice starts with its database populated entirely by the CDC event stream. A perfectly synchronized replica of the billing domain, updating in near real-time. No users affected. No risk taken. The new kitchen exists. The fridge is stocked. Nobody’s cooking in it yet. The service exists to prove the data transformation is correct and the new schema handles every edge case the legacy schema contains.
You find bugs here without anyone noticing. And there will be bugs. The “address” field storing street addresses and P.O. boxes in the same column. The “status” field with 14 values where documentation lists 5. Discovering the old house has wiring nobody documented. Better to find them now than during a cutover with the war room watching.
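Finding those bugs before anyone notices usually means running a reconciliation job: snapshot both sides, compare row by row. A minimal sketch, assuming rows arrive as dicts keyed by primary key; the digest approach and function names are illustrative, not a specific tool's API.

```python
import hashlib
import json

def row_digest(row: dict) -> str:
    """Stable digest of a row, for comparing legacy vs new-service state."""
    return hashlib.sha256(
        json.dumps(row, sort_keys=True, default=str).encode()
    ).hexdigest()

def reconcile(legacy_rows, new_rows, key="id"):
    """Compare two snapshots keyed by primary key and report drift."""
    legacy = {r[key]: row_digest(r) for r in legacy_rows}
    new = {r[key]: row_digest(r) for r in new_rows}
    return {
        "missing": sorted(legacy.keys() - new.keys()),      # never arrived
        "extra": sorted(new.keys() - legacy.keys()),        # shouldn't exist
        "mismatched": sorted(k for k in legacy.keys() & new.keys()
                             if legacy[k] != new[k]),       # transformed wrong
    }
```

Run it nightly during Phase 2. A clean report for several weeks is the evidence that makes Phase 3 a decision instead of a leap.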
Phase 3: Routing Reads, Then Writes
Once monitoring confirms the new service’s data is synchronized (typically within 1-2 seconds of the legacy database), the API gateway begins routing read requests to the new service. The legacy monolith still handles all writes, which sync to the new service via CDC. Using the new kitchen’s fridge. Still cooking on the old stove. This read-first approach, standard in legacy modernization practice, cuts risk sharply. If something looks wrong, reverting read traffic back to the legacy system is a single gateway config change. One command. Instant.
When the read path proves stable under production load for 2-4 weeks, the gateway routes write operations to the new service. This phase requires nerve. The new service publishes state changes back to the event stream, keeping the legacy database synchronized. Cooking on the new stove. Old one still connected. This bidirectional sync means you can revert writes back to the legacy path at any point without data loss. Only after 30+ days of stable write operation do you decommission the legacy component for that domain.
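Bidirectional sync has one well-known trap: a write synced from the new service to the legacy database gets captured by CDC and streamed right back, looping forever. A common defense is stamping each event with its origin and dropping your own echoes. A minimal sketch; the event shape and names are assumptions for illustration.

```python
def apply_sync_event(event: dict, target: dict, this_side: str) -> bool:
    """Apply a change event to this side's store, skipping events this side
    originally produced. Assumes the producer stamps an 'origin' field on
    every event; without that stamp, bidirectional sync echoes forever."""
    if event["origin"] == this_side:
        return False  # our own write coming back around: drop it
    target[event["key"]] = event["value"]
    return True
```

The same idea appears in most replication systems under different names (origin tracking, loop prevention); whatever CDC tooling you use, confirm it has an equivalent before turning on the write path.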
Then you start the next room.
Choosing Which Domain to Extract First
Extract the domain with the highest business value and lowest database coupling first. Not the easiest domain. Not the most technically interesting. The one where independent deployability delivers measurable business impact fast enough to fund the harder extractions later. Renovate the kitchen first, not the guest bedroom. The kitchen is where the family spends the most time.
Domain analysis consistently surfaces 2-4 unknown cross-domain dependencies that would have caused integrity bugs in any approach. Finding them during analysis is enlightening. Finding them during a cutover is catastrophic. (Discovering the load-bearing wall after the sledgehammer.)
| Approach | Effort | Risk | Business Value |
|---|---|---|---|
| Start with easiest domain | Low | Low but deceptive (hard domains remain) | Low (proves the pattern but doesn’t justify continued investment) |
| Start with highest-value domain | Medium | Medium (reduced by CDC rollback) | High (justifies the remaining migration budget) |
| Start with most-coupled domain | High | High | Unpredictable (may stall the entire initiative) |
Timeline is real but manageable. First domain: 6-8 weeks including CDC setup (reused for later domains). Each additional domain: 4-12 weeks depending on complexity. Full migration of a medium monolith (200-500K lines, 6-10 domains): 12-18 months. Roughly the same timeline as a big-bang rewrite, with one critical difference: production value from week 6 instead of month 14. The family uses the new kitchen while the bathroom renovation is still in progress.
Handling Cross-Domain Foreign Keys During Extraction
The most common stall point is shared tables. A customers table referenced by billing, shipping, user management, and analytics through foreign keys can’t be cleanly extracted into any single domain. Load-bearing walls that touch every room. The practical solution: extract the customer record into a shared service with a read-only API, then replace direct foreign key references in each domain with API calls or event-driven denormalization. This adds latency on reads but eliminates the coupling that blocks all later extractions. Accept the trade-off early or accept the stall later. Reroute the plumbing before you tear out the walls.
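Event-driven denormalization, the second option above, can be sketched simply: each domain keeps a small local copy of the customer fields it needs, updated by consuming customer-changed events instead of joining through a foreign key. The class and event envelope below are illustrative assumptions, not a specific framework's API.

```python
class CustomerReadModel:
    """Local, denormalized copy of the shared customer record, kept fresh
    by consuming customer-changed events rather than a foreign key join."""

    def __init__(self):
        self._by_id = {}

    def on_customer_event(self, event: dict):
        # Illustrative envelope: {"id": ..., "name": ..., "deleted": bool}
        if event.get("deleted"):
            self._by_id.pop(event["id"], None)
        else:
            self._by_id[event["id"]] = {"name": event["name"]}

    def get(self, customer_id):
        # Local lookup: no cross-domain call, no FK, slightly stale data.
        return self._by_id.get(customer_id)
```

The trade-off is explicit in the code: reads are local and fast, but the copy lags the source by whatever the event stream's latency is. That staleness is the price of cutting the load-bearing wall.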
What the Industry Gets Wrong About Monolith Migration
“Microservices are the goal.” Microservices are a means. The goal is independent deployability, team autonomy, and the ability to scale specific components independently. If two domains still share a database after extraction, you have microservices with monolith coupling. Separate rooms with shared plumbing. The architecture looks modern. The operational reality hasn’t changed.
“Rewrite in parallel, cut over on a weekend.” The weekend cutover fails because the new system hasn’t handled production traffic patterns, edge cases, and the years of undocumented behavior the monolith handles by accident. The new house has never seen winter. Incremental traffic shifting with CDC eliminates the gamble entirely.
“Start with the easiest domain to prove the pattern.” The easiest domain proves the pattern but doesn’t justify the investment. Renovating the coat closet to prove you can renovate. When the budget review comes after the first extraction, a domain that saved 10 minutes of deployment time is a harder sell than one that enabled independent scaling of the checkout flow during peak traffic.
One operational note: running parallel systems during the transition carries its own compliance requirements (duplicated customer data, dual audit trails). DevOps and cloud migration teams need to own those from the start, not discover them at decommission time.
Same monolith. Same team. But instead of a war room and a rollback at dawn, traffic shifts 5% at a time through the gateway. No cutover window. No all-or-nothing bet. The last legacy domain gets decommissioned on an ordinary afternoon and nobody even notices. The family never moved out. The renovation finished room by room. That’s how migration is supposed to feel.