
Legacy Monolith Migration: Strangler Fig and CDC

Metasphere Engineering · 13 min read

Every organization that has attempted a monolith migration has a version of this story. You spent months building the new system in parallel. The cutover window arrived. Late one evening. Pizza ordered. War room set up. A data sync issue surfaced that nobody could resolve fast enough. You rolled back early the next morning. That was years ago. You’re still on the monolith.

The family moved out. The house was demolished. The new one wasn’t ready. Everyone’s been in a hotel for two years.

DORA’s research shows that incremental migration strategies correlate with better delivery outcomes than big-bang approaches. The pattern is consistent enough that it barely qualifies as an opinion anymore.

Key takeaways
  • Big-bang cutovers kill monolith migrations. The war room, the pizza, the rollback. Strangler Fig eliminates the cutover risk entirely by routing traffic one room at a time.
  • Extract at the routing boundary, not the code boundary. Intercept HTTP requests at the load balancer. Route specific paths to the new service. The monolith doesn’t know it’s being replaced.
  • The database is the real bottleneck. Application code is stateless and replaceable. The shared database is what creates coupling that defeats the purpose of extraction.
  • CDC keeps both databases synchronized during the transition. The new service writes to its own database while changes stream back to the monolith’s database until final cutover.
  • Extract the highest-value, lowest-coupling domain first. Not the easiest. Not the hardest. The one that delivers measurable business impact with the fewest cross-domain foreign key relationships.

The monolith works. Ugly, slow to change, makes everybody nervous. But it processes real transactions for real customers every day. The new system is prettier, faster, and has never survived contact with production traffic. The old house is drafty. The new one has never seen winter. Betting the business on a weekend cutover from the thing that works to the thing that hasn’t been battle-tested is the gamble that kills migration projects.

The approach that reliably works treats migration as an incremental routing problem, not a cutover event. The Strangler Fig pattern combined with Change Data Capture eliminates the cutover risk entirely. Renovate one room at a time. The family never moves out.

[Diagram: Strangler Fig, incremental module extraction. A monolith with five modules (Orders, Users, Billing, Inventory, Reports). The Orders module is extracted via a CDC stream into a new microservice with its own database; read traffic shifts first, then write traffic, with instant rollback at every step. Once Orders is retired, the pattern repeats for the Users service. Caption: Each module migrates independently. Rollback is instant.]

Why Big-Bang Rewrites Fail

A big-bang rewrite means building the new system in isolation and attempting a heroic weekend cutover. Demolish the house. Rebuild from scratch. Move everyone back in by Monday. Three structural failure modes show up with remarkable consistency.

Feature freeze. The legacy system stops evolving while the new system catches up to feature parity. Business stakeholders experience a 6-12 month pause in product development. The family can’t use the kitchen for a year while it’s being rebuilt. The business doesn’t care about your architecture. The business cares that it can’t ship features.

Data consistency risk. At the moment of cutover, any data mismatch or schema problem causes production bugs. With databases holding tens of millions of rows and thousands of transactions per hour, mismatches aren’t a risk. They’re a certainty. Finding and fixing them under a live cutover deadline is the scenario most post-mortems identify as the primary failure cause. Moving day. The furniture doesn’t fit through the new doors. You have 12 hours to fix it.

Irreversible rollback. If the new system fails under real traffic, rolling back means accepting that all data written during the cutover window is at risk. You choose between a broken new system and potentially lost transactions. Neither option is acceptable. Both are what you get.

[Diagram: Big-bang rewrite vs incremental strangler, timeline comparison. Big-bang rewrite: 18 months with zero value until launch day, requirements drift the whole time, 70% get cancelled before completion, an all-or-nothing gamble. Incremental strangler: value delivered every 2 weeks, risk contained to each extraction, can stop at any point and still have working software. Caption: The rewrite that ships nothing for 18 months ships nothing ever.]
Anti-pattern

Don’t: Build the new system in complete isolation for 12 months, then attempt a weekend cutover. The new system has never handled production traffic patterns, edge cases, or the years of undocumented behavior the monolith handles by accident. A house that’s never seen weather. The cutover fails because the system hasn’t been tested against reality.

Do: Route production traffic to the new service gradually, starting with reads. The new system proves itself under real load before it handles a single write. Rollback at any point is a gateway config change. Use the new kitchen’s fridge while still cooking on the old stove. If the fridge breaks, the old one is right there.

The Incremental Migration, Phase by Phase

Instead of a hard cutover, route traffic gradually from the monolith to new microservices, one domain at a time. The Strangler Fig pattern combined with a CDC pipeline keeps data synchronized throughout. One room at a time. Plumbing connected the whole way.

Phase | Duration | What Happens | Rollback
1. CDC streaming | 1-2 weeks | Stream changes from monolith DB to event bus | Remove CDC, no impact
2. Domain overlay | 2-4 weeks | New service reads from event stream, builds own state | Stop consumer, no impact
3. Read routing | 1-2 weeks | Route read traffic to new service, writes stay on monolith | Revert gateway route
4. Write cutover | 1-2 weeks | New service handles writes, bidirectional sync to monolith | Revert to monolith writes
5. Decommission | 2-4 weeks | Monitor, then shut down monolith for this domain | Re-enable monolith (sync still running)
Prerequisites
  1. API gateway or load balancer able to do path-based routing with dynamic configuration
  2. Legacy database supports CDC (PostgreSQL WAL, MySQL binlog, or equivalent transaction log); a quick readiness check is sketched after this list
  3. Event streaming infrastructure (Kafka or equivalent) deployed and running
  4. Domain boundaries mapped, including cross-domain foreign key dependencies
  5. Monitoring confirms CDC lag stays under 2 seconds during normal and peak load
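For the PostgreSQL case, prerequisites 2 and 5 can be verified with a short script like the minimal sketch below. It assumes psycopg2 and a placeholder connection string; the lag query in the second half only returns rows once a logical replication slot exists (for example, one created by Debezium).

```python
# Sketch: verify the legacy PostgreSQL instance is ready for CDC and watch slot lag.
# Assumes psycopg2 is installed; the DSN below is a placeholder.
import psycopg2

DSN = "postgresql://migration_ro:secret@legacy-db:5432/legacy"  # hypothetical

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    # Logical decoding requires wal_level = logical and a free replication slot.
    cur.execute("SHOW wal_level")
    wal_level = cur.fetchone()[0]
    cur.execute("SHOW max_replication_slots")
    max_slots = int(cur.fetchone()[0])
    print(f"wal_level={wal_level} (need 'logical'), max_replication_slots={max_slots}")

    # Once a CDC slot exists, track how far it lags behind the current WAL position.
    cur.execute("""
        SELECT slot_name,
               pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
        FROM pg_replication_slots
        WHERE slot_type = 'logical'
    """)
    for slot_name, lag_bytes in cur.fetchall():
        print(f"slot {slot_name}: {lag_bytes} bytes behind")
```

Wire the lag query into your monitoring so the "under 2 seconds" prerequisite is checked continuously, not just once during setup.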

Phase 1: Capturing Continuous Changes

A CDC connector attaches to the legacy database’s replication log. Debezium reads PostgreSQL’s WAL or MySQL’s binlog, streaming every insert, update, and delete into Kafka. No changes to legacy code. No schema modifications. No meaningful performance impact. The monolith doesn’t know the migration is happening. A pipe connecting the old plumbing to the new. The old house doesn’t feel a thing.
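As a concrete illustration, registering a Debezium PostgreSQL connector is a single call to the Kafka Connect REST API. The sketch below assumes a Kafka Connect worker reachable at connect:8083; the hostnames, credentials, and table names are placeholders, and exact property names vary slightly across Debezium versions.

```python
# Sketch: register a Debezium connector for the legacy PostgreSQL database
# via the Kafka Connect REST API. Hostnames, credentials, and table names
# are placeholders; property names follow recent Debezium releases.
import requests

connector = {
    "name": "legacy-billing-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",                 # built-in logical decoding plugin
        "database.hostname": "legacy-db",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "legacy",
        "topic.prefix": "legacy",                  # topics become legacy.<schema>.<table>
        "table.include.list": "public.invoices,public.payments",  # hypothetical billing tables
        "slot.name": "billing_cdc_slot",
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print("connector registered:", resp.json()["name"])
```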

This is the phase where teams discover how much implicit domain knowledge lives in the schema. Billing isn’t cleanly separated from users. Foreign keys cross domain boundaries. A “customer” record is referenced by billing, shipping, user management, and analytics tables. Load-bearing walls where you expected partition walls. Mapping these dependencies is the real work of Phase 1, and it determines everything that follows. Don’t rush it. The dependency map is the most valuable artifact of the entire migration.
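One way to start that dependency map is to ask the database catalog which tables point at a candidate domain's tables. The sketch below queries information_schema for every foreign key referencing a hypothetical customers table; psycopg2 and the DSN are assumptions.

```python
# Sketch: list every foreign key that references a given table, to surface
# cross-domain coupling before choosing what to extract. Table name and DSN
# are placeholders.
import psycopg2

DSN = "postgresql://migration_ro:secret@legacy-db:5432/legacy"  # hypothetical
TARGET_TABLE = "customers"  # the table whose inbound references we want to see

QUERY = """
    SELECT tc.table_name   AS referencing_table,
           kcu.column_name AS fk_column,
           ccu.table_name  AS referenced_table
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
      ON tc.constraint_name = kcu.constraint_name
     AND tc.table_schema = kcu.table_schema
    JOIN information_schema.constraint_column_usage ccu
      ON tc.constraint_name = ccu.constraint_name
     AND tc.table_schema = ccu.table_schema
    WHERE tc.constraint_type = 'FOREIGN KEY'
      AND ccu.table_name = %s
    ORDER BY tc.table_name
"""

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(QUERY, (TARGET_TABLE,))
    for referencing_table, fk_column, referenced_table in cur.fetchall():
        print(f"{referencing_table}.{fk_column} -> {referenced_table}")
```

Run it for each candidate domain's core tables and the coupling picture that drives the extraction order falls out of the results.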

Phase 2: Building the Domain Overlay

The new microservice starts with its database populated entirely by the CDC event stream. A perfectly synchronized replica of the billing domain, updating in near real-time. No users affected. No risk taken. The new kitchen exists. The fridge is stocked. Nobody’s cooking in it yet. The service exists to prove the data transformation is correct and the new schema handles every edge case the legacy schema contains.
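In its simplest form the overlay is a consumer that replays change events into the new service's own schema. The sketch below assumes kafka-python and psycopg2, a hypothetical legacy.public.invoices topic, and hypothetical column names; a production consumer would also handle deletes, schema changes, and batching.

```python
# Sketch: build the new service's state from the CDC stream.
# Topic name, table layout, and connection strings are placeholders.
import json
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "legacy.public.invoices",                 # hypothetical Debezium topic
    bootstrap_servers="kafka:9092",
    group_id="billing-overlay",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v) if v else None,
)

conn = psycopg2.connect("postgresql://billing:secret@billing-db:5432/billing")  # hypothetical
conn.autocommit = True

UPSERT = """
    INSERT INTO invoices (id, customer_id, amount_cents, status)
    VALUES (%(id)s, %(customer_id)s, %(amount_cents)s, %(status)s)
    ON CONFLICT (id) DO UPDATE
    SET customer_id = EXCLUDED.customer_id,
        amount_cents = EXCLUDED.amount_cents,
        status = EXCLUDED.status
"""

for message in consumer:
    event = message.value
    if event is None:                         # tombstone after a delete
        continue
    payload = event.get("payload", event)     # envelope depends on converter settings
    row = payload.get("after")
    if row is None:                           # delete events carry no "after" image
        continue
    with conn.cursor() as cur:
        cur.execute(UPSERT, row)
```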

You find bugs here without anyone noticing. And there will be bugs. The “address” field storing street addresses and P.O. boxes in the same column. The “status” field with 14 values where documentation lists 5. Discovering the old house has wiring nobody documented. Better to find them now than during a cutover with the war room watching.
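A lightweight way to surface those bugs is a reconciliation job that compares the two databases while nobody depends on the new one yet. The sketch below compares row counts and a sample of recent primary keys per table; the DSNs and table list are assumptions, it presumes an integer id primary key, and real reconciliation usually also compares per-row checksums.

```python
# Sketch: reconcile the legacy table against the overlay copy while the new
# service is still dark. DSNs and table names are placeholders.
import psycopg2

LEGACY_DSN = "postgresql://migration_ro:secret@legacy-db:5432/legacy"  # hypothetical
OVERLAY_DSN = "postgresql://billing:secret@billing-db:5432/billing"    # hypothetical
TABLES = ["invoices", "payments"]                                      # hypothetical

def fetch_count_and_sample(dsn: str, table: str):
    """Return the row count and the 1000 most recent ids for one table."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {table}")
        count = cur.fetchone()[0]
        cur.execute(f"SELECT id FROM {table} ORDER BY id DESC LIMIT 1000")
        recent_ids = {row[0] for row in cur.fetchall()}
    return count, recent_ids

for table in TABLES:
    legacy_count, legacy_ids = fetch_count_and_sample(LEGACY_DSN, table)
    overlay_count, overlay_ids = fetch_count_and_sample(OVERLAY_DSN, table)
    missing = legacy_ids - overlay_ids
    print(f"{table}: legacy={legacy_count} overlay={overlay_count} "
          f"missing_recent_rows={len(missing)}")
```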

Phase 3: Routing Reads, Then Writes

Once monitoring confirms the new service’s data is synchronized (typically within 1-2 seconds of the legacy database), the API gateway begins routing read requests to the new service. The legacy monolith still handles all writes, which sync to the new service via CDC. Using the new kitchen’s fridge. Still cooking on the old stove. This read-first approach, standard practice in legacy modernization, cuts risk sharply. If something looks wrong, reverting read traffic back to the legacy system is a single gateway config change. One command. Instant.
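What that "single gateway config change" controls depends on the gateway, but the routing decision itself is simple. The sketch below illustrates the idea in Python: reads for the extracted domain go to the new service for a configurable percentage of requests, everything else stays on the monolith. The service URLs and rollout knob are assumptions, not any real gateway's API.

```python
# Sketch: the routing decision behind the read-first cutover. Not a real
# gateway configuration; URLs and the rollout percentage are placeholders.
import random

MONOLITH = "http://monolith.internal"          # hypothetical
BILLING_SERVICE = "http://billing.internal"    # hypothetical
READ_ROLLOUT_PERCENT = 25                      # dial up as confidence grows; 0 = instant rollback

def choose_upstream(method: str, path: str) -> str:
    """Send a slice of billing reads to the new service; everything else stays put."""
    is_billing_read = method == "GET" and path.startswith("/billing/")
    if is_billing_read and random.uniform(0, 100) < READ_ROLLOUT_PERCENT:
        return BILLING_SERVICE
    return MONOLITH

# Example: a billing read has a 25% chance of hitting the new service.
print(choose_upstream("GET", "/billing/invoices/42"))
print(choose_upstream("POST", "/billing/invoices"))   # always the monolith in this phase
```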

When the read path proves stable under production load for 2-4 weeks, the gateway routes write operations to the new service. This phase requires nerve. The new service publishes state changes back to the event stream, keeping the legacy database synchronized. Cooking on the new stove. Old one still connected. This bidirectional sync means you can revert writes back to the legacy path at any point without data loss. Only after 30+ days of stable write operation do you decommission the legacy component for that domain.
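The sync back to the legacy database is the part teams most often under-specify. One common shape, sketched below under heavy assumptions, is a transactional outbox in the new service: the business write and an event row commit together, and a relay publishes the event to Kafka for the legacy-side consumer (omitted here) to apply. All names, table layouts, and connection strings are hypothetical.

```python
# Sketch: transactional-outbox flavour of the write-back sync. The new service
# commits the business write and an outbox row in one transaction; a separate
# relay publishes outbox rows to Kafka. Names and schemas are placeholders.
import json
import psycopg2
from kafka import KafkaProducer

conn = psycopg2.connect("postgresql://billing:secret@billing-db:5432/billing")  # hypothetical

def record_payment(invoice_id: int, amount_cents: int) -> None:
    """Business write and outbox row commit atomically."""
    with conn:                                   # one transaction
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE invoices SET status = 'paid' WHERE id = %s", (invoice_id,)
            )
            cur.execute(
                "INSERT INTO outbox (topic, payload) VALUES (%s, %s)",
                ("billing.events", json.dumps(
                    {"type": "invoice_paid", "id": invoice_id, "amount_cents": amount_cents}
                )),
            )

def relay_outbox(producer: KafkaProducer) -> None:
    """Publish unsent outbox rows, oldest first; delete them once acknowledged."""
    with conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id, topic, payload FROM outbox ORDER BY id LIMIT 100")
            for row_id, topic, payload in cur.fetchall():
                producer.send(topic, payload.encode("utf-8")).get(timeout=10)
                cur.execute("DELETE FROM outbox WHERE id = %s", (row_id,))

producer = KafkaProducer(bootstrap_servers="kafka:9092")
record_payment(42, 1999)
relay_outbox(producer)
```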

Then you start the next room.

[Diagram: Strangler Fig, progressive domain extraction through an API gateway acting as a facade that routes by domain, not by service. The monolith still serves billing, orders, and auth, with its traffic decreasing over time. The User Service is fully extracted, independently deployable, and receives 100% of user traffic. The Catalog Service is in progress, validated with shadow traffic as load gradually shifts from the monolith. Caption: Extract one domain. Validate. Route traffic. Repeat. The monolith shrinks naturally.]

Choosing Which Domain to Extract First

The Strangler Stall: the point where incremental migration stops making progress because the remaining domains are deeply coupled through shared database tables, foreign keys, and transaction boundaries. The easy rooms are renovated. The remaining ones share load-bearing walls. The easy extractions are done. The hard ones require database decoupling that is architecturally complex and politically difficult. Most stalled migrations are stuck here because the first extraction was chosen for ease, not impact.

Extract the domain with the highest business value and lowest database coupling first. Not the easiest domain. Not the most technically interesting. The one where independent deployability delivers measurable business impact fast enough to fund the harder extractions later. Renovate the kitchen first, not the guest bedroom. The kitchen is where the family spends the most time.

Domain analysis consistently surfaces 2-4 unknown cross-domain dependencies that would have caused integrity bugs in any approach. Finding them during analysis is enlightening. Finding them during a cutover is catastrophic. (Discovering the load-bearing wall after the sledgehammer.)

Approach | Effort | Risk | Business Value
Start with easiest domain | Low | Low but deceptive (hard domains remain) | Low (proves the pattern but doesn’t justify continued investment)
Start with highest-value domain | Medium | Medium (reduced by CDC rollback) | High (justifies the remaining migration budget)
Start with most-coupled domain | High | High | Unpredictable (may stall the entire initiative)

Timeline is real but manageable. First domain: 6-8 weeks including CDC setup (reused for later domains). Each additional domain: 4-12 weeks depending on complexity. Full migration of a medium monolith (200-500K lines, 6-10 domains): 12-18 months. Roughly the same timeline as a big-bang rewrite, with one critical difference: production value from week 6 instead of month 14. The family uses the new kitchen while the bathroom renovation is still in progress.

[Diagram: CDC pipeline keeping legacy and new databases in sync. The legacy database (Oracle / SQL Server) keeps receiving writes. Debezium captures every change from the transaction log with sub-second latency and streams ordered, durable change events through Kafka into the new service's database (PostgreSQL / Aurora), which stays current with the legacy system. Caption: CDC means you never stop the legacy system. Both databases stay in sync until cutover.]
Handling cross-domain foreign keys during extraction

The most common stall point is shared tables. A customers table referenced by billing, shipping, user management, and analytics through foreign keys can’t be cleanly extracted into any single domain. Load-bearing walls that touch every room. The practical solution: extract the customer record into a shared service with a read-only API, then replace direct foreign key references in each domain with API calls or event-driven denormalization. This adds latency on reads but eliminates the coupling that blocks all later extractions. Accept the trade-off early or accept the stall later. Reroute the plumbing before you tear out the walls.
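What "replace the foreign key with an API call" looks like in practice is a small client in each extracted domain, usually with a short-lived cache to soften the added read latency. The sketch below is one possible shape, assuming a read-only customer service at a hypothetical URL; field names and the TTL are also assumptions.

```python
# Sketch: replacing a cross-domain foreign-key join with a call to the shared,
# read-only customer service. URL, fields, and TTL are placeholders.
import time
import requests

CUSTOMER_SERVICE_URL = "http://customers.internal/api/customers"  # hypothetical
CACHE_TTL_SECONDS = 60

_cache: dict[int, tuple[float, dict]] = {}

def get_customer(customer_id: int) -> dict:
    """Fetch a customer snapshot, caching briefly to soften the latency cost."""
    cached = _cache.get(customer_id)
    if cached and time.monotonic() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    resp = requests.get(f"{CUSTOMER_SERVICE_URL}/{customer_id}", timeout=2)
    resp.raise_for_status()
    customer = resp.json()
    _cache[customer_id] = (time.monotonic(), customer)
    return customer

# Where billing used to JOIN customers, it now resolves the reference explicitly.
def invoice_header(invoice: dict) -> str:
    customer = get_customer(invoice["customer_id"])
    return f"Invoice {invoice['id']} for {customer['name']}"
```

Event-driven denormalization is the other option mentioned above: instead of calling the API on every read, the domain keeps a local snapshot table updated from customer-changed events, trading freshness for latency.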

What the Industry Gets Wrong About Monolith Migration

“Microservices are the goal.” Microservices are a means. The goal is independent deployability, team autonomy, and the ability to scale specific components independently. If two domains still share a database after extraction, you have microservices with monolith coupling. Separate rooms with shared plumbing. The architecture looks modern. The operational reality hasn’t changed.

“Rewrite in parallel, cut over on a weekend.” The weekend cutover fails because the new system hasn’t handled production traffic patterns, edge cases, and the years of undocumented behavior the monolith handles by accident. The new house has never seen winter. Incremental traffic shifting with CDC eliminates the gamble entirely.

“Start with the easiest domain to prove the pattern.” The easiest domain proves the pattern but doesn’t justify the investment. Renovating the coat closet to prove you can renovate. When the budget review comes after the first extraction, a domain that saved 10 minutes of deployment time is a harder sell than one that enabled independent scaling of the checkout flow during peak traffic.

Our take: extract the domain with the highest business value and lowest database coupling first. Early wins fund the harder extractions later. A migration that delivers measurable business impact in week 6 gets more budget and organizational patience than one that promises value in month 14. Renovate the kitchen first. It’s where the family lives.

DevOps and cloud migration teams handle the compliance requirements for running parallel systems during the transition.

Same monolith. Same team. But instead of a war room and a rollback at dawn, traffic shifts 5% at a time through the gateway. No cutover window. No all-or-nothing bet. The last legacy domain gets decommissioned on an ordinary afternoon and nobody even notices. The family never moved out. The renovation finished room by room. That’s how migration is supposed to feel.

Migrate Without the Big-Bang Panic

Stop hoping the weekend cutover goes perfectly. Zero-downtime migration with event streaming and incremental rollouts keeps the legacy system running throughout. Rollback capability at every stage means a failed extraction is a learning experience, not a disaster.

Plan Your Safe Migration

Frequently Asked Questions

Why are database migrations the hardest part of monolith modernization?

Databases contain state that can’t be freely rolled back like stateless application code. A production database collecting thousands of transactions per hour means even a 2-hour cutover window risks thousands of unrecoverable writes if something goes wrong. CDC-based migrations keep both systems synchronized all the time, making rollback instant at any stage.

How does Change Data Capture differ from a database backup?

A backup is a static snapshot, typically 8-24 hours stale by the time it’s used. CDC continuously reads the database’s transaction log, streaming every change as it happens with sub-second latency. CDC-based synchronization keeps the target database within 1-2 seconds of the source, allowing safe parallel operation without taking the source offline.

What is the Strangler Fig pattern for legacy modernization?

The Strangler Fig builds new services around the edges of a monolith, routing traffic to them one domain at a time via an API gateway. Each domain migration takes 4-12 weeks and is independently reversible. The monolith shrinks over 12-18 months rather than being replaced in a single weekend. Each room gets renovated while the family stays in the house.

Can the legacy system keep running while the new system is being built?

Yes. The CDC pipeline reads from the legacy database’s transaction log without impacting performance, allowing normal business operations throughout. No feature freeze and no big-bang cutover. Business stakeholders never experience a development pause while the migration progresses domain by domain.

What happens if the new service fails during the migration?

Because the API gateway acts as a dynamic router and data stays synchronized between both systems via CDC, the gateway can immediately reroute traffic back to the legacy system. Instant rollback that big-bang rewrites can’t offer. The legacy system always has a current, consistent copy of the data because the CDC pipeline never stops running.