Strangler fig pattern explained
The billing monolith has run Harbor Payments for twelve years. Leadership wants event-driven invoicing, better audit trails, and a team that can ship weekly instead of quarterly. A greenfield rewrite sounds clean until someone asks who owns the eighteen-month feature freeze while customers still need refunds, tax updates, and PCI audits. The strangler fig pattern offers a third path: put a routing facade in front of the legacy system, build new capabilities beside it, and migrate traffic slice by slice until the old code can be retired. Named after a tropical vine that grows around a host tree and eventually replaces it, the pattern keeps production revenue flowing while you modernize. This guide explains how facade routing works, how to choose migration slices, how to pair the pattern with expand-contract schema changes and bounded contexts, and which anti-patterns kill strangler projects. It then walks through a Harbor Payments billing modernization, compares modernization strategies in a decision table, and closes with common pitfalls and a production checklist. Pair it with our microservices and API gateway guides when you wire the routing layer.
What the strangler fig pattern is
Martin Fowler coined the term to describe incremental replacement of a legacy application. Instead of swapping the entire system at once, you:
- Place a facade (reverse proxy, API gateway, or routing layer) between clients and backends.
- Route most requests to the legacy system unchanged.
- Implement one vertical slice in a new system and point matching requests to it.
- Repeat until legacy traffic shrinks to zero, then decommission the old stack.
The facade is the control plane. Product and engineering decide which URLs, API operations, or user cohorts hit new code. Everything else falls through to legacy. Users experience a single product; operators see two (or more) systems behind one hostname.
Strangler vs related approaches
- Big-bang rewrite — build a parallel system, cut over on a date. High risk, long payback, feature parity pressure. Strangler avoids the freeze.
- Branch by abstraction — introduce an interface inside the monolith and swap implementations. Strangler is the distributed cousin: abstraction lives at the network edge.
- Lift-and-shift — move the same app to new infra. Strangler changes behavior and architecture, not just hosting.
- Microservices extraction — often the destination of a strangler; the pattern is the journey, not the end state.
How facade routing works in practice
The facade can be an API gateway, nginx/Traefik reverse proxy, or a thin BFF service. Routing rules typically key off:
- URL path or HTTP method — /v2/invoices/* to the new service; everything else to legacy.
- Header or JWT claim — internal beta tenants first; expand by account tier.
- Feature flag — gateway consults a flag service before choosing upstream (see our feature flags guide).
- Percentage canary — 5% of POST /charges to the new stack, 95% to legacy; ramp on success metrics.
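The four routing keys above can be combined into a single decision function. The sketch below is a minimal Python illustration of rule order, not gateway configuration: the `x-beta-tenant` header, the `billing-v2` flag name, and the `flag_enabled` callback are hypothetical stand-ins for your JWT claims and flag service. Hashing the account ID keeps each account in the same canary bucket as the percentage ramps up.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Mapping

LEGACY = "legacy"
NEW = "new"


@dataclass
class Request:
    method: str
    path: str
    headers: Mapping[str, str] = field(default_factory=dict)  # lowercase names
    account_id: str = ""


def canary_bucket(account_id: str) -> int:
    """Map an account to a stable bucket in [0, 100) so ramps stay sticky."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_upstream(
    req: Request,
    flag_enabled: Callable[[str, str], bool],
    charges_canary_percent: int,
) -> str:
    # 1. URL path: everything under /v2/invoices/ goes to the new service.
    if req.path.startswith("/v2/invoices/"):
        return NEW
    # 2. Header or claim: beta tenants opt in (hypothetical header name).
    if req.headers.get("x-beta-tenant") == "true":
        return NEW
    # 3. Feature flag evaluated per account (hypothetical flag name).
    if flag_enabled("billing-v2", req.account_id):
        return NEW
    # 4. Percentage canary on POST /charges, keyed by account.
    if req.method == "POST" and req.path == "/charges":
        if canary_bucket(req.account_id) < charges_canary_percent:
            return NEW
    # Everything else falls through to legacy.
    return LEGACY
```

Real gateways such as Envoy or Traefik express the same rules declaratively. The point of the sketch is ordering: specific matches come first, and anything unmatched falls through to legacy.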
Critical requirement: bidirectional data consistency during overlap. If a customer updates a payment method in the new UI, legacy batch jobs must not overwrite it. Common tactics include dual writes (short term), event-sourced sync, or read-from-new / write-to-both during transition.
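One way to keep a legacy batch job from overwriting a customer's newer update is to version every record and apply incoming changes only when they are newer. This sketch assumes a hypothetical, monotonically increasing `version` field on each payment method; `NewStore` and `PaymentMethod` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PaymentMethod:
    id: str
    last4: str
    version: int  # hypothetical revision counter, bumped on every write


class NewStore:
    """Read-from-new store during the overlap period (illustrative)."""

    def __init__(self) -> None:
        self.rows: Dict[str, PaymentMethod] = {}

    def apply_if_newer(self, incoming: PaymentMethod) -> bool:
        """Apply a change only if it is newer than the stored revision.

        A legacy batch job replaying stale data carries an older version
        and is rejected instead of overwriting the customer's update.
        """
        current = self.rows.get(incoming.id)
        if current is not None and current.version >= incoming.version:
            return False
        self.rows[incoming.id] = incoming
        return True
```

A revision counter avoids comparing wall-clock timestamps from two systems whose clocks may drift apart.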
Choosing migration slices
Slice by business capability, not by technical layer. Good first slices are:
- Net-new features with no legacy dependency (mobile wallet top-up).
- Read-heavy paths with tolerable replication lag (invoice PDF download).
- Isolated domains with clear bounded context boundaries (promo codes vs core ledger).
Poor first slices: cross-cutting auth, shared reference data every screen touches, or the settlement path that must be byte-identical with accounting. Save those for when routing, observability, and rollback are muscle memory.
Data migration and the expand-contract rhythm
Strangler migrations fail in the database more often than in the gateway. Follow expand-contract from our database migration strategies guide:
- Expand — add new columns/tables/services without breaking legacy readers.
- Dual-write or backfill — populate new store from legacy events or nightly jobs; reconcile discrepancies.
- Cut read traffic — facade sends reads to new DB once parity checks pass.
- Cut write traffic — new system becomes source of truth; legacy receives sync or goes read-only.
- Contract — drop legacy columns and code when traffic and data are fully migrated.
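The five steps can be modeled as an explicit phase table, so read and write routing is derived from one place. In this hypothetical sketch, each phase maps to the store that serves reads, the store that is the write source of truth, and the store that receives a synced copy. `next_phase` refuses to cut reads before parity checks pass, and it refuses to contract while any route still touches legacy.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Phase(Enum):
    EXPAND = 1
    BACKFILL = 2
    CUT_READS = 3
    CUT_WRITES = 4
    CONTRACT = 5


# phase -> (serves reads, write source of truth, store receiving a synced copy)
PLAN: Dict[Phase, Tuple[str, str, Optional[str]]] = {
    Phase.EXPAND: ("legacy", "legacy", None),     # new schema exists, unused
    Phase.BACKFILL: ("legacy", "legacy", "new"),  # backfill or dual-write into new
    Phase.CUT_READS: ("new", "legacy", "new"),    # reads move after parity checks
    Phase.CUT_WRITES: ("new", "new", "legacy"),   # legacy gets sync or goes read-only
    Phase.CONTRACT: ("new", "new", None),         # legacy columns and code dropped
}


def next_phase(phase: Phase, parity_ok: bool, legacy_route_count: int) -> Phase:
    """Advance one phase, but only when the guard for the next phase holds."""
    if phase is Phase.CONTRACT:
        return phase
    target = Phase(phase.value + 1)
    if target is Phase.CUT_READS and not parity_ok:
        return phase  # keep reading from legacy until parity checks pass
    if target is Phase.CONTRACT and legacy_route_count > 0:
        return phase  # never drop legacy tables while a route still uses them
    return target
```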
Never delete legacy tables while any route still references them. Maintain a migration dashboard: percent of requests per slice, error-rate delta, reconciliation lag, and rollback lever (flip routing rule in seconds).
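A reconciliation job can feed that dashboard by comparing records keyed by ID in both stores. This is a minimal in-memory sketch; at scale you would compare in batches or by checksum. The default threshold of 0.0001 (0.01%) mirrors the Harbor Payments example later in this guide, and the function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping


@dataclass
class ReconciliationReport:
    checked: int
    mismatched: List[str] = field(default_factory=list)
    missing_in_new: List[str] = field(default_factory=list)
    missing_in_legacy: List[str] = field(default_factory=list)

    @property
    def mismatch_rate(self) -> float:
        bad = len(self.mismatched) + len(self.missing_in_new) + len(self.missing_in_legacy)
        return bad / self.checked if self.checked else 0.0


def reconcile(legacy: Mapping[str, Any], new: Mapping[str, Any]) -> ReconciliationReport:
    """Compare records by key across both stores."""
    keys = set(legacy) | set(new)
    return ReconciliationReport(
        checked=len(keys),
        mismatched=sorted(k for k in keys if k in legacy and k in new and legacy[k] != new[k]),
        missing_in_new=sorted(k for k in keys if k not in new),
        missing_in_legacy=sorted(k for k in keys if k not in legacy),
    )


def should_alert(report: ReconciliationReport, threshold: float = 0.0001) -> bool:
    # 0.0001 is 0.01%, the alert threshold in the Harbor Payments example.
    return report.mismatch_rate > threshold
```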
Observability across two systems
Distributed traces must span the facade, legacy, and new services. Propagate W3C traceparent headers at the gateway and tag spans with upstream=legacy|new so on-call engineers can compare latency during ramps. Pair this with the structured logs and SLO dashboards from our OpenTelemetry tracing guide.
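For illustration, this sketch shows what the gateway does with a W3C traceparent header (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`): keep the incoming trace ID, mint a new span ID for the gateway's own span, and record which upstream it chose. It accepts only version `00` and treats anything malformed as the start of a new trace. In production you would use an OpenTelemetry SDK propagator rather than parsing by hand; `forward_headers` and the default sampling decision are assumptions.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# version 00: trace-id (32 hex), parent-id (16 hex), trace-flags (2 hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(incoming: Optional[str]) -> Tuple[str, str]:
    """Return (trace_id, outgoing traceparent) for the gateway's own span."""
    trace_id, flags = None, "01"  # sampled by default (an assumption)
    if incoming:
        m = TRACEPARENT_RE.match(incoming.strip())
        if m and m.group(1) != "0" * 32 and m.group(2) != "0" * 16:
            trace_id, flags = m.group(1), m.group(3)
    if trace_id is None:
        trace_id = secrets.token_hex(16)  # malformed or absent: start a new trace
    span_id = secrets.token_hex(8)  # gateway span becomes the upstream's parent
    return trace_id, f"00-{trace_id}-{span_id}-{flags}"


def forward_headers(headers: Dict[str, str], upstream: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build headers for the chosen upstream plus attributes for the gateway span."""
    trace_id, traceparent = child_traceparent(headers.get("traceparent"))
    out = dict(headers)
    out["traceparent"] = traceparent
    span_attributes = {"upstream": upstream, "trace_id": trace_id}
    return out, span_attributes
```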
Worked example: Harbor Payments billing modernization
Harbor Payments runs a Java monolith (billing-core) on PostgreSQL. Product wants subscription proration, usage-based metering, and a self-serve customer portal. The strangler program starts with an Envoy gateway in front of api.harborpay.com, with the default route pointing to the monolith.
Phase 1 — portal reads. A new Node service (billing-portal) serves GET /v1/invoices and GET /v1/payment-methods. The gateway routes those paths to the new service, which reads from a read-only copy of the billing data fed by Debezium CDC events from the monolith database. Legacy still owns writes. The portal launches to 10% of SMB accounts, selected via an X-Billing-Stack: v2 header.
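Keeping the portal's read store current means applying each Debezium change event as an upsert or a delete. The sketch below assumes JSON-serialized events (with or without the `schema`/`payload` wrapper) and a hypothetical `id` primary-key column, and it writes into an in-memory dict standing in for the portal's database. Debezium's `op` field uses `c` (create), `u` (update), `d` (delete), and `r` (snapshot read).

```python
from typing import Any, Dict, Optional


def apply_change(read_model: Dict[Any, dict], event: Optional[dict]) -> None:
    """Apply one Debezium change event to a read model keyed by primary key."""
    if event is None:
        return  # tombstone that follows a delete; nothing left to remove
    payload = event.get("payload", event)  # unwrap if schemas are embedded
    op = payload["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = payload["after"]
        read_model[row["id"]] = row  # "id" is a hypothetical key column
    elif op == "d":
        read_model.pop(payload["before"]["id"], None)
    # Other operation types are ignored in this sketch.
```

Because delivery is at least once, keyed upserts make replays harmless. Per-row ordering holds because events for one row carry the same message key and therefore land on the same partition under Kafka's default partitioner.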
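Phase 2 — payment method updates. Expand the schema by adding a payment_methods_v2 table. The new service handles PUT /v1/payment-methods/{id}: it writes the v2 row and an outbox event in the same transaction, and a relay applies that event to the legacy tables, so a crash cannot leave the v2 write committed without the legacy update queued. A reconciliation job alerts when more than 0.01% of records mismatch.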
The gateway ramps PUT traffic from 5% to 100% over two weeks, with canary gates on the 5xx rate.
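A canary gate for that ramp reduces to one decision per observation window: advance to the next step, hold, or revert to legacy. The step schedule, the 0.1-percentage-point error budget (`max_delta`), and the minimum sample size below are assumptions chosen for illustration.

```python
from dataclasses import dataclass

RAMP_STEPS = [5, 10, 25, 50, 100]  # hypothetical schedule spread over two weeks


@dataclass
class WindowStats:
    requests: int
    errors_5xx: int

    @property
    def error_rate(self) -> float:
        return self.errors_5xx / self.requests if self.requests else 0.0


def next_weight(current: int, new: WindowStats, legacy: WindowStats,
                max_delta: float = 0.001, min_requests: int = 1000) -> int:
    """Return the next percentage of PUT traffic to send to the new service."""
    if new.error_rate > legacy.error_rate + max_delta:
        return 0  # gate failed: revert the route to legacy
    if new.requests < min_requests:
        return current  # not enough evidence yet; hold
    for step in RAMP_STEPS:
        if step > current:
            return step
    return current  # already at 100%
```

The check fails closed: a bad error rate reverts even on a small sample, which suits a payments path.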
Phase 3 — usage metering. This is a net-new capability with no legacy equivalent. A billing-metering service ingests usage events from Kafka, aggregates them daily, and exposes POST /v1/usage. The monolith is unchanged. Finance signs off because metering never touches the old ledger.
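Daily aggregation looks simple until duplicates and time zones appear. This sketch assumes a hypothetical event shape with a producer-assigned `event_id`, drops redelivered events (Kafka consumers can see a record more than once), and buckets by UTC day. The in-memory `seen` set stands in for a durable deduplication store.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageEvent:
    event_id: str  # hypothetical idempotency key assigned by the producer
    account_id: str
    meter: str  # e.g. "api_calls"
    quantity: int
    occurred_at: datetime  # must be timezone-aware


def aggregate_daily(events: Iterable[UsageEvent]) -> Dict[Tuple[str, str, date], int]:
    """Sum quantities per (account, meter, UTC day), skipping redeliveries."""
    seen = set()
    totals: Dict[Tuple[str, str, date], int] = defaultdict(int)
    for e in events:
        if e.event_id in seen:
            continue  # the same Kafka record can be delivered more than once
        if e.occurred_at.tzinfo is None:
            raise ValueError(f"event {e.event_id} has a naive timestamp")
        seen.add(e.event_id)
        day = e.occurred_at.astimezone(timezone.utc).date()
        totals[(e.account_id, e.meter, day)] += e.quantity
    return dict(totals)
```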
Phase 4 — proration engine. This is the highest-risk slice. A new billing-proration service owns the calculation, while the monolith continues to post final GL entries through a compatibility API for three months. Once month-end close matches to the cent for four consecutive cycles, proration routes go 100% to the new service; the legacy proration module is feature-flagged off, then deleted the following quarter.
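The Phase 4 cutover rule can be written as a check over recent month-end closes. The rounding mode (`ROUND_HALF_UP`) and the simple day-based `prorate` formula are assumptions for illustration. The point is to compare in `Decimal` at cent precision and to round once: rounding a daily rate first (0.33 × 10 = 3.30 instead of 3.33 on a $10.00, 30-day period) is exactly the kind of drift the four-cycle gate exists to catch.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Sequence, Tuple

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    # The rounding mode is an assumption; match whatever finance uses.
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def prorate(monthly_price: Decimal, days_used: int, days_in_period: int) -> Decimal:
    """Hypothetical day-based proration that rounds once, at the end."""
    return to_cents(monthly_price * days_used / days_in_period)


def ready_for_cutover(cycles: Sequence[Tuple[Decimal, Decimal]], required: int = 4) -> bool:
    """cycles holds (legacy_total, new_total) per month-end close, oldest first.

    True only when the most recent `required` closes match to the cent.
    """
    if len(cycles) < required:
        return False
    return all(to_cents(old) == to_cents(new) for old, new in cycles[-required:])
```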
Eighteen months in, 92% of API traffic hits new services, and the monolith runs only batch settlement. Harbor schedules the monolith's decommission for after the last expand-contract migration removes the shared invoice tables.
Modernization strategy decision table
| Approach | Best for | Time to first value | Risk profile | Key requirement |
|---|---|---|---|---|
| Strangler fig | Revenue-critical legacy, continuous delivery needed | Weeks (first slice) | Low per slice; managed cumulatively | Facade routing, expand-contract discipline |
| Big-bang rewrite | Small systems that can tolerate a feature freeze | Months to years | High (cutover-weekend risk) | Executive-backed freeze, parity testing |
| Lift-and-shift | Datacenter exit, minimal code change | Months | Medium (ops parity) | Infra automation, network parity |
| In-place refactor | Modular monolith with good tests | Days per module | Low if tests exist | Comprehensive CI, module boundaries |
Common pitfalls
- Facade as afterthought — bolting routing on late means clients already bypass it with direct legacy URLs.
- Slice too large — “migrate all checkout” is a rewrite wearing a strangler hat.
- Ignoring data sync — dual systems with divergent customer records erode trust faster than slow delivery.
- No rollback path — every ramp needs a one-click route revert without redeploy.
- Permanent dual stack — without decommission deadlines, you pay for two systems forever.
- Shared database trap — new services writing to legacy tables recreate monolith coupling; prefer owned schemas per slice.
- Weak parity testing — shadow traffic comparing legacy vs new responses catches rounding and timezone bugs before cutover.
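Shadow traffic serves the legacy response and compares the new system's answer out of band. This sketch diffs JSON-like values path by path and ignores fields expected to differ, such as a hypothetical `request_id`. It deliberately does not normalize amounts or timestamps, because those differences are the bugs you want to find. For simplicity it calls both backends synchronously, and it should only shadow side-effect-free requests, never a live POST /charges.

```python
from typing import Any, Callable, FrozenSet, List


def diff_json(legacy: Any, new: Any, path: str = "$",
              ignore: FrozenSet[str] = frozenset()) -> List[str]:
    """Return human-readable paths where two JSON-like values differ."""
    if path in ignore:
        return []
    if isinstance(legacy, dict) and isinstance(new, dict):
        diffs: List[str] = []
        for key in sorted(set(legacy) | set(new)):
            child = f"{path}.{key}"
            if key not in legacy or key not in new:
                if child not in ignore:
                    diffs.append(f"{child}: missing on one side")
                continue
            diffs.extend(diff_json(legacy[key], new[key], child, ignore))
        return diffs
    if isinstance(legacy, list) and isinstance(new, list):
        if len(legacy) != len(new):
            return [f"{path}: length {len(legacy)} != {len(new)}"]
        diffs = []
        for i, (a, b) in enumerate(zip(legacy, new)):
            diffs.extend(diff_json(a, b, f"{path}[{i}]", ignore))
        return diffs
    return [] if legacy == new else [f"{path}: {legacy!r} != {new!r}"]


def shadow(request: Any, call_legacy: Callable[[Any], Any], call_new: Callable[[Any], Any],
           record: Callable[[Any, List[str]], None],
           ignore: FrozenSet[str] = frozenset()) -> Any:
    """Serve the legacy response; compare the new one out of band."""
    primary = call_legacy(request)
    try:
        diffs = diff_json(primary, call_new(request), ignore=ignore)
        if diffs:
            record(request, diffs)
    except Exception as exc:  # the shadow path must never break the live response
        record(request, [f"shadow call failed: {exc!r}"])
    return primary
```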
Production checklist
- Document target architecture and explicit decommission date for legacy.
- Deploy facade so 100% of client traffic passes through it.
- Define slice boundaries with product using bounded-context language.
- Implement expand-contract migrations; never big-bang ALTER on hot tables.
- Run shadow or dark traffic comparing legacy vs new responses before ramp.
- Instrument per-slice metrics: traffic %, error rate, p95 latency, sync lag.
- Pair gateway routing with feature flags for instant rollback.
- Maintain a reconciliation job and alert threshold for dual-write periods.
- Require each slice to reduce legacy LOC or traffic before starting the next.
- Schedule legacy code deletion in the same quarter traffic hits zero.
Key takeaways
- Strangler fig modernizes incrementally — route by route, not all at once.
- The facade is the product edge — invest in routing, auth, and observability there first.
- Slice by business capability — small wins compound; avoid cross-cutting first moves.
- Data migration is the hard part — expand-contract and reconciliation beat heroic cutover weekends.
- Decommission deliberately — a strangler that never kills legacy is just expensive duplication.
Related reading
- Microservices architecture explained — service boundaries and communication patterns
- Database migration strategies explained — zero-downtime expand-contract schema changes
- API gateway explained — edge routing, auth, and rate limits
- Blue-green and canary deployments explained — safe traffic ramps per slice