
Strangler fig pattern explained

The billing monolith has run Harbor Payments for twelve years. Leadership wants event-driven invoicing, better audit trails, and a team that can ship weekly instead of quarterly. A greenfield rewrite sounds clean until someone asks who owns the eighteen-month feature freeze while customers still need refunds, tax updates, and PCI audits. The strangler fig pattern offers a third path between freezing for a rewrite and leaving the monolith as it is: put a routing facade in front of the legacy system, build new capabilities beside it, and migrate traffic slice by slice until the old code can be retired. Named after a tropical fig that grows around a host tree and eventually replaces it, the pattern keeps production revenue flowing while you modernize.

This guide covers how facade routing works, how to choose migration slices, and how to pair the pattern with expand-contract schema changes and bounded contexts. It also includes a worked example of billing modernization at Harbor Payments, a modernization strategy decision table, the common pitfalls that kill strangler projects, and a production checklist. Pair it with our microservices and API gateway guides when you wire the routing layer.

What the strangler fig pattern is

Martin Fowler coined the term to describe incremental replacement of a legacy application. Instead of swapping the entire system at once, you:

  1. Place a facade (reverse proxy, API gateway, or routing layer) between clients and backends.
  2. Route most requests to the legacy system unchanged.
  3. Implement one vertical slice in a new system and point matching requests to it.
  4. Repeat until legacy traffic shrinks to zero, then decommission the old stack.

The facade is the control plane. Product and engineering decide which URLs, API operations, or user cohorts hit new code. Everything else falls through to legacy. Users experience a single product; operators see two (or more) systems behind one hostname.
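
As a rough illustration of the fall-through behavior, the sketch below models the facade as a route table with a legacy default. The upstream addresses, the Route type, and the choose_upstream function are hypothetical names chosen for this example; a real facade would usually express the same rule as gateway or proxy configuration.

```python
from dataclasses import dataclass

# Hypothetical upstream addresses; real values depend on your deployment.
LEGACY = "http://legacy.internal"
NEW_INVOICES = "http://invoices.internal"


@dataclass(frozen=True)
class Route:
    method: str       # HTTP method, or "*" for any method
    path_prefix: str  # matched with str.startswith
    upstream: str


# Migrated slices are listed explicitly; anything unlisted stays on legacy.
ROUTES = [
    Route("GET", "/v2/invoices/", NEW_INVOICES),
]


def choose_upstream(method: str, path: str) -> str:
    """Return the upstream for a request, falling through to legacy."""
    for route in ROUTES:
        if route.method in ("*", method) and path.startswith(route.path_prefix):
            return route.upstream
    return LEGACY
```

Migrating another slice then means adding one Route entry, and rolling it back means removing that entry; clients keep calling the same hostname either way.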

Strangler vs related approaches

  • Big-bang rewrite — build a parallel system, cut over on a date. High risk, long payback, feature parity pressure. Strangler avoids the freeze.
  • Branch by abstraction — introduce an interface inside the monolith and swap implementations. Strangler is the distributed cousin: abstraction lives at the network edge.
  • Lift-and-shift — move the same app to new infra. Strangler changes behavior and architecture, not just hosting.
  • Microservices extraction — often the destination of a strangler; the pattern is the journey, not the end state.

How facade routing works in practice

The facade can be an API gateway, nginx/Traefik reverse proxy, or a thin BFF service. Routing rules typically key off:

  • URL path or HTTP method: /v2/invoices/* to new service; everything else to legacy.
  • Header or JWT claim — internal beta tenants first; expand by account tier.
  • Feature flag — gateway consults a flag service before choosing upstream (see our feature flags guide).
  • Percentage canary — 5% of POST /charges to new stack, 95% legacy; ramp on success metrics.
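
The header, flag, and percentage rules above can be combined into one decision function. This is a minimal sketch under stated assumptions: the function names, the salt, and the order of precedence (flag first, then header, then canary bucket) are illustrative choices, and flag_enabled stands in for a lookup against whatever flag service you use. The X-Billing-Stack header is borrowed from the worked example later in this guide.

```python
import hashlib


def canary_bucket(account_id: str, salt: str = "charges-v2") -> int:
    """Map an account to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route_charge(account_id: str, headers: dict, flag_enabled: bool,
                 canary_percent: int) -> str:
    """Pick "new" or "legacy" for a POST /charges request."""
    if not flag_enabled:
        return "legacy"  # the flag acts as a kill switch and wins over everything
    if headers.get("X-Billing-Stack") == "v2":
        return "new"     # explicit opt-in cohort, e.g. internal beta tenants
    if canary_bucket(account_id) < canary_percent:
        return "new"     # account falls inside the current canary slice
    return "legacy"
```

Hashing the account ID rather than drawing a random number per request keeps each account on one stack for the whole ramp, and an account that is in the canary at 5% stays in it at 50%.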

Critical requirement: bidirectional data consistency during overlap. If a customer updates a payment method in the new UI, legacy batch jobs must not overwrite it. Common tactics include dual writes (short term), event-sourced sync, or read-from-new / write-to-both during transition.
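
One possible way to stop a legacy batch job from overwriting a newer change is to compare versions before applying a sync. The guide does not prescribe this mechanism, so treat the sketch below as an assumption: the PaymentMethod shape, the per-customer version counter, and apply_sync are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PaymentMethod:
    customer_id: str
    card_last4: str
    version: int  # assumed to increase with every change to this customer


def apply_sync(store: dict, incoming: PaymentMethod) -> bool:
    """Apply the incoming record only if it is newer than the stored one."""
    current = store.get(incoming.customer_id)
    if current is not None and current.version >= incoming.version:
        return False  # stale write, e.g. a batch job replaying old data
    store[incoming.customer_id] = incoming
    return True
```

The same guard works in both directions, which is what makes it useful during the overlap period when either system may receive the write first.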

Choosing migration slices

Slice by business capability, not by technical layer. Good first slices are:

  • Net-new features with no legacy dependency (mobile wallet top-up).
  • Read-heavy paths with tolerable replication lag (invoice PDF download).
  • Isolated domains with clear bounded context boundaries (promo codes vs core ledger).

Poor first slices: cross-cutting auth, shared reference data every screen touches, or the settlement path that must be byte-identical with accounting. Save those for when routing, observability, and rollback are muscle memory.

Data migration and the expand-contract rhythm

Strangler migrations fail in the database more often than in the gateway. Follow expand-contract from our database migration strategies guide:

  1. Expand — add new columns/tables/services without breaking legacy readers.
  2. Dual-write or backfill — populate new store from legacy events or nightly jobs; reconcile discrepancies.
  3. Cut read traffic — facade sends reads to new DB once parity checks pass.
  4. Cut write traffic — new system becomes source of truth; legacy receives sync or goes read-only.
  5. Contract — drop legacy columns and code when traffic and data are fully migrated.
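
The five steps above can be read as a small state machine that tells the facade where reads and writes go at each stage. The sketch below is one interpretation, not a fixed rule: the Phase names and the convention that the first write target is the source of truth are assumptions made for this example.

```python
from enum import Enum


class Phase(Enum):
    EXPAND = 1
    BACKFILL = 2
    READ_CUTOVER = 3
    WRITE_CUTOVER = 4
    CONTRACT = 5


def read_target(phase: Phase) -> str:
    """Reads move to the new store once the read cutover has happened."""
    return "new" if phase.value >= Phase.READ_CUTOVER.value else "legacy"


def write_targets(phase: Phase) -> list:
    """Stores to write to, source of truth first."""
    if phase is Phase.EXPAND:
        return ["legacy"]            # new schema exists but is not written yet
    if phase in (Phase.BACKFILL, Phase.READ_CUTOVER):
        return ["legacy", "new"]     # dual write, legacy still authoritative
    if phase is Phase.WRITE_CUTOVER:
        return ["new", "legacy"]     # new is authoritative, legacy receives sync
    return ["new"]                   # CONTRACT: legacy columns are gone
```

Keeping the phase as explicit state per slice makes it harder to skip a step, for example cutting writes before reads have been verified against the new store.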

Never delete legacy tables while any route still references them. Maintain a migration dashboard: percent of requests per slice, error-rate delta, reconciliation lag, and rollback lever (flip routing rule in seconds).
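
The error-rate delta and the rollback lever can be tied together in a simple ramp gate. This is a sketch under assumptions: the step schedule, the max_delta threshold, and the choice to roll back to 0% rather than to the previous step are illustrative, not values from this guide.

```python
RAMP_STEPS = [0, 5, 25, 50, 100]  # hypothetical schedule, percent of traffic


def next_percent(current: int, legacy_error_rate: float, new_error_rate: float,
                 max_delta: float = 0.001) -> int:
    """Advance one ramp step, or roll back to 0 if the new stack is worse.

    Error rates are fractions of requests (0.002 means 0.2%).
    """
    if new_error_rate - legacy_error_rate > max_delta:
        return 0  # rollback lever: send all traffic back to legacy
    later = [step for step in RAMP_STEPS if step > current]
    return later[0] if later else current
```

Comparing against the legacy error rate instead of an absolute threshold keeps the gate meaningful when both stacks degrade together, for instance during a shared database incident.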

Observability across two systems

Distributed traces must span facade, legacy, and new services. Propagate W3C traceparent headers at the gateway; tag spans with upstream=legacy|new so on-call can compare latency during ramps. Pair with structured logs and SLO dashboards from our OpenTelemetry tracing guide.
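
To make the propagation step concrete, the sketch below continues an incoming W3C traceparent (version 00) or starts a new trace, and returns the upstream tag as a span attribute. It is hand-rolled for illustration only: forward_headers is a hypothetical helper, header keys are assumed to be lowercased already, and in practice an OpenTelemetry SDK or the gateway itself would normally do this for you.

```python
import re
import secrets

# version 00: 32 hex chars of trace ID, 16 of parent span ID, 2 of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def forward_headers(incoming: dict, upstream: str) -> tuple:
    """Return (headers for the upstream call, attributes for the facade span)."""
    match = TRACEPARENT_RE.match(incoming.get("traceparent", ""))
    # All-zero trace or parent IDs are invalid, so they start a fresh trace.
    if match and set(match.group(1)) != {"0"} and set(match.group(2)) != {"0"}:
        trace_id, flags = match.group(1), match.group(3)
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # assumption: sample new traces
    span_id = secrets.token_hex(8)  # the facade's span becomes the upstream's parent
    headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    attributes = {"upstream": upstream}  # "legacy" or "new"
    return headers, attributes
```

Because the trace ID is preserved and only the parent span ID changes, a single trace shows the facade hop followed by whichever backend served the request.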

Worked example: Harbor Payments billing modernization

Harbor Payments runs a Java monolith (billing-core) on PostgreSQL. Product wants subscription proration, usage-based metering, and a self-serve customer portal. The strangler program starts with an Envoy gateway in front of api.harborpay.com, default route to the monolith.

Phase 1 — portal reads. A new Node service (billing-portal) serves GET /v1/invoices and GET /v1/payment-methods. Gateway routes those paths to the new service, which reads from a read replica fed by Debezium CDC events from the monolith database. Legacy still owns writes. Portal launches to 10% of SMB accounts via a header X-Billing-Stack: v2.

Phase 2 — payment method updates. Expand schema: add payment_methods_v2 table. New service handles PUT /v1/payment-methods/{id}; dual-writes to v2 table and legacy via an outbox pattern. Reconciliation job alerts on mismatch > 0.01%. Gateway ramps PUT traffic from 5% to 100% over two weeks with canary gates on 5xx rate.
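
The reconciliation check in Phase 2 might look roughly like the sketch below, which compares rows keyed by payment method ID and alerts above the 0.01% threshold. The in-memory dicts stand in for queries against the legacy table and payment_methods_v2, and the function names are hypothetical.

```python
ALERT_THRESHOLD = 0.0001  # 0.01%, the Phase 2 alert threshold
_MISSING = object()       # distinguishes "row absent" from a stored None


def mismatch_rate(legacy_rows: dict, v2_rows: dict) -> float:
    """Fraction of keys whose values differ or that exist on only one side."""
    keys = legacy_rows.keys() | v2_rows.keys()
    if not keys:
        return 0.0
    mismatched = sum(
        1 for key in keys
        if legacy_rows.get(key, _MISSING) != v2_rows.get(key, _MISSING)
    )
    return mismatched / len(keys)


def should_alert(legacy_rows: dict, v2_rows: dict) -> bool:
    """True when the mismatch rate exceeds the alert threshold."""
    return mismatch_rate(legacy_rows, v2_rows) > ALERT_THRESHOLD
```

Counting rows that exist on only one side as mismatches matters here, because a dropped dual write shows up as a missing row rather than a differing one.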

Phase 3 — usage metering. Net-new capability: no legacy equivalent. billing-metering service ingests usage events from Kafka, aggregates daily, and exposes POST /v1/usage. Monolith unchanged. Finance signs off because metering never touched the old ledger.

Phase 4 — proration engine. Highest risk slice. New billing-proration service owns calculation; monolith still posts final GL entries via a compatibility API for three months. When month-end close matches to the cent for four consecutive cycles, proration routes go 100% new; legacy proration module is feature-flagged off, then deleted in the next quarter.
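
The Phase 4 exit criterion can be written down as an explicit gate so the cutover decision is not a judgment call. This is a minimal sketch: the function name and the input shape (a list of legacy and new month-end totals, oldest first) are assumptions for illustration.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def ready_for_full_cutover(closes: list, required: int = 4) -> bool:
    """closes holds (legacy_total, new_total) Decimal pairs, oldest first.

    True when the most recent `required` cycles agree after rounding
    both totals to the cent.
    """
    if len(closes) < required:
        return False
    return all(
        legacy.quantize(CENT) == new.quantize(CENT)
        for legacy, new in closes[-required:]
    )
```

Using Decimal instead of float avoids binary rounding noise in the comparison itself, and looking only at the trailing cycles means one mismatch resets the count.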

Eighteen months in, 92% of API traffic hits new services. The monolith runs batch settlement only. Harbor schedules its decommission after the last expand-contract migration removes shared invoice tables.

Modernization strategy decision table

Approach | Best for | Time to first value | Risk profile | Key requirement
Strangler fig | Revenue-critical legacy, continuous delivery needed | Weeks (first slice) | Low per slice; managed cumulatively | Facade routing, expand-contract discipline
Big-bang rewrite | Small systems, provable feature freeze | Months to years | High cutover weekend risk | Executive freeze, parity testing
Lift-and-shift | Datacenter exit, minimal code change | Months | Medium (ops parity) | Infra automation, network parity
In-place refactor | Modular monolith with good tests | Days per module | Low if tests exist | Comprehensive CI, module boundaries

Common pitfalls

  • Facade as afterthought — bolting routing on late means clients already bypass it with direct legacy URLs.
  • Slice too large — “migrate all checkout” is a rewrite wearing a strangler hat.
  • Ignoring data sync — dual systems with divergent customer records erode trust faster than slow delivery.
  • No rollback path — every ramp needs a one-click route revert without redeploy.
  • Permanent dual stack — without decommission deadlines, you pay for two systems forever.
  • Shared database trap — new services writing to legacy tables recreate monolith coupling; prefer owned schemas per slice.
  • Weak parity testing — shadow traffic comparing legacy vs new responses catches rounding and timezone bugs before cutover.
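
A shadow-traffic comparison for the last pitfall could be sketched as follows. The invoice fields (id, total, issued_at) and both function names are hypothetical, and timestamps are assumed to be ISO 8601 strings with an explicit UTC offset; the point is to normalize before comparing so that only real differences are reported.

```python
from datetime import datetime, timezone
from decimal import Decimal


def normalize(invoice: dict) -> dict:
    """Canonical form: amounts as exact Decimals, timestamps in UTC."""
    return {
        "id": invoice["id"],
        "total": Decimal(str(invoice["total"])),
        "issued_at": datetime.fromisoformat(invoice["issued_at"]).astimezone(timezone.utc),
    }


def diff_responses(legacy: dict, new: dict) -> list:
    """Return the names of fields that differ after normalization."""
    a, b = normalize(legacy), normalize(new)
    return [field for field in a if a[field] != b[field]]
```

With this normalization, "10.10" and "10.1" compare equal and so do two renderings of the same instant in different offsets, while a one-cent rounding difference or a timestamp shifted by the offset is flagged.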

Production checklist

  • Document target architecture and explicit decommission date for legacy.
  • Deploy facade so 100% of client traffic passes through it.
  • Define slice boundaries with product using bounded-context language.
  • Implement expand-contract migrations; never big-bang ALTER on hot tables.
  • Run shadow or dark traffic comparing legacy vs new responses before ramp.
  • Instrument per-slice metrics: traffic %, error rate, p95 latency, sync lag.
  • Pair gateway routing with feature flags for instant rollback.
  • Maintain a reconciliation job and alert threshold for dual-write periods.
  • Require each slice to reduce legacy LOC or traffic before starting the next.
  • Schedule legacy code deletion in the same quarter traffic hits zero.

Key takeaways

  • Strangler fig modernizes incrementally — route by route, not all at once.
  • The facade is the product edge — invest in routing, auth, and observability there first.
  • Slice by business capability — small wins compound; avoid cross-cutting first moves.
  • Data migration is the hard part — expand-contract and reconciliation beat heroic cutover weekends.
  • Decommission deliberately — a strangler that never kills legacy is just expensive duplication.

Related reading