
Blue-green and canary deployments explained

Shipping code to production used to mean a maintenance window: stop the servers, copy files, pray, restart. Modern teams expect the opposite — deploy dozens of times a day with no visible outage. Two strategies make that possible: blue-green (run two full environments and flip traffic) and canary (send a small slice of real users to the new version first). Both reduce blast radius when something breaks, but they solve different problems. This guide explains how each works, when to pick one over rolling updates, how databases and stateful services complicate the picture, and how to pair deployments with feature flags, load balancers, and observability for safe rollouts.

Why deployment strategy matters

A bad deploy is not just downtime — it is corrupted data, angry users, and a 3 a.m. rollback under pressure. The goal of progressive delivery is to detect failure while you can still reverse it with minimal customer impact. Rolling updates (replace instances one at a time) are the default in Kubernetes and many PaaS platforms, but they mix old and new code simultaneously. That is fine for backward-compatible changes; it is dangerous when schema migrations, API contract changes, or cache invalidation bugs can corrupt state mid-rollout.

Blue-green and canary deployments add explicit traffic control: you decide exactly which version serves which requests, for how long, and under what success criteria. The trade-off is infrastructure cost (two environments for blue-green) or operational complexity (metric gates and automated rollback for canaries). Teams that master both ship faster because deploy anxiety drops — a failed release becomes a routing change, not a fire drill.

Blue-green deployment: two environments, one switch

In a blue-green setup you maintain two identical production environments. Conventionally blue is live and green is idle, though the labels are arbitrary. You deploy the new release to green while blue continues serving all traffic. Smoke tests and synthetic checks run against green. When green passes health gates, you flip the router — load balancer, DNS, or API gateway — so 100% of traffic hits green. Blue becomes the standby; if green misbehaves, flip back instantly.

How the cutover works

The switch point is usually a reverse proxy or cloud load balancer that swaps its upstream pool. Nginx upstream blocks, AWS ALB target groups, or an API gateway route table can point api.example.com from blue instances to green instances in seconds. DNS-based blue-green works for global traffic but is slower and not atomic: resolvers and clients keep using cached records until the TTL expires, so some users stay on blue for a while after the change. The critical property of a proxy-level switch is that cutover is atomic at the routing layer: each new request goes to either the old build or the new build, not a random mix during the flip.
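
As a rough illustration of the flip described above, the following Python sketch models a router that sends all traffic to one named pool. The class name BlueGreenRouter, its methods, and the backend addresses are hypothetical; a real setup would change a load balancer or proxy configuration rather than an in-memory object.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BlueGreenRouter:
    """Toy model of a router that sends all traffic to one named pool."""

    pools: Dict[str, List[str]]        # pool name -> backend addresses
    live: str = "blue"                 # pool currently serving traffic
    previous: Optional[str] = None     # pool to return to on rollback

    def backends(self) -> List[str]:
        """Backends a new request would be routed to right now."""
        return self.pools[self.live]

    def cut_over(self, target: str, healthy: bool) -> None:
        """Point all traffic at `target`, but only if it passed its health gates."""
        if target not in self.pools:
            raise KeyError(f"unknown pool: {target}")
        if not healthy:
            raise RuntimeError(f"{target} failed health gates; staying on {self.live}")
        # One swap: a request sees the old pool or the new one, never a mix.
        self.previous, self.live = self.live, target

    def roll_back(self) -> None:
        """Flip back to whichever pool was live before the last cutover."""
        if self.previous is None:
            raise RuntimeError("no previous pool to roll back to")
        self.live, self.previous = self.previous, self.live
```

Rollback here is the same operation as cutover with the pools exchanged, which is why it is fast as long as the old pool is still running.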

Strengths and costs

Blue-green gives the fastest rollback (re-flip the router) and a clean pre-production environment that mirrors prod sizing. The downside is double resource cost while both stacks run. For large fleets that means temporarily provisioning another full cluster. It also does not gradually test under real load — green gets synthetic traffic until cutover, then everyone arrives at once. Pair blue-green with a short canary phase (route 5% to green first) if you need both instant rollback and gradual validation.

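# Conceptual nginx upstream swap (belongs inside the http block)
upstream api_live {
  # Previously listed the blue-* servers
  server green-1.internal:8080;
  server green-2.internal:8080;
}
server {
  listen 80;
  server_name api.example.com;
  location / { proxy_pass http://api_live; }
}
# Edit the upstream, then run `nginx -s reload` to cut over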

Canary deployment: test on real traffic, then expand

A canary release routes a small percentage of production traffic to the new version while the majority stays on stable. The name comes from coal miners who carried caged canaries into tunnels — if the bird stopped singing, toxic gas was present. In software, if error rates spike, latency p99 doubles, or checkout conversion drops on the canary slice, you abort and drain the new version before promoting it.

Progressive traffic shifting

Typical progression: 1% → 5% → 25% → 50% → 100%, with automated gates between steps. Each gate compares canary metrics against the baseline (control) cohort over a minimum observation window, often 15–60 minutes depending on traffic volume. Low-traffic services need longer windows or synthetic load; high-traffic APIs can detect regressions in minutes. Service meshes (Istio, Linkerd, AWS App Mesh) provide the weighted routing, and progressive delivery controllers (Argo Rollouts, Flagger) automate the weight changes; simpler setups use load balancer weighted target groups or Envoy route rules.
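
The stepped progression can be sketched as a small loop. This is a minimal illustration, not how any particular controller is implemented: run_canary, set_weight, and gate_passes are hypothetical names, and the gate is injected as a callable so the observation window and metric comparison stay outside the sketch.

```python
from typing import Callable, Iterable, List, Tuple


def run_canary(
    steps: Iterable[int],
    set_weight: Callable[[int], None],
    gate_passes: Callable[[int], bool],
) -> Tuple[bool, List[int]]:
    """Walk the canary through `steps` (percent of traffic), gating each step.

    Returns (promoted, weights_applied). On the first failed gate the canary
    weight is set back to 0 and the rollout stops.
    """
    applied: List[int] = []
    for percent in steps:
        set_weight(percent)
        applied.append(percent)
        # In a real system this call would wait out the observation window
        # and then compare canary metrics against the baseline cohort.
        if not gate_passes(percent):
            set_weight(0)
            applied.append(0)
            return False, applied
    return True, applied
```

Calling run_canary([1, 5, 25, 50, 100], set_weight, gate_passes) either reaches 100% or ends with the canary weight at 0.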

What to measure

Error rate and latency are table stakes. Add business metrics when possible: payment success rate, signup completion, search zero-result rate. A canary can look healthy on HTTP 500 counts while silently breaking checkout because a new tax-calculation bug only triggers for EU users. Segment canary analysis by region, device, and customer tier. Log and trace sampling should tag requests with deployment_version so you can diff failures without guessing which build caused them.
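
To make the segmentation point concrete, here is a small sketch that groups request records by version and region. The record shape (dicts with deployment_version, region, and status keys) is an assumption for illustration; real data would come from your logging or tracing backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rates_by_segment(requests: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Server error rate per (deployment_version, region).

    Each record is assumed to carry "deployment_version", "region",
    and "status" (the HTTP status code).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for req in requests:
        key = (req["deployment_version"], req["region"])
        totals[key] += 1
        if req["status"] >= 500:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}
```

A canary whose overall error rate looks acceptable can still show a 100% failure rate for one (version, region) pair, which is the kind of regression an aggregate hides.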

# Weighted routing (conceptual)
route:
  - match: headers["x-canary"] == "true"  # internal testers
    destination: v2
    weight: 100
  - destination: v2
    weight: 5    # 5% production canary
  - destination: v1
    weight: 95
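
The same routing decision can be sketched in Python. One assumption differs from a plain weighted split: this version hashes a user ID into 100 buckets so a given user stays on one version across requests, whereas per-request weights would pick randomly each time. The function name pick_version and the user_id parameter are hypothetical.

```python
import hashlib
from typing import Mapping


def pick_version(user_id: str, headers: Mapping[str, str], canary_percent: int) -> str:
    """Return "v2" for internal testers and for a stable slice of users."""
    if headers.get("x-canary") == "true":
        return "v2"  # internal testers always get the new version
    # Hash the user ID into one of 100 buckets; the same ID always
    # lands in the same bucket, so the assignment is sticky.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "v2" if bucket < canary_percent else "v1"
```

Raising canary_percent only adds users to the canary; anyone already in it stays there, which keeps per-user behavior consistent as the rollout expands.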

Comparing rollout strategies

Strategy	Traffic pattern	Rollback speed	Infra cost	Best for
Rolling update	Gradual instance replacement; mixed versions during rollout	Slow (must re-roll already-updated instances to the previous version)	Low	Stateless, backward-compatible patches
Blue-green	0% then 100% cutover	Instant (router flip)	High (two full stacks)	Major releases, instant rollback requirement
Canary	1–100% progressive shift	Fast (drain canary weight to 0)	Medium (partial duplicate capacity)	High-risk changes, metric-sensitive apps
Feature flags	Code deployed everywhere; behavior toggled per user	Instant (disable flag)	Low (flag service overhead)	Long-lived experiments, A/B tests, kill switches

These strategies compose. A common pattern: deploy v2 behind a feature flag (dark launch), enable the flag for internal staff, run a 5% canary on real traffic, then blue-green cutover the remaining weight. Flags decouple deployment (code on servers) from release (users see the change).
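
A flag that supports a staff-only dark launch and a kill switch might look like the sketch below. The Flag class and its fields are hypothetical and far simpler than a real flag service, which would also handle percentage rollouts, persistence, and auditing.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """Toy feature flag: a global kill switch plus an internal-staff stage."""

    enabled: bool = False                 # False turns the feature off for everyone
    staff_only: bool = True               # True limits the feature to staff_ids
    staff_ids: Set[str] = field(default_factory=set)

    def is_on(self, user_id: str) -> bool:
        if not self.enabled:
            return False                  # kill switch wins over everything else
        if self.staff_only:
            return user_id in self.staff_ids
        return True
```

The code path ships to every server while enabled is False; releasing it is a matter of changing flag state, not deploying again.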

Stateful services and database migrations

Stateless APIs are easy to blue-green. Databases are not. If v2 expects a new status column that v1 does not write, rows written by v1 during the overlap have no status value and v2's reads break. The fix is expand-contract migrations: first deploy schema changes that are backward compatible (add a nullable column), then ship a transitional release of the old code that writes both the old and new fields, then deploy v2 that reads the new field, and only in a later release remove the old column. Never deploy breaking schema and code in the same atomic flip unless you accept downtime.
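
The expand-contract sequence can be walked through with an in-memory SQLite database. This is a sketch under stated assumptions: the orders table and the old column name state are invented for illustration, and the backfill step is a common addition that the description above does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("INSERT INTO orders (state) VALUES ('paid')")  # row from before the migration

# 1. Expand: add the new column as nullable so existing writers keep working.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")


# 2. Transitional release: write both the old and the new field.
def create_order(state: str) -> int:
    cur = conn.execute(
        "INSERT INTO orders (state, status) VALUES (?, ?)", (state, state)
    )
    return cur.lastrowid


# 3. Backfill rows written before dual-writing began (assumed extra step).
conn.execute("UPDATE orders SET status = state WHERE status IS NULL")


# 4. New release: read only the new field.
def read_status(order_id: int) -> str:
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0]


# 5. Contract: drop the old column in a later release, once nothing reads
#    or writes it. Left as a comment here because it is a separate deploy.
```

At every point between steps, both the old and the new code can run against the same schema, which is what makes a mixed-version rollout safe.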

Session stores, WebSocket connections, and uploaded files add friction. Blue-green cutover drops in-flight WebSocket sessions unless you drain connections gracefully — stop sending new sessions to blue, wait for existing ones to finish, then switch. Shared caches (Redis) must use versioned keys or tolerate cold-cache stampedes after cutover. For message queues, ensure consumers on both versions can handle the same message schema during overlap, or pause consumption during cutover with a brief backlog window.
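
Versioned cache keys can be as simple as a prefix. In this sketch a plain dict stands in for a shared Redis instance, and the helper names versioned_key, put, and get are hypothetical.

```python
from typing import Dict, Optional

cache: Dict[str, str] = {}  # stand-in for a shared cache such as Redis


def versioned_key(deployment_version: str, key: str) -> str:
    """Prefix a cache key with the build that wrote it."""
    return f"{deployment_version}:{key}"


def put(deployment_version: str, key: str, value: str) -> None:
    cache[versioned_key(deployment_version, key)] = value


def get(deployment_version: str, key: str) -> Optional[str]:
    return cache.get(versioned_key(deployment_version, key))
```

The trade-off is visible in the behavior: v2 never reads an entry serialized by v1, but it also starts with an empty cache, so the cold-cache load after cutover still has to be planned for.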

Health checks, draining, and graceful shutdown

Load balancers only route to healthy instances, but default TCP checks lie. An app can accept connections while returning 500 on every request. Use HTTP health endpoints that verify dependencies: database ping, cache reachability, critical feature smoke. In Kubernetes the distinction between readiness and liveness probes matters: a failing readiness probe removes the pod from the pool without killing it (useful during startup), while a failing liveness probe restarts the container. Keep dependency checks in the readiness probe; a liveness probe that fails whenever the database is down restarts otherwise healthy pods.
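
A dependency-aware readiness check might be structured as below. The function name readiness and the check names are hypothetical; each check is a callable so that real database or cache pings can be plugged in without changing the aggregation logic.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check and return (HTTP status, per-check result).

    Returns 200 only if all checks pass; any failure or exception yields 503
    so the load balancer stops routing to this instance.
    """
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as a failed dependency
            results[name] = f"error: {exc}"
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results
```

Returning the per-check results in the response body makes it clear from the health endpoint alone which dependency took the instance out of rotation.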

When taking an instance out of rotation, stop routing new requests to it first, then send SIGTERM and let in-flight requests complete before the process exits. In Kubernetes, set terminationGracePeriodSeconds high enough for your slowest endpoint. Without draining, blue-green cutover causes spurious 502 errors as active checkout flows hit dead connections. Pair draining with circuit breakers on downstream calls so a dying instance does not cascade failures while shutting down.
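
Graceful shutdown inside the application can be sketched with an in-flight counter. The Drainer class and install_sigterm_handler function are hypothetical names; a real server framework usually provides its own shutdown hooks, and this only shows the bookkeeping.

```python
import signal
import threading


class Drainer:
    """Tracks in-flight requests so the process can exit once they finish."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self._idle = threading.Event()
        self._idle.set()              # no requests yet, so already idle
        self.shutting_down = False

    def try_start_request(self) -> bool:
        """Admit a request unless shutdown has begun."""
        with self._lock:
            if self.shutting_down:
                return False
            self._in_flight += 1
            self._idle.clear()
            return True

    def finish_request(self) -> None:
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def begin_shutdown(self, *_signal_args) -> None:
        # Only sets a flag and takes no lock, so it is safe as a signal handler.
        self.shutting_down = True

    def wait_until_drained(self, timeout: float) -> bool:
        """Block until in-flight requests finish; False if the timeout expires."""
        return self._idle.wait(timeout)


def install_sigterm_handler(drainer: Drainer) -> None:
    """Register the drainer for SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGTERM, drainer.begin_shutdown)
```

The timeout passed to wait_until_drained should stay below the platform's grace period, otherwise the process is killed while requests are still running.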

Automated rollback and human gates

Manual rollbacks fail at 2 a.m. Define objective abort criteria before the deploy starts: error rate > 2× baseline for 5 minutes, p99 latency > 500 ms above control, payment failure rate > 0.5%. Automated systems (Flagger, Argo Rollouts, Spinnaker) watch Prometheus or Datadog metrics and revert traffic weights without paging anyone — but only if metrics are trustworthy and canary traffic is large enough for statistical signal.
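
The example thresholds above translate directly into a check. This sketch assumes the metrics have already been aggregated over the observation window (for instance the 5 minutes mentioned for error rate); the Metrics dataclass and abort_reasons function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    error_rate: float             # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float
    payment_failure_rate: float   # fraction of failed payments, 0.0 to 1.0


def abort_reasons(canary: Metrics, baseline: Metrics) -> List[str]:
    """Return every abort criterion the canary violates; empty means healthy."""
    reasons: List[str] = []
    if canary.error_rate > 2 * baseline.error_rate:
        reasons.append("error rate above 2x baseline")
    if canary.p99_latency_ms > baseline.p99_latency_ms + 500:
        reasons.append("p99 latency more than 500 ms above control")
    if canary.payment_failure_rate > 0.005:
        reasons.append("payment failure rate above 0.5%")
    return reasons
```

Note that with a baseline error rate of exactly zero, any canary error trips the first rule; a production gate would likely add a minimum request count or an absolute floor.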

Keep a human approval gate before the final 100% promotion on high-risk changes (pricing engine, auth refactor). The CI/CD pipeline should produce immutable artifacts (container digest, not :latest) so rollback deploys the exact previous build. Document the rollback runbook: who can flip traffic, which dashboard to watch, and maximum acceptable data loss window if you must replay queue messages.
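
A pipeline step can enforce digest pinning with a simple pattern check. The function name is_pinned and the registry hostname in the usage note are illustrative; the check only recognizes sha256 digests, which is an assumption about the registry in use.

```python
import re

# An image reference pinned by digest ends in "@sha256:" plus 64 hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """True if the image reference names an immutable sha256 digest."""
    return bool(_DIGEST.search(image_ref))
```

For example, is_pinned("registry.example.com/api:latest") is False, while a reference ending in @sha256: followed by the full 64-character digest passes. Tags such as :latest or :1.4.2 can be repointed; a digest cannot.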

Production checklist

Choose strategy per risk: rolling for patches, canary for behavior changes, blue-green for major cutovers.
Run expand-contract database migrations across multiple releases before flipping traffic.
Make health checks verify real dependencies, not just return 200.
Implement connection draining and graceful shutdown on every service.
Tag logs and traces with version / deployment_id for canary comparison.
Define automated rollback thresholds before deploy; test rollback in staging quarterly.
Use immutable artifacts — pin container digests in manifests.
Pair canaries with business metrics, not only HTTP error counts.
Keep the previous environment warm for instant blue-green revert.
Document who approves final promotion and the on-call escalation path.
Load-test green environment before cutover; do not discover OOM on first real spike.
Audit feature flags — disable stale flags that shadow-deployed code paths.

Key takeaways

Blue-green runs two full environments and flips traffic atomically — fastest rollback, highest infra cost.
Canary validates on a real traffic slice with metric gates before full promotion — best blast-radius control.
Stateful systems need backward-compatible migrations; never mix incompatible schema and code.
Feature flags separate deploy from release; combine with canaries for safest high-risk launches.
Automated rollback criteria and immutable artifacts turn a scary deploy into a reversible routing change.