Guide

Feature flags explained: gradual rollouts, kill switches, and safe deployments

Shipping code and shipping features are not the same thing. A deploy can land new code in production while the feature stays invisible — or visible to 5% of users, or only to your team, or instantly turned off when metrics spike. That decoupling is what feature flags (also called feature toggles) provide: runtime switches that control which code paths execute without redeploying binaries. Used well, flags reduce deployment risk, speed up trunk-based development, and give operators a kill switch when things go wrong. Used poorly, they become permanent if (flag) branches nobody dares delete. This guide explains the patterns, trade-offs, and how flags fit alongside CI/CD pipelines, Kubernetes rollouts, and observability.

What a feature flag actually is

At its simplest, a feature flag is a named boolean (or richer value) evaluated at runtime. Your application asks a flag service or local config: "Is new-checkout-flow enabled for this user?" The answer determines which branch runs — old checkout vs new checkout, legacy API vs v2 handler, ads on vs ads off.

The evaluation can happen in many places: server middleware before routing, client-side in a React app, at the edge in a CDN worker, or inside a database migration gate. The flag definition lives separately from the code that checks it. That separation is the whole point: you can flip behavior in seconds through a dashboard instead of waiting for a full deploy pipeline.

Managed platforms (LaunchDarkly, Unleash, Flagsmith, ConfigCat) provide targeting rules, audit logs, and SDKs. Smaller teams often start with environment variables, a JSON config file, or a few keys in Redis, which is workable for a handful of flags and painful at scale. The architecture choice matters less than discipline around flag lifecycle (more on that below).
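As a rough illustration of the environment-variable starting point, the sketch below reads a boolean flag from an environment map. The FLAG_ prefix, the naming rule, and the envFlag helper are assumptions made for this example, not a convention from any particular platform.

```javascript
// Read a boolean flag from environment variables.
// Assumed convention: "new-checkout" maps to FLAG_NEW_CHECKOUT.
function envFlag(name, env = process.env) {
  const variable = `FLAG_${name.toUpperCase().replace(/-/g, "_")}`;
  const raw = env[variable];
  // Only explicit "true" or "1" enables the flag; a missing value means off.
  return raw === "true" || raw === "1";
}
```

Changing a value here still requires a restart or redeploy, which is one reason this approach stops scaling once flags need to flip at runtime.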

Types of feature flags

Not every toggle serves the same purpose. Mixing types without labeling them is how flag debt accumulates.

Release toggles

Short-lived flags that hide incomplete work on the main branch. Trunk-based development depends on these: engineers merge code behind a default-off flag, then enable it gradually once tests pass. The flag should be removed within days or weeks after full rollout — the code path becomes unconditional.

Ops toggles (kill switches)

Long-lived flags that disable expensive or risky subsystems under load. Examples: turn off recommendation ranking during a traffic spike, disable third-party ad scripts when an SDK misbehaves, or pause webhook delivery when a partner API is down. These are operational tools, not product experiments. Document them in runbooks and test the "off" path regularly — kill switches that never get exercised tend to be broken when you need them.

Experiment toggles (A/B tests)

Split traffic between variants to measure conversion, retention, or latency. Requires consistent user bucketing (hash user ID + experiment name) and statistical rigor. Pair with metrics pipelines so you can compare cohorts. Experiment flags have a defined end date: ship the winner, delete the loser code.
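A minimal sketch of consistent bucketing for an experiment, assuming a 32-bit FNV-1a hash and integer variant weights. The helper names (hash32, assignVariant) are hypothetical, and real experimentation platforms use their own hashing schemes.

```javascript
// 32-bit FNV-1a over UTF-16 code units (adequate for ASCII IDs in a sketch).
function hash32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Pick a variant by weight. Salting the hash with the experiment name keeps a
// user's assignment in one experiment independent of their assignment in another.
function assignVariant(experimentName, userId, variants) {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  let point = hash32(`${experimentName}:${userId}`) % totalWeight;
  for (const variant of variants) {
    if (point < variant.weight) return variant.name;
    point -= variant.weight;
  }
  throw new Error("variants must have positive integer weights");
}
```

Because the assignment depends only on the experiment name and user ID, the same user lands in the same variant on every request without any stored state.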

Permission toggles

Entitlement flags tied to billing tiers or beta programs — "pro users see advanced analytics." These may live for years but should be modeled as authorization, not buried inside feature-flag conditionals scattered through the codebase. Consider routing them through your auth layer eventually.

Rollout strategies that work

The most common production pattern is percentage rollout: enable the flag for 1% of users, watch error rates and latency, bump to 10%, then 50%, then 100%. Good flag systems use deterministic hashing so the same user always gets the same bucket — you do not want someone seeing the new checkout on Monday and the old one on Tuesday.
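The deterministic hashing described above can be sketched as follows. This assumes a 32-bit FNV-1a hash for brevity, and bucketFor and isInRollout are hypothetical names rather than any SDK's API.

```javascript
// 32-bit FNV-1a over UTF-16 code units (adequate for ASCII IDs in a sketch).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Map (flag, user) to a stable bucket in [0, 10000), i.e. 0.01% granularity.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 10000;
}

// A user is in the rollout when their bucket falls below the threshold.
// Raising the percentage only adds users; nobody already enabled is removed.
function isInRollout(flagKey, userId, percent) {
  return bucketFor(flagKey, userId) < percent * 100;
}
```

Including the flag key in the hash input means a user who is in the first 1% for one flag is not automatically in the first 1% for every other flag.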

Targeting dimensions

Beyond random percentages, you can target:

Internal users — dogfood before external traffic (email domain, IP allowlist, or explicit user ID list).
Geography — launch in one region first to limit blast radius.
Account attributes — enterprise vs free tier, account age, feature adoption history.
Custom segments — "users who completed onboarding" or "wallets with > 1 SOL balance" on a crypto product.

Each targeting rule adds complexity. Start with internal-only, then percentage, then attribute rules. Every rule is another thing that can misconfigure at 2 a.m.
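One way to keep targeting rules auditable is to evaluate them in a fixed order and let the first match decide. The sketch below assumes a hypothetical rule shape ({ name, matches, enabled }) and an example.com email domain for internal users.

```javascript
// Evaluate targeting rules in order; the first matching rule decides.
// If no rule matches, the flag's default applies.
function evaluateFlag(flag, user) {
  for (const rule of flag.rules) {
    if (rule.matches(user)) return rule.enabled;
  }
  return flag.defaultEnabled;
}

// Hypothetical flag definition: internal users first, then one region.
const newCheckout = {
  defaultEnabled: false,
  rules: [
    { name: "internal", matches: (u) => u.email.endsWith("@example.com"), enabled: true },
    { name: "eu-first", matches: (u) => u.region === "eu", enabled: true },
  ],
};
```

Keeping rules as an ordered list makes it easy to answer "why did this user get the feature?" by reporting the name of the first rule that matched.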

Canary deploys vs feature flags

Kubernetes canary deployments route a fraction of requests to a new pod version. Feature flags route a fraction of users to new code paths inside the same binary. They complement each other: deploy the code everywhere (canary or blue-green), then use flags to control who executes the new path. Flags let you roll back behavior instantly without rolling back the container image — valuable when the new code is fine but a downstream dependency is not.

Implementation patterns in code

The anti-pattern is sprinkling raw if (getFlag("foo")) across fifty files. Better approaches:

Centralized evaluation

One module resolves all flags for a request context at the start. Pass a FlagContext object down the call stack. SDKs typically cache flag values locally with a TTL (30–60 seconds) so that handlers do not call the flag service on every request. A stale cache means a flip takes up to one TTL to propagate, which is acceptable for gradual rollouts and unacceptable for emergency kill switches unless you add a force-refresh path.
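A minimal sketch of a TTL cache with a force-refresh path, under simplifying assumptions: fetchAll is a hypothetical synchronous function returning an object of flag values, whereas a real SDK would fetch asynchronously or stream updates. FlagCache and buildFlagContext are illustrative names.

```javascript
// Caches one snapshot of all flags and refetches it once it is older than ttlMs.
class FlagCache {
  constructor(fetchAll, ttlMs, now = Date.now) {
    this.fetchAll = fetchAll;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
    this.snapshot = {};
    this.fetchedAt = -Infinity; // forces a fetch on first use
  }

  // Return the cached snapshot, refetching if the TTL has elapsed.
  getSnapshot() {
    if (this.now() - this.fetchedAt >= this.ttlMs) this.forceRefresh();
    return this.snapshot;
  }

  // Bypass the TTL, for example when a kill-switch change must apply at once.
  forceRefresh() {
    this.snapshot = this.fetchAll();
    this.fetchedAt = this.now();
  }
}

// Resolve flags once per request and pass the frozen context down the stack.
function buildFlagContext(cache, userId) {
  return Object.freeze({ userId, flags: { ...cache.getSnapshot() } });
}
```

Copying the snapshot into the context means a request sees one consistent set of flag values even if the cache refreshes while the request is in flight.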

Branch by abstraction

Define an interface (CheckoutService) with two implementations (LegacyCheckout, NewCheckout). The factory picks implementation based on the flag. When the rollout completes, delete the legacy class and the flag — one clean diff instead of hunting conditionals.

Default-off for new flags

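New flags should default to false in code when the flag service is unreachable, so risky features fail closed (stay off) during an outage. Kill switches need their own decision: pick the fallback that leaves the system in its safe state, which for a risky subsystem usually means "feature disabled." Document the fallback behavior (fail open or fail closed) per flag.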
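A small sketch of per-flag fallbacks, assuming a hypothetical client whose isEnabled call throws when the flag service is unreachable. The flag keys and fallback values in FALLBACKS are illustrative; which value is safe is a decision to make and document for each flag.

```javascript
// Documented fallback per flag, used only when evaluation fails.
const FALLBACKS = {
  "new-checkout": false,    // release toggle: stay on the old path
  "third-party-ads": false, // ops toggle whose safe state is "disabled"
  "webhook-delivery": true, // illustrative: a team may prefer delivery to keep running
};

// Evaluate a flag, falling back to its documented default on error.
function isEnabledSafe(client, flagKey, userId) {
  try {
    return client.isEnabled(flagKey, userId);
  } catch (err) {
    // Flags with no documented fallback default to false so new paths stay dark.
    return FALLBACKS[flagKey] ?? false;
  }
}
```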

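// Pseudocode: factory for the branch-by-abstraction pattern described above.
// Assumes an initialized `flags` client; both classes implement CheckoutService.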
function createCheckout(ctx) {
  if (flags.isEnabled("new-checkout", ctx.userId)) {
    return new NewCheckout();
  }
  return new LegacyCheckout();
}

Flag lifecycle and avoiding flag debt

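The biggest long-term risk is flag debt: hundreds of stale toggles nobody understands. Each independent boolean flag doubles the number of possible flag combinations (n flags give 2^n on/off states, before multiplying by environments), which puts exhaustive testing out of reach and makes refactors terrifying.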

Mitigations that actually stick:

Owner and expiry date on every flag at creation time. Automated reminders when expiry passes.
Flag dashboards that show last evaluation time — a flag evaluated zero times in 30 days is a deletion candidate.
CI checks that fail if release toggles older than N days remain in the flag registry.
Removal tickets filed at the same time as the rollout ticket. "Enable new checkout" and "Remove new-checkout flag" should be linked tasks.

Ops toggles and permission flags are exempt from short expiry but need owners and runbook entries. If a kill switch has kept a subsystem disabled for a year, ask whether that subsystem should be deleted instead.
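The CI check from the list above could look roughly like this. The registry entry shape ({ key, type, owner, createdAt }) and the function name are assumptions for the sketch; a managed platform would expose this data through its own API.

```javascript
// Return release toggles older than maxAgeDays.
// Ops and permission flags are skipped because they are exempt from short expiry.
function findStaleReleaseToggles(registry, maxAgeDays, today) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return registry.filter((flag) => {
    if (flag.type !== "release") return false;
    const ageDays = (today - new Date(flag.createdAt)) / msPerDay;
    return ageDays > maxAgeDays;
  });
}

// Hypothetical registry contents.
const registry = [
  { key: "new-checkout", type: "release", owner: "payments", createdAt: "2024-01-01" },
  { key: "search-v2", type: "release", owner: "search", createdAt: "2024-02-20" },
  { key: "third-party-ads", type: "ops", owner: "platform", createdAt: "2020-06-01" },
];
```

A CI job would call this with the current date and exit with a non-zero status when the returned list is non-empty, printing each stale flag's key and owner.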

Observability and safety

Every flag evaluation that affects user-visible behavior should emit structured metadata, for example a log field or trace attribute such as feature_flag.new-checkout=true. When error rates spike after a rollout bump, you need to slice metrics by flag state; otherwise you are guessing.
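As a sketch of recording flag state alongside each decision, the wrapper below emits one structured event per evaluation. The flags client, the emit callback, and the event field names other than the feature_flag.<key> attribute are assumptions for illustration.

```javascript
// Evaluate a flag and record the decision as a structured event.
// `emit` stands in for a logger or tracing hook.
function evaluateAndRecord(flags, flagKey, userId, emit) {
  const enabled = flags.isEnabled(flagKey, userId);
  emit({ event: "flag_evaluated", [`feature_flag.${flagKey}`]: enabled, userId });
  return enabled;
}
```

With the attribute attached to logs or spans, a dashboard query can group error counts by flag state instead of by deploy version alone.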

Pair rollouts with RED metrics (rate, errors, duration) per service and per flag cohort. Alert on error-rate divergence between "flag on" and "flag off" populations before you increase the percentage.
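A simple gate on cohort divergence might look like the sketch below. It compares raw error rates against a fixed threshold (maxDelta, a hypothetical parameter) and does no significance testing, so it is only meaningful once both cohorts have enough traffic.

```javascript
// Error rate for a cohort; an empty cohort is treated as 0 to avoid dividing by zero.
function errorRate({ errors, requests }) {
  return requests === 0 ? 0 : errors / requests;
}

// Allow the next rollout step only if the "flag on" cohort's error rate
// exceeds the "flag off" cohort's by no more than maxDelta.
function safeToIncrease(onCohort, offCohort, maxDelta) {
  return errorRate(onCohort) - errorRate(offCohort) <= maxDelta;
}
```

For example, an "on" cohort with 5 errors in 1,000 requests (0.5%) against an "off" cohort with 40 errors in 10,000 (0.4%) passes a 0.2 percentage point threshold, while 30 errors in 1,000 (3%) does not.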

Audit logs matter for compliance: who flipped a flag, when, and from what value to what. Managed flag services provide this out of the box; DIY Redis solutions need explicit logging.
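For a DIY setup, explicit audit logging can be as small as the sketch below. Here a Map stands in for the flag store (Redis in the text) and an array stands in for an append-only audit sink; both, along with the function name, are assumptions for illustration.

```javascript
// Record who changed a flag, when, and the old and new values, then apply the change.
function setFlagWithAudit(store, auditLog, flagKey, newValue, actor, now = new Date()) {
  const oldValue = store.get(flagKey);
  auditLog.push({ flagKey, oldValue, newValue, actor, at: now.toISOString() });
  store.set(flagKey, newValue);
}
```

Writing the audit entry before applying the change means a failed log write stops the flip, rather than leaving an unrecorded change in production.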

For high-risk changes (payment flows, auth), consider synchronous flag evaluation on the critical path with no cache, or a staged enable with a hold period: internal → 0.1% → hold 24 hours → continue. Rate limiting and circuit breakers (see our API rate limiting guide) protect downstream systems when a new code path suddenly gets real traffic.

When you do not need feature flags

Flags add operational surface area. Skip them when:

The change is small, well-tested, and easy to revert via git revert + redeploy in under five minutes.
You are a solo developer on a static site — branch in git, merge when ready.
The "flag" would live for one deploy cycle anyway — just use a short-lived branch and merge.
The setting is really environment configuration (database URL, API keys) rather than per-user behavior; keep it in environment config.

Flags earn their keep on teams shipping daily to production, running multi-tenant SaaS, or operating systems where redeploy latency or blast radius is costly. They are force multipliers alongside solid CI/CD automation — not a substitute for tests, code review, or staged environments.

Practical checklist

Name flags clearly — checkout-v2-rollout, not temp_flag_3.
Default off for new release toggles; document fail-open vs fail-closed per flag.
Deterministic bucketing — same user, same experience across sessions.
Start internal — team dogfood before percentage rollout.
Instrument cohorts — metrics and traces tagged with flag state.
Set expiry dates — delete release toggles after full rollout.
Test the off path — especially for kill switches and ops toggles.
Audit who can flip production flags — RBAC on the flag dashboard.

Feature flags are a deployment safety net, not a product strategy. The goal is smaller, more frequent releases with less fear — then clean up the toggles so the codebase stays readable. Treat every flag as temporary until proven otherwise.