Exponential backoff and retry patterns explained
A payment API blips for eight seconds during a database failover. Your checkout
service retries immediately — and so do four hundred other instances, each
firing three attempts in the same second. The recovering database never gets
a breath; the blip becomes a twenty-minute outage. That is a
retry storm: well-intentioned fault tolerance that amplifies
failure instead of absorbing it. Retries are essential for transient errors —
network timeouts, brief 503 responses, leader election gaps — but
only when spaced intelligently. Exponential backoff increases
delay between attempts; jitter randomizes those delays so
clients do not synchronize. This guide covers backoff formulas, jitter
strategies, retry budgets, which errors deserve another try, idempotency
requirements for safe replays, pairing with circuit breakers, and a
production checklist for resilient clients and servers.
Why retries exist — and when they hurt
Distributed systems fail in short bursts. TCP retransmits packets; load
balancers drain unhealthy nodes; cloud APIs return
503 Service Unavailable while autoscaling catches up. A single
request that fails once often succeeds on the second attempt — if the
underlying problem was transient.
Retries hurt when:
- The error is permanent: validation failures, auth denials, and 404 responses will not heal with delay.
- The operation is not idempotent: charging a card twice because the first response timed out is worse than failing once.
- Many clients retry in lockstep: synchronized retries recreate the overload that caused the failure.
- Retries lack a ceiling: unbounded attempts keep pressure on a recovering dependency indefinitely.
Good retry design answers four questions before the first failure: Is this error retryable? Is the handler safe to run again? How long should we wait? When do we give up and surface failure to a human or a dead letter queue?
Fixed delay vs exponential backoff
The simplest retry loop waits a constant interval — say one second — between attempts. Fixed delay is easy to reason about but dangerous at scale: every client that failed at time T retries at T + 1s, creating periodic traffic spikes that hammer the recovering service on a metronome.
Exponential backoff multiplies the wait after each failure. A common pattern:
delay = min(cap, base * 2^attempt)
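With base = 100ms, cap = 30s, and up to five
attempts, delays grow 100ms → 200ms → 400ms → 800ms → 1.6s, well below the
cap. The cap only binds when more attempts are allowed: counting attempts
from zero, 100ms * 2^9 = 51.2s is the first value clamped to 30s. Early
retries catch fast recoveries; later attempts back off aggressively when the
outage persists.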
Choose base large enough to avoid hammering a service in its
first milliseconds of recovery, and cap small enough that total
user-visible latency stays within your SLO. Document max attempts per
operation class — a read can afford more tries than a financial write.
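As a minimal sketch of the formula above, the following computes the un-jittered schedule for the example values (base = 100ms, cap = 30s). The function name backoff_delay and the choice to count attempts from zero are assumptions made for illustration.

```python
def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), without jitter."""
    return min(cap, base * (2 ** attempt))


# First five delays for base = 100ms, cap = 30s.
schedule = [backoff_delay(n) for n in range(5)]
print(schedule)  # [0.1, 0.2, 0.4, 0.8, 1.6]

# The cap only takes effect once base * 2^attempt exceeds it.
print(backoff_delay(9))  # 30.0
```

This version has no randomness, so it is only suitable as a building block for the jittered variants.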
Jitter: breaking synchronization
Exponential backoff alone still leaves clients aligned if they failed at the same moment — a deploy, a regional blip, or a shared dependency outage synchronizes thousands of clocks. Jitter adds randomness so retry times spread across a window.
Full jitter (recommended default)
AWS popularized picking a uniform random delay between zero and the calculated backoff:
sleep = random(0, min(cap, base * 2^attempt))
Full jitter minimizes collision probability and is the safest default for client SDKs talking to shared infrastructure.
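A small sketch of full jitter, assuming delays in seconds and a 0-based attempt counter. The function name full_jitter and the injectable rng parameter are illustrative choices, not part of any particular SDK.

```python
import random


def full_jitter(attempt: int, base: float = 0.1, cap: float = 30.0, rng=random) -> float:
    """Pick a uniform random sleep between 0 and the capped exponential delay."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Two clients that failed at the same instant now draw different sleeps.
client_a = full_jitter(3, rng=random.Random(1))
client_b = full_jitter(3, rng=random.Random(2))
```

Passing a seeded random.Random makes the delays reproducible in tests; production callers can rely on the default module-level generator.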
Equal jitter
Half the calculated delay plus random noise in the upper half:
delay/2 + random(0, delay/2). Keeps average wait closer to
the exponential curve while still desynchronizing clients.
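The equal-jitter expression can be sketched as follows, under the same assumptions as the formulas above (seconds, 0-based attempts). The name equal_jitter is hypothetical.

```python
import random


def equal_jitter(attempt: int, base: float = 0.1, cap: float = 30.0, rng=random) -> float:
    """Keep half of the capped exponential delay and randomize the other half."""
    delay = min(cap, base * (2 ** attempt))
    half = delay / 2
    return half + rng.uniform(0.0, half)
```

The result always lies between delay/2 and delay, so a client never retries almost immediately, at the cost of a narrower spread than full jitter.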
Decorrelated jitter
Each wait depends on the previous sleep, not just the attempt count:
sleep = min(cap, random(base, previous_sleep * 3)). Useful
when attempt numbers are unreliable (message visibility timeouts that
reset) or when you want faster spread without strict powers of two.
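A sketch of decorrelated jitter following the formula above. It assumes the caller keeps the previous sleep between attempts and seeds the sequence with base; the function name is illustrative.

```python
import random


def decorrelated_jitter(previous_sleep: float, base: float = 0.1,
                        cap: float = 30.0, rng=random) -> float:
    """Next sleep depends on the previous sleep rather than the attempt count."""
    return min(cap, rng.uniform(base, previous_sleep * 3))


# Generate a short sequence, seeding with base as the "previous" sleep.
sleep = 0.1
sleeps = []
for _ in range(5):
    sleep = decorrelated_jitter(sleep)
    sleeps.append(sleep)
```

Because only the last sleep is needed, this state can travel with a message or job record when an attempt counter is not trustworthy.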
Whichever variant you choose, log the attempt number, chosen delay, and error class — without logging secrets or full payloads.
Retry budgets and giving up
A retry budget caps how much retry traffic your system generates — globally or per dependency. Google's SRE practice limits retries to a fraction of total request volume so a failing backend cannot be drowned by its own clients' goodwill.
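One way to picture a ratio-based budget is the sketch below. It is a deliberate simplification: the RetryBudget class, its method names, and the 10 percent default are assumptions for illustration, and it uses lifetime counters where a real implementation would more likely use a sliding window or token bucket.

```python
class RetryBudget:
    """Allow retries only while they stay within a percentage of requests seen."""

    def __init__(self, retry_percent: int = 10):
        self.retry_percent = retry_percent
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        """Count every first-attempt request sent to the dependency."""
        self.requests += 1

    def try_acquire_retry(self) -> bool:
        """Return True and count the retry if the budget still has room."""
        # Integer arithmetic avoids floating-point surprises at the boundary.
        if (self.retries + 1) * 100 <= self.requests * self.retry_percent:
            self.retries += 1
            return True
        return False
```

A caller would check try_acquire_retry() before scheduling any backoff and fail immediately when it returns False.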
Practical limits to set:
- Max attempts: typically 3–5 for synchronous HTTP; more for async workers with durable queues.
- Total deadline: wall-clock timeout across all attempts (e.g. 10s for an interactive API call).
- Per-dependency concurrency: cap simultaneous in-flight retries so one slow service does not exhaust your thread pool.
- Retry-After respect: when a server returns Retry-After, honor it instead of your own schedule (within reason).
After the budget is exhausted, fail visibly: return an error to the user, enqueue for later processing, or route to a DLQ. Silent infinite retry loops are how stuck messages and zombie jobs accumulate.
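The max-attempts and total-deadline limits can be combined in one loop, sketched here with full jitter. All names (call_with_retries, RetriesExhausted, the parameters) are hypothetical, and catching every Exception is a shortcut for brevity; real code would catch only errors classified as retryable.

```python
import random
import time


class RetriesExhausted(Exception):
    """Raised when attempts or the deadline run out."""


def call_with_retries(operation, max_attempts=4, deadline_s=10.0, base=0.1, cap=2.0,
                      sleep=time.sleep, clock=time.monotonic, rng=random):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # assumption: every error here is retryable
            last_error = exc
        if attempt == max_attempts - 1:
            break  # no attempts left, so do not sleep again
        delay = rng.uniform(0.0, min(cap, base * 2 ** attempt))
        if clock() - start + delay > deadline_s:
            break  # sleeping would push us past the overall deadline
        sleep(delay)
    raise RetriesExhausted(f"gave up after {attempt + 1} attempt(s)") from last_error
```

The sleep and clock parameters exist so the loop can be tested without real waiting; the final exception is what the caller turns into a user-facing error or a DLQ entry.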
Retryable vs terminal errors
Not every non-200 status deserves another attempt. A useful
rule of thumb for HTTP clients:
- Retry: 408 Request Timeout, 429 Too Many Requests (with backoff honoring rate limits), 500, 502, 503, 504, and connection resets where the request may not have reached the server.
- Do not retry: 400 bad input, 401/403 auth, 404 not found, 409 conflict (unless your app defines idempotent upsert semantics), most 422 validation errors.
For idempotent GET and HEAD, retries are
generally safe. For POST, assume unsafe unless you send an
idempotency key the server deduplicates. For PUT and
DELETE with stable resource IDs, retries are often safe;
for PATCH, it depends on whether the patch sets absolute values or
applies relative changes.
Message consumers should classify exceptions the same way: network blips and throttling → retry with backoff; schema violations and business-rule rejections → terminal, send to DLQ after N receives.
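The HTTP rule of thumb above could be encoded roughly as follows. This is a simplified sketch: the names are hypothetical, PUT and DELETE are treated as always replay-safe even though the text says only "often", and PATCH is treated like POST.

```python
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})
# Assumption: these methods are replay-safe for the API being called.
REPLAY_SAFE_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE"})


def should_retry(method: str, status: int, has_idempotency_key: bool = False) -> bool:
    """Retry only transient statuses, and only when a replay cannot duplicate work."""
    if status not in RETRYABLE_STATUS:
        return False
    return method.upper() in REPLAY_SAFE_METHODS or has_idempotency_key


print(should_retry("GET", 503))                             # True
print(should_retry("POST", 503))                            # False
print(should_retry("POST", 503, has_idempotency_key=True))  # True
print(should_retry("GET", 404))                             # False
```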
Idempotency: the non-negotiable prerequisite
A timeout is ambiguous: the server may have succeeded and the response was lost, or the server never ran the handler. Retrying without idempotency guarantees duplicates — double charges, duplicate shipments, two ledger entries for one trade.
Production patterns:
- Idempotency keys: client sends Idempotency-Key: uuid; server stores outcome keyed by that ID for 24–72 hours.
- Natural idempotency: PUT /users/42 with full representation replaces the same state regardless of repeat count.
- Deduplication tables: store processed event IDs for async consumers; skip duplicates on redelivery.
- Compare-and-swap: only apply if version or timestamp matches; stale retries no-op safely.
If you cannot make an operation idempotent, do not retry it blindly — use outbox polling, human reconciliation, or a saga with compensating transactions instead.
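To illustrate the idempotency-key pattern from the list above, here is a server-side sketch that stores each outcome under its key for a fixed time. It is an in-memory, single-process illustration only: IdempotencyStore and run_once are hypothetical names, and a production version would need durable storage plus handling for two concurrent requests carrying the same key.

```python
import time


class IdempotencyStore:
    """Remember the outcome of each idempotency key for ttl_s seconds."""

    def __init__(self, ttl_s: float = 24 * 3600, clock=time.time):
        self._ttl_s = ttl_s
        self._clock = clock
        self._results = {}  # key -> (expires_at, result)

    def run_once(self, key: str, handler):
        """Run handler the first time a key is seen; replay the stored result after."""
        now = self._clock()
        cached = self._results.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        result = handler()
        self._results[key] = (now + self._ttl_s, result)
        return result
```

A retried request that carries the same key gets the stored response back without the handler (for example, a card charge) running a second time.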
Where retries live: client, proxy, or broker
Retries can happen at multiple layers; duplicating them multiplies load.
Client SDK retries
Application code or the HTTP client library retries failed calls. Gives fine-grained control per API but risks every service inventing different policies.
Service mesh / API gateway
Envoy, Linkerd, or an API gateway may retry idempotent routes
automatically. Centralized policy is powerful but dangerous for
non-idempotent POST unless explicitly excluded.
Message broker redelivery
SQS visibility timeout, RabbitMQ nack-with-requeue, and Kafka consumer offset rewind all implement retries asynchronously. Pair broker redelivery with exponential backoff via delayed queues or tiered retry topics — not immediate hot loops.
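For broker-driven retries, the delay can be derived from how many times a message has been delivered. The sketch below is broker-agnostic: redelivery_plan, its parameters, and the returned tuple shape are assumptions, and mapping the delay onto a visibility timeout, delayed queue, or retry topic is left to the caller.

```python
def redelivery_plan(receive_count: int, max_receives: int = 5,
                    base_s: float = 1.0, cap_s: float = 900.0):
    """Decide what to do with a message whose processing just failed.

    receive_count is the number of deliveries so far, starting at 1.
    Returns ("retry", delay_seconds) or ("dead_letter", 0.0).
    """
    if receive_count >= max_receives:
        return ("dead_letter", 0.0)
    delay = min(cap_s, base_s * 2 ** (receive_count - 1))
    return ("retry", delay)


print(redelivery_plan(1))  # ('retry', 1.0)
print(redelivery_plan(3))  # ('retry', 4.0)
print(redelivery_plan(5))  # ('dead_letter', 0.0)
```

Jitter is omitted here to keep the example short; it can be layered on top of the returned delay in the same way as for client retries.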
Pick one primary retry layer per hop. If the client makes three attempts and the gateway makes three attempts for each of them, up to nine requests hit a fragile backend.
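The multiplication is easy to check with a few lines of arithmetic. In this sketch each number is the total attempts a layer makes (first try included), and worst_case_requests is an illustrative name.

```python
from math import prod


def worst_case_requests(attempts_per_layer):
    """Upper bound on backend requests when every layer exhausts its attempts."""
    return prod(attempts_per_layer)


print(worst_case_requests([3, 3]))     # client x gateway -> 9
print(worst_case_requests([3, 3, 3]))  # add a third retrying layer -> 27
```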
Pairing backoff with circuit breakers and rate limits
Retries and circuit breakers solve opposite phases of the same outage. Backoff helps during brief transients; the circuit opens when failure rate proves the dependency is down, failing fast instead of wasting slots on doomed attempts.
Typical combination:
- First failure → exponential backoff retry (small number of tries).
- Sustained failures → circuit trips open; calls return immediately or use a cached fallback.
- After a cool-down → half-open probe with a single attempt; success closes the circuit.
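The three steps above can be sketched as a small state machine. This is a minimal, non-thread-safe illustration: the CircuitBreaker class, its consecutive-failure threshold, and the method names are assumptions, and many real breakers trip on a failure rate over a window instead.

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures -> single half-open probe."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at < self.cooldown_s:
            return False  # open: fail fast
        if self._probe_in_flight:
            return False  # half-open: only one probe at a time
        self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._failures += 1
        self._probe_in_flight = False
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # trip, or re-open after a failed probe
```

A retry loop would call allow_request() before each attempt and skip scheduling further backoff when it returns False.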
Edge rate limiting protects shared APIs from abusive retry volume. Return
429 with Retry-After so well-behaved clients back off instead of
tight-looping.
HTTP semantics: Retry-After and idempotent methods
RFC 9110 defines Retry-After as either a delay in seconds or
an HTTP-date when the client should try again. Servers under load should
set it on 503 and 429 responses so clients do not
guess.
Clients should parse both forms, clamp unreasonable values, and add jitter
even when honoring Retry-After — thousands of clients receiving
Retry-After: 5 will still collide at second five without
noise.
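A sketch of that client-side handling using only the Python standard library. The function name, the 60-second clamp, and the one-second jitter window are assumptions chosen for illustration; jitter is added on top of the server's value so the client never retries earlier than asked.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, max_wait_s=60.0, jitter_s=1.0, rng=random):
    """Parse a Retry-After header value; return a clamped, jittered wait or None."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        wait = float(value)  # delay-seconds form
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return None  # unparseable: caller falls back to its own backoff
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        wait = (when - now).total_seconds()
    wait = min(max(wait, 0.0), max_wait_s)  # clamp past dates and huge values
    return wait + rng.uniform(0.0, jitter_s)
```

Returning None for an unparseable header leaves the decision with the caller, which can then use its normal jittered schedule.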
For long-running operations, prefer 202 Accepted with a status
poll URL or webhook callback over holding a connection open through
multiple internal retries.
Common failure modes
- Retrying non-idempotent POST on timeout: classic double-charge bug; always use idempotency keys for payments and orders.
- No jitter on shared outage: recovery window gets hammered by synchronized wave two.
- Retrying 429 as fast as possible: ignores rate-limit intent; honor Retry-After and reduce concurrency.
- Nested retries across layers: multiplicative attempt count; document and disable duplicate layers.
- Retrying through an open circuit: wastes resources; check breaker state before scheduling backoff.
- Poison message infinite requeue: terminal errors must land in DLQ after max receives, not loop forever.
Production checklist
- Classify every outbound call and consumer handler as retryable or terminal; document the decision.
- Require idempotency keys (or natural idempotency) before enabling retries on mutating operations.
- Implement exponential backoff with full jitter as the default client policy.
- Set max attempts, per-call deadline, and optional per-dependency retry budget.
- Honor Retry-After on 429 and 503; add jitter on top.
- Pair retries with circuit breakers: stop retrying when the breaker is open.
- Ensure only one layer retries per hop (client OR gateway OR broker, not all three).
- Emit metrics: retry count, backoff delay histogram, exhausted-budget rate, duplicate-detection hits.
- Route exhausted async retries to a DLQ with alert and redrive runbook.
- Load-test recovery: simulate dependency outage and verify clients desynchronize and recover without retry storm.
Key takeaways
- Retries absorb transients; exponential backoff prevents them from becoming storms.
- Jitter is not optional at scale — without randomness, backoff still synchronizes clients.
- Idempotency comes before retry policy — ambiguous timeouts on writes need deduplication, not blind repeats.
- Classify errors — retry timeouts and 5xx; fail fast on 4xx validation and auth.
- Circuit breakers and rate limits complement backoff — know when to stop retrying entirely.
Related reading
- Idempotency explained: safe replays, idempotency keys, and deduplication for retried writes
- Circuit breaker pattern explained: when to stop calling a failing dependency
- Dead letter queues explained: quarantine messages after retry budgets exhaust
- API rate limiting explained: 429, token buckets, and client backoff discipline