Guide
Backpressure explained
A flash sale sends ten thousand checkout events per second into your order
service. The service can process two thousand. Without a plan, those eight
thousand surplus events land in an in-memory list that grows until the JVM
throws OutOfMemoryError and every in-flight order is lost.
Backpressure is the mechanism that tells the fast side to
slow down — or routes excess work elsewhere — before the slow side drowns.
It is not optional polish for high-traffic systems; it is how pipelines
survive mismatched producer and consumer speeds. This guide covers what
backpressure means in practice, how it differs from
rate limiting,
bounded vs unbounded buffering, pressure signals in streams and
message queues,
load shedding trade-offs, and a checklist for shipping flow control that
fails safely instead of silently.
The producer–consumer speed mismatch
Every data pipeline has stages: HTTP handlers enqueue jobs, workers transform rows, databases commit writes, downstream services emit webhooks. Stages rarely run at identical throughput. A spike at the front — a viral post, a batch import, a retry storm after a brief outage — can arrive faster than the bottleneck stage can drain it.
Systems without backpressure respond in one of three bad ways:
- Unbounded buffering — accept everything into RAM or disk until something crashes.
- Blocking without timeout — threads pile up waiting for queue space, exhausting connection pools.
- Silent drops — overflow policies discard work with no metric, no retry, and no user-visible error.
Backpressure is the deliberate fourth option: propagate pressure
upstream so the source reduces its send rate, blocks with a deadline,
or redirects overflow to a durable slow path. The goal is controlled
degradation — latency rises or some requests get
503 Service Unavailable — rather than uncontrolled collapse
where recovery takes hours and data integrity is unknown.
Backpressure vs rate limiting
Teams often conflate the two because both reduce incoming work. They solve different problems at different layers.
Rate limiting caps requests per client, IP, or API key over a time window — typically at the edge, before business logic runs. It protects against abuse and fair-shares capacity across tenants. A client that stays under its quota can still overwhelm an internal worker if the downstream stage is slower than the gateway allows.
Backpressure is end-to-end flow control inside the pipeline. It reacts to current downstream capacity: queue depth, thread-pool saturation, database connection wait time, or consumer lag. When the order processor falls behind, backpressure slows the HTTP handler even for clients well within their rate limit.
Production systems use both: rate limits at the perimeter, backpressure between internal stages. Pairing backpressure with a circuit breaker on failing dependencies prevents retry amplification from turning a partial slowdown into a full outage.
TCP flow control: the original backpressure
The concept is not new. TCP uses a receive window advertised in every ACK packet: "I can accept N more bytes." When the receiver's buffer fills, the window shrinks to zero and the sender stops transmitting. No central coordinator — each hop negotiates capacity locally.
Application-level backpressure mirrors this contract:
- The slow consumer exposes how much buffer space remains.
- The fast producer checks that signal before sending the next unit of work.
- When space is zero, the producer waits, drops, or routes elsewhere — explicitly, not by accident.
HTTP/2 and gRPC extend the idea with stream-level flow control and credit-based windows. A gRPC client cannot flood a server with unbounded concurrent messages if the server has not granted send credit. Streaming APIs that ignore these signals and buffer indefinitely in middleware recreate the OOM failure mode at a higher layer.
Bounded queues and blocking policies
The simplest backpressure implementation is a bounded queue
between producer and consumer threads. Java's
ArrayBlockingQueue, Go channels with fixed capacity, and
Rust's sync_channel(n) all block or fail the
send when full.
Choose the overflow policy deliberately:
- Block with timeout — producer waits up to
Tms for space, then returns an error to the caller. Good for user-facing paths where hanging forever is worse than a fast failure. - Reject immediately —
offer()returns false; surface429or503withRetry-After. Preserves latency percentiles under overload. - Drop oldest (sample) — keep only the newest events. Useful for metrics and telemetry where freshness beats completeness.
- Spill to durable queue — when the in-memory buffer fills, write to Kafka or SQS. This is buffering with a ceiling, not infinite absorption; monitor lag on the spill path.
Unbounded queues — LinkedBlockingQueue without a capacity,
unbounded Go channels, lists that grow on every request — feel safe
during development and become production incidents under load. If you
cannot articulate the maximum in-flight work count and its memory cost,
you do not have backpressure.
Reactive streams and async pipelines
In event-driven and reactive architectures, backpressure is a
first-class protocol. The Reactive Streams specification (implemented in
Project Reactor, RxJava, and Akka Streams) defines
request(n): a subscriber tells the publisher how many items
it is ready to process. The publisher must not emit more than requested.
Node.js readable streams use a similar highWaterMark:
when internal buffers exceed the mark, write() returns false
and the producer should pause until a drain event fires.
Ignoring the return value and continuing to write() buffers
unboundedly in userland.
In
event-driven architectures,
each stage should propagate pressure rather than absorb it silently.
A common anti-pattern: an API handler publishes to Kafka and returns
200 OK while consumer lag climbs for hours. The handler
succeeded; the system did not. Treat rising consumer lag as a
backpressure alarm, not a metric you check monthly.
Backpressure in message queues and databases
Message brokers are often mistaken for infinite buffers. They are not. Kafka partitions have retention limits; SQS queues have visibility timeouts and DLQ thresholds; RabbitMQ memory alarms pause publishers cluster-wide when disk or RAM thresholds breach.
Effective queue backpressure patterns:
- Consumer-scaled prefetch — fetch only as many messages as workers can process concurrently. High prefetch with slow handlers balloons in-flight work and extends recovery time after a crash.
- Lag-based autoscaling — add consumer instances when partition lag exceeds a threshold; scale in when lag drains. The queue is the pressure gauge.
- Publisher throttling — when broker depth or age exceeds limits, slow the producer or reject new publishes. Some teams expose a "system busy" flag that upstream batch jobs respect.
- Idempotent consumers — backpressure often causes redelivery; consumers must handle duplicates safely via idempotency keys.
Databases exert implicit backpressure through connection pool exhaustion and lock contention. When all pool connections are checked out, new requests block — but if the HTTP layer has no timeout, those blocks cascade into thread starvation. Size pools, set acquire timeouts, and surface pool wait metrics before users see 30-second hangs.
Load shedding vs queuing
Queuing trades latency for throughput: work waits until capacity returns. That is appropriate when delays are acceptable and work must not be lost — order processing, payment settlement, audit logs.
Load shedding rejects or drops work to protect the surviving fraction. Use it when:
- Stale work is worthless — live sports scores, ad impressions, real-time quotes.
- Queue depth would exceed recovery time — shedding now beats a six-hour lag backlog.
- User experience degrades uniformly — better to serve 80% of users fast than 100% slowly.
Shedding must be observable: increment a requests_rejected_total
counter, log at warn level with reason codes, and alert when reject
rate exceeds a baseline. Silent shedding is indistinguishable from a
bug. Prefer shedding at a well-defined boundary — API gateway, sidecar,
or admission controller — rather than random thread pool rejections deep
in the stack where error mapping is inconsistent.
Failure modes without backpressure
- Retry storms: Timeouts trigger client retries that multiply load on an already saturated service — backpressure plus jittered backoff breaks the loop.
- GC death spirals: Huge heaps from buffered objects increase pause times, which increases timeouts, which increases buffering.
- Cascading thread exhaustion: Blocked workers hold connections; upstream pools drain; unrelated endpoints fail.
- False capacity signals: Health checks pass because the process is alive while internal queues hold hours of unprocessed work.
- Priority inversion: Bulk batch jobs fill the queue ahead of interactive traffic unless separate bounded lanes exist.
Liveness probes that only hit /healthz without checking
queue depth or dependency latency will keep sending traffic to instances
that are technically up but functionally overloaded. Readiness probes
should fail when the instance cannot accept new work within SLO bounds.
Production checklist
- Inventory every producer–consumer boundary; document max in-flight count and memory per item.
- Replace unbounded in-memory buffers with bounded queues or explicit capacity limits.
- Define overflow policy per path: block-with-timeout, reject, shed, or spill-to-durable-queue.
- Expose metrics: queue depth, pool wait time, consumer lag, reject rate, and oldest-message age.
- Set alerts on lag and depth trends, not just absolute crashes.
- Pair internal backpressure with edge rate limits and circuit breakers on flaky dependencies.
- Ensure consumers are idempotent — pressure causes redelivery and reordering.
- Load-test beyond expected peak; verify reject paths return correct HTTP status and
Retry-Afterheaders. - Separate interactive and batch lanes so bulk work cannot starve user-facing traffic.
- Fail readiness when backlog exceeds recovery SLO; keep liveness cheap but do not confuse the two.
Key takeaways
- Backpressure propagates slowness upstream so fast producers cannot outrun slow consumers indefinitely.
- Rate limiting protects the perimeter; backpressure protects internal pipeline stages based on live capacity.
- Bounded queues are the simplest tool — unbounded buffers defer failure and amplify it.
- Consumer lag is a backpressure signal in async systems; treat rising lag as overload, not background noise.
- Load shedding is valid when queued work loses value — but must be measured, not silent.
Related reading
- Message queues explained — broker semantics, consumer groups, and lag as a capacity gauge
- Circuit breaker pattern explained — stop calling failing dependencies before retries amplify overload
- API rate limiting explained — perimeter quotas vs end-to-end flow control
- Event-driven architecture explained — async pipelines where pressure must propagate across stages