Explainer · 7 June 2026

How backpressure and flow control work

Every pipeline has a fast end and a slow end. A camera encodes 4K frames faster than a cellular uplink can carry them. An API gateway accepts requests faster than a database can commit rows. A log shipper ingests events faster than a search indexer can parse them. Without a coordination mechanism, the gap accumulates in memory: buffers swell, latency climbs, and eventually the OOM killer terminates the process. Backpressure (closely related to flow control, and often used interchangeably with it) is the signal traveling upstream that says "stop sending until I catch up." This explainer covers where that signal lives at each layer (kernel sockets, application queues, message brokers, and explicit reactive-stream protocols) and how to design systems that propagate pressure instead of hiding it behind unbounded buffers.

The core problem: speed mismatch

A producer generates work or data; a consumer processes it. Their throughputs are rarely equal over short intervals. Producers burst; consumers stall on I/O, locks, or garbage collection. Something must absorb the temporary gap. The design choice is whether that absorber is:

Bounded: a fixed-size queue or buffer. When it is full, the producer blocks, drops, or receives an error. Latency stays predictable; throughput is capped at what the consumer can sustain.
Unbounded: an ever-growing list in RAM. Throughput looks fine in benchmarks until memory is exhausted. Tail latency explodes because every new item waits behind a backlog that grows faster than it drains.

Backpressure is what makes bounded designs work: the consumer (or an intermediary) communicates capacity to the producer so that the production rate matches the sustainable consumption rate. The signal can be implicit (a full socket send buffer makes a blocking write() stall, or a non-blocking one fail with EAGAIN) or explicit (a credit counter in a reactive-stream protocol).
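The difference between blocking and rejecting on a full buffer can be sketched with Python's standard queue module. The helper name offer, the capacity of 2, and the 50 ms timeout are illustrative assumptions, not recommendations.

```python
import queue

def offer(q: queue.Queue, item, policy: str = "reject", timeout: float = 0.05) -> bool:
    """Try to enqueue item; return True if it was accepted.

    policy="block"  waits up to `timeout` seconds for space to appear.
    policy="reject" fails immediately when the queue is full.
    """
    try:
        if policy == "block":
            q.put(item, timeout=timeout)
        else:
            q.put_nowait(item)
        return True
    except queue.Full:
        return False

bounded = queue.Queue(maxsize=2)              # the capacity is the backpressure budget
results = [offer(bounded, n) for n in range(4)]

# Space only reappears when a consumer takes something out.
bounded.get()
accepted_after_drain = offer(bounded, 99)
```

With nobody consuming, the first two offers succeed and the next two are refused; once one item is drained, a new offer is accepted again. The producer learns about consumer capacity from the return value rather than from a memory alarm.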

TCP and kernel-level flow control

At the transport layer, TCP implements two related but distinct mechanisms. Flow control uses the receive window (rwnd) carried in every segment header. The receiver advertises how much space remains in its socket receive buffer, and the sender may keep no more unacknowledged data in flight than that. When rwnd reaches zero, the sender must stop transmitting new data. This is per-connection backpressure: your application's read loop is slow, the kernel receive buffer fills, the window closes, and the remote peer's send() either blocks or leaves the unsent data queued in the peer's own memory.
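The effect on the sending side can be observed without a network. The sketch below is an approximation: socket.socketpair() typically creates local Unix-domain stream sockets rather than a TCP connection, so no receive window is involved, but the visible behavior is analogous. Once the peer stops reading, the kernel accepts only a finite number of bytes. The function name and chunk size are illustrative.

```python
import socket

def fill_send_buffer(chunk_size: int = 4096) -> int:
    """Write to a socket whose peer never reads; return the bytes the kernel accepted."""
    writer, reader = socket.socketpair()   # local stand-in for a connection
    writer.setblocking(False)
    sent = 0
    try:
        while True:
            sent += writer.send(b"x" * chunk_size)
    except BlockingIOError:
        # Kernel buffers are full. A blocking socket would stall here instead,
        # which is the implicit backpressure signal.
        pass
    finally:
        writer.close()
        reader.close()
    return sent

accepted = fill_send_buffer()
```

The exact value of accepted depends on the operating system's buffer sizes, so no particular number should be expected; the point is that it is finite.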

Congestion control is a separate mechanism that protects the network path rather than the receiver: it limits in-flight data based on perceived packet loss and delay, not on the receiver's buffer space. Our TCP congestion control explainer covers Reno, CUBIC, and BBR. For application designers, the lesson from both mechanisms is the same: pressure propagates hop by hop. A slow server slows its clients, and a slow client slows any server that keeps writing into kernel buffers on its behalf.

HTTP/2 adds stream-level and connection-level flow-control windows on top of TCP. HTTP/3 has the same two levels, but they are provided by QUIC itself rather than by the HTTP layer. A single slow stream cannot exhaust the entire connection's buffer if the windows are tuned correctly. Misconfigured proxies that buffer entire responses, however, defeat end-to-end backpressure and reintroduce memory risk.
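The two-level window arithmetic can be modeled as a pair of credit counters. This is a toy model, not a protocol implementation: the class and function names are hypothetical, and the window sizes are chosen for illustration (a 16,384-byte stream window under a 65,535-byte connection window).

```python
class FlowWindow:
    """Credit counter: bytes the peer has said it can still accept."""

    def __init__(self, size: int) -> None:
        self.available = size

    def consume(self, n: int) -> None:
        self.available -= n

    def replenish(self, n: int) -> None:
        # Models the receiver sending a window update after reading n bytes.
        self.available += n


def send(stream: FlowWindow, connection: FlowWindow, wanted: int) -> int:
    """Send as much as both windows allow and return the byte count sent."""
    n = max(0, min(wanted, stream.available, connection.available))
    stream.consume(n)
    connection.consume(n)
    return n


connection = FlowWindow(65_535)
slow_stream = FlowWindow(16_384)    # its reader has stalled
other_stream = FlowWindow(16_384)

first = send(slow_stream, connection, 100_000)   # capped by the stream window
stalled = send(slow_stream, connection, 1)       # no stream credit left
unaffected = send(other_stream, connection, 1_000)

slow_stream.replenish(500)                       # the reader consumed 500 bytes
resumed = send(slow_stream, connection, 1_000)
```

Because the stalled stream's window is smaller than the connection window, it runs out of its own credit first and the other stream keeps flowing. If every stream window were as large as the connection window, one stalled stream could consume all connection credit, which is the tuning caveat above.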

Application queues and thread pools

Inside your process, the pattern repeats. A web server hands requests to a worker pool through a queue. If the queue is unbounded, accept loops keep enqueueing while workers fall behind — you have accepted more work than you can complete, and clients time out waiting for slots that never arrive.

A bounded queue with a defined rejection policy is backpressure at the API boundary:

Block the acceptor: stop accepting new connections and stop reading from established ones until queue depth drops. Pressure returns to the listen backlog and TCP rwnd, and eventually to clients.
Return HTTP 503 or 429: fail fast so clients retry with backoff and jitter instead of hanging. Pair with rate limiting at the edge.
Shed load: drop the lowest-priority traffic first (health checks survive; bulk exports do not).
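Two of these policies, fail-fast rejection and priority-based shedding, might be combined in an admission function like the sketch below. The request shape (a dict with an optional "priority" key), the half-full shedding threshold, and the Retry-After values are all assumptions made for illustration.

```python
import queue

def admit(work_queue: queue.Queue, request: dict) -> tuple:
    """Return (status, headers). The request is enqueued only when status is 202."""
    depth, limit = work_queue.qsize(), work_queue.maxsize
    # Shed load: refuse bulk traffic once the queue is half full (illustrative threshold).
    if request.get("priority") == "bulk" and depth >= limit // 2:
        return 429, {"Retry-After": "5"}
    try:
        work_queue.put_nowait(request)
    except queue.Full:
        # Fail fast instead of letting the client hang on a slot that may never free up.
        return 503, {"Retry-After": "1"}
    return 202, {}
```

Checking qsize() and then calling put_nowait() is not atomic, so under concurrency a bulk request can occasionally slip past the shedding threshold. The hard bound still holds because put_nowait() enforces maxsize.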

Thread pools without queue limits are a common footgun: "we'll just spawn more threads" converts backpressure into context-switch overhead and lock contention. Our event loops explainer describes how single-threaded reactors handle thousands of connections by never blocking the loop — but they still need explicit backpressure on outbound writes or internal job queues.
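Python's concurrent.futures.ThreadPoolExecutor is an example of a pool whose internal work queue has no size limit. One way to bound it, sketched here under the assumption that blocking the submitter is acceptable, is to guard submit() with a semaphore. The class name BoundedExecutor and its parameters are hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

class BoundedExecutor:
    """Thread pool whose submit() blocks once `limit` tasks are queued or running."""

    def __init__(self, workers: int, limit: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(limit)

    def submit(self, fn, *args, timeout: Optional[float] = None) -> Future:
        # Wait for a free slot; give up after `timeout` seconds if one is set.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("executor saturated")
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A caller that passes a timeout gets a TimeoutError when the pool is saturated and can translate it into a 503; a caller that omits it simply waits, which pushes the pressure back to whatever is feeding it.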

Streams in Node.js, Java, and reactive libraries

Language stream APIs encode backpressure in the API surface. In Node.js, a Writable returns false from write() when its internal buffer reaches or exceeds highWaterMark. The producer must pause and listen for the 'drain' event before resuming. Piping with readable.pipe(writable), or with stream.pipeline(), which also propagates errors, wires this automatically; manual loops that ignore the return value are a classic source of heap OOM under load.
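The write-then-wait-for-drain contract can be illustrated with a toy model. The code below is written in Python and is not Node's implementation: ToyWritable, flush, and pump are hypothetical names, and flush() stands in for the slow sink finishing its I/O, the point at which a real stream would emit 'drain'.

```python
from collections import deque

class ToyWritable:
    """Toy sink that reports when its buffer has reached the high-water mark."""

    def __init__(self, high_water_mark: int) -> None:
        self.high_water_mark = high_water_mark
        self.buffer = deque()
        self.buffered = 0
        self.peak = 0

    def write(self, chunk: bytes) -> bool:
        # The chunk is always accepted; the return value is advice to the producer.
        self.buffer.append(chunk)
        self.buffered += len(chunk)
        self.peak = max(self.peak, self.buffered)
        return self.buffered < self.high_water_mark

    def flush(self) -> None:
        # Simulates the sink catching up.
        self.buffer.clear()
        self.buffered = 0


def pump(chunks, sink: ToyWritable) -> int:
    """Write all chunks, pausing whenever the sink asks; return the number of pauses."""
    pauses = 0
    for chunk in chunks:
        if not sink.write(chunk):
            pauses += 1
            sink.flush()   # stands in for waiting on the drain notification
    return pauses


sink = ToyWritable(high_water_mark=16)
pauses = pump([b"x" * 8] * 6, sink)
```

Because the producer respects the return value, the buffer never grows past the mark. A loop that ignored it would let buffered grow with the total input size.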

Java's Reactive Streams specification (implemented by Project Reactor, RxJava, and Akka Streams) makes demand explicit: Subscription.request(n) tells the publisher how many items the subscriber is ready to handle. The publisher must not emit more than requested — backpressure is a first-class contract, not an accident of buffer size.
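The demand contract can be shown with a deliberately simplified, synchronous model in Python. It is not the Java interface: real implementations must also handle asynchronous delivery, re-entrant request() calls, cancellation, and completion signals, all of which are omitted here.

```python
from typing import Callable, Iterator

class Subscription:
    """Emits items from `source` only while the subscriber has outstanding demand."""

    def __init__(self, source: Iterator[int], on_next: Callable[[int], None]) -> None:
        self._source = source
        self._on_next = on_next

    def request(self, n: int) -> None:
        if n <= 0:
            raise ValueError("demand must be positive")
        # Deliver at most n items, then stop until the subscriber asks again.
        for _ in range(n):
            try:
                item = next(self._source)
            except StopIteration:
                return
            self._on_next(item)


received = []
subscription = Subscription(iter(range(100)), received.append)
subscription.request(3)
after_first = list(received)
subscription.request(2)
```

The source holds 100 items, but only as many as were requested ever reach the subscriber; the rest stay with the publisher until more demand is signaled.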

Go channels with fixed capacity provide the same semantics: a send blocks when the buffer is full, synchronizing goroutine production to consumption rate. Unbuffered channels are synchronous handoff — zero hidden queue.
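For comparison, similar blocking-send behavior is available in Python through asyncio.Queue with a maxsize. The sketch below is an analogy rather than a translation of Go semantics: asyncio.Queue has no unbuffered mode (maxsize=0 means unlimited), and the function name and sentinel convention are assumptions.

```python
import asyncio

async def run_pipeline(items: int, capacity: int) -> tuple:
    """Run one producer and one consumer; return (consumed items, deepest queue seen)."""
    channel = asyncio.Queue(maxsize=capacity)
    consumed = []
    max_depth = 0

    async def producer() -> None:
        nonlocal max_depth
        for i in range(items):
            await channel.put(i)            # suspends while the queue is full
            max_depth = max(max_depth, channel.qsize())
        await channel.put(None)             # sentinel: no more items

    async def consumer() -> None:
        while (item := await channel.get()) is not None:
            consumed.append(item)
            await asyncio.sleep(0)          # yield control, standing in for real work

    await asyncio.gather(producer(), consumer())
    return consumed, max_depth


consumed, depth = asyncio.run(run_pipeline(items=20, capacity=3))
```

However fast the producer loop runs, the queue depth never exceeds the configured capacity, because put() does not return until the consumer has made room.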

Message queues and log-based systems

Brokers decouple producers and consumers in time, which feels like eliminating backpressure, but the pressure only moves to the broker's disk and to consumer lag. Every broker has limits: Kafka deletes data once it passes the configured retention time or size, RabbitMQ blocks publishers when its memory or disk alarms fire, and SQS caps both message retention and the number of in-flight messages.

Healthy queue usage treats lag as a visible backpressure gauge. When lag grows monotonically, you are producing faster than you can consume — adding broker capacity without scaling consumers just delays the crisis. Mitigations:

Scale consumers horizontally (more partitions, more workers).
Throttle producers at the source when lag exceeds a threshold.
Use dead-letter queues for poison messages that block a partition.
Design consumers to be idempotent so that redelivered messages can be retried safely without duplicating side effects.
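Throttling producers on lag, one of the mitigations above, could look like the following sketch. Lag is taken as the distance between the newest offset in the log and the consumer's committed offset. The function names, the soft and hard limits, and the linear ramp are illustrative assumptions, and no broker client library is involved.

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages written to the log but not yet processed by the consumer."""
    return max(0, log_end_offset - committed_offset)


def producer_delay(lag: int, soft_limit: int, hard_limit: int, max_delay: float = 1.0) -> float:
    """Seconds to pause before the next publish.

    Zero below soft_limit, max_delay at or above hard_limit, linear in between.
    """
    if lag <= soft_limit:
        return 0.0
    if lag >= hard_limit:
        return max_delay
    return max_delay * (lag - soft_limit) / (hard_limit - soft_limit)
```

For example, with a soft limit of 1,000 and a hard limit of 5,000, a lag of 3,000 sits halfway up the ramp and yields a 0.5-second pause. A gradual ramp avoids the oscillation that a single on/off threshold tends to produce.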

Our message queues guide covers broker selection; the backpressure lesson is that queues are buffers with a budget, not infinite sponges.

Backpressure vs circuit breaking

Backpressure slows the flow while the downstream is merely busy. A circuit breaker opens when downstream is failing — errors, timeouts, or health-check failures — and rejects calls entirely for a cooldown period. They solve different problems but often appear together at API gateways: rate limits cap steady-state load, backpressure sheds burst overload, circuit breakers protect against cascading failure when a dependency is down.

See our circuit breakers explainer for state machines and half-open probes. Do not confuse "return 503 when queue is full" (backpressure) with "return 503 because the payment service circuit is open" (failure isolation).
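The two rejection reasons can be kept apart in code. The sketch below pairs a deliberately minimal breaker with a queue-depth check; the thresholds, the cooldown, and the reason strings are assumptions, and the half-open handling is simplified in that every call is allowed through after the cooldown until a result is recorded.

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows probes after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0, clock=time.monotonic) -> None:
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                         # closed
        return self.clock() - self.opened_at >= self.cooldown   # half-open after cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()   # open, or re-open after a failed probe


def admission(queue_depth: int, queue_limit: int, breaker: CircuitBreaker) -> str:
    if not breaker.allow():
        return "reject: dependency failing"   # failure isolation
    if queue_depth >= queue_limit:
        return "reject: overloaded"           # backpressure
    return "accept"
```

Both rejections may surface to the client as a 503, but logging and alerting on them separately shows whether the fix is more capacity or a repaired dependency.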

Designing for end-to-end pressure

Backpressure only works if it propagates across every buffering layer. A common anti-pattern is an async API that accepts unbounded submissions into an in-memory queue while a background worker drains slowly — clients see HTTP 202 Accepted instantly, then wonder why results arrive minutes late. Better patterns:

Expose queue depth in metrics and health endpoints; alert before OOM.
Cap in-flight work per tenant and globally.
Prefer synchronous rejection over async buffering when SLA matters.
Load-test with sustained overload, not just peak RPS spikes — unbounded queues hide problems in short benchmarks.
Disable intermediary buffering where possible (nginx proxy_request_buffering off for streaming uploads).
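Capping in-flight work per tenant and globally, as suggested in the list above, can be sketched with two counters behind one lock. The class name and the limits used in the example are hypothetical.

```python
import threading
from collections import defaultdict

class InFlightLimiter:
    """Admits work only while both the global and the per-tenant caps have room."""

    def __init__(self, global_limit: int, per_tenant_limit: int) -> None:
        self._lock = threading.Lock()
        self._global_limit = global_limit
        self._per_tenant_limit = per_tenant_limit
        self._total = 0
        self._by_tenant = defaultdict(int)

    def try_acquire(self, tenant: str) -> bool:
        with self._lock:
            if self._total >= self._global_limit:
                return False
            if self._by_tenant[tenant] >= self._per_tenant_limit:
                return False
            self._total += 1
            self._by_tenant[tenant] += 1
            return True

    def release(self, tenant: str) -> None:
        with self._lock:
            # Ignore releases that have no matching acquire.
            if self._by_tenant[tenant] > 0:
                self._by_tenant[tenant] -= 1
                self._total -= 1


limiter = InFlightLimiter(global_limit=3, per_tenant_limit=2)
```

A request handler would call try_acquire() before starting work, reject synchronously when it returns False, and call release() in a finally block. The per-tenant cap keeps one noisy tenant from using the whole global budget.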

In microservice meshes, pressure must cross process boundaries. gRPC inherits flow control from HTTP/2 windows; REST clients need explicit handling of 429 responses and Retry-After headers. Without cross-service propagation, each hop adds another unbounded buffer and the system becomes a distributed memory leak.
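On the client side, honoring Retry-After means handling both forms the header can take: a number of seconds or an HTTP date. A minimal parser using only the standard library might look like this; the function name and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Convert a Retry-After header value into a non-negative wait in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():                      # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default                       # unparseable: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The returned value is best treated as a minimum wait. Adding a little random jitter on top keeps many clients from retrying at the same instant.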

Common pitfalls

Unbounded in-memory queues: "we'll optimize the consumer later" until production traffic proves you won't.
Ignoring write() return values in Node or equivalent APIs: the fastest path to heap exhaustion.
Buffering proxies: CDNs and reverse proxies that read entire request bodies before forwarding defeat client-side backpressure.
Treating broker lag as normal: lag is debt; retention policies eventually delete data you never processed.
Retry storms: clients that retry 503s without exponential backoff amplify overload; combine server backpressure with client jitter.
Confusing concurrency with throughput: more parallel workers help only when the bottleneck parallelizes (idle cores or independent I/O waits), not when every worker contends for one shared resource such as a lock or a single database connection.
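The client half of the retry-storm fix is usually exponential backoff with jitter. The sketch below uses the "full jitter" variant, in which each delay is drawn uniformly between zero and an exponentially growing ceiling. The base, the cap, and the function name are illustrative assumptions.

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0, rng=random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with every attempt up to `cap`; the actual delay is a
    uniform random value below that ceiling so clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Passing a seeded random.Random instance as rng makes the delays reproducible in tests. In production the default module-level generator is sufficient.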

Practical checklist

Inventory every queue and buffer in the request path; note whether each is bounded.
Define what happens when each buffer fills: block, drop, or reject with a status code.
Instrument queue depth, consumer lag, and socket send-buffer saturation.
Verify backpressure in load tests that sustain overload long enough to fill every buffer in the path; shorter runs never exercise the full-buffer behavior.
Return 429 or 503 with a Retry-After header when rejecting; document client backoff expectations.
Pair overload handling with circuit breakers for failing dependencies, not just slow ones.