Explainer · 7 June 2026

How TCP congestion control and reliable transport work

The Internet Protocol (IP) is deliberately minimal: it routes packets from source to destination with no guarantee they arrive, stay in order, or arrive only once. That simplicity let the early internet scale, but applications need something stronger — a continuous byte stream where missing data is re-sent and duplicates are discarded. Transmission Control Protocol (TCP) sits on top of IP and provides that illusion of a reliable pipe. Congestion control, the algorithm family that decides how fast TCP may send, is why a single saturated link does not collapse every connection sharing it.

From datagrams to byte streams

IP delivers variable-size datagrams labeled with source and destination addresses, and each datagram carries one TCP segment. On top of that, TCP adds:

Port numbers — multiplex many logical connections on one host (HTTPS on 443, your database on 5432).
Sequence numbers — every byte in the stream has a 32-bit index; segments carry a start sequence and length.
Acknowledgments (ACKs) — the receiver reports the next byte it expects; cumulative ACKs mean "I have everything below this number."
Checksums — detect corrupted segments in transit.

The sender maintains a send buffer of bytes not yet ACKed; the receiver buffers out-of-order segments until gaps fill. Your HTTP client never sees individual packets — it reads from a socket that blocks until bytes arrive in order. That abstraction is what TLS and HTTP build on after DNS resolves the hostname.
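The receiver side of this can be sketched in a few lines of Python. This is a toy model, not a real TCP implementation: the class name `Receiver` and its fields are made up for illustration, sequence numbers are plain integers with no 32-bit wraparound, and no window limits are enforced.

```python
class Receiver:
    """Toy receiver: buffers out-of-order segments and returns cumulative ACKs."""

    def __init__(self, initial_seq=0):
        self.next_expected = initial_seq  # next in-order byte we are waiting for
        self.pending = {}                 # out-of-order payloads keyed by start sequence
        self.delivered = bytearray()      # bytes already handed to the application

    def on_segment(self, seq, payload):
        """Accept one segment and return the cumulative ACK number."""
        end = seq + len(payload)
        if end <= self.next_expected:
            return self.next_expected     # pure duplicate: re-ACK, deliver nothing
        if seq < self.next_expected:
            # Partial overlap: keep only the bytes we have not delivered yet.
            payload = payload[self.next_expected - seq:]
            seq = self.next_expected
        self.pending[seq] = payload
        # Deliver every buffered segment that now lines up with the stream.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected
```

Feeding it segments at offsets 0, 6, and then 3 returns ACKs that stay at 3 until the gap fills, which is the same cumulative-ACK behavior the sender relies on to decide what to retransmit.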

The three-way handshake

Before data flows, both sides synchronize initial sequence numbers (ISNs) and advertise their receive windows and options such as maximum segment size:

SYN — client sends SYN with its ISN.
SYN-ACK — server acknowledges client ISN+1 and sends its own ISN.
ACK — client acknowledges server ISN+1; connection is established.

Random ISNs mitigate spoofing attacks in which an off-path attacker guesses sequence numbers. The handshake itself needs no fourth leg; the next ACK on the wire is the server acknowledging the client's first data, typically the HTTP request. Connection teardown normally uses FIN flags in a four-way close so that each side can finish sending what it has buffered, while an RST segment aborts the connection abruptly when an error is fatal.
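The sequence and acknowledgment arithmetic of the three legs can be written out as a small sketch. The function name and the dictionary layout are illustrative only, and `secrets.randbits` stands in for the ISN generator; real stacks derive ISNs differently.

```python
import secrets

SEQ_SPACE = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as dicts (illustrative layout)."""
    if client_isn is None:
        client_isn = secrets.randbits(32)  # stand-in for the kernel's ISN choice
    if server_isn is None:
        server_isn = secrets.randbits(32)

    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    # The SYN consumes one sequence number, so the server ACKs ISN + 1.
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]
```

Passing fixed ISNs, for example `three_way_handshake(1000, 5000)`, makes the ISN+1 pattern easy to read off the returned segments.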

Flow control: the receive window

Flow control protects the receiver from being overrun. Every segment the receiver sends, including each ACK, advertises a receive window (rwnd): how many bytes of buffer space remain. The sender may have at most min(cwnd, rwnd) bytes in flight, where cwnd is the congestion window discussed below.

If the application reads slowly (a sluggish JSON parser, a blocked UI thread), rwnd shrinks to zero and the sender stops — this is backpressure at the transport layer. Unlike application-level rate limits (see our API rate limiting guide), rwnd is dynamic per connection and requires no central coordinator.
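The min(cwnd, rwnd) rule is small enough to state as code. The sketch below assumes all three quantities are tracked in bytes; the function name `usable_window` is hypothetical.

```python
def usable_window(cwnd, rwnd, in_flight):
    """Bytes the sender may still transmit right now.

    cwnd: congestion window in bytes (the sender's estimate of network capacity)
    rwnd: receive window in bytes (advertised by the receiver)
    in_flight: bytes sent but not yet acknowledged
    """
    return max(0, min(cwnd, rwnd) - in_flight)
```

With `cwnd=20000`, `rwnd=8000`, and 3000 bytes in flight, the receiver is the limit and 5000 more bytes may go out; with `rwnd=0` the result is 0 no matter how large cwnd is, which is the backpressure case described above.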

Congestion control: sharing the bottleneck fairly

Congestion control protects the network. Routers have finite queues; when arrivals exceed link capacity, packets drop. TCP interprets loss (or, in modern variants, rising delay) as a signal to slow down. Without it, every host would blast at line rate and induce the kind of congestion collapse that plagued the internet in the 1980s.

Classic TCP Reno maintains a congestion window cwnd measured in segments:

Slow start — cwnd starts small and doubles every RTT while ACKs arrive, an exponential probe that runs until loss or until cwnd reaches the threshold ssthresh. Modern stacks commonly open a new connection with 10 segments; after a timeout the window restarts from 1.
Congestion avoidance — above ssthresh, cwnd grows by roughly one segment per RTT (linear increase) until loss.
Loss reaction — on timeout, ssthresh is set to half the current window and cwnd resets to 1; on three duplicate ACKs (fast retransmit), ssthresh is set the same way and cwnd drops only to about that value instead of 1 (fast recovery).
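The three phases can be approximated at one-RTT granularity with a short simulation. This is a simplified sketch, not kernel behavior: cwnd is counted in segments, slow start is capped at ssthresh instead of overshooting, the fast-recovery window inflation is ignored, and the starting values (cwnd 10, ssthresh 64) and event names are assumptions chosen for illustration.

```python
def reno_step(cwnd, ssthresh, event):
    """Advance (cwnd, ssthresh) by one RTT; both are measured in segments.

    event is one of:
      "ack"     - a full window was acknowledged during this RTT
      "dupack3" - three duplicate ACKs arrived (fast retransmit / fast recovery)
      "timeout" - the retransmission timer fired
    """
    if event == "ack":
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd = cwnd + 1                 # congestion avoidance: +1 segment per RTT
    elif event == "dupack3":
        ssthresh = max(cwnd / 2, 2)         # remember half the window at loss
        cwnd = ssthresh                     # continue from there, not from 1
    elif event == "timeout":
        ssthresh = max(cwnd / 2, 2)
        cwnd = 1                            # restart slow start from one segment
    else:
        raise ValueError(f"unknown event: {event!r}")
    return cwnd, ssthresh


def trace(events, cwnd=10, ssthresh=64):
    """Return the cwnd value before the first event and after each event."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = reno_step(cwnd, ssthresh, event)
        history.append(cwnd)
    return history
```

Running `trace` over a sequence of ACK rounds, one "dupack3", and one "timeout" shows the familiar sawtooth: fast doubling, slow linear growth, a halving on duplicate ACKs, and a collapse to 1 on timeout.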

Fast retransmit avoids waiting for a full retransmission timeout when the receiver keeps ACKing the same gap: three duplicate ACKs suggest that one segment was lost while later segments are still arriving. That cuts tail latency for web assets and database queries riding the same kernel socket stack.
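The duplicate-ACK trigger amounts to a counter on the sender side. The sketch below keeps only that counter; a real stack applies further conditions before treating an ACK as a duplicate (for example, that data is outstanding and the segment carries no payload), and the class name `DupAckDetector` is hypothetical.

```python
class DupAckDetector:
    """Signals fast retransmit when the same ACK number repeats `threshold` times."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None   # highest cumulative ACK seen so far
        self.dup_count = 0     # how many times it has been repeated

    def on_ack(self, ack):
        """Return True exactly once per gap, on the threshold-th duplicate."""
        if ack == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        # A new ACK number means the gap (if any) was filled; start over.
        self.last_ack = ack
        self.dup_count = 0
        return False
```

The first ACK for a given number is not a duplicate, so the signal fires on the fourth arrival of the same ACK number, and later repeats of that number do not fire it again.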

RTT, bandwidth-delay product, and bufferbloat

Round-trip time (RTT) sets the clock for every congestion algorithm. The bandwidth-delay product (BDP) is bandwidth × RTT: the number of bytes that must be in flight to keep a link full. A cross-continent path with a 100 ms RTT at 1 Gbps needs about 12.5 MB in flight; Reno slow start must grow cwnd that large without overfilling router buffers.
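The arithmetic behind that figure is worth spelling out. The helpers below are a rough sketch: the 1460-byte segment size and the initial window of 10 segments are assumptions, and the slow-start estimate counts ideal doublings with no loss.

```python
import math


def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product in bytes (bandwidth is given in bits per second)."""
    return bandwidth_bps * rtt_seconds / 8


def segments_to_fill(bandwidth_bps, rtt_seconds, mss_bytes=1460):
    """Segments that must be in flight to keep the path full (MSS is an assumption)."""
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_seconds) / mss_bytes)


def slow_start_rtts(target_segments, initial_cwnd=10):
    """Ideal number of RTTs for a doubling cwnd to reach target_segments."""
    cwnd, rtts = initial_cwnd, 0
    while cwnd < target_segments:
        cwnd *= 2
        rtts += 1
    return rtts
```

For 1 Gbps and 100 ms, `bdp_bytes(1e9, 0.1)` gives 12.5 million bytes, or 8562 segments of 1460 bytes, and an uninterrupted slow start from 10 segments needs 10 round trips (about a second on this path) to get there.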

Oversized router queues create bufferbloat: latency inflates long before packets drop, hurting interactive traffic (game inputs, wallet RPC polling) that shares a home uplink with a large download. The two algorithms you are most likely to meet treat queues differently:

TCP BBR (Bottleneck Bandwidth and Round-trip propagation time) — models max bandwidth and min RTT and paces sends to avoid filling buffers; available in mainline Linux since kernel 4.9 (2016) and widely deployed on large server fleets, but usually something you opt into instead of the stock default.
CUBIC — the Linux default for years; loss-based, with cubic growth of cwnd after a loss, tuned for high-bandwidth, high-latency paths. Because it backs off only on loss, it can still fill deep buffers.

ss -ti on Linux reports cwnd and RTT per socket, and Wireshark's TCP stream graphs show RTT and throughput over time, which helps when you debug "why is this API slow only from Singapore?"

TCP vs UDP — and where QUIC fits

UDP exposes IP datagrams with ports but no reliability, ordering, or congestion control. Use it when the application tolerates loss (live video, game position updates) or implements its own recovery on top (QUIC, some VPN protocols). Voice and game relay networks often combine UDP with application-level redundancy — see our NAT traversal explainer for how relays fit in.

QUIC, the transport beneath HTTP/3, runs over UDP but embeds TLS 1.3, stream multiplexing without head-of-line blocking, and improved loss recovery in user space. A single lost packet no longer stalls unrelated HTTP responses on the same connection, which was a pain point of TCP+TLS under HTTP/2. Long-lived WebSocket sessions still often ride TCP today, but edge CDNs increasingly terminate QUIC at the reverse proxy.

Head-of-line blocking and stream multiplexing

TCP exposes a single ordered byte stream per connection. If segment 100 is lost but 101–110 already arrived, the receiver cannot deliver 101+ to the application until 100 is retransmitted — even when those later bytes belong to unrelated HTTP responses multiplexed on HTTP/2. That is head-of-line (HOL) blocking at the transport layer.

HTTP/2 multiplexes independent streams as frames above TCP, which removes head-of-line blocking between requests at the application layer, but a single lost TCP segment still stalls every stream sharing the connection. QUIC tracks ordering and flow control per stream, so a lost packet delays only the streams whose data it carried. For long-polling or server-sent event feeds mixed with bursty REST calls, this interaction explains why "one slow endpoint" can drag down an entire browser tab.
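A toy model makes the difference concrete. The function below is illustrative only: each packet is reduced to the id of the stream whose data it carries, exactly one packet is missing, and the question is which packets' data can reach the application before the retransmission arrives.

```python
def deliverable_before_retransmit(packets, lost_index, per_stream_ordering):
    """Return (index, stream_id) pairs whose data can be delivered while
    the packet at lost_index is still missing.

    packets: list of stream ids, one per packet, in send order
    per_stream_ordering: False models one ordered byte stream (TCP-like),
                         True models independent ordered streams (QUIC-like)
    """
    ready = []
    blocked_streams = set()
    for index, stream_id in enumerate(packets):
        if index == lost_index:
            blocked_streams.add(stream_id)
            if not per_stream_ordering:
                break  # single byte stream: nothing after the gap is delivered
            continue
        if stream_id in blocked_streams:
            continue   # later data on the same stream waits for the gap
        ready.append((index, stream_id))
    return ready
```

With packets `["A", "B", "A", "C", "B"]` and the second packet lost, the single-stream model delivers only the first packet, while the per-stream model also delivers the later packets for streams A and C and holds back only stream B.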

What operators and developers should watch

Keep-alive and connection reuse — TLS handshakes plus TCP slow start on every request waste RTTs; HTTP/2 and connection pools amortize setup.
Nagle's algorithm — coalesces tiny writes; disable (TCP_NODELAY) for latency-sensitive RPC if you batch at the application layer instead.
Timeouts should exceed RTT variance — aggressive client timeouts during slow start cause spurious retries that multiply load (pair with circuit breakers at the service layer).
SYN floods — half-open connection tables are a classic DoS vector; SYN cookies and SYN proxies at the load balancer mitigate.
RPC providers — blockchain JSON-RPC over HTTPS inherits all of this; regional latency is often TCP RTT + TLS + server queue, not "Solana being slow."
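For the Nagle item, the switch is a single socket option. A minimal sketch using Python's standard socket module follows; the helper name is made up, and whether disabling Nagle helps depends on how the application batches its writes.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Read the option back; a nonzero value means Nagle is off."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The option can be set before connecting, so a connection pool can apply it once when it creates each socket.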

Practical checklist

Measure RTT and retransmit rate before blaming application code.
Reuse connections; avoid per-request TCP+TLS setup on hot paths.
Size socket buffers for your BDP on high-throughput bulk transfers.
Prefer QUIC/HTTP/3 at the CDN edge when head-of-line blocking hurts.
Use UDP only with eyes open — you inherit congestion control responsibility.
Correlate kernel TCP metrics with application latency percentiles.