How Load Balancing and Reverse Proxies Work

Reverse proxy vs forward proxy

A forward proxy sits in front of clients — think corporate HTTP proxy or VPN egress. Clients know about it and send traffic through it to reach the internet.

A reverse proxy sits in front of servers. Clients connect to api.example.com and have no idea whether one Node process or forty containers answer behind the hostname. The reverse proxy terminates the client connection (often including TLS), inspects the request, and opens a separate connection to an upstream backend.

That separation matters for security and operations: backends can live on private networks, rotate without DNS changes, and fail individually while the public endpoint stays up. Nginx on this VPS is a reverse proxy — it serves static files and forwards API calls to localhost services the internet never touches directly.
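
Because the proxy opens its own connection to the backend, it has to decide which request headers to carry across. The sketch below is a minimal illustration of that step, not a description of what nginx does internally. The function name and the choice to append to an existing X-Forwarded-For chain are assumptions made for the example; an edge proxy that does not trust its clients would usually overwrite that header instead.

```python
# The classic HTTP/1.1 hop-by-hop headers: they describe one connection only.
# Note: a proxy that supports WebSockets handles "upgrade" specially instead of dropping it.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def build_upstream_headers(client_headers, client_ip, scheme):
    """Return the headers a reverse proxy might send on its own connection to a backend."""
    headers = {name.lower(): value for name, value in client_headers.items()}
    # Any header named inside Connection is hop-by-hop as well.
    named = {t.strip().lower() for t in headers.get("connection", "").split(",") if t.strip()}
    upstream = {
        name: value
        for name, value in headers.items()
        if name not in HOP_BY_HOP and name not in named
    }
    # Record the real client, since the backend only sees the proxy's address.
    prior = headers.get("x-forwarded-for")
    upstream["x-forwarded-for"] = f"{prior}, {client_ip}" if prior else client_ip
    upstream["x-forwarded-proto"] = scheme
    return upstream
```

Backends that log or rate-limit by client address need to read X-Forwarded-For, and should only trust it when the request arrived from the proxy.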

Layer 4 vs layer 7 load balancing

Load balancers operate at different depths in the network stack:

Layer 4 (transport) — Routes by IP address and TCP/UDP port. Fast and protocol-agnostic: the balancer does not parse HTTP headers. Useful for raw TCP services, game servers, or database read replicas where every byte looks opaque.
Layer 7 (application) — Routes by HTTP properties: URL path, Host header, cookies, or JWT claims. Slower per packet but far more expressive — send /api/ to microservices and /static/ to a CDN, or route WebSocket upgrades to dedicated nodes.

Cloud providers often expose both: an L4 network load balancer for low-latency TCP passthrough, and an L7 application load balancer for path-based rules. Hybrid setups terminate TLS at L7, then forward plain HTTP to backends on a trusted internal VLAN — a pattern that simplifies certificate management but requires strict network segmentation so unencrypted hops never cross hostile networks.
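
The difference between the two layers comes down to what information the routing decision may use. The following sketch contrasts them under simplifying assumptions: the pool names, addresses, and route table are hypothetical, and a real L4 balancer works on packets and connection tracking rather than Python strings.

```python
import hashlib

# Hypothetical pools, for illustration only.
L4_POOL = ["10.0.0.11:5432", "10.0.0.12:5432"]
L7_ROUTES = [("/api/", "api-pool"), ("/static/", "cdn-origin")]


def pick_l4(src_ip, src_port, dst_ip, dst_port, pool):
    """Layer 4: decide from the connection tuple alone; the payload is never parsed."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def pick_l7(path, headers, default="web-pool"):
    """Layer 7: decide from parsed HTTP properties such as path and headers."""
    if headers.get("upgrade", "").lower() == "websocket":
        return "websocket-pool"
    for prefix, pool in L7_ROUTES:
        if path.startswith(prefix):
            return pool
    return default
```

Note that pick_l4 would return the same backend for /api/ and /static/ requests on the same connection, because it never sees the path.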

Balancing algorithms

Once a balancer knows the healthy upstream pool, it must pick one member per request (or per connection). Common strategies:

Round robin — Rotate through backends in order. Simple and fair when every server has similar capacity and requests have similar cost.
Least connections — Send the next request to whichever backend has the fewest active connections. Better when some requests hold connections longer (file uploads, long-polling, WebSockets).
Weighted round robin — Bigger machines get higher weights. Useful during gradual rollouts when new hardware joins the pool at half capacity.
Consistent hashing — Map a key (client IP, session cookie, user ID) to a backend via a hash ring. The same key usually lands on the same server, which helps in-memory caches — but uneven key distribution can hot-spot one node.
Random / power of two choices — Pick two backends at random and send to the less loaded. Surprisingly effective at scale with minimal coordination.
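
Several of the strategies above fit in a few lines each. This is a rough sketch that assumes the balancer already tracks active connection counts per backend in a dictionary; the names are illustrative, and the weighted expansion is deliberately naive (production balancers usually interleave weighted picks more smoothly).

```python
import itertools
import random


class RoundRobin:
    """Rotate through backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


def weighted_order(weights):
    """Expand {'big': 2, 'small': 1} into a rotation where 'big' appears twice."""
    return [name for name, weight in weights.items() for _ in range(weight)]


def least_connections(active):
    """Pick the backend with the fewest active connections."""
    return min(active, key=active.get)


def power_of_two(active, rng=random):
    """Sample two distinct backends at random and keep the less loaded one."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b
```

A weighted round robin can be built by passing weighted_order(...) to RoundRobin. Power of two choices needs at least two backends in the pool.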

No algorithm fixes an overloaded pool. If every backend is at 95% CPU, smarter routing only rearranges the queue. Capacity planning and rate limiting at the edge remain necessary.
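
One common way to rate limit at the edge is a token bucket, shown here as an illustrative sketch. The text does not prescribe a specific algorithm, so treat the choice of token bucket, the class name, and the parameters as assumptions. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would typically answer 429 here
```

An edge proxy would normally keep one bucket per client key (IP address or API token) and reject requests when allow() returns False.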

Health checks and graceful draining

Balancers only send traffic to backends that pass health checks. An active check might HTTP GET /healthz every five seconds; a passive check watches real traffic for elevated 5xx rates or slow responses.
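
Active checks are usually combined with thresholds, so that a single failed probe does not eject a backend and a single success does not readmit it. The sketch below assumes "rise" and "fall" counters of consecutive results; the names and default values are illustrative, not taken from any particular balancer.

```python
class HealthTracker:
    """Track one backend's health from a stream of probe results."""

    def __init__(self, rise=2, fall=3, healthy=True):
        self.rise = rise      # consecutive successes needed to become healthy
        self.fall = fall      # consecutive failures needed to become unhealthy
        self.healthy = healthy
        self._successes = 0
        self._failures = 0

    def record(self, ok):
        """Feed one probe result; return the backend's current health."""
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.rise:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.fall:
                self.healthy = False
        return self.healthy
```

The same tracker can serve a passive check: instead of probe results, feed it whether each real response was a 5xx or exceeded a latency budget.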

When you deploy or patch a server, you do not want it yanked mid-request. Connection draining (also called deregistration delay) marks a backend unhealthy for new connections while existing ones finish — typically 30–300 seconds depending on your longest request. Rolling deploys depend on this: take one instance out, wait for drain, update, rejoin the pool, repeat.
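
The draining behaviour can be modelled as a flag plus an in-flight counter. This is a simplified sketch with hypothetical names; real balancers track connections rather than a plain integer, and the timeout value is whatever your longest request requires.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0

    def accepts_new(self):
        return not self.draining

    def start_request(self):
        if self.draining:
            raise RuntimeError(f"{self.name} is draining")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1

    def start_drain(self):
        self.draining = True  # existing requests keep running

    def drained(self):
        return self.draining and self.in_flight == 0


def safe_to_stop(backend, drain_started, now, timeout):
    """Stop when all requests finished, or when the drain timeout has expired."""
    return backend.drained() or (now - drain_started) >= timeout
```

A rolling deploy would call start_drain(), poll safe_to_stop(), update the instance, and then put a fresh Backend into the pool.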

Shallow health checks produce false positives: a backend that returns 200 on / but cannot reach its database looks healthy until user traffic fails. Deep checks (verify DB connectivity, disk space, dependency latency) catch those cases at the cost of more probe traffic. Overly sensitive checks cause the opposite problem, flapping, where a backend bounces in and out of the pool on transient failures.
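
A deep check can be as simple as running a handful of dependency probes and reporting 503 if any of them fails. The sketch below assumes each probe is a zero-argument callable returning True or False; the function name and the probe names in the usage note are hypothetical.

```python
def deep_health(checks):
    """Run every dependency probe; return (http_status, per-dependency results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that raises (timeout, refused connection) counts as a failure.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

A /healthz handler might call deep_health({"db": ping_db, "disk": has_free_space}) and return the status code with the results as a JSON body. Keep each probe cheap and time-bounded, since it runs on every health check interval.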

TLS termination and the trust boundary

Terminating TLS at the load balancer means the proxy holds the certificate private key and decrypts client traffic. Backends see plain HTTP on port 80 inside the data center. Benefits:

Centralized cert renewal (Let's Encrypt hooks in one place instead of N app servers).
Hardware-accelerated crypto on dedicated appliances.
Uniform HTTP/2 or HTTP/3 support even if legacy app servers only speak HTTP/1.1.

The trade-off is trust: anyone who can sniff the internal network between proxy and backend sees payloads. For regulated data, use TLS re-encryption (proxy terminates client TLS, then initiates a fresh TLS connection to backends) or mutual TLS between tiers. Wallet RPC providers follow similar patterns — your browser's TLS session ends at their edge; their infrastructure routes to validator nodes you never contact directly.
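
For the re-encryption case, the proxy acts as a TLS client toward its backends. Using Python's standard ssl module, a context for that upstream hop might look like the sketch below. The function name and the file path parameters are hypothetical; passing a client certificate is what turns plain re-encryption into mutual TLS, provided the backend is configured to require it.

```python
import ssl


def upstream_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Build a client-side TLS context for the proxy-to-backend connection."""
    # SERVER_AUTH means "verify the server": certificate and hostname checks are on.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        # Present our own certificate so the backend can authenticate the proxy.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

With an internal certificate authority, ca_file would point at that CA's bundle so the proxy trusts only backends it issued certificates for.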

Sticky sessions and state

Stateless HTTP is the ideal: any server can answer any request because session data lives in Redis or a database. Reality often includes in-process caches or WebSocket rooms tied to one machine.

Session affinity (sticky sessions) forces the same client to the same backend — via a cookie the balancer sets, or by hashing the client IP. Sticky sessions simplify legacy apps but complicate deploys: draining one sticky backend logs out users mapped to it. Prefer external session stores when you can; use stickiness as a migration bridge, not a permanent architecture.
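
Hash-based affinity is often implemented with a consistent hash ring, so that removing one backend only remaps the clients that were pinned to it. The following is a minimal sketch under stated assumptions: SHA-256 as the hash, 100 virtual nodes per backend, and illustrative class and method names.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node, vnodes=100):
        # Virtual nodes spread each backend around the ring to even out load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        """Return the backend owning the first ring point at or after hash(key)."""
        if not self._ring:
            raise LookupError("no backends in ring")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The key can be a client IP or a session cookie value. Clients mapped to a removed backend still lose any state held in that process, which is why an external session store remains the better long-term fix.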

Long-lived connections (SSE, WebSockets, gRPC streams) are inherently sticky: the TCP connection stays on one backend until it closes. L7 balancers must support connection upgrade headers and often need longer idle timeouts than the default 60-second HTTP keep-alive settings.
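
Recognizing an upgrade request is a matter of checking two headers. The sketch below shows that check and one way a balancer could choose an idle timeout from it. The 60-second default comes from the text above; the one-hour value for long-lived connections is an assumption for illustration.

```python
def is_websocket_upgrade(headers):
    """True when the request asks to switch this connection to WebSocket."""
    h = {name.lower(): value for name, value in headers.items()}
    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    return "upgrade" in tokens and h.get("upgrade", "").lower() == "websocket"


def idle_timeout_seconds(headers, default=60, long_lived=3600):
    """Pick a longer idle timeout for connections that are expected to stay open."""
    return long_lived if is_websocket_upgrade(headers) else default
```

Applications on long-lived connections usually also send periodic pings, so that an idle but healthy connection is not cut by an intermediate timeout.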

How DNS and load balancing fit together

DNS can return multiple A records for one hostname; clients pick one (often the first) and retry others on failure. That is a crude form of load distribution at resolution time — but TTL caching means clients stick to one IP for minutes.
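
The client-side behaviour described here, try one address and fall back to the next, can be sketched as follows. The function names are illustrative; resolve_all uses the standard socket.getaddrinfo call and needs a working resolver, while connect_with_fallback takes the connect function as a parameter so it can be exercised without a network.

```python
import socket


def resolve_all(host, port):
    """Return every address the resolver offers for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def connect_with_fallback(addresses, connect):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect(address)
        except OSError as exc:
            last_error = exc  # remember the failure and move to the next record
    raise ConnectionError(f"all {len(addresses)} addresses failed") from last_error
```

In real use, connect could be something like lambda ip: socket.create_connection((ip, 443), timeout=3). Nothing here spreads load evenly; it only gives the client a way around a dead address.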

Modern setups usually return a single stable record: an anycast IP, or a CNAME pointing at a cloud load balancer VIP. The balancer's IP rarely changes; scaling happens by adding backends behind it, not by editing DNS. When you do change DNS, remember propagation delay; see our DNS explainer for TTL mechanics.

CDNs add another hop: edge PoPs cache static assets close to users while dynamic API calls pass through to origin load balancers. HTTP cache headers tell the CDN what it may store (see HTTP caching); the origin load balancer never sees cacheable image requests that the edge has already satisfied.
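
How an edge decides what it may keep can be approximated by reading the Cache-Control response header. This is a deliberately conservative sketch of shared-cache rules, with hypothetical function names. It treats no-cache as "do not serve without asking origin" and ignores heuristic freshness, both simplifications of what real CDNs do.

```python
def parse_cache_control(value):
    """Split 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value):
    """Seconds a shared cache may serve the response without contacting origin."""
    d = parse_cache_control(value)
    if "no-store" in d or "private" in d or "no-cache" in d:
        return 0
    # s-maxage applies to shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if d.get(name) and d[name].isdigit():
            return int(d[name])
    return 0
```

A response with "public, max-age=60, s-maxage=600" may sit at the edge for ten minutes even though browsers refresh it after one.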

Failure modes operators see in production

Thundering herd — Health check passes on all nodes simultaneously after an outage; every backend gets slammed at once. Mitigate with jittered probe intervals and slow-start on rejoin.
Retry storms — Client retries × balancer retries × backend timeouts multiply load. Cap retries at the edge and use exponential backoff — the same discipline as RPC 429 handling.
Header size limits — Enormous cookies or JWTs exceed proxy buffer defaults; requests fail with 400/502 before reaching the app. Tune large_client_header_buffers (nginx) or equivalent.
WebSocket stickiness through wrong layer — An L4 balancer hashing only on source IP breaks when many users share one NAT (mobile carrier). L7 cookie-based affinity is more reliable.
CORS preflight doubling — OPTIONS requests hit the balancer too; ensure preflight routes reach backends or are answered at the edge consistently — see CORS explained.
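
The retry discipline mentioned above, a hard cap plus exponential backoff, is sketched below with random jitter added so that many clients do not retry in lockstep. The function name, the defaults, and the choice to retry only on ConnectionError are assumptions for the example; sleep and rng are injectable so the logic can be tested without waiting.

```python
import random
import time


def call_with_retries(operation, max_attempts=3, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=None):
    """Call operation(); on ConnectionError retry with capped, jittered backoff."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            # Full jitter: wait a random time up to the exponential ceiling.
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rng.uniform(0, ceiling))
```

If both the client and the balancer retry, the worst-case load multiplies, so it is usually safer to give only one layer a retry budget.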

What to monitor

If you operate a balanced service:

Per-backend request rate, error rate, and p95 latency — one sick node should trip alerts before users notice.
Active connection count vs configured max — exhaustion looks like random timeouts, not clean 503s.
SSL handshake errors and cert expiry (even when terminated at the edge).
Drain queue depth during deploys — if drains never complete, your longest request exceeds the drain timeout.
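
The per-backend signals in the first item can be computed from raw request samples. The sketch below assumes each sample is a (status_code, latency_ms) pair and uses the nearest-rank method for p95; the thresholds are placeholders, not recommendations.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def error_rate(statuses):
    """Fraction of responses that were 5xx."""
    return sum(1 for s in statuses if s >= 500) / len(statuses)


def sick_backends(stats, max_error_rate=0.05, max_p95_ms=500):
    """stats maps backend name to a list of (status_code, latency_ms) samples."""
    flagged = []
    for name, samples in stats.items():
        statuses = [status for status, _ in samples]
        latencies = [ms for _, ms in samples]
        if error_rate(statuses) > max_error_rate or p95(latencies) > max_p95_ms:
            flagged.append(name)
    return flagged
```

Computing these per backend rather than per pool is what lets one sick node stand out instead of being averaged away.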
