
Graceful shutdown explained

Your platform rolls out a new container image. As Kubernetes replaces each pod running the old version, it sends SIGTERM, starts a countdown (30 seconds by default), then sends SIGKILL if the process is still alive. Without a shutdown plan, the load balancer may still route traffic to a pod that has already closed its listener, and users see 502 Bad Gateway mid-checkout.

Graceful shutdown is the sequence that stops accepting new work, finishes or cancels in-flight requests within a deadline, flushes logs and metrics, closes database pools, and exits cleanly. It is the bridge between zero-downtime deploy strategies and what actually happens inside each process. This guide covers signals and deadlines, connection draining with load balancers, Kubernetes lifecycle hooks, long-running job handling, failure modes, and a checklist you can apply to any HTTP service, worker, or queue consumer.

What graceful shutdown is — and is not

Graceful shutdown is an ordered teardown, not merely catching a signal and calling process.exit(0). A correct sequence has four phases:

Stop intake — close the HTTP listener, pause queue consumption, or flip an internal "shutting down" flag so new work is rejected with 503 Service Unavailable or equivalent.
Drain in-flight work — wait for active HTTP requests, RPC calls, or batch jobs to complete, subject to a hard deadline.
Release resources — commit or roll back open transactions, close connection pools, flush buffered logs and metrics, and stop background timers.
Exit — terminate with a zero exit code if drain succeeded, non-zero if forced.
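The four phases above can be sketched as a small coordinator. This is a minimal Python illustration under stated assumptions, not a framework API: the ShutdownCoordinator class, its method names, and the in-flight counter are hypothetical stand-ins for whatever your server exposes.

```python
import threading


class ShutdownCoordinator:
    """Hypothetical helper that walks the four phases in order."""

    def __init__(self, drain_deadline_s: float):
        self.drain_deadline_s = drain_deadline_s
        self.accepting = True      # phase 1 flips this to False
        self._in_flight = 0
        self._cond = threading.Condition()
        self.released = []         # names of resources released, in order

    def begin_request(self) -> bool:
        """Return False (caller should answer 503) once intake has stopped."""
        with self._cond:
            if not self.accepting:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def shutdown(self, release_callbacks) -> int:
        """Stop intake, drain, release resources, and return the exit code."""
        with self._cond:
            # Phase 1: stop intake.
            self.accepting = False
            # Phase 2: drain in-flight work, bounded by the deadline.
            drained = self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=self.drain_deadline_s
            )
        # Phase 3: release resources even if the drain was forced.
        for name, callback in release_callbacks:
            callback()
            self.released.append(name)
        # Phase 4: zero if the drain completed, non-zero if it was forced.
        return 0 if drained else 1
```

A real process would pass the returned value to sys.exit() so the orchestrator can tell a clean drain from a forced one.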

It is not the same as a circuit breaker (which protects callers from a failing dependency) or backpressure (which slows producers when consumers lag). Those patterns manage overload during normal operation. Graceful shutdown manages the controlled end of operation — usually because the orchestrator is replacing your instance.

SIGTERM, SIGKILL, and shutdown deadlines

On Linux, process managers and container runtimes typically deliver SIGTERM first — a polite request to exit. If the process ignores it past a platform-specific grace period, SIGKILL follows. SIGKILL cannot be caught; the kernel terminates the process immediately. Any in-flight database write, payment capture, or file upload in progress at that instant may be left half-done.

Common grace periods:

Kubernetes — terminationGracePeriodSeconds on the Pod spec (default 30s). The kubelet sends SIGTERM, waits, then SIGKILL.
systemd — TimeoutStopSec in the unit file.
AWS ECS / Fargate — stopTimeout in the container definition (default 30s, up to 120s on Fargate).
Heroku / Fly.io — platform-defined windows: Heroku allows 30s after SIGTERM; Fly.io exposes a configurable kill_timeout.

Your application-level shutdown timeout must be shorter than the platform grace period — leave headroom for preStop hooks, load balancer deregistration propagation, and final cleanup. A practical rule: if the pod gets 30 seconds, aim to finish draining by second 20 and spend the remainder on pool teardown and log flush.
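The budgeting rule above is simple arithmetic, sketched here in Python. The function name and the 10-second default cleanup reserve are illustrative assumptions (the reserve mirrors the "finish draining by second 20 of 30" example), not values any platform prescribes.

```python
def drain_budget_seconds(grace_period_s: float,
                         pre_stop_s: float = 0.0,
                         cleanup_reserve_s: float = 10.0) -> float:
    """Time the application may spend draining before cleanup must start.

    A Kubernetes preStop hook runs inside the same grace period, so its
    duration is subtracted along with the time reserved for pool teardown
    and log flush.
    """
    budget = grace_period_s - pre_stop_s - cleanup_reserve_s
    if budget <= 0:
        raise ValueError(
            "grace period too short: no time left to drain in-flight work"
        )
    return budget
```

For a 30-second grace period with no preStop hook this yields 20 seconds of drain time; adding a 10-second preStop sleep leaves only 10, which is a hint to raise the grace period rather than squeeze the drain.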

Handling signals in application code

Register handlers for SIGTERM and SIGINT (Ctrl+C in development). The handler should be idempotent, because a process can receive SIGTERM more than once. Avoid heavy work inside the signal handler itself; set a flag or enqueue shutdown on the main event loop or worker thread.

Framework support varies. In Node.js, server.close() stops accepting new TCP connections while active requests finish; idle keep-alive connections can keep it from completing, so pair it with server.closeIdleConnections() where available. In Go, call http.Server.Shutdown(ctx) with a context deadline. In Spring Boot, enable graceful shutdown via server.shutdown=graceful and tune spring.lifecycle.timeout-per-shutdown-phase.

Connection draining and load balancers

Stopping your HTTP listener is necessary but not sufficient. External load balancers maintain their own connection tables and health-check state. There is a window — often several seconds — between your pod marking itself not-ready and the balancer stopping new requests to that backend.

Connection draining (also called deregistration delay) tells the balancer to stop sending new connections to an instance while allowing existing keep-alive connections to finish. AWS ALB's "target deregistration delay," the drain parameter for upstream servers in NGINX Plus, and Envoy's drain manager all implement variants of this idea. (Open-source nginx's proxy_next_upstream is related but different: it retries a failed request on another upstream rather than draining one.)

A robust deploy sequence looks like:

Pod receives SIGTERM (or preStop hook fires first — see below).
Readiness probe fails — Kubernetes removes the pod from Service endpoints (endpoint removal also begins on its own once the pod is marked Terminating).
Load balancer completes deregistration delay while existing requests drain.
Application server.close() / Shutdown() waits for in-flight handlers.
Pools and clients close; process exits before SIGKILL.

Misaligned timing is a common cause of deploy-time 502s: readiness flips too late (traffic still arrives after the listener closes), or the listener closes before the balancer has finished deregistering the pod (the balancer sends traffic to a pod that now refuses connections). Tune probes and delays together, not in isolation.

Kubernetes: preStop hooks and probe design

Kubernetes runs the preStop lifecycle hook before sending SIGTERM, and the hook's run time counts against terminationGracePeriodSeconds. A common pattern sleeps for 5–15 seconds in preStop to let endpoint removal propagate before the application begins draining:

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]

This is a blunt instrument but effective when balancer propagation is slow. Better: preStop calls an admin endpoint that atomically flips readiness and starts drain, then blocks until complete or timeout.

Distinguish liveness from readiness during shutdown:

Readiness should fail as soon as drain begins — remove the pod from load-balanced traffic.
Liveness should remain passing while the process is actively draining; failing liveness mid-drain triggers a restart that aborts in-flight work.
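The split between the two probes can be expressed as two small handlers reading shared state. This is a sketch under assumptions: HealthState and its deadlocked flag are hypothetical, and a real service would wire these methods to its /ready and /live routes (paths also assumed).

```python
from http import HTTPStatus


class HealthState:
    """Hypothetical holder for the flags the probe handlers read."""

    def __init__(self):
        self.draining = False
        self.deadlocked = False  # set only if the process is truly stuck

    def start_drain(self) -> None:
        self.draining = True

    def readiness(self) -> HTTPStatus:
        # Fail readiness as soon as drain begins so traffic is routed away.
        if self.draining:
            return HTTPStatus.SERVICE_UNAVAILABLE
        return HTTPStatus.OK

    def liveness(self) -> HTTPStatus:
        # Draining is healthy behavior; only a stuck process fails liveness.
        if self.deadlocked:
            return HTTPStatus.SERVICE_UNAVAILABLE
        return HTTPStatus.OK
```

Keeping the two answers independent is the design choice that matters: if liveness were derived from the draining flag, the kubelet would restart the container in the middle of the drain.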

Increase terminationGracePeriodSeconds for workers that process long jobs — but only if jobs can checkpoint or safely resume elsewhere. Extending grace without idempotent job design merely delays SIGKILL; it does not guarantee completion.

Workers, queues, and long-running jobs

HTTP servers are the easy case. Queue consumers and batch workers need explicit "finish current message, then stop" semantics:

Stop polling — cancel the consumer subscription so no new messages are leased.
Complete or nack in-flight — acknowledge finished work; if the shutdown deadline approaches, return unprocessed messages to the queue (nack them or shorten their visibility timeout so another consumer can pick them up).
Idempotency — assume redelivery after partial processing. Pair with idempotency keys so a message processed twice does not double-charge or double-ship.
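The three rules above fit in one consumer loop. The sketch below is broker-agnostic and uses hypothetical names (consume, inbox, seen_keys); a deque stands in for messages already leased to this worker, and an in-memory set stands in for what must be a durable idempotency store in production.

```python
import threading
from collections import deque


def consume(inbox: deque, stop: threading.Event, handle, seen_keys: set) -> list:
    """Process leased messages until stop is set; return the ones to nack.

    inbox holds (idempotency_key, payload) tuples already leased to this worker.
    """
    # The stop flag is checked between messages, never mid-message, which
    # gives "finish the current message, then stop" semantics.
    while inbox and not stop.is_set():
        key, payload = inbox.popleft()
        if key in seen_keys:
            continue  # redelivered message: skip the side effect
        handle(payload)
        seen_keys.add(key)  # in production this record must be durable
    # Anything still leased goes back to the queue for another worker.
    nacked = list(inbox)
    inbox.clear()
    return nacked
```

The caller is expected to hand the returned list to the broker's nack or requeue call, whatever form that takes in the client library in use.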

For jobs longer than the platform grace period, use runner features that stop intake and let current work finish or be retried (Sidekiq quiet mode, Celery warm shutdown, SQS visibility timeout extension), add checkpointing so a job can resume where it left off, or move work to a durable queue before shutdown begins. Never rely on infinite grace: orchestrators will kill you.

WebSocket and SSE connections require explicit close frames or client reconnect logic. See WebSockets and server-sent events for session stickiness implications during rolling restarts.

Database pools, caches, and side effects

After HTTP drain completes, close downstream resources in dependency order:

Stop accepting new transactions on application threads.
Wait for in-flight queries against the connection pool to return connections.
Call pool close() / end() so idle connections release server-side slots.
Flush metrics exporters and structured log buffers — SIGKILL drops anything still in memory.
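One way to get dependency-ordered teardown without hand-written sequencing is to register each resource as it is created and unwind in reverse. The sketch below uses contextlib.ExitStack from the Python standard library; the Resource class and the three resource names are hypothetical placeholders.

```python
from contextlib import ExitStack

closed = []  # records close order for illustration


class Resource:
    """Hypothetical stand-in for a pool, HTTP client, or log exporter."""

    def __init__(self, name: str):
        self.name = name

    def close(self) -> None:
        closed.append(self.name)


def run_with_resources(serve) -> None:
    # ExitStack runs callbacks in reverse registration order, so resources
    # registered last (closest to request handling) are closed first.
    with ExitStack() as stack:
        log_exporter = Resource("log_exporter")
        stack.callback(log_exporter.close)  # closed last: sees all other teardown
        db_pool = Resource("db_pool")
        stack.callback(db_pool.close)
        http_client = Resource("http_client")
        stack.callback(http_client.close)   # closed first
        serve()
```

The callbacks also run when serve() raises, so a crash during drain still closes the pool and flushes logs before the exception propagates.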

Distributed locks and leader-election leases should be released on shutdown if the process holds them; otherwise failover waits for lease TTL. See distributed locking for fencing-token patterns when stale holders are a risk.

Failure modes and observability

Watch these during deploys and scale-in events:

502/503 spikes — mis-timed listener close vs balancer deregistration.
Duplicate processing — queue message redelivered after partial handling.
Connection pool exhaustion — old pods hold DB connections until SIGKILL while new pods scale up.
Stuck shutdown — a single hung request blocks server.close() forever; always use a deadline and force-close stragglers.
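Zombie processes and lost signals — a shell wrapper running as PID 1 typically does not forward SIGTERM to the application, so the app never starts draining, and orphaned child processes may go unreaped; use exec in the entrypoint or a minimal init such as tini.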
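The "stuck shutdown" case in the list above is the one most directly fixed in code: wait with a deadline, then cancel whatever is left. A minimal asyncio sketch, assuming in-flight requests are tracked as asyncio.Task objects (the drain function name is hypothetical):

```python
import asyncio


async def drain(tasks, deadline_s: float) -> int:
    """Wait for in-flight tasks up to deadline_s, then cancel stragglers.

    Returns the number of tasks that had to be cancelled.
    """
    if not tasks:
        return 0
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()
    # Let cancellations propagate so handlers can run their cleanup code.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)
```

The returned count is worth logging: a non-zero value during a routine deploy means some request outlived the drain budget and was cut off.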

Instrument shutdown as a first-class event: log "drain started," active connection count, drain duration, forced cancellations, and exit code. Correlate with deploy timestamps in your observability stack to catch regressions when probe timings change.
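A single structured log line per shutdown is usually enough to correlate with deploys. The sketch below builds one as JSON; the function name and every field name are illustrative assumptions, not a schema any logging system requires.

```python
import json


def shutdown_log_record(active_connections: int, drain_started_s: float,
                        drain_finished_s: float, forced_cancellations: int,
                        exit_code: int) -> str:
    """Build one JSON log line summarizing a shutdown."""
    return json.dumps({
        "event": "shutdown_complete",
        "active_connections_at_start": active_connections,
        "drain_duration_s": round(drain_finished_s - drain_started_s, 3),
        "forced_cancellations": forced_cancellations,
        "exit_code": exit_code,
    }, sort_keys=True)
```

Write the line synchronously and flush it before exit; a record still sitting in a buffer when SIGKILL arrives is lost along with everything else in memory.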

Production checklist

Register SIGTERM/SIGINT handlers that trigger a single, idempotent shutdown routine.
Fail readiness first, then stop accepting new connections once traffic has been routed away.
Set an application drain deadline shorter than the platform grace period.
Align preStop sleep, readiness failure, and load balancer deregistration delay.
Keep liveness passing during drain; only readiness should fail early.
Pause queue consumers; ack or nack in-flight messages before exit.
Ensure handlers are idempotent — deploys will cause redelivery.
Close connection pools and release distributed locks after drain.
Flush logs and metrics buffers before final exit.
Load-test rolling deploys; alert on 502 rate correlated with release events.

Key takeaways

SIGTERM is a deadline, not a suggestion — SIGKILL follows if you do not exit in time.
Stop intake before drain — fail readiness and close the listener, then wait for in-flight work.
Load balancers lag endpoint updates — preStop and deregistration delay exist to cover propagation time.
Queue workers need explicit stop semantics — pause consumption and design for redelivery.
Measure deploy-time errors — graceful shutdown is proven by flat 502 rates during rolls, not by code existing.