Graceful shutdown explained
Your platform rolls out a new container image. As each pod on the old version is replaced, Kubernetes sends it
SIGTERM, starts a countdown (30 seconds by default), then sends SIGKILL if the
process is still alive. Without a shutdown plan, the load balancer may still route
traffic to a pod that has already closed its listener, and users see 502 Bad Gateway
mid-checkout. Graceful shutdown is the sequence that stops accepting new work,
finishes or cancels in-flight requests within a deadline, flushes logs and metrics,
closes database pools, and exits cleanly. It is the bridge between zero-downtime
deploy strategies and what actually happens inside each process. This guide covers
signals and deadlines, connection draining with load balancers, Kubernetes lifecycle
hooks, long-running job handling, failure modes, and a checklist you can apply to any
HTTP service, worker, or queue consumer.
What graceful shutdown is — and is not
Graceful shutdown is an ordered teardown, not merely catching a signal and
calling process.exit(0). A correct sequence has four phases:
- Stop intake: close the HTTP listener, pause queue consumption, or flip an internal "shutting down" flag so new work is rejected with 503 Service Unavailable or equivalent.
- Drain in-flight work: wait for active HTTP requests, RPC calls, or batch jobs to complete, subject to a hard deadline.
- Release resources: commit or roll back open transactions, close connection pools, flush buffered logs and metrics, and stop background timers.
- Exit: terminate with a zero exit code if drain succeeded, non-zero if forced.
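The four phases can be sketched as a small driver function. This is a minimal Python illustration, not a framework API: run_shutdown and its three callables are hypothetical names, each standing in for whatever your server, consumer, or pool actually exposes.

```python
import logging
from typing import Callable

log = logging.getLogger("shutdown")


# Hypothetical helper for illustration; not part of any library.
def run_shutdown(
    stop_intake: Callable[[], None],
    drain: Callable[[], bool],
    release: Callable[[], None],
) -> int:
    """Run the teardown phases in order and return the process exit code.

    `drain` should return True if all in-flight work finished before its
    deadline and False if stragglers had to be cancelled.
    """
    stop_intake()                      # 1. stop intake: no new work accepted
    drained = False
    try:
        drained = drain()              # 2. drain in-flight work under a deadline
    except Exception:
        log.exception("drain failed")  # a crashed drain counts as forced
    finally:
        release()                      # 3. release resources even if drain failed
    return 0 if drained else 1         # 4. exit code: zero only for a clean drain
```

A caller would finish with sys.exit(run_shutdown(...)). Running release() in a finally block keeps pools closing and logs flushing even when the drain step raises.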
It is not the same as a circuit breaker (which protects callers from a failing dependency) or backpressure (which slows producers when consumers lag). Those patterns manage overload during normal operation. Graceful shutdown manages the controlled end of operation — usually because the orchestrator is replacing your instance.
SIGTERM, SIGKILL, and shutdown deadlines
On Linux, process managers and container runtimes typically deliver
SIGTERM first — a polite request to exit. If the process
ignores it past a platform-specific grace period, SIGKILL
follows. SIGKILL cannot be caught; the kernel terminates the
process immediately. Any database write, payment capture, or
file upload in progress at that instant may be left half-done.
Common grace periods:
- Kubernetes: terminationGracePeriodSeconds on the Pod spec (default 30s). The kubelet sends SIGTERM, waits, then sends SIGKILL.
- systemd: TimeoutStopSec in the unit file (default 90s).
- AWS ECS / Fargate: stopTimeout on the container definition in the task definition (Fargate allows up to 120s).
- Heroku / Fly.io: platform-defined windows; Heroku gives dynos 30s after SIGTERM, and Fly.io sets the window with kill_timeout in fly.toml.
Your application-level shutdown timeout must be shorter than the platform grace period — leave headroom for preStop hooks, load balancer deregistration propagation, and final cleanup. A practical rule: if the pod gets 30 seconds, aim to finish draining by second 20 and spend the remainder on pool teardown and log flush.
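The headroom rule reduces to simple subtraction. The sketch below assumes Kubernetes semantics, where the grace period countdown starts before the preStop hook runs, so preStop time is spent from the same budget. drain_budget and cleanup_reserve are illustrative names, and the 10-second reserve in the example calls is an assumption, not a platform default.

```python
# Illustrative helper; the 10s cleanup reserve used below is an assumption.
def drain_budget(grace_period: float, prestop: float, cleanup_reserve: float) -> float:
    """Seconds the app may spend draining after SIGTERM arrives.

    grace_period: platform grace period (e.g. terminationGracePeriodSeconds)
    prestop: time spent in the preStop hook, which counts against the grace period
    cleanup_reserve: time kept back for pool teardown and log flush
    """
    budget = grace_period - prestop - cleanup_reserve
    if budget <= 0:
        raise ValueError("grace period too short for preStop plus cleanup")
    return budget


print(drain_budget(30, prestop=0, cleanup_reserve=10))   # drain must end by second 20
print(drain_budget(30, prestop=10, cleanup_reserve=10))  # a 10s preStop sleep halves the drain window
```

If the subtraction goes non-positive, raise the grace period rather than shrinking the reserve to zero.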
Handling signals in application code
Register handlers for SIGTERM and SIGINT (Ctrl+C
in development). The handler should be idempotent — orchestrators may
deliver SIGTERM more than once. Avoid heavy work inside the signal handler
itself; set a flag or enqueue shutdown on the main event loop / worker
thread. In Node.js, use server.close() to stop accepting new
TCP connections while finishing active ones. In Go, call
http.Server.Shutdown(ctx) with a context deadline. In Java
Spring Boot, enable graceful shutdown via
server.shutdown=graceful and tune
spring.lifecycle.timeout-per-shutdown-phase.
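In Python's asyncio, the same advice (do little in the handler, hand shutdown to the main loop) maps onto loop.add_signal_handler, which invokes the callback from the event loop rather than inside the low-level signal handler. This is a Unix-only sketch; ShutdownSignal is a hypothetical helper, not part of any library.

```python
import asyncio
import signal
from typing import Optional


# Hypothetical helper for illustration; Unix only (add_signal_handler is unavailable on Windows).
class ShutdownSignal:
    """Turns SIGTERM/SIGINT into a one-shot asyncio.Event."""

    def __init__(self) -> None:
        self.event = asyncio.Event()
        self.first_signal: Optional[signal.Signals] = None
        self.repeats = 0

    def install(self, loop: asyncio.AbstractEventLoop) -> None:
        for sig in (signal.SIGTERM, signal.SIGINT):
            # The callback runs on the loop, so it may touch asyncio objects safely.
            loop.add_signal_handler(sig, self._on_signal, sig)

    def _on_signal(self, sig: signal.Signals) -> None:
        if self.event.is_set():
            self.repeats += 1  # idempotent: a repeated SIGTERM must not restart teardown
            return
        self.first_signal = sig
        self.event.set()       # cheap: only wakes whoever awaits shutdown


async def main() -> int:
    shutdown = ShutdownSignal()
    shutdown.install(asyncio.get_running_loop())
    # ... start HTTP server / consumers here ...
    await shutdown.event.wait()
    # ... run the ordered teardown exactly once here ...
    return 0
```

Start it with sys.exit(asyncio.run(main())); the teardown code after the await runs once no matter how many signals arrive.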
Connection draining and load balancers
Stopping your HTTP listener is necessary but not sufficient. External load balancers maintain their own connection tables and health-check state. There is a window — often several seconds — between your pod marking itself not-ready and the balancer stopping new requests to that backend.
Connection draining (also called deregistration delay)
tells the balancer to stop sending new connections to an instance
while allowing existing keep-alive connections to finish. AWS ALB
"target deregistration delay," nginx proxy_next_upstream with
health-based removal, and Envoy's drain manager all implement variants of
this idea.
A robust deploy sequence looks like:
- Pod receives SIGTERM (or preStop hook fires first — see below).
- Readiness probe fails — Kubernetes removes the pod from Service endpoints.
- Load balancer completes deregistration delay while existing requests drain.
- Application server.close() / Shutdown() waits for in-flight handlers.
- Pools and clients close; process exits before SIGKILL.
Misaligned timing is a common cause of deploy-time 502s: readiness flips too late, so traffic still arrives after the listener closes, or the listener closes immediately after readiness flips, before the balancer has finished deregistering the pod, so requests it is still routing hit a refused connection. Tune probes and delays together, not in isolation.
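The drain step (server.close() or Shutdown() waiting for handlers under a deadline) can be approximated framework-agnostically in asyncio. InFlightTracker is a hypothetical helper: it rejects new work once draining starts, waits up to a deadline, then cancels whatever is still running and reports how many tasks were forced, which is the number worth logging.

```python
import asyncio
from typing import Any, Coroutine


# Hypothetical helper for illustration; real servers expose their own drain API.
class InFlightTracker:
    """Tracks in-flight request tasks so shutdown can drain them under a deadline."""

    def __init__(self) -> None:
        self.accepting = True
        self.tasks: set = set()

    def submit(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        if not self.accepting:
            coro.close()  # never started; close it to avoid a "never awaited" warning
            raise RuntimeError("shutting down")  # an HTTP server would answer 503 here
        task = asyncio.create_task(coro)
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)  # forget finished work
        return task

    async def drain(self, deadline: float) -> int:
        """Stop intake, wait up to `deadline` seconds, cancel stragglers.

        Returns the number of tasks that had to be cancelled (0 means a clean drain).
        """
        self.accepting = False
        if not self.tasks:
            return 0
        _, pending = await asyncio.wait(set(self.tasks), timeout=deadline)
        for task in pending:
            task.cancel()
        # Let cancelled handlers run their except/finally blocks before returning.
        await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)
```

Cancelling stragglers rather than waiting indefinitely is what keeps a single hung request from pushing the process into SIGKILL.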
Kubernetes: preStop hooks and probe design
Kubernetes runs the preStop lifecycle hook before
sending SIGTERM. A common pattern sleeps for 5–15 seconds in preStop to
let endpoint removal propagate before the application begins draining:
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
This is a blunt instrument but effective when balancer propagation is slow. Better: preStop calls an admin endpoint that atomically flips readiness and starts drain, then blocks until complete or timeout.
Distinguish liveness from readiness during shutdown:
- Readiness should fail as soon as drain begins — remove the pod from load-balanced traffic.
- Liveness should remain passing while the process is actively draining; failing liveness mid-drain triggers a restart that aborts in-flight work.
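One way to keep the two probes apart is to derive both from a single drain flag. ProbeState below is a hypothetical sketch; wiring readiness() and liveness() to endpoint paths such as /readyz and /livez, and having a preStop admin endpoint call start_drain(), is left to your HTTP framework.

```python
import threading
from http import HTTPStatus


# Hypothetical helper for illustration; endpoint paths and wiring are up to your framework.
class ProbeState:
    """Readiness fails once drain starts; liveness keeps passing during drain."""

    def __init__(self) -> None:
        self._draining = threading.Event()  # thread-safe flag shared by probe handlers

    def start_drain(self) -> None:
        self._draining.set()  # idempotent: repeated calls change nothing

    def readiness(self) -> HTTPStatus:
        # Any status outside 200-399 fails an HTTP probe; after failureThreshold
        # consecutive failures the pod is removed from Service endpoints.
        return HTTPStatus.SERVICE_UNAVAILABLE if self._draining.is_set() else HTTPStatus.OK

    def liveness(self, healthy: bool = True) -> HTTPStatus:
        # Answers "is the process wedged?", never "is it shutting down?".
        return HTTPStatus.OK if healthy else HTTPStatus.SERVICE_UNAVAILABLE
```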
Increase terminationGracePeriodSeconds for workers that process
long jobs — but only if jobs can checkpoint or safely resume elsewhere.
Extending grace without idempotent job design merely delays SIGKILL; it
does not guarantee completion.
Workers, queues, and long-running jobs
HTTP servers are the easy case. Queue consumers and batch workers need explicit "finish current message, then stop" semantics:
- Stop polling: cancel the consumer subscription so no new messages are leased.
- Complete or nack in-flight: acknowledge finished work; if the shutdown deadline approaches, return unprocessed messages to the queue (nack them, or reset their visibility timeout to zero on SQS-style queues).
- Idempotency: assume redelivery after partial processing. Pair with idempotency keys so a message processed twice does not double-charge or double-ship.
For jobs longer than the platform grace period, combine checkpointing with the job runner's graceful-stop features (Sidekiq quiet mode, Celery warm shutdown, SQS visibility timeout extension), or move work to a durable queue before shutdown begins. Never rely on infinite grace: orchestrators will kill you.
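The three consumer rules can be sketched with an in-process queue.Queue standing in for the messages a broker has leased to this worker. Worker, handle, and requeue are hypothetical: requeue represents a nack or an SQS-style visibility reset, and the in-memory seen_ids set stands in for a durable idempotency-key store.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    id: str
    body: str


# Hypothetical consumer for illustration; not tied to any broker client library.
class Worker:
    """Stop polling, finish the current message, return the rest."""

    def __init__(
        self,
        inbox: "queue.Queue[Message]",
        handle: Callable[[Message], None],
        requeue: Callable[[Message], None],
    ) -> None:
        self.inbox = inbox          # stand-in for messages leased from the broker
        self.handle = handle
        self.requeue = requeue      # stand-in for nack / visibility reset
        self.stopping = threading.Event()
        self.seen_ids: set = set()  # stand-in for a durable idempotency-key store
        self.acked: list = []

    def run(self) -> None:
        while not self.stopping.is_set():
            try:
                msg = self.inbox.get(timeout=0.05)  # short poll so a stop is noticed quickly
            except queue.Empty:
                continue
            if msg.id not in self.seen_ids:  # redelivered duplicates are acked, not re-run
                self.handle(msg)
                self.seen_ids.add(msg.id)
            self.acked.append(msg.id)        # ack only after handling completed
        # Stopped: hand every message still leased locally back to the broker.
        while True:
            try:
                self.requeue(self.inbox.get_nowait())
            except queue.Empty:
                break
```

Recording the id after handle() returns leaves a small window where a crash causes reprocessing, which is why production systems typically store the idempotency key in the same transaction as the side effect.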
WebSocket connections need an explicit close frame (status 1001, Going Away, signals a server shutting down), and SSE clients need reconnect logic for when the stream ends. See WebSockets and server-sent events for session stickiness implications during rolling restarts.
Database pools, caches, and side effects
After HTTP drain completes, close downstream resources in dependency order:
- Stop accepting new transactions on application threads.
- Wait for in-flight queries against the connection pool to return connections.
- Call the pool's close() / end() so idle connections release server-side slots.
- Flush metrics exporters and structured log buffers; SIGKILL drops anything still in memory.
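Dependency order is easiest to get right by registering each resource as it opens and letting teardown run in reverse. Python's contextlib.ExitStack does this: callbacks unwind last-in, first-out, and one failing callback does not skip the rest. Resource is a hypothetical stand-in for a pool, listener, or exporter.

```python
import contextlib


# Hypothetical stand-in for a DB pool, HTTP listener, or metrics exporter.
class Resource:
    def __init__(self, name: str, log: list) -> None:
        self.name, self.log = name, log
        log.append(f"open {name}")

    def close(self) -> None:
        self.log.append(f"close {self.name}")


def run_service(log: list) -> None:
    with contextlib.ExitStack() as stack:
        # Open in startup order; contextlib.closing() calls .close() on exit.
        stack.enter_context(contextlib.closing(Resource("metrics-exporter", log)))
        stack.enter_context(contextlib.closing(Resource("db-pool", log)))
        stack.enter_context(contextlib.closing(Resource("http-listener", log)))
        log.append("serving")  # a real service would block here until shutdown
    # On exit: close http-listener, then db-pool, then metrics-exporter (flushed last).
```

Opening the exporter first means it closes last, so metrics about the pool teardown itself still get flushed.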
Distributed locks and leader-election leases should be released on shutdown if the process holds them; otherwise failover waits for lease TTL. See distributed locking for fencing-token patterns when stale holders are a risk.
Failure modes and observability
Watch these during deploys and scale-in events:
- 502/503 spikes: mis-timed listener close vs balancer deregistration.
- Duplicate processing: queue message redelivered after partial handling.
- Connection pool exhaustion: old pods hold DB connections until SIGKILL while new pods scale up.
- Stuck shutdown: a single hung request blocks server.close() forever; always use a deadline and force-close stragglers.
- Zombie processes: child processes left unreaped, or never signaled, when the parent receives SIGTERM (common with shell wrappers running as PID 1 in containers).
Instrument shutdown as a first-class event: log "drain started," active connection count, drain duration, forced cancellations, and exit code. Correlate with deploy timestamps in your observability stack to catch regressions when probe timings change.
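A minimal way to make those milestones queryable is one JSON log line per event. The event and field names below are illustrative, not a standard schema; instrumented_drain is a hypothetical wrapper that assumes a drain callable returning the number of forced cancellations, and it only emits output once logging is configured at INFO.

```python
import json
import logging
import time
from typing import Callable

logger = logging.getLogger("shutdown")


# Hypothetical helpers for illustration; field names are not a standard schema.
def log_shutdown_event(event: str, **fields) -> None:
    """Emit one structured JSON line per shutdown milestone."""
    logger.info(json.dumps({"event": event, "ts": time.time(), **fields}, sort_keys=True))


def instrumented_drain(active_connections: int, drain: Callable[[], int]) -> int:
    """Wrap a drain step with start/finish events; returns the exit code."""
    start = time.monotonic()
    log_shutdown_event("drain_started", active_connections=active_connections)
    forced = drain()  # number of stragglers that had to be cancelled
    exit_code = 0 if forced == 0 else 1
    log_shutdown_event(
        "drain_finished",
        drain_seconds=round(time.monotonic() - start, 3),
        forced_cancellations=forced,
        exit_code=exit_code,
    )
    return exit_code
```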
Production checklist
- Register SIGTERM/SIGINT handlers that trigger a single, idempotent shutdown routine.
- Fail readiness first, and close the listener only after the load balancer has stopped sending new connections.
- Set an application drain deadline shorter than the platform grace period.
- Align preStop sleep, readiness failure, and load balancer deregistration delay.
- Keep liveness passing during drain; only readiness should fail early.
- Pause queue consumers; ack or nack in-flight messages before exit.
- Ensure handlers are idempotent — deploys will cause redelivery.
- Close connection pools and release distributed locks after drain.
- Flush logs and metrics buffers before final exit.
- Load-test rolling deploys; alert on 502 rate correlated with release events.
Key takeaways
- SIGTERM is a deadline, not a suggestion: SIGKILL follows if you do not exit in time.
- Stop intake before drain: fail readiness, close the listener once the balancer has caught up, then wait for in-flight work.
- Load balancers lag endpoint updates: preStop and deregistration delay exist to cover propagation time.
- Queue workers need explicit stop semantics: pause consumption and design for redelivery.
- Measure deploy-time errors: graceful shutdown is proven by flat 502 rates during rolls, not by code existing.
Related reading
- Blue-green and canary deployments explained — traffic switching strategies that depend on per-instance draining
- Load balancing explained — health checks, sticky sessions, and deregistration delay
- Kubernetes fundamentals explained — pods, probes, and termination grace periods
- Idempotency explained — safe retries when shutdown causes duplicate delivery