
LLM agent retry, backoff and transient failure recovery explained

Harbor Logistics deployed a warehouse agent that called create_shipment whenever the model returned a tool plan. During a week of flaky carrier APIs, the runtime retried every failed HTTP call three times with no idempotency key and no distinction between “connection reset” and “invalid SKU.” 28% of successful agent runs produced duplicate pallet labels — each invisible to the model because the first attempt had actually committed server-side before the timeout. Finance flagged $340k in duplicate freight charges before engineering admitted the agent was not broken; the retry policy was.

Retry and backoff at the agent layer is the discipline of re-attempting only transient failures — network blips, 503s, rate limits with retry-after headers — while refusing to repeat operations that already succeeded or can never succeed. This guide covers error taxonomy, exponential backoff with jitter, idempotency keys for LLM completions and tool calls, per-step retry budgets, coordination with rate limiting and model fallback, the Harbor Logistics refactor, a technique decision table, pitfalls, and a production checklist.

Transient vs permanent: the error taxonomy agents need

A retry policy is only as good as its classifier. Agent runtimes should map every failure into one of four buckets before deciding the next action:

  • Transient — safe to retry after backoff: TCP reset, DNS flake, HTTP 502/503/504, provider timeout, empty completion body, malformed JSON that a repair pass might fix.
  • Rate limited — retry only after honoring Retry-After or token bucket refill; coordinate with throttling instead of hammering the same key.
  • Permanent (client) — do not retry the same request: HTTP 400/401/403/404, schema validation on user input, business rule rejection (“insufficient inventory”).
  • Ambiguous — timeout after a write tool call: the server may have committed. Require idempotency lookup or human escalation; never blind retry.
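The four buckets above can be sketched as a small classifier. This is a minimal illustration, not Harbor's implementation: the names `ErrorClass` and `classify` are hypothetical, and treating every write that got no response as ambiguous is an assumption drawn from the bullet on ambiguous timeouts.

```python
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    TRANSIENT = "transient"
    RATE_LIMITED = "rate_limited"
    PERMANENT = "permanent"
    AMBIGUOUS = "ambiguous"


TRANSIENT_STATUSES = {502, 503, 504}
PERMANENT_STATUSES = {400, 401, 403, 404}


def classify(status: Optional[int], is_write: bool) -> ErrorClass:
    """Map one failed call to a bucket.

    status is None when no response arrived (timeout, reset, DNS failure).
    """
    if status == 429:
        return ErrorClass.RATE_LIMITED
    if status in PERMANENT_STATUSES:
        return ErrorClass.PERMANENT
    if status is None and is_write:
        # No response to a write: the server may already have committed.
        return ErrorClass.AMBIGUOUS
    if status is None or status in TRANSIENT_STATUSES:
        return ErrorClass.TRANSIENT
    # Unknown codes default to permanent so they are never retried blindly.
    return ErrorClass.PERMANENT
```

Vendor-specific rules (for example, a 200 response whose body carries an error code) would sit in front of this function as configuration.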

LLM completion failures add nuance. A truncated JSON tool call is often transient for one repair attempt; repeating the entire planner step five times burns budget and may diverge from the user’s intent. Separate completion retries (same messages, same temperature) from replan retries (new model sample) — only the former should be automatic.
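One way to keep completion retries bounded is to allow a single repair sample and then hand control back to the planner. The sketch below assumes a hypothetical `sample` callable that re-sends the same messages at the same temperature and returns the raw tool-call text.

```python
import json
from typing import Callable, Optional


def parse_tool_call(sample: Callable[[], str], max_repairs: int = 1) -> Optional[dict]:
    """Return the parsed tool call, or None once the repair allowance is spent."""
    for _ in range(1 + max_repairs):
        raw = sample()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Truncated or malformed JSON: spend one repair attempt.
            continue
        if isinstance(parsed, dict):
            return parsed
    # The caller decides whether a replan is warranted; it is not automatic.
    return None
```

Returning None rather than looping leaves the replan decision with the caller, which matches the separation described above.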

Exponential backoff with jitter

Fixed-interval retries synchronize across clients and amplify outages — the classic “retry storm” that took down Harbor’s carrier integration alongside the original brownout. Production agents use exponential backoff with full jitter:

delay = random_uniform(0, min(cap, base * 2^attempt))

Typical starting values for agent infrastructure:

  • LLM completion retry — base 500ms, cap 8s, max 3 attempts before fallback hop.
  • Read tool retry — base 200ms, cap 4s, max 4 attempts (GET-safe).
  • Write tool retry — only with idempotency key; base 1s, cap 30s, max 2 attempts, then escalate.
  • Webhook delivery — base 5s, cap 300s, max 8 attempts over 24h for async side effects.

Jitter spreads load; caps prevent a single step from blocking the user for minutes. Always log attempt_number, delay_ms, and error_class on each retry in tracing spans so postmortems show whether backoff helped or merely delayed failure.
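The full-jitter formula and the starting values above could be expressed roughly as follows. `BackoffPolicy`, `POLICIES`, `full_jitter_delay`, and `retry_span_tags` are illustrative names, and `attempt` is assumed to start at 0 for the first retry.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffPolicy:
    base_s: float
    cap_s: float
    max_attempts: int


# Starting values copied from the list above; tune per integration.
POLICIES = {
    "llm_completion": BackoffPolicy(0.5, 8.0, 3),
    "read_tool": BackoffPolicy(0.2, 4.0, 4),
    "write_tool": BackoffPolicy(1.0, 30.0, 2),
    "webhook": BackoffPolicy(5.0, 300.0, 8),
}

_RNG = random.Random()


def full_jitter_delay(policy: BackoffPolicy, attempt: int, rng: random.Random = _RNG) -> float:
    """delay = uniform(0, min(cap, base * 2^attempt)), in seconds."""
    ceiling = min(policy.cap_s, policy.base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def retry_span_tags(attempt: int, delay_s: float, error_class: str) -> dict:
    """Fields to attach to the tracing span for each retry."""
    return {
        "attempt_number": attempt,
        "delay_ms": int(delay_s * 1000),
        "error_class": error_class,
    }
```

Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests without changing production behavior.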

Idempotency keys: the ambiguous-timeout fix

The duplicate-shipment bug was an ambiguous timeout: Harbor’s warehouse management system (WMS) returned 200 and created label #A, but the TCP connection dropped before the agent runtime saw the body. The client retried and created label #B for the same logical shipment.

Fix: generate a stable idempotency key per logical operation at plan time, before the first HTTP byte leaves:

idempotency_key = hash(run_id, step_index, tool_name, canonical_args)
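A concrete version of that hash might look like the sketch below. It assumes the tool arguments are JSON-serializable and uses SHA-256 over a canonical JSON encoding; the choice of hash and encoding is an assumption, not something the vendor API prescribes.

```python
import hashlib
import json


def idempotency_key(run_id: str, step_index: int, tool_name: str, args: dict) -> str:
    """Derive a stable key for one logical operation."""
    # Sorted keys and fixed separators make equal arguments serialize
    # to identical bytes regardless of dict insertion order.
    canonical = json.dumps(
        [run_id, step_index, tool_name, args],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the key depends only on the logical operation, a retry of the same step produces the same key, while a replanned step with different arguments produces a new one.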

Send it as Idempotency-Key (or vendor equivalent) on every write tool invocation. On ambiguous timeout:

  1. Query status endpoint with the same key (if supported).
  2. If status unknown, surface a structured observation to the model: SHIPMENT_STATE_UNKNOWN with the key — do not auto-retry.
  3. Let the agent ask the user or call a reconcile tool.
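The three steps above can be sketched as a function that never re-sends the write itself. `lookup_status` and the state strings "committed" and "not_found" are hypothetical stand-ins for whatever the vendor's status endpoint returns.

```python
from typing import Callable, Optional


def resolve_ambiguous_write(
    key: str,
    lookup_status: Optional[Callable[[str], Optional[str]]] = None,
) -> dict:
    """Turn an ambiguous write timeout into an observation for the model."""
    # Step 1: ask the status endpoint, if the vendor offers one.
    state = lookup_status(key) if lookup_status is not None else None
    if state == "committed":
        return {"status": "OK", "idempotency_key": key, "retryable": False}
    if state == "not_found":
        # The server confirms nothing exists under this key, so one
        # retry with the same key is safe.
        return {"status": "NOT_COMMITTED", "idempotency_key": key, "retryable": True}
    # Steps 2 and 3: state unknown, so surface it and let the agent reconcile.
    return {
        "status": "SHIPMENT_STATE_UNKNOWN",
        "idempotency_key": key,
        "retryable": False,
        "message": "Write outcome unknown; reconcile or ask the user before retrying",
    }
```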

Read tools are naturally idempotent; cache their last successful response keyed by (tool, args_hash) for the duration of the run so a completion retry does not re-fetch megabytes of catalog data. Write tools without server-side idempotency support should be wrapped in a broker that enforces keys or marked no_auto_retry in the tool registry.
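A run-scoped read cache along these lines is one possible shape. `RunReadCache` and the registry entries are hypothetical, and the tool names are made up for illustration.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class RunReadCache:
    """Holds successful read-tool responses for the duration of one run."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def _args_hash(args: dict) -> str:
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_fetch(self, tool: str, args: dict, fetch: Callable[[dict], Any]) -> Any:
        key = (tool, self._args_hash(args))
        if key not in self._store:
            # If fetch raises, nothing is stored, so only successes are cached.
            self._store[key] = fetch(args)
        return self._store[key]


# Hypothetical registry: writes without server-side idempotency opt out of auto-retry.
TOOL_REGISTRY = {
    "get_catalog": {"kind": "read", "no_auto_retry": False},
    "legacy_print_label": {"kind": "write", "no_auto_retry": True},
}
```

Creating one cache per run and discarding it at the end keeps stale reads from leaking across sessions.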

Per-step and per-run retry budgets

Unlimited retries turn a 2s blip into a 90s user wait and runaway token spend. Harbor now enforces two ceilings:

Per-step budget

Each agent step (one model call + its tool batch) gets max_completion_retries=3 and max_tool_retries_per_call=4 for reads, 1 for writes. Exceeding the budget returns a structured RETRY_BUDGET_EXHAUSTED observation so the planner can switch strategy — alternate tool, smaller batch, or human handoff — instead of looping forever.

Per-run budget

Whole trajectories cap total retry attempts (e.g. 20) and total retry wall time (e.g. 120s). Pair with run timeouts so a poison integration cannot hold a session hostage.
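Both ceilings can live in one small guard object. The sketch below is an assumption about shape rather than Harbor's code: `BudgetGuard` is an illustrative class, and retry wall time is approximated as the sum of scheduled backoff delays.

```python
from collections import defaultdict
from typing import Optional

# Per-call retry limits from the per-step budget described above.
STEP_LIMITS = {"completion": 3, "read": 4, "write": 1}


class BudgetGuard:
    def __init__(self, run_max_attempts: int = 20, run_max_wall_s: float = 120.0) -> None:
        self.run_max_attempts = run_max_attempts
        self.run_max_wall_s = run_max_wall_s
        self.run_attempts = 0
        self.run_wall_s = 0.0
        self._per_call = defaultdict(int)

    def start_step(self) -> None:
        """Reset per-step counters; run-level counters keep accumulating."""
        self._per_call.clear()

    def request_retry(self, kind: str, call_id: str, delay_s: float) -> Optional[dict]:
        """Return None if the retry may proceed, else an exhaustion observation."""
        if self._per_call[(kind, call_id)] >= STEP_LIMITS[kind]:
            return self._exhausted("step", kind)
        over_attempts = self.run_attempts >= self.run_max_attempts
        over_wall = self.run_wall_s + delay_s > self.run_max_wall_s
        if over_attempts or over_wall:
            return self._exhausted("run", kind)
        self._per_call[(kind, call_id)] += 1
        self.run_attempts += 1
        self.run_wall_s += delay_s
        return None

    @staticmethod
    def _exhausted(scope: str, kind: str) -> dict:
        return {"status": "RETRY_BUDGET_EXHAUSTED", "scope": scope, "kind": kind}
```

The returned observation is what the planner sees, so it can switch tools or hand off instead of looping.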

Retry vs fallback ordering

On LLM 503, the preferred sequence is:

  1. One fast completion retry with backoff (same model).
  2. If still failing, hop to the next rung on the fallback ladder without additional same-model retries.
  3. If all rungs fail, degrade tier or abort with partial state export.

Retrying the same overloaded model six times before fallback was Harbor’s second mistake — it burned retry budget without changing the failure mode.
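The ordering above, one same-model retry and then the ladder, might be sketched like this. `ProviderUnavailable` stands in for an HTTP 503 from the provider, and giving each fallback rung a single attempt is an assumption the text leaves open.

```python
import time
from typing import Callable, Optional, Sequence


class ProviderUnavailable(Exception):
    """Stand-in for an HTTP 503 from a model provider."""


def complete_with_fallback(
    ladder: Sequence[Callable[[], str]],
    retry_delay_s: Callable[[], float],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try the primary model twice (one retry), then each fallback rung once."""
    for rung, call in enumerate(ladder):
        attempts = 2 if rung == 0 else 1
        for attempt in range(attempts):
            try:
                return call()
            except ProviderUnavailable:
                if attempt + 1 < attempts:
                    # Back off only before the single same-model retry.
                    sleep(retry_delay_s())
    # All rungs failed: the caller degrades tier or aborts with partial state export.
    return None
```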

Structured observations for the agent loop

Retries should not be invisible to the model. Return JSON observations aligned with tool error handling conventions:

{
  "status": "SHIPMENT_STATE_UNKNOWN",
  "tool": "create_shipment",
  "attempt": 2,
  "max_attempts": 2,
  "retryable": false,
  "idempotency_key": "shp_8f3a…",
  "message": "Timeout after 30s; shipment state unknown — do not retry without reconcile"
}

The model can then choose reconcile_shipment instead of blindly calling create_shipment again. For completion failures, inject a system note: “Previous tool call may have succeeded; verify before write.” Silent retries that change world state without updating context are how agents double-charge customers.

Harbor Logistics refactor walkthrough

Harbor replaced ad-hoc for (i=0; i<3; i++) loops with a RetryOrchestrator module:

  1. ErrorClassifier — maps HTTP codes, gRPC status, and vendor error bodies to the four buckets; vendor-specific rules live in config, not agent prompts.
  2. BackoffScheduler — full jitter per tool class; respects Retry-After; integrates with rate limiter buckets.
  3. IdempotencyStore — Redis-backed key registry per run; 24h TTL; status polling for ambiguous writes.
  4. BudgetGuard — per-step and per-run counters; emits RETRY_BUDGET_EXHAUSTED events.
  5. RetryAudit — every attempt logged to compliance trail with key, latency, and outcome.
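As an illustration of how a scheduler might respect Retry-After, the sketch below parses both header forms (delay in seconds or an HTTP date) and never waits less than the server asked. The function names are hypothetical; only standard-library calls are used.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header; return None if absent or unparseable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(jittered_s: float, header_value: Optional[str], now: Optional[datetime] = None) -> float:
    """Use the jittered delay unless the server asked for a longer wait."""
    server_s = retry_after_seconds(header_value, now)
    return jittered_s if server_s is None else max(jittered_s, server_s)
```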

Results after six weeks: duplicate shipment rate 28% → 1.1%, p95 agent run time during carrier outages 47s → 11s (fewer useless retries + faster fallback), support tickets for “double label” 112/week → 4/week. The remaining 1.1% were vendor APIs without idempotency support — queued for broker wrapping.

Technique decision table

Scenario | Prefer | Avoid
HTTP 503 on LLM completion | One backoff retry, then fallback ladder | Six same-model retries
HTTP 429 with Retry-After | Honor header + rate limiter defer | Immediate retry ignoring header
Timeout after write tool | Idempotency status lookup | Blind identical retry
HTTP 400 invalid args | Return permanent error to model for replan | Exponential backoff loop
Read tool connection reset | Backoff retry up to read budget | Escalate to human on first flake
Malformed tool JSON once | Single repair completion retry | Full trajectory restart
Global provider outage | Fallback + degrade tier | Per-client retry storm

Common pitfalls

  • Retry without classification — repeats permanent errors and wastes budget.
  • No idempotency on writes — duplicate side effects on timeout.
  • Fixed retry interval — synchronizes clients into retry storms.
  • Silent retries — model context diverges from actual world state.
  • Same policy for read and write — writes need stricter caps.
  • Too many same-model retries before fallback: six attempts on a dead endpoint.
  • No per-run ceiling — one bad tool traps the session indefinitely.

Production checklist

  • Define four-bucket error taxonomy with vendor-specific mappings.
  • Implement exponential backoff with full jitter per tool class.
  • Generate idempotency keys before first write attempt.
  • Support status lookup or reconcile tools for ambiguous timeouts.
  • Set per-step and per-run retry budgets with structured exhaustion events.
  • Order retries: same-model once, then fallback, then degrade.
  • Return structured retry observations to the agent loop.
  • Log every attempt with span tags for postmortems.
  • Load-test retry behavior under 30% synthetic 503 rate.
  • Audit write tools without server-side idempotency quarterly.
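For the load-test item, a fault injector can be as small as the sketch below. `inject_503` and `residual_failure_rate` are hypothetical helpers, backoff sleeps are omitted for brevity, and failures are injected independently, which real outages are not.

```python
import random
from typing import Callable


class Synthetic503(Exception):
    """Injected failure standing in for an HTTP 503."""


def inject_503(call: Callable[[], str], failure_rate: float, rng: random.Random) -> Callable[[], str]:
    """Wrap a call so it fails with the given probability."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise Synthetic503()
        return call()
    return wrapped


def residual_failure_rate(call: Callable[[], str], max_attempts: int, trials: int) -> float:
    """Fraction of trials that still fail after max_attempts tries."""
    failed = 0
    for _ in range(trials):
        for _ in range(max_attempts):
            try:
                call()
                break
            except Synthetic503:
                continue
        else:
            # The loop never hit break, so every attempt failed.
            failed += 1
    return failed / trials
```

With independent 30% failures and four attempts, the residual rate should land near 0.3^4, about 0.8%. A correlated outage will look much worse, which is why the budgets and fallback ordering still matter.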

Key takeaways

  • Classify before retry — transient, rate-limited, permanent, or ambiguous.
  • Idempotency keys prevent duplicate commits on timeout.
  • Backoff + jitter avoids retry storms; caps bound user wait.
  • Budgets force strategy change when retries stop helping.
  • Harbor Logistics cut duplicate shipments from 28% to 1.1% with RetryOrchestrator and write-safe policies.
