LLM agent retry, backoff and transient failure recovery explained
Harbor Logistics deployed a warehouse agent that called create_shipment whenever the model returned a tool plan. During a week of flaky carrier APIs, the runtime retried every failed HTTP call three times with no idempotency key and no distinction between “connection reset” and “invalid SKU.” As a result, 28% of successful agent runs produced duplicate pallet labels, each invisible to the model because the first attempt had actually committed server-side before the timeout. Finance flagged $340k in duplicate freight charges before engineering admitted the agent was not broken; the retry policy was.
Retry and backoff at the agent layer is the discipline of re-attempting only transient failures — network blips, 503s, rate limits with retry-after headers — while refusing to repeat operations that already succeeded or can never succeed. This guide covers error taxonomy, exponential backoff with jitter, idempotency keys for LLM completions and tool calls, per-step retry budgets, coordination with rate limiting and model fallback, the Harbor Logistics refactor, a technique decision table, pitfalls, and a production checklist.
Transient vs permanent: the error taxonomy agents need
A retry policy is only as good as its classifier. Agent runtimes should map every failure into one of four buckets before deciding the next action:
- Transient (safe to retry after backoff): TCP reset, DNS flake, HTTP 502/503/504, provider timeout, empty completion body, malformed JSON that a repair pass might fix.
- Rate limited: retry only after honoring Retry-After or token bucket refill; coordinate with throttling instead of hammering the same key.
- Permanent (client): do not retry the same request. Examples: HTTP 400/401/403/404, schema validation on user input, business rule rejection (“insufficient inventory”).
- Ambiguous (timeout after a write tool call): the server may have committed. Require idempotency lookup or human escalation; never blind retry.
LLM completion failures add nuance. A truncated JSON tool call is often transient for one repair attempt; repeating the entire planner step five times burns budget and may diverge from the user’s intent. Separate completion retries (same messages, same temperature) from replan retries (new model sample) — only the former should be automatic.
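The four buckets above can be sketched as a small classifier. This is a minimal illustration, not Harbor's implementation: the names `ErrorClass` and `classify` are hypothetical, and treating any write that received no response as ambiguous is a conservative assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    TRANSIENT = "transient"
    RATE_LIMITED = "rate_limited"
    PERMANENT = "permanent"
    AMBIGUOUS = "ambiguous"


TRANSIENT_STATUS = {502, 503, 504}
PERMANENT_STATUS = {400, 401, 403, 404}


def classify(status: Optional[int], is_write: bool) -> ErrorClass:
    """Map a failed call to one of the four buckets.

    status is None when no HTTP response arrived (reset, DNS failure, timeout).
    """
    if status is None:
        # Assumption: a write with no response may have committed server-side,
        # so it is never treated as a plain transient failure.
        return ErrorClass.AMBIGUOUS if is_write else ErrorClass.TRANSIENT
    if status == 429:
        return ErrorClass.RATE_LIMITED
    if status in TRANSIENT_STATUS:
        return ErrorClass.TRANSIENT
    if status in PERMANENT_STATUS:
        return ErrorClass.PERMANENT
    # Assumption: codes not listed in the taxonomy fail closed (no retry).
    return ErrorClass.PERMANENT
```

For example, `classify(None, is_write=True)` yields `ErrorClass.AMBIGUOUS`, which should route to an idempotency lookup rather than a retry. Vendor-specific error bodies would need their own rules on top of this.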
Exponential backoff with jitter
Fixed-interval retries synchronize across clients and amplify outages — the classic “retry storm” that took down Harbor’s carrier integration alongside the original brownout. Production agents use exponential backoff with full jitter:
delay = random_uniform(0, min(cap, base * 2^attempt))
Typical starting values for agent infrastructure:
- LLM completion retry — base 500ms, cap 8s, max 3 attempts before fallback hop.
- Read tool retry — base 200ms, cap 4s, max 4 attempts (GET-safe).
- Write tool retry — only with idempotency key; base 1s, cap 30s, max 2 attempts, then escalate.
- Webhook delivery — base 5s, cap 300s, max 8 attempts over 24h for async side effects.
Jitter spreads load; caps prevent a single step from blocking the user for minutes. Always log attempt_number, delay_ms, and error_class on each retry in tracing spans so postmortems show whether backoff helped or merely delayed failure.
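The full-jitter formula and the starting values listed above translate directly into code. The sketch below is illustrative: `full_jitter_delay` and `POLICIES` are hypothetical names, and delays are expressed in seconds.

```python
import random


def full_jitter_delay(attempt: int, base: float, cap: float, rng=random) -> float:
    """Return a delay in seconds drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Starting values quoted in the list above, converted to seconds.
POLICIES = {
    "llm_completion": {"base": 0.5, "cap": 8.0, "max_attempts": 3},
    "read_tool": {"base": 0.2, "cap": 4.0, "max_attempts": 4},
    "write_tool": {"base": 1.0, "cap": 30.0, "max_attempts": 2},
    "webhook": {"base": 5.0, "cap": 300.0, "max_attempts": 8},
}
```

With the read-tool policy, attempt 0 waits at most 0.2s, attempt 3 at most 1.6s, and from attempt 5 onward the 4s cap applies. Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests.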
Idempotency keys: the ambiguous-timeout fix
The duplicate-shipment bug was an ambiguous timeout: Harbor’s WMS returned 200 and created label #A, but the TCP connection dropped before the agent runtime saw the body. The client retried and created label #B for the same logical shipment.
Fix: generate a stable idempotency key per logical operation at plan time, before the first HTTP byte leaves:
idempotency_key = hash(run_id, step_index, tool_name, canonical_args)
Send it as Idempotency-Key (or vendor equivalent) on
every write tool invocation. On ambiguous timeout:
- Query the status endpoint with the same key (if supported).
- If the status is unknown, surface a structured observation to the model, SHIPMENT_STATE_UNKNOWN with the key; do not auto-retry.
- Let the agent ask the user or call a reconcile tool.
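One way to derive a stable key and handle the ambiguous-timeout steps is sketched here. Several details are assumptions: SHA-256 over a canonical JSON encoding stands in for the unspecified hash, `lookup_status` is a hypothetical callable wrapping a vendor status endpoint, and the `RECONCILED` status name is invented for illustration.

```python
import hashlib
import json
from typing import Callable, Optional


def idempotency_key(run_id: str, step_index: int, tool_name: str, args: dict) -> str:
    """Derive the same key for the same logical operation, regardless of arg order."""
    # Canonical form: sorted keys and fixed separators so equal args hash equally.
    canonical_args = json.dumps(args, sort_keys=True, separators=(",", ":"))
    payload = "\x1f".join([run_id, str(step_index), tool_name, canonical_args])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def on_ambiguous_timeout(
    key: str, lookup_status: Callable[[str], Optional[dict]]
) -> dict:
    """Build the observation returned to the model after a write timed out.

    lookup_status (hypothetical) returns the committed record, or None if unknown.
    """
    known = lookup_status(key)
    if known is not None:
        # The first attempt committed; report it instead of writing again.
        return {"status": "RECONCILED", "idempotency_key": key,
                "result": known, "retryable": False}
    # State unknown: never auto-retry; the agent must reconcile or ask the user.
    return {"status": "SHIPMENT_STATE_UNKNOWN", "idempotency_key": key,
            "retryable": False}
```

Because the key depends only on the run, step, tool, and arguments, a retry of the same planned step reuses it, while a replanned step with different arguments gets a new one.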
Read tools are naturally idempotent; cache their last successful
response keyed by (tool, args_hash) for the duration of
the run so a completion retry does not re-fetch megabytes of catalog
data. Write tools without server-side idempotency support should be
wrapped in a broker that enforces keys or marked
no_auto_retry in the tool registry.
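The per-run read cache described above could look like the following sketch. `RunReadCache` and `get_or_fetch` are hypothetical names; the cache is assumed to live in memory for a single run and to be discarded afterward.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class RunReadCache:
    """Caches successful read-tool responses for the duration of one run."""

    def __init__(self) -> None:
        self._responses: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def _args_hash(args: dict) -> str:
        # Canonical JSON so argument order does not change the cache key.
        canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_fetch(self, tool: str, args: dict, fetch: Callable[[], Any]) -> Any:
        key = (tool, self._args_hash(args))
        if key not in self._responses:
            # If fetch() raises, nothing is stored, so only successes are cached.
            self._responses[key] = fetch()
        return self._responses[key]
```

A completion retry that re-issues the same read then hits the cache instead of the upstream API. Write tools should never go through a cache like this.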
Per-step and per-run retry budgets
Unlimited retries turn a 2s blip into a 90s user wait and runaway token spend. Harbor now enforces two ceilings:
Per-step budget
Each agent step (one model call + its tool batch) gets
max_completion_retries=3 and
max_tool_retries_per_call=4 for reads,
1 for writes. Exceeding the budget returns a structured
RETRY_BUDGET_EXHAUSTED observation so the planner can
switch strategy — alternate tool, smaller batch, or human handoff
— instead of looping forever.
Per-run budget
Whole trajectories cap total retry attempts (e.g. 20) and total retry wall time (e.g. 120s). Pair with run timeouts so a poison integration cannot hold a session hostage.
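The two ceilings can be combined in a small guard, sketched below. `RunBudget`, `StepBudget`, and the `scope` field are hypothetical; the sketch also assumes retry wall time is approximated by the sum of scheduled backoff delays rather than measured elapsed time.

```python
from dataclasses import dataclass
from typing import Optional


def exhausted(scope: str) -> dict:
    """Structured observation handed to the planner when a budget runs out."""
    return {"status": "RETRY_BUDGET_EXHAUSTED", "scope": scope, "retryable": False}


@dataclass
class RunBudget:
    max_retries: int = 20      # example ceiling from the text
    max_wait_s: float = 120.0  # example ceiling from the text
    retries: int = 0
    wait_s: float = 0.0


@dataclass
class StepBudget:
    run: RunBudget
    max_retries: int  # e.g. 3 for completions, 4 for reads, 1 for writes
    retries: int = 0

    def request_retry(self, delay_s: float) -> Optional[dict]:
        """Reserve one retry. Returns None if allowed, else an exhaustion observation."""
        if self.retries >= self.max_retries:
            return exhausted("step")
        over_count = self.run.retries >= self.run.max_retries
        over_time = self.run.wait_s + delay_s > self.run.max_wait_s
        if over_count or over_time:
            return exhausted("run")
        self.retries += 1
        self.run.retries += 1
        self.run.wait_s += delay_s
        return None
```

A caller would create one `RunBudget` per trajectory and one `StepBudget` per tool call or completion, checking `request_retry` before each sleep. A non-None result is passed to the model as the observation instead of retrying.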
Retry vs fallback ordering
On LLM 503, the preferred sequence is:
- One fast completion retry with backoff (same model).
- If still failing, hop to the next rung on the fallback ladder without additional same-model retries.
- If all rungs fail, degrade tier or abort with partial state export.
Retrying the same overloaded model six times before fallback was Harbor’s second mistake — it burned retry budget without changing the failure mode.
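The ordering above can be expressed as a short loop. This is a sketch under assumptions: `ModelUnavailable`, `AllRungsFailed`, and `complete_with_fallback` are hypothetical names, `call` stands in for whatever client invokes a model, and each fallback rung is assumed to get a single attempt.

```python
import time
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error raised by `call` on a 503-style failure."""


class AllRungsFailed(Exception):
    """Raised when every model on the ladder has failed."""


def complete_with_fallback(
    ladder: Sequence[str],
    call: Callable[[str], str],
    retry_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the first model twice (one retry), then each fallback rung once."""
    if not ladder:
        raise ValueError("ladder must contain at least one model")
    for rung, model in enumerate(ladder):
        attempts = 2 if rung == 0 else 1  # only the primary gets a retry
        for attempt in range(attempts):
            try:
                return call(model)
            except ModelUnavailable:
                if attempt + 1 < attempts:
                    sleep(retry_delay_s)  # caller may pass a jittered delay
    # Caller decides whether to degrade tier or abort with partial state export.
    raise AllRungsFailed(f"all {len(ladder)} rungs failed")
```

With a ladder of two models where the primary is down, the call sequence is primary, primary, backup: three calls in total instead of six or more against the same overloaded endpoint.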
Structured observations for the agent loop
Retries should not be invisible to the model. Return JSON observations aligned with tool error handling conventions:
{
"status": "SHIPMENT_STATE_UNKNOWN",
"tool": "create_shipment",
"attempt": 2,
"max_attempts": 2,
"retryable": false,
"idempotency_key": "shp_8f3a…",
"message": "Timeout after 30s; shipment state unknown — do not retry without reconcile"
}
The model can then choose reconcile_shipment instead of
blindly calling create_shipment again. For completion
failures, inject a system note: “Previous tool call may have
succeeded; verify before write.” Silent retries that change
world state without updating context are how agents double-charge
customers.
Harbor Logistics refactor walkthrough
Harbor replaced ad-hoc for (i=0; i<3; i++) loops with
a RetryOrchestrator module:
- ErrorClassifier: maps HTTP codes, gRPC status, and vendor error bodies to the four buckets; vendor-specific rules live in config, not agent prompts.
- BackoffScheduler: full jitter per tool class; respects Retry-After; integrates with rate limiter buckets.
- IdempotencyStore: Redis-backed key registry per run; 24h TTL; status polling for ambiguous writes.
- BudgetGuard: per-step and per-run counters; emits RETRY_BUDGET_EXHAUSTED events.
- RetryAudit: every attempt logged to compliance trail with key, latency, and outcome.
Results after six weeks: duplicate shipment rate 28% → 1.1%, p95 agent run time during carrier outages 47s → 11s (fewer useless retries + faster fallback), support tickets for “double label” 112/week → 4/week. The remaining 1.1% were vendor APIs without idempotency support — queued for broker wrapping.
Technique decision table
| Scenario | Prefer | Avoid |
|---|---|---|
| HTTP 503 on LLM completion | One backoff retry, then fallback ladder | Six same-model retries |
| HTTP 429 with Retry-After | Honor header + rate limiter defer | Immediate retry ignoring header |
| Timeout after write tool | Idempotency status lookup | Blind identical retry |
| HTTP 400 invalid args | Return permanent error to model for replan | Exponential backoff loop |
| Read tool connection reset | Backoff retry up to read budget | Escalate to human on first flake |
| Malformed tool JSON once | Single repair completion retry | Full trajectory restart |
| Global provider outage | Fallback + degrade tier | Per-client retry storm |
Common pitfalls
- Retry without classification — repeats permanent errors and wastes budget.
- No idempotency on writes — duplicate side effects on timeout.
- Fixed retry interval — synchronizes clients into retry storms.
- Silent retries — model context diverges from actual world state.
- Same policy for read and write — writes need stricter caps.
- Too many same-model retries before fallback (six attempts on a dead endpoint).
- No per-run ceiling — one bad tool traps the session indefinitely.
Production checklist
- Define four-bucket error taxonomy with vendor-specific mappings.
- Implement exponential backoff with full jitter per tool class.
- Generate idempotency keys before first write attempt.
- Support status lookup or reconcile tools for ambiguous timeouts.
- Set per-step and per-run retry budgets with structured exhaustion events.
- Order retries: same-model once, then fallback, then degrade.
- Return structured retry observations to the agent loop.
- Log every attempt with span tags for postmortems.
- Load-test retry behavior under 30% synthetic 503 rate.
- Audit write tools without server-side idempotency quarterly.
Key takeaways
- Classify before retry — transient, rate-limited, permanent, or ambiguous.
- Idempotency keys prevent duplicate commits on timeout.
- Backoff + jitter avoids retry storms; caps bound user wait.
- Budgets force strategy change when retries stop helping.
- Harbor Logistics cut duplicate shipments from 28% to 1.1% with RetryOrchestrator and write-safe policies.
Related reading
- LLM tool error handling explained — structured observations and agent recovery paths
- LLM agent rate limiting and throttling explained — quotas before upstream 429s
- LLM agent model fallback and graceful degradation explained — routing ladders after retry exhaustion
- LLM agent compensating transactions and saga rollbacks explained — undo when retries cannot fix partial failure