Guide

LLM agent idempotency key and request deduplication systems explained

Harbor Payments wired an agent to process Stripe charge.refunded webhooks: verify the event, match the customer, post a ledger credit, and send a confirmation email. Stripe retried the webhook during a deploy blip. The queue consumer had no ingress dedup. The agent runtime started a second run with a fresh run_id. The refund tool had no idempotency key. Finance discovered duplicate credits on the same charge_id three days later. Duplicate refund attempts hit 44% of disputed tickets before anyone treated deduplication as infrastructure, not prompt engineering.

Idempotency and request deduplication ensure that the same logical trigger — webhook, cron tick, user click, or tool write — produces one side effect even when the network delivers duplicates, workers crash mid-flight, or retries replay after ambiguous timeouts. This is not tool result caching (reuse read payloads) and not saga compensation (undo a committed write). Dedup answers: “have we already accepted and processed this exact intent?” This guide covers dedup layers from ingress through webhook queues and cron triggers to write-tool brokers, idempotency record lifecycle, response replay, the Harbor Payments refactor, a technique decision table, pitfalls, and a production checklist tied to durable run state.

Why agents need dedup beyond “retry carefully”

Distributed systems are at-least-once by default. Webhook vendors, message queues, load balancers, and mobile clients all retry on timeout. Agent loops add another multiplier: a model may call the same write tool twice in one trajectory after a reflection pass, and a crashed worker may resume from a checkpoint and re-execute a step.

Without explicit dedup, each duplicate delivery looks like a new legitimate request. Prompts that say “only refund once” do not survive tool-level retries, parallel workers, or model replanning. Production agents need deterministic keys and authoritative stores that reject or replay duplicates before money moves.

Ingress dedup — same webhook event_id or Idempotency-Key header must not enqueue two jobs.
Run dedup — same trigger fingerprint must not start two concurrent agent runs for one business action.
Step dedup — checkpoint resume must skip already-committed tool steps.
Write-tool dedup — external APIs must see the same idempotency key on every retry of one logical mutation.

Idempotency key design

A good key is stable across retries of the same intent, unique across different intents, and scoped to tenant so keys never collide across customers.

Key sources (pick per layer)

Vendor-supplied — Stripe event.id, GitHub X-GitHub-Delivery, preferred when trustworthy and globally unique.
Client-supplied — Idempotency-Key header from the caller; validate format and length; reject missing keys on write endpoints.
Server-derived — hash of (tenant_id, trigger_type, canonical_payload) when the client cannot provide keys.
Agent-derived — hash of (run_id, step_index, tool_name, canonical_args) for write tools inside a trajectory.

Canonicalization rules

Keys must not change because JSON key order flipped or a float printed as 10 vs 10.0. Normalize: sort object keys, stringify numbers consistently, strip fields marked non_idempotent in the tool schema (timestamps, random request IDs injected by the runtime).

dedup_key = sha256(tenant_id + ":" + layer + ":" + canonical_json(intent))

Store keys with a TTL matched to business risk: webhook dedup 72h (vendor retry window), cron tick keys 25h (cover DST edge), write-tool keys 24–48h (reconcile window). Compaction jobs archive succeeded records to cold storage for audit, not replay.

Idempotency record lifecycle

Treat each key as a small state machine in a durable store (Redis with AOF, Postgres, or DynamoDB). Never rely on in-memory maps across workers.

ACQUIRE (pending) — insert key with status pending using a conditional write. If insert fails, another worker owns the work: fetch existing record.
EXECUTE — run the agent step or call the external API with the same key on every attempt.
COMMIT (succeeded) — store HTTP status, response body hash, and completed_at. Future duplicates replay this response without re-executing.
FAIL (failed) — terminal errors (400, business rejection) mark failed so identical retries return the same failure envelope, not a second attempt.
EXPIRE — after TTL, a new key may run; document that late duplicates after expiry are a known edge case.

For long agent runs, split records: one ingress key per webhook, one run lease key, and per-step keys inside checkpoint WAL. A single global key per customer is too coarse; a key per HTTP request without tenant scope is unsafe.

Four dedup layers in production agents

Layer 1: Ingress (webhook / API)

Verify signatures first, then check event_id in a dedup table before enqueueing. Return 200 OK on duplicate ingress even when skipping work — vendors stop retrying only when they see success. Log dedup_hit=true for metrics.

Layer 2: Queue consumer

Message brokers may deliver twice. Consumer uses message_dedup_key = hash(queue, event_id) with a short lease. Pair with DLQ handling so poison messages do not bypass dedup on infinite redelivery.

Layer 3: Run scheduler

Before spawning a worker, acquire run_lease(trigger_fingerprint). Cron overlap guards and webhook bursts both need this. If a run is in_progress, coalesce into the existing run or return cached outcome for read-only triggers.

Layer 4: Write-tool broker

Wrap non-idempotent vendor APIs. The broker owns the key, calls the vendor, persists the response, and returns a normalized observation to the model. On ambiguous timeout, status poll with the same key — never mint a new key for the same step. Tools without vendor idempotency support should be flagged requires_human_on_retry in the manifest.

Response replay and the agent loop

When a duplicate arrives after success, return the same structured observation the model would have seen the first time. Do not start a new planner turn that might choose a different tool path.

{
  "ok": true,
  "data": { "refund_id": "re_3Px9…", "amount_cents": 4200 },
  "_idempotency": {
    "replay": true,
    "key": "stripe_evt_1Ox…",
    "first_completed_at": "2026-06-12T09:14:02Z"
  }
}

The _idempotency.replay flag lets tracing distinguish fresh work from dedup hits. For user-visible actions, surface “already processed” in the final reply when replay occurs at ingress, so support staff do not interpret silence as failure.

Harbor Payments refactor walkthrough

Harbor implemented dedup at all four layers:

IngressGuard — Stripe event.id in Redis SET with 72h TTL; 200 on duplicate before queue touch.
ConsumerDedup — SQS visibility timeout + processed_messages table; idempotent enqueue to agent scheduler.
RunLease — charge_id + event_type fingerprint; one active refund agent per charge.
RefundBroker — wraps ledger API with server-derived keys; ambiguous timeouts poll GET /refunds?idempotency_key=… before returning REFUND_STATE_UNKNOWN to the model.

Duplicate refund rate on disputed tickets fell from 44% to 2.3% (remainder: vendor APIs without status lookup, now queued for broker wrapping). Support credits for double-refund clawbacks dropped 89/week to 6/week. p95 webhook-to-confirmation latency improved slightly because duplicate ingress returned immediately without spawning runs.

Technique decision table

Approach	Strengths	Weaknesses	Best for
Prompt-only (“do not repeat”)	Zero infra	Fails on retries, duplicates, reflection	Never in production writes
Ingress dedup only	Stops webhook storms	Misses in-run duplicate tool calls	Read-only notification agents
Full layered dedup (recommended)	Covers network, queue, run, tool	Requires key discipline and stores	Payments, inventory, ticketing writes
Tool result caching	Cuts duplicate reads	Does not prevent double writes	CRM/search alongside dedup
Saga + compensating transactions	Recovers multi-step failures	Does not prevent duplicate entry	Multi-service workflows after dedup keys exist

Common pitfalls

Keys without tenant_id — tenant A’s replay returns tenant B’s refund record.
New key on every retry — defeats the purpose; ambiguous timeouts become duplicate writes.
404 on duplicate ingress — vendors retry forever; always 200 when safely skipped.
Pending records without TTL — crashed workers leave keys stuck; need lease expiry and sweep.
Replaying different HTTP status codes — duplicate must return the same status and body as the first success.
Dedup store single-region — failover without replicated dedup re-opens the duplicate window.
Hashing non-canonical JSON — same intent, two keys, two charges.
Confusing dedup with locking — idempotency prevents duplicate effects; distributed locks serialize concurrent different intents on the same resource.

Production checklist

Assign a dedup key source per trigger type (vendor id, client header, or server hash).
Canonicalize payloads before hashing; document excluded fields.
Implement ingress dedup before enqueue; return 200 on safe duplicates.
Add consumer-level dedup for at-least-once queues.
Acquire run leases on trigger fingerprint before spawning workers.
Wrap write tools in a broker that enforces keys and response replay.
Use pending/succeeded/failed lifecycle with conditional acquire.
On ambiguous timeout, poll status with the same key — never auto-retry with a new key.
Return _idempotency metadata to the agent loop and traces.
Metric: ingress dedup hit rate, duplicate write prevented count, stuck-pending sweeps.

Key takeaways

Dedup is infrastructure — prompts cannot make distributed retries safe.
Layer ingress, queue, run, and write-tool dedup — each failure mode needs its own key scope.
Replay the first successful response — do not re-execute side effects on duplicate delivery.
Harbor Payments cut duplicate refunds from 44% to 2.3% with four-layer keys and a refund broker.
Pair dedup with retries and caching — complementary, not interchangeable.