Guide
LLM agent outbound webhook and callback delivery systems explained
Harbor Fulfillment sold an async order-processing agent: customers POST
an order payload, receive a run_id, and expect a callback
when the agent finishes routing, inventory checks, and carrier label
generation. The agent runtime wrote results to an internal database
and assumed customers would poll GET /runs/{id}. During a
traffic spike, polling clients backed off. Warehouse systems never
received completion events. Orders sat in
agent_pending for hours.
Missed callbacks accounted for 37% of SLA breaches
before anyone built outbound webhook delivery as first-class
infrastructure.
Outbound webhook and callback delivery is how async agents push completion, failure, and progress events to customer-owned HTTPS endpoints, with signing, retries, and deduplication so receivers can trust each event and act on it only once even though delivery itself is at-least-once. This is the mirror of inbound webhook triggers (vendor → agent) and distinct from streaming token delivery (agent → browser). This guide covers subscription registries, signed event envelopes, delivery queue topology, retry and backoff policy paired with idempotent event IDs, the Harbor Fulfillment refactor, a technique decision table, pitfalls, and a production checklist tied to dead-letter handling.
Why agents need outbound delivery, not “poll when ready”
Polling works for demos. Production integrations fail when customers forget to poll, poll too slowly, or poll the wrong shard after a deploy. Agent runs can take seconds to minutes; warehouse and billing systems need push notification semantics.
Outbound callbacks also decouple your agent fleet from customer availability. You enqueue delivery attempts, retry on transient failures, and surface permanent failures to ops — instead of losing the outcome because nobody polled within a TTL window.
- Completion events — run.succeeded, run.failed with structured output and trace links.
- Progress events (optional) — run.step_completed for long workflows; throttle to avoid webhook storms.
- Human-in-the-loop events — run.approval_required when escalation gates pause the run.
- Delivery guarantees — at-least-once delivery with signed event_id for receiver-side dedup; never claim exactly-once across the public internet.
Subscription registry and endpoint validation
Store per-tenant webhook subscriptions: URL, subscribed event types, signing secret version, disabled flag, and optional IP allowlists. Validate URLs at registration time — not on first delivery.
Registration checks
- HTTPS only — reject http:// except in explicit dev sandboxes with a tenant flag.
- No private IPs — block SSRF to 169.254.169.254, 10.0.0.0/8, and metadata endpoints unless you operate a VPC connector product.
- Challenge handshake — POST endpoint.verify with a nonce; customer must echo it in response body or header before subscription activates.
- Secret rotation — support two active signing secrets during rotation; document the header that identifies secret_version.
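The HTTPS and private-IP checks above can be sketched as a small registration-time validator. This is a minimal illustration, not a complete SSRF defense: the function name, the reason codes, and the blocklist contents are assumptions, and it only inspects the URL as written. A hostname that resolves to a private address would pass, so a real system would also need to resolve the hostname and re-check the resulting IP at delivery time.

```javascript
const net = require('node:net');

// Hypothetical blocklist of hostnames that point at local or metadata services.
const BLOCKED_HOSTS = new Set(['metadata.google.internal', 'localhost']);

// True for IPv4 literals in unspecified, loopback, RFC 1918, or link-local ranges.
function isPrivateIPv4(ip) {
  const [a, b] = ip.split('.').map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // link-local, includes 169.254.169.254
  );
}

// Returns { ok: true } or { ok: false, reason } for a candidate webhook URL.
// allowHttp models the "dev sandbox with tenant flag" exception.
function validateWebhookUrl(rawUrl, { allowHttp = false } = {}) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, reason: 'invalid_url' };
  }
  if (url.protocol !== 'https:' && !(allowHttp && url.protocol === 'http:')) {
    return { ok: false, reason: 'https_required' };
  }
  const host = url.hostname.toLowerCase();
  if (BLOCKED_HOSTS.has(host)) return { ok: false, reason: 'blocked_host' };
  if (net.isIP(host) === 4 && isPrivateIPv4(host)) {
    return { ok: false, reason: 'private_ip' };
  }
  // IPv6 literals arrive bracketed (e.g. "[::1]"); this sketch rejects them all.
  if (host.startsWith('[')) return { ok: false, reason: 'ipv6_literal' };
  return { ok: true };
}
```

Returning a reason code rather than throwing lets the subscription API surface a specific error to the tenant at registration time instead of on first delivery.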
Bind subscriptions to the same tenant namespace used in multi-tenant isolation so a run for tenant A never delivers to tenant B’s URL even if misconfigured routing slips through.
Signed event envelope
Every outbound POST uses a versioned JSON envelope customers can verify before parsing business payloads.
{
  "id": "evt_8Kp2mQx9vL",
  "type": "run.succeeded",
  "created_at": "2026-06-12T14:22:01.384Z",
  "tenant_id": "ten_harbor_acme",
  "data": {
    "run_id": "run_3f9a…",
    "status": "succeeded",
    "output": { "label_url": "https://…", "carrier": "UPS" },
    "duration_ms": 8420
  }
}
Sign the raw request body with HMAC-SHA256 using the subscription secret. Standard headers:
- X-Webhook-Id: evt_8Kp2mQx9vL (same as id in body)
- X-Webhook-Timestamp: 1781274121 (Unix seconds)
- X-Webhook-Signature: v1=base64(hmac)
Document that receivers must reject timestamps older than five minutes
to block replay attacks. The signed string is typically
timestamp + "." + raw_body. Never sign parsed-then-reserialized
JSON — whitespace changes break verification.
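As a rough illustration of the signing step, the sketch below computes the signature over timestamp + "." + raw_body with Node's built-in crypto module and returns the three headers. The function name signWebhook, the Content-Type header, and the example secret are assumptions for illustration only.

```javascript
const crypto = require('node:crypto');

// Builds the delivery headers for one event. `rawBody` must be the exact
// string that goes on the wire; `timestamp` is Unix seconds.
function signWebhook({ eventId, rawBody, secret, timestamp }) {
  const signedString = `${timestamp}.${rawBody}`;
  const mac = crypto
    .createHmac('sha256', secret)
    .update(signedString, 'utf8')
    .digest('base64');
  return {
    'Content-Type': 'application/json',
    'X-Webhook-Id': eventId,
    'X-Webhook-Timestamp': String(timestamp),
    'X-Webhook-Signature': `v1=${mac}`,
  };
}

// Serialize once, then sign and send that same string.
const rawBody = JSON.stringify({ id: 'evt_8Kp2mQx9vL', type: 'run.succeeded' });
const headers = signWebhook({
  eventId: 'evt_8Kp2mQx9vL',
  rawBody,
  secret: 'whsec_example', // hypothetical secret value
  timestamp: Math.floor(Date.now() / 1000),
});
```

Storing the serialized body alongside the delivery row, rather than re-serializing on each attempt, is one way to keep the signed bytes identical across retries.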
Delivery queue architecture
When a run reaches a terminal or notify-worthy state, emit an outbox record in the same transaction as the run status update. A dispatcher worker reads the outbox and enqueues delivery jobs per matching subscription.
Pipeline stages
- Outbox insert — (event_id, run_id, event_type, payload_json) committed with run state.
- Fan-out — for each active subscription on event_type, create a delivery row (delivery_id, event_id, endpoint_url, attempt=0).
- HTTP dispatch — worker POSTs with timeouts (connect 3s, read 30s), follows redirects only if policy allows (many teams disable redirects entirely).
- Outcome classification — 2xx = success; 408/429/5xx = retryable; 4xx (except 408 and 429) = terminal unless your misconfiguration playbook says otherwise.
- DLQ / disable — after max attempts, move to DLQ and optionally auto-disable endpoint after repeated 410/404.
Separate delivery queues from inbound trigger queues so a customer endpoint outage does not block new agent runs from starting.
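The fan-out and outcome-classification stages can be sketched as two pure functions. The subscription and delivery field names, and the delivery id scheme, are hypothetical. The sketch also assumes redirects are disabled, so any 3xx is treated as terminal.

```javascript
// Fan-out: one delivery row per active subscription that wants this event type.
function fanOut(event, subscriptions) {
  return subscriptions
    .filter(
      (s) =>
        !s.disabled &&
        s.tenantId === event.tenant_id && // never cross tenant boundaries
        s.eventTypes.includes(event.type)
    )
    .map((s) => ({
      deliveryId: `dlv_${event.id}_${s.id}`, // hypothetical id scheme
      eventId: event.id,
      endpointUrl: s.url,
      attempt: 0,
    }));
}

// Outcome classification: map an HTTP status (or null for a network error
// or timeout) to 'success', 'retry', or 'terminal'.
function classifyOutcome(status) {
  if (status === null) return 'retry'; // ambiguous result: assume not delivered
  if (status >= 200 && status < 300) return 'success';
  if (status === 408 || status === 429 || status >= 500) return 'retry';
  return 'terminal'; // everything else, e.g. 301, 400, 401, 404, 410
}
```

Treating a timeout as retryable matches the at-least-once stance: the receiver may have processed the event, which is why it must dedupe on the event id.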
Retry, backoff and idempotency
Outbound delivery is at-least-once. Customers must deduplicate on
event_id (or X-Webhook-Id). Your side must
not skip retries because “we probably delivered it” on
ambiguous timeouts.
Retry policy (typical production)
- Attempts — 8–12 over 24–72 hours for completion events; fewer for high-volume progress ticks.
- Backoff — exponential with jitter per retry guidance; honor Retry-After when present.
- Same payload — retries must send identical body and event_id; never mint a new id per attempt.
- Success definition — only HTTP 2xx from customer endpoint; 204 is fine; document that empty bodies are OK.
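One way to compute the wait before each retry is sketched below. It uses the "full jitter" variant of exponential backoff (a uniform random delay between zero and an exponentially growing ceiling) and treats Retry-After as a lower bound. The 30-second base and 6-hour cap are assumed values, not figures from this guide, and would need tuning to fit a given attempt budget.

```javascript
// Delay in ms before the next attempt. `attempt` is 1 for the first retry.
// `random` is injectable so the schedule can be tested deterministically.
function nextDelayMs(
  attempt,
  { retryAfterSeconds = null, baseMs = 30000, capMs = 6 * 60 * 60 * 1000, random = Math.random } = {}
) {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  const jittered = Math.floor(random() * ceiling);
  // Retry-After is a floor: never retry sooner than the receiver asked.
  const floor = retryAfterSeconds === null ? 0 : retryAfterSeconds * 1000;
  return Math.max(jittered, floor);
}

// Retry-After may be delta-seconds ("120") or an HTTP-date.
// Returns whole seconds to wait, or null if the header is absent or unparseable.
function parseRetryAfterSeconds(value, nowMs = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}
```

Jitter spreads retries from many deliveries to the same recovering endpoint, so they do not all arrive in the same instant.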
Store first_attempt_at, last_attempt_at,
last_status_code, and response body snippet (truncated)
for support. Expose a GET /events/{id}/deliveries debug
API for enterprise tenants.
Receiver contract you document for customers
Ship a one-page integration guide: verify signature, dedupe on
event_id, return 2xx quickly, process async if needed.
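// Receiver pseudocode (verifyHmac, db, and queue stand in for your own helpers)
const raw = await request.text(); // exact bytes as received, before parsing
const ts = request.headers.get('X-Webhook-Timestamp');
const sig = request.headers.get('X-Webhook-Signature');
// verifyHmac checks the HMAC over ts + "." + raw and rejects stale timestamps
if (!verifyHmac(raw, ts, sig, secret)) return new Response(null, { status: 401 });
const evt = JSON.parse(raw);
// Hypothetical insert-if-absent helper: one atomic step, so two concurrent
// duplicates cannot both pass the check
if (!(await db.markSeenIfNew(evt.id))) return new Response(null, { status: 200 });
queue.process(evt); // heavy work off hot path
return new Response(null, { status: 200 });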
Warn customers that returning 200 before durable processing is fine only if they accept rare loss on a crash; otherwise they should persist the event first and then return 200. Once they return 2xx your retries stop, so recovery is on their side, and their handler must still be idempotent for the business action (create shipment, charge card) because duplicates can arrive.
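The verifyHmac helper in the receiver pseudocode is left undefined. A minimal sketch of what customers might implement follows, assuming the v1=base64(hmac) signature format and the timestamp + "." + raw_body signed string described earlier. The function name verifyWebhook and its parameter names are illustrative. It accepts a list of secrets so that either of two active secrets verifies during rotation.

```javascript
const crypto = require('node:crypto');

// Returns true only if the timestamp is fresh and the signature matches
// one of the currently active secrets.
function verifyWebhook({
  rawBody,
  timestamp, // X-Webhook-Timestamp header value (string)
  signature, // X-Webhook-Signature header value, "v1=<base64>"
  secrets, // array of active signing secrets
  nowSeconds = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300, // five-minute window
}) {
  const ts = Number(timestamp);
  if (!Number.isInteger(ts)) return false;
  if (Math.abs(nowSeconds - ts) > toleranceSeconds) return false; // stale or future-dated
  if (typeof signature !== 'string' || !signature.startsWith('v1=')) return false;

  const provided = Buffer.from(signature.slice(3), 'base64');
  return secrets.some((secret) => {
    const expected = crypto
      .createHmac('sha256', secret)
      .update(`${timestamp}.${rawBody}`, 'utf8')
      .digest();
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return provided.length === expected.length && crypto.timingSafeEqual(provided, expected);
  });
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.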
Harbor Fulfillment refactor walkthrough
Harbor replaced poll-only with outbound delivery:
- Subscription API — CRUD on endpoints with SSRF checks and verify handshake; secrets in vault per tenant.
- Outbox on run FSM — terminal transitions insert run.succeeded/run.failed rows in Postgres outbox; dispatcher polls or uses LISTEN/NOTIFY.
- Delivery worker pool — isolated from run executors; per-endpoint concurrency cap of 2 to avoid stampeding weak customer servers.
- Metrics and alerting — delivery_success_rate, delivery_latency_p95, endpoints auto-paused after 50 consecutive failures.
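The auto-pause rule can be illustrated with a small consecutive-failure counter. This is an in-memory sketch with a hypothetical class name. Harbor's actual implementation is not described here, and a production version would presumably persist the counts alongside the subscription record.

```javascript
// Tracks consecutive delivery failures per endpoint and pauses at a threshold.
class EndpointHealth {
  constructor(pauseThreshold = 50) {
    this.pauseThreshold = pauseThreshold;
    this.failures = new Map(); // endpointId -> consecutive failure count
    this.paused = new Set();
  }

  // Call after every delivery attempt; returns true if the endpoint is paused.
  record(endpointId, succeeded) {
    if (succeeded) {
      this.failures.set(endpointId, 0); // any success resets the streak
    } else {
      const count = (this.failures.get(endpointId) || 0) + 1;
      this.failures.set(endpointId, count);
      if (count >= this.pauseThreshold) this.paused.add(endpointId);
    }
    return this.paused.has(endpointId);
  }

  // Operator or tenant action once the endpoint is fixed.
  resume(endpointId) {
    this.paused.delete(endpointId);
    this.failures.set(endpointId, 0);
  }
}
```

Counting consecutive rather than total failures means an endpoint with occasional errors keeps receiving events, while one that is fully down stops consuming worker capacity.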
Missed-callback SLA breaches fell from 37% to 4.1% (the remainder came from customers returning 200 before their own queue crashed). Support tickets for “agent finished but WMS empty” dropped from 112/week to 14/week. p95 time-to-customer-awareness improved from 18 minutes (the poll interval) to 4.2 seconds after run completion.
Technique decision table
| Approach | Strengths | Weaknesses | Best for |
|---|---|---|---|
| Client polling only | Simple for demos | Missed events, stale state, load on status API | Internal tools, prototypes |
| Outbound webhooks (recommended) | Push semantics, integrates with customer automation | Requires signing, retries, DLQ, docs | B2B async agents, fulfillment, billing |
| SSE / WebSocket from your API | Low latency to browser dashboards | Not a substitute for server-to-server; connection churn | Live run UI, ops consoles |
| Message bus bridge (SNS, Pub/Sub) | Enterprise buyers with existing buses | Extra product surface and IAM | Large enterprise tier |
| Email on completion | Universal fallback | Not machine-actionable; deliverability issues | Human notification alongside webhooks |
Common pitfalls
- New event_id per retry — customers double-ship orders; keep id stable.
- Signing reserialized JSON — verification fails randomly; sign raw bytes.
- No timestamp tolerance — replay window stays open forever or legit events get rejected.
- Blocking run workers on delivery — slow customer URLs stall the whole fleet; async queue.
- Treating 200 as delivery without dedup docs — customers must handle at-least-once.
- SSRF at registration — attacker registers metadata.google.internal.
- Shared delivery queue with inbound — outbound backlog blocks new runs.
- Progress webhook flood — one event per tool call DDoSes customer; batch or sample.
Production checklist
- Subscription CRUD with HTTPS-only, SSRF blocklist, and verify handshake.
- Versioned signing secrets with documented rotation path.
- Outbox pattern: emit events in same transaction as run terminal state.
- Fan-out delivery rows per subscription; isolate delivery worker pool.
- Stable event_id across all retry attempts.
- Exponential backoff with jitter; honor Retry-After.
- Classify 4xx vs 5xx; DLQ and optional endpoint auto-disable.
- Customer docs: verify signature, dedupe on id, return 2xx quickly.
- Debug API or dashboard: delivery attempts per event.
- Metrics: success rate, latency p95, DLQ depth, disabled endpoints.
Key takeaways
- Async agents need push callbacks — polling is not a production integration contract.
- Sign raw bodies with timestamp bounds — receivers trust events before acting.
- Deliver at-least-once with stable event IDs — customer dedup is mandatory.
- Isolate delivery from run execution — customer outages must not stall agents.
- Harbor Fulfillment cut missed callbacks from 37% to 4.1% with outbox, signed webhooks, and retry discipline.
Related reading
- LLM agent inbound webhook and async job queue explained — signature verification and durable triggers into agents
- LLM agent retry and backoff explained — backoff with jitter for delivery workers
- LLM agent idempotency and deduplication explained — event_id semantics mirrored on the receiver
- LLM agent dead letter queue explained — triage for permanently failing endpoints