Guide

LLM agent outbound webhook and callback delivery systems explained

Harbor Fulfillment sold an async order-processing agent: customers POST an order payload, receive a run_id, and expect a callback when the agent finishes routing, inventory checks, and carrier label generation. The agent runtime wrote results to an internal database and assumed customers would poll GET /runs/{id}. During a traffic spike, polling clients backed off. Warehouse systems never received completion events. Orders sat in agent_pending for hours. Missed callbacks accounted for 37% of SLA breaches before anyone built outbound webhook delivery as first-class infrastructure.

Outbound webhook and callback delivery is how async agents push completion, failure, and progress events to customer-owned HTTPS endpoints, with signing, retries, and stable event IDs so receivers can trust each event and deduplicate it on their side. This is the mirror of inbound webhook triggers (vendor → agent) and distinct from streaming token delivery (agent → browser). This guide covers subscription registries, signed event envelopes, delivery queue topology, retry and backoff policy paired with idempotent event IDs, the Harbor Fulfillment refactor, a technique decision table, pitfalls, and a production checklist tied to dead-letter handling.

Why agents need outbound delivery, not “poll when ready”

Polling works for demos. Production integrations fail when customers forget to poll, poll too slowly, or poll the wrong shard after a deploy. Agent runs can take seconds to minutes; warehouse and billing systems need push notification semantics.

Outbound callbacks also decouple your agent fleet from customer availability. You enqueue delivery attempts, retry on transient failures, and surface permanent failures to ops — instead of losing the outcome because nobody polled within a TTL window.

  • Completion events — run.succeeded, run.failed with structured output and trace links.
  • Progress events (optional) — run.step_completed for long workflows; throttle to avoid webhook storms.
  • Human-in-the-loop events — run.approval_required when escalation gates pause the run.
  • Delivery guarantees — at-least-once delivery with signed event_id for receiver-side dedup; never claim exactly-once across the public internet.

Subscription registry and endpoint validation

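Store per-tenant webhook subscriptions: URL, subscribed event types, signing secret version, disabled flag, and optional IP allowlists. Validate URLs at registration time rather than discovering a bad endpoint on first delivery, and re-check the resolved address at dispatch time because DNS records can change after registration.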

Registration checks

  • HTTPS only — reject http:// except in explicit dev sandboxes with tenant flag.
  • No private IPs — block SSRF to loopback (127.0.0.0/8), the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and link-local addresses including the 169.254.169.254 metadata endpoint, unless you operate a VPC connector product.
  • Challenge handshake — POST an endpoint.verify event carrying a nonce; the customer must echo it in the response body or a header before the subscription activates.
  • Secret rotation — support two active signing secrets during rotation; document the header that identifies secret_version.
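
The HTTPS and private-address checks above could be sketched roughly as follows. This is a minimal illustration, not a complete SSRF defense: the function names, the reason codes, and the blocklist are assumptions made for the example, the caller is assumed to have already resolved the hostname to IP addresses, and IPv6 is rejected outright rather than modelled.

```javascript
const net = require("node:net");

// Illustrative IPv4 blocklist (assumption: extend it to match your own network policy).
const BLOCKED_V4 = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["100.64.0.0", 10], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.168.0.0", 16],
];

// Convert a dotted-quad IPv4 string into an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) >>> 0) + Number(octet), 0);
}

// True when the address falls inside any blocked CIDR range.
function isBlockedIPv4(ip) {
  const value = ipv4ToInt(ip);
  return BLOCKED_V4.some(([base, bits]) => {
    const mask = (0xffffffff << (32 - bits)) >>> 0;
    return ((value & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
  });
}

// Hypothetical registration check. resolvedAddresses is the DNS result for the URL's host.
function validateWebhookUrl(rawUrl, resolvedAddresses, { allowHttp = false } = {}) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, reason: "invalid_url" };
  }
  const httpAllowed = allowHttp && url.protocol === "http:"; // dev sandbox flag only
  if (url.protocol !== "https:" && !httpAllowed) return { ok: false, reason: "https_required" };
  if (resolvedAddresses.length === 0) return { ok: false, reason: "no_addresses" };
  for (const address of resolvedAddresses) {
    // Assumption: this sketch refuses anything that is not plain IPv4.
    if (!net.isIPv4(address)) return { ok: false, reason: "unsupported_address" };
    if (isBlockedIPv4(address)) return { ok: false, reason: "private_address" };
  }
  return { ok: true };
}
```

Running the same check again against the addresses resolved at dispatch time keeps a hostname from passing registration and later being repointed at an internal range.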

Bind subscriptions to the same tenant namespace used in multi-tenant isolation so a run for tenant A never delivers to tenant B’s URL even if misconfigured routing slips through.

Signed event envelope

Every outbound POST uses a versioned JSON envelope customers can verify before parsing business payloads.

{
  "id": "evt_8Kp2mQx9vL",
  "type": "run.succeeded",
  "created_at": "2026-06-12T14:22:01.384Z",
  "tenant_id": "ten_harbor_acme",
  "data": {
    "run_id": "run_3f9a…",
    "status": "succeeded",
    "output": { "label_url": "https://…", "carrier": "UPS" },
    "duration_ms": 8420
  }
}

Sign the raw request body with HMAC-SHA256 using the subscription secret. Standard headers:

  • X-Webhook-Id: evt_8Kp2mQx9vL (same as id in body)
  • X-Webhook-Timestamp: 1781274121 (Unix seconds; here matching created_at in the envelope above)
  • X-Webhook-Signature: v1=base64(hmac)

Document that receivers must reject timestamps more than five minutes old to block replay attacks. The signed string is typically timestamp + "." + raw_body, so the timestamp itself is covered by the signature. Never sign parsed-then-reserialized JSON, because whitespace and key-order changes break verification.
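
As a rough sketch of the signing scheme described above, the following uses Node's built-in crypto module. The function names and the secrets array are assumptions for illustration; the array lets a receiver accept either of two active secrets during rotation. The sketch also rejects timestamps too far in the future, which goes slightly beyond the five-minute rule in the text.

```javascript
const crypto = require("node:crypto");

const TOLERANCE_SECONDS = 300; // five minutes

// Sender side: HMAC-SHA256 over `timestamp.rawBody`, base64-encoded, with a v1= prefix.
function signPayload(secret, timestamp, rawBody) {
  const mac = crypto
    .createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("base64");
  return `v1=${mac}`;
}

// Receiver side: check timestamp freshness, then compare signatures in constant time.
function verifySignature({ secrets, timestamp, rawBody, signature, nowSeconds }) {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return false;
  const received = Buffer.from(String(signature));
  return secrets.some((secret) => {
    const expected = Buffer.from(signPayload(secret, timestamp, rawBody));
    // timingSafeEqual throws on unequal lengths, so compare lengths first.
    return expected.length === received.length && crypto.timingSafeEqual(expected, received);
  });
}
```

The receiver passes the raw request body exactly as received. Parsing it first and serializing it again would change the bytes and fail the comparison.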

Delivery queue architecture

When a run reaches a terminal or notify-worthy state, emit an outbox record in the same transaction as the run status update. A dispatcher worker reads the outbox and enqueues delivery jobs per matching subscription.
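
To make the dispatcher's job concrete, here is a small in-memory sketch of turning one outbox event into delivery rows. The subscription fields (tenantId, url, eventTypes, disabled) and the dlv_ prefix are assumptions for illustration, and the database transaction that writes the outbox row is not shown.

```javascript
const crypto = require("node:crypto");

// Build one delivery row per active subscription that matches the event.
// The tenant check keeps an event for tenant A away from tenant B's endpoints.
function fanOut(event, subscriptions) {
  return subscriptions
    .filter((sub) =>
      sub.tenantId === event.tenantId &&
      !sub.disabled &&
      sub.eventTypes.includes(event.type))
    .map((sub) => ({
      deliveryId: `dlv_${crypto.randomUUID()}`, // hypothetical id format
      eventId: event.id,                        // stable across every retry
      endpointUrl: sub.url,
      attempt: 0,
    }));
}
```

Each returned row would then be enqueued on the delivery queue, so a slow endpoint only delays its own rows.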

Pipeline stages

  1. Outbox insert — (event_id, run_id, event_type, payload_json) committed with run state.
  2. Fan-out — for each active subscription on event_type, create a delivery row (delivery_id, event_id, endpoint_url, attempt=0).
  3. HTTP dispatch — worker POSTs with timeouts (connect 3s, read 30s), follows redirects only if policy allows (many teams disable redirects entirely).
  4. Outcome classification — 2xx = success; 408/429/5xx, timeouts, and connection errors = retryable; other 4xx = terminal unless misconfiguration playbook says otherwise.
  5. DLQ / disable — after max attempts, move to DLQ and optionally auto-disable endpoint after repeated 410/404.
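
The classification and DLQ stages above might look like the following sketch. The function names and return labels are hypothetical, a null status stands for a timeout or connection error, and sending terminal failures straight to the DLQ is an assumption rather than something the pipeline above prescribes.

```javascript
const RETRYABLE_4XX = new Set([408, 429]);

// Map an HTTP status (or null when no response arrived) to a delivery outcome.
function classifyOutcome(statusCode) {
  if (statusCode === null) return "retryable"; // ambiguous: retry, never assume delivery
  if (statusCode >= 200 && statusCode < 300) return "success";
  if (RETRYABLE_4XX.has(statusCode) || statusCode >= 500) return "retryable";
  return "terminal"; // remaining 4xx, plus 3xx when redirects are disabled
}

// Decide what the worker does after an attempt.
// `attempt` is the zero-based index of the attempt that just finished.
function nextAction(statusCode, attempt, maxAttempts) {
  const outcome = classifyOutcome(statusCode);
  if (outcome === "success") return "done";
  if (outcome === "terminal") return "dlq"; // assumption: park terminal failures for review
  return attempt + 1 >= maxAttempts ? "dlq" : "retry";
}
```

For example, with maxAttempts set to 8, a 503 on attempt index 0 yields "retry", while the same status on attempt index 7 yields "dlq".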

Separate delivery queues from inbound trigger queues so a customer endpoint outage does not block new agent runs from starting.

Retry, backoff and idempotency

Outbound delivery is at-least-once. Customers must deduplicate on event_id (or X-Webhook-Id). Your side must not skip retries because “we probably delivered it” on ambiguous timeouts.

Retry policy (typical production)

  • Attempts — 8–12 over 24–72 hours for completion events; fewer for high-volume progress ticks.
  • Backoff — exponential with jitter; honor Retry-After when present (it may be a delay in seconds or an HTTP date).
  • Same payload — retries must send identical body and event_id; never mint a new id per attempt.
  • Success definition — only HTTP 2xx from customer endpoint; 204 is fine; document that empty bodies are OK.
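
One way to compute the delay before the next attempt is sketched below. It uses the "full jitter" variant of exponential backoff, and the base delay, cap, and function names are illustrative assumptions rather than values from the policy above. The random source is injectable so the behavior can be tested deterministically.

```javascript
// Parse Retry-After, which is either delay-seconds or an HTTP-date.
// Returns milliseconds to wait, or null when the header is absent or unparseable.
function parseRetryAfterMs(headerValue, nowMs) {
  if (headerValue == null || headerValue === "") return null;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const dateMs = Date.parse(headerValue);
  return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - nowMs);
}

// Delay before retrying, where `attempt` is the zero-based index of the failed attempt.
function computeDelayMs(attempt, options = {}) {
  const {
    baseMs = 30000,              // assumed first-retry ceiling: 30 seconds
    capMs = 6 * 60 * 60 * 1000,  // assumed maximum delay: 6 hours
    retryAfter = null,
    nowMs = Date.now(),
    random = Math.random,
  } = options;
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  const jittered = Math.floor(random() * ceiling); // uniform in [0, ceiling)
  const hinted = parseRetryAfterMs(retryAfter, nowMs);
  // Never retry sooner than the receiver asked, but stay within the cap.
  return hinted === null ? jittered : Math.min(capMs, Math.max(jittered, hinted));
}
```

Jitter spreads retries from many deliveries across the window, so a recovering customer endpoint is not hit by every queued attempt at once.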

Store first_attempt_at, last_attempt_at, last_status_code, and response body snippet (truncated) for support. Expose a GET /events/{id}/deliveries debug API for enterprise tenants.

Receiver contract you document for customers

Ship a one-page integration guide: verify signature, dedupe on event_id, return 2xx quickly, process async if needed.

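// Receiver sketch: verifyHmac, db, and queue stand in for the customer's own helpers
async function handleWebhook(request, secret) {
  const raw = await request.text();  // keep the raw body; never re-serialize before verifying
  const ts = request.headers.get('X-Webhook-Timestamp');
  const sig = request.headers.get('X-Webhook-Signature');
  // Reject stale, missing, or malformed timestamps (five-minute window)
  if (!(Math.abs(Date.now() / 1000 - Number(ts)) <= 300)) return new Response(null, { status: 400 });
  if (!verifyHmac(`${ts}.${raw}`, sig, secret)) return new Response(null, { status: 401 });
  const evt = JSON.parse(raw);
  // markSeen must be an atomic insert-if-absent so concurrent retries cannot both pass
  if (!(await db.markSeen(evt.id))) return new Response(null, { status: 200 });  // duplicate: acknowledge only
  await queue.process(evt);  // heavy work off hot path
  return new Response(null, { status: 200 });
}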

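Warn customers that returning 2xx before durable processing is safe only if they accept rare loss on a crash; otherwise they should persist the event first and then return 2xx. Once they acknowledge, your retries stop, so recovery from that point is on their side. Their handling must also be idempotent for the business action (create shipment, charge card), because duplicates will still arrive under at-least-once delivery.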

Harbor Fulfillment refactor walkthrough

Harbor replaced poll-only with outbound delivery:

  1. Subscription API — CRUD on endpoints with SSRF checks and verify handshake; secrets in vault per tenant.
  2. Outbox on run FSM — terminal transitions insert run.succeeded / run.failed rows in Postgres outbox; dispatcher polls or uses LISTEN/NOTIFY.
  3. Delivery worker pool — isolated from run executors; per-endpoint concurrency cap of 2 to avoid stampeding weak customer servers.
  4. Metrics and alerting — delivery_success_rate, delivery_latency_p95, endpoints auto-paused after 50 consecutive failures.
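
The auto-pause rule in step 4 could be tracked with something like the sketch below. The class name, method names, and in-memory storage are assumptions for illustration; a real deployment would presumably keep these counters in shared storage so every delivery worker sees the same state.

```javascript
// Tracks consecutive delivery failures per endpoint and pauses after a threshold.
class EndpointHealth {
  constructor({ pauseAfter = 50 } = {}) {
    this.pauseAfter = pauseAfter;
    this.consecutiveFailures = new Map(); // endpointUrl -> current failure streak
    this.paused = new Set();
  }

  // Record the result of one delivery attempt.
  record(endpointUrl, succeeded) {
    if (succeeded) {
      this.consecutiveFailures.set(endpointUrl, 0); // any success resets the streak
      return;
    }
    const failures = (this.consecutiveFailures.get(endpointUrl) ?? 0) + 1;
    this.consecutiveFailures.set(endpointUrl, failures);
    if (failures >= this.pauseAfter) this.paused.add(endpointUrl);
  }

  isPaused(endpointUrl) {
    return this.paused.has(endpointUrl);
  }

  // Called when ops or the customer re-enables the endpoint.
  resume(endpointUrl) {
    this.paused.delete(endpointUrl);
    this.consecutiveFailures.set(endpointUrl, 0);
  }
}
```

A delivery worker would check isPaused before dispatching and leave paused deliveries queued until the endpoint is resumed.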

The share of SLA breaches caused by missed callbacks fell from 37% to 4.1% (remainder: customers returning 200 before their own queue crashed). Support tickets for “agent finished but WMS empty” dropped from 112 per week to 14 per week. p95 time-to-customer-awareness improved from 18 minutes (poll interval) to 4.2 seconds after run completion.

Technique decision table

| Approach | Strengths | Weaknesses | Best for |
| --- | --- | --- | --- |
| Client polling only | Simple for demos | Missed events, stale state, load on status API | Internal tools, prototypes |
| Outbound webhooks (recommended) | Push semantics, integrates with customer automation | Requires signing, retries, DLQ, docs | B2B async agents, fulfillment, billing |
| SSE / WebSocket from your API | Low latency to browser dashboards | Not a substitute for server-to-server; connection churn | Live run UI, ops consoles |
| Message bus bridge (SNS, Pub/Sub) | Enterprise buyers with existing buses | Extra product surface and IAM | Large enterprise tier |
| Email on completion | Universal fallback | Not machine-actionable; deliverability issues | Human notification alongside webhooks |

Common pitfalls

  • New event_id per retry — customers double-ship orders; keep id stable.
  • Signing reserialized JSON — verification fails randomly; sign raw bytes.
  • No timestamp tolerance — replay window stays open forever or legit events get rejected.
  • Blocking run workers on delivery — slow customer URLs stall the whole fleet; async queue.
  • Shipping without dedup docs — a 2xx only stops your retries; customers must still be told to handle at-least-once duplicates.
  • SSRF at registration — attacker registers metadata.google.internal.
  • Shared delivery queue with inbound — outbound backlog blocks new runs.
  • Progress webhook flood — one event per tool call DDoSes customer; batch or sample.

Production checklist

  • Subscription CRUD with HTTPS-only, SSRF blocklist, and verify handshake.
  • Versioned signing secrets with documented rotation path.
  • Outbox pattern: emit events in same transaction as run terminal state.
  • Fan-out delivery rows per subscription; isolate delivery worker pool.
  • Stable event_id across all retry attempts.
  • Exponential backoff with jitter; honor Retry-After.
  • Classify 4xx vs 5xx; DLQ and optional endpoint auto-disable.
  • Customer docs: verify signature, dedupe on id, return 2xx quickly.
  • Debug API or dashboard: delivery attempts per event.
  • Metrics: success rate, latency p95, DLQ depth, disabled endpoints.

Key takeaways

  • Async agents need push callbacks — polling is not a production integration contract.
  • Sign raw bodies with timestamp bounds — receivers trust events before acting.
  • Deliver at-least-once with stable event IDs — customer dedup is mandatory.
  • Isolate delivery from run execution — customer outages must not stall agents.
  • Harbor Fulfillment cut the missed-callback share of SLA breaches from 37% to 4.1% with outbox, signed webhooks, and retry discipline.
