
LLM agent webhook and async job queue systems explained

Harbor Integrations sold “event-driven support agents”: when a Zendesk ticket was updated, a webhook fired and an LLM agent drafted a reply, checked refund policy, and sometimes called Stripe. The first version handled webhooks synchronously inside the HTTP request, with no queue and no idempotency store. During a Zendesk outage recovery, the vendor replayed three days of ticket events. Harbor's handlers ran each event twice: in a four-hour window, agents sent 1,400 duplicate customer emails and 41% of side-effect tool calls were duplicates, until an engineer killed the deployment. After moving to signed ingress, deduplicated job queues, and durable run records, duplicate side effects fell to 3% (almost entirely third-party retries outside the dedup window).

Webhook ingress is how external systems trigger agent work. Async job queues decouple “we received an event” from “we executed a multi-minute agent run with tools.” Together they define reliability, fairness, and tenant safety for event-driven agents. This guide covers verification, idempotency, queue topology, worker tenant binding, scheduling, integration with rate limits and cancellation, the Harbor refactor, a technique decision table, pitfalls, and a production checklist.

Why synchronous webhook handlers fail for agents

A CRUD webhook handler can often finish in milliseconds: validate signature, upsert a row, return 200. Agent runs are different:

  • Duration — tool loops routinely take 30 seconds to several minutes, while many SaaS webhook senders time out after 10–30 seconds.
  • Retries — providers retry on slow or ambiguous responses; without dedup, retries become duplicate runs.
  • Bursts — Monday-morning ticket floods or GitHub push storms can exceed orchestrator concurrency.
  • Partial failure — the agent may send email (irreversible) then crash before responding 200; the provider retries and sends again.
  • Human-in-the-loop — some events should enqueue a draft, not auto-execute writes.

The correct pattern is ack fast, work async: verify the webhook, persist an idempotent job, return 2xx immediately, process on workers with full tracing and checkpointing.

Webhook ingress: verify, normalize, enqueue

Signature verification

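Every ingress route must validate provider signatures (HMAC-SHA256, Ed25519, or vendor-specific schemes) using a per-tenant secret from your secret broker. Cap the body size and verify the signature over the raw bytes, rejecting invalid requests before parsing large bodies. Rotate secrets with dual-key acceptance windows, during which both the old and the new secret verify.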
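A minimal sketch of the verification step, assuming a hex-encoded HMAC-SHA256 computed over the raw request body. Real providers differ in header name, encoding (hex or base64), and what is signed (some include a timestamp), so check each vendor's documentation. `verify_signature` and `MAX_BODY_BYTES` are hypothetical names; `secrets` holds the current key plus, during a rotation window, the previous one.

```python
import hashlib
import hmac

MAX_BODY_BYTES = 1_000_000  # hypothetical cap; tune per provider


def verify_signature(raw_body: bytes, signature_hex: str, secrets: list[bytes]) -> bool:
    """Return True if signature_hex is a valid HMAC-SHA256 of raw_body under any active secret."""
    if len(raw_body) > MAX_BODY_BYTES:
        return False  # reject oversized bodies before any JSON parsing
    for secret in secrets:  # current secret first, previous one during rotation
        expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
        # compare_digest is designed to resist timing attacks on the comparison
        if hmac.compare_digest(expected.encode(), signature_hex.encode()):
            return True
    return False
```

Because verification runs on the raw bytes, the handler can reject a forged or oversized request without ever deserializing it.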

Event normalization

Map vendor payloads to an internal envelope:

{
  "event_id": "evt_zd_88421",
  "source": "zendesk",
  "tenant_id": "tnt_8f2a…",
  "event_type": "ticket.updated",
  "occurred_at": "2026-06-12T09:14:22Z",
  "payload_hash": "sha256:…",
  "dedup_key": "zendesk:ticket:99102:rev:44"
}

Store raw payload bytes (encrypted) for replay debugging, but agents should consume normalized fields so tool code does not depend on Zendesk JSON shape.
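The sketch below shows one way to build that envelope, assuming a hypothetical payload layout (`event_id`, `type`, `occurred_at`, and a `ticket` object with `id` and `revision`). The real Zendesk webhook body depends on how the webhook is configured, so treat these field names as placeholders. Note that `tenant_id` is passed in from the authenticated route rather than read from the body.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EventEnvelope:
    event_id: str
    source: str
    tenant_id: str
    event_type: str
    occurred_at: str
    payload_hash: str
    dedup_key: str


def normalize_zendesk(tenant_id: str, raw_body: bytes) -> EventEnvelope:
    """Map a (hypothetical) Zendesk ticket-update payload to the internal envelope."""
    payload = json.loads(raw_body)
    ticket = payload["ticket"]  # hypothetical field layout
    return EventEnvelope(
        event_id=payload["event_id"],
        source="zendesk",
        tenant_id=tenant_id,  # from the webhook routing table, never from the body
        event_type=payload["type"],
        occurred_at=payload["occurred_at"],
        payload_hash="sha256:" + hashlib.sha256(raw_body).hexdigest(),
        dedup_key=f"zendesk:ticket:{ticket['id']}:rev:{ticket['revision']}",
    )
```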

Fast ACK contract

HTTP handler responsibilities only: auth, dedup check, insert job row, publish to queue, return 200/202. Never call the model inside the request thread. If enqueue fails, return 5xx so the provider retries; the idempotency row ensures that the retry cannot enqueue the same event twice.
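Here is a minimal sketch of the fast-ACK path, using SQLite as a stand-in for the job database. It assumes a hypothetical `jobs` table with a unique `(tenant_id, dedup_key)` constraint and an `outbox` table written in the same transaction, so the job row and its publish intent commit together; a separate relay would publish outbox rows to the queue. A duplicate delivery gets 200 with the original `job_id`; any other database error propagates so the HTTP layer can map it to 5xx.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    dedup_key TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (tenant_id, dedup_key)
);
CREATE TABLE IF NOT EXISTS outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def ingest(conn: sqlite3.Connection, tenant_id: str, dedup_key: str) -> tuple[int, str]:
    """Fast-ACK path: dedup, insert job + outbox row atomically, return (status, job_id)."""
    job_id = f"job_{uuid.uuid4().hex[:12]}"
    try:
        with conn:  # one transaction: both rows commit together or not at all
            conn.execute(
                "INSERT INTO jobs (job_id, tenant_id, dedup_key) VALUES (?, ?, ?)",
                (job_id, tenant_id, dedup_key),
            )
            conn.execute("INSERT INTO outbox (job_id) VALUES (?)", (job_id,))
        return 202, job_id
    except sqlite3.IntegrityError:
        # Duplicate delivery: acknowledge with the job created by the first delivery.
        row = conn.execute(
            "SELECT job_id FROM jobs WHERE tenant_id = ? AND dedup_key = ?",
            (tenant_id, dedup_key),
        ).fetchone()
        return 200, row[0]
```

No model call and no network publish happen inside the request, which keeps the handler in the millisecond range.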

Idempotency and deduplication

Idempotency is the difference between Harbor's 41% duplicate side-effect tool calls and 3%. Implement it at three layers:

  • Ingress dedup — unique constraint on (tenant_id, dedup_key) or (tenant_id, event_id). Second delivery returns 200 with the existing job_id.
  • Run-level idempotency — write tools derive an idempotency key from job_id and send it as an Idempotency-Key header to APIs that support one (Stripe does), so a retried run reuses the same key.
  • Side-effect ledger — before sending email or charging cards, check a ledger: “already executed action X for job Y.”
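A minimal sketch of the side-effect ledger and key derivation, assuming SQLite stands in for the ledger store; `execute_once`, `idempotency_key`, and the `ledger` table are hypothetical names. The ledger turns a retried call into a no-op. The derived key covers the remaining gap, a crash after the side effect but before the ledger write, by letting downstream APIs that honor idempotency keys drop the repeat.

```python
import hashlib
import sqlite3
from typing import Callable


def idempotency_key(job_id: str, action: str) -> str:
    """Derive a stable key per (job, action) so every retry of the job reuses it."""
    return hashlib.sha256(f"{job_id}:{action}".encode()).hexdigest()[:32]


def execute_once(conn: sqlite3.Connection, job_id: str, action: str,
                 side_effect: Callable[[str], str]) -> str:
    """Run side_effect at most once per (job_id, action); return the recorded result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger ("
        " job_id TEXT, action TEXT, result TEXT, PRIMARY KEY (job_id, action))"
    )
    row = conn.execute(
        "SELECT result FROM ledger WHERE job_id = ? AND action = ?", (job_id, action)
    ).fetchone()
    if row is not None:
        return row[0]  # already executed: the retry becomes a no-op
    # Pass the derived key downstream (e.g. as an Idempotency-Key header).
    result = side_effect(idempotency_key(job_id, action))
    with conn:
        conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (job_id, action, result))
    return result
```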

Choose dedup keys carefully. ticket_id alone collapses distinct updates; ticket_id + revision or the vendor's event_id is safer. Set and document the dedup TTL: some providers replay events after 72 hours, so Harbor extended dedup retention from 24 hours to 7 days.
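The sketch below illustrates both the key choice and the retention window, using an in-memory dict in place of the dedup table. `dedup_key`, `DedupWindow`, and the key formats are hypothetical; the 7-day default mirrors Harbor's extended retention.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

DEDUP_RETENTION = timedelta(days=7)  # matches Harbor's extended retention


def dedup_key(source: str, ticket_id: int, revision: Optional[int] = None,
              vendor_event_id: Optional[str] = None) -> str:
    """Prefer the vendor's event ID; otherwise require ticket ID plus revision."""
    if vendor_event_id:
        return f"{source}:event:{vendor_event_id}"
    if revision is None:
        raise ValueError("ticket_id alone would collapse distinct updates")
    return f"{source}:ticket:{ticket_id}:rev:{revision}"


class DedupWindow:
    """In-memory stand-in for the dedup table, with TTL-based expiry."""

    def __init__(self, retention: timedelta = DEDUP_RETENTION) -> None:
        self.retention = retention
        self.first_seen: Dict[Tuple[str, str], datetime] = {}

    def seen(self, tenant_id: str, key: str, now: datetime) -> bool:
        """True if (tenant_id, key) was recorded within the window; otherwise record it."""
        recorded = self.first_seen.get((tenant_id, key))
        if recorded is not None and now - recorded < self.retention:
            return True
        self.first_seen[(tenant_id, key)] = now
        return False
```

With `DedupWindow(timedelta(hours=24))`, a replay arriving 72 hours later is treated as a new event, which is the gap Harbor closed by moving to 7 days.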

Job queue topology for agent workloads

Generic task queues work, but agent jobs have unique needs:

Priority and fairness

Separate queues or weighted priorities: interactive user chat > webhook automations > batch backfills. Within webhooks, apply per-tenant fair queuing so one customer's GitHub monorepo cannot starve others.
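One way to combine strict priority between classes with fairness between tenants is round-robin over per-tenant queues. This is a single-process sketch; the class names, `FairScheduler`, and the strict-priority policy are assumptions (weighted schemes such as deficit round-robin are a common alternative when lower classes must not starve).

```python
from collections import OrderedDict, deque
from typing import Optional

# Lower number = higher priority, mirroring the ordering above.
PRIORITY = {"interactive": 0, "webhook": 1, "backfill": 2}


class FairScheduler:
    """Strict priority across classes, round-robin across tenants within a class."""

    def __init__(self) -> None:
        # class -> OrderedDict(tenant_id -> deque of job_ids); order = rotation order
        self.queues = {cls: OrderedDict() for cls in PRIORITY}

    def push(self, cls: str, tenant_id: str, job_id: str) -> None:
        self.queues[cls].setdefault(tenant_id, deque()).append(job_id)

    def pop(self) -> Optional[str]:
        for cls in sorted(PRIORITY, key=PRIORITY.get):
            tenants = self.queues[cls]
            if not tenants:
                continue
            tenant_id, jobs = next(iter(tenants.items()))
            job_id = jobs.popleft()
            # Rotate this tenant to the back so other tenants get the next slot.
            tenants.move_to_end(tenant_id)
            if not jobs:
                del tenants[tenant_id]
            return job_id
        return None
```

If one tenant enqueues three webhook jobs and another enqueues one, the smaller tenant's job is served second rather than last.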

Concurrency caps

Limit concurrent runs per tenant, per integration, and globally. A spike in issue.opened events should queue, not spawn 500 sandboxes.
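A minimal in-process sketch of layered caps with hypothetical limits. A job is admitted only when all three counters have headroom; otherwise it stays queued. In a multi-worker deployment these counters would need to live in shared storage (for example Redis or the job database).

```python
from collections import Counter


class ConcurrencyCaps:
    """Admit a run only if tenant, integration, and global limits all have headroom."""

    def __init__(self, per_tenant: int, per_integration: int, global_cap: int) -> None:
        self.per_tenant = per_tenant
        self.per_integration = per_integration
        self.global_cap = global_cap
        self.tenant_runs: Counter = Counter()
        self.integration_runs: Counter = Counter()
        self.total = 0

    def try_acquire(self, tenant_id: str, integration: str) -> bool:
        if (self.total >= self.global_cap
                or self.tenant_runs[tenant_id] >= self.per_tenant
                or self.integration_runs[integration] >= self.per_integration):
            return False  # leave the job queued instead of spawning a sandbox
        self.tenant_runs[tenant_id] += 1
        self.integration_runs[integration] += 1
        self.total += 1
        return True

    def release(self, tenant_id: str, integration: str) -> None:
        self.tenant_runs[tenant_id] -= 1
        self.integration_runs[integration] -= 1
        self.total -= 1
```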

Poison messages and DLQ

After N failures with exponential backoff, move jobs to a dead-letter queue with the last error, payload snapshot, and link to traces. Operators need a “replay DLQ job” button that creates a new job with a fresh idempotency scope if side effects were already partially applied.
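The retry policy can be sketched as follows, assuming N = 5 attempts, a 2-second base delay that doubles per attempt, and a 300-second cap; these values and the names `on_failure` and `replay_from_dlq` are hypothetical. Production systems usually add random jitter to the delay, omitted here to keep the sketch deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_ATTEMPTS = 5      # hypothetical N
BASE_DELAY_S = 2.0
MAX_DELAY_S = 300.0


@dataclass
class Job:
    job_id: str
    tenant_id: str
    payload: dict
    attempts: int = 0
    idempotency_scope: str = ""
    trace_ids: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # By default, side effects are keyed to the job's own ID.
        if not self.idempotency_scope:
            self.idempotency_scope = self.job_id


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 2, 4, 8, ... seconds, capped at MAX_DELAY_S."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


def on_failure(job: Job, error: str, dlq: list) -> Optional[float]:
    """Return a retry delay in seconds, or None after moving the job to the DLQ."""
    job.attempts += 1
    if job.attempts < MAX_ATTEMPTS:
        return backoff_delay(job.attempts)
    dlq.append({
        "job_id": job.job_id,
        "tenant_id": job.tenant_id,
        "last_error": error,
        "payload_snapshot": dict(job.payload),
        "trace_ids": list(job.trace_ids),
    })
    return None


def replay_from_dlq(entry: dict, new_job_id: str) -> Job:
    """Operator replay: a new job with a fresh idempotency scope, same tenant."""
    return Job(
        job_id=new_job_id,
        tenant_id=entry["tenant_id"],
        payload=dict(entry["payload_snapshot"]),
        idempotency_scope=f"replay:{entry['job_id']}:{new_job_id}",
    )
```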

Delayed and scheduled jobs

Some automations should wait: “if no human reply in 4 hours, agent follows up.” Use a scheduler tier (cron, SQS delay queues for delays within SQS's 15-minute maximum, or a timing wheel) that enqueues standard jobs at run_at. Persist scheduled jobs in the same database as checkpoints so deploys do not lose timers.
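A minimal sketch of a database-backed timer tier, with SQLite standing in for the checkpoint database; the table and function names are hypothetical. With several scheduler replicas you would also need row locking (in Postgres, `SELECT ... FOR UPDATE SKIP LOCKED`) so two replicas do not enqueue the same timer.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduled_jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    run_at REAL NOT NULL            -- Unix epoch seconds
);
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


def schedule(conn: sqlite3.Connection, tenant_id: str, kind: str, run_at: datetime) -> None:
    """Persist a timer in the database so it survives deploys and restarts."""
    with conn:
        conn.execute(
            "INSERT INTO scheduled_jobs (tenant_id, kind, run_at) VALUES (?, ?, ?)",
            (tenant_id, kind, run_at.timestamp()),
        )


def enqueue_due(conn: sqlite3.Connection, now: datetime) -> int:
    """Move every timer whose run_at has passed into the standard jobs table."""
    with conn:  # insert and delete commit together: no timer is both fired and pending
        due = conn.execute(
            "SELECT id, tenant_id, kind FROM scheduled_jobs WHERE run_at <= ?",
            (now.timestamp(),),
        ).fetchall()
        for timer_id, tenant_id, kind in due:
            conn.execute("INSERT INTO jobs (tenant_id, kind) VALUES (?, ?)", (tenant_id, kind))
            conn.execute("DELETE FROM scheduled_jobs WHERE id = ?", (timer_id,))
    return len(due)
```

The follow-up job itself should still check, when it runs, whether a human replied in the meantime.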

Tenant binding on async workers

The highest-risk bug class after dedup failures is processing a job without the correct tenant scope, which is a multi-tenant isolation failure. Rules:

  • tenant_id lives on the job row, set at ingress from the webhook routing table (URL path, subdomain, or signed claim) — never from unauthenticated payload fields alone.
  • Workers rehydrate tenant context before any model call, tool dispatch, or memory read.
  • Queue messages are opaque IDs pointing to DB rows, not full payloads with embeddable tenant overrides.
  • Cross-tenant integration tests enqueue jobs for two tenants with colliding external IDs; assert zero cross-reads.
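The sketch below shows the opaque-ID rule and the colliding-ID test in miniature, using SQLite and hypothetical `jobs` and `memory` tables. The worker never takes tenant scope from the message or from `payload_org`; it loads `tenant_id` from the job row and filters every read by it.

```python
import sqlite3
from dataclasses import dataclass

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,     -- set at ingress from the authenticated route
    external_id TEXT NOT NULL,   -- vendor ticket ID; may collide across tenants
    payload_org TEXT             -- org field copied from the payload; never trusted
);
CREATE TABLE IF NOT EXISTS memory (
    tenant_id TEXT NOT NULL,
    external_id TEXT NOT NULL,
    note TEXT NOT NULL
);
"""


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str


def handle_message(conn: sqlite3.Connection, message: str) -> dict:
    """Process one queue message, which carries nothing but an opaque job_id."""
    row = conn.execute(
        "SELECT tenant_id, external_id FROM jobs WHERE job_id = ?", (message,)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown job {message}")
    # Rehydrate tenant scope from the job row before any model call, tool, or memory read.
    ctx = TenantContext(tenant_id=row[0])
    notes = conn.execute(
        "SELECT note FROM memory WHERE tenant_id = ? AND external_id = ?",
        (ctx.tenant_id, row[1]),
    ).fetchall()
    return {"tenant_id": ctx.tenant_id, "notes": [n[0] for n in notes]}
```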

Worker execution loop

A typical worker cycle:

  1. Lease job with visibility timeout (extend heartbeat during long runs).
  2. Create run_id; load tenant context and integration credentials.
  3. Execute agent graph with middleware hooks for policy and redaction.
  4. Checkpoint state after each tool call so a restarted worker resumes instead of repeating completed steps.
  5. Mark job succeeded or failed; release lease.
  6. Optionally POST a callback URL if the integration registered one.
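The lease, heartbeat, and per-tool checkpoint steps can be sketched in a single process, assuming an in-memory `LeaseStore` in place of a real queue's visibility timeout (with SQS, the corresponding call is `ChangeMessageVisibility`). Steps 2 and 6 are omitted, the 60-second timeout is hypothetical, and heartbeating after each tool is a simplification; a single long tool call would need a background heartbeat.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

VISIBILITY_TIMEOUT_S = 60.0  # hypothetical value


@dataclass
class Lease:
    job_id: str
    worker_id: str
    expires_at: float


class LeaseStore:
    """In-memory stand-in for a queue's visibility timeout."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.leases: Dict[str, Lease] = {}

    def acquire(self, job_id: str, worker_id: str) -> Optional[Lease]:
        now = self.clock()
        current = self.leases.get(job_id)
        if current is not None and current.expires_at > now:
            return None  # another worker holds a live lease
        lease = Lease(job_id, worker_id, now + VISIBILITY_TIMEOUT_S)
        self.leases[job_id] = lease
        return lease

    def heartbeat(self, lease: Lease) -> None:
        lease.expires_at = self.clock() + VISIBILITY_TIMEOUT_S

    def release(self, lease: Lease) -> None:
        if self.leases.get(lease.job_id) is lease:  # never drop another worker's lease
            del self.leases[lease.job_id]


def run_job(store: LeaseStore, job_id: str, worker_id: str,
            tools: List[Tuple[str, Callable[[], None]]],
            checkpoints: Dict[str, List[str]]) -> str:
    """Lease the job, run tools with a checkpoint and heartbeat after each, then release."""
    lease = store.acquire(job_id, worker_id)
    if lease is None:
        return "skipped"
    done = checkpoints.setdefault(job_id, [])
    try:
        for name, tool in tools:
            if name in done:
                continue  # completed before a crash; do not repeat it
            tool()
            done.append(name)       # checkpoint after each tool
            store.heartbeat(lease)  # extend visibility so the job is not requeued mid-run
        return "succeeded"
    except Exception:
        return "failed"
    finally:
        store.release(lease)
```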

Integrate cancellation: if the ticket was solved while the job was queued, a lightweight pre-flight check should turn the run into a no-op before any customer email is sent.
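A minimal sketch of the pre-flight check, assuming a hypothetical `fetch_ticket_status` callable that reads the current ticket state from the vendor, hypothetical terminal state names, and a hypothetical `send_email` tool.

```python
from typing import Callable

# Hypothetical ticket states after which a queued reply is stale.
TERMINAL_TICKET_STATES = {"solved", "closed"}


def run_with_preflight(job: dict,
                       fetch_ticket_status: Callable[[str], str],
                       send_email: Callable[[str], None]) -> str:
    """Re-read the current ticket state before the irreversible email tool runs."""
    if fetch_ticket_status(job["ticket_id"]) in TERMINAL_TICKET_STATES:
        return "cancelled"  # event went stale while queued: no-op the run
    send_email(job["ticket_id"])
    return "succeeded"
```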

For user-visible progress, workers can publish events to SSE channels keyed by run_id even though the trigger was a webhook.
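On the wire, each Server-Sent Events message is a block of `field: value` lines ended by a blank line. The sketch assumes a hypothetical `id` scheme of `run_id:seq`, which lets a reconnecting client resume through the standard `Last-Event-ID` header; `sse_frame` is a hypothetical helper.

```python
import json


def sse_frame(run_id: str, seq: int, event: str, data: dict) -> str:
    """Format one SSE frame for the progress channel keyed by run_id."""
    # json.dumps emits a single line here, so one data: field is enough
    body = json.dumps({"run_id": run_id, **data}, separators=(",", ":"))
    # the trailing blank line terminates the event
    return f"id: {run_id}:{seq}\nevent: {event}\ndata: {body}\n\n"
```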

Harbor Integrations refactor walkthrough

Harbor's remediation sprint:

  1. Ingress service — dedicated pods; only verify, dedup, enqueue; p99 < 80 ms.
  2. Postgres job table — unique on (tenant_id, dedup_key); state machine pending → running → succeeded|failed|cancelled.
  3. SQS + per-tenant rate tokens — dequeue only when tenant concurrency budget allows.
  4. Side-effect ledger — email and Stripe tools check ledger before execute; retries become no-ops.
  5. 7-day dedup window — covers vendor replay storms.
  6. DLQ dashboard — support staff replay jobs with one click and a mandatory incident note.
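The job state machine from step 2 can be enforced with a compare-and-set `UPDATE`, sketched here with SQLite; the `transition` helper is hypothetical. The `WHERE status = ?` guard means that if two workers try to move the same job from `pending` to `running`, only one update matches a row.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"

# Legal moves in the job state machine; terminal states have no outgoing edges.
TRANSITIONS = {
    "pending": {"running"},
    "running": {"succeeded", "failed", "cancelled"},
}


def transition(conn: sqlite3.Connection, job_id: str, from_status: str, to_status: str) -> bool:
    """Compare-and-set a status change; False means another worker moved the job first."""
    if to_status not in TRANSITIONS.get(from_status, set()):
        raise ValueError(f"illegal transition {from_status} -> {to_status}")
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ? AND status = ?",
            (to_status, job_id, from_status),
        )
    return cur.rowcount == 1
```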

Mean time from ticket event to first agent draft increased by 4 seconds (queue wait) but customer-facing duplicate emails dropped by 93%. Zendesk webhook timeout errors went from hundreds per day to zero.

Technique decision table

| Approach | Latency to start work | Duplicate risk | When to use |
| --- | --- | --- | --- |
| Synchronous HTTP handler calls agent inline | Lowest | Very high | Prototypes only; no write tools |
| Enqueue on ingress + single shared worker pool | Low (+ queue wait) | Low with dedup | Default for SaaS webhook agents |
| Per-tenant queues + fair scheduling | Low | Low | Multi-tenant platforms with bursty integrators |
| Event bus (Kafka/NATS) + stream processors | Medium | Medium (needs consumer idempotency) | High volume, many event types, analytics fan-out |
| Workflow engine (Temporal, Step Functions) | Medium | Very low | Long-running, multi-day agent workflows with timers |

Most teams should start with a relational job table plus a managed queue. Adopt Temporal when you have many delayed steps, human approvals, and compensating transactions across days.

Common pitfalls

  • 200 before enqueue commits — a crash between ACK and persist loses events; use the transactional outbox pattern.
  • Weak dedup keys — collapsing distinct events causes missed automations; over-broad keys cause duplicates.
  • Calling write tools before checkpointing — a retry after a crash duplicates side effects; consult and update the ledger first.
  • Unbounded webhook body parsing — large GitHub payloads can OOM ingress; stream-parse or size-cap.
  • Missing tenant on worker dequeue — trusting payload org fields without routing-table lookup.
  • No visibility timeout extension — long runs are requeued mid-flight, doubling work.
  • Ignoring provider replay headers — some vendors send X-Request-Id; store it even if you have custom dedup keys.
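The relay half of the transactional outbox can be sketched as below, assuming a SQLite `outbox` table filled in the same transaction as the job row and a hypothetical `publish` callable wrapping the queue client. Rows are marked published only after the queue accepts them, so delivery is at-least-once and consumers must still dedup on `job_id`.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str], None],
                 batch: int = 100) -> int:
    """Publish unpublished outbox rows in order, marking each one after the queue accepts it."""
    rows = conn.execute(
        "SELECT id, job_id FROM outbox WHERE published = 0 ORDER BY id LIMIT ?", (batch,)
    ).fetchall()
    sent = 0
    for row_id, job_id in rows:
        try:
            publish(job_id)  # the message is an opaque job_id, not the payload
        except Exception:
            break  # leave this and later rows unpublished; the next pass retries them
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```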

Production checklist

  • Webhook ingress verifies signatures with per-tenant secrets; rejects unsigned requests.
  • Handler returns 2xx only after idempotent job row + queue publish succeed (outbox).
  • Unique constraint on (tenant_id, dedup_key) or vendor event_id.
  • Workers lease jobs with heartbeats; visibility timeout extends during long runs.
  • tenant_id set at ingress from authenticated routing; rehydrated on every worker step.
  • Write tools use idempotency keys tied to job_id; side-effect ledger enforced.
  • Per-tenant and global concurrency caps integrated with rate-limit middleware.
  • DLQ with operator replay; scheduled/delayed jobs survive deploys.
  • Pre-flight cancellation checks stale events before irreversible tools.
  • Traces link event_id → job_id → run_id for end-to-end debugging.

Key takeaways

  • Never run agents inside webhook HTTP threads — ack fast, queue work, return 2xx.
  • Idempotency is non-negotiable — providers will retry; your design must welcome retries safely.
  • Jobs carry tenant scope — async boundaries are where multi-tenant leaks happen.
  • Side-effect ledgers beat hope — assume workers crash mid-run.
  • Harbor cut duplicate side effects from 41% to 3% with dedup keys, outbox enqueue, and ledgers — not by disabling webhooks.
