
LLM agent compensating transactions and saga rollbacks explained

Harbor Finance shipped a customer-refund agent that followed a sensible four-step playbook on every approved return: verify order eligibility, issue a Stripe refund, create a Zendesk ticket for audit, and post a ledger adjustment in NetSuite. Steps one and two succeeded reliably. Step three failed about 18% of the time: Zendesk rate limits, malformed custom fields, and transient 503s. The agent treated the failure as “retry the whole turn.” On retry it called issue_refund again. Stripe’s API is idempotent only when you pass the same idempotency key; the model generated fresh arguments each turn, so Stripe treated every retry as a new refund. As a result, 31% of failed runs left at least one orphan side effect: a double refund, a refund with no support case, or a ledger entry that did not match billing.

Compensating transactions are the inverse operations that undo committed forward steps when a later step in a multi-tool workflow fails. A saga is the orchestration pattern that runs those forward steps in order, records what succeeded, and walks compensators backward on failure. This sits alongside tool error handling (how you surface failures to the model) and durable execution (how you persist saga state across restarts). Harbor replaced blind retries with an explicit saga ledger, paired every mutating tool with a compensator, and moved irreversible steps behind approval gates. Orphan side-effect rate fell to 2%; mean time to reconcile finance exceptions dropped from 4.2 hours to 22 minutes. This guide covers when sagas beat simple retries, compensator design, choreography vs orchestration, idempotency for rollbacks, the Harbor Finance refactor, a technique decision table, pitfalls, and a production checklist.

Why blind retry breaks multi-step agents

A single-tool agent can often retry safely: read logs, fail, read logs again. Multi-step mutating chains are different. Once step k commits externally, steps 1..k are facts in other systems, and a retry cannot pretend they never happened. Common failure modes:

  • Duplicate forward actions — retry re-executes charge_card or send_email because the model does not know step two already succeeded.
  • Orphan records — refund issued but CRM ticket never created; support has no context, finance has no audit trail.
  • Partial parallel failure — two writes in one parallel batch succeed; a third fails; retry doubles the successful pair.
  • Crash between commit and checkpoint — worker dies after Stripe returns 200 but before the saga ledger records REFUND_COMMITTED.

Generic retry policies assume idempotent reads or single-shot writes. They do not define what “undo” means when undo is a separate API call with its own failure modes.
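
The duplicate-forward-action failure can be reproduced in a few lines. The sketch below is illustrative only: FakePaymentApi, run_turn, and retry_twice are hypothetical stand-ins, not Stripe's client or Harbor's code. It contrasts a retry that sends a fresh key on each attempt with one that reuses a stable key.

```python
import uuid


class FakePaymentApi:
    """Hypothetical in-memory payment provider that honors idempotency keys."""

    def __init__(self):
        self.refunds = []   # every refund that was actually executed
        self._by_key = {}   # idempotency_key -> stored refund record

    def issue_refund(self, order_id, amount_cents, idempotency_key):
        # A repeated key returns the stored result instead of refunding again.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        refund = {
            "refund_id": f"re_{len(self.refunds) + 1}",
            "order_id": order_id,
            "amount_cents": amount_cents,
        }
        self.refunds.append(refund)
        self._by_key[idempotency_key] = refund
        return refund


def run_turn(api, key):
    """One agent turn: the refund commits, then the ticket step fails."""
    api.issue_refund("order-1", 2500, key)
    raise RuntimeError("ticket service returned 503")


def retry_twice(api, key_for_attempt):
    """Blindly retry the whole turn and report how many refunds were executed."""
    for attempt in range(2):
        try:
            run_turn(api, key_for_attempt(attempt))
        except RuntimeError:
            pass  # blind retry: nothing is undone, the turn simply runs again
    return len(api.refunds)


fresh_keys = retry_twice(FakePaymentApi(), lambda attempt: str(uuid.uuid4()))
stable_key = retry_twice(FakePaymentApi(), lambda attempt: "saga-42:2")
print(fresh_keys, stable_key)  # 2 1
```

A stable key only prevents the duplicate refund. The single refund that did commit is still an orphan until something voids it or the ticket step eventually succeeds, which is the gap compensators fill.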

Saga anatomy for agent workflows

Adapt the distributed-systems saga pattern to LLM agents by making the runtime — not the model — own saga state:

  1. Forward step — a mutating tool call with a declared compensator and idempotency key scope.
  2. Saga ledger entry — append-only record: {saga_id, step_index, tool, idempotency_key, external_ref, status}.
  3. Compensating step — inverse tool invoked in reverse order on failure (e.g. void_refund, delete_draft_ticket).
  4. Terminal states — COMPLETED, COMPENSATED, COMPENSATION_FAILED (requires human).

The model may still propose the sequence, but the orchestrator enforces: no step n+1 until step n is ledgered; on failure at n, run compensators for n−1..1 before returning an observation to the model.
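
The enforcement rule above can be sketched as a small orchestrator. This is a minimal illustration under stated assumptions, not Harbor's implementation: the Step and Saga classes, the "saga_id:index" key format, and the choice to stop at the first failed compensator are all hypothetical, and the lambdas stand in for real tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    tool: str
    forward: Callable[[str], str]                       # idempotency key -> external ref
    compensate: Optional[Callable[[str], None]] = None  # external ref -> None; absent for read-only steps


@dataclass
class Saga:
    saga_id: str
    steps: list
    ledger: list = field(default_factory=list)  # append-only: entries are added, never edited
    status: str = "RUNNING"

    def _append(self, index, tool, key, ref, status):
        self.ledger.append({
            "saga_id": self.saga_id, "step_index": index, "tool": tool,
            "idempotency_key": key, "external_ref": ref, "status": status,
        })

    def run(self):
        for index, step in enumerate(self.steps, start=1):
            key = f"{self.saga_id}:{index}"  # key is owned by the runtime, not the model
            try:
                ref = step.forward(key)
            except Exception:
                self._append(index, step.tool, key, None, "FAILED")
                self._compensate()
                return self.status
            # Step n is ledgered before step n+1 may start.
            self._append(index, step.tool, key, ref, "COMMITTED")
        self.status = "COMPLETED"
        return self.status

    def _compensate(self):
        committed = [e for e in self.ledger if e["status"] == "COMMITTED"]
        for entry in reversed(committed):  # walk n-1..1
            step = self.steps[entry["step_index"] - 1]
            if step.compensate is None:
                continue  # read-only step, nothing to undo
            try:
                step.compensate(entry["external_ref"])
                outcome = "COMPENSATED"
            except Exception:
                outcome = "COMPENSATION_FAILED"
            self._append(entry["step_index"], entry["tool"],
                         entry["idempotency_key"], entry["external_ref"], outcome)
            if outcome == "COMPENSATION_FAILED":
                self.status = outcome  # assumption: stop here and hand off to a human
                return
        self.status = "COMPENSATED"


calls = []


def failing_ticket(key):
    raise RuntimeError("503 from ticket service")


saga = Saga("saga-42", [
    Step("verify_return_eligibility", lambda key: "eligible"),
    Step("issue_refund", lambda key: "re_1", lambda ref: calls.append(f"void_refund({ref})")),
    Step("create_support_ticket", failing_ticket, lambda ref: calls.append(f"close_draft_ticket({ref})")),
    Step("post_ledger_adjustment", lambda key: "je_1", lambda ref: calls.append(f"reverse_ledger_entry({ref})")),
])
final_status = saga.run()
print(final_status, calls)  # COMPENSATED ['void_refund(re_1)']
```

Step 4 never runs and step 3 never committed, so the only compensator invoked is void_refund. Compensation results are appended as new ledger entries rather than overwriting the COMMITTED ones, which keeps the ledger append-only.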

Choreography vs orchestration

  • Orchestrated saga — central worker owns the ledger and invokes tools. Best for finance, provisioning, and compliance flows where order matters.
  • Choreographed saga — each tool emits events; downstream services subscribe. Rare in current LLM stacks unless you already run event-driven microservices.

Harbor chose orchestration: one saga manager per run, an explicit compensator registry, and an LLM that was asked to plan only when the predefined template did not fit.

Designing compensating tools

A compensator is not a hopeful “delete the row” call. Production compensators need:

  • Semantic inverse — issue_refund pairs with void_refund(refund_id), not a generic undo_last_action.
  • Idempotency — calling void_refund twice on the same refund_id must be safe (return success if already voided).
  • Partial compensation — some steps cannot be fully undone (email sent, SMS delivered). Mark as IRREVERSIBLE in the tool manifest and run them last or behind human approval.
  • Compensation timeouts — void APIs can hang; sagas need deadlines and escalation to ops, not infinite retry loops.
  • Observation to the model — return structured JSON: which forward steps committed, which compensators ran, which external IDs to reference on the next attempt.

Document compensators in the same JSON Schema bundle as forward tools so evaluators and E2E harnesses can simulate failure at each step.
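
Two of the requirements above, explicit pairing and idempotent compensation, are easy to check mechanically. The sketch below assumes a hypothetical manifest layout (the "mutating", "compensator", and "irreversible" fields are invented for illustration) and an in-memory RefundStore standing in for the payment provider.

```python
# Hypothetical tool manifest: every mutating tool names a compensator
# or is explicitly marked irreversible.
TOOL_MANIFEST = {
    "verify_return_eligibility": {"mutating": False},
    "issue_refund": {"mutating": True, "compensator": "void_refund"},
    "create_support_ticket": {"mutating": True, "compensator": "close_draft_ticket"},
    "send_customer_confirmation": {"mutating": True, "irreversible": True},
}


def unpaired_tools(manifest):
    """Return mutating tools that have neither a compensator nor an irreversible mark."""
    return sorted(
        name for name, spec in manifest.items()
        if spec.get("mutating")
        and not spec.get("compensator")
        and not spec.get("irreversible")
    )


class RefundStore:
    """In-memory stand-in for the payment provider's refund records."""

    def __init__(self, statuses):
        self.statuses = dict(statuses)  # refund_id -> "succeeded" or "voided"

    def void_refund(self, refund_id):
        if refund_id not in self.statuses:
            raise KeyError(f"unknown refund {refund_id}")
        if self.statuses[refund_id] == "voided":
            # Second call is a no-op that still reports success.
            return {"refund_id": refund_id, "status": "voided", "already_voided": True}
        self.statuses[refund_id] = "voided"
        return {"refund_id": refund_id, "status": "voided", "already_voided": False}


store = RefundStore({"re_1": "succeeded"})
first = store.void_refund("re_1")
second = store.void_refund("re_1")
print(unpaired_tools(TOOL_MANIFEST), first["already_voided"], second["already_voided"])
# [] False True
```

A check like unpaired_tools can run in CI so that a new mutating tool cannot join the catalog without either a compensator or an explicit irreversible mark.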

Harbor Finance refactor walkthrough

Harbor’s target flow after refactor:

  1. verify_return_eligibility (read-only, no compensator)
  2. issue_refund → compensator void_refund
  3. create_support_ticket → compensator close_draft_ticket
  4. post_ledger_adjustment → compensator reverse_ledger_entry
  5. send_customer_confirmation (irreversible; moved to step 5, gated on 2–4 success)

Key changes:

  • The runtime generated a stable idempotency_key = saga_id + step_index and injected it into Stripe calls; the model never invented keys.
  • On a Zendesk failure at step 3, the saga manager voided the refund before returning an error observation, so the customer saw “refund could not be completed” instead of receiving a silent double refund on retry.
  • A checkpoint after each ledger append tied into durable execution, so a saga interrupted by a deploy resumed compensating from the last committed step.
  • Dashboard alert on COMPENSATION_FAILED routed to on-call finance ops with full external refs.
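
The key-injection change can be sketched as a thin wrapper between the model's proposed arguments and the tool call. The function names and the ":" separator are assumptions made for illustration; the source only states that the key combines saga_id and step_index.

```python
def idempotency_key(saga_id: str, step_index: int) -> str:
    # A separator avoids collisions such as ("saga-1", 12) vs ("saga-11", 2),
    # which plain concatenation would both render as "saga-112".
    return f"{saga_id}:{step_index}"


def prepare_tool_args(saga_id: str, step_index: int, model_args: dict) -> dict:
    """Drop any key the model supplied and inject the runtime's own."""
    args = {name: value for name, value in model_args.items() if name != "idempotency_key"}
    args["idempotency_key"] = idempotency_key(saga_id, step_index)
    return args


first_attempt = prepare_tool_args(
    "saga-42", 2,
    {"order_id": "order-1", "amount_cents": 2500, "idempotency_key": "model-made-up-1"},
)
retry_attempt = prepare_tool_args(
    "saga-42", 2,
    {"order_id": "order-1", "amount_cents": 2500, "idempotency_key": "model-made-up-2"},
)
print(first_attempt["idempotency_key"], first_attempt == retry_attempt)  # saga-42:2 True
```

Because the key depends only on saga state, a retry of the same step in the same saga sends the same key no matter what the model generates on that turn.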

Result: orphan side effects among failed runs fell from 31% to 2% (the remaining cases were irreversible email sends from before the reorder). Mean time to reconcile finance exceptions fell from 4.2 hours to 22 minutes.

Technique decision table

Strategy | Best for | Weak when | Harbor-style signal
--- | --- | --- | ---
Blind retry same turn | Read-only tools, single mutating step | Multi-step writes with external side effects | Retry doubles Stripe refunds
Forward-only saga (no compensators) | Steps are cheap drafts deletable in one API | Payments, shipments, ledger posts | Orphan refunds pile up
Full compensating saga | Ordered multi-system workflows | Every step irreversible (use human gates) | 3+ mutating integrations per run
Two-phase commit (2PC) | You control all participants | SaaS APIs without prepare phase | Rare for LLM tool chains
Manual reconciliation queue | Low-volume, high-stakes exceptions | High throughput automation | Fallback when compensation fails

Common pitfalls

  • Letting the model invent idempotency keys — every retry carries a fresh key, so the provider cannot deduplicate it; inject keys derived from saga_id instead.
  • Compensators that are not idempotent — double-void creates new finance exceptions.
  • Irreversible steps early in the chain — send email before payment confirms; compensation cannot unsend.
  • Compensating in the wrong order — voiding the refund before reversing the ledger entry that depends on it leaves unbalanced books; undo the latest committed step first.
  • Hiding compensation from the model — next turn replans from scratch and duplicates forward steps.
  • No COMPENSATION_FAILED playbook — sagas stall in ambiguous state; humans discover it days later.
  • Parallel writes without per-branch sagas — one failure should not compensate unrelated successful branches unless they share a resource key.
  • Missing traces on compensator latency — rollback path is slower than forward path; timeouts fire mid-compensation.
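
For the "hiding compensation from the model" pitfall, one option is to build the observation directly from the saga ledger. The sketch below assumes the ledger entry fields described earlier (step_index, tool, external_ref, status); the observation's field names are hypothetical.

```python
import json


def build_observation(saga_id, status, ledger):
    """Summarize a finished saga for the model's next turn."""
    compensated = {e["step_index"] for e in ledger if e["status"] == "COMPENSATED"}
    failed = [e for e in ledger if e["status"] == "FAILED"]
    return {
        "saga_id": saga_id,
        "status": status,
        "failed_step": failed[0]["tool"] if failed else None,
        "committed": [
            {
                "tool": e["tool"],
                "external_ref": e["external_ref"],
                "compensated": e["step_index"] in compensated,
            }
            for e in ledger if e["status"] == "COMMITTED"
        ],
    }


ledger = [
    {"step_index": 1, "tool": "verify_return_eligibility", "external_ref": "eligible", "status": "COMMITTED"},
    {"step_index": 2, "tool": "issue_refund", "external_ref": "re_1", "status": "COMMITTED"},
    {"step_index": 3, "tool": "create_support_ticket", "external_ref": None, "status": "FAILED"},
    {"step_index": 2, "tool": "issue_refund", "external_ref": "re_1", "status": "COMPENSATED"},
]
observation = build_observation("saga-42", "COMPENSATED", ledger)
print(json.dumps(observation, indent=2))
```

With this in the tool result, the model can see that the refund re_1 was issued and then voided, rather than replanning as if nothing had happened.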

Engineer checklist

  • Inventory every mutating tool; pair with a documented compensator or mark IRREVERSIBLE.
  • Run irreversible steps last or behind human approval.
  • Generate idempotency keys in the runtime (saga_id + step_index), not in prompts.
  • Append saga ledger entries atomically with forward commit acknowledgment.
  • On step failure, run compensators in reverse order before the next model turn.
  • Make every compensator idempotent; test double-invocation in CI.
  • Return structured saga status in tool observations (committed refs, compensation results).
  • Integrate saga checkpoints with durable execution for crash recovery.
  • Alert on COMPENSATION_FAILED with external IDs and run links.
  • Simulate failure at each step in E2E harnesses with fake integrations.
  • Log forward and compensator spans separately in traces.
  • Review saga templates quarterly as new tools join the catalog.
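
The "simulate failure at each step" item can start as a pure-Python harness with no real integrations. The sketch below is a simplified model, not a test framework: StepSpec and both helper functions are hypothetical, each step either succeeds or fails outright, and compensators are assumed to always succeed.

```python
from typing import NamedTuple, Optional


class StepSpec(NamedTuple):
    tool: str
    compensator: Optional[str]  # None for read-only or irreversible steps
    mutating: bool


def run_with_failure(steps, fail_at):
    """Fail at the 1-based step fail_at, compensate, and return side effects still live."""
    live = [step.tool for step in steps[:fail_at - 1] if step.mutating]
    # Walk committed steps in reverse; only steps with a compensator can be undone.
    for step in reversed(steps[:fail_at - 1]):
        if step.compensator is not None and step.tool in live:
            live.remove(step.tool)
    return live


def orphans_by_failure_point(steps):
    """Map each possible failure point to the orphan side effects it leaves."""
    return {fail_at: run_with_failure(steps, fail_at) for fail_at in range(1, len(steps) + 1)}


HARBOR_FLOW = [
    StepSpec("verify_return_eligibility", None, False),
    StepSpec("issue_refund", "void_refund", True),
    StepSpec("create_support_ticket", "close_draft_ticket", True),
    StepSpec("post_ledger_adjustment", "reverse_ledger_entry", True),
    StepSpec("send_customer_confirmation", None, True),  # irreversible, so it runs last
]

# Same tools with the irreversible email moved early in the chain.
EMAIL_EARLY = [HARBOR_FLOW[0], HARBOR_FLOW[4]] + HARBOR_FLOW[1:4]

harbor_orphans = orphans_by_failure_point(HARBOR_FLOW)
early_orphans = orphans_by_failure_point(EMAIL_EARLY)
print(harbor_orphans[3], early_orphans[3])  # [] ['send_customer_confirmation']
```

With the irreversible step last, no failure point leaves an orphan. Moving the email to step 2 leaves it live whenever any later step fails, which is the ordering pitfall in concrete form.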

Key takeaways

  • Multi-step agent workflows need explicit undo semantics — retry is not rollback.
  • Compensating transactions pair with forward mutating tools and must be idempotent.
  • Orchestrated sagas fit most LLM stacks better than hoping the model sequences safely.
  • Irreversible actions belong at the end or behind human gates.
  • Harbor cut orphan side effects 31% → 2% with saga ledgers and runtime idempotency keys — not a larger model.
