LLM ReAct agent loop explained
Harbor Support's tier-two refund agent had a dangerous blind spot. The model
would emit a confident Thought: (“Customer wants cancellation; I
should void the charge”) and immediately call cancel_order
without ever reading the JSON from check_fulfillment_status. The
runtime logged the observation, but the next model turn started from a truncated
scratchpad that dropped the shipped: true field. Three chargebacks in
one week traced to the same pattern: the assistant acted before it observed.
ReAct (Reasoning + Acting, from Yao et al.) is the interleaved control loop behind many production tool-using agents: the model writes a short reasoning step, selects a tool action, receives an observation from the environment, and repeats until it can answer or hits a guardrail. Unlike batch planners, ReAct lets each observation reshape the very next thought, which makes it a good fit for support tickets, live lookups, and short multi-hop tasks. This guide covers loop anatomy, scratchpad and parsing design, stop conditions, pairing with function calling and agent memory, the Harbor Support refactor, a decision table comparing ReAct with plan-and-execute and other techniques, pitfalls, and a production checklist.
What the ReAct loop is
A ReAct agent is not a single prompt — it is a runtime contract between the LLM and your orchestrator. Each iteration has four phases:
- Thought — the model explains what it knows, what is missing, and what it will try next. Thoughts should be brief; they are for the model's own chain, not user-facing prose.
- Action — a structured tool invocation: function name plus arguments, or a legacy Action: search[query] string in research-style prompts.
- Observation — the environment returns tool output, API JSON, retrieval chunks, or error text. Observations are facts, not model-generated.
- Termination — when the model emits Final Answer: or a provider-native finish tool, the loop stops and the user sees the result.
The scratchpad is the concatenation of prior thought-action-observation triples appended to the prompt each turn. The model never “remembers” tools implicitly — it only sees what you serialize into context.
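To make the "only what you serialize" point concrete, here is a minimal Python sketch of a scratchpad kept as a list of triples and rendered into the prompt on every turn. The Step and Scratchpad names and the "Step N Thought/Action/Observation" line format are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One completed thought-action-observation triple."""
    thought: str
    action: str       # serialized tool call, e.g. a tool name plus JSON arguments
    observation: str  # normalized tool output; never written by the model


@dataclass
class Scratchpad:
    goal: str
    steps: list = field(default_factory=list)

    def record(self, thought: str, action: str, observation: str) -> None:
        """Append a triple only after the tool has actually returned."""
        self.steps.append(Step(thought, action, observation))

    def render(self) -> str:
        """Serialize everything the model can 'remember' on its next turn."""
        lines = [f"Goal: {self.goal}"]
        for i, s in enumerate(self.steps, start=1):
            lines.append(f"Step {i} Thought: {s.thought}")
            lines.append(f"Step {i} Action: {s.action}")
            lines.append(f"Step {i} Observation: {s.observation}")
        return "\n".join(lines)
```

Anything missing from render(), such as a dropped shipped: true field, simply does not exist for the model on the next call.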
| Loop variant | Action format | Typical use |
|---|---|---|
| Native function calling | JSON tool_calls from OpenAI/Anthropic/Gemini APIs | Production agents with schema validation |
| Text ReAct (paper style) | Action: tool_name[arg] parsed by regex | Research repros, models without tool APIs |
| Hybrid | Thought in text + structured tool_calls | Debuggability with strict execution |
| Silent ReAct | Tool calls only; reasoning hidden or in separate channel | Latency-sensitive UIs; reasoning models with internal CoT |
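For the text ReAct variant, regex parsing might look like the sketch below. It assumes the paper-style "Action: tool_name[arg]" line and a "Final Answer:" prefix; the regexes and the Parsed type are hypothetical. Anything that matches neither returns None, so the orchestrator can reply with a parse-error observation instead of guessing.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical paper-style formats; one action per line.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)


@dataclass
class Parsed:
    tool: Optional[str] = None
    arg: Optional[str] = None
    final: Optional[str] = None


def parse_text_react(completion: str) -> Optional[Parsed]:
    """Return the first action or final answer; None means unparseable."""
    final = FINAL_RE.search(completion)
    action = ACTION_RE.search(completion)
    # If both appear, honor whichever the model wrote first.
    if final and (not action or final.start() < action.start()):
        return Parsed(final=final.group(1).strip())
    if action:
        return Parsed(tool=action.group(1), arg=action.group(2))
    return None
```

This fragility (brackets inside arguments, stray formatting) is the main reason native tool calls are preferred in production.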
Designing the scratchpad
Scratchpad quality determines whether ReAct works or hallucinates progress. Treat it as a first-class data structure, not a debug log.
What to include each turn
- User goal and immutable constraints (refund policy caps, read-only mode).
- Tool catalog summary — names and one-line descriptions, not full OpenAPI dumps every turn (define the full schemas once, in the system prompt).
- Numbered history of completed triples, e.g. Step 3 Observation: {...}.
- Running facts block — a bullet list extracted from observations (order ID, shipment status, refund eligibility). Update via template or a cheap summarizer every 2–3 steps.
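A template-based facts update could copy a whitelist of fields from each JSON observation into a running dict and render it as bullets. This is a sketch; the field names in FACT_KEYS are hypothetical refund-domain examples, and non-JSON observations such as error text are deliberately left out of the facts.

```python
import json

# Hypothetical fields a refund policy cares about; adjust per domain.
FACT_KEYS = ("order_id", "status", "shipped", "payment_method", "prior_refunds")


def update_facts(facts: dict, observation_json: str) -> dict:
    """Copy whitelisted fields from a tool's JSON output into a new facts dict."""
    try:
        data = json.loads(observation_json)
    except json.JSONDecodeError:
        return facts  # error text or prose: facts stay unchanged
    if not isinstance(data, dict):
        return facts
    updated = dict(facts)
    for key in FACT_KEYS:
        if key in data:
            updated[key] = data[key]
    return updated


def render_facts(facts: dict) -> str:
    """Bullet list for the prompt; JSON-encoding keeps true/false unambiguous."""
    return "\n".join(f"- {k}: {json.dumps(v)}" for k, v in facts.items())
```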
Observation formatting
Raw API responses waste tokens and bury signals. Normalize observations:
- Truncate long bodies with [truncated 2,400 chars] and a hash for retrieval if needed later.
- Highlight fields the policy cares about (shipped: true in bold or a top-level summary line).
- Prefix errors clearly, e.g. Observation (tool error): timeout after 5s, so the model does not invent success.
- Never let the model paraphrase an observation into the scratchpad without the original JSON attached — paraphrase drift causes wrong actions.
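The normalization rules above can be sketched as a single formatter. MAX_CHARS and the default highlighted field are illustrative assumptions, and the sketch assumes the full body is stored elsewhere keyed by the hash; that storage is not shown.

```python
import hashlib
import json

MAX_CHARS = 600  # illustrative per-observation budget


def format_observation(raw: str, *, error: bool = False,
                       highlight: tuple = ("shipped",)) -> str:
    """Normalize a tool result before it enters the scratchpad."""
    if error:
        # Unmistakable prefix so the model cannot read a failure as success.
        return f"Observation (tool error): {raw}"
    lines = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        # Top-level summary line for policy-relevant fields.
        flagged = [f"{k}: {json.dumps(data[k])}" for k in highlight if k in data]
        if flagged:
            lines.append("Key fields: " + ", ".join(flagged))
    body = raw
    if len(raw) > MAX_CHARS:
        digest = hashlib.sha256(raw.encode()).hexdigest()[:12]
        body = raw[:MAX_CHARS] + f" [truncated {len(raw) - MAX_CHARS:,} chars; sha256 {digest}]"
    lines.append(body)
    return "Observation: " + "\n".join(lines)
```

Note the formatter keeps the original JSON body alongside the summary line, rather than replacing it with a paraphrase.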
Context budget
ReAct contexts grow linearly with steps. Plan for compression after step 5: replace middle observations with the facts block only, or spill full transcripts to external memory keyed by step ID. Pair with context compression when tickets routinely exceed eight tool hops.
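One way to apply the "compress after step 5" rule is to keep only the most recent observations verbatim and replace older ones with pointers, leading with the facts block. The thresholds, dict keys, and the transcript store named in the pointer text are hypothetical; thoughts are omitted from this compressed view entirely.

```python
def render_compressed(steps: list, facts_block: str,
                      threshold: int = 5, keep_last: int = 2) -> str:
    """Render history; once past `threshold` steps, older observations become pointers."""
    compress = len(steps) > threshold
    lines = ["Facts so far:\n" + facts_block] if compress else []
    for i, s in enumerate(steps):
        if compress and i < len(steps) - keep_last:
            # Full text lives outside the context window, keyed by step ID.
            obs = f"[full text in transcript store, step {s['step']}]"
        else:
            obs = s["observation"]
        lines.append(f"Step {s['step']} Action: {s['action']}")
        lines.append(f"Step {s['step']} Observation: {obs}")
    return "\n".join(lines)
```

Keeping actions while dropping old observations preserves the record of what was tried, which the duplicate-action check discussed later relies on.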
Action parsing and execution
The orchestrator — not the model — executes tools. Parsing must be strict:
Validation pipeline
- Parse action into { name, arguments }.
- Reject unknown tool names; return an observation explaining the valid options.
- Validate arguments against JSON Schema; reject with schema errors as observation text (the model often self-corrects on the next turn).
- Check policy gates server-side: refund amount caps, PII export blocks, destructive actions requiring human approval.
- Execute with timeout, idempotency key, and audit log.
- Serialize observation back into scratchpad before the next LLM call.
For native function calling, the provider parses tool_calls; your job is to enforce the gates and never skip the final step of serializing the observation into the scratchpad. A common bug is streaming the assistant message to the user before observations are attached, so users see intent without verified facts.
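The pipeline after parsing might be sketched like this. Everything here is hypothetical: the TOOLS registry, the issue_refund tool, REFUND_CAP, and the in-memory audit log. The type check is a simplified stand-in for real JSON Schema validation (in Python, the third-party jsonschema package is a common choice), and idempotency keys are omitted for brevity.

```python
import concurrent.futures as cf
import json
from typing import Callable

# Hypothetical registry: tool name -> (callable, {argument: expected type}).
TOOLS: dict = {}
REFUND_CAP = 200.0  # illustrative policy limit
AUDIT_LOG: list = []
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def register(name: str, schema: dict) -> Callable:
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = (fn, schema)
        return fn
    return wrap


def run_action(name: str, arguments: dict, scratchpad: list, timeout_s: float = 5.0) -> str:
    """Validate, gate, execute, and serialize one already-parsed {name, arguments} action."""
    if name not in TOOLS:
        obs = f"Observation (tool error): unknown tool {name!r}; valid tools: {sorted(TOOLS)}"
    else:
        fn, schema = TOOLS[name]
        # Simplified stand-in for JSON Schema: required keys, exact types, no extras.
        problems = [f"{k} must be {t.__name__}" for k, t in schema.items()
                    if not isinstance(arguments.get(k), t)]
        problems += [f"unexpected argument {k}" for k in arguments if k not in schema]
        if problems:
            obs = "Observation (tool error): invalid arguments: " + "; ".join(problems)
        elif name == "issue_refund" and arguments["amount"] > REFUND_CAP:
            # Server-side gate, enforced regardless of what the model's thought claims.
            obs = "Observation (policy block): refund exceeds cap; requires_approval: true"
        else:
            future = _POOL.submit(fn, **arguments)
            try:
                obs = "Observation: " + json.dumps(future.result(timeout=timeout_s))
            except cf.TimeoutError:
                # The worker thread is not killed; only the wait is abandoned.
                obs = f"Observation (tool error): timeout after {timeout_s:g}s"
            except Exception as exc:
                obs = f"Observation (tool error): {exc}"
            AUDIT_LOG.append({"tool": name, "arguments": arguments, "observation": obs})
    # Always serialize the observation before the next model call.
    scratchpad.append(obs)
    return obs
```

Every rejection path still produces an observation, which is what lets the model self-correct on the next turn.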
Parallel vs sequential actions
ReAct is traditionally sequential: one action per thought. Some runtimes allow parallel tool calls when actions are independent (fetch CRM + fetch shipping in one turn). Merge observations in deterministic order and instruct the model that parallel results arrive as a single combined observation block.
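A minimal sketch of parallel execution with a deterministic merge, using only the standard library. The labels and zero-argument callables are assumptions standing in for real CRM and shipping clients; the key detail is iterating futures in request order rather than completion order.

```python
import concurrent.futures as cf
import json
from typing import Any, Callable


def run_parallel(calls: list[tuple[str, Callable[[], Any]]], timeout_s: float = 5.0) -> str:
    """Run independent (label, callable) tool calls concurrently and merge
    their results into one observation block, in the order they were requested."""
    parts = []
    with cf.ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [(label, pool.submit(fn)) for label, fn in calls]
        for label, fut in futures:  # request order, not completion order
            try:
                parts.append(f"[{label}] {json.dumps(fut.result(timeout=timeout_s))}")
            except Exception as exc:  # timeouts and tool failures become error lines
                parts.append(f"[{label}] (tool error) {exc!r}")
    return "Observation (parallel results):\n" + "\n".join(parts)
```

Leaving the with block waits for every worker, including any that timed out, so a real runtime would still want per-call timeouts inside the tool clients.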
Stop conditions and guardrails
Unbounded ReAct loops burn budget and escalate risk. Enforce hard stops:
- Max steps — typical support flows: 8–12; research agents: 20–30 with compression.
- Max wall time — independent of step count; kills hung API calls.
- Duplicate action detector — same tool + same args twice without state change triggers escalation or forced final answer.
- Final answer schema — require structured output for refunds (amount, reason code, ticket ID) validated before user delivery.
- Human-in-the-loop — pause the loop when an observation contains requires_approval: true from policy middleware.
When max steps is hit, run a recovery synthesizer prompt: given scratchpad + facts block, produce best-effort answer or honest escalation — never silently drop the ticket.
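The stop conditions and the recovery fallback fit in a small loop skeleton. The model_step, execute, and recover callables are hypothetical interfaces. As a simplification, this version treats any repeated (tool, args) pair as a duplicate; a fuller detector would reset after state-changing tools. The wall-time check runs only between steps, so hung calls still need their own timeouts.

```python
import time
from typing import Callable


def react_loop(model_step: Callable, execute: Callable, recover: Callable,
               max_steps: int = 10, max_wall_s: float = 60.0) -> str:
    """model_step(pad) -> ("final", text) or ("action", (tool, args_tuple));
    execute(tool, args) -> observation text; recover(pad) -> best-effort answer."""
    scratchpad: list = []
    seen: set = set()
    deadline = time.monotonic() + max_wall_s
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            break
        kind, payload = model_step(scratchpad)
        if kind == "final":
            return payload
        tool, args = payload
        if (tool, args) in seen:  # same tool + same args: no new information expected
            scratchpad.append(f"Guard: duplicate action {tool}{args}; stopping")
            break
        seen.add((tool, args))
        scratchpad.append(f"Action: {tool}{args}")
        scratchpad.append(execute(tool, args))
    # Any limit hit: hand off to the recovery synthesizer instead of dropping the ticket.
    return recover(scratchpad)
```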
Harbor Support ticket refactor
Harbor's refund agent previously used a single-shot function-calling prompt: the model picked tools in one turn without explicit thoughts. Failures were opaque in logs and hard to regression-test.
After ReAct refactor:
- System prompt mandates Thought before every Action except Final Answer; thoughts logged for QA review.
- Facts block updated by a template after each observation (order status, payment method, prior refunds).
- Policy gate blocks cancel_order when facts show shipped: true — the model receives an observation explaining the block instead of a silent failure.
- Max 10 steps with compression after step 6; full transcripts stored in ticket metadata, not context.
- Golden evals assert required tool order on 40 refund scenarios (check status before mutate).
Wrong-side-effect refunds dropped to zero in staging; mean steps per resolved ticket fell from 6.2 to 4.1 because explicit thoughts eliminated duplicate lookups.
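Harbor's actual gate and eval code are not described in detail, so the following is only a plausible sketch of the two ideas: a server-side gate keyed on the facts block, and a golden-eval helper that checks tool order in a recorded trace. The function names, the facts dict shape, and RULES are assumptions.

```python
from typing import Optional


def harbor_gate(tool: str, facts: dict) -> Optional[str]:
    """Return a blocking observation, or None to let the call proceed."""
    if tool == "cancel_order" and facts.get("shipped") is True:
        return "Observation (policy block): cancel_order refused because the order has shipped"
    return None


def tool_order_violations(trace: list, must_precede: dict) -> list:
    """Golden-eval helper: flag any mutating tool that ran before its required read."""
    violations = []
    for mutate, required_read in must_precede.items():
        if mutate in trace and required_read not in trace[:trace.index(mutate)]:
            violations.append(f"{mutate} called before {required_read}")
    return violations


# Hypothetical rule: fulfillment status must be checked before any cancellation.
RULES = {"cancel_order": "check_fulfillment_status"}
```

An eval suite would assert tool_order_violations(trace, RULES) == [] for each scenario's recorded tool names, in addition to checking the final answer.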
Technique decision table
| Approach | Best when | Skip when |
|---|---|---|
| ReAct interleaved loop | 3–8 tool hops; dynamic next step depends on last observation; support and ops bots | 15+ step pipelines where scratchpad collapse causes omissions |
| Plan-and-execute | Long workflows with auditable step lists; M&A and batch doc review | Sub-second single-lookup queries |
| Single-turn function calling | One or two tools with predictable shape (weather, calculator) | Multi-hop reasoning where order of operations matters |
| Fixed workflow DAG | Regulated SOPs with rare exceptions | High-variance user goals requiring adaptive tool choice |
| Agentic RAG only | Answer quality hinges on retrieval iterations, few side-effect tools | Mutating APIs (refunds, deploys, ticket updates) |
| Tree of Thoughts | Reasoning puzzles without production side effects | Live tool orchestration with cost and safety constraints |
Common pitfalls
- Observation skipping — next LLM call before tool result is in context; root cause of Harbor's shipped-order cancellations.
- Thought theater — verbose reasoning that consumes budget without changing actions; cap thought length in prompt.
- Unparseable actions — free-form Action lines that regex cannot handle; prefer native tool_calls in production.
- Tool catalog bloat — 40 tools in every turn; route to subsets by intent first.
- Trusting thoughts as facts — “I confirmed the order is pending” without a matching observation is a hallucination.
- No idempotency — retrying a failed HTTP call duplicates charges; pass idempotency keys from action args.
- User-visible scratchpad leaks — streaming thoughts that expose internal policy or customer PII.
- Missing eval on tool order — accuracy metrics on final text alone miss dangerous action sequences.
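For the idempotency pitfall, one approach is to derive a stable key from the ticket, tool, and canonicalized arguments, so a retry sends the same key. This only helps if the downstream API deduplicates on such a key (some payment APIs, such as Stripe, accept an Idempotency-Key header). The function name and key layout are illustrative assumptions. By design, two deliberately identical refunds on the same ticket would collide.

```python
import hashlib
import json


def idempotency_key(ticket_id: str, tool: str, arguments: dict) -> str:
    """Same ticket + tool + arguments always yields the same key, regardless of
    argument order, so retried calls can be deduplicated server-side."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{ticket_id}|{tool}|{canonical}".encode()).hexdigest()
```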
Production checklist
- Explicit Thought → Action → Observation cycle enforced by runtime.
- Observations appended to scratchpad before every subsequent model call.
- Facts block or summary updated after each observation.
- JSON Schema validation on all tool arguments.
- Server-side policy gates on destructive or financial tools.
- Max steps, max wall time, and duplicate-action detection configured.
- Recovery synthesizer or human escalation when limits hit.
- Full scratchpad archived to ticket/log storage; compressed view in context.
- Golden tests assert tool order and gate behavior, not just final strings.
- Idempotency keys and timeouts on every external API call.
- Thoughts excluded from user stream unless product intentionally shows them.
- Metrics: steps per task, tool error rate, observation truncation rate.
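The three metrics in the last item could be computed from per-task trace counters, as in this sketch. The TaskTrace fields are a hypothetical logging schema, the traces list is assumed non-empty, and the two rates are pooled across all tasks rather than averaged per task.

```python
from dataclasses import dataclass


@dataclass
class TaskTrace:
    steps: int
    tool_calls: int
    tool_errors: int
    observations: int
    truncated_observations: int


def loop_metrics(traces: list) -> dict:
    """Pooled loop health metrics over a non-empty batch of task traces."""
    calls = sum(t.tool_calls for t in traces)
    obs = sum(t.observations for t in traces)
    return {
        "steps_per_task": sum(t.steps for t in traces) / len(traces),
        "tool_error_rate": sum(t.tool_errors for t in traces) / calls if calls else 0.0,
        "truncation_rate": sum(t.truncated_observations for t in traces) / obs if obs else 0.0,
    }
```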
Key takeaways
- ReAct is a runtime loop, not a prompt trick.
- Observations are ground truth; thoughts are hypotheses.
- Format and compress the scratchpad or long tasks collapse.
- Pair interleaved reasoning with server-side policy gates.
- Graduate to plan-and-execute when step count and audit needs grow.
Related reading
- LLM function calling explained — schemas, APIs, and the multi-turn call loop
- LLM plan-and-execute explained — when to split planner and executor roles
- AI agents and tool use explained — broader agent patterns and guardrails
- LLM agent memory explained — episodic logs and vector stores beyond the scratchpad