
LLM ReAct agent loop explained

Harbor Support's tier-two refund agent had a dangerous blind spot. The model would emit a confident Thought: (“Customer wants cancellation; I should void the charge”) and immediately call cancel_order without ever reading the JSON from check_fulfillment_status. The runtime logged the observation, but the next model turn started from a truncated scratchpad that dropped the shipped: true field. Three chargebacks in one week traced to the same pattern: the assistant acted before it observed.

ReAct (Reasoning + Acting, from Yao et al.) is the interleaved control loop behind many production tool-using agents: the model writes a short reasoning step, selects a tool action, receives an observation from the environment, and repeats until it can answer or hits a guardrail. Unlike batch planners, ReAct lets each observation reshape the very next thought, which suits support tickets, live lookups, and short multi-hop tasks. This guide covers loop anatomy, scratchpad and parsing design, stop conditions, pairing with function calling and agent memory, the Harbor Support refactor, a decision table comparing ReAct with plan-and-execute and other techniques, pitfalls, and a production checklist.

What the ReAct loop is

A ReAct agent is not a single prompt; it is a runtime contract between the LLM and your orchestrator. Each iteration moves through three phases (Thought, Action, Observation), and the loop ends with a fourth, Termination:

Thought — the model explains what it knows, what is missing, and what it will try next. Thoughts should be brief; they are for the model's own chain, not user-facing prose.
Action — a structured tool invocation: function name plus arguments, or a legacy Action: search[query] string in research-style prompts.
Observation — the environment returns tool output, API JSON, retrieval chunks, or error text. Observations are facts, not model-generated.
Termination — when the model emits Final Answer: or a provider-native finish tool, the loop stops and the user sees the result.

The scratchpad is the concatenation of prior thought-action-observation triples appended to the prompt each turn. The model never “remembers” tools implicitly — it only sees what you serialize into context.
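The loop described above can be sketched in a few lines. This is a minimal illustration, not a production runtime: `Step`, `render`, `run_react`, and `scripted_model` are hypothetical names, and the "model" is a scripted stand-in for a real LLM call so the control flow can be followed without a provider API.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str
    action: str          # tool name, or "final_answer" to stop the loop
    arguments: dict
    observation: str = ""


def render(goal: str, history: List[Step]) -> str:
    """Serialize the goal and every prior triple; this is all the model sees."""
    lines = [f"Goal: {goal}"]
    for number, step in enumerate(history, start=1):
        lines.append(f"Step {number} Thought: {step.thought}")
        lines.append(f"Step {number} Action: {step.action} {json.dumps(step.arguments)}")
        lines.append(f"Step {number} Observation: {step.observation}")
    return "\n".join(lines)


def run_react(model: Callable[[str], Step], tools: Dict[str, Callable],
              goal: str, max_steps: int = 8) -> Tuple[str, List[Step]]:
    history: List[Step] = []
    for _ in range(max_steps):
        step = model(render(goal, history))
        if step.action == "final_answer":
            return step.arguments["text"], history
        # The orchestrator, not the model, runs the tool and records the result
        # before the next model call.
        step.observation = tools[step.action](**step.arguments)
        history.append(step)
    return "Escalated: step limit reached.", history


def check_fulfillment_status(order_id: str) -> str:
    # Stand-in for a real API call (assumed response shape).
    return json.dumps({"order_id": order_id, "shipped": True})


def scripted_model(prompt: str) -> Step:
    # Stand-in for an LLM: decides only from what is serialized in the prompt.
    if "Observation:" not in prompt:
        return Step("Need shipment status first.", "check_fulfillment_status",
                    {"order_id": "A100"})
    if '"shipped": true' in prompt:
        return Step("Order shipped; cancelling is unsafe.", "final_answer",
                    {"text": "Order already shipped; routing to returns."})
    return Step("Not shipped; safe to cancel.", "final_answer",
                {"text": "Order cancelled."})


answer, history = run_react(
    scripted_model,
    {"check_fulfillment_status": check_fulfillment_status},
    "Customer wants cancellation",
)
print(answer)
```

Because the scripted model only reads the rendered prompt, removing the observation line from `render` would change its answer, which is the failure mode from the opening example.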

Loop variant	Action format	Typical use
Native function calling	JSON tool_calls from OpenAI/Anthropic/Gemini APIs	Production agents with schema validation
Text ReAct (paper style)	`Action: tool_name[arg]` parsed by regex	Research repros, models without tool APIs
Hybrid	Thought in text + structured tool_calls	Debuggability with strict execution
Silent ReAct	Tool calls only; reasoning hidden or in separate channel	Latency-sensitive UIs; reasoning models with internal CoT

Designing the scratchpad

Scratchpad quality determines whether ReAct works or hallucinates progress. Treat it as a first-class data structure, not a debug log.

What to include each turn

User goal and immutable constraints (refund policy caps, read-only mode).
Tool catalog summary — names, one-line descriptions, not full OpenAPI dumps every turn (link schemas once in system prompt).
Numbered history of completed triples: Step 3 Observation: {...}.
Running facts block — a bullet list extracted from observations (order ID, shipment status, refund eligibility). Update via template or a cheap summarizer every 2–3 steps.
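One way to treat the scratchpad as a data structure is sketched below. The `Scratchpad` class and the `FACT_KEYS` tuple are illustrative assumptions; the facts block here is updated by a simple template (copy named fields from each observation) rather than by a summarizer.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed list of policy-relevant fields to lift into the facts block.
FACT_KEYS = ("order_id", "shipped", "refund_eligible")


@dataclass
class Scratchpad:
    goal: str
    constraints: List[str]
    tool_summaries: Dict[str, str]           # name -> one-line description
    triples: List[Tuple[str, str, str]] = field(default_factory=list)
    facts: Dict[str, object] = field(default_factory=dict)

    def record(self, thought: str, action: str, observation: dict) -> None:
        """Append one triple and refresh the facts block from the observation."""
        self.triples.append((thought, action, json.dumps(observation, sort_keys=True)))
        for key in FACT_KEYS:
            if key in observation:
                self.facts[key] = observation[key]

    def render(self) -> str:
        lines = [f"Goal: {self.goal}", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints]
        lines.append("Tools:")
        lines += [f"- {name}: {desc}" for name, desc in self.tool_summaries.items()]
        lines.append("Facts:")
        lines += [f"- {key}: {json.dumps(value)}" for key, value in self.facts.items()]
        for number, (thought, action, obs) in enumerate(self.triples, start=1):
            lines.append(f"Step {number} Thought: {thought}")
            lines.append(f"Step {number} Action: {action}")
            lines.append(f"Step {number} Observation: {obs}")
        return "\n".join(lines)


pad = Scratchpad(
    goal="Customer wants cancellation",
    constraints=["Refunds capped by policy", "No cancellation after shipment"],
    tool_summaries={"check_fulfillment_status": "Look up shipment state for an order"},
)
pad.record("Need shipment status.", "check_fulfillment_status",
           {"order_id": "A100", "shipped": True, "carrier": "example"})
print(pad.render())
```

Keeping the facts block separate from the raw triples means the policy-relevant fields survive even if older observations are later compressed out of context.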

Observation formatting

Raw API responses waste tokens and bury signals. Normalize observations:

Truncate long bodies with [truncated 2,400 chars] and a hash for retrieval if needed later.
Highlight fields the policy cares about (shipped: true in bold or a top-level summary line).
Prefix errors clearly: Observation (tool error): timeout after 5s so the model does not invent success.
Never let the model paraphrase an observation into the scratchpad without the original JSON attached — paraphrase drift causes wrong actions.
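The normalization rules above might look like the following sketch. The function names, the `POLICY_FIELDS` tuple, and the 400-character default are assumptions for illustration; the summary line is built from the original payload, never from a model paraphrase.

```python
import hashlib
import json

# Assumed policy-relevant fields and size cap; tune both per deployment.
POLICY_FIELDS = ("shipped", "refund_eligible")
MAX_BODY_CHARS = 400


def format_observation(payload: dict, max_chars: int = MAX_BODY_CHARS) -> str:
    """Render a tool result with a summary line first and a bounded JSON body."""
    body = json.dumps(payload, sort_keys=True)
    summary = ", ".join(
        f"{key}: {json.dumps(payload[key])}" for key in POLICY_FIELDS if key in payload
    )
    lines = []
    if summary:
        lines.append(f"Observation summary: {summary}")
    if len(body) > max_chars:
        # The hash is computed over the full body so the original can be
        # retrieved from storage later if needed.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
        dropped = len(body) - max_chars
        body = f"{body[:max_chars]} [truncated {dropped:,} chars, sha256:{digest}]"
    lines.append(f"Observation: {body}")
    return "\n".join(lines)


def format_tool_error(message: str) -> str:
    """Prefix errors so the model cannot mistake them for a successful result."""
    return f"Observation (tool error): {message}"


print(format_observation({"shipped": True, "notes": "x" * 1000}, max_chars=60))
print(format_tool_error("timeout after 5s"))
```

Putting the summary line first keeps fields such as `shipped` visible even when the body is truncated.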

Context budget

The scratchpad grows roughly linearly with the number of steps, and because the full scratchpad is resent on every turn, cumulative prompt tokens grow faster than linearly. Plan for compression after step 5: replace middle observations with the facts block only, or spill full transcripts to external memory keyed by step ID. Compression matters most when tickets routinely exceed eight tool hops.
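A minimal sketch of that compression step follows, assuming steps are stored as dictionaries with `id`, `action`, and `observation` keys (a hypothetical shape). Middle observations are moved to an archive keyed by step ID, and the facts block carries their content forward.

```python
import json
from typing import Dict, List


def compress_history(steps: List[dict], facts: dict, archive: Dict[int, str],
                     keep_first: int = 1, keep_last: int = 2) -> str:
    """Keep the first and last observations verbatim; archive the middle ones."""
    lines = ["Facts:"]
    lines += [f"- {key}: {json.dumps(value)}" for key, value in facts.items()]
    last_start = len(steps) - keep_last
    for index, step in enumerate(steps):
        if keep_first <= index < last_start:
            # Spill the full observation to external storage, keyed by step ID.
            archive[step["id"]] = step["observation"]
            observation = f"[archived under step {step['id']}]"
        else:
            observation = step["observation"]
        lines.append(f"Step {step['id']} Action: {step['action']}")
        lines.append(f"Step {step['id']} Observation: {observation}")
    return "\n".join(lines)


steps = [
    {"id": n, "action": f"tool_{n}", "observation": f"result {n}"} for n in range(1, 7)
]
archive: Dict[int, str] = {}
print(compress_history(steps, {"shipped": True}, archive))
```

With six steps and the defaults shown, steps 2 through 4 are archived and steps 1, 5, and 6 stay verbatim; histories no longer than `keep_first + keep_last` are left untouched.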

Action parsing and execution

The orchestrator — not the model — executes tools. Parsing must be strict:

Validation pipeline

Parse action into { name, arguments }.
Reject unknown tool names; return observation explaining valid options.
Validate arguments against JSON Schema; reject with schema errors as observation text (the model often self-corrects on the next turn).
Check policy gates server-side: refund amount caps, PII export blocks, destructive actions requiring human approval.
Execute with timeout, idempotency key, and audit log.
Serialize observation back into scratchpad before the next LLM call.
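The six steps above can be sketched as a single function. Everything here is an assumption made for illustration: the `TOOL_SPECS` table is a hand-rolled stand-in for real JSON Schema validation, the policy gate covers only the shipped-order case, and timeouts are omitted for brevity.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical tool table: required argument name -> expected Python type.
TOOL_SPECS = {
    "check_fulfillment_status": {"order_id": str},
    "cancel_order": {"order_id": str},
}


def handle_action(raw_action: str, facts: dict,
                  executors: Dict[str, Callable], audit_log: List[dict]) -> str:
    """Run one action through the pipeline and return the observation text."""
    # 1. Parse into name and arguments.
    try:
        action = json.loads(raw_action)
        name, args = action["name"], action["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 'Observation (tool error): action must be JSON with "name" and "arguments"'
    # 2. Reject unknown tools and list the valid ones.
    if name not in TOOL_SPECS:
        valid = ", ".join(sorted(TOOL_SPECS))
        return f"Observation (tool error): unknown tool {name!r}; valid tools: {valid}"
    # 3. Validate arguments (simplified; a schema validator would go here).
    if not isinstance(args, dict):
        return "Observation (tool error): arguments must be an object"
    problems = [
        f"{key} must be {expected.__name__}"
        for key, expected in TOOL_SPECS[name].items()
        if not isinstance(args.get(key), expected)
    ]
    if problems:
        return "Observation (tool error): " + "; ".join(problems)
    # 4. Server-side policy gate, independent of what the model believes.
    if name == "cancel_order" and facts.get("shipped") is True:
        return "Observation (policy block): cancel_order refused because shipped is true"
    # 5. Execute with an idempotency key and an audit record.
    payload = json.dumps({"tool": name, "args": args}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    audit_log.append({"tool": name, "arguments": args, "idempotency_key": key})
    try:
        result = executors[name](**args)
    except Exception as exc:  # surface failures as observations, never as silence
        return f"Observation (tool error): {exc}"
    # 6. Serialize the result for the scratchpad.
    return f"Observation: {json.dumps(result, sort_keys=True)}"


log: List[dict] = []
blocked = handle_action(
    '{"name": "cancel_order", "arguments": {"order_id": "A100"}}',
    {"shipped": True}, {}, log,
)
print(blocked)
```

Every exit path returns observation text, so a rejected action still gives the model something concrete to react to on the next turn.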

For native function calling, the provider parses tool_calls; your job is enforcing gates and never skipping the last step of the pipeline, serializing the observation back into the scratchpad. A common bug is streaming the assistant message to the user before observations are attached, so users see intent without verified facts.

Parallel vs sequential actions

ReAct is traditionally sequential: one action per thought. Some runtimes allow parallel tool calls when actions are independent (fetch CRM + fetch shipping in one turn). Merge observations in deterministic order and instruct the model that parallel results arrive as a single combined observation block.
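A sketch of the parallel case, assuming independent read-only tools and using threads from the standard library. The tool names `fetch_crm` and `fetch_shipping` and the `run_parallel` helper are hypothetical; the point is that results are sorted by tool name and arguments before merging, so the combined block does not depend on completion order.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_parallel(calls: List[Tuple[str, dict]], tools: Dict[str, Callable]) -> str:
    """Run independent tool calls concurrently and merge them into one observation."""
    if not calls:
        return "Observation (parallel results): none"
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(tools[name], **args) for name, args in calls]
        results = [future.result() for future in futures]
    # Sort by tool name, then serialized arguments, for a deterministic order.
    merged = sorted(
        zip(calls, results),
        key=lambda pair: (pair[0][0], json.dumps(pair[0][1], sort_keys=True)),
    )
    lines = ["Observation (parallel results):"]
    for (name, _args), result in merged:
        lines.append(f"- {name}: {json.dumps(result, sort_keys=True)}")
    return "\n".join(lines)


tools = {
    "fetch_shipping": lambda order_id: {"order_id": order_id, "shipped": True},
    "fetch_crm": lambda customer_id: {"customer_id": customer_id, "tier": "gold"},
}
print(run_parallel(
    [("fetch_shipping", {"order_id": "A100"}), ("fetch_crm", {"customer_id": "C7"})],
    tools,
))
```

This sketch lets exceptions from a tool propagate; a fuller version would convert each failure into a prefixed error line inside the combined block.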

Stop conditions and guardrails

Unbounded ReAct loops burn budget and escalate risk. Enforce hard stops:

Max steps — typical support flows: 8–12; research agents: 20–30 with compression.
Max wall time — independent of step count; kills hung API calls.
Duplicate action detector — same tool + same args twice without state change triggers escalation or forced final answer.
Final answer schema — require structured output for refunds (amount, reason code, ticket ID) validated before user delivery.
Human-in-the-loop — pause loop when observation contains requires_approval: true from policy middleware.

When max steps is hit, run a recovery synthesizer prompt: given scratchpad + facts block, produce best-effort answer or honest escalation — never silently drop the ticket.
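The first three hard stops can be combined in one small guard object, sketched below. `LoopGuard` and its `state_version` argument are assumptions: the version is taken to be a counter the orchestrator increments whenever the facts block changes, which is one way to express "same tool and same args without state change".

```python
import json
import time
from typing import Callable, Optional


class LoopGuard:
    """Tracks step count, wall time, and repeated actions for one ticket."""

    def __init__(self, max_steps: int = 10, max_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.steps = 0
        self.seen = set()

    def check(self, name: str, arguments: dict, state_version: int) -> Optional[str]:
        """Return a stop reason, or None if the action may proceed."""
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.clock() - self.started > self.max_seconds:
            return "max_wall_time"
        fingerprint = (name, json.dumps(arguments, sort_keys=True), state_version)
        if fingerprint in self.seen:
            return "duplicate_action"
        self.seen.add(fingerprint)
        self.steps += 1
        return None


guard = LoopGuard(max_steps=3)
print(guard.check("check_fulfillment_status", {"order_id": "A100"}, state_version=0))
print(guard.check("check_fulfillment_status", {"order_id": "A100"}, state_version=0))
```

Any non-None return would hand control to the recovery synthesizer or to human escalation rather than to another model turn.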

Harbor Support ticket refactor

Harbor's refund agent previously used a single-shot function-calling prompt: the model picked tools in one turn, and any reasoning it emitted was unstructured and unenforced. Failures were opaque in logs and hard to regression-test.

After ReAct refactor:

System prompt mandates Thought before every Action except Final Answer; thoughts logged for QA review.
Facts block updated by a template after each observation (order status, payment method, prior refunds).
Policy gate blocks cancel_order when facts show shipped: true — model receives observation explaining the block instead of a silent failure.
Max 10 steps with compression after step 6; full transcripts stored in ticket metadata, not context.
Golden evals assert required tool order on 40 refund scenarios (check status before mutate).

Wrong-side-effect refunds dropped to zero in staging; mean steps per resolved ticket fell from 6.2 to 4.1 because thoughts eliminated redundant lookups.
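A golden eval on tool order can be as small as the check below. This is a sketch, not Harbor's actual test suite: `issue_refund` is a hypothetical second mutating tool, and a trace is assumed to be the ordered list of tool names the agent called on one scenario.

```python
from typing import List

# Assumed tool names; only check_fulfillment_status and cancel_order appear in the guide.
MUTATING_TOOLS = {"cancel_order", "issue_refund"}
REQUIRED_BEFORE_MUTATION = "check_fulfillment_status"


def violates_tool_order(trace: List[str]) -> bool:
    """True if any mutating tool runs before the status check has been made."""
    checked = False
    for tool in trace:
        if tool == REQUIRED_BEFORE_MUTATION:
            checked = True
        elif tool in MUTATING_TOOLS and not checked:
            return True
    return False


good_trace = ["check_fulfillment_status", "cancel_order"]
bad_trace = ["cancel_order", "check_fulfillment_status"]
print(violates_tool_order(good_trace), violates_tool_order(bad_trace))
```

A check like this runs against recorded traces, so it catches a dangerous sequence even when the final answer text looks correct.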

Technique decision table

Approach	Best when	Skip when
ReAct interleaved loop	3–8 tool hops; dynamic next step depends on last observation; support and ops bots	15+ step pipelines where scratchpad collapse causes omissions
Plan-and-execute	Long workflows with auditable step lists; M&A and batch doc review	Sub-second single-lookup queries
Single-turn function calling	One or two tools with predictable shape (weather, calculator)	Multi-hop reasoning where order of operations matters
Fixed workflow DAG	Regulated SOPs with rare exceptions	High-variance user goals requiring adaptive tool choice
Agentic RAG only	Answer quality hinges on retrieval iterations, few side-effect tools	Mutating APIs (refunds, deploys, ticket updates)
Tree of Thoughts	Reasoning puzzles without production side effects	Live tool orchestration with cost and safety constraints

Common pitfalls

Observation skipping — next LLM call before tool result is in context; root cause of Harbor's shipped-order cancellations.
Thought theater — verbose reasoning that consumes budget without changing actions; cap thought length in prompt.
Unparseable actions — free-form Action lines that regex cannot handle; prefer native tool_calls in production.
Tool catalog bloat — 40 tools in every turn; route to subsets by intent first.
Trusting thoughts as facts — “I confirmed the order is pending” without a matching observation is a hallucination.
No idempotency — retrying a failed HTTP call duplicates charges; pass idempotency keys from action args.
User-visible scratchpad leaks — streaming thoughts that expose internal policy or customer PII.
Missing eval on tool order — accuracy metrics on final text alone miss dangerous action sequences.
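The idempotency pitfall is easy to demonstrate with a fake API. In this sketch, `idempotency_key` and `FakePaymentsApi` are hypothetical: the key is derived from the ticket, tool name, and arguments, and the fake server replays the stored result when it sees a key it has already processed.

```python
import hashlib
import json


def idempotency_key(ticket_id: str, tool: str, arguments: dict) -> str:
    """Derive a stable key so a retry of the same action reuses the same key."""
    payload = json.dumps(
        {"ticket": ticket_id, "tool": tool, "args": arguments}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FakePaymentsApi:
    """In-memory stand-in for a payments service that honors idempotency keys."""

    def __init__(self) -> None:
        self.refunds = {}

    def refund(self, key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of refunding again.
        if key not in self.refunds:
            self.refunds[key] = {
                "refund_id": f"r{len(self.refunds) + 1}",
                "amount_cents": amount_cents,
            }
        return self.refunds[key]


api = FakePaymentsApi()
key = idempotency_key("T-1", "issue_refund", {"order_id": "A100", "amount_cents": 2500})
first = api.refund(key, 2500)
retry = api.refund(key, 2500)   # e.g. after a timeout on the first attempt
print(first == retry, len(api.refunds))
```

The key must be computed by the orchestrator from stable inputs; generating a fresh random key on each retry would defeat the purpose.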

Production checklist

Explicit Thought → Action → Observation cycle enforced by runtime.
Observations appended to scratchpad before every subsequent model call.
Facts block or summary updated after each observation.
JSON Schema validation on all tool arguments.
Server-side policy gates on destructive or financial tools.
Max steps, max wall time, and duplicate-action detection configured.
Recovery synthesizer or human escalation when limits hit.
Full scratchpad archived to ticket/log storage; compressed view in context.
Golden tests assert tool order and gate behavior, not just final strings.
Idempotency keys and timeouts on every external API call.
Thoughts excluded from user stream unless product intentionally shows them.
Metrics: steps per task, tool error rate, observation truncation rate.

Key takeaways

ReAct is a runtime loop, not a prompt trick.
Observations are ground truth; thoughts are hypotheses.
Format and compress the scratchpad or long tasks collapse.
Pair interleaved reasoning with server-side policy gates.
Graduate to plan-and-execute when step count and audit needs grow.