Guide
LLM agent subagent delegation explained
Harbor Analytics’ private-market due-diligence agent was a single ReAct loop with forty-seven tools: SEC EDGAR search, news APIs, spreadsheet export, CRM updates, and a final memo writer. A typical engagement pulled twelve 10-K filings, six earnings transcripts, and two hundred news articles into one growing message thread. By step nineteen the parent context exceeded 180K tokens, the context budget allocator started evicting early tool results, and the model cited a revenue figure from the wrong fiscal year. Human reviewers flagged 39% of memos for at least one factual error tied to context confusion or tool misuse.
Subagent delegation splits work: a supervisor agent plans and synthesizes; specialized child agents run in isolated contexts with narrower tool allowlists, fixed output schemas, and independent budgets. The parent sees only structured summaries — not every raw PDF chunk. This guide covers when to delegate versus call a tool inline, spawn-and-wait versus parallel fan-out, permission scoping, parent-child trace linking, durable child runs, failure propagation, the Harbor Analytics refactor, a technique decision table, pitfalls, and a production checklist.
Delegation vs inline tools
Not every multi-step task needs a subagent. A subagent is justified when the
subtask has its own exploration loop (many tool calls whose
intermediate reasoning should not pollute the parent), a different skill
profile (code execution vs legal summarization), or a security
boundary (child may read customer PII; parent may not). Use a direct tool
call when the operation is atomic: lookup_ticker,
send_slack_message, run_sql_query with a bounded result.
Rule of thumb: if completing the subtask would add more than three tool rounds or more than ~8K tokens of raw observations to the parent, delegate. If the subtask shares the parent’s goal and tools, inline is cheaper and easier to debug.
Anti-patterns
- Subagent per tool call — overhead dominates; use tools.
- Subagent as a fancy prompt wrapper — if there is no loop, use structured output instead.
- Deep nesting without budgets — supervisor → researcher → fetcher → parser stacks explode cost.
Supervisor spawn patterns
Production systems converge on a small set of orchestration shapes:
Spawn-and-wait (synchronous delegation)
Parent calls delegate_research(task_spec); runtime starts a child
run, blocks until the child returns a result envelope, then
continues. Simple to reason about; parent wall-clock includes child runtime.
Best for one-off subtasks under a few minutes.
Parallel fan-out (map-reduce)
Parent spawns N children with disjoint scopes (one per competitor, one per document batch), waits on all handles, then merges. Requires explicit merge logic — often a second model call with only child summaries in context. Pair with concurrency caps so a enthusiastic supervisor does not launch fifty simultaneous research agents.
Pipeline handoff
Child A produces a typed artifact (extracted table JSON); Child B consumes it. The parent orchestrates stage gates but does not re-ingest raw text. Works well for ETL-style agent workflows.
Long-running child with callback
Child runs as a
durable workflow
for hours; parent checkpoints in waiting_child and resumes on
webhook. Essential when subtasks include human review inside the child.
Context isolation and result envelopes
The main benefit of delegation is context firewalling. The child retains full message history for its own loop; the parent receives only a contract-shaped payload:
{
"child_run_id": "run_child_7c2ā¦",
"status": "completed",
"task_spec_hash": "b91eā¦",
"summary": "Acme Corp FY2025 revenue $4.2B (+12% YoY) per 10-K Item 8.",
"structured_facts": [
{ "field": "revenue_fy2025", "value": 4200000000, "unit": "USD",
"source": "sec://0001234567-25-000042#item8", "confidence": 0.97 }
],
"citations": ["sec://0001234567-25-000042"],
"tokens_used": { "in": 84000, "out": 3200 },
"warnings": ["Q3 transcript unavailable; used 10-K only"]
}
Enforce a maximum summary length (tokens or characters) at the runtime boundary.
Reject child outputs that omit required fields or exceed bounds — trigger
one repair pass inside the child before surfacing failure to the parent.
Optionally store the child’s full trace in object storage; link by
child_run_id for auditors without loading it into the parent prompt.
Summarization inside the child should be loss-aware: separate “facts with citations” from “narrative prose.” Parents synthesize narrative; they should not re-derive numbers from prose alone.
Permission scoping and tool allowlists
Each subagent profile declares a capability manifest:
- Tool allowlist — e.g.
edgar_search,fetch_filing_sectiononly; nocrm_write. - Data scope — tenant ID, folder prefix, row-level filters injected by the runtime, not chosen by the model.
- Model tier — cheap model for extraction children; capable model for supervisor synthesis.
- Mutability class — read-only children by default; write tools require elevated profile and often HITL.
The parent’s delegate_* tool should accept only a
profile_id and structured task_spec — not raw tool
lists from the model. Prevents a compromised or confused supervisor from spawning
a child with admin credentials. Align with
sandbox execution
when children run code.
Budget allocation and termination
Subagents need independent ceilings enforced by the runtime, not the prompt:
- Token budget — hard stop when child exceeds
max_tokens_in + max_tokens_out; return partial envelope withstatus: budget_exceeded. - Step budget — max ReAct iterations per child;
surfaces
status: step_limitso parent can narrow the task. - Wall-clock timeout — kill runaway children; parent decides retry vs escalate.
- Cost attribution — tag child spend with
parent_run_idfor chargeback and trace dashboards.
The parent’s own termination policy should count delegated work: either pre-reserve budget for expected children or fail closed when remaining budget cannot cover a spawn request.
Parent-child tracing and debugging
Nested agents fail opaquely without hierarchy in your observability backend:
- Propagate
trace_idfrom parent; assign distinctrun_idper child. - Emit span events:
child_spawned,child_completed,child_failedwith profile, budgets, and envelope digest. - Surface a tree UI: operators expand the supervisor span to see which child hallucinated a citation.
- Log
task_spec_hashso replaying a child does not require reconstructing the parent’s full chat.
When a child fails, return structured errors to the parent
(error_code, retryable, user_message) rather
than a stack trace string the model might misinterpret as ground truth.
Failure propagation and partial results
Parallel fan-out rarely achieves 100% child success. Parent merge logic should handle:
- Partial success — memo proceeds with explicit gaps (“Competitor B data unavailable”).
- Retry policy — retry transient child failures once with narrowed scope; do not blindly re-spawn identical tasks.
- Supervisor escalation — after two child failures on the same subtask, pause for human routing.
- Idempotent child spawns — key
parent_run_id + subtask_idso resume does not duplicate expensive EDGAR pulls.
Avoid “best effort” silent drops: if a child fails, the parent prompt must include the failure reason in the merge context so the final output does not imply completeness.
Harbor Analytics refactor
Harbor replaced the monolithic due-diligence loop with a three-tier design:
- Supervisor — plans sections, spawns children, writes
final memo; tools:
delegate_*,merge_facts,export_memoonly. - Filing researcher — read-only SEC tools, returns
structured_factsenvelope per company per fiscal year. - News sentiment researcher — news APIs with 90-day window cap; returns theme tags and cited headlines, not full article bodies.
Parallel fan-out across four target companies cut wall-clock research phase 38 → 14 minutes. Parent context at memo-writing step 182K → 41K tokens (78% reduction). Human-flagged factual errors per memo 39% → 11%; on spot audits, verified claim accuracy 61% → 89%. Total token spend per engagement rose ~12% (extra supervisor merge calls) but reviewer rework hours fell enough to net positive margin on fixed-fee engagements.
Technique decision table
| Need | Prefer | Avoid |
|---|---|---|
| Single API lookup | Inline tool | Subagent spawn |
| Multi-document research | Read-only child + fact envelope | Parent ingests all PDFs |
| Compare N entities | Parallel children + merge | Sequential monolithic loop |
| Untrusted code execution | Sandboxed child profile | Parent with shell tool |
| Hour-long sub-workflow | Durable child + parent wait state | Blocking HTTP to child |
| Debug wrong citation | Child trace tree by run_id | Re-run entire supervisor |
| Strict output schema | Child with grammar/JSON validation | Free-form prose to parent |
Common pitfalls
- Returning raw child chat to parent — defeats isolation; use envelopes.
- Unbounded parallel spawns — rate limits and cost spikes; cap concurrency.
- Same tool allowlist on parent and child — no security or focus benefit.
- No task_spec versioning — cannot reproduce or audit child inputs.
- Parent rewrites child numbers — merge from
structured_facts, not prose. - Missing failure in merge — silent incomplete memos.
- Orphan child runs — parent crashes; children keep burning budget; tie to parent lifecycle.
- Depth > 2 without strong need — prefer flat supervisor + specialists.
Production checklist
- Define delegate-vs-inline heuristics (tool rounds, token volume, security).
- Ship typed result envelopes with citations, budgets, and status codes.
- Enforce per-child token, step, and wall-clock limits in the runtime.
- Scope tool allowlists and data filters by
profile_id, not model choice. - Propagate
trace_id; assign uniquerun_idper child. - Cap parallel spawns; queue excess work with backpressure.
- Idempotency-key child spawns on parent resume.
- Handle partial fan-out failures explicitly in parent merge prompts.
- Store full child traces out-of-band; link from parent span.
- Attribute child cost to
parent_run_idfor dashboards. - Validate child output schema before parent continues.
- Cancel or orphan-guard children when parent enters
cancelled.
Key takeaways
- Delegate when subtasks need their own loop, not for every tool call.
- Context isolation via envelopes keeps parents small and reduces cross-contamination.
- Harbor cut parent context 78% and raised verified accuracy 61% → 89%.
- Permission scoping and budgets belong in the runtime, not the system prompt.
- Trace hierarchies make nested failures debuggable without rerunning everything.
Related reading
- Multi-agent orchestration — broader coordination patterns beyond parent-child
- Durable agent execution — long-running children and parent wait states
- Agent observability — parent-child span trees and cost attribution
- ReAct agent loop — what each child runs internally