LLM agent planning and task decomposition explained
Harbor DevOps’ release agent ran as a pure ReAct loop: each step chose the next tool call from scratch. On a routine blue-green deploy it applied a database migration, started canary traffic, noticed a health-check failure, rolled back the app tier — but left the migration applied. The next deploy attempt hit schema drift; on-call engineers spent forty minutes reconciling state. Partial-failure incidents on agent-driven releases hit 34% over ninety days. The model wasn’t dumb; it lacked a plan that encoded dependencies, rollback scope, and verification gates before irreversible steps ran.
Task decomposition breaks a user goal into ordered, checkable subtasks. Planning materializes that decomposition as a durable artifact the runtime can execute, audit, and replan against. Replacing ad-hoc ReAct with a typed plan DAG, explicit preconditions, and post-step validators cut Harbor’s partial-failure rate to 6% and shaved 22 minutes off mean time to safe deploy. This guide covers upfront vs reactive planning, hierarchical task networks, plan schema design, plan-act-verify loops, replanning triggers, integration with subagent delegation and checkpointing, the Harbor DevOps refactor, a technique decision table, pitfalls, and a production checklist.
Planning vs reactive tool loops
A reactive agent asks “what should I do next?” after every observation. That works for short, reversible tasks. It fails when steps have hidden dependencies (migrate before scale-up), irreversible side effects (send customer email, delete S3 prefix), or parallelizable sub-work that one thread cannot hold in working memory. Planning front-loads structure: the agent (or a dedicated planner model) emits a graph of steps with inputs, outputs, success criteria, and rollback hints before expensive execution begins.
Plan-act-verify (PAV)
Production agents rarely run a plan blindly. Plan-act-verify alternates three phases: (1) produce or refresh a plan slice, (2) execute one or more steps, (3) run validators — automated checks, smaller judge models, or human approval — before advancing. Failed verification triggers replanning on the remaining subgraph rather than retrying the same tool call indefinitely. This pairs naturally with loop termination rules: stagnation detectors watch for plans that oscillate between the same two failed steps.
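The three phases can be sketched as a small loop. This is a minimal illustration, not Harbor's implementation: `Step`, `plan_act_verify`, and the `replan` callback are hypothetical names, and the sketch assumes a sequential plan where a failed validator hands the remaining steps back to the planner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    # Hypothetical step shape for this sketch only.
    id: str
    act: Callable[[], object]          # executes the step, returns an observation
    verify: Callable[[object], bool]   # validator: True if the observation is acceptable


def plan_act_verify(plan: List[Step], replan, max_replans: int = 3) -> List[str]:
    """Run steps in order; a failed verification replans the remaining steps."""
    done: List[str] = []
    replans = 0
    queue = list(plan)
    while queue:
        step = queue.pop(0)
        observation = step.act()                 # act
        if step.verify(observation):             # verify
            done.append(step.id)
            continue
        if replans >= max_replans:
            # Bounded replanning: stop instead of retrying the same call forever.
            raise RuntimeError(f"escalate: {step.id!r} failing after {replans} replans")
        replans += 1
        # plan: the planner replaces the failed step and everything after it.
        queue = list(replan(step, queue, done))
    return done
```

The replan cap is what keeps the loop from oscillating: once it is exhausted, the run raises and a human or a stagnation detector takes over.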
When planning pays off
- Five or more steps with ordering constraints
- Mix of read-only research and mutating actions
- Human-visible milestones (“here is what I will do”)
- Compliance or change-management audit trails
- Multi-agent workflows where children need scoped briefs
Skip heavy planning for single-tool lookups, one-shot summarization, or tightly bounded chat turns where latency dominates and rollback is trivial.
Decomposition strategies
Decomposition is the art of splitting a goal without losing the intent. Common patterns:
Sequential pipeline
Linear steps: gather requirements → draft → review → publish. Simple to execute; poor fit when middle steps can run in parallel.
Hierarchical task network (HTN)
Compound tasks expand into subtasks via methods until only primitive actions remain. “Deploy service” expands to build, test, migrate, canary, promote, each with its own sub-plan. HTN shines when you have a library of reusable methods per domain (finance close, incident response, data pipeline).
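A rough sketch of HTN expansion, assuming a method library stored as a plain dictionary. The `deploy_service` subtasks come from the example above; the `canary` sub-expansion and the names `METHODS` and `expand` are invented for illustration.

```python
# Hypothetical method library: compound task -> ordered subtasks.
METHODS = {
    "deploy_service": ["build", "test", "migrate", "canary", "promote"],
    "canary": ["shift_traffic_10pct", "check_health"],  # assumed sub-plan
}


def expand(task, methods, max_depth=3, _depth=0):
    """Expand a task until only primitive actions (no method entry) remain."""
    if task not in methods:
        return [task]                      # primitive action
    if _depth >= max_depth:
        raise ValueError(f"expansion of {task!r} exceeds depth {max_depth}")
    primitives = []
    for subtask in methods[task]:
        primitives.extend(expand(subtask, methods, max_depth, _depth + 1))
    return primitives
```

The depth cap doubles as a guard against a method that accidentally refers to itself.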
Goal-oriented decomposition
The planner states sub-goals (“confirm schema version matches main”) without prescribing tools. An executor model maps goals to tools at runtime. More flexible; harder to audit unless goals are typed and testable.
Map-reduce over documents
Split corpora by section or file, process in parallel (subagents), merge summaries upstream. Planning here is mostly partition boundaries and merge schema — not a deep HTN.
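As an illustration of how little planning this pattern needs, the sketch below fans sections out to a thread pool and merges the results. `summarize` stands in for a subagent call and `merge` for the merge schema; both are placeholders supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(sections, summarize, merge, max_workers=4):
    """Process each section in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the merge step sees
        # partial results in the same order as the partition boundaries.
        partials = list(pool.map(summarize, sections))
    return merge(partials)
```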
Plan representation and schema
Free-text bullet plans are fine for demos; production needs machine-readable structure stored alongside the run ID.
Minimal plan node fields
- id: stable step identifier
- description: human-readable intent
- depends_on: list of prerequisite step IDs (DAG edges)
- action_type: tool, subagent, human_gate, verify_only
- success_criteria: predicate or validator ref
- rollback_hint: optional compensating action
- status: pending, running, succeeded, failed, skipped
Serialize as JSON validated against a versioned schema. Version bumps when you add fields (e.g. estimated_tokens, approval_tier). Store the plan in your checkpoint store so restarts resume mid-DAG without re-planning from scratch unless inputs changed.
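One way to express the node fields is a dataclass that round-trips through JSON. This is a sketch under assumptions: the field names follow the list above, but `PlanNode`, `dump_plan`, `load_plan`, and the `schema_version` envelope are illustrative, and a production system would more likely validate against a real JSON Schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

SCHEMA_VERSION = 1  # assumed versioning scheme: bump when fields are added
ACTION_TYPES = {"tool", "subagent", "human_gate", "verify_only"}
STATUSES = {"pending", "running", "succeeded", "failed", "skipped"}


@dataclass
class PlanNode:
    id: str
    description: str
    action_type: str
    success_criteria: str
    depends_on: List[str] = field(default_factory=list)
    rollback_hint: Optional[str] = None
    status: str = "pending"

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def dump_plan(run_id: str, nodes: List[PlanNode]) -> str:
    """Serialize the plan with its run ID and schema version."""
    return json.dumps({
        "schema_version": SCHEMA_VERSION,
        "run_id": run_id,
        "nodes": [asdict(n) for n in nodes],
    })


def load_plan(text: str):
    """Parse a stored plan, rejecting versions this code does not understand."""
    doc = json.loads(text)
    if doc.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {doc.get('schema_version')}")
    return doc["run_id"], [PlanNode(**n) for n in doc["nodes"]]
```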
DAG vs checklist
A checklist is a degenerate DAG (total order). Use checklists when steps never parallelize. Use a DAG when independent research threads can fan out and join at a merge step. Cycles are forbidden; if the model emits a cycle, reject at validation and ask for a revised plan.
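Cycle rejection at validation can lean on Python's standard-library `graphlib` (Python 3.9+). The sketch assumes the plan's edges are available as a mapping from step ID to its depends_on list; `validate_dag` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(depends_on):
    """Return one valid execution order, or raise ValueError.

    depends_on maps each step id to the list of step ids it depends on.
    """
    known = set(depends_on)
    unknown = {dep for deps in depends_on.values() for dep in deps} - known
    if unknown:
        raise ValueError(f"unknown dependency ids: {sorted(unknown)}")
    try:
        # static_order yields prerequisites before the steps that need them.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle as a list of node ids.
        raise ValueError(f"plan contains a cycle: {err.args[1]}") from err
```

The error message can be fed back to the planner when asking for a revised plan.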
Token budget for plans
Large plans blow the planner’s context. Cap depth (max three expansion levels), collapse completed subtrees to one-line summaries in the active prompt, and delegate leaf exploration to subagents with narrow briefs tied to plan node IDs. Track plan size in your context budget allocator separately from chat history.
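Collapsing completed subtrees might look like the following. The nested-dict plan shape (`id`, `status`, optional `children`) and the function names are assumptions made for this sketch.

```python
def all_succeeded(node):
    """True when every leaf under this node has status 'succeeded'."""
    children = node.get("children", [])
    if not children:
        return node.get("status") == "succeeded"
    return all(all_succeeded(child) for child in children)


def count_leaves(node):
    children = node.get("children", [])
    return 1 if not children else sum(count_leaves(c) for c in children)


def render_for_prompt(node, depth=0):
    """Render the plan as indented lines, collapsing finished subtrees."""
    pad = "  " * depth
    children = node.get("children", [])
    if children and all_succeeded(node):
        # A finished subtree costs one line instead of one line per step.
        return [f"{pad}{node['id']}: succeeded ({count_leaves(node)} steps collapsed)"]
    lines = [f"{pad}{node['id']}: {node.get('status', 'pending')}"]
    for child in children:
        lines.extend(render_for_prompt(child, depth + 1))
    return lines
```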
Execution: binding plans to tools and agents
The executor reads the next runnable node (all dependencies succeeded), resolves action_type, and dispatches. Tool nodes map to registered functions with argument schemas; subagent nodes spawn children with a brief derived from the node description plus upstream outputs; human_gate nodes enqueue approval tasks and pause the run.
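A minimal sketch of the selection and dispatch step, assuming nodes are dictionaries with the fields from the plan schema. The `handlers` registry and both function names are hypothetical.

```python
def runnable(nodes):
    """Return pending nodes whose dependencies have all succeeded."""
    status = {n["id"]: n["status"] for n in nodes}
    return [
        n for n in nodes
        if n["status"] == "pending"
        and all(status[dep] == "succeeded" for dep in n["depends_on"])
    ]


def dispatch(node, handlers):
    """Route a node to the handler registered for its action_type."""
    handler = handlers.get(node["action_type"])
    if handler is None:
        raise ValueError(f"no dispatcher for action_type {node['action_type']!r}")
    return handler(node)
```

When several nodes are runnable at once, the executor is free to run them in parallel; that is where a DAG pays off over a checklist.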
Idempotency and side effects
Tag mutating nodes with idempotency keys derived from run_id and step_id. Replays after crash must not double-charge or double-send. Read-only nodes can retry freely. Document which steps are compensatable in rollback_hint. Harbor’s fix bundled migration + app deploy into one transactional segment with a shared rollback script referenced from both nodes.
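Deriving and enforcing the key could be sketched as below. The hashing scheme and the `OnceOnly` class are illustrative choices, and the in-memory dictionary stands in for whatever durable store the runtime actually uses.

```python
import hashlib
import json


def idempotency_key(run_id: str, step_id: str) -> str:
    """Derive a stable key from run_id and step_id."""
    # JSON-encoding the pair avoids collisions such as ("a:b", "c") vs ("a", "b:c").
    payload = json.dumps([run_id, step_id]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class OnceOnly:
    """Runs a side-effecting action at most once per (run_id, step_id)."""

    def __init__(self):
        # Assumption: a real system would persist this, not keep it in memory.
        self._results = {}

    def run(self, run_id, step_id, action):
        key = idempotency_key(run_id, step_id)
        if key not in self._results:
            self._results[key] = action()
        return self._results[key]   # replays return the recorded result
```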
Observability
Emit spans per plan node: planned start, tool latency, validator result, replan events. Dashboards show critical-path duration and which step types fail most. Link to agent tracing so on-call can diff planned vs executed paths.
Replanning triggers
Static plans rot when the world changes. Replan when:
- A validator fails after bounded retries
- New user input contradicts assumptions baked into the plan
- A tool returns NOT_FOUND or policy denial on a critical path
- Cost or step budget exceeds threshold mid-run
- External event (pager, webhook) marks a dependency stale
Replanning should pass executed history and failure diagnosis to the planner, not restart from zero. Constrain replans: “adjust only nodes downstream of step migrate_db” prevents thrashing. Cap replans per run (e.g. three) before escalating to human-in-the-loop or handoff.
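The scoping constraint and the replan cap can both be checked mechanically. In this sketch, `downstream_of` and `replan_allowed` are hypothetical helpers, and treating the failed step itself as editable is an assumption rather than something the text specifies.

```python
from collections import deque


def downstream_of(step_id, depends_on):
    """Return every step that transitively depends on step_id."""
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen, queue = set(), deque([step_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def replan_allowed(changed_ids, failed_step, depends_on, replans_used, max_replans=3):
    """Accept a replan only if it is within budget and within scope."""
    if replans_used >= max_replans:
        return False  # caller should escalate to a human or hand off
    # Assumption: the failed step itself may be rewritten along with its dependents.
    allowed = downstream_of(failed_step, depends_on) | {failed_step}
    return set(changed_ids) <= allowed
```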
Harbor DevOps refactor (case study)
Harbor’s agent previously interleaved kubectl, Terraform, and Slack tools without an explicit dependency graph. Rollbacks were best-effort prose (“undo last change”) the model interpreted inconsistently.
- Method library: HTN methods for blue_green_deploy, schema_migration, feature_flag_toggle with fixed subtask order.
- Plan validator: rejected plans missing health-check steps before traffic shift or missing rollback_hint on mutating nodes.
- Segment locks: migration + deploy marked one segment; rollback script ran as atomic pair on failure.
- Human gate: production promote required one-click approval tied to plan node promote_canary.
- Replan scope: failed canary triggered replan from run_integration_tests onward, not full redeploy from build.
Partial-failure incidents fell from 34% to 6%. Mean time to safe deploy dropped 22 minutes because operators saw the plan upfront and validators caught drift before full promotion.
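The two plan-validator rules from the refactor could be approximated as follows. This is not Harbor's code: the `mutating` flag and the `kind` tag (`health_check`, `traffic_shift`) are fields invented for this sketch, since the minimal schema does not say how the validator recognizes such nodes.

```python
def ancestors(node_id, by_id):
    """Return all transitive prerequisites of a node."""
    seen, stack = set(), list(by_id[node_id]["depends_on"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(by_id[current]["depends_on"])
    return seen


def validate_plan(nodes):
    """Return a list of rule violations; an empty list means the plan passes."""
    by_id = {n["id"]: n for n in nodes}
    errors = []
    for node in nodes:
        # Rule 1: every mutating node needs a rollback_hint.
        if node.get("mutating") and not node.get("rollback_hint"):
            errors.append(f"{node['id']}: mutating node has no rollback_hint")
        # Rule 2: a health check must precede any traffic shift.
        if node.get("kind") == "traffic_shift":
            upstream = ancestors(node["id"], by_id)
            if not any(by_id[a].get("kind") == "health_check" for a in upstream):
                errors.append(f"{node['id']}: no health check before traffic shift")
    return errors
```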
Technique decision table
| Scenario | Prefer | Avoid |
|---|---|---|
| Single lookup or summarize | Direct tool / one-shot | Full HTN planner |
| 3–10 ordered steps, audit needed | Structured plan DAG + PAV | Unbounded ReAct |
| Parallel research on many files | Map-reduce + subagents | One thread sequential read |
| Irreversible mutations | Plan with rollback_hint + human gate | Retry same tool blindly |
| Domain with known playbooks | HTN method library | LLM reinvents steps each run |
| Fast-changing user chat | Lightweight next-step plan | Freeze 20-step upfront plan |
| Long-running workflow (hours) | Checkpointed DAG + durable execution | In-memory plan only |
| Validator failure | Scoped replan downstream | Restart entire run |
Common pitfalls
- Plans as prose only — cannot resume, diff, or validate automatically.
- Over-planning latency — thirty-second planner on a two-step task.
- Missing success criteria — executor cannot tell done from stuck.
- No rollback on mutating steps — Harbor-style partial failures.
- Cyclic dependencies — validate DAG at ingest.
- Plan drift from execution — executed path not written back to store.
- Replan loops — cap replans; escalate to human.
- Giant monolithic plans — delegate leaves to subagents.
Production checklist
- Define versioned plan JSON schema with required node fields.
- Validate DAG acyclicity and dependency refs at plan ingest.
- Require success_criteria on mutating and gate nodes.
- Attach rollback_hint or compensating workflow to irreversible steps.
- Implement PAV loop with automated validators per critical node.
- Persist plan + status in checkpoint store for durable runs.
- Map each plan node's action_type (tool, subagent, human_gate, verify_only) to a dispatcher in the executor.
- Enforce idempotency keys on side-effecting tools.
- Cap replans; route to HITL or handoff on exhaustion.
- Emit trace spans per node with planned vs actual timing.
- Collapse completed subtrees in planner context to save tokens.
- Measure partial-failure rate and critical-path duration pre/post.
Key takeaways
- Planning is a durable contract for multi-step agent work, not a chat bullet list.
- Harbor cut partial-failure deploys from 34% to 6% with DAG plans and segment rollbacks.
- Plan-act-verify separates execution from validation and enables scoped replans.
- HTN method libraries encode domain playbooks the LLM should not reinvent.
- Bind plans to checkpointing and subagents for long, parallel workflows.
Related reading
- ReAct agent loop — reactive baseline planning improves upon
- Subagent delegation — executing plan leaf nodes in isolated contexts
- Durable agent execution — persisting plan state across restarts
- Agent loop termination — stagnation detection when replans fail