Guide
LLM agent permission scoping and tool approval gates explained
Harbor Platform Engineering shipped an internal “cluster hygiene”
agent to help SREs find stale deployments and unused ConfigMaps. The ReAct loop
had access to a single mega-tool: kubectl_apply, which
wrapped arbitrary YAML. During a Friday afternoon run, the model
misread a log snippet suggesting “payment-api looks idle” and
scaled the production payment Deployment to zero replicas. Checkout
failed for 14 minutes until an on-call engineer rolled
back. Post-incident review found 11% of agent sessions
in the prior month had executed at least one write mutation the
requesting human never intended — including namespace deletes in
staging that broke integration tests for two days.
Permission scoping limits which tools an agent can even see and which arguments pass server-side validation. Approval gates pause high-impact actions until a human or policy engine explicitly allows them. Together they turn agent tool access from “give the model root” into a production control plane. This guide covers capability manifests, session and subagent inheritance, tiered gate design, policy engines, integration with human-in-the-loop and sandboxed runtimes, the Harbor Platform refactor, a technique decision table, pitfalls, and a checklist.
Why least privilege matters for tool-using agents
Unlike a chatbot that only emits text, an agent with tools is a programmable actor inside your systems. Threats come from three directions:
- Model mistakes — wrong resource name, inverted flag, hallucinated ID.
- Prompt injection — untrusted content in retrieved docs or web pages tricks the model into exfiltrating or destroying data.
- Over-broad operator intent — a human asks for “clean up old stuff” without specifying blast radius.
Traditional app security assumes deterministic code paths. Agent loops
are stochastic: the same prompt can produce different tool sequences.
Least privilege shrinks the blast radius of any single
bad trajectory. Approval gates add a deterministic checkpoint before
irreversible mutations — the same pattern banks use for wire
transfers, applied to DELETE, refunds, and production
deploys.
Capability manifests and session scopes
A capability manifest is a machine-readable contract listing allowed tools, argument bounds, and environment targets for one agent session. Think IAM policy attached to a session token, not a paragraph in the system prompt.
Manifest structure
Production manifests typically include:
- Tool allowlist — e.g. list_pods, get_configmap, patch_deployment_replicas — never a generic run_shell.
- Argument constraints — namespace must match staging-*; replica count patch only increases or only decreases within ±2.
- Rate and budget caps — max writes per hour, max dollar amount per Stripe refund tool call.
- Data classification — PII fields redacted from tool responses; certain tables unreachable.
- Expiry — session scope valid for 30 minutes, then re-auth.
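A manifest with the fields listed above could be sketched as a small immutable record. This is a minimal illustration, not Harbor's actual format: the class name, field names, the `m-001` ID, and the write cap of 20 per hour are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CapabilityManifest:
    """Hypothetical manifest shape; every field name here is illustrative."""
    manifest_id: str
    allowed_tools: frozenset[str]
    namespace_pattern: str      # glob such as "staging-*"
    max_replica_delta: int      # largest replica change allowed per patch
    max_writes_per_hour: int    # assumed budget cap
    expires_at: datetime

    def permits(self, tool: str, namespace: str, now: datetime) -> bool:
        """True only if the session is unexpired and tool and namespace match."""
        if now >= self.expires_at:
            return False
        if tool not in self.allowed_tools:
            return False
        return fnmatchcase(namespace, self.namespace_pattern)


def new_session_manifest(now: datetime) -> CapabilityManifest:
    # The 30-minute expiry mirrors the example in the list above.
    return CapabilityManifest(
        manifest_id="m-001",
        allowed_tools=frozenset(
            {"list_pods", "get_configmap", "patch_deployment_replicas"}
        ),
        namespace_pattern="staging-*",
        max_replica_delta=2,
        max_writes_per_hour=20,
        expires_at=now + timedelta(minutes=30),
    )
```

Freezing the dataclass is a deliberate choice in this sketch: a scope change then has to produce a new manifest object rather than mutate the one a session already holds.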
The orchestrator injects the manifest at session start. The tool server
re-validates every call against the manifest —
never trust the model’s JSON alone. When subagents spawn, child scopes
must be a subset of the parent; a research subagent should not inherit
delete_namespace just because the parent held it for a different phase.
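The subset rule for subagents can be enforced with a plain set comparison at spawn time. The function below is a hypothetical helper, assuming scopes are represented as sets of tool names; real manifests would also need to compare argument bounds and expiry.

```python
def derive_child_scope(
    parent_tools: frozenset[str], requested: frozenset[str]
) -> frozenset[str]:
    """Return the child's tool set, refusing anything the parent lacks."""
    extra = requested - parent_tools
    if extra:
        raise PermissionError(f"child scope exceeds parent: {sorted(extra)}")
    return requested


# A research subagent asks only for what it needs, so delete_namespace
# stays with the parent even though the parent holds it.
parent = frozenset({"list_pods", "get_configmap", "delete_namespace"})
research_child = derive_child_scope(parent, frozenset({"list_pods"}))
```

Note that the child receives what it requested, not a copy of the parent scope. The orchestrator decides the request, so the narrowing happens even if the parent never asks for it.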
Dynamic scope elevation
Sometimes a session starts read-only and later needs a write. Prefer explicit elevation: the human clicks “Enable staging writes for 15 min” or completes step-up auth. The orchestrator issues a new manifest revision; the old read-only token is revoked. Avoid silently expanding scope because the model asked nicely in chat.
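One way to picture explicit elevation is a session object that holds exactly one live token at a time. The sketch below is an assumption-laden illustration: `SessionScopes`, `ScopeToken`, and the in-memory revocation set are invented for the example, and a real system would persist revocations and verify the human approval itself.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopeToken:
    token: str
    revision: int
    writable: bool
    expires_at: datetime


class SessionScopes:
    """Tracks the single live token for one session (hypothetical design)."""

    def __init__(self, now: datetime) -> None:
        self.revoked: set[str] = set()
        # Sessions start read-only at revision 1.
        self.current = ScopeToken(
            secrets.token_hex(8), 1, False, now + timedelta(minutes=30)
        )

    def elevate(
        self, approved_by_human: bool, now: datetime, minutes: int = 15
    ) -> ScopeToken:
        """Issue a new writable revision and revoke the old token."""
        if not approved_by_human:
            raise PermissionError("elevation requires explicit human approval")
        self.revoked.add(self.current.token)
        self.current = ScopeToken(
            secrets.token_hex(8),
            self.current.revision + 1,
            True,
            now + timedelta(minutes=minutes),
        )
        return self.current

    def is_valid(self, token: str, now: datetime) -> bool:
        return (
            token == self.current.token
            and token not in self.revoked
            and now < self.current.expires_at
        )
```

The `approved_by_human` flag stands in for whatever step-up auth or button click the orchestrator actually records. The model's own request never sets it.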
Tiered approval gates
Not every tool call needs a human. Gates should be tiered by impact, latency cost, and reversibility.
| Tier | Examples | Gate behavior |
|---|---|---|
| 0 — Auto | Read APIs, search, list resources | Execute immediately; log to audit trail |
| 1 — Policy | Staging patch, send internal Slack, create draft PR | Automated policy engine checks manifest + args; execute if pass |
| 2 — Async human | Production replica change, refund > $50, customer email send | Queue approval UI; agent pauses with durable checkpoint |
| 3 — Sync human | Delete production data, wire transfer, IAM role grant | Block until live approver confirms; timeout cancels run |
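The tier table can be encoded as a lookup the executor consults before every call. This is a minimal sketch: the tool names in the registry and the action strings are hypothetical, and defaulting unknown tools to the strictest tier is a design assumption rather than something the table prescribes.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTO = 0
    POLICY = 1
    ASYNC_HUMAN = 2
    SYNC_HUMAN = 3


# Hypothetical tool-to-tier registry; real entries come from your own inventory.
TOOL_TIERS = {
    "list_resources": Tier.AUTO,
    "patch_replicas_staging": Tier.POLICY,
    "patch_replicas_production": Tier.ASYNC_HUMAN,
    "delete_production_data": Tier.SYNC_HUMAN,
}

# Illustrative action names, one per row of the table above.
GATE_ACTIONS = {
    Tier.AUTO: "execute_and_log",
    Tier.POLICY: "run_policy_check",
    Tier.ASYNC_HUMAN: "checkpoint_and_queue_approval",
    Tier.SYNC_HUMAN: "block_for_live_approver",
}


def gate_for(tool: str) -> str:
    """Map a tool to its gate action; unknown tools get the strictest gate."""
    tier = TOOL_TIERS.get(tool, Tier.SYNC_HUMAN)
    return GATE_ACTIONS[tier]
```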
Gate decisions should be attached to the tool call record, not free-floating chat. Store: requested args hash, approver identity, policy version, and TTL. On resume, the executor replays the approved call exactly — the model cannot amend args after approval without triggering a new gate. Pair tier-2+ gates with durable checkpoints so timeouts do not lose work or double-apply.
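Binding an approval to the exact call can be done by hashing a canonical form of the tool name and arguments. The sketch below assumes an approval record shaped like a dict with `args_hash` and `expires_at` keys; those key names, and the placeholder return value standing in for the real tool execution, are illustrative.

```python
import hashlib
import json


def args_hash(tool: str, args: dict) -> str:
    """Stable digest of the exact call: sorted keys, compact separators."""
    canonical = json.dumps(
        {"tool": tool, "args": args}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def execute_if_approved(tool: str, args: dict, approval: dict, now: float) -> str:
    """Run only when the approval is unexpired and matches this exact call."""
    if now > approval["expires_at"]:
        raise PermissionError("approval expired; new gate required")
    if args_hash(tool, args) != approval["args_hash"]:
        raise PermissionError("args changed after approval; new gate required")
    # Placeholder for the real tool invocation.
    return f"executed {tool}"


approved_args = {"namespace": "prod", "deployment": "payment-api", "replicas": 3}
approval = {
    "args_hash": args_hash("patch_replicas", approved_args),
    "approver": "oncall@example.com",   # illustrative identity
    "policy_version": "v1",             # illustrative version tag
    "expires_at": 1_000.0,              # TTL as a timestamp, in seconds
}
```

Sorting keys before hashing means two calls with the same arguments in a different order produce the same digest, while any change to a value produces a different one and forces a new gate.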
Policy engine vs human-only
Tier-1 policy engines evaluate structured rules: “patch allowed
if namespace label env=staging and field is
replicas and delta ≤ 3.” Use Open Policy Agent
(Rego), Cedar, or an internal DSL. Humans set policy; machines enforce
at millisecond latency. Reserve humans for ambiguous business judgment
and tier-3 catastrophes.
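The quoted rule is simple enough to express as a pure function. The version below is a Python stand-in for illustration only; in production the same logic would more likely live in Rego, Cedar, or an internal DSL, and the function and parameter names here are assumptions.

```python
def staging_replica_patch_allowed(
    namespace_labels: dict, field: str, current: int, requested: int
) -> bool:
    """Allow a patch only in staging, only on replicas, only within delta 3."""
    return (
        namespace_labels.get("env") == "staging"
        and field == "replicas"
        and requested >= 0
        and abs(requested - current) <= 3
    )
```

Because the function takes only structured inputs and returns a boolean, it can be unit-tested and versioned like any other code, which is what lets the gate record a policy version alongside each decision.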
Implementation patterns
Split read and write tool surfaces
Harbor’s pre-refactor kubectl_apply combined read
and write. The refactor split into get_resource,
list_resources, patch_replicas_staging, and
apply_manifest_staging — each with a narrow JSON
Schema per tool schema design best practices. Production writes required
a separate manifest profile most sessions never received.
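The split might look like the sketch below: profiles decide which tools a session can see, and each write tool carries its own narrow schema. The profile names, the schema contents, and the hand-written checker are assumptions for illustration. A real tool server would more likely run a full JSON Schema validator.

```python
import re

READ_TOOLS = frozenset({"get_resource", "list_resources"})
STAGING_WRITE_TOOLS = frozenset({"patch_replicas_staging", "apply_manifest_staging"})

# Hypothetical profile names; most sessions would only ever get "read_only".
PROFILES = {
    "read_only": READ_TOOLS,
    "staging_write": READ_TOOLS | STAGING_WRITE_TOOLS,
}

# Assumed narrow schema for one typed write tool.
PATCH_REPLICAS_STAGING_SCHEMA = {
    "type": "object",
    "properties": {
        "namespace": {"type": "string", "pattern": "^staging-"},
        "deployment": {"type": "string"},
        "replicas": {"type": "integer", "minimum": 0},
    },
    "required": ["namespace", "deployment", "replicas"],
    "additionalProperties": False,
}


def check_patch_args(args: dict) -> bool:
    """Hand-rolled check mirroring the schema above (not a general validator)."""
    props = PATCH_REPLICAS_STAGING_SCHEMA["properties"]
    if set(args) != set(props):  # every key required, no extras allowed
        return False
    replicas = args["replicas"]
    return (
        isinstance(args["namespace"], str)
        and re.match(props["namespace"]["pattern"], args["namespace"]) is not None
        and isinstance(args["deployment"], str)
        and isinstance(replicas, int)
        and not isinstance(replicas, bool)  # bool is a subclass of int
        and replicas >= props["replicas"]["minimum"]
    )
```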
Server-side enforcement, not prompt pleading
System prompts that say “never delete production” are
insufficient against injection and drift. The tool server must enforce
the rule itself, returning 403 PERMISSION_DENIED with a structured
observation the model can reason over. Log denials; spikes indicate
mis-scoping or attacks.
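A structured denial might look like the following. This is a sketch with assumed field names (`status`, `code`, `reason`) and an in-memory list standing in for real denial telemetry. The success branch returns a placeholder instead of running a tool.

```python
DENIAL_LOG: list[dict] = []  # stand-in for a real telemetry sink


def handle_tool_call(allowed_tools: frozenset[str], tool: str, args: dict) -> dict:
    """Check the call server-side and return a machine-readable result."""
    if tool not in allowed_tools:
        denial = {
            "status": 403,
            "code": "PERMISSION_DENIED",
            "tool": tool,
            "reason": "tool is not in this session's manifest",
        }
        DENIAL_LOG.append(denial)  # denial spikes are worth alerting on
        return denial
    # Placeholder for dispatching to the real tool implementation.
    return {"status": 200, "tool": tool, "result": f"ran {tool}"}
```

Returning the denial as data rather than raising lets the agent loop treat it as an observation and pick a different plan, while the log entry gives operators the signal.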
Audit and blast-radius drills
Export every tool invocation with its manifest ID, gate tier, and outcome. Run monthly drills: replay injected prompts against staging agents and verify zero tier-3 executions without approval. Cross-reference prompt injection defenses: scoping is the last line of defense when content filters fail.
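The drill's pass condition can be checked directly against the exported audit records. The record keys used below (`manifest_id`, `gate_tier`, `outcome`, `approver`) are an assumed export shape, not a fixed format.

```python
def unapproved_tier3(records: list[dict]) -> list[dict]:
    """Return tier-3 calls that executed without a recorded approver."""
    return [
        r
        for r in records
        if r["gate_tier"] == 3
        and r["outcome"] == "executed"
        and not r.get("approver")
    ]


# Illustrative drill export: one clean read, one approved tier-3 call,
# one denied tier-3 attempt, and one violation.
drill_records = [
    {"manifest_id": "m-1", "tool": "list_resources", "gate_tier": 0,
     "outcome": "executed", "approver": None},
    {"manifest_id": "m-2", "tool": "delete_namespace", "gate_tier": 3,
     "outcome": "executed", "approver": "alice"},
    {"manifest_id": "m-3", "tool": "delete_namespace", "gate_tier": 3,
     "outcome": "denied", "approver": None},
    {"manifest_id": "m-4", "tool": "delete_namespace", "gate_tier": 3,
     "outcome": "executed", "approver": None},
]

violations = unapproved_tier3(drill_records)
```

A drill passes when `violations` is empty. In this sample it contains the `m-4` record, which is the kind of result that should fail the drill and open an investigation.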
Harbor Platform Engineering refactor
After the payment-api incident, Harbor rebuilt the hygiene agent around three principles:
- Default read-only — new sessions get list/get tools only across all clusters.
- Staging write profile — elevation grants patch on namespaces tagged env=staging; production patches require tier-2 ticket linked in manifest.
- No generic apply — YAML apply replaced by typed operations with schema validation.
They added a Slack approval bot for tier-2: the agent posts a diff summary; approvers react with a signed emoji. Checkpoints resume the ReAct loop after approval. Results over six weeks:
- Unauthorized mutations (unapproved writes outside scope) fell from 11% to 0.3% of sessions.
- Mean time to complete hygiene tasks rose 4.2 → 6.1 minutes — acceptable trade for SRE trust.
- Zero production outages attributed to the agent post-refactor.
- On-call survey: confidence in agent tooling 2.1 → 4.4 / 5.
The team noted that over-scoping reads still mattered: an agent that could read Secrets across tenants was re-scoped to per-team vault paths even though reads were tier-0.
Technique decision table
| Approach | Best for | Weak when |
|---|---|---|
| Open tool catalog + prompt rules | Local demos, read-only research | Any production write path |
| Static manifest per workflow | Fixed SOP automations (refund bot, report generator) | Exploratory tasks needing scope changes |
| Tiered gates + policy engine | Platform agents, finance ops, infra hygiene | Sub-second latency requirements on hot paths |
| Human-only execution | Tier-3 irreversible actions | High-volume low-risk reads |
| Separate agent per domain | Strong isolation (billing vs infra) | Cross-domain workflows without supervisor orchestration |
Common pitfalls
- Mega-tools — one run_sql or execute_code bypasses all scoping.
- Client-side-only checks — attackers and models call the API directly.
- Approving summaries instead of args — human sees “scale payment” but approved hash was for a different deployment.
- Child subagent inherits parent god-mode — delegation without scope shrink.
- Tier mismatch — treating production delete as tier-1 because “it’s rare.”
- No denial telemetry — you discover injection weeks later in logs nobody reads.
- Permanent elevation tokens — defeats time-bounded session design.
Designer and engineer checklist
- Inventory every tool; eliminate generic shell/SQL/apply wrappers where possible.
- Define capability manifests per workflow with argument JSON Schema bounds.
- Enforce manifests server-side on every invocation; return structured denials.
- Classify tools into tiers 0–3 by reversibility and business impact.
- Implement policy engine for tier-1; human UI for tier-2+.
- Bind approval to args hash; reject post-approval mutations.
- Checkpoint agent state before tier-2+ waits; support timeout cancel.
- Shrink subagent scopes relative to parent; never widen silently.
- Audit log manifest ID, gate outcome, approver, and latency.
- Run monthly injection drills; alert on denial rate spikes.
Key takeaways
- Agents are actors, not chatbots — scope tools like production service accounts.
- Manifests + server enforcement beat prompt-only guardrails against mistakes and injection.
- Tiered gates balance velocity (reads) with safety (writes and deletes).
- Harbor cut unauthorized mutations from 11% to 0.3% of sessions after splitting mega-tools and adding staging/production profiles.
- Subagent delegation requires scope inheritance rules — children get subsets, not copies.
Related reading
- Human-in-the-loop — approval UX and pause/resume patterns
- Sandbox execution — isolating runtime blast radius
- Subagent delegation — parent-child scope inheritance
- Prompt injection defense — content-side controls before tool calls