
LLM agent permission scoping and tool approval gates explained

Harbor Platform Engineering shipped an internal “cluster hygiene” agent to help SREs find stale deployments and unused ConfigMaps. The ReAct loop had access to a single mega-tool: kubectl_apply, which wrapped arbitrary YAML. During a Friday afternoon run, the model misread a log snippet suggesting “payment-api looks idle” and scaled the production payment Deployment to zero replicas. Checkout failed for 14 minutes until an on-call engineer rolled back. Post-incident review found 11% of agent sessions in the prior month had executed at least one write mutation the requesting human never intended — including namespace deletes in staging that broke integration tests for two days.

Permission scoping limits which tools an agent can even see and which arguments pass server-side validation. Approval gates pause high-impact actions until a human or policy engine explicitly allows them. Together they turn agent tool access from “give the model root” into a production control plane. This guide covers capability manifests, session and subagent inheritance, tiered gate design, policy engines, integration with human-in-the-loop and sandboxed runtimes, the Harbor Platform refactor, a technique decision table, pitfalls, and a checklist.

Why least privilege matters for tool-using agents

Unlike a chatbot that only emits text, an agent with tools is a programmable actor inside your systems. Threats come from three directions:

  • Model mistakes — wrong resource name, inverted flag, hallucinated ID.
  • Prompt injection — untrusted content in retrieved docs or web pages tricks the model into exfiltrating or destroying data.
  • Over-broad operator intent — a human asks for “clean up old stuff” without specifying blast radius.

Traditional app security assumes deterministic code paths. Agent loops are stochastic: the same prompt can produce different tool sequences. Least privilege shrinks the blast radius of any single bad trajectory. Approval gates add a deterministic checkpoint before irreversible mutations — the same pattern banks use for wire transfers, applied to DELETE, refunds, and production deploys.

Capability manifests and session scopes

A capability manifest is a machine-readable contract listing allowed tools, argument bounds, and environment targets for one agent session. Think IAM policy attached to a session token, not a paragraph in the system prompt.

Manifest structure

Production manifests typically include:

  • Tool allowlist — e.g. list_pods, get_configmap, patch_deployment_replicas — never a generic run_shell.
  • Argument constraints — namespace must match staging-*; a replica-count patch may be limited to one direction (increase-only or decrease-only) and to a change of at most ±2.
  • Rate and budget caps — max writes per hour, max dollar amount per Stripe refund tool call.
  • Data classification — PII fields redacted from tool responses; certain tables unreachable.
  • Expiry — session scope valid for 30 minutes, then re-auth.
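The manifest fields listed above can be modeled as a small immutable record that the tool server checks on every call. The sketch below is an illustration under stated assumptions. CapabilityManifest, validate_call, WRITE_TOOLS and the field names are all hypothetical, not a standard schema. It covers the allowlist, a glob namespace constraint, a ±N replica bound, an hourly write budget, and expiry. The current replica count is assumed to be read from the cluster by the server rather than taken from the model's arguments.

```python
from __future__ import annotations

import fnmatch
import time
from dataclasses import dataclass

# Hypothetical set of mutating tools; every other allowlisted tool is treated as read-only.
WRITE_TOOLS = {"patch_deployment_replicas"}


class PermissionDenied(Exception):
    """Raised by the tool server; surfaced to the model as a structured denial."""


@dataclass(frozen=True)
class CapabilityManifest:
    # Illustrative field names, not a standard schema.
    manifest_id: str
    allowed_tools: frozenset[str]
    namespace_pattern: str      # glob, e.g. "staging-*"
    max_replica_delta: int      # bound on |new - current| for replica patches
    max_writes_per_hour: int
    expires_at: float           # Unix time; re-auth required afterwards


def validate_call(m: CapabilityManifest, tool: str, args: dict, *,
                  writes_last_hour: int = 0,
                  current_replicas: int | None = None,
                  now: float | None = None) -> None:
    """Server-side check run on every call; raises PermissionDenied on any violation."""
    now = time.time() if now is None else now
    if now >= m.expires_at:
        raise PermissionDenied("manifest expired; re-auth required")
    if tool not in m.allowed_tools:
        raise PermissionDenied(f"tool {tool!r} not in allowlist")
    namespace = args.get("namespace", "")
    if not fnmatch.fnmatchcase(namespace, m.namespace_pattern):
        raise PermissionDenied(f"namespace {namespace!r} does not match {m.namespace_pattern!r}")
    if tool in WRITE_TOOLS:
        if writes_last_hour >= m.max_writes_per_hour:
            raise PermissionDenied("hourly write budget exhausted")
        replicas = args.get("replicas")
        if not isinstance(replicas, int):
            raise PermissionDenied("replicas must be an integer")
        # current_replicas comes from the cluster via the server, never from model args.
        if current_replicas is None:
            raise PermissionDenied("current replica count unavailable")
        delta = replicas - current_replicas
        if abs(delta) > m.max_replica_delta:
            raise PermissionDenied(f"replica delta {delta:+d} exceeds ±{m.max_replica_delta}")
```

Raising a single exception type keeps the server's response uniform: every violation becomes one structured denial, whichever rule tripped.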

The orchestrator injects the manifest at session start. The tool server re-validates every call against the manifest — never trust the model’s JSON alone. When subagents spawn, child scopes must be a subset of the parent; a research subagent should not inherit delete_namespace because the parent had it for a different phase.
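The subset rule for subagents can be enforced mechanically when the orchestrator spawns a child. Here is a minimal sketch, assuming a simplified scope made of a tool allowlist plus an expiry. Scope and derive_child_scope are hypothetical names. The child gets only the tools it requests that the parent already holds, and it can never outlive the parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical, simplified scope: a tool allowlist plus a hard expiry."""
    tools: frozenset
    expires_at: float


def derive_child_scope(parent: Scope, requested_tools: set, ttl_s: float, now: float) -> Scope:
    """Grant a subagent only tools it asks for AND the parent already holds."""
    extra = set(requested_tools) - parent.tools
    if extra:
        # Refuse rather than silently widen; the parent must be elevated explicitly first.
        raise PermissionError(f"child requested tools the parent lacks: {sorted(extra)}")
    # The child's lifetime is capped by the parent's expiry.
    return Scope(tools=frozenset(requested_tools),
                 expires_at=min(parent.expires_at, now + ttl_s))
```

A research subagent that requests only list_pods and get_configmap receives exactly those tools, even if the parent also holds delete_namespace.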

Dynamic scope elevation

Sometimes a session starts read-only and later needs a write. Prefer explicit elevation: the human clicks “Enable staging writes for 15 min” or completes step-up auth. The orchestrator issues a new manifest revision; the old read-only token is revoked. Avoid silently expanding scope because the model asked nicely in chat.
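Explicit elevation can be sketched as issuing a new manifest revision and revoking the old token in the same step. This is a sketch under assumptions. ScopeStore, Grant and the in-memory dict are hypothetical, and a real system would persist and replicate this state. elevate() is meant to be called only from a human action such as a button click or step-up auth, never from model output.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    token: str
    revision: int
    tools: frozenset
    expires_at: float


class ScopeStore:
    """Hypothetical in-memory store of active session grants."""

    def __init__(self) -> None:
        self._active: dict = {}

    def issue(self, revision: int, tools, ttl_s: float, now: float) -> Grant:
        grant = Grant(secrets.token_urlsafe(16), revision, frozenset(tools), now + ttl_s)
        self._active[grant.token] = grant
        return grant

    def elevate(self, old: Grant, extra_tools, ttl_s: float, approver: str, now: float) -> Grant:
        """Triggered by a human (click or step-up auth), never by a chat request."""
        if not approver:
            raise PermissionError("elevation requires an identified human approver")
        self._active.pop(old.token, None)  # revoke the previous (read-only) token
        return self.issue(old.revision + 1, old.tools | set(extra_tools), ttl_s, now)

    def check(self, token: str, tool: str, now: float) -> bool:
        grant = self._active.get(token)
        return grant is not None and now < grant.expires_at and tool in grant.tools
```

With ttl_s=900 this models the "Enable staging writes for 15 min" button: the elevated token simply stops working when the window closes.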

Tiered approval gates

Not every tool call needs a human. Gates should be tiered by impact, latency cost, and reversibility.

Tier | Examples | Gate behavior
0 (Auto) | Read APIs, search, list resources | Execute immediately; log to audit trail
1 (Policy) | Staging patch, send internal Slack, create draft PR | Automated policy engine checks manifest + args; execute if pass
2 (Async human) | Production replica change, refund > $50, customer email send | Queue approval UI; agent pauses with durable checkpoint
3 (Sync human) | Delete production data, wire transfer, IAM role grant | Block until live approver confirms; timeout cancels run
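The tier table can be encoded as a lookup plus argument-dependent escalation. The sketch below is hypothetical throughout: the tool names in TOOL_TIERS and the choice to put refunds of $50 or less in tier 1 are assumptions for illustration. Unknown tools fail closed to tier 3.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTO = 0          # execute immediately, audit log only
    POLICY = 1        # automated policy engine decides
    ASYNC_HUMAN = 2   # queue for approval, pause on a durable checkpoint
    SYNC_HUMAN = 3    # block on a live approver; timeout cancels


# Hypothetical tool -> base tier mapping.
TOOL_TIERS = {
    "list_pods": Tier.AUTO,
    "get_configmap": Tier.AUTO,
    "patch_replicas_staging": Tier.POLICY,
    "patch_replicas_production": Tier.ASYNC_HUMAN,
    "issue_refund": Tier.POLICY,
    "delete_namespace": Tier.SYNC_HUMAN,
}


def classify(tool: str, args: dict) -> Tier:
    # Unknown tools fail closed to the strictest tier.
    tier = TOOL_TIERS.get(tool, Tier.SYNC_HUMAN)
    # Argument-dependent escalation: refunds above $50 need a human.
    if tool == "issue_refund" and args.get("amount_usd", 0) > 50:
        tier = max(tier, Tier.ASYNC_HUMAN)
    return tier
```

Using max() means escalation rules can only raise a tool's tier, never lower it below its base entry.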

Gate decisions should be attached to the tool call record, not free-floating chat. Store: requested args hash, approver identity, policy version, and TTL. On resume, the executor replays the approved call exactly — the model cannot amend args after approval without triggering a new gate. Pair tier-2+ gates with durable checkpoints so timeouts do not lose work or double-apply.
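Binding an approval to the exact arguments is mostly a matter of hashing a canonical serialization. The following minimal sketch assumes SHA-256 over JSON with sorted keys. Approval, args_hash and execute_approved are hypothetical names, and the returned string stands in for the real tool execution. If the model changes any argument after approval, the hash no longer matches and the call needs a new gate.

```python
import hashlib
import json
from dataclasses import dataclass


def args_hash(tool: str, args: dict) -> str:
    """Canonical JSON (sorted keys, no whitespace) so equal args always hash equally."""
    blob = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    call_id: str
    args_sha256: str
    approver: str
    policy_version: str
    expires_at: float  # TTL as an absolute Unix time


def execute_approved(call_id: str, tool: str, args: dict, approval: Approval, now: float) -> str:
    if approval.call_id != call_id:
        raise PermissionError("approval belongs to a different call")
    if now >= approval.expires_at:
        raise PermissionError("approval TTL expired; re-open the gate")
    if args_hash(tool, args) != approval.args_sha256:
        # Anything changed after approval is a new request, not the approved one.
        raise PermissionError("args differ from approved hash; new gate required")
    return f"executing {tool} as approved by {approval.approver}"  # stand-in for the real call
```

Sorting keys matters: without it, the same arguments emitted in a different key order would look like a mutation.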

Policy engine vs human-only

Tier-1 policy engines evaluate structured rules: “patch allowed if namespace label env=staging and field is replicas and delta ≤ 3.” Use Open Policy Agent (Rego), Cedar, or an internal DSL. Humans set policy; machines enforce at millisecond latency. Reserve humans for ambiguous business judgment and tier-3 catastrophes.
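To show the shape of a tier-1 rule without committing to Rego or Cedar syntax, here is the example rule written as a plain Python function. Treat it as a stand-in, not an OPA or Cedar policy. Two assumptions apply. First, "delta ≤ 3" is read as an absolute delta; read literally as a signed bound, a scale from 10 to 0 would pass. Second, namespace labels and the current replica count are fetched by the server, not supplied by the model. POLICY_VERSION and Decision are hypothetical names.

```python
from dataclasses import dataclass

POLICY_VERSION = "staging-replicas-v1"  # hypothetical identifier recorded with each decision


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str
    policy_version: str = POLICY_VERSION


def evaluate_patch(namespace_labels: dict, patch: dict, current_replicas: int) -> Decision:
    """Allow only a replicas-only patch in an env=staging namespace with |delta| <= 3."""
    if namespace_labels.get("env") != "staging":
        return Decision(False, "namespace is not labeled env=staging")
    if set(patch) != {"replicas"}:
        return Decision(False, f"only 'replicas' may be patched, got {sorted(patch)}")
    delta = patch["replicas"] - current_replicas
    if abs(delta) > 3:
        return Decision(False, f"replica delta {delta:+d} exceeds 3")
    return Decision(True, "ok")
```

Returning a Decision with a reason and policy version, rather than a bare boolean, gives the audit log what it needs to explain each tier-1 outcome.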

Implementation patterns

Split read and write tool surfaces

Harbor’s pre-refactor kubectl_apply combined read and write. The refactor split it into get_resource, list_resources, patch_replicas_staging, and apply_manifest_staging, each with a narrow JSON Schema following tool schema design best practices. Production writes required a separate manifest profile that most sessions never received.
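A narrow per-tool schema and the read/write split can be sketched as follows. Several parts are assumptions and are not Harbor's actual definitions: the "input_schema" layout, the patterns, and the 1 to 20 replica range. The minimum of 1 is an illustrative choice that rules out scale-to-zero through this tool. check_args covers only the handful of JSON Schema keywords this flat schema uses; a full validator library is the better choice in practice.

```python
import re

# Hypothetical narrow schema for one operation (JSON Schema keywords, flat object).
PATCH_REPLICAS_STAGING = {
    "name": "patch_replicas_staging",
    "input_schema": {
        "type": "object",
        "properties": {
            "namespace": {"type": "string", "pattern": "^staging-[a-z0-9-]+$"},
            "deployment": {"type": "string", "pattern": "^[a-z0-9-]{1,63}$"},
            "replicas": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["namespace", "deployment", "replicas"],
        "additionalProperties": False,
    },
}

READ_TOOLS = {"get_resource", "list_resources"}
STAGING_WRITE_TOOLS = {"patch_replicas_staging", "apply_manifest_staging"}

# Profiles: most sessions only ever receive "read_only".
PROFILES = {
    "read_only": READ_TOOLS,
    "staging_write": READ_TOOLS | STAGING_WRITE_TOOLS,
}


def check_args(schema: dict, args: dict) -> list:
    """Tiny subset of JSON Schema validation, enough for the flat schema above."""
    errors = []
    props = schema["properties"]
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            errors.append(f"unexpected field {key}")  # additionalProperties: false
        elif spec["type"] == "string":
            if not isinstance(value, str) or not re.fullmatch(spec["pattern"], value):
                errors.append(f"{key} fails pattern")
        elif spec["type"] == "integer":
            if (not isinstance(value, int) or isinstance(value, bool)
                    or not spec["minimum"] <= value <= spec["maximum"]):
                errors.append(f"{key} out of range")
    return errors
```

Because additionalProperties is false, a model cannot smuggle raw YAML into a typed operation. That is exactly the escape hatch the generic kubectl_apply left open.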

Server-side enforcement, not prompt pleading

System prompts that say “never delete production” are insufficient against injection and drift. The tool server returns 403 PERMISSION_DENIED with a structured observation the model can reason over. Log denials; spikes indicate mis-scoping or attacks.
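A structured denial and a simple spike detector might look like the sketch below. Several details are assumptions: the JSON field names, the hint text, and the default window and threshold. The idea is that the model receives a machine-readable reason it can act on, while operators receive an alert when denials cluster.

```python
import collections
import json


def denial(tool: str, reason: str, manifest_id: str) -> str:
    """Hypothetical denial observation returned to the agent as the tool result."""
    return json.dumps({
        "status": 403,
        "error": "PERMISSION_DENIED",
        "tool": tool,
        "reason": reason,
        "manifest_id": manifest_id,
        "hint": "Do not retry with altered arguments; ask the user to request elevation.",
    })


class DenialMonitor:
    """Sliding-window denial counter; window and threshold are illustrative defaults."""

    def __init__(self, window_s: float = 300, threshold: int = 20) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.events = collections.deque()

    def record(self, now: float) -> bool:
        """Record one denial; return True when the window exceeds the threshold."""
        self.events.append(now)
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold
```

The hint field steers the model away from probing for an argument combination that slips past the check, which is a common failure after a denial.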

Audit and blast-radius drills

Export every tool invocation with its manifest ID, gate tier, and outcome. Run monthly drills: replay injected prompts against staging agents and verify zero tier-3 executions without approval. Cross-reference the results with your prompt injection defenses; scoping is the last line of defense when content filters fail.
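The audit export and the drill's pass criterion can be expressed in a few lines. This is a sketch under assumptions: AuditRecord's fields mirror the checklist items (manifest ID, gate outcome, approver, latency) but are otherwise hypothetical, and the export format is JSON Lines. A drill passes when drill_violations returns an empty list.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    call_id: str
    manifest_id: str
    tool: str
    tier: int
    outcome: str              # e.g. "executed", "denied", "pending", "cancelled"
    approver: Optional[str]   # None when no human approved
    latency_ms: float


def export_jsonl(records) -> str:
    """One JSON object per line, suitable for shipping to a log pipeline."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def drill_violations(records) -> list:
    """Drill criterion: no tier-3 call may execute without a named approver."""
    return [r for r in records if r.tier == 3 and r.outcome == "executed" and not r.approver]
```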

Harbor Platform Engineering refactor

After the payment-api incident, Harbor rebuilt the hygiene agent around three principles:

  1. Default read-only — new sessions get list/get tools only across all clusters.
  2. Staging write profile — elevation grants patch on namespaces tagged env=staging; production patches require a tier-2 ticket linked in the manifest.
  3. No generic apply — YAML apply replaced by typed operations with schema validation.

They added a Slack approval bot for tier-2: the agent posts a diff summary; approvers react with a signed emoji. Checkpoints resume the ReAct loop after approval. Results over six weeks:
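The Slack posting and emoji handling are specific to Harbor's bot and are not modeled here. The sketch below covers only the durable pause and resume around a tier-2 wait. CheckpointStore and its one-JSON-file-per-run layout are hypothetical. It also assumes the tool server deduplicates on an idempotency key, which is what actually prevents a double-apply if the process crashes between executing and recording the result.

```python
import json
import os


class CheckpointStore:
    """Hypothetical file-backed store: one JSON file per paused agent run."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, run_id: str) -> str:
        return os.path.join(self.root, f"{run_id}.json")

    def _write(self, run_id: str, state: dict) -> None:
        tmp = self._path(run_id) + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self._path(run_id))  # atomic swap, never a half-written file

    def pause(self, run_id: str, pending_call: dict, history: list, deadline: float) -> None:
        """Persist everything needed to resume before waiting on a human."""
        self._write(run_id, {"pending_call": pending_call, "history": history,
                             "deadline": deadline, "status": "waiting"})

    def resume(self, run_id: str, approved: bool, now: float, execute) -> str:
        with open(self._path(run_id)) as f:
            state = json.load(f)
        if state["status"] != "waiting":
            return state["status"]  # duplicate approval events are no-ops
        if now >= state["deadline"] or not approved:
            state["status"] = "cancelled"  # timeout or rejection cancels the run
        else:
            # run_id doubles as an idempotency key (assumes the tool server dedupes on it).
            execute(state["pending_call"], idempotency_key=run_id)
            state["status"] = "applied"
        self._write(run_id, state)
        return state["status"]
```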

  • Unauthorized mutations (unapproved writes outside scope) fell from 11% to 0.3% of sessions.
  • Mean time to complete hygiene tasks rose from 4.2 to 6.1 minutes, an acceptable trade-off for SRE trust.
  • Zero production outages attributed to the agent post-refactor.
  • On-call survey: confidence in agent tooling 2.1 → 4.4 / 5.

The team noted that over-scoping reads still mattered: an agent that could read Secrets across tenants was re-scoped to per-team vault paths even though reads were tier-0.
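Re-scoping reads to per-team paths comes down to a prefix check. The path must be normalized first, or a ".." segment could escape the team's subtree. The mapping and path layout below are hypothetical, and can_read_secret is an illustrative name.

```python
import posixpath

# Hypothetical team -> secret path prefix mapping (trailing slash is significant).
TEAM_SECRET_PREFIXES = {
    "payments": "secret/teams/payments/",
    "search": "secret/teams/search/",
}


def can_read_secret(team: str, path: str) -> bool:
    prefix = TEAM_SECRET_PREFIXES.get(team)
    if prefix is None:
        return False  # unknown team: deny
    # normpath collapses "..", so "payments/../search/..." cannot escape the prefix;
    # the appended "/" stops "payments-evil/" from matching "payments/".
    return (posixpath.normpath(path) + "/").startswith(prefix)
```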

Technique decision table

Approach | Best for | Weak when
Open tool catalog + prompt rules | Local demos, read-only research | Any production write path
Static manifest per workflow | Fixed SOP automations (refund bot, report generator) | Exploratory tasks needing scope changes
Tiered gates + policy engine | Platform agents, finance ops, infra hygiene | Sub-second latency requirements on hot paths
Human-only execution | Tier-3 irreversible actions | High-volume low-risk reads
Separate agent per domain | Strong isolation (billing vs infra) | Cross-domain workflows without supervisor orchestration

Common pitfalls

  • Mega-tools — one run_sql or execute_code bypasses all scoping.
  • Client-side-only checks — attackers and models call the API directly.
  • Approving summaries instead of args — human sees “scale payment” but approved hash was for a different deployment.
  • Child subagent inherits parent god-mode — delegation without scope shrink.
  • Tier mismatch — treating production delete as tier-1 because “it’s rare.”
  • No denial telemetry — you discover injection weeks later in logs nobody reads.
  • Permanent elevation tokens — defeats time-bounded session design.

Designer and engineer checklist

  • Inventory every tool; eliminate generic shell/SQL/apply wrappers where possible.
  • Define capability manifests per workflow with argument JSON Schema bounds.
  • Enforce manifests server-side on every invocation; return structured denials.
  • Classify tools into tiers 0–3 by reversibility and business impact.
  • Implement policy engine for tier-1; human UI for tier-2+.
  • Bind approval to args hash; reject post-approval mutations.
  • Checkpoint agent state before tier-2+ waits; support timeout cancel.
  • Shrink subagent scopes relative to parent; never widen silently.
  • Audit log manifest ID, gate outcome, approver, and latency.
  • Run monthly injection drills; alert on denial rate spikes.

Key takeaways

  • Agents are actors, not chatbots — scope tools like production service accounts.
  • Manifests + server enforcement beat prompt-only guardrails against mistakes and injection.
  • Tiered gates balance velocity (reads) with safety (writes and deletes).
  • Harbor cut unauthorized mutations 11% → 0.3% after splitting mega-tools and adding staging/production profiles.
  • Subagent delegation requires scope inheritance rules — children get subsets, not copies.

Related reading