
LLM agent secrets and credential injection explained

Harbor Platform shipped a billing-reconciliation agent that called Stripe, NetSuite, and an internal ledger API. Engineers seeded the system prompt with STRIPE_SECRET_KEY=sk_live_... so the model could “authenticate properly.” On a failed refund attempt, the model echoed the full key inside the refund_payment tool argument JSON. That span exported to Datadog, appeared in a support engineer’s screen-share recording, and was forwarded to Stripe’s partner error webhook. Security rotated the key within hours, but 14% of production runs in the prior month had at least one secret substring in stored traces or chat logs.

Credential injection keeps secrets out of prompts, model weights, and tool argument payloads the LLM authors. Instead, the runtime resolves opaque credential_ref handles at execution time inside a trusted broker, attaches short-lived tokens to outbound HTTP, and redacts material before anything hits observability or audit stores. Harbor replaced inline env vars with a broker + scoped delegation tokens; secret leakage in traces dropped to zero over six weeks while agent task success held flat. This guide covers why agents leak secrets, broker architecture, scoped token design, tool schema patterns, integration with sandboxes and permission gates, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.

Why agents are a secret-leak surface

Traditional services typically read secrets from environment variables at process start and, when configured well, never write them to application logs. Agents add three new exfiltration paths:

  • Prompt and context retention — anything in the system prompt or retrieved documents can be paraphrased back to users or echoed in tool args.
  • Tool argument authoring — the model chooses JSON fields; if the schema includes api_key, it will eventually fill it.
  • Observability defaults — traces and debug dumps often record full request/response bodies unless you redact aggressively.

Prompt injection makes this worse: an attacker who can influence retrieved text may instruct the model to repeat secrets into a public channel. Treat every secret the model can see as already compromised for practical purposes.

Classes of credentials agents touch

  • Long-lived API keys — Stripe, SendGrid, cloud provider admin keys
  • OAuth refresh tokens — per-user Google, Salesforce, GitHub access
  • Database connection strings — especially dangerous in text-to-SQL agents
  • Session cookies and JWTs — common in browser-automation agents
  • Signing keys — webhook HMAC, JWT issuers, on-chain hot wallets

Credential broker pattern

A credential broker is a trusted sidecar or middleware layer that sits between the agent runtime and external APIs. The model never receives raw secret material — only stable references the broker understands.

Request flow

  1. Model calls refund_payment(customer_id, amount_cents, credential_ref: "stripe_prod_ro")
  2. Runtime validates the ref against the active permission manifest
  3. Broker fetches the live secret from Vault / KMS / cloud secret manager
  4. Broker issues the outbound HTTP call; span logs show credential_ref only
  5. Tool result returned to model contains status and IDs, not Authorization headers
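
The five steps above can be sketched in a few lines. This is a minimal illustration, not Harbor's implementation: the in-memory VAULT and MANIFEST dictionaries, the Broker class, and the fake_send stand-in for an HTTP client are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins; a real broker would read Vault or KMS.
VAULT = {"stripe_prod_ro": "sk_live_EXAMPLE_ONLY"}
MANIFEST = {"billing-agent": {"stripe_prod_ro"}}


class CredentialNotAuthorized(Exception):
    """Raised when a role asks for a ref its manifest does not list."""


@dataclass
class Broker:
    audit_log: list = field(default_factory=list)

    def call(self, role, run_id, tool, args, send):
        ref = args["credential_ref"]
        # Step 2: validate the ref against the permission manifest.
        if ref not in MANIFEST.get(role, set()):
            raise CredentialNotAuthorized(ref)
        # Step 3: resolve the secret inside the broker only.
        secret = VAULT[ref]
        # Step 4: the audit record carries the ref, never the secret.
        self.audit_log.append({"run_id": run_id, "tool": tool, "credential_ref": ref})
        body = {k: v for k, v in args.items() if k != "credential_ref"}
        response = send({"Authorization": f"Bearer {secret}"}, body)
        # Step 5: hand back status and IDs only, dropping anything header-like.
        return {"status": response["status"], "id": response["id"]}


def fake_send(headers, body):
    # Stand-in for the real HTTP client; echoes headers to show they get dropped.
    return {"status": "succeeded", "id": "re_123", "request_headers": headers}


broker = Broker()
result = broker.call(
    "billing-agent", "run_1", "refund_payment",
    {"customer_id": "cus_abc", "amount_cents": 1200, "credential_ref": "stripe_prod_ro"},
    fake_send,
)
print(result)
```

The model-facing result is built from an explicit allowlist of fields rather than by deleting known-bad keys, so a new header echoed by an upstream API does not reach the model by default.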

Broker responsibilities

  • Resolve refs to secrets — never pass resolved values back to the LLM
  • Enforce scope — ref stripe_prod_ro cannot call DELETE endpoints
  • Rotate without agent redeploy — update vault version; refs stay stable
  • Emit audit events — which run used which ref (not the secret itself)
  • Rate-limit and anomaly-detect — burst refunds page finance even if the model is compromised
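
Scope enforcement, the second responsibility above, might look like the following sketch. The RefScope type, the SCOPES table, and the specific methods and path prefixes granted to stripe_prod_ro are hypothetical; only the rule that this ref cannot call DELETE endpoints comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefScope:
    methods: frozenset
    path_prefixes: tuple


# Hypothetical policy table; the real mapping would live in the permission manifest.
SCOPES = {
    "stripe_prod_ro": RefScope(frozenset({"GET", "POST"}), ("/v1/refunds", "/v1/charges")),
}


def is_allowed(ref: str, method: str, path: str) -> bool:
    """Return True only if the ref's scope covers this method and path."""
    scope = SCOPES.get(ref)
    if scope is None:
        return False  # unknown refs are denied by default
    return method.upper() in scope.methods and path.startswith(scope.path_prefixes)
```

Denying by default for unknown refs means a typo in the manifest fails closed instead of falling back to a broader credential.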

For user-delegated OAuth, the broker stores refresh tokens encrypted per tenant and exchanges them for access tokens with TTL under five minutes. The model sees user_google_calendar:acct_4821, not a bearer string.

Scoped short-lived tokens

Long-lived keys in a broker vault are necessary, but each tool invocation should prefer delegation tokens minted for that run and that action.

Token shape

  • sub — agent run_id + tool_call_id
  • scope — e.g. stripe:refund:customer_cus_abc
  • exp — 60–300 seconds; single-use where APIs allow
  • policy_hash — ties to the permission manifest version
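
As an illustration of the token shape, the sketch below mints and verifies an HMAC-signed token carrying those four claims. The signing key, the payload.signature wire format, and the function names are assumptions for the example; a production broker would more likely use an established token library and a managed key.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-only-key"  # assumption: a real key would come from the vault


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(payload: str) -> str:
    return _b64(hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest())


def mint_token(run_id, tool_call_id, scope, manifest_bytes, ttl_seconds=120, now=None):
    if not 60 <= ttl_seconds <= 300:
        raise ValueError("ttl_seconds must stay within the 60-300 second window")
    now = int(time.time()) if now is None else now
    claims = {
        "sub": f"{run_id}:{tool_call_id}",
        "scope": scope,
        "exp": now + ttl_seconds,
        "policy_hash": hashlib.sha256(manifest_bytes).hexdigest(),
    }
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token, manifest_bytes, now=None):
    """Return the claims if the token is valid, otherwise None."""
    now = int(time.time()) if now is None else now
    payload, _, sig = token.partition(".")
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # tampered or signed with a different key
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] <= now:
        return None  # expired
    if claims["policy_hash"] != hashlib.sha256(manifest_bytes).hexdigest():
        return None  # manifest changed since the token was minted
    return claims
```

Binding policy_hash to the manifest bytes means a manifest update invalidates outstanding tokens without a separate revocation list, at the cost of re-minting after every policy change.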

Why TTL matters

If a trace or chat log leaks a delegation token instead of a root API key, blast radius stays bounded. Harbor mints Stripe-scoped tokens only after tier-2 approval for refunds above $500. Tokens are rejected at the payment API if the approval attestation is missing from the broker’s internal ledger.
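
The approval gate can be expressed as a small check before minting. In this sketch the ledger is modeled as a set of tool_call_id values that carry a tier-2 attestation; that representation and the function name are assumptions, while the $500 threshold comes from the text.

```python
APPROVAL_THRESHOLD_CENTS = 500 * 100  # refunds above $500 need tier-2 approval


def can_mint_refund_token(amount_cents: int, tool_call_id: str, approval_ledger: set) -> bool:
    """Allow minting for small refunds, or for large ones with a recorded attestation."""
    if amount_cents <= APPROVAL_THRESHOLD_CENTS:
        return True
    return tool_call_id in approval_ledger
```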

Downstream API support

Prefer platforms with native scoped keys (Stripe restricted keys, AWS STS, GCP service account impersonation, GitHub fine-grained PATs). Where only root keys exist, put a narrow internal gateway in front and give the broker gateway credentials instead.

Tool schema design: references, not values

Tool JSON schemas teach the model what to send. Any field named password, token, secret, or api_key is an invitation to leak.

Do

  • Accept credential_ref enums bound to the manifest (stripe_prod_ro | netsuite_sandbox)
  • Document that refs are resolved server-side; model must not invent values
  • Validate refs in middleware before the broker call — unknown ref hard-fails
  • Return structured errors (credential_not_authorized) the model can reason about
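
A schema following these rules could look like the sketch below. The exact field layout of REFUND_TOOL_SCHEMA and the validate_ref helper are illustrative assumptions; the ref names and the credential_not_authorized error code are taken from the text.

```python
# Hypothetical tool definition: the only credential-related field is an enum of refs.
REFUND_TOOL_SCHEMA = {
    "name": "refund_payment",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "credential_ref": {
                "type": "string",
                "enum": ["stripe_prod_ro", "netsuite_sandbox"],
                "description": "Opaque handle resolved server-side. Do not invent values.",
            },
        },
        "required": ["customer_id", "amount_cents", "credential_ref"],
        "additionalProperties": False,
    },
}


def validate_ref(args: dict, allowed_refs: set) -> dict:
    """Hard-fail unknown or unauthorized refs with an error the model can reason about."""
    ref = args.get("credential_ref")
    if ref not in allowed_refs:
        return {"ok": False, "error": "credential_not_authorized", "credential_ref": ref}
    return {"ok": True}
```

Returning a structured error instead of raising lets the runtime pass the failure back to the model as a tool result, so it can pick another allowed ref or stop.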

Do not

  • Include optional authorization_header “for flexibility”
  • Pass through user-supplied URLs with embedded basic auth
  • Let the model choose which vault path to read
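
One of these rules, rejecting user-supplied URLs that embed basic auth, is easy to check mechanically. The sketch below uses the standard library's urlsplit; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def has_embedded_credentials(url: str) -> bool:
    """True when the URL carries userinfo such as https://user:pass@host/."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None
```

A tool middleware could call this on every URL argument and return a structured error instead of forwarding the request.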

For browser automation, run sessions inside an isolated profile where cookies are injected by the sandbox launcher — never as CLI arguments the model types.

Runtime injection and redaction

Injection points

Layer | What gets injected | Model visibility
--- | --- | ---
HTTP tool middleware | Authorization header from broker | None
SQL tool gateway | Read-only DB role per tenant | Connection alias only
MCP server | Server-side env the host holds | Tool metadata, not env
Subagent spawn | Child-scoped ref subset | Child manifest names only
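
For the subagent row, narrowing can be enforced at spawn time. This is a sketch under the assumption that manifests are plain sets of ref names; child_refs is a hypothetical helper.

```python
def child_refs(parent_refs: frozenset, requested: set) -> frozenset:
    """Grant a child agent only refs the parent already holds."""
    extra = requested - parent_refs
    if extra:
        # Fail the spawn rather than silently widening or trimming the grant.
        raise PermissionError(f"refs outside parent manifest: {sorted(extra)}")
    return frozenset(requested)
```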

Redaction before observability

Apply the same rules as PII redaction to secrets: regex for sk_live_, AKIA, JWT three-part blobs, and Bearer prefixes. Redact at span export time; do not rely on engineers remembering to scrub tickets. Harbor blocks deploy if integration tests detect known secret patterns in fixture traces.
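
A redactor for the patterns named above might be sketched as follows. The exact regular expressions are assumptions tuned for illustration and will need adjusting to the token formats actually in use; redact_span is a hypothetical hook meant to run inside the span exporter.

```python
import re

# Illustrative patterns for the secret shapes named in the text.
SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),                        # Bearer tokens
    re.compile(r"sk_live_[A-Za-z0-9]+"),                                # Stripe live keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                                    # AWS access key IDs
    re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),   # three-part JWTs
]


def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def redact_span(value):
    """Recursively redact strings inside span attributes before export."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {k: redact_span(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_span(v) for v in value]
    return value
```

The same redact function can back a CI check: run it over fixture traces and fail the build if the output differs from the input.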

Retries and idempotency

When retries replay tool calls, the broker must re-mint delegation tokens rather than cache Authorization headers in checkpoints the model might later read via durable execution state exports.
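
A retry loop that follows this rule could be sketched like this. The mint and send callables and the checkpoint layout are assumptions; the point is that the token exists only as a local variable and the checkpoint holds the ref.

```python
def run_with_retries(call: dict, mint, send, max_attempts: int = 3) -> dict:
    """Retry a tool call, minting a fresh token per attempt and checkpointing refs only."""
    checkpoint = {
        "tool": call["tool"],
        "credential_ref": call["credential_ref"],
        "attempts": 0,
    }
    for attempt in range(1, max_attempts + 1):
        checkpoint["attempts"] = attempt
        token = mint(call["credential_ref"])  # fresh per attempt, never stored
        if send(token):
            checkpoint["status"] = "ok"
            return checkpoint
    checkpoint["status"] = "failed"
    return checkpoint
```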

Harbor Platform refactor

Before the broker, Harbor stored integration secrets in Kubernetes secrets mounted as env vars, copied key names into the system prompt, and logged full tool JSON in traces. OAuth refresh tokens lived in the same Redis keyspace as chat history.

Changes shipped

  1. Central credential-broker service with Vault backend; agents receive zero env secrets
  2. Permission manifest lists allowed credential_ref per agent role; CI fails on schema drift
  3. Tool middleware strips any argument key matching /secret|token|password|api_key/i before execution
  4. Span processor redacts secret patterns; weekly scanner replays random traces
  5. User OAuth moved to encrypted per-tenant table; broker is sole reader
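
Change 3 can be illustrated with a short filter. The regular expression is the one quoted in the list; the recursive handling of nested objects and the returned list of dropped keys are assumptions added for the sketch.

```python
import re

SECRET_KEY_RE = re.compile(r"secret|token|password|api_key", re.IGNORECASE)


def strip_secret_args(args: dict):
    """Drop argument keys that look like secrets; return (clean_args, dropped_paths)."""
    clean, dropped = {}, []
    for key, value in args.items():
        if SECRET_KEY_RE.search(key):
            dropped.append(key)  # record the key name only, never the value
        elif isinstance(value, dict):
            sub_clean, sub_dropped = strip_secret_args(value)
            clean[key] = sub_clean
            dropped.extend(f"{key}.{name}" for name in sub_dropped)
        else:
            clean[key] = value
    return clean, dropped
```

Reporting the dropped key names gives the audit trail a signal that the model tried to author a secret field, without persisting what it wrote.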

Outcomes

  • Secret substrings in stored traces: 14% of runs → 0% over six weeks
  • Unauthorized API calls blocked at broker: 3.2% of attempted tool calls (mostly misconfigured refs)
  • Mean tool latency: +18 ms for the broker hop — accepted as a trade-off against rotation incidents
  • Key rotation now zero-downtime: update Vault; no agent image rebuild

Technique decision table

Approach | Best for | Weak when
--- | --- | ---
Secrets in system prompt / env exposed to model | Local prototypes, read-only public APIs | Any production mutating tool; compliance; prompt injection risk
Credential broker + refs | Production agents with multiple integrations | Extreme low-latency edge without sidecar budget
Per-user OAuth via broker | Calendar, email, CRM agents acting on behalf of users | Batch jobs with no user context (use service refs)
Pre-authenticated sandbox only | Browser agents, code execution with network | Agents that must call diverse external APIs per turn
Human pastes secret each run | One-off admin scripts | Automation, scale, audit requirements

Common pitfalls

  • Teaching the model where secrets live — “use the key from env STRIPE_SECRET_KEY” guarantees eventual leakage.
  • Optional secret fields in tool schemas — the model fills optional fields when confused.
  • Logging full HTTP in tool errors — upstream 401 responses can echo Authorization headers.
  • Sharing refs across tenants — stripe_prod_ro must map per-tenant vault paths.
  • Checkpointing bearer tokens — durable state becomes a secret store readable by support tools.
  • Subagent inherits parent vault — child agents need strict ref subsets, not root access.
  • Redaction only in UI — raw spans still export to S3; scrub at export.
  • Ignoring vendor logs — tool args sent to third parties may be retained under their policies.
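
The cross-tenant pitfall above is usually avoided by making the tenant part of the lookup key. In the sketch below, the tenants/<tenant_id>/<ref> path layout and the vault_path helper are hypothetical; the idea is only that one ref name resolves to a different secret per tenant.

```python
import re

_SAFE_TENANT = re.compile(r"[a-z0-9_]+")


def vault_path(tenant_id: str, ref: str, allowed_refs: set) -> str:
    """Map a stable ref name to a tenant-specific vault path."""
    if ref not in allowed_refs:
        raise PermissionError(f"ref not in manifest: {ref}")
    if not _SAFE_TENANT.fullmatch(tenant_id):
        # Reject anything that could escape the tenant prefix, such as "../other".
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenants/{tenant_id}/{ref}"
```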

Production checklist

  • Zero raw secrets in system prompts, RAG corpora, or few-shot examples.
  • Tool schemas use credential_ref enums, not free-text secret fields.
  • Credential broker resolves refs; model never receives resolved values.
  • Permission manifest binds refs to agent roles and environments.
  • Short-lived delegation tokens per mutating tool call where supported.
  • Span and audit exporters redact secret patterns before persistence.
  • Weekly automated scan of traces and chat logs for leak patterns.
  • OAuth refresh tokens encrypted; broker is sole accessor.
  • Retries re-mint tokens; checkpoints store refs only.
  • Subagents receive narrowed ref sets via delegation policy.
  • Rotation runbook updates vault without redeploying agent images.
  • Incident playbooks cover trace exposure and partner notification.

Key takeaways

  • If the model can see a secret, assume it will leak — via tools, traces, or injection.
  • Harbor cut trace secret exposure 14% → 0% with a broker and ref-only schemas.
  • Inject at runtime in trusted middleware, not in prompts.
  • Scoped TTL tokens bound blast radius when redaction fails.
  • Redact at export — observability is a secret store unless proven otherwise.
