LLM agent middleware hook pipeline explained

Harbor Platform shipped a tier-1 support agent with twelve tool adapters — CRM lookup, billing API, knowledge search, ticket mutation, and more. Each adapter author copy-pasted the same concerns: PII redaction, per-tenant rate limits, approval gates for write tools, and trace span tags. One team forgot the redaction step on a new Stripe webhook reader; staging logs showed full card metadata flowing into the model context. In production shadow mode, 14% of runs briefly exposed secrets before a downstream output validator caught the leak — too late for compliance comfort.

Agent middleware is the composable hook layer that runs around every model call and every tool invocation. Instead of scattering policy across adapters, you register ordered pre/post hooks: authenticate the run context, inject credentials at runtime, cap observations, enforce permissions, emit spans, and short-circuit unsafe paths before side effects execute. This guide covers hook types, pipeline ordering, short-circuit semantics, integration with observability and approval gates, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.

What middleware solves in agent runtimes

An agent loop is repetitive: read state, call the model, parse tool requests, execute tools, append observations, repeat. Cross-cutting behavior appears at every step — logging, cost accounting, safety, tenancy isolation, retries. Without middleware, each concern becomes a fork of the core loop or a partial copy inside every tool. That leads to:

  • Policy drift — adapter A redacts emails; adapter B does not; nobody notices until an audit.
  • Ordering bugs — you validate output after a write tool already mutated production data.
  • Untestable spaghetti — safety logic mixed with business logic cannot be unit-tested in isolation.
  • Latency stacking — duplicate JSON parsing and schema checks on the same payload in three layers.

Middleware centralizes these hooks behind a stable pipeline contract. Frameworks expose it under different names — LangChain AgentMiddleware, LangGraph node wrappers, custom before_model / after_tool registries — but the pattern is the same: ordered functions with shared run context.

Hook types and the run context object

Define a run context passed through every hook: run_id, tenant_id, user_id, budget_remaining, trace_span, mutable metadata, and references to durable state. Hooks read and write this bag; they never rely on global variables.
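
As a rough illustration, the context can be a plain object that hooks copy rather than mutate. The sketch below assumes the field names from the paragraph above; `createRunContext` and `withMetadata` are hypothetical helpers, not part of any framework.

```javascript
// Hypothetical helper: builds the run context described above.
// Field names follow the text; the default values are assumptions.
function createRunContext({ runId, tenantId, userId, budgetTokens }) {
  return {
    run_id: runId,
    tenant_id: tenantId,
    user_id: userId,
    budget_remaining: budgetTokens,
    trace_span: null, // filled in later by an observability hook
    metadata: {},     // free-form bag that hooks may extend
    state: null,      // reference to durable state (for example a store handle)
  };
}

// Hooks return a new context instead of touching globals or mutating in place.
function withMetadata(ctx, patch) {
  return { ...ctx, metadata: { ...ctx.metadata, ...patch } };
}

const ctx = createRunContext({
  runId: "run-1",
  tenantId: "acme",
  userId: "u-42",
  budgetTokens: 8000,
});
const tagged = withMetadata(ctx, { integration: "crm" });
```

Copy-on-write keeps each hook's effect visible in its return value, which makes per-hook unit tests straightforward.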

Model hooks

  • Pre-model — trim history, inject system deltas, attach cached prefix tokens, block when budget exhausted.
  • Post-model — parse structured output, validate tool-call JSON, strip disallowed content before history append.

Tool hooks

  • Pre-tool — permission check, argument schema validation, idempotency key injection, rate-limit acquire, human-approval pause for destructive ops.
  • Post-tool — observation truncation, PII scrub, cost tag, error envelope normalization, audit event emit.

Run lifecycle hooks

  • On run start / end — lease resources, finalize spans, flush metrics.
  • On error / cancel — compensating actions, partial result persistence (see cancellation lifecycle).

Keep hooks pure where possible: return a modified context or a short_circuit result. Side effects belong in explicitly named hooks (audit emit, credential fetch), not buried in generic wrappers.
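
A minimal sketch of what "pure" can look like in practice: one pre-tool hook that short-circuits and one post-tool hook that returns a modified copy of the context. The fields `ctx.toolCall`, `ctx.allowedTools`, and `ctx.observation` are assumptions made for this example, and the email pattern is deliberately simple rather than a complete PII detector.

```javascript
// Hypothetical pre-tool hook: returns a short-circuit result or nothing.
function permissionHook(ctx) {
  if (!ctx.allowedTools.includes(ctx.toolCall.name)) {
    return {
      shortCircuit: true,
      value: { ok: false, error: "permission_denied", tool: ctx.toolCall.name },
    };
  }
  return undefined; // continue with the unchanged context
}

// Illustrative pattern only; real redaction needs a vetted detector.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical post-tool hook: returns a modified copy of the context.
function redactEmailsHook(ctx) {
  const scrubbed = ctx.observation.replace(EMAIL_PATTERN, "[redacted-email]");
  return { ctx: { ...ctx, observation: scrubbed } };
}
```

Neither function performs I/O, so both can be tested with plain object literals.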

Pipeline ordering and short-circuit semantics

Order matters. A practical default stack for tool execution:

  1. AuthZ / tenancy gate — reject unknown tenant before any network I/O.
  2. Approval gate — pause write tools pending human sign-off when policy requires it.
  3. Rate limiter — acquire token per integration.
  4. Argument validator — schema + business rules.
  5. Credential injector — attach scoped tokens (see secrets injection).
  6. Tool executor — the actual adapter.
  7. Observation middleware — truncate/summarize (see tool result summarization).
  8. Audit / trace emit — hash-chained event with redacted payload pointer.

Short-circuit means a hook returns early without calling downstream hooks. Example: pre-tool AuthZ fails → return a structured tool error observation without hitting the CRM API. Document whether short-circuit skips post-tool hooks (usually: run a minimal post-tool audit hook anyway).
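
One way to make that decision explicit is sketched below. It assumes a convention, invented for this example, where post-tool hooks carrying an `always` flag still run after a deny; `runToolPipeline` and the sample hooks are hypothetical.

```javascript
// Sketch: pre-hooks may short-circuit; post-hooks flagged `always` still run
// so that deny events reach the audit log.
async function runToolPipeline({ pre, post, execute }, ctx) {
  let outcome = null;
  for (const hook of pre) {
    const result = await hook(ctx);
    if (result?.shortCircuit) { outcome = result.value; break; }
    if (result?.ctx) ctx = result.ctx;
  }
  const denied = outcome !== null;
  if (!denied) outcome = await execute(ctx); // adapter runs only when allowed
  for (const hook of post) {
    if (denied && !hook.always) continue;    // skip shaping hooks on deny
    outcome = (await hook(ctx, outcome)) ?? outcome;
  }
  return outcome;
}

// Example wiring with stand-in hooks.
const auditLog = [];
let executorCalls = 0;

const authz = async (ctx) =>
  ctx.tenant_id
    ? undefined
    : { shortCircuit: true, value: { ok: false, error: "unknown_tenant" } };

const truncate = async (_ctx, out) => ({ ...out, data: String(out.data).slice(0, 20) });

const audit = Object.assign(
  async (ctx, out) => { auditLog.push({ run_id: ctx.run_id, ok: out.ok }); },
  { always: true }
);

const stages = {
  pre: [authz],
  post: [truncate, audit],
  execute: async () => { executorCalls += 1; return { ok: true, data: "customer record" }; },
};
```

With this wiring, a run without a tenant returns the structured `unknown_tenant` error, never calls the executor, and still leaves one audit entry.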

For model calls, run budget checks before the LLM request and output validation before appending assistant messages to history — otherwise bad tool JSON poisons the next turn.
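
The ordering can be sketched as a single turn function. Here `callModel` is a stand-in for a real client, and the reply shape (`toolCalls` entries with a JSON string in `arguments`) is an assumption for illustration, not any provider's actual format.

```javascript
// Sketch of one model turn: budget gate first, validation before history append.
async function modelTurn(ctx, history, callModel) {
  if (ctx.budget_remaining <= 0) {
    // No LLM request is made once the budget is exhausted.
    return { history, error: "budget_exhausted" };
  }
  const reply = await callModel(history);
  // Validate tool-call JSON before the reply can enter history.
  for (const call of reply.toolCalls ?? []) {
    try {
      JSON.parse(call.arguments);
    } catch {
      return { history, error: "invalid_tool_json" };
    }
  }
  // Only a validated reply is appended; the input array is left untouched.
  return { history: [...history, { role: "assistant", ...reply }], error: null };
}
```

A malformed reply leaves `history` exactly as it was, so the caller can retry or surface the error without a poisoned turn in context.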

Composing middleware without framework lock-in

You do not need a heavyweight framework to get middleware. A minimal implementation is a list of callables and a reducer:

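// Runs hooks in order; each hook may return nothing, { ctx }, or { shortCircuit, value }.
async function runPipeline(hooks, ctx, finalFn) {
  for (const hook of hooks) {
    const result = await hook(ctx);
    // A short-circuit skips the remaining hooks and finalFn.
    if (result?.shortCircuit) return result.value;
    // Otherwise carry any replacement context forward.
    if (result?.ctx) ctx = result.ctx;
  }
  return finalFn(ctx);
}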

Register hooks per environment: staging may log full payloads; production runs redaction earlier. Feature flags can insert experimental hooks without editing tool adapters. Version the pipeline manifest alongside your agent release so that deterministic replay applies the same hook order.

Avoid mega-hooks that do six unrelated things. Split redactPii, capObservationTokens, and emitAudit so tests and ownership stay clear. Compose with explicit ordering in a single registry file reviewed by security.
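
A registry of this kind might look like the sketch below. The manifest shape (`version`, `hooks`, `owner`, `envs`), the `verbosePayloadLog` hook, and `resolveHooks` are all assumptions made for illustration; the stub implementations do nothing.

```javascript
// Hypothetical manifest: one reviewed file that fixes order, owners, and version.
const manifest = {
  version: "1.0.0",
  hooks: [
    { name: "redactPii", owner: "security" },
    { name: "capObservationTokens", owner: "platform" },
    { name: "verbosePayloadLog", owner: "platform", envs: ["staging"] },
    { name: "emitAudit", owner: "compliance" },
  ],
};

// Resolves the ordered hook list for an environment.
// Entries without `envs` apply everywhere; a missing implementation is an error.
function resolveHooks(manifest, env, implementations) {
  return manifest.hooks
    .filter((entry) => !entry.envs || entry.envs.includes(env))
    .map((entry) => {
      const fn = implementations[entry.name];
      if (typeof fn !== "function") {
        throw new Error(`No implementation for hook "${entry.name}"`);
      }
      return fn;
    });
}

// Stub implementations; real hooks would return { ctx } or a short-circuit.
const implementations = {
  redactPii: (ctx) => undefined,
  capObservationTokens: (ctx) => undefined,
  verbosePayloadLog: (ctx) => undefined,
  emitAudit: (ctx) => undefined,
};

const productionHooks = resolveHooks(manifest, "production", implementations);
const stagingHooks = resolveHooks(manifest, "staging", implementations);
```

Because the order lives in data rather than in call sites, the same manifest can be stored with replay artifacts and diffed in review.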

Integration with observability and cost controls

Middleware is the natural place to attach cost tags and trace spans — one span per tool call wrapping pre/post hooks, child spans for redaction and validation. Propagate run_id into every hook so logs correlate across services.

  • Pre-model: record input token estimate; decrement context budget.
  • Post-model: record output tokens and model ID for FinOps dashboards.
  • Pre-tool: stamp integration and operation labels before outbound HTTP.
  • Post-tool: emit latency, status, and truncated payload hash — never raw secrets.

If observability hooks throw, decide policy: fail open (log locally) vs fail closed (abort run). Production agents usually fail open on metrics but fail closed on AuthZ.
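
The policy can be attached per hook with a small wrapper. This is a sketch under assumptions: `withFailurePolicy` and the policy names `fail_open` and `fail_closed` are invented for this example, and "abort run" is modeled as rethrowing.

```javascript
// Wraps a hook so that a thrown error follows an explicit policy.
function withFailurePolicy(hook, policy, onError = console.error) {
  return async (ctx) => {
    try {
      return await hook(ctx);
    } catch (err) {
      if (policy === "fail_open") {
        // Log locally and let the pipeline continue with the unchanged context.
        onError(`[hook:${hook.name}] ${err.message}`);
        return undefined;
      }
      // fail_closed: surface the failure so the caller aborts the run.
      throw new Error(`Hook ${hook.name} failed closed: ${err.message}`);
    }
  };
}
```

Under this sketch a metrics hook would typically be registered as `withFailurePolicy(metricsHook, "fail_open")` and an AuthZ hook as `withFailurePolicy(authzHook, "fail_closed")`.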

Harbor Platform refactor walkthrough

Harbor consolidated twelve adapters behind one ToolGateway with a shared middleware stack:

  1. TenantContextHook — validates JWT, sets tenant_id, loads data-residency flags.
  2. WriteApprovalHook — queues ticket-close and refund tools for async human approval.
  3. RateLimitHook — per-integration token bucket shared across all tools hitting the same CRM cluster.
  4. CredentialBrokerHook — injects short-lived OAuth tokens; never passes secrets through the model.
  5. ObservationCapHook — projects JSON and caps at 8k tokens before history append.
  6. AuditHook — writes hash-chained events to the compliance store with S3 pointers for full payloads.

New tools now implement only business logic; policy ships once in the registry. Outcomes: the share of shadow-mode runs that exposed secrets fell from 14% to 0%, median time to add a new tool dropped from 3.2 to 1.4 days, and policy regression tests now cover the pipeline independently of CRM mocks. Support engineers reported fewer “agent did something I did not approve” tickets because write gates became impossible to bypass by accident.
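
To make the RateLimitHook step in the list above concrete, here is a generic token-bucket sketch. It is not Harbor's implementation: the class, the capacity and refill numbers, the `ctx.integration` field, and the injectable clock are all assumptions chosen for illustration.

```javascript
// Generic token bucket with an injectable clock (milliseconds) for testing.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  tryAcquire() {
    const t = this.now();
    const elapsedSeconds = (t - this.last) / 1000;
    // Refill proportionally to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per integration, shared by every tool that targets it.
function makeRateLimitHook(buckets) {
  return (ctx) => {
    const bucket = buckets.get(ctx.integration);
    if (bucket && !bucket.tryAcquire()) {
      return {
        shortCircuit: true,
        value: { ok: false, error: "rate_limited", integration: ctx.integration },
      };
    }
    return undefined; // token acquired, or no limit configured for this integration
  };
}
```

Keying the bucket map by integration rather than by tool is what lets several tools that hit the same backend share one limit.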

Technique decision table

| Scenario | Prefer | Avoid |
| --- | --- | --- |
| Cross-cutting policy on all tools | Ordered middleware pipeline | Copy-paste in each adapter |
| One-off experimental tool | Opt-out flag with documented risk | Permanent bypass of AuthZ hook |
| Destructive write operations | Pre-tool approval short-circuit | Post-hoc output validation only |
| High-cardinality debug in staging | Verbose hook behind env flag | Full payload logging in production |
| Latency-sensitive read tools | Light pre-hooks; heavy post-process async | Sequential LLM validation per row |
| Multi-tenant SaaS agent | Tenant gate as first hook | Tenant ID from model-provided args |

Common pitfalls

  • Wrong hook order — redacting after audit logs already captured secrets.
  • Hooks that call the model — hidden nested LLM calls explode cost and break replay unless recorded.
  • Mutable shared state — parallel tool execution races on a global counter; keep state in run context.
  • Swallowing errors — middleware catches exceptions and returns empty observations; failures look like success.
  • Unbounded hook chains — twenty micro-hooks with no manifest; nobody knows final order.
  • Bypass paths for “speed” — internal admin tools that skip AuthZ become the breach vector.
  • Testing only happy path — golden tests must exercise short-circuit branches and hook failures.

Production checklist

  • Single registry documents hook order with owners and version.
  • Tenant/auth hook runs before any outbound tool network call.
  • Write tools pass through approval middleware when policy requires.
  • Credential injection uses broker tokens, not model-visible secrets.
  • Observation cap runs post-tool before history append.
  • Short-circuit returns structured tool errors the model can parse.
  • Audit hook runs even when pre-tool short-circuits (deny events).
  • Spans and cost tags attached at middleware boundaries.
  • Pipeline manifest pinned in run replay artifacts.
  • Unit tests per hook; integration test for full stack order.
  • Alert on new tools registered without pipeline attachment.

Key takeaways

  • Middleware is how agents share policy without copy-paste — one ordered pipeline beats twelve partial implementations.
  • Pre-tool hooks prevent damage; post-tool hooks shape what the model learns — order both deliberately.
  • Short-circuit with structured errors — blocked calls should teach the model, not crash the run.
  • Keep hooks small and testable — compose, do not accumulate god-wrappers.
  • Harbor Platform cut secret-exposing shadow runs from 14% to 0% and more than halved tool integration time (3.2 to 1.4 days) by adding a gateway middleware stack rather than more validators at the end.
