
LLM agent multi-tenancy and tenant isolation systems explained

Harbor Platform sold an enterprise agent API: each customer got a branded support bot with access to its own knowledge base, CRM tools, and episodic memory. Twelve tenants shared one Kubernetes cluster and one Postgres instance. During a penetration test, auditors triggered a retrieval query that returned contract language from a different tenant. The cause was twofold: the vector index filter took org_id from an HTTP header, and episodic memory reads were keyed only on session_id, an ID that was not globally unique across tenants. Internal sampling estimated that 3.2% of production runs had at least one cross-tenant data touch in logs, and two customers received breach notifications. After a full isolation refactor, cross-tenant touches dropped to 0% in quarterly red-team exercises.

Multi-tenancy for LLM agents means many customers share orchestration, model routing, and tool infrastructure while their data, credentials, quotas, and side effects remain strictly partitioned. Tenant isolation is the set of enforcement layers that make that promise true at every hop — not just at the API gateway. This guide covers tenant context propagation, data-plane namespaces, compute and secret boundaries, integration with per-tenant rate limits and scoped credential injection, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.

Why agents are harder to multi-tenant than CRUD APIs

A typical REST API scopes one request to one tenant: authenticate, attach tenant_id to the query, return rows. Agent runs are long, stateful graphs that touch many subsystems over minutes:

Episodic and vector memory — prior turns, retrieved chunks, and tool observations persist across steps and sessions.
Tool side effects — writes to CRM, email, or payment APIs use tenant-specific OAuth tokens that must never leak into another tenant's sandbox.
Subagent delegation — child runs inherit (or fail to inherit) parent tenant context.
Shared caches — prompt templates, embedding caches, and model warm pools can accidentally key only on content hash, not tenant.
Async continuations — webhooks, queue workers, and checkpoint resume must rehydrate tenant scope without trusting client-supplied IDs alone.

One missing filter on any layer becomes a data breach. Isolation must be defense in depth: gateway auth, middleware enforcement, database row-level security, and integration tests that deliberately collide tenant IDs.

Tenant context: the object everything must carry

Define an immutable tenant context struct created at the edge after authentication and threaded through every async boundary:

{
  "tenant_id": "tnt_8f2a…",
  "environment": "production",
  "principal_id": "usr_91bc…",
  "run_id": "run_7e4d…",
  "data_residency": "eu-west-1",
  "quota_tier": "enterprise"
}

Rules that prevent Harbor-style leaks:

Never accept tenant_id from unauthenticated body fields. Derive it from signed JWT claims or mTLS service identity.
Propagate via context, not globals. Use thread-local or async-local storage in Node (AsyncLocalStorage) and Python (contextvars), and an explicit parameter in Go/Rust.
Serialize into durable checkpoints so resume workers do not reconstruct scope from external IDs alone.
Log tenant_id on every span per observability conventions — but redact tenant content from shared log indexes.
Validate on tool entry in middleware hooks: if context is missing, fail closed.
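The rules above can be sketched in Python with a frozen dataclass and the standard-library contextvars module. This is a minimal illustration, not Harbor's implementation: the names TenantContext, run_as, require_tenant, and call_tool are hypothetical, and the context is assumed to have been built from already-verified token claims.

```python
import contextvars
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class TenantContext:
    tenant_id: str
    environment: str
    principal_id: str
    run_id: str
    data_residency: str
    quota_tier: str


class MissingTenantContext(RuntimeError):
    """Raised when code runs without a bound tenant scope."""


# No default value, so reading before binding raises LookupError.
_current = contextvars.ContextVar("tenant_context")


def run_as(ctx: TenantContext, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Bind ctx for the duration of fn, then restore the previous state."""
    token = _current.set(ctx)
    try:
        return fn(*args, **kwargs)
    finally:
        _current.reset(token)


def require_tenant() -> TenantContext:
    """Fail closed: no context means no work."""
    try:
        return _current.get()
    except LookupError:
        raise MissingTenantContext("no tenant context bound; refusing to run") from None


def call_tool(name: str, payload: dict) -> dict:
    """Hypothetical tool entry point that checks scope before doing anything."""
    ctx = require_tenant()
    return {"tool": name, "tenant_id": ctx.tenant_id, "payload": payload}
```

Because contextvars values are copied into asyncio tasks when they are created, a child task started inside run_as sees the same context. Work handed to a queue or another process does not, which is why the context still has to be serialized into checkpoints.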

Composite primary keys should include tenant_id everywhere: (tenant_id, session_id), (tenant_id, run_id), (tenant_id, document_id). Never rely on UUID v4 uniqueness across tenants.
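A small sketch of the composite-key rule, using the standard-library sqlite3 module so it runs anywhere. The table name episodic_memory and the helper functions are assumptions for illustration; a production schema would live in Postgres.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory table keyed on (tenant_id, session_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE episodic_memory (
            tenant_id  TEXT NOT NULL,
            session_id TEXT NOT NULL,
            summary    TEXT NOT NULL,
            PRIMARY KEY (tenant_id, session_id)
        )
        """
    )
    return conn


def save_summary(conn: sqlite3.Connection, tenant_id: str, session_id: str, summary: str) -> None:
    conn.execute(
        "INSERT INTO episodic_memory (tenant_id, session_id, summary) VALUES (?, ?, ?)",
        (tenant_id, session_id, summary),
    )


def load_summary(conn: sqlite3.Connection, tenant_id: str, session_id: str) -> Optional[str]:
    # Both key parts are required, so a bare session_id can never select a row.
    row = conn.execute(
        "SELECT summary FROM episodic_memory WHERE tenant_id = ? AND session_id = ?",
        (tenant_id, session_id),
    ).fetchone()
    return row[0] if row else None
```

With this key, two tenants can hold the same session_id without colliding, and a lookup under the wrong tenant returns nothing rather than someone else's row.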

Data-plane isolation layers

Relational stores and run history

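Enable Postgres row-level security (RLS) or an equivalent, with policies like tenant_id = current_setting('app.tenant_id'). Set the value from the pool checkout hook and scope it to the transaction (SET LOCAL, or set_config with is_local = true) so it cannot carry over to the next tenant that checks out the same pooled connection. A query that omits the tenant predicate should still see only the current tenant's rows, and a connection with no tenant set should see zero rows or an error, never all rows. Audit tables for agent runs, tool calls, and human approvals need the same policy.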
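As a sketch of what that setup might look like, the helper below generates the Postgres statements for one table. The policy name tenant_isolation, the helper rls_statements, and the psycopg-style %s placeholder are assumptions. Two details are standard Postgres behavior: the table owner bypasses RLS unless FORCE ROW LEVEL SECURITY is set, and the two-argument current_setting(name, true) returns NULL instead of raising when the setting is absent, so an unscoped session matches no rows.

```python
import re
from typing import List

TENANT_SETTING = "app.tenant_id"

# Third argument true = transaction-local, so the value does not outlive
# the transaction on a pooled connection. %s is a driver placeholder.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"

_TABLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def rls_statements(table: str) -> List[str]:
    """Return the DDL that puts one table behind a tenant policy."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    predicate = f"tenant_id = current_setting('{TENANT_SETTING}', true)"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # Without FORCE, the table owner is exempt from the policy.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # USING filters reads; WITH CHECK rejects writes for another tenant.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate})",
    ]
```

Superusers and roles with BYPASSRLS are never subject to policies, so the application should connect as an ordinary role.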

Vector and episodic memory

Vector indexes must filter on tenant_id before top-k scoring, not after. Options:

Metadata filter per query — simplest; verify your engine cannot be tricked into ignoring filters.
Physical namespace per tenant — separate collections or index partitions; higher ops cost, strongest isolation.
Dedicated index per enterprise tier — hybrid for regulated customers.

Episodic memory (conversation summaries, user preferences) shares the same rule: Harbor's bug was filtering vectors but not SQL episodic rows.
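Filter-before-score can be illustrated with a toy in-memory index. This is a sketch only: Chunk and search are hypothetical names, and a real engine would apply the filter inside its index rather than in a Python list comprehension.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: Sequence[Chunk], tenant_id: str, query: Sequence[float], k: int = 3) -> List[Chunk]:
    """Return the top-k chunks for one tenant, filtering before scoring."""
    if not tenant_id:
        # Refuse unscoped queries instead of falling back to a global search.
        raise ValueError("tenant_id is required for every vector query")
    # Step 1: narrow to the tenant's own chunks.
    candidates = [c for c in index if c.tenant_id == tenant_id]
    # Step 2: score and rank only those candidates.
    ranked = sorted(candidates, key=lambda c: cosine(c.embedding, query), reverse=True)
    return ranked[:k]
```

Another tenant's chunk is never scored, even when it is the closest match to the query, so it cannot influence ranking or timing.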

Object storage and attachments

Prefix object keys with tenants/{tenant_id}/ and enforce IAM policies at the storage layer. Pre-signed URLs must be short-lived and scoped to one prefix. Agents uploading files should never receive a bucket root credential.
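Application code can add a second check on top of the IAM policy by building every object key through one helper. The sketch below assumes tenant IDs are alphanumeric with underscores; tenant_object_key is a hypothetical name.

```python
import posixpath
import re

_TENANT_ID = re.compile(r"[A-Za-z0-9_]+")


def tenant_object_key(tenant_id: str, relative_path: str) -> str:
    """Build a key under tenants/{tenant_id}/ and reject anything that escapes it."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    prefix = f"tenants/{tenant_id}/"
    # normpath collapses ".." segments so traversal attempts become visible.
    key = posixpath.normpath(posixpath.join(prefix, relative_path))
    if not key.startswith(prefix):
        raise ValueError(f"path escapes tenant prefix: {relative_path!r}")
    return key
```

The trailing slash in the prefix matters: without it, a key for tnt_ab would pass a check meant for tnt_a.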

Compute isolation: sandboxes, secrets, and egress

Code-execution tools are the highest-risk surface. Each tenant's sandbox should receive:

Ephemeral filesystem, destroyed when the run ends.
Tenant-scoped secret handles — not raw API keys in environment variables shared across pods.
Egress allowlists derived from tenant configuration: Tenant A's CRM domain, not the internet.
CPU/memory ceilings tied to quota tier so one tenant cannot starve others.

Secret brokers should map (tenant_id, integration_name) to vault paths. A tool requesting salesforce without a matching tenant credential fails before any network call. Permission scoping layers add tool-level policy: Tenant B's agents simply do not have delete_customer in their allowlist.
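A minimal sketch of such a broker, assuming an in-memory mapping in place of a real vault client. The class and exception names and the vault path layout are hypothetical; the point is that lookup is keyed on the pair and fails before any network call.

```python
from typing import Dict, FrozenSet, Tuple


class CredentialNotFound(LookupError):
    """No credential is registered for this tenant and integration."""


class ToolNotAllowed(PermissionError):
    """The tool is not in the tenant's allowlist."""


class SecretBroker:
    def __init__(
        self,
        vault_paths: Dict[Tuple[str, str], str],
        tool_allowlists: Dict[str, FrozenSet[str]],
    ) -> None:
        self._vault_paths = dict(vault_paths)
        self._tool_allowlists = dict(tool_allowlists)

    def resolve(self, tenant_id: str, integration: str) -> str:
        """Return a vault path (a handle), never the raw secret."""
        try:
            return self._vault_paths[(tenant_id, integration)]
        except KeyError:
            raise CredentialNotFound(
                f"{integration!r} is not configured for tenant {tenant_id!r}"
            ) from None

    def authorize_tool(self, tenant_id: str, tool_name: str) -> None:
        """Unknown tenants get an empty allowlist, so they fail closed."""
        allowed = self._tool_allowlists.get(tenant_id, frozenset())
        if tool_name not in allowed:
            raise ToolNotAllowed(f"{tool_name!r} is not allowed for tenant {tenant_id!r}")
```

A tool wrapper would call authorize_tool and resolve first, and only then open a connection using the returned handle.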

Quota envelopes and noisy-neighbor control

Isolation is not only about secrecy — it is about fairness. Per-tenant envelopes should cover:

Concurrent runs and open SSE streams
Tokens per minute and tool calls per hour
Sandbox CPU-seconds and egress bytes
Storage for vectors, attachments, and audit logs

Implement quotas in the same middleware that enforces tenant context so a misconfigured worker cannot bypass limits. Surface remaining budget in run metadata for FinOps dashboards — enterprise customers expect per-tenant invoices, not blended averages.
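The sketch below shows one possible shape for an envelope covering two of these dimensions, concurrent runs and tokens per minute, with a fixed one-minute window. The names QuotaLimits and QuotaEnvelope are hypothetical, the state is in-process and not thread-safe, and a shared deployment would keep these counters in a common store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class QuotaLimits:
    max_concurrent_runs: int
    tokens_per_minute: int


class QuotaExceeded(RuntimeError):
    pass


class QuotaEnvelope:
    def __init__(self, limits: Dict[str, QuotaLimits], clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = limits
        self._clock = clock
        self._active_runs: Dict[str, int] = {}
        self._windows: Dict[str, Tuple[int, int]] = {}  # tenant -> (minute, tokens used)

    def start_run(self, tenant_id: str) -> None:
        limits = self._limits[tenant_id]  # unknown tenant: KeyError, fail closed
        active = self._active_runs.get(tenant_id, 0)
        if active >= limits.max_concurrent_runs:
            raise QuotaExceeded(f"{tenant_id}: concurrent run limit reached")
        self._active_runs[tenant_id] = active + 1

    def finish_run(self, tenant_id: str) -> None:
        self._active_runs[tenant_id] = max(0, self._active_runs.get(tenant_id, 0) - 1)

    def spend_tokens(self, tenant_id: str, tokens: int) -> int:
        """Record token use and return the budget left in this minute."""
        limits = self._limits[tenant_id]
        minute = int(self._clock() // 60)
        window_minute, used = self._windows.get(tenant_id, (minute, 0))
        if window_minute != minute:
            used = 0  # a new minute starts a fresh window
        if used + tokens > limits.tokens_per_minute:
            raise QuotaExceeded(f"{tenant_id}: tokens-per-minute limit reached")
        self._windows[tenant_id] = (minute, used + tokens)
        return limits.tokens_per_minute - (used + tokens)
```

The value returned by spend_tokens is the kind of remaining-budget figure that could be attached to run metadata.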

Harbor Platform refactor walkthrough

Harbor's remediation sprint addressed each leak class:

Composite keys — migrated session_id to (tenant_id, session_id) on episodic tables; backfilled with tenant from run metadata.
RLS on Postgres — enabled on all agent tables; integration tests assert cross-tenant SELECT returns empty.
Vector filter audit — wrapper that refuses queries without tenant_id in the filter AST; CI fuzzes with colliding IDs.
Context middleware — single hook at orchestrator entry; subagents receive copied context, never reconstructed from prompts.
Tenant chaos tests — weekly job runs two tenants with identical session UUIDs; asserts zero cross-reads in traces.
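A tenant chaos test of this kind can be sketched against any store that exposes write and read. The two store classes below are illustrative stand-ins, not Harbor's code: LeakyStore mimics a table keyed on session_id alone, and IsolatedStore uses the composite key.

```python
import uuid
from typing import Dict, List, Optional, Sequence, Tuple


class LeakyStore:
    """Deliberately buggy: keys on session_id only."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def write(self, tenant_id: str, session_id: str, value: str) -> None:
        self._rows[session_id] = value

    def read(self, tenant_id: str, session_id: str) -> Optional[str]:
        return self._rows.get(session_id)


class IsolatedStore:
    """Keys on (tenant_id, session_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], str] = {}

    def write(self, tenant_id: str, session_id: str, value: str) -> None:
        self._rows[(tenant_id, session_id)] = value

    def read(self, tenant_id: str, session_id: str) -> Optional[str]:
        return self._rows.get((tenant_id, session_id))


def collide_sessions(store, tenants: Sequence[str] = ("tnt_a", "tnt_b")) -> List[str]:
    """Write under one shared session ID for every tenant and report cross-reads."""
    session_id = str(uuid.uuid4())
    for tenant in tenants:
        store.write(tenant, session_id, f"marker-for-{tenant}")
    leaks = []
    for tenant in tenants:
        value = store.read(tenant, session_id)
        if value != f"marker-for-{tenant}":
            leaks.append(f"{tenant} read {value!r} in session {session_id}")
    return leaks
```

A scheduled job would run collide_sessions against each real datastore and fail the build if the returned list is not empty.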

Time to onboard a new tenant dropped from three days (manual namespace setup) to four hours (templated provisioning). No customer required a dedicated cluster after the refactor. The previous plan to “fix” isolation by splitting deployments was more expensive and still would not have fixed shared embedding caches.

Technique decision table

Approach	Isolation strength	Cost / complexity	When to use
Shared cluster + RLS + metadata filters	Good (if tested relentlessly)	Low marginal cost	Default for SMB and mid-market SaaS agents
Physical namespace per tenant (DB, vector, bucket)	Very good	Medium ops overhead	Regulated industries, >1k tenants with uneven size
Dedicated cluster per enterprise tenant	Strongest	High	Contractual requirement, government, or >$1M ACV deals
Single-tenant deploy per customer (no sharing)	Maximum	Highest	On-prem or air-gapped; not true multi-tenant economics

Most teams should start shared with RLS and invest in chaos testing before paying for per-tenant clusters. Dedicated stacks multiply patch burden and slow feature rollout unless automation is excellent.

Common pitfalls

Trusting client-supplied tenant headers — attackers iterate IDs; bind tenant to auth token only.
Global session IDs — UUID collision or sequential IDs across tenants cause silent cross-talk.
Filter-after-retrieval — scoring top-k globally and then filtering by tenant exposes timing side channels and occasionally lets the wrong chunks through.
Shared prompt cache keys — cache entries that omit tenant_id can return another customer's tool output summary.
Webhook callbacks without tenant binding — async workers process jobs with only run_id and guess the wrong tenant.
Support/admin impersonation without audit — break-glass access must log actor, tenant, and justification.
Cross-tenant analytics dashboards — aggregating metrics is fine; joining raw transcripts is not.
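For the webhook pitfall, one way to bind a tenant to an async job is to sign the tenant and run IDs together when the job is enqueued and to verify the signature in the worker. The sketch below uses the standard-library hmac module. The envelope layout and the function names sign_job and verify_job are assumptions, and key management is out of scope.

```python
import hashlib
import hmac
import json


def sign_job(secret: bytes, tenant_id: str, run_id: str) -> dict:
    """Create an envelope whose tenant binding cannot be altered without the key."""
    body = json.dumps(
        {"tenant_id": tenant_id, "run_id": run_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    signature = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_job(secret: bytes, envelope: dict) -> dict:
    """Return the signed fields, or raise if the envelope was tampered with."""
    expected = hmac.new(secret, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        raise PermissionError("job envelope failed signature check")
    return json.loads(envelope["body"])
```

The worker takes tenant_id from the verified body, never from a separate field in the callback payload.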

Production checklist

Immutable tenant context created at auth boundary; propagated to all async work.
Composite keys include tenant_id on sessions, runs, documents, and files.
RLS or equivalent enforced on every datastore agent code touches.
Vector queries refuse execution without tenant filter in the query plan.
Secrets broker resolves credentials by (tenant_id, integration) only.
Sandboxes are ephemeral with tenant-scoped egress allowlists.
Per-tenant quota envelopes on runs, tokens, tools, and sandbox resources.
Chaos tests deliberately collide IDs across two tenants weekly.
Traces and audit logs include tenant_id; content indexed per-tenant or redacted.
Provisioning automation creates namespaces, policies, and default quotas together.

Key takeaways

Agents touch more surfaces than APIs — isolation must cover memory, tools, sandboxes, and async workers.
Tenant context is a first-class object — thread it everywhere; fail closed if missing.
RLS plus filter-before-score beats hoping application code never forgets a WHERE clause.
Quotas are isolation too — noisy neighbors erode trust even without data leaks.
Harbor cut cross-tenant touches from 3.2% to 0% with composite keys and chaos tests, not by splitting clusters.