Guide
LLM agent multi-tenancy and tenant isolation systems explained
Harbor Platform sold an enterprise agent API: each customer got a branded
support bot with access to its own knowledge base, CRM tools, and episodic
memory. Twelve tenants shared one Kubernetes cluster and one Postgres
instance. During a penetration test, auditors triggered a retrieval query
that returned contract language from a different tenant because
the vector index filter used org_id from the HTTP header while
episodic memory reads keyed only on session_id — IDs that
were not globally unique across tenants. Internal sampling estimated
3.2% of production runs had at least one cross-tenant data
touch in logs; two customers received breach notifications. After a full
isolation refactor, cross-tenant touches dropped to 0% in
quarterly red-team exercises.
Multi-tenancy for LLM agents means many customers share orchestration, model routing, and tool infrastructure while their data, credentials, quotas, and side effects remain strictly partitioned. Tenant isolation is the set of enforcement layers that make that promise true at every hop — not just at the API gateway. This guide covers tenant context propagation, data-plane namespaces, compute and secret boundaries, integration with per-tenant rate limits and scoped credential injection, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.
Why agents are harder to multi-tenant than CRUD APIs
A typical REST API scopes one request to one tenant: authenticate, attach
tenant_id to the query, return rows. Agent runs are long,
stateful graphs that touch many subsystems over minutes:
- Episodic and vector memory — prior turns, retrieved chunks, and tool observations persist across steps and sessions.
- Tool side effects — writes to CRM, email, or payment APIs use tenant-specific OAuth tokens that must never leak into another tenant's sandbox.
- Subagent delegation — child runs inherit (or fail to inherit) parent tenant context.
- Shared caches — prompt templates, embedding caches, and model warm pools can accidentally key only on content hash, not tenant.
- Async continuations — webhooks, queue workers, and checkpoint resume must rehydrate tenant scope without trusting client-supplied IDs alone.
One missing filter on any layer becomes a data breach. Isolation must be defense in depth: gateway auth, middleware enforcement, database row-level security, and integration tests that deliberately collide tenant IDs.
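The shared-cache item above can be made concrete with a small sketch. It assumes a hypothetical `cache_key` helper (not part of any named library) that folds the tenant into the hash, so identical content from two tenants never shares a cache slot:

```python
import hashlib


def cache_key(tenant_id: str, content: str) -> str:
    """Build a cache key that depends on both tenant and content.

    Hypothetical helper for illustration. Each part is length-prefixed so
    that ("ab", "c") and ("a", "bc") cannot hash to the same key.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for cache keys")
    digest = hashlib.sha256()
    for part in (tenant_id, content):
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

A key built from the content hash alone would let the second tenant to ask a question receive the first tenant's cached answer; with the tenant folded in, the lookup simply misses.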
Tenant context: the object everything must carry
Define an immutable tenant context struct created at the edge after authentication and threaded through every async boundary:
{
  "tenant_id": "tnt_8f2a…",
  "environment": "production",
  "principal_id": "usr_91bc…",
  "run_id": "run_7e4d…",
  "data_residency": "eu-west-1",
  "quota_tier": "enterprise"
}
Rules that prevent Harbor-style leaks:
- Never accept tenant_id from unauthenticated body fields. Derive it from signed JWT claims or mTLS service identity.
- Propagate via context, not globals. Thread-local or async-local storage in Node/Python; explicit parameter in Go/Rust.
- Serialize into durable checkpoints so resume workers do not reconstruct scope from external IDs alone.
- Log tenant_id on every span per observability conventions — but redact tenant content from shared log indexes.
- Validate on tool entry in middleware hooks: if context is missing, fail closed.
Composite primary keys should include tenant_id everywhere:
(tenant_id, session_id), (tenant_id, run_id),
(tenant_id, document_id). Never rely on UUID v4 uniqueness
across tenants.
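A small sketch of the composite-key idea, using in-memory SQLite as a stand-in for the production database. The table name `episodic_memory` and the helper functions are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE episodic_memory (
        tenant_id  TEXT NOT NULL,
        session_id TEXT NOT NULL,
        summary    TEXT NOT NULL,
        PRIMARY KEY (tenant_id, session_id)
    )
    """
)


def write_memory(tenant_id: str, session_id: str, summary: str) -> None:
    """Insert one episodic row under the composite key."""
    conn.execute(
        "INSERT INTO episodic_memory (tenant_id, session_id, summary) VALUES (?, ?, ?)",
        (tenant_id, session_id, summary),
    )


def read_memory(tenant_id: str, session_id: str):
    """Read by the full composite key; returns None when nothing matches."""
    row = conn.execute(
        "SELECT summary FROM episodic_memory WHERE tenant_id = ? AND session_id = ?",
        (tenant_id, session_id),
    ).fetchone()
    return row[0] if row else None
```

With this schema, two tenants can hold the same `session_id` without colliding, and a read for one tenant cannot return the other's row.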
Data-plane isolation layers
Relational stores and run history
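Enable Postgres row-level security (RLS) or equivalent: policies like
tenant_id = current_setting('app.tenant_id'), with the setting applied per
connection from the pool checkout hook and cleared on return so a pooled
connection never carries the previous tenant's scope. A query that omits
the tenant predicate should then see only the current tenant's rows, and a
connection with no tenant set should see zero rows or an error, never all
rows. Table owners bypass RLS unless the table uses FORCE ROW LEVEL
SECURITY, and superusers and BYPASSRLS roles always bypass it, so agent
code should connect as an unprivileged role. Audit tables for agent runs,
tool calls, and human approvals need the same policy.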
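The following sketch shows one way the policy and the per-transaction setting might be issued from Python. The helper names and the `%s` placeholder style (as used by psycopg-style drivers) are assumptions; the SQL uses standard Postgres statements and the built-in `set_config` and `current_setting` functions, and assumes a text `tenant_id` column:

```python
import re

# Conservative identifier check so table names cannot smuggle SQL.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def rls_statements(table: str) -> list:
    """Return the DDL that enables tenant-scoped RLS on one table."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    policy = f"{table}_tenant_isolation"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # missing_ok=true yields NULL when unset, so the comparison
        # matches no rows instead of raising or matching everything.
        f"CREATE POLICY {policy} ON {table} "
        "USING (tenant_id = current_setting('app.tenant_id', true))",
    ]


def scope_transaction(cursor, tenant_id: str) -> None:
    """Bind the tenant for the current transaction only.

    The third argument (is_local=true) makes the setting revert at
    commit or rollback, which suits pooled connections.
    """
    cursor.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
```

Scoping the setting to the transaction rather than the session is a design choice made here: it avoids relying on a pool hook to clear state before the connection is reused.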
Vector and episodic memory
Vector indexes must filter on tenant_id before top-k
scoring, not after. Options:
- Metadata filter per query — simplest; verify your engine cannot be tricked into ignoring filters.
- Physical namespace per tenant — separate collections or index partitions; higher ops cost, strongest isolation.
- Dedicated index per enterprise tier — hybrid for regulated customers.
Episodic memory (conversation summaries, user preferences) shares the same rule: Harbor's bug was filtering vectors but not SQL episodic rows.
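To illustrate filter-before-score, here is a toy in-memory search in plain Python. The `Chunk` type and `search` function are hypothetical; a real vector engine would apply the tenant restriction inside its index rather than in a list comprehension:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, tenant_id: str, query, k: int = 3):
    """Return the top-k chunks for one tenant.

    The tenant filter runs BEFORE scoring, so other tenants' chunks are
    never ranked and cannot take up slots in the top-k.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    candidates = [c for c in index if c.tenant_id == tenant_id]
    ranked = sorted(candidates, key=lambda c: cosine(c.embedding, query), reverse=True)
    return ranked[:k]
```

If the filter ran after a global top-k instead, a closer match from another tenant could push the caller's own chunks out of the result set before being discarded.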
Object storage and attachments
Prefix buckets with tenants/{tenant_id}/ and enforce IAM
policies at the storage layer. Pre-signed URLs must be short-lived and
scoped to one prefix. Agents uploading files should never receive a bucket
root credential.
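On the application side, key construction can also refuse paths that escape the tenant prefix. This is a sketch with a hypothetical `tenant_object_key` helper; it complements the storage-layer IAM policy and does not replace it:

```python
import posixpath
import re

# Assumed ID format for illustration: letters, digits, and underscores.
_TENANT_ID = re.compile(r"[A-Za-z0-9_]+")


def tenant_object_key(tenant_id: str, relative_path: str) -> str:
    """Build an object key under tenants/{tenant_id}/ and reject escapes."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    prefix = f"tenants/{tenant_id}/"
    # normpath collapses ".." segments so traversal attempts become visible.
    key = posixpath.normpath(posixpath.join(prefix, relative_path))
    if not key.startswith(prefix):
        raise PermissionError(f"path escapes tenant prefix: {relative_path!r}")
    return key
```

For example, `tenant_object_key("tnt_a", "uploads/report.pdf")` yields `tenants/tnt_a/uploads/report.pdf`, while a relative path of `../tnt_b/report.pdf` raises `PermissionError`.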
Compute isolation: sandboxes, secrets, and egress
Code-execution tools are the highest-risk surface. Each tenant's sandbox should receive:
- Ephemeral filesystem destroyed after the run per sandbox design.
- Tenant-scoped secret handles — not raw API keys in environment variables shared across pods.
- Egress allowlists derived from tenant configuration: Tenant A's CRM domain, not the internet.
- CPU/memory ceilings tied to quota tier so one tenant cannot starve others.
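The egress rule in the list above could be checked along these lines. The `EGRESS_ALLOWLIST` mapping, its example hostnames, and the HTTPS-only requirement are assumptions made for this sketch; in practice the same rule would also be enforced at the network layer:

```python
from urllib.parse import urlsplit

# Hypothetical per-tenant configuration: exact hostnames only.
EGRESS_ALLOWLIST = {
    "tnt_a": frozenset({"crm.tenant-a.example"}),
    "tnt_b": frozenset({"api.tenant-b.example"}),
}


def egress_allowed(tenant_id: str, url: str) -> bool:
    """Allow only HTTPS requests to hosts configured for this tenant.

    Unknown tenants and unparseable URLs are denied (fail closed).
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    return parts.hostname in EGRESS_ALLOWLIST.get(tenant_id, frozenset())
```

Matching on the exact parsed hostname, rather than a substring of the URL, avoids accepting lookalike hosts such as `crm.tenant-a.example.evil.test`.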
Secret brokers should map (tenant_id, integration_name) to
vault paths. A tool requesting salesforce without a matching
tenant credential fails before any network call.
Permission scoping
layers add tool-level policy: Tenant B's agents simply do not have
delete_customer in their allowlist.
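A secret broker of the kind described could look roughly like this. The class, the exception, and the vault path layout are hypothetical; the point is only that resolution happens before any network call:

```python
class CredentialNotProvisioned(LookupError):
    """Raised when a tenant has no credential for the requested integration."""


class SecretBroker:
    """Maps (tenant_id, integration_name) to a vault path."""

    def __init__(self, provisioned):
        # Set of (tenant_id, integration_name) pairs known to exist.
        self._provisioned = set(provisioned)

    def vault_path(self, tenant_id: str, integration_name: str) -> str:
        if (tenant_id, integration_name) not in self._provisioned:
            raise CredentialNotProvisioned(
                f"{integration_name!r} is not provisioned for tenant {tenant_id!r}"
            )
        # Path layout is an assumption for illustration.
        return f"secret/tenants/{tenant_id}/integrations/{integration_name}"


def call_integration(broker, tenant_id, integration_name, send):
    """Resolve the credential first; `send` runs only if resolution succeeds."""
    path = broker.vault_path(tenant_id, integration_name)
    return send(path)
```

Because the broker never accepts a bare integration name, there is no code path that falls back to a shared or default credential.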
Quota envelopes and noisy-neighbor control
Isolation is not only about secrecy — it is about fairness. Per-tenant envelopes should cover:
- Concurrent runs and open SSE streams
- Tokens per minute and tool calls per hour
- Sandbox CPU-seconds and egress bytes
- Storage for vectors, attachments, and audit logs
Implement quotas in the same middleware that enforces tenant context so a misconfigured worker cannot bypass limits. Surface remaining budget in run metadata for FinOps dashboards — enterprise customers expect per-tenant invoices, not blended averages.
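One common way to implement such an envelope is a token bucket per tenant. The sketch below is an assumption about how that might look, not Harbor's implementation; `TenantQuotas`, the tier table, and the injectable clock are illustrative:

```python
import time


class TokenBucket:
    """Classic token bucket: refills continuously up to a fixed capacity."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._clock = clock
        self.tokens = float(capacity)
        self._updated = clock()

    def try_consume(self, amount: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self._updated = now
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True


class TenantQuotas:
    """One bucket per tenant, sized by the tenant's quota tier."""

    def __init__(self, tier_limits, clock=time.monotonic):
        # tier_limits maps tier name -> (capacity, refill_per_second).
        self._tier_limits = tier_limits
        self._clock = clock
        self._buckets = {}

    def allow(self, tenant_id: str, quota_tier: str, amount: float = 1.0) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            capacity, refill = self._tier_limits[quota_tier]
            bucket = TokenBucket(capacity, refill, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.try_consume(amount)
```

Keeping a separate bucket per `tenant_id` means one tenant exhausting its budget has no effect on another tenant's remaining allowance. The value of `bucket.tokens` is what a run-metadata field for remaining budget would report.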
Harbor Platform refactor walkthrough
Harbor's remediation sprint addressed each leak class:
- Composite keys — migrated session_id to (tenant_id, session_id) on episodic tables; backfilled with tenant from run metadata.
- RLS on Postgres — enabled on all agent tables; integration tests assert cross-tenant SELECT returns empty.
- Vector filter audit — wrapper that refuses queries without tenant_id in the filter AST; CI fuzzes with colliding IDs.
- Context middleware — single hook at orchestrator entry; subagents receive copied context, never reconstructed from prompts.
- Tenant chaos tests — weekly job runs two tenants with identical session UUIDs; asserts zero cross-reads in traces.
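The vector filter audit step can be sketched as a guard over a small filter AST. The AST shape (`op`, `field`, `value`, `args`) and the function names are invented for this example and do not correspond to any particular vector engine:

```python
class UnscopedQuery(PermissionError):
    """Raised when a filter does not guarantee the caller's tenant."""


def pins_tenant(node: dict, tenant_id: str) -> bool:
    """Return True only if every row matching `node` has this tenant_id."""
    op = node.get("op")
    if op == "eq":
        return node.get("field") == "tenant_id" and node.get("value") == tenant_id
    if op == "and":
        # One pinned branch is enough: AND can only narrow the result.
        return any(pins_tenant(child, tenant_id) for child in node.get("args", []))
    if op == "or":
        # Every branch must be pinned, or the unpinned one leaks rows.
        children = node.get("args", [])
        return bool(children) and all(pins_tenant(c, tenant_id) for c in children)
    # Unknown operators (including negation) are treated as unpinned.
    return False


def guarded_search(tenant_id: str, filter_ast, run_query):
    """Refuse to run any query whose filter is not pinned to the tenant."""
    if filter_ast is None or not pins_tenant(filter_ast, tenant_id):
        raise UnscopedQuery("vector query rejected: filter does not pin tenant_id")
    return run_query(filter_ast)
```

Checking the structure, rather than searching for the string `tenant_id` somewhere in the filter, catches cases such as an OR branch that quietly widens the query beyond the tenant.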
Time-to-first-tenant onboard dropped from three days (manual namespace setup) to four hours (templated provisioning). No customer required a dedicated cluster after the refactor — the previous plan to “fix” isolation by splitting deployments was more expensive and still would not have fixed shared embedding caches.
Technique decision table
| Approach | Isolation strength | Cost / complexity | When to use |
|---|---|---|---|
| Shared cluster + RLS + metadata filters | Good (if tested relentlessly) | Low marginal cost | Default for SMB and mid-market SaaS agents |
| Physical namespace per tenant (DB, vector, bucket) | Very good | Medium ops overhead | Regulated industries, >1k tenants with uneven size |
| Dedicated cluster per enterprise tenant | Strongest | High | Contractual requirement, government, or >$1M ACV deals |
| Single-tenant deploy per customer (no sharing) | Maximum | Highest | On-prem or air-gapped; not true multi-tenant economics |
Most teams should start shared with RLS and invest in chaos testing before paying for per-tenant clusters. Dedicated stacks multiply patch burden and slow feature rollout unless automation is excellent.
Common pitfalls
- Trusting client-supplied tenant headers — attackers iterate IDs; bind tenant to auth token only.
- Global session IDs — UUID collision or sequential IDs across tenants cause silent cross-talk.
- Filter-after-retrieval — scoring top-k globally and filtering by tenant afterward exposes timing side channels and can occasionally return another tenant's chunks.
- Shared prompt cache keys — cache entries that omit tenant_id can return another customer's tool output summary.
- Webhook callbacks without tenant binding — async workers process jobs with only run_id and guess the wrong tenant.
- Support/admin impersonation without audit — break-glass access must log actor, tenant, and justification.
- Cross-tenant analytics dashboards — aggregating metrics is fine; joining raw transcripts is not.
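For the webhook pitfall above, one possible mitigation is to hand the external system a signed continuation token that binds `run_id` to `tenant_id`, so the worker never has to guess. This sketch uses HMAC-SHA256 from the standard library; the token format and function names are assumptions, and it presumes IDs never contain a colon:

```python
import hashlib
import hmac


def _mac(secret: bytes, tenant_id: str, run_id: str) -> str:
    message = f"{tenant_id}:{run_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_continuation(secret: bytes, tenant_id: str, run_id: str) -> str:
    """Issue a token that ties a run to the tenant that started it."""
    if ":" in tenant_id or ":" in run_id:
        raise ValueError("IDs must not contain ':'")
    return f"{tenant_id}:{run_id}:{_mac(secret, tenant_id, run_id)}"


def verify_continuation(secret: bytes, token: str):
    """Return (tenant_id, run_id) if the signature is valid, else raise."""
    parts = token.split(":")
    if len(parts) != 3:
        raise PermissionError("malformed continuation token")
    tenant_id, run_id, signature = parts
    expected = _mac(secret, tenant_id, run_id)
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("continuation token signature mismatch")
    return tenant_id, run_id
```

A callback that swaps in another tenant's ID fails verification, because the signature covers both fields together.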
Production checklist
- Immutable tenant context created at auth boundary; propagated to all async work.
- Composite keys include tenant_id on sessions, runs, documents, and files.
- RLS or equivalent enforced on every datastore agent code touches.
- Vector queries refuse execution without tenant filter in the query plan.
- Secrets broker resolves credentials by (tenant_id, integration) only.
- Sandboxes are ephemeral with tenant-scoped egress allowlists.
- Per-tenant quota envelopes on runs, tokens, tools, and sandbox resources.
- Chaos tests deliberately collide IDs across two tenants weekly.
- Traces and audit logs include tenant_id; content indexed per-tenant or redacted.
- Provisioning automation creates namespaces, policies, and default quotas together.
Key takeaways
- Agents touch more surfaces than APIs — isolation must cover memory, tools, sandboxes, and async workers.
- Tenant context is a first-class object — thread it everywhere; fail closed if missing.
- RLS plus filter-before-score beats hoping application code never forgets a WHERE clause.
- Quotas are isolation too — noisy neighbors erode trust even without data leaks.
- Harbor cut cross-tenant touches from 3.2% to 0% with composite keys and chaos tests, not by splitting clusters.
Related reading
- LLM agent secrets and credential injection explained — vault brokers scoped per tenant
- LLM agent rate limiting and throttling explained — per-tenant quota envelopes
- LLM agent permission scoping and tool approval gates explained — tenant-specific tool allowlists
- LLM agent run audit trail and compliance logging explained — immutable per-tenant event logs