Integrating MCP tool servers with LLM agents in production systems, explained

Harbor Platform’s support agents connected to fourteen Model Context Protocol (MCP) servers, including ticketing, CRM, knowledge base, and billing, by spawning a fresh stdio subprocess for every tool call. Cold starts averaged 1.8 s before the first JSON-RPC handshake completed, zombie processes accumulated when runs were cancelled mid-flight, and cancellation did not propagate to child servers, leaving database transactions open. The tool-call timeout rate hit 47% under normal load, and FinOps attributed 31% of agent run wall time to MCP process boot alone.

Engineers replaced per-call spawns with a connection pool, a versioned capability registry that maps tool names to server endpoints, and tenant-scoped credential injection at borrow time. Health probes and circuit breakers isolate failing servers without poisoning the whole agent fleet. Tool timeouts fell to 6.8%; p95 tool latency dropped from 2.4 s to 340 ms. This guide explains how production agents integrate MCP servers beyond the protocol primer — lifecycle, discovery, auth, pooling, observability, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.

Why MCP integration is an agent-ops problem, not a protocol exercise

The MCP specification defines how hosts discover tools, resources, and prompts over JSON-RPC. It does not define how a multi-tenant SaaS agent platform should keep fourteen servers warm, rotate OAuth tokens per customer, or survive a Postgres MCP server that wedged after a migration. Those concerns sit in the integration layer between your agent loop and the outside world.

What the integration layer must own

Connection lifecycle — borrow, use, return, or recycle pooled sessions instead of fork-per-call.
Capability registry — stable tool names across server versions; schema snapshots pinned per run.
Tenant auth — inject customer credentials without passing secrets through the model context.
Health and circuit breaking — probe servers; shed load when error budgets burn.
Observability — trace each tools/call with server id, latency, and payload size.
Policy gates — route dangerous tools through approval workflows before execution.

Treat MCP servers like microservices your agent orchestrator calls — not like local Python functions imported at startup.

Architecture: registry, pool, and run-scoped capability snapshot

A production integration stack has four components: a durable capability registry, a runtime connection pool, an ephemeral per-run capability snapshot, and a per-call tool router:

1. Capability registry (durable config)

The registry stores server definitions: transport (stdio vs streamable HTTP), launch command or base URL, health-check interval, max concurrent sessions, allowed tenants, and tool-name prefixes. When an MCP server publishes a new tool list, the registry ingests it, diffs schemas, and marks the server stale until a warm pool member refreshes. Version pins let you roll out crm-mcp@2.1 to 5% of tenants without renaming tools exposed to the model.
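A registry record and the schema diff described above could be sketched as follows. The names here (`ServerDefinition`, `diff_tool_schemas`, `is_stale`, the field names, and the `crm_search` style tool names) are hypothetical and only mirror the attributes listed in the paragraph; none of them come from the MCP specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServerDefinition:
    """Hypothetical registry record for one pinned MCP server version."""
    server_id: str                  # e.g. "crm-mcp"
    version: str                    # e.g. "2.1"
    transport: str                  # "stdio" or "http"
    target: str                     # launch command or base URL
    health_interval_s: int = 30
    max_sessions: int = 8
    allowed_tenants: frozenset = field(default_factory=frozenset)
    tool_prefix: str = ""


def diff_tool_schemas(old: dict, new: dict) -> dict:
    """Compare two {tool_name: json_schema} maps and report what changed."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


def is_stale(diff: dict) -> bool:
    """Any difference marks the server stale until a pool member refreshes."""
    return any(diff.values())
```

Keeping the record immutable makes a version pin a matter of pointing a tenant cohort at a different `ServerDefinition`, rather than editing one in place.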

2. Connection pool (runtime)

Pool workers maintain long-lived MCP sessions. On borrow, the worker:

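Checks out a healthy idle session, or opens a new one if none is idle and the pool is below its maximum size.
Injects tenant credentials through the transport, for example HTTP headers for HTTP servers or the launch environment for stdio servers (never via the prompt).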
Returns a handle scoped to the current run id for tracing.

On return, the worker clears tenant state, resets any server-side session variables, and either returns the connection to idle or destroys it if the error budget tripped. For stdio servers, cap pool size — each idle process still consumes file descriptors and memory.
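The borrow and return steps above might be sketched like this. It is a single-threaded illustration under stated assumptions: `Session` is a stand-in for a real MCP session, `SessionPool` and its parameters are hypothetical, and the place where credentials would be injected is marked only by a comment.

```python
import contextlib
from collections import deque


class Session:
    """Stand-in for a long-lived MCP session (hypothetical)."""
    def __init__(self, server_id):
        self.server_id = server_id
        self.tenant_id = None
        self.run_id = None
        self.healthy = True
        self.calls = 0


class SessionPool:
    def __init__(self, server_id, max_size=8, max_calls=200):
        self.server_id = server_id
        self.max_size = max_size
        self.max_calls = max_calls
        self.idle = deque()
        self.in_use = 0

    @contextlib.contextmanager
    def borrow(self, tenant_id, run_id):
        # Reuse a healthy idle session, discarding unhealthy ones.
        session = None
        while self.idle and session is None:
            candidate = self.idle.popleft()
            if candidate.healthy:
                session = candidate
        if session is None:
            if self.in_use >= self.max_size:
                raise RuntimeError("pool exhausted for " + self.server_id)
            session = Session(self.server_id)
        self.in_use += 1
        session.tenant_id = tenant_id   # credential injection would happen here
        session.run_id = run_id         # scope the handle to the run for tracing
        try:
            yield session
            session.calls += 1
        except Exception:
            session.healthy = False     # never reuse a session that failed
            raise
        finally:
            self.in_use -= 1
            session.tenant_id = None    # clear tenant state on return
            session.run_id = None
            if session.healthy and session.calls < self.max_calls:
                self.idle.append(session)
```

A real pool would add locking, a bounded wait instead of an immediate error on exhaustion, and pre-warming up to the minimum idle count.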

3. Run capability snapshot (ephemeral)

At run start, the agent runtime builds a frozen tool manifest from the registry: JSON schemas, descriptions, and server routing metadata. The LLM sees only this snapshot for the run’s duration, so mid-run registry changes do not mutate in-flight tool definitions. Pair snapshots with feature flags to enable new MCP servers per tenant cohort.
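One way to freeze the manifest is sketched below. The `build_snapshot` and `snapshot_digest` helpers and the shape of the registry entries are assumptions made for illustration; the set of enabled servers stands in for whatever feature-flag system selects servers per tenant cohort.

```python
import copy
import hashlib
import json
from types import MappingProxyType


def build_snapshot(registry_tools: dict, enabled_servers: set) -> MappingProxyType:
    """Freeze the tools of enabled servers into a read-only manifest.

    registry_tools maps tool name -> {"server": ..., "schema": ..., "description": ...}.
    Entries are deep-copied so later registry edits cannot reach the snapshot.
    The proxy blocks top-level writes; nested dicts are private copies.
    """
    selected = {
        name: copy.deepcopy(entry)
        for name, entry in registry_tools.items()
        if entry["server"] in enabled_servers
    }
    return MappingProxyType(selected)


def snapshot_digest(snapshot) -> str:
    """Stable hash of the manifest, usable as a pinned-version tag in traces."""
    canonical = json.dumps(dict(snapshot), sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Recording the digest with every tool call lets on-call confirm which tool definitions a given run actually saw.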

4. Tool router (per call)

When the model emits a tool call, the router resolves the name against the snapshot, borrows a pooled connection for the target server, applies routing policy (rate limits, approval gates), executes tools/call, and records structured results for the agent loop. Errors map to retriable vs fatal classes before partial-failure recovery logic runs.
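A minimal router along these lines is sketched below. The `execute` callable is a hypothetical stand-in for "borrow a pooled session and issue tools/call", and the split of exceptions into retriable and fatal is an assumption; a real integration would map its own transport and server errors.

```python
def classify_error(exc: Exception) -> str:
    """Timeouts and dropped connections are worth retrying; the rest are not."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "retriable"
    return "fatal"


def route_tool_call(snapshot, name, arguments, execute, approved=lambda tool: True):
    """Resolve a tool against the run snapshot, apply the approval gate, execute."""
    entry = snapshot.get(name)
    if entry is None:
        return {"status": "fatal", "error": f"unknown tool: {name}"}
    if entry.get("requires_approval") and not approved(name):
        return {"status": "fatal", "error": f"approval denied: {name}"}
    try:
        result = execute(entry["server"], name, arguments)
    except Exception as exc:
        return {"status": classify_error(exc), "error": str(exc)}
    return {"status": "ok", "server": entry["server"], "result": result}
```

Returning a structured status instead of raising keeps the agent loop in control of retries and partial-failure recovery.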

Transport choices in production

Stdio subprocess servers

Best for local integrations (filesystem, git, internal CLIs). Production requirements: pre-warm min idle processes, enforce max lifetime (recycle after N calls or M minutes), propagate SIGTERM on run cancel, and sandbox with cgroup memory limits. Never spawn on the critical path of a latency-SLO run without a warm pool.
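The recycle rule and cancel propagation for stdio children can be sketched with the standard library alone. The function names and the five-second grace period are assumptions; the default limits of 200 calls and 15 minutes are borrowed from the Harbor Platform settings reported later in this guide.

```python
import subprocess
import sys


def should_recycle(calls, started_at, now, max_calls=200, max_age_s=900):
    """Recycle a stdio server after N calls or M seconds, whichever comes first."""
    return calls >= max_calls or (now - started_at) >= max_age_s


def stop_child(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Ask the child to exit, escalate to a kill after a grace period, and reap it.

    Popen.terminate() sends SIGTERM on POSIX. Waiting on the process afterwards
    is what prevents it from lingering as a zombie.
    """
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Demo child: a Python process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_child(child))
```

Cgroup memory limits and sandboxing are outside what this sketch covers; they are normally applied by the process supervisor or container runtime.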

Streamable HTTP / SSE servers

Best for shared platform services (CRM, ticketing, analytics). Use keep-alive HTTP/2 connections, mTLS or OAuth between agent platform and MCP server, and per-tenant rate limits at the server ingress. Health checks can be ordinary HTTP /healthz plus an MCP tools/list probe.
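The tools/list half of that health check could look like the sketch below, which only builds the JSON-RPC request and validates a response body. It assumes the session has already completed the MCP initialize handshake and leaves the HTTP call itself to whatever client the platform uses; `build_tools_list_probe` and `probe_is_healthy` are hypothetical names.

```python
import json


def build_tools_list_probe(request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 tools/list request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def probe_is_healthy(raw_response: str, request_id: int, expected_tools: set) -> bool:
    """Healthy means: valid JSON, matching id, no error, and all expected tools listed."""
    try:
        message = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    if message.get("id") != request_id or "error" in message:
        return False
    result = message.get("result")
    if not isinstance(result, dict):
        return False
    names = {tool.get("name") for tool in result.get("tools", []) if isinstance(tool, dict)}
    return expected_tools <= names
```

Checking for the expected tool names, not just a successful response, catches a server that is up but serving the wrong version.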

Hybrid fleets

Most enterprise agents mix both: stdio for the code sandbox and HTTP for SaaS APIs. The registry abstracts transport so the agent loop sees uniform tool names. FinOps should tag cost by server id, not transport type.

Security and tenancy

MCP’s power is exposing real systems to models. That is also the threat surface.

Credential injection at borrow — map tenant id to short-lived tokens in a secrets service; rotate without redeploying servers.
Tool allowlists per tenant tier — free tier gets read-only CRM tools; paid tier gets write tools behind approval gates.
Output redaction — run MCP tool results through PII scanners before returning to the model or user.
Blast-radius isolation — bulkhead pools so a runaway ticketing server cannot exhaust connections for billing tools.
Audit trail — log every tools/call with actor, tenant, arguments hash, and result status for compliance replay.
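Two of the controls above, tier allowlists and the audit trail, are small enough to sketch directly. The tier names, tool names, and record fields are hypothetical. Hashing a canonical JSON form of the arguments is one reasonable way to log what was called without storing the raw values.

```python
import hashlib
import json

# Hypothetical tiers: free is read-only, paid adds a write tool.
TIER_ALLOWLIST = {
    "free": {"crm_search", "crm_get"},
    "paid": {"crm_search", "crm_get", "crm_update"},
}


def is_tool_allowed(tier: str, tool_name: str) -> bool:
    """Unknown tiers get no tools at all."""
    return tool_name in TIER_ALLOWLIST.get(tier, set())


def arguments_hash(arguments: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(actor: str, tenant: str, tool_name: str, arguments: dict, status: str) -> dict:
    """One log entry per tools/call, without the raw argument values."""
    return {
        "actor": actor,
        "tenant": tenant,
        "tool": tool_name,
        "arguments_sha256": arguments_hash(arguments),
        "status": status,
    }
```

A plain hash of low-entropy arguments can be guessed by brute force, so a keyed hash may be a better fit where arguments themselves are sensitive.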

Pair ingress scanning (an adversarial-input firewall on what reaches the agent) with egress validation on tool outputs. MCP does not sanitize responses for you.

Observability and SLOs

Instrument four layers: pool borrow wait time, MCP JSON-RPC round-trip, server-side execution, and end-to-end tool success rate per server id. Alert when:

p95 borrow wait exceeds 200 ms (pool undersized).
Server error rate crosses 5% over 5 minutes (circuit opens).
Schema drift detected between registry and live tools/list.
Zombie process count grows while active runs flatline.
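The four alert conditions above can be expressed as one evaluation over a metrics window. This is a sketch: the metric keys and alert names are hypothetical, the percentile uses the nearest-rank method, and schema drift is reduced to comparing tool names, where a real check would also compare the schemas themselves.

```python
def p95(samples):
    """Nearest-rank 95th percentile using integer arithmetic."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_alerts(m: dict) -> list:
    """Return the names of the alerts that fire for one metrics window."""
    alerts = []
    if p95(m["borrow_waits_ms"]) > 200:
        alerts.append("pool_undersized")
    if m["calls"] and m["errors"] / m["calls"] > 0.05:
        alerts.append("server_error_rate")
    if set(m["registry_tools"]) != set(m["live_tools"]):
        alerts.append("schema_drift")
    zombies_growing = m["zombies_now"] > m["zombies_prev"]
    runs_flat = m["active_runs_now"] <= m["active_runs_prev"]
    if zombies_growing and runs_flat:
        alerts.append("zombie_growth")
    return alerts
```

In practice these rules would live in the monitoring system's own rule language; the function form is mainly useful for unit-testing thresholds before they page anyone.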

Export traces compatible with your existing agent observability stack so on-call can correlate a slow support answer with a wedged postgres-mcp session.

Harbor Platform refactor

Harbor Platform shipped five integration changes in one release:

Warm stdio pools — min 2 idle per server type; max 8; recycle after 200 calls or 15 minutes.
Capability registry v2 — versioned tool schemas with diff alerts; run snapshots pinned at run start.
Tenant credential broker — OAuth refresh at borrow; tokens never appear in prompts, tool arguments, or any other model-visible context.
Circuit breakers per server — open after 10 consecutive timeouts; half-open probe every 30 s.
Cancel propagation — run cancel sends MCP notifications/cancelled and SIGTERM to stdio children.

Tool-call timeout rate fell from 47% to 6.8%. p95 tool latency improved 86%, from 2.4 s to 340 ms. Zombie MCP processes per host dropped from an average of 140 to under 20.
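A per-server breaker using the thresholds listed above (open after 10 consecutive timeouts, one half-open probe every 30 s) might be sketched as follows. The class and method names are hypothetical and this is not Harbor Platform's code; the clock is injectable so the timing can be tested without sleeping.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=10, probe_interval_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.probe_interval_s = probe_interval_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None            # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Open: let one probe through per interval (half-open behaviour).
        if self.clock() - self.opened_at >= self.probe_interval_s:
            self.opened_at = self.clock()   # restart the timer for the next probe
            return True
        return False

    def record_success(self) -> None:
        # Any success, including a successful probe, closes the circuit.
        self.consecutive_failures = 0
        self.opened_at = None

    def record_timeout(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

Keeping one breaker instance per server id is what stops a wedged ticketing server from failing calls to billing tools.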

Technique decision table

Approach	Strength	Weakness	Best for
Spawn MCP per tool call	Simple demo code; perfect isolation	Cold starts; FD leaks; cancel gaps	Local prototypes only
Warm stdio pool	Low latency for local tools	Memory per idle process; host-bound	Filesystem, git, codegen sandboxes
HTTP MCP with shared service	Central ops, horizontal scale	Network hop; auth complexity	CRM, ticketing, warehouse queries
Registry + run snapshot	Stable schemas mid-run; safe rollouts	Registry ops overhead	Multi-tenant production agents
Inline REST wrappers (no MCP)	Familiar ops patterns	No standard discovery; N custom clients	Single API, no tool ecosystem
Dynamic tools/list every turn	Always fresh tool list	Token bloat; schema churn confuses model	Rapid dev; not production

Pitfalls

Secrets in prompts — passing API keys to the model so it can “call the tool correctly”; use borrow-time injection.
Unbounded pool growth — one popular tenant spawns dozens of idle stdio servers; enforce max idle and LRU eviction.
Schema drift mid-run — registry updates tool required fields while a long agent run is in flight; pin snapshots.
No cancel propagation — user abandons chat but MCP server keeps writing to CRM.
Tool name collisions — two servers export search; namespace with prefixes in the registry.
Giant tool manifests — listing 200 tools every run burns context; use dynamic tool selection to expose subsets per intent.
Ignoring server-side quotas — pooled connections mask rate limits until the vendor bans your integration account.
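The tool-name collision pitfall in the list above can be avoided by qualifying names when the registry merges tool lists. A minimal sketch, assuming an underscore separator and a hypothetical `namespace_tools` helper:

```python
def namespace_tools(servers: dict, separator: str = "_") -> dict:
    """Merge {prefix: [tool names]} into {qualified name: (prefix, original name)}.

    Raises if two servers still produce the same qualified name, which can
    happen when a prefix or tool name itself contains the separator.
    """
    merged = {}
    for prefix, tools in servers.items():
        for tool in tools:
            qualified = f"{prefix}{separator}{tool}"
            if qualified in merged:
                raise ValueError(f"tool name collision: {qualified}")
            merged[qualified] = (prefix, tool)
    return merged
```

The router can then use the stored pair to call the original tool name on the right server, while the model only ever sees the qualified name.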

Production checklist

Register every MCP server with transport, health probe, and tenant allowlist.
Pre-warm connection pools; set min idle, max size, and max lifetime per server.
Build run-scoped capability snapshots at run start; pin schema versions.
Inject tenant credentials at borrow time via secrets broker; never via prompt.
Namespace tool names when multiple servers expose similar operations.
Propagate run cancellation to MCP sessions and stdio child processes.
Wrap each server in a circuit breaker with half-open recovery probes.
Trace tools/call with server id, tenant, latency, and payload size metrics.
Route write tools through permission gates and human approval where required.
Redact PII from tool outputs before returning to model or user.
Alert on pool borrow wait, schema drift, and zombie process growth.
Load-test pool exhaustion under peak concurrent runs per tenant tier.

Key takeaways

Production MCP integration is agent-ops: pools, registry, tenancy, and circuit breakers — not just JSON-RPC wiring.
Pin capability snapshots per run so schema changes and feature flags do not corrupt in-flight agent loops.
Inject credentials at connection borrow; the model should never see integration secrets.
Stdio pools need lifecycle discipline — warm idle, recycle, cancel propagation, and FD limits.
Harbor Platform cut tool timeouts from 47% to 6.8% with warm pools, a versioned registry, and per-server circuit breakers.