LLM agent MCP tool server integration production systems explained
Harbor Platform’s support agents connected to fourteen Model Context Protocol servers spanning ticketing, CRM, knowledge base, and billing by spawning a fresh stdio subprocess for every tool call. Cold starts averaged 1.8 s before the first JSON-RPC handshake completed, zombie processes accumulated when runs were cancelled mid-flight, and cancellation did not propagate to child servers, leaving database transactions open. The tool-call timeout rate hit 47% under normal load, and FinOps attributed 31% of agent run wall time to MCP process boot alone.
Engineers replaced per-call spawns with a connection pool, a versioned capability registry that maps tool names to server endpoints, and tenant-scoped credential injection at borrow time. Health probes and circuit breakers isolate failing servers without poisoning the whole agent fleet. Tool timeouts fell to 6.8%; p95 tool latency dropped from 2.4 s to 340 ms. This guide explains how production agents integrate MCP servers beyond the protocol primer — lifecycle, discovery, auth, pooling, observability, the Harbor Platform refactor, a technique decision table, pitfalls, and a production checklist.
Why MCP integration is an agent-ops problem, not a protocol exercise
The MCP specification defines how hosts discover tools, resources, and prompts over JSON-RPC. It does not define how a multi-tenant SaaS agent platform should keep fourteen servers warm, rotate OAuth tokens per customer, or survive a Postgres MCP server that wedged after a migration. Those concerns sit in the integration layer between your agent loop and the outside world.
What the integration layer must own
- Connection lifecycle: borrow, use, return, or recycle pooled sessions instead of fork-per-call.
- Capability registry: stable tool names across server versions; schema snapshots pinned per run.
- Tenant auth: inject customer credentials without passing secrets through the model context.
- Health and circuit breaking: probe servers; shed load when error budgets burn.
- Observability: trace each tools/call with server id, latency, and payload size.
- Policy gates: route dangerous tools through approval workflows before execution.
Treat MCP servers like microservices your agent orchestrator calls — not like local Python functions imported at startup.
Architecture: registry, pool, and run-scoped capability snapshot
A production integration stack has three long-lived components (the capability registry, the connection pool, and the tool router) plus one per-run artifact, the capability snapshot:
1. Capability registry (durable config)
The registry stores server definitions: transport (stdio vs streamable HTTP), launch command or base URL, health-check interval, max concurrent sessions, allowed tenants, and tool-name prefixes. When an MCP server publishes a new tool list, the registry ingests it, diffs schemas, and marks the server stale until a warm pool member refreshes. Version pins let you roll out crm-mcp@2.1 to 5% of tenants without renaming tools exposed to the model.
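The ingest-and-diff step can be sketched in a few lines. This is a minimal illustration, not Harbor Platform's registry: the `ServerEntry` fields, `diff_tools`, and `ingest_tool_list` are hypothetical names, and schemas are treated as plain dictionaries compared by equality.

```python
from dataclasses import dataclass, field


@dataclass
class ServerEntry:
    """One registry record. Field names are illustrative, not an MCP-defined schema."""
    server_id: str
    version: str
    transport: str                               # "stdio" or "http"
    tools: dict = field(default_factory=dict)    # tool name -> JSON schema
    stale: bool = False


def diff_tools(old: dict, new: dict) -> dict:
    """Compare two tool-name -> schema maps and report what changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


def ingest_tool_list(entry: ServerEntry, published: dict) -> dict:
    """Record a newly published tool list and mark the server stale if anything differs."""
    delta = diff_tools(entry.tools, published)
    if any(delta.values()):
        entry.tools = dict(published)
        entry.stale = True   # assumed to be cleared once a warm pool member refreshes
    return delta
```

The returned delta is what a diff alert would carry; an empty delta leaves the entry untouched.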
2. Connection pool (runtime)
Pool workers maintain long-lived MCP sessions. On borrow, the worker:
- Checks out a healthy session or opens one if below min idle.
- Injects tenant credentials via MCP’s env or HTTP header hooks (never via prompt).
- Returns a handle scoped to the current run id for tracing.
On return, the worker clears tenant state, resets any server-side session variables, and either returns the connection to idle or destroys it if the error budget tripped. For stdio servers, cap pool size: each idle process still consumes file descriptors and memory.
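The borrow and return steps might look like the sketch below. It assumes a single-threaded caller, opens sessions lazily instead of pre-warming a minimum idle count, and models credentials as a header dictionary returned by a hypothetical `credentials_for` lookup. `Session` and `SessionPool` are illustrative names, not part of any MCP SDK.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Session:
    server_id: str
    calls: int = 0
    tenant_headers: dict = field(default_factory=dict)
    run_id: Optional[str] = None


class SessionPool:
    """Bounded pool for one server; opens sessions lazily up to max_size."""

    def __init__(self, server_id: str, max_size: int, max_calls: int,
                 credentials_for: Callable[[str], dict]):
        self.server_id = server_id
        self.max_size = max_size
        self.max_calls = max_calls
        self.credentials_for = credentials_for   # hypothetical secrets-broker lookup
        self.idle: deque = deque()
        self.in_use = 0

    def borrow(self, tenant_id: str, run_id: str) -> Session:
        if self.idle:
            session = self.idle.popleft()
        elif self.in_use < self.max_size:
            session = Session(self.server_id)
        else:
            raise RuntimeError("pool exhausted")
        self.in_use += 1
        # Credentials are attached to the session here, never placed in the prompt.
        session.tenant_headers = self.credentials_for(tenant_id)
        session.run_id = run_id
        return session

    def give_back(self, session: Session, healthy: bool = True) -> None:
        self.in_use -= 1
        session.calls += 1
        session.tenant_headers = {}   # clear tenant state before any reuse
        session.run_id = None
        if healthy and session.calls < self.max_calls:
            self.idle.append(session)
        # Otherwise the session is dropped; a fresh one opens on a later borrow.
```

A production pool would also need locking or an async semaphore, a wait queue instead of an immediate error on exhaustion, and real teardown of the underlying process or connection when a session is dropped.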
3. Run capability snapshot (ephemeral)
At run start, the agent runtime builds a frozen tool manifest from the registry: JSON schemas, descriptions, and server routing metadata. The LLM sees only this snapshot for the run’s duration, so mid-run registry changes do not mutate in-flight tool definitions. Pair snapshots with feature flags to enable new MCP servers per tenant cohort.
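A frozen manifest can be approximated with a deep copy behind a read-only mapping. The sketch assumes the registry is a plain dictionary of server id to tool schemas and that `enabled_servers` stands in for whatever feature-flag lookup the platform uses; `build_snapshot` is a hypothetical helper.

```python
import copy
from types import MappingProxyType


def build_snapshot(registry: dict, enabled_servers: set) -> MappingProxyType:
    """Freeze tool definitions for one run.

    registry maps server id -> {tool name -> schema}. Schemas are deep-copied so
    later registry edits cannot reach a run that already started.
    """
    manifest = {}
    for server_id, tools in registry.items():
        if server_id not in enabled_servers:
            continue   # server not enabled for this tenant cohort
        for name, schema in tools.items():
            manifest[name] = {"server": server_id, "schema": copy.deepcopy(schema)}
    return MappingProxyType(manifest)   # read-only view at the top level
```

`MappingProxyType` only blocks top-level assignment, so the isolation from registry changes comes from the deep copy; the nested dictionaries themselves remain mutable if code inside the run chooses to modify them.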
4. Tool router (per call)
When the model emits a tool call, the router resolves the name against the snapshot, borrows a pooled connection for the target server, applies routing policy (rate limits, approval gates), executes tools/call, and records structured results for the agent loop. Errors map to retriable vs fatal classes before partial-failure recovery logic runs.
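The resolve, gate, execute, and classify sequence can be sketched as follows. The exception classes, `route_call`, and the `execute` and `needs_approval` callables are assumptions for illustration; pool borrowing and rate limiting are left out, and an unapproved tool is simply rejected instead of being queued for a human.

```python
from typing import Callable


class RetriableToolError(Exception):
    """Timeouts and connection drops; the agent loop may retry."""


class FatalToolError(Exception):
    """Unknown tools, policy denials, and other errors that retrying cannot fix."""


def route_call(snapshot: dict, name: str, arguments: dict,
               execute: Callable[[str, str, dict], dict],
               needs_approval: Callable[[str], bool] = lambda name: False) -> dict:
    """Resolve a tool against the run snapshot, apply policy, then execute."""
    entry = snapshot.get(name)
    if entry is None:
        raise FatalToolError(f"tool {name!r} is not in this run's snapshot")
    if needs_approval(name):
        raise FatalToolError(f"tool {name!r} requires approval before execution")
    try:
        result = execute(entry["server"], name, arguments)
    except (TimeoutError, ConnectionError) as exc:
        raise RetriableToolError(str(exc)) from exc
    return {"tool": name, "server": entry["server"], "result": result}
```

Mapping transport exceptions to two classes at this boundary keeps retry policy out of the agent loop, which only needs to know whether another attempt is worthwhile.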
Transport choices in production
Stdio subprocess servers
Best for local integrations (filesystem, git, internal CLIs). Production requirements: pre-warm min idle processes, enforce max lifetime (recycle after N calls or M minutes), propagate SIGTERM on run cancel, and sandbox with cgroup memory limits. Never spawn on the critical path of a latency-SLO run without a warm pool.
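Two of these requirements, max lifetime and SIGTERM on cancel, fit in a short standard-library sketch. `should_recycle` and `stop_child` are hypothetical helper names, the grace period is an arbitrary choice, and cgroup sandboxing is out of scope here.

```python
import subprocess
import sys


def should_recycle(calls: int, age_seconds: float,
                   max_calls: int, max_age_seconds: float) -> bool:
    """Recycle once a process has served N calls or lived M seconds, whichever comes first."""
    return calls >= max_calls or age_seconds >= max_age_seconds


def stop_child(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM, wait for a grace period, then force-kill; always reap the child."""
    if proc.poll() is None:
        proc.terminate()                 # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                  # SIGKILL if the server ignores SIGTERM
            proc.wait()                  # reap so no zombie is left behind
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a stdio MCP server: a child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_child(child))
```

Calling `wait()` after every termination path is what prevents the zombie accumulation described earlier: a child that has exited but was never reaped still occupies a process-table slot.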
Streamable HTTP / SSE servers
Best for shared platform services (CRM, ticketing, analytics). Use keep-alive HTTP/2 connections, mTLS or OAuth between agent platform and MCP server, and per-tenant rate limits at the server ingress. Health checks can be ordinary HTTP /healthz plus an MCP tools/list probe.
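The tools/list probe is an ordinary JSON-RPC 2.0 request, so the payload and the health decision can be shown without any network code. The sketch assumes the session is already initialized and that the response body arrives as a single JSON object, not an SSE stream; `tools_list_probe` and `probe_is_healthy` are hypothetical helpers.

```python
import json


def tools_list_probe(request_id: int = 1) -> str:
    """JSON-RPC 2.0 body for an MCP tools/list request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def probe_is_healthy(http_status: int, body: str, expected_tools: set,
                     request_id: int = 1) -> bool:
    """Healthy means HTTP 200, a matching JSON-RPC result, and every expected tool listed."""
    if http_status != 200:
        return False
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict) or message.get("id") != request_id or "error" in message:
        return False
    result = message.get("result")
    if not isinstance(result, dict):
        return False
    listed = {tool.get("name") for tool in result.get("tools", []) if isinstance(tool, dict)}
    return expected_tools <= listed
```

Checking for expected tool names, not just a 200 status, catches a server that is up but serving an empty or outdated tool list.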
Hybrid fleets
Most enterprise agents mix both: stdio for code sandbox and HTTP for SaaS APIs. The registry abstracts transport so the agent loop sees uniform tool names. FinOps should tag cost by server id, not transport type.
Security and tenancy
MCP’s power is exposing real systems to models. That is also the threat surface.
- Credential injection at borrow: map tenant id to short-lived tokens in a secrets service; rotate without redeploying servers.
- Tool allowlists per tenant tier: free tier gets read-only CRM tools; paid tier gets write tools behind approval gates.
- Output redaction: run MCP tool results through PII scanners before returning to the model or user.
- Blast-radius isolation: bulkhead pools so a runaway ticketing server cannot exhaust connections for billing tools.
- Audit trail: log every tools/call with actor, tenant, arguments hash, and result status for compliance replay.
Pair ingress scanning from adversarial input firewalls with egress validation on tool outputs — MCP does not sanitize responses for you.
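Two of the controls above, tier allowlists and the audit arguments hash, are simple enough to sketch. The tier names, tool names, and record fields are hypothetical; the hash uses canonical JSON so the same arguments always produce the same digest regardless of key order.

```python
import hashlib
import json

# Hypothetical tier policy; a real allowlist would live in the registry or a policy service.
TIER_ALLOWLIST = {
    "free": {"crm_read_contact"},
    "paid": {"crm_read_contact", "crm_update_contact"},
}


def is_allowed(tier: str, tool: str) -> bool:
    """Unknown tiers get no tools at all."""
    return tool in TIER_ALLOWLIST.get(tier, set())


def audit_record(actor: str, tenant: str, tool: str, arguments: dict, status: str) -> dict:
    """Build an audit entry that stores a digest of the arguments, not the raw values."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return {
        "actor": actor,
        "tenant": tenant,
        "tool": tool,
        "arguments_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "status": status,
    }
```

Storing only a digest keeps customer data out of the log while still letting a compliance replay confirm that a given call used a given argument set.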
Observability and SLOs
Instrument four layers: pool borrow wait time, MCP JSON-RPC round-trip, server-side execution, and end-to-end tool success rate per server id. Alert when:
- p95 borrow wait exceeds 200 ms (pool undersized).
- Server error rate crosses 5% over 5 minutes (circuit opens).
- Schema drift detected between registry and live tools/list.
- Zombie process count grows while active runs flatline.
Export traces compatible with your existing agent observability stack so on-call can correlate a slow support answer with a wedged postgres-mcp session.
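The first two alert conditions reduce to a percentile and a ratio. The sketch below uses the nearest-rank method for p95 and the 200 ms and 5% thresholds from the list above; `p95`, `fired_alerts`, and the alert names are illustrative, and the caller is assumed to pass samples already limited to the 5-minute window.

```python
def p95(samples: list) -> float:
    """Nearest-rank 95th percentile; integer arithmetic avoids float rounding in the rank."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def fired_alerts(borrow_wait_ms: list, calls: int, errors: int) -> list:
    """Evaluate the borrow-wait and error-rate conditions for one server and window."""
    fired = []
    if borrow_wait_ms and p95(borrow_wait_ms) > 200:
        fired.append("pool_undersized")
    if calls and errors / calls > 0.05:
        fired.append("server_error_rate")
    return fired
```

In practice a metrics backend would compute these from histograms, but the same two comparisons define the alert rules.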
Harbor Platform refactor
Harbor Platform shipped five integration changes in one release:
- Warm stdio pools: min 2 idle per server type; max 8; recycle after 200 calls or 15 minutes.
- Capability registry v2: versioned tool schemas with diff alerts; run snapshots pinned at run start.
- Tenant credential broker: OAuth refresh at borrow; no tokens in env vars visible to the model.
- Circuit breakers per server: open after 10 consecutive timeouts; half-open probe every 30 s.
- Cancel propagation: run cancel sends MCP notifications/cancelled and SIGTERM to stdio children.
Tool-call timeout rate fell from 47% to 6.8%. Tool latency at p95 improved 86%, from 2.4 s to 340 ms. Zombie MCP processes per host dropped from an average of 140 to under 20.
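The breaker settings in the list (open after 10 consecutive timeouts, half-open probe every 30 s) translate into a small state machine. This is a sketch of that policy, not Harbor Platform's code: the class and method names are assumptions, and the clock is injectable so the behavior can be exercised without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive timeouts; admits one probe per `probe_interval`."""

    def __init__(self, threshold: int = 10, probe_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.probe_interval = probe_interval
        self.clock = clock
        self.consecutive_timeouts = 0
        self.next_probe_at: Optional[float] = None   # None means the circuit is closed

    def allow(self) -> bool:
        if self.next_probe_at is None:
            return True
        now = self.clock()
        if now >= self.next_probe_at:
            # Half-open: let a single probe through and schedule the next one.
            self.next_probe_at = now + self.probe_interval
            return True
        return False

    def record_timeout(self) -> None:
        self.consecutive_timeouts += 1
        if self.next_probe_at is None and self.consecutive_timeouts >= self.threshold:
            self.next_probe_at = self.clock() + self.probe_interval

    def record_success(self) -> None:
        self.consecutive_timeouts = 0
        self.next_probe_at = None   # a successful call or probe closes the circuit
```

One breaker instance per server id keeps a wedged server from consuming timeouts across the fleet, while the periodic probe lets it rejoin without manual intervention.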
Technique decision table
| Approach | Strength | Weakness | Best for |
|---|---|---|---|
| Spawn MCP per tool call | Simple demo code; perfect isolation | Cold starts; FD leaks; cancel gaps | Local prototypes only |
| Warm stdio pool | Low latency for local tools | Memory per idle process; host-bound | Filesystem, git, codegen sandboxes |
| HTTP MCP with shared service | Central ops, horizontal scale | Network hop; auth complexity | CRM, ticketing, warehouse queries |
| Registry + run snapshot | Stable schemas mid-run; safe rollouts | Registry ops overhead | Multi-tenant production agents |
| Inline REST wrappers (no MCP) | Familiar ops patterns | No standard discovery; N custom clients | Single API, no tool ecosystem |
| Dynamic tools/list every turn | Always fresh tool list | Token bloat; schema churn confuses model | Rapid dev; not production |
Pitfalls
- Secrets in prompts: passing API keys to the model so it can “call the tool correctly”; use borrow-time injection.
- Unbounded pool growth: one popular tenant spawns dozens of idle stdio servers; enforce max idle and LRU eviction.
- Schema drift mid-run: registry updates tool required fields while a long agent run is in flight; pin snapshots.
- No cancel propagation: user abandons chat but MCP server keeps writing to CRM.
- Tool name collisions: two servers export search; namespace with prefixes in the registry.
- Giant tool manifests: listing 200 tools every run burns context; use dynamic tool selection to expose subsets per intent.
- Ignoring server-side quotas: pooled connections mask rate limits until the vendor bans your integration account.
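The name-collision pitfall has a mechanical fix that can be sketched briefly. The input shape, the underscore separator, and `namespace_tools` are assumptions; the point is that the registry, not the model, owns the mapping from a qualified name back to a server and its original tool name.

```python
def namespace_tools(servers: dict) -> dict:
    """Map prefixed tool names to (server id, original tool name).

    servers maps server id -> {"prefix": str, "tools": [tool names]}.
    Raises ValueError if two servers still collide after prefixing.
    """
    qualified = {}
    for server_id, spec in servers.items():
        for tool in spec["tools"]:
            name = f'{spec["prefix"]}_{tool}'
            if name in qualified:
                raise ValueError(f"duplicate tool name after prefixing: {name}")
            qualified[name] = (server_id, tool)
    return qualified
```

For example, two servers that both export `search` become `crm_search` and `kb_search` when registered with the prefixes `crm` and `kb`, and a misconfigured duplicate prefix fails at registration time instead of at call time.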
Production checklist
- Register every MCP server with transport, health probe, and tenant allowlist.
- Pre-warm connection pools; set min idle, max size, and max lifetime per server.
- Build run-scoped capability snapshots at run start; pin schema versions.
- Inject tenant credentials at borrow time via secrets broker; never via prompt.
- Namespace tool names when multiple servers expose similar operations.
- Propagate run cancellation to MCP sessions and stdio child processes.
- Wrap each server in a circuit breaker with half-open recovery probes.
- Trace tools/call with server id, tenant, latency, and payload size metrics.
- Route write tools through permission gates and human approval where required.
- Redact PII from tool outputs before returning to model or user.
- Alert on pool borrow wait, schema drift, and zombie process growth.
- Load-test pool exhaustion under peak concurrent runs per tenant tier.
Key takeaways
- Production MCP integration is agent-ops: pools, registry, tenancy, and circuit breakers — not just JSON-RPC wiring.
- Pin capability snapshots per run so schema changes and feature flags do not corrupt in-flight agent loops.
- Inject credentials at connection borrow; the model should never see integration secrets.
- Stdio pools need lifecycle discipline — warm idle, recycle, cancel propagation, and FD limits.
- Harbor Platform cut tool timeouts from 47% to 6.8% with warm pools, a versioned registry, and per-server circuit breakers.
Related reading
- Model Context Protocol (MCP) explained — protocol primitives: hosts, clients, servers, tools, resources
- Dynamic tool selection and routing explained — expose the right tool subset per intent
- Permission scoping and tool approval gates explained — gate dangerous MCP tools before execution
- Circuit breaker and bulkhead resilience explained — isolate failing MCP servers from the fleet