LLM agent handoff and session transfer explained

Harbor Support’s tier-1 bot escalated billing disputes by pasting the last forty chat turns into the tier-2 queue. Agents re-asked account numbers; customers repeated themselves; repeat contacts within seven days hit 41% on escalated threads. The model hadn’t failed — the handoff had. Tier-2 received noise (duplicate greetings, failed tool JSON, policy snippets) without a crisp problem statement, verified facts, or open actions. Replacing chat dumps with a typed session transfer package — summary, structured entities, attempted resolutions, and explicit next steps — cut repeat-contact rate to 27% and shaved 19 minutes off median time-to-resolution on escalations. Handoff is the production discipline of moving responsibility between agents, specialists, and humans while preserving only what the receiver needs to continue work.

This guide covers handoff triggers, warm vs cold transfer, state package design, agent-to-agent and agent-to-human paths, channel switches (voice to chat), continuity with human-in-the-loop queues, the Harbor Support refactor, a technique decision table, pitfalls, and a production checklist.

What handoff is (and is not)

A handoff transfers ownership of a user session from one runtime actor to another: tier-1 bot to billing specialist bot, research agent to human analyst, voice agent to async email thread. It is distinct from subagent delegation, where a parent spawns a child and later merges results — the user still talks to one surface. Handoff means the user’s next message may be handled by a different model, tool set, or human with a fresh context window.

Session transfer is the data contract that makes handoff reliable: what you serialize at the boundary so the receiver does not re-derive facts from raw logs. Without that contract, every escalation becomes “please summarize what we discussed” — expensive, error-prone, and hostile to customers who already explained the issue once.

Handoff vs memory

Long-term agent memory persists across sessions (user profile, prior tickets). Handoff packages are ephemeral and scoped to this incident: they answer “what does the next handler need right now?” Memory may inform the package; the package should not dump the entire memory store into the receiver’s prompt.

When to hand off

Triggers should be explicit rules plus model signals, not vague “escalate if unsure” instructions:

Policy boundaries — refunds above threshold, legal threats, medical advice, account closure require human or specialist profile.
Confidence and calibration — route when calibrated uncertainty exceeds abstention threshold after one clarification attempt.
Tool failure loops — same tool error twice, or CRM lookup returns no match after verified identity step.
User request — “talk to a person” is a first-class intent; do not argue past it.
SLA or sentiment — long handle time, repeated negative sentiment scores, or profanity flags (with locale-aware nuance).
Capability mismatch — sales bot receives post-sale billing question; hand off to billing profile rather than improvising.

Each trigger should map to a target profile (billing_t2, human_queue_premium) and a priority, not a generic “escalate” tool that drops tickets into one undifferentiated pool.
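As a rough illustration of trigger-to-profile mapping, the sketch below keeps a routing table keyed by reason code. The reason codes, the priority convention, and the two helper functions are assumptions made for this example; only the profile names billing_t2 and human_queue_premium come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    target_profile: str
    priority: int  # assumption: lower number means more urgent


# Hypothetical reason codes mapped to the profiles named in the text.
ROUTES = {
    "refund_over_threshold": Route("human_queue_premium", 1),
    "user_requested_human": Route("human_queue_premium", 1),
    "tool_failure_loop": Route("billing_t2", 2),
    "capability_mismatch_billing": Route("billing_t2", 3),
}


def route_for(reason_code: str) -> Route:
    """Return the mapped route; refuse unmapped reasons instead of pooling them."""
    if reason_code not in ROUTES:
        raise ValueError(f"no target profile mapped for reason {reason_code!r}")
    return ROUTES[reason_code]


def detect_trigger(tool_errors: List[str], user_text: str) -> Optional[str]:
    """Explicit rules only; model signals would be checked after these."""
    if "talk to a person" in user_text.lower():
        return "user_requested_human"
    # The same tool error twice in a row counts as a failure loop.
    if len(tool_errors) >= 2 and tool_errors[-1] == tool_errors[-2]:
        return "tool_failure_loop"
    return None
```

Raising on an unmapped reason is a deliberate choice in this sketch: it surfaces a missing route at development time rather than silently creating a generic pool.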

Warm transfer vs cold transfer

Borrowing from contact-center terminology:

Warm transfer — the outgoing agent stays connected until the receiver acknowledges readiness. User hears or sees a brief bridge (“Connecting you to billing…”). Receiver gets the package before the first user message after transfer. Best for voice and high-stakes cases where dropped context feels personal.
Cold transfer — outgoing agent ends; receiver picks up asynchronously from queue state. User may wait; receiver must open with context-aware greeting, not “How can I help?” Common in chat and email when warm bridging adds latency without benefit.

For LLM agents, warm transfer means: (1) generate and validate the state package, (2) pre-load receiver context, (3) only then emit the handoff event to the user. Cold transfer skips live bridging but still requires pre-loaded receiver context — never make the human or tier-2 bot read forty turns manually.
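The ordering of the three warm-transfer steps can be sketched as a small orchestration function. The function name and its four callback parameters are hypothetical; the point is only that the user-facing event fires last.

```python
from typing import Callable, Dict, List


def warm_transfer(
    build_package: Callable[[], Dict],
    validate: Callable[[Dict], List[str]],
    preload_receiver: Callable[[Dict], None],
    notify_user: Callable[[str], None],
) -> Dict:
    """Run the warm-transfer steps in order; abort before the user sees anything."""
    package = build_package()          # (1) generate the state package
    errors = validate(package)         # (1) validate it
    if errors:
        raise ValueError(f"package rejected: {errors}")
    preload_receiver(package)          # (2) receiver gets context first
    notify_user("Connecting you to billing…")  # (3) only now tell the user
    return package
```

A cold transfer under the same assumptions would skip notify_user but keep the validate and preload_receiver calls.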

The session transfer package

Treat the package as a versioned schema, not prose. A practical minimum:

{
  "handoff_id": "ho_8f3…",
  "source_run_id": "run_t1_22a…",
  "target_profile": "billing_specialist_v2",
  "transfer_mode": "warm",
  "user_verified": true,
  "problem_statement": "Duplicate charge $49.99 on 2026-06-01; user wants refund.",
  "structured_entities": {
    "account_id": "acct_9912",
    "charge_ids": ["ch_44b", "ch_44c"],
    "product_sku": "PRO-MONTHLY"
  },
  "attempted_actions": [
    { "tool": "fetch_charges", "result": "two charges same SKU same day", "at": "…" },
    { "tool": "issue_refund", "result": "denied_policy_window", "at": "…" }
  ],
  "open_questions": ["Confirm which charge user wants reversed"],
  "recommended_next_step": "Apply goodwill refund per policy §4.2; supervisor approval if > $50",
  "citations": ["crm://ticket/8812", "policy://billing-refunds#4.2"],
  "sentiment": "frustrated",
  "locale": "en-US",
  "channel_origin": "voice",
  "channel_target": "chat",
  "privacy_flags": { "pii_redacted_in_summary": false }
}

Keep separate fields for problem_statement (one paragraph max), structured_entities (machine-readable IDs the receiver’s tools need), and attempted_actions (an audit trail without full tool JSON). The receiver’s system prompt should inject the package in a fixed block, not interleave it with raw chat history.
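One way to build that fixed block is sketched below. The handoff_package tag name and the choice of which fields reach the model (routing metadata such as handoff_id is left out) are assumptions of this example, not part of the schema above.

```python
import json
from typing import Dict

# Assumption: only these fields go into the receiver's prompt; routing
# metadata (handoff_id, source_run_id, target_profile, ...) stays out.
PROMPT_FIELDS = (
    "problem_statement",
    "user_verified",
    "structured_entities",
    "attempted_actions",
    "open_questions",
    "recommended_next_step",
    "citations",
    "sentiment",
    "locale",
)


def render_handoff_block(package: Dict) -> str:
    """Render selected package fields as one delimited block for the system prompt."""
    body = {key: package[key] for key in PROMPT_FIELDS if key in package}
    return (
        "<handoff_package>\n"
        + json.dumps(body, indent=2, sort_keys=True, ensure_ascii=False)
        + "\n</handoff_package>"
    )
```

Sorting keys keeps the block byte-stable for the same package, which can make prompt diffs and replay easier to read.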

What to exclude

Full tool definitions and failed parse retries
Duplicate policy excerpts already in receiver’s system prompt
PII the receiver profile is not authorized to see
Model chain-of-thought or internal planner notes
Entire RAG chunks — cite IDs; let receiver re-fetch if needed
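The exclusion list above could be enforced with a scoping step before the package leaves the source run. The allowlists, the retention_bot profile id, and the names in ALWAYS_DROP are hypothetical; a real deployment would derive them from its own RBAC configuration.

```python
from typing import Dict, Set

# Hypothetical per-profile allowlists for structured_entities keys.
ENTITY_ALLOWLIST: Dict[str, Set[str]] = {
    "billing_specialist_v2": {"account_id", "charge_ids", "product_sku"},
    "retention_bot": {"account_id", "product_sku"},
}

# Hypothetical top-level keys that should never cross the boundary.
ALWAYS_DROP = {"chain_of_thought", "tool_definitions", "rag_chunks", "raw_tool_json"}


def scope_package(package: Dict, target_profile: str) -> Dict:
    """Return a copy with excluded keys dropped and entities scoped to the receiver."""
    # An unknown profile gets no entities rather than all of them.
    allowed = ENTITY_ALLOWLIST.get(target_profile, set())
    scoped = {k: v for k, v in package.items() if k not in ALWAYS_DROP}
    entities = package.get("structured_entities", {})
    scoped["structured_entities"] = {k: v for k, v in entities.items() if k in allowed}
    return scoped
```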

Agent-to-agent handoff

When routing between LLM profiles (tier-1 → billing specialist → retention bot), use the same package schema. Profiles differ in tool allowlists and tone, not in transfer format. Steps:

Outgoing agent calls initiate_handoff(target_profile, reason_code).
Runtime runs a handoff summarizer (small model or rules + LLM) over structured session state — not over unbounded chat.
Validator checks required fields (account_id present, problem_statement non-empty, citations for factual claims).
Receiver run starts with handoff_package + user’s next message only; optional last 2–3 turns for tone continuity.
Outgoing run enters completed_handoff; no further user messages route to it.
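The validator and receiver-start steps above can be sketched as follows. Both function names are hypothetical, and the citation rule is a deliberate simplification: it only requires at least one citation whenever actions were attempted, since checking individual factual claims needs more machinery.

```python
from typing import Dict, List


def validate_package(package: Dict) -> List[str]:
    """Return a list of validation errors; an empty list means the package passes."""
    errors = []
    if not package.get("structured_entities", {}).get("account_id"):
        errors.append("missing structured_entities.account_id")
    if not str(package.get("problem_statement", "")).strip():
        errors.append("empty problem_statement")
    # Simplified stand-in for "citations for factual claims".
    if package.get("attempted_actions") and not package.get("citations"):
        errors.append("attempted_actions present but no citations")
    return errors


def receiver_context(
    package: Dict,
    next_user_message: str,
    recent_turns: List[str],
    tone_turns: int = 2,
) -> Dict:
    """Build the receiver's starting context: package, next message, a few turns for tone."""
    # Guard tone_turns == 0: recent_turns[-0:] would return the whole list.
    tail = recent_turns[-tone_turns:] if tone_turns > 0 else []
    return {
        "handoff_package": package,
        "tone_context": tail,
        "user_message": next_user_message,
    }
```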

For parallel specialist consultation (legal + billing), prefer delegation with merge instead of user-visible triple handoffs unless the user explicitly wants separate threads.

Agent-to-human handoff

Humans need scan-friendly packages in the CRM sidebar, not a block formatted for a model prompt. Render the problem statement, entities, attempted actions, and recommended next step as UI fields. Attach deep links to traces via source_run_id so agents can debug without pasting traces into the ticket body.

Integrate with HITL queues:

Accept / reject: a human agent can pull the next ticket only after acknowledging the package summary.
Return to bot: the human marks “resolved by policy bot” with a reason; package updates feed eval sets.
Co-pilot mode: the human edits a draft; the bot learns from the diff, not from re-reading the full thread.

Never require customers to repeat verification steps completed pre-handoff unless fraud policy mandates re-auth for the new channel.
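For the human-facing rendering, a sketch like the one below turns a package into label/value pairs a CRM sidebar could display. The field labels and the trace URL prefix are placeholders invented for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical trace viewer prefix; substitute your own tracing backend.
TRACE_BASE_URL = "https://traces.example.internal/runs/"


def _show(value) -> str:
    """Flatten lists so IDs read as 'a, b' instead of a Python list repr."""
    if isinstance(value, (list, tuple)):
        return ", ".join(map(str, value))
    return str(value)


def sidebar_fields(package: Dict) -> List[Tuple[str, str]]:
    """Return ordered (label, text) pairs for a scan-friendly sidebar."""
    entities = package.get("structured_entities", {})
    actions = package.get("attempted_actions", [])
    return [
        ("Problem", package.get("problem_statement", "")),
        ("Entities", "; ".join(f"{k}: {_show(v)}" for k, v in entities.items())),
        ("Attempted actions", "; ".join(f"{a['tool']} -> {a['result']}" for a in actions)),
        ("Recommended next step", package.get("recommended_next_step", "")),
        # Deep link instead of pasting the trace into the ticket body.
        ("Trace", TRACE_BASE_URL + package.get("source_run_id", "")),
    ]
```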

Channel and modality switches

Voice-to-chat and chat-to-email breaks are where context loss hurts most:

Voice → chat — send SMS or in-app link with session token; pre-populate chat with package summary visible to user (“Continuing your billing issue…”).
Chat → voice — IVR or callback reads short problem_statement; do not read sixteen-digit IDs aloud — use “we have your account on file.”
Async email — package drives subject line and first paragraph; include ticket ID and no-reply guardrails.

Store channel_origin and channel_target in the package so receivers adjust brevity: phrasing tuned for voice does not belong in an email, and email-style walls of text do not belong in a voice reply.
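A minimal sketch of channel-aware openings, assuming the receiver picks its first message from channel_target. The function name, the returned keys, the ticket_id parameter, and the rule of cutting the problem statement at the first semicolon are all illustrative choices.

```python
from typing import Dict


def opening_for(package: Dict, ticket_id: str = "") -> Dict[str, str]:
    """Pick an opening shaped for the target channel."""
    target = package.get("channel_target", "chat")
    statement = package["problem_statement"]
    # First clause only, as a crude way to get a short version.
    short = statement.split(";")[0].strip().rstrip(".")
    if target == "voice":
        # Never read IDs aloud; acknowledge the account generically.
        return {"say": f"We have your account on file. {short}."}
    if target == "email":
        return {"subject": f"[{ticket_id}] {short}", "first_paragraph": statement}
    # Default: chat banner visible to the user, plus the full statement.
    return {"banner": "Continuing your billing issue…", "summary": statement}
```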

Durable sessions and resume

Handoffs interact with durable execution: the outgoing run checkpoints in an awaiting_handoff_ack state, and the receiver run starts idempotently from handoff_id. If the user abandons mid-transfer, put a TTL on the package and release queue capacity. When the user returns, offer to resume from the package, keyed on user id + open incident id, rather than starting a new session.
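The idempotent start and the TTL can be sketched with an in-memory store. HandoffStore, its method names, the default TTL, and the run id format are assumptions for illustration; a production system would back this with durable storage.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HandoffStore:
    """In-memory stand-in for durable handoff state (illustrative only)."""

    def __init__(self, ttl_seconds: float = 900.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._packages: Dict[str, Tuple[Dict, float]] = {}  # handoff_id -> (package, created_at)
        self._runs: Dict[str, str] = {}                     # handoff_id -> receiver run id

    def put(self, package: Dict) -> None:
        self._packages[package["handoff_id"]] = (package, self._clock())

    def get(self, handoff_id: str) -> Optional[Dict]:
        """Return the package, or None if it is unknown or past its TTL."""
        entry = self._packages.get(handoff_id)
        if entry is None:
            return None
        package, created_at = entry
        if self._clock() - created_at > self._ttl:
            del self._packages[handoff_id]  # expired: drop it
            return None
        return package

    def start_receiver(self, handoff_id: str) -> str:
        """Idempotent: a retried start returns the same receiver run id."""
        if handoff_id in self._runs:
            return self._runs[handoff_id]
        if self.get(handoff_id) is None:
            raise KeyError(f"no live package for {handoff_id}")
        run_id = f"run_rx_{handoff_id}"
        self._runs[handoff_id] = run_id
        return run_id
```

Injecting the clock keeps expiry testable without sleeping.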

Attribute cost and latency across both runs in trace trees: a span link carrying handoff_id connects the source and target runs for postmortems.
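To show what linking on handoff_id buys, the sketch below rolls up cost and latency across runs that share a handoff. Spans are plain dictionaries here, and the cost_usd, latency_ms, and run_id keys are hypothetical; this is not the API of any particular tracing library.

```python
from collections import defaultdict
from typing import Dict, List


def rollup_by_handoff(spans: List[Dict]) -> Dict[str, Dict]:
    """Sum cost and latency for all spans that carry the same handoff_id."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "latency_ms": 0, "runs": []})
    for span in spans:
        handoff_id = span.get("handoff_id")
        if handoff_id is None:
            continue  # span is not part of any handoff
        bucket = totals[handoff_id]
        bucket["cost_usd"] += span["cost_usd"]
        bucket["latency_ms"] += span["latency_ms"]
        bucket["runs"].append(span["run_id"])
    return dict(totals)
```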

Harbor Support refactor

Harbor replaced tier-1 → tier-2 escalation with:

Handoff summarizer — fine-tuned small model on 12K historical escalations; outputs schema-valid JSON only.
Validator gate — rejects packages missing account_id or problem_statement; tier-1 must repair once before queue insert.
CRM sidebar — humans see four fields + attempted actions list; raw chat collapsed by default.
Warm chat bridge — user sees “Specialist joining…” for 2–4 s while tier-2 context loads.

Results: repeat contact within 7 days on escalated threads fell from 41% to 27%. Median time-to-resolution on escalations fell from 47 to 28 minutes. Human agent CSAT on escalations rose from 3.1 to 4.0 on a 5-point scale. Tier-1 average handle time rose by 8 seconds because of the summarizer call, a cost outweighed by the savings from fewer repeat contacts.
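For readers who want the relative size of those changes, or who need to compute the repeat-contact metric themselves, a short sketch follows. The thread record keys (escalated, repeat_within_7d) are hypothetical; the before and after figures are the ones quoted above.

```python
from typing import Dict, List


def repeat_contact_rate(threads: List[Dict]) -> float:
    """Share of escalated threads with a repeat contact within 7 days."""
    escalated = [t for t in threads if t["escalated"]]
    if not escalated:
        return 0.0
    repeats = sum(1 for t in escalated if t["repeat_within_7d"])
    return repeats / len(escalated)


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a reduction)."""
    return (after - before) / before


# Repeat-contact rate: 41% -> 27% is 14 points, about a third in relative terms.
print(round(relative_change(41, 27) * 100, 1))  # -34.1
# Median time-to-resolution: 47 -> 28 minutes.
print(round(relative_change(47, 28) * 100, 1))  # -40.4
```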

Technique decision table

Need	Prefer	Avoid
Same profile, more tool rounds	Continue session	Handoff
Subtask with isolated tools	Subagent delegation	User-visible handoff
Specialist model or human	Typed transfer package	Full chat paste
Voice escalation	Warm transfer + bridge copy	Cold drop to hold music
Async email follow-up	Package-driven template	Empty “we’ll get back to you”
Debug wrong escalation	Linked source_run_id trace	Re-run tier-1 from scratch
Compliance audit	Immutable handoff_id + schema version	Editable free-text notes only

Common pitfalls

Dumping full chat history — buries signal; blows receiver token budget.
Handoff without validation — tier-2 inherits hallucinated account IDs.
Duplicate verification — customers churn; respect prior auth scope.
No outgoing run termination — user messages hit both bots; conflicting replies.
Generic escalation pool — billing waits behind sales; use profile routing.
Missing attempted_actions — receiver repeats failed refunds.
PII leakage across profiles — redact or scope fields per receiver RBAC.
Orphan packages — user never connects; TTL and notify tier-1.

Production checklist

Define handoff triggers with mapped target profiles and priority.
Ship versioned session transfer schema with validator gate.
Run handoff summarizer on structured state, not raw chat alone.
Pre-load receiver context before user sees handoff complete.
Terminate or freeze outgoing run on successful handoff ack.
Render human-facing sidebar fields separate from model prompt block.
Link source and target spans with handoff_id.
Support warm and cold modes per channel policy.
Preserve verification scope across channel switches where policy allows.
Expire abandoned packages after a TTL; offer resume on return.
Log schema version for compliance replay.
Measure repeat-contact rate and escalation handle time pre/post.

Key takeaways

Handoff is a boundary contract, not a longer chat log.
Harbor cut repeat contacts 41% → 27% with typed packages and CRM sidebar.
Warm transfer pre-loads receiver context before the user continues.
Delegate subtasks; hand off when profile, human, or channel must change.
Trace-linked handoffs make cross-tier debugging auditable.