Guide

LLM agent session lifecycle and conversation thread management systems explained

Harbor Portal runs an internal IT helpdesk agent across Slack, email, and a web widget. Analysts found 38% of resolved tickets reopened within 48 hours because session boundaries were wrong: a user’s “still broken” reply sometimes spawned a blank thread with no VPN logs, while other times a three-week-old laptop-provisioning context resurrected after idle timeout and the agent quoted obsolete policy. Stateless “append to messages[]” storage and ad-hoc TTLs could not distinguish a legitimate follow-up from a new incident or a stale resume. After shipping an explicit session lifecycle FSM with thread lineage, idle compaction triggers, and signed resume tokens, repeat contacts fell from 38% to 7.4%, median time-to-resolution improved 22%, and context-window spend dropped 19% because archived threads stopped being re-injected whole.

Session lifecycle management defines when a conversation starts, stays active, idles, compacts, archives, or hands off — and which state a returning user may resume. It sits below channel adapters (Slack, SMS, voice) and above individual run checkpoints (which survive pod crashes mid-tool). This guide covers thread versus session models, lifecycle FSMs, resume tokens, compaction hooks, multi-channel continuity, the Harbor Portal refactor, a decision table versus adjacent patterns, pitfalls, and a production checklist tied to handoffs, history compaction, and run cancellation.

Sessions, threads, and runs are not the same object

Teams conflate these layers and pay for it in tokens and user trust:

Session — the bounded relationship between one user (or party) and the agent for a coherent goal: “fix my VPN,” “onboard new hire Jane.” Sessions have lifecycle states and retention policy.
Thread — an ordered message lineage inside a session. A session may fork threads after escalation or split sub-topics, but each thread shares session-level metadata (tenant, user id, entitlements).
Run — one model invocation loop: plan, tool calls, completion. Many runs append to one thread; runs have their own cancel and timeout FSM.

Without explicit session ids, channels guess continuity from timestamps or subject lines. Email “Re:” threads and Slack thread_ts values are hints, not policy. Production systems assign a stable session_id at first intent classification and bind every inbound event to (session_id, thread_id, run_id) before the model sees text.

Session lifecycle FSM

Harbor Portal uses a six-state session FSM with hysteresis so borderline idle traffic does not flap:

OPEN — first user message received; intent lane assigned; entitlements loaded.
ACTIVE — at least one run completed in the last activity window (default 30 minutes of user or agent activity).
IDLE — no activity past soft idle (30 min); new messages may resume without compaction. Tool leases from prior runs are released.
STALE — past hard idle (24 h for IT; 7 d for low-urgency FAQ). Incoming messages trigger compaction or “start fresh?” prompt.
ARCHIVED — session closed with summary written; read-only except explicit reopen. Retention TTL starts.
HANDED_OFF — ownership moved to human or specialist bot via typed transfer package; original session frozen.

Transitions are event-driven: USER_MESSAGE, RUN_COMPLETED, IDLE_TIMER, RESOLVE, ESCALATE, ADMIN_REOPEN. Each transition emits an audit event with prior and next state for compliance dashboards.

Thread lineage and fork rules

When a user pivots (“actually this is about payroll, not VPN”), the classifier may fork a child thread instead of polluting the parent:

Parent thread keeps VPN tool results and remains ARCHIVED or IDLE.
Child thread inherits session_id but starts with a structured rollup of parent context (not full verbatim history).
Lineage pointer parent_thread_id enables analytics on topic switches without merging incompatible tool namespaces.

Fork rules prevent the common bug where a single messages[] array grows unbounded across unrelated asks. Compaction jobs (see below) run per thread, not per channel inbox.

Session store schema (minimal production shape)

Persist session metadata separately from message bodies:

session {
  session_id, tenant_id, user_id,
  state, intent_lane,
  opened_at, last_activity_at,
  idle_policy_id, retention_class,
  summary_text, summary_version,
  handoff_target, archive_reason
}
thread {
  thread_id, session_id, parent_thread_id,
  channel, external_thread_key,
  message_count, token_estimate,
  compaction_generation
}
resume_token {
  token_hash, session_id, thread_id,
  issued_at, expires_at, max_uses,
  snapshot_ref
}

Message bodies live in object storage or a wide-column store keyed by (thread_id, seq). Hot paths read only summary + last k turns unless a context budget expansion is approved. Session rows stay small enough to cache in Redis for admission control.

Idle policies and compaction triggers

Idle is not one timer. Harbor Portal defines policy profiles:

IT_INCIDENT — soft 30 min, hard 24 h, auto-compact at STALE entry.
ACCESS_REQUEST — hard 72 h; requires user confirmation to resume after STALE.
FAQ — hard 7 d; may archive without compaction if under 4 turns.

Entering STALE fires an async compaction job using the structured rollup pipeline: preserve ticket ids, device ids, and policy versions as anchor invariants; archive raw tool JSON to cold storage. The model receives summary + recent turns on resume, not the full 180-message Slack export.

Resume tokens and safe re-entry

When a user returns after IDLE or STALE, the channel may not reliably send session_id (deep links break, email clients strip query params). Issue a short-lived resume token in the last agent message:

Signed payload: session_id, thread_id, compaction_generation, expires_at.
Single-use or capped uses to prevent link forwarding leaks across users.
On redeem: verify state is resumable (not ARCHIVED unless reopen allowed); reject if compaction_generation on token lags store (stale link).

If token redeem fails, fall back to intent match: “Are you following up on ticket #8842 (VPN)?” with buttons — never silently attach to the wrong session.

Multi-channel continuity without merged inboxes

Users switch channels mid-incident: Slack alert, then phone, then email screenshot. Map external keys without merging message stores:

Channel binding table links (channel, external_thread_key) → session_id.
Voice-to-chat creates a new thread with transcript injection and a HANDED_OFF or ACTIVE handoff record.
Email Message-ID threading sets external_thread_key; In-Reply-To must match an open session or token redeem path.

Do not copy Slack messages into email bodies for context; inject session summary at run start so semantic caches namespace on stable session metadata, not channel-specific noise.

Harbor Portal refactor (case study)

Before: Slack bot keyed sessions on channel_id + user_id with a 48-hour Redis TTL. Follow-ups in side threads missed the key. Email used a separate heap with no linkage. STALE sessions still injected 40+ turns into the prompt. Repeat contacts hit 38%; mean prompt size 11.2k tokens.

After: Unified session service with FSM above, per-lane idle policies, compaction on STALE, resume tokens in closing messages, and channel binding table. Run checkpoints remained in the existing WAL layer for crash recovery only — not for conversation boundaries.

Results (30-day A/B): repeat contacts 38% → 7.4%; mean prompt 11.2k → 4.1k tokens; wrongful policy citations (stale context) 9 → 0.3 per 1k sessions; p95 time-to-resolution −22%.

Decision table: session lifecycle vs adjacent patterns

Approach	Best for	Weak when
Stateless chat (no session id)	One-shot Q&A, demos	Multi-day incidents, compliance audit, tool-heavy workflows
Channel-native threading only (Slack thread_ts)	Single-channel pilots	Cross-channel follow-up, email + chat, handoffs
Run checkpointing only	Crash recovery mid-tool	Defining when a conversation ends; idle staleness
Full session lifecycle FSM (this guide)	Enterprise support, ITSM, regulated tenants	Ultra-short FAQ bots with no tool state
Handoff packages without session FSM	One-time escalations	Repeat contacts, resume after idle, forked topics

Common pitfalls

TTL equals session policy — Redis expiry deletes metadata while users still reference ticket numbers in email.
One global idle timeout — FAQ and Sev-1 incidents need different STALE thresholds.
Resuming ARCHIVED sessions silently — Reopen must be explicit with audit trail.
Compaction without anchor invariants — Summaries drop device ids; tools fetch wrong assets.
Merging unrelated threads on user id alone — Colleagues sharing a kiosk or delegate inbox cross-contaminate.
Resume tokens without generation counters — Stale links attach to post-compaction state incorrectly.
Ignoring run cancel on session ARCHIVE — Orphan tools keep writing after resolve; cascade cancel on archive.

Production checklist

Assign session_id at first classified intent; never infer solely from channel keys.
Implement session FSM with OPEN → ACTIVE → IDLE → STALE → ARCHIVED (+ HANDED_OFF).
Define per-lane idle policies; document soft vs hard idle in runbooks.
Store session metadata separately from message bodies; cap hot-path reads.
Fork child threads on topic pivot; keep parent_thread_id lineage.
Trigger compaction on STALE entry; bump compaction_generation.
Issue signed resume tokens in closing messages; bind to generation.
Maintain channel binding table for cross-channel continuity.
On ARCHIVE, cancel in-flight runs and release tool leases.
Emit session.state transitions on traces for repeat-contact analytics.
Retention: ARCHIVED sessions TTL per compliance class; purge cold storage on schedule.
Load-test session admission under burst (Monday morning ticket spikes).

Key takeaways

Sessions bound goals; threads bound message lineage; runs bound single model loops.
Idle policies need intent lanes, not one global timeout.
Resume tokens and compaction generations prevent stale context re-entry.
Channel adapters map external keys; they do not define session policy.
Harbor Portal cut repeat contacts from 38% to 7.4% with an explicit session FSM.