LangChain fundamentals explained
Raw LLM APIs return text. Production apps need prompt templates, structured parsing,
retrieval over documents, tool loops, session memory, retries, and traces you can replay
when a customer complains. LangChain is the most widely adopted Python
(and JavaScript) framework for wiring those pieces together. Its modern surface is
LCEL (LangChain Expression Language): a composable
Runnable interface where prompts, models, parsers, and retrievers pipe
together with the | operator. This guide covers core primitives, agents and tools, memory
and RAG integration, LangSmith observability, how LangChain relates to agent patterns and
the Model Context Protocol, a Harbor Support worked example, a framework decision table,
pitfalls, and a production checklist.
What LangChain is (and is not)
LangChain is an orchestration library, not a model host. You still
call OpenAI, Anthropic, Google, or open-weight endpoints through provider-specific
chat model classes (ChatOpenAI, ChatAnthropic, etc.). LangChain
standardizes how you build chains, attach tools, load documents, and swap providers
without rewriting business logic.
It is not the only way to ship LLM features. Simple one-shot completions, fixed three-step pipelines, or teams standardized on MCP tool servers may need less framework than LangChain provides. Reach for LangChain when you have multiple composable steps, several model providers, or agent loops that would otherwise sprawl across ad-hoc Python functions.
The Runnable contract
Every LCEL component implements invoke, batch,
stream, and ainvoke with consistent input/output typing.
That uniformity means the same chain runs synchronously in a Flask route, streams tokens
to a WebSocket, or executes async in FastAPI without duplicate code paths.
LCEL: composing chains with the pipe operator
The canonical pattern chains a prompt template, chat model, and output parser:
chain = prompt | model | parser
result = chain.invoke({"topic": "capacity utilization"})
Each stage is a Runnable. LangChain handles passing outputs forward,
optional streaming of intermediate steps, and configurable callbacks for tracing.
Complex flows use RunnableParallel (fan-out), RunnableBranch
(conditional routing), and RunnableLambda (wrap plain Python).
Prompt templates
ChatPromptTemplate defines system, human, and assistant message slots with
variable placeholders. Few-shot examples live in FewShotChatMessagePromptTemplate.
Keep templates version-controlled; treat them like SQL queries, not throwaway strings.
System prompts should encode policy (cite sources, refuse unsafe requests) while user
templates carry task-specific variables.
Output parsers
Models emit strings; parsers convert to structured data. StrOutputParser
extracts text from message objects. PydanticOutputParser and
JsonOutputParser enforce schemas for downstream code. When a provider offers a native
structured output mode, prefer it, then parse the result through LangChain wrappers so the
chain stays portable across providers.
Chat models, embeddings, and provider abstraction
LangChain wraps dozens of model providers behind common interfaces. Swap
ChatOpenAI(model="gpt-4o-mini") for ChatAnthropic(model="claude-sonnet-4-20250514")
and most chain code stays identical. Set temperature, max tokens, stop sequences, and
reasoning effort through constructor kwargs or model.bind() for per-invocation
overrides.
Embeddings classes (OpenAIEmbeddings,
HuggingFaceEmbeddings) feed vector stores for RAG. Batch embedding calls
through embed_documents to amortize HTTP overhead. Cache embedding vectors
when source documents change infrequently.
Use init_chat_model("provider:model") (LangChain 0.2+) for environment-driven
model selection in staging vs production. Never hardcode API keys; load from environment
or secret managers and rotate on leak.
Tools and agents
LangChain implements the same tool loop described in our function calling guide: bind
tools to a model, execute returned tool calls, append results, repeat. Define tools
with the @tool decorator or StructuredTool.from_function,
including rich docstrings — the model reads descriptions, not your source code.
Prebuilt agent executors
create_react_agent and provider-specific agent constructors wire a model,
tool list, and prompt into a loop with step limits. AgentExecutor (legacy)
and LangGraph-based agents (current direction) handle iteration, early stopping on
duplicate calls, and returning structured final answers.
For production, prefer LangGraph when you need explicit state machines: human approval nodes, checkpointing, parallel tool branches, or retry subgraphs. LangGraph is maintained by the LangChain team but models control flow as a graph rather than an opaque while-loop — easier to test and audit.
Tool design habits
- One responsibility per tool; split read and write operations into separate tools.
- Return JSON-serializable summaries, not raw 50 MB payloads.
- Surface actionable error strings the model can react to.
- Validate arguments with Pydantic before side effects run.
Memory and conversation state
Stateless APIs require you to resend full history each turn. LangChain memory classes automate that bookkeeping:
- ChatMessageHistory — stores raw messages in memory or Redis.
- ConversationBufferMemory — injects entire history (watch token limits).
- ConversationSummaryMemory — compresses older turns via a summarizer model.
- Vector store memory — retrieves relevant past facts by semantic search.
Long-running agents should externalize state: user IDs, draft order IDs, and permissions belong in your database, not buried in chat transcripts. Pair LangChain memory with the summarization and scratchpad patterns in our LLM agent memory guide.
Document loaders, text splitters, and RAG
LangChain’s document ecosystem loads PDFs, HTML, Notion exports, S3 objects, and
databases into Document objects (page content + metadata). Text splitters
(RecursiveCharacterTextSplitter) chunk content with overlap for retrieval
quality. Vector stores (Chroma, Pinecone, pgvector via PGVector) index
embeddings; retrievers expose invoke(query) as a Runnable you pipe into
prompts.
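To show what chunking with overlap means, here is a framework-free sketch that slices text into fixed-size windows. It is a simplification and not the algorithm RecursiveCharacterTextSplitter uses, which prefers to break on separators such as paragraphs and sentences. The sizes and the sample policy text are arbitrary.

```python
def chunk_with_overlap(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size that share overlap characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


policy = ("Refunds are issued within 5 business days. " * 3).strip()
chunks = chunk_with_overlap(policy)
```

The shared characters at each boundary mean a sentence cut by one window still appears whole, or nearly whole, in its neighbor.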
A minimal RAG chain:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
LangChain does not replace RAG fundamentals: chunk size, hybrid search, reranking, and
evaluation still determine quality. Use LangChain for glue; follow our RAG guide and RAG
evaluation guide for architecture decisions. create_retrieval_chain and
create_history_aware_retriever bundle common patterns for conversational
Q&A over docs.
LangSmith: tracing, datasets, and evals
LangSmith is LangChain’s hosted observability platform (optional
but valuable in production). Set LANGCHAIN_TRACING_V2=true and
LANGCHAIN_API_KEY; every chain run records inputs, outputs, latency, and
token usage. Replay failed customer sessions, diff prompt versions, and export traces into
regression datasets.
LangSmith supports online evaluators (LLM-as-judge, custom Python) and offline batch runs against golden questions. Tie eval hooks to CI: block deploys when faithfulness or tool selection accuracy drops. This complements, rather than replaces, benchmark evaluation on static suites.
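The deploy gate can be as simple as comparing the latest eval scores against a stored baseline. The sketch below is framework-free and does not call the LangSmith API; the baseline values, the allowed drop, and the score dictionaries are hypothetical.

```python
# Hypothetical baseline scores recorded from the last accepted release.
BASELINE = {"faithfulness": 0.90, "tool_selection_accuracy": 0.95}
MAX_DROP = 0.02  # tolerated regression per metric


def regressions(current: dict, baseline: dict = BASELINE,
                max_drop: float = MAX_DROP) -> list:
    """Return the metrics that are missing or fell more than max_drop."""
    failures = []
    for metric, reference in baseline.items():
        score = current.get(metric)
        if score is None or score < reference - max_drop:
            failures.append(metric)
    return failures


passing = regressions({"faithfulness": 0.91, "tool_selection_accuracy": 0.96})
failing = regressions({"faithfulness": 0.80, "tool_selection_accuracy": 0.96})
missing = regressions({"faithfulness": 0.91})
```

In a CI job, a non-empty result would translate to a non-zero exit code so the pipeline stops before deployment.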
Deployment, async, and streaming
Expose chains through FastAPI or LangServe (add_routes) for HTTP
/invoke and /stream endpoints. Use chain.astream_events
for token-by-token SSE to browsers. Batch offline jobs with chain.batch and
configurable max concurrency.
Containerize with explicit dependency pins; LangChain releases frequently and minor version bumps occasionally rename imports. Lock versions in production requirements and read changelogs before upgrading. For self-hosted inference, point chat models at vLLM or Ollama OpenAI-compatible URLs via base URL overrides.
Worked example: Harbor Support ticket router
Harbor Support receives 4,000 tickets daily across billing, shipping, and account security. Operators need an assistant that classifies intent, retrieves policy snippets, and either drafts a reply or escalates to a human with a structured summary.
Pipeline design
A LangChain LCEL pipeline runs in three stages. Stage one: a
ChatPromptTemplate with few-shot examples feeds
ChatOpenAI(model="gpt-4o-mini", temperature=0) and a
PydanticOutputParser emitting category,
urgency, and needs_human fields. Stage two: a
RunnableBranch routes non-escalations to a RAG chain —
PGVector retriever over chunked policy PDFs (k=4), then answer synthesis
with citation instructions. Stage three: a @tool named
create_zendesk_note writes drafts via API when confidence exceeds 0.85;
otherwise the chain stops with escalation metadata only.
Operations
Conversation history lives in Redis via RedisChatMessageHistory.
LangSmith traces every ticket; weekly evals sample 200 closed tickets comparing draft
quality scores. Prompt templates live in git; deploys tag LangSmith datasets with commit
SHAs. P95 latency target: under 8 seconds including retrieval. Human agents override
roughly 18% of drafts — feedback loops feed misclassified examples back into the
few-shot set after review.
Framework decision table
| Need | Prefer | Why |
|---|---|---|
| Multi-step chains, several providers, agents | LangChain + LangGraph | LCEL composition, tool loops, tracing integration |
| Document-heavy Q&A, indexing focus | LlamaIndex (or LangChain retrievers only) | Indexing abstractions tuned for retrieval-first apps |
| Single provider, fixed 2-step flow | Raw SDK + plain Python | Less dependency surface, easier debugging |
| Standardized tool servers across products | MCP hosts + thin client | Tools portable between Claude Desktop, IDE agents, internal apps |
| JavaScript full-stack | LangChain.js or Vercel AI SDK | Match team stack; LCEL concepts port to JS |
| Strict compliance, minimal magic | Explicit state machine (LangGraph or hand-rolled) | Auditable nodes and edges beat opaque executors |
Common pitfalls
- Framework soup — importing every LangChain submodule; start with LCEL primitives and add packages only when needed.
- Hidden token burn — buffer memory on long chats without summarization blows context and cost.
- Tool descriptions as afterthoughts — vague docstrings cause wrong tool selection; rewrite until a human would pick correctly.
- Skipping evals — LangSmith enabled but no regression datasets; you still fly blind on prompt changes.
- Blocking I/O in async routes — calling invoke inside FastAPI async handlers stalls the event loop; use ainvoke.
- Version drift — tutorials referencing deprecated LLMChain and ConversationChain; migrate to LCEL.
- Trusting retrieved text — RAG chunks can carry injection payloads; sanitize before they reach the model (see prompt injection defenses).
Production checklist
- Pin LangChain, LangGraph, and provider SDK versions in lockfiles.
- Store prompts in version control; tag LangSmith datasets per release.
- Enable tracing in staging and production with PII redaction rules.
- Set max agent iterations, tool timeouts, and duplicate-call detection.
- Unit-test Runnable stages with mocked models; integration-test retrievers against real vector DB.
- Log cost per request (tokens in/out) and alert on anomalies.
- Document which tools have side effects and require human approval.
- Run weekly offline evals on representative queries; block deploy on regressions.
- Provide deterministic fallback (human queue) when chains fail or exceed latency SLO.
- Review quarterly whether raw SDK or MCP would simplify your actual use case.
Key takeaways
- LangChain standardizes LLM app plumbing through the Runnable interface and LCEL pipe composition.
- Tools and agents implement function-calling loops; LangGraph adds explicit, testable control flow.
- Memory and retrievers integrate as Runnables but still need sound RAG and context-window discipline.
- LangSmith closes the loop with traces, datasets, and evals tied to prompt versions.
- Choose LangChain for composable multi-step apps; prefer thinner stacks when requirements stay simple.
Related reading
- AI agents and tool use explained — ReAct loops and guardrails LangChain agents implement
- RAG explained — chunking, hybrid search, and retrieval quality fundamentals
- LLM function calling explained — provider APIs underlying LangChain tools
- Model Context Protocol (MCP) explained — when standardized tool servers beat in-process LangChain tools