Guide

LangChain fundamentals explained

Raw LLM APIs return text. Production apps need prompt templates, structured parsing, retrieval over documents, tool loops, session memory, retries, and traces you can replay when a customer complains. LangChain is the most widely adopted Python (and JavaScript) framework for wiring those pieces together. Its modern surface is LCEL (LangChain Expression Language): a composable Runnable interface where prompts, models, parsers, and retrievers pipe together with the | operator. This guide covers core primitives, agents and tools, memory and RAG integration, LangSmith observability, how LangChain relates to agent patterns and the Model Context Protocol, a Harbor Support worked example, a framework decision table, pitfalls, and a production checklist.

What LangChain is (and is not)

LangChain is an orchestration library, not a model host. You still call OpenAI, Anthropic, Google, or open-weight endpoints through provider-specific chat model classes (ChatOpenAI, ChatAnthropic, etc.). LangChain standardizes how you build chains, attach tools, load documents, and swap providers without rewriting business logic.

It is not the only way to ship LLM features. Simple one-shot completions, fixed three-step pipelines, or teams standardized on MCP tool servers may need less framework than LangChain provides. Reach for LangChain when you have multiple composable steps, several model providers, or agent loops that would otherwise sprawl across ad-hoc Python functions.

The Runnable contract

Every LCEL component implements invoke, batch, stream, and ainvoke with consistent input/output typing. That uniformity means the same chain runs synchronously in a Flask route, streams tokens to a WebSocket, or executes async in FastAPI without duplicate code paths.

LCEL: composing chains with the pipe operator

The canonical pattern chains a prompt template, chat model, and output parser:

chain = prompt | model | parser
result = chain.invoke({"topic": "capacity utilization"})

Each stage is a Runnable. LangChain handles passing outputs forward, optional streaming of intermediate steps, and configurable callbacks for tracing. Complex flows use RunnableParallel (fan-out), RunnableBranch (conditional routing), and RunnableLambda (wrap plain Python).

Prompt templates

ChatPromptTemplate defines system, human, and assistant message slots with variable placeholders. Few-shot examples live in FewShotChatMessagePromptTemplate. Keep templates version-controlled; treat them like SQL queries, not throwaway strings. System prompts should encode policy (cite sources, refuse unsafe requests) while user templates carry task-specific variables.

Output parsers

Models emit strings; parsers convert to structured data. StrOutputParser extracts text from message objects. PydanticOutputParser and JsonOutputParser enforce schemas for downstream code. For API-native structured outputs, prefer provider structured output modes where available, then parse with LangChain wrappers for portability.

Chat models, embeddings, and provider abstraction

LangChain wraps dozens of model providers behind common interfaces. Swap ChatOpenAI(model="gpt-4o-mini") for ChatAnthropic(model="claude-sonnet-4-20250514") and most chain code stays identical. Set temperature, max tokens, stop sequences, and reasoning effort through constructor kwargs or model.bind() for per-invocation overrides.

Embeddings classes (OpenAIEmbeddings, HuggingFaceEmbeddings) feed vector stores for RAG. Batch embedding calls through embed_documents to amortize HTTP overhead. Cache embedding vectors when source documents change infrequently.

Use init_chat_model("provider:model") (LangChain 0.2+) for environment-driven model selection in staging vs production. Never hardcode API keys; load from environment or secret managers and rotate on leak.

Tools and agents

LangChain implements the same tool loop described in our function calling guide: bind tools to a model, execute returned tool calls, append results, repeat. Define tools with the @tool decorator or StructuredTool.from_function, including rich docstrings — the model reads descriptions, not your source code.

Prebuilt agent executors

create_react_agent and provider-specific agent constructors wire a model, tool list, and prompt into a loop with step limits. AgentExecutor (legacy) and LangGraph-based agents (current direction) handle iteration, early stopping on duplicate calls, and returning structured final answers.

For production, prefer LangGraph when you need explicit state machines: human approval nodes, checkpointing, parallel tool branches, or retry subgraphs. LangGraph is maintained by the LangChain team but models control flow as a graph rather than an opaque while-loop — easier to test and audit.

Tool design habits

  • One responsibility per tool; split <->write pairs.
  • Return JSON-serializable summaries, not raw 50 MB payloads.
  • Surface actionable error strings the model can react to.
  • Validate arguments with Pydantic before side effects run.

Memory and conversation state

Stateless APIs require you to resend full history each turn. LangChain memory classes automate that bookkeeping:

  • ChatMessageHistory — stores raw messages in memory or Redis.
  • ConversationBufferMemory — injects entire history (watch token limits).
  • ConversationSummaryMemory — compresses older turns via a summarizer model.
  • Vector store memory — retrieves relevant past facts by semantic search.

Long-running agents should externalize state: user IDs, draft order IDs, and permissions belong in your database, not buried in chat transcripts. Pair LangChain memory with our LLM agent memory guide patterns for summarization and scratchpads.

Document loaders, text splitters, and RAG

LangChain’s document ecosystem loads PDFs, HTML, Notion exports, S3 objects, and databases into Document objects (page content + metadata). Text splitters (RecursiveCharacterTextSplitter) chunk content with overlap for retrieval quality. Vector stores (Chroma, Pinecone, pgvector via PGVector) index embeddings; retrievers expose invoke(query) as a Runnable you pipe into prompts.

A minimal RAG chain:

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

LangChain does not replace RAG fundamentals: chunk size, hybrid search, reranking, and evaluation still determine quality. Use LangChain for glue; follow our RAG guide and RAG evaluation guide for architecture decisions. create_retrieval_chain and create_history_aware_retriever bundle common patterns for conversational Q&A over docs.

LangSmith: tracing, datasets, and evals

LangSmith is LangChain’s hosted observability platform (optional but valuable in production). Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY; every chain run records inputs, outputs, latency, and token usage. Replay failed customer sessions, diff prompt versions, and export traces into regression datasets.

LangSmith supports online evaluators (LLM-as-judge, custom Python) and offline batch runs against golden questions. Tie eval hooks to CI: block deploys when faithfulness or tool selection accuracy drops. This complements — not replaces — benchmark evaluation on static suites.

Deployment, async, and streaming

Expose chains through FastAPI or LangServe (add_routes) for HTTP /invoke and /stream endpoints. Use chain.astream_events for token-by-token SSE to browsers. Batch offline jobs with chain.batch and configurable max concurrency.

Containerize with explicit dependency pins; LangChain releases frequently and minor version bumps occasionally rename imports. Lock versions in production requirements and read changelogs before upgrading. For self-hosted inference, point chat models at vLLM or Ollama OpenAI-compatible URLs via base URL overrides.

Worked example: Harbor Support ticket router

Harbor Support receives 4,000 tickets daily across billing, shipping, and account security. Operators need an assistant that classifies intent, retrieves policy snippets, and either drafts a reply or escalates to a human with a structured summary.

Pipeline design

A LangChain LCEL pipeline runs in three stages. Stage one: a ChatPromptTemplate with few-shot examples feeds ChatOpenAI(model="gpt-4o-mini", temperature=0) and a PydanticOutputParser emitting category, urgency, and needs_human fields. Stage two: a RunnableBranch routes non-escalations to a RAG chain — PGVector retriever over chunked policy PDFs (k=4), then answer synthesis with citation instructions. Stage three: a @tool named create_zendesk_note writes drafts via API when confidence exceeds 0.85; otherwise the chain stops with escalation metadata only.

Operations

Conversation history lives in Redis via RedisChatMessageHistory. LangSmith traces every ticket; weekly evals sample 200 closed tickets comparing draft quality scores. Prompt templates live in git; deploys tag LangSmith datasets with commit SHAs. P95 latency target: under 8 seconds including retrieval. Human agents override roughly 18% of drafts — feedback loops feed misclassified examples back into the few-shot set after review.

Framework decision table

Need Prefer Why
Multi-step chains, several providers, agents LangChain + LangGraph LCEL composition, tool loops, tracing integration
Document-heavy Q&A, indexing focus LlamaIndex (or LangChain retrievers only) Indexing abstractions tuned for retrieval-first apps
Single provider, fixed 2-step flow Raw SDK + plain Python Less dependency surface, easier debugging
Standardized tool servers across products MCP hosts + thin client Tools portable between Claude Desktop, IDE agents, internal apps
JavaScript full-stack LangChain.js or Vercel AI SDK Match team stack; LCEL concepts port to JS
Strict compliance, minimal magic Explicit state machine (LangGraph or hand-rolled) Auditable nodes and edges beat opaque executors

Common pitfalls

  • Framework soup — importing every LangChain submodule; start with LCEL primitives and add packages only when needed.
  • Hidden token burn — buffer memory on long chats without summarization blows context and cost.
  • Tool descriptions as afterthoughts — vague docstrings cause wrong tool selection; rewrite until a human would pick correctly.
  • Skipping evals — LangSmith enabled but no regression datasets; you still fly blind on prompt changes.
  • Blocking I/O in async routes — calling invoke inside FastAPI async handlers stalls the event loop; use ainvoke.
  • Version drift — tutorials referencing deprecated LLMChain and ConversationChain; migrate to LCEL.
  • Trusting retrieved text — RAG chunks can carry injection payloads; sanitize before they reach the model (see prompt injection defenses).

Production checklist

  • Pin LangChain, LangGraph, and provider SDK versions in lockfiles.
  • Store prompts in version control; tag LangSmith datasets per release.
  • Enable tracing in staging and production with PII redaction rules.
  • Set max agent iterations, tool timeouts, and duplicate-call detection.
  • Unit-test Runnable stages with mocked models; integration-test retrievers against real vector DB.
  • Log cost per request (tokens in/out) and alert on anomalies.
  • Document which tools have side effects and require human approval.
  • Run weekly offline evals on representative queries; block deploy on regressions.
  • Provide deterministic fallback (human queue) when chains fail or exceed latency SLO.
  • Review quarterly whether raw SDK or MCP would simplify your actual use case.

Key takeaways

  • LangChain standardizes LLM app plumbing through the Runnable interface and LCEL pipe composition.
  • Tools and agents implement function-calling loops; LangGraph adds explicit, testable control flow.
  • Memory and retrievers integrate as Runnables but still need sound RAG and context-window discipline.
  • LangSmith closes the loop with traces, datasets, and evals tied to prompt versions.
  • Choose LangChain for composable multi-step apps; prefer thinner stacks when requirements stay simple.

Related reading