LangChain
v1.2The developer interface for building with LLMs. One API for every model, composable chains, tools, memory, and structured output.
LangChain is an open-source framework that gives you a unified API over every LLM provider, a composable pipeline system (LCEL), and a middleware layer for agents. This article explains what it is, when it helps, when to skip it, and how the ecosystem fits together.
LangChain v1 replaced hooks with middleware, InjectedState with ToolRuntime, and create_react_agent with create_agent. This guide covers the migration order, what breaks silently, and how to test each step — not just what changed.
Three layers of the same stack — not competing frameworks. Here is when each layer earns its place, what it costs you, and how to migrate down when your requirements outgrow it.
How to choose a provider, wire it up, configure it, and handle the failures that happen in production. LangChain's ChatModel interface gives you one API for all providers — but the decisions around which provider, what configuration, and how to recover from errors are yours to make.
A field-by-field reference for every message class in LangChain — HumanMessage, AIMessage, SystemMessage, ToolMessage, AIMessageChunk, RemoveMessage, and legacy types. Know what each field does, what breaks when you get it wrong, and how providers differ.
Messages are LangChain's transport layer — every model call is a list of typed messages in, one message out. This article covers what each message type carries, how content blocks handle multimodal I/O, and how to manage message growth before it blows your context budget.
LangChain v1 normalizes every provider's output into standard content_blocks — one API for text, reasoning, citations, tool calls, and images across Anthropic, OpenAI, and Google.
LCEL composes Runnables into chains with |. Understand when to use it, how streaming actually works through each step, and the type contracts that break in production.
RunnableParallel, RunnableBranch, RunnableLambda, and RunnableConfig form the production toolkit for LCEL pipelines. Know when each earns its place, how they fail, and how to compose them without regrets at 3am.
LangChain's callback system hooks into every stage of chain execution — but most teams reach for it when LangSmith, astream_events, or @traceable would serve them better. This article teaches you which mechanism to reach for, how to write production-grade handlers, and how callbacks fail in ways that bring down your whole chain.
The six parameters that separate a production LLM call from a fragile prototype: temperature, max_tokens, timeout, max_retries, rate limiting, and usage tracking. Each has a failure mode that won't surface until you're in production.
batch() parallelizes LLM calls client-side — all requests fire concurrently, results return together. When you need 50%+ cost savings and can tolerate ~1h latency, use your provider's async batch API instead. This article shows you which to pick, how to handle partial failures, and what a production pipeline looks like.
init_chat_model() lets you define a chain once and swap the underlying model at runtime via config — no redeploy needed. This article covers when that's worth the complexity, how config resolution actually works, what configurable_fields='any' silently exposes to attackers, and how to gate cost before your bill 50× overnight.
Pass images, audio, PDFs, and video to multimodal models using LangChain's standard content blocks. LangChain v1 introduced a provider-agnostic format that works across GPT-4o, Claude, and Gemini — this article covers both the old provider-native format and the new standard, a capability matrix across providers, and a production router that sends each modality to the right model. For deciding when multimodal is the right tool and what it costs, see Multimodal Models.
Reasoning models (Claude adaptive thinking, OpenAI o3/o4-mini, Gemini 3 Deep Think) spend internal tokens on a scratchpad before producing the final answer. They cost 5-10x more per call than standard models and add latency. This article covers when that tradeoff is worth it, how each provider's API works in 2026, and how LangChain gives you a single parsing interface across all of them.
Server-side tools execute on the provider's infrastructure — your code binds them like local tools, but the provider runs them, bills per call, and injects results directly into the model's context. Knowing the cost model, result block types, and budget controls is what separates a demo from production.
Running models locally with Ollama solves three real problems: data that can't leave your machine, usage patterns that make cloud APIs expensive at scale, and offline operation. This article walks through the decision, the current model landscape, hardware requirements, and the LangChain integration — including reasoning models and structured output.
Tool calling follows a three-step cycle — invoke the model, execute its tool calls, pass results back as ToolMessages, repeat. Understanding this loop is the foundation of every tool-using agent, but shipping it to production requires error handling, token budget awareness, and a clear decision on when to use a manual loop versus create_agent.
Every production LLM bug traces back to a prompt decision made without thinking about tokens, injection, or testability. This article covers when to use prompt templates, how to budget few-shot examples, how to defend against injection, and how to build a prompt you can actually evaluate.
How to choose between function_calling, json_schema, and native structured output. Schema design, validation layers, failure modes, and the cost math for each strategy.
LangChain has four ways to get structured output — and three of them are the wrong choice most of the time. This article maps the decision: when to use with_structured_output(), when JsonOutputParser is still the right tool, and what to do with the legacy parsers you inherited.
Tool calling is how LLMs act on the world. This article covers the full stack: how LangChain converts your functions to JSON schemas, how the model decides which tool to call and when, the complete bind_tools → tool_calls → ToolMessage cycle, production patterns like InjectedToolArg and tool artifacts, cost math for tool-heavy agents, and the failure modes that will hit you in production.
The extras attribute gives you access to provider-specific capabilities — extended thinking, prompt caching, strict schemas — without breaking your portable LangChain code. This article covers when to use extras, what they actually cost, and what breaks when you do.
Static tool registries break in multi-tenant agents — an admin tool visible to a free user is an auth bug. Learn when dynamic filtering is worth the middleware complexity, which of the four strategies to pick, and how LangChain's built-in LLMToolSelectorMiddleware handles the hardest case automatically.
LangChain gives you three mechanisms for tool error handling: ToolNode's built-in handle_tool_errors for LangGraph workflows, ToolException for per-tool control, and @wrap_tool_call middleware for cross-cutting production concerns. Knowing which to use — and how to write error messages the model can act on — is what separates a demo from a production agent.
How tool names, descriptions, schemas, and examples influence model selection accuracy. Covers the full production toolkit: namespacing, tool use examples, schema design, error surfaces, scaling with Tool RAG, token cost math, and a concrete eval methodology.
Parallel tool calling lets a model request multiple independent tools in one response instead of one at a time. This article covers when it saves you tokens and latency, when it causes race conditions, how to configure it across providers, and what production failure looks like.
ToolNode is the prebuilt LangGraph node that handles tool execution — parallelism, error routing, and ToolMessage creation included. ToolRuntime is the parameter that gives any tool inside that node access to the agent's state, context, and store without those values leaking into the schema the model sees.
RunnableWithMessageHistory wraps any LCEL chain with per-session conversation history — but before you use it, you need to know when LangGraph is the better choice, what unbounded history costs you per request, and how to defend against the two failure modes that kill most production chatbots.
Message history is your biggest uncontrolled cost in production agents. This article covers the decision between transient and persistent trimming, when summarization beats deletion, and the four failure modes that produce wrong answers without throwing exceptions.
BaseChatMessageHistory is the interface every storage backend implements, but picking the wrong backend — or ignoring LangGraph checkpointers entirely — will cost you in production. This article covers how to decide, configure, and operate each option.
Middleware are hooks that intercept every model and tool call in your agent — without touching your agent's core logic. This article teaches when to use middleware vs. callbacks or graph nodes, how execution order works, and how to stack middleware for production agents.
LangChain ships 14 production-ready middleware classes and Deep Agents adds 2 more. This article is organized around decisions: which ones your agent needs, how to order them, and what breaks when you get it wrong.
Build production middleware that intercepts model calls, gates tool execution, injects dynamic context, and writes state — using node-style hooks for sequential logic and wrap-style hooks when you need control over whether and how many times an operation runs.
create_agent compiles a full agent runtime on LangGraph. Give it a model and tools — it handles the reasoning loop, tool dispatch, middleware, checkpointing, and stopping conditions. This article covers when to use it, what each parameter does, how the loop costs money, and how it fails.
System prompts anchor your agent's behavior across every invocation — but only add one when it earns its tokens. This article covers when to omit, how to structure for production, how to cache large prompts at ~90% cost reduction, and where dynamic prompts open injection vectors you must close.
Production agents carry more than messages. Learn when to extend AgentState, how to design schemas that don't blow up your token budget, and how to avoid the serialization and state-explosion bugs that only show up after you deploy.
Route agent turns to cheaper models when the task is simple and powerful models when it's complex — using @wrap_model_call to intercept every LLM request and swap the model based on conversation state, user tier, or cost targets. This article starts with whether you should route at all, walks through real cost math, covers the five ways routing silently fails in production, and ends with a 30-day rollout runbook.
Decide whether to stream, pick the right mode for your UI, ship it over HTTP with async streaming, and handle the failures that only appear in production. All patterns use create_agent with version='v2'.
Before wiring up ProviderStrategy or ToolStrategy, you need to know when structured output will hurt you — streaming breaks, retries compound cost, and over-constrained schemas hallucinate values. This article covers the decision, the cost math, two failure modes most tutorials skip, and the schema patterns that cut retry rates.
How to get text out of PDFs, web pages, Notion, and 200+ other sources — and into your RAG pipeline. Covers loader selection, memory-safe loading, metadata strategy, failure modes, and the production pipeline pattern.
LangChain's text splitter API: when to split, which splitter to choose, token-based production splitting, metadata propagation, and the three failure modes that destroy RAG quality.
Embedding models convert text into vectors for semantic search and RAG. This article covers the 2026 model landscape, cost math at scale, production patterns, and the hidden traps — especially the re-embedding trap when you switch models.
How to choose, configure, and operate a vector store for production RAG — covering index types, cost math, failure modes, multi-tenancy, and migration strategy across FAISS, Chroma, pgvector, Qdrant, and Pinecone.
Retrievers wrap vector stores in LangChain's Runnable interface — but choosing the wrong one costs latency and money. Decision framework, cost math, and evaluation code for MultiQuery, Parent, SelfQuery, Compression, Ensemble, and custom retrievers.
Context engineering is the discipline of curating the smallest set of high-signal tokens that maximize the probability of a good outcome. In LangChain, this means deciding what goes into every model call, how tools read and write state, and what happens between steps — using State, Store, and Runtime Context as your three levers.
When to add guardrails, how to architect a cost-aware stack across all five middleware hooks, and how to know they work. Covers before_agent input filters, wrap_tool_call for tool-level security, after_agent output safety, false positive management, and guardrail evaluation.
LangChain's Runtime object is a dependency injection system for tools and middleware. Instead of reaching for globals or thread-locals, you pass per-invocation config (user ID, tenant, feature flags) through context_schema and read it anywhere via runtime.context — without exposing it to the model.
When to use MCP vs direct tools, multi-server orchestration, interceptor composition for production, failure handling, and testing patterns for LangChain agents.
ToolRuntime bundles state, store, context, and streaming into a single typed parameter for tools. This article is about when to use it, which data goes where, and what breaks in production when you choose wrong.