Middleware (v1.0+)

Middleware are hooks that intercept every model and tool call in your agent — without touching your agent's core logic. This article teaches when to use middleware vs. callbacks or graph nodes, how execution order works, and how to stack middleware for production agents.

Quick Reference

→from langchain.agents.middleware import AgentMiddleware, before_model, after_model, wrap_model_call
→Middleware runs before/after every model or tool call inside create_agent
→Node-style (before/after): returns state diffs, runs sequentially
→Wrap-style (wrap_model_call): controls the call itself — retry, cache, route
→before_* hooks run in list order; after_* run in reverse; wrap_* nest
→Use middleware for cross-cutting concerns — use graph nodes for business logic
→16+ built-in middleware: SummarizationMiddleware, PIIMiddleware, ModelFallbackMiddleware, and more

When to Use Middleware (and When Not To)

Middleware is the right tool when a concern applies to every model or tool call and has no business logic — logging, PII redaction, token-window management, rate limiting, fallback chains. When you need to branch on domain rules, build routing logic, or make decisions that depend on your application's state, use a graph node or custom LangGraph pre/post hook instead.

middleware for cross-cutting concerns — graph nodes for business logic — callbacks for observability-only

Scenario	Right tool	Why
Log token usage on every call	Middleware (@after_model)	Cross-cutting, no business logic
Redact PII before the model sees it	Middleware (PIIMiddleware or @before_model)	Applies to every call, stateless
Prevent context window overflow	Middleware (SummarizationMiddleware)	Transparent to agent logic
Retry on rate-limit errors	Middleware (ModelRetryMiddleware)	Infrastructure concern
Route to a support sub-agent if intent = 'billing'	Graph node / router	Business logic, domain-specific
Validate that output matches schema before returning to user	Graph node + conditional edge	Decision with branching
Trace a specific chain step in LangSmith	Callback handler	Observability only, no state change

Middleware runs on every model call — not just once

A single agent.invoke() may trigger 5–10 model calls in a multi-step task. Any middleware you stack multiplies in execution. Put heavy computation outside the middleware layer (e.g., pre-compute embeddings before the agent starts). Lightweight hooks — token counting, string replacement — are fine.

The Middleware Lifecycle

LangChain agents expose six hook points. Four are node-style (before/after) — they return state diffs. Two are wrap-style (wrap_model_call, wrap_tool_call) — they surround the actual call and give you control over whether it runs, how many times, and with what arguments.

Node-Style vs Wrap-Style

The choice between node-style and wrap-style hooks is the most important design decision when writing middleware. Node-style hooks are simpler: they receive the current state and return a dict (or None) to merge into it. Wrap-style hooks receive the request and a handler callable — they decide whether to call handler() at all, which makes retry and caching possible.

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.