Middleware (v1.0+)
Middleware hooks intercept every model and tool call in your agent without touching the agent's core logic. This article explains when to use middleware instead of callbacks or graph nodes, how execution order works, and how to stack middleware for production agents.
Quick Reference
- Import: from langchain.agents.middleware import AgentMiddleware, before_model, after_model, wrap_model_call
- Middleware runs before/after every model or tool call inside create_agent
- Node-style (before/after): returns state diffs, runs sequentially
- Wrap-style (wrap_model_call): controls the call itself (retry, cache, route)
- before_* hooks run in list order; after_* run in reverse; wrap_* nest
- Use middleware for cross-cutting concerns; use graph nodes for business logic
- 16+ built-in middleware: SummarizationMiddleware, PIIMiddleware, ModelFallbackMiddleware, and more
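The ordering rule in the list above can be pictured with a small plain-Python model. This is a sketch of the rule only, not LangChain's implementation: the Middleware class and run_model_step function are hypothetical stand-ins, and it assumes the first middleware in the list becomes the outermost wrapper.

```python
from typing import Callable, List


class Middleware:
    """Hypothetical stand-in for a middleware object (not the LangChain class)."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def before_model(self) -> None:
        self.log.append(f"{self.name}.before_model")

    def after_model(self) -> None:
        self.log.append(f"{self.name}.after_model")

    def wrap_model_call(self, handler: Callable[[], str]) -> str:
        self.log.append(f"{self.name}.wrap enter")
        result = handler()  # calls the next wrapper, or the model itself
        self.log.append(f"{self.name}.wrap exit")
        return result


def run_model_step(stack: List[Middleware], model: Callable[[], str]) -> str:
    # before_* hooks run in list order.
    for mw in stack:
        mw.before_model()

    # wrap_* hooks nest. Building from the end of the list makes the first
    # middleware the outermost wrapper (an assumption of this sketch).
    handler = model
    for mw in reversed(stack):
        # Default arguments bind mw and the current handler immediately.
        handler = lambda mw=mw, inner=handler: mw.wrap_model_call(inner)
    result = handler()

    # after_* hooks run in reverse list order.
    for mw in reversed(stack):
        mw.after_model()
    return result


log: List[str] = []


def fake_model() -> str:
    log.append("model")
    return "response"


result = run_model_step([Middleware("A", log), Middleware("B", log)], fake_model)
print("\n".join(log))
```

With two middleware A and B, the recorded order is A then B for the before hooks, A wrapping B around the model call, and B then A for the after hooks.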
When to Use Middleware (and When Not To)
Middleware is the right tool when a concern applies to every model or tool call and has no business logic — logging, PII redaction, token-window management, rate limiting, fallback chains. When you need to branch on domain rules, build routing logic, or make decisions that depend on your application's state, use a graph node or custom LangGraph pre/post hook instead.
Rule of thumb: middleware for cross-cutting concerns, graph nodes for business logic, callbacks for observability only.
| Scenario | Right tool | Why |
|---|---|---|
| Log token usage on every call | Middleware (@after_model) | Cross-cutting, no business logic |
| Redact PII before the model sees it | Middleware (PIIMiddleware or @before_model) | Applies to every call, stateless |
| Prevent context window overflow | Middleware (SummarizationMiddleware) | Transparent to agent logic |
| Retry on rate-limit errors | Middleware (ModelRetryMiddleware) | Infrastructure concern |
| Route to a support sub-agent if intent = 'billing' | Graph node / router | Business logic, domain-specific |
| Validate that output matches schema before returning to user | Graph node + conditional edge | Decision with branching |
| Trace a specific chain step in LangSmith | Callback handler | Observability only, no state change |
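To make the retry row in the table more concrete, here is a minimal plain-Python sketch of what a wrap-style hook does: it receives the inner call as a handler and decides whether and how often to run it. It is an illustration of the pattern, not ModelRetryMiddleware itself; RateLimitError, with_retry, and flaky_model are hypothetical names introduced for this example.

```python
from typing import Callable, List


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit exception."""


def with_retry(
    handler: Callable[[str], str], max_attempts: int = 3
) -> Callable[[str], str]:
    """Wrap-style hook: controls the call itself by retrying on rate limits."""

    def wrapped(request: str) -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return handler(request)
            except RateLimitError:
                if attempt == max_attempts:
                    raise  # out of attempts, surface the error to the caller
                # A production version would usually sleep with backoff here.
        raise ValueError("max_attempts must be at least 1")

    return wrapped


attempts: List[str] = []


def flaky_model(request: str) -> str:
    """Fake model call that fails twice with a rate limit, then succeeds."""
    attempts.append(request)
    if len(attempts) < 3:
        raise RateLimitError("429")
    return f"answer to {request!r}"


reply = with_retry(flaky_model)("What is middleware?")
print(reply, "after", len(attempts), "attempts")
```

Nothing in the retry wrapper knows about billing, intents, or schemas, which is what makes it a reasonable fit for middleware rather than a graph node.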
A single agent.invoke() may trigger 5–10 model calls in a multi-step task, and every middleware in the stack runs on each of them, so any per-call cost is multiplied. Keep heavy computation outside the middleware layer (for example, pre-compute embeddings before the agent starts). Lightweight hooks such as token counting or string replacement are fine.
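The cost multiplication can be seen in a small simulation. This is a sketch under stated assumptions: run_agent is a hypothetical loop that stands in for one agent.invoke() making eight model calls, build_blocklist stands in for any heavy computation, and the hooks follow the node-style convention of returning a state diff that the loop merges.

```python
from typing import Callable, Dict, Set

counter = {"heavy": 0}  # counts how often the heavy work actually runs


def build_blocklist() -> Set[str]:
    """Stand-in for heavy work, e.g. computing embeddings or loading an index."""
    counter["heavy"] += 1
    return {"alice@example.com"}


def slow_hook(state: Dict) -> Dict:
    blocklist = build_blocklist()  # heavy work repeated on every model call
    return {"flagged": [m for m in state["messages"] if m in blocklist]}


def make_fast_hook() -> Callable[[Dict], Dict]:
    blocklist = build_blocklist()  # heavy work done once, before the agent runs

    def fast_hook(state: Dict) -> Dict:
        return {"flagged": [m for m in state["messages"] if m in blocklist]}

    return fast_hook


def run_agent(hook: Callable[[Dict], Dict], model_calls: int) -> Dict:
    """Hypothetical agent loop: the hook runs once per model call."""
    state: Dict = {"messages": ["hello", "alice@example.com"]}
    for _ in range(model_calls):
        state.update(hook(state))  # node-style: merge the returned state diff
    return state


slow_state = run_agent(slow_hook, model_calls=8)
slow_cost = counter["heavy"]  # 8: once per model call

counter["heavy"] = 0
fast_state = run_agent(make_fast_hook(), model_calls=8)
fast_cost = counter["heavy"]  # 1: computed up front

print(slow_cost, fast_cost)
```

Both hooks produce the same state, but the version that pre-computes its data does the heavy work once instead of once per model call.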