Intermediate · 10 min

Middleware (v1.0+)

Middleware is a set of hooks that intercept every model and tool call in your agent without touching the agent's core logic. This article covers when to use middleware instead of callbacks or graph nodes, how execution order works, and how to stack middleware for production agents.

Quick Reference

  • from langchain.agents.middleware import AgentMiddleware, before_model, after_model, wrap_model_call
  • Middleware runs before/after every model or tool call inside create_agent
  • Node-style (before/after): returns state diffs, runs sequentially
  • Wrap-style (wrap_model_call): controls the call itself — retry, cache, route
  • before_* hooks run in list order; after_* run in reverse; wrap_* nest
  • Use middleware for cross-cutting concerns — use graph nodes for business logic
  • 16+ built-in middleware: SummarizationMiddleware, PIIMiddleware, ModelFallbackMiddleware, and more
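
The ordering rule in the list above can be traced with a small, framework-free model. This is only a sketch, not LangChain's implementation: `ToyMiddleware` and `run_model_step` are hypothetical names, state is a plain dict, and the sketch assumes the first middleware in the list becomes the outermost wrapper.

```python
from typing import Callable, Optional

State = dict
ModelFn = Callable[[State], str]


class ToyMiddleware:
    """Hypothetical stand-in exposing all three hook kinds; not a LangChain class."""

    def __init__(self, name: str, log: list):
        self.name = name
        self.log = log

    def before_model(self, state: State) -> Optional[dict]:
        self.log.append(f"{self.name}.before")
        return {f"{self.name}_seen": True}  # node-style hooks return a state diff

    def after_model(self, state: State) -> Optional[dict]:
        self.log.append(f"{self.name}.after")
        return None  # returning nothing leaves the state unchanged

    def wrap_model_call(self, state: State, handler: ModelFn) -> str:
        self.log.append(f"{self.name}.wrap:enter")
        reply = handler(state)  # a wrap-style hook decides if and how to call
        self.log.append(f"{self.name}.wrap:exit")
        return reply


def _bind(mw: ToyMiddleware, inner: ModelFn) -> ModelFn:
    """Wrap `inner` so that calling the result goes through `mw` first."""
    return lambda state: mw.wrap_model_call(state, inner)


def run_model_step(state: State, middleware: list, model: ModelFn) -> State:
    # before_* hooks: list order, each diff merged into the state.
    for mw in middleware:
        state = {**state, **(mw.before_model(state) or {})}
    # wrap_* hooks: nested, built inside-out so the first entry ends up outermost
    # (an assumption of this sketch).
    handler = model
    for mw in reversed(middleware):
        handler = _bind(mw, handler)
    state = {**state, "reply": handler(state)}
    # after_* hooks: reverse list order.
    for mw in reversed(middleware):
        state = {**state, **(mw.after_model(state) or {})}
    return state


log = []
stack = [ToyMiddleware("a", log), ToyMiddleware("b", log)]
final_state = run_model_step({}, stack, lambda state: "ok")
print(log)
```

With two middleware `a` and `b`, the log reads `a.before, b.before`, then the nested wraps (`a` enters, `b` enters, `b` exits, `a` exits), then `b.after, a.after`.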

When to Use Middleware (and When Not To)

Middleware is the right tool when a concern applies to every model or tool call and has no business logic — logging, PII redaction, token-window management, rate limiting, fallback chains. When you need to branch on domain rules, build routing logic, or make decisions that depend on your application's state, use a graph node or custom LangGraph pre/post hook instead.
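
To make the distinction concrete, here is a minimal sketch of a cross-cutting concern written in the wrap style: a retry wrapper that never inspects what the request is about. `RateLimitError`, `with_retry`, and the one-argument handler signature are hypothetical stand-ins, not LangChain APIs. In a real agent, ModelRetryMiddleware or a wrap_model_call hook would play this role.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit exception."""


def with_retry(
    handler: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Callable[[str], str]:
    """Wrap `handler` so rate-limit errors are retried with exponential backoff."""

    def wrapped(request: str) -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return handler(request)
            except RateLimitError:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the error to the caller
                time.sleep(base_delay * 2 ** (attempt - 1))
        raise RuntimeError("unreachable when max_attempts >= 1")

    return wrapped


# Usage: a fake model that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_model(request: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return f"answer to {request}"


print(with_retry(flaky_model)("hello"))
```

Nothing in the wrapper depends on the request's content, which is what makes it a fit for middleware. A rule such as "send billing questions to the support sub-agent" has to read the request and branch on it, so it belongs in a graph node.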

Decision flow:

  • Cross-cutting concern (logging, PII, retries, rate limiting)? Yes: use middleware, which runs on every model/tool call. No: continue.
  • Business logic (routing, branching, domain rules)? Yes: use a graph node (a LangGraph custom node). No: continue.
  • Observability only (metrics, tracing, no state change)? Yes: use callbacks (LangChain callback handlers). No: use LangGraph hooks (pre/post node hooks).

Middleware for cross-cutting concerns, graph nodes for business logic, callbacks for observability only.

Scenario | Right tool | Why
Log token usage on every call | Middleware (@after_model) | Cross-cutting, no business logic
Redact PII before the model sees it | Middleware (PIIMiddleware or @before_model) | Applies to every call, stateless
Prevent context window overflow | Middleware (SummarizationMiddleware) | Transparent to agent logic
Retry on rate-limit errors | Middleware (ModelRetryMiddleware) | Infrastructure concern
Route to a support sub-agent if intent = 'billing' | Graph node / router | Business logic, domain-specific
Validate that output matches schema before returning to user | Graph node + conditional edge | Decision with branching
Trace a specific chain step in LangSmith | Callback handler | Observability only, no state change

Middleware runs on every model call, not just once

A single agent.invoke() may trigger 5–10 model calls in a multi-step task. Every middleware you stack runs on each of those calls, so its cost is multiplied by the number of calls. Put heavy computation outside the middleware layer (e.g., pre-compute embeddings before the agent starts). Lightweight hooks, such as token counting or string replacement, are fine.
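
The multiplication is easy to see in a toy model. The sketch below assumes a hypothetical agent loop (`toy_invoke`) that runs every hook once per model call. It is not how create_agent is implemented, and `make_hooks` is an illustrative helper. The heavy setup runs once, outside the hooks, while the lightweight hooks run on every call.

```python
from collections import Counter


def toy_invoke(model_calls: int, hooks: list) -> None:
    """Hypothetical agent loop: every model call first runs every hook."""
    for step in range(model_calls):
        for hook in hooks:
            hook(step)
        # a real agent would call the model (and possibly tools) here


def make_hooks(counter: Counter) -> list:
    """Do the expensive work once, then return cheap hooks that reuse it."""
    counter["setup"] += 1
    # Stand-in for heavy work such as computing embeddings.
    blocklist = frozenset({"password", "ssn"})

    def count_tokens(step: int) -> None:
        counter["count_tokens"] += 1  # lightweight: fine to run per call

    def filter_terms(step: int) -> None:
        counter["filter_terms"] += 1
        _ = "password" in blocklist  # cheap lookup against precomputed data

    return [count_tokens, filter_terms]


counter = Counter()
hooks = make_hooks(counter)
toy_invoke(7, hooks)  # one invoke that needs 7 model calls
print(dict(counter))
```

With 7 model calls and 2 hooks, the hooks execute 14 times in total while the setup runs once. Moving the setup inside a hook would repeat it 7 times per invoke.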