Pre/Post Model Hooks
The right place to intercept every LLM call is not inside your nodes. LangGraph's pre_model_hook and post_model_hook, and LangChain 1.0's AgentMiddleware, give you a composable layer for context trimming, guardrails, cost tracking, and output validation — without polluting business logic.
Quick Reference
- →pre_model_hook runs before every LLM call — trim context, validate input, inject dynamic prompts
- →post_model_hook runs after every LLM call — track tokens, validate output, audit log
- →SummarizationMiddleware replaces manual trimming — configure trigger=("tokens", N) and it handles message-pair integrity
- →Hooks are a latency multiplier: every hook adds time to every LLM call — keep logic under 5ms, DB queries belong in nodes
- →AgentMiddleware composes: before_model runs forward through the list, after_model runs in reverse
- →Exceptions in hooks abort the agent.invoke() call — always catch anticipated failures explicitly
- →create_react_agent hooks (v2) are LangGraph-native; AgentMiddleware on create_agent is the LangChain 1.0 future
When (Not) to Intercept Model Calls
Before writing a hook, answer one question: does this concern need to run before or after every LLM call in this agent? If yes — it's cross-cutting, and a hook is the right place. If no — if it's conditional, depends on specific state, or routes between nodes — it belongs in a dedicated node or edge. Hooks that grow beyond lightweight cross-cutting concerns become invisible complexity: they run on every call, they compose in ways that aren't obvious from reading the graph, and they fail in ways the graph can't retry.
choose before you write a line of hook code
A hook earns its place if: (1) it should run on every LLM call in this agent, (2) it doesn't need its own retry or error-handling path, and (3) it completes in under 5ms. Fail any of those three, and the logic belongs in a node.
Good candidates: trimming messages to fit the context window, injecting a current timestamp into the system prompt, logging token counts, redacting PII from inputs before they reach the model. Bad candidates: checking a database to decide which tool to enable, calling an external API for rate-limiting, running a slow embedding call to retrieve context. Those last three need their own nodes — with explicit retry policies, error branches, and observable state.