Intermediate14 min

Pre/Post Model Hooks

The right place to intercept every LLM call is not inside your nodes. LangGraph's pre_model_hook and post_model_hook, and LangChain 1.0's AgentMiddleware, give you a composable layer for context trimming, guardrails, cost tracking, and output validation — without polluting business logic.

Quick Reference

→pre_model_hook runs before every LLM call — trim context, validate input, inject dynamic prompts
→post_model_hook runs after every LLM call — track tokens, validate output, audit log
→SummarizationMiddleware replaces manual trimming — configure trigger=("tokens", N) and it handles message-pair integrity
→Hooks are a latency multiplier: every hook adds time to every LLM call — keep logic under 5ms, DB queries belong in nodes
→AgentMiddleware composes: before_model runs forward through the list, after_model runs in reverse
→Exceptions in hooks abort the agent.invoke() call — always catch anticipated failures explicitly
→create_react_agent hooks (v2) are LangGraph-native; AgentMiddleware on create_agent is the LangChain 1.0 future

When (Not) to Intercept Model Calls

Before writing a hook, answer one question: does this concern need to run before or after every LLM call in this agent? If yes — it's cross-cutting, and a hook is the right place. If no — if it's conditional, depends on specific state, or routes between nodes — it belongs in a dedicated node or edge. Hooks that grow beyond lightweight cross-cutting concerns become invisible complexity: they run on every call, they compose in ways that aren't obvious from reading the graph, and they fail in ways the graph can't retry.

choose before you write a line of hook code

The three-line test for hooks

A hook earns its place if: (1) it should run on every LLM call in this agent, (2) it doesn't need its own retry or error-handling path, and (3) it completes in under 5ms. Fail any of those three, and the logic belongs in a node.

Good candidates: trimming messages to fit the context window, injecting a current timestamp into the system prompt, logging token counts, redacting PII from inputs before they reach the model. Bad candidates: checking a database to decide which tool to enable, calling an external API for rate-limiting, running a slow embedding call to retrieve context. Those last three need their own nodes — with explicit retry policies, error branches, and observable state.

Three Generations: Callbacks → Hooks → Middleware

There are three distinct APIs for intercepting model calls in the LangChain/LangGraph stack, and most articles online teach generation 1. Using the wrong generation means deprecated warnings today and broken code when LangGraph 0.x is retired at end of 2026.

Pre-Model Patterns

Three pre-model patterns cover 90% of production cases: context management (trimming or summarizing history before it overflows the model's context window), dynamic system prompt injection (adding runtime context the node shouldn't hardcode), and input guardrails (rejecting or sanitizing inputs before they reach the model).

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.