Advanced · 10 min
Context Window Management
Token budgets, message trimming, compression strategies, and sliding window patterns to keep your agent within context limits.
Quick Reference
- Use trim_messages() to automatically truncate conversation history to a target token count while preserving the system message
- Use RemoveMessage for surgical deletion of specific messages from state, which is useful for dropping tool results after they are processed
- Sliding window pattern: keep the last N messages plus a running summary of older context for long conversations
- Set explicit token budgets per node, allocating context window space among system prompt, memory, tools, and conversation
- Compression strategies: summarize tool outputs inline, collapse repetitive exchanges, and extract key facts into structured state
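The sliding window pattern above can be sketched in plain Python. This is a hand-rolled illustration rather than the LangChain `trim_messages()` API; the `summarize` stub and the message-dict shape are assumptions for the example — in a real agent the stub would be an LLM call that compresses the older turns.

```python
# Sliding window: keep the last N messages verbatim, fold older
# messages into a running summary, and always preserve the system message.

def summarize(messages: list[dict]) -> str:
    # Stub: a real agent would call an LLM here to compress older turns.
    return f"[summary of {len(messages)} earlier messages]"

def sliding_window(messages: list[dict], keep_last: int = 4) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return system + rest
    older, recent = rest[:-keep_last], rest[-keep_last:]
    summary = {"role": "assistant", "content": summarize(older)}
    return system + [summary] + recent

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(10)
]
window = sliding_window(history, keep_last=4)
# window: the system message, one summary message, then the last 4 turns.
```

The same shape maps onto LangGraph state: the summary message replaces the older turns in state, which is where RemoveMessage comes in for deleting the originals.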
Token Budget Allocation
Budget your context window: leave room for the model to reason and respond
| Component | Budget | Example (200K window) |
|---|---|---|
| System prompt | 5-10% | 10K-20K tokens |
| Tool descriptions | 5-10% | 10K-20K tokens |
| Long-term memory | 5-10% | 10K-20K tokens |
| Conversation history | 50-60% | 100K-120K tokens |
| Output budget | 10-20% | 20K-40K tokens |
| Safety margin | 10% | 20K tokens |
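As a worked example, the table's percentage ranges translate directly into token bounds for a given window. The component names and ranges below come from the table; the function itself is just arithmetic.

```python
# Per-component token budgets derived from the percentage ranges above.

WINDOW = 200_000

BUDGETS = {                       # (low fraction, high fraction)
    "system_prompt":        (0.05, 0.10),
    "tool_descriptions":    (0.05, 0.10),
    "long_term_memory":     (0.05, 0.10),
    "conversation_history": (0.50, 0.60),
    "output":               (0.10, 0.20),
    "safety_margin":        (0.10, 0.10),
}

def budget_tokens(window: int) -> dict[str, tuple[int, int]]:
    # Scale each component's range to the window size.
    return {name: (int(window * lo), int(window * hi))
            for name, (lo, hi) in BUDGETS.items()}

tokens = budget_tokens(WINDOW)
# e.g. tokens["conversation_history"] == (100_000, 120_000)
```

Recomputing the dict per model makes it easy to reuse the same allocation when switching between, say, a 128K and a 200K window.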
Tool results are the biggest budget risk
A single search tool result can return 5,000+ tokens. Three search calls in a conversation consume 15K tokens just for tool results. Always truncate or summarize tool outputs before appending to state.