Agent Architecture/System Design
Advanced10 min

Context Window Management

Token budgets, message trimming, compression strategies, and sliding window patterns to keep your agent within context limits.

Quick Reference

  • Use trim_messages() to automatically truncate conversation history to a target token count while preserving the system message
  • RemoveMessage allows surgical deletion of specific messages from state — useful for dropping tool results after they are processed
  • Sliding window pattern: keep the last N messages plus a running summary of older context for long conversations
  • Set explicit token budgets per node — allocate context window space between system prompt, memory, tools, and conversation
  • Compression strategies: summarize tool outputs inline, collapse repetitive exchanges, and extract key facts into structured state

Token Budget Allocation

Context Window — 200K tokensSystem8%16K tokensRetrieved Docs35%70K tokensConversation History30%60K tokensAvailable for Response27%54K tokensDanger ZoneWhen available space shrinks, the model truncates or loses context

Budget your context window: leave room for the model to reason and respond

ComponentBudgetExample (200K window)
System prompt5-10%10K-20K tokens
Tool descriptions5-10%10K-20K tokens
Long-term memory5-10%10K-20K tokens
Conversation history50-60%100K-120K tokens
Output budget10-20%20K-40K tokens
Safety margin10%20K tokens
Tool results are the biggest budget risk

A single search tool result can return 5,000+ tokens. Three search calls in a conversation consume 15K tokens just for tool results. Always truncate or summarize tool outputs before appending to state.