LangChain/Memory & Middleware
Intermediate14 min

Managing Message History

Message history is your biggest uncontrolled cost in production agents. This article covers the decision between transient and persistent trimming, when summarization beats deletion, and the four failure modes that produce wrong answers without throwing exceptions.

Quick Reference

  • wrap_model_call — transient trim: model sees less, checkpointer is unchanged
  • @before_model + RemoveMessage(REMOVE_ALL_MESSAGES) — persistent trim: state is permanently rewritten
  • RemoveMessage(id=m.id) — surgical delete of a specific message by ID
  • SummarizationMiddleware(model, trigger={'tokens': N}, keep={'messages': N}) — auto-compress old turns
  • trim_messages(max_tokens, strategy='last', start_on='human') — LCEL chain utility
  • Always delete tool-call messages and their ToolMessage results together
  • Log token count per turn — silent truncation produces wrong answers, not exceptions

What Message History Actually Costs You

A typical agent turn — user message + tool calls + tool results + AI reply — runs 600–1,200 tokens. At 50 turns, that's 30,000–60,000 tokens per call. With claude-opus-4-7 at $15/$75 per million tokens in/out, a 50-turn conversation costs roughly $0.54–$1.08 per call — before any actual work. At 100 turns, double that. Most agent bugs in production aren't logic errors; they're token budget failures that show up as degraded reasoning or silent truncation.

TurnsTokens (low est.)Tokens (high est.)Cost @ claude-opus-4-7
106,00012,000$0.09–$0.18
3018,00036,000$0.27–$0.54
5030,00060,000$0.54–$1.08
10060,000120,000$1.08–$2.16

Assumptions: 600–1,200 tokens/turn average. claude-opus-4-7 input at $15/M, output at $75/M, estimated 90% input / 10% output split. These are floor estimates — tool-heavy agents with large tool results can exceed 3,000 tokens/turn easily.

Context Window Token Budget (200K)System PromptfixedConversation Historygrows over timeTool ResultsvariableAvailable SpaceshrinksAfter Trimming / SummarizationSystem PromptSummarized HistorycompressedTool ResultsAvailable Spacemore room!Trimming StrategiesSliding windowdrop oldest messagesSummarizationcompress history via LLMSelective pruningkeep high-value turnssummarize

Budget your context window: system prompt is fixed, history grows, available space shrinks