★ OverviewIntermediate15 min
Agent Memory Systems
When to add memory to your agent, how the two-layer architecture works, what it costs in tokens and money, and the six ways it fails silently in production.
Quick Reference
- →Most agents only need short-term memory (conversation buffer) + token trimming — add long-term memory only when you have evidence users need cross-session context
- →Short-term memory = conversation history in the graph state (messages list), lost when the thread ends
- →Long-term memory = persisted key-value store (LangGraph Store API) that survives across threads
- →Episodic memory = conversation summaries stored as searchable documents for relevant past context
- →Use InMemorySaver for dev checkpointing, PostgresSaver for production; use InMemoryStore for dev, AsyncPostgresStore for production
- →Memory injection overhead: each injected memory adds ~80 tokens; 10 memories = ~800 extra input tokens per request
- →Token management: use trim_messages() for budget enforcement, RemoveMessage for surgical removal of large tool outputs
- →Memory fails silently — conflicts, staleness, and retrieval noise produce wrong answers, not exceptions
Do You Need Agent Memory?
Start with LangGraph if building new — use RWMH for existing LCEL chains
Most production agents don't need long-term memory. Before building the infrastructure, map your use case to the right level:
| Memory level | When to use | Infrastructure cost | Real limitation |
|---|---|---|---|
| None | Stateless tools, one-shot queries, CI pipelines | Zero | Agent forgets everything between invocations |
| Short-term only | Chatbots, support agents, coding assistants | Checkpointer (SQLite or Postgres) | Context window fills up on long conversations |
| Short + long-term | Personal assistants, sales agents, customer service | Checkpointer + Store (Postgres + pgvector) | Stale memories degrade answers; requires eviction policy |
| Short + long + episodic | Coaching agents, relationship-aware assistants, recurring SaaS users | Checkpointer + Store + vector search + summarization LLM calls | Expensive at scale; retrieval noise; complex to evaluate |
Start with the simplest level that solves the problem
If users interact with your agent once and leave, long-term memory has no beneficiary. If users return but each conversation is self-contained (e.g., 'help me debug this function'), short-term memory with token trimming is all you need. Add long-term memory only when you can measure that users are hurt by the agent not remembering them across sessions.