★ OverviewIntermediate15 min

Agent Memory Systems

When to add memory to your agent, how the two-layer architecture works, what it costs in tokens and money, and the six ways it fails silently in production.

Quick Reference

→Most agents only need short-term memory (conversation buffer) + token trimming — add long-term memory only when you have evidence users need cross-session context
→Short-term memory = conversation history in the graph state (messages list), lost when the thread ends
→Long-term memory = persisted key-value store (LangGraph Store API) that survives across threads
→Episodic memory = conversation summaries stored as searchable documents for relevant past context
→Use InMemorySaver for dev checkpointing, PostgresSaver for production; use InMemoryStore for dev, AsyncPostgresStore for production
→Memory injection overhead: each injected memory adds ~80 tokens; 10 memories = ~800 extra input tokens per request
→Token management: use trim_messages() for budget enforcement, RemoveMessage for surgical removal of large tool outputs
→Memory fails silently — conflicts, staleness, and retrieval noise produce wrong answers, not exceptions

Do You Need Agent Memory?

Start with LangGraph if building new — use RWMH for existing LCEL chains

Most production agents don't need long-term memory. Before building the infrastructure, map your use case to the right level:

Memory level	When to use	Infrastructure cost	Real limitation
None	Stateless tools, one-shot queries, CI pipelines	Zero	Agent forgets everything between invocations
Short-term only	Chatbots, support agents, coding assistants	Checkpointer (SQLite or Postgres)	Context window fills up on long conversations
Short + long-term	Personal assistants, sales agents, customer service	Checkpointer + Store (Postgres + pgvector)	Stale memories degrade answers; requires eviction policy
Short + long + episodic	Coaching agents, relationship-aware assistants, recurring SaaS users	Checkpointer + Store + vector search + summarization LLM calls	Expensive at scale; retrieval noise; complex to evaluate

Start with the simplest level that solves the problem

If users interact with your agent once and leave, long-term memory has no beneficiary. If users return but each conversation is self-contained (e.g., 'help me debug this function'), short-term memory with token trimming is all you need. Add long-term memory only when you can measure that users are hurt by the agent not remembering them across sessions.

How Memory Works: Checkpointer + Store

Checkpointer = thread-scoped auto-save · Store = cross-thread shared memory

Short-Term Memory: Conversation State

Short-term memory is the conversation history — the messages list in the graph state, persisted turn-to-turn by the checkpointer. MessagesState manages it automatically; the checkpointer serializes and restores the full state across requests:

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.