Agent Architecture/Agent Memory
★ OverviewIntermediate15 min

Agent Memory Systems

When to add memory to your agent, how the two-layer architecture works, what it costs in tokens and money, and the six ways it fails silently in production.

Quick Reference

  • Most agents only need short-term memory (conversation buffer) + token trimming — add long-term memory only when you have evidence users need cross-session context
  • Short-term memory = conversation history in the graph state (messages list), lost when the thread ends
  • Long-term memory = persisted key-value store (LangGraph Store API) that survives across threads
  • Episodic memory = conversation summaries stored as searchable documents for relevant past context
  • Use InMemorySaver for dev checkpointing, PostgresSaver for production; use InMemoryStore for dev, AsyncPostgresStore for production
  • Memory injection overhead: each injected memory adds ~80 tokens; 10 memories = ~800 extra input tokens per request
  • Token management: use trim_messages() for budget enforcement, RemoveMessage for surgical removal of large tool outputs
  • Memory fails silently — conflicts, staleness, and retrieval noise produce wrong answers, not exceptions

Do You Need Agent Memory?

Need conversationmemory?NoStatelessYesUsing LangGraphor new project?YesLangGraphcheckpointerNo / LCEL chainLong conversationsor token budget?NoRWMHsimple historyYesRWMH + trim_messages / summary

Start with LangGraph if building new — use RWMH for existing LCEL chains

Most production agents don't need long-term memory. Before building the infrastructure, map your use case to the right level:

Memory levelWhen to useInfrastructure costReal limitation
NoneStateless tools, one-shot queries, CI pipelinesZeroAgent forgets everything between invocations
Short-term onlyChatbots, support agents, coding assistantsCheckpointer (SQLite or Postgres)Context window fills up on long conversations
Short + long-termPersonal assistants, sales agents, customer serviceCheckpointer + Store (Postgres + pgvector)Stale memories degrade answers; requires eviction policy
Short + long + episodicCoaching agents, relationship-aware assistants, recurring SaaS usersCheckpointer + Store + vector search + summarization LLM callsExpensive at scale; retrieval noise; complex to evaluate
Start with the simplest level that solves the problem

If users interact with your agent once and leave, long-term memory has no beneficiary. If users return but each conversation is self-contained (e.g., 'help me debug this function'), short-term memory with token trimming is all you need. Add long-term memory only when you can measure that users are hurt by the agent not remembering them across sessions.