Advanced · 20 min

LangMem SDK & Store API

LangMem is an LLM-powered extraction layer that automatically identifies and persists structured facts from conversations. This article covers when to use it (and when not to), all three APIs with correct signatures, cost analysis, memory quality evaluation, failure modes, and GDPR deletion.

Quick Reference

→LangMem uses a full LLM call to extract structured memories — every invocation costs money and adds latency
→Three API tiers: create_memory_manager (stateless), create_memory_store_manager (stateful, recommended), and memory tools (agent-driven)
→Model strings require provider prefix: "anthropic:claude-sonnet-4-6", not "claude-sonnet-4-6"
→The schemas parameter accepts Pydantic models to enforce typed extraction (UserPreference, UserFact, etc.)
→create_memory_store_manager handles search-extract-persist in one call — the recommended default for production
→Memory tools (create_manage_memory_tool + create_search_memory_tool) let the agent decide what to remember and when to search
→Each extraction with Sonnet: ~$0.009 for 10-msg convos, ~$0.044 for 50-msg convos — use Haiku for 15× cheaper extraction
→Always run extraction as a background task — never block the user-facing response path on a memory LLM call
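
The per-extraction figures above turn into a monthly budget with simple arithmetic. The sketch below assumes a 30-day month and one extraction per conversation. The function name and the daily volume are hypothetical; the prices and the 15× ratio are the ones quoted in the list.

```python
# Per-extraction prices quoted in the Quick Reference (Sonnet).
SONNET_COST_10_MSG = 0.009   # USD, ~10-message conversation
SONNET_COST_50_MSG = 0.044   # USD, ~50-message conversation
HAIKU_DISCOUNT = 15          # Haiku is quoted as ~15x cheaper


def monthly_extraction_cost(conversations_per_day: int,
                            cost_per_extraction: float,
                            days: int = 30) -> float:
    """Estimate monthly spend, assuming one extraction per conversation."""
    return round(conversations_per_day * cost_per_extraction * days, 2)


if __name__ == "__main__":
    volume = 1_000  # hypothetical daily conversation count
    sonnet = monthly_extraction_cost(volume, SONNET_COST_10_MSG)
    haiku = monthly_extraction_cost(volume, SONNET_COST_10_MSG / HAIKU_DISCOUNT)
    print(f"Sonnet: ${sonnet:.2f}/month, Haiku: ${haiku:.2f}/month")
```

Extracting on every turn rather than once per conversation multiplies the call count, so treat these numbers as a lower bound in that setup.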

Should You Use LangMem?

LangMem extracts structured memories from conversations, and every extraction is an LLM call. Before adopting it, answer two questions: (1) Is memory extraction complex enough to warrant an LLM, or would a 20-line deterministic parser do the job? (2) Can you afford an extra LLM call on every conversation turn, or at least one at the end of every conversation?

Choose based on how much lifecycle management you want LangMem to own.

Use LangMem when...	Skip LangMem when...
You need to extract nuanced facts requiring language understanding ("I prefer responses that feel collegial but precise")	Your memory needs are 3-5 simple preference fields (language, timezone) — a deterministic parser is cheaper
You want automatic conflict resolution across extraction calls (enable_updates=True)	You need sub-second memory updates; LangMem extraction takes 3-60 seconds
You use LangGraph and want native BaseStore integration	You are framework-agnostic and don't want the LangChain dependency
You need structured memory schemas (Pydantic models) to enforce extraction types	Your daily conversation volume makes per-conversation LLM calls cost-prohibitive at the cheapest model
You want agent-driven memory where the agent decides what to remember (tools API)	You already have a working extraction prompt and just need store.put() calls

LangMem is an LLM call, not a database call

Each create_memory_manager invocation sends your conversation and any existing memories to an LLM and waits for structured output. On longer conversations, reported p95 latency can reach 60 seconds. Budget accordingly and always run extraction off the user-facing path.
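
One way to keep extraction off the response path is to schedule it as a background task and return the reply straight away. The sketch below uses plain asyncio. Here extract_memories is a hypothetical stand-in that sleeps instead of calling LangMem, and handle_turn is an illustrative name, not part of any SDK.

```python
import asyncio


async def extract_memories(messages: list[str]) -> list[str]:
    """Hypothetical stand-in for a LangMem extraction call (slow LLM round trip)."""
    await asyncio.sleep(0.2)  # simulate extraction latency
    return [f"memory from: {m}" for m in messages]


async def handle_turn(messages: list[str], background: set[asyncio.Task]) -> str:
    """Reply immediately; schedule memory extraction without awaiting it."""
    task = asyncio.create_task(extract_memories(messages))
    background.add(task)                        # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)  # drop the reference once it finishes
    return "reply to user"


async def main() -> tuple[str, bool, list[list[str]]]:
    background: set[asyncio.Task] = set()
    reply = await handle_turn(["I prefer concise answers"], background)
    pending_at_reply = bool(background)          # extraction is still pending when the reply is ready
    results = await asyncio.gather(*background)  # drain outstanding work before shutdown
    return reply, pending_at_reply, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real service the drain step belongs in the shutdown hook. A task queue is the sturdier choice when extraction has to survive process restarts.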

Real project

A team building a customer support agent initially used LangMem to extract all user facts. After 2 weeks of LangSmith traces, they found 80% of memory writes were simple preference updates (language, timezone) that a deterministic parser handled in <10ms. They kept LangMem only for the remaining 20% — complex facts requiring LLM reasoning — and cut their memory extraction costs by 4×.
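
The split that team arrived at can be sketched as a router: try a cheap deterministic parser first, and hand the message to LLM extraction only when nothing matches. The regular expressions and function names below are hypothetical and deliberately narrow; a production parser needs patterns fitted to its own traffic.

```python
import re

# Hypothetical patterns for the "simple" preference fields.
LANGUAGE_RE = re.compile(
    r"\b(?:reply|respond|answer|speak)\s+in\s+([A-Za-z]+)", re.IGNORECASE
)
TIMEZONE_RE = re.compile(
    r"\b(?:my\s+)?time\s?zone\s+is\s+([A-Za-z_]+/[A-Za-z_]+|UTC[+-]?\d{0,2})",
    re.IGNORECASE,
)


def parse_simple_preferences(message: str) -> dict[str, str]:
    """Extract language/timezone without an LLM call; empty dict if nothing matched."""
    prefs: dict[str, str] = {}
    if (m := LANGUAGE_RE.search(message)):
        prefs["language"] = m.group(1).capitalize()
    if (m := TIMEZONE_RE.search(message)):
        prefs["timezone"] = m.group(1)
    return prefs


def route(message: str) -> tuple[str, dict[str, str]]:
    """Return ("parser", prefs) when the cheap path suffices, else ("llm", {})."""
    prefs = parse_simple_preferences(message)
    return ("parser", prefs) if prefs else ("llm", {})


if __name__ == "__main__":
    print(route("Please reply in german from now on"))
    print(route("I prefer responses that feel collegial but precise"))
```

Messages routed to "llm" would then go through LangMem extraction, ideally as a background task.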


The Three LangMem APIs

LangMem provides three APIs at increasing levels of abstraction. They all use the same underlying LLM-based extraction, but differ in what they manage for you.

Structured Memory with Pydantic Schemas

By default, LangMem extracts unstructured Memory objects (a content string). For production, define Pydantic models to enforce structure on extracted memories. The schemas parameter tells the memory manager what types of memories to extract and how to format them.
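
A minimal sketch of what such schemas might look like, assuming pydantic is installed. The model names come from the Quick Reference; the field names are hypothetical and chosen only for illustration. The LangMem import is deferred inside build_manager so the schemas can be exercised without the SDK or an API key. Check the keyword arguments against the LangMem version you have installed.

```python
from typing import Literal

from pydantic import BaseModel, Field


class UserPreference(BaseModel):
    """A stated preference, e.g. tone or language."""
    category: str = Field(description="What the preference is about, e.g. 'tone'")
    preference: str = Field(description="The preference itself")


class UserFact(BaseModel):
    """A durable fact about the user."""
    fact: str = Field(description="The fact, stated as one sentence")
    confidence: Literal["low", "medium", "high"] = "medium"


def build_manager():
    """Create a stateless manager; needs langmem installed and a provider API key."""
    from langmem import create_memory_manager  # deferred so the schemas load on their own

    return create_memory_manager(
        "anthropic:claude-sonnet-4-6",  # provider-prefixed model string
        schemas=[UserPreference, UserFact],
        instructions="Extract user preferences and durable facts.",
        enable_inserts=True,
        enable_updates=True,
    )
```

Because the schemas are ordinary Pydantic models, an extraction that does not fit the declared types fails validation instead of being stored as free text.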
