Persistence: Never Lose State

LangGraph checkpointers auto-save state after every superstep — enabling resume after crashes, human-in-the-loop pauses, and time travel. This article explains when to use persistence, which backend to choose, the write-amplification trap that breaks production, and how to monitor it before it pages you.

Quick Reference

  • Checkpointers auto-save full state after every superstep — no manual save calls
  • InMemorySaver for dev/tests only — lost on restart; PostgresSaver for production
  • graph.compile(checkpointer=saver) + thread_id in config = full persistence in two lines
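  • Durability modes: exit (fastest, one checkpoint written when the run ends), async (the default, writes overlap the next superstep), sync (most durable, each write completes before the next superstep starts)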
  • PostgresSaver INSERTs full state every superstep — 50 steps = 50 rows, not 1
  • Checkpointers are thread-scoped; use BaseStore for cross-thread / cross-session memory
  • EncryptedSerializer plugs into any checkpointer as its serializer (serde); set LANGGRAPH_AES_KEY to enable AES encryption

When You Don't Need Persistence

Not every graph needs a checkpointer. Persistence adds write latency, storage cost, and operational overhead. Skip it when the graph is stateless (each invocation is independent), idempotent and fast (retrying from scratch is cheaper than checkpointing), or purely synchronous with no possibility of interruption. Attaching a database-backed checkpointer to a simple one-shot classification graph adds a checkpoint write on every superstep, and you pay that cost on every call. Add persistence when you need at least one of the following: resume after a crash, human-in-the-loop interrupts, time-travel debugging, or conversation threads that survive restarts.

Need | Use persistence?
Resume after OOM / deploy restart | Yes
Human review / approval step mid-graph | Yes
Time travel debugging (replay from step N) | Yes
Multi-turn conversation threads across restarts | Yes
Single-shot classification or extraction (no retry needed) | No
Stateless transform pipeline (each call independent) | No
Sub-second latency budget, no interruption possible | No
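
The "retrying from scratch is cheaper than checkpointing" rule can be turned into a rough back-of-envelope check. The sketch below is not part of LangGraph. `GraphProfile` and every number in it are hypothetical, and it assumes the worst case of one blocking checkpoint write per superstep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphProfile:
    """Hypothetical per-invocation profile; all figures are illustrative."""

    supersteps: int              # supersteps per invocation
    run_ms: float                # wall time of one run without checkpointing
    checkpoint_write_ms: float   # assumed blocking cost of one checkpoint write
    failure_rate: float          # fraction of runs that crash part-way through


def checkpoint_overhead_ms(p: GraphProfile) -> float:
    # Worst case: every superstep waits for its checkpoint write.
    return p.supersteps * p.checkpoint_write_ms


def expected_retry_ms(p: GraphProfile) -> float:
    # Simplification: a failed run costs one full rerun, averaged over all calls.
    return p.failure_rate * p.run_ms


def checkpointing_pays_off(p: GraphProfile) -> bool:
    return checkpoint_overhead_ms(p) < expected_retry_ms(p)


# A fast one-shot classifier versus a long-running multi-step agent.
classifier = GraphProfile(supersteps=3, run_ms=400.0, checkpoint_write_ms=5.0, failure_rate=0.01)
agent = GraphProfile(supersteps=50, run_ms=600_000.0, checkpoint_write_ms=5.0, failure_rate=0.02)

print(checkpointing_pays_off(classifier))  # overhead 15 ms vs. ~4 ms expected retry
print(checkpointing_pays_off(agent))       # overhead 250 ms vs. ~12 s expected retry
```

This comparison covers only the crash-recovery argument. A graph that needs human-in-the-loop pauses, time travel, or threads that survive restarts needs a checkpointer whatever the latency numbers say.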