Durable Execution
Automatic persistence of execution state, crash recovery, and resumability across server restarts. How LangGraph avoids re-executing completed work when a run resumes.
Quick Reference
- Durable execution: state is checkpointed at every super-step, so the graph can resume after a crash
- Exactly-once semantics for completed work: nodes that finished before the crash are not re-executed on recovery; only the interrupted node reruns, so side effects inside a node should be idempotent
- Crash recovery: restart the server, call invoke() with None as the input and the same thread_id, and execution continues from the last checkpoint
- Resumability: works across server restarts, deployments, and even infrastructure migrations, as long as the checkpoint store itself survives
- Requires a persistent checkpointer (PostgresSaver or SqliteSaver); MemorySaver keeps checkpoints in process memory, so they do not survive a restart
What is Durable Execution?
Durable execution means the graph's state is automatically persisted at every super-step, so if the server crashes, execution resumes from the last checkpoint rather than from the beginning. No completed work is lost, and no manual recovery logic is needed. The graph picks up where it left off.
Traditional web services lose all in-flight state when they crash. If an agent is 8 steps into a 12-step workflow and the server restarts, those 8 steps are gone and the user has to start over. Durable execution eliminates this by checkpointing after every super-step. The checkpointer writes state to a persistent backend (Postgres, SQLite) so it survives process death, deploys, and infrastructure changes. This only works with a persistent checkpointer: MemorySaver stores state in the process's memory, which is wiped on a crash, so PostgresSaver or SqliteSaver is required for true durability.
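To make the mechanism concrete, here is a standard-library sketch of checkpoint-after-every-step with resume. It is not LangGraph's implementation: the run function, the checkpoints table layout, and the crash_at parameter are hypothetical names chosen for illustration, and each "step" stands in for a super-step.

```python
import json
import os
import sqlite3
import tempfile


def run(conn, thread_id, steps, crash_at=None):
    """Run steps in order, committing a checkpoint after each one.

    If a checkpoint already exists for thread_id, resume after the
    last step that was saved instead of starting over.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(thread_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    next_step, state = (row[0], json.loads(row[1])) if row else (0, {})

    for i in range(next_step, len(steps)):
        if crash_at == i:
            raise RuntimeError(f"simulated crash before step {i}")
        state = steps[i](state)
        # Persist progress before moving on, so a crash loses at most one step.
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, i + 1, json.dumps(state)),
        )
        conn.commit()
    return state


executed = []  # records every step body that actually ran


def make_step(i):
    def step(state):
        executed.append(i)
        return {"done": state.get("done", []) + [i]}
    return step


steps = [make_step(i) for i in range(12)]
db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

# First "process": crashes 8 steps into the 12-step workflow.
conn = sqlite3.connect(db_path)
try:
    run(conn, "thread-1", steps, crash_at=8)
except RuntimeError:
    pass
conn.close()

# Second "process": reopens the same file and resumes from step 8.
conn = sqlite3.connect(db_path)
final = run(conn, "thread-1", steps)
conn.close()
```

Because the checkpoint lives in a file rather than in process memory, the second connection sees the progress of the first, and each of the 12 steps runs exactly once across both runs. Swapping the file for an in-memory store would reproduce the MemorySaver limitation: the resume would find nothing and start from step 0.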