Advanced · 14 min

Time Travel

Time travel gives you a flight recorder for your agent: load any past checkpoint, replay from it, or fork with corrected state. The API is three methods — the hard part is knowing when to use them and how not to cause production incidents in the process.

Quick Reference

  • get_state_history(config, limit=N): iterates checkpoints newest-first; always set limit on long-lived threads
  • get_state_history(config, filter={...}): filter by metadata fields (source node, custom tags)
  • Replay: app.invoke(None, checkpoint_config) re-executes all nodes after the checkpoint — side effects re-fire
  • Fork: fork_config = app.update_state(checkpoint_config, new_values); app.invoke(None, fork_config)
  • as_node on update_state() controls which node the update 'comes from' — determines what runs next
  • Subgraph time travel requires checkpointer=True on the subgraph; use get_state(config, subgraphs=True)
  • Interrupts re-trigger during replay — graph pauses again at interrupt() points awaiting Command(resume=...)
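
The calls above can be sketched as a few small helpers. This is a minimal illustration, not a definitive implementation: it assumes `app` is a compiled graph with a checkpointer attached, and the helper names (`recent_checkpoints`, `find_before_node`, `replay_from`, `fork_from`) are hypothetical. Only `get_state_history`, `update_state`, `invoke`, and the snapshot attributes `config` and `next` come from the API described here.

```python
from typing import Any, List, Optional


def recent_checkpoints(app: Any, thread_id: str, limit: int = 10) -> List[Any]:
    """Return up to `limit` snapshots for a thread, newest first."""
    config = {"configurable": {"thread_id": thread_id}}
    # `limit` bounds how many checkpoints are read on long-lived threads.
    return list(app.get_state_history(config, limit=limit))


def find_before_node(snapshots: List[Any], node_name: str) -> Optional[Any]:
    """Pick the newest snapshot whose pending next step includes `node_name`."""
    for snap in snapshots:
        if node_name in snap.next:
            return snap
    return None


def replay_from(app: Any, snapshot: Any) -> Any:
    """Replay: resume from the checkpoint with no new input.

    Nodes after the checkpoint run again, so their side effects fire again.
    """
    return app.invoke(None, snapshot.config)


def fork_from(
    app: Any,
    snapshot: Any,
    new_values: dict,
    as_node: Optional[str] = None,
) -> Any:
    """Fork: write corrected state as a new checkpoint, then continue from it."""
    if as_node is None:
        fork_config = app.update_state(snapshot.config, new_values)
    else:
        # `as_node` attributes the update to a node, which decides what runs next.
        fork_config = app.update_state(snapshot.config, new_values, as_node=as_node)
    return app.invoke(None, fork_config)
```

A typical debugging session would call `recent_checkpoints(app, "thread-1", limit=20)`, locate the snapshot just before the suspect node with `find_before_node`, and then choose between `replay_from` (same state) and `fork_from` (corrected state). The thread ID and limit here are placeholders.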

When (Not) to Use Time Travel

Need to inspect or modify past agent execution?
  • Yes → which use case? Post-mortem debugging, human-in-the-loop correction (undo), or A/B path testing.
  • No → don't use time travel. For automated retry, use LangGraph's retry; hot-path rollback is too slow.
Time travel is for humans debugging agents, not automated workflows. Cost: every get_state_history() call is a checkpoint DB read.

Figure: time travel, valid use cases vs anti-patterns

Time travel is the right tool in three situations: post-mortem debugging of production failures (load the exact state that caused a bad output and replay it), human-in-the-loop correction (fork from a past checkpoint when a user wants to undo an agent decision), and A/B path testing (run two forks from the same checkpoint with different prompts or tools). Everything else is a misuse.
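
As a rough sketch of the A/B case, the same checkpoint can be forked once per variant and each fork run separately. The helper name `run_ab_forks` and the state keys in the usage note are hypothetical; the code assumes `app` is a compiled graph with a checkpointer and that `checkpoint_config` includes the `checkpoint_id` to fork from.

```python
from typing import Any, Dict


def run_ab_forks(
    app: Any,
    checkpoint_config: dict,
    variants: Dict[str, dict],
) -> Dict[str, Any]:
    """Fork one checkpoint once per variant and run each fork."""
    results: Dict[str, Any] = {}
    for label, new_values in variants.items():
        # Every update_state call starts from the same parent checkpoint,
        # so each fork gets only its own variant's state changes.
        fork_config = app.update_state(checkpoint_config, new_values)
        # Passing None as input resumes from the forked checkpoint.
        results[label] = app.invoke(None, fork_config)
    return results
```

For example, `run_ab_forks(app, snapshot.config, {"a": {"system_prompt": "Be terse."}, "b": {"system_prompt": "Explain step by step."}})` would return one result per label, assuming the graph state has a `system_prompt` key. Nodes downstream of the checkpoint run once per fork, so any side effects they have fire once per variant.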

Time travel is not a retry mechanism

Automated retry loops built on get_state_history() + invoke() are slower than LangGraph's built-in retry_policy and re-execute side effects. Use retry_policy on nodes for transient failures. Use time travel for human-initiated debugging and correction only.

| Scenario | Use time travel? | Better alternative |
|---|---|---|
| Post-mortem debug of agent failure | Yes | - |
| User wants to undo last agent action | Yes (fork) | - |
| A/B test two prompt variants | Yes (fork) | - |
| Retry on transient API error | No | retry_policy on node |
| Roll back bad deployment automatically | No | canary + feature flags |
| Reproduce a test case | Sometimes | snapshot tests if deterministic |