Migration & Graph Versioning
Most teams don't need graph versioning — this article starts there. For those who do: how to version state schemas, write safe migration functions with error handling and testing, ship blue-green agent deployments, monitor migrations in production, and recover when they go wrong.
Quick Reference
- Most agents don't need graph versioning: short-lived threads (minutes to hours) can just redeploy with no migration
- Graph versioning is only necessary when threads span multiple deployments and carry state that will break under schema changes
- The best migration is the one you never write: use Optional fields with defaults to avoid breaking changes entirely
- Lazy migration transforms checkpoints on read, not in bulk, avoiding downtime and spreading cost over time
- Always include a schema_version field; without it you cannot determine which migrations to apply
- Test migrations in CI against real production checkpoint snapshots; synthetic data misses the edge cases that crash production
- Monitor migration success/failure rate on every read; a migrating checkpointer that silently corrupts state is worse than no migration
- Never delete an old graph version while active threads still reference it; check active thread count first, every time
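To make the schema_version and lazy-migration points concrete, here is a minimal sketch of migrate-on-read. It assumes checkpoint state is a plain dict; the names `CURRENT_VERSION`, `MIGRATIONS`, `migrate_on_read`, and `stats`, as well as the two example schema changes, are hypothetical and not tied to any particular checkpointer API.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

CURRENT_VERSION = 3  # hypothetical: the schema version the deployed graph expects


def _v1_to_v2(state: State) -> State:
    # Hypothetical breaking change: the field "msgs" was renamed to "messages".
    new = dict(state)
    new["messages"] = new.pop("msgs", [])
    return new


def _v2_to_v3(state: State) -> State:
    # Hypothetical additive change: a new field that gets a default value.
    new = dict(state)
    new.setdefault("retry_count", 0)
    return new


# Each entry upgrades a state from version N to version N + 1.
MIGRATIONS: Dict[int, Callable[[State], State]] = {1: _v1_to_v2, 2: _v2_to_v3}

# Simple counters standing in for real metrics (success/failure per read).
stats = {"migrated": 0, "failed": 0}


def migrate_on_read(state: State) -> State:
    """Upgrade one checkpoint to CURRENT_VERSION at read time."""
    version = state.get("schema_version")
    if version is None:
        stats["failed"] += 1
        raise ValueError("checkpoint has no schema_version; cannot pick migrations")
    if version > CURRENT_VERSION:
        stats["failed"] += 1
        raise ValueError(f"checkpoint version {version} is newer than this code")
    if version == CURRENT_VERSION:
        return state  # already current, nothing to do

    migrated = dict(state)  # work on a copy so the stored checkpoint is untouched
    try:
        while version < CURRENT_VERSION:
            migrated = MIGRATIONS[version](migrated)
            version += 1
            migrated["schema_version"] = version
    except Exception:
        stats["failed"] += 1
        raise
    stats["migrated"] += 1
    return migrated
```

Because the function only runs when a thread is actually loaded, idle threads cost nothing until they resume, and the `stats` counters give a per-read signal that can be wired into whatever monitoring is already in place.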
When You Need Graph Versioning (and When You Don't)
If your agent threads are short-lived — minutes to a few hours — you don't need graph versioning. Deploy a new version and active threads finish on the old container before it spins down. Long-lived threads are where versioning earns its complexity.
The decision hinges on thread lifetime relative to your deploy frequency. A customer support bot handles turns that complete in seconds, so you can deploy whenever you want. A research agent that runs for hours across user sessions needs versioning from day one. Before building any migration infrastructure, place your agent in one of the four deployment patterns below.
Is my change breaking? → choose the path with least migration work
| Deployment Pattern | Thread Lifetime | Versioning Strategy | Migration Needed? |
|---|---|---|---|
| Stateless API wrapper | Single request | Just redeploy | Never |
| Chat assistant | Minutes to hours | Blue-green with short drain window | Rarely — additive changes only |
| Task/research agent | Hours to days | Blue-green + lazy migration for schema changes | Sometimes |
| Long-running workflow | Days to weeks | Full versioning pipeline required | Yes — always plan for it |
You can always upgrade from 'just redeploy' to 'lazy migration' as your threads grow longer. You cannot easily undo a corrupted checkpoint. Match your versioning infrastructure to your actual thread lifetime, not to a theoretical worst case.
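As a rough illustration of matching strategy to thread lifetime, the table can be encoded as a lookup on the longest thread lifetime you actually observe. The numeric cutoffs below (1 minute, 4 hours, 2 days) are assumptions chosen only to separate the table's rows; the function name `pick_strategy` is hypothetical. Tune the cutoffs to your own deploy cadence.

```python
from datetime import timedelta

# Assumed cutoffs separating the table's rows; these are illustrative,
# not values stated in the table itself.
_CUTOFFS = [
    (timedelta(minutes=1), "just redeploy"),
    (timedelta(hours=4), "blue-green with short drain window"),
    (timedelta(days=2), "blue-green + lazy migration for schema changes"),
]
_FALLBACK = "full versioning pipeline"


def pick_strategy(max_thread_lifetime: timedelta) -> str:
    """Return the lightest strategy whose cutoff covers the given lifetime."""
    for cutoff, strategy in _CUTOFFS:
        if max_thread_lifetime <= cutoff:
            return strategy
    return _FALLBACK
```

Feeding it a measured maximum lifetime, rather than a theoretical worst case, keeps the choice tied to how threads really behave in production.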