Migration & Graph Versioning

Most teams don't need graph versioning — this article starts there. For those who do: how to version state schemas, write safe migration functions with error handling and testing, ship blue-green agent deployments, monitor migrations in production, and recover when they go wrong.

Quick Reference

→Most agents don't need graph versioning — short-lived threads (minutes to hours) can just redeploy with no migration
→Graph versioning is only necessary when threads span multiple deployments and carry state that will break under schema changes
→The best migration is the one you never write: use Optional fields with defaults to avoid breaking changes entirely
→Lazy migration transforms checkpoints on read — not in bulk — avoiding downtime and spreading cost over time
→Always include a schema_version field; without it you cannot determine which migrations to apply
→Test migrations in CI against real production checkpoint snapshots — synthetic data misses the edge cases that crash production
→Monitor migration success/failure rate on every read — a migrating checkpointer that silently corrupts state is worse than no migration
→Never delete an old graph version while active threads still reference it — check active thread count first, every time
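
As a rough illustration of the lazy-migration and schema_version points above, here is a minimal sketch in plain Python. It assumes checkpoint state is a dict; the registry, the decorator, the field names (`query`, `question`, `priority`), and the version numbers are all hypothetical and not taken from any particular framework.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Hypothetical registry: MIGRATIONS[n] upgrades a state from version n to n + 1.
MIGRATIONS: Dict[int, Callable[[State], State]] = {}
CURRENT_VERSION = 3  # assumed version of the schema the running code expects


def migration(from_version: int):
    """Register a function that upgrades state from `from_version` to the next version."""
    def register(fn: Callable[[State], State]) -> Callable[[State], State]:
        MIGRATIONS[from_version] = fn
        return fn
    return register


@migration(1)
def add_priority(state: State) -> State:
    # v1 -> v2: additive field with a default value.
    return {**state, "priority": "normal"}


@migration(2)
def rename_query(state: State) -> State:
    # v2 -> v3: rename `query` to `question`.
    new_state = {k: v for k, v in state.items() if k != "query"}
    new_state["question"] = state.get("query", "")
    return new_state


def migrate_on_read(state: State) -> State:
    """Upgrade a single checkpoint state to CURRENT_VERSION, one step at a time."""
    if "schema_version" not in state:
        # Without a version there is no way to know which migrations to apply.
        raise ValueError("checkpoint has no schema_version")
    version = state["schema_version"]
    if version > CURRENT_VERSION:
        raise ValueError(f"checkpoint version {version} is newer than the code")
    migrated = dict(state)  # work on a copy; leave the stored checkpoint untouched
    while version < CURRENT_VERSION:
        migrated = MIGRATIONS[version](migrated)
        version += 1
        migrated["schema_version"] = version
    return migrated
```

Calling `migrate_on_read` wherever a checkpoint is loaded means only threads that actually resume pay the migration cost, and a checkpoint already at the current version passes through unchanged.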

When You Need Graph Versioning (and When You Don't)

Most teams don't need this

If your agent threads are short-lived — minutes to a few hours — you don't need graph versioning. Deploy a new version and active threads finish on the old container before it spins down. Long-lived threads are where versioning earns its complexity.

The decision hinges on thread lifetime relative to your deploy frequency. A customer support bot handles turns that complete in seconds, so you can deploy whenever you want. A research agent that runs for hours across user sessions needs versioning from day one. Before building any migration infrastructure, place your agent in one of the four categories below.

Is my change breaking? → choose the path with least migration work

Deployment Pattern	Thread Lifetime	Versioning Strategy	Migration Needed?
Stateless API wrapper	Single request	Just redeploy	Never
Chat assistant	Minutes to hours	Blue-green with short drain window	Rarely — additive changes only
Task/research agent	Hours to days	Blue-green + lazy migration for schema changes	Sometimes
Long-running workflow	Days to weeks	Full versioning pipeline required	Yes — always plan for it

Start with the simplest strategy that fits your thread lifetime

You can always upgrade from 'just redeploy' to 'lazy migration' as your threads grow longer. You cannot easily undo a corrupted checkpoint. Match your versioning infrastructure to your actual thread lifetime, not to a theoretical worst case.
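
One way to make the table above concrete is a small helper that maps an observed thread lifetime onto a row. This is only a sketch: the function name and the numeric cut-offs are assumptions chosen for illustration, since the table gives ranges ("minutes to hours", "hours to days") rather than exact boundaries.

```python
from datetime import timedelta


def versioning_strategy(observed_thread_lifetime: timedelta) -> str:
    """Pick the simplest strategy from the table for a given thread lifetime.

    The thresholds below are illustrative assumptions, not values from the table.
    """
    if observed_thread_lifetime <= timedelta(minutes=1):   # roughly "single request"
        return "just redeploy"
    if observed_thread_lifetime <= timedelta(hours=3):     # roughly "minutes to hours"
        return "blue-green with short drain window"
    if observed_thread_lifetime <= timedelta(days=3):      # roughly "hours to days"
        return "blue-green + lazy migration"
    return "full versioning pipeline"                       # "days to weeks"
```

Feeding it a measured lifetime from your own thread data, rather than a guessed maximum, keeps the choice tied to actual behavior.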

The Breaking-Change Taxonomy

Agents have long-lived state that spans deployments

Unlike stateless APIs, agent threads persist across deployments. A user's conversation started on v1 of your graph must continue working after you deploy v2. Breaking state changes crash active threads without warning: the failure only appears when a thread resumes, one user at a time.
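
To see how this kind of failure can surface, consider a minimal sketch in plain Python. The state shape, the field names, and `v2_node` are hypothetical; the point is only that code written for the new schema meets state written under the old one.

```python
# State as a v1 deployment might have checkpointed it (hypothetical fields).
v1_checkpoint = {"messages": ["hello"], "query": "refund status"}


def v2_node(state: dict) -> dict:
    # v2 renamed `query` to `question` and assumes every state already has it.
    return {"answer": f"Looking up: {state['question']}"}


# Resuming the old thread on the new code fails only at this point.
try:
    v2_node(v1_checkpoint)
    resumed = True
    missing_field = None
except KeyError as exc:
    resumed = False
    missing_field = exc.args[0]  # name of the field the old checkpoint lacks
```

New threads created after the deploy would never hit this path, which is why a change like this can pass smoke tests and still break existing users.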

Backward-Compatible Changes First

The best migration is the one you never write

If you can express your change as an additive modification with sensible defaults, do that instead of writing a migration. Try the options in this order: (1) optional fields with defaults, (2) new nodes or edges, (3) a migration function. Reach for (3) only when (1) and (2) are exhausted.
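
Option (1) might look like the following sketch, assuming the state schema is declared as a Python `TypedDict`. The class names, the `priority` field, and the routing labels are hypothetical.

```python
from typing import List, TypedDict


class AgentStateV1(TypedDict):
    messages: List[str]


class AgentStateV2(AgentStateV1, total=False):
    # New in v2. Declared optional so checkpoints written by v1 are still valid.
    priority: str


DEFAULT_PRIORITY = "normal"  # assumed default for threads that predate the field


def route(state: AgentStateV2) -> str:
    # Read with a default instead of indexing, so old checkpoints do not raise.
    priority = state.get("priority", DEFAULT_PRIORITY)
    return "escalate" if priority == "urgent" else "answer"
```

A checkpoint saved under v1, such as `{"messages": ["hi"]}`, flows through `route` unchanged and takes the default branch, so no migration function is needed for this change.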
