Long-Running Agents
Architecting agents that run for hours or days: durable execution, checkpointing strategies, progress reporting, and timeout/budget management to prevent runaway costs.
Quick Reference
- Long-running agents (minutes to days) need a fundamentally different architecture than request-response agents (seconds)
- Durable execution: use frameworks like Temporal, Inngest, or LangGraph's persistent checkpointing to survive crashes
- Checkpoint after every significant step: the cost of redoing work after a crash far exceeds the cost of saving state
- Progress webhooks: push status updates to your frontend so users know the agent is still working
- Budget management: set hard limits on cost ($), time (hours), and actions (iterations), and enforce them in the agent loop
- Always design a graceful shutdown: save current state, report partial results, and allow manual resume
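The checkpoint, graceful-shutdown, and resume points above can be sketched in a few lines. This is a minimal illustration using a local JSON file rather than a durable-execution framework. The names `save_checkpoint`, `load_checkpoint`, `run_agent`, and the `should_stop` callback are assumptions made for this example, not part of any library.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # The rename is atomic when both files are on the same filesystem,
    # so a crash never leaves a half-written checkpoint behind.
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": [], "status": "running"}


def run_agent(path: Path, steps: list, should_stop=lambda: False) -> dict:
    """Run steps in order, checkpointing after each one.

    `steps` is a list of zero-argument callables returning JSON-serializable
    values (a hypothetical stand-in for real agent actions).
    """
    state = load_checkpoint(path)
    state["status"] = "running"
    while state["next_step"] < len(steps):
        if should_stop():
            # Graceful shutdown: persist partial results for a later resume.
            state["status"] = "paused"
            save_checkpoint(path, state)
            return state
        state["results"].append(steps[state["next_step"]]())
        state["next_step"] += 1
        save_checkpoint(path, state)
    state["status"] = "done"
    save_checkpoint(path, state)
    return state
```

Calling `run_agent` again with the same checkpoint path skips the steps that already completed. In production the file would typically be replaced by a database row or by the persistence layer of a framework such as Temporal or LangGraph.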
Why Long-Running Agents Are Different
A typical agent handles a request in 5-30 seconds: it receives input, calls the LLM a few times, and returns a result. Long-running agents (data migration, research synthesis, code generation pipelines) run for minutes, hours, or even days. Assumptions that hold for short-lived agents break at this timescale: server restarts kill your process, sustained API traffic hits LLM rate limits and requires backoff, and costs can spiral without budgets.
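The backoff mentioned above is commonly implemented as exponential backoff with jitter. The sketch below assumes a hypothetical `RateLimitError` exception standing in for whatever error your LLM client raises on a rate-limit response. The function name and default values are illustrative choices, not recommendations from a specific provider.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), which spreads
    out retries from many concurrent workers.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be swapped out in tests. If the provider returns a retry-after hint, honoring it is usually preferable to a computed delay.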
| Challenge | Short-Lived Agent | Long-Running Agent |
|---|---|---|
| Process lifecycle | Lives within one HTTP request | Must survive deploys, restarts, crashes |
| State | In-memory, lost on completion | Must be persisted and resumable |
| Failure recovery | Retry the whole request | Resume from last checkpoint |
| Cost | Predictable ($0.01-0.50 per request) | Unbounded without budget limits |
| User experience | Loading spinner → result | Progress bar, status updates, partial results |
| Rate limits | Rarely hit by single requests | Almost certain to be hit over hours of API calls |
| Observability | Single trace | Distributed trace spanning hours |
A long-running agent without a budget can burn through $100+ in API calls before anyone notices. Always set a hard dollar limit that kills the agent when exceeded. Better to stop early than explain an unexpected bill.
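One way to enforce such a limit is to check the budget at the top of every loop iteration. The sketch below is illustrative: the `Budget` class, `BudgetExceeded` exception, and the convention that each step returns `(output, cost_usd, done)` are assumptions made for this example. Real cost figures would come from your provider's token usage data.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when any hard limit (cost, time, iterations) is reached."""


@dataclass
class Budget:
    max_cost_usd: float
    max_seconds: float
    max_iterations: int
    spent_usd: float = 0.0
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        """Record the cost of one completed step."""
        self.spent_usd += cost_usd
        self.iterations += 1

    def check(self) -> None:
        """Raise BudgetExceeded if any limit has been reached."""
        if self.spent_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost limit ${self.max_cost_usd:.2f} reached")
        if time.monotonic() - self.started_at >= self.max_seconds:
            raise BudgetExceeded(f"time limit {self.max_seconds}s reached")
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded(f"iteration limit {self.max_iterations} reached")


def run_with_budget(step, budget: Budget):
    """Run step() until it reports done or the budget is exhausted.

    `step` is a hypothetical callable returning (output, cost_usd, done).
    Partial results are returned in both cases.
    """
    results = []
    try:
        while True:
            budget.check()  # enforce limits before spending more
            output, cost_usd, done = step()
            budget.charge(cost_usd)
            results.append(output)
            if done:
                return results, "done"
    except BudgetExceeded as exc:
        return results, f"stopped: {exc}"
```

Because the check runs before each step, total spend can exceed the limit by at most one step's cost. If that margin matters, set the limit slightly below the true ceiling or estimate the cost of a step before running it.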