
Long-Running Agents

When to build an agent that runs for hours instead of seconds — which orchestration framework to choose, how to compute real costs, the five ways long-running agents fail in production, and a reference implementation with checkpointing, error classification, idempotency, and budget enforcement.

Quick Reference

→ If the task takes under 5 minutes and is safe to retry fully, skip durable execution and just retry the whole request
→ Long-running agents fail in five specific ways: retry storms (429s), context window exhaustion, state serialization failures, idempotency violations, and session drift
→ LangGraph checkpoints between nodes; Temporal checkpoints within workflow steps; Managed Agents (public beta) handles both. Choose based on how much mid-step durability you need
→ Cost example: a 2-hour research agent (200 LLM calls, 2K input / 800 output tokens each) costs ~$3.60 in tokens at Sonnet 4.6 pricing and ~$6.00 at Opus 4.7 pricing
→ Classify errors before retrying: transient (5xx) → exponential backoff with jitter; rate-limited (429) → honor Retry-After exactly; fatal (400/401/403) → bubble up immediately
→ Wrap every side effect (email, DB write, webhook) in an idempotency key: resumption replays everything after the last checkpoint
→ Output quality degrades non-linearly once context utilization passes ~60%, so long sessions need compaction or summarization to maintain quality across session boundaries
→ Always return partial results when a budget fires: 70% of a research task is useful; a complete failure is not
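The error-classification rule above can be sketched as a small policy function. This is a minimal sketch, not the article's reference implementation: `classify` and `retry_delay` are hypothetical names, and it assumes headers arrive as a mapping with lowercase keys, handles only the numeric-seconds form of `Retry-After` (not the HTTP-date form), treats every other non-5xx, non-429 status as fatal, and falls back to jittered backoff when a 429 carries no `Retry-After` header.

```python
import random
from enum import Enum
from typing import Mapping, Optional


class ErrorClass(Enum):
    TRANSIENT = "transient"        # 5xx: exponential backoff with jitter
    RATE_LIMITED = "rate_limited"  # 429: wait exactly as long as Retry-After says
    FATAL = "fatal"                # 400/401/403 (and, by assumption, other 4xx): no retry


def classify(status: int) -> ErrorClass:
    """Map an HTTP status code to a retry class."""
    if status == 429:
        return ErrorClass.RATE_LIMITED
    if 500 <= status < 600:
        return ErrorClass.TRANSIENT
    return ErrorClass.FATAL


def retry_delay(
    status: int,
    attempt: int,
    headers: Optional[Mapping[str, str]] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to bubble the error up."""
    kind = classify(status)
    if kind is ErrorClass.FATAL:
        return None
    if kind is ErrorClass.RATE_LIMITED:
        retry_after = (headers or {}).get("retry-after")  # assumes lowercase keys
        if retry_after is not None:
            return float(retry_after)  # honor the server's hint exactly
        # No Retry-After header: fall back to jittered backoff (assumption)
    # Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Returning `None` for fatal errors keeps the retry loop honest: the caller raises immediately instead of burning attempts on a request that can never succeed.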

When to Build a Long-Running Agent

A typical agent handles a request in 5–30 seconds: receive input, call an LLM a few times, return a result. Long-running agents — data migration pipelines, research synthesis, code generation workflows — run for minutes, hours, or days. The decision to go long-running is a significant architectural commitment. Over-engineer a 2-minute task with Temporal and you've added weeks of infrastructure. Under-engineer a 2-hour task with in-memory state and you'll lose progress on the first deploy.

Rule of thumb: if a task finishes in under 5 minutes and is safe to retry, it needs no durable execution. Past 5 minutes, choose an orchestration tier based on side-effect risk and how well the task tolerates mid-step failures.

Dimension	Short-Lived (< 5 min)	Long-Running (> 5 min)
Process lifecycle	One HTTP request	Must survive deploys, restarts, crashes
State	In-memory, lost on completion	Persisted externally, resumable
Failure recovery	Retry the entire request	Resume from last checkpoint
Side effects	Usually safe to retry	Replay risk: duplicate emails, duplicate writes
User experience	Loading spinner → result	Progress updates, partial results
Rate limits	Rare for single requests	Guaranteed over hours of API calls
Observability	Single trace, seconds long	Distributed trace spanning hours
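The side-effects row is where replay bites: resuming from a checkpoint re-executes everything after it. A minimal sketch of idempotency-key wrapping, using a hypothetical `IdempotentEffects` helper with an in-memory store (a real system would persist completed keys durably, alongside its checkpoints):

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentEffects:
    """Runs each side effect at most once per idempotency key (hypothetical helper).

    Completed keys live in an in-memory dict here; in production they would need
    durable storage that survives the same crashes and deploys as your checkpoints.
    """

    def __init__(self) -> None:
        self._done: Dict[str, Any] = {}

    @staticmethod
    def key(run_id: str, step: str, payload: Dict[str, Any]) -> str:
        # Deterministic: same run, step, and payload give the same key on replay
        raw = json.dumps({"run": run_id, "step": step, "payload": payload}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def run(self, key: str, effect: Callable[[], Any]) -> Any:
        if key in self._done:
            return self._done[key]  # replayed step: return recorded result, skip the effect
        result = effect()
        self._done[key] = result  # record only after the effect succeeds
        return result
```

A crash between `effect()` and recording the key can still duplicate the effect, so where a downstream service accepts an idempotency key natively, passing the same key through closes that gap.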

When NOT to go long-running

If your task runs in under 5 minutes and has no side effects that break on retry, don't use durable execution. The overhead of Temporal or LangGraph + PostgresSaver adds latency, operational complexity, and storage costs. The simpler pattern — retry the whole request with exponential backoff — works for most tasks. Reach for durable execution only when task duration, irreversible side effects, or crash-recovery requirements make full-retry unacceptable.
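A minimal sketch of that simpler full-retry pattern, with hypothetical names and defaults: the whole task reruns from scratch on each attempt, with exponential backoff and full jitter between attempts. Which exceptions count as retryable is left to the caller via `retryable`; the `sleep` parameter exists only so tests can skip real waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_whole_request(
    task: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Rerun the whole task from scratch on failure: no checkpoints, no saved state."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return task()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait uniform in [0, min(cap, base * 2**attempt)] seconds
            sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Everything the task did before failing is discarded, which is exactly why this pattern only fits short tasks whose side effects are harmless to repeat.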

What Long-Running Agents Actually Cost

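Before writing a line of code, compute your cost ceiling. A long-running agent's token bill is predictable if you do the math. Example: a 2-hour research agent that makes 200 LLM calls, averaging 2,000 input tokens and 800 output tokens per call, consumes 200 × 2,000 = 400,000 input tokens and 200 × 800 = 160,000 output tokens. At Sonnet 4.6 pricing that comes to roughly $3.60; at Opus 4.7 pricing, roughly $6.00.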
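The same arithmetic, plus a simple per-call budget guard that keeps partial results, can be sketched as follows. The per-million-token prices are assumptions chosen because they reproduce the ~$3.60 and ~$6.00 totals in the Quick Reference ($3/$15 input/output for Sonnet 4.6, $5/$25 for Opus 4.7); check current published pricing. The model keys, `run_cost`, and `Budget` are hypothetical, not an SDK API.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed USD prices per million tokens: (input, output). Verify before relying on them.
PRICES = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def run_cost(model: str, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Token cost in USD for `calls` calls with the given per-call token counts."""
    price_in, price_out = PRICES[model]
    return calls * (in_tokens * price_in + out_tokens * price_out) / 1_000_000


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    """Tracks spend per call and stops the run once a USD ceiling is reached."""
    model: str
    ceiling_usd: float
    spent_usd: float = 0.0
    results: List[str] = field(default_factory=list)

    def record(self, in_tokens: int, out_tokens: int, result: str) -> None:
        self.spent_usd += run_cost(self.model, 1, in_tokens, out_tokens)
        self.results.append(result)  # keep the work already paid for, even if we stop now
        if self.spent_usd >= self.ceiling_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.ceiling_usd:.2f}")
```

Because the check runs after each call is billed, spend can overshoot the ceiling by at most one call's cost. For example, each call in the Sonnet example costs $0.018, so a $1.00 ceiling stops the run after 56 of 200 calls, with all 56 results still in `budget.results` for the caller to return.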

Choosing an Orchestration Framework

Five options exist for running long-lived agent workloads. They differ significantly on checkpoint granularity, mid-step crash recovery, setup cost, and operational overhead.
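Checkpoint granularity is the main axis of difference. As a framework-neutral illustration (not how LangGraph, Temporal, or Managed Agents work internally), this sketch checkpoints between steps to a local JSON file: steps completed before a crash are skipped on resume, and a crash mid-step loses only that step's work. The file layout, step names, and helper functions are hypothetical. Using JSON also forces state to be serializable, so a serialization failure raises at checkpoint time instead of at resume.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    # Write to a temp file, then atomically rename, so a crash never leaves a torn checkpoint
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)  # raises TypeError if state is not JSON-serializable
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)
        raise


def load_checkpoint(path: str) -> Dict[str, Any]:
    if not os.path.exists(path):
        return {"completed": [], "data": {}}
    with open(path) as f:
        return json.load(f)


def run_with_checkpoints(path: str, steps: List[Step]) -> Dict[str, Any]:
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished before the crash or redeploy: skip on resume
        state["data"] = fn(state["data"])  # a crash here loses only this step's work
        state["completed"].append(name)
        save_checkpoint(path, state)  # checkpoint between steps
    return state["data"]
```

Any step with external side effects still re-runs if it crashes mid-step, which is why checkpointing and idempotency keys go together.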
