RetryPolicy, Error Taxonomy & CachePolicy
Most LangGraph error handling fails not from missing retry logic, but from misclassifying errors. This article covers how LangGraph handles errors by default, a 4-category taxonomy for routing errors to the right handler, and the three production tools — RetryPolicy, self-healing feedback loops, and CachePolicy — with their failure modes and multi-layer retry pitfalls.
Quick Reference
- RetryPolicy defaults: max_attempts=3, initial_interval=0.5s, backoff_factor=2.0, max_interval=128s, jitter=True
- default_retry_on excludes ValueError, TypeError, RuntimeError, OSError and 8 more, so code bugs are NOT retried by default
- For requests/httpx, default_retry_on retries only 5xx status codes, not 4xx
- When max_attempts is exhausted, the exception bubbles up and the graph stops; there is no built-in fallback hook
- Error taxonomy: transient → RetryPolicy, LLM-recoverable → feedback loop, user-fixable → interrupt(), unexpected → bubble up
- CachePolicy requires both cache_policy on the node AND cache=... at compile(); missing either produces zero cache hits with no error
- LLM SDK retries (max_retries) and LangGraph RetryPolicy multiply: worst case = (sdk_max_retries + 1) × lg_max_attempts total API calls, because the SDK's initial request counts alongside its retries
- runtime.execution_info.node_attempt gives the current 1-indexed attempt number for in-node fallback logic
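To make the retry arithmetic in the list above concrete, here is a minimal sketch in plain Python. It assumes the usual exponential form (each wait is `initial_interval * backoff_factor ** i`, capped at `max_interval`) and leaves jitter out. It also assumes the SDK's `max_retries` counts retries only, so every node attempt can issue one initial request plus that many retries. `backoff_schedule` and `worst_case_api_calls` are hypothetical helpers written for illustration, not LangGraph functions.

```python
def backoff_schedule(
    max_attempts: int = 3,
    initial_interval: float = 0.5,
    backoff_factor: float = 2.0,
    max_interval: float = 128.0,
) -> list[float]:
    """Waits in seconds between attempts, before any jitter is added."""
    # N attempts leave N - 1 gaps to sleep through.
    return [
        min(max_interval, initial_interval * backoff_factor ** i)
        for i in range(max_attempts - 1)
    ]


def worst_case_api_calls(sdk_max_retries: int, lg_max_attempts: int) -> int:
    """Upper bound on HTTP calls when SDK retries nest inside node retries."""
    # Assumption: max_retries counts retries only, so the SDK can send
    # 1 + sdk_max_retries requests during each node attempt.
    return (1 + sdk_max_retries) * lg_max_attempts


print(backoff_schedule())                 # waits for the default policy
print(backoff_schedule(max_attempts=10))  # shows the max_interval cap
print(worst_case_api_calls(sdk_max_retries=2, lg_max_attempts=3))
```

Under these assumptions the default policy sleeps only twice, so most of the worst-case cost comes from nested SDK retries rather than from LangGraph's own backoff.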
When Default Error Handling Is Enough
Before adding retry logic, ask whether you need it. Graphs that only call an LLM through a managed provider's SDK (Claude, GPT-4) already get HTTP-level retries from that SDK. Graphs with no external API calls will mostly surface programming errors, which should not be retried. Custom error handling adds complexity; add it only when you can name the specific failure mode it addresses.
Add RetryPolicy when your node calls an external API that transiently fails (rate limits, network timeouts, 503s). Add a feedback loop when the LLM produces structurally invalid output. Add interrupt() when a human needs to provide missing information. Leave everything else to bubble up — unexpected errors are bugs, not runtime conditions.
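The routing rule above can be sketched as a small classifier. Only the four categories and their handlers come from the taxonomy described here. The `ErrorCategory` enum, the `classify` helper, and the exception classes `OutputValidationError` and `MissingUserInputError` are hypothetical names introduced for illustration, and the standard-library `ConnectionError` and `TimeoutError` stand in for whatever transient errors your client library raises.

```python
from enum import Enum


class ErrorCategory(Enum):
    """Each category's value names the handler the taxonomy routes it to."""
    TRANSIENT = "RetryPolicy"
    LLM_RECOVERABLE = "feedback loop"
    USER_FIXABLE = "interrupt()"
    UNEXPECTED = "bubble up"


class OutputValidationError(Exception):
    """Hypothetical: the LLM produced structurally invalid output."""


class MissingUserInputError(Exception):
    """Hypothetical: a human needs to provide missing information."""


def classify(exc: Exception) -> ErrorCategory:
    # Check the application-defined errors first, then transient ones.
    if isinstance(exc, OutputValidationError):
        return ErrorCategory.LLM_RECOVERABLE
    if isinstance(exc, MissingUserInputError):
        return ErrorCategory.USER_FIXABLE
    # Stand-ins for rate limits, network timeouts, and 503s.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return ErrorCategory.TRANSIENT
    # Anything unrecognized is treated as a bug and left to bubble up.
    return ErrorCategory.UNEXPECTED


for err in (TimeoutError(), OutputValidationError(), MissingUserInputError(), KeyError("x")):
    print(type(err).__name__, "->", classify(err).value)
```

Defaulting to `UNEXPECTED` keeps the classifier conservative: an error has to be recognized explicitly before it earns a retry, a feedback loop, or an interrupt.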