Advanced · 10 min
Graceful Degradation
Building resilient agents with fallback chains, circuit breakers, stale cache serving, and feature degradation under load — so your agent always returns something useful.
Quick Reference
- Fallback chains: primary model → cheaper model → cached response → static fallback → error message
- Circuit breakers: after N consecutive failures, stop calling the failing service and route to fallback automatically
- Stale serving: when live AI is unavailable, serve cached results with a 'this may be outdated' disclaimer
- Feature degradation: under heavy load, disable expensive features (tools, RAG) and serve simpler responses
- The goal: never show users an error page. Always return something useful, even if it's not the best possible answer
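The fallback chain in the list above can be sketched as an ordered list of handlers that are tried in turn. This is a minimal illustration, not a production implementation: `answer_with_fallbacks`, `primary_model`, `cheaper_model`, `cached_response`, and the in-memory `_cache` are all hypothetical stand-ins for real model clients and a real cache.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each tier is a (label, handler) pair. A handler returns an answer string,
# returns None when it has nothing to offer (e.g. a cache miss), or raises.
Tier = Tuple[str, Callable[[str], Optional[str]]]

STATIC_FALLBACK = "Sorry, I can't answer right now. Please try again shortly."


def answer_with_fallbacks(query: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order; return (label, answer) from the first that succeeds."""
    for label, handler in tiers:
        try:
            result = handler(query)
        except Exception:
            # A real system would log the failure here; this sketch just moves on.
            continue
        if result is not None:
            return label, result
    # Every tier failed or had nothing: return the static message, not an error.
    return "static", STATIC_FALLBACK


# Hypothetical stand-ins that simulate an outage of both models.
def primary_model(query: str) -> str:
    raise TimeoutError("primary model unavailable")


def cheaper_model(query: str) -> str:
    raise ConnectionError("backup model rate limited")


_cache = {"what is a circuit breaker?": "A pattern that stops calls to a failing service."}


def cached_response(query: str) -> Optional[str]:
    hit = _cache.get(query.lower())
    # Stale serving: attach the disclaimer to anything that comes from the cache.
    return None if hit is None else f"{hit} (cached; this may be outdated)"


TIERS = [("primary", primary_model), ("cheaper", cheaper_model), ("cache", cached_response)]

if __name__ == "__main__":
    print(answer_with_fallbacks("What is a circuit breaker?", TIERS))
```

Returning the tier label alongside the answer lets the caller decide whether to show a disclaimer or record how often degraded tiers are being used.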
Why Graceful Degradation Matters
AI services fail regularly: LLM APIs go down, rate limits kick in, vector databases slow down under load, and tool APIs return errors. The question is not whether your agent will face failures but whether it handles them gracefully or crashes. A degraded but honest response ('Here's a cached answer; it may be slightly outdated') is far more useful to the user than an error page.
| Failure Scenario | Bad Outcome | Graceful Degradation |
|---|---|---|
| LLM API down | 500 error to user | Serve cached response for similar query |
| Rate limited (429) | Request fails, user retries | Switch to backup model, queue if necessary |
| Vector DB slow | 30s+ response time | Skip RAG, answer from model knowledge with disclaimer |
| Tool API timeout | Agent hangs forever | Return partial result, note which tools failed |
| Traffic spike (10x normal) | All requests slow or fail | Disable expensive features, serve simpler responses |
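The routing in the first two rows of the table (a failing LLM API or backup model switch) is often handled by a circuit breaker. The sketch below is one possible shape for it, under stated assumptions: the `CircuitBreaker` class, its `failure_threshold` and `reset_after` parameters, and the injectable `clock` are illustrative names, and the cool-down period before retrying the primary is an addition not described above.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Route to a fallback after N consecutive failures of the primary call."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,  # assumed cool-down in seconds before a retry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        """True while the primary should be skipped."""
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed, let one trial call through.
        return self._clock() - self._opened_at < self.reset_after

    def call(
        self,
        primary: Callable[..., Any],
        fallback: Callable[..., Any],
        *args: Any,
    ) -> Any:
        if self.is_open:
            # Breaker is open: do not touch the failing service at all.
            return fallback(*args)
        try:
            result = primary(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open) the breaker and restart the cool-down.
                self._opened_at = self._clock()
            return fallback(*args)
        # Success: close the breaker and reset the consecutive-failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A call site might look like `breaker.call(call_primary_llm, serve_cached_answer, query)`, where both function names are placeholders. Every request still gets an answer from the fallback, and the failing service is given time to recover instead of being hit on every request.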