
Graceful Degradation

Building resilient agents with fallback chains, circuit breakers, stale cache serving, and feature degradation under load — so your agent always returns something useful.

Quick Reference

  • Fallback chains: primary model → cheaper model → cached response → static fallback → error message
  • Circuit breakers: after N consecutive failures, stop calling the failing service and route to fallback automatically
  • Stale serving: when live AI is unavailable, serve cached results with a 'this may be outdated' disclaimer
  • Feature degradation: under heavy load, disable expensive features (tools, RAG) and serve simpler responses
  • The goal: never show users an error page — always return something useful, even if it's not the best possible answer
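The first two bullets fit together naturally: each step of the fallback chain sits behind its own circuit breaker. Below is a minimal Python sketch. It assumes each step (model call, cache lookup) is wrapped as a plain function `handler(query) -> str` that raises on failure. `CircuitBreaker`, `answer_with_fallbacks`, and the single "half-open" trial call after `reset_timeout` are hypothetical names and design choices for illustration, not a particular library's API.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class CircuitBreaker:
    """Trips open after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds before a trial call is allowed
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: normal operation
        # Open: refuse calls until reset_timeout has passed, then allow a trial ("half-open").
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open; a failed trial restarts the timer


Handler = Callable[[str], str]


def answer_with_fallbacks(query: str,
                          chain: Sequence[Tuple[str, Handler, CircuitBreaker]],
                          static_fallback: str) -> Tuple[str, str]:
    """Walk the chain in order; return (source, answer) from the first step that succeeds."""
    for name, handler, breaker in chain:
        if not breaker.allow():
            continue  # circuit open: route past this step without calling it
        try:
            result = handler(query)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    # Every step failed or was skipped: fall back to a canned response.
    return "static", static_fallback
```

A tripped breaker skips its step entirely until `reset_timeout` elapses, so a dead primary model adds no latency to each request; the one trial call after the timeout decides whether the circuit closes again. In practice `chain` might list a primary model, a cheaper model, and a cache lookup, each wrapped by your own (hypothetical here) client functions.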

Why Graceful Degradation Matters

AI services fail regularly: LLM APIs go down, rate limits kick in, vector databases slow down under load, and tool APIs return errors. The question is not whether your agent will face failures but whether it handles them gracefully or crashes. A gracefully degraded response ('Here's a cached answer; it may be slightly outdated') is far more useful to a user than an error page.
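The cached-answer fallback can be sketched as a cache that keeps entries past their freshness window instead of evicting them, and serves them with a disclaimer only when the live call fails. This Python sketch assumes exact matching on normalized query text (matching similar queries would need embedding search); the `StaleCache` class, `answer_or_stale` function, disclaimer wording, and `ttl`/`max_stale` parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

STALE_DISCLAIMER = "Note: this is a cached answer and may be outdated."


class StaleCache:
    """Entries are fresh for `ttl` seconds, then stale but servable up to `max_stale` seconds."""

    def __init__(self, ttl: float, max_stale: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.max_stale = max_stale
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Exact match on lowercased, whitespace-collapsed text (a simplifying assumption).
        return " ".join(query.lower().split())

    def put(self, query: str, text: str) -> None:
        self._entries[self._key(query)] = (text, self.clock())

    def get(self, query: str, allow_stale: bool = False) -> Optional[str]:
        entry = self._entries.get(self._key(query))
        if entry is None:
            return None
        text, stored_at = entry
        age = self.clock() - stored_at
        if age <= self.ttl or (allow_stale and age <= self.max_stale):
            return text
        return None


def answer_or_stale(query: str, live: Callable[[str], str], cache: StaleCache) -> str:
    fresh = cache.get(query)
    if fresh is not None:
        return fresh  # fresh hit: no live call needed
    try:
        result = live(query)
    except Exception:
        # Live AI unavailable: fall back to a stale entry, clearly labeled.
        stale = cache.get(query, allow_stale=True)
        if stale is not None:
            return f"{stale}\n\n{STALE_DISCLAIMER}"
        return "The assistant is temporarily unavailable. Please try again shortly."
    cache.put(query, result)
    return result
```

Keeping `max_stale` well above `ttl` is the point of the design: normal traffic sees fresh answers, while an outage can still be covered by answers that are hours old, always with the disclaimer attached.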

| Failure Scenario | Bad Outcome | Graceful Degradation |
|---|---|---|
| LLM API down | 500 error to user | Serve cached response for similar query |
| Rate limited (429) | Request fails, user retries | Switch to backup model, queue if necessary |
| Vector DB slow | 30s+ response time | Skip RAG, answer from model knowledge with disclaimer |
| Tool API timeout | Agent hangs forever | Return partial result, note which tools failed |
| Traffic spike (10x normal) | All requests slow or fail | Disable expensive features, serve simpler responses |
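Several rows of this table rely on two mechanisms: choosing which features to run based on current load, and bounding every external call with a timeout so a failure becomes a partial result instead of a hang. A minimal asyncio sketch follows. The load thresholds (2x and 5x normal traffic), the names `plan_features`, `run_tools`, and `ToolResults`, and the zero-argument async tool signature are hypothetical choices, not prescribed values.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List


@dataclass
class FeaturePlan:
    use_rag: bool
    use_tools: bool


def plan_features(current_rps: float, normal_rps: float) -> FeaturePlan:
    """Shed expensive features as load rises (hypothetical 2x / 5x thresholds)."""
    load = current_rps / normal_rps
    if load >= 5:
        # Heavy spike: plain model answer only, no retrieval or tool calls.
        return FeaturePlan(use_rag=False, use_tools=False)
    if load >= 2:
        # Elevated load: keep RAG, drop tool calls.
        return FeaturePlan(use_rag=True, use_tools=False)
    return FeaturePlan(use_rag=True, use_tools=True)


@dataclass
class ToolResults:
    outputs: Dict[str, str] = field(default_factory=dict)
    failed: List[str] = field(default_factory=list)

    def failure_note(self) -> str:
        # Tell the user which tools were missing instead of failing the whole answer.
        if not self.failed:
            return ""
        return "Some information is missing; these tools failed or timed out: " + ", ".join(self.failed)


async def run_tools(tools: Dict[str, Callable[[], Awaitable[str]]],
                    timeout: float) -> ToolResults:
    """Run tools concurrently, giving each at most `timeout` seconds."""
    names = list(tools)
    results = await asyncio.gather(
        *(asyncio.wait_for(tools[name](), timeout) for name in names),
        return_exceptions=True,  # a timeout or error becomes a returned value, not a crash
    )
    report = ToolResults()
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            report.failed.append(name)
        else:
            report.outputs[name] = result
    return report
```

The same `asyncio.wait_for` pattern applies to a slow vector DB: if retrieval does not finish within its budget, treat it as `use_rag=False` for that request and answer from model knowledge with a disclaimer.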