Advanced · 18 min

Graceful Degradation

How to design AI agents that always return something useful — even when LLM APIs fail, rate limits hit, or traffic spikes. Covers fallback chains, circuit breakers, semantic degradation detection, and progressive load shedding.

Quick Reference

→Three failure types: binary (5xx), partial (slow 200), semantic (fast 200, wrong content) — most teams only handle the first
→Fallback chain: primary model → cheaper model → cached response → static message — always return something
→Circuit breakers: after 5 consecutive failures, route to fallback instantly; cooldown 120s for LLM rate limits, 30s for vector DBs
→Semantic degradation: the API returns 200 OK but content quality silently drops — your monitors say green, users get wrong answers
→Canary queries: send a known Q+A through your agent every 5 minutes; alert when similarity score drops below threshold
→Load shedding: derive thresholds from measured peak RPM, not invented numbers — start shedding RAG at 2× normal peak
→Disable SDK auto-retry (max_retries=0) when you own the fallback logic — otherwise both the SDK and your code retry
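
The circuit breaker rule above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name `CircuitBreaker`, the helper `call_with_breaker`, and the injectable `clock` parameter are all hypothetical names chosen for this example. The defaults (5 consecutive failures, 120 s cooldown) come from the quick reference.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, then allows a trial call after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,       # from the quick reference
        cooldown_s: float = 120.0,        # 120s for LLM rate limits, 30s for vector DBs
        clock: Callable[[], float] = time.monotonic,  # injectable so tests can fake time
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True when the circuit is closed, or when the cooldown has elapsed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """Any success closes the circuit and resets the failure count."""
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Open (or re-open) the circuit once the threshold is reached."""
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()


def call_with_breaker(breaker: CircuitBreaker, primary: Callable[[], str],
                      fallback: Callable[[], str]) -> str:
    """Route to the fallback instantly while the circuit is open."""
    if not breaker.allow_request():
        return fallback()
    try:
        result = primary()
    except Exception:
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result
```

One simplification to be aware of: after the cooldown, every caller is let through until one of them fails again. A stricter half-open state would admit a single trial request at a time.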

When Graceful Degradation Is Worth the Complexity

Fallback chains and circuit breakers add real complexity. Before building them, decide whether your use case actually needs them.

Use case	Build degradation?	Why
User-facing chat or assistant	Yes	Users are waiting; an error page ends the session
Real-time product search or recommendations	Yes	Revenue impact is immediate when answers disappear
Internal batch pipeline	No	Failures are visible to engineers; retry at job level
Offline data processing or enrichment	No	Correctness matters more than availability; fail loudly
Developer tooling used by your own team	Maybe	Depends on how disruptive downtime is to their workflow
Prototype or MVP under active development	No	Complexity slows iteration; add it when you have traffic

Start with a cache and one fallback model

You don't need a full circuit breaker framework on day one. A response cache plus a fallback to a cheaper model such as Haiku on any APIStatusError with a 5xx status get you roughly 80% of the reliability for 10% of the code. Add circuit breakers once you have enough production data to tune the thresholds.
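
A minimal sketch of that starting point might look like the following. It takes plain callables so it runs without any SDK installed; `answer`, `call_primary`, `call_fallback`, and `cache_key` are hypothetical names for this example. With the Anthropic Python SDK, the exception would be `anthropic.APIStatusError`, which exposes `status_code`, and the client would be created with `max_retries=0` so that only this code retries. The ordering (primary, then cheaper model, then cache) follows the chain in the quick reference, and caching only primary answers is an assumption.

```python
import hashlib
from typing import Callable, Dict


def cache_key(prompt: str) -> str:
    """Stable key for exact-match caching of a prompt."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def is_server_error(exc: Exception) -> bool:
    """True for errors that carry a 5xx status_code attribute."""
    status = getattr(exc, "status_code", None)
    return isinstance(status, int) and 500 <= status < 600


def answer(
    prompt: str,
    call_primary: Callable[[str], str],   # e.g. a wrapper around the primary model
    call_fallback: Callable[[str], str],  # e.g. a wrapper around the cheaper model
    cache: Dict[str, str],
) -> str:
    key = cache_key(prompt)
    try:
        text = call_primary(prompt)
    except Exception as exc:
        if not is_server_error(exc):
            raise  # 4xx and programming errors should stay loud
    else:
        cache[key] = text  # assumption: only primary answers are cached
        return text

    try:
        return call_fallback(prompt)
    except Exception as exc:
        # Last resort: serve an earlier primary answer if one exists.
        if not is_server_error(exc) or key not in cache:
            raise
        return cache[key]
```

An exact-match cache only helps with repeated prompts. It is still worth having, because the most common questions are often the ones asked during an outage.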

Three Kinds of Failure

Not all failures look the same in production. You need a different strategy for each.
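
To make the three types concrete, here is a hedged sketch of how a single response could be classified. The function `classify`, the `Failure` enum, and both thresholds (`slow_after_s`, `min_similarity`) are assumptions made for illustration; the similarity score is assumed to come from comparing the answer against a known-good reference, as in a canary query.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    NONE = "none"
    BINARY = "binary"      # 5xx: the call failed outright
    PARTIAL = "partial"    # slow 200: an answer arrived, but too late
    SEMANTIC = "semantic"  # fast 200: on time, but the content is wrong


def classify(
    status_code: int,
    latency_s: float,
    similarity: Optional[float] = None,  # None when no reference answer exists
    slow_after_s: float = 10.0,          # assumed latency budget
    min_similarity: float = 0.8,         # assumed quality threshold
) -> Failure:
    if 500 <= status_code < 600:
        return Failure.BINARY
    if status_code != 200:
        # Other statuses (e.g. 4xx) fall outside this three-way split.
        raise ValueError(f"status {status_code} is not covered by this sketch")
    if latency_s > slow_after_s:
        return Failure.PARTIAL
    if similarity is not None and similarity < min_similarity:
        return Failure.SEMANTIC
    return Failure.NONE
```

Only the first branch is visible to a status-code monitor. The other two need a latency budget and a quality signal, which is why they are the ones that tend to go unhandled.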

Fallback Chains

A fallback chain is a priority-ordered list of strategies. Each level is cheaper and less capable, but always returns something. The chain stops at the first level that succeeds.
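
The idea can be sketched as a loop over ordered levels with a static message as the final catch-all. The names `run_chain`, `Level`, and `STATIC_MESSAGE` are hypothetical, and each level is assumed to be a callable that returns text, returns `None` when it has nothing to offer (for example, a cache miss), or raises on failure.

```python
from typing import Callable, List, Optional, Tuple

# A level is a (name, handler) pair; the handler returns text, None, or raises.
Level = Tuple[str, Callable[[str], Optional[str]]]

STATIC_MESSAGE = "Sorry, I can't answer right now. Please try again in a few minutes."


def run_chain(query: str, levels: List[Level]) -> Tuple[str, str]:
    """Try each level in priority order and stop at the first that succeeds.

    Returns (level_name, response) so callers can log which level answered.
    """
    for name, handler in levels:
        try:
            result = handler(query)
        except Exception:
            continue  # in production, log the failure before moving on
        if result:
            return name, result
    # Nothing succeeded: the static message guarantees a response.
    return "static", STATIC_MESSAGE
```

Returning the level name alongside the response makes it possible to track how often each fallback level is actually serving traffic.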
