Dynamic Model Selection
Route agent turns to cheaper models when the task is simple and powerful models when it's complex — using @wrap_model_call to intercept every LLM request and swap the model based on conversation state, user tier, or cost targets. This article starts with whether you should route at all, walks through real cost math, covers the five ways routing silently fails in production, and ends with a 30-day rollout runbook.
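Here is a minimal sketch of that interception. It assumes the decorated function receives the request plus a `handler` callable, and that it returns whatever the handler returns after the model has been swapped with `request.override(model=...)`. The example needs to run without LangChain installed, so `Request`, `pick_model`, `route_model`, the model names, and the thresholds are all hypothetical stand-ins. In a real agent, `route_model` would carry the `@wrap_model_call` decorator and receive the framework's own request object.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


# Hypothetical stand-in for the framework's request object. Only what the
# router reads is modeled: the selected model, conversation state, and
# per-run context (e.g. the user's plan).
@dataclass(frozen=True)
class Request:
    model: str
    state: dict[str, Any] = field(default_factory=dict)
    context: dict[str, Any] = field(default_factory=dict)

    def override(self, **changes: Any) -> "Request":
        # Return a modified copy; the original request is left untouched.
        return replace(self, **changes)


# Hypothetical model names and thresholds: calibrate them on your own traffic.
CHEAP_MODEL = "cheap-model"
STRONG_MODEL = "strong-model"
MAX_SIMPLE_MESSAGES = 10
MAX_SIMPLE_TOOL_CHARS = 4_000


def pick_model(request: Request) -> str:
    """Upgrade to the strong model only when a complexity signal fires."""
    messages = request.state.get("messages", [])
    # Messages are modeled as plain dicts here; real message objects differ.
    tool_chars = sum(
        len(m.get("content", "")) for m in messages if m.get("role") == "tool"
    )
    if request.context.get("plan") == "enterprise":
        return STRONG_MODEL
    if len(messages) > MAX_SIMPLE_MESSAGES or tool_chars > MAX_SIMPLE_TOOL_CHARS:
        return STRONG_MODEL
    # No signal fired: keep the agent's default model (the fallback).
    return request.model


def route_model(request: Request, handler: Callable[[Request], Any]) -> Any:
    """Middleware body: swap the model, then pass the request downstream."""
    return handler(request.override(model=pick_model(request)))
```

With these thresholds, a single short user message keeps the default model. An 11-message conversation, a tool result over 4,000 characters, or an enterprise-plan user is sent to `STRONG_MODEL`.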
Quick Reference
- @wrap_model_call intercepts each model request before it reaches the LLM; use request.override(model=...) to swap the model
- The default model passed to create_agent is the fallback: middleware upgrades selectively, not the other way around
- Route on message count, tool result size, user plan, token budget, or any other signal in state or context
- Do the cost math first: routing only pays off when a large share of traffic (roughly 40-50% or more) can run on the cheap model AND the cheap model handles those turns correctly
- Structured output mismatch is the #1 silent failure: verify the cheaper model produces the same schema before routing to it
- Provider fallback (try/except inside wrap_model_call) is not free: different providers have different tool-calling behavior
- Log every routing decision with the signal values that triggered it; you need this for eval and drift detection
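The last two bullets fit together. Every routing decision, including a provider fallback, can become one structured log line that records the signals behind it. Below is a minimal sketch using only the standard library. Here `call` is a hypothetical stand-in for whatever invokes the model; inside `wrap_model_call`, that would be the handler applied to an overridden request. The field names in the log record are assumptions.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger("model_router")


def log_routing_decision(model: str, reason: str, signals: dict[str, Any]) -> str:
    """Emit one structured record per routing decision and return it."""
    record = json.dumps(
        {"model": model, "reason": reason, "signals": signals}, sort_keys=True
    )
    logger.info(record)
    return record


def call_with_fallback(
    primary: str,
    fallback: str,
    call: Callable[[str], Any],
    signals: dict[str, Any],
) -> Any:
    """Try the primary model; if it raises, log the switch and retry once on the fallback."""
    log_routing_decision(primary, "primary", signals)
    try:
        return call(primary)
    except Exception as exc:  # broad on purpose for the sketch; narrow it in practice
        log_routing_decision(fallback, f"fallback after {type(exc).__name__}", signals)
        # The fallback may be a different provider with different tool-calling
        # behavior, so its outputs need their own eval coverage.
        return call(fallback)
```

Logging the signal values alongside the chosen model lets you group eval results by routing reason later. It also lets you spot drift, for example when the share of fallback records rises.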
Should I Use Dynamic Model Selection?
Dynamic model selection adds complexity: you now have multiple models to test, routing logic to maintain, and thresholds to calibrate. That complexity is only worth it when it buys meaningful cost savings, which requires a traffic mix where a substantial fraction of turns are genuinely simple. Before building the middleware, answer these four questions.
A router pays off only if you can answer yes to all four questions
| Question | If yes | If no |
|---|---|---|
| Is your LLM spend above ~$1,000/month? | Routing overhead (code, eval, monitoring) is worth it | Stick with a single model; routing savings won't recover the implementation cost |
| Do >40% of your turns look genuinely simple (short messages, no large tool results)? | A cheap model can handle them, so routing saves real money | Most of your traffic is complex; routing to a cheaper model risks quality on most turns |
| Does your cheaper model support all the tools and schemas your agent uses? | Safe to route to it | You'll hit structured output mismatches or tool call failures — fix this first |
| Do you have an eval harness that can measure quality per turn? | You can verify routing doesn't hurt quality | Build the eval before routing — otherwise you won't know when it breaks |
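A back-of-the-envelope model puts numbers on the first two rows. Normalize the current all-strong-model cost per turn to 1. Let f be the fraction of turns routed to the cheap model and r the cheap-to-strong price ratio per turn. Let e be the fraction of routed turns that fail and get re-run on the strong model. Each routed turn then costs r + e instead of 1, so the savings are spend × f × (1 − r − e). The price ratio and escalation rate used below are illustrative assumptions, not measured figures.

```python
def monthly_savings(
    spend: float,
    cheap_fraction: float,
    price_ratio: float,
    escalation_rate: float = 0.0,
) -> float:
    """Estimated monthly savings from routing versus an all-strong-model baseline.

    spend            current monthly spend with every turn on the strong model
    cheap_fraction   share of turns routed to the cheap model (f)
    price_ratio      cheap cost per turn / strong cost per turn (r)
    escalation_rate  share of routed turns that fail and are re-run on the strong model (e)
    """
    # A routed turn costs r, plus a full strong-model retry for the failed share,
    # so each routed turn saves (1 - r - e) of a strong-model turn.
    return spend * cheap_fraction * (1 - price_ratio - escalation_rate)
```

Suppose you plug in the table's thresholds and assume a cheap model at a tenth of the strong model's per-turn price. Then `monthly_savings(1000, 0.4, 0.1)` comes to about $360/month, before subtracting the cost of building and evaluating the router. An assumed 20% escalation rate drops that to about $280. Routing loses money outright once e exceeds 1 − r.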
Don't route if: (1) Your agent relies on multi-turn reasoning, where the cheap model's smaller context window or weaker reasoning can corrupt a chain that spans turns. (2) Your agent uses structured output and the cheap model hasn't been tested against your exact schemas. (3) Your p99 latency SLA is tight: the routing logic runs on every model call, so any signal that is expensive to compute (token counting over a long history, a classifier call) adds latency even when no model swap occurs.
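Condition (2) can be turned into a gate. Replay a sample of real turns through the cheap model and check each structured output against the fields your agent consumes before enabling the route. Below is a minimal stdlib sketch. It assumes outputs arrive as JSON strings and that the schema is a flat mapping from required field names to Python types. A production harness would validate against your real schema (for example a Pydantic model or a JSON Schema document).

```python
import json
from typing import Any


def schema_errors(raw_output: str, required: dict[str, type]) -> list[str]:
    """Return the problems found in one model output; an empty list means it conforms."""
    try:
        data: Any = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for name, expected in required.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(data[name]).__name__}"
            )
    return errors


def pass_rate(outputs: list[str], required: dict[str, type]) -> float:
    """Share of sampled outputs that have no schema errors."""
    if not outputs:
        return 0.0
    return sum(not schema_errors(o, required) for o in outputs) / len(outputs)
```

Because structured output mismatches fail silently, a gate like this is more useful when it reports every error message for the failing samples, not just the pass rate.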