Intermediate · 7 min
# Dynamic Model Selection
Route to cheaper models for simple turns and powerful models for complex ones. `@wrap_model_call` intercepts every LLM request and lets you swap the model based on state, context, or cost targets.
## Quick Reference
- `@wrap_model_call` intercepts the model request before it reaches the LLM
- `request.override(model=new_model)` swaps the model for that call only
- Route by message count, task complexity, user plan, or token budget
- The default model in `create_agent` is the fallback; middleware overrides it selectively
- Pre-bound models (`bind_tools` already called) don't work with structured output + dynamic selection
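The shape of the hook can be sketched without the real library. The `ModelRequest`, `override`, and `wrap_model_call` names below mirror the middleware API this page describes, but the classes here are minimal self-contained stand-ins, not the actual LangChain imports; the real types carry more fields (tools, state, and so on):

```python
from dataclasses import dataclass, replace

# Stand-in for the middleware's request object (hypothetical, simplified).
@dataclass(frozen=True)
class ModelRequest:
    model: str
    messages: list

    def override(self, **changes) -> "ModelRequest":
        # Returns a copy with the given fields swapped; the original
        # request, and therefore the agent's default model, is untouched.
        return replace(self, **changes)

def wrap_model_call(func):
    """Stand-in decorator: registers `func` to run around each model call."""
    return func

@wrap_model_call
def route_by_length(request: ModelRequest, handler):
    # Long conversations get the powerful model; short turns stay cheap.
    if len(request.messages) > 10:
        request = request.override(model="gpt-4.1")
    return handler(request)
```

In the real middleware, `handler(request)` forwards the (possibly overridden) request to the LLM; the middleware never mutates the default model configured in `create_agent`, so every other turn still falls back to it.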
## Why Switch Models Mid-Conversation
Not every turn requires your most powerful model. A greeting, a clarifying question, or a simple lookup can run on a fast cheap model. A complex multi-step analysis, a code review, or a long-context synthesis needs your best. Routing dynamically — without the user noticing — cuts cost and latency on simple turns while preserving quality on hard ones.
| Signal | Route to |
|---|---|
| Short conversation, simple question | Fast cheap model (`gpt-4.1-mini`) |
| Long conversation, many tool results | Powerful model (`gpt-4.1`, `claude-opus`) |
| User on free plan | Budget model |
| User on enterprise plan | Best available model |
| Tool result > 10k tokens | Long-context model |
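The table above collapses into a single routing function. The model names come from the table, but the tier precedence and thresholds are illustrative assumptions, and the token figure would come from a real tokenizer or a rough character-count estimate in practice:

```python
def choose_model(message_count: int, plan: str, largest_tool_result_tokens: int) -> str:
    """Pick a model name from the routing signals in the table (sketch)."""
    # Plan tier dominates: enterprise always gets the best available model,
    # free users always get the budget one (assumed precedence).
    if plan == "enterprise":
        return "claude-opus"
    if plan == "free":
        return "gpt-4.1-mini"
    # Oversized tool results need a long-context model
    # (assuming gpt-4.1 is the long-context option here).
    if largest_tool_result_tokens > 10_000:
        return "gpt-4.1"
    # Otherwise route by conversation length: long chats get the
    # powerful model, short simple turns stay on the cheap one.
    return "gpt-4.1" if message_count > 10 else "gpt-4.1-mini"
```

Inside a `@wrap_model_call` hook, the return value would feed `request.override(model=...)` while the default model in `create_agent` remains the fallback for anything the function doesn't reroute.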