
Dynamic Model Selection

Route to cheaper models for simple turns and powerful models for complex ones. @wrap_model_call intercepts every LLM request and lets you swap the model based on state, context, or cost targets.

Quick Reference

  • @wrap_model_call intercepts the model request before it reaches the LLM
  • request.override(model=new_model) swaps the model for that call only
  • Route by message count, task complexity, user plan, or token budget
  • Default model in create_agent is the fallback — middleware overrides it selectively
  • Pre-bound models (bind_tools already called) don't work with structured output + dynamic selection
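The shape of a `@wrap_model_call` middleware can be sketched as below. To keep the example runnable without credentials, `ModelRequest` and the handler are lightweight stand-ins; the real names (`@wrap_model_call`, `request.override`, `create_agent`) follow the LangChain middleware docs, and the model names and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass, replace, field
from typing import Callable

# Stand-in for LangChain's ModelRequest, reduced to the fields the
# routing logic needs.
@dataclass(frozen=True)
class ModelRequest:
    model: str
    messages: list = field(default_factory=list)

    def override(self, **changes) -> "ModelRequest":
        # Mirrors request.override(...): returns a copy of the request
        # with the given fields swapped, for this call only.
        return replace(self, **changes)

def route_model(request: ModelRequest, handler: Callable) -> str:
    # The body you would put under @wrap_model_call: inspect the request,
    # optionally swap the model, then delegate to the handler.
    if len(request.messages) > 10:
        # Long conversation -> assumed "powerful" tier
        request = request.override(model="gpt-4.1")
    else:
        # Short conversation -> assumed "cheap" tier
        request = request.override(model="gpt-4.1-mini")
    return handler(request)

# With the real API this would read roughly:
#   @wrap_model_call
#   def route_model(request, handler): ...
#   agent = create_agent(model="gpt-4.1-mini", middleware=[route_model])
```

The default model passed to `create_agent` still acts as the fallback; the middleware only overrides it when a routing condition fires.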

Why Switch Models Mid-Conversation

Not every turn requires your most powerful model. A greeting, a clarifying question, or a simple lookup can run on a fast cheap model. A complex multi-step analysis, a code review, or a long-context synthesis needs your best. Routing dynamically — without the user noticing — cuts cost and latency on simple turns while preserving quality on hard ones.

| Signal | Route to |
| --- | --- |
| Short conversation, simple question | Fast, cheap model (gpt-4.1-mini) |
| Long conversation, many tool results | Powerful model (gpt-4.1, claude-opus) |
| User on free plan | Budget model |
| User on enterprise plan | Best available model |
| Tool result > 10k tokens | Long-context model |
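The table's signals can be collapsed into one routing function that a `@wrap_model_call` middleware would consult before calling `request.override`. A minimal sketch, assuming illustrative model names, a 10-message threshold for "long conversation", and a precedence order (context size first, then plan, then length) that is a design choice, not something the table prescribes:

```python
def select_model(message_count: int, plan: str, last_tool_tokens: int) -> str:
    """Map routing signals to a model name. Names and thresholds are
    illustrative, not prescriptive."""
    if last_tool_tokens > 10_000:
        return "long-context-model"   # oversized tool result wins outright
    if plan == "enterprise":
        return "claude-opus"          # best available model
    if plan == "free":
        return "budget-model"
    if message_count > 10:
        return "gpt-4.1"              # long conversation, many tool results
    return "gpt-4.1-mini"             # short conversation, simple question
```

Keeping the decision in a pure function like this makes the routing policy unit-testable independently of the agent wiring.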