
Dynamic Model Selection

Route agent turns to cheaper models when the task is simple and powerful models when it's complex — using @wrap_model_call to intercept every LLM request and swap the model based on conversation state, user tier, or cost targets. This article starts with whether you should route at all, walks through real cost math, covers the five ways routing silently fails in production, and ends with a 30-day rollout runbook.

Quick Reference

  • @wrap_model_call intercepts the model request before it reaches the LLM — use request.override(model=...) to swap
  • Default model in create_agent is the fallback — middleware upgrades selectively, not the other way around
  • Route by message count, tool result size, user plan, token budget, or any signal in state/context
  • Cost math first: routing only saves money when >50% of traffic can downgrade AND the cheap model handles those turns correctly
  • Structured output mismatch is the #1 silent failure — verify the cheaper model can produce the same schema before routing to it
  • Provider fallback (try/except in wrap_model_call) is not free — different providers have different tool calling behavior
  • Log every routing decision with the signal values that triggered it — you need this for eval and drift detection
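The bullets above can be sketched as a routing function that upgrades selectively and logs the signals behind each decision. This is a minimal illustration, not the library's own code: `pick_tier`, `route_model`, both thresholds, and the `FakeRequest` stand-in are hypothetical names, and reading messages from `request.state["messages"]` with `.type` and `.content` attributes is an assumption. In a real agent the body of `route_model` would sit in a function decorated with `@wrap_model_call`, with the stronger model captured from the enclosing scope.

```python
import logging
from dataclasses import dataclass, replace
from types import SimpleNamespace
from typing import Any, Callable

logger = logging.getLogger("model_routing")

# Hypothetical thresholds; calibrate them against your own eval data.
MAX_SIMPLE_MESSAGES = 10
MAX_SIMPLE_TOOL_CHARS = 2_000


def pick_tier(message_count: int, largest_tool_chars: int) -> str:
    """Return "strong" when either signal suggests a complex turn, else "default"."""
    if message_count > MAX_SIMPLE_MESSAGES or largest_tool_chars > MAX_SIMPLE_TOOL_CHARS:
        return "strong"
    return "default"


def route_model(request: Any, handler: Callable[[Any], Any], strong_model: Any) -> Any:
    """Upgrade to strong_model only when needed; otherwise keep the agent's default."""
    messages = request.state["messages"]  # assumed location of the conversation
    largest_tool_chars = max(
        (len(str(m.content)) for m in messages if getattr(m, "type", "") == "tool"),
        default=0,
    )
    tier = pick_tier(len(messages), largest_tool_chars)
    # Log the decision together with the signal values that triggered it.
    logger.info(
        "routing tier=%s message_count=%d largest_tool_chars=%d",
        tier, len(messages), largest_tool_chars,
    )
    if tier == "strong":
        request = request.override(model=strong_model)
    return handler(request)


@dataclass(frozen=True)
class FakeRequest:
    """Local stand-in so the sketch runs without an agent or API key."""
    state: dict
    model: str = "cheap-default"

    def override(self, **changes: Any) -> "FakeRequest":
        return replace(self, **changes)


short_turn = FakeRequest(state={"messages": [SimpleNamespace(type="human", content="hi")]})
big_tool_turn = FakeRequest(
    state={"messages": [SimpleNamespace(type="tool", content="x" * 5_000)]}
)
# The handler here just reports which model the request ended up with.
used_short = route_model(short_turn, lambda req: req.model, strong_model="strong-model")
used_big = route_model(big_tool_turn, lambda req: req.model, strong_model="strong-model")
```

Keeping the tier decision in a pure function separate from the request plumbing makes the thresholds easy to unit-test and to replay against logged signal values later.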

Should I Use Dynamic Model Selection?

Dynamic model selection adds complexity: you now have multiple models to test, routing logic to maintain, and thresholds to calibrate. The only reason to accept that complexity is a real cost saving, and that requires a traffic mix in which a sizeable fraction of turns are genuinely simple. Before building the middleware, answer these four questions.

Need routing?

  • Latency < 500ms? Yes → Skip routing: one fast model, no overhead
  • PII / compliance? Yes → Self-host: Llama 4 · DeepSeek V3.2
  • Single task type? Yes → Hardcode per task: no classifier needed
  • Cost is top priority? Yes → Cascade pattern: cheap first, escalate on fail
  • No to all ↓ Use a model router: mixed complexity, budget matters

A router only pays off when none of these simpler exits apply

| Question | If yes | If no |
|---|---|---|
| Is your monthly LLM spend above ~$1,000/mo? | Routing overhead (code + eval + monitoring) is worth it | Single model: routing won't recover the implementation cost |
| Do >40% of your turns look genuinely simple (short messages, no large tool results)? | A cheap model can handle them, so routing saves real money | All your traffic is complex, so routing to a cheaper model risks quality on most turns |
| Does your cheaper model support all the tools and schemas your agent uses? | Safe to route to it | You'll hit structured output mismatches or tool call failures; fix this first |
| Do you have an eval harness that can measure quality per turn? | You can verify routing doesn't hurt quality | Build the eval before routing; otherwise you won't know when it breaks |

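The first two questions reduce to simple arithmetic, roughly sketched below. The formula is a back-of-the-envelope assumption, not a figure from this article: it treats `downgradable_share` as the fraction of current spend (not of turns) that could move to the cheap model, and every number in the example call is hypothetical.

```python
def monthly_net_savings(
    monthly_spend: float,
    downgradable_share: float,
    cheap_to_strong_price_ratio: float,
    routing_overhead: float,
) -> float:
    """Estimate monthly savings from routing, net of what routing costs to run.

    monthly_spend: current spend with everything on the strong model.
    downgradable_share: fraction of that spend on turns the cheap model handles.
    cheap_to_strong_price_ratio: cheap price divided by strong price (0 to 1).
    routing_overhead: monthly cost of maintaining code, evals, and monitoring.
    """
    gross = monthly_spend * downgradable_share * (1 - cheap_to_strong_price_ratio)
    return gross - routing_overhead


# Hypothetical inputs: $1,000/mo spend, 40% downgradable, cheap model at 10% of
# the strong model's price, $200/mo of engineering and monitoring overhead.
example_net = monthly_net_savings(1_000.0, 0.4, 0.1, 200.0)

# Same traffic mix at a quarter of the spend: overhead outweighs the saving.
small_spend_net = monthly_net_savings(250.0, 0.4, 0.1, 200.0)
```

A negative result means the overhead of routing exceeds what it recovers, which is the case the "If no" column warns about.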
When NOT to route

Don't route if: (1) Your agent relies on multi-turn reasoning where the cheap model's shorter context or weaker reasoning corrupts a chain across turns. (2) Your agent uses structured output and the cheap model hasn't been tested against your exact schemas. (3) Your p99 latency SLA is tight — routing adds code-path overhead even when no model swap occurs.
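For point (2), one way to test a cheap model against a schema before routing to it is to replay recorded prompts through it and measure how often the raw output conforms. The sketch below covers only the checking step and uses the standard library; `REQUIRED_FIELDS`, `conforms`, `schema_pass_rate`, and the sample outputs are all hypothetical, and a real harness would validate against the agent's actual schema definitions.

```python
import json

# Hypothetical schema: field name -> required Python type after JSON parsing.
REQUIRED_FIELDS: dict[str, type] = {"intent": str, "confidence": float, "entities": list}


def conforms(raw_output: str, required: dict[str, type]) -> bool:
    """True if raw_output is a JSON object with every required field and type."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Strict type check: an integer such as 1 would fail a float field.
    return all(isinstance(data.get(name), kind) for name, kind in required.items())


def schema_pass_rate(raw_outputs: list[str], required: dict[str, type]) -> float:
    """Fraction of outputs that conform; 0.0 when there is nothing to check."""
    if not raw_outputs:
        return 0.0
    return sum(conforms(out, required) for out in raw_outputs) / len(raw_outputs)


# Hypothetical cheap-model outputs: one valid, one wrong type, one not JSON.
samples = [
    '{"intent": "refund", "confidence": 0.9, "entities": []}',
    '{"intent": "refund", "confidence": "high", "entities": []}',
    "Sure! Here is the JSON you asked for.",
]
rate = schema_pass_rate(samples, REQUIRED_FIELDS)
```

A pass rate below whatever bar the strong model clears on the same prompts is a signal to keep those turns off the cheap model.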