# Model Routing
Route queries to the right model based on complexity: send simple questions to cheap, fast models and complex reasoning tasks to expensive, capable models. Achieve 40-60% cost reduction with intelligent routing while maintaining quality on hard queries.
## Quick Reference
- **Core insight:** 60-70% of production queries can be handled by a model 10x cheaper than your best model
- **Routing strategies:** keyword-based (simplest), embedding similarity (moderate), classifier model (best)
- **Typical savings:** 40-60% cost reduction with <2% quality loss on a well-tuned router
- **Fallback on quality:** if the cheap model's confidence is low, re-route to the expensive model
- **Measure routing accuracy:** track per-model satisfaction rates to catch systematic misrouting
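The keyword-based strategy, with a confidence fallback, can be sketched in a few lines. This is a minimal illustration, not a production router: the model names, keyword patterns, and the 0.7 confidence threshold are assumptions, and `call_model` is a hypothetical caller-supplied function returning a reply plus a confidence score.

```python
import re

# Hypothetical tier names; substitute the models from your own stack.
TIERS = {"cheap": "o4-mini", "expensive": "gpt-5.4"}

# Illustrative patterns that tend to signal reasoning-heavy queries.
COMPLEX_PATTERNS = [
    r"\bcompare\b", r"\banalyz", r"\breview\b",
    r"\berror log\b", r"\bcompliance\b",
]

def route(query: str) -> str:
    """Keyword router: default to the cheap tier, escalate when the
    query matches a pattern suggesting multi-step reasoning."""
    q = query.lower()
    if any(re.search(p, q) for p in COMPLEX_PATTERNS):
        return TIERS["expensive"]
    return TIERS["cheap"]

def answer(query: str, call_model, threshold: float = 0.7) -> str:
    """Route the query; if the cheap model's self-reported confidence
    falls below threshold, re-route to the expensive tier."""
    model = route(query)
    reply, conf = call_model(model, query)
    if model == TIERS["cheap"] and conf < threshold:
        reply, conf = call_model(TIERS["expensive"], query)
    return reply
```

A keyword router like this is where most teams start: it is trivially fast and auditable, and the fallback path caps the quality risk of a misroute at one extra (expensive) call.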
## The Economics of Model Routing
In a typical support agent, 60-70% of queries are simple factual lookups or greetings. Sending 'What are your business hours?' to GPT-5.4 ($2.00/1M input tokens) instead of o4-mini ($1.10/1M input tokens) is a nearly 2x overspend for identical quality.
| Query Complexity | Example | Best Model Tier | Cost/1K tokens |
|---|---|---|---|
| Simple factual | What are your business hours? | o4-mini / Llama 4 8B | $0.0011 |
| Moderate lookup | How do I reset my password with 2FA enabled? | o4-mini / Llama 4 8B | $0.0011 |
| Multi-step reasoning | Compare plan A vs plan B for my usage pattern | GPT-5.4 / Llama 4 70B | $0.002 |
| Complex analysis | Analyze this error log and suggest a fix | GPT-5.4 / Claude Sonnet 4.6 | $0.003 |
| Expert reasoning | Review this contract clause for compliance issues | GPT-5.4 / Claude Opus 4.6 | $0.015 |
A model router sits between the user and the models, classifying each query by complexity and routing to the appropriate tier. The router itself must be fast (sub-10ms) and cheap (no LLM call for the routing decision). The savings come from the price differential between tiers: routing 65% of traffic to a model that is 10-16x cheaper reduces total cost by 40-60%.
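The savings arithmetic can be made concrete. The sketch below uses the per-1K rates from the table above and assumes 65% of traffic routes to the cheapest tier, with the remainder staying on the most expensive tier; the function name and the 65% split are illustrative.

```python
def blended_cost(cheap_rate: float, expensive_rate: float,
                 cheap_share: float) -> float:
    """Average cost per 1K tokens when cheap_share of traffic goes to
    the cheap tier and the rest to the expensive tier."""
    return cheap_share * cheap_rate + (1 - cheap_share) * expensive_rate

# Rates from the table ($/1K tokens); 65% of traffic routed cheap.
baseline = 0.015                                 # everything on the top tier
routed = blended_cost(0.0011, 0.015, 0.65)       # 0.65*0.0011 + 0.35*0.015
savings = 1 - routed / baseline                  # ~0.60, i.e. ~60% reduction
```

With a smaller price differential between tiers (say 2-4x rather than ~14x), the same 65% split yields savings nearer the bottom of the 40-60% range, which is why the economics depend as much on the tier gap as on routing accuracy.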