
Model Routing

Route queries to the right model based on complexity: send simple questions to cheap, fast models and complex reasoning tasks to expensive, capable models. Achieve 40-60% cost reduction with intelligent routing while maintaining quality on hard queries.

Quick Reference

  • Core insight: 60-70% of production queries can be handled by a model 10x cheaper than your best model
  • Routing strategies: keyword-based (simplest), embedding similarity (moderate), classifier model (best)
  • Typical savings: 40-60% cost reduction with <2% quality loss on a well-tuned router
  • Fallback on quality: if the cheap model's confidence is low, re-route to the expensive model
  • Measure routing accuracy: track per-model satisfaction rates to catch systematic misrouting
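The simplest strategy from the list above, keyword-based routing, can be sketched in a few lines. The model names and trigger patterns here are illustrative assumptions; in practice you would tune the keyword lists against your own traffic.

```python
import re

# Illustrative tier names and trigger patterns -- tune on your own traffic.
CHEAP_MODEL = "o4-mini"
CAPABLE_MODEL = "gpt-5.4"

# Patterns that suggest multi-step reasoning or analysis (assumed examples).
COMPLEX_PATTERNS = [
    r"\b(compare|analy[sz]e|review|debug|explain why)\b",
    r"\b(error log|stack trace|contract|compliance)\b",
]

def route_by_keywords(query: str) -> str:
    """Return the model tier for a query using keyword heuristics."""
    q = query.lower()
    if any(re.search(p, q) for p in COMPLEX_PATTERNS):
        return CAPABLE_MODEL
    return CHEAP_MODEL

print(route_by_keywords("What are your business hours?"))             # cheap tier
print(route_by_keywords("Analyze this error log and suggest a fix"))  # capable tier
```

Keyword routing is brittle (it misses paraphrases), but it runs in microseconds and makes a reasonable first baseline before moving to embedding similarity or a trained classifier.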

The Economics of Model Routing

Not every query needs your best model

In a typical support agent, 60-70% of queries are simple factual lookups or greetings. Sending 'What are your business hours?' to GPT-5.4 ($2.00/1M input tokens) instead of o4-mini ($1.10/1M input tokens) is a nearly 2x overspend for identical quality.

| Query complexity | Example | Best model tier | Cost/1K tokens |
|---|---|---|---|
| Simple factual | What are your business hours? | o4-mini / Llama 4 8B | $0.0011 |
| Moderate lookup | How do I reset my password with 2FA enabled? | o4-mini / Llama 4 8B | $0.0011 |
| Multi-step reasoning | Compare plan A vs plan B for my usage pattern | GPT-5.4 / Llama 4 70B | $0.002 |
| Complex analysis | Analyze this error log and suggest a fix | GPT-5.4 / Claude Sonnet 4.6 | $0.003 |
| Expert reasoning | Review this contract clause for compliance issues | GPT-5.4 / Claude Opus 4.6 | $0.015 |
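Using the per-1K prices from the table, the blended cost of a routed traffic mix is straightforward arithmetic. The 65/35 split below is an illustrative assumption, comparing against a baseline where everything goes to the top tier.

```python
# Prices per 1K tokens from the table above.
CHEAP = 0.0011   # o4-mini / Llama 4 8B tier
EXPERT = 0.015   # top tier (e.g. Claude Opus 4.6)

def blended_cost(cheap_share: float) -> float:
    """Average cost per 1K tokens when cheap_share of traffic hits the cheap tier."""
    return cheap_share * CHEAP + (1 - cheap_share) * EXPERT

baseline = EXPERT              # everything on the best model
routed = blended_cost(0.65)    # 65% of traffic routed to the cheap tier
savings = 1 - routed / baseline
print(routed, savings)
```

With these numbers, routing 65% of traffic to the cheap tier brings the blended cost down by roughly 60%, consistent with the savings range quoted in this section.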

A model router sits between the user and the models, classifying each query by complexity and routing to the appropriate tier. The router itself must be fast (sub-10ms) and cheap (no LLM call for the routing decision). The savings come from the price differential between tiers: routing 65% of traffic to a model that is 10-16x cheaper reduces total cost by 40-60%.
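The quality fallback mentioned in the Quick Reference can be sketched as a thin wrapper around two model clients: try the cheap model first, escalate when its confidence is low. The `Completion` type, the confidence field (e.g. derived from logprobs or a verifier), and the stub model functions are all placeholder assumptions for your real inference client.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Completion:
    text: str
    confidence: float  # assumed to come from logprobs or a verifier model

# Hypothetical inference client signature -- replace with your real API call.
ModelFn = Callable[[str], Completion]

def route_with_fallback(query: str, cheap: ModelFn, capable: ModelFn,
                        threshold: float = 0.7) -> Completion:
    """Try the cheap model first; re-route to the capable model on low confidence."""
    result = cheap(query)
    if result.confidence >= threshold:
        return result
    return capable(query)

# Stub models for demonstration only.
cheap_stub = lambda q: Completion("cheap answer", 0.4)
capable_stub = lambda q: Completion("capable answer", 0.95)
print(route_with_fallback("Review this contract clause", cheap_stub, capable_stub).text)
```

Note the trade-off: a fallback pays for two calls on escalated queries, so it only saves money when the escalation rate stays well below the price ratio between tiers.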