Classification & Routing Patterns
Query routing is the highest-leverage optimization in an agent system — it determines which model, which tools, and how much context each query gets. This article covers the three routing strategies (keyword, embedding, LLM), how to cascade them in production, how to evaluate and monitor router accuracy, and how to defend against the failure modes that will bite you.
Quick Reference
- →Keyword matching is free and < 1ms; use it as tier 1 for known exact patterns (commands, error codes, trigger words)
- →Embedding similarity runs in ~20ms at ~$0.000002/query; use it for semantic matching when keyword fails
- →LLM classification handles nuance and ambiguity at ~800ms / ~$0.0005/query; use it only when embedding confidence is below threshold
- →Tiered routing (keyword → embedding → LLM) reduces average cost ~80% vs routing all queries to LLM — exact savings depend on your traffic mix
- →Set per-class recall gates in CI — a router that misclassifies 20% of billing queries silently costs more than it saves
- →Monitor the confidence distribution weekly — a rising fallback rate means your query distribution shifted; re-baseline before accuracy degrades
- →Validate route names after LLM classification; a hallucinated route name sends queries to the fallback handler with no error signal
When Routing Is Overhead
Before building a router, ask whether your system actually benefits from one. Routing adds a classification step, a new failure mode, and an ongoing maintenance burden. The savings only materialize when your queries have meaningfully different complexity profiles — and when you have enough volume for the savings to matter.
Fewer than 3 distinct query categories: the overhead exceeds the savings. Uniform query complexity: if all queries need the same model and tools, routing just adds latency. Low volume (< 500 queries/day): the operational complexity isn't worth the cost savings. Your system prompt already does triage via conditional tool use: adding a router layer duplicates the classification.
Routing earns its place when you have categories with meaningfully different cost profiles — a FAQ answer costs 50× less than a multi-step technical debug — and enough volume that 80% cost savings on the cheap category compounds into real money. If your cheapest and most expensive queries cost the same to handle, routing is theatre.