Advanced · 18 min

Classification & Routing Patterns

Query routing is the highest-leverage optimization in an agent system — it determines which model, which tools, and how much context each query gets. This article covers the three routing strategies (keyword, embedding, LLM), how to cascade them in production, how to evaluate and monitor router accuracy, and how to defend against the failure modes that will bite you.

Quick Reference

→Keyword matching is free and < 1ms; use it as tier 1 for known exact patterns (commands, error codes, trigger words)
→Embedding similarity runs in ~20ms at ~$0.000002/query; use it for semantic matching when keyword fails
→LLM classification handles nuance and ambiguity at ~800ms / ~$0.0005/query; use it only when embedding confidence is below threshold
→Tiered routing (keyword → embedding → LLM) reduces average cost ~80% vs routing all queries to the LLM; exact savings depend on your traffic mix
→Set per-class recall gates in CI — a router that misclassifies 20% of billing queries silently costs more than it saves
→Monitor the confidence distribution weekly — a rising fallback rate means your query distribution shifted; re-baseline before accuracy degrades
→Validate route names after LLM classification; a hallucinated route name sends queries to the fallback handler with no error signal
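The tiered flow summarized above can be sketched roughly as follows. This is a minimal illustration, not a production router: the route names, the keyword table, and the 0.80 threshold are hypothetical, and `embed_classify` and `llm_classify` are assumed callables standing in for whatever embedding index and LLM client you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical route names, patterns, and threshold, chosen only for illustration.
VALID_ROUTES = {"billing", "technical", "faq"}
FALLBACK_ROUTE = "fallback"
EMBEDDING_THRESHOLD = 0.80  # assumed value; tune against your own eval set
KEYWORD_ROUTES = {"/refund": "billing", "err_502": "technical"}


@dataclass
class RouteDecision:
    route: str
    tier: str                    # "keyword", "embedding", "llm", or "invalid"
    confidence: Optional[float]  # None when the deciding tier gives no score


def keyword_tier(query: str) -> Optional[str]:
    """Tier 1: substring match on known trigger strings."""
    lowered = query.lower()
    for pattern, route in KEYWORD_ROUTES.items():
        if pattern in lowered:
            return route
    return None


def route_query(
    query: str,
    embed_classify: Callable[[str], tuple[str, float]],
    llm_classify: Callable[[str], str],
) -> RouteDecision:
    # Tier 1: free and fast, handles known exact patterns.
    hit = keyword_tier(query)
    if hit is not None:
        return RouteDecision(hit, "keyword", 1.0)

    # Tier 2: accept the embedding match only above the confidence threshold.
    route, score = embed_classify(query)
    if score >= EMBEDDING_THRESHOLD and route in VALID_ROUTES:
        return RouteDecision(route, "embedding", score)

    # Tier 3: escalate to the LLM, then validate the returned route name.
    label = llm_classify(query).strip().lower()
    if label not in VALID_ROUTES:
        # Tag the decision so a hallucinated name is visible in metrics
        # instead of silently landing in the fallback handler.
        return RouteDecision(FALLBACK_ROUTE, "invalid", None)
    return RouteDecision(label, "llm", None)
```

Recording the deciding tier on every decision is one way to get the fallback-rate and confidence-distribution signals mentioned above; counting `tier == "invalid"` separately distinguishes hallucinated route names from ordinary low-confidence fallbacks.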

When Routing Is Overhead

Before building a router, ask whether your system actually benefits from one. Routing adds a classification step, a new failure mode, and an ongoing maintenance burden. The savings only materialize when your queries have meaningfully different complexity profiles — and when you have enough volume for the savings to matter.

Skip routing if any of these apply

→Fewer than 3 distinct query categories: the overhead exceeds the savings.
→Uniform query complexity: if all queries need the same model and tools, routing just adds latency.
→Low volume (< 500 queries/day): the operational complexity isn't worth the cost savings.
→Your system prompt already does triage via conditional tool use: adding a router layer duplicates the classification.

Routing earns its place when you have categories with meaningfully different cost profiles (a FAQ answer costs 50× less than a multi-step technical debugging session) and enough volume that an 80% cost saving on the cheap category compounds into real money. If your cheapest and most expensive queries cost the same to handle, routing is theatre.

Strategy Comparison and Cost Math

There are three routing strategies, each with a different latency/cost/accuracy tradeoff. The numbers below are computed from published pricing (Haiku: $0.80/$4.00 per MTok input/output; text-embedding-3-small: $0.02/MTok), assuming a typical 300-token classifier input and 60-token output.
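The per-query arithmetic behind these figures can be reproduced with a short sketch. The prices and the 300/60 token counts come from the text; the 100-token embedded query and the traffic mix in the usage note are assumptions made here for illustration, so treat the tiered result as an example rather than a benchmark.

```python
# Prices in USD per million tokens, as quoted in the text.
HAIKU_INPUT_PER_MTOK = 0.80
HAIKU_OUTPUT_PER_MTOK = 4.00
EMBEDDING_PER_MTOK = 0.02

CLASSIFIER_INPUT_TOKENS = 300   # from the text
CLASSIFIER_OUTPUT_TOKENS = 60   # from the text
EMBEDDED_QUERY_TOKENS = 100     # assumption: only the raw query is embedded


def llm_cost_per_query() -> float:
    """Cost of one LLM classification call (input plus output tokens)."""
    return (
        CLASSIFIER_INPUT_TOKENS * HAIKU_INPUT_PER_MTOK
        + CLASSIFIER_OUTPUT_TOKENS * HAIKU_OUTPUT_PER_MTOK
    ) / 1_000_000


def embedding_cost_per_query() -> float:
    """Cost of embedding one query."""
    return EMBEDDED_QUERY_TOKENS * EMBEDDING_PER_MTOK / 1_000_000


def tiered_cost_per_query(keyword_share: float, llm_share: float) -> float:
    """Average cost per query for a keyword -> embedding -> LLM cascade.

    keyword_share: fraction of all queries resolved by keyword match (free).
    llm_share: fraction of all queries that escalate to the LLM.
    Every query that misses the keyword tier is embedded once.
    """
    embedded_share = 1.0 - keyword_share
    return (
        embedded_share * embedding_cost_per_query()
        + llm_share * llm_cost_per_query()
    )


if __name__ == "__main__":
    llm_only = llm_cost_per_query()
    # Hypothetical mix: 40% keyword hits, 20% escalate to the LLM.
    tiered = tiered_cost_per_query(keyword_share=0.4, llm_share=0.2)
    print(f"LLM-only: ${llm_only:.6f}  tiered: ${tiered:.7f}")
    print(f"saving: {1 - tiered / llm_only:.1%}")
```

The LLM call works out to 300 × $0.80 + 60 × $4.00 per million tokens, about $0.00048 per query, which matches the ~$0.0005 figure. Under the assumed 40/40/20 mix the cascade averages roughly $0.0000972 per query, a saving of about 80%; a different traffic mix will give a different number.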

Intent-Based Routing

Intent classification identifies what the user wants to do and routes to a specialized handler. Each handler has its own system prompt, tools, and model — optimized for that specific task. The classifier itself should use the cheapest capable model (Haiku), since it's doing a narrow, structured task with a predictable output format.
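One way to picture the handler setup is a small registry keyed by intent. The sketch below is illustrative only: the intent names, prompts, tool names, and the `small-model` / `large-model` labels are hypothetical placeholders, not real model IDs or a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlerConfig:
    system_prompt: str
    model: str                   # placeholder tier label, not a real model ID
    tools: tuple[str, ...] = ()  # hypothetical tool names


# Hypothetical registry: each intent gets its own prompt, tools, and model.
HANDLERS = {
    "billing": HandlerConfig(
        "You resolve billing questions.", "small-model", ("lookup_invoice",)
    ),
    "technical_debug": HandlerConfig(
        "You debug technical issues step by step.",
        "large-model",
        ("search_logs", "run_diagnostic"),
    ),
    "faq": HandlerConfig("You answer from the FAQ.", "small-model"),
}
DEFAULT_INTENT = "faq"


def build_classifier_prompt(query: str) -> str:
    """Build a narrow prompt whose only valid outputs are registry keys."""
    labels = ", ".join(sorted(HANDLERS))
    return (
        "Classify the user query into exactly one intent.\n"
        f"Allowed intents: {labels}\n"
        "Reply with the intent name only.\n\n"
        f"Query: {query}"
    )


def select_handler(intent: str) -> tuple[HandlerConfig, bool]:
    """Return (config, recognized). recognized is False when the classifier
    output is not a registry key and the default handler was substituted."""
    key = intent.strip().lower()
    if key in HANDLERS:
        return HANDLERS[key], True
    return HANDLERS[DEFAULT_INTENT], False
```

Generating the allowed-intent list from the registry keeps the classifier prompt and the dispatch table from drifting apart, and the `recognized` flag gives the caller something to log when the classifier returns a name outside the registry.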
