Router-Based RAG: Multi-Source Knowledge
Route queries to the optimal retrieval source — vector store, SQL database, API, or web search. This article covers when routing earns its keep (and when querying all sources is cheaper), production-grade implementation with safe SQL and model tiering, multi-source fusion with reciprocal rank fusion, fallback chains, and the RAG-specific failure modes that classification accuracy alone won't catch.
Quick Reference
- →Router-Based RAG classifies the query once and routes to the optimal retrieval source — vector, SQL, API, or web search
- →Use method='json_schema' with a reasoning-first Pydantic schema to guarantee schema-valid routing decisions (well-formed and limited to allowed sources, though not necessarily correct)
- →Model tiering: Haiku classifies (~$0.0008/query), Sonnet generates — routing adds <10% to per-query cost
- →Multi-source fusion: the Send() API fans out to parallel sources, and RRF merges their rankings (each document scores the sum of 1/(60+rank) across the lists it appears in)
- →Fallback chains: primary source → quality check → secondary source → last resort → generate
- →Safe SQL: validate generated SQL against a table/operation allowlist; never execute raw LLM output
- →Cross-reference: general classification, eval, and drift machinery is in the Router Pattern article
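The reasoning-first schema from the Quick Reference can be sketched without Pydantic so the example stays dependency-free. `ROUTE_SCHEMA` is a hand-written JSON Schema of roughly the shape a reasoning-first Pydantic model would serialize to, and `parse_route` is a hypothetical helper that re-checks a reply by hand. The route labels are placeholders, and the claim that fields are generated in schema order holds only for providers that decode that way.

```python
import json
from typing import List

# Hypothetical route labels; match them to your real backends.
SOURCES: List[str] = ["vector", "sql", "api", "web"]

# Hand-written JSON Schema, roughly what a reasoning-first Pydantic model would emit.
# "reasoning" is declared before "source", so a provider that generates fields in
# schema order writes its rationale before it commits to a route.
ROUTE_SCHEMA = {
    "title": "RouteDecision",
    "type": "object",
    "properties": {
        "reasoning": {"type": "string", "description": "Why this source fits the query"},
        "source": {"type": "string", "enum": SOURCES},
    },
    "required": ["reasoning", "source"],
    "additionalProperties": False,
}


def parse_route(raw: str) -> str:
    """Check a model reply against ROUTE_SCHEMA by hand and return the chosen source.

    With strict json_schema enforcement these checks should never fire; they matter
    when a provider treats the schema only as a hint.
    """
    data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("routing reply must be a JSON object")
    expected = set(ROUTE_SCHEMA["required"])
    if set(data) != expected:
        raise ValueError(f"unexpected or missing fields: {sorted(set(data) ^ expected)}")
    if not isinstance(data["reasoning"], str) or not data["reasoning"].strip():
        raise ValueError("reasoning must be a non-empty string")
    if data["source"] not in SOURCES:
        raise ValueError(f"unknown source: {data['source']!r}")
    return data["source"]
```

Schema validity only proves the reply is well-formed and names a known source; whether it picked the *right* source is what the eval harness measures.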
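The RRF formula in the Quick Reference (score = sum of 1/(60+rank)) is short enough to show directly. This is a minimal sketch assuming each source returns an ordered list of document IDs that are comparable across sources; `reciprocal_rank_fusion` is a hypothetical name, and ranks start at 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Iterable[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse rankings: score(d) = sum over lists containing d of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for doc_ids in ranked_lists:
        # enumerate from 1 so the top hit in each list contributes 1 / (k + 1)
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Fusing `["a", "b", "c"]`, `["b", "d"]`, and `["c", "b"]` yields the order b, c, a, d: b appears in all three lists, so it outscores a, which tops only the first. Because RRF uses only ranks, the sources' raw scores (cosine similarity, BM25, SQL row order) never need to be on a common scale.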
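The fallback chain in the Quick Reference (primary → quality check → secondary → last resort → generate) can be sketched as a loop over retrievers. The quality gate `has_enough_context` and the helper `retrieve_with_fallback` are hypothetical; a real gate might check relevance scores or row counts instead of simply counting passages.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Retriever = Callable[[str], List[str]]


def has_enough_context(docs: List[str], min_docs: int = 1) -> bool:
    """Hypothetical quality gate: at least `min_docs` non-blank passages came back."""
    return sum(1 for d in docs if d.strip()) >= min_docs


def retrieve_with_fallback(
    query: str,
    chain: Sequence[Tuple[str, Retriever]],
    quality_check: Callable[[List[str]], bool] = has_enough_context,
) -> Tuple[Optional[str], List[str]]:
    """Walk primary -> secondary -> last resort; return the first result that passes."""
    for name, retriever in chain:
        try:
            docs = retriever(query)
        except Exception:
            # Timeouts, rate limits, rejected SQL: treat as a miss (log it in production).
            continue
        if quality_check(docs):
            return name, docs
    # No source passed: the generate step should admit it has no grounded answer.
    return None, []
```

Returning the winning source's name alongside the passages lets the generation prompt cite where the context came from.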
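One way to enforce the table/operation allowlist from the Quick Reference, shown here as a swapped-in, SQLite-only technique: rather than parsing the SQL text, install the standard library's `Connection.set_authorizer` hook so SQLite itself refuses to compile any statement that touches a non-allowlisted table or performs anything other than a read. `ALLOWED_TABLES`, `open_sql_tool`, and `run_generated_sql` are hypothetical names; other databases need their own mechanism (for example a SQL parser plus a read-only role).

```python
import sqlite3

# Hypothetical allowlist: the only tables the SQL retrieval tool may read.
ALLOWED_TABLES = frozenset({"orders"})
# Action code 31 (SQLITE_FUNCTION) from sqlite3.h, written out because not every
# Python build exports it as a module constant.
SQLITE_FUNCTION = 31


def allowlist_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Called by SQLite while it compiles a statement; anything not allowed is denied."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        return sqlite3.SQLITE_OK if arg1 in ALLOWED_TABLES else sqlite3.SQLITE_DENY
    if action == SQLITE_FUNCTION:
        return sqlite3.SQLITE_OK  # permits COUNT(), SUM(), ...; narrow this in production
    # INSERT, UPDATE, DELETE, DROP, PRAGMA, ATTACH, transactions, ...
    return sqlite3.SQLITE_DENY


def open_sql_tool(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the authorizer on a connection dedicated to LLM-generated queries."""
    conn.set_authorizer(allowlist_authorizer)
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Execute one generated statement; a denied action makes compilation fail."""
    try:
        return conn.execute(sql).fetchall()
    except (sqlite3.DatabaseError, sqlite3.Warning) as exc:
        raise ValueError(f"rejected generated SQL: {exc}") from exc
```

Enforcing the allowlist inside the engine means a cleverly nested subquery cannot slip past a text filter; the trade-off is that the check is tied to SQLite.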
Should I Route Retrieval at All?
Classify query → route to the best data source → generate grounded answer
Routing adds a hop. Every query pays the cost of a classification call before any retrieval happens. On a single vector store, that's pure overhead — standard RAG is strictly better. The question isn't 'how do I build a router?' It's 'do I have the conditions that make routing pay for itself?'
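The classify → route → generate flow, and the extra hop it adds, can be sketched in a few lines. Everything here is hypothetical: keyword rules stand in for the LLM classification call, the three retrievers return canned strings, and `generate` is a placeholder for the generation model.

```python
import re
from typing import Callable, Dict, List, Tuple


# Hypothetical retrievers standing in for a vector store, a SQL tool, and a live API.
def search_docs(query: str) -> List[str]:
    return [f"doc chunk for: {query}"]


def query_metrics(query: str) -> List[str]:
    return [f"SQL rows for: {query}"]


def call_status_api(query: str) -> List[str]:
    return [f"live status for: {query}"]


RETRIEVERS: Dict[str, Callable[[str], List[str]]] = {
    "vector": search_docs,
    "sql": query_metrics,
    "api": call_status_api,
}

# Word-boundary patterns so "count" does not match inside "account".
SQL_HINTS = re.compile(r"\b(revenue|count|average|how many)\b")
API_HINTS = re.compile(r"\b(status|right now|current)\b")


def classify(query: str) -> str:
    """Keyword rules standing in for the LLM classification call."""
    q = query.lower()
    if SQL_HINTS.search(q):
        return "sql"
    if API_HINTS.search(q):
        return "api"
    return "vector"  # semantic lookup is the default route


def generate(query: str, context: List[str]) -> str:
    """Placeholder for the generation model."""
    return f"Answer to {query!r} grounded in {len(context)} passage(s)"


def answer(query: str) -> Tuple[str, str]:
    source = classify(query)             # the extra hop every query pays for
    context = RETRIEVERS[source](query)  # exactly one backend is queried
    return source, generate(query, context)
```

With a single vector store, `classify` would always return `"vector"`, which is exactly the pure-overhead case described above.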
| Condition | Routing is mostly tax | Routing pays for itself |
|---|---|---|
| Number of sources | 1–2 homogeneous vector stores | 3+ heterogeneous backends (vector + SQL + API) |
| Query types | All queries are semantic lookups | Queries split cleanly: docs vs. metrics vs. real-time |
| Source query cost | All sources are cheap (<5ms vector search) | One source is expensive (SQL, rate-limited API) |
| Source interfaces | All sources share the same retrieval API | Sources have fundamentally different interfaces |
| Query volume | <1K queries/day — routing cost is negligible either way | >10K queries/day — cost difference compounds |
At 10K queries/day with 3 sources: querying all 3 in parallel costs 3× retrieval + embedding. Routing costs 1× Haiku classification + 1× retrieval. If source queries are cheap (<5ms vector search on the same cluster), query-all may cost less than adding a classification call. If one source is a rate-limited API or an expensive SQL join, routing pays for itself immediately. Run the numbers for your specific sources before building.
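Running the numbers can be as simple as comparing expected per-query spend. This sketch assumes per-query costs in dollars; the classification cost reuses the ~$0.0008 Haiku figure from the Quick Reference, while every per-source cost and traffic split below is an illustrative placeholder, not a measurement. `daily_costs` is a hypothetical helper.

```python
from typing import Dict


def daily_costs(
    source_costs: Dict[str, float],
    route_share: Dict[str, float],
    classify_cost: float,
    queries_per_day: int,
) -> Dict[str, float]:
    """Daily spend for query-all vs. routing, given per-query costs in dollars.

    route_share is the fraction of traffic the router sends to each source.
    """
    if abs(sum(route_share.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    query_all = sum(source_costs.values())  # every source is hit on every query
    routed = classify_cost + sum(
        route_share.get(name, 0.0) * cost for name, cost in source_costs.items()
    )
    return {"query_all": query_all * queries_per_day, "routed": routed * queries_per_day}
```

With three cheap vector stores at a placeholder $0.00002 per search and 10K queries/day, query-all costs about $0.60/day against about $8.20/day for routing. Swap in a $0.004 SQL path and a $0.01 API (60/30/10 traffic split) and the picture flips: about $140.20/day for query-all versus about $30.12/day routed. This counts only retrieval and classification spend; latency and rate limits are separate inputs to the decision.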
This article focuses on the RAG-specific concerns: heterogeneous retrieval backends, multi-source fusion, source-specific failure modes, and safe SQL. For the general classification strategies (hybrid rules → LLM fallback, eval harness, drift detection, first-30-days runbook), see the Router Pattern article.