
Router-Based RAG: Multi-Source Knowledge

Route queries to the optimal retrieval source — vector store, SQL database, API, or web search. This article covers when routing earns its keep (and when querying all sources is cheaper), production-grade implementation with safe SQL and model tiering, multi-source fusion with reciprocal rank fusion, fallback chains, and the RAG-specific failure modes that classification accuracy alone won't catch.

Quick Reference

  • Router-Based RAG classifies the query once and routes to the optimal retrieval source — vector, SQL, API, or web search
  • Use method='json_schema' with a reasoning-first Pydantic schema for guaranteed valid routing decisions
  • Model tiering: Haiku classifies (~$0.0008/query), Sonnet generates — routing adds <10% to per-query cost
  • Multi-source fusion: Send() API fans out to parallel sources, RRF merges (score = sum of 1/(60+rank))
  • Fallback chains: primary source → quality check → secondary source → last resort → generate
  • Safe SQL: validate generated SQL against table/operation allowlist, never execute raw LLM output
  • Cross-reference: general classification, eval, and drift machinery is in the Router Pattern article
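The RRF formula in the list above (score = sum of 1/(60+rank)) can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it assumes each source returns an ordered list of document IDs with 1-based ranks, and the function name and sample IDs are made up for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of doc IDs; each hit contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical results from two sources (IDs are placeholders).
vector_hits = ["doc_a", "doc_b", "doc_c"]
sql_hits = ["doc_b", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, sql_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no score normalization across sources whose raw scores are not comparable (cosine similarity vs. SQL row order, for instance). A document that appears in both lists, like doc_b here, accumulates two contributions and rises to the top.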

Should I Route Retrieval at All?

[Diagram: Query → Router (classify, structured output) → Vector Store (product docs) / SQL Database (metrics, users) / Web Search (real-time data) → Generate Answer]

Classify query → route to the best data source → generate grounded answer
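The classify → route → generate flow can be sketched with the standard library alone. Everything named here is hypothetical: the keyword-based `classify` function stands in for the LLM classifier, the three retrievers are stubs, and `RouteDecision` only mirrors the idea of a reasoning-first schema (reasoning field before the chosen source).

```python
from dataclasses import dataclass

@dataclass
class RouteDecision:
    reasoning: str  # listed first so a model would explain before committing to a source
    source: str     # one of "vector", "sql", "web"

def classify(query: str) -> RouteDecision:
    """Stand-in for the LLM classifier: a keyword heuristic, for illustration only."""
    q = query.lower()
    if any(w in q for w in ("how many", "average", "revenue")):
        return RouteDecision("asks for an aggregate over structured data", "sql")
    if any(w in q for w in ("today", "latest", "current")):
        return RouteDecision("needs real-time information", "web")
    return RouteDecision("semantic lookup in product docs", "vector")

# Stub retrievers; real ones would hit a vector store, a database, or a search API.
def retrieve_vector(query): return [f"[docs] passage about: {query}"]
def retrieve_sql(query): return [f"[sql] rows answering: {query}"]
def retrieve_web(query): return [f"[web] result for: {query}"]

RETRIEVERS = {"vector": retrieve_vector, "sql": retrieve_sql, "web": retrieve_web}

def answer(query: str) -> dict:
    decision = classify(query)                    # 1. classify once
    context = RETRIEVERS[decision.source](query)  # 2. retrieve from the chosen source only
    # 3. a real system would now call the generation model with `context`
    return {"source": decision.source, "reasoning": decision.reasoning, "context": context}

print(answer("How many users signed up last week?")["source"])
print(answer("How do I configure webhooks?")["source"])
```

The dispatch table keeps source-specific retrieval code out of the classifier, so adding a fourth backend means adding one retriever and one label.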

Routing adds a hop. Every query pays the cost of a classification call before any retrieval happens. On a single vector store, that's pure overhead — standard RAG is strictly better. The question isn't 'how do I build a router?' It's 'do I have the conditions that make routing pay for itself?'

Condition | Routing is mostly tax | Routing pays for itself
Number of sources | 1–2 homogeneous vector stores | 3+ heterogeneous backends (vector + SQL + API)
Query types | All queries are semantic lookups | Queries split cleanly: docs vs. metrics vs. real-time
Source query cost | All sources are cheap (<5ms vector search) | One source is expensive (SQL, rate-limited API)
Source interfaces | All sources share the same retrieval API | Sources have fundamentally different interfaces
Query volume | <1K queries/day (routing cost is negligible either way) | >10K queries/day (cost difference compounds)

Do the cost math before building the router

At 10K queries/day with 3 sources: querying all 3 in parallel costs 3× retrieval + embedding. Routing costs 1× Haiku classification + 1× retrieval. If source queries are cheap (<5ms vector search on the same cluster), query-all may cost less than adding a classification call. If one source is a rate-limited API or an expensive SQL join, routing pays for itself immediately. Run the numbers for your specific sources before building.
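The comparison above can be run as a small calculation. In this sketch, only the ~$0.0008 classification cost and the 10K queries/day volume come from this article; the per-query source costs and the routing shares are invented placeholders to replace with your own measurements.

```python
def daily_cost_query_all(queries_per_day, source_costs):
    """Every query hits every source."""
    return queries_per_day * sum(source_costs)

def daily_cost_routed(queries_per_day, classify_cost, source_costs, route_shares):
    """Every query pays one classification plus one retrieval, weighted by routing share."""
    assert abs(sum(route_shares) - 1.0) < 1e-9, "route shares must sum to 1"
    expected_retrieval = sum(c * s for c, s in zip(source_costs, route_shares))
    return queries_per_day * (classify_cost + expected_retrieval)

QUERIES_PER_DAY = 10_000
CLASSIFY_COST = 0.0008  # per-query Haiku classification cost quoted above

# Scenario A (assumed numbers): three cheap vector stores.
cheap = [0.0001, 0.0001, 0.0001]
cheap_all = daily_cost_query_all(QUERIES_PER_DAY, cheap)
cheap_routed = daily_cost_routed(QUERIES_PER_DAY, CLASSIFY_COST, cheap, [1/3, 1/3, 1/3])

# Scenario B (assumed numbers): vector store, expensive SQL join, rate-limited API.
pricey = [0.0001, 0.005, 0.002]
pricey_all = daily_cost_query_all(QUERIES_PER_DAY, pricey)
pricey_routed = daily_cost_routed(QUERIES_PER_DAY, CLASSIFY_COST, pricey, [0.6, 0.3, 0.1])

print(f"A: query-all ${cheap_all:.2f}/day vs routed ${cheap_routed:.2f}/day")
print(f"B: query-all ${pricey_all:.2f}/day vs routed ${pricey_routed:.2f}/day")
```

With these placeholder numbers, query-all is cheaper in scenario A and routing is cheaper in scenario B, matching the rule of thumb above. The model ignores latency, rate limits, and the cost of misrouted queries, all of which belong in a real estimate.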

General classification machinery is in the Router Pattern article

This article focuses on the RAG-specific concerns: heterogeneous retrieval backends, multi-source fusion, source-specific failure modes, and safe SQL. For the general classification strategies (hybrid rules → LLM fallback, eval harness, drift detection, first-30-days runbook), see the Router Pattern article.