Router-Based RAG: Multi-Source Knowledge

Route queries to the optimal retrieval source — vector store, SQL database, API, or web search. This article covers when routing earns its keep (and when querying all sources is cheaper), production-grade implementation with safe SQL and model tiering, multi-source fusion with reciprocal rank fusion, fallback chains, and the RAG-specific failure modes that classification accuracy alone won't catch.

Quick Reference

→Router-Based RAG classifies the query once and routes to the optimal retrieval source — vector, SQL, API, or web search
→Use method='json_schema' with a reasoning-first Pydantic schema for guaranteed valid routing decisions
→Model tiering: Haiku classifies (~$0.0008/query), Sonnet generates — routing adds <10% to per-query cost
→Multi-source fusion: Send() API fans out to parallel sources, RRF merges (score = sum of 1/(60+rank))
→Fallback chains: primary source → quality check → secondary source → last resort → generate
→Safe SQL: validate generated SQL against table/operation allowlist, never execute raw LLM output
→Cross-reference: general classification, eval, and drift machinery is in the Router Pattern article
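Here is a minimal sketch of the RRF merge from the fusion bullet, in plain Python. It assumes each source has already returned a ranked, de-duplicated list of document IDs (best first), and it uses 1-based ranks with k = 60. `rrf_merge` and the source names in the example are hypothetical; they are not an API from any library.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over sources of 1 / (k + rank).

    `ranked_lists` maps a (hypothetical) source name to its results, best first.
    Ranks are 1-based, so each source's top hit contributes 1 / (k + 1).
    """
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists.values():
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    fused = rrf_merge({
        "vector": ["doc_a", "doc_b", "doc_c"],
        "sql": ["doc_b", "doc_d"],
        "web": ["doc_c", "doc_b"],
    })
    for doc_id, score in fused:
        print(f"{doc_id}: {score:.5f}")
```

In the example, doc_b ranks first because it appears in all three lists, even though it is the top hit in only one of them. RRF uses only ranks, so it needs no score normalization across a similarity-scored vector source, a SQL result set, and a web search API.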

Should I Route Retrieval at All?

Classify query → route to the best data source → generate grounded answer

Routing adds a hop. Every query pays the cost of a classification call before any retrieval happens. On a single vector store, that's pure overhead — standard RAG is strictly better. The question isn't 'how do I build a router?' It's 'do I have the conditions that make routing pay for itself?'

Condition	Routing is mostly tax	Routing pays for itself
Number of sources	1–2 homogeneous vector stores	3+ heterogeneous backends (vector + SQL + API)
Query types	All queries are semantic lookups	Queries split cleanly: docs vs. metrics vs. real-time
Source query cost	All sources are cheap (<5ms vector search)	One source is expensive (SQL, rate-limited API)
Source interfaces	All sources share the same retrieval API	Sources have fundamentally different interfaces
Query volume	<1K queries/day — routing cost is negligible either way	>10K queries/day — cost difference compounds

Do the cost math before building the router

At 10K queries/day with 3 sources: querying all 3 in parallel costs one query embedding plus 3 retrievals. Routing costs 1 Haiku classification plus 1 retrieval. If source queries are cheap (<5ms vector search on the same cluster), query-all may well be cheaper than paying for a classification call on every query. If one source is a rate-limited API or an expensive SQL join, routing pays for itself immediately. Run the numbers for your specific sources before building.
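The comparison comes down to a few multiplications. The minimal sketch below makes several assumptions. The per-call source costs and route shares are placeholders, not measurements. `SourceCost` and both helpers are hypothetical names. The ~$0.0008 classification cost comes from the Quick Reference. To keep the sketch small, it leaves out the query embedding and the generation call. That is a simplification, because query-all always embeds and usually sends the generator more context.

```python
from dataclasses import dataclass


@dataclass
class SourceCost:
    name: str
    cost_per_query: float  # USD per retrieval call (placeholder figure)
    route_share: float     # fraction of queries the router would send here


def daily_cost_query_all(sources: list[SourceCost], queries_per_day: int) -> float:
    """Every query hits every source in parallel (embedding/generation ignored)."""
    return queries_per_day * sum(s.cost_per_query for s in sources)


def daily_cost_routed(sources: list[SourceCost], queries_per_day: int,
                      classify_cost: float) -> float:
    """Every query pays one classification, then hits exactly one source."""
    expected_retrieval = sum(s.route_share * s.cost_per_query for s in sources)
    return queries_per_day * (classify_cost + expected_retrieval)


if __name__ == "__main__":
    # Placeholder costs: cheap vector search, pricier SQL, metered external API.
    sources = [
        SourceCost("vector", cost_per_query=0.00001, route_share=0.6),
        SourceCost("sql", cost_per_query=0.0005, route_share=0.3),
        SourceCost("api", cost_per_query=0.002, route_share=0.1),
    ]
    q = 10_000
    print(f"query-all: ${daily_cost_query_all(sources, q):.2f}/day")
    print(f"routed:    ${daily_cost_routed(sources, q, classify_cost=0.0008):.2f}/day")
```

With these placeholders, routing costs about $11.56/day against $25.10/day for query-all. Most of the saving comes from skipping the metered API on 90% of queries. Drop the API's cost to the vector store's level and query-all becomes the cheaper option, which puts you in the "Routing is mostly tax" column of the table above.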

General classification machinery is in the Router Pattern article

This article focuses on the RAG-specific concerns: heterogeneous retrieval backends, multi-source fusion, source-specific failure modes, and safe SQL. For the general classification strategies (hybrid rules → LLM fallback, eval harness, drift detection, first-30-days runbook), see the Router Pattern article.
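The Quick Reference's safe-SQL rule (validate generated SQL against a table/operation allowlist, never execute raw LLM output) can also be enforced inside the database driver rather than by inspecting the SQL text. This minimal sketch assumes a SQLite backend and Python's stdlib `sqlite3`. SQLite consults the authorizer callback registered with `Connection.set_authorizer` while it compiles each statement, so a denied table or operation fails before anything executes. The table names and helper functions are hypothetical. Other databases need their own mechanism, for example a read-only role granted SELECT on specific tables only.

```python
import sqlite3

# Hypothetical allowlist for the text-to-SQL path.
ALLOWED_TABLES = {"orders", "customers"}
# Read-only operations: the SELECT itself, column reads, and function calls like COUNT().
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def allowlist_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Called by SQLite while compiling each statement, before it runs."""
    if action not in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_DENY  # INSERT, UPDATE, DELETE, DROP, PRAGMA, ATTACH, ...
    if action == sqlite3.SQLITE_READ and arg1 not in ALLOWED_TABLES:
        return sqlite3.SQLITE_DENY  # for SQLITE_READ, arg1 is the table name
    return sqlite3.SQLITE_OK


def guard_connection(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the allowlist on a connection reserved for LLM-generated SQL."""
    conn.set_authorizer(allowlist_authorizer)
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    # execute() runs one statement per call, so a stacked
    # "SELECT ...; DROP TABLE ..." string is refused as well.
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Schema and seed data are loaded before the guard goes on.
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
        CREATE TABLE secrets (api_key TEXT);
        INSERT INTO orders (total) VALUES (19.5), (42.0);
        INSERT INTO secrets VALUES ('not-a-real-key');
    """)
    guard_connection(conn)
    print(run_generated_sql(conn, "SELECT COUNT(*), SUM(total) FROM orders"))
    for bad in ("DELETE FROM orders", "SELECT api_key FROM secrets"):
        try:
            run_generated_sql(conn, bad)
        except sqlite3.DatabaseError as exc:
            print(f"blocked {bad!r}: {exc}")
```

A denied statement surfaces as a `sqlite3.DatabaseError`, which the router can treat like any other retrieval failure. The allowlist limits what a query can touch, not how expensive it is, so a LIMIT or timeout is still worth adding.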

How RAG Routing Works

The flow is four steps: classify the query with structured output, route to the matching retrieval node via a conditional edge, retrieve from that source, generate an answer. Each source has a different interface — semantic search, SQL, HTTP API — and different failure modes. The classification step is a single structured output call constrained to valid source names.
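The classify, route, and retrieve steps can be sketched without any framework. In this minimal, dependency-free sketch, the JSON schema stands in for the reasoning-first Pydantic model passed to a structured-output call. `parse_decision` stands in for the library's validation, and the dictionary lookup stands in for a LangGraph conditional edge. The source names, the schema, and the retriever stubs are hypothetical. No LLM is called, so the model's JSON reply is hard-coded.

```python
import json
from dataclasses import dataclass
from typing import Callable, Literal, get_args

# Hypothetical source names; in practice these match your graph's node names.
SourceName = Literal["vectorstore", "sql", "api", "web_search"]
VALID_SOURCES: tuple[str, ...] = get_args(SourceName)

# Schema for the structured-output call. "reasoning" comes first so that, where
# the provider generates keys in schema order, the model justifies before choosing.
ROUTE_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "source": {"type": "string", "enum": list(VALID_SOURCES)},
    },
    "required": ["reasoning", "source"],
    "additionalProperties": False,
}


@dataclass(frozen=True)
class RouteDecision:
    reasoning: str
    source: str  # one of VALID_SOURCES


def parse_decision(raw: str) -> RouteDecision:
    """Parse the classifier's JSON reply and reject sources outside the enum."""
    data = json.loads(raw)
    if not isinstance(data, dict) or data.get("source") not in VALID_SOURCES:
        raise ValueError(f"invalid routing decision: {raw!r}")
    return RouteDecision(reasoning=str(data.get("reasoning", "")), source=data["source"])


# Hypothetical retriever stubs standing in for the real backends.
RETRIEVERS: dict[str, Callable[[str], list[str]]] = {
    "vectorstore": lambda q: [f"doc chunks for {q!r}"],
    "sql": lambda q: [f"rows for {q!r}"],
    "api": lambda q: [f"API payload for {q!r}"],
    "web_search": lambda q: [f"web results for {q!r}"],
}


def route_and_retrieve(query: str, raw_decision: str) -> tuple[str, list[str]]:
    decision = parse_decision(raw_decision)
    # This lookup does the conditional edge's job: the validated
    # source name selects exactly one retrieval node.
    return decision.source, RETRIEVERS[decision.source](query)


if __name__ == "__main__":
    print(json.dumps(ROUTE_SCHEMA))  # "reasoning" serializes before "source"
    # Hard-coded stand-in for the structured-output call's reply.
    reply = '{"reasoning": "Asks for a metric stored in the orders table.", "source": "sql"}'
    print(route_and_retrieve("What was revenue last quarter?", reply))
```

In the article's setup, `method='json_schema'` is what guarantees a valid source name. Re-checking against the same enum here is a cheap second line of defense, so a bad reply fails loudly instead of silently landing on one backend.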

Routing Cost vs. Query-All-Sources

Three retrieval architectures, each with different cost and latency profiles. Assumptions: 3 sources (1 vector store, 1 SQL database, 1 external API), 10K queries/day, Haiku at $0.80/MTok input + $4.00/MTok output, Sonnet at $3.00/MTok input + $15.00/MTok output. Classification prompt ~600 tokens input, 50 tokens output. Retrieval costs vary by source.
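The classification-side arithmetic is easy to check. This minimal sketch assumes Haiku prices of $0.80/MTok input and $4.00/MTok output, the rates at which the 600-in/50-out classification prompt lands near the Quick Reference's ~$0.0008/query. The Sonnet prices are the ones stated above. The 3,000-input/400-output generation call is a hypothetical size, not a figure from this article.

```python
def llm_call_cost(input_tokens: int, output_tokens: int,
                  usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Dollar cost of one LLM call, given per-million-token prices."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


HAIKU = (0.80, 4.00)    # assumed USD per million input / output tokens
SONNET = (3.00, 15.00)  # USD per million input / output tokens, as stated above

if __name__ == "__main__":
    classify = llm_call_cost(600, 50, *HAIKU)
    # Hypothetical generation call: 3,000 tokens of context + prompt, 400-token answer.
    generate = llm_call_cost(3_000, 400, *SONNET)
    print(f"classification per query: ${classify:.5f}")
    print(f"classification per day (10K queries): ${classify * 10_000:.2f}")
    print(f"routing overhead vs. generation: {classify / generate:.1%}")
```

This prints $0.00068 per classification, $6.80/day at 10K queries, and a 4.5% overhead relative to the hypothetical generation call. The share only shrinks as generation prompts grow, which is why the "<10% of per-query cost" claim is easiest to meet when retrieved context is large.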
