Multi-Hop Retrieval
Multi-hop retrieval handles questions that require combining facts from multiple documents — but it costs 3–6× more per query than single-hop and compounds retrieval errors at each step. This article covers when multi-hop is worth it, which of three patterns to use, and how to evaluate and monitor it before trusting it in production.
Quick Reference
- →Multi-hop: questions where no single chunk contains the full answer — multiple retrieval rounds, each informed by the previous
- →Three patterns: iterative (reactive gap analysis), decomposition (proactive planning + parallel), LangGraph agent (state machine)
- →Each hop costs ~200ms retrieval + ~500ms LLM gap analysis — a 3-hop query takes 2–3 seconds minimum
- →Cost is 3–6× higher than single-hop; use GPT-5.4-mini for gap analysis, reserve full model for final answer only
- →Errors compound: 85% per-hop precision → 85%³ = 61% end-to-end. At 80% precision → hop 3 is 51%
- →Always compare multi-hop against single-hop baseline before deploying — it doesn't always win
- →Route by query complexity first: classify before retrieval and keep the single-hop fast path for 70–80% of traffic
When (and When Not) to Use Multi-Hop
Standard RAG retrieves once and generates. Multi-hop retrieval performs multiple rounds, using each result to refine the next query. It's worth the cost and latency overhead for four specific question patterns:
- ▸Comparison questions: 'How does Product A's refund policy compare to Product B's?' — needs separate retrievals for each product
- ▸Aggregation questions: 'What is the total headcount across all engineering teams?' — needs each team's page and a summation step
- ▸Reasoning chains: 'Is this company eligible for the R&D tax credit?' — needs the eligibility rules AND the company's qualifying activities, from different documents
- ▸Temporal questions: 'How has our data retention policy changed since 2023?' — needs multiple versions of the policy
Multi-hop is not the right tool if the real problem is chunking strategy. Before reaching for multi-hop, ask: would larger chunks, parent-child chunking, or document-level summaries make the question single-hop? Multi-hop adds latency and compounds errors — fix the chunking first if that's the actual gap.
Skip multi-hop when: (1) the question requires comparing entities that have rich relationship structure — use Graph RAG instead; (2) the answer exists in a single document but your chunks are too small to capture it — fix your chunking strategy; (3) you're answering factoid questions like 'What is the CEO's name?' — single-hop is always faster and cheaper.
Each hop: retrieve → analyze → refine. Stop when sufficient or hop limit reached.