Multi-Hop Retrieval

Multi-hop retrieval handles questions that require combining facts from multiple documents — but it costs 3–6× more per query than single-hop and compounds retrieval errors at each step. This article covers when multi-hop is worth it, which of three patterns to use, and how to evaluate and monitor it before trusting it in production.

Quick Reference

  • Multi-hop: questions where no single chunk contains the full answer — multiple retrieval rounds, each informed by the previous
  • Three patterns: iterative (reactive gap analysis), decomposition (proactive planning + parallel), LangGraph agent (state machine)
  • Each hop costs ~200ms retrieval + ~500ms LLM gap analysis — a 3-hop query takes 2–3 seconds minimum
  • Cost is 3–6× higher than single-hop; use GPT-5.4-mini for gap analysis, reserve full model for final answer only
  • Errors compound: at 85% per-hop precision, three hops give 0.85³ ≈ 61% end-to-end; at 80% per hop, three hops give 0.80³ ≈ 51%
  • Always compare multi-hop against single-hop baseline before deploying — it doesn't always win
  • Route by query complexity first: classify before retrieval and keep the single-hop fast path for 70–80% of traffic
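
The latency and error figures in this list follow from simple arithmetic. A minimal sketch of that arithmetic is below; the function names are illustrative, and the precision calculation assumes each hop succeeds or fails independently, which real pipelines only approximate.

```python
def end_to_end_precision(per_hop_precision: float, hops: int) -> float:
    """Chance that every hop retrieves correctly, assuming independent hops."""
    return per_hop_precision ** hops


def min_latency_ms(hops: int, retrieval_ms: float = 200.0,
                   gap_analysis_ms: float = 500.0) -> float:
    """Lower bound on latency before the final answer call.

    Defaults are the rough per-hop figures quoted in the list above.
    """
    return hops * (retrieval_ms + gap_analysis_ms)


for p in (0.85, 0.80):
    print(f"per-hop {p:.0%} -> 3-hop {end_to_end_precision(p, 3):.0%}")
# per-hop 85% -> 3-hop 61%
# per-hop 80% -> 3-hop 51%

print(f"3 hops: at least {min_latency_ms(3):.0f} ms before final generation")
# 3 hops: at least 2100 ms before final generation
```

The final answer call adds its own latency on top of the 2,100 ms floor, which is why a 3-hop query lands in the 2–3 second range.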

When (and When Not) to Use Multi-Hop

Standard RAG retrieves once and generates. Multi-hop retrieval performs multiple rounds, using each result to refine the next query. It's worth the cost and latency overhead for four specific question patterns:

  • Comparison questions: 'How does Product A's refund policy compare to Product B's?' — needs separate retrievals for each product
  • Aggregation questions: 'What is the total headcount across all engineering teams?' — needs each team's page and a summation step
  • Reasoning chains: 'Is this company eligible for the R&D tax credit?' — needs the eligibility rules AND the company's qualifying activities, from different documents
  • Temporal questions: 'How has our data retention policy changed since 2023?' — needs multiple versions of the policy
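
One way to act on these four patterns is to classify each query before retrieval and send everything else down the single-hop path. The sketch below uses hypothetical keyword cues as a stand-in for that classifier; the regexes and the `classify_query` name are assumptions for illustration, and a production router would more likely use a small LLM or a trained classifier.

```python
import re

# Hypothetical keyword cues for the four multi-hop question patterns.
# Illustrative only: tune or replace these for real traffic.
MULTI_HOP_CUES = {
    "comparison": re.compile(r"\b(compare[ds]?|comparison|versus|vs\.?|differ\w*)\b", re.I),
    "aggregation": re.compile(r"\b(total|sum|across all|how many|average)\b", re.I),
    "reasoning": re.compile(r"\b(eligible|eligibility|qualif\w*|allowed to)\b", re.I),
    "temporal": re.compile(r"\b(changed|since|over time|history of|evolved)\b", re.I),
}


def classify_query(query: str) -> str:
    """Return the first matching multi-hop pattern name, or 'single_hop'."""
    for pattern_name, cue in MULTI_HOP_CUES.items():
        if cue.search(query):
            return pattern_name
    return "single_hop"


print(classify_query("How has our data retention policy changed since 2023?"))
# temporal
print(classify_query("What is the CEO's name?"))
# single_hop
```

Keyword rules like these will misroute some queries, so they are best treated as a cheap first pass whose decisions are checked against a labeled sample.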

Multi-hop is not the right tool if the real problem is chunking strategy. Before reaching for multi-hop, ask: would larger chunks, parent-child chunking, or document-level summaries make the question single-hop? Multi-hop adds latency and compounds errors — fix the chunking first if that's the actual gap.

When to skip multi-hop

Skip multi-hop when: (1) the question requires comparing entities that have rich relationship structure — use Graph RAG instead; (2) the answer exists in a single document but your chunks are too small to capture it — fix your chunking strategy; (3) you're answering factoid questions like 'What is the CEO's name?' — single-hop is always faster and cheaper.

[Figure: User Query → Retrieve (vector search, k=4) → Analyze Gaps (LLM call) → Enough info? If yes: Generate Answer (final LLM call). If no: refine the query and run the next hop. Cap at 2–3 hops; each hop adds ~700ms.]

Each hop: retrieve → analyze → refine. Stop when sufficient or hop limit reached.
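
The retrieve → analyze → refine loop can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `retrieve` and `analyze_gaps` are hypothetical callables you would back with your own vector search and a cheap LLM call, and `GapAnalysis` is an assumed return shape.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GapAnalysis:
    sufficient: bool            # True when the gathered context can answer the question
    next_query: Optional[str]   # refined query for the next hop, if any


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],                   # hypothetical: vector search, e.g. k=4
    analyze_gaps: Callable[[str, List[str]], GapAnalysis],  # hypothetical: cheap LLM call
    max_hops: int = 3,
) -> List[str]:
    """Collect context over up to max_hops rounds of retrieve -> analyze -> refine."""
    context: List[str] = []
    query = question
    for _ in range(max_hops):
        for chunk in retrieve(query):
            if chunk not in context:  # skip chunks already gathered in earlier hops
                context.append(chunk)
        analysis = analyze_gaps(question, context)
        # Stop when the context is sufficient or there is nothing left to ask.
        if analysis.sufficient or not analysis.next_query:
            break
        query = analysis.next_query
    return context
```

The returned context would then go to a single final generation call. Passing the original `question` to `analyze_gaps` on every hop, rather than the refined query, keeps the gap analysis anchored to what the user actually asked.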