Multi-Hop Retrieval

Multi-hop retrieval handles questions that require combining facts from multiple documents — but it costs 3–6× more per query than single-hop and compounds retrieval errors at each step. This article covers when multi-hop is worth it, which of three patterns to use, and how to evaluate and monitor it before trusting it in production.

Quick Reference

→Multi-hop: questions where no single chunk contains the full answer — multiple retrieval rounds, each informed by the previous
→Three patterns: iterative (reactive gap analysis), decomposition (proactive planning + parallel), LangGraph agent (state machine)
→Each hop costs ~200ms retrieval + ~500ms LLM gap analysis — a 3-hop query takes 2–3 seconds minimum
→Cost is 3–6× higher than single-hop; use GPT-5.4-mini for gap analysis and reserve the full model for the final answer only
→Errors compound: at 85% per-hop precision, three hops give 0.85³ ≈ 61% end-to-end; at 80% per-hop precision, three hops give 0.80³ ≈ 51%
→Always compare multi-hop against single-hop baseline before deploying — it doesn't always win
→Route by query complexity first: classify before retrieval and keep the single-hop fast path for 70–80% of traffic
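
The latency and error-compounding figures above are simple arithmetic, and a short sketch makes it easy to plug in your own numbers. It assumes that hops fail independently and run sequentially, and that the per-hop timings are the rough estimates quoted in the list; the function names are illustrative, not from any library.

```python
def end_to_end_precision(per_hop_precision: float, hops: int) -> float:
    """Chance that every hop retrieves the right context.

    Assumes hops succeed or fail independently (a simplification).
    """
    return per_hop_precision ** hops


def min_latency_ms(hops: int,
                   retrieval_ms: float = 200.0,
                   gap_analysis_ms: float = 500.0) -> float:
    """Lower bound on latency for sequential hops.

    Excludes the final answer generation, so real latency is higher.
    """
    return hops * (retrieval_ms + gap_analysis_ms)


for p in (0.85, 0.80):
    print(f"{p:.0%} per hop -> {end_to_end_precision(p, 3):.0%} after 3 hops")
print(f"3 hops -> at least {min_latency_ms(3) / 1000:.1f} s")
```

With the defaults, three hops come to 2.1 seconds before the final answer is generated, which is where the 2–3 second floor comes from.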

When (and When Not) to Use Multi-Hop

Standard RAG retrieves once and generates. Multi-hop retrieval performs multiple rounds, using each result to refine the next query. It's worth the cost and latency overhead for four specific question patterns:

▸Comparison questions: 'How does Product A's refund policy compare to Product B's?' — needs separate retrievals for each product
▸Aggregation questions: 'What is the total headcount across all engineering teams?' — needs each team's page and a summation step
▸Reasoning chains: 'Is this company eligible for the R&D tax credit?' — needs the eligibility rules AND the company's qualifying activities, from different documents
▸Temporal questions: 'How has our data retention policy changed since 2023?' — needs multiple versions of the policy
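
The four patterns above can double as labels for a cheap pre-retrieval router that keeps everything else on the single-hop path. The sketch below is a keyword heuristic only: the cue lists and the classify function are hypothetical stand-ins, and a production router would more likely use a small LLM or a trained classifier.

```python
import re

# Hypothetical cue patterns, one per question type listed above.
# These keyword lists are illustrative and far from exhaustive.
MULTI_HOP_CUES = {
    "comparison": re.compile(r"\b(compare[sd]?|versus|vs\.?|differ(s|ence)?)\b", re.I),
    "aggregation": re.compile(r"\b(total|sum|across all|combined|overall)\b", re.I),
    "reasoning": re.compile(r"\b(eligible|qualif(y|ies)|allowed to|comply|compliant)\b", re.I),
    "temporal": re.compile(r"\b(changed|since \d{4}|over time|history of)\b", re.I),
}


def classify(question: str) -> str:
    """Return the first matching multi-hop pattern, else 'single_hop'."""
    for label, pattern in MULTI_HOP_CUES.items():
        if pattern.search(question):
            return label
    return "single_hop"


print(classify("How has our data retention policy changed since 2023?"))
print(classify("What is the CEO's name?"))
```

Anything labeled single_hop skips the multi-hop machinery entirely, so a misclassification costs answer quality in one direction and latency in the other; the cue lists are worth checking against real traffic before relying on them.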

Multi-hop is not the right tool if the real problem is chunking strategy. Before reaching for multi-hop, ask: would larger chunks, parent-child chunking, or document-level summaries make the question single-hop? Multi-hop adds latency and compounds errors — fix the chunking first if that's the actual gap.

When to skip multi-hop

Skip multi-hop when: (1) the question requires comparing entities that have rich relationship structure — use Graph RAG instead; (2) the answer exists in a single document but your chunks are too small to capture it — fix your chunking strategy; (3) you're answering factoid questions like 'What is the CEO's name?' — single-hop is always faster and cheaper.

Each hop: retrieve → analyze → refine. Stop when the context is sufficient or the hop limit is reached.

Three Multi-Hop Patterns Compared

There are three production-viable multi-hop patterns. The right one depends on whether you know the sub-queries upfront, how much observability you need, and whether your infrastructure supports async execution.

Pattern 1: Iterative Retrieve-Reason-Retrieve

Iterative retrieval is the simplest pattern. After the first retrieval, an LLM inspects the results and decides whether more context is needed. If yes, it generates a targeted follow-up query and retrieves again. This continues until the LLM has enough context or a maximum hop count is reached. Use this when you can't predict the sub-queries upfront.
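
A minimal sketch of that loop, assuming the retriever and the LLM gap analysis are passed in as plain callables. GapAnalysis, iterative_retrieve, and the toy stubs are hypothetical names for illustration; in a real system retrieve would call your vector store and analyze would call the small model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GapAnalysis:
    sufficient: bool                       # does the context answer the question?
    follow_up_query: Optional[str] = None  # targeted query for the next hop


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    analyze: Callable[[str, List[str]], GapAnalysis],
    max_hops: int = 3,
) -> List[str]:
    """Retrieve, inspect the gaps, and retrieve again, up to max_hops times."""
    context: List[str] = []
    query = question
    for _ in range(max_hops):
        # Keep only chunks not already collected in earlier hops.
        context.extend(c for c in retrieve(query) if c not in context)
        verdict = analyze(question, context)
        if verdict.sufficient or not verdict.follow_up_query:
            break
        query = verdict.follow_up_query
    return context


# Toy stand-ins for a vector store and an LLM gap analysis.
DOCS = {
    "refund policy Product A": ["Product A: refunds within 30 days."],
    "refund policy Product B": ["Product B: refunds within 14 days."],
}


def toy_retrieve(query: str) -> List[str]:
    return DOCS.get(query, [])


def toy_analyze(question: str, context: List[str]) -> GapAnalysis:
    if not any("Product B" in c for c in context):
        return GapAnalysis(False, "refund policy Product B")
    return GapAnalysis(True)


chunks = iterative_retrieve("refund policy Product A", toy_retrieve, toy_analyze)
print(chunks)
```

The hop limit is the only hard stop here. The loop also exits early if the analysis reports a gap but produces no follow-up query, since retrying the same query would only add cost.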
