Debugging Retrieval Failures
A systematic approach to diagnosing RAG failures: is the fault in retrieval or in generation? Covers common failure modes, a debugging toolkit, and fixes for the most frequent issues.
Quick Reference
- Step 1: Check what was retrieved — if the right docs aren't in the context, it's a retrieval problem
- Step 2: If docs are correct but answer is wrong, it's a generation problem (prompt/model issue)
- Common retrieval failures: chunk too small, embedding mismatch, stale index, missing metadata filter
- Common generation failures: hallucination, ignoring context, wrong synthesis of correct information
- Always log retrieved docs, similarity scores, and the final prompt for every query
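Per-query logging can be a small append-only helper. A minimal sketch, assuming retrieved results arrive as `(doc_text, score)` pairs; the function name `log_rag_query` and the JSONL file path are illustrative, not from any particular framework:

```python
import json
import time

def log_rag_query(query, retrieved, final_prompt, answer,
                  log_path="rag_queries.jsonl"):
    """Append one JSON record per query: the retrieved docs with their
    similarity scores, the exact prompt sent to the model, and the answer.
    `retrieved` is assumed to be a list of (doc_text, score) pairs."""
    record = {
        "ts": time.time(),
        "query": query,
        "retrieved": [{"text": doc, "score": score} for doc, score in retrieved],
        "final_prompt": final_prompt,
        "answer": answer,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

JSONL keeps every record on one line, so failures can later be grepped by query text or filtered by score without a database.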
Systematic Diagnosis Approach
When a RAG system gives a wrong answer, most teams immediately blame the LLM. But in 70-80% of cases, the root cause is retrieval — the right documents were never found. A systematic diagnosis separates retrieval failures from generation failures, then drills into the specific cause. This saves hours of debugging by narrowing the problem space immediately.
- Step 1: Reproduce the failure — get the exact query that produced the wrong answer
- Step 2: Inspect retrieved documents — are any of them relevant to the question?
- Step 3: If no relevant docs retrieved → RETRIEVAL problem. Investigate chunking, embedding, and search configuration.
- Step 4: If relevant docs retrieved but answer is wrong → GENERATION problem. Investigate prompt, model, and context formatting.
- Step 5: If relevant docs are partially retrieved → RETRIEVAL+GENERATION hybrid problem. The context has some signal but not enough.
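The branching in Steps 3–5 can be reduced to a tiny triage function. A sketch under the assumption that you have manually inspected the context (Step 2) and counted how many known-relevant documents appeared in it; `triage` is a hypothetical name:

```python
def triage(relevant_retrieved: int, total_relevant: int) -> str:
    """Classify a RAG failure from manual inspection counts:
    how many of the known-relevant documents made it into the context."""
    if total_relevant == 0:
        raise ValueError("need at least one known-relevant document to triage")
    if relevant_retrieved == 0:
        return "RETRIEVAL"             # Step 3: right docs never found
    if relevant_retrieved < total_relevant:
        return "RETRIEVAL+GENERATION"  # Step 5: partial signal in context
    return "GENERATION"                # Step 4: docs present, answer still wrong
```

The value of writing it down is the forced ordering: you cannot label a failure "GENERATION" until you have confirmed the relevant documents were actually retrieved.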
Always check the retrieved documents first. Print them. Read them. Is the answer to the user's question in those documents? If not, no prompt engineering or model upgrade will fix the problem. If yes, the fix is in the generation step — prompt, model, or context formatting.
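"Print them, read them" can be partially automated with a crude containment check. A sketch, assuming `(doc_text, score)` pairs again; `inspect_retrieval` and `expected_fact` are illustrative names, and substring matching is deliberately simplistic — it only tells you a fact is definitely present, not that it's absent in paraphrased form:

```python
def inspect_retrieval(query, retrieved, expected_fact=None):
    """Print each retrieved chunk with its score for manual reading.
    If `expected_fact` (a substring the correct answer depends on) is given,
    flag whether any chunk contains it. Returns True if the fact was found."""
    print(f"Query: {query!r}")
    found = False
    for i, (doc, score) in enumerate(retrieved, 1):
        hit = expected_fact is not None and expected_fact.lower() in doc.lower()
        found = found or hit
        marker = "  <-- contains expected fact" if hit else ""
        print(f"[{i}] score={score:.3f}{marker}")
        print(doc[:300])  # truncate long chunks for readability
        print()
    if expected_fact is not None and not found:
        print("Expected fact NOT in any retrieved chunk -> retrieval problem.")
    return found
```

If this returns False, stop tuning prompts: the answer is not in the context, and the fix belongs in chunking, embedding, or search configuration.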