
Why Your RAG Returns Garbage

RAG failures are either retrieval problems (wrong chunks retrieved) or generation problems (good context but bad synthesis). Learn to diagnose which stage is broken, fix common issues like chunk boundaries and embedding drift, and build a RAG debugging pipeline.

Quick Reference

  • Step 1 of every RAG debug: look at the retrieved chunks — is the answer in them?
  • If the answer IS in the chunks but the response is wrong → generation problem (prompt, context window)
  • If the answer is NOT in the chunks → retrieval problem (embeddings, chunking, query)
  • Common retrieval failure: chunk boundaries split the answer across two chunks
  • Common generation failure: model ignores relevant context because it is buried in irrelevant chunks
  • Always retrieve more chunks than you need, then re-rank before sending to the LLM
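The over-retrieve-then-re-rank pattern from the last bullet can be sketched as follows. This is a minimal illustration, not a specific library's API: `vector_search` and `score_fn` are hypothetical hooks standing in for your vector store query and your re-ranking model (e.g. a cross-encoder).

```python
# Over-retrieve (recall-oriented), then re-rank (precision-oriented) before
# sending chunks to the LLM. `vector_search` and `score_fn` are hypothetical
# stand-ins for your own retrieval and scoring components.

def retrieve_and_rerank(query, vector_search, score_fn, k_final=5, k_fetch=25):
    """Fetch more candidates than needed, keep only the best k_final."""
    candidates = vector_search(query, top_k=k_fetch)   # cheap, broad first pass
    scored = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:k_final]                            # only these reach the LLM


# Toy usage with a fake search and a crude word-overlap scorer:
docs = ["refund policy: 14 days", "shipping info", "returns require receipt"]
fake_search = lambda q, top_k: docs[:top_k]
overlap = lambda q, c: len(set(q.split()) & set(c.split()))
print(retrieve_and_rerank("refund policy", fake_search, overlap, k_final=2))
```

The key design point is that `k_fetch` and `k_final` are separate knobs: retrieval recall problems are fixed by raising `k_fetch`, while context-dilution problems are fixed by lowering `k_final` or using a stronger `score_fn`.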

The RAG Debugging Flowchart

When a RAG system returns a bad answer, the first question is always: is it a retrieval problem or a generation problem? The debugging strategy is completely different for each. Retrieval problems require fixing your indexing pipeline (chunking, embeddings, metadata). Generation problems require fixing your synthesis prompt or context ordering.
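That first question can be automated as a triage check: if a known fact from the source documents appears in the retrieved chunks but not in the response, the failure is in generation; if it never made it into the chunks, the failure is in retrieval. A minimal sketch, where `retrieved_chunks`, `expected_fact`, and `response` are illustrative inputs you would capture from your own pipeline:

```python
# First-pass RAG triage: classify a bad answer as a retrieval or a
# generation problem by checking where the expected fact appears.

def triage_rag_failure(retrieved_chunks, expected_fact, response):
    """Return which pipeline stage to debug first."""
    fact = expected_fact.lower()
    in_context = any(fact in chunk.lower() for chunk in retrieved_chunks)
    in_response = fact in response.lower()

    if in_response:
        return "no failure: the fact made it into the response"
    if in_context:
        # The answer was retrieved but the model did not use it.
        return "generation problem: check prompt and context ordering"
    # The answer never reached the model.
    return "retrieval problem: check chunking, embeddings, and the query"


chunks = ["Refunds are processed within 14 days.", "Shipping is free over $50."]
print(triage_rag_failure(chunks, "14 days", "Refunds take about a month."))
# → generation problem: check prompt and context ordering
```

Substring matching is a deliberately crude stand-in; in practice you would use a fuzzy or semantic match, but the retrieval-versus-generation split is the same.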

| Symptom | Likely Stage | First Thing to Check |
| --- | --- | --- |
| Answer is completely wrong | Retrieval | Are any relevant chunks in the top-10 results? |
| Answer is partially correct but misses key details | Retrieval | Is the relevant info split across chunk boundaries? |
| Answer contradicts the source documents | Generation | Check if the model is hallucinating despite having correct context |
| Answer is generic/vague despite specific docs existing | Retrieval | Is the query embedding matching the right semantic space? |
| Answer cites the wrong source | Generation | Check chunk metadata: are source labels correct? |
| Answer is good for some queries, garbage for others | Both | Compare retrieval quality across failing vs passing queries |
A RAG Debugger That Inspects Every Stage of the Pipeline
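A stage-inspecting debugger is simply a pipeline runner that keeps every intermediate artifact instead of discarding it. The sketch below assumes nothing about your stack: `search` and `generate` are hypothetical callables you wire up to your own vector store and LLM client, and the prompt template is illustrative.

```python
# Minimal stage-by-stage RAG debugger: run the pipeline once and capture
# each intermediate stage so you can check "is the answer in the chunks?"
# before blaming the model. `search` and `generate` are hypothetical hooks.

from dataclasses import dataclass, field


@dataclass
class RagTrace:
    query: str
    retrieved: list = field(default_factory=list)  # stage 1 output
    prompt: str = ""                               # stage 2 input
    response: str = ""                             # stage 2 output


def debug_rag(query, search, generate, top_k=10):
    """Run retrieval then generation, keeping every stage for inspection."""
    trace = RagTrace(query=query)
    trace.retrieved = search(query, top_k=top_k)
    trace.prompt = (
        "Context:\n" + "\n---\n".join(trace.retrieved)
        + f"\n\nQuestion: {query}"
    )
    trace.response = generate(trace.prompt)
    return trace


# Toy usage with fake components:
fake_search = lambda q, top_k: ["chunk about refunds", "chunk about shipping"]
fake_generate = lambda prompt: "Refunds take 14 days."
trace = debug_rag("How long do refunds take?", fake_search, fake_generate, top_k=2)
print(trace.retrieved)  # step 1 of every RAG debug: is the answer in here?
```

With the full trace in hand, the flowchart above becomes mechanical: inspect `trace.retrieved` to rule retrieval in or out, then `trace.prompt` to see whether the relevant chunk was buried, then `trace.response` against the context.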