
Self-Corrective RAG: Grade, Rewrite, Re-Retrieve

Self-corrective RAG adds document grading and query rewriting to the retrieval loop: if the retrieved documents don't answer the question, the system rewrites the query and retrieves again. This article covers when the complexity is justified, the real cost and latency tradeoffs, and how to build, evaluate, and monitor the grading loop in production.

Quick Reference

  • Only add self-corrective RAG if ≥20% of your queries return zero relevant documents in top-5 — measure first
  • The grade-rewrite loop adds ~$0.009 and ~2s per query at zero retries; ~$0.019 and ~5.7s with one retry (sequential grading)
  • Use structured output (GradeDocuments) for consistent yes/no relevance judgments with reasoning
  • Hybrid grading: skip the LLM when the retrieval score is <0.3 (reject) or >0.8 (accept); send only the 0.3–0.8 band to the LLM grader
  • Set max retries to 2 — a third retry almost never finds documents that two failed to retrieve
  • Build a grading eval set (30+ queries, human-labeled) before deploying — a miscalibrated grader makes things worse
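
The hybrid grading rule in the list above can be sketched as a small router. This is a minimal illustration, not a definitive implementation: it assumes retrieval scores are normalized to [0, 1] with higher meaning more similar, the field names on GradeDocuments (relevant, reasoning) are guesses at what a yes/no-plus-reasoning schema might look like, and llm_grader is a hypothetical callable standing in for whatever structured-output LLM call you use.

```python
from dataclasses import dataclass
from typing import Callable

# Thresholds taken from the quick reference; tune them on your own eval set.
REJECT_BELOW = 0.3
ACCEPT_ABOVE = 0.8


@dataclass
class GradeDocuments:
    """Structured verdict: a yes/no decision plus short reasoning.

    Field names are assumptions made for this sketch.
    """
    relevant: bool
    reasoning: str


def hybrid_grade(
    question: str,
    doc: str,
    score: float,
    llm_grader: Callable[[str, str], GradeDocuments],
) -> GradeDocuments:
    """Grade one document, calling the LLM only for the ambiguous band."""
    if score < REJECT_BELOW:
        # Clearly irrelevant by score alone: reject without an LLM call.
        return GradeDocuments(False, f"score {score:.2f} < {REJECT_BELOW}")
    if score > ACCEPT_ABOVE:
        # Clearly relevant by score alone: accept without an LLM call.
        return GradeDocuments(True, f"score {score:.2f} > {ACCEPT_ABOVE}")
    # 0.3 <= score <= 0.8: the score is inconclusive, so ask the LLM grader.
    return llm_grader(question, doc)
```

Scores exactly at 0.3 or 0.8 fall into the graded band here, which matches the "<0.3" and ">0.8" wording; whether the boundaries should be inclusive is a judgment call to settle against your grading eval set.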

When Self-Corrective RAG Is Overkill

Self-corrective RAG solves one specific problem: the retriever returns irrelevant documents because the query was ambiguous, jargon-heavy, or terminologically mismatched with the corpus. If that's not your problem, the grading loop adds cost and latency without improving answers.

Measure before you build

Take 50 real queries from your production traffic. For each, retrieve top-5 documents and manually check whether at least one is relevant. If fewer than 20% of queries have zero relevant documents in top-5, self-corrective RAG won't help — fix your chunking strategy or embedding model first.
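
The measurement above reduces to a single ratio. The sketch below assumes you have recorded your manual labels as one list of booleans per query (one entry per retrieved document); the function names and the data layout are illustrative, not part of any library.

```python
def zero_relevant_rate(labels: list[list[bool]]) -> float:
    """Fraction of queries whose top-k results contain no relevant document.

    labels[i][j] is True if the j-th retrieved document for query i
    was judged relevant by a human reviewer.
    """
    if not labels:
        raise ValueError("need at least one labeled query")
    misses = sum(1 for per_query in labels if not any(per_query))
    return misses / len(labels)


def worth_considering(labels: list[list[bool]], threshold: float = 0.20) -> bool:
    """Apply the 20% rule of thumb from the text."""
    return zero_relevant_rate(labels) >= threshold


# Example: 50 labeled queries, 12 of which had no relevant document in top-5.
sample = [[False] * 5] * 12 + [[True, False, False, False, False]] * 38
rate = zero_relevant_rate(sample)  # 12 / 50 = 0.24
decision = worth_considering(sample)  # True, since 0.24 >= 0.20
```

With only 50 queries, a single relabeled query moves the rate by 2 percentage points, so treat results near the 20% line as inconclusive and label more queries before deciding.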

Your situation | Right fix
Standard RAG returns relevant docs for 90%+ of queries | Don't add self-corrective RAG; you're solving the wrong problem
Retrieval works but answers are hallucinated | Add answer validation or a better generation model; document grading won't help
Relevant docs exist but aren't retrieved (query/corpus terminology mismatch) | Self-corrective RAG with query rewriting is the right fix
Latency budget is under 2 seconds | Sequential grading alone takes ~2s; don't add this loop
Small, well-indexed corpus with consistent terminology | Standard RAG is sufficient; the grader will rarely trigger

The case where self-corrective RAG clearly earns its complexity: users write queries using colloquial or domain-inconsistent language ('the thing that makes blood clot' instead of 'coagulation cascade'), and your corpus uses formal terminology. The grader catches the mismatch and the rewriter bridges it.
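
To make the grade, rewrite, re-retrieve cycle concrete, here is a minimal sketch of the loop with a retry cap of 2. Everything named here is hypothetical: retrieve, is_relevant, and rewrite are stand-ins for your retriever, LLM grader, and LLM query rewriter, and the toy functions below them use substring matching only so the example runs without any external service. Grading against the original question rather than the rewritten query is a design choice of this sketch, made so that a drifting rewrite cannot change what counts as relevant.

```python
from typing import Callable

MAX_RETRIES = 2  # cap from the quick reference


def corrective_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    rewrite: Callable[[str, str], str],
    max_retries: int = MAX_RETRIES,
) -> tuple[list[str], str, int]:
    """Return (relevant docs, final query, retries used)."""
    query = question
    for attempt in range(max_retries + 1):
        docs = retrieve(query)
        # Always grade against the user's original question.
        kept = [d for d in docs if is_relevant(question, d)]
        if kept:
            return kept, query, attempt
        if attempt < max_retries:
            query = rewrite(question, query)
    return [], query, max_retries


# Toy stand-ins so the sketch is runnable end to end.
CORPUS = [
    "The coagulation cascade is a series of reactions that forms a fibrin clot.",
    "Hemoglobin transports oxygen in red blood cells.",
]


def toy_retrieve(query: str) -> list[str]:
    # Deliberately strict "retriever": exact phrase match only.
    return [d for d in CORPUS if query.lower() in d.lower()]


def toy_is_relevant(question: str, doc: str) -> bool:
    # Placeholder for an LLM grader.
    return "clot" in doc.lower()


def toy_rewrite(question: str, previous_query: str) -> str:
    # Placeholder for an LLM rewriter that maps to corpus terminology.
    return "coagulation cascade"


docs, final_query, retries = corrective_retrieve(
    "the thing that makes blood clot", toy_retrieve, toy_is_relevant, toy_rewrite
)
# docs == [CORPUS[0]], final_query == "coagulation cascade", retries == 1
```

When every attempt fails, the function returns an empty list after max_retries rewrites, which gives the caller a clear signal to fall back (for example, answering "I don't know") instead of generating from irrelevant context.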