Conversational RAG

How to make RAG handle multi-turn conversations: when to add it, how query condensation works under the hood, the LangGraph and legacy chain implementations, context budget math, failure modes, and the production architecture.

Quick Reference

  • Follow-up questions are incomplete without context — 'What about the pricing?' retrieves nothing useful without knowing which product
  • Query condensation rewrites a follow-up + conversation history into a standalone query — this is the only new step vs. single-turn RAG
  • Condensation subsumes coreference resolution — no separate pronoun-replacement step needed
  • LangGraph with a checkpointer (MemorySaver for development, PostgresSaver for production) is LangChain's recommended approach for stateful conversational RAG
  • create_history_aware_retriever and create_retrieval_chain moved to langchain-classic in LangChain 1.0 — new projects use LangGraph
  • Every follow-up turn costs at least 2 LLM calls (condensation + generation); measure this cost before adding it
  • Context budget is the key tradeoff: history grows ~300 tokens per turn, directly competing with retrieved docs
  • Always log the condensed query — it is the single most important debugging signal for conversational RAG
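The condensation step can be sketched without any framework. This is a minimal, framework-agnostic sketch, not the LangGraph or LangChain API: `condense_query`, `CONDENSE_PROMPT`, and the injected `llm` callable (any function from a prompt string to a completion string) are hypothetical names. It skips the extra LLM call on the first turn, when there is no history to condense, and logs the condensed query on every turn. In a LangGraph app, logic like this would typically live in a node that reads prior messages from checkpointed state.

```python
import logging
from typing import Callable

logger = logging.getLogger("conversational_rag")

# Hypothetical condensation prompt: rewrite, don't answer.
CONDENSE_PROMPT = (
    "Given the conversation below and a follow-up question, rewrite the follow-up "
    "as a standalone question that can be understood without the conversation. "
    "Resolve pronouns and implicit references. Do not answer it.\n\n"
    "Conversation:\n{history}\n\nFollow-up: {question}\nStandalone question:"
)


def condense_query(history: list[tuple[str, str]], question: str,
                   llm: Callable[[str], str]) -> str:
    """Rewrite a follow-up into a standalone retrieval query.

    history: (role, text) pairs, oldest first.
    llm: any prompt -> completion callable (assumed interface).
    """
    if not history:
        # First turn: nothing to condense, so skip the extra LLM call.
        standalone = question
    else:
        transcript = "\n".join(f"{role}: {text}" for role, text in history)
        standalone = llm(CONDENSE_PROMPT.format(history=transcript, question=question)).strip()
    # Log both forms: the condensed query is the main debugging signal.
    logger.info("condensed query: %r -> %r", question, standalone)
    return standalone


# Usage with a stand-in model that records its prompts; a real system calls an LLM here.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "What is the pricing for AWS Lambda?"


first = condense_query([], "Tell me about Lambda", fake_llm)
history = [("user", "Tell me about Lambda"),
           ("assistant", "Lambda is AWS's serverless compute service.")]
follow_up = condense_query(history, "What about pricing?", fake_llm)
print(first)
print(follow_up)
```

The retriever then receives `follow_up` instead of the raw "What about pricing?", which is why a bad rewrite shows up directly as bad retrieval and why the log line matters.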
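The context-budget tradeoff is simple arithmetic. The sketch below assumes an 8,000-token context window, a 500-token system prompt, 1,000 tokens reserved for the answer, 500-token retrieved chunks, and the ~300 tokens of history per turn quoted above. All of these defaults, and both function names, are hypothetical; substitute your own measurements.

```python
def docs_that_fit(turns: int, context_window: int = 8_000, system_tokens: int = 500,
                  answer_reserve: int = 1_000, tokens_per_turn: int = 300,
                  chunk_tokens: int = 500) -> int:
    """How many retrieved chunks still fit after `turns` turns of raw history."""
    history_tokens = turns * tokens_per_turn
    remaining = context_window - system_tokens - answer_reserve - history_tokens
    # Once history eats the whole budget, no documents fit at all.
    return max(remaining, 0) // chunk_tokens


def recent_turns_within(turn_tokens: list[int], budget: int) -> int:
    """Count how many of the newest turns fit in `budget` tokens (input is oldest-first)."""
    used = kept = 0
    for tokens in reversed(turn_tokens):
        if used + tokens > budget:
            break
        used += tokens
        kept += 1
    return kept


for turns in (0, 5, 10, 20):
    print(turns, docs_that_fit(turns))
# With these assumed numbers: 0 -> 13, 5 -> 10, 10 -> 7, 20 -> 1 chunks.
```

Under these assumptions, capping history at 1,500 tokens (the newest 5 turns at 300 tokens each) keeps at least 10 chunks available no matter how long the session runs.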

When You Don't Need Conversational RAG

Conversational RAG is needed when: (1) users have multi-turn sessions, (2) follow-up questions reference prior context, and (3) those follow-ups trigger new retrieval. All three conditions must be true. If your system handles one-shot queries, or if users clarify their input without needing new documents retrieved, you don't need this — and adding it wastes compute on every turn after the first.

Pattern | Example | Need Conversational RAG? | Simpler Alternative
--- | --- | --- | ---
Single-turn Q&A | Search docs, get answer, done | No | Standard RAG
Multi-turn clarifications | User rephrases the same question | No (no new retrieval needed) | Pass the full rephrased question as-is
Multi-turn topic continuity | 'Tell me about Lambda' → 'What about pricing?' | Yes | N/A
Multi-turn topic shifts | Lambda pricing → then asks about DynamoDB | Yes, but history weight should drop on a shift | Detect the shift, truncate history

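One way to sketch "detect shift, truncate history" is a crude lexical heuristic: treat a follow-up as a topic shift when it names capitalized terms (a rough proxy for entities such as product names) and none of them appeared earlier in the conversation. This heuristic is an assumption for illustration, not a technique from the source; `entity_terms`, `is_topic_shift`, and `history_for_condensation` are hypothetical names, and a production system would more likely use an LLM classifier or embedding similarity.

```python
import re

# Capitalized words that usually start a question rather than name an entity (assumed list).
SENTENCE_STARTERS = {"What", "How", "Why", "When", "Where", "Which", "Who",
                     "Is", "Are", "Can", "Does", "Do", "Tell", "And", "The", "I"}


def entity_terms(text: str) -> set[str]:
    """Capitalized tokens as a rough stand-in for named entities."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9]*", text)
    return {t for t in tokens if t[0].isupper() and t not in SENTENCE_STARTERS}


def is_topic_shift(history: list[str], follow_up: str) -> bool:
    """Shift = the follow-up names entities, and none of them occur in the history."""
    seen = set().union(*(entity_terms(msg) for msg in history))
    new = entity_terms(follow_up)
    return bool(new) and not (new & seen)


def history_for_condensation(history: list[str], follow_up: str) -> list[str]:
    """Drop the history on a detected shift so stale context can't leak into the rewrite."""
    return [] if is_topic_shift(history, follow_up) else history


history = ["Tell me about Lambda", "Lambda is AWS's serverless compute service."]
print(is_topic_shift(history, "What about pricing?"))                          # False
print(is_topic_shift(history, "What about DynamoDB?"))                          # True
print(is_topic_shift(history, "How does DynamoDB pricing compare to Lambda?"))  # False
```

Note that "What about pricing?" is not a shift even though it shares no words with the history: an elliptical follow-up is exactly the case condensation exists for, so the heuristic only fires when a new entity replaces the old one.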
Cost check before you build this

Every follow-up turn costs at least 2 LLM calls: one to condense the query and one to generate the answer. A 10-turn conversation therefore costs roughly 20 LLM calls (19 if the first turn, which has no history, skips condensation). If your analytics show that more than 60% of sessions have only 1 turn, you're paying for infrastructure that benefits fewer than 40% of sessions. Check your session length distribution before adding conversational RAG.

Three signals you actually need it: your logs show >40% of sessions have 3+ turns; users report 'it forgot what I asked'; or your retrieval logs show follow-up queries returning irrelevant results. If none of these are true, don't build it yet.
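The session-length signal can be computed from request logs. This is a minimal sketch, assuming each log line yields the session ID of one user turn (a hypothetical log format); `session_stats` and `worth_building` are illustrative names, and the 40% threshold is the one quoted above. It also counts the condensation calls conversational RAG would add, one per follow-up turn.

```python
from collections import Counter


def session_stats(turn_log: list[str]) -> dict[str, float]:
    """Summarize session lengths from one session_id per user turn."""
    if not turn_log:
        raise ValueError("no turns logged")
    turns_per_session = Counter(turn_log)
    n = len(turns_per_session)
    counts = turns_per_session.values()
    return {
        "sessions": n,
        "single_turn_share": sum(c == 1 for c in counts) / n,
        "three_plus_share": sum(c >= 3 for c in counts) / n,
        # One generation call per turn; one condensation call per follow-up turn.
        "generation_calls": sum(counts),
        "condensation_calls": sum(c - 1 for c in counts),
    }


def worth_building(stats: dict[str, float]) -> bool:
    # Covers only the session-length signal; user complaints and irrelevant
    # follow-up retrieval need separate evidence.
    return stats["three_plus_share"] > 0.40


log = ["a", "b", "b", "c", "c", "c", "d", "d", "d", "d"]
stats = session_stats(log)
print(stats)
print(worth_building(stats))
```

For this toy log, 2 of 4 sessions reach 3+ turns, so the 40% signal fires, and conversational RAG would add 6 condensation calls on top of 10 generation calls.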