Conversational RAG

How to make RAG handle multi-turn conversations: when to add it, how query condensation works under the hood, the LangGraph and legacy chain implementations, context budget math, failure modes, and the production architecture.

Quick Reference

→Follow-up questions are incomplete without context — 'What about the pricing?' retrieves nothing useful without knowing which product
→Query condensation rewrites a follow-up + conversation history into a standalone query — this is the only new step vs. single-turn RAG
→Condensation subsumes coreference resolution — no separate pronoun-replacement step needed
→LangGraph with checkpointers (MemorySaver, PostgresSaver) is the recommended approach for stateful conversational RAG
→create_history_aware_retriever and create_retrieval_chain moved to langchain-classic in LangChain 1.0 — new projects use LangGraph
→Every conversational turn costs at minimum 2 LLM calls (condensation + generation) — measure this before adding it
→Context budget is the key tradeoff: history grows ~300 tokens per turn, directly competing with retrieved docs
→Always log the condensed query — it is the single most important debugging signal for conversational RAG

When You Don't Need Conversational RAG

Conversational RAG is needed when: (1) users have multi-turn sessions, (2) follow-up questions reference prior context, and (3) those follow-ups trigger new retrieval. All three conditions must be true. If your system handles one-shot queries, or if users clarify their input without needing new documents retrieved, you don't need this — and adding it wastes compute on every turn after the first.

Pattern	Example	Need Conversational RAG?	Simpler Alternative
Single-turn Q&A	Search docs, get answer, done	No	Standard RAG
Multi-turn clarifications	User rephrases same question	No — no new retrieval needed	Pass full rephrased question as-is
Multi-turn topic continuity	'Tell me about Lambda' → 'What about pricing?'	Yes	—
Multi-turn topic shifts	Lambda pricing → then asks about DynamoDB	Yes, but history weight should drop on shift	Detect shift, truncate history

Cost check before you build this

Every conversational turn costs at minimum 2 LLM calls: one to condense the query, one to generate the answer. A 10-turn conversation costs 20+ LLM calls. If your analytics show >60% of sessions have only 1 turn, you're paying for infrastructure that handles 40% of traffic. Check your session length distribution before adding conversational RAG.

Three signals you actually need it: your logs show >40% of sessions have 3+ turns; users report 'it forgot what I asked'; or your retrieval logs show follow-up queries returning irrelevant results. If none of these are true, don't build it yet.

How Query Condensation Works

Condensation is the only addition to the standard RAG pipeline — everything else is unchanged

Building Conversational RAG

Two implementation paths: LangGraph (recommended for new projects) and the classic chain approach (still works via langchain-classic). LangGraph manages state automatically via checkpointers — no manual history list management. The classic chains approach is simpler for small scripts but requires you to manage history, persistence, and debugging yourself.

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.