
Conversational RAG

Handling multi-turn conversations in RAG: resolving follow-up questions, history-aware retrieval, coreference resolution, and context window management across turns.

Quick Reference

  • Follow-up questions reference prior context: 'What about their pricing?' needs to know what 'their' refers to
  • History-aware retriever: condenses chat history + follow-up into a standalone search query
  • Coreference resolution: replacing pronouns and references with explicit entity names before retrieval
  • Context window management: summarize or drop old messages to stay within token limits
  • LangChain's create_history_aware_retriever handles the condensation step automatically
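The context-window-management point above can be sketched in a few lines. This is a toy illustration, not a production implementation: token counts are approximated by whitespace-split words, where a real system would use the model's tokenizer (e.g. tiktoken), and might summarize dropped turns instead of discarding them.

```python
# Sketch of simple context window management: drop the oldest turns
# until the history fits a token budget. Word counts stand in for
# real token counts here (an approximation, not the model tokenizer).

def trim_history(messages, max_tokens=50):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = [
    "user: Tell me about AWS Lambda",
    "bot: AWS Lambda is a serverless compute service ...",
    "user: What about the pricing?",
]
trimmed = trim_history(history, max_tokens=15)  # oldest turn is dropped
```

Dropping from the front preserves the most recent turns, which are usually the ones follow-up questions depend on.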

The Follow-Up Problem

In a conversation, users naturally ask follow-up questions that reference previous context: 'Tell me about AWS Lambda' → 'What about the pricing?' → 'How does it compare to Cloud Functions?'. Each follow-up is incomplete without the conversation history. Search the vector store for 'What about the pricing?' on its own and you'll get pricing information for arbitrary products, because the query never names one. The retriever needs to understand that 'the pricing' means 'AWS Lambda pricing'.

The problem: follow-up queries without context
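To make the failure concrete, here is a toy sketch with hypothetical documents and a naive word-overlap scorer standing in for embedding similarity (the failure mode is the same): the raw follow-up matches every pricing document equally, so the top hit is arbitrary.

```python
import re

# Hypothetical corpus: three unrelated products that all mention pricing.
DOCS = [
    "AWS Lambda pricing: pay per request and compute duration",
    "Google Cloud Functions pricing: pay per invocation",
    "Stripe pricing: 2.9% plus 30 cents per transaction",
]

def tokens(text):
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query, doc):
    """Word-overlap similarity between query and document."""
    return len(tokens(query) & tokens(doc))

# The raw follow-up shares only the word "pricing" with each document:
# all three tie, so nothing points the retriever at Lambda.
ambiguous = [score("What about the pricing?", d) for d in DOCS]

# The standalone rewrite clearly prefers the Lambda document.
standalone = [score("What is the pricing for AWS Lambda?", d) for d in DOCS]
```

A real vector store replaces `score` with cosine similarity over embeddings, but the underspecified query fails for the same reason: it carries no signal tying it to the right entity.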
The core insight

The fix is simple in concept: before searching, rewrite the follow-up question as a standalone question that includes all necessary context from the conversation history. 'What about the pricing?' + history about Lambda → 'What is the pricing for AWS Lambda?'. This is called query condensation or history-aware retrieval.
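A minimal sketch of the condensation step, with the LLM call abstracted behind a callable you supply (the prompt wording here is illustrative, not a canonical template; LangChain's create_history_aware_retriever wires up this same pattern for you):

```python
# Illustrative condensation prompt; exact wording is an assumption.
CONDENSE_PROMPT = (
    "Given the conversation below, rewrite the follow-up question as a "
    "standalone question that makes sense without the conversation.\n\n"
    "Conversation:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)

def condense(history, question, llm):
    """Rewrite a follow-up into a standalone query using the chat history."""
    prompt = CONDENSE_PROMPT.format(
        history="\n".join(history), question=question
    )
    return llm(prompt).strip()

# Stubbed LLM for illustration; a real system calls a chat model here.
fake_llm = lambda prompt: "What is the pricing for AWS Lambda?"
history = [
    "user: Tell me about AWS Lambda",
    "assistant: AWS Lambda is a serverless compute service.",
]
standalone = condense(history, "What about the pricing?", fake_llm)
# `standalone`, not the raw follow-up, is what goes to the vector store.
```

The condensed question is used only for retrieval; the original follow-up and the full history are still passed to the answering LLM.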