Query Transformation

Techniques to improve retrieval by transforming user queries before search: HyDE, multi-query expansion, step-back prompting, and query decomposition.

Quick Reference

  • HyDE: generate a hypothetical answer and embed that instead of the question, which bridges the query-document gap
  • Multi-query: expand one question into 3-5 variations to improve recall across different phrasings
  • Step-back prompting: ask a more general question first to retrieve broader context
  • Query decomposition: break a complex question into sub-queries, retrieve for each independently
  • Query transformation addresses the fundamental vocabulary mismatch between how users ask and how documents are written
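As a rough sketch of how the four techniques fit one interface, the snippet below maps each technique to a prompt template and parses the model output into a list of search strings. The names `LLM`, `Retriever`, `PROMPTS`, `transform`, and `retrieve_all` are hypothetical, the prompt wording is illustrative only, and the LLM and retriever are plain callables you would supply yourself.

```python
from typing import Callable, Dict, List

# Hypothetical types: any function mapping a prompt to generated text,
# and any function mapping a query string to a ranked list of document IDs.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]

# Illustrative prompt templates (assumed wording, not from any library).
PROMPTS: Dict[str, str] = {
    "hyde": "Write a short passage that answers the question.\nQuestion: {q}",
    "multi_query": "Rewrite the question in {n} different ways, one per line.\nQuestion: {q}",
    "step_back": "Write one more general question that gives background for this one.\nQuestion: {q}",
    "decompose": "Split the question into simpler sub-questions, one per line.\nQuestion: {q}",
}


def transform(llm: LLM, technique: str, question: str, n: int = 3) -> List[str]:
    """Return the search strings produced by one transformation technique."""
    prompt = PROMPTS[technique].format(q=question, n=n)
    text = llm(prompt)
    if technique in ("multi_query", "decompose"):
        # These techniques yield several queries, one per non-empty line.
        return [line.strip() for line in text.splitlines() if line.strip()]
    # HyDE and step-back yield a single string to search with.
    return [text.strip()]


def retrieve_all(retriever: Retriever, queries: List[str]) -> List[str]:
    """Run every query and merge results, dropping duplicates in first-seen order."""
    seen, merged = set(), []
    for query in queries:
        for doc_id in retriever(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

The merge step here is a simple de-duplicated union; a real pipeline might rerank the merged results, which this sketch leaves out.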

The Query-Document Vocabulary Mismatch

Users ask questions in everyday language, while documents are often written in technical prose. The gap between these two styles is a common root cause of retrieval failures. A user asks 'Why is my app slow?' but the relevant document says 'Performance optimization: reducing API response latency through connection pooling.' The embedding model may fail to bridge this gap because the two texts share almost no surface-level vocabulary. Query transformation techniques reshape the user's question to better match the vocabulary of the documents.
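The mismatch in the example above can be made concrete with a small lexical check. This sketch measures Jaccard overlap of content words, which is only a crude proxy for what an embedding model computes. The stopword list is an ad hoc assumption, and the rewritten query is hand-written for illustration rather than produced by a model.

```python
import re

# Ad hoc stopword list, assumed for this illustration only.
STOPWORDS = {"a", "an", "the", "is", "my", "why", "through", "of", "to", "and"}


def content_words(text: str) -> set:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Share of content words the two texts have in common (0.0 to 1.0)."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


query = "Why is my app slow?"
document = ("Performance optimization: reducing API response latency "
            "through connection pooling.")
# Hand-written stand-in for what a query transformation step might produce.
rewritten = "slow app performance: high API response latency"

print(jaccard(query, document))      # no shared content words
print(jaccard(rewritten, document))  # several shared content words
```

The original question shares no content words with the document, while the rewritten form shares "performance", "api", "response", and "latency".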

Asymmetric retrieval

Queries are typically short (5-15 words) and phrased as questions. Documents are typically long (100-500 words) and written as declarative statements. Embedding models are trained on both, but a short question and the long passage that answers it can still land far apart in the embedding space, so retrieval is inherently asymmetric. Query transformation either makes the query look more like a document or generates multiple query variants to cast a wider net.

  • Short query, long document: user asks 'auth error' but the doc explains OAuth2 token refresh flows in detail
  • Question vs statement: user asks 'How does X work?' but the doc states 'X operates by...'
  • Colloquial vs technical: user says 'make it faster' but docs discuss 'latency optimization' and 'throughput engineering'
  • Incomplete context: user asks 'What about pricing?', referring to a product discussed three messages earlier
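The 'auth error' case above can illustrate the "make the query look like a document" idea behind HyDE. The sketch below is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `fake_generate` returns a canned passage in place of an LLM call, and the two documents are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hyde_search(question: str, generate: Callable[[str], str],
                docs: Dict[str, str], k: int = 1) -> List[str]:
    """Embed a generated hypothetical answer instead of the raw question."""
    hypothetical = generate(question)
    qvec = embed(hypothetical)
    ranked = sorted(docs.items(),
                    key=lambda item: cosine(qvec, embed(item[1])),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Invented documents for illustration.
docs = {
    "oauth": ("OAuth2 access tokens expire; the client uses a refresh token "
              "to obtain a new access token."),
    "billing": "Invoices are issued monthly and can be paid by card or bank transfer.",
}


def fake_generate(question: str) -> str:
    """Canned hypothetical answer; a real system would call an LLM here."""
    return ("An auth error often means the OAuth2 access token expired "
            "and the refresh token flow failed.")


print(hyde_search("auth error", fake_generate, docs))
```

With this toy embedding, the raw query 'auth error' shares no tokens with either document, while the hypothetical answer shares several with the OAuth2 document. A real embedding model would behave less starkly, but the direction of the effect is the point of the sketch.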