Query Transformation
Query transformation closes the gap between how users ask questions and how documents are written — but most teams apply it too eagerly. This article covers when to transform, which technique fits which failure mode, how to measure improvement, and what it costs at scale.
Quick Reference
- →Never add query transformation without first measuring baseline recall — you may not need it
- →Multi-query expansion is the safest default: it only rephrases the user's question, so hallucination risk is low, and it works across domains
- →HyDE: generate a hypothetical answer and embed that instead of the question — best for vague or short queries
- →Decomposition: break multi-part questions into independent sub-queries, retrieve for each, then combine
- →Step-back prompting: ask a more general question to retrieve foundational context alongside the specific answer
- →Each technique adds at least one LLM call per query: at 100K queries/day, HyDE costs ~$97/day vs. $1/day for raw retrieval
- →Cache transformed queries in Redis (1h TTL): same query → same transformation → skip the LLM call
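The caching idea in the last item can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `cached_transform`, `cache_key`, the `qt:` key prefix, and the lowercase/whitespace normalization are all assumptions made for the example. The only Redis calls it relies on are `get` and `setex`, which redis-py provides. `InMemoryCache` is a hypothetical stand-in so the sketch runs without a server, and it ignores expiry.

```python
import hashlib
from typing import Callable, Dict, Optional, Protocol

TTL_SECONDS = 3600  # the 1h TTL suggested above


class KeyValueCache(Protocol):
    """The two client methods this sketch needs (both exist in redis-py)."""

    def get(self, name: str) -> Optional[bytes]: ...
    def setex(self, name: str, time: int, value: str) -> object: ...


def cache_key(query: str) -> str:
    # Assumption: queries differing only in case or spacing may share an entry.
    normalized = " ".join(query.lower().split())
    return "qt:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_transform(
    query: str,
    cache: KeyValueCache,
    transform: Callable[[str], str],
) -> str:
    """Return the cached transformation, calling the LLM only on a miss."""
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        # redis-py returns bytes unless decode_responses=True is set.
        return hit.decode("utf-8") if isinstance(hit, bytes) else hit
    result = transform(query)  # the expensive LLM call
    cache.setex(key, TTL_SECONDS, result)
    return result


class InMemoryCache:
    """Hypothetical stand-in for a Redis client; it never expires entries."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)

    def setex(self, name: str, time: int, value: str) -> bool:
        self._data[name] = value.encode("utf-8")
        return True
```

With a real deployment, a `redis.Redis(...)` client could be passed in place of `InMemoryCache()`. Note that caching pins whichever transformation was generated first, which matters if the LLM is sampled at a non-zero temperature.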
When NOT to Transform Queries
Before adding any query transformation, run your retrieval without it. Query transformation adds latency and cost to every query — you need to confirm it actually helps before paying that price. Many RAG failures have nothing to do with query vocabulary: the document isn't in the index, the chunks are too large, or the embedding model was chosen poorly. Transforming a query doesn't fix any of those.
Skip query transformation when any of the following holds. Your queries are already precise and specific (technical IDs, exact model names, API endpoints). Your documents use the same vocabulary as your users (internal team tooling, customer-facing product docs written in user language). Your retrieval problem is a missing document, not a vocabulary mismatch. Your system handles fewer than 1,000 queries/day; at that scale, the engineering cost outweighs the benefit.
- ▸Query transformation is a vocabulary bridge — it doesn't fix missing documents, bad chunking, or wrong embedding models
- ▸Measure recall@5 on 50 representative queries before and after: if recall improves by less than 15%, the latency and cost aren't worth it
- ▸Over-application is common: teams add HyDE universally and then wonder why responses got slower and costs tripled
- ▸Start with multi-query if you must start somewhere — it has the lowest risk and works across domains
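The before/after measurement described above could look something like this. It is a sketch under stated assumptions: `labels` is a hypothetical hand-labeled mapping from each evaluation query to the IDs of its relevant documents, and `baseline` and `transformed` are placeholder callables that return ranked document IDs for a query. The text does not say whether the 15% threshold is absolute or relative, so the sketch reports both.

```python
from typing import Callable, Dict, List, Set

Retriever = Callable[[str], List[str]]  # query -> ranked document IDs


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each evaluation query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(retrieve: Retriever, labels: Dict[str, Set[str]], k: int = 5) -> float:
    """Average recall@k over every labeled evaluation query."""
    scores = [recall_at_k(retrieve(query), relevant, k) for query, relevant in labels.items()]
    return sum(scores) / len(scores)


def compare(
    baseline: Retriever,
    transformed: Retriever,
    labels: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Report recall@k before and after, plus absolute and relative change."""
    before = mean_recall_at_k(baseline, labels, k)
    after = mean_recall_at_k(transformed, labels, k)
    absolute = after - before
    # Relative gain is undefined when the baseline recall is zero.
    relative = absolute / before if before > 0 else float("inf")
    return {"before": before, "after": after, "absolute": absolute, "relative": relative}
```

Running `compare` over the same 50 labeled queries with and without the transformation step keeps the comparison apples-to-apples; the label set, not the code, is where most of the effort goes.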