Advanced · 14 min

Query Transformation

Query transformation closes the gap between how users ask questions and how documents are written — but most teams apply it too eagerly. This article covers when to transform, which technique fits which failure mode, how to measure improvement, and what it costs at scale.

Quick Reference

→Never add query transformation without first measuring baseline recall — you may not need it
→Multi-query expansion is the safest default: it rephrases the question instead of inventing an answer, so hallucination risk is low, and it works across domains
→HyDE: generate a hypothetical answer and embed that instead of the question — best for vague or short queries
→Decomposition: break multi-part questions into independent sub-queries, retrieve for each, then combine
→Step-back prompting: ask a more general question to retrieve foundational context alongside the specific answer
→Each technique adds at least one LLM call per query; at 100K queries/day, HyDE costs ~$97/day vs. $1/day for raw retrieval
→Cache transformed queries in Redis (1h TTL): same query → same transformation → skip the LLM call
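The caching rule in the last point can be sketched as follows. This is a minimal illustration, not a production setup: `TTLCache` is a hypothetical in-memory stand-in whose `get` and `setex` methods only mirror the shape of the Redis commands of the same name, and `cache_key`, `cached_transform`, and the `qt:` key prefix are names invented here. The lowercase-and-collapse-whitespace normalization is also an assumption about what should count as the "same query".

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def cache_key(query: str) -> str:
    # Assumption: queries differing only in case or spacing share one entry.
    normalized = " ".join(query.lower().split())
    return "qt:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_transform(
    query: str,
    transform: Callable[[str], str],
    cache: TTLCache,
    ttl_seconds: int = 3600,  # the 1h TTL from the quick reference
) -> str:
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        return hit  # same query, same transformation: skip the LLM call
    result = transform(query)  # the expensive LLM call happens only on a miss
    cache.setex(key, ttl_seconds, result)
    return result
```

Swapping `TTLCache` for a real Redis client should only require an object with compatible `get` and `setex` methods, plus decoding of the returned bytes if the client is not configured to return strings.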

When NOT to Transform Queries

Before adding any query transformation, run your retrieval without it. Query transformation adds latency and cost to every query — you need to confirm it actually helps before paying that price. Many RAG failures have nothing to do with query vocabulary: the document isn't in the index, the chunks are too large, or the embedding model was chosen poorly. Transforming a query doesn't fix any of those.

Skip transformation when:

→Your queries are already precise and specific (technical IDs, exact model names, API endpoints).
→Your documents use the same vocabulary as users (internal team tooling, customer-facing product docs written in user language).
→Your retrieval problem is a missing document, not vocabulary mismatch.
→Your system handles fewer than 1,000 queries/day; at that scale, the engineering cost outweighs the benefit.
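To put the 1,000 queries/day threshold in context, the per-query cost implied by the article's own figures (~$97/day for HyDE and $1/day for raw retrieval at 100K queries/day) can be worked out directly. The sketch below assumes cost scales linearly with volume, which the article does not state; the function and constant names are illustrative.

```python
# Figures quoted in the quick reference, at the reference volume.
REFERENCE_QUERIES_PER_DAY = 100_000
HYDE_DAILY_USD = 97.0
RAW_DAILY_USD = 1.0


def per_query_cost(daily_usd: float, queries_per_day: int) -> float:
    """Average cost of one query, given a daily bill and a daily volume."""
    return daily_usd / queries_per_day


def daily_cost(queries_per_day: int, cost_per_query: float) -> float:
    """Assumption: cost grows linearly with the number of queries."""
    return queries_per_day * cost_per_query


hyde_per_query = per_query_cost(HYDE_DAILY_USD, REFERENCE_QUERIES_PER_DAY)
raw_per_query = per_query_cost(RAW_DAILY_USD, REFERENCE_QUERIES_PER_DAY)

# Scale both down to the 1,000 queries/day threshold.
hyde_at_1k = daily_cost(1_000, hyde_per_query)
raw_at_1k = daily_cost(1_000, raw_per_query)

print(f"HyDE per query: ${hyde_per_query:.5f}")
print(f"HyDE at 1K queries/day: ${hyde_at_1k:.2f}/day")
print(f"Raw retrieval at 1K queries/day: ${raw_at_1k:.2f}/day")
```

Under that linear assumption, HyDE at 1,000 queries/day comes to roughly a dollar a day, so at small scale the deciding factor is the engineering effort rather than the API bill.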

▸Query transformation is a vocabulary bridge — it doesn't fix missing documents, bad chunking, or wrong embedding models
▸Measure recall@5 on 50 representative queries before and after — if improvement < 15%, the latency and cost aren't worth it
▸Over-application is common: teams add HyDE universally and then wonder why responses got slower and costs tripled
▸Start with multi-query if you must start somewhere — it has the lowest risk and works across domains
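The before-and-after measurement described above can be sketched as a small evaluation helper. It assumes you have a labeled set of queries with known relevant document IDs and two retrieval runs (baseline and transformed); the data shapes and function names are illustrative. The article does not say whether the 15% threshold is an absolute or a relative gain, so the sketch reports both.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each evaluation query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]],  # query -> ranked document IDs
    gold: Dict[str, Set[str]],   # query -> relevant document IDs
    k: int = 5,
) -> float:
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(runs.get(query, []), relevant, k) for query, relevant in gold.items()]
    return sum(scores) / len(scores)


def improvement(baseline: float, transformed: float) -> Dict[str, float]:
    """Report the gain both ways, since the threshold's basis is not specified."""
    absolute = transformed - baseline
    relative = absolute / baseline if baseline > 0 else float("inf")
    return {"absolute": absolute, "relative": relative}
```

A typical use is to compute `mean_recall_at_k` once on the raw-retrieval run and once on the transformed run over the same 50 queries, then pass both numbers to `improvement`.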

The Query-Document Gap

When query transformation is warranted, it addresses one core problem: users and documents use different words to describe the same thing. A user asks 'Why is my app slow?' — the relevant document says 'Performance optimization: reducing API response latency through connection pooling.' The embedding model places these in different regions of the vector space because the surface-level vocabulary diverges. Query transformation reshapes the query to look more like a document, or generates multiple query variants to cast a wider net.
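The "wider net" idea can be sketched as retrieving once per query variant and merging the results. In this sketch the variants are assumed to come from an LLM rephrasing step that is not shown, `search` is a hypothetical callable wrapping whatever vector store you use, and keeping each document's best score across variants is one simple merge rule chosen for illustration, not something the article prescribes.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (document ID, similarity score; higher is better)


def multi_query_retrieve(
    variants: List[str],
    search: Callable[[str, int], List[Hit]],  # hypothetical: (query, k) -> ranked hits
    k: int = 5,
) -> List[Hit]:
    """Retrieve for every variant, dedupe by document, keep each document's best score."""
    best: Dict[str, float] = {}
    for variant in variants:
        for doc_id, score in search(variant, k):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Sort by score descending; break ties by ID so the output is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

With the example above, a variant phrased like the document ("reduce API response latency") can surface the performance guide even when the original wording ("Why is my app slow?") does not.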

Four Techniques, One Decision

The four techniques address different failure modes. The decision isn't which technique is 'best' — it's which failure mode you have. Mismatching technique to failure mode wastes latency without fixing retrieval.
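One way to make that decision explicit is a lookup from diagnosed failure mode to technique. The pairings below are taken from the quick reference at the top of this article; the failure-mode names are labels invented for this sketch, pairing plain vocabulary mismatch with multi-query expansion is a reading of the "safest default" advice rather than a rule the article states, and diagnosing the failure mode is left to your own evaluation.

```python
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical labels; the article describes these modes but does not name them.
    VOCABULARY_MISMATCH = "vocabulary mismatch"
    VAGUE_OR_SHORT_QUERY = "vague or short query"
    MULTI_PART_QUESTION = "multi-part question"
    MISSING_FOUNDATIONAL_CONTEXT = "missing foundational context"


TECHNIQUE_FOR = {
    FailureMode.VOCABULARY_MISMATCH: "multi-query expansion",
    FailureMode.VAGUE_OR_SHORT_QUERY: "HyDE",
    FailureMode.MULTI_PART_QUESTION: "decomposition",
    FailureMode.MISSING_FOUNDATIONAL_CONTEXT: "step-back prompting",
}


def choose_technique(mode: Optional[FailureMode]) -> str:
    """Return the technique paired with a diagnosed failure mode."""
    if mode is None:
        # No diagnosed failure mode: keep raw retrieval and measure the baseline first.
        return "none"
    return TECHNIQUE_FOR[mode]
```

Keeping the mapping as data rather than branching logic makes it easy to audit which failure mode each added LLM call is supposed to fix.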
