Hybrid Search

Combining keyword search (BM25) with semantic vector search for superior retrieval. Reciprocal Rank Fusion, weighted scoring, and when keyword search beats embeddings.

Quick Reference

  • Hybrid search combines keyword (BM25) and semantic (vector) search for better recall than either alone
  • BM25 excels at exact matches: product codes, error messages, acronyms, proper nouns
  • Semantic search excels at meaning: paraphrases, synonyms, conceptual similarity
  • Reciprocal Rank Fusion (RRF) is a widely used method for merging ranked lists from multiple retrievers; it needs only ranks, not comparable scores
  • Typical weight split for weighted scoring: 0.3-0.4 BM25 + 0.6-0.7 semantic, applied to normalized scores; tune based on your query types
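
To make the RRF bullet concrete, here is a minimal sketch in Python. It assumes each retriever returns a list of document IDs ordered best first. The function name `reciprocal_rank_fusion`, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative choices, not something this guide prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs (best first) into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Because RRF looks only at rank positions, it sidesteps the problem that BM25 scores and vector similarities live on different scales. Documents that appear in both lists (here `doc_a` and `doc_b`) rise above documents found by only one retriever.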

Why Semantic Search Alone Isn't Enough

Semantic search with embeddings works beautifully for natural language questions — 'How do I reset my password?' matches 'Steps to change your login credentials.' But it fails spectacularly on exact matches. When a user searches for 'ERR_CONNECTION_REFUSED' or 'SKU-12345' or 'HIPAA compliance', the embedding model may not place these near the right documents because these terms carry meaning through their exact form, not their semantic content.
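
The lexical side of this can be illustrated with a small BM25 scorer. The sketch below is a simplified, from-scratch version under several assumptions: whitespace tokenization, lowercasing, the Lucene-style IDF variant, and the parameter values `k1=1.5` and `b=0.75`. The function name `bm25_score_all` and the sample documents are hypothetical. A production system would use a search engine or library rather than this code.

```python
import math
from collections import Counter


def bm25_score_all(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a simplified BM25.

    Assumptions: whitespace tokenization, lowercase matching,
    IDF = log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact token match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "Chrome shows ERR_CONNECTION_REFUSED when the server rejects the connection",
    "Steps to change your login credentials",
    "Troubleshooting slow network connections",
]
scores = bm25_score_all("ERR_CONNECTION_REFUSED", docs)
print(scores)
```

Only the first document contains the exact token, so it is the only one with a non-zero score. The same property explains why the password query above gets nothing from BM25: 'reset my password' and 'change your login credentials' share no terms.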

Query Type      | Keyword (BM25) Wins              | Semantic Wins
----------------|----------------------------------|-------------------------------------
Error codes     | ERR_SSL_PROTOCOL_ERROR           | What does the SSL error mean?
Product IDs     | iPhone 15 Pro Max                | latest Apple flagship phone
Acronyms        | GDPR compliance                  | European data privacy regulation
Code references | useEffect cleanup function       | How to handle side effects in React
Exact phrases   | Terms and Conditions Section 4.2 | What are the cancellation rules?

The complementary strengths

BM25 and semantic search fail on different queries. BM25 misses semantic matches (synonyms, paraphrases). Semantic search misses lexical matches (exact terms, IDs, codes). Hybrid search covers both failure modes. In benchmarks, hybrid search typically improves recall@10 by 10-25% over either method alone.
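
Besides RRF, the two result sets can be merged by weighted scoring, using the weight split from the Quick Reference. The sketch below is one possible approach, not a definitive implementation. It assumes each retriever returns a dict of document ID to raw score, and it applies min-max normalization (an assumption of this example) so that unbounded BM25 scores and bounded similarity scores become comparable. All names and sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale raw scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_hybrid(keyword_scores, vector_scores, bm25_weight=0.3):
    """Combine normalized scores; semantic weight is 1 - bm25_weight."""
    keyword_norm = min_max_normalize(keyword_scores)
    vector_norm = min_max_normalize(vector_scores)
    combined = {}
    for doc_id in set(keyword_norm) | set(vector_norm):
        # A document missing from one retriever contributes 0 from that side.
        combined[doc_id] = (
            bm25_weight * keyword_norm.get(doc_id, 0.0)
            + (1 - bm25_weight) * vector_norm.get(doc_id, 0.0)
        )
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores from each retriever.
bm25_raw = {"doc_a": 12.4, "doc_b": 3.1, "doc_c": 0.5}
vector_raw = {"doc_b": 0.82, "doc_c": 0.79, "doc_d": 0.40}

ranked = weighted_hybrid(bm25_raw, vector_raw, bm25_weight=0.3)
ranked_ids = [doc_id for doc_id, _ in ranked]
print(ranked_ids)
```

With a 0.3 BM25 weight, `doc_b` ranks first because both retrievers found it, while `doc_a` (a strong keyword-only hit) still outranks `doc_d`. Raising `bm25_weight` favors exact-match queries such as error codes and product IDs. Note that min-max normalization is sensitive to outlier scores, which is one reason rank-based fusion is often preferred.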