Hybrid Search
Combining keyword search (BM25) with semantic vector search for superior retrieval. Reciprocal Rank Fusion, weighted scoring, and when keyword search beats embeddings.
Quick Reference
- Hybrid search combines keyword (BM25) and semantic (vector) search for better recall than either alone
- BM25 excels at exact matches: product codes, error messages, acronyms, proper nouns
- Semantic search excels at meaning: paraphrases, synonyms, conceptual similarity
- Reciprocal Rank Fusion (RRF) is the most common method for merging ranked lists from multiple retrievers; it uses only ranks, so BM25 and vector scores never need to share a scale
- Typical weight split: 0.3-0.4 BM25 and 0.6-0.7 semantic; tune it for your own query mix
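Both fusion strategies from the list above fit in a few lines of Python. The sketch below is illustrative, not a library API: `rrf_fuse` and `weighted_score_fusion` are hypothetical helper names. RRF scores each document as the sum of `w / (k + rank)` over the retrievers that returned it, with `k = 60` as the constant commonly used since the original RRF paper. The weighted variant assumes the 0.3-0.4 / 0.6-0.7 split is applied to min-max-normalized scores, which is only one of several possible normalization choices.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights=None, k=60):
    """Reciprocal Rank Fusion: score(d) = sum_i w_i / (k + rank_i(d)).

    ranked_lists: lists of doc IDs, best first (one list per retriever).
    weights: optional per-retriever weights; defaults to 1.0 each.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_score_fusion(bm25_scores, vector_scores, alpha=0.35):
    """alpha * BM25 + (1 - alpha) * semantic, on min-max-normalized scores.

    A document missing from one retriever contributes 0 for that retriever.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:  # all scores equal: avoid division by zero
            return {d: 1.0 for d in scores}
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    fused = {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
             for d in set(b) | set(v)}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Usage: each retriever finds a document the other ranks low or misses
bm25_ranking = ["doc_err", "doc_a", "doc_b"]
vector_ranking = ["doc_a", "doc_c", "doc_err"]
fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_err', 'doc_c', 'doc_b']
```

RRF is usually the safer default because raw BM25 scores are unbounded while cosine similarities are not, so score-based fusion depends heavily on how you normalize.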
Why Semantic Search Alone Isn't Enough
Semantic search with embeddings works beautifully for natural-language questions: 'How do I reset my password?' matches 'Steps to change your login credentials.' But it can fail badly on exact matches. When a user searches for 'ERR_CONNECTION_REFUSED', 'SKU-12345', or 'HIPAA compliance', the embedding model may not place the query near the right documents, because these terms carry meaning through their exact surface form rather than through their semantic content.
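BM25 handles these queries because it only credits documents that contain the query's exact tokens, weighted by how rare each token is. The minimal Okapi BM25 scorer below illustrates that lexical behavior. It is a sketch, not a production engine: the tokenizer regex (which keeps underscores and hyphens inside tokens), the Lucene-style IDF, and the defaults `k1=1.2`, `b=0.75` are assumptions, and real search analyzers may split identifiers like these differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep underscores/hyphens inside tokens, so identifiers
    # such as ERR_CONNECTION_REFUSED or SKU-12345 stay single terms.
    return re.findall(r"[\w\-]+", text.lower())


class BM25:
    """Okapi BM25 with a Lucene-style IDF that never goes negative."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(c.values()) for c in self.doc_tokens]
        self.avgdl = sum(self.doc_lens) / len(docs)
        df = Counter()  # document frequency of each term
        for counts in self.doc_tokens:
            df.update(counts.keys())
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        counts, dl = self.doc_tokens[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf == 0:
                continue  # no exact token match, no credit
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        scores = [self.score(query, i) for i in range(len(self.doc_tokens))]
        return sorted(range(len(scores)), key=lambda i: -scores[i])


docs = [
    "Chrome shows ERR_CONNECTION_REFUSED when the server rejects the connection.",
    "Troubleshooting network problems and connectivity issues in the browser.",
    "SKU-12345 is out of stock.",
]
bm25 = BM25(docs)
ranking = bm25.rank("ERR_CONNECTION_REFUSED")
print(ranking[0])  # 0: the only document containing the exact error code
```

The second document is topically related but scores exactly zero, which is the mirror image of the semantic failure: BM25 cannot see meaning, only tokens.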
| Query Type | Example Where Keyword (BM25) Wins | Example Where Semantic Wins |
|---|---|---|
| Error codes | ERR_SSL_PROTOCOL_ERROR | What does the SSL error mean? |
| Product names and IDs | iPhone 15 Pro Max | latest Apple flagship phone |
| Acronyms | GDPR compliance | European data privacy regulation |
| Code references | useEffect cleanup function | How to handle side effects in React |
| Exact phrases | Terms and Conditions Section 4.2 | What are the cancellation rules? |
BM25 and semantic search fail on different queries: BM25 misses semantic matches (synonyms, paraphrases), while semantic search misses lexical matches (exact terms, IDs, codes). Hybrid search covers both failure modes. In published benchmarks, hybrid search often improves recall@10 by roughly 10-25% over either method alone, although the size of the gain depends heavily on the dataset and query mix.
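Because the gain varies, it is worth measuring recall@10 on your own labeled queries. The toy sketch below uses hypothetical query and document IDs and hand-written rankings (not real benchmark data) to show how recall@k exposes complementary failures: each retriever misses a different query, and a fused ranking that keeps both hits recovers both.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, qrels, k=10):
    """runs: {query_id: ranked doc IDs}; qrels: {query_id: set of relevant IDs}."""
    return sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()) / len(qrels)


# Hypothetical relevance labels: one exact-match query, one paraphrase query
qrels = {"q_error_code": {"doc_err"}, "q_paraphrase": {"doc_pw"}}

bm25_run = {"q_error_code": ["doc_err", "doc_misc"], "q_paraphrase": ["doc_misc"]}
vector_run = {"q_error_code": ["doc_misc"], "q_paraphrase": ["doc_pw", "doc_misc"]}
# A fused ranking (e.g. from RRF) that keeps each retriever's hit
hybrid_run = {"q_error_code": ["doc_err", "doc_misc"], "q_paraphrase": ["doc_pw", "doc_misc"]}

for name, run in [("bm25", bm25_run), ("vector", vector_run), ("hybrid", hybrid_run)]:
    print(name, mean_recall_at_k(run, qrels))
# bm25 0.5
# vector 0.5
# hybrid 1.0
```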