Intermediate · 16 min

Hybrid Search

Hybrid search (BM25 + semantic) outperforms either method alone on mixed-query corpora — but not on every corpus. This article covers when to add it, how to fuse results with RRF, how to measure improvement, when LangChain's in-memory BM25 breaks, and how reranking fits in.

Quick Reference

→Hybrid search = BM25 keyword search + vector semantic search, results merged via Reciprocal Rank Fusion (RRF)
→BM25 excels at exact matches: error codes, product IDs, acronyms, proper nouns
→Semantic search excels at meaning: paraphrases, synonyms, conceptual questions
→RRF score = sum of 1/(k + rank) across retrievers; k=60 from the original 2009 paper
→LangChain's BM25Retriever is in-memory only — rebuild on every restart, breaks past ~100K docs
→After hybrid retrieval: add a cross-encoder reranker for precision; skip if latency budget is tight
→Tune weights against a labeled query set — 0.4/0.6 is a starting point, not a universal answer

When Hybrid Search Is Overkill

Add complexity only when you've confirmed the failure mode. Hybrid search adds indexing overhead, two retrieval paths, and a fusion step. Before adding it, check whether you actually have the retrieval gap it solves.

Corpus / Query Pattern	Use Hybrid?	Reason
Pure Q&A knowledge base, users ask in natural language	No — semantic alone is fine	No exact-term queries; BM25 adds noise
Product catalog searched by SKU / model number	No — BM25 alone is fine	No semantic gap; embeddings waste compute
Mixed: some users type error codes, others ask questions	Yes	Classic hybrid case — both gaps exist
Code documentation + developer queries	Yes	Exact function/symbol matches + semantic intent
Legal / compliance docs with exact citations	Yes	Statute numbers and conceptual questions coexist
Chat history over tiny corpus (<500 docs)	No — doesn't matter	Recall is fine with either; not worth the complexity

Measure first

If you add hybrid search without measuring recall before and after, you have no idea whether it helped. Run 20 representative queries through BM25-only and semantic-only, score recall@5 manually, then add hybrid and remeasure. If you skip this, you're guessing.
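The before-and-after measurement described above can be sketched in a few lines. This is a minimal illustration, not a full evaluation harness: the document IDs, the labeled queries, and the `bm25_only` stand-in retriever are all hypothetical, and a real run would plug in your actual BM25, semantic, and hybrid retrievers.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant doc IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    retriever: Callable[[str], List[str]],
    labeled: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Average recall@k over a hand-labeled query set."""
    scores = [recall_at_k(retriever(q), rel, k) for q, rel in labeled.items()]
    return sum(scores) / len(scores)


# Hypothetical labeled set: query -> IDs of documents judged relevant by hand.
labeled_queries = {
    "ERR_SSL_PROTOCOL_ERROR": {"doc_ssl"},
    "how do I reset my password?": {"doc_reset", "doc_account"},
}


def bm25_only(query: str) -> List[str]:
    """Stand-in retriever returning canned rankings (replace with a real one)."""
    canned = {
        "ERR_SSL_PROTOCOL_ERROR": ["doc_ssl", "doc_tls"],
        "how do I reset my password?": ["doc_billing", "doc_reset"],
    }
    return canned[query]


baseline = mean_recall_at_k(bm25_only, labeled_queries, k=5)
print(f"BM25-only recall@5: {baseline:.2f}")
```

Run the same `mean_recall_at_k` call against the semantic-only and hybrid retrievers with the identical labeled set, and compare the three numbers rather than individual query results.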

How Semantic and Keyword Search Fail Differently

BM25 and semantic search have complementary blind spots. BM25 treats tokens as opaque strings: it finds 'ERR_SSL_PROTOCOL_ERROR' exactly, but 'how do I fix the SSL handshake problem?' won't match that document, because a tokenizer that keeps the error code as a single token sees no terms in common between the query and the code. Semantic search inverts this: it encodes meaning into a vector, so paraphrases match well, but 'SKU-10294-B' and 'SKU-10295-B' may have nearly identical embeddings, because the strings differ by a single digit and embedding models tend to treat such codes as near-duplicates.
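The keyword-side blind spot is easy to reproduce without any library. The sketch below assumes a simple `\w+` regex tokenizer, which keeps underscores inside a token; real BM25 implementations may tokenize differently, and the sample document text is made up for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    # \w+ treats underscores as word characters, so an error code such as
    # ERR_SSL_PROTOCOL_ERROR survives as one opaque token.
    return re.findall(r"\w+", text.lower())


def overlap(query: str, document: str) -> Set[str]:
    """Terms shared by query and document; BM25 can only score these."""
    return set(tokenize(query)) & set(tokenize(document))


# Hypothetical document and queries.
doc = "Chrome shows ERR_SSL_PROTOCOL_ERROR when TLS negotiation fails"
exact_query = "ERR_SSL_PROTOCOL_ERROR"
paraphrase = "how do I fix the SSL handshake problem?"

print(overlap(exact_query, doc))  # the error code token is shared
print(overlap(paraphrase, doc))   # no shared terms, so BM25 has nothing to score
```

With no shared terms, the paraphrased query contributes a BM25 score of zero for this document, no matter how relevant a human would judge it.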

Combining Results: RRF, Weighted Scoring, and RelativeScoreFusion

Once you have two ranked lists, you need to merge them. Three strategies exist: Reciprocal Rank Fusion (standard), weighted score normalization (fragile), and Weaviate's RelativeScoreFusion (a middle ground). RRF is almost always the right default.
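As a rough sketch of how RRF merges two ranked lists, the function below sums 1/(k + rank) for each document across retrievers, with k=60 as in the quick reference. The function name and the document IDs are illustrative assumptions, and ranks are taken to start at 1; this is not the code of any particular library.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from each retriever for the same query.
bm25_results = ["doc_ssl", "doc_tls", "doc_proxy"]
semantic_results = ["doc_handshake", "doc_ssl", "doc_tls"]

fused = reciprocal_rank_fusion([bm25_results, semantic_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Because only ranks enter the formula, the raw BM25 and cosine scores never need to be put on a common scale, which is the main reason RRF is less fragile than weighted score normalization. A document that both retrievers rank reasonably well (`doc_ssl` here) beats one that a single retriever ranks first.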
