
Embedding Models

Embedding models convert text into vectors for semantic search and RAG. This article covers the 2026 model landscape, cost math at scale, production patterns, and the hidden traps — especially the re-embedding trap when you switch models.

Quick Reference

  • Use embed_documents() for corpus indexing and embed_query() for search queries. Some models apply different internal prefixes to each, so do not swap them.
  • text-embedding-3-small ($0.02 per 1M tokens) is the English-text default. voyage-4-lite is the same price with stronger retrieval.
  • Gemini Embedding 2 (#1 MTEB multilingual, 3072 dims, 8192 token input) is free during preview.
  • Switching models means re-embedding your entire corpus. Plan before you pick.
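  • Reduce dimensions to 512 via the Matryoshka `dimensions` parameter. Going from text-embedding-3-small's native 1536 dimensions to 512 cuts vector storage by about two-thirds (1 − 512/1536 ≈ 67%) with minimal recall loss.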
  • CacheBackedEmbeddings wraps any embedder and skips API calls on repeated text. Always use it.
  • Batch embed_documents() in groups of 256 to stay within rate limits and maximize throughput.
  • Measure retrieval hit-rate@5 on a held-out eval set before calling embeddings 'good enough'.
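
Three of the bullets above (batching, dimension reduction, and hit-rate@5) come down to small pieces of arithmetic, so a minimal sketch may help. The helper names `batched`, `storage_bytes`, and `hit_rate_at_k` are hypothetical and not part of LangChain. The storage figure assumes 4-byte float32 values and ignores index overhead.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int = 256) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def storage_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values and no index overhead."""
    return n_vectors * dims * bytes_per_value


def hit_rate_at_k(retrieved: list[list[str]], relevant: list[str], k: int = 5) -> float:
    """Fraction of queries whose single relevant doc id appears in the top k."""
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold in ids[:k])
    return hits / len(relevant)


# Batching: 600 chunks split into groups of 256. In a real pipeline each
# batch would be passed to the embedder's embed_documents() call.
chunks = [f"chunk {i}" for i in range(600)]
batch_sizes = [len(batch) for batch in batched(chunks)]

# Dimension reduction: 1M vectors at 1536 dims versus 512 dims.
full = storage_bytes(1_000_000, 1536)
reduced = storage_bytes(1_000_000, 512)
saving = 1 - reduced / full

# Hit-rate@5 on a toy eval set: one ranked result list per query,
# and one known-relevant doc id per query.
retrieved = [
    ["d1", "d7", "d3"],
    ["d9", "d2"],
    ["d4", "d5", "d6", "d8", "d0", "d2"],  # relevant doc is ranked 6th: a miss
]
relevant = ["d3", "d2", "d2"]
rate = hit_rate_at_k(retrieved, relevant)

print(batch_sizes, f"{saving:.1%}", f"{rate:.2f}")
```

On this toy data the third query misses because its relevant document sits at rank 6, which is the kind of failure a held-out eval set is meant to surface before you declare the embeddings good enough.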

When NOT to Use Embeddings

Before reaching for an embedding model, check whether keyword search would work. BM25 (or Elasticsearch/OpenSearch) is faster, cheaper, and produces more interpretable results for exact-match queries like product SKUs, legal citation numbers, or error codes. Embeddings win when the user's vocabulary differs from the document vocabulary — 'heart attack' should match 'myocardial infarction.' If your users search with the same words the documents use, you may not need embeddings at all.
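
To make the trade-off concrete, here is a minimal pure-Python sketch of BM25 scoring. It is an illustrative toy, not the Elasticsearch/OpenSearch implementation: the whitespace tokenizer, the `bm25_scores` function, the sample documents, and the `k1`/`b` defaults are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; real engines use proper analyzers.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "Replacement filter for SKU AX-4471 vacuum",
    "Patient admitted with myocardial infarction",
    "Error code E1024 means the disk is full",
]

sku_scores = bm25_scores("AX-4471", docs)         # exact ID: keyword search finds it
heart_scores = bm25_scores("heart attack", docs)  # vocabulary mismatch: no overlap

print(sku_scores)
print(heart_scores)
```

The SKU query scores only the first document, while the "heart attack" query scores nothing at all because no token overlaps with "myocardial infarction". That second case is the gap embeddings are meant to close.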

| Signal | Lean toward keyword | Lean toward embeddings |
| --- | --- | --- |
| Query vocabulary | Same as documents | Different from documents |
| Query type | Exact IDs, codes, SKUs | Conceptual, open-ended |
| Corpus size | < 10K documents | > 100K documents |
| Latency budget | < 10 ms P99 | 50–200 ms P99 acceptable |
| Cost budget | < $0.01/day | Willing to pay for quality |

Hybrid search first

Most production RAG systems end up with hybrid search: BM25 for exact matches, embeddings for semantic matches, and a reranker to merge the two result lists. Start with keyword search, and add embeddings only once you can measure the recall improvement they bring.
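
As a rough illustration of the merge step, the sketch below combines a keyword ranking and a semantic ranking with reciprocal rank fusion (RRF). RRF is a swapped-in technique, not the reranker the paragraph mentions: it is a simple, model-free way to fuse two lists, often used before or instead of a learned reranker. The function name, the document ids, and the constant `k = 60` are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc earns 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top results from each retriever for the same query.
keyword_hits = ["sku-42", "faq-7", "faq-3"]     # e.g. from BM25
semantic_hits = ["faq-3", "sku-42", "guide-1"]  # e.g. from a vector store

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while results found by only one retriever are kept but ranked lower. A learned reranker can then be applied to the fused shortlist if the measured recall gain justifies its latency.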