Intermediate · 16 min

Embedding Models Compared

The 2026 embedding landscape has four major commercial providers, a tier of free open-weight models that are competitive with commercial APIs on benchmarks, and Matryoshka compression that can cut vector storage by about 92% for roughly 6% recall loss. This article covers how to choose, evaluate, and migrate embedding models for production RAG.

Quick Reference

  • text-embedding-3-small / voyage-4-lite: $0.02/1M tokens — best default for English RAG
  • Voyage 4 family (Jan 2026): shared embedding space across tiers, voyage-4-nano is free + open-weight
  • Cohere Embed v4: multimodal (text + images), 1024 dims, $0.12/1M tokens
  • Google Gemini Embedding 2: 5 input types (text, image, video, audio, docs), tops MTEB v1 English
  • Open-source leaders (2026): Jina v5, BGE-M3, Qwen3-Embedding-8B — competitive with commercial on benchmarks
  • Matryoshka: cut 3072 → 256 dims for 92% storage savings, ~6% recall loss — measure on your data
  • MTEB v2 scores ≠ MTEB v1 — never compare numbers across benchmark versions
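
The Matryoshka bullet above can be made concrete with a small sketch. It assumes the model was trained with Matryoshka representation learning, so that the leading dimensions carry most of the signal; truncating an ordinary embedding this way is not expected to work. The function names (`truncate_matryoshka`, `storage_savings`, `recall_at_k`) are illustrative and not taken from any provider SDK.

```python
import math


def truncate_matryoshka(vec, dims):
    """Keep the first `dims` components and L2-renormalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


def storage_savings(full_dims, kept_dims):
    """Fraction of vector storage saved at a fixed number of bytes per dimension."""
    return 1.0 - kept_dims / full_dims


def recall_at_k(full_hits, truncated_hits):
    """Share of the full-dimension top-k that the truncated index also returns."""
    if not full_hits:
        raise ValueError("full_hits must not be empty")
    return len(set(full_hits) & set(truncated_hits)) / len(full_hits)


# 3072 -> 256 dims keeps 1/12 of each vector, i.e. the ~92% saving quoted above.
print(f"storage saved: {storage_savings(3072, 256):.1%}")
```

To follow the "measure on your data" advice, run the same queries against the full and truncated indexes and average `recall_at_k` over the query set before committing to a dimension.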

The 2026 Embedding Landscape

In 2024, the practical choice of embedding model came down to OpenAI, Cohere, or self-hosting BGE. In 2026, there are three distinct deployment tiers, four major commercial providers, and open-weight models that trade blows with commercial APIs on benchmarks. The most important development is that multimodal embeddings are production-ready: Cohere Embed v4, Google Gemini Embedding 2, and Voyage multimodal-3.5 all handle mixed text-and-image content in a single embedding space. If your RAG corpus contains PDFs with figures, product images, or screenshots, single-model multimodal retrieval is now a realistic option.

Embedding Provider Landscape · April 2026

Commercial API (pay per token, no infra)

  • OpenAI: $0.02–$0.13/1M, text
  • Voyage AI: $0.02–$0.12/1M, text
  • Cohere Embed v4: $0.12/1M, text + image
  • Gemini Embedding 2: $0.20/1M, 5 input types

Open-Weight / Free (download and run, no cost)

  • Jina v5: 32K context, 119 languages, Apache 2
  • BGE-M3: 8K context, multilingual, MIT
  • Qwen3-Emb-8B: #1 open MTEB, Apache 2
  • voyage-4-nano: free, open-weight, Apache 2

Self-Hosted (GPU/CPU server, full control)

  • Any model above: no data leaves your infrastructure
  • vLLM / TEI: inference server
  • NVIDIA NV-Embed-v2: 72.31 MTEB, open-weight
  • Air-gapped: compliance / regulated environments

embedding provider tiers · 3 deployment modes · April 2026

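| Model | Dims | MTEB v1 (English) | Price / 1M tokens | Multimodal | Context (tokens) |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | 62.3 | $0.02 | No | 8K |
| voyage-4-lite | 1024 | ~63 | $0.02 | No | 32K |
| voyage-4 | 1024 | ~68 | $0.06 | No | 32K |
| Cohere Embed v4 | 1024 | | $0.12 | Text + image | 128K |
| Gemini Embedding 2 | 3072 | 68.32 | $0.20 (free preview) | 5 types | 8K |
| text-embedding-3-large | 3072 | 64.6 | $0.13 | No | 8K |
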
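
To turn the table into numbers for a specific project, a rough back-of-the-envelope calculation helps. The sketch below takes dimensions and prices from the table and applies them to a hypothetical corpus of 1M chunks at 500 tokens each. The corpus size, the float32 storage assumption (4 bytes per value), and the helper names are all assumptions for illustration; real index size also depends on the vector database's overhead and any quantization.

```python
# Dimensions and price per 1M tokens, copied from the table above.
MODELS = {
    "text-embedding-3-small": {"dims": 1536, "usd_per_1m": 0.02},
    "voyage-4-lite": {"dims": 1024, "usd_per_1m": 0.02},
    "voyage-4": {"dims": 1024, "usd_per_1m": 0.06},
    "Cohere Embed v4": {"dims": 1024, "usd_per_1m": 0.12},
    "Gemini Embedding 2": {"dims": 3072, "usd_per_1m": 0.20},
    "text-embedding-3-large": {"dims": 3072, "usd_per_1m": 0.13},
}


def embedding_cost_usd(total_tokens, usd_per_1m):
    """One-time cost of embedding `total_tokens` at a per-1M-token price."""
    return total_tokens / 1_000_000 * usd_per_1m


def index_size_gb(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage in GB, assuming float32 values and no index overhead."""
    return num_vectors * dims * bytes_per_value / 1e9


# Hypothetical corpus: adjust both numbers to match your own data.
NUM_CHUNKS = 1_000_000
TOKENS_PER_CHUNK = 500

for name, spec in MODELS.items():
    cost = embedding_cost_usd(NUM_CHUNKS * TOKENS_PER_CHUNK, spec["usd_per_1m"])
    size = index_size_gb(NUM_CHUNKS, spec["dims"])
    print(f"{name:<24} ${cost:>7.2f}  {size:>6.2f} GB")
```

At this scale the one-time embedding cost is small for every model listed, so the dimension count, which drives storage and query latency, may matter more than the per-token price.
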
MTEB v1 and v2 scores are not comparable

MTEB v2 launched in 2026 with a different task mix and scoring methodology. A model scoring 72 on MTEB v2 cannot be compared to a model scoring 68 on MTEB v1. Always check which benchmark version a score refers to before using it to choose a model.
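
One way to enforce this rule in an evaluation script is to store the benchmark version next to every score and refuse cross-version comparisons. The sketch below is a minimal illustration: the `MtebScore` type and both helpers are hypothetical, the v1 scores come from the table above, and `example-model-x` with its v2 score of 72.0 is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MtebScore:
    model: str
    score: float
    version: str  # benchmark version the score was reported on, e.g. "v1"


def rank_within_version(scores):
    """Group scores by benchmark version and rank only inside each group."""
    by_version = {}
    for s in scores:
        by_version.setdefault(s.version, []).append(s)
    return {
        version: sorted(group, key=lambda s: s.score, reverse=True)
        for version, group in by_version.items()
    }


def better(a, b):
    """Return the higher-scoring entry, refusing cross-version comparisons."""
    if a.version != b.version:
        raise ValueError(
            f"cannot compare MTEB {a.version} ({a.model}) with MTEB {b.version} ({b.model})"
        )
    return a if a.score >= b.score else b


scores = [
    MtebScore("text-embedding-3-small", 62.3, "v1"),
    MtebScore("Gemini Embedding 2", 68.32, "v1"),
    MtebScore("text-embedding-3-large", 64.6, "v1"),
    MtebScore("example-model-x", 72.0, "v2"),  # placeholder, not a real result
]

for version, ranked in rank_within_version(scores).items():
    print(version, [s.model for s in ranked])
```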