
Embedding Models Compared

Comparing OpenAI, Cohere, and open-source embedding models for RAG. Dimensions, pricing, MTEB benchmarks, and Matryoshka embeddings for cost optimization.

Quick Reference

  • text-embedding-3-small: best cost/quality ratio for most RAG use cases ($0.02/1M tokens)
  • text-embedding-3-large: highest quality from OpenAI, supports dimension reduction ($0.13/1M tokens)
  • Cohere embed-v3: strongest multilingual support with 100+ languages
  • Open-source BGE/E5/GTE: self-hosted, no API costs, competitive quality on MTEB
  • Matryoshka embeddings let you truncate dimensions (3072 → 256) to save 90%+ storage with ~5% quality loss
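To make the storage claim in that last bullet concrete, here is a small back-of-the-envelope sketch (pure Python; assumes float32 vectors at 4 bytes per dimension and ignores index overhead):

```python
# Storage math for Matryoshka truncation (illustrative; float32 = 4 bytes/dim).
BYTES_PER_FLOAT32 = 4

def storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw vector storage, ignoring index/metadata overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32

full = storage_bytes(1_000_000, 3072)   # text-embedding-3-large, full size
small = storage_bytes(1_000_000, 256)   # truncated to 256 dims

print(f"full:      {full / 1e9:.2f} GB")     # 12.29 GB
print(f"truncated: {small / 1e9:.2f} GB")    # 1.02 GB
print(f"savings:   {1 - small / full:.1%}")  # 91.7%
```

For a million chunks, truncating 3072 → 256 drops raw vector storage from ~12.3 GB to ~1 GB, in line with the "90%+ savings" figure above.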

OpenAI Embedding Models

OpenAI's text-embedding-3 family is the most widely used in production RAG systems. The 'small' variant offers an excellent cost-to-quality ratio and is sufficient for the majority of use cases. The 'large' variant scores higher on benchmarks but costs 6.5x more. Both support Matryoshka dimension reduction, letting you trade a small amount of quality for significant storage savings.

Model                              Dimensions   MTEB Score   Price/1M tokens   Max Tokens
text-embedding-3-small             1536         62.3         $0.02             8191
text-embedding-3-large             3072         64.6         $0.13             8191
text-embedding-ada-002 (legacy)    1536         61.0         $0.10             8191
Using OpenAI embeddings with dimension reduction
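A minimal sketch, assuming the official `openai` Python client (v1.x) and an `OPENAI_API_KEY` in the environment. The `dimensions` parameter on the text-embedding-3 models returns truncated, re-normalized vectors; the `truncate_embedding` helper below shows the equivalent operation done locally:

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Local equivalent of server-side reduction: keep the first `dims`
    values, then re-normalize to unit length so cosine similarity
    remains meaningful."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def embed(texts: list[str], dims: int = 256) -> list[list[float]]:
    """Request reduced-dimension embeddings directly from the API."""
    from openai import OpenAI  # deferred import; helper above works offline
    client = OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
        dimensions=dims,  # server-side Matryoshka truncation
    )
    return [d.embedding for d in resp.data]
```

Requesting reduced dimensions server-side is usually preferable: you pay the same per token, but store and transfer far smaller vectors from the start.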
Start with text-embedding-3-small

For 95% of RAG use cases, text-embedding-3-small at $0.02/1M tokens is the right choice. Embedding a 10,000-page knowledge base costs roughly $2-5. Only upgrade to 'large' if you've measured a meaningful retrieval quality difference on your specific dataset.