Intermediate · 7 min

Matryoshka & Variable-Dimension Embeddings

Matryoshka embeddings let you truncate vectors to a shorter dimension, trading a little accuracy for large savings in storage and search speed. A common pattern: use 256 dims for fast candidate filtering and the full 1536 dims for final ranking.

Quick Reference

  • Matryoshka embeddings encode information hierarchically — early dimensions capture the most important signal
  • Truncate to any dimension: 64, 128, 256, 512, 768, 1536 — lower = faster + cheaper, higher = more accurate
  • Supported: OpenAI text-embedding-3-small/large (dimensions param), Nomic, jina-embeddings-v3
  • Two-stage retrieval: fast search with 256 dims → rerank with full 1536 dims
  • Storage savings: 256 dims = 6x less storage than 1536 dims — significant at scale
  • Quality tradeoff: 256 dims retains ~95% of 1536 performance on most benchmarks
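To see where the 6x storage figure comes from, here is a back-of-envelope sketch. It assumes float32 vectors (4 bytes per dimension) and a hypothetical corpus of 10 million vectors; both numbers are illustrative, not from the original text.

```python
bytes_per_float = 4            # float32
n_vectors = 10_000_000         # hypothetical corpus size

full_gb = 1536 * bytes_per_float * n_vectors / 1e9   # full-dimension store
small_gb = 256 * bytes_per_float * n_vectors / 1e9   # truncated store

print(f"full: {full_gb:.1f} GB, truncated: {small_gb:.1f} GB, "
      f"ratio: {full_gb / small_gb:.0f}x")
# → full: 61.4 GB, truncated: 10.2 GB, ratio: 6x
```

At 10M vectors the difference is roughly 61 GB vs 10 GB, which is why the savings matter mostly at scale.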

How Matryoshka Embeddings Work

Standard embeddings spread information across all dimensions equally. Matryoshka embeddings (named after Russian nesting dolls) are trained so that the first N dimensions contain the most important information. You can truncate the vector to any length and still get a meaningful — if less precise — representation.
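A minimal sketch of client-side truncation with NumPy. One detail that is easy to miss: after slicing off the trailing dimensions you should re-normalize, or cosine and dot-product similarities will be skewed. (OpenAI's `dimensions` parameter does the equivalent server-side; the random vector below is a stand-in for a real embedding.)

```python
import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and re-normalize to unit length."""
    v = vec[:dims]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
full = rng.standard_normal(1536)
full /= np.linalg.norm(full)      # unit-length "full" embedding

short = truncate(full, 256)
print(short.shape, round(float(np.linalg.norm(short)), 6))  # → (256,) 1.0
```

Because the model was trained to front-load information, `short` remains a usable (if coarser) representation of the same input.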

Dimensions   Storage per vector (float32)   Quality (vs full)   Speed     Use Case
64           256 bytes                      ~85%                Fastest   Rough filtering, deduplication
256          1 KB                           ~95%                Fast      Primary retrieval for most apps
512          2 KB                           ~97%                Medium    High-quality retrieval
1536         6 KB                           100%                Slower    Maximum accuracy, reranking
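The two-stage pattern from the Quick Reference can be sketched end to end: a cheap brute-force scan over 256-dim prefixes selects candidates, then only those candidates are reranked with the full 1536-dim vectors. The data here is synthetic and the corpus/candidate sizes are arbitrary; in practice the truncated index would live in a vector database rather than a NumPy array.

```python
import numpy as np

rng = np.random.default_rng(42)

def normalize(m: np.ndarray) -> np.ndarray:
    """Scale rows (or a single vector) to unit length."""
    return m / np.linalg.norm(m, axis=-1, keepdims=True)

docs = normalize(rng.standard_normal((10_000, 1536)))   # full-precision store
index = normalize(docs[:, :256])                        # truncated search index
query = normalize(rng.standard_normal(1536))

# Stage 1: fast scan over 256-dim prefixes -> top-50 candidate ids
q_short = query[:256] / np.linalg.norm(query[:256])
candidates = np.argsort(index @ q_short)[-50:]

# Stage 2: exact rerank of just those 50 with the full 1536-dim vectors
scores = docs[candidates] @ query
top5 = candidates[np.argsort(scores)[-5:][::-1]]        # best match first
print(top5)
```

Stage 1 touches every vector but at 1/6 the width; stage 2 pays full-dimension cost on only 50 vectors, which is where the speed/quality tradeoff pays off.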