Intermediate · 7 min
Matryoshka & Variable-Dimension Embeddings
Matryoshka embeddings let you truncate vectors to any dimension, trading a small amount of quality for large savings in storage and speed: use 256 dims for fast filtering and 1536 dims for final ranking.
Quick Reference
- Matryoshka embeddings encode information hierarchically: the earliest dimensions capture the most important signal
- Truncate to any dimension (64, 128, 256, 512, 768, 1536); lower is faster and cheaper, higher is more accurate
- Supported: OpenAI text-embedding-3-small/large (via the `dimensions` parameter), Nomic, jina-embeddings-v3
- Two-stage retrieval: fast search with 256 dims, then rerank with the full 1536 dims
- Storage savings: 256 dims takes 6x less storage than 1536 dims, which adds up at scale
- Quality tradeoff: 256 dims retains ~95% of full 1536-dim performance on most benchmarks
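The two-stage retrieval pattern from the list above can be sketched in plain NumPy. This is a minimal illustration, not tied to any particular vector database; the shortlist size and dimension counts are illustrative defaults, and the corpus is assumed to be a matrix of full-dimension embeddings.

```python
import numpy as np

def two_stage_search(query: np.ndarray, corpus: np.ndarray,
                     fast_dims: int = 256, shortlist: int = 100,
                     top_k: int = 10) -> np.ndarray:
    """Stage 1: cheap cosine search over the first `fast_dims` dimensions.
    Stage 2: rerank only the shortlist using the full-dimension vectors."""
    def norm_rows(m: np.ndarray) -> np.ndarray:
        return m / np.linalg.norm(m, axis=-1, keepdims=True)

    # Stage 1: score every corpus row on the truncated, re-normalized prefix.
    q_fast = query[:fast_dims] / np.linalg.norm(query[:fast_dims])
    scores_fast = norm_rows(corpus[:, :fast_dims]) @ q_fast
    candidates = np.argsort(-scores_fast)[:shortlist]

    # Stage 2: exact full-dimension scoring, but only for the shortlist.
    q_full = query / np.linalg.norm(query)
    scores_full = norm_rows(corpus[candidates]) @ q_full
    return candidates[np.argsort(-scores_full)[:top_k]]
```

In practice stage 1 would run inside an ANN index storing only the truncated vectors; the brute-force matrix product here just makes the two-pass structure explicit.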
How Matryoshka Embeddings Work
Standard embeddings spread information across all dimensions roughly equally. Matryoshka embeddings (named after Russian nesting dolls) are trained so that the first N dimensions contain the most important information. You can truncate the vector to any length and still get a meaningful, if less precise, representation. After truncating, re-normalize the vector to unit length so cosine similarities stay comparable.
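Truncation itself is trivial; the one step that is easy to forget is the re-normalization. A minimal NumPy sketch (the function name is mine, not from any library):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and re-normalize to unit length,
    so cosine similarity remains meaningful after truncation."""
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

# Toy example: a stand-in for a full 1536-dim embedding.
full = np.random.default_rng(0).normal(size=1536)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 256)
print(short.shape)  # (256,)
```

With OpenAI's text-embedding-3 models you can skip this step client-side by passing the `dimensions` parameter, which returns an already-normalized truncated vector.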
| Dimensions | Storage (per vector, float32) | Quality (vs full) | Speed | Use Case |
|---|---|---|---|---|
| 64 | 256 bytes | ~85% | Fastest | Rough filtering, deduplication |
| 256 | 1 KB | ~95% | Fast | Primary retrieval for most apps |
| 512 | 2 KB | ~97% | Medium | High-quality retrieval |
| 1536 | 6 KB | 100% | Slower | Maximum accuracy, reranking |