Intermediate · 15 min

Vector Database Selection

How to choose a vector database for production RAG in 2026. Six databases compared honestly — quantization changes the cost math, migration is not a one-line change, and most teams will outgrow their first choice at a predictable threshold.

Quick Reference

→pgvector: runs in your existing Postgres, handles 5-10M vectors with HNSW — beyond that, query latency climbs past 50ms p99
→Pinecone: zero-ops serverless with BYOC in public preview; enforces 100 req/s per namespace; cold starts 2-10s after idle
→Qdrant v1.17+: $50M Series B (Mar 2026), Gridstore engine, quantization up to 64x memory reduction — biggest cost lever in 2026
→Weaviate v1.37: strongest native hybrid search (BM25 + vector) of any open-source vector DB; secure MCP server built in
→Milvus v2.6: RaBitQ 1-bit quantization at 1/32 original size with 95% recall — most aggressive compression available
→Chroma Cloud is GA with serverless search — no longer only a dev tool, but scale ceiling above 5M vectors is unproven
→Migration means re-embedding if dimensions differ, full reindex regardless, and 1-4 weeks of engineering time
→For most teams in 2026: start with pgvector, enable quantization when memory is tight, migrate to Qdrant or Milvus above 10M vectors

Do You Need a Dedicated Vector Database?

The first production decision is whether you need a vector database at all. Many teams jump to Pinecone or Weaviate before asking whether their current stack can handle the load. The answer depends on three variables: corpus size, query volume, and whether you need hybrid search (vector + keyword).

Start with need — not store names

The pgvector default

If your application already runs on Postgres, pgvector is the right default. It handles up to 5-10M vectors with HNSW indexing, supports SQL WHERE filters, and costs nothing additional. The only reasons to leave pgvector are: (1) you exceed 10M vectors and query latency climbs above your SLA, (2) you need hybrid search without building it yourself, or (3) you need multi-tenant isolation across thousands of tenants.
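The three exit conditions above can be written down as a small checklist. This is a minimal sketch rather than a definitive rule: the `Workload` fields and the `tenant_threshold` default of 1000 are assumptions introduced here to stand in for "thousands of tenants", and the latency figures should come from your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    vector_count: int
    p99_latency_ms: float      # measured on your own pgvector instance
    latency_sla_ms: float      # your own latency target
    needs_hybrid_search: bool  # vector + keyword in one query
    tenant_count: int


def reasons_to_leave_pgvector(w: Workload, tenant_threshold: int = 1000) -> list[str]:
    """Return the exit conditions from the text that this workload meets.

    tenant_threshold is a hypothetical cutoff, not a documented limit.
    """
    reasons = []
    if w.vector_count > 10_000_000 and w.p99_latency_ms > w.latency_sla_ms:
        reasons.append("scale: over 10M vectors and p99 latency exceeds the SLA")
    if w.needs_hybrid_search:
        reasons.append("hybrid search: needed without building it yourself")
    if w.tenant_count >= tenant_threshold:
        reasons.append("multi-tenancy: isolation across thousands of tenants")
    return reasons


# Illustrative workload: small corpus, comfortable latency, single tenant.
small = Workload(2_000_000, 20.0, 50.0, False, 10)
print(reasons_to_leave_pgvector(small) or "stay on pgvector")
```

An empty list means none of the three conditions applies, which is the case the text treats as the default.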

When keyword search is enough

If your corpus is English-language structured documents and your users search with domain-specific terms (product names, error codes, legal citations), Postgres full-text search with a GIN index often outperforms vector search. Vector search shines when queries are semantic ('how do I cancel my subscription') rather than lexical ('cancel subscription API endpoint'). Run both and measure recall on your golden set before committing to either.
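One way to make the "run both, measure recall" comparison concrete is a recall@k score averaged over the golden set. The sketch below assumes a golden set shaped as a mapping from query to the set of relevant document IDs. The function names, document IDs, and result lists are all made up for illustration and are not measurements from any real system.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("golden set entry has no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall(results: dict[str, list[str]], golden: dict[str, set[str]], k: int = 10) -> float:
    """Average recall@k over every query in the golden set."""
    scores = [recall_at_k(results.get(query, []), relevant, k)
              for query, relevant in golden.items()]
    return sum(scores) / len(scores)


# Hypothetical golden set and hypothetical ranked results from each retriever.
golden = {
    "how do I cancel my subscription": {"doc_billing_3", "doc_faq_7"},
    "cancel subscription API endpoint": {"doc_api_12"},
}
vector_results = {
    "how do I cancel my subscription": ["doc_faq_7", "doc_billing_3", "doc_blog_1"],
    "cancel subscription API endpoint": ["doc_blog_1", "doc_faq_7"],
}
keyword_results = {
    "how do I cancel my subscription": ["doc_api_12", "doc_billing_3"],
    "cancel subscription API endpoint": ["doc_api_12"],
}

print("vector :", mean_recall(vector_results, golden, k=3))
print("keyword:", mean_recall(keyword_results, golden, k=3))
```

With only two toy queries the numbers mean nothing by themselves. The point is the shape of the comparison: the same golden set, the same k, and one score per retriever.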

The 2026 Story: Quantization Changes the Cost Math

In 2025, running vector databases meant storing float32 embeddings at 4 bytes per dimension. At 1536 dimensions (OpenAI text-embedding-3-small), 10M vectors consume about 61 GB (roughly 57 GiB) of RAM for the raw vectors alone, before HNSW graph overhead (typically 1.5-2x). In 2026, quantization eliminates most of that cost. Qdrant, with its Gridstore engine and 1.5-bit quantization, advertises memory reductions of up to 64x; note that 1.5 bits per dimension by itself is about 21x smaller than float32, so check which configuration a headline figure refers to. Milvus v2.6's RaBitQ achieves 1/32 the original size with 95% recall. These are not experimental features; they're production defaults in many deployments.
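The raw-vector arithmetic in the paragraph above is easy to reproduce for your own corpus. The helper below is a back-of-the-envelope sketch: `vector_memory_bytes` is a name introduced here, and it counts only the vector values themselves, not HNSW graph links, payloads, or any full-precision copies a database may keep for rescoring.

```python
def vector_memory_bytes(num_vectors: int, dims: int, bits_per_dim: float = 32.0) -> float:
    """Bytes needed for the vector values alone (no index graph, no payloads)."""
    return num_vectors * dims * bits_per_dim / 8


NUM_VECTORS = 10_000_000
DIMS = 1536  # text-embedding-3-small

float32_bytes = vector_memory_bytes(NUM_VECTORS, DIMS)

# Compare full precision against two quantization widths mentioned in the text.
for label, bits in [("float32", 32.0), ("1.5-bit", 1.5), ("1-bit", 1.0)]:
    size = vector_memory_bytes(NUM_VECTORS, DIMS, bits)
    ratio = float32_bytes / size
    print(f"{label:>8}: {size / 1e9:6.2f} GB  ({ratio:.1f}x smaller than float32)")
```

For 10M vectors at 1536 dimensions this gives 61.44 GB at float32 (about 57 GiB in binary units), 2.88 GB at 1.5 bits, and 1.92 GB at 1 bit. Actual savings in a running system will be smaller than these ratios once the graph and rescoring data are included.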

Six Databases, Honest Assessment

Each database below gets one paragraph and one key callout. Code examples belong in the LangChain vector stores article — this article is about decisions, not implementation.
