Vector Stores

How to choose, configure, and operate a vector store for production RAG — covering index types, cost math, failure modes, multi-tenancy, and migration strategy across FAISS, Chroma, pgvector, Qdrant, and Pinecone.

Quick Reference

→VectorStore interface: add_documents(), similarity_search(), similarity_search_with_score(), as_retriever()
→FAISS: zero infra, but a flat float32 index of 1536-dim vectors needs ~6 KB per vector, so 16GB of RAM tops out near 2.5M vectors (10M would need ~61GB); use for dev, CI fixtures, and offline batch
→Chroma v1.5+: Rust core, hybrid BM25+vector, Cloud GA — no longer just a SQLite wrapper
→pgvector 0.8+: iterative index scans fix overfiltering; use HNSW for production, IVFFlat only if RAM is tight
→Pinecone Standard: $50/mo base + $4/M write units, $16/M read units, $0.33/GB/mo — prices as of April 2026
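→Qdrant: best self-hosted option for >1M vectors; binary quantization stores 1 bit per dimension instead of 32 (a 32x cut in vector RAM), and the 1.5-bit mode gives up some of that (~21x) for better recall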
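→HNSW is the default index in Chroma and Qdrant and the standard production choice in pgvector (available since 0.5.0; pgvector builds no ANN index until you create one); only switch to IVFFlat if build time or RAM is the hard constraint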
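The RAM and cost figures in the list above reduce to simple arithmetic. The sketch below is a minimal back-of-envelope calculator, not any vendor's SDK: the function names are hypothetical, it assumes raw float32 vectors (4 bytes per dimension), it ignores index overhead (HNSW graph links, IDs, metadata), and it plugs in the April 2026 Pinecone rates quoted above. How the $50 interacts with usage is an assumption, flagged in the code.

```python
# Back-of-envelope sizing. Assumptions: float32 vectors, no index overhead,
# Pinecone Standard rates as quoted in the Quick Reference (USD, April 2026).

FLOAT32_BITS = 32

def raw_vector_bytes(n_vectors: int, dim: int, bits_per_dim: float = FLOAT32_BITS) -> float:
    """Memory for the vectors alone, before any index structure."""
    return n_vectors * dim * bits_per_dim / 8

def max_vectors_in_ram(ram_bytes: float, dim: int, bits_per_dim: float = FLOAT32_BITS) -> int:
    """Upper bound on raw vectors that fit; real headroom is lower (OS, graph, metadata)."""
    return int(ram_bytes // (dim * bits_per_dim / 8))

def compression_ratio(bits_per_dim: float) -> float:
    """How many times smaller a quantized vector is than its float32 original."""
    return FLOAT32_BITS / bits_per_dim

PINECONE_MIN_MONTHLY = 50.0
PER_M_WRITE_UNITS = 4.0
PER_M_READ_UNITS = 16.0
PER_GB_MONTH = 0.33

def pinecone_monthly_usd(write_units: float, read_units: float, storage_gb: float) -> float:
    usage = (write_units / 1e6 * PER_M_WRITE_UNITS
             + read_units / 1e6 * PER_M_READ_UNITS
             + storage_gb * PER_GB_MONTH)
    # Assumption: the $50 is a monthly minimum that usage counts toward.
    # If your plan bills it as a flat fee on top of usage, use
    # PINECONE_MIN_MONTHLY + usage instead.
    return max(PINECONE_MIN_MONTHLY, usage)

if __name__ == "__main__":
    gb = 1e9
    print(raw_vector_bytes(10_000_000, 1536) / gb)  # 61.44 (GB for 10M x 1536-dim)
    print(max_vectors_in_ram(16 * gb, 1536))         # roughly 2.6M raw vectors
    print(compression_ratio(1), round(compression_ratio(1.5), 1))
    print(pinecone_monthly_usd(write_units=2e6, read_units=20e6, storage_gb=60))
```

Treat the outputs as lower bounds on memory: an HNSW graph adds per-vector link storage on top of the raw vectors, so the practical ceiling on a 16GB box is lower still.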

Should You Even Use a Dedicated Vector Store?

Most RAG prototypes reach for a vector store on day one. That's often wrong. A dedicated vector store adds infra, ops burden, and a consistency boundary. Before adding one, answer two questions: does your use case require semantic similarity that keyword search genuinely can't deliver, and do you have more than ~10K documents?

When full-text search is enough

PostgreSQL's `tsvector` + GIN index beats vector search on exact-term queries (product names, IDs, error codes), structured filters (date range, status, category), and corpora under 50K short documents. It also costs nothing extra if you're already running Postgres. Run both and measure recall@10 on your actual queries before committing to a vector store.
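Measuring recall@10 only needs a small labeled set of real queries and the doc IDs a human judged relevant. A minimal sketch follows, assuming each retriever is wrapped as a plain function from query text to a ranked list of doc IDs; the wrapper shape, the labeled queries, and the doc IDs below are all hypothetical stand-ins for your `tsvector` query and your vector search.

```python
from typing import Callable, Dict, List, Set

# Any retriever (SQL full-text, vector store, hybrid) wrapped as query -> ranked doc IDs.
Retriever = Callable[[str], List[str]]

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant docs that appear in the top-k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant doc ID")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def mean_recall_at_k(retriever: Retriever, labeled: Dict[str, Set[str]], k: int = 10) -> float:
    """Average recall@k over {query: relevant doc IDs}."""
    scores = [recall_at_k(retriever(q), rel, k) for q, rel in labeled.items()]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical judgments and canned results standing in for live retrievers.
    labeled = {
        "error E1042 on checkout": {"doc-7"},
        "policies about remote work": {"doc-2", "doc-9"},
    }
    keyword = {"error E1042 on checkout": ["doc-7", "doc-3"],
               "policies about remote work": ["doc-2"]}
    vector = {"error E1042 on checkout": ["doc-3", "doc-5"],
              "policies about remote work": ["doc-9", "doc-2"]}
    print(mean_recall_at_k(keyword.__getitem__, labeled))  # 0.75
    print(mean_recall_at_k(vector.__getitem__, labeled))   # 0.5
```

The toy data mirrors the split described above: the exact-term query favors keyword search, the intent query favors vectors. Only your own queries can tell you which pattern dominates.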

If your queries are genuinely semantic — 'find policies similar in intent to this clause' — and your corpus exceeds 50K documents, a vector store earns its place. Below that threshold, FAISS in-memory is often enough and costs zero infra.

Start with need — not store names

The vectorize-everything trap

Teams sometimes embed their entire database because it's easy. The cost: every retrieval call now hits a vector index instead of a B-tree, latency rises from microseconds to milliseconds, and result quality drops for structured queries. Embed only the fields where semantic similarity matters — document bodies, support tickets, long-form descriptions. Never embed IDs, timestamps, or enumerated states.
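In practice this means splitting each row into the text you embed and the structured fields you keep as filterable metadata. The sketch below assumes a hypothetical support-ticket schema; the field names and `split_record` helper are illustrative, not from any library.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema for a support-ticket table; adjust to your own columns.
SEMANTIC_FIELDS = ("title", "body")                           # free text: embed these
METADATA_FIELDS = ("id", "created_at", "status", "priority")  # exact match: filter on these

def split_record(record: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (text_to_embed, metadata) for one row.

    Only long-form text reaches the embedding model; IDs, timestamps, and
    enum states stay as metadata so filters can match them exactly."""
    text = "\n\n".join(str(record[f]) for f in SEMANTIC_FIELDS if record.get(f))
    metadata = {f: record[f] for f in METADATA_FIELDS if f in record}
    return text, metadata

if __name__ == "__main__":
    row = {"id": 4821, "created_at": "2026-03-02", "status": "open",
           "priority": "high", "title": "Checkout times out",
           "body": "Payment page spins for 30s, then shows a gateway error."}
    text, meta = split_record(row)
    print(text)
    print(meta)
```

The (text, metadata) pair maps naturally onto the document shape that `add_documents()` accepts, so status or date filters run against metadata rather than against embeddings.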

Index Types: The Real Engineering Decision

Every vector store exposes an index type that controls the recall/speed/memory tradeoff. This is the decision that actually matters in production — more than which store you pick. Getting it wrong means either slow queries at scale or silently missing relevant documents.
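To make the recall/speed tradeoff concrete, here is a toy, pure-Python version of IVF (inverted file) search, the scheme behind IVFFlat. It is a sketch under a loud simplification: centroids are a random sample of the data, whereas real IVF indexes (FAISS `IndexIVFFlat`, pgvector `ivfflat`) train them with k-means. HNSW uses a graph instead of lists, but its `ef_search` knob exposes the same kind of tradeoff.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared L2 distance (same ranking as L2, cheaper to compute)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force(data: List[Vector], q: Vector, k: int) -> List[int]:
    """Exact top-k: perfect recall, but scans every vector."""
    return sorted(range(len(data)), key=lambda i: sq_dist(data[i], q))[:k]

def build_ivf(data: List[Vector], nlist: int, seed: int = 0) -> Tuple[List[Vector], List[List[int]]]:
    """Assign each vector to its nearest centroid, one inverted list per centroid.
    Simplification: centroids are sampled from the data instead of k-means trained."""
    rng = random.Random(seed)
    centroids = rng.sample(data, nlist)
    lists: List[List[int]] = [[] for _ in range(nlist)]
    for i, v in enumerate(data):
        nearest = min(range(nlist), key=lambda c: sq_dist(centroids[c], v))
        lists[nearest].append(i)
    return centroids, lists

def ivf_search(data, centroids, lists, q, k: int, nprobe: int) -> Tuple[List[int], int]:
    """Scan only the nprobe closest lists; return (top-k ids, vectors scanned)."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], q))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    top = sorted(candidates, key=lambda i: sq_dist(data[i], q))[:k]
    return top, len(candidates)

if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(16)] for _ in range(2000)]
    queries = [[rng.random() for _ in range(16)] for _ in range(20)]
    exact = [set(brute_force(data, q, 10)) for q in queries]
    centroids, lists = build_ivf(data, nlist=32)
    for nprobe in (1, 4, 32):
        recall, scanned = 0.0, 0
        for q, truth in zip(queries, exact):
            approx, n = ivf_search(data, centroids, lists, q, 10, nprobe)
            recall += len(truth & set(approx)) / 10
            scanned += n
        print(f"nprobe={nprobe:2d} recall@10={recall / len(queries):.2f} "
              f"avg scanned={scanned / len(queries):.0f} of {len(data)}")
```

Low `nprobe` scans a small slice of the data and silently misses neighbors that landed in unprobed lists; `nprobe == nlist` degenerates to an exact scan. pgvector exposes this knob as `ivfflat.probes`.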

Store Comparison: Cost, Latency, and Scale

The choice of vector store is primarily a cost and ops decision, not a capability decision. All major stores support HNSW, metadata filtering, and hybrid search. What differs is the operational model and the bill.
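As one example of HNSW plus metadata filtering, here is a sketch of the SQL a pgvector 0.8+ setup might run, rendered from Python. The helper name, the `id` column, and the table and column names in the usage line are hypothetical; `m`, `ef_construction`, `hnsw.ef_search`, `hnsw.iterative_scan`, `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector's own.

```python
from typing import List

# Hypothetical helper: renders pgvector SQL for a filtered similarity search.
# Identifiers are interpolated directly, so pass only trusted table/column names;
# the filter value and query vector stay as %s placeholders for the DB driver.

def pgvector_filtered_search_sql(
    table: str,
    embedding_col: str,
    filter_col: str,
    k: int = 10,
    m: int = 16,                # HNSW links per node (pgvector default)
    ef_construction: int = 64,  # build-time candidate list size (pgvector default)
    ef_search: int = 100,       # query-time candidate list size (pgvector default is 40)
) -> List[str]:
    return [
        # One-time index build; vector_cosine_ops pairs with the <=> operator below.
        f"CREATE INDEX IF NOT EXISTS {table}_{embedding_col}_hnsw ON {table} "
        f"USING hnsw ({embedding_col} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});",
        # Session-level search settings.
        f"SET hnsw.ef_search = {ef_search};",
        # pgvector 0.8+: keep scanning the index when the WHERE clause discards
        # most candidates, instead of returning fewer than k rows.
        "SET hnsw.iterative_scan = relaxed_order;",
        f"SELECT id FROM {table} WHERE {filter_col} = %s "
        f"ORDER BY {embedding_col} <=> %s::vector LIMIT {k};",
    ]

if __name__ == "__main__":
    for stmt in pgvector_filtered_search_sql("docs", "embedding", "tenant_id"):
        print(stmt)
```

`relaxed_order` can return rows slightly out of distance order; `strict_order` avoids that at some extra cost. Other stores express the same filter through their own query APIs rather than SQL.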
