Intermediate · 18 min

Vector Stores

How to choose, configure, and operate a vector store for production RAG — covering index types, cost math, failure modes, multi-tenancy, and migration strategy across FAISS, Chroma, pgvector, Qdrant, and Pinecone.

Quick Reference

  • VectorStore interface: add_documents(), similarity_search(), similarity_search_with_score(), as_retriever()
  • FAISS: zero infra, but a flat float32 index of 1536-dim vectors needs ~6KB per vector (~61GB at 10M), so 16GB RAM runs out well before 10M vectors (around 2.5M); use for dev, CI fixtures, and offline batch
  • Chroma v1.5+: Rust core, hybrid BM25+vector, Cloud GA — no longer just a SQLite wrapper
  • pgvector 0.8+: iterative index scans fix overfiltering; use HNSW for production, IVFFlat only if RAM is tight
  • Pinecone Standard: $50/mo base + $4/M write units, $16/M read units, $0.33/GB/mo — prices as of April 2026
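  • Qdrant: best self-hosted option for >1M vectors; binary quantization cuts vector RAM by up to 32x at 1 bit per dimension (less for the 1.5-bit and 2-bit variants, which trade compression for accuracy)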
  • HNSW is the default index in Chroma and Qdrant and the recommended one in pgvector (supported since 0.5.0, but you must create it explicitly); only switch to IVFFlat if build time or RAM is the hard constraint
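
The memory and pricing figures in the list above reduce to simple arithmetic. The sketch below is a rough estimator, not a sizing tool: the function names are hypothetical, the RAM figure counts raw vectors only (no graph or metadata overhead), and the Pinecone rates are the ones quoted above, with the $50 base treated as additive.

```python
def flat_index_gib(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector payload of an uncompressed (flat) index, in GiB.

    Ignores index overhead (HNSW links, IDs, metadata), so treat it as a floor.
    """
    return num_vectors * dims * bytes_per_value / 2**30


def pinecone_monthly_usd(
    write_units_m: float,          # millions of write units per month
    read_units_m: float,           # millions of read units per month
    storage_gb: float,             # stored data in GB
    base: float = 50.0,            # rates below are the ones quoted in this article
    per_m_writes: float = 4.0,
    per_m_reads: float = 16.0,
    per_gb: float = 0.33,
) -> float:
    """Monthly bill estimate; assumes the base fee is added on top of usage."""
    usage = write_units_m * per_m_writes + read_units_m * per_m_reads + storage_gb * per_gb
    return base + usage


# 10M vectors at 1536 dims, float32: far more than a 16GB machine can hold.
print(f"{flat_index_gib(10_000_000, 1536):.1f} GiB")

# Hypothetical workload: 5M writes, 20M reads, 60GB stored.
print(f"${pinecone_monthly_usd(5, 20, 60):.2f}/mo")
```

If the base fee turns out to be a minimum commitment rather than a surcharge, replace `base + usage` with `max(base, usage)`; check the current pricing page before relying on either form.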

Should You Even Use a Dedicated Vector Store?

Most RAG prototypes reach for a vector store on day one. That's often wrong. A dedicated vector store adds infra, ops burden, and a consistency boundary. Before adding one, answer two questions: does your use case require semantic similarity that keyword search genuinely can't deliver, and do you have more than ~10K documents?

When full-text search is enough

PostgreSQL's `tsvector` + GIN index beats vector search on exact-term queries (product names, IDs, error codes), structured filters (date range, status, category), and corpora under 50K short documents. It also costs nothing extra if you're already running Postgres. Run both and measure recall@10 on your actual queries before committing to a vector store.
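
Measuring recall@10 for both approaches needs only a small labeled query set. The following is a minimal sketch, assuming you have already collected ranked document IDs from each backend; the helper names and the sample data are hypothetical, and recall is defined here as hits in the top k divided by the number of relevant documents.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall(runs: dict[str, list[str]], labels: dict[str, set[str]], k: int = 10) -> float:
    """Average recall@k over every labeled query; a missing run counts as zero."""
    scores = [recall_at_k(runs.get(query, []), rel, k) for query, rel in labels.items()]
    return sum(scores) / len(scores)


# Hypothetical labels: query -> IDs a human judged relevant.
labels = {"q1": {"d1", "d2"}, "q2": {"d7"}}

# Hypothetical ranked results from each backend for the same queries.
full_text_runs = {"q1": ["d1", "d9", "d2"], "q2": ["d3"]}
vector_runs = {"q1": ["d1", "d4"], "q2": ["d7", "d8"]}

print("full-text recall@10:", mean_recall(full_text_runs, labels))
print("vector    recall@10:", mean_recall(vector_runs, labels))
```

A few dozen real queries with hand-labeled relevant documents is usually enough to see whether the gap justifies the extra infrastructure.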

If your queries are genuinely semantic — 'find policies similar in intent to this clause' — and your corpus exceeds 50K documents, a vector store earns its place. Below that threshold, FAISS in-memory is often enough and costs zero infra.
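
To see why small corpora need no dedicated infrastructure, it helps to look at what an exact in-memory search does: score every vector against the query and keep the best k. The sketch below is a pure-Python stand-in for that idea, not FAISS's API; the function names and the toy 3-dimensional vectors are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) search: one similarity computation per stored vector."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in corpus.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy embeddings; real ones would come from an embedding model.
corpus = {
    "refund": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "returns": [0.8, 0.2, 0.1],
}

for doc_id, score in top_k([1.0, 0.0, 0.0], corpus, k=2):
    print(doc_id, round(score, 3))
```

The cost grows linearly with corpus size, which is fine at tens of thousands of vectors and is the point at which approximate indexes such as HNSW start to pay off.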

Need semantic search?
  • No: Postgres full-text (tsvector + GIN index)
  • Yes: Corpus > 500K vectors?
      • No: FAISS / Chroma (no infra needed)
      • Yes: Already using Postgres?
          • Yes: pgvector (no new infra)
          • No: Qdrant / Pinecone (managed or self-hosted)

Start with need — not store names
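
The decision flow above can also be written as a small function, which makes the order of the questions explicit. This is a sketch only: `choose_store` is a hypothetical helper, and the 500K cutoff is taken directly from the diagram rather than from any benchmark.

```python
def choose_store(needs_semantic: bool, num_vectors: int, uses_postgres: bool) -> str:
    """Walk the decision flow in order: need first, then scale, then existing infra."""
    if not needs_semantic:
        return "Postgres full-text (tsvector + GIN)"
    if num_vectors <= 500_000:
        return "FAISS / Chroma"
    if uses_postgres:
        return "pgvector"
    return "Qdrant / Pinecone"


print(choose_store(needs_semantic=False, num_vectors=2_000_000, uses_postgres=True))
print(choose_store(needs_semantic=True, num_vectors=80_000, uses_postgres=True))
print(choose_store(needs_semantic=True, num_vectors=2_000_000, uses_postgres=False))
```

Note that corpus size is never consulted when semantic search is not needed, which mirrors the diagram: the first question short-circuits everything else.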

The vectorize-everything trap

Teams sometimes embed their entire database because it's easy. The cost: every retrieval call now hits a vector index instead of a B-tree, latency rises from microseconds to milliseconds, and result quality drops for structured queries. Embed only the fields where semantic similarity matters — document bodies, support tickets, long-form descriptions. Never embed IDs, timestamps, or enumerated states.
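
One way to enforce this rule is to split each record before it reaches the embedding step: free-text fields become the string to embed, and everything else travels as metadata for ordinary filtering. The sketch below assumes a hypothetical support-ticket record; `EMBED_FIELDS` and `to_embedding_input` are illustrative names, not part of any library.

```python
# Hypothetical allowlist: only fields where semantic similarity matters.
EMBED_FIELDS = ("title", "body")


def to_embedding_input(record: dict, embed_fields: tuple = EMBED_FIELDS) -> tuple[str, dict]:
    """Return (text to embed, metadata to store alongside the vector)."""
    # Join the non-empty free-text fields in the order given by embed_fields.
    text = "\n\n".join(str(record[f]) for f in embed_fields if record.get(f))
    # IDs, timestamps, and enumerated states stay out of the embedding.
    metadata = {key: value for key, value in record.items() if key not in embed_fields}
    return text, metadata


ticket = {
    "id": "T-1042",
    "status": "open",
    "created_at": "2026-03-01",
    "title": "Refund not received",
    "body": "I returned the item two weeks ago.",
}

text, metadata = to_embedding_input(ticket)
print(text)
print(metadata)
```

Keeping the allowlist explicit means a new column added to the table is treated as metadata by default and is only embedded after someone decides it should be.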