RAG Architecture Deep Dive
A production RAG system splits into two pipelines: an offline indexing pipeline and an online query pipeline. This deep dive covers the components, the data flow, and when RAG is the right approach versus fine-tuning or long-context models.
Quick Reference
- RAG has two pipelines: indexing (offline, batch) and query (online, real-time)
- Indexing pipeline: load → split → embed → store in vector database
- Query pipeline: embed query → retrieve → (optional rerank) → generate answer
- RAG beats fine-tuning when data changes frequently or you need source attribution
- Long-context models reduce but don't eliminate the need for RAG — cost and latency still matter
- The retriever is the most critical component — bad retrieval guarantees bad answers
Two-Pipeline Architecture
Every production RAG system is actually two separate pipelines that share a vector store. The indexing pipeline runs offline (or on a schedule) and converts raw documents into searchable embeddings. The query pipeline runs in real-time and uses those embeddings to find relevant context before generating an answer. Understanding this separation is fundamental — the indexing pipeline is a data engineering problem, while the query pipeline is an inference-time optimization problem.
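The shared vector store at the center of both pipelines can be sketched as a minimal in-memory class. This is a stand-in for a real store such as FAISS, pgvector, or Pinecone; the class and method names here are illustrative, not any particular library's API:

```python
import math


class InMemoryVectorStore:
    """Toy vector store shared by the indexing and query pipelines.

    The indexing pipeline calls `add`; the query pipeline calls `search`.
    A real deployment would swap this for a dedicated vector database.
    """

    def __init__(self):
        self._records = []  # list of (embedding, chunk_text) pairs

    def add(self, embedding, text):
        # Write path, used by the offline indexing pipeline.
        self._records.append((embedding, text))

    def search(self, query_embedding, k=3):
        # Read path, used by the online query pipeline: brute-force
        # cosine similarity over every stored record.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(
            self._records,
            key=lambda rec: cosine(query_embedding, rec[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

Because both pipelines touch only this one interface, each side can be rebuilt or scaled independently — the core point of the two-pipeline separation.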
Indexing pipeline: Document Loaders → Text Splitters → Embedding Model → Vector Store. This pipeline runs when new documents arrive. It is batch-oriented, can be slow, and is optimized for throughput. You run it once per document, not once per query.
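The load → split → embed → store flow can be sketched as follows. The character-bucket embedding is a deterministic toy standing in for a real embedding model, and all function names here are hypothetical:

```python
def toy_embed(text, dims=8):
    # Deterministic stand-in for a real embedding model: counts
    # characters into `dims` buckets. Not semantically meaningful.
    vec = [0.0] * dims
    for ch in text.lower():
        vec[ord(ch) % dims] += 1.0
    return vec


def split_document(text, chunk_size=200, overlap=20):
    # Fixed-size character chunks with overlap; real splitters usually
    # respect sentence and paragraph boundaries instead.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def index_documents(docs, store):
    # Batch-oriented: runs once per document, not once per query.
    # `store` is any collection of (embedding, chunk_text) records.
    for doc in docs:
        for chunk in split_document(doc):
            store.append((toy_embed(chunk), chunk))
```

In production this loop is where the embedding-API rate limit bites, so it is typically batched and retried rather than called chunk by chunk.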
Query pipeline: User Query → Query Embedding → Vector Search → (Optional: Rerank) → Context Assembly → LLM Generation → Answer. This pipeline runs on every user request. It must be fast (under 2 seconds end to end) and is optimized for latency and relevance.
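The query-side flow, sketched with a stubbed LLM call and a brute-force cosine search standing in for a real vector database and model API (every name here is hypothetical):

```python
import math


def toy_embed(text, dims=8):
    # Deterministic stand-in for a real embedding model.
    vec = [0.0] * dims
    for ch in text.lower():
        vec[ord(ch) % dims] += 1.0
    return vec


def call_llm(prompt):
    # Stub: a production system would call a model API here.
    return "ANSWER BASED ON:\n" + prompt


def answer_query(query, store, top_k=3):
    # Real-time path: one embedding call, one vector search, one
    # LLM call. `store` holds (embedding, chunk_text) records.
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    q_vec = toy_embed(query)
    hits = sorted(store, key=lambda r: cosine(q_vec, r[0]), reverse=True)[:top_k]
    context = "\n\n".join(text for _, text in hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```

Note that the retrieval step runs before the LLM ever sees the query: if the wrong chunks come back here, no amount of prompt engineering downstream can recover the right answer.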
| Aspect | Indexing Pipeline | Query Pipeline |
|---|---|---|
| Runs | Offline / scheduled | Real-time per request |
| Optimized for | Throughput | Latency |
| Bottleneck | Embedding API rate limits | Vector search + LLM generation |
| Failure impact | Stale or missing data | Wrong or no answer |
| Cost driver | Embedding tokens | LLM tokens + vector queries |
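The asymmetry in the cost-driver row can be made concrete with back-of-envelope arithmetic. All prices and volumes below are assumed, purely for illustration:

```python
# All figures are hypothetical, not real pricing.
EMBED_PRICE_PER_1K = 0.0001   # $/1K embedding tokens (assumed)
LLM_PRICE_PER_1K = 0.002      # $/1K LLM tokens (assumed)

# Indexing: pay once per corpus (plus re-embeds on updates).
corpus_tokens = 10_000_000
indexing_cost = corpus_tokens / 1000 * EMBED_PRICE_PER_1K

# Query pipeline: pay on every request, every month.
queries_per_month = 100_000
tokens_per_query = 3_000      # retrieved context + prompt + answer
monthly_query_cost = queries_per_month * tokens_per_query / 1000 * LLM_PRICE_PER_1K

print(f"Indexing (one-time):      ${indexing_cost:,.2f}")
print(f"Query pipeline (monthly): ${monthly_query_cost:,.2f}")
```

Under these assumed numbers the recurring query-side LLM spend dwarfs the one-time embedding spend, which is why query-pipeline optimizations (shorter contexts, caching, cheaper rerankers) usually pay off first.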