Advanced RAG

Deep dive into retrieval-augmented generation: chunking strategies, hybrid search, re-ranking, graph RAG, and production RAG pipelines.

RAG Architecture Deep Dive

Two-pipeline architecture for RAG: the offline indexing pipeline and the online query pipeline. Components, data flow, and when RAG is the right approach vs fine-tuning or long context.

intermediate · 13 min
Chunking Strategies

Fixed-size, recursive, semantic, and document-aware chunking strategies. How chunk size affects retrieval quality, and how to choose the right approach for your data.

intermediate · 11 min
Embedding Models Compared

Comparing OpenAI, Cohere, and open-source embedding models for RAG. Dimensions, pricing, MTEB benchmarks, and Matryoshka embeddings for cost optimization.

intermediate · 11 min
Vector Database Selection

Comparing Pinecone, Weaviate, pgvector, Qdrant, and Chroma for production RAG. Features, pricing, scaling characteristics, and when to use each.

intermediate · 12 min
Hybrid Search

Combining keyword search (BM25) with semantic vector search for superior retrieval. Reciprocal Rank Fusion, weighted scoring, and when keyword search beats embeddings.

intermediate · 11 min
Re-Ranking

Using cross-encoder re-rankers to improve retrieval precision. Cohere Rerank, ColBERT, open-source re-rankers, and the cost/latency tradeoff of adding a reranking stage.

advanced · 10 min
Query Transformation

Techniques to improve retrieval by transforming user queries before search: HyDE, multi-query expansion, step-back prompting, and query decomposition.

advanced · 10 min
Metadata Filtering & Pre-Retrieval

Using metadata to narrow search scope before vector similarity. Attaching metadata during indexing, pre-filtering, self-querying retrievers, and combining filters with semantic search.

advanced · 10 min
Multi-Hop Retrieval

Handling questions that require combining information from multiple documents. Iterative retrieval, query decomposition into retrieval steps, and LangGraph-based multi-hop patterns.

advanced · 11 min
Graph RAG

Knowledge graphs for RAG: structured relationships vs semantic similarity, Microsoft's GraphRAG approach, building knowledge graphs from documents, and combining graph traversal with vector search.

advanced · 12 min
Agentic RAG

Moving from static RAG pipelines to agent-driven retrieval. The agent decides what to retrieve, when, from which source, and evaluates retrieval quality with self-reflection.

advanced · 11 min
Conversational RAG

Handling multi-turn conversations in RAG: resolving follow-up questions, history-aware retrieval, coreference resolution, and context window management across turns.

advanced · 10 min
Multimodal RAG

RAG beyond text: indexing images, tables, and diagrams from documents. PDF processing, multi-vector retrieval, and using vision models for table and image understanding.

advanced · 10 min
Self-Corrective RAG: Grade, Rewrite, Re-Retrieve

Corrective RAG adds document grading and question rewriting to the retrieval loop — if retrieved documents don't answer the question, the system rewrites the query and tries again.

advanced · 11 min
Router-Based RAG: Multi-Source Knowledge

Route queries to different retrieval sources based on classification — vector stores, SQL databases, APIs, or specialized indexes — for optimal answers from the right source.

advanced · 9 min
Ingestion Pipelines

Building production ingestion pipelines for RAG: batch vs streaming, incremental updates, change detection, and pipeline orchestration with Airflow and Prefect.

advanced · 12 min
Evaluating RAG Systems

Measuring RAG quality systematically: retrieval metrics (precision, recall, MRR, NDCG), generation metrics (faithfulness, relevance), the RAGAS framework, and building golden evaluation datasets.

advanced · 11 min
Debugging Retrieval Failures

Systematic approach to diagnosing RAG failures: is it a retrieval problem or a generation problem? Common failure modes, debugging toolkit, and fixing the most frequent issues.

advanced · 11 min
Cost & Latency Optimization

Reducing RAG costs and latency in production: embedding caching, dimensionality reduction, vector quantization, context stuffing strategies, and model tiering.

advanced · 10 min
Fine-Tuning Embeddings

Fine-tune embedding models on your domain data, which can improve retrieval quality by 10-25%, using contrastive learning with query-document pairs from your actual search logs.

advanced · 9 min
Matryoshka & Variable-Dimension Embeddings

Matryoshka embeddings let you truncate vectors to a shorter prefix, trading some quality for lower storage and faster search. Use 256 dims for fast filtering, 1536 dims for final ranking.

intermediate · 7 min
Bi-Encoder vs Cross-Encoder vs ColBERT

Three architectures for semantic retrieval: bi-encoders for fast search, cross-encoders for precise reranking, and ColBERT's late interaction as a middle ground between the two. Understand when to use each.

advanced · 9 min