
Design an Enterprise RAG System

A hellointerview-style system design deep dive into enterprise RAG systems like Glean, Notion AI, and Confluence AI. Covers requirements, core entities, the retrieval-augmented generation pipeline, and three production deep dives: ingestion and chunking strategies, access control at retrieval time, and multi-hop retrieval with agentic self-correction. Each deep dive walks through naive, better, and production-grade approaches with trade-offs.

Quick Reference

  • Access control is THE critical enterprise requirement — if a document is restricted, the AI must not leak its contents in any form
  • Hybrid search (dense embeddings plus sparse BM25) outperforms either alone by 10-15 percent on enterprise benchmarks
  • Document-structure-aware chunking (respect headings, tables, code blocks) produces far better retrieval than fixed-size token splits
  • Pre-filter access control (apply ACL during vector search) is preferred over post-filter because the LLM never sees restricted content
  • Multi-hop retrieval chains results across documents — but only 20-30 percent of queries actually need it
  • 10M-plus documents require incremental indexing with change detection — full re-index is too slow and expensive
  • Stale and contradictory information across document versions is the hardest quality problem in enterprise RAG
  • Every answer must include citations linking to source documents so users can verify and follow up

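To make the hybrid-search bullet concrete: one common way to combine a dense (embedding) ranking with a sparse (BM25) ranking is reciprocal rank fusion. This is a minimal sketch, not necessarily what Glean or Notion AI use in production; the document IDs and the `k=60` constant are illustrative.

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Reciprocal Rank Fusion over two ranked lists of doc IDs.

    Each document scores sum(1 / (k + rank)) across the rankings it
    appears in, so documents ranked well by BOTH retrievers rise to
    the top. k=60 is the constant commonly used in the literature.
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # ranked by embedding similarity
sparse = ["d1", "d9", "d3"]  # ranked by BM25
print(rrf_fuse(dense, sparse))  # "d1" first: near the top of both lists
```

Because fusion happens on ranks rather than raw scores, there is no need to normalize cosine similarities against BM25 scores, which live on incompatible scales.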
Understanding the Problem

An enterprise RAG system answers employee questions by searching across millions of internal documents — Confluence pages, Google Docs, Slack messages, PDFs, emails, code repositories, and knowledge bases. It retrieves relevant information, synthesizes it into a coherent answer, and cites its sources.

This is not a general-purpose search engine. It operates within an organization where access control is the defining constraint: if an employee does not have permission to view a document, the RAG system must not leak any information from that document — not in the answer, not in the citations, not even in the count of results found. Products like Glean, Notion AI, Confluence AI, and Microsoft Copilot have made enterprise RAG a mainstream product category.

From a system design perspective, this is a rich problem because it touches document ingestion (handling wildly different formats from dozens of sources), chunking strategies (how you split documents determines retrieval quality), access control enforcement (the critical enterprise requirement that consumer RAG products do not face), and multi-hop reasoning (questions that span multiple documents require iterative retrieval). The trade-offs are sharp: chunk too small and you lose context, chunk too large and you dilute relevance, enforce access control too loosely and you leak confidential information, enforce it too strictly and you reduce answer quality.
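The chunking trade-off above is easiest to see in code. Below is a deliberately simplified structure-aware splitter: it cuts on markdown headings, never splits inside a code fence, and prefixes each chunk with its heading so retrieval sees context rather than an orphaned paragraph. A production splitter would also handle tables, nested heading paths, and oversized sections; the function name and heuristics here are illustrative.

```python
import re

def chunk_markdown(doc):
    """Split a markdown document on headings, keeping code fences intact.

    Each chunk is prefixed with its section heading so the retriever
    can match "Rollback\nRun the failover script..." rather than a
    bare paragraph with no context.
    """
    chunks, current, heading = [], [], "Document"
    in_code = False
    for line in doc.splitlines():
        if line.startswith("```"):
            in_code = not in_code  # toggle: never treat '#' inside code as a heading
        if not in_code and re.match(r"#{1,6} ", line):
            if current:
                chunks.append(heading + "\n" + "\n".join(current))
            heading, current = line.lstrip("# "), []
        else:
            current.append(line)
    if current:
        chunks.append(heading + "\n" + "\n".join(current))
    # Oversized sections would be split further (e.g., by paragraph) here.
    return [c for c in chunks if c.strip()]
```

Compare this with a fixed-size token split, which would happily cut a table in half or detach a code sample from the sentence explaining it.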

Real project

Glean indexes across 100-plus enterprise applications and enforces access control at retrieval time for organizations with 10,000-plus employees. Notion AI provides Q&A over workspace content with page-level permission enforcement. Confluence AI answers questions across knowledge base articles, respecting space and page restrictions. The key insight from all deployments: access control enforcement is non-negotiable in enterprise — a single incident where the AI reveals information from a restricted document (layoff plans, salary data, acquisition details) can shut down the entire deployment.

The Core Framing

This is fundamentally about building a system that finds relevant information across millions of diverse documents while strictly enforcing who can see what. The three hardest sub-problems are: (1) chunking documents in a way that preserves meaning and context for retrieval, (2) enforcing access control at the retrieval level so the LLM never sees restricted content, and (3) answering questions that require synthesizing information from multiple documents through iterative retrieval.
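Sub-problem (2) — pre-filter access control — can be sketched in a few lines. The point is ordering: the ACL check runs before similarity ranking, so restricted chunks never enter the candidate set, and neither the answer nor the result count can leak them. The `acl` group sets, document IDs, and brute-force dot-product scoring below are stand-ins for a real vector index with metadata filtering.

```python
def dot(a, b):
    """Plain dot product, standing in for a vector index's similarity."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, index, user_groups, top_k=5):
    """Pre-filter retrieval: drop chunks the user cannot see BEFORE
    ranking, so the LLM never receives restricted content."""
    visible = [c for c in index if c["acl"] & user_groups]
    return sorted(visible,
                  key=lambda c: dot(query_vec, c["vec"]),
                  reverse=True)[:top_k]

index = [
    {"id": "handbook", "acl": {"all-employees"}, "vec": [0.9, 0.1]},
    {"id": "salaries", "acl": {"hr-only"},       "vec": [1.0, 0.0]},
]
hits = retrieve([1.0, 0.0], index, user_groups={"all-employees"})
# Only the handbook comes back, even though "salaries" scores higher.
```

A post-filter design (rank first, drop restricted hits afterward) looks equivalent but is not: the restricted document still consumed a result slot, and anything downstream that logs or counts candidates has already seen it.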