Design an AI Search Engine
A hellointerview-style system design deep dive into AI-powered search engines like Perplexity, SearchGPT, and Google AI Overviews. Covers requirements, core entities, the search-to-synthesis pipeline, and three production deep dives: query decomposition and tool routing, retrieval and reranking pipelines, and citation-grounded synthesis with source verification. Each deep dive walks through naive, better, and production-grade approaches with trade-offs.
Quick Reference
- The core pipeline is: query understanding, search planning, parallel retrieval, reranking, and grounded synthesis with inline citations
- Complex queries are decomposed into 3-5 sub-queries executed in parallel — this is the single biggest latency optimization
- Every factual claim must link to a verifiable source — hallucinated citations destroy user trust faster than wrong answers
- Cross-encoder reranking after initial retrieval improves answer quality by 20-30 percent over embedding similarity alone
- Stream the answer token-by-token so users see results in 2-3 seconds even though full generation takes 10-15 seconds
- Tool routing dispatches non-web queries (math, code, real-time data) to specialized backends instead of forcing everything through web search
- Semantic caching with embedding similarity matching eliminates redundant computation for repeated and near-duplicate queries
- Diversity enforcement in the reranking stage prevents the answer from being dominated by a single source perspective
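The pipeline steps above can be sketched end-to-end. Everything in this sketch is illustrative: `decompose`, `search`, `rerank`, and `synthesize` are stubs standing in for an LLM call, a web search API, and a cross-encoder model, and the URLs are fabricated placeholders.

```python
import asyncio

async def decompose(query: str) -> list[str]:
    # In production an LLM emits 3-5 sub-queries; here we fake one split.
    return [query, f"{query} recent developments"]

async def search(sub_query: str) -> list[dict]:
    # Stub for a web search API call; returns (url, passage) candidates.
    return [{"url": f"https://example.com/{hash(sub_query) % 100}",
             "passage": f"passage about {sub_query}"}]

def rerank(query: str, candidates: list[dict], top_k: int = 8) -> list[dict]:
    # Stub for cross-encoder scoring; dedupe by URL as a crude form of
    # the diversity enforcement described above.
    seen, ranked = set(), []
    for c in candidates:
        if c["url"] not in seen:
            seen.add(c["url"])
            ranked.append(c)
    return ranked[:top_k]

def synthesize(query: str, sources: list[dict]) -> str:
    # Stub for grounded generation: every claim carries an inline [n]
    # citation pointing back into the source list.
    claims = [f"{s['passage']} [{i + 1}]" for i, s in enumerate(sources)]
    return " ".join(claims)

async def answer(query: str) -> str:
    sub_queries = await decompose(query)
    # Parallel retrieval across sub-queries: the biggest latency win.
    results = await asyncio.gather(*(search(sq) for sq in sub_queries))
    merged = [c for batch in results for c in batch]
    top = rerank(query, merged)
    return synthesize(query, top)

print(asyncio.run(answer("what is retrieval-augmented generation")))
```

The key structural choice is that only retrieval is fanned out in parallel; decomposition and synthesis stay sequential because each depends on the previous stage's full output.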
Understanding the Problem
An AI search engine receives a natural language query, searches the web for relevant sources, retrieves and ranks documents, and synthesizes a comprehensive answer with inline citations. Unlike traditional search engines that return a ranked list of links, this system returns a direct answer grounded in real sources — the user gets the information they need without clicking through ten blue links and reading five articles. Products like Perplexity, SearchGPT, and Google AI Overviews have made this a mainstream product category, processing hundreds of millions of queries daily.

From a system design perspective, this is a rich problem because it touches query understanding (decomposing complex questions into searchable sub-queries), information retrieval (finding the most relevant content from billions of web pages), synthesis (generating coherent answers from multiple sources), and trust (ensuring every claim is verifiably grounded in a cited source). The trade-offs are sharp: too few sources and the answer is incomplete; too many and the synthesis is slow and expensive; bad citations collapse user trust; a slow response sends users back to traditional search.
Perplexity Pro Search processes millions of queries daily using a multi-step research pipeline that decomposes complex queries into sub-queries and executes parallel searches before synthesizing. Google AI Overviews integrates directly with Google's search index, giving it access to the most comprehensive web corpus but facing the challenge of synthesizing from an overwhelming number of candidates. SearchGPT combines OpenAI's language models with web browsing capabilities, emphasizing conversational follow-up and multi-turn research sessions. All three demonstrate the same fundamental lesson: the quality of retrieved sources determines the quality of the synthesized answer — garbage in, garbage out.
This is fundamentally about building a system that retrieves the right information from the web, ranks it precisely, and synthesizes it into a trustworthy answer with verifiable citations. The three hardest sub-problems are: (1) decomposing complex queries into effective search strategies that cover all aspects of the question, (2) selecting the most relevant passages from dozens of candidate documents without losing diversity, and (3) generating answers where every factual claim is genuinely supported by a cited source rather than hallucinated.
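Sub-problem (3) is often attacked with a post-generation verification pass: split the answer into (claim, citation) pairs and check each claim against the passage it cites. The word-overlap score below is a deliberately crude stand-in for the entailment (NLI) model a production system would use; the function name and threshold are hypothetical.

```python
import re

def verify_citations(answer: str, sources: dict[int, str]) -> list[tuple[str, int, bool]]:
    """Extract claims ending in an inline [n] marker and flag each as
    supported or not, based on word overlap with the cited passage."""
    results = []
    for claim, num in re.findall(r"([^.\[\]]+)\[(\d+)\]", answer):
        idx = int(num)
        passage = sources.get(idx, "")
        claim_words = set(claim.lower().split())
        passage_words = set(passage.lower().split())
        # Fraction of claim words found in the cited passage; a real
        # system would run an entailment model here instead.
        overlap = len(claim_words & passage_words) / max(len(claim_words), 1)
        results.append((claim.strip(), idx, overlap >= 0.5))
    return results

sources = {1: "The reranker uses a cross-encoder model",
           2: "Latency is dominated by generation"}
generated = "The reranker uses a cross-encoder [1]. Cats are purple [2]."
for claim, idx, ok in verify_citations(generated, sources):
    print(f"[{idx}] supported={ok}: {claim}")
```

Unsupported claims can then be dropped, regenerated, or flagged in the UI — the point is that citation integrity is checked mechanically rather than trusted to the generator.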