Advanced · 30 min

Design an AI Data Analyst Agent

A hellointerview-style system design deep dive into AI data analyst agents like ChatGPT Code Interpreter, Julius AI, and Databricks Genie. The agent translates natural language questions into SQL and Python, executes code against real databases in a sandbox, generates visualizations, and produces narrative insights. Covers requirements, core entities, the analysis loop, and three production deep dives: schema understanding and query generation, sandboxed code execution, and visualization and insight generation. Each deep dive walks through naive, better, and production-grade approaches with trade-offs.

Quick Reference

  • The agent writes and executes code against real data — read-only database access and sandboxed execution are non-negotiable safety requirements
  • Schema understanding uses adaptive retrieval with sample rows and column statistics to help the LLM write correct queries on unfamiliar databases
  • Self-correction by feeding SQL errors back to the LLM pushes accuracy from roughly 75 percent to over 90 percent within 2-3 retries
  • Sandboxed execution uses stateful notebook environments with read-only DB access and session persistence across conversation turns
  • Visualization uses narrative insights with interactive charts and follow-up suggestions to make analysis actionable for non-technical stakeholders
  • Multi-turn session state is critical — analysis builds on previous findings and re-running expensive queries wastes time and money
  • The iterative analysis loop (question, code generation, execution, interpretation, follow-up hypothesis) mirrors how a human analyst works
  • Auto-chart type selection based on data characteristics (time series gets line charts, category comparisons get bar charts) prevents misleading visualizations
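The self-correction bullet above can be sketched as a small retry loop: execute the generated SQL, and on failure feed the error message back to the model for a corrected attempt. This is a minimal illustration; `generate_sql` and `execute_sql` are hypothetical stand-ins for the LLM call and the database client.

```python
# Self-correction sketch: on a SQL error, hand the error text back to the
# LLM so it can fix column names, syntax, etc. (names are illustrative).

MAX_RETRIES = 3

def run_with_self_correction(question, generate_sql, execute_sql):
    """generate_sql(question, error) -> SQL string; execute_sql(sql) -> rows,
    raising an exception on failure."""
    error = None
    for attempt in range(MAX_RETRIES):
        sql = generate_sql(question, error)  # error is None on the first attempt
        try:
            return execute_sql(sql)
        except Exception as exc:
            error = str(exc)  # fed back so the next attempt can correct itself
    raise RuntimeError(f"query failed after {MAX_RETRIES} attempts: {error}")
```

The key design choice is that the error string itself is the feedback signal: "no such column: revnue" is usually enough context for the model to repair its own query, which is what drives the roughly 75 to over 90 percent accuracy jump within 2-3 retries.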

Understanding the Problem

An AI data analyst agent is a system that receives analytical questions in natural language — 'What were our top 10 customers by revenue last quarter?' or 'Show me the monthly churn trend broken down by pricing tier' — and autonomously produces answers by writing and executing code against real databases and datasets. It generates SQL queries, runs Python for statistical analysis, creates visualizations, and explains findings in plain language.

This is not a simple text-to-SQL translator. This is a full analytical assistant that conducts multi-step investigations, builds on previous findings across conversation turns, and produces the kind of output a human data analyst would deliver: clean charts, narrative insights, and suggested follow-up questions. Products like ChatGPT Code Interpreter, Julius AI, and Databricks Genie have made this a mainstream product category.

From a system design perspective, this is a rich problem because it touches schema understanding (the agent must write correct queries against unfamiliar databases), execution safety (the agent runs arbitrary code against production data), state management (analysis is conversational and builds across turns), and presentation (raw query results must become actionable insights). The trade-offs are sharp: too little schema context and the agent writes wrong queries, too much freedom in execution and it can damage databases, too little session state and it re-runs expensive queries unnecessarily.
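The multi-step investigation described above can be sketched as a loop that alternates between generating code, executing it, and interpreting the result until the model judges the question answered. All the callables here are hypothetical stand-ins for the LLM planner, the sandbox, and the summarizer, not a fixed API.

```python
# Iterative analysis loop sketch: question -> code -> execution -> finding,
# repeated until the planner decides it has enough to answer.

def analysis_loop(question, plan_step, execute, interpret, max_steps=5):
    """plan_step(question, findings) -> next code to run, or None when done;
    execute(code) -> raw result; interpret(result) -> finding string."""
    findings = []
    for _ in range(max_steps):
        code = plan_step(question, findings)  # LLM picks the next query/script
        if code is None:                      # planner judges the question answered
            break
        result = execute(code)                # sandboxed, read-only execution
        findings.append(interpret(result))    # summary feeds the next planning step
    return findings
```

The accumulated `findings` list is what makes this conversational: each step plans against everything learned so far, mirroring how a human analyst forms a follow-up hypothesis from the previous result.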

Real project

ChatGPT Code Interpreter pioneered the pattern of an LLM writing and executing Python in a sandboxed Jupyter environment, with access to uploaded files and the ability to generate charts and download results. Databricks Genie provides natural language access to enterprise data warehouses, using table metadata and column statistics to generate accurate SQL. Julius AI focuses on uploaded CSV and Excel analysis with automatic visualization. Each product makes different trade-offs on data connectivity, execution environment, and visualization capabilities, but all share the same fundamental architecture: an iterative loop of question understanding, code generation, sandboxed execution, and result interpretation.

The Core Framing

This is fundamentally about building a system that can understand database schemas, generate correct analytical code, execute it safely against real data, and present results as actionable insights. The three hardest sub-problems are: (1) writing correct queries against schemas the agent has never seen before, (2) executing arbitrary code safely without risking production databases, and (3) generating meaningful visualizations and narrative insights rather than dumping raw tables.
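For sub-problem (3), one common pattern is to pick the chart type from the shape of the result set rather than asking the model to choose freely. The rules below are an illustrative heuristic, not a fixed standard: a time axis gets a line chart, a category-versus-measure result gets a bar chart, two numeric columns get a scatter plot.

```python
# Illustrative auto chart-type selection from data characteristics.
import datetime

def pick_chart_type(columns, sample_row):
    """columns: list of column names; sample_row: one result row aligned to columns."""
    types = [type(v) for v in sample_row]
    if any(t in (datetime.date, datetime.datetime) for t in types):
        return "line"       # a time axis almost always wants a trend line
    numeric = [t in (int, float) for t in types]
    if sum(numeric) >= 2:
        return "scatter"    # two measures: show the relationship
    if str in types and any(numeric):
        return "bar"        # category vs. measure: compare magnitudes
    return "table"          # fall back to raw rows when nothing fits
```

Deterministic rules like these are what prevent the misleading defaults an unconstrained model can produce, such as a pie chart for a monthly trend.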