AI Engineering Judgment/When (Not) to Use AI
Intermediate11 min

Text-to-SQL & Structured Data

Natural language to SQL is one of the highest-value AI applications — and one of the hardest to get right. Learn the architecture, schema injection strategies, query validation, error recovery, and when to use code generation instead of SQL.

Quick Reference

  • Text-to-SQL accuracy depends more on schema descriptions than on the model — invest in metadata
  • Always validate generated SQL before execution: check for injection, expensive queries, and forbidden operations
  • Include table schemas, column descriptions, and 2-3 example queries in the prompt for best results
  • Error recovery loop (generate → execute → fix → retry) improves success rate from ~70% to ~90%
  • For complex analytics, consider generating Python/pandas code instead of SQL — it handles multi-step logic better
  • Never give the LLM write access to production databases — use read-only replicas with row limits

Text-to-SQL Architecture

A text-to-SQL system translates natural language questions into SQL queries against your database. The basic flow is: user question → schema injection → LLM generates SQL → validation → execution → format results. Each stage has failure modes that you need to handle explicitly.

Text-to-SQL Is Hard

Academic benchmarks (Spider, BIRD) report 70-85% accuracy on curated datasets. In production, with messy schemas, ambiguous questions, and complex joins, accuracy drops to 50-70% without careful engineering. The gap is closed by schema descriptions, few-shot examples, and error recovery — not by using a bigger model.

ComponentPurposeCommon Failure
Schema injectionTell the model about your tables and columnsToo many tables overwhelm the context; too few miss relevant tables
Query generationLLM writes the SQLWrong JOINs, incorrect column names, hallucinated tables
ValidationCheck the SQL before running itMissing validation allows destructive or expensive queries
ExecutionRun the SQL against the databaseTimeouts, permission errors, result set too large
Error recoveryFix and retry failed queriesNo recovery loop means 30% of queries just fail
Result formattingPresent results to the userRaw table output is hard to interpret for non-technical users
Core text-to-SQL pipeline with all stages