Advanced · 12 min

AI for Code Generation

Building coding assistants that generate, execute, test, and iterate on code safely. Learn execution sandboxing, validation loops, context management strategies, and where code generation helps versus where it introduces dangerous complexity.

Quick Reference

  • Never execute LLM-generated code in the same process or on the same machine as your production service
  • The generate → test → fix → verify loop is the core pattern for reliable code generation
  • Context selection (which files to include in the prompt) often has more impact on output quality than model choice
  • Sandboxing options: Docker containers, E2B, AWS Lambda, or WebAssembly runtimes
  • Code generation is most valuable for boilerplate, tests, and data transformations — least valuable for complex business logic
  • Always include existing tests and type definitions in context — they constrain the output space dramatically
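
The isolation rule in the first bullet can be sketched with the Docker option from the list. This is a minimal illustration, not a hardened sandbox: the image name, resource limits, the `/work` mount point, and the helper names `build_sandbox_command` and `run_in_sandbox` are all assumptions, and the sketch presumes it runs on a dedicated worker host rather than on the production machine.

```python
import subprocess
from pathlib import Path


def build_sandbox_command(workdir: Path, image: str = "python:3.12-slim",
                          entrypoint: str = "main.py") -> list[str]:
    """Build a `docker run` command that executes generated code with tight limits.

    The image, limits, and mount point are illustrative defaults, not recommendations.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no network access
        "--memory", "256m",                      # cap memory
        "--cpus", "0.5",                         # cap CPU time share
        "--pids-limit", "64",                    # bound the number of processes
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "-v", f"{workdir.resolve()}:/work:ro",   # mount generated code read-only
        "-w", "/work",
        image,
        "python", entrypoint,
    ]


def run_in_sandbox(workdir: Path, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the generated code in the container and capture its output.

    Raises subprocess.TimeoutExpired when the limit is hit. The timeout stops
    the docker client process; the container itself may need separate cleanup.
    """
    return subprocess.run(
        build_sandbox_command(workdir),
        capture_output=True, text=True, timeout=timeout_s,
    )
```

Keeping command construction separate from execution makes the isolation flags easy to review and to unit test without a Docker daemon.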

Code Generation Architecture

A production code generation system has four stages: context assembly (which files and documentation to include), generation (the LLM call), execution and validation (run the code, run the tests), and iteration (if tests fail, feed errors back and retry). Skipping any stage results in unreliable output.
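
The four stages can be sketched as a small retry loop. This is one possible shape under stated assumptions: `generate` is a hypothetical callable wrapping whatever LLM client is in use, `context` is the already assembled prompt context, and the file names `candidate.py` and `test_candidate.py` are invented for the example. The test runner uses a plain child process to keep the sketch short, which is not a sandbox.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Attempt:
    code: str
    passed: bool
    output: str


def run_tests(code: str, test_code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Write the candidate and its tests to a temp dir and run the tests.

    NOTE: a bare subprocess is NOT isolation; use a sandboxed runner in practice.
    """
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "candidate.py").write_text(code)
        Path(tmp, "test_candidate.py").write_text(test_code)
        try:
            proc = subprocess.run(
                [sys.executable, "test_candidate.py"],
                cwd=tmp, capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s}s"
        return proc.returncode == 0, proc.stdout + proc.stderr


def generate_until_green(task: str, context: str, test_code: str,
                         generate: Callable[[str], str],
                         max_attempts: int = 3) -> list[Attempt]:
    """Generate code, run the tests, and feed failures back until they pass."""
    attempts: list[Attempt] = []
    feedback = ""
    for _ in range(max_attempts):
        prompt = f"{context}\n\nTask: {task}\n{feedback}"   # context + task + last errors
        code = generate(prompt)                             # generation
        passed, output = run_tests(code, test_code)         # execution and validation
        attempts.append(Attempt(code, passed, output))
        if passed:
            break
        # Iteration: include the failure output in the next prompt.
        feedback = f"\nThe previous attempt failed with:\n{output}\nFix the code."
    return attempts
```

The caller checks `attempts[-1].passed` to decide whether to accept the result. Bounding the loop with `max_attempts` keeps a model that cannot solve the task from retrying indefinitely.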

The Trust Gradient

Not all generated code deserves the same level of trust. A generated unit test is low risk: if it is wrong, it fails or checks the wrong thing, and nothing in production is harmed. A generated database migration is high risk: if it is wrong, you can lose data. Match your validation strategy to the risk level of the generated code.

| Code Type | Risk Level | Validation Strategy | Automation Level |
|---|---|---|---|
| Unit tests | Low | Run the tests; if they pass, they are probably correct | Fully automated |
| Data transformations | Low-Medium | Run on sample data, compare with expected output | Automated with spot checks |
| API endpoints | Medium | Generate tests alongside, run integration tests | Semi-automated |
| Database queries | Medium | EXPLAIN plan review, read-only execution | Semi-automated |
| Infrastructure code | High | Plan/dry-run only, human review required | Human-in-the-loop |
| Database migrations | Critical | Never auto-execute, always human review | Manual only |
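
One way to enforce the trust gradient is to encode the table as a policy lookup that the pipeline consults before running anything. The sketch below is an assumption-laden illustration: the code type keys, the `Policy` and `Automation` names, the rule that only the two most automated levels may run without human approval, and the strict fallback for unknown types are all choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Automation(Enum):
    FULL = "fully automated"
    SPOT_CHECKED = "automated with spot checks"
    SEMI = "semi-automated"
    HUMAN_IN_LOOP = "human-in-the-loop"
    MANUAL = "manual only"


@dataclass(frozen=True)
class Policy:
    risk: str
    automation: Automation

    @property
    def may_auto_execute(self) -> bool:
        # Assumption of this sketch: only these two levels skip human approval.
        return self.automation in (Automation.FULL, Automation.SPOT_CHECKED)


# Keys are hypothetical identifiers for the code types in the table.
POLICIES: dict[str, Policy] = {
    "unit_test": Policy("low", Automation.FULL),
    "data_transformation": Policy("low-medium", Automation.SPOT_CHECKED),
    "api_endpoint": Policy("medium", Automation.SEMI),
    "database_query": Policy("medium", Automation.SEMI),
    "infrastructure": Policy("high", Automation.HUMAN_IN_LOOP),
    "database_migration": Policy("critical", Automation.MANUAL),
}


def policy_for(code_type: str) -> Policy:
    """Look up the policy; unknown code types get the strictest one."""
    return POLICIES.get(code_type, Policy("critical", Automation.MANUAL))
```

Defaulting unknown code types to manual handling means a new category has to be explicitly classified before it gains any automation.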