
Code Execution Agents

Building agents that generate, execute, and iterate on code: the REPL pattern, sandboxing with Docker and E2B, validation through type checking and tests, and data analysis agents.

Quick Reference

  • The REPL pattern: generate code → execute in sandbox → observe output → iterate until correct
  • Always sandbox code execution: Docker containers, E2B, or Pyodide — never run LLM-generated code on your host
  • Validation pipeline: syntax check → type check → lint → execute → verify output — catch errors early and cheaply
  • Data analysis agents: generate pandas/SQL code, execute it, and iterate based on the results
  • Limit execution time (30s default), memory (512MB), and network access — prevent resource exhaustion
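The limits in the last bullet can be enforced at the OS level before a real sandbox is even involved. A minimal Unix-only sketch using `resource.setrlimit` applied to a child process (names like `run_limited` are illustrative; a production deployment would still need Docker or E2B for filesystem and network isolation, which rlimits alone don't provide):

```python
import resource
import subprocess
import sys

MEM_BYTES = 512 * 1024 * 1024  # 512 MB address-space cap
CPU_SECONDS = 30               # CPU-time cap

def _apply_limits() -> None:
    # Runs inside the child process just before exec (Unix only).
    resource.setrlimit(resource.RLIMIT_AS, (MEM_BYTES, MEM_BYTES))
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))

def run_limited(code: str) -> subprocess.CompletedProcess:
    """Execute a code string under CPU and memory rlimits."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
        timeout=CPU_SECONDS + 5,   # wall-clock backstop (rlimit counts CPU, not sleep)
        preexec_fn=_apply_limits,  # child inherits the caps
    )
```

Note the two different timeouts: `RLIMIT_CPU` stops busy loops, while the wall-clock `timeout` catches code that sleeps or blocks on I/O.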

The REPL Pattern: Generate, Execute, Iterate

Code execution agents are among the most powerful agent patterns because they can solve problems that are impossible through text alone — data analysis, mathematical computation, file manipulation, API integration testing. The core pattern is simple: the LLM writes code, a sandbox executes it, the LLM reads the output, and iterates if needed. The key challenge is making this loop safe and reliable.
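The loop described above can be sketched in a few lines of Python. Here the LLM call is abstracted as a caller-supplied `generate_code(task, history)` function (hypothetical; any model client fits that shape), and a bare subprocess stands in for the real sandbox (it is an isolation convenience for the sketch, not a security boundary):

```python
import subprocess
import sys
import tempfile

MAX_RETRIES = 3
TIMEOUT_SECONDS = 30

def execute_sandboxed(code: str) -> tuple[bool, str]:
    """Run code in a subprocess; return (success, stdout-or-stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return False, "error: execution timed out"
    ok = proc.returncode == 0
    return ok, proc.stdout if ok else proc.stderr

def repl_loop(task: str, generate_code) -> str:
    """Generate -> execute -> observe -> iterate, up to MAX_RETRIES."""
    history: list[tuple[str, str]] = []  # (code, error) from failed attempts
    for _ in range(MAX_RETRIES):
        code = generate_code(task, history)  # history lets the LLM see its errors
        ok, output = execute_sandboxed(code)
        if ok:
            return output
        history.append((code, output))
    raise RuntimeError(f"failed after {MAX_RETRIES} attempts: {output}")
```

Feeding `history` back into the next generation call is what makes this a loop rather than a one-shot: the model sees exactly which traceback its previous code produced.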

| Phase | What Happens | Failure Mode | Mitigation |
| --- | --- | --- | --- |
| Generate | LLM writes code based on task | Syntax errors, wrong approach | System prompt with examples, preferred patterns |
| Validate | Static checks before execution | Type errors, linting issues | Run mypy/tsc, ESLint/Ruff before execution |
| Execute | Run code in sandbox | Runtime errors, infinite loops | Timeout (30s), memory limit (512MB) |
| Observe | Read stdout, stderr, return value | Incomplete output, misleading errors | Capture both stdout and stderr, truncate long output |
| Iterate | LLM fixes errors and tries again | Same error repeated, wrong fix | Include previous attempts in context, max 3 retries |
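The Validate-before-Execute ordering matters because static checks are orders of magnitude cheaper than a sandbox round-trip. A minimal sketch of that pipeline, using Python's `ast` module for the syntax stage (type checking with mypy and linting with Ruff would run as extra subprocess stages between the two shown here; the `validate_and_run` name and dict shape are illustrative):

```python
import ast
import subprocess
import sys
import tempfile

def validate_and_run(code: str, timeout: float = 30.0) -> dict:
    """Syntax-check first, execute only if the check passes."""
    # Stage 1: syntax check — microseconds, no execution needed
    try:
        ast.parse(code)
    except SyntaxError as e:
        return {"stage": "syntax", "ok": False,
                "error": f"line {e.lineno}: {e.msg}"}

    # Stage 2: execute with a timeout (stand-in for a real sandbox)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"stage": "execute", "ok": False, "error": "timeout"}
    if proc.returncode != 0:
        # Keep the tail of the traceback — that's where the error line is
        return {"stage": "execute", "ok": False, "error": proc.stderr[-2000:]}
    # Truncate long output before it goes back into the LLM context
    return {"stage": "execute", "ok": True, "output": proc.stdout[:4000]}
```

Returning the failing stage alongside the error lets the agent tailor its retry prompt: a syntax failure needs a different fix than a runtime traceback.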