Advanced · 11 min
Code Execution Agents
Building agents that generate, execute, and iterate on code: the REPL pattern, sandboxing with Docker and E2B, validation through type checking and tests, and data analysis agents.
Quick Reference
- The REPL pattern: generate code → execute in sandbox → observe output → iterate until correct
- Always sandbox code execution: Docker containers, E2B, or Pyodide — never run LLM-generated code on your host
- Validation pipeline: syntax check → type check → lint → execute → verify output — catch errors early and cheaply
- Data analysis agents: generate pandas/SQL code, execute it, and iterate based on the results
- Limit execution time (30s default), memory (512MB), and network access — prevent resource exhaustion
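The time and memory limits above can be sketched with a plain subprocess. This is a minimal illustration, not a complete sandbox: `run_sandboxed` is a hypothetical helper that only caps resources; real isolation of the filesystem and network still requires Docker, E2B, or Pyodide.

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 30,
                  memory_mb: int = 512) -> subprocess.CompletedProcess:
    """Run code in a child process with time and memory caps (POSIX only)."""
    def set_limits() -> None:
        # Cap the address space so runaway allocations fail fast
        # instead of exhausting the host's memory.
        limit = memory_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,        # raises subprocess.TimeoutExpired on hang
        preexec_fn=set_limits,  # applied in the child before exec
    )

result = run_sandboxed("print(2 + 2)")
print(result.stdout.strip())  # → 4
```

A timeout surfaces as `subprocess.TimeoutExpired`, and an allocation past the cap surfaces as a `MemoryError` in the child's stderr, so both failure modes are observable by the agent loop.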
The REPL Pattern: Generate, Execute, Iterate
Code execution agents are among the most powerful agent patterns because they can solve problems that are impossible through text alone — data analysis, mathematical computation, file manipulation, API integration testing. The core pattern is simple: the LLM writes code, a sandbox executes it, the LLM reads the output, and iterates if needed. The key challenge is making this loop safe and reliable.
| Phase | What Happens | Failure Mode | Mitigation |
|---|---|---|---|
| Generate | LLM writes code based on task | Syntax errors, wrong approach | System prompt with examples, preferred patterns |
| Validate | Static checks before execution | Type errors, linting issues | Run mypy/tsc, ESLint/Ruff before execution |
| Execute | Run code in sandbox | Runtime errors, infinite loops | Timeout (30s), memory limit (512MB) |
| Observe | Read stdout, stderr, return value | Incomplete output, misleading errors | Capture both stdout and stderr, truncate long output |
| Iterate | LLM fixes errors and tries again | Same error repeated, wrong fix | Include previous attempts in context, max 3 retries |
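The Validate-before-Execute ordering in the table can be wired together as a single pipeline: fail on the cheapest check first, and only reach execution once the static stages pass. A sketch, assuming `mypy -c` and `ruff check -` on your PATH (both stages are skipped if a tool is not installed):

```python
import ast
import shutil
import subprocess
import sys

def validate_and_run(code: str) -> tuple[bool, str]:
    """Syntax check → type check → lint → execute, cheapest first."""
    # 1. Syntax: free and instant, catches the most common LLM failure.
    try:
        ast.parse(code)
    except SyntaxError as e:
        return False, f"syntax error: {e}"

    # 2–3. Type check and lint, when the tools are available.
    for tool, args, via_stdin in [
        ("mypy", ["-c", code], False),   # mypy accepts code via -c
        ("ruff", ["check", "-"], True),  # ruff reads code from stdin
    ]:
        if shutil.which(tool):
            proc = subprocess.run(
                [tool, *args],
                input=code if via_stdin else None,
                capture_output=True, text=True,
            )
            if proc.returncode != 0:
                return False, f"{tool}: {proc.stdout}"

    # 4. Execute only after every static check passes.
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr
```

The payoff is cost: a syntax error caught by `ast.parse` costs microseconds, while the same error discovered at execution time costs a full sandbox round trip plus another model call to fix it.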