Computer Use Agents
When to use computer use versus API automation, the screenshot-analyze-act loop with the current computer_20251124 tool, real cost math that shows context growth dominates price, Docker and ephemeral VM sandboxing with prompt injection defense, verification and stuck detection, production failure modes, and a reference implementation using the latest Anthropic API.
Quick Reference
- Computer use loop: screenshot → send to Claude → receive coordinate action → execute → repeat; each cycle takes 3-8 seconds
- Current tool type: computer_20251124; requires the beta header anthropic-beta: computer-use-2025-11-24 on every request
- Supported models: Opus 4.7 (high-res up to 2576px, no coordinate scaling needed), Opus 4.6, Sonnet 4.6
- Cost grows quadratically without pruning: by step 20, you resend ~550K tokens of screenshot history per request; use history_limit=5
- Stuck detection: compare 3 consecutive screenshots with pixel diff; if change < 2%, force Claude to try a different approach
- Prompt injection via screen content is a unique attack surface: a page can display text that overrides your instructions
- Companion tools: text_editor_20250728 and bash_20250124 can run alongside computer use for file editing and shell commands
- Use computer use only when no API exists; if a structured endpoint is available, it will be 10-100x faster and cheaper
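The stuck-detection rule in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference implementation: it assumes each screenshot has already been decoded into a raw 8-bit grayscale buffer of identical size (one byte per pixel), and the names changed_fraction and is_stuck are hypothetical. A real agent would more likely decode PNG screenshots with an imaging library first.

```python
from typing import Sequence


def changed_fraction(prev: bytes, curr: bytes) -> float:
    """Return the fraction of pixels that differ between two frames.

    Assumption: both frames are raw grayscale buffers, one byte per pixel,
    captured at the same resolution.
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not prev:
        return 0.0
    differing = sum(a != b for a, b in zip(prev, curr))
    return differing / len(prev)


def is_stuck(frames: Sequence[bytes], threshold: float = 0.02, window: int = 3) -> bool:
    """Report whether the last `window` frames are all nearly identical.

    The agent counts as stuck when every consecutive pair in the window
    changed by less than `threshold` (2% by default).
    """
    if len(frames) < window:
        return False  # not enough history to decide yet
    recent = frames[-window:]
    return all(
        changed_fraction(a, b) < threshold
        for a, b in zip(recent, recent[1:])
    )
```

When is_stuck returns True, the loop can append a text message asking Claude to try a different approach instead of repeating the last action. An exact per-pixel comparison is sensitive to blinking cursors and clock widgets, so a production version might mask those regions before diffing.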
Should I Use Computer Use at All?
Computer use is the most expensive and fragile automation option available. Before building a computer use agent, ask one question: does the application expose an API, SDK, or browser-parseable DOM? If yes, use that. Computer use is for applications where no structured interface exists — legacy desktop software, mainframe terminals, old ERP systems that predate APIs, or cross-application workflows that span apps with no shared integration layer.
| Scenario | Use Computer Use? | Why / Alternative |
|---|---|---|
| Web app with a REST API | No | Use the API: 10-100x faster, zero visual fragility |
| Browser task (form fill, scraping) | No | Use Playwright or a browser agent that reads the DOM |
| Legacy desktop ERP with no API | Yes | No structured interface exists |
| Mainframe terminal emulator | Yes | Screen-level input (for example, xdotool under Xvfb) is the only available interface |
| Cross-app desktop workflow spanning 3 non-API apps | Yes | No shared integration layer |
| One-time data migration from old desktop app | Maybe | Worth it once; for recurring tasks, build an API connector |
| Daily recurring task via UI | No | Invest in the API integration; computer use can break on any UI update |
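One way to read the table above is as an ordered checklist: prefer an API, then a parseable DOM, then a purpose-built connector for recurring work, and only then computer use. The sketch below encodes that reading. The Workload fields and the recommend function are hypothetical names introduced for illustration, and the connector_feasible flag is an assumption added to separate "legacy ERP with no API" from "daily task that deserves an integration".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    has_api: bool             # a REST API or SDK covers the task
    has_parseable_dom: bool   # browser task where the DOM can be read
    recurring: bool           # runs repeatedly rather than once
    connector_feasible: bool  # assumption: an integration could be built


def recommend(w: Workload) -> str:
    """Return the cheapest workable automation approach for a workload."""
    if w.has_api:
        return "api"
    if w.has_parseable_dom:
        return "dom_automation"  # e.g. Playwright or a DOM-reading agent
    if w.recurring and w.connector_feasible:
        return "build_connector"  # recurring UI tasks justify an integration
    return "computer_use"  # last resort: no structured interface exists
```

For example, a one-time migration from a desktop app with no API and no DOM yields "computer_use", while the same task run daily with a feasible integration yields "build_connector".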
A 20-step task with Sonnet 4.6 costs $1.74–$2.90 depending on whether you manage context history (see Section 3 for the math). The equivalent API call chain costs under $0.01. Computer use is not a shortcut — it is a last resort for applications with no API.
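To see why unmanaged history drives cost, it helps to model input tokens per step. The sketch below is a simplified illustration, not the article's own calculation: tokens_per_screenshot, base_tokens, and the function names are placeholder assumptions, output tokens and prompt caching are ignored, and the result is not expected to reproduce the dollar figures quoted above. It only shows the shape of the growth: without a limit, step k resends k screenshots, so the task total grows with the square of the step count, while a fixed history_limit caps each request.

```python
from typing import Optional


def request_input_tokens(
    step: int,
    tokens_per_screenshot: int,
    base_tokens: int,
    history_limit: Optional[int] = None,
) -> int:
    """Input tokens sent on a given 1-indexed step.

    Without a limit, every earlier screenshot is resent. With a limit,
    only the most recent `history_limit` screenshots are kept.
    """
    kept = step if history_limit is None else min(step, history_limit)
    return base_tokens + kept * tokens_per_screenshot


def task_input_tokens(
    steps: int,
    tokens_per_screenshot: int,
    base_tokens: int,
    history_limit: Optional[int] = None,
) -> int:
    """Total input tokens across all steps of one task."""
    return sum(
        request_input_tokens(s, tokens_per_screenshot, base_tokens, history_limit)
        for s in range(1, steps + 1)
    )


def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Convert a token count to dollars at a given input price."""
    return tokens / 1_000_000 * price_per_million_usd


# Placeholder values, chosen only to illustrate the growth pattern.
unpruned = task_input_tokens(20, tokens_per_screenshot=1500, base_tokens=2000)
pruned = task_input_tokens(20, tokens_per_screenshot=1500, base_tokens=2000, history_limit=5)
print(unpruned, pruned)
```

With these placeholder values, the unpruned 20-step run sends 355,000 input tokens in total, against 175,000 with a five-screenshot window. The gap widens as the step count grows, because the unpruned total is quadratic in the number of steps and the pruned total is linear once the window fills.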
Anthropic also offers Claude Managed Agents (public beta as of April 2026) — a fully managed agent harness with built-in sandboxing, checkpointing, and server-sent event streaming. If you want Anthropic to handle the orchestration layer instead of building your own, Managed Agents is worth evaluating before hand-rolling a computer use agent.