Advanced · 18 min

Computer Use Agents

When to use computer use versus API automation, the screenshot-analyze-act loop with the current computer_20251124 tool, real cost math that shows context growth dominates price, Docker and ephemeral VM sandboxing with prompt injection defense, verification and stuck detection, production failure modes, and a reference implementation using the latest Anthropic API.

Quick Reference

  • Computer use loop: screenshot → send to Claude → receive coordinate action → execute → repeat — each cycle takes 3-8 seconds
  • Current tool type: computer_20251124 — requires beta header anthropic-beta: computer-use-2025-11-24 on every request
  • Supported models: Opus 4.7 (high-res up to 2576px, no coordinate scaling needed), Opus 4.6, Sonnet 4.6
  • Cost grows quadratically without pruning: by step 20, you resend ~550K tokens of screenshot history per request — use history_limit=5
  • Stuck detection: compare 3 consecutive screenshots with pixel diff; if change < 2%, force Claude to try a different approach
  • Prompt injection via screen content is a unique attack surface — a page can display text that overrides your instructions
  • Companion tools: text_editor_20250728 and bash_20250124 can run alongside computer use for file editing and shell commands
  • Use computer use only when no API exists — if a structured endpoint is available, it will be 10-100x faster and cheaper
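The stuck-detection rule in the list above can be sketched as follows. This is a minimal illustration, not the article's reference implementation: the names `changed_fraction` and `StuckDetector` are hypothetical, and frames are assumed to be raw grayscale bytes of equal size (real screenshots would first need decoding from PNG, for example with an imaging library). Only the 3-frame window and the 2% threshold come from the text.

```python
from collections import deque
from typing import Deque

CHANGE_THRESHOLD = 0.02  # from the text: under 2% changed pixels counts as "no change"
WINDOW = 3               # from the text: compare 3 consecutive screenshots


def changed_fraction(a: bytes, b: bytes) -> float:
    """Fraction of pixels that differ between two raw grayscale frames."""
    if len(a) != len(b):
        return 1.0  # assumption: a resolution change counts as a full change
    if not a:
        return 0.0
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing / len(a)


class StuckDetector:
    """Hypothetical helper: flags when the last WINDOW frames barely differ."""

    def __init__(self, window: int = WINDOW, threshold: float = CHANGE_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.frames: Deque[bytes] = deque(maxlen=window)

    def observe(self, frame: bytes) -> bool:
        """Record a frame; return True if every consecutive pair changed < threshold."""
        self.frames.append(frame)
        if len(self.frames) < self.window:
            return False  # not enough history yet
        frames = list(self.frames)
        return all(
            changed_fraction(prev, cur) < self.threshold
            for prev, cur in zip(frames, frames[1:])
        )


detector = StuckDetector()
blank = bytes(100)  # a 100-pixel all-black frame
results = [detector.observe(blank) for _ in range(3)]
```

When `observe` returns True, the agent loop would add an instruction telling Claude to try a different approach, as the bullet describes. Exact per-pixel comparison is deliberately simple; in practice a small tolerance per pixel may be needed so that cursor blink or anti-aliasing does not mask a stuck state.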

Should I Use Computer Use at All?

Computer use is the most expensive and fragile automation option available. Before building a computer use agent, ask one question: does the application expose an API, SDK, or browser-parseable DOM? If yes, use that. Computer use is for applications where no structured interface exists — legacy desktop software, mainframe terminals, old ERP systems that predate APIs, or cross-application workflows that span apps with no shared integration layer.

Scenario | Use Computer Use? | Why / Alternative
--- | --- | ---
Web app with a REST API | No | Use the API: 100x faster, zero visual fragility
Browser task (form fill, scraping) | No | Use Playwright or a browser agent that reads the DOM
Legacy desktop ERP with no API | Yes | No structured interface exists
Mainframe terminal emulator | Yes | xdotool/xvfb is the only available interface
Cross-app desktop workflow spanning 3 non-API apps | Yes | No shared integration layer
One-time data migration from old desktop app | Maybe | Worth it once; for recurring tasks, build an API connector
Daily recurring task via UI | No | Invest in the API integration; CU will break on every UI update

Cost reality: computer use is expensive

A 20-step task with Sonnet 4.6 costs $1.74–$2.90 depending on whether you manage context history (see Section 3 for the math). The equivalent API call chain costs under $0.01. Computer use is not a shortcut — it is a last resort for applications with no API.

Anthropic also offers Claude Managed Agents (public beta as of April 2026) — a fully managed agent harness with built-in sandboxing, checkpointing, and server-sent event streaming. If you want Anthropic to handle the orchestration layer instead of building your own, Managed Agents is worth evaluating before hand-rolling a computer use agent.