Structured Error Responses for MCP
Returning structured errors from MCP tools so Claude can recover gracefully. The isError flag, error categories, retryable vs non-retryable distinctions, and how subagents should implement local recovery for transient failures while propagating only unresolvable errors.
Quick Reference
- MCP tools signal errors using the isError: true flag in the tool result
- Embed structured JSON in the content text: errorCategory, isRetryable, description, customerMessage
- Four error categories: transient, validation, business, permission -- each requires different handling
- Retryable errors (transient) should include retry guidance: delay, maxRetries
- Non-retryable errors (validation, business, permission) should never be retried unchanged -- the same request will fail again
- Generic 'Operation failed' errors prevent Claude from making intelligent recovery decisions
- Subagents should implement local recovery for transient failures (retry 2-3 times internally)
- Only propagate unresolvable errors to the orchestrator -- don't bubble up every timeout
- Distinguish access failures from valid empty results -- empty is not an error
- Include customerMessage for user-facing errors separate from technical description
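As a rough illustration of the points above, the sketch below builds a tool result with isError set and a JSON payload in the content text. The field names errorCategory, isRetryable, description, and customerMessage come from this guide; the helper names (error_result, empty_result), the "retry" object with delaySeconds and maxRetries, and the sample messages are assumptions made for the example, not part of any MCP SDK.

```python
import json

# Categories named in this guide; only "transient" is treated as retryable here.
CATEGORIES = {"transient", "validation", "business", "permission"}


def error_result(category, description, customer_message, retry=None):
    """Build an MCP-style tool result that carries a structured error."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown error category: {category}")
    payload = {
        "errorCategory": category,
        "isRetryable": category == "transient",
        "description": description,           # technical detail for the agent
        "customerMessage": customer_message,  # wording that is safe to show a user
    }
    if retry is not None:
        # Hypothetical shape for retry guidance: {"delaySeconds": 2, "maxRetries": 3}
        payload["retry"] = retry
    return {
        "isError": True,
        "content": [{"type": "text", "text": json.dumps(payload)}],
    }


def empty_result():
    """A lookup that succeeded but matched nothing: a valid result, not an error."""
    body = json.dumps({"results": []})
    return {"isError": False, "content": [{"type": "text", "text": body}]}


timeout = error_result(
    "transient",
    "Orders service timed out after 5s",
    "We're having trouble reaching our order system. Please try again shortly.",
    retry={"delaySeconds": 2, "maxRetries": 3},
)
denied = error_result(
    "permission",
    "Caller lacks the refunds:write scope",
    "You don't have access to issue refunds on this account.",
)

print(timeout["content"][0]["text"])
print(denied["content"][0]["text"])
```

Keeping description and customerMessage as separate fields lets the agent reason over the technical detail while showing the user only the second string.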
Why Structured Errors Matter for Agents
When an MCP tool fails, what happens next depends entirely on the information in the error response. If the error says 'Operation failed,' Claude has no basis for deciding whether to retry, try a different tool, ask the user for clarification, or give up. Structured error responses give Claude the metadata it needs to make intelligent recovery decisions -- the same way a well-designed API gives calling code the information it needs to handle failures gracefully.
The exam will present scenarios where an agent receives 'Operation failed' or 'Error occurred' and ask what the problem is. The answer is always that generic errors prevent intelligent recovery. Claude cannot distinguish a temporary timeout (retry in 2 seconds) from a permanent permission denial (ask user for credentials) when both return the same message.
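To make the contrast concrete, here is a minimal sketch of how a caller might pick a next step from a tool result. The function name plan_recovery and the action labels ("retry", "fix_input", and so on) are hypothetical, and the mapping from category to action is one plausible choice rather than a prescribed one. The point is the last case: a bare 'Operation failed' string carries no category, so no informed decision is possible.

```python
import json

# Hypothetical mapping from error category to a coarse next step.
ACTIONS = {
    "transient": "retry",           # e.g. a temporary timeout
    "validation": "fix_input",      # correct the arguments, then call again
    "business": "explain_to_user",  # a rule blocks the request; retrying won't help
    "permission": "ask_user",       # needs credentials or access the agent lacks
}


def plan_recovery(tool_result):
    """Return a next-step label for a tool result, or "unknown" if undecidable."""
    if not tool_result.get("isError"):
        return "use_result"
    text = tool_result["content"][0]["text"]
    try:
        err = json.loads(text)
    except json.JSONDecodeError:
        err = None  # plain-text error such as 'Operation failed'
    if not isinstance(err, dict):
        return "unknown"
    return ACTIONS.get(err.get("errorCategory"), "unknown")


structured = {
    "isError": True,
    "content": [{"type": "text", "text": json.dumps({
        "errorCategory": "permission",
        "isRetryable": False,
        "description": "Token lacks access to the billing API",
        "customerMessage": "We need additional authorization to view billing.",
    })}],
}
generic = {
    "isError": True,
    "content": [{"type": "text", "text": "Operation failed"}],
}

print(plan_recovery(structured))
print(plan_recovery(generic))
```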
In a multi-agent system, error handling becomes even more critical. An orchestrator agent that receives an error from a subagent's tool call needs to decide: should it retry the subagent, route to a different subagent, escalate to the user, or proceed without that data? Structured errors make this decision possible.
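The orchestrator's job gets simpler if each subagent absorbs transient failures before reporting back. The sketch below shows one way a subagent could retry locally and hand upward only what it could not resolve. Everything here is illustrative: call_with_local_recovery, make_flaky_tool, the "retry" field with delaySeconds, and the injectable sleep parameter are assumptions for the example, not an MCP or SDK API.

```python
import json
import time


def _parse_error(result):
    """Return the structured error payload, or {} if the text is not a JSON object."""
    try:
        err = json.loads(result["content"][0]["text"])
    except (json.JSONDecodeError, KeyError, IndexError):
        return {}
    return err if isinstance(err, dict) else {}


def call_with_local_recovery(tool, args, max_attempts=3, sleep=time.sleep):
    """Call tool(args); retry retryable errors locally, propagate everything else."""
    result = None
    for attempt in range(1, max_attempts + 1):
        result = tool(args)
        if not result.get("isError"):
            return result  # success, including a valid empty result
        err = _parse_error(result)
        if not err.get("isRetryable"):
            return result  # non-retryable or unstructured: propagate immediately
        if attempt < max_attempts:
            # Hypothetical retry guidance field; fall back to 1 second.
            sleep(err.get("retry", {}).get("delaySeconds", 1))
    return result  # retries exhausted: now the orchestrator needs to know


def make_flaky_tool(failures):
    """Test double: returns a transient error `failures` times, then succeeds."""
    calls = {"count": 0}

    def tool(args):
        calls["count"] += 1
        if calls["count"] <= failures:
            payload = {
                "errorCategory": "transient",
                "isRetryable": True,
                "description": "Inventory service timed out",
                "customerMessage": "Stock levels are briefly unavailable.",
                "retry": {"delaySeconds": 2, "maxRetries": 3},
            }
            return {"isError": True, "content": [{"type": "text", "text": json.dumps(payload)}]}
        body = json.dumps({"sku": args["sku"], "inStock": 4})
        return {"isError": False, "content": [{"type": "text", "text": body}]}

    return tool, calls


# Demo with a no-op sleep so the example runs instantly.
tool, calls = make_flaky_tool(failures=2)
outcome = call_with_local_recovery(tool, {"sku": "A1"}, sleep=lambda _seconds: None)
print(outcome["isError"], calls["count"])
```

In this run the two timeouts never reach the orchestrator; it sees only the eventual success. Had all three attempts failed, the last structured error would be returned unchanged so the orchestrator could decide whether to reroute, escalate, or proceed without the data.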