Real Engineering Workflows/Real Engineering Workflows
Advanced11 min

Debugging Production Issues with Claude Code

Evidence-first debugging with Claude Code: gathering logs, screenshots, and traces; hypothesis-driven investigation; safe database MCP queries; browser MCP live investigation; and the fix workflow from reproduction to post-mortem.

Quick Reference

  • Gather evidence first — logs, stack traces, error messages, recent deploys — before asking Claude to hypothesize
  • Paste screenshots with Cmd+V, pipe log files with `cat production.log | claude -p`
  • Give Claude everything you know and ask for hypotheses before starting investigation
  • Database MCP for production: read-only tools only — never write access in production
  • Browser MCP: navigate to the production issue live, Claude observes and reports
  • Subagent for codebase search: Explore subagent finds relevant code paths, main session investigates
  • Fix workflow: understand → write reproducing test → fix → test passes → deploy
  • Post-mortem: ask Claude to write an incident summary from the session history

Evidence Gathering

The worst debugging sessions start with a vague problem statement and no context. The best start with everything you know: the exact error message, the full stack trace, the affected users, the deploy that preceded the issue, and the systems involved. Give Claude everything before asking it to investigate.

1

Pull the error and stack trace

Copy the exact error message and full stack trace from your logging system (Datadog, CloudWatch, Sentry, etc.). Don't summarize — paste the raw output.

2

Pull recent logs

Get logs from the 5–10 minutes around the incident. `cat production.log | claude -p` pipes the file directly into a non-interactive Claude session for initial triage.

3

Note recent deploys

Check your deployment history. When did the issue start? What deployed between the last known-good state and now? This is often the most direct path to root cause.

4

Characterize affected users/requests

Is the issue for all users or specific segments? Specific API endpoints? Specific data patterns? The scope of impact often points directly at the failure mode.

5

Paste screenshots

Use Cmd+V to paste browser screenshots, monitoring dashboards, or error screens directly into Claude Code. Claude reads them as visual context.

Piping production logs into Claude for triage
@-reference config files for context

Production issues often require understanding your system's configuration. Use @-references to pull in config files: @src/config/database.ts, @infrastructure/ecs-task-definition.json. Claude needs to understand your setup to reason about where things can fail.