Advanced · 13 min
Advanced Guardrails
When to add guardrails, how to architect a cost-aware stack across all five middleware hooks, and how to know they work. Covers before_agent input filters, wrap_tool_call for tool-level security, after_agent output safety, false positive management, and guardrail evaluation.
Quick Reference
- before_agent: fires once per invocation; cheapest place to block bad inputs (regex or classifier)
- after_agent: fires once after the loop; validates the final response before the user sees it
- wrap_tool_call: intercepts every tool invocation; validates arguments, enforces rate limits, checks permissions
- return {'jump_to': 'end'} together with @hook_config(can_jump_to=['end']) short-circuits the entire agent loop
- Layer cheapest-first: regex ($0) → classifier (~$0.002) → tool validation (~$0.001/tool) → output safety (~$0.003)
- Every guardrail needs a fallback: fail-open for low-risk domains, fail-closed for regulated ones
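The hook contract and the fallback rule from the list above can be sketched in plain Python, without importing the framework. This is a minimal illustration, not the library's API: `regex_input_filter`, `run_with_fallback`, the pattern list, and the shape of `state["messages"]` are all assumptions made for the example. Only the `{'jump_to': 'end'}` return value comes from the quick reference; in a real agent the filter would be registered as a before_agent hook with `@hook_config(can_jump_to=['end'])`.

```python
import re
from typing import Any, Callable, Optional

State = dict[str, Any]
Hook = Callable[[State], Optional[dict[str, Any]]]

# Hypothetical block list; derive real patterns from a threat you can name.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]


def regex_input_filter(state: State) -> Optional[dict[str, Any]]:
    """before_agent-style check: return a jump to 'end' to block, None to continue."""
    # Assumption: state["messages"] is a list of dicts with a "content" string.
    text = state["messages"][-1]["content"]
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return {"jump_to": "end"}
    return None


def run_with_fallback(hook: Hook, state: State, fail_closed: bool) -> Optional[dict[str, Any]]:
    """Run a guardrail; if the guardrail itself raises, apply the fallback policy."""
    try:
        return hook(state)
    except Exception:
        # fail-closed: a broken guardrail blocks the request (regulated domains).
        # fail-open: a broken guardrail lets the request through (low-risk domains).
        return {"jump_to": "end"} if fail_closed else None
```

Running the regex filter first keeps the free check in front of the paid ones, and wrapping each layer in `run_with_fallback` makes the fail-open or fail-closed decision explicit rather than an accident of exception handling.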
When NOT to Add Guardrails
Every guardrail layer adds latency, cost, and false-positive risk. Before adding any, name the specific threat. If you can't name it, you don't need the layer yet.
| Scenario | User trust | Tool sensitivity | Recommended layers |
|---|---|---|---|
| Internal devtool, no external users | High | Low (read-only) | None — system prompt is sufficient |
| Prototype / demo, controlled audience | Medium | Low | None or input regex only |
| Public agent, no tool access, no PII | Low | None | Input classifier (before_agent) |
| Public agent with write tools (email, DB) | Low | High | Input + tool-level + output |
| Regulated domain (medical, financial, legal) | Low | High + PII | Full stack + fail-closed + audit log |
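The table can also be read as a small decision function. The sketch below is one possible encoding, assuming three boolean attributes (`public`, `has_write_tools`, `regulated`) that simplify the table's trust and sensitivity columns; the class, function, and layer names are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    public: bool           # exposed to untrusted (low-trust) users
    has_write_tools: bool  # can send email, write to a DB, and so on
    regulated: bool        # medical, financial, or legal domain


def recommended_layers(profile: AgentProfile) -> list[str]:
    """Map a deployment profile to guardrail layers, following the table rows."""
    if profile.regulated:
        # Full stack plus fail-closed behaviour and an audit log.
        return ["input", "tool", "output", "fail_closed", "audit_log"]
    if profile.public and profile.has_write_tools:
        return ["input", "tool", "output"]
    if profile.public:
        return ["input"]
    # Internal devtool or controlled demo: the system prompt is the guardrail.
    return []
```

For example, `recommended_layers(AgentProfile(public=True, has_write_tools=False, regulated=False))` yields only the input layer, matching the "public agent, no tool access" row.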
If your agent has no tools and handles no sensitive data
A system prompt is your guardrail. Adding LLM-based guardrails to a read-only Q&A agent can roughly double your cost and latency to defend against a threat you have not yet identified.
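To see where a doubling can come from, the per-layer figures in the quick reference can be added up. The sketch below uses those approximate costs; the base agent cost of $0.005 per request is a hypothetical number picked for illustration, and the function names are not from any library.

```python
# Approximate per-call costs quoted in the quick reference (USD).
CLASSIFIER_COST = 0.002
TOOL_VALIDATION_COST = 0.001  # charged once per tool call
OUTPUT_SAFETY_COST = 0.003


def guardrail_overhead(tool_calls: int,
                       input_classifier: bool = True,
                       output_safety: bool = True) -> float:
    """Added cost per request for the enabled LLM-based layers."""
    overhead = tool_calls * TOOL_VALIDATION_COST
    if input_classifier:
        overhead += CLASSIFIER_COST
    if output_safety:
        overhead += OUTPUT_SAFETY_COST
    return overhead


def cost_multiplier(base_cost: float, overhead: float) -> float:
    """Total cost with guardrails, relative to the unguarded agent."""
    return (base_cost + overhead) / base_cost


# Hypothetical read-only Q&A agent: no tool calls, $0.005 per request.
multiplier = cost_multiplier(0.005, guardrail_overhead(tool_calls=0))
```

Under these assumptions the classifier and output check together cost about as much as the agent call itself, so `multiplier` comes out at roughly 2. A more expensive base model would shrink the ratio, which is why the arithmetic is worth redoing for your own numbers.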