Guardrails (AI)
Updated 2026-04-06
Guardrails are evals that run in production before an answer reaches the user. If an eval detects a problem, the answer can be blocked, regenerated, or corrected before it is sent.
Teresa Torres draws the distinction clearly: ordinary AI Evals measure whether an error happened; guardrails stop the error from reaching the user.
When They Are Worth It
Not every eval should become a guardrail. Every additional LLM call in the response path costs latency and money. The tradeoff is straightforward:
- High damage if wrong: a guardrail makes sense
- Low damage, high volume: better to evaluate later on a sample of traces
Torres’ interview-coach example is useful here: if an eval detects that the coach suggests a leading question, another LLM call can replace that question before the student ever sees it.
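That detect-and-replace loop can be sketched in a few lines. This is a hypothetical illustration, not Torres' actual implementation: the `llm` stub, the `JUDGE:` prompt prefix, and the canned rewrite all stand in for real model calls.

```python
def llm(prompt: str) -> str:
    # Stub standing in for a real model call; the JUDGE: prefix and the
    # canned rewrite are assumptions for illustration only.
    if prompt.startswith("JUDGE:"):
        return "FAIL" if "don't you think" in prompt.lower() else "PASS"
    return "What did you find most challenging about that project?"

def is_leading(question: str) -> bool:
    # Eval step: a judge call decides whether the question is leading.
    return llm(f"JUDGE: Is this interview question leading?\n{question}") == "FAIL"

def guardrail(question: str) -> str:
    # Guardrail step: regenerate a flagged question before the student sees it.
    if is_leading(question):
        return llm(f"Rewrite this question so it is not leading:\n{question}")
    return question
```

Note the cost: a flagged answer triggers two extra LLM calls (judge plus rewrite) in the response path, which is exactly the latency tradeoff above.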
Technically
Guardrails do not require fundamentally different infrastructure. They are the same eval logic, executed before the response is delivered rather than after. Code-based evals are especially attractive as guardrails because they are fast and cheap; LLM-as-judge guardrails are more expensive, but sometimes necessary.
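For contrast, a code-based guardrail is a pure function with no model call, so it adds microseconds rather than an LLM round trip to the response path. A minimal sketch, correcting the answer instead of blocking it; the email pattern is illustrative, not exhaustive.

```python
import re

# Illustrative pattern for email-shaped strings; a real PII guardrail
# would use a vetted detector, not this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(answer: str) -> str:
    # Correct the answer in place rather than blocking or regenerating it.
    return EMAIL.sub("[redacted]", answer)
```

Because the check is deterministic and cheap, it can run on every response; the sampling tradeoff above only applies to the expensive judge-based variant.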
Connections
- AI Evals - guardrails are a subset, executed at a different point
- Error Mode Analysis - helps identify which failure modes are critical enough
- Teresa Torres - explained the distinction in the interview-coach context
Sources
- “AI Evals & Discovery - All Things Product with Teresa & Petra” - Teresa Torres and Petra Wille (2025-09)