Guardrails (AI)
Updated 2026-04-06
Guardrails are evals that run in production before an answer reaches the user. If an eval detects a problem, the answer can be blocked, regenerated, or corrected before it is sent.
Teresa Torres draws the distinction clearly: ordinary AI Evals measure whether an error happened; guardrails stop the error from reaching the user.
When They Are Worth It
Not every eval should become a guardrail. Every additional LLM call in the response path costs latency and money. The tradeoff is straightforward:
- High damage if wrong: a guardrail makes sense
- Low damage, high volume: better to evaluate later on a sample of traces
Torres’ interview-coach example is useful here: if an eval detects that the coach suggests a leading question, another LLM call can replace that question before the student ever sees it.
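That detect-and-replace loop can be sketched in a few lines. This is a hypothetical illustration, not Torres' actual implementation: the `llm` stub, the `JUDGE:` prompt prefix, and the canned rewrite all stand in for real model calls.

```python
def llm(prompt: str) -> str:
    # Stub standing in for a real model call; the JUDGE: prefix and the
    # canned rewrite are assumptions for illustration only.
    if prompt.startswith("JUDGE:"):
        return "FAIL" if "don't you think" in prompt.lower() else "PASS"
    return "What did you find most challenging about that project?"

def is_leading(question: str) -> bool:
    # Eval step: a judge call decides whether the question is leading.
    return llm(f"JUDGE: Is this interview question leading?\n{question}") == "FAIL"

def guardrail(question: str) -> str:
    # Guardrail step: regenerate a flagged question before the student sees it.
    if is_leading(question):
        return llm(f"Rewrite this question so it is not leading:\n{question}")
    return question
```

Note the cost: a flagged answer triggers two extra LLM calls (judge plus rewrite) in the response path, which is exactly the latency tradeoff above.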
Technically
Guardrails do not require fundamentally different infrastructure. They are the same eval logic, executed before the response is delivered rather than after. Code-based evals are especially attractive as guardrails because they are fast and cheap; LLM-as-judge guardrails are more expensive, but sometimes necessary.
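For contrast, a code-based guardrail is a pure function with no model call, so it adds microseconds rather than an LLM round trip to the response path. A minimal sketch, correcting the answer instead of blocking it; the email pattern is illustrative, not exhaustive.

```python
import re

# Illustrative pattern for email-shaped strings; a real PII guardrail
# would use a vetted detector, not this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(answer: str) -> str:
    # Correct the answer in place rather than blocking or regenerating it.
    return EMAIL.sub("[redacted]", answer)
```

Because the check is deterministic and cheap, it can run on every response; the sampling tradeoff above only applies to the expensive judge-based variant.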
Connections
- AI Evals - guardrails are a subset, executed at a different point
- Error Mode Analysis - helps identify which failure modes are critical enough
- Teresa Torres - explained the distinction in the interview-coach context
Sources
- “AI Evals & Discovery - All Things Product with Teresa & Petra” - Teresa Torres and Petra Wille (2025-09)