Discovery Informs AI Product Quality

Product Discovery is not only about finding the right problem to solve. In AI products, it is also the basis for technical quality assurance. Without up-to-date customer understanding, you cannot write good evals, build a strong golden dataset, or brief human annotators well.

Teresa Torres made this connection explicit at the end of her conversation with Petra Wille: all the steps in the eval process are only as good as her understanding of the customer.

Where Discovery Shows Up

Prompt design: if you know the kinds of inputs users actually provide, you can write better prompts. Without discovery, you only cover the obvious cases, not the strange but real ones.

Orchestration: how you split an LLM system into sub-prompts and dimensions depends on what the system is supposed to achieve. That is a discovery question as much as a technical one.

Error analysis: if you do not know what users expect, you cannot judge whether an output is actually wrong. Human annotators need domain understanding, and that comes from customer research.

Eval design: which failure modes matter, which quality criteria should count, and how “good” is defined all depend on understanding the user.

Golden dataset: the dataset only represents known scenarios. Whether it represents the right scenarios depends on how well you understand real user behavior. That makes it a direct discovery artifact.
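
One way to make the golden dataset point concrete is to tag each golden example with the discovery scenario it came from, then check which observed scenarios have no example at all. The following is a minimal sketch under stated assumptions, not an established method: the GoldenExample fields, the scenario tags, and the function names are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenExample:
    """One golden dataset entry (hypothetical structure)."""
    user_input: str
    expected_behavior: str
    scenario: str  # assumed tag linking the example to a discovery finding


def coverage_report(golden, observed_scenarios):
    """Count golden examples for each scenario observed in discovery.

    A count of 0 means the scenario showed up in customer research
    but is not represented in the golden dataset.
    """
    counts = Counter(example.scenario for example in golden)
    return {scenario: counts.get(scenario, 0) for scenario in observed_scenarios}


def uncovered_scenarios(golden, observed_scenarios):
    """Return the observed scenarios that have no golden example, sorted."""
    report = coverage_report(golden, observed_scenarios)
    return sorted(name for name, count in report.items() if count == 0)


if __name__ == "__main__":
    # Illustrative data only; real scenarios would come from customer research.
    golden = [
        GoldenExample("Summarize this interview", "short summary", "clean_transcript"),
        GoldenExample("summarize pls", "short summary", "clean_transcript"),
        GoldenExample("(pasted notes with typos)", "short summary", "messy_notes"),
    ]
    observed = ["clean_transcript", "messy_notes", "mixed_language_input"]
    print(coverage_report(golden, observed))
    print(uncovered_scenarios(golden, observed))
```

A check like this only shows gaps relative to the scenarios you already know about. Keeping the list of observed scenarios current is still the job of discovery.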

The Uncomfortable Consequence

AI prototypes and experiments can be built and shipped quickly. Real production products cannot, because of the eval overhead. In practice, that means:

you cannot skip discovery
you need humans who can judge outputs, not just engineers
you need to keep customer understanding current, because quality criteria shift over time

Petra Wille’s summary in the conversation was blunt: people keep saying these products can be spun up in hours. Prototypes, maybe. Production products, no.

Connections

AI Evals — evals are only as good as the customer understanding beneath them
Synthetische Testdaten für LLMs — the quality of synthetic dimensions depends on discovery
Criteria Drift — up-to-date customer understanding is one of the best defenses against drifting criteria
Teresa Torres — articulated the connection from direct experience
Petra Wille — reinforced it from the discovery side
Product Discovery — the overarching concept

Sources

AI Evals & Discovery - All Things Product with Teresa & Petra — Teresa Torres + Petra Wille (2025-09)