The lexicon favors terms that help teams ask cleaner questions about model behavior, evidence, and operating limits.

Working vocabulary

Short definitions, explicit boundaries, fewer borrowed assumptions.

AI language moves quickly, but teams still need stable words for ordinary decisions. This lexicon collects terms that appear in model reviews, data conversations, release planning, risk registers, and post-launch debugging. Each definition is deliberately plain: it should help a reader notice what must be checked before a term becomes a promise.

Model card

A structured note that records what a model is, how it was evaluated, where it should be used, and where caution is required.
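One way to keep those four parts from drifting into free text is to store them as a small record. The sketch below is illustrative only: the `ModelCard` class, its field names, and the example values (including the model name and the accuracy figure) are hypothetical placeholders, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical minimal model card; field names are illustrative."""

    name: str
    description: str                  # what the model is
    evaluation: dict[str, float]      # how it was evaluated (metric -> value)
    intended_uses: list[str]          # where it should be used
    cautions: list[str] = field(default_factory=list)  # where caution is required

    def missing_sections(self) -> list[str]:
        """Return the names of sections that were left empty."""
        sections = {
            "description": self.description,
            "evaluation": self.evaluation,
            "intended_uses": self.intended_uses,
            "cautions": self.cautions,
        }
        return [key for key, value in sections.items() if not value]


# Placeholder values, made up for the example.
card = ModelCard(
    name="ticket-router-v2",
    description="Routes support tickets to one of five queues.",
    evaluation={"accuracy": 0.91},
    intended_uses=["English-language support tickets"],
)
print(card.missing_sections())  # ['cautions']
```

A check like `missing_sections` only confirms that a section was filled in, not that what it says is adequate.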

Dataset shift

A change between the data used for development and the data encountered in use, often visible only after the deployment context changes.
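As a rough illustration, shift in a single categorical field can be measured by comparing how often each value appears in development data and in live data. The sketch below uses total variation distance, which is one of several possible measures. The function name `category_shift` and the sample values are hypothetical, and both inputs are assumed to be non-empty.

```python
from collections import Counter


def category_shift(dev_values, live_values):
    """Total variation distance between two category distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    Assumes both inputs are non-empty.
    """
    dev_counts = Counter(dev_values)
    live_counts = Counter(live_values)
    dev_total = sum(dev_counts.values())
    live_total = sum(live_counts.values())
    categories = set(dev_counts) | set(live_counts)
    # Counter returns 0 for categories missing from one side.
    return 0.5 * sum(
        abs(dev_counts[c] / dev_total - live_counts[c] / live_total)
        for c in categories
    )


# Made-up example: traffic moves from mostly web to an even split.
dev = ["web"] * 8 + ["mobile"] * 2
live = ["web"] * 5 + ["mobile"] * 5
print(round(category_shift(dev, live), 2))  # 0.3
```

What counts as a worrying value depends on the field and the task; the number is a prompt to investigate, not a verdict.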

Evaluation set

A reference collection used to test behavior, useful only when its sampling, labels, and blind spots are understood.
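One simple way to look for blind spots is to count how many examples fall into each slice of the evaluation set. The sketch below assumes examples are dictionaries with a field to slice on. The helper `thin_slices`, the field names, and the sample rows are all hypothetical.

```python
from collections import Counter


def thin_slices(examples, slice_key, min_count):
    """Return slices (values of slice_key) with fewer than min_count examples."""
    counts = Counter(example[slice_key] for example in examples)
    return {name: n for name, n in counts.items() if n < min_count}


# Made-up evaluation rows for illustration.
eval_set = [
    {"text": "reset my password", "language": "en", "label": "account"},
    {"text": "where is my order", "language": "en", "label": "shipping"},
    {"text": "cancel subscription", "language": "en", "label": "billing"},
    {"text": "dónde está mi pedido", "language": "es", "label": "shipping"},
]
print(thin_slices(eval_set, "language", min_count=2))  # {'es': 1}
```

A count like this only exposes slices that are thin. A slice missing from the set entirely will not appear, so the list of expected slices still has to come from people who know the deployment context.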

Human baseline

A comparison point based on human performance or judgement, strongest when the task and reviewer conditions are clearly specified.

Data leakage

A flaw where information from the target, future, or test environment slips into training or validation and inflates confidence.
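The most basic form of leakage, the same record appearing in both training and test data, can be checked mechanically. The sketch below assumes each record carries a stable identifier; the function name `overlapping_ids` and the sample IDs are hypothetical.

```python
def overlapping_ids(train_ids, test_ids):
    """Return record IDs that appear in both splits, sorted for stable output."""
    return sorted(set(train_ids) & set(test_ids))


# Made-up identifiers for illustration.
train_ids = ["a1", "a2", "a3", "a4"]
test_ids = ["a4", "b1", "b2"]
leaked = overlapping_ids(train_ids, test_ids)
print(leaked)  # ['a4']
```

This catches only exact overlap between splits. Leakage from the target or from future information usually hides inside individual features and needs a review of how each feature is produced.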

Operational boundary

The conditions under which a model is expected to behave acceptably, including people, tasks, stakes, and escalation paths.