Claim specimen
The statement under review is separated from the demo around it. MDL notes ask what the system is being credited with, which user situation it describes, and whether the wording still works when the example changes.
Independent AI evidence atlas
Many AI conversations move too quickly from a working prototype to a confident promise. MDL Wiki slows that jump down. It collects the vocabulary, review frames, and evidence habits that help a team ask what was trained, which data shaped the behavior, what the evaluation actually proves, and where the operating boundary begins. The site is not a general encyclopedia. It is a field atlas for the moments when a model claim needs to become traceable before it becomes trusted.
Object named
model, dataset, metric, workflow, release rule
Evidence visible
logs, samples, model cards, review records
Boundary stated
people, tasks, stakes, context, failure modes
Decision changed
what the note helps a team do next

Datasets are treated as historical objects. Collection path, labeling pressure, sampling gaps, duplicates, consent questions, and distribution drift are kept near the model claim instead of buried in a later appendix.
Benchmarks become useful when the reader can see the task, reference population, scoring rule, blind spots, and escalation threshold. A number without those edges is logged as unresolved evidence.
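As a rough illustration of that last rule, a benchmark score could be stored next to the five edges named above and marked unresolved whenever one of them is missing. This is a minimal sketch, not an MDL Wiki tool: the class name `BenchmarkResult`, the helper names, and the status labels `"unresolved"` and `"interpretable"` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical field names mirroring the edges listed in the text.
EDGE_FIELDS = (
    "task",
    "reference_population",
    "scoring_rule",
    "blind_spots",
    "escalation_threshold",
)


@dataclass
class BenchmarkResult:
    """A reported score plus the context a reader needs to interpret it."""
    score: float
    task: Optional[str] = None
    reference_population: Optional[str] = None
    scoring_rule: Optional[str] = None
    blind_spots: Optional[str] = None
    escalation_threshold: Optional[float] = None


def missing_edges(result: BenchmarkResult) -> List[str]:
    """Return the names of edges that are absent or left as empty strings."""
    return [name for name in EDGE_FIELDS if getattr(result, name) in (None, "")]


def evidence_status(result: BenchmarkResult) -> str:
    """Log a number as unresolved evidence unless every edge is filled in."""
    return "unresolved" if missing_edges(result) else "interpretable"
```

Calling `missing_edges` on a result tells a reviewer which edges still need to be written down before the number can be read as evidence.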
Reading posture
An MDL note should not inflate a term until it sounds impressive. It should make the term smaller, sharper, and easier to test. When a reader sees “human baseline,” “dataset shift,” “model card,” or “evaluation set,” the note should reveal the surrounding conditions that decide whether the phrase is useful or misleading.
MDL Wiki is written for researchers, product leads, operators, analysts, and reviewers who have to share language across different stakes. The pages favor verbs over slogans: compare, inspect, verify, bound, escalate, retire, and revise. That grammar keeps machine learning discussion attached to decisions instead of spectacle.
Model notes stay connected to data notes, release context, and the learning loop that produced them.
Scores are interpreted through task design, reviewer conditions, blind spots, and the cost of being wrong.
Definitions are short enough to quote but specific enough to prevent borrowed assumptions from spreading.
Field rule
Treat every model claim as provisional until its data trace and evaluation frame can be inspected together.
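One way to picture the field rule is as a status check on a claim record. The sketch below is only illustrative: `ModelClaim`, its fields, and the label `"inspectable"` are hypothetical names chosen for this example, and having both references present is used as a stand-in for "can be inspected together."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelClaim:
    """A claim about a model, with pointers to its supporting records."""
    statement: str
    data_trace: Optional[str] = None        # e.g. a reference to a dataset note
    evaluation_frame: Optional[str] = None  # e.g. a reference to an evaluation note


def claim_status(claim: ModelClaim) -> str:
    """A claim stays provisional until both references are present."""
    if claim.data_trace and claim.evaluation_frame:
        return "inspectable"
    return "provisional"
```

A claim with only one of the two references stays provisional, which matches the rule's requirement that the data trace and the evaluation frame be available together.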