Pre-Registration
The practice of locking a model's tier thresholds, success criteria, and explicit failure conditions in writing before the test data is examined. The single highest-leverage defense against in-sample threshold-fitting.
Pre-registration in sports analytics means signing and dating a public document that specifies, before any test data has been touched: which tier thresholds will define the published claim, what counts as success at the end of the test window, and what counts as failure at intermediate checkpoints. Choosing the threshold or the success criterion after seeing the test data is in-sample fitting at the meta-level: the published tier becomes the bucket where the noise happened to align, not the bucket the model genuinely identified as high-confidence. Pre-registration is the only structural defense. The contract has to be public so that an institutional buyer, an academic citer, or a journalist can verify that the criteria were locked in advance. VAR's 2026-27 NFL forward test is the canonical implementation: signed, dated, append-only, lower bound bar pre-committed at 52.4%.
- Threshold-fitting is the most common failure mode in published sports analytics claims. A model that produces predictions across an edge range will, on nearly any test season, have at least one bucket that beats baseline by chance alone. Pre-registration is what separates the bucket the model identified from the bucket the test season happened to validate.
- Pre-registered failure conditions matter as much as pre-registered success conditions. The protocol requires explicit checkpoints (e.g. Week 4 yellow, Week 9 amber, Week 18 red) so that a failed forward test cannot be re-framed mid-season as 'still in calibration.' A checkpoint-grading sketch follows this list.
- Institutional buyers and academic citers can tell the difference between a pre-registered claim and a post-hoc one. A team that cannot point to a dated, prior-version document locking the threshold before the test is signaling that the threshold is post-hoc.
- VAR's protocol requires pre-registration for every live-deployment tier. This was a direct response to having previously published tier thresholds that were chosen after seeing the season-end results.
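To make the checkpoint ladder concrete, here is a minimal Python sketch. The checkpoint weeks and grade labels are the ones named above; the trigger rule, the Jeffreys prior, and the `grade_checkpoint` helper are illustrative assumptions, not VAR's published logic.

```python
# Minimal sketch of a pre-registered checkpoint ladder. Weeks and grade
# labels come from the protocol above; the trigger rule (credible-interval
# upper bound already below the break-even bar) is an illustrative
# assumption, not VAR's published logic.

from scipy.stats import beta

BREAK_EVEN = 0.524                      # pre-committed season-end bar
CHECKPOINTS = {4: "yellow", 9: "amber", 18: "red"}

def grade_checkpoint(week: int, wins: int, losses: int) -> str | None:
    """Return the pre-registered grade if this checkpoint trips, else None."""
    if week not in CHECKPOINTS:
        return None
    # Jeffreys prior Beta(0.5, 0.5); trip the checkpoint when even the 97.5%
    # upper bound on the win rate sits below the break-even bar, i.e. the
    # data so far already rule out the published claim.
    upper = beta.ppf(0.975, wins + 0.5, losses + 0.5)
    return CHECKPOINTS[week] if upper < BREAK_EVEN else None

print(grade_checkpoint(4, 5, 15))       # a 5-15 start trips -> 'yellow'
```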
Before any test data is examined, write a document specifying: (1) the model version and code path under test, (2) the tier thresholds to be evaluated, (3) the sample-size targets for the test window, (4) the success criterion (e.g. credible-interval lower bound above a defined break-even bar), (5) the explicit failure conditions at intermediate checkpoints. Sign and date the document. Publish it on a stable URL with an append-only changelog. Do not edit prior signed copy in place; substantive revisions land as new dated entries.
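As a sketch of what the locked artifact might look like in machine-checkable form, the snippet below covers the five required elements and derives a content hash that can be published alongside the signed document. Every field name and value is illustrative; the protocol specifies the elements, not a schema.

```python
# Sketch of a pre-registration artifact covering the five required elements.
# The document is serialized deterministically, hashed, and the hash
# published, so any later in-place edit is detectable.

import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PreRegistration:
    model_version: str          # (1) model version and code path under test
    code_path: str
    tier_thresholds: dict       # (2) tier thresholds to be evaluated
    sample_size_targets: dict   # (3) sample-size targets for the test window
    success_criterion: str      # (4) season-end success criterion
    failure_checkpoints: dict   # (5) explicit intermediate failure conditions
    signed_date: str

    def content_hash(self) -> str:
        """Deterministic hash of the signed content, published with the doc."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

prereg = PreRegistration(
    model_version="v4.2",                          # hypothetical
    code_path="models/nfl/spread.py",              # hypothetical
    tier_thresholds={"PRIME": 6, "PRIME_TOT": 7},  # from the example below
    sample_size_targets={"PRIME": 120},            # hypothetical
    success_criterion="95% credible-interval lower bound >= 0.524",
    failure_checkpoints={4: "yellow", 9: "amber", 18: "red"},
    signed_date="2026-08-01",                      # hypothetical
)
print(prereg.content_hash())
```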
VAR's 2026-27 NFL pre-registration locks two tiers (PRIME spread |edge| ≥ 6 and PRIME_TOT |edge| ≥ 7), commits to publishing every pick before kickoff and every result within 24 hours, sets the season-end success bar at credible-interval lower bound ≥ 52.4%, and pre-commits three graded failure checkpoints (Week 4, Week 9, Week 18). The page is signed, dated, and append-only.
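The 52.4% bar matches the standard break-even win rate against -110 pricing, which is presumably how it was chosen; the sketch below shows that derivation and an illustrative season-end grading check. The Jeffreys prior and the `passes` helper are assumptions, not part of the protocol.

```python
# Why 52.4%: at standard -110 pricing you risk 110 to win 100, so the
# break-even win rate is 110 / (110 + 100) ~= 0.5238. (The protocol text
# does not state this derivation; it is the standard reading of that bar.)
print(f"{110 / 210:.4f}")  # 0.5238

# Season-end grading: pass only if the credible-interval lower bound on the
# tier's win rate clears the bar. The Jeffreys prior Beta(0.5, 0.5) is an
# illustrative choice; the protocol pins the bar, not the prior.
from scipy.stats import beta

def passes(wins: int, losses: int, bar: float = 0.524) -> bool:
    lower = beta.ppf(0.025, wins + 0.5, losses + 0.5)  # 95% equal-tailed CI
    return lower >= bar

print(passes(150, 100))  # True: lower bound ~= 0.54 clears the 0.524 bar
```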
- Pre-registering only the success criterion and not the failure conditions. Without explicit failure checkpoints, a mid-season miss can be re-framed as 'still on track' indefinitely.
- Pre-registering without making the document publicly verifiable. A private pre-registration that cannot be checked by readers is a private commitment, not a public one.
- Editing prior signed copy in place after seeing results. The discipline that makes pre-registration credible is append-only: revisions land as new dated entries; prior copy stays readable. A hash-chain sketch of this discipline follows the list.
- Pre-registering one tier and operating five. If a model runs multiple production tiers, the pre-registration has to either cover all of them or explicitly scope to the ones under public test, with the others disclosed as internal-only.
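A hash-chained changelog is one way to make the append-only property verifiable by readers rather than merely asserted. The sketch below is illustrative; the protocol requires append-only publication, not this particular format.

```python
# Sketch of an append-only changelog where each entry commits to the hash of
# the previous one. Overwriting or deleting any prior entry breaks the chain,
# so readers can detect in-place edits.

import hashlib
import json

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log: list[dict], date: str, text: str) -> list[dict]:
    body = {"date": date, "text": text,
            "prev": log[-1]["hash"] if log else "genesis"}
    return log + [{**body, "hash": _digest(body)}]

def verify(log: list[dict]) -> bool:
    prev = "genesis"
    for entry in log:
        body = {k: entry[k] for k in ("date", "text", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True

log = append_entry([], "2026-08-01", "Initial pre-registration signed.")
log = append_entry(log, "2026-10-05", "Clarified PRIME_TOT sample target.")
print(verify(log))  # True; editing any earlier entry flips this to False
```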
What's the difference between pre-registration and a backtest?
A backtest measures performance on historical data the model didn't train on. A pre-registration is a forward commitment about what will be measured on data that doesn't exist yet. Both are part of the validation chain: the backtest produces the figures the pre-registration cites; the pre-registration commits to grading future data against those figures.
Can a pre-registration be revised mid-test?
Substantive revisions land as new dated entries in the append-only changelog. Prior text is never overwritten. The signing date at the top of the document reflects the current active version; the history is part of the audit trail. Revisions that change the success criterion mid-test are themselves a signal worth surfacing in postmortems.
Why is the public, signed-and-dated property necessary?
Internal pre-registration with no public artifact is unverifiable by readers. The institutional credibility of pre-registration comes from third parties being able to confirm, in writing, that the criteria were locked before the test. Without that confirmation, the pre-registration is just a private claim that the team did it right.