
Monte Carlo Simulation

Also called MC Simulation

A numerical method that estimates probability distributions by running thousands of randomized trials and aggregating the results. It is the core engine behind most modern sports prediction models that produce probability distributions rather than point estimates.

Definition

Monte Carlo simulation generates probability estimates for an outcome by sampling from a probabilistic model many times and tabulating the frequency of each result. Rather than computing the answer analytically (which is often intractable), the method approximates the answer by repeated random sampling. For sports, this means building a probabilistic model of every contributing variable (team strength, individual matchups, game-state transitions, environmental factors) and running the model forward thousands of times to produce a distribution of possible game outcomes. The output is not a single number; it is a probability map that reveals not just the most likely result but the variance, tail risk, and full range of plausible scenarios.

How to Compute

1. Define a probabilistic model of the system: parameters, distributions, dependencies.
2. For each simulation pass, draw random samples from each input distribution and propagate them through the model to produce a single outcome.
3. Repeat thousands of times (typically 10,000 to 100,000+ for sports, depending on the time scale).
4. Aggregate the resulting outcomes into a histogram or empirical distribution. The frequency of each outcome bucket is the estimated probability.
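The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production model: the two-team scoring process (60 one-minute Bernoulli bins approximating a Poisson scoring rate) and the rate values are assumptions made for the demo.

```python
import random
from collections import Counter

def simulate_once(rng, home_rate=2.1, away_rate=1.6):
    # One simulation pass: approximate a Poisson scoring process by
    # splitting the game into 60 one-minute bins (illustrative only).
    home = sum(1 for _ in range(60) if rng.random() < home_rate / 60)
    away = sum(1 for _ in range(60) if rng.random() < away_rate / 60)
    return home, away

def run_simulation(n=10_000, seed=42):
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(n):
        home, away = simulate_once(rng)
        if home > away:
            outcomes["home"] += 1
        elif away > home:
            outcomes["away"] += 1
        else:
            outcomes["draw"] += 1
    # The frequency of each outcome bucket is the estimated probability.
    return {k: v / n for k, v in outcomes.items()}
```

Note that the simulator is seeded so a run is reproducible; swapping in a different probabilistic model only requires replacing `simulate_once`.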

Example

For an NFL game, sample drive-level outcomes (touchdown, field goal, punt, turnover) from probabilities conditioned on field position, score differential, and time remaining. Each simulation pass advances the game state drive-by-drive until time expires. After 10,000 simulations, win probability is the fraction of simulations that ended with the home team ahead. The full output is a distribution over final scores, not a single point.
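A stripped-down version of this drive-level rollout, in Python. The drive-outcome probabilities below are made up for the sketch; a real model would condition them on field position, score differential, and time remaining rather than using one fixed table.

```python
import random

# Illustrative drive-outcome table (name, points, probability).
# These numbers are assumptions for the demo, not estimated values.
DRIVE_OUTCOMES = [
    ("touchdown", 7, 0.22),
    ("field_goal", 3, 0.14),
    ("punt", 0, 0.54),
    ("turnover", 0, 0.10),
]

def simulate_game(rng, drives_per_team=11):
    # Advance the game drive-by-drive, sampling one outcome per drive.
    score = {"home": 0, "away": 0}
    for _ in range(drives_per_team):
        for team in ("home", "away"):
            r, cum = rng.random(), 0.0
            for _name, points, p in DRIVE_OUTCOMES:
                cum += p
                if r < cum:
                    score[team] += points
                    break
    return score

def home_win_probability(n=10_000, seed=7):
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        g = simulate_game(rng)
        wins += g["home"] > g["away"]
    # Win probability = fraction of simulated games the home team won.
    return wins / n
```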

Frequently Asked

Why is Monte Carlo better than a single model output?

A single model output is a point estimate that hides everything about the variance of possible outcomes. Two scenarios with identical mean predictions can have very different risk profiles. Monte Carlo produces the full distribution, which is the correct input for any decision under uncertainty: sizing bets, hedging exposure, or evaluating tail risk.
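The "identical mean, different risk profile" point can be demonstrated directly. The two margin-of-victory distributions below are hypothetical (normal with the same mean and different spreads), chosen only to show how a point estimate hides tail risk:

```python
import random
import statistics

rng = random.Random(0)

# Two hypothetical margin-of-victory models, both centered at +3 points.
tight = [rng.gauss(3, 4) for _ in range(100_000)]
wide = [rng.gauss(3, 14) for _ in range(100_000)]

# The point estimates are indistinguishable...
mean_tight = statistics.fmean(tight)
mean_wide = statistics.fmean(wide)

# ...but the probability of losing by a touchdown or more is not.
p_bad_tight = sum(m <= -7 for m in tight) / len(tight)
p_bad_wide = sum(m <= -7 for m in wide) / len(wide)
```

Any decision that depends on the downside (bet sizing, hedging) needs `p_bad_*`, which only the full distribution provides.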

How many simulations are enough?

Depends on what you're estimating. For win-probability point estimates, 1,000 simulations is often sufficient. For tail probabilities (rare-event outcomes like a 50-point game in NFL or a first-round KO in UFC), 10,000+ is typical. VAR runs ten thousand or more simulations per game across multiple seed ensembles for stable tail estimates.
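The sample sizes above follow from the standard error of a simulated probability estimate, sqrt(p(1-p)/n). A quick sketch of why tail probabilities demand more passes than mid-range win probabilities:

```python
import math

def mc_standard_error(p, n):
    # Standard error of a Monte Carlo probability estimate: sqrt(p(1-p)/n).
    return math.sqrt(p * (1 - p) / n)

# A ~50% win probability is cheap to pin down...
se_winprob = mc_standard_error(0.5, 1_000)        # ~0.016

# ...but a 1% tail event at the same n has ~31% relative error,
# and needs ~100x the passes to get that down to ~3%.
se_tail_small = mc_standard_error(0.01, 1_000)
se_tail_large = mc_standard_error(0.01, 100_000)
```

The relative error of a rare-event estimate shrinks only with the square root of n, which is why tail-risk questions drive simulation counts far more than headline win probabilities do.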

How does Monte Carlo combine with machine learning?

ML models supply the probability inputs that the simulator samples from. A typical pipeline: a feature-rich ML model produces per-event probabilities (e.g., probability of a successful pass given pre-snap features), and the Monte Carlo simulator uses those probabilities to roll out full games. The two are complementary; ML provides the granularity, Monte Carlo provides the integration over time.
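A toy version of that pipeline. The "model" here is a hypothetical logistic function standing in for a trained classifier (feature names, coefficients, and the 3-of-4-plays scoring rule are all invented for the sketch); the point is only the division of labor, with the model supplying per-event probabilities and the simulator sampling from them:

```python
import math
import random

def play_success_prob(features):
    # Stand-in for an ML model's predicted probability: a hypothetical
    # logistic function of pre-snap features. A real pipeline would
    # call a trained model here instead.
    z = 0.04 * features["qb_rating"] - 2.0 * features["pressure_rate"]
    return 1 / (1 + math.exp(-z))

def rollout_drive(features, rng, plays=4):
    # The simulator samples from the model's probability: this toy
    # drive "scores" if at least 3 of 4 plays succeed.
    p = play_success_prob(features)
    successes = sum(rng.random() < p for _ in range(plays))
    return successes >= 3

def score_rate(features, n=10_000, seed=1):
    rng = random.Random(seed)
    return sum(rollout_drive(features, rng) for _ in range(n)) / n
```

The model contributes granularity (a probability per play, conditioned on features); the simulator contributes integration (rolling those probabilities forward into drive- and game-level distributions).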

Where does Monte Carlo break down?

When the underlying probabilistic model is wrong, more simulations don't help; they just produce more confident wrong answers. The biggest risks are uncalibrated input probabilities (the simulator produces a sharp distribution centered on the wrong place), unmodeled correlations, and structural breaks in the data-generating process such as rule changes or talent influx. HVP rule 8 requires the production code path to be verified programmatically against the audit harness for exactly this reason.
