Beta-Binomial Credible Interval
A Bayesian credible interval on a win-rate or success-rate estimate, computed under a Beta-Binomial model where wins and losses are draws from a binomial distribution and the underlying win probability has a Beta prior. The standard interval for honest reporting of model accuracy.
The Beta-Binomial credible interval is the 95% probability range for a true underlying win rate, given an observed count of wins and losses. The Bayesian formulation treats the unknown win probability p as a random variable with a Beta prior (typically Beta(1,1), the uniform prior); after observing w wins out of n trials, the posterior on p is Beta(w+1, n-w+1). The 95% credible interval is the range [2.5th percentile, 97.5th percentile] of this posterior. The interval is preferred over the frequentist confidence interval for small samples because it is well-defined at edge cases (zero wins, all wins) and does not depend on asymptotic approximations. The lower bound of the credible interval is the figure VAR plans and sizes against; the point estimate without the interval is a memorization claim, not a predictive-skill claim.
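The conjugate update behind this posterior can be written out in a couple of lines. This is the textbook Beta-Binomial derivation, shown for a general Beta(α, β) prior; the symbols α and β are notation introduced here for illustration and do not appear elsewhere in this entry. The uniform prior is the case α = β = 1.

```latex
\begin{aligned}
\pi(p \mid w, n)
  &\propto \underbrace{p^{w}(1-p)^{n-w}}_{\text{binomial likelihood}}
     \cdot \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\mathrm{Beta}(\alpha,\beta)\ \text{prior}} \\
  &= p^{(w+\alpha)-1}(1-p)^{(n-w+\beta)-1}
  \;\Longrightarrow\;
  p \mid (w, n) \sim \mathrm{Beta}(w+\alpha,\; n-w+\beta).
\end{aligned}
```

Setting α = β = 1 recovers Beta(w+1, n-w+1).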
- Point-estimate accuracy without an interval hides the sample-size story. A 65% win rate on n=20 (13 wins) has a 95% credible-interval lower bound of roughly 43%, below coin-flip; a 65% win rate on n=300 has a lower bound just under 60%. The two claims are not the same and should not be reported the same way.
- The Beta-Binomial framing handles small samples cleanly. Frequentist Wald intervals collapse at zero wins or zero losses; the Bayesian interval remains well-defined.
- Citing the lower bound as the planning figure is a discipline that protects against single-season variance. A point estimate that drops 5 points in a noisy season triggers different decisions than a lower bound that already accounted for the noise.
- VAR's validation protocol requires every published win-rate to carry a Beta-Binomial 95% credible interval lower bound. This is the audit-defensible reporting standard.
Posterior: p | (w, n) ~ Beta(w + 1, n - w + 1). 95% credible interval: [Beta⁻¹(0.025; w+1, n-w+1), Beta⁻¹(0.975; w+1, n-w+1)] with a uniform Beta(1,1) prior.
Choose a prior over the unknown win probability (uniform Beta(1,1) is the standard default; informative priors can be used when prior data is available). Given w observed wins out of n trials, the posterior distribution on the win probability is Beta(w+1, n-w+1) for the uniform prior. Compute the inverse CDF of this Beta distribution at 0.025 and 0.975; these are the credible-interval bounds. Cite the lower bound as the planning figure.
Observed: 142 wins out of 226 trials (62.83% win rate). Posterior is Beta(143, 85). 95% credible interval is approximately [56.36%, 68.87%]. The point estimate is 62.83%; the figure to plan and size against is 56.36%, the lower bound.
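The computation and the worked example above can be sketched in standard-library Python. This is a minimal illustration, not a reference implementation. The function names (`beta_cdf_int`, `beta_quantile_int`, `credible_interval`) are hypothetical, and the sketch assumes the uniform Beta(1,1) prior, so both posterior shape parameters are integers. That assumption allows the Beta CDF to be evaluated through its binomial-tail identity, with the inverse CDF found by bisection. In practice a statistics library's Beta quantile function would normally be used instead.

```python
import math


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, for positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    which holds when both shape parameters are integers.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, m + 1):
        # Log of C(m, j) * x^j * (1 - x)^(m - j), to avoid overflow.
        log_term = (
            math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
            + j * log_x + (m - j) * log_1mx
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def beta_quantile_int(q, a, b, tol=1e-10):
    """Inverse CDF of Beta(a, b) at probability q, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def credible_interval(wins, trials, level=0.95):
    """Equal-tailed credible interval, assuming a uniform Beta(1, 1) prior."""
    if not 0 <= wins <= trials:
        raise ValueError("wins must satisfy 0 <= wins <= trials")
    a, b = wins + 1, trials - wins + 1  # posterior Beta(w + 1, n - w + 1)
    tail = (1.0 - level) / 2.0
    return beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)


lower, upper = credible_interval(142, 226)
print(f"point estimate: {142 / 226:.2%}")
print(f"95% credible interval: [{lower:.2%}, {upper:.2%}]")
```

The same function can be called with small counts, for example `credible_interval(13, 20)`, to see how wide the interval becomes at the same point estimate.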
- Reporting only the point estimate. Without the interval, the reader cannot tell whether the claim is a 200-sample 62% (real edge) or a 20-sample 62% (coin flip with a lucky streak).
- Inflating n via Monte Carlo realizations. Counting k simulated realizations of each trial as if they were k independent trials makes the interval artificially tight, by a factor of roughly the square root of k. Independent-sample discipline is required for the interval to mean anything.
- Citing the upper bound or the point estimate as the planning figure. The lower bound is the audit-defensible number because it accounts for the sampling variance that can move the point estimate in a single season.
- Using a strongly informative prior without disclosure. Priors that pull the posterior toward a desired result are a form of in-sample fitting; the uniform Beta(1,1) is the safe default for published claims.
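The sample-inflation pitfall in the list above can be illustrated with a short sketch. It uses the closed-form standard deviation of the Beta posterior as a proxy for interval width, which is an approximation rather than the interval itself. The function name `posterior_sd` and the inflation factors are hypothetical, chosen only for illustration; the counts are taken from the worked example.

```python
import math


def posterior_sd(wins, trials):
    """Standard deviation of the Beta(w + 1, n - w + 1) posterior."""
    a, b = wins + 1, trials - wins + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


wins, trials = 142, 226  # independent trials from the worked example
honest = posterior_sd(wins, trials)

for k in (1, 4, 25, 100):  # hypothetical inflation factors
    # Counting k realizations of each trial as if they were independent.
    inflated = posterior_sd(k * wins, k * trials)
    print(
        f"k={k:>3}: sd={inflated:.4f}, "
        f"shrink factor={honest / inflated:.2f}, sqrt(k)={math.sqrt(k):.2f}"
    )
```

The shrink factor tracks the square root of k closely, even though no new independent information was added.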
Why Beta-Binomial instead of frequentist Wald or Wilson?
Beta-Binomial is well-defined at small samples and edge cases (zero wins, all wins) where Wald collapses. Wilson intervals are reasonable but assume a frequentist framing that is awkward when the goal is reporting a credible range on the underlying win probability. Beta-Binomial gives that directly.
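The edge-case behaviour can be seen in a small sketch for the zero-wins case. The function names are hypothetical. The Wald formula is the usual normal-approximation interval, and the Bayesian side uses the fact that with zero wins and a uniform prior the posterior is Beta(1, n+1), whose CDF is 1 − (1 − x)^(n+1), so its quantiles have a closed form.

```python
import math


def wald_interval(wins, trials, z=1.96):
    """Wald (normal-approximation) interval, clipped to [0, 1]."""
    p_hat = wins / trials
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def beta_interval_zero_wins(trials, level=0.95):
    """Equal-tailed credible interval when wins == 0, uniform prior.

    Posterior is Beta(1, n + 1) with CDF 1 - (1 - x)^(n + 1), so the
    quantile at probability q is 1 - (1 - q)^(1 / (n + 1)).
    """
    tail = (1.0 - level) / 2.0
    exponent = 1.0 / (trials + 1)
    lower = 1.0 - (1.0 - tail) ** exponent
    upper = 1.0 - tail ** exponent
    return lower, upper


n = 10  # hypothetical sample with zero wins
print("Wald:", wald_interval(0, n))            # zero-width interval at 0
print("Beta:", beta_interval_zero_wins(n))     # non-degenerate interval
```

With zero wins the Wald interval has zero width, implying certainty that the win rate is exactly 0, while the credible interval still admits a range of plausible rates.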
How is the lower bound the audit-defensible figure?
The lower bound is the win rate that the true rate exceeds with 97.5% posterior probability, given the observed sample. Sizing and planning against the lower bound builds in a margin against single-season variance and against the lucky-streak failure mode that point estimates conceal.
Does the interval narrow as the sample grows?
Yes. The interval width scales roughly as 1/sqrt(n) for large samples. A 200-sample interval is meaningfully tighter than a 20-sample interval at the same point estimate. This is why aggregating across multiple walk-forward seasons produces credible intervals tight enough to underwrite, while single-season intervals usually do not.
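The scaling can be made concrete with a normal approximation to the posterior. This is a rough large-sample sketch, not the exact Beta interval, and it is least accurate at the small-n end. Here \hat{p} = w/n is notation introduced for the observed win rate.

```latex
\text{width}_{95\%} \;\approx\; 2 \times 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\frac{\text{width}(n = 20)}{\text{width}(n = 200)}
  \;\approx\; \sqrt{\frac{200}{20}} \;=\; \sqrt{10} \;\approx\; 3.16 .
```

Under this approximation, a tenfold increase in independent samples narrows the interval by about a factor of three at the same point estimate.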