
Beta-Binomial Credible Interval

Also called Beta-Binomial CI

A Bayesian credible interval on a win-rate or success-rate estimate, computed under a Beta-Binomial model in which the number of wins follows a binomial distribution and the underlying win probability has a Beta prior. It is the standard interval for honest reporting of model accuracy.

Definition

The Beta-Binomial credible interval is the central 95% posterior range for a true underlying win rate, given an observed count of wins and losses. The Bayesian formulation treats the unknown win probability p as a random variable with a Beta prior, typically the uniform prior Beta(1,1). After observing w wins out of n trials, the posterior on p is Beta(w+1, n-w+1), and the 95% credible interval runs from the 2.5th to the 97.5th percentile of this posterior. For small samples it is preferred over the normal-approximation (Wald) confidence interval because it stays well-defined at the edge cases (zero wins, all wins) and does not rely on asymptotic approximations. The lower bound of the credible interval is the figure VAR plans and sizes against; a point estimate reported without its interval is a memorization claim, not a predictive-skill claim.

How to Compute
With a uniform Beta(1,1) prior, the posterior is p | (w, n) ~ Beta(w + 1, n - w + 1), and the 95% credible interval is [Beta⁻¹(0.025; w+1, n-w+1), Beta⁻¹(0.975; w+1, n-w+1)], where Beta⁻¹(q; a, b) denotes the inverse CDF of Beta(a, b) at probability q.

Choose a prior over the unknown win probability. The uniform Beta(1,1) is the standard default; an informative Beta(α, β) prior can be used when prior data is available, in which case the posterior becomes Beta(w+α, n-w+β). Given w observed wins out of n trials, the uniform prior yields the posterior Beta(w+1, n-w+1). Evaluate the inverse CDF of this posterior at 0.025 and 0.975; these are the credible-interval bounds. Cite the lower bound as the planning figure.
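A minimal, dependency-free sketch of these steps for the uniform-prior case. It evaluates the Beta(w+1, n-w+1) CDF through the standard identity between the regularized incomplete beta function and a binomial upper tail (valid for integer Beta parameters, which the uniform prior guarantees) and inverts that CDF by bisection. The function names are hypothetical; for an informative prior with non-integer α or β, a library inverse CDF such as SciPy's scipy.stats.beta.ppf is the more practical route.

```python
from math import exp, lgamma, log, log1p


def posterior_cdf(x: float, wins: int, n: int) -> float:
    """P(p <= x) under the uniform-prior posterior Beta(wins + 1, n - wins + 1).

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a) for integer a, b;
    here a = wins + 1 and a + b - 1 = n + 1. Terms are summed in log space
    so that large n does not overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = n + 1
    log_terms = [
        lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
        + k * log(x) + (m - k) * log1p(-x)
        for k in range(wins + 1, m + 1)
    ]
    peak = max(log_terms)
    return min(1.0, exp(peak) * sum(exp(t - peak) for t in log_terms))


def posterior_quantile(q: float, wins: int, n: int, iters: int = 100) -> float:
    """Inverse posterior CDF at probability q, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if posterior_cdf(mid, wins, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(wins: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Equal-tailed credible interval on the win rate (uniform Beta(1,1) prior)."""
    if not 0 <= wins <= n:
        raise ValueError("need 0 <= wins <= n")
    tail = (1.0 - level) / 2
    return posterior_quantile(tail, wins, n), posterior_quantile(1.0 - tail, wins, n)


if __name__ == "__main__":
    lower, upper = credible_interval(142, 226)
    print(f"95% credible interval: [{lower:.2%}, {upper:.2%}]")
    print(f"planning figure (lower bound): {lower:.2%}")
```

Run on 142 wins in 226 trials, this lands close to the interval quoted in the example. Bisection is used only because it is simple and cannot diverge; any root finder on the CDF would do.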

Example

Observed: 142 wins out of 226 trials (62.83% win rate). The posterior is Beta(142 + 1, 226 - 142 + 1) = Beta(143, 85). The 95% credible interval is approximately [56.36%, 68.87%]. The point estimate is 62.83%; the figure to plan and size against is the lower bound, 56.36%.
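As an independent cross-check of this example, the sketch below draws samples from the Beta(143, 85) posterior with the Python standard library's random.betavariate and reads off the empirical 2.5th and 97.5th percentiles. Monte Carlo only approximates the exact inverse-CDF bounds; the sample count and seed are arbitrary illustrative choices, not part of the method.

```python
import random
import statistics

wins, trials = 142, 226
alpha, beta = wins + 1, trials - wins + 1  # uniform-prior posterior: Beta(143, 85)

rng = random.Random(2024)  # fixed seed so the check is reproducible
samples = [rng.betavariate(alpha, beta) for _ in range(100_000)]

# quantiles(n=40) returns 39 cut points; the first is the 2.5th percentile
# and the last is the 97.5th percentile.
cuts = statistics.quantiles(samples, n=40)
mc_lower, mc_upper = cuts[0], cuts[-1]

print(f"point estimate: {wins / trials:.2%}")
print(f"Monte Carlo 95% interval: [{mc_lower:.2%}, {mc_upper:.2%}]")
```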

Frequently Asked

Why Beta-Binomial instead of frequentist Wald or Wilson?

Beta-Binomial is well-defined at small samples and at the edge cases (zero wins, all wins), where the Wald interval collapses to zero width. Wilson intervals behave reasonably at the edges, but their frequentist framing is awkward when the goal is a probability statement about the underlying win probability; the Beta-Binomial posterior gives that statement directly.
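The zero-win edge case makes the difference concrete. There the Wald interval p̂ ± z·sqrt(p̂(1-p̂)/n) has zero width, while the uniform-prior posterior Beta(1, n+1) has the closed-form CDF 1 - (1-x)^(n+1), so its quantiles need no numerical inversion. A minimal sketch; the function names and n = 20 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(wins: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Frequentist Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = wins / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def zero_win_credible_interval(n: int, level: float = 0.95) -> tuple[float, float]:
    """Equal-tailed interval of Beta(1, n + 1), the uniform-prior posterior for 0 wins.

    Its CDF is 1 - (1 - x)**(n + 1), so the q-quantile is 1 - (1 - q)**(1 / (n + 1)).
    """
    tail = (1 - level) / 2

    def quantile(q: float) -> float:
        return 1 - (1 - q) ** (1 / (n + 1))

    return quantile(tail), quantile(1 - tail)


n = 20
print("Wald, 0/20:", wald_interval(0, n))  # prints (0.0, 0.0): zero width
print("Beta, 0/20:", zero_win_credible_interval(n))
```

The Wald result claims certainty that the win rate is exactly zero after only 20 trials; the credible interval instead leaves a plausible range extending to roughly 16%.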

Why is the lower bound the audit-defensible figure?

The lower bound is the win rate that, given the observed sample, the true rate exceeds with 97.5% posterior probability. Sizing and planning against the lower bound builds in a margin against single-season variance and against the lucky-streak failure mode that point estimates conceal.
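This probability statement can be checked numerically: one minus the posterior CDF at a threshold is the posterior probability that the true win rate exceeds it. The sketch below assumes the uniform prior and uses the binomial-tail identity for integer Beta parameters to evaluate that probability at the example's quoted lower bound of 56.36% (142 wins in 226 trials). The helper name is hypothetical.

```python
from math import exp, lgamma, log, log1p


def prob_true_rate_exceeds(threshold: float, wins: int, n: int) -> float:
    """Posterior P(p > threshold) under Beta(wins + 1, n - wins + 1).

    P(p > t) = 1 - I_t(a, b) = P(Binomial(n + 1, t) <= wins), using the identity
    I_t(a, b) = P(Binomial(a + b - 1, t) >= a) with a = wins + 1.
    """
    if threshold <= 0.0:
        return 1.0
    if threshold >= 1.0:
        return 0.0
    m = n + 1
    log_terms = [
        lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
        + k * log(threshold) + (m - k) * log1p(-threshold)
        for k in range(0, wins + 1)
    ]
    peak = max(log_terms)
    return min(1.0, exp(peak) * sum(exp(t - peak) for t in log_terms))


print(prob_true_rate_exceeds(0.5636, 142, 226))  # close to 0.975
print(prob_true_rate_exceeds(0.6283, 142, 226))  # roughly one half
```

The second call shows why the point estimate is a weak planning figure: the posterior puts only about half its mass above it.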

Does the interval narrow as the sample grows?

Yes. The interval width scales roughly as 1/sqrt(n) for large samples, so at the same point estimate a 200-sample interval is roughly three times narrower (about sqrt(10)) than a 20-sample interval. This is why aggregating across multiple walk-forward seasons produces credible intervals tight enough to underwrite, while single-season intervals usually do not.
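The scaling can be read off the posterior's closed-form standard deviation, which for large n governs the interval width (the 95% interval spans roughly ±1.96 standard deviations). A minimal sketch comparing 12/20 and 120/200, both a 60% point estimate, under the uniform prior; the sample sizes are illustrative and the width figures are the normal approximation, not exact quantiles.

```python
from math import sqrt


def posterior_sd(wins: int, n: int) -> float:
    """Standard deviation of Beta(a, b) with a = wins + 1, b = n - wins + 1.

    Var = a * b / ((a + b)**2 * (a + b + 1)), roughly p * (1 - p) / n for large n.
    """
    a, b = wins + 1, n - wins + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


small, large = posterior_sd(12, 20), posterior_sd(120, 200)
print(f"n=20:  sd={small:.4f}, approx 95% width={2 * 1.96 * small:.3f}")
print(f"n=200: sd={large:.4f}, approx 95% width={2 * 1.96 * large:.3f}")
print(f"ratio: {small / large:.2f} (sqrt(10) is about 3.16)")
```

The ratio comes out a little below sqrt(10) because the finite-sample terms in the Beta variance matter more at n = 20, which is why the 1/sqrt(n) rule is stated only for large samples.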
