What, if anything, should a frequentist say about a single realized confidence interval (CI) and its chance of having covered the parameter? Jerzy Neyman's original answer was to refuse any nondegenerate probability for coverage ex post and, instead, to "state that the interval covers". In this paper I argue that the usual frequentist machinery already supports a different reading. I treat the coverage event as a Bernoulli random variable, with the nominal level 1 - α as its design-based success probability, and view "confidence" as a probability forecast for that Bernoulli outcome. Using strictly proper scoring rules, I show that 1 - α is the unique optimal constant forecast for coverage, both before and after observing the data, and that it remains optimal post-trial in common unbounded, translation-invariant models with pivot-based CIs. When the design yields a θ-free statistic, such as the relative width of the interval in a finite-window uniform model, the conditional coverage given that statistic provides a nonconstant, design-based refinement of 1 - α that strictly improves predictive performance. Two thought experiments, a Monty Hall-style shell game and the "lost submarine" example of Morey et al. (2016), illustrate how this perspective resolves familiar interpretational puzzles about CIs without appealing to priors or single-case subjective degrees of belief. I conclude with simple "what to do when you see an interval" guidance for applied work and some implications for teaching confidence intervals as tools for forecasting long-run coverage.

Keywords: confidence intervals, coverage probability, proper scoring rules, probabilistic forecasting, frequentist inference

Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
If you were being scored on predicting whether a single confidence interval (CI) has covered its target parameter, what number would you report? Jerzy Neyman, the inventor of the CI procedure, suggested proceeding in two steps [15]: first, refusing to assign any probability to coverage itself, since, on the assumption that θ is a fixed constant and not a random variable, coverage becomes fully determined once an interval has been constructed; and second, stating that the constructed interval covers the parameter. Although the latter suggestion is not typically thought of as a forecast for individual coverage events, it is essentially just that: by stating that the intervals always cover, we are issuing a constant forecast of 1 for P(Cover), even if we choose not to interpret the forecast subjectively, e.g., as our personal degree of belief in whether coverage occurred. The forecast has a natural frequentist interpretation in that, under repeated sampling, it will be wrong no more than 100α% of the time, a property Neyman sensibly appealed to as a strength of CI theory: it controls the practicing statistician's error rate in making such statements.
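To make the error-rate reading concrete, here is a minimal simulation sketch of my own (the known-variance normal model, sample size, and seed are illustrative assumptions, not part of Neyman's presentation): the constant forecast of 1 is wrong exactly when the realized interval fails to cover, which happens in about 100α% of repetitions.

```python
import numpy as np

# Minimal sketch, assuming a known-variance normal model with n = 25
# and a nominal 95% interval (alpha = 0.05). The constant forecast
# "the interval covers" errs precisely when coverage fails.
rng = np.random.default_rng(0)
theta, sigma, n, alpha = 0.0, 1.0, 25, 0.05
z = 1.959964  # approximate standard normal quantile at 1 - alpha/2

# Sampling distribution of the sample mean across 100,000 repetitions.
xbar = rng.normal(theta, sigma / np.sqrt(n), size=100_000)
half = z * sigma / np.sqrt(n)
covered = (xbar - half <= theta) & (theta <= xbar + half)

print(f"empirical coverage:          {covered.mean():.4f}")      # ~ 0.95
print(f"error rate of forecasting 1: {1 - covered.mean():.4f}")  # ~ 0.05
```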
Two other critical facts are, of course, also true. First, confidence procedures (CPs) can be constructed from data that carry no information at all about θ, and ex ante coverage probability can appear to change substantially ex post, after an interval has been constructed [1,19], leading to the standard argument that the only way to think about CIs coherently is in terms of their long-run coverage properties. Second, and perhaps more importantly, for any given interval I, its coverage probability is degenerate in {0, 1} once we condition on the realized values of its endpoints, as Neyman's point above makes clear. This is the line we most often hear about how to interpret a given CI ex post (after construction): the interval either does or does not cover the parameter, and for that reason we can make no probabilistic statement about its coverage.
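Both facts can be seen at once in the classic finite-window uniform setup. The sketch below is my own illustration in the spirit of the examples cited above: with two draws from Uniform(θ - 1/2, θ + 1/2), the interval [min(X), max(X)] covers θ with design-level probability exactly 1/2, yet conditional on the θ-free width W = w its coverage is min(w/(1 - w), 1), anywhere from nearly 0 to exactly 1.

```python
import numpy as np

# Sketch: two observations from Uniform(theta - 1/2, theta + 1/2).
# The interval [min(X), max(X)] covers theta with probability 1/2 by
# design, but its conditional coverage given the realized width w is
# min(w / (1 - w), 1); a theta-free, post-data refinement.
rng = np.random.default_rng(1)
theta = 3.7  # arbitrary; the width is theta-free, so its value is irrelevant
x = rng.uniform(theta - 0.5, theta + 0.5, size=(1_000_000, 2))
lo, hi = x.min(axis=1), x.max(axis=1)
covered = (lo <= theta) & (theta <= hi)
width = hi - lo

print(f"overall coverage: {covered.mean():.3f}")  # ~ 0.500

for w0, w1 in [(0.05, 0.10), (0.30, 0.35), (0.55, 0.60)]:
    sel = (width >= w0) & (width < w1)
    mid = 0.5 * (w0 + w1)  # theory evaluated at the bin midpoint
    print(f"width in [{w0:.2f}, {w1:.2f}): empirical "
          f"{covered[sel].mean():.3f} vs. theory {min(mid / (1 - mid), 1):.3f}")
```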
The “it either covers, or it doesn’t” statement has been a source of confusion, consternation, and frustration for beginning statistics students for a long time, likely since it was first made, and has led to a number of not-quite-satisfying claims about how to interpret CIs in the applied literature. Introductory papers and applied practitioners occasionally claim that constructed intervals retain their nominal coverage probability [9,8], critiques in response claim that they retain none [13,4,7,17], and still others seem so befuddled by it all that they suggest that what intervals estimate are their own endpoints, rather than θ [14]. More often than not, the ensuing debates, which can be rather spirited [12,18,10], end with trenches dug along philosophical lines: frequentists defending claims that are sometimes, on their face, rather awkward (e.g., retreating to a von Mises-style infinite hypothetical frequentism without acknowledging that his program, taken seriously, undermines the foundational pillars of Kolmogorov-style mathematical statistics [5]), and Bayesians wondering why the frequentists will not simply switch to Bayesianism, put priors over θ, and construct credible intervals instead.
In the sections below, I hope to offer a principled frequentist reading of the concept of “confidence” and its associated intervals that resolves some of this confusion. More precisely, the account I build toward recasts coverage probability as having three layers, rather than a single one: the first is the event-level conditional, degenerate in {0, 1}, that determines whether a realized interval covers; the second is the design-level coverage guarantee of 1 - α, which averages those conditionals over the randomness in X; and the third, the notion of “confidence” itself, is a predictive probability, or model-based forecast, of empirical coverage given whatever information the statistician has at hand. Under this view, we have clear bounds on what we can say about a particular interval ex post (e.g., we might predict that it covers θ with probability 1 - α), and, in some cases, we also have mathematical justification for updating our forecast in light of new evidence: for example, when we see the “trivial” interval (-∞, ∞), we would very sensibly switch our prediction to 1, since coverage is certain. Separating our probabilistic forecasts from the design-level coverage guarantee largely dissolves criticisms of CIs as being uninterpretable (with respect to the coverage event, not the actual value of θ), and, as I show below, it respects the frequentist machinery Neyman used to build his theory (even if he might have preferred not to see it that way).
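As a concrete instance of the forecasting claim in the third layer, consider the Brier-score case of the scoring-rule argument, a standard calculation stated here in the notation above: if the coverage indicator C is Bernoulli with success probability 1 - α, then a constant forecast p has expected Brier score

E[(p - C)²] = (1 - α)(p - 1)² + αp²,

which is strictly convex in p, with first-order condition 2(1 - α)(p - 1) + 2αp = 0 and unique minimizer p = 1 - α. In particular, the constant forecasts 1 (“state that the interval covers”) and 0 are strictly suboptimal whenever 0 < α < 1.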
The remainder of the paper is organized as follows. Section 2 presents a thought experiment showing that design-level coverage probability can in fact be used to guide decision-making based on constructed intervals.