Either a Confidence Interval Covers, or It Doesn't (Or Does It?): A Model-Based View of Ex-Post Coverage Probability

Reading time: 6 minutes
...

📝 Original Info

  • Title: Either a Confidence Interval Covers, or It Doesn’t (Or Does It?): A Model-Based View of Ex-Post Coverage Probability
  • ArXiv ID: 2602.15562
  • Date: 2026-02-17
  • Authors: **Author information is not specified in the paper text. (Not available)**

📝 Abstract

In Neyman's original formulation, a 1 − α confidence interval procedure is justified by its long-run coverage properties, and a single realized interval is to be described only by the slogan that it either covers the parameter or it does not. On this view, post-data probability statements about the coverage of an individual interval are taken to be conceptually out of bounds. In this paper, I present two kinds of arguments against treating that "either-or" reading as the only legitimate interpretation of confidence. The first is informal, via a set of thought experiments in which the same joint probability model is used to compute both forward-looking and backward-looking probabilities for occurred-but-unobserved events. The second is more formal, recasting the standard confidence-interval construction in terms of infinite sequences of trials and their associated 0/1 coverage indicators. In that representation, the design-level coverage probability 1 − α and the degenerate conditional probabilities given the full data appear simply as different conditioning levels of the same model. I argue that a strict behavioristic reading that privileges only the latter is in tension with the very mathematical machinery used to define long-run error rates. I then sketch an alternative view of confidence as a predictive probability (or forecast) about the coverage indicator, together with a simple normative rule for when intermediate probabilities for single coverage events should be allowed.

Keywords: confidence intervals; coverage probability; frequentist inference; single-case probability; predictive probability; Neyman.

Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

📄 Full Content

When Jerzy Neyman introduced his theory of confidence intervals (CIs) in 1937 [21], he gave practicing statisticians a strong suggestion for how to interpret them: because θ is assumed to be a fixed, unknown constant, once a particular interval is generated, the coverage expression P(L(X) ≤ θ ≤ U(X)) is mathematically fixed, and so we can only say that the interval either did or did not succeed in covering it. The straightforward mathematical justification for this is that all the randomness in the confidence procedure (CP) lives in the data X, and so once we have a particular realization X = xᵢ, the expression above becomes degenerate in {0, 1}. Intuitively, this also makes sense, since if we imagine sampling a particular set of interval bounds an infinite number of times, the probability of success (under that design) will be either 0 or 1, depending on whether the original interval covered θ. As a consequence, practical guidelines for probabilistically interpreting CIs typically revolve around their long-run coverage properties, rather than the properties of any single constructed interval [10,17,27,16], and attempts to say otherwise are often, though not always [19], branded as errors of interpretation [11] or fallacies in reasoning [20], despite the natural inclination to attach some kind of probability to realized intervals ex post (i.e., "post-data").
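To make the collapse concrete, here is a minimal simulation sketch (my illustration, not the paper's; the normal-mean setting, parameter values, and seed are arbitrary assumptions). Run forward, the textbook interval X̄ ± zσ/√n covers θ in roughly 1 − α of repeated trials; run backward on any single realized interval, the coverage indicator is exactly 0 or 1.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 5.0, 2.0, 25   # true mean, known sd, sample size (all assumed)
z = 1.959963984540054            # standard-normal 97.5% quantile (alpha = 0.05)

# Repeat the confidence procedure many times: X-bar +/- z*sigma/sqrt(n).
n_trials = 100_000
means = rng.normal(theta, sigma, size=(n_trials, n)).mean(axis=1)
half = z * sigma / np.sqrt(n)
covered = (means - half <= theta) & (theta <= means + half)  # 0/1 coverage indicators

print(f"long-run coverage: {covered.mean():.3f}")             # ~0.95, the design-level property
print(f"first realized interval covers? {bool(covered[0])}")  # exactly True or False
```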

The tension between the accepted interpretation of CIs and the supposedly-fallacious one can be recast more generally as a statement about events we know have occurred, but whose outcomes we have not observed [25]: for frequentists, randomness, and thus probability, lives in the sampling process and not our knowledge of the outcomes, and so once a sample has been drawn, the ex-ante (i.e., “pre-data”) probability has collapsed to some value in {0, 1}. Although this is mathematically true, it should also ring alarm bells for practicing statisticians (but, in the case of CIs, never seems to), because we happily use frequentist methods for statistical inference in exactly this kind of real-world scenario. Take, for example, the case of medical diagnosis: given that a patient tests positive on, say, a rapid diagnostic test for the influenza virus, what is the probability that she actually has it? If we stay consistent with our interpretation of CIs, we should also say no probability statement may now be made: given her true underlying health state, the patient either does or does not have the flu, and there is no probability left to assign, because all of the randomness in the sampling process has now been exhausted. Clearly, though, following the “either-or” logic here would ruin the clinical value of the diagnostic test in guiding care, and it would obviate the effort epidemiologists and statisticians put into estimating the test’s positive predictive value (PPV) in the first place, neither of which seems particularly desirable.
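For concreteness, the PPV calculation the clinician relies on is just Bayes' rule applied to an occurred-but-unobserved health state. Below is a minimal sketch; the sensitivity, specificity, and prevalence values are made-up illustrative assumptions, not figures from the paper.

```python
# Post-data probability of flu given a positive rapid test, via Bayes' rule.
sensitivity = 0.70   # P(test+ | flu)       -- assumed for illustration
specificity = 0.98   # P(test- | no flu)    -- assumed for illustration
prevalence = 0.05    # P(flu) in the tested population -- assumed for illustration

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_positive   # P(flu | test+)

print(f"PPV = {ppv:.3f}")   # ~0.648 with these inputs
```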

The interpretive tension also runs along philosophical lines, with frequentists and propensity theorists generally taking an ontic view of things (i.e., what matters is how randomness plays out in the world, whether we know about it or not) [12], and Bayesians generally taking an epistemic view (e.g., subjectivists identifying probability with personal degree of belief, or credence [5,24,12]). The latter have no trouble accommodating occurred-but-unobserved events, but the former run into more difficulty, since the interpretations tend not to deal explicitly with the role of the observer in making probability assignments (see, e.g., [30]). Statistical inference seems to require some kind of epistemic component, though (estimation with full knowledge is simply calculation), and even Neyman, arguably one of the foremost operationalists, admits the role of the observer in defining his theory of CIs (if θ is fixed but unknown, the natural follow-up question is, “Unknown by whom?”). More pointedly, if we do not care whether we know a CI covered θ, why are we trying to estimate θ at all? The answer to this question is perhaps more philosophical than statistical, and it is one that has been addressed thoroughly in the literature on the philosophies of both probability and science, so I will not attempt to summarize it here. However, in the sections below, I hope to show that, strictly speaking, the question itself is not one we need to entertain in order to provide a formal accounting of occurred-but-unobserved events within frequentism proper, and that we can in fact talk quite sensibly about coverage probability ex post, as long as we state clearly what we mean by the term “probability”.

In what follows, I deliberately adopt a rather strict reading of Neyman’s slogan (in a nutshell, that ex post probabilities of coverage are entirely out of bounds) and treat it as a normative rule (to be fair, though, this is how many, if not most, instructive pieces handle the interpretation; see, e.g., [2,22,29,10,1,18] for examples). The arguments will form

Reference

This content is AI-processed based on open access ArXiv data.
