An algorithm to compute the power of Monte Carlo tests with guaranteed precision

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

This article presents an algorithm that generates a conservative confidence interval of a specified length and coverage probability for the power of a Monte Carlo test (such as a bootstrap or permutation test). It is the first method that achieves this aim for almost any Monte Carlo test. Previous research has focused on obtaining as accurate a result as possible for a fixed computational effort, without providing a guaranteed precision in the above sense. The algorithm we propose does not have a fixed effort and runs until a confidence interval with a user-specified length and coverage probability can be constructed. We show that the expected effort required by the algorithm is finite in most cases of practical interest, including situations where the distribution of the p-value is absolutely continuous or discrete with finite support. The algorithm is implemented in the R-package simctest, available on CRAN.

💡 Research Summary

The paper addresses a fundamental problem in Monte Carlo hypothesis testing: accurately estimating the power β = F(α) of a test when the p‑value of each simulated dataset can itself only be approximated by Monte Carlo simulation, so that the p‑value distribution F cannot be sampled directly. Traditional approaches rely on a fixed computational budget and produce point estimates (often the naïve average of indicator variables) without any guaranteed precision. The authors propose a novel algorithm that, instead of attempting to minimize the error for a given amount of work, guarantees that the resulting confidence interval for β has length at most Δ and coverage probability at least 1 – γ, both specified by the user.
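The sketch below shows the fixed-budget approach for contrast. It assumes the stream model described next: each resample of dataset i is "at least as extreme as observed" with probability p_i. It also uses a hypothetical sampler `sample_p` that stands in for the unknown F. `naive_power` and its parameters are illustrative names, not part of simctest. The returned fraction is a point estimate of β with no precision guarantee.

```python
import random


def naive_power(sample_p, alpha, n_datasets, n_resamples, seed=0):
    """Fixed-budget power estimate: the average of rejection indicators.

    sample_p(rng) is a hypothetical stand-in for F: it returns the true
    p-value p_i of one simulated dataset.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_datasets):
        p_i = sample_p(rng)
        # Each resample exceeds the observed statistic with probability p_i.
        exceed = sum(rng.random() < p_i for _ in range(n_resamples))
        p_hat = (exceed + 1) / (n_resamples + 1)  # usual Monte Carlo p-value
        rejections += p_hat <= alpha
    # Point estimate of beta = F(alpha); nothing bounds its error.
    return rejections / n_datasets
```

For example, `naive_power(lambda rng: rng.betavariate(0.5, 5.0), 0.05, 1000, 999)` uses an arbitrary Beta(0.5, 5) law for the p-values. Another seed gives a different estimate, and the sketch provides no bound on its distance from β.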

The key idea is to view each Monte Carlo replication as a “stream” of independent Bernoulli observations X_{ij} (j = 1, 2, …). For a given stream i, the underlying (unobserved) p‑value p_i determines the success probability of the Bernoulli trials: X_{ij} ∼ Bernoulli(p_i). The algorithm must decide, for each stream, whether p_i ≤ α (the test would reject) or p_i > α (the test would not reject). To make this decision sequentially, the authors adopt the boundary‑crossing procedure of Gandy and Rubin‑Delanchy (2013). Two deterministic sequences are constructed: an upper boundary U_t and a lower boundary L_t. Sampling from stream i stops as soon as the partial sum S_t = ∑_{j=1}^{t} X_{ij} first crosses either boundary. If S_t ≥ U_t, the algorithm declares p_i > α; if S_t ≤ L_t, it declares p_i ≤ α. The boundaries are built recursively so that the probability of a wrong decision for any stream does not exceed a small ε, uniformly over all possible values of p_i.
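The sequential decision for one stream can be sketched as follows. This is an illustrative reconstruction, not the simctest code, and it rests on three assumptions:

- The error-spending sequence ε·t/(t + k) with k = 1000 is an assumption.
- Crossing probabilities are computed only under p = α. Crossing the upper boundary becomes more likely as p grows, so p = α is the least favourable case for either wrong decision.
- The boundaries are precomputed only up to a finite horizon `t_max`.

`build_boundaries` and `decide` are hypothetical names.

```python
import random


def build_boundaries(alpha, eps, t_max, k=1000):
    """Precompute upper/lower stopping boundaries U_t, L_t for t = 1..t_max.

    Assumption: error is spent via eps * t / (t + k), and all crossing
    probabilities are evaluated at p = alpha.
    """
    mass = [1.0]  # mass[s] = P_alpha(stream not yet stopped, S_t = s), at t = 0
    spent_up = spent_low = 0.0  # error already spent on each boundary
    upper, lower = [], []
    for t in range(1, t_max + 1):
        # Advance every surviving path by one Bernoulli(alpha) observation.
        new = [0.0] * (t + 1)
        for s, m in enumerate(mass):
            new[s] += m * (1 - alpha)
            new[s + 1] += m * alpha
        budget = eps * t / (t + k)  # total error allowed up to step t
        # Upper boundary: smallest u with spent_up + P(stop now, S_t >= u) <= budget.
        u, tail = t + 1, 0.0
        for s in range(t, -1, -1):
            if spent_up + tail + new[s] > budget:
                break
            tail += new[s]
            u = s
        # Lower boundary: largest l < u with spent_low + P(stop now, S_t <= l) <= budget.
        l, head = -1, 0.0
        for s in range(u):
            if spent_low + head + new[s] > budget:
                break
            head += new[s]
            l = s
        spent_up += tail
        spent_low += head
        # Paths that hit a boundary stop, so they carry no mass into step t + 1.
        for s in range(t + 1):
            if s >= u or s <= l:
                new[s] = 0.0
        upper.append(u)
        lower.append(l)
        mass = new
    return upper, lower


def decide(p, upper, lower, rng):
    """Draw X_1, X_2, ... ~ Bernoulli(p) until S_t crosses a boundary."""
    s = 0
    for t, (u, l) in enumerate(zip(upper, lower), start=1):
        s += rng.random() < p
        if s >= u:
            return "p > alpha", t
        if s <= l:
            return "p <= alpha", t
    return "undecided", len(upper)  # horizon t_max reached without a decision
```

The boundaries depend only on α, ε and the spending sequence. They can therefore be built once, e.g. `up, low = build_boundaries(0.05, 1e-3, 2000)`, and reused for every stream. In a real test the Bernoulli draws would come from resampling the data rather than from a known p.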

Algorithm 1 runs N such streams in parallel. At each iteration t it updates three counts:

- R_t, the number of streams resolved as “positive”, i.e., p_i ≤ α.
- A_t, the number resolved as “negative”, i.e., p_i > α.
- |U_t|, the number still unresolved. Here U_t is the set of unresolved streams, not the upper boundary above.

Using these quantities, the algorithm constructs a conservative confidence interval I(R_t, A_t, |U_t|; γ) for β. The interval is the union, over all possible outcomes of the remaining unresolved streams, of the Clopper–Pearson binomial intervals obtained by combining the resolved streams with each such outcome. By construction the intervals are nested (I₁ ⊇ I₂ ⊇ … ⊇ I_∞), and the interval contains β with probability at least 1 – γ. The algorithm stops as soon as the interval length is at most the user‑specified Δ and reports that interval as the final estimate.
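The interval construction can be sketched as follows, under stated assumptions. The |U_t| unresolved streams may resolve in any way. The sketch therefore takes the union of the two-sided Clopper–Pearson intervals for r rejections out of N = R_t + A_t + |U_t| streams, for r = R_t, …, R_t + |U_t|. Clopper–Pearson bounds increase with r, and neighbouring intervals overlap whenever γ < 1, so the union is simply [lower(R_t), upper(R_t + |U_t|)]. The sketch ignores the small per-stream error probability ε. `clopper_pearson`, `power_interval` and `should_stop` are hypothetical names, not simctest functions.

```python
import math


def _binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summing pmf terms in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve(f, target, iters=60):
    """Bisection for the p in [0, 1] where the increasing function f reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, gamma):
    """Two-sided Clopper-Pearson interval for x successes in n, coverage >= 1 - gamma."""
    # Lower bound solves P(X >= x | p) = gamma / 2.
    lower = 0.0 if x == 0 else _solve(lambda p: 1 - _binom_cdf(x - 1, n, p), gamma / 2)
    # Upper bound solves P(X <= x | p) = gamma / 2.
    upper = 1.0 if x == n else _solve(lambda p: 1 - _binom_cdf(x, n, p), 1 - gamma / 2)
    return lower, upper


def power_interval(R, A, U, gamma):
    """Union of CP intervals over every way the U unresolved streams could resolve."""
    n = R + A + U
    return clopper_pearson(R, n, gamma)[0], clopper_pearson(R + U, n, gamma)[1]


def should_stop(R, A, U, gamma, delta):
    """Stopping rule: interval length at most the user-specified delta."""
    lo, hi = power_interval(R, A, U, gamma)
    return hi - lo <= delta
```

Computing the binomial CDF in log space keeps the sketch dependency-free. With SciPy available, the same bounds follow from beta quantiles.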

A major theoretical contribution is the analysis of expected computational effort. In many realistic settings the stopping time τ_i of a single stream has infinite expectation, which makes a naïve “run until all streams are resolved” approach infeasible. The authors show that this can be avoided by choosing N sufficiently large and allowing a small number κ of streams to remain unresolved. The expected value of the total effort e = ∑_{i=1}^{N} min{τ_i, τ_{(N−κ)}} is then finite, where τ_{(N−κ)} denotes the (N−κ)-th smallest of the stopping times τ_1, …, τ_N. Theorem 2.1 proves that if the CDF F is Hölder‑continuous around α with exponent ξ, then E. The Hölder condition holds for absolutely continuous distributions with bounded density, or for discrete distributions with finite support and P(p = α) = 0.
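The effort formula e = ∑_{i=1}^{N} min{τ_i, τ_{(N−κ)}} can be evaluated directly for a given set of stopping times. Below is a minimal sketch. `total_effort` is a hypothetical helper, and streams that never stop are represented by `math.inf`.

```python
import math


def total_effort(stopping_times, kappa):
    """e = sum_i min(tau_i, tau_(N - kappa)), where tau_(k) is the k-th smallest time."""
    n = len(stopping_times)
    if not 0 <= kappa < n:
        raise ValueError("need 0 <= kappa < N")
    # Time at which N - kappa streams have stopped; the rest are cut off there.
    cutoff = sorted(stopping_times)[n - kappa - 1]
    return sum(min(t, cutoff) for t in stopping_times)
```

Take the stopping times (3, 1, ∞, 5, 2). With κ = 1 the cutoff is τ_{(4)} = 5, so e = 3 + 1 + 5 + 5 + 2 = 16. With κ = 0, e is infinite because one stream never stops.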
