Most Powerful Test with Exact Family-Wise Error Rate Control: Necessary Conditions and a Path to Fast Computing
Identifying the most powerful test in multiple hypothesis testing under strong family-wise error rate (FWER) control is a fundamental problem in statistical methodology. State-of-the-art approaches formulate this as a constrained optimisation problem, for which a dual problem with strong duality has been established in a general sense. However, a constructive method for solving the dual problem is lacking, leaving a significant computational gap. This paper fills this gap by deriving novel, necessary optimality conditions for the dual optimisation. We show that these conditions motivate an efficient coordinate-wise algorithm for computing the optimal dual solution, which, in turn, provides the most powerful test for the primal problem. We prove that our algorithm converges linearly, i.e., its computational complexity is proportional to the logarithm of the reciprocal of the target error. To the best of our knowledge, this is the first time such a fast and computationally efficient algorithm has been proposed for finding the most powerful test with family-wise error rate control. The method’s superior power is demonstrated through simulation studies, and its practical utility is shown by identifying new, significant findings in both clinical and financial data applications.
💡 Research Summary
This paper tackles the long‑standing problem of finding the most powerful multiple‑testing procedure that guarantees strong family‑wise error rate (FWER) control. While Rosset et al. (2022) showed that the problem can be expressed as an infinite‑dimensional binary program whose dual enjoys strong duality, they left a critical gap: no constructive method for computing the optimal dual Lagrange multipliers, and therefore no practical way to obtain the optimal test.
The authors first formalize the setting for K hypotheses (with K = 3 as the primary focus). They assume (i) h‑exchangeability – the joint distribution of the data is invariant under any permutation of the hypothesis indices, and (ii) the arrangement‑increasing property – rearranging observations so that stronger evidence aligns with stronger signals never decreases the joint likelihood. Under these structural conditions, any optimal rule can be taken to be symmetric and likelihood‑ratio (LR) ordered, i.e., hypotheses with larger likelihood ratios (equivalently, smaller p‑values) are rejected at least as often as those with weaker evidence.
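The arrangement‑increasing property can be checked numerically in a toy setting. The sketch below (not the paper's formulation; the signal strengths and observed values are hypothetical) uses independent unit‑variance Gaussians with shifted means: sorting the observations so that the largest value sits under the strongest signal never decreases the joint density.

```python
import math

def gaussian_joint_density(x, theta):
    """Joint density of independent N(theta_i, 1) observations."""
    return math.prod(
        math.exp(-0.5 * (xi - ti) ** 2) / math.sqrt(2 * math.pi)
        for xi, ti in zip(x, theta)
    )

# Hypothetical signal strengths for K = 3 hypotheses, weakest to strongest.
theta = [0.0, 1.0, 2.0]

# Misaligned observations: the largest value sits under the weakest signal.
x_misaligned = [2.1, 0.9, -0.1]
# Same values, rearranged to match the ordering of theta.
x_aligned = sorted(x_misaligned)

# Arrangement-increasing: aligning x with theta never decreases the density.
assert gaussian_joint_density(x_aligned, theta) >= gaussian_joint_density(x_misaligned, theta)
```

This is the rearrangement inequality acting on the exponent: matching large `x` with large `theta` minimises the summed squared deviations.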
By moving to the p‑value domain, the authors rewrite power and FWER as linear functionals over the ordered simplex Q = {0 ≤ u₁ ≤ … ≤ u_K ≤ 1}. Specifically, minimal power Π_any and average power Π_l become weighted integrals of the decision vector D(u) against joint densities f_i(u); FWER becomes a weighted integral of the minimum component of D(u). This linear representation enables a clean Lagrangian formulation with a vector of dual variables μ. Strong duality guarantees that an optimal μ* exists and that the decision rule D_{μ*} (obtained by thresholding the LR at levels determined by μ*) solves the original primal problem, provided complementary‑slackness holds.
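To make the linear-functional view concrete, the following sketch estimates both functionals by Monte Carlo for a hypothetical step‑down threshold rule on sorted p‑values. The thresholds, the Beta(1/2, 1) alternative, and the use of sampling instead of the paper's quadrature are all illustrative assumptions, not the authors' construction.

```python
import random

random.seed(0)
K = 3

def decision(u_sorted, thresholds):
    """Hypothetical step-down rule on sorted p-values: reject the j
    smallest hypotheses while u_(j) <= thresholds[j], then stop."""
    d = [0] * K
    for j in range(K):
        if u_sorted[j] <= thresholds[j]:
            d[j] = 1
        else:
            break
    return d

thresholds = [0.02, 0.03, 0.05]   # illustrative levels, not the optimised ones
n = 100_000

# FWER under the global null: p-values are i.i.d. uniform, and a family-wise
# error occurs as soon as any component of D(u) equals 1.
fwer = sum(
    max(decision(sorted(random.random() for _ in range(K)), thresholds))
    for _ in range(n)
) / n

# Average power under an alternative concentrated near 0: p = U^2 follows
# Beta(1/2, 1), a stand-in for the alternative densities f_i.
avg_power = sum(
    sum(decision(sorted(random.random() ** 2 for _ in range(K)), thresholds))
    for _ in range(n)
) / (n * K)
```

Both quantities are expectations of linear functionals of `D(u)` over the ordered simplex, which is exactly the structure the Lagrangian formulation exploits.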
The central theoretical contribution is Theorem 3.8, which provides necessary optimality conditions for μ*. The theorem shows that, holding all but one coordinate of μ fixed, the optimal value of that coordinate must satisfy a one‑dimensional equation derived from the derivative of the FWER constraint and the linear power expression. In other words, the high‑dimensional optimality condition decomposes into K(K‑1)/2 scalar root‑finding problems that are mutually coupled only through the current values of the other coordinates.
Guided by these conditions, the authors propose a coordinate‑wise descent algorithm (Algorithm 1). The algorithm proceeds as follows:
- Initialise μ (e.g., all zeros).
- For each coordinate μ_k, solve the scalar optimality equation while keeping the remaining μ’s fixed. This is done by bracketing the root and applying a bisection or Newton‑type update; the required integrals over Q are evaluated with stable numerical quadrature.
- Cycle through all coordinates until the sup‑norm change in μ falls below a pre‑specified tolerance ε.
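The loop above can be sketched on a toy problem. In place of the paper's dual objective, the code below uses a hypothetical coupled quadratic whose partial derivatives are monotone in each coordinate, so each scalar optimality equation can be solved by bisection exactly as in the algorithm's inner step; everything here is a stand-in, not Algorithm 1 itself.

```python
def bisect_root(g, lo, hi, tol=1e-12):
    """Root of a monotone scalar function g on [lo, hi] by bisection
    (assumes g changes sign on the bracket)."""
    glo = g(lo)
    for _ in range(200):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == (glo > 0):
            lo, glo = mid, g(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coordinate_descent(grad, mu, lo=-10.0, hi=10.0, eps=1e-8, max_cycles=100):
    """Cycle through coordinates, zeroing each partial derivative with the
    others held fixed, until the sup-norm change falls below eps."""
    for _ in range(max_cycles):
        delta = 0.0
        for k in range(len(mu)):
            def g(t, k=k):
                trial = mu[:k] + [t] + mu[k + 1:]
                return grad(trial)[k]
            new = bisect_root(g, lo, hi)
            delta = max(delta, abs(new - mu[k]))
            mu[k] = new
        if delta < eps:
            break
    return mu

# Toy stand-in for the dual objective:
# f(mu) = sum_k (mu_k - c_k)^2 + 0.1 * (mu_0*mu_1 + mu_1*mu_2)
c = [1.0, 2.0, 3.0]
def grad(mu):
    g = [2 * (mu[k] - c[k]) for k in range(3)]
    g[0] += 0.1 * mu[1]
    g[1] += 0.1 * (mu[0] + mu[2])
    g[2] += 0.1 * mu[1]
    return g

mu_star = coordinate_descent(grad, [0.0, 0.0, 0.0])
```

The weak coupling between coordinates mirrors the structure Theorem 3.8 exposes: each scalar equation depends on the other coordinates only through their current values, so cyclic updates contract quickly.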
The authors prove linear convergence: the number of outer iterations needed to achieve error ε grows only as O(log (1/ε)). Since each iteration requires O(K) one‑dimensional solves and O(K) inexpensive integrals, the total computational cost grows only logarithmically in the reciprocal of the target error, a dramatic improvement over naïve high‑dimensional optimization or Monte‑Carlo based approaches.
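The practical meaning of linear convergence is easy to quantify: if the error contracts by a fixed factor ρ per iteration, then reaching tolerance ε takes about log(1/ε)/log(1/ρ) iterations. The contraction factor below is hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def iterations_needed(rho, err0, eps):
    """Smallest t with rho**t * err0 <= eps,
    i.e. t = ceil(log(err0/eps) / log(1/rho))."""
    return math.ceil(math.log(err0 / eps) / math.log(1.0 / rho))

# With a (hypothetical) contraction factor of 0.5, tightening the target
# tolerance from 1e-6 to 1e-12 merely doubles the iteration count.
assert iterations_needed(0.5, 1.0, 1e-6) == 20
assert iterations_needed(0.5, 1.0, 1e-12) == 40
```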
Empirical evaluation concentrates on the K = 3 case. Simulations cover a range of null/alternative distributions (Gaussian, Beta, exponential) and compare the proposed method against classical strong‑FWER procedures such as Bonferroni, Holm, Hochberg, and Hommel. Across all settings, the new algorithm yields decision rules with higher minimal and average power while respecting the same α = 0.05 FWER constraint. Notably, the advantage is most pronounced when signal strengths are modest, where traditional procedures tend to be overly conservative.
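For reference, two of the classical baselines in the comparison can be stated in a few lines each. The p‑values below are made up for illustration; the procedures themselves (Bonferroni and Holm) are standard, and the example shows why the step‑down structure of Holm is uniformly at least as powerful as Bonferroni.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / K."""
    K = len(pvals)
    return [p <= alpha / K for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: compare the j-th smallest p-value
    against alpha / (K - j), stopping at the first failure."""
    K = len(pvals)
    order = sorted(range(K), key=lambda i: pvals[i])
    reject = [False] * K
    for j, i in enumerate(order):
        if pvals[i] <= alpha / (K - j):
            reject[i] = True
        else:
            break
    return reject

p = [0.012, 0.021, 0.35]  # illustrative p-values for K = 3
# Bonferroni applies alpha/3 ≈ 0.0167 to every hypothesis and rejects only
# the first; Holm's step-down thresholds (0.0167, 0.025, 0.05) reject two.
assert bonferroni(p) == [True, False, False]
assert holm(p) == [True, True, False]
```

Both rules control FWER at α, yet use fixed, data-independent thresholds; the paper's optimal rule instead tunes the rejection region to the alternative, which is where its power gain over these baselines comes from.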
Two real‑world applications illustrate practical impact. In a clinical trial with three treatment arms, the algorithm discovers a treatment effect that standard Holm‑adjusted tests miss, suggesting a potentially important therapeutic benefit. In a financial risk‑assessment study involving three correlated portfolios, the method identifies a significant risk factor overlooked by conventional FWER‑controlled tests, highlighting its relevance for regulatory compliance and portfolio optimization.
The paper acknowledges limitations. The derived optimality conditions and the coordinate‑wise algorithm are explicitly worked out only for K = 3; extending to larger K would increase the number of dual variables quadratically and may require additional structural assumptions (e.g., sparsity, block independence) to remain computationally tractable. Moreover, the theory relies on independence of test statistics, identical alternative p‑value densities, and the arrangement‑increasing property; violations (e.g., dependence in genomics data, heterogeneous effect sizes) would necessitate further methodological development. The authors outline possible extensions, including block‑coordinate updates, stochastic approximation for massive K, and relaxation of the symmetry assumptions.
In summary, this work bridges a crucial gap between the elegant duality theory for optimal FWER‑controlled testing and practical algorithmic implementation. By establishing necessary dual optimality conditions and delivering a provably fast, linearly convergent coordinate‑descent scheme, the authors make the “most powerful test” a computationally attainable tool for small‑K multiple testing problems, with demonstrated superiority in both simulated and real data contexts.