Inference on effect size after multiple hypothesis testing
Significant treatment effects are often emphasized when interpreting and summarizing empirical findings in studies that estimate multiple, possibly many, treatment effects. Under this kind of selective reporting, conventional treatment effect estimates may be biased and their corresponding confidence intervals may undercover the true effect sizes. We propose new estimators and confidence intervals that provide valid inferences on the effect sizes of the significant effects after multiple hypothesis testing. Our methods are based on the principle of selective conditional inference and complement a wide range of tests, including step-up tests and bootstrap-based step-down tests. Our approach is scalable, allowing us to study an application with over 370 estimated effects. We justify our procedure for asymptotically normal treatment effect estimators. We provide two empirical examples that demonstrate bias correction and confidence interval adjustments for significant effects. The magnitude and direction of the bias correction depend on the correlation structure of the estimated effects and whether the interpretation of the significant effects depends on the (in)significance of other effects.
💡 Research Summary
In the era of high-dimensional data analysis, researchers frequently conduct multiple hypothesis tests to identify significant treatment effects among a large number of candidate effects. However, a critical issue arises from the practice of selective reporting: when only results that pass a predefined significance threshold (e.g., p < 0.05) are reported, the estimated effect sizes tend to be inflated, a phenomenon often referred to as the “Winner’s Curse.” Furthermore, the conventional confidence intervals attached to these selected effects understate the true uncertainty, leading to overconfident and potentially misleading scientific conclusions.
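To make the selection effect concrete, the following small simulation (ours, not from the paper) generates many noisy estimates of the same modest true effect and then "reports" only the statistically significant ones; the effect size, standard error, and 5% threshold are arbitrary illustrative choices.

```python
# Illustrative sketch of the "Winner's Curse": averaging only the significant
# estimates overstates the true effect. Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_effects = 1000           # number of hypotheses tested
true_effect = 0.1          # every true effect is small and positive
se = 0.1                   # standard error of each estimate

estimates = rng.normal(true_effect, se, size=n_effects)
z = estimates / se
significant = np.abs(z) > 1.96        # conventional two-sided 5% threshold

print(f"true effect:              {true_effect:.3f}")
print(f"mean of all estimates:    {estimates.mean():.3f}")
print(f"mean of significant ones: {estimates[significant].mean():.3f}")
# The significant subset averages well above the true 0.1, because selecting
# on significance preferentially keeps the estimates with large noise draws.
```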
This paper addresses this fundamental problem by proposing a novel framework for valid inference on effect sizes following multiple hypothesis testing. The core methodology is built on the principle of “selective conditional inference.” Unlike conventional practice, which reports significant estimates as if no selection had taken place, the authors propose new estimators and confidence intervals that explicitly account for the selection process: by conditioning the distribution of the estimators on the event that the corresponding tests are significant, the method corrects the bias introduced by selecting on significance.
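The conditioning idea can be illustrated with a deliberately simplified sketch for a single estimate selected by a two-sided z-test at a fixed critical value c. The helper names (conditional_cdf, conditional_estimate) and the numerical inputs are ours, and the sketch does not implement the paper's treatment of step-up or bootstrap step-down procedures; it only shows how conditioning on the selection event shifts the point estimate and interval.

```python
# Minimal sketch of conditional (post-selection) inference for one effect that
# was selected because |estimate / se| > c. Assumes a normal estimator with
# known standard error; this is a toy single-hypothesis illustration only.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def conditional_cdf(x, mu, se, c):
    """P(X <= x | |X| > c*se) for X ~ N(mu, se^2)."""
    lo, hi = -c * se, c * se
    p_select = norm.cdf(lo, mu, se) + norm.sf(hi, mu, se)
    if x <= lo:
        num = norm.cdf(x, mu, se)
    else:  # selected estimates lie outside (lo, hi), so here x >= hi
        num = norm.cdf(lo, mu, se) + norm.cdf(x, mu, se) - norm.cdf(hi, mu, se)
    return num / p_select

def conditional_estimate(x_obs, se, c=1.96, alpha=0.05):
    """Median-unbiased estimate and CI, conditioning on selection."""
    def solve(level):
        # The conditional CDF is decreasing in mu, so a sign change is bracketed.
        return brentq(lambda mu: conditional_cdf(x_obs, mu, se, c) - level,
                      x_obs - 20 * se, x_obs + 20 * se)
    point = solve(0.5)
    ci = (solve(1 - alpha / 2), solve(alpha / 2))
    return point, ci

# Example: an estimate of 0.22 (se = 0.10) that just cleared the 5% threshold.
est, ci = conditional_estimate(0.22, 0.10)
print(f"naive estimate: 0.220")
print(f"conditional estimate: {est:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
# Because the estimate barely clears the threshold, the conditional estimate is
# pulled back toward zero and the conditional interval is wider than the naive one.
```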
A key strength of this research lies in its versatility and scalability. The proposed approach is designed to complement a wide range of existing multiple testing procedures, including step-up tests and bootstrap-based step-down tests, which makes it adaptable to many experimental designs and existing statistical pipelines. The authors also demonstrate scalability by applying the method to a large-scale application with over 370 estimated effects, showing that the computation remains tractable even in high-dimensional settings.
The paper also provides insight into the factors that drive the bias correction. The magnitude and direction of the necessary correction depend heavily on the correlation structure among the estimated effects, and on whether the interpretation of a significant effect depends on the (in)significance of the other effects being tested. Through empirical examples that demonstrate the bias correction and confidence interval adjustments for significant effects, the paper offers a practical tool for improving the integrity and reproducibility of findings in large-scale, multiple-testing-driven research.