Identifying individual mediators is a central goal of high-dimensional mediation analysis, yet pervasive dependence among mediators can invalidate standard debiased inference and lead to substantial false discovery rate (FDR) inflation. We propose a Factor-Adjusted Debiased Mediation Testing (FADMT) framework that enables large-scale inference for individual mediation effects with FDR control under complex dependence structures. Our approach posits an approximate factor structure on the unobserved errors of the mediator model, extracts common latent factors, and constructs decorrelated pseudo-mediators for the subsequent inferential procedure. We establish the asymptotic normality of the debiased estimator and develop a multiple testing procedure with theoretical FDR control under mild high-dimensional conditions. By adjusting for latent-factor-induced dependence, FADMT also improves robustness to spurious associations driven by shared latent variation in observational studies. Extensive simulations demonstrate superior finite-sample performance across a wide range of correlation structures. Applications to TCGA-BRCA multi-omics data and to a study of China's stock connect further illustrate the practical utility of the proposed method.
Understanding the mechanisms through which an exposure affects an outcome is a central problem across many scientific disciplines. Mediation analysis provides a principled framework for decomposing the total effect into a direct effect and an indirect effect transmitted through intermediate variables (Baron & Kenny 1986, MacKinnon et al. 2004). In modern genomics and multi-omics studies, and increasingly in finance, hundreds or thousands of candidate mediators are routinely measured, making individual mediator discovery both scientifically important and statistically challenging. Two difficulties are particularly acute in high dimensions: mediators exhibit strong dependence driven by shared latent variation, and identifying active mediators requires rigorous simultaneous inference to maintain false discovery rate (FDR) control.
A growing literature studies high-dimensional mediation, with much of it focusing on inference for the overall indirect effect, which aggregates contributions from all mediators (Huang & Pan 2016, Zhou et al. 2020, Guo et al. 2022, 2023, Lin et al. 2023). Although informative, overall indirect effects can mask important mechanistic signals when individual mediation effects cancel out due to opposing directions. This motivates methods for individual mediation effect discovery.
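To make the distinction between individual and overall indirect effects concrete, one common formulation is the linear structural equation model sketched below. The notation ($X$ for the exposure, $M_j$ for the mediators, $Y$ for the outcome, and the coefficients $\alpha_j$, $\beta_j$, $\gamma$) is assumed here for illustration and is not taken from the text above.

```latex
\begin{align*}
M_j &= \alpha_j X + e_j, \qquad j = 1, \dots, p, \\
Y &= \gamma X + \sum_{j=1}^{p} \beta_j M_j + \varepsilon, \\
\mathrm{IE}_j &= \alpha_j \beta_j, \qquad
\mathrm{IE}_{\mathrm{overall}} = \sum_{j=1}^{p} \alpha_j \beta_j .
\end{align*}
```

Under this formulation, two mediators with $\alpha_1 \beta_1 = 0.5$ and $\alpha_2 \beta_2 = -0.5$ give an overall indirect effect of zero even though both are active. Testing $H_{0j}: \alpha_j \beta_j = 0$ separately for each $j$ avoids this masking.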
Existing methods for testing individual mediation effects in high dimensions generally follow two paradigms: marginal modeling, which relies on simplifying independence assumptions and applies multiple testing to marginal regression coefficients (Dai et al. 2022, Liu et al. 2022, Du et al. 2023), and joint modeling, which employs high-dimensional inference or variable selection techniques such as screening, debiased Lasso, and adaptive Lasso (Zhang et al. 2016, 2021, Derkach et al. 2019, Shuai et al. 2023). While these approaches offer modeling flexibility, their validity is severely compromised by pervasive dependence among mediators.
Strong dependence among mediators presents two fundamental challenges. First, high-dimensional inference and variable selection procedures rely on structural conditions, such as the irrepresentable condition for variable selection or the compatibility/restricted eigenvalue (RE) conditions for valid inference. Strong correlation can violate these regularity conditions, causing the procedures to fail (Fan et al. 2020). Second, controlling the FDR in large-scale multiple testing becomes difficult when test statistics are strongly dependent, violating the assumptions underlying many classical FDR procedures (Benjamini & Hochberg 1995, Benjamini & Yekutieli 2001, Storey et al. 2004). Empirical evidence shows that ignoring such dependence can result in severe FDR inflation (Wu 2008, Blanchard & Roquain 2009, Fan, Han & Gu 2012, Fan et al. 2019).
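For reference, the classical step-up procedure of Benjamini & Hochberg (1995) cited above can be sketched as follows. This is a minimal illustration of that classical rule, not the FDR procedure proposed in this paper. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean rejection mask from the BH step-up procedure.

    Illustrative sketch only: its FDR guarantee relies on independence
    (or certain positive dependence) among the p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest k with p_(k) <= alpha*k/m
        reject[order[: k + 1]] = True              # reject the k smallest p-values
    return reject


# Hypothetical p-values for ten candidate mediators.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_reject = benjamini_hochberg(example_p, alpha=0.05)
print(example_reject)
```

The thresholds depend only on the ranks of the p-values, so the rule has no way of accounting for strong correlation among the test statistics. This is the setting in which the FDR inflation discussed above can arise.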
To address these challenges, we propose a Factor-Adjusted Debiased Mediation Testing (FADMT) framework for high-dimensional individual mediation analysis with FDR control.
Our motivation is that dependence among mediators in modern omics studies is often driven by a few common factors, so that an approximate factor structure can provide a useful and parsimonious representation (Bai 2003, Fan et al. 2013). By separating pervasive factor-driven dependence from idiosyncratic variation, factor-adjusted methods have been shown to substantially improve inference and multiple testing accuracy (Fan et al. 2019, 2024).
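As a sketch of what an approximate factor structure on mediator-model errors might look like, suppose a linear mediator model $M_j = \alpha_j X + e_j$ for $j = 1, \dots, p$. The model and the notation ($b_j$, $f$, $u_j$, $K$, $\Sigma_u$) are illustrative assumptions, not definitions taken from the text above.

```latex
\begin{align*}
e_j &= b_j^{\top} f + u_j, \qquad j = 1, \dots, p, \\
\operatorname{Cov}(e) &= B \operatorname{Cov}(f) B^{\top} + \Sigma_u,
\qquad B = (b_1, \dots, b_p)^{\top} \in \mathbb{R}^{p \times K}.
\end{align*}
```

Here $f \in \mathbb{R}^{K}$ collects $K \ll p$ latent factors, assumed uncorrelated with the idiosyncratic errors $u = (u_1, \dots, u_p)^{\top}$. The low-rank term $B \operatorname{Cov}(f) B^{\top}$ carries the pervasive dependence, while $\Sigma_u$ is assumed to be only weakly dependent.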
A fundamental difference between our setting and existing factor-adjusted frameworks is that traditional approximate factor models are applied to observable data, whereas we apply factor analysis to the unobserved errors of the mediator model. This requires a two-step construction: we first estimate the latent factor component and obtain estimated idiosyncratic components, which serve as decorrelated pseudo-mediators. We then use these pseudo-mediators for downstream debiased inference and multiple testing.
This shift introduces new technical challenges, as the first-step estimation error propagates into the downstream procedures.
We establish the asymptotic normality of the debiased estimator under mild regularity conditions and develop a theoretically valid FDR control rule for individual mediation effects.
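The two-step construction of pseudo-mediators described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: it assumes a linear mediator model fitted by least squares, a known number of factors, and principal components as the factor estimator. All names (`pseudo_mediators`, `mean_abs_offdiag_corr`) and all simulation settings are hypothetical.

```python
import numpy as np


def pseudo_mediators(X, M, n_factors):
    """Residualize mediators on the exposure, then remove leading principal components.

    X : (n,) exposure; M : (n, p) mediators; n_factors : assumed known.
    Returns idiosyncratic estimates U_hat, factors F_hat, loadings B_hat,
    and the first-step residuals E_hat.
    """
    n, _ = M.shape
    D = np.column_stack([np.ones(n), X])            # intercept + exposure
    coef, *_ = np.linalg.lstsq(D, M, rcond=None)    # OLS of each mediator on D
    E_hat = M - D @ coef                            # estimated mediator-model errors
    # Principal components of the residuals: E_hat ~ F B^T + U.
    left, _, _ = np.linalg.svd(E_hat, full_matrices=False)
    F_hat = np.sqrt(n) * left[:, :n_factors]        # normalized so F^T F / n = I
    B_hat = E_hat.T @ F_hat / n                     # estimated loadings, shape (p, K)
    U_hat = E_hat - F_hat @ B_hat.T                 # pseudo-mediators
    return U_hat, F_hat, B_hat, E_hat


def mean_abs_offdiag_corr(A):
    """Average absolute pairwise correlation between the columns of A."""
    C = np.corrcoef(A, rowvar=False)
    mask = ~np.eye(C.shape[0], dtype=bool)
    return np.abs(C[mask]).mean()


# Simulated illustration; every data-generating value here is arbitrary.
rng = np.random.default_rng(0)
n, p, K = 200, 50, 2
X = rng.normal(size=n)
alpha = rng.normal(size=p)
F = rng.normal(size=(n, K))
B = rng.normal(size=(p, K))
U = rng.normal(size=(n, p))
M = np.outer(X, alpha) + F @ B.T + U

U_hat, F_hat, B_hat, E_hat = pseudo_mediators(X, M, n_factors=K)
print("mean |corr| of residuals:       ", mean_abs_offdiag_corr(E_hat))
print("mean |corr| of pseudo-mediators:", mean_abs_offdiag_corr(U_hat))
```

In this sketch the pseudo-mediators are orthogonal to the estimated factors by construction, so the pervasive component of the dependence is removed before any downstream testing. In practice the number of factors would itself need to be estimated, and the first-step error in `E_hat` carries over into `U_hat`, which is the propagation issue noted above.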
Extensive simulations further demonstrate strong finite-sample performance: FADMT controls FDR across a wide range of dependence structures while maintaining competitive power relative to existing methods. We further apply our method to a multi-omics dataset from the TCGA-BRCA cohort, investigating whether DNA methylation mediates the effect of age at diagnosis on MKI67 gene expression, and to a financial stock connect setting, examining whether market liberalization affects firms’ idiosyncratic risk through changes in corporate fundamentals. These applications demonstrate the method’s ability to uncover interpretable mediation effects in real-world high-dimensional data across domains. This paper makes several key contr