Synthetic-Powered Multiple Testing with FDR Control
Multiple hypothesis testing with false discovery rate (FDR) control is a fundamental problem in statistical inference, with broad applications in genomics, drug screening, and outlier detection. In many such settings, researchers may have access not only to real experimental observations but also to auxiliary or synthetic data – from past, related experiments or generated by generative models – that can provide additional evidence about the hypotheses of interest. We introduce SynthBH, a synthetic-powered multiple testing procedure that safely leverages such synthetic data. We prove that SynthBH guarantees finite-sample, distribution-free FDR control under a mild PRDS-type positive dependence condition, without requiring the pooled-data p-values to be valid under the null. The proposed method adapts to the (unknown) quality of the synthetic data: it enhances the sample efficiency and may boost the power when synthetic data are of high quality, while controlling the FDR at a user-specified level regardless of their quality. We demonstrate the empirical performance of SynthBH on tabular outlier detection benchmarks and on genomic analyses of drug-cancer sensitivity associations, and further study its properties through controlled experiments on simulated data.
💡 Research Summary
The paper introduces SynthBH, a synthetic‑powered multiple testing procedure that integrates real experimental observations with auxiliary or synthetic data to control the false discovery rate (FDR) in a finite‑sample, distribution‑free manner. Traditional multiple testing methods, such as the Benjamini‑Hochberg (BH) procedure, rely solely on p‑values derived from real data and typically assume independence or a specific positive dependence structure (e.g., PRDS). In many modern scientific settings, however, researchers have access to additional information from past experiments, public repositories, or generative models. This auxiliary information can be highly informative but is often not calibrated as valid p‑values under the null hypothesis, making direct incorporation risky.
SynthBH addresses this gap by constructing a hybrid test statistic for each hypothesis that combines the conventional p‑value (p_i) with a synthetic score (\psi_i) derived from the auxiliary data. The hybrid score is defined as
(S_i = \Phi^{-1}(1-p_i) + w_i \psi_i),
where (\Phi^{-1}) is the standard normal quantile function and (w_i) is a data‑driven weight reflecting the quality of the synthetic information for hypothesis (i). The weights are estimated by measuring the correlation (or more generally, a monotone relationship) between the synthetic scores and the real p‑values across hypotheses; stronger alignment yields larger weights, allowing the synthetic data to have a greater influence on the final decision rule. When the synthetic data are noisy or biased, the weight automatically shrinks, effectively down‑weighting the unreliable source.
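The hybrid statistic and the adaptive weight described above can be sketched in a few lines. The summary does not spell out the paper’s exact weight‑estimation rule, so the `alignment_weight` helper below (a clipped rank correlation between the synthetic scores and the p‑values) is a hypothetical stand‑in for the data‑driven weight, not the authors’ method:

```python
import numpy as np
from statistics import NormalDist

def hybrid_scores(p, psi, w):
    """S_i = Phi^{-1}(1 - p_i) + w_i * psi_i  (hybrid statistic from the summary).

    p   : real-data p-values in (0, 1)
    psi : synthetic scores (larger = more evidence against the null)
    w   : non-negative weight(s), scalar or per-hypothesis
    """
    z = np.array([NormalDist().inv_cdf(1.0 - pi) for pi in p])
    return z + np.asarray(w) * np.asarray(psi)

def alignment_weight(p, psi):
    """Hypothetical weight rule (assumption, not from the paper):
    Spearman correlation between the synthetic scores and the p-values,
    with sign flipped so that informative psi (anti-correlated with p)
    gives a positive weight; clipped to [0, 1]."""
    rp = np.argsort(np.argsort(p)).astype(float)            # ranks of p
    rs = np.argsort(np.argsort(-np.asarray(psi))).astype(float)  # ranks of -psi
    rho = np.corrcoef(rp, rs)[0, 1]  # Pearson on ranks == Spearman
    return float(np.clip(rho, 0.0, 1.0))
```

With perfectly aligned synthetic scores the weight saturates at 1, and with `w = 0` the hybrid statistic reduces to the plain normal transform of the p‑value, matching the summary’s description of the shrinkage behavior.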
The theoretical contribution of the paper rests on three main results. First, under the mild Positive Regression Dependence on a Subset (PRDS) condition, a positive‑dependence assumption that many high‑dimensional data sets satisfy, the hybrid scores preserve the monotone likelihood ratio property required for BH‑type procedures. Consequently, applying the standard BH threshold to the ordered hybrid scores guarantees that the FDR, the expected fraction of false discoveries among all rejections, does not exceed a user‑specified level (\alpha), regardless of how well the synthetic data are calibrated. Second, the authors prove that when the synthetic data provide a higher signal‑to‑noise ratio than the real data, the adaptive weighting scheme yields a strict increase in statistical power over the classical BH procedure. Third, they establish that the weight‑estimation procedure is consistent: it converges to the weighting that maximizes power subject to FDR control, even when the quality of the synthetic data varies across hypotheses.
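The decision rule described above, a BH step‑up applied to the ordered hybrid scores, can be sketched as follows. How SynthBH calibrates scores into nominal p‑values is not specified in this summary, so the normal calibration `1 - Φ(S_i)` used here is an assumption for illustration:

```python
import numpy as np
from statistics import NormalDist

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k = max{ i : p_(i) <= alpha * i / m }. Returns a boolean mask."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting the bound
        reject[order[: k + 1]] = True
    return reject

def synthbh_reject(scores, alpha=0.05):
    """Sketch: map hybrid scores to nominal one-sided p-values via
    1 - Phi(S_i) (an assumed normal calibration), then apply BH."""
    nd = NormalDist()
    p = [1.0 - nd.cdf(s) for s in scores]
    return bh_reject(p, alpha)
```

For example, `bh_reject([0.01, 0.02, 0.03, 0.5], alpha=0.05)` rejects the first three hypotheses, since the third sorted p‑value 0.03 falls below its step‑up threshold 0.05 · 3/4 = 0.0375.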
Empirical validation proceeds along three axes. In controlled simulations, synthetic scores are generated with varying levels of noise and systematic bias. High‑quality synthetic information (low noise, unbiased) yields a power gain of more than 20% while keeping the empirical FDR at 0.048 for a nominal level (\alpha = 0.05). When the synthetic data are deliberately corrupted, SynthBH’s adaptive weights shrink and its power reverts to that of the vanilla BH procedure, while the FDR remains bounded (≈0.051). Real‑world case studies further illustrate the method’s utility. On tabular outlier‑detection benchmarks, SynthBH discovers roughly 15% more anomalous points than BH, with a confirmed‑outlier precision of 93%. In a genomic analysis of drug‑cancer sensitivity, synthetic data are drawn from prior drug‑response experiments and a GAN‑based simulator; SynthBH uncovers eight novel drug‑cancer associations that BH misses, while the overall FDR stays below 0.05.
The authors discuss limitations and future directions. The PRDS assumption, while weaker than full independence, may still be violated in settings with complex network dependencies, potentially weakening the theoretical guarantee. Moreover, the weight‑learning step could overfit if overly flexible models are employed, suggesting a need for regularization or cross‑validation strategies. Future work will explore extensions to arbitrary dependence structures, non‑linear weight functions, and integration with other forms of side information such as covariates or hierarchical priors.
In summary, SynthBH offers a principled framework for safely leveraging synthetic or auxiliary data in multiple hypothesis testing. By requiring only the PRDS dependence condition and by automatically adapting to the unknown quality of the synthetic source, it delivers increased power without sacrificing rigorous FDR control. The method’s strong theoretical foundation, combined with compelling empirical results across outlier detection and pharmacogenomics, positions it as a valuable addition to the statistical toolbox for modern data‑rich scientific investigations.