Multiple hypothesis testing with false discovery rate (FDR) control is a fundamental problem in statistical inference, with broad applications in genomics, drug screening, and outlier detection. In many such settings, researchers may have access not only to real experimental observations but also to auxiliary or synthetic data -- drawn from related past experiments or produced by generative models -- that can provide additional evidence about the hypotheses of interest. We introduce SynthBH, a synthetic-powered multiple testing procedure that safely leverages such synthetic data. We prove that SynthBH guarantees finite-sample, distribution-free FDR control under a mild PRDS-type positive dependence condition, without requiring the pooled-data p-values to be valid under the null. The proposed method adapts to the (unknown) quality of the synthetic data: it improves sample efficiency and can boost power when the synthetic data are of high quality, while controlling the FDR at a user-specified level regardless of their quality. We demonstrate the empirical performance of SynthBH on tabular outlier detection benchmarks and on genomic analyses of drug-cancer sensitivity associations, and further study its properties through controlled experiments on simulated data.
Multiple hypothesis testing is a cornerstone of modern statistical inference. In large-scale scientific studies -- including genomics, drug discovery, high-throughput screening, and large-scale anomaly detection -- researchers routinely test thousands to millions of hypotheses simultaneously. In such regimes, controlling the false discovery rate (FDR) (Benjamini and Hochberg, 1995) has become a default target, as it offers a favorable balance between statistical validity and power compared to more stringent criteria such as family-wise error control.
A persistent bottleneck in these applications is that the amount of trusted real data is often limited. For example, in genomics, the number of reliably measured samples can be small relative to the dimensionality; in drug-cancer sensitivity studies, each additional experiment can be costly; and in outlier detection, obtaining a clean reference set of inliers can be expensive or require manual verification. At the same time, practitioners increasingly have access to large amounts of auxiliary data that are not fully trustworthy but are often informative: past experiments on related populations, weakly labeled or automatically curated datasets, and synthetic samples generated by modern generative models. Such auxiliary datasets can be far larger than the real dataset, and when they are high quality, they have the potential to dramatically sharpen statistical evidence. However, because the auxiliary data distribution can differ from the real one in unknown ways, naively pooling real and synthetic data in classical testing pipelines can lead to spurious discoveries and inflated FDR.

This tension creates a basic methodological gap. On the one hand, ignoring synthetic data can be overly conservative and low-powered in small-sample regimes. On the other hand, treating synthetic data as if they were real can destroy the very error guarantees that make multiple testing scientifically reliable. This paper asks a concrete question: Can we leverage arbitrary synthetic/auxiliary data to improve the power of multiple testing, while guaranteeing finite-sample FDR control that is robust to unknown synthetic data quality?
2 Synthetic-Powered P-Values

Setting. We consider a standard multiple hypothesis testing framework with $m$ hypotheses $H_1, \dots, H_m$. Each hypothesis encodes a null claim -- for example, $H_j$ stating that “the presence of the $j$-th genomic feature is not associated with the response to a given drug.” For each hypothesis $H_j$, we assume access to (i) a valid p-value $p_j$ computed from the trusted real data only, and (ii) an additional p-value $\tilde{p}_j$ computed from the merged real-and-synthetic dataset. The merged p-value $\tilde{p}_j$ can be substantially more informative when the synthetic data are of high quality (since it effectively uses a larger sample), but it is generally not guaranteed to be valid under the null, because the synthetic data distribution may be arbitrary.
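To make this setting concrete, the following sketch simulates one toy instance of it: $m$ Gaussian mean-testing problems with a small real sample and a larger, slightly biased synthetic sample. The one-sided z-test construction, the bias value, and all parameters here are illustrative assumptions of ours, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, n_real, n_synth = 1000, 20, 200

# Toy ground truth: H_j says mean mu_j = 0; roughly 10% of hypotheses are non-null.
mu = np.where(rng.random(m) < 0.1, 0.5, 0.0)
real = rng.normal(loc=mu[:, None], scale=1.0, size=(m, n_real))
# Synthetic data from an imperfect generator: a small bias (unknown in practice).
synth = rng.normal(loc=mu[:, None] + 0.05, scale=1.0, size=(m, n_synth))

# One-sided z-test p-values: p_j is valid under the null (real data only) ...
p_real = norm.sf(real.mean(axis=1) * np.sqrt(n_real))
# ... while the pooled p-value is sharper but not guaranteed valid under the null.
pooled = np.concatenate([real, synth], axis=1)
p_merged = norm.sf(pooled.mean(axis=1) * np.sqrt(n_real + n_synth))
```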
Our contributions in this work begin with the formulation of a synthetic-powered p-value. Let $a \wedge b := \min(a, b)$ and $a \vee b := \max(a, b)$ denote the minimum and maximum of two numbers, respectively. The synthetic-powered p-value at level $\delta \ge 0$ is defined as
$$
p_j^{\delta} \;=\; p_j \wedge \bigl(\tilde{p}_j \vee (p_j - \delta)\bigr). \tag{1}
$$
This construction is deliberately careful in how it uses synthetic data. Suppose we are handed high-quality synthetic data. If they provide very strong evidence, so that $\tilde{p}_j$ is smaller than $p_j$, we want to use it in place of the real-data p-value $p_j$; taking $p_j \wedge \tilde{p}_j$ achieves this. At the same time, since the quality of the synthetic data is unknown in general, we account for the possibility that they may be poor or misleading, and we cap the reduction afforded by the synthetic p-value at $p_j - \delta$.
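A minimal sketch of this guarded combination, written directly from Equation (1); the function name and the example values below are ours, chosen to show both regimes.

```python
import numpy as np

def synth_powered_pvalue(p_real, p_merged, delta):
    """Guarded combination from Eq. (1): p^delta = p ∧ (p~ ∨ (p − delta))."""
    p_real, p_merged = np.asarray(p_real), np.asarray(p_merged)
    return np.minimum(p_real, np.maximum(p_merged, p_real - delta))

# Helpful synthetic evidence: the reduction is capped at delta below p_real.
print(synth_powered_pvalue(0.04, 0.001, delta=0.02))  # ≈ 0.02, not 0.001
# Misleading synthetic evidence: a larger merged p-value is simply ignored.
print(synth_powered_pvalue(0.04, 0.9, delta=0.02))    # 0.04
```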
In general, $p_j^{\delta}$ is not guaranteed to be super-uniform -- so it is not, strictly speaking, a classical p-value -- but we use the term “synthetic-powered p-value” with a slight abuse of terminology, because we will use it as an input to multiple testing procedures.

From p-values to FDR control: SynthBH. The main technical contribution of this paper is to show how to turn the guarded p-values $p_j^{\delta}$, for appropriate choices of $\delta \ge 0$, into a multiple testing procedure with provable FDR control. We consider a user-specified admission cost $\varepsilon \ge 0$ for incorporating synthetic data: larger $\varepsilon$ allows potentially larger power gains, but can also increase false discoveries.
Our proposed method, SynthBH (Algorithm 1), is a Benjamini-Hochberg-type step-up procedure that uses rank-adaptive guardrails: when considering $k$ candidate rejections, it allows each p-value to be reduced by at most $k\varepsilon/m$. This adaptive calibration is crucial. It lets the smallest p-values -- those most likely to correspond to non-nulls -- benefit from synthetic data, while preventing the synthetic data from causing an uncontrolled proliferation of false discoveries. Under a natural positive dependence condition (a mild extension of classical PRDS-type conditions), we prove that SynthBH controls the FDR at the user-specified level in finite samples, irrespective of the synthetic data distribution.
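The paper's Algorithm 1 is not reproduced here; the sketch below is one plausible reading of the description above, assuming a step-up search over the candidate count $k$ with guard level $\delta_k = k\varepsilon/m$ and the usual BH threshold $k\alpha/m$. The function name `synth_bh_sketch` and the exact rejection rule are our assumptions, intended only to convey the structure.

```python
import numpy as np

def synth_bh_sketch(p_real, p_merged, alpha=0.1, eps=0.05):
    """Step-up sketch with rank-adaptive guardrails (not the paper's exact Algorithm 1).

    Assumed rule: for each candidate count k = m, ..., 1, guard the p-values with
    delta_k = k * eps / m, and stop at the largest k for which at least k of the
    guarded p-values fall at or below the BH threshold k * alpha / m.
    """
    p_real, p_merged = np.asarray(p_real, float), np.asarray(p_merged, float)
    m = p_real.size
    for k in range(m, 0, -1):
        # At rank k, synthetic data may lower each p-value by at most k*eps/m.
        guarded = np.minimum(p_real, np.maximum(p_merged, p_real - k * eps / m))
        if np.count_nonzero(guarded <= k * alpha / m) >= k:
            return np.flatnonzero(guarded <= k * alpha / m)  # indices of rejections
    return np.array([], dtype=int)  # no rejections
```

On toy data such as that generated in the earlier sketch, one would expect `synth_bh_sketch(p_real, p_merged)` to reject more hypotheses than plain BH on `p_real` alone when the synthetic bias is small, which is the behavior the guarantee above is designed to permit.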