Learning False Discovery Rate Control via Model-Based Neural Networks
Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.
💡 Research Summary
This paper addresses the longstanding trade‑off in high‑dimensional variable selection between rigorous false discovery rate (FDR) control and statistical power (true positive rate, TPR). The recently proposed T‑Rex selector is scalable and offers a finite‑sample FDR guarantee: it augments the design matrix with random dummy variables and uses an analytical estimator of the false discovery proportion (FDP). However, that analytical estimator is provably conservative: it overestimates the true FDP, so the procedure operates far below the target FDR level (α). This over‑caution hurts performance most in low signal‑to‑noise ratio (SNR) regimes, where many truly active variables are discarded.
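To make the quantities above concrete, the following minimal Python sketch computes the realized FDP and TPR of a selected set and appends dummy columns to a design matrix. It is only an illustration under stated assumptions, not the T‑Rex implementation: the function names `fdp_and_tpr` and `append_dummies` are hypothetical, the dummies are drawn as i.i.d. standard normal values for simplicity, and the FDP follows the standard definition (false discoveries divided by the number of selections, taken as 0 when nothing is selected), with FDR being its expectation.

```python
import random


def fdp_and_tpr(selected, true_support):
    """Return (FDP, TPR) for a selected index set against the true active set."""
    selected = set(selected)
    true_support = set(true_support)
    false_discoveries = len(selected - true_support)
    true_discoveries = len(selected & true_support)
    # max(..., 1) avoids division by zero; an empty selection has FDP = 0.
    fdp = false_discoveries / max(len(selected), 1)
    tpr = true_discoveries / max(len(true_support), 1)
    return fdp, tpr


def append_dummies(X, num_dummies, rng):
    """Append num_dummies random columns to X (a list of rows).

    Assumption of this sketch: dummies are i.i.d. standard normal draws.
    """
    return [row + [rng.gauss(0.0, 1.0) for _ in range(num_dummies)] for row in X]


rng = random.Random(0)
# Toy design matrix: 4 observations, 5 candidate variables.
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
X_aug = append_dummies(X, num_dummies=3, rng=rng)

# Variables 0, 1, 2 are truly active; the procedure selected 0, 1, 4.
fdp, tpr = fdp_and_tpr(selected=[0, 1, 4], true_support=[0, 1, 2])
print(f"FDP = {fdp:.3f}, TPR = {tpr:.3f}")
```

In this toy case one of the three selections is false and two of the three active variables are found. A conservative FDP estimator would report a value above the realized FDP, which is what forces the selector to stop short of the target level α.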
The authors propose to replace the conservative analytical FDP estimator with a data‑driven neural network that learns a tighter approximation of the FDP from synthetic data. The network takes as input the flattened vector of T‑Rex statistics – the relative frequency matrix Φ together with the calibration parameters (v, T, L) – and outputs a predicted FDP constrained to
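The input construction described above can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the shape of Φ, the hidden-layer size, the helper names (`build_input`, `init_layer`, `predict_fdp`), and the random untrained weights are hypothetical, and the summary does not specify the architecture. Because an FDP is a proportion, the sketch squashes the output with a sigmoid; that choice belongs to this sketch, not to the source.

```python
import math
import random


def build_input(phi, v, T, L):
    """Flatten the relative frequency matrix and append the calibration parameters."""
    flat = [x for row in phi for x in row]
    return flat + [float(v), float(T), float(L)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(weights, biases, x):
    """Affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_fdp(x, hidden, out):
    """One hidden ReLU layer followed by a sigmoid output."""
    h = [max(0.0, z) for z in dense(hidden[0], hidden[1], x)]
    (z,) = dense(out[0], out[1], h)
    return sigmoid(z)


rng = random.Random(0)
# Hypothetical 3 x 4 relative frequency matrix with entries in [0, 1].
phi = [[rng.random() for _ in range(4)] for _ in range(3)]
x = build_input(phi, v=0.75, T=5, L=20)

hidden = init_layer(len(x), 8, rng)
out = init_layer(8, 1, rng)
fdp_hat = predict_fdp(x, hidden, out)
print(f"input length = {len(x)}, predicted FDP = {fdp_hat:.3f}")
```

The weights here are random, so the printed value carries no meaning. In the approach described in the summary, such a network would be trained on synthetic datasets where the true FDP is known.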