Estimating the False Discovery Rate of Variable Selection
We introduce a generic estimator for the false discovery rate of any model selection procedure, in common statistical modeling settings including the Gaussian linear model, Gaussian graphical model, and model-X setting. We prove that our method has a conservative (non-negative) bias in finite samples under standard statistical assumptions, and provide a bootstrap method for assessing its standard error. For methods like the Lasso, forward-stepwise regression, and the graphical Lasso, our estimator serves as a valuable companion to cross-validation, illuminating the tradeoff between prediction error and variable selection accuracy as a function of the model complexity parameter.
💡 Research Summary
The paper introduces a universal estimator for the false discovery rate (FDR) of any variable‑selection procedure, covering common settings such as the Gaussian linear model, the Gaussian graphical model, and the non‑parametric model‑X framework. The authors start by decomposing the overall FDR into a sum of per‑variable contributions, FDR = ∑ₖ FDRₖ. Each FDRₖ is the expectation of the product of two factors: (i) the conditional probability that variable k is selected given a sufficient statistic Sₖ under the null hypothesis Hₖ, and (ii) an indicator that the usual two‑sided p‑value pₖ exceeds a threshold ζ, with a ζ‑dependent adjustment to keep the estimate conservatively biased.
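One way to write this decomposition in symbols is sketched below; the exact normalization (division by 1 − ζ, a Storey‑style correction consistent with "adjusted by ζ") is our reading of the summary, not necessarily the paper's precise estimator:

\[
\widehat{\mathrm{FDR}} \;=\; \sum_{k} \underbrace{\mathbb{E}\!\left[\mathbf{1}\{k \in \widehat{S}\} \,\middle|\, S_k\right]}_{\text{selection prob.\ under } H_k} \cdot \frac{\mathbf{1}\{p_k > \zeta\}}{1-\zeta},
\]

where dividing by 1 − ζ offsets the fact that only nulls with pₖ > ζ are counted, since null p‑values are (approximately) uniform on [0, 1].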
The first factor is estimated by Rao‑Blackwellization: because Sₖ is sufficient for the sub‑model under Hₖ, the conditional distribution of the data given Sₖ is free of unknown parameters, so the conditional expectation E[1{k selected} ∣ Sₖ] can be evaluated from the data alone.
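The assembly step described above can be sketched in a few lines. This is a schematic plug‑in, not the paper's algorithm: the function name, argument layout, and the (1 − ζ) correction are our assumptions, and the Rao‑Blackwellized selection probabilities are taken as already computed inputs.

```python
import numpy as np

def fdr_estimate(sel_prob, pvals, n_selected, zeta=0.5):
    """Schematic plug-in FDR estimate (illustrative, not the paper's method).

    sel_prob[k]  : Rao-Blackwellized P(variable k selected | S_k) under H_k
    pvals[k]     : two-sided p-value for variable k
    n_selected   : number of variables the procedure actually selected
    zeta         : p-value threshold; dividing by (1 - zeta) corrects for
                   counting only nulls with p_k > zeta, assuming null
                   p-values are uniform on [0, 1]
    """
    # Storey-style proxy for the null indicator 1{H_k true}
    null_proxy = (pvals > zeta) / (1.0 - zeta)
    # Sum per-variable contributions and normalize by the selection count
    return float(np.sum(sel_prob * null_proxy) / max(n_selected, 1))
```

For example, with two candidate variables whose conditional selection probabilities are both 0.5, p‑values 0.9 and 0.1, and two variables selected, the estimate is 0.5 · (1/0.5) / 2 = 0.5.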