Estimating the False Discovery Rate of Variable Selection
We introduce a generic estimator for the false discovery rate of any model selection procedure, in common statistical modeling settings including the Gaussian linear model, Gaussian graphical model, and model-X setting. We prove that our method has a conservative (non-negative) bias in finite samples under standard statistical assumptions, and provide a bootstrap method for assessing its standard error. For methods like the Lasso, forward-stepwise regression, and the graphical Lasso, our estimator serves as a valuable companion to cross-validation, illuminating the tradeoff between prediction error and variable selection accuracy as a function of the model complexity parameter.
💡 Research Summary
The paper introduces a universal estimator for the false discovery rate (FDR) of any variable‑selection procedure, covering common settings such as the Gaussian linear model, the Gaussian graphical model, and the non‑parametric model‑X framework. The authors start by decomposing the overall FDR into a sum of per‑variable contributions, FDR = ∑ₖ FDRₖ. Each FDRₖ is the expectation of the product of two factors: (i) the conditional probability that variable k is selected given a sufficient statistic Sₖ under the null hypothesis Hₖ, and (ii) an indicator that the usual two‑sided p‑value pₖ exceeds a threshold ζ, with a ζ‑dependent adjustment to keep the estimate conservatively biased.
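One way to write this decomposition in symbols is sketched below; the exact normalization (division by 1 − ζ, a Storey‑style correction consistent with "adjusted by ζ") is our reading of the summary, not necessarily the paper's precise estimator:

\[
\widehat{\mathrm{FDR}} \;=\; \sum_{k} \underbrace{\mathbb{E}\!\left[\mathbf{1}\{k \in \widehat{S}\} \,\middle|\, S_k\right]}_{\text{selection prob.\ under } H_k} \cdot \frac{\mathbf{1}\{p_k > \zeta\}}{1-\zeta},
\]

where dividing by 1 − ζ offsets the fact that only nulls with pₖ > ζ are counted, since null p‑values are (approximately) uniform on [0, 1].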
The first factor is estimated by Rao‑Blackwellization: because Sₖ is sufficient for the sub‑model under Hₖ, the conditional distribution of the data given Sₖ is free of unknown parameters, so the conditional expectation E[1{k selected} ∣ Sₖ] can be evaluated from the data alone.
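The assembly step described above can be sketched in a few lines. This is a schematic plug‑in, not the paper's algorithm: the function name, argument layout, and the (1 − ζ) correction are our assumptions, and the Rao‑Blackwellized selection probabilities are taken as already computed inputs.

```python
import numpy as np

def fdr_estimate(sel_prob, pvals, n_selected, zeta=0.5):
    """Schematic plug-in FDR estimate (illustrative, not the paper's method).

    sel_prob[k]  : Rao-Blackwellized P(variable k selected | S_k) under H_k
    pvals[k]     : two-sided p-value for variable k
    n_selected   : number of variables the procedure actually selected
    zeta         : p-value threshold; dividing by (1 - zeta) corrects for
                   counting only nulls with p_k > zeta, assuming null
                   p-values are uniform on [0, 1]
    """
    # Storey-style proxy for the null indicator 1{H_k true}
    null_proxy = (pvals > zeta) / (1.0 - zeta)
    # Sum per-variable contributions and normalize by the selection count
    return float(np.sum(sel_prob * null_proxy) / max(n_selected, 1))
```

For example, with two candidate variables whose conditional selection probabilities are both 0.5, p‑values 0.9 and 0.1, and two variables selected, the estimate is 0.5 · (1/0.5) / 2 = 0.5.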