Rejoinder: Microarrays, Empirical Bayes and the Two-Groups Model

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Rejoinder to “Microarrays, Empirical Bayes and the Two-Groups Model” [arXiv:0808.0572]


💡 Research Summary

The rejoinder addresses the criticisms and questions raised by several discussants of the original paper, “Microarrays, Empirical Bayes and the Two‑Groups Model.” The authors begin by reaffirming the basic premise of the two‑groups model: each gene belongs either to a null group (no differential expression) or to a non‑null group (true signals), and the test statistics of each group are modeled by their own probability distribution. The discussants argued that the theoretical null distribution N(0,1) is unrealistic for microarray data, whose null test statistics often show over‑ or under‑dispersion. In response, the authors detail an empirical null procedure that fits a normal distribution N(μ̂,σ̂²) to the central part of the histogram of observed test statistics. Two estimation methods, central matching and maximum‑likelihood estimation, are presented, and simulation studies show that both outperform the theoretical null in false discovery rate (FDR) control and statistical power.
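The central‑matching idea can be sketched in a few lines. In the two‑groups model the marginal density of the z‑values is f(z) = π0·f0(z) + (1 − π0)·f1(z). If the non‑null part is negligible near the center and f0 is N(μ, σ²), then log f(z) is approximately quadratic there, and the quadratic's coefficients determine μ̂, σ̂ and π0. This is a simplified sketch under stated assumptions, not the rejoinder's implementation: it fits the quadratic by ordinary least squares to the log of a histogram density over a window of half‑width 1 around the median (both choices are assumptions), and `central_matching` is a hypothetical helper, not the API of any package. The simulated data are made up for illustration.

```python
import numpy as np

def central_matching(z, half_width=1.0, n_bins=120):
    """Hypothetical helper: estimate (mu, sigma, pi0) of an empirical null N(mu, sigma^2)
    by fitting log f(z) ~ b0 + b1*z + b2*z^2 on the central histogram bins."""
    z = np.asarray(z, dtype=float)
    counts, edges = np.histogram(z, bins=n_bins)
    width = edges[1] - edges[0]
    centers = 0.5 * (edges[:-1] + edges[1:])
    density = counts / (z.size * width)              # histogram estimate of f(z)
    center = np.median(z)                            # assumption: window centred at the median
    keep = (np.abs(centers - center) <= half_width) & (counts > 0)
    b2, b1, b0 = np.polyfit(centers[keep], np.log(density[keep]), deg=2)
    if b2 >= 0:
        raise ValueError("central log-density is not concave; widen the window or add bins")
    # If f(z) = pi0 * N(mu, sigma^2) near the center:
    #   b2 = -1/(2 sigma^2), b1 = mu/sigma^2,
    #   b0 = log pi0 - 0.5*log(2 pi sigma^2) - mu^2/(2 sigma^2)
    sigma2 = -1.0 / (2.0 * b2)
    mu = b1 * sigma2
    log_pi0 = b0 + 0.5 * np.log(2.0 * np.pi * sigma2) + mu**2 / (2.0 * sigma2)
    return float(mu), float(np.sqrt(sigma2)), float(min(1.0, np.exp(log_pi0)))

# Usage on simulated data: 90% over-dispersed nulls N(0.1, 1.2^2), 10% non-nulls N(3, 1)
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.1, 1.2, 90_000), rng.normal(3.0, 1.0, 10_000)])
mu_hat, sigma_hat, pi0_hat = central_matching(z)
```

Some non‑null signal inevitably leaks into the central window, which biases σ̂ slightly upward; the window width trades that bias against the noise of fitting fewer bins.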

Next, the issue of dependence among genes is examined. Because genes can share pathways or be affected by batch effects, their test statistics are not strictly independent. The discussants suggested that such dependence could bias FDR estimates. The authors argue that, under the “weak dependence” regime typical of large‑scale testing, the impact on FDR is modest. They support this claim with empirical evidence from block bootstrap analyses on real data, which reveal only minor changes in estimated FDR. Moreover, they note that preprocessing techniques such as surrogate variable analysis (SVA) or removal of unwanted variation (RUV) can be combined with their framework to further mitigate dependence.
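One way to gauge how much gene‑gene dependence moves an FDR estimate is a moving‑block bootstrap: resample contiguous blocks of genes, so that correlated neighbours stay together, and recompute the estimate each time. The sketch below assumes that correlated genes sit next to each other in the input order, estimates the right‑tail Fdr as π0·P0(Z ≥ t) divided by the observed fraction of z‑values at or above t under a normal null, and uses a block length of 50 purely for illustration. `tail_fdr` and `block_bootstrap_fdr` are hypothetical helpers, not the procedure used in the rejoinder, and the simulated block‑correlated data are invented.

```python
import numpy as np
from math import erfc, sqrt

def tail_fdr(z, threshold, pi0=1.0, mu0=0.0, sigma0=1.0):
    """Right-tail Fdr estimate: pi0 * P0(Z >= t) / (observed fraction of z >= t)."""
    frac = np.mean(np.asarray(z) >= threshold)
    if frac == 0:
        return 1.0  # no discoveries: report the conservative value 1 (a convention)
    null_tail = 0.5 * erfc((threshold - mu0) / (sigma0 * sqrt(2.0)))  # P(Z >= t), Z ~ N(mu0, sigma0^2)
    return min(1.0, pi0 * null_tail / frac)

def block_bootstrap_fdr(z, threshold, block_len=50, n_boot=200, seed=0, **null_params):
    """Moving-block bootstrap distribution of tail_fdr; contiguous blocks keep correlated genes together."""
    z = np.asarray(z, dtype=float)
    n = z.size
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(z)")
    rng = np.random.default_rng(seed)
    n_blocks = -(-n // block_len)                     # ceil(n / block_len)
    offsets = np.arange(block_len)
    out = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)  # block starts, upper bound exclusive
        idx = (starts[:, None] + offsets).ravel()[:n]              # concatenate blocks, trim to n
        out[b] = tail_fdr(z[idx], threshold, **null_params)
    return out

# Usage: 2000 genes in blocks of 50 sharing a common factor (correlation 0.3), 100 non-nulls
rng = np.random.default_rng(1)
n_genes, block = 2000, 50
shared = np.repeat(rng.normal(size=n_genes // block), block)
z = np.sqrt(0.3) * shared + np.sqrt(0.7) * rng.normal(size=n_genes)
z[:100] += 3.0
boot = block_bootstrap_fdr(z, threshold=2.5)
```

A bootstrap spread that is small relative to the point estimate would be consistent with the modest effect described above; a wide spread suggests that dependence matters for the data at hand. Passing `mu0` and `sigma0` lets the same sketch use an empirical null instead of N(0,1).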

A major point of contention concerns the estimation of the local false discovery rate (lfdr). The original paper used kernel density estimation with a manually chosen bandwidth, which the discussants deemed subjective. To address this, the authors introduce an automatic bandwidth‑selection algorithm that uses cross‑validation to choose the bandwidth minimizing the estimated mean integrated squared error (MISE). This data‑driven approach yields smoother, more stable lfdr curves and reduces the risk of over‑ or under‑smoothing, which makes the results easier to interpret.
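The local false discovery rate is lfdr(z) = π0·f0(z)/f(z), so a kernel estimate of the marginal density f plugs straight into it. The sketch below chooses the Gaussian‑kernel bandwidth by least‑squares cross‑validation, which minimizes an estimate of the integrated squared error that is unbiased up to a constant; it is a generic, swapped‑in selector and not necessarily the rule described in the rejoinder. The theoretical null N(0,1), the value π0 = 0.9 in the usage lines, the candidate bandwidth grid and the helper names (`gauss_kde`, `lscv_score`, `lfdr_kde`) are all assumptions made for illustration.

```python
import numpy as np

SQRT2PI = np.sqrt(2.0 * np.pi)

def gauss_kde(x_eval, data, h):
    """Gaussian kernel density estimate of `data` with bandwidth h, evaluated at x_eval."""
    u = (np.asarray(x_eval)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * SQRT2PI)

def lscv_score(data, h):
    """Least-squares CV criterion: integral of fhat^2 minus 2 * mean leave-one-out density."""
    n = data.size
    d = (data[:, None] - data[None, :]) / h
    # Two N(0, h^2) kernels convolve to N(0, 2h^2), giving a closed form for the integral of fhat^2
    int_f2 = np.exp(-0.25 * d**2).sum() / (n**2 * h * 2.0 * np.sqrt(np.pi))
    k = np.exp(-0.5 * d**2) / SQRT2PI
    loo = (k.sum(axis=1) - k.diagonal()) / ((n - 1) * h)  # leave-one-out density at each point
    return int_f2 - 2.0 * loo.mean()

def lfdr_kde(z, pi0=1.0, mu0=0.0, sigma0=1.0, grid=None):
    """lfdr(z) = pi0 * f0(z) / fhat(z), with the bandwidth of fhat chosen by LSCV over `grid`."""
    z = np.asarray(z, dtype=float)
    if grid is None:
        # assumption: candidates spread around a Silverman-style scale s * n^(-1/5)
        grid = z.std(ddof=1) * z.size ** (-0.2) * np.linspace(0.2, 2.0, 30)
    grid = np.asarray(grid, dtype=float)
    h = grid[int(np.argmin([lscv_score(z, g) for g in grid]))]
    f_hat = gauss_kde(z, z, h)
    f0 = np.exp(-0.5 * ((z - mu0) / sigma0) ** 2) / (sigma0 * SQRT2PI)
    return np.clip(pi0 * f0 / f_hat, 0.0, 1.0), h

# Usage on simulated z-values: 900 nulls N(0, 1) and 100 non-nulls N(3, 1)
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(3.0, 1.0, 100)])
lfdr, h = lfdr_kde(z, pi0=0.9)
```

The pairwise‑difference matrices cost O(n²) memory, which is acceptable for a few thousand genes; genome‑scale studies would typically bin the z‑values first.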

The authors then illustrate the practical impact of these refinements on two publicly available microarray datasets: a leukemia study and a prostate‑cancer study. Applying the empirical null and the automatic bandwidth selection, they compare results with the classic Benjamini‑Hochberg (BH) procedure. The empirical Bayes approach identifies 15–20 % more significant genes while maintaining comparable validation rates, indicating higher power without sacrificing error control. In pathway‑enrichment analyses, the lfdr‑based selections uncover richer biological signals than the p‑value‑based BH method.
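For reference, the Benjamini‑Hochberg step‑up procedure used as the comparison baseline can be written compactly. This is a textbook sketch, not code from the rejoinder; `benjamini_hochberg` is a hypothetical helper name and the p‑values in the usage lines are invented.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m   # compare p_(i) with i*q/m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()                # largest i with p_(i) <= i*q/m (step-up)
        reject[order[: k + 1]] = True                 # reject the k+1 smallest p-values
    return reject

# Usage with ten made-up p-values at q = 0.05
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216])
rejected = benjamini_hochberg(pvals, q=0.05)
```

The step‑up rule rejects the k most significant genes for the largest k with m·p_(k)/k ≤ q. With one‑sided p‑values from the theoretical null, m·p_(k)/k is exactly the tail‑area Fdr estimate with π0 = 1 at the k‑th largest z‑value, which is why BH sits naturally beside the empirical Bayes quantities; the lfdr‑based selections instead threshold the local ratio π0·f0(z)/f(z) gene by gene.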

Finally, the rejoinder emphasizes the broader relevance of the empirical Bayes two‑groups framework for modern high‑dimensional data analysis. Even when prior information is scarce, as is common in “large‑p, small‑n” problems, the empirical Bayes methodology provides robust estimates of the proportion of nulls, the effect‑size distribution, and local error rates. The authors outline three directions for future research: extending the model to non‑Gaussian nulls (e.g., t‑distributions or normal mixtures), incorporating hierarchical or longitudinal testing structures, and combining empirical Bayes with Bayesian network models to capture complex dependency patterns. In sum, the rejoinder defends the original methodology, answers the discussants’ concerns with empirical and theoretical arguments, and reaffirms empirical Bayes and the two‑groups model as essential tools for contemporary genomic inference.

