Microarrays, Empirical Bayes and the Two-Groups Model
The classic frequentist theory of hypothesis testing developed by Neyman, Pearson and Fisher has a claim to being the twentieth century’s most influential piece of applied mathematics. Something new is happening in the twenty-first century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals and Bayesian competitors to the two-groups model.
💡 Research Summary
The paper addresses a fundamental shift in statistical hypothesis testing brought about by high‑throughput technologies such as microarrays, where thousands of simultaneous tests are routine. Classical Neyman‑Pearson‑Fisher theory was designed for a small number of independent tests and relies on a fixed null hypothesis, a p‑value, and family‑wise error control. In large‑scale settings these tools become inadequate: the sheer number of tests inflates the probability of false discoveries, and the assumption of a single, exact point null often fails to capture the subtle biological reality.
To meet these challenges the author introduces the “two-groups model,” a simple Bayesian mixture framework that treats the entire collection of test statistics as arising from a mixture of a null distribution $f_0(z)$ and a non-null distribution $f_1(z)$. The model has two key ingredients: the proportion of true nulls $\pi_0$ and the shape of the alternative distribution $f_1$. By estimating $\pi_0$ and $f_1$ directly from the data, one can compute the local false discovery rate (lfdr) for each test, i.e., the posterior probability that the corresponding null hypothesis is true given its test statistic. This posterior view provides richer information than a raw p-value and links naturally to the frequentist false discovery rate (FDR) framework.
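Written out explicitly (a standard statement of the model as described above; $\pi_1$ denotes the non-null proportion and $f(z)$ the marginal density of the test statistics):

$$f(z) = \pi_0 f_0(z) + \pi_1 f_1(z), \qquad \pi_1 = 1 - \pi_0,$$
$$\mathrm{lfdr}(z) = \Pr(\text{null} \mid z) = \frac{\pi_0 f_0(z)}{f(z)}.$$

Because $f(z)$ is the marginal density of all the z-values, it can be estimated from the full collection of test statistics, which is what makes this posterior quantity estimable in the empirical Bayes sense.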
The author carefully compares the two-groups approach with the Benjamini-Hochberg (BH) procedure. BH controls the expected proportion of false discoveries (the global FDR) by ordering the p-values and applying a step-up threshold. While BH is simple and widely used, it does not give test-specific posterior probabilities and relies on p-values computed under a fixed point null. In contrast, the empirical Bayes mixture yields test-specific lfdr values, allows adaptive thresholding (e.g., “adaptive BH”), and often gains power because the estimated $\pi_0$ can be substantially less than 1, reflecting the fact that many tests are truly non-null.
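For concreteness, here is a minimal sketch of the BH step-up rule as just described (the function name `benjamini_hochberg` and the level `q` are illustrative, not from the paper):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Benjamini-Hochberg step-up rule: find the largest k with
    p_(k) <= k * q / m and reject the k smallest p-values."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                 # ranks, smallest p-value first
    thresholds = q * np.arange(1, m + 1) / m  # BH comparison line
    passed = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True  # reject everything up to rank k
    return reject
```

The adaptive variant mentioned above simply replaces q by $q / \hat{\pi}_0$ once $\hat{\pi}_0$ has been estimated, which enlarges the rejection region when a substantial fraction of the tests appears to be non-null.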
A substantial portion of the paper is devoted to the definition of the null hypothesis in large-scale testing. The author argues that a strict point null ($\mu = 0$) is rarely appropriate for genomic data; instead, a “practical” or “interval” null that tolerates small effect sizes is more realistic. The empirical Bayes framework accommodates this notion by letting the data determine an empirical null distribution, which may be wider than, or shifted relative to, the theoretical standard normal.
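A crude illustration of how such an empirical null might be read off the data (a median/IQR heuristic that assumes the central bulk of the z-values is dominated by nulls; the refined estimators in the literature are more careful than this):

```python
import numpy as np

def empirical_null(z):
    """Rough empirical-null fit N(delta0, sigma0^2) using only the center
    of the z-value distribution, on the premise that the central bulk is
    dominated by null cases. Median and interquartile range keep the
    (interesting) tails from influencing the fit."""
    z = np.asarray(z, dtype=float)
    delta0 = np.median(z)
    iqr = np.percentile(z, 75) - np.percentile(z, 25)
    sigma0 = iqr / 1.349       # IQR of a standard normal is about 1.349
    return delta0, sigma0
```

A fit close to $(\delta_0, \sigma_0) = (0, 1)$ says the theoretical N(0, 1) null is adequate; a wider or shifted fit is exactly the kind of departure described above, and it is this empirical null, rather than the theoretical one, that would then enter the lfdr formula.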
Permutation methods, long a staple for assessing significance when parametric assumptions are doubtful, are examined critically. While permutation preserves the joint dependence structure and yields exact null distributions for a given test statistic, it is computationally intensive for millions of tests and does not provide individual lfdr estimates. Moreover, permutation assumes exchangeability, which can be violated in the presence of complex batch effects or heteroscedasticity. The two‑groups model, by contrast, offers a scalable alternative: it uses the observed marginal distribution of test statistics to infer the mixture components, sidestepping the need for exhaustive resampling.
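To make the comparison concrete, here is a minimal sketch of a label-permutation null for gene-wise two-sample statistics (the mean-difference statistic and the data layout are illustrative choices, not taken from the paper):

```python
import numpy as np

def permutation_null(X, labels, n_perm=1000, seed=None):
    """Pooled permutation null for gene-wise mean differences.
    X is a genes-by-samples matrix; labels is a boolean vector marking
    the two groups. Shuffling the labels preserves the correlation
    structure across genes but assumes the samples are exchangeable."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    null_stats = []
    for _ in range(n_perm):
        perm = rng.permutation(labels)                         # shuffled group labels
        diff = X[:, perm].mean(axis=1) - X[:, ~perm].mean(axis=1)
        null_stats.append(diff)
    return np.concatenate(null_stats)                          # pooled null values
```

Even this toy version makes the costs visible: every permutation recomputes all gene-level statistics, and the output is a single pooled null distribution rather than a case-by-case posterior probability.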
The paper extends the mixture framework to group-level inference, such as testing whether an entire pathway or gene set shows coordinated differential expression. By aggregating test statistics within a set and computing a set-wise lfdr, one can assess the evidence for collective activity while still controlling the overall FDR. The author also discusses correlation among tests, noting that positive dependence, if ignored, can push the attained false discovery proportion well above its nominal level. He proposes empirical null estimation that absorbs much of this correlation effect and suggests bootstrap-based adjustments to maintain accurate error rates.
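One simple way to turn gene-level z-values into set-level scores, in the spirit of the aggregation described above (averaging within a set and standardizing by set size are illustrative choices, not the paper's prescription):

```python
import numpy as np

def gene_set_scores(z, gene_sets, delta0=0.0, sigma0=1.0):
    """Aggregate gene-level z-values into set-level z-scores. Under the
    (possibly empirical) null N(delta0, sigma0^2) and independence within
    a set of size k, the set mean has standard deviation sigma0 / sqrt(k),
    so the standardized mean is again a z-value that can be fed back into
    the same two-groups / lfdr machinery at the set level."""
    z = np.asarray(z, dtype=float)
    scores = {}
    for name, idx in gene_sets.items():
        members = z[np.asarray(idx)]
        k = members.size
        scores[name] = (members.mean() - delta0) / (sigma0 / np.sqrt(k))
    return scores
```

The within-set independence assumption is precisely where the correlation issues noted above bite, so in practice the set-level null would itself be calibrated empirically rather than taken from this formula at face value.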
Finally, the author explores the construction of multiple confidence intervals in a high-dimensional context. Using the posterior distribution from the mixture model, one can form credible intervals for each effect size and, by borrowing strength across tests, construct simultaneous credible regions that respect a pre-specified overall coverage probability. This Bayesian approach yields intervals that adapt naturally to the amount of information in each test and avoids the conservatism of traditional Bonferroni-adjusted intervals.
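As a toy illustration of the borrowing-strength idea (a plain normal-normal shrinkage model with the prior variance estimated from the z-values; a stand-in for, not a reproduction of, the paper's interval construction):

```python
import numpy as np
from scipy.stats import norm

def shrinkage_intervals(z, coverage=0.95):
    """Toy empirical Bayes intervals: model z_i ~ N(mu_i, 1) with
    mu_i ~ N(0, tau^2), estimate tau^2 from the overall spread of the
    z-values, and return each posterior credible interval for mu_i.
    The intervals shrink toward 0 when the data say most effects are small."""
    z = np.asarray(z, dtype=float)
    tau2 = max(np.var(z) - 1.0, 0.0)       # marginal variance of z is tau^2 + 1
    shrink = tau2 / (tau2 + 1.0)           # posterior mean multiplier
    post_mean = shrink * z
    post_sd = np.sqrt(shrink)              # posterior sd, same for every case
    half = norm.ppf(0.5 + coverage / 2) * post_sd
    return post_mean - half, post_mean + half
```

Even this stripped-down version shows the adaptivity: when the estimated prior variance is small, the intervals concentrate near zero rather than being widened test by test, though a genuinely simultaneous coverage guarantee requires the more careful constructions the paragraph above alludes to.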
In summary, the paper presents a compelling synthesis of Bayesian and frequentist ideas tailored to the realities of modern genomics. The empirical Bayes two-groups model provides a principled way to estimate the proportion of true nulls, to compute test-specific posterior error probabilities, and to adaptively control the false discovery rate. It addresses practical concerns such as the choice of null, the limitations of permutation, the handling of correlated tests, and the extension to pathway-level analysis. By bridging the two statistical paradigms, the author offers a flexible, powerful, and computationally feasible toolkit for large-scale hypothesis testing, with implications far beyond microarrays to any domain where massive multiple testing is the norm.