Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
The volcano plot displays an unstandardized signal (e.g., log-fold-change) against a noise-adjusted, standardized signal (e.g., the t-statistic or -log10(p-value) from the t-test). We review basic and interactive uses of the volcano plot and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines of the “double filtering” criterion. This review attempts to provide a unifying framework for discussing alternative measures of differential expression, improved methods for estimating variance, and the visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
💡 Research Summary
This review paper provides a comprehensive overview of volcano plots as a visual and analytical tool for differential expression studies using microarray data. It begins by contextualizing the evolution of microarray technology, noting that while the platform enables simultaneous measurement of thousands of mRNA transcripts, it also introduces challenges such as platform‑to‑platform variability, batch effects, noise characteristics, and limited dynamic range. The authors argue that many of these issues can be mitigated through careful experimental design, robust normalization, quality control, and appropriate statistical modeling.
The core of the manuscript focuses on two fundamental metrics used to assess differential expression: log-fold-change (log-FC) and the t-statistic derived from a two-sample t-test (or its Welch variant). By applying a log10 transformation to raw fluorescence intensities, the authors demonstrate that the distribution of expression values becomes approximately normal, which justifies the use of parametric tests. They derive the relationship between log-FC and the t-statistic, showing that the t-statistic is the log-FC (the difference between group means on the log scale) divided by the estimated standard error of that difference. This relationship highlights why the t-statistic incorporates both effect size and variability, whereas log-FC alone ignores noise.
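As a sketch of that relationship, the Python function below computes both quantities for a single gene from log-transformed intensities. The function name, the hypothetical intensities, and the switch between the Welch and pooled standard errors are illustrative assumptions, not the authors' code.

```python
import math
from statistics import mean, variance


def log_fc_and_t(group1, group2, welch=True):
    """Return (log-FC, t) for one gene from log-scale intensities.

    log-FC is the difference of the group means on the log scale; t is that
    difference divided by its estimated standard error.
    """
    n1, n2 = len(group1), len(group2)
    log_fc = mean(group1) - mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 denominator)
    if welch:
        # Welch: each group keeps its own variance.
        se = math.sqrt(v1 / n1 + v2 / n2)
    else:
        # Classical two-sample t-test: pooled variance across both groups.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return log_fc, log_fc / se


# Hypothetical raw intensities for one probe, log10-transformed first.
raw_cases = [850.0, 920.0, 1010.0]
raw_controls = [400.0, 380.0, 450.0]
fc, t = log_fc_and_t([math.log10(x) for x in raw_cases],
                     [math.log10(x) for x in raw_controls])
```

Because the same log-FC can be paired with a small or a large standard error, two genes with identical fold changes can land at very different heights on the t axis.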
A volcano plot is defined as a scatter plot with log-FC on the x-axis and either -log10(p-value) or the t-statistic on the y-axis. The paper illustrates that the two y-axis choices are highly correlated and produce similar visual patterns; when all genes share the same degrees of freedom, -log10(p-value) is in fact a monotone increasing function of |t|. The authors critique the common “double-filtering” approach, which applies independent thresholds on absolute log-FC and on the t-statistic (or p-value). In a volcano plot, this corresponds to a vertical and a horizontal cut-off line on each side, so that the selected genes lie in the two upper outer rectangular corners, away from the origin. While straightforward, double-filtering can discard potentially interesting genes that fail only one of the arbitrary cut-offs, such as genes with a modest fold change but a highly significant p-value, or genes with a large fold change but high variance (for example, a change driven by a few outlying samples).
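The double-filtering rule can be written as two independent tests per gene. The sketch below classifies a gene by which cut-offs it passes, which also makes the "selected by only one criterion" cases explicit. The cut-off values (0.3 on the log10 scale, roughly two-fold, and |t| of 3) and the gene names are hypothetical.

```python
from collections import Counter


def double_filter(log_fc, t_stat, fc_cut=0.3, t_cut=3.0):
    """Classify one gene under the double-filtering rule.

    Returns 'both', 'fc_only', 't_only' or 'neither' depending on which of the
    two independent cut-offs (|log-FC| >= fc_cut, |t| >= t_cut) it passes.
    fc_cut = 0.3 is roughly a two-fold change on the log10 scale.
    """
    passes_fc = abs(log_fc) >= fc_cut
    passes_t = abs(t_stat) >= t_cut
    if passes_fc and passes_t:
        return "both"
    if passes_fc:
        return "fc_only"
    if passes_t:
        return "t_only"
    return "neither"


# Hypothetical (log-FC, t) pairs for three probes.
genes = {"GENE_A": (0.8, 5.2), "GENE_B": (0.1, 7.9), "GENE_C": (1.1, 1.4)}
summary = Counter(double_filter(fc, t) for fc, t in genes.values())
print(summary)
```

Only genes labeled `both` are kept by double filtering; the `fc_only` and `t_only` genes are the ones whose exclusion depends entirely on where the two arbitrary lines are drawn.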
To address these limitations, the review introduces regularized (or moderated) statistics, in which the variance (or standard-error) estimate in the denominator of the t-statistic is augmented by a positive constant, often denoted s0. This stabilizes the statistic, especially when sample sizes are small, and yields a “joint filtering” criterion that appears as a curved decision boundary in the volcano plot rather than as two perpendicular lines. The curved boundary naturally balances effect size against variability, reducing the risk of selecting genes with spuriously high t-values caused by underestimated variance.
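A short derivation shows why the boundary curves. It assumes the regularized statistic has the form $t_{\text{reg}} = d/(\mathrm{SE} + s_0)$, with $d$ the log-FC, $\mathrm{SE}$ its standard error, $s_0 > 0$, and a hypothetical selection cut-off $|t_{\text{reg}}| \ge c$ with $c > 0$. Expressing the rule in the volcano-plot coordinates $(d, t)$, where $t = d/\mathrm{SE}$ is the ordinary t-statistic, so that $\mathrm{SE} = |d|/|t|$:

```latex
\begin{aligned}
|t_{\text{reg}}| \ge c
  &\iff |d| \ge c\,(\mathrm{SE} + s_0)
   = c\left(\frac{|d|}{|t|} + s_0\right) \\
  &\iff |d|\left(1 - \frac{c}{|t|}\right) \ge c\,s_0 \\
  &\iff |t| \ge \frac{c\,|d|}{|d| - c\,s_0}
   \qquad \text{for } |d| > c\,s_0 .
\end{aligned}
```

The selection boundary is therefore a hyperbola with a vertical asymptote at $|d| = c\,s_0$ and a horizontal asymptote at $|t| = c$. As $s_0 \to 0$ it collapses to the single horizontal line $|t| = c$, whereas double filtering instead uses the two straight lines $|d| = d_0$ and $|t| = t_0$.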
The authors demonstrate these concepts using a publicly available dataset (37 cases, 18 controls, Illumina platform, 48,804 probes). They show examples of genes that would be selected by only one of the double‑filtering criteria, emphasizing the biological interpretation pitfalls of each. Interactive exploration is illustrated with an R script that uses the identify() function: clicking a point on the volcano plot instantly displays the gene name and its statistics, facilitating rapid hypothesis generation.
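R's identify() works inside an interactive graphics device. The part that can be shown outside one is the lookup it performs, finding the plotted point nearest to a click. The Python function below is a hypothetical, non-interactive analogue of that lookup, not a port of identify(). The per-axis scale factors are assumptions that keep an axis with a much larger range (typically t) from dominating the distance.

```python
import math


def nearest_gene(click_x, click_y, log_fc, t_stat, names, x_scale=1.0, y_scale=1.0):
    """Return (name, log-FC, t) for the point closest to a click position.

    Distances are computed after dividing each axis by x_scale / y_scale so
    that an axis with a much larger range does not dominate the search.
    """
    if not names:
        raise ValueError("no points to search")
    best_i, best_dist = 0, math.inf
    for i, (x, y) in enumerate(zip(log_fc, t_stat)):
        dist = math.hypot((x - click_x) / x_scale, (y - click_y) / y_scale)
        if dist < best_dist:
            best_i, best_dist = i, dist
    return names[best_i], log_fc[best_i], t_stat[best_i]


# Hypothetical probes on a volcano plot (log-FC on x, t on y).
names = ["GENE_A", "GENE_B", "GENE_C"]
log_fc = [0.05, 0.9, -0.7]
t_stat = [0.4, 6.1, -4.2]
print(nearest_gene(0.85, 5.8, log_fc, t_stat, names, y_scale=5.0))
```

In matplotlib, for instance, this lookup could be called from a handler registered with `fig.canvas.mpl_connect("button_press_event", handler)`, passing the event's `xdata` and `ydata` as the click position.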
A substantial portion of the paper surveys R and Bioconductor packages relevant to volcano-plot generation and regularized statistics. The limma package implements moderated t-statistics via empirical Bayes shrinkage; edgeR and DESeq2 provide analogous methods for RNA-seq count data; EnhancedVolcano and ggplot2 extensions enable highly customizable plots; and Shiny applications allow web-based interactive exploration.
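The core of limma's moderated t-statistic is a shrinkage of each gene's variance toward a common prior value. Writing it out in Python makes the idea concrete for the two-group case. This is a minimal sketch, not limma's implementation. In particular, the prior variance and prior degrees of freedom are supplied by the caller here, whereas limma estimates them from all genes.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, prior_var, prior_df):
    """Two-group moderated t-statistic in the style of limma's empirical Bayes.

    The gene's pooled variance s2, with df = n1 + n2 - 2 residual degrees of
    freedom, is shrunk toward prior_var with weight prior_df. Both prior values
    are assumed inputs here; limma estimates them from all genes.
    Returns (moderated t, degrees of freedom df + prior_df).
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Posterior (shrunken) variance: weighted average of prior and gene-wise variance.
    s2_post = (prior_df * prior_var + df * s2) / (prior_df + df)
    log_fc = mean(group1) - mean(group2)
    t_mod = log_fc / math.sqrt(s2_post * (1 / n1 + 1 / n2))
    return t_mod, df + prior_df
```

With `prior_df = 0`, this reduces to the ordinary pooled t-statistic. As `prior_df` grows, a gene with an unusually small sample variance is pulled toward `prior_var`, which damps the spuriously large t-values discussed above.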
The review also proposes “stratified volcano plots,” where points are colored or sized according to additional attributes such as expression level, functional category, or sample subgroup. This enriches the plot with multi‑dimensional information without sacrificing the intuitive two‑dimensional layout.
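The summary does not say how the strata are defined. The Python sketch below assumes one simple option, splitting genes into equal-size groups by the rank of their average expression and mapping each group to a color. The bin count, the color values, and the expression values are all hypothetical.

```python
def expression_strata(avg_expr, n_strata=3):
    """Assign each gene a stratum 0..n_strata-1 by the rank of its average expression.

    Ranks are split into n_strata roughly equal-sized groups, so stratum 0 holds
    the least expressed genes and stratum n_strata-1 the most expressed.
    """
    n = len(avg_expr)
    order = sorted(range(n), key=lambda i: avg_expr[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // n
    return strata


# Hypothetical color per stratum (low, medium, high average expression).
STRATUM_COLORS = ["#9ecae1", "#4292c6", "#08306b"]

avg_expr = [2.1, 3.8, 1.2, 4.5, 2.9, 3.3]  # hypothetical mean log10 intensities
colors = [STRATUM_COLORS[s] for s in expression_strata(avg_expr)]
```

A list like `colors` can be passed as the `c` argument of a matplotlib `scatter` call over the log-FC and t values. Sizing points by an attribute works the same way through the `s` argument.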
In the discussion, the authors argue that volcano plots are not limited to microarray data. The same principle—displaying an unstandardized effect size against a standardized significance measure—applies to RNA‑seq, proteomics, metabolomics, and even non‑biological high‑throughput experiments where differential analysis is required. By unifying fold‑change and statistical significance in a single visual, volcano plots aid data exploration, gene selection for downstream validation, and clear communication of results in publications.
Overall, the paper positions the volcano plot as a bridge between simple descriptive statistics and sophisticated regularized inference, offering both a pedagogical framework for newcomers and practical guidance for experienced analysts seeking to improve the robustness and interpretability of their differential expression studies.