Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible


The interpretation of count data originating from the current generation of DNA sequencing platforms requires special attention. In particular, the per-sample library sizes often vary by orders of magnitude within the same sequencing run, and the counts are overdispersed relative to a simple Poisson model. These challenges can be addressed using an appropriate mixture model that simultaneously accounts for library size differences and biological variability. This approach is already well-characterized and implemented for RNA-Seq data in R packages such as edgeR and DESeq. We use statistical theory, extensive simulations, and empirical data to show that variance stabilizing normalization using a mixture model like the negative binomial is appropriate for microbiome count data. In simulations detecting differential abundance, normalization procedures based on a Gamma-Poisson mixture model provided systematic improvement in performance over crude proportions or rarefied counts, both of which led to a high rate of false positives. In simulations evaluating clustering accuracy, we found that the rarefying procedure discarded samples that were nevertheless accurately clustered by alternative methods, and that the choice of minimum library size threshold was critical in some settings, but with an optimum that is unknown in practice. Techniques that use variance stabilizing transformations by modeling microbiome count data with a mixture distribution, such as those implemented in edgeR and DESeq, substantially improved upon techniques that attempt to normalize by rarefying or crude proportions. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.

💡 Research Summary

The paper “Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible” provides a rigorous statistical critique of the common practice of rarefying (subsampling to equal library size) in microbiome sequencing analyses. The authors begin by highlighting that modern high‑throughput DNA sequencing generates count data with library sizes that can differ by orders of magnitude across samples. Because counts are discrete and over‑dispersed relative to a simple Poisson model, naïve normalization by converting counts to proportions or by rarefying fails to account for heteroscedasticity—the variance of a proportion estimate depends on the total number of reads in that sample.
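The dependence of a proportion's variance on library size can be illustrated with a short sketch. It assumes simple binomial sampling of reads (a simplification, since the paper argues real data are over-dispersed beyond this), and the function name `proportion_standard_error` is hypothetical.

```python
import math


def proportion_standard_error(p, n_reads):
    """Standard error of an observed proportion under binomial sampling.

    Assumption: reads are drawn independently, so the count of one OTU
    in a library of n_reads follows Binomial(n_reads, p).
    """
    return math.sqrt(p * (1.0 - p) / n_reads)


# Same underlying proportion, very different precision:
small_library = proportion_standard_error(0.5, 100)
large_library = proportion_standard_error(0.5, 10_000)
```

A hundredfold difference in library size gives a tenfold difference in standard error, so treating the two proportions as equally reliable ignores information the raw counts carry.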

Drawing on well‑established theory from RNA‑Seq, the authors advocate the use of a Gamma‑Poisson (negative‑binomial) hierarchical model: K_ij ∼ NB(s_j µ_i, φ_i), where s_j scales for library size, µ_i is the mean proportion for OTU i, and φ_i captures biological over‑dispersion. This model yields a variance ν = s_j µ_i + φ_i s_j² µ_i², reducing to Poisson when φ_i = 0. By sharing information across thousands of OTUs, the dispersion parameters can be estimated robustly even with few biological replicates, dramatically increasing power while controlling false‑positive rates.
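The variance formula above can be checked numerically with a minimal sketch. It is not the edgeR or DESeq implementation: it only draws counts from a Gamma-Poisson mixture using the Python standard library. The helper names and the parameter values (s_j = 1000, µ_i = 0.02, φ_i = 0.5) are assumptions chosen for illustration.

```python
import math
import random


def nb_variance(s_j, mu_i, phi_i):
    """Variance from the summary: s_j*mu_i + phi_i * s_j^2 * mu_i^2."""
    mean = s_j * mu_i
    return mean + phi_i * mean ** 2


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Suitable for modest lam only; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_gamma_poisson(s_j, mu_i, phi_i, rng):
    """Draw one count K_ij from the Gamma-Poisson (negative binomial) mixture."""
    mean = s_j * mu_i
    if phi_i == 0:
        return sample_poisson(mean, rng)  # no over-dispersion: plain Poisson
    # Gamma with shape 1/phi and scale mean*phi has mean `mean`
    # and variance phi * mean^2, which is the extra biological variance.
    rate = rng.gammavariate(1.0 / phi_i, mean * phi_i)
    return sample_poisson(rate, rng)


def empirical_mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


rng = random.Random(42)
draws = [sample_gamma_poisson(1000, 0.02, 0.5, rng) for _ in range(20_000)]
observed_mean, observed_variance = empirical_mean_and_variance(draws)
expected_variance = nb_variance(1000, 0.02, 0.5)  # 20 + 0.5 * 20**2
```

With these illustrative parameters the expected mean is 20 and the expected variance is 220, far above the Poisson variance of 20; the simulated values should land close to both.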

The paper then dissects the rarefying procedure: (1) choose a minimum library size N_min, (2) discard samples below that threshold, (3) randomly subsample the remaining libraries without replacement to N_min reads. The authors argue that this discards valid data, introduces unnecessary stochasticity, and forces all samples to share the worst‑case variance, thereby sacrificing statistical power. A simple two‑sample example (100 vs. 1,000 reads) demonstrates that rarefying eliminates a statistically significant difference that is evident in the raw counts. Empirical data from the GlobalPatterns and Long‑Term Dietary Patterns studies further illustrate that variance scales with mean abundance, confirming over‑dispersion and the inadequacy of Poisson‑based assumptions.
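The three rarefying steps, and the loss of power in a two-sample comparison, can be sketched as follows. The data layout, the function names, and the OTU counts are illustrative assumptions; the summary only states the library sizes (100 and 1,000 reads). The test statistic is a plain Pearson chi-square on a 2x2 table without continuity correction, which may differ from the test used in the paper.

```python
import random


def rarefy(counts_by_sample, n_min, rng):
    """Rarefy {sample: {otu: count}} to n_min reads per sample."""
    rarefied = {}
    for sample, otu_counts in counts_by_sample.items():
        library_size = sum(otu_counts.values())
        if library_size < n_min:
            continue  # step 2: discard samples below the threshold
        # Expand counts to one list entry per read, then
        # step 3: subsample without replacement down to n_min reads.
        reads = [otu for otu, count in otu_counts.items() for _ in range(count)]
        kept = rng.sample(reads, n_min)
        rarefied[sample] = {otu: kept.count(otu) for otu in otu_counts}
    return rarefied


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: sample A has 100 reads, sample B has 1,000 reads.
raw = {
    "A": {"OTU1": 62, "OTU2": 38},
    "B": {"OTU1": 500, "OTU2": 500},
}
rarefied = rarefy(raw, n_min=100, rng=random.Random(0))  # step 1: N_min = 100

# Statistic on the raw counts versus the counts B is expected to have
# after rarefying (its 50/50 split scaled down to 100 reads).
raw_statistic = chi_square_2x2(62, 38, 500, 500)
rarefied_statistic = chi_square_2x2(62, 38, 50, 50)
```

The 5% critical value for a chi-square statistic with one degree of freedom is 3.841. With these illustrative counts the raw table exceeds it while the rarefied table does not, even though the underlying proportions are unchanged.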

To quantify the impact, the authors design two simulation frameworks. Simulation A mimics descriptive studies that compare whole‑community structures (e.g., UniFrac distances) and evaluates clustering accuracy after different normalizations. Simulation B mimics differential abundance testing at the OTU level. Both simulations vary library size, effect size, and number of replicates, and they compare four approaches: (i) raw proportions, (ii) rarefied counts, (iii) negative‑binomial methods (edgeR, DESeq), and (iv) a zero‑inflated Gaussian model implemented in metagenomeSeq. Results consistently show that negative‑binomial methods achieve the lowest false‑positive rates and highest true‑positive rates across a wide range of conditions. MetagenomeSeq performs reasonably when many replicates are available but still exhibits a higher false‑positive tendency. Rarefying, by contrast, suffers from severe power loss and can even discard entire samples that would otherwise be correctly clustered.

The authors also provide a mathematical proof (Supplementary Text S1) that subsampling is statistically sub‑optimal because it inflates variance by a factor proportional to the reduction in library size. They propose a “common‑scale” transformation (rounding K_ij · s_min / s_j) as a minimal improvement over rarefying, yet it still suffers from data loss.
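The "common-scale" transformation described above can be sketched in a few lines. The function name and the list-based data layout are assumptions, and Python's built-in `round` rounds halves to the nearest even integer, which may differ from the rounding rule used in the paper.

```python
def common_scale(counts_by_sample):
    """Scale every library to the smallest library size and round.

    counts_by_sample maps a sample name to a list of OTU counts K_ij.
    Assumes every sample has at least one read.
    """
    library_sizes = {s: sum(counts) for s, counts in counts_by_sample.items()}
    s_min = min(library_sizes.values())
    return {
        sample: [round(k * s_min / library_sizes[sample]) for k in counts]
        for sample, counts in counts_by_sample.items()
    }


scaled = common_scale({"A": [62, 38], "B": [500, 500]})
```

Unlike rarefying, this step is deterministic and keeps every sample, but rounding still throws away the extra precision of the larger libraries, and the rounded totals are not guaranteed to equal the smallest library size exactly.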

In the discussion, the authors stress that rarefying is statistically inadmissible: it discards information that could be used to estimate dispersion and mean abundances more accurately. Instead, variance‑stabilizing transformations derived from the negative‑binomial model should be employed. They have extended the phyloseq R package with microbiome‑specific wrappers for edgeR and DESeq, enabling researchers to apply these robust methods without leaving the familiar phyloseq workflow.

In conclusion, the paper provides compelling theoretical, simulated, and empirical evidence that rarefying should be abandoned in microbiome research. Adoption of negative‑binomial mixture models, as implemented in edgeR, DESeq, and the extended phyloseq tools, offers superior power, better control of false discoveries, and retains all observed data, thereby improving the reliability and reproducibility of microbiome studies.
