mmcmcBayes: An R Package Implementing a Multistage MCMC Framework for Detecting the Differentially Methylated Regions
Identifying differentially methylated regions is an important task in epigenome-wide association studies, where differential signals often arise across groups of neighboring CpG sites. Many existing methods detect differentially methylated regions by aggregating CpG-level test results, which may limit their ability to capture complex regional methylation patterns. In this paper, we introduce the R package mmcmcBayes, which implements a multistage Markov chain Monte Carlo procedure for region-level detection of differentially methylated regions. The method models sample-wise regional methylation summaries using the alpha-skew generalized normal distribution and evaluates evidence for differential methylation between groups through Bayes factors. A multistage region-splitting strategy then refines candidate regions based on this statistical evidence. We describe the underlying methodology and software implementation, and illustrate the method's performance through simulation studies and applications to Illumina 450K methylation data. The mmcmcBayes package provides a practical region-level alternative to existing CpG-based methods for detecting differentially methylated regions and includes supporting functions for summarizing, comparing, and visualizing detected regions.
💡 Research Summary
The paper introduces mmcmcBayes, an R package that implements a multistage Markov chain Monte Carlo (MCMC) framework for detecting differentially methylated regions (DMRs) at the region level rather than aggregating CpG‑wise test statistics. After converting β‑values to M‑values via a logit transformation (with a tiny offset for numerical stability), the method models the regional M‑value summaries using the alpha‑skew generalized normal (ASGN) distribution. The ASGN distribution extends the normal by adding a skewness parameter (α) while retaining location (ν) and scale (δ²) parameters, allowing it to capture the asymmetric and potentially bimodal shapes observed in real methylation data.
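The β-to-M conversion described above can be sketched in a few lines. This is a minimal illustration in Python, not code from the package: the base-2 logarithm follows the usual M-value convention, while the offset value (1e-6), the way it is added to both numerator and denominator, and the function name `beta_to_m` are all assumptions made for this example.

```python
import math

def beta_to_m(beta, offset=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value.

    Assumed form: M = log2((beta + offset) / (1 - beta + offset)).
    The offset (hypothetical value) keeps the ratio finite at beta = 0 or 1.
    """
    return math.log2((beta + offset) / (1.0 - beta + offset))

# Beta-values at the boundaries would give -inf / +inf without the offset.
betas = [0.0, 0.1, 0.5, 0.9, 1.0]
m_values = [beta_to_m(b) for b in betas]

for b, m in zip(betas, m_values):
    print(f"beta = {b:.1f}  ->  M = {m:.3f}")
```

The transform is monotone and symmetric around β = 0.5 (which maps to M = 0), so the ordering of samples is preserved while the compressed tails near 0 and 1 are stretched out.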
For each genomic segment, the two groups (e.g., cancer vs. control) are compared under two competing hypotheses: under the null hypothesis (H₀) the groups share a common ASGN distribution, whereas under the alternative hypothesis (H₁) each group has its own ASGN parameters. Evidence for differential methylation is quantified by a Bayes factor (BF), computed from the posterior estimates of the ASGN parameters obtained via MCMC sampling. The multistage procedure starts with coarse genomic windows and recursively splits any segment whose BF exceeds a user‑specified threshold. Splitting continues up to a maximum number of stages (default = 3) or until no segment meets the splitting criterion. Posterior means from one stage become the prior means for the next, so information flows across stages, while the prior variance is kept fixed (typically 1) to avoid over‑constraining the later, finer analyses.
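The recursive splitting logic can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the package's implementation: `multistage_split` and `toy_evidence` are hypothetical names, the evidence function is a simple stand-in rather than an MCMC-based Bayes factor, the passing of posterior means to the next stage's priors is omitted, and the rule that a segment is reported only if it still passes at the final stage is an assumption.

```python
def multistage_split(start, end, evidence_fn, thresholds, num_splits=2, stage=1):
    """Return (start, end, stage) tuples for segments that survive every stage.

    start, end   -- half-open range of CpG indices [start, end)
    evidence_fn  -- callable (start, end) -> evidence score (stand-in for a BF)
    thresholds   -- one cutoff per stage; len(thresholds) is the maximum stage
    """
    if evidence_fn(start, end) <= thresholds[stage - 1]:
        return []  # not enough evidence at this stage: drop the segment
    n_sites = end - start
    if stage == len(thresholds) or n_sites < 2:
        return [(start, end, stage)]  # cannot refine further: report it
    # Split into at most num_splits contiguous, near-equal sub-segments.
    k = min(num_splits, n_sites)
    bounds = [start + (n_sites * i) // k for i in range(k + 1)]
    found = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        found.extend(
            multistage_split(lo, hi, evidence_fn, thresholds, num_splits, stage + 1)
        )
    return found

# Toy data: per-site difference in group means, nonzero only at sites 8..11.
site_diff = [0.0] * 8 + [2.0] * 4 + [0.0] * 4

def toy_evidence(lo, hi):
    """Stand-in score that grows with the mean absolute group difference."""
    mean_diff = sum(site_diff[lo:hi]) / (hi - lo)
    return 1.0 + 10.0 * abs(mean_diff)

# Stage-specific cutoffs (hypothetical values), three stages in total.
regions = multistage_split(0, 16, toy_evidence, thresholds=[3.0, 5.0, 10.0])
print(regions)  # [(8, 12, 3)]
```

In this toy run the full window passes stage 1, only its right half passes stage 2, and only sites 8 to 11 pass stage 3, which shows how the procedure narrows a coarse window down to the sites that carry the signal.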
The primary function mmcmcBayes() accepts two data frames (one per group) containing M‑values for each CpG site, ordered by chromosome and position. Optional arguments let users control the starting stage, maximum stages, number of sub‑segments per split (num_splits), MCMC settings (nburn, niter, thin), prior hyper‑parameters, and stage‑specific BF thresholds. Missing CpG values are ignored when computing region‑level means. Supporting functions (summarize_dmrs(), compare_dmrs(), plot_dmr_region()) facilitate downstream inspection, comparison across analyses, and visualization of mean M‑value profiles for identified DMRs.
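How missing values might be skipped when forming region-level means can be illustrated with a short Python sketch. It does not reproduce the package's internals: the layout (one row per CpG site, one column per sample), the use of `None`/NaN for missing entries, and the name `region_means` are assumptions made for this example.

```python
import math

def region_means(m_matrix):
    """Per-sample mean M-value over a region, skipping missing entries.

    m_matrix -- list of rows, one row per CpG site and one column per sample
                (assumed layout); missing values are None or NaN.
    Returns one mean per sample; NaN if a sample has no observed sites.
    """
    n_samples = len(m_matrix[0])
    means = []
    for j in range(n_samples):
        observed = [
            row[j] for row in m_matrix
            if row[j] is not None and not math.isnan(row[j])
        ]
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return means

# Three CpG sites, three samples; sample 2 misses one site, sample 3 misses all.
region = [
    [1.0, 2.0, None],
    [3.0, math.nan, None],
    [5.0, 4.0, None],
]
means = region_means(region)
print(means)  # [3.0, 3.0, nan]
```

Averaging over observed sites only keeps a sample in the analysis even when a few probes fail, at the cost of basing its regional summary on fewer CpGs.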
Simulation studies demonstrate that with max_stages = 3 and num_splits = 50, the method balances false discovery rate (FDR) and detection precision better than several popular CpG‑based tools (DMRcate, bumphunter, DSS, bsseq). The approach is particularly adept at locating narrow, asymmetric DMRs that would be missed by kernel‑smoothing or fixed‑window methods. Application to Illumina 450K array data from cancer vs. normal samples identified biologically plausible DMRs overlapping known oncogenes and tumor suppressors, many of which were not reported by existing packages.
Key strengths of mmcmcBayes include: (1) direct modeling of skewed/bimodal methylation distributions via ASGN, (2) adaptive region refinement without pre‑specified window sizes, and (3) intuitive Bayesian evidence through Bayes factors, avoiding permutation‑based p‑values. Limitations involve the computational burden of MCMC, especially for whole‑genome datasets, and sensitivity to initial region definition and BF thresholds, which may require user tuning. Future work suggested by the authors involves parallelizing the MCMC engine, automating initial region selection, and extending the framework to sequencing‑based methylation data (e.g., whole‑genome bisulfite sequencing).
In summary, mmcmcBayes provides a robust, flexible, and statistically principled alternative for region‑level differential methylation analysis, complementing and in some scenarios surpassing existing CpG‑centric methodologies.