Statistical inference of transmission fidelity of DNA methylation patterns over somatic cell divisions in mammals
We develop Bayesian inference methods for a recently emerging type of epigenetic data to study the transmission fidelity of DNA methylation patterns over cell divisions. The data consist of parent-daughter double-stranded DNA methylation patterns with each pattern coming from a single cell and represented as an unordered pair of binary strings. The data are technically difficult and time-consuming to collect, putting a premium on an efficient inference method. Our aim is to estimate rates for the maintenance and de novo methylation events that gave rise to the observed patterns, while accounting for measurement error. We model data at multiple sites jointly, thus using whole-strand information, and considerably reduce confounding between parameters. We also adopt a hierarchical structure that allows for variation in rates across sites without an explosion in the effective number of parameters. Our context-specific priors capture the expected stationarity, or near-stationarity, of the stochastic process that generated the data analyzed here. This expected stationarity is shown to greatly increase the precision of the estimation. Applying our model to a data set collected at the human FMR1 locus, we find that measurement errors, generally ignored in similar studies, occur at a nontrivial rate (inappropriate bisulfite conversion error: 1.6% with 80% CI: 0.9–2.3%). Accounting for these errors has a substantial impact on estimates of key biological parameters. The estimated average failure of maintenance rate and daughter de novo rate decline from 0.04 to 0.024 and from 0.14 to 0.07, respectively, when errors are accounted for. Our results also provide evidence that de novo events may occur on both parent and daughter strands: the median parent and daughter de novo rates are 0.08 (80% CI: 0.04–0.13) and 0.07 (80% CI: 0.04–0.11), respectively.
Research Summary
The paper introduces a Bayesian framework designed to infer the fidelity with which DNA methylation patterns are transmitted across somatic cell divisions in mammals. The authors focus on a novel and technically demanding data type: parent‑daughter double‑stranded methylation patterns obtained from single cells, each represented as an unordered pair of binary strings (one for each DNA strand). Because such data are costly to generate, the method must extract maximal information from a limited sample size.
To achieve this, the authors construct a hierarchical Bayesian model that simultaneously analyzes all CpG sites within a given locus. The model contains separate parameters for maintenance methylation failure, de novo methylation on the parent strand, and de novo methylation on the daughter strand. By placing a hierarchical prior on site‑specific rates, the approach allows each site to have its own rates while sharing a common hyper‑distribution, thereby preventing an explosion in the effective number of parameters. A key innovation is the incorporation of a “stationarity” prior that reflects the biological expectation that the underlying stochastic process is (near) stationary over many divisions; this prior dramatically sharpens posterior estimates when data are sparse.
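The generative process described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: sites are treated as independent given their rates, site-specific rates are drawn from a shared Beta distribution (the function names, the mean/concentration parameterization, and the numeric values are all hypothetical), and a daughter-strand site is methylated with probability `mu` when the parent site is methylated and with probability `delta_d` otherwise.

```python
import random

def draw_site_rates(n_sites, mean, concentration, rng):
    """Draw one rate per CpG site from a shared Beta distribution.

    Assumed hierarchical layer: Beta(mean * c, (1 - mean) * c), so all sites
    share two hyperparameters instead of having n_sites free parameters.
    """
    a = mean * concentration
    b = (1.0 - mean) * concentration
    return [rng.betavariate(a, b) for _ in range(n_sites)]

def divide(parent, mu, delta_p, delta_d, rng):
    """Simulate one replication; return (parent_strand, daughter_strand) as 0/1 lists."""
    new_parent, daughter = [], []
    for j, m in enumerate(parent):
        if m == 1:
            # Parent site stays methylated; maintenance copies it with prob. mu[j].
            new_parent.append(1)
            daughter.append(1 if rng.random() < mu[j] else 0)
        else:
            # Unmethylated parent site: de novo events may methylate either strand.
            new_parent.append(1 if rng.random() < delta_p[j] else 0)
            daughter.append(1 if rng.random() < delta_d[j] else 0)
    return new_parent, daughter

def as_unordered_pair(strand_a, strand_b):
    """Represent a double-stranded pattern without labeling which strand is the parent."""
    return tuple(sorted((tuple(strand_a), tuple(strand_b))))

rng = random.Random(1)
n = 8
mu = draw_site_rates(n, 0.97, 50.0, rng)        # illustrative values only
delta_p = draw_site_rates(n, 0.08, 50.0, rng)
delta_d = draw_site_rates(n, 0.07, 50.0, rng)
pattern = as_unordered_pair(*divide([1, 1, 0, 1, 1, 1, 0, 1], mu, delta_p, delta_d, rng))
```

Because the observed pattern is an unordered pair, an inference procedure would have to consider both assignments of "parent" and "daughter" to the two strands, which is one reason whole-strand information helps separate the parameters.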
Crucially, the authors explicitly model measurement error, distinguishing inappropriate bisulfite conversion (a methylated cytosine is converted and therefore read as unmethylated, producing a false‑negative methylation call) from other error sources such as sequencing error. Each error type has its own parameter, estimated jointly with the biological rates. This contrasts with many earlier studies that either ignore errors or treat them as fixed, known quantities.
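A measurement-error layer of this kind can be sketched as a per-site noisy channel applied to the true strand. The two rate names below are assumptions made for illustration: `c` is the probability that a methylated site is read as unmethylated (inappropriate conversion), and `b` is the probability that an unmethylated site is read as methylated (for example, through failed conversion). The paper's exact error parameterization may differ.

```python
import random

def observe(strand, b, c, rng):
    """Return the strand as it would be read under two assumed error rates.

    b: P(read methylated | truly unmethylated)  -- hypothetical name
    c: P(read unmethylated | truly methylated)  -- hypothetical name
    """
    observed = []
    for m in strand:
        if m == 1:
            observed.append(0 if rng.random() < c else 1)
        else:
            observed.append(1 if rng.random() < b else 0)
    return observed

rng = random.Random(2)
true_strand = [1, 1, 1, 0, 1, 0, 1, 1]
read_strand = observe(true_strand, b=0.005, c=0.016, rng=rng)  # illustrative rates
```

In a joint model, the likelihood of an observed pattern sums over the true patterns that could have produced it through this channel, so the error rates and the biological rates are estimated together rather than fixed in advance.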
The methodology is applied to a dataset collected at the human FMR1 locus, comprising parent‑daughter pairs from a modest number of single cells. Posterior inference reveals an inappropriate bisulfite conversion error rate of 1.6 % (80 % credible interval 0.9–2.3 %). When this error is ignored, the estimated average maintenance failure rate is 0.04 and the daughter‑strand de novo rate is 0.14. After accounting for measurement error, these estimates drop to 0.024 and 0.07, respectively, indicating that neglecting error leads to substantial over‑estimation of biological rates.
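A back-of-envelope calculation suggests why ignoring a 1.6 % error rate can matter this much. The sketch below is a simplification, not the paper's model: it considers only a site whose parent copy is methylated, ignores errors on the parent strand, and uses hypothetical names (`c` for the rate at which a methylated site is read as unmethylated, `b` for the reverse error, set to zero here).

```python
def apparent_failure_rate(true_failure, c, b=0.0):
    """Probability that a daughter site is READ as unmethylated opposite a methylated parent site.

    Either maintenance truly failed and the site was read correctly,
    or maintenance succeeded and the site was misread as unmethylated.
    """
    return true_failure * (1.0 - b) + (1.0 - true_failure) * c

# Plugging in the point estimates quoted in the text, for illustration only.
print(round(apparent_failure_rate(0.024, 0.016), 4))  # 0.0396
```

Under these assumptions a true failure rate of 0.024 combined with a 1.6 % misread rate looks like a failure rate of roughly 0.04 when error is ignored, which is in line with the shift reported above.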
The analysis also provides evidence that de novo methylation can occur on both strands. The posterior median for the parent‑strand de novo rate is 0.08 (80 % CI 0.04–0.13) and for the daughter‑strand de novo rate is 0.07 (80 % CI 0.04–0.11). This finding challenges the common assumption that new methylation events are confined to the daughter strand during replication.
Model checking through posterior predictive simulations demonstrates that the hierarchical, whole‑strand approach reproduces the observed distribution of methylation patterns far better than models that treat sites independently or that omit error modeling. The stationarity‑informed priors further reduce posterior variance, confirming the utility of incorporating biologically realistic constraints.
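To see why a stationarity assumption constrains the parameters, consider a simplified single-site recursion (an assumption made here for illustration, not necessarily the form of the authors' prior). If a fraction `m` of strands is methylated at a site before division, the parent strand afterwards has density `m + (1 - m) * delta_p` and the daughter strand has density `m * mu + (1 - m) * delta_d`. Requiring the average of the two to equal `m` gives a closed form that ties the maintenance rate `mu` and the de novo rates to the methylation density.

```python
def next_density(m, mu, delta_p, delta_d):
    """Average methylation density of the two strands after one division."""
    parent_after = m + (1.0 - m) * delta_p
    daughter = m * mu + (1.0 - m) * delta_d
    return 0.5 * (parent_after + daughter)

def stationary_density(mu, delta_p, delta_d):
    """Fixed point of next_density: m = (dp + dd) / (1 + dp + dd - mu)."""
    return (delta_p + delta_d) / (1.0 + delta_p + delta_d - mu)

# Point estimates quoted in the text, used purely for illustration.
mu, delta_p, delta_d = 1.0 - 0.024, 0.08, 0.07

m = 0.5  # arbitrary starting density
for _ in range(200):
    m = next_density(m, mu, delta_p, delta_d)

print(round(m, 3), round(stationary_density(mu, delta_p, delta_d), 3))
```

With these illustrative values the recursion settles at a density of about 0.86. Under stationarity, an observed methylation density therefore carries information about the combination of rates, which is one way such a constraint can reduce posterior variance.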
In summary, the study delivers a statistically rigorous, biologically informed, and computationally efficient tool for quantifying methylation transmission fidelity from scarce, high‑resolution epigenetic data. By jointly modeling whole‑strand patterns, hierarchical site variation, and measurement error, the authors achieve markedly improved precision and uncover nuanced aspects of methylation dynamics: non‑negligible error rates, lower estimated maintenance failure rates once error is accounted for, and the possibility of de novo methylation on both parent and daughter strands. The framework is readily extensible to other loci, species, or disease contexts where accurate epigenetic inheritance estimates are essential for understanding development, aging, or tumorigenesis.