Markov models for accumulating mutations
We introduce and analyze a waiting time model for the accumulation of genetic changes. The continuous time conjunctive Bayesian network is defined by a partially ordered set of mutations and by the rate of fixation of each mutation. The partial order encodes constraints on the order in which mutations can fixate in the population, shedding light on the mutational pathways underlying the evolutionary process. We study a censored version of the model and derive equations for an EM algorithm to perform maximum likelihood estimation of the model parameters. We also show how to select the maximum likelihood poset. The model is applied to genetic data from different cancers and from drug resistant HIV samples, indicating implications for diagnosis and treatment.
💡 Research Summary
The paper introduces a probabilistic framework for modeling the temporal accumulation of genetic alterations, called the continuous-time conjunctive Bayesian network (CTCBN). In this model, each mutation i is associated with a fixation rate λi, and the admissible orderings of mutations are encoded by a partially ordered set (poset). The poset captures biological constraints: a mutation can occur only after all of its predecessor mutations have fixed in the population. Consequently, the state of the system at any time is an order ideal of the poset, i.e., a subset that contains every predecessor of each of its elements. The dynamics are described by a continuous-time Markov process in which the only possible transitions add a single mutation whose predecessors are all already present (a minimal element of the complement of the current ideal), and the rate of each such transition equals the corresponding λi. This construction yields a sparse transition structure, and because only order ideals are reachable, the state space is typically much smaller than the 2ⁿ subsets of n mutations.
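The state space described above can be illustrated with a small sketch. The four-mutation poset and the rate values below are hypothetical, chosen only for illustration, and the brute-force enumeration is an assumption that suits small examples rather than the paper's own procedure.

```python
from itertools import combinations

# Hypothetical poset: each mutation maps to the set of its direct predecessors.
PREDECESSORS = {1: set(), 2: set(), 3: {1, 2}, 4: {3}}
# Hypothetical fixation rates (lambda_i), for illustration only.
RATES = {1: 1.0, 2: 0.5, 3: 2.0, 4: 0.8}


def is_ideal(state, preds):
    """True if every mutation in the state has all direct predecessors in it."""
    return all(preds[m] <= state for m in state)


def ideals(preds):
    """Enumerate all order ideals by brute force (only sensible for small n)."""
    items = sorted(preds)
    found = []
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            state = frozenset(combo)
            if is_ideal(state, preds):
                found.append(state)
    return found


def eligible(state, preds):
    """Mutations not yet present whose predecessors are all present."""
    return sorted(m for m in preds if m not in state and preds[m] <= state)


def transitions(state, preds, rates):
    """Map each reachable next state to its transition rate."""
    return {frozenset(state | {m}): rates[m] for m in eligible(state, preds)}


all_states = ideals(PREDECESSORS)
print(len(all_states), "ideals out of", 2 ** len(PREDECESSORS), "subsets")
print(eligible(frozenset(), PREDECESSORS))
```

In this toy poset only 6 of the 16 subsets are valid states, and from the empty state only mutations 1 and 2 can occur.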
A major methodological contribution is the treatment of censored observational data, which are common in clinical genomics, where some mutations may be unobserved or observed only after a limited follow-up period. The authors derive the complete-data likelihood for the CTCBN and develop an Expectation-Maximization (EM) algorithm that iteratively estimates both the fixation rates and the underlying poset. In the E-step, given the current parameter values, the expected number of occurrences of each unobserved mutation and the expected sojourn time in each state are computed analytically using properties of Poisson processes and the Markov transition matrix. In the M-step, the fixation rates are updated in closed form (λi = total expected occurrences of mutation i divided by the total expected time during which i was eligible), while the poset is refined by adding or removing ordering relations to maximize a penalized likelihood criterion.
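The closed-form rate update can be sketched for the simpler case in which all mutation times are fully observed. This is an illustrative assumption, not the censored-data EM itself: in an EM setting, the eligible times summed below would be replaced by their expectations given the observed data. The poset, the rate values, and the function names are hypothetical.

```python
import random

# Hypothetical poset and rates; labels are assumed to be in topological order.
PREDECESSORS = {1: set(), 2: set(), 3: {1, 2}, 4: {3}}
TRUE_RATES = {1: 1.0, 2: 0.5, 3: 2.0, 4: 0.8}


def simulate_times(preds, rates, rng):
    """Draw one complete set of mutation times.

    A mutation becomes eligible once its last predecessor has occurred, then
    waits an exponential time with its own rate.
    """
    times = {}
    for m in sorted(preds):  # relies on labels being topologically sorted
        start = max((times[p] for p in preds[m]), default=0.0)
        times[m] = start + rng.expovariate(rates[m])
    return times


def estimate_rates(samples, preds):
    """Rate estimate: number of occurrences divided by total eligible time."""
    estimates = {}
    for m in preds:
        eligible_time = sum(
            t[m] - max((t[p] for p in preds[m]), default=0.0) for t in samples
        )
        estimates[m] = len(samples) / eligible_time
    return estimates


rng = random.Random(0)
samples = [simulate_times(PREDECESSORS, TRUE_RATES, rng) for _ in range(20000)]
estimates = estimate_rates(samples, PREDECESSORS)
for m in sorted(estimates):
    print(m, round(estimates[m], 3), "true:", TRUE_RATES[m])
```

With enough simulated samples the estimates should land close to the rates used to generate them, which gives a quick sanity check of the update formula.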
Model selection is addressed through the Bayesian Information Criterion (BIC). Because the number of possible posets grows super-exponentially with the number of mutations, the authors propose a greedy search that incrementally modifies the current poset, evaluates the BIC score, and prunes branches that cannot improve the objective. Graph-theoretic preprocessing (e.g., topological sorting and computation of the transitive reduction) and branch-and-bound techniques are employed to keep the search tractable. Cross-validation and bootstrap resampling are used to assess the stability of the selected structure.
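The graph-theoretic preprocessing mentioned above can be illustrated with a minimal sketch of topological sorting and transitive reduction. The relation set is hypothetical and the helper names are assumptions; this is not the search procedure itself, only the two preprocessing steps named in the text, written with Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical order relations: mutation -> set of mutations that must precede it.
# The relation 1 -> 3 is redundant because it is implied by 1 -> 2 -> 3.
RELATIONS = {1: set(), 2: {1}, 3: {1, 2}}


def ancestors(preds):
    """Map each mutation to the set of all mutations that must precede it."""
    anc = {}
    # static_order() yields predecessors before the nodes that depend on them.
    for m in TopologicalSorter(preds).static_order():
        anc[m] = set()
        for p in preds[m]:
            anc[m] |= {p} | anc[p]
    return anc


def transitive_reduction(preds):
    """Keep only the relations not implied by a longer chain of relations."""
    anc = ancestors(preds)
    return {
        m: {p for p in ps if not any(p in anc[q] for q in ps if q != p)}
        for m, ps in preds.items()
    }


order = list(TopologicalSorter(RELATIONS).static_order())
reduced = transitive_reduction(RELATIONS)
print(order)
print(reduced)
```

Working with the reduced relations keeps each candidate poset in a canonical form, so two descriptions of the same partial order are not scored twice.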
The methodology is validated on simulated data where the true λi and poset are known; the EM algorithm reliably recovers both the rates and the poset. Real-world applications include several cancer cohorts (colorectal, breast, lung) and a dataset of HIV isolates that have developed resistance to antiretroviral therapy. In the cancer data, the inferred posets reveal that certain driver mutations (e.g., TP53, KRAS) consistently precede others, and the ordering correlates with patient survival and response to targeted therapies. In the HIV case, the model uncovers a stepwise acquisition of resistance mutations, suggesting how drug regimens might be sequenced to delay full resistance.
Overall, the paper demonstrates that representing mutational accumulation as a continuous‑time Markov process constrained by a poset provides a powerful and biologically interpretable way to capture epistatic and temporal dependencies among mutations. The EM‑based inference scheme handles censored data robustly, and the BIC‑guided structure search yields parsimonious yet predictive models. These advances have direct implications for precision oncology and infectious disease management, where understanding the likely pathways of genetic change can inform early diagnosis, prognostic assessment, and the design of adaptive treatment strategies.