We introduce and analyze a waiting time model for the accumulation of genetic changes. The continuous time conjunctive Bayesian network is defined by a partially ordered set of mutations and by the rate of fixation of each mutation. The partial order encodes constraints on the order in which mutations can fixate in the population, shedding light on the mutational pathways underlying the evolutionary process. We study a censored version of the model and derive equations for an EM algorithm to perform maximum likelihood estimation of the model parameters. We also show how to select the maximum likelihood poset. The model is applied to genetic data from different cancers and from drug resistant HIV samples, indicating implications for diagnosis and treatment.
The genetic progression of cancer is characterized by the accumulation of mutations in oncogenes and in tumor suppressor genes. Recent studies have shown that, during the somatic evolution of cancer, mutations in over 100 human genes are selected for, suggesting that they benefit the growth of the cancer cell (Sjöblom et al., 2006).
In HIV infection, the virus acquires mutations in CTL epitopes that interfere with the immune response. This evolutionary process is specific for the genetic makeup of the infected host. Recently, a total of 478 CTL escape mutations have been identified in the HIV genome (Brumme et al., 2007).
Under drug treatment, HIV develops mutations that confer resistance to the applied drugs. Eventually, this evolutionary escape leads to therapy failure. More than 50 drug resistance-associated mutations are known in three different HIV proteins (Johnson et al., 2006).
These three evolutionary scenarios have in common that, for the population of individuals, several mutations are available that increase fitness. Adaptation is therefore characterized by the accumulation of these beneficial mutations, which are virtually non-reversible.
In this paper, we introduce a statistical model for the accumulation of genetic changes. The continuous time conjunctive Bayesian network (CT-CBN) is a continuous time Markov chain model, defined by a partially ordered set (poset) of advantageous mutations, and the rate of fixation for each mutation. The partial order encodes constraints on the succession in which mutations can occur and fixate in the population. We assume that the fixation times follow independent exponential distributions. The exponential waiting process for a mutation starts only when all predecessor mutations of that mutation in the poset have already occurred. The order constraints and waiting times reveal important information on the underlying biological process with implications for diagnosis and treatment.
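To make the waiting process concrete, the following minimal Python sketch draws one set of fixation times from a CT-CBN. The function name `sample_ct_cbn`, the dictionary encoding of the poset (each mutation mapped to the list of mutations that must precede it), and the example rates are illustrative assumptions, not notation from this paper.

```python
import random


def sample_ct_cbn(parents, rates, rng):
    """Draw one vector of fixation times from a CT-CBN (illustrative sketch).

    parents: dict mapping each mutation to the list of mutations that must
             precede it (hypothetical encoding of the poset relations).
    rates:   dict mapping each mutation to its fixation rate (lambda > 0).
    rng:     a random.Random instance, passed in for reproducibility.
    """
    times = {}

    def time_of(mutation):
        if mutation not in times:
            # The exponential clock starts only once every predecessor
            # has occurred, i.e. at the latest predecessor time (0 if none).
            start = max((time_of(p) for p in parents[mutation]), default=0.0)
            times[mutation] = start + rng.expovariate(rates[mutation])
        return times[mutation]

    for mutation in parents:
        time_of(mutation)
    return times


# Hypothetical poset on four mutations: "c" requires both "a" and "b";
# "d" is unconstrained.
example_parents = {"a": [], "b": [], "c": ["a", "b"], "d": []}
example_rates = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 0.2}
example_times = sample_ct_cbn(example_parents, example_rates, random.Random(0))
```

In every draw, the time of `"c"` is at least the larger of the times of `"a"` and `"b"`, which is how the order constraints of the poset enter the model in this sketch.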
The CT-CBN is a continuous time analogue of the discrete conjunctive Bayesian network (D-CBN) introduced by Beerenwinkel et al. (2006). The D-CBN was shown to have very desirable statistical and algebraic properties (Beerenwinkel et al.). We argue that the continuous time CBN is the more natural model for the waiting process described above, and we explore the connection to the discrete CBN. A special case of the D-CBN, where the poset is a tree, is known as the oncogenetic or mutagenetic tree model (Desper et al., 1999; Beerenwinkel et al., 2005b,c). It has been applied to the somatic evolution of cancer (Radmacher et al., 2001; Rahnenführer et al., 2005) and to the evolution of drug resistance in HIV (Beerenwinkel et al., 2005a). The basic mutagenetic tree model has been extended to a mixture model (Beerenwinkel et al., 2005b) and to account for longitudinal data (Beerenwinkel and Drton, 2007).
A related tree model by von Heydebreck et al. (2004) represents the genetic changes at the leaves of the tree and regards the interior vertices as hidden events. Several authors have considered larger model classes, including general Bayesian networks (Simon et al., 2000; Deforche et al., 2006) and general Markov chain models on the state space of mutational patterns (Foulkes and DeGruttola, 2003; Hjelm et al., 2006). Compared to trees and posets, these models are more flexible in describing mutational pathways, but parameter estimation and model selection are considerably more difficult. In fact, the number of free parameters of these models is typically exponential in the number of mutations. By contrast, in the CT-CBN model, the number of free parameters equals the number of mutations. We demonstrate that parameter estimation and selection of an optimal poset can be performed efficiently for CT-CBNs. Thus, they provide an attractive framework for modeling the accumulation of mutations, especially if the number of mutations is moderate or large.
We formally define the CT-CBN in Section 2 and derive some basic properties of the model. The CT-CBN is an example of a regular exponential family with closed form maximum likelihood estimates (MLEs). In Section 3, we make precise the relation between the CT-CBN and the D-CBN. Section 4 deals with a censored version of the CT-CBN, which is most relevant for observed data. The censored model lacks a closed form expression for the MLE, but has a natural EM algorithm for approximating maximum likelihood estimates. We apply our methods in Section 5 to genetic data from cancer cells and from drug-resistant HIV. We close with a discussion in Section 6.
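As a rough illustration of why a closed form estimate is plausible, the sketch below assumes that all fixation times are observed exactly (no censoring) and applies the standard exponential rate estimate, the number of observations divided by the total waiting time, to each mutation separately. The waiting time of a mutation is measured from the latest of its predecessors, following the model description in the introduction. The function name `mle_rates` and the data layout are hypothetical, and this sketch is not a substitute for the formal derivation.

```python
def mle_rates(parents, samples):
    """Estimate fixation rates from fully observed fixation times (sketch).

    parents: dict mapping each mutation to the list of mutations that must
             precede it (hypothetical encoding of the poset relations).
    samples: list of dicts, each mapping every mutation to its observed
             fixation time in one individual.
    """
    n = len(samples)
    rates = {}
    for mutation, preds in parents.items():
        # Waiting time = fixation time minus the time at which the last
        # predecessor occurred (0 if the mutation has no predecessors).
        total_wait = sum(
            t[mutation] - max((t[p] for p in preds), default=0.0)
            for t in samples
        )
        # Standard MLE of an exponential rate: count / total waiting time.
        rates[mutation] = n / total_wait
    return rates


# Hypothetical toy data: mutation "b" can only occur after "a".
toy_parents = {"a": [], "b": ["a"]}
toy_samples = [{"a": 1.0, "b": 3.0}, {"a": 3.0, "b": 4.0}]
toy_rates = mle_rates(toy_parents, toy_samples)
```

With the toy data, the waiting times for `"a"` are 1 and 3, and those for `"b"` are 2 and 1, so the estimates are 2/4 and 2/3 respectively. The censored model discussed in Section 4 does not admit this shortcut, since the fixation times themselves are not observed.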
In this section, we introduce and describe some of the basic properties of continuous time conjunctive Bayesian networks (CT-CBN). These models are continuous time Markov chain models on the distributive lattice of a poset. To begin, we review some background material from combinatorics. The relevant combinatorial material can be found in introductory sections of Beerenwinkel et a