Dealing with Stochasticity in Biological ODE Models

Mathematical modeling with Ordinary Differential Equations (ODEs) has proven to be extremely successful in a variety of fields, including biology. However, these models are completely deterministic given a certain set of initial conditions. We conver…

Authors: Hamda Ajmal, Michael Madden, Catherine Enright

Dealing with Stochasticity in Biological ODE Models
Dealing with Stochasticity in Biological ODE Models ∗ Hamda Ajmal, Michael Madden and Catherine Enright School of Computer Science, National Uni versity of Ireland Gal way Email: h.ajmal1@nuigalway.ie , michael.madden@nuigalway.ie , cathenright@gmail.com Abstract Mathematical modeling with Ordinary Dif ferential Equations (ODEs) has prov en to be extremely successful in a variety of fields, including biology . Ho wev er , these models are completely deterministic gi ven a certain set of initial conditions. W e con v ert mathematical ODE models of three benchmark biological systems to Dynamic Bayesian Networks (DBNs). The DBN model can handle model uncertainty and data uncertainty in a principled manner . They can be used for temporal data mining for noisy and missing v ariables. W e apply Particle Filtering algorithm to infer the model variables by re-estimating the models parameters of v arious biological ODE models. The model parameters are automatically re-estimated using temporal evidence in the form of data streams. The results show that DBNs are capable of inferring the model v ariables of the ODE model with high accuracy in situations where data is missing, incomplete, sparse and irregular and true v alues of model parameters are not kno wn. Intr oduction Mathematical models of physical systems are widely av ailable in many domains (Ottesen et al., 2004). These models embody e xisting expert kno wledge and can be considered sufficient statistics of all prior experimentation in the domain. Mathematical modeling with ODEs has a v ery long tradition in biology . The earliest biological ODE models date back to 18th century (Malthus, 1798; V erhulst, 1845) . ODE models assume that the observed dynamics of a system are exclusi vely dri ven by internal, deterministic mechanisms and there is no uncertainty in the process. Ho we v er , in reality , biological systems are always sub- ject to ’unexplainable’ influences which are neither easily understood nor is it feasible to model them explicitly . All biological dynamical systems ev olv e under stochastic forces. Models of bi- ological systems are concerned with subsystems of the real world, therefore ideally , they must include the ef fect of e xternal random influences as they cannot be completely isolated from model (Ditle vsen and Samson, 2013). There are many factors in biological processes that are difficult to model explicitly with ODEs, for example, harmonic oscillations, enzymatic processes, or in- di vidual characteristics like body mass index, genes, lifestyle, habits, age etc. Such factors can contribute to the erratic beha vior in a model. Other factors may include measurement noise or data uncertainty . These multiple sources of noise need to be modeled explicitly , ho we ver , simple ODE models tend to ignore them, which ef fects the analysis of the studied biological system and makes them impractical to be applied to the real world. There is also increasing evidence of stochastic behavior in critical biological processes, such as gene regulation and cellular behavior . All these factors point to the possible need to account for stochasticity in mathematical models (Lipniacki et al., 2006a,b). ∗ T o cite this paper , please use this full conference citation: Hamda Ajmal, Michael Madden and Catherine Enright: Dealing with stochasticity in biological ODE models. In Pr oceedings of W orkshop on Computational Biology (WCB) , co-located with 34th International Conference on Machine Learning (ICML 2017), Sydney , 2017. 1 ODE models generally describe population lev el behaviors. For example, ODE models already exist to describe the pharmacokinetics and pharmacodynamics of all drugs (see Shargel et al., 2007). They are dev eloped during clinical trials, published for all approved drugs and used to determine safe dosage regimes for the average patient. Therefore, parameterization of these models is generic, typically done in a theoretical manner or based on laboratory or e xperimental data. Ho we ver , the critically ill patient’ s unique parameters can vary considerably from those of the av erage patient, such population based models often fail to capture specific clinical scenarios. T o describe individuals, model parameters must be re-calibrated using observations of the in- di vidual. Howe v er , the observed data may be missing or noisy , or it could sparse or infrequent relati ve to the dynamics of the underlying system thus, making indi vidualisation a challenging task. Pre vious research work (Enright and Madden, 2015) in our research group has sho wn that we can overcome the limitations of ODE models by mapping them to Dynamic Bayesian Networks (DBNs) incorporating a first order Euler solver . DBNs are well suited to handle uncertainty and deal with noisy and missing data. Learning the structure of the DBNs is a challenging task, espe- cially in situations where data is sparse or incomplete. The expert knowledge a v ailable in form of ODEs can be utilized as ODEs are encapsulated into the DBN structure. DBNs can reason effi- ciently with this powerful combination of domain kno wledge and real time data. They explicitly model measurement uncertainty and parameter uncertainty , allowing model parameters to be ad- justed from initial approximate v alues to their correct values using real-time evidence. By doing this, the knowledge elicitation bottleneck is bypassed. The technique was previously applied to the problem of modeling glycaemia in patients in an Intensi ve Care Unit (Enright et al., 2010), producing promising results. W e apply the methodology of encapsulating ODE models into DBNs on three small to medium sized benchmark biological ODE models using the software package PR OFET 1 and e v aluate the results of ODE model v ariable inference and model parameter re-estimation. 1 DBNs for Biological ODE Models In order to individualize a general biological ODE model to a specific case, we con vert it into a DBN using the software application PR OFET . The DBN framew ork explicitly models noise as measurement and parameter uncertainty and then reduces the uncertainty over time by indi vidu- alizing model parameters using temporal e vidence. W e apply this approach on three benchmark biological ODE models which were used by Dondelinger et al. (2013) to apply their ODE model parameter inference methods. W e e v aluate the results of ODE model v ariable inference and model parameter re-estimation on these ODE models; howe v er our experimental setup is different from that of Dondelinger et al. (2013). The essence of their work is to infer ODE model parameters from multiple noisy time series data. Our aim is to individualise the model parameters on a single time series data stream. Therefore, for each ODE model, benchmark data is obtained using the R package deSolve. The values of the model parameters and the initial state of model v ariables to generate the benchmark data are taken from Dondelinger et al. (2013). Instead of adding noise to the data, we assume that the true values of the ODE model parameters are unknown and must be inferred from the population v alues. T o discover the correct model parameters, evidence (which 1 http://profet.it.nuigalway .ie/ 2 in these cases are the true values taken from the benchmark solutions) is incorporated. Complete details of methodology to con vert ODE models to DBNs incorporating first order Euler solver can be found in Enright (2012). Below we describe each model and e xplain the results. 1.1 The PIF4/5 Model As described in Dondelinger et al. (2013), this is a model of gene regulation of genes PIF4 and PIF5 by T OC1 in the ciracadian clock gene regulatory network of Arabidopsis thaliana . The com- plete network is described by a Locke 2-loop model (Locke et al., 2005). Follo wing Dondelinger et al. (2013), the model was simplified, such that, the genes PIF4 and PIF5 are combined as PIF4/5 . The DBN created from the P I F 4 / 5 ODE model is sho wn in Figure 1a. Model parameter s is the promoter strength, K d is the rate constant, h represents t he Hill coef ficient of the re gulation by T O C 1 and d is the degradation rate of P I F 4 / 5 . P I F 4 / 5 and T O C 1 represent the concentra- tion of the genes. The model parameters are modeled as continuous nodes with linear Gaussian distribution. The initial state model distribution of the DBN can be vie wed as the distrib ution of the population. W e assume that we do not know the real v alues of the model parameters but we only have a rough guess of the population parameters. The true values of the model parameters must be inferred from the data. Model parameters are allo wed to vary in each time step. Evidence for the observed data is sampled from the benchmark data at dif ferent time points. The e vidence is deliberately sampled at sparse and irregular interv als. Howe ver , it does not contain any noise. W e run standard fixed time step Particle Filtering algorithm (Gordon et al., 1993) to infer the v alues of model parameters and model variables. Figure 2a sho ws a comparison of benchmark solution and the predicted solution. It can be seen that ev en though incorrect values of model parameters were chosen at the outset, the accuracy of the predicted values of P I F 4 / 5 concentra- tion begins to improv e as evidence is incorporated into the system. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) of the predicted data is 0.070 and 0.035 respecti vely . 1.2 The Lotka-V olterra Model Predator Prey Model The Lotka-V olterra model, also known as the predator-pre y model is described by two first order , non linear ODEs. These were first proposed by Edward J. Lotka in 1910 and since then hav e been frequently used to describe the dynamics of biological systems in which two species interact, one as the predator and the other as the prey . The DBN structure corresponding to the ODE model is shown in Figure 1b. The model variable node X represent the number of prey and Y represents the number of some predator . Model parameters a, b, c and d are the positive real numbers that describe the interaction of the two species. This ODE model exhibits periodicity and the interactions between model v ariables are non-linear . (a) (b) (c) Figure 1: (a) DBN structure of the (a) The PIF4/5 Model (b) The Lotka-V olterra model (c) The Signal Ttransduction Cascade Model 3 W e follo w the similar steps described in the previous section and run the DBN inference in time interv al [0,2]. Sparse evidence is sampled from the benchmark data of the model v ariable X . Graph in Figure 2b shows predicted v alues of prey population ov er time. As before, can see that e ven though incorrect values of model parameter were chosen at the start, the predicted results are very close to those of benchmark solution. RMSE and MAE of predicted v alues of X is 0.287 and 0.137 respecti vely . 1.3 The Signal T ransduction Cascade Model The model of signal transduction cascade was described in Vyshemirsky and Girolami (2008). The DBN constructed by PR OFET for this ODE model is shown in Figure 1c. The input signal is represented by the concentration of protein S which can bind to protein R to form a complex R S which acti v ates protein R into its phosphorylated form Rpp . Protein Rpp can then be de-acti v ated back to protein R . The model also defines the degradation of input signal, that is, the con version of protein S into its degraded form dS . This system represents a realistic formulation of signal transduction as a mathematical model using mass action and Michaelis-Menten kinetics. W e run the DBN inference in time interval [0,100]. F ollo wing Dondelinger et al. (2013), we sample the e vidence of Rpp concentration at more time points during the earlier part of the time series where the dynamics tend to be faster . In Figure 2c, we plot the value of Rpp concentration predicted along with the benchmark solution. W e can see that the DBN was able to predict the v alues of model variable Rpp with high accuracy , ev en though incorrect model parameters were chosen in the outset. RMSE and MAE of the predicted data is 0.0085 and 0.0053 respectiv ely . (a) (b) (c) Figure 2: Predicted values of model variables in the ODE system. Dashed lines represent the benchmark solution. Solid lines are the predicted trajectories. Error bars sho w one standard deviation. Gray dots are the points where evidence is receiv ed. (a) The PIF4/5 model (b) Lotka-V olterra Model (c) The Signal T ransduction Cascade model Conclusions This paper has discussed the challenges of applying ODE modeling approaches to real world bi- ological systems. ODE models are deterministic and they can not account for the uncertainty in the real world. W e proposed to tackle this problem by con verting the ODE models to DBNs that incorporate a first order Euler solver as proposed by Enright et al. (2013). W e tested the methodol- ogy of v arious biological ODE models. W e simulated a real-life situation by assuming that the true v alues of the ODE model parameters are not known and data is sparse, incomplete and collected at irregular interv als. Our DBN framew ork uses Particle Filtering inference algorithm to infer the 4 v alues of model v ariables of the ODE model by re-estimating the values of model parameters at each time step. W e hav e sho wn that our DBN modeling and inference works well for this task. Refer ences Susanne Ditlevsen and Adeline Samson. Introduction to stochastic models in biology . In Stochastic biomath- ematical models , pages 3–35. Springer , 2013. Frank Dondelinger , Dirk Husmeier , Simon Rogers, and Maurizio Filippone. ODE parameter inference using adapti ve gradient matching with Gaussian processes. In AIST A TS , 2013. Catherine Enright. A pr obabilistic frame work based on mathematical models with application to medical data str eams . PhD thesis, 2012. Catherine Enright, Michael G Madden, Norm Aleks Stuart Russell, Geof frey Manley , John Laffe y , Brian Harte, Anne Mulvey , and Niall Madden. Modelling glycaemia in ICU patients a dynamic Bayesian net- work approach. In Pr oceedings of BIOSIGN ALS-2010, 3r d International J oint Confer ence on Biomedical Engineering Systems and T echnologies , V alencia, 2010. Catherine G Enright and Michael G Madden. Modelling and monitoring the indi vidual patient in real time. In P .F . Lucas A. Hommersom, editor , F oundations of Biomedical Knowledge Repr esentation , pages 107– 136. Springer , 2015. Catherine G Enright, Michael G Madden, and Niall Madden. Bayesian networks for mathematical mod- els: techniques for automatic construction and efficient inference. International Journal of Appr oximate Reasoning , 54(2), 2013. Neil J Gordon, David J Salmond, and Adrian FM Smith. Nov el approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE Pr oceedings F-Radar and Signal Pr ocessing , v olume 140. IET , 1993. T omasz Lipniacki, Pa wel Paszek, Allan R Brasier , Bruce A Luxon, and Marek Kimmel. Stochastic regula- tion in early immune response. Biophysical journal , 90:725–742, 2006a. T omasz Lipniacki, Pawel Paszek, Anna Marciniak-Czochra, Allan R Brasier, and Marek Kimmel. T ran- scriptional stochasticity in gene expression. Journal of theoretical biology , 238(2):348–367, 2006b. James CW Locke, Me gan M Southern, L ´ aszl ´ o K ozma-Bogn ´ ar , V ictoria Hibberd, Paul E Bro wn, Matthe w S T urner , and Andrew J Millar . Extension of a genetic network model by iterativ e experimentation and mathematical analysis. Molecular systems biology , 1(1), 2005. Alfred J Lotka. Contribution to the theory of periodic reactions. The Journal of Physical Chemistry , 14(3): 271–274, 1910. Thomas Malthus. An essay on the principle of population. 1798. Johnny T Ottesen, Mette S Olufsen, and Jesper K Larsen. Applied mathematical models in human physiol- ogy . Siam, 2004. Leon Shargel, Susanna W u-Pong, and Andrew BC Y u. Applied biopharmaceutics & pharmacokinetics . McGraw-Hill, 2007. 5 PF V erhulst. Mathematical researches into the law of population growth increase. Nouveaux M ´ emoir es de l’Acad ´ emie Royale des Sciences et Belles-Lettres de Bruxelles , 18:1–42, 1845. Vladislav Vyshemirsky and Mark A Girolami. Bayesian ranking of biochemical system models. Bioinfor- matics , 24(6), 2008. 6

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment