Determining the connectivity of gene networks and, at a more detailed level, the kind of interaction between genes is an important task that can nowadays be tackled with microarray-like technologies. Yet the number of unknowns is still large compared with the amount of data provided by a single microarray experiment, and reliable gene network retrieval procedures must therefore integrate all of the available biological knowledge, even when it comes from different sources and is of a different nature. In this paper we present a reverse engineering algorithm able to reveal the underlying gene network from time-series datasets on gene expression that record the system response to different perturbations. The approach is able to determine the sparsity of the gene network and to take into account possible {\it a priori} biological knowledge on it. The validity of the reverse engineering approach is highlighted through the deduction of the topology of several {\it simulated} gene networks, where we also discuss how the performance of the algorithm improves when the amount of data is enlarged or when {\it a priori} knowledge is considered. We also apply the algorithm to experimental data on a nine-gene network in {\it Escherichia coli}.
The amount and the timing of appearance of the transcriptional product of a gene are mostly determined by regulatory proteins, through biochemical reactions that enhance or block polymerase binding at the promoter region (Jacob and Monod 1961, Dickson et al. 1975). Since many genes code for regulatory proteins that can activate or repress other genes, the emerging picture is conveniently summarized as a complex network in which the genes are the nodes and a link between two genes is present if they interact. The identification of these networks is becoming one of the most relevant tasks of new large-scale genomic technologies such as DNA microarrays, since gene networks can provide a detailed understanding of the cell regulatory system, help unveil the function of previously unknown genes, and support the development of pharmaceutical compounds.
Different approaches have been proposed to describe gene networks (see (Filkov 2005) for a review), and different procedures have been proposed (Tong et al. 2002, Lee et al. 2002, Ideker et al. 2001, Davidson et al. 2002, Arkin et al. 1997, Yeung et al. 2002) to determine the network from experimental data. This is a computationally daunting task, which we address in the present work. Here we describe the network via deterministic evolution equations (Tegner et al. 2003, Bansal et al. 2006), which encode both the strength and the direction of the interaction between two genes, and we discuss a novel reverse engineering procedure to extract the network from experimental data. This procedure, though remaining a quantitative one, realizes one of the most important goals of modern systems biology: the integration of data of different types and of knowledge obtained by different means.
We assume that the rate of synthesis of a transcript is determined by the concentrations of every transcript in a cell and by external perturbations. The levels of the gene transcripts therefore form a dynamical system, which in the simplest scenario is described by the following set of ordinary differential equations (de Jong et al. 2002):
\begin{equation}
\dot{X}(t) = A\,X(t) + B\,U \tag{1}
\end{equation}
where $X(t) = (x_1(t), \dots, x_{N_g}(t))$ is a vector encoding the expression levels of the $N_g$ genes at time $t$, and $U$ is a vector encoding the strength of the $N_p$ external perturbations (for instance, every element $u_k$ could measure the density of a specific substance administered to the system). In this scenario the gene regulatory network is the matrix $A$ (of dimension $N_g \times N_g$), as the element $A_{ij}$ measures the influence of gene $j$ on gene $i$, with a positive $A_{ij}$ indicating activation, a negative one indicating repression, and a zero indicating no interaction.
The matrix $B$ (of dimension $N_g \times N_p$) encodes the coupling of the gene network with the $N_p$ external perturbations, as $B_{ik}$ measures the influence of the $k$-th perturbation on the $i$-th gene.
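To make the roles of the matrices A and B concrete, the following is a minimal sketch of the linear model integrated with a forward-Euler scheme. The three-gene network, the single perturbation, the numerical values, and the helper names `derivative` and `simulate` are all hypothetical choices made for illustration; they are not taken from the paper. Expression levels are read here as deviations from an unperturbed baseline, which is an assumption of this sketch, so negative values indicate a level below baseline.

```python
# Hypothetical 3-gene network; values are illustrative, not from the paper.
A = [
    [-1.0,  0.0,  0.0],  # gene 0: self-degradation only
    [ 0.8, -1.0,  0.0],  # gene 1: activated by gene 0 (A[1][0] > 0)
    [ 0.0, -0.6, -1.0],  # gene 2: repressed by gene 1 (A[2][1] < 0)
]
B = [
    [1.0],  # the single perturbation acts directly on gene 0 only
    [0.0],
    [0.0],
]
U = [0.5]  # constant perturbation strength (assumed)


def derivative(A, B, X, U):
    """Right-hand side of the linear model: A X + B U."""
    return [
        sum(A[i][j] * X[j] for j in range(len(X)))
        + sum(B[i][k] * U[k] for k in range(len(U)))
        for i in range(len(X))
    ]


def simulate(A, B, U, X0, dt=0.01, n_steps=2000):
    """Forward-Euler integration; returns the list of visited states."""
    X = list(X0)
    trajectory = [list(X)]
    for _ in range(n_steps):
        dX = derivative(A, B, X, U)
        X = [x + dt * dx for x, dx in zip(X, dX)]
        trajectory.append(list(X))
    return trajectory


trajectory = simulate(A, B, U, X0=[0.0, 0.0, 0.0])
final = trajectory[-1]
print("state at the end of the simulation:", final)
```

For these illustrative values the trajectory settles close to (0.5, 0.4, -0.24), which is the solution of A X + B U = 0: the perturbation raises gene 0, gene 0 activates gene 1, and gene 1 represses gene 2.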
A critical step in our construction is the choice of a linear differential system. Even though such a model rests on particular assumptions about the complex dynamics of a gene network, it seems to be the only practical approach, given the lack of knowledge of the real interaction mechanisms among thousands of genes. Even a simple nonlinear approach would give rise to an intractable number of free parameters. It must also be recognized that all other approaches or models have their weaknesses. For instance, Boolean models (which have very recently been applied to the inference of networks from time-series data, as in (Martin et al. 2007)) strongly discretize the data and use an arbitrary threshold to classify each gene as active or inactive at every time step. Dynamical Bayesian models, instead, are more data-demanding than linear models due to their probabilistic nature. Moreover, their space complexity grows like $N_g^4$ (at least in the famous Reveal Algorithm by K.P. Murphy (Murphy 2001)), which makes this tool suitable only for small networks.
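The remark on the number of free parameters can be illustrated with a simple count. The sketch below compares the parameters of the linear model (the entries of A and B) with the extra coefficients that would appear if each gene also depended on every pairwise product of expression levels. This quadratic extension is only a hypothetical example of "a simple nonlinear approach"; the paper does not specify one, and the function names are invented here.

```python
def linear_param_count(n_genes, n_perturbations):
    """Free parameters of the linear model: entries of A plus entries of B."""
    return n_genes * n_genes + n_genes * n_perturbations


def quadratic_extra_param_count(n_genes):
    """Extra coefficients if every gene also depended on all products
    x_j * x_k with j <= k (hypothetical extension, not from the paper)."""
    pairs = n_genes * (n_genes + 1) // 2  # unordered pairs, including j == k
    return n_genes * pairs


for n_genes in (10, 100, 1000):
    linear = linear_param_count(n_genes, n_perturbations=1)
    extra = quadratic_extra_param_count(n_genes)
    print(f"N_g={n_genes}: linear={linear}, extra quadratic={extra}")
```

Under this assumption the linear model grows quadratically with the number of genes, while the quadratic extension grows cubically, which is already far beyond what a handful of microarray time series can constrain for networks of thousands of genes.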
The linear model of Eq. (1) is suitable to describe the response of a system to small external perturbations. It can be recovered by expanding to first order, around the equilibrium condition $\dot{X}(t) = 0$, the dependency of $\dot{X}$ on $X$ and $U$, $\dot{X}(t) = f(X(t), U)$. Stability considerations ($X(t)$ must not diverge in time) require the eigenvalues of $A$ to have negative real parts. The model also makes clear that, if the perturbation $U$ is kept constant, it is not suitable to describe periodic systems such as the cell cycle, since in this case $X(t)$ asymptotically approaches a constant.
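The first-order expansion and the approach to a constant state can be written out explicitly. The derivation below is a sketch under stated assumptions: $(X_0, U_0)$ denotes an equilibrium point with $f(X_0, U_0) = 0$, the symbols $\delta X$ and $\delta U$ for the deviations are introduced here for illustration, and the variables of the linear model are read as these deviations.

```latex
% Assumption: (X_0, U_0) is an equilibrium, f(X_0, U_0) = 0, and
% \delta X = X - X_0, \delta U = U - U_0 are small deviations.
\begin{align}
\frac{d}{dt}\,\delta X(t)
  &= f\bigl(X_0 + \delta X(t),\, U_0 + \delta U\bigr) \\
  &\approx
   \underbrace{\left.\frac{\partial f}{\partial X}\right|_{(X_0,U_0)}}_{A}\,\delta X(t)
   + \underbrace{\left.\frac{\partial f}{\partial U}\right|_{(X_0,U_0)}}_{B}\,\delta U .
\end{align}
% Renaming \delta X -> X and \delta U -> U gives \dot{X}(t) = A X(t) + B U.
% For constant U and invertible A the solution is
\begin{align}
X(t) &= e^{At} X(0) + A^{-1}\bigl(e^{At} - I\bigr) B\,U, \\
\lim_{t\to\infty} X(t) &= -A^{-1} B\,U
  \quad \text{if every eigenvalue of } A \text{ has negative real part.}
\end{align}
```

When all eigenvalues of $A$ have negative real parts, $A$ is invertible and $e^{At}$ decays to zero, so any oscillation is damped and the state relaxes to the constant $-A^{-1} B\,U$. This is why a constant perturbation cannot sustain periodic behaviour in this model.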
Unfortunately, data from a given cell type involve thousands of responsive genes $N_g$. This means that many different regulatory networks are activated at the same time by the perturbations, while the number of measurements (microarray hybridizations) in typical experiments is much smaller than $N_g$. Consequently, inference methods can be successful only if restricted to a subset of the genes (i.e. a specific network) (Basso et al. 2005), or to the dynamics of gene subsets. The