Inferring dynamic genetic networks with low order independencies


📝 Original Info

  • Title: Inferring dynamic genetic networks with low order independencies
  • ArXiv ID: 0704.2551
  • Date: 2009-05-29
  • Authors: Researchers from original ArXiv paper

📝 Abstract

In this paper, we propose a novel inference method for dynamic genetic networks that can cope with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of a low order conditional dependence graph, which we extend here to the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given the past of the process. Then, to cope with the large p and small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data analysis. The inference procedure is implemented in the R package 'G1DBN', freely available from the CRAN archive.

📄 Full Content

The development of microarray technology makes it possible to measure the expression levels of many genes simultaneously at a precise time point. It has thus become possible to observe gene expression levels across a whole process, such as the cell cycle or the response to radiation or to different treatments. The objective is now to recover gene regulation phenomena from these data. We are looking for simple relationships such as "gene i activates gene j", but we also want to capture more complex scenarios such as auto-regulations, feed-forward loops and multi-component loops, as described by Lee et al. [21] in the case of the transcriptional regulatory network of the yeast Saccharomyces cerevisiae.

To this end, we need both to take temporal dependencies into account accurately and to deal with the dimension of the problem when the number p of observed genes is much higher than the number n of observation time points. Moreover, we know that most of the genes whose expression has been monitored using microarrays do not take part in the temporal evolution of the system. We therefore want to determine the few 'active' genes that are involved in the regulatory machinery, as well as the relationships between them. In short, we want to infer a network representing the dependence relationships that govern a system composed of several agents, from the observation of their activity across short time series.

Static Modelling

Such gene networks were first described using static modelling and mainly undirected networks. One of the first tools used to describe interactions between genes is the relevance network [5] or correlation network [36]. Better known as the covariance graph [7] in graphical models theory, this undirected graph describes the pairwise correlation between genes. Its topology is derived from the covariance matrix of the gene expression levels: an undirected edge is drawn between two variables whenever they are correlated. However, the correlation between two variables may be caused by their links with other variables, which creates spurious edges that reflect only indirect dependence relationships.
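The spurious-edge problem can be illustrated with a small simulation. The sketch below is not taken from the paper: the three-gene chain, its coefficients and the 0.3 threshold are all hypothetical choices made only for illustration. Gene 1 drives gene 2, gene 2 drives gene 3, and there is no direct link between genes 1 and 3, yet a thresholded correlation graph still connects them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # many samples, so sample correlations are close to their true values

# Hypothetical regulation chain (an assumption for illustration):
# gene 1 -> gene 2 -> gene 3, with no direct effect of gene 1 on gene 3.
y1 = rng.normal(size=n)
y2 = 0.9 * y1 + rng.normal(scale=0.5, size=n)
y3 = 0.9 * y2 + rng.normal(scale=0.5, size=n)
Y = np.column_stack([y1, y2, y3])

# Pairwise correlation matrix (columns are genes).
corr = np.corrcoef(Y, rowvar=False)

# Correlation (covariance) graph: draw an edge when |correlation| is large.
threshold = 0.3  # arbitrary cut-off, chosen only for this example
edges = [
    (i + 1, j + 1)
    for i in range(3)
    for j in range(i + 1, 3)
    if abs(corr[i, j]) > threshold
]
print(edges)  # the pair (1, 3) appears although the link is only indirect
```

With these settings the true correlation between genes 1 and 3 is about 0.77, so the indirect pair is reported as an edge alongside the two direct ones.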

Consequently, there has been great interest in the concentration graph [20], also called the covariance selection model, which describes the conditional dependence structure between gene expression levels using Graphical Gaussian Models (GGMs). Let $Y = (Y_i)_{1 \le i \le p}$ be a multivariate Gaussian vector representing the expression levels of p genes. An undirected edge is drawn between two variables $Y_i$ and $Y_j$ whenever they are conditionally dependent given the remaining variables (see Figure 1B). The standard theory of estimation in GGMs [20,46] can be exploited only when the number of measurements n is much higher than the number of variables p. This ensures that the sample covariance matrix is positive definite with probability one. However, in most microarray gene expression datasets, we have to cope with the opposite situation ($n \ll p$). The growing interest in "small n, large p" problems has thus furthered the development of numerous alternatives (Schäfer and Strimmer [31,32], Waddell and Kishino [44,43], Toh and Horimoto [40,41], Wu et al. [50], Wang et al. [45]). Even though concentration graphs make it possible to point out some dependence relationships between genes, they do not offer an accurate description of the interactions. Firstly, no direction is given to the interactions. Secondly, some motifs containing cycles, as in Figure 1A, cannot be properly represented.

Figure 1: (B) The concentration graph corresponding to the motif A. For all i ≥ 3, $Y_i$ is a Gaussian variable representing the expression level of gene $G_i$. Some cycles cannot be represented on the concentration graph. (C) Dynamic network equivalent to the regulation motif A. Each vertex $X^i_t$ represents the expression level of gene $G_i$ at time t. This graph is acyclic and makes it possible to define a Bayesian network.
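Both points about the concentration graph can be checked numerically. The following is a minimal sketch under stated assumptions, not the estimation procedure of any of the cited works: the helper `partial_correlations`, the simulated three-gene chain and the sizes `n_small` and `p` are hypothetical. It uses the standard Gaussian identity that the partial correlation between variables i and j given all others is $-K_{ij} / \sqrt{K_{ii} K_{jj}}$, where K is the inverse covariance (concentration) matrix.

```python
import numpy as np

def partial_correlations(Y):
    """Partial correlations given all remaining variables (rows of Y are samples)."""
    S = np.cov(Y, rowvar=False)   # sample covariance matrix
    K = np.linalg.inv(S)          # concentration (precision) matrix
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)       # -K_ij / sqrt(K_ii * K_jj)
    np.fill_diagonal(P, 1.0)
    return P

rng = np.random.default_rng(0)

# Case n >> p: hypothetical chain gene 1 -> gene 2 -> gene 3.
n = 2000
y1 = rng.normal(size=n)
y2 = 0.9 * y1 + rng.normal(scale=0.5, size=n)
y3 = 0.9 * y2 + rng.normal(scale=0.5, size=n)
pcor = partial_correlations(np.column_stack([y1, y2, y3]))
print(np.round(pcor, 2))  # entry (1, 3) is close to 0, entries (1, 2) and (2, 3) are not

# Case n << p: the sample covariance matrix has rank at most n - 1 < p,
# so it is singular and cannot be inverted to obtain K.
n_small, p = 10, 50
S_small = np.cov(rng.normal(size=(n_small, p)), rowvar=False)
rank = np.linalg.matrix_rank(S_small)
print(rank, "<", p)
```

In the first case the indirect pair (1, 3) receives a partial correlation near zero, so no edge would be drawn between them. In the second case the rank deficiency is what prevents the standard GGM estimator from being applied directly.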

In contrast to the undirected graphs above, Bayesian networks (BNs) [13] model directed relationships. Based on a probabilistic measure, a BN representation of a model is defined by a Directed Acyclic Graph (DAG) and the set of conditional probability distributions of each variable given its parents in the DAG [28]. The theory of graphical models [46,9,20] then makes it possible to derive conditional independencies from this DAG. However, the acyclicity constraint in static BNs is a serious restriction given the expected structure of genetic networks.
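The definition of a BN by a DAG together with the conditional distribution of each variable given its parents corresponds to the usual factorization of the joint distribution. In the expression below, the notation $\mathrm{pa}(i)$ for the set of parents of vertex i in the DAG is an assumption introduced here and does not appear in the text above.

```latex
P(Y_1, \dots, Y_p) \;=\; \prod_{i=1}^{p} P\bigl(Y_i \mid Y_{\mathrm{pa}(i)}\bigr)
```

For example, for a chain $Y_1 \to Y_2 \to Y_3$ this gives $P(Y_1)\,P(Y_2 \mid Y_1)\,P(Y_3 \mid Y_2)$, from which one reads that $Y_3$ is independent of $Y_1$ given $Y_2$.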

Dynamic Bayesian networks

This limitation can be overcome by employing Dynamic Bayesian networks (DBNs), introduced for the analysis of gene expression time series by Friedman et al. [14] and Murphy and Mian [25]. In DBNs, a gene is no longer represented by a single vertex but by as many vertices as there are time points in the experiment. A dynamic network (Figure 1C) can then be obtained by unfolding in time the initial cyclic motif in Figure 1A. The direction according to time guarantees the acyclicity of this dynamic network and consequently makes it possible to define a Bayesian n

…(Full text truncated)…
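The unfolding in time described for dynamic Bayesian networks can be sketched in a few lines. This is an illustration under assumptions, not the construction used in the paper: the three-gene cyclic `motif` is hypothetical (it is not the motif of Figure 1A), the helpers `unfold` and `is_acyclic` are names introduced here, and every regulation is assumed to act with a lag of exactly one time step. Acyclicity is checked with `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical cyclic motif, given as (regulator, target) pairs:
# gene 1 -> gene 2 -> gene 3 -> gene 1.
motif = [(1, 2), (2, 3), (3, 1)]

def is_acyclic(edges):
    """Return True if the directed graph given by (source, target) pairs has no cycle."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    try:
        sorter.prepare()      # raises CycleError when a cycle exists
        return True
    except CycleError:
        return False

def unfold(motif_edges, n_times):
    """Replace each edge i -> j by (i, t) -> (j, t + 1) for every time step t.

    Assumption: every regulation acts with a lag of exactly one time step.
    """
    return [
        ((i, t), (j, t + 1))
        for t in range(n_times - 1)
        for (i, j) in motif_edges
    ]

dynamic = unfold(motif, n_times=4)
print(is_acyclic(motif))    # False: the static motif contains a cycle
print(is_acyclic(dynamic))  # True: every edge points forward in time
```

Because each unfolded edge goes from time t to time t + 1, no path can return to an earlier vertex, which is why the time-indexed graph is a DAG even when the original motif is cyclic.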

Reference

This content is AI-processed based on ArXiv data.
