Inference of Gene Predictor Set Using Boolean Satisfiability


📝 Original Info

  • Title: Inference of Gene Predictor Set Using Boolean Satisfiability
  • ArXiv ID: 1005.4812
  • Date: 2010-05-26
  • Authors: Pey-Chang Kent Lin, Sunil P Khatri

📝 Abstract

The inference of gene predictors in the gene regulatory network (GRN) has become an important research area in the genomics and medical disciplines. Accurate predictors are necessary for constructing the GRN model and for enabling targeted biological experiments that attempt to confirm or control the regulation process. In this paper, we implement a SAT-based algorithm to determine the gene predictor set from steady state gene expression data (attractor states). Using the attractor states as input, the states are ordered into attractor cycles. For each attractor cycle ordering, all possible predictors are enumerated and a CNF expression is formulated which encodes these predictors and their biological constraints. Each CNF is explored using a SAT solver to find candidate predictor sets. Statistical analysis of the results selects the most likely predictor set of the GRN corresponding to the attractor data. We demonstrate our algorithm on attractor state data from a melanoma study, and present our predictor set results.

📄 Full Content

With the mapping of the human genome complete, the focus in computational biology has shifted from sequence analysis to the understanding of gene regulation and its inter-relation with the biological system. The use of genome information has given rise to the notion of "personalized medicine": targeted and specific disease prevention and treatment based on individual gene information [2], [3]. The urgent applications to cancer and gene-related diseases call for the genomics field to significantly improve the algorithms used for accurate inference of the gene regulatory network (GRN).

In an organism, the genome is a highly complex control system wherein proteins and RNA produced by genes interact with and regulate the activity of other genes [4]. The activity of a target gene g_i is regulated (or predicted) by the genes in its predictor (e.g., if g_1 becomes inactive when g_2 and g_3 are active, then g_2 and g_3 are called predictors of g_1). The complete set of predictors (the predictor set), which contains the predictors for each gene in the GRN, describes the interaction of all genes within the gene regulatory network and is the prerequisite for inferring the GRN structure.
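The notion of a predictor can be illustrated with a small consistency check. The sketch below assumes binary gene activity and a list of (current state, next state) transitions; the function name `is_consistent_predictor` and the 3-gene data are hypothetical and invented for illustration, not taken from the paper. A gene set is treated as a candidate predictor of a target gene if the target's next value is fully determined by the current values of the genes in that set.

```python
def is_consistent_predictor(transitions, target, predictors):
    """Return True if the target gene's next value is a function of
    the current values of the genes listed in `predictors`."""
    seen = {}  # maps predictor values -> observed next value of the target
    for current, nxt in transitions:
        key = tuple(current[p] for p in predictors)
        value = nxt[target]
        # The same predictor values must never lead to two different outcomes.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 3-gene system (g1, g2, g3), indexed 0..2; data invented for illustration.
transitions = [
    ((1, 1, 1), (0, 1, 1)),
    ((0, 1, 1), (0, 0, 1)),
    ((0, 0, 1), (1, 0, 0)),
    ((1, 0, 0), (1, 1, 1)),
]

print(is_consistent_predictor(transitions, target=0, predictors=(1, 2)))
print(is_consistent_predictor(transitions, target=0, predictors=(2,)))
```

In this toy data, g2 and g3 together determine the next value of g1, while g3 alone does not, because g3 = 1 is followed by both g1 = 0 and g1 = 1.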

There are several GRN characteristics that impact the formulation of our GRN model and predictor inference algorithm. First, the activity levels of all genes at a particular time t represent the state of the GRN at time t. From our knowledge of biological systems, we observe that over time, cellular processes transition to stable attractor states. Some of these attractor states represent normal cellular phenomena in biology, such as the cell cycle and cell division. However, some attractor states are consistent with disease, such as the metastasis of cancer. Second, the GRN is often inferred by observing microarray-based experimental data, which measures the activity level of genes. The correlation of the observed gene activity (or state) can be used to help describe the gene regulation. The disadvantage of using microarray data is that such studies do not involve controlled time-series experiments. Hence, the measurements are assumed to arise from the cyclic sequence of gene expressions (attractor states) in steady state (attractor cycles). The GRN is then inferred from this data, using methods traditionally based on probabilistic transition models [5], [6].

As previously mentioned, it is necessary to determine the predictor set to reconstruct the GRN. However, there may exist many possible predictors for any gene, based on the attractor cycle data. Furthermore, only certain combinations of predictors may form a valid predictor set due to biological constraints. The issue addressed in this paper is how to efficiently and deterministically select the predictors that form the predictor set. We have implemented a Boolean satisfiability (SAT) based algorithm for the inference of gene predictors. Satisfiability is the decision problem of determining whether the variables in a Boolean formula (expressed in Conjunctive Normal Form, or CNF) can be assigned values that make the formula evaluate to true. Although SAT is NP-complete, many SAT solvers have been developed that solve large SAT problems quickly and efficiently. Our algorithm takes advantage of advanced SAT solvers to find the predictor set.
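To make the satisfiability problem concrete, the sketch below finds every satisfying assignment of a small CNF formula. It uses plain brute-force enumeration rather than a real solver such as MiniSat, so it only illustrates the definition and does not scale. The clause encoding (signed integers, where k means variable k is true and -k means it is false) follows the common DIMACS convention; the function name and the example formula are assumptions made for illustration.

```python
from itertools import product


def satisfying_assignments(cnf, num_vars):
    """Yield every assignment (tuple of bools) that satisfies `cnf`.

    `cnf` is a list of clauses; each clause is a list of nonzero ints,
    where k stands for variable k and -k for its negation.
    """
    for bits in product([False, True], repeat=num_vars):
        # A CNF formula holds when every clause has at least one true literal.
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in cnf
        ):
            yield bits


# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
cnf = [[1, 2], [-1, 3], [-2, -3]]
models = list(satisfying_assignments(cnf, num_vars=3))
print(models)  # [(False, True, False), (True, False, True)]
```

Collecting all models in this way, rather than stopping at the first one, corresponds to an all-SAT computation; a practical solver reaches the same result without visiting all 2^n assignments.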

The basic outline of our SAT-based algorithm is described briefly below. First, all possible orderings of the attractor states are enumerated, yielding all possible attractor cycles. For each ordering, we enumerate all predictors that are logically valid, and create a CNF expression which encodes all these predictors and the biological constraints (such as cardinality bounds on the predictors). A SAT solver is used to find the valid candidate predictor sets. After this process has been repeated for all attractor cycle orderings, statistical analysis provides the most likely candidates for the predictor set.
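The outline above can be sketched on toy data. The code below is a simplified illustration, not the paper's implementation: it assumes all attractor states form a single cycle, it replaces the CNF and SAT solver step with direct enumeration of predictors for one target gene, and it uses a simple frequency count in place of the statistical analysis. All function names, the cardinality bound `max_size`, and the example states are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def cyclic_orderings(states):
    """Yield each distinct directed cycle through the states once
    (the first state is fixed; the rest are permuted)."""
    first, rest = states[0], states[1:]
    for perm in permutations(rest):
        yield (first,) + perm


def valid_predictors(cycle, target, max_size):
    """List gene subsets (up to max_size genes) whose current values
    determine the target gene's next value along the cycle."""
    num_genes = len(cycle[0])
    transitions = [
        (cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))
    ]
    found = []
    for size in range(1, max_size + 1):
        for genes in combinations(range(num_genes), size):
            table = {}
            consistent = all(
                table.setdefault(tuple(cur[g] for g in genes), nxt[target])
                == nxt[target]
                for cur, nxt in transitions
            )
            if consistent:
                found.append(genes)
    return found


def predictor_frequencies(states, target, max_size=2):
    """Count how often each candidate predictor is valid across all orderings."""
    counts = Counter()
    for cycle in cyclic_orderings(states):
        counts.update(valid_predictors(cycle, target, max_size))
    return counts


# Hypothetical attractor states for genes (g1, g2, g3); invented for illustration.
states = [(1, 1, 1), (0, 1, 1), (0, 0, 1), (1, 0, 0)]
print(predictor_frequencies(states, target=0).most_common())
```

With four states there are 3! = 6 cyclic orderings. Predictors that remain valid under many orderings rank highest in the count, which mirrors the idea of selecting the most likely candidates after all orderings have been processed.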

The key contributions of this paper are:

• We develop a Boolean satisfiability-based approach to derive the gene predictor set from attractor state data.

• We modify an existing SAT solver (MiniSat [7]) for efficient all-SAT computation and further optimize MiniSat for improved predictor inference.

• On gene expression data from a melanoma study [1], we apply our SAT-based algorithm and present results for genes that regulate all the genes, including the cancer gene WNT5a.

• Our approach can be used to find the predictor set for any gene-related disease, provided attractor state data is available. The predictor set information obtained from our algorithm can be used by biologists to fine-tune their gene expression experiments.

The remainder of this paper is organized as follows. Section II describes previous work in modeling the gene regulatory network and inference of gene predictors. Section III presents our FSM model and Boolean SAT approach. Section IV reports experimental results. Concluding comments and future work are discussed in Secti

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
