Markov Logic Networks in the Analysis of Genetic Data


📝 Abstract

Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of partial influences of each individual gene and can often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying genotype-phenotype mechanisms. Modeling approaches from the AI field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), as a framework for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we have been able to replicate the results of traditional statistical methods. Moreover, we show that even with simple models we are able to go beyond finding independent markers linked to a phenotype by using joint inference that avoids an independence assumption. The method is applied to genetic data on yeast sporulation, a phenotype governed by non-linear gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method is able to identify four loci with small effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics.
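To make the contrast between additive and non-additive effects concrete, the following minimal sketch uses hypothetical toy data, not the yeast sporulation data, and is not the authors' MLN method. It assumes two binary markers whose phenotype follows an XOR rule (an invented example), and shows that each marker's marginal effect is zero even though the two markers jointly determine the phenotype. The function names `marginal_effect` and `joint_means` are illustrative only.

```python
from itertools import product


def marginal_effect(samples, marker):
    """Difference in mean phenotype between allele 1 and allele 0 of one marker."""
    with_allele = [y for genotype, y in samples if genotype[marker] == 1]
    without_allele = [y for genotype, y in samples if genotype[marker] == 0]
    return sum(with_allele) / len(with_allele) - sum(without_allele) / len(without_allele)


def joint_means(samples):
    """Mean phenotype for every joint genotype (a, b)."""
    table = {}
    for genotype, y in samples:
        table.setdefault(genotype, []).append(y)
    return {genotype: sum(ys) / len(ys) for genotype, ys in table.items()}


# Hypothetical toy data: one individual per genotype, phenotype = a XOR b.
samples = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)]

# Single-marker view: each marker looks unrelated to the phenotype.
effects = [marginal_effect(samples, marker) for marker in (0, 1)]

# Joint view: the phenotype is fully determined by the pair of markers.
means = joint_means(samples)

# Interaction contrast; it is zero whenever the effects are purely additive.
interaction = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]

print("marginal effects:", effects)
print("joint means:", means)
print("interaction contrast:", interaction)
```

In this toy case both marginal effects are 0.0 while the interaction contrast is -2.0, so a marker-by-marker scan would report nothing even though the markers jointly explain the phenotype completely. Real data are noisier and interactions are rarely this extreme.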

📄 Content

arXiv:1003.0902v1 [q-bio.GN] 3 Mar 2010

Markov Logic Networks in the Analysis of Genetic Data

Nikita A. Sakhanenko, David J. Galas∗
Institute for Systems Biology, Seattle, WA, USA
∗E-mail: dgalas@systemsbiology.org

Abstract

Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for associations between genotype and phenotype. In general, these statistical methods view the compound effects of multiple genes on a phenotype as a sum of partial influences of each individual gene and can often miss a substantial part of the heritable effect. Such methods do not make use of any biological knowledge about underlying genotype-phenotype mechanisms. Modeling approaches from the Artificial Intelligence field that incorporate deterministic knowledge into models while performing statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), which employ first-order logic as a framework for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we have been able to replicate the results of traditional statistical methods. Moreover, we show that even with quite simple models we are able to go beyond finding independent markers linked to a phenotype by using joint inference that avoids an independence assumption. The method is applied to genetic data on yeast sporulation, a phenotype known to be governed by non-linear interactions between genes. In addition to detecting all of the previously identified loci associated with sporulation, our method is able to identify four additional loci with small effects on sporulation.
Since their effect on sporulation is small, these four loci were not detected with standard statistical methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which in turn can be used as a general framework for incorporating systems biology with genetics. Such future work that embodies systems knowledge in probabilistic models is proposed.

Author Summary

We have taken up the challenge of devising a framework for the analysis of genetic data that is fully functional in the usual statistical correlation analysis used in genome-wide association studies, but also capable of incorporating prior knowledge about biological systems relevant to the genetic phenotypes. We develop a general genetic analysis approach that meets this challenge. We adapt an AI method for learning models, called Markov Logic Networks, that is based on the fusion of Markov Random Fields with first-order logic. Our adaptation of the Markov Logic Network method for genetics allows very complex constraints and a wide variety of model classes to be imposed on probabilistic, statistical analysis. We illustrate the use of the method by analyzing a data set based on sporulation efficiency from yeast, in which we demonstrate gene interactions and identify a number of new loci involved in determining the phenotype.

Introduction

Genome-wide association studies (GWAS) have allowed the detection of many genetic contributions to complex phenotypes in humans (see www.genome.gov). Studies of biological networks of different kinds, including genetic regulatory networks, protein-protein interaction networks and others, have made it clear, however, that gene interactions are abundant and are therefore of likely importance for genetic analysis [1]. Complex, non-additive interactions between genetic variations are very common and can potentially play a crucial role in determining phenotypes [2–5].
GWAS and similar statistical methods, such as classical QTL studies, generally assume additive models of gene interaction that attempt to capture a compound effect of multiple genes on a phenotype as a sum of partial influences of each individual gene [6,7]. These statistical methods also assume no biological knowledge about the underlying processes or phenotypes. Since biological networks are complex, and since variations are numerous, unconstrained searches for associations between genotype and phenotype require large population samples, and can succeed only in detecting a limited range of effects. Without imposing any constraints based on biological knowledge, searching for gene interactions is very challenging, particularly when input data consist of different data types coming from various sources.

The major question that motivated this work is “Can we constrain traditional statistical approaches by using biological knowledge to define some known networks that influence patterns in the data, and can such approaches produce more complete genetic models?” For example, we might use the patterns present in the genotype data to build more

This content is AI-processed based on ArXiv data.
