NRSSPrioritize: Associating Protein Complex and Disease Similarity Information to Prioritize Disease Candidate Genes


📝 Abstract

The identification of disease-associated genes has recently attracted much attention because it can uncover complex disease mechanisms and lead to new insights into treatment. Disease-susceptibility genes have been explored not only with experimental approaches such as genome-wide association studies (GWAS) but also with computational methods. Since experimental approaches are both time-consuming and expensive, numerous studies have used computational techniques to explore disease genes. These methods use various biological data sources and known disease genes to prioritize disease candidate genes. In this paper, we propose a gene prioritization method (NRSSPrioritize) that draws on both local and global measures of a protein-protein interaction (PPI) network, as well as disease similarity knowledge, to suggest candidate genes for colorectal cancer (CRC) susceptibility. Three network analysis tools, Network Propagation, Random Walk with Restart, and Shortest Paths, are applied to a PPI network to score candidate genes. A fourth score is obtained by examining diseases whose symptoms are similar to those of CRC and collecting their causal genes. Finally, to integrate these four scoring schemes, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) and Analytic Network Process (ANP) methods are applied to obtain appropriate weights for the four measures, and the weighted sum of these measures is used as the final score of each candidate gene. NRSSPrioritize was validated by cross-validation, and in a comparison with other prioritization tools the proposed method gave the best performance.
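
The final integration step described in the abstract, a weighted sum of four per-gene measures, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the gene names, the score values, the min-max normalization, and in particular the weights, which the paper derives with TOPSIS and ANP rather than fixing by hand. All four measures are assumed to be oriented so that a higher value means a stronger candidate.

```python
def min_max(values):
    """Rescale a list of scores to [0, 1]; a constant list maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def combine_scores(scores_by_measure, weights):
    """Weighted sum of min-max normalized measures.

    scores_by_measure: dict mapping measure name -> list of per-gene scores
    weights: dict mapping measure name -> non-negative weight
    """
    total = sum(weights.values())
    n_genes = len(next(iter(scores_by_measure.values())))
    final = [0.0] * n_genes
    for measure, values in scores_by_measure.items():
        share = weights[measure] / total  # weights are rescaled to sum to 1
        for i, v in enumerate(min_max(values)):
            final[i] += share * v
    return final


# Hypothetical toy data: one list entry per gene, higher is better.
genes = ["GENE_A", "GENE_B", "GENE_C"]
scores = {
    "NP": [0.9, 0.1, 0.5],          # network propagation score
    "RWR": [0.8, 0.2, 0.5],         # random walk with restart score
    "SP": [0.5, 0.5, 1.0],          # shortest-path-based score
    "similarity": [1.0, 0.0, 0.5],  # disease-similarity score
}
# Assumed weights; not the values obtained in the paper.
weights = {"NP": 0.3, "RWR": 0.3, "SP": 0.2, "similarity": 0.2}

final = combine_scores(scores, weights)
ranking = [g for _, g in sorted(zip(final, genes), reverse=True)]
print(ranking)
```

Normalizing each measure before summing keeps a measure with a large numeric range from dominating the others; whether NRSSPrioritize normalizes in this way is not stated in the text above.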


📄 Content


Razieh Abdollahi (a), Sama Goliaei (b), Zahra Razaghi-Moghadam (a, c, *) and Morteza Ebrahimi (a)

(a) Faculty of New Sciences and Technology, University of Tehran, Tehran, Iran.
(b) University of Tehran, Tehran, Iran.
(c) School of Biological Sciences, Institute for Research in Foundation Sciences (IPM), Tehran, Iran.
(*) Corresponding author: Dr. Zahra Razaghi-Moghadam, email: razzaghi@ut.ac.ir

{s.r_abdollahi, sgoliaei, razzaghi, mo.ebrahimi}@ut.ac.ir

Keywords: Gene prioritization, Protein-protein interaction network, Symptoms, Colorectal cancer.

1. Introduction

One of the most important challenges in disease treatment is the identification of causal genes, which helps in designing medical protocols. Genome-wide association studies (GWAS) are recent techniques for identifying chromosomal intervals that contain suspected disease-related genes. These studies scan genomes for single nucleotide polymorphisms (SNPs) that are not rare. However, GWAS are not very accurate at pinpointing the exact gene related to a disease, because they suggest hundreds or thousands of genes. Investigating all the genes suggested by GWAS with experimental methods in order to find the relevant one is very expensive and time-consuming. Computational methods prioritize these genes and help to focus on a smaller set. For this purpose, they take disease genes that have been identified experimentally as seed genes and predict candidate genes for further study. They use various kinds of data to solve the problem, and some of them combine information from multiple data sources.
    Genes related to the same or a homologous disease tend to interact with each other in the PPI network (1). The protein-protein interaction (PPI) network has therefore become one of the most powerful data sources in these studies, and this information has recently been made available in network form. Integrating the PPI network with other biological data may lead to the discovery of new disease-causing genes. Some studies use local features of the PPI network, such as molecular triangulation (2), shortest path (SP) (3), and direct neighbors (4), to identify candidate genes. Other methods, such as Random Walk with Restart (RWR) (5) and Network Propagation (NP) (6), rely on global network information. Local algorithms are less robust because they consider only direct interactions and ignore indirect functional relationships (7). Global algorithms, on the other hand, rely on the overall relationship between the disease gene and the other genes of the PPI network. Although they do not account for poorly connected genes and outliers, they have performed better than local methods (8).
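
The contrast between a local measure and a global one can be made concrete with a small Python sketch. It compares breadth-first shortest-path distance with Random Walk with Restart in its commonly used iterative form, p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix and p(0) places equal probability on the seed genes. The toy network, the node names, and the restart probability r = 0.3 are assumptions chosen for illustration; they are not taken from the paper, and the sketch is not the authors' implementation.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def random_walk_with_restart(graph, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol.

    A walker at a node moves to each neighbor with equal probability, or jumps
    back to a seed with probability `restart`. Assumes every node has at least
    one neighbor; an isolated node would lose its probability mass.
    """
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue
            share = (1.0 - restart) * p[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        change = sum(abs(nxt[n] - p[n]) for n in graph)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical undirected toy network, stored as adjacency lists.
graph = {
    "SEED": ["A", "B"],
    "A": ["SEED", "C"],
    "B": ["SEED", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

dist = shortest_path_lengths(graph, "SEED")
rwr = random_walk_with_restart(graph, seeds={"SEED"})
print(dist["C"], dist["D"])  # both candidates are two hops from the seed
print(rwr["C"] > rwr["D"])   # RWR separates them
```

In this toy network the candidates C and D lie at the same shortest-path distance from the seed, so a purely distance-based score cannot separate them. RWR ranks C above D because C is reachable from the seed through two intermediate nodes rather than one, which is the kind of indirect relationship that global measures take into account.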

Biological data resources are prone to noise. However, prioritization that takes the quality of the raw data into account and combines several heterogeneous data sources helps to overcome this disadvantage. DIR (9), ToppGene (10), CANDID (11), Endeavor (12) and MetaRanker (13) are exampl

This content is AI-processed based on ArXiv data.
