Predicting disease-related genes by path-based similarity and community structure in protein-protein interaction network
📝 Abstract
Network-based computational approaches for predicting unknown genes associated with particular diseases are of considerable significance for uncovering the molecular basis of human diseases. In this paper, we propose a new class of disease-gene prediction methods that combine path-based similarity with the community structure of the human protein-protein interaction network. First, we introduce a set of path-based similarity indices, a novel community-based similarity index, and a new similarity index that combines the path-based and community-based indices. We then assess the statistical significance of these measures in distinguishing disease genes from non-disease genes, to confirm their usefulness for predicting disease genes. Finally, we apply the measures to disease-gene prediction within single disease-gene families and analyze their performance in detail, particularly the effect of community structure on prediction performance. The results indicate that genes associated with the same or similar diseases commonly reside in the same community of the protein-protein interaction network, and that community structure substantially helps disease-gene prediction.
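The general idea summarized in the abstract can be illustrated with a minimal sketch. The paper's actual index definitions are not given in this excerpt, so everything concrete here is an assumption made for illustration: the truncated Katz-style walk count used as the path-based similarity, the multiplicative same-community boost, the parameters `max_len`, `alpha`, and `beta`, and the toy six-gene network.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_similarity(adj, max_len=3, alpha=0.1):
    """Assumed path-based index: sum over l = 1..max_len of alpha**(l-1) * A**l.

    (A**l)[i][j] counts walks of length l between i and j, so longer
    connections contribute with geometrically decreasing weight.
    """
    n = len(adj)
    sim = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # starts as A**1
    weight = 1.0
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                sim[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)
        weight *= alpha
    return sim


def combined_similarity(sim, community, beta=1.0):
    """Assumed combination rule: scale similarity by (1 + beta) for
    pairs of genes that share a community label, leave others unchanged."""
    n = len(sim)
    return [[sim[i][j] * (1.0 + beta * (community[i] == community[j]))
             for j in range(n)] for i in range(n)]


def score_candidates(sim, seeds):
    """Score each non-seed gene by its summed similarity to the seed genes."""
    return {g: sum(sim[g][s] for s in seeds)
            for g in range(len(sim)) if g not in seeds}


# Toy network (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

community = [0, 0, 0, 1, 1, 1]   # assumed community labels, one per gene
seeds = {0, 1}                   # assumed known disease genes

sim = combined_similarity(path_similarity(adj, max_len=2, alpha=0.5), community)
scores = score_candidates(sim, seeds)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setting, the candidate that shares a community with the seed genes is ranked first, which mirrors the qualitative conclusion of the abstract; it says nothing about the quantitative results reported in the paper.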
📄 Content
Manuscript Thursday, July 20, 2017 1
Predicting disease-related genes by path-based similarity and
community structure in protein-protein interaction network
Ke Hu1, Jing-Bo Hu1, Ju Xiang2,*, Hui-Jia Li3,4,*, Yan Zhang5,*, Shi Chen5, Chen-He Yi7
1Department of Physics, Xiangtan University, Xiangtan 411105, Hunan, China
2Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha 410219, Hunan, China
3School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100080, China
4Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
5Department of Computer, Changsha Medical University, Changsha 410219, Hunan, China
6School of Public Administration, Xiangtan University, Xiangtan 411105, Hunan, China
* Corresponding authors: Ju Xiang, Hui-Jia Li, or Yan Zhang.
E-mail: xiang.ju@foxmail.com (J.X.); xiangju@aliyun.com (J.X.); hjli@amss.ac.cn (H.J.L.); zhangyancsmu@foxmail.com (Y.Z.); huke1998@aliyun.com (K.H.); chenshi198001@qq.com (S.C.)
Abstract: Network-based computational approaches for predicting unknown genes associated with particular diseases are of considerable significance for uncovering the molecular basis of human diseases. In this paper, we propose a new class of disease-gene prediction methods that combine path-based similarity with the community structure of the human protein-protein interaction network. First, we introduce a set of path-based similarity indices, a novel community-based similarity index, and a new similarity index that combines the path-based and community-based indices. We then assess the statistical significance of these measures in distinguishing disease genes from non-disease genes, to confirm their usefulness for predicting disease genes. Finally, we apply the measures to disease-gene prediction within single disease-gene families and analyze their performance in detail, particularly the effect of community structure on prediction performance. The results indicate that genes associated with the same or similar diseases commonly reside in the same community of the protein-protein interaction network, and that community structure substantially helps disease-gene prediction.
PACS: 89.75.-k; 89.75.Fb; 89.75.Hc
Keywords: Complex networks; Community structure; Topological similarity; Protein-protein interaction networks; Disease genes
CONTENTS
1. Introduction
2. Datasets
   2.1. Human PPI Datasets
   2.2. Disease-Gene Data
3. Methods
   3.1. Definition of the topological similarity
      3.1.1. Path-based Similarity (PS)
      3.1.2. Community-based Similarity (CS)
      3.1.3. Combined similarity based on path structure and community structure
   3.2. Similarity scores of genes with respect to the disease genes
   3.3. Metric
4. Experimental results
   4.1. Analysis of feasibility
   4.2. Performance of method
      4.2.1. ROC and AUC
      4.2.2. Precision …