Automated Protein Structure Classification: A Survey

📝 Original Info

  • Title: Automated Protein Structure Classification: A Survey
  • ArXiv ID: 0907.1990
  • Date: 2009-07-14
  • Authors: Oktie Hassanzadeh

📝 Abstract

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.

📄 Full Content

arXiv:0907.1990v1 [cs.CE] 13 Jul 2009

Automated Protein Structure Classification: A Survey
Oktie Hassanzadeh
oktie@cs.toronto.edu
January 2008

Abstract

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.

Introduction

Classification of protein structures is an interesting and challenging problem in the field of computational biology that plays an important role in several tasks for studying protein function. These tasks include protein structure and function prediction, studying structural and evolutionary relationships between proteins, and identification of potential functional residues and binding sites. Several classification databases exist [1, 2, 3, 4, 5, 6], of which SCOP (Structural Classification Of Proteins) [3] and CATH [2] are the most widely used and active. CATH is an acronym of the four main levels in its classification: Class, Architecture, Topology and Homologous superfamily. These databases are updated intermittently using manual and semi-automatic methods. For example, SCOP has been updated every seven months on average during the last seven years, while CATH has been updated annually.

On the other hand, the number of newly determined protein structures is constantly increasing. The Protein Data Bank (PDB) [7] currently contains 46,051 structures (as of October 2007), of which 6,358 structures were released during the first three quarters of 2007. This is roughly double the number of structures in the year 2004. This rapid increase in the number of structures calls for more efficient, accurate and automated classification methods. Automated methods may not be able to completely replace the manual and semi-automatic databases that incorporate the judgement of an experienced biologist. However, with the rapid increase in the number of known structures, they can (and currently do) play an important role as a preprocessing step for high-quality manual classification of proteins.

Consequently, several automated classification methods have recently been developed. These methods differ in several aspects, including their structure comparison criteria, the type of their input and the output of the classification. In most of the automated classification methods, the goal is to automatically assign a protein structure or domain to an existing class of a manual classification scheme, mainly SCOP and CATH. Some are specifically designed to predict only SCOP or only CATH classes, while others provide a more flexible classification framework that is capable of assigning classes from SCOP, CATH or other existing schemes to input structures. Although the main objective of automated methods is to accurately classify the input structures, recent methods also treat efficiency as an important evaluation criterion because of the rapid increase in the number of known structures. Another desirable feature of a classification method is the ability to detect new classes when structures cannot be classified into existing classes.

In this paper, we present a survey of major existing protein structure classification methods.
These methods are listed in Table 1 along with their structure comparison criteria. The methods that are purely based on sequence comparison are efficient, but they often fail to identify remote homologs of structurally similar proteins. Methods based on only structure comparison are effective in classifying at fold level, but not necessarily at family and superfamily levels. Methods that combine sequence and structure information for classification are generally more accurate but computationally more expensive.

Table 2 shows the type of the input and the output of the classification methods discussed in this paper. Some methods perform classification on a …

Table 1: Protein structure classification methods discussed in this paper

  Method                                              Based on
  SUPERFAMILY (Gough et al., 2001 [8] and 2007 [9])   Sequence
  F2CS (CO) (Getz et al., 2002 [10] and 2004 [11])    Structure
  SGM (Rogen and Fain, 2003 [12])                     Structure
  SCOPmap (Cheek et al., 2004 [13])                   Structure/Sequence
  DTree (Çamoglu et al., 2005 [14])                   Structure/Sequence
  ProtClass (Aung and Tan, 2005 [15])                 Structure
  proCC (Kim and Patel, 2006 [16])                    Secondary Structure
  fastSCOP (Zemla

…(Full text truncated)…
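
The task described in the introduction (assign an input structure to an existing class, and flag structures that fit no existing class) can be illustrated with a minimal Python sketch. This is not the algorithm of any surveyed method: it is a plain nearest-neighbour lookup with a distance cutoff, and the two-number feature vectors, the `KNOWN` list, the `classify` function and the `max_distance` value are all hypothetical choices made only for illustration.

```python
import math

# Hypothetical toy data: (feature vector, class label) for already-classified domains.
# The numbers are invented for illustration and are not real SCOP or CATH data.
KNOWN = [
    ((0.9, 0.1), "all-alpha"),
    ((0.8, 0.2), "all-alpha"),
    ((0.1, 0.9), "all-beta"),
    ((0.2, 0.8), "all-beta"),
]


def classify(features, known=KNOWN, max_distance=0.3):
    """Return the label of the nearest known structure, or None if none is close enough.

    max_distance is an assumed cutoff; a None result marks a candidate new class.
    """
    best_label, best_dist = None, math.inf
    for vector, label in known:
        dist = math.dist(features, vector)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return None  # no existing class is similar enough
    return best_label


print(classify((0.85, 0.15)))  # all-alpha
print(classify((0.5, 0.5)))    # None
```

The cutoff is what separates "assign to an existing class" from "possibly a new class"; a real method would replace the toy feature vectors and the Euclidean distance with its own sequence or structure comparison criterion.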

Reference

This content is AI-processed based on ArXiv data.
