
Automated Protein Structure Classification: A Survey

February 23, 2026

Reading time: 5 minutes

...

#Computational Engineering #Computer Science #Quantitative Biology

📝 Original Info

Title: Automated Protein Structure Classification: A Survey
ArXiv ID: 0907.1990
Date: 2009-07-14
Authors: Oktie Hassanzadeh

📝 Abstract

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.

💡 Deep Analysis

Deep Dive into Automated Protein Structure Classification: A Survey.

📄 Full Content

arXiv:0907.1990v1 [cs.CE] 13 Jul 2009

Automated Protein Structure Classification: A Survey

Oktie Hassanzadeh
oktie@cs.toronto.edu

January 2008

Abstract

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.

Introduction

Classification of protein structures is an interesting and challenging problem in the field of computational biology that plays an important role in several tasks for studying protein function. These tasks include protein structure and function prediction, studying structural and evolutionary relationships between proteins, and identification of potential functional residues and binding sites. Several classification databases exist [1, 2, 3, 4, 5, 6], of which SCOP (Structural Classification Of Proteins) [3] and CATH (an acronym of the four main levels in its classification: Class, Architecture, Topology and Homologous superfamily) [2] are the most widely used and active databases. These databases are updated intermittently using manual and semi-automatic methods. For example, SCOP has been updated every seven months on average during the last seven years, while CATH has been updated annually.
On the other hand, the number of newly determined protein structures is constantly increasing. The Protein Data Bank (PDB) [7] currently contains 46,051 structures (as of October 2007), of which 6,358 structures were released during the first three quarters of 2007. This is roughly double the number of structures in the year 2004. This rapid increase in the number of structures calls for more efficient, accurate and automated classification methods.

Automated methods may not be able to completely replace the manual and semi-automatic databases that incorporate the judgement of an experienced biologist. However, with the rapid increase in the number of known structures, they can (and currently do) play an important role as a preprocessing step for high-quality manual classification of proteins.

Consequently, several automated classification methods have recently been developed. These methods differ in several aspects including their structure comparison criteria, the type of their input and the output of the classification. In most of the automated classification methods, the goal is to automatically assign a protein structure or domain to an existing class of a manual classification scheme, mainly SCOP and CATH. Some are specifically designed to predict only SCOP or only CATH classes while others provide a more flexible classification framework that is capable of assigning SCOP, CATH or other existing method’s classes to input structures. Although the main objective of automated methods is to accurately classify the input structures, recent methods consider efficiency as an important criterion in their evaluation as a result of the rapid increase in the number of known structures. Another desirable feature of a classification method is the ability to detect new classes when structures cannot be classified into existing classes.

In this paper, we present a survey of major existing protein structure classification methods.
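To make the two goals named above concrete (assigning a structure to an existing class, and flagging a structure that fits no existing class), here is a minimal Python sketch. It is not one of the surveyed methods: it uses a plain nearest-neighbor rule with a distance cutoff, and the feature vectors, the labels in the toy data, the function names and the `max_distance` threshold are all hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, labeled, max_distance):
    """Return the label of the nearest labeled structure.

    Returns None when even the nearest neighbor is farther than
    max_distance, which is treated here as a possible new class.
    """
    best_label, best_dist = None, math.inf
    for features, label in labeled:
        d = euclidean(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Hypothetical toy data: (feature vector, class label) pairs.
# The two numbers per structure are made-up illustrative features.
labeled = [
    ((0.9, 0.1), "all-alpha"),
    ((0.1, 0.9), "all-beta"),
    ((0.5, 0.5), "alpha/beta"),
]

known = classify((0.85, 0.15), labeled, max_distance=0.3)
novel = classify((3.0, 3.0), labeled, max_distance=0.3)
print(known)  # label of the closest known class
print(novel)  # None: no existing class is close enough
```

In this sketch the cutoff trades accuracy against novelty detection: a loose threshold forces every structure into some existing class, while a strict one flags more structures for manual inspection.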
These methods are listed in Table 1 along with their structure comparison criteria. The methods that are purely based on sequence comparison are efficient, but they often fail to identify remote homologs of structurally similar proteins. Methods based on only structure comparison are effective in classifying at fold level, but not necessarily at family and superfamily levels. Methods that combine sequence and structure information for classification are generally more accurate but computationally more expensive.

Table 2 shows the type of the input and the output of the classification methods discussed in this paper. Some methods perform classification on a

Table 1: Protein structure classification methods discussed in this paper

Method                                               Based on
SUPERFAMILY (Gough et al., 2001 [8] and 2007 [9])    Sequence
F2CS (CO) (Getz et al., 2002 [10] and 2004 [11])     Structure
SGM (Rogen and Fain, 2003 [12])                      Structure
SCOPmap (Cheek et al., 2004 [13])                    Structure/Sequence
DTree (Çamoglu et al., 2005 [14])                    Structure/Sequence
ProtClass (Aung and Tan, 2005 [15])                  Structure
proCC (Kim and Patel, 2006 [16])                     Secondary Structure
fastSCOP (Zemla

…(Full text truncated)…

📄 Read Full PDF on ArXiv

Reference

This content is AI-processed based on ArXiv data.
