📝 Original Info
- Title: Automated Protein Structure Classification: A Survey
- ArXiv ID: 0907.1990
- Date: 2009-07-14
- Authors: Oktie Hassanzadeh
📝 Abstract
Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.
📄 Full Content
arXiv:0907.1990v1 [cs.CE] 13 Jul 2009
Automated Protein Structure Classification: A Survey
Oktie Hassanzadeh
oktie@cs.toronto.edu
January 2008
Abstract
Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.
Introduction
Classification of protein structures is an interesting and challenging problem in computational biology that plays an important role in several tasks for studying protein function. These tasks include protein structure and function prediction, studying structural and evolutionary relationships between proteins, and identification of potential functional residues and binding sites. Several classification databases exist [1, 2, 3, 4, 5, 6], of which SCOP (Structural Classification Of Proteins) [3] and CATH [2] are the most widely used and active. CATH is an acronym of the four main levels in its classification: Class, Architecture, Topology and Homologous superfamily. These databases are updated intermittently using manual and semi-automatic methods. For example, SCOP has been updated every seven months on average during the last seven years, while CATH has been updated annually.
On the other hand, the number of newly determined protein structures is constantly increasing. The Protein Data Bank (PDB) [7] currently contains 46,051 structures (as of October 2007), of which 6,358 structures were released during the first three quarters of 2007. This is roughly double the number of structures in the year 2004. This rapid increase in the number of structures calls for more efficient, accurate and automated classification methods. Automated methods may not be able to completely replace the manual and semi-automatic databases that incorporate the judgement of an experienced biologist. However, with the rapid increase in the number of known structures, they can (and currently do) play an important role as a preprocessing step for high-quality manual classification of proteins.
Consequently, several automated classification methods have recently been developed. These methods differ in several aspects, including their structure comparison criteria, the type of their input and the output of the classification. In most automated classification methods, the goal is to automatically assign a protein structure or domain to an existing class of a manual classification scheme, mainly SCOP or CATH. Some are specifically designed to predict only SCOP or only CATH classes, while others provide a more flexible classification framework that can assign classes from SCOP, CATH or other existing schemes to input structures. Although the main objective of automated methods is to classify the input structures accurately, recent methods also treat efficiency as an important evaluation criterion because of the rapid increase in the number of known structures. Another desirable feature of a classification method is the ability to detect new classes when structures cannot be classified into existing classes.
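The assignment task described above, including the option of flagging a structure that fits no existing class, can be sketched as a nearest-neighbour classifier with a rejection threshold. This is only an illustrative sketch and not the method of any tool surveyed here: the two-dimensional feature vectors, the Euclidean distance, the `max_distance` cutoff and the class labels are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, labelled, max_distance):
    """Return the label of the nearest labelled structure.

    Returns None when no labelled structure lies within max_distance,
    which marks the query as a candidate for a new class.
    """
    best_label, best_dist = None, float("inf")
    for vector, label in labelled:
        dist = euclidean(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None

# Hypothetical feature vectors standing in for structure descriptors.
labelled = [
    ((0.0, 0.0), "class_A"),
    ((1.0, 0.0), "class_A"),
    ((5.0, 5.0), "class_B"),
]

print(classify((0.4, 0.1), labelled, max_distance=1.0))    # close to class_A
print(classify((20.0, 20.0), labelled, max_distance=1.0))  # far from everything
```

The cutoff controls the trade-off between forcing every structure into an existing class and reporting too many candidate new classes; how to choose it would depend on the descriptor and the target scheme.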
In this paper, we present a survey of major existing protein structure classification methods. These methods are listed in Table 1 along with their structure comparison criteria. The methods that are purely based on sequence comparison are efficient, but they often fail to identify remote homologs of structurally similar proteins. Methods based only on structure comparison are effective in classifying at fold level, but not necessarily at family and superfamily levels. Methods that combine sequence and structure information for classification are generally more accurate but computationally more expensive.
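The difference between fold-level and family- or superfamily-level performance can be made concrete by scoring predictions separately at each level of the hierarchy. The sketch below assumes SCOP-style dotted labels of the form class.fold.superfamily.family (for example `a.1.1.2`). The label format, the function name `accuracy_by_level` and the toy labels are assumptions for illustration and are not taken from the surveyed methods.

```python
LEVELS = ("class", "fold", "superfamily", "family")

def accuracy_by_level(true_labels, predicted_labels):
    """Fraction of predictions that agree with the true label at each level.

    A prediction counts as correct at a level only if it matches the true
    label at that level and at every level above it.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    result = {}
    for depth, name in enumerate(LEVELS, start=1):
        correct = sum(
            t.split(".")[:depth] == p.split(".")[:depth]
            for t, p in zip(true_labels, predicted_labels)
        )
        result[name] = correct / len(true_labels)
    return result

# Toy labels, invented for the example.
true_labels = ["a.1.1.2", "b.40.4.5", "c.37.1.8", "d.58.7.1"]
predicted = ["a.1.1.1", "b.40.2.1", "c.37.1.8", "a.4.5.1"]

print(accuracy_by_level(true_labels, predicted))
```

In this toy data the fold is right for three of the four structures, while the family is right for only one, which mirrors the pattern described above for methods based only on structure comparison.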
Table 2 shows the type of the input and the output of the classification
methods discussed in this paper. Some methods perform classification on a
Table 1: Protein structure classification methods discussed in this paper

Method        Reference                                  Based on
SUPERFAMILY   (Gough et al., 2001 [8] and 2007 [9])      Sequence
F2CS (CO)     (Getz et al., 2002 [10] and 2004 [11])     Structure
SGM           (Rogen and Fain, 2003 [12])                Structure
SCOPmap       (Cheek et al., 2004 [13])                  Structure/Sequence
DTree         (Çamoglu et al., 2005 [14])                Structure/Sequence
ProtClass     (Aung and Tan, 2005 [15])                  Structure
proCC         (Kim and Patel, 2006 [16])                 Secondary Structure
fastSCOP      (Zemla
…(Full text truncated)…