Title: RamaDA: complete and automated conformational overview of proteins
ArXiv ID: 1111.5586
Date: 2011-11-24
Authors: The original author list is not included in the provided excerpt. Please refer to the full paper for the complete list of authors.
📝 Abstract
The tertiary structure of a protein, as well as its local secondary structure organization, is fully determined by the angles of the peptide bond. The backbone dihedral angles determine not only the global fold of the protein, but also the details of the local chain organization. Although a wealth of structural information is available in different databases and numerous structural biology programs have been developed, rapid conformational characterization remains challenging. We present here RamaDA, a program able to give a synthetic description of the conformation of a protein. The RamaDA program is based on a model in which the Ramachandran plot is decomposed into seven conformational domains. Within the framework of this model, each amino acid of a given protein is assigned to one of these domains. From this assignment, secondary structure elements can be detected with an accuracy equivalent to that of the DSSP program for helices and extended strands, and with the added capability of detecting PolyProline II secondary structures. Additionally, the determination of a z-score for each amino acid of the protein emphasizes any irregularities in the element. It is also possible to use this analysis to detect characteristic conformational patterns. In the case of EF-hands, which are calcium-binding helix-loop-helix domains, it is possible to design a strict consensus for the 9 amino acids of the loop. With this pattern, 523 calcium-binding protein files are found in the entire PDB, with only 2.7% false-positive hits. The RamaDA program gathers several tools in one and can therefore give complete information on a protein structure, including loops and random coil regions. Through the example of EF-hands, a promising approach to structural biology is developed. RamaDA is freely available for download as well as online usage at http://ramada.u-strasbg.fr
📄 Full Content
A protein is a hierarchical molecule, with a structure that is organized into primary, secondary and tertiary levels. However, the tertiary structure, as well as the local secondary structure organization, is fully determined by the angles (ϕ, ψ) of the peptide bond. The backbone dihedral angles dictate not only the global fold of the protein, but also the details of the local chain organization.
The Ramachandran plot, first proposed in 1965 by Ramakrishnan and Ramachandran [1], is a tailor-made tool to study the conformations adopted by amino acids. This plot uses the dihedral angles ϕ and ψ to indicate whether a specific pair is sterically allowed and/or which conformational domain is adopted [1,2]. Allowed (or favoured) regions of this space have been associated with regular secondary structure elements such as the α-helix or the β-sheet, while empty, disallowed regions have been highlighted.
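As a minimal sketch of how a backbone dihedral angle can be obtained from atomic coordinates, the function below computes the signed angle defined by four points. For ϕ of residue i the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). The function name and the toy coordinates are illustrative assumptions and are not taken from the RamaDA source code.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)          # first bond vector
    b2 = sub(p2, p1)          # central bond vector (rotation axis)
    b3 = sub(p3, p2)          # last bond vector
    n1 = cross(b1, b2)        # normal of the plane (p0, p1, p2)
    n2 = cross(b2, b3)        # normal of the plane (p1, p2, p3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical, not real atoms): a quarter turn around the z axis.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

With real structures, the four points would be the Cartesian coordinates of the backbone atoms listed above, read from the structure file.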
Since Ramakrishnan and Ramachandran's initial work, several conformational domains have been identified in the allowed regions of the plot [3][4][5]. In the literature, one can find the extended region, which can be split into the β-sheet and the PolyProline-II (PPII) domains [6]; the α domain, corresponding to right-handed helical conformations; the γ domain, corresponding to a specific conformation of hydrogen-bonded γ-turns [7]; the ζ domain, which is exclusively composed of conformations of amino acids preceding a proline [8]; the α_L domain, corresponding to left-handed helical conformations; and the PPII_R domain, sometimes noted β_PR [8], corresponding to right-handed PPII helical conformations. The existence of these conformational domains is based only on steric hindrance and does not take into account any other parameter or external force.
The amount of structural information available in databases such as the Protein Data Bank (PDB) [9] has increased much faster than the number of programs analyzing it. In fact, few programs and databases can give accurate local information on proteins in the PDB [10][11][12][13], and obtaining this information remains a challenge. However, given the wealth of structural information available in the PDB, it is possible to develop a statistical model of the Ramachandran plot. From this model, we have developed a program called RamaDA (for Ramachandran Domain Analysis).
The RamaDA program takes into account all the coordinates found in the analyzed file, including the different models of the same protein created with NMR constraints, in order to assign a conformational domain to each amino acid of a protein. This assignment leads to the detection of putative secondary structure elements and may be used to find specific conformational patterns in the entire PDB. The latter will be presented here through the example of EF-hands. These domains are composed of two helices separated by a 9-amino-acid loop known to bind calcium. They are important for signal transduction and muscle contraction [14].
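To illustrate the idea of searching for a conformational pattern once every residue has been assigned to a domain, here is a minimal sketch that encodes the assignment as a string and scans it with a regular expression. The one-letter domain codes, the function name `find_pattern` and the toy pattern are all hypothetical; they are neither the RamaDA pattern syntax nor the EF-hand consensus of the paper.

```python
import re

def find_pattern(domains, pattern):
    """Return (start_index, matched_text) for every match, overlapping ones included.

    `domains` is a string with one letter per residue (hypothetical alphabet:
    'a' = alpha, 'b' = beta, 'p' = PPII, 'l' = left-handed helix).
    """
    # A lookahead makes the scan report overlapping occurrences as well.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(domains)]

# Toy assignment string and toy pattern: three helical residues,
# any four residues, then three helical residues.
domains = "bbbaaaaapplbaaaab"
hits = find_pattern(domains, "a{3}.{4}a{3}")
```

Encoding the assignment as a string keeps the search independent of the structure file, so the same pattern could in principle be applied to many pre-computed assignments.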
RamaDA is programmed in the Python (www.python.org) programming language and employs the Biopython library [10]. The online version of RamaDA is hosted on an Apache server. An equivalent standalone version is also freely downloadable. Both take a protein structure file or a conformational pattern (see below) and provide a graphic output of the analysis.
Lovell et al. [5] proposed a set of 500 protein structures extracted from the PDB to be representative of the statistical distribution of the (ϕ, ψ) angles in the Ramachandran plot. To this set, we added updated structures (PDB: 1XFF, 1GOK, 1E70 and 1IG5), but kept one obsolete structure (PDB: 1A1Y) in the list. This reference dataset contains 110 018 amino acids and is referred to throughout this manuscript as top500. This reference set was split into four subsets: glycines (Gly), amino acids preceding a proline (pre-Pro), prolines (Pro) and the others (a dataset called General).
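The split into four subsets can be sketched as follows. The precedence used here (Gly and Pro are tested before pre-Pro, so a glycine or proline that precedes a proline stays in its own subset) is an assumption, since the text does not state how such overlaps are resolved; the function name is likewise illustrative.

```python
def residue_classes(sequence):
    """Assign each residue (three-letter code) to Gly, Pro, pre-Pro or General.

    Assumed precedence: Gly and Pro first, then pre-Pro, then General.
    """
    classes = []
    for i, res in enumerate(sequence):
        next_res = sequence[i + 1] if i + 1 < len(sequence) else None
        if res == "GLY":
            classes.append("Gly")
        elif res == "PRO":
            classes.append("Pro")
        elif next_res == "PRO":
            classes.append("pre-Pro")
        else:
            classes.append("General")
    return classes

# Toy sequence, not taken from the top500 set.
labels = residue_classes(["ALA", "GLY", "SER", "PRO", "PRO", "LEU"])
```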
The seven conformational domains composing the Ramachandran plot that have been previously described in the literature (namely R-helices, L-helices, β, γ, ζ, PPII and PPII_R) were fitted by a set of 2D-Gaussian functions cyclically defined over the complete periodic [-180°, 180°] × [-180°, 180°] domain. Five parameters are necessary to describe each 2D-Gaussian: the position of the centre (ϕ_centre, ψ_centre), the standard deviations along both axes of the 2D-Gaussian (σ_ϕ, σ_ψ), and the angle made by the ψ axis of the Ramachandran plot and the major axis of the 2D-Gaussian. These parameters were first determined manually for each domain and then fitted to the top500 distribution assuming Poisson noise.
The statistical model of the Ramachandran plot implemented in RamaDA is composed of a set of 2D-Gaussian functions scaled to 1 (see Figure 1) and defined by the parameters found with top500. These parameters are gathered in Table 1.
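A single rotated, periodic 2D-Gaussian with a peak value of 1 can be sketched as below. Several choices are assumptions rather than the paper's definition: periodicity is handled by wrapping the angular differences to the nearest image, the rotation is applied to the displacement before dividing by the standard deviations, and the parameter values in the example are hypothetical placeholders, not the fitted values of Table 1.

```python
import math

def wrap(angle):
    """Wrap an angular difference in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def gaussian2d(phi, psi, phi_c, psi_c, sigma_phi, sigma_psi, theta_deg):
    """Rotated 2D-Gaussian on the periodic (phi, psi) plane, equal to 1 at its centre.

    Assumed convention: the displacement from the centre is rotated by
    theta_deg into the Gaussian's own frame, where sigma_phi and sigma_psi
    act along the two rotated axes.
    """
    dphi = wrap(phi - phi_c)
    dpsi = wrap(psi - psi_c)
    t = math.radians(theta_deg)
    u = math.cos(t) * dphi + math.sin(t) * dpsi
    v = -math.sin(t) * dphi + math.cos(t) * dpsi
    return math.exp(-0.5 * ((u / sigma_phi) ** 2 + (v / sigma_psi) ** 2))

# Hypothetical parameters for one domain (placeholders, not from Table 1).
params = dict(phi_c=-65.0, psi_c=-40.0, sigma_phi=10.0, sigma_psi=12.0, theta_deg=30.0)
value_at_centre = gaussian2d(-65.0, -40.0, **params)
```

Because the differences are wrapped, a point at ϕ = 179° is treated as 2° away from a centre at ϕ = -179°, which is what a cyclic definition over the plot requires.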
This set of 2D-Gaussian functions is used as a description of the backbone angle statistical distribution over formation (ϕ,ψ) is given