CLeFAPS: Fast Flexible Alignment of Protein Structures Based on Conformational Letters

Reading time: 6 minutes

📝 Original Info

  • Title: CLeFAPS: Fast Flexible Alignment of Protein Structures Based on Conformational Letters
  • ArXiv ID: 0903.0582
  • Date: 2009-03-04
  • Authors: Researchers from original ArXiv paper

📝 Abstract

CLeFAPS, a fast and flexible pairwise structural alignment algorithm built on a rigid-body framework, namely CLePAPS, is proposed. Instead of allowing twists (or bends), ‘flexible’ in CLeFAPS means: (a) flexibilization of the algorithm's parameters, which self-adapt to the size of the input structures; (b) flexibilization in adding the aligned fragment pairs (AFPs) to a one-to-multi correspondence set instead of checking their position conflicts; (c) flexible fragments may be found through an elongation procedure rooted in a vector-based score instead of a distance-based score. We compare CLeFAPS with other popular algorithms, both rigid-body and flexible, on a closely-related protein benchmark (HOMSTRAD) and a distantly-related protein benchmark (SABmark), the latter also serving as a discrimination test. The results show that CLeFAPS is competitive with, or even outperforms, the other algorithms, while its running time is only 1/150 to 1/50 of theirs.

📄 Full Content

The comparison of protein structures has long been an extremely important problem in computational biology [1] and has been employed in almost all branches of contemporary structural biology [2]. The results of pairwise protein structure alignment serve two categories of application [19].

The first category, which may be called the alignment problem, derives an exact alignment of residue-residue correspondences in order to identify the homologous core. It can be applied to functional prediction [3], to constructing benchmark datasets on which sequence alignment algorithms can be tested [4], and to discovering sequence-structure motifs that enable protein structure prediction [5]. Finding the optimal set of structurally similar correspondences between two input proteins has been proved to be NP-hard [12]. However, a practical solution can be obtained by first finding locally similar fragment pairs (SFPs) between the two proteins with some similarity metric and then piling up those SFPs with some consistency metric [13,14]. For example, CLePAPS [15] searches for SFPs with conformational letters [22,23] and afterwards applies a ProSup-like [16] procedure. These algorithms treat protein structures as rigid bodies, whereas others treat them as flexible [17,19]. Proteins are flexible molecules that undergo significant structural changes as part of their normal function [24]. However, current algorithms that introduce flexibility mainly do so by allowing twists (bends), regardless of whether these bends are meaningful [19]. Moreover, it has been shown that in a particular case (ROC curve analysis), the rigid version of FATCAT outperforms the flexible one [26]. Finally, these algorithms have been reported to be relatively slow [18,19].
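The "find SFPs first, then pile them up" idea can be illustrated with a minimal seed-and-extend sketch. It assumes each protein has already been encoded as a string of conformational letters; the strings in the example and the function names `find_seed_pairs`, `extend_seed` and `find_sfps` are hypothetical. The sketch uses exact k-letter matching with greedy rightward extension as a simplified stand-in; it is not the similarity scoring actually used by CLePAPS.

```python
from collections import defaultdict


def find_seed_pairs(letters_a: str, letters_b: str, k: int) -> list[tuple[int, int]]:
    """Start positions (i, j) where the k-letter windows of A and B are identical."""
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(letters_b) - k + 1):
        index[letters_b[j:j + k]].append(j)
    seeds = []
    for i in range(len(letters_a) - k + 1):
        # .get avoids inserting empty lists for windows that never occur in B
        for j in index.get(letters_a[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(letters_a: str, letters_b: str, i: int, j: int, k: int) -> int:
    """Greedily extend an exact seed to the right while the letters keep matching."""
    length = k
    while (i + length < len(letters_a) and j + length < len(letters_b)
           and letters_a[i + length] == letters_b[j + length]):
        length += 1
    return length


def find_sfps(letters_a: str, letters_b: str, k: int = 4) -> list[tuple[int, int, int]]:
    """Hypothetical SFP finder: returns (start_a, start_b, length) fragment pairs."""
    sfps = []
    covered: set[tuple[int, int]] = set()
    for i, j in find_seed_pairs(letters_a, letters_b, k):
        if (i, j) in covered:          # already inside an extended fragment on this diagonal
            continue
        length = extend_seed(letters_a, letters_b, i, j, k)
        sfps.append((i, j, length))
        covered.update((i + t, j + t) for t in range(length))
    return sfps


if __name__ == "__main__":
    print(find_sfps("ABCDEFGH", "XXABCDEYY", k=4))  # [(0, 2, 5)]
```

The resulting fragment pairs would then feed a pile-up step with a consistency metric, which this sketch does not attempt.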

The second category, which may be called the assessment problem, derives a scoring function that assesses the similarity of a pair of protein structures based on an exact or fuzzy alignment. It can be applied to give a Yes/No answer distinguishing ‘alignable’ from ‘non-alignable’ proteins [20], to classify known protein structures into hierarchical systems [7,8,9], and to search a query protein structure against a target database [10]. The classical geometric measure combines the length of alignment (LALI) with the root mean square deviation (RMSD). Clearly, this is a bi-criteria optimization problem in which the goal is to minimize the RMSD while maximizing the number of aligned residues [27]. However, since the RMSD weights the distances between all residue pairs equally, a small number of local structural deviations can produce a high RMSD even when the global topologies of the compared structures are similar. Further assessment functions have been suggested [32,33,34]; these solve the bi-criteria problem by providing a single assessment score, but leave open another problem: the dependence of score magnitudes on the size of the evaluated proteins [29].
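To make the RMSD point concrete, the sketch below compares RMSD with a TM-score-style sum on the same per-residue distances. It assumes the structures are already superposed and only the aligned-pair distances (in Å) are given; real TM-score maximizes over superpositions, which is omitted here. The d0 formula is the standard TM-score one; the 0.5 Å floor for very short chains is an assumption of this sketch.

```python
import math


def rmsd(distances: list[float]) -> float:
    """RMSD over aligned residue pairs, given their distances (Å) after superposition."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def d0(length: int) -> float:
    """TM-score distance scale d0(L) = 1.24 * (L - 15)**(1/3) - 1.8.

    The 0.5 Å floor for very short chains is an assumed convention of this sketch.
    """
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score-style sum for one fixed superposition (no search over rotations)."""
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    uniform = [1.0] * 100                       # 100 aligned pairs, all 1 Å apart
    local_outliers = [1.0] * 95 + [10.0] * 5    # same, but 5 pairs deviate by 10 Å
    for name, dist in (("uniform", uniform), ("local outliers", local_outliers)):
        print(f"{name}: RMSD = {rmsd(dist):.2f}, TM = {tm_score(dist, 100):.2f}")
    # uniform: RMSD = 1.00, TM = 0.93
    # local outliers: RMSD = 2.44, TM = 0.89
```

Five locally deviating pairs more than double the RMSD, while the TM-score-style value drops only slightly, because each pair's contribution is bounded by 1. Because d0 grows with the chain length, the same distance is also judged more leniently for larger proteins, which is how TM-score [29] addresses size dependence.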

Just as the user of a sequence alignment program can control the ‘gappiness’ by adjusting gap penalties, the user of a structural alignment method can change its parameters to suit different purposes. The authors of ProSup [16] suggested parameter settings for distantly-related proteins, while other algorithms optimize a single best parameter set on a training group for general purposes [14,28]. However, if an alignment task (for example, a database search) involves different types of proteins, such as closely-related, distantly-related, small and large ones, assigning fixed parameters leads to inaccuracy or ineffectiveness.

We propose a new approach called CLeFAPS that introduces flexibility based on a rigid-body framework, namely CLePAPS. The ‘F’ in CLeFAPS means: (a: Self-adaptive strategy) flexibilization of the algorithm’s main parameters through incorporating the d_0 factor from TM-score [29], which associates four main parameters with the size of the input proteins; moreover, combined with a seed-explosion strategy (similar to BLAST [35]) for SFP generation, we ‘self-adapt’ all six main parameters instead of fixing them, so as to handle different types of proteins; (b: Fuzzy-add strategy) flexibilization in the pile-up of the alignment by enlarging the one-to-one correspondence set to a one-to-multi set that collects all AFPs while neglecting position conflicts (shown in Fig. 1), and then applying dynamic programming with TM-score as the objective function to obtain an optimal alignment path. (A similar procedure is applied in TM-align by constructing a TM-score rotation matrix [28]. However, that matrix requires O(n²) space and the subsequent dynamic programming takes O(n²) time, while CLeFAPS has O(n) space and time complexity); (c: Vect-Elong strategy) flexible fragments may be found through the elongation procedure based on the Vect-score (see Eq. (9)) to collect local flexible fragm

…(Full text truncated)…
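The fuzzy-add strategy (b) described above pools residue pairs from all AFPs without resolving conflicts and lets dynamic programming pick a consistent path. A generic sketch of that idea follows, assuming each pooled pair is given as (i, j, d) with its distance d under a common superposition; the function name `best_alignment` and the example data are hypothetical. It selects the highest-weight chain of pairs with strictly increasing i and j, weighting each pair by the TM-score term 1 / (1 + (d / d0)²). It runs in O(K²) for K candidate pairs and uses no gap penalties, so it is not the O(n) procedure described for CLeFAPS.

```python
def best_alignment(candidates, scale):
    """Max-weight monotone path through a one-to-multi set of residue pairs.

    candidates: iterable of (i, j, d); the same i may appear with several j,
    and position conflicts are deliberately NOT removed beforehand.
    scale: the d0 distance scale used in the TM-score term.
    Returns (total_weight, [(i, j), ...]).
    """
    pairs = sorted(set(candidates))          # ordered by i, then j
    if not pairs:
        return 0.0, []
    weights = [1.0 / (1.0 + (d / scale) ** 2) for _, _, d in pairs]
    best = weights[:]                        # best[k]: best chain total ending at pairs[k]
    prev = [-1] * len(pairs)                 # back-pointers for path recovery
    for k, (i, j, _) in enumerate(pairs):
        for m in range(k):
            pi, pj, _ = pairs[m]
            # a predecessor must precede in BOTH sequences (no crossings, no reuse)
            if pi < i and pj < j and best[m] + weights[k] > best[k]:
                best[k] = best[m] + weights[k]
                prev[k] = m
    end = max(range(len(pairs)), key=best.__getitem__)
    path, k = [], end
    while k != -1:
        path.append(pairs[k][:2])
        k = prev[k]
    path.reverse()
    return best[end], path


if __name__ == "__main__":
    # residue 1 of A maps to both residue 1 and residue 5 of B (one-to-multi)
    pooled = [(0, 0, 1.0), (1, 1, 1.0), (1, 5, 0.5), (2, 2, 1.0), (2, 6, 4.0), (3, 6, 0.5)]
    score, path = best_alignment(pooled, scale=2.0)
    print(f"{score:.2f}", path)  # 3.34 [(0, 0), (1, 1), (2, 2), (3, 6)]
```

The conflicting candidates for residue 1 are resolved by the dynamic programming itself rather than by a conflict check when fragments are added, which is the essence of the fuzzy-add idea.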

Reference

This content is AI-processed based on ArXiv data.
