Comprehensive knowledge of protein-ligand interactions should provide a useful basis for annotating protein functions, studying protein evolution, engineering enzymatic activity, and designing drugs. To investigate the diversity and universality of ligand binding sites in protein structures, we conducted an all-against-all atomic-level structural comparison of over 180,000 ligand binding sites found in all known structures in the Protein Data Bank, using a recently developed database search and alignment algorithm. By applying a hybrid top-down-bottom-up clustering analysis to the comparison results, we determined approximately 3000 well-defined structural motifs of ligand binding sites. Apart from a handful of exceptions, most structural motifs were found to be confined within single families or superfamilies and to be associated with particular ligands. Furthermore, we analyzed the components of the similarity network and enumerated more than 4000 pairs of ligand binding sites that were shared across different protein folds.
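The component analysis of the similarity network mentioned above can be illustrated with a minimal sketch. It assumes that each binding site is a node and that an edge joins two sites whenever their comparison score passes some similarity threshold; the site identifiers, fold labels, and the thresholding step itself are hypothetical. Connected components are found with a union-find structure, and "cross-fold" pairs are taken to be similar pairs whose endpoints carry different fold labels. It illustrates only the component analysis, not the hybrid top-down-bottom-up clustering.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group binding sites into connected components of a similarity graph.

    nodes: iterable of binding-site identifiers (hypothetical IDs).
    edges: iterable of (u, v) pairs judged similar; every u and v must appear in nodes.
    Returns a list of components, each a list of node identifiers.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = defaultdict(list)
    for n in parent:
        groups[find(n)].append(n)
    return list(groups.values())


def cross_fold_pairs(edges, fold_of):
    """Return the similar pairs whose two sites are assigned to different folds.

    fold_of: mapping from site identifier to a fold label (hypothetical labels).
    """
    return [(u, v) for u, v in edges if fold_of[u] != fold_of[v]]
```

Union-find keeps the component computation close to linear in the number of edges, which matters when the edge list comes from billions of pairwise comparisons.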
Comprehensive structural classification of ligand binding motifs in proteins
Introduction
Most proteins function by interacting with other molecules. Therefore, knowledge of the interactions between proteins and their ligands is central to our understanding of protein function. However, simply enumerating the interactions of individual proteins with individual ligands, which the massive production of experimentally determined protein structures has now made possible, would only increase the amount of data, not necessarily our knowledge or understanding of protein function. What is needed is a classification of general patterns of interaction. Without such a classification, it would be difficult to apply this wealth of information to elucidating the evolutionary history of protein functions (Andreeva & Murzin, 2006; Goldstein, 2008), engineering enzymatic activity (Gutteridge & Thornton, 2005), or developing new drugs (Rognan, 2007).
In order to classify protein-ligand interactions and extract general patterns from the classification, one must first compare the ligand binding sites of different proteins. A number of methods already exist for comparing the atomic structures or other structural features of functional sites in proteins (for reviews, see Jones & Thornton, 2004; Lee et al., 2007).
Applications of these methods have led to the discovery of ligand binding site structures shared by many proteins with different folds (Kobayashi & Go, 1997; Kinoshita et al., 1999; Stark & Russell, 2003; Brakoulias & Jackson, 2004; Shulman-Peleg et al., 2004; Gold & Jackson, 2006). Gold & Jackson (2006) conducted an all-against-all comparison of 33,168 binding sites, the results of which have been compiled into the SitesBase database. They described several unexpected similarities across different protein folds and applied their method to the annotation of unclassified proteins. More recently, Minai et al. (2008) compared all pairs of 48,347 potential ligand binding sites in 9708 representative protein chains and demonstrated the applicability of ligand binding site comparison to drug discovery.
To date, however, no method has been applied to an exhaustive all-against-all comparison of all ligand binding sites found in the Protein Data Bank (PDB) (Berman et al., 2007), presumably because existing methods were not efficient enough to handle the huge amount of data in the current PDB, or because it was assumed that the redundancy (in terms of sequence homology) or some “trivial” ligands (such as sulfate ions) in the PDB would not yield any interesting findings. As of June 2008, the PDB contained over 51,000 entries with more than 180,000 ligand binding sites (excluding water molecules), and hence naively comparing all pairs of this many binding sites (> 3 × 10¹⁰ pairs) is indeed a formidable task. Nevertheless, the multiple structures of many proteins that have been solved with a variety of ligands (e.g., enzyme inhibitors) could provide a great opportunity for analyzing the diversity of binding modes, and apparently trivial ligands are often used by crystallographers to infer functional sites from “apo” structures. In other words, the diversity contained in these apparently redundant data makes them too precious a source of information to be ignored.
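The pair count quoted above follows directly from the number of binding sites. Taking N ≈ 1.8 × 10⁵ (the text says "more than 180,000"), the figure of more than 3 × 10¹⁰ corresponds to ordered pairs; counting each unordered pair only once roughly halves it, which is still formidable:

```latex
N \approx 1.8\times10^{5}, \qquad
N^{2} \approx 3.24\times10^{10}\ \text{(ordered pairs)}, \qquad
\binom{N}{2} = \frac{N(N-1)}{2} \approx 1.62\times10^{10}\ \text{(unordered pairs)}.
```

At roughly one second per one-against-all search, N such searches would amount to about 1.8 × 10⁵ s, or roughly 50 hours, for the complete all-against-all run.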
To handle this huge amount of data, we recently developed the GIRAF (Geometric Indexing with Refined Alignment Finder) method (Kinjo & Nakamura, 2007). By combining ideas from geometric hashing (Wolfson & Rigoutsos, 1997) and relational database searching (Garcia-Molina et al., 2002), this method efficiently finds structurally and chemically similar local protein structures in a database and produces atomic-resolution alignments independent of sequence homology, sequence order, or protein fold. In this method, we first store a database of ligand binding sites in an ordinary relational database management system and create an index based on geometric features together with the surrounding atomic environments. Using this index, potentially similar ligand binding sites can be retrieved efficiently while unlikely hits are safely ignored. For each potential hit, a refined atom-atom alignment is obtained by iteratively applying bipartite graph matching and optimal superposition. In this study, we further improved the original GIRAF method so that a one-against-all comparison takes effectively about one second, and applied it to the first all-against-all comparison of all ligand binding sites in the PDB.
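The refinement step, iterative bipartite graph matching alternated with optimal superposition, can be sketched as follows. This is a minimal illustration under stated assumptions, not the GIRAF implementation: the optimal superposition uses the Kabsch SVD method, the bipartite matching is a minimum-cost assignment computed with SciPy's `linear_sum_assignment` over squared inter-atomic distances, chemical compatibility is reduced to hypothetical per-atom type labels that must match exactly, and the 2.0 Å cutoff and iteration limit are arbitrary choices. The initial correspondences stand in for a hit retrieved through the geometric index.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def kabsch(P, Q):
    """Rotation R and translation t minimizing sum ||P_i R^T + t - Q_i||^2.

    P, Q: (n, 3) arrays of already-paired coordinates, n >= 3.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - p_mean @ R.T
    return R, t


def refine_alignment(A, B, types_a, types_b, pairs, cutoff=2.0, max_iter=20):
    """Alternate optimal superposition and bipartite matching until stable.

    A, B: (n, 3) and (m, 3) atom coordinates of two binding sites.
    types_a, types_b: per-atom type labels; only equal labels may be paired
        (a hypothetical stand-in for chemical similarity).
    pairs: initial (i, j) correspondences, e.g. from an index hit (>= 3 pairs).
    Returns the final list of (i, j) pairs and the RMSD over them.
    """
    types_a, types_b = np.asarray(types_a), np.asarray(types_b)
    pairs = sorted(tuple(map(int, p)) for p in pairs)
    if len(pairs) < 3:
        raise ValueError("need at least 3 initial pairs")
    penalty = 1e6  # cost assigned to chemically incompatible atom pairs
    for _ in range(max_iter):
        ia, ib = (np.array(x) for x in zip(*pairs))
        R, t = kabsch(A[ia], B[ib])
        moved = A @ R.T + t
        # Squared distances between every moved atom of A and every atom of B.
        d2 = ((moved[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        cost = np.where(types_a[:, None] == types_b[None, :], d2, penalty)
        rows, cols = linear_sum_assignment(cost)   # minimum-cost matching
        keep = cost[rows, cols] <= cutoff ** 2      # drop distant/incompatible pairs
        new_pairs = sorted(zip(rows[keep].tolist(), cols[keep].tolist()))
        if len(new_pairs) < 3 or new_pairs == pairs:
            break  # converged, or too few pairs left to superimpose
        pairs = new_pairs
    ia, ib = (np.array(x) for x in zip(*pairs))
    R, t = kabsch(A[ia], B[ib])
    diff = A[ia] @ R.T + t - B[ib]
    rmsd = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return pairs, rmsd
```

Because the matching is recomputed after every superposition, the alignment does not depend on sequence order: any atom of one site may pair with any compatible atom of the other. How the matched-atom count and RMSD would be combined into a similarity score is left open in this sketch.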
In order to extract recurring patterns in ligand binding sites, we then classified the ligand binding sites based on the results of the all-against-all comparison, and defined structural motifs. So far, such structural motifs have been determined either manually (Porter et al., 2004) or automatically (Wangikar et al., 2003;Polacco & Babbitt, 2006). Given the huge amount of data, manual curation of all potential motifs is not feasible, and previously developed automatic methods are computationally too intensive (Wangikar et al., 20
…(Full text truncated)…