Identification of specificity determining residues in enzymes using environment specific substitution tables
Environment-specific substitution tables have been used effectively to distinguish structural from functional constraints on proteins and thereby to identify their active sites (Chelliah et al. (2004)). This work explores whether a similar approach can identify specificity determining residues (SDRs) responsible for cofactor dependence, substrate specificity or subtle catalytic variations. We combine structure-sequence information and functional annotation from various data sources to create structural alignments of homologous enzymes and of the functional partitions within them. We develop a scoring procedure to predict SDRs and assess the accuracy of its predictions using information from bound specific ligands and the published literature.
Enzymes are critical to the cellular machinery. They are believed to have developed different specificities following gene duplication events, which ease the evolutionary pressure on each copy and allow it to explore novel avenues toward greater organismal fitness. Each copy then develops its own niche, characterized by expression and localization, catalytic mechanism, substrate specificity, cofactor dependence and catalysis products. Such paralogous enzymes should therefore carry an evolutionary imprint corresponding to their specific niche, in addition to the imprint of maintaining the structural fold.
Thus, evolutionary analysis of the available structural and sequence data should enable identification of the key residues responsible for various kinds of specificity. Enzyme specificity can be measured with functional assays without structure determination, but identifying SDRs (specificity determining residues) remains difficult. While ENZYME (Bairoch (2000)), a database of enzyme sequences with detailed functional annotation, exists, there is no equivalent database of SDRs. Time, cost and technical limitations slow down structure determination, and even when the structure is known, it is not trivial to identify the residues important for binding cofactors and substrates.
Hence it is important to be able to identify such residues computationally. Reliable detection of such residues would help decide whether a SNP is deleterious or neutral and would suggest mutation studies. Function could also be assigned to a sequence at a finer level, e.g. by verifying that the SDRs necessary for a certain substrate are present. Computational SDR identification has received a lot of attention and several methods have been proposed. Evolutionary trace (ET) is one of the most important (Madabushi et al. (2002), Mihalek et al. (2004)). It builds a phylogenetic tree from sequence comparisons, such that branch lengths reflect evolutionary divergence. Functional subgroups are the sequences in subtrees obtained by cutting this tree at a divergence cutoff. Residues conserved within a subtree, rather than those conserved across the entire tree, are considered specificity-conferring. Spatial cluster identification can be combined with ET to reduce the number of false positives. Correctly inferring the phylogeny remains the main concern with this approach, hence attempts have been made to use existing annotation together with various statistical techniques. Another important direction is to exploit the spatial proximity of residues.
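The subtree-partitioning idea behind ET can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ET implementation: single-linkage clustering of pairwise p-distances stands in for the phylogenetic tree (clustering at a threshold is equivalent to cutting a single-linkage dendrogram at that height), and the helper names `p_distance`, `cut_tree` and `class_specific_columns` as well as the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of jointly non-gap alignment positions at which a and b differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def cut_tree(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Partition sequences by single-linkage clustering at `cutoff`.

    Equivalent to cutting a single-linkage dendrogram at height `cutoff`;
    used here as a stand-in for ET's phylogenetic tree (an assumption).
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def class_specific_columns(seqs: dict[str, str], groups: list[set[str]]) -> list[int]:
    """Columns invariant within every subgroup but variable across the whole alignment."""
    length = len(next(iter(seqs.values())))
    hits = []
    for i in range(length):
        whole = {s[i] for s in seqs.values()}
        within = [{seqs[name][i] for name in group} for group in groups]
        if len(whole) > 1 and all(len(col) == 1 for col in within):
            hits.append(i)
    return hits


# Hypothetical toy alignment with two functional subgroups.
seqs = {
    "A1": "GDSLKA",
    "A2": "GDSLRA",
    "B1": "GNSIKA",
    "B2": "GNSIRA",
}
groups = cut_tree(seqs, cutoff=0.2)
print(sorted(sorted(g) for g in groups))      # [['A1', 'A2'], ['B1', 'B2']]
print(class_specific_columns(seqs, groups))   # [1, 3]
```

Column 0 is invariant across the whole alignment and is therefore not reported, while columns 1 and 3 separate the two subgroups. Raising the cutoff until everything falls into one subtree leaves no class-specific columns, which shows why the choice of divergence cutoff, and a correct phylogeny, matter so much for this approach.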
The cornerstone of our approach is that the structural environment influences residue substitution patterns, as illustrated by Overington et al. (1990); this observation was later used effectively for structure-sequence alignment and fold recognition (Shi et al. (2001)). The structural environment of a residue is described in terms of secondary structure, solvent accessibility, and sidechain-sidechain and sidechain-mainchain hydrogen bonding. Environment-specific substitution tables (ESSTs), derived from a set of high-quality sequence-structure alignments, give the expected substitution rate in each structural environment. Unexpected conservation of a residue therefore indicates a functional restraint acting on it.
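The notion of "unexpected conservation" can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the `Environment` encoding is simplified to the four features named above, the `CONSERVATION_PROB` values are made up rather than taken from any published ESST, and a binomial tail probability is used as a stand-in score; the scoring procedure actually used in this work may differ.

```python
from math import comb
from typing import NamedTuple


class Environment(NamedTuple):
    """Simplified structural environment of a residue (hypothetical encoding)."""
    secondary_structure: str  # "H" helix, "E" strand, "C" coil
    accessible: bool          # solvent accessible (True) or buried (False)
    sc_sc_hbond: bool         # sidechain-sidechain hydrogen bond present
    sc_mc_hbond: bool         # sidechain-mainchain hydrogen bond present


EXPOSED_COIL = Environment("C", True, False, False)
BURIED_HBONDED_HELIX = Environment("H", False, True, True)

# Hypothetical diagonal ESST entries: probability that residue `aa` in a given
# environment is unchanged in a homolog. Made-up values, for illustration only.
CONSERVATION_PROB = {
    (EXPOSED_COIL, "D"): 0.35,
    (BURIED_HBONDED_HELIX, "D"): 0.80,
}


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def unexpected_conservation(native: str, column: str, env: Environment) -> float:
    """Probability of seeing at least the observed number of conserved residues
    in `column` if only the environment-specific expectation applied.
    Small values flag conservation that the structural environment does not explain."""
    p = CONSERVATION_PROB[(env, native)]
    k = sum(aa == native for aa in column)
    return binomial_tail(k, len(column), p)


column = "D" * 10  # a native Asp conserved in 10 homologs
print(unexpected_conservation("D", column, EXPOSED_COIL))          # ~2.8e-05
print(unexpected_conservation("D", column, BURIED_HBONDED_HELIX))  # ~0.107
```

The same fully conserved column is striking in an exposed coil but unremarkable at a buried, hydrogen-bonded helix position, where high conservation is already expected for structural reasons. Conditioning on the environment is what allows structurally constrained residues to be set aside.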
The advantage of using ESSTs is that residues whose conservation is explained by their structural environment are masked, which is why the active sites of homologous enzymes can be identified reliably with this approach. The present work extends this approach by also using functional annotation.
A set of homologous enzymes is generally a union of smaller functionally specific subsets, e.g. substrate-specific subsets in serine proteinases (trypsin, chymotrypsin, etc.) and cofactor-specific subsets in ferredoxin reductases (NAD- and NADP-specific). In a multiple sequence alignment of a homologous protein family, SDRs generally appear as differentially conserved subcolumns, i.e. positions conserved within each subset but differing between subsets. However, not every differentially conserved subcolumn is an SDR. Our hypothesis is that SDRs can be identified by combining differential conservation with ESST-based detection of functional restraint.
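The hypothesis can be sketched in Python as a filter that requires both signals at once. This is a minimal illustration, not the authors' scoring procedure: it assumes functional subsets are given as lists of sequence names, that each column already has a precomputed ESST-based restraint p-value, and the consensus-frequency criterion, the thresholds (`min_conservation`, `alpha`), the function names and the toy data are all hypothetical.

```python
from collections import Counter


def consensus(residues: list[str]) -> tuple[str, float]:
    """Most common residue in a subcolumn and its relative frequency."""
    aa, count = Counter(residues).most_common(1)[0]
    return aa, count / len(residues)


def differentially_conserved(column: dict[str, str], subsets: list[list[str]],
                             min_conservation: float = 0.9) -> bool:
    """True if each functional subset is conserved within itself (consensus
    frequency >= min_conservation) but the subsets do not share one consensus."""
    consensi = set()
    for subset in subsets:
        aa, freq = consensus([column[name] for name in subset])
        if freq < min_conservation:
            return False
        consensi.add(aa)
    return len(consensi) > 1


def predict_sdrs(alignment: dict[str, str], subsets: list[list[str]],
                 restraint_p: list[float], min_conservation: float = 0.9,
                 alpha: float = 0.01) -> list[int]:
    """Columns that are differentially conserved AND show ESST-based functional
    restraint (per-column p-value below alpha, assumed precomputed)."""
    length = len(next(iter(alignment.values())))
    hits = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        if differentially_conserved(column, subsets, min_conservation) and restraint_p[i] < alpha:
            hits.append(i)
    return hits


# Hypothetical cofactor-specific subsets (e.g. NAD- vs NADP-dependent enzymes).
alignment = {
    "nad_1": "GDAKL",
    "nad_2": "GDSKL",
    "nadp_1": "GRAKV",
    "nadp_2": "GRSRV",
}
subsets = [["nad_1", "nad_2"], ["nadp_1", "nadp_2"]]
restraint_p = [0.001, 0.002, 0.5, 0.3, 0.4]  # made-up ESST restraint p-values
print(predict_sdrs(alignment, subsets, restraint_p))  # [1]
```

Column 0 is restrained but identical in both subsets, and column 4 separates the subsets but shows no ESST restraint, so both are rejected; only column 1 satisfies both criteria. This mirrors the point that differential conservation alone over-predicts SDRs.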
To test this hypothesis, we need a dataset of homologous enzyme families with reliable functional partitions. While SCOP classification can be used in a straightforward way to define families, identifying functionally specific subsets is not a trivial task. Some automated approaches exist to infer such partitions by detecting functional shift, e.g. Abhiman and Sonnhammer (2005), but manual annotation remains the most reliable. Additionally, protein function is not a precise, quantifiable entity. This restricted our study to enzymes, which are the most well-studied and well-annotated class of proteins. Enzyme function is fairly well defined and is classified according to the hierarchical Enzyme Classification (EC) scheme. We use the mapping between SCOP domains and EC numbers (George et al. (2004)) to make EC-specific subgroups within a SCOP domain family. We generate profiles (multiple structure-sequence alignments) for SCOP families and for their functional partitions. Sequence homologs for structural families were found using PSIBLAST (Al
…(Full text truncated)…
This content is AI-processed based on ArXiv data.