Background. All-atom crystallographic refinement of proteins is a laborious, manually driven procedure; as a result, alternative and multiconformer interpretations are not routinely investigated. Results. We describe efficient loop sampling procedures in Rappertk and demonstrate that single loops in proteins can be automatically and accurately modelled with few positional restraints. Loops constructed with a composite CNS/Rappertk protocol consistently have better Rfree than those built with CNS alone. This approach is extended to a more realistic scenario in which loops often have large positional uncertainties and the secondary structural framework has small imperfections. Both ensemble and collection methods are used to estimate the structural heterogeneity of loop regions. Conclusion. Apart from benchmarking Rappertk for the all-atom protein refinement task, this work also demonstrates its utility in both aspects of loop modelling: building a single conformer and estimating the structural heterogeneity that loops can exhibit.
1 Introduction

X-ray crystallography has been the most popular protein structure determination technique of both the pre- and post-genomic eras. The challenges of macromolecular crystallography are manifold: after the difficult steps of expression, purification, crystallization and data collection, there remains the final and important task of data interpretation in order to build a model which explains the observed diffraction. Structural interpretation requires overcoming the phase problem and often starts with partial and incorrect phases. Typically, semi-automatic iterative refinement is carried out, gradually improving the model's quality as indicated by the R and Rfree factors as well as by a decrease in covalent geometry and excluded volume violations. Although excellent software packages like CCP4 (CCP4 (1994)), Phenix (Adams et al. (2002)) and CNS (Brunger et al. (1998)) make this task possible, the structure refinement procedure remains manually driven and hence laborious and subjective. Because of this, the heterogeneity in structural interpretation of diffraction data is often ignored in favour of a single-conformer, isotropic B-factor model.
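As a rough illustration of the R and Rfree quality indicators mentioned above, the following minimal sketch assumes the conventional definition R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, with Rfree computed in the same way over a test set of reflections held out of refinement. The function names, the `free_flags` layout and the assumption that the calculated amplitudes are already on the same scale as the observed ones are illustrative choices only; this is not how CNS or Rappertk implement the calculation.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections.

    Assumes f_calc is already scaled to f_obs (no scale factor is fitted here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a boolean test-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes, for illustration only.
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [90.0, 55.0, 80.0, 30.0]
free_flags = [False, False, False, True]  # last reflection is in the test set
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the test-set reflections are never used to adjust the model, Rfree is less susceptible to over-fitting than R, which is why it is used here as the measure of improvement.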
Protein structure is important for function, but very stable, rigid proteins cannot exhibit enzymatic activity. This suggests that proteins have to be stable enough to retain their fold yet dynamic enough to be functional. Both experimental and computational studies indicate that a single-conformer interpretation of crystallographic data is not adequate to capture the native-state dynamics, which are largely conserved even in a crystal owing to its high solvent content (Petsko (1996), Jensen (1997)). Reporting a multiconformer interpretation of the data will make use of the structure less misleading, especially in analyses that depend on geometry, such as shapes of binding sites, orientations of sidechains, detection of non-covalent interactions and so on. While multiple interpretations are necessary, they should be free from bias such as that introduced when different crystallographers interpret the same diffraction data. Multiconformer interpretation will be greatly facilitated by automated methods.
Thus multiple persuasive justifications emerge for automating the protein crystallographic refinement task: (a) capturing the dynamics of the protein in the crystalline state, (b) removing subjective bias from the refinement process and (c) reducing the need for precious human resources. But this goal is hard to achieve in practice. The under-determined nature of the problem (number of independent observations < number of parameters) prevents a straightforward solution by minimization. Even when sufficient restraints exist, minimization methods like conjugate gradient and steepest descent suffer from the problem of local minima. Hence the use of well-known features of proteins is unavoidable. Automatic pattern recognition in electron density is very successful in the presence of high-resolution data and good phases because it looks for such features (Perrakis et al. (1997)). But at medium resolution, or given poor phases, this strategy can be misled.
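The local-minimum problem noted above can be seen in a toy setting. The sketch below is only an illustration under stated assumptions: the one-dimensional function `energy` is a hypothetical stand-in with two minima, not a crystallographic target, and the multi-start loop is a generic substitute for conformational sampling, not the algorithm used in Rappertk or CNS.

```python
def energy(x):
    """Toy 1-D target with a global minimum near x = -1.30 and a local one near x = 1.13."""
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy target."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def steepest_descent(x0, step=0.01, n_iter=2000):
    """Fixed-step steepest descent; it converges to whichever minimum lies downhill of x0."""
    x = x0
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x


def multi_start(starts):
    """Descend from every starting point and keep the lowest-energy end point."""
    return min((steepest_descent(s) for s in starts), key=energy)


single = steepest_descent(2.0)                      # trapped in the local minimum
best = multi_start([-2.0, -1.0, 0.0, 1.0, 2.0])     # broader sampling finds the global one
print(f"single start: x = {single:.3f}, energy = {energy(single):.3f}")
print(f"multi start:  x = {best:.3f}, energy = {energy(best):.3f}")
```

A single descent ends wherever its starting point leads, whereas sampling several starting points recovers the deeper minimum. In a real refinement the search space is far too large for a uniform grid of starts, which is why knowledge of protein features is needed to guide the sampling.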
Our recent efforts with automated crystallographic refinement started with RAPPER, a conformation sampling program for proteins that uses a genetic algorithm cum branch-and-bound (GABB) algorithm. DePristo et al. (2004) showed that multiple interpretations similar to the deposited structure are possible given the deposited data, and that the divergence in interpretation is correlated with resolution. With RAPPER, it was demonstrated (DePristo et al. (2005)) that when a protein structure is approximately known, it can be refined to native-like quality, unlike MDSA in CNS, which may get stuck in local minima. The fundamental features of RAPPER responsible for avoiding local-minimum traps were (a) use of fine-grained, propensity-weighted φ-ψ maps for backbone sampling, (b) use of backbone-dependent rotameric libraries, (c) use of ideal Engh and Huber covalent geometry and (d) mild use of electron density and positional restraints to guide the sampling process. Later, Furnham et al. (2006) demonstrated that a low-resolution dataset can be rescued and interpreted semi-automatically to obtain the structure of a system of great biological significance. DePristo et al. (2005) observed that automatic refinement becomes less satisfactory as positional restraints become weaker: the structures could not be refined if the initial Cα perturbation was of the order of 3 Å or more. This is not unexpected, because larger (looser) positional restraints dilute the information and make the search harder. But a practical problem often encountered in crystallography is that of missing loops, i.e. knowing loop regions with far less Cα positional certainty than the regions with regular s