Profile Conditional Random Fields for Modeling Protein Families with Structural Information


A statistical model of protein families, called profile conditional random fields (CRFs), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to that of the profile HMM, it can incorporate arbitrary correlations in the sequences to be aligned to the model. In addition, as in the FR theory, the profile CRF can incorporate long-range pairwise interactions between model states via mean-field-like approximations. We give the detailed formulation of the model, self-consistent approximations for treating long-range interactions, and algorithms for computing partition functions and marginal probabilities. We also outline the methods for the global optimization of model parameters as well as a Bayesian framework for parameter learning and selection of optimal alignments.


💡 Research Summary

The paper introduces a statistical framework for modeling protein families called the profile Conditional Random Field (profile CRF). The authors position this model as a synthesis of two established concepts: the profile hidden Markov model (HMM), which has been the workhorse for sequence alignment and family modeling, and the Finkelstein‑Reva (FR) theory of protein folding, which provides a physics‑based description of long‑range contacts in three‑dimensional structures. By combining these ideas, the profile CRF retains the familiar linear architecture of a profile HMM (match, insert, and delete states, with transitions among them) while extending its expressive power to capture arbitrary correlations among residues of the aligned sequence and explicit long‑range pairwise interactions between distant positions in the model.
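The linear match/insert/delete architecture described above can be illustrated with a small forward recursion that sums Boltzmann weights over all alignments. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names (`log_partition`, `site_energy`, `trans_energy`), the toy energies, and the choice to allow every state-to-state transition are hypothetical, and long-range interactions are omitted entirely. The one CRF-specific point it shows is that `site_energy` receives the whole sequence `x`, so a local term may depend on arbitrary sequence context.

```python
import math

STATES = ("M", "I", "D")
# Lattice step of each state: (model positions advanced, residues consumed).
STEP = {"M": (1, 1), "I": (0, 1), "D": (1, 0)}


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(x, K, site_energy, trans_energy):
    """Log partition function over all global alignments of a K-position
    model to the sequence x (toy illustration, assumptions as stated above).

    site_energy(s, k, i, x): energy of being in state s at model position k
        after i residues have been consumed; it sees the full sequence x.
    trans_energy(t, s): energy of moving from state t to state s
        (return math.inf to forbid a transition).
    """
    L = len(x)
    if K == 0 and L == 0:
        return 0.0  # only the empty alignment exists
    neg_inf = -math.inf
    # F[s][k][i]: log of the summed weights of partial alignments that end
    # in state s at model position k with i residues consumed.
    F = {s: [[neg_inf] * (L + 1) for _ in range(K + 1)] for s in STATES}
    for k in range(K + 1):
        for i in range(L + 1):
            if k == 0 and i == 0:
                continue  # the start point carries no state
            for s in STATES:
                dk, di = STEP[s]
                pk, pi = k - dk, i - di
                if pk < 0 or pi < 0:
                    continue  # this step would leave the lattice
                if pk == 0 and pi == 0:
                    prev = 0.0  # first step out of the start point
                else:
                    prev = logsumexp(
                        [F[t][pk][pi] - trans_energy(t, s) for t in STATES]
                    )
                if prev == neg_inf:
                    continue
                F[s][k][i] = prev - site_energy(s, k, i, x)
    return logsumexp([F[s][K][L] for s in STATES])


def context_energy(s, k, i, x):
    """Hypothetical site term: a match is favored when the aligned residue
    repeats its predecessor in x, a dependence on sequence context."""
    if s == "M" and i >= 2 and x[i - 1] == x[i - 2]:
        return -1.0
    return 0.0


def flat_transitions(t, s):
    """Hypothetical transition term: every transition costs nothing."""
    return 0.0


if __name__ == "__main__":
    print(log_partition("AACC", 3, context_energy, flat_transitions))
```

With all energies set to zero, every alignment has weight 1, so the partition function simply counts lattice paths, which gives a quick sanity check of the recursion. Working in log space avoids underflow when sequences are long.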

Model formulation
Each alignment s of the model to a target sequence x is assigned an energy (or negative log‑probability).
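For orientation, the standard relation between an energy and a conditional probability in any CRF can be written as follows. This is the generic Boltzmann form rather than the paper's specific energy function, and the symbols $E$ and $Z$ are notation assumed here.

\[
P(s \mid x) = \frac{\exp[-E(s, x)]}{Z(x)}, \qquad Z(x) = \sum_{s'} \exp[-E(s', x)]
\]

The sum runs over all alignments $s'$ of the model to $x$, so $-\log P(s \mid x)$ equals $E(s, x)$ up to the additive constant $\log Z(x)$.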

