Out-of-equilibrium selection pressure enhances inference from protein sequence data

📝 Original Info

  • Title: Out-of-equilibrium selection pressure enhances inference from protein sequence data
  • ArXiv ID: 2602.11427
  • Date: 2026-02-11
  • Authors: Not provided (the source does not list author information).

📝 Abstract

Homologous proteins have similar three-dimensional structures and biological functions that shape their sequences. The resulting coevolution-driven correlations underlie methods from Potts models to AlphaFold, which infer protein structure and function from sequences. Using a minimal model, we show that fluctuating selection strength and the onset of new selection pressures improve coevolution-based inference of structural contacts. Our conclusions extend to realistic synthetic data and to the inference of interaction partners. Out-of-equilibrium noise arising from ubiquitous variations in natural selection thus enhances, rather than hinders, the success of inference from protein sequences.

💡 Deep Analysis

📄 Full Content

Introduction. Proteins evolve through random mutations of their sequences and natural selection for function. This coupling between sequence and function leads to coevolution between amino acid sites in multiple sequence alignments (MSAs) of homologous proteins, which share ancestry, three-dimensional structure, and function. Amino-acid identities at contacting positions in a protein's three-dimensional structure are correlated, due to the need to preserve physico-chemical complementarity. This has made it possible to predict structural contacts from sequences, using mutual information [1] and Potts models [2][3][4][5], which are pairwise maximum entropy models inferred on MSAs. Attention coefficients in protein language models trained on MSAs capture coevolution [6,7], a key factor underlying AlphaFold's success in protein structure prediction [7]. Beyond structure, coevolution makes it possible to infer interaction partners [8][9][10][11] and functional sectors of collectively correlated amino acids [12][13][14][15], as well as to investigate protein evolution [16][17][18][19][20][21].

Natural environments often change over time [22][23][24][25][26][27], causing variable selection pressures on proteins [22]. How do such fluctuations impact inference from sequences? To address this, we consider a minimal model of protein sequences evolving under fluctuating selection to preserve structure. We show that the resulting out-of-equilibrium noise can enhance the inference of structural contacts via Potts models. This conclusion extends to realistic synthetic data. It also holds in a simple model of the onset of a new selection pressure, where a random ancestral sequence evolves under structural selection with phylogeny. Finally, we show that out-of-equilibrium selection also enhances the inference of interaction partners. Our results suggest that time-dependent external drives can facilitate inference by pairwise maximum entropy models.

Model and methods. We consider a minimal model in which protein sequences are represented as length-$L$ sequences of Ising spins $(\sigma_1, \dots, \sigma_L)$, defined on the nodes of a fixed Erdős-Rényi random graph. Structural contacts are modeled by pairwise ferromagnetic couplings on the graph edges, all set to 1 for simplicity, leading to the Hamiltonian

$$H(\sigma) = -\sum_{(i,j) \in E} \sigma_i \sigma_j, \tag{1}$$

where $E$ denotes the set of graph edges, see also [28].

* anne-florence.bitbol@epfl.ch
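As a minimal sketch of this setup, the snippet below draws an Erdős-Rényi graph and evaluates the ferromagnetic energy with all couplings set to 1. The function names, the sequence length of 20, and the edge probability of 0.2 are illustrative assumptions, not values taken from the paper.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n_sites, p_edge, rng):
    """Edge list of a G(n, p) graph: each pair is linked with probability p_edge."""
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if rng.random() < p_edge]


def random_spins(n_sites, rng):
    """A random sequence of Ising spins, each +1 or -1 with equal probability."""
    return [rng.choice((-1, 1)) for _ in range(n_sites)]


def energy(spins, edges):
    """H = -sum over edges (i, j) of spins[i] * spins[j], couplings all equal to 1."""
    return -sum(spins[i] * spins[j] for i, j in edges)


rng = random.Random(0)                        # fixed seed for reproducibility
n_sites = 20                                  # hypothetical sequence length L
edges = erdos_renyi_edges(n_sites, 0.2, rng)  # hypothetical edge probability
spins = random_spins(n_sites, rng)
print(f"{len(edges)} contacts, H = {energy(spins, edges)}")

# A fully aligned sequence satisfies every contact, so its energy is -|E|.
all_up = [1] * n_sites
print(energy(all_up, edges) == -len(edges))
```

The graph is stored as a plain edge list because the energy only needs to iterate over contacts once.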

We sample independent equilibrium sequences using a Metropolis-Hastings algorithm by proposing spin flips at random sites and accepting them with probability

$$p = \min\left(1, e^{-\Delta H / T}\right),$$

where $\Delta H$ is the energy change associated with the flip and $T$ is the Monte Carlo sampling temperature; see Fig. S1(a) for a schematic and Fig. S2 for information about equilibration. Selection strength is controlled by $T$, with higher $T$ corresponding to weaker selection [28].
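The sampling step can be sketched as follows, assuming the standard Metropolis acceptance rule min(1, exp(-ΔH/T)) and a Hamiltonian with unit ferromagnetic couplings on the graph edges. The toy ring graph, the temperature of 4.0, and all function names are illustrative assumptions; the paper's own graph is Erdős-Rényi.

```python
import math
import random


def build_neighbors(n_sites, edges):
    """Adjacency lists for an undirected graph given as (i, j) pairs."""
    nbrs = [[] for _ in range(n_sites)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def delta_energy(spins, nbrs, site):
    """Energy change of flipping one spin under H = -sum_{(i,j) in E} s_i s_j."""
    return 2 * spins[site] * sum(spins[j] for j in nbrs[site])


def metropolis_step(spins, nbrs, temperature, rng):
    """Propose one random spin flip; accept with probability min(1, exp(-dH/T))."""
    site = rng.randrange(len(spins))
    d_h = delta_energy(spins, nbrs, site)
    if d_h <= 0 or rng.random() < math.exp(-d_h / temperature):
        spins[site] = -spins[site]
        return True
    return False


rng = random.Random(1)
n_sites = 6
# Toy ring of contacts, used only to keep the example small.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
nbrs = build_neighbors(n_sites, ring_edges)
spins = [1, -1, 1, 1, -1, -1]

accepted = 0
for _ in range(1000):
    if metropolis_step(spins, nbrs, 4.0, rng):
        accepted += 1
print(f"accepted {accepted} of 1000 proposed flips")
```

Computing ΔH from the neighbors of the flipped site alone avoids re-evaluating the full energy at every proposal.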

We model variable selection strength using a telegraph process that switches between $T_1$ and $T_2 > T_1$ with equal transition rates $1/\tau$. We start from independent equilibrium sequences at temperature $T_1$, see Fig. S1(b).
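A symmetric telegraph process can be sketched by drawing exponentially distributed dwell times with mean τ and toggling the temperature at each switch. The snippet assumes that time is measured in Monte Carlo proposal steps, which the text above does not specify; the function name and the value τ = 50 are likewise hypothetical.

```python
import random


def telegraph_temperatures(n_steps, t_low, t_high, tau, rng):
    """Temperature at each step of a symmetric telegraph process.

    Starts at t_low; dwell times in either state are exponential with mean tau,
    i.e. both transition rates equal 1 / tau.
    """
    temps = []
    state = 0                                   # 0 -> t_low, 1 -> t_high
    next_switch = rng.expovariate(1.0 / tau)    # time of the first switch
    for step in range(n_steps):
        while step >= next_switch:              # handle several switches per step
            state = 1 - state
            next_switch += rng.expovariate(1.0 / tau)
        temps.append(t_high if state else t_low)
    return temps


rng = random.Random(2)
schedule = telegraph_temperatures(5000, 1.0, 15.0, 50.0, rng)
fraction_high = schedule.count(15.0) / len(schedule)
print(f"fraction of steps spent at the high temperature: {fraction_high:.2f}")
```

Because the two rates are equal, the process spends on average half of its time at each temperature once the run is much longer than τ.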

To go beyond this minimal model, we generate realistic synthetic data using a Potts model inferred from natural sequences via bmDCA [30,31], an approach whose generative power was experimentally validated [31].

Finally, to model the onset of a new selection pressure, we start from a random ancestral sequence and apply selection for structure using the minimal model described above. We evolve different sequences from the same ancestor via independent Markov chains, producing a simple star phylogeny, see Fig. S1(c).
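The star phylogeny can be sketched as independent Metropolis chains that all start from copies of one random ancestor. The chain of contacts, the temperature, the number of proposals per branch, and the function names are illustrative assumptions; the snippet also assumes unit ferromagnetic couplings and the standard Metropolis acceptance rule.

```python
import math
import random


def build_neighbors(n_sites, edges):
    """Adjacency lists for an undirected graph given as (i, j) pairs."""
    nbrs = [[] for _ in range(n_sites)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def evolve(sequence, nbrs, temperature, n_proposals, rng):
    """Return a copy of sequence after n_proposals Metropolis flip proposals."""
    spins = list(sequence)                      # never modify the ancestor
    for _ in range(n_proposals):
        site = rng.randrange(len(spins))
        d_h = 2 * spins[site] * sum(spins[j] for j in nbrs[site])
        if d_h <= 0 or rng.random() < math.exp(-d_h / temperature):
            spins[site] = -spins[site]
    return spins


def star_phylogeny(ancestor, nbrs, temperature, n_proposals, n_leaves, rng):
    """Evolve n_leaves independent chains from the same ancestor."""
    return [evolve(ancestor, nbrs, temperature, n_proposals, rng)
            for _ in range(n_leaves)]


rng = random.Random(3)
n_sites = 8
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]  # toy contacts
nbrs = build_neighbors(n_sites, edges)
ancestor = [rng.choice((-1, 1)) for _ in range(n_sites)]  # random ancestral sequence
leaves = star_phylogeny(ancestor, nbrs, 4.0, 200, 5, rng)
for leaf in leaves:
    print(leaf)
```

With few proposals per branch the leaves stay close to the ancestor and are strongly correlated by shared ancestry; with many proposals they approach independent equilibrium samples.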

Fluctuating selection enhances contact inference. In our model, selection strength is controlled by the sampling temperature $T$. At very low $T$, only spin flips that decrease the energy (Eq. 1) are accepted, leading to low sequence variability and poor inference. Conversely, at very high $T$, almost all spin flips are accepted, producing noisy sequences that also hinder inference. As a result, inference is optimal at intermediate temperatures.
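To make the two limits concrete, consider a hypothetical flip that raises the energy by $\Delta H = 4$, and assume the standard Metropolis rule in which such a flip is accepted with probability $e^{-\Delta H / T}$. The three temperatures below are chosen for illustration only.

```latex
\begin{align*}
T = 1:  &\quad e^{-\Delta H / T} = e^{-4}    \approx 0.018 \\
T = 4:  &\quad e^{-\Delta H / T} = e^{-1}    \approx 0.37  \\
T = 15: &\quad e^{-\Delta H / T} = e^{-4/15} \approx 0.77
\end{align*}
```

Under these assumptions, energy-raising flips are almost always rejected at $T = 1$ and usually accepted at $T = 15$, which matches the low-variability and high-noise regimes described above.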

In the present minimal model, which features a ferromagnetic-paramagnetic phase transition (Fig. S3 and [28,32]), inference is most successful around and moderately above this transition [28,33]. With our parameter values, the transition occurs at $T_C = 4$ (Fig. S4). Inference performance also depends on the number of available sequences (Fig. S5 and [32,33]). In natural data, this number is limited by experimental sampling and by biological evolution, leading to finite-size effects that hinder inference [3,4,34]. Thus motivated, we work with finite MSAs throughout.

How does the time variability in selection pressure impact inference from sequence data? To answer this within our minimal model, we start from independent sequences equilibrated at $T_1 = 1$ and evolve them under a time-varying sampling temperature that switches between $T_1 = 1$ and $T_2 = 15$.

Fig. 1 shows the True Positive (TP) fraction, defined as the fraction of true contacts among the top N inferred Isi

Reference

This content is AI-processed based on open access ArXiv data.
