Modeling Protein Evolution via Generative Inference: From Monte Carlo Chains to Population Genetics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Generative models derived from large protein sequence alignments define complex fitness landscapes, but their utility for accurately modeling non-equilibrium evolutionary dynamics remains unclear. In this work, we perform a rigorous comparative analysis of three simulation schemes, designed to mimic evolution in silico by local sampling of the probability distribution defined by a generative model. We compare standard independent Markov Chain Monte Carlo, Monte Carlo on a phylogenetic tree, and population genetics dynamics, benchmarking their outputs against deep sequencing data from four distinct in vitro evolution experiments. We find that standard Monte Carlo fails to reproduce the correct phylogenetic structure and generates unrealistic, gradual mutational sweeps. Performing Monte Carlo on a tree inferred from data improves phylogenetic fidelity and historical accuracy. The population genetics scheme successfully captures phylogenetic correlations, mutational abundances, and selective sweeps as emergent properties, without the need to infer additional information from data. However, this last scheme comes at the price of not sampling the proper generative model distribution at long times. Our findings highlight the crucial role of phylogenetic correlations and finite-population effects in shaping evolutionary trajectories on fitness landscapes. These models therefore provide powerful tools for predicting complex adaptive paths and for reliably extrapolating evolutionary dynamics beyond current experimental limitations.


💡 Research Summary

The paper investigates how generative models derived from large protein multiple-sequence alignments can be used to simulate non-equilibrium evolutionary dynamics. Three simulation frameworks are compared: (1) standard independent Markov Chain Monte Carlo (MCMC), which samples each sequence in isolation; (2) tree-guided MCMC, which performs local sampling along a phylogeny inferred from the data; and (3) population-genetics dynamics, which explicitly model a finite population with replication, mutation, and selection. To benchmark the methods, the authors use deep-sequencing data from four independent in vitro evolution experiments that differ in selection pressure and mutation rate.

Independent MCMC fails to reproduce the observed phylogenetic structure and generates unrealistically smooth, gradual mutational sweeps, reflecting its inability to capture clonal interference and finite-population stochasticity. Tree-guided MCMC improves historical fidelity by respecting the order of mutational events, yet it still assumes an effectively infinite population and therefore cannot generate the characteristic selective sweeps seen in the data. The population-genetics scheme, by contrast, yields the correct phylogenetic correlations, realistic mutational abundance spectra, and emergent selective sweeps without any additional inference from the experimental data. This approach, however, does not converge to the stationary distribution defined by the underlying generative model at long times, which exposes a trade-off between faithfully reproducing evolutionary dynamics and sampling the formal model distribution.

The authors conclude that phylogenetic correlations and finite-population effects are essential determinants of evolutionary trajectories on complex fitness landscapes. They advocate for hybrid frameworks that combine the statistical rigor of generative models with the dynamical realism of population-genetics simulations, arguing that such integrated tools will be powerful for predicting adaptive pathways and extrapolating beyond current experimental limits.
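To make the contrast between the three schemes concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The summary does not give algorithmic details, so everything here is illustrative: the toy Potts-like model with random fields `h` and couplings `J`, the single-site Metropolis rule, the hard-coded toy tree, the Wright-Fisher-style resampling with fitness taken as `exp(-E)`, and all parameter values and function names are hypothetical.

```python
import math
import random

Q = 4        # alphabet size (toy value; a protein model would use 20 or 21)
LENGTH = 8   # sequence length (toy value)

def make_toy_model(rng):
    """Random fields h[i][a] and couplings J[(i, j)][a][b]; placeholders, not fitted to data."""
    h = [[rng.gauss(0, 1) for _ in range(Q)] for _ in range(LENGTH)]
    J = {(i, j): [[rng.gauss(0, 0.3) for _ in range(Q)] for _ in range(Q)]
         for i in range(LENGTH) for j in range(i + 1, LENGTH)}
    return h, J

def energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); the model is P(s) proportional to exp(-E(s))."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    e -= sum(Jij[seq[i]][seq[j]] for (i, j), Jij in J.items())
    return e

def metropolis_chain(start, n_steps, h, J, rng):
    """Scheme 1: one independent chain of single-site Metropolis moves."""
    seq = list(start)
    e = energy(seq, h, J)
    for _ in range(n_steps):
        new = seq[:]
        new[rng.randrange(LENGTH)] = rng.randrange(Q)  # symmetric proposal
        e_new = energy(new, h, J)
        # Accept with probability min(1, exp(-(e_new - e))).
        if e_new <= e or rng.random() < math.exp(e - e_new):
            seq, e = new, e_new
    return tuple(seq)

def tree_chain(parent_seq, node, h, J, rng):
    """Scheme 2: run the chain along a tree; node = (branch_steps, [child nodes]).

    Each child continues from its parent's sequence, so leaves that share
    ancestry share mutations. Returns the list of leaf sequences.
    """
    branch_steps, children = node
    seq = metropolis_chain(parent_seq, branch_steps, h, J, rng)
    if not children:
        return [seq]
    leaves = []
    for child in children:
        leaves.extend(tree_chain(seq, child, h, J, rng))
    return leaves

def wright_fisher(start, pop_size, n_generations, mu, h, J, rng):
    """Scheme 3 (toy): per-site mutation, then resampling with weight exp(-E)."""
    pop = [tuple(start)] * pop_size
    for _ in range(n_generations):
        mutated = []
        for seq in pop:
            s = list(seq)
            for i in range(LENGTH):
                if rng.random() < mu:
                    s[i] = rng.randrange(Q)
            mutated.append(tuple(s))
        e_vals = [energy(s, h, J) for s in mutated]
        e_min = min(e_vals)  # shift energies so the largest weight is 1
        weights = [math.exp(e_min - e) for e in e_vals]
        pop = rng.choices(mutated, weights=weights, k=pop_size)
    return pop

if __name__ == "__main__":
    rng = random.Random(0)
    h, J = make_toy_model(rng)
    wild_type = (0,) * LENGTH
    independent = [metropolis_chain(wild_type, 20, h, J, rng) for _ in range(50)]
    toy_tree = (0, [(10, [(10, []), (10, [])]), (20, [])])  # hypothetical tree
    leaves = tree_chain(wild_type, toy_tree, h, J, rng)
    population = wright_fisher(wild_type, 50, 20, 0.02, h, J, rng)
    print(len(set(independent)), len(leaves), len(set(population)))
```

In this sketch the independent chains share nothing except their starting sequence, the tree-guided leaves inherit correlations from shared branches, and the resampling step in `wright_fisher` lets a fit lineage take over the finite population. That resampling is also why such dynamics need not settle on the model distribution `exp(-E)` at long times, which is the trade-off the summary describes.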

