Predictability of evolutionary trajectories in fitness landscapes


Experimental studies on enzyme evolution show that only a small fraction of all possible mutation trajectories are accessible to evolution. However, these experiments deal with individual enzymes and explore a tiny part of the fitness landscape. We report an exhaustive analysis of fitness landscapes constructed with an off-lattice model of protein folding where fitness is equated with robustness to misfolding. This model mimics the essential features of the interactions between amino acids, is consistent with the key paradigms of protein folding and reproduces the universal distribution of evolutionary rates among orthologous proteins. We introduce mean path divergence as a quantitative measure of the degree to which the starting and ending points determine the path of evolution in fitness landscapes. Global measures of landscape roughness are good predictors of path divergence in all studied landscapes: the mean path divergence is greater in smooth landscapes than in rough ones. The model-derived and experimental landscapes are significantly smoother than random landscapes and resemble additive landscapes perturbed with moderate amounts of noise; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. We suggest that smoothness and the substantial deficit of peaks in the fitness landscapes of protein evolution are fundamental consequences of the physics of protein folding.


💡 Research Summary

The paper tackles the long‑standing question of how predictable evolutionary trajectories are by examining the structure of protein fitness landscapes. Because experimental fitness landscapes have so far been limited to a handful of enzymes and cover only a minuscule fraction of sequence space, the authors turn to a computational model that can be exhaustively explored. They use an off‑lattice protein‑folding model in which fitness is defined as robustness to misfolding: the fewer misfolded copies a sequence must produce before reaching the required amount of correctly folded protein, the higher its fitness. This model captures essential physical interactions between amino acids, reproduces the universal distribution of evolutionary rates among orthologous proteins, and has previously been shown to reflect the dependence of evolutionary rate on protein abundance and effective population size.

A central methodological contribution is the introduction of “mean path divergence” (d̄) as a quantitative measure of evolutionary predictability. For any pair of monotonic (fitness‑increasing) paths that share the same start and end points, the divergence d(p1, p2) is defined as the average of the shortest Hamming distances between each point on one path and the other path. The mean path divergence of a bundle of paths is the probability‑weighted average of d over all pairs, where the probability of a path is proportional to the product of the fixation probabilities of its constituent mutations. A small d̄ indicates that the accessible monotonic paths are similar to each other, implying a more deterministic evolutionary process; a large d̄ signals that even a limited set of monotonic routes can be highly divergent, reducing predictability.
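The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Two points are assumptions made here because the summary does not settle them: the pairwise divergence is symmetrized by averaging both directions, and the mean is taken over all ordered pairs of paths, which is the expected divergence of two paths drawn independently. The function names and the string encoding of genotypes are hypothetical.

```python
from math import prod


def hamming(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def one_sided(p, q):
    """Average, over points of path p, of the shortest Hamming distance to path q."""
    return sum(min(hamming(x, y) for y in q) for x in p) / len(p)


def path_divergence(p, q):
    """Divergence d(p, q); symmetrized here (an assumption, see lead-in)."""
    return 0.5 * (one_sided(p, q) + one_sided(q, p))


def path_weight(fixation_probs):
    """Unnormalized path probability: product of per-step fixation probabilities."""
    return prod(fixation_probs)


def mean_path_divergence(paths, weights):
    """Probability-weighted average of d over all ordered pairs of paths."""
    total = sum(weights)
    probs = [w / total for w in weights]
    n = len(paths)
    return sum(
        probs[i] * probs[j] * path_divergence(paths[i], paths[j])
        for i in range(n)
        for j in range(n)
    )


# Two direct paths from genotype "00" to genotype "11".
p1 = ["00", "01", "11"]
p2 = ["00", "10", "11"]
print(path_divergence(p1, p2))
print(mean_path_divergence([p1, p2], [1.0, 1.0]))
```

In this toy case the paths differ only at the intermediate genotype, which lies at Hamming distance 1 from the other path, so d(p1, p2) = 1/3. With equal path weights the mean over ordered pairs is 1/6, because the two identical pairs contribute zero.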

To relate path divergence to landscape topology, the authors compute several global roughness metrics: (1) deviation from additivity (the residual sum of squares after fitting an additive model), (2) local roughness (root‑mean‑square fitness differences between a point and its neighbors), (3) peak fraction (the proportion of points with no fitter neighbor), and (4) mean distance to the “tree component” (the set of nodes with at most one fitter neighbor, which includes peaks and plateaus). They also calculate the fraction of monotonic paths to the main peak (Fm) as a traditional measure of path scarcity.
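Three of these metrics can be illustrated on a small landscape. The sketch below assumes a complete binary landscape, stored as a dictionary that maps each tuple of 0/1 site states to a fitness value. That representation and the function names are hypothetical, and the additive fit uses the fact that on a complete binary design the least-squares effect of a site is the difference between the mean fitness with the site set to 1 and with it set to 0. It is not the authors' code, and it omits the tree-component distance.

```python
from itertools import product
from math import sqrt


def neighbors(g):
    """All genotypes one point mutation away from binary genotype g."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]


def peak_fraction(f):
    """Proportion of genotypes with no fitter neighbor."""
    peaks = [g for g in f if all(f[n] <= f[g] for n in neighbors(g))]
    return len(peaks) / len(f)


def local_roughness(f):
    """Root-mean-square fitness difference between a genotype and its neighbors."""
    sq = [(f[g] - f[n]) ** 2 for g in f for n in neighbors(g)]
    return sqrt(sum(sq) / len(sq))


def additivity_rss(f):
    """Residual sum of squares after a least-squares additive fit.

    Assumes f covers every genotype of the binary hypercube.
    """
    n_sites = len(next(iter(f)))
    mean = sum(f.values()) / len(f)
    effects = []
    for i in range(n_sites):
        on = [v for g, v in f.items() if g[i] == 1]
        off = [v for g, v in f.items() if g[i] == 0]
        effects.append(sum(on) / len(on) - sum(off) / len(off))

    def predict(g):
        return mean + sum((g[i] - 0.5) * effects[i] for i in range(n_sites))

    return sum((v - predict(g)) ** 2 for g, v in f.items())


# A purely additive landscape on three sites with site effects 1, 2 and 3.
additive = {g: 1 * g[0] + 2 * g[1] + 3 * g[2] for g in product((0, 1), repeat=3)}
print(peak_fraction(additive), local_roughness(additive), additivity_rss(additive))
```

For the additive example there is a single peak among eight genotypes, so the peak fraction is 1/8, and the additive fit is exact, so the residual sum of squares is zero up to rounding.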

Three classes of landscapes are compared: (i) folding‑model landscapes derived from the off‑lattice simulations, (ii) additive landscapes perturbed by Gaussian noise, and (iii) experimentally derived landscapes from combinatorial mutagenesis of drug‑resistant enzymes and DNA‑protein binding assays. Both the folding‑model and the experimental landscapes are significantly smoother than random landscapes. Their roughness values are comparable to those of additive landscapes with moderate noise, but the model landscapes show a striking deficit of sub‑optimal peaks: for a given level of overall roughness, they contain fewer local maxima than the noisy additive controls. The authors suggest that this deficit is a consequence of the physics of protein folding.
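A noisy additive control of the kind described above can be generated as follows. This is a sketch under stated assumptions: the site effects are drawn uniformly from an arbitrary positive range, the noise is independent Gaussian noise added to each genotype, and the function names and parameter values are hypothetical rather than taken from the paper.

```python
import random
from itertools import product


def noisy_additive_landscape(n_sites, noise_sd, rng):
    """Additive binary landscape plus independent Gaussian noise per genotype.

    Site effects are drawn from uniform(0.5, 1.5); that range is an
    illustrative assumption, not a value from the paper.
    """
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_sites)]
    return {
        g: sum(e * x for e, x in zip(effects, g)) + rng.gauss(0.0, noise_sd)
        for g in product((0, 1), repeat=n_sites)
    }


def count_peaks(f):
    """Number of genotypes with no fitter single-mutation neighbor."""
    def neighbors(g):
        return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

    return sum(all(f[n] <= f[g] for n in neighbors(g)) for g in f)


# Peak counts at increasing noise levels, with a fixed seed for repeatability.
for sd in (0.0, 0.5, 2.0):
    landscape = noisy_additive_landscape(6, sd, random.Random(42))
    print(sd, count_peaks(landscape))
```

With zero noise every site effect is positive, so the landscape has exactly one peak, at the all-ones genotype. Comparing the peak count of a landscape of interest with that of noisy additive controls matched for overall roughness is the kind of comparison the paragraph above describes.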

Importantly, the authors find that global roughness measures are good predictors of mean path divergence in all studied landscapes, and that the relationship is inverse: mean path divergence is greater in smooth landscapes than in rough ones. In a smooth landscape the starting and ending genotypes constrain the evolutionary route only weakly, whereas in a rough landscape, where the fraction of monotonic paths (Fm) is limited, the route is more tightly determined. Thus, path divergence provides a finer‑grained assessment of predictability than the simple count or fraction of accessible paths.
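For comparison with path divergence, the fraction of monotonic paths Fm can be counted by brute force on a small landscape. The sketch below makes an assumption the summary does not state: it considers only direct paths, in which each differing site is mutated exactly once, so the total number of candidate paths is the factorial of the number of differing sites. The landscape representation and function name are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def monotonic_path_fraction(f, start, peak):
    """Fraction of direct mutational paths from start to peak that are
    strictly fitness-increasing at every step."""
    sites = [i for i in range(len(start)) if start[i] != peak[i]]
    n_monotonic = 0
    for order in permutations(sites):
        g = start
        increasing = True
        for i in order:
            nxt = g[:i] + (peak[i],) + g[i + 1:]
            if f[nxt] <= f[g]:
                increasing = False
                break
            g = nxt
        n_monotonic += increasing
    return n_monotonic / factorial(len(sites))


# Additive landscape: every order of mutations is fitness-increasing.
additive = {g: g[0] + 2 * g[1] + 3 * g[2] for g in product((0, 1), repeat=3)}
print(monotonic_path_fraction(additive, (0, 0, 0), (1, 1, 1)))

# Sign epistasis on two sites: the second site is deleterious on its own.
epistatic = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 1.0, (1, 1): 2.0}
print(monotonic_path_fraction(epistatic, (0, 0), (1, 1)))
```

In the additive example all six orders are monotonic, so the fraction is 1. In the epistatic example only the path that mutates the first site first is monotonic, so the fraction is 1/2. Fm counts how many routes are open, while the mean path divergence measures how far apart those routes lie.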

The study concludes that protein evolution operates on intrinsically smooth fitness landscapes shaped by folding physics. Because sub‑optimal peaks are scarce, adaptive walks are less likely to be trapped short of the main peak than purely random epistatic models would predict, even though the route taken across a smooth landscape is less predictable than across a rough one. The findings have implications for protein engineering, where designing sequences that lie on smooth regions of the landscape could enhance the reliability of directed evolution, and for theoretical biology, where incorporating physical constraints may improve models of adaptive dynamics.

