Multi-scale sequence correlations increase proteome structural disorder and promiscuity
Numerous experiments demonstrate a high level of promiscuity and structural disorder in organismal proteomes. Here we ask what makes a protein promiscuous (i.e., prone to non-specific interactions) and structurally disordered. We predict that multi-scale correlations of amino acid positions within protein sequences statistically enhance the propensity for promiscuous intra- and inter-protein binding. We show that sequence correlations between amino acids of the same type are statistically enhanced in structurally disordered proteins and in hubs of organismal proteomes. We also show that structurally disordered proteins possess a significantly higher degree of sequence order than structurally ordered proteins. We develop an analytical theory for this effect and predict the robustness of our conclusions with respect to the amino acid composition and the form of the microscopic potential between the interacting sequences. Our findings have implications for understanding molecular mechanisms of protein aggregation diseases induced by the extension of sequence repeats.
💡 Research Summary
The authors address a fundamental question in proteomics: why are many eukaryotic proteins intrinsically disordered and why do some proteins act as “hubs” that interact with dozens or hundreds of partners? They propose that the answer lies in the statistical correlations of identical amino‑acid types along the primary sequence. In other words, when residues of the same type appear repeatedly at characteristic distances—a phenomenon they term multi‑scale sequence correlations—both intra‑protein (structural) disorder and inter‑protein promiscuity are enhanced.
To test this hypothesis, the study first defines a normalized correlation function ηαβ(x) = gαβ(x)/gαβ^r(x), where gαβ(x) is the observed joint probability of finding residues α and β separated by distance x along the sequence, and gαβ^r(x) is the same probability for a randomized sequence set. Values larger than one indicate that the pair occurs at that separation more often than in the randomized sequences. Using a non‑redundant dataset of intrinsically disordered proteins (IDPs) and comparing them with well‑structured all‑α and all‑β proteins, the authors find that the diagonal elements ηαα(x) (i.e., correlations between identical residues) are markedly higher in IDPs across a wide range of distances, often extending to hundreds of residues. Twelve amino acids (Gly, Tyr, Arg, Trp, Ser, Glu, Pro, Asp, Gln, Ala, Lys, and Thr) show a cumulative correlation ratio χαα>1.1 in disordered proteins relative to ordered ones, indicating a robust multi‑scale pattern.
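The ratio ηαβ(x) can be estimated numerically along the following lines. This is a minimal sketch, not the authors' pipeline: the function names `pair_probability` and `eta` are hypothetical, and the randomized set is assumed here to be produced by shuffling each sequence independently (which preserves its composition); the summary does not say how the randomization was actually done.

```python
import random


def pair_probability(seqs, a, b, x):
    """Fraction of position pairs (i, i + x) that hold residue a, then residue b."""
    hits = 0
    total = 0
    for s in seqs:
        for i in range(len(s) - x):
            total += 1
            if s[i] == a and s[i + x] == b:
                hits += 1
    return hits / total if total else 0.0


def eta(seqs, a, b, x, n_shuffles=200, seed=0):
    """Estimate eta_ab(x) = g_ab(x) / g^r_ab(x).

    ASSUMPTION: g^r is the average of g over n_shuffles copies of the data
    in which every sequence has been shuffled on its own.
    """
    rng = random.Random(seed)
    observed = pair_probability(seqs, a, b, x)
    baseline_values = []
    for _ in range(n_shuffles):
        shuffled = []
        for s in seqs:
            chars = list(s)
            rng.shuffle(chars)
            shuffled.append("".join(chars))
        baseline_values.append(pair_probability(shuffled, a, b, x))
    baseline = sum(baseline_values) / len(baseline_values)
    # Undefined when the pair never occurs in the randomized set.
    return observed / baseline if baseline > 0 else float("nan")


if __name__ == "__main__":
    # A block of identical residues gives a same-type ratio above one;
    # a strictly alternating sequence gives zero at separation 1.
    print(eta(["Q" * 10 + "A" * 10], "Q", "Q", 1))
    print(eta(["QA" * 10], "Q", "Q", 1))
```

Shuffling within each sequence keeps the amino-acid composition fixed, so a ratio above one reflects the ordering of residues and not their abundance.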
Next, the authors examine high‑throughput protein‑protein interaction (PPI) data from yeast two‑hybrid (Y2H) and affinity‑purification mass‑spectrometry (AP/MS) experiments for human, yeast, and Escherichia coli proteomes. They separate “hub” proteins (many interaction partners) from “end” proteins (single partner) and compute the same correlation metrics. In human hubs, six residues (His, Phe, Ile, Pro, Gly, Tyr) display χαα>1.1, while yeast and bacterial hubs show weaker but still significant trends for a subset of residues. Moreover, when the entire proteomes are compared, nine residues exhibit stronger diagonal correlations in humans than in bacteria, supporting the view that eukaryotic proteomes are intrinsically more promiscuous.
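The hub/end split described above amounts to counting distinct partners per protein in an interaction list. A minimal sketch follows, with two assumptions that are not taken from the paper: the function name `classify_by_degree` is hypothetical, and so is the cutoff `hub_min_partners`, since the summary only says that hubs have "many" partners.

```python
from collections import defaultdict


def classify_by_degree(interactions, hub_min_partners=10):
    """Split proteins into hubs and ends from a list of (protein, protein) pairs.

    ASSUMPTION: hub_min_partners is an illustrative threshold only.
    An "end" protein has exactly one distinct partner, as in the text.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    hubs = {p for p, ps in partners.items() if len(ps) >= hub_min_partners}
    ends = {p for p, ps in partners.items() if len(ps) == 1}
    return hubs, ends


if __name__ == "__main__":
    # One protein bound to twelve others, plus one extra edge between two of them.
    edges = [("H", f"P{i}") for i in range(12)] + [("P0", "P1")]
    hubs, ends = classify_by_degree(edges)
    print(sorted(hubs), sorted(ends))
```

Using sets of partners means that an interaction reported more than once (for example by both Y2H and AP/MS) is counted a single time.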
To explain why enhanced diagonal correlations increase promiscuity, the paper presents two complementary theoretical models. The first is a toy one‑dimensional lattice where sequences consist of two residue types (H and P). Random sequences (no correlation) have an interaction energy variance σ² proportional to the sequence length L. When identical residues are forced to form adjacent pairs (C2) or triples (C3), the variance doubles or quadruples, respectively, while the mean remains zero. Because the free energy of binding to a random target is essentially −kT ln⟨e^(−E/kT)⟩, where the average runs over targets, a larger σ leads to a lower average free energy, i.e., higher promiscuity. The second model introduces a “design temperature” Td and a pairwise design potential Uαβ(x) that biases the placement of residues during a Monte‑Carlo annealing process. When the effective design potential U(x)=Upp+Uhh−2Uhp is negative (favoring like‑type contacts), the resulting sequences develop strong ηαα(x)>1 correlations. Analytical calculations show that the variance of the interaction energy between a designed (correlated) sequence and a random partner is proportional to the magnitude of U(x). Consequently, any correlated sequence, regardless of the sign of the inter‑sequence binding potential V(ρ), exhibits a broader energy distribution and thus a lower average binding free energy than an uncorrelated sequence. This result is independent of the specific functional form of V(ρ) or the overall amino‑acid composition.
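The step from a broader energy distribution to a lower free energy can be written out explicitly under one extra assumption that the summary does not state: that the interaction energy E with random targets is approximately Gaussian, with zero mean and variance σ². A sketch of the resulting identity:

```latex
\begin{align}
\left\langle e^{-E/kT} \right\rangle
  &= \int_{-\infty}^{\infty}
     \frac{1}{\sqrt{2\pi\sigma^{2}}}\,
     e^{-E^{2}/2\sigma^{2}}\, e^{-E/kT}\, dE
   = \exp\!\left( \frac{\sigma^{2}}{2\,(kT)^{2}} \right), \\
F &= -kT \ln \left\langle e^{-E/kT} \right\rangle
   = -\frac{\sigma^{2}}{2\,kT}.
\end{align}
```

Within this approximation F decreases linearly with σ², so any correlation pattern that raises the variance while leaving the mean at zero lowers the free energy of non-specific binding.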
The authors illustrate the concept with the human EWSR1 protein, a known hub with 94 interaction partners, which displays pronounced diagonal correlations. They argue that many disease‑associated repeat expansions (e.g., collagen, poly‑glutamine tracts) may increase promiscuity via the same mechanism, potentially leading to pathological aggregation.
In summary, the paper provides compelling statistical evidence that multi‑scale, same‑type residue correlations are a hallmark of intrinsically disordered proteins and hub proteins. Theoretical analysis demonstrates that such correlations increase the variance of interaction energies, thereby lowering the average free energy of non‑specific binding. This mechanism offers a unified explanation for the coexistence of structural disorder and high interaction promiscuity in eukaryotic proteomes, and it suggests that modulation of sequence correlations could be a strategic avenue for protein engineering, disease‑mechanism studies, and the interpretation of large‑scale interaction networks.