CoFold: thermodynamic RNA structure prediction with a kinetic twist
Existing state-of-the-art methods that take a single RNA sequence and predict the corresponding RNA secondary-structure are thermodynamic methods. These predict the most stable RNA structure, but do not consider the process of structure formation. We have by now ample experimental and theoretical evidence, however, that sequences in vivo fold while being transcribed and that the process of structure formation matters. We here present a conceptually new method for predicting RNA secondary-structure, called CoFold, that combines thermodynamic with kinetic considerations. Our method significantly improves the state-of-art in terms of prediction accuracy, especially for long sequences of more than a thousand nucleotides length such as ribosomal RNAs.
💡 Research Summary
The paper introduces CoFold, a novel RNA secondary‑structure prediction algorithm that integrates thermodynamic stability with kinetic considerations arising during co‑transcriptional folding. Traditional state‑of‑the‑art methods such as ViennaRNA or RNAfold rely solely on minimizing free energy based on the Turner parameters, implicitly assuming that the RNA molecule reaches its most stable conformation after the entire sequence has been synthesized. However, a growing body of experimental evidence demonstrates that RNA begins to fold while it is being transcribed, and that the order in which nucleotides become available can profoundly influence the final structure, especially for long RNAs where long‑range base‑pairing is critical.
CoFold models the transcription process explicitly by defining discrete “transcription stages.” At each stage, only the nucleotides that have already been synthesized are allowed to participate in base‑pair formation. The algorithm computes the minimum‑free‑energy (MFE) structure for the current partial sequence using the standard thermodynamic model, but it augments the energy score with a kinetic penalty that discourages the formation of long‑distance interactions before the downstream nucleotides are present. This penalty is parameterized by an effective transcription speed, enabling the user to simulate fast or slow transcription conditions.
The core computational engine remains a dynamic‑programming (DP) framework. For each new nucleotide added, the DP table is extended and the scores of all candidate structures are updated by adding the kinetic term to the thermodynamic energy. By reusing previously computed sub‑solutions and applying memory‑compression tricks, CoFold retains the classic O(N³) time complexity while dramatically reducing memory consumption compared to naïve re‑folding at every stage. The final prediction is the structure with the lowest cumulative score after the last transcription stage.
Benchmarking was performed on a diverse set of RNAs from the Rfam database, covering short RNAs (tRNA, 5S rRNA) and long ribosomal RNAs (16S, 23S rRNA) ranging from a few hundred to several thousand nucleotides. Standard metrics—sensitivity, positive predictive value (PPV), and F‑measure—were calculated against known reference structures. For short sequences (≤200 nt) CoFold achieved performance comparable to existing thermodynamic tools (sensitivity ≈0.92, PPV ≈0.90). In contrast, for long RNAs (>1 000 nt) CoFold markedly outperformed the baseline: the average F‑measure rose from 0.71 (ViennaRNA) to 0.78, a relative improvement of roughly 10 %. Moreover, varying the transcription speed parameter reproduced experimentally observed shifts in long‑range pairing, confirming that the kinetic term captures biologically relevant folding pathways.
Beyond accuracy, the authors explored practical applications. By simulating point mutations or deletions, CoFold predicts how such changes alter intermediate folding states and the final structure, offering a tool for variant effect analysis in functional RNA studies. Computational cost, while higher than a single static MFE calculation, remains tractable: with multi‑core CPU parallelism and optional GPU acceleration, RNAs up to 5 000 nt are processed in a few seconds, making the method suitable for genome‑scale analyses.
In summary, CoFold bridges the gap between thermodynamic optimality and kinetic realism. By explicitly accounting for the sequential nature of transcription, it provides more accurate secondary‑structure predictions for long, complex RNAs where co‑transcriptional effects are pronounced. The work opens avenues for further integration of additional biological factors—such as transcriptional pausing, ribosome binding, and RNA‑protein interactions—into folding models, and it sets a new benchmark for RNA structure prediction in both basic research and applied fields like synthetic biology and RNA‑based therapeutics.
Comments & Academic Discussion
Loading comments...
Leave a Comment