RNA structure prediction: progress and perspective
Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions, which are strongly coupled to RNA structures. To help understand these functions, a number of structure prediction models have been developed in recent years. In this review, we introduce the progress in computational models for RNA structure prediction and discuss the distinguishing features of many outstanding algorithms, with an emphasis on three‑dimensional (3D) structure prediction. We also briefly introduce a promising coarse‑grained model for predicting RNA 3D structure, stability, and salt effects. Finally, we discuss the major challenges in RNA 3D structure modeling.
💡 Research Summary
The review opens by emphasizing the central role of RNA in a wide array of cellular processes, including transcription, translation, gene regulation, and catalysis, and how these functions are intimately linked to RNA’s secondary and tertiary structures. Experimental techniques such as X‑ray crystallography, NMR spectroscopy, and cryo‑EM provide high‑resolution structural information, but they are limited by cost, time, and sample requirements, and they struggle to capture dynamic or transient conformations. Consequently, computational approaches have become indispensable for probing RNA architecture at scale.
The authors first distinguish between secondary‑structure (2D) prediction and full three‑dimensional (3D) modeling. Secondary‑structure methods, exemplified by Mfold, RNAfold, and CONTRAfold, rely on thermodynamic free‑energy minimization or on probabilistic models that generalize stochastic context‑free grammars. These algorithms have matured to the point where they can predict base‑pairing patterns reasonably well for sequences of up to several hundred nucleotides. Tertiary structure prediction, by contrast, must simultaneously account for a richer set of interactions, including non‑canonical base pairs, base stacking, loop‑loop contacts, coaxial stacking, metal‑ion coordination, and other long‑range tertiary contacts, so an energy function built only on secondary‑structure terms is insufficient.
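The dynamic‑programming idea behind secondary‑structure prediction can be illustrated with the Nussinov algorithm, which maximizes the number of nested base pairs instead of minimizing a nearest‑neighbor free energy. It is a deliberately simplified stand‑in, not the algorithm used by Mfold, RNAfold, or CONTRAfold; the function names, the allowed pair set (Watson–Crick plus G·U), and the minimum hairpin loop of three nucleotides are illustrative assumptions.

```python
def can_pair(a: str, b: str) -> bool:
    """Watson-Crick pairs (A-U, G-C) plus the G-U wobble pair."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (max number of nested pairs, dot-bracket structure) for seq."""
    seq = seq.upper()
    n = len(seq)
    # dp[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            # case 2: j pairs with k, leaving at least min_loop bases inside
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Conceptually, replacing the `+ 1` pair score with loop‑ and stack‑dependent energy terms is what turns this kind of recursion into the free‑energy minimization used by thermodynamic tools.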
The review then groups 3D prediction strategies into two broad families. The first consists of atomistic or near‑atomistic methods such as Rosetta‑FARFAR, 3dRNA, and RNAComposer; SimRNA, which samples a coarse‑grained nucleotide representation before rebuilding full‑atom models, sits closer to the coarse‑grained end. These tools combine physics‑ or knowledge‑based energy functions, or libraries of known structural fragments, with sampling schemes such as Monte Carlo, fragment assembly, and replica exchange to explore conformational space. While they can achieve near‑atomic accuracy for small RNAs, their computational cost scales poorly with sequence length, and exhaustive sampling remains a bottleneck for larger RNAs.
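The Metropolis Monte Carlo step that underlies many of these samplers can be sketched on a toy system. The energy below, a threefold torsional term summed over a few hypothetical backbone angles in arbitrary energy units, is an assumption made for illustration; real tools such as Rosetta‑FARFAR use fragment insertions and far richer energy functions.

```python
import math
import random
from typing import Callable


def toy_torsion_energy(angles: list[float]) -> float:
    """Hypothetical energy: each angle (radians) has minima at 60, 180 and 300 degrees."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy: Callable[[list[float]], float], start: list[float],
                n_steps: int = 5000, max_step: float = 0.3, kT: float = 0.5,
                seed: int = 0) -> tuple[list[float], float]:
    """Random single-angle moves; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    state = list(start)
    e = energy(state)
    best_state, best_e = list(state), e
    for _ in range(n_steps):
        trial = list(state)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_step, max_step)  # perturb one angle
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kT, rng):
            state, e = trial, e_trial
            if e < best_e:
                best_state, best_e = list(state), e
    return best_state, best_e


if __name__ == "__main__":
    best, e_min = monte_carlo(toy_torsion_energy, [0.0, 0.0, 0.0, 0.0])
    print(f"lowest energy visited: {e_min:.3f}")
```

Replica exchange extends this by running several such chains at different temperatures and periodically swapping their states, which helps the low‑temperature chain escape deep local minima.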
The second family comprises coarse‑grained (CG) approaches that reduce each nucleotide to a handful of pseudo‑atoms (typically 2–5). Representative CG frameworks include Vfold3D, CG‑RNA, and the coarse‑grained model the authors highlight. By simplifying the representation, CG models dramatically accelerate conformational search and make it practical to estimate thermodynamic stability (ΔG) and ionic effects alongside structure. The highlighted model explicitly incorporates Debye‑Hückel electrostatic screening and ion‑binding terms, allowing it to predict salt‑dependent melting temperatures that correlate strongly with experimental data.
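Debye‑Hückel screening can be made concrete with a short calculation of the Debye length and the screened Coulomb energy between two phosphate charges. The solvent parameters (water at 298.15 K, relative permittivity 78.5), the 0.6 nm separation in the loop, and the function names are assumptions for illustration, not parameters of the authors' model. Some CG models also use a reduced phosphate charge to mimic counterion condensation; that refinement is omitted here.

```python
import math

# CODATA values for physical constants (SI units)
E_CHARGE = 1.602176634e-19    # C
EPS0 = 8.8541878128e-12       # F/m
K_B = 1.380649e-23            # J/K
N_A = 6.02214076e23           # 1/mol


def bjerrum_length_nm(temp_k: float = 298.15, eps_r: float = 78.5) -> float:
    """Separation at which two unit charges in the solvent interact with energy kT."""
    return E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * K_B * temp_k) * 1e9


def debye_length_nm(ionic_strength_m: float, temp_k: float = 298.15,
                    eps_r: float = 78.5) -> float:
    """Debye length from kappa^2 = 8*pi*l_B*N_A*1000*I, with I in mol/L (I > 0)."""
    if ionic_strength_m <= 0.0:
        raise ValueError("ionic strength must be positive")
    l_b_m = bjerrum_length_nm(temp_k, eps_r) * 1e-9
    kappa_sq = 8.0 * math.pi * l_b_m * N_A * 1000.0 * ionic_strength_m  # 1/m^2
    return 1e9 / math.sqrt(kappa_sq)


def dh_energy_kt(q1: float, q2: float, r_nm: float, ionic_strength_m: float,
                 temp_k: float = 298.15, eps_r: float = 78.5) -> float:
    """Screened Coulomb (Debye-Hückel) energy between two point charges, in kT."""
    l_b = bjerrum_length_nm(temp_k, eps_r)
    lam = debye_length_nm(ionic_strength_m, temp_k, eps_r)
    return q1 * q2 * l_b * math.exp(-r_nm / lam) / r_nm


for salt in (0.01, 0.1, 1.0):
    print(f"I = {salt:5.2f} M: Debye length {debye_length_nm(salt):.2f} nm, "
          f"P-P energy at 0.6 nm {dh_energy_kt(-1, -1, 0.6, salt):.2f} kT")
```

With these constants the Debye length at 0.1 M is about 0.96 nm, and the phosphate–phosphate repulsion weakens as salt increases, which is the qualitative salt effect such terms are meant to capture.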
Performance across the field is evaluated with metrics such as root‑mean‑square deviation (RMSD) after optimal superposition, TM‑score, and the accuracy of predicted free‑energy landscapes. Recent deep‑learning work, from tools such as EternaBrain and RNA‑BERT to end‑to‑end predictors such as AlphaFold‑RNA, has leveraged large‑scale simulation and experimental datasets, with the end‑to‑end networks trained to output 3D coordinates or distance maps directly. These networks run orders of magnitude faster than physics‑based methods at inference time while delivering comparable RMSD values for many benchmark RNAs. Nevertheless, challenges persist: training data are biased toward well‑studied riboswitches, generalization to non‑canonical nucleotides is limited, and large ribonucleoprotein complexes remain difficult to handle.
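RMSD is meaningful only after the predicted and reference coordinates are optimally superimposed. A minimal NumPy sketch using the Kabsch algorithm follows; it assumes the two arrays hold matched atoms in the same order (for RNA, often one atom per nucleotide such as C4′ or P), which is an assumption about atom selection rather than a fixed convention.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

TM‑score, by contrast, averages a distance‑dependent similarity over residues, which makes it less sensitive than RMSD to a few badly placed regions.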
A critical theme of the review is the influence of the physicochemical environment. Most existing predictors assume neutral pH and a fixed monovalent ion concentration, yet intracellular conditions feature millimolar Mg²⁺ concentrations, variable pH, and crowding agents that profoundly affect folding pathways and final structures. The authors argue that accurate modeling of charge screening, specific Mg²⁺ binding, and temperature dependence is essential for realistic predictions, especially for therapeutic design where ionic conditions are deliberately modulated.
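Ionic strength, I = ½ Σ cᵢ zᵢ², is the quantity that mean‑field screening terms depend on, and its z² weighting shows why Mg²⁺ counts for more than its concentration suggests. The sketch below, with hypothetical helper names and illustrative salt compositions, compares NaCl and MgCl₂ solutions assuming full dissociation. Because 50 mM MgCl₂ and 150 mM NaCl give the same ionic strength (0.15 M) yet can fold RNA quite differently, a mean‑field term alone cannot capture specific Mg²⁺ binding, which is why the review treats it separately from charge screening.

```python
def ionic_strength(ions: dict[str, tuple[float, int]]) -> float:
    """I = 1/2 * sum(c_i * z_i**2); c_i in mol/L, z_i the ion valence."""
    return 0.5 * sum(conc * z**2 for conc, z in ions.values())


def nacl_mgcl2(nacl_m: float, mgcl2_m: float) -> dict[str, tuple[float, int]]:
    """Ion inventory for a hypothetical, fully dissociated NaCl/MgCl2 buffer."""
    return {
        "Na+": (nacl_m, +1),
        "Mg2+": (mgcl2_m, +2),
        "Cl-": (nacl_m + 2.0 * mgcl2_m, -1),
    }


for nacl, mg in [(0.15, 0.0), (0.0, 0.05), (0.15, 0.001)]:
    i_val = ionic_strength(nacl_mgcl2(nacl, mg))
    print(f"NaCl {nacl} M, MgCl2 {mg} M -> I = {i_val:.3f} M")
```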
The discussion of outstanding challenges identifies four priority areas. First, continued expansion and curation of high‑quality RNA 3D structures in the Protein Data Bank, together with sequence resources such as RNAcentral, are needed to train and benchmark new algorithms. Second, multiscale hybrid frameworks that integrate coarse‑grained sampling with atomistic refinement could reconcile speed and accuracy. Third, robust feedback loops between computational predictions and experimental validation (e.g., SHAPE‑seq, DMS‑MaPseq, cryo‑EM) will iteratively improve model reliability. Fourth, explicit thermodynamic models of ion competition, pH effects, and macromolecular crowding will broaden the applicability of predictions to physiological contexts.
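One established way to feed chemical‑probing data back into prediction is to convert SHAPE reactivities into per‑nucleotide pseudo‑free‑energy terms, ΔG_SHAPE = m·ln(reactivity + 1) + b, using the values m = 2.6 and b = −0.8 kcal/mol reported in early SHAPE‑directed folding work (other parameter sets are in use). The sketch is simplified: real implementations add the term inside the folding recursion for nucleotides in stacked pairs, whereas here a given dot‑bracket structure is scored by summing over paired positions. The helper names and the treatment of negative values as missing data are assumptions.

```python
import math


def shape_pseudo_energy(reactivity: float, m: float = 2.6, b: float = -0.8) -> float:
    """Pseudo-free energy (kcal/mol) for one paired nucleotide: m*ln(reactivity + 1) + b.

    Negative reactivities are treated here as missing data (an assumption;
    conventions differ between tools).
    """
    if reactivity < 0.0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


def shape_restraint_total(dot_bracket: str, reactivities: list[float]) -> float:
    """Simplified scoring: sum the pseudo-energy over every paired position."""
    if len(dot_bracket) != len(reactivities):
        raise ValueError("structure and reactivity profile must have the same length")
    return sum(shape_pseudo_energy(r)
               for s, r in zip(dot_bracket, reactivities) if s in "()")
```

Low reactivity (a protected, likely paired nucleotide) earns a bonus of up to −0.8 kcal/mol, while reactivities above about 0.36 turn the term into a penalty, so structures that pair highly reactive nucleotides are disfavored.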
In conclusion, the review paints an optimistic picture of RNA 3D structure prediction. The convergence of coarse‑grained physics, deep‑learning inference, and increasingly rich experimental datasets is poised to deliver integrated platforms that predict not only static structures but also stability profiles and responses to environmental conditions. Such capabilities would be transformative for fields ranging from rational drug design, where RNA aptamers and ribozymes are emerging therapeutic modalities, to synthetic biology, where engineered RNAs serve as scaffolds, switches, and regulatory elements. Continued progress will hinge on interdisciplinary collaboration, data sharing, and methodological innovation aimed at bridging the gap between computational models and the complex, ion‑rich milieu of living cells.