Combinatorial analysis of interacting RNA molecules
Recently, several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures that can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no “zig-zag” configuration. This paper studies joint structures with arc-length at least four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.
💡 Research Summary
The paper addresses the combinatorial enumeration of joint structures formed by two interacting RNA molecules, a problem that underlies many minimum free‑energy (MFE) prediction algorithms. A joint structure is modeled as a diagram with two horizontal backbones; arcs representing intra‑ and intermolecular base pairs must be non‑crossing, and “zig‑zag” configurations (crossed inter‑backbone arcs) are forbidden. The authors restrict attention to structures with arc‑length at least four and both interior and exterior stack lengths of at least two, thereby eliminating isolated base pairs and very short loops that are rarely observed in real RNA‑RNA interactions.
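The length and stacking restrictions can be made concrete with a small check. The following is a minimal sketch, not the paper's definition: it handles only one backbone with intramolecular arcs (intermolecular arcs and the zig-zag condition are left out), it assumes the arc-length of an arc (i, j) is j - i, and the function name `is_admissible_single_backbone` is hypothetical.

```python
def is_admissible_single_backbone(arcs, min_arc_len=4, min_stack=2):
    """Check a set of arcs (i, j), i < j, on ONE backbone.

    Assumptions (not taken from the paper): arc-length of (i, j) is j - i,
    and a stack is a maximal run (i, j), (i+1, j-1), (i+2, j-2), ...
    """
    arcs = set(arcs)

    # Every arc must be oriented left-to-right and long enough.
    if any(i >= j or j - i < min_arc_len for i, j in arcs):
        return False

    # Each vertex may take part in at most one base pair.
    endpoints = [v for arc in arcs for v in arc]
    if len(endpoints) != len(set(endpoints)):
        return False

    # Noncrossing: no two arcs with i < k < j < l.
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:
                return False

    # Every stack must contain at least min_stack arcs (no isolated arcs).
    for i, j in arcs:
        if (i - 1, j + 1) in arcs:
            continue  # not the outermost arc of its stack
        length = 0
        while (i + length, j - length) in arcs:
            length += 1
        if length < min_stack:
            return False

    return True


if __name__ == "__main__":
    print(is_admissible_single_backbone({(1, 8), (2, 7)}))  # stack of two arcs
    print(is_admissible_single_backbone({(1, 8)}))          # isolated arc
```

Note that under these assumptions the outer arc of any stack has length at least six, because the arc nested directly inside it must itself have length at least four.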
The central methodological innovation is the introduction of a new “shape” abstraction. Instead of recording each individual base‑pair arc, a shape records only the arrangement of stacks, each of which must contain at least two consecutive base pairs in the underlying structure. This coarse‑graining collapses many detailed structures into a single shape while preserving the combinatorial constraints that dominate the asymptotic growth. By defining a shape generating function \(F(z)\) and a transformation operator \(\Phi\) that inflates each shape back into the full structures it represents, the authors derive the full joint‑structure generating function \(S(z)=\Phi(F(z))\). The operator \(\Phi\) consists of algebraic operations (addition, multiplication, convolution) that encode the minimum‑length restrictions.
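One standard ingredient of such an inflation step can be written out explicitly. This is an illustration of the general symbolic-enumeration technique, not the paper's actual operator: if an arc of a shape is marked by \(z^{2}\) (one factor of \(z\) per endpoint), then replacing it by a stack of at least \(\sigma\) parallel arcs corresponds to substituting a geometric series. The symbol \(\sigma\) for the minimum stack length is introduced here for illustration only.

```latex
\[
z^{2} \;\longmapsto\; \sum_{k \ge \sigma} z^{2k} \;=\; \frac{z^{2\sigma}}{1 - z^{2}},
\qquad\text{so for } \sigma = 2:\quad
z^{2} \;\longmapsto\; \frac{z^{4}}{1 - z^{2}}.
\]
```

The full operator would additionally have to account for unpaired vertices and the minimum arc-length, which this single substitution does not capture.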
Singularity analysis of \(S(z)\) yields an exponential growth constant \(\rho\approx 2.1\) and a sub‑exponential factor of order \(n^{-3/2}\). This growth rate is dramatically smaller than the \(\rho\approx 3.5\) typical for unrestricted RNA‑RNA interaction models, indicating that the imposed biological constraints drastically reduce the combinatorial space. The authors validate the asymptotic formulas by exhaustive enumeration of all joint structures up to length 30, finding deviations of less than 5%.
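To get a feel for what the difference between the two growth constants means, the following sketch evaluates an asymptotic form \(c\,n^{-3/2}\rho^{n}\) and the ratio of the two exponential factors. The values 2.1 and 3.5 are the ones quoted in this summary. The constant `c` and both function names are hypothetical, and the unknown constants of the two models are ignored in the ratio.

```python
def asymptotic_estimate(n, rho, c=1.0):
    """Evaluate c * n^(-3/2) * rho^n; c is a placeholder constant."""
    return c * n ** -1.5 * rho ** n


def exponential_gap(n, rho_restricted=2.1, rho_unrestricted=3.5):
    """Ratio of the exponential factors rho_unrestricted^n / rho_restricted^n.

    Ignores the (unknown) leading constants of the two models.
    """
    return (rho_unrestricted / rho_restricted) ** n


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, f"{exponential_gap(n):.3e}")
```

Because the ratio is \((3.5/2.1)^{n} = (5/3)^{n}\), the gap between the two models itself grows exponentially with sequence length.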
From an algorithmic perspective, the shape‑based decomposition offers a powerful pruning mechanism for dynamic‑programming (DP) algorithms. Traditional DP schemes for RNA‑RNA interaction require \(O(n^4)\) time and space because they must consider all possible pairs of intervals on the two sequences. By grouping intervals that share the same shape, the state space collapses to roughly \(O(n^2)\), yielding substantial savings in both memory and runtime. Moreover, the shape enumeration can be pre‑computed and stored as a lookup table, allowing the DP to focus only on admissible configurations that respect the minimum stack and loop lengths.
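The quartic state space follows from a simple count, sketched below. The snippet only counts DP states indexed by one interval on each sequence. It does not implement any shape-based reduction, and the function names are hypothetical.

```python
def interval_count(n):
    """Number of intervals [i, j] with 1 <= i <= j <= n."""
    return n * (n + 1) // 2


def interval_pair_states(n, m):
    """Number of DP states indexed by one interval on each of two sequences."""
    return interval_count(n) * interval_count(m)


if __name__ == "__main__":
    for n in (50, 100, 200):
        states = interval_pair_states(n, n)
        # Compare with n^2, the order quoted for the reduced state space.
        print(n, states, states // (n * n))
```

For two sequences of equal length \(n\) the count is \((n(n+1)/2)^{2}\), which grows like \(n^{4}/4\). This is why a table over all interval pairs becomes expensive quickly.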
The paper concludes with several avenues for future work. Extending the shape framework to allow variable minimum arc and stack lengths would make the model more flexible for different biological contexts. Generalizing the approach to multi‑RNA complexes or RNA‑protein assemblies could uncover new combinatorial phenomena. Finally, integrating the exact combinatorial counts into machine‑learning‑based predictors could produce hybrid methods that combine rigorous enumeration with data‑driven scoring, potentially improving the accuracy of RNA‑RNA interaction forecasts.
Overall, the study provides a rigorous combinatorial foundation for RNA‑RNA interaction modeling, demonstrates that biologically realistic constraints lead to surprisingly low exponential growth, and shows how these insights can be leveraged to design more efficient and accurate prediction algorithms.