Asymptotics of Canonical and Saturated RNA Secondary Structures
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures were introduced by Bompfünewerer et al., who noted that the run time of the Vienna RNA Package is substantially reduced when computations are restricted to canonical structures. Here we provide an explanation for the speed-up. Saturated secondary structures have the property that no base pair can be added without violating the definition of secondary structure (i.e. without introducing a pseudoknot or base triple). We compute the asymptotic number of saturated structures and show that the asymptotic expected number of base pairs is $0.337361 n$ and that the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we show that the density of states is asymptotically Gaussian for each of the three classes: all, canonical, and saturated secondary structures. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that their expected number of base pairs is $0.340633 n$.
💡 Research Summary
The paper investigates two restricted families of RNA secondary structures, canonical (also called “regular”) structures and saturated structures, by means of analytic combinatorics. Starting from the classical Stein-Waterman result that the total number of secondary structures on an $n$-nucleotide sequence grows as $1.104366\,n^{-3/2}\,2.618034^{n}$, the authors ask how much this count is reduced when additional biologically motivated constraints are imposed.
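The Stein-Waterman growth estimate can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the standard counting model in which any two positions may pair provided at least one unpaired nucleotide lies between them (`min_loop=1`), and the function names are illustrative only. The last nucleotide is either unpaired or paired with some earlier position $j$, which splits the sequence into an outer part and an enclosed part.

```python
def count_secondary_structures(n_max, min_loop=1):
    """Return [S(0), ..., S(n_max)], the number of secondary structures
    on n nucleotides (assumed model: every hairpin loop contains at
    least `min_loop` unpaired nucleotides)."""
    counts = [1] * (n_max + 1)  # S(0) = 1: the empty structure
    for n in range(1, n_max + 1):
        total = counts[n - 1]  # case 1: nucleotide n is unpaired
        # case 2: nucleotide n pairs with j; j-1 nucleotides lie to the
        # left of j, and n-j-1 >= min_loop nucleotides are enclosed
        for j in range(1, n - min_loop):
            total += counts[j - 1] * counts[n - j - 1]
        counts[n] = total
    return counts


def stein_waterman_estimate(n):
    """Asymptotic estimate 1.104366 * n^(-3/2) * 2.618034^n."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    counts = count_secondary_structures(400)
    for n in (10, 50, 100, 400):
        ratio = counts[n] / stein_waterman_estimate(n)
        print(f"n={n:4d}  exact/estimate = {ratio:.4f}")
```

The ratio of the exact count to the estimate approaches 1 slowly, since the correction term is of order $1/n$, so the estimate is loose for short sequences and tightens as $n$ grows.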
Canonical structures.
Canonical structures were introduced by Bompfünewerer et al. as those in which every stem has length at least two and any two adjacent stems are separated by at least one unpaired nucleotide (a bulge or a loop). This restriction mirrors the way the Vienna RNA Package prunes its dynamic-programming tables, thereby speeding up folding calculations. The authors translate the recursive description of canonical structures into a functional equation for the ordinary generating function $C(z)=\sum_{n\ge 0} c_n z^n$, where $c_n$ is the number of canonical structures on $n$ nucleotides.