Central and Local Limit Theorems for RNA Structures

Reading time: 5 minutes

📝 Original Info

  • Title: Central and Local Limit Theorems for RNA Structures
  • ArXiv ID: 0707.4281
  • Date: 2007-08-01
  • Authors: Christian M. Reidys (lead author) et al. (co-authors as listed in the paper)

📝 Abstract

A k-noncrossing RNA pseudoknot structure is a graph over $\{1,...,n\}$ without 1-arcs, i.e. arcs of the form (i,i+1) and in which there exists no k-set of mutually intersecting arcs. In particular, RNA secondary structures are 2-noncrossing RNA structures. In this paper we prove a central and a local limit theorem for the distribution of the numbers of 3-noncrossing RNA structures over n nucleotides with exactly h bonds. We will build on the results of \cite{Reidys:07rna1} and \cite{Reidys:07rna2}, where the generating function of k-noncrossing RNA pseudoknot structures and the asymptotics for its coefficients have been derived. The results of this paper explain the findings on the numbers of arcs of RNA secondary structures obtained by molecular folding algorithms and predict the distributions for k-noncrossing RNA folding algorithms which are currently being developed.

💡 Deep Analysis

📄 Full Content

An RNA molecule consists of the primary sequence of the four nucleotides A, G, U and C together with the Watson-Crick (A-U, G-C) and (U-G) base pairing rules. The latter specify the pairs of nucleotides that can potentially form bonds. Single-stranded RNA molecules form helical structures whose bonds satisfy the above base pairing rules and which, in many cases, determine their function. For instance, ribozymes are capable of catalytic activity, cleaving other RNA molecules. Not all possible bonds are realized, though. Due to biophysical constraints and the chemistry of Watson-Crick base pairs, there exist rather severe constraints on the bonds of an RNA molecule. In light of this, three decades ago Waterman et al. pioneered the concept of RNA secondary structures [21,17], which are subject to the strictest combinatorial constraints.

Any structure can be represented by drawing the primary sequence horizontally, ignoring all chemical bonds of its backbone, see Fig. 1. Then one draws all bonds satisfying the Watson-Crick base pairing rules as arcs in the upper half-plane, effectively identifying the structure with the set of all arcs. In this representation, RNA secondary structures have no 1-arcs, i.e. arcs of the form $(i, i+1)$, and no two arcs $(i_1, j_1)$, $(i_2, j_2)$, where $i_1 < j_1$ and $i_2 < j_2$, with the property $i_1 < i_2 < j_1 < j_2$. In other words, there exist no two arcs that cross in the diagram representation of the structure.

It is well known that there exist additional types of nucleotide interactions [1]. These bonds are called pseudoknots [23] and occur in functional RNA (RNAseP [14]) and ribosomal RNA [12], and are conserved in the catalytic core of group I introns. Pseudoknots appear in plant viral RNAs, and in vitro RNA evolution experiments [20] have produced families of RNA structures with pseudoknot motifs when binding HIV-1 reverse transcriptase.
Important mechanisms like ribosomal frame shifting [3] also involve pseudoknot interactions. k-noncrossing RNA structures, introduced in [10], capture these pseudoknot bonds and generalize the concept of RNA secondary structures in a natural way. In the diagram representation, a k-noncrossing RNA structure has no 1-arcs and contains at most $k-1$ mutually crossing arcs.

The starting point of this paper was the experimental finding that 3-noncrossing RNA structures for random sequences of length 100 over the nucleotides A, G, U and C exhibited sharply concentrated numbers of arcs (centered at 39). It was furthermore intriguing that the numbers of arcs were significantly higher than those in RNA secondary structures. While it is evident that 3-noncrossing RNA structures have more arcs than secondary structures, the jump from 27 to 39 (for $n = 100$), with a maximum number of 50 arcs, was not anticipated. Since all these quantities are explicitly known via the generating functions for k-noncrossing RNA structures in [10], we could easily confirm that the numbers of 3-noncrossing RNA structures with exactly $h$ arcs, $S'_3(n,h)$, indeed satisfy a Gaussian distribution almost "perfectly", with a mean of 39, see Fig. 3. We also found that a central limit theorem holds for RNA secondary structures with $h$ arcs, see Figure 4. These observations motivated us to understand how and why these limit distributions arise, which is what the present paper is about. Our main results can be summarized as follows:

Theorem. Let $S'_3(n,h)$ denote the number of 3-noncrossing RNA structures with exactly $h$ arcs, and let $S_3(n)$ denote the total number of 3-noncrossing RNA structures over $n$ nucleotides. Then the random variable $X_n$ having distribution $P(X_n = h) = S'_3(n,h)/S_3(n)$ satisfies a central and local limit theorem with mean $0.39089\,n$ and variance $0.041565\,n$.
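The secondary-structure case ($k = 2$) mentioned above can be checked numerically with a short sketch. It assumes the uniform distribution over all noncrossing structures without 1-arcs on $n$ vertices and counts them by number of arcs with an elementary last-vertex decomposition. That decomposition is a swapped-in illustration, not the generating-function method of [10], and it does not extend to the 3-noncrossing case. The function names `arc_counts`, `arc_distribution` and `max_gap_to_gaussian` are hypothetical.

```python
from math import exp, pi, sqrt


def arc_counts(n_max):
    """Return S with S[m][h] = number of secondary structures on m vertices with h arcs."""
    S = [[1]]  # m = 0: only the empty structure
    for m in range(1, n_max + 1):
        row = [0] * (m // 2 + 1)
        # Case 1: vertex m is unpaired, so the structure lives on the first m-1 vertices.
        for h, count in enumerate(S[m - 1]):
            row[h] += count
        # Case 2: vertex m is paired with vertex i. Noncrossing forces the vertices
        # left of i and those strictly between i and m to carry independent
        # structures; i <= m-2 excludes the 1-arc (m-1, m).
        for i in range(1, m - 1):
            left, inner = S[i - 1], S[m - 1 - i]
            for h1, c1 in enumerate(left):
                if c1:
                    for h2, c2 in enumerate(inner):
                        row[h1 + h2 + 1] += c1 * c2
        S.append(row)
    return S


def arc_distribution(n):
    """Probabilities P(X_n = h) under the uniform distribution, with mean and variance."""
    row = arc_counts(n)[n]
    total = sum(row)
    probs = [c / total for c in row]
    mean = sum(h * p for h, p in enumerate(probs))
    var = sum((h - mean) ** 2 * p for h, p in enumerate(probs))
    return probs, mean, var


def max_gap_to_gaussian(probs, mean, var):
    """Largest pointwise gap between the distribution and the matching normal density."""
    return max(
        abs(p - exp(-((h - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var))
        for h, p in enumerate(probs)
    )


if __name__ == "__main__":
    probs, mean, var = arc_distribution(100)
    print("mean:", mean, "variance:", var)
    print("max gap to Gaussian:", max_gap_to_gaussian(probs, mean, var))
```

For $n = 100$ the computed mean should land close to the 27 arcs quoted above for secondary structures, and the gap to the normal density gives a rough feel for how well the local limit approximation holds at this length.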

Our particular strategy is rooted in our recent work on the asymptotic enumeration of k-noncrossing RNA structures [11] and a paper of Bender [2], who showed how such central limit theorems arise in the case of singularities that are poles. In order to put our results into context, let us provide some background on central and local limit theorems. Suppose we are given a set $A_n$ (of size $a_n$). For instance, let $A_n$ be the set of subsets of $\{1,\dots,n\}$. Suppose further we are given $A_{n,k}$ (of size $a_{n,k}$), $k \in \mathbb{N}$, representing a disjoint set partition of $A_n$. For instance, let $A_{n,k}$ be the set of subsets with exactly $k$ elements. Consider the random variable $\xi_n$ having the probability distribution $P(\xi_n = k) = a_{n,k}/a_n$; then the corresponding probability generating function is given by

$$\sum_{k\ge 0} P(\xi_n = k)\, w^k = \sum_{k\ge 0} \frac{a_{n,k}}{a_n}\, w^k = \frac{\sum_{k\ge 0} a_{n,k}\, w^k}{\sum_{k\ge 0} a_{n,k}}.$$

Let $\phi_n(w) = \sum_{k\ge 0} a_{n,k} w^k$, then $\phi_n(w)/\phi_n(1)$ is the probability generating function of $\xi_n$ and

$$f(z,w) = \sum_{n\ge 0} \phi_n(w)\, z^n = \sum_{n\ge 0}\sum_{k\ge 0} a_{n,k}\, w^k z^n$$

is called the bivariate generating function. For instance, in our example we have

$$P(\xi_n = k) = \binom{n}{k} \Big/ 2^n$$

and the resulting bivariate generating function is

$$f(z,w) = \sum_{n\ge 0}\sum_{k\le n} \binom{n}{k}\, w^k z^n = \frac{1}{1 - z(1+w)}.$$
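The subset example can be made concrete with a minimal sketch. It assumes only what the text states: $a_{n,k}$ counts the $k$-element subsets of $\{1,\dots,n\}$, so the coefficient of $z^n$ in $1/(1 - z(1+w))$ is $(1+w)^n$. The helper names `phi`, `moments` and `max_gap_to_gaussian` are hypothetical and chosen for illustration.

```python
from math import comb, exp, pi, sqrt


def phi(n):
    """Coefficients a_{n,k} of phi_n(w) = [z^n] 1/(1 - z(1+w)) = (1+w)^n."""
    coeffs = [1]
    for _ in range(n):
        # Multiply the current polynomial by (1 + w).
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def moments(coeffs):
    """Mean and variance of xi_n with P(xi_n = k) = a_{n,k} / a_n."""
    total = sum(coeffs)  # a_n = phi_n(1)
    mean = sum(k * a for k, a in enumerate(coeffs)) / total
    second = sum(k * k * a for k, a in enumerate(coeffs)) / total
    return mean, second - mean**2


def max_gap_to_gaussian(coeffs):
    """Largest pointwise gap between P(xi_n = k) and the matching normal density."""
    total = sum(coeffs)
    mean, var = moments(coeffs)
    return max(
        abs(a / total - exp(-((k - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var))
        for k, a in enumerate(coeffs)
    )


if __name__ == "__main__":
    n = 100
    coeffs = phi(n)
    assert coeffs == [comb(n, k) for k in range(n + 1)]
    print("mean, variance:", moments(coeffs))
    print("max gap to Gaussian:", max_gap_to_gaussian(coeffs))
```

Here the mean is $n/2$ and the variance $n/4$, both linear in $n$, which is the same shape of statement as the theorem for 3-noncrossing structures; the pointwise gap illustrates what a local limit theorem controls.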

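The key idea consists in considering $f(z,w)$ as being parameterized by $w$ and studying the change of its singularity in an $\epsilon$-disc centered at $w = 1$. Indeed, the moment generating function is given by

$$E(e^{s\xi_n}) = \frac{\phi_n(e^{s})}{\phi_n(1)} = \frac{[z^n]\, f(z, e^{s})}{[z^n]\, f(z, 1)}$$

and

$$E(e^{it\xi_n}) = \frac{\phi_n(e^{it})}{\phi_n(1)} = \frac{[z^n]\, f(z, e^{it})}{[z^n]\, f(z, 1)}$$

is the characteristic function of $\xi_n$. This shows that the coefficien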

Reference

This content is AI-processed based on open access ArXiv data.
