Canonical RNA pseudoknot structures

GANG MA AND CHRISTIAN M. REIDYS

Date: June, 2008.
Key words and phrases. RNA secondary structure, pseudoknot, enumeration, generating function, singularity analysis.

Abstract. In this paper we study $k$-noncrossing, canonical RNA pseudoknot structures with minimum arc-length $\ge 4$. Let $T^{[4]}_{k,\sigma}(n)$ denote the number of these structures. We derive exact enumeration results by computing the generating function $\mathbf{T}^{[4]}_{k,\sigma}(z) = \sum_n T^{[4]}_{k,\sigma}(n) z^n$ and derive the asymptotic formulas $T^{[4]}_{k,3}(n) \sim c_k\, n^{-(k-1)^2-\frac{k-1}{2}} (\gamma^{[4]}_{k,3})^{-n}$ for $k = 3, \dots, 9$. In particular, for $k = 3$ we have $T^{[4]}_{3,3}(n) \sim c_3\, n^{-5}\, 2.0348^n$. Our results prove that the set of biophysically relevant RNA pseudoknot structures is surprisingly small and suggest a new structure class as target for prediction algorithms.

1. Introduction

RNA pseudoknot structures have drawn a lot of attention over the last decade [1]. From micro-RNA binding to ribosomal frameshifts [20], we currently discover novel RNA functionalities at truly amazing rates. Our conceptual understanding of RNA pseudoknot structures has not kept up with this pace. Only recently have the generating functions of $k$-noncrossing RNA structures of arc-length $\ge 2$ [11], arc-length $\ge 4$ [9] and canonical $k$-noncrossing RNA structures of arc-length $\ge 2$ [13] been derived. While these combinatorial results open new perspectives for the design of new folding algorithms, it has to be noted that realistic pseudoknot structures are subject to a minimum arc-length $\ge 4$ and stack-length $\ge 3$. Therefore the above structure classes are not "best possible". The lack of a transparent target class of RNA pseudoknot structures represents a problem for ab initio prediction algorithms. There are four algorithms capable of the energy-based prediction of certain pseudoknots in polynomial time: Rivas et al. (dynamic programming, gap-matrices, $O(n^6)$ time and $O(n^4)$ space) [21], Uemura et al. ($O(n^5)$ time and $O(n^4)$ space, tree-adjoining grammars) [25], Akutsu [3] and Lyngsø [18]. All of them follow the dynamic programming paradigm and none produces an easily specifiable class of pseudoknots as output.

In this paper we characterize a class of pseudoknot RNA structures in which bonds have a minimum length of four and stacks contain at least three base pairs. Our results show that this structure class is ideally suited as a priori output for prediction algorithms. Tab. 1 indicates that this class remains suitable even for more complex pseudoknots (specified in terms of larger sets of mutually crossing bonds). In fact, one can search RNA 3-noncrossing pseudoknot structures with arc-length $\ge 4$ and stack-length $\sigma \ge 3$ for a sequence of length 100 w.r.t. a variety of objective functions (in particular loop-based minimum free energy models) on a 4-core PC in a few minutes [10].

In order to put our results into context, we turn the clock back by almost three decades. In 1978 M. Waterman et al. [27, 28, 29, 30] began deriving the concepts for enumeration and prediction of RNA secondary structures. The latter represent arguably the prototype of prediction targets of RNA structures. RNA secondary structures are coarse-grained structures which can be represented as outer-planar graphs, diagrams, Motzkin-paths or words over ".", "(" and ")". Their decisive feature is that they have no two crossing bonds, see Fig. 1. Let $T^{[\lambda]}_2(n)$ denote the number of secondary structures with arc-length $\ge \lambda$ over $[n] = \{1, \dots, n\}$.
The key to RNA secondary structures is the following recursion for $T^{[\lambda]}_2(n)$:

(1.1)  $T^{[\lambda]}_2(n) = T^{[\lambda]}_2(n-1) + \sum_{j=0}^{n-(\lambda+1)} T^{[\lambda]}_2(n-2-j)\, T^{[\lambda]}_2(j)$,

where $T^{[\lambda]}_2(n) = 1$ for $0 \le n \le \lambda$. The latter follows from considering the concatenation of Motzkin-paths with minimum peak length $\lambda - 1$. Eq. (1.1) implies for the generating function $\mathbf{T}^{[\lambda]}_2(z) = \sum_{n \ge 0} T^{[\lambda]}_2(n) z^n$ the functional equation

(1.2)  $z^2\, \mathbf{T}^{[\lambda]}_2(z)^2 - (1 - z + z^2 + \dots + z^{\lambda})\, \mathbf{T}^{[\lambda]}_2(z) + 1 = 0$,

from which eventually

$\mathbf{T}^{[\lambda]}_2(z) = \frac{-1 + 2z - 2z^2 + z^{\lambda+1} + \sqrt{1 - 4z + 4z^2 - 2z^{\lambda+1} + 4z^{\lambda+2} - 4z^{\lambda+3} + z^{2\lambda+2}}}{2(z^3 - z^2)}$

follows. Therefore, minimum arc-length restrictions do not impose particular difficulties for RNA secondary structures. In fact, minimum stack-size conditions can also be dealt with straightforwardly. We furthermore note that eq. (1.1) is a constructive recursion, i.e. it allows one to inductively build secondary structures over $[n]$ from those over $[i]$, for all $i < n$.

Figure 1. RNA secondary structures.

In order to analyze RNA structures with crossing bonds, we recall the notion of $k$-noncrossing diagrams [11]. A $k$-noncrossing diagram is a labeled graph over the vertex set $[n]$ with vertex degrees $\le 1$, represented by drawing its vertices $1, \dots, n$ in a horizontal line and its arcs $(i, j)$, where $i < j$, in the upper half-plane, containing at most $k - 1$ mutually crossing arcs. The vertices and arcs correspond to nucleotides and Watson-Crick (A-U, G-C) and (U-G) base pairs, respectively.

Figure 2. $k$-noncrossing diagrams: we display a 4-noncrossing, arc-length $\lambda \ge 4$ and $\sigma \ge 1$ (upper) and a 3-noncrossing, $\lambda \ge 4$ and $\sigma \ge 2$ (lower) diagram.

Diagrams have the following three key parameters: the maximum number of mutually crossing arcs, $k - 1$, the minimum arc-length, $\lambda$, and the minimum stack-length, $\sigma$ ($\langle k, \lambda, \sigma \rangle$-diagrams). The length of an arc $(i, j)$ is $j - i$ and a stack of length $\sigma$ is a sequence of "parallel" arcs of the form $((i, j), (i+1, j-1), \dots, (i + (\sigma - 1), j - (\sigma - 1)))$, see Fig. 2. We call an arc of length $\lambda$ a $\lambda$-arc. Let $\mathcal{T}^{[\lambda]}_{k,\sigma}(n)$ denote the set of $k$-noncrossing diagrams with minimum arc- and stack-length $\lambda$ and $\sigma$ and let $T^{[\lambda]}_{k,\sigma}(n)$ denote their number.

In the following, we shall identify pseudoknot RNA structures with $k$-noncrossing diagrams and refer to them as $\langle k, \lambda, \sigma \rangle$-structures. Pseudoknot RNA structures occur in functional RNA (RNAseP) [17], ribosomal RNA [16] and plant viral RNAs, and in vitro RNA evolution experiments have produced families of RNA structures with pseudoknot motifs [24]. In Fig. 3 we give several representations of the UTR-pseudoknot of the mouse hepatitis virus. Due to the crossings of arcs, pseudoknots differ considerably from secondary structures: pseudoknot RNA structures are inherently non-inductive and no analogue of eq. (1.1) exists. One key for the generating function of $k$-noncrossing RNA structures $\mathbf{T}^{[\lambda]}_k(z)$ was the bijection of Chen et al. [4] obtained in the context of $k$-noncrossing partitions. This bijection has been generalized to $k$-noncrossing tangled diagrams [5], a class of contact-structures tailored for expressing RNA tertiary interactions. Via the bijection, $k$-noncrossing RNA structures can be identified with certain walks in $\mathbb{Z}^{k-1}$ that remain in the region

$\{ (x_1, \dots, x_{k-1}) \in \mathbb{Z}^{k-1} \mid x_1 \ge x_2 \ge \dots \ge x_{k-1} \ge 0 \}$,

starting and ending at 0, the boundaries of which are called walls.

Figure 3. UTR-pseudoknot structure of the mouse hepatitis virus.

The enumeration of these walks is obtained employing the reflection principle. This method is due to André in 1887 [2] and has subsequently been generalized by Gessel and Zeilberger [7]. In the reflection principle "bad", i.e. reflected, walks cancel themselves. In other words, one enumerates all walks and, due to cancellation, only the ones that never touch the walls survive. Despite its beauty this method does not trigger any algorithmic intuition and is nonconstructive. Moreover, $k$-noncrossing RNA structures cannot directly be enumerated via the reflection principle: it does not preserve a minimum arc-length. In [11] it is shown how to eliminate specific classes of arcs after reflection. One nontrivial implication of this theory is that all generating functions for $k$-noncrossing RNA structures are $D$-finite, i.e. there exists a nonconstructive recurrence relation of finite length with polynomial coefficients for $T^{[\lambda]}_{k,\sigma}(n)$. Note however that, although we can prove the existence of this recurrence, it is at present not known for any $k > 2$. In Fig. 4 we illustrate the key steps for the enumeration of $k$-noncrossing RNA structures [11].

Figure 4. Exact enumeration of $k$-noncrossing RNA structures.

Once $\mathbf{T}^{[4]}_{k,\sigma}(z)$ is known we employ singularity analysis and study its dominant singularities, using Hankel contours. This Ansatz has been pioneered by P. Flajolet and A.M. Odlyzko [6]. Its basic idea is the construction of a "singular-analogue" of the Taylor-expansion.
It can be shown that, under certain conditions, there exists an approximation which is locally of the same order as the original function. This particular local approximation then allows one to derive the asymptotic form of the coefficients. In our situation all conditions for singularity analysis are met, since all our generating functions are $D$-finite [22, 31] and $D$-finite functions have an analytic continuation into any simply-connected domain containing zero. We will compute $\mathbf{T}^{[4]}_{k,\sigma}(z)$ and show that $\mathbf{T}^{[4]}_{k,\sigma}(z)$ has a unique dominant singularity, whose type depends solely on the crossing number [12, 13]. Singularity analysis will then produce an array of exponential growth rates indexed by $k$ and $\sigma$, summarized in Tab. 1.

The ideas of this paper build on those of [11, 13]. In [13] core-structures are introduced via which $\sigma$-canonical $k$-noncrossing structures can be enumerated. $\langle k, 4, \sigma \rangle$-structures where $\sigma \ge 3$ can however not be enumerated via core-structures, see Fig. 5. This results from the fact that the core-map, obtained by identifying stacks with single arcs, does not preserve arc-length. Therefore we have to introduce a new set of $k$-noncrossing diagrams, denoted by $\mathcal{T}^*_k(n, h)$. This class is designed for inducing a new type of cores, $\mathcal{C}^*_k(n', h')$ (see Theorem 3). Then we proceed using ideas similar to those in [13] and prove our exact enumeration result, Theorem 3. As for the singularity analysis, the main contribution is Claim 1 of Theorem 4: a new functional equation for $\mathbf{T}^{[4]}_{k,\sigma}(z)$.

  k        3       4       5       6       7       8       9
  σ = 3    2.0348  2.2644  2.4432  2.5932  2.7243  2.8414  2.9480
  σ = 4    1.7898  1.9370  2.0488  2.1407  2.2198  2.2896  2.3523
  σ = 5    1.6465  1.7532  1.8330  1.8979  1.9532  2.0016  2.0449
  σ = 6    1.5515  1.6345  1.6960  1.7457  1.7877  1.8243  1.8569
  σ = 7    1.4834  1.5510  1.6008  1.6408  1.6745  1.7038  1.7297
  σ = 8    1.4319  1.4888  1.5305  1.5639  1.5919  1.6162  1.6376
  σ = 9    1.3915  1.4405  1.4763  1.5049  1.5288  1.5494  1.5677

Table 1. Exponential growth rates of $\langle k, 4, \sigma \rangle$-structures where $\sigma \ge 3$.

Figure 5. Core-structures will in general have 2-arcs: the structure $\delta \in \mathcal{T}^{[4]}_{3,3}(12)$ (lhs) is mapped into its core $c(\delta)$ (rhs). Clearly $\delta$ has arc-length $\ge 4$ and as a consequence of the collapse of the stack $((I+1, J+2), (I+2, J+1), (I+3, J))$ (the red arcs are being removed) into the arc $(I+3, J)$, $c(\delta)$ contains the arc $(I, I+4)$, which is, after relabeling, a 2-arc.

2. Preliminaries

In this section we provide some background on the generating functions of $k$-noncrossing matchings [4, 15] and $k$-noncrossing RNA structures [11, 12]. We denote the set (number) of $k$-noncrossing RNA structures with arc-length $\ge \lambda$ and stack-size $\ge \sigma$ by $\mathcal{T}^{[\lambda]}_{k,\sigma}(n)$ ($T^{[\lambda]}_{k,\sigma}(n)$). By abuse of notation we omit the indices $\lambda$ and $\sigma$ in $\mathcal{T}^{[\lambda]}_{k,\sigma}(n)$ ($T^{[\lambda]}_{k,\sigma}(n)$) for $\lambda = 2$ and $\sigma = 1$. A $k$-noncrossing core-structure is a $k$-noncrossing RNA structure in which there exist no two arcs of the form $(i, j), (i+1, j-1)$. The set (number) of $k$-noncrossing core-structures and $k$-noncrossing core-structures with exactly $h$ arcs is denoted by $\mathcal{C}_k(n)$ ($C_k(n)$) and $\mathcal{C}_k(n, h)$ ($C_k(n, h)$), respectively. Furthermore we denote by $f_k(n, \ell)$ the number of $k$-noncrossing diagrams with arbitrary arc-length and $\ell$ isolated vertices over $n$ vertices and set $M_k(n) = \sum_{\ell=0}^{n} f_k(n, \ell)$. That is, $M_k(n)$ is the number of all $k$-noncrossing partial matchings. In Fig. 6 we display the various types of diagrams involved.
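For small $n$ the numbers $f_k(n, \ell)$ and $M_k(n)$ can be checked by exhaustive enumeration. The following Python sketch is an illustration only and not a method used in this paper: the helper names `partial_matchings`, `is_k_noncrossing`, `f` and `M` are hypothetical, and the running time grows far too quickly to be useful beyond roughly a dozen vertices.

```python
from itertools import combinations


def partial_matchings(n):
    """Yield every partial matching on vertices 1..n as a list of arcs (i, j), i < j."""
    def rec(free):
        if not free:
            yield []
            return
        first, rest = free[0], free[1:]
        # Case 1: the smallest free vertex stays isolated.
        yield from rec(rest)
        # Case 2: the smallest free vertex is paired with a later vertex.
        for idx, partner in enumerate(rest):
            remaining = rest[:idx] + rest[idx + 1:]
            for tail in rec(remaining):
                yield [(first, partner)] + tail

    yield from rec(tuple(range(1, n + 1)))


def is_k_noncrossing(arcs, k):
    """True if the diagram contains no k mutually crossing arcs."""
    def cross(a, b):
        (i1, j1), (i2, j2) = sorted((a, b))
        return i1 < i2 < j1 < j2

    return not any(
        all(cross(a, b) for a, b in combinations(group, 2))
        for group in combinations(arcs, k)
    )


def f(k, n, isolated):
    """Brute-force f_k(n, isolated): k-noncrossing diagrams with a given number of isolated vertices."""
    return sum(
        1
        for m in partial_matchings(n)
        if n - 2 * len(m) == isolated and is_k_noncrossing(m, k)
    )


def M(k, n):
    """Brute-force M_k(n): all k-noncrossing partial matchings on n vertices."""
    return sum(1 for m in partial_matchings(n) if is_k_noncrossing(m, k))
```

For $k = 2$ the function `M` reproduces the Motzkin numbers, and for $k = 3$ the first diagram that gets excluded is the perfect matching $(1,4), (2,5), (3,6)$ on six vertices.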
Figure 6. Basic diagram types: (A) 4-noncrossing matching (no isolated points), (B) 3-noncrossing partial matching (isolated points 4 and 9), (C) 4-noncrossing RNA structure with arc-length $\ge 4$ and stack-length $\ge 1$, (D) RNA structure with arc-length $\ge 5$ and stack-length $\ge 3$.

2.1. $k$-noncrossing partial matchings and RNA structures. The following identities are due to Grabiner and Magyar [8]:

(2.1)  $\sum_{n \ge 0} f_k(n, 0) \cdot \frac{x^n}{n!} = \det[I_{i-j}(2x) - I_{i+j}(2x)]\big|_{i,j=1}^{k-1}$

(2.2)  $\sum_{n \ge 0} \Big\{ \sum_{\ell=0}^{n} f_k(n, \ell) \Big\} \cdot \frac{x^n}{n!} = e^{x} \det[I_{i-j}(2x) - I_{i+j}(2x)]\big|_{i,j=1}^{k-1}$,

where $I_r(2x) = \sum_{j \ge 0} \frac{x^{2j+r}}{j!\,(r+j)!}$ denotes the hyperbolic Bessel function of the first kind of order $r$. Eq. (2.1) and (2.2) allow only "in principle" for explicit computation of the numbers $f_k(n, \ell)$, and in view of $f_k(n, \ell) = \binom{n}{\ell} f_k(n - \ell, 0)$ everything can be reduced to (perfect) matchings, where we have the following situation: there exists an asymptotic approximation of the determinant of the hyperbolic Bessel function for general $k$ due to [15], and employing the subtraction of singularities principle [19] one can prove [15]

(2.3)  $\forall\, k \in \mathbb{N}; \quad f_k(2n, 0) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (2(k-1))^{2n}, \quad c_k > 0$,

where $\rho_k = \frac{1}{2(k-1)}$ is the dominant real singularity of $\sum_{n \ge 0} f_k(2n, 0) z^{2n}$.

  k              2       3       4       5       6        7        8        9        10
  γ_k^{-1}       2.6180  4.7913  6.8541  8.8875  10.9083  12.9226  14.9330  16.9410  18.9472

Table 2. The exponential growth rates of $\langle k, 2, 1 \rangle$-structures.

  k                  4       5       6        7        8        9
  (γ_k^{[4]})^{-1}   6.5290  8.6483  10.7176  12.7635  14.7963  16.8210

Table 3. The exponential growth rates of $\langle k, 4, 1 \rangle$-structures.
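Eq. (2.3) can be probed numerically for $k = 3$, where the exponent is $(k-1)^2 + (k-1)/2 = 5$ and the base is $(2(k-1))^2 = 16$. The sketch below rests on an assumption that is not stated in this paper: the closed form $f_3(2n, 0) = C_n C_{n+2} - C_{n+1}^2$ in terms of the Catalan numbers $C_n$, a known identity for 3-noncrossing perfect matchings. The function names are hypothetical. If eq. (2.3) holds, the normalized values $f_3(2n, 0)\, n^5 / 16^n$ should level off as $n$ grows.

```python
from math import comb


def catalan(m):
    """Catalan number C_m."""
    return comb(2 * m, m) // (m + 1)


def f3_perfect(n):
    """f_3(2n, 0) via the assumed identity C_n * C_{n+2} - C_{n+1}^2."""
    return catalan(n) * catalan(n + 2) - catalan(n + 1) ** 2


def normalized(n):
    """f_3(2n, 0) * n^5 / 16^n, using exact integers until the final division."""
    return f3_perfect(n) * n ** 5 / 16 ** n


if __name__ == "__main__":
    for n in (10, 100, 1000, 2000):
        print(n, normalized(n))
```

The printed values change less and less from one row to the next, which is the behaviour eq. (2.3) predicts; the sketch says nothing about $k > 3$, where no comparably simple closed form is assumed here.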
For $\langle k, 2, 1 \rangle$-structures we have [11, 12]

(2.4)  $T_k(n) = \sum_{b=0}^{\lfloor n/2 \rfloor} (-1)^b \binom{n-b}{b} M_k(n - 2b)$

(2.5)  $T_k(n) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, (\gamma_k)^{-n}, \quad c_k > 0$,

where $\gamma_k$ is the unique, minimal solution of $\frac{z}{z^2 - z + 1} = \rho_k$, see Tab. 2. For $\langle k, 4, 1 \rangle$-structures we have according to [9] the following exact enumeration result

(2.6)  $T^{[4]}_k(n) = \sum_{b \le \lfloor n/2 \rfloor} (-1)^b \lambda(n, b)\, M_k(n - 2b), \quad 4 \le k \le 9$,

where $\lambda(n, b)$ denotes the number of ways of selecting $b$ arcs of length $\le 3$ over $n$ vertices, and

(2.7)  $T^{[4]}_k(n) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, \big(\gamma^{[4]}_k\big)^{-n}$,

where $\gamma^{[4]}_k$ is the unique positive, real solution of $\frac{z\, r_1(-z^2)}{1 - z\, r_1(-z^2)} = \rho_k$ and $r_1(z)$ satisfies

$u(z) = \sqrt{1 + 4z - 4z^2 - 6z^3 + 4z^4 + z^6}$

$r_1(z) = -\frac{-2z^2 + z^3 - 1 + u(z)}{2(1 - 2z - z^2 + z^4)}$.

In Tab. 3 we present the exponential growth rates for $T^{[4]}_k(n)$ for $k = 4, \dots, 9$. For $\langle k, 2, \sigma \rangle$-structures we have according to [13]

(2.8)  $\mathbf{T}_{k,\sigma}(x) = \frac{1}{u_0 x^2 - x + 1} \sum_{n \ge 0} f_k(2n, 0) \left( \frac{\sqrt{u_0}\, x}{u_0 x^2 - x + 1} \right)^{2n}$,

where $u_0 = \frac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1}$, and

(2.9)  $T_{k,\sigma}(n) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, \big(\gamma_{k,\sigma}^{-1}\big)^{n}$,

where $\gamma_{k,\sigma}$ is a positive real dominant singularity of $\sum_{n \ge 0} T_{k,\sigma}(n) x^n$ and the minimal positive real solution of the equation

(2.10)  $\frac{\sqrt{\frac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1}}\; x}{\left( \frac{(x^2)^{\sigma-1}}{(x^2)^{\sigma} - x^2 + 1} \right) x^2 - x + 1} = \rho_k$.

  k        2       3       4       5       6       7       8       9       10
  σ = 2    1.9680  2.5881  3.0382  3.4138  3.7438  4.0420  4.3162  4.5715  4.8115
  σ = 3    1.7160  2.0477  2.2704  2.4466  2.5955  2.7259  2.8427  2.9490  3.0469
  σ = 4    1.5782  1.7984  1.9410  2.0511  2.1423  2.2209  2.2904  2.3529  2.4100
  σ = 5    1.4899  1.6528  1.7561  1.8347  1.8991  1.9540  2.0022  2.0454  2.0845

Table 4. The exponential growth rates of $\langle k, 2, \sigma \rangle$-structures [13].

In Table 4 we present the exponential growth rates of $\langle k, 2, \sigma \rangle$-structures.

2.2.
Singularity analysis. Let us next recall some basic facts about analytic functions. Pringsheim's Theorem [23] guarantees that each power series with nonnegative coefficients and finite radius of convergence has a positive real dominant singularity. This singularity plays a key role for the asymptotics of the coefficients. In the proof of Theorem 4 it will be important to deduce relations between the coefficients from functional equations of generating functions. The class of theorems that deal with such deductions are called transfer-theorems [6]. We consider a specific domain in which the functions in question are analytic and which is "slightly" bigger than their respective disc of convergence. It is tailored for extracting the coefficients via Cauchy's integral formula. Details on the method can be found in [6]. In case of $D$-finite functions we have analytic continuation in any simply connected domain containing zero [26] and all prerequisites of singularity analysis are met. To be precise, given two numbers $\phi, R$, where $R > 1$ and $0 < \phi < \frac{\pi}{2}$, and $\rho \in \mathbb{R}$, the open domain $\Delta_\rho(\phi, R)$ is defined as

(2.11)  $\Delta_\rho(\phi, R) = \{ z \mid |z| < R,\ z \ne \rho,\ |\mathrm{Arg}(z - \rho)| > \phi \}$.

A domain is a $\Delta_\rho$-domain if it is of the form $\Delta_\rho(\phi, R)$ for some $R$ and $\phi$. A function is $\Delta_\rho$-analytic if it is analytic in some $\Delta_\rho$-domain. We use the notation

(2.12)  $\big( f(z) = O(g(z)) \text{ as } z \to \rho \big) \iff \big( f(z)/g(z) \text{ is bounded as } z \to \rho \big)$

and if we write $f(z) = O(g(z))$ it is implicitly assumed that $z$ tends to a (unique) singularity. $[z^n] f(z)$ denotes the coefficient of $z^n$ in the power series expansion of $f(z)$ around 0.
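The domain in eq. (2.11) is a disc of radius $R$ with a wedge of half-angle $\phi$ removed at $\rho$. As a small illustration, the membership test can be written out directly; the function name `in_delta_domain` and the sample values of $\rho$, $\phi$ and $R$ in the usage lines are hypothetical choices, not taken from the paper.

```python
import cmath
import math


def in_delta_domain(z, rho, phi, R):
    """Membership test for the domain of eq. (2.11):
    |z| < R, z != rho and |Arg(z - rho)| > phi."""
    z = complex(z)
    if z == rho:
        return False
    # cmath.phase returns the principal argument in (-pi, pi].
    return abs(z) < R and abs(cmath.phase(z - rho)) > phi


if __name__ == "__main__":
    rho, phi, R = 0.25, math.pi / 4, 1.5
    print(in_delta_domain(0.0, rho, phi, R))          # origin, left of rho
    print(in_delta_domain(0.3, rho, phi, R))          # on the real axis right of rho
    print(in_delta_domain(0.25 + 0.1j, rho, phi, R))  # directly above rho
```

Points on the real axis to the right of $\rho$ fall inside the removed wedge, which is why the domain extends beyond the disc of convergence everywhere except near the ray leaving $\rho$.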
  k    q_{0,k}(z)                                                            M_k
  3    (1/4 - 4z^2) z^2                                                      {±1/4}
  4    (144z^4 - 40z^2 + 1) z^6                                              {±1/2, ±1/6}
  5    (-80z^2 + 1024z^4 + 1) z^8                                            {±1/4, ±1/8}
  6    (-4144z^4 + 140z^2 + 14400z^6 - 1) z^10                               {±1/2, ±1/6, ±1/10}
  7    (-1 - 12544z^4 + 224z^2 + 147456z^6) z^12                             {±1/4, ±1/8, ±1/12}
  8    (1 - 336z^2 + 31584z^4 + 2822400z^8 - 826624z^6) z^14                 {±1/2, ±1/6, ±1/10, ±1/14}
  9    -(-480z^2 + 1 + 69888z^4 + 37748736z^8 - 3358720z^6) z^16             {±1/4, ±1/8, ±1/12, ±1/16}

Table 5. The polynomials $q_{0,k}(z)$ and their nonzero roots.

Theorem 1. [6] Let $f(z), g(z)$ be $D$-finite, $\Delta_\rho$-analytic functions with unique dominant singularity $\rho$ and suppose

(2.13)  $f(z) = O(g(z))$ for $z \to \rho$.

Then we have

(2.14)  $[z^n] f(z) = K \left( 1 - O\!\left(\tfrac{1}{n}\right) \right) [z^n] g(z)$,

where $K$ is some constant.

Let $\mathbf{F}_k(z) = \sum_n f_k(2n, 0) z^{2n}$ be the ordinary generating function of $k$-noncrossing matchings. It follows from eq. (2.1) that the power series $\mathbf{F}_k(z)$ is $D$-finite, i.e. there exists some $e \in \mathbb{N}$ such that

(2.15)  $q_{0,k}(z) \frac{d^e}{dz^e} \mathbf{F}_k(z) + q_{1,k}(z) \frac{d^{e-1}}{dz^{e-1}} \mathbf{F}_k(z) + \dots + q_{e,k}(z) \mathbf{F}_k(z) = 0$,

where the $q_{j,k}(z)$ are polynomials. The key point is that any dominant singularity of $\mathbf{F}_k(z)$ is contained in the set of roots of $q_{0,k}(z)$ [22], which we denote by $M_k$. The polynomials $q_{0,k}(z)$ and their sets of roots for $k = 3, \dots, 9$ are given in Table 5. Accordingly, $\mathbf{F}_k(z)$ has singularities $\pm \rho_k$, where $\rho_k = (2(k-1))^{-1}$. As a consequence of Theorem 1, eq. (2.3) and the so-called supercritical case of singularity analysis [6], VI.9., p. 400, we give the following result [14] tailored for our functional equations.

Theorem 2. Suppose $\vartheta_\sigma(z)$ is algebraic over $K(z)$, regular for $|z| < \delta$ and satisfies $\vartheta_\sigma(0) = 0$.
Suppose further $\gamma_{k,\sigma}$ is the unique solution with minimal modulus $< \delta$ of the two equations $\vartheta_\sigma(x) = \rho_k$ and $\vartheta_\sigma(x) = -\rho_k$. Then $\gamma_{k,\sigma}$ is the unique dominant singularity of $\mathbf{F}_k(\vartheta_\sigma(z))$ and

(2.16)  $[z^n]\, \mathbf{F}_k(\vartheta_\sigma(z)) \sim c_k\, n^{-((k-1)^2 + (k-1)/2)}\, \big( \gamma_{k,\sigma}^{-1} \big)^{n}$.

3. Exact Enumeration

In this section we present the exact enumeration of $\langle k, 4, \sigma \rangle$-structures, where $\sigma \ge 3$. The structure of our formula is analogous to the Möbius inversion formula proved in [13]:

$T_{k,\sigma}(n, h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} C_k(n - 2b, h - b)$,

which relates the number of all structures and the number of core-structures. As we pointed out in the introduction, the latter cannot be used in order to enumerate $k$-noncrossing structures with arc-length $\ge 4$, see Fig. 5. We consider the arc-sets $\beta_2 = \{ (i, i+2) \mid i+1 \text{ isolated} \}$ and $\beta_3 = \{ (i, i+3) \mid i+1, i+2 \text{ isolated} \}$ and set $\beta = \beta_2 \cup \beta_3$. Furthermore

(3.1)  $\mathcal{C}^*_k(n, h) = \{ \delta \mid \delta \in \mathcal{C}_k(n, h);\ \delta \text{ contains no 1-arc and no } \beta\text{-arc} \}$

(3.2)  $\mathcal{T}^*_k(n, h) = \{ \delta \mid \delta \in \mathcal{T}_k(n, h);\ \delta \text{ contains no 1-arc and no } \beta\text{-arc} \}$.

Theorem 3. Suppose we have $k, h, \sigma \in \mathbb{N}$, $k \ge 2$, $h \le n/2$ and $\sigma \ge 3$. Then the number of $\langle k, 4, \sigma \rangle$-structures having exactly $h$ arcs is given by

(3.3)  $T^{[4]}_{k,\sigma}(n, h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2-\sigma)(h-b) - 1}{h-b-1} C^*_k(n - 2b, h - b)$,

where $C^*_k(n, h)$ satisfies $C^*_k(n, 0) = 1$ and

(3.4)  $C^*_k(n, h) = \sum_{b=0}^{h-1} (-1)^{h-b-1} \binom{h-1}{b} T^*_k(n - 2h + 2b + 2, b + 1)$ for $h \ge 1$.

Furthermore, $T^*_k(n, h)$ satisfies

(3.5)  $T^*_k(n, h) = \sum_{0 \le j_1 + j_2 + j_3 \le h} (-1)^{j_1 + j_2 + j_3} \lambda(n, j_1, j_2, j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)$,

where $\lambda(n, j_1, j_2, j_3) = \binom{n - j_1 - 2j_2 - 3j_3}{j_1,\ j_2,\ j_3,\ n - 2j_1 - 3j_2 - 4j_3}$.
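Theorem 3 can be turned into a short program for $k = 3$. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes $f_3(2m, 0) = C_m C_{m+2} - C_{m+1}^2$ (a known Catalan-number identity for 3-noncrossing perfect matchings that is not stated in this paper), uses $f_k(n, \ell) = \binom{n}{\ell} f_k(n - \ell, 0)$ from Section 2.1, treats binomial coefficients with negative upper index and counts with negative arguments as zero, and adds the empty structure for $h = 0$. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb, factorial


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def f3(n, iso):
    """f_3(n, iso): 3-noncrossing diagrams on n vertices with iso isolated vertices."""
    if n < 0 or iso < 0 or iso > n or (n - iso) % 2:
        return 0
    m = (n - iso) // 2
    perfect = catalan(m) * catalan(m + 2) - catalan(m + 1) ** 2  # assumed identity
    return comb(n, iso) * perfect


def lam(n, j1, j2, j3):
    """Multinomial coefficient lambda(n, j1, j2, j3) of Theorem 3 (0 if undefined)."""
    rest = n - 2 * j1 - 3 * j2 - 4 * j3
    if rest < 0:
        return 0
    out = factorial(n - j1 - 2 * j2 - 3 * j3)
    for part in (j1, j2, j3, rest):
        out //= factorial(part)
    return out


@lru_cache(maxsize=None)
def t_star(n, h):
    """T*_3(n, h) by the inclusion-exclusion formula (3.5)."""
    if n < 0 or h < 0 or 2 * h > n:
        return 0
    total = 0
    for j1 in range(h + 1):
        for j2 in range(h + 1 - j1):
            for j3 in range(h + 1 - j1 - j2):
                sign = (-1) ** (j1 + j2 + j3)
                total += sign * lam(n, j1, j2, j3) * f3(
                    n - 2 * j1 - 3 * j2 - 4 * j3, n - 2 * h - j2 - 2 * j3
                )
    return total


@lru_cache(maxsize=None)
def c_star(n, h):
    """C*_3(n, h) by eq. (3.4), with C*_3(n, 0) = 1."""
    if n < 0 or h < 0 or 2 * h > n:
        return 0
    if h == 0:
        return 1
    return sum(
        (-1) ** (h - b - 1) * comb(h - 1, b) * t_star(n - 2 * h + 2 * b + 2, b + 1)
        for b in range(h)
    )


def t4(n, sigma=3):
    """Number of <3, 4, sigma>-structures over [n]: eq. (3.3) summed over h."""
    total = 1  # the structure without arcs
    for h in range(1, n // 2 + 1):
        for b in range(sigma - 1, h):
            top = b + (2 - sigma) * (h - b) - 1
            if top < 0:
                continue  # no admissible stack sizes
            total += comb(top, h - b - 1) * c_star(n - 2 * b, h - b)
    return total


if __name__ == "__main__":
    print([t4(n, 3) for n in range(8, 13)])
    print([t4(n, 4) for n in range(8, 13)])
```

For $n = 8, \dots, 12$ the two printed lists can be confirmed by listing the structures by hand: with $\sigma = 3$ a single stack of three arcs needs at least nine vertices, and the first structure with two crossing stacks appears at $n = 12$.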
  n                   8  9  10  11  12  13  14  15  16   17   18   19   20    21    22    23    24
  T^{[4]}_{3,3}(n)    1  2  4   8   15  28  52  96  176  316  557  965  1660  2860  4974  8754  15562
  T^{[4]}_{3,4}(n)    1  1  1   2   4   8   14  23  36   56   88   141  231   382   633   1038  1679

Table 6. Exact enumeration: $T^{[4]}_{3,3}(n)$ and $T^{[4]}_{3,4}(n)$ for $n \le 24$, respectively.

In Tab. 6 we display the first numbers of $\langle 3, 4, 3 \rangle$-structures and $\langle 3, 4, 4 \rangle$-structures, respectively.

Proof. We first show that there exists a mapping from $\langle k, 4, \sigma \rangle$-structures with $h$ arcs over $[n]$ into $\dot\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b)$:

(3.6)  $c \colon \mathcal{T}^{[4]}_{k,\sigma}(n, h) \to \dot\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b), \quad \delta \mapsto c(\delta)$,

which is obtained in two steps: first induce $c(\delta)$ by mapping arcs and isolated vertices as follows:

(3.7)  $\forall\, \ell \ge \sigma - 1; \quad ((i - \ell, j + \ell), \dots, (i, j)) \mapsto (i, j)$ and $j \mapsto j$ if $j$ is an isolated vertex,

and second relabel the resulting diagram from left to right in increasing order, see Fig. 7.

Claim 1. $c \colon \mathcal{T}^{[4]}_{k,\sigma}(n, h) \to \dot\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b)$ is well-defined and surjective.

Figure 7. The mapping $c \colon \mathcal{T}^{[4]}_{k,\sigma}(n, h) \to \dot\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b)$ is obtained in two steps: first contraction of the stacks while keeping isolated points and secondly relabeling of the resulting diagram.

By construction, $c$ does not change the crossing number. Since the structures in $\mathcal{T}^{[4]}_{k,\sigma}(n, h)$ contain only arcs of length $\ge 4$, we derive $c(\mathcal{T}^{[4]}_{k,\sigma}(n, h)) \subset \dot\bigcup_{\sigma-1 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b)$. Therefore $c$ is well-defined. It remains to show that $c$ is surjective. For this purpose let $\delta \in \mathcal{C}^*_k(n - 2b, h - b)$ and set $a = b - (\sigma - 1)(h - b)$. We proceed constructing a $k$-noncrossing structure $\tilde\delta$ in three steps:

Step 1. Replace each label $i$ by $r_i$, where $r_i \le r_s$ if and only if $i \le s$.

Step 2.
Replace the leftmost arc $(r_p, r_q)$ by the sequence of arcs

(3.8)  $((\tau_p - ([\sigma - 1] + a),\ \tau_q + ([\sigma - 1] + a)), \dots, (\tau_p, \tau_q))$,

replace any other arc $(r_p, r_q)$ by the sequence

(3.9)  $((\tau_p - [\sigma - 1],\ \tau_q + [\sigma - 1]), \dots, (\tau_p, \tau_q))$

and each isolated vertex $r_s$ by $\tau_s$.

Step 3. Set for $x, y \in \mathbb{Z}$, $\tau_b + y \le \tau_c + x$ if and only if ($b < c$) or ($b = c$ and $y \le x$).

By construction, $\le$ is a linear order over

$n - 2b + 2(h - b)(\sigma - 1) + 2a = n - 2b + 2(h - b)(\sigma - 1) + 2(b - (\sigma - 1)(h - b)) = n$

elements, which we then label from 1 to $n$ (left to right) in increasing order. It is straightforward to verify that $c(\tilde\delta) = \delta$ holds. It remains to show that $\tilde\delta \in \mathcal{T}^{[4]}_{k,\sigma}(n)$. Suppose a contrario that $\tilde\delta$ contains an arc $(i, i+2)$. Since $\sigma \ge 3$ we can then conclude that $i + 1$ is necessarily isolated. The arc $(i, i+2)$ is mapped by $c$ into $(j, j+2)$ with isolated point $j + 1$, which is impossible by definition of $\mathcal{C}^*_k(n', h')$. It follows similarly that an arc of the form $(i, i+3)$ cannot be contained in $\tilde\delta$, and Claim 1 follows.

Labeling the $h$ arcs of $\delta \in \mathcal{T}^{[4]}_{k,\sigma}(n, h)$ from left to right and keeping track of multiplicities gives rise to the map

(3.10)  $f_{k,\sigma} \colon \mathcal{T}^{[4]}_{k,\sigma}(n, h) \to \dot\bigcup_{\sigma-1 \le b \le h-1} \Big[ \mathcal{C}^*_k(n - 2b, h - b) \times \Big\{ (a_j)_{1 \le j \le h-b} \;\Big|\; \sum_{j=1}^{h-b} a_j = b,\ a_j \ge \sigma - 1 \Big\} \Big]$,

given by $f_{k,\sigma}(\delta) = (c(\delta), (a_j)_{1 \le j \le h-b})$. We can conclude that $f_{k,\sigma}$ is well-defined and a bijection. We proceed computing the multiplicities of the resulting core-structures [13]:

(3.11)  $\Big| \Big\{ (a_j)_{1 \le j \le h-b} \;\Big|\; \sum_{j=1}^{h-b} a_j = b;\ a_j \ge \sigma - 1 \Big\} \Big| = \binom{b + (2 - \sigma)(h - b) - 1}{h - b - 1}$.

Eq. (3.11) and eq. (3.10) imply

(3.12)  $T^{[4]}_{k,\sigma}(n, h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2 - \sigma)(h - b) - 1}{h - b - 1} C^*_k(n - 2b, h - b)$,

whence eq. (3.3).
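The multiplicity count in eq. (3.11) is a stars-and-bars argument: subtracting $\sigma - 1$ from each of the $h - b$ parts leaves a composition of $b - (\sigma - 1)(h - b)$ into $h - b$ nonnegative parts. A brute-force comparison for small parameters makes this concrete. In the sketch below `parts` stands for $h - b$, the function names are hypothetical, and a binomial coefficient with negative upper index is taken to be zero (an assumption about the intended convention).

```python
from itertools import product
from math import comb


def count_by_enumeration(b, parts, sigma):
    """Count tuples (a_1, ..., a_parts) with sum b and every a_j >= sigma - 1."""
    return sum(
        1
        for a in product(range(sigma - 1, b + 1), repeat=parts)
        if sum(a) == b
    )


def count_by_formula(b, parts, sigma):
    """Right-hand side of eq. (3.11) with parts = h - b."""
    top = b + (2 - sigma) * parts - 1
    return comb(top, parts - 1) if top >= 0 else 0


if __name__ == "__main__":
    for sigma in (2, 3, 4):
        for parts in (1, 2, 3):
            for b in range(9):
                assert count_by_enumeration(b, parts, sigma) == count_by_formula(b, parts, sigma)
    print("eq. (3.11) agrees with enumeration on the tested range")
```

For example, with $\sigma = 3$, two arcs in the core and $b = 5$ removed arcs, the admissible tuples are $(2, 3)$ and $(3, 2)$, matching $\binom{2}{1} = 2$.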
Next we consider the map

(3.13)  $c^* \colon \mathcal{T}^*_k(n, h) \to \dot\bigcup_{0 \le b \le h-1} \mathcal{C}^*_k(n - 2b, h - b), \quad \delta \mapsto c^*(\delta)$.

Indeed, $c^*$ is well-defined, since any diagram in $\mathcal{T}^*_k(n, h)$ can be mapped into a core-structure without 1- and $\beta$-arcs, i.e. into an element of $\mathcal{C}^*_k(n', h')$. That gives rise to

(3.14)  $T^*_k(n, h) = \sum_{b=0}^{h-1} \binom{h-1}{b} C^*_k(n - 2b, h - b)$

and via the Möbius inversion formula we obtain eq. (3.4). It is straightforward to show that there are $\lambda(n, j_1, j_2, j_3) = \binom{n - j_1 - 2j_2 - 3j_3}{j_1,\ j_2,\ j_3,\ n - 2j_1 - 3j_2 - 4j_3}$ ways to select $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs over $[n]$. Since removing $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs removes $2j_1 + 3j_2 + 4j_3$ vertices, the number of configurations of at least $j_1$ 1-arcs, $j_2$ $\beta_2$-arcs and $j_3$ $\beta_3$-arcs is given by $\lambda(n, j_1, j_2, j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)$. Via the inclusion-exclusion principle, we arrive at

$T^*_k(n, h) = \sum_{0 \le j_1 + j_2 + j_3 \le h} (-1)^{j_1 + j_2 + j_3} \lambda(n, j_1, j_2, j_3)\, f_k(n - 2j_1 - 3j_2 - 4j_3,\ n - 2h - j_2 - 2j_3)$,

whence Theorem 3. □

The following functional identity, relating the bivariate generating functions of $T^{[4]}_{k,\sigma}(n, h)$ and $C^*_k(n, h)$, is instrumental for proving our main result in the next section, Theorem 4.

Lemma 1. [13] Let $k, \sigma \in \mathbb{N}$, $k \ge 2$ and let $u, x$ be indeterminates. Suppose we have

(3.15)  $\forall\, h \ge 1, \quad A_{k,\sigma}(n, h) = \sum_{b=\sigma-1}^{h-1} \binom{b + (2 - \sigma)(h - b) - 1}{h - b - 1} B_k(n - 2b, h - b)$ and $A_{k,\sigma}(n, 0) = 1$.

Then we have the functional relation

(3.16)  $\sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} A_{k,\sigma}(n, h)\, u^h x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} B_k(n, h) \left( \frac{u \cdot (u x^2)^{\sigma-1}}{1 - u x^2} \right)^{h} x^n$.

According to Lemma 1, eq. (3.14) and eq.
(3.3) we obtain the two functional identities

(3.17)  $\sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} T^*_k(n, h)\, u^h x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} C^*_k(n, h) \left( \frac{u}{1 - u x^2} \right)^{h} x^n$

(3.18)  $\sum_{n \ge 0} T^{[4]}_{k,\sigma}(n)\, x^n = \sum_{n \ge 0} \sum_{0 \le h \le \frac{n}{2}} C^*_k(n, h) \left( \frac{(x^2)^{\sigma-1}}{1 - x^2} \right)^{h} x^n$ for $\sigma \ge 3$.

4. Asymptotic Enumeration

In this section we study the asymptotics of $\langle k, 4, \sigma \rangle$-structures, where $\sigma \ge 3$. We are particularly interested in deriving simple formulas that can be used for assessing the complexity of prediction algorithms for $k$-noncrossing RNA structures. In order to state Theorem 4 below we introduce

(4.1)  $w_0(x) = \frac{x^{2\sigma - 2}}{1 - x^2 + x^{2\sigma}}$

(4.2)  $v(x) = 1 - x + w(x)\, x^2 + w(x)\, x^3 + w(x)\, x^4$

(4.3)  $v_0(x) = 1 - x + w_0(x)\, x^2 + w_0(x)\, x^3 + w_0(x)\, x^4$.

Theorem 4. Let $k, \sigma \in \mathbb{N}$, $k, \sigma \ge 3$, let $x$ be an indeterminate and $\rho_k$ the dominant, positive real singularity of $\sum_{n \ge 0} f_k(2n, 0) z^{2n}$. Then $\mathbf{T}^{[4]}_{k,\sigma}(x)$, the generating function of $\langle k, 4, \sigma \rangle$-structures, is given by

(4.4)  $\mathbf{T}^{[4]}_{k,\sigma}(x) = \frac{1}{v_0(x)} \sum_{n \ge 0} f_k(2n, 0) \left( \frac{\sqrt{w_0(x)}\, x}{v_0(x)} \right)^{2n}$.

Furthermore

(4.5)  $T^{[4]}_{k,\sigma}(n) \sim c_k\, n^{-(k-1)^2 - \frac{k-1}{2}} \left( \frac{1}{\gamma^{[4]}_{k,\sigma}} \right)^{n}$, for $k = 3, 4, \dots, 9$

holds, where $\gamma^{[4]}_{k,\sigma}$ is the positive real dominant singularity of $\mathbf{T}^{[4]}_{k,\sigma}(x)$ and the minimal positive real solution of the equation $\frac{\sqrt{w_0(x)}\, x}{v_0(x)} = \rho_k$, and $f_k(2n, 0) \sim c_k\, n^{-(k-1)^2 - \frac{k-1}{2}} \left( \frac{1}{\rho_k} \right)^{2n}$ (eq. (2.3)).

Proof. In the following we will use the notation $w_0$ instead of $w_0(x)$, eq. (4.1). The first step derives a functional equation relating the bivariate generating functions of $T^*_k(n, h)$ and $f_k(2h', 0)$. For this purpose we use eq. (3.5).

Claim 1.

(4.6)  $\sum_{n \ge 0} \sum_{h \le \frac{n}{2}} T^*_k(n, h)\, w^h x^n = \frac{1}{v(x)} \sum_{n \ge 0} f_k(2n, 0) \left( \frac{\sqrt{w}\, x}{v(x)} \right)^{2n}$.

Set $\varphi_m(w) = \sum_{h \le \frac{m}{2}} \binom{m}{2h} f_k(2h, 0)\, w^h$.
In order to prove Claim 1 we compute

\[
\begin{aligned}
\sum_{n\ge 0}\sum_{h\le \frac{n}{2}} T_k^{*}(n,h)\,w^{h}x^{n}
&= \sum_{n\ge 0}\sum_{h\le \frac{n}{2}}\ \sum_{0\le j_1+j_2+j_3\le h} (-1)^{j_1+j_2+j_3}\lambda(n,j_1,j_2,j_3)\,
f_k(n-2j_1-3j_2-4j_3,\ n-2h-j_2-2j_3)\,w^{h}x^{n} \\
&= \sum_{n\ge 0}\ \sum_{j_1+j_2+j_3\le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\lambda(n,j_1,j_2,j_3)\,x^{n}
\times \sum_{h\ge j_1+j_2+j_3} f_k(n-2j_1-3j_2-4j_3,\ n-2h-j_2-2j_3)\,w^{h} \\
&= \sum_{n\ge 0}\ \sum_{j_1+j_2+j_3\le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\lambda(n,j_1,j_2,j_3)\,x^{n}
\times \sum_{h\ge j_1+j_2+j_3} \binom{n-2j_1-3j_2-4j_3}{n-2h-j_2-2j_3} f_k(2(h-j_1-j_2-j_3),0)\,w^{h} \\
&= \sum_{n\ge 0}\ \sum_{j_1+j_2+j_3\le \frac{n}{2}} (-1)^{j_1+j_2+j_3}\lambda(n,j_1,j_2,j_3)\,
w^{j_1+j_2+j_3}\,\varphi_{n-2j_1-3j_2-4j_3}(w)\,x^{n}.
\end{aligned}
\]

We interchange the summation over $j_1+j_2+j_3$ and $n$ and arrive at

\[
\begin{aligned}
&\sum_{j_1+j_2+j_3\ge 0}\ \sum_{n\ge 2j_1+3j_2+4j_3} (-1)^{j_1+j_2+j_3}
\binom{n-j_1-2j_2-3j_3}{j_1,\ j_2,\ j_3,\ n-2j_1-3j_2-4j_3}
w^{j_1+j_2+j_3}\,\varphi_{n-2j_1-3j_2-4j_3}(w)\,x^{n} \\
&\quad= \sum_{j_1+j_2+j_3\ge 0} \frac{(-w)^{j_1+j_2+j_3}}{j_1!\,j_2!\,j_3!}
\sum_{n\ge 2j_1+3j_2+4j_3} \frac{(n-j_1-2j_2-3j_3)!}{(n-2j_1-3j_2-4j_3)!}\,
\varphi_{n-2j_1-3j_2-4j_3}(w)\,x^{n}.
\end{aligned}
\]

Setting $m = n-2j_1-3j_2-4j_3$ this becomes

\[
\begin{aligned}
&= \sum_{j_1+j_2+j_3\ge 0} \frac{(-w)^{j_1+j_2+j_3}}{j_1!\,j_2!\,j_3!}\,x^{2j_1+3j_2+4j_3}
\sum_{m\ge 0} \frac{(m+j_1+j_2+j_3)!}{m!}\,\varphi_m(w)\,x^{m} \\
&= \sum_{m\ge 0}\left[\ \sum_{j_1+j_2+j_3\ge 0} \binom{m+j_1+j_2+j_3}{m,\ j_1,\ j_2,\ j_3}
(-wx^{2})^{j_1}(-wx^{3})^{j_2}(-wx^{4})^{j_3}\right]\varphi_m(w)\,x^{m} \\
&= \sum_{m\ge 0} \varphi_m(w)\,x^{m}\left(\frac{1}{1+wx^{2}+wx^{3}+wx^{4}}\right)^{m+1} \\
&= \frac{1}{1+wx^{2}+wx^{3}+wx^{4}} \sum_{m\ge 0} \varphi_m(w)\left(\frac{x}{1+wx^{2}+wx^{3}+wx^{4}}\right)^{m}.
\end{aligned}
\]

Next we compute

\[
\begin{aligned}
\sum_{m\ge 0}\varphi_m(w)\,y^{m}
&= \int_0^{\infty} \sum_{m\ge 0} \varphi_m(w)\,\frac{(xy)^{m}}{m!}\,e^{-x}\,dx \\
&= \int_0^{\infty} \sum_{m\ge 0}\sum_{h\le \frac{m}{2}} \binom{m}{2h} f_k(2h,0)\,w^{h}\,\frac{(xy)^{m}}{m!}\,e^{-x}\,dx \\
&= \int_0^{\infty} \sum_{m\ge 0}\sum_{h\le \frac{m}{2}} f_k(2h,0)\,w^{h}\,\frac{(xy)^{2h}}{(2h)!}\,\frac{(xy)^{m-2h}}{(m-2h)!}\,e^{-x}\,dx \\
&= \int_0^{\infty} \sum_{h\ge 0} f_k(2h,0)\,\frac{(\sqrt{w}\,xy)^{2h}}{(2h)!} \sum_{m\ge 2h} \frac{(xy)^{m-2h}}{(m-2h)!}\,e^{-x}\,dx \\
&= \sum_{n\ge 0} f_k(2n,0)\,\frac{(\sqrt{w}\,y)^{2n}}{(2n)!} \int_0^{\infty} e^{-(1-y)x}\,x^{2n}\,dx \\
&= \sum_{n\ge 0} f_k(2n,0)\,\frac{(\sqrt{w}\,y)^{2n}}{(2n)!}\,
\frac{\int_0^{\infty} e^{-(1-y)x}\,((1-y)x)^{2n}\,d((1-y)x)}{(1-y)^{2n+1}} \\
&= \frac{1}{1-y}\sum_{n\ge 0} f_k(2n,0)\left(\frac{\sqrt{w}\,y}{1-y}\right)^{2n}.
\end{aligned}
\]

Substituting $y = \frac{x}{1+wx^{2}+wx^{3}+wx^{4}}$, for which $(1+wx^{2}+wx^{3}+wx^{4})(1-y) = 1-x+wx^{2}+wx^{3}+wx^{4}$ is the function $v(x)$ of eq. (4.2), the bivariate generating function can therefore be written as

\[
\sum_{n\ge 0}\sum_{h\le \frac{n}{2}} T_k^{*}(n,h)\,w^{h}x^{n}
= \frac{1}{v(x)}\sum_{n\ge 0} f_k(2n,0)\left(\frac{\sqrt{w}\,x}{v(x)}\right)^{2n},
\]

whence Claim 1. In view of eq. (3.17) and Claim 1 we arrive at

\[
\sum_{n\ge 0}\sum_{0\le h\le \frac{n}{2}} C_k^{*}(n,h)\left(\frac{w}{1-wx^{2}}\right)^{h}x^{n}
= \frac{1}{v(x)}\sum_{n\ge 0} f_k(2n,0)\left(\frac{\sqrt{w}\,x}{v(x)}\right)^{2n}.
\tag{4.7}
\]

By definition of $w_0 = w_0(x)$ we have

\[
\frac{(x^{2})^{\sigma-1}}{1-x^{2}} = \frac{w_0}{1-w_0x^{2}}.
\tag{4.8}
\]

According to eq. (3.18), eq. (4.8) and eq. (4.7) this allows us to derive

\[
\begin{aligned}
T^{[4]}_{k,\sigma}(x)
&= \sum_{n\ge 0}\sum_{0\le h\le \frac{n}{2}} C_k^{*}(n,h)\left(\frac{(x^{2})^{\sigma-1}}{1-x^{2}}\right)^{h}x^{n} \\
&= \sum_{n\ge 0}\sum_{0\le h\le \frac{n}{2}} C_k^{*}(n,h)\left(\frac{w_0}{1-w_0x^{2}}\right)^{h}x^{n} \\
&= \frac{1}{v_0(x)}\sum_{n\ge 0} f_k(2n,0)\left(\frac{\sqrt{w_0}\,x}{v_0(x)}\right)^{2n},
\end{aligned}
\]

whence (4.4). Let $V_k(x) = \sum_{n\ge 0} f_k(2n,0)\left(\frac{\sqrt{w_0}\,x}{v_0(x)}\right)^{2n}$.

Claim 2. The unique, minimal, positive, real solution of

\[
\vartheta_\sigma(x) = \frac{\sqrt{w_0}\,x}{v_0(x)} = \rho_k, \quad\text{for } k = 3,4,\dots,9,
\tag{4.9}
\]

denoted by $\gamma^{[4]}_{k,\sigma}$, is the unique dominant singularity of $T^{[4]}_{k,\sigma}(x)$.

Clearly, a dominant singularity of $\frac{1}{v_0(x)}V_k(x)$ is either a singularity of $V_k(x)$ or of $\frac{1}{v_0(x)}$. Suppose there exists some singularity $\zeta\in\mathbb{C}$ which is a pole of $\frac{1}{v_0(x)}$. By construction $\zeta\neq 0$ and $\zeta$ is necessarily a non-finite singularity of $V_k(x)$. If $|\zeta|\le\gamma^{[4]}_{k,\sigma}$, then we arrive at the contradiction $|V_k(\zeta)| > |V_k(\gamma^{[4]}_{k,\sigma})| \ge V_k(|\zeta|)$, since $V_k(\zeta)$ is not finite and $V_k(\gamma^{[4]}_{k,\sigma}) = \sum_{n\ge 0} f_k(2n,0)\,\rho_k^{2n} < \infty$. Therefore all dominant singularities of $T^{[4]}_{k,\sigma}(x)$ are singularities of $V_k(x)$.
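As a numerical aside to eq. (4.9), the following Python sketch locates the minimal positive solution of $\vartheta_\sigma(x) = \rho_k$ for $k = 3$, $\sigma = 3$, using the definitions of $w_0$ and $v_0$ in eqs. (4.1) and (4.3). It rests on one assumption that is not restated in this section: that $\rho_3 = 1/4$, the value $\rho_k = 1/(2(k-1))$ known for $k$-noncrossing matchings. The function names and the scan-then-bisect strategy are illustrative choices, not the authors' method; the bisection presumes that $\vartheta_\sigma$ crosses $\rho_k$ only once inside the located grid cell.

```python
import math


def w0(x: float, sigma: int) -> float:
    """w_0(x) = x^(2*sigma-2) / (1 - x^2 + x^(2*sigma)), eq. (4.1)."""
    return x ** (2 * sigma - 2) / (1 - x ** 2 + x ** (2 * sigma))


def v0(x: float, sigma: int) -> float:
    """v_0(x) = 1 - x + w_0*x^2 + w_0*x^3 + w_0*x^4, eq. (4.3)."""
    w = w0(x, sigma)
    return 1 - x + w * (x ** 2 + x ** 3 + x ** 4)


def theta(x: float, sigma: int) -> float:
    """theta_sigma(x) = sqrt(w_0(x)) * x / v_0(x), left-hand side of eq. (4.9)."""
    return math.sqrt(w0(x, sigma)) * x / v0(x, sigma)


def minimal_positive_root(rho: float, sigma: int,
                          step: float = 1e-3, tol: float = 1e-12) -> float:
    """Scan (0, 1) upward for the first crossing of theta = rho, then bisect."""
    lo = step
    while lo + step < 1.0:
        hi = lo + step
        if theta(lo, sigma) < rho <= theta(hi, sigma):
            # Refine the bracketing interval [lo, hi] by bisection.
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if theta(mid, sigma) < rho:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        lo = hi
    raise ValueError("no crossing of theta = rho found in (0, 1)")


RHO_3 = 0.25  # assumption: rho_k = 1 / (2 * (k - 1)) evaluated at k = 3
gamma = minimal_positive_root(RHO_3, sigma=3)
print(f"gamma = {gamma:.6f}, exponential growth rate 1/gamma = {1 / gamma:.4f}")
```

Under this assumption the computed growth rate $1/\gamma^{[4]}_{3,3}$ can be compared with the value 2.0348 quoted in the abstract. The same helper functions also allow a quick floating-point check of identity (4.8) at any sample point $0 < x < 1$.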
According to Pringsheim's Theorem [23], $T^{[4]}_{k,\sigma}(x)$ has a dominant positive real singularity which by construction equals $\gamma^{[4]}_{k,\sigma}$, being the minimal positive real solution of eq. (4.9). To prove this, we use that for $3\le k\le 9$ the generating function $F_k(x)$ has only the two dominant singularities $\pm\rho_k$, see Section 2, Tab. 5. Furthermore we verify that for $3\le k\le 9$, $\gamma^{[4]}_{k,\sigma}$ has strictly smaller modulus than all solutions of $\vartheta_\sigma(z) = -\rho_k$, whence Claim 2. Accordingly, Theorem 2 applies and we have

\[
T^{[4]}_{k,\sigma}(n) \sim c_k\, n^{-(k-1)^{2}-\frac{k-1}{2}}\left(\frac{1}{\gamma^{[4]}_{k,\sigma}}\right)^{n}
\tag{4.10}
\]

for some constant $c_k$, completing the proof of Theorem 4. $\square$

Acknowledgments. We are grateful to Hillary S. W. Han, Fenix W. D. Huang, Emma Y. Jin and Linda Y. M. Li for their help. This work was supported by the 973 Project, the PCSIRT Project of the Ministry of Education, the Ministry of Science and Technology, and the National Science Foundation of China.

References

[1] Mapping RNA form and function, Science 2 (2005).
[2] D. André, Solution directe du problème résolu par M. Bertrand, C. R. Acad. Sci. Paris 105 (1887), 436–437.
[3] T. Akutsu, Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots, Discr. Appl. Math. 104 (2000), 45–62.
[4] W.Y.C. Chen, E.Y.P. Deng, R.R.X. Du, R.P. Stanley and C.H. Yan, Crossings and Nestings of Matchings and Partitions, Trans. Amer. Math. Soc. 359 (2007), No. 4, 1555–1575.
[5] W.Y.C. Chen, J. Qin and C.M. Reidys, Crossings and Nestings of tangled-diagrams.
[6] P. Flajolet and R. Sedgewick, Analytic combinatorics, (2007).
[7] I.M. Gessel and D. Zeilberger, Random walk in a Weyl chamber, Proc. Amer. Math. Soc. 115 (1992), 27–31.
[8] D.J. Grabiner and P. Magyar, Random walks in Weyl chambers and the decomposition of tensor powers, Discr. Appl. Math. 2 (1993), 239–260.
[9] H.S.W. Han and C.M. Reidys, Pseudoknot RNA structure with arc-length ≥ 4, J. Comp. Biol., to appear.
[10] F.W.D. Huang, W.W.J. Peng and C.M. Reidys, Folding RNA pseudoknot structures, 2008, in preparation.
[11] E.Y. Jin, J. Qin and C.M. Reidys, Combinatorics of RNA structures with pseudoknots, Bull. Math. Biol. 70 (2008), 45–67, PMID: 17896159.
[12] E.Y. Jin and C.M. Reidys, Asymptotic enumeration of RNA structures with pseudoknots, Bull. Math. Biol. (2007), DOI 10.1007/s11538-007-9265-2.
[13] E.Y. Jin and C.M. Reidys, RNA-LEGO: Combinatorial Design of Pseudoknot RNA, Adv. Appl. Math., to appear.
[14] E.Y. Jin and C.M. Reidys, RNA pseudoknot structures with arc-length ≥ 3 and stack-length ≥ σ, submitted.
[15] E.Y. Jin, C.M. Reidys and R. Wang, Asymptotic analysis k-noncrossing matchings, 2008.
[16] D.A.M. Konings and R.R. Gutell, A comparison of thermodynamic foldings with comparatively derived structures of 16S and 16S-like rRNAs, RNA 1 (1995), 559–574.
[17] A. Loria and T. Pan, Domain structure of the ribozyme from eubacterial ribonuclease P, RNA 2 (1996), 551–563.
[18] R. Lyngso and C. Pedersen, Pseudoknots in RNA secondary structures, Physics of Biological Systems: From Molecules to Species Sci. (1996).
[19] A.M. Odlyzko, Asymptotic enumeration methods, Handbook of combinatorics Vol. 2 (1995), 1021–1231.
[20] N. Parkin, M. Chamorro and H.E. Varmus, An RNA pseudoknot and an optimal heptameric shift site are required for highly efficient ribosomal frameshifting on a retroviral messenger RNA, Proc. Natl. Acad. Sci. USA 89 (1991), 713–717.
[21] E. Rivas and S. Eddy, A Dynamic Programming Algorithm for RNA structure prediction including pseudoknots, J. Mol. Biol. 285 (1999), 2053–2068.
[22] R.P. Stanley, Differentiably finite power series, Europ. J. Combinatorics 1 (1980), 175–188.
[23] E.C. Titchmarsh, The theory of functions, Oxford University Press, London, 1939.
[24] C. Tuerk, S. MacDougal and L. Gold, RNA pseudoknots that inhibit human immunodeficiency virus type 1 reverse transcriptase, Proc. Natl. Acad. Sci. USA 89 (1992), 6988–6992.
[25] Y. Uemura, A. Hasegawa, S. Kobayashi and T. Yokomori, Tree adjoining grammars for RNA structure prediction, Theoret. Comput. Sci. 210 (1999), 277–303.
[26] W. Wasow, Asymptotic expansions for ordinary differential equations, Dover (1987). A reprint of the John Wiley edition (1965).
[27] M.S. Waterman, Combinatorics of RNA hairpins and cloverleafs, Stud. Appl. Math. 60 (1979), 91–96.
[28] M.S. Waterman, Secondary structure of single-stranded nucleic acids, Adv. Math. I (suppl.) 1 (1978), 167–212.
[29] J.A. Howell, T.F. Smith and M.S. Waterman, Computation of generating functions for biological molecules, SIAM J. Appl. Math. 39 (1980), 119–133.
[30] M.S. Waterman and T.F. Smith, Rapid dynamic programming algorithms for RNA secondary structure, Adv. Appl. Math. 7 (1986), 455–464.
[31] D. Zeilberger, A Holonomic systems approach to special functions identities, J. of Computational and Applied Math. 32 (1990), 321–368.

Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, P.R. China, Phone: *86-22-2350-6800, Fax: *86-22-2350-9272
E-mail address: reidys@nankai.edu.cn
