Many computational methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. It is therefore of considerable practical interest to introduce a method that takes both thermodynamic stability and sequence covariation into account. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign} requires negligible additional memory. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the \texttt{rip} package. The supplemental material, source code and input/output files can be freely downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}. \section{Contact} Christian Reidys \texttt{duck@santafe.edu}
arXiv:1003.3987v3 [math-ph] 14 Jul 2010
BIOINFORMATICS
Vol. 00 no. 00
Pages 1–8
RNA-RNA interaction prediction based on multiple sequence alignments
Andrew X. Li 1, Manja Marz 2, Jing Qin 3, Christian M. Reidys 1,4∗
1 Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, P.R. China
2 RNA Bioinformatics Group, Philipps-University Marburg, Marbacher Weg 6, 34037 Marburg, Germany
3 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
4 College of Life Science, Nankai University, Tianjin 300071, P.R. China
Received on *****; revised on *****; accepted on *****
Associate Editor: *****
ABSTRACT
Motivation: Many computational methods for RNA-RNA interaction
structure prediction have been developed. Recently, $O(N^6)$ time
and $O(N^4)$ space dynamic programming algorithms have become
available that compute the partition function of RNA-RNA interaction
complexes. However, few of these methods incorporate knowledge of
related sequences, so relevant evolutionary information is often
neglected in structure determination. It is therefore of considerable
practical interest to introduce a method that takes both thermodynamic
stability and sequence covariation into account.
Results: We present the a priori folding algorithm ripalign,
whose input consists of two (given) multiple sequence alignments
(MSA). ripalign outputs (1) the partition function, (2) base-pairing
probabilities, (3) hybrid probabilities and (4) a set of
Boltzmann-sampled suboptimal structures consisting of canonical joint
structures that are compatible with the alignments. Compared to the
single sequence-pair folding algorithm rip, ripalign requires
negligible additional memory. Furthermore, we incorporate possible
structure constraints as input parameters into our algorithm.
Availability: The algorithm described here is implemented in C as
part of the rip package. The supplemental material, source code and
input/output files can be freely downloaded from
http://www.combinatorics.cn/cbpc/ripalign.html.
Contact: Christian Reidys duck@santafe.edu
Keywords: multiple sequence alignment, RNA-RNA interaction,
joint structure, dynamic programming, partition function, base
pairing probability, hybrid, loop, RNA secondary structure.
∗To whom correspondence should be addressed. Phone: *86-22-2350-6800;
Fax: *86-22-2350-9272; duck@santafe.edu
1 INTRODUCTION
RNA-RNA interactions play a major role at many different
levels of the cellular metabolism such as plasmid replication
control, viral encapsidation, or transcriptional and translational
regulation. With the discovery that a large number of transcripts
in higher eukaryotes are noncoding RNAs, RNA-RNA interactions
in cellular metabolism are gaining in prominence. Typical
examples of interactions involving two RNA molecules are snRNAs
(Forne et al., 1996); snoRNAs with their targets (Bachellerie et al.,
2002); micro-RNAs from the RNAi pathway with their mRNA
target (Ambros, 2004; Murchison and Hannon, 2004); sRNAs from
Escherichia coli (Hershberg et al., 2003; Repoila et al., 2003); and
sRNA loop-loop interactions (Brunel et al., 2003). The common
feature in many ncRNA classes, especially prokaryotic small RNAs,
is the formation of RNA-RNA interaction structures that are much
more complex than the simple sense-antisense interactions.
As is the case for the general RNA folding problem
with unrestricted pseudoknots (Akutsu, 2000), the RNA-RNA
interaction problem (RIP) is NP-complete in its most general
form (Alkan et al., 2006; Mneimneh, 2009). However, polynomial-
time algorithms can be derived by restricting the space of
allowed configurations in ways that are similar to pseudoknot
folding algorithms (Rivas and Eddy, 1999). The simplest approach
concatenates the two interacting sequences and subsequently
employs a slightly modified standard secondary structure folding
algorithm. The algorithms RNAcofold (Hofacker et al., 1994;
Bernhart et al., 2006), pairfold (Andronescu et al., 2005), and
NUPACK (Ren et al., 2005) subscribe to this strategy. A major
shortcoming of this approach is that it cannot predict important
motifs such as kissing-hairpin loops. The paradigm of concatenation
has also been generalized to the pseudoknot folding algorithm of
Rivas and Eddy (1999). The resulting model, however, still does not
generate all relevant interaction structures (Chitsaz et al., 2009b).
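To make the limitation of plain concatenation concrete, the following minimal Python sketch maps a joint structure onto the concatenated sequence and tests whether the resulting arcs cross. The function names (`concatenate_pairs`, `has_crossing`), the 0-based indexing and the toy structures are illustrative assumptions and are not taken from any of the cited tools. The underlying fact is that a standard nested secondary structure recursion only produces non-crossing arc sets, so a joint structure that maps to crossing arcs lies outside its search space.

```python
from itertools import combinations


def concatenate_pairs(intra_r, intra_s, inter, len_r):
    """Map a joint structure onto the concatenation R+S (0-based indices).

    intra_r, intra_s: base pairs within R and within S.
    inter: pairs (i, j) with i a position in R and j a position in S.
    len_r: length of R, used as the index offset for positions in S.
    """
    arcs = [tuple(sorted(p)) for p in intra_r]
    arcs += [tuple(sorted((i + len_r, j + len_r))) for i, j in intra_s]
    arcs += [(i, j + len_r) for i, j in inter]
    return arcs


def has_crossing(arcs):
    """Return True if any two arcs (a, b) and (c, d) interleave."""
    for (a, b), (c, d) in combinations(arcs, 2):
        if a < c < b < d or c < a < d < b:
            return True
    return False


# Toy example (hypothetical): two hairpins of length 12 whose loops pair.
kissing = concatenate_pairs(
    intra_r=[(0, 11), (1, 10)],
    intra_s=[(0, 11), (1, 10)],
    inter=[(5, 6), (6, 5)],
    len_r=12,
)
# Plain duplex without internal pairs, for comparison.
duplex = concatenate_pairs([], [], [(0, 3), (1, 2), (2, 1), (3, 0)], len_r=4)

print(has_crossing(kissing))
print(has_crossing(duplex))
```

Under these assumptions the kissing-hairpin toy structure yields crossing arcs after concatenation, whereas the plain duplex stays nested, which illustrates why a nested folding recursion applied to the concatenated sequence can recover the latter but not the former.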
An alternative line of thought is to neglect all internal base-pairings
in either strand and to compute the minimum free energy (MFE)
secondary structure for their hybridization under this constraint. For
instance, RNAduplex and RNAhybrid (Rehmsmeier et al., 2004)
follow this line of thought. RNAup (Mückstein et al., 2006, 2008)
and intaRNA (Busch et al., 2008) restrict interactions to a single
interval that remains unpaired in the secondary structure for each
partner. These models have proved particularly useful for bacterial
sRNA/mRNA interactions (Geissmann
…(Full text truncated)…