A new distance for high level RNA secondary structure comparison

📝 Original Info

  • Title: A new distance for high level RNA secondary structure comparison
  • ArXiv ID: 0810.4002
  • Date: 2008-10-23
  • Authors: Julien Allali and Marie-France Sagot

📝 Abstract

We describe an algorithm for comparing two RNA secondary structures coded in the form of trees that introduces two new operations, called node fusion and edge fusion, besides the tree edit operations of deletion, insertion, and relabeling classically used in the literature. This allows us to address some serious limitations of the more traditional tree edit operations when the trees represent RNAs and what is searched for is a common structural core of two RNAs. Although the algorithm complexity has an exponential term, this term depends only on the number of successive fusions that may be applied to a same node, not on the total number of fusions. The algorithm remains therefore efficient in practice and is used for illustrative purposes on ribosomal as well as on other types of RNAs.

📄 Full Content

A New Distance for High Level RNA Secondary Structure Comparison

Julien Allali and Marie-France Sagot

Abstract: We describe an algorithm for comparing two RNA secondary structures coded in the form of trees that introduces two new operations, called node fusion and edge fusion, besides the tree edit operations of deletion, insertion, and relabeling classically used in the literature. This allows us to address some serious limitations of the more traditional tree edit operations when the trees represent RNAs and what is searched for is a common structural core of two RNAs. Although the algorithm complexity has an exponential term, this term depends only on the number of successive fusions that may be applied to a same node, not on the total number of fusions. The algorithm remains therefore efficient in practice and is used for illustrative purposes on ribosomal as well as on other types of RNAs.

Index Terms: Tree comparison, edit operation, distance, RNA, secondary structure.

1 INTRODUCTION

RNAs are one of the fundamental elements of a cell. Their role in regulation has recently been shown to be far more prominent than initially believed (see the 20 December 2002 issue of Science, which designated small RNAs with regulatory function as the scientific breakthrough of the year). It is now known, for instance, that there is massive transcription of noncoding RNAs. Yet current mathematical and computer tools remain mostly inadequate to identify, analyze, and compare RNAs.

An RNA may be seen as a string over the alphabet of nucleotides (also called bases), {A, C, G, U}. Inside a cell, RNAs do not retain a linear form, but instead fold in space. The fold is given by the set of nucleotide bases that pair. The main type of pairing, called canonical, corresponds to bonds of the type A-U and G-C. Other, rarer types of bonds may be observed, the most frequent of which is G-U, also called the wobble pair. Fig. 1 shows the sequence of a folded RNA.
Each box represents a consecutive sequence of bonded pairs, corresponding to a helix in 3D space. The secondary structure of an RNA is the set of helices (or the list of paired bases) making up the RNA. Pseudoknots, which may be described as a pair of interleaved helices, are in general excluded from the secondary structure of an RNA. RNA secondary structures can thus be represented as planar graphs. An RNA primary structure is its sequence of nucleotides, while its tertiary structure corresponds to the geometric form the RNA adopts in space.

Apart from helices, the other main structural elements in an RNA are:

1. hairpin loops, which are sequences of unpaired bases closing a helix;
2. internal loops, which are sequences of unpaired bases linking two different helices;
3. bulges, which are internal loops with unpaired bases on one side only of a helix;
4. multiloops, which are unpaired bases linking at least three helices.

Stems are successions of one or more among helices, internal loops, and/or bulges.

The comparison of RNA secondary structures is one of the main basic computational problems raised by the study of RNAs. It is the problem we address in this paper. The motivations are many. RNA structure comparison has been used in at least one approach to RNA structure prediction that takes as initial data a set of unaligned sequences supposed to have a common structural core [1]. For each sequence, a set of structural predictions is made (for instance, all suboptimal structures predicted by an algorithm like Zuker's MFOLD [15], or all suboptimal sets of compatible helices or stems). The common structure is then found by comparing all the structures obtained from the initial set of sequences, and identifying a substructure common to all, or to some, of the sequences.
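
The definitions above (canonical and wobble pairs, helices as runs of consecutive bonded pairs) can be made concrete with a small sketch. It assumes the structure is given in dot-bracket notation, where `(` and `)` mark paired positions and `.` marks an unpaired base; this notation is not used in the excerpt and is chosen here only for illustration. The function names `pair_kind`, `parse_pairs`, and `helices` are hypothetical.

```python
from typing import List, Tuple

# Pair types as described in the text: A-U and G-C are canonical, G-U is the wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_kind(x: str, y: str) -> str:
    """Classify a bond between two bases as canonical, wobble, or other."""
    if (x, y) in CANONICAL:
        return "canonical"
    if (x, y) in WOBBLE:
        return "wobble"
    return "other"


def parse_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the paired positions (i, j), i < j, of a pseudoknot-free
    structure written in dot-bracket notation (an assumed input format)."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return sorted(pairs)


def helices(pairs: List[Tuple[int, int]]) -> List[List[Tuple[int, int]]]:
    """Group stacked pairs (i, j), (i+1, j-1), ... into helices.
    Expects pairs sorted by opening position, as returned by parse_pairs."""
    result: List[List[Tuple[int, int]]] = []
    for i, j in pairs:
        if result and result[-1][-1] == (i - 1, j + 1):
            result[-1].append((i, j))  # directly stacked on the previous pair
        else:
            result.append([(i, j)])    # an unpaired base or a branch starts a new helix
    return result


if __name__ == "__main__":
    sequence = "GGGAAAUCC"
    structure = "(((...)))"
    for i, j in parse_pairs(structure):
        print(i, j, pair_kind(sequence[i], sequence[j]))
    # A single unpaired base on one side (a bulge) splits the stem into two helices.
    print(len(helices(parse_pairs("((.(...)))"))))
```

Under this encoding, a stem in the sense of the text corresponds to several such helices separated only by bulges or internal loops.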
RNA structure comparison is also an essential element in the discovery of RNA structural motifs, or profiles, or of more general models that may then be used to search for other RNAs of the same type in newly sequenced genomes. For instance, general models for tRNAs and introns of group I have been derived by hand [3], [10]. It is an open question whether models at least as accurate as these, or perhaps even more accurate, could have been derived in an automatic way. The identification of smaller structural motifs is an equally important topic that requires comparing structures.

As we saw, the comparison of RNA structures may concern known RNA structures (that is, structures that were experimentally determined) or predicted structures. The objective in both cases is the same: to find the common parts of such structures.

In [11], Shapiro suggested modeling RNA secondary structures without pseudoknots mathematically by means of trees. The trees are rooted and ordered, which means that the order among the children of a node matters. This order corresponds to the 5'-3' orientation of an RNA sequence.

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 2, NO. 1, JANUARY-MARCH 2005

J. Allali is w
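
As a rough illustration of the rooted, ordered tree model mentioned above, the following sketch builds a tree from a pseudoknot-free structure. It is a deliberately simplified encoding, not Shapiro's encoding and not the one used by the paper's algorithm: each base pair becomes an internal node, each unpaired base becomes a leaf, and children are stored in 5'-3' order. The dot-bracket input format and the names `Node`, `structure_to_tree`, and `to_string` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a rooted, ordered tree; the order of children is significant."""
    label: str
    children: List["Node"] = field(default_factory=list)


def structure_to_tree(structure: str) -> Node:
    """Build a tree from a dot-bracket string (assumed input format).
    '(' opens a 'pair' node, ')' closes it, '.' adds a 'base' leaf."""
    root = Node("root")
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = Node("pair")
            stack[-1].children.append(node)  # appended left to right, i.e. 5' to 3'
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("base"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: unclosed '('")
    return root


def to_string(node: Node) -> str:
    """Serialize a tree so that two trees can be compared by eye."""
    if not node.children:
        return node.label
    inner = " ".join(to_string(child) for child in node.children)
    return f"{node.label}({inner})"


if __name__ == "__main__":
    # The same elements in a different 5'-3' order give different ordered trees.
    print(to_string(structure_to_tree("(.).")))
    print(to_string(structure_to_tree(".(.)")))
```

Because the trees are ordered, swapping two sibling subtrees yields a different tree, which is why the two structures in the usage example are not considered identical.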

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
