A stitch in time: Efficient computation of genomic DNA melting bubbles


📝 Original Info

  • Title: A stitch in time: Efficient computation of genomic DNA melting bubbles
  • ArXiv ID: 0802.1057
  • Date: 2008-07-19
  • Authors: Eivind Tøstesen

📝 Abstract

Background: It is of biological interest to make genome-wide predictions of the locations of DNA melting bubbles using statistical mechanics models. Computationally, this poses the challenge that a generic search through all combinations of bubble starts and ends is quadratic. Results: An efficient algorithm is described, which shows that the time complexity of the task is O(N log N) rather than quadratic. The algorithm exploits the fact that bubble lengths may be limited, but without a prior assumption of a maximal bubble length. No approximations, such as windowing, have been introduced to reduce the time complexity. More than just finding the bubbles, the algorithm produces a stitch profile, which is a probabilistic graphical model of bubbles and helical regions. The algorithm applies a probability peak finding method based on a hierarchical analysis of the energy barriers in the Poland-Scheraga model. Conclusions: Exact and fast computation of genomic stitch profiles is thus feasible. Sequences of several megabases have been computed, limited only by computer memory. Possible applications are genome-wide comparisons of bubbles with promoters, transcription start sites (TSS), viral integration sites, and other melting-related regions.


📄 Full Content

Eivind Tøstesen∗
Department of Tumor Biology, Norwegian Radium Hospital, N-0310 Oslo, Norway, and Department of Mathematics, University of Oslo, N-0316 Oslo, Norway
∗Email: eivindto@math.uio.no
(Dated: November 10, 2021)

PACS numbers: 87.14.Gg, 87.15.Ya, 05.70.Fh, 02.70.Rr

I. BACKGROUND

Models of DNA melting make it possible to compute which regions are single-stranded (ss) and which are double-stranded (ds). Based on statistical mechanics, such model predictions are probabilistic by nature. Bubbles, or single-stranded regions, play an essential role in fundamental biological processes, such as transcription, replication, viral integration, repair, recombination, and in determining chromatin structure [1, 2]. It is therefore interesting to apply DNA melting models to genomic DNA sequences, although the models available so far are limited to in vitro knowledge.

Genomic applications began around 1980 [3, 4], and have been gaining momentum over the years with the increasing availability of sequences, faster computers, and model development. It has been found that predicted ds/ss boundaries are often located at or very close to exon-intron junctions, the correspondence being stronger in some genomes than others [5, 6, 7, 8], which suggested a gene finding method [9]. In the same vein, comparisons of actin cDNA melting maps in animals, plants, and fungi suggested that intron insertion could have targeted the sites of such melting fork junctions in ancient genes [10, 11]. In other studies, bubbles in promoter regions were computed to test the hypothesis that the stability of the double helix contributes to transcriptional regulation [12, 13, 14, 15, 16, 17]. Bubbles induced by superhelicity have also been found to correlate with replication origins as well as promoters [18, 19, 20, 21]. In addition to the testing of specific hypotheses, a strategy has been to provide whole genomes with annotations of their melting properties [22, 23]. Combined with all other existing annotations, such melting data allow exploratory data mining and possibly the formation of new hypotheses [24]. For example, the human genomic melting map was made available, compared to a wide range of other annotations, and was shown to provide more information than the local GC content [23].

In the genomic studies, various melting features have proved to be of particular interest. These include the bubbles and helical regions, bubble nucleation sites, cooperative melting domains, melting fork junctions, breathers, sites of high or low stability, and SIDD sites. Most often we want to know their locations, but additional information is sometimes useful, such as probabilities, dynamics, stabilities, and context. DNA melting models based on statistical mechanics are powerful tools for calculating such properties, especially those models that can be solved by dynamic programming in polynomial time. For many features of interest, however, algorithms remain to be developed to make such predictions.

The existing melting algorithms typically produce melting profiles of some numerical quantity for each sequence position. The prototypical example is Poland's probability profile [25], but profiles of melting temperatures (melting maps), free energies, or other quantities are also computed per basepair. The result can be plotted as a curve, while the wanted features often have the format of regions, junctions, and other sites. Some genomics data mining tools also require data in these formats rather than curves. As a remedy, melting profiles have been subjected to ad hoc post-processing methods to extract the wanted features, such as segmentation algorithms [23],

…(Full text truncated)…
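
The background above notes that per-basepair melting profiles are often post-processed ad hoc to turn a curve into regions. As a rough illustration only, the Python sketch below thresholds a made-up probability profile into candidate open regions. It is not the stitch profile algorithm of the paper, and it is not the segmentation method of [23]. The function name `threshold_regions`, the 0.5 cutoff, and the sample values are all assumptions chosen for the example.

```python
from typing import List, Tuple


def threshold_regions(profile: List[float], cutoff: float = 0.5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile[i] > cutoff.

    This is a naive, hypothetical post-processing step: it ignores the
    statistical weights of whole bubbles and only looks at one number
    per sequence position.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # index where the current above-cutoff run began, if any
    for i, p in enumerate(profile):
        if p > cutoff and start is None:
            start = i  # a run of open positions begins here
        elif p <= cutoff and start is not None:
            regions.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        # The profile ended while still above the cutoff.
        regions.append((start, len(profile)))
    return regions


# Made-up per-position open-state probabilities, for illustration only.
example_profile = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.95]
print(threshold_regions(example_profile))  # [(2, 5), (7, 9)]
```

A cutoff of this kind discards exactly the information that the text lists as useful (probabilities of whole regions, stabilities, context), which may help explain why the author describes such post-processing as a remedy and not a solution.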

Reference

This content is AI-processed based on ArXiv data.
