Hidden breakpoints in genome alignments

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

During the course of evolution, an organism’s genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these “hidden” breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation, including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on the relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments.


💡 Research Summary

The paper addresses a subtle but important problem in comparative genomics: “hidden” breakpoints, which cannot be detected by pairwise genome comparison and become apparent only when three or more genomes are compared. Such hidden breakpoints arise naturally when gene gain and loss occur together with large‑scale rearrangements (inversions, translocations, fusions, fissions). The authors first formalize the notion of hidden breakpoints and prove that an arbitrarily large number of them can be generated even when every pair of genomes appears rearrangement‑free.

To quantify hidden breakpoints, they extend the classic multichromosomal breakpoint median problem to accommodate genomes with unequal gene content. For three genomes A, B, and C they define a common gene set, represent each genome as a permutation of that set, and compute the usual pairwise breakpoint distances d_AB, d_BC, d_CA. They then introduce a median genome M that minimizes the sum of breakpoint distances to the three inputs. By comparing the total pairwise distance with the sum of distances to the median, they derive a lower bound on the number of hidden breakpoints:

 HiddenBreakpoint ≥ ½ ·
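
The pairwise and median steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: each genome is a single signed linear chromosome, telomeres are ignored, every gene occurs once, and the median is found by exhaustive search, which is only feasible for a handful of genes. The function names (`adjacencies`, `restrict`, `breakpoint_distance`, `brute_force_median`) and the three example genomes are hypothetical, and the sketch stops at the median score rather than computing the hidden breakpoint bound.

```python
from itertools import permutations, product


def adjacencies(genome):
    """Adjacency set of one signed linear chromosome (simplifying assumption).

    Gene g has a tail (g, 't') and a head (g, 'h'); a negative sign means
    the gene is read in reverse, so its head comes first.
    """
    ends = []
    for g in genome:
        name = abs(g)
        if g > 0:
            ends.append(((name, 't'), (name, 'h')))
        else:
            ends.append(((name, 'h'), (name, 't')))
    # An adjacency joins the right end of one gene to the left end of the next.
    return {frozenset((ends[i][1], ends[i + 1][0]))
            for i in range(len(ends) - 1)}


def restrict(genome, genes):
    """Keep only the genes in `genes`, preserving order and orientation."""
    return [g for g in genome if abs(g) in genes]


def breakpoint_distance(g1, g2):
    """Adjacencies of g1 missing from g2, after restricting both to shared genes."""
    common = {abs(g) for g in g1} & {abs(g) for g in g2}
    a1 = adjacencies(restrict(g1, common))
    a2 = adjacencies(restrict(g2, common))
    return len(a1 - a2)


def brute_force_median(genomes):
    """Exhaustive search for a median over the common gene set (tiny inputs only)."""
    common = set.intersection(*({abs(g) for g in genome} for genome in genomes))
    restricted = [restrict(genome, common) for genome in genomes]
    best, best_score = None, None
    for order in permutations(sorted(common)):
        for signs in product((1, -1), repeat=len(order)):
            candidate = [s * g for s, g in zip(signs, order)]
            score = sum(breakpoint_distance(candidate, r) for r in restricted)
            if best_score is None or score < best_score:
                best, best_score = candidate, score
    return best, best_score


# Hypothetical genomes: genes 1-4 are shared, genes 5, 6, 7 are lineage-specific.
A = [1, 2, 3, 4, 5]
B = [1, -3, -2, 4, 6]   # inversion of genes 2..3 relative to A
C = [1, 2, 4, 3, 7]     # genes 3 and 4 swapped relative to A

d_AB = breakpoint_distance(A, B)
d_BC = breakpoint_distance(B, C)
d_CA = breakpoint_distance(C, A)
median, median_score = brute_force_median([A, B, C])

print("pairwise distances:", d_AB, d_BC, d_CA)
print("median:", median, "score:", median_score)
```

Restricting to the common gene set before counting adjacencies mirrors the description above, where each genome is treated as a permutation of the shared genes. The exhaustive search is used here only because it is easy to verify by hand; the paper's own algorithm and its time-complexity improvements are not reproduced.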

