Global transposable characteristics in the yeast complete DNA sequence
📝 Abstract
Global transposable characteristics in the complete DNA sequence of the yeast Saccharomyces cerevisiae are determined by using the metric representation and recurrence plot methods. In terms of the correlation distance of nucleotide strings, the 16 chromosome sequences of the yeast, which are divided into 5 groups, display 4 kinds of fundamental transposable characteristics: short-period increasing, long quasi-period increasing, long major value, and hardly relevant.
📄 Content
arXiv:1112.2771v1 [q-bio.GN] 13 Dec 2011

Zuo-Bing Wu∗
State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
April 14, 2021

Keywords: Yeast, DNA sequences, Coherence structure, Metric representation, Recurrence plot

∗Correspondence to: wuzb@lnm.imech.ac.cn

1 Introduction

Complete DNA sequences of many organisms have recently become available for systematic searches of genome structure. Given the large amount of DNA sequence data, developing methods for extracting meaningful information is a major challenge for bioinformatics. To understand the one-dimensional symbolic sequences composed of the four letters ‘A’, ‘C’, ‘G’ and ‘T’ (or ‘U’), a number of statistical and geometrical methods have been developed[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. In particular, chaos game representation (CGR)[12], which generates a two-dimensional square from a one-dimensional sequence, provides a technique for visualizing the composition of DNA sequences. The characteristics of CGR images were described as a genomic signature, and the classification of species across whole bacterial genomes was analyzed by defining a Euclidean metric between two CGR images[13]. Based on the genomic signature, a distance between two DNA sequences that depends on the length of nucleotide strings was presented[14], and horizontal transfers in prokaryotes and eukaryotes were detected and characterized[15, 16].
Recently, a one-to-one metric representation of DNA sequences[17], borrowed from symbolic dynamics, was introduced to order subsequences in a plane. Suppression of certain nucleotide strings in a DNA sequence leads to a self-similar pattern in its metric representation. Self-similarity limits of genomic signatures were determined as the optimal string length for generating the genomic signatures[18]. Moreover, by using the metric representation method, the recurrence plot technique for DNA sequences was established and employed to analyze the correlation structure of nucleotide strings[19].

As a eukaryotic organism, yeast is one of the premier industrial microorganisms because of its essential role in brewing, baking, and fuel alcohol production. In addition, yeast has proven to be an excellent model organism for the study of a variety of biological problems in genetics, molecular biology, cell biology and other disciplines within the biomedical and life sciences. In April 1996, the complete DNA sequence of the yeast (Saccharomyces cerevisiae) genome, consisting of 16 chromosomes with 12 million base pairs, was released, providing a resource of genome information for a single organism. However, only 43.3% of all 6000 predicted genes in Saccharomyces cerevisiae had been functionally characterized when the complete sequence of the yeast genome became available[20]. Moreover, it was found that DNA transposable elements have the ability to move from place to place and make many copies within the genome via transposition[21, 22]. Therefore, the complete DNA sequence of yeast remains a topic to be studied with respect to its genome architecture across the whole sequence.

In this paper, using the metric representation and recurrence plot methods, we analyze global transposable characteristics in the complete DNA sequence of yeast, i.e., its 16 chromosome sequences.
2 Metric representation and recurrence plot methods

For a given DNA sequence $s_1 s_2 \cdots s_i \cdots s_N$ ($s_i \in \{A, C, G, T\}$), a plane metric representation is generated by mapping each symbol $s_i$ to the numbers $\mu_i, \nu_i \in \{0, 1\}$ and calculating the values $(\alpha, \beta)$ of all subsequences $\Sigma_k = s_1 s_2 \cdots s_k$ ($1 \le k \le N$), defined as follows:

$$
\begin{aligned}
\alpha &= 2\sum_{j=1}^{k} \mu_{k-j+1}\, 3^{-j} + 3^{-k} = 2\sum_{i=1}^{k} \mu_i\, 3^{-(k-i+1)} + 3^{-k}, \\
\beta &= 2\sum_{j=1}^{k} \nu_{k-j+1}\, 3^{-j} + 3^{-k} = 2\sum_{i=1}^{k} \nu_i\, 3^{-(k-i+1)} + 3^{-k},
\end{aligned}
\tag{1}
$$

where $\mu_i$ is 0 if $s_i \in \{A, C\}$ or 1 if $s_i \in \{G, T\}$, and $\nu_i$ is 0 if $s_i \in \{A, T\}$ or 1 if $s_i \in \{C, G\}$. Thus, the one-dimensional symbolic sequence is partitioned into $N$ subsequences $\Sigma_k$ and mapped into the two-dimensional plane $(\alpha, \beta)$. Subsequences with the same ending $l$-nucleotide string, which are labeled by $\Sigma^{l}$, correspond to points in the zone encoded by that $l$-nucleotide string. Taking a subsequence $\Sigma_i \in \Sigma^{l}$, we calculate

$$
\Theta(\epsilon_l - |\Sigma_i - \Sigma_j|) = \Theta\!\left(\epsilon_l - \sqrt{(\alpha_i - \alpha_j)^2 + (\beta_i - \beta_j)^2}\right),
\tag{2}
$$

where $\Theta$ is the Heaviside function [$\Theta(x) = 1$ if $x > 0$; $\Theta(x) = 0$ if $x \le 0$] and $\Sigma_j$ is a subsequence ($j \ge l$). When Θ(ε_l − |Σ_i −
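As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the points $(\alpha_k, \beta_k)$ for all prefixes $\Sigma_k$ and lists the indices $j$ for which the Heaviside term equals 1. The function names `metric_representation` and `recurrences`, the use of double-precision floats, and the threshold passed as `eps` are assumptions made for this example and are not taken from the paper; in particular, the value of $\epsilon_l$ is left to the caller. The update $S_k = (S_{k-1} + 2\mu_k)/3$ used in the code follows directly from the second form of the sum in Eq. (1).

```python
from math import hypot

# Symbol-to-number maps from the definitions below Eq. (1):
# mu is 0 for A/C and 1 for G/T; nu is 0 for A/T and 1 for C/G.
MU = {"A": 0, "C": 0, "G": 1, "T": 1}
NU = {"A": 0, "T": 0, "C": 1, "G": 1}


def metric_representation(seq):
    """Return [(alpha_k, beta_k)] for the prefixes Sigma_k, k = 1..N (Eq. (1))."""
    points = []
    a = b = 0.0   # running sums 2 * sum_i mu_i 3^{-(k-i+1)} (and likewise for nu)
    pow3 = 1.0    # 3^{-k}
    for s in seq:
        a = (a + 2 * MU[s]) / 3.0
        b = (b + 2 * NU[s]) / 3.0
        pow3 /= 3.0
        points.append((a + pow3, b + pow3))
    return points


def recurrences(points, i, eps, l=1):
    """1-based indices j >= l with Theta(eps - |Sigma_i - Sigma_j|) = 1 (Eq. (2)).

    Theta(x) = 1 only for x > 0, so the distance must be strictly below eps.
    """
    ai, bi = points[i - 1]
    return [
        j
        for j in range(l, len(points) + 1)
        if hypot(ai - points[j - 1][0], bi - points[j - 1][1]) < eps
    ]


if __name__ == "__main__":
    pts = metric_representation("ACGTACGT")
    # Prefixes 4 and 8 both end in "ACGT", so their points lie close together.
    print(recurrences(pts, 8, eps=0.02))
```

Floats resolve only about 33 ternary digits, so this sketch distinguishes subsequences by roughly their last 30 nucleotides; exact arithmetic (for example `fractions.Fraction`) would be needed for anything finer, at a substantial cost in speed for chromosome-length input.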