A run is a maximal occurrence of a repetition $v$ with a period $p$ such that $2p \le |v|$. The maximal number of runs in a string of length $n$ was studied by several authors and it is known to be between $0.944 n$ and $1.029 n$. We investigate highly periodic runs, in which the shortest period $p$ satisfies $3p \le |v|$. We show the upper bound $0.5n$ on the maximal number of such runs in a string of length $n$ and construct a sequence of words for which we obtain the lower bound $0.406 n$.
Repetitions and periodicities in strings are among the fundamental topics in combinatorics on words [2,13]. They are also important in other areas: lossless compression, word representation, computational biology, etc. Repetitions are studied from several directions: classification of words not containing repetitions of a given exponent, efficient identification of factors that are repetitions of different types and, finally, computing bounds on the number of repetitions of a given exponent that a string may contain, which is the problem we consider in this paper. Both the known results on the topic and a deeper description of the motivation can be found in the survey by Crochemore et al. [5].
The concept of runs (also called maximal repetitions) was introduced to represent all repetitions in a string in a succinct manner. The crucial property of runs is that their maximal number in a string of length n (denoted as runs(n)) is O(n) [10]. Due to the work of many people, much better bounds on runs(n) have been obtained. The lower bound 0.927n was first proved in [8]. It was then improved by Kusano et al. [12] to 0.944n using computer experiments, and very recently by Simpson [18] to 0.944575712n. On the other hand, the first explicit upper bound 5n was established in [15]; it was then systematically improved to 3.44n [17], 1.6n [3,4] and 1.52n [9]. The best known result runs(n) ≤ 1.029n is due to Crochemore et al. [6], but it is conjectured [10] that runs(n) < n. The maximal number of runs was also studied for special types of strings, and tight bounds were established for Fibonacci strings [10,16] and, more generally, Sturmian strings [1].
The combinatorial analysis of runs in strings is strongly related to the problem of estimating the maximal number of occurrences of squares in a string. For squares, the gap between the upper and lower bounds is much larger than for runs [5,7]. However, a recent paper [11] by some of the authors shows that introducing exponents larger than 2 can lead to tighter bounds on the number of corresponding occurrences.
In this paper we introduce and study the concept of highly periodic runs (hp-runs), in which the run is at least three times longer than its shortest period. We show the following bounds on the number hp-runs(n) of such runs in a string of length n:

0.406n ≤ hp-runs(n) ≤ 0.5n.
The upper bound is obtained by analyzing prime words (i.e. words that are primitive and minimal or maximal in the class of their cyclic equivalents) that appear as periods of hp-runs. As for the lower bound, we give a simple argument that leads to a 0.4n bound and then describe a family of words that improves this bound to 0.406n.
We consider words over a finite alphabet $A$, $u \in A^*$; by $\varepsilon$ we denote the empty word; the positions in a word $u$ are numbered from 1 to $|u|$. By $\mathrm{Alph}(u)$ we denote the set of all letters of $u$. For $u = u_1 u_2 \ldots u_m$, by $u[i..j]$ we denote the factor of $u$ equal to $u_i \ldots u_j$ (in particular $u[i] = u[i..i]$). Words $u[1..i]$ are called prefixes of $u$, and words $u[i..m]$ suffixes of $u$. We say that a positive integer $p$ is the (shortest) period of a word $u = u_1 \ldots u_m$ (notation: $p = \mathrm{per}(u)$) if $p$ is the smallest positive integer such that $u_i = u_{i+p}$ holds for all $1 \le i \le m - p$.
If $w^k = u$ ($k$ is a non-negative integer) then we say that $u$ is the $k$-th power of the word $w$. A square is the 2nd power of some word. The primitive root of a word $u$, denoted $\mathrm{root}(u)$, is the shortest word $w$ such that $w^k = u$ for some positive $k$. We call a word $u$ primitive if $\mathrm{root}(u) = u$; otherwise it is called nonprimitive. We say that words $u$ and $v$ are cyclically equivalent (or that one of them is a cyclic rotation of the other) if $u = xy$ and $v = yx$ for some $x, y \in A^*$. It is a simple observation that if $u$ and $v$ are cyclically equivalent then $|\mathrm{root}(u)| = |\mathrm{root}(v)|$ (the roots themselves are cyclically equivalent but need not be equal: $\mathrm{root}(abab) = ab$ while $\mathrm{root}(baba) = ba$).
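The notions of shortest period, primitive root and cyclic equivalence can be checked directly on small examples. The following is a minimal brute-force sketch, not an efficient algorithm; the function names `shortest_period`, `primitive_root` and `cyclically_equivalent` are hypothetical and chosen only for illustration, and positions inside the code are 0-based Python indices rather than the 1-based positions used in the text.

```python
def shortest_period(u: str) -> int:
    """Smallest p >= 1 such that u[i] == u[i + p] for every valid i.

    Returns 0 for the empty word (an assumption of this sketch).
    """
    m = len(u)
    for p in range(1, m + 1):
        if all(u[i] == u[i + p] for i in range(m - p)):
            return p
    return 0


def primitive_root(u: str) -> str:
    """Shortest w with w^k == u for some positive k."""
    p = shortest_period(u)
    # u is a proper power exactly when its shortest period divides |u|.
    if p > 0 and len(u) % p == 0:
        return u[:p]
    return u


def cyclically_equivalent(u: str, v: str) -> bool:
    """True if u = xy and v = yx for some words x, y."""
    return len(u) == len(v) and v in u + u


if __name__ == "__main__":
    print(shortest_period("abaab"))               # period of a non-power
    print(primitive_root("abab"), primitive_root("baba"))
    print(cyclically_equivalent("abab", "baba"))
```

Running it on `abab` and `baba` shows two cyclically equivalent words whose primitive roots have the same length.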
Let us assume that $A$ is totally ordered by $\le$, which induces a lexicographical order on $A^*$, also denoted by $\le$. We say that $u \in A^*$ is a prime word if it is primitive and minimal or maximal in the class of words that are cyclically equivalent to it. It can be proved [13] that a prime word $u$ cannot have a proper (i.e. non-empty and different from $u$) prefix that is also its suffix.
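As an illustration of the definition of prime words and of the border-free property quoted from [13], here is a small brute-force sketch. It assumes that the order on the alphabet is the usual character order used by Python string comparison; the helper names (`rotations`, `is_primitive`, `is_prime_word`, `has_border`) are hypothetical.

```python
def rotations(u: str) -> list[str]:
    """All cyclic rotations of u."""
    return [u[i:] + u[:i] for i in range(len(u))]


def is_primitive(u: str) -> bool:
    """A non-empty word is primitive iff it does not occur
    strictly inside its own square."""
    return len(u) > 0 and u not in (u + u)[1:-1]


def is_prime_word(u: str) -> bool:
    """Primitive and lexicographically minimal or maximal
    among its cyclic rotations."""
    if not is_primitive(u):
        return False
    rots = rotations(u)
    return u == min(rots) or u == max(rots)


def has_border(u: str) -> bool:
    """True if some proper non-empty prefix of u is also a suffix of u."""
    return any(u[:k] == u[-k:] for k in range(1, len(u)))


if __name__ == "__main__":
    for w in ["aab", "aba", "baa", "abab"]:
        print(w, is_prime_word(w), has_border(w))
```

In this sketch `aab` and `baa` are prime (the minimal and maximal rotations of the same class) and have no border, while `aba` is primitive but not prime and has the border `a`.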
A run (also called a maximal repetition) in a string $u$ is an interval $[i..j]$ such that the associated factor $u[i..j]$ has period $p$, $2p \le j - i + 1$, and the property cannot be extended to the right or to the left: $u[i-1] \ne u[i+p-1]$ and $u[j-p+1] \ne u[j+1]$ when the letters are defined. A highly periodic run (hp-run) is a run $[i..j]$ for which the shortest period $p$ satisfies $3p \le j - i + 1$. For simplicity, in what follows we sometimes refer to runs or hp-runs as occurrences of the corresponding factors of $u$.
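The definitions of runs and hp-runs can be explored on short strings with a direct enumeration. The sketch below is a naive polynomial-time procedure written only for illustration, not one of the linear-time algorithms from the literature. It assumes that each run is reported once, together with the shortest period of its factor; the names `runs` and `hp_runs` are hypothetical, and the reported triples `(i, j, p)` use the 1-based positions of the text.

```python
def shortest_period(v: str) -> int:
    """Smallest p >= 1 such that v[k] == v[k + p] for every valid k."""
    m = len(v)
    return next(p for p in range(1, m + 1)
                if all(v[k] == v[k + p] for k in range(m - p)))


def runs(u: str) -> list[tuple[int, int, int]]:
    """All runs of u as triples (i, j, p), 1-based, p the shortest period."""
    n = len(u)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if u[k] != u[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions k with u[k] == u[k + p].
            while k + p < n and u[k] == u[k + p]:
                k += 1
            end = k - 1 + p  # last 0-based index of the p-periodic factor
            factor = u[start:end + 1]
            # Keep it only if it is long enough and p is its shortest
            # period, so each interval is reported exactly once.
            if len(factor) >= 2 * p and shortest_period(factor) == p:
                found.append((start + 1, end + 1, p))
    return found


def hp_runs(u: str) -> list[tuple[int, int, int]]:
    """Runs whose length is at least three times the shortest period."""
    return [(i, j, p) for (i, j, p) in runs(u) if 3 * p <= j - i + 1]


if __name__ == "__main__":
    w = "aabaabaab"
    print(runs(w))
    print(hp_runs(w))
```

For `aabaabaab` the sketch reports three short runs `aa` with period 1 and one run covering the whole string with period 3; only the latter is an hp-run.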
Let $u \in A^*$ be a word of length $n$. By $P = \{p_1, p_2, \ldots, p_{n-1}\}$ we denote the set of inter-positions of $u$ that are located between pairs of consecutive letters of $u$.
We define a function $F$ that assigns to each hp-run $v$ in a string the set of handles among all inter-positions within $v$. Hence, $F$ is a mapping from the set of hp-runs occurring in $u$ to the set $2^P$ of subsets of $P$. Let v be a hp-r
…(Full text truncated)…