On z-factorization and c-factorization of standard episturmian words
📝 Original Info
- Title: On z-factorization and c-factorization of standard episturmian words
- ArXiv ID: 1011.5971
- Date: 2010-11-30
- Authors: ** Author information is not provided in the paper. (Author names and affiliations are missing from the source text.) **
📝 Abstract
Ziv-Lempel and Crochemore factorizations are two kinds of factorizations of words related to text processing. In this paper, we find these factorizations for standard episturmian words. Thus the previously known c-factorization of standard Sturmian words is provided as a special case. Moreover, the two factorizations are compared.
📄 Full Content
In [1], the Crochemore factorizations of some well-known infinite words, namely characteristic Sturmian words, (generalized) Thue-Morse words and the period-doubling sequence, are given explicitly based on their combinatorial structures. It is also shown there that, in general, the number of factors in the Crochemore factorization is at most twice the number of factors in the Ziv-Lempel factorization.
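The Crochemore factorization (or c-factorization for short) of a word $w$ is defined as follows: each factor of $c(w)$ is either a fresh letter or a maximal factor of $w$ that has already occurred in the preceding prefix of the word. More formally, the c-factorization $c(w)$ of a word $w$ is
$$c(w) = (c_1, \ldots, c_m, c_{m+1}, \ldots),$$
where $c_m$ is either a letter that does not occur in $c_1 \cdots c_{m-1}$, or the longest prefix of $c_m c_{m+1} \cdots$ that occurs at least twice in $c_1 \cdots c_m$.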
The Ziv-Lempel factorization (or z-factorization for short) of a word $w$ is
$$z(w) = (z_1, \ldots, z_m, z_{m+1}, \ldots),$$
where $z_m$ is the shortest prefix of $z_m z_{m+1} \cdots$ which occurs only once in the word $z_1 \cdots z_m$.
In this paper, we give explicit formulas for the z-factorization and the c-factorization of standard episturmian words; as a special case, we recover the c-factorization of standard Sturmian words given in [1]. Moreover, these results reveal the relation between the two factorizations in the case of standard episturmian words.
The rest of the paper is organized as follows. In Section 2 we present some useful definitions and notation from combinatorics on words. Section 3 reviews the definition and some properties of episturmian words. In Section 4, we study the z-factorization of standard episturmian words. Finally, in Section 5 we present a result about the c-factorization of standard episturmian words.
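The definitions of the two factorizations given above can be tried out on finite words. The following Python sketch is an illustration only and not the method of the paper: it applies both definitions by brute force, the function names are hypothetical, and it assumes that on a finite word the last factor is simply whatever remains when the defining condition can no longer be met.

```python
def has_earlier_occurrence(w, i, length):
    """True if w[i:i+length] also starts at some position j < i.

    Overlapping occurrences are allowed: the search window w[0:i+length-1]
    excludes only the occurrence that starts at i itself.
    """
    return w.find(w[i:i + length], 0, i + length - 1) != -1


def z_factorization(w):
    """Ziv-Lempel factors: shortest prefix of the rest with no earlier occurrence."""
    factors, i = [], 0
    while i < len(w):
        length = 1
        while i + length <= len(w) and has_earlier_occurrence(w, i, length):
            length += 1
        # Assumption for finite words: if the end is reached first,
        # the slice below is just the remaining suffix.
        factors.append(w[i:i + length])
        i += length
    return factors


def c_factorization(w):
    """Crochemore factors: a fresh letter, or the longest prefix seen earlier."""
    factors, i = [], 0
    while i < len(w):
        length = 0
        while i + length < len(w) and has_earlier_occurrence(w, i, length + 1):
            length += 1
        length = max(length, 1)  # length 0 means the next letter is fresh
        factors.append(w[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    prefix = "abaababaabaab"  # length-13 prefix of the Fibonacci word
    print(z_factorization(prefix))
    print(c_factorization(prefix))
```

On the length-13 prefix `abaababaabaab` of the Fibonacci word, the sketch yields (a, b, aa, bab, aabaa, b) for the z-factorization and (a, b, a, aba, baaba, ab) for the c-factorization; the final factor of each list is cut off by the end of the prefix.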
We denote the alphabet (which is finite) by $A$. As usual, we denote by $A^*$ the set of words over $A$ and by $\varepsilon$ the empty word. We use the notation $A^+ = A^* \setminus \{\varepsilon\}$. If $a \in A$ and $w = w_1 w_2 \cdots w_n$ is a word over $A$ with $w_i \in A$, then the symbols $|w|$ and $|w|_a$ denote respectively the length $n$ of $w$ and the number of occurrences of the letter $a$ in $w$. For an infinite word $w$ we denote by $Alph(w)$ (resp. $Ult(w)$) the number of letters which appear (resp. appear infinitely many times) in $w$ (the first notation is also used for finite words).
A word $v$ is a factor of a word $w$, written $v \prec w$, if there exist $u, u' \in A^*$ such that $w = uvu'$. A word $v$ is said to be a prefix (resp. suffix) of a word $w$, written $v \lhd w$ (resp. $v \rhd w$), if there exists $u \in A^*$ such that $w = vu$ (resp. $w = uv$). If $w = vu$ (resp. $w = uv$), we simply write $v = wu^{-1}$ (resp. $v = u^{-1}w$). The notions of prefix and factor extend naturally to infinite words. Two words $u$ and $v$ are conjugate if there exist words $p$ and $q$ such that $u = pq$ and $v = qp$. For a word $w$, the set $F(w)$ (resp. $F_n(w)$) is the set of its factors (resp. the set of its factors of length $n$); these notations are also used for infinite words. If $w$ is an infinite word, then the related complexity function is $p_w(n) = |F_n(w)|$.
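To make the factor sets concrete, the following Python sketch enumerates the factors of length $n$ of a finite word and counts them, taking the usual definition $p_w(n) = |F_n(w)|$ of the complexity function. The helper names are hypothetical, and for an infinite word a finite prefix only gives a lower bound on the complexity.

```python
def factors_of_length(w, n):
    """Set F_n(w) of factors of length n of the finite word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def complexity(w, n):
    """Number of distinct factors of length n occurring in w."""
    return len(factors_of_length(w, n))


def are_conjugate(u, v):
    """u = pq and v = qp for some p, q, i.e. v is a factor of uu of the same length."""
    return len(u) == len(v) and v in u + u


if __name__ == "__main__":
    prefix = "abaababaabaab"  # prefix of the Fibonacci word, a Sturmian word
    print([complexity(prefix, n) for n in range(1, 5)])
    print(are_conjugate("aba", "baa"))
```

For the Fibonacci prefix used here, the counts for $n = 1, \ldots, 4$ are $n + 1$, as expected for a Sturmian word.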
An infinite word $s$ is episturmian if $F(s)$ is closed under reversal and for any $\ell \in \mathbb{N}$ there exists at most one right special word in $F_\ell(s)$. Sturmian words are then exactly the nonperiodic episturmian words on a binary alphabet. An episturmian word is standard if all its left special factors are prefixes of it. It is well known that if an episturmian word $t$ is not periodic and $Ult(t) = k$, then its complexity function is ultimately $p_t(n) = (k-1)n + q$ for some $q \in \mathbb{N}^+$. Let $t$ be an episturmian word. If $t$ is nonperiodic, then there exists a unique standard episturmian word $s$ satisfying $F(t) = F(s)$; if $t$ is periodic, then there may be several standard episturmian words $s$ satisfying $F(t) = F(s)$. In any case, there exists at least one standard episturmian word $s$ with $F(t) = F(s)$.
If the sequence of palindromic prefixes of a standard episturmian word $s$ is
$$u_1 = \varepsilon,\ u_2,\ u_3,\ \ldots,$$
then there exists an infinite word $\Delta(s) = x_1 x_2 \cdots$ with $x_i \in A$, called the directive word of $s$, such that
$$u_{n+1} = (u_n x_n)^{(+)},$$
where $w^{(+)}$ is defined as the shortest palindrome having $w$ as a prefix. (A similar construction for Sturmian words can be found in [9].) The relation between $u_n$ and $u_{n+1}$ can also be explained using morphisms: for $a \in A$, define the morphism $\psi_a$ by $\psi_a(a) = a$ and $\psi_a(x) = ax$ for $x \in A \setminus \{a\}$.
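The palindromic closure $w^{(+)}$ and the morphisms $\psi_a$ are easy to experiment with. The following Python sketch is a minimal illustration with hypothetical function names. It assumes the standard iterated palindromic closure rule $u_1 = \varepsilon$, $u_{n+1} = (u_n x_n)^{(+)}$, where $x_1 x_2 \cdots$ is the directive word, and it works on a finite prefix of that word only.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix.

    Writes w = p q with q the longest palindromic suffix of w
    and returns p q reversed(p).
    """
    for k in range(len(w) + 1):
        suffix = w[k:]
        if suffix == suffix[::-1]:  # first hit is the longest palindromic suffix
            return w + w[:k][::-1]


def palindromic_prefixes(directive):
    """u_1, u_2, ... built from a finite prefix x_1 x_2 ... of the directive word.

    Assumed rule: u_1 is the empty word and u_{n+1} is the closure of u_n x_n.
    """
    u = ""
    prefixes = [u]
    for x in directive:
        u = palindromic_closure(u + x)
        prefixes.append(u)
    return prefixes


def psi(a, word):
    """Morphism psi_a: a -> a, and x -> ax for every other letter x."""
    return "".join(c if c == a else a + c for c in word)


if __name__ == "__main__":
    print(palindromic_prefixes("abab"))  # Fibonacci word (Sturmian case)
    print(palindromic_prefixes("abc"))   # Tribonacci word
    print(psi("a", "ab"))
```

With the directive prefix `abab` the sketch produces the palindromic prefixes ε, a, aba, abaaba, abaababaaba of the Fibonacci word, and with `abc` it produces ε, a, aba, abacaba.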
It is known that for any integer $n$, $h_n$ is primitive (see Proposition 2.8 of [6]) and so is $h_n$. For any integer $n$, define $P(n)$ as the maximum value of $i$ satisfying $i < n$ and $x_i = x_n$; if there is no such $i$, then $P(n)$ is undefined. We have the following lemma.
(i)
Proof.
(i) See the end of Section 2.1 of [6].
(ii) This is proved by using part (i) and (1).
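The index $P(n)$ defined before the lemma can be read off a finite prefix of the sequence $x_1 x_2 \cdots$. A minimal Python sketch, assuming the letters $x_i$ are given as a string and indexed from 1 as in the text (the function name is hypothetical):

```python
def previous_occurrence_index(directive, n):
    """P(n): the largest i with 1 <= i < n and x_i == x_n, or None if undefined.

    `directive` holds x_1 x_2 ... as a Python string, so x_i is directive[i - 1].
    """
    x_n = directive[n - 1]
    for i in range(n - 1, 0, -1):  # scan i = n-1, n-2, ..., 1
        if directive[i - 1] == x_n:
            return i
    return None  # the letter x_n does not appear before position n


if __name__ == "__main__":
    print([previous_occurrence_index("abcab", n) for n in range(1, 6)])
```

For the prefix `abcab`, $P(n)$ is undefined for $n = 1, 2, 3$, while $P(4) = 1$ and $P(5) = 2$.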
It is obvious that $h_{n-1} \lhd h_n$. In addition, by Proposition 2.11 of [6] we have the following lemma.
Lemma 2. Suppose that $x_n = \alpha$ and the letter $\alpha$ occurs at least once before $x_n$ in $\Delta(s)$. Then
(ii) The word