The sequence of open and closed prefixes of a Sturmian word
A finite word is closed if it contains a factor that occurs both as a prefix and as a suffix but does not have internal occurrences, otherwise it is open. We are interested in the {\it oc-sequence} of a word, which is the binary sequence whose $n$-th element is $0$ if the prefix of length $n$ of the word is open, or $1$ if it is closed. We exhibit results showing that this sequence is deeply related to the combinatorial and periodic structure of a word. In the case of Sturmian words, we show that these are uniquely determined (up to renaming letters) by their oc-sequence. Moreover, we prove that the class of finite Sturmian words is a maximal element with this property in the class of binary factorial languages. We then discuss several aspects of Sturmian words that can be expressed through this sequence. Finally, we provide a linear-time algorithm that computes the oc-sequence of a finite word, and a linear-time algorithm that reconstructs a finite Sturmian word from its oc-sequence.
Research Summary
The paper introduces a binary invariant, the “oc‑sequence”, that records for each prefix of a word whether it is open (0) or closed (1). A finite word is defined as closed if it contains a factor that appears exactly twice – once as a prefix and once as a suffix – with no internal occurrences; otherwise it is open. This notion coincides with the older concept of periodic‑like words and with complete returns to a factor. For any word w (finite or infinite) the oc‑sequence oc(w)=c₁c₂… is built by setting cₙ=1 if the prefix of length n is closed, and cₙ=0 otherwise.
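The definition can be checked directly on small examples. The following is a minimal brute-force sketch, not the paper's method; the function name `is_closed` is hypothetical, and treating words of length at most 1 as closed is an assumed convention.

```python
def is_closed(w: str) -> bool:
    """Brute-force test of the open/closed definition.

    Assumption: words of length <= 1 are treated as closed by convention.
    """
    n = len(w)
    if n <= 1:
        return True
    # Try every proper, nonempty prefix that is also a suffix (a border).
    for k in range(1, n):
        border = w[:k]
        if w.endswith(border):
            # Count all occurrences of the border, overlaps included.
            occurrences = sum(1 for i in range(n - k + 1) if w.startswith(border, i))
            # Exactly two occurrences means prefix and suffix only, none internal.
            if occurrences == 2:
                return True
    return False
```

For instance, `aba` is closed (the border `a` occurs only at both ends), while `abaa` is open because its only border `a` also occurs internally.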
The authors first explore basic properties of open and closed words, showing that a closed word has a unique right extension that remains closed and preserves the period (Lemmas 4–6). They also prove that any run of zeros in the oc‑sequence is at least as long as the preceding run of ones (Lemma 11), linking the lengths of runs directly to the distances between successive occurrences of prefixes.
The central focus is on Sturmian words, that is, infinite binary words with exactly n + 1 distinct factors of length n for every n. The paper singles out standard (characteristic) Sturmian words, whose left extensions by both letters remain Sturmian, and notes that the Fibonacci word is a classic example. The main theorem (Theorem 14) states that every Sturmian word (finite or infinite) is uniquely determined, up to a renaming of the alphabet, by its oc‑sequence. To prove this, the authors develop several auxiliary results, notably Lemma 15, which shows that for a right‑special Sturmian word the longest repeated prefix is also a suffix of the word. This structural rigidity forces the oc‑sequence to encode the entire combinatorial structure of the word.
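The factor-count condition can be observed empirically on the Fibonacci word. The sketch below is illustrative only: the helper names `fibonacci_prefix` and `factor_count` are hypothetical, and a finite prefix can only confirm the count for lengths n that are small relative to the prefix length.

```python
def fibonacci_prefix(length: int) -> str:
    """Return the prefix of the given length of the Fibonacci word over {a, b}."""
    s, t = "a", "ab"
    # Each step concatenates the two previous finite Fibonacci words.
    while len(t) < length:
        s, t = t, t + s
    return t[:length]


def factor_count(w: str, n: int) -> int:
    """Number of distinct factors (substrings) of length n occurring in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


w = fibonacci_prefix(233)
# For small n, a long prefix already contains every factor of length n.
counts = [factor_count(w, n) for n in range(1, 11)]
```

Here `counts` lists the number of distinct factors of lengths 1 through 10, each of which should equal n + 1.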
Beyond uniqueness, the paper shows that the class of finite Sturmian words (denoted St) is maximal among binary factorial languages with the property that the oc‑sequence uniquely identifies each word. In any larger factorial language, one can find non‑isomorphic words sharing the same oc‑sequence, demonstrating the optimality of St for this kind of identification.
The authors then connect the oc‑sequence of standard Sturmian words to the continued‑fraction expansion of their slope α. Specifically, the lengths of runs of 1’s in the oc‑sequence correspond to twice the partial quotients of α, generalizing a known result for the Fibonacci word where the run lengths form the doubled Fibonacci sequence. They also describe how semi‑central prefixes of a standard Sturmian word can be expressed as uₙ₋₁ uₙ uₙ₊₁, where the uₙ’s belong to the standard sequence, leading to an infinite product representation of the word in squares of reversed standard words.
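As a rough illustration of the standard sequence mentioned above, the sketch below uses the textbook recurrence s₋₁ = b, s₀ = a, sₙ₊₁ = sₙ^{dₙ} sₙ₋₁ driven by a directive sequence (d₀, d₁, …). The function name `standard_sequence` is hypothetical, and the exact indexing that ties the dₙ to the partial quotients of α in the paper is an assumption not reproduced here.

```python
def standard_sequence(directive: list[int]) -> list[str]:
    """Build the standard words s_0, s_1, ... from a directive sequence.

    Assumed recurrence: s_{-1} = "b", s_0 = "a",
    s_{n+1} = s_n repeated d_n times, followed by s_{n-1}.
    """
    prev, cur = "b", "a"
    words = [cur]
    for d in directive:
        prev, cur = cur, cur * d + prev
        words.append(cur)
    return words
```

With the all-ones directive sequence, `standard_sequence([1, 1, 1, 1])` yields the finite Fibonacci words `a`, `ab`, `aba`, `abaab`, `abaababa`; larger entries give longer repeated blocks, as in `standard_sequence([2, 1])`.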
Two linear‑time algorithms are presented. The first computes the oc‑sequence of any finite word w in O(|w|) time by maintaining the length of the longest repeated prefix while scanning w; an increase in this length when the n-th letter is read signals that the prefix of length n is closed (output 1), otherwise the prefix is open (output 0). The second algorithm reconstructs a finite Sturmian word from a given oc‑sequence. It parses the sequence into runs of 1’s, interprets each run length as a partial quotient, and iteratively builds the word using the standard Sturmian construction (essentially a Fibonacci‑type concatenation), handling the possible alphabet renaming. Both algorithms use linear space and are straightforward to implement.
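The first algorithm can be sketched as follows. This is one possible linear-time realization under stated assumptions, not necessarily the authors' implementation: it uses the Knuth–Morris–Pratt prefix function, relies on the fact that the longest repeated prefix of w[:n] has length equal to the largest border length seen among the prefixes of w[:n], and treats the one-letter prefix as closed by convention. The names `oc_sequence` and `run_lengths` are hypothetical.

```python
from itertools import groupby


def oc_sequence(w: str) -> list[int]:
    """Return the oc-sequence of w: 1 for a closed prefix, 0 for an open one."""
    n = len(w)
    pi = [0] * n   # pi[i] = length of the longest proper border of w[:i+1]
    oc = []
    best = 0       # length of the longest repeated prefix seen so far
    k = 0
    for i in range(n):
        if i > 0:
            # Standard KMP prefix-function update.
            while k > 0 and w[i] != w[k]:
                k = pi[k - 1]
            if w[i] == w[k]:
                k += 1
            pi[i] = k
        # The prefix is closed exactly when the longest repeated prefix grows
        # (the one-letter prefix is closed by the assumed convention).
        closed = (i == 0) or (pi[i] > best)
        oc.append(1 if closed else 0)
        best = max(best, pi[i])
    return oc


def run_lengths(bits: list[int]) -> list[tuple[int, int]]:
    """Compress a binary sequence into (bit, run length) pairs."""
    return [(b, len(list(g))) for b, g in groupby(bits)]
```

On the Fibonacci prefix `abaababaabaab`, `oc_sequence` returns `[1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]`, whose runs have lengths 1, 1, 1, 1, 2, 2, 3, … (the last run is truncated by the end of the prefix), in line with the doubled Fibonacci pattern described earlier.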
In summary, the paper demonstrates that the simple binary pattern of open versus closed prefixes captures the full combinatorial essence of Sturmian words. This provides a new bridge between word combinatorics, number‑theoretic representations (continued fractions), and efficient algorithmic processing. The results have potential implications for pattern matching, symbolic dynamics, and the analysis of low‑complexity sequences in theoretical computer science.