Lyndon words and Fibonacci numbers
It is a fundamental property of non-letter Lyndon words that they can be expressed as a concatenation of two shorter Lyndon words. This leads to a naive lower bound ⌈log_2(n)⌉ + 1 for the number of distinct Lyndon factors that a Lyndon word of length n must have, but this bound is not optimal. In this paper we show that a much more accurate lower bound is ⌈log_φ(n)⌉ + 1, where φ denotes the golden ratio (1 + √5)/2. We show that this bound is optimal in that it is attained by the Fibonacci Lyndon words. We then introduce a mapping L_x that counts the number of Lyndon factors of length at most n in an infinite word x. We show that a recurrent infinite word x is aperiodic if and only if L_x ≥ L_f, where f is the Fibonacci infinite word, with equality if and only if x is in the shift orbit closure of f.
💡 Research Summary
The paper investigates the relationship between Lyndon words (primitive words that are lexicographically strictly smaller than all of their nontrivial rotations) and the Fibonacci sequence. It begins by recalling the well‑known property that any Lyndon word of length at least 2 can be written as the concatenation of two shorter Lyndon words. Since the longer of the two pieces has length at least n/2, iterating the split yields a naive lower bound for the number of distinct Lyndon factors in a Lyndon word of length n: ⌈log₂ n⌉ + 1. The authors observe that this bound is not optimal and prove a tighter one based on the golden ratio φ = (1 + √5)/2.
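The splitting property and the naive bound can be checked by brute force on small examples. The sketch below is illustrative only: the helper names (`is_lyndon`, `split_into_two_lyndon`, `lyndon_factors`, `naive_bound`) are hypothetical, and the split shown is the classical standard factorization (the longest proper Lyndon suffix), which is one way to realize the decomposition and not necessarily the one used in the paper.

```python
def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than each proper rotation."""
    return bool(w) and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def split_into_two_lyndon(w: str) -> tuple[str, str]:
    """Split a Lyndon word of length >= 2 as u + v with u and v both Lyndon.

    Assumption: we use the standard factorization, where v is the longest
    proper suffix of w that is itself a Lyndon word.
    """
    if len(w) < 2 or not is_lyndon(w):
        raise ValueError("expected a Lyndon word of length at least 2")
    for i in range(1, len(w)):
        if is_lyndon(w[i:]):
            return w[:i], w[i:]
    raise AssertionError("unreachable: the last letter is always Lyndon")


def lyndon_factors(w: str) -> set[str]:
    """All distinct factors (contiguous substrings) of w that are Lyndon words."""
    return {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
        if is_lyndon(w[i:j])
    }


def naive_bound(n: int) -> int:
    """ceil(log2(n)) + 1, computed exactly with integer arithmetic."""
    return (n - 1).bit_length() + 1


if __name__ == "__main__":
    w = "aabab"
    print(split_into_two_lyndon(w))        # the two shorter Lyndon words
    print(sorted(lyndon_factors(w)))       # distinct Lyndon factors of w
    print(len(lyndon_factors(w)), ">=", naive_bound(len(w)))
```

The brute-force enumeration is cubic in the word length, which is fine for experiments on short words but not meant as an efficient algorithm.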
The central construction is the family of “Fibonacci Lyndon words.” For each k, a word Lₖ of length Fₖ (the k‑th Fibonacci number) is defined recursively as Lₖ = Lₖ₋₁ Lₖ₋₂, where both Lₖ₋₁ and Lₖ₋₂ are themselves Lyndon words. The paper shows that every Lyndon word of length n contains at least ⌈log_φ n⌉ + 1 distinct Lyndon factors. Because the Fibonacci numbers grow like φᵏ, the recursive decomposition of Lₖ into ever shorter Fibonacci Lyndon words has depth logarithmic in Fₖ with base φ, and the Fibonacci Lyndon words attain the bound, which establishes its optimality.
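A small experiment makes the golden-ratio bound concrete. The sketch below rests on an assumption that should be checked against the paper: it obtains a Lyndon word of Fibonacci length by taking the least rotation (the Lyndon conjugate, with 0 < 1) of a finite Fibonacci word, which may not coincide letter for letter with the recursion quoted above. All function names are hypothetical. The Lyndon factors are counted by brute force and compared with ⌈log_φ n⌉ + 1.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than each proper rotation."""
    return bool(w) and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def count_lyndon_factors(w: str) -> int:
    """Number of distinct Lyndon factors of w (brute force)."""
    found = {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
        if is_lyndon(w[i:j])
    }
    return len(found)


def fibonacci_word(k: int) -> str:
    """Finite Fibonacci words: s_0 = "0", s_1 = "01", s_k = s_{k-1} + s_{k-2}."""
    a, b = "0", "01"
    for _ in range(k):
        a, b = b, b + a
    return a


def lyndon_conjugate(w: str) -> str:
    """Least rotation of w; a Lyndon word whenever w is primitive."""
    return min(w[i:] + w[:i] for i in range(len(w)))


def golden_bound(n: int) -> int:
    """ceil(log_phi(n)) + 1, using floating point (adequate for small n)."""
    return math.ceil(math.log(n) / math.log(PHI)) + 1


if __name__ == "__main__":
    for k in range(6):
        w = lyndon_conjugate(fibonacci_word(k))
        print(len(w), w, count_lyndon_factors(w), golden_bound(len(w)))
```

For the lengths 1, 2, 3, 5 and 8 the brute-force count coincides with the bound, which is consistent with the optimality claim. The floating-point logarithm is a shortcut; an exact implementation would compare n against Fibonacci numbers instead.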
After establishing the finite‑word result, the authors turn to infinite words. They introduce the counting function Lₓ(n), which for an infinite word x returns the number of distinct Lyndon factors of x of length at most n. This function serves as a quantitative measure of combinatorial complexity and, crucially, of aperiodicity. The paper proves two complementary statements: (1) A recurrent infinite word x is aperiodic if and only if Lₓ(n) ≥ L_f(n) for all n, where f denotes the Fibonacci infinite word, the fixed point of the morphism 0 → 01, 1 → 0. (2) Equality Lₓ = L_f holds if and only if x lies in the shift‑orbit closure of f; that is, every finite factor of x is also a factor of f. Consequently, the Fibonacci infinite word is identified as the “minimal‑complexity” aperiodic recurrent word with respect to Lyndon factor counts.
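The counting function can be approximated numerically from a long finite prefix. The sketch below is only an approximation under stated assumptions: a prefix of 200 letters is taken to contain every factor of the lengths examined (true for the two words used here at these small lengths), the alphabet order is 0 < 1, and the function names are hypothetical. It compares the Fibonacci word with the periodic word 010101..., whose count stalls at 3.

```python
def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than each proper rotation."""
    return bool(w) and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci infinite word, built from s_k = s_{k-1} + s_{k-2}."""
    a, b = "0", "01"
    while len(b) < length:
        a, b = b, b + a
    return b[:length]


def lyndon_count(prefix: str, n: int) -> int:
    """Number of distinct Lyndon factors of length at most n seen in the prefix.

    This equals L_x(n) only if the prefix already contains every factor of x
    of length at most n, which is an assumption of this sketch.
    """
    seen = set()
    for m in range(1, n + 1):
        for i in range(len(prefix) - m + 1):
            u = prefix[i:i + m]
            if u not in seen and is_lyndon(u):
                seen.add(u)
    return len(seen)


if __name__ == "__main__":
    fib = fibonacci_prefix(200)
    periodic = "01" * 100
    for n in range(1, 6):
        print(n, lyndon_count(fib, n), lyndon_count(periodic, n))
```

Already at n = 3 the periodic word has fewer Lyndon factors than the Fibonacci word, in line with the aperiodicity criterion.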
The proofs combine classic combinatorial arguments on Lyndon factorization, properties of the Fibonacci recurrence, and standard results on recurrent and aperiodic sequences. A key ingredient is the uniqueness of the Lyndon factorization: every word factors in exactly one way into a lexicographically non‑increasing sequence of Lyndon words. The recursive structure of the Fibonacci Lyndon words then forces their factor count to grow like the logarithm of the length in base φ. For infinite words, the authors use the concept of shift‑orbit closure to relate the factor sets of x and f, and they demonstrate that a recurrent word whose count Lₓ(n) falls below L_f(n) for some n must be periodic.
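The unique Lyndon factorization mentioned above can be computed in linear time with Duval's algorithm. The sketch below uses that standard algorithm purely as an illustration; the summary does not say the paper relies on it, and the function name `lyndon_factorization` is hypothetical.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Factor s into a non-increasing sequence of Lyndon words (Duval's algorithm)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        # Scan forward while s[i:j] is a prefix of a power of a Lyndon word
        # of length j - k.
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # A strictly larger letter restarts the comparison at the front;
            # an equal letter extends the current repetition.
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


if __name__ == "__main__":
    print(lyndon_factorization("banana"))  # ['b', 'an', 'an', 'a']
```

Concatenating the returned factors gives back the input, and each factor is lexicographically at least as large as the next, which is the shape that the uniqueness statement refers to.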
Beyond the theoretical contributions, the paper discusses potential applications. In data compression, algorithms that exploit Lyndon factorization could benefit from the knowledge that Fibonacci‑type structures are extremal with respect to factor count, possibly leading to near‑optimal dictionary sizes. In symbolic dynamics and automata theory, the function Lₓ provides a new invariant for distinguishing aperiodic minimal subshifts from periodic ones.
In summary, the authors replace the coarse log₂ bound with the precise log_φ bound, prove its optimality via Fibonacci Lyndon words, and extend the analysis to infinite sequences by introducing the Lyndon‑factor counting function Lₓ. Their results deepen the interplay between combinatorics on words and number‑theoretic growth rates, and they furnish a robust criterion for aperiodicity in recurrent infinite words.