The Maximal Subword Complexity of Quasiperiodic Infinite Words

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We provide an exact estimate on the maximal subword complexity for quasiperiodic infinite words. To this end we give a representation of the set of finite and of infinite words having a certain quasiperiod q via a finite language derived from q. It is shown that this language is a suffix code having a bounded delay of decipherability. Our estimate of the subword complexity now follows from this result, previously known results on the subword complexity and elementary results on formal power series.


💡 Research Summary

The paper determines the maximal subword (factor) complexity of quasiperiodic infinite words. An infinite word w is quasiperiodic if there is a finite word q (a quasiperiod) such that every position of w lies inside some occurrence of q; equivalently, w is covered by possibly overlapping copies of q, and in particular q is a prefix of w. The factor complexity p_w(n) is the number of distinct factors of length n of w, and the question is how fast p_w(n) can grow when w has a given quasiperiod.
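The two definitions above can be tried out on finite prefixes. The following is a minimal sketch, not taken from the paper; the function names `is_covered` and `factor_count` are hypothetical, and the test word is a short prefix of the Fibonacci word.

```python
def is_covered(w, q):
    """True if every position of the finite word w lies in an occurrence of q."""
    covered = [False] * len(w)
    start = w.find(q)
    while start != -1:
        for i in range(start, start + len(q)):
            covered[i] = True
        # Search again from the next position so overlapping occurrences are found.
        start = w.find(q, start + 1)
    return all(covered)


def factor_count(w, n):
    """Number of distinct factors of length n in the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


prefix = "abaababa"  # a prefix of the Fibonacci word
print(is_covered(prefix, "aba"), factor_count(prefix, 2), factor_count(prefix, 3))
```

A finite prefix is only covered completely if it happens to end with an occurrence of q, so the check is meaningful for suitably chosen prefixes rather than arbitrary ones.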

The authors' main contribution is an exact estimate of the largest factor complexity attainable by an infinite word with a given quasiperiod q. The construction proceeds in several stages. First, they derive from q a finite language L(q) that describes the possible offsets between consecutive occurrences of q in a quasiperiodic word. Consecutive occurrences must overlap or abut, so the segment between their starting positions is a non-empty prefix v of q such that q is a prefix of vq; equivalently, |v| is a period of q (with v = q allowed). An infinite word has quasiperiod q exactly when it is an infinite concatenation of such prefixes, and L(q) consists of those prefixes that are not themselves concatenations of shorter ones. For example, q = aba yields L(q) = {ab, aba}, while q = aaa yields L(q) = {a}. The crucial structural result is that L(q) is a suffix code: no word in L(q) is a proper suffix of another. Consequently every finite concatenation of words from L(q) factorises uniquely, as can be seen by decoding from right to left.
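A minimal sketch of how a finite code can be derived from a quasiperiod q. It assumes the code consists of the non-empty prefixes v of q for which q is a prefix of vq, reduced to those that are not concatenations of shorter ones. The function names are hypothetical and the brute-force approach is chosen for readability, not efficiency.

```python
def period_prefixes(q):
    """Non-empty prefixes v of q such that q is a prefix of v + q."""
    return [q[:k] for k in range(1, len(q) + 1) if (q[:k] + q).startswith(q)]


def can_split(word, parts):
    """True if word is empty or a concatenation of words from parts."""
    if word == "":
        return True
    return any(word.startswith(p) and can_split(word[len(p):], parts)
               for p in parts)


def code_for(q):
    """Prefixes from period_prefixes(q) that are not products of shorter ones."""
    prefixes = period_prefixes(q)
    return [v for v in prefixes
            if not can_split(v, [p for p in prefixes if len(p) < len(v)])]


def is_suffix_code(code):
    """True if no word of the code is a proper suffix of another."""
    return not any(u != v and v.endswith(u) for u in code for v in code)


for q in ("aba", "aaa", "aabaa"):
    print(q, code_for(q), is_suffix_code(code_for(q)))
```

The unreduced prefix set for q = aaa is {a, aa, aaa}, which is not a suffix code; the reduction step is what leaves {a}.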

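Next, the authors analyse the delay of decipherability of L(q). All words of L(q) are prefixes of q, so L(q) is in general not a prefix code, and a left-to-right decoder cannot always recognise a codeword at the moment it ends. A code has delay of decipherability d if the first codeword of a message is determined as soon as d further codewords have been read. The paper proves that L(q) has bounded delay, with a bound that depends only on q. For q = aba one further codeword suffices: after the initial letters ab, the next two letters are either ab, in which case the first codeword is ab, or aa, in which case it is aba. This bounded delay is what allows the combinatorial structure of L(q) to be carried over to statements about factor counts of infinite words.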
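The effect of a bounded delay can be illustrated with a small brute-force check, assuming the code {ab, aba} that belongs to the quasiperiod aba. The sketch lists which codewords can still be the first codeword of a message after a given prefix has been read; the function names are hypothetical.

```python
def consistent(text, code):
    """True if text is a prefix of some concatenation of codewords."""
    if any(c.startswith(text) for c in code):
        return True
    return any(text.startswith(c) and consistent(text[len(c):], code)
               for c in code)


def first_codeword_candidates(text, code):
    """Codewords that can start a message beginning with text."""
    return [c for c in code
            if c.startswith(text)
            or (text.startswith(c) and consistent(text[len(c):], code))]


code = ["ab", "aba"]
for seen in ("ab", "aba", "abab", "abaa"):
    print(seen, first_codeword_candidates(seen, code))
```

After two letters both codewords remain possible; two more letters always leave exactly one candidate, which is the sense in which the ambiguity is resolved after a bounded lookahead.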

With the code structure and bounded delay in hand, the paper turns to subword complexity. The formal power series involved is built from the lengths of the codewords:

  s_q(z) = Σ_{v ∈ L(q)} z^|v|

This is a polynomial, because L(q) is finite. Since L(q) is a code, distinct sequences of codewords give distinct words, so the number of concatenations of total length n is the coefficient of z^n in 1/(1 − s_q(z)). These coefficients grow at the exponential rate λ_q, where 1/λ_q is the smallest positive root of 1 − s_q(z). Every factor of a word with quasiperiod q is a factor of some finite concatenation of codewords, and combining this with previously known results on subword complexity gives the bound

  p_w(n) ≤ C_q · λ_q^n

for every infinite word w with quasiperiod q, where the constant C_q depends only on q. In other words, the factor complexity of a quasiperiodic infinite word can grow exponentially, but the growth rate is completely determined by the lengths of the words in L(q), and hence by the periods of q. When q has no period shorter than |q|, L(q) = {q} and λ_q = 1, and the only infinite word with quasiperiod q is the periodic word obtained by repeating q.

The bound is tight: for each quasiperiod q there are infinite words that attain it up to a constant factor, for instance any infinite concatenation of words from L(q) in which every finite concatenation occurs as a factor. Taken over all quasiperiods, the growth rate is largest for q = aba and q = aabaa, where it equals the smallest Pisot number t_P ≈ 1.3247, the real root of t³ = t + 1. Not every quasiperiodic word is this complex. The Fibonacci word has quasiperiod aba, yet as a Sturmian word it has only n + 1 factors of length n. The Thue–Morse word, being overlap-free, is not quasiperiodic at all.
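As a numerical illustration, assume the codes {ab, aba} for the quasiperiod aba and {aab, aaba, aabaa} for aabaa, so that only the codeword lengths [2, 3] and [3, 4, 5] matter. The sketch counts concatenations of total length n by a linear recurrence and compares the ratio of successive counts with the root of Σ t^(−k) = 1 found by bisection. The function names are hypothetical.

```python
def count_products(lengths, n_max):
    """a[n] = number of sequences of codewords with total length n."""
    a = [0] * (n_max + 1)
    a[0] = 1  # the empty sequence
    for n in range(1, n_max + 1):
        a[n] = sum(a[n - k] for k in lengths if k <= n)
    return a


def growth_rate(lengths, tol=1e-12):
    """Bisection for the t in [1, 2] with sum(t**-k for k in lengths) == 1."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The sum decreases in t, so a value above 1 means t is still too small.
        if sum(mid ** -k for k in lengths) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


a = count_products([2, 3], 200)
print(a[:8], a[200] / a[199], growth_rate([2, 3]), growth_rate([3, 4, 5]))
```

For lengths [2, 3] the equation t^(−2) + t^(−3) = 1 is the same as t³ = t + 1, and for lengths [3, 4, 5] the same root reappears, since t⁵ = t² · t³ = t³ + t² = t² + t + 1. Both quasiperiods therefore give a growth rate of roughly 1.3247, far from linear growth.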

In summary, the paper gives an exact estimate of the maximal subword complexity of quasiperiodic infinite words. By representing the words with quasiperiod q through a finite suffix code L(q) with bounded delay of decipherability, and by combining known results on subword complexity with elementary facts about formal power series, the authors obtain a bound whose exponential growth rate is determined by the lengths of the codewords, and hence by the periods of q. The result connects formal language theory, coding theory, and combinatorics on words.

