Maximal Complexity of Finite Words
The subword complexity of a finite word $w$ of length $N$ is a function which associates to each $n\le N$ the number of all distinct subwords of $w$ having the length $n$. We define the \emph{maximal complexity} $C(w)$ as the maximum of the subword complexity for $n \in \{1,2,\ldots,N\}$, and the \emph{global maximal complexity} $K(N)$ as the maximum of $C(w)$ for all words $w$ of a fixed length $N$ over a finite alphabet. By $R(N)$ we will denote the set of the values $i$ for which there exists a word of length $N$ having $K(N)$ subwords of length $i$. $M(N)$ represents the number of words of length $N$ whose maximal complexity is equal to the global maximal complexity.
💡 Research Summary
The paper investigates the subword (or factor) complexity of finite words and introduces the notion of maximal complexity. For a word w of length N over an alphabet Σ of size q, the subword complexity function p_w(n) counts the number of distinct contiguous substrings of length n, for each 1 ≤ n ≤ N. The maximal complexity C(w) is defined as the largest value of p_w(n) across all n, i.e., C(w)=max_{1≤n≤N} p_w(n). The authors then study three global quantities: (1) the global maximal complexity K(N)=max_{w∈Σ^N} C(w), (2) the set R(N) of lengths n for which a word of length N attains K(N), and (3) the number M(N) of words of length N whose maximal complexity equals K(N).
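The definitions above can be made concrete with a few lines of Python. This is a minimal sketch, not code from the paper; the function names `subword_complexity` and `maximal_complexity` are illustrative choices, and words are represented as plain strings.

```python
def subword_complexity(w, n):
    """p_w(n): number of distinct contiguous factors of length n in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def maximal_complexity(w):
    """C(w): the largest value of p_w(n) over 1 <= n <= len(w)."""
    return max(subword_complexity(w, n) for n in range(1, len(w) + 1))


w = "0011101"  # a binary word of length N = 7
profile = [subword_complexity(w, n) for n in range(1, len(w) + 1)]
print(profile)                 # [2, 4, 5, 4, 3, 2, 1]
print(maximal_complexity(w))   # 5
```

The profile rises while the alphabet limits the number of possible factors and falls once the word length becomes the limiting quantity; C(w) is the peak of that profile.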
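The first major result is an exact formula for K(N). Every word w of length N satisfies p_w(n) ≤ min(q^n, N−n+1), since there are only q^n words of length n and only N−n+1 positions at which a factor of length n can start. By constructing words that resemble De Bruijn sequences, the authors show that this bound is attained for every n, so K(N)=max_{1≤n≤N} min(q^n, N−n+1). Equivalently, K(N)=N−k, where k ≥ 0 is the integer with q^k+k ≤ N ≤ q^{k+1}+k. When N is a perfect power of q, say N=q^m with m ≥ 1, this gives K(N)=q^m−m+1. The proof combines combinatorial arguments with graph‑theoretic concepts: the extremal words are obtained from Eulerian cycles in a q‑regular directed graph whose vertices represent the factors of length k−1 and whose edges represent the factors of length k. Traversing such a cycle makes each possible factor of length k appear exactly once, which yields the maximal number of distinct factors.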
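The graph-theoretic construction can be sketched as follows. This is an illustrative implementation using Hierholzer's algorithm for Eulerian circuits, not the authors' own procedure; the function name `de_bruijn_word` and the encoding of letters as the integers 0..q−1 are assumptions made for the example. It builds a linear word of length q^k + k − 1 in which every word of length k occurs exactly once.

```python
def de_bruijn_word(q, k):
    """Linear word of length q**k + k - 1 over {0, ..., q-1} that contains
    every word of length k exactly once. Built from an Eulerian circuit of
    the graph whose vertices are the (k-1)-letter words and whose edges
    are the k-letter words."""
    if k == 1:
        return list(range(q))  # each letter once
    start = (0,) * (k - 1)
    next_letter = {}           # next unused outgoing letter per vertex
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        a = next_letter.get(v, 0)
        if a < q:
            # follow the unused edge labelled a: drop first letter, append a
            next_letter[v] = a + 1
            stack.append(v[1:] + (a,))
        else:
            # no unused edges left at v: it is finished
            circuit.append(stack.pop())
    circuit.reverse()          # vertex sequence of the Eulerian circuit
    # spell the word: first vertex, then the last letter of each later vertex
    return list(circuit[0]) + [v[-1] for v in circuit[1:]]


w = de_bruijn_word(2, 3)
factors = {tuple(w[i:i + 3]) for i in range(len(w) - 2)}
print(len(w), len(factors))    # 10 8
```

For q = 2 and k = 3 the word has length 10 and all 8 binary words of length 3 appear among its factors, so the bound min(q^k, N−k+1) is met at n = k.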
The second result characterizes R(N). Because q^n increases with n while N−n+1 decreases, the maximum of min(q^n, N−n+1) is attained either at a single length or at two consecutive lengths. Specifically, if q^k+k ≤ N ≤ q^{k+1}+k, then R(N)={k+1}, except in the boundary case N=q^k+k with k ≥ 1, where R(N)={k, k+1}. In that boundary case a word can reach K(N) distinct factors at length k (every word of length k occurs) or at length k+1 (all N−k factors of that length are different). The proof again relies on the structure of the underlying De Bruijn‑type graph.
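For small parameters, K(N) and R(N) can be checked directly against the elementary caps p_w(n) ≤ q^n and p_w(n) ≤ N − n + 1. The sketch below is a brute-force illustration under that assumption; the helper names are hypothetical, and the exhaustive loop is only practical for small q and N because it visits all q^N words.

```python
from itertools import product


def complexity_profile(w):
    """[p_w(1), ..., p_w(N)] for a word given as a tuple of letters."""
    N = len(w)
    return [len({w[i:i + n] for i in range(N - n + 1)}) for n in range(1, N + 1)]


def K_and_R_bruteforce(N, q):
    """K(N) and R(N) by enumerating all q**N words."""
    best, lengths = 0, set()
    for w in product(range(q), repeat=N):
        prof = complexity_profile(w)
        m = max(prof)
        if m > best:
            best, lengths = m, set()
        if m == best:
            lengths |= {n for n, v in enumerate(prof, start=1) if v == best}
    return best, lengths


def K_and_R_from_caps(N, q):
    """Maximum of min(q**n, N - n + 1) and the lengths n where it is attained."""
    caps = [min(q ** n, N - n + 1) for n in range(1, N + 1)]
    best = max(caps)
    return best, {n for n, c in enumerate(caps, start=1) if c == best}


for N in (6, 7):
    K, R = K_and_R_bruteforce(N, 2)
    print(N, K, sorted(R))
# 6 4 [2, 3]
# 7 5 [3]
```

In the binary case, N = 6 = 2^2 + 2 is a boundary length with two maximizing lengths, while N = 7 has a single one.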
The third contribution concerns the enumeration of the optimal words, i.e., the value of M(N). The cleanest case is N=q^k+k−1, where K(N)=q^k and R(N)={k}: a word is optimal exactly when each of the q^k words of length k occurs in it exactly once, so the optimal words are the linear De Bruijn words of order k. These are the Eulerian cycles of the underlying graph, cut open at each of their q^k edges, which gives M(q^k+k−1)=(q!)^{q^{k−1}}; for the binary alphabet this is 2^{2^{k−1}}. For q^k+k < N < q^{k+1}+k, a word is optimal exactly when its N−k factors of length k+1 are pairwise distinct, so the optimal words correspond to paths of N−k edges that repeat no edge in the graph rather than to full Eulerian cycles.
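M(N) can be counted by exhaustive enumeration for small parameters. The following is a minimal sketch with hypothetical helper names; it compares the brute-force count at N = q^k + k − 1 with (q!)^{q^{k−1}}, the standard count of linear De Bruijn words of order k, and makes no claim about other lengths.

```python
from itertools import product
from math import factorial


def maximal_complexity(w):
    """C(w) for a word given as a tuple of letters."""
    N = len(w)
    return max(len({w[i:i + n] for i in range(N - n + 1)})
               for n in range(1, N + 1))


def count_optimal_words(N, q):
    """Return (K(N), M(N)) by enumerating all q**N words."""
    values = [maximal_complexity(w) for w in product(range(q), repeat=N)]
    K = max(values)
    return K, values.count(K)


def de_bruijn_word_count(q, k):
    """Number of linear De Bruijn words of order k over q letters."""
    return factorial(q) ** (q ** (k - 1))


q, k = 2, 2
N = q ** k + k - 1                 # N = 5
print(count_optimal_words(N, q))   # (4, 4)
print(de_bruijn_word_count(q, k))  # 4
```

The four optimal binary words of length 5 are 00110, 01100, 11001 and 10011, the four ways of cutting open the cyclic sequence 0011.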
To validate the theoretical findings, the authors implement exhaustive searches for small values of q and N, confirming that the computed K(N), R(N), and M(N) match the formulas. They also compare their upper bound K(N) with previously known bounds on subword complexity, demonstrating that their result is tighter for all examined cases.
Finally, the paper discusses potential applications. Words with maximal subword complexity possess high entropy and a uniform distribution of factors, making them attractive candidates for pseudo‑random sequence generation, cryptographic key streams, and compression benchmarks. The authors suggest that the construction methods described could be adapted to design deterministic generators that guarantee a prescribed level of factor diversity, which is valuable in contexts where statistical randomness must be provably ensured.
In summary, the work provides a complete characterization of the maximal subword complexity for finite words, identifies precisely which factor lengths achieve this maximum, and counts all words that realize it. The blend of combinatorial, algebraic, and graph‑theoretic techniques yields results that are both theoretically elegant and practically relevant.