On the d-complexity of strings

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This paper deals with the complexity of strings, which play an important role in biology (nucleotide sequences), information theory and computer science. The d-complexity of a string is defined as the number of its distinct d-substrings given in Definition 1. The case d=1 is studied in detail.


Research Summary

The paper introduces a novel metric for quantifying the structural richness of finite strings, called d-complexity. A d-substring of a string S of length n is defined as any subsequence of characters whose indices i₁ < i₂ < … < i_k satisfy the distance constraint i_{j+1} − i_j ≤ d for every adjacent pair. The set of all distinct d-substrings is denoted D_d(S), and the d-complexity C_d(S) is simply the cardinality |D_d(S)|. This definition generalizes the classic notion of contiguous substrings (the case d = 1) by allowing limited gaps, so that it interpolates between strictly local and fully non-local analyses.
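The definition above can be made concrete with a small brute-force enumeration. This is only a sketch for illustration, not the paper's method: the function names `d_substrings` and `d_complexity` are hypothetical, indices are 0-based, and it assumes that single characters count as d-substrings while the empty string does not.

```python
def d_substrings(s: str, d: int) -> set[str]:
    """Return the set of distinct non-empty d-substrings of s (brute force)."""
    found: set[str] = set()

    def extend(last: int, current: str) -> None:
        # 'current' is a valid d-substring whose last character sits at index 'last'.
        found.add(current)
        # The next chosen index may lie at most d positions to the right.
        for nxt in range(last + 1, min(last + d, len(s) - 1) + 1):
            extend(nxt, current + s[nxt])

    for start in range(len(s)):
        extend(start, s[start])
    return found


def d_complexity(s: str, d: int) -> int:
    """C_d(s): the number of distinct d-substrings of s."""
    return len(d_substrings(s, d))


print(d_complexity("abc", 1))  # 6: a, b, c, ab, bc, abc
print(d_complexity("abc", 2))  # 7: the gap of 2 additionally admits "ac"
```

The enumeration visits every admissible index sequence, so its running time grows exponentially with the string length; it is useful only for checking small cases by hand.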

The authors first establish basic combinatorial bounds. They prove that C_d(S) ≤ n·σ^d, where σ is the alphabet size, showing that the number of admissible d-substrings grows at most linearly with the string length and exponentially with the gap parameter d. Importantly, for any fixed d the exact value of C_d(S) can be computed in polynomial time. To this end, they design an O(n·d·σ) algorithm that combines a sliding-window scan with a trie (prefix tree) that stores each encountered d-substring. Because each window can generate at most σ^d distinct extensions, the algorithm inserts each candidate once, guaranteeing linear-time behavior in n for constant d and σ.
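The idea of storing each distinct d-substring once in a trie can be illustrated as follows. This sketch is an assumption-laden illustration, not the algorithm described above, and it makes no claim to the stated running time: the function name `d_complexity_trie` is hypothetical, d ≥ 1 is assumed, and the trie is traversed implicitly rather than built. Each trie node corresponds to one distinct d-substring and carries the set of positions at which an occurrence of it can end.

```python
from collections import defaultdict


def d_complexity_trie(s: str, d: int) -> int:
    """Count distinct non-empty d-substrings by walking an implicit trie."""
    n = len(s)

    # Children of the root: one node per distinct character,
    # holding every index at which that character occurs.
    root_children: defaultdict[str, set[int]] = defaultdict(set)
    for i, ch in enumerate(s):
        root_children[ch].add(i)

    count = 0
    stack = list(root_children.values())
    while stack:
        ends = stack.pop()
        count += 1  # each trie node is exactly one distinct d-substring

        # Extend every occurrence by one character at distance 1..d,
        # grouping the new end positions by the appended character.
        children: defaultdict[str, set[int]] = defaultdict(set)
        for p in ends:
            for q in range(p + 1, min(p + d, n - 1) + 1):
                children[s[q]].add(q)
        stack.extend(children.values())

    return count


print(d_complexity_trie("abab", 1))  # 7: a, b, ab, ba, aba, bab, abab
```

Because equal strings always land in the same node, duplicates are never counted twice, and the work done is proportional to the total size of the position sets rather than to the number of index sequences.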

The bulk of the paper is devoted to the special case d = 1, i.e., ordinary contiguous substrings. Here the authors derive a precise formula linking C_1(S) to the suffix array and the longest‑common‑prefix (LCP) array:

C_1(S) = n(n + 1)/2 − ∑_{i=1}^{n−1} LCP[i]
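The d = 1 identity, which subtracts the longest-common-prefix lengths of adjacent suffixes in suffix-array order from n(n + 1)/2, can be checked numerically on small inputs. The sketch below is illustrative only: the function name `distinct_substrings_via_lcp` is hypothetical, and the suffix array is built by naive sorting rather than by any construction the paper may use.

```python
def distinct_substrings_via_lcp(s: str) -> int:
    """Count distinct non-empty contiguous substrings via suffix array and LCP."""
    n = len(s)

    # Naive suffix array: sort start indices by the suffix they begin.
    # This costs O(n^2 log n) in the worst case, which is fine for a sketch.
    sa = sorted(range(n), key=lambda i: s[i:])

    def lcp(a: int, b: int) -> int:
        # Length of the longest common prefix of the suffixes starting at a and b.
        k = 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        return k

    # Sum the LCP of each pair of adjacent suffixes in sorted order.
    lcp_sum = sum(lcp(sa[i], sa[i + 1]) for i in range(n - 1))
    return n * (n + 1) // 2 - lcp_sum


print(distinct_substrings_via_lcp("banana"))  # 15
```

For "banana" the adjacent LCP values in sorted order are 1, 3, 0, 0 and 2, so the count is 21 − 6 = 15, matching a direct enumeration of the distinct substrings.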

