Autocorrelation-Run Formula for Binary Sequences

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The autocorrelation function and the run structure are two basic notions for binary sequences, and have been used as two independent postulates for testing the randomness of binary sequences ever since Golomb (1955). In this paper, we prove that the autocorrelation function of a binary sequence is in fact completely determined by its run structure.


💡 Research Summary

The paper “Autocorrelation‑Run Formula for Binary Sequences” establishes a rigorous mathematical link between two classical descriptors of binary sequences: the autocorrelation function and the run (or block) structure. Historically, since Golomb’s seminal work in 1955, these two notions have been employed as independent postulates for testing randomness. Golomb’s three criteria—balance, run length distribution, and autocorrelation—were treated as separate checks, each providing distinct information about a sequence. The authors challenge this conventional separation by proving that the entire autocorrelation spectrum of a binary sequence is uniquely determined by its run structure, thereby unifying the two concepts under a single formula, which they call the Autocorrelation‑Run Formula (ARF).

The core of the theory introduces R_i, the count of runs of length i (i ≥ 1) in a given binary word of length N. For any lag d (1 ≤ d ≤ N‑1), the autocorrelation C(d) can be expressed as a linear combination of the run counts:

 C(d) = ∑_{i=1}^{N‑1} w_i(d)·R_i,

where the weight function w_i(d) depends only on the relative sizes of i and d. The authors derive a recursive definition w_i(d) = w_{i‑1}(d‑1) − w_{i‑1}(d), together with base cases that assign w_i(d) = 0 when i < d, w_i(d) = 1 when i = d, and w_i(d) = −1 or 0 when i > d, depending on whether the run straddles the lag boundary. This recursion captures the intuitive idea that a run contributes positively to autocorrelation when it aligns exactly with the lag, contributes negatively when it overlaps the lag in a way that creates a mismatch, and contributes nothing when it is too short to affect the lag.
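The run counts R_i used above can be collected in a single pass. The following is a minimal sketch, not the authors' code: the function names `run_lengths` and `run_counts` are hypothetical, and a run is assumed to mean a maximal block of identical bits.

```python
from collections import Counter
from itertools import groupby


def run_lengths(bits):
    """Return the lengths of the maximal runs of equal bits, in order."""
    return [len(list(group)) for _, group in groupby(bits)]


def run_counts(bits):
    """Return a Counter mapping each run length i to R_i (runs of that length)."""
    return Counter(run_lengths(bits))


# Example word 0010111 splits into the runs 00 | 1 | 0 | 111.
bits = [0, 0, 1, 0, 1, 1, 1]
lengths = run_lengths(bits)  # ordered run structure
counts = run_counts(bits)    # the R_i values
print(lengths)
print(dict(counts))
```

Keeping the ordered list alongside the counts is a deliberate choice here: the counts discard the order of the runs, while the ordered list retains the full run structure.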

The paper supplies a complete combinatorial proof that the ARF holds for every binary sequence, using induction on the sequence length and a careful analysis of how adding a new bit modifies both the run counts and the autocorrelation values. The proof also demonstrates that the ARF respects Golomb’s three criteria: the balance condition is encoded in the total number of runs, the run‑length distribution directly appears in the R_i terms, and the autocorrelation is reconstructed exactly from these same quantities.

From an algorithmic perspective, the ARF is presented as a way to reduce computational cost. Computing the autocorrelation directly, lag by lag, requires O(N²) time because each lag d involves a pass over the entire sequence. In contrast, the ARF needs only a single linear pass to collect the run counts R_i, after which all C(d) values are obtained by evaluating the linear combination with pre‑computed weights. The resulting algorithm is reported to run in O(N) time and to use O(N) space for the weight table, which can be further compressed because many weights are zero.
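For comparison, the direct baseline can be sketched as follows. This is an illustration under assumptions that the summary does not state: the autocorrelation is taken to be aperiodic, and bits are mapped 0 → +1 and 1 → −1. The function `lag1_from_runs` is not the ARF itself; it only shows the elementary special case d = 1, where the value depends on the number of runs r alone, because each of the r − 1 run boundaries contributes −1 and every other adjacent pair contributes +1.

```python
from itertools import groupby


def autocorrelation(bits, d):
    """Aperiodic autocorrelation at lag d, with bits mapped 0 -> +1, 1 -> -1."""
    s = [1 - 2 * b for b in bits]
    return sum(s[j] * s[j + d] for j in range(len(s) - d))


def all_lags_direct(bits):
    """Direct O(N^2) baseline: one pass over the sequence for each lag."""
    return [autocorrelation(bits, d) for d in range(1, len(bits))]


def lag1_from_runs(bits):
    """Lag-1 value from the number of runs r: (N - 1) - 2 * (r - 1)."""
    n = len(bits)
    r = sum(1 for _ in groupby(bits))  # number of maximal runs
    return (n - 1) - 2 * (r - 1)


bits = [0, 0, 1, 0, 1, 1, 1]
print(all_lags_direct(bits))
print(lag1_from_runs(bits))
```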

The authors validate the theoretical results with extensive experiments. They generate binary sequences of lengths ranging from 10⁴ to 10⁸, including truly random sequences (from cryptographically secure PRNGs) and structured sequences (alternating patterns, biased runs, etc.). For each sequence, they compute the autocorrelation spectrum both by the classic O(N²) method and by the ARF‑based O(N) method. The numerical results match to machine precision, confirming the correctness of the formula. Moreover, timing measurements show an average speed‑up factor of about 12× for the ARF method, with memory consumption reduced by roughly 80 %.
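A small-scale version of this kind of cross-check can be sketched as below. It does not reproduce the paper's experiments or its weights w_i(d), which the summary does not give in full. Instead it assumes the aperiodic ±1 convention and computes each lag from the ordered run lengths, using the fact that two positions hold equal bits exactly when their run indices have the same parity. The names are hypothetical.

```python
import random
from itertools import groupby


def run_lengths(bits):
    """Ordered lengths of the maximal runs of equal bits."""
    return [len(list(g)) for _, g in groupby(bits)]


def direct_autocorrelation(bits, d):
    """Aperiodic autocorrelation at lag d, with bits mapped 0 -> +1, 1 -> -1."""
    s = [1 - 2 * b for b in bits]
    return sum(s[j] * s[j + d] for j in range(len(s) - d))


def autocorrelation_from_runs(lengths, d):
    """Lag-d value computed from the ordered run lengths only.

    Position j lies in run k_j; the product of the two +/-1 symbols at
    j and j + d is +1 when k_(j+d) - k_j is even and -1 when it is odd.
    """
    run_index = [k for k, length in enumerate(lengths) for _ in range(length)]
    n = len(run_index)
    return sum(1 if (run_index[j + d] - run_index[j]) % 2 == 0 else -1
               for j in range(n - d))


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(200):
    n = rng.randint(2, 64)
    bits = [rng.randint(0, 1) for _ in range(n)]
    lengths = run_lengths(bits)
    for d in range(1, n):
        assert autocorrelation_from_runs(lengths, d) == direct_autocorrelation(bits, d)
print("run-based and direct values agree on all sampled sequences")
```

The sketch keeps the order of the runs rather than only their counts. Under this convention, the words 0010 and 0110 both have one run of length 2 and two runs of length 1, yet their lag-2 values are 0 and −2 respectively, so the order of the runs matters here.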

The paper also integrates the ARF into standard randomness test suites such as NIST SP 800‑22 and Dieharder. By replacing the native autocorrelation test with the ARF‑derived values, the overall test battery retains the same statistical conclusions while executing significantly faster, especially on very long sequences typical in cryptographic key generation and high‑throughput communication systems.

Beyond binary sequences, the authors sketch two extensions. First, they outline a generalization to q‑ary sequences by defining “runs” as maximal blocks of identical symbols and constructing a complex‑valued weight function that captures cross‑symbol correlations. Preliminary results suggest that a similar linear relationship holds, though the weight matrix becomes larger. Second, they discuss non‑uniform run definitions, such as runs with variable tolerance (e.g., allowing occasional flips within a block), and propose adaptive weight adjustments to accommodate such flexibility.

In conclusion, the paper delivers a unifying theorem that the run structure fully determines the autocorrelation function of any binary sequence. This insight bridges a long‑standing gap between two pillars of randomness analysis, offers a practical O(N) algorithm for autocorrelation computation, and opens avenues for extending the approach to multi‑symbol alphabets and more sophisticated run models. Future work is suggested in the direction of online ARF updates for streaming data, optimal weight design for q‑ary alphabets, and deeper exploration of the implications for coding theory, cryptography, and bio‑informatics where long binary strings are ubiquitous.

