How to Compute a Moving Sum
Windowed recurrences are sliding window calculations where a function is applied iteratively across the window of data, and are ubiquitous throughout the natural, social, and computational sciences. In this monograph we explore the computational aspects of these calculations, including sequential and parallel computation, and develop the theory underlying the algorithms and their applicability. We introduce an efficient new sequential algorithm with low latency, and develop new techniques to derive and analyze the complexity and domain of validity of existing sequential algorithms. For parallel computation we derive new parallel and vector algorithms by relating windowed recurrences to the algebraic construction of semidirect products, and to algorithms for exponentiation in semigroups. In the middle chapters of the monograph we further develop the theory of semi-associativity and the algebraic conditions for representing function composition and function application by data. This systematizes the techniques used by practitioners to parallelize recurrence calculations. We end the monograph with an extensive gallery of examples of interest to specialists in many fields. Throughout the monograph new algorithms are described with pseudo-code transcribed from functioning source code.
💡 Research Summary
The monograph “How to Compute a Moving Sum” presents a comprehensive study of sliding‑window calculations, focusing on moving sums as the canonical example but extending the discussion to arbitrary binary operations, including non‑associative and non‑commutative cases. The authors begin by formalizing the moving‑sum problem: given a stream a₁, a₂, … and a window length n, compute yᵢ = aᵢ + aᵢ₋₁ + … + aᵢ₋ₙ₊₁ for all i. They adopt a left‑to‑right insertion convention (new elements added on the left, old elements evicted on the right) because it aligns naturally with the functional notation addₓ(y) = x + y and simplifies the algebraic treatment of non‑commutative operators.
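The definition above can be transcribed directly into code. This is a minimal sketch of the naïve method, not the monograph's pseudo-code; the helper name `moving_sum_naive` is ours:

```python
def moving_sum_naive(a, n):
    """Recompute each length-n window from scratch:
    y_i = a_i + a_{i-1} + ... + a_{i-n+1}, taking O(N*n) time overall."""
    return [sum(a[i - n + 1 : i + 1]) for i in range(n - 1, len(a))]

print(moving_sum_naive([1, 2, 3, 4, 5], 3))  # [6, 9, 12]
```

For a commutative operation like `+` the insertion convention is invisible here; it matters only once non-commutative operators enter the picture.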
The early chapters review classic approaches. The naïve O(N·n) method recomputes each window from scratch. The “subtract‑on‑evict” technique maintains a running total, updating it in O(1) per step by subtracting the evicted element, but it requires careful handling of the initial window. The “difference of prefix sums” algorithm pre‑computes prefix sums and obtains each window by a single subtraction, achieving O(N) time with O(N) extra space. The authors analyze the trade‑offs of each method, especially regarding boundary conditions and the impact of floating‑point non‑associativity.
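The two classic O(N) approaches can be sketched as follows; the function names are ours, and the code illustrates the standard techniques rather than reproducing the monograph's pseudo-code:

```python
from itertools import accumulate

def moving_sum_evict(a, n):
    """Subtract-on-evict: maintain a running total in O(1) per step.
    The initial window must be summed separately before sliding begins."""
    total = sum(a[:n])
    out = [total]
    for i in range(n, len(a)):
        total += a[i] - a[i - n]  # add the new element, subtract the evicted one
        out.append(total)
    return out

def moving_sum_prefix(a, n):
    """Difference of prefix sums: O(N) time, O(N) extra space.
    prefix[i] holds a[0] + ... + a[i-1], so each window is one subtraction."""
    prefix = [0] + list(accumulate(a))
    return [prefix[i] - prefix[i - n] for i in range(n, len(prefix))]
```

Note that both rely on subtraction, i.e. on the operation being invertible; in floating point the repeated add/subtract of `moving_sum_evict` can also accumulate rounding error, which is one of the trade-offs the authors analyze.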
The core contributions are two new sequential algorithms: the “Two Stacks” method and the “Double‑Ended Window” (DEW) algorithm. The Two Stacks algorithm keeps an input stack for newly arriving elements and an output stack for elements about to be evicted. When the output stack empties, the input stack is reversed and becomes the new output stack. This yields constant‑time insert and evict operations, overall O(N) time, and O(n) auxiliary memory. The analysis introduces the “cumulative dominance” property and the Peter‑Paul lemma to prove that the algorithm’s latency never exceeds a constant bound, even in worst‑case input patterns.
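A minimal sketch of the two-stacks idea, assuming an associative operator with an identity element (the monograph's treatment covers the weaker semi-associative case); class and method names are ours:

```python
class TwoStacksWindow:
    """Moving aggregate over a sliding window using two stacks.
    Amortized O(1) insert/evict, O(n) auxiliary memory."""

    def __init__(self, op, identity):
        self.op, self.e = op, identity
        self.front = []  # evict side: (value, aggregate of value and all above)
        self.back = []   # insert side: (value, aggregate of all below and value)

    def push(self, x):
        agg = self.back[-1][1] if self.back else self.e
        self.back.append((x, self.op(agg, x)))

    def pop(self):
        if not self.front:  # "flip": reverse the input stack into the output stack
            while self.back:
                x, _ = self.back.pop()
                agg = self.front[-1][1] if self.front else self.e
                self.front.append((x, self.op(x, agg)))
        return self.front.pop()[0]

    def query(self):
        f = self.front[-1][1] if self.front else self.e
        b = self.back[-1][1] if self.back else self.e
        return self.op(f, b)  # oldest segment combined with newest segment

def moving_sum_two_stacks(a, n):
    w = TwoStacksWindow(lambda x, y: x + y, 0)
    out = []
    for i, x in enumerate(a):
        w.push(x)
        if i >= n:
            w.pop()
        if i >= n - 1:
            out.append(w.query())
    return out
```

The occasional flip is where the amortized analysis (and the Peter-Paul lemma mentioned above) does its work: each element is moved between stacks at most once, so the total cost stays linear even though a single flip can touch up to n elements.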
The DEW algorithm refines the two‑stack idea by explicitly separating “flip” (reversing a stack) and “slide” (moving the window one position). Each iteration performs at most three primitive operations: push the new element, possibly pop the oldest element, and update the aggregate using a user‑provided binary operator ∘ that satisfies a weakened associativity called semi‑associativity. The authors prove that DEW runs in exactly 3N elementary steps, uses O(n) memory, and provides bounded latency suitable for real‑time streaming systems. They also give a graphical representation via stacked staggered sequence diagrams, which clarifies data flow and aids implementation.
A substantial portion of the monograph is devoted to the algebraic framework that underpins these algorithms. The authors define a “windowed recurrence” as a recurrence relation over a set A of functions acting on a set B of data values, equipped with an action φ: A → End(B). When the binary operation is associative, the recurrence reduces to ordinary prefix sums; when it is only semi‑associative, the recurrence can still be evaluated efficiently using the presented algorithms. The monograph introduces the semidirect product A ⋉ B to model the interaction between functions and data, showing that many sliding‑window calculations can be expressed as exponentiation in such a semidirect product. This perspective unifies the sequential and parallel treatments.
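A small illustration of the semidirect-product idea, in the spirit of the framework described above though not the monograph's own construction: affine maps x ↦ a·x + b, represented as data pairs (a, b), compose by a rule in which the multiplicative part acts on the additive part. The example data are hypothetical.

```python
from functools import reduce

def compose(f, g):
    """Composition of affine maps as data: (f ∘ g)(x) = f(g(x)).
    This is exactly the multiplication of a semidirect product:
    the multiplicative component of f acts on the additive component of g."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a1 * b2 + b1)

# Because `compose` is associative, the recurrence y_i = a_i * y_{i-1} + b_i
# can be evaluated as a reduction over (a_i, b_i) pairs -- and reductions
# over associative operations parallelize.
coeffs = [(2, 1), (3, 0), (1, 5)]          # hypothetical coefficients
a, b = reduce(compose, reversed(coeffs))   # f3 ∘ f2 ∘ f1
y0 = 1
print(a * y0 + b)  # 14, the same as iterating the recurrence step by step
```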
Parallelization is addressed by relating sliding‑window recurrences to exponentiation in semigroups. The authors revisit classic parallel prefix algorithms (Kogge‑Stone, Blelloch) and reinterpret them as exponentiation in a semidirect product. They then apply addition‑chain techniques due to Brauer and Thurber to construct short exponentiation sequences, achieving logarithmic depth (⌈log₂ N⌉) and O(N log N) total work on a PRAM model. The monograph details how to adapt these techniques to vector processors and GPUs, providing pseudo‑code for SIMD‑friendly implementations. Multi‑query scenarios are handled by reusing the same addition chain for many windows, reducing per‑query work to O(log N) or better.
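The Kogge-Stone pattern referenced above can be simulated sequentially; each pass of the loop below corresponds to one parallel round on a PRAM or SIMD machine. This is a generic textbook scan, not the monograph's vectorized pseudo-code:

```python
import math

def kogge_stone_scan(a, op):
    """Inclusive scan in ceil(log2 N) rounds with O(N log N) total work.
    In round with offset `shift`, every element combines with the element
    `shift` places earlier; all combinations in a round are independent."""
    x = list(a)
    shift = 1
    while shift < len(x):
        x = [x[i] if i < shift else op(x[i - shift], x[i]) for i in range(len(x))]
        shift *= 2
    return x

print(kogge_stone_scan([1, 2, 3, 4, 5], lambda u, v: u + v))  # [1, 3, 6, 10, 15]
```

Combined with the difference-of-prefix-sums method, a scan like this yields all moving sums in logarithmic depth when the operation is invertible; the monograph's semidirect-product machinery extends the approach to operations that are not.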
The later chapters broaden the scope to categories and magmoids, allowing windowed recurrences over heterogeneous domains (e.g., mixing integers, matrices, and strings). The authors discuss how to lift the semi‑associative framework to these richer algebraic structures, preserving the ability to parallelize reductions.
Finally, the monograph presents a gallery of concrete applications: computing k‑mer frequencies in DNA sequences, rolling n‑gram statistics in natural‑language processing, real‑time FIR filtering in signal processing, and cumulative risk metrics in finance. For each case the authors identify the appropriate binary operation, verify its semi‑associative property, and select the most suitable algorithm (Two Stacks, DEW, or a vectorized prefix‑sum variant). They also report empirical benchmarks showing that DEW outperforms traditional sliding‑window implementations on both CPU and GPU platforms, especially when low latency is required.
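As a concrete instance of choosing the operation to fit the algorithm: `max` is associative but has no inverse, so the subtract-on-evict and prefix-difference methods do not apply, while the two-stacks method still works. This self-contained sketch (function names ours) computes a moving maximum:

```python
def moving_max(a, n):
    """Moving maximum via the two-stacks technique: amortized O(1) per
    element, no inverse operation required."""
    front, back = [], []  # each entry: (value, max of that stack segment)

    def push(stack, x):
        best = stack[-1][1] if stack else x
        stack.append((x, max(best, x)))

    out = []
    for i, x in enumerate(a):
        push(back, x)
        if i >= n:            # evict the oldest element
            if not front:     # flip the input stack into the output stack
                while back:
                    push(front, back.pop()[0])
            front.pop()
        if i >= n - 1:
            out.append(max(s[-1][1] for s in (front, back) if s))
    return out

print(moving_max([3, 1, 4, 1, 5], 3))  # [4, 4, 5]
```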
In summary, the monograph delivers a unified algebraic theory of sliding‑window recurrences, introduces two highly efficient sequential algorithms (Two Stacks and DEW) with provable constant latency, and extends the theory to parallel and vectorized contexts via semidirect products and addition‑chain exponentiation. The work bridges the gap between abstract algebraic insight and practical high‑performance code, offering a valuable toolkit for researchers and engineers dealing with streaming data across a wide range of scientific and engineering domains.