Complete Fusion for Stateful Streams: Equational Theory of Stateful Streams and Fusion as Normalization-by-Evaluation


Processing large amounts of data fast, in constant and small space is the point of stream processing and the reason for its increasing use. Alas, the most performant, imperative processing code tends to be almost impossible to read, let alone modify, reuse – or write correctly. We present both a stream compilation theory and its implementation as a portable stream processing library Strymonas that lets us assemble complex stream pipelines just by plugging in simple combinators, and yet attain the performance of hand-written imperative loops and state machines. The library supports finite and infinite streams and offers a rich set of combinators. They may be freely composed, and yet the resulting convoluted imperative code has no traces of combinator abstractions: no closures or intermediate objects. The high performance is portable and statically guaranteed, without relying on compiler or black-box optimizations. We greatly exceed in performance the available stream processing libraries in OCaml. The library generates C and OCaml code. The declaratively built Strymonas pipelines are all stateful. The stream state introduced in the library is not directly observable. Therefore, the Strymonas API looks like the familiar interface of 'pure functional' combinators. Programmers may introduce their own stream state and share it across the pipeline. Strymonas has been developed in tandem with the equational theory of stateful streams. Our theoretical model represents all desired pipelines and guarantees the existence of unique normal forms, which are mappable to (fused) state machines. We describe the normalization algorithm, as a form of normalization-by-evaluation. The equational theory lets us state and prove the correctness of the complete fusion optimization.


💡 Research Summary

Stream processing is essential for handling massive data streams in constant, small space, but achieving both high performance and high-level composability has long been a trade‑off. Hand‑written imperative loops and state machines deliver the best speed and minimal memory overhead, yet they are difficult to read, maintain, and reuse. Declarative stream libraries, on the other hand, provide composable combinators but suffer from closure allocation, intermediate buffers, and often cannot support operations such as zip or flat‑map without severe restrictions.

This paper introduces Strymonas, a portable stream‑processing library together with a rigorous equational theory of stateful coinductive streams. The theory defines streams as coinductive objects equipped with internal state, and provides a set of algebraic laws covering the full suite of combinators: map, filter, take/while, zip, flat‑map, zip‑with, sliding windows, and stateful variants of filter and flat‑map. Crucially, the authors prove that every well‑typed pipeline built from these combinators admits a unique normal form. This normal form corresponds directly to a fused state machine that contains no intermediate data structures, function calls, or closures.
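To give a feel for what "algebraic laws over combinators" means, here is a minimal Python sketch that checks three textbook stream laws on sample data. The laws shown are the familiar ones for pure streams; the paper's own laws for stateful streams are not reproduced here, and the helper names (`smap`, `sfilter`, `take`, `compose`) are hypothetical stand-ins rather than the Strymonas API.

```python
from itertools import islice

# Hypothetical stand-ins for stream combinators, built on Python iterators.
def smap(f, s):
    return (f(x) for x in s)

def sfilter(p, s):
    return (x for x in s if p(x))

def take(n, s):
    return islice(s, n)

def compose(g, f):
    return lambda x: g(f(x))

xs = list(range(20))
f = lambda x: x + 1
g = lambda x: x * x
p = lambda x: x % 3 == 0

# Law 1: map g (map f s) = map (g . f) s
lhs_map = list(smap(g, smap(f, xs)))
rhs_map = list(smap(compose(g, f), xs))

# Law 2: filter p (map f s) = map f (filter (p . f) s)
lhs_filter = list(sfilter(p, smap(f, xs)))
rhs_filter = list(smap(f, sfilter(compose(p, f), xs)))

# Law 3: take n (map f s) = map f (take n s)
lhs_take = list(take(5, smap(f, xs)))
rhs_take = list(smap(f, take(5, xs)))

print(lhs_map == rhs_map, lhs_filter == rhs_filter, lhs_take == rhs_take)
```

Read left to right, each law removes one intermediate stream. Applying such rewrites until none applies is one informal way to picture how a pipeline reaches a normal form.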

The transformation from a high‑level pipeline to its normal form is performed by a normalization‑by‑evaluation (NbE) algorithm. NbE interprets each combinator as a code‑generation function rather than a value, stages these functions, and then evaluates them to produce low‑level imperative code. Two key technical contributions enable full fusion:

  1. Zip‑conversion – zip is eliminated by translating the two input streams into synchronized state variables, allowing pipelines that combine zip with flat‑map (e.g., compressed/decompressed stream correlation) to be compiled into a single loop.
  2. Linearization – flat‑map, which introduces nested streams, is transformed into a linear stream form. When necessary, a staged closure‑conversion step removes any remaining higher‑order constructs.
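The idea of treating a combinator as a code-generation function can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the paper's NbE algorithm or Strymonas internals: a stream is modelled as a function that takes a consumer (from an element's variable name to lines of code) and returns the lines of a loop, and all names (`of_range`, `smap`, `sfilter`, `fold_sum`) are hypothetical.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Return a variable name not used elsewhere in the generated code."""
    return f"{prefix}{next(_counter)}"

def indent(lines):
    return ["    " + line for line in lines]

def of_range(lo, hi):
    # Source: produces the loop skeleton; `k` generates the loop body.
    def gen(k):
        i = fresh("i")
        return [f"{i} = {lo}", f"while {i} < {hi}:"] + indent(k(i)) + [f"    {i} += 1"]
    return gen

def smap(expr_fn, s):
    # map: bind the transformed element to a fresh variable, then continue.
    def gen(k):
        def k2(x):
            y = fresh("t")
            return [f"{y} = {expr_fn(x)}"] + k(y)
        return s(k2)
    return gen

def sfilter(pred_fn, s):
    # filter: wrap the rest of the body in a conditional.
    def gen(k):
        return s(lambda x: [f"if {pred_fn(x)}:"] + indent(k(x)))
    return gen

def fold_sum(s):
    # Consumer: accumulate into a scalar and expose it as `result`.
    acc = fresh("acc")
    lines = [f"{acc} = 0"] + s(lambda x: [f"{acc} += {x}"]) + [f"result = {acc}"]
    return "\n".join(lines)

# Sum of squares of the even numbers below n.
src = fold_sum(
    smap(lambda x: f"{x} * {x}",
         sfilter(lambda x: f"{x} % 2 == 0",
                 of_range("0", "n"))))
print(src)

env = {"n": 10}
exec(src, env)               # run the generated loop
result_value = env["result"]
```

The printed program is a single `while` loop with an `if` inside it; the Python closures used to build it exist only at generation time and leave no trace in the output. Zip and flat-map need more machinery than this sketch provides, which is where the two steps listed above come in.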

The theory also supports explicit shared state: programmers can declare mutable references that are visible to multiple operators, enabling stateful filters and stateful flat‑maps that are essential for many real‑world codecs. All transformations are proven correct in a mechanised proof assistant, guaranteeing that the generated code faithfully implements the original declarative specification.
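As a rough picture of state shared between operators, the following Python sketch allocates one mutable cell per pipeline run, lets a map stage write it, and lets a downstream filter stage read it. The names `Ref`, `with_state`, and `pipeline` are hypothetical and chosen for illustration; they are not the Strymonas interface for declaring state.

```python
class Ref:
    """A single mutable cell, standing in for a declared stream state."""
    def __init__(self, value):
        self.value = value

def with_state(init, build):
    # Allocate a fresh cell for each pipeline run and pass it to the builder,
    # so the state is not observable from outside the pipeline.
    return build(Ref(init))

def pipeline(xs):
    def build(total):
        def add(x):
            total.value += x      # operator 1 writes the shared cell
            return x
        summed = map(add, xs)     # lazy: runs one element at a time
        # operator 2 reads the same cell: keep x when the running sum is even
        return (x for x in summed if total.value % 2 == 0)
    return with_state(0, build)

first_run = list(pipeline([1, 3, 2, 5, 1]))
second_run = list(pipeline([1, 3, 2, 5, 1]))
print(first_run, second_run)
```

Because the cell is created inside `with_state`, each run starts from the same initial state, so the pipeline still behaves like an ordinary function of its input even though its stages communicate through mutation.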

Implementation-wise, Strymonas is written in OCaml and Scala 3, using multi‑stage programming to generate target code in C, OCaml, or Scala. The generated code is a plain while‑loop with a fixed set of scalar variables; memory usage is constant regardless of input length, and the code is type‑safe and hygienic by construction.
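To show the shape of code described here, a plain loop over a fixed set of scalar variables, the sketch below pairs a declarative pipeline that zips a flat-mapped stream with a second stream against a hand-fused equivalent. It is written by hand for illustration, assumes list inputs, and is not actual Strymonas output; the function names are hypothetical.

```python
from itertools import chain

def declarative(xs, ys):
    # flat_map each x to 0..x-1, zip with ys, and sum the products.
    inner = chain.from_iterable(range(x) for x in xs)
    return sum(a * b for a, b in zip(inner, ys))

def fused(xs, ys):
    acc = 0
    i = 0       # index into xs (outer stream of the flat-map)
    j = 0       # position within the current inner stream
    limit = 0   # length of the current inner stream
    k = 0       # index into ys (second zipped stream)
    while k < len(ys):
        if j >= limit:
            # Inner stream exhausted: advance the outer stream, or stop.
            if i >= len(xs):
                break
            limit = xs[i]
            i += 1
            j = 0
            continue
        acc += j * ys[k]   # one zipped pair: (inner element, ys element)
        j += 1
        k += 1
    return acc

xs = [2, 0, 3]
ys = [10, 20, 30, 40, 50, 60]
print(declarative(xs, ys), fused(xs, ys))
```

The fused version allocates no iterators or tuples: the nested stream is tracked by `i`, `j`, and `limit`, and the zip is just the shared advance of `j` and `k`. Its memory use does not depend on the length of either input.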

Empirical evaluation compares Strymonas‑generated code against hand‑written C loops and against existing OCaml stream libraries. For simple pipelines (map‑filter‑take) the overhead relative to hand‑written code is under 5 %; for complex pipelines involving zip, flat‑map, and stateful operations the overhead remains below 15 %. In all cases Strymonas outperforms the OCaml libraries by a factor of 2–4, and stays within these margins of hand‑optimized code while providing a high‑level declarative API. Notably, pipelines that could not previously be fully fused in other DSLs (e.g., zip combined with flat‑map) are compiled to fully fused loops, demonstrating the practical impact of the theory.

The paper defines complete fusion as the absence of any function calls or temporary allocations in the final program, assuming each individual operator can be implemented without such overhead. By construction, Strymonas guarantees complete fusion for every supported pipeline. The authors also discuss the limits of their approach: split‑join (multiple outputs) is deliberately omitted to avoid unbounded buffering, though limited lock‑step split‑join can be simulated via tupling. Parallel execution and dynamic scheduling are left for future work.

In summary, the work delivers a unified framework where a formally verified equational theory of stateful streams directly drives a practical, high‑performance library. It shows that declarative stream programming can achieve the same efficiency as hand‑crafted imperative code, while offering extensibility, safety, and portability across multiple backends. This bridges a long‑standing gap between expressive DSLs and low‑level performance in the domain of stream processing.

