WEX: Formal Specifications for Windows in Stream Processing
A key operation in processing an unbounded data stream is windowing, which extracts finite portions of the stream for further handling. Existing frameworks and query languages either require windows to be defined using ad hoc imperative languages or are limited to rudimentary constructs such as time- or count-based windows. We propose Window EXpression (WEX), a formal specification for precisely expressing windowing constructs based on monadic second-order logic. WEX can naturally express traditional windowing constructs such as sliding windows and tumbling windows, as well as more complex windows whose start and end indices are triggered by the satisfaction of given logical conditions. After introducing a model of symbolic automata with lookbacks over an alphabet theory, we present an equivalent representation of WEX based on symbolic regular expressions. The precise semantics of windowing enable static analysis over WEX. In particular, we show that, in general, it is undecidable to check whether a WEX allows an unbounded number of overlapping windows. However, when the data stream is over a finite alphabet, or the alphabet theory has the so-called completion property, the problem becomes decidable.
💡 Research Summary
The paper addresses a fundamental limitation in modern stream processing systems: the lack of a rigorous, expressive, and analyzable way to define windows. Windows are the basic unit that allows a processor to reason about a finite segment of an otherwise infinite data stream. Existing platforms either restrict users to simple time‑ or count‑based windows, or they force users to write ad‑hoc imperative code to describe more complex windowing logic. Both approaches hinder declarative query formulation, impede static verification, and make it difficult to reason about resource consumption such as the number of overlapping windows that may be generated at a single stream position.
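To make the overlap concern concrete, the following is a minimal Python sketch, not the paper's formalism or algorithm. It assumes a window can be modelled as a pair of inclusive start and end indices accepted by a predicate over a finite stream prefix. The names `windows`, `tumbling`, `sliding`, `threshold_triggered`, and `max_overlap` are hypothetical and chosen only for illustration. The sketch enumerates windows by brute force and counts how many cover a single position.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int]  # (start, end), both indices inclusive
Predicate = Callable[[Sequence[int], int, int], bool]


def windows(stream: Sequence[int], is_window: Predicate) -> List[Window]:
    """Enumerate every index pair (s, e), s <= e, accepted by the predicate."""
    n = len(stream)
    return [(s, e) for s in range(n) for e in range(s, n)
            if is_window(stream, s, e)]


def tumbling(size: int) -> Predicate:
    """Count-based, non-overlapping windows of a fixed size."""
    return lambda stream, s, e: s % size == 0 and e - s + 1 == size


def sliding(size: int, slide: int) -> Predicate:
    """Count-based windows of a fixed size, opened every `slide` positions."""
    return lambda stream, s, e: s % slide == 0 and e - s + 1 == size


def threshold_triggered(limit: int) -> Predicate:
    """Illustrative condition-triggered window (an assumption of this sketch):
    it opens at any value above `limit` and closes at the first later value
    that is at most `limit`."""
    return lambda stream, s, e: (
        s < e
        and stream[e] <= limit
        and all(v > limit for v in stream[s:e])
    )


def max_overlap(ws: List[Window], n: int) -> int:
    """Largest number of windows covering any single position in 0..n-1."""
    return max((sum(1 for s, e in ws if s <= p <= e) for p in range(n)),
               default=0)


stream = [1, 5, 6, 7, 2, 8, 1]
for name, pred in [("tumbling(2)", tumbling(2)),
                   ("sliding(3, 1)", sliding(3, 1)),
                   ("threshold_triggered(4)", threshold_triggered(4))]:
    ws = windows(stream, pred)
    print(name, ws, max_overlap(ws, len(stream)))
```

In this sketch the tumbling windows never overlap and the sliding windows overlap by at most their size, whereas the condition-triggered predicate opens one window per position in a run of values above the limit, so its overlap grows with the length of that run. This is the kind of behaviour a static analysis of window specifications would need to detect.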
To solve this, the authors introduce Window EXpression (WEX), a formal language for specifying windows. WEX is built on Monadic Second‑Order logic (MSO) of one successor, which naturally incorporates the linear order of stream positions. A WEX specification consists of a guarded MSO formula ϕ(xₛ, xₑ) with two free variables representing the start and end indices of a window. A pair (a, b) of indices defines a window if the sub‑stream w