Recognizing well-parenthesized expressions in the streaming model
Motivated by a concrete problem and with the goal of understanding the sense in which the complexity of streaming algorithms is related to the complexity of formal languages, we investigate the problem Dyck(s) of checking matching parentheses, with $s$ different types of parenthesis. We present a one-pass randomized streaming algorithm for Dyck(2) with space $\Order(\sqrt{n}\log n)$, time per letter $\polylog (n)$, and one-sided error. We prove that this one-pass algorithm is optimal, up to a $\polylog n$ factor, even when two-sided error is allowed. For the lower bound, we prove a direct sum result on hard instances by following the “information cost” approach, but with a few twists. Indeed, we play a subtle game between public and private coins. This mixture between public and private coins results from a balancing act between the direct sum result and a combinatorial lower bound for the base case. Surprisingly, the space requirement shrinks drastically if we have access to the input stream in reverse. We present a two-pass randomized streaming algorithm for Dyck(2) with space $\Order((\log n)^2)$, time $\polylog (n)$ and one-sided error, where the second pass is in the reverse direction. Both algorithms can be extended to Dyck(s) since this problem is reducible to Dyck(2) for a suitable notion of reduction in the streaming model.
💡 Research Summary
The paper investigates the classic Dyck(s) language recognition problem in the streaming model, where the input is a sequence of parentheses of s different types and the task is to verify that they are properly matched. The authors focus on the most fundamental case, Dyck(2), and present both algorithmic upper bounds and matching lower bounds, then extend the results to arbitrary s.
One‑pass randomized algorithm (Dyck(2)).
The input of length n is divided into blocks of size √n. Within each block a conventional stack is maintained, guaranteeing exact verification locally. At block boundaries the stack's surviving content is compressed into a polynomial fingerprint: a parenthesis sitting at stack height h contributes a term ±x^h, with the coefficient determined by its type, and the polynomial is evaluated at a random point x modulo a large prime p. The fingerprint fits in O(log n) bits and can be updated as each symbol arrives, so the whole algorithm needs O(√n log n) space. The randomness is supplied by public coins; the error is one‑sided and stems only from hash collisions, which can be driven down to 1/poly(n) by choosing p and the random evaluation point appropriately. Each symbol is processed in polylog(n) time.
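The stack fingerprint described above can be made concrete. The sketch below is illustrative rather than the paper's exact scheme: it shows only the hashing ingredient (a parenthesis pushed at height h contributes type·x^h mod p, and a pop subtracts the closer's term, so mismatches leave a nonzero residue with high probability); the √n blocking that gives the stated space bound is omitted, and the prime and encoding are our own choices.

```python
import random

P = (1 << 61) - 1  # a large Mersenne prime (our choice, not the paper's)

class StackFingerprint:
    """O(log n)-bit fingerprint of a stack of parenthesis types,
    updatable in O(1) arithmetic operations per push/pop."""
    def __init__(self, x):
        self.x = x          # random evaluation point
        self.h = 0          # current stack height
        self.val = 0        # sum of type_i * x**i mod P over stack cells
        self.pow = [1]      # cached powers of x

    def _xpow(self, i):
        while len(self.pow) <= i:
            self.pow.append(self.pow[-1] * self.x % P)
        return self.pow[i]

    def push(self, a):      # a = 1 or 2 encodes the parenthesis type
        self.val = (self.val + a * self._xpow(self.h)) % P
        self.h += 1

    def pop(self, a):       # subtract the term a closer of type a expects
        self.h -= 1
        self.val = (self.val - a * self._xpow(self.h)) % P

def dyck2_fingerprint_check(stream):
    """Accept iff every push is cancelled by a matching pop (w.h.p. over x).
    Errors are one-sided: a well-parenthesized input is always accepted."""
    x = random.randrange(2, P)
    fp = StackFingerprint(x)
    for c in stream:
        if c in "([":
            fp.push(1 if c == "(" else 2)
        else:
            if fp.h == 0:
                return False        # closer with no opener left
            fp.pop(1 if c == ")" else 2)
    return fp.h == 0 and fp.val == 0
```

A type mismatch such as "([)]" leaves a residue (a − a′)·x^h in `val`, which is nonzero modulo P for any evaluation point, so only hash collisions across different heights can cause a false accept.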
Space lower bound.
To show optimality, the authors construct a hard distribution over inputs and apply the information‑cost framework. They devise a subtle game mixing public randomness (shared across the whole protocol) with private randomness (drawn independently for each block). This mixture is what makes a direct‑sum argument go through: a combinatorial argument establishes the base case, showing that any protocol must convey a non‑trivial amount of information about a single hard block, and replicating the block many times multiplies the total information cost. Consequently, any one‑pass streaming algorithm (even with two‑sided error) requires Ω(√n) space up to polylogarithmic factors, matching the upper bound.
Two‑pass algorithm with a reverse pass.
If the algorithm is allowed a second pass that reads the stream backwards, the space can be reduced dramatically. Each pass checks one "direction" of the matching using the same polynomial fingerprints, but instead of an explicit stack the algorithm keeps a compressed stack in which segments of the stack are summarized by fingerprints. At any point only O(log n) such fingerprints are live, each of O(log n) bits, yielding total space O((log n)²); the input is accepted only if every fingerprint check cancels to zero, so the full stack is never materialized. The algorithm remains randomized with one‑sided error and runs in polylog(n) time per symbol.
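One identity behind the reverse pass can be made concrete: reading a well-parenthesized word backwards while swapping each parenthesis with its partner yields another well-parenthesized word, so machinery built for the forward direction can be reused on the backward stream. A minimal sketch (the helper name is ours):

```python
# Mirror map: each parenthesis is swapped with its partner.
SWAP = {"(": ")", ")": "(", "[": "]", "]": "["}

def reverse_view(w):
    """Return the word the reverse pass effectively reads: the input
    reversed, with every parenthesis replaced by its partner. This maps
    Dyck(2) words to Dyck(2) words and non-members to non-members."""
    return "".join(SWAP[c] for c in reversed(w))
```

For example, `reverse_view("([])()")` yields `"()([])"`, which is again well-parenthesized.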
Extension to Dyck(s).
The authors show that Dyck(s) reduces to Dyck(2) in the streaming setting. Each of the s parenthesis types is encoded as a distinct binary pattern over the two primitive parenthesis types, with the closing pattern mirroring the opening one. The reduction can be performed on the fly, so the same algorithms and lower bounds apply, with n scaled by the encoding length, i.e. a factor of O(log s).
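The on-the-fly encoding can be sketched as follows. This is a hedged illustration of the standard reduction, not the paper's verbatim construction; the `(kind, t)` input format and function name are our own conventions. Each opening of type t becomes a ⌈log₂ s⌉-symbol pattern over '(' and '[', and the matching closer becomes the mirrored pattern, so matched pairs cancel exactly when the types agree.

```python
from math import ceil, log2

def reduce_to_dyck2(word, s):
    """Encode a Dyck(s) word (a list of ('open'|'close', type) pairs,
    types in 0..s-1) as a Dyck(2) word over the alphabet ( ) [ ].
    The image is well-parenthesized iff the original word is."""
    k = max(1, ceil(log2(s)))           # bits per type
    out = []
    for kind, t in word:
        bits = [(t >> i) & 1 for i in range(k)]
        if kind == "open":
            # bit 0 -> '(' , bit 1 -> '['
            out.append("".join("(["[b] for b in bits))
        else:
            # mirrored: reversed bit order, matching closers
            out.append("".join(")]"[b] for b in reversed(bits)))
    return "".join(out)
```

A mismatched closer of type t′ against an opener of type t differs in at least one bit, which surfaces as a primitive-type mismatch in the encoded word, so the Dyck(2) algorithm rejects it.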
Implications and future work.
The work bridges streaming complexity and formal language theory, demonstrating that information‑theoretic techniques can yield tight space lower bounds for language‑recognition problems. It also highlights the power of a single reverse pass, suggesting that modest bidirectional access can dramatically reduce memory requirements. Open directions include exploring multi‑pass models, extending the approach to broader non‑regular language families, and investigating whether similar reductions exist for other context‑free languages.