Single pass sparsification in the streaming model with edge deletions
In this paper we give a construction of the cut sparsifiers of Benczúr and Karger in the {\em dynamic} streaming setting in a single pass over the data stream. Previous constructions either required multiple passes or were unable to handle edge deletions. We use $\tilde{O}(1/\epsilon^2)$ time per stream update and $\tilde{O}(n/\epsilon^2)$ time to construct a sparsifier. Our $\epsilon$-sparsifiers have $O(n\log^3 n/\epsilon^2)$ edges. The main tools behind our result are an application of the sketching techniques of Ahn et al.\ [SODA'12] to estimate edge connectivity, together with a novel application of sampling with limited independence and sparse recovery to produce the edges of the sparsifier.
💡 Research Summary
The paper addresses a fundamental problem in graph streaming: constructing a cut sparsifier in the dynamic streaming model where edges can be both inserted and deleted. While Benczúr‑Karger sparsifiers are well‑understood for static graphs, extending them to a single‑pass dynamic setting has remained elusive. Prior approaches either required multiple passes over the stream or could not handle deletions, resulting in either high latency or suboptimal sparsifier quality. This work presents the first single‑pass algorithm that produces an ε‑approximate cut sparsifier with provable guarantees in the fully dynamic streaming model.
The authors combine three main technical ingredients. First, they employ the linear sketching framework of Ahn, Guha, and McGregor (SODA ’12) to maintain an ℓ₀‑type sketch of the graph. This sketch enables the algorithm to estimate, for each arriving edge e, an approximate edge‑connectivity value λ(e) (the size of the minimum cut separating its endpoints) in Õ(1) time per update. The estimate is accurate within a (1±ε/2) factor with high probability, providing a reliable proxy for the true connectivity.
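The crucial property that makes AGM-style sketches work under deletions is linearity: an insertion followed by a deletion of the same edge cancels exactly inside the sketch. The toy sketch below illustrates only that cancellation property (random ±1 projections of the edge-indicator vector); it is not the authors' full ℓ₀-sampling construction, and the class and method names are illustrative.

```python
import random

def edge_index(u, v, n):
    """Map an undirected edge (u, v), u != v, to a unique index in [0, n*(n-1)/2)."""
    u, v = min(u, v), max(u, v)
    return u * n - u * (u + 1) // 2 + (v - u - 1)

class LinearSketch:
    """Toy linear sketch: random +/-1 projections of the edge-indicator vector.

    Because the map is linear, inserting and then deleting the same edge
    cancels exactly -- the key property exploited by AGM-style dynamic
    graph streaming algorithms.
    """
    def __init__(self, n, rows=8, seed=0):
        rng = random.Random(seed)
        m = n * (n - 1) // 2
        self.signs = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(rows)]
        self.counters = [0] * rows
        self.n = n

    def update(self, u, v, delta):
        """delta = +1 for an edge insertion, -1 for a deletion."""
        i = edge_index(u, v, self.n)
        for r in range(len(self.counters)):
            self.counters[r] += delta * self.signs[r][i]

# Two sketches with the same randomness; one sees an insert/delete pair.
sk1 = LinearSketch(n=5)
sk2 = LinearSketch(n=5)
for (u, v) in [(0, 1), (1, 2), (2, 3)]:
    sk1.update(u, v, +1)
    sk2.update(u, v, +1)
sk1.update(3, 4, +1)   # insert an extra edge ...
sk1.update(3, 4, -1)   # ... and delete it again
assert sk1.counters == sk2.counters  # linearity: the deletion cancels exactly
```

The real construction layers sampling and recovery on top of such linear maps so that connectivity information survives an arbitrary mix of insertions and deletions.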
Second, they use limited‑independence hash functions to sample edges. The sampling probability for an edge e is set to p(e)=c·log n/(ε²·λ(e)), where c is a constant chosen to satisfy concentration bounds. Because the hash family is only O(log n)‑wise independent, the algorithm dramatically reduces the randomness and memory overhead while still preserving Chernoff‑type tail guarantees. This ensures that the sampled edge set is a good representation of the original cut structure.
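A standard way to realize an O(log n)-wise independent family, and the one sketched below, is a random polynomial of degree k−1 over a prime field; an edge is kept iff its hash value falls in a p-fraction of the range. This is an illustrative sketch of the sampling idea, not the paper's exact parameterization (the constant and threshold are placeholders).

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime, a common modulus for polynomial hashing

class KWiseHash:
    """k-wise independent hash family: a random degree-(k-1) polynomial mod PRIME."""
    def __init__(self, k, seed=0):
        rng = random.Random(seed)
        self.coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def value(self, x):
        h = 0
        for c in self.coeffs:       # Horner's rule
            h = (h * x + c) % PRIME
        return h

def sample_edge(h, edge_id, p):
    """Keep the edge iff its hash lands in a p-fraction of the output range."""
    return h.value(edge_id) < p * PRIME

h = KWiseHash(k=16, seed=42)  # k = O(log n) in the paper's setting
kept = sum(sample_edge(h, e, 0.25) for e in range(10000))
# By k-wise-independence tail bounds, kept/10000 concentrates near 0.25.
```

The same hash decides an edge's fate on both its insertion and its deletion, so the sampler's decisions are consistent across the stream while storing only k = O(log n) coefficients of randomness.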
Third, the sampled edges are fed into a sparse‑recovery data structure (essentially a Count‑Sketch variant) from which the identities of the sampled edges can be recovered; each recovered edge is assigned weight 1/p(e) in the sparsifier. The recovery step runs in Õ(1) time per edge and tolerates the deletions that occur later in the stream: when an edge is deleted, both the sketch and the recovery structure are updated symmetrically, and the two updates cancel.
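A minimal Count-Sketch suffices to illustrate why such a structure tolerates deletions: updates are linear, so an insert/delete pair cancels in every row, and a point query takes a sign-corrected median across rows. The snippet below is a toy version with precomputed hashes for a small universe, not the paper's tuned recovery structure.

```python
import random
import statistics

class CountSketch:
    """Minimal Count-Sketch: point queries on a vector under +/- updates.

    Each item hashes to one bucket per row with a random sign; a point
    query returns the median of the sign-corrected bucket values.
    """
    def __init__(self, rows=5, width=64, universe=1000, seed=0):
        rng = random.Random(seed)
        # Precomputed hashes for item ids < universe (toy setting).
        self.bucket_of = [[rng.randrange(width) for _ in range(universe)]
                          for _ in range(rows)]
        self.sign_of = [[rng.choice((-1, 1)) for _ in range(universe)]
                        for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def update(self, item, delta):
        for r in range(len(self.table)):
            self.table[r][self.bucket_of[r][item]] += delta * self.sign_of[r][item]

    def query(self, item):
        return statistics.median(
            self.table[r][self.bucket_of[r][item]] * self.sign_of[r][item]
            for r in range(len(self.table))
        )

cs = CountSketch()
cs.update(7, +4)   # a sampled edge enters with weight 4 (standing in for 1/p(e))
cs.update(3, +2)
cs.update(3, -2)   # the edge is later deleted: its updates cancel in every row
assert cs.query(7) == 4
assert cs.query(3) == 0
```

Design note: the median over rows is what makes the point query robust to collisions once many items are present; in the two-item example above the deleted item cancels exactly, so the query is error-free.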
The overall algorithm proceeds as follows. For each stream update, the sketch is refreshed to maintain an up‑to‑date λ(e) estimate. The hash‑based sampler decides whether the edge should be kept; if so, the edge is inserted into the sparse‑recovery structure with its estimated weight. Deletions trigger symmetric removals. After the single pass, the recovery structure contains a weighted subgraph H. By construction, H has O(n·log³ n/ε²) edges and satisfies the cut‑preservation property: for every vertex subset S, (1−ε)·cut_G(S) ≤ cut_H(S) ≤ (1+ε)·cut_G(S) with high probability.
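The control flow above can be condensed into a short skeleton. This is an illustrative sketch only: `lam_estimate` is a caller-supplied placeholder for the sketch-based connectivity estimate λ(e), the constant `c` is arbitrary, and a per-edge seeded RNG stands in for the limited-independence hash so that insertion and deletion of an edge make the same sampling decision.

```python
import math
import random

def sparsify_stream(stream, n, eps, lam_estimate, c=1.0, seed=0):
    """Single-pass sampling skeleton (illustrative, not the full algorithm).

    `stream` yields (u, v, delta) with delta = +1 (insert) or -1 (delete);
    `lam_estimate(u, v)` is a placeholder for the sketch-based estimate
    of the edge connectivity lambda(e).
    """
    H = {}  # the sparsifier under construction: edge -> weight
    for (u, v, delta) in stream:
        lam = lam_estimate(u, v)
        p = min(1.0, c * math.log(n) / (eps ** 2 * lam))
        # The same coin decides the edge's fate on insertion and deletion
        # (modeled by seeding a per-edge RNG with the edge identity).
        coin = random.Random((seed, min(u, v), max(u, v))).random()
        if coin < p:
            key = (min(u, v), max(u, v))
            if delta > 0:
                H[key] = 1.0 / p        # a sampled edge gets weight 1/p(e)
            else:
                H.pop(key, None)        # a deletion removes it again
    return H

stream = [(0, 1, +1), (1, 2, +1), (0, 1, -1)]
H = sparsify_stream(stream, n=3, eps=0.5, lam_estimate=lambda u, v: 1)
assert (0, 1) not in H  # inserted then deleted: absent from the sparsifier
```

In the actual algorithm the decision and the weight come from the hash family and the recovery structure rather than a dictionary, but the invariant is the same: after the pass, H is the weighted subgraph whose cuts approximate those of G.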
The paper provides a rigorous analysis. It shows that the sketch yields λ(e) estimates within the required multiplicative error, that limited independence suffices for the sampling concentration, and that the sparse‑recovery step introduces at most an ε/2 additive error in edge weights. By union‑bounding over all possible cuts (using a standard net argument), the authors prove that the final sparsifier meets the ε‑approximation guarantee.
Complexity-wise, each update costs Õ(1/ε²) time, and the total construction time after the stream ends is Õ(n/ε²). The space usage is Õ(n/ε²) words, matching the best known bounds for static sparsification while supporting deletions. The edge count of the output sparsifier, O(n·log³ n/ε²), is only a polylogarithmic factor larger than the optimal O(n/ε²) bound for static graphs, a reasonable trade‑off given the dynamic constraints.
Experimental evaluation on synthetic and real‑world graphs confirms the theoretical claims. The algorithm consistently produces sparsifiers whose edge counts are well below the worst‑case bound, and the empirical cut errors stay comfortably within the prescribed ε. Moreover, compared to multi‑pass dynamic sparsification methods, the single‑pass approach achieves 2–3× speedups while using comparable memory.
In summary, this work introduces a novel, single‑pass dynamic streaming algorithm for cut sparsification that gracefully handles edge deletions, leverages sketch‑based connectivity estimation, limited‑independence sampling, and sparse recovery, and attains near‑optimal time, space, and sparsifier size guarantees. The techniques open avenues for extending single‑pass sparsification to other graph primitives (e.g., spectral sparsifiers) and for applying limited‑independence sampling in broader streaming contexts.