A Lower Bound for Estimating High Moments of a Data Stream

February 23, 2026


📝 Original Info

Title: A Lower Bound for Estimating High Moments of a Data Stream
ArXiv ID: 1201.0253
Date: 2015-03-19
Authors: (Author information was not provided in the paper, so the authors are unknown.)

📝 Abstract

We show an improved lower bound for the $F_p$ estimation problem in a data stream setting for $p > 2$. A data stream is a sequence of items from the domain $[n]$ with possible repetitions. The frequency vector $x$ is an $n$-dimensional non-negative integer vector such that $x(i)$ is the number of occurrences of $i$ in the sequence. Given an accuracy parameter $\Omega(n^{-1/p}) < \epsilon < 1$, the problem of estimating $F_p$ is to estimate $\|x\|_p^p = \sum_{i \in [n]} |x(i)|^p$ correctly to within a relative accuracy of $1 \pm \epsilon$ with high constant probability in an online fashion and using as little space as possible. The current space lower bound for this problem is $\Omega(n^{1-2/p} \epsilon^{-2/p} + n^{1-2/p} \epsilon^{-4/p} / \log^{O(1)}(n) + (\epsilon^{-2} + \log(n)))$. The first term in the lower bound expression was proved in \cite{B-YJKS:stoc02,cks:ccc03}, the second in \cite{wz:arxiv11} and the third in \cite{wood:soda04}. In this note, we show an $\Omega(p^2 n^{1-2/p} \epsilon^{-2} / \log(n))$ bits space bound, for $\Omega(p n^{-1/p}) \le \epsilon \le 1/10$.

💡 Deep Analysis

📄 Full Content

In the insert-only data streaming model, a stream is modeled as a sequence of items $i_1, i_2, \ldots$, where the items come from a large domain $[n] = \{1, 2, \ldots, n\}$. The frequency vector is an $n$-dimensional vector $x$ whose $i$th coordinate $x(i)$ counts the number of occurrences of $i$ in the sequence. Each new arrival of an item $i_j$ increments $x(i_j)$ to $x(i_j) + 1$. Define $\|x\|_p^p = \sum_{i \in [n]} |x(i)|^p$. The $p$th moment estimation problem, with accuracy parameter $\epsilon$, is to design a structure that can process the stream in an online fashion and return a real value $\hat{F}_p$ satisfying $|\hat{F}_p - \|x\|_p^p| \le \epsilon \|x\|_p^p$ with probability 9/10. The estimate $\hat{F}_p$ may use only the structure and not the original stream; that is, the stream may be processed only in an online fashion.

The $F_p$ estimation problem has played a pivotal role in the study of data streaming algorithms. It was first posed and studied by Alon, Matias and Szegedy [1]. They showed that for all $p \ne 1$, a deterministic $\epsilon$-accurate $F_p$ estimation with $\epsilon \le 1/8$ requires $\Omega(n)$ bits, as does a randomized algorithm with no error. This reduces the scope to approximate randomized algorithms, or randomized PTAS. A series of works [1,2,3] culminated in a lower bound of $\Omega(n^{1-2/p} \epsilon^{-2/p})$ bits for $\epsilon$-accurate $F_p$ estimation. Very recently, Woodruff and Zhang [6] improved this bound to $\tilde{\Omega}(n^{1-2/p} \epsilon^{-4/p})$ bits, where $\tilde{\Omega}(f(n, \epsilon))$ denotes $\Omega(f(n, \epsilon) / \log^{O(1)}(n/\epsilon))$. Woodruff [5] shows an $\Omega(\epsilon^{-2} + \log(n))$ bits bound for $F_p$, for all $p \ne 1$. So, the current lower bound for $F_p$ estimation in bits is:

$$\Omega\left(n^{1-2/p} \epsilon^{-2/p} + n^{1-2/p} \epsilon^{-4/p} / \log^{O(1)}(n) + \epsilon^{-2} + \log(n)\right).$$

In this note, we show a lower bound of $\Omega(p^2 n^{1-2/p} \epsilon^{-2} / \log(n))$ bits for this problem, improving upon the currently known bounds.
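The streaming model and the accuracy guarantee defined above can be made concrete with a small exact reference computation. This is only an illustrative sketch: the function names `frequency_vector`, `moment` and `is_accurate` are hypothetical, and the code stores $x$ explicitly (linear space), which is exactly what a streaming algorithm tries to avoid.

```python
from collections import Counter


def frequency_vector(stream, n):
    """Return x as a list of length n; x[i-1] counts occurrences of item i in [n]."""
    counts = Counter(stream)
    return [counts.get(i, 0) for i in range(1, n + 1)]


def moment(x, p):
    """Exact p-th moment: sum over i of |x(i)|^p (explicit storage, not a sketch)."""
    return sum(abs(v) ** p for v in x)


def is_accurate(estimate, x, p, eps):
    """Check the guarantee |estimate - F_p| <= eps * F_p for a candidate estimate."""
    exact = moment(x, p)
    return abs(estimate - exact) <= eps * exact


# Hypothetical toy stream over the domain [4].
stream = [3, 1, 3, 2, 3, 1]
x = frequency_vector(stream, n=4)
f3 = moment(x, 3)
```

A candidate estimate can then be compared against `f3` with `is_accurate(estimate, x, 3, eps)`; the lower bounds discussed here concern how much memory any structure needs in order to pass that check with probability 9/10.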

We will reduce the standard $t$-party set disjointness problem to $F_p$ estimation. The problem $t$-DISJ is as follows: the instance is a collection of $t$ sets $S_1, \ldots, S_t$, each a subset of $[n]$, where the set $S_i$ is given to the $i$th party, with the promise that the set family is either pairwise disjoint, or $S_1 \cap \cdots \cap S_t$ consists of exactly one element. We denote the $i$th coordinate of a vector $x$ by $x(i)$; so $x = [x(1), \ldots, x(n)]$. With this notation, an instance of $t$-DISJ consists of $n$-dimensional binary vectors $x_1, \ldots, x_t$, where $x_r$ is given to the $r$th party and is interpreted as the characteristic vector of the set $S_r$. The promise is that either (a) $x_1 + \cdots + x_t$ is a binary vector (the disjoint case), or (b) there is exactly one index $i$ such that $x_1(i) = x_2(i) = \cdots = x_t(i) = 1$ (the common element case). It is well known that any one-way randomized communication protocol that solves $t$-DISJ with probability at least 7/8 requires $\Omega(n/t)$ bits [2,3]. We show the following theorem.
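To make the promise concrete, the following sketch classifies a $t$-DISJ instance given as a list of binary characteristic vectors. The function name `classify_disj_instance` and the return convention are assumptions made for illustration; the check sees all inputs at once, so it says nothing about communication cost.

```python
def classify_disj_instance(vectors):
    """Classify a t-DISJ instance given as t binary vectors of equal length n.

    Returns "disjoint", or ("common", i) with i a 1-based index.
    Raises ValueError when the input violates the promise.
    """
    t = len(vectors)
    n = len(vectors[0])
    for v in vectors:
        if len(v) != n or any(b not in (0, 1) for b in v):
            raise ValueError("each party must hold a binary vector of length n")

    # Coordinate-wise sum x = x_1 + ... + x_t.
    total = [sum(v[i] for v in vectors) for i in range(n)]

    # Case (a): the sum is itself a binary vector.
    if all(s <= 1 for s in total):
        return "disjoint"

    # Case (b): exactly one coordinate equals t, all others are 0 or 1.
    heavy = [i for i, s in enumerate(total) if s == t]
    if len(heavy) == 1 and all(s <= 1 for i, s in enumerate(total) if i != heavy[0]):
        return ("common", heavy[0] + 1)

    raise ValueError("promise violated")
```

For example, three parties holding `[1,0,0,1]`, `[0,1,0,1]` and `[0,0,1,1]` form a common element instance whose shared element is 4.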

Theorem 1. For $2 < p < n^{1/p}/2$ and $\max(80p/n^{1/p}, 3/\sqrt{n}) \le \epsilon \le 1/4$, an algorithm that estimates $F_p$ with relative error of $\epsilon/10$ and with probability 19/20 uses space $\Omega(p^2 n^{1-2/p} \epsilon^{-2} / \log(n))$ bits.

Proof. We present a randomized one-way communication protocol for $t$-DISJ that is correct with probability 9/10, where $t = \lceil \epsilon n^{1/p} / (2p) \rceil$. The protocol uses two structures that can process stream updates: one for estimating $F_p$ to within a factor of $1 \pm \epsilon/10$ with confidence $1 - 1/(20n)$, and a second for estimating $F_0$ to within a factor of $1 \pm \epsilon/10$ with probability 19/20. A one-way protocol for $t$-DISJ is as follows. Consider an instance of $t$-DISJ. Party 1 inserts $x_1$ into each of the structures for estimating $F_p$ and $F_0$ and sends the pair of structures to the second party. This party adds its vector $x_2$ to the two structures received and then relays them to the third party, and so on, in sequence. Finally, the $t$th party inserts its own vector into the structures obtained from the $(t-1)$st party. It then uses the procedure InferDisj of Figure 1 to infer whether the instance is pairwise disjoint or has a common element.
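The message flow of this protocol can be sketched as follows. The class `ExactCounter` is a hypothetical stand-in that stores $x$ explicitly instead of a small-space sketch, so it illustrates only who inserts what and in which order, not the space usage or the InferDisj decision rule.

```python
import math


def num_parties(n, p, eps):
    """t = ceil(eps * n^(1/p) / (2p)), the number of parties chosen in the proof."""
    return math.ceil(eps * n ** (1.0 / p) / (2 * p))


class ExactCounter:
    """Hypothetical stand-in for the F_0 and F_p structures (stores x exactly)."""

    def __init__(self, n):
        self.x = [0] * n

    def insert_vector(self, v):
        # Inserting a binary vector corresponds to streaming the items of the set.
        for i, b in enumerate(v):
            self.x[i] += b

    def moment(self, p):
        # For p = 0 this counts the non-zero coordinates.
        return sum(c ** p for c in self.x if c > 0)


def run_one_way_protocol(vectors, p):
    """Simulate parties 1..t inserting their vectors and relaying the state."""
    state = ExactCounter(len(vectors[0]))  # created by party 1
    for v in vectors:                      # party r inserts x_r, then relays the state
        state.insert_vector(v)
    # The last party queries both structures.
    return state.moment(0), state.moment(p)


f0, fp = run_one_way_protocol([[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]], p=3)
```

In the actual reduction, the state relayed between parties is the memory contents of the two streaming structures, so the size of each message is bounded by their space usage. As an illustration of the parameter choice, `num_parties(10**9, 3, 0.25)` gives 42 parties.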

We first show that the procedure InferDisj is correct with probability at least 9/10. Define the event $\mathrm{GoodF}_0$ as $\hat{F}_0 \in (1 \pm \epsilon/10) \|x\|_0$, so $\mathrm{GoodF}_0$ holds with probability 19/20. Let $x = x_1 + x_2 + \cdots + x_t$. Say that $i$ is a heavy item in $x$ if $x(i) = t$. For each $i$, in parallel, procedure InferDisj obtains an estimate $\hat{F}_p^i$ by applying the $F_p$ estimation algorithm to the vector $x + n^{1/p} e_i$. Given $x$ and an index $i$, we consider three cases. Assume 3p < n.
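The effect of adding $n^{1/p} e_i$ to $x$ can be written out directly from the definitions. The following is a sketch of the underlying arithmetic, not the paper's own display; it assumes $t \ge \epsilon n^{1/p}/(2p)$ and $\epsilon \ge 80p/n^{1/p}$ as in Theorem 1, and uses Bernoulli's inequality and $1 + y \le e^y$.

```latex
\begin{align*}
\|x + n^{1/p} e_i\|_p^p
  &= \|x\|_p^p - x(i)^p + \bigl(x(i) + n^{1/p}\bigr)^p, \\
\text{heavy } i \ (x(i) = t): \quad
\bigl(t + n^{1/p}\bigr)^p
  &= n \bigl(1 + t\, n^{-1/p}\bigr)^p
  \ge n \Bigl(1 + \frac{\epsilon}{2p}\Bigr)^p
  \ge n \Bigl(1 + \frac{\epsilon}{2}\Bigr), \\
\text{non-heavy } i \ (x(i) \le 1): \quad
\bigl(x(i) + n^{1/p}\bigr)^p
  &\le n \bigl(1 + n^{-1/p}\bigr)^p
  \le n\, e^{p / n^{1/p}}
  \le n\, e^{\epsilon/80}.
\end{align*}
```

Under these assumptions, the boosted coordinate contributes at least $n(1 + \epsilon/2)$ when $i$ is heavy and at most $n e^{\epsilon/80}$ otherwise, which is the kind of gap that sufficiently accurate $F_p$ and $F_0$ estimates are meant to separate.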

Case 1: $x$ has no heavy item, that is, $x$ is a binary vector. So,

where $x'$ is a binary vector with $x'(i) = 0$. Hence, $\|x'\|_0 = \|x\|_0 - x(i)$ and

)), assuming $p < n^{1/p}/3$ and elementary calculations.

So with probability $1 - 1/(20n)$, and conditional on $\mathrm{GoodF}_0$,

procedure InferDisj

Input: $F_0$ and $F_p$ sketches of $x = x_1 + \cdots + x_t$ (an integer $n$-dimensional vector) such that:

1. For $x$, one of the two cases holds. Disjoint: $x \in \{0, 1\}^n$. Common Element: there exists exactly one $i$ such that $x(i) = t = \lceil \epsilon n^{1/p} / (2p) \rceil$, and the remaining $x(j)$'s are either 0 or 1.

2. $\hat{F}_0 \in (1 \pm \epsilon) F_0$ with probability 19/20, and $\hat{F}_p \in (1 \pm \epsilon) F_p$ with probability $1 - 1/(20n)$.

Output: Returns common element i if the inpu

Reference

This content is AI-processed based on open access ArXiv data.
