AMS Without 4-Wise Independence on Product Domains

In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. In this work, we show that their sketchin…

Authors: Vladimir Braverman, Kai-Min Chung, Zhenming Liu, Michael Mitzenmacher, Rafail Ostrovsky

Symposium on Theoretical Aspects of Computer Science 2010 (Nancy, France), pp. 119-130, www.stacs-conf.org

VLADIMIR BRAVERMAN (1), KAI-MIN CHUNG (2), ZHENMING LIU (3), MICHAEL MITZENMACHER (4), AND RAFAIL OSTROVSKY (5)

(1) University of California Los Angeles. Supported in part by NSF grants 0716835, 0716389, 0830803, 0916574 and Lockheed Martin Corporation. E-mail address: vova@cs.ucla.edu. URL: http://www.cs.ucla.edu/~vova
(2) Harvard School of Engineering and Applied Sciences. Supported by US-Israel BSF grant 2006060 and NSF grant CNS-0831289. E-mail address: kmchung@fas.harvard.edu. URL: http://people.seas.harvard.edu/~kmchung/
(3) Harvard School of Engineering and Applied Sciences. Supported in part by NSF grant CNS-0721491. The work was finished during an internship in Microsoft Research Asia. E-mail address: zliu@fas.harvard.edu. URL: http://people.seas.harvard.edu/~zliu/
(4) Harvard School of Engineering and Applied Sciences. Supported in part by NSF grant CNS-0721491 and research grants from Yahoo!, Google, and Cisco. E-mail address: michaelm@eecs.harvard.edu. URL: http://www.eecs.harvard.edu/~michaelm/
(5) University of California Los Angeles. Supported in part by IBM Faculty Award, Lockheed-Martin Corporation Research Award, Xerox Innovation Group Award, the Okawa Foundation Award, Intel, Teradata, NSF grants 0716835, 0716389, 0830803, 0916574 and U.C. MICRO grant. E-mail address: rafail@cs.ucla.edu. URL: http://www.cs.ucla.edu/~rafail

ABSTRACT. In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. In this work, we show that their sketching technique can be extended to product domains $[n]^k$ by using the product of 4-wise independent functions on $[n]$.
Our work extends that of Indyk and McGregor, who showed the result for $k = 2$. Their primary motivation was the problem of identifying correlations in data streams. In their model, a stream of pairs $(i, j) \in [n]^2$ arrives, giving a joint distribution $(X, Y)$, and they find approximation algorithms for how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close $X$ and $Y$ are to being independent. By using our technique, we obtain a new result for the problem of approximating the $\ell_2$ distance between the joint distribution and the product of the marginal distributions for $k$-ary vectors, instead of just pairs, in a single pass. Our analysis gives a randomized algorithm that is a $(1 \pm \epsilon)$ approximation (with probability $1 - \delta$) that requires space logarithmic in $n$ and $m$ and proportional to $3^k$.

1998 ACM Subject Classification: F.2.1, G.3.
Key words and phrases: Data Streams, Randomized Algorithms, Streaming Algorithms, Independence, Sketches.

This paper is a merge from the work of [7, 9, 10].

© V. Braverman, K. Chung, Z. Liu, M. Mitzenmacher, and R. Ostrovsky. Creative Commons Attribution-NoDerivs License.

1. Introduction

In their seminal work, Alon, Matias and Szegedy [4] presented celebrated sketching techniques and showed that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. Indyk and McGregor [12] make use of this technique in their work introducing the problem of measuring independence in the streaming model. There they give efficient algorithms for approximating pairwise independence for the $\ell_1$ and $\ell_2$ norms.
In their model, a stream of pairs $(i, j) \in [n]^2$ arrives, giving a joint distribution $(X, Y)$, and the notion of approximating pairwise independence corresponds to approximating the distance between the joint distribution and the product of the marginal distributions for the pairs.

Indyk and McGregor state, as an explicit open question in their paper, the problem of whether one can estimate $k$-wise independence on $k$-tuples for any $k > 2$. In particular, Indyk and McGregor show that, for the $\ell_2$ norm, they can make use of the product of 4-wise independent functions on $[n]$ in the sketching method of Alon, Matias, and Szegedy. We extend their approach to show that on the product domain $[n]^k$, the sketching method of Alon, Matias, and Szegedy works when using the product of $k$ copies of 4-wise independent functions on $[n]$. The cost is that the memory requirements of our approach grow exponentially with $k$, proportionally to $3^k$.

Measuring independence and $k$-wise independence is a fundamental problem with many applications (see e.g., Lehmann [13]). Recently, this problem was also addressed in other models by, among others, Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie [1]; Batu, Fortnow, Fischer, Kumar, Rubinfeld and White [5]; Goldreich and Ron [11]; Batu, Kumar and Rubinfeld [6]; Alon, Goldreich and Mansour [3]; and Rubinfeld and Servedio [15]. Traditional non-parametric methods of testing independence over empirical data usually require space complexity that is polynomial in either the support size or the input size. The scale of contemporary data sets often prohibits such space complexity. It is therefore natural to ask whether we will be able to design algorithms to test for independence in the streaming model. Interestingly, this specific problem appears not to have been introduced until the work of Indyk and McGregor.
While arguably results for the $\ell_1$ norm would be stronger than for the $\ell_2$ norm in this setting, the problem for $\ell_2$ norms is interesting in its own right. The problem for the $\ell_1$ norm has been recently resolved by Braverman and Ostrovsky in [8]. They gave a $(1 \pm \epsilon, \delta)$-approximation algorithm that makes a single pass over a data stream and uses polylogarithmic memory.

1.1. Our Results

In this paper we generalize the "sketching of sketches" result of Indyk and McGregor. Our specific theoretical contributions can be summarized as follows:

Main Theorem. Let $\vec{v} \in \mathbb{R}^{(n^k)}$ be a vector with entries $\vec{v}_{\mathbf{p}} \in \mathbb{R}$ for $\mathbf{p} \in [n]^k$. Let $h_1, \ldots, h_k : [n] \to \{-1, 1\}$ be independent copies of 4-wise independent hash functions; that is, $h_i(1), \ldots, h_i(n) \in \{-1, 1\}$ are 4-wise independent hash functions for each $i \in [k]$, and $h_1(\cdot), \ldots, h_k(\cdot)$ are mutually independent. Define $H(\mathbf{p}) = \prod_{i=1}^{k} h_i(p_i)$, and the sketch $Y = \sum_{\mathbf{p} \in [n]^k} \vec{v}_{\mathbf{p}} H(\mathbf{p})$. We prove that the sketch $Y$ can be used to give an efficient approximation for $\|\vec{v}\|^2$; our result is stated formally in Theorem 4.2. Note that $H$ is not 4-wise independent.

As a corollary, the main application of our main theorem is to extend the result of Indyk and McGregor [12] to detect the dependency of $k$ random variables in the streaming model.

Corollary 1.1. For every $\epsilon > 0$ and $\delta > 0$, there exists a randomized algorithm that computes, given a sequence $a_1, \ldots, a_m$ of $k$-tuples, in one pass and using $O(3^k \epsilon^{-2} \log\frac{1}{\delta} (\log m + \log n))$ memory bits, a number $Y$ so that the probability $Y$ deviates from the $\ell_2$ distance between product and joint distribution by more than a factor of $(1 + \epsilon)$ is at most $\delta$.

1.2. Techniques and a Historical Remark

This paper is a merge of [7, 9, 10], where the same result was obtained with different proofs.
The proof of [10] generalizes the geometric approach of Indyk and McGregor [12] with new geometric observations. The proofs of [7, 9] are more combinatorial in nature. These papers offer new insights, but due to the space limitation, we focus on the proof from [9] in this paper. Original papers are available online and are recommended to the interested reader.

2. The Model

We provide the general underlying model. Here we mostly follow the notation of [7, 12]. Let $S$ be a stream of size $m$ with elements $a_1, \ldots, a_m$, where $a_i \equiv (a_i^1, \ldots, a_i^k) \in [n]^k$. (When we have a sequence of elements that are themselves vectors, we denote the sequence number by a subscript and the vector entry by a superscript when both are needed.) The stream $S$ defines an empirical distribution over $[n]^k$ as follows: the frequency $f(\omega)$ of an element $\omega \in [n]^k$ is defined as the number of times it appears in $S$, and the empirical distribution is
\[ \Pr[\omega] = \frac{f(\omega)}{m} \quad \text{for any } \omega \in [n]^k. \]
Since $\omega = (\omega_1, \ldots, \omega_k)$ is a vector of size $k$, we may also view the streaming data as defining a joint distribution over the random variables $X_1, \ldots, X_k$ corresponding to the values in each dimension. (In the case of $k = 2$, we write the random variables as $X$ and $Y$ rather than $X_1$ and $X_2$.) There is a natural way of defining the marginal distribution for the random variable $X_i$: for $\omega_i \in [n]$, let $f_i(\omega_i)$ be the number of times $\omega_i$ appears in the $i$th coordinate of an element of $S$, or
\[ f_i(\omega_i) = \left| \{ a_j \in S : a_j^i = \omega_i \} \right|. \]
The empirical marginal distribution $\Pr_i[\cdot]$ for the $i$th coordinate is defined as
\[ \Pr_i[\omega_i] = \frac{f_i(\omega_i)}{m} \quad \text{for any } \omega_i \in [n]. \]
Next let $\vec{v}$ be the vector in $\mathbb{R}^{[n]^k}$ with $\vec{v}_\omega = \Pr[\omega] - \prod_{1 \le i \le k} \Pr_i[\omega_i]$ for all $\omega \in [n]^k$.
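To make the model concrete, the following minimal Python sketch builds the empirical joint and marginal distributions from a small stream and forms the vector $\vec{v}$ just defined, together with its Euclidean norm. It stores all $n^k$ entries explicitly, which is exactly what a streaming algorithm must avoid; the function names `dependence_vector` and `l2_norm` and the toy stream are illustrative assumptions, not part of the paper.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def dependence_vector(stream, n, k):
    """Return {omega: Pr[omega] - prod_i Pr_i[omega_i]} for all omega in [n]^k."""
    m = len(stream)
    joint = Counter(stream)                        # f(omega)
    marginals = [Counter(a[i] for a in stream)     # f_i(omega_i), one per coordinate
                 for i in range(k)]
    v = {}
    for omega in product(range(1, n + 1), repeat=k):
        p_joint = joint[omega] / m
        p_prod = prod(marginals[i][omega[i]] / m for i in range(k))
        v[omega] = p_joint - p_prod
    return v


def l2_norm(v):
    """Euclidean norm of a vector stored as a dict of entries."""
    return sqrt(sum(x * x for x in v.values()))


# Toy stream over [2]^2 in which the two coordinates are perfectly correlated.
correlated = [(1, 1), (2, 2), (1, 1), (2, 2)]
v = dependence_vector(correlated, n=2, k=2)

# Toy stream in which every pair appears once, so the coordinates are independent.
independent = [(1, 1), (1, 2), (2, 1), (2, 2)]
w = dependence_vector(independent, n=2, k=2)
```

For the correlated stream each entry of `v` is $\pm 1/4$, while for the independent stream every entry of `w` vanishes, matching the intuition that the norm measures distance from independence.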
Our goal is to approximate the value
\[ \|\vec{v}\| \equiv \left( \sum_{\omega \in [n]^k} \left( \Pr[\omega] - \prod_{1 \le i \le k} \Pr_i[\omega_i] \right)^2 \right)^{1/2}. \tag{2.1} \]
This represents the $\ell_2$ distance between the tensor of the marginal distributions and the joint distribution, which we would expect to be close to zero in the case where the $X_i$ were truly independent.

Finally, our algorithms will assume the availability of 4-wise independent hash functions. For more on 4-wise independence, including efficient implementations, see [2, 16]. For the purposes of this paper, the following simple definition will suffice.

Definition 2.1. (4-wise independence) A family of hash functions $\mathcal{H}$ with domain $[n]$ and range $\{-1, 1\}$ is 4-wise independent if for any distinct values $i_1, i_2, i_3, i_4 \in [n]$ and any $b_1, b_2, b_3, b_4 \in \{-1, 1\}$, the following equality holds:
\[ \Pr_{h \leftarrow \mathcal{H}} \left[ h(i_1) = b_1, h(i_2) = b_2, h(i_3) = b_3, h(i_4) = b_4 \right] = 1/16. \]

Remark 2.2. In [12], the family of 4-wise independent hash functions $\mathcal{H}$ is called 4-wise independent random vectors. For consistency within our paper, we will always view the object $\mathcal{H}$ as a hash function family.

3. The Algorithm and its Analysis for $k = 2$

We begin by reviewing the approximation algorithm and associated proof for the $\ell_2$ norm given in [12]. Reviewing this result will allow us to provide the necessary notation and frame the setting for our extension to general $k$. Moreover, in our proof, we find that a constant in Lemma 3.1 from [12] that we subsequently generalize appears incorrect. (Because of this, our proof is slightly different and more detailed than the original.)
Although the error is minor in the context of their paper (it only affects the constant factor in the order notation), it becomes more important when considering the proper generalization to larger $k$, and hence it is useful to correct here.

In the case $k = 2$, we assume that the sequence $(a_1^1, a_1^2), (a_2^1, a_2^2), \ldots, (a_m^1, a_m^2)$ arrives item by item. Each $(a_i^1, a_i^2)$ (for $1 \le i \le m$) is an element in $[n]^2$. The random variables $X$ and $Y$ over $[n]$ can be expressed as follows:
\[ \begin{cases}
\Pr[i, j] = \Pr[X = i, Y = j] = |\{ \ell : (a_\ell^1, a_\ell^2) = (i, j) \}| / m \\
\Pr_1[i] = \Pr[X = i] = |\{ \ell : (a_\ell^1, a_\ell^2) = (i, \cdot) \}| / m \\
\Pr_2[j] = \Pr[Y = j] = |\{ \ell : (a_\ell^1, a_\ell^2) = (\cdot, j) \}| / m.
\end{cases} \]
We simplify the notation and use $p_i \equiv \Pr[X = i]$, $q_j \equiv \Pr[Y = j]$, $r_{i,j} = \Pr[X = i, Y = j]$, and $s_{i,j} = \Pr[X = i] \Pr[Y = j]$.

Indyk and McGregor's algorithm proceeds in a similar fashion to the streaming algorithm presented in [4]. Specifically, let $s_1 = 72 \epsilon^{-2}$ and $s_2 = 2 \log(1/\delta)$. The algorithm computes $s_2$ random variables $Y_1, Y_2, \ldots, Y_{s_2}$ and outputs their median. The output is the algorithm's estimate of $\|\vec{v}\|^2$, the square of the norm defined in Equation (2.1). Each $Y_i$ is the average of $s_1$ random variables $Y_{ij}$, $1 \le j \le s_1$, where the $Y_{ij}$ are independent, identically distributed random variables. Each of the variables $Y = Y_{ij}$ can be computed from the algorithmic routine shown in Figure 1.

2-D APPROXIMATION$\big( (a_1^1, a_1^2), \ldots, (a_m^1, a_m^2) \big)$
    Independently generate 4-wise independent random functions $h_1, h_2$ from $[n]$ to $\{-1, 1\}$, and set $t_1, t_2, t_3 \leftarrow 0$.
    for $c \leftarrow 1$ to $m$
        do Let the $c$th item $(a_c^1, a_c^2) = (i, j)$
           $t_1 \leftarrow t_1 + h_1(i) h_2(j)$, $t_2 \leftarrow t_2 + h_1(i)$, $t_3 \leftarrow t_3 + h_2(j)$.
    Return $Y = (t_1/m - t_2 t_3 / m^2)^2$.

Figure 1: The procedure for generating random variable $Y$ for $k = 2$.
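The routine in Figure 1 can be sketched in Python as follows. As a simplifying assumption, the hash functions are stored as explicit tables of fully independent random signs (which are in particular 4-wise independent); a space-efficient implementation would substitute a compact 4-wise independent family as discussed in [2, 16]. The helper names `random_signs` and `sketch_2d` are hypothetical.

```python
import random


def random_signs(n, rng):
    """Table of n independent uniform signs; index 0 is unused so items are 1..n."""
    return [0] + [rng.choice((-1, 1)) for _ in range(n)]


def sketch_2d(stream, h1, h2):
    """One pass over pairs (i, j); returns Y = (t1/m - t2*t3/m^2)^2 as in Figure 1."""
    t1 = t2 = t3 = 0
    m = 0
    for i, j in stream:
        t1 += h1[i] * h2[j]   # accumulates sum of h1(i) * h2(j) over the stream
        t2 += h1[i]           # accumulates sum of h1(i)
        t3 += h2[j]           # accumulates sum of h2(j)
        m += 1
    return (t1 / m - t2 * t3 / m**2) ** 2


rng = random.Random(0)
n = 4
h1, h2 = random_signs(n, rng), random_signs(n, rng)
stream = [(1, 1), (2, 2), (3, 3), (4, 4), (1, 2)]
y = sketch_2d(stream, h1, h2)
```

Only the three counters and the stream length are kept while reading the stream, so the working state is independent of $m$ apart from the bits needed to store the counters.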
By the end of the process 2-D APPROXIMATION, we have $t_1/m = \sum_{i,j \in [n]} h_1(i) h_2(j) r_{i,j}$, $t_2/m = \sum_{i \in [n]} h_1(i) p_i$, and $t_3/m = \sum_{i \in [n]} h_2(i) q_i$. Also, when a vector is in $\mathbb{R}^{(n^2)}$, its indices can be represented by $(i_1, i_2) \in [n]^2$. In what follows, we will use a bold letter to represent the index of a high dimensional vector, e.g., $\vec{v}_{\mathbf{i}} \equiv \vec{v}_{i_1, i_2}$. The following lemma shows that the expectation of $Y$ is $\|\vec{v}\|^2$ and the variance of $Y$ is at most $8 (E[Y])^2$ because $E[Y^2] \le 9 E[Y]^2$.

Lemma 3.1. ([12]) Let $h_1, h_2$ be two independent instances of 4-wise independent hash functions from $[n]$ to $\{-1, 1\}$. Let $\vec{v} \in \mathbb{R}^{n^2}$ and $H(\mathbf{i}) \equiv H((i_1, i_2)) = h_1(i_1) \cdot h_2(i_2)$. Let us define $Y = \left( \sum_{\mathbf{i} \in [n]^2} H(\mathbf{i}) \vec{v}_{\mathbf{i}} \right)^2$. Then $E[Y] = \sum_{\mathbf{i} \in [n]^2} \vec{v}_{\mathbf{i}}^2$ and $E[Y^2] \le 9 (E[Y])^2$, which implies $\mathrm{Var}[Y] \le 8 E^2[Y]$.

Proof. We have
\[ E[Y] = E\Big[ \Big( \sum_{\mathbf{i}} H(\mathbf{i}) \vec{v}_{\mathbf{i}} \Big)^2 \Big] = \sum_{\mathbf{i}} \vec{v}_{\mathbf{i}}^2 E[H^2(\mathbf{i})] + \sum_{\mathbf{i} \ne \mathbf{j}} \vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} E[H(\mathbf{i}) H(\mathbf{j})]. \]
For all $\mathbf{i} \in [n]^2$, we know $H^2(\mathbf{i}) = 1$. On the other hand, $H(\mathbf{i}) H(\mathbf{j}) \in \{-1, 1\}$. The probability that $H(\mathbf{i}) H(\mathbf{j}) = 1$ is
\[ \Pr[H(\mathbf{i}) H(\mathbf{j}) = 1] = \Pr[h_1(i_1) h_1(j_1) h_2(i_2) h_2(j_2) = 1] = \frac{1}{16} + \binom{4}{2} \frac{1}{16} + \frac{1}{16} = \frac{1}{2}. \]
The last equality holds because $h_1(i_1) h_1(j_1) h_2(i_2) h_2(j_2) = 1$ is equivalent to saying either all these variables are 1, or exactly two of these variables are $-1$, or all these variables are $-1$. Therefore, $E[H(\mathbf{i}) H(\mathbf{j})] = 0$. Consequently, $E[Y] = \sum_{\mathbf{i} \in [n]^2} (\vec{v}_{\mathbf{i}})^2$.

Now we bound the variance. Recalling that $\mathrm{Var}[Y] = E[Y^2] - E[Y]^2$, we bound
\[ E[Y^2] = \sum_{\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l} \in [n]^2} E[H(\mathbf{i}) H(\mathbf{j}) H(\mathbf{k}) H(\mathbf{l})] \, \vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}} \le \sum_{\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l} \in [n]^2} \left| E[H(\mathbf{i}) H(\mathbf{j}) H(\mathbf{k}) H(\mathbf{l})] \right| \cdot \left| \vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}} \right|. \]
Also $|E[H(\mathbf{i}) H(\mathbf{j}) H(\mathbf{k}) H(\mathbf{l})]| \in \{0, 1\}$. The quantity $E[H(\mathbf{i}) H(\mathbf{j}) H(\mathbf{k}) H(\mathbf{l})] \ne 0$ if and only if the following relation holds:
\[ \forall s \in [2] : \big( (i_s = j_s) \wedge (k_s = l_s) \big) \vee \big( (i_s = k_s) \wedge (j_s = l_s) \big) \vee \big( (i_s = l_s) \wedge (k_s = j_s) \big). \tag{3.1} \]
Denote the set of 4-tuples $(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l})$ that satisfy the above relation by $\mathcal{D}$. We may also view each 4-tuple as an ordered set that consists of 4 points in $[n]^2$. Consider the unique smallest axes-parallel rectangle in $[n]^2$ that contains a given 4-tuple in $\mathcal{D}$ (i.e., contains the four ordered points). Note this could either be a (degenerate) line segment or a (non-degenerate) rectangle, as we discuss below. Let $M : \mathcal{D} \to \{A, B, C, D\}$ be the function that maps an element $\sigma \in \mathcal{D}$ to the smallest rectangle $ABCD$ defined by $\sigma$. Since a rectangle can be uniquely determined by its diagonals, we may write $M : \mathcal{D} \to (\chi_1, \chi_2, \varphi_1, \varphi_2)$, where $\chi_1 \le \chi_2 \in [n]$, $\varphi_1 \le \varphi_2 \in [n]$ and the corresponding rectangle is understood to be the one with diagonal $\{ (\chi_1, \varphi_1), (\chi_2, \varphi_2) \}$. Also, the inverse function $M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)$ represents the pre-images of $(\chi_1, \chi_2, \varphi_1, \varphi_2)$ in $\mathcal{D}$. The tuple $(\chi_1, \chi_2, \varphi_1, \varphi_2)$ is degenerate if either $\chi_1 = \chi_2$ or $\varphi_1 = \varphi_2$, in which case the rectangle (and its diagonals) corresponds to the segment itself, or if $\chi_1 = \chi_2$ and $\varphi_1 = \varphi_2$, in which case the rectangle is just a single point.

Example 3.2. Let $\mathbf{i} = (1, 2)$, $\mathbf{j} = (3, 2)$, $\mathbf{k} = (1, 5)$, and $\mathbf{l} = (3, 5)$. The tuple is in $\mathcal{D}$ and its corresponding bounding rectangle is a non-degenerate rectangle. The function $M(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) = (1, 3, 2, 5)$.

Example 3.3. Let $\mathbf{i} = \mathbf{j} = (1, 4)$ and $\mathbf{k} = \mathbf{l} = (3, 7)$. The tuple is also in $\mathcal{D}$ and the minimal bounding rectangle formed by these points is an interval $\{ (1, 4), (3, 7) \}$. The function $M(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) = (1, 3, 4, 7)$.

To start we consider the non-degenerate cases.
Fix any $(\chi_1, \chi_2, \varphi_1, \varphi_2)$ with $\chi_1 < \chi_2$ and $\varphi_1 < \varphi_2$. There are in total $\binom{4}{2}^2 = 36$ tuples $(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l})$ in $\mathcal{D}$ with $M(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) = (\chi_1, \chi_2, \varphi_1, \varphi_2)$. Twenty-four of these tuples correspond to the setting where none of $\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}$ are equal, as there are twenty-four permutations of the assignment of the labels $\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}$ to the four points. (This corresponds to the first example.) In this case the four points form a rectangle, and we have
\[ |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \le \tfrac{1}{2} \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big). \]
Intuitively, in these cases, we assign the "weight" of the tuple to the diagonals.

The remaining twelve tuples in $M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)$ correspond to intervals. (This corresponds to the second example.) In this case two of $\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}$ correspond to one endpoint of the interval, and the other two labels correspond to the other endpoint. Hence we have either $|\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| = (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2$ or $|\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| = (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2$, and there are six tuples for each case. Therefore for any $\chi_1 < \chi_2 \in [n]$ and $\varphi_1 < \varphi_2 \in [n]$ we have:
\[ \sum_{(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) \in M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)} |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \le 18 \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big). \]
The analysis is similar for the degenerate cases, where the constant 18 in the bound above is now quite loose. When exactly one of $\chi_1 = \chi_2$ or $\varphi_1 = \varphi_2$ holds, the size of $M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)$ is $\binom{4}{2} = 6$, and the resulting intervals correspond to vertical or horizontal lines. When both $\chi_1 = \chi_2$ and $\varphi_1 = \varphi_2$, then $|M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)| = 1$.
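The pre-image sizes 36, 6 and 1 stated above can be checked by brute force on a small grid. The following Python sketch enumerates all 4-tuples of points in $[3]^2$, keeps those satisfying relation (3.1), and groups them by bounding rectangle; the names `in_D` and `bounding_rectangle` are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def in_D(i, j, k, l):
    """Relation (3.1): in each of the two coordinates the four values pair up."""
    return all(
        (i[s] == j[s] and k[s] == l[s])
        or (i[s] == k[s] and j[s] == l[s])
        or (i[s] == l[s] and k[s] == j[s])
        for s in range(2)
    )


def bounding_rectangle(points):
    """M(i, j, k, l) = (chi1, chi2, phi1, phi2) of the smallest enclosing rectangle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), max(xs), min(ys), max(ys))


n = 3
grid = list(product(range(1, n + 1), repeat=2))

# Number of 4-tuples in D mapped to each rectangle, i.e. |M^{-1}(rectangle)|.
preimage_sizes = Counter(
    bounding_rectangle(t) for t in product(grid, repeat=4) if in_D(*t)
)

# Among the tuples of one non-degenerate rectangle, count those with four distinct points.
distinct_in_rect = sum(
    1
    for t in product(grid, repeat=4)
    if in_D(*t) and bounding_rectangle(t) == (1, 3, 1, 3) and len(set(t)) == 4
)
```

Summing the pre-image sizes over all rectangles gives $|\mathcal{D}| = (3n^2 - 2n)^2$, since in each coordinate there are $n$ ways for all four values to coincide and $6 \binom{n}{2}$ ways for them to form two pairs.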
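In sum, following the same analysis as for the non-degenerate cases, we find
\[ \begin{aligned}
\sum_{(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) \in \mathcal{D}} |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}|
&= \sum_{\substack{\chi_1 \le \chi_2 \\ \varphi_1 \le \varphi_2}} \; \sum_{(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) \in M^{-1}(\chi_1, \chi_2, \varphi_1, \varphi_2)} |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \\
&\le \sum_{\substack{\chi_1 < \chi_2 \\ \varphi_1 < \varphi_2}} 18 \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big)
+ \sum_{\substack{\chi_1 = \chi_2 \\ \varphi_1 < \varphi_2}} 6 \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big) \\
&\quad + \sum_{\substack{\chi_1 < \chi_2 \\ \varphi_1 = \varphi_2}} 6 \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big)
+ \sum_{\substack{\chi_1 = \chi_2 \\ \varphi_1 = \varphi_2}} (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 \\
&\le 9 \sum_{\substack{\mathbf{i} \in [n]^2 \\ \mathbf{j} \in [n]^2}} (\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}})^2 = 9 E^2[Y].
\end{aligned} \]
Finally, we have
\[ \sum_{\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l} \in [n]^2} |E[H(\mathbf{i}) H(\mathbf{j}) H(\mathbf{k}) H(\mathbf{l})]| \cdot |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \le \sum_{(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) \in \mathcal{D}} |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \le 9 E^2[Y] \]
and $\mathrm{Var}[Y] \le 8 E[Y]^2$.

We emphasize the geometric interpretation of the above proof as follows. The goal is to bound the variance by a constant times $E^2[Y] = \sum_{\mathbf{i}, \mathbf{j} \in [n]^2} (\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}})^2$, where the index set is the set of all possible lines in the plane $[n]^2$ (each line appears twice). We first show that $\mathrm{Var}[Y] \le \sum_{(\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}) \in \mathcal{D}} |\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}|$, where the 4-tuple index set corresponds to a set of rectangles in a natural way. The main idea of [12] is to use inequalities of the form $|\vec{v}_{\mathbf{i}} \vec{v}_{\mathbf{j}} \vec{v}_{\mathbf{k}} \vec{v}_{\mathbf{l}}| \le \frac{1}{2} \big( (\vec{v}_{\chi_1, \varphi_1} \vec{v}_{\chi_2, \varphi_2})^2 + (\vec{v}_{\chi_1, \varphi_2} \vec{v}_{\chi_2, \varphi_1})^2 \big)$ to assign the "weight" of each 4-tuple to the diagonals of the corresponding rectangle. The above analysis shows that 18 copies of all lines are sufficient to accommodate all 4-tuples. While similar inequalities could also assign the weight of a 4-tuple to the vertical or horizontal edges of the corresponding rectangle, using vertical or horizontal edges is problematic.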
The reason is that there are $\Omega(n^4)$ 4-tuples but only $O(n^3)$ vertical or horizontal edges, so some lines would receive $\Omega(n)$ weight, requiring $\Omega(n)$ copies. This problem is already noted in [7].

Our bound here is $E[Y^2] \le 9 E^2[Y]$, while in [12] the bound obtained is $E[Y^2] \le 3 E^2[Y]$. There appears to have been an error in the derivation in [12]; some intuition comes from the following example. We note that $|\mathcal{D}|$ is at least $\binom{4}{2}^2 \cdot \binom{n}{2}^2 = 9 n^2 (n-1)^2$. (This counts the number of non-degenerate 4-tuples.) Now if we set $\vec{v}_{\mathbf{i}} = 1$ for all $n^2$ indices $\mathbf{i}$, we have $E[Y] = n^2$ and $E[Y^2] = |\mathcal{D}| \ge 9 n^2 (n-1)^2 \sim 9 E^2[Y]$, which suggests $\mathrm{Var}[Y] > 3 E^2[Y]$. Again, we emphasize this discrepancy is of little importance to [12]; the point there is that the variance is bounded by a constant factor times the square of the expectation. It is here, where we are generalizing to $k \ge 3$, that the exact constant factor is of some importance.

Given the bounds on the expectation and variance for the $Y_{ij}$, standard techniques yield a bound on the performance of our algorithm.

Theorem 3.4. For every $\epsilon > 0$ and $\delta > 0$, there exists a randomized algorithm that computes, given a sequence $(a_1^1, a_1^2), \ldots, (a_m^1, a_m^2)$, in one pass and using $O(\epsilon^{-2} \log\frac{1}{\delta} (\log m + \log n))$ memory bits, a number Med so that the probability Med deviates from $\|\vec{v}\|^2$ by more than $\epsilon \|\vec{v}\|^2$ is at most $\delta$.

Proof. Recall the algorithm described in the beginning of Section 3: let $s_1 = 72 \epsilon^{-2}$ and $s_2 = 2 \log(1/\delta)$. We first compute $s_2$ random variables $Y_1, Y_2, \ldots, Y_{s_2}$ and output their median Med, where each $Y_i$ is the average of $s_1$ random variables $Y_{ij}$, $1 \le j \le s_1$, and the $Y_{ij}$ are independent, identically distributed random variables computed by Figure 1.
By Chebyshev's inequality and the bound $\mathrm{Var}[Y] \le E[Y^2] \le 9 \|\vec{v}\|^4$ from Lemma 3.1, we know that for any fixed $i$,
\[ \Pr\Big[ \big| Y_i - \|\vec{v}\|^2 \big| \ge \epsilon \|\vec{v}\|^2 \Big] \le \frac{\mathrm{Var}(Y_i)}{\epsilon^2 \|\vec{v}\|^4} = \frac{(1/s_1) \mathrm{Var}[Y]}{\epsilon^2 \|\vec{v}\|^4} \le \frac{(9 \epsilon^2 / 72) \|\vec{v}\|^4}{\epsilon^2 \|\vec{v}\|^4} = \frac{1}{8}. \]
Finally, by standard Chernoff bound arguments (see for example Chapter 4 of [14]), the probability that more than $s_2/2$ of the variables $Y_i$ deviate by more than $\epsilon \|\vec{v}\|^2$ from $\|\vec{v}\|^2$ is at most $\delta$. In case this does not happen, the median Med supplies a good estimate of the required quantity $\|\vec{v}\|^2$ as needed.

4. The General Case $k \ge 3$

Now let us move to the general case where $k \ge 3$. Recall that $\vec{v}$ is a vector in $\mathbb{R}^{n^k}$ that maintains certain statistics of a data stream, and we are interested in estimating its $\ell_2$ norm $\|\vec{v}\|$. There is a natural generalization of Indyk and McGregor's method for $k = 2$ to construct an estimator for $\|\vec{v}\|$: let $h_1, \ldots, h_k : [n] \to \{-1, 1\}$ be independent copies of 4-wise independent hash functions (namely, $h_i(1), \ldots, h_i(n) \in \{-1, 1\}$ are 4-wise independent hash functions for each $i \in [k]$, and $h_1(\cdot), \ldots, h_k(\cdot)$ are mutually independent). Let $H(\mathbf{p}) = \prod_{i=1}^{k} h_i(p_i)$. The estimator $Y$ is defined as $Y \equiv \left( \sum_{\mathbf{p} \in [n]^k} \vec{v}_{\mathbf{p}} H(\mathbf{p}) \right)^2$.

Our goal is to show that $E[Y] = \|\vec{v}\|^2$ and $\mathrm{Var}[Y]$ is reasonably small so that a streaming algorithm maintaining multiple independent instances of estimator $Y$ will be able to output an approximately correct estimation of $\|\vec{v}\|$ with high probability. Notice that when $\|\vec{v}\|$ represents the $\ell_2$ distance between the joint distribution and the tensors of the marginal distributions, the estimator can be computed efficiently in a streaming model similarly to Figure 1.
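One way to realize this remark is sketched below in Python, under stated assumptions. It uses the identity $\sum_{\mathbf{p}} H(\mathbf{p}) \prod_i \Pr_i[p_i] = \prod_i \sum_{x} h_i(x) \Pr_i[x]$, so that $k + 1$ counters per instance suffice, and it combines instances by a median of means as in Section 3. The class `Sketch`, the function `estimate_dependence`, and the use of fully random sign tables in place of a compact 4-wise independent family are illustrative choices, not the authors' implementation; the repetition counts `s1` and `s2` are left to the caller.

```python
import random
from math import prod
from statistics import median


def random_signs(n, rng):
    """Table of n independent uniform signs; index 0 is unused so items are 1..n."""
    return [0] + [rng.choice((-1, 1)) for _ in range(n)]


class Sketch:
    """One instance of the estimator Y for streams of k-tuples over [n]."""

    def __init__(self, n, k, rng):
        self.hs = [random_signs(n, rng) for _ in range(k)]
        self.t0 = 0            # sum over items of prod_i h_i(a^i)
        self.t = [0] * k       # t[i] = sum over items of h_i(a^i)
        self.m = 0

    def update(self, a):
        signs = [h[x] for h, x in zip(self.hs, a)]
        self.t0 += prod(signs)
        for i, s in enumerate(signs):
            self.t[i] += s
        self.m += 1

    def estimate(self):
        m = self.m
        return (self.t0 / m - prod(ti / m for ti in self.t)) ** 2


def estimate_dependence(stream, n, k, s1, s2, rng):
    """Single pass: update s1 * s2 sketches per item, then take a median of means."""
    groups = [[Sketch(n, k, rng) for _ in range(s1)] for _ in range(s2)]
    for a in stream:
        for group in groups:
            for sk in group:
                sk.update(a)
    means = [sum(sk.estimate() for sk in group) / s1 for group in groups]
    return median(means)


rng = random.Random(1)
independent = [(a, b, c) for a in (1, 2) for b in (1, 2) for c in (1, 2)]
est = estimate_dependence(independent, n=2, k=3, s1=4, s2=3, rng=rng)
```

For the stream `independent`, in which every triple occurs exactly once, the joint distribution equals the product of the marginals, so each sketch evaluates to zero regardless of the signs drawn.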
We stress that our result is applicable to a broader class of $\ell_2$-norm estimation problems, as long as the vector $\vec{v}$ to be estimated has a corresponding efficiently computable estimator $Y$ in an appropriate streaming model. Formally, we shall prove the following main lemma in the next subsection.

Lemma 4.1. Let $\vec{v}$ be a vector in $\mathbb{R}^{n^k}$, and $h_1, \ldots, h_k : [n] \to \{-1, 1\}$ be independent copies of 4-wise independent hash functions. Define $H(\mathbf{p}) = \prod_{i=1}^{k} h_i(p_i)$, and $Y \equiv \left( \sum_{\mathbf{p} \in [n]^k} \vec{v}_{\mathbf{p}} H(\mathbf{p}) \right)^2$. We have $E[Y] = \|\vec{v}\|^2$ and $\mathrm{Var}[Y] \le 3^k E[Y]^2$.

We remark that the bound on the variance in the above lemma is tight. One can verify that when the vector $\vec{v}$ is a uniform vector (i.e., all entries of $\vec{v}$ are the same), the variance of $Y$ is $\Omega(3^k E[Y]^2)$.

With the above lemma, the following main theorem mentioned in the introduction immediately follows by a standard argument presented in the proof of Theorem 3.4 in the previous section.

Theorem 4.2. Let $\vec{v}$ be a vector in $\mathbb{R}^{[n]^k}$ that maintains an arbitrary statistic in a data stream of size $m$, in which every item is from $[n]^k$. Let $\epsilon, \delta \in (0, 1)$ be real numbers. If there exists an algorithm that maintains an instance of $Y$ using $O(\mu(n, m, k, \epsilon, \delta))$ memory bits, then there exists an algorithm $\Lambda$ such that:
(1) With probability $\ge 1 - \delta$ the algorithm $\Lambda$ outputs a value in $[(1 - \epsilon) \|\vec{v}\|^2, (1 + \epsilon) \|\vec{v}\|^2]$, and
(2) the space complexity of $\Lambda$ is $O(3^k \frac{1}{\epsilon^2} \log\frac{1}{\delta} \, \mu(n, m, k, \epsilon, \delta))$.

As discussed above, an immediate corollary is the existence of a one-pass space-efficient streaming algorithm to detect the dependency of $k$ random variables in $\ell_2$-norm:

Corollary 4.3. For every $\epsilon > 0$ and $\delta > 0$, there exists a randomized algorithm that computes, given a sequence $a_1, \ldots, a_m$ of $k$-tuples, in one pass and using $O(3^k \epsilon^{-2} \log\frac{1}{\delta} (\log m + \log n))$ memory bits, a number $Y$ so that the probability $Y$ deviates from the square of the $\ell_2$ distance between product and joint distribution by more than a factor of $(1 + \epsilon)$ is at most $\delta$.

4.1. Analysis of the Sketch $Y$

This section is devoted to proving Lemma 4.1, where the main challenge is to bound the variance of $Y$. The geometric approach of Indyk and McGregor [12] presented in Section 3 for the case of $k = 2$ can be extended to analyze the general case. However, we remark that the generalization requires new ideas. In particular, instead of performing "local analysis" that maps each rectangle to its diagonals, a more complex "global analysis" is needed in higher dimensions to achieve the desired bounds. The alternative proof we present here utilizes similar ideas, but relies on a more combinatorial rather than geometric approach.

For the expectation of $Y$, we have
\[ \begin{aligned}
E[Y] &= E\Big[ \sum_{\mathbf{p}, \mathbf{q} \in [n]^k} \vec{v}_{\mathbf{p}} \cdot \vec{v}_{\mathbf{q}} \cdot H(\mathbf{p}) \cdot H(\mathbf{q}) \Big] \\
&= \sum_{\mathbf{p} \in [n]^k} \vec{v}_{\mathbf{p}}^2 \cdot E\big[ H(\mathbf{p})^2 \big] + \sum_{\mathbf{p} \ne \mathbf{q} \in [n]^k} \vec{v}_{\mathbf{p}} \cdot \vec{v}_{\mathbf{q}} \cdot E[H(\mathbf{p}) H(\mathbf{q})] \\
&= \sum_{\mathbf{p} \in [n]^k} \vec{v}_{\mathbf{p}}^2 = \|\vec{v}\|^2,
\end{aligned} \]
where the last equality follows by $H(\mathbf{p})^2 = 1$, and $E[H(\mathbf{p}) H(\mathbf{q})] = 0$ for $\mathbf{p} \ne \mathbf{q}$.

Now, let us start to prove $\mathrm{Var}[Y] \le 3^k E[Y]^2$. By definition, $\mathrm{Var}[Y] = E[(Y - E[Y])^2]$, so we need to understand the following random variable:
\[ \mathit{Err} \equiv Y - E[Y] = \sum_{\mathbf{p} \ne \mathbf{q} \in [n]^k} H(\mathbf{p}) H(\mathbf{q}) \vec{v}_{\mathbf{p}} \vec{v}_{\mathbf{q}}. \tag{4.1} \]
The random variable $\mathit{Err}$ is a sum of terms indexed by pairs $(\mathbf{p}, \mathbf{q}) \in [n]^k \times [n]^k$ with $\mathbf{p} \ne \mathbf{q}$. At a very high level, our analysis consists of two steps. In the first step, we group the terms in $\mathit{Err}$ properly and simplify the summation in each group. In the second step, we expand the square of the sum in $\mathrm{Var}[Y] = E[\mathit{Err}^2]$ according to the groups and apply the Cauchy-Schwarz inequality three times to bound the variance.
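The two claims of Lemma 4.1 can be checked exactly on a toy instance. The Python sketch below averages $Y$ and $Y^2$ over every possible choice of $k$ sign functions on $[n]$; the uniform distribution over all sign functions is fully independent and hence in particular 4-wise independent. The function `exact_moments`, the 0-based coordinates, and the test vector are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import prod


def exact_moments(v, n, k):
    """Exact mean and variance of Y over all k-tuples of sign functions [n] -> {-1, 1}."""
    sign_tables = list(product((-1, 1), repeat=n))    # all 2^n functions on [n]
    total_y = total_y2 = 0
    count = 0
    for hs in product(sign_tables, repeat=k):
        # Inner sum of v_p * H(p) with H(p) = prod_i h_i(p_i).
        s = sum(val * prod(hs[i][p[i]] for i in range(k)) for p, val in v.items())
        y = s * s
        total_y += y
        total_y2 += y * y
        count += 1
    mean = Fraction(total_y, count)
    var = Fraction(total_y2, count) - mean * mean
    return mean, var


n, k = 2, 3
points = list(product(range(n), repeat=k))            # 0-based index vectors
v = dict(zip(points, [3, -1, 4, 1, -5, 9, 2, -6]))    # arbitrary integer entries
mean, var = exact_moments(v, n, k)
norm_sq = sum(x * x for x in v.values())
```

Here `mean` coincides with `norm_sq`, and `var` stays below $3^k$ times `norm_sq` squared, as the lemma predicts. With $n = 2$ the instance is far too small to exhibit the tightness of the $3^k$ factor, which only emerges as $n$ grows.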
We shall now gradually introduce the necessary notation for grouping the terms in $\mathit{Err}$ and simplifying the summation. We remind the reader that vectors over the reals (e.g., $\vec{v} \in \mathbb{R}^{n^k}$) are denoted by $\vec{v}, \vec{w}, \vec{r}$, and vectors over $[n]$ are denoted by $\mathbf{p}, \mathbf{q}, \mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}$ and referred to as index vectors. We use $S \subseteq [k]$ to denote a subset of $[k]$, and let $\bar{S} = [k] \setminus S$. We use $\mathrm{Ham}(\mathbf{p}, \mathbf{q})$ to denote the Hamming distance of index vectors $\mathbf{p}, \mathbf{q} \in [n]^k$, i.e., the number of coordinates where $\mathbf{p}$ and $\mathbf{q}$ are different.

Definition 4.4. (Projection and inverse projection) Let $\mathbf{c} \in [n]^k$ be an index vector and $S \subseteq [k]$ a subset. We define the projection of $\mathbf{c}$ to $S$, denoted by $\Phi_S(\mathbf{c}) \in [n]^{|S|}$, to be the vector $\mathbf{c}$ restricted to the coordinates in $S$. Also, let $\mathbf{a} \in [n]^{|S|}$ and $\mathbf{b} \in [n]^{k - |S|}$ be index vectors. We define the inverse projection of $\mathbf{a}$ and $\mathbf{b}$ with respect to $S$, denoted by $\Phi_S^{-1}(\mathbf{a}, \mathbf{b}) \in [n]^k$, as the index vector $\mathbf{c} \in [n]^k$ such that $\Phi_S(\mathbf{c}) = \mathbf{a}$ and $\Phi_{\bar{S}}(\mathbf{c}) = \mathbf{b}$.

We next define pair groups and use the definition to group the terms in $\mathit{Err}$.

Definition 4.5. (Pair Group) Let $S \subseteq [k]$ be a subset of size $|S| = t$. Let $\mathbf{c}, \mathbf{d} \in [n]^t$ be a pair of index vectors with $\mathrm{Ham}(\mathbf{c}, \mathbf{d}) = t$ (i.e., $\mathbf{c}$ and $\mathbf{d}$ differ in every coordinate). The pair group $\sigma_S(\mathbf{c}, \mathbf{d})$ is the set of pairs $(\mathbf{p}, \mathbf{q}) \in [n]^k \times [n]^k$ such that (i) on the coordinates in $S$, $\Phi_S(\mathbf{p}) = \mathbf{c}$ and $\Phi_S(\mathbf{q}) = \mathbf{d}$, and (ii) on the coordinates in $\bar{S}$, $\mathbf{p}$ and $\mathbf{q}$ are the same, i.e., $\Phi_{\bar{S}}(\mathbf{p}) = \Phi_{\bar{S}}(\mathbf{q})$. Namely,
\[ \sigma_S(\mathbf{c}, \mathbf{d}) = \Big\{ (\mathbf{p}, \mathbf{q}) \in [n]^k \times [n]^k : \big( \mathbf{c} = \Phi_S(\mathbf{p}) \big) \wedge \big( \mathbf{d} = \Phi_S(\mathbf{q}) \big) \wedge \big( \Phi_{\bar{S}}(\mathbf{p}) = \Phi_{\bar{S}}(\mathbf{q}) \big) \Big\}. \tag{4.2} \]
To give some intuition for the above definitions, we note that for every $\mathbf{a} \in [n]^{|\bar{S}|}$, there is a unique pair $(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})$ with $\mathbf{a} = \Phi_{\bar{S}}(\mathbf{p}) = \Phi_{\bar{S}}(\mathbf{q})$, and so $|\sigma_S(\mathbf{c}, \mathbf{d})| = n^{|\bar{S}|}$.
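Definitions 4.4 and 4.5 translate directly into code. The following Python sketch, with 0-based coordinates and the hypothetical helper names `project`, `inverse_project` and `pair_group`, enumerates a pair group by ranging over the shared values on $\bar{S}$, which makes the size $n^{|\bar{S}|}$ evident.

```python
from itertools import product


def project(c, S):
    """Phi_S(c): restrict index vector c to the coordinates in S, in increasing order."""
    return tuple(c[i] for i in sorted(S))


def inverse_project(a, b, S, k):
    """Phi_S^{-1}(a, b): the vector that equals a on S and b on the complement of S."""
    a_iter, b_iter = iter(a), iter(b)
    return tuple(next(a_iter) if i in S else next(b_iter) for i in range(k))


def pair_group(S, c, d, n, k):
    """sigma_S(c, d): pairs (p, q) with p = c and q = d on S, and p = q off S."""
    assert all(x != y for x, y in zip(c, d)), "need Ham(c, d) = |S|"
    free = k - len(S)
    return [
        (inverse_project(c, a, S, k), inverse_project(d, a, S, k))
        for a in product(range(n), repeat=free)   # a = shared values on the complement
    ]


n, k = 3, 4
S = {0, 2}                                        # 0-based coordinates
group = pair_group(S, c=(0, 1), d=(2, 0), n=n, k=k)
```

Each pair in `group` differs on exactly the coordinates in `S`, and there is one pair for every assignment of the remaining $k - |S|$ coordinates.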
On the other hand, for every pair $(\mathbf{p}, \mathbf{q}) \in [n]^k \times [n]^k$ with $\mathbf{p} \ne \mathbf{q}$, there is a unique non-empty $S \subseteq [k]$ such that $\mathbf{p}$ and $\mathbf{q}$ differ on exactly the coordinates in $S$. Therefore, $(\mathbf{p}, \mathbf{q})$ belongs to exactly one pair group $\sigma_S(\mathbf{c}, \mathbf{d})$. It follows that we can partition the summation in $\mathit{Err}$ according to the pair groups:
\[ \mathit{Err} = \sum_{\substack{S \subseteq [k] \\ S \ne \emptyset}} \; \sum_{\substack{\mathbf{c}, \mathbf{d} \in [n]^{|S|} \\ \mathrm{Ham}(\mathbf{c}, \mathbf{d}) = |S|}} \; \sum_{(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})} H(\mathbf{p}) H(\mathbf{q}) \vec{v}_{\mathbf{p}} \vec{v}_{\mathbf{q}}. \tag{4.3} \]
We next observe that for any pair $(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})$, since $\mathbf{p}$ and $\mathbf{q}$ agree on coordinates in $\bar{S}$, the value of the product $H(\mathbf{p}) H(\mathbf{q})$ depends only on $S$, $\mathbf{c}$ and $\mathbf{d}$. More precisely,
\[ H(\mathbf{p}) H(\mathbf{q}) = \prod_{i \in [k]} h_i(p_i) h_i(q_i) = \Big( \prod_{i \in S} h_i(p_i) h_i(q_i) \Big) \cdot \Big( \prod_{i \in \bar{S}} h_i(p_i)^2 \Big) = \prod_{i \in S} h_i(p_i) h_i(q_i), \]
which depends only on $S$, $\mathbf{c}$ and $\mathbf{d}$ since $\Phi_S(\mathbf{p}) = \mathbf{c}$ and $\Phi_S(\mathbf{q}) = \mathbf{d}$. This motivates the definition of projected hashing.

Definition 4.6. (Projected hashing) Let $S = \{ s_1, s_2, \ldots, s_t \}$ be a subset of $[k]$, where $s_1 < s_2 < \cdots < s_t$. Let $\mathbf{c} \in [n]^t$. We define the projected hashing $H_S(\mathbf{c}) = \prod_{i \le t} h_{s_i}(c_i)$.

We can now translate the random variable $\mathit{Err}$ as follows:
\[ \mathit{Err} = \sum_{\substack{S \subseteq [k] \\ S \ne \emptyset}} \; \sum_{\substack{\mathbf{c}, \mathbf{d} \in [n]^{|S|} \\ \mathrm{Ham}(\mathbf{c}, \mathbf{d}) = |S|}} \Big( H_S(\mathbf{c}) H_S(\mathbf{d}) \sum_{(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})} \vec{v}_{\mathbf{p}} \vec{v}_{\mathbf{q}} \Big). \tag{4.4} \]
Fix a pair group $\sigma_S(\mathbf{c}, \mathbf{d})$; we next consider the sum $\sum_{(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})} \vec{v}_{\mathbf{p}} \vec{v}_{\mathbf{q}}$. Recall that for every $\mathbf{a} \in [n]^{|\bar{S}|}$, there is a unique pair $(\mathbf{p}, \mathbf{q}) \in \sigma_S(\mathbf{c}, \mathbf{d})$ with $\mathbf{a} = \Phi_{\bar{S}}(\mathbf{p}) = \Phi_{\bar{S}}(\mathbf{q})$. The sum can be viewed as the inner product of two vectors of dimension $n^{|\bar{S}|}$ with entries indexed by $\mathbf{a} \in [n]^{|\bar{S}|}$. To formalize this observation, we introduce the definition of hyper-projection as follows.

Definition 4.7.
(Hyper-projection) Let $\vec{v} \in \mathbb{R}^{n^k}$, $S \subseteq [k]$, and $\mathbf{c} \in [n]^{|S|}$. The hyper-projection $\Upsilon_{S,\mathbf{c}}(\vec{v})$ of $\vec{v}$ (with respect to $S$ and $\mathbf{c}$) is a vector $\vec{w} = \Upsilon_{S,\mathbf{c}}(\vec{v})$ in $\mathbb{R}^{[n]^{k-|S|}}$ such that $\vec{w}_{\mathbf{d}} = \vec{v}_{\Phi_S^{-1}(\mathbf{c}, \mathbf{d})}$ for all $\mathbf{d} \in [n]^{k-|S|}$.

Using the above definition, we continue to rewrite $\mathit{Err}$ as

$$\mathit{Err} = \sum_{\substack{S \subseteq [k] \\ S \neq \emptyset}} \; \sum_{\substack{\mathbf{c}, \mathbf{d} \in [n]^{|S|} \\ \mathrm{Ham}(\mathbf{c}, \mathbf{d}) = |S|}} H_S(\mathbf{c}) H_S(\mathbf{d}) \cdot \langle \Upsilon_{S,\mathbf{c}}(\vec{v}), \Upsilon_{S,\mathbf{d}}(\vec{v}) \rangle. \tag{4.5}$$

Finally, we consider the product $H_S(\mathbf{c}) H_S(\mathbf{d})$ again and introduce the following definition to further simplify $\mathit{Err}$.

Definition 4.8. (Similarity and dominance) Let $t$ be a positive integer.
• Two pairs of index vectors $(\mathbf{c}, \mathbf{d}) \in [n]^t \times [n]^t$ and $(\mathbf{a}, \mathbf{b}) \in [n]^t \times [n]^t$ are similar if for all $i \in [t]$, the two sets $\{c_i, d_i\}$ and $\{a_i, b_i\}$ are equal. We denote this as $(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})$.
• Let $\mathbf{c}, \mathbf{d} \in [n]^t$ be two index vectors. We say $\mathbf{c}$ is dominated by $\mathbf{d}$ if $c_i < d_i$ for all $i \in [t]$. We denote this as $\mathbf{c} \prec \mathbf{d}$. Note that $\mathbf{c} \prec \mathbf{d} \Rightarrow \mathrm{Ham}(\mathbf{c}, \mathbf{d}) = t$.

Now, note that if $(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})$, then $H_S(\mathbf{a}) H_S(\mathbf{b}) = H_S(\mathbf{c}) H_S(\mathbf{d})$, since the value of the product $H_S(\mathbf{c}) H_S(\mathbf{d})$ depends on the values $\{c_i, d_i\}$ only as a set. It is also not hard to see that $\sim$ is an equivalence relation, and for every equivalence class $[(\mathbf{a}, \mathbf{b})]$ with $\mathrm{Ham}(\mathbf{a}, \mathbf{b}) = t$, there is a unique $(\mathbf{c}, \mathbf{d}) \in [(\mathbf{a}, \mathbf{b})]$ with $\mathbf{c} \prec \mathbf{d}$. Therefore, we can further rewrite $\mathit{Err}$ as

$$\mathit{Err} = \sum_{\substack{S \subseteq [k] \\ S \neq \emptyset}} \; \sum_{\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}} H_S(\mathbf{c}) H_S(\mathbf{d}) \cdot \left( \sum_{(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \right). \tag{4.6}$$

We are ready to bound the term $\mathrm{E}[\mathit{Err}^2]$ by expanding the square of the sum according to Equation (4.6). We first show in Lemma 4.9 below that all the cross terms in the following expansion vanish.
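Before turning to the expansion, the two facts used to reach Equation (4.6) can be checked numerically on a small instance: the sum over a pair group equals an inner product of hyper-projections, and a similarity class with $\mathrm{Ham}(\mathbf{c}, \mathbf{d}) = t$ has $2^t$ members with exactly one dominated representative. This is only a sketch; the 0-based indexing, the dictionary representation of $\vec{v}$, the random integer entries, and all function names are assumptions made for illustration.

```python
import random
from itertools import product

def inverse_project(a, b, S, k):
    """Phi_S^{-1}(a, b): index vector equal to a on S and to b on the complement of S."""
    S = sorted(S)
    S_bar = [i for i in range(k) if i not in S]
    c = [None] * k
    for i, val in zip(S, a):
        c[i] = val
    for i, val in zip(S_bar, b):
        c[i] = val
    return tuple(c)

def hyper_projection(v, S, c, n, k):
    """Upsilon_{S,c}(v) as a dict: entry d holds v at Phi_S^{-1}(c, d)."""
    free = k - len(S)
    return {d: v[inverse_project(c, d, S, k)]
            for d in product(range(n), repeat=free)}

def inner(u, w):
    """Inner product of two hyper-projections sharing the same index set."""
    return sum(u[d] * w[d] for d in u)

def similarity_class(c, d):
    """All pairs (a, b) with {a_i, b_i} == {c_i, d_i} in every coordinate i."""
    swaps = [((ci, di), (di, ci)) for ci, di in zip(c, d)]
    return [(tuple(x for x, _ in pick), tuple(y for _, y in pick))
            for pick in product(*swaps)]

# Tiny instance (assumed values) with integer entries so equality is exact.
n, k = 3, 3
S = (0, 2)
c, d = (0, 1), (2, 0)
rng = random.Random(0)
v = {p: rng.randint(-5, 5) for p in product(range(n), repeat=k)}

# Sum of v_p * v_q over sigma_S(c, d): one pair per free vector a.
pair_group_sum = sum(
    v[inverse_project(c, a, S, k)] * v[inverse_project(d, a, S, k)]
    for a in product(range(n), repeat=k - len(S))
)
inner_product = inner(hyper_projection(v, S, c, n, k),
                      hyper_projection(v, S, d, n, k))
print(pair_group_sum == inner_product)  # True

cls = similarity_class(c, d)
dominated = [(a, b) for a, b in cls if all(x < y for x, y in zip(a, b))]
print(len(cls), dominated)  # 4 [((0, 0), (2, 1))]
```

Sorting each coordinate pair $\{c_i, d_i\}$ is what produces the unique dominated representative, which is why the outer sum in (4.6) can range over $\mathbf{c} \prec \mathbf{d}$ only.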
$$\mathrm{Var}[Y] = \sum_{\substack{S, S' \subseteq [k] \\ S, S' \neq \emptyset}} \; \sum_{\substack{\mathbf{c} \prec \mathbf{d} \in [n]^{|S|} \\ \mathbf{c}' \prec \mathbf{d}' \in [n]^{|S'|}}} \mathrm{E}\left[ H_S(\mathbf{c}) H_S(\mathbf{d}) H_{S'}(\mathbf{c}') H_{S'}(\mathbf{d}') \right] \cdot \left( \sum_{(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \right) \left( \sum_{(\mathbf{a}', \mathbf{b}') \sim (\mathbf{c}', \mathbf{d}')} \langle \Upsilon_{S',\mathbf{a}'}(\vec{v}), \Upsilon_{S',\mathbf{b}'}(\vec{v}) \rangle \right). \tag{4.7}$$

Lemma 4.9. Let $S$ and $S'$ be subsets of $[k]$, and $\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}$ and $\mathbf{c}' \prec \mathbf{d}' \in [n]^{|S'|}$ index vectors. We have $\mathrm{E}[H_S(\mathbf{c}) H_S(\mathbf{d}) H_{S'}(\mathbf{c}') H_{S'}(\mathbf{d}')] \in \{0, 1\}$. Furthermore, we have $\mathrm{E}[H_S(\mathbf{c}) H_S(\mathbf{d}) H_{S'}(\mathbf{c}') H_{S'}(\mathbf{d}')] = 1$ iff $(S = S') \wedge (\mathbf{c} = \mathbf{c}') \wedge (\mathbf{d} = \mathbf{d}')$.

Proof. Recall that $h_1, \ldots, h_k$ are independent copies of 4-wise independent uniform random variables over $\{-1, 1\}$. Namely, for every $i \in [k]$, $h_i(1), \ldots, h_i(n)$ are 4-wise independent, and $h_1(\cdot), \ldots, h_k(\cdot)$ are mutually independent. Observe that for every $i \in [k]$, there are at most 4 terms out of $h_i(1), \ldots, h_i(n)$ appearing in the product $H_S(\mathbf{c}) H_S(\mathbf{d}) H_{S'}(\mathbf{c}') H_{S'}(\mathbf{d}')$. It follows that all distinct terms appearing in $H_S(\mathbf{c}) H_S(\mathbf{d}) H_{S'}(\mathbf{c}') H_{S'}(\mathbf{d}')$ are mutually independent uniform random variables over $\{-1, 1\}$. Therefore, the expectation is either 0, if there is some $h_i(j)$ that appears an odd number of times, or 1, if all $h_i(j)$ appear an even number of times. By inspection, the latter case happens if and only if $(S = S') \wedge (\mathbf{c} = \mathbf{c}') \wedge (\mathbf{d} = \mathbf{d}')$.

By the above lemma, Equation (4.7) simplifies to

$$\mathrm{Var}[Y] = \sum_{\substack{S \subseteq [k] \\ S \neq \emptyset}} \; \sum_{\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}} \left( \sum_{(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \right)^2. \tag{4.8}$$

We next apply the Cauchy-Schwarz inequality three times to bound the above formula. Consider a subset $S \subseteq [k]$ and a pair $\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}$.
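As an aside, Lemma 4.9 can be confirmed exhaustively on a tiny instance. The sketch below replaces the 4-wise independent families by fully independent uniform signs, which is a special case of the hypothesis and an assumption of this example rather than the construction used in the paper; the expectation is then an exact average over all $2^{nk}$ sign tables. The parameter values and names (`projected_hash`, `triples`, `expectation`) are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

n, k = 3, 2  # tiny instance (assumed values): 2**(n*k) = 64 sign tables

def projected_hash(h, S, c):
    """H_S(c) = prod_i h[s_i][c_i], with S sorted and c indexed like S."""
    out = 1
    for s, ci in zip(S, c):
        out *= h[s][ci]
    return out

# All (S, c, d) with S non-empty and c dominated by d (c_i < d_i everywhere).
triples = []
for t in range(1, k + 1):
    for S in combinations(range(k), t):
        for c in product(range(n), repeat=t):
            for d in product(range(n), repeat=t):
                if all(ci < di for ci, di in zip(c, d)):
                    triples.append((S, c, d))

# Every assignment of signs to the n*k values h_i(j); the uniform
# distribution over these tables makes all h_i(j) fully independent.
all_hashes = [
    [signs[i * n:(i + 1) * n] for i in range(k)]
    for signs in product((-1, 1), repeat=n * k)
]

def expectation(x, y):
    """Exact E[H_S(c) H_S(d) H_S'(c') H_S'(d')] over all sign tables."""
    (S, c, d), (S2, c2, d2) = x, y
    total = sum(
        projected_hash(h, S, c) * projected_hash(h, S, d)
        * projected_hash(h, S2, c2) * projected_hash(h, S2, d2)
        for h in all_hashes
    )
    return Fraction(total, len(all_hashes))

# The lemma predicts 1 on the diagonal and 0 for every cross term.
lemma_holds = all(
    expectation(x, y) == (1 if x == y else 0)
    for x in triples for y in triples
)
print(len(triples), lemma_holds)  # 15 True
```

The check enumerates the sign tables directly instead of counting parities, so it does not presuppose the argument in the proof.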
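Note that there are precisely $2^{|S|}$ pairs $(\mathbf{a}, \mathbf{b})$ such that $(\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})$. Thus, by the Cauchy-Schwarz inequality:

$$\left( \sum_{\substack{\mathbf{a}, \mathbf{b} \in [n]^{|S|} \\ (\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})}} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \right)^2 \le 2^{|S|} \sum_{\substack{\mathbf{a}, \mathbf{b} \in [n]^{|S|} \\ (\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})}} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle^2 \le 2^{|S|} \sum_{\substack{\mathbf{a}, \mathbf{b} \in [n]^{|S|} \\ (\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})}} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{a}}(\vec{v}) \rangle \cdot \langle \Upsilon_{S,\mathbf{b}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle.$$

Notice that in the second inequality, we applied Cauchy-Schwarz in a component-wise manner. Next, for a subset $S \subseteq [k]$, we can apply the Cauchy-Schwarz inequality a third time (from the third line to the fourth line) as follows:

$$\begin{aligned}
\sum_{\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}} \left( \sum_{\substack{\mathbf{a}, \mathbf{b} \in [n]^{|S|} \\ (\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})}} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \right)^2
&\le 2^{|S|} \sum_{\mathbf{c} \prec \mathbf{d} \in [n]^{|S|}} \; \sum_{\substack{\mathbf{a}, \mathbf{b} \in [n]^{|S|} \\ (\mathbf{a}, \mathbf{b}) \sim (\mathbf{c}, \mathbf{d})}} \langle \Upsilon_{S,\mathbf{a}}(\vec{v}), \Upsilon_{S,\mathbf{a}}(\vec{v}) \rangle \cdot \langle \Upsilon_{S,\mathbf{b}}(\vec{v}), \Upsilon_{S,\mathbf{b}}(\vec{v}) \rangle \\
&= 2^{|S|} \sum_{\substack{\mathbf{c}, \mathbf{d} \in [n]^{|S|} \\ \mathrm{Ham}(\mathbf{c}, \mathbf{d}) = |S|}} \langle \Upsilon_{S,\mathbf{c}}(\vec{v}), \Upsilon_{S,\mathbf{c}}(\vec{v}) \rangle \cdot \langle \Upsilon_{S,\mathbf{d}}(\vec{v}), \Upsilon_{S,\mathbf{d}}(\vec{v}) \rangle \\
&\le 2^{|S|} \sum_{\mathbf{c}, \mathbf{d} \in [n]^{|S|}} \langle \Upsilon_{S,\mathbf{c}}(\vec{v}), \Upsilon_{S,\mathbf{c}}(\vec{v}) \rangle \cdot \langle \Upsilon_{S,\mathbf{d}}(\vec{v}), \Upsilon_{S,\mathbf{d}}(\vec{v}) \rangle \\
&= 2^{|S|} \left( \sum_{\mathbf{c} \in [n]^{|S|}} \langle \Upsilon_{S,\mathbf{c}}(\vec{v}), \Upsilon_{S,\mathbf{c}}(\vec{v}) \rangle \right)^2.
\end{aligned}$$

Finally, we note that by definition, we have $\sum_{\mathbf{c} \in [n]^{|S|}} \langle \Upsilon_{S,\mathbf{c}}(\vec{v}), \Upsilon_{S,\mathbf{c}}(\vec{v}) \rangle = \|\vec{v}\|^2$, which equals $\mathrm{E}[Y]$. It follows that the variance in Equation (4.8) can be bounded by

$$\mathrm{Var}[Y] \le \sum_{\substack{S \subseteq [k] \\ S \neq \emptyset}} 2^{|S|} \cdot \mathrm{E}[Y]^2 = \mathrm{E}[Y]^2 \sum_{i=1}^{k} \binom{k}{i} 2^i = (3^k - 1)\, \mathrm{E}[Y]^2,$$

which finishes the proof of Lemma 4.1.

5. Conclusion

There remain several open questions left in this space. Lower bounds, particularly bounds that depend non-trivially on the dimension $k$, would be useful.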
There may still be room for better algorithms for testing $k$-wise independence in this manner using the $\ell_2$ norm. A natural generalization would be to find a particularly efficient algorithm for testing $k$-out-of-$s$-wise independence (other than handling each set of $k$ variables separately). More generally, a question given in [12], to identify random variables whose correlation exceeds some threshold according to some measure, remains widely open.

References

[1] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In STOC '07: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pages 496–505, New York, NY, USA, 2007. ACM.
[2] Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986.
[3] Noga Alon, Oded Goldreich, and Yishay Mansour. Almost k-wise independence versus k-wise independence. Inf. Process. Lett., 88(3):107–110, 2003.
[4] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.
[5] T. Batu, L. Fortnow, E. Fischer, R. Kumar, R. Rubinfeld, and P. White. Testing random variables for independence and identity. In FOCS '01: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, page 442, Washington, DC, USA, 2001. IEEE Computer Society.
[6] Tugkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In STOC '04: Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, pages 381–390, New York, NY, USA, 2004. ACM.
[7] Vladimir Braverman and Rafail Ostrovsky. Measuring $k$-wise independence of streaming data. CoRR, abs/0806.4790v1, 2008.
[8] Vladimir Braverman and Rafail Ostrovsky. Measuring independence of datasets. CoRR, abs/0903.0034, 2009.
[9] Vladimir Braverman and Rafail Ostrovsky. AMS without 4-wise independence on product domains. CoRR, abs/0806.4790v3, 2009.
[10] Kai-Min Chung, Zhenming Liu, and Michael Mitzenmacher. Testing k-wise independence over streaming data. Unpublished manuscript, available at http://www.eecs.harvard.edu/~michaelm/postscripts/sketchexttemp.pdf, 2009.
[11] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical report, Electronic Colloquium on Computational Complexity, 2000.
[12] Piotr Indyk and Andrew McGregor. Declaring independence via the sketching of sketches. In SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 737–745, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics.
[13] E. L. Lehmann and Springer. Testing Statistical Hypotheses (Springer Texts in Statistics). Springer, January 1997.
[14] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005.
[15] Ronitt Rubinfeld and Rocco A. Servedio. Testing monotone high-dimensional distributions. In STOC '05: Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, pages 147–156, New York, NY, USA, 2005. ACM.
[16] Mikkel Thorup and Yin Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In SODA '04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 615–624, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.

This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/.
