Optimal lower bounds for locality sensitive hashing (except when q is tiny)


Authors: Ryan O'Donnell*, Yi Wu, Yuan Zhou

Computer Science Department, Carnegie Mellon University
{odonnell,yiwu,yuanzhou}@cs.cmu.edu

Abstract

We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}^d under the Hamming distance. Recall that H is said to be an (r, cr, p, q)-sensitive hash family if all pairs x, y ∈ {0,1}^d with dist(x, y) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x, y ∈ {0,1}^d with dist(x, y) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0. For its applications to approximate nearest neighbor search in high dimensions, the quality of an LSH family H is governed by how small its "rho parameter" ρ = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani showed that for each c ≥ 1, the extremely simple family H = {x ↦ x_i : i ∈ [d]} achieves ρ ≤ 1/c. The only known lower bound, due to Motwani, Naor, and Panigrahy, is that ρ must be at least (e^{1/c} − 1)/(e^{1/c} + 1) ≥ .46/c (minus o_d(1)). In this paper we show an optimal lower bound: ρ must be at least 1/c (minus o_d(1)). This lower bound for Hamming space yields a lower bound of 1/c² for Euclidean space (or the unit sphere) and 1/c for the Jaccard distance on sets; both of these match known upper bounds. Our proof is simple; the essence is that the noise stability of a boolean function at e^{−t} is a log-convex function of t. Like the Motwani–Naor–Panigrahy lower bound, our proof relies on the assumption that q is not "tiny", meaning of the form 2^{−Θ(d)}. Some lower bound on q is always necessary, as otherwise it is trivial to achieve ρ = 0.
The range of q for which our lower bound holds is the same as the range of q for which ρ accurately reflects an LSH family's quality. Still, we conclude by discussing why it would be more satisfying to find LSH lower bounds that hold for tiny q.

*Supported by NSF grants CCF-0747250 and CCF-0915893, BSF grant 2008477, and Sloan and Okawa fellowships.

1  Locality Sensitive Hashing

Locality Sensitive Hashing (LSH) is a widely-used algorithmic tool which brings the classic technique of hashing to geometric settings. It was introduced for general metric spaces in the seminal work of Indyk and Motwani [IM98]. Indyk and Motwani showed that the important problem of (approximate) nearest neighbor search can be reduced to the problem of devising good LSH families. Subsequently, numerous papers have demonstrated the practical utility of solving high-dimensional nearest neighbor search problems via the LSH approach [GIM99, Buh01, CDF+01, SVD03, RPH05, DDGR07]. For a survey on LSH, see Andoni and Indyk [AI08].

We recall the basic definition from [IM98]:

Definition 1. Let (X, dist) be a distance space¹, and let U be any finite or countably infinite set. Let r > 0, c > 1. A probability distribution H over functions h : X → U is (r, cr, p, q)-sensitive if for all x, y ∈ X,

  dist(x, y) ≤ r  ⇒  Pr_{h∼H}[h(x) = h(y)] ≥ p,
  dist(x, y) ≥ cr  ⇒  Pr_{h∼H}[h(x) = h(y)] ≤ q,

where q < p. We often refer to H as a locality sensitive hash (LSH) family for (X, dist).

As mentioned, the most useful application of LSH is to the approximate near neighbor problem in high dimensions:

Definition 2.
For a set of n points P in a metric space (X, dist), the (r, c)-near neighbor problem is to process the points into a data structure that supports the following type of query: given a point x ∈ X, if there exists y ∈ P with dist(x, y) ≤ r, the data structure should return a point z ∈ P such that dist(x, z) ≤ cr.

Several important problems in computational geometry reduce to the approximate near neighbor problem, including approximate versions of nearest neighbor, furthest neighbor, close pair, minimum spanning tree, and facility location. For a short survey of these topics, see Indyk [Ind04]. Regarding the reduction from the (r, c)-near neighbor problem to LSH, it is usual (see [Ind01, DIIM04]) to credit roughly the following theorem to [IM98, GIM99]:

Theorem 1.1. Suppose H is an (r, cr, p, q)-sensitive family for the metric space (X, dist). Then one can solve the (r, c)-near neighbor problem with a (randomized) data structure that uses O(n^{1+ρ} + dn) space and has query time dominated by O(n^ρ log_{1/q}(n)) hash function evaluations. (The preprocessing time is not much more than the space bound.)

Here we are using the following:

Definition 3. The rho parameter of an (r, cr, p, q)-sensitive LSH family H is

  ρ = ρ(H) = ln(1/p)/ln(1/q) ∈ (0, 1).

Please note that in Theorem 1.1 it is implicitly assumed [Ind09] that q is bounded away from 0. For "subconstant" values of q, the theorem does not hold. This point is discussed further in Section 4.

Because of Theorem 1.1, there has been significant interest [DIIM04, TT07, AI08, Ney10] in determining the smallest possible ρ that can be obtained for a given metric space and value of c. Constant factors are important here, especially for the most natural regime of c close to 1. For example, shrinking ρ by an additive .5 leads to time and space savings of Θ(√n).
¹A metric space where the triangle inequality need not hold.

2  Previous work

2.1  Upper bounds

The original work of Indyk and Motwani [IM98] contains the following simple yet strong result:

Theorem 2.1. There is an LSH family H for {0,1}^d under the Hamming distance which for each c > 1 has rho parameter ρ(H) ≤ 1/c, simultaneously for each r < d/c.

In this theorem, the family is simply the uniform distribution over the d functions h_i(x) = x_i. For a given c and r, this family is obviously (r, cr, 1 − r/d, 1 − cr/d)-sensitive, whence

  ρ(H) = ln(1/(1 − r/d)) / ln(1/(1 − cr/d)) ↗ 1/c  as r/d → 0.

We remark that the upper bound of 1/c in Theorem 2.1 becomes tight only for asymptotically small r/d.

Indyk and Motwani showed that the same bound holds for the closely related "Jaccard metric" (see [IM98]), and also extended Theorem 2.1 to an LSH family for the metric space ℓ₁ (see also [AI06]).

Perhaps the most natural setting is when the metric space is the usual d-dimensional Euclidean space ℓ₂^d. Here, Andoni and Indyk [AI08] showed, roughly speaking, that ρ ≤ 1/c²:

Theorem 2.2. For any r > 0, c > 1, d ≥ 1, there is a sequence of LSH families H_t for ℓ₂^d satisfying lim sup_{t→∞} ρ(H_t) ≤ 1/c². (The complexity of evaluating a hash function h ∼ H_t also increases as t increases.)

For other ℓ_s distance/metric spaces, Datar, Immorlica, Indyk, and Mirrokni [DIIM04] have similarly shown:²

Theorem 2.3. For any r > 0, c > 1, d ≥ 1, and 0 < s < 2, there is a sequence of LSH families H_t for ℓ_s^d satisfying lim sup_{t→∞} ρ(H_t) ≤ max(1/c^s, 1/c).

Other practical LSH families have been suggested for the Euclidean sphere [TT07] and ℓ₂ [Ney10].

2.2  Lower bounds

There is one known result on lower bounds for LSH, due to Motwani, Naor, and Panigrahy [MNP07]:

Theorem 2.4. Fix c > 1, 0 < q < 1, and consider d → ∞.
Then there exists some r = r(d) such that any LSH family H for {0,1}^d under Hamming distance which is (r, cr, p, q)-sensitive must satisfy

  ρ(H) ≥ (exp(1/c) − 1)/(exp(1/c) + 1) − o_d(1).

The metric setting of {0,1}^d under Hamming distance is the most powerful setting for lower bounds; as Motwani, Naor, and Panigrahy note, one can immediately deduce a lower bound of

  (exp(1/c^s) − 1)/(exp(1/c^s) + 1) − o_d(1)

for the setting of ℓ_s^d. This is simply because ‖x − y‖_s = ‖x − y‖₁^{1/s} when x, y ∈ {0,1}^d.

As c → ∞, the lower bound in Theorem 2.4 approaches 1/(2c). This is a factor of 2 away from the upper bound of Indyk and Motwani. The gap is slightly larger in the more natural regime of c close to 1; here one only has that ρ(H) ≥ ((e − 1)/(e + 1)) · (1/c) ≈ .46/c.

Note that in Theorem 2.4, the parameter q is fixed before one lets d tend to ∞; i.e., q is assumed to be at least a "constant". Even though this is the same assumption implicitly made in the application of LSH to near-neighbors (Theorem 1.1), we feel it is not completely satisfactory. In fact, as stated in [MNP07], Theorem 2.4 still holds so long as q ≥ 2^{−o(d)}. Our new lower bound for LSH also holds for this range of q. But we believe the most satisfactory lower bound would hold even for "tiny" q, meaning q = 2^{−Θ(d)}. This point is discussed further in Section 4.

We close by mentioning the recent work of Panigrahy, Talwar, and Wieder [PTW08], which obtains a time/space lower bound for the (r, c)-near neighbor problem itself in several metric space settings, including {0,1}^d under Hamming distance, and ℓ₂.

²Please note that in [Pan06, MNP07] it is stated that [DIIM04] also improves the Indyk–Motwani 1/c upper bound for ℓ₁ when c ≤ 10. However this is in error.
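As a numerical illustration of the bounds in this section (our own sketch, not part of the paper): the bit-sampling family of Theorem 2.1 has ρ = ln(1/(1 − r/d))/ln(1/(1 − cr/d)), which increases to 1/c as r/d → 0, while the Motwani–Naor–Panigrahy bound of Theorem 2.4 sits a constant factor below it. The function names and sample parameters below are ours.

```python
import math

def rho_bit_sampling(c: float, r_over_d: float) -> float:
    """Rho parameter of the uniform-over-coordinates family h_i(x) = x_i,
    which is (r, cr, 1 - r/d, 1 - cr/d)-sensitive (Theorem 2.1)."""
    p = 1 - r_over_d          # collision probability at distance r
    q = 1 - c * r_over_d      # collision probability at distance cr
    return math.log(1 / p) / math.log(1 / q)

def mnp_lower_bound(c: float) -> float:
    """Motwani-Naor-Panigrahy bound (e^{1/c} - 1)/(e^{1/c} + 1) of Theorem 2.4."""
    e = math.exp(1 / c)
    return (e - 1) / (e + 1)

c = 2.0
for r_over_d in (0.1, 0.01, 0.001):
    print(r_over_d, rho_bit_sampling(c, r_over_d))  # increases toward 1/c = 0.5
print(mnp_lower_bound(c))                           # about 0.245, below 0.5
```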
3  Our result

In this work, we improve on Theorem 2.4 by obtaining a sharp lower bound of 1/c − o_d(1) for every c > 1. This dependence on c is optimal, by the upper bound of Indyk and Motwani. The precise statement of our result is as follows:

Theorem 3.1. Fix d ∈ ℕ, 1 < c < ∞, and 0 < q < 1. Then for a certain choice of 0 < τ < 1, any (τd, cτd, p, q)-sensitive hash family H for {0,1}^d under Hamming distance must satisfy

  ρ(H) ≥ 1/c − Õ(ln(2/q)/d)^{1/3}.   (1)

Here, the precise meaning of the Õ(·) expression is K · (ln(2/q)/d) · ln(d/ln(2/q)), where K is a universal constant, and we assume d/ln(2/q) ≥ 2, say.

As mentioned, the lower bound is only of the form 1/c − o_d(1) under the assumption that q ≥ 2^{−o(d)}. For q of the form 2^{−d/B} for a large constant B, the bound (1) still gives some useful information.

As with the Motwani–Naor–Panigrahy result, because our lower bound is for {0,1}^d we may immediately conclude:

Corollary 3.2. Theorem 3.1 also holds for LSH families for the distance space ℓ_s, 0 < s < ∞, with the lower bound 1/c^s replacing 1/c.

This lower bound matches the known upper bounds for Euclidean space s = 2 ([AI08]) and 0 < s ≤ 1 ([DIIM04]). It seems reasonable to conjecture that it is also tight at least for 1 < s < 2. Finally, the lower bound in Theorem 3.1 also holds for the Jaccard distance on sets, matching the upper bound of Indyk and Motwani [IM98]. We explain why this is true in Section 3.2, although we omit the very minor necessary changes to the proof details.

3.1  Noise stability

Our proof of Theorem 3.1 requires some facts about boolean noise stability. We begin by recalling some basics of the analysis of boolean functions.

Definition 4.
For 0 < ρ ≤ 1, we say that (x, y) are ρ-correlated random strings in {0,1}^d if x is chosen uniformly at random and y is formed by rerandomizing each coordinate of x independently with probability 1 − ρ.

Definition 5. Given f : {0,1}^d → ℝ, the noise stability of f at ρ is defined to be

  S_f(ρ) = E_{(x,y) ρ-correlated}[f(x)f(y)].

We can extend the definition to functions f : {0,1}^d → ℝ^U via

  S_f(ρ) = E_{(x,y) ρ-correlated}[⟨f(x), f(y)⟩],

where ⟨w, z⟩ = Σ_{i∈U} w_i z_i is the usual inner product.³

Proposition 3.3. Let f : {0,1}^d → ℝ^U and write f̂(S) for the usual Fourier coefficient of f associated with S ⊆ [d]; i.e.,

  f̂(S) = (1/2^d) Σ_{x∈{0,1}^d} f(x) Π_{i∈S} (−1)^{x_i} ∈ ℝ^U.

Then

  S_f(ρ) = Σ_{S⊆[d]} ‖f̂(S)‖₂² ρ^{|S|}.

(This formula is standard when f has range ℝ; see, e.g., [O'D03]. The case when f has range ℝ^U follows by repeating the standard proof.)

We are particularly interested in hash functions h : {0,1}^d → U; we view these also as functions {0,1}^d → ℝ^U by identifying i ∈ U with the vector e_i ∈ ℝ^U, which has a 1 in the ith coordinate and a 0 in all other coordinates. Under this identification, ⟨h(x), h(y)⟩ becomes the 0-1 indicator of the event h(x) = h(y). Hence for a fixed hash function h,

  S_h(ρ) = Pr_{(x,y) ρ-correlated}[h(x) = h(y)].   (2)

We also extend the notion of noise stability to hash families:

Definition 6. If H is a hash family on {0,1}^d, we define S_H(ρ) = E_{h∼H}[S_h(ρ)].

By combining this definition with equation (2) and Proposition 3.3, we immediately deduce:

Proposition 3.4. Let H be a hash family on {0,1}^d. Then

  S_H(ρ) = Pr_{h∼H, (x,y) ρ-corr'd}[h(x) = h(y)] = Σ_{S⊆[d]} E_{h∼H}[‖ĥ(S)‖₂²] ρ^{|S|}.
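Definitions 4–6 and equation (2) are easy to simulate. The sketch below (illustrative, not from the paper) draws ρ-correlated strings and Monte Carlo-estimates S_h(ρ) for the single-coordinate hash h(x) = x_0, whose exact noise stability is (1 + ρ)/2, since x_0 = y_0 holds with probability ρ + (1 − ρ)/2.

```python
import random

def correlated_pair(d: int, rho: float):
    """Draw rho-correlated strings (x, y) in {0,1}^d (Definition 4):
    x is uniform; each coordinate of y is rerandomized with probability 1 - rho."""
    x = [random.randint(0, 1) for _ in range(d)]
    y = [xi if random.random() < rho else random.randint(0, 1) for xi in x]
    return x, y

def estimate_stability(h, d: int, rho: float, trials: int = 200_000) -> float:
    """Monte Carlo estimate of S_h(rho) = Pr[h(x) = h(y)], as in equation (2)."""
    hits = 0
    for _ in range(trials):
        x, y = correlated_pair(d, rho)
        if h(x) == h(y):
            hits += 1
    return hits / trials

random.seed(0)
rho = 0.6
est = estimate_stability(lambda x: x[0], d=8, rho=rho)
print(est, (1 + rho) / 2)   # estimate should be close to the exact value 0.8
```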
Finally, it is sometimes more natural to express the parameter ρ as ρ = e^{−t}, where t ∈ [0, ∞). (For example, we can think of a ρ-correlated pair (x, y) by taking x to be uniformly random and y to be the string that results from running the standard continuous-time Markov chain on {0,1}^d, starting from x, for time td.) We make the following definition:

Definition 7. For t ∈ [0, ∞), we define K_h(t) = S_h(e^{−t}), and we similarly define K_H(t).

³In the case that U is countably infinite, we require our functions f to have ‖f(x)‖₂ < ∞ for all x ∈ {0,1}^d.

3.2  The proof, modulo some tedious calculations

We now present the essence of our proof of Theorem 3.1. It will be quite simple to see how it gives a lower bound of the form 1/c − o_d(1) (assuming q is not tiny). Some very tedious calculations (Chernoff bounds, elementary inequalities, etc.) are needed to get the precise statement given in Theorem 3.1; the formal proof is therefore deferred to Section 5.

Let H be a hash family on {0,1}^d, and let us consider

  K_H(t) = Pr_{h∼H, (x,y) e^{−t}-corr'd}[h(x) = h(y)].   (3)

Let us suppose that t is very small, in which case e^{−t} ≈ 1 − t. When (x, y) are (1 − t)-correlated strings, it means that y is formed from the random string x by rerandomizing each coordinate with probability t. This is the same as flipping each coordinate with probability t/2. Thus if we think of d as large, a simple Chernoff bound shows that the Hamming distance dist(x, y) will be very close to (t/2)d with overwhelming probability.⁴

Suppose now that H is ((t/2)d + o(d), (ct/2)d − o(d), p, q)-sensitive, so the distance ratio is c − o_d(1). In (3), regardless of h we will almost surely have dist(x, y) ≤ (t/2)d + o(d); hence K_H(t) ≥ p − o_d(1). Similarly, we deduce K_H(ct) ≤ q + o_d(1).
Hence, neglecting the o_d(1) terms, we get

  ρ(H) = ln(1/p)/ln(1/q) ≳ ln(1/K_H(t)) / ln(1/K_H(ct)).

We then deduce the desired lower bound of 1/c from the following theorem and its corollary:

Theorem 3.5. For any hash family H on {0,1}^d, the function K_H(t) is log-convex in t.

Proof. From Proposition 3.4 we have

  K_H(t) = Σ_{S⊆[d]} E_{h∼H}[‖ĥ(S)‖₂²] e^{−t|S|}.

Thus K_H(t) is log-convex, being a nonnegative linear combination of log-convex functions e^{−t|S|}.

Corollary 3.6. For any hash family H on {0,1}^d, t ≥ 0, and c ≥ 1,

  ln(1/K_H(t)) / ln(1/K_H(ct)) ≥ 1/c.

Proof. By log-convexity,

  K_H(t) ≤ K_H(ct)^{1/c} · K_H(0)^{1−1/c} = K_H(ct)^{1/c}.

Here we used the fact that K_H(0) = 1, which is immediate from the definitions because e^{−0}-correlated strings are always identical. The result follows.

As mentioned, we give the careful proof keeping track of approximations in Section 5. But first, we note what we view as a shortcoming of the proof: after deducing K_H(ct) ≤ q + o_d(1), we wish to "neglect" the additive o_d(1) term. This requires that o_d(1) indeed be negligible compared to q! Being more careful, the o_d(1) arises from a Chernoff bound applied to a Binomial(d, ct) random variable, where t > 0 is very small. So to be more precise, the error term is of the form exp(−εd), and hence is only negligible if q ≥ 2^{−o(d)}.

⁴Similarly, if we think of x and y as subsets of [d], their Jaccard distance will be very close to t/(1 + t/2) ≈ t with overwhelming probability. With this observation, one obtains our lower bound on LSH families for the Jaccard distance on sets.
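As a concrete check of Theorem 3.5 and Corollary 3.6 (our sketch, not in the paper): for the bit-sampling family, Proposition 3.4 gives K_H(t) = (1/2)e^{−0·t} + (1/2)e^{−1·t} = (1 + e^{−t})/2, since each h_i has Fourier weight 1/2 on S = ∅ and 1/2 on S = {i}. Log-convexity and the corollary's inequality can then be verified numerically.

```python
import math

def K(t: float) -> float:
    """K_H(t) = S_H(e^{-t}) for the bit-sampling family: (1 + e^{-t})/2,
    a nonnegative combination (1/2)e^{-0t} + (1/2)e^{-1t} of functions e^{-t|S|}."""
    return (1 + math.exp(-t)) / 2

# Corollary 3.6: ln(1/K(t)) / ln(1/K(ct)) >= 1/c for all t >= 0, c >= 1.
for c in (1.5, 2.0, 5.0):
    for t in (0.01, 0.1, 1.0, 3.0):
        assert math.log(1 / K(t)) / math.log(1 / K(c * t)) >= 1 / c - 1e-12

# Theorem 3.5 (log-convexity), midpoint test:
# log K((s+u)/2) <= (log K(s) + log K(u)) / 2.
for s, u in ((0.1, 2.0), (0.5, 4.0)):
    assert math.log(K((s + u) / 2)) <= (math.log(K(s)) + math.log(K(u))) / 2 + 1e-12

print("log-convexity checks passed")
```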
4  Discussion

4.1  On the reduction from LSH to near neighbor data structures

As described in Section 1, it is normally stated that the quality of an (r, cr, p, q)-sensitive LSH family H is governed by ρ = ln(1/p)/ln(1/q), and more specifically that H can be used to solve the (r, c)-near neighbor problem with roughly O(n^{1+ρ}) space and query time O(n^ρ). However, this involves the implicit assumption that q is bounded away from 0.

It is easy to see that some lower bound on q is essential. Indeed, for any (finite, say) distance space (X, dist) there is a trivially "optimal" LSH family for any r and c: For each pair x, y ∈ X with dist(x, y) ≤ r, define h_{x,y} by setting h_{x,y}(x) = h_{x,y}(y) = 0 and letting h_{x,y}(z) have distinct positive values for all z ≠ x, y. If H is the uniform distribution over all such h_{x,y}, then p > 0 and q = 0, leading to ρ(H) = 0.

To see why this trivial solution is not useful, and what lower bound on q is desirable, we recall some aspects of the Indyk–Motwani reduction from LSH families to (r, c)-near neighbor data structures. Suppose one wishes to build an (r, c)-near neighbor data structure for an n-point subset P of the metric space (X, dist). The first step in [IM98] is to apply the following:

Powering Construction: Given an (r, cr, p, q)-sensitive family H of functions X → U and a positive integer k, we define the family H^{⊗k} by drawing h₁, …, h_k independently from H and forming the function h : X → U^k, h(x) = (h₁(x), …, h_k(x)). It is easy to check that H^{⊗k} is (r, cr, p^k, q^k)-sensitive.

Indyk and Motwani show that if one has an (r, cr, p′, q′)-sensitive hash family with q′ ≤ 1/n, then one can obtain an (r, c)-near neighbor data structure with space roughly O(n/p′) and query time roughly O(1/p′).
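The Powering Construction is a one-liner in code. Below is a minimal sketch (our illustration; representing a hash family as a zero-argument sampler is an assumption of this sketch, not notation from the paper):

```python
import random

def power(family, k: int):
    """Powering Construction: given a sampler for an (r, cr, p, q)-sensitive
    family H, return a sampler for the powered family, which hashes x to the
    tuple (h_1(x), ..., h_k(x)) with h_1, ..., h_k drawn i.i.d. from H.
    A tuple collision means all k independent hashes collide, so the
    powered family is (r, cr, p^k, q^k)-sensitive."""
    def sample():
        hs = [family() for _ in range(k)]
        return lambda x: tuple(h(x) for h in hs)
    return sample

# Example: bit-sampling on {0,1}^d, powered with k = 4.
d = 16

def bit_sampling():
    i = random.randrange(d)      # h_i(x) = x_i for a uniformly random i
    return lambda x: x[i]

random.seed(0)
h = power(bit_sampling, k=4)()
x = [random.randint(0, 1) for _ in range(d)]
print(h(x))                      # a 4-tuple of coordinates of x
```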
Thus given an arbitrary (r, cr, p, q)-sensitive family H, Indyk and Motwani suggest using the Powering Construction with k = log_{1/q}(n). The resulting H^{⊗k} is (r, cr, p′, 1/n)-sensitive, with p′ = p^k = n^{−ρ}, yielding an O(n^{1+ρ}) space, O(n^ρ) time data structure.

However this argument makes sense only if k is a positive integer. For example, with the trivially "optimal" LSH family, we have q = 0 and thus k = −∞. Indeed, whenever q ≤ 1/n to begin with, one doesn't get O(n^{1+ρ}) space and O(n^ρ) time; one simply gets O(n/p) space and O(1/p) time. For example, a hypothetical LSH family with p = 1/n^{.5} and q = 1/n^{1.5} has ρ = 1/3 but only yields an O(n^{1.5}) space, O(n^{.5}) time near neighbor data structure.

The assumption q > 1/n is still not enough for the deduction in Theorem 1.1 to hold precisely. The reason is that the Indyk–Motwani choice of k may not be an integer. For example, suppose we design an (r, cr, p, q)-sensitive family H with p = 1/n^{.15} and q = 1/n^{.3}. Then ρ = .5. However, we cannot actually get an O(n^{1.5}) space, O(n^{.5}) time data structure from this H. The reason is that to get q^k ≤ 1/n, we need to take k = 4. Then p^k = 1/n^{.6}, so we only get an O(n^{1.6}) space, O(n^{.6}) time data structure.

The effect of rounding k up to the nearest integer is not completely eliminated unless one makes the assumption, implicit in Theorem 1.1, that q ≥ Ω(1). Under the weaker assumption that q ≥ n^{−o(1)}, the conclusion of Theorem 1.1 remains true up to n^{o(1)} factors. To be completely precise, one should assume q ≥ 1/n and take k = ⌈log_{1/q}(n)⌉. If we then use k ≤ log_{1/q}(n) + 1, the Powering Construction will yield an LSH family with q′ ≤ 1/n and p′ = (n/q)^{−ρ}. In this way, one obtains a refinement of Theorem 1.1 with no additional assumptions:

Theorem 4.1.
Suppose H is an (r, cr, p, q)-sensitive family for the metric space (X, dist). Then for n-point subsets of X (and assuming q ≥ 1/n), one can solve the (r, c)-near neighbor problem with a (randomized) data structure that uses n · O((n/q)^ρ + d) space and has query time dominated by O((n/q)^ρ log_{1/q}(n)) hash function evaluations.

4.2  On assuming q is not tiny

Let us return from the near-neighbor problem to the study of locality sensitive hashing itself. Because of the "trivial" LSH family, it is essential to impose some kind of lower bound on how small the parameter q is allowed to be. Motwani, Naor, and Panigrahy carry out their lower bound for LSH families on {0,1}^d under the assumption that q ≥ Ω(1), but also note that it goes through assuming q ≥ 2^{−o(d)}. Our main result, Theorem 3.1, is also best when q ≥ 2^{−o(d)}, and is only nontrivial assuming q ≥ 2^{−d/B} for a sufficiently large constant B.

One may ask what the "correct" lower bound assumed on q should be. For the Indyk–Motwani application to (r, c)-near neighbor data structures, the answer seems obvious: "1/n". Indeed, since the Indyk–Motwani reduction immediately uses Powering to reduce the q parameter down to 1/n, the most meaningful LSH lower bounds would simply involve fixing q = 1/n and trying to lower bound p. There is an obvious catch here, though, which is that in the definition of LSH, there is no notion of "n"! Still, in settings such as {0,1}^d which have a notion of dimension, d, it seems reasonable to think that applications will have n = 2^{Θ(d)}. In this case, to maintain the Indyk–Motwani Theorem 4.1 up to n^{o(1)} factors one would require q ≥ 2^{−o(d)}. This is precisely the assumption that this paper and the Motwani–Naor–Panigrahy paper have made.
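The integer-rounding effect discussed in Section 4.1 can be reproduced with a few lines of arithmetic, using the paper's example p = 1/n^{.15}, q = 1/n^{.3} (the function and variable names below are ours):

```python
import math

def powered_exponents(p_exp: float, q_exp: float):
    """For p = n^{-p_exp} and q = n^{-q_exp}: return rho, the integer k forced
    by q^k <= 1/n, and the resulting space exponent 1 + k * p_exp."""
    rho = p_exp / q_exp           # rho = ln(1/p) / ln(1/q)
    k = math.ceil(1 / q_exp)      # smallest integer k with q^k <= 1/n
    return rho, k, 1 + k * p_exp

rho, k, space_exp = powered_exponents(0.15, 0.30)
print(rho, k, space_exp)   # rho = 0.5, but k = 4 forces O(n^1.6) space, not O(n^1.5)
```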
Still, we believe that the most compelling kind of LSH lower bound for {0,1}^d would be nontrivial even for q = 2^{−d/b} with a "medium" constant b, say b = 10. We currently do not have such a lower bound.

5  Proof details

We require the following lemma, whose proof follows easily from Proposition 3.4 and the definition of hash family sensitivity:

Lemma 5.1. Let H be an (r, cr, p, q)-sensitive hash family on {0,1}^d and suppose (x, y) is a pair of e^{−u}-correlated random strings. Then

  p(1 − Pr[dist(x, y) > r]) ≤ K_H(u) ≤ q + Pr[dist(x, y) < cr].

We now prove Theorem 3.1, which for convenience we slightly rephrase as follows:

Theorem 5.2. Fix d ∈ ℕ, 1 < c < ∞, and 0 < q < 1. Then for a certain choice of 0 < ε < 1, any ((ε/c)d, εd, p, q)-sensitive hash family for {0,1}^d under Hamming distance must satisfy

  ρ = ln(1/p)/ln(1/q) ≥ 1/c − K · λ(d, q)^{1/3},

where K is a universal constant,

  λ(d, q) = (ln(2/q)/d) · ln(d/ln(2/q)),

and we assume d/ln(2/q) ≥ 2, say.

Proof. Let 0 < ∆ = ∆(c, d, q) < .005 be a small quantity to be chosen later, and let ε = .005∆. Suppose that H is an ((ε/c)d, εd, p, q)-sensitive hash family for {0,1}^d. Our goal is to lower bound ρ = ln(1/p)/ln(1/q). By the Powering Construction we may assume that q ≤ 1/e, and hence will use ln(1/q) ≥ 1 without further comment.

Define also t = 2ε(1 + ∆/2) and c′ = c(1 + ∆). Let (x₁, y₁) be exp(−t/c′)-correlated random strings and let (x₂, y₂) be exp(−t)-correlated random strings. Using the two bounds in Lemma 5.1 separately, we have

  K_H(t/c′) ≥ p(1 − e₁),   K_H(t) ≤ q + e₂,

where

  e₁ = Pr[dist(x₁, y₁) > (ε/c)d],   e₂ = Pr[dist(x₂, y₂) < εd].
By Corollary 3.6, we have

  1/c′ ≤ ln(1/K_H(t/c′)) / ln(1/K_H(t)) ≤ ln(1/(p(1 − e₁))) / ln(1/(q + e₂)) = [ln(1/p) + ln(1/(1 − e₁))] / [ln(1/q) + ln(1/(1 + e₂/q))].   (4)

We will use the following estimates:

  1/c′ = 1/(c(1 + ∆)) ≥ (1 − ∆)/c = 1/c − ∆/c,   (5)
  ln(1/(1 − e₁)) ≤ 1.01 e₁,   (6)
  ln(1/q) + ln(1/(1 + e₂/q)) ≥ ln(1/q) − e₂/q = ln(1/q) · (1 − e₂/(q ln(1/q))).   (7)

For (6) we made the following assumption:

  e₁ ≤ .01.   (8)

We will also ensure that the quantity in (7) is positive by making the following assumption:

  e₂ < q ln(1/q).   (9)

Substituting the three estimates (5)–(7) into (4) we obtain

  1/c − ∆/c ≤ [ln(1/p) + 1.01 e₁] / [ln(1/q) · (1 − e₂/(q ln(1/q)))]
  ⇒ [ln(1/p) + 1.01 e₁] / ln(1/q) ≥ (1/c − ∆/c) · (1 − e₂/(q ln(1/q)))
  ⇒ ln(1/p)/ln(1/q) ≥ 1/c − ∆/c − e₂/(q ln(1/q)) − 1.01 e₁/ln(1/q).

Thus we have established

  ρ ≥ 1/c − e,  where  e = ∆/c + 1.01 e₁/ln(1/q) + e₂/(q ln(1/q)).   (10)

We now estimate e₁ and e₂ in terms of ∆ (and ε), after which we will choose ∆ so as to minimize e. By definition, e₁ is the probability that a Binomial(d, η₁) random variable exceeds (ε/c)d, where η₁ = (1 − exp(−t/c′))/2. Let us select δ₁ so that (1 + δ₁)η₁ = ε/c. Thus

  δ₁ = ε/(cη₁) − 1 = (2ε/c)/(1 − exp(−t/c′)) − 1 ≥ (2ε/c)/(t/c′) − 1 = (1 + ∆)/(1 + ∆/2) − 1 ≥ .498∆.

Here we used the definitions of t and c′, and then the assumption ∆ < .005. Using a standard Chernoff bound, we conclude

  e₁ = Pr[Binomial(d, η₁) > (1 + δ₁)η₁d] < exp(−(δ₁²/(2 + δ₁)) · η₁d) < exp(−(∆²/8.08) · η₁d),   (11)

using the fact that δ²/(2 + δ) is increasing in δ, and ∆ < .005 again. We additionally estimate

  η₁ = (1 − exp(−t/c′))/2 ≥ (t/c′ − (t/c′)²/2)/2 = (t/2c′) − (t/2c′)² ≥ .99(t/2c′) = (.99ε/c) · (1 + ∆/2)/(1 + ∆) ≥ .98ε/c.

Here the second inequality used t/2c′ ≤ .01, which certainly holds since t/2c′ ≤ ε = .005∆. The third inequality used ∆ ≤ .005. Substituting this into (11) we obtain our upper bound for e₁,

  e₁ < exp(−(∆²/8.25) · (ε/c)d) = exp(−(.005∆³/8.25c) · d) < exp(−(∆³/2000c) · d).   (12)

Our estimation of e₂ is quite similar:

  e₂ = Pr[Binomial(d, η₂) < (1 − δ₂)η₂d] < exp(−(δ₂²/2) · η₂d),   (13)

where η₂ = (1 − exp(−t))/2 and δ₂ is chosen so that (1 − δ₂)η₂ = ε. This entails

  δ₂ = 1 − ε/η₂ = 1 − 2ε/(1 − exp(−t)) ≥ 1 − 2ε/(t − t²/2) = 1 − 1/((t/2ε) − ε(t/2ε)²) = 1 − 1/((1 + ∆/2) − ε(1 + ∆/2)²).

This expression is the reason we were forced to take ε noticeably smaller than ∆. Using our specific setting ε = .005∆, we conclude

  δ₂ ≥ 1 − 1/(1 + .495∆ − .005∆² − .00125∆³) ≥ .49∆,

where we used ∆ ≤ .005 again. As for η₂, we can lower bound it similarly to η₁, obtaining

  η₂ ≥ .99(t/2) = .99ε(1 + ∆/2) ≥ .99ε.

Substituting our lower bounds for δ₂ and η₂ into (13) yields

  e₂ < exp(−((.49∆)²/2) · .99εd) < exp(−(∆³/2000) · d).   (14)

Plugging our upper bounds (12), (14) for e₁, e₂ into (10) gives

  e ≤ ∆/c + 1.01 exp(−(∆³/2000c)d)/ln(1/q) + exp(−(∆³/2000)d)/(q ln(1/q)).   (15)

Finally, we would like to choose

  ∆ = K₁ c^{1/3} λ(d, q)^{1/3},

where K₁ is an absolute constant. For K₁ sufficiently large, this makes all three terms in the bound (15) at most 2K₁λ(d, q)^{1/3} = Õ(ln(2/q)/d)^{1/3}. This would establish the theorem.

It only remains to check whether this is a valid choice for ∆. First, we note that with this choice, assumptions (8) and (9) follow from (12) and (14) (after increasing K₁ if necessary). Second, we required that ∆ ≤ .005. This may not hold. However, if it fails then we have λ(d, q)^{1/3} > .005/(K₁c^{1/3}). We can then trivialize the theorem by taking K = (K₁/.005)³, making the claimed lower bound for ρ smaller than 1/c − 1/c^{1/3} ≤ 0.

Acknowledgments

The authors would like to thank Alexandr Andoni, Piotr Indyk, Assaf Naor, and Kunal Talwar for helpful discussions.

References

[AI06] A. Andoni and P. Indyk. Efficient algorithms for substring near neighbor problem. In Proc. 17th Ann. ACM–SIAM Symposium on Discrete Algorithms, pages 1203–1212, 2006.

[AI08] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117–122, 2008.

[Buh01] J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5):419–428, 2001.

[CDF+01] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. IEEE Transactions on Knowledge and Data Engineering, 13(1):64–78, 2001.

[DDGR07] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proc. 16th Intl. Conf. on World Wide Web, pages 271–280, 2007.

[DIIM04] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. 20th Ann. Symposium on Computational Geometry, pages 253–262, New York, NY, USA, 2004.

[GIM99] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proc. 25th Intl. Conf. on Very Large Data Bases, 1999.

[IM98] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proc. 30th Ann. ACM Symposium on Theory of Computing, pages 604–613, 1998.

[Ind01] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2001.

[Ind04] P. Indyk. Nearest neighbors in high-dimensional spaces. Handbook of Discrete and Computational Geometry, pages 877–892, 2004.

[Ind09] P. Indyk. Personal communication, 2009.

[MNP07] R. Motwani, A. Naor, and R. Panigrahy. Lower bounds on locality sensitive hashing. SIAM Journal on Discrete Mathematics, 21(4):930–935, 2007.

[Ney10] T. Neylon. A locality-sensitive hash for real vectors. To appear in the 21st Ann. ACM–SIAM Symposium on Discrete Algorithms, 2010.

[O'D03] R. O'Donnell. Computational applications of noise sensitivity. PhD thesis, Massachusetts Institute of Technology, 2003.

[Pan06] R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proc. 17th Ann. ACM–SIAM Symposium on Discrete Algorithms, page 1195, 2006.

[PTW08] R. Panigrahy, K. Talwar, and U. Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proc. 49th Ann. IEEE Symposium on Foundations of Computer Science, pages 414–423. IEEE Computer Society, 2008.

[RPH05] D. Ravichandran, P. Pantel, and E. Hovy. Randomized algorithms and NLP: Using locality sensitive hash functions for high speed noun clustering. In Proc. 43rd Ann. Meeting of the Association for Computational Linguistics, pages 622–629, Ann Arbor, Michigan, June 2005.

[SVD03] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In Proc. 9th Ann. IEEE Intl. Conf. on Computer Vision, pages 750–757, 2003.

[TT07] K. Terasawa and Y. Tanaka. Spherical LSH for approximate nearest neighbor search on unit hypersphere. Lecture Notes in Computer Science, 4619:27, 2007.
