On Approximating the $l_p$ Distances for $p > 2$ (When $p$ Is Even)

Ping Li
Department of Statistical Science
Faculty of Computing and Information Science
Cornell University, Ithaca, NY 14850

November 3, 2021

Abstract

Many applications in machine learning and data mining require computing pairwise $l_p$ distances in a data matrix $A \in \mathbb{R}^{n \times D}$. For massive high-dimensional data, computing all pairwise distances of $A$ can be infeasible. In fact, even storing $A$ or all pairwise distances of $A$ in memory may be infeasible. For $0 < p \le 2$, efficient small-space algorithms exist, for example, based on the method of stable random projections, which unfortunately is not directly applicable to $p = 3, 4, 5, 6, \ldots$ This paper proposes a simple method for $p = 2, 4, 6, \ldots$ We first decompose the $l_p$ (where $p$ is even) distance into a sum of 2 marginal norms and $p - 1$ "inner products" at different orders. Then we apply normal or sub-Gaussian random projections to approximate the resultant "inner products," assuming that the marginal norms can be computed exactly by a linear scan. We propose two strategies for applying random projections. The basic projection strategy requires only one projection matrix but is more difficult to analyze, while the alternative projection strategy requires $p - 1$ projection matrices but its theoretical analysis is much easier. In terms of accuracy, at least for $p = 4$, the basic strategy is always more accurate than the alternative strategy if the data are non-negative, which is common in reality.

1 Introduction

This study proposes a simple method for efficiently computing the $l_p$ distances in a massive data matrix $A \in \mathbb{R}^{n \times D}$ for $p > 2$ (where $p$ is even), using random projections [22].
While much previous work on random projections focused on approximating the $l_2$ distances (and inner products), the method of symmetric stable random projections [8, 13, 18, 15] is applicable to approximating the $l_p$ distances for all $0 < p \le 2$. This work proposes using random projections for $p > 2$, at least for some special cases.

Footnote 1: First draft Dec. 2007. Slightly revised June 2008.

Machine learning algorithms often operate on the $l_p$ distances of $A$ instead of the original data. A straightforward application would be searching for the nearest neighbors using the $l_p$ distance. The $l_p$ distance is also a basic loss function for quality measure. The widely used "kernel trick" (e.g., for support vector machines (SVM)) is often constructed on top of the $l_p$ distances [21].²

Here we can treat $p$ as a tuning parameter. It is common to take $p = 2$ (Euclidean distance), $p = \infty$ (infinity distance), $p = 1$ (Manhattan distance), or $p = 0$ (Hamming distance); but in principle any value of $p$ is possible. In fact, if there is an efficient mechanism for computing the $l_p$ distances, then it becomes affordable to tune learning algorithms over many values of $p$ for the best performance.

In modern data mining and learning applications, the ubiquitous phenomenon of "massive data" imposes challenges. For example, pre-computing and storing all pairwise $l_p$ distances in memory at cost $O(n^2)$ can be infeasible when $n > 10^6$ (or even just $10^5$) [5]. For ultra high-dimensional data, even storing the whole data matrix can be infeasible. Meanwhile, modern applications routinely involve millions of observations, and developing scalable learning and data mining algorithms has been an active research direction. One commonly used strategy in current practice is to compute the distances on the fly [5], instead of storing all pairwise $l_p$ distances.
Data reduction algorithms such as sampling or sketching methods are also popular. While there have been extensive studies on approximating the $l_p$ distances for $0 < p \le 2$, $p > 2$ can be useful too. For example, because the normal distribution is completely determined by its first two moments (mean and variance), we can identify the non-normal components of the data by analyzing higher moments, in particular the fourth moments (i.e., kurtosis). Thus the fourth moments are critical, for example, in the field of Independent Component Analysis (ICA) [11]. Therefore, it is viable to use the $l_p$ distance for $p > 2$ when lower-order distances cannot efficiently differentiate data.

It is unfortunate that the family of stable distributions [24] is limited to $0 < p \le 2$, and hence we cannot directly use stable distributions for approximating the $l_p$ distances. In the theoretical CS community, there have been many studies on approximating the $l_p$ norms and distances [2, 10, 9, 12, 3, 7, 19, 20, 4, 23, 14], some of which are also applicable to the $l_p$ distances (e.g., comparing two long vectors). Those papers proved that small-space ($\hat{O}(1)$) algorithms exist only for $0 < p \le 2$.

1.1 The Methodology

Given a giant data matrix $A \in \mathbb{R}^{n \times D}$, we assume that a linear scan of the data is feasible, but computing all pairwise interactions is not, either due to computational budget constraints or memory limits. Also, we only consider even $p = 4, 6, \ldots$, among which $p = 4$ is probably the most important.

Interestingly, our method is based only on normal (or normal-like) projections. The observation is that, when $p$ is even, the $l_p$ distance can be decomposed into marginal $l_p$ norms and "inner products" of various orders. For example, for two $D$-dimensional vectors $x$ and $y$, when $p = 4$,

$$d_{(p)} = \sum_{i=1}^D |x_i - y_i|^p = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\sum_{i=1}^D x_i^2 y_i^2 - 4\sum_{i=1}^D x_i^3 y_i - 4\sum_{i=1}^D x_i y_i^3.$$

Since we assume that a linear scan of the data is feasible, we can compute $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$ exactly. We can approximate the interaction terms $\sum_{i=1}^D x_i^2 y_i^2$, $\sum_{i=1}^D x_i^3 y_i$, and $\sum_{i=1}^D x_i y_i^3$ using normal (or normal-like) random projections. Therefore, for $p$ even, we are able to efficiently approximate the $l_p$ distances.

Footnote 2: It is well known that the radial basis kernel using the $l_p$ distance with $0 < p \le 2$ satisfies Mercer's condition. However, we can still use the $l_p$ distance with $p > 2$ in kernels, although in this case it is not guaranteed that we find the optimal solution. For very large-scale learning, we usually will not find the optimal solution anyway.

1.2 Paper Organization

Section 2 concerns using normal random projections for approximating $l_4$ distances. We introduce two projection strategies and the concept of utilizing the marginal norms to improve the estimates. Section 3 extends this approach to approximating $l_6$ distances. Section 4 analyzes the effect of replacing normal projections by sub-Gaussian projections.

2 Normal Random Projections for p = 4

The goal is to efficiently compute all pairwise $l_p$ ($p = 4$) distances in $A \in \mathbb{R}^{n \times D}$. It suffices to consider any two rows of $A$, say $x$ and $y$, where $x, y \in \mathbb{R}^D$. We need to estimate the $l_p$ distance between $x$ and $y$,

$$d_{(p)} = \sum_{i=1}^D |x_i - y_i|^p,$$

which, when $p = 4$, becomes

$$d_{(4)} = \sum_{i=1}^D |x_i - y_i|^4 = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\sum_{i=1}^D x_i^2 y_i^2 - 4\sum_{i=1}^D x_i^3 y_i - 4\sum_{i=1}^D x_i y_i^3.$$

In one pass, we can compute $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$ easily, but computing the interactions is more difficult.
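The decomposition is just the binomial expansion of $(x_i - y_i)^4$ summed over $i$, and it can be checked numerically. A minimal sketch (NumPy assumed; the variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 100
x, y = rng.standard_normal(D), rng.standard_normal(D)

# Exact l4 distance.
d4 = np.sum(np.abs(x - y) ** 4)

# Two marginal norms plus three "inner products" of different orders.
decomposed = (np.sum(x**4) + np.sum(y**4)
              + 6 * np.sum(x**2 * y**2)
              - 4 * np.sum(x**3 * y)
              - 4 * np.sum(x * y**3))

assert np.allclose(d4, decomposed)
```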
We resort to random projections for approximating $\sum_{i=1}^D x_i^2 y_i^2$, $\sum_{i=1}^D x_i^3 y_i$, and $\sum_{i=1}^D x_i y_i^3$. Since there are three "inner products" of different orders, we can choose either only one projection matrix for all three terms (the basic projection strategy), or three independent projection matrices (the alternative projection strategy).

2.1 The Basic Projection Strategy

First, generate a random matrix $R \in \mathbb{R}^{D \times k}$ ($k \ll D$), with i.i.d. entries from a standard normal, i.e., $r_{ij} \sim N(0, 1)$, so that

$$E(r_{ij}) = 0, \quad E(r_{ij}^2) = 1, \quad E(r_{ij}^4) = 3, \qquad E\big(r_{ij}^s r_{i'j'}^t\big) = 0 \ \text{ if } s \text{ or } t \text{ is odd and } (i,j) \neq (i',j').$$

Footnote 3: It is possible to relax the requirement of i.i.d. samples. In fact, proving unbiasedness of the estimators only needs pairwise independence, and deriving the variance formula requires four-wise independence.

Using random projections, we generate six vectors in $k$ dimensions, $u_1, u_2, u_3, v_1, v_2, v_3 \in \mathbb{R}^k$:

$$u_{1,j} = \sum_{i=1}^D x_i r_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r_{ij},$$
$$v_{1,j} = \sum_{i=1}^D y_i r_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r_{ij}.$$

We have a simple unbiased estimator of $d_{(4)}$:

$$\hat{d}_{(4)} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6 u_2^T v_2 - 4 u_3^T v_1 - 4 u_1^T v_3\right).$$

Lemma 1 (all sums are over $i = 1, \ldots, D$)

$$E\big(\hat{d}_{(4)}\big) = d_{(4)},$$

$$\mathrm{Var}\big(\hat{d}_{(4)}\big) = \frac{36}{k}\left(\sum x_i^4 \sum y_i^4 + \Big(\sum x_i^2 y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6 \sum y_i^2 + \Big(\sum x_i^3 y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2 \sum y_i^6 + \Big(\sum x_i y_i^3\Big)^2\right) + \Delta_4,$$

$$\Delta_4 = -\frac{48}{k}\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i \sum x_i^3 y_i^2\right) - \frac{48}{k}\left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^2 \sum x_i^2 y_i^3\right) + \frac{32}{k}\left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i \sum x_i^3 y_i^3\right).$$

Proof 1: See Appendix A.

The basic projection strategy is simple but its analysis is quite involved, especially when $p > 4$.
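The estimator $\hat{d}_{(4)}$ is straightforward to implement. The sketch below (ours, not the paper's; NumPy assumed) computes the two marginal norms exactly and approximates the three interaction terms with a single shared projection matrix:

```python
import numpy as np

def l4_basic(x, y, k=50000, seed=0):
    """Basic projection strategy: one normal matrix R shared by all three
    "inner products"; the marginal l4 norms are computed exactly.
    (In a real application k << D; here k is large only so the toy check
    below is tight.)"""
    D = x.shape[0]
    R = np.random.default_rng(seed).standard_normal((D, k))  # r_ij ~ N(0,1)
    u1, u2, u3 = x @ R, (x**2) @ R, (x**3) @ R
    v1, v2, v3 = y @ R, (y**2) @ R, (y**3) @ R
    return (np.sum(x**4) + np.sum(y**4)
            + (6 * (u2 @ v2) - 4 * (u3 @ v1) - 4 * (u1 @ v3)) / k)

rng = np.random.default_rng(1)
x, y = rng.random(50), rng.random(50)
exact = np.sum((x - y) ** 4)
approx = l4_basic(x, y)
```

Note that each row of $A$ only needs to be projected once; the six $k$-dimensional sketches can then be reused across all $n^2$ pairs.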
Also, if we are interested in higher-order moments (other than the variance) of the estimator, the analysis becomes very tedious.

2.2 The Alternative Projection Strategy

Instead of one projection matrix $R$, we generate three, $R^{(a)}, R^{(b)}, R^{(c)}$, independently. By random projections, we generate six vectors in $k$ dimensions, $u_1, u_2, u_3, v_1, v_2, v_3 \in \mathbb{R}^k$, such that

$$u_{1,j} = \sum_{i=1}^D x_i r^{(c)}_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r^{(a)}_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r^{(b)}_{ij},$$
$$v_{1,j} = \sum_{i=1}^D y_i r^{(b)}_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r^{(a)}_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r^{(c)}_{ij}.$$

Here we abuse the notation slightly by using the same $u$ and $v$ for both projection strategies. Again, we have an unbiased estimator, denoted by $\hat{d}_{(4),a}$:

$$\hat{d}_{(4),a} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6 u_2^T v_2 - 4 u_3^T v_1 - 4 u_1^T v_3\right).$$

Lemma 2 (all sums are over $i = 1, \ldots, D$)

$$E\big(\hat{d}_{(4),a}\big) = d_{(4)},$$

$$\mathrm{Var}\big(\hat{d}_{(4),a}\big) = \frac{36}{k}\left(\sum x_i^4 \sum y_i^4 + \Big(\sum x_i^2 y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6 \sum y_i^2 + \Big(\sum x_i^3 y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2 \sum y_i^6 + \Big(\sum x_i y_i^3\Big)^2\right).$$

Proof 2: The proof basically follows from that of Lemma 1.

Compared with $\mathrm{Var}(\hat{d}_{(4)})$ in Lemma 1, the difference is exactly $\Delta_4$:

$$\mathrm{Var}\big(\hat{d}_{(4)}\big) - \mathrm{Var}\big(\hat{d}_{(4),a}\big) = \Delta_4 = -\frac{48}{k}\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i \sum x_i^3 y_i^2\right) - \frac{48}{k}\left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^2 \sum x_i^2 y_i^3\right) + \frac{32}{k}\left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i \sum x_i^3 y_i^3\right), \quad (1)$$

which can be either negative or positive. For example, when all $x_i$'s are negative and all $y_i$'s are positive, then $\Delta_4 \ge 0$, i.e., the alternative projection strategy results in smaller variance and hence should be adopted.
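The sign of $\Delta_4$ can be probed empirically. In the following sketch (ours, not the paper's), the two strategies differ only in whether the three projection matrices are shared; for non-negative data, Lemma 3 below predicts that the basic strategy shows the smaller sample variance:

```python
import numpy as np

def l4_estimate(x, y, k, rng, shared=True):
    """shared=True:  basic strategy (one matrix for all three terms).
    shared=False: alternative strategy (three independent matrices)."""
    D = x.shape[0]
    Ra = rng.standard_normal((D, k))
    Rb = Ra if shared else rng.standard_normal((D, k))
    Rc = Ra if shared else rng.standard_normal((D, k))
    u1, u2, u3 = x @ Rc, (x**2) @ Ra, (x**3) @ Rb
    v1, v2, v3 = y @ Rb, (y**2) @ Ra, (y**3) @ Rc
    return (np.sum(x**4) + np.sum(y**4)
            + (6 * (u2 @ v2) - 4 * (u3 @ v1) - 4 * (u1 @ v3)) / k)

rng = np.random.default_rng(0)
x, y = rng.random(50), rng.random(50)            # non-negative data
basic = [l4_estimate(x, y, 100, rng, True) for _ in range(500)]
alt = [l4_estimate(x, y, 100, rng, False) for _ in range(500)]
# Both estimators are unbiased; only their variances differ.
```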
We show in Lemma 3 that when the data are non-negative (which is the more likely situation in practice), the difference in (1) never exceeds zero, suggesting that the basic strategy is preferable; it is also operationally simpler (although more sophisticated to analyze).

Lemma 3: If all entries of $x$ and $y$ are non-negative, then

$$\mathrm{Var}\big(\hat{d}_{(4)}\big) - \mathrm{Var}\big(\hat{d}_{(4),a}\big) = \Delta_4 \le 0. \quad (2)$$

Proof 3: See Appendix B.

Thus, the main advantage of the alternative projection strategy is that it simplifies the analysis, especially when $p > 4$. Also, analyzing the alternative projection strategy may provide an estimate for the basic projection strategy. For example, the variance of $\hat{d}_{(4),a}$ is an upper bound on the variance of $\hat{d}_{(4)}$ for non-negative data.

In the next subsection, we show that the alternative strategy makes the analysis feasible when we take advantage of the marginal information.

2.3 Improving the Estimates Using Margins

Since we assume that a linear scan of the data is feasible, and in fact the estimators in both strategies already take advantage of the marginal $l_4$ norms $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$, we might as well compute other marginal norms and try to take advantage of them in a systematic manner. Lemma 4 demonstrates such a method for improving the estimates using margins. For simplicity, we assume in Lemma 4 that we adopt the alternative projection strategy, in order to carry out the (asymptotic) analysis of the variance.

Lemma 4: Suppose we use the alternative projection strategy described in Section 2.2 to generate samples $u_{1,j}, u_{2,j}, u_{3,j}, v_{1,j}, v_{2,j}$, and $v_{3,j}$.
We estimate $d_{(4)}$ by

$$\hat{d}_{(4),a,mle} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\hat{a}_{2,2} - 4\hat{a}_{3,1} - 4\hat{a}_{1,3},$$

where $\hat{a}_{2,2}$, $\hat{a}_{3,1}$, $\hat{a}_{1,3}$ are, respectively, the solutions to the following three cubic equations (all sums over $i = 1, \ldots, D$):

$$a_{2,2}^3 - a_{2,2}^2\,\frac{u_2^T v_2}{k} + a_{2,2}\left(\frac{\sum x_i^4\,\|v_2\|^2 + \sum y_i^4\,\|u_2\|^2}{k} - \sum x_i^4 \sum y_i^4\right) - \frac{1}{k}\sum x_i^4 \sum y_i^4\; u_2^T v_2 = 0,$$

$$a_{3,1}^3 - a_{3,1}^2\,\frac{u_3^T v_1}{k} + a_{3,1}\left(\frac{\sum x_i^6\,\|v_1\|^2 + \sum y_i^2\,\|u_3\|^2}{k} - \sum x_i^6 \sum y_i^2\right) - \frac{1}{k}\sum x_i^6 \sum y_i^2\; u_3^T v_1 = 0,$$

$$a_{1,3}^3 - a_{1,3}^2\,\frac{u_1^T v_3}{k} + a_{1,3}\left(\frac{\sum x_i^2\,\|v_3\|^2 + \sum y_i^6\,\|u_1\|^2}{k} - \sum x_i^2 \sum y_i^6\right) - \frac{1}{k}\sum x_i^2 \sum y_i^6\; u_1^T v_3 = 0.$$

Asymptotically (as $k \to \infty$), the variance is

$$\mathrm{Var}\big(\hat{d}_{(4),a,mle}\big) = 36\,\mathrm{Var}(\hat{a}_{2,2}) + 16\,\mathrm{Var}(\hat{a}_{3,1}) + 16\,\mathrm{Var}(\hat{a}_{1,3})$$
$$= \frac{36}{k}\,\frac{\big(\sum x_i^4 \sum y_i^4 - (\sum x_i^2 y_i^2)^2\big)^2}{\sum x_i^4 \sum y_i^4 + (\sum x_i^2 y_i^2)^2} + \frac{16}{k}\,\frac{\big(\sum x_i^6 \sum y_i^2 - (\sum x_i^3 y_i)^2\big)^2}{\sum x_i^6 \sum y_i^2 + (\sum x_i^3 y_i)^2} + \frac{16}{k}\,\frac{\big(\sum x_i^2 \sum y_i^6 - (\sum x_i y_i^3)^2\big)^2}{\sum x_i^2 \sum y_i^6 + (\sum x_i y_i^3)^2} + O\!\left(\frac{1}{k^2}\right).$$

Proof 4: [16, 17] proposed taking advantage of the marginal $l_2$ norms to improve the estimates of $l_2$ distances and inner products. Because we assume the alternative projection strategy, we can analyze $\hat{a}_{2,2}$, $\hat{a}_{3,1}$, and $\hat{a}_{1,3}$ independently and then combine the results; hence we skip the detailed proof.

Of course, in practice, we probably still prefer the basic projection strategy, i.e., only one projection matrix instead of three. In this case, we still solve three cubic equations, but the precise analysis of the variance becomes much more difficult.
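Each cubic in Lemma 4 can be solved in closed form or by a Newton iteration started from the plain (margin-free) estimator $u^T v / k$. A sketch of the iterative route (our own helper; the name `margin_mle` and all sizes are illustrative):

```python
import numpy as np

def margin_mle(uv, u_sq, v_sq, m1, m2, k, steps=1):
    """Solve the Lemma 4 cubic for one "inner product" by Newton's method,
    starting from the margin-free estimator uv / k.
    uv = u^T v, u_sq = |u|^2, v_sq = |v|^2; m1, m2 are the exact margins."""
    c1 = (m1 * v_sq + m2 * u_sq) / k - m1 * m2
    a = uv / k
    for _ in range(steps):
        g = a**3 - (uv / k) * a**2 + c1 * a - m1 * m2 * uv / k
        dg = 3 * a**2 - 2 * (uv / k) * a + c1
        a -= g / dg
    return a

rng = np.random.default_rng(0)
D, k = 50, 2000
x, y = rng.random(D), rng.random(D)
R = rng.standard_normal((D, k))
u2, v2 = (x**2) @ R, (y**2) @ R
# Margin-improved estimate of the order-(2,2) term, sum_i x_i^2 y_i^2,
# using margins m1 = sum x_i^4 and m2 = sum y_i^4.
a22 = margin_mle(u2 @ v2, u2 @ u2, v2 @ v2, np.sum(x**4), np.sum(y**4), k)
```

With `steps=1`, this is the one-step iteration commonly used in statistics; more steps converge to the exact root of the cubic.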
When the data are non-negative, we believe that $\mathrm{Var}(\hat{d}_{(4),a,mle})$ will also be an upper bound on the estimation variance using the basic projection strategy, which can be easily verified by empirical results (not included in the current report).

Solving cubic equations is easy, as there are closed-form solutions. We can also solve the equations by iterative methods. In fact, it is common practice to do only a one-step iteration (starting with the solution without using margins), called "one-step Newton-Raphson" in statistics.

3 Normal Random Projections for p = 6

For higher $p$ (where $p$ is even), we can follow basically the same procedure as for $p = 4$. To illustrate this, we work out an example for $p = 6$. We only demonstrate the basic projection strategy.

The $l_6$ distance can be decomposed into 2 marginal norms and 5 inner products at various orders:

$$d_{(6)} = \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 - 20\sum_{i=1}^D x_i^3 y_i^3 + 15\sum_{i=1}^D x_i^2 y_i^4 + 15\sum_{i=1}^D x_i^4 y_i^2 - 6\sum_{i=1}^D x_i^5 y_i - 6\sum_{i=1}^D x_i y_i^5.$$

Generate one random projection matrix $R \in \mathbb{R}^{D \times k}$, and let

$$u_{m,j} = \sum_{i=1}^D x_i^m r_{ij}, \quad v_{m,j} = \sum_{i=1}^D y_i^m r_{ij}, \qquad m = 1, 2, 3, 4, 5.$$

Lemma 5 provides the variance of the following unbiased estimator of $d_{(6)}$:

$$\hat{d}_{(6)} = \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k}\left(-20 u_3^T v_3 + 15 u_4^T v_2 + 15 u_2^T v_4 - 6 u_5^T v_1 - 6 u_1^T v_5\right)$$
$$= \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k}\sum_{j=1}^k\left(-20 u_{3,j} v_{3,j} + 15 u_{2,j} v_{4,j} + 15 u_{4,j} v_{2,j} - 6 u_{1,j} v_{5,j} - 6 u_{5,j} v_{1,j}\right).$$

Lemma 5 (all sums are over $i = 1, \ldots, D$)

$$\mathrm{Var}\big(\hat{d}_{(6)}\big) = \frac{400}{k}\left(\sum x_i^6 \sum y_i^6 + \Big(\sum x_i^3 y_i^3\Big)^2\right) + \frac{225}{k}\left(\sum x_i^4 \sum y_i^8 + \Big(\sum x_i^2 y_i^4\Big)^2\right) + \frac{225}{k}\left(\sum x_i^8 \sum y_i^4 + \Big(\sum x_i^4 y_i^2\Big)^2\right)$$
$$+ \frac{36}{k}\left(\sum x_i^2 \sum y_i^{10} + \Big(\sum x_i y_i^5\Big)^2\right) + \frac{36}{k}\left(\sum x_i^{10} \sum y_i^2 + \Big(\sum x_i^5 y_i\Big)^2\right) + \Delta_6,$$

where

$$\Delta_6 = -\frac{600}{k}\left(\sum x_i^5 \sum y_i^7 + \sum x_i^3 y_i^4 \sum x_i^2 y_i^3\right) - \frac{600}{k}\left(\sum x_i^7 \sum y_i^5 + \sum x_i^3 y_i^2 \sum x_i^4 y_i^3\right)$$
$$+ \frac{240}{k}\left(\sum x_i^4 \sum y_i^8 + \sum x_i^3 y_i^5 \sum x_i y_i^3\right) + \frac{240}{k}\left(\sum x_i^8 \sum y_i^4 + \sum x_i^3 y_i \sum x_i^5 y_i^3\right) + \frac{450}{k}\left(\sum x_i^6 \sum y_i^6 + \sum x_i^2 y_i^2 \sum x_i^4 y_i^4\right)$$
$$- \frac{180}{k}\left(\sum x_i^3 \sum y_i^9 + \sum x_i^2 y_i^5 \sum x_i y_i^4\right) - \frac{180}{k}\left(\sum x_i^7 \sum y_i^5 + \sum x_i^2 y_i \sum x_i^5 y_i^4\right) - \frac{180}{k}\left(\sum x_i^5 \sum y_i^7 + \sum x_i^4 y_i^5 \sum x_i y_i^2\right)$$
$$- \frac{180}{k}\left(\sum x_i^9 \sum y_i^3 + \sum x_i^4 y_i \sum x_i^5 y_i^2\right) + \frac{72}{k}\left(\sum x_i^6 \sum y_i^6 + \sum x_i y_i \sum x_i^5 y_i^5\right).$$

Proof 5: See Appendix C.

When all entries of $x$ and $y$ are non-negative, we believe it is true that $\Delta_6 \le 0$, but we did not proceed with the proof. Of course, it is again a good idea to take advantage of the marginal norms, but we skip the analysis.

4 Sub-Gaussian Random Projections

It is well known that it is not necessary to sample $r_{ij} \sim N(0, 1)$. In fact, to obtain an unbiased estimator, it suffices to sample $r_{ij}$ from any distribution with zero mean (and unit variance). For good higher-order behavior, it is often a good idea to sample from a sub-Gaussian distribution, of which a zero-mean normal distribution is a special case. The theory of sub-Gaussian distributions was developed in the 1950s; see [6] and references therein. A random variable $x$ is sub-Gaussian if there exists a constant $g > 0$ such that for all $t \in \mathbb{R}$:

$$E(\exp(xt)) \le \exp\left(\frac{g^2 t^2}{2}\right).$$
In this section, we sample $r_{ij}$ from a sub-Gaussian distribution with the following restrictions:

$$E(r_{ij}) = 0, \quad E(r_{ij}^2) = 1, \quad E(r_{ij}^4) = s,$$

and we denote $r_{ij} \sim SubG(s)$. It can be shown that we must restrict $s \ge 1$.

One example would be $r_{ij} \sim Uniform(-\sqrt{3}, \sqrt{3})$, for which $s = \frac{9}{5}$. Although the uniform distribution is simpler than the normal, it is now well known that we should sample from the following three-point sub-Gaussian distributions [1]:

$$r_{ij} = \begin{cases} \sqrt{s} & \text{with prob. } \frac{1}{2s} \\ 0 & \text{with prob. } 1 - \frac{1}{s} \\ -\sqrt{s} & \text{with prob. } \frac{1}{2s} \end{cases}, \qquad s \ge 1.$$

In our analysis, we do not have to specify the exact distribution of $r_{ij}$; we can simply express the estimation variance as a function of $s$.

Here, we consider the basic projection strategy, generating one random projection matrix $R \in \mathbb{R}^{D \times k}$ with i.i.d. entries $r_{ij} \sim SubG(s)$, and

$$u_{1,j} = \sum_{i=1}^D x_i r_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r_{ij},$$
$$v_{1,j} = \sum_{i=1}^D y_i r_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r_{ij}.$$

We again have a simple unbiased estimator of $d_{(4)}$:

$$\hat{d}_{(4),s} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6 u_2^T v_2 - 4 u_3^T v_1 - 4 u_1^T v_3\right).$$

Lemma 6 (all sums are over $i = 1, \ldots, D$)

$$\mathrm{Var}\big(\hat{d}_{(4),s}\big) = \frac{36}{k}\left(\sum x_i^4 \sum y_i^4 + \Big(\sum x_i^2 y_i^2\Big)^2 + (s-3)\sum x_i^4 y_i^4\right) + \frac{16}{k}\left(\sum x_i^6 \sum y_i^2 + \Big(\sum x_i^3 y_i\Big)^2 + (s-3)\sum x_i^6 y_i^2\right)$$
$$+ \frac{16}{k}\left(\sum x_i^2 \sum y_i^6 + \Big(\sum x_i y_i^3\Big)^2 + (s-3)\sum x_i^2 y_i^6\right) - \frac{48}{k}\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i \sum x_i^3 y_i^2 + (s-3)\sum x_i^5 y_i^3\right)$$
$$- \frac{48}{k}\left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^2 \sum x_i^2 y_i^3 + (s-3)\sum x_i^3 y_i^5\right) + \frac{32}{k}\left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i \sum x_i^3 y_i^3 + (s-3)\sum x_i^4 y_i^4\right).$$

Proof 6: See Appendix D.
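The three-point distribution is easy to sample. Note also that with $s = 3$ the $(s - 3)$ correction terms in Lemma 6 vanish, so the variance matches the normal case even though two-thirds of the projection matrix entries are zero. A sketch (ours; NumPy assumed) that samples the matrix and checks the moment conditions empirically:

```python
import numpy as np

def three_point(D, k, s, rng):
    """r_ij = +sqrt(s) w.p. 1/(2s), 0 w.p. 1 - 1/s, -sqrt(s) w.p. 1/(2s),
    so that E(r) = 0, E(r^2) = 1, E(r^4) = s."""
    return rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
                      size=(D, k), p=[0.5 / s, 1 - 1.0 / s, 0.5 / s])

rng = np.random.default_rng(0)
R = three_point(5, 100000, s=3.0, rng=rng)
# Empirical first, second, and fourth moments; should be near 0, 1, s.
m1, m2, m4 = R.mean(), (R**2).mean(), (R**4).mean()
```

The sampled matrix can be used in place of the standard normal $R$ in either projection strategy; sparsity also speeds up the projection step.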
5 Conclusions

It has been an active research topic to approximate $l_p$ distances in massive high-dimensional data, for example, a giant "data matrix" $A \in \mathbb{R}^{n \times D}$. While a linear scan of $A$ may be feasible, it can be prohibitive (or even infeasible) to compute and store all pairwise $l_p$ distances. Using random projections can reduce the cost of computing all pairwise distances from $O(n^2 D)$ to $O(n^2 k)$, where $k \ll D$. The data size is reduced from $O(nD)$ to $O(nk)$, and hence it may be possible to store the reduced data in memory.

While the well-known method of stable random projections is applicable to $0 < p \le 2$ but not directly to $p > 2$, we propose a practical approach for approximating the $l_p$ distances in massive data for $p = 2, 4, 6, \ldots$, based on the simple fact that, when $p$ is even, the $l_p$ distance can be decomposed into 2 marginal norms and $p - 1$ "inner products" of various orders. Two projection strategies are proposed to approximate these "inner products" and hence the $l_p$ distances; we show that the basic projection strategy (which is simpler) is always preferable to the alternative strategy in terms of accuracy, at least for $p = 4$ with non-negative data. We also propose utilizing the marginal norms (which can easily be computed exactly) to further improve the estimates. Finally, we analyze the performance using sub-Gaussian random projections.

A Proof of Lemma 1

$$\hat{d}_{(4)} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6 u_2^T v_2 - 4 u_3^T v_1 - 4 u_1^T v_3\right) = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\sum_{j=1}^k\left(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\right),$$

$$u_{2,j} v_{2,j} = \left(\sum_{i=1}^D x_i^2 r_{ij}\right)\left(\sum_{i=1}^D y_i^2 r_{ij}\right) = \sum_{i=1}^D x_i^2 y_i^2 r_{ij}^2 + \sum_{i \neq i'} x_i^2 r_{ij}\, y_{i'}^2 r_{i'j}.$$

Thus

$$E(u_{2,j} v_{2,j}) = \sum_{i=1}^D x_i^2 y_i^2.$$

Similarly, we can show

$$E(u_{3,j} v_{1,j}) = \sum_{i=1}^D x_i^3 y_i, \qquad E(u_{1,j} v_{3,j}) = \sum_{i=1}^D x_i y_i^3.$$
Therefore,

$$E\big(\hat{d}_{(4)}\big) = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\sum_{j=1}^k\left(6\sum_{i=1}^D x_i^2 y_i^2 - 4\sum_{i=1}^D x_i^3 y_i - 4\sum_{i=1}^D x_i y_i^3\right) = d_{(4)}.$$

To derive the variance, we need to analyze the expectation of

$$\left(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\right)^2 = 36 u_{2,j}^2 v_{2,j}^2 + 16 u_{3,j}^2 v_{1,j}^2 + 16 u_{1,j}^2 v_{3,j}^2 - 48 u_{2,j} u_{3,j} v_{2,j} v_{1,j} - 48 u_{2,j} u_{1,j} v_{2,j} v_{3,j} + 32 u_{3,j} u_{1,j} v_{1,j} v_{3,j}.$$

To simplify the expressions, we skip the terms whose expectations are zero.

$$E\big(u_{2,j}^2 v_{2,j}^2\big) = E\left(\left(\sum_{i=1}^D x_i^2 y_i^2 r_{ij}^2 + \sum_{i \neq i'} x_i^2 r_{ij}\, y_{i'}^2 r_{i'j}\right)^2\right) = E\left(\sum_{i=1}^D x_i^4 y_i^4 r_{ij}^4 + 2\sum_{i \neq i'} x_i^2 y_i^2 r_{ij}^2\, x_{i'}^2 y_{i'}^2 r_{i'j}^2 + \sum_{i \neq i'} x_i^4 r_{ij}^2\, y_{i'}^4 r_{i'j}^2\right)$$
$$= 3\sum_{i=1}^D x_i^4 y_i^4 + 2\sum_{i \neq i'} x_i^2 y_i^2 x_{i'}^2 y_{i'}^2 + \sum_{i \neq i'} x_i^4 y_{i'}^4 = \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + 2\left(\sum_{i=1}^D x_i^2 y_i^2\right)^2.$$

Similarly,

$$E\big(u_{3,j}^2 v_{1,j}^2\big) = \sum x_i^6 \sum y_i^2 + 2\Big(\sum x_i^3 y_i\Big)^2, \qquad E\big(u_{1,j}^2 v_{3,j}^2\big) = \sum x_i^2 \sum y_i^6 + 2\Big(\sum x_i y_i^3\Big)^2.$$

For the cross term,

$$E\big(u_{2,j} u_{3,j} v_{2,j} v_{1,j}\big) = E\left(\sum_i x_i^2 r_{ij} \sum_i x_i^3 r_{ij} \sum_i y_i^2 r_{ij} \sum_i y_i r_{ij}\right) = E\left(\left(\sum_i x_i^5 r_{ij}^2 + \sum_{i \neq i'} x_i^2 r_{ij} x_{i'}^3 r_{i'j}\right)\left(\sum_i y_i^3 r_{ij}^2 + \sum_{i \neq i'} y_i^2 r_{ij} y_{i'} r_{i'j}\right)\right)$$
$$= 3\sum_i x_i^5 y_i^3 + \sum_{i \neq i'} x_i^5 y_{i'}^3 + \sum_{i \neq i'} x_i^2 y_i^2 x_{i'}^3 y_{i'} + \sum_{i \neq i'} x_i^2 y_i x_{i'}^3 y_{i'}^2 = \sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i^2 \sum x_i^3 y_i + \sum x_i^2 y_i \sum x_i^3 y_i^2.$$
Similarly,

$$E\big(u_{2,j} u_{1,j} v_{2,j} v_{3,j}\big) = \sum x_i^3 \sum y_i^5 + \sum x_i y_i^3 \sum x_i^2 y_i^2 + \sum x_i y_i^2 \sum x_i^2 y_i^3,$$
$$E\big(u_{3,j} u_{1,j} v_{1,j} v_{3,j}\big) = \sum x_i^4 \sum y_i^4 + \sum x_i y_i^3 \sum x_i^3 y_i + \sum x_i y_i \sum x_i^3 y_i^3.$$

Therefore,

$$\mathrm{Var}\big(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\big) = 36\sum x_i^4 \sum y_i^4 + 72\Big(\sum x_i^2 y_i^2\Big)^2 + 16\sum x_i^6 \sum y_i^2 + 32\Big(\sum x_i^3 y_i\Big)^2 + 16\sum x_i^2 \sum y_i^6 + 32\Big(\sum x_i y_i^3\Big)^2$$
$$- 48\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i^2 \sum x_i^3 y_i + \sum x_i^2 y_i \sum x_i^3 y_i^2\right) - 48\left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^3 \sum x_i^2 y_i^2 + \sum x_i y_i^2 \sum x_i^2 y_i^3\right)$$
$$+ 32\left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i^3 \sum x_i^3 y_i + \sum x_i y_i \sum x_i^3 y_i^3\right) - \left(6\sum x_i^2 y_i^2 - 4\sum x_i^3 y_i - 4\sum x_i y_i^3\right)^2,$$

from which it follows that

$$\mathrm{Var}\big(\hat{d}_{(4)}\big) = \frac{36}{k}\left(\sum x_i^4 \sum y_i^4 + \Big(\sum x_i^2 y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6 \sum y_i^2 + \Big(\sum x_i^3 y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2 \sum y_i^6 + \Big(\sum x_i y_i^3\Big)^2\right)$$
$$- \frac{48}{k}\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i \sum x_i^3 y_i^2\right) - \frac{48}{k}\left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^2 \sum x_i^2 y_i^3\right) + \frac{32}{k}\left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i \sum x_i^3 y_i^3\right).$$

B Proof of Lemma 3

It suffices to show that

$$\left(\sum x_i^5 \sum y_i^3 + \sum x_i^2 y_i \sum x_i^3 y_i^2\right) + \left(\sum x_i^3 \sum y_i^5 + \sum x_i y_i^2 \sum x_i^2 y_i^3\right) - \left(\sum x_i^4 \sum y_i^4 + \sum x_i y_i \sum x_i^3 y_i^3\right) \ge 0.$$

We will use the arithmetic-geometric mean inequality:

$$\sum_{i=1}^n w_i \ge n\left(\prod_{i=1}^n w_i\right)^{1/n}, \quad \text{provided } w_i \ge 0.$$
Because $x_i^5 y_j^3 + x_i^3 y_j^5 \ge 2\sqrt{x_i^8 y_j^8} = 2 x_i^4 y_j^4$,

$$\sum x_i^5 \sum y_i^3 + \sum x_i^3 \sum y_i^5 - \sum x_i^4 \sum y_i^4 \ge 0.$$

Thus it only remains to show that

$$\sum x_i^2 y_i \sum x_i^3 y_i^2 + \sum x_i y_i^2 \sum x_i^2 y_i^3 - \sum x_i y_i \sum x_i^3 y_i^3 \ge 0,$$

for which it suffices (by the arithmetic-geometric mean and Cauchy-Schwarz inequalities) to show that

$$2\sum x_i^{3/2} y_i^{3/2} \sum x_i^{5/2} y_i^{5/2} - \sum x_i y_i \sum x_i^3 y_i^3 \ge 0,$$

or equivalently, letting $z_i = \sqrt{x_i y_i} \ge 0$, to show that if $z_i \ge 0$ for all $i \in [1, D]$, then

$$f(z_1, \ldots, z_D) = 2\sum_{i=1}^D z_i^3 \sum_{i=1}^D z_i^5 - \sum_{i=1}^D z_i^2 \sum_{i=1}^D z_i^6 \ge 0. \quad (3)$$

Obviously, (3) holds for $D = 1$ and $D = 2$. To see that it is true for $D > 2$, notice that the first derivative of $f$ is zero only at $(z_1 = 0, z_2 = 0, \ldots, z_D = 0)$. We can also check that $f(z_i = 1, i = 1, \ldots, D) > 0$. Since $f$ is a continuous function, $f \ge 0$ must hold whenever $z_i > 0$ for all $i$. There is no need to worry about the boundary case with some $z_j = 0$ and the remaining $z_i \ge 0$, because it reduces to a smaller problem with $D' = D - 1$, and we have already shown the base cases $D = 1$ and $D = 2$. This completes the proof.

C Proof of Lemma 5

$$d_{(6)} = \sum x_i^6 + \sum y_i^6 - 20\sum x_i^3 y_i^3 + 15\sum x_i^2 y_i^4 + 15\sum x_i^4 y_i^2 - 6\sum x_i^5 y_i - 6\sum x_i y_i^5,$$

$$\hat{d}_{(6)} = \sum x_i^6 + \sum y_i^6 + \frac{1}{k}\left(-20 u_3^T v_3 + 15 u_4^T v_2 + 15 u_2^T v_4 - 6 u_5^T v_1 - 6 u_1^T v_5\right)$$
$$= \sum x_i^6 + \sum y_i^6 + \frac{1}{k}\sum_{j=1}^k\left(-20 u_{3,j} v_{3,j} + 15 u_{2,j} v_{4,j} + 15 u_{4,j} v_{2,j} - 6 u_{1,j} v_{5,j} - 6 u_{5,j} v_{1,j}\right).$$
To derive the variance, we need to analyze the expectation of

$$\left(-20 u_{3,j} v_{3,j} + 15 u_{2,j} v_{4,j} + 15 u_{4,j} v_{2,j} - 6 u_{1,j} v_{5,j} - 6 u_{5,j} v_{1,j}\right)^2$$
$$= 400 u_{3,j}^2 v_{3,j}^2 + 225 u_{2,j}^2 v_{4,j}^2 + 225 u_{4,j}^2 v_{2,j}^2 + 36 u_{1,j}^2 v_{5,j}^2 + 36 u_{5,j}^2 v_{1,j}^2 - 600 u_{3,j} v_{3,j} u_{2,j} v_{4,j} - 600 u_{3,j} v_{3,j} u_{4,j} v_{2,j}$$
$$+ 240 u_{3,j} v_{3,j} u_{1,j} v_{5,j} + 240 u_{3,j} v_{3,j} u_{5,j} v_{1,j} + 450 u_{2,j} v_{4,j} u_{4,j} v_{2,j} - 180 u_{2,j} v_{4,j} u_{1,j} v_{5,j} - 180 u_{2,j} v_{4,j} u_{5,j} v_{1,j}$$
$$- 180 u_{4,j} v_{2,j} u_{1,j} v_{5,j} - 180 u_{4,j} v_{2,j} u_{5,j} v_{1,j} + 72 u_{1,j} v_{5,j} u_{5,j} v_{1,j}.$$

Skipping the details, we can show that

$$E\big(u_{3,j}^2 v_{3,j}^2\big) = \sum x_i^6 \sum y_i^6 + 2\Big(\sum x_i^3 y_i^3\Big)^2, \qquad E\big(u_{2,j}^2 v_{4,j}^2\big) = \sum x_i^4 \sum y_i^8 + 2\Big(\sum x_i^2 y_i^4\Big)^2,$$
$$E\big(u_{4,j}^2 v_{2,j}^2\big) = \sum x_i^8 \sum y_i^4 + 2\Big(\sum x_i^4 y_i^2\Big)^2, \qquad E\big(u_{1,j}^2 v_{5,j}^2\big) = \sum x_i^2 \sum y_i^{10} + 2\Big(\sum x_i y_i^5\Big)^2,$$
$$E\big(u_{5,j}^2 v_{1,j}^2\big) = \sum x_i^{10} \sum y_i^2 + 2\Big(\sum x_i^5 y_i\Big)^2.$$
Furthermore,

$$E\big(u_{3,j} u_{2,j} v_{3,j} v_{4,j}\big) = \sum x_i^5 \sum y_i^7 + \sum x_i^3 y_i^3 \sum x_i^2 y_i^4 + \sum x_i^3 y_i^4 \sum x_i^2 y_i^3,$$
$$E\big(u_{3,j} u_{4,j} v_{3,j} v_{2,j}\big) = \sum x_i^7 \sum y_i^5 + \sum x_i^3 y_i^3 \sum x_i^4 y_i^2 + \sum x_i^3 y_i^2 \sum x_i^4 y_i^3,$$
$$E\big(u_{3,j} u_{1,j} v_{3,j} v_{5,j}\big) = \sum x_i^4 \sum y_i^8 + \sum x_i^3 y_i^3 \sum x_i y_i^5 + \sum x_i^3 y_i^5 \sum x_i y_i^3,$$
$$E\big(u_{3,j} u_{5,j} v_{3,j} v_{1,j}\big) = \sum x_i^8 \sum y_i^4 + \sum x_i^3 y_i^3 \sum x_i^5 y_i + \sum x_i^3 y_i \sum x_i^5 y_i^3,$$
$$E\big(u_{2,j} u_{4,j} v_{4,j} v_{2,j}\big) = \sum x_i^6 \sum y_i^6 + \sum x_i^2 y_i^4 \sum x_i^4 y_i^2 + \sum x_i^2 y_i^2 \sum x_i^4 y_i^4,$$
$$E\big(u_{2,j} u_{1,j} v_{4,j} v_{5,j}\big) = \sum x_i^3 \sum y_i^9 + \sum x_i^2 y_i^4 \sum x_i y_i^5 + \sum x_i^2 y_i^5 \sum x_i y_i^4,$$
$$E\big(u_{2,j} u_{5,j} v_{4,j} v_{1,j}\big) = \sum x_i^7 \sum y_i^5 + \sum x_i^2 y_i^4 \sum x_i^5 y_i + \sum x_i^2 y_i \sum x_i^5 y_i^4,$$
$$E\big(u_{4,j} u_{1,j} v_{2,j} v_{5,j}\big) = \sum x_i^5 \sum y_i^7 + \sum x_i^4 y_i^2 \sum x_i y_i^5 + \sum x_i^4 y_i^5 \sum x_i y_i^2,$$
$$E\big(u_{4,j} u_{5,j} v_{2,j} v_{1,j}\big) = \sum x_i^9 \sum y_i^3 + \sum x_i^4 y_i^2 \sum x_i^5 y_i + \sum x_i^4 y_i \sum x_i^5 y_i^2,$$
$$E\big(u_{1,j} u_{5,j} v_{5,j} v_{1,j}\big) = \sum x_i^6 \sum y_i^6 + \sum x_i y_i^5 \sum x_i^5 y_i + \sum x_i y_i \sum x_i^5 y_i^5.$$

Combining the results, we obtain the expression for $\mathrm{Var}(\hat{d}_{(6)})$ stated in Lemma 5, where $\Delta_6$ can be written as

$$\frac{k\,\Delta_6}{6} = -100\left(\sum x_i^5 \sum y_i^7 + \sum x_i^3 y_i^4 \sum x_i^2 y_i^3\right) - 100\left(\sum x_i^7 \sum y_i^5 + \sum x_i^3 y_i^2 \sum x_i^4 y_i^3\right)$$
$$+ 40\left(\sum x_i^4 \sum y_i^8 + \sum x_i^3 y_i^5 \sum x_i y_i^3\right) + 40\left(\sum x_i^8 \sum y_i^4 + \sum x_i^3 y_i \sum x_i^5 y_i^3\right) + 75\left(\sum x_i^6 \sum y_i^6 + \sum x_i^2 y_i^2 \sum x_i^4 y_i^4\right)$$
$$- 30\left(\sum x_i^3 \sum y_i^9 + \sum x_i^2 y_i^5 \sum x_i y_i^4\right) - 30\left(\sum x_i^7 \sum y_i^5 + \sum x_i^2 y_i \sum x_i^5 y_i^4\right) - 30\left(\sum x_i^5 \sum y_i^7 + \sum x_i^4 y_i^5 \sum x_i y_i^2\right)$$
$$- 30\left(\sum x_i^9 \sum y_i^3 + \sum x_i^4 y_i \sum x_i^5 y_i^2\right) + 12\left(\sum x_i^6 \sum y_i^6 + \sum x_i y_i \sum x_i^5 y_i^5\right).$$

D Proof of Lemma 6

$$\hat{d}_{(4),s} = \sum x_i^4 + \sum y_i^4 + \frac{1}{k}\left(6 u_2^T v_2 - 4 u_3^T v_1 - 4 u_1^T v_3\right) = \sum x_i^4 + \sum y_i^4 + \frac{1}{k}\sum_{j=1}^k\left(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\right),$$

$$E\big(u_{2,j}^2 v_{2,j}^2\big) = E\left(\left(\sum_i x_i^2 y_i^2 r_{ij}^2 + \sum_{i \neq i'} x_i^2 r_{ij}\, y_{i'}^2 r_{i'j}\right)^2\right) = E\left(\sum_i x_i^4 y_i^4 r_{ij}^4 + 2\sum_{i \neq i'} x_i^2 y_i^2 r_{ij}^2\, x_{i'}^2 y_{i'}^2 r_{i'j}^2 + \sum_{i \neq i'} x_i^4 r_{ij}^2\, y_{i'}^4 r_{i'j}^2\right)$$
$$= s\sum_i x_i^4 y_i^4 + 2\sum_{i \neq i'} x_i^2 y_i^2 x_{i'}^2 y_{i'}^2 + \sum_{i \neq i'} x_i^4 y_{i'}^4 = \sum x_i^4 \sum y_i^4 + 2\Big(\sum x_i^2 y_i^2\Big)^2 + (s-3)\sum x_i^4 y_i^4.$$

Similarly,

$$E\big(u_{3,j}^2 v_{1,j}^2\big) = \sum x_i^6 \sum y_i^2 + 2\Big(\sum x_i^3 y_i\Big)^2 + (s-3)\sum x_i^6 y_i^2, \qquad E\big(u_{1,j}^2 v_{3,j}^2\big) = \sum x_i^2 \sum y_i^6 + 2\Big(\sum x_i y_i^3\Big)^2 + (s-3)\sum x_i^2 y_i^6.$$
With all sums over $i = 1, \dots, D$,
$$E(u_{2,j}u_{3,j}v_{2,j}v_{1,j}) = \sum x_i^5\sum y_i^3 + \sum x_i^2y_i^2\sum x_i^3y_i + \sum x_i^2y_i\sum x_i^3y_i^2 + (s-3)\sum x_i^5y_i^3,$$
$$E(u_{2,j}u_{1,j}v_{2,j}v_{3,j}) = \sum x_i^3\sum y_i^5 + \sum x_iy_i^3\sum x_i^2y_i^2 + \sum x_iy_i^2\sum x_i^2y_i^3 + (s-3)\sum x_i^3y_i^5,$$
$$E(u_{3,j}u_{1,j}v_{1,j}v_{3,j}) = \sum x_i^4\sum y_i^4 + \sum x_iy_i^3\sum x_i^3y_i + \sum x_iy_i\sum x_i^3y_i^3 + (s-3)\sum x_i^4y_i^4.$$
Therefore,
$$\mathrm{Var}\left(6u_{2,j}v_{2,j} - 4u_{3,j}v_{1,j} - 4u_{1,j}v_{3,j}\right) = 36\sum x_i^4\sum y_i^4 + 72\Big(\sum x_i^2y_i^2\Big)^2 + 36(s-3)\sum x_i^4y_i^4 + 16\sum x_i^6\sum y_i^2 + 32\Big(\sum x_i^3y_i\Big)^2 + 16(s-3)\sum x_i^6y_i^2 + 16\sum x_i^2\sum y_i^6 + 32\Big(\sum x_iy_i^3\Big)^2 + 16(s-3)\sum x_i^2y_i^6 - 48\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i^2\sum x_i^3y_i + \sum x_i^2y_i\sum x_i^3y_i^2 + (s-3)\sum x_i^5y_i^3\right) - 48\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^3\sum x_i^2y_i^2 + \sum x_iy_i^2\sum x_i^2y_i^3 + (s-3)\sum x_i^3y_i^5\right) + 32\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i^3\sum x_i^3y_i + \sum x_iy_i\sum x_i^3y_i^3 + (s-3)\sum x_i^4y_i^4\right) - \left(6\sum x_i^2y_i^2 - 4\sum x_i^3y_i - 4\sum x_iy_i^3\right)^2,$$
from which it follows that
$$\mathrm{Var}\big(\hat d_{(4),s}\big) = \frac{36}{k}\left(\sum x_i^4\sum y_i^4 + \Big(\sum x_i^2y_i^2\Big)^2 + (s-3)\sum x_i^4y_i^4\right) + \frac{16}{k}\left(\sum x_i^6\sum y_i^2 + \Big(\sum x_i^3y_i\Big)^2 + (s-3)\sum x_i^6y_i^2\right) + \frac{16}{k}\left(\sum x_i^2\sum y_i^6 + \Big(\sum x_iy_i^3\Big)^2 + (s-3)\sum x_i^2y_i^6\right) - \frac{48}{k}\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2 + (s-3)\sum x_i^5y_i^3\right) - \frac{48}{k}\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3 + (s-3)\sum x_i^3y_i^5\right) + \frac{32}{k}\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3 + (s-3)\sum x_i^4y_i^4\right).$$

References

[1] D. Achlioptas. Database-friendly random projections. In PODS, pages 274–281, Santa Barbara, CA, 2001.
[2] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In STOC, pages 20–29, Philadelphia, PA, 1996.
[3] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS, pages 1–16, Madison, WI, 2002.
[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In FOCS, pages 209–218, Vancouver, BC, Canada, 2002.
[5] L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors. Large-Scale Kernel Machines. The MIT Press, Cambridge, MA, 2007.
[6] V. V. Buldygin and Y. V. Kozachenko. Metric Characterization of Random Variables and Random Processes. American Mathematical Society, Providence, RI, 2000.
[7] G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). In VLDB, pages 335–345, Hong Kong, China, 2002.
[8] G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Transactions on Knowledge and Data Engineering, 15(3):529–540, 2003.
[9] J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate $l_1$-difference algorithm for massive data streams. In FOCS, pages 501–511, New York, 1999.
[10] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on Data Streams. American Mathematical Society, Boston, MA, 1999.
[11] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, New York, 2001.
[12] P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS, pages 189–197, Redondo Beach, CA, 2000.
[13] P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3):307–323, 2006.
[14] P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In STOC, pages 202–208, Baltimore, MD, 2005.
[15] P. Li. Estimators and tail bounds for dimension reduction in $l_\alpha$ ($0 < \alpha \le 2$) using stable random projections. In SODA, pages 10–19, 2008.
[16] P. Li, T. J. Hastie, and K. W. Church. Improving random projections using marginal information. In COLT, pages 635–649, Pittsburgh, PA, 2006.
[17] P. Li, T. J. Hastie, and K. W. Church. Very sparse random projections. In KDD, pages 287–296, Philadelphia, PA, 2006.
[18] P. Li, T. J. Hastie, and K. W. Church. Nonlinear estimators and tail bounds for dimension reduction in $l_1$ using Cauchy random projections. Journal of Machine Learning Research, 8:2497–2532, 2007.
[19] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117–236, 2005.
[20] M. E. Saks and X. Sun. Space lower bounds for distance approximation in the data stream model. In STOC, pages 360–369, Montreal, Quebec, Canada, 2002.
[21] B. Schölkopf and A. J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, 2002.
[22] S. Vempala. The Random Projection Method. American Mathematical Society, Providence, RI, 2004.
[23] D. P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167–175, New Orleans, LA, 2004.
[24] V. M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, Providence, RI, 1986.
1 Introduction

(First draft Dec. 2007. Slightly revised June 2008.)

This study proposes a simple method for efficiently computing the $l_p$ distances in a massive data matrix $A \in \mathbb{R}^{n\times D}$ for $p > 2$ (where $p$ is even), using random projections [22]. While much of the previous work on random projections focused on approximating the $l_2$ distances (and inner products), the method of symmetric stable random projections [?, ?, ?, ?] is applicable to approximating the $l_p$ distances for all $0 < p \le 2$. This work proposes using random projections for $p > 2$, at least for some special cases.

Machine learning algorithms often operate on the $l_p$ distances of $A$ instead of the original data. A straightforward application is searching for the nearest neighbors under the $l_p$ distance. The $l_p$ distance is also a basic loss function for quality measures. The widely used "kernel trick" (e.g., for support vector machines (SVM)) is often constructed on top of the $l_p$ distances [?].

Here we can treat $p$ as a tuning parameter. It is common to take $p = 2$ (Euclidean distance), $p = \infty$ (infinity distance), $p = 1$ (Manhattan distance), or $p = 0$ (Hamming distance); but in principle any value of $p$ is possible. In fact, if there is an efficient mechanism for computing the $l_p$ distances, it becomes affordable to tune learning algorithms over many values of $p$ for the best performance.

In modern data mining and learning applications, the ubiquitous phenomenon of "massive data" imposes challenges. For example, pre-computing and storing all pairwise $l_p$ distances in memory at cost $O(n^2)$ can be infeasible when $n > 10^6$ (or even just $10^5$) [?]. For ultra high-dimensional data, even storing the whole data matrix can be infeasible. Meanwhile, modern applications routinely involve millions of observations, and developing scalable learning and data mining algorithms has been an active research direction. One commonly used strategy in current practice is to compute the distances on the fly [?] instead of storing all pairwise $l_p$ distances. Data reduction algorithms such as sampling or sketching methods are also popular.

While there have been extensive studies on approximating the $l_p$ distances for $0 < p \le 2$, $p > 2$ can be useful too. For example, because the normal distribution is completely determined by its first two moments (mean and variance), we can identify the non-normal components of the data by analyzing higher moments, in particular the fourth moments (i.e., kurtosis). Thus the fourth moments are critical, for example, in the field of Independent Component Analysis (ICA) [?]. Therefore, it is viable to use the $l_p$ distance for $p > 2$ when lower-order distances cannot efficiently differentiate the data.

It is unfortunate that the family of stable distributions [?] is limited to $0 < p \le 2$, and hence we cannot directly use stable distributions for approximating the $l_p$ distances. In the theoretical CS community, there have been many studies on approximating the $l_p$ norms and distances [?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?], some of which are also applicable to the $l_p$ distances (e.g., comparing two long vectors). Those papers proved that small-space ($\hat O(1)$) algorithms exist only for $0 < p \le 2$.

1.1 The Methodology

Given a giant data matrix $A \in \mathbb{R}^{n\times D}$, we assume that a linear scan of the data is feasible, but computing all pairwise interactions is not, either due to computational budget constraints or memory limits. Also, we only consider even $p = 4, 6, \dots$, among which $p = 4$ is probably the most important.

Interestingly, our method is based only on normal (or normal-like) projections. The observation is that, when $p$ is even, the $l_p$ distance can be decomposed into marginal $l_p$ norms and "inner products" of various orders. (It is well known that the radial basis kernel using the $l_p$ distance with $0 < p \le 2$ satisfies Mercer's condition. We can still use the $l_p$ distance with $p > 2$ in kernels, although in this case it is not guaranteed to find the "most optimal" solution. For very large-scale learning, we usually will not find the "most optimal" solution anyway.)
For example, for two $D$-dimensional vectors $x$ and $y$, when $p = 4$,
$$d_{(p)} = \sum_{i=1}^D |x_i - y_i|^p = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\sum_{i=1}^D x_i^2y_i^2 - 4\sum_{i=1}^D x_i^3y_i - 4\sum_{i=1}^D x_iy_i^3.$$
Since we assume that a linear scan of the data is feasible, we can compute $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$ exactly. We can approximate the interaction terms $\sum_{i=1}^D x_i^2y_i^2$, $\sum_{i=1}^D x_i^3y_i$, and $\sum_{i=1}^D x_iy_i^3$ using normal (or normal-like) random projections. Therefore, for even $p$, we are able to efficiently approximate the $l_p$ distances.

1.2 Paper Organization

Section 2 concerns using normal random projections for approximating $l_4$ distances. We introduce two projection strategies and the idea of utilizing the marginal norms to improve the estimates. Section 3 extends this approach to approximating $l_6$ distances. Section 4 analyzes the effect of replacing normal projections by sub-Gaussian projections.

2 Normal Random Projections for p = 4

The goal is to efficiently compute all pairwise $l_p$ ($p = 4$) distances in $A \in \mathbb{R}^{n\times D}$. It suffices to consider any two rows of $A$, say $x, y \in \mathbb{R}^D$. We need to estimate the $l_p$ distance between $x$ and $y$,
$$d_{(p)} = \sum_{i=1}^D |x_i - y_i|^p,$$
which, when $p = 4$, becomes
$$d_{(4)} = \sum_{i=1}^D |x_i - y_i|^4 = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\sum_{i=1}^D x_i^2y_i^2 - 4\sum_{i=1}^D x_i^3y_i - 4\sum_{i=1}^D x_iy_i^3.$$
In one pass, we can compute $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$ easily, but computing the interactions is more difficult. We resort to random projections for approximating $\sum_{i=1}^D x_i^2y_i^2$, $\sum_{i=1}^D x_i^3y_i$, and $\sum_{i=1}^D x_iy_i^3$. Since there are three "inner products" of different orders, we can choose either only one projection matrix for all three terms (the basic projection strategy) or three independent projection matrices (the alternative projection strategy).
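The $p = 4$ decomposition above is simply the binomial expansion of $(x_i - y_i)^4$ summed over $i$; a quick numerical sanity check (a sketch using NumPy, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 100
x = rng.standard_normal(D)
y = rng.standard_normal(D)

# Direct l4 distance: sum_i |x_i - y_i|^4
d4_direct = np.sum(np.abs(x - y) ** 4)

# Decomposition: 2 marginal norms plus 3 "inner products" of different orders
d4_decomposed = (np.sum(x**4) + np.sum(y**4)
                 + 6 * np.sum(x**2 * y**2)
                 - 4 * np.sum(x**3 * y)
                 - 4 * np.sum(x * y**3))

assert np.isclose(d4_direct, d4_decomposed)
```

The two marginal sums can be accumulated during the same linear scan that performs the projections; only the three cross terms need to be estimated.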
2.1 The Basic Projection Strategy

First, generate a random matrix $R \in \mathbb{R}^{D\times k}$ ($k \ll D$) with i.i.d. entries from a standard normal, i.e., $r_{ij} \sim N(0,1)$:
$$E(r_{ij}) = 0, \quad E(r_{ij}^2) = 1, \quad E(r_{ij}^4) = 3, \qquad E\big(r_{ij}^s r_{i'j'}^t\big) = 0 \ \text{ if } s \text{ or } t \text{ is odd, and } i \ne i' \text{ or } j \ne j'.$$
(It is possible to relax the requirement of i.i.d. samples. In fact, proving unbiasedness of the estimates only needs pairwise independence, and deriving the variance formula requires four-wise independence.)

Using random projections, we generate six vectors in $k$ dimensions, $u_1, u_2, u_3, v_1, v_2, v_3 \in \mathbb{R}^k$:
$$u_{1,j} = \sum_{i=1}^D x_i r_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r_{ij}, \qquad v_{1,j} = \sum_{i=1}^D y_i r_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r_{ij}.$$
We have a simple unbiased estimator of $d_{(4)}$:
$$\hat d_{(4)} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6u_2^Tv_2 - 4u_3^Tv_1 - 4u_1^Tv_3\right).$$

Lemma 1 (with all sums over $i = 1, \dots, D$)
$$E\big(\hat d_{(4)}\big) = d_{(4)},$$
$$\mathrm{Var}\big(\hat d_{(4)}\big) = \frac{36}{k}\left(\sum x_i^4\sum y_i^4 + \Big(\sum x_i^2y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6\sum y_i^2 + \Big(\sum x_i^3y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2\sum y_i^6 + \Big(\sum x_iy_i^3\Big)^2\right) + \Delta_4,$$
$$\Delta_4 = -\frac{48}{k}\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2\right) - \frac{48}{k}\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3\right) + \frac{32}{k}\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3\right).$$

Proof 1: See Appendix A.

The basic projection strategy is simple, but its analysis is quite involved, especially when $p > 4$. Also, if we are interested in higher-order moments (other than the variance) of the estimator, the analysis becomes very tedious.

2.2 The Alternative Projection Strategy

Instead of one projection matrix $R$, we generate three, $R^{(a)}$, $R^{(b)}$, $R^{(c)}$, independently.
By random projections, we generate six vectors in $k$ dimensions, $u_1, u_2, u_3, v_1, v_2, v_3 \in \mathbb{R}^k$, such that
$$u_{1,j} = \sum_{i=1}^D x_i r^{(c)}_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r^{(a)}_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r^{(b)}_{ij}, \qquad v_{1,j} = \sum_{i=1}^D y_i r^{(b)}_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r^{(a)}_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r^{(c)}_{ij}.$$
Here we abuse the notation slightly by using the same $u$ and $v$ for both projection strategies. Again, we have an unbiased estimator, denoted by $\hat d_{(4),a}$:
$$\hat d_{(4),a} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6u_2^Tv_2 - 4u_3^Tv_1 - 4u_1^Tv_3\right).$$

Lemma 2 (with all sums over $i = 1, \dots, D$)
$$E\big(\hat d_{(4),a}\big) = d_{(4)}, \qquad \mathrm{Var}\big(\hat d_{(4),a}\big) = \frac{36}{k}\left(\sum x_i^4\sum y_i^4 + \Big(\sum x_i^2y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6\sum y_i^2 + \Big(\sum x_i^3y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2\sum y_i^6 + \Big(\sum x_iy_i^3\Big)^2\right).$$

Proof 2: The proof basically follows from that of Lemma 1.

Compared with $\mathrm{Var}\big(\hat d_{(4)}\big)$ in Lemma 1, the difference is $\Delta_4$:
$$\mathrm{Var}\big(\hat d_{(4)}\big) - \mathrm{Var}\big(\hat d_{(4),a}\big) = \Delta_4 = -\frac{48}{k}\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2\right) - \frac{48}{k}\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3\right) + \frac{32}{k}\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3\right), \tag{1}$$
which can be either negative or positive. For example, when all $x_i$'s are negative and all $y_i$'s are positive, then $\Delta_4 \ge 0$, i.e., the alternative projection strategy results in smaller variance and hence should be adopted. Lemma 3 shows that when the data are non-negative (which is more often the case in practice), the difference in (1) never exceeds zero, suggesting that the basic strategy is preferable; it is also operationally simpler (although more sophisticated to analyze).

Lemma 3: If all entries of $x$ and $y$ are non-negative, then
$$\mathrm{Var}\big(\hat d_{(4)}\big) - \mathrm{Var}\big(\hat d_{(4),a}\big) = \Delta_4 \le 0. \tag{2}$$

Proof 3: See Appendix B.
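To make the two strategies concrete, here is a small Monte Carlo sketch (hypothetical NumPy code, not from the paper) implementing both estimators of $d_{(4)}$ on non-negative data, where Lemma 3 predicts the basic strategy has the smaller variance:

```python
import numpy as np

def d4_basic(x, y, k, rng):
    """Basic strategy: one shared projection matrix for all three inner products."""
    R = rng.standard_normal((len(x), k))
    u1, u2, u3 = x @ R, (x**2) @ R, (x**3) @ R
    v1, v2, v3 = y @ R, (y**2) @ R, (y**3) @ R
    return np.sum(x**4) + np.sum(y**4) + (6 * u2 @ v2 - 4 * u3 @ v1 - 4 * u1 @ v3) / k

def d4_alternative(x, y, k, rng):
    """Alternative strategy: three independent projection matrices R(a), R(b), R(c)."""
    D = len(x)
    Ra, Rb, Rc = (rng.standard_normal((D, k)) for _ in range(3))
    u1, u2, u3 = x @ Rc, (x**2) @ Ra, (x**3) @ Rb
    v1, v2, v3 = y @ Rb, (y**2) @ Ra, (y**3) @ Rc
    return np.sum(x**4) + np.sum(y**4) + (6 * u2 @ v2 - 4 * u3 @ v1 - 4 * u1 @ v3) / k

rng = np.random.default_rng(7)
D, k, reps = 20, 20, 10000
x, y = rng.random(D), rng.random(D)      # non-negative data
d4 = np.sum((x - y) ** 4)                # exact l4 distance

basic = np.array([d4_basic(x, y, k, rng) for _ in range(reps)])
alt = np.array([d4_alternative(x, y, k, rng) for _ in range(reps)])
```

Both sample means land on `d4` (the estimators are unbiased), while the sample variance of `basic` comes out well below that of `alt`, consistent with Lemma 3 for non-negative data.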
Thus the main advantage of the alternative projection strategy is that it simplifies the analysis, especially when $p > 4$. Also, analyzing the alternative projection strategy may provide an estimate for the basic projection strategy. For example, the variance of $\hat d_{(4),a}$ is an upper bound on the variance of $\hat d_{(4)}$ for non-negative data.

In the next subsection, we show that the alternative strategy makes the analysis feasible when we take advantage of the marginal information.

2.3 Improving the Estimates Using Margins

Since we assume that a linear scan of the data is feasible, and since the estimators in both strategies already take advantage of the marginal $l_4$ norms $\sum_{i=1}^D x_i^4$ and $\sum_{i=1}^D y_i^4$, we might as well compute other marginal norms and try to take advantage of them in a systematic manner. Lemma 4 demonstrates such a method for improving the estimates using margins. For simplicity, we assume in Lemma 4 that we adopt the alternative projection strategy, in order to carry out the (asymptotic) analysis of the variance.

Lemma 4: Suppose we use the alternative projection strategy described in Section 2.2 to generate samples $u_{1,j}$, $u_{2,j}$, $u_{3,j}$, $v_{1,j}$, $v_{2,j}$, and $v_{3,j}$. We estimate $d_{(4)}$ by
$$\hat d_{(4),a,mle} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + 6\hat a_{2,2} - 4\hat a_{3,1} - 4\hat a_{1,3},$$
where $\hat a_{2,2}$, $\hat a_{3,1}$, $\hat a_{1,3}$ are, respectively, the solutions to the following three cubic equations (all sums over $i = 1, \dots, D$):
$$\hat a_{2,2}^3 - \hat a_{2,2}^2\,\frac{u_2^Tv_2}{k} - \frac{1}{k}\sum x_i^4\sum y_i^4\,u_2^Tv_2 + \hat a_{2,2}\left(-\sum x_i^4\sum y_i^4\right) + \frac{\hat a_{2,2}}{k}\left(\sum x_i^4\,\|v_2\|^2 + \sum y_i^4\,\|u_2\|^2\right) = 0,$$
$$\hat a_{3,1}^3 - \hat a_{3,1}^2\,\frac{u_3^Tv_1}{k} - \frac{1}{k}\sum x_i^6\sum y_i^2\,u_3^Tv_1 + \hat a_{3,1}\left(-\sum x_i^6\sum y_i^2\right) + \frac{\hat a_{3,1}}{k}\left(\sum x_i^6\,\|v_1\|^2 + \sum y_i^2\,\|u_3\|^2\right) = 0,$$
$$\hat a_{1,3}^3 - \hat a_{1,3}^2\,\frac{u_1^Tv_3}{k} - \frac{1}{k}\sum x_i^2\sum y_i^6\,u_1^Tv_3 + \hat a_{1,3}\left(-\sum x_i^2\sum y_i^6\right) + \frac{\hat a_{1,3}}{k}\left(\sum x_i^2\,\|v_3\|^2 + \sum y_i^6\,\|u_1\|^2\right) = 0.$$
Asymptotically (as $k \to \infty$), the variance is
$$\mathrm{Var}\big(\hat d_{(4),a,mle}\big) = 36\,\mathrm{Var}(\hat a_{2,2}) + 16\,\mathrm{Var}(\hat a_{3,1}) + 16\,\mathrm{Var}(\hat a_{1,3}) = \frac{36}{k}\frac{\left(\sum x_i^4\sum y_i^4 - \left(\sum x_i^2y_i^2\right)^2\right)^2}{\sum x_i^4\sum y_i^4 + \left(\sum x_i^2y_i^2\right)^2} + \frac{16}{k}\frac{\left(\sum x_i^6\sum y_i^2 - \left(\sum x_i^3y_i\right)^2\right)^2}{\sum x_i^6\sum y_i^2 + \left(\sum x_i^3y_i\right)^2} + \frac{16}{k}\frac{\left(\sum x_i^2\sum y_i^6 - \left(\sum x_iy_i^3\right)^2\right)^2}{\sum x_i^2\sum y_i^6 + \left(\sum x_iy_i^3\right)^2} + O\!\left(\frac{1}{k^2}\right).$$

Proof 4: [?, ?] proposed taking advantage of the marginal $l_2$ norms to improve the estimates of $l_2$ distances and inner products. Because we assume the alternative projection strategy, we can analyze $\hat a_{2,2}$, $\hat a_{3,1}$, and $\hat a_{1,3}$ independently and then combine the results; hence we skip the detailed proof.

Of course, in practice, we may still prefer the basic projection strategy, i.e., only one projection matrix instead of three. In this case, we still solve three cubic equations, but the precise analysis of the variance becomes much more difficult. When the data are non-negative, we believe that $\mathrm{Var}\big(\hat d_{(4),a,mle}\big)$ is also an upper bound on the estimation variance under the basic projection strategy, which can be easily verified empirically (results not included in the current report).

Solving cubic equations is easy, as there are closed-form solutions. We can also solve the equations by iterative methods. In fact, it is common practice to do only a one-step iteration (starting from the solution that does not use the margins), called "one-step Newton–Raphson" in statistics.
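To make Lemma 4 concrete, the sketch below (hypothetical code; the use of `np.roots` and the real-root selection rule are our own choices, not the paper's) solves the first cubic for $\hat a_{2,2}$ given the exact margins $\sum x_i^4$ and $\sum y_i^4$. Note that at the population values the cubic factors as $(a - a^*)(a^2 + m_1m_2)$, so it typically has a single real root near the truth:

```python
import numpy as np

def mle_a22(u2, v2, m1, m2):
    """Margin-assisted estimate of a = sum_i x_i^2 y_i^2 (Lemma 4, first cubic).

    u2, v2: k-dimensional projections of x^2 and y^2 (alternative strategy);
    m1 = sum x_i^4 and m2 = sum y_i^4 are the exact margins.
    """
    k = len(u2)
    uv = float(u2 @ v2)
    # a^3 - a^2 (u'v)/k - a (m1 m2 - (m1 ||v||^2 + m2 ||u||^2)/k) - m1 m2 (u'v)/k = 0
    coeffs = [1.0,
              -uv / k,
              -m1 * m2 + (m1 * float(v2 @ v2) + m2 * float(u2 @ u2)) / k,
              -m1 * m2 * uv / k]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-8].real
    # keep the real root closest to the plain estimate u'v / k
    return float(real[np.argmin(np.abs(real - uv / k))])

rng = np.random.default_rng(3)
D, k = 50, 200
x, y = rng.random(D), rng.random(D)
a_true = np.sum(x**2 * y**2)
m1, m2 = np.sum(x**4), np.sum(y**4)

R = rng.standard_normal((D, k))
a_hat = mle_a22((x**2) @ R, (y**2) @ R, m1, m2)
```

Averaged over repeated projections, `a_hat` concentrates around `a_true`, with the reduced asymptotic variance given in Lemma 4; in practice one may instead run a single Newton step from the plain estimate, as noted above.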
3 Normal Random Projections for p = 6

For higher even $p$, we can follow basically the same procedure as for $p = 4$. To illustrate this, we work out an example for $p = 6$; we only demonstrate the basic projection strategy. The $l_6$ distance can be decomposed into 2 marginal norms and 5 inner products of various orders:
$$d_{(6)} = \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 - 20\sum_{i=1}^D x_i^3y_i^3 + 15\sum_{i=1}^D x_i^2y_i^4 + 15\sum_{i=1}^D x_i^4y_i^2 - 6\sum_{i=1}^D x_i^5y_i - 6\sum_{i=1}^D x_iy_i^5.$$
Generate one random projection matrix $R \in \mathbb{R}^{D\times k}$, and let
$$u_{m,j} = \sum_{i=1}^D x_i^m r_{ij}, \qquad v_{m,j} = \sum_{i=1}^D y_i^m r_{ij}, \qquad m = 1, 2, 3, 4, 5.$$
Lemma 5 provides the variance of the following unbiased estimator of $d_{(6)}$:
$$\hat d_{(6)} = \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k}\left(-20u_3^Tv_3 + 15u_4^Tv_2 + 15u_2^Tv_4 - 6u_5^Tv_1 - 6u_1^Tv_5\right) = \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k}\sum_{j=1}^k\left(-20u_{3,j}v_{3,j} + 15u_{2,j}v_{4,j} + 15u_{4,j}v_{2,j} - 6u_{1,j}v_{5,j} - 6u_{5,j}v_{1,j}\right).$$

Lemma 5 (with all sums over $i = 1, \dots, D$)
$$\mathrm{Var}\big(\hat d_{(6)}\big) = \frac{400}{k}\left(\sum x_i^6\sum y_i^6 + \Big(\sum x_i^3y_i^3\Big)^2\right) + \frac{225}{k}\left(\sum x_i^4\sum y_i^8 + \Big(\sum x_i^2y_i^4\Big)^2\right) + \frac{225}{k}\left(\sum x_i^8\sum y_i^4 + \Big(\sum x_i^4y_i^2\Big)^2\right) + \frac{36}{k}\left(\sum x_i^2\sum y_i^{10} + \Big(\sum x_iy_i^5\Big)^2\right) + \frac{36}{k}\left(\sum x_i^{10}\sum y_i^2 + \Big(\sum x_i^5y_i\Big)^2\right) + \Delta_6,$$
where
$$\Delta_6 = -\frac{600}{k}\left(\sum x_i^5\sum y_i^7 + \sum x_i^3y_i^4\sum x_i^2y_i^3\right) - \frac{600}{k}\left(\sum x_i^7\sum y_i^5 + \sum x_i^3y_i^2\sum x_i^4y_i^3\right) + \frac{240}{k}\left(\sum x_i^4\sum y_i^8 + \sum x_i^3y_i^5\sum x_iy_i^3\right) + \frac{240}{k}\left(\sum x_i^8\sum y_i^4 + \sum x_i^3y_i\sum x_i^5y_i^3\right) + \frac{450}{k}\left(\sum x_i^6\sum y_i^6 + \sum x_i^2y_i^2\sum x_i^4y_i^4\right) - \frac{180}{k}\left(\sum x_i^3\sum y_i^9 + \sum x_i^2y_i^5\sum x_iy_i^4\right) - \frac{180}{k}\left(\sum x_i^7\sum y_i^5 + \sum x_i^2y_i\sum x_i^5y_i^4\right) - \frac{180}{k}\left(\sum x_i^5\sum y_i^7 + \sum x_i^4y_i^5\sum x_iy_i^2\right) - \frac{180}{k}\left(\sum x_i^9\sum y_i^3 + \sum x_i^4y_i\sum x_i^5y_i^2\right) + \frac{72}{k}\left(\sum x_i^6\sum y_i^6 + \sum x_iy_i\sum x_i^5y_i^5\right).$$

Proof 5: See Appendix C.

When all entries of $x$ and $y$ are non-negative, we believe it is true that $\Delta_6 \le 0$, but we did not proceed with the proof. Of course, it is again a good idea to take advantage of the marginal norms, but we skip the analysis.

4 Sub-Gaussian Random Projections

It is well known that it is not necessary to sample $r_{ij} \sim N(0,1)$. In fact, to obtain an unbiased estimator, it suffices to sample $r_{ij}$ from any distribution with zero mean (and unit variance). For good higher-order behavior, it is often a good idea to sample from a sub-Gaussian distribution, of which the zero-mean normal is a special case. The theory of sub-Gaussian distributions was developed in the 1950s; see [?] and the references therein. A random variable $x$ is sub-Gaussian if there exists a constant $g > 0$ such that for all $t \in \mathbb{R}$:
$$E(\exp(xt)) \le \exp\left(\frac{g^2t^2}{2}\right).$$
In this section, we sample $r_{ij}$ from a sub-Gaussian distribution with the restrictions
$$E(r_{ij}) = 0, \qquad E(r_{ij}^2) = 1, \qquad E(r_{ij}^4) = s,$$
and we denote $r_{ij} \sim SubG(s)$. It can be shown that we must restrict $s \ge 1$. One example is $r_{ij} \sim \mathrm{Uniform}(-\sqrt{3}, \sqrt{3})$, for which $s = \frac{9}{5}$. Although the uniform distribution is simpler than the normal, it is now well known that one should sample from the following three-point sub-Gaussian distribution [?]:
$$r_{ij} = \begin{cases} \sqrt{s} & \text{with prob. } \frac{1}{2s},\\[2pt] 0 & \text{with prob. } 1 - \frac{1}{s},\\[2pt] -\sqrt{s} & \text{with prob. } \frac{1}{2s},\end{cases} \qquad s \ge 1.$$
In our analysis, we do not have to specify the exact distribution of $r_{ij}$; we simply express the estimation variance as a function of $s$.

Here we consider the basic projection strategy, generating one random projection matrix $R \in \mathbb{R}^{D\times k}$ with i.i.d. entries $r_{ij} \sim SubG(s)$, and
$$u_{1,j} = \sum_{i=1}^D x_i r_{ij}, \quad u_{2,j} = \sum_{i=1}^D x_i^2 r_{ij}, \quad u_{3,j} = \sum_{i=1}^D x_i^3 r_{ij}, \qquad v_{1,j} = \sum_{i=1}^D y_i r_{ij}, \quad v_{2,j} = \sum_{i=1}^D y_i^2 r_{ij}, \quad v_{3,j} = \sum_{i=1}^D y_i^3 r_{ij}.$$
We again have a simple unbiased estimator of $d_{(4)}$:
$$\hat d_{(4),s} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6u_2^Tv_2 - 4u_3^Tv_1 - 4u_1^Tv_3\right).$$

Lemma 6 (with all sums over $i = 1, \dots, D$)
$$\mathrm{Var}\big(\hat d_{(4),s}\big) = \frac{36}{k}\left(\sum x_i^4\sum y_i^4 + \Big(\sum x_i^2y_i^2\Big)^2 + (s-3)\sum x_i^4y_i^4\right) + \frac{16}{k}\left(\sum x_i^6\sum y_i^2 + \Big(\sum x_i^3y_i\Big)^2 + (s-3)\sum x_i^6y_i^2\right) + \frac{16}{k}\left(\sum x_i^2\sum y_i^6 + \Big(\sum x_iy_i^3\Big)^2 + (s-3)\sum x_i^2y_i^6\right) - \frac{48}{k}\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2 + (s-3)\sum x_i^5y_i^3\right) - \frac{48}{k}\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3 + (s-3)\sum x_i^3y_i^5\right) + \frac{32}{k}\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3 + (s-3)\sum x_i^4y_i^4\right).$$

Proof 6: See Appendix D.

5 Conclusions

Approximating the $l_p$ distances in massive high-dimensional data, e.g., a giant "data matrix" $A \in \mathbb{R}^{n\times D}$, has been an active research topic. While a linear scan of $A$ may be feasible, it can be prohibitive (or even infeasible) to compute and store all pairwise $l_p$ distances. Using random projections reduces the cost of computing all pairwise distances from $O(n^2D)$ to $O(n^2k)$, where $k \ll D$. The data size is reduced from $O(nD)$ to $O(nk)$, and hence it may become possible to store the reduced data in memory.
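This cost accounting can be made concrete: each row of $A$ contributes one exact $l_4$ margin plus three projected vectors of length $k$ (for $x$, $x^2$, $x^3$), after which any pairwise $d_{(4)}$ estimate needs only $O(k)$ work. A sketch (hypothetical code; the function names are ours), using the three-point $SubG(s)$ entries of Section 4 with the default $s = 3$ matching the normal moments:

```python
import numpy as np

def project_rows(A, k, s=3.0, rng=None):
    """Reduce A (n x D) to the representation needed for pairwise d4 estimates.

    Projection entries follow the three-point SubG(s) distribution of Section 4;
    s = 3 reproduces the normal moments E r^2 = 1, E r^4 = 3.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, D = A.shape
    R = np.sqrt(s) * rng.choice([-1.0, 0.0, 1.0], size=(D, k),
                                p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    margins = np.sum(A**4, axis=1)        # exact l4 margins, one linear scan
    U1, U2, U3 = A @ R, (A**2) @ R, (A**3) @ R
    return margins, U1, U2, U3

def d4_hat(reduced, i, j):
    """Estimate sum_t |A[i,t] - A[j,t]|^4 from the reduced representation."""
    margins, U1, U2, U3 = reduced
    k = U1.shape[1]
    inter = (6 * U2[i] @ U2[j] - 4 * U3[i] @ U1[j] - 4 * U1[i] @ U3[j]) / k
    return margins[i] + margins[j] + inter

rng = np.random.default_rng(5)
A = rng.random((5, 40))                   # a small non-negative data matrix
reduced = project_rows(A, k=100, rng=rng)
est = d4_hat(reduced, 0, 1)
```

Averaging `d4_hat` over fresh projections recovers the exact $l_4$ distance between rows 0 and 1, while the stored representation is $O(nk)$ instead of $O(nD)$.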
While the well-known method of stable random projections is applicable to $0 < p \le 2$ but not directly to $p > 2$, we propose a practical approach for approximating the $l_p$ distances in massive data for $p = 2, 4, 6, \dots$, based on the simple fact that, when $p$ is even, the $l_p$ distance can be decomposed into 2 marginal norms and $p - 1$ "inner products" of various orders. Two projection strategies are proposed to approximate these "inner products" and hence the $l_p$ distances; we show that the basic projection strategy (which is simpler) is always preferable to the alternative strategy in terms of accuracy, at least for $p = 4$ on non-negative data. We also propose utilizing the marginal norms (which can easily be computed exactly) to further improve the estimates. Finally, we analyze the performance using sub-Gaussian random projections.

A Proof of Lemma 1

$$\hat d_{(4)} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6u_2^Tv_2 - 4u_3^Tv_1 - 4u_1^Tv_3\right) = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\sum_{j=1}^k\left(6u_{2,j}v_{2,j} - 4u_{3,j}v_{1,j} - 4u_{1,j}v_{3,j}\right),$$
$$u_{2,j}v_{2,j} = \left(\sum_{i=1}^D x_i^2r_{ij}\right)\left(\sum_{i=1}^D y_i^2r_{ij}\right) = \sum_{i=1}^D x_i^2y_i^2r_{ij}^2 + \sum_{i\ne i'}x_i^2r_{ij}\,y_{i'}^2r_{i'j}.$$
Thus $E(u_{2,j}v_{2,j}) = \sum_{i=1}^D x_i^2y_i^2$. Similarly, we can show
$$E(u_{3,j}v_{1,j}) = \sum_{i=1}^D x_i^3y_i, \qquad E(u_{1,j}v_{3,j}) = \sum_{i=1}^D x_iy_i^3.$$
Therefore,
$$E\big(\hat d_{(4)}\big) = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\sum_{j=1}^k\left(6\sum_{i=1}^D x_i^2y_i^2 - 4\sum_{i=1}^D x_i^3y_i - 4\sum_{i=1}^D x_iy_i^3\right) = d_{(4)}.$$
To derive the variance, we need to analyze the expectation of
$$\left(6u_{2,j}v_{2,j} - 4u_{3,j}v_{1,j} - 4u_{1,j}v_{3,j}\right)^2 = 36u_{2,j}^2v_{2,j}^2 + 16u_{3,j}^2v_{1,j}^2 + 16u_{1,j}^2v_{3,j}^2 - 48u_{2,j}u_{3,j}v_{2,j}v_{1,j} - 48u_{2,j}u_{1,j}v_{2,j}v_{3,j} + 32u_{3,j}u_{1,j}v_{1,j}v_{3,j}.$$
To simplify the expressions, we skip the terms whose expectations are zero. With all sums over $i = 1, \dots, D$,
$$E\big(u_{2,j}^2v_{2,j}^2\big) = E\left(\Big(\sum x_i^2y_i^2r_{ij}^2 + \sum_{i\ne i'}x_i^2r_{ij}\,y_{i'}^2r_{i'j}\Big)^2\right) = E\left(\sum x_i^4y_i^4r_{ij}^4 + 2\sum_{i\ne i'}x_i^2y_i^2r_{ij}^2\,x_{i'}^2y_{i'}^2r_{i'j}^2 + \sum_{i\ne i'}x_i^4r_{ij}^2\,y_{i'}^4r_{i'j}^2\right) = 3\sum x_i^4y_i^4 + 2\sum_{i\ne i'}x_i^2y_i^2x_{i'}^2y_{i'}^2 + \sum_{i\ne i'}x_i^4y_{i'}^4 = \sum x_i^4\sum y_i^4 + 2\Big(\sum x_i^2y_i^2\Big)^2.$$
Similarly,
$$E\big(u_{3,j}^2v_{1,j}^2\big) = \sum x_i^6\sum y_i^2 + 2\Big(\sum x_i^3y_i\Big)^2, \qquad E\big(u_{1,j}^2v_{3,j}^2\big) = \sum x_i^2\sum y_i^6 + 2\Big(\sum x_iy_i^3\Big)^2.$$
$$E(u_{2,j}u_{3,j}v_{2,j}v_{1,j}) = E\left(\sum x_i^2r_{ij}\sum x_i^3r_{ij}\sum y_i^2r_{ij}\sum y_ir_{ij}\right) = E\left(\Big(\sum x_i^5r_{ij}^2 + \sum_{i\ne i'}x_i^2r_{ij}x_{i'}^3r_{i'j}\Big)\Big(\sum y_i^3r_{ij}^2 + \sum_{i\ne i'}y_i^2r_{ij}y_{i'}r_{i'j}\Big)\right) = E\left(\sum x_i^5y_i^3r_{ij}^4 + \sum_{i\ne i'}x_i^5r_{ij}^2y_{i'}^3r_{i'j}^2\right) + E\left(\sum_{i\ne i'}x_i^2y_i^2r_{ij}^2\,x_{i'}^3y_{i'}r_{i'j}^2 + \sum_{i\ne i'}x_i^2y_ir_{ij}^2\,x_{i'}^3y_{i'}^2r_{i'j}^2\right) = 3\sum x_i^5y_i^3 + \sum_{i\ne i'}x_i^5y_{i'}^3 + \sum_{i\ne i'}x_i^2y_i^2x_{i'}^3y_{i'} + \sum_{i\ne i'}x_i^2y_ix_{i'}^3y_{i'}^2 = \sum x_i^5\sum y_i^3 + \sum x_i^2y_i^2\sum x_i^3y_i + \sum x_i^2y_i\sum x_i^3y_i^2.$$
Similarly,
$$E(u_{2,j}u_{1,j}v_{2,j}v_{3,j}) = \sum x_i^3\sum y_i^5 + \sum x_iy_i^3\sum x_i^2y_i^2 + \sum x_iy_i^2\sum x_i^2y_i^3,$$
$$E(u_{3,j}u_{1,j}v_{1,j}v_{3,j}) = \sum x_i^4\sum y_i^4 + \sum x_iy_i^3\sum x_i^3y_i + \sum x_iy_i\sum x_i^3y_i^3.$$
Therefore,
$$\mathrm{Var}\left(6u_{2,j}v_{2,j} - 4u_{3,j}v_{1,j} - 4u_{1,j}v_{3,j}\right) = 36\sum x_i^4\sum y_i^4 + 72\Big(\sum x_i^2y_i^2\Big)^2 + 16\sum x_i^6\sum y_i^2 + 32\Big(\sum x_i^3y_i\Big)^2 + 16\sum x_i^2\sum y_i^6 + 32\Big(\sum x_iy_i^3\Big)^2 - 48\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i^2\sum x_i^3y_i + \sum x_i^2y_i\sum x_i^3y_i^2\right) - 48\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^3\sum x_i^2y_i^2 + \sum x_iy_i^2\sum x_i^2y_i^3\right) + 32\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i^3\sum x_i^3y_i + \sum x_iy_i\sum x_i^3y_i^3\right) - \left(6\sum x_i^2y_i^2 - 4\sum x_i^3y_i - 4\sum x_iy_i^3\right)^2,$$
from which it follows that
$$\mathrm{Var}\big(\hat d_{(4)}\big) = \frac{36}{k}\left(\sum x_i^4\sum y_i^4 + \Big(\sum x_i^2y_i^2\Big)^2\right) + \frac{16}{k}\left(\sum x_i^6\sum y_i^2 + \Big(\sum x_i^3y_i\Big)^2\right) + \frac{16}{k}\left(\sum x_i^2\sum y_i^6 + \Big(\sum x_iy_i^3\Big)^2\right) - \frac{48}{k}\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2\right) - \frac{48}{k}\left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3\right) + \frac{32}{k}\left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3\right).$$

B Proof of Lemma 3

It suffices to show that (all sums over $i = 1, \dots, D$)
$$\left(\sum x_i^5\sum y_i^3 + \sum x_i^2y_i\sum x_i^3y_i^2\right) + \left(\sum x_i^3\sum y_i^5 + \sum x_iy_i^2\sum x_i^2y_i^3\right) - \left(\sum x_i^4\sum y_i^4 + \sum x_iy_i\sum x_i^3y_i^3\right) \ge 0.$$
We need the arithmetic–geometric mean inequality:
$$\sum_{i=1}^n w_i \ge n\left(\prod_{i=1}^n w_i\right)^{1/n}, \qquad w_i \ge 0.$$
Because $x_i^5y_j^3 + x_i^3y_j^5 \ge 2\sqrt{x_i^8y_j^8} = 2x_i^4y_j^4$,
$$\sum x_i^5\sum y_i^3 + \sum x_i^3\sum y_i^5 - \sum x_i^4\sum y_i^4 \ge 0.$$
Thus it only remains to show that
$$\sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^3 y_i^2 + \sum_{i=1}^D x_i y_i^2 \sum_{i=1}^D x_i^2 y_i^3 - \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^3 y_i^3 \geq 0,$$
for which (applying the arithmetic-geometric mean inequality termwise, $x_i^2 y_i\, x_j^3 y_j^2 + x_i y_i^2\, x_j^2 y_j^3 \geq 2\, x_i^{3/2} y_i^{3/2} x_j^{5/2} y_j^{5/2}$) it suffices to show that
$$2 \sum_{i=1}^D x_i^{3/2} y_i^{3/2} \sum_{i=1}^D x_i^{5/2} y_i^{5/2} - \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^3 y_i^3 \geq 0,$$
or equivalently, writing $z_i = \sqrt{x_i y_i} \geq 0$ for all $i \in [1, D]$, that
$$f(z_1, z_2, \ldots, z_D) = 2 \sum_{i=1}^D z_i^3 \sum_{i=1}^D z_i^5 - \sum_{i=1}^D z_i^2 \sum_{i=1}^D z_i^6 \geq 0. \qquad (3)$$
Obviously, (3) holds for $D = 1$ (where $f = z_1^8$) and $D = 2$. To see that it is true for $D > 2$, we notice that the gradient of $f$ vanishes only at $(z_1, \ldots, z_D) = (0, \ldots, 0)$, and that $f(1, 1, \ldots, 1) = D^2 > 0$. Since $f$ is continuous, $f \geq 0$ must hold whenever $z_i > 0$ for all $i$. There is no need to worry about the boundary case $z_j = 0$, because it reduces to the same problem in dimension $D' = D - 1$, and we have already verified the base cases $D = 1, 2$. This completes the proof.

C Proof of Lemma 5

\begin{align*}
d^{(6)} &= \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 - 20 \sum_{i=1}^D x_i^3 y_i^3 + 15 \sum_{i=1}^D x_i^2 y_i^4 + 15 \sum_{i=1}^D x_i^4 y_i^2 - 6 \sum_{i=1}^D x_i^5 y_i - 6 \sum_{i=1}^D x_i y_i^5,\\
\hat d^{(6)} &= \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k}\left(-20\, u_3^{\rm T} v_3 + 15\, u_4^{\rm T} v_2 + 15\, u_2^{\rm T} v_4 - 6\, u_5^{\rm T} v_1 - 6\, u_1^{\rm T} v_5\right)\\
&= \sum_{i=1}^D x_i^6 + \sum_{i=1}^D y_i^6 + \frac{1}{k} \sum_{j=1}^k \left(-20\, u_{3,j} v_{3,j} + 15\, u_{2,j} v_{4,j} + 15\, u_{4,j} v_{2,j} - 6\, u_{1,j} v_{5,j} - 6\, u_{5,j} v_{1,j}\right).
\end{align*}
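A minimal numerical sketch of the estimator $\hat d^{(6)}$ with a single shared matrix of i.i.d. $N(0,1)$ entries (the basic projection strategy); the code and variable names are our own, and the data are a hypothetical non-negative example:

```python
import numpy as np

rng = np.random.default_rng(0)
D, k = 50, 20000
x = rng.uniform(0.0, 1.0, D) / D   # hypothetical non-negative data
y = rng.uniform(0.0, 1.0, D) / D

R = rng.standard_normal((D, k))                 # one projection matrix
u = {m: (x**m) @ R for m in (1, 2, 3, 4, 5)}    # u_{m,j} = sum_i x_i^m r_{ij}
v = {m: (y**m) @ R for m in (1, 2, 3, 4, 5)}

# Exact l6 distance (sixth power), for comparison:
d6 = (np.sum(x**6) + np.sum(y**6) - 20*np.sum(x**3 * y**3)
      + 15*np.sum(x**2 * y**4) + 15*np.sum(x**4 * y**2)
      - 6*np.sum(x**5 * y) - 6*np.sum(x * y**5))

# The estimator \hat{d}^{(6)}: marginal norms exact, inner products projected.
d6_hat = (np.sum(x**6) + np.sum(y**6)
          + (-20*(u[3] @ v[3]) + 15*(u[2] @ v[4]) + 15*(u[4] @ v[2])
             - 6*(u[1] @ v[5]) - 6*(u[5] @ v[1])) / k)
```

Each $u_m^{\rm T} v_n / k$ is an unbiased estimate of $\sum_i x_i^m y_i^n$ and is accurate to a few percent at this $k$; note, however, that $d^{(6)}$ itself involves large cancellations among these terms, so the relative error of `d6_hat` can be considerably larger than that of the individual inner products.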
To derive the variance, we need to analyze the expectation of
\begin{align*}
&\left(-20 u_{3,j} v_{3,j} + 15 u_{2,j} v_{4,j} + 15 u_{4,j} v_{2,j} - 6 u_{1,j} v_{5,j} - 6 u_{5,j} v_{1,j}\right)^2\\
&= 400 u_{3,j}^2 v_{3,j}^2 + 225 u_{2,j}^2 v_{4,j}^2 + 225 u_{4,j}^2 v_{2,j}^2 + 36 u_{1,j}^2 v_{5,j}^2 + 36 u_{5,j}^2 v_{1,j}^2\\
&\quad - 600 u_{3,j} v_{3,j} u_{2,j} v_{4,j} - 600 u_{3,j} v_{3,j} u_{4,j} v_{2,j} + 240 u_{3,j} v_{3,j} u_{1,j} v_{5,j} + 240 u_{3,j} v_{3,j} u_{5,j} v_{1,j}\\
&\quad + 450 u_{2,j} v_{4,j} u_{4,j} v_{2,j} - 180 u_{2,j} v_{4,j} u_{1,j} v_{5,j} - 180 u_{2,j} v_{4,j} u_{5,j} v_{1,j} - 180 u_{4,j} v_{2,j} u_{1,j} v_{5,j} - 180 u_{4,j} v_{2,j} u_{5,j} v_{1,j}\\
&\quad + 72 u_{1,j} v_{5,j} u_{5,j} v_{1,j}.
\end{align*}
Skipping the details, we can show that
\begin{align*}
E\left(u_{3,j}^2 v_{3,j}^2\right) &= \sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + 2\left(\sum_{i=1}^D x_i^3 y_i^3\right)^2, &
E\left(u_{2,j}^2 v_{4,j}^2\right) &= \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^8 + 2\left(\sum_{i=1}^D x_i^2 y_i^4\right)^2,\\
E\left(u_{4,j}^2 v_{2,j}^2\right) &= \sum_{i=1}^D x_i^8 \sum_{i=1}^D y_i^4 + 2\left(\sum_{i=1}^D x_i^4 y_i^2\right)^2, &
E\left(u_{1,j}^2 v_{5,j}^2\right) &= \sum_{i=1}^D x_i^2 \sum_{i=1}^D y_i^{10} + 2\left(\sum_{i=1}^D x_i y_i^5\right)^2,\\
E\left(u_{5,j}^2 v_{1,j}^2\right) &= \sum_{i=1}^D x_i^{10} \sum_{i=1}^D y_i^2 + 2\left(\sum_{i=1}^D x_i^5 y_i\right)^2.
\end{align*}
And
\begin{align*}
E\left(u_{3,j} u_{2,j} v_{3,j} v_{4,j}\right) &= \sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^7 + \sum_{i=1}^D x_i^3 y_i^3 \sum_{i=1}^D x_i^2 y_i^4 + \sum_{i=1}^D x_i^3 y_i^4 \sum_{i=1}^D x_i^2 y_i^3,\\
E\left(u_{3,j} u_{4,j} v_{3,j} v_{2,j}\right) &= \sum_{i=1}^D x_i^7 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i^3 y_i^3 \sum_{i=1}^D x_i^4 y_i^2 + \sum_{i=1}^D x_i^3 y_i^2 \sum_{i=1}^D x_i^4 y_i^3,\\
E\left(u_{3,j} u_{1,j} v_{3,j} v_{5,j}\right) &= \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^8 + \sum_{i=1}^D x_i^3 y_i^3 \sum_{i=1}^D x_i y_i^5 + \sum_{i=1}^D x_i^3 y_i^5 \sum_{i=1}^D x_i y_i^3,\\
E\left(u_{3,j} u_{5,j} v_{3,j} v_{1,j}\right) &= \sum_{i=1}^D x_i^8 \sum_{i=1}^D y_i^4 + \sum_{i=1}^D x_i^3 y_i^3 \sum_{i=1}^D x_i^5 y_i + \sum_{i=1}^D x_i^3 y_i \sum_{i=1}^D x_i^5 y_i^3,\\
E\left(u_{2,j} u_{4,j} v_{4,j} v_{2,j}\right) &= \sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + \sum_{i=1}^D x_i^2 y_i^4 \sum_{i=1}^D x_i^4 y_i^2 + \sum_{i=1}^D x_i^2 y_i^2 \sum_{i=1}^D x_i^4 y_i^4,\\
E\left(u_{2,j} u_{1,j} v_{4,j} v_{5,j}\right) &= \sum_{i=1}^D x_i^3 \sum_{i=1}^D y_i^9 + \sum_{i=1}^D x_i^2 y_i^4 \sum_{i=1}^D x_i y_i^5 + \sum_{i=1}^D x_i^2 y_i^5 \sum_{i=1}^D x_i y_i^4,\\
E\left(u_{2,j} u_{5,j} v_{4,j} v_{1,j}\right) &= \sum_{i=1}^D x_i^7 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i^2 y_i^4 \sum_{i=1}^D x_i^5 y_i + \sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^5 y_i^4,\\
E\left(u_{4,j} u_{1,j} v_{2,j} v_{5,j}\right) &= \sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^7 + \sum_{i=1}^D x_i^4 y_i^2 \sum_{i=1}^D x_i y_i^5 + \sum_{i=1}^D x_i^4 y_i^5 \sum_{i=1}^D x_i y_i^2,\\
E\left(u_{4,j} u_{5,j} v_{2,j} v_{1,j}\right) &= \sum_{i=1}^D x_i^9 \sum_{i=1}^D y_i^3 + \sum_{i=1}^D x_i^4 y_i^2 \sum_{i=1}^D x_i^5 y_i + \sum_{i=1}^D x_i^4 y_i \sum_{i=1}^D x_i^5 y_i^2,\\
E\left(u_{1,j} u_{5,j} v_{5,j} v_{1,j}\right) &= \sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + \sum_{i=1}^D x_i y_i^5 \sum_{i=1}^D x_i^5 y_i + \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^5 y_i^5.
\end{align*}
Combining the results, we obtain
\begin{align*}
Var\left(\hat d^{(6)}\right)
&= \frac{400}{k}\left(\sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + \Big(\sum_{i=1}^D x_i^3 y_i^3\Big)^2\right)
+ \frac{225}{k}\left(\sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^8 + \Big(\sum_{i=1}^D x_i^2 y_i^4\Big)^2\right)
+ \frac{225}{k}\left(\sum_{i=1}^D x_i^8 \sum_{i=1}^D y_i^4 + \Big(\sum_{i=1}^D x_i^4 y_i^2\Big)^2\right)\\
&\quad + \frac{36}{k}\left(\sum_{i=1}^D x_i^2 \sum_{i=1}^D y_i^{10} + \Big(\sum_{i=1}^D x_i y_i^5\Big)^2\right)
+ \frac{36}{k}\left(\sum_{i=1}^D x_i^{10} \sum_{i=1}^D y_i^2 + \Big(\sum_{i=1}^D x_i^5 y_i\Big)^2\right)
+ \Delta_6,
\end{align*}
where
\begin{align*}
k \Delta_6 / 6
&= - 100\left(\sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^7 + \sum_{i=1}^D x_i^3 y_i^4 \sum_{i=1}^D x_i^2 y_i^3\right)
- 100\left(\sum_{i=1}^D x_i^7 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i^3 y_i^2 \sum_{i=1}^D x_i^4 y_i^3\right)\\
&\quad + 40\left(\sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^8 + \sum_{i=1}^D x_i^3 y_i^5 \sum_{i=1}^D x_i y_i^3\right)
+ 40\left(\sum_{i=1}^D x_i^8 \sum_{i=1}^D y_i^4 + \sum_{i=1}^D x_i^3 y_i \sum_{i=1}^D x_i^5 y_i^3\right)\\
&\quad + 75\left(\sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + \sum_{i=1}^D x_i^2 y_i^2 \sum_{i=1}^D x_i^4 y_i^4\right)
- 30\left(\sum_{i=1}^D x_i^3 \sum_{i=1}^D y_i^9 + \sum_{i=1}^D x_i^2 y_i^5 \sum_{i=1}^D x_i y_i^4\right)\\
&\quad - 30\left(\sum_{i=1}^D x_i^7 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^5 y_i^4\right)
- 30\left(\sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^7 + \sum_{i=1}^D x_i^4 y_i^5 \sum_{i=1}^D x_i y_i^2\right)\\
&\quad - 30\left(\sum_{i=1}^D x_i^9 \sum_{i=1}^D y_i^3 + \sum_{i=1}^D x_i^4 y_i \sum_{i=1}^D x_i^5 y_i^2\right)
+ 12\left(\sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^6 + \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^5 y_i^5\right).
\end{align*}

D Proof of Lemma 6

\begin{align*}
\hat d^{(4),s} = \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k}\left(6\, u_2^{\rm T} v_2 - 4\, u_3^{\rm T} v_1 - 4\, u_1^{\rm T} v_3\right)
= \sum_{i=1}^D x_i^4 + \sum_{i=1}^D y_i^4 + \frac{1}{k} \sum_{j=1}^k \left(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\right).
\end{align*}
Here the entries $r_{ij}$ are sub-Gaussian with $E(r_{ij}^4) = s$, and hence
\begin{align*}
E\left(u_{2,j}^2 v_{2,j}^2\right)
&= E\left(\Big(\sum_{i=1}^D x_i^2 y_i^2 r_{ij}^2 + \sum_{i\neq i'} x_i^2 r_{ij}\, y_{i'}^2 r_{i'j}\Big)^2\right)\\
&= E\left(\sum_{i=1}^D x_i^4 y_i^4 r_{ij}^4 + 2\sum_{i\neq i'} x_i^2 y_i^2 r_{ij}^2\, x_{i'}^2 y_{i'}^2 r_{i'j}^2 + \sum_{i\neq i'} x_i^4 r_{ij}^2\, y_{i'}^4 r_{i'j}^2\right)\\
&= s \sum_{i=1}^D x_i^4 y_i^4 + 2\sum_{i\neq i'} x_i^2 y_i^2 x_{i'}^2 y_{i'}^2 + \sum_{i\neq i'} x_i^4 y_{i'}^4
= \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + 2\left(\sum_{i=1}^D x_i^2 y_i^2\right)^2 + (s-3)\sum_{i=1}^D x_i^4 y_i^4.
\end{align*}
Similarly,
\begin{align*}
E\left(u_{3,j}^2 v_{1,j}^2\right) &= \sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^2 + 2\left(\sum_{i=1}^D x_i^3 y_i\right)^2 + (s-3)\sum_{i=1}^D x_i^6 y_i^2,\\
E\left(u_{1,j}^2 v_{3,j}^2\right) &= \sum_{i=1}^D x_i^2 \sum_{i=1}^D y_i^6 + 2\left(\sum_{i=1}^D x_i y_i^3\right)^2 + (s-3)\sum_{i=1}^D x_i^2 y_i^6.
\end{align*}
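The $(s-3)$ correction above can be verified exactly for any concrete distribution with $E(r) = E(r^3) = 0$, $E(r^2) = 1$, $E(r^4) = s$. One such family (for $s \geq 1$) is the sparse three-point distribution $r \in \{-\sqrt{s}, 0, +\sqrt{s}\}$ with probabilities $1/(2s),\, 1 - 1/s,\, 1/(2s)$; the small enumeration sketch below (our own code and names, not from the paper) checks the formula for $E(u_{2,j}^2 v_{2,j}^2)$ at $s = 4$, $D = 3$:

```python
import itertools
import math

import numpy as np

s = 4.0  # target fourth moment E(r^4) = s
# Three-point distribution with E r = 0, E r^2 = 1, E r^4 = s (valid for s >= 1):
vals = [-math.sqrt(s), 0.0, math.sqrt(s)]
probs = [1/(2*s), 1 - 1/s, 1/(2*s)]

x = np.array([0.6, 1.1, 0.3])   # hypothetical example, D = 3
y = np.array([0.8, 0.5, 1.2])
D = len(x)
S = lambda a, b: float(np.sum(x**a * y**b))   # S(a, b) = sum_i x_i^a y_i^b

lhs = 0.0
for t in itertools.product(range(3), repeat=D):
    p = math.prod(probs[i] for i in t)
    r = np.array([vals[i] for i in t])
    u2, v2 = np.dot(x**2, r), np.dot(y**2, r)
    lhs += p * (u2 * v2)**2   # exact E(u_2^2 v_2^2) by enumeration

rhs = S(4, 0)*S(0, 4) + 2*S(2, 2)**2 + (s - 3)*S(4, 4)
```

Setting $s = 3$ recovers the normal-projection expression, as expected.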
For the cross terms,
\begin{align*}
E\left(u_{2,j} u_{3,j} v_{2,j} v_{1,j}\right) &= \sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^3 + \sum_{i=1}^D x_i^2 y_i^2 \sum_{i=1}^D x_i^3 y_i + \sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^3 y_i^2 + (s-3)\sum_{i=1}^D x_i^5 y_i^3,\\
E\left(u_{2,j} u_{1,j} v_{2,j} v_{3,j}\right) &= \sum_{i=1}^D x_i^3 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i y_i^3 \sum_{i=1}^D x_i^2 y_i^2 + \sum_{i=1}^D x_i y_i^2 \sum_{i=1}^D x_i^2 y_i^3 + (s-3)\sum_{i=1}^D x_i^3 y_i^5,\\
E\left(u_{3,j} u_{1,j} v_{1,j} v_{3,j}\right) &= \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + \sum_{i=1}^D x_i y_i^3 \sum_{i=1}^D x_i^3 y_i + \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^3 y_i^3 + (s-3)\sum_{i=1}^D x_i^4 y_i^4.
\end{align*}
Therefore,
\begin{align*}
&Var\left(6 u_{2,j} v_{2,j} - 4 u_{3,j} v_{1,j} - 4 u_{1,j} v_{3,j}\right)\\
&= 36 \sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + 72\left(\sum_{i=1}^D x_i^2 y_i^2\right)^2 + 36(s-3)\sum_{i=1}^D x_i^4 y_i^4\\
&\quad + 16 \sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^2 + 32\left(\sum_{i=1}^D x_i^3 y_i\right)^2 + 16(s-3)\sum_{i=1}^D x_i^6 y_i^2\\
&\quad + 16 \sum_{i=1}^D x_i^2 \sum_{i=1}^D y_i^6 + 32\left(\sum_{i=1}^D x_i y_i^3\right)^2 + 16(s-3)\sum_{i=1}^D x_i^2 y_i^6\\
&\quad - 48\left(\sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^3 + \sum_{i=1}^D x_i^2 y_i^2 \sum_{i=1}^D x_i^3 y_i + \sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^3 y_i^2 + (s-3)\sum_{i=1}^D x_i^5 y_i^3\right)\\
&\quad - 48\left(\sum_{i=1}^D x_i^3 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i y_i^3 \sum_{i=1}^D x_i^2 y_i^2 + \sum_{i=1}^D x_i y_i^2 \sum_{i=1}^D x_i^2 y_i^3 + (s-3)\sum_{i=1}^D x_i^3 y_i^5\right)\\
&\quad + 32\left(\sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + \sum_{i=1}^D x_i y_i^3 \sum_{i=1}^D x_i^3 y_i + \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^3 y_i^3 + (s-3)\sum_{i=1}^D x_i^4 y_i^4\right)\\
&\quad - \left(6\sum_{i=1}^D x_i^2 y_i^2 - 4\sum_{i=1}^D x_i^3 y_i - 4\sum_{i=1}^D x_i y_i^3\right)^2,
\end{align*}
from which it follows that
\begin{align*}
Var\left(\hat d^{(4),s}\right)
&= \frac{36}{k}\left(\sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + \Big(\sum_{i=1}^D x_i^2 y_i^2\Big)^2 + (s-3)\sum_{i=1}^D x_i^4 y_i^4\right)\\
&\quad + \frac{16}{k}\left(\sum_{i=1}^D x_i^6 \sum_{i=1}^D y_i^2 + \Big(\sum_{i=1}^D x_i^3 y_i\Big)^2 + (s-3)\sum_{i=1}^D x_i^6 y_i^2\right)
+ \frac{16}{k}\left(\sum_{i=1}^D x_i^2 \sum_{i=1}^D y_i^6 + \Big(\sum_{i=1}^D x_i y_i^3\Big)^2 + (s-3)\sum_{i=1}^D x_i^2 y_i^6\right)\\
&\quad - \frac{48}{k}\left(\sum_{i=1}^D x_i^5 \sum_{i=1}^D y_i^3 + \sum_{i=1}^D x_i^2 y_i \sum_{i=1}^D x_i^3 y_i^2 + (s-3)\sum_{i=1}^D x_i^5 y_i^3\right)
- \frac{48}{k}\left(\sum_{i=1}^D x_i^3 \sum_{i=1}^D y_i^5 + \sum_{i=1}^D x_i y_i^2 \sum_{i=1}^D x_i^2 y_i^3 + (s-3)\sum_{i=1}^D x_i^3 y_i^5\right)\\
&\quad + \frac{32}{k}\left(\sum_{i=1}^D x_i^4 \sum_{i=1}^D y_i^4 + \sum_{i=1}^D x_i y_i \sum_{i=1}^D x_i^3 y_i^3 + (s-3)\sum_{i=1}^D x_i^4 y_i^4\right).
\end{align*}
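To close, a simulation sketch of $\hat d^{(4),s}$ with actual sparse projection matrices (again using the three-point distribution with fourth moment $s$; all code and names are our own illustration) compares the empirical variance against the closed form above:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 4.0
D, k, reps = 10, 50, 4000
x = rng.uniform(0.0, 1.0, D)   # hypothetical data
y = rng.uniform(0.0, 1.0, D)
S = lambda a, b: float(np.sum(x**a * y**b))   # S(a, b) = sum_i x_i^a y_i^b

# Closed-form Var(\hat{d}^{(4),s}) derived above:
var_formula = (36*(S(4,0)*S(0,4) + S(2,2)**2 + (s-3)*S(4,4))
             + 16*(S(6,0)*S(0,2) + S(3,1)**2 + (s-3)*S(6,2))
             + 16*(S(2,0)*S(0,6) + S(1,3)**2 + (s-3)*S(2,6))
             - 48*(S(5,0)*S(0,3) + S(2,1)*S(3,2) + (s-3)*S(5,3))
             - 48*(S(3,0)*S(0,5) + S(1,2)*S(2,3) + (s-3)*S(3,5))
             + 32*(S(4,0)*S(0,4) + S(1,1)*S(3,3) + (s-3)*S(4,4))) / k

d4 = S(4, 0) + S(0, 4) + 6*S(2, 2) - 4*S(3, 1) - 4*S(1, 3)   # exact d^{(4)}

ests = np.empty(reps)
for t in range(reps):
    # Sparse entries: +-sqrt(s) with probability 1/(2s) each, 0 otherwise.
    R = rng.choice([-np.sqrt(s), 0.0, np.sqrt(s)],
                   size=(D, k), p=[1/(2*s), 1 - 1/s, 1/(2*s)])
    u1, u2, u3 = x @ R, (x**2) @ R, (x**3) @ R
    v1, v2, v3 = y @ R, (y**2) @ R, (y**3) @ R
    ests[t] = (S(4, 0) + S(0, 4)
               + (6*np.dot(u2, v2) - 4*np.dot(u3, v1) - 4*np.dot(u1, v3)) / k)
```

With these settings the empirical mean of `ests` matches $d^{(4)}$ and the empirical variance matches `var_formula` to within sampling error; with $s = 4$ roughly three quarters of the projection entries are zero, which is the storage/computation advantage the sparse scheme trades against the $(s-3)$ variance terms.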