Compressed Counting Meets Compressed Sensing
Ping Li (Department of Statistics & Biostatistics and Department of Computer Science, Rutgers University, Piscataway, NJ 08854; pingli@stat.rutgers.edu)
Cun-Hui Zhang (Department of Statistics & Biostatistics, Rutgers University, Piscataway, NJ 08854; czhang@stat.rutgers.edu)
Tong Zhang (Department of Statistics & Biostatistics, Rutgers University, Piscataway, NJ 08854; tongz@rci.rutgers.edu)

Abstract

Compressed sensing (sparse signal recovery) has been a popular and important research topic in recent years. By observing that natural signals are often nonnegative, we propose a new framework for nonnegative signal recovery using Compressed Counting (CC). CC is a technique built on maximally-skewed $\alpha$-stable random projections originally developed for data stream computations. Our recovery procedure is computationally very efficient in that it requires only one linear scan of the coordinates. In our settings, the signal $x \in \mathbb{R}^N$ is assumed to be nonnegative, i.e., $x_i \geq 0$, $\forall i$. Our analysis demonstrates that, when $\alpha \in (0, 0.5]$, it suffices to use $M = (C_\alpha + o(1))\,\epsilon^{-\alpha}\left(\sum_{i=1}^N x_i^\alpha\right)\log(N/\delta)$ measurements so that, with probability $1-\delta$, all coordinates will be recovered within $\epsilon$ additive precision, in one scan of the coordinates. The constant $C_\alpha = 1$ when $\alpha \to 0$ and $C_\alpha = \pi/2$ when $\alpha = 0.5$. In particular, when $\alpha \to 0$, the required number of measurements is essentially $M = K\log(N/\delta)$, where $K = \sum_{i=1}^N 1\{x_i \neq 0\}$ is the number of nonzero coordinates of the signal.

(Footnote: This work was presented at the Simons Institute Workshop on Succinct Data Representations and Applications in September 2013.)

1 Introduction

In this paper, we develop a new framework for compressed sensing (sparse signal recovery) [8, 6, 5, 7, 2]. We focus on nonnegative sparse signals, i.e., $x \in \mathbb{R}^N$ with $x_i \geq 0$, $\forall i$. Note that real-world signals are often nonnegative. We consider the scenario in which neither the magnitudes nor the locations of the nonzero entries of $x$ are known (e.g., data streams). The task of compressed sensing is to recover the locations and magnitudes of the nonzero entries. Our framework differs from mainstream work in that we use maximally-skewed $\alpha$-stable distributions for generating our design matrix, whereas classical compressed sensing algorithms typically adopt Gaussian or Gaussian-like distributions (e.g., distributions with finite variances). The use of skewed stable random projections, named Compressed Counting (CC), was originally developed in [19, 18, 21] in the context of data stream computations. Note that in this paper we focus on dense design matrices and leave the potential use of "very sparse stable random projections" [16] for sparse recovery as future work, which will connect this line of work with the well-known "sparse matrix" algorithms [13].

In compressed sensing, the standard procedure first collects $M$ non-adaptive linear measurements

$$y_j = \sum_{i=1}^N x_i s_{ij}, \quad j = 1, 2, ..., M \quad (1)$$

and then reconstructs the signal $x$ from the measurements $y_j$ and the design matrix $s_{ij}$. In this context, the design matrix is indeed "designed" in that one can manually generate the entries to facilitate signal recovery. In fact, the design matrix can be integrated into the sensing hardware (e.g., cameras, scanners, or other sensors).
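For concreteness, the encoding step (1) is just a matrix-vector product. Below is a minimal Matlab sketch (not from the original implementation; the dimensions and the placeholder Gaussian design are our own choices — the skewed-stable design of Section 1.1 is sketched later):

```matlab
% Encoding (1): M non-adaptive linear measurements of a nonnegative signal.
% S is a placeholder Gaussian design here, as in classical compressed sensing.
N = 1000; M = 100; K = 10;
x = zeros(N, 1); x(randperm(N, K)) = 1;   % K-sparse nonnegative signal
S = randn(N, M);                          % placeholder design matrix
y = S' * x;                               % y(j) = sum_i x(i) * S(i,j)
```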
In classical settings, entries of the design matrix, $s_{ij}$, are typically sampled from Gaussian or Gaussian-like distributions. The recovery algorithms are often based on linear programming (basis pursuit) [4] or greedy methods such as orthogonal matching pursuit (OMP) [23, 12, 26]. In general, LP decoding is computationally expensive. OMP can be faster, although it still requires scanning the coordinates $K$ times. It would be desirable to develop a new framework for sparse recovery which is much faster than linear programming decoding (and other algorithms) without requiring more measurements. It would also be desirable if the method were robust against measurement noise and applicable to data streams. In this paper, our method meets these requirements by sampling $s_{ij}$ from maximally-skewed $\alpha$-stable distributions [31].

1.1 Maximally-Skewed Stable Distributions

In our proposal, we sample the entries of the design matrix $s_{ij}$ from an $\alpha$-stable maximally-skewed distribution, denoted by $S(\alpha, 1, 1)$, where the first "1" denotes maximal skewness and the second "1" denotes unit scale. If a random variable $Z \sim S(\alpha, 1, 1)$, then its characteristic function is

$$F_Z(\lambda) = E\, e^{\sqrt{-1}\,Z\lambda} = \exp\left(-|\lambda|^\alpha\left(1 - \mathrm{sign}(\lambda)\sqrt{-1}\tan\frac{\pi\alpha}{2}\right)\right), \quad \alpha \neq 1 \quad (2)$$

Suppose $s_1, s_2 \sim S(\alpha,1,1)$ i.i.d. For any constants $c_1 \geq 0$, $c_2 \geq 0$, we have $c_1 s_1 + c_2 s_2 \sim S(\alpha, 1, c_1^\alpha + c_2^\alpha)$. More generally, $\sum_{i=1}^N x_i s_i \sim S\left(\alpha, 1, \sum_{i=1}^N x_i^\alpha\right)$ if $s_i \sim S(\alpha,1,1)$ i.i.d.

There is a standard procedure to sample from $S(\alpha,1,1)$ [3]. We first generate an exponential random variable with mean 1, $w \sim \exp(1)$, and a uniform random variable $u \sim \mathrm{unif}(0,\pi)$, and then compute

$$\frac{\sin(\alpha u)}{[\sin u\,\cos(\alpha\pi/2)]^{1/\alpha}}\left[\frac{\sin(u - \alpha u)}{w}\right]^{\frac{1-\alpha}{\alpha}} \sim S(\alpha, 1, 1) \quad (3)$$

In practice, we can replace the stable distribution with a heavy-tailed distribution in its domain of attraction [11], for example, $1/[\mathrm{unif}(0,1)]^{1/\alpha}$. Again, we leave the use of a sparsified design matrix as future work.

1.2 Data Streams and Linear Projection Methods

The use of maximally-skewed stable random projections for nonnegative (dynamic) data stream computations was proposed in a line of work called Compressed Counting (CC) [19, 18, 21]. Prior to CC, it was popular to use symmetric stable random projections [15, 17] in data stream computations. In the standard turnstile data stream model [24], at time $t$, an arriving stream element $(i_t, I_t)$ updates one entry of the data vector in a linear fashion: $x_{i_t}^{(t)} = x_{i_t}^{(t-1)} + I_t$. The dynamic nature of data streams makes computing the summary statistics, e.g., $\sum_{i=1}^N |x_i|^2$, and recovering the nonzero entries more challenging, especially if the streams arrive at high speed (e.g., network traffic).

Linear projections are naturally capable of handling data streams. To see this, suppose we denote the linear measurements as

$$y_j^{(t)} = \sum_{i=1}^N x_i^{(t)} s_{ij}, \quad j = 1, 2, ..., M \quad (4)$$

When a new stream element $(i_t, I_t)$ arrives, we only need to update the measurements as

$$y_j^{(t)} = y_j^{(t-1)} + I_t\, s_{i_t,j}, \quad j = 1, 2, ..., M \quad (5)$$

The entries $s_{i_t,j}$ are re-generated as needed using pseudo-random numbers [25], i.e., there is no need to materialize the entire design matrix. This is standard practice in data stream computations.
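As a sketch (our own variable names, not the paper's code), the sampling formula (3) and the turnstile update (5) can be written in Matlab as:

```matlab
% Sample the design matrix entries s_ij ~ S(alpha,1,1) via formula (3),
% then maintain the measurements under a turnstile update (5).
alpha = 0.5; N = 1000; M = 100;
u = pi * rand(N, M);                  % u ~ unif(0, pi)
w = -log(rand(N, M));                 % w ~ exp(1)
S = sin(alpha*u) ./ (sin(u) * cos(alpha*pi/2)).^(1/alpha) ...
    .* (sin((1-alpha)*u) ./ w).^((1-alpha)/alpha);

y = zeros(1, M);                      % measurements; the stream starts at x = 0
i_t = 17; I_t = 3;                    % an arriving stream element (i_t, I_t)
y = y + I_t * S(i_t, :);              % update (5); in practice the row S(i_t,:)
                                      % is re-generated from a pseudo-random seed
```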
We should mention that this streaming model is actually very general. For example, the process of histogram-building can be viewed as a typical instance of turnstile data streams. Histogram-based features are popular in machine learning, databases, computer vision, and NLP (natural language processing) applications. In network applications, monitoring traffic histograms is an important mechanism for, e.g., anomaly detection [10]. Detecting (recovering) heavy components (so-called "elephant detection") using compressed sensing is an active research topic in networks; see, e.g., [30, 22, 28, 29].

For the rest of the paper, we will drop the superscript $(t)$ in $y_j^{(t)}$ and $x_i^{(t)}$, while readers should keep in mind that our results are naturally applicable to data streams.

1.3 The Proposed Algorithm and Main Result

For recovering a nonnegative signal $x_i \geq 0$, $i = 1$ to $N$, we collect linear measurements $y_j = \sum_{i=1}^N x_i s_{ij}$, $j = 1$ to $M$, where $s_{ij} \sim S(\alpha,1,1)$ i.i.d. In this paper, we focus on $\alpha \in (0, 0.5]$ and leave the study of $\alpha > 0.5$ for future work. At the decoding stage, we estimate the signal coordinate-wise:

$$\hat{x}_{i,min} = \min_{1\leq j\leq M} y_j/s_{ij} \quad (6)$$

The number of measurements $M$ is chosen so that $\sum_{i=1}^N \Pr\left(\hat{x}_{i,min} - x_i \geq \epsilon\right) \leq \delta$ (e.g., $\delta = 0.05$).

Main Result: When $\alpha \in (0, 0.5]$, it suffices to use $M = (C_\alpha + o(1))\,\epsilon^{-\alpha}\left(\sum_{i=1}^N x_i^\alpha\right)\log(N/\delta)$ measurements so that, with probability $1-\delta$, all coordinates will be recovered within $\epsilon$ additive precision, in one scan of the coordinates. The constant $C_\alpha = 1$ when $\alpha \to 0$ and $C_\alpha = \pi/2$ when $\alpha = 0.5$. In particular, when $\alpha \to 0$, the required number of measurements is essentially $M = K\log(N/\delta)$, where $K = \sum_{i=1}^N 1\{x_i \neq 0\}$ is the number of nonzero coordinates of the signal.

In the literature, it is known that the sample complexity of compressed sensing using a Gaussian design (i.e., $\alpha = 2$) is essentially about $2K\log(N/\delta)$ [9, 12]. This means our work already achieves a smaller complexity with an explicit constant, while requiring only one linear scan of the coordinates. Very encouragingly, the method presented in this paper is likely merely the tip of the iceberg, and we expect a variety of follow-up works can be developed along this line. For example, it appears possible to further improve the algorithm by introducing iterations. It is also possible to sparsify the design matrix to significantly speed up the processing (matrix-vector multiplication) and recovery.
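Continuing the sketch above, the entire decoding step (6) is a single vectorized minimum over ratio statistics (a sketch with our own variable names):

```matlab
% One-scan decoder (6): for each coordinate i, take the minimum of the
% ratio statistics y_j / s_ij over the M measurements. Entries of
% S(alpha,1,1) with alpha < 1 are strictly positive, so the ratios are
% well defined; xhat always over-estimates x (see Section 3).
xhat = min(bsxfun(@rdivide, y, S), [], 2);   % xhat(i) = min_j y(j)/S(i,j)
```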
2 Preparation: Relevant Probability Results

Our proposed algorithm utilizes only the ratio statistics $y_j/s_{ij}$ for recovery, while the observed data contain more information, i.e., $(y_j, s_{ij})$ for $i = 1, 2, ..., N$ and $j = 1, 2, ..., M$. Thus, we first need to explain why we restrict ourselves to the ratio statistics. For convenience, we define

$$\theta = \left(\sum_{i=1}^N x_i^\alpha\right)^{1/\alpha}, \quad \theta_i = (\theta^\alpha - x_i^\alpha)^{1/\alpha} \quad (7)$$

and denote the probability density function of $s_{ij} \sim S(\alpha,1,1)$ by $f_S$. By a conditional probability argument, the joint density of $(y_j, s_{ij})$ can be shown to be $\frac{1}{\theta_i}f_S(s_{ij})f_S\left(\frac{y_j - x_i s_{ij}}{\theta_i}\right) \propto \frac{1}{\theta_i}f_S\left(\frac{y_j - x_i s_{ij}}{\theta_i}\right)$. The MLE procedure amounts to finding $(x_i, \theta_i)$ to maximize the joint likelihood

$$L(x_i, \theta_i) = \prod_{j=1}^M \frac{1}{\theta_i}f_S\left(\frac{y_j - x_i s_{ij}}{\theta_i}\right) \quad (8)$$

Interestingly, the following lemma shows that $L(x_i, \theta_i)$ approaches infinity at the poles $y_j - x_i s_{ij} = 0$.

Lemma 1 The likelihood in (8) approaches infinity, i.e., $L(x_i, \theta_i) \to +\infty$, if $y_j - x_i s_{ij} \to 0$, for any $j$, $1 \leq j \leq M$.

Proof: See Appendix A.

The result in Lemma 1 suggests that we use only the ratio statistics $y_j/s_{ij}$ to recover $x_i$. By the properties of stable distributions,

$$\frac{y_j}{s_{ij}} = \frac{\sum_{t=1}^N x_t s_{tj}}{s_{ij}} = x_i + \frac{\sum_{t\neq i} x_t s_{tj}}{s_{ij}} = x_i + \theta_i\frac{S_2}{S_1} \quad (9)$$

where $\theta_i = \left(\sum_{t\neq i} x_t^\alpha\right)^{1/\alpha}$ and $S_1, S_2 \sim S(\alpha,1,1)$ i.i.d. This motivates us to study the probability distribution of the ratio of two independent stable random variables, $S_2/S_1$. For convenience, we define

$$F_\alpha(t) = \Pr\left((S_2/S_1)^{\alpha/(1-\alpha)} \leq t\right), \quad t \geq 0 \quad (10)$$

Lemma 2 For any $t \geq 0$ and $S_1, S_2 \sim S(\alpha,1,1)$ i.i.d.,

$$F_\alpha(t) = \Pr\left((S_2/S_1)^{\alpha/(1-\alpha)} \leq t\right) = \frac{1}{\pi^2}\int_0^\pi\int_0^\pi\frac{1}{1 + Q_\alpha/t}\,du_1\,du_2 \quad (11)$$

where

$$Q_\alpha = \left[\frac{\sin(\alpha u_2)}{\sin(\alpha u_1)}\right]^{\alpha/(1-\alpha)}\left[\frac{\sin u_1}{\sin u_2}\right]^{\frac{1}{1-\alpha}}\frac{\sin(u_2 - \alpha u_2)}{\sin(u_1 - \alpha u_1)} \quad (12)$$

In particular, closed-form expressions are available when $\alpha \to 0+$ or $\alpha = 0.5$:

$$\lim_{\alpha\to 0}F_\alpha(t) = \frac{1}{1 + 1/t}, \quad F_{0.5}(t) = \frac{2}{\pi}\tan^{-1}\sqrt{t} \quad (13)$$

Moreover, for any $t \in [0,1]$ and $0 < \alpha_1 \leq \alpha_2 \leq 0.5$, we have

$$\frac{1}{1+1/t} \leq F_{\alpha_1}(t) \leq F_{\alpha_2}(t) \leq \frac{2}{\pi}\tan^{-1}\sqrt{t} \quad (14)$$

Proof: See Appendix B.

Figure 1 plots $F_\alpha(t)$ for selected $\alpha$ values.

[Figure 1: $F_\alpha(t)$ for $t \in [0,1]$ and $\alpha = 0.01, 0.1, 0.2, 0.3, 0.4, 0.5$ (from bottom to top).]

Lemma 2 has proved that, when $\alpha \to 0+$, $F_\alpha(t)$ is of order $t$, and when $\alpha = 0.5$, $F_\alpha(t)$ is of order $\sqrt{t}$. Lemma 3 provides the more general result that $F_\alpha(t) = \Theta\left(t^{1-\alpha}\right)$.

Lemma 3 For $0 \leq t < \alpha^{\alpha/(1-\alpha)}$ and $0 < \alpha \leq 0.5$,

$$F_\alpha(t) = \frac{t^{1-\alpha}}{C_\alpha + o(1)} \quad (15)$$

Proof: See Appendix C.

Remarks for Lemma 3:
• The result restricts $t < \alpha^{\alpha/(1-\alpha)}$. Here $\alpha^{\alpha/(1-\alpha)}$ is monotonically decreasing in $\alpha$, and $0.5 \leq \alpha^{\alpha/(1-\alpha)} \leq 1$ for $\alpha \in (0, 0.5]$. Later we will show that our method only requires very small $t$.
• The constant $C_\alpha$ can be numerically evaluated, as shown in Figure 2.
• When $\alpha \to 0+$, we have $F_{0+}(t) = \frac{1}{1+1/t} = t - t^2 + t^3 - \cdots$. Hence $C_{0+} = 1$.
• When $\alpha = 0.5$, we have $F_{0.5}(t) = \frac{2}{\pi}\tan^{-1}\sqrt{t} = \frac{2}{\pi}\left(t^{1/2} - t^{3/2}/3 + \cdots\right)$. Hence $C_{0.5} = \pi/2$.

[Figure 2: The constant $C_\alpha$ in Lemma 3, plotted against $\alpha \in (0, 0.5]$. Numerically, it varies between 1 and $\pi/2$.]

To conclude this section, the next lemma shows that the maximum likelihood estimator based on the ratio statistics is in fact the "minimum estimator".

Lemma 4 Using the ratio statistics $y_j/s_{ij}$, $j = 1$ to $M$, when $\alpha \in (0, 0.5]$, the maximum likelihood estimator (MLE) of $x_i$ is the sample minimum

$$\hat{x}_{i,min} = \min_{1\leq j\leq M}\frac{y_j}{s_{ij}} \quad (16)$$

Proof: See Appendix D.

Lemma 4 largely explains our proposed algorithm. In the next section, we analyze the error probability of $\hat{x}_{i,min}$ and its sample complexity bound.
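The double integral (11) is straightforward to evaluate numerically. A sketch (the grid size is our own choice) that checks it against the closed form (13) at $\alpha = 0.5$:

```matlab
% Evaluate F_alpha(t) from (11)-(12) by a midpoint rule on (0,pi)^2 and
% compare with the closed form (13); q below is q_alpha(u) from Appendix B.
alpha = 0.5; t = 0.3; n = 2000;
u = pi * ((1:n) - 0.5) / n;
[u1, u2] = meshgrid(u, u);
q = @(v) sin(alpha*v).^(alpha/(1-alpha)) .* sin(v).^(-1/(1-alpha)) ...
         .* sin((1-alpha)*v);
Q = q(u2) ./ q(u1);                          % Q_alpha in (12)
F = mean(mean(1 ./ (1 + Q/t)));              % (1/pi^2) * double integral (11)
fprintf('F = %.4f vs (2/pi)atan(sqrt(t)) = %.4f\n', F, (2/pi)*atan(sqrt(t)));
```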
3 Error Probability, Sample Complexity Bound, and Bias Analysis

The following lemma concerns the tail probability of the estimator $\hat{x}_{i,min}$. Because $\hat{x}_{i,min}$ always over-estimates $x_i$, we only need to provide a one-sided error probability bound.

Lemma 5

$$\Pr\left(\hat{x}_{i,min} - x_i \geq \epsilon\right) = \left[1 - F_\alpha\left((\epsilon/\theta_i)^{\alpha/(1-\alpha)}\right)\right]^M \quad (17)$$

$$\leq \left[\frac{1}{1 + (\epsilon/\theta_i)^{\alpha/(1-\alpha)}}\right]^M \quad (18)$$

For $0 < \alpha \leq 0.5$ and $\epsilon/\theta_i < \alpha$,

$$\Pr\left(\hat{x}_{i,min} - x_i \geq \epsilon\right) = \left[1 - \Theta\left(\epsilon^\alpha/\theta_i^\alpha\right)\right]^M \quad (19)$$

In particular, when $\alpha = 0.5$,

$$\Pr\left(\hat{x}_{i,min} - x_i \geq \epsilon,\ \alpha = 0.5\right) = \left[1 - \frac{2}{\pi}\tan^{-1}\sqrt{\frac{\epsilon}{\theta_i}}\right]^M \quad (20)$$

Proof: Recall $\frac{y_j}{s_{ij}} = x_i + \theta_i\frac{S_2}{S_1}$ and $\hat{x}_{i,min} = \min_{1\leq j\leq M}\frac{y_j}{s_{ij}}$. We have

$$\Pr\left(\hat{x}_{i,min} > x_i + \epsilon\right) = \Pr\left(\frac{y_j}{s_{ij}} > x_i + \epsilon,\ 1\leq j\leq M\right) = \left[\Pr\left(\frac{S_2}{S_1} > \frac{\epsilon}{\theta_i}\right)\right]^M = \left[1 - F_\alpha\left((\epsilon/\theta_i)^{\alpha/(1-\alpha)}\right)\right]^M$$

The rest of the proof follows from Lemma 2 and Lemma 3. □

Remark for Lemma 5: The probability bound (18) is convenient to use. However, it is conservative in that it does not give the right order unless $\alpha$ is small (i.e., when $\alpha/(1-\alpha) \approx \alpha$). In comparison, (19) provides the exact order, which will be useful for analyzing the precise sample complexity of our proposed algorithm. As shown in Lemma 3, $F_\alpha(t) = \Theta(t^{1-\alpha})$ holds for relatively small $t < \alpha^{\alpha/(1-\alpha)}$. In our case, $t = (\epsilon/\theta_i)^{\alpha/(1-\alpha)}$, i.e., the result requires $\epsilon/\theta_i < \alpha$, or $\epsilon^\alpha/\theta_i^\alpha = \epsilon^\alpha/\left(\sum_{l\neq i}x_l^\alpha\right) < \alpha^\alpha$. When $\alpha \to 0$, this means we need $1/K < 1$, which is virtually always true. For larger $\alpha$, the relation $\epsilon^\alpha/\left(\sum_{l\neq i}x_l^\alpha\right) < \alpha^\alpha$ should hold for any reasonable settings.

Theorem 1 To ensure $\sum_{i=1}^N \Pr\left(\hat{x}_{i,min} - x_i \geq \epsilon\right) \leq \delta$, it suffices to choose $M$ by

$$M \geq \frac{\log(N/\delta)}{-\log\left[1 - F_\alpha\left((\epsilon/\theta)^{\alpha/(1-\alpha)}\right)\right]} \quad (21)$$

where $F_\alpha$ is defined in Lemma 2. If $\epsilon/\theta < 1$, then it suffices to use

$$M \geq \frac{\log(N/\delta)}{\log\left[1 + (\epsilon/\theta)^{\alpha/(1-\alpha)}\right]} \quad (22)$$

which is sharp when $\alpha \to 0$. In general, for $\alpha \in (0, 0.5]$ and $\epsilon/\theta < \alpha$, the (sharp) bound can be written as

$$M \geq (C_\alpha + o(1))\left(\frac{\theta}{\epsilon}\right)^\alpha\log(N/\delta) \quad (23)$$

where the constant $C_\alpha$ is the same constant as in Lemma 3. When $\alpha = 0.5$ and $\epsilon/\theta < 1$, a precise bound exists:

$$M \geq \frac{\pi}{2}\sqrt{\frac{\theta}{\epsilon}}\log(N/\delta) \quad (24)$$

Proof: The result (21) follows from Lemma 5, (22) from Lemma 2, and (23) from Lemma 3. We provide more details for the proof of the more precise bound (24). When $\alpha = 0.5$,

$$M \geq \frac{\log(N/\delta)}{-\log\left[1 - \frac{2}{\pi}\tan^{-1}\sqrt{\epsilon/\theta}\right]}$$

which can be simplified to $M \geq \frac{\pi}{2}\sqrt{\theta/\epsilon}\log(N/\delta)$, using the fact that $-\log\left(1 - \frac{2}{\pi}\tan^{-1}(z)\right) \geq \frac{2}{\pi}z$, $\forall z \in [0,1]$. To see this inequality, we can check

$$\frac{\partial}{\partial z}\left[-\log\left(1 - \frac{2}{\pi}\tan^{-1}(z)\right) - \frac{2}{\pi}z\right] = \frac{2/\pi}{\left(1 - \frac{2}{\pi}\tan^{-1}z\right)(1+z^2)} - \frac{2}{\pi}$$

It suffices to show $z^2 - \frac{2}{\pi}\tan^{-1}z - \frac{2}{\pi}z^2\tan^{-1}z \leq 0$, which is true because equality holds when $z = 0$ or $z = 1$, and

$$\frac{\partial^2}{\partial z^2}\left[z^2 - \frac{2}{\pi}\tan^{-1}z - \frac{2}{\pi}z^2\tan^{-1}z\right] = 2 - \frac{2}{\pi}\left(2\tan^{-1}z + \frac{2z}{1+z^2}\right) > 0$$

so the expression is convex on $[0,1]$ and hence nonpositive between its two zeros. This completes the proof. □

Remarks for Theorem 1: The convenient bound (22) is only sharp for $\alpha \to 0$. For example, when $\alpha = 0.5$, $\alpha/(1-\alpha) = 1$, but the true order should be in terms of $\sqrt{\epsilon}$ instead of $\epsilon$. The other bound (23) provides the precise order, where the constant $C_\alpha$ is the same as in Lemma 3. The fact that the complexity is proportional to $\epsilon^{-\alpha}$ is important and presents a substantial improvement over the previous $O(\epsilon^{-1})$ result of the Count-Min sketch [5]. For example, if we let $\alpha \to 0$, then $(\theta/\epsilon)^\alpha \to K$. In other words, the complexity for exact $K$-sparse recovery is essentially $K\log(N/\delta)$, and the constant is basically 1. We will comment more on the choice of $\alpha$ later in the paper.
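To illustrate Theorem 1, here is a small sketch of the sample-size choices used later in Section 4, for an exactly $K$-sparse signal with unit nonzeros (so that $\theta^\alpha = \sum_i x_i^\alpha = K$; the numbers are our own example):

```matlab
% Choosing M per Theorem 1 when theta^alpha = sum_i x_i^alpha = K.
N = 1e6; K = 10; delta = 0.01; eps_ = 0.1;            % eps_ = additive precision
M0  = ceil(K * log(N/delta));                         % alpha -> 0: C_alpha = 1
M05 = ceil((pi/2) * K * eps_^(-0.5) * log(N/delta));  % alpha = 0.5, bound (24)
```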
To conclude this section, we provide the analysis of the bias. The minimum estimator $\hat{x}_{i,min}$ is biased and always over-estimates $x_i$. The following lemma evaluates the bias precisely.

Lemma 6

$$E\left(\hat{x}_{i,min}\right) = x_i + \theta_i D_{M,\alpha} \quad (25)$$

$$D_{M,\alpha} = \int_0^\infty\left[1 - F_\alpha\left(t^{\alpha/(1-\alpha)}\right)\right]^M dt \quad (26)$$

which satisfies $D_{M,\alpha} \leq \frac{1-\alpha}{\alpha}B\left(M - \frac{1-\alpha}{\alpha}, \frac{1-\alpha}{\alpha}\right)$ (see Appendix E). In particular, when $\alpha = 0.5$,

$$D_{M,\alpha=0.5} = M(M-1)\frac{4}{\pi^2}\sum_{j=0}^\infty\frac{(-1)^j\pi^{2j}}{(M+2j-2)(2j)!}B_{2j} - 1 \quad (27)$$

where $B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}dt$ is the Beta function and $B_j$ is the Bernoulli number satisfying

$$\frac{t}{e^t - 1} = \sum_{j=0}^\infty B_j\frac{t^j}{j!} = \sum_{j=0}^\infty B_{2j}\frac{t^{2j}}{(2j)!} - \frac{t}{2}$$

e.g., $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$, $B_4 = -1/30$, $B_6 = 1/42$, $B_8 = -1/30$, $B_{10} = 5/66$, ...

Proof: See Appendix E.

[Figure 3: The constant $D_{M,\alpha=0.5}$ for the bias analysis in Lemma 6, plotted against $M$ (log-log scale).]

4 Experiments

Our proposed algorithm for sparse recovery is simple and requires merely one scan of the coordinates. Our theoretical analysis provides the sharp sample complexity bound with the constant (i.e., $C_\alpha$) specified (e.g., Figure 2). It is nevertheless still interesting to include an experimental study. All experiments presented in this study were conducted in Matlab on a workstation with 256GB of memory. We did not make a special effort to optimize our code for efficiency.

We compare our proposed method with two popular L1 decoding packages, L1Magic [1] and SPGL1 [27], on simulated data. (Footnote: We must specify some parameters in order to achieve sufficient accuracy. For L1Magic, we use the following Matlab script: l1eq_pd(x0, Afun, Atfun, y, 1e-3, 100, 1e-8, 1000); For SPGL1, after consulting the author of [27], we used the following script: opts = spgSetParms('verbosity', 0); opts.optTol = 1e-6; opts.decTol = 1e-6; spg_bp(A, y, opts); However, it appears that for N = 10,000,000 and K = 10 we should probably reduce the tolerances further, which would increase the computational time substantially. Here, we would like to thank the authors of both [1] and [27] for discussions on this issue.) Although it is certainly not our intention to compare the two L1 decoding solvers, we decided to present the results of both. While it is known that SPGL1 can often be faster than L1Magic, we observe that in some cases SPGL1 could not achieve the desired accuracy. On the other hand, SPGL1 uses memory more efficiently and can handle larger problems than L1Magic.

In each simulation, we randomly select $K$ out of $N$ coordinates and set their values ($x_i$) to 1; the other $N - K$ coordinates are set to 0. To simulate the design matrix $S$, we generate two random matrices, $\{u_{ij}\}$ and $\{w_{ij}\}$, $i = 1$ to $N$, $j = 1$ to $M$, where $u_{ij} \sim \mathrm{unif}(0,\pi)$ and $w_{ij} \sim \exp(1)$, i.i.d. We then apply formula (3) to generate $s_{ij} \sim S(\alpha,1,1)$, for $\alpha$ from 0.04 to 0.5, spaced at 0.01. We also use the same $u_{ij}$ and $w_{ij}$ to generate standard Gaussian ($N(0,1)$) variables for the design matrices used by L1Magic and SPGL1, based on the interesting fact that $-\sqrt{2}\cos(u_{ij})\sqrt{w_{ij}} \sim N(0,1)$.

In this experimental setting, since $K = \sum_{i=1}^N x_i^\alpha$, the sample complexity of our algorithm is essentially $M = C_\alpha K\epsilon^{-\alpha}\log(N/\delta)$, where $C_{0+} = 1$ and $C_{0.5} = \pi/2 \approx 1.6$. In our simulations, we choose $M$ by two options: (i) $M = K\log(N/\delta)$; (ii) $M = 1.6K\log(N/\delta)$, where $\delta = 0.01$.

We compare our method with L1Magic and SPGL1 in terms of decoding time and recovery error. The (normalized) recovery error is defined as

$$\mathrm{error} = \sqrt{\frac{\sum_{i=1}^N(x_i - \hat{x}_i)^2}{\sum_{i=1}^N x_i^2}} \quad (28)$$

where $\hat{x}_i$ denotes the estimate of $x_i$.
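A sketch of one simulation run (our own harness, with a much smaller $N$ for illustration); note how the same $(u, w)$ drive both the stable design (3) and the Gaussian design for the L1 solvers:

```matlab
% One simulation: K-sparse signal, stable design via (3), Gaussian design
% via -sqrt(2)*cos(u).*sqrt(w) ~ N(0,1), one-scan decoding (6), error (28).
N = 1e4; K = 10; delta = 0.01; alpha = 0.2;
M = ceil(K * log(N/delta));
x = zeros(N, 1); x(randperm(N, K)) = 1;
u = pi * rand(N, M);  w = -log(rand(N, M));
S = sin(alpha*u) ./ (sin(u) * cos(alpha*pi/2)).^(1/alpha) ...
    .* (sin((1-alpha)*u) ./ w).^((1-alpha)/alpha);
G = -sqrt(2) * cos(u) .* sqrt(w);             % Gaussian design (for L1 solvers)
y = S' * x;
xhat = min(bsxfun(@rdivide, y', S), [], 2);   % decoder (6)
err = sqrt(sum((x - xhat).^2) / sum(x.^2));   % normalized error (28)
```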
4.1 $M = K\log(N/\delta)$

Figure 4 presents the recovery errors (left panel) and the ratios of decoding times (right panel), for $N = 1{,}000{,}000$, $K = 10$, and $M = K\log(N/\delta)$ (where $\delta = 0.01$). The results confirm that our proposed method is computationally very efficient and is capable of producing accurate recovery for $\alpha < 0.38$.

[Figure 4: Experiments comparing our proposed algorithm (labeled "CC") with SPGL1 [27] and L1Magic [1], for $N = 1{,}000{,}000$, $K = 10$, and $M = K\log(N/\delta)$ (where $\delta = 0.01$). For each $\alpha$ (from 0.04 to 0.5, spaced at 0.01), we conduct 100 simulations and report the median results. Left panel: normalized recovery errors; right panel: ratios of decoding times.]

In the left panel, our proposed method (solid curve) produces very accurate recovery results for $\alpha < 0.38$. For larger $\alpha$ values, however, the errors become large. This is expected because when $\alpha = 0.5$ the required number of samples should be $(\pi/2)K\log(N/\delta)$ instead of $K\log(N/\delta)$. In this case, L1Magic also produces accurate recovery results. Note that for all methods, we report the top-$K$ entries of the recovered signal as the estimated nonzero entries. In the right panel, we plot the ratios of decoding times. Roughly, the SPGL1 package uses about 580 times more time than our proposed method (which requires only one scan), and the L1Magic package needs about 290 times more time than our method.

Figure 5 presents the results for a larger problem, with $N = 10{,}000{,}000$ and $K = 10$. Because we cannot run L1Magic in this case (the program simply halts without making progress), we only present the comparisons with SPGL1. Again, our method is computationally very efficient and produces accurate recovery for about $\alpha < 0.38$.

[Figure 5: Experiments comparing our proposed algorithm (CC) with SPGL1, for $N = 10{,}000{,}000$, $K = 10$, and $M = K\log(N/\delta)$ (where $\delta = 0.01$). See the caption of Figure 4 for more details.]

For $\alpha$ close to 0.5, we need to increase the number of measurements, as shown in the theoretical analysis.

4.2 $M = 1.6K\log(N/\delta)$

To study the behavior as $\alpha$ approaches 0.5, we increase the number of measurements to $M = 1.6K\log(N/\delta)$. Figure 6 and Figure 7 present the experimental results for $N = 1{,}000{,}000$ and $N = 10{,}000{,}000$, respectively. Interestingly, when $\alpha = 0.5$, our algorithm still produces accurate recovery results (with normalized errors around 0.007), although the results at smaller $\alpha$ values are even more accurate. In the next subsection (Section 4.3), we will experimentally show that the recovery accuracy can be further improved by the bias-correction procedure derived in Lemma 6.

[Figure 6: Experiments comparing our proposed algorithm (CC) with SPGL1 and L1Magic, for $N = 1{,}000{,}000$, $K = 10$, and $M = 1.6K\log(N/\delta)$ (where $\delta = 0.01$). Again, for each $\alpha$, we conduct 100 simulations and report the median results.]
In the left panel of Figure 6, our proposed method (solid curve) produces accurate recovery results, although the errors increase with increasing $\alpha$ (the maximum error is around 0.007). In the right panel, we can see that in this case our method is only, respectively, 27 and 39 times faster than SPGL1 and L1Magic. We should mention that we did not make a special effort to optimize our Matlab code for efficiency.

[Figure 7: Experiments comparing our proposed algorithm (CC) with SPGL1, for $N = 10{,}000{,}000$, $K = 10$, and $M = 1.6K\log(N/\delta)$ (where $\delta = 0.01$).]

4.3 Bias Correction

As analyzed in Lemma 6, the minimum estimator $\hat{x}_{i,min}$ is slightly biased: $E(\hat{x}_{i,min}) = x_i + \theta_i D_{M,\alpha}$, where the constant $D_{M,\alpha}$ can be pre-computed and tabulated for each $M$ and $\alpha$ (e.g., Figure 3 for $D_{M,\alpha}$ with $\alpha = 0.5$). We also need to estimate $\theta_i$, for which we resort to the estimator in the prior work on Compressed Counting [18]. For example, for $\alpha = 0.5$, the bias-corrected estimator is

$$\hat{x}_{i,min,c} = \hat{x}_{i,min} - \left[\left(1 - \frac{3}{4M}\right)\sqrt{\frac{M}{\sum_{j=1}^M 1/y_j}} - \sqrt{\hat{x}_{i,min}}\right]^2 D_{M,0.5} \quad (29)$$

As verified in Figure 8, the bias-corrected estimator (29) does improve upon the original minimum estimator.

[Figure 8: Bias correction for further improving the minimum estimator of our proposed algorithm at $\alpha = 0.5$. In this experiment, we choose $K = 10$, $M = 1.6K\log(N/\delta)$, and $N = 10^5, 10^6, 10^7$. In each simulation, we use the original minimum estimator $\hat{x}_{i,min}$ together with the bias-corrected estimator $\hat{x}_{i,min,c}$ as in (29). The bias-correction step does improve the accuracy, as the dashed error curve ($\hat{x}_{i,min,c}$) is lower than the solid error curve ($\hat{x}_{i,min}$).]
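A sketch of the correction step (29) (assuming D_M05, i.e., $D_{M,0.5}$, has been pre-computed from the series (27) or tabulated as in Figure 3; y and xhat are from an $\alpha = 0.5$ run of the harness above):

```matlab
% Bias-corrected estimator (29) for alpha = 0.5. The harmonic-sum term
% estimates sqrt(theta), following the estimator of [18]; the squared
% bracket then estimates theta_i = (sqrt(theta) - sqrt(x_i))^2.
theta_sqrt = (1 - 3/(4*M)) * sqrt(M / sum(1 ./ y));    % estimates theta^(1/2)
xhat_c = xhat - (theta_sqrt - sqrt(xhat)).^2 * D_M05;  % correction per (29)
```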
4.4 Robustness against Measurement Noise

Figure 9 presents an experimental study illustrating that our proposed algorithm is robust under the usual measurement noise model:

$$y_j = \sum_{i=1}^N x_i s_{ij} + n_j, \quad \text{where } n_j \sim N(0, \sigma^2),\ j = 1, 2, ..., M, \text{ i.i.d.} \quad (30)$$

where the noise $n_j$ can, for example, come from the transmission channel after the measurements are collected.

[Figure 9: In this experiment, we choose $N = 100{,}000$, $K = 10$, and $M = K\log(N/\delta)$ (with $\delta = 0.01$). We add noise to the measurements, $y_j = \sum_{i=1}^N x_i s_{ij} + n_j$, where $n_j \sim N(0,\sigma^2)$ i.i.d.; in this example, $\sigma^2 = N\sigma_0^2$ with $\sigma_0 = 0.1$. We then run our proposed algorithm (CC, for $\alpha = 0.05$ and $\alpha = 0.2$) and the L1 solvers (L1Magic and SPGL1). In each panel, the solid straight lines stand for the values of the nonzero entries, and the (red) circles are the recovered nonzero coordinates reported by the algorithms.]

Clearly, our proposed algorithm is essentially indifferent to the measurement noise, while the two L1 solvers are not robust against it. It is actually very intuitive to understand why our proposed algorithm is robust against measurement noise. Using the ratio statistics, we have $\frac{y_j}{s_{ij}} = x_i + \theta_i\frac{S_2}{S_1} + \frac{n_j}{S_1}$. Because $S_1$ is very heavy-tailed, the noise term $n_j/S_1$ has essentially no impact. In this paper, we only provide this intuitive explanation and leave a formal analysis for future work.
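A sketch of the noise experiment (30) under the Figure 9 setup (the decoder is applied completely unchanged):

```matlab
% Add channel noise to the measurements per (30) with sigma^2 = N*sigma0^2,
% then decode exactly as before; per the intuition above, n_j/S_1 is
% negligible because S_1 is heavy-tailed.
sigma0 = 0.1;
y_noisy = S' * x + sqrt(N * sigma0^2) * randn(M, 1);
xhat_noisy = min(bsxfun(@rdivide, y_noisy', S), [], 2);
```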
5 Discussion and Future Work

While our proposed algorithm for sparse recovery based on Compressed Counting (maximally-skewed $\alpha$-stable random projections) is simple and fast, it is clear that the work presented in this paper is merely the tip of the iceberg. We expect many interesting related research problems will arise.

One important issue is the choice of $\alpha$. In this paper, our analysis focuses on $\alpha \in (0, 0.5]$, and our theoretical results show that smaller $\alpha$ values lead to better performance. The natural question is: why not simply use a very small $\alpha$? There are numerical issues which prevent us from using a too-small $\alpha$. For convenience, consider the approximate mechanism for generating $S(\alpha,1,1)$ by using $1/U^{1/\alpha}$, where $U \sim \mathrm{unif}(0,1)$ (based on the theory of domains of attraction and the generalized central limit theorem). If $\alpha = 0.04$, then we have to compute $(1/U)^{25}$, which may potentially create numerical problems. In our Matlab simulations, we use $\alpha \in [0.04, 0.5]$ and do not notice obvious numerical problems even with $\alpha = 0.04$, as shown in Section 4. However, if a device (e.g., a camera or other hand-held device) has more limited precision and memory, then we expect we must use a larger $\alpha$. Fortunately, our experiments in Section 4 show that the performance is not too sensitive to $\alpha$. For example, in our experimental setting, the recovery accuracies are very good for $\alpha < 0.38$ even when we choose the sample size $M$ based on $\alpha \to 0$.

Among many potential future research problems, we list a few examples as follows:

• When the signal can have both positive and negative components, we need to use symmetric stable random projections.
• The sample complexity of our algorithm is $O(\epsilon^{-\alpha})$. For small $\alpha$, the value of $\epsilon^{-\alpha}$ is close to 1 even for small $\epsilon$; for example, $0.01^{-0.04} = 1.2$. If a device allows the use of a very small $\alpha$, then we expect some iteration scheme might be able to substantially reduce the required number of measurements.
• In this paper, we focus on dense design matrices. In KDD'07, the work on "very sparse random projections" [16] showed that one can significantly sparsify the design matrix without hurting the performance in estimating summary statistics. We expect it is also possible to use sparsified design matrices in our framework for sparse recovery. However, since recovering summary statistics is in general an easier task than recovering all the coordinates, we expect nontrivial analysis will be needed for, e.g., deciding the level of sparsity without hurting the recovery results.
• Another interesting issue is the coding of the measurements $y_j$, which is a practical concern because storing and transmitting the measurements can be costly. Recently, there is work [20] on coding Gaussian random projections in the context of search and learning. We expect some ideas in [20] might be borrowed for sparse recovery.

6 Conclusion

We develop a new compressed sensing algorithm using Compressed Counting (CC), which is based on maximally-skewed $\alpha$-stable random projections. Our method produces accurate recovery of nonnegative sparse signals, and our procedure is computationally very efficient: the cost is just one linear scan of the coordinates. Our theoretical analysis provides the sharp complexity bound. While our preliminary results are encouraging, we expect many promising future research problems can be pursued in this line of work.

References

[1] Emmanuel Candès and Justin Romberg. l1-magic: Recovery of sparse signals via convex programming. Technical report, California Institute of Technology, 2005.
[2] Emmanuel Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[3] John M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340–344, 1976.
[4] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1998.
[5] Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
[6] David L. Donoho and Xiaoming Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47(7):2845–2862, 2001.
[7] David L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[8] David L. Donoho and Philip B. Stark. Uncertainty principles and signal recovery. SIAM Journal of Applied Mathematics, 49(3):906–931, 1989.
[9] David L. Donoho and Jared Tanner. Counting faces of randomly projected polytopes when the projection radically lowers dimension. Journal of the American Mathematical Society, 22(1), 2009.
[10] Laura Feinstein, Dan Schnackenberg, Ravindra Balupari, and Darrell Kindred. Statistical approaches to DDoS attack detection and response. In DARPA Information Survivability Conference and Exposition, pages 303–314, 2003.
[11] William Feller. An Introduction to Probability Theory and Its Applications (Volume II). John Wiley & Sons, New York, NY, second edition, 1971.
[12] Alyson K. Fletcher and Sundeep Rangan. Orthogonal matching pursuit from noisy measurements: A new analysis. In NIPS, 2009.
[13] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.
[14] Izrail S. Gradshteyn and Iosif M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, seventh edition, 2007.
[15] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of ACM, 53(3):307–323, 2006.
[16] Ping Li. Very sparse stable random projections for dimension reduction in $l_\alpha$ ($0 < \alpha \leq 2$) norm. In KDD, San Jose, CA, 2007.
[17] Ping Li. Estimators and tail bounds for dimension reduction in $l_\alpha$ ($0 < \alpha \leq 2$) using stable random projections. In SODA, pages 10–19, San Francisco, CA, 2008.
[18] Ping Li. Improving compressed counting. In UAI, Montreal, Canada, 2009.
[19] Ping Li. Compressed counting. In SODA, New York, NY, 2009 (arXiv:0802.0802, arXiv:0808.1766).
[20] Ping Li, Michael Mitzenmacher, and Anshumali Shrivastava. Coding for random projections. Technical report, arXiv:1308.2218, 2013.
[21] Ping Li and Cun-Hui Zhang. A new algorithm for compressed counting with applications in Shannon entropy estimation in dynamic data. In COLT, 2011.
[22] Tsung-Han Lin and H. T. Kung. Compressive sensing medium access control for wireless LANs. In Globecom, 2012.
[23] S. G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
[24] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1:117–236, 2005.
[25] Noam Nisan. Pseudorandom generators for space-bounded computations. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC, pages 204–212, 1990.
[26] J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
[27] Ewout van den Berg and Michael P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, 2008.
[28] Jun Wang, Haitham Hassanieh, Dina Katabi, and Piotr Indyk. Efficient and reliable low-power backscatter networks. In SIGCOMM, pages 61–72, Helsinki, Finland, 2012.
[29] Meng Wang, Weiyu Xu, Enrique Mallada, and Ao Tang. Sparse recovery with graph constraints: Fundamental limits and measurement construction. In Infocom, 2012.
[30] Haiquan (Chuck) Zhao, Ashwin Lall, Mitsunori Ogihara, Oliver Spatscheck, Jia Wang, and Jun Xu. A data streaming algorithm for estimating entropies of OD flows. In IMC, San Diego, CA, 2007.
[31] Vladimir M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, Providence, RI, 1986.

A Proof of Lemma 1

For $S \sim S(\alpha,1,1)$, the sampling approach in (3) provides a method to compute its CDF:

$$F_S(s) = \Pr\left(\frac{\sin(\alpha u)}{[\sin u\,\cos(\alpha\pi/2)]^{1/\alpha}}\left[\frac{\sin(u-\alpha u)}{w}\right]^{\frac{1-\alpha}{\alpha}} \leq s\right) = \Pr\left(\frac{[\sin(\alpha u)]^{\alpha/(1-\alpha)}}{[\sin u\,\cos(\alpha\pi/2)]^{\frac{1}{1-\alpha}}}\frac{\sin(u-\alpha u)}{w} \leq s^{\alpha/(1-\alpha)}\right)$$

$$= \frac{1}{\pi}\int_0^\pi\exp\left\{-\frac{[\sin(\alpha u)]^{\alpha/(1-\alpha)}}{[\sin u\,\cos(\alpha\pi/2)]^{\frac{1}{1-\alpha}}}\sin(u-\alpha u)\,s^{-\alpha/(1-\alpha)}\right\}du = \frac{1}{\pi}\int_0^\pi\exp\left\{-q_\alpha(u)\,s^{-\alpha/(1-\alpha)}\right\}du$$

and the PDF

$$f_S(s) = \frac{1}{\pi}\int_0^\pi\exp\left\{-q_\alpha(u)\,s^{-\alpha/(1-\alpha)}\right\}q_\alpha(u)\,\frac{\alpha}{1-\alpha}\,s^{-\frac{\alpha}{1-\alpha}-1}\,du$$

Hence,

$$\frac{1}{\theta_i}f_S\left(\frac{y_j - x_i s_{ij}}{\theta_i}\right) = \frac{\alpha/(1-\alpha)}{\pi}\int_0^\pi q_\alpha(u)\exp\left\{-q_\alpha(u)\left(\frac{\theta_i}{y_j - x_i s_{ij}}\right)^{\frac{\alpha}{1-\alpha}}\right\}\left(\frac{\theta_i}{y_j - x_i s_{ij}}\right)^{\frac{\alpha}{1-\alpha}}\frac{1}{y_j - x_i s_{ij}}\,du$$

Therefore, the likelihood $L(x_i, \theta_i) \to +\infty$ if $y_j - x_i s_{ij} \to 0$, provided $\theta_i/(y_j - x_i s_{ij}) \to \mathrm{const}$. Note that here we can choose $\theta_i$ and $x_i$ to maximize the likelihood.
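As a quick sanity check of the CDF expression above, one can compare the single integral against the empirical CDF of samples generated by (3). A sketch (the sample size and the test point s are our own choices):

```matlab
% Check F_S(s) = (1/pi) * int_0^pi exp{-q_alpha(u) s^(-alpha/(1-alpha))} du
% against the empirical CDF of samples drawn via formula (3).
alpha = 0.3; s = 2.0; n = 1e6;
u = pi * rand(n, 1); w = -log(rand(n, 1));
samp = sin(alpha*u) ./ (sin(u) * cos(alpha*pi/2)).^(1/alpha) ...
       .* (sin((1-alpha)*u) ./ w).^((1-alpha)/alpha);
q = @(v) sin(alpha*v).^(alpha/(1-alpha)) ...
         .* (sin(v) * cos(alpha*pi/2)).^(-1/(1-alpha)) .* sin((1-alpha)*v);
FS = integral(@(v) exp(-q(v) * s^(-alpha/(1-alpha))), 0, pi) / pi;
fprintf('F_S(%g): integral %.4f, empirical %.4f\n', s, FS, mean(samp <= s));
```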
B Proof of Lemma 2

Since $S_1, S_2 \sim S(\alpha,1,1)$ i.i.d., we know that

$$S_1 = \frac{\sin(\alpha u_1)}{[\sin u_1\cos(\alpha\pi/2)]^{1/\alpha}}\left[\frac{\sin(u_1-\alpha u_1)}{w_1}\right]^{\frac{1-\alpha}{\alpha}}, \quad S_2 = \frac{\sin(\alpha u_2)}{[\sin u_2\cos(\alpha\pi/2)]^{1/\alpha}}\left[\frac{\sin(u_2-\alpha u_2)}{w_2}\right]^{\frac{1-\alpha}{\alpha}}$$

where $u_1, u_2 \sim \mathrm{unif}(0,\pi)$ and $w_1, w_2 \sim \exp(1)$, all independent. Thus, we can write

$$(S_2/S_1)^{\alpha/(1-\alpha)} = Q_\alpha\frac{w_1}{w_2}, \quad Q_\alpha = \left[\frac{\sin(\alpha u_2)}{\sin(\alpha u_1)}\right]^{\alpha/(1-\alpha)}\left[\frac{\sin u_1}{\sin u_2}\right]^{\frac{1}{1-\alpha}}\frac{\sin(u_2-\alpha u_2)}{\sin(u_1-\alpha u_1)}$$

Using properties of exponential distributions, for any $t \geq 0$,

$$F_\alpha(t) = \Pr\left((S_2/S_1)^{\alpha/(1-\alpha)} \leq t\right) = \Pr(Q_\alpha w_1/w_2 \leq t) = E\left[\frac{1}{1 + Q_\alpha/t}\right] = \frac{1}{\pi^2}\int_0^\pi\int_0^\pi\frac{1}{1 + Q_\alpha/t}\,du_1\,du_2$$

When $\alpha \to 0+$, $Q_\alpha \to 1$ pointwise. By dominated convergence, $F_{0+}(t) = \frac{1}{1+1/t}$. When $\alpha = 0.5$, $Q_\alpha$ simplifies to

$$Q_{0.5} = \frac{\sin(u_2/2)}{\sin(u_1/2)}\left[\frac{\sin u_1}{\sin u_2}\right]^2\frac{\sin(u_2/2)}{\sin(u_1/2)} = \frac{\cos^2(u_1/2)}{\cos^2(u_2/2)}$$

which can be used to obtain the closed-form expression for $F_{0.5}(t)$:

$$F_{0.5}(t) = \frac{1}{\pi^2}\int_0^\pi\int_0^\pi\frac{du_1\,du_2}{1 + \frac{\cos^2(u_1/2)}{t\cos^2(u_2/2)}} = \frac{4}{\pi^2}\int_0^{\pi/2}\int_0^{\pi/2}\frac{du_1\,du_2}{1 + b\cos^2 u_1}, \quad b = \frac{1}{t\cos^2 u_2}$$

$$= \frac{4}{\pi^2}\int_0^{\pi/2}\left[\frac{1}{\sqrt{1+b}}\tan^{-1}\left(\frac{\sqrt{1+b}\,\sin u_1}{\cos u_1}\right)\right]_0^{\pi/2}du_2 = \frac{2}{\pi}\int_0^{\pi/2}\frac{du_2}{\sqrt{1 + \frac{1}{t}\sec^2 u_2}}$$

$$= \frac{2}{\pi}\int_0^1\frac{dz}{\sqrt{1 + \frac{1}{t} - z^2}} = \frac{2}{\pi}\int_0^{1/\sqrt{1+1/t}}\frac{dz}{\sqrt{1-z^2}} = \frac{2}{\pi}\sin^{-1}\left(1/\sqrt{1+1/t}\right) = \frac{2}{\pi}\tan^{-1}\sqrt{t}$$

To show $F_\alpha(t) \geq 1/(1+1/t)$ for any $t \in [0,1]$, we first note that equality holds when $t = 0$ and $t = 1$. To see the latter case, write $Q_\alpha = q_2/q_1$, where $q_1$ and $q_2$ are i.i.d. When $t = 1$, $F_\alpha(1) = E\left(1/(1 + q_2/q_1)\right) = E\left[\frac{q_1}{q_1+q_2}\right] = \frac{1}{2}$ by symmetry. It remains to show that $F_\alpha(t)$ is monotonically increasing in $\alpha$ for fixed $t \in [0,1]$. For convenience, we define $q_\alpha(u)$ and $g_\alpha(u)$, where $Q_\alpha = q_\alpha(u_2)/q_\alpha(u_1)$:

$$q_\alpha(u) = [\sin(\alpha u)]^{\alpha/(1-\alpha)}[\sin u]^{-\frac{1}{1-\alpha}}\sin(u-\alpha u)$$

$$g_\alpha(u) = \frac{\partial\log q_\alpha(u)}{\partial\alpha} = \frac{\cos\alpha u}{\sin\alpha u}\frac{\alpha u}{1-\alpha} + \frac{1}{(1-\alpha)^2}\log\sin\alpha u - \frac{1}{(1-\alpha)^2}\log\sin u - \frac{u\cos(u-\alpha u)}{\sin(u-\alpha u)}$$

We can check that both $q_\alpha(u)$ and $g_\alpha(u)$ are monotonically increasing in $u \in [0,\pi]$. First,

$$\frac{\partial g_\alpha(u)}{\partial u} = \left\{\frac{(1-\alpha)u}{\sin^2(u-\alpha u)} - \frac{\alpha}{\sin^2\alpha u}\frac{\alpha u}{1-\alpha}\right\} + \left\{\frac{\cos\alpha u}{\sin\alpha u}\frac{\alpha}{1-\alpha} - \frac{\cos(u-\alpha u)}{\sin(u-\alpha u)}\right\} + \left\{\frac{\alpha}{(1-\alpha)^2}\frac{\cos\alpha u}{\sin\alpha u} - \frac{1}{(1-\alpha)^2}\frac{\cos u}{\sin u}\right\}$$

We consider the three terms (in curly brackets) separately and show that they are all $\geq 0$ when $\alpha \in [0, 0.5]$. For the first term,

$$\frac{(1-\alpha)u}{\sin^2(u-\alpha u)} - \frac{\alpha}{\sin^2\alpha u}\frac{\alpha u}{1-\alpha} \geq 0 \iff \frac{1-\alpha}{\sin((1-\alpha)u)} \geq \frac{\alpha}{\sin\alpha u} \iff (1-\alpha)\sin\alpha u - \alpha\sin((1-\alpha)u) \geq 0$$

where the last inequality holds because the expression vanishes at $u = 0$ and its derivative (w.r.t. $u$) is $(1-\alpha)\alpha\cos\alpha u - \alpha(1-\alpha)\cos((1-\alpha)u) \geq 0$. For the second term (whose numerator vanishes at $u = 0$), it suffices to show

$$\frac{\partial}{\partial u}\left\{\alpha\cos\alpha u\sin(u-\alpha u) - (1-\alpha)\sin\alpha u\cos(u-\alpha u)\right\} \geq 0 \iff \left[(1-\alpha)^2 - \alpha^2\right]\sin\alpha u\sin(u-\alpha u) \geq 0$$

For the third term, it suffices to show

$$\alpha\sin u\cos\alpha u - \cos u\sin\alpha u \geq 0 \iff \alpha\sin(u-\alpha u) - (1-\alpha)\cos u\sin\alpha u \geq 0$$

Thus, we have proved the monotonicity of $g_\alpha(u)$ in $u \in [0,\pi]$ when $\alpha \in [0, 0.5]$. To prove the monotonicity of $q_\alpha(u)$ in $u$, it suffices to check that its logarithm is monotonic, i.e.,
$$\frac{\partial}{\partial u}\log q_\alpha(u) = \frac{1}{1-\alpha}\left[\alpha^2\frac{\cos\alpha u}{\sin\alpha u} + (1-\alpha)^2\frac{\cos(u-\alpha u)}{\sin(u-\alpha u)} - \frac{\cos u}{\sin u}\right] \geq 0$$

for which it suffices to show

$$\alpha^2\cos\alpha u\sin(u-\alpha u)\sin u + (1-\alpha)^2\cos(u-\alpha u)\sin\alpha u\sin u - \cos u\sin\alpha u\sin(u-\alpha u) \geq 0$$
$$\iff \alpha^2\sin^2(u-\alpha u) + (1-\alpha)^2\sin^2\alpha u - 2\alpha(1-\alpha)\cos u\sin\alpha u\sin(u-\alpha u) \geq 0$$
$$\iff \left(\alpha\sin(u-\alpha u) - (1-\alpha)\sin\alpha u\right)^2 + 2\alpha(1-\alpha)(1-\cos u)\sin\alpha u\sin(u-\alpha u) \geq 0$$

At this point, we have proved that both $q_\alpha(u)$ and $g_\alpha(u)$ are monotonically increasing in $u \in [0,\pi]$, at least for $\alpha \in [0, 0.5]$. Next, writing $q_k = q_\alpha(u_k)$ and $g_k = g_\alpha(u_k)$,

$$\frac{\partial F_\alpha(t)}{\partial\alpha} = E\left[\frac{-\frac{1}{t}\,Q_\alpha\,(g_2 - g_1)}{(1 + Q_\alpha/t)^2}\right] = \frac{1}{t}E\left[\frac{q_1 q_2\,(g_1 - g_2)}{(q_1 + q_2/t)^2}\right]$$

By symmetry (swapping $u_1$ and $u_2$),

$$\frac{\partial F_\alpha(t)}{\partial\alpha} = \frac{1}{t}E\left[\frac{q_1 q_2\,(g_2 - g_1)}{(q_2 + q_1/t)^2}\right]$$

Thus, to show $\frac{\partial F_\alpha(t)}{\partial\alpha} \geq 0$, it suffices to show

$$E\left[\frac{q_1 q_2(g_1 - g_2)}{(q_1 + q_2/t)^2}\right] + E\left[\frac{q_1 q_2(g_2 - g_1)}{(q_2 + q_1/t)^2}\right] \geq 0 \iff E\left[\frac{q_1 q_2(g_1 - g_2)(q_1^2 - q_2^2)(1/t^2 - 1)}{(q_1 + q_2/t)^2(q_2 + q_1/t)^2}\right] \geq 0$$

which holds because $1/t^2 - 1 \geq 0$ for $t \in (0,1]$ and $(g_1 - g_2)(q_1 - q_2) \geq 0$, as both $g_\alpha(u)$ and $q_\alpha(u)$ are monotonically increasing functions of $u \in [0,\pi]$. This completes the proof. □

C Proof of Lemma 3

The goal is to show that $F_\alpha(t) = \Theta\left(t^{1-\alpha}\right)$. By definition,

$$F_\alpha(t) = E\left[\frac{1}{1 + Q_\alpha/t}\right] = E\left[\frac{1}{1 + \frac{1}{t}\frac{q_\alpha(u_2)}{q_\alpha(u_1)}}\right], \quad q_\alpha(u) = [\sin(\alpha u)]^{\alpha/(1-\alpha)}\left[\frac{1}{\sin u}\right]^{\frac{1}{1-\alpha}}\sin(u-\alpha u)$$

We can split the integral into four pieces:

$$F_\alpha(t) = \frac{1}{\pi^2}\int_0^{\pi/2}\!\!\int_0^{\pi/2}\frac{du_1 du_2}{1 + t^{-1}q_\alpha(u_2)/q_\alpha(u_1)} + \frac{1}{\pi^2}\int_0^{\pi/2}\!\!\int_0^{\pi/2}\frac{du_1 du_2}{1 + t^{-1}q'_\alpha(u_2)/q_\alpha(u_1)}$$
$$+ \frac{1}{\pi^2}\int_0^{\pi/2}\!\!\int_0^{\pi/2}\frac{du_1 du_2}{1 + t^{-1}q_\alpha(u_2)/q'_\alpha(u_1)} + \frac{1}{\pi^2}\int_0^{\pi/2}\!\!\int_0^{\pi/2}\frac{du_1 du_2}{1 + t^{-1}q'_\alpha(u_2)/q'_\alpha(u_1)}$$

where

$$q'_\alpha(u) = [\sin(\alpha(\pi-u))]^{\alpha/(1-\alpha)}\left[\frac{1}{\sin(\pi-u)}\right]^{\frac{1}{1-\alpha}}\sin(\pi-u-\alpha(\pi-u)) = [\sin(\alpha(\pi-u))]^{\alpha/(1-\alpha)}\left[\frac{1}{\sin u}\right]^{\frac{1}{1-\alpha}}\sin(u+\alpha(\pi-u))$$

First, using the fact that $\alpha\sin u \leq \sin(\alpha u) \leq \alpha u$, we obtain

$$q_\alpha(u) \geq [\alpha\sin u]^{\alpha/(1-\alpha)}\left[\frac{1}{\sin u}\right]^{\frac{1}{1-\alpha}}(1-\alpha)\sin u = \alpha^{\alpha/(1-\alpha)}(1-\alpha)$$

We have proved in the proof of Lemma 2 that $q_\alpha(u)$ is a monotonically increasing function of $u \in [0,\pi]$. Since $q_\alpha(\pi/2) = [\sin(\alpha\pi/2)]^{\alpha/(1-\alpha)}\cos(\alpha\pi/2)$, we have

$$1/4 \leq \alpha^{\alpha/(1-\alpha)}(1-\alpha) \leq q_\alpha(u) \leq [\sin(\alpha\pi/2)]^{\alpha/(1-\alpha)}\cos(\alpha\pi/2) \leq 1, \quad u \in [0, \pi/2]$$

In other words, we can view $q_\alpha(u)$ as a constant (i.e., $q_\alpha(u) \asymp 1$) when $u \in [0,\pi/2]$. On the other hand, note that $q'_\alpha(u) \to \infty$ as $u \to 0$. Moreover, when $u \in [0,\pi/2]$, we have $\alpha u \leq \pi - u$ and $u - \alpha u \leq u + \alpha(\pi-u)$. Thus, $q'_\alpha(u)$ dominates $q_\alpha(u)$.
Therefore, the order of $F_\alpha(t)$ is determined by one term:

$$F_\alpha(t) \asymp \int_0^{\pi/2}\!\!\int_0^{\pi/2}\frac{du_1 du_2}{1 + t^{-1}q_\alpha(u_2)/q'_\alpha(u_1)} \asymp \int_0^{\pi/2}\frac{du}{1 + t^{-1}/q'_\alpha(u)}$$

Since

$$q'_\alpha(u) \asymp \alpha^{\alpha/(1-\alpha)}\frac{\max\{u,\alpha\}}{u^{1/(1-\alpha)}} \asymp \max\left\{u^{-\alpha/(1-\alpha)},\ \alpha u^{-1/(1-\alpha)}\right\}$$

we have, for $\alpha \in [0, 1/2]$,

$$F_\alpha(t) \asymp \int_0^\alpha\frac{du}{1 + t^{-1}/q'_\alpha(u)} + \int_\alpha^{\pi/2}\frac{du}{1 + t^{-1}/q'_\alpha(u)} \asymp \int_0^\alpha\frac{du}{1 + (\alpha t)^{-1}u^{1/(1-\alpha)}} + \int_\alpha^{\pi/2}\frac{du}{1 + t^{-1}u^{\alpha/(1-\alpha)}}$$

Consider $t < \alpha^{\alpha/(1-\alpha)}$. Because $t^{-1}u^{\alpha/(1-\alpha)} > (u/\alpha)^{\alpha/(1-\alpha)} \geq 1$ for $u \geq \alpha$, we have

$$\int_\alpha^{\pi/2}\frac{du}{1 + t^{-1}u^{\alpha/(1-\alpha)}} \asymp \int_\alpha^{\pi/2}\frac{du}{t^{-1}u^{\alpha/(1-\alpha)}} = \frac{t(1-\alpha)}{1-2\alpha}\,u^{(1-2\alpha)/(1-\alpha)}\Big|_\alpha^{\pi/2} \asymp t$$

uniformly for $\alpha < 1/2$. When $\alpha = 1/2$ (i.e., $t < 1/2$), we also have

$$\int_\alpha^{\pi/2}\frac{du}{1 + t^{-1}u^{\alpha/(1-\alpha)}} = \int_{1/2}^{\pi/2}\frac{du}{1 + t^{-1}u} = t\log(u+t)\Big|_{1/2}^{\pi/2} \asymp t$$

For the other term, with $u \in [0,\alpha]$ (note that $(\alpha t)^{1-\alpha} \leq \alpha$ since $t < \alpha^{\alpha/(1-\alpha)}$), we have

$$\int_0^\alpha\frac{du}{1 + (\alpha t)^{-1}u^{1/(1-\alpha)}} = \int_0^{(\alpha t)^{1-\alpha}}\frac{du}{1 + (\alpha t)^{-1}u^{1/(1-\alpha)}} + \int_{(\alpha t)^{1-\alpha}}^\alpha\frac{du}{1 + (\alpha t)^{-1}u^{1/(1-\alpha)}}$$
$$\asymp (\alpha t)^{1-\alpha} + \alpha t\int_{(\alpha t)^{1-\alpha}}^\alpha u^{-1/(1-\alpha)}\,du = (\alpha t)^{1-\alpha} + (1-\alpha)\left[t^{1-\alpha}\alpha^{-\alpha} - t\,\alpha^{-\alpha/(1-\alpha)}\right] \asymp t^{1-\alpha}\alpha^{-\alpha}$$

Combining the results, we obtain

$$F_\alpha(t) \asymp t + (\alpha t)^{1-\alpha} + t^{1-\alpha}\alpha^{-\alpha} \asymp t^{1-\alpha}$$

since $t \leq t^{1-\alpha}$ and $(\alpha t)^{1-\alpha} = \alpha^{1-\alpha}t^{1-\alpha} \leq t^{1-\alpha}$. This completes the proof. □

D Proof of Lemma 4

Define $F_Z(t) = \Pr\left(\frac{y_j}{s_{ij}} \leq t\right)$ and $f_Z(t) = F'_Z(t)$. To find the MLE of $x_i$, we need to maximize $\prod_{j=1}^M f_Z(z_{i,j})$, where $z_{i,j} = y_j/s_{ij}$. Using the result in Lemma 2, for $S_1, S_2 \sim S(\alpha,1,1)$ and writing $\beta = \alpha/(1-\alpha)$, we have

$$F_Z(t) = \Pr\left(\frac{y_j}{s_{ij}} \leq t\right) = \Pr\left(\frac{S_2}{S_1} \leq \frac{t-x_i}{\theta_i}\right) = E\left[\frac{1}{1 + \left(\frac{\theta_i}{t-x_i}\right)^{\beta}Q_\alpha}\right]$$

$$f_Z(t) = E\left[\frac{\theta_i^{\beta}Q_\alpha\,\beta\,(t-x_i)^{-\frac{1}{1-\alpha}}}{\left[1 + \left(\frac{\theta_i}{t-x_i}\right)^{\beta}Q_\alpha\right]^2}\right], \qquad f'_Z(t) = E\left[\frac{A}{\left[1 + \left(\frac{\theta_i}{t-x_i}\right)^{\beta}Q_\alpha\right]^4}\right]$$

where $Q_\alpha$ is defined in Lemma 2 and

$$A = \left(1 + \left(\tfrac{\theta_i}{t-x_i}\right)^{\beta}Q_\alpha\right)\theta_i^{\beta}Q_\alpha\,\frac{\alpha}{(1-\alpha)^2}\,(t-x_i)^{-\frac{1}{1-\alpha}-1}\times\left(-1 - (1-2\alpha)\left(\tfrac{\theta_i}{t-x_i}\right)^{\beta}Q_\alpha\right)$$

Hence $A \leq 0$ if $\alpha \leq 0.5$. This means $f_Z(t) \to \infty$ as $t \to x_i$, and $f_Z(t)$ is nonincreasing in $t \geq x_i$ if $\alpha \leq 0.5$. Therefore, given $M$ observations $z_{i,j} = y_j/s_{ij}$, the MLE is the sample minimum. This completes the proof. □
E Proof of Lemma 6

$$E\left(\hat{x}_{i,min}\right) = x_i + \int_{x_i}^\infty\Pr\left(\hat{x}_{i,min} > t\right)dt = x_i + \int_{x_i}^\infty\left[1 - F_\alpha\left(\left(\frac{t-x_i}{\theta_i}\right)^{\frac{\alpha}{1-\alpha}}\right)\right]^M dt = x_i + \theta_i\int_0^\infty\left[1 - F_\alpha\left(t^{\frac{\alpha}{1-\alpha}}\right)\right]^M dt = x_i + \theta_i D_{M,\alpha}$$

We have proved in Lemma 2 that

$$\frac{1}{1+1/t} = F_{0+}(t) \leq F_\alpha(t) \leq F_{0.5}(t) = \frac{2}{\pi}\tan^{-1}\sqrt{t}$$

Thus, substituting $v = 1/\left(1 + t^{\alpha/(1-\alpha)}\right)$,

$$D_{M,\alpha} = \int_0^\infty\left[1 - F_\alpha\left(t^{\frac{\alpha}{1-\alpha}}\right)\right]^M dt \leq \int_0^\infty\left[\frac{1}{1 + t^{\alpha/(1-\alpha)}}\right]^M dt = \frac{1-\alpha}{\alpha}\int_0^1 v^M(1/v-1)^{\frac{1-\alpha}{\alpha}-1}\frac{1}{v^2}\,dv$$
$$= \frac{1-\alpha}{\alpha}\int_0^1 v^{M-\frac{1-\alpha}{\alpha}-1}(1-v)^{\frac{1-\alpha}{\alpha}-1}\,dv = \frac{1-\alpha}{\alpha}B\left(M - \frac{1-\alpha}{\alpha},\ \frac{1-\alpha}{\alpha}\right)$$

When $\alpha = 0.5$, $\alpha/(1-\alpha) = 1$, and

$$D_{M,\alpha=0.5} = \int_0^\infty\left[1 - F_{0.5}(t)\right]^M dt = \int_0^\infty\left[1 - \frac{2}{\pi}\tan^{-1}\sqrt{t}\right]^M dt = \int_0^{\pi/2}\left(1 - \frac{2u}{\pi}\right)^M d\left(\tan^2 u\right)$$
$$= \int_0^{\pi/2}\left(1 - \frac{2u}{\pi}\right)^M d\left(\frac{1}{\cos^2 u}\right) = \int_1^0 u^M\,d\left(\frac{1}{\sin^2(u\pi/2)}\right) = M\int_0^1\frac{u^{M-1}}{\sin^2(u\pi/2)}\,du - 1 = M\left(\frac{2}{\pi}\right)^M\int_0^{\pi/2}\frac{u^{M-1}}{\sin^2 u}\,du - 1$$

From the integral table [14, 2.643.7], we have

$$\int\frac{u^n}{\sin^2 u}\,du = -u^n\frac{\cos u}{\sin u} + \frac{n}{n-1}u^{n-1} + n\sum_{j=1}^\infty\frac{(-1)^j 2^{2j}u^{n+2j-1}}{(n+2j-1)(2j)!}B_{2j}$$

Therefore, to facilitate numerical calculations, we resort to (letting $n = M-1$)

$$\int_0^{\pi/2}\frac{u^{M-1}}{\sin^2 u}\,du = \frac{M-1}{M-2}(\pi/2)^{M-2} + (M-1)\sum_{j=1}^\infty\frac{(-1)^j 2^{2j}(\pi/2)^{M+2j-2}}{(M+2j-2)(2j)!}B_{2j}$$
$$= \left(\frac{\pi}{2}\right)^M\left[\frac{M-1}{M-2}(\pi/2)^{-2} + (M-1)\sum_{j=1}^\infty\frac{(-1)^j 2^{2j}(\pi/2)^{2j-2}}{(M+2j-2)(2j)!}B_{2j}\right]$$

where $B_j$ is the Bernoulli number satisfying

$$\frac{t}{e^t - 1} = \sum_{j=0}^\infty B_j\frac{t^j}{j!} = \sum_{j=0}^\infty B_{2j}\frac{t^{2j}}{(2j)!} - \frac{t}{2}$$

and $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$, $B_4 = -1/30$, $B_6 = 1/42$, $B_8 = -1/30$, $B_{10} = 5/66$, ... Hence,

$$D_{M,\alpha=0.5} = M\left[\frac{M-1}{M-2}(\pi/2)^{-2} + (M-1)\sum_{j=1}^\infty\frac{(-1)^j 2^{2j}(\pi/2)^{2j-2}}{(M+2j-2)(2j)!}B_{2j}\right] - 1$$
$$= M(M-1)\sum_{j=0}^\infty\frac{(-1)^j 2^{2j}(\pi/2)^{2j-2}}{(M+2j-2)(2j)!}B_{2j} - 1 = M(M-1)\frac{4}{\pi^2}\sum_{j=0}^\infty\frac{(-1)^j\pi^{2j}}{(M+2j-2)(2j)!}B_{2j} - 1$$

This completes the proof. □
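To conclude, a numerical cross-check of the series (27) against direct quadrature of (26) (the truncation length and the Bernoulli-number recurrence are our own choices):

```matlab
% D_{M,0.5}: series (27) vs quadrature of (26) at alpha = 0.5, where
% F_0.5(t^1) = (2/pi) atan(sqrt(t)). Bernoulli numbers via the standard
% recurrence sum_{k=0}^{m-1} C(m+1,k) B_k = -(m+1) B_m, with B_0 = 1.
M = 50; J = 25;
B = zeros(1, 2*J + 1); B(1) = 1;     % B(k+1) holds the Bernoulli number B_k
for m = 1:2*J
    s = 0;
    for k = 0:m-1, s = s + nchoosek(m+1, k) * B(k+1); end
    B(m+1) = -s / (m+1);
end
j = 0:J; B2j = B(2*j + 1);
D_series = M*(M-1)*(4/pi^2) * sum((-1).^j .* pi.^(2*j) .* B2j ...
             ./ ((M + 2*j - 2) .* factorial(2*j))) - 1;
D_quad = integral(@(t) (1 - (2/pi)*atan(sqrt(t))).^M, 0, Inf);
fprintf('D_{M,0.5}: series %.4e, quadrature %.4e\n', D_series, D_quad);
```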