Analysis of Basis Pursuit Via Capacity Sets
Authors: Joseph Shtok, Michael Elad
The Journal of Fourier Analysis and Applications

Analysis of Basis Pursuit Via Capacity Sets

Joseph Shtok and Michael Elad
The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel. Email: [shtok,elad]@cs.technion.ac.il.

ABSTRACT. Finding the sparsest solution $\alpha$ for an under-determined linear system of equations $D\alpha = s$ is of interest in many applications. This problem is known to be NP-hard. Recent work studied conditions on the support size of $\alpha$ that allow its recovery using $\ell_1$-minimization, via the Basis Pursuit algorithm. These conditions often rely on a scalar property of $D$ called the mutual-coherence. In this work we introduce an alternative set of features of an arbitrarily given $D$, called the capacity sets. We show how these can be used to analyze the performance of Basis Pursuit, leading to improved bounds and predictions of performance. Both theoretical and numerical methods are presented, all using the capacity values, and shown to lead to improved assessments of Basis Pursuit's success in finding the sparsest solution of $D\alpha = s$.

Math Subject Classifications: 68P30, 68W25.
Keywords and Phrases: sparse representations, $\ell_1$-reconstruction, Basis Pursuit, random support, capacity sets.

© 2004 Birkhäuser Boston. All rights reserved. ISSN 1069-5869. DOI: jfaatest-05/28/04.

1. Introduction

A powerful trend in signal processing that has evolved in recent years is the use of redundant dictionaries, rather than just bases, for sparse representation of signals (images, sound tracks, and more). In such a setting, we consider a linear equation $s = D\alpha$, where $s$ is a given signal, $D$ is the representation dictionary, and $\alpha$ is the signal's representation.
The matrix $D$ is a general full-rank $N \times L$ matrix, where $L > N$, assumed to have $\ell_2$-normalized columns. The number of non-zero elements in the coefficient vector $\alpha$ is measured by the $\ell_0$-norm, $\|\cdot\|_0$, on $\mathbb{R}^L$. The goal is to find, within the $(L-N)$-dimensional affine space of solutions of this equation, the sparsest representation for $s$, i.e., the one with the fewest non-zero entries. This goal is formalized by the following optimization problem:

$$(P_0): \quad \arg\min_{\alpha \in \mathbb{R}^L} \|\alpha\|_0 \quad \text{s.t.} \quad D\alpha = s.$$

In this paper we consider the signals for which the solution of $(P_0)$ is unique, and we define $S(D)$ as the family of such signals. We denote $\Omega = \{1, \ldots, L\}$, and refer to the support of the vector $\alpha = (\alpha_1, \ldots, \alpha_L)^T$ as the set $\Gamma = \mathrm{supp}(\alpha) = \{n \in \Omega \mid \alpha_n \neq 0\}$.

The problem $(P_0)$ is NP-hard, demanding an exhaustive search over all subsets of columns of $D$ [16]. One of the most effective techniques to approximate its solution is the convex relaxation of the $\ell_0$-norm. It uses the $\ell_1$-norm, the closest convex norm on $\mathbb{R}^L$:

$$(P_1): \quad \arg\min_{\alpha \in \mathbb{R}^L} \|\alpha\|_1 \quad \text{s.t.} \quad D\alpha = s.$$

The solution of $(P_1)$ is obtained by linear programming. We are interested in signals $s \in S(D)$ for which the solutions of $(P_0)$ and $(P_1)$ coincide. The idea of using $(P_1)$ to find the sparsest solution is called Basis Pursuit (BP), as coined by Chen, Donoho and Saunders [4, 5].

Let $\alpha$ be a representation of $s$, with support $\Gamma = \mathrm{supp}(\alpha) \subset \Omega$. The matrix $D_\Gamma$ is the $N \times |\Gamma|$ matrix containing the columns (also referred to as atoms) of $D$ used for the construction of $s$. This matrix is necessarily full-rank (with rank equal to $|\Gamma|$). Knowing the support $\Gamma$ suffices to enable perfect recovery of $\alpha$, and thus our interest is confined to the ability to recover the support $\Gamma$.

Definition 1.1.
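As a concrete illustration of solving $(P_1)$ by linear programming, the following sketch uses the standard split $\alpha = u - v$ with $u, v \geq 0$, so that $\|\alpha\|_1 = \sum(u + v)$ at the optimum. This is a minimal sketch assuming SciPy is available; the function name `basis_pursuit` is ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, s):
    """Solve (P1): min ||alpha||_1 s.t. D alpha = s, via linear programming.

    Split alpha = u - v with u, v >= 0, so that ||alpha||_1 = sum(u + v)
    at the optimum, turning the problem into a standard-form LP.
    """
    N, L = D.shape
    c = np.ones(2 * L)                 # objective: sum of u and v entries
    A_eq = np.hstack([D, -D])          # D u - D v = s
    res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:L] - uv[L:]             # alpha = u - v
```

By LP optimality, the returned vector is feasible ($D\alpha = s$) and its $\ell_1$-norm never exceeds that of any other feasible representation.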
A subset $\Gamma \subset \Omega$ is called $\ell_1$-reconstructible with respect to the dictionary $D$ if the solution of $(P_1)$ coincides with the solution of $(P_0)$ for every signal $s \in S(D)$ that admits a representation with the support $\Gamma$.

The main task of the paper is to obtain conditions on support sizes which imply that they are $\ell_1$-reconstructible. For any specific support $\Gamma \subset \Omega$ there exists a straightforward (yet exhaustive) test of whether it admits recovery by BP: simply apply BP to the finite family of signals $s = D\alpha$ generated from coefficient vectors $\alpha$ with the support $\Gamma$, covering all possible sign patterns (i.e., $2^{|\Gamma|}$ such tests; in fact, half of this amount is required, because if $\alpha$ is reconstructible then so is $-\alpha$). If the recovery succeeds for all these choices of $\alpha$, it will also succeed for any other representation with support $\Gamma$ [9, 15]. Clearly, such a testing approach is impractical in most cases. If we aim to find the prospects of success of BP for a fixed cardinality $|\Gamma|$, a set of tests as described above is required per each possible support $\Gamma$ of that cardinality, implying a need for approximately $\binom{L}{|\Gamma|}$ groups of tests. Thus, the exhaustive approach should be replaced either by a random set of tests with empirical claims, or by a theoretical study.

Within the theoretical attempts to estimate the power of BP, two approaches are distinguished in the existing literature. Earlier work carried out worst-case analysis for a given dictionary, providing conditions on the support cardinality that guarantee that any support satisfying them is $\ell_1$-reconstructible [8, 9, 11, 12, 13, 20]. These conditions are often very restrictive and far from empirical evidence.
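The exhaustive sign-pattern test described above can be sketched as follows. Since recovery success depends only on the sign pattern of $\alpha$ on $\Gamma$ [9, 15], unit magnitudes suffice. This is an illustrative sketch assuming SciPy; `support_is_recovered` is our naming.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def support_is_recovered(D, gamma, tol=1e-6):
    """Exhaustive test of whether BP recovers the support `gamma`:
    run BP on s = D alpha for every sign pattern on gamma.
    Unit magnitudes suffice, since success depends only on the signs."""
    N, L = D.shape
    for signs in itertools.product([1.0, -1.0], repeat=len(gamma)):
        alpha0 = np.zeros(L)
        alpha0[list(gamma)] = signs
        s = D @ alpha0
        # solve (P1) by LP with the split alpha = u - v, u, v >= 0
        res = linprog(np.ones(2 * L), A_eq=np.hstack([D, -D]), b_eq=s,
                      bounds=(0, None), method="highs")
        alpha = res.x[:L] - res.x[L:]
        if not np.allclose(alpha, alpha0, atol=tol):
            return False
    return True
```

As the text notes, half the sign patterns could be skipped by symmetry; the sketch keeps the plain enumeration for clarity.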
Another, more recent, approach presents a probabilistic analysis, providing conditions for special families of dictionaries under which most signals of a given cardinality are $\ell_1$-reconstructible [1, 2, 6, 10, 19]. The results depict a general asymptotic behavior with regard to sparse support recovery.

In both the worst-case and the probabilistic-analysis branches of work, many classical results rely heavily on a scalar feature of the dictionary, known as the mutual-coherence [8, 12, 13, 20]. A related measure also used is the Babel function [8, 20]. More recent work employs the Restricted Isometry Property (RIP) [3]. The information carried by all these measures is very pessimistic; furthermore, the RIP is very expensive computationally and mainly used for theoretical analysis.

In this work we set out to improve the existing worst-case results for a given general dictionary $D$, as reported in [8, 12, 13, 20]. We achieve this progress by replacing the above-mentioned measures with a set of alternative features that we refer to as the capacity sets of the dictionary. A thorough computational analysis of $D$ and probabilistic tools are applied to the problem, leading to improved probabilistic bounds.

In the next section we recall the existing theoretical results concerning $\ell_1$-recovery as a function of the support cardinality. In Section 3 we define two versions of the capacity set and present the main theoretical results of this paper using these features. Section 4 expands on the above results by providing two numerical algorithms using the capacity sets. Section 5 provides an overall comparison of the various methods presented in this work to assess the performance of BP for several test-cases.

2. Background

Most known results on sparsity rely on the mutual-coherence, denoted $\mu$, of the dictionary.
This is the maximum of the inner products between the columns: $\mu = \max_{i \neq j \in \Omega} |\langle d_i, d_j \rangle|$. This correlation between the columns, reflected in its worst value by $\mu$, helps to establish the "safe zone" of support sizes where both the uniqueness of the sparsest representation and its $\ell_1$-recovery can be guaranteed.

For $D = [\Phi_1, \Phi_2]$ a pair of orthonormal bases, the following sufficient condition for $\Gamma$ to be $\ell_1$-reconstructible is proven in [11]: $|\Gamma| \leq \frac{\sqrt{2} - 0.5}{\mu}$.

Donoho and Elad in [8] treat a general dictionary $D$. They define the problem

$$(C_\Gamma): \quad \max_{\delta \in \mathrm{Null}(D)} \sum_{k \in \Gamma} |\delta_k| \quad \text{s.t.} \quad \|\delta\|_1 = 1, \tag{2.1}$$

and show that its solution is intimately tied to the ability to recover the support $\Gamma$, by the following lemma:

Lemma 2.1. ([8], Lemma 2) A sufficient condition for the support $\Gamma$ to be $\ell_1$-reconstructible is

$$\mathrm{val}(C_\Gamma) < \frac{1}{2}. \tag{2.2}$$

This criterion is used to prove the following theorem:

Theorem 2.2. ([8], Theorem 7) A sufficient condition for a support $\Gamma \subset \Omega$ to be $\ell_1$-reconstructible is

$$|\Gamma| < \frac{1}{2}\left(1 + \frac{1}{\mu}\right). \tag{2.3}$$

Typically, the coherence behaves at best like $O(\frac{1}{\sqrt{N}})$; hence the results stated above predict quite weak $\ell_1$-recovery, which is refuted by the empirical evidence: usually BP recovers supports of size proportional to $N$ (and not its square root).

A generalization of the coherence is introduced in [8] and later used by J. Tropp in [20]: for any $0 \leq m \leq L$, the Babel function $\mu_1(m)$ is defined by

$$\mu_1(m) = \max_{|\Lambda| = m} \ \max_{\eta \in \Omega \setminus \Lambda} \ \sum_{\lambda \in \Lambda} |\langle \phi_\lambda, \phi_\eta \rangle|.$$

In terms of this function, a support of size $m$ is proven to be $\ell_1$-reconstructible provided the following inequality holds [20]:

$$\mu_1(m-1) + \mu_1(m) < 1.$$
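Both quantities above are direct computations on the Gram matrix of $D$. A minimal sketch, assuming NumPy and $\ell_2$-normalized columns (the function names are ours):

```python
import numpy as np

def mutual_coherence(D):
    """mu = max_{i != j} |<d_i, d_j>| for an l2-column-normalized dictionary."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

def babel(D, m):
    """Babel function mu_1(m): worst-case sum over a column eta of the m
    largest off-diagonal values |<d_lambda, d_eta>|.  This matches the
    max-over-Lambda definition, since for each fixed eta the maximizing
    Lambda is simply the m largest entries of that column of |G|."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    G_sorted = -np.sort(-G, axis=0)       # each column sorted descending
    return G_sorted[:m].sum(axis=0).max()
```

Note that $\mu_1(1) = \mu$, so the Babel condition strictly generalizes the coherence-based one.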
Unfortunately, in cases where the coherence $\mu$ is close to 1 (implying the existence of at least one problematic pair of atoms), the growth of $\mu_1(m)$ is too fast to provide any improvement.

Average-case analysis improves the asymptotic bounds on reconstructible support sizes. The work in [2] shows that for the dictionary $D = [I, F^*]$, where $F$ is the Fourier transform, a random uniformly sampled support admits $\ell_1$-recovery with high probability if (the expectation of) its cardinality is $O(N/\log N)$, which improves the $O(\sqrt{N})$ estimate of the worst-case approach. For a general orthonormal pair, it is shown in ([2], Theorem 5.3) that most random supports whose cardinality behaves like $O(1/(\mu^2 \log^6 N))$ admit recovery by BP. The $\log N$ appearing in these expressions is suspected by the authors of [2] to be unnecessary, which in effect turns this expression into $O(N)$ (for incoherent dictionaries). A similar and related result, exhibiting the square of the mutual coherence in the denominator of the bound, appears in [19]. As such, this result is effective in cases where the dictionary is "uniformly coherent", and the methods employed are not very suitable for dictionaries with high coherence.

The idea that representations with cardinalities $O(N)$ are $\ell_1$-reconstructible is supported by the results reported in [6, 7, 10]. This result is obtained for asymptotically growing dictionaries of size $N \times \delta N$, constructed by concatenating random vectors of unit $\ell_2$-norm, independently drawn from the uniform distribution. It is shown that all supports of size up to $\rho(\delta) N$ are $\ell_1$-reconstructible with probability approaching 1. The work in [7, 10] provides theoretical assessments for $\rho(\delta)$, based on a connection to the study of neighborly polytopes.
Despite being asymptotic, these results illuminate the empirically-supported evidence regarding the reconstruction abilities of minimal $\ell_0$-norm supports by linear programming.

As good as these results sound, they do not provide useful numerical information about the ability of $\ell_1$-reconstruction applied to a specifically given dictionary $D$ of a certain size, which is a practical and central question in the application of BP. Such information can only be obtained today by results involving the coherence $\mu$ or its descendants. Thus, the gap is especially big when the dictionary is not uniformly coherent and when $\mu \gg \frac{1}{\sqrt{N}}$.

In this work we introduce new features of the dictionary $D$, the capacity sets. These features are obtained as the solutions to specific linear programming problems that probe the dictionary $D$. We consider two such options: a vector of capacities $q$ and a matrix $Q$, as we shall explain in detail in the next section. These features are used to develop a novel analysis of BP performance as a function of the support's cardinality.

One interesting benefit of the proposed analysis is a better treatment of dictionaries which are not "uniformly coherent". In cases where there exists a small set of columns in $D$ with strong linear dependency, the coherence and the Babel function behave badly, tending to lead to overly pessimistic bounds. As we show, the use of the capacities leads in these cases to much better results. Besides that, the capacities are shown to be more delicate indicators of the dictionary, as reflected in a better prediction of the BP performance.
The use of capacity sets bridges the gap between purely theoretical estimates of the reconstructible support sizes for a given dictionary $D$, which are usually fast to compute but provide pessimistic lower bounds, and empirical tests of $D$, which give a very accurate account of BP-reconstruction abilities but are computationally prohibitive. We propose theoretical results and algorithms that employ the capacity sets to perform a computational assessment of these abilities, which is fast relative to a full empirical test and more optimistic than known practical formulae. The question of computational complexity is discussed in detail in Section 5.4.

3. Capacity Sets and Their Use

In this section we define two versions of the capacity sets, and state the main theoretical results that employ them for the analysis of BP.

3.1 The Capacity Vector q

The capacity vector consists of elements related to an intermediate tool used in the proof of Theorem 2.2 in [8]:

Definition 3.1. The capacity vector $q = (q_1, \ldots, q_L)^T$ of a dictionary $D \in \mathbb{R}^{N \times L}$ is defined for all $k \in \Omega$ by

$$q_k = \max_{\delta \in \mathrm{Null}(D)} \delta_k \quad \text{s.t.} \quad \|\delta\|_1 = 1. \tag{3.1}$$

Computing the elements of $q$ is relatively easy, and amounts to a simple set of $L$ independent linear programming problems of the form

$$\hat{x}_k = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Dx = 0 \ \text{and} \ x_k = 1,$$

followed by assigning $q_k = 1/\|\hat{x}_k\|_1$. To see the equivalence of the two problems, notice that the vector $\tilde{x}_k = \hat{x}_k / \|\hat{x}_k\|_1$ is an element of the null space of $D$ with unit $\ell_1$-norm. Since $(\hat{x}_k)_k = 1$ and $\|\hat{x}_k\|_1$ is the smallest possible, the value $q_k = 1/\|\hat{x}_k\|_1 = (\tilde{x}_k)_k$ is exactly the solution of (3.1).

Via Lemma 2.1, the definition of $q$ provides a sufficient condition, $\sum_{k \in \Gamma} q_k < \frac{1}{2}$, on a given support $\Gamma$ to ensure its recovery by $\ell_1$-minimization.
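The $L$ linear programs above can be sketched directly, again with the split $x = u - v$. This is an illustrative sketch assuming SciPy; it also assumes every coordinate is active on $\mathrm{Null}(D)$ (otherwise the constraint $x_k = 1$ is infeasible and we set $q_k = 0$, consistent with $\delta_k = 0$ on the null space).

```python
import numpy as np
from scipy.optimize import linprog

def capacity_vector(D):
    """Capacity vector q (Definition 3.1): for each k solve
        min ||x||_1  s.t.  D x = 0,  x_k = 1,
    and set q_k = 1 / ||x_hat_k||_1.  ||x||_1 is linearized via
    x = u - v with u, v >= 0."""
    N, L = D.shape
    q = np.zeros(L)
    for k in range(L):
        c = np.ones(2 * L)                    # sum(u) + sum(v) = ||x||_1
        A_eq = np.vstack([np.hstack([D, -D]), np.zeros((1, 2 * L))])
        A_eq[-1, k] = 1.0                     # u_k - v_k = 1
        A_eq[-1, L + k] = -1.0
        b_eq = np.concatenate([np.zeros(N), [1.0]])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None),
                      method="highs")
        if res.status != 0:                   # x_k = 1 infeasible on Null(D)
            q[k] = 0.0
        else:
            q[k] = 1.0 / res.fun              # q_k = 1 / ||x_hat_k||_1
    return q
```

Since $|x_k| = 1 \leq \|x\|_1$ for any feasible $x$, each $q_k$ lies in $(0, 1]$ whenever the LP is feasible.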
Furthermore, by gathering the $|\Gamma|$ largest entries of $q$, a simple generalization of Theorem 2.2 can be proposed. However, in this work we seek a better bound that takes into account the variety of possible supports, rather than the worst one. One such numerical technique is suggested in Section 4, proposing a special quantization of the values in $q$ to obtain a lower bound on the fraction of support sizes which admit recovery by BP. In this section we aim to obtain a more theoretically flavored result that uses $q$.

Denote by $E_q$ the mean value of the capacity vector $q$, and by $\sigma_q^2$ its variance, $\frac{1}{L} \sum_{k \in \Omega} (q_k - E_q)^2$. The following theorem uses these quantities to evaluate the probability of $\ell_1$-reconstruction for a given support size:

Theorem A. For any $1 \leq \ell < \frac{1}{2 E_q}$, a support $\Gamma$ of size $\ell$, sampled uniformly at random from $\Omega$, admits $\ell_1$-recovery with probability

$$P(\ell) > \frac{\left(\frac{1}{2} - \ell E_q\right)^2}{\ell \sigma_q^2 + \left(\frac{1}{2} - \ell E_q\right)^2}. \tag{3.2}$$

In the special case of a constant capacity vector, the theorem boils down to a support-size threshold of $\frac{1}{2 E_q}$, since then the variance becomes zero. We show in Section 3.2 that a weakened version of Theorem A yields the classical threshold $|\Gamma| < \frac{1}{2}(1 + \frac{1}{\mu})$ (see Theorem 2.2).

Proof: We fix $\ell$ and choose subsets $\Lambda, \Gamma \subset \Omega$ according to two different probability models. The elements of $\Gamma$ are chosen uniformly from $\Omega$ without replacement and form a set of $\ell$ distinct column indices. The $\ell$ elements of $\Lambda$ are chosen uniformly with replacement (i.e., $\Lambda$ is a multiset of size $\ell$ with possible duplicates). Now, define the random variables

$$x_\ell = \sum_{k \in \Gamma} q_k, \qquad y_\ell = \sum_{m \in \Lambda} q_m. \tag{3.3}$$

In these terms, the probability $P(\ell)$ defined in the statement of the theorem is bounded below by $P(x_\ell < \frac{1}{2})$.
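The bound (3.2) is a one-line computation once $q$ is known. A minimal sketch (our naming, assuming NumPy):

```python
import numpy as np

def theorem_A_bound(q, ell):
    """Lower bound (3.2) on the probability that a uniformly random
    support of size ell is l1-reconstructible, computed from the
    capacity vector q.  The bound applies only when ell < 1 / (2 E_q)."""
    E_q = q.mean()
    var_q = q.var()                      # (1/L) * sum_k (q_k - E_q)^2
    if ell >= 1.0 / (2.0 * E_q):
        return 0.0                       # outside the theorem's range
    gap = 0.5 - ell * E_q                # the quantity 1/2 - ell * E_q
    return gap**2 / (ell * var_q + gap**2)
```

For a constant capacity vector the variance vanishes and the bound equals 1 for every $\ell$ below the threshold, matching the remark above.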
In turn, we shall bound the probability $P(x_\ell < \frac{1}{2})$ by means of the Tchebychev inequality, which involves the mean and the variance of $x_\ell$. These parameters are easily computable for $y_\ell$: by its definition, we have

$$E(y_\ell) = \ell E_q, \qquad \mathrm{var}(y_\ell) = \ell \sigma_q^2.$$

Our result is based on the following connection between the variables $x_\ell$ and $y_\ell$, as shown in Appendix A:

$$E(x_\ell) = E(y_\ell) \quad \text{and} \quad \mathrm{var}(x_\ell) \leq \mathrm{var}(y_\ell). \tag{3.4}$$

Given any real scalar $a > 0$, the one-tailed version of the Tchebychev inequality [14] for $x_\ell$ reads

$$P(x_\ell - E_x \geq a \sigma_x) = P(x_\ell \geq E_x + a \sigma_x) \leq \frac{1}{1 + a^2},$$

where $E_x = E(x_\ell)$ and $\sigma_x^2 = \mathrm{var}(x_\ell)$. By (3.4), we substitute $E_x = \ell E_q$. Also, since $\sigma_x \leq \sqrt{\ell}\,\sigma_q$ by (3.4), the threshold $\ell E_q + a\sqrt{\ell}\,\sigma_q$ is at least $E_x + a \sigma_x$, and we obtain

$$P\left(x_\ell \geq \ell E_q + a\sqrt{\ell}\,\sigma_q\right) \leq P(x_\ell \geq E_x + a \sigma_x) \leq \frac{1}{1 + a^2}.$$

The parameter $a$ is chosen such that $\ell E_q + a\sqrt{\ell}\,\sigma_q = \frac{1}{2}$, leading to $a = (\frac{1}{2} - \ell E_q)/(\sqrt{\ell}\,\sigma_q)$. Note that the condition $a > 0$ translates to the requirement $\ell < \frac{1}{2 E_q}$, as claimed in the theorem. In case it holds, we have

$$P\left(x_\ell \geq \frac{1}{2}\right) \leq \frac{1}{1 + \frac{(\frac{1}{2} - \ell E_q)^2}{\ell \sigma_q^2}},$$

or, put differently,

$$P\left(x_\ell < \frac{1}{2}\right) > 1 - \frac{1}{1 + \frac{(\frac{1}{2} - \ell E_q)^2}{\ell \sigma_q^2}} = \frac{\left(\frac{1}{2} - \ell E_q\right)^2}{\ell \sigma_q^2 + \left(\frac{1}{2} - \ell E_q\right)^2},$$

as stated by the theorem. ∎

3.2 From Capacity Vector to Coherence

We mentioned earlier that previous work often uses the mutual coherence to derive performance bounds on $\ell_1$-reconstructible supports. The relation between the capacities in $q$ and the inner products between the dictionary atoms, $|\langle d_i, d_j \rangle|$, has already been discussed in [8]. Given a dictionary $D$, construct its Gram matrix $G = D^T D$. Define the sequence

$$\mu_k = \max_{i \neq k} |G_{i,k}| \quad \text{for } k \in \Omega. \tag{3.5}$$

Namely, $\mu_k$ is the maximal value in the $k$-th column of $|G|$, disregarding the main-diagonal entry.
As [8] shows, this sequence of values satisfies $q_k \leq \frac{\mu_k}{\mu_k + 1}$. Thus the condition $\sum_{k \in \Gamma} q_k < \frac{1}{2}$ can be replaced with $\sum_{k \in \Gamma} \frac{\mu_k}{\mu_k + 1} < \frac{1}{2}$, leading, of course, to weaker bounds. The further relaxation

$$q_k \leq \frac{\mu_k}{\mu_k + 1} < \frac{\mu}{\mu + 1} \tag{3.6}$$

yields a constant capacity vector with entries of size $\frac{\mu}{\mu + 1}$. Applying Theorem A to this vector, we obtain, as a special case, the classical Theorem 2.2.

3.3 Using the Capacity Matrix Q

One problem with the capacity vector $q$ is the independence with which its entries $q_k$ are computed. This implies that one (or more) of the entries of $q$ may become unnecessarily large, compared to the values obtained in Equation (2.1), causing a weaker bound. By working with pairs of such entries, one can in principle improve the obtained bounds. This leads us to the following definition:

Definition 3.2. Denote by $\Omega^2$ the set of indices $\Omega^2 = \{(i,j) \mid i, j \in \Omega, \ i < j\}$. The upper-triangular capacity matrix $Q = \{Q_{i,j}\}$ is the matrix with non-zero elements indexed by $(i,j) \in \Omega^2$, defined as follows:

$$Q_{i,j} = \max_{\delta \in \mathrm{Null}(D)} \max(\delta_i + \delta_j, \ \delta_i - \delta_j) \quad \text{s.t.} \quad \|\delta\|_1 = 1.$$

Each of these entries can be computed by two independent linear programming problems of the form

$$\hat{x}^+_{(i,j)} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Dx = 0 \ \text{and} \ x_i + x_j = 1,$$
$$\hat{x}^-_{(i,j)} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Dx = 0 \ \text{and} \ x_i - x_j = 1,$$

followed by assigning $Q_{i,j} = 1/\min(\|\hat{x}^+_{(i,j)}\|_1, \|\hat{x}^-_{(i,j)}\|_1)$.

As in Section 3.1, the obtained values $Q_{i,j}$ can be used to form an improved worst-case bound for Lemma 2.1 and consequently for Theorem 2.2. Let $\Gamma \subset \Omega$ be a randomly chosen support of even size $\ell = 2n$. By definition, the non-zero elements of $Q$ satisfy

$$\max_{\substack{\delta \in \mathrm{Null}(D) \\ \|\delta\|_1 = 1}} \left(|\delta_i| + |\delta_j|\right) = Q_{i,j} \leq \max_{\substack{\delta \in \mathrm{Null}(D) \\ \|\delta\|_1 = 1}} |\delta_i| \ + \max_{\substack{\delta \in \mathrm{Null}(D) \\ \|\delta\|_1 = 1}} |\delta_j| = q_i + q_j.$$
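The pair of LPs per entry can be sketched in the same style as the capacity vector. This is an illustrative sketch assuming SciPy (`capacity_matrix` is our naming); either constraint may be infeasible on $\mathrm{Null}(D)$, so we skip infeasible cases, and set $Q_{i,j} = 0$ when both are (i.e., when $\delta_i = \delta_j = 0$ on the null space).

```python
import numpy as np
from scipy.optimize import linprog

def capacity_matrix(D):
    """Upper-triangular capacity matrix Q (Definition 3.2): for each pair
    i < j solve the two LPs
        min ||x||_1  s.t.  D x = 0,  x_i + x_j = 1   (resp.  x_i - x_j = 1),
    then Q[i, j] = 1 / (the smaller of the two optimal l1-norms)."""
    N, L = D.shape
    Q = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            best = np.inf
            for sign in (+1.0, -1.0):          # x_i + x_j = 1, then x_i - x_j = 1
                row = np.zeros(2 * L)
                row[i], row[L + i] = 1.0, -1.0         # coefficient of x_i
                row[j], row[L + j] = sign, -sign       # +/- coefficient of x_j
                A_eq = np.vstack([np.hstack([D, -D]), row])
                b_eq = np.concatenate([np.zeros(N), [1.0]])
                res = linprog(np.ones(2 * L), A_eq=A_eq, b_eq=b_eq,
                              bounds=(0, None), method="highs")
                if res.status != 0:            # this constraint is infeasible
                    continue
                best = min(best, res.fun)
            Q[i, j] = 1.0 / best if np.isfinite(best) else 0.0
    return Q
```

The cost is $O(L^2)$ linear programs, versus $L$ for the capacity vector; this trade-off is quantified in Section 5.4.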
Thus the values $Q_{i,j}$ can be used in the evaluation of an upper bound on $C_\Gamma$. To any partition $\mathcal{I}$ of $\Gamma$ into disjoint pairs there corresponds the sum $\sum_{(k_1,k_2) \in \mathcal{I}} Q_{k_1,k_2}$, which bounds the value of $C_\Gamma$ from above. Therefore, $\Gamma$ is $\ell_1$-reconstructible if there exists such a partition satisfying $\sum_{(k_1,k_2) \in \mathcal{I}} Q_{k_1,k_2} < \frac{1}{2}$. Naturally, among all such possible partitions, we are interested in the one that leads to the smallest sum.

Just one glance at the values of $Q$ gives a lower bound for the sizes of $\ell_1$-reconstructible subsets: namely, if $\max(Q) \leq \frac{1}{\ell}$, then a sum of any $\ell/2$ of its elements does not exceed $\frac{1}{2}$; hence any subset of columns of size up to $\ell$ is guaranteed to be recovered by BP. (We consider hereafter even support sizes; the generalization to odd ones is relatively simple, requiring the use of one entry from $q$, and we omit this discussion for simplicity.) Conjecture B below estimates the uncertainty caused by replacing $\max(Q)$ with $\mathrm{mean}(Q)$. Some numerical techniques based on $Q$ are described in Section 4.

Here we concentrate again on a theoretical bound that uses $Q$, similar to the one proposed in Theorem A, with a few necessary modifications. We arrange the values $\{Q_{i,j} \mid i < j \in \Omega\}$ of the capacity matrix in a vector $Q_V$. Denote by $E_Q$ the mean value of $Q_V$, and by $\sigma_Q^2$ its variance, $\sigma_Q^2 = \frac{2}{L(L-1)} \sum_{i<j} (Q_{i,j} - E_Q)^2$.

Conjecture B. For any even support size $\ell < \frac{1}{E_Q}$, a support $\Gamma$ of size $\ell$, sampled uniformly at random from $\Omega$, admits $\ell_1$-recovery with probability

$$P(\ell) > \frac{\left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2}{\frac{\ell}{2} \sigma_Q^2 + \left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2}. \tag{3.7}$$

Notice that the expression obtained in Equation (3.7) is the same as the one in (3.2), with $\ell$ replaced by $\ell/2$. Since $E_Q$ and $\sigma_Q$ refer to pairs, if $E_Q = 2 E_q$ and $\sigma_Q^2 = 2 \sigma_q^2$, the two bounds coincide. However, as we shall demonstrate in Section 5, $E_Q < 2 E_q$ and $\sigma_Q^2 < 2 \sigma_q^2$ for random dictionaries, implying that this bound is indeed stronger.

Proof: Fix an even support size $\ell$.
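The bound (3.7) is computed from the upper-triangular entries of $Q$ exactly as (3.2) is computed from $q$. A minimal sketch (our naming, assuming NumPy):

```python
import numpy as np

def conjecture_B_bound(Q, ell):
    """Lower bound (3.7) from the capacity matrix Q, for an even support
    size ell: the same expression as (3.2) with ell replaced by ell/2 and
    (E_q, sigma_q^2) replaced by the pair statistics (E_Q, sigma_Q^2)."""
    L = Q.shape[0]
    iu = np.triu_indices(L, k=1)
    QV = Q[iu]                          # the vector Q_V of upper entries
    E_Q, var_Q = QV.mean(), QV.var()    # mean and (2/(L(L-1))) sum (.)^2
    if ell % 2 or ell >= 1.0 / E_Q:
        return 0.0                      # requires even ell < 1 / E_Q
    gap = 0.5 - (ell / 2.0) * E_Q
    return gap**2 / ((ell / 2.0) * var_Q + gap**2)
```

When $E_Q < 2 E_q$, the admissible range $\ell < 1/E_Q$ is wider than Theorem A's $\ell < 1/(2 E_q)$, which is the sense in which the pair-based bound is stronger.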
In order to translate the condition $\sum_{(i,j) \in \mathcal{I}} Q_{i,j} < \frac{1}{2}$ into a probabilistic one, we again use the model involving a subset $\Gamma \subset \Omega$ of size $\ell$, whose elements are chosen uniformly from $\Omega$ without replacement. Also, we let $\mathcal{I}$ be a random partition of the index set $\Gamma$ into pairs. (The result is stated as a conjecture since it relies on a property that is used here without proof; more on this is given in Appendix B.) Based on these notions, we define a random variable $x_\ell = \sum_{(k_1,k_2) \in \mathcal{I}} Q_{k_1,k_2}$. In effect, $x_\ell$ is a sum of elements of $Q$ randomly chosen "without replacement" in a stronger sense: not only are the elements not repeated, but two elements with a common index are not allowed.

The probability $P(\ell)$, defined in the statement of the conjecture, is bounded below by $P(x_\ell < \frac{1}{2})$. This bound is not tight, since the support $\Gamma$ is reconstructible if there exists some partition $\mathcal{I}_{opt}$ such that $\sum_{(k_1,k_2) \in \mathcal{I}_{opt}} Q_{k_1,k_2}$ drops below one half, while $P(x_\ell < \frac{1}{2})$ is only the probability that this happens for a random partition $\mathcal{I}$.

In order to analyze the variable $x_\ell$, we consider a multiset $\Phi$ of size $\frac{\ell}{2}$ chosen uniformly with replacement from $Q_V$, and define the random variable $y_\ell$ to be its sum, $y_\ell = \sum \Phi$. Then we have

$$E(y_\ell) = \frac{\ell}{2} E_Q, \qquad \mathrm{var}(y_\ell) = \frac{\ell}{2} \sigma_Q^2.$$

The expectation of $x_\ell$ equals that of $y_\ell$, as proven in Appendix B. Regarding the variance, we make an assumption similar to (3.4):

$$\mathrm{var}(x_\ell) \leq \mathrm{var}(y_\ell). \tag{3.8}$$

We do not provide its proof and leave it as an open question at this stage. An empirical verification of this inequality is demonstrated in Appendix B.

Following the steps of Theorem A, given any real $a > 0$, the one-tailed version of the Tchebychev inequality [14] for $x_\ell$ reads

$$P\left(x_\ell \geq \frac{\ell}{2} E_Q + a \sqrt{\frac{\ell}{2}}\,\sigma_Q\right) \leq \frac{1}{1 + a^2}.$$
The parameter $a$ is chosen such that $\frac{\ell}{2} E_Q + a \sqrt{\frac{\ell}{2}}\,\sigma_Q = \frac{1}{2}$, leading to $a = (\frac{1}{2} - \frac{\ell}{2} E_Q)/(\sqrt{\frac{\ell}{2}}\,\sigma_Q)$, implying that we should require $\ell < \frac{1}{E_Q}$ to get $a > 0$. This leads to

$$P\left(x_\ell \geq \frac{1}{2}\right) \leq \frac{1}{1 + \frac{(\frac{1}{2} - \frac{\ell}{2} E_Q)^2}{\frac{\ell}{2} \sigma_Q^2}},$$

or, put differently,

$$P\left(x_\ell < \frac{1}{2}\right) > 1 - \frac{1}{1 + \frac{(\frac{1}{2} - \frac{\ell}{2} E_Q)^2}{\frac{\ell}{2} \sigma_Q^2}} = \frac{\left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2}{\frac{\ell}{2} \sigma_Q^2 + \left(\frac{1}{2} - \frac{\ell}{2} E_Q\right)^2},$$

as stated in the conjecture. ∎

4. Numerical Algorithms

Given the capacity vector $q$ (or its weaker version as described in Section 3.2) or the matrix $Q$, we can use Theorem A and Conjecture B to predict the $\ell_1$-reconstructible supports, and show lower bounds on the probability of success as a function of the support size $\ell$. However, we can alternatively evaluate these probabilities numerically, provided that there are shortcuts that avoid the exponential growth in support possibilities. This leads us to the following two algorithms.

4.1 A Fast Combinatorial Count Using q

Below we propose an algorithm which provides worst-case bounds on reconstructible support sizes. We would like to establish the fraction of the total number of supports $\Gamma$ of size $\ell$ that satisfy $\mathrm{val}(C_\Gamma) < \frac{1}{2}$. Testing the sufficient condition $\sum_{k \in \Gamma} q_k < \frac{1}{2}$ for every single $\Gamma$ requires $O(\binom{L}{\ell})$ flops, which is prohibitive. Instead, we propose to perform a quantization of the entries of $q$ to $d$ distinct values, leading to a more reasonable computational process.

Suppose we are given a partition $\Lambda = \{\Lambda_i\}_{i=1}^d$ of $\Omega$ into $d$ disjoint clusters, such that $\Omega = \bigcup_{i=1}^d \Lambda_i$. The corresponding quantized values of $q$ are denoted by $\{q_\Lambda^i\}$, each set to be the maximal value in its subset: $q_\Lambda^i = \max_{k \in \Lambda_i} q_k$ for $1 \leq i \leq d$.
Given the quantization parameters $\Lambda = \{\Lambda_i, q_\Lambda^i\}_{i=1}^d$, every $\ell$-sized support $\Gamma \subset \Omega$ can be described as the union $\bigcup_{i=1}^d \Gamma_i$, where $\Gamma_i \subseteq \Lambda_i$ is the subset of indices in $\Gamma$ allocated to the quantized value $q_\Lambda^i$. Thus, the sum $\sum_{k \in \Gamma} q_k$ can be replaced by a larger sum, $\sum_{i=1}^d |\Gamma_i| q_\Lambda^i$.

In order to test all possible supports $\Gamma \subset \Omega$ of size $\ell$, a combinatorial count over all sequences $p = (p_1, \ldots, p_d)$ is performed, such that $0 \leq p_i \leq |\Lambda_i|$ and $\sum_{i=1}^d p_i = \ell$. For each of these we evaluate $\sum_{i=1}^d p_i q_\Lambda^i$ and count the relative number of those below $\frac{1}{2}$ (each instance must be weighted by the number of its possible occurrences). The complexity of such a computation does not exceed $O(L^d) \cdot d$.

As to the choice of the quantization parameters $\Lambda = \{\Lambda_i, q_\Lambda^i\}_{i=1}^d$: as said above, we let $q_\Lambda^i = \max_{k \in \Lambda_i} q_k$ to guarantee that the evaluated summations consider a worst-case scenario. The clustering is done by an attempt to minimize the function

$$f\left(\{\Lambda_i, q_\Lambda^i\}_{i=1}^d\right) = \sum_{i=1}^d \left( |\Lambda_i| q_\Lambda^i - \sum_{k \in \Lambda_i} q_k \right). \tag{4.1}$$

The difference $|\Lambda_i| q_\Lambda^i - \sum_{k \in \Lambda_i} q_k$ is the quantization error for the elements in the subset $\Lambda_i$, and the above function simply sums these values. The minimization of $f$ can be done exhaustively when $d$ is small; in our experiments we have used $d = 3$, implying that the above requires $O(L^3)$ flops. For larger values of $d$, a sequential algorithm that chooses $\Lambda_i$ can be proposed, separating the set $\Omega$ into two parts and proceeding in a greedy tree-separation scheme.

Computationally, the results of the combinatorial count are very close to those predicted by Theorem A. Therefore, this method serves as
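The counting procedure above can be sketched as follows. For simplicity this sketch clusters the sorted entries of $q$ into $d$ contiguous groups, an illustrative choice rather than the paper's exhaustive minimization of (4.1); each occupation vector $p$ is weighted by $\prod_i \binom{|\Lambda_i|}{p_i}$, the number of supports it represents.

```python
import numpy as np
from math import comb

def combinatorial_count(q, ell, d=3):
    """Fast combinatorial count of Section 4.1: quantize q to d levels
    (each cluster represented by its maximum, a worst-case choice), then
    enumerate occupation vectors p = (p_1, ..., p_d) instead of all
    C(L, ell) supports.  Returns a lower bound on the fraction of
    ell-sized supports whose sum of capacities is below 1/2."""
    qs = np.sort(q)
    clusters = np.array_split(qs, d)           # contiguous clusters (sketch)
    sizes = [len(c) for c in clusters]
    q_lam = [c.max() for c in clusters]        # worst-case value per cluster
    good = total = 0

    def rec(i, remaining, weight, qsum):
        nonlocal good, total
        if i == d:
            if remaining == 0:                 # a complete occupation vector
                total += weight
                if qsum < 0.5:
                    good += weight
            return
        for p in range(min(remaining, sizes[i]) + 1):
            rec(i + 1, remaining - p, weight * comb(sizes[i], p),
                qsum + p * q_lam[i])

    rec(0, ell, 1, 0.0)
    return good / total
```

Since every support is over-counted toward the "fail" side (the cluster maximum over-estimates each $q_k$), the returned fraction is a valid lower bound on the true fraction of supports passing the test.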
a supporting evidence for the probabilistic approach taken in Theorem A, but its numerical output is omitted from our display of experimental results in Section 5.

4.2 A Sampling Algorithm Using Q

An alternative to Conjecture B is a direct evaluation of the $\ell_1$-reconstructible supports $\Gamma$ of cardinality $\ell$, by the following stages:

• We draw $M \gg L$ such supports $\{\Gamma_i\}_{i=1}^M$.

• For each $\Gamma_i$ we seek a partition $\mathcal{I}_i$ that leads to the smallest value of $\sum_{(k,l) \in \mathcal{I}} Q_{k,l}$. While finding the best such partition is combinatorial in complexity, we use an approximate greedy algorithm of complexity $O(\ell^2 \log \ell)$ which computes the following suboptimal partition:

1. Begin with an empty set $\mathcal{I}$ of pairs.
2. Denote by $Q_{res}$ the sub-matrix of $Q$ whose rows and columns consist of only those indices from $\Gamma$ which do not occur in $\mathcal{I}$. Retrieve the couple $(i_0, j_0), (i_1, j_1)$ of index pairs which minimizes the sum $Q(i_0, j_0) + Q(i_1, j_1)$ over $Q_{res}$.
3. Join the couple $(i_0, j_0), (i_1, j_1)$ to $\mathcal{I}$ and return to step 2 while $Q_{res}$ is non-empty.

The algorithm is thus, in a sense, "second-order greedy": at each step the least-sum couple of values from $Q$, rather than the least single value, is extracted. Possibly, better algorithms will improve the performance of this scheme, but we believe it to be quite close to optimal, while keeping computational costs low. The fact that such a partition can be found in $O(\ell^2 \log \ell)$ follows from the next combinatorial claim: let $(i^*, j^*)$ be the index pair of minimal value in the sub-matrix of $Q$ supported on $\Gamma$. Then both $i^*$ and $j^*$ are necessarily present among the indices $(i_0, j_0, i_1, j_1)$ defined above.

• Given the partition $\mathcal{I}$, test whether $\sum_{(k,l) \in \mathcal{I}} Q_{k,l} < \frac{1}{2}$.
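The pairing step can be sketched with a simpler first-order greedy rule (repeatedly extract the single remaining pair with the smallest $Q$-value); the paper's variant is second-order, extracting the least-sum couple of two pairs at each step, and this sketch deliberately keeps the simpler rule for clarity. The function names are ours.

```python
import itertools
import numpy as np

def greedy_pairing(Q, gamma):
    """Approximate the minimal-sum pairing over the support `gamma`
    (even size), first-order greedy: repeatedly extract the remaining
    pair (i, j), i < j, with the smallest value Q[i, j]."""
    remaining = list(gamma)
    pairs, total = [], 0.0
    while remaining:
        i, j = min(itertools.combinations(sorted(remaining), 2),
                   key=lambda p: Q[p[0], p[1]])
        pairs.append((i, j))
        total += Q[i, j]
        remaining.remove(i)
        remaining.remove(j)
    return pairs, total

def support_passes(Q, gamma):
    """Sufficient test of Section 4.2: the support is l1-reconstructible
    if some pairing sums below 1/2; the greedy sum upper-bounds the
    optimum, so passing this test is a valid (conservative) certificate."""
    _, total = greedy_pairing(Q, gamma)
    return total < 0.5
```

Running `support_passes` over $M$ random supports and accumulating the pass rate yields the EF-compB curve of Section 5.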
• Accumulate the relative number of such occurrences over the collection $\{\Gamma_i\}_{i=1}^M$.

The fact that this method relies on capacity values implies that the predicted performance is expected to be weaker compared to the true behavior of BP. Nevertheless, among the various methods discussed thus far, this method is expected to be the most optimistic, because it uses $Q$ and not $q$, and also because it does not build the evaluation through the Tchebychev inequality, which loses part of the tightness as well. However, as opposed to all the other methods described above, this method cannot claim theoretical correctness of its results.

In light of the similarity of the proposed scheme to the pure empirical test, we can make a direct comparison of the computational cost of the two tests. See the details in Section 5.4.

5. Experimental Results

5.1 Test-Cases to Study

We carry out a number of tests on each of the three following dictionaries:

1. D-Random is a dictionary of size $128 \times 256$ which consists of $\ell_2$-normalized random vectors, independently drawn from the normal distribution on the unit sphere. Such a dictionary is often used in numerical experiments as well as in various applications.

2. D-Spoiled is the dictionary D-Random after an operation designed to create a small set of columns with high linear dependence. More precisely, we re-generate a set of 3 columns as random linear combinations of 12 other columns. This dictionary is used to demonstrate the ability of the capacity-sets methods to better handle dictionaries with a non-uniform distribution of inner products.

3. D-DCT is the orthonormal pair $[I, C^*]$ of size $128 \times 256$, where $C$ is the one-dimensional Discrete Cosine basis and $I$ the identity matrix.
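The three test dictionaries can be generated as follows. The spoiling parameters (3 columns rebuilt from 12 others) follow the text, but the specific column choices and the seed are arbitrary illustrations, and `make_dictionaries` is our naming (assuming NumPy and SciPy).

```python
import numpy as np
from scipy.fft import dct

def make_dictionaries(N=128, L=256, seed=0):
    """Construct the three test dictionaries of Section 5.1
    (D-DCT assumes L = 2N)."""
    rng = np.random.default_rng(seed)

    # D-Random: i.i.d. Gaussian columns, l2-normalized (uniform on sphere)
    D_rand = rng.standard_normal((N, L))
    D_rand /= np.linalg.norm(D_rand, axis=0)

    # D-Spoiled: re-generate 3 columns as random combinations of 12 others
    D_spoiled = D_rand.copy()
    cols = rng.choice(L, size=15, replace=False)
    targets, sources = cols[:3], cols[3:]
    for t in targets:
        v = D_spoiled[:, sources] @ rng.standard_normal(12)
        D_spoiled[:, t] = v / np.linalg.norm(v)

    # D-DCT: orthonormal pair [I, C*], C the 1-D orthonormal DCT matrix
    C = dct(np.eye(N), norm="ortho", axis=0)
    D_dct = np.hstack([np.eye(N), C.T])
    return D_rand, D_spoiled, D_dct
```

All three constructions keep the columns $\ell_2$-normalized, as required by the setup of Section 1.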
5.2 Behavior of q and Q

As explained earlier, the passage from the capacity vector q to the matrix Q was motivated by the fact that the values Q_{i,j} provide a lower bound in this context. To exhibit the numerical behavior of these bounds, we compute the mean and the variance of the family of ratios

$$R_{k,l} = \frac{Q_{k,l}}{q_k + q_l} \quad \text{for } k \neq l \in \Omega. \qquad (5.1)$$

The mean and variance of these ratios for the three test cases are given in Table 1.1. As these figures show, we gain up to 30% of the upper bound value by upgrading from the Capacity Vector to the Capacity Matrix. The ratio between the two bounds for the corresponding indices is very stable, as seen from the low values of the standard deviation σ(R).

Dictionary    E(R)     σ(R)
D−Random      0.7175   0.0008
D−Spoiled     0.7154   0.001
D−DCT         0.6509   0.0109

Table 1.1: Behavior of the capacity sets q and Q, evaluated via the mean and variance of the ratios.

To display the power of Conjecture B, we show that E_Q < 2E_q and that either σ²_Q < 2σ²_q or σ²_Q ≪ E²_Q. The corresponding values for various dictionaries are presented in Table 1.2.

Dictionary          E_Q      2E_q     σ²_Q        2σ²_q
D−Random 32×128     0.2329   0.3179   0.5849e-3   0.8252e-3
D−Random 64×128     0.1695   0.2345   0.1405e-3   0.1654e-3
D−Random 128×256    0.1235   0.1721   0.4511e-4   0.5652e-4
D−DCT 64×128        0.1687   0.2586   0.4732e-3   0.0112e-3
D−DCT 128×256       0.1265   0.1943   0.4070e-3   0.4144e-5

Table 1.2: Comparison of the mean and variance of the capacity sets.

Notice that for the D−DCT dictionary the variance of the Capacity Vector is smaller than that of the Capacity Matrix, due to the special structure of this dictionary. Nevertheless, as seen later in the results section, Conjecture B predicts BP success on support sizes larger than those allowed by Theorem A.
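The statistics of Table 1.1 can be reproduced directly from q and Q. A minimal sketch (the function name and variables are ours, not from the paper):

```python
import numpy as np

def ratio_stats(q, Q):
    """Mean and standard deviation of R_{k,l} = Q_{k,l} / (q_k + q_l)
    over all off-diagonal index pairs k != l, as in Eq. (5.1)."""
    L = len(q)
    denom = q[:, None] + q[None, :]   # matrix of q_k + q_l
    mask = ~np.eye(L, dtype=bool)     # exclude the diagonal k == l
    R = Q[mask] / denom[mask]
    return R.mean(), R.std()
```

With a uniform q and Q this returns the expected constant ratio with zero spread, which is a convenient sanity check before running it on a real dictionary's capacity sets.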
5.3 Compared Methods

We perform a number of computations, applying various methods for the estimation of BP performance on the given dictionaries. The results are expressed via a set of Estimation Functions, EF : Ω → R, whose value at ℓ ∈ Ω is the predicted percentage of ℓ-sized supports which admit recovery by ℓ1-norm optimization. The EFs considered are the following:

1. EF-emp - the standard empirical test on the dictionary. This test is done by drawing 1,000 random supports for each cardinality ℓ, generating a corresponding signal, and solving the BP for each. EF-emp is obtained as the relative number of successes in recovering the support.

2. EF-CB - the classical coherence-based upper bound ½(1 + 1/μ), provided by Theorem 2.2.

3. EF-thmA - expresses the results of Theorem A: EF-thmA(ℓ) = P(ℓ), as defined in the statement of the theorem. The values are computed from the capacity vector q of the dictionary.

4. EF-thmB - expresses the results of Conjecture B, computed from the capacity matrix Q of the dictionary.

5. EF-compB - the results of the sampling algorithm based on Q, which support the estimation of Conjecture B (see Section 4.2).

6. EF-GB - the Grassmannian upper bound, computed by the formula of the classical bound using the ideal coherence $\mu = \sqrt{\frac{L-N}{N(L-1)}}$.

This last EF deserves more explanation. Among all possible dictionaries of size N × L, the Grassmannian frame is the one leading to the smallest possible coherence $\mu = \sqrt{\frac{L-N}{N(L-1)}}$ [17]. Thus, it leads to the most optimistic worst-case bound. When the dictionary is "unbalanced", implying a large spread of inner products in the Gram matrix, we know that the mutual-coherence bound deteriorates dramatically.
Thus, by using the Grassmannian bound, we test the best achievable coherence-based performance for the same dictionary size.

5.4 Complexity Analysis of the Methods

We argue for the usefulness of capacity-based numerical algorithms in the evaluation of a given dictionary D. To that end, we consider the computational complexity of each method listed in the previous section.

1. EF-emp - the standard empirical test of D is carried out as follows: for each support size ℓ, pick M ≫ L random subsets Γ of columns of size ℓ. For each Γ, generate a signal with a random coefficients vector supported on Γ and test whether BP recovers the support. Since in practice the maximal relevant size ℓ is proportional to L, the computational complexity of this test is O(M · L · C_LP(L)), where C_LP(L) denotes the complexity of a linear programming algorithm for a problem of size L.

2. EF-CB requires the computation of μ, which takes O(L · N) flops.

3. EF-thmA - to employ the results of Theorem A, the capacity vector q is computed in O(L · C_LP(L)), and then for each ℓ the probability P(ℓ), defined in the statement of Theorem A, is computed in O(L). Overall complexity: O(L² + L · C_LP(L)) = O(L · C_LP(L)).

4. EF-thmB - to employ the results of Conjecture B, the capacity matrix Q is computed in O(L² · C_LP(L)), and then for each ℓ the probability P(ℓ), defined in the statement of Conjecture B, is computed in O(L²). Overall complexity: O(L³ + L² · C_LP(L)) = O(L² · C_LP(L)).

5. EF-compB - our heaviest (and best-performing) algorithm conducts a semi-empirical test: for each support size ℓ, pick M ≫ L random subsets of columns of size ℓ, and employ the analysis detailed in Section 4.2. The computational cost of a single support treatment is O(ℓ² · log(ℓ)).
Overall complexity is O(L² · C_LP(L) + M · L² · log(L)).

As seen from the analysis above, only EF-compB has a non-negligible computational complexity. When comparing EF-emp and EF-compB, we can concentrate on the relative complexities of the linear programming solver versus the O(ℓ² · log(ℓ)) of the partition algorithm, and the benefit of the latter is evident.

5.5 Comparison Results

Figure 1 presents the graphs of the various EF functions described above, for the three dictionaries described at the top of this section. As we see from the left-side graphs, for all the dictionaries the empirically established support size which admits BP recovery is at least 40 columns. Note that this relative number of columns is also predicted in [10]; however, that prediction holds true only asymptotically (for dictionaries of growing sizes) and for specific random dictionaries.

Returning to statements which hold for our modest size of 128 × 256, we notice that the estimation made by the sampling algorithm based on the Capacity Matrix (EF-compB) is much better than the classical bound established so far in the literature. The difference is especially pronounced for the D−Spoiled dictionary, which reflects the fact that methods based on capacity sets handle well a non-uniform distribution of inner products.

On the right side of each figure we display the various methods developed in this work.
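The two coherence-based reference curves, EF-CB and EF-GB, are closed-form quantities and cheap to compute. A sketch (our naming, not the paper's):

```python
import numpy as np

def mutual_coherence(D):
    """mu(D): largest |inner product| between distinct l2-normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)   # ignore the trivial self-products
    return G.max()

def classical_bound(mu):
    """Support-size threshold (1/2)(1 + 1/mu) of the classical coherence bound."""
    return 0.5 * (1.0 + 1.0 / mu)

def grassmannian_mu(N, L):
    """Smallest achievable coherence sqrt((L-N)/(N(L-1))) for an N x L frame [17]."""
    return np.sqrt((L - N) / (N * (L - 1)))
```

For the 128 × 256 case, grassmannian_mu(128, 256) = sqrt(1/255) ≈ 0.0626, so the corresponding classical bound is ≈ 8.5, consistent with the scale of the right-hand panels of Figure 1.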
[Figure 1: Estimation Functions for the three dictionaries of size 128 × 256 (D−Random, D−Spoiled, D−DCT). In each row, the left panel plots EF-CB, EF-compB, and EF-emp for support sizes up to 75, and the right panel plots EF-CB, EF-GB, EF-thmA, EF-thmB, and EF-compB for support sizes up to 14; the axes are support size versus fraction of recovered supports.]

Noticeably, the results of Conjecture B (EF-thmB) are stronger than those of Theorem A (EF-thmA), which is explained by the benefit of using the Capacity Matrix rather than the Capacity Vector. This benefit is expressed in the ratio values given in Tables 1.1 and 1.2 and explained thereafter. Apparently, Conjecture B does not express the full power of the Capacity Matrix estimation, since the sampling algorithm based on its values (EF-compB) outperforms EF-thmB by 15−20%. This algorithm produces values which are quite close to the Grassmannian bound, the best possible bound one can hope to obtain using coherence-based estimation for the given dictionary size.
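The empirical baseline EF-emp, against which the above estimates are measured, reduces to repeatedly solving the Basis Pursuit LP. Below is a sketch of a single trial using SciPy's generic LP solver via the standard split α = u − v, u, v ≥ 0; this assumes scipy is available and is not the authors' original implementation:

```python
import numpy as np
from scipy.optimize import linprog

def bp_solve(D, s):
    """Basis Pursuit: min ||a||_1 s.t. D a = s, as an LP over a = u - v with u, v >= 0."""
    N, L = D.shape
    res = linprog(np.ones(2 * L),                 # objective: sum(u) + sum(v) = ||a||_1
                  A_eq=np.hstack([D, -D]), b_eq=s,
                  bounds=(0, None), method="highs")
    return res.x[:L] - res.x[L:]

def support_recovered(D, support, coeffs, tol=1e-6):
    """Synthesize s from `coeffs` on `support`, run BP, and compare supports."""
    s = D[:, support] @ coeffs
    a = bp_solve(D, s)
    return set(np.flatnonzero(np.abs(a) > tol)) == set(support)
```

Repeating `support_recovered` over M random supports per cardinality ℓ and averaging yields the EF-emp curve; the O(M · L · C_LP(L)) cost quoted in Section 5.4 is exactly the cost of these repeated LP solves.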
We do not have enough information to explain the fact that the values of EF-compB and of the Grassmannian bound nearly coincide for all the dictionaries discussed here (and additional ones examined during this work); discovering the reason underlying this connection may lead to important insights regarding the Basis Pursuit performance.

Appendix A

We prove Claim 3.4.

Theorem C. For the two random variables x_ℓ and y_ℓ defined in 3.3, the following relations between the first and second moments hold:

$$E(x_\ell) = E(y_\ell) \quad \text{and} \quad var(x_\ell) \le var(y_\ell). \qquad (A\text{-}1)$$

Proof: We begin by introducing some notation. Fix the support size 1 ≤ ℓ ≤ L. For any 1 ≤ k ≤ ℓ, we denote by C^k_ℓ the collection of all ℓ-sized non-ordered multisets of indices from Ω (with repetitions) which have precisely k distinct elements each. For instance, {1, 4, 5, 4, 7} and {5, 1, 7, 4, 4} represent the same element of C^4_5. Such a multiset will sometimes be referred to as an "index set". Also, we define D^n_ℓ = C^ℓ_ℓ ∪ C^{ℓ−1}_ℓ ∪ ... ∪ C^{ℓ−n}_ℓ, the collection of all ℓ-sized multisets having at least ℓ − n distinct elements.

In this notation, x_ℓ is a random variable with uniform distribution over the domain D^0_ℓ, which admits the value Σ_{k∈Λ} q_k on a given element Λ ∈ D^0_ℓ. The variable y_ℓ has the same definition on the larger domain D^{ℓ−1}_ℓ, containing the domain of x_ℓ. Therefore, we treat both x_ℓ and y_ℓ as restrictions of the same uniformly distributed random variable x to the corresponding domains: x_ℓ = x|D^0_ℓ, y_ℓ = x|D^{ℓ−1}_ℓ.

In the proof we use the following basic property of the variance:

Proposition 5.1. Let z be a random variable defined over a domain given as the disjoint union D = D_1 ∪ D_2 ∪ ... ∪ D_n, with uniform distribution. Denote v = var(z|D), v_i = var(z|D_i), s_i = |D_i|.
Then (provided the restrictions z|D_i all share the same mean, as is the case in every application below)

$$v = \frac{\sum_{i=1}^{n} s_i v_i}{\sum_{i=1}^{n} s_i}.$$

Part 1. The expectation of the random variable x restricted to D^0_ℓ is computed by

$$E(x|D^0_\ell) = \frac{1}{|D^0_\ell|} \sum_{\Lambda \in D^0_\ell} \sum_{k \in \Lambda} q_k.$$

This sum contains |D^0_ℓ| · ℓ elements, and each q_j, j ∈ Ω, appears in it the same number of times. Therefore, each q_j appears |D^0_ℓ| · ℓ / L times, and we have

$$E(x|D^0_\ell) = \frac{\ell}{L} \sum_{k \in \Omega} q_k = \ell E_q.$$

The mean of x|D^{ℓ−1}_ℓ is computed similarly:

$$E(x|D^{\ell-1}_\ell) = \frac{1}{|D^{\ell-1}_\ell|} \sum_{\Lambda \in D^{\ell-1}_\ell} \sum_{k \in \Lambda} q_k.$$

Here each q_j appears |D^{ℓ−1}_ℓ| · ℓ / L times, and we have

$$E(x|D^{\ell-1}_\ell) = \frac{\ell}{L} \sum_{k \in \Omega} q_k = \ell E_q.$$

This proves our first claim, E(x_ℓ) = E(y_ℓ). For the rest of the proof, where only the variance of the two variables is considered, we assume w.l.o.g. that the expectation of x_ℓ and y_ℓ is zero (in the light of the equality var(z) = var(z − E(z)) for any random variable z); that is, E_q = 0.

Part 2. We consider the extension of x, defined so far on a domain comprising distinct ℓ-sized index sets, to a domain where each such set may appear any finite number of times; x still has a uniform distribution over this collection. Thus, a disjoint union of two or more (not necessarily distinct) index sets is a sub-domain to which x may be restricted. For any 0 ≤ n < ℓ, we define the two disjoint unions

$$A_n = \bigcup_{\Gamma \in D^n_{\ell-1}} \{\Gamma \cup \{j\} \mid j \in \Gamma\}, \qquad B_n = \bigcup_{\Gamma \in D^n_{\ell-1}} \{\Gamma \cup \{j\} \mid j \in \Omega\}.$$

(In the definition of A_n, the set Γ ∪ {j} is added to the collection one time for each appearance of j in Γ.)

Let Λ ∈ C^k_ℓ be a set which contains the distinct indices j_1, ..., j_k with multiplicities m_1, ..., m_k (so that Σ_{i=1}^{k} m_i = ℓ). For each 1 ≤ i ≤ k, Λ is obtained in A_n exactly m_i − 1 times, in the form Γ ∪ {j_i} for an appropriate Γ = Γ_i ∈ C^k_{ℓ−1} (this claim also holds vacuously for m_i = 1).
Therefore, the number of copies of Λ in A_n equals Σ_{i=1}^{k} (m_i − 1) = ℓ − k. Also, Λ appears in B_n precisely once for each of j_1, ..., j_k, in the form Γ ∪ {j_i} (for an appropriate Γ = Γ_i each time); therefore B_n contains k copies of Λ.

Denote by a · C the disjoint union of a distinct copies of a collection C. Then we can write A_n and B_n as

$$A_n = 0 \cdot C^\ell_\ell \cup 1 \cdot C^{\ell-1}_\ell \cup ... \cup n \cdot C^{\ell-n}_\ell \qquad (A\text{-}2)$$

$$B_n = \ell \cdot C^\ell_\ell \cup (\ell-1) \cdot C^{\ell-1}_\ell \cup ... \cup (\ell-n) \cdot C^{\ell-n}_\ell \qquad (A\text{-}3)$$

We prove the following inequality: var(x|B_n) ≤ var(x|A_n). Since E_q = 0 by our assumption, the expectations of x|A_n and x|B_n also equal zero: by an argument similar to the one presented in the first part of the proof, E(x|A_n) = E(x|B_n) = ℓ · E_q = 0. Thus we have

$$var(x|A_n) = \frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \frac{1}{\ell-1} \sum_{j \in \Gamma} \Big(\sum_{k \in \Gamma} q_k + q_j\Big)^2.$$

For brevity we introduce the notation q_Γ = Σ_{k∈Γ} q_k. Then var(x|A_n) reads

$$var(x|A_n) = \frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \frac{1}{\ell-1} \sum_{j \in \Gamma} (q_\Gamma^2 + q_j^2 + 2 q_\Gamma q_j) = \frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \Big( q_\Gamma^2 + \frac{1}{\ell-1} \sum_{j \in \Gamma} (q_j^2 + 2 q_\Gamma q_j) \Big).$$

Similarly, we have

$$var(x|B_n) = \frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \frac{1}{L} \sum_{j \in \Omega} \Big(\sum_{k \in \Gamma} q_k + q_j\Big)^2 = \frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \Big( q_\Gamma^2 + \frac{1}{L} \sum_{j \in \Omega} (q_j^2 + 2 q_\Gamma q_j) \Big).$$

The summand $\frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} q_\Gamma^2$ appears in both expressions, hence cancels out. Consider the term $\frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \frac{1}{\ell-1} \sum_{j \in \Gamma} q_j^2$ in var(x|A_n): the element q_a² appears in it the same number of times for every a ∈ Ω, hence this term equals $\frac{1}{L} \sum_{a \in \Omega} q_a^2$. By the same argument, in the expression of var(x|B_n) we have $\frac{1}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} \frac{1}{L} \sum_{j \in \Omega} q_j^2 = \frac{1}{L} \sum_{a \in \Omega} q_a^2$, hence this quadratic term also cancels out.
In the light of these observations, we obtain

$$var(x|A_n) - var(x|B_n) = \frac{2}{|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} q_\Gamma \Big( \frac{1}{\ell-1} \sum_{i \in \Gamma} q_i - \frac{1}{L} \sum_{j \in \Omega} q_j \Big).$$

Here we substitute again q_Γ for Σ_{i∈Γ} q_i and recall that $\frac{1}{L}\sum_{j\in\Omega} q_j = E_q = 0$. Thus we have

$$var(x|A_n) - var(x|B_n) = \frac{2}{(\ell-1)|D^n_{\ell-1}|} \sum_{\Gamma \in D^n_{\ell-1}} q_\Gamma^2 \ge 0.$$

In order to use this result for the proof of the theorem, we make the following observations. Denote v_n = var(x|C^n_ℓ) and s_n = |C^n_ℓ|. By virtue of the decomposition (A-2), var(x|A_n) can be written as

$$var(x|A_n) = \frac{\sum_{i=0}^{n} i \cdot s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} i \cdot s_{\ell-i}}$$

(see Proposition 5.1). Similarly, we have

$$var(x|B_n) = \frac{\sum_{i=0}^{n} (\ell-i) \cdot s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} (\ell-i) \cdot s_{\ell-i}}.$$

We compute the coefficients of the v_i in the expression

$$var(x|A_n) - var(x|B_n) = \frac{\sum_{i=0}^{n} i \cdot s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} i \cdot s_{\ell-i}} - \frac{\sum_{i=0}^{n} (\ell-i) \cdot s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} (\ell-i) \cdot s_{\ell-i}}.$$

For any 0 ≤ k ≤ n, the coefficient of v_{ℓ−k} is

$$\frac{1}{Den} \Big( s_{\ell-k} \, k \sum_{i=0}^{n} (\ell-i) s_{\ell-i} - (\ell-k) s_{\ell-k} \sum_{i=0}^{n} i \cdot s_{\ell-i} \Big) = \frac{1}{Den} \, \ell \, s_{\ell-k} \sum_{i=0}^{n} (k-i) s_{\ell-i},$$

with

$$Den = \sum_{i=0}^{n} i \cdot s_{\ell-i} \cdot \sum_{i=0}^{n} (\ell-i) \cdot s_{\ell-i}.$$

We denote $\alpha_{\ell-k} = \ell \sum_{i=0}^{n} (k-i) s_{\ell-i}$ for 0 ≤ k ≤ n, in order to write the above difference as

$$0 \le var(x|A_n) - var(x|B_n) = \frac{1}{Den} \sum_{k=0}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad (A\text{-}4)$$

The constant 1/Den is positive, since n < ℓ; thus it can be omitted while preserving the inequality:

$$0 \le \sum_{k=0}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad (A\text{-}5)$$

The coefficients in this expression have the two following properties:

1. $\sum_{k=0}^{n} s_{\ell-k} \alpha_{\ell-k} = 0$.
2. For all j, $\alpha_{j-1} - \alpha_j = \ell \sum_{i=0}^{n} s_{\ell-i}$.

To show the first equality, we consider the sum in (1) as a linear combination of the elements s_{ℓ−i} s_{ℓ−j}, i, j = 0, ..., n.
The coefficient of s_{ℓ−i} s_{ℓ−i} is zero for any i. For any i ≠ j, the product s_{ℓ−i} s_{ℓ−j} appears in just two components of the sum above, namely s_{ℓ−i} α_{ℓ−i} and s_{ℓ−j} α_{ℓ−j}. Specifically, α_{ℓ−i} contains the summand ℓ(i − j)s_{ℓ−j}, and α_{ℓ−j} contains the summand ℓ(j − i)s_{ℓ−i}; therefore in the sum s_{ℓ−i} α_{ℓ−i} + s_{ℓ−j} α_{ℓ−j} the coefficient of s_{ℓ−i} s_{ℓ−j} is zero. The second property follows directly from the definition of the α_i.

In the light of the first property, (A-5) can be written as

$$\Big( \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} \Big) v_\ell \le \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}. \qquad (A\text{-}6)$$

Equipped with these observations, we prove by induction on n the inequality

$$var(x|D^0_\ell) \le var(x|D^n_\ell)$$

for any n = 1, ..., ℓ − 1; the theorem follows for n = ℓ − 1. By Proposition 5.1,

$$var(x|D^n_\ell) = \frac{\sum_{i=0}^{n} s_{\ell-i} v_{\ell-i}}{\sum_{i=0}^{n} s_{\ell-i}},$$

and var(x|D^0_ℓ) is just v_ℓ. Thus we need to prove that v_ℓ does not exceed the right-hand side above, or equivalently

$$\Big( \sum_{i=1}^{n} s_{\ell-i} \Big) v_\ell \le \sum_{i=1}^{n} s_{\ell-i} v_{\ell-i}. \qquad (A\text{-}7)$$

For n = 1, (A-6) reads α_{ℓ−1} s_{ℓ−1} v_ℓ ≤ α_{ℓ−1} s_{ℓ−1} v_{ℓ−1}. Here α_{ℓ−1} = ℓ s_ℓ > 0, thus we obtain the inequality s_{ℓ−1} v_ℓ ≤ s_{ℓ−1} v_{ℓ−1}, as required.

Now, we assume by induction that inequality (A-7) holds up to n − 1 and prove it for n. We use (A-6):

$$(E1): \quad \Big( \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} \Big) v_\ell \le \sum_{k=1}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$

This inequality undergoes a series of transformations designed to bring it to the form of (A-7). First, we have α_{ℓ−1} < α_{ℓ−2}. Since v_ℓ ≤ v_{ℓ−1} by the proof for n = 1, we have the inequality

$$(d1): \quad (\alpha_{\ell-2} - \alpha_{\ell-1}) s_{\ell-1} v_\ell \le (\alpha_{\ell-2} - \alpha_{\ell-1}) s_{\ell-1} v_{\ell-1}.$$

Adding (d1) to the inequality (E1), we arrive at

$$(E2): \quad \Big( \alpha_{\ell-2}(s_{\ell-1} + s_{\ell-2}) + \sum_{k=3}^{n} \alpha_{\ell-k} s_{\ell-k} \Big) v_\ell \le \alpha_{\ell-2}(s_{\ell-1} v_{\ell-1} + s_{\ell-2} v_{\ell-2}) + \sum_{k=3}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$
Second, by the induction assumption for n = 2 we have the inequality (s_{ℓ−1} + s_{ℓ−2}) v_ℓ ≤ s_{ℓ−1} v_{ℓ−1} + s_{ℓ−2} v_{ℓ−2}. Also, α_{ℓ−2} ≤ α_{ℓ−3}, as noticed earlier. Then we can construct the next inequality in order to add it to (E2):

$$(d2): \quad (\alpha_{\ell-3} - \alpha_{\ell-2})(s_{\ell-1} + s_{\ell-2}) v_\ell \le (\alpha_{\ell-3} - \alpha_{\ell-2})(s_{\ell-1} v_{\ell-1} + s_{\ell-2} v_{\ell-2}).$$

This results in the following expression:

$$(E3): \quad \Big( \alpha_{\ell-3} \sum_{i=1}^{3} s_{\ell-i} + \sum_{k=4}^{n} \alpha_{\ell-k} s_{\ell-k} \Big) v_\ell \le \alpha_{\ell-3} \sum_{i=1}^{3} s_{\ell-i} v_{\ell-i} + \sum_{k=4}^{n} \alpha_{\ell-k} s_{\ell-k} v_{\ell-k}.$$

In this fashion we make n − 1 steps, resulting in the inequality

$$(E(n)): \quad \Big( \alpha_{\ell-n} \sum_{i=1}^{n} s_{\ell-i} \Big) v_\ell \le \alpha_{\ell-n} \sum_{i=1}^{n} s_{\ell-i} v_{\ell-i}.$$

Notice that α_{ℓ−n} is positive: α_{ℓ−n} = ℓ(n s_ℓ + (n−1) s_{ℓ−1} + ... + s_{ℓ−n+1}). Dividing (E(n)) by α_{ℓ−n}, we obtain the desired result. As mentioned, the theorem follows for n = ℓ − 1. ✷

Appendix B

We prove the equality of expectations

$$E(x_\ell) = E(y_\ell), \qquad (B\text{-}1)$$

for the random variables x_ℓ and y_ℓ defined in the proof of Conjecture B. Recall that y_ℓ is a sum of ℓ/2 values from Q, uniformly distributed over this matrix; therefore E(y_ℓ) = (ℓ/2) E_Q. We show that E(x_ℓ) = (ℓ/2) E_Q too, by considerations of symmetry similar to those used in the proof of Theorem A, Part 1.

Namely, we consider the totality P_ℓ of partitions of all ℓ-sized supports Λ ⊂ Ω into ordered pairs of indices. An element of this collection is therefore a pair (Λ, I_Λ). We clarify that the index sets Λ ⊂ Ω are chosen without repetitions and up to a permutation of their elements. Now, let (i, j) be an ordered pair of indices from Ω. We argue that the number of appearances of this pair in the elements of P_ℓ does not depend on the choice of i and j.
Indeed, this number is just the size of the collection P_{ℓ−2} built for the submatrix of Q with the i-th and j-th rows and columns removed. Since x_ℓ(Λ, I_Λ) is the sum Σ_{(i,j)∈I_Λ} Q(i, j), we conclude that all the elements Q(i, j) contribute to the value of x_ℓ with equal probability, hence E(x_ℓ) = (ℓ/2) E_Q, as desired.

Now we provide empirical evidence for the claim

$$var(x_\ell) \le var(y_\ell). \qquad (B\text{-}2)$$

The statistical data below support this inequality. While the variance of y_ℓ is known precisely, for x_ℓ we estimate it by drawing 10⁴ random subsets of indices for each support size up to half the signal dimension of the dictionary. The computation is carried out for a number of sizes of the dictionary D−Random; results are presented in Figure 2. As can be seen from these figures, the gap between var(x_ℓ) and var(y_ℓ) is roughly proportional to the support size. The same experiments on the dictionary D−DCT display different results: the variances of the two variables coincide. As the number of samples grows, we observe that the difference of the variance values, for all support sizes, tends to zero. We conclude that for this specific dictionary, (B-2) is an equality.

[Figure 2: The variances of x_ℓ and y_ℓ (scaled by 10³), plotted against half the support size, for D−Random of sizes 32×64, 64×128, and 128×256.]

References

[1] E.J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. on Information Theory, vol. 52(2), pp. 489–509, 2006.

[2] E. Candès, J.
Romberg, Quantitative robust uncertainty principles and optimally sparse decompositions, Foundations of Computational Mathematics, vol. 6, Issue 2, April 2006.

[3] E.J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory, vol. 51, pp. 4203–4215, 2005.

[4] S.S. Chen, Basis pursuit, Ph.D. dissertation, Stanford Univ., Stanford, CA, Nov. 1995.

[5] S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Scientific Computing, vol. 20, no. 1, pp. 33–61, 1999.

[6] D.L. Donoho, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, Communications On Pure And Applied Mathematics, vol. 59, no. 6, pp. 797–829, June 2006.

[7] D.L. Donoho, Neighborly polytopes and sparse solution of underdetermined linear equations, Technical report, Stanford University, Department of Statistics, #2005-04, 2005.

[8] D.L. Donoho and M. Elad, Optimally sparse representation in general (non-orthogonal) dictionaries via l1 minimization, Proceedings of the National Academy of Sciences, vol. 100(5), pp. 2197–2202, 2003.

[9] D.L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. On Information Theory, 47(7):2845–2862, 2001.

[10] D.L. Donoho and J. Tanner, Thresholds for the recovery of sparse solutions via L1 minimization, 40th Annual Conference on Information Sciences and Systems, 2006.

[11] M. Elad and A.M. Bruckstein, A generalized uncertainty principle and sparse representations in pairs of bases, IEEE Trans. Info. Thry, vol. 49, pp. 2558–2567, 2002.

[12] J.J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE Trans. on Information Theory, vol. 50(6), pp. 1341–1344, 2004.

[13] R. Gribonval and M. Nielsen, Sparse decompositions in unions of bases, IEEE Trans.
on Information Theory, vol. 49(12), pp. 3320–3325, 2003.

[14] G.H. Hardy, J.E. Littlewood, and G. Pólya, Inequalities, 2nd ed., Cambridge, England: Cambridge University Press, pp. 43–45 and 123, 1988.

[15] D.M. Malioutov, M. Cetin, and A.S. Willsky, Optimal sparse representations in general overcomplete bases, IEEE International Conference on Acoustics, Speech, and Signal Processing - ICASSP, May 2004, Montreal, Canada.

[16] B.K. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing, vol. 24, pp. 227–234, 1995.

[17] T. Strohmer and R.W. Heath, Jr., Grassmannian frames with applications to coding and communications, Applied and Computational Harmonic Analysis, vol. 14, Issue 3, pp. 257–275, May 2003.

[18] T. Tao, An uncertainty principle for cyclic groups of prime order, Math. Res. Lett., vol. 12, no. 1, pp. 121–127, 2005.

[19] J. Tropp, Random subdictionaries of random dictionaries, Preprint, 2006.

[20] J. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2231–2242, October 2004.