On Recovery of Sparse Signals via $\ell_1$ Minimization



T. Tony Cai∗, Guangwu Xu† and Jun Zhang‡

∗ Department of Statistics, The Wharton School, University of Pennsylvania, PA, USA; e-mail: tcai@wharton.upenn.edu. Research supported in part by NSF Grant DMS-0604954.
† Department of EE & CS, University of Wisconsin-Milwaukee, WI, USA; e-mail: gxu4uwm@uwm.edu
‡ Department of EE & CS, University of Wisconsin-Milwaukee, WI, USA; e-mail: junzhang@uwm.edu

Abstract: This article considers constrained $\ell_1$ minimization methods for the recovery of high dimensional sparse signals in three settings: noiseless, bounded error and Gaussian noise. A unified and elementary treatment is given in these noise settings for two $\ell_1$ minimization methods: the Dantzig selector and $\ell_1$ minimization with an $\ell_2$ constraint. The results of this paper improve the existing results in the literature by weakening the conditions and tightening the error bounds. The improvement on the conditions shows that signals with larger support can be recovered accurately. This paper also establishes connections between the restricted isometry property and the mutual incoherence property. Some results of Candes, Romberg and Tao (2006) and Donoho, Elad, and Temlyakov (2006) are extended.

Keywords: Dantzig selector, $\ell_1$ minimization, Lasso, overcomplete representation, sparse recovery, sparsity.

1 Introduction

The problem of recovering a high-dimensional sparse signal based on a small number of measurements, possibly corrupted by noise, has attracted much recent attention. This problem arises in many different settings, including model selection in linear regression, constructive approximation, inverse problems, and compressive sensing. Suppose we have $n$ observations of the form
$$ y = F\beta + z \qquad (1.1) $$
where the matrix $F \in \mathbb{R}^{n \times p}$ with $n \ll p$ is given and $z \in \mathbb{R}^n$ is a vector of measurement errors. The goal is to reconstruct the unknown vector $\beta \in \mathbb{R}^p$. Depending on the setting, the error vector $z$ can be zero (the noiseless case), bounded, or Gaussian with $z \sim N(0, \sigma^2 I_n)$. It is now well understood that $\ell_1$ minimization provides an effective way to reconstruct a sparse signal in all three settings.

A special case of particular interest is when no noise is present in (1.1) and $y = F\beta$. This is an underdetermined system of linear equations with more variables than equations. The problem is ill-posed and there are generally infinitely many solutions. However, in many applications the vector $\beta$ is known to be sparse or nearly sparse in the sense that it contains only a small number of nonzero entries. This sparsity assumption fundamentally changes the problem, making a unique solution possible. Indeed, in many cases the unique sparse solution can be found exactly through $\ell_1$ minimization:
$$ (P) \qquad \min \|\gamma\|_1 \quad \text{subject to } F\gamma = y. \qquad (1.2) $$
This $\ell_1$ minimization problem has been studied, for example, in Fuchs [11], Candes and Tao [4] and Donoho [6]. Understanding the noiseless case is not only of significant interest in its own right; it also provides deep insight into the problem of reconstructing sparse signals in the noisy case. See, for example, Candes and Tao [4, 5] and Donoho [6, 7].
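Although the paper's focus is theoretical, (P) is easy to experiment with: it is equivalent to a linear program, obtained by introducing slack variables $u \ge |\gamma|$. The following is a minimal sketch of that reformulation (not code from the paper; the random $F$ and the 2-sparse test signal are arbitrary illustrations), using scipy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(F, y):
    """Solve (P): min ||gamma||_1 subject to F @ gamma = y,
    via the standard LP reformulation with slacks u >= |gamma|."""
    n, p = F.shape
    # Variables x = [gamma; u]; minimize sum(u).
    c = np.concatenate([np.zeros(p), np.ones(p)])
    # gamma - u <= 0 and -gamma - u <= 0 encode |gamma| <= u.
    I = np.eye(p)
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * p)
    # Equality constraint F gamma = y.
    A_eq = np.hstack([F, np.zeros((n, p))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * p + [(0, None)] * p)
    return res.x[:p]

# Tiny illustration: a 2-sparse signal recovered from n << p measurements.
rng = np.random.default_rng(0)
F = rng.standard_normal((20, 50)) / np.sqrt(20)
beta = np.zeros(50)
beta[[3, 17]] = [1.0, -2.0]
beta_hat = basis_pursuit(F, F @ beta)
print(np.max(np.abs(beta_hat - beta)))  # small up to solver tolerance
```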
When noise is present, there are two well-known $\ell_1$ minimization methods. One is $\ell_1$ minimization under an $\ell_2$ constraint on the residuals:
$$ (P_1) \qquad \min \|\gamma\|_1 \quad \text{subject to } \|y - F\gamma\|_2 \le \epsilon. \qquad (1.3) $$
Writing in terms of the Lagrangian function of $(P_1)$, this is closely related to solving the $\ell_1$ regularized least squares problem
$$ \min_{\gamma} \big\{ \|y - F\gamma\|_2^2 + \rho \|\gamma\|_1 \big\}. \qquad (1.4) $$
The latter is often called the Lasso in the statistics literature (Tibshirani [13]). Tropp [14] gave a detailed treatment of the $\ell_1$ regularized least squares problem.

Another method, called the Dantzig selector, was recently proposed by Candes and Tao [5]. The Dantzig selector solves the sparse recovery problem through $\ell_1$ minimization with a constraint on the correlation between the residuals and the column vectors of $F$:
$$ (DS) \qquad \min_{\gamma} \|\gamma\|_1 \quad \text{subject to } \|F^T(y - F\gamma)\|_\infty \le \lambda. \qquad (1.5) $$
Candes and Tao [5] showed that the Dantzig selector can be computed by solving a linear program, and that it mimics the performance of an oracle procedure up to a logarithmic factor $\log p$.
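Both $(P_1)$ and (DS) are convex programs and can be prototyped directly with a generic convex-optimization modeling layer. The sketch below assumes the third-party cvxpy package; the function names and the idea of passing $\lambda$ and $\epsilon$ as arguments are illustrative choices, not anything prescribed by the paper:

```python
import cvxpy as cp

def l2_constrained_l1(F, y, eps):
    """(P1): min ||gamma||_1 subject to ||y - F gamma||_2 <= eps."""
    gamma = cp.Variable(F.shape[1])
    prob = cp.Problem(cp.Minimize(cp.norm1(gamma)),
                      [cp.norm(y - F @ gamma, 2) <= eps])
    prob.solve()
    return gamma.value

def dantzig_selector(F, y, lam):
    """(DS): min ||gamma||_1 subject to ||F^T (y - F gamma)||_inf <= lam."""
    gamma = cp.Variable(F.shape[1])
    prob = cp.Problem(cp.Minimize(cp.norm1(gamma)),
                      [cp.norm(F.T @ (y - F @ gamma), "inf") <= lam])
    prob.solve()
    return gamma.value
```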
It is clear that regularity conditions are needed in order for these problems to be well behaved. Over the last few years, many interesting results for recovering sparse signals have been obtained in the framework of the Restricted Isometry Property (RIP). In their seminal work [4, 5], Candes and Tao considered sparse recovery problems in the RIP framework. They provided beautiful solutions to the problem under conditions on the restricted isometry constant and the restricted orthogonality constant (defined in Section 2). Several different conditions have been imposed in various settings.

In this paper, we consider $\ell_1$ minimization methods for the sparse recovery problem in three cases: noiseless, bounded error and Gaussian noise. Both the Dantzig selector (DS) and $\ell_1$ minimization under the $\ell_2$ constraint $(P_1)$ are considered. We give a unified and elementary treatment of the two methods under the three noise settings. Our results improve on the existing results in [2, 3, 4, 5] by weakening the conditions and tightening the error bounds. In all cases we solve the problems under the weaker condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$, where $k$ is the sparsity index and $\delta$ and $\theta$ are respectively the restricted isometry constant and the restricted orthogonality constant defined in Section 2. The improvement on the condition shows that signals with larger support can be recovered. Although our main interest is in recovering sparse signals, we state the results in the general setting of reconstructing an arbitrary signal.

Another widely used condition for sparse recovery is the so-called Mutual Incoherence Property (MIP), which requires the pairwise correlations among the column vectors of $F$ to be small. See [8, 9, 11, 12, 14]. We establish connections between the concepts of RIP and MIP. As an application, we present an improvement to a recent result of Donoho, Elad, and Temlyakov [8].

The paper is organized as follows. In Section 2, after basic notation and definitions are reviewed, two elementary inequalities, which allow us to make a finer analysis of the sparse recovery problem, are introduced. We begin the analysis of $\ell_1$ minimization methods for sparse recovery by considering exact recovery in the noiseless case in Section 3. Our result improves the main result in Candes and Tao [4] by using weaker conditions and providing tighter error bounds. The analysis of the noiseless case provides insight into the case when the observations are contaminated by noise. We then consider the case of bounded error in Section 4. The connections between the RIP and MIP are also explored there. The case of Gaussian noise is treated in Section 5. The Appendix contains the proofs of some technical results.

2 Preliminaries

In this section we first introduce basic notation and definitions, and then develop some technical inequalities which will be used in proving our main results.

Let $p \in \mathbb{N}$ and let $v = (v_1, v_2, \cdots, v_p) \in \mathbb{R}^p$ be a vector. The support of $v$ is the subset of $\{1, 2, \cdots, p\}$ defined by $\mathrm{supp}(v) = \{i : v_i \neq 0\}$. For an integer $k \in \mathbb{N}$, a vector $v$ is said to be $k$-sparse if $|\mathrm{supp}(v)| \le k$. For a given vector $v$ we shall denote by $v_{\max(k)}$ the vector $v$ with all but the $k$ largest entries (in absolute value) set to zero, and define $v_{-\max(k)} = v - v_{\max(k)}$, the vector $v$ with the $k$ largest entries (in absolute value) set to zero. We shall use the standard notation $\|v\|_q$ to denote the $\ell_q$-norm of the vector $v$.

Let the matrix $F \in \mathbb{R}^{n \times p}$ and $1 \le k \le p$. The $k$-restricted isometry constant $\delta_k$ of $F$ is defined to be the smallest constant such that
$$ \sqrt{1 - \delta_k}\,\|c\|_2 \le \|Fc\|_2 \le \sqrt{1 + \delta_k}\,\|c\|_2 \qquad (2.1) $$
for every vector $c$ which is $k$-sparse. If $k + k' \le p$, we can define another quantity, the $(k, k')$-restricted orthogonality constant $\theta_{k,k'}$, as the smallest number that satisfies
$$ |\langle Fc, Fc' \rangle| \le \theta_{k,k'}\,\|c\|_2\,\|c'\|_2 \qquad (2.2) $$
for all $c$ and $c'$ such that $c$ and $c'$ are $k$-sparse and $k'$-sparse respectively and have disjoint supports. Candes and Tao [4] showed that the constants $\delta_k$ and $\theta_{k,k'}$ are related by the following inequalities:
$$ \theta_{k,k'} \le \delta_{k+k'} \le \theta_{k,k'} + \max(\delta_k, \delta_{k'}). $$

Another useful property is as follows.

Proposition 2.1 If $k + \sum_{i=1}^{l} k_i \le p$, then
$$ \theta_{k,\,\sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}. $$
In particular, $\theta_{k,\,\sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \delta_{k+k_i}^2}$.

Proof of Proposition 2.1. Let $c$ be $k$-sparse and $c'$ be $(\sum_{i=1}^{l} k_i)$-sparse, and suppose their supports are disjoint. Decompose $c'$ as $c' = c'_1 + c'_2 + \cdots + c'_l$ such that $c'_i$ is $k_i$-sparse for $i = 1, \cdots, l$ and $\mathrm{supp}(c'_i) \cap \mathrm{supp}(c'_j) = \emptyset$ for $i \neq j$. We have, using the Cauchy-Schwarz inequality in the last step,
$$ |\langle Fc, Fc' \rangle| = \Big|\Big\langle Fc, \sum_{i=1}^{l} Fc'_i \Big\rangle\Big| \le \sum_{i=1}^{l} |\langle Fc, Fc'_i \rangle| \le \sum_{i=1}^{l} \theta_{k,k_i}\,\|c\|_2\,\|c'_i\|_2 \le \|c\|_2 \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}\,\sqrt{\sum_{i=1}^{l} \|c'_i\|_2^2} = \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}\;\|c\|_2\,\|c'\|_2. $$
This yields $\theta_{k,\sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \theta_{k,k_i}^2}$. Since $\theta_{k,k'} \le \delta_{k+k'}$, we also have $\theta_{k,\sum_{i=1}^{l} k_i} \le \sqrt{\sum_{i=1}^{l} \delta_{k+k_i}^2}$.

Remark: Different conditions on $\delta$ and $\theta$ have been used in the literature. For example, Candes and Tao [5] impose $\delta_{2k} + \theta_{k,2k} < 1$ and Candes [2] uses $\delta_{2k} < \sqrt{2} - 1$. A direct consequence of Proposition 2.1 is that $\delta_{2k} < \sqrt{2} - 1$ is in fact a strictly stronger condition than $\delta_{2k} + \theta_{k,2k} < 1$: Proposition 2.1 yields $\theta_{k,2k} \le \sqrt{\delta_{2k}^2 + \delta_{2k}^2} = \sqrt{2}\,\delta_{2k}$, which means that $\delta_{2k} < \sqrt{2} - 1$ implies $\delta_{2k} + \theta_{k,2k} < 1$.
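For small problems, the constants in (2.1) and (2.2) can be computed by exhaustive search over supports, which gives a convenient way to check statements such as Proposition 2.1 numerically. A brute-force sketch follows (exponential in $p$, so only for toy sizes; the random matrix is an arbitrary example, not one from the paper):

```python
import itertools
import numpy as np

def delta_k(F, k):
    """k-restricted isometry constant: largest deviation of the squared
    singular values of any n-by-k column submatrix of F from 1."""
    p = F.shape[1]
    worst = 0.0
    for S in itertools.combinations(range(p), k):
        s = np.linalg.svd(F[:, list(S)], compute_uv=False)
        worst = max(worst, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return worst

def theta_kk(F, k, kp):
    """(k, k')-restricted orthogonality constant: largest spectral norm of a
    cross-Gram block F_S^T F_T over disjoint supports |S| = k, |T| = k'."""
    p = F.shape[1]
    worst = 0.0
    for S in itertools.combinations(range(p), k):
        rest = [j for j in range(p) if j not in S]
        for T in itertools.combinations(rest, kp):
            G = F[:, list(S)].T @ F[:, list(T)]
            worst = max(worst, np.linalg.norm(G, 2))
    return worst

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 10)) / np.sqrt(8)
# Check theta_{k,k'} <= delta_{k+k'} and Proposition 2.1 with l = 2, k_1 = k_2 = 1.
print(theta_kk(F, 2, 2), "<=", delta_k(F, 4))
print(theta_kk(F, 2, 2), "<=", np.sqrt(2 * theta_kk(F, 2, 1) ** 2))
```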
We now introduce two useful elementary inequalities. These inequalities allow us to perform finer estimation on $\ell_1$ and $\ell_2$ norms.

Proposition 2.2 Let $w$ be a positive integer. For any descending chain of real numbers
$$ a_1 \ge a_2 \ge \cdots \ge a_w \ge a_{w+1} \ge \cdots \ge a_{2w} \ge 0, $$
we have
$$ \sqrt{a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2} \;\le\; \frac{a_1 + a_2 + \cdots + a_w + a_{w+1} + \cdots + a_{2w}}{2\sqrt{w}}. $$

Proof of Proposition 2.2. Since $a_i \ge a_j \ge 0$ for $i < j$, we have
$$ (a_1 + a_2 + \cdots + a_{2w})^2 = a_1^2 + a_2^2 + \cdots + a_{2w}^2 + 2\sum_{i<j} a_i a_j \;\ge\; \sum_{j=1}^{2w} (2j-1)\,a_j^2, $$
where the inequality uses $a_i a_j \ge a_j^2$ for $i < j$. Moreover, $a_j^2 \ge a_{2w+1-j}^2$ for $j \le w$, so the coefficient of $a_{w+j}^2$ ($1 \le j \le w$) can be raised to $(2(w+j)-1) + (2(w-j)+1) = 4w$, and hence
$$ (a_1 + a_2 + \cdots + a_{2w})^2 \ge 4w\,(a_{w+1}^2 + \cdots + a_{2w}^2). $$
Taking square roots yields the claimed inequality.

A refinement of the same idea, which weights the middle block more heavily, is the following.

Proposition 2.3 Let $w$ be a positive integer. For any descending chain of real numbers $a_1 \ge a_2 \ge \cdots \ge a_{3w} \ge 0$, we have
$$ \sqrt{a_{w+1}^2 + \cdots + a_{3w}^2} \;\le\; \frac{(a_1 + \cdots + a_w) + 2(a_{w+1} + \cdots + a_{2w}) + (a_{2w+1} + \cdots + a_{3w})}{2\sqrt{2w}}. $$
The proof of Proposition 2.3 is given in the Appendix.
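Propositions 2.2 and 2.3 are finite inequalities, so they can be sanity-checked numerically. A quick sketch over random descending chains (the small tolerance guards against floating-point rounding; both inequalities hold with equality when all $a_i$ are equal):

```python
import numpy as np

rng = np.random.default_rng(2)
w = 5
for _ in range(1000):
    a = np.sort(rng.random(3 * w))[::-1]  # descending chain a_1 >= ... >= a_{3w} >= 0
    # Proposition 2.2 on the first 2w terms.
    lhs22 = np.sqrt(np.sum(a[w:2 * w] ** 2))
    rhs22 = np.sum(a[:2 * w]) / (2 * np.sqrt(w))
    # Proposition 2.3 on all 3w terms, with weight 2 on the middle block.
    lhs23 = np.sqrt(np.sum(a[w:] ** 2))
    rhs23 = (np.sum(a[:w]) + 2 * np.sum(a[w:2 * w]) + np.sum(a[2 * w:])) / (2 * np.sqrt(2 * w))
    assert lhs22 <= rhs22 + 1e-12 and lhs23 <= rhs23 + 1e-12
print("both inequalities held on 1000 random descending chains")
```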
3 Recovery of Sparse Signals in the Noiseless Case

In this section we consider exact recovery of sparse signals in the noiseless case, $y = F\beta$. Candes and Tao [4] proved the following result.

Theorem 3.1 (Candes and Tao [4]) Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta$. If $\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$, then $\beta$ is the unique minimizer of $(P)$.

Our result weakens the condition and treats general, not necessarily $k$-sparse, signals.

Theorem 3.2 Suppose $\beta \in \mathbb{R}^p$ and $y = F\beta$. If $\delta_{1.5k} + \theta_{k,1.5k} < 1$, then the minimizer $\hat{\beta}$ of $(P)$ obeys
$$ \|\hat{\beta} - \beta\|_2 \le \frac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2}\, \|\beta_{-\max(k)}\|_1. $$
In particular, if $\beta$ is $k$-sparse, then $\hat{\beta} = \beta$, i.e., $(P)$ recovers $\beta$ exactly.

Proof of Theorem 3.2. Let $\hat{\beta}$ be the minimizer of $(P)$ and write $h = \hat{\beta} - \beta$. Let $T_0$ be the set of indices of the $k$ largest entries of $\beta$ in absolute value, and arrange the remaining indices as $n_{k+1}, n_{k+2}, \cdots, n_p$ so that $|h_{n_{k+1}}| \ge |h_{n_{k+2}}| \ge \cdots$. Let $t > 0$ and let
$$ T_1 = \{n_{k+1}, n_{k+2}, \cdots, n_{(t+1)k}\}, \quad T_2 = \{n_{(t+1)k+1}, n_{(t+1)k+2}, \cdots, n_{(2t+1)k}\}, \quad \cdots. $$
For a subset $E \subset \{1, 2, \cdots, p\}$, we use $I_E$ to denote the characteristic function of $E$, i.e., $I_E(j) = 1$ if $j \in E$ and $I_E(j) = 0$ if $j \notin E$. For each $i$, let $h_i = h\, I_{T_i}$. Then $h$ is decomposed as $h = h_0 + h_1 + h_2 + \cdots$. Note that the $T_i$'s are pairwise disjoint, $\mathrm{supp}(h_i) \subset T_i$, $|T_0| = k$, and $|T_i| = tk$ for $i > 0$. Without loss of generality, we assume $k$ is divisible by 4.

For each $i > 1$, we divide $h_i$ into two halves in the following manner: $h_i = h_{i1} + h_{i2}$ with $h_{i1} = h_i I_{T_{i1}}$ and $h_{i2} = h_i I_{T_{i2}}$, where $T_{i1}$ is the first half of $T_i$, i.e.,
$$ T_{i1} = \{n_{((i-1)t+1)k+1}, n_{((i-1)t+1)k+2}, \cdots, n_{((i-1)t+1)k+tk/2}\}, $$
and $T_{i2} = T_i \setminus T_{i1}$. We shall treat $h_1$ as a sum of four functions and divide $T_1$ into four equal parts $T_1 = T_{11} \cup T_{12} \cup T_{13} \cup T_{14}$ with
$$ T_{11} = \{n_{k+1}, \cdots, n_{k+tk/4}\}, \quad T_{12} = \{n_{k+tk/4+1}, \cdots, n_{k+tk/2}\}, \quad T_{13} = \{n_{k+tk/2+1}, \cdots, n_{k+3tk/4}\}, \quad T_{14} = \{n_{k+3tk/4+1}, \cdots, n_{k+tk}\}. $$
We then define $h_{1i} = h_1 I_{T_{1i}}$ for $1 \le i \le 4$, so that $h_1 = \sum_{i=1}^{4} h_{1i}$.

Note that
$$ \sum_{i \ge 1} \|h_i\|_1 \le \|h_0\|_1 + 2\|\beta_{-\max(k)}\|_1. \qquad (3.2) $$
In fact, since $\|\beta\|_1 \ge \|\hat{\beta}\|_1$, we have
$$ \|\beta\|_1 \ge \|\hat{\beta}\|_1 = \|\beta + h\|_1 = \|\beta_{\max(k)} + h_0\|_1 + \|h - h_0 + \beta_{-\max(k)}\|_1 \ge \|\beta_{\max(k)}\|_1 - \|h_0\|_1 + \sum_{i \ge 1}\|h_i\|_1 - \|\beta_{-\max(k)}\|_1. $$
Since $\|\beta\|_1 = \|\beta_{\max(k)}\|_1 + \|\beta_{-\max(k)}\|_1$, this yields (3.2).

The following claim follows from Proposition 2.3.

Claim:
$$ \|h_{13} + h_{14}\|_2 + \sum_{i \ge 2} \|h_i\|_2 \le \frac{\sum_{i \ge 1} \|h_i\|_1}{\sqrt{tk}} \le \frac{\|h_0\|_2}{\sqrt{t}} + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}}. \qquad (3.3) $$
In fact, from the fact that $\|h_{11}\|_1 \ge \|h_{12}\|_1 \ge \|h_{13}\|_1 \ge \|h_{14}\|_1$, we have
$$ \|h_{12}\|_1 + 2\|h_{13}\|_1 + \|h_{14}\|_1 \le \tfrac{2}{3}\big(2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1\big). $$
It then follows from Proposition 2.3 (with $w = tk/4$) that
$$ \|h_{13} + h_{14}\|_2 \le \frac{\|h_{12}\|_1 + 2\|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk/2}} \le \frac{2}{3}\cdot\frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk/2}} \le \frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk}}. $$
Proposition 2.3 (with $w = tk/2$) also yields
$$ \|h_2\|_2 \le \frac{\|h_{13} + h_{14}\|_1 + 2\|h_{21}\|_1 + \|h_{22}\|_1}{2\sqrt{tk}} \qquad\text{and}\qquad \|h_i\|_2 \le \frac{\|h_{(i-1)2}\|_1 + 2\|h_{i1}\|_1 + \|h_{i2}\|_1}{2\sqrt{tk}} \quad\text{for any } i > 2. $$
Therefore,
$$ \|h_{13} + h_{14}\|_2 + \sum_{i \ge 2}\|h_i\|_2 \le \frac{2\|h_{11}\|_1 + 2\|h_{12}\|_1 + \|h_{13}\|_1 + \|h_{14}\|_1}{2\sqrt{tk}} + \frac{\|h_{13} + h_{14}\|_1 + 2\|h_{21}\|_1 + \|h_{22}\|_1}{2\sqrt{tk}} + \frac{\|h_{22}\|_1 + 2\|h_{31}\|_1 + \|h_{32}\|_1}{2\sqrt{tk}} + \cdots \le \frac{2\|h_1\|_1 + 2\|h_2\|_1 + 2\|h_3\|_1 + \cdots}{2\sqrt{tk}} = \frac{\sum_{i \ge 1}\|h_i\|_1}{\sqrt{tk}} \le \frac{\|h_0\|_1 + 2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}} \le \frac{\|h_0\|_2}{\sqrt{t}} + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}}, $$
where the second-to-last inequality uses (3.2) and the last uses $\|h_0\|_1 \le \sqrt{k}\,\|h_0\|_2$. This proves the Claim.

In the rest of the proof we write $h'_1 = h_{11} + h_{12}$. Note that $Fh = F\hat{\beta} - F\beta = 0$, so by (2.1), (2.2) and the monotonicity $\theta_{tk/2,\cdot} \le \theta_{tk,\cdot}$,
$$ 0 = |\langle Fh, F(h_0 + h'_1)\rangle| = \Big|\langle F(h_0 + h'_1), F(h_0 + h'_1)\rangle + \langle F(h_{13} + h_{14}), F(h_0 + h'_1)\rangle + \sum_{i \ge 2}\langle Fh_i, F(h_0 + h'_1)\rangle\Big| $$
$$ \ge (1 - \delta_{(\frac{t}{2}+1)k})\,\|h_0 + h'_1\|_2^2 - \theta_{\frac{t}{2}k,(\frac{t}{2}+1)k}\,\|h_{13} + h_{14}\|_2\,\|h_0 + h'_1\|_2 - \sum_{i \ge 2}\theta_{tk,(\frac{t}{2}+1)k}\,\|h_i\|_2\,\|h_0 + h'_1\|_2 $$
$$ \ge \|h_0 + h'_1\|_2\Big[(1 - \delta_{(\frac{t}{2}+1)k})\,\|h_0 + h'_1\|_2 - \theta_{tk,(\frac{t}{2}+1)k}\Big(\|h_{13} + h_{14}\|_2 + \sum_{i \ge 2}\|h_i\|_2\Big)\Big] $$
$$ \ge \|h_0 + h'_1\|_2\Big[\Big(1 - \delta_{(\frac{t}{2}+1)k} - \frac{\theta_{tk,(\frac{t}{2}+1)k}}{\sqrt{t}}\Big)\|h_0 + h'_1\|_2 - \theta_{tk,(\frac{t}{2}+1)k}\,\frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{tk}}\Big], $$
where the last step uses (3.3) and $\|h_0\|_2 \le \|h_0 + h'_1\|_2$. Take $t = 1$. Then
$$ \|h_0 + h'_1\|_2 \le \frac{2\theta_{k,1.5k}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2}\, \|\beta_{-\max(k)}\|_1. $$
It then follows from (3.3) that
$$ \|h\|_2^2 = \|h_0 + h'_1\|_2^2 + \|h_{13} + h_{14}\|_2^2 + \sum_{i \ge 2}\|h_i\|_2^2 \le \|h_0 + h'_1\|_2^2 + \Big(\|h_{13} + h_{14}\|_2 + \sum_{i \ge 2}\|h_i\|_2\Big)^2 \le 2\Big(\|h_0 + h'_1\|_2 + 2k^{-1/2}\|\beta_{-\max(k)}\|_1\Big)^2 \le 2\Big(\frac{2(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\, k^{-1/2}\, \|\beta_{-\max(k)}\|_1\Big)^2, $$
which proves the theorem.

Remarks.
1. Candes and Tao [5] consider the Gaussian noise case. A special case, with noise level $\sigma = 0$, of Theorem 1.1 in that paper improves Theorem 3.1 by weakening the condition from $\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$ to $\delta_{2k} + \theta_{k,2k} < 1$.
2. This theorem improves the results in [4, 5]. The condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$ is weaker than both $\delta_k + \theta_{k,k} + \theta_{k,2k} < 1$ and $\delta_{2k} + \theta_{k,2k} < 1$.
3. Note that the condition $\delta_{1.75k} < \sqrt{2} - 1$ implies $\delta_{1.5k} + \theta_{k,1.5k} < 1$. This is due to the fact that, by Proposition 2.1, $\delta_{1.5k} + \theta_{k,1.5k} \le \delta_{1.5k} + \sqrt{\delta_{1.75k}^2 + \delta_{1.75k}^2} \le (\sqrt{2} + 1)\,\delta_{1.75k}$. The condition $\delta_{1.5k} + \delta_{2.5k} < 1$, which involves only $\delta$, can also be used.
4. The quantity $t$ in the proof can be any number such that $tk \in \mathbb{N}$. As pointed out in [4, 5], other values of $t$ may be used for obtaining some interesting results.
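As a small empirical illustration of the exact-recovery statement, one can solve $(P)$ for a random Gaussian design, which is known to satisfy RIP-type conditions with high probability in suitable regimes. A self-contained sketch (the sizes, seed, and use of cvxpy are arbitrary choices of ours):

```python
import cvxpy as cp
import numpy as np

# Exact recovery in the noiseless case: solve (P) for a k-sparse beta.
rng = np.random.default_rng(3)
n, p, k = 40, 100, 5
F = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.standard_normal(k)
y = F @ beta

gamma = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(gamma)), [F @ gamma == y]).solve()
print("max recovery error:", np.max(np.abs(gamma.value - beta)))
```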
4 Recovery of Sparse Signals in Bounded Error

We now turn to the case of bounded error. The results obtained in this setting have direct implications for the case of Gaussian noise, which will be discussed in Section 5. Let $F \in \mathbb{R}^{n \times p}$ and let $y = F\beta + z$ where the noise $z$ is bounded, i.e., $z \in \mathcal{B}$ for some bounded set $\mathcal{B}$. In this case the noise $z$ can be either stochastic or deterministic. The $\ell_1$ minimization approach is to estimate $\beta$ by the minimizer $\hat{\beta}$ of
$$ \min \|\gamma\|_1 \quad \text{subject to } y - F\gamma \in \mathcal{B}. $$
We shall specifically consider two cases: $\mathcal{B} = \{z : \|F^T z\|_\infty \le \lambda\}$ and $\mathcal{B} = \{z : \|z\|_2 \le \epsilon\}$. Our results improve the results in Candes and Tao [4, 5] and Donoho, Elad and Temlyakov [8].

We first consider $y = F\beta + z$ where $z$ satisfies $\|F^T z\|_\infty \le \lambda$. Let $\hat{\beta}$ be the solution to the (DS) problem, i.e., $\hat{\beta}$ is obtained by solving
$$ \min_{\gamma \in \mathbb{R}^p} \|\gamma\|_1 \quad \text{subject to } \|F^T(y - F\gamma)\|_\infty \le \lambda. \qquad (4.1) $$
The Dantzig selector $\hat{\beta}$ has the following property.

Theorem 4.1 Suppose $\beta \in \mathbb{R}^p$ and $y = F\beta + z$ with $z$ satisfying $\|F^T z\|_\infty \le \lambda$. If
$$ \delta_{1.5k} + \theta_{k,1.5k} < 1, \qquad (4.2) $$
then the solution $\hat{\beta}$ to (4.1) obeys
$$ \|\hat{\beta} - \beta\|_2 \le C_1 k^{1/2}\lambda + C_2 k^{-1/2}\|\beta_{-\max(k)}\|_1 \qquad (4.3) $$
with
$$ C_1 = \frac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}, \qquad C_2 = \frac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}. $$
In particular, if $\beta$ is a $k$-sparse vector, then $\|\hat{\beta} - \beta\|_2 \le C_1 k^{1/2}\lambda$.

Proof of Theorem 4.1. We shall use the same notation as in the proof of Theorem 3.2. Since $\|\beta\|_1 \ge \|\hat{\beta}\|_1$, letting $h = \hat{\beta} - \beta$ and following essentially the same steps as in the first part of the proof of Theorem 3.2, we get
$$ |\langle Fh, F(h_0 + h'_1)\rangle| \ge \|h_0 + h'_1\|_2\Big[(1 - \delta_{1.5k} - \theta_{k,1.5k})\,\|h_0 + h'_1\|_2 - \theta_{k,1.5k}\,\frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{k}}\Big]. $$
If $\|h_0 + h'_1\|_2 = 0$, then $h_0 = 0$ and $h'_1 = 0$; the latter forces $h_i = 0$ for every $i \ge 1$, since the entries of $h$ outside $T_0 \cup T_{11} \cup T_{12}$ are dominated in absolute value by those of $h'_1$, and we have $\hat{\beta} - \beta = 0$. Otherwise,
$$ \|h_0 + h'_1\|_2 \le \frac{|\langle Fh, F(h_0 + h'_1)\rangle|}{(1 - \delta_{1.5k} - \theta_{k,1.5k})\,\|h_0 + h'_1\|_2} + \frac{2\theta_{k,1.5k}\,\|\beta_{-\max(k)}\|_1}{(1 - \delta_{1.5k} - \theta_{k,1.5k})\,\sqrt{k}}. $$
To finish the proof, we observe the following.

1. $|\langle Fh, F(h_0 + h'_1)\rangle| \le \sqrt{1.5k}\cdot 2\lambda\,\|h_0 + h'_1\|_2$. In fact, let $F_{T_0 \cup T_{11} \cup T_{12}}$ be the $n \times (1.5k)$ submatrix obtained by extracting the columns of $F$ according to the indices in $T_0 \cup T_{11} \cup T_{12}$, as in [5]. Then
$$ |\langle Fh, F(h_0 + h'_1)\rangle| = |\langle (F\hat{\beta} - y) + z, F_{T_0 \cup T_{11} \cup T_{12}}(h_0 + h'_1)\rangle| = |\langle F^T_{T_0 \cup T_{11} \cup T_{12}}\big((F\hat{\beta} - y) + z\big), h_0 + h'_1\rangle| \le \big\|F^T_{T_0 \cup T_{11} \cup T_{12}}\big((F\hat{\beta} - y) + z\big)\big\|_2\,\|h_0 + h'_1\|_2 \le \sqrt{1.5k}\cdot 2\lambda\,\|h_0 + h'_1\|_2. $$

2. $\|\hat{\beta} - \beta\|_2 \le \sqrt{2}\big(\|h_0 + h'_1\|_2 + 2k^{-1/2}\|\beta_{-\max(k)}\|_1\big)$. In fact,
$$ \|\hat{\beta} - \beta\|_2^2 = \|h\|_2^2 = \|h_0 + h'_1\|_2^2 + \|h_{13} + h_{14}\|_2^2 + \sum_{i \ge 2}\|h_i\|_2^2 \le \|h_0 + h'_1\|_2^2 + \Big(\|h_{13} + h_{14}\|_2 + \sum_{i \ge 2}\|h_i\|_2\Big)^2 \le \|h_0 + h'_1\|_2^2 + \Big(\|h_0\|_2 + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{k}}\Big)^2 \le 2\Big(\|h_0 + h'_1\|_2 + \frac{2\|\beta_{-\max(k)}\|_1}{\sqrt{k}}\Big)^2, $$
where the second inequality uses (3.3).

We get the result by combining 1 and 2. This completes the proof.

We now turn to the second case, where the noise $z$ is bounded in $\ell_2$-norm. Let $F \in \mathbb{R}^{n \times p}$ with $n < p$. The problem is to recover the sparse signal $\beta \in \mathbb{R}^p$ from $y = F\beta + z$ where the noise satisfies $\|z\|_2 \le \epsilon$. We shall again consider constrained $\ell_1$ minimization:
$$ \min \|\gamma\|_1 \quad \text{subject to } \|y - F\gamma\|_2 \le \eta. $$
By using a similar argument, we have the following result.

Theorem 4.2 Let $F \in \mathbb{R}^{n \times p}$. Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta + z$ with $\|z\|_2 \le \epsilon$. If
$$ \delta_{1.5k} + \theta_{k,1.5k} < 1, \qquad (4.4) $$
then for any $\eta \ge \epsilon$, the minimizer $\hat{\beta}$ of the problem $\min\{\|\gamma\|_1 : \|y - F\gamma\|_2 \le \eta\}$ obeys
$$ \|\hat{\beta} - \beta\|_2 \le C(\eta + \epsilon) \qquad (4.5) $$
with
$$ C = \frac{\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}. $$

Proof of Theorem 4.2. Notice that the condition $\eta \ge \epsilon$ implies that $\beta$ is feasible, so $\|\hat{\beta}\|_1 \le \|\beta\|_1$ and we can use the first part of the proof of Theorem 3.2. The notation used here is the same as in the proof of Theorem 3.2. First, since $\beta$ is $k$-sparse, we have
$$ \|h_0\|_1 \ge \sum_{i \ge 1}\|h_i\|_1 \qquad\text{and}\qquad \|h_0 + h'_1\|_2 \le \frac{|\langle Fh, F(h_0 + h'_1)\rangle|}{\|h_0 + h'_1\|_2\,(1 - \delta_{1.5k} - \theta_{k,1.5k})}. $$
Note that
$$ \|Fh\|_2 = \|F(\beta - \hat{\beta})\|_2 \le \|F\beta - y\|_2 + \|F\hat{\beta} - y\|_2 \le \epsilon + \eta. $$
So
$$ \|\hat{\beta} - \beta\|_2 \le \sqrt{2}\,\|h_0 + h'_1\|_2 \le \frac{\sqrt{2}\,\|Fh\|_2\,\|F(h_0 + h'_1)\|_2}{\|h_0 + h'_1\|_2\,(1 - \delta_{1.5k} - \theta_{k,1.5k})} \le \frac{\sqrt{2}\,(\eta + \epsilon)(1 + \delta_{1.5k})\,\|h_0 + h'_1\|_2}{\|h_0 + h'_1\|_2\,(1 - \delta_{1.5k} - \theta_{k,1.5k})} = \frac{\sqrt{2}\,(\eta + \epsilon)(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}. $$
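For tiny problems one can combine the exhaustive-search constants from Section 2 with the $\ell_2$-constrained solver to check the bound (4.5) directly. In the sketch below (our construction, not the paper's), whether the condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$ holds depends on the sampled $F$, so the bound is only evaluated when it does:

```python
import itertools
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 9, 12, 2
F = rng.standard_normal((n, p))
F /= np.linalg.norm(F, axis=0)  # unit-norm columns

def delta(F, m):
    return max(
        max(abs(np.linalg.svd(F[:, list(S)], compute_uv=False) ** 2 - 1))
        for S in itertools.combinations(range(F.shape[1]), m))

def theta(F, m1, m2):
    best = 0.0
    for S in itertools.combinations(range(F.shape[1]), m1):
        rest = [j for j in range(F.shape[1]) if j not in S]
        for T in itertools.combinations(rest, m2):
            best = max(best, np.linalg.norm(F[:, list(S)].T @ F[:, list(T)], 2))
    return best

d, th = delta(F, 3 * k // 2), theta(F, k, 3 * k // 2)
print("delta_{1.5k} + theta_{k,1.5k} =", d + th)

beta = np.zeros(p)
beta[[0, 5]] = [1.5, -1.0]
eps = 0.05
z = rng.standard_normal(n)
z *= eps / np.linalg.norm(z)          # noise with ||z||_2 = eps
y = F @ beta + z
eta = eps
gamma = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(gamma)),
           [cp.norm(y - F @ gamma, 2) <= eta]).solve()
if d + th < 1:
    C = np.sqrt(2) * (1 + d) / (1 - d - th)
    print("error:", np.linalg.norm(gamma.value - beta), " bound:", C * (eta + eps))
else:
    print("condition (4.4) fails for this F; the bound does not apply")
```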
Remarks.
1. Candes, Romberg and Tao [3] showed that, if $\delta_{3k} + 3\delta_{4k} < 2$, then
$$ \|\hat{\beta} - \beta\|_2 \le \frac{4}{\sqrt{3 - 3\delta_{4k}} - \sqrt{1 + \delta_{3k}}}\,\epsilon. $$
(The $\eta$ was set to be $\epsilon$ in [3].) Now suppose $\delta_{3k} + 3\delta_{4k} < 2$. This implies $\delta_{3k} + \delta_{4k} < 1$, which yields $\delta_{2.4k} + \theta_{1.6k,2.4k} < 1$, since $\delta_{2.4k} \le \delta_{3k}$ and $\theta_{1.6k,2.4k} \le \delta_{4k}$. It then follows from Theorem 4.2 that, with $\eta = \epsilon$,
$$ \|\hat{\beta} - \beta\|_2 \le \frac{2\sqrt{2}\,(1 + \delta_{1.5k'})}{1 - \delta_{1.5k'} - \theta_{k',1.5k'}}\,\epsilon $$
for all $k'$-sparse vectors $\beta$, where $k' = 1.6k$. Therefore Theorem 4.2 improves the above result in Candes, Romberg and Tao [3] by enlarging the support of $\beta$ by 60%.
2. Similar to Theorems 3.2 and 4.1, we can obtain the estimate without assuming that $\beta$ is $k$-sparse. In the general case, we have
$$ \|\hat{\beta} - \beta\|_2 \le C(\eta + \epsilon) + \frac{2\sqrt{2}\,\theta_{k,1.5k}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\,k^{-1/2}\,\|\beta_{-\max(k)}\|_1. $$

Connections between RIP and MIP

In addition to the restricted isometry property (RIP), another commonly used condition in the sparse recovery literature is the so-called mutual incoherence property (MIP). The mutual incoherence property of $F$ requires that the coherence bound
$$ M = \max_{1 \le i,j \le p,\, i \neq j} |\langle f_i, f_j\rangle| \qquad (4.6) $$
be small, where $f_1, f_2, \cdots, f_p$ are the columns of $F$ (the $f_i$'s are also assumed to be of length 1 in $\ell_2$-norm). Many interesting results on sparse recovery have been obtained by imposing conditions on the coherence bound $M$ and the sparsity $k$; see [8, 9, 11, 12, 14]. For example, a recent paper of Donoho, Elad, and Temlyakov [8] proved that if $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta + z$ with $\|z\|_2 \le \epsilon$, then for any $\eta \ge \epsilon$, the minimizer $\hat{\beta}$ of the problem $\min\{\|\gamma\|_1 : \|y - F\gamma\|_2 \le \eta\}$ satisfies
$$ \|\hat{\beta} - \beta\|_2 \le C(\eta + \epsilon) \quad\text{with } C = \frac{1}{\sqrt{1 - M(4k - 1)}}, $$
provided $k \le \frac{1 + M}{4M}$.

We shall now establish some connections between the RIP and MIP, and show that the result of Donoho, Elad, and Temlyakov [8] can be improved under the RIP framework by using Theorem 4.2. The following is a simple result that gives RIP constants from MIP.

Proposition 4.1 Let $M$ be the coherence bound for $F$. Then
$$ \delta_k \le (k - 1)M \qquad\text{and}\qquad \theta_{k,k'} \le \sqrt{kk'}\,M. \qquad (4.7) $$

Proof of Proposition 4.1. Let $c$ be a $k$-sparse vector. Without loss of generality, we assume that $\mathrm{supp}(c) = \{1, 2, \cdots, k\}$. A direct calculation shows that
$$ \|Fc\|_2^2 = \sum_{i,j=1}^{k} \langle f_i, f_j\rangle\, c_i c_j = \|c\|_2^2 + \sum_{1 \le i,j \le k,\, i \neq j} \langle f_i, f_j\rangle\, c_i c_j. $$
Now let us bound the second term. Note that
$$ \Big|\sum_{1 \le i,j \le k,\, i \neq j} \langle f_i, f_j\rangle\, c_i c_j\Big| \le M \sum_{1 \le i,j \le k,\, i \neq j} |c_i c_j| \le M(k - 1)\sum_{i=1}^{k}|c_i|^2 = M(k - 1)\,\|c\|_2^2. $$
These give us
$$ \big(1 - (k - 1)M\big)\|c\|_2^2 \le \|Fc\|_2^2 \le \big(1 + (k - 1)M\big)\|c\|_2^2, $$
and hence $\delta_k \le (k - 1)M$. For the second inequality, we notice that $M = \theta_{1,1}$. It then follows from Proposition 2.1 that
$$ \theta_{k,k'} \le \sqrt{k'}\,\theta_{k,1} \le \sqrt{kk'}\,\theta_{1,1} = \sqrt{kk'}\,M. $$
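Unlike the RIP constants, the coherence bound $M$ of (4.6) is cheap to compute exactly, which is part of the appeal of MIP-based conditions. A short sketch (the example matrix is arbitrary), also evaluating the sufficient condition for Theorem 4.2 implied by Proposition 4.1:

```python
import numpy as np

def coherence(F):
    """Coherence bound M of (4.6): largest absolute inner product
    between distinct unit-normalized columns of F."""
    G = F / np.linalg.norm(F, axis=0)   # normalize columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

rng = np.random.default_rng(5)
F = rng.standard_normal((64, 256))
M = coherence(F)
k = 3
# Proposition 4.1 gives delta_{1.5k} <= (1.5k - 1) M and
# theta_{k,1.5k} <= sqrt(1.5) k M, so Theorem 4.2's condition holds
# whenever (1.5k - 1) M + sqrt(1.5) k M < 1.
print("M =", M, " condition value:", (1.5 * k - 1) * M + np.sqrt(1.5) * k * M)
```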
Now we are able to show the following result.

Theorem 4.3 Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and $y = F\beta + z$ with $z$ satisfying $\|z\|_2 \le \epsilon$. Let $kM = t$. If $t < \frac{2 + 2M}{3 + \sqrt{6}}$ (or, equivalently, $k < \frac{2 + 2M}{(3 + \sqrt{6})M}$), then for any $\eta \ge \epsilon$, the minimizer $\hat{\beta}$ of the problem $\min\{\|\gamma\|_1 : \|y - F\gamma\|_2 \le \eta\}$ obeys
$$ \|\hat{\beta} - \beta\|_2 \le C(\eta + \epsilon) \qquad (4.8) $$
with
$$ C = \frac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})\,t}. $$

Proof of Theorem 4.3. It follows from Proposition 4.1 that
$$ \delta_{1.5k} + \theta_{k,1.5k} \le (1.5k + \sqrt{1.5}\,k - 1)M = (1.5 + \sqrt{1.5})\,t - M. $$
Since $t < \frac{2 + 2M}{3 + \sqrt{6}}$, the condition $\delta_{1.5k} + \theta_{k,1.5k} < 1$ holds. By Theorem 4.2,
$$ \|\hat{\beta} - \beta\|_2 \le \frac{\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}\,(\eta + \epsilon) \le \frac{\sqrt{2}\,\big(1 + (1.5k - 1)M\big)}{1 + M - (1.5 + \sqrt{1.5})\,t}\,(\eta + \epsilon) = \frac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})\,t}\,(\eta + \epsilon). $$

Remarks. In this theorem, the result of Donoho, Elad and Temlyakov [8] is improved in the following ways.
1. The sparsity condition is relaxed from $k < \frac{1 + M}{4M}$ to $k < \frac{2 + 2M}{(3 + \sqrt{6})M} \approx 1.47\,\frac{1 + M}{4M}$. So, roughly speaking, Theorem 4.3 improves the result in Donoho, Elad and Temlyakov [8] by enlarging the support of $\beta$ by 47%.
2. It is clear that a larger $t$ is preferred. Since $M$ is usually very small, the bound $C$ is tightened from $C = \frac{1}{\sqrt{1 + M - 4t}}$ to $C = \frac{\sqrt{2}\,(2 + 3t - 2M)}{2 + 2M - (3 + \sqrt{6})t}$ as $t$ gets close to $\frac{1}{4}$.

5 Recovery of Sparse Signals in Gaussian Noise

We now turn to the case where the noise is Gaussian. Suppose we observe
$$ y = F\beta + z, \qquad z \sim N(0, \sigma^2 I_n), \qquad (5.1) $$
and wish to recover $\beta$ from $y$ and $F$. We assume that $\sigma$ is known and that the columns of $F$ are standardized to have unit $\ell_2$-norm. This case is of significant interest, in particular in statistics. Many methods, including the Lasso (Tibshirani [13]), LARS (Efron, Hastie, Johnstone and Tibshirani [10]) and the Dantzig selector (Candes and Tao [5]), have been introduced and studied.

The following results show that, with large probability, the Gaussian noise $z$ belongs to bounded sets.

Lemma 1 The Gaussian error $z \sim N(0, \sigma^2 I_n)$ satisfies
$$ P\Big(\|F^T z\|_\infty \le \sigma\sqrt{2\log p}\Big) \ge 1 - \frac{1}{2\sqrt{\pi\log p}} \qquad (5.2) $$
and
$$ P\Big(\|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n\log n}}\Big) \ge 1 - \frac{1}{n}. \qquad (5.3) $$

Inequality (5.2) follows from standard probability calculations, and inequality (5.3) is proved in the Appendix. Lemma 1 suggests that one can apply the results obtained in the previous section for the bounded error case to solve the Gaussian noise problem.

Candes and Tao [5] introduced the Dantzig selector for sparse recovery in the Gaussian noise setting. Given the observations in (5.1), the Dantzig selector $\hat{\beta}_{DS}$ is the minimizer of
$$ (DS) \qquad \min_{\gamma \in \mathbb{R}^p} \|\gamma\|_1 \quad \text{subject to } \|F^T(y - F\gamma)\|_\infty \le \lambda_p \qquad (5.4) $$
where $\lambda_p = \sigma\sqrt{2\log p}$.

In the classical linear regression problem, when $p \le n$, the least squares estimator is the solution to the normal equation
$$ F^T y = F^T F\beta. \qquad (5.5) $$
The constraint $\|F^T(y - F\beta)\|_\infty \le \lambda_p$ in the convex program (DS) can thus be viewed as a relaxation of the normal equation (5.5). And, similar to the noiseless case, $\ell_1$ minimization leads to the "sparsest" solution over the space of all feasible solutions.

Candes and Tao [5] showed the following result.

Theorem 5.1 (Candes and Tao [5]) Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector obeying $\delta_{2k} + \theta_{k,2k} < 1$. Choose $\lambda_p = \sigma\sqrt{2\log p}$ in (1.5). Then with large probability, the Dantzig selector $\hat{\beta}$ obeys
$$ \|\hat{\beta} - \beta\|_2 \le C_1\,\sigma\sqrt{k}\,\sqrt{2\log p}, \qquad (5.6) $$
with $C_1 = \frac{4}{1 - \delta_k - \theta_{k,2k}}$.¹

¹ It appears that the constant $C_1$ in Candes and Tao [5] should be $C_1 = 4/(1 - \delta_{2k} - \theta_{k,2k})$.

Another commonly used method in statistics is the Lasso, which solves the $\ell_1$ regularized least squares problem (1.4). This is equivalent to the $\ell_2$-constrained $\ell_1$ minimization problem $(P_1)$. In the Gaussian error case, we shall consider a particular setting. Let $\hat{\beta}_{\ell_2}$ be the minimizer of
$$ \min_{\gamma \in \mathbb{R}^p} \|\gamma\|_1 \quad \text{subject to } \|y - F\gamma\|_2 \le \epsilon_n \qquad (5.7) $$
where $\epsilon_n = \sigma\sqrt{n + 2\sqrt{n\log n}}$.
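The two bounds in Lemma 1 are easy to probe by simulation. The following Monte Carlo sketch (sizes and trial count are arbitrary choices) estimates both probabilities and prints them next to the stated lower bounds:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma, trials = 100, 500, 1.0, 2000
F = rng.standard_normal((n, p))
F /= np.linalg.norm(F, axis=0)  # unit-norm columns, as assumed in the text

hits_inf, hits_l2 = 0, 0
for _ in range(trials):
    z = sigma * rng.standard_normal(n)
    hits_inf += np.max(np.abs(F.T @ z)) <= sigma * np.sqrt(2 * np.log(p))
    hits_l2 += np.linalg.norm(z) <= sigma * np.sqrt(n + 2 * np.sqrt(n * np.log(n)))

print("P(||F^T z||_inf bound):", hits_inf / trials,
      ">=", 1 - 1 / (2 * np.sqrt(np.pi * np.log(p))))
print("P(||z||_2 bound):      ", hits_l2 / trials, ">=", 1 - 1 / n)
```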
Combining our results from the last section with Lemma 1, we have the following results on the Dantzig selector $\hat{\beta}_{DS}$ and the estimator $\hat{\beta}_{\ell_2}$ obtained from $\ell_1$ minimization under the $\ell_2$ constraint. Again, these results improve the previous results in the literature by weakening the conditions and providing more precise bounds.

Theorem 5.2 Suppose $\beta \in \mathbb{R}^p$ is a $k$-sparse vector and the matrix $F$ satisfies $\delta_{1.5k} + \theta_{k,1.5k} < 1$. Then with probability at least $1 - \frac{1}{2\sqrt{\pi\log p}}$, the Dantzig selector $\hat{\beta}_{DS}$ obeys
$$ \|\hat{\beta}_{DS} - \beta\|_2 \le C_1\,\sigma\sqrt{k}\,\sqrt{2\log p} \qquad (5.8) $$
with $C_1 = \frac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$, and with probability at least $1 - \frac{1}{n}$, $\hat{\beta}_{\ell_2}$ obeys
$$ \|\hat{\beta}_{\ell_2} - \beta\|_2 \le D_1\,\sigma\sqrt{n + 2\sqrt{n\log n}} \qquad (5.9) $$
with $D_1 = \frac{2\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.

Remark: Similar to the results obtained in the previous sections, if $\beta$ is not necessarily $k$-sparse, then in general we have, with probability at least $1 - \frac{1}{2\sqrt{\pi\log p}}$,
$$ \|\hat{\beta}_{DS} - \beta\|_2 \le C_1\,\sigma\sqrt{k}\,\sqrt{2\log p} + C_2\,k^{-1/2}\,\|\beta_{-\max(k)}\|_1, $$
where $C_1 = \frac{2\sqrt{3}}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$ and $C_2 = \frac{2\sqrt{2}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$, and with probability at least $1 - \frac{1}{n}$,
$$ \|\hat{\beta}_{\ell_2} - \beta\|_2 \le D_1\,\sigma\sqrt{n + 2\sqrt{n\log n}} + D_2\,k^{-1/2}\,\|\beta_{-\max(k)}\|_1, $$
where $D_1 = \frac{2\sqrt{2}\,(1 + \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$ and $D_2 = \frac{2\sqrt{2}\,\theta_{k,1.5k}\,(1 - \delta_{1.5k})}{1 - \delta_{1.5k} - \theta_{k,1.5k}}$.
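Putting the pieces together, one can compare the two estimators of Theorem 5.2 on simulated Gaussian-noise data, using the tuning values $\lambda_p$ and $\epsilon_n$ prescribed in the text. The problem sizes, seed, and use of cvxpy below are our illustrative choices:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
n, p, k, sigma = 80, 200, 4, 0.1
F = rng.standard_normal((n, p))
F /= np.linalg.norm(F, axis=0)  # columns standardized to unit l2-norm
beta = np.zeros(p)
beta[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = F @ beta + sigma * rng.standard_normal(n)

lam_p = sigma * np.sqrt(2 * np.log(p))                 # lambda_p of (5.4)
eps_n = sigma * np.sqrt(n + 2 * np.sqrt(n * np.log(n)))  # eps_n of (5.7)

g1 = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(g1)),
           [cp.norm(F.T @ (y - F @ g1), "inf") <= lam_p]).solve()
g2 = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(g2)),
           [cp.norm(y - F @ g2, 2) <= eps_n]).solve()

print("Dantzig selector error:", np.linalg.norm(g1.value - beta))
print("l2-constrained error:  ", np.linalg.norm(g2.value - beta))
```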
6 Appendix

Proof of Proposition 2.3. Let
$$ \Lambda = \big[(a_1 + \cdots + a_w) + 2(a_{w+1} + \cdots + a_{2w}) + (a_{2w+1} + \cdots + a_{3w})\big]^2 = \Lambda_1 + \Lambda_2 + \Lambda_3 + \Lambda_4 + \Lambda_5 + \Lambda_6, $$
where each $\Lambda_i$ is given (and bounded) by
$$ \Lambda_1 = (a_1 + a_2 + \cdots + a_w)^2 \ge a_1^2 + 3a_2^2 + \cdots + (2w - 1)a_w^2, $$
$$ \Lambda_2 = 4(a_{w+1} + a_{w+2} + \cdots + a_{2w})^2 \ge 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (2w - 1)a_{2w}^2\big), $$
$$ \Lambda_3 = (a_{2w+1} + a_{2w+2} + \cdots + a_{3w})^2 \ge a_{2w+1}^2 + 3a_{2w+2}^2 + \cdots + (2w - 1)a_{3w}^2, $$
$$ \Lambda_4 = 4(a_1 + \cdots + a_w)(a_{w+1} + \cdots + a_{2w}) \ge 4w\big(a_{w+1}^2 + \cdots + a_{2w}^2\big), $$
$$ \Lambda_5 = 2(a_1 + \cdots + a_w)(a_{2w+1} + \cdots + a_{3w}) \ge 2w\big(a_{2w+1}^2 + \cdots + a_{3w}^2\big), $$
$$ \Lambda_6 = 4(a_{w+1} + \cdots + a_{2w})(a_{2w+1} + \cdots + a_{3w}) \ge 4w\big(a_{2w+1}^2 + \cdots + a_{3w}^2\big). $$
Without loss of generality, we assume that $w$ is even. Write $\Lambda_2 = \Lambda_{21} + \Lambda_{22}$, where
$$ \Lambda_{21} = 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (w - 1)a_{w+w/2}^2 + w\,a_{w+w/2+1}^2 + w\,a_{w+w/2+2}^2 + \cdots + w\,a_{2w}^2\big) $$
and
$$ \Lambda_{22} = 4\big(a_{w+w/2+1}^2 + 3a_{w+w/2+2}^2 + \cdots + (w - 1)a_{2w}^2\big) \ge w^2 a_{2w}^2 = (2w - 1)a_{2w}^2 + (2w - 3)a_{2w}^2 + \cdots + 3a_{2w}^2 + \cdots + a_{2w}^2. $$
Now
$$ \Lambda_3 + \Lambda_5 + \Lambda_6 + \Lambda_{22} \ge (6w + 1)a_{2w+1}^2 + (6w + 3)a_{2w+2}^2 + \cdots + (8w - 1)a_{3w}^2 + (2w - 1)a_{2w}^2 + (2w - 3)a_{2w}^2 + \cdots + a_{2w}^2 $$
$$ \ge (6w + 1)a_{2w+1}^2 + (6w + 3)a_{2w+2}^2 + \cdots + (8w - 1)a_{3w}^2 + (2w - 1)a_{2w+1}^2 + (2w - 3)a_{2w+2}^2 + \cdots + 3a_{3w-1}^2 + a_{3w}^2 \ge 8w\big(a_{2w+1}^2 + a_{2w+2}^2 + \cdots + a_{3w}^2\big) $$
and
$$ \Lambda_1 + \Lambda_{21} + \Lambda_4 \ge a_1^2 + 3a_2^2 + \cdots + (2w - 1)a_w^2 + 4\big(a_{w+1}^2 + 3a_{w+2}^2 + \cdots + (w - 1)a_{w+w/2}^2 + w\,a_{w+w/2+1}^2 + \cdots + w\,a_{2w}^2\big) + 4w\big(a_{w+1}^2 + \cdots + a_{2w}^2\big) $$
$$ \ge w^2 a_w^2 + 4(w + 1)a_{w+1}^2 + 4(w + 3)a_{w+2}^2 + \cdots + 4(2w - 1)a_{w+w/2}^2 + 8w\,a_{w+w/2+1}^2 + \cdots + 8w\,a_{2w}^2 $$
$$ \ge \underbrace{4(w - 1)a_w^2 + 4(w - 3)a_w^2 + \cdots + 4a_w^2}_{w/2\ \text{terms}} + 4(w + 1)a_{w+1}^2 + 4(w + 3)a_{w+2}^2 + \cdots + 4(2w - 1)a_{w+w/2}^2 + 8w\,a_{w+w/2+1}^2 + \cdots + 8w\,a_{2w}^2 \ge 8w\big(a_{w+1}^2 + a_{w+2}^2 + \cdots + a_{2w}^2\big). $$
Therefore
$$ \Lambda \ge 8w\big(a_{w+1}^2 + \cdots + a_{2w}^2 + a_{2w+1}^2 + \cdots + a_{3w}^2\big), $$
and the inequality is proved.

Proof of Lemma 1. The first inequality is standard. We now prove inequality (5.3). Note that $X = \|z\|_2^2/\sigma^2$ is a $\chi^2_n$ random variable. It follows from Lemma 4 in Cai [1] that for any $\lambda > 0$,
$$ P\big(X > (1 + \lambda)n\big) \le \frac{1}{\lambda\sqrt{\pi n}}\exp\Big\{-\frac{n}{2}\big(\lambda - \log(1 + \lambda)\big)\Big\}. $$
Hence,
$$ P\Big(\|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n\log n}}\Big) = 1 - P\big(X > (1 + \lambda)n\big) \ge 1 - \frac{1}{\lambda\sqrt{\pi n}}\exp\Big\{-\frac{n}{2}\big(\lambda - \log(1 + \lambda)\big)\Big\} $$
where $\lambda = 2\sqrt{n^{-1}\log n}$. It now follows from the fact $\log(1 + \lambda) \le \lambda - \frac{1}{2}\lambda^2 + \frac{1}{3}\lambda^3$ that
$$ P\Big(\|z\|_2 \le \sigma\sqrt{n + 2\sqrt{n\log n}}\Big) \ge 1 - \frac{1}{n}\cdot\frac{1}{2\sqrt{\pi\log n}}\exp\Big\{\frac{4(\log n)^{3/2}}{3\sqrt{n}}\Big\}. $$
Inequality (5.3) now follows by verifying directly that $\frac{1}{2\sqrt{\pi\log n}}\exp\big(\frac{4(\log n)^{3/2}}{3\sqrt{n}}\big) \le 1$ for all $n \ge 2$.

References

[1] T. Cai, On block thresholding in wavelet regression: adaptivity, block size and threshold level, Statist. Sinica, 12 (2002), 1241-1273.

[2] E. J. Candes, The restricted isometry property and its implications for compressed sensing, technical report, 2008.

[3] E. J. Candes, J. Romberg and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math., 59 (2006), 1207-1223.

[4] E. J. Candes and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory, 51 (2005), 4203-4215.

[5] E. J. Candes and T. Tao, The Dantzig selector: statistical estimation when p is much larger than n (with discussion), Ann. Statist., 35 (2007), 2313-2351.

[6] D. L. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, Comm. Pure Appl. Math., 59 (2006), 797-829.

[7] D. L. Donoho, For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution, Comm. Pure Appl. Math., 59 (2006), 907-934.

[8] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inf. Theory, 52 (2006), 6-18.

[9] D. L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inf. Theory, 47 (2001), 2845-2862.

[10] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression (with discussion), Ann. Statist., 32 (2004), 407-451.
[11] J.-J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE Trans. Inf. Theory, 50 (2004), 1341-1344.

[12] J.-J. Fuchs, Recovery of exact sparse representations in the presence of bounded noise, IEEE Trans. Inf. Theory, 51 (2005), 3601-3608.

[13] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, 58 (1996), 267-288.

[14] J. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Trans. Inf. Theory, 52 (2006), 1030-1051.
