Orthonormal Expansion l1-Minimization Algorithms for Compressed Sensing


Authors: Zai Yang, Cishen Zhang, Jun Deng, and Wenmiao Lu

Orthonormal Expansion $\ell_1$-Minimization Algorithms for Compressed Sensing

Zai Yang, Cishen Zhang, Jun Deng, and Wenmiao Lu

Abstract: Compressed sensing aims at reconstructing sparse signals from a significantly reduced number of samples, and a popular reconstruction approach is $\ell_1$-norm minimization. In this correspondence, a method called orthonormal expansion is presented to reformulate the basis pursuit problem for noiseless compressed sensing. Two algorithms are proposed based on convex optimization: one exactly solves the problem and the other is a relaxed version of the first one. The latter can be considered as a modified iterative soft thresholding algorithm and is easy to implement. Numerical simulation shows that, in dealing with noise-free measurements of sparse signals, the relaxed version is accurate, fast and competitive with recent state-of-the-art algorithms. Its practical application is demonstrated in a more general case where signals of interest are approximately sparse and measurements are contaminated with noise.

Index Terms: Augmented Lagrange multiplier, compressed sensing, $\ell_1$ minimization, orthonormal expansion, phase transition, sparsity-undersampling tradeoff.

I. INTRODUCTION

Compressed sensing (CS) aims at reconstructing a signal from its significantly reduced measurements with the a priori knowledge that the signal is (approximately) sparse [1]–[3]. In CS, the signal $x^o \in \mathbb{R}^N$ of interest is acquired by taking measurements of the form
$$b = A x^o + e,$$
where $A \in \mathbb{R}^{n \times N}$ is the sampling matrix, $b \in \mathbb{R}^n$ is the vector of measurements or samples, $e \in \mathbb{R}^n$ is the measurement noise, and $N$ and $n$ are the sizes of the signal and of the acquired samples, respectively. A standard approach to reconstructing the original signal $x^o$ is to solve
$$(\mathrm{BP}_\epsilon)\quad \min_x \|x\|_1, \ \text{subject to } \|b - Ax\|_2 \le \epsilon,$$
which is known as the basis pursuit denoising (BPDN) problem, with $\|e\|_2 \le \epsilon$.
Another frequently discussed approach is to solve the problem in its Lagrangian form
$$(\mathrm{QP}_\lambda)\quad \min_x \left\{ \lambda \|x\|_1 + \tfrac{1}{2}\|b - Ax\|_2^2 \right\}.$$

[First-page footnote: To appear in IEEE Transactions on Signal Processing. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. Z. Yang, J. Deng and W. Lu are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore (e-mail: yang0248@e.ntu.edu.sg; de0001un@e.ntu.edu.sg; wenmiao@ntu.edu.sg). C. Zhang is with the Faculty of Engineering and Industrial Sciences, Swinburne University of Technology, Hawthorn VIC 3122, Australia (e-mail: cishenzhang@swin.edu.au).]

It follows from convex optimization theory that $(\mathrm{BP}_\epsilon)$ and $(\mathrm{QP}_\lambda)$ are equivalent with appropriate choices of $\epsilon$ and $\lambda$. In general, $\lambda$ decreases as $\epsilon$ decreases. In the limiting case $\lambda, \epsilon \to 0$, both $(\mathrm{BP}_\epsilon)$ and $(\mathrm{QP}_\lambda)$ converge to the following basis pursuit (BP) problem in noiseless CS:
$$(\mathrm{BP})\quad \min_x \|x\|_1, \ \text{subject to } Ax = b.$$
It is important to develop accurate and computationally efficient algorithms for the $\ell_1$ minimization problems of high-dimensional signals in CS, such as an image of $512 \times 512$ pixels. One popular approach to solving $(\mathrm{QP}_\lambda)$ is the iterative soft thresholding (IST) algorithm of the form (starting from $x^0 = 0$) [4], [5]
$$x^{t+1} = S_\lambda\left(x^t + A' z^t\right), \quad z^t = b - A x^t, \qquad (1)$$
where $'$ denotes the transpose operator and $S_\lambda(\cdot)$ is the soft thresholding operator with threshold $\lambda$, which will be defined more precisely in Section II-A. IST has a concise form and is easy to implement, but its convergence can be very slow in some cases [5], especially for small $\lambda$. Other algorithms proposed to solve $(\mathrm{QP}_\lambda)$ include the interior-point method [6], the conjugate gradient method [7] and fixed-point continuation [8].
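For concreteness, the IST iteration (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in this correspondence; the fixed threshold `lam` and the iteration count are placeholders.

```python
import numpy as np

def soft(w, lam):
    # soft thresholding operator S_lam: sign(w) * max(|w| - lam, 0), elementwise
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ist(A, b, lam, num_iter=500):
    # iterative soft thresholding (1) for (QP_lambda), starting from x^0 = 0
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        z = b - A @ x               # current residual z^t
        x = soft(x + A.T @ z, lam)  # x^{t+1} = S_lam(x^t + A' z^t)
    return x
```

When the rows of $A$ are orthonormal, $\|A\|_2 = 1$, so each step is a unit-size proximal-gradient update and the objective of $(\mathrm{QP}_\lambda)$ is nonincreasing along the iterates.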
Few algorithms can accurately solve the large-scale BPDN problem $(\mathrm{BP}_\epsilon)$ with low computational complexity. The $\ell_1$-magic package [9] includes a primal log-barrier code for $(\mathrm{BP}_\epsilon)$, in which the conjugate gradient method may not find a precise Newton step in the large-scale mode. NESTA [10] approximately solves $(\mathrm{BP}_\epsilon)$ based on Nesterov's smoothing technique [11], with continuation.

In the case of strictly sparse signals and noise-free measurements, many fast algorithms have been proposed to exactly reconstruct $x^o$. One class of algorithms uses the greedy pursuit method, which iteratively refines the support and entries of a sparse solution to yield a better approximation of $x^o$; examples include OMP [12], StOMP [13] and CoSaMP [14]. These algorithms, however, may not produce a satisfactory sparsity-undersampling tradeoff compared with $(\mathrm{BP})$ because of their greedy operations. As mentioned before, $(\mathrm{QP}_\lambda)$ is equivalent to $(\mathrm{BP})$ as $\lambda \to 0$. Hence, $(\mathrm{BP})$ can be solved with high accuracy using algorithms for $(\mathrm{QP}_\lambda)$ by setting $\lambda$ to a small value, e.g. $1\times10^{-6}$. IST has attracted much attention because of its simple form. In the case where $\lambda$ is small, however, the standard IST in (1) can be very slow. To improve its speed, a fixed-point continuation (FPC) strategy has been exploited [8], in which $\lambda$ is decreased in a continuation scheme and a $q$-linear convergence rate is achieved. Further, FPC-AS [15] was developed to improve the performance of FPC by introducing an active set, inspired by greedy pursuit algorithms. An alternative approach to improving the speed of IST is to use an aggressive continuation, which takes the form
$$x^{t+1} = S_{\lambda_t}\left(x^t + A' z^t\right), \quad z^t = b - A x^t, \qquad (2)$$
where $\lambda_t$ may decrease in each iteration. An algorithm of this form typically has a worse sparsity-undersampling tradeoff than $(\mathrm{BP})$ [16].
Such a disadvantage is partially overcome by the approximate message passing (AMP) algorithm [17], in which a modification is introduced in the current residual $z^t$:
$$x^{t+1} = S_{\lambda_t}\left(x^t + A' z^t\right), \quad z^t = b - A x^t + \frac{\|x^t\|_0}{n}\, z^{t-1}, \qquad (3)$$
where $\|x\|_0$ counts the number of nonzero entries of $x$. It is noted that the claim that AMP has the same sparsity-undersampling tradeoff as $(\mathrm{BP})$ is established only on heuristic arguments and numerical simulations. Moreover, AMP cannot be easily extended to deal with more general complex-valued sparse signals, though only real-valued signals are considered in this correspondence.

This correspondence focuses on solving the basis pursuit problem $(\mathrm{BP})$ in noiseless CS. We assume that $AA'$ is an identity matrix, i.e., the rows of $A$ are orthonormal. This is reasonable since most fast transforms in CS are of this form, such as the discrete cosine transform (DCT), the discrete Fourier transform (DFT) and some wavelet transforms, e.g. the Haar wavelet transform. Such an assumption has also been used in other algorithms, e.g. NESTA. A novel method called orthonormal expansion is introduced to reformulate $(\mathrm{BP})$ based on this assumption. The exact OrthoNormal Expansion $\ell_1$ minimization (eONE-L1) algorithm is then proposed to exactly solve $(\mathrm{BP})$ based on the augmented Lagrange multiplier (ALM) method, which is a convex optimization method.

The relaxed ONE-L1 (rONE-L1) algorithm is further developed to simplify eONE-L1. It is shown that rONE-L1 converges at least exponentially and is in the form of the modified IST in (2). In the case of strictly sparse signals and noise-free measurements, numerical simulations show that rONE-L1 has the same sparsity-undersampling tradeoff as $(\mathrm{BP})$ does. Under the same conditions, rONE-L1 is compared with state-of-the-art algorithms, including FPC-AS, AMP and NESTA.
It is shown that rONE-L1 is faster than AMP and NESTA when the number of measurements is just enough to exactly reconstruct the original sparse signal using $\ell_1$ minimization. In a general case of approximately sparse signals and noise-contaminated measurements, where AMP is omitted for its poor performance, an example of 2D image reconstruction from partial-DCT measurements demonstrates that rONE-L1 outperforms NESTA and FPC-AS in terms of computational efficiency and reconstruction quality, respectively.

The rest of the correspondence is organized as follows. Section II introduces the proposed exact and relaxed ONE-L1 algorithms, followed by their implementation details. Section III reports the efficiency of our algorithms through numerical simulations in both noise-free and noise-contaminated cases. Conclusions are drawn in Section IV.

II. ONE-L1 ALGORITHMS

A. Preliminary: Soft Thresholding Operator

For $w \in \mathbb{R}$, the soft thresholding of $w$ with a threshold $\lambda \in \mathbb{R}^+$ is defined as
$$S_\lambda(w) = \mathrm{sgn}(w)\cdot(|w| - \lambda)_+,$$
where $(\cdot)_+ = \max(\cdot, 0)$ and
$$\mathrm{sgn}(w) = \begin{cases} w/|w|, & w \ne 0; \\ 0, & w = 0. \end{cases}$$
The operator $S_\lambda(\cdot)$ is extended to vector variables by elementwise operation. The soft thresholding operator solves the following $\ell_1$-norm regularized least squares problem [4]:
$$S_\lambda(w) = \arg\min_v \left\{ \lambda\|v\|_1 + \tfrac{1}{2}\|w - v\|_2^2 \right\}, \qquad (4)$$
where $v, w \in \mathbb{R}^n$ and $S_\lambda(w)$ is the unique minimizer.

B. Problem Description

Consider the $\ell_1$-minimization problem $(\mathrm{BP})$ with the sampling matrix $A$ satisfying $AA' = I$, where $I$ is an identity matrix. We call $A$ a partially orthonormal matrix hereafter, as its rows are usually randomly selected from an orthonormal matrix in practice, e.g. a partial-DCT matrix. Hence, there exists another partially orthonormal matrix $B \in \mathbb{R}^{(N-n)\times N}$, whose rows are orthogonal to those of $A$, such that $\Phi = \begin{bmatrix} A \\ B \end{bmatrix}$ is orthonormal. Let $p = \Phi x$.
The problem $(\mathrm{BP})$ is then equivalent to
$$(\mathrm{BP}_o)\quad \min_{x,\,p,\,\Gamma(p)=b} \|x\|_1, \ \text{subject to } \Phi x = p,$$
where $\Gamma(p)$ is the operator projecting the vector $p$ onto its first $n$ entries. In $(\mathrm{BP}_o)$, the sampling matrix $A$ is expanded into an orthonormal matrix $\Phi$. This corresponds to the scenario where the full sampling is carried out with the sampling matrix $\Phi$ and $p$ is the vector containing all measurements. Note that only $b$, as a part of $p$, is actually observed. Expanding the sampling matrix $A$ into an orthonormal matrix $\Phi$ is the key step in showing that the ALM method exactly solves $(\mathrm{BP}_o)$ and, hence, $(\mathrm{BP})$. The next subsection describes the proposed algorithm, referred to as orthonormal expansion $\ell_1$-minimization.

C. ALM Based ONE-L1 Algorithms

In this subsection we solve the $\ell_1$-minimization problem $(\mathrm{BP}_o)$ using the ALM method [18], [19]. The ALM method is similar to the quadratic penalty method except for an additional Lagrange multiplier term. Compared with the quadratic penalty method, the ALM method has some salient properties, e.g. the ease of parameter tuning and the convergence speed. The augmented Lagrangian function is
$$L(x, p, y, \mu) = \|x\|_1 + \langle p - \Phi x, y\rangle + \frac{\mu}{2}\|p - \Phi x\|_2^2, \qquad (5)$$
where the Lagrange multiplier $y \in \mathbb{R}^N$, $\mu \in \mathbb{R}^+$, and $\langle u, v\rangle = u'v \in \mathbb{R}$ is the inner product of $u, v \in \mathbb{R}^N$. Eq. (5) can be expressed as
$$L(x, p, y, \mu) = \|x\|_1 + \frac{\mu}{2}\left\|p - \Phi x + \mu^{-1} y\right\|_2^2 - \frac{1}{2\mu}\|y\|_2^2. \qquad (6)$$
Subsequently, we have the following optimization problem $(\mathrm{SP})$:
$$(\mathrm{SP})\quad \min_{x,\,p,\,\Gamma(p)=b} L(x, p, y, \mu).$$
Instead of solving $(\mathrm{SP})$ directly, let us consider the two related problems
$$(\mathrm{SP}_1)\quad \min_x L(x, p, y, \mu),$$
and
$$(\mathrm{SP}_2)\quad \min_{p,\,\Gamma(p)=b} L(x, p, y, \mu).$$
Note that problem $(\mathrm{SP}_1)$ is similar to the $\ell_1$-regularized problem in (4). In general, $(\mathrm{SP}_1)$ cannot be directly solved using the soft thresholding operator as in (4), since the matrix product of $\Phi$ and $x$ appears inside the $\ell_2$-norm term.
However, the soft thresholding operator does apply in the special case where $\Phi$ is orthonormal. Since $\|\Phi u\|_2 = \|u\|_2$ for any $u \in \mathbb{R}^N$, we can apply soft thresholding to obtain
$$S_{\mu^{-1}}\left(\Phi'\left(p + \mu^{-1} y\right)\right) = \arg\min_x L(x, p, y, \mu). \qquad (7)$$
To solve $(\mathrm{SP}_2)$, we let $\partial_{\overline{\Gamma}(p)} L(x, p, y, \mu) = 0$ to obtain $\overline{\Gamma}(p) = \overline{\Gamma}\left(\Phi x - \mu^{-1} y\right)$, i.e.,
$$\begin{bmatrix} b \\ \overline{\Gamma}\left(\Phi x - \mu^{-1} y\right) \end{bmatrix} = \arg\min_{p,\,\Gamma(p)=b} L(x, p, y, \mu), \qquad (8)$$
where $\overline{\Gamma}(\cdot)$ is the operator projecting the variable onto its last $N-n$ entries. As a result, an iterative solution of $(\mathrm{SP})$ is stated in the following Lemma 1.

Lemma 1: For fixed $y$ and $\mu$, the iterative algorithm given by
$$x^{j+1} = S_{\mu^{-1}}\left(\Phi'\left(p^j + \mu^{-1} y\right)\right), \qquad (9)$$
$$p^{j+1} = \begin{bmatrix} b \\ \overline{\Gamma}\left(\Phi x^{j+1} - \mu^{-1} y\right) \end{bmatrix} \qquad (10)$$
converges to an optimal solution of $(\mathrm{SP})$.

Proof: Denote $L(x, p, y, \mu)$ by $L(x, p)$ for simplicity. By the optimality and uniqueness of $x^{j+1}$ and $p^{j+1}$, we have $L\left(x^{j+1}, p^{j+1}\right) \le L\left(x^j, p^j\right)$, with equality if and only if $\left(x^{j+1}, p^{j+1}\right) = \left(x^j, p^j\right)$. Hence, the sequence $\left\{L\left(x^j, p^j\right)\right\}$ is bounded and converges to a constant $L^*$, i.e., $L\left(x^j, p^j\right) \to L^*$ as $j \to +\infty$. Since the sequence $\left\{x^j\right\}$ is also bounded, by $\left\|x^j\right\|_1 \le L\left(x^j, p^j\right) + \frac{1}{2\mu}\|y\|_2^2$, there exists a subsequence $\left\{x^{j_i}\right\}_{i=1}^{+\infty}$ such that $x^{j_i} \to x_s^*$ as $i \to +\infty$, where $x_s^*$ is an accumulation point of $\left\{x^j\right\}$. Correspondingly, $p^{j_i} \to p_s^*$ and, moreover, $L(x_s^*, p_s^*) = L^*$. We now use contradiction to show that $(x_s^*, p_s^*)$ is a fixed point of the algorithm. Suppose that there exist $(x_s, p_s) \ne (x_s^*, p_s^*)$ such that $x_s = \arg\min_x L(x, p_s^*)$, $p_s = \arg\min_{p,\,\Gamma(p)=b} L(x_s^*, p)$, and $L(x_s, p_s) < L^*$. By (9), (10) and [4, Lemma 2.2], we have $\left\|x^{j_i+1} - x_s\right\|_2 \le \left\|x^{j_i} - x_s^*\right\|_2 \to 0$, i.e., $x^{j_i+1} \to x_s$ as $i \to +\infty$. Meanwhile, $p^{j_i+1} \to p_s$.
Hence, $L\left(x^{j_i+1}, p^{j_i+1}\right) \to L(x_s, p_s) < L^*$, which contradicts $L\left(x^j, p^j\right) \to L^*$. Therefore $(x_s^*, p_s^*)$ is a fixed point. Moreover, it follows from $\left\|x^{j_i+q} - x_s^*\right\|_2 \le \left\|x^{j_i} - x_s^*\right\|_2 \to 0$, for any positive integer $q$, that $x^j \to x_s^*$ as $j \to +\infty$. Note that the orthonormal matrix $\Phi = \begin{bmatrix} A \\ B \end{bmatrix}$ satisfies $\Phi'\Phi = A'A + B'B = I$. We can obtain
$$x_s^* = S_{\mu^{-1}}\left(x_s^* + A'\left(b + \mu^{-1}\Gamma(y) - A x_s^*\right)\right). \qquad (11)$$
Meanwhile, $(\mathrm{SP})$ is equivalent to
$$\min_x L\left(x, \begin{bmatrix} b \\ \overline{\Gamma}\left(\Phi x - \mu^{-1} y\right) \end{bmatrix}\right) = \|x\|_1 + \frac{\mu}{2}\left\|Ax - b - \mu^{-1}\Gamma(y)\right\|_2^2 - \frac{1}{2\mu}\|y\|_2^2. \qquad (12)$$
By [4, Proposition 3.10], $x_s^*$ is an optimal solution of the problem in (12) and, equivalently, $(x_s^*, p_s^*)$ is an optimal solution of $(\mathrm{SP})$.

Remark 1: (1) Lemma 1 shows that solving problem $(\mathrm{SP})$ is equivalent to solving, iteratively, problems $(\mathrm{SP}_1)$ and $(\mathrm{SP}_2)$. (2) Reference [4, Proposition 3.10] only deals with the special case $\|A\|_2 < 1$; it is, in fact, straightforward to extend the result to arbitrary $A$.

Following the framework of the ALM method [19] and Lemma 1, the ALM based ONE-L1 algorithm is outlined in Algorithm 1, where $(x_t^*, p_t^*)$ is the optimal solution to $(\mathrm{SP})$ in the $t$th iteration and $y_t^*$ is the corresponding Lagrange multiplier.

Algorithm 1: Exact ONE-L1 Algorithm via ALM Method
Input: Expanded orthonormal matrix $\Phi$ and observed sample data $b$.
1. $x_0^* = 0$; $p_0^* = \begin{bmatrix} b \\ 0 \end{bmatrix}$; $y_0^* = 0$; $\mu_0 > 0$; $t = 0$.
2. while not converged do
3.   Lines 4–9 solve $\left(x_{t+1}^*, p_{t+1}^*\right) = \arg\min_{x,\,p,\,\Gamma(p)=b} L(x, p, y_t^*, \mu_t)$;
4.   $x_{t+1}^0 = x_t^*$, $p_{t+1}^0 = p_t^*$, $j = 0$;
5.   while not converged do
6.     $x_{t+1}^{j+1} = S_{\mu_t^{-1}}\left(\Phi'\left(p_{t+1}^j + \mu_t^{-1} y_t^*\right)\right)$;
7.     $p_{t+1}^{j+1} = \begin{bmatrix} b \\ \overline{\Gamma}\left(\Phi x_{t+1}^{j+1} - \mu_t^{-1} y_t^*\right) \end{bmatrix}$;
8.     set $j = j + 1$;
9.   end while
10.  $y_{t+1}^* = y_t^* + \mu_t\left(p_{t+1}^* - \Phi x_{t+1}^*\right)$;
11.  choose $\mu_{t+1} > \mu_t$;
12.  set $t = t + 1$;
13.
end while
Output: $(x_t^*, p_t^*)$.

The convergence of Algorithm 1 is stated in the following theorem.

Theorem 1: Any accumulation point $(x^*, p^*)$ of the sequence $\{(x_t^*, p_t^*)\}_{t=1}^{+\infty}$ of Algorithm 1 is an optimal solution of $(\mathrm{BP}_o)$, and the convergence rate with respect to the outer iteration loop index $t$ is at least $O\left(\mu_{t-1}^{-1}\right)$, in the sense that $\left| \left\|x_t^*\right\|_1 - x^\dagger \right| = O\left(\mu_{t-1}^{-1}\right)$, where $x^\dagger = \|x^*\|_1$.

Proof: We first show that the sequence $\{y_t^*\}$ is bounded. By the optimality of $\left(x_{t+1}^*, p_{t+1}^*\right)$ we have
$$0 \in \partial_x L\left(x_{t+1}^*, p_{t+1}^*, y_t^*, \mu_t\right) = \partial\left\|x_{t+1}^*\right\|_1 - \Phi' y_{t+1}^*,$$
$$0 = \partial_{\overline{\Gamma}(p)} L\left(x_{t+1}^*, p_{t+1}^*, y_t^*, \mu_t\right) = \overline{\Gamma}\left(y_{t+1}^*\right),$$
where $\partial_x$ denotes the partial differential operator with respect to $x$, resulting in a set of subgradients. Hence, $\Phi' y_{t+1}^* \in \partial\left\|x_{t+1}^*\right\|_1$. It follows that $\left\|\Phi' y_{t+1}^*\right\|_\infty \le 1$ and $\{y_t^*\}$ is bounded. By $x^\dagger \ge L\left(x_{t+1}^*, p_{t+1}^*, y_t^*, \mu_t\right)$,
$$\left\|x_{t+1}^*\right\|_1 = L\left(x_{t+1}^*, p_{t+1}^*, y_t^*, \mu_t\right) - \frac{1}{2\mu_t}\left(\left\|y_{t+1}^*\right\|_2^2 - \left\|y_t^*\right\|_2^2\right) \le x^\dagger - \frac{1}{2\mu_t}\left(\left\|y_{t+1}^*\right\|_2^2 - \left\|y_t^*\right\|_2^2\right).$$
Since $\{y_t^*\}$ is bounded,
$$\left\|x_{t+1}^*\right\|_1 \le x^\dagger + O\left(\mu_t^{-1}\right). \qquad (13)$$
For any accumulation point $x^*$ of $\{x_t^*\}$, without loss of generality we have $x_t^* \to x^*$ as $t \to +\infty$. Hence, $\|x^*\|_1 \le x^\dagger$. In the meantime, $p_{t+1}^* = \Phi x_{t+1}^* + \mu_t^{-1}\left(y_{t+1}^* - y_t^*\right) \to p^*$ and $\Phi x^* = p^*$, so $(x^*, p^*)$ is an optimal solution to $(\mathrm{BP}_o)$. Moreover, by $x_{t+1}^* = \Phi'\left(p_{t+1}^* - \mu_t^{-1}\left(y_{t+1}^* - y_t^*\right)\right)$ and
$$x^\dagger = \min_{\Phi x = p,\,\Gamma(p)=b} \|x\|_1 = \min_{p,\,\Gamma(p)=b} \left\|\Phi' p\right\|_1 \le \left\|\Phi' p_{t+1}^*\right\|_1,$$
we have $\left\|x_{t+1}^*\right\|_1 \ge x^\dagger - O\left(\mu_t^{-1}\right)$, which establishes the theorem together with (13).

Algorithm 1 contains an inner and an outer iteration loop, respectively. Theorem 1 presents only the convergence rate of the outer loop.
A natural way to speed up Algorithm 1 is to terminate the inner loop before convergence and use the obtained inner-loop solution as the initialization for the next iteration. This is similar to a continuation strategy and can be realized with reasonably set precision and step size $\mu_t$ [19]. When the continuation parameter $\mu_t$ increases very slowly, the inner loop can produce a solution with high accuracy in a few iterations. In particular, for the purpose of fast and simple computing, we may update the variables in the inner loop only once before stepping into the outer loop operation. This results in a relaxed version of the exact ONE-L1 algorithm (eONE-L1), namely the relaxed ONE-L1 algorithm (rONE-L1), outlined in Algorithm 2.

Algorithm 2: Relaxed ONE-L1 Algorithm
Input: Expanded orthonormal matrix $\Phi$ and observed sample data $b$.
1. $x_0 = 0$; $p_0 = \begin{bmatrix} b \\ 0 \end{bmatrix}$; $y_0 = 0$; $\mu_0 > 0$; $t = 0$.
2. while not converged do
3.   $x_{t+1} = S_{\mu_t^{-1}}\left(\Phi'\left(p_t + \mu_t^{-1} y_t\right)\right)$;
4.   $p_{t+1} = \begin{bmatrix} b \\ \overline{\Gamma}\left(\Phi x_{t+1} - \mu_t^{-1} y_t\right) \end{bmatrix}$;
5.   $y_{t+1} = y_t + \mu_t\left(p_{t+1} - \Phi x_{t+1}\right)$;
6.   choose $\mu_{t+1} > \mu_t$;
7.   set $t = t + 1$;
8. end while
Output: $(x_t, p_t)$.

Theorem 2: The iterative solution $(x_t, p_t)$ of Algorithm 2 converges to a feasible solution $\left(x^f, p^f\right)$ of $(\mathrm{BP}_o)$ if $\sum_{t=1}^{+\infty} \mu_t^{-1} < +\infty$. It converges at least exponentially to $\left(x^f, p^f\right)$ if $\{\mu_t\}$ is an exponentially increasing sequence.

Proof: We show first that the sequences $\{\hat{y}_t\}$ and $\{y_t\}$ are bounded, where $\hat{y}_t = y_{t-1} + \mu_{t-1}\left(p_{t-1} - \Phi x_t\right)$. By the optimality of $x_{t+1}$ and $p_{t+1}$ we have
$$0 \in \partial_x L(x_{t+1}, p_t, y_t, \mu_t) = \partial\|x_{t+1}\|_1 - \Phi'\hat{y}_{t+1},$$
$$0 = \partial_{\overline{\Gamma}(p)} L\left(x_{t+1}, p_{t+1}, y_t, \mu_t\right) = \overline{\Gamma}\left(y_{t+1}\right).$$
Hence, $\left\|\Phi'\hat{y}_{t+1}\right\|_\infty \le 1$ and it follows that $\{\hat{y}_t\}$ is bounded. Since $y_{t+1} = \hat{y}_{t+1} + \mu_t\left(p_{t+1} - p_t\right)$, and the first $n$ entries of $p_{t+1} - p_t$ vanish, we obtain $\Gamma\left(y_{t+1}\right) = \Gamma\left(\hat{y}_{t+1}\right)$.
This, together with $\overline{\Gamma}\left(y_{t+1}\right) = 0$, results in $\left\|y_{t+1}\right\|_2 \le \left\|\hat{y}_{t+1}\right\|_2$ and the boundedness of $\{y_t\}$. By $p_{t+1} - p_t = \mu_t^{-1}\left(y_{t+1} - \hat{y}_{t+1}\right)$, we have $\left\|p_{t+1} - p_t\right\|_2 \le C\mu_t^{-1}$ with $C$ a constant. Then $\{p_t\}$ is a Cauchy sequence if $\sum_{t=1}^{+\infty} \mu_t^{-1} < +\infty$, resulting in $p_t \to p^f$ as $t \to +\infty$. In the meantime, $x_t \to x^f$ with $\Phi x^f = p^f$. Hence, $\left(x^f, p^f\right)$ is a feasible solution of $(\mathrm{BP}_o)$.

Suppose that $\{\mu_t\}$ is an exponentially increasing sequence, i.e., $\mu_{t+1} = r\mu_t$ with $r > 1$. By the boundedness of $\{y_t\}$ and $\{\hat{y}_t\}$ we have
$$\left\|p_t - p^f\right\|_2 = \left\|\sum_{i=t}^{+\infty}\left(p_i - p_{i+1}\right)\right\|_2 \le \sum_{i=t}^{+\infty}\left\|p_i - p_{i+1}\right\|_2 \le C\mu_t^{-1}\sum_{i=0}^{+\infty} r^{-i} = O\left(\mu_t^{-1}\right).$$
Hence, $\{p_t\}$ converges at least exponentially to $p^f$, since $\left\{\mu_t^{-1}\right\}$ converges exponentially to $0$, and the same result holds for $\{x_t\}$.

Remark 2: Theorem 2 shows that faster growth of $\{\mu_t\}$ results in faster convergence of $\{x_t\}$. Intuitively, the reduced number of iterations for the inner loop problem $(\mathrm{SP})$ may introduce some error relative to the optimal inner-loop solution $x_t^*$. This will likely affect the accuracy of the final solution $x^f$ for $(\mathrm{BP})$. Therefore, the growth speed of $\{\mu_t\}$ provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution, which will be illustrated in Section III through numerical simulations.

D. Relationship Between rONE-L1 and IST

The studies and applications of IST-type algorithms have been very active in recent years because of their concise presentations. This subsection considers the relationship between rONE-L1 and IST. Note that $\overline{\Gamma}(y_t) = 0$ in Algorithm 2 and $\Phi'\Phi = A'A + B'B = I$.
After some derivations, it can be shown that the rONE-L1 algorithm is equivalent to the following iteration (starting from $x^t = 0$ for $t \le 0$ and $z^t = 0$ for $t < 0$):
$$x^{t+1} = S_{\lambda_t}\left(x^t + A' z^t\right), \quad z^t = b - A\left[(1+\kappa_t)\, x^t - \kappa_t\, x^{t-1}\right] + \kappa_t\, z^{t-1}, \qquad (14)$$
where $\lambda_t = \mu_t^{-1}$ and $\kappa_t = \mu_{t-1}/\mu_t$. Compared with the general form of IST in (2), one more term $\kappa_t z^{t-1}$ is added when computing the current residual $z^t$ in rONE-L1. Moreover, a weighted sum $(1+\kappa_t)\, x^t - \kappa_t\, x^{t-1}$ is used instead of the current solution $x^t$. It will be shown later that these two changes essentially improve the sparsity-undersampling tradeoff.

Remark 3: The equations in (14) show that the expansion from the partially orthonormal matrix $A$ to the orthonormal $\Phi$ is not at all involved in the actual implementation and computation of rONE-L1. The same claim also holds for the eONE-L1 algorithm. Nevertheless, the orthonormal expansion is a key instrument in the derivation and analysis of Algorithms 1 and 2.

E. Implementation Details

As noted in Remark 3, the expansion from $A$ to $\Phi$ is not involved in the computation of the ONE-L1 algorithms. In our implementations, we consider an exponentially increasing $\mu_t$, i.e., we fix $r > 1$ and set $\mu_t = r^t \mu_0$. Let $Q_\alpha(\cdot)$ be an $\alpha$-quantile operator and $\mu_0 = 1/Q_\alpha\left(\left|A'b\right|\right)$, with $|\cdot|$ applied to the vector variable elementwise, $\mu_0^{-1}$ being the threshold in the first iteration and $\alpha = 0.99$. In eONE-L1, a large $r$ can speed up the convergence of the outer loop iteration according to Theorem 1. However, simulations show that a larger $r$ can result in more iterations in the inner loop. We use $r = 1 + n/N$ as the default. In rONE-L1, the value of $r$ provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution. Our recommendation of $r$ to achieve the optimal sparsity-undersampling tradeoff is $r = \min(1 + 0.04\, n/N,\ 1.02)$, which will be illustrated in Section III-A.
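Putting (14) together with the parameter choices above (exponential $\mu_t = r^t\mu_0$, the 0.99-quantile rule for $\mu_0$, and the recommended $r$), a minimal NumPy sketch of rONE-L1 reads as follows. The relative-residual termination rule mirrors the criterion of this subsection, but the iteration cap `max_iter` is an illustrative assumption, not a value from the correspondence.

```python
import numpy as np

def soft(w, lam):
    # elementwise soft thresholding S_lam
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def rone_l1(A, b, r=None, tol=1e-5, max_iter=5000):
    # rONE-L1 in the IST-like form (14); A must have orthonormal rows (AA' = I)
    n, N = A.shape
    if r is None:
        r = min(1 + 0.04 * n / N, 1.02)              # recommended r
    mu = 1.0 / np.quantile(np.abs(A.T @ b), 0.99)    # mu_0 via the 0.99-quantile rule
    x_prev = np.zeros(N)   # x^{t-1}
    x = np.zeros(N)        # x^t
    z_prev = np.zeros(n)   # z^{t-1}
    b_norm = np.linalg.norm(b)
    for t in range(max_iter):
        kappa = 1.0 / r    # kappa_t = mu_{t-1} / mu_t for exponential mu_t
        if t == 0:
            z = b.copy()   # x^0 = x^{-1} = 0 and z^{-1} = 0 give z^0 = b
        else:
            z = b - A @ ((1 + kappa) * x - kappa * x_prev) + kappa * z_prev
        x_prev, x = x, soft(x + A.T @ z, 1.0 / mu)   # lambda_t = 1 / mu_t
        z_prev = z
        if np.linalg.norm(b - A @ x) < tol * b_norm:  # relative-residual criterion
            break
        mu *= r            # mu_{t+1} = r * mu_t
    return x
```

With a random $A$ having orthonormal rows and a signal whose sparsity is well below the phase transition, this sketch recovers the signal to high accuracy, consistent with the role of (14) as the computational form of Algorithm 2.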
An iterative algorithm needs a termination criterion. The eONE-L1 algorithm is considered converged if $\frac{\|Ax_t^* - b\|_2}{\|b\|_2} < \tau_1$, with $\tau_1$ a user-defined tolerance. The inner iteration is considered converged if $\frac{\|x_t^{j+1} - x_t^j\|_2}{\|x_t^j\|_2} < \tau_2$. In our implementation, the default values are $(\tau_1, \tau_2) = \left(10^{-5}, 10^{-6}\right)$. The rONE-L1 algorithm is considered converged if $\frac{\|Ax_t - b\|_2}{\|b\|_2} < \tau$, with $\tau = 10^{-5}$ as the default.

III. NUMERICAL SIMULATIONS

A. Sparsity-Undersampling Tradeoff

This subsection considers the sparsity-undersampling tradeoff of rONE-L1 in the case of strictly sparse signals and noise-free measurements. Phase transition is a measure of the sparsity-undersampling tradeoff in this case. Let the sampling ratio be $\delta = n/N$ and the sparsity ratio be $\rho = k/n$, where $k$ is a measure of the sparsity of $x$; we call $x$ $k$-sparse if at most $k$ entries of $x$ are nonzero. As $k, n, N \to \infty$ with fixed $\delta$ and $\rho$, the behavior of the phase transition of $(\mathrm{BP})$ is controlled by $(\delta, \rho)$ [20], [21]. We denote this theoretical curve by $\rho = \rho_T(\delta)$, which is plotted in Fig. 1.

We estimate the phase transition of rONE-L1 using a Monte Carlo method as in [17], [22]. Two matrix ensembles are considered: Gaussian with $N = 1000$ and partial-DCT with $N = 1024$. Here the finite-$N$ phase transition is defined as the value of $\rho$ at which the probability of successfully recovering the original signal is 50%. We consider 33 equispaced values of $\delta$ in $\{0.02, 0.05, \cdots, 0.98\}$. For each $\delta$, 21 equispaced values of $\rho$ are generated in the interval $[\rho_T(\delta) - 0.1,\ \rho_T(\delta) + 0.1]$. Then $M = 20$ random problem instances are generated and solved for each combination of $(\delta, \rho)$, where $n = \lceil \delta N \rceil$, $k = \lceil \rho n \rceil$, and the nonzero entries of the sparse signals are generated from the standard Gaussian distribution.
Success is declared if the relative root mean squared error (relative RMSE) satisfies $\frac{\|\hat{x} - x^o\|_2}{\|x^o\|_2} < 10^{-4}$, where $\hat{x}$ is the recovered signal. The number of successes among the $M$ experiments is recorded. Finally, a generalized linear model is used to estimate the phase transition as in [22].

[Fig. 1. Observed phase transitions of rONE-L1, and comparison with those of IST. Curves shown: theoretical $\ell_1$ curve; rONE-L1 (Gaussian, recommended $r$); rONE-L1 (DCT, recommended $r$); rONE-L1 (DCT, $r = 1 + 0.2\delta$); IST (DCT, recommended $r$); IST (DCT, $r = 1 + 0.2\delta$). Note that 1) the observed phase transitions of rONE-L1 with the recommended $r$ strongly agree with the theoretical calculation based on $(\mathrm{BP})$; 2) rONE-L1 has significantly enlarged success phases compared with IST.]

The experiment results are presented in Fig. 1. The observed phase transitions using the recommended value of $r$ strongly agree with the theoretical result of $(\mathrm{BP})$. This shows that the rONE-L1 algorithm has the optimal sparsity-undersampling tradeoff in the sense of $\ell_1$ minimization.

B. Comparison with IST

The rONE-L1 algorithm can be considered as a modified version of IST in (2). In this subsection we compare the sparsity-undersampling tradeoff and speed of these two algorithms. A similar method is adopted to estimate the phase transition of IST, which is implemented using the same parameter values as rONE-L1. Only nine values of $\delta$ in $\{0.1, 0.2, \cdots, 0.9\}$ are considered, with the partial-DCT matrix ensemble, for reasons of computation time. Another choice of $r = 1 + 0.2\delta$ is considered besides the recommended one. Correspondingly, the phase transition of rONE-L1 with $r = 1 + 0.2\delta$ is also estimated. The observed phase transitions are shown in Fig. 1. As a modified version of IST, rONE-L1 clearly makes a great improvement over IST in the sparsity-undersampling tradeoff.
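At a toy scale, the Monte Carlo procedure above can be sketched as follows. A compact rONE-L1 solver in the IST-like form (14) is inlined so the snippet is self-contained, success is declared by the relative-RMSE test, and the empirical success fraction stands in for the generalized-linear-model fit. All sizes, sparsities and instance counts here are illustrative assumptions, far smaller than the paper's settings.

```python
import numpy as np

def soft(w, lam):
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def rone_l1(A, b, max_iter=4000, tol=1e-6):
    # rONE-L1 in the form (14) with the recommended r and quantile-based mu_0
    n, N = A.shape
    r = min(1 + 0.04 * n / N, 1.02)
    mu = 1.0 / np.quantile(np.abs(A.T @ b), 0.99)
    x_prev, x, z_prev = np.zeros(N), np.zeros(N), np.zeros(n)
    for t in range(max_iter):
        kappa = 1.0 / r
        z = b.copy() if t == 0 else b - A @ ((1 + kappa) * x - kappa * x_prev) + kappa * z_prev
        x_prev, x = x, soft(x + A.T @ z, 1.0 / mu)
        z_prev = z
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            break
        mu *= r
    return x

def success_rate(N, n, k, M, rng):
    # fraction of M random k-sparse instances recovered with relative RMSE < 1e-4
    wins = 0
    for _ in range(M):
        Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
        A = Q.T                                 # rows orthonormal: AA' = I
        x0 = np.zeros(N)
        x0[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
        xh = rone_l1(A, A @ x0)
        wins += np.linalg.norm(xh - x0) / np.linalg.norm(x0) < 1e-4
    return wins / M
```

Sweeping $k$ (hence $\rho = k/n$) while holding $\delta = n/N$ fixed traces out an empirical success curve: well below the theoretical curve $\rho_T(\delta)$ the rate should be near 1, and well above it near 0.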
Meanwhile, a comparison of the averaged number of iterations of the two algorithms shows that rONE-L1 is also faster than IST. For example, with $\delta = 0.2$ and the recommended $r$, rONE-L1 is about 6 times faster than IST.

C. Comparison with AMP, FPC-AS and NESTA in the Noise-free Case

In this subsection, we report numerical simulation results comparing rONE-L1 with state-of-the-art algorithms, including AMP, FPC-AS and NESTA, in the case of sparse signals and noise-free measurements.

TABLE I
COMPARISON RESULTS OF ONE-L1 ALGORITHMS WITH STATE-OF-THE-ART METHODS

rho    Method    # calls of A & A'     CPU time (s)        Error (1e-5)
0.1    eONE-L1   1819 (1522, 2054)*    5.62 (4.67, 6.52)   0.42 (0.11, 0.94)
       rONE-L1   515.4 (286, 954)      2.14 (1.19, 3.92)   1.08 (0.53, 1.30)
       AMP       222.7 (216, 234)      0.80 (0.76, 0.86)   1.02 (0.85, 1.15)
       FPC-AS    150.2 (135, 170)      0.50 (0.44, 0.56)   1.13 (1.07, 1.23)
       NESTA     468.9 (458, 484)      2.70 (2.55, 2.98)   1.05 (0.99, 1.13)
0.22   eONE-L1   9038 (7270, 11194)    28.5 (22.0, 35.8)   1.87 (0.46, 2.66)
       rONE-L1   722.3 (440, 972)      2.61 (1.63, 3.93)   1.80 (1.37, 3.05)
       AMP       1708 (1150, 2252)     6.21 (4.19, 9.11)   10.5 (6.96, 15.8)
       FPC-AS    589.4 (476, 803)      2.10 (1.65, 2.80)   1.96 (1.46, 3.60)
       NESTA     1084 (890, 1244)      6.47 (5.22, 7.50)   2.90 (1.62, 3.98)
* The three numbers are the averaged, minimum and maximum values, respectively.

Our experiments used FPC-AS v.1.21, NESTA v.1.1, and the AMP codes provided by the author. We choose parameter values for FPC-AS and NESTA such that each method produces a solution with approximately the same precision as that produced by rONE-L1. In our experiments we consider the recovery of exactly sparse signals from partial-DCT measurements. We set $N = 2^{14}$ and $\delta = 0.2$, and consider an 'easy' case where $\rho = 0.1$ and a 'hard' case where $\rho = 0.22$, respectively.
Twenty random problems are created and solved for each algorithm with each combination of $(\delta, \rho)$, and the minimum, maximum and averaged relative RMSE, number of calls of $A$ and $A'$, and CPU time usages are recorded. All experiments are carried out in Matlab v.7.7.0 on a PC with a Windows XP system and a 3 GHz CPU. Default parameter values are used in eONE-L1 and rONE-L1. AMP: terminated if $\frac{\|Ax_t - b\|_2}{\|b\|_2} < 10^{-5}$. FPC-AS: $\lambda = 2\times10^{-6}$ and gtol $= 1\times10^{-6}$, where gtol is the termination criterion on the maximum norm of the subgradient; FPC-AS solves the problem $(\mathrm{QP}_\lambda)$. NESTA: $\lambda = 2\times10^{-6}$, $\epsilon = 0$ and the termination criterion tolvar $= 1\times10^{-8}$; NESTA solves $(\mathrm{BP}_\epsilon)$ using the Nesterov algorithm [11], with continuation.

Our experiment results are presented in Table I. In both the 'easy' and 'hard' cases, rONE-L1 is much faster than eONE-L1. In the 'easy' case, the proposed rONE-L1 algorithm takes the largest number of calls of $A$ and $A'$, except for eONE-L1, due to a conservative setting of $r$. But this number of calls (515.4) is very close to that of NESTA (468.9) and, furthermore, the CPU time usage of rONE-L1 (2.14 s) is less than that of NESTA (2.70 s) because of its concise implementation. In the 'hard' case, rONE-L1 has the second best performance, with significantly less CPU time than AMP and NESTA. AMP has the second worst CPU time and the worst accuracy, as the dynamic threshold in each iteration depends on the mean squared error of the current iterative solution, which cannot be calculated exactly in the implementation.

[Footnote: Here 'easy' and 'hard' refer to the degree of difficulty in recovering a sparse signal from a specific number of measurements. The setting $(\delta, \rho) = (0.2, 0.22)$ is close to the phase transition of $(\mathrm{BP})$.]

D. A Practical Example

This subsection demonstrates the efficiency of rONE-L1 in the general CS setting where the signal of interest is approximately sparse and the measurements are contaminated with noise. We seek to reconstruct the $256 \times 256$ Mondrian image, shown in Fig. 2, from its noise-contaminated partial-DCT coefficients. This image presents a challenge, as its wavelet expansion is compressible but not exactly sparse. The sampling pattern, which is inspired by magnetic resonance imaging (MRI) and is shown in Fig. 2, is adopted because most of the energy of the image concentrates at low-frequency components after the DCT transform. The measurement vector $b$ contains $n = 7419$ DCT measurements ($\delta = 0.113$). White Gaussian noise with standard deviation $\sigma = 1$ is then added. We set $\epsilon = \sqrt{n + 2\sqrt{2n}}\,\sigma$. The Haar wavelet with decomposition level 4 is chosen as the sparsifying transform $W$. Hence, the problem to be solved is $(\mathrm{BP}_\epsilon)$ with $A = F_p W'$, where $F_p$ is the partial-DCT transform. The reconstructed image is $\hat{H} = W'\hat{x}$, with $\hat{x}$ the reconstructed wavelet coefficients, and the reconstruction error is calculated as $\frac{\|\hat{H} - H^o\|_F}{\|H^o\|_F}$, where $H^o$ is the original image and $\|\cdot\|_F$ denotes the Frobenius norm. We compare the performance of rONE-L1 with NESTA and FPC-AS.

Remark 4: AMP is omitted because of its poor performance in this approximately-sparse-signal case. For AMP, the value of the dynamic threshold $\lambda_t$ and the term $\|x^t\|_0$ in (3) depend on the condition that the signal to be reconstructed is strictly sparse.

In this noisy measurement case, an exact solution of $(\mathrm{BP}_\epsilon)$ is not sought in the rONE-L1 simulation. The computation of the rONE-L1 algorithm is set to terminate if $\frac{\|Ax_t - b\|_2}{\|b\|_2} \le \tau = \frac{\epsilon}{\|b\|_2}$, i.e., rONE-L1 outputs the first $x_t$ that becomes a feasible solution of $(\mathrm{BP}_\epsilon)$. FPC-AS: $\lambda = 1\times10^{-3}$, gtol $= 1\times10^{-3}$, gtol_scale_x $= 1\times10^{-6}$ and the maximum number of iterations for subspace optimization sub_mxitr $= 10$.
The parameters are set according to [15, Section 4.4]. NESTA: $\lambda = 1 \times 10^{-4}$ and tolvar $= 1 \times 10^{-6}$. The parameters are tuned to achieve the minimum reconstruction error.

Fig. 2 shows the experimental results, where rONE-L1, FPC-AS and NESTA all produce faithful reconstructions of the original image. The rONE-L1 algorithm produces a reconstruction error (0.0741) lower than that of FPC-AS (0.0809) with comparable computation times (11.1 s and 11.4 s, respectively). While NESTA results in a slightly lower reconstruction error (0.0649), it incurs more than twice the computation time (29.4 s).

Fig. 2. An example of 2D image reconstruction from noise-contaminated partial-DCT measurements. Upper left: original Mondrian image; upper right: sampling pattern. The lower three are the reconstructed images by rONE-L1 (lower left; error: 0.0741, time: 11.1 s), FPC-AS (lower middle; error: 0.0809, time: 11.4 s) and NESTA (lower right; error: 0.0649, time: 29.4 s).

IV. CONCLUSION

In this work, we have presented novel algorithms to solve the basis pursuit problem for noiseless CS. The proposed rONE-L1 algorithm, based on the augmented Lagrange multiplier method and a heuristic simplification, can be considered a modified IST with an aggressive continuation strategy. The following two cases of CS have been studied: 1) exact reconstruction of sparse signals from noise-free measurements, and 2) reconstruction of approximately sparse signals from noise-contaminated measurements. The proposed rONE-L1 outperforms AMP, a well-known IST-type algorithm, in Case 2 and also in Case 1 when the setting of $(\delta, \rho)$ is close to the phase transition of basis pursuit. It is faster than NESTA in both Case 1 and Case 2. The numerical experiments further show that rONE-L1 outperforms FPC-AS in Case 2.
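For readers unfamiliar with IST, the basic soft-thresholding iteration that IST-type methods build on can be sketched as follows. This is the generic textbook update for $\ell_1$-regularized least squares [4], not the authors' rONE-L1 update; the step size, threshold $\lambda$, and iteration count are illustrative.

```python
import numpy as np

def soft_threshold(v, lam):
    """Entrywise soft-thresholding: the proximal operator of lam*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ist(A, b, lam, n_iters=200):
    """Plain iterative soft thresholding for
        min_x  lam*||x||_1 + 0.5*||A x - b||_2^2,
    assuming ||A||_2 <= 1 so that a unit step size is admissible."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # gradient step on the least-squares term, then shrinkage
        x = soft_threshold(x + A.T @ (b - A @ x), lam)
    return x
```

Modified IST schemes such as rONE-L1 keep this shrinkage structure but change how the threshold is driven toward the constrained problem, which is where the aggressive continuation strategy mentioned above enters.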
Apart from its high computational efficiency and accurate reconstruction results, another useful property of rONE-L1 is its ease of parameter tuning: one only needs to set a termination criterion $\tau$ if the recommended $r$ is used, and the value of $\tau$ is explicit in Case 2. While this correspondence focuses on the reconstruction of real-valued signals, it is straightforward to apply the ONE-L1 algorithms to the reconstruction of complex-valued signals [23]. More rigorous analysis of rONE-L1 is currently under investigation.

ACKNOWLEDGMENT

The authors are grateful to the anonymous reviewers for helpful comments. Z. Yang wishes to thank Arian Maleki for providing the AMP codes. The Matlab codes of the ONE-L1 algorithms are available at http://sites.google.com/site/zaiyang0248.

REFERENCES

[1] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[2] E. Candès and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, pp. 969–985, 2007.
[3] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[4] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.
[5] K. Bredies and D. Lorenz, "Linear convergence of iterative soft-thresholding," Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 813–837, 2008.
[6] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale $\ell_1$-regularized least squares," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606–617, 2008.
[7] M. Lustig, D. Donoho, and J. Pauly, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
[8] E. Hale, W. Yin, and Y. Zhang, "A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing," CAAM TR07-07, Rice University, 2007.
[9] E. Candès and J. Romberg, "$\ell_1$-magic: Recovery of sparse signals via convex programming," URL: www.acm.caltech.edu/l1magic/downloads/l1magic.pdf.
[10] S. Becker, J. Bobin, and E. Candès, "NESTA: A fast and accurate first-order method for sparse recovery," SIAM Journal on Imaging Sciences, vol. 4, no. 1, pp. 1–39, 2011.
[11] Y. Nesterov, "Smooth minimization of non-smooth functions," Mathematical Programming, vol. 103, no. 1, pp. 127–152, 2005.
[12] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[13] D. Donoho, Y. Tsaig, I. Drori, and J. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," submitted to IEEE Transactions on Signal Processing. Available at: www.cs.tau.ac.il/~idrori/StOMP.pdf, 2006.
[14] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[15] Z. Wen, W. Yin, D. Goldfarb, and Y. Zhang, "A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation," SIAM Journal on Scientific Computing, vol. 32, no. 4, pp. 1832–1857, 2010.
[16] A. Maleki and D. Donoho, "Optimally tuned iterative thresholding algorithms for compressed sensing," IEEE Journal of Selected Topics in Signal Processing, vol. 4, pp. 330–341, 2010.
[17] D. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, 2009.
[18] J. Nocedal and S. Wright, Numerical Optimization. New York: Springer-Verlag, 2006.
[19] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Boston: Academic Press, 1982.
[20] D. Donoho and J. Tanner, "Counting the faces of randomly-projected hypercubes and orthants, with applications," Discrete and Computational Geometry, vol. 43, no. 3, pp. 522–541, 2010.
[21] M. Stojnic, "Various thresholds for l1-optimization in compressed sensing," 2009.
[22] D. Donoho and J. Tanner, "Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing," Philosophical Transactions of the Royal Society A, vol. 367, no. 1906, pp. 4273–4293, 2009.
[23] Z. Yang and C. Zhang, "Sparsity-undersampling tradeoff of compressed sensing in the complex domain," in Proceedings of ICASSP 2011, 2011.
