Joint-sparse recovery from multiple measurements
Ewout van den Berg and Michael P. Friedlander

Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, BC, Canada ({ewout78,mpf}@cs.ubc.ca). Technical Report TR-2009-07, April 2009. Research partially supported by the Natural Sciences and Engineering Research Council of Canada.

Abstract. The joint-sparse recovery problem aims to recover, from sets of compressed measurements, unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an extension of the single-measurement-vector (SMV) problem widely studied in compressed sensing. We analyze the recovery properties for two types of recovery algorithms. First, we show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate of sequential SMV using $\ell_1$ minimization, and that there are problems that can be solved with one approach but not with the other. Second, we analyze the performance of the ReMBo algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with $\ell_1$ minimization, and show how recovery improves as more measurements are taken. From this analysis it follows that having more measurements than the number of nonzero rows does not improve the potential theoretical recovery rate.

1 Introduction

A problem of central importance in compressed sensing [1, 10] is the following: given an $m \times n$ matrix $A$ and a measurement vector $b = Ax_0$, recover $x_0$. When $m < n$, this problem is ill-posed, and it is not generally possible to uniquely recover $x_0$ without some prior information. In many important cases, $x_0$ is known to be sparse, and it may be appropriate to solve

$$\min_{x \in \mathbb{R}^n} \ \|x\|_0 \quad \text{subject to} \quad Ax = b \tag{1.1}$$

to find the sparsest possible solution. (The $\ell_0$-norm $\|\cdot\|_0$ of a vector counts the number of nonzero entries.) If $x_0$ has fewer than $s/2$ nonzero entries, where $s$ is the number of nonzeros in the sparsest null-vector of $A$, then $x_0$ is the unique solution of this optimization problem [12, 19]. The main obstacle of this approach is that it is combinatorial [24], and therefore impractical for all but the smallest problems. To overcome this, Chen et al. [6] introduced basis pursuit:

$$\min_{x \in \mathbb{R}^n} \ \|x\|_1 \quad \text{subject to} \quad Ax = b. \tag{1.2}$$

This convex relaxation, based on the $\ell_1$-norm $\|x\|_1$, can be solved much more efficiently; moreover, under certain conditions [2, 11], it yields the same solution as the $\ell_0$ problem (1.1).
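For concreteness, a minimal sketch of solving (1.2) on a random instance follows. This is not the paper's code (its experiments use CVX with SDPT3 in MATLAB; see Section 3.3); Python with cvxpy is assumed here purely for illustration.

```python
# Illustrative sketch, not the paper's code: basis pursuit (1.2) via cvxpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 20, 60, 4
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x0[support] = rng.standard_normal(s)        # s-sparse ground truth
b = A @ x0

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()
print("recovered:", np.allclose(x.value, x0, atol=1e-5))
```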
A natural extension of the single-measurement-vector (SMV) problem just described is the multiple-measurement-vector (MMV) problem. Instead of a single measurement $b$, we are given a set of $r$ measurements $b^{(k)} = Ax_0^{(k)}$, $k = 1, \ldots, r$, in which the vectors $x_0^{(k)}$ are jointly sparse—i.e., have nonzero entries at the same locations. Such problems arise in source localization [22], neuromagnetic imaging [8], and equalization of sparse-communication channels [7, 15]. Succinctly, the aim of the MMV problem is to recover $X_0$ from observations $B = AX_0$, where $B = [b^{(1)}, b^{(2)}, \ldots, b^{(r)}]$ is an $m \times r$ matrix, and the $n \times r$ matrix $X_0$ is row sparse—i.e., it has nonzero entries in only a small number of rows.

The most widely studied approach to the MMV problem is based on solving the convex optimization problem

$$\min_{X \in \mathbb{R}^{n \times r}} \ \|X\|_{p,q} \quad \text{subject to} \quad AX = B,$$

where the mixed $\ell_{p,q}$ norm of $X$ is defined as

$$\|X\|_{p,q} = \Bigl(\sum_{j=1}^{n} \|X_{j\to}\|_q^p\Bigr)^{1/p},$$

and $X_{j\to}$ is the (column) vector whose entries form the $j$th row of $X$. In particular, Cotter et al. [8] consider $p = 2$, $q \le 1$; Tropp [28, 29] analyzes $p = 1$, $q = \infty$; Malioutov et al. [22] and Eldar and Mishali [14] use $p = 1$, $q = 2$; and Chen and Huo [5] study $p = 1$, $q \ge 1$. A different approach is given by Mishali and Eldar [23], who propose the ReMBo algorithm, which reduces MMV to a series of SMV problems.
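The mixed norm itself is simple to evaluate. A small helper (a convenience sketch, not from the paper) that follows the row-wise definition above:

```python
# Sketch of the mixed l_{p,q} norm of an n-by-r matrix X, following the
# row-wise definition above; not code from the paper.
import numpy as np

def mixed_norm(X: np.ndarray, p: float, q: float) -> float:
    """||X||_{p,q}: the l_p norm of the vector of l_q row norms."""
    row_norms = np.linalg.norm(X, ord=q, axis=1)   # ||X_{j->}||_q for each row j
    return np.linalg.norm(row_norms, ord=p)

X = np.array([[1.0, -2.0], [0.0, 0.0], [3.0, 4.0]])
print(mixed_norm(X, p=1, q=2))   # sum of row 2-norms: sqrt(5) + 5
```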
In this paper we study the sum-of-norms problem and the conditions for uniform recovery of all $X_0$ with a fixed row support, and compare this against recovery using $\ell_{1,1}$. We then construct matrices $X_0$ that cannot be recovered using $\ell_{1,1}$ but for which $\ell_{1,2}$ does succeed, and vice versa. We then illustrate the individual recovery properties of $\ell_{1,1}$ and $\ell_{1,2}$ with empirical results. We further show how recovery via $\ell_{1,1}$ changes as the number of measurements increases, and propose a boosted-$\ell_1$ approach to improve on the $\ell_{1,1}$ approach. This analysis provides the starting point for our study of the recovery properties of ReMBo, based on a geometrical interpretation of this algorithm.

We begin in Section 2 by summarizing existing $\ell_0$–$\ell_1$ equivalence results, which give conditions under which the solution of the $\ell_1$ relaxation (1.2) coincides with the solution of the $\ell_0$ problem (1.1). In Section 3 we consider the $\ell_{1,2}$ mixed-norm and sum-of-norms formulations and compare their performance against $\ell_{1,1}$. In Sections 4 and 5 we examine two approaches that are based on sequential application of (1.2).

Notation. We assume throughout that $A$ is a full-rank matrix in $\mathbb{R}^{m \times n}$, and that $X_0$ is an $s$ row-sparse matrix in $\mathbb{R}^{n \times r}$. We follow the convention that all vectors are column vectors. For an arbitrary matrix $M$, its $j$th column is denoted by the column vector $M_{\downarrow j}$; its $i$th row is the transpose of the column vector $M_{i\to}$. The $i$th entry of a vector $v$ is denoted by $v_i$. We make exceptions for $e_i = I_{\downarrow i}$ and for $x_0$ (resp., $X_0$), which represents the sparse vector (resp., matrix) we want to recover. When there is no ambiguity we sometimes write $m_i$ to denote $M_{\downarrow i}$. When concatenating vectors into matrices, $[a, b, c]$ denotes horizontal concatenation and $[a; b; c]$ denotes vertical concatenation. When indexing with $\mathcal{I}$, we define the vector $v_{\mathcal{I}} := [v_i]_{i \in \mathcal{I}}$, and the $m \times |\mathcal{I}|$ matrix $A_{\mathcal{I}} := [A_{\downarrow j}]_{j \in \mathcal{I}}$. Row or column selection takes precedence over all other operators.

2 Existing results for $\ell_1$ recovery

The conditions under which (1.2) gives the sparsest possible solution have been studied by applying a number of different techniques. By far the most popular analytical approach is based on the restricted isometry property, introduced by Candès and Tao [3], which gives sufficient conditions for equivalence. Donoho [9] obtains necessary and sufficient (NS) conditions by analyzing the underlying geometry of (1.2). Several authors [12, 13, 19] characterize the NS-conditions in terms of properties of the kernel of $A$: $\mathrm{Ker}(A) = \{x \mid Ax = 0\}$. Fuchs [16] and Tropp [27] express sufficient conditions in terms of the solution of the dual of (1.2):

$$\max_{y} \ b^T y \quad \text{subject to} \quad \|A^T y\|_\infty \le 1. \tag{2.1}$$

In this paper we are mainly concerned with the geometric and kernel conditions. We use the geometrical interpretation of the problems to get a better understanding, and resort to the null-space properties of $A$ to analyze recovery. To make the discussion more self-contained, we briefly recall some of the relevant results in the next three sections.

2.1 The geometry of $\ell_1$ recovery

The set of all points of the unit $\ell_1$-ball, $\{x \in \mathbb{R}^n \mid \|x\|_1 \le 1\}$, can be formed by taking convex combinations of $\pm e_j$, the signed columns of the identity matrix. Geometrically this is equivalent to taking the convex hull of these vectors, giving the cross-polytope $C = \mathrm{conv}\{\pm e_1, \pm e_2, \ldots, \pm e_n\}$. Likewise, we can look at the linear mapping $x \mapsto Ax$ for all points $x \in C$, giving the polytope $P = \{Ax \mid x \in C\} = AC$. The faces of $C$ can be expressed as the convex hull of subsets of vertices, not including pairs that are reflections with respect to the origin (such pairs are sometimes erroneously referred to as antipodal, which is a slightly more general concept [21]). Under linear transformations, each face of the cross-polytope $C$ either maps to a face of $P$ or vanishes into the interior of $P$.

The solution found by (1.2) can be interpreted as follows. Starting with a radius of zero, we slowly "inflate" $P$ until it first touches $b$. The radius at which this happens corresponds to the $\ell_1$-norm of the solution $x^*$. The vertices whose convex hull is the face touching $b$ determine the location and sign of the nonzero entries of $x^*$, while the position where $b$ touches the face determines their relative weights. Donoho [9] shows that $x_0$ can be recovered from $b = Ax_0$ using (1.2) if and only if the face of the (scaled) cross-polytope containing $x_0$ maps to a face of $P$. Two direct consequences are that recovery depends only on the sign pattern of $x_0$, and that the probability of recovering a random $s$-sparse vector is equal to the ratio of the number of $(s-1)$-faces of $P$ to the number of $(s-1)$-faces of $C$. That is, letting $F_d(P)$ denote the collection of all $d$-faces [21] of $P$, the probability of recovering $x_0$ using $\ell_1$ is given by

$$P_{\ell_1}(A, s) = \frac{|F_{s-1}(AC)|}{|F_{s-1}(C)|}.$$

When we need to find the recoverability of vectors restricted to a support $\mathcal{I}$, this probability becomes

$$P_{\ell_1}(A, \mathcal{I}) = \frac{|F_{\mathcal{I}}(AC)|}{|F_{\mathcal{I}}(C)|}, \tag{2.2}$$

where $|F_{\mathcal{I}}(C)| = 2^{|\mathcal{I}|}$ denotes the number of faces of $C$ formed by the convex hull of $\{\pm e_j\}_{j \in \mathcal{I}}$, and $|F_{\mathcal{I}}(AC)|$ is the number of faces of $AC$ generated by $\{\pm A_{\downarrow j}\}_{j \in \mathcal{I}}$.
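Since recovery depends only on the support and signs, the probability (2.2) can also be approached empirically by sampling sign vectors and solving (1.2). The following Monte Carlo sketch is a hypothetical stand-in for the exact face counting used later in the paper, again assuming cvxpy:

```python
# Hypothetical experiment (not from the paper): estimate the l1 recovery
# probability P_l1(A, s) of Section 2.1 by sampling random s-sparse sign
# patterns and checking recovery, instead of counting faces directly.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n, s, trials = 20, 60, 7, 100
A = rng.standard_normal((m, n))

hits = 0
for _ in range(trials):
    x0 = np.zeros(n)
    I = rng.choice(n, size=s, replace=False)
    x0[I] = rng.choice([-1.0, 1.0], size=s)   # recovery depends only on signs
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == A @ x0]).solve()
    hits += np.allclose(x.value, x0, atol=1e-5)
print(f"estimated P_l1(A, {s}) ~ {hits / trials:.2f}")
```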
2.2 Null-space properties and $\ell_1$ recovery

Equivalence results in terms of null-space properties generally characterize equivalence for the set of all vectors $x$ with a fixed support, which is defined as $\mathrm{Supp}(x) = \{j \mid x_j \ne 0\}$. We say that $x$ can be uniformly recovered on $\mathcal{I} \subseteq \{1, \ldots, n\}$ if all $x$ with $\mathrm{Supp}(x) \subseteq \mathcal{I}$ can be recovered. The following theorem gives conditions for uniform recovery via $\ell_1$ on an index set; more general results are given by Gribonval and Nielsen [20].

Theorem 2.1 (Donoho and Elad [12], Gribonval and Nielsen [19]). Let $A$ be an $m \times n$ matrix and $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set. Then all $x_0 \in \mathbb{R}^n$ with $\mathrm{Supp}(x_0) \subseteq \mathcal{I}$ can be uniquely recovered from $b = Ax_0$ using basis pursuit (1.2) if and only if for all $z \in \mathrm{Ker}(A) \setminus \{0\}$,

$$\sum_{j \in \mathcal{I}} |z_j| < \sum_{j \notin \mathcal{I}} |z_j|. \tag{2.3}$$

That is, the $\ell_1$-norm of $z$ on $\mathcal{I}$ is strictly less than the $\ell_1$-norm of $z$ on the complement $\mathcal{I}^c$.

2.3 Optimality conditions for $\ell_1$ recovery

Sufficient conditions for recovery can be derived from the first-order optimality conditions necessary for $x^*$ and $y^*$ to be solutions of (1.2) and (2.1), respectively. The Karush-Kuhn-Tucker (KKT) conditions are also sufficient in this case because the problems are convex. The Lagrangian function for (1.2) is given by $L(x, y) = \|x\|_1 - y^T(Ax - b)$; the KKT conditions require that

$$Ax = b \quad \text{and} \quad 0 \in \partial_x L(x, y), \tag{2.4}$$

where $\partial_x L$ denotes the subdifferential of $L$ with respect to $x$. The second condition reduces to $0 \in \mathrm{sgn}(x) - A^T y$, where the signum function

$$\mathrm{sgn}(\gamma) \in \begin{cases} \{\mathrm{sign}(\gamma)\} & \text{if } \gamma \ne 0, \\ [-1, 1] & \text{otherwise}, \end{cases}$$

is applied to each individual component of $x$. It follows that $x^*$ is a solution of (1.2) if and only if $Ax^* = b$ and there exists an $m$-vector $y$ such that $|a_j^T y| \le 1$ for $j \notin \mathrm{Supp}(x^*)$, and $a_j^T y = \mathrm{sign}(x_j^*)$ for all $j \in \mathrm{Supp}(x^*)$. Fuchs [16] shows that $x^*$ is the unique solution of (1.2) when $[a_j]_{j \in \mathrm{Supp}(x^*)}$ is full rank and, in addition, $|a_j^T y| < 1$ for all $j \notin \mathrm{Supp}(x^*)$. When the columns of $A$ are in general position (i.e., no $k+1$ columns of $A$ span the same $(k-1)$-dimensional hyperplane for $k \le n$), we can weaken this condition by noting that for such $A$ the solution of (1.2) is always unique, thus making the existence of a $y$ that satisfies (2.4) for $x_0$ a necessary and sufficient condition for $\ell_1$ to recover $x_0$.

3 Recovery using sums of row norms

Our analysis of sparse recovery for the MMV problem of recovering $X_0$ from $B = AX_0$ begins with an extension of Theorem 2.1 to recovery using the convex relaxation

$$\min_{X} \ \sum_{j=1}^{n} \|X_{j\to}\| \quad \text{subject to} \quad AX = B; \tag{3.1}$$

note that the norm within the summation is arbitrary. Define the row support of a matrix as

$$\mathrm{Supp_{row}}(X) = \{j \mid \|X_{j\to}\| \ne 0\}.$$

With these definitions we have the following result. (A related result is given by Stojnic et al. [26].)

Theorem 3.1. Let $A$ be an $m \times n$ matrix, $r$ be a positive integer, $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set, and let $\|\cdot\|$ denote any vector norm. Then all $X_0 \in \mathbb{R}^{n \times r}$ with $\mathrm{Supp_{row}}(X_0) \subseteq \mathcal{I}$ can be uniquely recovered from $B = AX_0$ using (3.1) if and only if for all $Z$ with columns $Z_{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$,

$$\sum_{j \in \mathcal{I}} \|Z_{j\to}\| < \sum_{j \notin \mathcal{I}} \|Z_{j\to}\|. \tag{3.2}$$

Proof. For the "only if" part, suppose that there is a $Z$ with columns $Z_{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$ such that (3.2) does not hold. Now, choose $X$ with $X_{j\to} = Z_{j\to}$ for all $j \in \mathcal{I}$ and with all remaining rows zero, and set $B = AX$. Next, define $V = X - Z$, and note that $AV = AX - AZ = AX = B$. The construction of $V$ implies that $\sum_j \|X_{j\to}\| \ge \sum_j \|V_{j\to}\|$, and consequently $X$ cannot be the unique solution of (3.1).

Conversely, let $X$ be an arbitrary matrix with $\mathrm{Supp_{row}}(X) \subseteq \mathcal{I}$, and let $B = AX$. To show that $X$ is the unique solution of (3.1) it suffices to show that for any $Z$ with columns $Z_{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$,

$$\sum_j \|(X + Z)_{j\to}\| > \sum_j \|X_{j\to}\|.$$

This is equivalent to

$$\sum_{j \notin \mathcal{I}} \|Z_{j\to}\| + \sum_{j \in \mathcal{I}} \|(X + Z)_{j\to}\| - \sum_{j \in \mathcal{I}} \|X_{j\to}\| > 0.$$

Applying the reverse triangle inequality, $\|a + b\| - \|b\| \ge -\|a\|$, to the summation over $j \in \mathcal{I}$ and reordering gives exactly condition (3.2).
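Condition (3.2) quantifies over all matrices with kernel columns, so it cannot be certified by sampling; a randomized search can, however, refute uniform recovery by exhibiting a violating $Z$. The following spot-check is a heuristic of our own (an assumption, not the paper's methodology): passing all samples proves nothing, while a single hit disproves uniform recovery on $\mathcal{I}$.

```python
# Heuristic spot-check (an assumption, not the paper's method): sample
# matrices Z with columns in Ker(A) and test condition (3.2) for a given
# support I. A violation refutes uniform recovery; passing all samples
# proves nothing, since (3.2) must hold for *all* such Z.
import numpy as np
from scipy.linalg import null_space

def violates_32(A, I, r, n_samples=10_000, seed=2):
    rng = np.random.default_rng(seed)
    N = null_space(A)                       # basis of Ker(A), shape (n, n - rank)
    mask = np.zeros(A.shape[1], dtype=bool)
    mask[list(I)] = True
    for _ in range(n_samples):
        Z = N @ rng.standard_normal((N.shape[1], r))   # columns in Ker(A)
        row_norms = np.linalg.norm(Z, axis=1)          # ||Z_{j->}||_2
        if row_norms[mask].sum() >= row_norms[~mask].sum():
            return True                                # (3.2) fails for this Z
    return False
```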
In the special case of the sum of $\ell_1$-norms, i.e., $\ell_{1,1}$, summing the norms of the columns is equivalent to summing the norms of the rows. As a result, (3.1) can be written as

$$\min_{X} \ \sum_{k=1}^{r} \|X_{\downarrow k}\|_1 \quad \text{subject to} \quad AX_{\downarrow k} = B_{\downarrow k}, \quad k = 1, \ldots, r.$$

Because this objective is separable, the problem can be decoupled and solved as a series of independent basis pursuit problems, giving one $X_{\downarrow k}$ for each column $B_{\downarrow k}$ of $B$. The following result relates recovery using the sum-of-norms formulation (3.1) to $\ell_{1,1}$ recovery.

Theorem 3.2. Let $A$ be an $m \times n$ matrix, $r$ be a positive integer, $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set, and $\|\cdot\|$ denote any vector norm. Then uniform recovery of all $X \in \mathbb{R}^{n \times r}$ with $\mathrm{Supp_{row}}(X) \subseteq \mathcal{I}$ using sums of norms (3.1) implies uniform recovery on $\mathcal{I}$ using $\ell_{1,1}$.

Proof. For uniform recovery on support $\mathcal{I}$ to hold, it follows from Theorem 3.1 that for any matrix $Z$ with columns $Z_{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$, property (3.2) holds. In particular it holds for $Z$ with $Z_{\downarrow k} = \bar{z}$ for all $k$, with $\bar{z} \in \mathrm{Ker}(A) \setminus \{0\}$. Note that for these matrices there exists a norm-dependent constant $\gamma$ such that $|\bar{z}_j| = \gamma \|Z_{j\to}\|$. Since the choice of $\bar{z}$ was arbitrary, it follows from (3.2) that the NS-condition (2.3) for independent recovery of the vectors $B_{\downarrow k}$ using $\ell_1$ in Theorem 2.1 is satisfied. Moreover, because $\ell_{1,1}$ is equivalent to independent recovery, we also have uniform recovery on $\mathcal{I}$ using $\ell_{1,1}$.

An implication of Theorem 3.2 is that the use of restricted isometry conditions—or any technique, for that matter—to analyze uniform recovery conditions for the sum-of-norms approach necessarily leads to results that are no stronger than uniform $\ell_1$ recovery. (Recall that the $\ell_{1,1}$ and $\ell_1$ norms are equivalent.)

3.1 Recovery using $\ell_{1,2}$

In this section we take a closer look at the $\ell_{1,2}$ problem

$$\min_{X} \ \|X\|_{1,2} \quad \text{subject to} \quad AX = B, \tag{3.3}$$

which is a special case of the sum-of-norms problem. Although Theorem 3.2 establishes that uniform recovery via $\ell_{1,2}$ is no better than uniform recovery via $\ell_{1,1}$, there are many situations in which it recovers signals that $\ell_{1,1}$ cannot. Indeed, it is evident from Figure 1 that the probability of recovering individual signals with random signs and support is much higher for $\ell_{1,2}$. The reason for the degrading performance of $\ell_{1,1}$ with increasing $r$ is explained in Section 4.

Figure 1: Recovery rates for fixed, randomly drawn $20 \times 60$ matrices $A$, averaged over 1,000 trials at each row-sparsity level $s$. The nonzero entries in the $60 \times r$ matrix $X_0$ are sampled i.i.d. from the normal distribution. The solid and dashed lines represent $\ell_{1,2}$ and $\ell_{1,1}$ recovery, respectively.

In this section we construct examples for which $\ell_{1,2}$ works and $\ell_{1,1}$ fails, and vice versa. This helps uncover some of the structure of $\ell_{1,2}$, but at the same time implies that certain techniques used to study $\ell_1$ can no longer be used directly. Because the examples are based on extensions of the results from Section 2.3, we first develop equivalent conditions in the next section.
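In the spirit of Figure 1, the following sketch compares $\ell_{1,2}$ recovery (3.3) with decoupled $\ell_{1,1}$ recovery on a single random instance. It assumes cvxpy in place of the paper's CVX/SDPT3 setup:

```python
# Sketch (assumption: cvxpy instead of the paper's CVX/SDPT3 setup)
# comparing l_{1,2} recovery (3.3) with decoupled l_{1,1} recovery.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
m, n, r, s = 20, 60, 3, 8
A = rng.standard_normal((m, n))
X0 = np.zeros((n, r))
I = rng.choice(n, size=s, replace=False)
X0[I, :] = rng.standard_normal((s, r))    # row-sparse ground truth
B = A @ X0

# l_{1,2}: minimize the sum of row 2-norms subject to AX = B.
X = cp.Variable((n, r))
cp.Problem(cp.Minimize(cp.sum(cp.norm(X, 2, axis=1))), [A @ X == B]).solve()
ok_12 = np.allclose(X.value, X0, atol=1e-5)

# l_{1,1}: decouples into r independent basis pursuit problems.
cols = []
for k in range(r):
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == B[:, k]]).solve()
    cols.append(x.value)
ok_11 = np.allclose(np.column_stack(cols), X0, atol=1e-5)
print("l12:", ok_12, " l11:", ok_11)
```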
3.1.1 Sufficient conditions for recovery via $\ell_{1,2}$

The optimality conditions of the $\ell_{1,2}$ problem (3.3) play a vital role in deriving a set of sufficient conditions for joint-sparse recovery. In this section we derive the dual of (3.3) and the corresponding necessary and sufficient optimality conditions. These allow us to derive sufficient conditions for recovery via $\ell_{1,2}$.

The Lagrangian for (3.3) is defined as

$$L(X, Y) = \|X\|_{1,2} - \langle Y, AX - B \rangle, \tag{3.4}$$

where $\langle V, W \rangle := \mathrm{trace}(V^T W)$ is an inner product defined over real matrices. The dual is then given by maximizing

$$\inf_X L(X, Y) = \inf_X \{\|X\|_{1,2} - \langle Y, AX - B \rangle\} = \langle B, Y \rangle - \sup_X \{\langle A^T Y, X \rangle - \|X\|_{1,2}\} \tag{3.5}$$

over $Y$. (Because the primal problem has only linear constraints, there necessarily exists a dual solution $Y^*$ that maximizes this expression [25, Theorem 28.2].) To simplify the supremum term, we note that for any convex, positively homogeneous function $f$ defined over an inner-product space,

$$\sup_v \{\langle w, v \rangle - f(v)\} = \begin{cases} 0 & \text{if } w \in \partial f(0), \\ \infty & \text{otherwise}. \end{cases}$$

To derive these conditions, note that positive homogeneity of $f$ implies that $f(0) = 0$, and thus $w \in \partial f(0)$ implies that $\langle w, v \rangle \le f(v)$ for all $v$. Hence, the supremum is achieved with $v = 0$. If on the other hand $w \notin \partial f(0)$, then there exists some $v$ such that $\langle w, v \rangle > f(v)$, and by the positive homogeneity of $f$, $\langle w, \alpha v \rangle - f(\alpha v) \to \infty$ as $\alpha \to \infty$. Applying this expression for the supremum to (3.5), we arrive at the necessary condition

$$A^T Y \in \partial \|0\|_{1,2}, \tag{3.6}$$

which is required for dual feasibility.

We now derive an expression for the subdifferential $\partial \|X\|_{1,2}$. For rows $j$ where $\|X_{j\to}\|_2 > 0$, the gradient is given by $\nabla \|X_{j\to}\|_2 = X_{j\to}/\|X_{j\to}\|_2$. For the remaining rows, the gradient is not defined, but $\partial \|X_{j\to}\|_2$ coincides with the set of unit $\ell_2$-norm vectors $\mathbb{B}^r_{\ell_2} = \{v \in \mathbb{R}^r \mid \|v\|_2 \le 1\}$. Thus, the $j$th row of any subgradient of $\|\cdot\|_{1,2}$ at $X$ satisfies, for each $j = 1, \ldots, n$,

$$(\partial \|X\|_{1,2})_{j\to} \in \begin{cases} \{X_{j\to}/\|X_{j\to}\|_2\} & \text{if } \|X_{j\to}\|_2 > 0, \\ \mathbb{B}^r_{\ell_2} & \text{otherwise}. \end{cases} \tag{3.7}$$

Combining this expression with (3.6), we arrive at the dual of (3.3):

$$\max_Y \ \mathrm{trace}(B^T Y) \quad \text{subject to} \quad \|A^T Y\|_{\infty,2} \le 1. \tag{3.8}$$

The following conditions are therefore necessary and sufficient for a primal-dual pair $(X^*, Y^*)$ to be optimal for (3.3) and its dual (3.8):

$$AX^* = B \quad \text{(primal feasibility)}; \tag{3.9a}$$
$$\|A^T Y^*\|_{\infty,2} \le 1 \quad \text{(dual feasibility)}; \tag{3.9b}$$
$$\|X^*\|_{1,2} = \mathrm{trace}(B^T Y^*) \quad \text{(zero duality gap)}. \tag{3.9c}$$

The existence of a matrix $Y^*$ that satisfies (3.9) provides a certificate that the feasible matrix $X^*$ is an optimal solution of (3.3). However, it does not guarantee that $X^*$ is also the unique solution. The following theorem gives sufficient conditions, similar to those in Section 2.3, that also guarantee uniqueness of the solution.

Theorem 3.3. Let $A$ be an $m \times n$ matrix, and $B$ be an $m \times r$ matrix. Then a set of sufficient conditions for $X^*$ to be the unique minimizer of (3.3), with Lagrange multiplier $Y \in \mathbb{R}^{m \times r}$ and row support $\mathcal{I} = \mathrm{Supp_{row}}(X^*)$, is that

$$AX^* = B, \tag{3.10a}$$
$$(A^T Y)_{j\to} = X^*_{j\to}/\|X^*_{j\to}\|_2, \quad j \in \mathcal{I}, \tag{3.10b}$$
$$\|(A^T Y)_{j\to}\|_2 < 1, \quad j \notin \mathcal{I}, \tag{3.10c}$$
$$\mathrm{rank}(A_{\mathcal{I}}) = |\mathcal{I}|. \tag{3.10d}$$

Proof. The first three conditions clearly imply that $(X^*, Y)$ are primal and dual feasible, and thus satisfy (3.9a) and (3.9b). Conditions (3.10b) and (3.10c) together imply that

$$\mathrm{trace}(B^T Y) \equiv \sum_{j=1}^{n} [(A^T Y)_{j\to}]^T X^*_{j\to} = \sum_{j=1}^{n} \|X^*_{j\to}\|_2 \equiv \|X^*\|_{1,2}.$$

The first and last identities above follow directly from the definitions of the matrix trace and of the norm $\|\cdot\|_{1,2}$, respectively; the middle equality follows from (3.10b) (equality in the Cauchy inequality on the support), together with the fact that $X^*_{j\to} = 0$ for $j \notin \mathcal{I}$. Thus, the zero-gap requirement (3.9c) is satisfied. The conditions (3.10a)–(3.10c) are therefore sufficient for $(X^*, Y)$ to be an optimal primal-dual solution of (3.3). Because $Y$ determines the support and is a Lagrange multiplier for every solution of (3.3), this support must be unique. It then follows from condition (3.10d) that $X^*$ must be unique.
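The conditions (3.10) can be probed numerically: solve (3.3), read off a multiplier $Y$ for the equality constraint, and inspect the row norms of $A^T Y$. The following is a hypothetical check of our own (not from the paper); solver dual-sign conventions may differ, which is why only sign-invariant quantities are inspected.

```python
# Hypothetical numerical check (not in the paper) of conditions (3.10).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
m, n, r, s = 20, 60, 3, 5
A = rng.standard_normal((m, n))
X0 = np.zeros((n, r))
I = rng.choice(n, size=s, replace=False)
X0[I, :] = rng.standard_normal((s, r))
B = A @ X0

X = cp.Variable((n, r))
constr = [A @ X == B]
cp.Problem(cp.Minimize(cp.sum(cp.norm(X, 2, axis=1))), constr).solve()

Y = constr[0].dual_value            # multiplier for AX = B (sign convention may differ)
G = A.T @ Y                         # rows (A^T Y)_{j->}
rows_on = np.linalg.norm(G[I], axis=1)
rows_off = np.delete(np.linalg.norm(G, axis=1), I)
print("max row norm on support (should be ~1):", rows_on.max())
print("max row norm off support ((3.10c): < 1):", rows_off.max())
print("rank(A_I) == |I|:", np.linalg.matrix_rank(A[:, I]) == s)
```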
3.2 Counterexamples

Using the sufficient and necessary conditions developed in the previous section, we now construct examples of problems for which $\ell_{1,2}$ succeeds while $\ell_{1,1}$ fails, and vice versa. Because of its simplicity, we begin with the latter.

Recovery using $\ell_{1,1}$ where $\ell_{1,2}$ fails. Let $A$ be an $m \times n$ matrix with $m < n$ and unit-norm columns that are not scalar multiples of each other. Take any vector $x \in \mathbb{R}^n$ with at least $m + 1$ nonzero entries. Then $X_0 = \mathrm{diag}(x)$, possibly with all identically zero columns removed, can be recovered from $B = AX_0$ using $\ell_{1,1}$, but not with $\ell_{1,2}$. To see why, note that each column of $X_0$ has only a single nonzero entry and that, under the assumptions on $A$, each one-sparse vector can be recovered individually using $\ell_1$ (the points $\pm A_{\downarrow j} \in \mathbb{R}^m$ are all 0-faces of $P$); therefore $X_0$ can be recovered using $\ell_{1,1}$. On the other hand, for recovery using $\ell_{1,2}$ there would need to exist a matrix $Y$ satisfying (3.10b) for all $j \in \mathcal{I} = \mathrm{Supp}(x)$. For this given $X_0$ this reduces to $A^T Y = M$, where $M$ is the identity matrix with the same columns removed as were removed from $X_0$. But this equality is impossible to satisfy because $\mathrm{rank}(A) \le m < m + 1 \le \mathrm{rank}(M)$. Thus, $X_0$ cannot be the solution of the $\ell_{1,2}$ problem (3.3).

Recovery using $\ell_{1,2}$ where $\ell_{1,1}$ fails. For the construction of a problem where $\ell_{1,2}$ succeeds and $\ell_{1,1}$ fails, we consider two vectors, $f$ and $s$, with the same support $\mathcal{I}$, such that individual $\ell_1$ recovery fails for $f$ while it succeeds for $s$. In addition we assume that there exists a vector $y$ that satisfies $y^T A_{\downarrow j} = \mathrm{sign}(s_j)$ for all $j \in \mathcal{I}$, and $|y^T A_{\downarrow j}| < 1$ for all $j \notin \mathcal{I}$; i.e., $y$ satisfies conditions (3.10b) and (3.10c). Using the vectors $f$ and $s$, we construct the two-column matrix $X_0 = [(1 - \gamma)s, \gamma f]$, and claim that for sufficiently small $\gamma > 0$ this gives the desired reconstruction problem. Clearly, for any $\gamma \ne 0$, $\ell_{1,1}$ recovery fails because the second column can never be recovered, and we only need to show that $\ell_{1,2}$ does succeed. For $\gamma = 0$, the matrix $Y = [y, 0]$ satisfies conditions (3.10b) and (3.10c) and, assuming (3.10d) is also satisfied, $X_0$ is the unique solution of $\ell_{1,2}$ with $B = AX_0$. For sufficiently small $\gamma > 0$, the conditions that $Y$ needs to satisfy change slightly due to the division by $\|(X_0)_{j\to}\|_2$ for those rows in $\mathrm{Supp_{row}}(X_0)$. Those new conditions can be satisfied by adding corrections to the columns of $Y$. In particular, these corrections can be made by adding weighted combinations of the columns of $\bar{Y}$, which is constructed in such a way that it satisfies $A_{\mathcal{I}}^T \bar{Y} = I$ and minimizes $\|A_{\mathcal{I}^c}^T \bar{Y}\|_{\infty,\infty}$ on the complement $\mathcal{I}^c$ of $\mathcal{I}$. Note that the above argument can also be used to show that $\ell_{1,2}$ fails for $\gamma$ sufficiently close to one. Because the support and signs of $X_0$ remain the same for all $0 < \gamma < 1$, we can conclude the following:

Corollary 3.4. Recovery using $\ell_{1,2}$ is generally characterized not only by the row support and the sign pattern of the nonzero entries of $X_0$, but also by the magnitudes of the nonzero entries.

A consequence of this conclusion is that the notion of faces used in the geometrical interpretation of $\ell_1$ is not applicable to the $\ell_{1,2}$ problem.
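The first construction is easy to reproduce numerically. The sketch below (an illustration under the stated assumptions, not the paper's code) builds $X_0 = \mathrm{diag}(x)$ with $m + 1$ nonzeros and checks that $\ell_{1,2}$ fails to recover it; each column, being 1-sparse, would still be recovered by the decoupled $\ell_{1,1}$ problems.

```python
# Sketch of the first construction above (an illustration, not the paper's
# code): X0 = diag(x) with m + 1 nonzeros is recovered by l_{1,1} (each
# column is 1-sparse) but cannot be the l_{1,2} solution, by the rank argument.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 20
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)            # unit-norm, pairwise non-parallel columns
k = m + 1                                 # more nonzero rows than measurements
X0 = np.zeros((n, k))
for j in range(k):
    X0[j, j] = 1.0                        # diag(x) with zero columns removed
B = A @ X0

X = cp.Variable((n, k))
cp.Problem(cp.Minimize(cp.sum(cp.norm(X, 2, axis=1))), [A @ X == B]).solve()
print("l_{1,2} recovers X0:", np.allclose(X.value, X0, atol=1e-5))  # expect False
```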
3.3 Experiments

To get an idea of just how much more $\ell_{1,2}$ can recover in the above case where $\ell_{1,1}$ fails, we generated a $20 \times 60$ matrix $A$ with entries i.i.d. normally distributed, and determined a set of vectors $s_i$ and $f_i$ with identical support for which $\ell_1$ recovery succeeds and fails, respectively. Using triples of vectors $s_i$ and $f_j$ we constructed row-sparse matrices such as $X_0 = [s_1, f_1, f_2]$ or $X_0 = [s_1, s_2, f_2]$, and attempted to recover them from $B = AX_0 W$, where $W = \mathrm{diag}(\omega_1, \omega_2, \omega_3)$ is a diagonal weighting matrix with nonnegative entries and unit trace, by solving (3.3). For problems of this size, interior-point methods are very efficient, and we use SDPT3 [30] through the CVX interface [17, 18]. We consider $X_0$ to be recovered when the maximum absolute difference between $X_0$ and the $\ell_{1,2}$ solution $X^*$ is less than $10^{-5}$.

The results of the experiment are shown in Figure 2. In addition to the expected regions of recovery around individual columns $s_i$ and failure around $f_i$, we see that certain combinations of vectors $s_i$ still fail, while other combinations of vectors $f_i$ may be recoverable. By contrast, when using $\ell_{1,1}$ to solve the problem, any combination of $s_i$ vectors can be recovered while no combination including an $f_i$ can be recovered.

Figure 2: Generation of problems where $\ell_{1,2}$ succeeds while $\ell_{1,1}$ fails. For a $20 \times 60$ matrix $A$ and fixed supports of size $|\mathcal{I}| = 5, 7, 10$, we create vectors $f_i$ that cannot be recovered using $\ell_1$, and vectors $s_i$ that can be recovered. Each triangle represents an $X_0$ constructed from the vectors denoted in the corners. The location in the triangle determines the weight on each vector, ranging from zero to one and summing to one. The dark areas indicate the weights for which $\ell_{1,2}$ successfully recovered $X_0$.

4 Boosted $\ell_1$

As described in Section 3, recovery using $\ell_{1,1}$ is equivalent to individual $\ell_1$ recovery of each column $x_k := (X_0)_{\downarrow k}$ based on $b_k := B_{\downarrow k}$, for $k = 1, \ldots, r$:

$$\min_{x} \ \|x\|_1 \quad \text{subject to} \quad Ax = b_k. \tag{4.1}$$

Assuming that the signs of the nonzero entries in the support of each $x_k$ are drawn i.i.d. from $\{1, -1\}$, we can express the probability of recovering a matrix $X_0$ with row support $\mathcal{I}$ using $\ell_{1,1}$ in terms of the probability of recovering vectors on that support using $\ell_1$. To see how, note that $\ell_{1,1}$ recovers the original $X_0$ if and only if each individual problem in (4.1) successfully recovers each $x_k$. For the above class of matrices $X_0$ this therefore gives a recovery rate of

$$P_{\ell_{1,1}}(A, \mathcal{I}, r) = [P_{\ell_1}(A, \mathcal{I})]^r.$$

Using $\ell_{1,1}$ to recover $X_0$ is clearly not a good idea. Note also that uniform recovery of $X_0$ on a support $\mathcal{I}$ remains unchanged, regardless of the number of observations $r$ that are given. As a consequence of Theorem 3.2, this also means that the uniform-recovery properties of any sum-of-norms approach cannot increase with $r$. This clearly defeats the purpose of gathering multiple observations.

In many instances where $\ell_{1,1}$ fails, it may still recover a subset of columns $x_k$ from the corresponding observations $b_k$.
It seems wasteful to discard this information, because if we could recognize a single correctly recovered $x_k$, we would immediately know the row support $\mathcal{I} = \mathrm{Supp_{row}}(X_0) = \mathrm{Supp}(x_k)$ of $X_0$. Given the correct support, we can recover the nonzero part $\bar{X}$ of $X_0$ by solving

$$\min_{\bar{X}} \ \|A_{\mathcal{I}} \bar{X} - B\|_F. \tag{4.2}$$

In practice we obviously do not know the correct support, but when a given solution $x_k^*$ of (4.1) is sufficiently sparse, we can try to solve (4.2) for that support and verify whether the residual at the solution is zero. If so, we construct the final $X^*$ from the nonzero part and declare success. Otherwise we simply increment $k$ and repeat this process; if we run out of observations, recovery was unsuccessful. We refer to this algorithm, which is reminiscent of the ReMBo approach [23], as boosted $\ell_1$; its sole aim is to provide a bridge to the analysis of ReMBo. The complete boosted $\ell_1$ algorithm is outlined in Figure 3.

Figure 3: The boosted $\ell_1$ algorithm.
    given A, B
    for k = 1, ..., r do
        solve (1.2) with b_k = B↓k to get x
        I ← Supp(x)
        if |I| < m/2 then
            solve (4.2) to get X̄
            if A_I X̄ = B then
                X* ← 0;  (X*)_{j→} ← X̄_{j→} for j ∈ I
                return solution X*
    return failure

The recovery properties of the boosted $\ell_1$ approach are opposite to those of $\ell_{1,1}$: it fails only if all individual columns fail to be recovered using $\ell_1$. Hence, given an unknown $n \times r$ matrix $X_0$ supported on $\mathcal{I}$ with its sign pattern uniformly random, the boosted $\ell_1$ algorithm gives an expected recovery rate of

$$P_{\ell_1^B}(A, \mathcal{I}, r) = 1 - [1 - P_{\ell_1}(A, \mathcal{I})]^r. \tag{4.3}$$

To experimentally verify this recovery rate, we generated a $20 \times 80$ matrix $A$ with entries independently sampled from the normal distribution and fixed a randomly chosen support set $\mathcal{I}_s$ for three levels of sparsity, $s = 8, 9, 10$. On each of these three supports we generated vectors with all possible sign patterns and solved (1.2) to see whether they could be recovered (see Section 3.3). This gives exactly the face counts required to compute the $\ell_1$ recovery probability in (2.2) and the expected boosted $\ell_1$ recovery rate in (4.3). For the empirical success rate we take the average over 1,000 trials with random coefficient matrices $X$ supported on $\mathcal{I}_s$, with nonzero entries independently drawn from the normal distribution. To reduce the computational time we avoid solving $\ell_1$ and instead compare the sign pattern of the current solution $x_k$ against the information computed to determine the face counts (both $A$ and $\mathcal{I}_s$ remain fixed). The theoretical and empirical recovery rates using boosted $\ell_1$ are plotted in Figure 4.

Figure 4: Theoretical (dashed) and experimental (solid) performance of boosted $\ell_1$ for three problem instances with different row support sizes $s$.
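A minimal Python sketch of the loop in Figure 3 follows; it assumes a basis-pursuit solver `bp(A, b)` such as the cvxpy sketch shown in Section 1, and implementation details may differ from the authors' code.

```python
# Minimal sketch of boosted l1 (Figure 3), assuming a basis-pursuit
# solver bp(A, b) that returns the minimizer of (1.2).
import numpy as np

def boosted_l1(A, B, bp, tol=1e-8):
    m, n = A.shape
    for k in range(B.shape[1]):
        x = bp(A, B[:, k])                       # solve (1.2) for one column
        I = np.flatnonzero(np.abs(x) > tol)      # candidate row support
        if len(I) < m / 2:
            Xbar, *_ = np.linalg.lstsq(A[:, I], B, rcond=None)   # solve (4.2)
            if np.allclose(A[:, I] @ Xbar, B, atol=tol):         # zero residual?
                X = np.zeros((n, B.shape[1]))
                X[I, :] = Xbar
                return X
    return None                                  # failure
```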
5 Recovery using ReMBo

The boosted $\ell_1$ approach can be seen as a special case of the ReMBo algorithm [23]. ReMBo proceeds by taking a random vector $w \in \mathbb{R}^r$ and combining the individual observations in $B$ into a single weighted observation $b := Bw$. It then solves a single-measurement-vector problem $Ax = b$ for this $b$ (we shall use $\ell_1$ throughout) and checks whether the computed solution $x^*$ is sufficiently sparse. If not, the above steps are repeated with a different weight vector $w$; the algorithm stops when a maximum number of trials is reached. If the support $\mathcal{I}$ of $x^*$ is small, we form $A_{\mathcal{I}} = [A_{\downarrow j}]_{j \in \mathcal{I}}$, and check whether (4.2) has a solution $\bar{X}$ with zero residual. If this is the case, we have the nonzero rows of the solution $X^*$ in $\bar{X}$ and are done. Otherwise, we simply proceed with the next $w$. The ReMBo algorithm reduces to boosted $\ell_1$ by limiting the number of iterations to $r$ and choosing $w = e_i$ in the $i$th iteration. We summarize the ReMBo-$\ell_1$ algorithm in Figure 5.

Figure 5: The ReMBo-$\ell_1$ algorithm.
    given A, B; set Iteration ← 0
    while Iteration < MaxIteration do
        w ← Random(r, 1)
        solve (1.2) with b = Bw to get x
        I ← Supp(x)
        if |I| < m/2 then
            solve (4.2) to get X̄
            if A_I X̄ = B then
                X* ← 0;  (X*)_{j→} ← X̄_{j→} for j ∈ I
                return solution X*
        Iteration ← Iteration + 1
    return failure

The formulation given in [23] requires a user-defined threshold on the cardinality of the support $\mathcal{I}$ instead of the fixed threshold $m/2$. Ideally this threshold should be half the spark [12] of $A$, where

$$\mathrm{Spark}(A) := \min_{z \in \mathrm{Ker}(A) \setminus \{0\}} \|z\|_0,$$

which is the number of nonzeros of the sparsest vector in the kernel of $A$; any vector $x_0$ with fewer than $\mathrm{Spark}(A)/2$ nonzeros is the unique sparsest solution of $Ax = Ax_0 = b$ [12]. Unfortunately, the spark is prohibitively expensive to compute, but under the assumption that $A$ is in general position, $\mathrm{Spark}(A) = m + 1$. Note that choosing a higher threshold can help to recover signals with row sparsity exceeding $m/2$; however, in that case the result can no longer be guaranteed to be the sparsest solution.
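The corresponding sketch for ReMBo-$\ell_1$ (Figure 5) replaces the column loop of boosted $\ell_1$ with randomly weighted combinations of the observations; normally distributed weights are assumed here, one of several reasonable choices (see Section 5.3).

```python
# Minimal sketch of ReMBo-l1 (Figure 5), again assuming a basis-pursuit
# solver bp(A, b); the weights w are drawn from the normal distribution.
import numpy as np

def rembo_l1(A, B, bp, max_iter=1000, tol=1e-8, seed=6):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = B.shape[1]
    for _ in range(max_iter):
        w = rng.standard_normal(r)
        x = bp(A, B @ w)                         # SMV problem for b = Bw
        I = np.flatnonzero(np.abs(x) > tol)
        if len(I) < m / 2:
            Xbar, *_ = np.linalg.lstsq(A[:, I], B, rcond=None)   # solve (4.2)
            if np.allclose(A[:, I] @ Xbar, B, atol=tol):
                X = np.zeros((n, r))
                X[I, :] = Xbar
                return X
    return None                                  # failure
```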
To derive the performance analysis of ReMBo, we fix a support $\mathcal{I}$ of cardinality $s$, and consider only signals with nonzero entries on this support. Each time we multiply $B$ by a weight vector $w$, we in fact create a new problem with an $s$-sparse solution $x_0 = X_0 w$ corresponding to the right-hand side $b = Bw = AX_0 w = Ax_0$. As reflected in (2.2), recovery of $x_0$ using $\ell_1$ depends only on its support and sign pattern. Clearly, the more sign patterns in $x_0$ we can generate, the higher the probability of recovery. Moreover, due to the elimination of previously tried sign patterns, the probability of recovery goes up with each new sign pattern (excluding negations of previous sign patterns). The maximum number of sign patterns we can check with boosted $\ell_1$ is the number of observations $r$. The question thus becomes: how many different sign patterns can we generate by taking linear combinations of the columns of $X_0$? (We disregard the situation where elimination occurs and $|\mathrm{Supp}(X_0 w)| < s$.) Equivalently, we can ask how many orthants of $\mathbb{R}^s$ (each one corresponding to a different sign pattern) can be properly intersected by the hyperplane given by the range of the $s \times r$ matrix $\bar{X}$ consisting of the nonzero rows of $X_0$ (by proper we mean intersection of the interior). In Section 5.1 we derive an exact expression for the maximum number of proper orthant intersections in $\mathbb{R}^n$ by a hyperplane generated by $d$ vectors, denoted by $C(n, d)$.

Based on the above reasoning, a good model for the recovery rate of $n \times r$ matrices $X_0$ with $\mathrm{Supp_{row}}(X_0) = \mathcal{I}$, $|\mathcal{I}| < m/2$, using ReMBo is given by

$$P_R(A, \mathcal{I}, r) = 1 - \prod_{i=1}^{C(|\mathcal{I}|, r)/2} \left(1 - \frac{|F_{\mathcal{I}}(AC)|}{|F_{\mathcal{I}}(C)| - 2(i - 1)}\right). \tag{5.1}$$

The term within brackets denotes the probability of failure, and the fraction represents the success rate, which is given by the ratio of the number of faces $|F_{\mathcal{I}}(AC)|$ that survived the mapping to the total number of faces still to consider. The total number reduces by two at each trial because we can exclude the face $f$ we just tried, as well as $-f$. The factor of two in $C(|\mathcal{I}|, r)/2$ is also due to this symmetry. (Henceforth we use the convention that the uniqueness of a sign pattern is invariant under negation.)

This model would be a bound on the average performance of ReMBo if the generated sign patterns were sampled randomly from the space of all sign patterns on the given support. However, because they are generated from the orthant intersections with a hyperplane, the actual pattern is highly structured. Indeed, it is possible to imagine a situation where the $(s-1)$-faces of $C$ that perish in the mapping to $AC$ have sign patterns that are all contained in the set generated by a single hyperplane. Any other set of sign patterns would then necessarily include some faces that survive the mapping, and by trying all patterns in that set we would recover $X_0$. In this case, the average recovery over all $X_0$ on that support could be much higher than that given by (5.1). We do not yet fully understand how the surviving faces of $C$ are distributed. Due to the simplicial structure of the facets of $C$, we can expect the faces that perish to be partially clustered (if a $(d-2)$-face perishes, then so will the two $(d-1)$-faces whose intersection gives this face), and partially unclustered (the faces that perish while all their sub-faces survive). Note that, regardless of these patterns, recovery is guaranteed in the limit whenever the number of unique sign patterns tried exceeds half the number of faces lost, $(|F_{\mathcal{I}}(C)| - |F_{\mathcal{I}}(AC)|)/2$.

Figure 6 illustrates the theoretical performance model based on $C(n, d)$, for which we derive the exact expression in Section 5.1. In Section 5.2 we discuss practical limitations, and in Section 5.3 we empirically look at how the number of sign patterns generated grows with the number of normally distributed vectors $w$, and how this affects the recovery rates. To allow comparison between ReMBo and boosted $\ell_1$, we used the same matrix $A$ and supports $\mathcal{I}_s$ used to generate Figure 4.

Figure 6: Theoretical performance model for ReMBo on three problem instances with different sparsity levels $s$.

5.1 Maximum number of orthant intersections with a subspace

Theorem 5.1. Let $C(n, d)$ denote the maximum attainable number of orthant interiors intersected by a hyperplane in $\mathbb{R}^n$ generated by $d$ vectors. Then $C(n, 1) = 2$ and $C(n, d) = 2^n$ for $d \ge n$. In general, $C(n, d)$ is given by

$$C(n, d) = C(n-1, d-1) + C(n-1, d) = 2 \sum_{i=0}^{d-1} \binom{n-1}{i}. \tag{5.2}$$

Proof. The number of intersected orthants is exactly equal to the number of proper sign patterns (excluding zero values) that can be generated by linear combinations of those $d$ vectors. When $d = 1$ there can only be two such sign patterns, corresponding to positive and negative multiples of that vector, thus giving $C(n, 1) = 2$. Whenever $d \ge n$, we can choose a basis for $\mathbb{R}^n$ and add additional vectors as needed; we can then reach all points, and therefore all $2^n = C(n, d)$ sign patterns.

For the general case (5.2), let $v_1, \ldots, v_d$ be vectors in $\mathbb{R}^n$ such that the affine hull with the origin, $S = \mathrm{aff}\{0, v_1, \ldots, v_d\}$, gives a hyperplane in $\mathbb{R}^n$ that properly intersects the maximum number of orthants, $C(n, d)$.
Without loss of generality, assume that the vectors $v_i$, $i = 1, \ldots, d-1$, all have their $n$th component equal to zero. Now, let $T = \mathrm{aff}\{0, v_1, \ldots, v_{d-1}\} \subseteq \mathbb{R}^{n-1}$ be the intersection of $S$ with the $(n-1)$-dimensional subspace of all points $X = \{x \in \mathbb{R}^n \mid x_n = 0\}$, and let $C_T$ denote the number of $(n-1)$-orthants intersected by $T$. Note that $T$ itself, as embedded in $\mathbb{R}^n$, does not properly intersect any orthant. However, by adding or subtracting an arbitrarily small amount of $v_d$, we intersect $2C_T$ orthants; taking $v_d$ to be the $n$th column of the identity matrix would suffice for that matter. Any other orthants that are added have either $x_n > 0$ or $x_n < 0$, and their number does not depend on the magnitude of the $n$th entry of $v_d$, provided it remains nonzero. Because only the first $n-1$ entries of $v_d$ determine the maximum number of additional orthants, the problem reduces to $\mathbb{R}^{n-1}$. In fact, we ask how many new orthants can be added to $C_T$ by taking the affine hull of $T$ with $v$, the orthogonal projection of $v_d$ onto $X$. Since the maximum number of orthants for this $d$-dimensional subspace of $\mathbb{R}^{n-1}$ is given by $C(n-1, d)$, this number is clearly bounded by $C(n-1, d) - C_T$. Adding this to $2C_T$, we have

$$C(n, d) \le 2C_T + [C(n-1, d) - C_T] = C_T + C(n-1, d) \le C(n-1, d-1) + C(n-1, d) \le 2 \sum_{i=0}^{d-1} \binom{n-1}{i}. \tag{5.3}$$

The final expression follows by expanding the recurrence relation, which generates (a part of) Pascal's triangle, and combining this with $C(1, j) = 2$ for $j \ge 1$. In the above, whenever there are free orthants in $\mathbb{R}^{n-1}$, that is, when $d < n$, we can always choose the corresponding part of $v_d$ in such an orthant. As a consequence, no hyperplane supported by a set of vectors can intersect the maximum number of orthants when the range of those vectors includes some $e_i$.

We now show that this expression holds with equality. Let $U$ denote an $(n-d)$-hyperplane in $\mathbb{R}^n$ that intersects the maximum $C(n, n-d)$ orthants. We claim that in the interior of each orthant not intersected by $U$ there exists a vector that is orthogonal to $U$. If this were not the case, then $U$ must be aligned with some $e_i$ and can therefore not be optimal. The span of these orthogonal vectors generates a $d$-hyperplane $V$ that intersects $C_V = 2^n - C(n, n-d)$ orthants, and it follows that

$$C(n, d) \ge C_V = 2^n - C(n, n-d) \ge 2^n - 2\sum_{i=0}^{n-d-1} \binom{n-1}{i} = 2\sum_{i=0}^{n-1} \binom{n-1}{i} - 2\sum_{i=0}^{n-d-1} \binom{n-1}{i} = 2\sum_{i=n-d}^{n-1} \binom{n-1}{i} = 2\sum_{i=0}^{d-1} \binom{n-1}{i} \ge C(n, d),$$

where the last inequality follows from (5.3). Consequently, all inequalities hold with equality.

Corollary 5.2. Given $d \le n$, then $C(n, d) = 2^n - C(n, n-d)$, and $C(2d, d) = 2^{2d-1}$.

Corollary 5.3. A hyperplane $H$ in $\mathbb{R}^n$, defined as the range of $V = [v_1, v_2, \ldots, v_d]$, intersects the maximum number of orthants $C(n, d)$ whenever $\mathrm{rank}(V) = n$, or when $e_i \notin \mathrm{range}(V)$ for $i = 1, \ldots, n$.
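Theorem 5.1 makes $C(n, d)$ trivial to evaluate; a convenience sketch (not from the paper) that also cross-checks the recurrence:

```python
# Direct evaluation of C(n, d) from Theorem 5.1.
from math import comb

def C(n: int, d: int) -> int:
    """Maximum number of orthants of R^n properly intersected by the
    span of d vectors: C(n, d) = 2 * sum_{i < d} binom(n-1, i)."""
    if d >= n:
        return 2 ** n
    return 2 * sum(comb(n - 1, i) for i in range(d))

assert C(5, 1) == 2 and C(10, 10) == 2 ** 10
assert all(C(n, d) == C(n - 1, d - 1) + C(n - 1, d)    # recurrence (5.2)
           for n in range(2, 8) for d in range(1, n))
```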
5.2 Practical considerations

In practice it is generally not feasible to generate all of the $C(|\mathcal{I}|, r)/2$ unique sign patterns. This means that we would have to replace this term in (5.1) by the number of unique patterns actually tried. For a given $X_0$, the actual probability of recovery is determined by a number of factors. First of all, the linear combinations of the columns of the nonzero part $\bar{X}$ of $X_0$ prescribe a hyperplane and therefore a set of possible sign patterns. With each sign pattern is associated a face of $C$ that may or may not map to a face of $AC$. In addition, depending on the probability distribution from which the weight vectors $w$ are drawn, there is a certain probability of reaching each sign pattern. Summing the probabilities of reaching those patterns that can be recovered gives the probability $P(A, \mathcal{I}, X_0)$ of recovery with an individual random sample $w$. The probability of recovery after $t$ trials is then of the form $1 - [1 - P(A, \mathcal{I}, X_0)]^t$.

To attain a certain sign pattern $\bar{e}$, we need to find an $r$-vector $w$ such that $\mathrm{sign}(\bar{X} w) = \bar{e}$. For a positive sign on the $j$th position of the support we can take any vector $w$ in the open halfspace $\{w \mid \bar{X}_{j\to}^T w > 0\}$, and likewise for negative signs. The region of vectors $w \in \mathbb{R}^r$ that generates a desired sign pattern thus corresponds to the intersection of $|\mathcal{I}|$ open halfspaces. The measure of this intersection as a fraction of $\mathbb{R}^r$ determines the probability of sampling such a $w$. To formalize, define $\mathcal{K}$ as the cone generated by the rows of $-\mathrm{diag}(\bar{e})\bar{X}$, and the unit Euclidean $(r-1)$-sphere $S^{r-1} = \{x \in \mathbb{R}^r \mid \|x\|_2 = 1\}$. The intersection of halfspaces then corresponds to the interior of the polar cone of $\mathcal{K}$:

$$\mathcal{K}^\circ = \{x \in \mathbb{R}^r \mid x^T y \le 0, \ \forall y \in \mathcal{K}\}.$$

The fraction of $\mathbb{R}^r$ taken up by $\mathcal{K}^\circ$ is given by the ratio of the $(r-1)$-content of $S^{r-1} \cap \mathcal{K}^\circ$ to the $(r-1)$-content of $S^{r-1}$ [21]. This quantity coincides precisely with the definition of the external angle of $\mathcal{K}$ at the origin.

5.3 Experiments

In this section we illustrate the theoretical results from Section 5 and examine some practical considerations that affect the performance of ReMBo. For all experiments that require the matrix $A$, we use the same $20 \times 80$ matrix that was used in Section 4, and likewise for the supports $\mathcal{I}_s$. To solve (1.2), we again use CVX in conjunction with SDPT3. We consider $x_0$ to be recovered from $b = Ax_0 = AX_0 w$ if $\|x^* - x_0\|_\infty \le 10^{-5}$, where $x^*$ is the computed solution. The experiments that are concerned with the number of unique sign patterns generated depend only on the $s \times r$ matrix $\bar{X}$ representing the nonzero entries of $X_0$. Because an initial reordering of the rows does not affect the number of patterns, those experiments depend only on $\bar{X}$, $s = |\mathcal{I}|$, and the number of observations $r$; the exact indices in the support set $\mathcal{I}$ are irrelevant for those tests.
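The sampling process described above is easy to simulate. The following sketch (an assumed setup, far smaller than the $10^8$-trial experiment reported in Section 5.3.1) draws normal weights $w$ and counts the distinct sign patterns of $\bar{X}w$, identifying each pattern with its negation:

```python
# Sketch of the sampling experiment: count distinct sign-pattern pairs of
# sign(Xbar @ w) over random normal weights w (illustrative scale only).
import numpy as np

rng = np.random.default_rng(7)
s, r, trials = 10, 5, 100_000
Xbar = rng.standard_normal((s, r))        # nonzero rows of X0

patterns = set()
for _ in range(trials):
    e = np.sign(Xbar @ rng.standard_normal(r))
    key = tuple(e) if e[0] > 0 else tuple(-e)    # quotient out negation
    patterns.add(key)
print(f"{len(patterns)} unique sign-pattern pairs found")
```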
5.3.1 Generation of unique sign patterns

The practical performance of ReMBo depends on its ability to generate as many different sign patterns using the columns of $X_0$ as possible. A natural question to ask, then, is how the number of such patterns grows with the number of randomly drawn samples $w$. Although this ultimately depends on the distribution used for generating the entries of $w$, we shall, for the sake of simplicity, consider only samples drawn from the normal distribution. As an experiment, we take a $10 \times 5$ matrix $\bar{X}$ with normally distributed entries, and over $10^8$ trials record how often each sign pattern (or its negation) was reached, and in which trial it was first encountered. The results of this experiment are summarized in Figure 7. From the distribution in Figure 7(b) it is clear that the occurrence levels of different orthants exhibit a strong bias. The most frequently visited orthant pairs were reached up to $7.3 \times 10^6$ times, while others, those hard to reach using weights from the normal distribution, were observed only four times over all trials. The efficiency of ReMBo depends on the rate of encountering new sign patterns. Figure 7(c) shows how the average rate changes over the number of trials. The curves in Figure 7(d) illustrate the theoretical probability of recovery in (5.1), with $C(n, d)/2$ replaced by the number of orthant pairs at a given iteration, and with face counts determined as in Section 4, for three instances with support cardinality $s = 10$ and $r = 5$ observations.

Figure 7: Sampling the sign patterns for a $10 \times 5$ matrix $\bar{X}$, with (a) number of unique sign patterns versus number of trials, (b) relative frequency with which each orthant is sampled, (c) average number of new sign patterns per iteration as a function of iterations, and (d) theoretical probability of recovery using ReMBo for three instances of $X_0$ with row sparsity $s = 10$ and $r = 5$ observations.

5.3.2 Role of $\bar{X}$

Although the number of orthants that a hyperplane can intersect does not depend on the basis with which it was generated, this choice does greatly influence the ability to sample those orthants. Figure 8 shows two ways in which this can happen. In part (a) we sampled the number of unique sign patterns for two different $9 \times 5$ matrices $\bar{X}$, each with columns scaled to unit $\ell_2$-norm. The entries of the first matrix were independently drawn from the normal distribution, while those of the second were generated by repeating a single column drawn likewise and adding small random perturbations to each entry. This caused the average angle between any pair of columns to decrease from 65 degrees in the random matrix to a mere 8 degrees in the perturbed matrix, and greatly reduces the probability of reaching certain orthants. The same idea applies to the case where $d \ge n$, as shown in part (b) of the same figure. Although choosing $d$ greater than $n$ does not increase the number of orthants that can be reached, it does make reaching them easier, thus allowing ReMBo to work more efficiently. Hence, we can expect ReMBo to have a higher recovery rate on average when the number of columns of $X_0$ increases and when they have a lower mutual coherence $\mu(X) = \max_{i \ne j} |x_i^T x_j|/(\|x_i\|_2 \cdot \|x_j\|_2)$.

Figure 8: Number of unique sign patterns for (a) two $9 \times 5$ matrices $\bar{X}$ with columns scaled to unit $\ell_2$-norm, one with entries drawn independently from the normal distribution and one with a single random column repeated and random perturbations added, and (b) $10 \times r$ matrices with $r = 10, 12, 15$.
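For completeness, a small helper for the mutual coherence of the columns of $X$ as used in the observation above (an illustrative sketch; the paper uses coherence only as a qualitative indicator):

```python
# Mutual coherence of the columns of X: the largest absolute cosine of the
# angle between any pair of (normalized) columns.
import numpy as np

def mutual_coherence(X: np.ndarray) -> float:
    Xn = X / np.linalg.norm(X, axis=0)        # normalize columns
    G = np.abs(Xn.T @ Xn)
    np.fill_diagonal(G, 0.0)                  # ignore self-products
    return G.max()
```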
Figure 9: Effect of limiting the number of weight vectors $w$ on (a) the distribution of unique orthant counts for $10 \times r$ random matrices $\bar{X}$ (solid lines give the median, dashed lines the minimum and maximum values, and the top solid line the theoretical maximum), and (b–c) the average performance of the ReMBo-$\ell_1$ algorithm (solid) for a fixed $20 \times 80$ matrix $A$ and three different support sizes $s = 8, 9, 10$, along with the average predicted performance (dashed). The support patterns used are the same as those used for Figure 4.

5.3.3 Limiting the number of iterations

The number of iterations used in the previous experiments greatly exceeds what is practically feasible: we cannot afford to run ReMBo until all possible sign patterns have been tried, even if there were a way to detect that the limit had been reached. Realistically, we should set the number of iterations to a fixed maximum that depends on the computational resources available and the problem setting. In Figure 7 we show the unique orthant count as a function of iterations and the predicted recovery rate. When using only a limited number of iterations, it is interesting to know what the distribution of unique orthant counts looks like. To find out, we drew 1,000 random $\bar{X}$ matrices of each size $s \times r$, with $s = 10$ nonzero rows fixed and the number of columns ranging over $r = 1, \ldots, 20$. For each $\bar{X}$ we counted the number of unique sign patterns attained after 1,000 and 10,000 iterations, respectively. The resulting minimum, maximum, and median values are plotted in Figure 9(a), along with the theoretical maximum.

More interesting, of course, is the average recovery rate of ReMBo with those numbers of iterations. For this test we again used the $20 \times 80$ matrix $A$ with predetermined support $\mathcal{I}$, and with success or failure of each sign pattern on that support precomputed. For each value of $r = 1, \ldots, 20$ we generated random matrices $X$ on $\mathcal{I}$ and ran ReMBo with the maximum number of iterations set to 1,000 and 10,000. To save on computing time, we compared the on-support sign pattern of each combined coefficient vector $Xw$ to the known results instead of solving $\ell_1$. The average recovery rates thus obtained are plotted in Figures 9(b)–(c), along with the average of the predicted performance using (5.1), with $C(n, d)/2$ replaced by the orthant counts found in the previous experiment.

6 Conclusions

The MMV problem is often solved by minimizing the sum of row norms of the unknown coefficients $X$. We show that the (local) uniform recovery properties, i.e., recovery of all $X_0$ with a fixed row support $\mathcal{I} = \mathrm{Supp_{row}}(X_0)$, cannot exceed those of $\ell_{1,1}$, the sum of $\ell_1$ norms. This is despite the fact that $\ell_{1,1}$ reduces to solving the basis pursuit problem (1.2) for each column separately, which does not take advantage of the fact that all vectors in $X_0$ are assumed to have the same support. A consequence of this observation is that the use of restricted isometry techniques to analyze (local) uniform recovery using sum-of-norm minimization can at best give improved bounds on $\ell_1$ recovery.
Empirically, minimization with $\ell_{1,2}$, the sum of $\ell_2$ norms, clearly outperforms $\ell_{1,1}$ on individual problem instances: for supports where uniform recovery fails, $\ell_{1,2}$ recovers more cases than $\ell_{1,1}$. We construct cases where $\ell_{1,2}$ succeeds while $\ell_{1,1}$ fails, and vice versa. From the construction where only $\ell_{1,2}$ succeeds it also follows that the relative magnitudes of the coefficients of $X_0$ matter for recovery. This is unlike $\ell_{1,1}$ recovery, where only the support and the sign patterns matter. This implies that the notion of faces, so useful in the analysis of $\ell_1$, does not carry over to $\ell_{1,2}$.

We show that the performance of $\ell_{1,1}$ outside the uniform-recovery regime degrades rapidly as the number of observations increases. We can turn this situation around, and increase the performance with the number of observations, by using a boosted-$\ell_1$ approach. This technique aims to uncover the correct support based on basis pursuit solutions for individual observations. Boosted $\ell_1$ is a special case of the ReMBo algorithm, which repeatedly takes random combinations of the observations, allowing it to sample many more sign patterns in the coefficient space. As a result, the potential recovery rates of ReMBo (at least in combination with an $\ell_1$ solver) are much higher than those of boosted $\ell_1$. ReMBo can be used in combination with any solver for the single-measurement problem $Ax = b$, including greedy approaches and reweighted $\ell_1$ [4]. The recovery rate of greedy approaches may be lower than that of $\ell_1$, but the algorithms are generally much faster, thus giving ReMBo the chance to sample more random combinations. Another advantage of ReMBo, even more so than boosted $\ell_1$, is that it can be easily parallelized.

Based on the geometrical interpretation of ReMBo-$\ell_1$ (cf. Figure 5), we conclude that, theoretically, its performance does not increase with the number of observations after this number reaches the number of nonzero rows. In addition, we develop a simplified model for the performance of ReMBo-$\ell_1$. To improve the model we would need to know the distribution of faces of the cross-polytope $C$ that map to faces of $AC$, and the distribution of external angles of the cones generated by the signed rows of the nonzero part of $X_0$.

It would be very interesting to compare the recovery performance between $\ell_{1,2}$ and ReMBo-$\ell_1$; however, we consider this beyond the scope of this paper.

All of the numerical experiments in this paper are reproducible. The scripts used to run the experiments and generate the figures can be downloaded from http://www.cs.ubc.ca/~mpf/jointsparse.

Acknowledgments

The authors would like to give their sincere thanks to Özgür Yılmaz and Rayan Saab for their thoughtful comments and suggestions during numerous discussions.

References

[1] E. J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006.

[2] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, February 2006.

[3] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, December 2005.

[4] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted $\ell_1$ minimization. Journal of Fourier Analysis and Applications, 14(5–6):877–905, December 2008.
[5] J. Chen and X. Huo. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Transactions on Signal Processing, 54:4634–4643, December 2006.

[6] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.

[7] S. F. Cotter and B. D. Rao. Sparse channel estimation via matching pursuit with application to equalization. IEEE Transactions on Communications, 50(3), March 2002.

[8] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53:2477–2488, July 2005.

[9] D. L. Donoho. Neighborly polytopes and sparse solution of underdetermined linear equations. Technical Report 2005-4, Department of Statistics, Stanford University, Stanford, CA, 2005.

[10] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

[11] D. L. Donoho. High-dimensional centrosymmetric polytopes with neighborliness proportional to dimension. Discrete and Computational Geometry, 35(4):617–652, May 2006.

[12] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. PNAS, 100(5):2197–2202, March 2003.

[13] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47(7):2845–2862, November 2001.

[14] Y. C. Eldar and M. Mishali. Robust recovery of signals from a union of subspaces. arXiv 0807.4581, July 2008.

[15] I. J. Fevrier, S. B. Gelfand, and M. P. Fitz. Reduced complexity decision feedback equalization for multipath channels with large delay spreads. IEEE Transactions on Communications, 47(6):927–937, June 1999.

[16] J.-J. Fuchs. On sparse representations in arbitrary redundant bases. IEEE Transactions on Information Theory, 50(6):1341–1344, June 2004.

[17] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Lecture Notes in Control and Information Sciences, pages 95–110. Springer, 2008.

[18] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming (web page and software). http://stanford.edu/~boyd/cvx, February 2009.

[19] R. Gribonval and M. Nielsen. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 49(12):3320–3325, December 2003.

[20] R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Applied and Computational Harmonic Analysis, 22(3):335–355, May 2007.

[21] B. Grünbaum. Convex Polytopes, volume 221 of Graduate Texts in Mathematics. Springer-Verlag, second edition, 2003.

[22] D. Malioutov, M. Çetin, and A. S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Transactions on Signal Processing, 53(8):3010–3022, August 2005.

[23] M. Mishali and Y. C. Eldar. Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Transactions on Signal Processing, 56(10):4692–4702, October 2008.

[24] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, April 1995.

[25] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
[26] M. Stojnic, F. Parvaresh, and B. Hassibi. On the reconstruction of block-sparse signals with an optimal number of measurements. arXiv 0804.0041, March 2008.

[27] J. A. Tropp. Recovery of short, complex linear combinations via $\ell_1$ minimization. IEEE Transactions on Information Theory, 51(4):1568–1570, April 2005.

[28] J. A. Tropp. Algorithms for simultaneous sparse approximation: Part II: Convex relaxation. Signal Processing, 86:589–602, 2006.

[29] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation: Part I: Greedy pursuit. Signal Processing, 86:572–588, 2006.

[30] R. H. Tütüncü, K. C. Toh, and M. J. Todd. Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming, Series B, 95:189–217, 2003.