Group-sparse Matrix Recovery


Authors: Xiangrong Zeng, Mario A. T. Figueiredo

Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal

ABSTRACT

We apply the OSCAR (octagonal selection and clustering algorithms for regression) regularizer to the recovery of group-sparse matrices (two-dimensional, 2D, arrays) from compressive measurements. We propose a 2D version of OSCAR (2OSCAR), consisting of the $\ell_1$ norm and the pair-wise $\ell_\infty$ norm, which is convex but non-differentiable. We show that the proximity operator of 2OSCAR can be computed based on that of OSCAR. The 2OSCAR problem can thus be efficiently solved by state-of-the-art proximal splitting algorithms. Experiments on group-sparse 2D array recovery show that 2OSCAR regularization solved by the SpaRSA algorithm is the fastest choice, while the PADMM algorithm (with debiasing) yields the most accurate results.

Index Terms — group sparsity, matrix recovery, proximal splitting algorithms, proximity operator, signal recovery.

1. INTRODUCTION

The problem studied in this paper is the classical one of recovering $X$ from

$$Y = AX + W, \quad (1)$$

where $A \in \mathbb{R}^{m \times n}$ is a known sensing matrix, $X \in \mathbb{R}^{n \times d}$ is the original unknown matrix/2D array, $Y \in \mathbb{R}^{m \times d}$ is the observed data, and $W \in \mathbb{R}^{m \times d}$ denotes additive noise. In many cases of interest, we have $m < n$, making (1) an ill-posed problem, which can only be addressed by using some form of regularization that injects prior knowledge about the unknown $X$. Classical regularization formulations seek solutions of problems of the form

$$\min_X \; F(X) + \Phi(X), \quad (2)$$

or one of the equivalent (under mild conditions) forms

$$\min_X \; \Phi(X) \ \ \text{s.t.} \ \ F(X) \le \varepsilon \qquad \text{or} \qquad \min_X \; F(X) \ \ \text{s.t.} \ \ \Phi(X) \le \epsilon, \quad (3)$$

where $F(X)$ is the data-fidelity term and $\Phi(X)$ is the regularizer, the purpose of which is to enforce certain properties on $X$ (such as sparsity or group sparsity), and $\varepsilon$ and $\epsilon$ are positive parameters.
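As a concrete illustration of the observation model (1), the sketch below generates synthetic data with the dimensions used later in the experiments of Sec. 3 (the block position, block value, and random seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the experiments in Sec. 3: m < n makes (1) ill-posed.
m, n, d = 65, 100, 10
sigma = 0.4  # noise std, so that sigma^2 = 0.16 as in Sec. 3

A = rng.standard_normal((m, n))           # known sensing matrix
X = np.zeros((n, d))                      # unknown group-sparse matrix
X[10:20, 2:5] = 7.0                       # one block-shaped group of nonzeros
W = sigma * rng.standard_normal((m, d))   # additive noise

Y = A @ X + W                             # observed data, model (1)
print(Y.shape)  # (65, 10)
```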
Work partially supported by Fundação para a Ciência e Tecnologia, grants PEst-OE/EEI/LA0008/2013 and PTDC/EEI-PRO/1470/2012.

Problem (1) is more challenging than the more studied linear inverse problem of recovering a vector $x \in \mathbb{R}^n$ from

$$y = Ax + w, \quad (4)$$

where $A \in \mathbb{R}^{m \times n}$ is also a known sensing matrix, $y \in \mathbb{R}^m$ is the observed vector, and $w \in \mathbb{R}^m$ is additive noise. Compared with (4), the matrix $X$ of interest in (1) is usually assumed not only to be sparse, but also to have a particular sparse structure. For instance, in the multiple measurement vector model [1], [2], [3], $X$ is an unknown source matrix that should be row sparse; in group LASSO [4], [5], [6], $X$ is a coefficient matrix that is also enforced to be row sparse; in multi-task learning [7], [8], $X$ is a task parameter matrix, which is usually assumed to be row and/or column sparse. In this paper, we pursue more general sparsity patterns for $X$: the arrangement of each group of nonzeros in $X$ is not limited to rows and/or columns, but may include row/column segments, blocks, or other groups of connected non-zero elements. Before addressing the question of whether there are any available regularizers able to promote this kind of group sparsity, we first briefly review existing group-sparsity-inducing regularizers.

In recent years, much attention has been devoted not only to the sparsity of solutions, but also to the structure of this sparsity [9]. In other words, not only the number of non-zeros in the solutions, but also how these non-zeros are located, is of interest. This research direction has led to the concept of group/block sparsity [4], [10], and to more general structured sparsity patterns [11], [12], [13]. A classical model for group sparsity is the group LASSO [4], which, by making use of more information than the original LASSO [14] (namely, the structure of the groups), is able to simultaneously encourage sparsity and group sparsity.
In addition, the sparse group LASSO approach was proposed in [15]; its regularizer consists of an $\ell_1$ term plus the group LASSO regularizer; thus, unlike group LASSO, it selects not only groups, but also individual variables within each group.

It has also been observed that in some real-world problems, it makes sense to encourage the solution not only to be sparse, but also to have several components sharing similar values. To formalize this goal, several generic models have been proposed, such as the elastic net [16], the fused LASSO [17], and the octagonal shrinkage and clustering algorithm for regression (OSCAR) [18].

Fig. 1. Illustration of LASSO, elastic net, fused LASSO and OSCAR.

The level curves of several of the regularizers mentioned in the previous paragraph (for the 2D case) are shown in Fig. 1. The figure illustrates why these models promote variable grouping (unlike LASSO). Firstly, the regularizer of the elastic net [16] consists of an $\ell_1$ term and an $\ell_2$ term, thus simultaneously promoting sparsity and group-sparsity: the former comes from the sparsity-inducing corners (see Fig. 1), while the latter comes from its strictly convex edges, which create a grouping effect similar to that of a quadratic/ridge regularizer. Secondly, the regularizer of the fused LASSO is composed of an $\ell_1$ term and a total variation term, which encourages successive variables (in a certain order) to be similar, making it able to promote both sparsity and smoothness. Thirdly, the OSCAR regularizer (proposed by Bondell and Reich [18]) is constituted by an $\ell_1$ term and a pair-wise $\ell_\infty$ term, which promotes equality (in absolute value) of each pair of variables.

There are some recent variants of the above group-sparsity regularizers, such as the weighted fused LASSO, presented in [19]. The pair-wise fused LASSO [19], which uses the pair-wise term of OSCAR, extends the fused LASSO to cases where the variables have no natural ordering.
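For concreteness, the three grouping-promoting penalties just discussed can be evaluated on a small vector. The following sketch transcribes their standard definitions ($\ell_1$ plus, respectively, a squared $\ell_2$ term, a total-variation term, and a pair-wise $\ell_\infty$ term); the weights `lam1`, `lam2` and the test vector are arbitrary:

```python
import numpy as np
from itertools import combinations

def elastic_net(x, lam1, lam2):
    # l1 term (sparsity) plus squared l2 term (grouping via strictly convex edges)
    return lam1 * np.abs(x).sum() + lam2 * np.dot(x, x)

def fused_lasso(x, lam1, lam2):
    # l1 term plus total-variation term on successive variables
    return lam1 * np.abs(x).sum() + lam2 * np.abs(np.diff(x)).sum()

def oscar(x, lam1, lam2):
    # l1 term plus pair-wise l-infinity term, promoting |x_i| = |x_j|
    pairs = combinations(range(len(x)), 2)
    return lam1 * np.abs(x).sum() + lam2 * sum(
        max(abs(x[i]), abs(x[j])) for i, j in pairs)

x = np.array([3.0, 3.0, 0.0, -1.0])
print(oscar(x, 1.0, 0.5))  # 15.0: l1 part 7.0, pairwise part 16 * 0.5
```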
A novel graph-guided fused LASSO was proposed in [20], where the grouping structure is modeled by a graph. A Bayesian version of the elastic net was developed in [21]. Finally, an adaptive grouping pursuit method was proposed in [22], but the underlying regularizer is neither sparsity-promoting nor convex.

The fused LASSO, elastic net, and OSCAR regularizers all have the ability to promote sparsity and variable grouping. However, as pointed out in [23], OSCAR outperforms the other two models in terms of grouping. Moreover, the fused LASSO is not suitable for grouping according to magnitude, and the grouping ability of the convex edges of the elastic net is inferior to that of OSCAR. Thus, this paper focuses on the OSCAR regularizer to solve group-sparse matrix recovery problems.

In this paper, we propose a two-dimensional (matrix) version of OSCAR (2OSCAR) for group-sparse matrix recovery. Solving OSCAR regularization problems was addressed in our previous work [24], in which six state-of-the-art proximal splitting algorithms were investigated: FISTA [25], TwIST [26], SpaRSA [27], ADMM [28], SBM [29], and PADMM [30]. Naturally, we establish the relationship between OSCAR and 2OSCAR, and then address 2OSCAR regularization problems as in [24].

Terminology and Notation

We denote vectors or general variables by lower-case letters, and matrices by upper-case ones. The $\ell_1$ norm of a vector $x \in \mathbb{R}^n$ is $\|x\|_1 = \sum_{i=1}^n |x_i|$, where $x_i$ represents the $i$-th component of $x$; that of a matrix $X \in \mathbb{R}^{n \times d}$ is $\|X\|_1 = \sum_{i=1}^n \sum_{j=1}^d |X_{(i,j)}|$, where $X_{(i,j)}$ is the entry of $X$ at the $i$-th row and $j$-th column. Let $\|X\|_F = \bigl( \sum_{i=1}^n \sum_{j=1}^d X_{(i,j)}^2 \bigr)^{1/2}$ be the Frobenius norm of $X$.

We now briefly review some elements of convex analysis that will be used below. Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$.
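The matrix norms just defined map one-to-one onto array operations; a minimal sketch (the example matrix is arbitrary):

```python
import numpy as np

X = np.array([[1.0, -2.0],
              [0.0,  3.0]])

# ||X||_1: sum of absolute values of all entries X_(i,j)
l1_norm = np.abs(X).sum()

# ||X||_F: square root of the sum of squared entries (Frobenius norm)
fro_norm = np.sqrt((X ** 2).sum())

# sanity check against NumPy's built-in Frobenius norm
assert np.isclose(fro_norm, np.linalg.norm(X, 'fro'))
```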
Let $f : \mathcal{H} \to [-\infty, +\infty]$ be a function and let $\Gamma$ be the class of all lower semi-continuous, convex, proper functions (not equal to $+\infty$ everywhere and never equal to $-\infty$). The proximity operator [31] of $\lambda f$ (where $f \in \Gamma$ and $\lambda \in \mathbb{R}_+$) is defined as

$$\operatorname{prox}_{\lambda f}(v) = \arg\min_{x \in \mathcal{H}} \Bigl\{ \lambda f(x) + \tfrac{1}{2} \|x - v\|^2 \Bigr\}. \quad (5)$$

2. RECOVERING GROUP-SPARSE MATRICES

2.1. OSCAR and its 2D Version (2OSCAR)

The OSCAR criterion is given by [18]

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2} \|y - Ax\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \sum_{i < j} \max\{|x_i|, |x_j|\}, \quad (6)$$

where the pair-wise $\ell_\infty$ term promotes equality (in absolute value) of each pair of components. Its 2D version, 2OSCAR, applies the same regularizer to all entries of a matrix $X \in \mathbb{R}^{n \times d}$ (equivalently, to $\mathrm{vec}(X)$):

$$\Phi_{\text{2OSCAR}}(X) = \lambda_1 \|X\|_1 + \lambda_2 \sum \max\{|X_{(i,j)}|, |X_{(k,l)}|\},$$

where the sum ranges over all unordered pairs of entries of $X$. Consequently, the proximity operator of $\Phi_{\text{2OSCAR}}$ can be computed based on that of OSCAR, and 2OSCAR regularization problems can be handled by the proximal splitting algorithms studied in [24].

The SpaRSA algorithm [27], applied to the 2OSCAR problem (with the Barzilai–Borwein step-size rule [32] in line 7), proceeds as follows:

1. Set $k = 0$, $\eta > 1$, $\alpha_0 = \alpha_{\min} > 0$, $\alpha_{\max} > \alpha_{\min}$, and $X_0$.
2. $V_0 = X_0 - A^T (A X_0 - Y) / \alpha_0$
3. $X_1 = \operatorname{prox}_{\Phi_{\text{2OSCAR}} / \alpha_0}(V_0)$
4. repeat
5.  $\quad S_k = X_k - X_{k-1}$
6.  $\quad R_k = A^T A S_k$
7.  $\quad \hat{\alpha}_k = \langle S_k, R_k \rangle / \langle S_k, S_k \rangle$
8.  $\quad \alpha_k = \max\{\alpha_{\min}, \min\{\hat{\alpha}_k, \alpha_{\max}\}\}$
9.  $\quad$ repeat
10. $\quad\quad V_k = X_k - A^T (A X_k - Y) / \alpha_k$
11. $\quad\quad X_{k+1} = \operatorname{prox}_{\Phi_{\text{2OSCAR}} / \alpha_k}(V_k)$
12. $\quad\quad \alpha_k \leftarrow \eta\, \alpha_k$
13. $\quad$ until $X_{k+1}$ satisfies an acceptance criterion
14. $\quad k \leftarrow k + 1$
15. until some stopping criterion is satisfied.

A common acceptance criterion in line 13 requires the objective function to decrease; see [27] for details.

2.3. Debiasing

As is well known, the solutions obtained under 2OSCAR (and many other types of regularizers) are attenuated/biased in magnitude. Thus, it is common practice to apply debiasing as a post-processing step; i.e., the solution obtained by, say, the SpaRSA algorithm provides the structure/support of the estimate, and the debiasing step recovers the magnitudes. The debiasing method used in SpaRSA [27] is also adopted in this paper. Specifically, the debiasing phase solves

$$\widehat{X}_{\text{debias}} = \arg\min_X \; \|A X - Y\|_F^2 \quad \text{s.t.} \quad \operatorname{supp}(X) = \operatorname{supp}(\widetilde{X}), \quad (10)$$

where $\widetilde{X}$ is the estimate produced by the SpaRSA (or any other) algorithm and $\operatorname{supp}(X)$ denotes the set of indices of the non-zero elements of $X$. This problem is solved by a conjugate gradient procedure; see [27] for more details.
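The SpaRSA iteration above (steps 1–15) can be sketched compactly. The following is an illustrative Python transcription, not the paper's implementation: the 2OSCAR proximity operator (computed from the OSCAR prox in [24]) is replaced by a simple $\ell_1$ soft-threshold placeholder so that the skeleton is runnable, and the acceptance criterion is the objective-decrease test mentioned above:

```python
import numpy as np

def soft_threshold(V, t):
    # PLACEHOLDER prox: prox of t*||.||_1 (elementwise soft threshold).
    # The paper uses prox of Phi_2OSCAR/alpha, computed from the OSCAR prox [24];
    # the soft threshold stands in only to make this skeleton runnable.
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def sparsa(A, Y, lam, alpha_min=1e-3, alpha_max=1e3, eta=2.0,
           max_iter=200, tol=1e-3):
    # SpaRSA skeleton: Barzilai-Borwein step (steps 5-8) + backtracking (9-13).
    n, d = A.shape[1], Y.shape[1]
    obj = lambda Z: 0.5 * np.linalg.norm(A @ Z - Y, 'fro')**2 + lam * np.abs(Z).sum()
    X_prev = np.zeros((n, d))                       # X_0 (step 1)
    alpha = alpha_min                               # alpha_0
    V = X_prev - A.T @ (A @ X_prev - Y) / alpha     # step 2
    X = soft_threshold(V, lam / alpha)              # X_1 (step 3)
    for _ in range(max_iter):
        S = X - X_prev                              # step 5
        R = A.T @ (A @ S)                           # step 6
        denom = (S * S).sum()
        if denom > 0:                               # steps 7-8 (BB step, clipped)
            alpha = float(np.clip((S * R).sum() / denom, alpha_min, alpha_max))
        while True:                                 # inner loop, steps 9-13
            V = X - A.T @ (A @ X - Y) / alpha
            X_next = soft_threshold(V, lam / alpha)
            if obj(X_next) <= obj(X) or alpha >= alpha_max:  # acceptance
                break
            alpha = min(eta * alpha, alpha_max)     # step 12
        if np.linalg.norm(X_next - X) <= tol * max(np.linalg.norm(X_next), 1e-12):
            return X_next                           # stopping criterion (Sec. 3)
        X_prev, X = X, X_next                       # step 14
    return X
```

With `A` equal to the identity, the iteration reduces to a single soft-threshold of `Y`, which is a convenient sanity check for the skeleton.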
3. EXPERIMENTS

All the experiments were performed using MATLAB on a 64-bit Windows 7 PC with an Intel Core i7 3.07 GHz processor and 6.0 GB of RAM. The performance of the different algorithms is assessed via the following metrics, where $E$ is an estimate of $X$:

• Mean absolute error, $\mathrm{MAE} = \|X - E\|_1 / (nd)$;
• Mean square error, $\mathrm{MSE} = \|X - E\|_F^2 / (nd)$;
• Position error rate, $\mathrm{PER} = \sum_{i=1}^n \sum_{j=1}^d \bigl| \operatorname{sign}(X_{(i,j)}) - \operatorname{sign}(E_{(i,j)}) \bigr| / (nd)$;
• Elapsed time (TIME).

We consider experiments on the recovery of a $100 \times 10$ matrix $X$ with different styles of groups (blocks, lines, and curved groups), consisting of positive and negative elements. The observed matrix $Y$ is generated by (1), in which the variance of the noise $W$ is $\sigma^2 = 0.16$. The sensing matrix $A$ is a $65 \times 100$ matrix with components sampled from the standard normal distribution. There are 100 nonzeros in the original $100 \times 10$ matrix, with values arbitrarily chosen from the set $\{-7, -8, -9, 7, 8, 9\}$ (Fig. 3).

Table 1. Results ("yes"/"no" indicate with/without debiasing)

Algorithm  TIME yes  TIME no  MAE yes  MAE no  MSE yes  MSE no  PER
FISTA      4.37      4.26     0.0784   0.477   2.45     0.202   0.1%
TwIST      5.10      4.45     0.0799   0.480   2.47     0.202   0.2%
SpaRSA     2.25      2.26     0.0784   0.477   2.44     0.202   0.0%
ADMM       6.65      6.60     0.0786   0.477   2.44     0.206   0.2%
SBM        6.32      6.22     0.0784   0.477   2.45     0.202   0.1%
PADMM      6.01      5.97     0.0762   0.456   2.42     0.182   0.0%

We run the algorithms mentioned above (FISTA, TwIST, SpaRSA, SBM, ADMM, PADMM), with and without debiasing. The stopping condition is $\|X_{k+1} - X_k\| / \|X_{k+1}\| \le 0.001$, where $X_k$ represents the estimate at the $k$-th iteration. We set $\lambda_1 = 0.5$ and $\lambda_2 = 0.0024$. Other parameters are hand-tuned in each case for the best improvement in MAE. The recovered matrices are shown in Fig. 3 and the quantitative results are reported in Table 1.
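The metrics above transcribe directly into array operations; a minimal sketch (the example matrices $X$ and $E$ are arbitrary illustrations, not the paper's data):

```python
import numpy as np

def metrics(X, E):
    # MAE, MSE, and PER as defined in Sec. 3, with E an estimate of X.
    n, d = X.shape
    mae = np.abs(X - E).sum() / (n * d)                    # mean absolute error
    mse = np.linalg.norm(X - E, 'fro') ** 2 / (n * d)      # mean square error
    per = np.abs(np.sign(X) - np.sign(E)).sum() / (n * d)  # position error rate
    return mae, mse, per

X = np.array([[7.0, 0.0], [-8.0, 9.0]])
E = np.array([[6.5, 0.0], [-8.2, 0.0]])  # support error at entry (2,2)
mae, mse, per = metrics(X, E)
```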
From Fig. 3 and Table 1 we can conclude that the 2OSCAR criterion, solved by proximal splitting algorithms with debiasing, is able to accurately recover group-sparse matrices. Among the algorithms, SpaRSA is the fastest, while PADMM obtains the most accurate solutions.

Fig. 3. Original and recovered matrices.

4. CONCLUSIONS

We have applied the OSCAR regularizer to recover group-sparse matrices with arbitrary groups from compressive measurements. A matrix version of OSCAR (2OSCAR) has been proposed and solved by six state-of-the-art proximal splitting algorithms: FISTA, TwIST, SpaRSA, SBM, ADMM, and PADMM, with and without debiasing. Experiments on group-sparse matrix recovery show that the 2OSCAR regularizer solved by the SpaRSA algorithm has the fastest convergence, while PADMM leads to the most accurate estimates.

5. REFERENCES

[1] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. on Signal Processing, vol. 53, pp. 2477–2488, 2005.
[2] B. D. Rao and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," in Proc. of the 8th IEEE Digital Signal Processing Workshop, 1998.
[3] Z. Zhang and B. D. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Jour. of Selected Topics in Signal Processing, vol. 5, pp. 912–926, 2011.
[4] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Jour. of the Royal Statistical Society (B), vol. 68, pp. 49–67, 2005.
[5] Z. Qin and D. Goldfarb, "Structured sparsity via alternating direction methods," The Jour. of Machine Learning Research, vol. 13, pp. 1435–1468, 2012.
[6] L. Yuan, J. Liu, and J. Ye, "Efficient methods for overlapping group lasso," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, pp. 2104–2116, 2013.
[7] J. Zhou, J. Chen, and J. Ye, "Clustered multi-task learning via alternating structure optimization," Advances in Neural Information Processing Systems, vol. 25, 2011.
[8] J. Zhou, L. Yuan, J. Liu, and J. Ye, "A multi-task learning formulation for predicting disease progression," in 17th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, 2011, pp. 814–822.
[9] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, pp. 450–468, 2012.
[10] Y. C. Eldar and H. Bolcskei, "Block-sparsity: Coherence and efficient recovery," in IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 2885–2888.
[11] J. Huang, T. Zhang, and D. Metaxas, "Learning with structured sparsity," The Jour. of Machine Learning Research, vol. 12, pp. 3371–3412, 2011.
[12] C. A. Micchelli, J. M. Morales, and M. Pontil, "Regularizers for structured sparsity," Advances in Computational Math., pp. 1–35, 2010.
[13] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, "Convex and network flow algorithms for structured sparsity," The Jour. of Machine Learning Research, vol. 12, pp. 2681–2720, 2011.
[14] R. Tibshirani, "Regression shrinkage and selection via the lasso," Jour. of the Royal Statistical Society (B), pp. 267–288, 1996.
[15] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, "The sparse-group lasso," Jour. of Comput. and Graphical Statistics, 2012, to appear.
[16] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Jour. of the Royal Statistical Society (B), vol. 67, pp. 301–320, 2005.
[17] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused lasso," Jour. of the Royal Statistical Society (B), vol. 67, pp. 91–108, 2004.
[18] H. D. Bondell and B. J. Reich, "Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR," Biometrics, vol. 64, pp. 115–123, 2007.
[19] Z. J. Daye and X. J. Jeng, "Shrinkage and model selection with correlated variables via weighted fusion," Computational Statistics & Data Analysis, vol. 53, pp. 1284–1298, 2009.
[20] S. Kim, K. A. Sohn, and E. P. Xing, "A multivariate regression approach to association analysis of a quantitative trait network," Bioinformatics, vol. 25, pp. i204–i212, 2009.
[21] Q. Li and N. Lin, "The Bayesian elastic net," Bayesian Analysis, vol. 5, pp. 151–170, 2010.
[22] X. Shen and H. C. Huang, "Grouping pursuit through a regularization solution surface," Jour. of the American Statistical Assoc., vol. 105, pp. 727–739, 2010.
[23] L. W. Zhong and J. T. Kwok, "Efficient sparse modeling with automatic feature grouping," IEEE Trans. on Neural Networks and Learning Systems, vol. 23, pp. 1436–1447, 2012.
[24] X. Zeng and M. A. T. Figueiredo, "Solving OSCAR regularization problems by proximal splitting algorithms," arXiv preprint arxiv.org/abs/1309.6301, 2013.
[25] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Jour. on Imaging Sciences, vol. 2, pp. 183–202, 2009.
[26] J. M. Bioucas-Dias and M. A. T. Figueiredo, "A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration," IEEE Trans. on Image Processing, vol. 16, pp. 2992–3004, 2007.
[27] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Trans. on Signal Processing, vol. 57, pp. 2479–2493, 2009.
[28] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, pp. 1–122, 2011.
[29] T. Goldstein and S. Osher, "The split Bregman method for l1-regularized problems," SIAM Jour. on Imaging Sciences, vol. 2, pp. 323–343, 2009.
[30] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Jour. of Math. Imaging and Vision, vol. 40, pp. 120–145, 2011.
[31] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.
[32] J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA Jour. of Numerical Analysis, vol. 8, pp. 141–148, 1988.
