Max-Min Distance Nonnegative Matrix Factorization
Authors: Jim Jing-Yan Wang
University at Buffalo, The State University of New York, Buffalo, NY 14203, USA

Abstract

Nonnegative Matrix Factorization (NMF) has been a popular representation method for pattern classification problems. It tries to decompose a nonnegative matrix of data samples as the product of a nonnegative basis matrix and a nonnegative coefficient matrix, and the coefficient matrix is used as the new representation. However, traditional NMF methods ignore the class labels of the data samples. In this paper, we propose a novel supervised NMF algorithm to improve the discriminative ability of the new representation. Using the class labels, we separate all the data sample pairs into within-class pairs and between-class pairs. To improve the discriminative ability of the new NMF representations, we hope that the maximum distance of the within-class pairs in the new NMF space can be minimized, while the minimum distance of the between-class pairs can be maximized. With this criterion, we construct an objective function and optimize it with regard to the basis and coefficient matrices and the slack variables alternately, resulting in an iterative algorithm.

Keywords: Nonnegative Matrix Factorization, Max-Min Distance Analysis

1. Introduction

Nonnegative matrix factorization (NMF) [1, 2] has attracted much attention from both the research and engineering communities. Given a data matrix with all its elements nonnegative, NMF tries to decompose it as a product of two nonnegative low-rank matrices. One matrix can be regarded as a basis matrix with its columns as basis vectors, and the other one as a linear combination coefficient matrix, so that the data columns in the original matrix can be represented as linear combinations of the basis vectors.
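As an illustration of this decomposition, the classical multiplicative-update NMF of Lee and Seung [4, 2] can be sketched in a few lines of NumPy (a minimal illustrative sketch; the function name, initialization, and iteration count are illustrative choices, not part of the method proposed in this paper):

```python
import numpy as np

def nmf(X, m, n_iter=200, eps=1e-9):
    """Plain unsupervised NMF: factor a nonnegative X (d x n) as U (d x m) @ V (m x n)."""
    rng = np.random.default_rng(0)
    d, n = X.shape
    U = rng.random((d, m)) + eps   # random nonnegative initialization
    V = rng.random((m, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: nonnegativity is preserved automatically,
        # since every factor in each update is nonnegative.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

X = np.abs(np.random.default_rng(1).standard_normal((8, 20)))
U, V = nmf(X, m=4)   # the columns of V are the new low-rank representations
```

Because the updates only multiply by nonnegative factors, no projection step is needed to keep U and V nonnegative.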
Because of the nonnegativity constraints on both factorization matrices, only additive linear combinations are allowed, and thus a parts-based representation can be achieved [3]. Since the original NMF approach was proposed by Seung and Lee [4, 2], due to its ability to learn the parts of the data set [5], it has been used as an effective data representation method in various problems, such as pattern recognition [6], computer vision [7], bioinformatics [8], etc. The most popular application of NMF as a data representation method is pattern recognition, where the nonnegative feature vectors of the data samples are organized as a nonnegative matrix, and the columns of the coefficient matrix are used as the new low-dimensional representations.

Email address: jimjywang@gmail.com (Jim Jing-Yan Wang)
Preprint submitted to Pattern Recognition, October 17, 2018

Among pattern recognition problems, when NMF is applied to the data matrix, it is usually assumed that the class labels of the data samples are not available, making it an unsupervised problem. Typical applications of this kind are the clustering of images and documents [3, 9]. However, in real-world supervised or semi-supervised classification applications, the class labels of the training data samples are usually available, which is ignored by most existing NMF methods. If the class label information could be utilized during the representation procedure, the discriminative ability of the representation could be improved significantly. To this end, some supervised and semi-supervised NMF methods have been proposed. For example, Wang and Jia [10] proposed the Fisher nonnegative matrix factorization (FNMF) method to encode discrimination information for a classification problem by imposing Fisher constraints on the NMF algorithm. Lee et al. [11] proposed the semi-supervised nonnegative matrix factorization (SSNMF) by jointly incorporating the data matrix and the (partial) class label matrix into NMF. Most recently, Liu et al. [9] proposed the constrained nonnegative matrix factorization (CNMF) by incorporating the label information as additional constraints.

In this paper, we propose a novel supervised NMF method by exploring the class label information and using it to constrain the coefficient vectors of the data samples. We consider the data sample pairs, and the class labels of the samples allow us to separate the pairs into two types: the within-class pairs and the between-class pairs. To improve the discriminative ability of the coefficient vectors of the samples, we consider the distance between the coefficient vectors of each sample pair, and try to minimize it for the within-class pairs while maximizing it for the between-class pairs. In this way, the coefficient vectors of data samples of the same class can be gathered, while those of different classes can be separated. One problem is how to assign different weights to different pairs in the objective function. To avoid this problem, we apply a strategy similar to max-min distance analysis [12]. The maximum within-class pair coefficient vector distance is minimized, so that all the within-class pair coefficient vector distances are minimized as well. Meanwhile, the minimum between-class pair coefficient vector distance is maximized, so that all the between-class pair coefficient vector distances are maximized as well. We construct a novel objective function for NMF imposing both the maximum within-class pair distance minimization and the minimum between-class pair distance maximization problems.
By optimizing it with an alternating strategy, we develop an iterative algorithm. The proposed method is called Max-Min Distance NMF (MMDNMF).

The remaining parts of this paper are organized as follows: In Section 2, we introduce the novel NMF method. In Section ??, the experimental results are given to verify the effectiveness of the proposed method. The paper is concluded in Section 3.

2. Max-Min Distance NMF

In this section, we first formulate the problem with an objective function, and then optimize it to obtain the iterative algorithm.

2.1. Objective function

Suppose we have n data samples in the training set X = {x_i}, i = 1, ..., n, where x_i ∈ R_+^d is the d-dimensional nonnegative feature vector of the i-th sample. We organize the samples as a nonnegative matrix X = [x_1, ..., x_n] ∈ R_+^{d×n}, whose i-th column is the feature vector of the i-th sample. The corresponding class label set is denoted as {y_i}, i = 1, ..., n, where y_i ∈ Y is the class label of the i-th sample and Y is the class label space. NMF aims to find two low-rank nonnegative matrices U ∈ R_+^{d×m} and V ∈ R_+^{m×n}, where m ≤ d, so that their product UV approximates the original matrix X as accurately as possible:

$$X \approx UV \qquad (1)$$

The m columns of the matrix U can be regarded as m basis vectors, and each sample x_i can be represented as a nonnegative linear combination of these basis vectors. The linear combination coefficient vector of x_i is the i-th column vector v_i ∈ R_+^m of V. We can also regard v_i as the new low-dimensional representation vector of x_i with regard to the basis matrix U.
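Given a learned basis U, the coefficient vector of a sample can be obtained by a nonnegative least-squares fit. This is one common way to encode samples against a fixed basis, not necessarily the procedure used in this paper; the basis and sample below are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative nonnegative basis U (d = 3, m = 2) and a sample x that happens
# to be an exact nonnegative combination of the basis columns: x = 2*u_1 + 3*u_2.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([2.0, 3.0, 5.0])

# v is the new low-dimensional representation of x with regard to U: x ~= U @ v.
v, residual = nnls(U, x)
```

Here `nnls` solves min ||U v − x||_2 subject to v ≥ 0, so the recovered v stays in the nonnegative coefficient space that NMF requires.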
To seek the optimal matrices U and V, we consider the following problems to construct our objective function:

• To reduce the approximation error between X and UV, the squared L2 distance between them is minimized with regard to U and V as follows:

$$\min_{U,V} \|X - UV\|_2^2 \quad \text{s.t.}\quad U \ge 0,\ V \ge 0 \qquad (2)$$

• We consider the sample pairs in the training set, and separate them into two pair sets — the within-class pair set W and the between-class pair set B. The within-class pair set is defined as the set of sample pairs belonging to the same class, i.e., W = {(i, j) | y_i = y_j, x_i, x_j ∈ X}. The between-class pair set is defined as the set of sample pairs belonging to different classes, i.e., B = {(i, j) | y_i ≠ y_j, x_i, x_j ∈ X}. For the two samples of the (i, j)-th pair in the new coefficient vector space, we use the squared L2 norm distance between their coefficient vectors, ||v_i − v_j||_2^2. Apparently, to improve the discriminative ability of the new NMF representation, the coefficient vector distances of the within-class pairs should be minimized, while those of the between-class pairs should be maximized. Instead of considering all the pairs, we directly minimize the maximum coefficient vector distance over the within-class pairs, and thus we consider the aggregation of all within-class pairs:

$$\min_{V}\ \max_{(i,j)\in\mathcal{W}} \|v_i - v_j\|_2^2 \quad \text{s.t.}\quad V \ge 0 \qquad (3)$$

Meanwhile, we also maximize the minimum coefficient vector distance over the between-class pairs, and thus we consider the separation of all between-class pairs:

$$\max_{V}\ \min_{(i,j)\in\mathcal{B}} \|v_i - v_j\|_2^2 \quad \text{s.t.}\quad V \ge 0 \qquad (4)$$

In this way, the maximum within-class pair distance is minimized, so that all the within-class pair distances are also minimized.
Similarly, the minimum between-class pair distance is maximized, so that all the between-class pair distances are also maximized.

To formulate our problem, we combine the problems in (2), (3) and (4), and propose the novel optimization problem for NMF as

$$\min_{U,V}\ \|X - UV\|_2^2 + a \max_{(i,j)\in\mathcal{W}} \|v_i - v_j\|_2^2 - b \min_{(i,j)\in\mathcal{B}} \|v_i - v_j\|_2^2 \quad \text{s.t.}\quad U \ge 0,\ V \ge 0 \qquad (5)$$

where a and b are trade-off parameters. It should be noted that in (5), the maximization and minimization problems are coupled, making it difficult to optimize. To solve this problem, we introduce two nonnegative slack variables ε ≥ 0 and ζ ≥ 0 to represent the maximum coefficient vector distance over all within-class pairs and the minimum coefficient vector distance over all between-class pairs, respectively. In this way, (5) can be rewritten as

$$\begin{aligned} \min_{U,V,\varepsilon,\zeta}\ & \|X - UV\|_2^2 + a\varepsilon - b\zeta \\ \text{s.t.}\ & \|v_i - v_j\|_2^2 \le \varepsilon,\ \forall (i,j)\in\mathcal{W},\\ & \|v_i - v_j\|_2^2 \ge \zeta,\ \forall (i,j)\in\mathcal{B},\\ & U \ge 0,\ V \ge 0,\ \varepsilon \ge 0,\ \zeta \ge 0. \end{aligned} \qquad (6)$$

In this problem, the two slack variables are optimized together with the basis matrix U and the coefficient matrix V.

2.2. Optimization

To solve the problem introduced in (6), we write the Lagrange function as

$$\begin{aligned} L(U,V,\varepsilon,\zeta,\lambda_{ij},\xi_{ij},\Sigma,\Upsilon,\phi,\varphi) ={}& \|X - UV\|_2^2 + a\varepsilon - b\zeta \\ & + \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\left(\|v_i - v_j\|_2^2 - \varepsilon\right) - \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\left(\|v_i - v_j\|_2^2 - \zeta\right) \\ & - Tr(\Sigma U^\top) - Tr(\Upsilon V^\top) - \phi\varepsilon - \varphi\zeta \end{aligned} \qquad (7)$$

where λ_ij ≥ 0 is the Lagrange multiplier for the constraint ||v_i − v_j||_2^2 ≤ ε, ξ_ij ≥ 0 is the Lagrange multiplier for the constraint ||v_i − v_j||_2^2 ≥ ζ, Σ ∈ R_+^{d×m} is the Lagrange multiplier matrix for U ≥ 0, Υ ∈ R_+^{m×n} is the Lagrange multiplier matrix for V ≥ 0, φ ≥ 0 is the Lagrange multiplier for ε ≥ 0, and ϕ ≥ 0 is the Lagrange multiplier for ζ ≥ 0.
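Concretely, the pair sets W and B, and the two extreme distances that the slack variables ε and ζ of (6) represent, can be computed directly from the labels and the coefficient matrix. A minimal sketch (the function names are illustrative):

```python
import numpy as np

def pair_sets(y):
    """W: index pairs with equal labels; B: index pairs with different labels."""
    n = len(y)
    W = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] == y[j]]
    B = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]]
    return W, B

def extreme_distances(V, W, B):
    """Maximum within-class and minimum between-class squared L2 distances
    between coefficient vectors (the columns of V)."""
    d2 = lambda i, j: float(np.sum((V[:, i] - V[:, j]) ** 2))
    return max(d2(i, j) for (i, j) in W), min(d2(i, j) for (i, j) in B)

y = [0, 0, 1, 1]
V = np.array([[0.0, 1.0, 5.0, 6.0],
              [0.0, 0.0, 0.0, 0.0]])
W, B = pair_sets(y)
max_within, min_between = extreme_distances(V, W, B)
```

At a feasible optimum of (6) with tight constraints, ε equals the maximum within-class distance and ζ the minimum between-class distance computed this way.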
According to the duality theory of optimization [13], the optimal solution can be achieved by solving the following problem:

$$\begin{aligned} \max_{\lambda_{ij},\xi_{ij},\Sigma,\Upsilon,\phi,\varphi}\ \min_{U,V,\varepsilon,\zeta}\ & L(U,V,\varepsilon,\zeta,\lambda_{ij},\xi_{ij},\Sigma,\Upsilon,\phi,\varphi) \\ \text{s.t.}\ & \lambda_{ij} \ge 0,\ \forall (i,j)\in\mathcal{W},\quad \xi_{ij} \ge 0,\ \forall (i,j)\in\mathcal{B},\\ & \Sigma \ge 0,\ \Upsilon \ge 0,\ \phi \ge 0,\ \varphi \ge 0. \end{aligned} \qquad (8)$$

By substituting (7) into (8), we obtain the following problem:

$$\begin{aligned} \max_{\lambda_{ij},\xi_{ij},\Sigma,\Upsilon,\phi,\varphi}\ \min_{U,V,\varepsilon,\zeta}\ & \|X - UV\|_2^2 + a\varepsilon - b\zeta + \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\left(\|v_i - v_j\|_2^2 - \varepsilon\right) \\ & - \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\left(\|v_i - v_j\|_2^2 - \zeta\right) - Tr(\Sigma U^\top) - Tr(\Upsilon V^\top) - \phi\varepsilon - \varphi\zeta \\ \text{s.t.}\ & \lambda_{ij} \ge 0,\ \forall (i,j)\in\mathcal{W},\quad \xi_{ij} \ge 0,\ \forall (i,j)\in\mathcal{B},\\ & \Sigma \ge 0,\ \Upsilon \ge 0,\ \phi \ge 0,\ \varphi \ge 0. \end{aligned} \qquad (9)$$

This problem is difficult to optimize directly. Instead of solving it with regard to all the variables simultaneously, we adopt an alternating optimization strategy [14]. The NMF factorization matrices U and V, the slack variables ε and ζ, and the Lagrange multipliers λ_ij and ξ_ij are updated alternately in an iterative algorithm. When one variable is optimized, the other variables are fixed.

2.2.1. Optimizing U and V

By fixing the other variables and removing the terms irrelevant to U or V, the optimization problem in (9) reduces to

$$\begin{aligned} \max_{\Sigma,\Upsilon}\ \min_{U,V}\ & \|X - UV\|_2^2 + \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\|v_i - v_j\|_2^2 - \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\|v_i - v_j\|_2^2 - Tr(\Sigma U^\top) - Tr(\Upsilon V^\top) \\ ={}& Tr(XX^\top) - 2Tr(XV^\top U^\top) + Tr(UVV^\top U^\top) + 2Tr\!\left(V(D - \Lambda)V^\top\right) \\ & - 2Tr\!\left(V(E - \Xi)V^\top\right) - Tr(\Sigma U^\top) - Tr(\Upsilon V^\top) \\ \text{s.t.}\ & \Sigma \ge 0,\ \Upsilon \ge 0, \end{aligned} \qquad (10)$$

where Λ ∈ R_+^{n×n} and Ξ ∈ R_+^{n×n} with

$$\Lambda_{ij} = \begin{cases} \lambda_{ij}, & \text{if } (i,j)\in\mathcal{W}\\ 0, & \text{otherwise} \end{cases},\qquad \Xi_{ij} = \begin{cases} \xi_{ij}, & \text{if } (i,j)\in\mathcal{B}\\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

D ∈ R_+^{n×n} is a diagonal matrix whose entries are the column sums of Λ, D_ii = Σ_j Λ_ji, and E ∈ R_+^{n×n} is a diagonal matrix whose entries are the column sums of Ξ, E_ii = Σ_j Ξ_ji.
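The matrices Λ, Ξ, D and E of (10)–(11) can be assembled from the multipliers as follows (a sketch; storing λ_ij and ξ_ij in dictionaries keyed by pairs is an illustrative choice):

```python
import numpy as np

def build_multiplier_matrices(lam, xi, n):
    """Lambda/Xi hold the multipliers at their pair positions (zero elsewhere);
    D and E are diagonal matrices of the column sums of Lambda and Xi."""
    Lam = np.zeros((n, n))
    Xi = np.zeros((n, n))
    for (i, j), value in lam.items():   # lam: {(i, j) in W -> lambda_ij}
        Lam[i, j] = value
    for (i, j), value in xi.items():    # xi:  {(i, j) in B -> xi_ij}
        Xi[i, j] = value
    D = np.diag(Lam.sum(axis=0))        # D_ii = column sum i of Lambda
    E = np.diag(Xi.sum(axis=0))         # E_ii = column sum i of Xi
    return Lam, Xi, D, E

Lam, Xi, D, E = build_multiplier_matrices({(0, 1): 2.0, (2, 1): 3.0},
                                          {(0, 2): 4.0}, n=3)
```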
To solve this problem, we set the partial derivatives of the objective function in (10) with respect to U and V to zero:

$$\begin{aligned} & -2XV^\top + 2UVV^\top - \Sigma = 0 \\ & -2U^\top X + 2U^\top UV + 2V(D - \Lambda) - 2V(E - \Xi) - \Upsilon = 0 \end{aligned} \qquad (12)$$

Using the KKT conditions [Σ] ∘ [U] = 0 and [Υ] ∘ [V] = 0 [15], where [·] ∘ [·] denotes the element-wise product between two matrices, we get the following equations for U and V:

$$\begin{aligned} & -[XV^\top]\circ[U] + [UVV^\top]\circ[U] = 0 \\ & -[U^\top X]\circ[V] + [U^\top UV]\circ[V] + [V(D - \Lambda)]\circ[V] - [V(E - \Xi)]\circ[V] = 0 \end{aligned} \qquad (13)$$

which lead to the following updating rules:

$$U \leftarrow \frac{[XV^\top]}{[UVV^\top]} \circ [U],\qquad V \leftarrow \frac{[U^\top X + V\Lambda + VE]}{[U^\top UV + VD + V\Xi]} \circ [V] \qquad (14)$$

where [·]/[·] is the element-wise matrix division operator.

2.2.2. Optimizing ε and ζ

By removing the terms irrelevant to ε and ζ and fixing all the other variables, we have the following optimization problem with regard to only ε and ζ:

$$\max_{\phi,\varphi}\ \min_{\varepsilon,\zeta}\ a\varepsilon - b\zeta - \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\varepsilon + \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\zeta - \phi\varepsilon - \varphi\zeta \quad \text{s.t.}\quad \phi \ge 0,\ \varphi \ge 0. \qquad (15)$$

By setting the partial derivatives of the objective function in (15) with respect to ε and ζ to zero, we have

$$a - \sum_{(i,j)\in\mathcal{W}} \lambda_{ij} - \phi = 0,\qquad -b + \sum_{(i,j)\in\mathcal{B}} \xi_{ij} - \varphi = 0 \qquad (16)$$

Using the KKT conditions φε = 0 and ϕζ = 0, we get the following equations for ε and ζ:

$$a\varepsilon - \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\varepsilon = 0,\qquad -b\zeta + \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\zeta = 0 \qquad (17)$$

which lead to the following updating rules:

$$\varepsilon \leftarrow \frac{\sum_{(i,j)\in\mathcal{W}} \lambda_{ij}}{a}\,\varepsilon,\qquad \zeta \leftarrow \frac{b}{\sum_{(i,j)\in\mathcal{B}} \xi_{ij}}\,\zeta \qquad (18)$$

2.2.3. Optimizing λ_ij and ξ_ij

Based on (16), we have the following constraints for λ_ij and ξ_ij:

$$\phi = a - \sum_{(i,j)\in\mathcal{W}} \lambda_{ij} \ge 0 \ \Rightarrow\ \sum_{(i,j)\in\mathcal{W}} \lambda_{ij} \le a,\qquad \varphi = -b + \sum_{(i,j)\in\mathcal{B}} \xi_{ij} \ge 0 \ \Rightarrow\ \sum_{(i,j)\in\mathcal{B}} \xi_{ij} \ge b. \qquad (19)$$
By considering these constraints, fixing the other variables, and removing the terms irrelevant to λ_ij and ξ_ij from (8), we have the following problem with regard to λ_ij and ξ_ij:

$$\begin{aligned} \max_{\lambda_{ij},\xi_{ij}}\ & \sum_{(i,j)\in\mathcal{W}} \lambda_{ij}\left(\|v_i - v_j\|_2^2 - \varepsilon\right) - \sum_{(i,j)\in\mathcal{B}} \xi_{ij}\left(\|v_i - v_j\|_2^2 - \zeta\right) \\ \text{s.t.}\ & \lambda_{ij} \ge 0,\ \forall (i,j)\in\mathcal{W},\quad \xi_{ij} \ge 0,\ \forall (i,j)\in\mathcal{B},\\ & \sum_{(i,j)\in\mathcal{W}} \lambda_{ij} \le a,\quad \sum_{(i,j)\in\mathcal{B}} \xi_{ij} \ge b. \end{aligned} \qquad (20)$$

This problem can be solved as a linear programming (LP) problem.

3. Conclusion

In this paper, we investigate how to use the class labels of the data samples to improve the discriminative ability of their NMF representations. To explore the class label information of the data samples, we consider the within-class sample pairs with the same class labels, and also the between-class sample pairs with different class labels. Apparently, in the NMF representation space, we need to minimize the distances between the within-class pairs and maximize the distances between the between-class pairs. Inspired by max-min distance analysis [12], we also consider the extreme situation: we pick the maximum within-class distance and try to minimize it, so that all the within-class distances are also minimized, and we pick the minimum between-class distance and try to maximize it, so that all the between-class distances are also maximized. Differently from max-min distance analysis, which only maximizes the minimum between-class distance, we consider the between-class and within-class distances dually.

References

[1] C.-J. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Computation 19 (10) (2007) 2756-2779.
[2] D. Seung, L. Lee, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems 13 (2001) 556-562.
[3] D. Cai, X. He, J. Han, T. S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8) (2011) 1548-1560.
[4] D. D. Lee, H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788-791.
[5] S. Z. Li, X. W. Hou, H. J. Zhang, Q. S. Cheng, Learning spatially localized, parts-based representation, in: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, IEEE, 2001, pp. I-207.
[6] P. O. Hoyer, Non-negative matrix factorization with sparseness constraints, The Journal of Machine Learning Research 5 (2004) 1457-1469.
[7] A. Shashua, T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, in: Proceedings of the 22nd International Conference on Machine Learning, ACM, 2005, pp. 792-799.
[8] Y. Gao, G. Church, Improving molecular cancer class discovery through sparse non-negative matrix factorization, Bioinformatics 21 (21) (2005) 3970-3975.
[9] H. Liu, Z. Wu, X. Li, D. Cai, T. S. Huang, Constrained nonnegative matrix factorization for image representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (7) (2012) 1299-1311.
[10] Y. Wang, Y. Jia, Fisher non-negative matrix factorization for learning local features, in: Proceedings of the Asian Conference on Computer Vision, 2004.
[11] H. Lee, J. Yoo, S. Choi, Semi-supervised nonnegative matrix factorization, IEEE Signal Processing Letters 17 (1) (2010) 4-7.
[12] W. Bian, D. Tao, Max-min distance analysis by using sequential SDP relaxation for dimension reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5) (2011) 1037-1050.
[13] W. E. Diewert, Applications of duality theory, Stanford Institute for Mathematical Studies in the Social Sciences, 1974.
[14] F. Lootsma, Alternative optimization strategies for large-scale production-allocation problems, European Journal of Operational Research 75 (1) (1994) 13-40.
[15] F. R. Bach, G. R. Lanckriet, M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in: Proceedings of the Twenty-First International Conference on Machine Learning, ACM, 2004, p. 6.