A Fast Factorization-based Approach to Robust PCA
Authors: Chong Peng, Zhao Kang, Qiang Cheng
Department of Computer Science, Southern Illinois University, Carbondale, IL 62901 USA. Email: {pchong, zhao.kang, qcheng}@siu.edu

Abstract—Robust principal component analysis (RPCA) has been widely used for recovering low-rank matrices in many data mining and machine learning problems. It separates a data matrix into a low-rank part and a sparse part. The convex approach has been well studied in the literature. However, state-of-the-art algorithms for the convex approach usually have relatively high complexity due to the need of solving (partial) singular value decompositions of large matrices. A non-convex approach, AltProj, has also been proposed with lighter complexity and better scalability. Given the true rank $r$ of the underlying low-rank matrix, AltProj has a complexity of $O(r^2 dn)$, where $d \times n$ is the size of the data matrix. In this paper, we propose a novel factorization-based model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound of the true rank. Our method does not need the precise value of the true rank. From extensive experiments, we observe that AltProj can work only when $r$ is precisely known in advance; when the needed rank parameter is specified to a value different from the true rank, AltProj cannot fully separate the two parts while our method succeeds. Even when both work, our method is about 4 times faster than AltProj. Our method can be used as a light-weight, scalable tool for RPCA in the absence of the precise value of the true rank.

Keywords—robust principal component analysis; scalable; factorization; non-convex

I. INTRODUCTION

Principal component analysis (PCA) is a fundamental technique of exploratory data analysis and has been widely used in many data mining tasks.
Given a data matrix $X \in \mathbb{R}^{d \times n}$, classic PCA seeks the best rank-$k$ approximation with complexity $O(kdn)$. It is well known that PCA is sensitive to outliers. To combat this drawback, in the past decade a number of approaches to robust PCA (RPCA) have been proposed, including alternating minimization [1], random sampling techniques [2, 3], multivariate trimming [4], and others [5, 6]. More recently, a new type of RPCA method has emerged and become popular [7, 8]. It assumes that $X$ can be separated into two parts, i.e., a low-rank $L$ and a sparse $S$, by solving the following problem:

$$\min_{L,S}\ \mathrm{rank}(L) + \lambda \|S\|_0, \quad \text{s.t.}\quad X = L + S, \qquad (1)$$

where $\|\cdot\|_0$ is the $\ell_0$ (pseudo) norm, which counts the number of nonzero elements of a matrix, and $\lambda > 0$ is a balancing parameter. Because minimizing the rank and the $\ell_0$ norm in (1) is generally NP-hard, in practice (1) is often relaxed to the following convex optimization problem [8]:

$$\min_{L,S}\ \|L\|_* + \lambda \|S\|_1, \quad \text{s.t.}\quad X = L + S, \qquad (2)$$

where $\|L\|_* = \sum_{i=1}^{\min(d,n)} \sigma_i(L)$ is the nuclear norm of $L$, with $\sigma_i(L)$ denoting the $i$-th largest singular value of $L$, and $\|S\|_1 = \sum_{ij} |S_{ij}|$ is the $\ell_1$ norm. Theoretically, under some mild conditions, (2) can exactly separate $L$ with the true rank $r$ from $S$. A number of algorithms have been developed to solve (2), including singular value thresholding (SVT) [9], accelerated proximal gradient (APG) [10], and two versions of augmented Lagrange multiplier (ALM) based approaches [11]: exact ALM and inexact ALM (IALM). Among these algorithms, the ALM-based ones are the state of the art for solving (2); they need to compute SVDs of $d \times n$ matrices in each iteration. To improve efficiency, another ALM-based algorithm adopts the PROPACK package [12], which solves only partial SVDs instead of full SVDs. However, this is still computationally costly when $d$ and $n$ are both large.
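To make the cost structure concrete, an inexact-ALM iteration for (2) can be sketched in a few lines. This is a simplified sketch rather than the tuned IALM of [11]; the initialization of $\rho$ and the default $\lambda = 1/\sqrt{\max(d,n)}$ are standard choices we assume here. Note that every iteration computes an SVD of a full $d \times n$ matrix, which is exactly the bottleneck discussed above.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def ialm_rpca(X, lam=None, rho=None, kappa=1.5, max_iter=100):
    """Simplified inexact ALM for min ||L||_* + lam*||S||_1 s.t. X = L + S.
    Each iteration computes a full SVD of a d x n matrix (the bottleneck)."""
    d, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(d, n))
    rho = rho if rho is not None else 1.25 / np.linalg.norm(X, 2)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier
    for _ in range(max_iter):
        # L-update: singular value thresholding of X - S + Y/rho
        U, sig, Vt = np.linalg.svd(X - S + Y / rho, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / rho, 0.0)) @ Vt
        # S-update: entrywise soft thresholding
        S = soft_threshold(X - L + Y / rho, lam / rho)
        # dual update and penalty growth
        Y = Y + rho * (X - L - S)
        rho *= kappa
    return L, S
```

On a small synthetic low-rank-plus-sparse matrix, this drives the constraint residual $\|X - L - S\|_F / \|X\|_F$ toward zero as $\rho$ grows.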
Despite the elegant theory of the convex RPCA in (2), it has two major drawbacks: 1) when the underlying matrix has no incoherence guarantee [8], or the data get grossly corrupted, the results may be far from the underlying true ones; 2) the nuclear norm may lead to biased estimation of the rank [13]. To combat these drawbacks, [14] fixes the rank of $L$ as a hard constraint, while [13] uses a non-convex rank approximation to more accurately approximate the rank of $L$, which needs to solve full SVDs. [14, 15] need only to solve partial SVDs, which significantly reduces the complexity compared to the computation of full SVDs; for example, AltProj has a complexity of $O(r^2 dn)$ [15]. However, if $r$ is not known a priori, [14, 15] may fail to recover $L$ correctly. To further reduce the complexity, enhance the scalability, and alleviate the dependence on knowledge of $r$, in this paper we propose a factorization-based model for RPCA, where $L$ is decomposed as $UCV^T$ with $U \in \mathbb{R}^{d \times k}$, $V \in \mathbb{R}^{n \times k}$, $C \in \mathbb{R}^{k \times k}$, and $k \ll \min(d, n)$. This model relaxes the requirement of a priori knowledge of the rank of $L$, assuming only that it is upper bounded by $k$. Scalable algorithms are developed to optimize our models efficiently. After acceptance of this paper, it came to our attention that [16] also proposed a factorization-based approach for such a problem. However, our model is distinct from [16]. Our model has two variants: one uses an explicit rank constraint via matrix factorization when the ground-truth rank is known, while the other uses a non-convex rank approximation when the ground-truth rank is unknown. Both variants differ starkly from [16], which considers only the second case and simply uses the nuclear norm.
We summarize the contributions of this paper as follows: 1) We propose a factorization-based model for RPCA, allowing the recovery of the low-rank component with or without a priori knowledge of its true rank. In the absence of the true rank, a non-convex rank approximation is adopted, which can approximate the rank of a matrix more accurately than the nuclear norm. 2) Efficient ALM-type optimization algorithms are developed with scalability in both $d$ and $n$. This contrasts with AltProj, whose cost is $O(r^2 dn)$; the difference is important when $d$ and $n$ are large. 3) Empirically, extensive experiments testify to the effectiveness of our model and algorithms, both quantitatively and qualitatively, in various applications.

II. RELATED WORK

The convex RPCA (2) has been thoroughly studied [9]. To exploit the example-wise sparsity of the sparse component, the $\ell_{2,1}$ norm has been adopted [17, 18]:

$$\min_{L,S}\ \|L\|_* + \lambda \|S\|_{2,1}, \quad \text{s.t.}\quad X = L + S, \qquad (3)$$

where $\|\cdot\|_{2,1}$ is defined as the sum of the $\ell_2$ norms of the column vectors of a matrix. When a matrix has large singular values, the nuclear norm may be far from an accurate approximation of the rank. Non-convex rank approximations have been considered in a number of applications, such as subspace clustering [19, 20]. A non-convex rank approximation has also been used in RPCA [13], which replaces the nuclear norm in (3) by

$$\|L\|_\gamma = \sum_i \frac{(1+\gamma)\,\sigma_i(L)}{\gamma + \sigma_i(L)},$$

with $\gamma > 0$ and $\sigma_i(L)$ being the $i$-th largest singular value of $L$. The above approaches usually need to solve SVDs. When the matrix involved is large, the computation of an SVD is, in general, intensive. To reduce the complexity of RPCA, several approaches have been attempted. For example, AltProj [15] uses non-convex alternating minimization techniques in RPCA and admits $O(r^2 dn)$ complexity.
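A small numerical check (our own illustrative example; the test matrix and the value of $\gamma$ are arbitrary choices) shows how $\|L\|_\gamma$ stays near the true rank while the nuclear norm is dominated by large singular values:

```python
import numpy as np

def gamma_rank_approx(L, gamma):
    """Non-convex rank surrogate: sum of (1+gamma)*sigma_i / (gamma + sigma_i)."""
    sig = np.linalg.svd(L, compute_uv=False)
    return np.sum((1 + gamma) * sig / (gamma + sig))

# A rank-2 matrix with one dominant singular value.
L = np.diag([100.0, 1.0, 0.0])
nuclear = np.linalg.svd(L, compute_uv=False).sum()

print(nuclear)                     # 101.0: far from the true rank, 2
print(gamma_rank_approx(L, 0.01))  # about 2.01: close to the true rank
```

As $\gamma \to 0$, each nonzero singular value contributes approximately 1 to the sum, so $\|L\|_\gamma$ approaches $\mathrm{rank}(L)$ regardless of how large the singular values are.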
[16] uses a factorization approach:

$$\min_{U,V,S}\ \|V\|_* + \lambda \|S\|_1, \quad \text{s.t.}\quad X = UV^T + S,\ U^T U = I, \qquad (4)$$

which solves SVDs of thin matrices and hence admits scalability, where $I$ is an identity matrix of proper size.

III. FAST FACTORIZATION-BASED RPCA

In this section, we formulate the Fast Factorization-based RPCA (FFP). We model the data as $X = UCV^T + S$, where $S$ is a sparse component and $L = UCV^T$ is a low-rank approximation of $X$, with $C \in \mathbb{R}^{k \times k}$ being the core matrix and $U \in \mathbb{R}^{d \times k}$, $V \in \mathbb{R}^{n \times k}$ satisfying $U^T U = V^T V = I$, where $I$ is an identity matrix of proper size. It is seen that the factorization provides a natural upper bound, $k$, on the rank of the low-rank component of $X$, which can be used to relax the stringent requirement on knowledge of the true rank imposed by the AltProj algorithm. To capture the sparse structure of $S$, we adopt the $\ell_1$ norm to obtain sparsity. Thus, we consider the following objective function:

$$\min_{S,C,\,U^T U = I,\,V^T V = I}\ \|S\|_1 \quad \text{s.t.}\quad X = UCV^T + S. \qquad (5)$$

For many applications, the true rank $r$ of $L$ is known. In the presence of this prior knowledge, we let $k = r$ in (5). However, when this information on $r$ is not available, (5) may lack the desired capability of recovering $L$ with the (unknown) rank $r$ when an arbitrary $k$ is used. In this situation, we propose to marry the advantages of the convex and fixed-rank RPCA approaches, yielding the following optimization problem:

$$\min_{S,C,U,V}\ \|S\|_1 + \lambda \|UCV^T\|_* \quad \text{s.t.}\quad X = UCV^T + S,\ U^T U = I,\ V^T V = I, \qquad (6)$$

where $\lambda > 0$ is a balancing parameter. Even when knowledge of the precise value of $r$ is not available, a proper $k \ge r$ can still be chosen because, in the worst case, we may let $k = \min(d, n)$. Oftentimes, with domain information on the application at hand, an upper bound $k \ll \min(d, n)$ can be obtained.
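The structural fact underlying (5) and (6) is easy to verify numerically: when $U$ and $V$ have orthonormal columns, the nonzero singular values of $UCV^T$ are exactly those of the $k \times k$ core $C$, so $\mathrm{rank}(UCV^T) \le k$ and $\|UCV^T\|_* = \|C\|_*$. A sketch with randomly generated factors (our own example; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 50, 40, 3

# Orthonormal-column factors via thin QR of random Gaussian matrices.
U, _ = np.linalg.qr(rng.normal(size=(d, k)))
V, _ = np.linalg.qr(rng.normal(size=(n, k)))
C = rng.normal(size=(k, k))

L = U @ C @ V.T

sig_L = np.linalg.svd(L, compute_uv=False)  # min(d, n) values
sig_C = np.linalg.svd(C, compute_uv=False)  # k values

# Top-k singular values of L coincide with those of C; the rest vanish.
assert np.allclose(sig_L[:k], sig_C)
assert np.allclose(sig_L[k:], 0.0)
assert np.linalg.matrix_rank(L) <= k
```

This is why the factorization both caps the rank at $k$ and lets the nuclear-norm term in (6) be evaluated on the small core matrix alone.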
It has been shown that the nuclear norm cannot approximate the true rank well if there are dominant singular values in a matrix, and non-convex rank approximations may help improve learning performance [13]. Here, we adopt a log-determinant rank approximation [20], $\|Y\|_{ld} = \log\det(I + (Y^T Y)^{1/2})$, to obtain the following RPCA model:

$$\min_{S,C,U,V}\ \|S\|_1 + \lambda \|UCV^T\|_{ld} \quad \text{s.t.}\quad X = UCV^T + S,\ U^T U = I,\ V^T V = I. \qquad (7)$$

Due to the fact that

$$\|UCV^T\|_{ld} = \log\det\big(I + (V C^T U^T U C V^T)^{1/2}\big) = \log\det\big(I + (C^T C)^{1/2}\big) = \|C\|_{ld}, \qquad (8)$$

model (7) can be reduced to the following model:

$$\min_{S,C,U,V}\ \|S\|_1 + \lambda \|C\|_{ld} \quad \text{s.t.}\quad X = UCV^T + S,\ U^T U = I,\ V^T V = I. \qquad (9)$$

We name models (5) and (9) Fixed-Rank FFP (F-FFP) and Unfixed-Rank FFP (U-FFP), respectively.

IV. OPTIMIZATION

The augmented Lagrange functions of (5) and (9) are

$$\min_{S,C,\,U^T U = V^T V = I}\ \|S\|_1 + \frac{\rho}{2}\,\big\|X - UCV^T - S + \Theta/\rho\big\|_F^2, \qquad (10)$$

and

$$\min_{S,C,U,V}\ \|S\|_1 + \lambda \|C\|_{ld} + \frac{\rho}{2}\,\big\|X - UCV^T - S + \tfrac{1}{\rho}\Theta\big\|_F^2 \quad \text{s.t.}\quad U^T U = I,\ V^T V = I, \qquad (11)$$

respectively. The derivations for the optimization are similar to those in [20]. We summarize the optimization in Algorithm 1. Here, we define $P(\cdot)$ and $Q(\cdot)$ to be the left and right singular vectors of a matrix from its thin SVD, and define the operator $D_\tau(D) = P(D)\,\mathrm{diag}\{\sigma_i^*\}\,(Q(D))^T$, with

$$\sigma_i^* = \begin{cases} \xi, & \text{if } f_i(\xi) \le f_i(0) \text{ and } (1 + \sigma_i(D))^2 > 4\tau, \\ 0, & \text{otherwise}, \end{cases} \qquad (12)$$

where $f_i(x) = \frac{1}{2}\big(x - \sigma_i(D)\big)^2 + \tau \log(1 + x)$ and $\xi = \frac{\sigma_i(D) - 1}{2} + \sqrt{\frac{(1 + \sigma_i(D))^2}{4} - \tau}$.

Algorithm 1: F-FFP for solving (5) (and U-FFP for solving (9))
1: Input: $X$, $k$, $\lambda$, $\rho$, $\kappa > 1$, $t_{\max}$
2: Initialize: $S$, $U$, $V$, $\Theta$, $\rho$, and $t = 0$.
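The operator $D_\tau$ can be implemented directly from its definition in (12). The following is a minimal NumPy sketch: for each singular value it evaluates the candidate minimizer $\xi$ of $f_i$ and keeps it only when it beats shrinking to zero.

```python
import numpy as np

def D_tau(D, tau):
    """Singular-value thresholding operator for the log-det penalty, as in (12):
    each singular value sigma_i of D is replaced by the larger stationary point
    of f_i(x) = 0.5*(x - sigma_i)^2 + tau*log(1 + x) when that point exists and
    attains a lower objective value than x = 0; otherwise it is set to 0."""
    P, sig, Qt = np.linalg.svd(D, full_matrices=False)
    out = np.zeros_like(sig)
    for i, s in enumerate(sig):
        if (1.0 + s) ** 2 > 4.0 * tau:
            # larger root of the stationarity condition f_i'(x) = 0
            xi = (s - 1.0) / 2.0 + np.sqrt((1.0 + s) ** 2 / 4.0 - tau)
            f = lambda x: 0.5 * (x - s) ** 2 + tau * np.log(1.0 + x)
            if xi >= 0.0 and f(xi) <= f(0.0):
                out[i] = xi
    return (P * out) @ Qt
```

For example, $D_{0.5}$ applied to $\mathrm{diag}(3, 0.1)$ keeps the dominant singular value (shrunk to about 2.87) and zeroes out the small one, behaving as a rank-revealing shrinkage.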
3: repeat
4:   $S_{ij} = \big(\big|[X - UCV^T + \Theta/\rho]_{ij}\big| - 1/\rho\big)_+\,\mathrm{sgn}\big([X - UCV^T + \Theta/\rho]_{ij}\big)$
5:   $V = P\big((X - S + \Theta/\rho)^T U C\big)\,\big(Q\big((X - S + \Theta/\rho)^T U C\big)\big)^T$
6:   $U = P\big((X - S + \Theta/\rho)\,V C^T\big)\,\big(Q\big((X - S + \Theta/\rho)\,V C^T\big)\big)^T$
7:   (For F-FFP) $C = U^T (X - S + \Theta/\rho)\,V$
8:   (For U-FFP) $C = D_{\lambda/\rho}\big(U^T (X - S + \Theta/\rho)\,V\big)$
9:   $\Theta = \Theta + \rho\,(X - UCV^T - S)$, $\rho = \rho\kappa$
10: until $t \ge t_{\max}$ or convergence
11: Output: $S$, $U$, $V$, $C$

A. Complexity Analysis

Given that $k \ll \min(d, n)$, both F-FFP and U-FFP have complexity $O(ndk)$.

V. EXPERIMENTS

To evaluate the proposed model and algorithms, we consider three important applications: foreground-background separation, shadow removal from face images, and anomaly detection. We compare our algorithms with the state-of-the-art methods, including IALM [8] (code: http://perception.csl.illinois.edu/matrix-rank/sample_code.html#RPCA) and AltProj (code: http://www.personal.psu.edu/nsa10/codes.html), both of which make use of the PROPACK package [12] to solve SVDs for efficiency. All experiments in this section are conducted using Matlab on a dual-core Intel Xeon E3-1240 V2 3.40 GHz Linux server with 8 GB memory. For reproducibility, we provide our code at https://www.researchgate.net/publication/308174615_codes_icdm2016.

A. Foreground-background separation

Foreground-background separation is to detect moving objects or interesting activities in a scene and remove background(s) from a video sequence. For this task, we use 15 datasets, as listed in the first column of Table I, among which the first 11 contain a single background while the remaining 4 have 2 backgrounds (dataset sources: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html, http://limu.ait.kyushu-u.ac.jp/dataset/en/, http://wordpress-jodoin.dmi.usherb.ca/dataset2012/, http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.htm). In this case, we have $r = 1$ and $2$ for the first 11 and last 4 datasets, respectively. For each dataset, the data matrix is constructed by treating all vectorized frames as columns; for computational ease, down-sampling is performed on the Camera Parameter, Highway, Office, Shopping Mall, Pedestrian, and Time of Day datasets. In the following, we test F-FFP and U-FFP in two cases according to whether $r$ is known.
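Algorithm 1 translates almost line for line into code. Below is a minimal NumPy sketch of the F-FFP variant; the random orthonormal initialization, the convergence check, and the default parameter values are our assumptions, not prescribed by the paper.

```python
import numpy as np

def f_ffp(X, k, rho=1e-4, kappa=1.5, t_max=200, tol=1e-3):
    """Minimal sketch of Algorithm 1 (F-FFP): min ||S||_1 s.t. X = U C V^T + S,
    U^T U = V^T V = I, solved by ALM with orthogonal-Procrustes-style updates.
    All SVDs are of thin (d x k, n x k, or k x k) matrices."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.normal(size=(d, k)))  # assumed random init
    V, _ = np.linalg.qr(rng.normal(size=(n, k)))
    C = U.T @ X @ V
    S = np.zeros_like(X)
    Theta = np.zeros_like(X)

    def polar(M):
        # P(M) Q(M)^T: closest matrix with orthonormal columns (thin SVD)
        P, _, Qt = np.linalg.svd(M, full_matrices=False)
        return P @ Qt

    for _ in range(t_max):
        A = X - U @ C @ V.T + Theta / rho
        S = np.sign(A) * np.maximum(np.abs(A) - 1.0 / rho, 0.0)  # step 4
        B = X - S + Theta / rho
        V = polar(B.T @ U @ C)                                   # step 5
        U = polar(B @ V @ C.T)                                   # step 6
        C = U.T @ B @ V                                          # step 7 (F-FFP)
        resid = X - U @ C @ V.T - S
        Theta = Theta + rho * resid                              # step 9
        rho *= kappa
        if np.linalg.norm(resid) / np.linalg.norm(X) < tol:
            break
    return U, C, V, S
```

The U-FFP variant would differ only in the $C$-update (step 8), applying $D_{\lambda/\rho}$ to $U^T(X - S + \Theta/\rho)V$ instead of taking the projection directly.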
1) Case 1 ($r$ is known): We set $k = r$ for F-FFP and AltProj. We terminate all methods after 200 iterations or when $\|X - L - S\|_F / \|X\|_F \le 10^{-3}$ is reached. For IALM, we fix $\rho = 0.0001$ and $\kappa = 1.5$ for all these datasets for fast convergence and good visual quality. The balancing parameter is chosen as the theoretical one [8]. For AltProj, the default parameters are used. For F-FFP, we use the same $\rho$ and $\kappa$ as for IALM. Unless otherwise specified, these parameter settings remain the same throughout this paper. The numerical results are reported in Table I. It is observed that IALM separates a sparser $S$ but fails to recover $L$ with low rank, while F-FFP and AltProj recover $L$ properly. F-FFP generates a sparser $S$ than AltProj on most of the datasets. All these methods have competitive fitting errors; moreover, F-FFP could obtain an even more accurate fit if given the same number of iterations or the same time as IALM or AltProj. Furthermore, F-FFP needs the least amount of time on all these datasets.

2) Case 2 ($r$ is unknown): Here $k$ is specified as a tight upper bound of $r$ based on domain knowledge of the video. In this test, we set $k = 5$ on all datasets and compare U-FFP with AltProj and IALM. For U-FFP, $\lambda$ is chosen from $\{10^6, 10^7, 10^8, 10^9\}$. We show the numerical results in Table II. It is observed that U-FFP is still able to recover $L$ with the true rank, whereas AltProj fails in this case. Besides, the time cost of U-FFP increases only slightly, by less than 1 second for most of the datasets, while the time cost of AltProj increases by about 10-20 seconds. We also show some video frames in Figure 1 and the visual results in Figure 2. It is observed that the backgrounds recovered by IALM still have shadows; AltProj separates foreground and background well with $k$ known, but the backgrounds are not clean when $k$ is unknown; both F-FFP and U-FFP successfully separate foreground and background from the video.

B.
Shadow removal from face images

Learning faces is an important topic in pattern recognition; however, shadows make it more challenging. There are often shadows on face images due to varying lighting conditions [21]. Therefore, it is crucial to handle shadows, peculiarities, and saturations on face images, so as to improve the learning capability on face image data. This can be handled with RPCA, because clean images reside in a low-rank subspace while the shadows correspond to sparse components. Following [8, 13], we use the Extended Yale B (EYaleB) dataset [22] for this test. Out of the 38 persons in this dataset, we choose the first two subjects. For each subject, there are 64 heavily corrupted images of size 192 × 168 taken under varying lighting conditions. Each image is vectorized into a column of a 32,256 × 64 data matrix. It is natural to assume that the face images of the same person reside in a rank-one space; hence we have r = 1. First, we consider the case where r is known.

Table I: Results on Different Datasets with r Known

Data | Method | Rank(L) | ‖S‖₀/(dn) | ‖X−L−S‖_F/‖X‖_F | # of Iter. | # of SVDs | Time
Highway | AltProj | 1 | 0.9331 | 2.96e-4 | 37 | 38 | 49.65
Highway | IALM | 539 | 0.8175 | 6.02e-4 | 12 | 13 | 269.10
Highway | F-FFP | 1 | 0.8854 | 5.74e-4 | 24 | 24 | 14.83
Office | AltProj | 1 | 0.8018 | 9.40e-4 | 51 | 52 | 84.43
Office | IALM | 374 | 0.7582 | 9.46e-4 | 11 | 12 | 230.53
Office | F-FFP | 1 | 0.8761 | 5.33e-4 | 24 | 24 | 19.92
PETS2006 | AltProj | 1 | 0.8590 | 5.20e-4 | 35 | 36 | 44.64
PETS2006 | IALM | 293 | 0.8649 | 5.63e-4 | 12 | 13 | 144.26
PETS2006 | F-FFP | 1 | 0.8675 | 5.61e-4 | 24 | 24 | 14.33
Shopping Mall | AltProj | 1 | 0.9853 | 3.91e-5 | 45 | 46 | 45.35
Shopping Mall | IALM | 328 | 0.8158 | 9.37e-4 | 11 | 12 | 123.99
Shopping Mall | F-FFP | 1 | 0.9122 | 7.70e-4 | 23 | 23 | 11.65
Pedestrian | AltProj | 1 | 0.5869 | 9.32e-4 | 41 | 42 | 37.90
Pedestrian | IALM | 35 | 0.8910 | 5.69e-4 | 11 | 12 | 36.18
Pedestrian | F-FFP | 1 | 0.6023 | 9.98e-4 | 22 | 22 | 10.53
Bootstrap | AltProj | 1 | 0.9747 | 1.17e-4 | 44 | 45 | 107.15
Bootstrap | IALM | 1146 | 0.8095 | 6.27e-4 | 12 | 13 | 1182.92
Bootstrap | F-FFP | 1 | 0.9288 | 7.71e-4 | 23 | 23 | 25.38
Water Surface | AltProj | 1 | 0.8890 | 3.97e-4 | 47 | 48 | 27.27
Water Surface | IALM | 224 | 0.7861 | 5.32e-4 | 12 | 13 | 51.00
Water Surface | F-FFP | 1 | 0.8355 | 9.91e-4 | 23 | 23 | 5.68
Campus | AltProj | 1 | 0.9790 | 9.50e-5 | 41 | 42 | 54.1
Campus | IALM | 488 | 0.8136 | 9.30e-4 | 11 | 12 | 242.59
Campus | F-FFP | 1 | 0.9378 | 6.26e-4 | 23 | 23 | 12.85
Curtain | AltProj | 1 | 0.8280 | 7.46e-4 | 40 | 41 | 102.41
Curtain | IALM | 834 | 0.7398 | 6.84e-4 | 12 | 13 | 747.36
Curtain | F-FFP | 1 | 0.8680 | 6.28e-4 | 24 | 24 | 27.51
Fountain | AltProj | 1 | 0.9113 | 2.91e-4 | 50 | 51 | 23.90
Fountain | IALM | 102 | 0.8272 | 4.91e-4 | 12 | 13 | 25.62
Fountain | F-FFP | 1 | 0.8854 | 4.89e-4 | 24 | 24 | 5.00
Escalator Airport | AltProj | 1 | 0.9152 | 2.29e-4 | 40 | 41 | 110.75
Escalator Airport | IALM | 957 | 0.7744 | 7.76e-4 | 11 | 12 | 1,040.91
Escalator Airport | F-FFP | 1 | 0.8877 | 5.45e-4 | 23 | 23 | 30.78
Lobby | AltProj | 2 | 0.9243 | 1.88e-4 | 39 | 41 | 47.32
Lobby | IALM | 223 | 0.8346 | 6.19e-4 | 12 | 13 | 152.54
Lobby | F-FFP | 2 | 0.8524 | 6.42e-4 | 24 | 24 | 15.20
Light Switch-2 | AltProj | 2 | 0.9050 | 2.24e-4 | 47 | 49 | 87.35
Light Switch-2 | IALM | 591 | 0.7921 | 7.93e-4 | 12 | 13 | 613.98
Light Switch-2 | F-FFP | 2 | 0.8323 | 7.54e-4 | 24 | 24 | 24.12
Camera Parameter | AltProj | 2 | 0.8806 | 5.34e-4 | 47 | 49 | 84.99
Camera Parameter | IALM | 607 | 0.7750 | 6.86e-4 | 12 | 13 | 433.47
Camera Parameter | F-FFP | 2 | 0.8684 | 6.16e-4 | 24 | 24 | 22.25
Time Of Day | AltProj | 2 | 0.8646 | 4.72e-4 | 44 | 46 | 61.63
Time Of Day | IALM | 351 | 0.6990 | 6.12e-4 | 13 | 14 | 265.87
Time Of Day | F-FFP | 2 | 0.8441 | 6.81e-4 | 25 | 25 | 18.49

Note: For IALM and AltProj, (partial) SVDs are computed for d × n matrices. For F-FFP, SVDs are for n × k matrices, which are computationally far less expensive than those required by IALM and AltProj.

Figure 1: (1) is a frame from Highway and (2)-(3) are two frames from Light Switch-2.

Table II: Results on Datasets with r Unknown

Data | Method | Rank(L) | ‖S‖₀/(dn) | ‖X−L−S‖_F/‖X‖_F | # of Iter. | # of SVDs | Time
Highway | AltProj | 5 | 0.9007 | 3.66e-4 | 43 | 48 | 75.60
Highway | U-FFP | 1 | 0.8854 | 5.75e-4 | 24 | 24+24+24 | 18.02
Office | AltProj | 5 | 0.7159 | 8.61e-4 | 47 | 52 | 98.54
Office | U-FFP | 1 | 0.8761 | 5.40e-4 | 24 | 24+24+24 | 24.40
PETS2006 | AltProj | 5 | 0.8543 | 6.15e-4 | 39 | 43 | 63.33
PETS2006 | U-FFP | 1 | 0.8675 | 5.61e-4 | 24 | 24+24+24 | 17.29
Shopping Mall | AltProj | 5 | 0.9611 | 9.82e-5 | 41 | 46 | 63.34
Shopping Mall | U-FFP | 1 | 0.9122 | 7.70e-4 | 23 | 23+23+23 | 14.37
Pedestrian | AltProj | 5 | 0.6202 | 6.37e-4 | 44 | 49 | 58.10
Pedestrian | U-FFP | 1 | 0.6714 | 5.65e-4 | 23 | 23+23+23 | 12.40
Bootstrap | AltProj | 5 | 0.9875 | 3.02e-4 | 47 | 52 | 169.06
Bootstrap | U-FFP | 1 | 0.9288 | 7.70e-4 | 23 | 23+23+23 | 31.03
Water Surface | AltProj | 5 | 0.9090 | 2.38e-4 | 46 | 50 | 33.78
Water Surface | U-FFP | 1 | 0.8355 | 9.77e-4 | 23 | 23+23+23 | 7.27
Campus | AltProj | 5 | 0.9482 | 3.18e-5 | 46 | 51 | 92.90
Campus | U-FFP | 1 | 0.9377 | 6.26e-4 | 23 | 23+23+24 | 16.29
Curtain | AltProj | 5 | 0.8079 | 8.82e-4 | 36 | 39 | 101.79
Curtain | U-FFP | 1 | 0.8680 | 6.28e-4 | 24 | 24+24 | 34.33
Fountain | AltProj | 5 | 0.7435 | 7.55e-4 | 48 | 52 | 32.24
Fountain | U-FFP | 1 | 0.8852 | 4.91e-4 | 24 | 24+24+24 | 6.26
Escalator Airport | AltProj | 5 | 0.8474 | 8.43e-4 | 43 | 48 | 162.49
Escalator Airport | U-FFP | 1 | 0.8877 | 5.45e-4 | 23 | 23+23+23 | 39.70
Lobby | AltProj | 5 | 0.9176 | 1.71e-4 | 40 | 44 | 61.50
Lobby | U-FFP | 2 | 0.8523 | 6.42e-4 | 24 | 24+24+24 | 17.91
Light Switch-2 | AltProj | 5 | 0.8507 | 4.29e-4 | 37 | 41 | 80.37
Light Switch-2 | U-FFP | 2 | 0.8329 | 7.57e-4 | 24 | 24+24+24 | 28.53
Camera Parameter | AltProj | 5 | 0.7311 | 8.34e-4 | 50 | 55 | 147.28
Camera Parameter | U-FFP | 2 | 0.8689 | 6.26e-4 | 24 | 24+24+24 | 26.35
Time Of Day | AltProj | 5 | 0.8651 | 4.61e-4 | 46 | 51 | 73.35
Time Of Day | U-FFP | 2 | 0.8425 | 7.20e-4 | 25 | 25+25+25 | 21.83

Notes: For AltProj, (partial) SVDs are performed on d × n matrices. For U-FFP, SVDs are for d × k, n × k, and k × k matrices, which are computationally far less expensive than those required by AltProj. The results of IALM are not shown since they are the same as in Case 1 (Table I).
Following [8, 13], IALM, AltProj, and F-FFP are applied to each subject, and the quantitative and visual results are shown in Table III and Figure 3, respectively. It is observed from Figure 3 that AltProj and F-FFP can successfully remove shadows from face images. The majority of the shadows can be removed by IALM, but some still remain. From Table III, we can see that AltProj and F-FFP recover the low-rank component with exactly rank 1, while IALM recovers an L of higher rank. Next, we consider the case where r is unknown. Following the setting in Section V-A, we set k = 5 and apply AltProj and U-FFP to each subject. The quantitative and visual results are shown in Table IV and Figure 4, respectively. It is observed from Figure 4 that U-FFP gives visual results similar to those of F-FFP, while AltProj appears unable to remove shadows completely. Quantitatively, as shown in Table IV, AltProj gives an L of higher rank while U-FFP gives an L with the true rank. Besides, the proposed U-FFP is the fastest among all these methods.

Figure 2: Foreground-background separation in the Highway and Light Switch-2 videos corresponding to (1)-(3) in Figure 1, respectively. From left to right are the results of IALM, AltProj (with k), F-FFP, AltProj (with r = 5), and U-FFP, respectively. For every two panels grouped together, the top and bottom are the recovered background and foreground, respectively.

Table III: Recovery Results of Face Data with k = 1

Data | Method | Rank(Z) | ‖S‖₀/(dn) | ‖X−Z−S‖_F/‖X‖_F | # of Iter. | # of SVDs | Time
Subject 1 | AltProj | 1 | 0.9553 | 8.18e-4 | 50 | 51 | 4.62
Subject 1 | IALM | 32 | 0.7745 | 6.28e-4 | 25 | 26 | 2.43
Subject 1 | F-FFP | 1 | 0.9655 | 8.86e-4 | 36 | 36 | 1.37
Subject 2 | AltProj | 1 | 0.9755 | 2.34e-4 | 49 | 50 | 5.00
Subject 2 | IALM | 31 | 0.7656 | 6.47e-4 | 25 | 26 | 2.66
Subject 2 | F-FFP | 1 | 0.9492 | 9.48e-4 | 36 | 36 | 1.37

Figure 3: Shadow removal results for subjects 1 and 2 from the EYaleB data.
For each of the two parts, the top left is the original image and the rest are the recovered clean images (top) and shadows (bottom) by (1) IALM, (2) AltProj, and (3) F-FFP, respectively.

Table IV: Recovery Results of Face Data with k = 5

Data | Method | Rank(Z) | ‖S‖₀/(dn) | ‖X−Z−S‖_F/‖X‖_F | # of Iter. | # of SVDs | Time
Subject 1 | AltProj | 5 | 0.9309 | 3.93e-4 | 51 | 55 | 6.08
Subject 1 | U-FFP | 5 | 0.9647 | 8.40e-4 | 36 | 36+36+36 | 1.69
Subject 2 | AltProj | 5 | 0.8903 | 6.40e-4 | 54 | 58 | 7.92
Subject 2 | U-FFP | 1 | 0.9651 | 5.72e-4 | 37 | 37+37+37 | 1.74

C. Anomaly Detection

Given a number of images from one subject, they form a low-dimensional subspace. Any image significantly different from the majority of the images can be regarded as an outlier; likewise, a few images from another subject can be regarded as outliers. Anomaly detection is to identify such outliers.

Figure 4: Shadow removal results for subjects 1 and 2 from the EYaleB data. The top panel shows the recovered clean images and the bottom panel the shadows, by (1) AltProj and (2) U-FFP, respectively. λ = 2e4 for U-FFP.

Figure 5: Selected '1's and '7's from the USPS dataset.

Figure 6: ℓ₂ norms ‖S_i‖₂ of the columns of S, plotted against the column index i.

Figure 7: Outliers, including some unusual '1's and all '7's, identified by F-FFP.

The USPS dataset contains 9,298 images of hand-written digits of size 16 × 16. Following [13], among these images we select the first 190 images of '1's and the last 10 of '7's, and construct a data matrix of size 256 × 200 by regarding each vectorized image as a column. Since the number of '1's is far greater than that of '7's, the former is the dominant digit, while the images of the latter are outliers. The true rank of L should therefore be 1. Some examples of these selected images are shown in Figure 5. It is observed that, besides the '7's, some '1's are quite different from the majority.
Therefore, anomaly detection in this case is to detect not only the '7's but also the anomalous '1's. After applying F-FFP, the columns of S that correspond to anomalies contain relatively large values. We use the ℓ₂ norm to measure every column of S and show the values in Figure 6. The outliers can be identified by finding the columns with the highest bars; for ease of visualization, all values smaller than 5 are zeroed out in Figure 6. The corresponding digits are shown in Figure 7 as outliers, which include all '7's and several unusual '1's.

D. Scalability

To numerically illustrate the scalability of F-FFP and U-FFP, we test the time cost versus the values of n and d.

Figure 8: Plots of the time cost of F-FFP and U-FFP as functions of data size and dimension, for all 15 video datasets: (a) F-FFP (time vs. n), (b) F-FFP (time vs. d), (c) U-FFP (time vs. n), (d) U-FFP (time vs. d).
To test how the computation time increases with n, we uniformly sample a subset of the frames with sampling rates in {0.1, 0.2, ..., 1.0} and keep all pixels of each frame. To test the relationship of time with respect to d, we use all frames and down-sample each frame with rates varying over {0.1², 0.2², ..., 1.0²}, so as to preserve the spatial structure of each frame. We run F-FFP and U-FFP for 50 iterations in 10 repeated runs and report the average time in Figure 8. It is observed that the time cost increases essentially linearly with both n and d for both F-FFP and U-FFP.

VI. CONCLUSION

In this paper, we propose a new factorization-based RPCA model. A non-convex rank approximation is used to minimize the rank when the ground truth is unknown. ALM-type optimization is developed for solving the model. Our model and algorithms admit scalability in both data dimension and sample size, suggesting a potential for real-world applications. Extensive experiments testify to the effectiveness and scalability of the proposed model and algorithms, both quantitatively and qualitatively.

ACKNOWLEDGMENT

Qiang Cheng is the corresponding author. This work is supported by the National Science Foundation under grant IIS-1218712, the National Natural Science Foundation of China under grant 11241005, and the Shanxi Scholarship Council of China under grant 2015-093.

REFERENCES

[1] Q. Ke and T. Kanade, "Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 739-746.
[2] R. Maronna, D. Martin, and V. Yohai, Robust Statistics. John Wiley & Sons, Chichester, 2006.
[3] F. De La Torre and M. J. Black, "A framework for robust subspace learning," International Journal of Computer Vision, vol. 54, no. 1-3, pp. 117-142, 2003.
[4] R. Gnanadesikan and J. R. Kettenring, "Robust estimates, residuals, and outlier detection with multiresponse data," Biometrics, pp. 81-124, 1972.
[5] L. Xu and A. L. Yuille, "Robust principal component analysis by self-organizing rules based on statistical physics approach," IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 131-143, 1995.
[6] C. Croux and G. Haesbroeck, "Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies," Biometrika, vol. 87, no. 3, pp. 603-618, 2000.
[7] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization," in Advances in Neural Information Processing Systems, 2009, pp. 2080-2088.
[8] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, p. 11, 2011.
[9] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956-1982, 2010.
[10] K.-C. Toh and S. Yun, "An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems," Pacific Journal of Optimization, vol. 6, no. 615-640, p. 15, 2010.
[11] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," arXiv preprint arXiv:1009.5055, 2010.
[12] X. Ding, L. He, and L. Carin, "Bayesian robust principal component analysis," IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419-3430, 2011.
[13] Z. Kang, C. Peng, and Q. Cheng, "Robust PCA via nonconvex rank approximation," in Proc. IEEE International Conference on Data Mining (ICDM), 2015, pp. 211-220.
[14] W. K. Leow, Y. Cheng, L. Zhang, T. Sim, and L. Foo, "Background recovery by fixed-rank robust principal component analysis," in Computer Analysis of Images and Patterns. Springer, 2013, pp. 54-61.
[15] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain, "Non-convex robust PCA," in Advances in Neural Information Processing Systems, 2014, pp. 1107-1115.
[16] G. Liu and S. Yan, "Active subspace: Toward scalable low-rank learning," Neural Computation, vol. 24, no. 12, pp. 3371-3394, 2012.
[17] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," in Advances in Neural Information Processing Systems, 2010, pp. 2496-2504.
[18] M. McCoy, J. A. Tropp et al., "Two proposals for robust PCA using semidefinite programming," Electronic Journal of Statistics, vol. 5, pp. 1123-1160, 2011.
[19] C. Peng, Z. Kang, H. Li, and Q. Cheng, "Subspace clustering using log-determinant rank approximation," in Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 925-934.
[20] C. Peng, Z. Kang, M. Yang, and Q. Cheng, "Feature selection embedded subspace clustering," IEEE Signal Processing Letters, vol. 23, no. 7, pp. 1018-1022, July 2016.
[21] R. Basri and D. W. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218-233, 2003.
[22] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.