Random Projections on Manifolds of Symmetric Positive Definite Matrices for Image Classification
Azadeh Alavi, Arnold Wiliem, Kun Zhao, Brian C. Lovell, Conrad Sanderson

NICTA, GPO Box 2434, Brisbane, QLD 4001, Australia
University of Queensland, School of ITEE, QLD 4072, Australia
Queensland University of Technology, Brisbane, QLD 4000, Australia

Abstract

Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, manifold shape is only approximated, which can cause loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms, with the true manifold shape well preserved. Specifically, we propose to project SPD matrices, using a set of random projection hyperplanes over RKHS, into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification.

1. Introduction

Covariance matrices have recently been employed to describe images and videos [22, 13, 35], as they are known to provide compact and informative feature descriptions [7, 3]. Non-singular covariance matrices are naturally Symmetric Positive Definite (SPD) matrices, which form connected Riemannian manifolds when endowed with a Riemannian metric over tangent spaces [20]. As such, the Riemannian geometry needs to be considered when solving learning tasks [35].

One of the most widely used metrics for SPD matrices is the Affine Invariant Riemannian Metric (AIRM) [22]. The AIRM induces a Riemannian structure which is invariant to inversion and similarity transforms. Despite these properties, learning methods using this approach have to deal with computational challenges, such as employing computationally expensive non-linear operators.

To address the above issue, two lines of research have been proposed: (1) embedding manifolds into tangent spaces [21, 24, 27, 35, 36]; (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS) induced by kernel functions [2, 14, 16, 17, 28, 30, 34]. The former approaches in effect map manifold points to Euclidean spaces, thereby enabling the use of existing Euclidean-based learning algorithms. This comes at the cost of disregarding some of the manifold structure. The latter approach addresses this by implicitly mapping points on the manifold into an RKHS, which can be considered as a high-dimensional Euclidean space. Training data can be used to define a space that preserves manifold geometry [17].
The downside is that existing Euclidean-based learning algorithms need to be kernelised, which may not be trivial. Furthermore, the resulting methods can still have a high computational load, making them impractical in more complex scenarios.

Contributions. In this paper we offer a novel approach for analysing SPD matrices which combines the main advantage of tangent space approaches with the discriminatory power provided by kernel space methods. We adapt a recent idea from techniques specifically designed for learning tasks on very large image datasets [12, 19]. In this domain, image representations are mapped into a reduced space wherein similarities are still well preserved [19]. In our proposed approach, we employ such a mapping technique to create a space which preserves the manifold geometry while being treatable as Euclidean.

Specifically, we first embed SPD manifold points into an RKHS via the Stein divergence kernel [33]. We then generate random projection hyperplanes in the RKHS and project the embedded points via the method proposed in [19]. Finally, as the underlying space can be thought of as Euclidean, any appropriate Euclidean-based learning machinery can be applied. In this paper, we study the efficacy of this embedding method for classification tasks. We show that the space is only as effective as the completeness of the training data generating the random projection hyperplanes, and address this through the use of synthetic data to augment the training data. Experiments on several vision tasks (person re-identification, face recognition and texture recognition) show that the proposed approach outperforms several state-of-the-art methods.

We continue the paper as follows. Section 2 provides a brief overview of the manifold structure and its associated kernel function. We then detail the proposed approach in Section 3. Section 4 presents results on the study of the random projection space discriminability, as well as comparisons with state-of-the-art results on various visual classification tasks. The main findings and possible future directions are summarised in Section 5.

2. Manifold Structure and Stein Divergence

Consider $\{X_1, \ldots, X_n\} \in \mathrm{Sym}_d^+$ to be a set of non-singular $d \times d$ covariance matrices, which are Symmetric Positive Definite (SPD) matrices. These matrices belong to a smooth, differentiable topological space known as an SPD manifold. In this work, we endow the SPD manifold with the AIRM to induce the Riemannian structure [22]. As such, a point on the manifold $\mathcal{M}$ can be mapped to a tangent space using:

$$\log_{X_i} X_j = X_i^{\frac{1}{2}} \log\!\left( X_i^{-\frac{1}{2}} X_j X_i^{-\frac{1}{2}} \right) X_i^{\frac{1}{2}} \tag{1}$$

where $X_i, X_j \in \mathrm{Sym}_d^+$; $X_i$ is the point where the tangent space is located (i.e. the tangent pole); $X_j$ is the point that we would like to map into the tangent space $T_{X_i}\mathcal{M}$; and $\log(\cdot)$ is the matrix logarithm. The inverse function, which maps a point on a particular tangent space back onto the manifold, is:

$$\exp_{X_i} y = X_i^{\frac{1}{2}} \exp\!\left( X_i^{-\frac{1}{2}} \, y \, X_i^{-\frac{1}{2}} \right) X_i^{\frac{1}{2}} \tag{2}$$

where $X_i \in \mathrm{Sym}_d^+$ is again the tangent pole; $y \in T_{X_i}\mathcal{M}$ is a point in the tangent space $T_{X_i}\mathcal{M}$; and $\exp(\cdot)$ is the matrix exponential.
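As a concrete illustration of Eqns. (1) and (2), the sketch below implements the two maps with SciPy's matrix functions. The function names (`log_map`, `exp_map`) and the round-trip check are ours, not from the paper.

```python
import numpy as np
from scipy.linalg import logm, expm, fractional_matrix_power

def log_map(Xi, Xj):
    """Map the SPD matrix Xj into the tangent space at pole Xi (Eqn. 1)."""
    Xi_h = fractional_matrix_power(Xi, 0.5)    # Xi^{1/2}
    Xi_nh = fractional_matrix_power(Xi, -0.5)  # Xi^{-1/2}
    return Xi_h @ logm(Xi_nh @ Xj @ Xi_nh) @ Xi_h

def exp_map(Xi, y):
    """Map the tangent vector y at pole Xi back onto the manifold (Eqn. 2)."""
    Xi_h = fractional_matrix_power(Xi, 0.5)
    Xi_nh = fractional_matrix_power(Xi, -0.5)
    return Xi_h @ expm(Xi_nh @ y @ Xi_nh) @ Xi_h

# Round trip: exp_map(Xi, log_map(Xi, Xj)) should recover Xj.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); Xi = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((4, 4)); Xj = B @ B.T + 4 * np.eye(4)
assert np.allclose(exp_map(Xi, log_map(Xi, Xj)), Xj, atol=1e-6)
```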
From the above functions, we can now define the shortest distance between two points on the manifold. This distance, here called the geodesic distance, is the minimum length of the curved path that connects the two points [22]:

$$d_g^2(X_i, X_j) = \mathrm{trace}\!\left\{ \log^2\!\left( X_i^{-\frac{1}{2}} X_j X_i^{-\frac{1}{2}} \right) \right\} \tag{3}$$

The above mapping functions can be computationally expensive. Alternatively, we can use the recently introduced Stein divergence [33] to determine similarities between points on the SPD manifold. Its symmetrised form is:

$$J_\varphi(X, Y) \triangleq \log \det\!\left( \frac{X + Y}{2} \right) - \frac{1}{2} \log \det(XY) \tag{4}$$

The Stein divergence kernel can then be defined as:

$$\mathrm{K}(X, Y) = \exp\{-\sigma \, J_\varphi(X, Y)\} \tag{5}$$

under the condition $\sigma \in \left\{ \frac{1}{2}, \frac{2}{2}, \ldots, \frac{d-1}{2} \right\}$, to ensure that the kernel matrix formed by Eqn. (5) is positive definite [15].
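For reference, a minimal sketch of the symmetrised Stein divergence (Eqn. 4) and its kernel (Eqn. 5), computing log-determinants via `slogdet` for numerical stability; the function names are ours.

```python
import numpy as np

def stein_divergence(X, Y):
    """Symmetrised Stein divergence between SPD matrices X and Y (Eqn. 4)."""
    _, logdet_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_y = np.linalg.slogdet(Y)
    # log det(XY) = log det(X) + log det(Y) for SPD matrices.
    return logdet_mid - 0.5 * (logdet_x + logdet_y)

def stein_kernel(Xs, Ys, sigma=1.0):
    """Gram matrix of the Stein divergence kernel (Eqn. 5)."""
    J = np.array([[stein_divergence(X, Y) for Y in Ys] for X in Xs])
    return np.exp(-sigma * J)
```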
3. Random Projection on RKHS

We aim to address classification tasks, originally formulated on the manifold, by embedding them into a random projection space which can be treated as Euclidean while still honouring the manifold geometry. To this end, we propose to use random projection on an RKHS with the aid of the Stein divergence kernel.

Random projection is an approximation approach for estimating distances between pairs of points in a high-dimensional space [1]. In essence, a point $u \in \mathbb{R}^d$ is projected via a set of randomly generated hyperplanes $\{r_1, \ldots, r_k\} \in \mathbb{R}^d$:

$$f(u) = u^\top R \tag{6}$$

where $R \in \mathbb{R}^{d \times k}$ is the matrix wherein each column contains a single hyperplane $r_i$, and $f(\cdot)$ is the mapping function which maps any point in $\mathbb{R}^d$ into the random projection space $\mathbb{R}^k$. According to the Johnson-Lindenstrauss lemma [1], it is possible to map a set of high-dimensional points into a much lower dimensional space in which the pairwise distances between points are well preserved:

Lemma 3.1 (Johnson-Lindenstrauss Lemma). For any $\epsilon$ such that $\frac{1}{2} > \epsilon > 0$, and any set of points $S \subset \mathbb{R}^d$ with $|S| = n$, upon projection to a uniform random $k$-dimensional subspace with $k = O(\log n)$, the following property holds with probability at least $\frac{1}{2}$ for every pair $u, v \in S$:

$$(1 - \epsilon)\,\|u - v\|^2 \;\leq\; \|f(u) - f(v)\|^2 \;\leq\; (1 + \epsilon)\,\|u - v\|^2$$

where $f(u)$ and $f(v)$ are the projections of $u$ and $v$.

Despite the success of numerous approaches using this lemma to accomplish various computer vision tasks, most of them restrict the distance function to the $\ell_p$ norm, Mahalanobis metric or inner product [5, 8, 18], which makes them incompatible with non-Euclidean spaces. Recently, Kulis and Grauman [19] proposed a method that allows the distance function to be evaluated over an RKHS. Thus, it is possible to apply the lemma for any arbitrary kernel $\mathrm{K}(i, j) = \mathrm{K}(X_i, X_j) = \phi(X_i)^\top \phi(X_j)$ with an unknown embedding $\phi(\cdot)$ which maps the points to a Hilbert space $\mathcal{H}$ [19]. This approach makes it possible to construct a random projection space on an SPD manifold where the manifold structure is well preserved.

The main idea of our proposed approach, denoted Random projection On SPD manifold for imagE classification (ROSE), is to first map all points on the manifold into an RKHS, with implicit mapping function $\phi(\cdot)$, via the Stein divergence kernel. This is followed by mapping all the points $\phi(X_i) \in \mathcal{H}$ into a random projection space $\mathbb{R}^k$. To achieve this, we follow the Kulis-Grauman approach [19] by randomly generating a set of hyperplanes over the RKHS, $\{r_1, \ldots, r_k\} \in \mathcal{H}$, which are restricted to be approximately Gaussian.

As the embedding function $\phi(\cdot)$ is unknown, the generation process is done indirectly, via a weighted sum over a subset of the given training set. To this end, consider each data point $\phi(X_i)$ from the training set as a vector drawn from some underlying distribution $\mathcal{D}$ with unknown mean $\mu$ and unknown covariance $\Sigma$. Let $S$ be a set of $t$ training exemplars chosen i.i.d. from $\mathcal{D}$, and define $z_t = \frac{1}{t}\sum_{i \in S} \phi(X_i)$ over $S$. According to the central limit theorem, for sufficiently large $t$ the random vector $\tilde{z}_t = \sqrt{t}\,(z_t - \mu)$ is distributed according to the multivariate Gaussian $\mathcal{N}(0, \Sigma)$ [26]. Applying a whitening transform then yields $r_i = \Sigma^{-\frac{1}{2}} \tilde{z}_t$, which follows the $\mathcal{N}(0, I)$ distribution in the Hilbert space $\mathcal{H}$. Therefore, the $i$-th coefficient of each vector in the random projection space is defined as:

$$\phi(X)^\top \Sigma^{-\frac{1}{2}} \tilde{z}_t \tag{7}$$

The mean $\mu$ and covariance $\Sigma$ need to be approximated from training data. A set of $p$ objects is chosen to form the reference set $\phi(X_1), \ldots, \phi(X_p)$. The mean is then implicitly estimated as $\mu = \frac{1}{p}\sum_{i=1}^{p} \phi(X_i)$, and the covariance matrix $\Sigma$ is likewise formed over the $p$ samples. Eqn. (7) can be evaluated using a similar approach as for kernel PCA, which requires projecting onto the eigenvectors of the covariance matrix [19]. Let the eigendecomposition of $\Sigma$ be $U V U^\top$; then $\Sigma^{-\frac{1}{2}} = U V^{-\frac{1}{2}} U^\top$, and therefore Eqn. (7) can be rewritten as [19]:

$$\phi(X)^\top U V^{-\frac{1}{2}} U^\top \tilde{z}_t \tag{8}$$

Now let $K$ be the kernel matrix over the $p$ randomly selected training points, with eigendecomposition $K = \Lambda \Theta \Lambda^\top$. Based on the fact that the non-zero eigenvalues of $V$ are equal to the non-zero eigenvalues of $\Theta$, Kulis and Grauman [19] showed that Eqn. (8) is equivalent to:

$$\sum\nolimits_{i=1}^{p} \omega(i)\,\bigl(\phi(X_i)^\top \phi(X)\bigr) \tag{9}$$

where

$$\omega(i) = \frac{1}{t}\sum_{j=1}^{p}\sum_{l \in S} K^{-\frac{3}{2}}_{ij} K_{jl} \;-\; \frac{1}{p}\sum_{j=1}^{p}\sum_{k=1}^{p} K^{-\frac{3}{2}}_{ij} K_{jk} \tag{10}$$

in which $S$ is a set of $t$ points randomly selected from the $p$ sampled points. The expression $\omega(i)$ in Eqn. (10) can be further simplified by defining $e$ as a vector of all ones, and $e_S$ as a zero vector with ones in the entries corresponding to the indices of $S$ [19]:

$$w = K^{-\frac{1}{2}}\left(\frac{1}{t}\, e_S - \frac{1}{p}\, e\right) \tag{11}$$

In terms of the computational complexity of the training algorithm, according to Eqns. (9) and (11) the most expensive step is the single offline computation of $K^{-\frac{1}{2}}$, which takes $O(p^3)$. The computational complexity of classifying a query point then depends on three factors: computing the kernel vector, which requires $O(pd^3)$ operations; projecting the resulting kernel vector onto the random hyperplanes, which demands $O(pt)$ operations (where $t < p$); and applying a classifier in the projection space, which can be done with a one-versus-all support vector machine in $O(nb)$ operations, where $n$ is the number of classes and $b$ is the number of hyperplanes. Hence the complexity of classifying a single query is $O(pd^3 + pt + nb)$, which is more efficient than Relational Divergence based Classification (RDC) [2], later shown to be the second-best approach in the experiments. RDC represents Riemannian points as similarity vectors to a set of training points; as such similarity vectors lie in a Euclidean space, RDC then employs Linear Discriminant Analysis as the classifier.
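The hyperplane weights of Eqn. (11) and the projection of Eqn. (9) can be sketched as follows; all names are our own, and the eigenvalue floor is an added numerical safeguard rather than part of the method as published.

```python
import numpy as np

def projection_weights(K, S):
    """Weight vector for one random hyperplane, following the compact
    form of Eqn. (11): w = K^{-1/2} ((1/t) e_S - (1/p) e)."""
    p, t = K.shape[0], len(S)
    e_S = np.zeros(p)
    e_S[np.asarray(S)] = 1.0
    vals, vecs = np.linalg.eigh(K)       # K is symmetric positive definite
    vals = np.maximum(vals, 1e-10)       # guard against round-off negatives
    K_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return K_inv_sqrt @ (e_S / t - np.ones(p) / p)

def rose_project(K_query, W):
    """Eqn. (9): coefficients of query points in the random projection space.
    K_query: (m, p) kernel values between m queries and the p reference
    points; W: (p, k) matrix whose columns are the k weight vectors."""
    return K_query @ W

# Usage sketch, reusing stein_kernel from the earlier snippet:
#   K = stein_kernel(refs, refs, sigma)
#   W = np.column_stack([projection_weights(K, rng.choice(p, t, replace=False))
#                        for _ in range(k)])
#   X_rp = rose_project(stein_kernel(queries, refs, sigma), W)
```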
3.1. Synthetic Data

As later shown in the experiments (for instance, the results in Fig. 2), the discriminative power of the random projection space depends heavily on the training set which generates the random projection hyperplanes. To overcome this limitation, we propose to generate synthetic SPD matrices $X_1, \ldots, X_n \in \mathrm{Sym}_d^+$ centred around the mean of the data (denoted $X_\mu$), where the mean of the training set can be determined intrinsically via the Karcher mean algorithm [22].

We relate the synthetic data to the training set by enforcing the following condition on the synthetic SPD matrices:

$$\forall X_j \in S \;\text{ and }\; \forall X_i \in G: \quad d_g(X_\mu, X_j) \leq \max\bigl(d_g(X_\mu, X_i)\bigr) \tag{12}$$

where $G$ is the training set, $S$ is the set of generated synthetic points, $X_\mu$ is the mean of the training set, and $X_j$ is a generated synthetic point. The constraint in Eqn. (12) considers a ball around the mean of the training data, with radius equal to the largest distance between the mean and the given training points:

$$r = \max\bigl(d_g(X_\mu, X_i)\bigr) \tag{13}$$

We then need to generate SPD matrices located within radius $r$ of the mean (Eqn. 13). It is not trivial to generate SPD matrices which satisfy Eqn. (12), as it establishes a relation between the generated SPD matrices and the original training points. To address this, we exploit the relationship between the geodesic distance and the given Riemannian metric in a tangent space. Let $X_i, X_j \in \mathrm{Sym}_d^+$ be two points on the manifold and $x_i, x_j \in T_{X_i}\mathcal{M}$ be the corresponding points on the tangent space $T_{X_i}\mathcal{M}$. The norm of the vector $\overrightarrow{x_i x_j}$ is equivalent to $d_g(X_i, X_j)$ [22]. Therefore, it is possible to find a point $Y_i$ along the geodesic between $X_i$ and $X_j$ whose geodesic distance to $X_i$ satisfies Eqn. (12).

Along with the above definitions, we introduce the following definition and proposition:

Definition 3.2. Any point $X_i \in \mathrm{Sym}_d^+$ on an SPD manifold is said to have normalised geodesic distance with respect to $X_j \in \mathrm{Sym}_d^+$ if and only if $d_g(X_i, X_j) = 1$.

Proposition 3.3. For any two SPD matrices $X, X_\mu \in \mathrm{Sym}_d^+$, there exists $X_g$ on the geodesic curve defined by $X$ and $X_\mu$ which has normalised geodesic distance with respect to $X_\mu$. The point $X_g$ can be determined via $X_\mu^{\frac{1}{2}} \bigl( X_\mu^{-\frac{1}{2}} X X_\mu^{-\frac{1}{2}} \bigr)^{c} X_\mu^{\frac{1}{2}}$, where $c = \frac{\zeta}{d_g(X, X_\mu)}$, for $\zeta = 1$.

To prove the above proposition, let $X, X_\mu \in \mathrm{Sym}_d^+$ be two given points on an SPD manifold. In order to normalise the geodesic distance of $X$ with respect to $X_\mu$, we map the point $X$ into the tangent space $T_{X_\mu}\mathcal{M}$. As a tangent space can be considered a Euclidean space in which the distance between $X$ and the tangent pole $X_\mu$ is preserved, Euclidean vector normalisation can be applied. Finally, the normalised point is mapped back onto the manifold. These steps can be expressed as:

$$X_g = \exp_{X_\mu}\!\left( \frac{\zeta}{d_g(X_\mu, X)} \log_{X_\mu}(X) \right) \tag{14}$$

By plugging in Eqns. (1) and (2) we obtain:

$$X_g = X_\mu^{\frac{1}{2}} \exp\!\left( \frac{\zeta}{d_g(X_\mu, X)} \log\!\bigl( X_\mu^{-\frac{1}{2}} X X_\mu^{-\frac{1}{2}} \bigr) \right) X_\mu^{\frac{1}{2}}$$

Letting $c = \frac{\zeta}{d_g(X_\mu, X)}$, and using the fact that $X$ and $X_\mu$ are SPD matrices, we arrive at¹:

$$X_g = X_\mu^{\frac{1}{2}} \exp\!\left( \log\!\bigl( X_\mu^{-\frac{1}{2}} X X_\mu^{-\frac{1}{2}} \bigr)^{c} \right) X_\mu^{\frac{1}{2}}$$

which proves that:

$$X_g = X_\mu^{\frac{1}{2}} \bigl( X_\mu^{-\frac{1}{2}} X X_\mu^{-\frac{1}{2}} \bigr)^{c} X_\mu^{\frac{1}{2}} \tag{15}$$

Setting $\zeta = 1$ results in a normalised geodesic distance with respect to $X_\mu$. However, in our case, to satisfy Eqn. (12) we use $\zeta = \delta \times r$, where $r$ is the radius from Eqn. (13) and $\delta \in [0, 1]$ is a number drawn at random from a uniform distribution.

¹ See the Appendix for a proof that $\log X^c = c \log X$, where $X \in \mathrm{Sym}_d^+$.
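A sketch of the synthetic-point generator implied by Eqns. (13)-(15). Computing the Karcher mean is not shown; we assume the mean `X_mu` and the radius `r` are given, and the helper names are ours.

```python
import numpy as np
from scipy.linalg import eigvalsh

def geodesic_distance(Xi, Xj):
    """AIRM geodesic distance (Eqn. 3), via generalised eigenvalues."""
    lam = eigvalsh(Xj, Xi)               # eigenvalues of Xi^{-1} Xj
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def spd_power(X, c):
    """Real c-th power of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(X)
    return vecs @ np.diag(vals ** c) @ vecs.T

def synthetic_point(X, X_mu, r, rng):
    """One synthetic SPD matrix within geodesic radius r of X_mu (Eqn. 15)."""
    zeta = rng.uniform(0.0, 1.0) * r     # zeta = delta * r, delta ~ U[0, 1]
    c = zeta / geodesic_distance(X_mu, X)
    Xmu_h, Xmu_nh = spd_power(X_mu, 0.5), spd_power(X_mu, -0.5)
    return Xmu_h @ spd_power(Xmu_nh @ X @ Xmu_nh, c) @ Xmu_h

# Radius of the training ball (Eqn. 13):
#   r = max(geodesic_distance(X_mu, Xi) for Xi in training_set)
```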
4. Experiments and Discussion

We consider three computer vision classification tasks: (1) texture classification [25]; (2) face recognition [23]; and (3) person re-identification [9]. We first detail the experimental setup for each application, and then discuss the results of a comprehensive study of the random projection space discriminability on these tasks. To this end, we first embed the SPD matrices into an RKHS via the Stein divergence kernel, followed by mapping the embedded data points into a random projection space. The resulting vectors are then fed to a linear Support Vector Machine classifier, which uses a one-versus-all configuration for multi-class classification [10, 30].

[Figure 1. Top row: examples of pedestrians in the ETHZ dataset [29]. Middle row: example images from the Brodatz texture dataset [25]. Bottom row: examples of closely-cropped faces from the FERET 'b' subset [23].]

The parameter settings are as follows. As suggested in [19], we have used $t = \min(30, \frac{n}{4})$, where $n$ is the number of samples available for creating each hyperplane. For the number of random hyperplanes, we used validation data to choose one of $n$, $2n$ or $3n$. Based on empirical observations on validation sets, the number of synthetic samples was chosen as either $n$ or $m$, where $m$ is the number of samples per class. In a similar manner, the value of $\sigma$ in Eqn. (5) was chosen from $\{1, 2, \ldots, 20\}$.

We compare our proposed method, denoted Random projection On SPD manifold for imagE classification (ROSE), with several other embedding approaches (tangent spaces, RKHS and hashing) as well as several state-of-the-art methods. We also evaluate the effect of augmenting the training data with synthetic data points, and refer to this variant as ROSE with Synthetic data (ROSES).

For the person re-identification task, we used the modified version [29] of the ETHZ dataset [9]. The dataset was captured from a moving camera, with the images of pedestrians containing occlusions and wide variations in appearance. Sequence 1 contains 83 pedestrians, and Sequence 2 contains 35 (Fig. 1). Following [2], we first downsampled all the images and then created the training set using 10 randomly selected images, while the rest were used to form the test set. The random selection of the training and testing data was repeated 20 times. Each image was represented as a covariance matrix of feature vectors obtained at each pixel location:

$$F_{x,y} = \left[ x, \; y, \; R_{x,y}, \; G_{x,y}, \; B_{x,y}, \; R'_{x,y}, \; G'_{x,y}, \; B'_{x,y}, \; R''_{x,y}, \; G''_{x,y}, \; B''_{x,y} \right]$$

where $x$ and $y$ represent the position of a pixel, while $R_{x,y}$, $G_{x,y}$ and $B_{x,y}$ represent the corresponding colour information; $C'_{x,y}$ and $C''_{x,y}$ represent the gradient and Laplacian for colour channel $C$, respectively.

For the task of texture classification, we use the Brodatz dataset [25] (see Fig. 1 for examples). We follow the test protocol presented in [32], under which nine test scenarios with various numbers of classes were generated. To create the SPD matrices, we follow [2] by downsampling each image and then splitting it into 64 regions. A feature vector for each pixel $I(x, y)$ is calculated as:

$$F(x,y) = \left[ I(x,y), \; \frac{\partial I}{\partial x}, \; \frac{\partial I}{\partial y}, \; \frac{\partial^2 I}{\partial x^2}, \; \frac{\partial^2 I}{\partial y^2} \right]$$

Each region is described by a covariance matrix formed from these vectors.
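For illustration, a minimal covariance-descriptor sketch using the five-dimensional texture feature above; the exact derivative filters, downsampling and the small regulariser `eps` are our assumptions, as they are not fixed in the text here.

```python
import numpy as np

def texture_features(img):
    """Per-pixel features [I, dI/dx, dI/dy, d2I/dx2, d2I/dy2] for a 2D image."""
    dy, dx = np.gradient(img)            # np.gradient returns [d/dy, d/dx]
    dyy, _ = np.gradient(dy)
    _, dxx = np.gradient(dx)
    return np.stack([img, dx, dy, dxx, dyy], axis=-1).reshape(-1, 5)

def covariance_descriptor(region, eps=1e-6):
    """SPD covariance descriptor of a region; eps keeps it non-singular."""
    F = texture_features(region)
    return np.cov(F, rowvar=False) + eps * np.eye(5)
```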
For each test scenario, we randomly select 25 covariance matrices per class to construct the training set, and the rest are used to create the testing set. The random selection was repeated 10 times, and the mean results are reported.

For the face recognition task, the 'b' subset of the FERET dataset [23] is used. Each image is first closely cropped to include only the face and then downsampled (Fig. 1). Tests with various pose angles were created to evaluate the performance of the method. The training set consists of frontal images with illumination, expression and small pose variations, while non-frontal images are used to create the test sets. Each face image is represented by a covariance matrix, where for every pixel $I(x, y)$ the following feature vector is computed:

$$F_{x,y} = \left[ I(x,y), \; x, \; y, \; |G_{0,0}(x,y)|, \; \cdots, \; |G_{4,7}(x,y)| \right] \tag{16}$$

where $G_{u,v}(x, y)$ is the response of a 2D Gabor wavelet centred at $(x, y)$.

4.1. Random Projection Space Discriminability

We first compare the performance of the proposed ROSE method with several other embedding methods: (1) Kernel SVM (KSVM) using the Stein divergence kernel; (2) Kernelised Locality-Sensitive Hashing (KLSH) [19]; and (3) Riemannian Spectral Hashing (RSH), a hashing method specifically designed for smooth manifolds [6]. Tables 1, 2 and 3 report the results for each dataset.

Table 1. Recognition accuracy (in %) for the person re-identification task on Seq. 1 and Seq. 2 of the ETHZ dataset. KSVM: Kernel SVM; KLSH: Kernelised Locality-Sensitive Hashing; RSH: Riemannian Spectral Hashing. ROSE is the proposed method, and ROSES is ROSE augmented with synthetic data.

          KSVM   KLSH   RSH    ROSE   ROSES
Seq. 1    72.0   81.0   58.5   90.7   92.5
Seq. 2    79.0   84.0   62.7   91.5   94.0
Average   75.5   82.5   60.6   91.2   93.2

ROSE considerably outperforms the other embedding methods on the texture and person re-identification applications, while being on par with KLSH on the face recognition task. This suggests that the random projection space constructed by the random hyperplanes over the RKHS offers sufficient discrimination for the classification tasks. In fact, as we use a linear SVM as the classifier, the results presented here follow the theoretical results of [31], which suggest that the margin of the SVM classifier is still well preserved after the random projection.

We apply the ROSES method (ROSE augmented with synthetic data) on the three tasks in order to take a closer look at the contribution of the training data generating the random projection hyperplanes to space discriminability. As shown in the results, there is notable improvement over ROSE on the ETHZ person re-identification and Brodatz texture classification datasets. However, using synthetic points has an adverse effect on the FERET face recognition dataset.

The results suggest that the training data contributes to space discriminability. This is probably due to the fact that each random projection hyperplane is represented as a linear combination of randomly selected training points. As such, the variation and completeness of the training data may contribute significantly to the resulting space. The performance loss suffered on the FERET face recognition dataset is probably caused by the skewed data distribution of this particular dataset: adding synthetic points would significantly alter the original data distribution, which in turn affects space discriminability.
From our empirical observations (while working with RSH), we found that all data points were grouped together when an intrinsic clustering method was applied to this dataset. The very poor performance of RSH on this dataset supports our view.

Table 2. Recognition accuracy (in %) for the texture recognition task on the Brodatz dataset.

          KSVM    KLSH   RSH    ROSE   ROSES
5c        99.3    88.7   96.6   99.3   99.8
5m        85.8    43.6   81.9   90.1   88.4
5v        86.2    82.6   76.9   91.6   88.6
5v2       89.4    52.0   80.9   90.5   92.7
5v3       87.4    73.0   79.1   88.6   91.3
10        81.3    47.0   72.5   86.7   87.0
10v       81.5    48.0   69.3   88.1   88.5
16c       79.6    33.7   65.7   84.1   85.7
16v       73.4    35.5   59.0   77.1   79.8
Average   84.88   56.0   75.8   88.5   89.1

Table 3. Recognition accuracy (in %) for the face recognition task on the 'b' subset of the FERET dataset.

          KSVM   KLSH   RSH    ROSE   ROSES
bd        39.0   70.0   13.5   70.5   52.0
bg        58.5   80.5   31.5   80.5   61.5
Average   48.8   75.2   22.5   75.5   56.8

[Figure 2. Sensitivity of the random projection space discriminability to the number of selected data points for generating the random hyperplanes, as well as the effect of adding synthetic data points for improving space discriminability. The graphs compare the performance of ROSE and ROSES on the 5c (top) and 5m (bottom) sets of the Brodatz texture recognition dataset.]

To further highlight the proposed ROSES method, we set up an experiment on the '5c' and '5m' sequences of the Brodatz dataset, in which we reduce, step by step, the amount of data used for creating the mapping function. In the first step, we use all the provided training data to construct the random projection space. Then, we progressively discard the training data points of a particular class when constructing the space, repeating this process until only one class is left. Both '5c' and '5m' have a total of 5 classes, where each class has 5 training samples. We ran the experiment on every combination for each case (e.g. when two classes are excluded, there are 10 combinations) and present the average accuracy.

As shown in Fig. 2, there is a significant performance difference between the ROSE and ROSES methods, highlighting the importance of the training data generating the random projection hyperplanes. This performance difference is more pronounced when more classes are excluded from the training data. We note that this training set is distinct from the training set used to train the classifier: although we exclude some classes from the training set when constructing the random projection space, we still use all the provided training data to train the classifier.

4.2. Comparison with Recent Methods

Table 4 shows that on the FERET face recognition dataset, the proposed ROSE method obtains considerably better results than several recent methods: log-Euclidean sparse representation (logE-SR) [13, 37], Tensor Sparse Coding (TSC) [32], Riemannian Locality Preserving Projection (RLPP) [17], and Relational Divergence Classification (RDC) [2].

Table 5 contrasts the performance of the ROSES method (ROSE augmented with synthetic data) on the Brodatz texture recognition task against the above methods. We note that in this case the use of synthetic data is necessary in order to achieve improved performance. On average, ROSES achieves higher performance than the other methods, with top performance obtained in 7 out of 9 tests.
Finally, we compared the ROSES method with several state-of-the-art algorithms for person re-identification on the ETHZ dataset: Histogram Plus Epitome (HPE) [4], Symmetry-Driven Accumulation of Local Features (SDALF) [11], RLPP [17] and RDC [2]. The performance of TSC [32] was not evaluated due to the method's high computational demands: it would take approximately 200 hours to process the ETHZ dataset. We do not report results for logE-SR due to its low performance on the other two datasets. The results shown in Table 6 indicate that the proposed ROSES method obtains the best performance among the compared methods. As in the previous experiment, the use of synthetic data is necessary to obtain improved performance.

Table 4. Recognition accuracy (in %) for the face recognition task using log-Euclidean sparse representation (logE-SR) [13, 37], Tensor Sparse Coding (TSC) [32], Riemannian Locality Preserving Projection (RLPP) [17], Relational Divergence Classification (RDC) [2], and the proposed ROSE method.

          logE-SR   TSC    RLPP   RDC    ROSE
bd        35.0      36.0   47.0   59.0   70.0
bg        47.0      45.0   58.0   71.0   80.5
Average   41.0      40.5   52.5   65.0   75.2

Table 5. Performance on the Brodatz texture dataset [25] for logE-SR [13, 37], Tensor Sparse Coding (TSC) [32], Riemannian Locality Preserving Projection (RLPP) [17], Relational Divergence Classification (RDC) [2], and the proposed ROSES method.

          logE-SR   TSC    RLPP   RDC    ROSES
5c        89.0      99.7   99.2   98.2   99.8
5m        53.5      72.5   86.2   88.0   88.4
5v        73.5      86.3   86.4   87.0   88.6
5v2       70.8      86.1   90.0   89.0   92.7
5v3       63.6      83.1   89.7   87.0   91.3
10        60.6      81.3   84.7   84.0   87.0
10v       63.4      67.9   83.0   86.0   88.5
16c       67.1      75.1   82.0   88.0   85.7
16v       55.4      66.6   74.0   81.0   79.8
Average   66.3      79.8   86.1   87.6   89.1

Table 6. Recognition accuracy (in %) for the person re-identification task on Seq. 1 and Seq. 2 of the ETHZ dataset. HPE: Histogram Plus Epitome [4]; SDALF: Symmetry-Driven Accumulation of Local Features [11]; RLPP: Riemannian Locality Preserving Projection [17]; RDC: Relational Divergence Classification [2].

          HPE    SDALF   RLPP   RDC    ROSES
Seq. 1    79.5   84.1    88.2   88.7   92.5
Seq. 2    85.0   84.0    89.8   89.8   94.0
Average   82.2   84.0    89.0   89.2   93.2

5. Main Findings and Future Directions

The key advantage of representing images as non-singular covariance matrices is that superior performance can be achieved when the underlying structure of the resulting space is taken into account. It has been shown that, when endowed with the Affine Invariant Riemannian Metric (AIRM), such matrices form a connected, smooth and differentiable Riemannian manifold. Working directly on the manifold via the AIRM poses many computational challenges. Typical ways of addressing this issue include embedding the manifolds into tangent spaces, and embedding into Reproducing Kernel Hilbert Spaces (RKHS). Embedding the manifolds into tangent spaces considerably simplifies further analysis, at the cost of disregarding some of the manifold structure. Embedding via RKHS can better preserve the manifold structure, but adds the burden of extending existing Euclidean-based learning algorithms to RKHS.

In this work, we have presented a novel solution which embeds the data points into a random projection space, by first generating random hyperplanes in an RKHS and then projecting the data in the RKHS into the random projection space.
We presented a study of space discriminability on various computer vision classification tasks, and found that the resulting space has superior discriminative power compared to the typical approaches outlined above. In addition, we found that the discriminative power of the space depends on the completeness of the training data generating the random hyperplanes. To address this issue, we proposed to augment the training data with synthetic data.

Experiments on face recognition, person re-identification and texture classification show that the proposed method (combined with a linear SVM) outperforms state-of-the-art approaches such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification. To our knowledge, this is the first time a random projection space has been applied to solve classification tasks in a manifold space. We envision that the proposed method can be used to bring the superior discriminative power of manifold spaces to more general vision tasks, such as object tracking.

Appendix

Here we provide more details to support Proposition 3.3: we show that for SPD matrices, $\log(X^c) = c \times \log(X)$. For this proof we let $c$ be a positive integer; however, we note that the result can be extended to continuous $c$.

$$\log(X^c) = \log(\underbrace{X \times X \times \cdots \times X}_{c \text{ instances of } X}) \tag{17}$$

Replacing $X \in \mathrm{Sym}_d^+$ with its eigendecomposition $X = U V U^\top$ (which for SPD matrices coincides with the singular value decomposition), the above equation becomes:

$$\log(X^c) = \log(U V U^\top \times \cdots \times U V U^\top) \tag{18}$$

As $X \in \mathrm{Sym}_d^+$, the eigenvector matrix $U$ is orthonormal, and hence $U^\top U = I$. As such, the following equation is valid:

$$\log(X^c) = \log(U V^c U^\top) \tag{19}$$

Similarly, as $X \in \mathrm{Sym}_d^+$, $\log(X^c) = U \log(V^c)\, U^\top$, where $\log(V^c)$ is the diagonal matrix of eigenvalue logarithms [35]. Hence we have:

$$\log(X^c) = U \log(V^c)\, U^\top = c\, U \log(V)\, U^\top = c \times \log(X)$$

Acknowledgements

This research was partly funded by Sullivan Nicolaides Pathology (Australia) and the Australian Research Council Linkage Projects Grant LP130100230. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.

References

[1] D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671-687, 2003.
[2] A. Alavi, M. T. Harandi, and C. Sanderson. Relational divergence based classification on Riemannian manifolds. In IEEE Workshop on Applications of Computer Vision (WACV), pages 111-116, 2013.
[3] C. Anoop, M. Vassilios, and P. Nikolaos. Dirichlet process mixture models on symmetric positive definite matrices for appearance clustering in video surveillance applications. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 3417-3424, 2011.
[4] L. Bazzani, M. Cristani, A. Perina, M. Farenzena, and V. Murino. Multiple-shot person re-identification by HPE signature. In Int. Conf. Pattern Recognition (ICPR), pages 1413-1416, 2010.
[5] M. S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the ACM Symposium on Theory of Computing, pages 380-388, 2002.
[6] R. Chaudhry and Y. Ivanov. Fast approximate nearest neighbor methods for non-Euclidean manifolds with applications to human activity analysis in videos. In European Conference on Computer Vision (ECCV), pages 735-748, 2010.
[7] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Efficient similarity search for covariance matrices via the Jensen-Bregman LogDet divergence. In Int. Conf. Computer Vision (ICCV), pages 2399-2406, 2011.
[8] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Symposium on Computational Geometry, pages 253-262. ACM, 2004.
[9] A. Ess, B. Leibe, and L. Van Gool. Depth and appearance for mobile scene analysis. In Int. Conf. Computer Vision (ICCV), pages 1-8, 2007.
[10] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[11] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. Person re-identification by symmetry-driven accumulation of local features. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 2360-2367, 2010.
[12] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 35(12):2916-2929, 2013.
[13] K. Guo, P. Ishwar, and J. Konrad. Action recognition using sparse representation on covariance manifolds of optical flow. In IEEE Conf. Advanced Video and Signal Based Surveillance (AVSS), pages 188-195, 2010.
[14] J. Hamm and D. D. Lee. Extended Grassmann kernels for subspace-based learning. In Advances in Neural Information Processing Systems (NIPS), pages 601-608, 2009.
[15] M. Harandi, C. Sanderson, R. Hartley, and B. C. Lovell. Sparse coding and dictionary learning for symmetric positive definite matrices: a kernel approach. In European Conference on Computer Vision (ECCV), volume 7573 of Lecture Notes in Computer Science, pages 216-229. Springer, 2012.
[16] M. Harandi, C. Sanderson, C. Shen, and B. C. Lovell. Dictionary learning and sparse coding on Grassmann manifolds: An extrinsic solution. In IEEE Int. Conf. Computer Vision (ICCV), 2013.
[17] M. Harandi, C. Sanderson, A. Wiliem, and B. C. Lovell. Kernel analysis over Riemannian manifolds for visual recognition of actions, pedestrians and textures. In IEEE Workshop on Applications of Computer Vision (WACV), pages 433-439, 2012.
[18] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2008.
[19] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Int. Conf. Computer Vision (ICCV), pages 2130-2137, 2009.
[20] S. Lang. Fundamentals of Differential Geometry, volume 160. Springer Verlag, 1999.
[21] Y. Lui. Tangent bundles on special manifolds for action recognition. IEEE Trans. Circuits and Systems for Video Technology, 22(6):930-942, 2011.
[22] X. Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127-154, 2006.
[23] P. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(10):1090-1104, 2000.
[24] F. Porikli, O. Tuzel, and P. Meer. Covariance tracking using model update based on Lie algebra. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 728-735, 2006.
[25] T. Randen and J. Husoy. Filtering for texture classification: A comparative study. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(4):291-310, 1999.
[26] J. A. Rice. Mathematical Statistics and Data Analysis. Cengage Learning, 2007.
[27] A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell. K-tangent spaces on Riemannian manifolds for improved pedestrian detection. In IEEE Int. Conf. Image Processing (ICIP), pages 473-476, 2012.
[28] A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell. Spatio-temporal covariance descriptors for action and gesture recognition. In IEEE Workshop on Applications of Computer Vision (WACV), pages 103-110, 2013.
[29] W. R. Schwartz and L. S. Davis. Learning discriminative appearance-based models using partial least squares. In SIBGRAPI, pages 322-329, 2009.
[30] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[31] Q. Shi, C. Shen, R. Hill, and A. van den Hengel. Is margin preserved after random projection? In Int. Conf. Machine Learning (ICML), 2012.
[32] R. Sivalingam, D. Boley, V. Morellas, and N. Papanikolopoulos. Tensor sparse coding for region covariances. In European Conference on Computer Vision (ECCV), pages 722-735, 2010.
[33] S. Sra. Positive definite matrices and the symmetric Stein divergence. Preprint, arXiv:1110.1773, 2012.
[34] S. Sra and A. Cherian. Generalized dictionary learning for symmetric positive definite matrices with application to nearest neighbor retrieval. In Machine Learning and Knowledge Discovery in Databases, pages 318-332. Springer, 2011.
[35] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(10):1713-1727, 2008.
[36] A. Veeraraghavan, A. Roy-Chowdhury, and R. Chellappa. Matching shape sequences in video with applications in human movement analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(12):1896-1909, 2005.
[37] C. Yuan, W. Hu, X. Li, S. Maybank, and G. Luo. Human action recognition under log-Euclidean Riemannian metric. In Asian Conference on Computer Vision (ACCV), volume 5994 of Lecture Notes in Computer Science, pages 343-353, 2010.