Scale and curvature effects in principal geodesic analysis
Drew Lazar^a, Lizhen Lin^b

^a Department of Mathematics, Ball State University
^b Department of Applied and Computational Mathematics and Statistics, The University of Notre Dame

Abstract

There is growing interest in using the close connection between differential geometry and statistics to model smooth manifold-valued data. In particular, much work has been done recently to generalize principal component analysis (PCA), the method of dimension reduction in linear spaces, to Riemannian manifolds. One such generalization is known as principal geodesic analysis (PGA). This paper, in a novel fashion, obtains Taylor expansions in scaling parameters introduced in the domain of objective functions in PGA. It is shown that this technique not only leads to better closed-form approximations of PGA but also reveals the effects that scale, curvature and the distribution of the data have on solutions to PGA and on their differences from first-order tangent space approximations. This approach should be applicable not only to PGA but also to other generalizations of PCA and, more generally, to other intrinsic statistics on Riemannian manifolds.

Keywords. Curvature effects; data scaling; diffusion tensors; dimension reduction; manifold-valued statistics; principal geodesic analysis (PGA); symmetric spaces.

1. Introduction

Principal component analysis (PCA) is an important statistical method for dimension reduction and for exploring the variance structure of data in a linear space. PCA has been generalized to data in smooth manifolds through various principal geodesic procedures in which projections are made onto explanatory submanifolds that serve as non-linear analogues of the linear subspaces of PCA.

© 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/. DOI link: http://dx.doi.org/10.1016/j.jmva.2016.09.009. Email addresses: dmlazar@bsu.edu (Drew Lazar), lizhen.lin@nd.edu (Lizhen Lin). Preprint submitted to Elsevier, October 6, 2016.

Principal geodesic analysis (PGA), as introduced in [10], successively identifies orthogonal explanatory directions in the tangent space at the intrinsic mean of the data and then exponentiates the span of the results to form explanatory submanifolds. In [10], first-order tangent space approximations of PGA were formulated. Subsequently, methods for exact computation of PGA in specific manifolds were offered, as in [17] and [27]. Then in [29], using the derivative of the exponential map and, if necessary, ODEs in gradient descent algorithms, procedures to find exact solutions in a general class of manifolds were outlined.

As pointed out in [28], however, exact computation of PGA can be computationally complex and time-intensive, and thus there is interest in determining the accuracy and effectiveness of first-order approximations to PGA. This will depend on the distribution of the data and its dispersion from the tangent space, on the curvature and shape of the manifold in question, and on the interaction of these factors. For illustration, as in [28], consider the position of the "wrist" of a moving robotic arm while its "elbow" and "body" are fixed. In Figure 1 the motion is restricted to a two-dimensional surface. To analyze the movement of the wrist one might collect motion capture data, represented by the red dots in the figure. Formulating the surface as a Riemannian manifold and using intrinsic distances, an intrinsic mean of the data, $\mu$, might be located. Then a geodesic through $\mu$, represented by the blue curve on the surface, that best fits the data or best accounts for the data's variability might be identified.
One can find a linear direction of maximum variability, the unit vector $v_{1,0}$ in Figure 1, of the data projected by the Riemannian log map to the tangent space at the intrinsic mean. $v_{1,0}$ will be an approximation of $v_1$, the unit vector tangent to the geodesic. Generally, the greater the local curvature of the surface, the less accurate this approximation will be, with the scale of the data, or its dispersion from the intrinsic mean, augmenting this effect. Conversely, projections to the tangent space will converge to the data, intrinsic distances will converge to tangent space distances, and $v_{1,0}$ will converge to $v_1$ as the data draws in towards $\mu$.

In this paper we quantify such effects by introducing scaling parameters on projections of data to the tangent space and by obtaining Taylor expansions of solutions to PGA procedures in these parameters. Leading terms, such as $v_{1,0}$ in Figure 1, originate from the Euclidean structure in the tangent space. Next-order terms demonstrate how local curvature and scale interact to contribute to the differences between first-order approximations and exact solutions. This not only allows for more accurate closed-form approximations of PGA but should also contribute to a better understanding of the parts of PGA and of corresponding statistics. In this paper, data in three types of symmetric spaces which have regular application are considered. Also, using [17, 27, 29], we can compute exact solutions in these spaces, which allows for comparison and testing.

1.1. Outline

Section 2 includes notations and definitions. In Section 3 a proposition which allows the expansion of PGA directions in this paper is stated and proved.

Figure 1: First PGA motion capture data

In Section 4 we review the geometry of the n-spheres and obtain and test expansions using our proposition.
We also carry out experiments on data sampled from an anisotropic log-normal distribution on the unit n-sphere to show improved approximations. In Section 5 we review the geometry of the space of positive definite matrices and obtain expansions using our proposition and computer algebra. In Section 6 we review the geometry of the special orthogonal group and obtain expansions of PGA in this space. Also, in Section 6.3 we take a closer look at PGA in Lie groups in [11] to show how expansions can give insight into the formulation of such intrinsic manifold statistics. In Section 7, using expansions, we obtain improvements of the linear difference indicators introduced in [28]. In Section 8 we discuss the results and consider their applications in similar contexts.

2. Notations and Definitions

Let $M$ be a Riemannian manifold with Riemannian metric $p \mapsto \langle\cdot,\cdot\rangle_p$ for $p \in M$. Given $p \in M$, $T_pM$ is the tangent space at $p$. The unit sphere in $T_pM$ is then $S_pM = \{X \in T_pM : \langle X,X\rangle_p = 1\}$. The Riemannian exponential and Riemannian log maps are denoted by $\mathrm{Exp}_p : T_pM \to M$ and $\mathrm{Log}_p : M \to T_pM$, respectively. Given smooth manifolds $M_1$ and $M_2$, $p \in M_1$ and a smooth mapping $\lambda : M_1 \to M_2$, we denote the differential of $\lambda$ at $p$ by $d_p\lambda$. Then, given a smooth function $f : M \to \mathbb{R}$ and $p \in M$, the gradient of $f$ at $p$ is denoted $\nabla_p f$, so that $\langle \nabla_p f, X\rangle_p = d_p f(X)$ for all $X \in T_pM$. The differential geometry texts [6] and [25] provide background for and definitions of these concepts.

All the manifolds we deal with in this paper are of the class defined below.

Definition 1. Let $M$ be a connected Riemannian manifold. $M$ is a symmetric space if and only if for every $p \in M$ there exists an isometry $\varphi_p : M \to M$ such that $\varphi_p(p) = p$ and, for all $X \in T_pM$, $d_p\varphi_p(X) = -X$.

The tensors defined below are used to measure curvature in Riemannian manifolds.

Definition 2.
The curvature tensor is the $(1,3)$ tensor given as
$$\mathrm{R}(x,y)w = \nabla_x\nabla_y w - \nabla_y\nabla_x w - \nabla_{[x,y]}w$$
for vector fields $x, y, w$, where $\nabla$ is the covariant derivative and $[\cdot,\cdot]$ is the Lie bracket. The Riemann curvature tensor, also denoted by R, is the $(0,4)$ tensor given, for all $x, y, w, z \in T_pM$, as
$$\mathrm{R}_p(x,y,w,z) = \langle \mathrm{R}(x,y)w,\, z\rangle_p.$$

The Riemann curvature tensor characterizes the more geometrical description of curvature given below.

Definition 3. The sectional curvature of $\sigma_{v,q} = \mathrm{span}(\{v,q\})$ is given, for all $v, q \in T_pM$, as
$$K(\sigma_{v,q}) = \frac{\mathrm{R}_p(q,v,v,q)}{\langle q,q\rangle_p\langle v,v\rangle_p - \langle q,v\rangle_p^2}. \quad (1)$$

The sectional curvature is the Gaussian curvature of $\mathrm{Exp}_p(\sigma_{v,q})$. Also, as in [4, p. 16], the Riemann curvature tensor is completely characterized by the sectional curvature.

The following definition generalizes orthogonal projection in an inner product space to projection onto submanifolds.

Definition 4. Let $V_k = \{v_1,\dots,v_k\} \subset T_\mu M$ and $H(V_k) = \mathrm{Exp}_\mu\{\mathrm{span}(V_k)\}$ with $\mu \in M$. For $p \in M$ the projection of $p$ to $H(V_k)$ is
$$\pi_{H(V_k)}(p) = \operatorname*{argmin}_{x \in H(V_k)} d(x,p).$$

As in [29], existence of the projection is guaranteed when $H(V_k)$ is compact, which will be the case when projecting in the special orthogonal group and in the n-spheres in this paper. Then, as in [16], we have uniqueness of projection almost everywhere in these manifolds. Throughout we assume uniqueness of projection in the special orthogonal group and in the n-spheres. Also, with the positive definite matrices having non-positive curvature, as in [10] there is existence and uniqueness of projection for every $H(V_k)$.

The first intrinsic statistic we formulate is a generalization of the arithmetic mean in an inner product space. A general notion of a mean of a probability distribution on a metric space was first due to [12].
The following can be viewed as a Fréchet mean with respect to the empirical distribution on $\{p_1,\dots,p_N\}$ and the intrinsic distance $d(\cdot,\cdot)$.

Definition 5. Let $D = \{p_1,\dots,p_N\} \subset M$. The intrinsic mean of $D$ is
$$\mu(D) = \operatorname*{argmin}_{x\in M} \frac{1}{N}\sum_{i=1}^N d(x,p_i)^2.$$

The intrinsic mean is used as an offset in the following definition.

Definition 6. Given data $D = \{p_1,\dots,p_N\} \subset M$ with intrinsic mean $\mu(D)$, the intrinsic variance of $D$ is
$$\sigma^2(D) = \frac{1}{N}\sum_{i=1}^N d\{\mu(D), p_i\}^2.$$

Principal geodesic analysis (PGA), as introduced in [10], is a generalization of principal component analysis (PCA) and is defined below.

Definition 7. Let $D = \{p_1,\dots,p_N\} \subset M$ and $K = \dim(T_{\mu(D)}M)$. PGA locates $\{v_1,\dots,v_K\} \subset S_{\mu(D)}M$ such that
$$v_1 = \operatorname*{argmin}_{\|v\|=1} \frac{1}{N}\sum_{i=1}^N d\{p_i, \pi_{H_1(v)}(p_i)\}^2 \quad\text{with } H_1(v) = \mathrm{Exp}_{\mu(D)}\{\mathrm{span}(v)\},$$
$$v_2 = \operatorname*{argmin}_{\|v\|=1,\ v\in C_1} \frac{1}{N}\sum_{i=1}^N d\{p_i, \pi_{H_2(v)}(p_i)\}^2,$$
$$\vdots$$
$$v_K = \operatorname*{argmin}_{\|v\|=1,\ v\in C_{K-1}} \frac{1}{N}\sum_{i=1}^N d\{p_i, \pi_{H_K(v)}(p_i)\}^2,$$
where $V_j = \{v_1,\dots,v_j\}$, $C_j = \mathrm{span}(V_j)^\perp$ and $H_j(v) = \mathrm{Exp}_{\mu(D)}[\mathrm{span}(\{V_{j-1}\cup v\})]$ for $j = 2,\dots,K$.

Objective functions in PCA are sums of squared distances of the data to their orthogonal projections onto linear subspaces. The symmetric, linear operator defined below is the gradient of the objective function in PCA.

Definition 8. Given $\{q_1,\dots,q_N\} \subset T_\mu M$, define $L : T_\mu M \to T_\mu M$ by
$$L(v) = \frac{2}{N}\sum_{i=1}^N \langle q_i, v\rangle_\mu\, q_i.$$

Throughout we assume that $\{q_1,\dots,q_N\}$ is distributed so that the eigenvalues of $L$ are distinct. Thus we can assume that the eigenvectors of $L$ form an orthonormal set. We denote the eigenvectors of $L$ by $u_1,\dots,u_K$, with corresponding eigenvalues $\beta_1,\dots,\beta_K$ given in descending order by magnitude.
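To make Definitions 5–8 concrete, the leading-order pipeline — intrinsic mean by Riemannian gradient descent, then the eigenvectors of $L$ (PCA in the tangent space) — can be sketched on the unit sphere. This is only an illustration, not the paper's MATLAB code; the helper names `sphere_exp`, `sphere_log`, `intrinsic_mean` and `tangent_pca` are ours.

```python
import numpy as np

def sphere_exp(p, X):
    # Riemannian exponential on the unit sphere: great-circle step from p along X
    nX = np.linalg.norm(X)
    if nX < 1e-12:
        return p
    return np.cos(nX) * p + np.sin(nX) * X / nX

def sphere_log(p, q):
    # Riemannian log on the unit sphere: tangent vector at p pointing to q
    c = np.clip(p @ q, -1.0, 1.0)
    w = q - c * p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(c) * w / nw

def intrinsic_mean(points, iters=100):
    # Definition 5 by Riemannian gradient descent: mu <- Exp_mu(mean of logs)
    mu = points[0]
    for _ in range(iters):
        g = np.mean([sphere_log(mu, p) for p in points], axis=0)
        mu = sphere_exp(mu, g)
    return mu

def tangent_pca(points, mu):
    # Eigen-decomposition of L(v) = (2/N) sum <q_i, v> q_i (Definition 8);
    # the eigenvectors are the leading-order PGA directions v_{k,0}.
    Q = np.array([sphere_log(mu, p) for p in points])  # rows are the q_i
    L = (2.0 / len(points)) * Q.T @ Q
    vals, vecs = np.linalg.eigh(L)
    order = np.argsort(vals)[::-1]                     # descending eigenvalues
    return vals[order], vecs[:, order]
```

For data concentrated in a geodesic ball the fixed-point iteration for the mean converges quickly; the returned eigenvectors are only the leading-order terms $v_{k,0}$, not the exact PGA directions.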
3. Expansion of PGA directions

Let $S^n_r$, $P(n)$ and $SO(n)$ denote the n-sphere of radius $r$, the space of positive definite matrices and the special orthogonal group, formulated as Riemannian manifolds as in Sections 4.1, 5.1 and 6.1, respectively. Using these three spaces as working examples, in the following proposition we explore the effects of scaling the Riemannian logs of the data in the tangent space. Assume $M$ in the proposition in this section is one of these spaces.

We let $\epsilon > 0$ be the scaling parameter. The dispersion of the data from the intrinsic mean, $\mu(D)$, depends on the norm of each data point's Riemannian log in the tangent space at $\mu(D)$. As the data becomes more greatly dispersed in this manner, the effects of curvature away from the tangent space should become more significant to the solutions of PGA and its components. By simultaneously scaling the Riemannian logs of all the data by $\epsilon$ and by obtaining Taylor expansions in this parameter, we can discern and inspect this effect. In practice, such as in the simulations in Section 4.3 or in the improvement over the linear difference indicators in Section 7, we can take $\epsilon = 1$, with the norms of the Riemannian logs of the data determining the data's scale.

Proposition. Let $\mu \in M$, $\{q_1,\dots,q_N\} \subset T_\mu M$ and let $u_1,\dots,u_K$ be the eigenvectors of the covariance operator $L$ as in Definition 8. Also, letting $p_{i,\epsilon} = \mathrm{Exp}_\mu(\epsilon q_i)$ for all $i \in \{1,\dots,N\}$, let $D_\epsilon = \{p_{j,\epsilon}\}_j$. Assume $\mu(D_\epsilon) = \mu$ and that $V_{K,\epsilon} = \{v_1(\epsilon),\dots,v_K(\epsilon)\}$ is the set of PGA directions of $D_\epsilon$ for all $\epsilon \neq 0$. Further, let $f_k(v,\epsilon)$ be the objective function for $v_k(\epsilon)$ in Definition 7 for $k = 1,\dots,K$, i.e.,
$$f_k(v,\epsilon) = \frac{1}{N}\sum_{i=1}^N d\{p_{i,\epsilon}, \pi_{H_k(v)}(p_{i,\epsilon})\}^2.$$
Then, provided the projection as in Definition 4 is unique, $f_k(v,\epsilon)$ is even in $\epsilon$ and we expand
$$f_k(v,\epsilon) = f_{k,2}(v)\,\epsilon^2 + f_{k,4}(v)\,\epsilon^4 + O(\epsilon^6). \quad (2)$$
Let $g_{k,4}(v)$ be as $f_{k,4}(v)$ with $u_1,\dots,u_{k-1}$ in place of the previous PGA directions $v_1(\epsilon),\dots,v_{k-1}(\epsilon)$. Also, for $j > k$, let
$$\alpha_k = \nabla_{u_k} g_{k,4} \quad\text{and}\quad \alpha_{k,j} = \langle \alpha_k, u_j\rangle = d_{u_k} g_{k,4}(u_j). \quad (3)$$
If $C$ is the $K \times K$ skew-symmetric matrix $C = (c_{k,j})$ where, for $j > k$,
$$c_{k,j} = \frac{\alpha_{k,j}}{\beta_j - \beta_k},$$
then expanding $v_k(\epsilon) = v_{k,0} + v_{k,2}\epsilon^2 + O(\epsilon^4)$ yields, for all $k \in \{1,\dots,K\}$,
$$v_k(\epsilon) = v_{k,0} + v_{k,2}\epsilon^2 + O(\epsilon^4) = u_k + \Big(\sum_{j=1}^K c_{k,j} u_j\Big)\epsilon^2 + O(\epsilon^4).$$

Proof. The proof is by induction. The base case can be shown in a similar manner as the induction step. Thus we let $k > 1$, assume the proposition holds for the first $k-1$ PGA directions $V_{k-1,\epsilon} = \{v_1(\epsilon),\dots,v_{k-1}(\epsilon)\}$ and then show it holds for $v_k(\epsilon)$.

As in Sections 4.1, 5.1 and 6.1, $M$ is a symmetric space. It thus follows that $f_k(v,\epsilon)$ is even in $\epsilon$, as we assume the projection is unique and the mapping
$$\iota : M \to M, \quad \iota(p) = \mathrm{Exp}_\mu\{-\mathrm{Log}_\mu(p)\} \ \text{ for } p \in M \quad (4)$$
is an isometry.

In the Riemannian manifold $M$, intrinsic distances between points local to $\mu$ are approximately Euclidean distances between the Riemannian logs of these points in $T_\mu M$, and thus we have
$$f_{k,2}(v) = \frac{1}{N}\sum_{i=1}^N \Big(\langle q_i,q_i\rangle - \sum_{j=1}^{k-1}\langle q_i,u_j\rangle^2 - \langle q_i,v\rangle^2\Big) \quad (5)$$
as the leading term of $f_k(v,\epsilon)$. Computing a gradient,
$$\nabla_v f_{k,2} = -\frac{2}{N}\sum_{i=1}^N \langle q_i,v\rangle\, q_i = -L(v),$$
so that
$$\nabla_v f_k(\epsilon) = -L(v)\,\epsilon^2 + \nabla_v f_{k,4}\,\epsilon^4 + O(\epsilon^6). \quad (6)$$
Using Lagrange multipliers $(-\lambda_1,\dots,-\lambda_{k-1},-(1/2)\lambda_k)$ we have
$$\nabla_{v_k(\epsilon)} f_k(\epsilon) = -\lambda_1 v_1(\epsilon) - \cdots - \lambda_k v_k(\epsilon) \quad (7)$$
with constraints
$$\langle v_1(\epsilon), v_k(\epsilon)\rangle = \cdots = \langle v_{k-1}(\epsilon), v_k(\epsilon)\rangle = 0, \qquad \langle v_k(\epsilon), v_k(\epsilon)\rangle = 1.$$
Expand $v_k(\epsilon)$ in $\epsilon$, viz.
$$v_k(\epsilon) = v_{k,0} + v_{k,2}\epsilon^2 + O(\epsilon^4). \quad (8)$$
Substituting this expansion and the expansions of
$v_1(\epsilon),\dots,v_{k-1}(\epsilon)$ in the constraints and equating coefficients in orders of $\epsilon$ gives
$$\langle v_{k,0}, v_{j,0}\rangle = 0, \quad \langle v_{k,0}, v_{j,2}\rangle = -\langle v_{k,2}, v_{j,0}\rangle \ \text{ for } j = 1,\dots,k-1,$$
$$\langle v_{k,2}, v_{k,0}\rangle = 0 \quad\text{and}\quad \langle v_{k,0}, v_{k,0}\rangle = 1. \quad (9)$$
For each $j \in \{1,\dots,k\}$, expand the Lagrange multiplier
$$\lambda_j = \lambda_{j,0} + \lambda_{j,2}\epsilon^2 + \lambda_{j,4}\epsilon^4 + O(\epsilon^6). \quad (10)$$
In the computations in Sections 4.2, 5.3 and 6.2 we have
$$\nabla_{v_k(\epsilon)} f_{k,4} = \nabla_{v_{k,0}} g_{k,4} + O(\epsilon^2).$$
Using this and substituting the expansions (8), (10) and those of $\{v_1(\epsilon),\dots,v_{k-1}(\epsilon)\}$ in (7), using (6) and equating coefficients in orders of $\epsilon$ gives
$$\lambda_{1,0} v_{1,0} + \cdots + \lambda_{k,0} v_{k,0} = 0 \quad (\Rightarrow \lambda_{1,0} = \cdots = \lambda_{k,0} = 0),$$
$$\lambda_{1,2} v_{1,0} + \cdots + \lambda_{k,2} v_{k,0} = L(v_{k,0}), \quad (11)$$
$$\sum_{j=1}^k (\lambda_{j,2} v_{j,2} + \lambda_{j,4} v_{j,0}) = -\nabla_{v_{k,0}} g_{k,4} + L(v_{k,2}). \quad (12)$$
As $L$ is a symmetric linear operator, (11) and (9) give
$$\lambda_{j,2} = \langle L(v_{k,0}), v_{j,0}\rangle = \langle v_{k,0}, L(v_{j,0})\rangle = \beta_j \langle v_{k,0}, v_{j,0}\rangle = 0 \quad (13)$$
for each $j \in \{1,\dots,k-1\}$. Thus by (11), $L(v_{k,0}) = \lambda_{k,2} v_{k,0}$ and $v_{k,0}$ is a normalized eigenvector of $L$ with eigenvalue
$$\lambda_{k,2} = \frac{2}{N}\sum_{i=1}^N \langle v_{k,0}, q_i\rangle^2.$$
Further, letting $U_{k-1} = \{u_1,\dots,u_{k-1}\}$, as $f_k\{v_k(\epsilon),\epsilon\} \le f_k(v,\epsilon)$ for all $\epsilon \neq 0$ and all unit $v \in \mathrm{span}(V_{k-1,\epsilon})^\perp$, using (5) we have, for all unit $v \in \mathrm{span}(U_{k-1})^\perp$,
$$\frac{1}{N}\sum_{i=1}^N \langle q_i, v_{k,0}\rangle^2 \ \ge\ \frac{1}{N}\sum_{i=1}^N \langle q_i, v\rangle^2.$$
Thus $v_{k,0}$ is the dominant normalized eigenvector of $L$ in $\mathrm{span}(U_{k-1})^\perp$, so that $v_{k,0} = u_k$ and $\beta_k = \lambda_{k,2}$. Then, with $\alpha_k$ as in (3), by the above, (12) and (13) we have
$$(L - \beta_k)\, v_{k,2} = \sum_{j=1}^k \lambda_{j,4} u_j + \alpha_k. \quad (14)$$
Consider the orthonormal expansion of $v_{k,2}$, viz.
$$v_{k,2} = \sum_{j=1,\, j\neq k}^K c_{k,j} u_j. \quad (15)$$
For $j < k$, using (9) and the form of the expansions of $\{v_1(\epsilon),\dots,v_{k-1}(\epsilon)\}$,
$$c_{k,j} = \langle v_{k,2}, u_j\rangle = -\langle u_k, v_{j,2}\rangle = -c_{j,k}.$$
Also, substituting (15) into (14) gives
$$\sum_{j=1,\, j\neq k}^K c_{k,j}(\beta_j - \beta_k)\, u_j = \sum_{j=1}^k \lambda_{j,4} u_j + \alpha_k,$$
so that, for $j > k$,
$$c_{k,j} = \langle \alpha_k, u_j\rangle/(\beta_j - \beta_k) = \alpha_{k,j}/(\beta_j - \beta_k).$$
Thus $v_k(\epsilon)$ has the form given in the proposition, and the proposition holds.

Remark 3.1. In $v_k(\epsilon)$, $v_{k,0} = u_k$ minimizes $f_{k,2}(v)$ subject to $\langle v,v\rangle = 1$. Then $v_{k,2}\epsilon^2$ is the first term which accounts for $f_{k,4}(v)\epsilon^4$ in the objective function. For $j > k$ the numerator of $c_{k,j}$, $\alpha_{k,j}$, reflects the sensitivity of $f_{k,4}(v)$ evaluated locally, that is, of $g_{k,4}(v)$, to a change in direction from $u_k$ towards $u_j$, with a greater magnitude giving a greater "benefit" in minimizing $f_{k,4}(v)$ in the objective function. The magnitude of the denominator, which is the difference of the shares of the data's variability in the tangent space accounted for by $u_j$ and $u_k$, respectively, reflects the "cost," with respect to the minimization of $f_{k,2}(v)$, of this change in direction. Further, for $j < k$, with $v_k(\epsilon)$ minimizing the sum of squared residuals after $v_1(\epsilon),\dots,v_{k-1}(\epsilon)$, $\alpha_{k,j}$ accounts for the sensitivity of $g_{j,4}(v)$ to a change from $u_j$ in the direction of $u_k$, but with an opposite sign to $\alpha_{j,k}$, making $C$ skew-symmetric.

Unlike the effect of the scaling parameter $\epsilon$, which is made explicit in the PGA expansion, the role of curvature on the PGA directions is more subtle. In $f_k(v,\epsilon)$, $f_{k,4}(v)$ is the first coefficient present due to non-linearity, and thus generally greater curvature will create greater $\alpha_{k,j}$'s. Greater scale of the data, as measured by the norms of the Riemannian logs of the data and by $\epsilon$, augments this effect, with $v_k(\epsilon) = v_{k,0} + v_{k,2}\epsilon^2 + O(\epsilon^4)$. Although a direct characterization of the dependence on curvature for general symmetric spaces is not available, for all of the examples we considered, as in (18), (19), (26) and (29), curvature appears explicitly in this manner.
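Once the $\alpha_{k,j}$ and $\beta_k$ are in hand, assembling the second-order corrections $v_{k,2} = \sum_j c_{k,j} u_j$, with $c_{k,j} = \alpha_{k,j}/(\beta_j - \beta_k)$ for $j > k$ and skew-symmetry for $j < k$, is plain linear algebra. A hedged NumPy sketch (the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def pga_correction(U, beta, alpha):
    # U:     (n, K) matrix whose columns are the eigenvectors u_1, ..., u_K of L
    # beta:  (K,) distinct eigenvalues of L, descending by magnitude
    # alpha: (K, K) array with alpha[k, j] = alpha_{k,j} for j > k (upper triangle)
    # Returns V2 whose column k is v_{k,2} = sum_j c_{k,j} u_j.
    K = len(beta)
    C = np.zeros((K, K))
    for k in range(K):
        for j in range(k + 1, K):
            C[k, j] = alpha[k, j] / (beta[j] - beta[k])  # "benefit" over "cost"
            C[j, k] = -C[k, j]                           # skew-symmetry of C
    return U @ C.T                                       # column k: sum_j C[k, j] u_j
```

With $\epsilon = 1$, $u_k$ plus the $k$-th column of the result gives the next-order approximation $v_{k,0} + v_{k,2}$ used later in the simulations of Section 4.3.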
4. PGA in $S^n_r$

We first examine the role of scale in PGA in the n-spheres. We denote the n-sphere of radius $r$ by $S^n_r$. Spherical data occur in directional statistics, in preshapes in shape analysis, in text mining, in cluster analysis and elsewhere, as in [1], [14], [17] and [23]. We obtain the expansions of PGA directions in $S^n_r$ according to the proposition in Section 3, test these expansions and then apply them to simulated data to show improved approximations of PGA directions.

4.1. $S^n_r$ as a Riemannian manifold

We have the identification
$$T_p S^n_r \equiv \{v \in \mathbb{R}^{n+1} : \langle v,p\rangle = 0\}.$$
On $S^n_r$ we use the Riemannian metric induced by the embedding $S^n_r \hookrightarrow \mathbb{R}^{n+1}$; i.e., for any $p \in S^n_r$ and $X, Y \in T_p S^n_r$, $\langle X,Y\rangle$ is the dot product. The geodesics in $S^n_r$ are great circles and the Riemannian exponential map in $S^n_r$ is computed directly as
$$\mathrm{Exp}_p(X) = \cos\Big(\frac{\|X\|}{r}\Big)\, p + r\sin\Big(\frac{\|X\|}{r}\Big)\frac{X}{\|X\|}$$
for $X \in \{Y \in T_p S^n_r : \|Y\| < \pi r\}$. Then
$$\mathrm{Log}_p(q) = \begin{cases} 0, & \text{for } \langle p,q\rangle = r^2,\\[4pt] r\arccos\big(\langle p,q\rangle/r^2\big)\,\dfrac{q - \langle p,q\rangle p/r^2}{\|q - \langle p,q\rangle p/r^2\|}, & \text{for } |\langle p,q\rangle| < r^2. \end{cases} \quad (16)$$
Given $p, q \in S^n_r$,
$$d(p,q) = \|\mathrm{Log}_p(q)\| = r\arccos\big(\langle p,q\rangle/r^2\big);$$
that is, the distance between $p$ and $q$ is the ordinary spherical distance. As in [7], $S^n_r$ is a symmetric space, with the symmetry at any point $p \in S^n_r$ provided by reflection over the line containing $p$ in $\mathbb{R}^{n+1}$. Also, we have $K(\sigma_{v,q}) = 1/r^2$ for any plane $\sigma_{v,q} \subset T_p S^n_r$.

4.2. Expansion of PGA directions in $S^n_r$

As in [17], the projection operator in $S^n_r$ has closed form. Given $p, \mu \in S^n_r$ and an orthonormal set $V_k = \{v_1,\dots,v_k\} \subset T_\mu S^n_r$, set $v_0 = \mu/r$. With $p \notin \mathrm{span}(\mu \cup V_k)^\perp$, so that projection is unique,
$$\pi_{H(V_k)}(p) = r\,w/\|w\|, \quad (17)$$
where $w = \langle v_0,p\rangle v_0 + \cdots + \langle v_k,p\rangle v_k$. That is, projection of $p$ is first done to $\mathrm{span}(\mu \cup V_k)$ in $\mathbb{R}^{n+1}$ and the result is scaled to obtain the projection to the hypersphere $H(V_k)$.
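The closed-form projection (17) can be checked numerically against the argmin definition of projection. A minimal NumPy sketch on the unit sphere ($r = 1$; the names `proj_H` and `sphere_dist` are illustrative):

```python
import numpy as np

def proj_H(p, mu, V):
    # Closed-form projection (17) onto H(V_k) = Exp_mu{span(V_k)} in the unit
    # sphere (r = 1): project p onto span(mu, v_1, ..., v_k) in R^{n+1}, rescale.
    B = np.column_stack([mu] + list(V))   # orthonormal basis of span(mu ∪ V_k)
    w = B @ (B.T @ p)
    return w / np.linalg.norm(w)

def sphere_dist(a, b):
    # ordinary spherical distance on the unit sphere
    return np.arccos(np.clip(a @ b, -1.0, 1.0))

# Example in S^2: project a point onto the geodesic H({v1}) through mu
mu = np.array([0.0, 0.0, 1.0])
v1 = np.array([1.0, 0.0, 0.0])            # unit tangent vector at mu
p = np.array([0.6, 0.48, 0.64])
p = p / np.linalg.norm(p)
pi_p = proj_H(p, mu, [v1])
```

Since $H(\{v_1\})$ here is the great circle $t \mapsto \cos(t)\mu + \sin(t)v_1$, optimality of (17) can be confirmed by a grid search over $t$: no point of the circle is closer to $p$ than `pi_p`.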
Let $q \in T_\mu S^n_r$, $\epsilon \neq 0$, $p_\epsilon = \mathrm{Exp}_\mu(\epsilon q)$ and, for all $j \in \{1,\dots,k\}$,
$$t_j(\epsilon) = \big\langle \mathrm{Log}_\mu\{\pi_{H(V_k)}(p_\epsilon)\},\, v_j\big\rangle, \quad\text{so that}\quad \pi_{H(V_k)}(p_\epsilon) = \mathrm{Exp}_\mu\Big(\sum_{j=1}^k t_j(\epsilon)\, v_j\Big).$$
By using (16) and (17), taking an inner product and computing Taylor expansions, we obtain the following expansion of projection coefficients in $S^n_r$:
$$t_m(\epsilon) = \langle q,v_m\rangle\,\epsilon + \frac{\langle q,v_m\rangle}{3r^2}\Big(\langle q,q\rangle - \sum_{j=1}^k \langle q,v_j\rangle^2\Big)\epsilon^3 + O(\epsilon^5)$$
$$= \cos\theta_m\,\|q\|\,\epsilon + \frac{\cos\theta_m}{3r^2}\Big(1 - \sum_{j=1}^k \cos^2\theta_j\Big)\|q\|^3\,\epsilon^3 + O(\epsilon^5) \quad (18)$$
for each $m \in \{1,\dots,k\}$, where $\theta_m$ is the angle formed by $q$ and $v_m$.

As in the proposition in Section 3, assume $\mu \in S^n_r$, $q_i \in T_\mu S^n_r$ and $p_{i,\epsilon} = \mathrm{Exp}_\mu(\epsilon q_i)$ for each $i \in \{1,\dots,N\}$, with $V_{k-1,\epsilon} = \{v_1(\epsilon),\dots,v_{k-1}(\epsilon)\}$ the first $k-1$ PGA directions. Using the proposition we find $v_k(\epsilon)$. Let $v \in S_\mu S^n_r$, $V_{k,\epsilon} = \{V_{k-1,\epsilon} \cup v\}$ and
$$t_{i,j} = \big\langle \mathrm{Log}_\mu\{\pi_{H(V_{k,\epsilon})}(p_{i,\epsilon})\},\, v_j\big\rangle \quad\text{and}\quad t_{i,k} = \big\langle \mathrm{Log}_\mu\{\pi_{H(V_{k,\epsilon})}(p_{i,\epsilon})\},\, v\big\rangle$$
for all $i \in \{1,\dots,N\}$ and $j \in \{1,\dots,k-1\}$. Our objective function is
$$f_k(v,\epsilon) = \frac{1}{N}\sum_{i=1}^N r^2\arccos^2\bigg(\frac{\big\langle \mathrm{Exp}_\mu(t_{i,1}v_1 + \cdots + t_{i,k}v),\ \mathrm{Exp}_\mu(\epsilon q_i)\big\rangle}{r^2}\bigg)$$
$$= \frac{1}{N}\sum_{i=1}^N r^2\arccos^2\Bigg\{\cos\bigg(\frac{\sqrt{\sum_{j=1}^k t_{i,j}^2}}{r}\bigg)\cos\bigg(\frac{\epsilon\|q_i\|}{r}\bigg) + \sin\bigg(\frac{\sqrt{\sum_{j=1}^k t_{i,j}^2}}{r}\bigg)\sin\bigg(\frac{\epsilon\|q_i\|}{r}\bigg)\,\frac{\sum_{j=1}^{k-1} t_{i,j}\langle v_j, q_i\rangle + t_{i,k}\langle v, q_i\rangle}{\|q_i\|\sqrt{\sum_{j=1}^k t_{i,j}^2}}\Bigg\}.$$
Then, by taking Taylor expansions in $\epsilon$, and with $f_{k,4}(v)$ as in (2) in the proposition in Section 3, we obtain
$$f_{k,4}(v) = \frac{1}{3Nr^2}\sum_{i=1}^N \Big(\sum_{j=1}^{k-1}\langle q_i,v_j\rangle^2 + \langle q_i,v\rangle^2\Big)\Big(\sum_{j=1}^{k-1}\langle q_i,v_j\rangle^2 + \langle q_i,v\rangle^2 - \langle q_i,q_i\rangle\Big).$$
With $\alpha_{k,m}$ as in (3), taking a gradient and evaluating gives the following expansion of the $v_k$'s in $S^n_r$.
$$\alpha_{k,m} = \langle \nabla_{u_k} g_{k,4},\, u_m\rangle = \frac{2}{3Nr^2}\sum_{i=1}^N \Big(\sum_{j=1}^k 2\langle q_i,u_j\rangle^2 - \langle q_i,q_i\rangle\Big)\langle q_i,u_k\rangle\langle q_i,u_m\rangle$$
$$= \frac{2}{3Nr^2}\sum_{i=1}^N \|q_i\|^4\Big(\sum_{j=1}^k 2\cos^2\theta_{i,j} - 1\Big)\cos\theta_{i,k}\cos\theta_{i,m}, \quad (19)$$
where $\theta_{i,a}$ is the angle formed by $q_i$ and $u_a$ for all $i, a$, and which is used in the proposition to obtain, for each $k \in \{1,\dots,n\}$, the expansion $v_k(\epsilon) = v_{k,0} + v_{k,2}\epsilon^2 + O(\epsilon^4)$.

Let $S^n$ be the unit sphere $S^n_1$. In Figure 2 the expansions of PGA directions obtained above are tested in $S^{10}$. With $\mu \in S^{10}$, 50 tangent vectors are sampled from $T_\mu S^{10}$, with the entries of the $\{q_i\}$ having normal distributions and variances varying by entry so that principal geodesic directions can be identified. We then take $D_\epsilon = \{p_{i,\epsilon}\}_i = \{\mathrm{Exp}_\mu(\epsilon q_i)\}_i$. PGA directions are located in MATLAB by using fixed-point algorithms such as those in [17] used to compute principal component geodesics. Ln-ln plots are shown for $v_1(\epsilon)$, $v_2(\epsilon)$, $v_5(\epsilon)$ and $v_9(\epsilon)$. Similar plots were obtained for the expansions of the other PGA directions.

Figure 2: Tests of PGA expansions in $S^{15}$. [Ln-ln plots of $\ln\|v_k(\epsilon) - v_{k,0}\|$ against $\ln\epsilon$: (a) $v_1(\epsilon)$, slope $1.98 \approx 2$, intercept $-2.17 \approx \ln\|v_{1,2}\| = -2.12$; (b) $v_2(\epsilon)$, slope $1.97 \approx 2$, intercept $-2.22 \approx \ln\|v_{2,2}\| = -2.13$; (c) $v_5(\epsilon)$, slope $1.99 \approx 2$, intercept $-2.24 \approx \ln\|v_{5,2}\| = -2.23$; (d) $v_9(\epsilon)$, slope $2.01 \approx 2$, intercept $-4.29 \approx \ln\|v_{9,2}\| = -4.30$.]

4.3. Approximations of PGA on Simulated Data in $S^n$

In this section we sample data from a generalization of the log-normal distribution on $S^{15}$. Specifically, we sample from $X$ where $X \sim \mathrm{Exp}_\mu\{N(0, \kappa\Sigma)\}$ with $\mu \in S^{15}$, $\kappa > 0$ and $\Sigma$ a fixed positive definite matrix. $\Sigma$ has distinct eigenvalues, with the largest eigenvalue having 20 times the magnitude of the smallest, so that our distribution on $S^{15}$ is anisotropic and the population PGA directions exist. We compare PGA directions to approximations while varying $\kappa$, which positively correlates with the expected dispersion of the data from its mean.

In Table 1, for each value of $\kappa$, 100 data points are sampled in 5 runs each. For each run all 14 PGA directions are computed. Also, their leading- and next-order approximations in the scale of the data are located, that is, $v_{k,0}$ (using PCA in the tangent space) and $v_{k,0} + v_{k,2}$, respectively, with $\epsilon = 1$, as the norms of the Riemannian logs of the data scale the approximations. Across all runs and for each value of $\kappa$, the means of the angles in radians the approximations make with the exact PGA directions (m.est. $\theta_0$ and m.est. $\theta_2$) and the mean of the norms of the Riemannian logs of the data (m.scale) are computed. Also, as proposed in [29], an iterative algorithm to locate the PGA direction $v_k$ can be initialized by doing PCA with projections of the Riemannian logs of the data into $\mathrm{span}(V_{k-1})^\perp$. We compare this method with initialization by projection of the next-order approximations into $\mathrm{span}(V_{k-1})^\perp$, computing the means of the angles these initializations make with the $v_j$'s (m.init. $\theta_0$ and m.init. $\theta_2$, respectively) across the 5 runs for each value of $\kappa$.

κ            1       .850    .700    .550    .400    .250    .100    .050
m.scale      1.4684  1.3361  1.2278  1.1867  0.9784  0.7472  0.4843  0.3399
m.est. θ_0   0.3400  0.2558  0.2327  0.1659  0.1442  0.0921  0.0576  0.0190
m.est. θ_2   0.2654  0.2323  0.1595  0.0959  0.0541  0.0371  0.0215  0.0017
m.init. θ_0  0.2329  0.1717  0.1573  0.1146  0.0972  0.0613  0.0347  0.0123
m.init. θ_2  0.1882  0.1528  0.1014  0.0636  0.0347  0.0225  0.0120  0.0010

m.scale = mean of the norms of the Riemannian logs
m.est. θ_0, m.est. θ_2 = mean angles of the leading- and next-order estimates with the v_j's
m.init. θ_0, m.init. θ_2 = mean angles of the leading- and next-order initializations with the v_j's

Table 1: Estimates of PGA in $S^{15}$ from log-normal samples

At $\kappa = 1$ the data is nearly uniformly distributed, which is reflected in m.scale = 1.4684, nearly $\pi/2$. Even at $\kappa = 1$ the next-order estimates are an improvement over the leading-order estimates for this distribution. Note that the next-order estimates holding for data dispersed significantly from the mean is also reflected in the plots in Figure 2, with nearly correct intercepts at $\ln(\epsilon) = 0 \Rightarrow \epsilon = 1$. As $\kappa$ decreases and the samples draw in towards their computed means, both estimates improve, with the next-order estimates improving more sharply, as they should.

Also, projecting the next-order estimates into $\mathrm{span}(V_{k-1})^\perp$ provides an improvement here over doing PCA in $\mathrm{span}(V_{k-1})^\perp$, with m.init. $\theta_2 <$ m.init. $\theta_0$ for all values of $\kappa$. In Figure 3, $\kappa = .4$ and for each PGA direction we plot the values of est. $\theta_0$, est. $\theta_2$ and init. $\theta_0$, init. $\theta_2$, which are means across the 5 runs. Similar plots were obtained for other values of $\kappa$.

Figure 3: Estimates of PGA in $S^{15}$ with $\kappa = .4$. [(a) Estimates of the $v_j$'s: est. $\theta_0$ and est. $\theta_2$ against $j$; (b) initializations of the $v_j$'s: init. $\theta_0$ and init. $\theta_2$ against $j$.]

5. PGA in $P(n)$

We denote the space of positive definite matrices by $P(n)$. The imaging technology diffusion tensor MRI, as in [2], produces data known as diffusion tensors in $P(3)$. In [10] PGA as in Definition 7 was proposed to allow the proper analysis of the statistical variability of diffusion tensor data by formulating $P(3)$ as a manifold and generalizing PCA. In this section, using the proposition in Section 3 and computer algebra, we obtain expansions of PGA directions in $P(n)$. We test the expansions and take a closer look at the geometry of the first projection coefficient.

5.1. $P(n)$ as a Riemannian manifold

As in [22], $P(n)$ is an open set in the vector space of $n \times n$ symmetric matrices. Thus we have the identification
$$T_p P(n) \equiv \{n \times n \text{ symmetric matrices}\}.$$
Denote the general linear group by $GL(n)$. Consider the following action $\varphi$ on $P(n)$:
$$\varphi : GL(n) \times P(n) \to P(n), \quad \varphi(g,p) = \varphi_g(p) = g p g^\top. \quad (20)$$
As in [3] this action is transitive. We have the following Riemannian metric on $P(n)$, for which $\varphi_g$ is an isometry up to a positive scalar multiple:
$$\langle X, Y\rangle_p = (1/2)\,\mathrm{tr}\big(p^{-1} X p^{-1} Y\big)$$
for $p \in P(n)$, $X, Y \in T_pP(n)$, and where tr denotes the matrix trace. As in [3], setting $M = P(n)$ and $\phi_p(q) = p q^{-1} p$ for $q \in P(n)$ in Definition 1 makes $P(n)$ a symmetric space.

For vector fields $x, y$ on $P(n)$, $[x,y] = xy - yx$ is the commutator of $x$ and $y$. Also, as $P(n)$ is a symmetric space, by [25] we have
$$\mathrm{R}(x,y)z = [z, [x,y]]. \quad (21)$$
As in [3], at $I \in P(n)$, $\mathrm{Exp}_I$ is the matrix exponential function, and we have the following Riemannian exponential map on $P(n)$: for $X \in T_IP(n)$,
$$\mathrm{Exp}_I(X) = \sum_{i=0}^\infty X^i/i! \quad (22)$$
and, for any $p = g g^\top \in P(n)$ with $g \in GL(n)$,
$$\mathrm{Exp}_p(X) = \varphi_g[\mathrm{Exp}_I\{d_p\varphi_{g^{-1}}(X)\}] = g\,\mathrm{Exp}_I\{g^{-1} X (g^{-1})^\top\}\,g^\top$$
for $X \in T_pP(n)$. Also, as in [3], for all $p \in P(n)$, $\mathrm{Exp}_p$ is bijective and thus $\mathrm{Log}_p$ is well-defined.

5.2. Computer Algebra

The expansions in this section and in Sections 6.3 and 7 were obtained with the help of the Maxima computer algebra system.
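As a complement to the computer algebra, the closed-form geometry of Section 5.1 can be exercised numerically. A minimal sketch assuming SciPy's `expm`/`logm` (the names `metric`, `Exp`, `Log` and `dist` are illustrative, not the paper's code):

```python
import numpy as np
from scipy.linalg import expm, logm

def metric(p, X, Y):
    # <X, Y>_p = (1/2) tr(p^{-1} X p^{-1} Y), the metric of Section 5.1
    pinv = np.linalg.inv(p)
    return 0.5 * np.trace(pinv @ X @ pinv @ Y)

def Exp(p, X):
    # Exp_p(X) = g Exp_I(g^{-1} X g^{-T}) g^T with p = g g^T (Cholesky factor)
    g = np.linalg.cholesky(p)
    gi = np.linalg.inv(g)
    return g @ expm(gi @ X @ gi.T) @ g.T

def Log(p, q):
    # inverse of Exp_p via the same congruence action; logm of an SPD matrix
    g = np.linalg.cholesky(p)
    gi = np.linalg.inv(g)
    return g @ np.real(logm(gi @ q @ gi.T)) @ g.T  # discard round-off imaginary part

def dist(p, q):
    X = Log(p, q)
    return np.sqrt(metric(p, X, X))
```

For example, with $p = I$ and $q = \mathrm{diag}(e^2, e^{-2})$ this gives $d(I,q) = 2$, consistent with $d(I,q)^2 = \frac{1}{2}\mathrm{tr}\{\mathrm{Log}_I(q)^2\}$.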
A matrix T aylor function and a function with the distributive and cyclic prop erties of the trace were co ded to compute expansions in v olving Log I and Exp I in . This co de along with the data sets used in this pap er are av ailable at https://github.com/DMLazar/PGAScale . 5.3. Exp ansion of PGA dir e ctions in P ( n ) By emplo ying the transitive action b y isometries in (20) PGA in P ( n ) can b e carried out in the tangen t space at the iden tity . Let v ∈ S I P ( n ) , q ∈ T I P ( n ) and p = Exp I ( q ) for 6 = 0. T o pro ject p to H ( v ) = Exp I { span( v ) } find t ( ) = argmin s ∈ R d { Exp I ( sv ) , Exp I ( q ) } 2 (23) Define O ,s ( ) as f ( s, ) is O ,s ( ) ⇔ f ( s, ) ≤ ` X k =0 A k k s ` − k for some A 1 , . . . , A ` ∈ R . Then let g ( s, ) = Exp I ( − sv / 2)Exp I ( q )Exp I ( − sv / 2) and setting h ( s, ) as the cost function in (23) and expanding h ( s, ) = (1 / 2)tr[Log I { g ( s, ) } Log I { g ( s, ) } ] = 2 tr ( q q ) 2 + s 2 − s tr ( q v ) + tr { q 2 v 2 − ( q v ) 2 } 2 s 2 12 + O ,s (6) . (24) Using (4), t ( ) is o dd in and we expand t ( ) = t 1 + t 3 3 + O ( 5 ) for some t 1 , t 3 ∈ R . Solving for t 1 and t 3 in ∂ h s ( t ( ) , ) ∂ s = 0 giv es t ( ) = 1 2 tr ( q v ) + 1 24 tr ( q v ) tr { ( q v ) 2 − q 2 v 2 } 3 + O ( 5 ) . (25) 15 Using (1) in (25) we hav e the exp ansion of a pr oje ction (to a ge o desic) c o ef- ficient in P ( n ) b elow t ( ) = 1 2 tr ( q v ) + 1 24 tr ( q v ) tr { ( q v ) 2 − q 2 v 2 } 3 + O ( 5 ) = cos θ k q k + 1 12 cos θ sin 2 θ K ( σ v,q ) k q k 3 3 + O ( 5 ) , (26) where θ is the angle formed by q and v . In Figure 4 v and q are sampled from the uniform distribution on the unit sphere in T I P ( n ) and plots are generated in MA TLAB to test the expansions. − 6 − 5 . 5 − 5 − 4 . 5 − 4 − 3 . 5 − 3 − 2 . 5 − 2 − 1 . 5 − 1 − 0 . 5 0 − 22 − 20 − 18 − 16 − 14 − 12 − 10 − 8 − 6 − 4 − 2 Slope = 2 . 99 ≈ 3 Intercept = − 3 . 55 ≈ ln( t 3 ) = − 3 . 52 ln( ) ln | t ( ) − t 1 | (a) ln-ln plot − 2 − 1 . 5 − 1 − 0 . 
Figure 4: Tests of the expansion of t(ε). Panel (a): ln–ln plot of |t(ε) − t₁ε| against ln(ε), with slope 2.99 ≈ 3 and intercept −3.55 ≈ ln(t₃) = −3.52; panel (b): the actual values against the linear, third- and fifth-order approximations.

As in Figure 5, letting ‖q‖ = 1 so that |ε| = d(p_ε, I), as ε goes to zero the tangent vectors become more like their exponentials, and projection in the tangent space becomes more like projection in the manifold. The third-order term accounts for the difference between the approximating Euclidean triangle in the tangent space and the geodesic triangle in the manifold. As in [3], P(n) is of non-positive sectional curvature. Thus cos θ sin²θ K(σ_{v,q})/12 will be non-positive for acute angle θ, with greater local curvature of Exp_µ{span(v, q)}, as measured by the magnitude of K(σ_{v,q}), contributing to a less accurate approximation.

We apply the proposition to compute the expansions of the first PGA directions in P(n). First we have

f₁(v, ε) = (1/N) Σ_{i=1}^N d[Exp_I{t_i(ε, v) v}, Exp_I(εq_i)]². (27)

With f_{1,4}(v) as in (2) in the proposition in Section 3, using (25) in (27) and expanding in ε gives

f_{1,4}(v) = 1/(48N) Σ_{i=1}^N tr(q_i v)² {tr(q_i² v²) − tr(q_i v q_i v)}. (28)

Figure 5: Approximation of the projection coefficient; the gap between the tangent-space projection and the manifold projection e^{sv} is approximately (1/12) cos θ sin²θ K(σ_{v,q}) ε³.

With α_{1,j} as in (3), taking a gradient and evaluating gives the following expansion of v₁(ε) in P(n).
α_{1,j} = ⟨∇_{u₁} g_{1,4}, u_j⟩ = (1/2) tr(α₁ u_j)
        = 1/(48N) Σ_{i=1}^N [ tr(q_i u₁)² tr(q_i² u₁ u_j + q_i² u_j u₁ − 2 q_i u₁ q_i u_j)
          + 2 tr(q_i u₁) tr(q_i u_j) tr{q_i² u₁² − (q_i u₁)²} ]
        = −1/(6N) Σ_{i=1}^N ‖q_i‖⁴ [ cos²θ_{i,1} R_I(u₁, q̃_i, q̃_i, u_j)
          + cos θ_{i,1} cos θ_{i,j} R_I(u₁, q̃_i, q̃_i, u₁) ]
        = −1/(6N) Σ_{i=1}^N ‖q_i‖⁴ [ cos²θ_{i,1} R_I(u₁, q̃_i, q̃_i, u_j)
          + cos θ_{i,1} cos θ_{i,j} sin²θ_{i,1} K(σ_{q_i,u₁}) ], (29)

where θ_{i,m} is the angle formed by q_i and u_m and q̃_i = q_i/‖q_i‖, for all i, m; this can be used in the proposition to obtain the expansion of v₁(ε).

PGA directions in addition to the first can be found by first solving for projection coefficients in systems of equations. As in (29), substituting the projection coefficients into the PGA objective function, taking a gradient and evaluating gives α_{k,j}. At https://github.com/DMLazar/PGAScale the Maxima code for, and the form of, v₂(ε) can be found.

In Figure 6, the expansions of v₁(ε) and v₂(ε) in P(n) are tested. 75 matrices {q_i}_i are sampled from T_I P(3) with entries distributed as in the test in Figure 2, and the data is set as D_ε = {p_{i,ε}}_i = {Exp_I(εq_i)}_i. The computations of the projection operator and principal geodesic directions are done using MATLAB minimization routines and user-supplied gradients as formulated in [29], with the derivative of the matrix exponential map provided by [24, Theorem 4.5]. In panel (a) of Figure 6 the ln–ln plot of ‖v₁(ε) − v_{1,0}‖ has slope 1.99 ≈ 2 and intercept −4.01 ≈ ln(‖v_{1,2}‖) = −4.00; in panel (b) the plot of ‖v₂(ε) − v_{2,0}‖ has slope 1.99 ≈ 2 and intercept −3.84 ≈ ln(‖v_{2,2}‖) = −3.82.
Figure 6: Tests of the PGA expansions in P(n).

6. PGA in SO(n)

We denote the special orthogonal group by SO(n). SO(n) is the group of rigid rotations of ℝⁿ. In particular it includes the rotation group SO(3), in which data naturally arises in robotics, computer vision and other fields (see [19], [21] and [30]). We obtain expansions in SO(n) and apply them to the formulation of PGA in [11] for Lie groups. We use the identification of SO(3) with SU(2) as in [27] and fixed-point algorithms as in [17] to carry out PGA.

6.1. SO(n) as a Riemannian manifold

As in [5], SO(n) is a closed and bounded subset of ℝ^{n²} and is thus a compact set. Also, we have the identification T_I SO(n) ≡ so(n) = {n × n skew-symmetric matrices}. SO(n) is a matrix Lie group and we have the following transitive action ϕ on SO(n) by left multiplication,

ϕ : SO(n) × SO(n) → SO(n),  ϕ(g, p) = ϕ_g(p) = gp.

We then have the Riemannian metric for which ϕ_g is an isometry, up to a positive scalar multiple,

⟨X, Y⟩_p = −(1/2) tr(p⁻¹ X p⁻¹ Y) (30)

for p ∈ SO(n), X, Y ∈ T_p SO(n). As in P(n), setting M = SO(n) and φ_p(q) = p q⁻¹ p for q ∈ SO(n) in Definition 1 makes SO(n) a symmetric space. Further, as SO(n) is a symmetric space, we have (21) as in P(n).

As in [8], at I ∈ SO(n), Exp_I is the matrix exponential function and we have the Riemannian exponential map on SO(n), defined for X ∈ {Y ∈ T_I SO(n) : ‖Y‖ < π} by (22), and for any p ∈ SO(n) we have

Exp_p(X) = ϕ_p[Exp_I{d_p ϕ_{p⁻¹}(X)}] = p Exp_I(p⁻¹ X)

for X ∈ {Y ∈ T_p SO(n) : ‖Y‖ < π}. Also, using the results in [13], for any p, Exp_p is a diffeomorphism on {X ∈ T_p SO(n) : ‖X‖ < π}.
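A small numerical illustration of the identities above (ours, assuming SciPy; the helper `hat` is a hypothetical name): for X = hat(w) ∈ so(3), ‖X‖ in the metric (30) equals |w|, which equals the rotation angle of Exp_I(X), i.e. the geodesic distance d(I, Exp_I(X)); and Exp_p and Log_p invert each other inside the injectivity radius π:

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(w):
    """Skew matrix with <hat(w), hat(w)>_I = -(1/2) tr(hat(w)^2) = |w|^2."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

w = np.array([0.3, -0.4, 0.5])
X = hat(w)
norm_X = np.sqrt(-0.5 * np.trace(X @ X))   # = |w| < pi
R = expm(X)                                # Exp_I(X), a rotation matrix
angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
assert abs(norm_X - np.linalg.norm(w)) < 1e-10
assert abs(angle - norm_X) < 1e-9          # d(I, Exp_I(X)) = ||X||

# Exp_p(X) = p Exp_I(p^-1 X) and its inverse Log_p, for ||X|| < pi:
p = expm(hat(np.array([0.1, 0.2, -0.3])))
q_rot = p @ expm(hat(np.array([0.0, 0.5, 0.2])))
X_pq = p @ logm(p.T @ q_rot).real          # Log_p(q_rot)
assert np.allclose(p @ expm(p.T @ X_pq), q_rot, atol=1e-10)
```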
6.2. Expansion in SO(n)

Both P(n) and SO(n) are symmetric spaces that have the matrix exponential as the Riemannian exponential map at I. At I, the inner products of P(n) and SO(n) are scalar multiples of the trace of matrix products of tangent vectors. Also, the action in (20), with SO(n) replacing GL(n) and P(n), is an action by isometries which is transitive, as any p ∈ SO(n) can be decomposed as p = g g, where g = Exp_I{(1/2) Log_I(p)}. Thus the expansions in ε in SO(n) are the expansions in Section 5.3 for P(n), with the inner product of SO(n) replacing the inner product of P(n).

6.3. Lie group PGA

By use of expansions in the scaling parameter ε, in the case of SO(n), we take a look at the first iteration of PGA in [11]. In a Lie group such as SO(n), PGA was formulated in [11] as recursive variance removal through left multiplication, as below.

Definition 9 (PGA alternative). Given a Lie group M and D = {p₁, …, p_N} ⊂ M with µ(D) = µ, PGA directions {v₁, …, v_K} ⊂ S_{µ(D)} M are given by: Set k = 1.
I. Find v_k = argmin_{‖v‖=1} Σ d{π_{H(v)}(p_i), p_i}², with H(v) = Exp_µ{span(v)}.
II. Set D′ = {g₁⁻¹p₁, …, g_N⁻¹p_N}, where g_i = π_{H(v_k)}(p_i) for all i.
III. If k < K, set k = k + 1, D = D′ and return to I; else stop.

In SO(n) we can decompose any projection g_i = Exp_µ(t_i v_k) above as g_i = Exp_µ(a t_i v_k) Exp_µ(b t_i v_k), where a + b = 1. Then

γ_{a,b} : p → Exp_µ{−a(t_i v_k)} p Exp_µ{−b(t_i v_k)}

is an isometry, as for all X ∈ T_µ SO(n),

d_µ γ_{a,b}(X) = Exp_µ{−a(t_i v_k)} X Exp_µ{−b(t_i v_k)},

and through direct computation and using (30) we have, for all X, Y ∈ T_µ SO(n),

⟨d_µ γ_{a,b}(X), d_µ γ_{a,b}(Y)⟩_{γ_{a,b}(µ)} = ⟨X, Y⟩_µ.

Of these isometries we should choose the one that "minimizes energy" in moving parts of the data along the geodesic from g_i to µ.
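That γ_{a,b} is an isometry for every split a + b = 1 can be confirmed numerically with the bi-invariant distance on SO(3). This is a sketch we added (helper names `hat`, `dist`, `gamma` and the value of t are ours, chosen for illustration):

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def dist(p, q):
    """Bi-invariant distance on SO(3): d(p, q) = ||Log_p(q)||."""
    L = logm(p.T @ q).real
    return np.sqrt(-0.5 * np.trace(L @ L))

v = hat([0.0, 0.0, 1.0])          # unit direction in T_I SO(3)
t = 0.4                           # an illustrative projection coefficient t_i
p1 = expm(hat([0.2, -0.1, 0.3]))
p2 = expm(hat([-0.3, 0.2, 0.1]))

def gamma(a, b, p):
    """gamma_{a,b}(p) = Exp(-a t v) p Exp(-b t v)."""
    return expm(-a * t * v) @ p @ expm(-b * t * v)

for a in (0.5, 1.0, 0.3):
    b = 1.0 - a
    # distances between data points are preserved for every a + b = 1
    assert abs(dist(gamma(a, b, p1), gamma(a, b, p2)) - dist(p1, p2)) < 1e-9
```

The check works because (A p B)ᵀ(A p′ B) = Bᵀ(pᵀp′)B, and conjugation by a rotation preserves the norm of the matrix logarithm.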
Assuming D has been demeaned so that µ = I, letting p_{i,ε} = Exp_I(εq_i) and expanding,

Log_I[γ_{a,b}{Exp_I(εq_i)}] = {q_i + (1/2) tr(q_i v_k) v_k} ε − (1/4) tr(q_i v_k)(b − a)(v_k q_i − q_i v_k) ε² + O(ε³)
                            = ‖q_i‖ (q̃_i − cos θ_i v_k) ε + ‖q_i‖² (1/2) cos θ_i (b − a) [v_k, q̃_i] ε² + O(ε³),

where θ_i is the angle formed by q_i and v_k. Thus for a ≠ b we have additional movement in the orthogonal complement of the explanatory direction v_k. Note that, accordingly, making the identification of SO(3) with SU(2), it is straightforward to show that with a = b, γ_{a,b} corresponds to a simple plane rotation.

Also to consider in this method of PGA is the displacement of the mean, as a result of curvature, after removing variability in an explanatory direction. We quantify this effect in SO(n) by letting D_ε = {p_{1,ε}, …, p_{N,ε}} ⊂ SO(n) be such that µ(D_ε) = I for all ε, v_k ∈ S_I SO(n) and

Case 1. D′ = {g₁^{−1/2} p_{1,ε} g₁^{−1/2}, …, g_N^{−1/2} p_{N,ε} g_N^{−1/2}}, using a = b = 1/2, and
Case 2. D′ = {g₁⁻¹ p_{1,ε}, …, g_N⁻¹ p_{N,ε}}, using a = 1, b = 0 as in Definition 9,

and by obtaining the expansion of x(ε) = Log_I{µ(D′)} in both cases. By [20], for the function f(y) = d(y, p)² = ‖Log_y(p)‖² we have

∇_y f = −2 Log_y(p). (31)

Using this gradient in Definition 5, x(ε) is such that

Σ_{i=1}^N Log_I[Exp_I{−x(ε)/2} Exp_I(r_i) Exp_I{−x(ε)/2}] = 0 (32)

with
1. Exp_I(r_i) = Exp_I{−t_i(ε, v_k) v_k/2} Exp_I(εq_i) Exp_I{−t_i(ε, v_k) v_k/2} and
2. Exp_I(r_i) = Exp_I{−t_i(ε, v_k) v_k} Exp_I(εq_i)

for i = 1, …, N in Cases 1 and 2, respectively. Letting x(ε) = x₁ε + x₂ε² + x₃ε³ + x₄ε⁴ + O(ε⁵), substituting into (32) and solving for x₁, x₂, x₃ and x₄ gives the expansions
1. x(ε) = x₃ε³ + O(ε⁵), where

x₃ = (1/96) Σ_{i=1}^N [ tr(q_i v_k)² (2 v_k q_i v_k − q_i v_k v_k − v_k v_k q_i)
     − 4 tr(q_i v_k)(2 q_i v_k q_i − q_i q_i v_k − v_k q_i q_i)
     + 4 tr(q_i v_k) {tr(q_i q_i v_k v_k) − tr(q_i v_k q_i v_k)} v_k ]
   = (1/24) Σ_{i=1}^N ‖q_i‖³ [ cos²θ_i R(q̃_i, v_k)v_k + 2 cos θ_i R(v_k, q̃_i)q̃_i
     − 2 cos θ_i sin²θ_i K(σ_{q_i,v_k}) v_k ],

and

2. x(ε) = x₂ε² + O(ε³), where

x₂ = Σ_{i=1}^N (1/4) tr(q_i v_k)(v_k q_i − q_i v_k) = −Σ_{i=1}^N (1/2) ‖q_i‖² cos θ_i [v_k, q̃_i],

in Case 1 and Case 2, respectively, with ‖x(ε)‖ = O(ε³) in Case 1 and ‖x(ε)‖ = O(ε²) in Case 2. Considering these expansions, a = b = 1/2 gives less local displacement of the mean and is thus preferable. Still, the use of a centering or normalization step in this form of PGA might be considered, particularly when there is still significant variability in the data and if some degree of degeneracy in the explanatory directions is observed.

In Table 2, in each of 500 runs we generated 50 vectors in T_I SO(3) with entries having Gaussian distributions with differing variances. We compare the mean angles, over the 500 runs, that the located explanatory directions (both for a = 1, b = 0, as in Definition 9, and for a = b = 1/2) make with the eigenvectors of the covariance operator L (θ w/ e.v.'s) and with the PGA directions in Definition 7 (θ w/ PGA). We also measure the displacement of the intrinsic mean after removing an explanatory direction (µ disp.) and the average reconstruction error, i.e., the intrinsic variability remaining in the data after removing explanatory directions (R.E.), under a = b = 1/2 and a = 1, b = 0.

We repeat the experiment for data given in [26], which was collected to investigate the variability in six subjects' movements while completing a drilling task. The data we use is motion capture data of the rotation of the first subject's wrist.
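The contrast between the two cases can be seen in a toy computation (ours, not the paper's experiment; the helpers and data are hypothetical). It uses first-order projection coefficients and a deliberately inverse-symmetric data set, for which the a = b = 1/2 split leaves the intrinsic mean exactly at I, while a = 1, b = 0 displaces it at order ε², as the expansions above predict:

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def norm_I(X):
    return np.sqrt(-0.5 * np.trace(X @ X))

def karcher_mean(Ps, iters=100):
    """Fixed-point iteration for the intrinsic mean on SO(3)."""
    mu = np.eye(3)
    for _ in range(iters):
        step = sum(logm(mu.T @ p).real for p in Ps) / len(Ps)
        mu = mu @ expm(step)
    return mu

eps = 0.3
v = hat([0.0, 0.0, 1.0])                     # explanatory direction, ||v|| = 1
qs = [hat([1, 0, 0.2]), hat([-1, 0, -0.2]),  # symmetric: intrinsic mean of D is I
      hat([0, 1, -0.3]), hat([0, -1, 0.3])]
Ps = [expm(eps * q) for q in qs]

def t_of(q):                                 # first-order projection coefficient
    return -0.5 * np.trace((eps * q) @ v)    # <eps q, v>_I in the metric (30)

case1 = [expm(-t_of(q) / 2 * v) @ p @ expm(-t_of(q) / 2 * v) for q, p in zip(qs, Ps)]
case2 = [expm(-t_of(q) * v) @ p for q, p in zip(qs, Ps)]

d1 = norm_I(logm(karcher_mean(case1)).real)  # mean displacement, a = b = 1/2
d2 = norm_I(logm(karcher_mean(case2)).real)  # mean displacement, a = 1, b = 0
assert d1 < 1e-8 < d2                        # symmetric split leaves the mean fixed
```

Case 1 produces pairs of mutually inverse rotations, a set invariant under inversion, so its intrinsic mean is exactly I; Case 2 breaks this symmetry through the commutator term in x₂.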
This data has little variability, which, it is claimed in [26], is common in human kinetics studies; accordingly, tangent space methods are used in [26] for analysis. The tables show better agreement with the eigenvectors of L and with the variability as identified by PGA for a = b = 1/2. Also, there is less displacement of the mean for a = b = 1/2, in agreement with the expansions above. There is also slightly less reconstruction error for a = b = 1/2 in these experiments. These effects are greater in magnitude for the simulated data, which has greater tangent space variability.

Simulated data, 500 runs, mean intrinsic var. = 1.4878

                            θ w/ e.v.'s   θ w/ PGA   µ disp.   R.E.
a = b = 1/2    first dir.     0.0844       0.00      0.0109    0.4640
               second dir.    0.1180       0.0021    0.0057    0.2530
a = 1, b = 0   first dir.     0.0844       0.00      0.0196    0.4640
               second dir.    0.6322       0.5880    0.0112    0.2610

Wrist rotation data, intrinsic var. = 0.2912

                            θ w/ e.v.'s   θ w/ PGA    µ disp.      R.E.
a = b = 1/2    first dir.    1.308e-3     0.00        110.418e-6   43.853e-3
               second dir.   2.01e-3      27.997e-6   42.035e-6    19.926e-3
a = 1, b = 0   first dir.    1.308e-3     0.00        437.222e-6   43.853e-3
               second dir.   63.526e-3    61.940e-3   189.840e-6   20.247e-3

Table 2: Comparisons of alternative PGA.

7. Linear difference indicators

In [28], differences between exact solutions to PGA and tangent space approximations were explored. To this end, [28] introduced measures of the accuracy of approximations of the projection operator and of approximations of explanatory directions obtained by orthogonal projection and PCA in the tangent space, respectively. Given D = {p₁, …, p_N} ⊂ M with µ(D) = µ, PGA directions V_{k−1} = {v₁, …, v_{k−1}} and v ∈ S_{V_{k−1}^⊥}, the average projection difference is formulated as

τ_H = (1/N) Σ_{i=1}^N [ d{p_i, π̂_{H(v)}(p_i)}² − d{p_i, π_{H(v)}(p_i)}² ], (33)

where H(v) = Exp_µ(span(V_{k−1} ∪ v)) and π̂_{H(v)}(p_i) is the exponentiation of the orthogonal projection of q_i = Log_µ(p_i) in T_µM to span(V_{k−1} ∪ v). Then, setting v = v_k, where v_k is the k-th PGA direction, and letting v̂ be its first-order approximation obtained in T_µM, the average residual difference is formulated as

ρ = (1/N) Σ_{i=1}^N [ d{p_i, π_{H(v̂)}(p_i)}² − d{p_i, π_{H(v)}(p_i)}² ].

In [28], difference indicators, which are shown to be correlated with these statistics and which can be computed before carrying out exact PGA, are proposed. The indicator for the average projection difference is given as

τ_H ≈ τ̃_H(v) = (2/N) Σ_{i=1}^N ‖∇_{π̂_{H(v)}(p_i)} f‖, (34)

where f(y) = d(p_i, y)². Using (31), ‖∇_{π̂_{H(v)}(p_i)} f‖ can be computed as the magnitude of the component of −2 Log_{π̂_{H(v)}(p_i)}(p_i) in T_{π̂_{H(v)}(p_i)} M.

The indicator for the average residual difference is given as the standard deviation of the differences between the distances of the q_i's to their orthogonal projections to span(V_{k−1} ∪ v̂) and the distances of the p_i's to the exponentiations of those orthogonal projections. That is,

σ = sqrt[ (1/N) Σ_{i=1}^N ( ‖q_i − Log_µ{π̂_{H(v̂)}(p_i)}‖ − d{p_i, π̂_{H(v̂)}(p_i)} − m )² ],

where m is the mean of the differences between the distances.

7.1. Expansions of τ_H and ρ

We obtain expansions in the scaling parameter ε of τ_H and ρ in P(n) and SO(n). Then, in Section 7.2, we apply the difference indicators and the expansions to two data sets in experiments similar to the ones in [28] and show that the expansions provide good approximations of τ_H and ρ. Note that although τ̃_H(v) and σ show correlation, only τ̃_H(v) is an approximation.
These expansions will also give a better understanding of how scale and curvature terms affect τ_H and ρ, which can then be used to make the type of decisions about the use of linear approximations demonstrated in [28]. We will obtain these expansions for k = 1, which will allow us to carry out the type of experiments done in [28]. Expansions for k > 1 can be obtained in a similar manner.

Let p_{i,ε} = Exp_µ(εq_i) and let t_i(ε) = t_{i,1}ε + t_{i,3}ε³ + O(ε⁵) be the expansion of the projection coefficient to a geodesic as in Section 5.3. Using the expansion of the objective function in (24) and of t_i(ε) given in (25), we obtain

τ_H = (1/N) Σ_{i=1}^N [ h_i(t_{i,1}ε, ε) − h_i(t_i(ε), ε) ]
    = (1/N) Σ_{i=1}^N (t_{i,3})² ε⁶ + O(ε⁸)
    = 1/(144N) Σ_{i=1}^N ‖q_i‖⁶ cos²θ_i sin⁴θ_i {K(σ_{v,q_i})}² ε⁶ + O(ε⁸).

Consider the cost function f₁(v, ε) in (2) and the expansion v₁(ε) = v_{1,0} + v_{1,2}ε² + v_{1,4}ε⁴ + O(ε⁶). We have

ρ = f₁(v_{1,0}, ε) − f₁{v₁(ε), ε}. (35)

Given the constraint ‖v₁(ε)‖ = 1, and since for v₁ᵃ(ε) = Σ_{j=0}^a v_{1,2j} ε^{2j}, ‖v₁ᵃ(ε)‖ is not necessarily 1 for any a, consider the expansion

1/‖v₁ᵃ(ε)‖ = 1/√(1 + ⟨v_{1,2}, v_{1,2}⟩_I ε⁴) + O(ε⁸) = 1 − (⟨v_{1,2}, v_{1,2}⟩_I/2) ε⁴ + O(ε⁸)

and set

ṽ₁ᵃ(ε) = v₁ᵃ(ε)/‖v₁ᵃ(ε)‖ = v_{1,0} + v_{1,2}ε² + {v_{1,4} − v_{1,0}⟨v_{1,2}, v_{1,2}⟩_I/2} ε⁴ + O(ε⁶).

Then we have v₁(ε) = lim_{a→∞} ṽ₁ᵃ(ε) and, as in (5),

[f_{1,2}(v_{1,0}) − f_{1,2}{v₁(ε)}] ε² = (1/N) ( Σ_{i=1}^N ⟨q_i, v₁(ε)⟩_I² − ⟨q_i, v_{1,0}⟩_I² ) ε²
    = (1/N) Σ_{i=1}^N [ ⟨q_i, v_{1,2}⟩_I² − ⟨q_i, v_{1,0}⟩_I² ⟨v_{1,2}, v_{1,2}⟩_I ] ε⁶ + O(ε⁸), (36)

with the second equality using

(1/N) Σ_{i=1}^N ⟨q_i, v_{1,0}⟩_I ⟨q_i, v_{1,k}⟩_I = ⟨L(v_{1,0}), v_{1,k}⟩_I = β₁ ⟨u₁, v_{1,k}⟩_I = 0 for k = 2, 4.
Using (28), in P(n) we have

[f_{1,4}(v_{1,0}) − f_{1,4}{v₁(ε)}] ε⁴
    = 1/(48N) Σ_{i=1}^N [ tr(q_i v_{1,0})² tr(2 q_i v_{1,0} q_i v_{1,2} − q_i q_i v_{1,0} v_{1,2} − q_i q_i v_{1,2} v_{1,0})
      + 2 tr(q_i v_{1,0}) tr(q_i v_{1,2}) tr(q_i v_{1,0} q_i v_{1,0} − q_i q_i v_{1,0} v_{1,0}) ] ε⁶ + O(ε⁸)
    = 1/(6N) Σ_{i=1}^N [ cos²θ_{i,1} R_I(v_{1,0}, q̃_i, q̃_i, v_{1,2})
      + cos θ_{i,1} sin²θ_{i,1} cos θ̂_{i,1} K(σ_{v,q_i}) ] ‖q_i‖⁶ ε⁶ + O(ε⁸). (37)

In SO(n), using the metric in (30), we simply take the negative of this expression. Then, using (35), the expansion of ρ in P(n) or SO(n) is the sum of (36) and (37).

7.2. Experiments comparing difference indicators and expansions

Set the expansion of ρ as ρ = ρ₆ε⁶ + O(ε⁸) and the expansion of τ_H as τ_H = τ_{H,6}ε⁶ + O(ε⁸), as derived above in Section 7.1. The first data set is the wrist rotation data set in SO(3) from Section 6.3. The second is a synthetic data set in P(n), with a sample of 36 from a distribution as in the test in Figure 6. As in [28], for each experiment we draw a random sample of size 8 from the data set 20 times and compute the relevant statistics each time for comparison. For the wrist rotation data in SO(3), τ_H is not computed, as projection has a closed form in this case.

Experiment 1. Wrist rotation data. As indicated in Section 6.3, this data set has little variability. Accordingly, ρ₆ in Figure 7 is a very close estimate of ρ and provides a nearly perfect picture of the penalty (very small in this case) of using the first-order approximation of PGA. At the same time, σ has a lower correlation with ρ and is of a much different scale, and is thus of less value in assessing the use of a first-order approximation of PGA.

Experiment 2. Synthetic data. As in [28], the first PGA direction v₁ is set to v in (33) and used to compute τ_H.
In Figure 8, both τ_{H,6} and ρ₆ provide good estimates of τ_H and ρ, with correlation coefficients of .993 and .981, respectively. Also of note, one might square the norm of the gradient in (34), to use the Pythagorean theorem in each T_{π̂_{H(v)}(p_i)}M, to get another improved estimate of τ_H over τ̃_H.

Figure 7: ρ for the wrist rotation data. Panel (a): difference indicator σ against ρ, Pearson's r = 0.709; panel (b): expansion term ρ₆ against ρ, Pearson's r = 0.999.

8. Discussion and conclusions

In [31], PGA is formulated as a probability model (PPGA) in which data are distributed according to a manifold generalization of the normal distribution. Explanatory directions are included as parameters to be estimated by maximum likelihood. Also, a location parameter and scaling parameters for the explanatory directions and for the variability or dispersion of the data are fit. Thus, as an advantage, not only are the mean and explanatory directions jointly estimated, but the dispersion of the data is also taken into account.

In the descriptive setting of this paper, consideration of a dispersion or scaling factor was shown to be an essential element in revealing the underlying structure of solutions to PGA. In the proposition in Section 3, for example, we see how the share of variability in the tangent space accounted for by eigenvector u_k, as measured by eigenvalue β_k, weights the curvature terms in v_{k,2} to determine the difference, at least locally, between the first-order approximation and the exact solution. Or, in the expansions of Section 7, we see the explicit interaction between scale, curvature and the distribution of data in determining the local difference between the projection operator and the PGA directions and their linear approximations.

Figure 8: τ_H and ρ for the data in P(3). Panel (a): difference indicator τ̃_H against τ_H, Pearson's r = 0.943; panel (b): expansion term τ_{H,6} against τ_H, Pearson's r = 0.993; panel (c): difference indicator σ against ρ, Pearson's r = 0.798; panel (d): expansion term ρ₆ against ρ, Pearson's r = 0.981.

Also, we see in this paper, at least experimentally, that the approximations obtained by expansion hold for data significantly dispersed from the tangent space. In Figure 4, for example, with q and v uniformly distributed in S_I P(n), the third- and fifth-order approximations in ε hold for ε > 2. In this case, one can readily obtain bounds on the ratios of the coefficients in the expansion and derive expected values of those ratios to explain this plot. Such an approach should be able to be taken with other expansions and distributions of data as well. Also, in a bounded manifold the variability of the data, and thus ε, is restricted. In Sⁿ or SO(n), for example, we take |ε| < π/‖q_i‖ for data p_{i,ε} = Exp_µ(εq_i), so that the expansions only need to hold for these values of ε.

Since PGA was introduced in [11] and then [10], a number of other methods to analyze the variability of manifold-valued data have been proposed. For example, [18] accounts for non-geodesic variability in spheres, and [17] projects to geodesics that intersect orthogonally at a mean on the first geodesic that best fits the data.
One could use the approach of this paper in these contexts; for example, an expansion of the difference between the mean located in [17] and the intrinsic mean might be obtained. In addition, this paper and others use a definition of PGA that minimizes residual error, while PGA was defined in [11] as maximizing projected variability. Expansions of solutions of PGA using the latter definition might be obtained and the higher-order terms compared, to determine what accounts for the differences and how these differences might be taken into account in deciding which definition to employ. Also, the approach of this paper might be used to quantify differences between other generalizations of linear statistics, such as intrinsic MANOVA in [15] and geodesic regression in [9], and their local, linear approximations.

9. Acknowledgments

We are grateful to the Editor, the Associate Editor, and the reviewers for their valuable comments, which have led to improvements of our paper. DL thanks Professor Leo Butler for help and guidance in working on this paper. We also thank Professor Steve Rosenberg for useful discussions. The contribution of LL was funded by NSF grant IIS1546331 and a grant from the Army's research office.

References

[1] A. Banerjee, I.S. Dhillon, J. Ghosh, and S. Sra, Clustering on the unit hypersphere using von Mises–Fisher distributions, J. Mach. Learn. Res. 6 (2005), 1345–1382.
[2] P.J. Basser, J. Mattiello, and D. LeBihan, MR diffusion tensor spectroscopy and imaging, Biophysical Journal 66 (1994), 259–267.
[3] M. Bridson and A. Haefliger, Metric spaces of non-positive curvature, Grundlehren der mathematischen Wissenschaften 319, pp. 315–316, Springer, 1999.
[4] J. Cheeger and D.G. Ebin, Comparison theorems in Riemannian geometry, AMS Chelsea Publishing, Providence, RI, 2008. Revised reprint of the 1975 original.
[5] R.W.R. Darling, Differential forms and connections, Cambridge University Press, Cambridge, 1994.
[6] M.P. do Carmo, Differential geometry of curves and surfaces, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1976. Translated from the Portuguese.
[7] J.-H. Eschenburg, Lecture notes on symmetric spaces, January 1997, http://myweb.rz.uni-augsburg.de/~eschenbu/symspace.pdf.
[8] P.T. Fletcher, Lecture notes on symmetric spaces, January 2009, http://www.sci.utah.edu/fletcher/RiemannianGeometryNotes.pdf.
[9] P.T. Fletcher, Geodesic regression and the theory of least squares on Riemannian manifolds, Int. J. Comput. Vis. 105 (2013), 171–185.
[10] P.T. Fletcher and S. Joshi, Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors, in Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis (M. Sonka, I.A. Kakadiaris, and J. Kybic, eds.), Lecture Notes in Computer Science, vol. 3117, Springer Berlin Heidelberg, 2004, pp. 87–98.
[11] P.T. Fletcher, C. Lu, and S. Joshi, Statistics of shape via principal geodesic analysis on Lie groups, in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Washington, DC, USA), CVPR'03, IEEE Computer Society, 2003, pp. 95–101.
[12] M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Ann. Inst. H. Poincaré 10 (1948), 215–310.
[13] J. Gallier and D. Xu, Computing exponentials of skew symmetric matrices and logarithms of orthogonal matrices, International Journal of Robotics and Automation 18 (2000), 10–20.
[14] K. Hornik, I. Feinerer, M. Kober, and C. Buchta, Spherical k-means clustering, Journal of Statistical Software 50 (2012), 1–22.
[15] S. Huckemann, T. Hotz, and A. Munk, Intrinsic MANOVA for Riemannian manifolds with an application to Kendall's space of planar shapes, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010), 593–603.
[16] S. Huckemann, T. Hotz, and A. Munk, Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions, Statist. Sinica 20 (2010), 1–58.
[17] S. Huckemann and H. Ziezold, Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces, Adv. in Appl. Probab. 38 (2006), 299–319.
[18] S. Jung, I.L. Dryden, and J.S. Marron, Analysis of principal nested spheres, Biometrika 99 (2012), 551–568.
[19] B.K.P. Horn, Relative orientation, International Journal of Computer Vision (1990), 59–78.
[20] H. Karcher, Riemannian center of mass and mollifier smoothing, Comm. Pure Appl. Math. 30 (1977), 509–541.
[21] J. Kuffner, Effective sampling and distance metrics for 3D rigid body path planning, in Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, vol. 4, April 2004, pp. 3993–3998.
[22] K. Lange, Optimization, Springer Texts in Statistics 95, p. 36, Springer, 2013.
[23] K.V. Mardia and P.E. Jupp, Directional statistics, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2000. Revised reprint of Statistics of directional data by Mardia.
[24] I. Najfeld and T.F. Havel, Derivatives of the matrix exponential and their computation, Adv. in Appl. Math. 16 (1995), 321–375.
[25] P. Petersen, Riemannian geometry, Graduate Texts in Mathematics, p. 242, Springer, 2010.
[26] D. Rancourt, L.-P. Rivest, and J. Asselin, Using orientation statistics to investigate variations in human kinematics, J. Roy. Statist. Soc. Ser. C 49 (2000), 81–94.
[27] S. Said, N. Courty, N. Le Bihan, and S.J. Sangwine, Exact principal geodesic analysis for data on SO(3), in Signal Processing Conference, 2007 15th European, Sept 2007, pp. 1701–1705.
[28] S. Sommer, F. Lauze, S. Hauberg, and M. Nielsen, Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations, in Proceedings of the 11th European Conference on Computer Vision: Part VI (Berlin, Heidelberg), ECCV'10, Springer-Verlag, 2010, pp. 43–56.
[29] S. Sommer, F. Lauze, and M. Nielsen, The differential of the exponential map, Jacobi fields and exact principal geodesic analysis, CoRR abs/1008.1902 (2010), 1–26.
[30] B. Stanfill, H. Hofmann, and U. Genschel, rotations: An R package for SO(3) data, The R Journal 6 (2014), 68–78.
[31] M. Zhang and P.T. Fletcher, Probabilistic principal geodesic analysis, NIPS (C.J.C. Burges, L. Bottou, Z. Ghahramani, and K.Q. Weinberger, eds.), 2013, pp. 1178–1186.