Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences
This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant…
Authors: Minh Ha Quang
Hà Quang Minh
Istituto Italiano di Tecnologia, Via Morego 30, Genova 16163, ITALY
Email address: minh.haquang@iit.it (Hà Quang Minh)

Preprint submitted to Journal of LaTeX Templates, January 14, 2017

Abstract

This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed form formulas via the corresponding Gram matrices.

Keywords: infinite-dimensional Log-Determinant divergences, Alpha divergences, Alpha-Beta divergences, affine-invariant Riemannian distance, Stein divergence, positive definite operators, trace class operators, extended trace, extended Fredholm determinant, Reproducing kernel Hilbert spaces, covariance operators

2010 MSC: 47B65, 47L07, 46E22, 15A15

1. Introduction

Symmetric Positive Definite (SPD) matrices play an important role in many areas of mathematics, statistics, machine learning, optimization, computer vision, and related fields; see e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. The set $\mathrm{Sym}^{++}(n)$ of $n \times n$ SPD matrices is an open convex cone and can also be equipped with a Riemannian manifold structure. Among the most studied Riemannian metrics on $\mathrm{Sym}^{++}(n)$ are the classical affine-invariant metric [1, 2, 3, 5, 12] and the more recent Log-Euclidean metric [4, 9, 13]. The convex cone structure of $\mathrm{Sym}^{++}(n)$, on the other hand, gives rise to distance-like functions such as the Alpha Log-Determinant divergences [14], which have been shown to be special cases of the Alpha-Beta Log-Determinant divergences [15]. These divergences are fast to compute and have been shown to work well in various applications [7, 16, 8]. The present work aims to generalize the Alpha-Beta Log-Determinant divergences to the infinite-dimensional setting.

Finite-dimensional Alpha-Beta Log-Determinant divergences. We recall that for $A, B \in \mathrm{Sym}^{++}(n)$, the Alpha-Beta Log-Determinant (Log-Det) divergence between $A$ and $B$ is a parametrized family of divergences defined by (see [15])

$$D^{(\alpha,\beta)}(A,B) = \frac{1}{\alpha\beta}\log\det\left(\frac{\alpha (AB^{-1})^{\beta} + \beta (AB^{-1})^{-\alpha}}{\alpha+\beta}\right), \quad \alpha \neq 0,\ \beta \neq 0,\ \alpha+\beta \neq 0. \tag{1}$$

Remark 1. To keep our presentation compact, in the following we consider the case $\alpha > 0$, $\beta > 0$, as well as the limiting cases $\alpha = 0$, $\beta = 0$. Since $D^{(\alpha,\beta)}(A,B) = D^{(-\alpha,-\beta)}(B,A)$, the case $\alpha < 0$, $\beta < 0$ is essentially identical to the previous case. We do not consider the case where $\alpha$ and $\beta$ have opposite signs, since then the well-definedness and finiteness of $D^{(\alpha,\beta)}(A,B)$ depend on the spectrum of $AB^{-1}$ (see Theorem 2 in [15]); that is, it is not a valid divergence on all of $\mathrm{Sym}^{++}(n)$.

The parametrized family of divergences defined by Eq. (1) is highly general and admits as special cases many metrics and distance-like functions on $\mathrm{Sym}^{++}(n)$, including in particular the following:

1. The affine-invariant Riemannian distance [3], corresponding to the limiting case $D^{(0,0)}(A,B)$, with

$$D^{(0,0)}(A,B) = \frac{1}{2} d^2_{\mathrm{aiE}}(A,B) = \frac{1}{2}\|\log(B^{-1/2}AB^{-1/2})\|^2_F, \tag{2}$$

where $\log(A)$ denotes the principal logarithm of the matrix $A$ and $\|\cdot\|_F$ denotes the Frobenius norm.

2. The Alpha Log-Determinant divergences [14], corresponding to $D^{(\alpha,1-\alpha)}(A,B)$, $0 < \alpha < 1$, with

$$D^{(\alpha,1-\alpha)}(A,B) = \frac{1}{\alpha(1-\alpha)}\log\left[\frac{\det[\alpha A + (1-\alpha)B]}{\det(A)^{\alpha}\det(B)^{1-\alpha}}\right]. \tag{3}$$

A special case of this divergence is the symmetric Stein divergence (also called the Jensen-Bregman LogDet divergence), corresponding to $D^{(1/2,1/2)}(A,B)$, whose square root is a metric on $\mathrm{Sym}^{++}(n)$ [16], with

$$D^{(1/2,1/2)}(A,B) = 4 d^2_{\mathrm{stein}}(A,B) = 4\log\left[\frac{\det\left(\frac{A+B}{2}\right)}{\sqrt{\det(A)\det(B)}}\right]. \tag{4}$$

3. The limiting cases $\beta = 0$ and $\alpha = 0$, which correspond to, respectively,

$$D^{(\alpha,0)}(A,B) = \frac{1}{\alpha^2}\left[\mathrm{tr}\big((A^{-1}B)^{\alpha} - I\big) - \alpha\log\det(A^{-1}B)\right], \tag{5}$$

$$D^{(0,\beta)}(A,B) = \frac{1}{\beta^2}\left[\mathrm{tr}\big((B^{-1}A)^{\beta} - I\big) - \beta\log\det(B^{-1}A)\right], \tag{6}$$

with $D^{(1,0)}(A,B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$ and $D^{(0,1)}(A,B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$.

Contributions of this work. The current work is a continuation and generalization of the author's recent work [17]. In [17], we generalized the Alpha Log-Det divergences between SPD matrices [14] to the infinite-dimensional Alpha Log-Determinant divergences between positive definite unitized trace class operators in a Hilbert space. In the current work, we present a formulation for the Alpha-Beta Log-Det divergences between positive definite unitized trace class operators, generalizing the Alpha-Beta divergences between SPD matrices as defined by Eq. (1).
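As a numerical illustration of the finite-dimensional family in Eq. (1), the following is a minimal sketch, not taken from [15]. It relies on the fact that the determinant in Eq. (1) depends only on the eigenvalues of $AB^{-1}$, and it assumes, purely for simplicity, that $A$ and $B$ are diagonal so that these eigenvalues are the ratios of the diagonal entries. The function name and the test data are hypothetical.

```python
import math

def ab_logdet_divergence(eigs, alpha, beta):
    """Eq. (1), evaluated from the (positive) eigenvalues of A B^{-1}."""
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("Eq. (1) requires alpha, beta, alpha + beta all nonzero")
    total = sum(
        math.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))
        for lam in eigs
    )
    return total / (alpha * beta)

# Hypothetical data: diagonal SPD matrices A = diag(a), B = diag(b),
# so the eigenvalues of A B^{-1} are simply the ratios a_i / b_i.
a = [2.0, 0.5, 3.0]
b = [1.0, 1.5, 0.7]
eigs = [x / y for x, y in zip(a, b)]

# Right-hand side of Eq. (4): 4 log[det((A + B)/2) / sqrt(det(A) det(B))].
stein_rhs = 4.0 * sum(
    math.log((x + y) / 2.0) - 0.5 * math.log(x * y) for x, y in zip(a, b)
)

# Right-hand side of Eq. (2): (1/2) ||log(B^{-1/2} A B^{-1/2})||_F^2.
affine_rhs = 0.5 * sum(math.log(lam) ** 2 for lam in eigs)

stein_lhs = ab_logdet_divergence(eigs, 0.5, 0.5)
# A small alpha = beta approximates the limiting case D^(0,0).
affine_lhs = ab_logdet_divergence(eigs, 1e-3, 1e-3)
```

Under these assumptions, `stein_lhs` agrees with `stein_rhs` up to rounding, and `affine_lhs` approaches `affine_rhs` as the common parameter shrinks. For general SPD matrices one would first compute the eigenvalues of $AB^{-1}$ with a linear algebra library.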
As in the finite-dimensional setting, the formulation we present here is general and admits as special cases many metrics and distance-like functions between positive definite unitized trace class operators, including in particular the following: the infinite-dimensional affine-invariant Riemannian distance [18]; the infinite-dimensional Alpha Log-Det divergences [17], a special case of which is the infinite-dimensional symmetric Stein divergence. For the divergences between covariance operators on reproducing kernel Hilbert spaces (RKHS), we obtain closed form formulas for the Alpha-Beta Log-Det divergences via the corresponding Gram matrices.

Organization. We provide a summary of the main results of the paper in Section 2, including our definition of the infinite-dimensional Alpha-Beta Log-Det divergences. The key concepts involved are described in Section 3. The motivations and derivations leading to our definition of the Alpha-Beta Log-Det divergences are presented in Section 4. We then show in Section 5 that both the affine-invariant Riemannian distance and the Alpha Log-Det divergences are special cases of the Alpha-Beta Log-Det divergences. All mathematical proofs are presented in Appendix A.

2. Summary of main results

We present a summary of our main results in this section, with the detailed technical descriptions provided in subsequent sections. Throughout the paper, let $\mathcal{H}$ denote a separable Hilbert space, with $\dim(\mathcal{H}) = \infty$, unless explicitly stated otherwise. Let $\mathcal{L}(\mathcal{H})$ be the Banach space of bounded linear operators on $\mathcal{H}$ and $\mathrm{Sym}(\mathcal{H}) \subset \mathcal{L}(\mathcal{H})$ be the subspace of self-adjoint, bounded operators on $\mathcal{H}$. For $A \in \mathcal{L}(\mathcal{H})$, we write $A > 0$ to denote that $A$ is a self-adjoint positive definite operator. Let $\mathrm{Tr}(\mathcal{H})$ denote the Banach algebra of trace class operators on $\mathcal{H}$. The set of positive definite unitized trace class operators on $\mathcal{H}$ is then defined to be

$$\mathrm{PTr}(\mathcal{H}) = \{A + \gamma I > 0 : A = A^{*},\ A \in \mathrm{Tr}(\mathcal{H}),\ \gamma \in \mathbb{R}\}. \tag{7}$$

The main purpose of the current work is the generalization of the Alpha-Beta Log-Det divergence between SPD matrices, as defined in Eq. (1), to that between positive definite unitized trace class operators in $\mathrm{PTr}(\mathcal{H})$. The following is our definition of the Alpha-Beta (Log-Det) divergences in the infinite-dimensional setting.

Definition 1 (Alpha-Beta Log-Determinant divergences). Assume that $\dim(\mathcal{H}) = \infty$. Let $\alpha > 0$, $\beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$ be fixed. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, the $(\alpha,\beta)$-Log-Det divergence $D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]$ is defined to be

$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha\beta}\log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)} \mathrm{det}_X\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{r(1-\delta)} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r\delta}}{\alpha+\beta}\right)\right], \tag{8}$$

where $\Lambda + \frac{\gamma}{\mu}I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$ and $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$. Equivalently,

$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha\beta}\log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)} \mathrm{det}_X\left(\frac{\alpha\left(Z + \frac{\gamma}{\mu}I\right)^{r(1-\delta)} + \beta\left(Z + \frac{\gamma}{\mu}I\right)^{-r\delta}}{\alpha+\beta}\right)\right], \tag{9}$$

where $Z + \frac{\gamma}{\mu}I = (A+\gamma I)(B+\mu I)^{-1}$.

Remark 2. In Definition 1, $\mathrm{det}_X$ denotes the extended Fredholm determinant defined in [17] (see Section 3 below). For $\gamma = 1$, we have $\mathrm{det}_X(A+\gamma I) = \det(A+I)$, with $\det$ on the right hand side being the Fredholm determinant. For $\dim(\mathcal{H}) < \infty$, $\mathrm{det}_X(A+\gamma I) = \det(A+\gamma I)$, with $\det$ on the right hand side being the standard matrix determinant.

The quantity $D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]$ where $\alpha > 0$, $\beta > 0$, as stated in Definition 1, can be extended to the cases $\alpha > 0$, $\beta = 0$ and $\alpha = 0$, $\beta > 0$, for all $r \in \mathbb{R}$, $r \neq 0$, via limiting arguments. The following is our definition in these cases.

Definition 2 (Limiting cases - I). Assume that $\dim(\mathcal{H}) = \infty$. Let $\alpha > 0$, $\beta > 0$, $r \neq 0$ be fixed. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, the Log-Det divergence $D^{(\alpha,0)}_r[(A+\gamma I),(B+\mu I)]$ is defined to be

$$D^{(\alpha,0)}_r[(A+\gamma I),(B+\mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^r - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\big([(A+\gamma I)^{-1}(B+\mu I)]^r - I\big) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^r\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^r. \tag{10}$$

Similarly, $D^{(0,\beta)}_r[(A+\gamma I),(B+\mu I)]$ is defined to be

$$D^{(0,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^r - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\big([(B+\mu I)^{-1}(A+\gamma I)]^r - I\big) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^r\log\mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^r. \tag{11}$$

The following result confirms that the quantity $D^{(\alpha,\beta)}_r$, as defined in Definitions 1 and 2, is in fact a divergence on $\mathrm{PTr}(\mathcal{H})$.

Theorem 1 (Positivity). Assume the hypotheses stated in Definitions 1 and 2. Then

$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] \geq 0, \tag{12}$$

$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = 0 \iff A = B,\ \gamma = \mu. \tag{13}$$

Theorem 2 (Special cases - I). The following are some of the most important special cases of Definitions 1 and 2.

1. The infinite-dimensional affine-invariant Riemannian distance $d_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]$ [18], which corresponds to the limiting case $\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)]$, where $r = r(\alpha)$ is smooth, with $r(0) = 0$, $r'(0) \neq 0$, and $r(\alpha) \neq 0$ for $\alpha \neq 0$. The limit is given by

$$\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] = \frac{[r'(0)]^2}{8} d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]. \tag{14}$$

In particular, for $r = 2\alpha$,

$$\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]. \tag{15}$$

This is the content of Theorem 9.

2. The infinite-dimensional Alpha Log-Determinant divergences $d^{\alpha}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]$ [17], with

$$D^{(\alpha,1-\alpha)}_{\pm 1}[(A+\gamma I),(B+\mu I)] = d^{\pm(1-2\alpha)}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)], \quad 0 \leq \alpha \leq 1. \tag{16}$$

This is the content of Theorem 10.
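Definition 1, Theorem 1, and the limit in Eq. (15) can be explored numerically in a restricted setting. The sketch below is an illustration under stated assumptions, not the general construction: it takes $A = \mathrm{diag}(a_1,\ldots,a_n,0,0,\ldots)$ and $B = \mathrm{diag}(b_1,\ldots,b_n,0,0,\ldots)$ to be diagonal finite-rank operators on $\ell^2$, so that every operator function acts eigenvalue by eigenvalue, and it evaluates the extended Fredholm determinant of [17] through $\mathrm{det}_X(C + cI) = c\,\det(C/c + I)$, which here is a finite product. The function names and the data are hypothetical.

```python
import math

def ab_divergence(a, gamma, b, mu, alpha, beta, r):
    """Eq. (8) for A = diag(a_1..a_n, 0, ...), B = diag(b_1..b_n, 0, ...).

    Assumes gamma > 0, mu > 0, a_k + gamma > 0, b_k + mu > 0,
    alpha > 0, beta > 0, r != 0.
    """
    t = gamma / mu
    delta = alpha * gamma**r / (alpha * gamma**r + beta * mu**r)
    p, q = r * (1.0 - delta), r * delta

    def g(x):
        # Scalar function applied to each eigenvalue x of Lambda + (gamma/mu) I.
        return (alpha * x**p + beta * x**(-q)) / (alpha + beta)

    c = g(t)  # eigenvalue on the complement of the first n coordinates
    # log det_X(C + c I) = log c + sum_k log(eigenvalue_k / c), a finite sum here.
    log_detx = math.log(c) + sum(
        math.log(g((ak + gamma) / (bk + mu)) / c) for ak, bk in zip(a, b)
    )
    prefactor = r * (delta - alpha / (alpha + beta)) * math.log(t)
    return (prefactor + log_detx) / (alpha * beta)

def d_aihs_squared(a, gamma, b, mu):
    """Squared extended Hilbert-Schmidt norm of log(Lambda + (gamma/mu) I), cf. [18]."""
    log_t = math.log(gamma / mu)
    trace_part = sum(
        (math.log((ak + gamma) / (bk + mu)) - log_t) ** 2 for ak, bk in zip(a, b)
    )
    return trace_part + log_t**2

# Hypothetical data.
a, gamma = [1.0, 0.5, 0.1], 2.0
b, mu = [0.3, 0.8, 0.0], 0.5

d_generic = ab_divergence(a, gamma, b, mu, alpha=0.7, beta=1.3, r=2.0)
d_same = ab_divergence(a, gamma, a, gamma, alpha=0.7, beta=1.3, r=2.0)
small = 1e-3
d_limit = ab_divergence(a, gamma, b, mu, alpha=small, beta=small, r=2.0 * small)
half_dist_sq = 0.5 * d_aihs_squared(a, gamma, b, mu)
```

In this restricted model, `d_generic` is positive, `d_same` vanishes, and `d_limit` approximates `half_dist_sq`, in line with Theorem 1 and Eq. (15). The check says nothing about non-commuting or infinite-rank operators.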
Since the limit $\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)]$ in the first part of Theorem 2 is unique, up to the multiplicative factor $[r'(0)]^2/8$, we define the quantity $D^{(0,0)}_0[(A+\gamma I),(B+\mu I)]$ as follows.

Definition 3 (Limiting cases - II). For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, the Log-Det divergence $D^{(0,0)}_0[(A+\gamma I),(B+\mu I)]$ is defined to be

$$D^{(0,0)}_0[(A+\gamma I),(B+\mu I)] = \lim_{\alpha\to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]. \tag{17}$$

Since $d_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]$ is a metric on $\mathrm{PTr}(\mathcal{H})$, $D^{(0,0)}_0[(A+\gamma I),(B+\mu I)]$ is automatically a symmetric divergence on $\mathrm{PTr}(\mathcal{H})$. In fact, it is a member of the parametrized family $D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)]$, $\alpha \geq 0$, of symmetric divergences on $\mathrm{PTr}(\mathcal{H})$, as stated in the following result.

Theorem 3 (Special cases - II). The parametrized family $D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)]$, $\alpha \geq 0$, is a family of symmetric divergences on $\mathrm{PTr}(\mathcal{H})$, with $\alpha = 0$ corresponding to the infinite-dimensional affine-invariant Riemannian distance above and $\alpha = 1/2$ corresponding to the infinite-dimensional symmetric Stein divergence, which is given by $\frac{1}{4} d^{0}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]$.

Finite-dimensional case. For $\gamma = \mu$, we have $\delta = \frac{\alpha}{\alpha+\beta}$, so that Eq. (9) becomes

$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\gamma I)] = \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\alpha[(A+\gamma I)(B+\gamma I)^{-1}]^{\frac{r\beta}{\alpha+\beta}} + \beta[(A+\gamma I)(B+\gamma I)^{-1}]^{-\frac{r\alpha}{\alpha+\beta}}}{\alpha+\beta}\right). \tag{18}$$

In the finite-dimensional case, where $A$ and $B$ are two $n \times n$ SPD matrices, setting $\gamma = 0$ and recalling that $\mathrm{det}_X = \det$ for finite matrices, we obtain

$$D^{(\alpha,\beta)}_r(A,B) = \frac{1}{\alpha\beta}\log\det\left(\frac{\alpha(AB^{-1})^{\frac{r\beta}{\alpha+\beta}} + \beta(AB^{-1})^{-\frac{r\alpha}{\alpha+\beta}}}{\alpha+\beta}\right). \tag{19}$$

In particular, by setting $r = \alpha+\beta$, we recover Eq. (1). For $\gamma = \mu$, Eq. (10) becomes

$$D^{(\alpha,0)}_r[(A+\gamma I),(B+\gamma I)] = \frac{1}{\alpha^2}\left[\mathrm{tr}_X\big([(A+\gamma I)^{-1}(B+\gamma I)]^r - I\big) - \log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\gamma I)]^r\right], \tag{20}$$

which reduces to Eq. (5) when $A, B \in \mathrm{Sym}^{++}(n)$, $\gamma = 0$, and $r = \alpha$. Similarly, Eq. (11) becomes

$$D^{(0,\beta)}_r[(A+\gamma I),(B+\gamma I)] = \frac{1}{\beta^2}\left[\mathrm{tr}_X\big([(B+\gamma I)^{-1}(A+\gamma I)]^r - I\big) - \log\mathrm{det}_X[(B+\gamma I)^{-1}(A+\gamma I)]^r\right], \tag{21}$$

which reduces to Eq. (6) when $A, B \in \mathrm{Sym}^{++}(n)$, $\gamma = 0$, and $r = \beta$.

Remark 3. As in the cases of the Log-Hilbert-Schmidt distance [19], the infinite-dimensional affine-invariant Riemannian distance [18, 20], and the infinite-dimensional Alpha Log-Det divergences [17], we show below that in general, the infinite-dimensional formulation is not obtainable as the limit of the finite-dimensional version as the dimension approaches infinity.

Remark 4. Except for the case $r = \alpha+\beta$, the quantity $r$ in $D^{(\alpha,\beta)}_r$ that we introduce here, to the best of our knowledge, has no equivalent in the existing literature in the finite-dimensional setting.

Remark 5. Throughout the paper, we employ the following notations. Using the identity $(B+\mu I)^{-1} = \frac{1}{\mu}I - \frac{B}{\mu}(B+\mu I)^{-1}$, we write the operator $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$ as

$$(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \Lambda + \frac{\gamma}{\mu}I \in \mathrm{PTr}(\mathcal{H}), \tag{22}$$

where $\Lambda = (B+\mu I)^{-1/2}A(B+\mu I)^{-1/2} - \frac{\gamma}{\mu}B(B+\mu I)^{-1} \in \mathrm{Tr}(\mathcal{H})$. This notation is employed in Eq. (8). Similarly, in Eq. (9), we write

$$(A+\gamma I)(B+\mu I)^{-1} = \frac{\gamma}{\mu}I + A(B+\mu I)^{-1} - \frac{\gamma}{\mu}B(B+\mu I)^{-1} = Z + \frac{\gamma}{\mu}I, \tag{23}$$

where $Z = A(B+\mu I)^{-1} - \frac{\gamma}{\mu}B(B+\mu I)^{-1} \in \mathrm{Tr}(\mathcal{H})$.

Metric properties. Consider now a special case, where $\alpha = \beta$ and $r = \alpha+\beta$. For simplicity, we consider operators $(A+\gamma I)$ and $(B+\mu I)$ with $\gamma = \mu$. For $\gamma > 0$, $\gamma \in \mathbb{R}$ fixed, we define the following subset of $\mathrm{PTr}(\mathcal{H})$:

$$\mathrm{PTr}(\mathcal{H})(\gamma) = \{A + \gamma I > 0 : A^{*} = A,\ A \in \mathrm{Tr}(\mathcal{H})\}. \tag{24}$$

Remark 6. Throughout the paper, we assume, unless stated otherwise, that $\dim(\mathcal{H}) = \infty$; the condition $A + \gamma I > 0$ then automatically implies that $\gamma > 0$. When $\dim(\mathcal{H}) < \infty$, we can set $\gamma = 0$.

Theorem 4 (Metric property). Let $\gamma > 0$, $\gamma \in \mathbb{R}$ be fixed. The square root function $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$ for all $\alpha \geq 0$.

We thus have a family of metrics between positive definite operators of the form $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})(\gamma)$, parametrized by the parameter $\alpha \geq 0$. In particular, with $\alpha = 0$ in Theorem 4, we obtain the affine-invariant Riemannian distance, and with $\alpha = \frac{1}{2}$ we obtain the following metric, which is the square root of the infinite-dimensional Stein divergence:

$$\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)]} = 2\sqrt{\log\frac{\mathrm{det}_X\left[\frac{(A+\gamma I)+(B+\gamma I)}{2}\right]}{\mathrm{det}_X(A+\gamma I)^{1/2}\,\mathrm{det}_X(B+\gamma I)^{1/2}}}. \tag{25}$$

The corresponding finite-dimensional result [15], where $A, B \in \mathrm{Sym}^{++}(n)$, is recovered by setting $\gamma = 0$ in Theorem 4. In particular, with $\alpha = 1/2$ and $A, B \in \mathrm{Sym}^{++}(n)$, we obtain the corresponding result of [16].

Remark 7. The analysis of $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)]}$, where $\gamma \neq \mu$, is technically more involved and will be presented in a separate work.

3. Positive definite unitized trace class operators

To generalize the Alpha-Beta Log-Determinant divergences from the finite to the infinite-dimensional setting, we need to employ the following concepts:

• Positive definite operators $\mathbb{P}(\mathcal{H})$.
• Extended (or unitized) trace class operators $\mathrm{Tr}_X(\mathcal{H})$.
• Positive definite unitized trace class operators $\mathrm{PTr}(\mathcal{H})$.
• Extended Fredholm determinant $\mathrm{det}_X$ on $\mathrm{Tr}_X(\mathcal{H})$.
• Exponential, logarithm, and power functions for operators in $\mathrm{PTr}(\mathcal{H})$ and their products.

We discuss in detail below the logarithm and power functions of products of operators in $\mathrm{PTr}(\mathcal{H})$.
Other concepts are briefly reviewed; we refer to [17] for the detailed motivations leading to their definitions. Throughout the following, we assume that $\dim(\mathcal{H}) = \infty$, unless stated explicitly otherwise.

Positive definite operators. We recall that an operator $A \in \mathcal{L}(\mathcal{H})$ is said to be positive definite if there exists a constant $M_A > 0$ such that $\langle x, Ax\rangle \geq M_A\|x\|^2$ for all $x \in \mathcal{H}$. This is equivalent to saying that $A$ is both strictly positive and invertible. We denote by $\mathbb{P}(\mathcal{H})$ the set of all positive definite operators on $\mathcal{H}$.

Extended trace class operators. Let $\mathrm{Tr}(\mathcal{H})$ denote the set of trace class operators on $\mathcal{H}$. The set of extended (or unitized) trace class operators on $\mathcal{H}$ is defined to be

$$\mathrm{Tr}_X(\mathcal{H}) = \{A + \gamma I : A \in \mathrm{Tr}(\mathcal{H}),\ \gamma \in \mathbb{R}\}.$$

Equipped with the extended trace class norm $\|A+\gamma I\|_{\mathrm{tr}_X} = \|A\|_{\mathrm{tr}} + |\gamma| = \mathrm{tr}|A| + |\gamma|$, $\mathrm{Tr}_X(\mathcal{H})$ becomes a Banach algebra. For $(A+\gamma I) \in \mathrm{Tr}_X(\mathcal{H})$, its extended trace is defined to be

$$\mathrm{tr}_X(A+\gamma I) = \mathrm{tr}(A) + \gamma.$$

Thus by this definition $\mathrm{tr}_X(I) = 1$, in contrast to the usual trace definition, according to which $\mathrm{tr}(I) = \infty$.

Extended Fredholm determinant. For $(A+\gamma I) \in \mathrm{Tr}_X(\mathcal{H})$, $\gamma \neq 0$, its extended Fredholm determinant is defined to be

$$\mathrm{det}_X(A+\gamma I) = \gamma\det\left(\frac{A}{\gamma} + I\right),$$

where the determinant on the right hand side is the Fredholm determinant. For $\gamma = 1$, we recover the Fredholm determinant. In the case $\dim(\mathcal{H}) < \infty$, we define $\mathrm{det}_X(A+\gamma I) = \det(A+\gamma I)$, the standard matrix determinant.

Positive definite unitized trace class operators. Having defined both positive definite operators and extended trace class operators, the set of positive definite unitized trace class operators $\mathrm{PTr}(\mathcal{H}) \subset \mathrm{Tr}_X(\mathcal{H})$ is then defined to be the intersection

$$\mathrm{PTr}(\mathcal{H}) = \mathrm{Tr}_X(\mathcal{H}) \cap \mathbb{P}(\mathcal{H}) = \{A + \gamma I > 0 : A^{*} = A,\ A \in \mathrm{Tr}(\mathcal{H}),\ \gamma \in \mathbb{R}\}.$$

Exponential, logarithm, and power functions. Consider the exponential function $\exp : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$ defined by

$$\exp(A) = \sum_{j=0}^{\infty}\frac{A^j}{j!}.$$

The following result shows that $\exp$ maps $\mathrm{Tr}_X(\mathcal{H})$ to $\mathrm{Tr}_X(\mathcal{H})$.

Lemma 1. Let $(A+\gamma I) \in \mathrm{Tr}_X(\mathcal{H})$. Then $\exp(A+\gamma I) \in \mathrm{Tr}_X(\mathcal{H})$.

Consider next the inverse function $\log = \exp^{-1} : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$. For any $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$, $\log(A+\gamma I)$ is always well-defined as follows. Let $\{\lambda_k\}_{k=1}^{\infty}$ be the eigenvalues of $A$ with corresponding orthonormal eigenvectors $\{\phi_k\}_{k=1}^{\infty}$. Then

$$A = \sum_{k=1}^{\infty}\lambda_k\,\phi_k\otimes\phi_k, \qquad \log(A+\gamma I) = \sum_{k=1}^{\infty}\log(\lambda_k+\gamma)\,\phi_k\otimes\phi_k, \tag{26}$$

where $\phi_k\otimes\phi_k : \mathcal{H} \to \mathcal{H}$ is a rank-one operator defined by $(\phi_k\otimes\phi_k)w = \langle\phi_k, w\rangle\phi_k$ for all $w \in \mathcal{H}$. Moreover, $\log(A+\gamma I) \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}_X(\mathcal{H})$ and assumes the form

$$\log(A+\gamma I) = A_1 + \gamma_1 I, \quad A_1 \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H}),\ \gamma_1 \in \mathbb{R}.$$

By Proposition 6 in [17], for any $\alpha \in \mathbb{R}$, the power function $(A+\gamma I)^{\alpha}$ is then well-defined via the expression

$$(A+\gamma I)^{\alpha} = \exp[\alpha\log(A+\gamma I)] \in \mathrm{PTr}(\mathcal{H}).$$

For the purposes of the current work, we need to go beyond the set $\mathrm{PTr}(\mathcal{H})$. Specifically, for two operators $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, we show that

$$\log[(A+\gamma I)(B+\mu I)^{-1}], \qquad [(A+\gamma I)(B+\mu I)^{-1}]^{\alpha},\ \alpha \in \mathbb{R}, \tag{27}$$

are all well-defined and are elements of $\mathrm{Tr}_X(\mathcal{H})$, even though they are no longer necessarily self-adjoint.

First, let $B \in \mathcal{L}(\mathcal{H})$ be any invertible operator. Then for any $A \in \mathcal{L}(\mathcal{H})$, we have

$$\exp(BAB^{-1}) = \sum_{j=0}^{\infty}\frac{(BAB^{-1})^j}{j!} = B\left(\sum_{j=0}^{\infty}\frac{A^j}{j!}\right)B^{-1} = B\exp(A)B^{-1}.$$

Thus for $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$, the logarithm of $B(A+\gamma I)B^{-1} = BAB^{-1} + \gamma I \in \mathrm{Tr}_X(\mathcal{H})$ is also well-defined and is given by

$$\log[B(A+\gamma I)B^{-1}] = B\log(A+\gamma I)B^{-1} = B(A_1+\gamma_1 I)B^{-1} = BA_1B^{-1} + \gamma_1 I \in \mathrm{Tr}_X(\mathcal{H}). \tag{28}$$

Using Eq. (28), we obtain the following results.

Proposition 1. Let $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$. Let $\Lambda + \frac{\gamma}{\mu}I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$. Then

1. The logarithm $\log[(A+\gamma I)(B+\mu I)^{-1}] \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined and is given by

$$\log[(A+\gamma I)(B+\mu I)^{-1}] = (B+\mu I)^{1/2}\log\left(\Lambda + \frac{\gamma}{\mu}I\right)(B+\mu I)^{-1/2}. \tag{29}$$

2. For any $\alpha \in \mathbb{R}$, the power function $[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined and is given by

$$[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = (B+\mu I)^{1/2}\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}(B+\mu I)^{-1/2}. \tag{30}$$

3. For any $p, q \in \mathbb{R}$ and any $\alpha, \beta \in \mathbb{R}$ such that $\alpha+\beta \neq 0$,

$$\mathrm{det}_X\left(\frac{\alpha[(A+\gamma I)(B+\mu I)^{-1}]^p + \beta[(A+\gamma I)(B+\mu I)^{-1}]^q}{\alpha+\beta}\right) = \mathrm{det}_X\left[\frac{\alpha\left(\Lambda+\frac{\gamma}{\mu}I\right)^p + \beta\left(\Lambda+\frac{\gamma}{\mu}I\right)^q}{\alpha+\beta}\right]. \tag{31}$$

4. Infinite-dimensional Alpha-Beta Log-Determinant divergences

We now show the motivations and derivations leading to Definition 1. We recall that in the case $\dim(\mathcal{H}) < \infty$, the Log-Det divergences were motivated by Ky Fan's inequality [21] on the log-concavity of the determinant, which states that for $A, B \in \mathrm{Sym}^{++}(n)$,

$$\det(\alpha A + (1-\alpha)B) \geq \det(A)^{\alpha}\det(B)^{1-\alpha}, \quad 0 \leq \alpha \leq 1,$$

with equality if and only if $A = B$ ($0 < \alpha < 1$). This inequality has recently been generalized to the infinite-dimensional setting for the extended Fredholm determinant (Theorem 1 in [17]). The following is a further generalization of Theorem 1 in [17].

Theorem 5. Let $0 \leq \alpha \leq 1$. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, for any $p, q \in \mathbb{R}$,

$$\mathrm{det}_X[\alpha(A+\gamma I)^p + (1-\alpha)(B+\mu I)^q] \geq \left(\frac{\gamma^p}{\mu^q}\right)^{\alpha-\delta}\mathrm{det}_X(A+\gamma I)^{p\delta}\,\mathrm{det}_X(B+\mu I)^{q(1-\delta)}, \tag{32}$$

where $\delta = \frac{\alpha\gamma^p}{\alpha\gamma^p + (1-\alpha)\mu^q}$ and $1-\delta = \frac{(1-\alpha)\mu^q}{\alpha\gamma^p + (1-\alpha)\mu^q}$. For $0 < \alpha < 1$, equality happens if and only if

$$\left(\frac{A}{\gamma}+I\right)^p = \left(\frac{B}{\mu}+I\right)^q \text{ and } \gamma^p = \mu^q \iff (A+\gamma I)^p = (B+\mu I)^q. \tag{33}$$

In particular, for $\gamma = \mu \neq 1$, equality happens if and only if simultaneously

$$p = q \text{ and } A = B. \tag{34}$$

In particular, for $p = q = 1$, we recover Theorem 1 in [17]. From Theorem 5, we immediately have the following result.

Corollary 1. Let $\alpha > 0$, $\beta > 0$. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, for any $p, q \in \mathbb{R}$,

$$\mathrm{det}_X\left(\frac{\alpha(A+\gamma I)^p + \beta(B+\mu I)^q}{\alpha+\beta}\right) \geq \left(\frac{\gamma^p}{\mu^q}\right)^{\frac{\alpha}{\alpha+\beta}-\delta}\mathrm{det}_X(A+\gamma I)^{p\delta}\,\mathrm{det}_X(B+\mu I)^{q(1-\delta)}, \tag{35}$$

where $\delta = \frac{\alpha\gamma^p}{\alpha\gamma^p + \beta\mu^q}$ and $1-\delta = \frac{\beta\mu^q}{\alpha\gamma^p + \beta\mu^q}$. Equality happens if and only if $(A+\gamma I)^p = (B+\mu I)^q$. For $\gamma = \mu \neq 1$, equality happens if and only if simultaneously $p = q$ and $A = B$.

Motivated by Theorem 5 and Corollary 1, we first define the following quantity.

Definition 4. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, for $p, q \in \mathbb{R}$, define

$$D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha\beta}\log\left[\left(\frac{\gamma}{\mu}\right)^{(p+q)\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}\mathrm{det}_X\left(\frac{\alpha\left(\Lambda+\frac{\gamma}{\mu}I\right)^p + \beta\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-q}}{\alpha+\beta}\right)\right], \tag{36}$$

where $\Lambda + \frac{\gamma}{\mu}I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$ and $\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p+q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p+q} + \beta}$.

The following theorem gives sufficient conditions on $p, q \in \mathbb{R}$, with $\alpha > 0$, $\beta > 0$ being fixed, so that for a given pair of operators $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, the quantity $D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)]$ in Definition 4 is nonnegative, with equality if and only if $A = B$ and $\gamma = \mu$.

Theorem 6. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, assume that $p, q \in \mathbb{R}$ satisfy the following conditions:

$$p + q \neq 0, \tag{37}$$

$$\alpha p\left(\frac{\gamma}{\mu}\right)^{p+q} = \beta q. \tag{38}$$

Then the quantity $D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)]$ satisfies

$$D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] \geq 0, \tag{39}$$

$$D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] = 0 \iff A = B,\ \gamma = \mu. \tag{40}$$

Subsequently, we assume that conditions (37) and (38) are satisfied. We see that $p$ and $q$ are not uniquely determined by (38). One way to enforce the uniqueness of $p$ and $q$ is by fixing the sum $p+q$. This is the approach we adopt in this work, which leads to Definition 1.

Theorem 7. Under the hypothesis of Theorem 6, assume further that $p + q = r$, $r \in \mathbb{R}$, $r \neq 0$, $r$ fixed. Under this condition, in Definition 4, we have

$$\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^r}{\alpha\left(\frac{\gamma}{\mu}\right)^r + \beta}, \qquad p = r(1-\delta) = \frac{\beta r}{\alpha\left(\frac{\gamma}{\mu}\right)^r + \beta}, \qquad q = r\delta = \frac{\alpha r\left(\frac{\gamma}{\mu}\right)^r}{\alpha\left(\frac{\gamma}{\mu}\right)^r + \beta}. \tag{41}$$

Plugging the expressions for $p$ and $q$ in Eq. (41) into Definition 4, we obtain Definition 1. Furthermore, the two formulas given in Eqs. (8) and (9) in Definition 1 are equivalent.

We now show how $D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)]$ can be expressed concretely in terms of the Fredholm determinant.

Theorem 8. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, assume that $p, q \in \mathbb{R}$ satisfy conditions (37) and (38) in Theorem 6. Then

$$D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] = \frac{(p+q)\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^p + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha\left(\Lambda+\frac{\gamma}{\mu}I\right)^p + \beta\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^p + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right]. \tag{42}$$

5. Special cases of the Alpha-Beta Log-Determinant divergences

We now describe several important special cases of Definition 1, including the infinite-dimensional affine-invariant Riemannian distance, the infinite-dimensional Alpha Log-Det divergences [17], and the infinite-dimensional Beta Log-Det divergences.

5.1. Affine-invariant Riemannian distance

Let $\mathrm{HS}(\mathcal{H})$ denote the space of Hilbert-Schmidt operators on $\mathcal{H}$, which is defined by

$$\mathrm{HS}(\mathcal{H}) = \{A \in \mathcal{L}(\mathcal{H}) : \|A\|^2_{\mathrm{HS}} = \mathrm{tr}(A^{*}A) < \infty\},$$

where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert-Schmidt norm. If $A$ is Hilbert-Schmidt, then $A$ is compact and possesses a countable set of eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$. If $A$ is furthermore self-adjoint, then the Hilbert-Schmidt norm of $A$ is given by

$$\|A\|^2_{\mathrm{HS}} = \sum_{k=1}^{\infty}\lambda_k^2.$$

We recall the infinite-dimensional Hilbert manifold of positive definite unitized Hilbert-Schmidt operators on $\mathcal{H}$, considered in [18]:

$$\Sigma(\mathcal{H}) = \{A + \gamma I > 0 : A = A^{*},\ A \in \mathrm{HS}(\mathcal{H}),\ \gamma \in \mathbb{R}\}.$$

In the case $\dim(\mathcal{H}) = \infty$, the set $\mathrm{PTr}(\mathcal{H})$ of positive definite unitized trace class operators on $\mathcal{H}$ is a strict subset of $\Sigma(\mathcal{H})$.
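To see why the inclusion is strict, one standard example (not taken from the source, and with $\{\phi_k\}_{k=1}^{\infty}$ assumed to be any orthonormal basis of $\mathcal{H}$) is the diagonal operator with eigenvalues $1/k$:

```latex
A = \sum_{k=1}^{\infty} \frac{1}{k}\, \phi_k \otimes \phi_k, \qquad
\|A\|_{\mathrm{HS}}^2 = \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} < \infty, \qquad
\mathrm{tr}|A| = \sum_{k=1}^{\infty} \frac{1}{k} = \infty .
```

This $A$ is self-adjoint, positive, and Hilbert-Schmidt but not trace class, so $A + \gamma I \in \Sigma(\mathcal{H})$ for every $\gamma > 0$ while $A + \gamma I \notin \mathrm{PTr}(\mathcal{H})$.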
The manifold $\Sigma(\mathcal{H})$ can be equipped with the following Riemannian metric, as formulated by [18]. For each $P \in \Sigma(\mathcal{H})$, on the tangent space $T_P(\Sigma(\mathcal{H})) \cong \mathcal{H}_{\mathbb{R}} = \{A + \gamma I : A = A^{*},\ A \in \mathrm{HS}(\mathcal{H}),\ \gamma \in \mathbb{R}\}$, we define the following inner product:

$$\langle A+\gamma I, B+\mu I\rangle_P = \langle P^{-1/2}(A+\gamma I)P^{-1/2},\ P^{-1/2}(B+\mu I)P^{-1/2}\rangle_{\mathrm{eHS}},$$

where $\langle\cdot,\cdot\rangle_{\mathrm{eHS}}$ is the extended Hilbert-Schmidt inner product, defined by

$$\langle A+\gamma I, B+\mu I\rangle_{\mathrm{eHS}} = \langle A, B\rangle_{\mathrm{HS}} + \gamma\mu.$$

The Riemannian metric given by $\langle\cdot,\cdot\rangle_P$ then makes $\Sigma(\mathcal{H})$ an infinite-dimensional Riemannian manifold. Under this metric, the geodesic distance between $(A+\gamma I)$ and $(B+\mu I)$ is given by

$$d_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)] = \|\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]\|_{\mathrm{eHS}}. \tag{43}$$

We now show that the affine-invariant distance $d_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]$ is a limiting case of $D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]$, as $\alpha \to 0$, $\beta \to 0$. In this section, we consider $\beta = \alpha$, in which case Definition 1 reduces to the following.

Definition 5. In Definition 1, with $\alpha = \beta$, we have

$$D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha^2}\log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta-\frac{1}{2}\right)}\mathrm{det}_X\left(\frac{\left(\Lambda+\frac{\gamma}{\mu}I\right)^{r(1-\delta)} + \left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r\delta}}{2}\right)\right], \tag{44}$$

where $\delta = \frac{\left(\frac{\gamma}{\mu}\right)^r}{\left(\frac{\gamma}{\mu}\right)^r + 1}$ and $1-\delta = \frac{1}{\left(\frac{\gamma}{\mu}\right)^r + 1}$.

By Theorem 8, we have the following formula, which expresses $D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)]$ concretely in terms of the Fredholm determinant:

$$D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] = \frac{r\left(\delta-\frac{1}{2}\right)}{\alpha^2}\log\frac{\gamma}{\mu} + \frac{1}{\alpha^2}\log\left(\frac{\left(\frac{\gamma}{\mu}\right)^p + \left(\frac{\gamma}{\mu}\right)^{-q}}{2}\right) + \frac{1}{\alpha^2}\log\det\left[\frac{\left(\Lambda+\frac{\gamma}{\mu}I\right)^p + \left(\Lambda+\frac{\gamma}{\mu}I\right)^{-q}}{\left(\frac{\gamma}{\mu}\right)^p + \left(\frac{\gamma}{\mu}\right)^{-q}}\right], \tag{45}$$

where $\delta = \frac{\left(\frac{\gamma}{\mu}\right)^r}{\left(\frac{\gamma}{\mu}\right)^r + 1}$, $1-\delta = \frac{1}{\left(\frac{\gamma}{\mu}\right)^r + 1}$, $p = r(1-\delta)$, and $q = r\delta$. The following is the main result in this section.

Theorem 9 (Affine-invariant Riemannian distance). Let $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$, $r'(0) \neq 0$, and $r(\alpha) \neq 0$ for $\alpha \neq 0$. Then

$$\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] = \frac{[r'(0)]^2}{8} d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]. \tag{46}$$

In particular, for $r = 2\alpha$, we have

$$\lim_{\alpha\to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]. \tag{47}$$

Remark 8. We stress that, as they are currently stated, the limits in Theorem 9 are valid for $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, that is, $A$ and $B$ must be trace class operators. The generalization of Theorem 9 to the entire Hilbert manifold $\Sigma(\mathcal{H})$, where $A$ and $B$ are Hilbert-Schmidt operators, will be presented in an upcoming work.

5.2. Infinite-dimensional Alpha Log-Determinant divergences

We now show that the formulation for the infinite-dimensional Alpha Log-Determinant divergences in [17] is a special case of the present formulation, with $\beta = 1-\alpha$ and $r = \pm 1$. Let $\dim(\mathcal{H}) = \infty$. We recall that for $-1 < \alpha < 1$, the Log-Det $\alpha$-divergence $d^{\alpha}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]$ for $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$ is defined in [17] to be

$$d^{\alpha}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = \frac{4}{1-\alpha^2}\log\left[\frac{\mathrm{det}_X\left(\frac{1-\alpha}{2}(A+\gamma I) + \frac{1+\alpha}{2}(B+\mu I)\right)}{\mathrm{det}_X(A+\gamma I)^{q}\,\mathrm{det}_X(B+\mu I)^{1-q}}\left(\frac{\gamma}{\mu}\right)^{q-\frac{1-\alpha}{2}}\right], \tag{48}$$

where $q = \frac{(1-\alpha)\gamma}{(1-\alpha)\gamma + (1+\alpha)\mu}$ and $1-q = \frac{(1+\alpha)\mu}{(1-\alpha)\gamma + (1+\alpha)\mu}$, with the limiting cases $\alpha = \pm 1$ given by

$$d^{1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = \left(\frac{\gamma}{\mu}-1\right)\log\frac{\gamma}{\mu} + \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I) - I] - \frac{\gamma}{\mu}\log\mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)], \tag{49}$$

$$d^{-1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = \left(\frac{\mu}{\gamma}-1\right)\log\frac{\mu}{\gamma} + \mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I) - I] - \frac{\mu}{\gamma}\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]. \tag{50}$$

Definition 6. In Definition 1, with $0 < \alpha < 1$ and $\beta = 1-\alpha$, we have

$$D^{(\alpha,1-\alpha)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\left(\frac{\gamma}{\mu}\right)^{r(\delta-\alpha)}\mathrm{det}_X\left(\alpha\left(\Lambda+\frac{\gamma}{\mu}I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r\delta}\right)\right], \tag{51}$$

where $\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^r}{\alpha\left(\frac{\gamma}{\mu}\right)^r + 1-\alpha}$ and $1-\delta = \frac{1-\alpha}{\alpha\left(\frac{\gamma}{\mu}\right)^r + 1-\alpha}$.
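The link between Eq. (48) and Eq. (51) with $r = \pm 1$, announced at the start of this subsection, can be checked numerically in a restricted setting. The sketch below is illustrative only and rests on loud assumptions: $A = \mathrm{diag}(a_1,\ldots,a_n,0,\ldots)$ and $B = \mathrm{diag}(b_1,\ldots,b_n,0,\ldots)$ are diagonal finite-rank operators, so that $\mathrm{det}_X(A+\gamma I) = \gamma\prod_k(1 + a_k/\gamma)$ is a finite product. All function names and data are hypothetical.

```python
import math

def log_detx(a, gamma):
    """log det_X(A + gamma I) for A = diag(a_1..a_n, 0, ...), gamma > 0."""
    return math.log(gamma) + sum(math.log1p(ak / gamma) for ak in a)

def d_logdet(a, gamma, b, mu, al):
    """Eq. (48) for -1 < al < 1, in the diagonal finite-rank model."""
    w1, w2 = (1.0 - al) / 2.0, (1.0 + al) / 2.0
    q = w1 * gamma / (w1 * gamma + w2 * mu)
    mix = [w1 * ak + w2 * bk for ak, bk in zip(a, b)]  # trace class part of the mixture
    num = log_detx(mix, w1 * gamma + w2 * mu)
    den = q * log_detx(a, gamma) + (1.0 - q) * log_detx(b, mu)
    return 4.0 / (1.0 - al**2) * (num - den + (q - w1) * math.log(gamma / mu))

def ab_alpha_divergence(a, gamma, b, mu, alpha, r):
    """Eq. (51), i.e. Definition 1 with beta = 1 - alpha, in the same model."""
    beta = 1.0 - alpha
    t = gamma / mu
    delta = alpha * t**r / (alpha * t**r + beta)
    p, q = r * (1.0 - delta), r * delta

    def g(x):
        # alpha + beta = 1, so no normalisation is needed here.
        return alpha * x**p + beta * x**(-q)

    c = g(t)  # scalar part of the operator inside det_X
    log_det = math.log(c) + sum(
        math.log(g((ak + gamma) / (bk + mu)) / c) for ak, bk in zip(a, b)
    )
    return (r * (delta - alpha) * math.log(t) + log_det) / (alpha * beta)

# Hypothetical data.
a, gamma = [1.0, 0.5, 0.1], 2.0
b, mu = [0.3, 0.8, 0.0], 0.5
alpha = 0.3

plus = ab_alpha_divergence(a, gamma, b, mu, alpha, 1.0)
minus = ab_alpha_divergence(a, gamma, b, mu, alpha, -1.0)
ref_plus = d_logdet(a, gamma, b, mu, 1.0 - 2.0 * alpha)
ref_minus = d_logdet(a, gamma, b, mu, 2.0 * alpha - 1.0)
```

In this model `plus` matches `ref_plus` and `minus` matches `ref_minus` up to rounding, consistent with Eq. (16). The agreement is only a sanity check for commuting finite-rank operators and does not replace the proof.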
The follo wing result sho ws that D ( α, 1 − α ) r [( A + γ I ) , ( B + µI )] for the cases r = ± 1 are precisely d 1 − 2 α logdet [( A + γ I ) , ( B + µI )] and d 2 α − 1 logdet [( A + γ I ) , ( B + µI )] , respecti vely . Theorem 10 ( Alpha Log-Determinant Divergences ) . Let 0 < α < 1 be fixed. F or ( A + γ I ) , ( B + µI ) ∈ PT r( H ) , D ( α, 1 − α ) 1 [( A + γ I ) , ( B + µI )] = δ − α α (1 − α ) log γ µ + 1 α (1 − α ) log det X [ α ( A + γ I ) + (1 − α )( B + µI )] det X ( A + γ I ) δ det X ( B + µI ) 1 − δ (52) = d 1 − 2 α logdet [( A + γ I ) , ( B + µI )] , wher e δ = αγ αγ +(1 − α ) µ . Similarly , D ( α, 1 − α ) − 1 [( A + γ I ) , ( B + µI )] = d 2 α − 1 logdet [( A + γ I ) , ( B + µI )] . (53) At the endpoints α = 0 and α = 1 , lim α → 1 D ( α, 1 − α ) 1 [( A + γ I ) , ( B + µI )] = d − 1 logdet [( A + γ I ) , ( B + µI )] (54) lim α → 0 D ( α, 1 − α ) 1 [( A + γ I ) , ( B + µI )] = d 1 logdet [( A + γ I ) , ( B + µI )] . (55) In particular , in Theorem 10, for γ = µ , we have δ = α , and D ( α, 1 − α ) 1 [( A + γ I ) , ( B + γ I )] = 1 α (1 − α ) log det X [ α ( A + γ I ) + (1 − α )( B + γ I )] det X ( A + γ I ) α det X ( B + γ I ) 1 − α . (56) This is the direct generalization of the finite-dimensional formula giv en by Eq. (6) in [14]. Remark 9 ( Beta Log-Determinant Diver gences ) . In the finite-dimensional setting in [15], the authors call D 1 ,β ( A, B ) the Beta Log-Determinant div ergence between 18 A, B ∈ Sym ++ ( n ) . Similarly , in the case dim( H ) = ∞ , let β > 0 be fixed and let r ∈ R , r 6 = 0 be fixed. For ( A + γ I ) , ( B + µI ) ∈ PT r( H ) , we then have the corresponding infinite-dimensional Beta Log-Determinant div ergence D (1 ,β ) r [( A + γ I ) , ( B + µI )] = 1 β log " γ µ r ( δ − 1 1+ β ) det X (Λ + γ µ I ) r (1 − δ ) + β (Λ + γ µ I ) − rδ 1 + β !# , (57) where Λ + γ µ I = ( B + µI ) − 1 / 2 ( A + γ I )( B + µI ) − 1 / 2 , δ = ( γ µ ) r ( γ µ ) r + β , 1 − δ = β ( γ µ ) r + β . 
However, we do not explore this divergence in detail in this work.

5.3. Other limiting cases

We consider next two other limiting cases, namely $\beta \to 0$ when $\alpha > 0$ is fixed, and $\alpha \to 0$ when $\beta > 0$ is fixed. In particular, our definitions of $D^{(\alpha,0)}_r[(A+\gamma I),(B+\mu I)]$, $\alpha > 0$, and $D^{(0,\beta)}_r[(A+\gamma I),(B+\mu I)]$, $\beta > 0$, as given in Definition 2, are based on the respective limits in Theorems 11 and 12 below.

Theorem 11 (Limiting case $\alpha > 0$, $\beta \to 0$). Let $\alpha > 0$ be fixed. Assume that $r = r(\beta)$ is smooth, with $r(0) = r(\beta = 0)$. Then
\[
\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{r(0)}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r(0)} - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r(0)} \log \mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}. \tag{58}
\]

Theorem 12 (Limiting case $\alpha \to 0$, $\beta > 0$). Let $\beta > 0$ be fixed. Assume that $r = r(\alpha)$ is smooth, with $r(0) = r(\alpha = 0)$. Then
\[
\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{r(0)}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{r(0)} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)^{-1}(A+\gamma I)]^{r(0)} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{r(0)} \log \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{r(0)}. \tag{59}
\]

Special cases. Let us now describe several special cases of Theorems 11 and 12, including their specialization to the finite-dimensional setting.

(i) For $\gamma = \mu$, we have
\[
\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\gamma I)] = \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\gamma I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\log \mathrm{det}_X[(A+\gamma I)^{-1}(B+\gamma I)]^{r(0)}, \tag{60}
\]
\[
\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\gamma I)] = \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\gamma I)^{-1}(A+\gamma I)]^{r(0)} - I\right) - \frac{1}{\beta^2}\log \mathrm{det}_X[(B+\gamma I)^{-1}(A+\gamma I)]^{r(0)}. \tag{61}
\]
In particular, for $r = \alpha + \beta$, we have $r(\beta = 0) = \alpha$, $r(\alpha = 0) = \beta$, so that
\[
\lim_{\beta \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[(A+\gamma I),(B+\gamma I)] = \frac{1}{\alpha^2}\left\{ \mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\gamma I)]^{\alpha} - I\right) - \alpha \log \mathrm{det}_X[(A+\gamma I)^{-1}(B+\gamma I)] \right\}, \tag{62}
\]
\[
\lim_{\alpha \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[(A+\gamma I),(B+\gamma I)] = \frac{1}{\beta^2}\left\{ \mathrm{tr}_X\left([(B+\gamma I)^{-1}(A+\gamma I)]^{\beta} - I\right) - \beta \log \mathrm{det}_X[(B+\gamma I)^{-1}(A+\gamma I)] \right\}. \tag{63}
\]
These are the direct generalizations of the corresponding formulas in the finite-dimensional setting. In fact, for $A, B \in \mathrm{Sym}^{++}(n)$, $n \in \mathbb{N}$, by setting $\gamma = 0$, we obtain
\[
\lim_{\beta \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[A,B] = \frac{1}{\alpha^2}\left\{ \mathrm{tr}[(A^{-1}B)^{\alpha} - I] - \alpha \log\det(A^{-1}B) \right\}, \tag{64}
\]
\[
\lim_{\alpha \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[A,B] = \frac{1}{\beta^2}\left\{ \mathrm{tr}[(B^{-1}A)^{\beta} - I] - \beta \log\det(B^{-1}A) \right\}. \tag{65}
\]
These are precisely the finite-dimensional expressions given by Eqs. (5) and (6), which are Eqs. (23) and (22) in [15], respectively.

(ii) If $r(0) = r(\beta = 0) = 1$, we have for $\alpha > 0$ fixed,
\[
\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma} - 1\right)\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\left\{ \mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I) - I] - \frac{\mu}{\gamma}\log \mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)] \right\} = \frac{1}{\alpha^2}\, d^{-1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]. \tag{66}
\]
Similarly, if $r(0) = r(\alpha = 0) = 1$, we have for $\beta > 0$ fixed,
\[
\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{1}{\beta^2}\left(\frac{\gamma}{\mu} - 1\right)\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\left\{ \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I) - I] - \frac{\gamma}{\mu}\log \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)] \right\} = \frac{1}{\beta^2}\, d^{1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]. \tag{67}
\]
In particular, if $r \equiv 1$ as a constant function, then with $\beta = 1-\alpha$, we have
\[
\lim_{\alpha \to 1} D^{(\alpha,1-\alpha)}_1[(A+\gamma I),(B+\mu I)] = d^{-1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)],
\]
\[
\lim_{\alpha \to 0} D^{(\alpha,1-\alpha)}_1[(A+\gamma I),(B+\mu I)] = d^{1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)],
\]
which are precisely the limiting cases stated in Eqs. (54) and (55) in Theorem 10.

6. Properties of the Alpha-Beta Log-Determinant divergences

The following results establish several important properties of $D^{(\alpha,\beta)}_r$ as defined above, which generalize those from both the finite-dimensional setting [14, 15] and the infinite-dimensional Alpha Log-Det divergences [17].

Theorem 13 (Dual symmetry).
\[
D^{(\beta,\alpha)}_r[(B+\mu I),(A+\gamma I)] = D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]. \tag{68}
\]
In particular, for $\beta = \alpha$, we have
\[
D^{(\alpha,\alpha)}_r[(B+\mu I),(A+\gamma I)] = D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)]. \tag{69}
\]

Special case: Dual symmetry of the infinite-dimensional Alpha Log-Det divergences. By Theorem 10, we have for $0 \leq \alpha \leq 1$,
\[
D^{(\alpha,1-\alpha)}_1[(A+\gamma I),(B+\mu I)] = D^{(1-\alpha,\alpha)}_1[(B+\mu I),(A+\gamma I)] \iff d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = d^{-(1-2\alpha)}_{\mathrm{logdet}}[(B+\mu I),(A+\gamma I)]. \tag{70}
\]
This is precisely the dual symmetry of the infinite-dimensional Alpha Log-Det divergences (Theorem 4 in [17]).

Theorem 14 (Dual invariance under inversion).
\[
D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = D^{(\alpha,\beta)}_{-r}[(A+\gamma I),(B+\mu I)]. \tag{71}
\]

Special case: Dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences. By Theorem 10, we have
\[
D^{(\alpha,1-\alpha)}_1[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I),(B+\mu I)] \iff d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = d^{-(1-2\alpha)}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)]. \tag{72}
\]
This is precisely the dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences (Theorem 5 in [17]).

Theorem 15 (Affine invariance). For any $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$ and any invertible $(C+\nu I) \in \mathrm{Tr}_X(\mathcal{H})$, $\nu \neq 0$,
\[
D^{(\alpha,\beta)}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}, (C+\nu I)(B+\mu I)(C+\nu I)^{*}] = D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]. \tag{73}
\]

Theorem 16 (Invariance under unitary transformations).
For any $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$ and any $C \in \mathcal{L}(\mathcal{H})$ with $CC^{*} = C^{*}C = I$,
\[
D^{(\alpha,\beta)}_r[C(A+\gamma I)C^{*}, C(B+\mu I)C^{*}] = D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]. \tag{74}
\]

Theorem 17.
\[
D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = D^{(\alpha,\beta)}_r\left[\left(\Lambda + \frac{\gamma}{\mu} I\right), I\right]. \tag{75}
\]

Theorem 18. Let $\omega \in \mathbb{R}$, $\omega \neq 0$, be arbitrary. Then
\[
D^{(\omega\alpha,\omega\beta)}_{\omega r}[(A+\gamma I),(B+\mu I)] = \frac{1}{\omega^2}\, D^{(\alpha,\beta)}_r\left[\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\omega}, I\right]. \tag{76}
\]

The following two properties are important for proving that the square root function $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$. We focus on the case $\alpha > 0$, since for $\alpha = 0$, $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\mu I)]} = \frac{1}{\sqrt{2}}\, d_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)]$ is automatically a metric on $\mathrm{PTr}(\mathcal{H})$.

Theorem 19 (Convergence in trace norm). Let $\alpha > 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B : \mathcal{H} \to \mathcal{H}$ be self-adjoint, trace class operators such that $(I+A) > 0$, $(I+B) > 0$. Let $\{A_n\}_{n \in \mathbb{N}}$, $\{B_n\}_{n \in \mathbb{N}}$ be sequences of self-adjoint, trace class operators such that $\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0$, $\lim_{n \to \infty} \|B_n - B\|_{\mathrm{tr}} = 0$. Then
\[
\lim_{n \to \infty} D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)] = D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+B)]. \tag{77}
\]

Theorem 20 (Triangle inequality). Let $\alpha > 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $\gamma \in \mathbb{R}$, $\gamma > 0$, be fixed. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint, trace class operators such that $(A+\gamma I) > 0$, $(B+\gamma I) > 0$, $(C+\gamma I) > 0$. Then
\[
\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(C+\gamma I)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(C+\gamma I),(B+\gamma I)]}. \tag{78}
\]

In particular, for $\alpha = 1/2$ and $\gamma = 1$, we obtain the following triangle inequality.

Theorem 21 (Triangle inequality: square root of symmetric Stein divergence). Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators with $A + I > 0$, $B + I > 0$, $C + I > 0$.
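Then
\[
\sqrt{\log\frac{\det\left(\frac{A+B}{2} + I\right)}{\sqrt{\det(A+I)\det(B+I)}}} \leq \sqrt{\log\frac{\det\left(\frac{A+C}{2} + I\right)}{\sqrt{\det(A+I)\det(C+I)}}} + \sqrt{\log\frac{\det\left(\frac{C+B}{2} + I\right)}{\sqrt{\det(C+I)\det(B+I)}}}. \tag{79}
\]

Theorem 22 (Diagonalization). Let $\alpha \geq 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $\gamma \in \mathbb{R}$, $\gamma > 0$, be fixed. Let $A, B : \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators such that $A + \gamma I > 0$, $B + \gamma I > 0$. Let $\mathrm{Eig}(A), \mathrm{Eig}(B) : \ell^2 \to \ell^2$ be diagonal operators with the diagonals consisting of the eigenvalues of $A$ and $B$, respectively, in decreasing order. Then
\[
D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A) + \gamma I),(\mathrm{Eig}(B) + \gamma I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]. \tag{80}
\]

7. Alpha-Beta Log-Det divergences between RKHS covariance operators

Let $\mathcal{X}$ be an arbitrary non-empty set. We now compute the Alpha-Beta Log-Det divergences between covariance operators on an RKHS induced by a positive definite kernel $K$ on $\mathcal{X} \times \mathcal{X}$. In this case, we have explicit formulas for $D^{(\alpha,\beta)}_r$ via the corresponding Gram matrices. We recall that similar formulas exist in the cases of the Log-Hilbert-Schmidt distance [19], the infinite-dimensional affine-invariant Riemannian distance [18, 20], and the infinite-dimensional Alpha Log-Det divergences [17]. We first prove the following result.

Theorem 23. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Assume that $\dim(\mathcal{H}_2) = \infty$. Let $\alpha, \beta > 0$ be fixed. For any $r \in \mathbb{R}$, $r \neq 0$, for any $\gamma > 0$, $\mu > 0$,
\[
D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}),(BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{r(\delta - \frac{\alpha}{\alpha+\beta})}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\frac{\gamma}{\mu})^{p}(C + I_{\mathcal{H}_1} \otimes I_3)^{p} + \beta(\frac{\gamma}{\mu})^{-q}(C + I_{\mathcal{H}_1} \otimes I_3)^{-q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right], \tag{81}
\]
where $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$, $p = r(1-\delta)$, $q = r\delta$, and
\[
C = \begin{pmatrix}
\frac{A^{*}A}{\gamma} & -\frac{A^{*}B}{\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{A^{*}AA^{*}B}{\gamma\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\
\frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\
\frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1}
\end{pmatrix}. \tag{82}
\]

For comparison, the following is the corresponding version of $D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}),(BB^{*} + \mu I_{\mathcal{H}_2})]$, using the finite-dimensional formula given in Eq. (19), when $\dim(\mathcal{H}_2) < \infty$.

Theorem 24. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Assume that $\dim(\mathcal{H}_2) < \infty$. Let $\alpha, \beta > 0$ be fixed. For any $r \in \mathbb{R}$, $r \neq 0$, for any $\gamma > 0$, $\mu > 0$,
\[
D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}),(BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_2) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\frac{\gamma}{\mu})^{p}(C + I_{\mathcal{H}_1} \otimes I_3)^{p} + \beta(\frac{\gamma}{\mu})^{-q}(C + I_{\mathcal{H}_1} \otimes I_3)^{-q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right], \tag{83}
\]
where $p = \frac{r\beta}{\alpha+\beta}$, $q = \frac{r\alpha}{\alpha+\beta}$, and $C$ is as given in Theorem 23.

Let us briefly recall the RKHS covariance operators discussed in [17]. Let $\mathbf{x} = [x_1, \ldots, x_m]$ be a data matrix randomly sampled from $\mathcal{X}$ according to a Borel probability distribution $\rho$, where $m \in \mathbb{N}$ is the number of observations. Let $K$ be a positive definite kernel on $\mathcal{X} \times \mathcal{X}$ and $\mathcal{H}_K$ its induced reproducing kernel Hilbert space (RKHS). Let $\Phi : \mathcal{X} \to \mathcal{H}_K$ be the corresponding feature map, so that $K(x,y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_K}$ for all pairs $(x,y) \in \mathcal{X} \times \mathcal{X}$. The feature map $\Phi$ gives rise to the bounded linear operator
\[
\Phi(\mathbf{x}) : \mathbb{R}^m \to \mathcal{H}_K, \quad \Phi(\mathbf{x})\mathbf{b} = \sum_{j=1}^{m} b_j \Phi(x_j), \quad \mathbf{b} \in \mathbb{R}^m. \tag{84}
\]
The operator $\Phi(\mathbf{x})$ can also be viewed as the (potentially infinite) mapped data matrix $\Phi(\mathbf{x}) = [\Phi(x_1), \ldots, \Phi(x_m)]$ of size $\dim(\mathcal{H}_K) \times m$ in the feature space $\mathcal{H}_K$, with the $j$th column being $\Phi(x_j)$. The corresponding empirical covariance operator for $\Phi(\mathbf{x})$ is defined to be
\[
C_{\Phi(\mathbf{x})} = \frac{1}{m}\Phi(\mathbf{x}) J_m \Phi(\mathbf{x})^{*} : \mathcal{H}_K \to \mathcal{H}_K, \tag{85}
\]
where $\Phi(\mathbf{x})^{*} : \mathcal{H}_K \to \mathbb{R}^m$ is the adjoint operator of $\Phi(\mathbf{x})$ and $J_m$ is the centering matrix, defined by $J_m = I_m - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^{T}$ with $\mathbf{1}_m = (1, \ldots, 1)^{T} \in \mathbb{R}^m$.

Let $\mathbf{x} = [x_i]_{i=1}^{m}$, $\mathbf{y} = [y_i]_{i=1}^{m}$, $m \in \mathbb{N}$, be two random data matrices sampled from $\mathcal{X}$ according to two Borel probability distributions and $C_{\Phi(\mathbf{x})}$, $C_{\Phi(\mathbf{y})}$ be the corresponding covariance operators induced by the kernel $K$. Let $K[\mathbf{x}]$, $K[\mathbf{y}]$, and $K[\mathbf{x},\mathbf{y}]$ be the $m \times m$ Gram matrices defined by
\[
(K[\mathbf{x}])_{ij} = K(x_i, x_j), \quad (K[\mathbf{y}])_{ij} = K(y_i, y_j), \quad (K[\mathbf{x},\mathbf{y}])_{ij} = K(x_i, y_j), \quad 1 \leq i, j \leq m. \tag{86}
\]
Let $A = \frac{1}{\sqrt{m}}\Phi(\mathbf{x}) J_m : \mathbb{R}^m \to \mathcal{H}_K$, $B = \frac{1}{\sqrt{m}}\Phi(\mathbf{y}) J_m : \mathbb{R}^m \to \mathcal{H}_K$, so that
\[
AA^{*} = C_{\Phi(\mathbf{x})}, \quad BB^{*} = C_{\Phi(\mathbf{y})}, \quad A^{*}A = \frac{1}{m} J_m K[\mathbf{x}] J_m, \quad B^{*}B = \frac{1}{m} J_m K[\mathbf{y}] J_m, \quad A^{*}B = \frac{1}{m} J_m K[\mathbf{x},\mathbf{y}] J_m, \quad B^{*}A = \frac{1}{m} J_m K[\mathbf{y},\mathbf{x}] J_m. \tag{87}
\]
Theorems 23 and 24 can then be applied to give closed form formulas for the divergences between $(C_{\Phi(\mathbf{x})} + \gamma I)$ and $(C_{\Phi(\mathbf{y})} + \mu I)$, as follows.

Theorem 25 (Alpha-Beta Log-Det divergences between RKHS covariance operators, infinite-dimensional version). Let $\alpha, \beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$, be fixed. Assume that $\dim(\mathcal{H}_K) = \infty$. For any $\gamma > 0$, $\mu > 0$, the divergence $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I),(C_{\Phi(\mathbf{y})} + \mu I)]$ is given by
\[
D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I),(C_{\Phi(\mathbf{y})} + \mu I)] = \frac{r(\delta - \frac{\alpha}{\alpha+\beta})}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\frac{\gamma}{\mu})^{p}(C + I_{3m})^{p} + \beta(\frac{\gamma}{\mu})^{-q}(C + I_{3m})^{-q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right], \tag{88}
\]
where $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$, $p = r(1-\delta)$, $q = r\delta$, and
\[
C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{21} & C_{22} & C_{23} \end{pmatrix} \in \mathbb{R}^{3m \times 3m}. \tag{89}
\]
Here the sub-matrices $C_{ij}$, $i = 1, 2$, $j = 1, 2, 3$, each of size $m \times m$, are given by
\[
C_{11} = \frac{1}{\gamma m} J_m K[\mathbf{x}] J_m, \tag{90}
\]
\[
C_{12} = -\frac{1}{\sqrt{\gamma\mu}\, m} J_m K[\mathbf{x},\mathbf{y}] J_m \left(I_m + \frac{1}{\mu m} J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{91}
\]
\[
C_{13} = -\frac{1}{\gamma\sqrt{\gamma\mu}\, m^2} J_m K[\mathbf{x}] J_m K[\mathbf{x},\mathbf{y}] J_m \left(I_m + \frac{1}{\mu m} J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{92}
\]
\[
C_{21} = \frac{1}{\sqrt{\gamma\mu}\, m} J_m K[\mathbf{y},\mathbf{x}] J_m, \tag{93}
\]
\[
C_{22} = -\frac{1}{\mu m} J_m K[\mathbf{y}] J_m \left(I_m + \frac{1}{\mu m} J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{94}
\]
\[
C_{23} = -\frac{1}{\gamma\mu m^2} J_m K[\mathbf{y},\mathbf{x}] J_m K[\mathbf{x},\mathbf{y}] J_m \left(I_m + \frac{1}{\mu m} J_m K[\mathbf{y}] J_m\right)^{-1}. \tag{95}
\]

Theorem 26 (Alpha-Beta Log-Det divergences between RKHS covariance operators, finite-dimensional version). Let $\alpha, \beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$, be fixed. Assume that $\dim(\mathcal{H}_K) < \infty$. For any $\gamma > 0$, $\mu > 0$, the divergence $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I),(C_{\Phi(\mathbf{y})} + \mu I)]$ is given by
\[
D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I),(C_{\Phi(\mathbf{y})} + \mu I)] = \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_K) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\frac{\gamma}{\mu})^{p}(C + I_{3m})^{p} + \beta(\frac{\gamma}{\mu})^{-q}(C + I_{3m})^{-q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right], \tag{96}
\]
where $p = \frac{r\beta}{\alpha+\beta}$, $q = \frac{r\alpha}{\alpha+\beta}$, and $C$ is as given in Theorem 25.

Remark 10. The closed form formulas for $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I),(C_{\Phi(\mathbf{y})} + \mu I)]$ given in Eqs. (88) and (96) in Theorems 25 and 26, respectively, coincide if and only if $\gamma = \mu$. If $\gamma \neq \mu$, then the right hand side of Eq. (96) approaches infinity when $\dim(\mathcal{H}_K) \to \infty$. Thus in general, the infinite-dimensional version is not obtainable as the limit of the finite-dimensional version as the dimension goes to infinity.

Remark 11. The closed form formulas given by Eqs. (88) and (96) in Theorems 25 and 26, respectively, are derived under more general conditions than those in [17] and are consequently more general but more complicated than the corresponding closed form formulas for the Alpha Log-Det divergences in [17] (see Theorems 12, 13, 15, 16 in [17]).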
Thus for practical applications involving the Alpha Log-Det divergences, the corresponding closed form formulas in [17] should be employed.

Appendix A. Proofs of main results

Appendix A.1. Proofs for the general Alpha-Beta Log-Determinant divergences

In this section, we prove Lemma 1, Proposition 1, and Theorems 5, 6, 7, and 8.

Proof of Lemma 1. Since any bounded operator $A$ commutes with the identity operator $I$, we have
\[
\exp(A + \gamma I) = e^{\gamma}\exp(A) = e^{\gamma}\left(I + \sum_{j=1}^{\infty}\frac{A^j}{j!}\right) = e^{\gamma} I + e^{\gamma}\sum_{j=1}^{\infty}\frac{A^j}{j!},
\]
where $\sum_{j=1}^{\infty}\frac{A^j}{j!}$ is trace class, since
\[
\left\|\sum_{j=1}^{\infty}\frac{A^j}{j!}\right\|_{\mathrm{tr}} \leq \sum_{j=1}^{\infty}\frac{\|A\|_{\mathrm{tr}}^{j}}{j!} = \exp(\|A\|_{\mathrm{tr}}) - 1 < \infty.
\]
Thus $\exp(A + \gamma I) \in \mathrm{Tr}_X(\mathcal{H})$. This completes the proof.

Proof of Proposition 1. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, we have $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} \in \mathrm{PTr}(\mathcal{H})$ and the logarithm $\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined. By the discussion preceding Proposition 1, we have
\[
\begin{aligned}
\log[(A+\gamma I)(B+\mu I)^{-1}] &= \log[(B+\mu I)^{1/2}(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}(B+\mu I)^{-1/2}] \\
&= (B+\mu I)^{1/2}\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}](B+\mu I)^{-1/2} \\
&= (B+\mu I)^{1/2}\log\left(\Lambda + \frac{\gamma}{\mu} I\right)(B+\mu I)^{-1/2} \in \mathrm{Tr}_X(\mathcal{H}).
\end{aligned}
\]
For the power function, we have
\[
\begin{aligned}
[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} &= \exp(\alpha\log[(A+\gamma I)(B+\mu I)^{-1}]) \\
&= \exp\left((B+\mu I)^{1/2}\,\alpha\log\left(\Lambda + \frac{\gamma}{\mu} I\right)(B+\mu I)^{-1/2}\right) \\
&= (B+\mu I)^{1/2}\exp\left(\alpha\log\left(\Lambda + \frac{\gamma}{\mu} I\right)\right)(B+\mu I)^{-1/2} \\
&= (B+\mu I)^{1/2}\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha}(B+\mu I)^{-1/2}.
\end{aligned}
\]
For the sum of two power functions, we then have
\[
\frac{\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta[(A+\gamma I)(B+\mu I)^{-1}]^{q}}{\alpha+\beta} = (B+\mu I)^{1/2}\left[\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{q}}{\alpha+\beta}\right](B+\mu I)^{-1/2}.
\]
By Lemma 5 in [17], $\mathrm{det}_X[C(A+\gamma I)C^{-1}] = \mathrm{det}_X(A+\gamma I)$ for any invertible operator $C \in \mathcal{L}(\mathcal{H})$.
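It follows that
\[
\mathrm{det}_X\left[\frac{\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta[(A+\gamma I)(B+\mu I)^{-1}]^{q}}{\alpha+\beta}\right] = \mathrm{det}_X\left[\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{q}}{\alpha+\beta}\right].
\]
This completes the proof.

Proof of Theorem 5. By definition of the power function, we have
\[
\begin{aligned}
\alpha(A+\gamma I)^{p} + (1-\alpha)(B+\mu I)^{q} &= \alpha\exp[p\log(A+\gamma I)] + (1-\alpha)\exp[q\log(B+\mu I)] \\
&= \alpha\exp\left[p\log\left(\frac{A}{\gamma} + I\right) + p(\log\gamma) I\right] + (1-\alpha)\exp\left[q\log\left(\frac{B}{\mu} + I\right) + q(\log\mu) I\right] \\
&= \alpha\gamma^{p}\left(\frac{A}{\gamma} + I\right)^{p} + (1-\alpha)\mu^{q}\left(\frac{B}{\mu} + I\right)^{q}.
\end{aligned}
\]
It follows that for $\delta = \frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}$, $1-\delta = \frac{(1-\alpha)\mu^{q}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}$, we have
\[
\begin{aligned}
\mathrm{det}_X[\alpha(A+\gamma I)^{p} + (1-\alpha)(B+\mu I)^{q}] &= [\alpha\gamma^{p} + (1-\alpha)\mu^{q}]\det\left[\frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}\left(\frac{A}{\gamma} + I\right)^{p} + \frac{(1-\alpha)\mu^{q}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}\left(\frac{B}{\mu} + I\right)^{q}\right] \\
&\geq [\alpha\gamma^{p} + (1-\alpha)\mu^{q}]\det\left(\frac{A}{\gamma} + I\right)^{p\delta}\det\left(\frac{B}{\mu} + I\right)^{q(1-\delta)} \quad \text{by Proposition 7 in [17]} \\
&\geq \gamma^{p\alpha}\mu^{(1-\alpha)q}\det\left(\frac{A}{\gamma} + I\right)^{p\delta}\det\left(\frac{B}{\mu} + I\right)^{q(1-\delta)} \quad \text{by Ky Fan's Inequality applied to } \alpha\gamma^{p} + (1-\alpha)\mu^{q} \\
&= \gamma^{p(\alpha-\delta)}\mu^{-q(\alpha-\delta)}\,\mathrm{det}_X(A+\gamma I)^{p\delta}\,\mathrm{det}_X(B+\mu I)^{q(1-\delta)} \\
&= \left(\frac{\gamma^{p}}{\mu^{q}}\right)^{\alpha-\delta}\mathrm{det}_X(A+\gamma I)^{p\delta}\,\mathrm{det}_X(B+\mu I)^{q(1-\delta)}.
\end{aligned}
\]
For $0 < \alpha < 1$, equality happens if and only if simultaneously we have
\[
\left(\frac{A}{\gamma} + I\right)^{p} = \left(\frac{B}{\mu} + I\right)^{q} \quad \text{and} \quad \gamma^{p} = \mu^{q} \iff (A+\gamma I)^{p} = (B+\mu I)^{q}.
\]
In particular, for $\gamma = \mu$, the condition $\gamma^{p} = \mu^{q}$ becomes
\[
\gamma^{p} = \gamma^{q} \iff \gamma^{p-q} = 1 \iff p = q \text{ if } \gamma \neq 1.
\]
With the conditions $\gamma = \mu \neq 1$ and $p = q$, we then have
\[
\left(\frac{A}{\gamma} + I\right)^{p} = \left(\frac{B}{\gamma} + I\right)^{p} \iff A = B.
\]
This completes the proof of the theorem.

Proof of Theorem 6. Recall that we write the operator $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$ in the form
\[
(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \Lambda + (\gamma/\mu) I \in \mathrm{PTr}(\mathcal{H}).
\]
Its inverse has the form
\[
(B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [\Lambda + (\gamma/\mu) I]^{-1} = \frac{\mu}{\gamma} I - \left(\frac{\mu}{\gamma}\right)^{2}\Lambda\left(I + \frac{\mu}{\gamma}\Lambda\right)^{-1} \in \mathrm{PTr}(\mathcal{H}).
\]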
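It follows from Corollary 1 that
\[
\begin{aligned}
\mathrm{det}_X\left[\frac{\alpha[\Lambda + (\gamma/\mu) I]^{p} + \beta[(\Lambda + (\gamma/\mu) I)^{-1}]^{q}}{\alpha+\beta}\right] &\geq \left[\frac{(\gamma/\mu)^{p}}{(\mu/\gamma)^{q}}\right]^{\frac{\alpha}{\alpha+\beta} - \delta}\mathrm{det}_X[\Lambda + (\gamma/\mu) I]^{p\delta}\,\mathrm{det}_X[\Lambda + (\gamma/\mu) I]^{-q(1-\delta)} \\
&= \left(\frac{\gamma}{\mu}\right)^{(p+q)(\frac{\alpha}{\alpha+\beta} - \delta)}\mathrm{det}_X[\Lambda + (\gamma/\mu) I]^{p\delta}\,\mathrm{det}_X[\Lambda + (\gamma/\mu) I]^{-q(1-\delta)},
\end{aligned} \tag{A.1}
\]
where
\[
\delta = \frac{\alpha(\frac{\gamma}{\mu})^{p}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\mu}{\gamma})^{q}} = \frac{\alpha(\frac{\gamma}{\mu})^{p+q}}{\alpha(\frac{\gamma}{\mu})^{p+q} + \beta}, \quad 1-\delta = \frac{\beta(\frac{\mu}{\gamma})^{q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\mu}{\gamma})^{q}} = \frac{\beta}{\alpha(\frac{\gamma}{\mu})^{p+q} + \beta}.
\]
For the two determinants on the right hand side of (A.1) to cancel each other out, we need
\[
p\delta = q(1-\delta) \iff \alpha p\left(\frac{\gamma}{\mu}\right)^{p} = \beta q\left(\frac{\mu}{\gamma}\right)^{q} \iff \alpha p\left(\frac{\gamma}{\mu}\right)^{p+q} = \beta q.
\]
Assuming that this condition holds, then along with the definition of $D^{(\alpha,\beta)}_{(p,q)}$, (A.1) gives
\[
\left(\frac{\gamma}{\mu}\right)^{(p+q)(\delta - \frac{\alpha}{\alpha+\beta})}\mathrm{det}_X\left(\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{-q}}{\alpha+\beta}\right) \geq 1 \iff D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] \geq 0.
\]
In the inequality in (A.1), the equality sign happens if and only if
\[
[\Lambda + (\gamma/\mu) I]^{p} = [\Lambda + (\gamma/\mu) I]^{-q} \iff [\Lambda + (\gamma/\mu) I]^{p+q} = I.
\]
If $p + q = 0$, then this is always true, so that $D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I),(B+\mu I)] = 0$ for all pairs $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, which is not what we want. In fact, with $p + q = 0$, the condition $\alpha p(\frac{\gamma}{\mu})^{p+q} = \beta q$ gives
\[
(\alpha+\beta)p = 0 \Rightarrow p = 0 \Rightarrow q = 0.
\]
If $p + q \neq 0$, since $\Lambda + (\gamma/\mu) I > 0$, this happens if and only if
\[
\Lambda + (\gamma/\mu) I = I \iff (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = I \iff A + \gamma I = B + \mu I \iff A = B \text{ and } \gamma = \mu.
\]
This completes the proof.

Proof of Theorem 7. Under the condition $p + q = r$, by Theorem 6, we have
\[
\alpha p\left(\frac{\gamma}{\mu}\right)^{r} = \beta(r - p) \Rightarrow p = \frac{\beta r}{\alpha(\frac{\gamma}{\mu})^{r} + \beta}.
\]
It follows then that $q = r - p = \frac{r\alpha(\frac{\gamma}{\mu})^{r}}{\alpha(\frac{\gamma}{\mu})^{r} + \beta}$. The equivalence of Eqs. (8) and (9) follows from Proposition 1.

Proof of Theorem 8.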
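We have
\[
\begin{aligned}
\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{-q}}{\alpha+\beta} &= \frac{\alpha(\frac{\gamma}{\mu})^{p}(\frac{\mu}{\gamma}\Lambda + I)^{p} + \beta(\frac{\gamma}{\mu})^{-q}(\frac{\mu}{\gamma}\Lambda + I)^{-q}}{\alpha+\beta} \\
&= \frac{\alpha(\frac{\gamma}{\mu})^{p}(I + C_1) + \beta(\frac{\gamma}{\mu})^{-q}(I + C_2)}{\alpha+\beta} \\
&= \frac{\left[\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}\right] I + \left[\alpha(\frac{\gamma}{\mu})^{p} C_1 + \beta(\frac{\gamma}{\mu})^{-q} C_2\right]}{\alpha+\beta} \\
&= \frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\left[I + \frac{\alpha(\frac{\gamma}{\mu})^{p} C_1 + \beta(\frac{\gamma}{\mu})^{-q} C_2}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right],
\end{aligned}
\]
where $C_1 = \sum_{k=1}^{\infty}\frac{p^{k}}{k!}\left[\log\left(\frac{\mu}{\gamma}\Lambda + I\right)\right]^{k} \in \mathrm{Tr}(\mathcal{H})$ and $C_2 = \sum_{k=1}^{\infty}(-1)^{k}\frac{q^{k}}{k!}\left[\log\left(\frac{\mu}{\gamma}\Lambda + I\right)\right]^{k} \in \mathrm{Tr}(\mathcal{H})$. By definition of the $\mathrm{det}_X$ function, we then have
\[
\begin{aligned}
\log\mathrm{det}_X\left[\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{-q}}{\alpha+\beta}\right] &= \log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \log\det\left[I + \frac{\alpha(\frac{\gamma}{\mu})^{p} C_1 + \beta(\frac{\gamma}{\mu})^{-q} C_2}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right] \\
&= \log\left(\frac{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \log\det\left[\frac{\alpha(\Lambda + \frac{\gamma}{\mu} I)^{p} + \beta(\Lambda + \frac{\gamma}{\mu} I)^{-q}}{\alpha(\frac{\gamma}{\mu})^{p} + \beta(\frac{\gamma}{\mu})^{-q}}\right].
\end{aligned}
\]
This, together with the definition of $D^{(\alpha,\beta)}_{(p,q)}$, gives us the desired expression.

Appendix A.2. Proofs for the Affine-invariant Riemannian distance

In this section, we prove Theorem 9. We first need the following preliminary results.

Lemma 2. Let $\gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Let $\delta = \frac{\gamma^{r}}{\gamma^{r} + 1}$. Then
\[
\lim_{\alpha \to 0}\frac{r(\delta - \frac{1}{2})}{\alpha^2} = \frac{[r'(0)]^2}{4}\log\gamma. \tag{A.2}
\]
In particular, for $r = 2\alpha$, we have
\[
\lim_{\alpha \to 0}\frac{r(\delta - \frac{1}{2})}{\alpha^2} = \log\gamma. \tag{A.3}
\]

Proof of Lemma 2. By L'Hôpital's rule applied twice, we obtain
\[
\begin{aligned}
\lim_{\alpha \to 0}\frac{r(\delta - \frac{1}{2})}{\alpha^2} &= \lim_{\alpha \to 0}\frac{r(\gamma^{r} - 1)}{2\alpha^2(\gamma^{r} + 1)} = \lim_{\alpha \to 0}\frac{r(\gamma^{r} - 1)}{4\alpha^2} = \lim_{\alpha \to 0}\frac{r'(\alpha)(\gamma^{r} - 1) + r\gamma^{r} r'(\alpha)\log\gamma}{8\alpha} \\
&= \lim_{\alpha \to 0}\frac{r''(\alpha)(\gamma^{r} - 1) + \gamma^{r}(r'(\alpha))^2\log\gamma + \gamma^{r}(r'(\alpha))^2\log\gamma}{8} + \lim_{\alpha \to 0}\frac{r\gamma^{r}(r'(\alpha)\log\gamma)^2 + r\gamma^{r} r''(\alpha)\log\gamma}{8} \\
&= \frac{[r'(0)]^2\log\gamma}{4}.
\end{aligned}
\]
This completes the proof.

Lemma 3. Let $\gamma > 0$ be fixed. Let $\lambda > 0$ be fixed. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^{r}}{\gamma^{r} + 1}$, $p = r(1-\delta)$, $q = r\delta$.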
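Then
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\lambda^{p} + \lambda^{-q}}{2}\right) = \frac{[r'(0)]^2}{4}\left[-(\log\gamma)(\log\lambda) + \frac{1}{2}(\log\lambda)^2\right]. \tag{A.4}
\]
In particular, if $\gamma = \lambda$, then
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\gamma^{p} + \gamma^{-q}}{2}\right) = -\frac{[r'(0)]^2}{8}(\log\gamma)^2. \tag{A.5}
\]

Proof of Lemma 3. For $p$, $q$ sufficiently small,
\[
\lambda^{p} = e^{p\log\lambda} = 1 + p\log\lambda + \frac{p^2}{2}(\log\lambda)^2 + O(p^3), \quad \lambda^{-q} = e^{-q\log\lambda} = 1 - q\log\lambda + \frac{q^2}{2}(\log\lambda)^2 + O(q^3).
\]
Thus for $\alpha$ sufficiently small, so that $p = O(\alpha)$, $q = O(\alpha)$, we have
\[
\begin{aligned}
\frac{\lambda^{p} + \lambda^{-q}}{2} &= 1 + \frac{p - q}{2}\log\lambda + \frac{p^2 + q^2}{4}(\log\lambda)^2 + O(p^3, q^3) \\
&= 1 + r\left(\frac{1}{2} - \delta\right)(\log\lambda) + \frac{r^2}{4}\left[(1-\delta)^2 + \delta^2\right](\log\lambda)^2 + O(\alpha^3).
\end{aligned}
\]
By Lemma 2, we have
\[
\lim_{\alpha \to 0}\frac{r(\frac{1}{2} - \delta)}{\alpha^2} = -\frac{[r'(0)]^2}{4}\log\gamma.
\]
We have by L'Hôpital's rule
\[
\lim_{\alpha \to 0}\frac{r^2}{\alpha^2} = \lim_{\alpha \to 0}\frac{2 r r'(\alpha)}{2\alpha} = \lim_{\alpha \to 0}\left([r'(\alpha)]^2 + r r''(\alpha)\right) = [r'(0)]^2.
\]
Since $\lim_{\alpha \to 0}\delta = \frac{1}{2}$, it follows then that
\[
\lim_{\alpha \to 0}\frac{r^2}{4\alpha^2}\left[(1-\delta)^2 + \delta^2\right] = \frac{[r'(0)]^2}{8}.
\]
Combining these limits with $\lim_{x \to 0}\frac{\log(1 + ax)}{x} = a$, we obtain
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\lambda^{p} + \lambda^{-q}}{2}\right) = \frac{[r'(0)]^2}{4}\left[-(\log\gamma)(\log\lambda) + \frac{1}{2}(\log\lambda)^2\right].
\]
This completes the proof of the lemma.

Lemma 4. Let $\gamma > 0$ be fixed. Let $\lambda \in \mathbb{R}$ be fixed such that $\lambda + \gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^{r}}{\gamma^{r} + 1}$, $p = r(1-\delta)$, $q = r\delta$. Then
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) = \frac{[r'(0)]^2}{8}\left[\log(\lambda+\gamma) - \log\gamma\right]^2 = \frac{[r'(0)]^2}{8}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2. \tag{A.6}
\]
In particular, for $r = r(\alpha) = 2\alpha$, we have
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) = \frac{1}{2}\left[\log(\lambda+\gamma) - \log\gamma\right]^2 = \frac{1}{2}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2. \tag{A.7}
\]

Proof of Lemma 4. We have by Lemma 3
\[
\begin{aligned}
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) &= \lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{2}\right) - \lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\gamma^{p} + \gamma^{-q}}{2}\right) \\
&= \frac{[r'(0)]^2}{4}\left\{-(\log\gamma)[\log(\lambda+\gamma)] + \frac{1}{2}[\log(\lambda+\gamma)]^2 - \left[-\frac{1}{2}(\log\gamma)^2\right]\right\} \\
&= \frac{[r'(0)]^2}{8}\left[\log(\lambda+\gamma) - \log\gamma\right]^2 = \frac{[r'(0)]^2}{8}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2.
\end{aligned}
\]
This completes the proof.

Lemma 5.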
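Let $\gamma > 0$ be fixed. Let $\lambda \in \mathbb{R}$ be fixed such that $\lambda + \gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^{r}}{\gamma^{r} + 1}$, $p = r(1-\delta)$, $q = r\delta$. Then
\[
\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}} \geq 1, \tag{A.8}
\]
\[
\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) \geq 0. \tag{A.9}
\]

Proof of Lemma 5. By Theorem 5, we have
\[
\begin{aligned}
\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}} &= \frac{\gamma^{p}}{\gamma^{p} + \gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{p} + \frac{\gamma^{-q}}{\gamma^{p} + \gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{-q} \\
&= s\left(\frac{\lambda}{\gamma} + 1\right)^{p} + (1-s)\left(\frac{\lambda}{\gamma} + 1\right)^{-q}, \quad \text{where } s = \frac{\gamma^{p}}{\gamma^{p} + \gamma^{-q}} = \frac{\gamma^{p+q}}{\gamma^{p+q} + 1} = \frac{\gamma^{r}}{\gamma^{r} + 1} = \delta, \\
&\geq \left(\frac{\lambda}{\gamma} + 1\right)^{p\delta}\left(\frac{\lambda}{\gamma} + 1\right)^{-q(1-\delta)} = \left(\frac{\lambda}{\gamma} + 1\right)^{(p+q)\delta - q} = \left(\frac{\lambda}{\gamma} + 1\right)^{r\delta - q} = 1,
\end{aligned}
\]
since $q = r\delta$. Here $s$ denotes the weight that plays the role of $\alpha$ in Theorem 5. This completes the proof.

Proof of Theorem 9. For $\alpha = \beta$, we have
\[
\delta = \frac{(\frac{\gamma}{\mu})^{r}}{(\frac{\gamma}{\mu})^{r} + 1}, \quad p = r(1-\delta), \quad q = r\delta.
\]
Let $\{\lambda_j\}_{j \in \mathbb{N}}$ be the eigenvalues of $\Lambda$. By Theorem 8, we have
\[
\begin{aligned}
D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] &= \frac{r(\delta - \frac{1}{2})}{\alpha^2}\log\frac{\gamma}{\mu} + \frac{1}{\alpha^2}\log\left(\frac{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}{2}\right) + \frac{1}{\alpha^2}\log\det\left[\frac{(\Lambda + \frac{\gamma}{\mu} I)^{p} + (\Lambda + \frac{\gamma}{\mu} I)^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right] \\
&= \frac{r(\delta - \frac{1}{2})}{\alpha^2}\log\frac{\gamma}{\mu} + \frac{1}{\alpha^2}\log\left(\frac{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}{2}\right) + \frac{1}{\alpha^2}\sum_{j=1}^{\infty}\log\left(\frac{(\lambda_j + \frac{\gamma}{\mu})^{p} + (\lambda_j + \frac{\gamma}{\mu})^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right).
\end{aligned}
\]
By Lemma 2, we have
\[
\lim_{\alpha \to 0}\frac{r(\delta - \frac{1}{2})}{\alpha^2}\log\frac{\gamma}{\mu} = \frac{[r'(0)]^2}{4}\left(\log\frac{\gamma}{\mu}\right)^2.
\]
By Lemma 3, we have
\[
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}{2}\right) = -\frac{[r'(0)]^2}{8}\left(\log\frac{\gamma}{\mu}\right)^2.
\]
By Lemma 4, we have
\[
\begin{aligned}
\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\det\left[\frac{(\Lambda + \frac{\gamma}{\mu} I)^{p} + (\Lambda + \frac{\gamma}{\mu} I)^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right] &= \lim_{\alpha \to 0}\frac{1}{\alpha^2}\sum_{j=1}^{\infty}\log\left[\frac{(\lambda_j + \frac{\gamma}{\mu})^{p} + (\lambda_j + \frac{\gamma}{\mu})^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right] \\
&= \sum_{j=1}^{\infty}\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left[\frac{(\lambda_j + \frac{\gamma}{\mu})^{p} + (\lambda_j + \frac{\gamma}{\mu})^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right] \\
&= \frac{[r'(0)]^2}{8}\sum_{j=1}^{\infty}\left[\log\left(\lambda_j + \frac{\gamma}{\mu}\right) - \log\frac{\gamma}{\mu}\right]^2 = \frac{[r'(0)]^2}{8}\sum_{j=1}^{\infty}\left[\log\left(\lambda_j\frac{\mu}{\gamma} + 1\right)\right]^2,
\end{aligned}
\]
where the interchange of limit and sum in the second equality is by Lebesgue's Monotone Convergence Theorem, since $\log\left[\frac{(\lambda_j + \frac{\gamma}{\mu})^{p} + (\lambda_j + \frac{\gamma}{\mu})^{-q}}{(\frac{\gamma}{\mu})^{p} + (\frac{\gamma}{\mu})^{-q}}\right] \geq 0$ for all $j \in \mathbb{N}$ by Lemma 5.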
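As an informal numerical illustration of the termwise limit used in this step (Lemma 4 with $r = 2\alpha$, Eq. (A.7)) and of the non-negativity in Lemma 5, the following sketch evaluates the scalar quotient for decreasing values of $\alpha$. The function names and the sample values $\lambda = 3$, $\gamma = 2$ are hypothetical choices made only for this check; it is not part of the proof.

```python
import math

def lemma4_quotient(lam, gamma, alpha):
    """(1/alpha^2) * log of the ratio in Lemma 4, with the choice r = 2*alpha."""
    r = 2.0 * alpha
    delta = gamma ** r / (gamma ** r + 1.0)
    p, q = r * (1.0 - delta), r * delta
    num = (lam + gamma) ** p + (lam + gamma) ** (-q)
    den = gamma ** p + gamma ** (-q)
    return math.log(num / den) / alpha ** 2

def lemma4_limit(lam, gamma):
    """Right-hand side of Eq. (A.7): (1/2) * [log(lam/gamma + 1)]^2."""
    return 0.5 * math.log(lam / gamma + 1.0) ** 2

lam, gamma = 3.0, 2.0  # any values with lam + gamma > 0 and gamma > 0
values = [lemma4_quotient(lam, gamma, a) for a in (1e-1, 1e-2, 1e-3)]
limit = lemma4_limit(lam, gamma)
```

For these sample values, the entries of `values` are non-negative, as Lemma 5 requires, and the last one is close to `limit`.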
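Summing up these three expressions, we obtain
\[
\begin{aligned}
\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)] &= \frac{[r'(0)]^2}{8}\left[\left(\log\frac{\gamma}{\mu}\right)^2 + \sum_{j=1}^{\infty}\left(\log\left(\lambda_j\frac{\mu}{\gamma} + 1\right)\right)^2\right] \\
&= \frac{[r'(0)]^2}{8}\left(\left(\log\frac{\gamma}{\mu}\right)^2 + \left\|\log\left(\Lambda\frac{\mu}{\gamma} + I\right)\right\|^2_{\mathrm{HS}}\right) \\
&= \frac{[r'(0)]^2}{8}\left\|\log\left(\Lambda + \frac{\gamma}{\mu} I\right)\right\|^2_{\mathrm{eHS}} \\
&= \frac{[r'(0)]^2}{8}\left\|\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]\right\|^2_{\mathrm{eHS}} \\
&= \frac{[r'(0)]^2}{8}\, d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)].
\end{aligned}
\]
This completes the proof.

Appendix A.3. Proofs for the Alpha Log-Determinant divergences

In this section, we prove Theorem 10.

Proof of Theorem 10. The proof for the cases $\alpha = 0$ and $\alpha = 1$ is a special case of the results discussed at the end of Section 5.3. Consider now the case $0 < \alpha < 1$. We first note that
\[
d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\frac{\mathrm{det}_X(\alpha(A+\gamma I) + (1-\alpha)(B+\mu I))}{\mathrm{det}_X(A+\gamma I)^{q}\,\mathrm{det}_X(B+\mu I)^{1-q}}\right] + \frac{q - \alpha}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu},
\]
where $q = \frac{\alpha\gamma}{\alpha\gamma + (1-\alpha)\mu}$. By Definition 6, we have
\[
\begin{aligned}
D^{(\alpha,1-\alpha)}_r[(A+\gamma I),(B+\mu I)] &= \frac{1}{\alpha(1-\alpha)}\log\left[\left(\frac{\gamma}{\mu}\right)^{r(\delta-\alpha)}\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}\right)\right] \\
&= \frac{r(\delta-\alpha)}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu} + \frac{1}{\alpha(1-\alpha)}\log\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}\right).
\end{aligned}
\]
By Proposition 1, we have
\[
\begin{aligned}
\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}\right) &= \mathrm{det}_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta)} + (1-\alpha)[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta}\right] \\
&= \mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta}\,\mathrm{det}_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r} + (1-\alpha) I\right].
\end{aligned}
\]
In particular, for $r = 1$, we have
\[
\mathrm{det}_X\left[\alpha(A+\gamma I)(B+\mu I)^{-1} + (1-\alpha) I\right] = \frac{\mathrm{det}_X[\alpha(A+\gamma I) + (1-\alpha)(B+\mu I)]}{\mathrm{det}_X(B+\mu I)}.
\]
Thus it follows that
\[
\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{1-\delta} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-\delta}\right) = \frac{\mathrm{det}_X[\alpha(A+\gamma I) + (1-\alpha)(B+\mu I)]}{\mathrm{det}_X(A+\gamma I)^{\delta}\,\mathrm{det}_X(B+\mu I)^{1-\delta}}.
\]
Also for $r = 1$, in Definition 6, we have $\delta = \delta(r = 1) = \frac{\alpha\gamma}{\alpha\gamma + (1-\alpha)\mu}$. Combining all of these expressions and comparing with the expressions for $d^{1-2\alpha}_{\mathrm{logdet}}$, we obtain the first desired statement.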
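For $r = -1$, we have
\[
D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I),(B+\mu I)] = -\frac{\delta_{-1} - \alpha}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu} + \frac{1}{\alpha(1-\alpha)}\log\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-(1-\delta_{-1})} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\delta_{-1}}\right),
\]
where $\delta_{-1} = \delta(r = -1) = \frac{\alpha\frac{1}{\gamma}}{\alpha\frac{1}{\gamma} + (1-\alpha)\frac{1}{\mu}} = \frac{\alpha\mu}{\alpha\mu + (1-\alpha)\gamma}$. Similar to the case $r = 1$, we have
\[
\mathrm{det}_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{-1} + (1-\alpha) I\right] = \frac{\mathrm{det}_X[(1-\alpha)(A+\gamma I) + \alpha(B+\mu I)]}{\mathrm{det}_X(A+\gamma I)}.
\]
Thus it follows that
\[
\mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-(1-\delta_{-1})} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\delta_{-1}}\right) = \frac{\mathrm{det}_X[(1-\alpha)(A+\gamma I) + \alpha(B+\mu I)]}{\mathrm{det}_X(A+\gamma I)^{1-\delta_{-1}}\,\mathrm{det}_X(B+\mu I)^{\delta_{-1}}}.
\]
On the other hand, we have
\[
d^{2\alpha-1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\frac{\mathrm{det}_X((1-\alpha)(A+\gamma I) + \alpha(B+\mu I))}{\mathrm{det}_X(A+\gamma I)^{p}\,\mathrm{det}_X(B+\mu I)^{1-p}}\right] + \frac{p - (1-\alpha)}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu},
\]
where $p = \frac{(1-\alpha)\gamma}{(1-\alpha)\gamma + \alpha\mu} = 1 - \delta_{-1}$. Combining all of these expressions, we obtain the second desired statement, namely
\[
D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I),(B+\mu I)] = d^{2\alpha-1}_{\mathrm{logdet}}[(A+\gamma I),(B+\mu I)].
\]
This completes the proof.

Appendix A.4. Proofs for the other limiting cases

In this section, we prove Theorems 11 and 12. We need the following preliminary results.

Lemma 6. Let $\mathcal{H}$ be a separable Hilbert space. Let $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ be such that $A + I > 0$. Then for all $\alpha \in \mathbb{R}$, the operator $(A+I)^{\alpha}$ is well defined and $(A+I)^{\alpha} - I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. Equivalently, let $\{\lambda_k\}_{k \in \mathbb{N}}$ be the eigenvalues of $A$, then
\[
\mathrm{tr}[(A+I)^{\alpha} - I] = \sum_{k=1}^{\infty}[(\lambda_k + 1)^{\alpha} - 1] \tag{A.10}
\]
has a finite value.

Proof of Lemma 6. By Lemma 3 in [17], if $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and $A + I > 0$, then $\log(A+I) \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. By definition of the power function, we have
\[
(A+I)^{\alpha} = \exp[\alpha\log(A+I)] = I + \sum_{j=1}^{\infty}\frac{\alpha^{j}}{j!}[\log(A+I)]^{j}.
\]
Since $\mathrm{Tr}(\mathcal{H})$ is a Banach algebra under the trace norm, we have
\[
\|(A+I)^{\alpha} - I\|_{\mathrm{tr}} = \left\|\sum_{j=1}^{\infty}\frac{\alpha^{j}}{j!}[\log(A+I)]^{j}\right\|_{\mathrm{tr}} \leq \sum_{j=1}^{\infty}\frac{|\alpha|^{j}}{j!}\|\log(A+I)\|^{j}_{\mathrm{tr}} = \exp(|\alpha|\,\|\log(A+I)\|_{\mathrm{tr}}) - 1 < \infty.
\]
Thus $(A+I)^{\alpha} - I \in \mathrm{Tr}(\mathcal{H})$. The equivalent statement is then obvious. This completes the proof.

Lemma 7. Let $\mathcal{H}$ be a separable Hilbert space. Assume that $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$. Then for any $\alpha \in \mathbb{R}$, we have $(A+\gamma I)^{\alpha} - \gamma^{\alpha} I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and
\[
\mathrm{tr}[(A+\gamma I)^{\alpha} - \gamma^{\alpha} I] = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right], \tag{A.11}
\]
\[
\mathrm{tr}_X[(A+\gamma I)^{\alpha}] = \gamma^{\alpha}\left(1 + \mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right]\right). \tag{A.12}
\]

Proof of Lemma 7. By definition of the power function, we have
\[
(A+\gamma I)^{\alpha} = \exp[\alpha\log(A+\gamma I)] = \exp\left[(\alpha\log\gamma) I + \alpha\log\left(\frac{A}{\gamma} + I\right)\right] = \gamma^{\alpha}\left(\frac{A}{\gamma} + I\right)^{\alpha} = \gamma^{\alpha}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] + \gamma^{\alpha} I,
\]
where $\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] \in \mathrm{Tr}(\mathcal{H})$ by Lemma 6. Thus it follows that $(A+\gamma I)^{\alpha} - \gamma^{\alpha} I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and
\[
\mathrm{tr}[(A+\gamma I)^{\alpha} - \gamma^{\alpha} I] = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right],
\]
which is the first identity. By definition of the extended trace,
\[
\mathrm{tr}_X[(A+\gamma I)^{\alpha}] = \mathrm{tr}_X\left([(A+\gamma I)^{\alpha} - \gamma^{\alpha} I] + \gamma^{\alpha} I\right) = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] + \gamma^{\alpha},
\]
which is the second identity. This completes the proof.

Lemma 8. Let $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$. Let $\Lambda + \frac{\gamma}{\mu} I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$. Then for any $\alpha \in \mathbb{R}$,
\[
\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{tr}_X\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha} = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}, \tag{A.13}
\]
\[
\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{det}_X\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha} = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}. \tag{A.14}
\]

Proof of Lemma 8. By Proposition 1, we have
\[
[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = (B+\mu I)^{1/2}\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha}(B+\mu I)^{-1/2}.
\]
Similarly,
\[
[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha} = (B+\mu I)^{-1/2}\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha}(B+\mu I)^{1/2}.
\]
By the commutativity of the $\mathrm{tr}_X$ operation (Lemma 4 in [17]), we then have
\[
\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{tr}_X\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha} = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}.
\]
Similarly, by the product property of the $\mathrm{det}_X$ operation (Proposition 4 in [17]),
\[
\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{det}_X\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha} = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}.
\]
This completes the proof.

Lemma 9.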
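Assume that $\lambda > 0$, $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\lambda^p+\beta\lambda^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left[(\log\lambda)\frac{r(0)}{\gamma^{r(0)}} + \lambda^{-r(0)} - 1\right]. \tag{A.15}$$
In particular, for $\lambda = \gamma$, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^p+\beta\gamma^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left\{[(\log\gamma)r(0)+1]\gamma^{-r(0)} - 1\right\}. \tag{A.16}$$

Proof of Lemma 9. We have for $\alpha > 0$, $\lim_{\beta\to 0}\delta = 1$, $\lim_{\beta\to 0}p = 0$, $\lim_{\beta\to 0}q = r(0)$, so that $\lim_{\beta\to 0}(\alpha\lambda^p+\beta\lambda^{-q}) = \alpha$. With $p = r(1-\delta) = \frac{r\beta}{\alpha\gamma^r+\beta}$, we have
$$\frac{\partial p}{\partial\beta} = \frac{\left(\frac{\partial r}{\partial\beta}\beta + r\right)(\alpha\gamma^r+\beta) - r\beta\left(\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}+1\right)}{(\alpha\gamma^r+\beta)^2}, \qquad \lim_{\beta\to 0}\frac{\partial p}{\partial\beta} = \frac{r(0)}{\alpha\gamma^{r(0)}}.$$
With $q = r\delta = \frac{r\alpha\gamma^r}{\alpha\gamma^r+\beta}$, we have
$$\frac{\partial q}{\partial\beta} = \frac{\left(\frac{\partial r}{\partial\beta}\alpha\gamma^r + r\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}\right)(\alpha\gamma^r+\beta) - r\alpha\gamma^r\left(\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}+1\right)}{(\alpha\gamma^r+\beta)^2}, \qquad \lim_{\beta\to 0}\frac{\partial q}{\partial\beta} = \frac{\partial r}{\partial\beta}(0) - \frac{r(0)}{\alpha\gamma^{r(0)}}.$$
The required limit is of the form $\frac{0}{0}$ and L'Hopital's rule can be applied to give
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\lambda^p+\beta\lambda^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha}\lim_{\beta\to 0}\frac{\alpha+\beta}{\alpha\lambda^p+\beta\lambda^{-q}}\cdot\frac{\left[\alpha\lambda^p(\log\lambda)\frac{\partial p}{\partial\beta} + \lambda^{-q} - \beta\lambda^{-q}(\log\lambda)\frac{\partial q}{\partial\beta}\right](\alpha+\beta) - (\alpha\lambda^p+\beta\lambda^{-q})}{(\alpha+\beta)^2}$$
$$= \frac{\alpha(\log\lambda)\frac{\partial p}{\partial\beta}(0) + \lambda^{-r(0)} - 1}{\alpha^2} = \frac{1}{\alpha^2}\left[(\log\lambda)\frac{r(0)}{\gamma^{r(0)}} + \lambda^{-r(0)} - 1\right].$$
This completes the proof.

Lemma 10. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $\lambda \in \mathbb{R}$ is also fixed, such that $\lambda + \gamma > 0$. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) = \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda}{\gamma}+1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - \gamma^{-r(0)}\right]. \tag{A.17}$$

Proof of Lemma 10.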
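By Lemma 9, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) = \lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha+\beta}\right) - \lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^p+\beta\gamma^{-q}}{\alpha+\beta}\right)$$
$$= \frac{1}{\alpha^2}\left[\log(\lambda+\gamma)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - 1\right] - \frac{1}{\alpha^2}\left[(\log\gamma)\frac{r(0)}{\gamma^{r(0)}} + \gamma^{-r(0)} - 1\right]$$
$$= \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda}{\gamma}+1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
This completes the proof.

Lemma 11. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $\lambda \in \mathbb{R}$ is also fixed, such that $\lambda + \gamma > 0$. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}} \geq 1, \tag{A.18}$$
$$\log\left(\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) \geq 0. \tag{A.19}$$

Proof of Lemma 11. We proceed as in the proof of Lemma 5, by applying Theorem 5 as follows:
$$\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}} = \frac{\alpha\gamma^p}{\alpha\gamma^p+\beta\gamma^{-q}}\left(\frac{\lambda}{\gamma}+1\right)^p + \frac{\beta\gamma^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\left(\frac{\lambda}{\gamma}+1\right)^{-q} = s\left(\frac{\lambda}{\gamma}+1\right)^p + (1-s)\left(\frac{\lambda}{\gamma}+1\right)^{-q},$$
where
$$s = \frac{\alpha\gamma^p}{\alpha\gamma^p+\beta\gamma^{-q}} = \frac{\alpha\gamma^{p+q}}{\alpha\gamma^{p+q}+\beta} = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta} = \delta.$$
Hence
$$\frac{\alpha(\lambda+\gamma)^p+\beta(\lambda+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}} \geq \left(\frac{\lambda}{\gamma}+1\right)^{p\delta}\left(\frac{\lambda}{\gamma}+1\right)^{-q(1-\delta)} = \left(\frac{\lambda}{\gamma}+1\right)^{(p+q)\delta-q} = \left(\frac{\lambda}{\gamma}+1\right)^{r\delta-q} = 1,$$
since $r\delta = q$. This completes the proof.

Lemma 12. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta}$,
$$\lim_{\beta\to 0}\frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)}+1]. \tag{A.20}$$

Proof of Lemma 12. We first have
$$\frac{\partial\delta}{\partial\beta} = \frac{\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}(\alpha\gamma^r+\beta) - \alpha\gamma^r\left(\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}+1\right)}{(\alpha\gamma^r+\beta)^2}, \qquad \lim_{\beta\to 0}\frac{\partial\delta}{\partial\beta} = -\frac{1}{\alpha\gamma^{r(0)}}.$$
Since the required limit has the form $\frac{0}{0}$, we apply L'Hopital's rule to get
$$\lim_{\beta\to 0}\frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \lim_{\beta\to 0}\frac{1}{\alpha}\left[\frac{\partial r}{\partial\beta}\left(\delta-\frac{\alpha}{\alpha+\beta}\right) + r\left(\frac{\partial\delta}{\partial\beta}+\frac{\alpha}{(\alpha+\beta)^2}\right)\right] = \frac{1}{\alpha}r(0)\left[-\frac{1}{\alpha\gamma^{r(0)}}+\frac{1}{\alpha}\right] = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)}+1].$$
This completes the proof.

Proof of Theorem 11. Let $\{\lambda_j\}_{j=1}^{\infty}$ be the eigenvalues of $\Lambda$. By Theorem 8, we have
$$D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left(\frac{\alpha(\Lambda+\frac{\gamma}{\mu}I)^p+\beta(\Lambda+\frac{\gamma}{\mu}I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right)$$
$$= \frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\sum_{j=1}^{\infty}\log\left(\frac{\alpha(\lambda_j+\frac{\gamma}{\mu})^p+\beta(\lambda_j+\frac{\gamma}{\mu})^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right),$$
where
$$p = p(\beta) = r(1-\delta) = \frac{r\beta}{\alpha(\frac{\gamma}{\mu})^r+\beta}, \qquad q = q(\beta) = r\delta = \frac{r\alpha(\frac{\gamma}{\mu})^r}{\alpha(\frac{\gamma}{\mu})^r+\beta}.$$
For $\alpha > 0$ fixed, as functions of $\beta$, we have $\lim_{\beta\to 0}p(\beta) = 0$, $\lim_{\beta\to 0}q(\beta) = r(0)$. For simplicity, in the following, we replace $\frac{\gamma}{\mu}$ by $\gamma$. By Lemma 9,
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^p+\beta\gamma^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left\{[(\log\gamma)r(0)+1]\gamma^{-r(0)} - 1\right\}.$$
By Lemma 10,
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda_j+\gamma)^p+\beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) = \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda_j}{\gamma}+1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda_j+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
By Lemma 11, we have
$$\log\left(\frac{\alpha(\lambda_j+\gamma)^p+\beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) \geq 0 \quad \forall j \in \mathbb{N},$$
so that by Lebesgue's Monotone Convergence Theorem, we obtain
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\sum_{j=1}^{\infty}\log\left(\frac{\alpha(\lambda_j+\gamma)^p+\beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right) = \sum_{j=1}^{\infty}\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda_j+\gamma)^p+\beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^p+\beta\gamma^{-q}}\right)$$
$$= \frac{1}{\alpha^2}\sum_{j=1}^{\infty}\left[\log\left(\frac{\lambda_j}{\gamma}+1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda_j+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
By Lemma 12,
$$\log(\gamma)\lim_{\beta\to 0}\frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)}+1]\log(\gamma).$$
Combining all three expressions, we obtain the desired limit as the sum
$$\frac{1}{\alpha^2}[\gamma^{-r(0)} + r(0)\log(\gamma) - 1] + \frac{1}{\alpha^2}\left\{\frac{r(0)}{\gamma^{r(0)}}\sum_{j=1}^{\infty}\log\left(\frac{\lambda_j}{\gamma}+1\right) + \sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right]\right\}. \tag{A.21}$$
By Lemmas 6 and 7, we have
$$\sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right] = \gamma^{-r(0)}\sum_{j=1}^{\infty}\left[\left(\frac{\lambda_j}{\gamma}+1\right)^{-r(0)} - 1\right] = \gamma^{-r(0)}\,\mathrm{tr}\left[\left(\frac{\Lambda}{\gamma}+I\right)^{-r(0)} - I\right] = \mathrm{tr}[(\Lambda+\gamma I)^{-r(0)} - \gamma^{-r(0)}I].$$
Thus it follows that
$$\gamma^{-r(0)} - 1 + \sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right] = \gamma^{-r(0)} - 1 + \mathrm{tr}[(\Lambda+\gamma I)^{-r(0)} - \gamma^{-r(0)}I] = \mathrm{tr}_X[(\Lambda+\gamma I)^{-r(0)} - I].$$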
Furthermore,
$$\frac{r(0)}{\gamma^{r(0)}}\sum_{j=1}^{\infty}\log\left(\frac{\lambda_j}{\gamma}+1\right) = r(0)\gamma^{-r(0)}\log\det\left(\frac{\Lambda}{\gamma}+I\right) = r(0)\gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I) - r(0)\gamma^{-r(0)}\log\gamma$$
$$= -\gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I)^{-r(0)} - r(0)\gamma^{-r(0)}\log\gamma.$$
Plugging the last two expressions into (A.21), we obtain the desired limit as
$$\frac{1}{\alpha^2}\left\{r(0)(1-\gamma^{-r(0)})\log\gamma\right\} + \frac{1}{\alpha^2}\left\{\mathrm{tr}_X[(\Lambda+\gamma I)^{-r(0)} - I] - \gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I)^{-r(0)}\right\}. \tag{A.22}$$
We now replace $\gamma$ by $\frac{\gamma}{\mu}$. We have by Lemma 8,
$$\mathrm{tr}_X\left[\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r(0)}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} = \mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)},$$
$$\mathrm{det}_X\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r(0)} = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} = \mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}.$$
Then (A.22) becomes
$$\frac{r(0)}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r(0)} - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r(0)}\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}.$$
This completes the proof of the theorem.

Proof of Theorem 12. The dual symmetry in Theorem 13 gives
$$\lim_{\alpha\to 0}D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)] = \lim_{\alpha\to 0}D^{(\beta,\alpha)}_r[(B+\mu I),(A+\gamma I)].$$
The limit on the right hand side then follows from Theorem 11.

Appendix A.5. Proofs of the properties of the Alpha-Beta Log-Determinant divergences

In this section, we prove Theorems 13, 14, 15, 16, 17, and 18. For the case $\alpha = \beta = 0$, we have
$$D^{(0,0)}_0[(A+\gamma I),(B+\mu I)] = \frac{1}{2}d^2_{\mathrm{aiHS}}[(A+\gamma I),(B+\mu I)],$$
with $d_{\mathrm{aiHS}}$ being the affine-invariant Riemannian distance on $\mathrm{PTr}(\mathcal{H})$. Thus these properties are either automatic or straightforward to verify. We thus focus on the three cases $(\alpha > 0, \beta > 0)$, $(\alpha > 0, \beta = 0)$, and $(\alpha = 0, \beta > 0)$.

Proof of Theorem 13 (Dual symmetry). For the case $\alpha > 0, \beta = 0$ and $\alpha = 0, \beta > 0$, from Eqs.
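(10) and (11), we immediately have
$$D^{(\alpha,0)}_r[(A+\gamma I),(B+\mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^r - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^r - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^r\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^r = D^{(0,\alpha)}_r[(B+\mu I),(A+\gamma I)].$$
Consider now the case $\alpha > 0$, $\beta > 0$. Writing $\delta = \delta(\alpha,\beta)$ to emphasize its dependence on $\alpha$ and $\beta$, we have $\delta(\alpha,\beta) = \frac{\alpha\gamma^r}{\alpha\gamma^r+\beta\mu^r}$ in $D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)]$. Then for $D^{(\beta,\alpha)}_r[(B+\mu I),(A+\gamma I)]$, we have
$$\delta(\beta,\alpha) = \frac{\beta\mu^r}{\alpha\gamma^r+\beta\mu^r} = 1-\delta(\alpha,\beta), \qquad 1-\delta(\beta,\alpha) = \delta(\alpha,\beta),$$
$$\delta(\beta,\alpha) - \frac{\beta}{\alpha+\beta} = 1 - \delta(\alpha,\beta) - \frac{\beta}{\alpha+\beta} = -\left[\delta(\alpha,\beta) - \frac{\alpha}{\alpha+\beta}\right].$$
By Definition 1, we have
$$D^{(\beta,\alpha)}_r[(B+\mu I),(A+\gamma I)] = \frac{1}{\alpha\beta}\log\left(\frac{\mu}{\gamma}\right)^{r\left(\delta(\beta,\alpha)-\frac{\beta}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\beta[(B+\mu I)(A+\gamma I)^{-1}]^{r(1-\delta(\beta,\alpha))} + \alpha[(B+\mu I)(A+\gamma I)^{-1}]^{-r\delta(\beta,\alpha)}}{\alpha+\beta}\right)$$
$$= \frac{1}{\alpha\beta}\log\left(\frac{\gamma}{\mu}\right)^{r\left(\delta(\alpha,\beta)-\frac{\alpha}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\beta[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta(\alpha,\beta)} + \alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta(\alpha,\beta))}}{\alpha+\beta}\right)$$
$$= D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)].$$
This completes the proof of the theorem.

Proof of Theorem 14 (Dual invariance under inversion). We have
$$(A+\gamma I)^{-1} = \frac{1}{\gamma}I - \frac{A}{\gamma}(A+\gamma I)^{-1}, \qquad (B+\mu I)^{-1} = \frac{1}{\mu}I - \frac{B}{\mu}(B+\mu I)^{-1},$$
$$(B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]^{-1}.$$
Consider the case $\alpha > 0$, $\beta > 0$. By Definition 1, we have
$$D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = \frac{1}{\alpha\beta}\log\left(\frac{1/\gamma}{1/\mu}\right)^{r\left(\delta_2-\frac{\alpha}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\alpha(\Lambda+\frac{\gamma}{\mu}I)^{-r(1-\delta_2)} + \beta(\Lambda+\frac{\gamma}{\mu}I)^{r\delta_2}}{\alpha+\beta}\right),$$
where
$$\delta_2 = \frac{\alpha(1/\gamma)^r}{\alpha(1/\gamma)^r+\beta(1/\mu)^r} = \frac{\alpha\mu^r}{\alpha\mu^r+\beta\gamma^r} = \delta(-r).$$
Thus $D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = D^{(\alpha,\beta)}_{-r}[(A+\gamma I),(B+\mu I)]$.

Consider the case $\alpha = 0$, $\beta > 0$ (the case $\alpha > 0$, $\beta = 0$ then follows by dual symmetry).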
We have
$$D^{(0,\beta)}_r[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = \frac{r}{\beta^2}\left[\left(\frac{1/\gamma}{1/\mu}\right)^r - 1\right]\log\frac{1/\gamma}{1/\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)(A+\gamma I)^{-1}]^r - I\right) - \frac{1}{\beta^2}\left(\frac{1/\gamma}{1/\mu}\right)^r\log\mathrm{det}_X[(B+\mu I)(A+\gamma I)^{-1}]^r$$
$$= -\frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{-r} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(A+\gamma I)(B+\mu I)^{-1}]^{-r} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{-r}\log\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r}.$$
By Lemma 8, we have
$$\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{tr}_X\left[\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r},$$
$$\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{det}_X\left[\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r}\right] = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r}.$$
Thus it follows that
$$D^{(0,\beta)}_r[(A+\gamma I)^{-1},(B+\mu I)^{-1}] = -\frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{-r} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)^{-1}(A+\gamma I)]^{-r} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{-r}\log\mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r}$$
$$= D^{(0,\beta)}_{-r}[(A+\gamma I),(B+\mu I)].$$
This completes the proof.

Proof of Theorem 15 (Affine-invariance). We have for $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$ and $(C+\nu I) \in \mathrm{Tr}_X(\mathcal{H})$, $\nu \neq 0$,
$$(C+\nu I)(A+\gamma I)(C+\nu I)^* = CAC^* + \nu(CA+AC^*) + \nu^2A + \gamma CC^* + \gamma\nu(C+C^*) + \gamma\nu^2I \in \mathrm{Tr}_X(\mathcal{H}).$$
Since $(C+\nu I)$ is assumed to be invertible, the operator $(C+\nu I)(A+\gamma I)(C+\nu I)^*$ is also invertible, with inverse $[(C+\nu I)^*]^{-1}(A+\gamma I)^{-1}(C+\nu I)^{-1}$. Furthermore, $\forall x \in \mathcal{H}$,
$$\langle x, (C+\nu I)(A+\gamma I)(C+\nu I)^*x\rangle = \langle(C+\nu I)^*x, (A+\gamma I)(C+\nu I)^*x\rangle \geq M_A\|(C+\nu I)^*x\|^2 \geq 0,$$
with equality if and only if $(C+\nu I)^*x = 0 \iff x = 0$. Thus $(C+\nu I)(A+\gamma I)(C+\nu I)^*$ is strictly positive. Together with its invertibility, this shows that it is a positive definite operator. Hence $(C+\nu I)(A+\gamma I)(C+\nu I)^* \in \mathrm{PTr}(\mathcal{H})$. For two operators $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, we then have
$$[(C+\nu I)(A+\gamma I)(C+\nu I)^*][(C+\nu I)(B+\mu I)(C+\nu I)^*]^{-1} = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}](C+\nu I)^{-1}.$$
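As a finite-dimensional sanity check of the conjugation identity just stated, the following sketch replaces the operators by $2 \times 2$ real matrices. The matrices `P`, `Q`, `T` (standing in for $A+\gamma I$, $B+\mu I$, $C+\nu I$) and all helper functions are illustrative assumptions, not part of the paper. The sketch also compares trace and determinant before and after conjugation, which is the mechanism behind the invariances of $\mathrm{tr}_X$ and $\mathrm{det}_X$ used in this proof.

```python
# Hypothetical 2x2 helpers on nested lists; names are illustrative only.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inverse(X):
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def trace(X):
    return X[0][0] + X[1][1]

P = [[2.0, 0.5], [0.5, 1.0]]     # stands in for A + gamma*I (positive definite)
Q = [[1.5, -0.3], [-0.3, 0.8]]   # stands in for B + mu*I (positive definite)
T = [[1.0, 2.0], [-0.5, 3.0]]    # stands in for C + nu*I (invertible)

Tt = transpose(T)                # real case: the adjoint is the transpose
ratio = matmul(P, inverse(Q))    # (A + gamma*I)(B + mu*I)^{-1}

# Left side: [T P T*][T Q T*]^{-1}; right side: T (P Q^{-1}) T^{-1}.
lhs = matmul(matmul(matmul(T, P), Tt), inverse(matmul(matmul(T, Q), Tt)))
rhs = matmul(matmul(T, ratio), inverse(T))

max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
trace_gap = abs(trace(lhs) - trace(ratio))
det_gap = abs(det2(lhs) - det2(ratio))
print(max_err, trace_gap, det_gap)
```

All three printed quantities should be at the level of floating-point rounding error.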
Then for any $p \in \mathbb{R}$, we have
$$\left([(C+\nu I)(A+\gamma I)(C+\nu I)^*][(C+\nu I)(B+\mu I)(C+\nu I)^*]^{-1}\right)^p = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}]^p(C+\nu I)^{-1}.$$
Thus for any $a, b > 0$ and any $p, q \in \mathbb{R}$,
$$a\left([(C+\nu I)(A+\gamma I)(C+\nu I)^*][(C+\nu I)(B+\mu I)(C+\nu I)^*]^{-1}\right)^p + b\left([(C+\nu I)(A+\gamma I)(C+\nu I)^*][(C+\nu I)(B+\mu I)(C+\nu I)^*]^{-1}\right)^q$$
$$= (C+\nu I)\left(a[(A+\gamma I)(B+\mu I)^{-1}]^p + b[(A+\gamma I)(B+\mu I)^{-1}]^q\right)(C+\nu I)^{-1}.$$
By the definition of $D^{(\alpha,\beta)}_r$ and the following invariances of the extended Fredholm determinant $\mathrm{det}_X$ as well as of the extended trace operation $\mathrm{tr}_X$, namely,
$$\mathrm{det}_X[C(A+\gamma I)C^{-1}] = \mathrm{det}_X[(A+\gamma I)], \qquad \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)],$$
for $A+\gamma I \in \mathrm{Tr}_X(\mathcal{H})$, $\gamma \neq 0$, and $C \in \mathcal{L}(\mathcal{H})$ invertible (Lemma 5 in [17]), we then obtain the desired affine invariance for $D^{(\alpha,\beta)}_r$, namely
$$D^{(\alpha,\beta)}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^*, (C+\nu I)(B+\mu I)(C+\nu I)^*] = D^{(\alpha,\beta)}_r[(A+\gamma I),(B+\mu I)].$$
This completes the proof.

Proof of Theorem 16 (Invariance under unitary transformations). The proof of this theorem is similar to that of Theorem 15, using the fact that $C^* = C^{-1}$ and the properties
$$\mathrm{det}_X[C(A+\gamma I)C^{-1}] = \mathrm{det}_X[(A+\gamma I)], \qquad \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)],$$
of the operations $\mathrm{det}_X$ and $\mathrm{tr}_X$.

Proof of Theorem 17. For the case $\alpha > 0$, $\beta > 0$, this follows immediately from Definition 1. For the case $\alpha > 0$, $\beta = 0$, by Definition 2 and Lemma 8, we have
$$D^{(\alpha,0)}_r[(A+\gamma I),(B+\mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^r - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left(\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^r\log\mathrm{det}_X\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-r} = D^{(\alpha,0)}_r\left[\left(\Lambda+\frac{\gamma}{\mu}I\right), I\right].$$
The case $\alpha = 0$, $\beta > 0$ is entirely similar.

Proof of Theorem 18. We first note that $\left(\Lambda+\frac{\gamma}{\mu}I\right)^{\omega} = \left(\frac{\gamma}{\mu}\right)^{\omega}\left(\frac{\mu}{\gamma}\Lambda+I\right)^{\omega}$. Then for $\alpha > 0$, $\beta > 0$, the statement of the theorem follows immediately from Definition 1.
For the case $\alpha > 0$, $\beta = 0$, by Definition 2 and Lemma 8, we have
$$D^{(\omega\alpha,0)}_{\omega r}[(A+\gamma I),(B+\mu I)] = \frac{r}{\omega^2\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{\omega r} - 1\right]\log\left(\frac{\mu}{\gamma}\right)^{\omega} + \frac{1}{\omega^2\alpha^2}\mathrm{tr}_X\left(\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-\omega r} - I\right) - \frac{1}{\omega^2\alpha^2}\left(\frac{\mu}{\gamma}\right)^{\omega r}\log\mathrm{det}_X\left(\Lambda+\frac{\gamma}{\mu}I\right)^{-\omega r}$$
$$= \frac{1}{\omega^2}D^{(\alpha,0)}_r\left[\left(\Lambda+\frac{\gamma}{\mu}I\right)^{\omega}, I\right].$$
The case $\alpha = 0$, $\beta > 0$ is entirely similar.

Appendix A.6. Proofs of Theorems 1, 2, and 3

We are now ready to provide the proofs for Theorems 1, 2, and 3. For the proof of positivity, we first need the following technical result.

Lemma 13. (i) Let $r \neq 0$ be fixed. The function $f(x) = x^r - 1 - r\log(x)$ for $x > 0$ has a unique global minimum $f_{\min} = f(1) = 0$. In other words, $f(x) \geq 0$ $\forall x > 0$, with equality if and only if $x = 1$. (ii) Let $\nu > 0$, $r \neq 0$ be fixed. The function $g(x) = \left(\frac{x}{\nu}+1\right)^r - 1 - r\log\left(\frac{x}{\nu}+1\right)$ for $x > -\nu$ has a unique global minimum $g_{\min} = g(0) = 0$. In other words, $g(x) \geq 0$ $\forall x > -\nu$, with equality if and only if $x = 0$.

Proof of Lemma 13. (i) We have $f'(x) = \frac{r(x^r-1)}{x}$. When $r > 0$, we have $x^r < 1$ for $0 < x < 1$ and $x^r > 1$ for $x > 1$. When $r < 0$, we have $x^r > 1$ for $0 < x < 1$ and $x^r < 1$ for $x > 1$. Thus, for all $r \neq 0$, we have $f'(x) < 0$ when $0 < x < 1$ and $f'(x) > 0$ when $x > 1$. Hence $f$ has a unique global minimum $f_{\min} = f(1) = 0$. (ii) The proof for $g$ follows that for $f$ by the change of variable $y = \frac{x}{\nu}+1$.

Proof of Theorem 1 (Positivity). For the case $\alpha > 0$, $\beta > 0$, this is a special case of Theorem 6, with $p + q = r$. Consider now the case $\alpha = 0$, $\beta > 0$ (the case $\alpha > 0$, $\beta = 0$ then follows by dual symmetry). For the proof of positivity, we can ignore the positive factor $\beta^2$ and thus it suffices to consider $D^{(0,1)}_r$. We recall that we define $\Lambda + \nu I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$, where $\nu = \frac{\gamma}{\mu}$.
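As a numerical aside on Lemma 13(i), which the remainder of this proof applies with $x = \nu$, the sketch below evaluates $f(x) = x^r - 1 - r\log x$ on a grid for several nonzero exponents and locates the grid minimizer. The grid and the exponents are arbitrary choices made for illustration and are not taken from the paper.

```python
import math

def f(x, r):
    """f(x) = x**r - 1 - r*log(x), the function in Lemma 13(i)."""
    return x ** r - 1.0 - r * math.log(x)

# Illustrative grid on (0, 10] and a few nonzero exponents (assumed values).
xs = [0.05 * k for k in range(1, 201)]
rs = [-2.0, -0.5, 0.5, 1.0, 3.0]

# Smallest value of f on the grid, and the grid point where it is attained.
min_values = {r: min(f(x, r) for x in xs) for r in rs}
argmins = {r: min(xs, key=lambda x: f(x, r)) for r in rs}

for r in rs:
    print(r, min_values[r], argmins[r])
```

For every exponent the grid minimum should be zero up to rounding and should be attained at the grid point $x = 1$, in line with the lemma.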
Then, since $\mathrm{det}_X[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]$ and $\mathrm{tr}_X[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]$, we have
$$D^{(0,1)}_r[(A+\gamma I),(B+\mu I)] = r(\nu^r-1)\log\nu + \mathrm{tr}_X[(\Lambda+\nu I)^r - I] - \nu^r\log\mathrm{det}_X(\Lambda+\nu I)^r.$$
By Lemma 7,
$$\mathrm{tr}_X[(\Lambda+\nu I)^r - I] = \nu^r - 1 + \nu^r\,\mathrm{tr}\left[\left(\frac{\Lambda}{\nu}+I\right)^r - I\right].$$
Also
$$\log\mathrm{det}_X(\Lambda+\nu I)^r = \log\left[\nu^r\det\left(\frac{\Lambda}{\nu}+I\right)^r\right] = r\log\det\left(\frac{\Lambda}{\nu}+I\right) + r\log\nu.$$
Thus we have
$$D^{(0,1)}_r[(A+\gamma I),(B+\mu I)] = \nu^r - 1 - r\log\nu + \nu^r\left\{\mathrm{tr}\left[\left(\frac{\Lambda}{\nu}+I\right)^r - I\right] - r\log\det\left(\frac{\Lambda}{\nu}+I\right)\right\}$$
$$= \nu^r - 1 - r\log\nu + \nu^r\left[\sum_{k=1}^{\infty}\left(\left(\frac{\lambda_k}{\nu}+1\right)^r - 1 - r\log\left(\frac{\lambda_k}{\nu}+1\right)\right)\right].$$
By the first part of Lemma 13, we have for all $\nu > 0$
$$\nu^r - 1 - r\log\nu \geq 0,$$
with equality if and only if $\nu = 1$. By the second part of Lemma 13, we have for all $k \in \mathbb{N}$
$$\left(\frac{\lambda_k}{\nu}+1\right)^r - 1 - r\log\left(\frac{\lambda_k}{\nu}+1\right) \geq 0,$$
with equality if and only if $\lambda_k = 0$. Combining these two inequalities, we obtain
$$D^{(0,1)}_r[(A+\gamma I),(B+\mu I)] \geq 0,$$
with equality if and only if $\nu = \frac{\gamma}{\mu} = 1$ and $\lambda_k = 0$ $\forall k \in \mathbb{N}$ $\iff \Lambda = 0$, that is, if and only if $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = I \iff A+\gamma I = B+\mu I \iff A = B$ and $\gamma = \mu$. This completes the proof.

Proof of Theorem 2 (Special cases - I). The first statement of the theorem is the content of Theorem 9. The second statement is the content of Theorem 10.

Proof of Theorem 3 (Special cases - II). This theorem follows from Theorems 9 and 10 as well as the symmetry of $D^{(\alpha,\alpha)}_r[(A+\gamma I),(B+\mu I)]$ as proved in Theorem 13.

Appendix A.7. Proofs for the divergences between RKHS covariance operators

In this section, we prove Theorems 23, 24, 25, and 26. We first need the following preliminary results.

Lemma 14. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A: \mathcal{H}_1 \to \mathcal{H}_2$ and $B: \mathcal{H}_2 \to \mathcal{H}_1$ be compact linear operators such that both $AB: \mathcal{H}_2 \to \mathcal{H}_2$ and $BA: \mathcal{H}_1 \to \mathcal{H}_1$ are trace class operators. Let $\alpha, \beta > 0$ be fixed.
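For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha(AB+I_{\mathcal{H}_2})^p+\beta(AB+I_{\mathcal{H}_2})^q}{\alpha+\beta}\right) = \det\left(\frac{\alpha(BA+I_{\mathcal{H}_1})^p+\beta(BA+I_{\mathcal{H}_1})^q}{\alpha+\beta}\right). \tag{A.23}$$

Proof of Lemma 14. Since the nonzero eigenvalues of $AB: \mathcal{H}_2 \to \mathcal{H}_2$ and $BA: \mathcal{H}_1 \to \mathcal{H}_1$ are the same, we have for any $p \in \mathbb{R}$
$$\det[(AB+I_{\mathcal{H}_2})^p] = \det[(BA+I_{\mathcal{H}_1})^p].$$
For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha(AB+I_{\mathcal{H}_2})^p+\beta(AB+I_{\mathcal{H}_2})^q}{\alpha+\beta}\right) = \det\left(\frac{\alpha(BA+I_{\mathcal{H}_1})^p+\beta(BA+I_{\mathcal{H}_1})^q}{\alpha+\beta}\right).$$
In the above equality, we have used the fact that a zero eigenvalue of $AB$ and $BA$ corresponds to an eigenvalue equal to $1$ for $\frac{\alpha(AB+I_{\mathcal{H}_2})^p+\beta(AB+I_{\mathcal{H}_2})^q}{\alpha+\beta}: \mathcal{H}_2 \to \mathcal{H}_2$ and $\frac{\alpha(BA+I_{\mathcal{H}_1})^p+\beta(BA+I_{\mathcal{H}_1})^q}{\alpha+\beta}: \mathcal{H}_1 \to \mathcal{H}_1$, respectively, which does not change the determinant. This completes the proof.

Lemma 15. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A, B: \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^*: \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^*: \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Let $\alpha, \beta > 0$ be fixed. For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha[(AA^*+I_{\mathcal{H}_2})(BB^*+I_{\mathcal{H}_2})^{-1}]^p+\beta[(AA^*+I_{\mathcal{H}_2})(BB^*+I_{\mathcal{H}_2})^{-1}]^q}{\alpha+\beta}\right) = \det\left(\frac{\alpha(C+I_{\mathcal{H}_1}\otimes I_3)^p+\beta(C+I_{\mathcal{H}_1}\otimes I_3)^q}{\alpha+\beta}\right), \tag{A.24}$$
where
$$C = \begin{pmatrix} A^*A & -A^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -A^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \\ B^*A & -B^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -B^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \\ B^*A & -B^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -B^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \end{pmatrix}. \tag{A.25}$$

Proof of Lemma 15. We make use of the following notation. Let $A, B, C: \mathcal{H}_1 \to \mathcal{H}_2$ be three bounded linear operators. Consider the operator $(A \; B \; C): \mathcal{H}_1^3 \to \mathcal{H}_2$, with
$$(A \; B \; C)^* = \begin{pmatrix} A^* \\ B^* \\ C^* \end{pmatrix}: \mathcal{H}_2 \to \mathcal{H}_1^3.$$
Here $\mathcal{H}_1^3 = \mathcal{H}_1 \oplus \mathcal{H}_1 \oplus \mathcal{H}_1$ denotes the direct sum of $\mathcal{H}_1$ with itself, that is
$$\mathcal{H}_1^3 = \mathcal{H}_1 \oplus \mathcal{H}_1 \oplus \mathcal{H}_1 = \{(v_1,v_2,v_3) : v_1, v_2, v_3 \in \mathcal{H}_1\},$$
equipped with the inner product
$$\langle(v_1,v_2,v_3),(w_1,w_2,w_3)\rangle_{\mathcal{H}_1^3} = \langle v_1,w_1\rangle_{\mathcal{H}_1} + \langle v_2,w_2\rangle_{\mathcal{H}_1} + \langle v_3,w_3\rangle_{\mathcal{H}_1}.$$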
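If $\{e_i\}_{i=1}^{\infty}$ is an orthonormal basis for $\mathcal{H}_1$, then $\{(e_i,0,0)\}_{i=1}^{\infty} \cup \{(0,e_i,0)\}_{i=1}^{\infty} \cup \{(0,0,e_i)\}_{i=1}^{\infty}$ is an orthonormal basis for $\mathcal{H}_1^3$. We now utilize this notation in our setting. By the Sherman-Morrison-Woodbury formula, we have
$$(BB^*+I_{\mathcal{H}_2})^{-1} = I_{\mathcal{H}_2} - B(I_{\mathcal{H}_1}+B^*B)^{-1}B^*.$$
Thus it follows that
$$(AA^*+I_{\mathcal{H}_2})(BB^*+I_{\mathcal{H}_2})^{-1} = I_{\mathcal{H}_2} + AA^* - B(I_{\mathcal{H}_1}+B^*B)^{-1}B^* - AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1}B^* = I_{\mathcal{H}_2} + C_1C_2.$$
Here the operators $C_1, C_2$ are defined as follows:
$$C_1 = \left[A \quad -B(I_{\mathcal{H}_1}+B^*B)^{-1} \quad -AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1}\right]: \mathcal{H}_1^3 \to \mathcal{H}_2, \qquad C_2 = \begin{pmatrix} A^* \\ B^* \\ B^* \end{pmatrix}: \mathcal{H}_2 \to \mathcal{H}_1^3.$$
The operator $C_2C_1: \mathcal{H}_1^3 \to \mathcal{H}_1^3$ is given by
$$C_2C_1 = \begin{pmatrix} A^*A & -A^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -A^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \\ B^*A & -B^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -B^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \\ B^*A & -B^*B(I_{\mathcal{H}_1}+B^*B)^{-1} & -B^*AA^*B(I_{\mathcal{H}_1}+B^*B)^{-1} \end{pmatrix}.$$
It follows from Lemma 14 that
$$\det\left(\frac{\alpha[(AA^*+I_{\mathcal{H}_2})(BB^*+I_{\mathcal{H}_2})^{-1}]^p+\beta[(AA^*+I_{\mathcal{H}_2})(BB^*+I_{\mathcal{H}_2})^{-1}]^q}{\alpha+\beta}\right) = \det\left(\frac{\alpha(I_{\mathcal{H}_2}+C_1C_2)^p+\beta(I_{\mathcal{H}_2}+C_1C_2)^q}{\alpha+\beta}\right)$$
$$= \det\left(\frac{\alpha(C_2C_1+I_{\mathcal{H}_1}\otimes I_3)^p+\beta(C_2C_1+I_{\mathcal{H}_1}\otimes I_3)^q}{\alpha+\beta}\right).$$
This completes the proof.

Proof of Theorem 23. Let $\Lambda + \frac{\gamma}{\mu}I = (BB^*+\mu I_{\mathcal{H}_2})^{-1/2}(AA^*+\gamma I)(BB^*+\mu I)^{-1/2}$ and $Z + \frac{\gamma}{\mu}I = (AA^*+\gamma I)(BB^*+\mu I)^{-1}$, with $\frac{\mu}{\gamma}Z + I = \left(\frac{AA^*}{\gamma}+I\right)\left(\frac{BB^*}{\mu}+I\right)^{-1}$. By Theorem 8, we have
$$D^{(\alpha,\beta)}_r[(AA^*+\gamma I_{\mathcal{H}_2}),(BB^*+\mu I_{\mathcal{H}_2})] = \frac{r\left(\delta-\frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\Lambda+\frac{\gamma}{\mu}I)^p+\beta(\Lambda+\frac{\gamma}{\mu}I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right],$$
with $p = r(1-\delta)$ and $q = r\delta$.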
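The determinant in the last term is
$$\det\left[\frac{\alpha(\Lambda+\frac{\gamma}{\mu}I)^p+\beta(\Lambda+\frac{\gamma}{\mu}I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right] = \det\left[\frac{\alpha(\frac{\gamma}{\mu})^p(\frac{\mu}{\gamma}\Lambda+I)^p+\beta(\frac{\gamma}{\mu})^{-q}(\frac{\mu}{\gamma}\Lambda+I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right]$$
$$= \det\left[\frac{\alpha(\frac{\gamma}{\mu})^p(\frac{\mu}{\gamma}Z+I)^p+\beta(\frac{\gamma}{\mu})^{-q}(\frac{\mu}{\gamma}Z+I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right] = \det\left[\frac{\alpha(\frac{\gamma}{\mu})^p(C+I_{\mathcal{H}_1}\otimes I_3)^p+\beta(\frac{\gamma}{\mu})^{-q}(C+I_{\mathcal{H}_1}\otimes I_3)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right]$$
by Lemma 15, where
$$C = \begin{pmatrix} \frac{A^*A}{\gamma} & -\frac{A^*B}{\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} & -\frac{A^*AA^*B}{\gamma\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} \\ \frac{B^*A}{\sqrt{\gamma\mu}} & -\frac{B^*B}{\mu}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} & -\frac{B^*AA^*B}{\gamma\mu}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} \\ \frac{B^*A}{\sqrt{\gamma\mu}} & -\frac{B^*B}{\mu}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} & -\frac{B^*AA^*B}{\gamma\mu}\left(I_{\mathcal{H}_1}+\frac{B^*B}{\mu}\right)^{-1} \end{pmatrix},$$
which is obtained by replacing $AA^*$ and $BB^*$ in Lemma 15 with $\frac{AA^*}{\gamma}$ and $\frac{BB^*}{\mu}$, respectively. This completes the proof of the theorem.

Proof of Theorem 24. Let $Z + \frac{\gamma}{\mu}I = (AA^*+\gamma I)(BB^*+\mu I)^{-1}$. By the finite-dimensional formula given in Eq. (19), we have
$$D^{(\alpha,\beta)}_r[(AA^*+\gamma I_{\mathcal{H}_2}),(BB^*+\mu I_{\mathcal{H}_2})] = \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(Z+\frac{\gamma}{\mu}I)^p+\beta(Z+\frac{\gamma}{\mu}I)^{-q}}{\alpha+\beta}\right]$$
$$= \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_2) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(Z+\frac{\gamma}{\mu}I)^p+\beta(Z+\frac{\gamma}{\mu}I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right].$$
As in the proof of Theorem 23, the determinant in the last term in the above expression is
$$\det\left[\frac{\alpha(\Lambda+\frac{\gamma}{\mu}I)^p+\beta(\Lambda+\frac{\gamma}{\mu}I)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right] = \det\left[\frac{\alpha(\frac{\gamma}{\mu})^p(C+I_{\mathcal{H}_1}\otimes I_3)^p+\beta(\frac{\gamma}{\mu})^{-q}(C+I_{\mathcal{H}_1}\otimes I_3)^{-q}}{\alpha(\frac{\gamma}{\mu})^p+\beta(\frac{\gamma}{\mu})^{-q}}\right].$$
This gives us the final expression.

Proof of Theorem 25. We consider the linear operators
$$A = \frac{1}{\sqrt{m}}\Phi(\mathbf{x})J_m: \mathbb{R}^m \to \mathcal{H}_K, \qquad B = \frac{1}{\sqrt{m}}\Phi(\mathbf{y})J_m: \mathbb{R}^m \to \mathcal{H}_K.$$
The desired expression then follows from Theorem 23.

Proof of Theorem 26. This is proved in the same way as Theorem 25, except that we invoke Theorem 24.

Appendix A.8.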
Proofs for the metric properties

In this section, we prove Theorems 19, 20, 21, which lead to the proofs of Theorems 4 and 22. We present two sets of separate proofs for Theorems 4 and 22, one simpler proof for the particular case $\alpha = 1/2$, which corresponds to the infinite-dimensional symmetric Stein divergence, and one general proof for any $\alpha > 0$. The former case utilizes Theorem 28 and the latter case utilizes Theorem 30, both of which should be of interest in their own right.

Appendix A.8.1. The case of the infinite-dimensional symmetric Stein divergence

Consider the first case $\alpha = 1/2$, which corresponds to the infinite-dimensional symmetric Stein divergence.

Lemma 16. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C: \mathcal{H} \to \mathcal{H}$ be self-adjoint finite-rank operators, such that $A + I > 0$, $B + I > 0$, $C + I > 0$. Then
$$\sqrt{\log\frac{\det(\frac{A+B}{2}+I)}{\sqrt{\det(A+I)\det(B+I)}}} \leq \sqrt{\log\frac{\det(\frac{A+C}{2}+I)}{\sqrt{\det(A+I)\det(C+I)}}} + \sqrt{\log\frac{\det(\frac{C+B}{2}+I)}{\sqrt{\det(C+I)\det(B+I)}}}. \tag{A.26}$$

Proof of Lemma 16. Since $A, B, C$ are all finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$ for some $n \in \mathbb{N}$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$, and $\mathrm{range}(C) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $C_n = C|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n, C_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We also have $(A+B)_n = (A+B)|_{\mathcal{H}_n} = A|_{\mathcal{H}_n} + B|_{\mathcal{H}_n} = A_n + B_n$, $(A+C)_n = A_n + C_n$, $(C+B)_n = B_n + C_n$. Applying the finite-dimensional result in [16], we then obtain
$$\sqrt{\log\frac{\det(\frac{A_n+B_n}{2}+I_n)}{\sqrt{\det(A_n+I_n)\det(B_n+I_n)}}} \leq \sqrt{\log\frac{\det(\frac{A_n+C_n}{2}+I_n)}{\sqrt{\det(A_n+I_n)\det(C_n+I_n)}}} + \sqrt{\log\frac{\det(\frac{C_n+B_n}{2}+I_n)}{\sqrt{\det(C_n+I_n)\det(B_n+I_n)}}}.$$
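For intuition, the finite-dimensional inequality above can be checked numerically on small matrices. The sketch below uses three arbitrary symmetric $2 \times 2$ matrices with $A+I$, $B+I$, $C+I$ positive definite; the matrices and helper names are assumptions made for illustration only and do not come from [16] or from the paper.

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def plus_identity(X):
    """X + I for a 2x2 matrix stored as nested lists."""
    return [[X[i][j] + (1.0 if i == j else 0.0) for j in range(2)]
            for i in range(2)]

def mean_plus_identity(X, Y):
    """(X + Y)/2 + I for 2x2 matrices."""
    return [[(X[i][j] + Y[i][j]) / 2.0 + (1.0 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

def stein_root(X, Y):
    """sqrt(log[det((X+Y)/2 + I) / sqrt(det(X+I) det(Y+I))])."""
    num = det2(mean_plus_identity(X, Y))
    den = math.sqrt(det2(plus_identity(X)) * det2(plus_identity(Y)))
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(math.log(num / den), 0.0))

# Arbitrary symmetric test matrices with X + I positive definite.
A = [[0.5, 0.3], [0.3, -0.2]]
B = [[1.0, -0.4], [-0.4, 0.1]]
C = [[-0.3, 0.1], [0.1, 2.0]]

d_ab = stein_root(A, B)
d_ac = stein_root(A, C)
d_cb = stein_root(C, B)
print(d_ab, d_ac + d_cb)
```

The first printed value should not exceed the second; a single instance does not prove anything, but it makes the roles of the three determinant ratios concrete.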
It is clear that the non-zero eigenvalues of $A$ and $A_n$ are the same, so that $\det(A+I) = \det(A_n+I_n)$ and the same holds true for the other operators. This gives us the final result.

Proof of Theorem 21 (Triangle inequality - square root of symmetric Stein divergence). Let $\{A_n\}_{n\in\mathbb{N}}$, $\{B_n\}_{n\in\mathbb{N}}$, $\{C_n\}_{n\in\mathbb{N}}$ be sequences of finite-rank operators with
$$\|A_n-A\|_{\mathrm{tr}} \to 0, \quad \|B_n-B\|_{\mathrm{tr}} \to 0, \quad \|C_n-C\|_{\mathrm{tr}} \to 0, \quad \text{as } n \to \infty.$$
By Lemma 16, we have
$$\sqrt{\log\frac{\det(\frac{A_n+B_n}{2}+I)}{\sqrt{\det(A_n+I)\det(B_n+I)}}} \leq \sqrt{\log\frac{\det(\frac{A_n+C_n}{2}+I)}{\sqrt{\det(A_n+I)\det(C_n+I)}}} + \sqrt{\log\frac{\det(\frac{C_n+B_n}{2}+I)}{\sqrt{\det(C_n+I)\det(B_n+I)}}}.$$
By Theorem 3.5 in [22], as $n \to \infty$, we have
$$\det(A_n+I) \to \det(A+I), \quad \det(B_n+I) \to \det(B+I), \quad \det\left(\frac{A_n+B_n}{2}+I\right) \to \det\left(\frac{A+B}{2}+I\right),$$
and the same holds true for the other operators. Thus by taking the limit as $n \to \infty$ in the above triangle inequality for $(A_n+I)$, $(B_n+I)$ and $(C_n+I)$, we obtain the final triangle inequality for $(A+I)$, $(B+I)$, and $(C+I)$.

The following is the specialization of Theorem 4 when $\alpha = 1/2$.

Theorem 27 (Metric property - square root of symmetric Stein divergence). Let $\gamma > 0$, $\gamma \in \mathbb{R}$ be fixed. The square root of the infinite-dimensional symmetric Stein divergence $\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$.

Proof of Theorem 27. We have already shown the positivity and symmetry of $D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)]$. It remains for us to show the triangle inequality, namely
$$\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)]} \leq \sqrt{D^{(1/2,1/2)}_1[(A+\gamma I),(C+\gamma I)]} + \sqrt{D^{(1/2,1/2)}_1[(C+\gamma I),(B+\gamma I)]},$$
for any three operators $(A+\gamma I), (B+\gamma I), (C+\gamma I) \in \mathrm{PTr}(\mathcal{H})$.
We have
$$D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)] = 4\log\left[\frac{\mathrm{det}_X\left(\frac{(A+\gamma I)+(B+\gamma I)}{2}\right)}{\mathrm{det}_X(A+\gamma I)^{1/2}\,\mathrm{det}_X(B+\gamma I)^{1/2}}\right] = 4\log\left[\frac{\det\left(\frac{A+B}{2\gamma}+I\right)}{\det\left(\frac{A}{\gamma}+I\right)^{1/2}\det\left(\frac{B}{\gamma}+I\right)^{1/2}}\right].$$
Thus the triangle inequality for $\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I),(B+\gamma I)]}$ follows from that stated in Theorem 21.

Lemma 17. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint finite-rank operators, with maximum rank $n$, $n \in \mathbb{N}$, such that $A + I > 0$, $B + I > 0$. Then
$$\prod_{j=1}^{n}\left[\frac{\lambda_j(A)+\lambda_j(B)}{2}+1\right] \leq \det\left(\frac{A+B}{2}+I\right). \tag{A.27}$$

Proof of Lemma 17. Since $A, B$ are both finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We also have $(A+B)_n = (A+B)|_{\mathcal{H}_n} = A|_{\mathcal{H}_n} + B|_{\mathcal{H}_n} = A_n + B_n$. Thus we can apply the following inequality for finite-dimensional SPD matrices ([23]):
$$\prod_{j=1}^{n}\left[\frac{\lambda_j(A_n)+\lambda_j(B_n)}{2}+1\right] = \prod_{j=1}^{n}\left[\frac{\lambda_j(A_n+I_n)+\lambda_j(B_n+I_n)}{2}\right] \leq \det\left(\frac{A_n+B_n}{2}+I_n\right).$$
We note that the non-zero eigenvalues of $A_n, B_n$ are the same as those of $A, B$, respectively, with the maximum number being $n$, and $\det\left(\frac{A+B}{2}+I\right) = \det\left(\frac{A_n+B_n}{2}+I_n\right)$. Together with the previous inequality, this gives us the final result.

Theorem 28. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators, such that $A + I > 0$, $B + I > 0$. Then
$$\prod_{j=1}^{\infty}\left[\frac{\lambda_j(A)+\lambda_j(B)}{2}+1\right] \leq \det\left(\frac{A+B}{2}+I\right). \tag{A.28}$$

Proof of Theorem 28. Let $A = \sum_{j=1}^{\infty}\lambda_j(A)\phi_j \otimes \phi_j$ denote the spectral decomposition for $A$. For each $n \in \mathbb{N}$, define
$$A_n = \sum_{j=1}^{n}\lambda_j(A)\phi_j \otimes \phi_j.$$
Then $A_n$ is a finite-rank operator with the eigenvalues being the first $n$ eigenvalues of $A$ and $\lim_{n\to\infty}\|A_n-A\|_{\mathrm{tr}} = 0$. In the same way, we construct a sequence of finite-rank operators $B_n$ with $\lim_{n\to\infty}\|B_n-B\|_{\mathrm{tr}} = 0$, so that $\lim_{n\to\infty}\|(A_n+B_n)-(A+B)\|_{\mathrm{tr}} = 0$. By Theorem 3.5 in [22], we then have
$$\lim_{n\to\infty}\det\left(\frac{A_n+B_n}{2}+I\right) = \det\left(\frac{A+B}{2}+I\right).$$
Applying Lemma 17 to $A_n$ and $B_n$, we have
$$\prod_{j=1}^{n}\left[\frac{\lambda_j(A_n)+\lambda_j(B_n)}{2}+1\right] \leq \det\left(\frac{A_n+B_n}{2}+I\right). \tag{A.29}$$
The final result is then obtained by taking the limit as $n \to \infty$, noting that the eigenvalues of $A_n, B_n$ are precisely the first $n$ eigenvalues of $A, B$, respectively.

The following is the specialization of Theorem 22 when $\alpha = 1/2$.

Theorem 29. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators, such that $A + I > 0$, $B + I > 0$. Let $\mathrm{Eig}(A), \mathrm{Eig}(B): \ell^2 \to \ell^2$ be diagonal operators with the diagonals consisting of the eigenvalues of $A$ and $B$, respectively, in decreasing order. Then
$$D^{(1/2,1/2)}_1[(\mathrm{Eig}(A)+I),(\mathrm{Eig}(B)+I)] \leq D^{(1/2,1/2)}_1[(A+I),(B+I)]. \tag{A.30}$$

Proof of Theorem 29. By definition, we have
$$D^{(1/2,1/2)}_1[(\mathrm{Eig}(A)+I),(\mathrm{Eig}(B)+I)] = 4\log\left[\frac{\det\left(\frac{\mathrm{Eig}(A)+\mathrm{Eig}(B)}{2}+I\right)}{\sqrt{\det(\mathrm{Eig}(A)+I)\det(\mathrm{Eig}(B)+I)}}\right] = 4\log\left[\frac{\prod_{j=1}^{\infty}\left[\frac{\lambda_j(A)+\lambda_j(B)}{2}+1\right]}{\sqrt{\det(A+I)\det(B+I)}}\right]$$
$$\leq 4\log\left[\frac{\det\left(\frac{A+B}{2}+I\right)}{\sqrt{\det(A+I)\det(B+I)}}\right] \quad \text{by Theorem 28}$$
$$= D^{(1/2,1/2)}_1[(A+I),(B+I)].$$
This completes the proof.

Appendix A.8.2. The general case

We now consider the general case $\alpha > 0$. We need the following results.
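Before turning to the general case, the eigenvalue inequality of Theorem 28 can be illustrated in the simplest finite-dimensional setting. The sketch below is a minimal check for symmetric $2 \times 2$ matrices, each stored as a triple `(a, b, c)` representing $\begin{pmatrix} a & b \\ b & c\end{pmatrix}$; this encoding, the helper names and the two test matrices are assumptions made for illustration. Eigenvalues are paired in decreasing order, as in Theorem 29.

```python
import math

def eig2(a, b, c):
    """Eigenvalues, in decreasing order, of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean + radius, mean - radius)

def det_mean_plus_identity(A, B):
    """det((A + B)/2 + I) for symmetric 2x2 matrices given as (a, b, c)."""
    a = (A[0] + B[0]) / 2.0 + 1.0
    b = (A[1] + B[1]) / 2.0
    c = (A[2] + B[2]) / 2.0 + 1.0
    return a * c - b * b

def eigenvalue_product(A, B):
    """prod_j [(lambda_j(A) + lambda_j(B))/2 + 1], eigenvalues sorted decreasingly."""
    return math.prod((x + y) / 2.0 + 1.0 for x, y in zip(eig2(*A), eig2(*B)))

# Arbitrary symmetric matrices with A + I > 0 and B + I > 0.
A = (0.5, 0.3, -0.2)
B = (1.0, -0.4, 0.1)

lhs = eigenvalue_product(A, B)
rhs = det_mean_plus_identity(A, B)
print(lhs, rhs)
```

The first printed value should not exceed the second. Since the denominators in the two divergences of Theorem 29 coincide, the same comparison is what drives inequality (A.30).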
In the following, let $\mathcal{C}_p(\mathcal{H})$ denote the class of $p$th Schatten class operators on $\mathcal{H}$, under the norm $\|\cdot\|_p$, $1 \leq p \leq \infty$, which is defined by
$$\|A\|_p = \left[\sum_{k=1}^{\infty}\lambda_k^p\left((A^*A)^{1/2}\right)\right]^{1/p}, \tag{A.31}$$
with $\mathcal{C}_1(\mathcal{H})$ being the space of trace class operators $\mathrm{Tr}(\mathcal{H})$, $\mathcal{C}_2(\mathcal{H})$ being the space of Hilbert-Schmidt operators $\mathrm{HS}(\mathcal{H})$, and $\mathcal{C}_{\infty}(\mathcal{H})$ being the set of compact operators under the operator norm $\|\cdot\|$.

Theorem 30. Let $r \in \mathbb{R}$ be fixed but arbitrary. Assume that $1 \leq p \leq \infty$. Let $\{A_n\}_{n\in\mathbb{N}} \in \mathrm{Sym}(\mathcal{H}) \cap \mathcal{C}_p(\mathcal{H})$, $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathcal{C}_p(\mathcal{H})$ be such that $I + A > 0$, $I + A_n > 0$ $\forall n \in \mathbb{N}$. Assume that $\lim_{n\to\infty}\|A_n-A\|_p = 0$. Then
$$\lim_{n\to\infty}\|(I+A_n)^r - (I+A)^r\|_p = 0. \tag{A.32}$$

Proof of Theorem 30. (i) We first prove that
$$\lim_{n\to\infty}\|(I+A_n)^r - (I+A)^r\|_p = 0, \quad 0 \leq r \leq 1. \tag{A.33}$$
The case $r = 0$ is trivial. Let us prove this for $0 < r \leq 1$. For this limit, we make use of the following result from [24] (Corollary 3.2), which states that for any two positive operators $A, B$ on $\mathcal{H}$ such that $A \geq c > 0$, $B \geq c > 0$, and any operator $X$ on $\mathcal{H}$,
$$\|A^rX - XB^r\|_p \leq rc^{r-1}\|AX - XB\|_p, \tag{A.34}$$
where $0 < r \leq 1$ and $\|\cdot\|_p$, $1 \leq p \leq \infty$, denotes the Schatten $p$-norm. By the assumption $I + A > 0$, there exists $M_A > 0$ such that
$$\langle x, (I+A)x\rangle \geq M_A\|x\|^2 \quad \forall x \in \mathcal{H}.$$
By the assumption $\lim_{n\to\infty}\|A_n-A\|_p = 0$, for any $\epsilon$ satisfying $0 < \epsilon < M_A$, there exists $N = N(\epsilon) \in \mathbb{N}$ such that $\|A_n-A\|_p < \epsilon$ $\forall n \geq N$. Then $\forall x \in \mathcal{H}$,
$$|\langle x, (A_n-A)x\rangle| \leq \|A_n-A\|\,\|x\|^2 \leq \|A_n-A\|_p\|x\|^2 \leq \epsilon\|x\|^2.$$
It thus follows that $\forall x \in \mathcal{H}$,
$$\langle x, (I+A_n)x\rangle = \langle x, (I+A)x\rangle + \langle x, (A_n-A)x\rangle \geq (M_A-\epsilon)\|x\|^2.$$
Thus we have $I + A \geq M_A > 0$, $I + A_n \geq M_A - \epsilon > 0$ $\forall n \geq N = N(\epsilon)$. Then, applying Eq.
(A.34), we have for all $n \geq N$,
$$\|(I+A_n)^r - (I+A)^r\|_p \leq r(M_A-\epsilon)^{r-1}\|(I+A_n)-(I+A)\|_p = r\left(\frac{1}{M_A-\epsilon}\right)^{1-r}\|A_n-A\|_p,$$
which implies $\lim_{n\to\infty}\|(I+A_n)^r - (I+A)^r\|_p = 0$. This completes the proof of the first limit.

(ii) For $r > 1$, we proceed by induction as follows. We have
$$\|(I+A_n)^r - (I+A)^r\|_p \leq \|(I+A_n)^r - (I+A_n)(I+A)^{r-1}\|_p + \|(I+A_n)(I+A)^{r-1} - (I+A)^r\|_p$$
$$\leq \|I+A_n\|\,\|(I+A_n)^{r-1} - (I+A)^{r-1}\|_p + \|A_n-A\|_p\|(I+A)^{r-1}\|.$$
Thus this case follows from the case $0 \leq r \leq 1$ by induction.

(iii) We now prove that
$$\lim_{n\to\infty}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = 0. \tag{A.35}$$
We have $\forall n \geq N = N(\epsilon)$,
$$\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = \|(I+A_n)^{-1}[(I+A_n)-(I+A)](I+A)^{-1}\|_p \leq \|(I+A_n)^{-1}\|\,\|A_n-A\|_p\|(I+A)^{-1}\| \leq \frac{1}{M_A(M_A-\epsilon)}\|A_n-A\|_p,$$
which implies that $\lim_{n\to\infty}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = 0$.

(iv) We next prove that
$$\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0, \quad 0 < r \leq 1. \tag{A.36}$$
We have
$$(I+A)^{-1} \geq \frac{1}{\max\{(1+\lambda_k(A)) : k \in \mathbb{N}\}} = \frac{1}{\|I+A\|} > 0.$$
From the limit $\lim_{n\to\infty}\|A_n-A\| = 0$, it follows that for any $\epsilon$ satisfying $0 < \epsilon < \|I+A\|$, there exists $M = M(\epsilon) \in \mathbb{N}$ such that $\forall n \geq M$,
$$\|I+A\| - \epsilon \leq \|I+A_n\| \leq \|I+A\| + \epsilon.$$
It follows that $\forall n \geq M$,
$$(I+A_n)^{-1} \geq \frac{1}{\max\{(1+\lambda_k(A_n)) : k \in \mathbb{N}\}} = \frac{1}{\|I+A_n\|} \geq \frac{1}{\|I+A\|+\epsilon}.$$
Hence invoking Eq. (A.34) again, we obtain $\forall n \geq M$
$$\|(I+A_n)^{-r} - (I+A)^{-r}\|_p \leq r(\|I+A\|+\epsilon)^{1-r}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p,$$
which implies that $\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0$ by the previous limit, which is the case $r = 1$.

(v) By an induction argument as in step (ii), we then obtain that
$$\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0, \quad \forall r > 1. \tag{A.37}$$
This completes the proof.

Lemma 18.
Let $\mathcal{H}$ be a separable Hilbert space. Assume that $\{A_n\}_{n \in \mathbb{N}}$, $A$ are trace class operators on $\mathcal{H}$ such that $(I + A) > 0$, $(I + A_n) > 0$ $\forall n \in \mathbb{N}$. Assume that $\|A_n - A\|_{\mathrm{tr}} \to 0$ as $n \to \infty$. Then $A_n (I + A_n)^{-1}$ and $A (I + A)^{-1}$ are trace class operators and

$$\lim_{n \to \infty} \|A_n (I + A_n)^{-1} - A (I + A)^{-1}\|_{\mathrm{tr}} = 0. \tag{A.38}$$

Proof of Lemma 18. Since $A_n$ and $A$ are trace class operators and $(I + A_n)^{-1}$, $(I + A)^{-1}$ are bounded, both $A_n (I + A_n)^{-1}$ and $A (I + A)^{-1}$ are trace class operators. We have

$$\|A_n (I + A_n)^{-1} - A (I + A)^{-1}\|_{\mathrm{tr}} = \|(I + A_n)^{-1} A_n - A (I + A)^{-1}\|_{\mathrm{tr}}$$
$$= \|(I + A_n)^{-1} [A_n (I + A) - (I + A_n) A] (I + A)^{-1}\|_{\mathrm{tr}} = \|(I + A_n)^{-1} [A_n - A] (I + A)^{-1}\|_{\mathrm{tr}}$$
$$\leq \|(I + A_n)^{-1}\| \, \|A_n - A\|_{\mathrm{tr}} \, \|(I + A)^{-1}\|.$$

By the assumption $I + A > 0$, there exists $M_A > 0$ such that

$$\langle x, (I + A)x \rangle \geq M_A \|x\|^2 \quad \forall x \in \mathcal{H}.$$

By the assumption $\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0$, for any $\epsilon$ satisfying $0 < \epsilon < M_A$, there exists $N = N(\epsilon) \in \mathbb{N}$ such that $\|A_n - A\|_{\mathrm{tr}} < \epsilon$ $\forall n \geq N$. Then $\forall x \in \mathcal{H}$,

$$|\langle x, (A_n - A)x \rangle| \leq \|A_n - A\| \, \|x\|^2 \leq \|A_n - A\|_{\mathrm{tr}} \, \|x\|^2 \leq \epsilon \|x\|^2.$$

It thus follows that $\forall x \in \mathcal{H}$,

$$\langle x, (I + A_n)x \rangle = \langle x, (I + A)x \rangle + \langle x, (A_n - A)x \rangle \geq (M_A - \epsilon) \|x\|^2.$$

Thus we have $I + A \geq M_A > 0$, $I + A_n \geq M_A - \epsilon > 0$ $\forall n \geq N = N(\epsilon)$, from which it follows that

$$\|(I + A_n)^{-1}\| \leq \frac{1}{M_A - \epsilon} \quad \forall n \geq N(\epsilon), \qquad \|(I + A)^{-1}\| \leq \frac{1}{M_A}.$$

Combining this with the first inequality, we have

$$\|A_n (I + A_n)^{-1} - A (I + A)^{-1}\|_{\mathrm{tr}} \leq \frac{1}{M_A (M_A - \epsilon)} \|A_n - A\|_{\mathrm{tr}} \quad \forall n \geq N,$$

which implies that $\lim_{n \to \infty} \|A_n (I + A_n)^{-1} - A (I + A)^{-1}\|_{\mathrm{tr}} = 0$. This completes the proof.

Lemma 19. Let $\mathcal{H}$ be a separable Hilbert space. Let $\{A_n\}_{n \in \mathbb{N}}$, $A$, $\{B_n\}_{n \in \mathbb{N}}$, $B$ be self-adjoint, trace class operators on $\mathcal{H}$, with $\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0$, $\lim_{n \to \infty} \|B_n - B\|_{\mathrm{tr}} = 0$. Assume that $I + A > 0$, $I + B > 0$, $I + A_n > 0$, $I + B_n > 0$ $\forall n \in \mathbb{N}$.
Then $(I + B_n)^{-1/2} (I + A_n) (I + B_n)^{-1/2} - I$ and $(I + B)^{-1/2} (I + A) (I + B)^{-1/2} - I$ are self-adjoint, trace class operators on $\mathcal{H}$ and

$$\lim_{n \to \infty} \|(I + B_n)^{-1/2} (I + A_n) (I + B_n)^{-1/2} - (I + B)^{-1/2} (I + A) (I + B)^{-1/2}\|_{\mathrm{tr}} = 0. \tag{A.39}$$

Proof of Lemma 19. Using $(I + B)^{-1} = I - B (I + B)^{-1}$, we write

$$(I + B_n)^{-1/2} (I + A_n) (I + B_n)^{-1/2} = I - B_n (I + B_n)^{-1} + (I + B_n)^{-1/2} A_n (I + B_n)^{-1/2},$$
$$(I + B)^{-1/2} (I + A) (I + B)^{-1/2} = I - B (I + B)^{-1} + (I + B)^{-1/2} A (I + B)^{-1/2}.$$

It follows immediately that $[(I + B_n)^{-1/2} (I + A_n) (I + B_n)^{-1/2} - I]$ and $[(I + B)^{-1/2} (I + A) (I + B)^{-1/2} - I]$ are self-adjoint, trace class operators on $\mathcal{H}$. By Lemma 18, we have

$$\lim_{n \to \infty} \|B_n (I + B_n)^{-1} - B (I + B)^{-1}\|_{\mathrm{tr}} = 0.$$

Consider next the difference between the third terms of the above two expressions:

$$\|(I + B_n)^{-1/2} A_n (I + B_n)^{-1/2} - (I + B)^{-1/2} A (I + B)^{-1/2}\|_{\mathrm{tr}}$$
$$\leq \|(I + B_n)^{-1/2} A_n (I + B_n)^{-1/2} - (I + B_n)^{-1/2} A (I + B_n)^{-1/2}\|_{\mathrm{tr}}$$
$$+ \|(I + B_n)^{-1/2} A (I + B_n)^{-1/2} - (I + B_n)^{-1/2} A (I + B)^{-1/2}\|_{\mathrm{tr}}$$
$$+ \|(I + B_n)^{-1/2} A (I + B)^{-1/2} - (I + B)^{-1/2} A (I + B)^{-1/2}\|_{\mathrm{tr}}. \tag{A.40}$$

By the assumption $I + A > 0$, $I + B > 0$, there exist constants $M_A > 0$, $M_B > 0$ such that $I + A \geq M_A$, $I + B \geq M_B$. As in the proof of Lemma 18, since $\lim_{n \to \infty} \|A_n - A\| = 0$, $\lim_{n \to \infty} \|B_n - B\| = 0$, for any $0 < \epsilon < \min\{M_A, M_B\}$, there exist $N_A = N_A(\epsilon) \in \mathbb{N}$, $N_B = N_B(\epsilon) \in \mathbb{N}$, such that

$$I + A_n \geq M_A - \epsilon \quad \forall n \geq N_A, \qquad I + B_n \geq M_B - \epsilon \quad \forall n \geq N_B.$$

The first term on the right hand side of the inequality in Eq. (A.40) is

$$\|(I + B_n)^{-1/2} (A_n - A) (I + B_n)^{-1/2}\|_{\mathrm{tr}} \leq \|A_n - A\|_{\mathrm{tr}} \, \|(I + B_n)^{-1/2}\|^2 \leq \frac{1}{M_B - \epsilon} \|A_n - A\|_{\mathrm{tr}} \quad \forall n \geq N_B.$$
The second term is

$$\|(I + B_n)^{-1/2} A [(I + B_n)^{-1/2} - (I + B)^{-1/2}]\|_{\mathrm{tr}} \leq \|(I + B_n)^{-1/2}\| \, \|A\|_{\mathrm{tr}} \, \|(I + B_n)^{-1/2} - (I + B)^{-1/2}\|$$
$$\leq \frac{1}{\sqrt{M_B - \epsilon}} \|A\|_{\mathrm{tr}} \, \|(I + B_n)^{-1/2} - (I + B)^{-1/2}\|.$$

Similarly, for the third term, we have

$$\|(I + B_n)^{-1/2} A (I + B)^{-1/2} - (I + B)^{-1/2} A (I + B)^{-1/2}\|_{\mathrm{tr}} \leq \|A (I + B)^{-1/2}\|_{\mathrm{tr}} \, \|(I + B_n)^{-1/2} - (I + B)^{-1/2}\|.$$

By Theorem 30, we have

$$\|(I + B_n)^{-1/2} - (I + B)^{-1/2}\| \leq \|(I + B_n)^{-1/2} - (I + B)^{-1/2}\|_{\mathrm{tr}} \to 0$$

as $n \to \infty$. The final result is obtained by combining all of the above inequalities.

Lemma 20. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint, finite-rank operators such that $(I + A) > 0$, $(I + B) > 0$, $(I + C) > 0$. Then

$$\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + B)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + C)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + C), (I + B)]}. \tag{A.41}$$

Proof of Lemma 20. Since $A, B, C$ are all finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$ for some $n \in \mathbb{N}$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$, and $\mathrm{range}(C) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$, $C_n = C|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n, C_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We have

$$(I + A_n)(I + B_n)^{-1} = (I + A_n)[I - B_n (I + B_n)^{-1}] = I + A_n - B_n (I + B_n)^{-1} - A_n B_n (I + B_n)^{-1},$$
$$(I + A)(I + B)^{-1} = I + A - B (I + B)^{-1} - A B (I + B)^{-1},$$

where $A - B (I + B)^{-1} - A B (I + B)^{-1}$ is of finite rank, since both $A$ and $B$ are, with range in $\mathcal{H}_n$. It is clear that

$$[A - B (I + B)^{-1} - A B (I + B)^{-1}]|_{\mathcal{H}_n} = A_n - B_n (I + B_n)^{-1} - A_n B_n (I + B_n)^{-1}.$$

Thus the nonzero eigenvalues of

$$(I + A)(I + B)^{-1} - I = [A - B (I + B)^{-1} - A B (I + B)^{-1}]$$

and

$$(I + A_n)(I + B_n)^{-1} - I = [A_n - B_n (I + B_n)^{-1} - A_n B_n (I + B_n)^{-1}]$$

are the same. It follows that

$$D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + B)] = \frac{1}{\alpha^2} \log\det\left(\frac{[(I + A)(I + B)^{-1}]^{\alpha} + [(I + A)(I + B)^{-1}]^{-\alpha}}{2}\right)$$
$$= \frac{1}{\alpha^2} \log\det\left(\frac{[(I + A_n)(I + B_n)^{-1}]^{\alpha} + [(I + A_n)(I + B_n)^{-1}]^{-\alpha}}{2}\right) = D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)].$$

Similarly, we have

$$D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + C)] = D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + C_n)],$$
$$D^{(\alpha,\alpha)}_{2\alpha}[(I + C), (I + B)] = D^{(\alpha,\alpha)}_{2\alpha}[(I + C_n), (I + B_n)].$$

Applying the triangle inequality from the finite-dimensional setting [15], we get

$$\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + C_n)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + C_n), (I + B_n)]}.$$

Together with the above expressions, this gives us the final result.

Proof of Theorem 19 (Convergence in trace norm). Let $I + \Lambda = (I + B)^{-1/2} (I + A) (I + B)^{-1/2}$ and $I + \Lambda_n = (I + B_n)^{-1/2} (I + A_n) (I + B_n)^{-1/2}$, with $\Lambda, \Lambda_n \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. By Lemma 19, we have

$$\lim_{n \to \infty} \|\Lambda_n - \Lambda\|_{\mathrm{tr}} = 0.$$

Thus by Theorem 30, we have

$$\lim_{n \to \infty} \|(I + \Lambda_n)^{\alpha} - (I + \Lambda)^{\alpha}\|_{\mathrm{tr}} = 0 \quad \forall \alpha \in \mathbb{R}.$$

By Definition 5, we have

$$D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)] = \frac{1}{\alpha^2} \log\det\left(\frac{(I + \Lambda_n)^{\alpha} + (I + \Lambda_n)^{-\alpha}}{2}\right).$$

Taking the limit as $n \to \infty$ and applying the continuity of the Fredholm determinant in the trace norm (e.g. Theorem 3.5 in [22]), we obtain

$$\lim_{n \to \infty} D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)] = \frac{1}{\alpha^2} \log\det\left(\frac{(I + \Lambda)^{\alpha} + (I + \Lambda)^{-\alpha}}{2}\right) = D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + B)].$$

This completes the proof.

Proof of Theorem 20 (Triangle inequality).
For a fixed $\gamma > 0$, we have

$$D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \gamma I)] = \frac{1}{\alpha^2} \log \mathrm{det}_X\left(\frac{[(A + \gamma I)(B + \gamma I)^{-1}]^{\alpha} + [(A + \gamma I)(B + \gamma I)^{-1}]^{-\alpha}}{2}\right)$$
$$= \frac{1}{\alpha^2} \log\det\left(\frac{\left[\left(\frac{A}{\gamma} + I\right)\left(\frac{B}{\gamma} + I\right)^{-1}\right]^{\alpha} + \left[\left(\frac{A}{\gamma} + I\right)\left(\frac{B}{\gamma} + I\right)^{-1}\right]^{-\alpha}}{2}\right),$$

which thus reduces to the case $\gamma = 1$. Thus it suffices for us to prove the triangle inequality for $\gamma = 1$. Let $\{A_n\}_{n \in \mathbb{N}}$, $\{B_n\}_{n \in \mathbb{N}}$, and $\{C_n\}_{n \in \mathbb{N}}$ be sequences of finite-rank operators such that

$$\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0, \quad \lim_{n \to \infty} \|B_n - B\|_{\mathrm{tr}} = 0, \quad \lim_{n \to \infty} \|C_n - C\|_{\mathrm{tr}} = 0.$$

By Lemma 20, we have the triangle inequality

$$\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + C_n)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + C_n), (I + B_n)]}.$$

Taking limits on both sides as $n \to \infty$ and invoking Theorem 19, we then obtain

$$\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + B)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + C)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I + C), (I + B)]}.$$

This completes the proof of the theorem.

Proof of Theorem 4 (Metric property). The case $\alpha = 0$ corresponds to the affine-invariant Riemannian distance on the Hilbert manifold $\Sigma(\mathcal{H})$ [18], which is still a metric when restricted to $\mathrm{PTr}(\mathcal{H})$. Consider the case $\alpha > 0$. The positivity and symmetry of the divergence $D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \gamma I)]$ are from Theorems 1 and 13, respectively. The triangle inequality for $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \gamma I)]}$ is from Theorem 20. Thus $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$.

Proof of Theorem 22 (Diagonalization). Consider first the case $\alpha > 0$. As in the proof of Theorem 20, it suffices for us to prove this theorem for the case $\gamma = 1$. Let $A = \sum_{j=1}^{\infty} \lambda_j(A) \, \phi_j \otimes \phi_j$ denote the spectral decomposition for $A$. For each $n \in \mathbb{N}$, define

$$A_n = \sum_{j=1}^{n} \lambda_j(A) \, \phi_j \otimes \phi_j.$$

Then $A_n$ is a finite-rank operator whose eigenvalues are the first $n$ eigenvalues of $A$, and $\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0$.
In the same way, we construct a sequence of finite-rank operators $B_n$ with $\lim_{n \to \infty} \|B_n - B\|_{\mathrm{tr}} = 0$. By construction, we also have

$$\lim_{n \to \infty} \|\mathrm{Eig}(A_n) - \mathrm{Eig}(A)\|_{\mathrm{tr}} = 0, \quad \lim_{n \to \infty} \|\mathrm{Eig}(B_n) - \mathrm{Eig}(B)\|_{\mathrm{tr}} = 0.$$

Thus by Theorem 19, we have

$$\lim_{n \to \infty} D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A_n) + I), (\mathrm{Eig}(B_n) + I)] = D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)],$$
$$\lim_{n \to \infty} D^{(\alpha,\alpha)}_{2\alpha}[(A_n + I), (B_n + I)] = D^{(\alpha,\alpha)}_{2\alpha}[(A + I), (B + I)].$$

Since $A_n, B_n$ can be identified with finite-dimensional matrices, as in the proof of Lemma 16, we can apply the corresponding finite-dimensional result in [15] to obtain

$$D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A_n) + I), (\mathrm{Eig}(B_n) + I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A_n + I), (B_n + I)].$$

Thus taking limits as $n \to \infty$ gives

$$D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A + I), (B + I)].$$

Letting $\alpha \to 0$ on both sides of the above expression, we also obtain the result for the case $\alpha = 0$. This completes the proof of the theorem.

References

[1] G. Mostow, Some new decomposition theorems for semi-simple groups, Memoirs of the American Mathematical Society 14 (1955) 31–54.
[2] J. D. Lawson, Y. Lim, The geometric mean, matrices, metrics, and more, The American Mathematical Monthly 108 (9) (2001) 797–812.
[3] R. Bhatia, Positive Definite Matrices, Princeton University Press, 2007.
[4] V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. on Matrix An. and App. 29 (1) (2007) 328–347.
[5] X. Pennec, P. Fillard, N. Ayache, A Riemannian framework for tensor computing, International Journal of Computer Vision 66 (1) (2006) 41–66.
[6] O. Tuzel, F. Porikli, P. Meer, Pedestrian detection via classification on Riemannian manifolds, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10) (2008) 1713–1727.
[7] B. Kulis, M. A. Sustik, I. S. Dhillon, Low-rank kernel learning with Bregman matrix divergences, The Journal of Machine Learning Research 10 (2009) 341–376.
[8] A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos, Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9) (2013) 2161–2174.
[9] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on the Riemannian manifold of symmetric positive definite matrices, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 73–80.
[10] P. Formont, J.-P. Ovarlez, F. Pascal, On the use of matrix information geometry for polarimetric SAR image classification, in: Matrix Information Geometry, Springer, 2013, pp. 257–276.
[11] F. Barbaresco, Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Frechet median, in: Matrix Information Geometry, Springer, 2013, pp. 199–255.
[12] D. A. Bini, B. Iannazzo, Computing the Karcher mean of symmetric positive definite matrices, Linear Algebra and its Applications 438 (4) (2013) 1700–1710.
[13] P. Li, Q. Wang, W. Zuo, L. Zhang, Log-Euclidean kernels for sparse representation and dictionary learning, in: International Conference on Computer Vision (ICCV), 2013, pp. 1601–1608.
[14] Z. Chebbi, M. Moakher, Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function, Linear Algebra and its Applications 436 (7) (2012) 1872–1889.
[15] A. Cichocki, S. Cruces, S. Amari, Log-Determinant divergences revisited: Alpha-Beta and Gamma Log-Det divergences, Entropy 17 (5) (2015) 2988–3034.
[16] S. Sra, A new metric on the manifold of kernel matrices with application to matrix geometric means, in: Advances in Neural Information Processing Systems (NIPS), 2012, pp. 144–152.
[17] H. Minh, Infinite-dimensional Log-Determinant divergences between positive definite trace class operators, Linear Algebra and Its Applications (In Press) (2016) http://dx.doi.org/10.1016/j.laa.2016.09.018.
[18] G. Larotonda, Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators, Differential Geometry and its Applications 25 (2007) 679–700.
[19] H. Minh, M. San Biagio, V. Murino, Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, in: Advances in Neural Information Processing Systems (NIPS), 2014, pp. 388–396.
[20] H. Q. Minh, Affine-invariant Riemannian distance between infinite-dimensional covariance operators, in: Geometric Science of Information, 2015, pp. 30–38.
[21] K. Fan, On a theorem of Weyl concerning eigenvalues of linear transformations: II, Proceedings of the National Academy of Sciences of the United States of America 36 (1) (1950) 31.
[22] B. Simon, Notes on infinite determinants of Hilbert space operators, Advances in Mathematics 24 (1977) 244–273.
[23] R. Bhatia, Matrix analysis, Vol. 169, Springer Science & Business Media, 2013.
[24] F. Kittaneh, H. Kosaki, Inequalities for the Schatten p-norm V, Publications of the Research Institute for Mathematical Sciences 23 (2) (1987) 433–443.
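
Numerical illustration. The finite-dimensional triangle inequality invoked in the proof of Lemma 20 can be spot-checked on random matrices. The following is a minimal sketch in Python with NumPy and is not part of the original proofs. It evaluates the formula $\frac{1}{\alpha^2}\log\det\big(\tfrac{1}{2}([(I+A)(I+B)^{-1}]^{\alpha} + [(I+A)(I+B)^{-1}]^{-\alpha})\big)$ used in that proof through the eigenvalues of $(I+B)^{-1/2}(I+A)(I+B)^{-1/2}$. The function names `alpha_logdet_divergence` and `random_unitized_spd`, the matrix size, the random seed, and the choice $\alpha = 1/2$ are all assumptions made for illustration.

```python
import numpy as np


def alpha_logdet_divergence(P, Q, alpha):
    """(1/alpha^2) log det(([P Q^-1]^alpha + [P Q^-1]^-alpha) / 2) for SPD P, Q, alpha > 0."""
    # P Q^{-1} has the same eigenvalues as the symmetric matrix Q^{-1/2} P Q^{-1/2}.
    w, V = np.linalg.eigh(Q)
    Q_inv_sqrt = (V / np.sqrt(w)) @ V.T
    lam = np.linalg.eigvalsh(Q_inv_sqrt @ P @ Q_inv_sqrt)
    # log((lam^alpha + lam^-alpha) / 2) = log cosh(alpha * log lam), summed over eigenvalues.
    t = alpha * np.log(lam)
    total = np.sum(np.logaddexp(t, -t) - np.log(2.0))
    # Clip tiny negative round-off so that the square root below is well defined.
    return max(float(total), 0.0) / alpha**2


def random_unitized_spd(n, rng):
    """Hypothetical helper: returns I + A with A symmetric positive semi-definite."""
    X = rng.standard_normal((n, n))
    return np.eye(n) + X @ X.T / n


def root_divergence(P, Q, alpha):
    """Square root of the divergence, the quantity appearing in the triangle inequality."""
    return np.sqrt(alpha_logdet_divergence(P, Q, alpha))


rng = np.random.default_rng(0)
alpha = 0.5  # illustrative choice; any alpha > 0 can be tried
P, Q, R = (random_unitized_spd(5, rng) for _ in range(3))

lhs = root_divergence(P, Q, alpha)
rhs = root_divergence(P, R, alpha) + root_divergence(R, Q, alpha)
print("sqrt D[P, Q]               =", lhs)
print("sqrt D[P, R] + sqrt D[R, Q] =", rhs)
```

A check of this kind proves nothing; it only illustrates, on one random triple, the inequality that the proof imports from the finite-dimensional setting [15].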