Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences

Hà Quang Minh
Istituto Italiano di Tecnologia, Via Morego 30, Genova 16163, ITALY
Email address: minh.haquang@iit.it (Hà Quang Minh)

Abstract

This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed form formulas via the corresponding Gram matrices.

Keywords: infinite-dimensional Log-Determinant divergences, Alpha divergences, Alpha-Beta divergences, affine-invariant Riemannian distance, Stein divergence, positive definite operators, trace class operators, extended trace, extended Fredholm determinant, Reproducing kernel Hilbert spaces, covariance operators

2010 MSC: 47B65, 47L07, 46E22, 15A15

Preprint submitted to Journal of LaTeX Templates, January 14, 2017

1. Introduction

Symmetric Positive Definite (SPD) matrices play an important role in many areas of mathematics, statistics, machine learning, optimization, computer vision, and related fields, see e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
The set $\mathrm{Sym}^{++}(n)$ of $n \times n$ SPD matrices is an open convex cone and can also be equipped with a Riemannian manifold structure. Among the most studied Riemannian metrics on $\mathrm{Sym}^{++}(n)$ are the classical affine-invariant metric [1, 2, 3, 5, 12] and the more recent Log-Euclidean metric [4, 9, 13]. The convex cone structure of $\mathrm{Sym}^{++}(n)$, on the other hand, gives rise to distance-like functions such as the Alpha Log-Determinant divergences [14], which have been shown to be special cases of the Alpha-Beta Log-Determinant divergences [15]. These divergences are fast to compute and have been shown to work well in various applications [7, 16, 8]. The present work aims to generalize the Alpha-Beta Log-Determinant divergences to the infinite-dimensional setting.

Finite-dimensional Alpha-Beta Log-Determinant divergences. We recall that for $A, B \in \mathrm{Sym}^{++}(n)$, the Alpha-Beta Log-Determinant (Log-Det) divergence between $A$ and $B$ is a parametrized family of divergences defined by (see [15])

$$D^{(\alpha,\beta)}(A, B) = \frac{1}{\alpha\beta} \log\det\left[\frac{\alpha (AB^{-1})^{\beta} + \beta (AB^{-1})^{-\alpha}}{\alpha + \beta}\right], \quad \alpha \neq 0, \; \beta \neq 0, \; \alpha + \beta \neq 0. \tag{1}$$

Remark 1. To keep our presentation compact, in the following we consider the case $\alpha > 0$, $\beta > 0$, as well as the limiting cases $\alpha = 0$, $\beta = 0$. Since $D^{(\alpha,\beta)}(A, B) = D^{(-\alpha,-\beta)}(B, A)$, the case $\alpha < 0$, $\beta < 0$ is essentially identical to the previous case. We do not consider the cases where $\alpha$, $\beta$ have opposite signs, since in those cases the well-definedness and finiteness of $D^{(\alpha,\beta)}(A, B)$ depend on the spectrum of $AB^{-1}$ (see Theorem 2 in [15]); that is, it is not a valid divergence on all of $\mathrm{Sym}^{++}(n)$.

The parametrized family of divergences defined by Eq. (1) is highly general and admits as special cases many metrics and distance-like functions on $\mathrm{Sym}^{++}(n)$, including in particular the following:

1. The affine-invariant Riemannian distance [3], corresponding to the limiting case $D^{(0,0)}(A, B)$, with

$$D^{(0,0)}(A, B) = \frac{1}{2} d^2_{\mathrm{aiE}}(A, B) = \frac{1}{2} \|\log(B^{-1/2} A B^{-1/2})\|^2_F, \tag{2}$$

where $\log(A)$ denotes the principal logarithm of the matrix $A$ and $\|\cdot\|_F$ denotes the Frobenius norm.

2. The Alpha Log-Determinant divergences [14], corresponding to $D^{(\alpha,1-\alpha)}(A, B)$, $0 < \alpha < 1$, with

$$D^{(\alpha,1-\alpha)}(A, B) = \frac{1}{\alpha(1-\alpha)} \log\left[\frac{\det[\alpha A + (1-\alpha) B]}{\det(A)^{\alpha} \det(B)^{1-\alpha}}\right]. \tag{3}$$

A special case of this divergence is the symmetric Stein divergence (also called the Jensen-Bregman LogDet divergence), corresponding to $D^{(1/2,1/2)}(A, B)$, whose square root is a metric on $\mathrm{Sym}^{++}(n)$ [16], with

$$D^{(1/2,1/2)}(A, B) = 4 d^2_{\mathrm{stein}}(A, B) = 4 \log\frac{\det\left(\frac{A+B}{2}\right)}{\sqrt{\det(A)\det(B)}}. \tag{4}$$

3. The limiting cases $\beta = 0$ and $\alpha = 0$ correspond to, respectively,

$$D^{(\alpha,0)}(A, B) = \frac{1}{\alpha^2}\left[\mathrm{tr}((A^{-1}B)^{\alpha} - I) - \alpha \log\det(A^{-1}B)\right], \tag{5}$$

$$D^{(0,\beta)}(A, B) = \frac{1}{\beta^2}\left[\mathrm{tr}((B^{-1}A)^{\beta} - I) - \beta \log\det(B^{-1}A)\right], \tag{6}$$

with $D^{(1,0)}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$ and $D^{(0,1)}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$.

Contributions of this work. The current work is a continuation and generalization of the author's recent work [17]. In [17], we generalized the Alpha Log-Det divergences between SPD matrices [14] to the infinite-dimensional Alpha Log-Determinant divergences between positive definite unitized trace class operators in a Hilbert space. In the current work, we present a formulation for the Alpha-Beta Log-Det divergences between positive definite unitized trace class operators, generalizing the Alpha-Beta divergences between SPD matrices as defined by Eq. (1).
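As a sanity check on Eq. (1) and the special cases listed above, the following minimal Python sketch evaluates the finite-dimensional divergence from the eigenvalues of $AB^{-1}$. It relies on the standard fact that $AB^{-1}$ is similar to the SPD matrix $B^{-1/2}AB^{-1/2}$, so the determinant in Eq. (1) is a product over its positive eigenvalues $\lambda_i$. The function names and the diagonal test matrices are illustrative assumptions and do not come from [15].

```python
import math

def ab_logdet_divergence(eigs, alpha, beta):
    """Eq. (1) evaluated from the eigenvalues of A B^{-1}.

    `eigs` is assumed to hold the strictly positive eigenvalues of
    A B^{-1}; computing them from A and B is left to the caller.
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("Eq. (1) requires alpha, beta, alpha + beta != 0")
    total = 0.0
    for lam in eigs:
        total += math.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))
    return total / (alpha * beta)

def stein_divergence_diag(a, b):
    """Right-hand side of Eq. (4) for diagonal A = diag(a), B = diag(b)."""
    return 4.0 * sum(math.log(0.5 * (x + y) / math.sqrt(x * y)) for x, y in zip(a, b))

# Hypothetical diagonal example: A = diag(2, 0.5, 3), B = diag(1, 4, 3).
a = [2.0, 0.5, 3.0]
b = [1.0, 4.0, 3.0]
eigs = [x / y for x, y in zip(a, b)]  # eigenvalues of A B^{-1} in the diagonal case

# alpha = beta = 1/2 should reproduce the symmetric Stein form of Eq. (4).
d_half = ab_logdet_divergence(eigs, 0.5, 0.5)

# Remark 1: D^{(alpha,beta)}(A, B) = D^{(-alpha,-beta)}(B, A);
# the eigenvalues of B A^{-1} are the reciprocals 1 / lambda_i.
d_direct = ab_logdet_divergence(eigs, 0.3, 0.7)
d_swapped = ab_logdet_divergence([1.0 / lam for lam in eigs], -0.3, -0.7)
```

Working with the spectrum keeps the sketch free of matrix libraries; for general non-diagonal $A$, $B$ one would first compute the eigenvalues of $B^{-1/2}AB^{-1/2}$ with a numerical linear algebra routine.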
As in the finite-dimensional setting, the formulation we present here is general and admits as special cases many metrics and distance-like functions between positive definite unitized trace class operators, including in particular the following: the infinite-dimensional affine-invariant Riemannian distance [18]; the infinite-dimensional Alpha Log-Det divergences [17], a special case of which is the infinite-dimensional symmetric Stein divergence. For the divergences between reproducing kernel Hilbert space (RKHS) covariance operators, we obtain closed form formulas for the Alpha-Beta Log-Det divergences via the corresponding Gram matrices.

Organization. We provide a summary of the main results of the paper in Section 2, including our definition of the infinite-dimensional Alpha-Beta Log-Det divergences. The key concepts involved are described in Section 3. The motivations and derivations leading to our definition of the Alpha-Beta Log-Det divergences are presented in Section 4. We then show in Section 5 that both the affine-invariant Riemannian distance and the Alpha Log-Det divergences are special cases of the Alpha-Beta Log-Det divergences. All mathematical proofs are presented in Appendix A.

2. Summary of main results

We present a summary of our main results in this section, with the detailed technical descriptions provided in subsequent sections. Throughout the paper, let $\mathcal{H}$ denote a separable Hilbert space, with $\dim(\mathcal{H}) = \infty$, unless explicitly stated otherwise. Let $\mathcal{L}(\mathcal{H})$ be the Banach space of bounded linear operators on $\mathcal{H}$ and $\mathrm{Sym}(\mathcal{H}) \subset \mathcal{L}(\mathcal{H})$ be the subspace of self-adjoint, bounded operators on $\mathcal{H}$. For $A \in \mathcal{L}(\mathcal{H})$, we write $A > 0$ to denote that $A$ is a self-adjoint positive definite operator. Let $\mathrm{Tr}(\mathcal{H})$ denote the Banach algebra of trace class operators on $\mathcal{H}$. The set of positive definite unitized trace class operators on $\mathcal{H}$ is then defined to be

$$\mathrm{PTr}(\mathcal{H}) = \{A + \gamma I > 0 : A = A^{*}, \; A \in \mathrm{Tr}(\mathcal{H}), \; \gamma \in \mathbb{R}\}. \tag{7}$$

The main purpose of the current work is the generalization of the Alpha-Beta Log-Det divergence between SPD matrices, as defined in Eq. (1), to that between positive definite unitized trace class operators in $\mathrm{PTr}(\mathcal{H})$. The following is our definition of the Alpha-Beta (Log-Det) divergences in the infinite-dimensional setting.

Definition 1 (Alpha-Beta Log-Determinant Divergences). Assume that $\dim(\mathcal{H}) = \infty$. Let $\alpha > 0$, $\beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, the $(\alpha, \beta)$-Log-Det divergence $D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)]$ is defined to be

$$D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)] = \frac{1}{\alpha\beta} \log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)} \mathrm{det}_X\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + \beta\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}}{\alpha + \beta}\right)\right], \tag{8}$$

where $\Lambda + \frac{\gamma}{\mu} I = (B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}$ and $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$. Equivalently,

$$D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)] = \frac{1}{\alpha\beta} \log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)} \mathrm{det}_X\left(\frac{\alpha\left(Z + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + \beta\left(Z + \frac{\gamma}{\mu} I\right)^{-r\delta}}{\alpha + \beta}\right)\right], \tag{9}$$

where $Z + \frac{\gamma}{\mu} I = (A + \gamma I)(B + \mu I)^{-1}$.

Remark 2. In Definition 1, $\mathrm{det}_X$ denotes the extended Fredholm determinant defined in [17] (see Section 3 below). For $\gamma = 1$, we have $\mathrm{det}_X(A + \gamma I) = \det(A + I)$, with $\det$ on the right hand side being the Fredholm determinant. For $\dim(\mathcal{H}) < \infty$, $\mathrm{det}_X(A + \gamma I) = \det(A + \gamma I)$, with $\det$ on the right hand side being the standard matrix determinant.

The quantity $D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)]$, where $\alpha > 0$, $\beta > 0$, as stated in Definition 1, can be extended to the cases $\alpha > 0$, $\beta = 0$ and $\alpha = 0$, $\beta > 0$, for all $r \in \mathbb{R}$, $r \neq 0$, via limiting arguments. The following is our definition in these cases.

Definition 2 (Limiting cases - I). Assume that $\dim(\mathcal{H}) = \infty$. Let $\alpha > 0$, $\beta > 0$, $r \neq 0$ be fixed.
For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, the Log-Det divergence $D^{(\alpha,0)}_r[(A + \gamma I), (B + \mu I)]$ is defined to be

$$D^{(\alpha,0)}_r[(A + \gamma I), (B + \mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r} - 1\right] \log\frac{\mu}{\gamma} + \frac{1}{\alpha^2} \mathrm{tr}_X\left([(A + \gamma I)^{-1}(B + \mu I)]^{r} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r} \log \mathrm{det}_X[(A + \gamma I)^{-1}(B + \mu I)]^{r}. \tag{10}$$

Similarly, $D^{(0,\beta)}_r[(A + \gamma I), (B + \mu I)]$ is defined to be

$$D^{(0,\beta)}_r[(A + \gamma I), (B + \mu I)] = \frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{r} - 1\right] \log\frac{\gamma}{\mu} + \frac{1}{\beta^2} \mathrm{tr}_X\left([(B + \mu I)^{-1}(A + \gamma I)]^{r} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{r} \log \mathrm{det}_X[(B + \mu I)^{-1}(A + \gamma I)]^{r}. \tag{11}$$

The following result confirms that the quantity $D^{(\alpha,\beta)}_r$, as defined in Definitions 1 and 2, is in fact a divergence on $\mathrm{PTr}(\mathcal{H})$.

Theorem 1 (Positivity). Assume the hypothesis stated in Definitions 1 and 2. Then

$$D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)] \geq 0, \tag{12}$$

$$D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)] = 0 \iff A = B, \; \gamma = \mu. \tag{13}$$

Theorem 2 (Special cases - I). The following are some of the most important special cases of Definitions 1 and 2.

1. The infinite-dimensional affine-invariant Riemannian distance $d_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]$ [18], which corresponds to the limiting case $\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)]$, where $r = r(\alpha)$ is smooth, with $r(0) = 0$, $r'(0) \neq 0$, and $r(\alpha) \neq 0$ for $\alpha \neq 0$. The limit is given by

$$\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)] = \frac{[r'(0)]^2}{8} d^2_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]. \tag{14}$$

In particular, for $r = 2\alpha$,

$$\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]. \tag{15}$$

This is the content of Theorem 9.

2. The infinite-dimensional Alpha Log-Determinant divergences $d^{\alpha}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]$ [17], with

$$D^{(\alpha,1-\alpha)}_{\pm 1}[(A + \gamma I), (B + \mu I)] = d^{\pm(1-2\alpha)}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)], \quad 0 \leq \alpha \leq 1. \tag{16}$$

This is the content of Theorem 10.
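The first limit in Theorem 2 can be checked numerically in the simplest setting. The sketch below assumes the finite-dimensional case with $\gamma = \mu$, where $\delta = 1/2$ and Eq. (8) with $\beta = \alpha$, $r = 2\alpha$ depends only on the eigenvalues $\lambda_i$ of $AB^{-1}$, reducing to $\frac{1}{\alpha^2}\sum_i \log\frac{\lambda_i^{\alpha} + \lambda_i^{-\alpha}}{2}$. The spectrum used is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math

def symmetric_divergence(eigs, alpha):
    """D^{(alpha,alpha)}_{2 alpha} for gamma = mu in finite dimensions.

    `eigs` are the positive eigenvalues of A B^{-1} (assumed given).
    """
    return sum(math.log(0.5 * (lam**alpha + lam**(-alpha))) for lam in eigs) / alpha**2

def half_squared_affine_invariant(eigs):
    """(1/2) d_aiE^2 = (1/2) * sum_i (log lambda_i)^2, as in Eq. (2)."""
    return 0.5 * sum(math.log(lam) ** 2 for lam in eigs)

eigs = [0.25, 1.5, 3.0]  # illustrative spectrum of A B^{-1}
target = half_squared_affine_invariant(eigs)

# The gap to the limit should shrink as alpha decreases.
gaps = [abs(symmetric_divergence(eigs, a) - target) for a in (1e-1, 1e-2, 1e-3)]
for a, gap in zip((1e-1, 1e-2, 1e-3), gaps):
    print(f"alpha = {a:g}: |D - 0.5 * d_ai^2| = {gap:.3e}")
```

Since $\log\cosh(x) = x^2/2 + O(x^4)$, the gap is expected to scale roughly like $\alpha^2$, which is what the printed values should show.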
Since the limit $\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)]$ in the first part of Theorem 2 is unique, up to the multiplicative factor $[r'(0)]^2/8$, we define the quantity $D^{(0,0)}_0[(A + \gamma I), (B + \mu I)]$ as follows.

Definition 3 (Limiting cases - II). For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, the Log-Det divergence $D^{(0,0)}_0[(A + \gamma I), (B + \mu I)]$ is defined to be

$$D^{(0,0)}_0[(A + \gamma I), (B + \mu I)] = \lim_{\alpha \to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]. \tag{17}$$

Since $d_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]$ is a metric on $\mathrm{PTr}(\mathcal{H})$, $D^{(0,0)}_0[(A + \gamma I), (B + \mu I)]$ is automatically a symmetric divergence on $\mathrm{PTr}(\mathcal{H})$. In fact, it is a member of the parametrized family $D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)]$, $\alpha \geq 0$, of symmetric divergences on $\mathrm{PTr}(\mathcal{H})$, as stated in the following result.

Theorem 3 (Special cases - II). The parametrized family $D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)]$, $\alpha \geq 0$, is a family of symmetric divergences on $\mathrm{PTr}(\mathcal{H})$, with $\alpha = 0$ corresponding to the infinite-dimensional affine-invariant Riemannian distance above and $\alpha = 1/2$ corresponding to the infinite-dimensional symmetric Stein divergence, which is given by $\frac{1}{4} d^{0}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]$.

Finite-dimensional case. For $\gamma = \mu$, we have $\delta = \frac{\alpha}{\alpha+\beta}$, so that Eq. (9) becomes

$$D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \gamma I)] = \frac{1}{\alpha\beta} \log \mathrm{det}_X\left(\frac{\alpha[(A + \gamma I)(B + \gamma I)^{-1}]^{\frac{r\beta}{\alpha+\beta}} + \beta[(A + \gamma I)(B + \gamma I)^{-1}]^{-\frac{r\alpha}{\alpha+\beta}}}{\alpha + \beta}\right). \tag{18}$$

In the finite-dimensional case, where $A$ and $B$ are two $n \times n$ SPD matrices, setting $\gamma = 0$ and recalling that $\mathrm{det}_X = \det$ for finite matrices, we obtain

$$D^{(\alpha,\beta)}_r(A, B) = \frac{1}{\alpha\beta} \log\det\left(\frac{\alpha(AB^{-1})^{\frac{r\beta}{\alpha+\beta}} + \beta(AB^{-1})^{-\frac{r\alpha}{\alpha+\beta}}}{\alpha + \beta}\right). \tag{19}$$

In particular, by setting $r = \alpha + \beta$, we recover Eq. (1). For $\gamma = \mu$, Eq.
(10) becomes

$$D^{(\alpha,0)}_r[(A + \gamma I), (B + \gamma I)] = \frac{1}{\alpha^2}\left[\mathrm{tr}_X\left([(A + \gamma I)^{-1}(B + \gamma I)]^{r} - I\right) - \log \mathrm{det}_X[(A + \gamma I)^{-1}(B + \gamma I)]^{r}\right], \tag{20}$$

which reduces to Eq. (5) when $A, B \in \mathrm{Sym}^{++}(n)$, $\gamma = 0$, and $r = \alpha$. Similarly, Eq. (11) becomes

$$D^{(0,\beta)}_r[(A + \gamma I), (B + \gamma I)] = \frac{1}{\beta^2}\left[\mathrm{tr}_X\left([(B + \gamma I)^{-1}(A + \gamma I)]^{r} - I\right) - \log \mathrm{det}_X[(B + \gamma I)^{-1}(A + \gamma I)]^{r}\right], \tag{21}$$

which reduces to Eq. (6) when $A, B \in \mathrm{Sym}^{++}(n)$, $\gamma = 0$, and $r = \beta$.

Remark 3. As in the cases of the Log-Hilbert-Schmidt distance [19], the infinite-dimensional affine-invariant Riemannian distance [18, 20], and the infinite-dimensional Alpha Log-Det divergences [17], we show below that in general, the infinite-dimensional formulation is not obtainable as the limit of the finite-dimensional version as the dimension approaches infinity.

Remark 4. Except for the case $r = \alpha + \beta$, the quantity $r$ in $D^{(\alpha,\beta)}_r$ that we introduce here has, to the best of our knowledge, no equivalent in the existing literature in the finite-dimensional setting.

Remark 5. Throughout the paper, we employ the following notations. Using the identity $(B + \mu I)^{-1} = \frac{1}{\mu} I - \frac{B}{\mu}(B + \mu I)^{-1}$, we write the operator $(B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}$ as

$$(B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2} = \Lambda + \frac{\gamma}{\mu} I \in \mathrm{PTr}(\mathcal{H}), \tag{22}$$

where $\Lambda = (B + \mu I)^{-1/2} A (B + \mu I)^{-1/2} - \frac{\gamma}{\mu} B (B + \mu I)^{-1} \in \mathrm{Tr}(\mathcal{H})$. This notation is employed in Eq. (8). Similarly, in Eq. (9), we write

$$(A + \gamma I)(B + \mu I)^{-1} = \frac{\gamma}{\mu} I + A(B + \mu I)^{-1} - \frac{\gamma}{\mu} B(B + \mu I)^{-1} = Z + \frac{\gamma}{\mu} I, \tag{23}$$

where $Z = A(B + \mu I)^{-1} - \frac{\gamma}{\mu} B(B + \mu I)^{-1} \in \mathrm{Tr}(\mathcal{H})$.

Metric properties. Consider now a special case, where $\alpha = \beta$ and $r = \alpha + \beta$. For simplicity, we consider operators $(A + \gamma I)$ and $(B + \mu I)$ with $\gamma = \mu$. For $\gamma > 0$, $\gamma \in \mathbb{R}$ fixed, we define the following subset of $\mathrm{PTr}(\mathcal{H})$:

$$\mathrm{PTr}(\mathcal{H})(\gamma) = \{A + \gamma I > 0 : A^{*} = A, \; A \in \mathrm{Tr}(\mathcal{H})\}. \tag{24}$$

Remark 6.
Throughout the paper, we assume, unless stated otherwise, that $\dim(\mathcal{H}) = \infty$, and the condition $A + \gamma I > 0$ automatically implies that $\gamma > 0$. When $\dim(\mathcal{H}) < \infty$, we can set $\gamma = 0$.

Theorem 4 (Metric property). Let $\gamma > 0$, $\gamma \in \mathbb{R}$ be fixed. The square root function $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$ for all $\alpha \geq 0$.

We thus have a family of metrics between positive definite operators of the form $(A + \gamma I) \in \mathrm{PTr}(\mathcal{H})(\gamma)$, parametrized by the parameter $\alpha \geq 0$. In particular, with $\alpha = 0$ in Theorem 4, we obtain the affine-invariant Riemannian distance, and with $\alpha = \frac{1}{2}$ we obtain the following metric, which is the square root of the infinite-dimensional Stein divergence:

$$\sqrt{D^{(1/2,1/2)}_1[(A + \gamma I), (B + \gamma I)]} = 2\sqrt{\log\left[\frac{\mathrm{det}_X\left(\frac{(A + \gamma I) + (B + \gamma I)}{2}\right)}{\mathrm{det}_X(A + \gamma I)^{1/2}\, \mathrm{det}_X(B + \gamma I)^{1/2}}\right]}. \tag{25}$$

The corresponding finite-dimensional result [15], where $A, B \in \mathrm{Sym}^{++}(n)$, is recovered by setting $\gamma = 0$ in Theorem 4. In particular, with $\alpha = 1/2$ and $A, B \in \mathrm{Sym}^{++}(n)$, we obtain the corresponding result of [16].

Remark 7. The analysis of $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)]}$, where $\gamma \neq \mu$, is technically more involved and will be presented in a separate work.

3. Positive definite unitized trace class operators

To generalize the Alpha-Beta Log-Determinant divergences from the finite to the infinite-dimensional setting, we need to employ the following concepts:

• Positive definite operators $\mathbb{P}(\mathcal{H})$.
• Extended (or unitized) trace class operators $\mathrm{Tr}_X(\mathcal{H})$.
• Positive definite unitized trace class operators $\mathrm{PTr}(\mathcal{H})$.
• Extended Fredholm determinant $\mathrm{det}_X$ on $\mathrm{Tr}_X(\mathcal{H})$.
• Exponential, logarithm, and power functions for operators in $\mathrm{PTr}(\mathcal{H})$ and their products.

We discuss in detail below the logarithm and power functions of products of operators in $\mathrm{PTr}(\mathcal{H})$.
Other concepts are briefly reviewed and we refer to [17] for the detailed motivations leading to the definitions of these concepts. Throughout the following, we assume that $\dim(\mathcal{H}) = \infty$, unless stated explicitly otherwise.

Positive definite operators. We recall that an operator $A \in \mathcal{L}(\mathcal{H})$ is said to be positive definite if there exists a constant $M_A > 0$ such that $\langle x, Ax \rangle \geq M_A \|x\|^2$ for all $x \in \mathcal{H}$. This is equivalent to saying that $A$ is both strictly positive and invertible. We denote by $\mathbb{P}(\mathcal{H})$ the set of all positive definite operators on $\mathcal{H}$.

Extended trace class operators. Let $\mathrm{Tr}(\mathcal{H})$ denote the set of trace class operators on $\mathcal{H}$. The set of extended (or unitized) trace class operators on $\mathcal{H}$ is defined to be

$$\mathrm{Tr}_X(\mathcal{H}) = \{A + \gamma I : A \in \mathrm{Tr}(\mathcal{H}), \; \gamma \in \mathbb{R}\}.$$

Equipped with the extended trace class norm

$$\|A + \gamma I\|_{\mathrm{tr}_X} = \|A\|_{\mathrm{tr}} + |\gamma| = \mathrm{tr}|A| + |\gamma|,$$

$\mathrm{Tr}_X(\mathcal{H})$ becomes a Banach algebra. For $(A + \gamma I) \in \mathrm{Tr}_X(\mathcal{H})$, its extended trace is defined to be

$$\mathrm{tr}_X(A + \gamma I) = \mathrm{tr}(A) + \gamma.$$

Thus by this definition $\mathrm{tr}_X(I) = 1$, in contrast to the usual trace definition, according to which $\mathrm{tr}(I) = \infty$.

Extended Fredholm determinant. For $(A + \gamma I) \in \mathrm{Tr}_X(\mathcal{H})$, $\gamma \neq 0$, its extended Fredholm determinant is defined to be

$$\mathrm{det}_X(A + \gamma I) = \gamma \det\left(\frac{A}{\gamma} + I\right),$$

where the determinant on the right hand side is the Fredholm determinant. For $\gamma = 1$, we recover the Fredholm determinant. In the case $\dim(\mathcal{H}) < \infty$, we define $\mathrm{det}_X(A + \gamma I) = \det(A + \gamma I)$, the standard matrix determinant.

Positive definite unitized trace class operators. Having defined both positive definite operators and extended trace class operators, the set of positive definite unitized trace class operators $\mathrm{PTr}(\mathcal{H}) \subset \mathrm{Tr}_X(\mathcal{H})$ is then defined to be the intersection

$$\mathrm{PTr}(\mathcal{H}) = \mathrm{Tr}_X(\mathcal{H}) \cap \mathbb{P}(\mathcal{H}) = \{A + \gamma I > 0 : A^{*} = A, \; A \in \mathrm{Tr}(\mathcal{H}), \; \gamma \in \mathbb{R}\}.$$

Exponential, logarithm, and power functions.
Consider the exponential function $\exp : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$ defined by

$$\exp(A) = \sum_{j=0}^{\infty} \frac{A^j}{j!}.$$

The following result shows that $\exp$ maps $\mathrm{Tr}_X(\mathcal{H})$ to $\mathrm{Tr}_X(\mathcal{H})$.

Lemma 1. Let $(A + \gamma I) \in \mathrm{Tr}_X(\mathcal{H})$. Then $\exp(A + \gamma I) \in \mathrm{Tr}_X(\mathcal{H})$.

Consider next the inverse function $\log = \exp^{-1} : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$. For any $(A + \gamma I) \in \mathrm{PTr}(\mathcal{H})$, $\log(A + \gamma I)$ is always well-defined as follows. Let $\{\lambda_k\}_{k=1}^{\infty}$ be the eigenvalues of $A$ with corresponding orthonormal eigenvectors $\{\phi_k\}_{k=1}^{\infty}$. Then

$$A = \sum_{k=1}^{\infty} \lambda_k \phi_k \otimes \phi_k, \qquad \log(A + \gamma I) = \sum_{k=1}^{\infty} \log(\lambda_k + \gamma)\, \phi_k \otimes \phi_k, \tag{26}$$

where $\phi_k \otimes \phi_k : \mathcal{H} \to \mathcal{H}$ is a rank-one operator defined by $(\phi_k \otimes \phi_k) w = \langle \phi_k, w \rangle \phi_k$ for all $w \in \mathcal{H}$. Moreover, $\log(A + \gamma I) \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}_X(\mathcal{H})$ and assumes the form

$$\log(A + \gamma I) = A_1 + \gamma_1 I, \quad A_1 \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H}), \; \gamma_1 \in \mathbb{R}.$$

By Proposition 6 in [17], for any $\alpha \in \mathbb{R}$, the power function $(A + \gamma I)^{\alpha}$ is then well-defined via the expression

$$(A + \gamma I)^{\alpha} = \exp[\alpha \log(A + \gamma I)] \in \mathrm{PTr}(\mathcal{H}).$$

For the purposes of the current work, we need to go beyond the set $\mathrm{PTr}(\mathcal{H})$. Specifically, for two operators $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, we show that

$$\log[(A + \gamma I)(B + \mu I)^{-1}], \qquad [(A + \gamma I)(B + \mu I)^{-1}]^{\alpha}, \quad \alpha \in \mathbb{R}, \tag{27}$$

are all well-defined and are elements of $\mathrm{Tr}_X(\mathcal{H})$, even though they are no longer necessarily self-adjoint.

First, let $B \in \mathcal{L}(\mathcal{H})$ be any invertible operator. Then for any $A \in \mathcal{L}(\mathcal{H})$, we have

$$\exp(BAB^{-1}) = \sum_{j=0}^{\infty} \frac{(BAB^{-1})^j}{j!} = B\left(\sum_{j=0}^{\infty} \frac{A^j}{j!}\right)B^{-1} = B \exp(A) B^{-1}.$$

Thus for $(A + \gamma I) \in \mathrm{PTr}(\mathcal{H})$, the logarithm of $B(A + \gamma I)B^{-1} = BAB^{-1} + \gamma I \in \mathrm{Tr}_X(\mathcal{H})$ is also well-defined and is given by

$$\log[B(A + \gamma I)B^{-1}] = B \log(A + \gamma I) B^{-1} = B(A_1 + \gamma_1 I)B^{-1} = BA_1B^{-1} + \gamma_1 I \in \mathrm{Tr}_X(\mathcal{H}). \tag{28}$$

Using Eq. (28), we obtain the following results.

Proposition 1. Let $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$.
Let $\Lambda + \frac{\gamma}{\mu} I = (B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}$. Then

1. The logarithm $\log[(A + \gamma I)(B + \mu I)^{-1}] \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined and is given by

$$\log[(A + \gamma I)(B + \mu I)^{-1}] = (B + \mu I)^{1/2} \log\left(\Lambda + \frac{\gamma}{\mu} I\right) (B + \mu I)^{-1/2}. \tag{29}$$

2. For any $\alpha \in \mathbb{R}$, the power function $[(A + \gamma I)(B + \mu I)^{-1}]^{\alpha} \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined and is given by

$$[(A + \gamma I)(B + \mu I)^{-1}]^{\alpha} = (B + \mu I)^{1/2} \left(\Lambda + \frac{\gamma}{\mu} I\right)^{\alpha} (B + \mu I)^{-1/2}. \tag{30}$$

3. For any $p, q \in \mathbb{R}$ and any $\alpha, \beta \in \mathbb{R}$ such that $\alpha + \beta \neq 0$,

$$\mathrm{det}_X\left[\frac{\alpha[(A + \gamma I)(B + \mu I)^{-1}]^{p} + \beta[(A + \gamma I)(B + \mu I)^{-1}]^{q}}{\alpha + \beta}\right] = \mathrm{det}_X\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu} I\right)^{q}}{\alpha + \beta}\right]. \tag{31}$$

4. Infinite-Dimensional Alpha-Beta Log-Determinant divergences

We now show the motivations and derivations leading to Definition 1. We recall that in the case $\dim(\mathcal{H}) < \infty$, the Log-Det divergences were motivated by Ky Fan's inequality [21] on the log-concavity of the determinant, which states that for $A, B \in \mathrm{Sym}^{++}(n)$,

$$\det(\alpha A + (1-\alpha) B) \geq \det(A)^{\alpha} \det(B)^{1-\alpha}, \quad 0 \leq \alpha \leq 1,$$

with equality, for $0 < \alpha < 1$, if and only if $A = B$. This inequality has recently been generalized to the infinite-dimensional setting for the extended Fredholm determinant (Theorem 1 in [17]). The following is a further generalization of Theorem 1 in [17].

Theorem 5. Let $0 \leq \alpha \leq 1$. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, for any $p, q \in \mathbb{R}$,

$$\mathrm{det}_X[\alpha(A + \gamma I)^{p} + (1-\alpha)(B + \mu I)^{q}] \geq \left(\frac{\gamma^p}{\mu^q}\right)^{\alpha - \delta} \mathrm{det}_X(A + \gamma I)^{p\delta}\, \mathrm{det}_X(B + \mu I)^{q(1-\delta)}, \tag{32}$$

where $\delta = \frac{\alpha\gamma^p}{\alpha\gamma^p + (1-\alpha)\mu^q}$, $1 - \delta = \frac{(1-\alpha)\mu^q}{\alpha\gamma^p + (1-\alpha)\mu^q}$. For $0 < \alpha < 1$, equality happens if and only if

$$\left(\frac{A}{\gamma} + I\right)^{p} = \left(\frac{B}{\mu} + I\right)^{q} \text{ and } \gamma^p = \mu^q \iff (A + \gamma I)^{p} = (B + \mu I)^{q}. \tag{33}$$

In particular, for $\gamma = \mu \neq 1$, equality happens if and only if simultaneously

$$p = q \text{ and } A = B. \tag{34}$$

In particular, for $p = q = 1$, we recover Theorem 1 in [17]. From Theorem 5, we immediately have the following result.

Corollary 1.
Let $\alpha > 0$, $\beta > 0$. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, for any $p, q \in \mathbb{R}$,

$$\mathrm{det}_X\left[\frac{\alpha(A + \gamma I)^{p} + \beta(B + \mu I)^{q}}{\alpha + \beta}\right] \geq \left(\frac{\gamma^p}{\mu^q}\right)^{\frac{\alpha}{\alpha+\beta} - \delta} \mathrm{det}_X(A + \gamma I)^{p\delta}\, \mathrm{det}_X(B + \mu I)^{q(1-\delta)}, \tag{35}$$

where $\delta = \frac{\alpha\gamma^p}{\alpha\gamma^p + \beta\mu^q}$, $1 - \delta = \frac{\beta\mu^q}{\alpha\gamma^p + \beta\mu^q}$. Equality happens if and only if $(A + \gamma I)^{p} = (B + \mu I)^{q}$. For $\gamma = \mu \neq 1$, equality happens if and only if simultaneously $p = q$ and $A = B$.

Motivated by Theorem 5 and Corollary 1, we first define the following quantity.

Definition 4. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, for $p, q \in \mathbb{R}$, define

$$D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)] = \frac{1}{\alpha\beta} \log\left[\left(\frac{\gamma}{\mu}\right)^{(p+q)\left(\delta - \frac{\alpha}{\alpha+\beta}\right)} \mathrm{det}_X\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-q}}{\alpha + \beta}\right)\right], \tag{36}$$

where $\Lambda + \frac{\gamma}{\mu} I = (B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}$ and $\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p+q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p+q} + \beta}$.

The following theorem gives sufficient conditions on $p, q \in \mathbb{R}$, with $\alpha > 0$, $\beta > 0$ being fixed, so that for a given pair of operators $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, the quantity $D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)]$ in Definition 4 is nonnegative, with equality if and only if $A = B$ and $\gamma = \mu$.

Theorem 6. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, assume that $p, q \in \mathbb{R}$ satisfy the following conditions:

$$p + q \neq 0, \tag{37}$$

$$\alpha p \left(\frac{\gamma}{\mu}\right)^{p+q} = \beta q. \tag{38}$$

Then the quantity $D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)]$ satisfies

$$D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)] \geq 0, \tag{39}$$

$$D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)] = 0 \iff A = B, \; \gamma = \mu. \tag{40}$$

Subsequently, we assume that conditions (37) and (38) are satisfied. We see that $p$ and $q$ are not uniquely determined by (38). One way to enforce the uniqueness of $p$ and $q$ is by fixing the sum $p + q$. This is the approach we adopt in this work, which leads to Definition 1.

Theorem 7. Under the hypothesis of Theorem 6, assume further that $p + q = r$, $r \in \mathbb{R}$, $r \neq 0$, $r$ fixed.
Under this condition, in Definition 4, we have

$$\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{r}}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}, \qquad p = r(1 - \delta) = \frac{\beta r}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}, \qquad q = r\delta = \frac{\alpha r \left(\frac{\gamma}{\mu}\right)^{r}}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}. \tag{41}$$

Plugging the expressions for $p$ and $q$ in Eq. (41) into Definition 4, we obtain Definition 1. Furthermore, the two formulas given in Eqs. (8) and (9) in Definition 1 are equivalent.

We now show how $D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)]$ can be expressed concretely in terms of the Fredholm determinant.

Theorem 8. Let $\alpha > 0$, $\beta > 0$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, assume that $p, q \in \mathbb{R}$ satisfy conditions (37) and (38) in Theorem 6. Then

$$D^{(\alpha,\beta)}_{(p,q)}[(A + \gamma I), (B + \mu I)] = \frac{(p+q)\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} \log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta} \log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha + \beta}\right) + \frac{1}{\alpha\beta} \log\det\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right]. \tag{42}$$

5. Special cases of the Alpha-Beta Log-Determinant divergences

We now describe several important special cases of Definition 1, including the infinite-dimensional affine-invariant Riemannian distance, the infinite-dimensional Alpha Log-Det divergences [17], and the infinite-dimensional Beta Log-Det divergences.

5.1. Affine-invariant Riemannian distance

Let $\mathrm{HS}(\mathcal{H})$ denote the space of Hilbert-Schmidt operators on $\mathcal{H}$, which is defined by

$$\mathrm{HS}(\mathcal{H}) = \{A \in \mathcal{L}(\mathcal{H}) : \|A\|^2_{\mathrm{HS}} = \mathrm{tr}(A^{*}A) < \infty\},$$

where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert-Schmidt norm. If $A$ is Hilbert-Schmidt, then $A$ is compact and possesses a countable set of eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$. If $A$ is furthermore self-adjoint, then the Hilbert-Schmidt norm of $A$ is given by

$$\|A\|^2_{\mathrm{HS}} = \sum_{k=1}^{\infty} \lambda_k^2.$$

We recall the infinite-dimensional Hilbert manifold of positive definite unitized Hilbert-Schmidt operators on $\mathcal{H}$, considered in [18]:

$$\Sigma(\mathcal{H}) = \{A + \gamma I > 0 : A = A^{*}, \; A \in \mathrm{HS}(\mathcal{H}), \; \gamma \in \mathbb{R}\}.$$

In the case $\dim(\mathcal{H}) = \infty$, the set $\mathrm{PTr}(\mathcal{H})$ of positive definite unitized trace class operators on $\mathcal{H}$ is a strict subset of $\Sigma(\mathcal{H})$.
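To see why the inclusion is strict, consider (as an illustrative assumption, not an example taken from the paper) a self-adjoint diagonal operator $A$ with eigenvalues $\lambda_k = 1/k$. Then $\sum_k \lambda_k^2 = \pi^2/6 < \infty$, so $A$ is Hilbert-Schmidt and $A + \gamma I \in \Sigma(\mathcal{H})$ for any $\gamma > 0$, while $\sum_k \lambda_k$ diverges, so $A$ is not trace class. The partial sums below make the contrast visible.

```python
import math

def partial_sums(n_terms):
    """Partial sums of sum_k 1/k (trace norm) and sum_k 1/k^2 (squared HS norm)."""
    trace_sum = 0.0
    hs_sum = 0.0
    for k in range(1, n_terms + 1):
        trace_sum += 1.0 / k
        hs_sum += 1.0 / (k * k)
    return trace_sum, hs_sum

hs_limit = math.pi ** 2 / 6  # value of the convergent series sum_k 1/k^2

for n in (10, 1000, 100000):
    trace_sum, hs_sum = partial_sums(n)
    # The first column keeps growing (like log n); the second settles near pi^2 / 6.
    print(f"n={n:>6}  sum 1/k = {trace_sum:8.4f}   sum 1/k^2 = {hs_sum:.6f}")
```

The harmonic partial sums grow without bound, whereas the squared sums stay below $\pi^2/6$, which is the numerical signature of a Hilbert-Schmidt operator that is not trace class.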
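The manifold $\Sigma(\mathcal{H})$ can be equipped with the following Riemannian metric, as formulated by [18]. For each $P \in \Sigma(\mathcal{H})$, on the tangent space $T_P(\Sigma(\mathcal{H})) \cong \mathcal{H}_{\mathbb{R}} = \{A + \gamma I : A = A^{*}, \; A \in \mathrm{HS}(\mathcal{H}), \; \gamma \in \mathbb{R}\}$, we define the following inner product:

$$\langle A + \gamma I, B + \mu I \rangle_P = \langle P^{-1/2}(A + \gamma I)P^{-1/2}, P^{-1/2}(B + \mu I)P^{-1/2} \rangle_{\mathrm{eHS}},$$

where $\langle \cdot, \cdot \rangle_{\mathrm{eHS}}$ is the extended Hilbert-Schmidt inner product, defined by

$$\langle A + \gamma I, B + \mu I \rangle_{\mathrm{eHS}} = \langle A, B \rangle_{\mathrm{HS}} + \gamma\mu.$$

The Riemannian metric given by $\langle \cdot, \cdot \rangle_P$ then makes $\Sigma(\mathcal{H})$ an infinite-dimensional Riemannian manifold. Under this metric, the geodesic distance between $(A + \gamma I)$ and $(B + \mu I)$ is given by

$$d_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)] = \|\log[(B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}]\|_{\mathrm{eHS}}. \tag{43}$$

We now show that the affine-invariant distance $d_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]$ is a limiting case of $D^{(\alpha,\beta)}_r[(A + \gamma I), (B + \mu I)]$, as $\alpha \to 0$, $\beta \to 0$. In this section, we consider $\beta = \alpha$, in which case Definition 1 reduces to the following.

Definition 5. In Definition 1, with $\alpha = \beta$, we have

$$D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)] = \frac{1}{\alpha^2} \log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{1}{2}\right)} \mathrm{det}_X\left(\frac{\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + \left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}}{2}\right)\right], \tag{44}$$

where $\delta = \frac{\left(\frac{\gamma}{\mu}\right)^{r}}{\left(\frac{\gamma}{\mu}\right)^{r} + 1}$, $1 - \delta = \frac{1}{\left(\frac{\gamma}{\mu}\right)^{r} + 1}$.

By Theorem 8, we have the following formula, which expresses $D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)]$ concretely in terms of the Fredholm determinant:

$$D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)] = \frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2} \log\frac{\gamma}{\mu} + \frac{1}{\alpha^2} \log\left(\frac{\left(\frac{\gamma}{\mu}\right)^{p} + \left(\frac{\gamma}{\mu}\right)^{-q}}{2}\right) + \frac{1}{\alpha^2} \log\det\left[\frac{\left(\Lambda + \frac{\gamma}{\mu} I\right)^{p} + \left(\Lambda + \frac{\gamma}{\mu} I\right)^{-q}}{\left(\frac{\gamma}{\mu}\right)^{p} + \left(\frac{\gamma}{\mu}\right)^{-q}}\right], \tag{45}$$

where $\delta = \frac{\left(\frac{\gamma}{\mu}\right)^{r}}{\left(\frac{\gamma}{\mu}\right)^{r} + 1}$, $1 - \delta = \frac{1}{\left(\frac{\gamma}{\mu}\right)^{r} + 1}$, $p = r(1 - \delta)$, $q = r\delta$. The following is the main result in this section.

Theorem 9 (Affine-Invariant Riemannian Distance). Let $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$, $r'(0) \neq 0$, and $r(\alpha) \neq 0$ for $\alpha \neq 0$.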
Then

$$\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_r[(A + \gamma I), (B + \mu I)] = \frac{[r'(0)]^2}{8} d^2_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]. \tag{46}$$

In particular, for $r = 2\alpha$, we have

$$\lim_{\alpha \to 0} D^{(\alpha,\alpha)}_{2\alpha}[(A + \gamma I), (B + \mu I)] = \frac{1}{2} d^2_{\mathrm{aiHS}}[(A + \gamma I), (B + \mu I)]. \tag{47}$$

Remark 8. We stress that, as they are currently stated, the limits in Theorem 9 are valid for $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, that is, $A$ and $B$ must be trace class operators. The generalization of Theorem 9 to the entire Hilbert manifold $\Sigma(\mathcal{H})$, where $A$ and $B$ are Hilbert-Schmidt operators, will be presented in an upcoming work.

5.2. Infinite-dimensional Alpha Log-Determinant divergences

We now show that the formulation for the infinite-dimensional Alpha Log-Determinant divergences in [17] is a special case of the present formulation, with $\beta = 1 - \alpha$ and $r = \pm 1$. Let $\dim(\mathcal{H}) = \infty$. We recall that for $-1 < \alpha < 1$, the Log-Det $\alpha$-divergence $d^{\alpha}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]$ for $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$ is defined in [17] to be

$$d^{\alpha}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)] = \frac{4}{1 - \alpha^2} \log\left[\frac{\mathrm{det}_X\left(\frac{1-\alpha}{2}(A + \gamma I) + \frac{1+\alpha}{2}(B + \mu I)\right)}{\mathrm{det}_X(A + \gamma I)^{q}\, \mathrm{det}_X(B + \mu I)^{1-q}} \left(\frac{\gamma}{\mu}\right)^{q - \frac{1-\alpha}{2}}\right], \tag{48}$$

where $q = \frac{(1-\alpha)\gamma}{(1-\alpha)\gamma + (1+\alpha)\mu}$ and $1 - q = \frac{(1+\alpha)\mu}{(1-\alpha)\gamma + (1+\alpha)\mu}$, with the limiting cases $\alpha = \pm 1$ given by

$$d^{1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)] = \left(\frac{\gamma}{\mu} - 1\right) \log\frac{\gamma}{\mu} + \mathrm{tr}_X[(B + \mu I)^{-1}(A + \gamma I) - I] - \frac{\gamma}{\mu} \log \mathrm{det}_X[(B + \mu I)^{-1}(A + \gamma I)], \tag{49}$$

$$d^{-1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)] = \left(\frac{\mu}{\gamma} - 1\right) \log\frac{\mu}{\gamma} + \mathrm{tr}_X[(A + \gamma I)^{-1}(B + \mu I) - I] - \frac{\mu}{\gamma} \log \mathrm{det}_X[(A + \gamma I)^{-1}(B + \mu I)]. \tag{50}$$

Definition 6. In Definition 1, with $0 < \alpha < 1$ and $\beta = 1 - \alpha$, we have

$$D^{(\alpha,1-\alpha)}_r[(A + \gamma I), (B + \mu I)] = \frac{1}{\alpha(1-\alpha)} \log\left[\left(\frac{\gamma}{\mu}\right)^{r(\delta - \alpha)} \mathrm{det}_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}\right)\right], \tag{51}$$

where $\delta = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{r}}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + 1 - \alpha}$, $1 - \delta = \frac{1 - \alpha}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + 1 - \alpha}$.
The following result shows that $D^{(\alpha,1-\alpha)}_r[(A + \gamma I), (B + \mu I)]$ for the cases $r = \pm 1$ are precisely $d^{1-2\alpha}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]$ and $d^{2\alpha-1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]$, respectively.

Theorem 10 (Alpha Log-Determinant Divergences). Let $0 < \alpha < 1$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$,

$$D^{(\alpha,1-\alpha)}_1[(A + \gamma I), (B + \mu I)] = \frac{\delta - \alpha}{\alpha(1-\alpha)} \log\frac{\gamma}{\mu} + \frac{1}{\alpha(1-\alpha)} \log\left[\frac{\mathrm{det}_X[\alpha(A + \gamma I) + (1-\alpha)(B + \mu I)]}{\mathrm{det}_X(A + \gamma I)^{\delta}\, \mathrm{det}_X(B + \mu I)^{1-\delta}}\right] = d^{1-2\alpha}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)], \tag{52}$$

where $\delta = \frac{\alpha\gamma}{\alpha\gamma + (1-\alpha)\mu}$. Similarly,

$$D^{(\alpha,1-\alpha)}_{-1}[(A + \gamma I), (B + \mu I)] = d^{2\alpha-1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]. \tag{53}$$

At the endpoints $\alpha = 0$ and $\alpha = 1$,

$$\lim_{\alpha \to 1} D^{(\alpha,1-\alpha)}_1[(A + \gamma I), (B + \mu I)] = d^{-1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)], \tag{54}$$

$$\lim_{\alpha \to 0} D^{(\alpha,1-\alpha)}_1[(A + \gamma I), (B + \mu I)] = d^{1}_{\mathrm{logdet}}[(A + \gamma I), (B + \mu I)]. \tag{55}$$

In particular, in Theorem 10, for $\gamma = \mu$, we have $\delta = \alpha$, and

$$D^{(\alpha,1-\alpha)}_1[(A + \gamma I), (B + \gamma I)] = \frac{1}{\alpha(1-\alpha)} \log\left[\frac{\mathrm{det}_X[\alpha(A + \gamma I) + (1-\alpha)(B + \gamma I)]}{\mathrm{det}_X(A + \gamma I)^{\alpha}\, \mathrm{det}_X(B + \gamma I)^{1-\alpha}}\right]. \tag{56}$$

This is the direct generalization of the finite-dimensional formula given by Eq. (6) in [14].

Remark 9 (Beta Log-Determinant Divergences). In the finite-dimensional setting in [15], the authors call $D^{(1,\beta)}(A, B)$ the Beta Log-Determinant divergence between $A, B \in \mathrm{Sym}^{++}(n)$. Similarly, in the case $\dim(\mathcal{H}) = \infty$, let $\beta > 0$ be fixed and let $r \in \mathbb{R}$, $r \neq 0$ be fixed. For $(A + \gamma I), (B + \mu I) \in \mathrm{PTr}(\mathcal{H})$, we then have the corresponding infinite-dimensional Beta Log-Determinant divergence

$$D^{(1,\beta)}_r[(A + \gamma I), (B + \mu I)] = \frac{1}{\beta} \log\left[\left(\frac{\gamma}{\mu}\right)^{r\left(\delta - \frac{1}{1+\beta}\right)} \mathrm{det}_X\left(\frac{\left(\Lambda + \frac{\gamma}{\mu} I\right)^{r(1-\delta)} + \beta\left(\Lambda + \frac{\gamma}{\mu} I\right)^{-r\delta}}{1 + \beta}\right)\right], \tag{57}$$

where $\Lambda + \frac{\gamma}{\mu} I = (B + \mu I)^{-1/2}(A + \gamma I)(B + \mu I)^{-1/2}$, $\delta = \frac{\left(\frac{\gamma}{\mu}\right)^{r}}{\left(\frac{\gamma}{\mu}\right)^{r} + \beta}$, $1 - \delta = \frac{\beta}{\left(\frac{\gamma}{\mu}\right)^{r} + \beta}$.
However, we do not explore this divergence in detail in this work.

5.3. Other limiting cases

We consider next two other limiting cases, namely $\beta \to 0$ when $\alpha > 0$ is fixed, and $\alpha \to 0$ when $\beta > 0$ is fixed. In particular, our definitions of $D^{(\alpha,0)}_r[(A+\gamma I), (B+\mu I)]$, $\alpha > 0$, and $D^{(0,\beta)}_r[(A+\gamma I), (B+\mu I)]$, $\beta > 0$, as given in Definition 2, are based on the respective limits in Theorems 11 and 12 below.

Theorem 11 (Limiting case $\alpha > 0$, $\beta \to 0$). Let $\alpha > 0$ be fixed. Assume that $r = r(\beta)$ is smooth, with $r(0) = r(\beta = 0)$. Then
$$\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \frac{r(0)}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r(0)} - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r(0)}\log\det\nolimits_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}. \tag{58}$$

Theorem 12 (Limiting case $\alpha \to 0$, $\beta > 0$). Let $\beta > 0$ be fixed. Assume that $r = r(\alpha)$ is smooth, with $r(0) = r(\alpha = 0)$. Then
$$\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \frac{r(0)}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{r(0)} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)^{-1}(A+\gamma I)]^{r(0)} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{r(0)}\log\det\nolimits_X[(B+\mu I)^{-1}(A+\gamma I)]^{r(0)}. \tag{59}$$

Special cases. Let us now describe several special cases of Theorems 11 and 12, including their specialization to the finite-dimensional setting.

(i) For $\gamma = \mu$, we have
$$\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\gamma I)] = \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\gamma I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\log\det\nolimits_X[(A+\gamma I)^{-1}(B+\gamma I)]^{r(0)}, \tag{60}$$
$$\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\gamma I)] = \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\gamma I)^{-1}(A+\gamma I)]^{r(0)} - I\right) - \frac{1}{\beta^2}\log\det\nolimits_X[(B+\gamma I)^{-1}(A+\gamma I)]^{r(0)}. \tag{61}$$
In particular, for $r = \alpha + \beta$, we have $r(\beta = 0) = \alpha$, $r(\alpha = 0) = \beta$, so that
$$\lim_{\beta \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[(A+\gamma I), (B+\gamma I)] = \frac{1}{\alpha^2}\left\{\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\gamma I)]^{\alpha} - I\right) - \alpha\log\det\nolimits_X[(A+\gamma I)^{-1}(B+\gamma I)]\right\}, \tag{62}$$
$$\lim_{\alpha \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[(A+\gamma I), (B+\gamma I)] = \frac{1}{\beta^2}\left\{\mathrm{tr}_X\left([(B+\gamma I)^{-1}(A+\gamma I)]^{\beta} - I\right) - \beta\log\det\nolimits_X[(B+\gamma I)^{-1}(A+\gamma I)]\right\}. \tag{63}$$
These are the direct generalizations of the corresponding formulas in the finite-dimensional setting. In fact, for $A, B \in \mathrm{Sym}^{++}(n)$, $n \in \mathbb{N}$, by setting $\gamma = 0$, we obtain
$$\lim_{\beta \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[A, B] = \frac{1}{\alpha^2}\left\{\mathrm{tr}[(A^{-1}B)^{\alpha} - I] - \alpha\log\det(A^{-1}B)\right\}, \tag{64}$$
$$\lim_{\alpha \to 0} D^{(\alpha,\beta)}_{\alpha+\beta}[A, B] = \frac{1}{\beta^2}\left\{\mathrm{tr}[(B^{-1}A)^{\beta} - I] - \beta\log\det(B^{-1}A)\right\}. \tag{65}$$
These are precisely the finite-dimensional expressions given by Eqs. (5) and (6), which are Eqs. (23) and (22) in [15], respectively.

(ii) If $r(0) = r(\beta = 0) = 1$, we have for $\alpha > 0$ fixed,
$$\lim_{\beta \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma} - 1\right)\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\left\{\mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I) - I] - \frac{\mu}{\gamma}\log\det\nolimits_X[(A+\gamma I)^{-1}(B+\mu I)]\right\} = \frac{1}{\alpha^2}\, d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \tag{66}$$
Similarly, if $r(0) = r(\alpha = 0) = 1$, we have for $\beta > 0$ fixed,
$$\lim_{\alpha \to 0} D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \frac{1}{\beta^2}\left(\frac{\gamma}{\mu} - 1\right)\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\left\{\mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I) - I] - \frac{\gamma}{\mu}\log\det\nolimits_X[(B+\mu I)^{-1}(A+\gamma I)]\right\} = \frac{1}{\beta^2}\, d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \tag{67}$$
In particular, if $r \equiv 1$ as a constant function, then with $\beta = 1-\alpha$, we have
$$\lim_{\alpha \to 1} D^{(\alpha,1-\alpha)}_1[(A+\gamma I), (B+\mu I)] = d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)],$$
$$\lim_{\alpha \to 0} D^{(\alpha,1-\alpha)}_1[(A+\gamma I), (B+\mu I)] = d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)],$$
which are precisely the limiting cases stated in Eqs. (54) and (55) in Theorem 10.

6. Properties of the Alpha-Beta Log-Determinant divergences

The following results establish several important properties of $D^{(\alpha,\beta)}_r$ as defined above, which generalize those from both the finite-dimensional setting [14, 15] and the infinite-dimensional Alpha Log-Det divergences [17].

Theorem 13 (Dual symmetry).
$$D^{(\beta,\alpha)}_r[(B+\mu I), (A+\gamma I)] = D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)]. \tag{68}$$
In particular, for $\beta = \alpha$, we have
$$D^{(\alpha,\alpha)}_r[(B+\mu I), (A+\gamma I)] = D^{(\alpha,\alpha)}_r[(A+\gamma I), (B+\mu I)]. \tag{69}$$

Special case: Dual symmetry of the infinite-dimensional Alpha Log-Det divergences. By Theorem 10, we have for $0 \leq \alpha \leq 1$,
$$D^{(\alpha,1-\alpha)}_1[(A+\gamma I), (B+\mu I)] = D^{(1-\alpha,\alpha)}_1[(B+\mu I), (A+\gamma I)] \iff d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)] = d^{-(1-2\alpha)}_{\mathrm{logdet}}[(B+\mu I), (A+\gamma I)]. \tag{70}$$
This is precisely the dual symmetry of the infinite-dimensional Alpha Log-Det divergences (Theorem 4 in [17]).

Theorem 14 (Dual invariance under inversion).
$$D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha,\beta)}_{-r}[(A+\gamma I), (B+\mu I)]. \tag{71}$$

Special case: Dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences. By Theorem 10, we have
$$D^{(\alpha,1-\alpha)}_1[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I), (B+\mu I)] \iff d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = d^{-(1-2\alpha)}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \tag{72}$$
This is precisely the dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences (Theorem 5 in [17]).

Theorem 15 (Affine invariance). For any $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$ and any invertible $(C+\nu I) \in \mathrm{Tr}_X(\mathcal{H})$, $\nu \neq 0$,
$$D^{(\alpha,\beta)}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}, (C+\nu I)(B+\mu I)(C+\nu I)^{*}] = D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)]. \tag{73}$$

Theorem 16 (Invariance under unitary transformations).
For any $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$ and any $C \in \mathcal{L}(\mathcal{H})$, with $CC^{*} = C^{*}C = I$,
$$D^{(\alpha,\beta)}_r[C(A+\gamma I)C^{*}, C(B+\mu I)C^{*}] = D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)]. \tag{74}$$

Theorem 17.
$$D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = D^{(\alpha,\beta)}_r\left[\left(\Lambda + \frac{\gamma}{\mu} I\right), I\right]. \tag{75}$$

Theorem 18. Let $\omega \in \mathbb{R}$, $\omega \neq 0$ be arbitrary. Then
$$D^{(\omega\alpha,\omega\beta)}_{\omega r}[(A+\gamma I), (B+\mu I)] = \frac{1}{\omega^2} D^{(\alpha,\beta)}_r\left[\left(\Lambda + \frac{\gamma}{\mu} I\right)^{\omega}, I\right]. \tag{76}$$

The following two properties are important for proving that the square root function $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I), (B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$. We focus on the case $\alpha > 0$, since for $\alpha = 0$, $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I), (B+\mu I)]} = \frac{1}{\sqrt{2}}\, d_{\mathrm{aiHS}}[(A+\gamma I), (B+\mu I)]$ is automatically a metric on $\mathrm{PTr}(\mathcal{H})$.

Theorem 19 (Convergence in trace norm). Let $\alpha > 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B : \mathcal{H} \to \mathcal{H}$ be self-adjoint, trace class operators such that $(I + A) > 0$, $(I + B) > 0$. Let $\{A_n\}_{n \in \mathbb{N}}$, $\{B_n\}_{n \in \mathbb{N}}$ be sequences of self-adjoint, trace class operators such that $\lim_{n \to \infty} \|A_n - A\|_{\mathrm{tr}} = 0$, $\lim_{n \to \infty} \|B_n - B\|_{\mathrm{tr}} = 0$. Then
$$\lim_{n \to \infty} D^{(\alpha,\alpha)}_{2\alpha}[(I + A_n), (I + B_n)] = D^{(\alpha,\alpha)}_{2\alpha}[(I + A), (I + B)]. \tag{77}$$

Theorem 20 (Triangle inequality). Let $\alpha > 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $\gamma > 0$, $\gamma \in \mathbb{R}$ be fixed. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint, trace class operators such that $(A+\gamma I) > 0$, $(B+\gamma I) > 0$, $(C+\gamma I) > 0$. Then
$$\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I), (B+\gamma I)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I), (C+\gamma I)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(C+\gamma I), (B+\gamma I)]}. \tag{78}$$
In particular, for $\alpha = 1/2$ and $\gamma = 1$, we obtain the following triangle inequality.

Theorem 21 (Triangle inequality - square root of symmetric Stein divergence). Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators with $A + I > 0$, $B + I > 0$, $C + I > 0$.
Then
$$\sqrt{\log\frac{\det\left(\frac{A+B}{2} + I\right)}{\sqrt{\det(A+I)\det(B+I)}}} \leq \sqrt{\log\frac{\det\left(\frac{A+C}{2} + I\right)}{\sqrt{\det(A+I)\det(C+I)}}} + \sqrt{\log\frac{\det\left(\frac{C+B}{2} + I\right)}{\sqrt{\det(C+I)\det(B+I)}}}. \tag{79}$$

Theorem 22 (Diagonalization). Let $\alpha \geq 0$ be fixed. Let $\mathcal{H}$ be a separable Hilbert space. Let $\gamma > 0$, $\gamma \in \mathbb{R}$, be fixed. Let $A, B : \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators, such that $A + \gamma I > 0$, $B + \gamma I > 0$. Let $\mathrm{Eig}(A), \mathrm{Eig}(B) : \ell^2 \to \ell^2$ be diagonal operators with the diagonals consisting of the eigenvalues of $A$ and $B$, respectively, in decreasing order. Then
$$D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A) + \gamma I), (\mathrm{Eig}(B) + \gamma I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I), (B+\gamma I)]. \tag{80}$$

7. Alpha-Beta Log-Det divergences between RKHS covariance operators

Let $\mathcal{X}$ be an arbitrary non-empty set. We now compute the Alpha-Beta Log-Det divergences between covariance operators on an RKHS induced by a positive definite kernel $K$ on $\mathcal{X} \times \mathcal{X}$. In this case, we have explicit formulas for $D^{(\alpha,\beta)}_r$ via the corresponding Gram matrices. We recall that similar formulas exist in the cases of the Log-Hilbert-Schmidt distance [19], the infinite-dimensional affine-invariant Riemannian distance [18, 20], and the infinite-dimensional Alpha Log-Det divergences [17]. We first prove the following result.

Theorem 23. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Assume that $\dim(\mathcal{H}_2) = \infty$. Let $\alpha, \beta > 0$ be fixed. For any $r \in \mathbb{R}$, $r \neq 0$, for any $\gamma > 0$, $\mu > 0$,
$$D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}), (BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\gamma/\mu)^{p}(C + I_{\mathcal{H}_1} \otimes I_3)^{p} + \beta(\gamma/\mu)^{-q}(C + I_{\mathcal{H}_1} \otimes I_3)^{-q}}{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}\right], \tag{81}$$
where $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$, $p = r(1-\delta)$, $q = r\delta$, and
$$C = \begin{pmatrix} \frac{A^{*}A}{\gamma} & -\frac{A^{*}B}{\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{A^{*}AA^{*}B}{\gamma\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \end{pmatrix}. \tag{82}$$

For comparison, the following is the corresponding version of $D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}), (BB^{*} + \mu I_{\mathcal{H}_2})]$, using the finite-dimensional formula given in Eq. (19), when $\dim(\mathcal{H}_2) < \infty$.

Theorem 24. Let $\mathcal{H}_1, \mathcal{H}_2$ be separable Hilbert spaces. Let $A, B : \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^{*} : \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Assume that $\dim(\mathcal{H}_2) < \infty$. Let $\alpha, \beta > 0$ be fixed. For any $r \in \mathbb{R}$, $r \neq 0$, for any $\gamma > 0$, $\mu > 0$,
$$D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}), (BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_2) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\gamma/\mu)^{p}(C + I_{\mathcal{H}_1} \otimes I_3)^{p} + \beta(\gamma/\mu)^{-q}(C + I_{\mathcal{H}_1} \otimes I_3)^{-q}}{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}\right], \tag{83}$$
where $p = \frac{r\beta}{\alpha+\beta}$, $q = \frac{r\alpha}{\alpha+\beta}$, and $C$ is as given in Theorem 23.

Let us briefly recall the RKHS covariance operators discussed in [17]. Let $\mathbf{x} = [x_1, \ldots, x_m]$ be a data matrix randomly sampled from $\mathcal{X}$ according to a Borel probability distribution $\rho$, where $m \in \mathbb{N}$ is the number of observations. Let $K$ be a positive definite kernel on $\mathcal{X} \times \mathcal{X}$ and $\mathcal{H}_K$ its induced reproducing kernel Hilbert space (RKHS). Let $\Phi : \mathcal{X} \to \mathcal{H}_K$ be the corresponding feature map, so that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_K}$ for all pairs $(x, y) \in \mathcal{X} \times \mathcal{X}$. The feature map $\Phi$ gives rise to the bounded linear operator
$$\Phi(\mathbf{x}) : \mathbb{R}^m \to \mathcal{H}_K, \quad \Phi(\mathbf{x})b = \sum_{j=1}^{m} b_j \Phi(x_j), \quad b \in \mathbb{R}^m. \tag{84}$$
The operator $\Phi(\mathbf{x})$ can also be viewed as the (potentially infinite) mapped data matrix $\Phi(\mathbf{x}) = [\Phi(x_1), \ldots, \Phi(x_m)]$ of size $\dim(\mathcal{H}_K) \times m$ in the feature space $\mathcal{H}_K$, with the $j$th column being $\Phi(x_j)$. The corresponding empirical covariance operator for $\Phi(\mathbf{x})$ is defined to be
$$C_{\Phi(\mathbf{x})} = \frac{1}{m}\Phi(\mathbf{x}) J_m \Phi(\mathbf{x})^{*} : \mathcal{H}_K \to \mathcal{H}_K, \tag{85}$$
where $\Phi(\mathbf{x})^{*} : \mathcal{H}_K \to \mathbb{R}^m$ is the adjoint operator of $\Phi(\mathbf{x})$ and $J_m$ is the centering matrix, defined by $J_m = I_m - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^{T}$ with $\mathbf{1}_m = (1, \ldots, 1)^{T} \in \mathbb{R}^m$.

Let $\mathbf{x} = [x_i]_{i=1}^{m}$, $\mathbf{y} = [y_i]_{i=1}^{m}$, $m \in \mathbb{N}$, be two random data matrices sampled from $\mathcal{X}$ according to two Borel probability distributions and $C_{\Phi(\mathbf{x})}$, $C_{\Phi(\mathbf{y})}$ be the corresponding covariance operators induced by the kernel $K$. Let $K[\mathbf{x}]$, $K[\mathbf{y}]$, and $K[\mathbf{x}, \mathbf{y}]$ be the $m \times m$ Gram matrices defined by
$$(K[\mathbf{x}])_{ij} = K(x_i, x_j), \quad (K[\mathbf{y}])_{ij} = K(y_i, y_j), \quad (K[\mathbf{x}, \mathbf{y}])_{ij} = K(x_i, y_j), \quad 1 \leq i, j \leq m. \tag{86}$$
Let $A = \frac{1}{\sqrt{m}}\Phi(\mathbf{x})J_m : \mathbb{R}^m \to \mathcal{H}_K$, $B = \frac{1}{\sqrt{m}}\Phi(\mathbf{y})J_m : \mathbb{R}^m \to \mathcal{H}_K$, so that
$$AA^{*} = C_{\Phi(\mathbf{x})}, \quad BB^{*} = C_{\Phi(\mathbf{y})}, \quad A^{*}A = \frac{1}{m}J_m K[\mathbf{x}] J_m, \quad B^{*}B = \frac{1}{m}J_m K[\mathbf{y}] J_m, \quad A^{*}B = \frac{1}{m}J_m K[\mathbf{x}, \mathbf{y}] J_m, \quad B^{*}A = \frac{1}{m}J_m K[\mathbf{y}, \mathbf{x}] J_m. \tag{87}$$
Theorems 23 and 24 can then be applied to give closed form formulas for the divergences between $(C_{\Phi(\mathbf{x})} + \gamma I)$ and $(C_{\Phi(\mathbf{y})} + \mu I)$, as follows.

Theorem 25 (Alpha-Beta Log-Det divergences between RKHS covariance operators - Infinite-dimensional version). Let $\alpha, \beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$ be fixed. Assume that $\dim(\mathcal{H}_K) = \infty$. For any $\gamma > 0$, $\mu > 0$, the divergence $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I), (C_{\Phi(\mathbf{y})} + \mu I)]$ is given by
$$D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I), (C_{\Phi(\mathbf{y})} + \mu I)] = \frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\frac{\gamma}{\mu} + \frac{1}{\alpha\beta}\log\left(\frac{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\gamma/\mu)^{p}(C + I_{3m})^{p} + \beta(\gamma/\mu)^{-q}(C + I_{3m})^{-q}}{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}\right], \tag{88}$$
where $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$, $p = r(1-\delta)$, $q = r\delta$, and
$$C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{21} & C_{22} & C_{23} \end{pmatrix} \in \mathbb{R}^{3m \times 3m}. \tag{89}$$
Here the sub-matrices $C_{ij}$, $i = 1, 2$, $j = 1, 2, 3$, each of size $m \times m$, are given by
$$C_{11} = \frac{1}{\gamma m}J_m K[\mathbf{x}] J_m, \tag{90}$$
$$C_{12} = -\frac{1}{\sqrt{\gamma\mu}\, m}J_m K[\mathbf{x}, \mathbf{y}] J_m\left(I_m + \frac{1}{\mu m}J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{91}$$
$$C_{13} = -\frac{1}{\gamma\sqrt{\gamma\mu}\, m^2}J_m K[\mathbf{x}] J_m K[\mathbf{x}, \mathbf{y}] J_m\left(I_m + \frac{1}{\mu m}J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{92}$$
$$C_{21} = \frac{1}{\sqrt{\gamma\mu}\, m}J_m K[\mathbf{y}, \mathbf{x}] J_m, \tag{93}$$
$$C_{22} = -\frac{1}{\mu m}J_m K[\mathbf{y}] J_m\left(I_m + \frac{1}{\mu m}J_m K[\mathbf{y}] J_m\right)^{-1}, \tag{94}$$
$$C_{23} = -\frac{1}{\gamma\mu m^2}J_m K[\mathbf{y}, \mathbf{x}] J_m K[\mathbf{x}, \mathbf{y}] J_m\left(I_m + \frac{1}{\mu m}J_m K[\mathbf{y}] J_m\right)^{-1}. \tag{95}$$

Theorem 26 (Alpha-Beta Log-Det divergences between RKHS covariance operators - Finite-dimensional version). Let $\alpha, \beta > 0$ be fixed. Let $r \in \mathbb{R}$, $r \neq 0$ be fixed. Assume that $\dim(\mathcal{H}_K) < \infty$. For any $\gamma > 0$, $\mu > 0$, the divergence $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I), (C_{\Phi(\mathbf{y})} + \mu I)]$ is given by
$$D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I), (C_{\Phi(\mathbf{y})} + \mu I)] = \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_K) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha(\gamma/\mu)^{p}(C + I_{3m})^{p} + \beta(\gamma/\mu)^{-q}(C + I_{3m})^{-q}}{\alpha(\gamma/\mu)^{p} + \beta(\gamma/\mu)^{-q}}\right], \tag{96}$$
where $p = \frac{r\beta}{\alpha+\beta}$, $q = \frac{r\alpha}{\alpha+\beta}$, and $C$ is as given in Theorem 25.

Remark 10. The closed form formulas for $D^{(\alpha,\beta)}_r[(C_{\Phi(\mathbf{x})} + \gamma I), (C_{\Phi(\mathbf{y})} + \mu I)]$ given in Eqs. (88) and (96) in Theorems 25 and 26, respectively, coincide if and only if $\gamma = \mu$. If $\gamma \neq \mu$, then the right hand side of Eq. (96) approaches infinity when $\dim(\mathcal{H}_K) \to \infty$. Thus in general, the infinite-dimensional version is not obtainable as the limit of the finite-dimensional version as the dimension goes to infinity.

Remark 11. The closed form formulas given by Eqs. (88) and (96) in Theorems 25 and 26, respectively, are derived under more general conditions than those in [17] and are consequently more general but more complicated than the corresponding closed form formulas for the Alpha Log-Det divergences in [17] (see Theorems 12, 13, 15, 16 in [17]).
Thus for practical applications involving the Alpha Log-Det divergences, the corresponding closed form formulas in [17] should be employed.

Appendix A. Proofs of main results

Appendix A.1. Proofs for the general Alpha-Beta Log-Determinant divergences

In this section, we prove Lemma 1, Proposition 1, and Theorems 5, 6, 7, and 8.

Proof of Lemma 1. Since any bounded operator $A$ commutes with the identity operator $I$, we have
$$\exp(A+\gamma I) = e^{\gamma}\exp(A) = e^{\gamma}\left(I + \sum_{j=1}^{\infty}\frac{A^j}{j!}\right) = e^{\gamma}I + e^{\gamma}\sum_{j=1}^{\infty}\frac{A^j}{j!},$$
where $\sum_{j=1}^{\infty}\frac{A^j}{j!}$ is trace class, since
$$\left\|\sum_{j=1}^{\infty}\frac{A^j}{j!}\right\|_{\mathrm{tr}} \leq \sum_{j=1}^{\infty}\frac{\|A\|_{\mathrm{tr}}^j}{j!} = \exp(\|A\|_{\mathrm{tr}}) - 1 < \infty.$$
Thus $\exp(A+\gamma I) \in \mathrm{Tr}_X(\mathcal{H})$. This completes the proof.

Proof of Proposition 1. For $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, we have $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} \in \mathrm{PTr}(\mathcal{H})$ and the logarithm $\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] \in \mathrm{Tr}_X(\mathcal{H})$ is well-defined. By the discussion preceding Proposition 1, we have
$$\log[(A+\gamma I)(B+\mu I)^{-1}] = \log[(B+\mu I)^{1/2}(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}(B+\mu I)^{-1/2}]$$
$$= (B+\mu I)^{1/2}\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}](B+\mu I)^{-1/2} = (B+\mu I)^{1/2}\log\left(\Lambda + \frac{\gamma}{\mu}I\right)(B+\mu I)^{-1/2} \in \mathrm{Tr}_X(\mathcal{H}).$$
For the power function, we have
$$[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \exp(\alpha\log[(A+\gamma I)(B+\mu I)^{-1}]) = \exp\left((B+\mu I)^{1/2}\,\alpha\log\left(\Lambda + \frac{\gamma}{\mu}I\right)(B+\mu I)^{-1/2}\right)$$
$$= (B+\mu I)^{1/2}\exp\left(\alpha\log\left(\Lambda + \frac{\gamma}{\mu}I\right)\right)(B+\mu I)^{-1/2} = (B+\mu I)^{1/2}\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}(B+\mu I)^{-1/2}.$$
For the sum of two power functions, we then have
$$\frac{\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta[(A+\gamma I)(B+\mu I)^{-1}]^{q}}{\alpha+\beta} = (B+\mu I)^{1/2}\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{q}}{\alpha+\beta}\right](B+\mu I)^{-1/2}.$$
By Lemma 5 in [17], $\det_X[C(A+\gamma I)C^{-1}] = \det_X(A+\gamma I)$ for any invertible operator $C \in \mathcal{L}(\mathcal{H})$.
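It follows that
$$\det\nolimits_X\left[\frac{\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta[(A+\gamma I)(B+\mu I)^{-1}]^{q}}{\alpha+\beta}\right] = \det\nolimits_X\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{q}}{\alpha+\beta}\right].$$
This completes the proof.

Proof of Theorem 5. By definition of the power function, we have
$$\alpha(A+\gamma I)^{p} + (1-\alpha)(B+\mu I)^{q} = \alpha\exp[p\log(A+\gamma I)] + (1-\alpha)\exp[q\log(B+\mu I)]$$
$$= \alpha\exp\left[p\log\left(\frac{A}{\gamma} + I\right) + p(\log\gamma)I\right] + (1-\alpha)\exp\left[q\log\left(\frac{B}{\mu} + I\right) + q(\log\mu)I\right] = \alpha\gamma^{p}\left(\frac{A}{\gamma} + I\right)^{p} + (1-\alpha)\mu^{q}\left(\frac{B}{\mu} + I\right)^{q}.$$
It follows that for $\delta = \frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}$, $1-\delta = \frac{(1-\alpha)\mu^{q}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}$, we have
$$\det\nolimits_X[\alpha(A+\gamma I)^{p} + (1-\alpha)(B+\mu I)^{q}] = [\alpha\gamma^{p} + (1-\alpha)\mu^{q}]\det\left[\frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}\left(\frac{A}{\gamma} + I\right)^{p} + \frac{(1-\alpha)\mu^{q}}{\alpha\gamma^{p} + (1-\alpha)\mu^{q}}\left(\frac{B}{\mu} + I\right)^{q}\right]$$
$$\geq [\alpha\gamma^{p} + (1-\alpha)\mu^{q}]\det\left(\frac{A}{\gamma} + I\right)^{p\delta}\det\left(\frac{B}{\mu} + I\right)^{q(1-\delta)} \quad \text{by Proposition 7 in [17]}$$
$$\geq \gamma^{p\alpha}\mu^{(1-\alpha)q}\det\left(\frac{A}{\gamma} + I\right)^{p\delta}\det\left(\frac{B}{\mu} + I\right)^{q(1-\delta)} \quad \text{by Ky Fan's inequality applied to } \alpha\gamma^{p} + (1-\alpha)\mu^{q}$$
$$= \gamma^{p(\alpha-\delta)}\mu^{-q(\alpha-\delta)}\det\nolimits_X(A+\gamma I)^{p\delta}\det\nolimits_X(B+\mu I)^{q(1-\delta)} = \left(\frac{\gamma^{p}}{\mu^{q}}\right)^{\alpha-\delta}\det\nolimits_X(A+\gamma I)^{p\delta}\det\nolimits_X(B+\mu I)^{q(1-\delta)}.$$
For $0 < \alpha < 1$, equality happens if and only if simultaneously, we have
$$\left(\frac{A}{\gamma} + I\right)^{p} = \left(\frac{B}{\mu} + I\right)^{q} \text{ and } \gamma^{p} = \mu^{q} \iff (A+\gamma I)^{p} = (B+\mu I)^{q}.$$
In particular, for $\gamma = \mu$, the condition $\gamma^{p} = \mu^{q}$ becomes
$$\gamma^{p} = \gamma^{q} \iff \gamma^{p-q} = 1 \iff p = q \text{ if } \gamma \neq 1.$$
With the conditions $\gamma = \mu \neq 1$ and $p = q$, we then have
$$\left(\frac{A}{\gamma} + I\right)^{p} = \left(\frac{B}{\gamma} + I\right)^{p} \iff A = B.$$
This completes the proof of the theorem.

Proof of Theorem 6. Recall that we write the operator $(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$ in the form
$$(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \Lambda + (\gamma/\mu)I \in \mathrm{PTr}(\mathcal{H}).$$
Its inverse has the form
$$(B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [\Lambda + (\gamma/\mu)I]^{-1} = \frac{\mu}{\gamma}I - \left(\frac{\mu}{\gamma}\right)^{2}\Lambda\left(I + \frac{\mu}{\gamma}\Lambda\right)^{-1} \in \mathrm{PTr}(\mathcal{H}).$$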
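The inverse formula just stated writes $[\Lambda + cI]^{-1}$, with $c = \gamma/\mu$, as a scalar multiple of the identity plus a correction built from $\Lambda$. As a quick sanity check, the sketch below verifies the identity numerically for a random symmetric matrix standing in for $\Lambda$; the finite-dimensional setting and the helper name `inverse_via_formula` are assumptions made for illustration only.

```python
import numpy as np


def inverse_via_formula(Lam, c):
    """(Lam + c I)^{-1} written as (1/c) I - (1/c)^2 Lam (I + Lam / c)^{-1}.

    Assumption: Lam is a symmetric matrix with Lam + c I positive definite, c > 0.
    """
    I = np.eye(Lam.shape[0])
    return I / c - (Lam @ np.linalg.inv(I + Lam / c)) / c**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 5))
    c = 0.7
    Lam = X @ X.T / 5 - 0.2 * np.eye(5)   # eigenvalues >= -0.2, so Lam + c I > 0
    direct = np.linalg.inv(Lam + c * np.eye(5))
    print(np.max(np.abs(direct - inverse_via_formula(Lam, c))))
```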
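It follows from Corollary 1 that
$$\det\nolimits_X\left[\frac{\alpha[\Lambda + (\gamma/\mu)I]^{p} + \beta[(\Lambda + (\gamma/\mu)I)^{-1}]^{q}}{\alpha+\beta}\right] \geq \left[\frac{(\gamma/\mu)^{p}}{(\mu/\gamma)^{q}}\right]^{\frac{\alpha}{\alpha+\beta} - \delta}\det\nolimits_X[\Lambda + (\gamma/\mu)I]^{p\delta}\,\det\nolimits_X[\Lambda + (\gamma/\mu)I]^{-q(1-\delta)}$$
$$= \left(\frac{\gamma}{\mu}\right)^{(p+q)\left(\frac{\alpha}{\alpha+\beta} - \delta\right)}\det\nolimits_X[\Lambda + (\gamma/\mu)I]^{p\delta}\,\det\nolimits_X[\Lambda + (\gamma/\mu)I]^{-q(1-\delta)}, \tag{A.1}$$
where
$$\delta = \frac{\alpha(\gamma/\mu)^{p}}{\alpha(\gamma/\mu)^{p} + \beta(\mu/\gamma)^{q}} = \frac{\alpha(\gamma/\mu)^{p+q}}{\alpha(\gamma/\mu)^{p+q} + \beta}, \quad 1-\delta = \frac{\beta(\mu/\gamma)^{q}}{\alpha(\gamma/\mu)^{p} + \beta(\mu/\gamma)^{q}} = \frac{\beta}{\alpha(\gamma/\mu)^{p+q} + \beta}.$$
For the two determinants on the right hand side of (A.1) to cancel each other out, we need
$$p\delta = q(1-\delta) \iff \alpha p\left(\frac{\gamma}{\mu}\right)^{p} = \beta q\left(\frac{\mu}{\gamma}\right)^{q} \iff \alpha p\left(\frac{\gamma}{\mu}\right)^{p+q} = \beta q.$$
Assuming that this condition holds, then along with the definition of $D^{(\alpha,\beta)}_{(p,q)}$, (A.1) gives
$$\left[\left(\frac{\gamma}{\mu}\right)^{(p+q)\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}\det\nolimits_X\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha+\beta}\right)\right] \geq 1 \iff D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I), (B+\mu I)] \geq 0.$$
In the inequality in (A.1), the equality sign happens if and only if
$$[\Lambda + (\gamma/\mu)I]^{p} = [\Lambda + (\gamma/\mu)I]^{-q} \iff [\Lambda + (\gamma/\mu)I]^{p+q} = I.$$
If $p + q = 0$, then this is always true, so that $D^{(\alpha,\beta)}_{(p,q)}[(A+\gamma I), (B+\mu I)] = 0$ for all pairs $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, which is not what we want. In fact, with $p + q = 0$, the condition $\alpha p\left(\frac{\gamma}{\mu}\right)^{p+q} = \beta q$ gives $(\alpha+\beta)p = 0 \Rightarrow p = 0 \Rightarrow q = 0$. If $p + q \neq 0$, since $\Lambda + (\gamma/\mu)I > 0$, this happens if and only if
$$\Lambda + (\gamma/\mu)I = I \iff (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = I \iff A + \gamma I = B + \mu I \iff A = B \text{ and } \gamma = \mu.$$
This completes the proof.

Proof of Theorem 7. Under the condition $p + q = r$, by Theorem 6, we have
$$\alpha p\left(\frac{\gamma}{\mu}\right)^{r} = \beta(r - p) \Rightarrow p = \frac{\beta r}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}.$$
It follows then that $q = r - p = \frac{r\alpha(\gamma/\mu)^{r}}{\alpha(\gamma/\mu)^{r} + \beta}$. The equivalence of Eqs. (8) and (9) follows from Proposition 1.

Proof of Theorem 8.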
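We have
$$\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha+\beta} = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}\left(\frac{\mu}{\gamma}\Lambda + I\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}\left(\frac{\mu}{\gamma}\Lambda + I\right)^{-q}}{\alpha+\beta} = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}(I + C_1) + \beta\left(\frac{\gamma}{\mu}\right)^{-q}(I + C_2)}{\alpha+\beta}$$
$$= \frac{\left[\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}\right]I + \left[\alpha\left(\frac{\gamma}{\mu}\right)^{p}C_1 + \beta\left(\frac{\gamma}{\mu}\right)^{-q}C_2\right]}{\alpha+\beta} = \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\left[I + \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}C_1 + \beta\left(\frac{\gamma}{\mu}\right)^{-q}C_2}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right],$$
where $C_1 = \sum_{k=1}^{\infty}\frac{p^k}{k!}\left[\log\left(\frac{\mu}{\gamma}\Lambda + I\right)\right]^{k} \in \mathrm{Tr}(\mathcal{H})$, $C_2 = \sum_{k=1}^{\infty}(-1)^k\frac{q^k}{k!}\left[\log\left(\frac{\mu}{\gamma}\Lambda + I\right)\right]^{k} \in \mathrm{Tr}(\mathcal{H})$. By definition of the $\det_X$ function, we then have
$$\log\det\nolimits_X\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha+\beta}\right] = \log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \log\det\left[I + \frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}C_1 + \beta\left(\frac{\gamma}{\mu}\right)^{-q}C_2}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right]$$
$$= \log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \log\det\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right].$$
This, together with the definition of $D^{(\alpha,\beta)}_{(p,q)}$, gives us the desired expression.

Appendix A.2. Proofs for the Affine-invariant Riemannian distance

In this section, we prove Theorem 9. We first need the following preliminary results.

Lemma 2. Let $\gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Let $\delta = \frac{\gamma^r}{\gamma^r + 1}$. Then
$$\lim_{\alpha \to 0}\frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2} = \frac{[r'(0)]^2}{4}\log\gamma. \tag{A.2}$$
In particular, for $r = 2\alpha$, we have
$$\lim_{\alpha \to 0}\frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2} = \log\gamma. \tag{A.3}$$

Proof of Lemma 2. By L'Hopital's rule applied twice, we obtain
$$\lim_{\alpha \to 0}\frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2} = \lim_{\alpha \to 0}\frac{r(\gamma^r - 1)}{2\alpha^2(\gamma^r + 1)} = \lim_{\alpha \to 0}\frac{r(\gamma^r - 1)}{4\alpha^2} = \lim_{\alpha \to 0}\frac{r'(\alpha)(\gamma^r - 1) + r\gamma^r r'(\alpha)\log\gamma}{8\alpha}$$
$$= \lim_{\alpha \to 0}\frac{r''(\alpha)(\gamma^r - 1) + \gamma^r(r'(\alpha))^2\log\gamma + \gamma^r(r'(\alpha))^2\log\gamma}{8} + \lim_{\alpha \to 0}\frac{r\gamma^r(r'(\alpha)\log\gamma)^2 + r\gamma^r r''(\alpha)\log\gamma}{8} = \frac{[r'(0)]^2\log\gamma}{4}.$$
This completes the proof.

Lemma 3. Let $\gamma > 0$ be fixed. Let $\lambda > 0$ be fixed. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^r}{\gamma^r + 1}$, $p = r(1-\delta)$, $q = r\delta$.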
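Then
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\lambda^{p} + \lambda^{-q}}{2}\right) = \frac{[r'(0)]^2}{4}\left[-(\log\gamma)(\log\lambda) + \frac{1}{2}(\log\lambda)^2\right]. \tag{A.4}$$
In particular, if $\gamma = \lambda$, then
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\gamma^{p} + \gamma^{-q}}{2}\right) = -\frac{[r'(0)]^2}{8}(\log\gamma)^2. \tag{A.5}$$

Proof of Lemma 3. For $p$, $q$ sufficiently small,
$$\lambda^{p} = e^{p\log\lambda} = 1 + p\log\lambda + \frac{p^2}{2}(\log\lambda)^2 + O(p^3), \quad \lambda^{-q} = e^{-q\log\lambda} = 1 - q\log\lambda + \frac{q^2}{2}(\log\lambda)^2 + O(q^3).$$
Thus for $\alpha$ sufficiently small, so that $p = O(\alpha)$, $q = O(\alpha)$, we have
$$\frac{\lambda^{p} + \lambda^{-q}}{2} = 1 + \frac{p - q}{2}\log\lambda + \frac{p^2 + q^2}{4}(\log\lambda)^2 + O(p^3, q^3) = 1 + r\left(\frac{1}{2} - \delta\right)(\log\lambda) + \frac{r^2}{4}\left[(1-\delta)^2 + \delta^2\right](\log\lambda)^2 + O(\alpha^3).$$
By Lemma 2, we have
$$\lim_{\alpha \to 0}\frac{r\left(\frac{1}{2} - \delta\right)}{\alpha^2} = -\frac{[r'(0)]^2}{4}\log\gamma.$$
We have by L'Hopital's rule
$$\lim_{\alpha \to 0}\frac{r^2}{\alpha^2} = \lim_{\alpha \to 0}\frac{2r r'(\alpha)}{2\alpha} = \lim_{\alpha \to 0}\left([r'(\alpha)]^2 + r r''(\alpha)\right) = [r'(0)]^2.$$
Since $\lim_{\alpha \to 0}\delta = \frac{1}{2}$, it follows then that
$$\lim_{\alpha \to 0}\frac{r^2}{4\alpha^2}\left[(1-\delta)^2 + \delta^2\right] = \frac{[r'(0)]^2}{8}.$$
Combining these limits with $\lim_{x \to 0}\frac{\log(1 + ax)}{x} = a$, we obtain
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\lambda^{p} + \lambda^{-q}}{2}\right) = \frac{[r'(0)]^2}{4}\left[-(\log\gamma)(\log\lambda) + \frac{1}{2}(\log\lambda)^2\right].$$
This completes the proof of the lemma.

Lemma 4. Let $\gamma > 0$ be fixed. Let $\lambda \in \mathbb{R}$ be fixed such that $\lambda + \gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^r}{\gamma^r + 1}$, $p = r(1-\delta)$, $q = r\delta$. Then
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) = \frac{[r'(0)]^2}{8}[\log(\lambda+\gamma) - \log\gamma]^2 = \frac{[r'(0)]^2}{8}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2. \tag{A.6}$$
In particular, for $r = r(\alpha) = 2\alpha$, we have
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) = \frac{1}{2}[\log(\lambda+\gamma) - \log\gamma]^2 = \frac{1}{2}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2. \tag{A.7}$$

Proof of Lemma 4.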
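We have by Lemma 3
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) = \lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{2}\right) - \lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{\gamma^{p} + \gamma^{-q}}{2}\right)$$
$$= \frac{[r'(0)]^2}{4}\left\{-(\log\gamma)[\log(\lambda+\gamma)] + \frac{1}{2}[\log(\lambda+\gamma)]^2 - \left[-\frac{1}{2}(\log\gamma)^2\right]\right\} = \frac{[r'(0)]^2}{8}[\log(\lambda+\gamma) - \log\gamma]^2 = \frac{[r'(0)]^2}{8}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\right]^2.$$
This completes the proof.

Lemma 5. Let $\gamma > 0$ be fixed. Let $\lambda \in \mathbb{R}$ be fixed such that $\lambda + \gamma > 0$. Assume that $r = r(\alpha)$ is smooth, with $r(0) = 0$. Define $\delta = \frac{\gamma^r}{\gamma^r + 1}$, $p = r(1-\delta)$, $q = r\delta$. Then
$$\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}} \geq 1, \tag{A.8}$$
$$\log\left(\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}}\right) \geq 0. \tag{A.9}$$

Proof of Lemma 5. By Theorem 5, we have
$$\frac{(\lambda+\gamma)^{p} + (\lambda+\gamma)^{-q}}{\gamma^{p} + \gamma^{-q}} = \frac{\gamma^{p}}{\gamma^{p} + \gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{p} + \frac{\gamma^{-q}}{\gamma^{p} + \gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{-q} = s\left(\frac{\lambda}{\gamma} + 1\right)^{p} + (1-s)\left(\frac{\lambda}{\gamma} + 1\right)^{-q},$$
where the weight $s$ (which plays the role of $\alpha$ in Theorem 5) is
$$s = \frac{\gamma^{p}}{\gamma^{p} + \gamma^{-q}} = \frac{\gamma^{p+q}}{\gamma^{p+q} + 1} = \frac{\gamma^{r}}{\gamma^{r} + 1} = \delta,$$
so that
$$s\left(\frac{\lambda}{\gamma} + 1\right)^{p} + (1-s)\left(\frac{\lambda}{\gamma} + 1\right)^{-q} \geq \left(\frac{\lambda}{\gamma} + 1\right)^{p\delta}\left(\frac{\lambda}{\gamma} + 1\right)^{-q(1-\delta)} = \left(\frac{\lambda}{\gamma} + 1\right)^{(p+q)\delta - q} = \left(\frac{\lambda}{\gamma} + 1\right)^{r\delta - q} = 1,$$
since $q = r\delta$. This completes the proof.

Proof of Theorem 9. For $\alpha = \beta$, we have
$$\delta = \frac{(\gamma/\mu)^{r}}{(\gamma/\mu)^{r} + 1}, \quad p = r(1-\delta), \quad q = r\delta.$$
Let $\{\lambda_j\}_{j \in \mathbb{N}}$ be the eigenvalues of $\Lambda$. By Theorem 8, we have
$$D^{(\alpha,\alpha)}_r[(A+\gamma I), (B+\mu I)] = \frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2}\log\frac{\gamma}{\mu} + \frac{1}{\alpha^2}\log\left(\frac{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}{2}\right) + \frac{1}{\alpha^2}\log\det\left[\frac{\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right]$$
$$= \frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2}\log\frac{\gamma}{\mu} + \frac{1}{\alpha^2}\log\left(\frac{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}{2}\right) + \frac{1}{\alpha^2}\sum_{j=1}^{\infty}\log\left(\frac{\left(\lambda_j + \frac{\gamma}{\mu}\right)^{p} + \left(\lambda_j + \frac{\gamma}{\mu}\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right).$$
By Lemma 2, we have
$$\lim_{\alpha \to 0}\frac{r\left(\delta - \frac{1}{2}\right)}{\alpha^2}\log\frac{\gamma}{\mu} = \frac{[r'(0)]^2}{4}\left(\log\frac{\gamma}{\mu}\right)^2.$$
By Lemma 3, we have
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left(\frac{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}{2}\right) = -\frac{[r'(0)]^2}{8}\left(\log\frac{\gamma}{\mu}\right)^2.$$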
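By Lemma 4, we have
$$\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\det\left[\frac{\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right] = \lim_{\alpha \to 0}\frac{1}{\alpha^2}\sum_{j=1}^{\infty}\log\left[\frac{\left(\lambda_j + \frac{\gamma}{\mu}\right)^{p} + \left(\lambda_j + \frac{\gamma}{\mu}\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right] = \sum_{j=1}^{\infty}\lim_{\alpha \to 0}\frac{1}{\alpha^2}\log\left[\frac{\left(\lambda_j + \frac{\gamma}{\mu}\right)^{p} + \left(\lambda_j + \frac{\gamma}{\mu}\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right]$$
by Lebesgue's Monotone Convergence Theorem, since $\log\left[\frac{\left(\lambda_j + \frac{\gamma}{\mu}\right)^{p} + \left(\lambda_j + \frac{\gamma}{\mu}\right)^{-q}}{(\gamma/\mu)^{p} + (\gamma/\mu)^{-q}}\right] \geq 0$ for all $j \in \mathbb{N}$ by Lemma 5,
$$= \frac{[r'(0)]^2}{8}\sum_{j=1}^{\infty}\left[\log\left(\lambda_j + \frac{\gamma}{\mu}\right) - \log\frac{\gamma}{\mu}\right]^2 = \frac{[r'(0)]^2}{8}\sum_{j=1}^{\infty}\left[\log\left(\frac{\lambda_j\mu}{\gamma} + 1\right)\right]^2.$$
Summing up these three expressions, we obtain
$$\lim_{\alpha \to 0}D^{(\alpha,\alpha)}_r[(A+\gamma I), (B+\mu I)] = \frac{[r'(0)]^2}{8}\left[\left(\log\frac{\gamma}{\mu}\right)^2 + \sum_{j=1}^{\infty}\left(\log\left(\frac{\lambda_j\mu}{\gamma} + 1\right)\right)^2\right] = \frac{[r'(0)]^2}{8}\left(\left(\log\frac{\gamma}{\mu}\right)^2 + \left\|\log\left(\frac{\Lambda\mu}{\gamma} + I\right)\right\|^2_{\mathrm{HS}}\right)$$
$$= \frac{[r'(0)]^2}{8}\left\|\log\left(\Lambda + \frac{\gamma}{\mu}I\right)\right\|^2_{\mathrm{eHS}} = \frac{[r'(0)]^2}{8}\left\|\log[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]\right\|^2_{\mathrm{eHS}} = \frac{[r'(0)]^2}{8}\, d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+\mu I)].$$
This completes the proof.

Appendix A.3. Proofs for the Alpha Log-Determinant divergences

In this section, we prove Theorem 10.

Proof of Theorem 10. The proof for the cases $\alpha = 0$ and $\alpha = 1$ is a special case of the results discussed at the end of Section 5.3. Consider now the case $0 < \alpha < 1$. We first note that
$$d^{1-2\alpha}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\frac{\det_X(\alpha(A+\gamma I) + (1-\alpha)(B+\mu I))}{\det_X(A+\gamma I)^{q}\,\det_X(B+\mu I)^{1-q}}\right] + \frac{q - \alpha}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu},$$
where $q = \frac{\alpha\gamma}{\alpha\gamma + (1-\alpha)\mu}$. By Definition 6, we have
$$D^{(\alpha,1-\alpha)}_r[(A+\gamma I), (B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\left(\frac{\gamma}{\mu}\right)^{r(\delta-\alpha)}\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r\delta}\right)\right]$$
$$= \frac{r(\delta-\alpha)}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu} + \frac{1}{\alpha(1-\alpha)}\log\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r\delta}\right).$$
By Proposition 1, we have
$$\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{r(1-\delta)} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r\delta}\right) = \det\nolimits_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta)} + (1-\alpha)[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta}\right]$$
$$= \det\nolimits_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta}\,\det\nolimits_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r} + (1-\alpha)I\right].$$
In particular, for $r = 1$, we have
$$\det\nolimits_X\left[\alpha(A+\gamma I)(B+\mu I)^{-1} + (1-\alpha)I\right] = \frac{\det_X[\alpha(A+\gamma I) + (1-\alpha)(B+\mu I)]}{\det_X(B+\mu I)}.$$
Thus it follows that
$$\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{1-\delta} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-\delta}\right) = \frac{\det_X[\alpha(A+\gamma I) + (1-\alpha)(B+\mu I)]}{\det_X(A+\gamma I)^{\delta}\,\det_X(B+\mu I)^{1-\delta}}.$$
Also for $r = 1$, in Definition 6, we have $\delta = \delta(r = 1) = \frac{\alpha\gamma}{\alpha\gamma + (1-\alpha)\mu}$. Combining all of these expressions and comparing with the expressions for $d^{1-2\alpha}_{\mathrm{logdet}}$, we obtain the first desired statement.

For $r = -1$, we have
$$D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I), (B+\mu I)] = \frac{-(\delta_{-1} - \alpha)}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu} + \frac{1}{\alpha(1-\alpha)}\log\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-(1-\delta_{-1})} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\delta_{-1}}\right),$$
where $\delta_{-1} = \delta(r = -1) = \frac{\alpha\frac{1}{\gamma}}{\alpha\frac{1}{\gamma} + (1-\alpha)\frac{1}{\mu}} = \frac{\alpha\mu}{\alpha\mu + (1-\alpha)\gamma}$. Similar to the case $r = 1$, we have
$$\det\nolimits_X\left[\alpha[(A+\gamma I)(B+\mu I)^{-1}]^{-1} + (1-\alpha)I\right] = \frac{\det_X[(1-\alpha)(A+\gamma I) + \alpha(B+\mu I)]}{\det_X(A+\gamma I)}.$$
Thus it follows that
$$\det\nolimits_X\left(\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-(1-\delta_{-1})} + (1-\alpha)\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\delta_{-1}}\right) = \frac{\det_X[(1-\alpha)(A+\gamma I) + \alpha(B+\mu I)]}{\det_X(A+\gamma I)^{1-\delta_{-1}}\,\det_X(B+\mu I)^{\delta_{-1}}}.$$
On the other hand, we have
$$d^{2\alpha-1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)] = \frac{1}{\alpha(1-\alpha)}\log\left[\frac{\det_X((1-\alpha)(A+\gamma I) + \alpha(B+\mu I))}{\det_X(A+\gamma I)^{p}\,\det_X(B+\mu I)^{1-p}}\right] + \frac{p - (1-\alpha)}{\alpha(1-\alpha)}\log\frac{\gamma}{\mu},$$
where $p = \frac{(1-\alpha)\gamma}{(1-\alpha)\gamma + \alpha\mu} = 1 - \delta_{-1}$. Combining all of these expressions, we obtain the second desired statement, namely
$$D^{(\alpha,1-\alpha)}_{-1}[(A+\gamma I), (B+\mu I)] = d^{2\alpha-1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)].$$
This completes the proof.

Appendix A.4. Proofs for the other limiting cases

In this section, we prove Theorems 11 and 12. We need the following preliminary results.

Lemma 6.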
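Let $\mathcal{H}$ be a separable Hilbert space. Let $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ be such that $A + I > 0$. Then for all $\alpha \in \mathbb{R}$, the operator $(A+I)^{\alpha}$ is well defined and $(A+I)^{\alpha} - I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. Equivalently, let $\{\lambda_k\}_{k \in \mathbb{N}}$ be the eigenvalues of $A$, then
$$\mathrm{tr}[(A+I)^{\alpha} - I] = \sum_{k=1}^{\infty}[(\lambda_k + 1)^{\alpha} - 1] \tag{A.10}$$
has a finite value.

Proof of Lemma 6. By Lemma 3 in [17], if $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and $A + I > 0$, then $\log(A+I) \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. By definition of the power function, we have
$$(A+I)^{\alpha} = \exp[\alpha\log(A+I)] = I + \sum_{j=1}^{\infty}\frac{\alpha^j}{j!}[\log(A+I)]^j.$$
Since $\mathrm{Tr}(\mathcal{H})$ is a Banach algebra under the trace norm, we have
$$\|(A+I)^{\alpha} - I\|_{\mathrm{tr}} = \left\|\sum_{j=1}^{\infty}\frac{\alpha^j}{j!}[\log(A+I)]^j\right\|_{\mathrm{tr}} \leq \sum_{j=1}^{\infty}\frac{|\alpha|^j}{j!}\|\log(A+I)\|^j_{\mathrm{tr}} = \exp(|\alpha|\,\|\log(A+I)\|_{\mathrm{tr}}) - 1 < \infty.$$
Thus $(A+I)^{\alpha} - I \in \mathrm{Tr}(\mathcal{H})$. The equivalent statement is then obvious. This completes the proof.

Lemma 7. Let $\mathcal{H}$ be a separable Hilbert space. Assume that $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$. Then for any $\alpha \in \mathbb{R}$, we have $(A+\gamma I)^{\alpha} - \gamma^{\alpha}I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and
$$\mathrm{tr}[(A+\gamma I)^{\alpha} - \gamma^{\alpha}I] = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right], \tag{A.11}$$
$$\mathrm{tr}_X[(A+\gamma I)^{\alpha}] = \gamma^{\alpha}\left\{1 + \mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right]\right\}. \tag{A.12}$$

Proof of Lemma 7. By definition of the power function, we have
$$(A+\gamma I)^{\alpha} = \exp[\alpha\log(A+\gamma I)] = \exp\left[(\alpha\log\gamma)I + \alpha\log\left(\frac{A}{\gamma} + I\right)\right] = \gamma^{\alpha}\left(\frac{A}{\gamma} + I\right)^{\alpha} = \gamma^{\alpha}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] + \gamma^{\alpha}I,$$
where $\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] \in \mathrm{Tr}(\mathcal{H})$ by Lemma 6. Thus it follows that $(A+\gamma I)^{\alpha} - \gamma^{\alpha}I \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$ and
$$\mathrm{tr}[(A+\gamma I)^{\alpha} - \gamma^{\alpha}I] = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right],$$
which is the first identity. By definition of the extended trace
$$\mathrm{tr}_X[(A+\gamma I)^{\alpha}] = \mathrm{tr}_X\left([(A+\gamma I)^{\alpha} - \gamma^{\alpha}I] + \gamma^{\alpha}I\right) = \gamma^{\alpha}\,\mathrm{tr}\left[\left(\frac{A}{\gamma} + I\right)^{\alpha} - I\right] + \gamma^{\alpha},$$
which is the second identity. This completes the proof.

Lemma 8. Let $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$. Let $\Lambda + \frac{\gamma}{\mu}I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$.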
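Then for any $\alpha \in \mathbb{R}$,
$$\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{tr}_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}, \tag{A.13}$$
$$\det\nolimits_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \det\nolimits_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}\right] = \det\nolimits_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}. \tag{A.14}$$

Proof of Lemma 8. By Proposition 1, we have
$$[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = (B+\mu I)^{1/2}\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}(B+\mu I)^{-1/2}.$$
Similarly,
$$[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha} = (B+\mu I)^{-1/2}\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}(B+\mu I)^{1/2}.$$
By the commutativity of the $\mathrm{tr}_X$ operation (Lemma 4 in [17]), we then have
$$\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \mathrm{tr}_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}.$$
Similarly, by the product property of the $\det_X$ operation (Proposition 4 in [17]),
$$\det\nolimits_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha} = \det\nolimits_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\alpha}\right] = \det\nolimits_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha}.$$
This completes the proof.

Lemma 9. Assume that $\lambda > 0$, $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^{r}}{\alpha\gamma^{r} + \beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\lim_{\beta \to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\lambda^{p} + \beta\lambda^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left[(\log\lambda)\frac{r(0)}{\gamma^{r(0)}} + \lambda^{-r(0)} - 1\right]. \tag{A.15}$$
In particular, for $\lambda = \gamma$, we have
$$\lim_{\beta \to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^{p} + \beta\gamma^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left\{[(\log\gamma)r(0) + 1]\gamma^{-r(0)} - 1\right\}. \tag{A.16}$$

Proof of Lemma 9. We have for $\alpha > 0$, $\lim_{\beta \to 0}\delta = 1$, $\lim_{\beta \to 0}p = 0$, $\lim_{\beta \to 0}q = r(0)$, so that $\lim_{\beta \to 0}(\alpha\lambda^{p} + \beta\lambda^{-q}) = \alpha$. With $p = r(1-\delta) = \frac{r\beta}{\alpha\gamma^{r} + \beta}$, we have
$$\frac{\partial p}{\partial\beta} = \frac{\left(\frac{\partial r}{\partial\beta}\beta + r\right)(\alpha\gamma^{r} + \beta) - r\beta\left(\alpha\gamma^{r}\log\gamma\frac{\partial r}{\partial\beta} + 1\right)}{(\alpha\gamma^{r} + \beta)^2}, \quad \lim_{\beta \to 0}\frac{\partial p}{\partial\beta} = \frac{r(0)}{\alpha\gamma^{r(0)}}.$$
With $q = r\delta = \frac{r\alpha\gamma^{r}}{\alpha\gamma^{r} + \beta}$, we have
$$\frac{\partial q}{\partial\beta} = \frac{\left(\frac{\partial r}{\partial\beta}\alpha\gamma^{r} + r\alpha\gamma^{r}\log\gamma\frac{\partial r}{\partial\beta}\right)(\alpha\gamma^{r} + \beta) - r\alpha\gamma^{r}\left(\alpha\gamma^{r}\log\gamma\frac{\partial r}{\partial\beta} + 1\right)}{(\alpha\gamma^{r} + \beta)^2}, \quad \lim_{\beta \to 0}\frac{\partial q}{\partial\beta} = \frac{\partial r}{\partial\beta}(0) - \frac{r(0)}{\alpha\gamma^{r(0)}}.$$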
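The required limit is of the form $\frac{0}{0}$ and L'Hôpital's rule can be applied to give
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\lambda^{p} + \beta\lambda^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha}\lim_{\beta\to 0}\frac{\alpha+\beta}{\alpha\lambda^{p} + \beta\lambda^{-q}}\cdot\frac{\left[\alpha\lambda^{p}(\log\lambda)\frac{\partial p}{\partial\beta} + \lambda^{-q} - \beta\lambda^{-q}(\log\lambda)\frac{\partial q}{\partial\beta}\right](\alpha+\beta) - (\alpha\lambda^{p} + \beta\lambda^{-q})}{(\alpha+\beta)^2}$$
$$= \frac{\alpha(\log\lambda)\frac{\partial p}{\partial\beta}(0) + \lambda^{-r(0)} - 1}{\alpha^2} = \frac{1}{\alpha^2}\left[(\log\lambda)\frac{r(0)}{\gamma^{r(0)}} + \lambda^{-r(0)} - 1\right].$$
This completes the proof.

Lemma 10. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $\lambda \in \mathbb{R}$ is also fixed, such that $\lambda + \gamma > 0$. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) = \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - \gamma^{-r(0)}\right]. \tag{A.17}$$

Proof of Lemma 10. By Lemma 9, we have
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) = \lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha+\beta}\right) - \lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^{p} + \beta\gamma^{-q}}{\alpha+\beta}\right)$$
$$= \frac{1}{\alpha^2}\left[\log(\lambda+\gamma)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - 1\right] - \frac{1}{\alpha^2}\left[(\log\gamma)\frac{r(0)}{\gamma^{r(0)}} + \gamma^{-r(0)} - 1\right]$$
$$= \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda}{\gamma} + 1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
This completes the proof.

Lemma 11. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $\lambda \in \mathbb{R}$ is also fixed, such that $\lambda + \gamma > 0$. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta}$, $p = r(1-\delta)$, $q = r\delta$, we have
$$\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}} \geq 1, \tag{A.18}$$
$$\log\left(\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) \geq 0. \tag{A.19}$$

Proof of Lemma 11. We proceed as in the proof of Lemma 5, by applying Theorem 5 as follows:
$$\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}} = \frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{p} + \frac{\beta\gamma^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\left(\frac{\lambda}{\gamma} + 1\right)^{-q} = s\left(\frac{\lambda}{\gamma} + 1\right)^{p} + (1-s)\left(\frac{\lambda}{\gamma} + 1\right)^{-q},$$
where
$$s = \frac{\alpha\gamma^{p}}{\alpha\gamma^{p} + \beta\gamma^{-q}} = \frac{\alpha\gamma^{p+q}}{\alpha\gamma^{p+q} + \beta} = \frac{\alpha\gamma^{r}}{\alpha\gamma^{r} + \beta} = \delta.$$
Hence
$$\frac{\alpha(\lambda+\gamma)^{p} + \beta(\lambda+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}} \geq \left(\frac{\lambda}{\gamma} + 1\right)^{p\delta}\left(\frac{\lambda}{\gamma} + 1\right)^{-q(1-\delta)} = \left(\frac{\lambda}{\gamma} + 1\right)^{(p+q)\delta - q} = \left(\frac{\lambda}{\gamma} + 1\right)^{r\delta - q} = 1,$$
since $r\delta = q$. This completes the proof.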
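Lemmas 9 through 11 concern scalar functions only, so they can be sanity-checked numerically. The following is a minimal sketch in Python and is not part of the original proofs. It assumes, for simplicity, that $r(\beta)$ is constant, so that $r(\beta) = r(0)$; the function names are illustrative only.

```python
import math


def delta_p_q(alpha, beta, gamma, r):
    """Return (delta, p, q) as defined in Lemmas 9-11 for a given value r = r(beta)."""
    delta = alpha * gamma**r / (alpha * gamma**r + beta)
    return delta, r * (1.0 - delta), r * delta


def lemma9_quotient(lam, alpha, beta, gamma, r):
    """Left-hand side of (A.15) before taking the limit beta -> 0."""
    _, p, q = delta_p_q(alpha, beta, gamma, r)
    inner = (alpha * lam**p + beta * lam**(-q)) / (alpha + beta)
    return math.log(inner) / (alpha * beta)


def lemma9_limit(lam, alpha, gamma, r0):
    """Closed-form right-hand side of (A.15), with r0 = r(0)."""
    return (math.log(lam) * r0 / gamma**r0 + lam**(-r0) - 1.0) / alpha**2


def lemma11_ratio(lam, alpha, beta, gamma, r):
    """Ratio appearing in (A.18); requires lam + gamma > 0."""
    _, p, q = delta_p_q(alpha, beta, gamma, r)
    x = lam + gamma
    return (alpha * x**p + beta * x**(-q)) / (alpha * gamma**p + beta * gamma**(-q))


if __name__ == "__main__":
    # Assumed sample values; r is held constant in beta (an assumption of this sketch).
    alpha, gamma, r0, lam = 1.5, 0.7, 2.0, 2.0
    target = lemma9_limit(lam, alpha, gamma, r0)
    for beta in (1e-2, 1e-4, 1e-6):
        value = lemma9_quotient(lam, alpha, beta, gamma, r0)
        print(f"beta={beta:g}: quotient={value:.8f}, limit={target:.8f}")

    # Smallest value of the ratio in (A.18) over a small grid with lam + gamma > 0.
    smallest = min(
        lemma11_ratio(l, alpha, b, gamma, r)
        for l in (-0.5, 0.0, 0.3, 5.0)
        for b in (0.1, 1.0, 10.0)
        for r in (-2.0, 0.5, 3.0)
    )
    print("smallest ratio on the grid:", smallest)
```

As $\beta$ decreases, the printed quotient should approach the closed-form limit, and the smallest ratio on the grid should not fall below 1 beyond floating-point rounding. A check of this kind only illustrates the statements for the chosen values; it does not replace the proofs.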
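Lemma 12. Assume that $\gamma > 0$, $\alpha > 0$ are fixed. Assume that $r = r(\beta)$ is smooth. Then for $\delta = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta}$,
$$\lim_{\beta\to 0}\frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)} + 1]. \tag{A.20}$$

Proof of Lemma 12. We first have
$$\frac{\partial\delta}{\partial\beta} = \frac{\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta}(\alpha\gamma^r + \beta) - \alpha\gamma^r\left(\alpha\gamma^r\log\gamma\,\frac{\partial r}{\partial\beta} + 1\right)}{(\alpha\gamma^r + \beta)^2}, \quad \lim_{\beta\to 0}\frac{\partial\delta}{\partial\beta} = -\frac{1}{\alpha\gamma^{r(0)}}.$$
Since the required limit has the form $\frac{0}{0}$, we apply L'Hôpital's rule to get
$$\lim_{\beta\to 0}\frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \lim_{\beta\to 0}\frac{1}{\alpha}\left[\frac{\partial r}{\partial\beta}\left(\delta - \frac{\alpha}{\alpha+\beta}\right) + r\left(\frac{\partial\delta}{\partial\beta} + \frac{\alpha}{(\alpha+\beta)^2}\right)\right] = \frac{1}{\alpha}\left[r(0)\left(-\frac{1}{\alpha\gamma^{r(0)}} + \frac{1}{\alpha}\right)\right] = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)} + 1].$$
This completes the proof.

Proof of Theorem 11. Let $\{\lambda_j\}_{j=1}^{\infty}$ be the eigenvalues of $\Lambda$. By Theorem 8, we have
$$D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\left(\frac{\gamma}{\mu}\right) + \frac{1}{\alpha\beta}\log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right)$$
$$= \frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\log\left(\frac{\gamma}{\mu}\right) + \frac{1}{\alpha\beta}\log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\sum_{j=1}^{\infty}\log\left(\frac{\alpha\left(\lambda_j + \frac{\gamma}{\mu}\right)^{p} + \beta\left(\lambda_j + \frac{\gamma}{\mu}\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right),$$
where
$$p = p(\beta) = r(1-\delta) = \frac{r\beta}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}, \quad q = q(\beta) = r\delta = \frac{r\alpha\left(\frac{\gamma}{\mu}\right)^{r}}{\alpha\left(\frac{\gamma}{\mu}\right)^{r} + \beta}.$$
For $\alpha > 0$ fixed, as functions of $\beta$, we have $\lim_{\beta\to 0}p(\beta) = 0$, $\lim_{\beta\to 0}q(\beta) = r(0)$. For simplicity, in the following, we replace $\frac{\gamma}{\mu}$ by $\gamma$. By Lemma 9,
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha\gamma^{p} + \beta\gamma^{-q}}{\alpha+\beta}\right) = \frac{1}{\alpha^2}\left\{[(\log\gamma)r(0) + 1]\gamma^{-r(0)} - 1\right\}.$$
By Lemma 10,
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda_j+\gamma)^{p} + \beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) = \frac{1}{\alpha^2}\left[\log\left(\frac{\lambda_j}{\gamma} + 1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda_j+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
By Lemma 11, we have
$$\log\left(\frac{\alpha(\lambda_j+\gamma)^{p} + \beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) \geq 0 \quad \forall j \in \mathbb{N},$$
so that by Lebesgue's Monotone Convergence Theorem, we obtain
$$\lim_{\beta\to 0}\frac{1}{\alpha\beta}\sum_{j=1}^{\infty}\log\left(\frac{\alpha(\lambda_j+\gamma)^{p} + \beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right) = \sum_{j=1}^{\infty}\lim_{\beta\to 0}\frac{1}{\alpha\beta}\log\left(\frac{\alpha(\lambda_j+\gamma)^{p} + \beta(\lambda_j+\gamma)^{-q}}{\alpha\gamma^{p} + \beta\gamma^{-q}}\right)$$
$$= \frac{1}{\alpha^2}\sum_{j=1}^{\infty}\left[\log\left(\frac{\lambda_j}{\gamma} + 1\right)\frac{r(0)}{\gamma^{r(0)}} + (\lambda_j+\gamma)^{-r(0)} - \gamma^{-r(0)}\right].$$
By Lemma 12,
$$\log(\gamma)\lim_{\beta\to 0}\frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta} = \frac{1}{\alpha^2}r(0)[-\gamma^{-r(0)} + 1]\log(\gamma).$$
Combining all three expressions, we obtain the desired limit as the sum
$$\frac{1}{\alpha^2}\left[\gamma^{-r(0)} + r(0)\log(\gamma) - 1\right] + \frac{1}{\alpha^2}\left\{\frac{r(0)}{\gamma^{r(0)}}\sum_{j=1}^{\infty}\log\left(\frac{\lambda_j}{\gamma} + 1\right) + \sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right]\right\}. \tag{A.21}$$
By Lemmas 6 and 7, we have
$$\sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right] = \gamma^{-r(0)}\sum_{j=1}^{\infty}\left[\left(\frac{\lambda_j}{\gamma} + 1\right)^{-r(0)} - 1\right] = \gamma^{-r(0)}\,\mathrm{tr}\left[\left(\frac{\Lambda}{\gamma} + I\right)^{-r(0)} - I\right] = \mathrm{tr}[(\Lambda+\gamma I)^{-r(0)} - \gamma^{-r(0)}I].$$
Thus it follows that
$$\gamma^{-r(0)} - 1 + \sum_{j=1}^{\infty}\left[\frac{1}{(\lambda_j+\gamma)^{r(0)}} - \frac{1}{\gamma^{r(0)}}\right] = \gamma^{-r(0)} - 1 + \mathrm{tr}[(\Lambda+\gamma I)^{-r(0)} - \gamma^{-r(0)}I] = \mathrm{tr}_X[(\Lambda+\gamma I)^{-r(0)} - I].$$
Furthermore,
$$\frac{r(0)}{\gamma^{r(0)}}\sum_{j=1}^{\infty}\log\left(\frac{\lambda_j}{\gamma} + 1\right) = r(0)\gamma^{-r(0)}\log\det\left(\frac{\Lambda}{\gamma} + I\right) = r(0)\gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I) - r(0)\gamma^{-r(0)}\log\gamma$$
$$= -\gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I)^{-r(0)} - r(0)\gamma^{-r(0)}\log\gamma.$$
Plugging the last two expressions into (A.21), we obtain the desired limit as
$$\frac{1}{\alpha^2}\left\{r(0)(1 - \gamma^{-r(0)})\log\gamma\right\} + \frac{1}{\alpha^2}\left\{\mathrm{tr}_X[(\Lambda+\gamma I)^{-r(0)} - I] - \gamma^{-r(0)}\log\mathrm{det}_X(\Lambda+\gamma I)^{-r(0)}\right\}. \tag{A.22}$$
We now replace $\gamma$ by $\frac{\gamma}{\mu}$. We have by Lemma 8,
$$\mathrm{tr}_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r(0)}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} = \mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)},$$
$$\mathrm{det}_X\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r(0)} = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} = \mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}.$$
Then (A.22) becomes
$$\frac{r(0)}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r(0)} - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^{r(0)} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r(0)}\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r(0)}.$$
This completes the proof of the theorem.

Proof of Theorem 12. The dual symmetry in Theorem 13 gives
$$\lim_{\alpha\to 0}D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)] = \lim_{\alpha\to 0}D^{(\beta,\alpha)}_r[(B+\mu I), (A+\gamma I)].$$
The limit on the right hand side then follows from Theorem 11.

Appendix A.5.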
Proofs of the properties of the Alpha-Beta Log-Determinant divergences

In this section, we prove Theorems 13, 14, 15, 16, 17, and 18. For the case $\alpha = \beta = 0$, we have
$$D^{(0,0)}_0[(A+\gamma I), (B+\mu I)] = \frac{1}{2}d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+\mu I)],$$
with $d_{\mathrm{aiHS}}$ being the affine-invariant Riemannian distance on $\mathrm{PTr}(\mathcal{H})$. Thus these properties are either automatic or straightforward to verify. We thus focus on the three cases $(\alpha > 0, \beta > 0)$, $(\alpha > 0, \beta = 0)$, and $(\alpha = 0, \beta > 0)$.

Proof of Theorem 13 (Dual symmetry). For the cases $\alpha > 0, \beta = 0$ and $\alpha = 0, \beta > 0$, from Eqs. (10) and (11), we immediately have
$$D^{(\alpha,0)}_r[(A+\gamma I), (B+\mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r} - 1\right]\log\frac{\mu}{\gamma} + \frac{1}{\alpha^2}\mathrm{tr}_X\left([(A+\gamma I)^{-1}(B+\mu I)]^{r} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r}\log\mathrm{det}_X[(A+\gamma I)^{-1}(B+\mu I)]^{r}$$
$$= D^{(0,\alpha)}_r[(B+\mu I), (A+\gamma I)].$$
Consider now the case $\alpha > 0$, $\beta > 0$. Writing $\delta = \delta(\alpha,\beta)$ to emphasize its dependence on $\alpha$ and $\beta$, we have $\delta(\alpha,\beta) = \frac{\alpha\gamma^r}{\alpha\gamma^r + \beta\mu^r}$ in $D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)]$. Then for $D^{(\beta,\alpha)}_r[(B+\mu I), (A+\gamma I)]$, we have
$$\delta(\beta,\alpha) = \frac{\beta\mu^r}{\alpha\gamma^r + \beta\mu^r} = 1 - \delta(\alpha,\beta), \quad 1 - \delta(\beta,\alpha) = \delta(\alpha,\beta),$$
$$\delta(\beta,\alpha) - \frac{\beta}{\alpha+\beta} = 1 - \delta(\alpha,\beta) - \frac{\beta}{\alpha+\beta} = -\left(\delta(\alpha,\beta) - \frac{\alpha}{\alpha+\beta}\right).$$
By Definition 1, we have
$$D^{(\beta,\alpha)}_r[(B+\mu I), (A+\gamma I)] = \frac{1}{\alpha\beta}\log\left(\frac{\mu}{\gamma}\right)^{r\left(\delta(\beta,\alpha) - \frac{\beta}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\beta[(B+\mu I)(A+\gamma I)^{-1}]^{r(1-\delta(\beta,\alpha))} + \alpha[(B+\mu I)(A+\gamma I)^{-1}]^{-r\delta(\beta,\alpha)}}{\alpha+\beta}\right)$$
$$= \frac{1}{\alpha\beta}\log\left(\frac{\gamma}{\mu}\right)^{r\left(\delta(\alpha,\beta) - \frac{\alpha}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\beta[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta(\alpha,\beta)} + \alpha[(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta(\alpha,\beta))}}{\alpha+\beta}\right)$$
$$= D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)].$$
This completes the proof of the theorem.

Proof of Theorem 14 (Dual invariance under inversion).
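We have
$$(A+\gamma I)^{-1} = \frac{1}{\gamma}I - \frac{A}{\gamma}(A+\gamma I)^{-1}, \quad (B+\mu I)^{-1} = \frac{1}{\mu}I - \frac{B}{\mu}(B+\mu I)^{-1},$$
$$(B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]^{-1}.$$
Consider the case $\alpha > 0$, $\beta > 0$. By Definition 1, we have
$$D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = \frac{1}{\alpha\beta}\log\left(\frac{1/\gamma}{1/\mu}\right)^{r\left(\delta_2 - \frac{\alpha}{\alpha+\beta}\right)} + \frac{1}{\alpha\beta}\log\mathrm{det}_X\left(\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r(1-\delta_2)} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{r\delta_2}}{\alpha+\beta}\right),$$
where
$$\delta_2 = \frac{\alpha(1/\gamma)^r}{\alpha(1/\gamma)^r + \beta(1/\mu)^r} = \frac{\alpha\mu^r}{\alpha\mu^r + \beta\gamma^r} = \delta(-r).$$
Thus
$$D^{(\alpha,\beta)}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha,\beta)}_{-r}[(A+\gamma I), (B+\mu I)].$$
Consider the case $\alpha = 0$, $\beta > 0$ (the case $\alpha > 0$, $\beta = 0$ then follows by dual symmetry). We have
$$D^{(0,\beta)}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = \frac{r}{\beta^2}\left[\left(\frac{1/\gamma}{1/\mu}\right)^{r} - 1\right]\log\frac{1/\gamma}{1/\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)(A+\gamma I)^{-1}]^{r} - I\right) - \frac{1}{\beta^2}\left(\frac{1/\gamma}{1/\mu}\right)^{r}\log\mathrm{det}_X[(B+\mu I)(A+\gamma I)^{-1}]^{r}$$
$$= -\frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{-r} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(A+\gamma I)(B+\mu I)^{-1}]^{-r} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{-r}\log\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r}.$$
By Lemma 8, we have
$$\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{tr}_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r},$$
$$\mathrm{det}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{det}_X\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r}\right] = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r}.$$
Thus it follows that
$$D^{(0,\beta)}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = -\frac{r}{\beta^2}\left[\left(\frac{\gamma}{\mu}\right)^{-r} - 1\right]\log\frac{\gamma}{\mu} + \frac{1}{\beta^2}\mathrm{tr}_X\left([(B+\mu I)^{-1}(A+\gamma I)]^{-r} - I\right) - \frac{1}{\beta^2}\left(\frac{\gamma}{\mu}\right)^{-r}\log\mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r}$$
$$= D^{(0,\beta)}_{-r}[(A+\gamma I), (B+\mu I)].$$
This completes the proof.

Proof of Theorem 15 (Affine-invariance). We have for $(A+\gamma I) \in \mathrm{PTr}(\mathcal{H})$ and $(C+\nu I) \in \mathrm{Tr}_X(\mathcal{H})$, $\nu \neq 0$,
$$(C+\nu I)(A+\gamma I)(C+\nu I)^{*} = CAC^{*} + \nu(CA + AC^{*}) + \nu^2 A + \gamma CC^{*} + \gamma\nu(C + C^{*}) + \gamma\nu^2 I \in \mathrm{Tr}_X(\mathcal{H}).$$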
Since $(C+\nu I)$ is assumed to be invertible, the operator $(C+\nu I)(A+\gamma I)(C+\nu I)^{*}$ is also invertible, with inverse $[(C+\nu I)^{*}]^{-1}(A+\gamma I)^{-1}(C+\nu I)^{-1}$. Furthermore, for all $x \in \mathcal{H}$,
$$\langle x, (C+\nu I)(A+\gamma I)(C+\nu I)^{*}x\rangle = \langle (C+\nu I)^{*}x, (A+\gamma I)(C+\nu I)^{*}x\rangle \geq M_A\|(C+\nu I)^{*}x\|^2 \geq 0,$$
where $M_A > 0$ is a constant such that $\langle y, (A+\gamma I)y\rangle \geq M_A\|y\|^2$ for all $y \in \mathcal{H}$, with equality if and only if $(C+\nu I)^{*}x = 0 \Longleftrightarrow x = 0$. Thus $(C+\nu I)(A+\gamma I)(C+\nu I)^{*}$ is strictly positive. Together with its invertibility, this shows that it is a positive definite operator. Hence $(C+\nu I)(A+\gamma I)(C+\nu I)^{*} \in \mathrm{PTr}(\mathcal{H})$.

For two operators $(A+\gamma I), (B+\mu I) \in \mathrm{PTr}(\mathcal{H})$, we then have
$$[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1} = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}](C+\nu I)^{-1}.$$
Then for any $p \in \mathbb{R}$, we have
$$\left([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1}\right)^{p} = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}]^{p}(C+\nu I)^{-1}.$$
Thus for any $a, b > 0$ and any $p, q \in \mathbb{R}$,
$$a\left([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1}\right)^{p} + b\left([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1}\right)^{q}$$
$$= (C+\nu I)\left(a[(A+\gamma I)(B+\mu I)^{-1}]^{p} + b[(A+\gamma I)(B+\mu I)^{-1}]^{q}\right)(C+\nu I)^{-1}.$$
By the definition of $D^{(\alpha,\beta)}_r$ and the following invariances of the extended Fredholm determinant $\mathrm{det}_X$ as well as of the extended trace operation $\mathrm{tr}_X$, namely,
$$\mathrm{det}_X[C(A+\gamma I)C^{-1}] = \mathrm{det}_X[(A+\gamma I)], \quad \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)],$$
for $A+\gamma I \in \mathrm{Tr}_X(\mathcal{H})$, $\gamma \neq 0$, and $C \in \mathcal{L}(\mathcal{H})$ invertible (Lemma 5 in [17]), we then obtain the desired affine invariance for $D^{(\alpha,\beta)}_r$, namely
$$D^{(\alpha,\beta)}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}, (C+\nu I)(B+\mu I)(C+\nu I)^{*}] = D^{(\alpha,\beta)}_r[(A+\gamma I), (B+\mu I)].$$
This completes the proof.

Proof of Theorem 16 (Invariance under unitary transformations).
The proof of this theorem is similar to that of Theorem 15, using the fact that $C^{*} = C^{-1}$ and the properties
$$\mathrm{det}_X[C(A+\gamma I)C^{-1}] = \mathrm{det}_X[(A+\gamma I)], \quad \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)],$$
of the operations $\mathrm{det}_X$ and $\mathrm{tr}_X$.

Proof of Theorem 17. For the case $\alpha > 0$, $\beta > 0$, this follows immediately from Definition 1. For the case $\alpha > 0$, $\beta = 0$, by Definition 2 and Lemma 8, we have
$$D^{(\alpha,0)}_r[(A+\gamma I), (B+\mu I)] = \frac{r}{\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{r} - 1\right]\log\left(\frac{\mu}{\gamma}\right) + \frac{1}{\alpha^2}\mathrm{tr}_X\left(\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r} - I\right) - \frac{1}{\alpha^2}\left(\frac{\mu}{\gamma}\right)^{r}\log\mathrm{det}_X\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-r}$$
$$= D^{(\alpha,0)}_r\left[\left(\Lambda + \frac{\gamma}{\mu}I\right), I\right].$$
The case $\alpha = 0$, $\beta > 0$ is entirely similar.

Proof of Theorem 18. We first note that $\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\omega} = \left(\frac{\gamma}{\mu}\right)^{\omega}\left(\frac{\mu}{\gamma}\Lambda + I\right)^{\omega}$. Then for $\alpha > 0$, $\beta > 0$, the statement of the theorem follows immediately from Definition 1. For the case $\alpha > 0$, $\beta = 0$, by Definition 2 and Lemma 8, we have
$$D^{(\omega\alpha,0)}_{\omega r}[(A+\gamma I), (B+\mu I)] = \frac{r}{\omega^2\alpha^2}\left[\left(\frac{\mu}{\gamma}\right)^{\omega r} - 1\right]\log\left(\frac{\mu}{\gamma}\right)^{\omega} + \frac{1}{\omega^2\alpha^2}\mathrm{tr}_X\left(\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-\omega r} - I\right) - \frac{1}{\omega^2\alpha^2}\left(\frac{\mu}{\gamma}\right)^{\omega r}\log\mathrm{det}_X\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-\omega r}$$
$$= \frac{1}{\omega^2}D^{(\alpha,0)}_r\left[\left(\Lambda + \frac{\gamma}{\mu}I\right)^{\omega}, I\right].$$
The case $\alpha = 0$, $\beta > 0$ is entirely similar.

Appendix A.6. Proofs of Theorems 1, 2, and 3

We are now ready to provide the proofs for Theorems 1, 2, and 3. For the proof of positivity, we first need the following technical result.

Lemma 13. (i) Let $r \neq 0$ be fixed. The function $f(x) = x^{r} - 1 - r\log(x)$ for $x > 0$ has a unique global minimum $f_{\min} = f(1) = 0$. In other words, $f(x) \geq 0$ for all $x > 0$, with equality if and only if $x = 1$.
(ii) Let $\nu > 0$, $r \neq 0$ be fixed. The function $g(x) = \left(\frac{x}{\nu} + 1\right)^{r} - 1 - r\log\left(\frac{x}{\nu} + 1\right)$ for $x > -\nu$ has a unique global minimum $g_{\min} = g(0) = 0$. In other words, $g(x) \geq 0$ for all $x > -\nu$, with equality if and only if $x = 0$.

Proof of Lemma 13. (i) We have $f'(x) = \frac{r(x^{r} - 1)}{x}$. When $r > 0$, we have $x^{r} < 1$ for $0 < x < 1$ and $x^{r} > 1$ for $x > 1$.
When $r < 0$, we have $x^{r} > 1$ for $0 < x < 1$ and $x^{r} < 1$ for $x > 1$. Thus, for all $r \neq 0$, we have $f'(x) < 0$ when $0 < x < 1$ and $f'(x) > 0$ when $x > 1$. Hence $f$ has a unique global minimum $f_{\min} = f(1) = 0$.
(ii) The proof for $g$ follows that for $f$ by the change of variable $y = \frac{x}{\nu} + 1$.

Proof of Theorem 1 (Positivity). For the case $\alpha > 0$, $\beta > 0$, this is a special case of Theorem 6, with $p + q = r$. Consider now the case $\alpha = 0$, $\beta > 0$ (the case $\alpha > 0$, $\beta = 0$ then follows by dual symmetry). For the proof of positivity, we can ignore the positive factor $\frac{1}{\beta^2}$ and thus it suffices to consider $D^{(0,1)}_r$. We recall that we define $\Lambda + \nu I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}$, where $\nu = \frac{\gamma}{\mu}$. Then, since
$$\mathrm{det}_X[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{det}_X[(B+\mu I)^{-1}(A+\gamma I)]$$
and
$$\mathrm{tr}_X[(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)],$$
we have
$$D^{(0,1)}_r[(A+\gamma I), (B+\mu I)] = r(\nu^{r} - 1)\log\nu + \mathrm{tr}_X[(\Lambda+\nu I)^{r} - I] - \nu^{r}\log\mathrm{det}_X(\Lambda+\nu I)^{r}.$$
By Lemma 7,
$$\mathrm{tr}_X[(\Lambda+\nu I)^{r} - I] = \nu^{r} - 1 + \nu^{r}\,\mathrm{tr}\left[\left(\frac{\Lambda}{\nu} + I\right)^{r} - I\right].$$
Also
$$\log\mathrm{det}_X(\Lambda+\nu I)^{r} = \log\left[\nu^{r}\det\left(\frac{\Lambda}{\nu} + I\right)^{r}\right] = r\log\det\left(\frac{\Lambda}{\nu} + I\right) + r\log\nu.$$
Thus we have
$$D^{(0,1)}_r[(A+\gamma I), (B+\mu I)] = \nu^{r} - 1 - r\log\nu + \nu^{r}\left\{\mathrm{tr}\left[\left(\frac{\Lambda}{\nu} + I\right)^{r} - I\right] - r\log\det\left(\frac{\Lambda}{\nu} + I\right)\right\}$$
$$= \nu^{r} - 1 - r\log\nu + \nu^{r}\left[\sum_{k=1}^{\infty}\left(\left(\frac{\lambda_k}{\nu} + 1\right)^{r} - 1 - r\log\left(\frac{\lambda_k}{\nu} + 1\right)\right)\right].$$
By the first part of Lemma 13, we have for all $\nu > 0$
$$\nu^{r} - 1 - r\log\nu \geq 0,$$
with equality if and only if $\nu = 1$. By the second part of Lemma 13, we have for all $k \in \mathbb{N}$
$$\left(\frac{\lambda_k}{\nu} + 1\right)^{r} - 1 - r\log\left(\frac{\lambda_k}{\nu} + 1\right) \geq 0,$$
with equality if and only if $\lambda_k = 0$. Combining these two inequalities, we obtain
$$D^{(0,1)}_r[(A+\gamma I), (B+\mu I)] \geq 0,$$
with equality if and only if $\nu = \frac{\gamma}{\mu} = 1$ and $\lambda_k = 0$ for all $k \in \mathbb{N}$, that is $\Lambda = 0$, which holds if and only if
$$(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = I \Longleftrightarrow A + \gamma I = B + \mu I \Longleftrightarrow A = B \text{ and } \gamma = \mu.$$
This completes the proof.

Proof of Theorem 2 (Special cases - I). The first statement of the theorem is the content of Theorem 9. The second statement is the content of Theorem 10.

Proof of Theorem 3 (Special cases - II). This theorem follows from Theorems 9 and 10 as well as the symmetry of $D^{(\alpha,\alpha)}_r[(A+\gamma I), (B+\mu I)]$ as proved in Theorem 13.

Appendix A.7. Proofs for the divergences between RKHS covariance operators

In this section, we prove Theorems 23, 24, 25, and 26. We first need the following preliminary results.

Lemma 14. Let $\mathcal{H}_1$, $\mathcal{H}_2$ be separable Hilbert spaces. Let $A: \mathcal{H}_1 \to \mathcal{H}_2$ and $B: \mathcal{H}_2 \to \mathcal{H}_1$ be compact linear operators such that both $AB: \mathcal{H}_2 \to \mathcal{H}_2$ and $BA: \mathcal{H}_1 \to \mathcal{H}_1$ are trace class operators. Let $\alpha, \beta > 0$ be fixed. For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha(AB + I_{\mathcal{H}_2})^{p} + \beta(AB + I_{\mathcal{H}_2})^{q}}{\alpha+\beta}\right) = \det\left(\frac{\alpha(BA + I_{\mathcal{H}_1})^{p} + \beta(BA + I_{\mathcal{H}_1})^{q}}{\alpha+\beta}\right). \tag{A.23}$$

Proof of Lemma 14. Since the nonzero eigenvalues of $AB: \mathcal{H}_2 \to \mathcal{H}_2$ and $BA: \mathcal{H}_1 \to \mathcal{H}_1$ are the same, we have for any $p \in \mathbb{R}$
$$\det[(AB + I_{\mathcal{H}_2})^{p}] = \det[(BA + I_{\mathcal{H}_1})^{p}].$$
For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha(AB + I_{\mathcal{H}_2})^{p} + \beta(AB + I_{\mathcal{H}_2})^{q}}{\alpha+\beta}\right) = \det\left(\frac{\alpha(BA + I_{\mathcal{H}_1})^{p} + \beta(BA + I_{\mathcal{H}_1})^{q}}{\alpha+\beta}\right).$$
In the above equality, we have used the fact that a zero eigenvalue of $AB$ and $BA$ corresponds to an eigenvalue equal to 1 for $\frac{\alpha(AB + I_{\mathcal{H}_2})^{p} + \beta(AB + I_{\mathcal{H}_2})^{q}}{\alpha+\beta}: \mathcal{H}_2 \to \mathcal{H}_2$ and $\frac{\alpha(BA + I_{\mathcal{H}_1})^{p} + \beta(BA + I_{\mathcal{H}_1})^{q}}{\alpha+\beta}: \mathcal{H}_1 \to \mathcal{H}_1$, respectively, which does not change the determinant. This completes the proof.

Lemma 15. Let $\mathcal{H}_1$, $\mathcal{H}_2$ be separable Hilbert spaces. Let $A, B: \mathcal{H}_1 \to \mathcal{H}_2$ be compact linear operators such that both $AA^{*}: \mathcal{H}_2 \to \mathcal{H}_2$ and $BB^{*}: \mathcal{H}_2 \to \mathcal{H}_2$ are trace class operators. Let $\alpha, \beta > 0$ be fixed.
For any $p, q \in \mathbb{R}$,
$$\det\left(\frac{\alpha[(AA^{*} + I_{\mathcal{H}_2})(BB^{*} + I_{\mathcal{H}_2})^{-1}]^{p} + \beta[(AA^{*} + I_{\mathcal{H}_2})(BB^{*} + I_{\mathcal{H}_2})^{-1}]^{q}}{\alpha+\beta}\right) = \det\left(\frac{\alpha(C + I_{\mathcal{H}_1}\otimes I_3)^{p} + \beta(C + I_{\mathcal{H}_1}\otimes I_3)^{q}}{\alpha+\beta}\right), \tag{A.24}$$
where
$$C = \begin{pmatrix} A^{*}A & -A^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -A^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \\ B^{*}A & -B^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -B^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \\ B^{*}A & -B^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -B^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \end{pmatrix}. \tag{A.25}$$

Proof of Lemma 15. We make use of the following notation. Let $A, B, C: \mathcal{H}_1 \to \mathcal{H}_2$ be three bounded linear operators. Consider the operator $(A \; B \; C): \mathcal{H}_1^3 \to \mathcal{H}_2$, with
$$(A \; B \; C)^{*} = \begin{pmatrix} A^{*} \\ B^{*} \\ C^{*} \end{pmatrix}: \mathcal{H}_2 \to \mathcal{H}_1^3.$$
Here $\mathcal{H}_1^3 = \mathcal{H}_1 \oplus \mathcal{H}_1 \oplus \mathcal{H}_1$ denotes the direct sum of $\mathcal{H}_1$ with itself, that is
$$\mathcal{H}_1^3 = \mathcal{H}_1 \oplus \mathcal{H}_1 \oplus \mathcal{H}_1 = \{(v_1, v_2, v_3) : v_1, v_2, v_3 \in \mathcal{H}_1\},$$
equipped with the inner product
$$\langle (v_1, v_2, v_3), (w_1, w_2, w_3)\rangle_{\mathcal{H}_1^3} = \langle v_1, w_1\rangle_{\mathcal{H}_1} + \langle v_2, w_2\rangle_{\mathcal{H}_1} + \langle v_3, w_3\rangle_{\mathcal{H}_1}.$$
If $\{e_i\}_{i=1}^{\infty}$ is an orthonormal basis for $\mathcal{H}_1$, then
$$\{(e_i, 0, 0)\}_{i=1}^{\infty} \cup \{(0, e_i, 0)\}_{i=1}^{\infty} \cup \{(0, 0, e_i)\}_{i=1}^{\infty}$$
is an orthonormal basis for $\mathcal{H}_1^3$. We now utilize this notation in our setting. By the Sherman-Morrison-Woodbury formula, we have
$$(BB^{*} + I_{\mathcal{H}_2})^{-1} = I_{\mathcal{H}_2} - B(I_{\mathcal{H}_1} + B^{*}B)^{-1}B^{*}.$$
Thus it follows that
$$(AA^{*} + I_{\mathcal{H}_2})(BB^{*} + I_{\mathcal{H}_2})^{-1} = I_{\mathcal{H}_2} + AA^{*} - B(I_{\mathcal{H}_1} + B^{*}B)^{-1}B^{*} - AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1}B^{*} = I_{\mathcal{H}_2} + C_1C_2.$$
Here the operators $C_1$, $C_2$ are defined as follows:
$$C_1 = \left[A \quad -B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \quad -AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1}\right]: \mathcal{H}_1^3 \to \mathcal{H}_2, \quad C_2 = \begin{pmatrix} A^{*} \\ B^{*} \\ B^{*} \end{pmatrix}: \mathcal{H}_2 \to \mathcal{H}_1^3.$$
The operator $C_2C_1: \mathcal{H}_1^3 \to \mathcal{H}_1^3$ is given by
$$C_2C_1 = \begin{pmatrix} A^{*}A & -A^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -A^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \\ B^{*}A & -B^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -B^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \\ B^{*}A & -B^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} & -B^{*}AA^{*}B(I_{\mathcal{H}_1} + B^{*}B)^{-1} \end{pmatrix}.$$
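It follows from Lemma 14 that
$$\det\left(\frac{\alpha[(AA^{*} + I_{\mathcal{H}_2})(BB^{*} + I_{\mathcal{H}_2})^{-1}]^{p} + \beta[(AA^{*} + I_{\mathcal{H}_2})(BB^{*} + I_{\mathcal{H}_2})^{-1}]^{q}}{\alpha+\beta}\right) = \det\left(\frac{\alpha(I_{\mathcal{H}_2} + C_1C_2)^{p} + \beta(I_{\mathcal{H}_2} + C_1C_2)^{q}}{\alpha+\beta}\right)$$
$$= \det\left(\frac{\alpha(C_2C_1 + I_{\mathcal{H}_1}\otimes I_3)^{p} + \beta(C_2C_1 + I_{\mathcal{H}_1}\otimes I_3)^{q}}{\alpha+\beta}\right).$$
This completes the proof.

Proof of Theorem 23. Let $\Lambda + \frac{\gamma}{\mu}I = (BB^{*} + \mu I_{\mathcal{H}_2})^{-1/2}(AA^{*} + \gamma I)(BB^{*} + \mu I)^{-1/2}$ and $Z + \frac{\gamma}{\mu}I = (AA^{*} + \gamma I)(BB^{*} + \mu I)^{-1}$, with $\frac{\mu}{\gamma}Z + I = \left(\frac{AA^{*}}{\gamma} + I\right)\left(\frac{BB^{*}}{\mu} + I\right)^{-1}$. By Theorem 8, we have
$$D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}), (BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{r\left(\delta - \frac{\alpha}{\alpha+\beta}\right)}{\alpha\beta}\left(\log\frac{\gamma}{\mu}\right) + \frac{1}{\alpha\beta}\log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right],$$
with $p = r(1-\delta)$ and $q = r\delta$. The determinant in the last term is
$$\det\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right] = \det\left[\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}\left(\frac{\mu}{\gamma}\Lambda + I\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}\left(\frac{\mu}{\gamma}\Lambda + I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right]$$
$$= \det\left[\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}\left(\frac{\mu}{\gamma}Z + I\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}\left(\frac{\mu}{\gamma}Z + I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right] = \det\left[\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}(C + I_{\mathcal{H}_1}\otimes I_3)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}(C + I_{\mathcal{H}_1}\otimes I_3)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right]$$
by Lemma 15, where
$$C = \begin{pmatrix} \frac{A^{*}A}{\gamma} & -\frac{A^{*}B}{\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{A^{*}AA^{*}B}{\gamma\sqrt{\gamma\mu}}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma\mu}} & -\frac{B^{*}B}{\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} & -\frac{B^{*}AA^{*}B}{\gamma\mu}\left(I_{\mathcal{H}_1} + \frac{B^{*}B}{\mu}\right)^{-1} \end{pmatrix},$$
which is obtained by replacing $AA^{*}$ and $BB^{*}$ in Lemma 15 with $\frac{AA^{*}}{\gamma}$ and $\frac{BB^{*}}{\mu}$, respectively. This completes the proof of the theorem.

Proof of Theorem 24. Let $Z + \frac{\gamma}{\mu}I = (AA^{*} + \gamma I)(BB^{*} + \mu I)^{-1}$. By the finite-dimensional formula given in Eq.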
(19), we have
$$D^{(\alpha,\beta)}_r[(AA^{*} + \gamma I_{\mathcal{H}_2}), (BB^{*} + \mu I_{\mathcal{H}_2})] = \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha\left(Z + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(Z + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha+\beta}\right]$$
$$= \frac{1}{\alpha\beta}\left[\log\left(\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}{\alpha+\beta}\right)\right]\dim(\mathcal{H}_2) + \frac{1}{\alpha\beta}\log\det\left[\frac{\alpha\left(Z + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(Z + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right].$$
As in the proof of Theorem 23, the determinant in the last term in the above expression is
$$\det\left[\frac{\alpha\left(\Lambda + \frac{\gamma}{\mu}I\right)^{p} + \beta\left(\Lambda + \frac{\gamma}{\mu}I\right)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right] = \det\left[\frac{\alpha\left(\frac{\gamma}{\mu}\right)^{p}(C + I_{\mathcal{H}_1}\otimes I_3)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}(C + I_{\mathcal{H}_1}\otimes I_3)^{-q}}{\alpha\left(\frac{\gamma}{\mu}\right)^{p} + \beta\left(\frac{\gamma}{\mu}\right)^{-q}}\right].$$
This gives us the final expression.

Proof of Theorem 25. We consider the linear operators
$$A = \frac{1}{\sqrt{m}}\Phi(\mathbf{x})J_m: \mathbb{R}^m \to \mathcal{H}_K, \quad B = \frac{1}{\sqrt{m}}\Phi(\mathbf{y})J_m: \mathbb{R}^m \to \mathcal{H}_K.$$
The desired expression then follows from Theorem 23.

Proof of Theorem 26. This is proved in the same way as Theorem 25, except that we invoke Theorem 24.

Appendix A.8. Proofs for the metric properties

In this section, we prove Theorems 19, 20, 21, which lead to the proofs of Theorems 4 and 22. We present two sets of separate proofs for Theorems 4 and 22, one simpler proof for the particular case $\alpha = 1/2$, which corresponds to the infinite-dimensional symmetric Stein divergence, and one general proof for any $\alpha > 0$. The former case utilizes Theorem 28 and the latter case utilizes Theorem 30, both of which should be of interest in their own right.

Appendix A.8.1. The case of the infinite-dimensional symmetric Stein divergence

Consider the first case $\alpha = 1/2$, which corresponds to the infinite-dimensional symmetric Stein divergence.

Lemma 16. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C: \mathcal{H} \to \mathcal{H}$ be self-adjoint finite-rank operators, such that $A + I > 0$, $B + I > 0$, $C + I > 0$. Then
$$\sqrt{\log\frac{\det\left(\frac{A+B}{2} + I\right)}{\sqrt{\det(A+I)\det(B+I)}}} \leq \sqrt{\log\frac{\det\left(\frac{A+C}{2} + I\right)}{\sqrt{\det(A+I)\det(C+I)}}} + \sqrt{\log\frac{\det\left(\frac{C+B}{2} + I\right)}{\sqrt{\det(C+I)\det(B+I)}}}. \tag{A.26}$$

Proof of Lemma 16.
Since $A, B, C$ are all finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$ for some $n \in \mathbb{N}$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$, and $\mathrm{range}(C) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $C_n = C|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n, C_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We also have $(A+B)_n = (A+B)|_{\mathcal{H}_n} = A|_{\mathcal{H}_n} + B|_{\mathcal{H}_n} = A_n + B_n$, $(A+C)_n = A_n + C_n$, $(C+B)_n = B_n + C_n$. Applying the finite-dimensional result in [16], we then obtain
$$\sqrt{\log\frac{\det\left(\frac{A_n+B_n}{2} + I_n\right)}{\sqrt{\det(A_n+I_n)\det(B_n+I_n)}}} \leq \sqrt{\log\frac{\det\left(\frac{A_n+C_n}{2} + I_n\right)}{\sqrt{\det(A_n+I_n)\det(C_n+I_n)}}} + \sqrt{\log\frac{\det\left(\frac{C_n+B_n}{2} + I_n\right)}{\sqrt{\det(C_n+I_n)\det(B_n+I_n)}}}.$$
It is clear that the non-zero eigenvalues of $A$ and $A_n$ are the same, so that $\det(A+I) = \det(A_n+I_n)$, and the same holds true for the other operators. This gives us the final result.

Proof of Theorem 21 (Triangle inequality - square root of symmetric Stein divergence). Let $\{A_n\}_{n\in\mathbb{N}}$, $\{B_n\}_{n\in\mathbb{N}}$, $\{C_n\}_{n\in\mathbb{N}}$ be sequences of finite-rank operators with
$$\|A_n - A\|_{\mathrm{tr}} \to 0, \quad \|B_n - B\|_{\mathrm{tr}} \to 0, \quad \|C_n - C\|_{\mathrm{tr}} \to 0, \quad \text{as } n \to \infty.$$
By Lemma 16, we have
$$\sqrt{\log\frac{\det\left(\frac{A_n+B_n}{2} + I\right)}{\sqrt{\det(A_n+I)\det(B_n+I)}}} \leq \sqrt{\log\frac{\det\left(\frac{A_n+C_n}{2} + I\right)}{\sqrt{\det(A_n+I)\det(C_n+I)}}} + \sqrt{\log\frac{\det\left(\frac{C_n+B_n}{2} + I\right)}{\sqrt{\det(C_n+I)\det(B_n+I)}}}.$$
By Theorem 3.5 in [22], as $n \to \infty$, we have
$$\det(A_n+I) \to \det(A+I), \quad \det(B_n+I) \to \det(B+I), \quad \det\left(\frac{A_n+B_n}{2} + I\right) \to \det\left(\frac{A+B}{2} + I\right),$$
and the same holds true for the other operators. Thus by taking the limit as $n \to \infty$ in the above triangle inequality for $(A_n+I)$, $(B_n+I)$ and $(C_n+I)$, we obtain the final triangle inequality for $(A+I)$, $(B+I)$, and $(C+I)$.

The following is the specialization of Theorem 4 when $\alpha = 1/2$.
Theorem 27 (Metric property - square root of symmetric Stein divergence). Let $\gamma \in \mathbb{R}$, $\gamma > 0$ be fixed. The square root of the infinite-dimensional symmetric Stein divergence $\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$.

Proof of Theorem 27. We have already shown the positivity and symmetry of $D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]$. It remains for us to show the triangle inequality, namely
$$\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]} \leq \sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (C+\gamma I)]} + \sqrt{D^{(1/2,1/2)}_1[(C+\gamma I), (B+\gamma I)]},$$
for any three operators $(A+\gamma I), (B+\gamma I), (C+\gamma I) \in \mathrm{PTr}(\mathcal{H})$. We have
$$D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)] = 4\log\left[\frac{\mathrm{det}_X\left(\frac{(A+\gamma I) + (B+\gamma I)}{2}\right)}{\mathrm{det}_X(A+\gamma I)^{1/2}\,\mathrm{det}_X(B+\gamma I)^{1/2}}\right] = 4\log\left[\frac{\det\left(\frac{A+B}{2\gamma} + I\right)}{\det\left(\frac{A}{\gamma} + I\right)^{1/2}\det\left(\frac{B}{\gamma} + I\right)^{1/2}}\right].$$
Thus the triangle inequality for $\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]}$ follows from that stated in Theorem 21, applied to the operators $\frac{A}{\gamma}$, $\frac{B}{\gamma}$, $\frac{C}{\gamma}$.

Lemma 17. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint finite-rank operators, with maximum rank $n$, $n \in \mathbb{N}$, such that $A + I > 0$, $B + I > 0$. Then
$$\prod_{j=1}^{n}\left(\frac{\lambda_j(A) + \lambda_j(B)}{2} + 1\right) \leq \det\left(\frac{A+B}{2} + I\right). \tag{A.27}$$

Proof of Lemma 17. Since $A, B$ are both finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n}: \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We also have $(A+B)_n = (A+B)|_{\mathcal{H}_n} = A|_{\mathcal{H}_n} + B|_{\mathcal{H}_n} = A_n + B_n$. Thus we can apply the following inequality for finite-dimensional SPD matrices ([23]):
$$\prod_{j=1}^{n}\left(\frac{\lambda_j(A_n) + \lambda_j(B_n)}{2} + 1\right) = \prod_{j=1}^{n}\left(\frac{\lambda_j(A_n + I_n) + \lambda_j(B_n + I_n)}{2}\right) \leq \det\left(\frac{A_n+B_n}{2} + I_n\right).$$
We note that the non-zero eigenvalues of $A_n, B_n$ are the same as those of $A, B$, respectively, with the maximum number being $n$, and $\det\left(\frac{A+B}{2} + I\right) = \det\left(\frac{A_n+B_n}{2} + I_n\right)$. Together with the previous inequality, this gives us the final result.

Theorem 28. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators, such that $A + I > 0$, $B + I > 0$. Then
$$\prod_{j=1}^{\infty}\left(\frac{\lambda_j(A) + \lambda_j(B)}{2} + 1\right) \leq \det\left(\frac{A+B}{2} + I\right). \tag{A.28}$$

Proof of Theorem 28. Let $A = \sum_{j=1}^{\infty}\lambda_j(A)\,\phi_j \otimes \phi_j$ denote the spectral decomposition for $A$. For each $n \in \mathbb{N}$, define
$$A_n = \sum_{j=1}^{n}\lambda_j(A)\,\phi_j \otimes \phi_j.$$
Then $A_n$ is a finite-rank operator with the eigenvalues being the first $n$ eigenvalues of $A$ and
$$\lim_{n\to\infty}\|A_n - A\|_{\mathrm{tr}} = 0.$$
In the same way, we construct a sequence of finite-rank operators $B_n$ with $\lim_{n\to\infty}\|B_n - B\|_{\mathrm{tr}} = 0$, so that
$$\lim_{n\to\infty}\|(A_n + B_n) - (A + B)\|_{\mathrm{tr}} = 0.$$
By Theorem 3.5 in [22], as $n \to \infty$, we then have
$$\lim_{n\to\infty}\det\left(\frac{A_n+B_n}{2} + I\right) = \det\left(\frac{A+B}{2} + I\right).$$
Applying Lemma 17 to $A_n$ and $B_n$, we have
$$\prod_{j=1}^{n}\left(\frac{\lambda_j(A_n) + \lambda_j(B_n)}{2} + 1\right) \leq \det\left(\frac{A_n+B_n}{2} + I\right). \tag{A.29}$$
The final result is then obtained by taking the limit as $n \to \infty$, noting that the eigenvalues of $A_n, B_n$ are precisely the first $n$ eigenvalues of $A, B$, respectively.

The following is the specialization of Theorem 22 when $\alpha = 1/2$.

Theorem 29. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B: \mathcal{H} \to \mathcal{H}$ be self-adjoint trace class operators, such that $A + I > 0$, $B + I > 0$. Let $\mathrm{Eig}(A), \mathrm{Eig}(B): \ell^2 \to \ell^2$ be diagonal operators with the diagonals consisting of the eigenvalues of $A$ and $B$, respectively, in decreasing order. Then
$$D^{(1/2,1/2)}_1[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)] \leq D^{(1/2,1/2)}_1[(A+I), (B+I)]. \tag{A.30}$$

Proof of Theorem 29.
By definition, we have
$$D^{(1/2,1/2)}_1[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)] = 4\log\left[\frac{\det\left(\frac{\mathrm{Eig}(A) + \mathrm{Eig}(B)}{2} + I\right)}{\sqrt{\det(\mathrm{Eig}(A) + I)\det(\mathrm{Eig}(B) + I)}}\right] = 4\log\left[\frac{\prod_{j=1}^{\infty}\left[\frac{\lambda_j(A) + \lambda_j(B)}{2} + 1\right]}{\sqrt{\det(A+I)\det(B+I)}}\right]$$
$$\leq 4\log\left[\frac{\det\left(\frac{A+B}{2} + I\right)}{\sqrt{\det(A+I)\det(B+I)}}\right] \text{ (by Theorem 28)} = D^{(1/2,1/2)}_1[(A+I), (B+I)].$$
This completes the proof.

Appendix A.8.2. The general case

We now consider the general case $\alpha > 0$. We need the following results. In the following, let $\mathscr{C}_p(\mathcal{H})$ denote the class of $p$th Schatten class operators on $\mathcal{H}$, under the norm $\|\cdot\|_p$, $1 \leq p \leq \infty$, which is defined by
$$\|A\|_p = \left[\sum_{k=1}^{\infty}\lambda_k^{p}\left((A^{*}A)^{1/2}\right)\right]^{1/p}, \tag{A.31}$$
with $\mathscr{C}_1(\mathcal{H})$ being the space of trace class operators $\mathrm{Tr}(\mathcal{H})$, $\mathscr{C}_2(\mathcal{H})$ being the space of Hilbert-Schmidt operators $\mathrm{HS}(\mathcal{H})$, and $\mathscr{C}_{\infty}(\mathcal{H})$ being the set of compact operators under the operator norm $\|\cdot\|$.

Theorem 30. Let $r \in \mathbb{R}$ be fixed but arbitrary. Assume that $1 \leq p \leq \infty$. Let $\{A_n\}_{n\in\mathbb{N}} \subset \mathrm{Sym}(\mathcal{H}) \cap \mathscr{C}_p(\mathcal{H})$, $A \in \mathrm{Sym}(\mathcal{H}) \cap \mathscr{C}_p(\mathcal{H})$ be such that $I + A > 0$, $I + A_n > 0$ for all $n \in \mathbb{N}$. Assume that $\lim_{n\to\infty}\|A_n - A\|_p = 0$. Then
$$\lim_{n\to\infty}\|(I+A_n)^{r} - (I+A)^{r}\|_p = 0. \tag{A.32}$$

Proof of Theorem 30. (i) We first prove that
$$\lim_{n\to\infty}\|(I+A_n)^{r} - (I+A)^{r}\|_p = 0, \quad 0 \leq r \leq 1. \tag{A.33}$$
The case $r = 0$ is trivial. Let us prove this for $0 < r \leq 1$. For this limit, we make use of the following result from [24] (Corollary 3.2), which states that for any two positive operators $A, B$ on $\mathcal{H}$ such that $A \geq c > 0$, $B \geq c > 0$, and any operator $X$ on $\mathcal{H}$,
$$\|A^{r}X - XB^{r}\|_p \leq rc^{r-1}\|AX - XB\|_p, \tag{A.34}$$
where $0 < r \leq 1$ and $\|\cdot\|_p$, $1 \leq p \leq \infty$, denotes the Schatten $p$-norm.

By the assumption $I + A > 0$, there exists $M_A > 0$ such that
$$\langle x, (I+A)x\rangle \geq M_A\|x\|^2 \quad \forall x \in \mathcal{H}.$$
By the assumption $\lim_{n\to\infty}\|A_n - A\|_p = 0$, for any $\epsilon$ satisfying $0 < \epsilon < M_A$, there exists $N = N(\epsilon) \in \mathbb{N}$ such that $\|A_n - A\|_p < \epsilon$ for all $n \geq N$.
Then for all $x \in \mathcal{H}$ and all $n \geq N$,
$$|\langle x, (A_n - A)x\rangle| \leq \|A_n - A\|\,\|x\|^2 \leq \|A_n - A\|_p\|x\|^2 \leq \epsilon\|x\|^2.$$
It thus follows that for all $x \in \mathcal{H}$,
$$\langle x, (I+A_n)x\rangle = \langle x, (I+A)x\rangle + \langle x, (A_n - A)x\rangle \geq (M_A - \epsilon)\|x\|^2.$$
Thus we have
$$I + A \geq M_A > 0, \quad I + A_n \geq M_A - \epsilon > 0 \quad \forall n \geq N = N(\epsilon).$$
Then, applying Eq. (A.34), we have for all $n \geq N$,
$$\|(I+A_n)^{r} - (I+A)^{r}\|_p \leq r(M_A - \epsilon)^{r-1}\|(I+A_n) - (I+A)\|_p = r\left(\frac{1}{M_A - \epsilon}\right)^{1-r}\|A_n - A\|_p,$$
which implies
$$\lim_{n\to\infty}\|(I+A_n)^{r} - (I+A)^{r}\|_p = 0.$$
This completes the proof of the first limit.

(ii) For $r > 1$, we proceed by induction as follows. We have
$$\|(I+A_n)^{r} - (I+A)^{r}\|_p \leq \|(I+A_n)^{r} - (I+A_n)(I+A)^{r-1}\|_p + \|(I+A_n)(I+A)^{r-1} - (I+A)^{r}\|_p$$
$$\leq \|I+A_n\|\,\|(I+A_n)^{r-1} - (I+A)^{r-1}\|_p + \|A_n - A\|_p\|(I+A)^{r-1}\|.$$
Thus this case follows from the case $0 \leq r \leq 1$ by induction.

(iii) We now prove that
$$\lim_{n\to\infty}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = 0. \tag{A.35}$$
We have for all $n \geq N = N(\epsilon)$,
$$\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = \|(I+A_n)^{-1}[(I+A_n) - (I+A)](I+A)^{-1}\|_p \leq \|(I+A_n)^{-1}\|\,\|A_n - A\|_p\|(I+A)^{-1}\| \leq \frac{1}{M_A(M_A - \epsilon)}\|A_n - A\|_p,$$
which implies that $\lim_{n\to\infty}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p = 0$.

(iv) We next prove that
$$\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0, \quad 0 < r \leq 1. \tag{A.36}$$
We have
$$(I+A)^{-1} \geq \frac{1}{\max\{(1 + \lambda_k(A)) : k \in \mathbb{N}\}} = \frac{1}{\|I+A\|} > 0.$$
From the limit $\lim_{n\to\infty}\|A_n - A\| = 0$, it follows that for any $\epsilon$ satisfying $0 < \epsilon < \|I+A\|$, there exists $M = M(\epsilon) \in \mathbb{N}$ such that for all $n \geq M$,
$$\|I+A\| - \epsilon \leq \|I+A_n\| \leq \|I+A\| + \epsilon.$$
It follows that for all $n \geq M$,
$$(I+A_n)^{-1} \geq \frac{1}{\max\{(1 + \lambda_k(A_n)) : k \in \mathbb{N}\}} = \frac{1}{\|I+A_n\|} \geq \frac{1}{\|I+A\| + \epsilon}.$$
Hence, invoking Eq. (A.34) again, we obtain for all $n \geq M$
$$\|(I+A_n)^{-r} - (I+A)^{-r}\|_p \leq r(\|I+A\| + \epsilon)^{1-r}\|(I+A_n)^{-1} - (I+A)^{-1}\|_p,$$
which implies that $\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0$ by the previous limit (A.35), which is the case $r = 1$.

(v) By an induction argument as in step (ii), we then obtain that
$$\lim_{n\to\infty}\|(I+A_n)^{-r} - (I+A)^{-r}\|_p = 0, \quad \forall r > 1. \tag{A.37}$$
This completes the proof.

Lemma 18. Let $\mathcal{H}$ be a separable Hilbert space. Assume that $\{A_n\}_{n\in\mathbb{N}}$, $A$ are trace class operators on $\mathcal{H}$ such that $(I+A) > 0$, $(I+A_n) > 0$ for all $n \in \mathbb{N}$. Assume that $\|A_n - A\|_{\mathrm{tr}} \to 0$ as $n \to \infty$. Then $A_n(I+A_n)^{-1}$ and $A(I+A)^{-1}$ are trace class operators and
$$\lim_{n\to\infty}\|A_n(I+A_n)^{-1} - A(I+A)^{-1}\|_{\mathrm{tr}} = 0. \tag{A.38}$$

Proof of Lemma 18. Since $A_n$ and $A$ are trace class operators and $(I+A_n)^{-1}$, $(I+A)^{-1}$ are bounded, both $A_n(I+A_n)^{-1}$ and $A(I+A)^{-1}$ are trace class operators. We have
$$\|A_n(I+A_n)^{-1} - A(I+A)^{-1}\|_{\mathrm{tr}} = \|(I+A_n)^{-1}A_n - A(I+A)^{-1}\|_{\mathrm{tr}} = \|(I+A_n)^{-1}[A_n(I+A) - (I+A_n)A](I+A)^{-1}\|_{\mathrm{tr}}$$
$$= \|(I+A_n)^{-1}[A_n - A](I+A)^{-1}\|_{\mathrm{tr}} \leq \|(I+A_n)^{-1}\|\,\|A_n - A\|_{\mathrm{tr}}\|(I+A)^{-1}\|.$$
By the assumption $I + A > 0$, there exists $M_A > 0$ such that
$$\langle x, (I+A)x\rangle \geq M_A\|x\|^2 \quad \forall x \in \mathcal{H}.$$
By the assumption $\lim_{n\to\infty}\|A_n - A\|_{\mathrm{tr}} = 0$, for any $\epsilon$ satisfying $0 < \epsilon < M_A$, there exists $N = N(\epsilon) \in \mathbb{N}$ such that $\|A_n - A\|_{\mathrm{tr}} < \epsilon$ for all $n \geq N$. Then for all $x \in \mathcal{H}$,
$$|\langle x, (A_n - A)x\rangle| \leq \|A_n - A\|\,\|x\|^2 \leq \|A_n - A\|_{\mathrm{tr}}\|x\|^2 \leq \epsilon\|x\|^2.$$
It thus follows that for all $x \in \mathcal{H}$,
$$\langle x, (I+A_n)x\rangle = \langle x, (I+A)x\rangle + \langle x, (A_n - A)x\rangle \geq (M_A - \epsilon)\|x\|^2.$$
Thus we have
$$I + A \geq M_A > 0, \quad I + A_n \geq M_A - \epsilon > 0 \quad \forall n \geq N = N(\epsilon),$$
from which it follows that
$$\|(I+A_n)^{-1}\| \leq \frac{1}{M_A - \epsilon} \quad \forall n \geq N(\epsilon), \quad \|(I+A)^{-1}\| \leq \frac{1}{M_A}.$$
Combining this with the first inequality, we have
\[
\|A_n(I+A_n)^{-1} - A(I+A)^{-1}\|_{\mathrm{tr}} \leq \frac{1}{M_A(M_A - \epsilon)}\|A_n - A\|_{\mathrm{tr}} \quad \forall n \geq N,
\]
which implies that $\lim_{n\to\infty}\|A_n(I+A_n)^{-1} - A(I+A)^{-1}\|_{\mathrm{tr}} = 0$. This completes the proof.

Lemma 19. Let $\mathcal{H}$ be a separable Hilbert space. Let $\{A_n\}_{n\in\mathbb{N}}$, $A$, $\{B_n\}_{n\in\mathbb{N}}$, $B$, be self-adjoint, trace class operators on $\mathcal{H}$, with $\lim_{n\to\infty}\|A_n - A\|_{\mathrm{tr}} = 0$, $\lim_{n\to\infty}\|B_n - B\|_{\mathrm{tr}} = 0$. Assume that $I + A > 0$, $I + B > 0$, $I + A_n > 0$, $I + B_n > 0$ $\forall n \in \mathbb{N}$. Then $(I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - I$ and $(I+B)^{-1/2}(I+A)(I+B)^{-1/2} - I$ are self-adjoint, trace class operators on $\mathcal{H}$ and
\[
\lim_{n\to\infty}\|(I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - (I+B)^{-1/2}(I+A)(I+B)^{-1/2}\|_{\mathrm{tr}} = 0. \tag{A.39}
\]

Proof of Lemma 19. Using $(I+B)^{-1} = I - B(I+B)^{-1}$, we write
\[
\begin{aligned}
(I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} &= I - B_n(I+B_n)^{-1} + (I+B_n)^{-1/2}A_n(I+B_n)^{-1/2}, \\
(I+B)^{-1/2}(I+A)(I+B)^{-1/2} &= I - B(I+B)^{-1} + (I+B)^{-1/2}A(I+B)^{-1/2}.
\end{aligned}
\]
It follows immediately that $[(I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - I]$ and $[(I+B)^{-1/2}(I+A)(I+B)^{-1/2} - I]$ are self-adjoint, trace class operators on $\mathcal{H}$. By Lemma 18, we have
\[
\lim_{n\to\infty}\|B_n(I+B_n)^{-1} - B(I+B)^{-1}\|_{\mathrm{tr}} = 0.
\]
Consider next the difference between the third terms of the above two expressions:
\[
\begin{aligned}
&\|(I+B_n)^{-1/2}A_n(I+B_n)^{-1/2} - (I+B)^{-1/2}A(I+B)^{-1/2}\|_{\mathrm{tr}} \\
&\quad\leq \|(I+B_n)^{-1/2}A_n(I+B_n)^{-1/2} - (I+B_n)^{-1/2}A(I+B_n)^{-1/2}\|_{\mathrm{tr}} \\
&\qquad + \|(I+B_n)^{-1/2}A(I+B_n)^{-1/2} - (I+B_n)^{-1/2}A(I+B)^{-1/2}\|_{\mathrm{tr}} \\
&\qquad + \|(I+B_n)^{-1/2}A(I+B)^{-1/2} - (I+B)^{-1/2}A(I+B)^{-1/2}\|_{\mathrm{tr}}.
\end{aligned} \tag{A.40}
\]
By the assumption $I + A > 0$, $I + B > 0$, there exist constants $M_A > 0$, $M_B > 0$ such that $I + A \geq M_A$, $I + B \geq M_B$.
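Remark (numerical illustration). The convergence asserted in Lemma 19 can be observed in finite dimensions, where symmetric matrices stand in for self-adjoint trace class operators and the trace norm is the nuclear norm. The following Python sketch is only an illustration under these assumptions: the helper names `spd_power`, `whiten`, and `random_symmetric`, the dimension, and the perturbation sizes are hypothetical choices and are not part of the original argument.

```python
import numpy as np

def spd_power(M, t):
    """Real power M**t of a symmetric positive definite matrix M."""
    w, V = np.linalg.eigh(M)
    return (V * w**t) @ V.T  # V diag(w**t) V^T

def whiten(A, B):
    """(I+B)^{-1/2} (I+A) (I+B)^{-1/2} for symmetric A, B with I+A > 0, I+B > 0."""
    I = np.eye(A.shape[0])
    X = spd_power(I + B, -0.5)
    return X @ (I + A) @ X

def random_symmetric(rng, n, op_norm):
    """Random symmetric matrix rescaled to a prescribed operator norm."""
    S = rng.standard_normal((n, n))
    S = (S + S.T) / 2.0
    return op_norm * S / np.linalg.norm(S, 2)

rng = np.random.default_rng(0)
n = 6
# Operator norm 0.5 guarantees I + A >= 0.5 and I + B >= 0.5.
A = random_symmetric(rng, n, 0.5)
B = random_symmetric(rng, n, 0.5)
# Perturbation directions with operator norm 0.25, so that the perturbed
# matrices still satisfy I + A_t >= 0.25 and I + B_t >= 0.25 for 0 <= t <= 1.
EA = random_symmetric(rng, n, 0.25)
EB = random_symmetric(rng, n, 0.25)

scales = [10.0 ** (-k) for k in range(7)]
errors = []
for t in scales:
    diff = whiten(A + t * EA, B + t * EB) - whiten(A, B)
    errors.append(np.linalg.norm(diff, "nuc"))  # trace (nuclear) norm

for t, e in zip(scales, errors):
    print(f"perturbation scale {t:.0e}: trace-norm error {e:.3e}")
```

As the perturbation scale shrinks, the trace-norm error should shrink with it, in line with Eq. (A.39).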
As in the proof of Lemma 18, since $\lim_{n\to\infty}\|A_n - A\| = 0$, $\lim_{n\to\infty}\|B_n - B\| = 0$, for any $0 < \epsilon < \min\{M_A, M_B\}$, there exist $N_A = N_A(\epsilon) \in \mathbb{N}$, $N_B = N_B(\epsilon) \in \mathbb{N}$, such that $I + A_n \geq M_A - \epsilon$ $\forall n \geq N_A$, $I + B_n \geq M_B - \epsilon$ $\forall n \geq N_B$. The first term on the right hand side of the inequality in Eq. (A.40) is
\[
\|(I+B_n)^{-1/2}(A_n - A)(I+B_n)^{-1/2}\|_{\mathrm{tr}} \leq \|A_n - A\|_{\mathrm{tr}}\,\|(I+B_n)^{-1/2}\|^2 \leq \frac{1}{M_B - \epsilon}\|A_n - A\|_{\mathrm{tr}} \quad \forall n \geq N_B.
\]
The second term is
\[
\begin{aligned}
\|(I+B_n)^{-1/2}A[(I+B_n)^{-1/2} - (I+B)^{-1/2}]\|_{\mathrm{tr}}
&\leq \|(I+B_n)^{-1/2}\|\,\|A\|_{\mathrm{tr}}\,\|(I+B_n)^{-1/2} - (I+B)^{-1/2}\| \\
&\leq \frac{1}{\sqrt{M_B - \epsilon}}\|A\|_{\mathrm{tr}}\,\|(I+B_n)^{-1/2} - (I+B)^{-1/2}\|.
\end{aligned}
\]
Similarly, for the third term, we have
\[
\|(I+B_n)^{-1/2}A(I+B)^{-1/2} - (I+B)^{-1/2}A(I+B)^{-1/2}\|_{\mathrm{tr}} \leq \|A(I+B)^{-1/2}\|_{\mathrm{tr}}\,\|(I+B_n)^{-1/2} - (I+B)^{-1/2}\|.
\]
By Theorem 30, we have
\[
\|(I+B_n)^{-1/2} - (I+B)^{-1/2}\| \leq \|(I+B_n)^{-1/2} - (I+B)^{-1/2}\|_{\mathrm{tr}} \to 0
\]
as $n \to \infty$. The final result is obtained by combining all of the above inequalities.

Lemma 20. Let $\mathcal{H}$ be a separable Hilbert space. Let $A, B, C : \mathcal{H} \to \mathcal{H}$ be self-adjoint, finite-rank operators such that $(I+A) > 0$, $(I+B) > 0$, $(I+C) > 0$. Then
\[
\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+B)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+C)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+C),(I+B)]}. \tag{A.41}
\]

Proof of Lemma 20. Since $A, B, C$ are all finite-rank operators, there exists a finite-dimensional subspace $\mathcal{H}_n \subset \mathcal{H}$, with $\dim(\mathcal{H}_n) = n$ for some $n \in \mathbb{N}$, such that $\mathrm{range}(A) \subset \mathcal{H}_n$, $\mathrm{range}(B) \subset \mathcal{H}_n$, and $\mathrm{range}(C) \subset \mathcal{H}_n$. Let $A_n = A|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$, $B_n = B|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$, $C_n = C|_{\mathcal{H}_n} : \mathcal{H}_n \to \mathcal{H}_n$. Then $A_n, B_n, C_n$ are linear operators on the finite-dimensional space $\mathcal{H}_n$ and thus are represented by $n \times n$ matrices, which we denote by the same symbols. We have
\[
\begin{aligned}
(I+A_n)(I+B_n)^{-1} &= (I+A_n)[I - B_n(I+B_n)^{-1}] = I + A_n - B_n(I+B_n)^{-1} - A_nB_n(I+B_n)^{-1}, \\
(I+A)(I+B)^{-1} &= I + A - B(I+B)^{-1} - AB(I+B)^{-1},
\end{aligned}
\]
where $A - B(I+B)^{-1} - AB(I+B)^{-1}$ is of finite rank, since both $A$ and $B$ are, with range in $\mathcal{H}_n$. It is clear that
\[
[A - B(I+B)^{-1} - AB(I+B)^{-1}]\big|_{\mathcal{H}_n} = A_n - B_n(I+B_n)^{-1} - A_nB_n(I+B_n)^{-1}.
\]
Thus the nonzero eigenvalues of
\[
(I+A)(I+B)^{-1} - I = [A - B(I+B)^{-1} - AB(I+B)^{-1}]
\]
and
\[
(I+A_n)(I+B_n)^{-1} - I = [A_n - B_n(I+B_n)^{-1} - A_nB_n(I+B_n)^{-1}]
\]
are the same. It follows that
\[
\begin{aligned}
D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+B)]
&= \frac{1}{\alpha^2}\log\det\left(\frac{[(I+A)(I+B)^{-1}]^{\alpha} + [(I+A)(I+B)^{-1}]^{-\alpha}}{2}\right) \\
&= \frac{1}{\alpha^2}\log\det\left(\frac{[(I+A_n)(I+B_n)^{-1}]^{\alpha} + [(I+A_n)(I+B_n)^{-1}]^{-\alpha}}{2}\right) \\
&= D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)].
\end{aligned}
\]
Similarly, we have
\[
\begin{aligned}
D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+C)] &= D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+C_n)], \\
D^{(\alpha,\alpha)}_{2\alpha}[(I+C),(I+B)] &= D^{(\alpha,\alpha)}_{2\alpha}[(I+C_n),(I+B_n)].
\end{aligned}
\]
Applying the triangle inequality from the finite-dimensional setting [15], we get
\[
\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+C_n)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+C_n),(I+B_n)]}.
\]
Together with the above expressions, this gives us the final result.

Proof of Theorem 19 (Convergence in trace norm). Let $I + \Lambda = (I+B)^{-1/2}(I+A)(I+B)^{-1/2}$ and $I + \Lambda_n = (I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2}$, with $\Lambda, \Lambda_n \in \mathrm{Sym}(\mathcal{H}) \cap \mathrm{Tr}(\mathcal{H})$. By Lemma 19, we have
\[
\lim_{n\to\infty}\|\Lambda_n - \Lambda\|_{\mathrm{tr}} = 0.
\]
Thus by Theorem 30, we have
\[
\lim_{n\to\infty}\|(I+\Lambda_n)^{\alpha} - (I+\Lambda)^{\alpha}\|_{\mathrm{tr}} = 0 \quad \forall \alpha \in \mathbb{R}.
\]
By Definition 5, we have
\[
D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)] = \frac{1}{\alpha^2}\log\det\left(\frac{(I+\Lambda_n)^{\alpha} + (I+\Lambda_n)^{-\alpha}}{2}\right).
\]
Taking limit as $n \to \infty$ and applying the continuity of the Fredholm determinant in the trace norm (e.g.
Theorem 3.5 in [22]), we obtain
\[
\lim_{n\to\infty} D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)] = \frac{1}{\alpha^2}\log\det\left(\frac{(I+\Lambda)^{\alpha} + (I+\Lambda)^{-\alpha}}{2}\right) = D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+B)].
\]
This completes the proof.

Proof of Theorem 20 (Triangle inequality). For a fixed $\gamma > 0$, we have
\[
\begin{aligned}
D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]
&= \frac{1}{\alpha^2}\log{\det}_X\left(\frac{[(A+\gamma I)(B+\gamma I)^{-1}]^{\alpha} + [(A+\gamma I)(B+\gamma I)^{-1}]^{-\alpha}}{2}\right) \\
&= \frac{1}{\alpha^2}\log\det\left(\frac{\left[\left(\frac{A}{\gamma}+I\right)\left(\frac{B}{\gamma}+I\right)^{-1}\right]^{\alpha} + \left[\left(\frac{A}{\gamma}+I\right)\left(\frac{B}{\gamma}+I\right)^{-1}\right]^{-\alpha}}{2}\right),
\end{aligned}
\]
which thus reduces to the case $\gamma = 1$. Thus it suffices for us to prove the triangle inequality for $\gamma = 1$. Let $\{A_n\}_{n\in\mathbb{N}}$, $\{B_n\}_{n\in\mathbb{N}}$, and $\{C_n\}_{n\in\mathbb{N}}$ be sequences of finite-rank operators such that $\lim_{n\to\infty}\|A_n - A\|_{\mathrm{tr}} = 0$, $\lim_{n\to\infty}\|B_n - B\|_{\mathrm{tr}} = 0$, $\lim_{n\to\infty}\|C_n - C\|_{\mathrm{tr}} = 0$. By Lemma 20, we have the triangle inequality
\[
\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+B_n)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A_n),(I+C_n)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+C_n),(I+B_n)]}.
\]
Taking limits on both sides as $n \to \infty$ and invoking Theorem 19, we then obtain
\[
\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+B)]} \leq \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+A),(I+C)]} + \sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(I+C),(I+B)]}.
\]
This completes the proof of the theorem.

Proof of Theorem 4 (Metric property). The case $\alpha = 0$ corresponds to the affine-invariant Riemannian distance on the Hilbert manifold $\Sigma(\mathcal{H})$ [18], which is still a metric when restricted to $\mathrm{PTr}(\mathcal{H})$. Consider the case $\alpha > 0$. The positivity and symmetry of the divergence $D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]$ are from Theorems 1 and 13, respectively. The triangle inequality for $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]}$ is from Theorem 20. Thus $\sqrt{D^{(\alpha,\alpha)}_{2\alpha}[(A+\gamma I),(B+\gamma I)]}$ is a metric on $\mathrm{PTr}(\mathcal{H})(\gamma)$.

Proof of Theorem 22 (Diagonalization). Consider first the case $\alpha > 0$. As in the proof of Theorem 20, it suffices for us to prove this theorem for the case $\gamma = 1$.
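Remark (numerical illustration). The reduction to $\gamma = 1$, the triangle inequality of Theorem 20, and the diagonalization inequality being proved here can all be sanity-checked in finite dimensions. The Python sketch below is an illustration only, under explicit assumptions: positive semi-definite matrices stand in for positive trace class operators, the ordinary determinant replaces the extended Fredholm determinant, $\mathrm{Eig}(A)$ is taken to be the diagonal matrix of eigenvalues of $A$ in decreasing order, and the helper names (`spd_power`, `ab_divergence`, `root_divergence`, `random_psd`, `eig_diag`) as well as the values of `alpha` and `gamma` are hypothetical.

```python
import numpy as np

def spd_power(M, t):
    """Real power M**t of a symmetric positive definite matrix M."""
    w, V = np.linalg.eigh(M)
    return (V * w**t) @ V.T

def ab_divergence(P, Q, alpha):
    """Finite-dimensional D^{(alpha,alpha)}_{2 alpha}[P, Q] for SPD P, Q, alpha > 0."""
    X = spd_power(Q, -0.5)
    # Q^{-1/2} P Q^{-1/2} is symmetric and has the same eigenvalues as P Q^{-1}.
    lam = np.linalg.eigvalsh(X @ P @ X)
    return float(np.sum(np.log((lam**alpha + lam**(-alpha)) / 2.0)) / alpha**2)

def root_divergence(P, Q, alpha):
    """Square root of the divergence, clipped at 0 to absorb rounding errors."""
    return np.sqrt(max(ab_divergence(P, Q, alpha), 0.0))

def random_psd(rng, n, rank):
    """Random positive semi-definite matrix of the given rank."""
    G = rng.standard_normal((n, rank))
    return G @ G.T

def eig_diag(A):
    """Diagonal matrix of the eigenvalues of A in decreasing order."""
    return np.diag(np.sort(np.linalg.eigvalsh(A))[::-1])

rng = np.random.default_rng(0)
n, alpha, gamma = 5, 0.7, 2.5
I = np.eye(n)
scaling_gap = 0.0        # largest violation of the gamma-reduction identity
triangle_slack = np.inf  # smallest value of (right side - left side)
diag_slack = np.inf      # smallest value of D[A+I, B+I] - D[Eig(A)+I, Eig(B)+I]

for _ in range(100):
    A, B, C = (random_psd(rng, n, 3) for _ in range(3))
    # (1) D[(A + gamma I), (B + gamma I)] = D[(A/gamma + I), (B/gamma + I)]
    d_gamma = ab_divergence(A + gamma * I, B + gamma * I, alpha)
    d_one = ab_divergence(A / gamma + I, B / gamma + I, alpha)
    scaling_gap = max(scaling_gap, abs(d_gamma - d_one))
    # (2) triangle inequality for the square root of the divergence
    slack = (root_divergence(A + I, C + I, alpha)
             + root_divergence(C + I, B + I, alpha)
             - root_divergence(A + I, B + I, alpha))
    triangle_slack = min(triangle_slack, slack)
    # (3) diagonalization inequality
    d_full = ab_divergence(A + I, B + I, alpha)
    d_diag = ab_divergence(eig_diag(A) + I, eig_diag(B) + I, alpha)
    diag_slack = min(diag_slack, d_full - d_diag)

print("largest scaling gap:", scaling_gap)
print("smallest triangle slack:", triangle_slack)
print("smallest diagonalization slack:", diag_slack)
```

A scaling gap at rounding level and nonnegative slacks are consistent with the statements above; random trials of this kind are of course not a substitute for the proofs.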
Let $A = \sum_{j=1}^{\infty}\lambda_j(A)\,\phi_j \otimes \phi_j$ denote the spectral decomposition for $A$. For each $n \in \mathbb{N}$, define
\[
A_n = \sum_{j=1}^{n}\lambda_j(A)\,\phi_j \otimes \phi_j.
\]
Then $A_n$ is a finite-rank operator with the eigenvalues being the first $n$ eigenvalues of $A$ and $\lim_{n\to\infty}\|A_n - A\|_{\mathrm{tr}} = 0$. In the same way, we construct a sequence of finite-rank operators $B_n$ with $\lim_{n\to\infty}\|B_n - B\|_{\mathrm{tr}} = 0$. By construction, we also have
\[
\lim_{n\to\infty}\|\mathrm{Eig}(A_n) - \mathrm{Eig}(A)\|_{\mathrm{tr}} = 0, \qquad \lim_{n\to\infty}\|\mathrm{Eig}(B_n) - \mathrm{Eig}(B)\|_{\mathrm{tr}} = 0.
\]
Thus by Theorem 19, we have
\[
\begin{aligned}
\lim_{n\to\infty} D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A_n)+I),(\mathrm{Eig}(B_n)+I)] &= D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A)+I),(\mathrm{Eig}(B)+I)], \\
\lim_{n\to\infty} D^{(\alpha,\alpha)}_{2\alpha}[(A_n+I),(B_n+I)] &= D^{(\alpha,\alpha)}_{2\alpha}[(A+I),(B+I)].
\end{aligned}
\]
Since $A_n, B_n$ can be identified with finite-dimensional matrices, as in the proof of Lemma 16, we can apply the corresponding finite-dimensional result in [15] to obtain
\[
D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A_n)+I),(\mathrm{Eig}(B_n)+I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A_n+I),(B_n+I)].
\]
Thus taking limits as $n \to \infty$ gives
\[
D^{(\alpha,\alpha)}_{2\alpha}[(\mathrm{Eig}(A)+I),(\mathrm{Eig}(B)+I)] \leq D^{(\alpha,\alpha)}_{2\alpha}[(A+I),(B+I)].
\]
Letting $\alpha \to 0$ on both sides of the above expression, we also obtain the result for the case $\alpha = 0$. This completes the proof of the theorem.

References

[1] G. Mostow, Some new decomposition theorems for semi-simple groups, Memoirs of the American Mathematical Society 14 (1955) 31–54.
[2] J. D. Lawson, Y. Lim, The geometric mean, matrices, metrics, and more, The American Mathematical Monthly 108 (9) (2001) 797–812.
[3] R. Bhatia, Positive Definite Matrices, Princeton University Press, 2007.
[4] V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. on Matrix An. and App. 29 (1) (2007) 328–347.
[5] X. Pennec, P. Fillard, N. Ayache, A Riemannian framework for tensor computing, International Journal of Computer Vision 66 (1) (2006) 41–66.
[6] O. Tuzel, F. Porikli, P. Meer, Pedestrian detection via classification on Riemannian manifolds, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (10) (2008) 1713–1727.
[7] B. Kulis, M. A. Sustik, I. S. Dhillon, Low-rank kernel learning with Bregman matrix divergences, The Journal of Machine Learning Research 10 (2009) 341–376.
[8] A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos, Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9) (2013) 2161–2174.
[9] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on the Riemannian manifold of symmetric positive definite matrices, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 73–80.
[10] P. Formont, J.-P. Ovarlez, F. Pascal, On the use of matrix information geometry for polarimetric SAR image classification, in: Matrix Information Geometry, Springer, 2013, pp. 257–276.
[11] F. Barbaresco, Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Frechet median, in: Matrix Information Geometry, Springer, 2013, pp. 199–255.
[12] D. A. Bini, B. Iannazzo, Computing the Karcher mean of symmetric positive definite matrices, Linear Algebra and its Applications 438 (4) (2013) 1700–1710.
[13] P. Li, Q. Wang, W. Zuo, L. Zhang, Log-Euclidean kernels for sparse representation and dictionary learning, in: International Conference on Computer Vision (ICCV), 2013, pp. 1601–1608.
[14] Z. Chebbi, M. Moakher, Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function, Linear Algebra and its Applications 436 (7) (2012) 1872–1889.
[15] A. Cichocki, S. Cruces, S. Amari, Log-Determinant divergences revisited: Alpha-Beta and Gamma Log-Det divergences, Entropy 17 (5) (2015) 2988–3034.
[16] S. Sra, A new metric on the manifold of kernel matrices with application to matrix geometric means, in: Advances in Neural Information Processing Systems (NIPS), 2012, pp. 144–152.
[17] H. Minh, Infinite-dimensional Log-Determinant divergences between positive definite trace class operators, Linear Algebra and Its Applications (In Press) (2016) http://dx.doi.org/10.1016/j.laa.2016.09.018.
[18] G. Larotonda, Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators, Differential Geometry and its Applications 25 (2007) 679–700.
[19] H. Minh, M. San Biagio, V. Murino, Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, in: Advances in Neural Information Processing Systems (NIPS), 2014, pp. 388–396.
[20] H. Q. Minh, Affine-invariant Riemannian distance between infinite-dimensional covariance operators, in: Geometric Science of Information, 2015, pp. 30–38.
[21] K. Fan, On a theorem of Weyl concerning eigenvalues of linear transformations: II, Proceedings of the National Academy of Sciences of the United States of America 36 (1) (1950) 31.
[22] B. Simon, Notes on infinite determinants of Hilbert space operators, Advances in Mathematics 24 (1977) 244–273.
[23] R. Bhatia, Matrix analysis, Vol. 169, Springer Science & Business Media, 2013.
[24] F. Kittaneh, H. Kosaki, Inequalities for the Schatten p-norm V, Publications of the Research Institute for Mathematical Sciences 23 (2) (1987) 433–443.
