Isometric Invariant Quantification of Gaussian Divergence over Poincare Disc

The paper presents a geometric duality between the spherical squared-Hellinger distance and a hyperbolic isometric invariant of the Poincare disc under the action of the general Mobius group. Motivated by the geometric connection, we propose the usag…

Authors: Levent Ali Mengütürk

Levent Ali Mengütürk∗

Abstract

The paper presents a geometric duality between the spherical squared-Hellinger distance and a hyperbolic isometric invariant of the Poincaré disc under the action of the general Möbius group. Motivated by the geometric connection, we propose the usage of the $L^2$-embedded hyperbolic isometric invariant as an alternative way to quantify divergence between Gaussian measures as a contribution to information theory.

Keywords: Measure divergence, hyperbolic geometry, Möbius transformation

1 Introduction

The objective of this work is to present a geometric relationship between the spherical squared-Hellinger distance (commonly used in information theory, machine learning and artificial intelligence) and a hyperbolic isometric invariant of the Poincaré disc under the action of the general Möbius group. This spherical-hyperbolic duality motivates us to propose the $L^2$-embedded hyperbolic isometric invariant as an alternative way to quantify divergence between two Gaussian probability measures. We shall keep this note reasonably concise towards communicating a geometric connection as a contribution to information theory. To the best of our knowledge, the most relevant observation was drafted in [7], which we shall formalise, deepen and extend hereafter.

For the rest of this paper, consider a measurable space $(\Omega, \mathcal{F})$ endowed with probability measures $\mathbb{P}$ and $\mathbb{Q}$. For our purposes, $\mathbb{P}$ and $\mathbb{Q}$ are Gaussian measures hereafter. Using the language of measure theory, the squared-Hellinger distance between $\mathbb{P}$ and $\mathbb{Q}$ (see [5]), which we shall denote as $\Phi(\mathbb{P},\mathbb{Q})$, is given by the following:

$$\Phi(\mathbb{P},\mathbb{Q}) = \frac{1}{2}\int_{\Omega}\left(\sqrt{\mathrm{d}\mathbb{P}(\omega)} - \sqrt{\mathrm{d}\mathbb{Q}(\omega)}\right)^{2}. \tag{1}$$

We are now in position to present our main statement before providing the details in the next section.

Proposal 1.1.
Define $\Psi(\mathbb{P},\mathbb{Q})$ as follows:

$$\Psi(\mathbb{P},\mathbb{Q}) = \frac{2\int_{\Omega}\left(\mathrm{d}\mathbb{P}(\omega) - \mathrm{d}\mathbb{Q}(\omega)\right)^{2}}{\left(1 - \int_{\Omega}\mathrm{d}\mathbb{P}(\omega)^{2}\right)\left(1 - \int_{\Omega}\mathrm{d}\mathbb{Q}(\omega)^{2}\right)}. \tag{2}$$

Then, $\Phi$ in (1) and $\Psi$ in (2) form a spherical-hyperbolic dual in a geometric sense.

In the next section, we shall unpack Proposal 1.1 in detail and provide the mathematical arguments leading to it, which will clarify the statement for the reader. We shall also provide a closed-form solution to (2) in order to enable its implementation more broadly in machine learning and artificial intelligence applications. The closed-form equation will also allow us to quantify a distributional distance between two Gaussian processes as they progress in time. Finally, we shall provide the parametric asymptotics of (2) across relevant limits, which shed further light on how it behaves.

∗ University College London, Department of Mathematics, l.menguturk@ucl.ac.uk; Artificial Intelligence and Mathematics Research Lab, levent@aimresearchlab.com

2 Geometric Duality

For the rest of this paper, let $\mathbb{P}$ and $\mathbb{Q}$ be measures on $\mathbb{R}$, where $p : \mathbb{R} \to \mathbb{R}_{+}$ and $q : \mathbb{R} \to \mathbb{R}_{+}$ are the corresponding Gaussian probability density functions. The choice of the state-space $\mathbb{R}$ here is for parsimony and can be generalized to $\mathbb{R}^{n}$ for $n \geq 1$. We identify the parameters of $p$ and $q$ as $\Theta_p = \{\mu_p, \sigma_p^2\}$ and $\Theta_q = \{\mu_q, \sigma_q^2\}$, respectively, where $\mu_p, \mu_q \in (-\infty, \infty)$ and $\sigma_p, \sigma_q \in (0, \infty)$. Denoting $\|\cdot\|_{L^2}$ as the $L^2$-norm and using (1), we have the following:

$$\Phi(\mathbb{P},\mathbb{Q}) = \frac{1}{2}\left\|\sqrt{p} - \sqrt{q}\right\|_{L^2}^{2}. \tag{3}$$

Since $p$ and $q$ are non-negative functions and $\|\sqrt{p}\|_{L^2} = \|\sqrt{q}\|_{L^2} = 1$, both $\sqrt{p}$ and $\sqrt{q}$ determine points on the positive orthant of the unit sphere $\mathbb{S}_{+} \subset L^2$.

Remark 2.1. Note that $(\mathbb{S}, \langle\cdot\rangle)$ is a Riemannian manifold with constant curvature $K = +1$, given that the inner product $\langle\cdot\rangle$ on $L^2$ is the Riemannian metric on $\mathbb{S}$.
The geodesics between any two points on $\mathbb{S}$ are the great circles, and accordingly, an angle using the $L^2$-inner product can be defined via

$$\cos\left(d_{\mathbb{S}}(\mathbb{P},\mathbb{Q})\right) = \frac{\langle\sqrt{p},\sqrt{q}\rangle}{\langle\sqrt{p},\sqrt{p}\rangle^{\frac{1}{2}}\langle\sqrt{q},\sqrt{q}\rangle^{\frac{1}{2}}} = \int_{\mathbb{R}}\sqrt{p(x)}\sqrt{q(x)}\,\mathrm{d}x = 1 - \frac{1}{2}\int_{\mathbb{R}}\left(\sqrt{p(x)} - \sqrt{q(x)}\right)^{2}\mathrm{d}x,$$

where $d_{\mathbb{S}}(\mathbb{P},\mathbb{Q})$ is the Bhattacharyya angle between $\mathbb{P}$ and $\mathbb{Q}$ (the angle from the center of $\mathbb{S}$ subtended to the endpoints on $\mathbb{S}_{+}$), which is equivalent to the spherical distance on $\mathbb{S}_{+}$ with values in $[0, \pi/2]$. Therefore, using (3), the map $\Phi$ can be rewritten in terms of the cosine of the spherical distance on $\mathbb{S}_{+}$ as follows:

$$\Phi(\mathbb{P},\mathbb{Q}) = 1 - \cos\left(d_{\mathbb{S}}(\mathbb{P},\mathbb{Q})\right). \tag{4}$$

In information geometry, it is well-known that the parameter space of Gaussian measures forms a 2-dimensional differentiable manifold locally diffeomorphic to $\mathbb{R}^2$ (see, for example, [2]), which is a hyperbolic space with constant curvature $K = -1/2$. If this manifold, which we shall denote as $\mathbb{M}$, is endowed with the Fisher information metric, then $\mathbb{M}$ becomes a Riemannian manifold. Accordingly, it is possible to define a distance between $\mathbb{P}$ and $\mathbb{Q}$ via this metric by integrating the infinitesimal line element along the geodesic connecting the corresponding points on $\mathbb{M}$ (determined by the parameters of $p$ and $q$, respectively), given by

$$d_{\mathbb{M}}(\mathbb{P},\mathbb{Q}) = \sqrt{2}\left|\log\left(\frac{1 + \zeta(p,q)}{1 - \zeta(p,q)}\right)\right| = 2\sqrt{2}\tanh^{-1}\left(\zeta(p,q)\right), \tag{5}$$

where $\zeta$ is a function of the parameters of $p$ and $q$:

$$\zeta(p,q) = \left(\frac{(\mu_q - \mu_p)^2 + 2(\sigma_q - \sigma_p)^2}{(\mu_q - \mu_p)^2 + 2(\sigma_q + \sigma_p)^2}\right)^{\frac{1}{2}}. \tag{6}$$

The expression in (5) holds when $\mu_q \neq \mu_p$ and $\sigma_q \neq \sigma_p$, and the metric takes alternative forms when $\mu_q = \mu_p$ or $\sigma_q = \sigma_p$ (see [4]), which we omit from this note. One can adjust the lengths on $\mathbb{M}$ measured in an alternative unit $R = -\sqrt{-K}/K = \sqrt{2}$, which is analogous to the radian on $\mathbb{S}$.
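The quantities in (3)-(6) are straightforward to evaluate numerically. The following is a minimal sketch, not the author's implementation: it assumes the standard closed form of the Bhattacharyya coefficient for two one-dimensional Gaussians (which is not stated in the text) and cross-checks it against a trapezoidal evaluation of (3). All function names are illustrative.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bhattacharyya_coeff(mu_p, sigma_p, mu_q, sigma_q):
    """Integral of sqrt(p*q); standard Gaussian closed form (an assumption, not from the text)."""
    var_sum = sigma_p ** 2 + sigma_q ** 2
    return math.sqrt(2.0 * sigma_p * sigma_q / var_sum) * math.exp(
        -((mu_p - mu_q) ** 2) / (4.0 * var_sum)
    )

def hellinger_sq(mu_p, sigma_p, mu_q, sigma_q):
    """Phi = 1 - cos(d_S), as in (4)."""
    return 1.0 - bhattacharyya_coeff(mu_p, sigma_p, mu_q, sigma_q)

def bhattacharyya_angle(mu_p, sigma_p, mu_q, sigma_q):
    """Spherical distance d_S, with values in [0, pi/2]."""
    # Clamp to 1.0 to guard acos against rounding slightly above one.
    return math.acos(min(1.0, bhattacharyya_coeff(mu_p, sigma_p, mu_q, sigma_q)))

def fisher_distance(mu_p, sigma_p, mu_q, sigma_q):
    """d_M = 2*sqrt(2)*atanh(zeta), with zeta as in (6)."""
    dmu2 = (mu_q - mu_p) ** 2
    zeta = math.sqrt((dmu2 + 2.0 * (sigma_q - sigma_p) ** 2)
                     / (dmu2 + 2.0 * (sigma_q + sigma_p) ** 2))
    return 2.0 * math.sqrt(2.0) * math.atanh(zeta)

def hellinger_sq_numeric(mu_p, sigma_p, mu_q, sigma_q, half_width=40.0, n=40000):
    """Trapezoidal evaluation of (3) on [-half_width, half_width]; cross-check only."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        diff = math.sqrt(gauss_pdf(x, mu_p, sigma_p)) - math.sqrt(gauss_pdf(x, mu_q, sigma_q))
        total += weight * diff * diff
    return 0.5 * h * total
```

For example, `hellinger_sq(0.0, 1.0, 1.0, 2.0)` and `hellinger_sq_numeric(0.0, 1.0, 1.0, 2.0)` should agree to within the quadrature error, while `fisher_distance` gives the corresponding distance on the parameter manifold.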
This geometrical perspective on Gaussian measures is what we shall use to establish a connection between the squared-Hellinger distance and a hyperbolic isometric invariant of the Poincaré disc under the action of the general Möbius group. First, recall that the general Möbius group forms a group of transformations of the Riemann sphere $\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$ (a topological construction, namely the one-point compactification), where geometric quantities (e.g. hyperbolic lengths and hyperbolic angles) are invariant under its action. More specifically, a Möbius transformation is a holomorphic function $\eta^{*} : \overline{\mathbb{C}} \to \overline{\mathbb{C}}$ that satisfies the functional form:

$$\eta^{*}(z) = \frac{az + b}{cz + d},$$

where $a, b, c, d \in \mathbb{C}$ and $ad - bc \neq 0$. The general Möbius group, which we shall denote as Möb, is generated by the set of Möbius transformations and the set of complex conjugations, such that $\eta \in$ Möb is the composition given by

$$\eta = C \circ \eta^{*}_{k} \circ \cdots \circ C \circ \eta^{*}_{1},$$

for some $k \geq 1$, where each $\eta^{*}_{j}$ is a Möbius transformation, $C(z) = \overline{z}$ for $z \in \mathbb{C}$ and $C(\infty) = \infty$. We highlight that $C$ is a homeomorphism of $\overline{\mathbb{C}}$, and Möb is equal to the set of homeomorphisms of $\overline{\mathbb{C}}$ that take circles in $\overline{\mathbb{C}}$ to circles in $\overline{\mathbb{C}}$. In fact, it can be shown that elements of Möb are conformal homeomorphisms of $\overline{\mathbb{C}}$, and

$$\text{Möb}(\mathbb{W}) = \{\eta \in \text{Möb} \mid \eta(\mathbb{W}) = \mathbb{W}\}$$

is equal to the group of isometries of the hyperbolic space

$$\mathbb{W} = \{z \in \mathbb{C} : \Im(z) > 0\}, \tag{7}$$

given that $\Im(z)$ is the imaginary part of $z$.

Remark 2.2. The space $\mathbb{W}$ has constant curvature $K = -1$.

Using the Poincaré uniformization theorem, the metric on $\mathbb{M}$ can be transformed to the metric on $\mathbb{W}$, since $\mathbb{M}$ and $\mathbb{W}$ are conformally equivalent. This can be achieved by adjusting the metric on $\mathbb{M}$ via multiplying it with the positive constant $1/R = 1/\sqrt{2}$.
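The rescaling by $1/R$ and the isometry property of Möb($\mathbb{W}$) can both be checked numerically. The sketch below rests on two assumptions that are not stated in the text: the embedding $(\mu, \sigma) \mapsto \mu/\sqrt{2} + i\sigma$ of the Gaussian parameters into $\mathbb{W}$, and the standard distance formula $\cosh d = 1 + |z - w|^2 / (2\,\Im z\,\Im w)$ on the upper half-plane. Function names are illustrative.

```python
import math

def to_half_plane(mu, sigma):
    """Assumed embedding of Gaussian parameters into W: z = mu/sqrt(2) + i*sigma."""
    return complex(mu / math.sqrt(2.0), sigma)

def dist_half_plane(z, w):
    """Standard hyperbolic distance on the upper half-plane (curvature -1)."""
    return math.acosh(1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag))

def fisher_distance(mu_p, sigma_p, mu_q, sigma_q):
    """d_M from (5), with zeta as in (6)."""
    dmu2 = (mu_q - mu_p) ** 2
    zeta = math.sqrt((dmu2 + 2.0 * (sigma_q - sigma_p) ** 2)
                     / (dmu2 + 2.0 * (sigma_q + sigma_p) ** 2))
    return 2.0 * math.sqrt(2.0) * math.atanh(zeta)

def mobius_real(z, a, b, c, d):
    """Moebius map with real coefficients and ad - bc > 0, which maps W onto W."""
    if a * d - b * c <= 0:
        raise ValueError("need real coefficients with ad - bc > 0")
    return (a * z + b) / (c * z + d)

if __name__ == "__main__":
    z = to_half_plane(0.0, 1.0)
    w = to_half_plane(1.0, 2.0)
    # Distance on W versus the Fisher distance rescaled by 1/R = 1/sqrt(2).
    print(dist_half_plane(z, w), fisher_distance(0.0, 1.0, 1.0, 2.0) / math.sqrt(2.0))
    # The distance is unchanged after applying an element of Moeb(W) to both points.
    print(dist_half_plane(mobius_real(z, 2, 1, 1, 3), mobius_real(w, 2, 1, 1, 3)))
```

Under the stated assumptions, all three printed values should coincide up to floating-point error.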
Then, using (5), the distance between the points determined by $p$ and $q$ mapped on $\mathbb{W}$ is given by

$$d_{\mathbb{W}}(\mathbb{P},\mathbb{Q}) = 2\tanh^{-1}\left(\zeta(p,q)\right),$$

where the function $\zeta$ is defined as in (6). On the other hand, the hyperbolic space of the Poincaré disc model is the unit disc $\mathbb{D}$ in the complex plane $\mathbb{C}$ such that

$$\mathbb{D} = \{z^{*} \in \mathbb{C} : \|z^{*}\| < 1\}, \tag{8}$$

where $\|\cdot\|$ is the Euclidean norm. Since $\mathbb{W}$ and $\mathbb{D}$ are both in $\mathbb{C}$, it is possible to identify a family of $\eta \in$ Möb such that $\eta : \mathbb{W} \to \mathbb{D}$. Essentially, Möb allows us to use $\mathbb{W}$ in (7) and $\mathbb{D}$ in (8) interchangeably when modelling hyperbolic spaces. In particular, if $z$ and $\kappa$ are points in $\mathbb{W}$ and $z^{*}$ is a point in $\mathbb{D}$,

$$z^{*} = e^{i\theta}\,\frac{z - \kappa}{z - \overline{\kappa}}, \tag{9}$$

is a Möbius transformation conformally mapping $\mathbb{W}$ to $\mathbb{D}$, where $\kappa \in \mathbb{W}$ is an arbitrary point mapped to the center of the disc $\mathbb{D}$, $\theta$ rotates the disc and $z^{*} \in \mathbb{D}$ is the corresponding point of $z \in \mathbb{W}$. Accordingly, using Möb, it is possible to compute distances between two points on $\mathbb{D}$ starting from distances on $\mathbb{W}$, or conformally map the points on $\mathbb{W}$ to points on $\mathbb{D}$ using (9) and calculate the distances on $\mathbb{D}$ directly. Hence, Möb allows for

$$d_{\mathbb{W}}(\mathbb{P},\mathbb{Q}) \mapsto d_{\mathbb{D}}(\mathbb{P},\mathbb{Q}).$$

Therefore, there is a natural way to characterise Gaussian measures and their distances on the Poincaré disc $\mathbb{D}$ starting from $\mathbb{M}$ followed by $\mathbb{W}$, using Möb. We are now in the position to highlight an isometric invariant of $\mathbb{D}$ under the action of

$$\text{Möb}(\mathbb{D}) = \{\eta \in \text{Möb} \mid \eta(\mathbb{D}) = \mathbb{D}\},$$

which we denote by $\Psi$, as follows:

$$\Psi(x, y) = \frac{2\|x - y\|^{2}}{(1 - \|x\|^{2})(1 - \|y\|^{2})}, \tag{10}$$

for $\|x\| < 1$ and $\|y\| < 1$, where $x$ and $y$ are points on $\mathbb{D}$. It can be shown that the hyperbolic isometric invariant $\Psi$ characterises the distance $d_{\mathbb{D}}$ on $\mathbb{D}$ (see [1]). Thus, if we choose $x$ and $y$ in (10) as points determined by $\mathbb{P}$ and $\mathbb{Q}$ conformally mapped to $\mathbb{D}$ from $\mathbb{W}$, we can write the following representation:

$$\Psi(\mathbb{P},\mathbb{Q}) = \cosh\left(d_{\mathbb{D}}(\mathbb{P},\mathbb{Q})\right) - 1. \tag{11}$$

Remark 2.3 below is important for our purposes as a leading observation towards Proposal 1.1 given in the previous section.

Remark 2.3. Note that the spherical squared-Hellinger distance $\Phi$ in (4) is closely related to the hyperbolic isometric invariant $\Psi$ in (11) from a geometric stance.

We shall further highlight the following characteristics building on Remark 2.3:

1. The curvature of $\mathbb{S}$ and that of $\mathbb{D}$ are opposite in sign with $K = +1$ for $\mathbb{S}$ and $K = -1$ for $\mathbb{D}$, which is reflected in the functional forms in (4) and (11), respectively.
2. The spherical cosine function on the spherical space $\mathbb{S}$ is replaced by the hyperbolic cosine function on the hyperbolic space $\mathbb{D}$.
3. Since $d_{\mathbb{D}}$ is a metric, we must have $d_{\mathbb{D}}(\mathbb{P},\mathbb{Q}) \geq 0$, which means $\Psi(\mathbb{P},\mathbb{Q}) \geq 0$ must hold, having $\cosh(0) = 1$ and $\cosh(r)$ monotonically increasing in $r \in \mathbb{R}_{+}$. In particular, $\Psi = 0$ when $\mathbb{P} = \mathbb{Q}$, and $\Psi$ is strictly positive otherwise.
4. $\Psi$ is symmetric in $\mathbb{P}$ and $\mathbb{Q}$ such that $\Psi(\mathbb{P},\mathbb{Q}) = \Psi(\mathbb{Q},\mathbb{P})$. Note that $\Phi$ is also symmetric in $\mathbb{P}$ and $\mathbb{Q}$ such that $\Phi(\mathbb{P},\mathbb{Q}) = \Phi(\mathbb{Q},\mathbb{P})$.

Accordingly, $\Psi$ in (11) hosts desirable properties as a divergence metric, also shared by the spherical squared-Hellinger distance $\Phi$. Moving one step further, it is possible to construct a natural embedding of $\mathbb{D}$ into a Hilbert space, e.g. into $L^2$, where a natural embedding refers to an embedding in which an isometry on $\mathbb{D}$ can be associated to an isometry on a given Hilbert space. Accordingly, extending (10) in the $L^2$-sense, and using the language of measure theory as in (1), we propose an $L^2$-embedded $\Psi$ (while keeping notations the same) as follows:

$$\Psi(\mathbb{P},\mathbb{Q}) = \frac{2\|p - q\|_{L^2}^{2}}{(1 - \|p\|_{L^2}^{2})(1 - \|q\|_{L^2}^{2})} = \frac{2\int_{\Omega}\left(\mathrm{d}\mathbb{P}(\omega) - \mathrm{d}\mathbb{Q}(\omega)\right)^{2}}{\left(1 - \int_{\Omega}\mathrm{d}\mathbb{P}(\omega)^{2}\right)\left(1 - \int_{\Omega}\mathrm{d}\mathbb{Q}(\omega)^{2}\right)}, \tag{12}$$

for $\|p\|_{L^2} < 1$ and $\|q\|_{L^2} < 1$. Remark 2.4 below is a more technical version of Remark 2.3 that leads to our statement in Proposal 1.1.

Remark 2.4.
In the Gaussian setting, $\Phi$ in (4) and $\Psi$ in (11) form a spherical-hyperbolic dual in the aforementioned geometric sense when

$$\|p\|_{L^2} < 1 \quad \text{and} \quad \|q\|_{L^2} < 1. \tag{13}$$

We are now in position to present the closed-form solution for the hyperbolic dual of the spherical squared-Hellinger distance we propose in (12).

Proposition 2.5. The divergence (12) on $\mathbb{R}$ is given by

$$\Psi(\mathbb{P},\mathbb{Q}) = \frac{(\sigma_p\sqrt{\pi})^{-1} - 2\sqrt{2}\,\lambda(\Theta_p,\Theta_q) + (\sigma_q\sqrt{\pi})^{-1}}{\left(1 - (2\sigma_p\sqrt{\pi})^{-1}\right)\left(1 - (2\sigma_q\sqrt{\pi})^{-1}\right)}, \tag{14}$$

where $\lambda(\cdot)$ in (14) is defined as follows:

$$\lambda(\Theta_p,\Theta_q) = \exp\left(-\frac{(\mu_p - \mu_q)^2}{2(\sigma_p^2 + \sigma_q^2)}\right)\left(\pi(\sigma_p^2 + \sigma_q^2)\right)^{-\frac{1}{2}}. \tag{15}$$

Hence, the conditions in (13) materialize when

$$\|p\|_{L^2} < 1 \iff \sigma_p > \frac{1}{2\sqrt{\pi}} \quad \text{and} \quad \|q\|_{L^2} < 1 \iff \sigma_q > \frac{1}{2\sqrt{\pi}}. \tag{16}$$

Proof. The proof follows from products of Gaussian densities on $\mathbb{R}$ as they appear in (12). More specifically, we have the following product:

$$p(x)q(x) = \frac{1}{2\pi\sigma_p\sigma_q}\exp\left(-\left[\frac{(x - \mu_p)^2}{2\sigma_p^2} + \frac{(x - \mu_q)^2}{2\sigma_q^2}\right]\right) = \frac{h_{pq}}{\sqrt{2\pi}\,\sigma_{pq}}\exp\left(-\frac{(x - \mu_{pq})^2}{2\sigma_{pq}^2}\right), \tag{17}$$

for any $x \in \mathbb{R}$, where the new terms in (17) are defined as follows:

$$\mu_{pq} = \frac{\mu_p\sigma_q^2 + \mu_q\sigma_p^2}{\sigma_p^2 + \sigma_q^2}, \qquad \sigma_{pq} = \sqrt{\frac{\sigma_p^2\sigma_q^2}{\sigma_p^2 + \sigma_q^2}}, \qquad h_{pq} = \frac{1}{\sqrt{2\pi(\sigma_p^2 + \sigma_q^2)}}\exp\left(-\frac{(\mu_p - \mu_q)^2}{2(\sigma_p^2 + \sigma_q^2)}\right). \tag{18}$$

Therefore, we have the following:

$$4\int_{\Omega}\mathrm{d}\mathbb{P}(\omega)\,\mathrm{d}\mathbb{Q}(\omega) = \frac{4h_{pq}}{\sqrt{2\pi}\,\sigma_{pq}}\int_{\mathbb{R}}\exp\left(-\frac{(x - \mu_{pq})^2}{2\sigma_{pq}^2}\right)\mathrm{d}x = 4h_{pq} = 2\sqrt{2}\,\lambda(\Theta_p,\Theta_q),$$

where $\lambda(\cdot)$ is as given in (15). Accordingly, we also have the following product integrals:

$$\int_{\Omega}\mathrm{d}\mathbb{P}(\omega)^{2} = \frac{h_{pp}}{\sqrt{2\pi}\,\sigma_{pp}}\int_{\mathbb{R}}\exp\left(-\frac{(x - \mu_{pp})^2}{2\sigma_{pp}^2}\right)\mathrm{d}x = h_{pp} = \frac{1}{2\sigma_p\sqrt{\pi}},$$

$$\int_{\Omega}\mathrm{d}\mathbb{Q}(\omega)^{2} = \frac{h_{qq}}{\sqrt{2\pi}\,\sigma_{qq}}\int_{\mathbb{R}}\exp\left(-\frac{(x - \mu_{qq})^2}{2\sigma_{qq}^2}\right)\mathrm{d}x = h_{qq} = \frac{1}{2\sigma_q\sqrt{\pi}},$$

and the expression given in (14) follows. Finally, the statement in (16) follows from the denominator of (14) for $\|p\|_{L^2} < 1$ and $\|q\|_{L^2} < 1$.
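As an illustration of Proposition 2.5, the closed form (14)-(15) can be coded directly and compared with a brute-force quadrature of (12). This is a minimal sketch under stated assumptions rather than a reference implementation: the function names, the integration window and the grid size are illustrative choices, and the quadrature serves only as a sanity check.

```python
import math

SIGMA_MIN = 1.0 / (2.0 * math.sqrt(math.pi))  # threshold appearing in (16)

def lam(mu_p, sigma_p, mu_q, sigma_q):
    """lambda(Theta_p, Theta_q) from (15)."""
    var_sum = sigma_p ** 2 + sigma_q ** 2
    return math.exp(-((mu_p - mu_q) ** 2) / (2.0 * var_sum)) / math.sqrt(math.pi * var_sum)

def psi_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Psi from (14); both standard deviations must satisfy condition (16)."""
    if min(sigma_p, sigma_q) <= SIGMA_MIN:
        raise ValueError("condition (16) fails: need sigma > 1/(2*sqrt(pi))")
    root_pi = math.sqrt(math.pi)
    numerator = (1.0 / (sigma_p * root_pi)
                 - 2.0 * math.sqrt(2.0) * lam(mu_p, sigma_p, mu_q, sigma_q)
                 + 1.0 / (sigma_q * root_pi))
    denominator = ((1.0 - 1.0 / (2.0 * sigma_p * root_pi))
                   * (1.0 - 1.0 / (2.0 * sigma_q * root_pi)))
    return numerator / denominator

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def psi_numeric(mu_p, sigma_p, mu_q, sigma_q, half_width=60.0, n=60000):
    """Trapezoidal evaluation of (12) on [-half_width, half_width]; cross-check only."""
    h = 2.0 * half_width / n
    diff2 = p2 = q2 = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        p = gauss_pdf(x, mu_p, sigma_p)
        q = gauss_pdf(x, mu_q, sigma_q)
        diff2 += weight * (p - q) ** 2
        p2 += weight * p * p
        q2 += weight * q * q
    diff2, p2, q2 = diff2 * h, p2 * h, q2 * h
    return 2.0 * diff2 / ((1.0 - p2) * (1.0 - q2))
```

Calling `psi_closed_form(0.0, 1.0, 1.0, 2.0)` and `psi_numeric(0.0, 1.0, 1.0, 2.0)` should give matching values up to quadrature error, and identical parameters should give a value of zero up to rounding.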
As an example, the closed-form expression (14) can be used to compare any two $\mathbb{R}$-valued Gaussian processes on a distributional basis after they reach a time-point where (16) is satisfied. As a canonical family, we can analyze two $\mathbb{R}$-valued Brownian motions as per below.

Example 2.6. Let $\{W^{(p)}_t\}_{t \geq 0}$ be a $\mathbb{P}$-Brownian motion with $\mathbb{E}^{\mathbb{P}}[W^{(p)}_t] = t\mu_p$ and $\mathrm{Var}^{\mathbb{P}}[W^{(p)}_t] = t\sigma_p^2$. Also, let $\{W^{(q)}_t\}_{t \geq 0}$ be a $\mathbb{Q}$-Brownian motion with $\mathbb{E}^{\mathbb{Q}}[W^{(q)}_t] = t\mu_q$ and $\mathrm{Var}^{\mathbb{Q}}[W^{(q)}_t] = t\sigma_q^2$. Then, their divergence in terms of $\Psi$ is

$$\Psi(\mathbb{P},\mathbb{Q}) = \frac{(\sigma_p\sqrt{t\pi})^{-1} - 2\sqrt{2}\,\lambda_t(\Theta_p,\Theta_q) + (\sigma_q\sqrt{t\pi})^{-1}}{\left(1 - (2\sigma_p\sqrt{t\pi})^{-1}\right)\left(1 - (2\sigma_q\sqrt{t\pi})^{-1}\right)} \quad \text{for } t > \max\left(\frac{1}{4\sigma_p^2\pi}, \frac{1}{4\sigma_q^2\pi}\right), \tag{19}$$

given that

$$\lambda_t(\Theta_p,\Theta_q) = \exp\left(-\frac{t(\mu_p - \mu_q)^2}{2(\sigma_p^2 + \sigma_q^2)}\right)\left(t\pi(\sigma_p^2 + \sigma_q^2)\right)^{-\frac{1}{2}}.$$

Using the closed-form expression (14), we can also study the asymptotic behaviour of $\Psi(\mathbb{P},\mathbb{Q})$ across relevant parametric limits, for which we omit the proof since it follows directly from (14).

Corollary 2.7. The following limits hold:

$$\Psi(\mathbb{P},\mathbb{Q}) \to \frac{2\sigma_p\sqrt{\pi}}{2\sigma_p^2\pi - \sigma_p\sqrt{\pi}} \text{ as } \sigma_q \to \infty \quad \text{and} \quad \Psi(\mathbb{P},\mathbb{Q}) \to \frac{2\sigma_q\sqrt{\pi}}{2\sigma_q^2\pi - \sigma_q\sqrt{\pi}} \text{ as } \sigma_p \to \infty \tag{20}$$

for any $\mu_p, \mu_q$. In addition, the double-limit satisfies

$$\Psi(\mathbb{P},\mathbb{Q}) \to 0 \text{ as } \sigma_p, \sigma_q \to \infty \tag{21}$$

for any $\mu_p, \mu_q$.

It can be seen from (20)-(21) that the mean parameters $\mu_p$ and $\mu_q$ lose their impact on $\Psi$ asymptotically as one or both of the variance parameters diverge, i.e. if any of the Gaussian measures have a significantly large variance, then the mean parameter is decreasingly important in distinguishing the difference between those measures. From (21) we further see that two Gaussian measures with significantly large variances are less distinguishable from each other.
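Example 2.6 and Corollary 2.7 can be explored with a few lines of code. The sketch below is an illustration under stated assumptions, not the author's code: it evaluates (19) by substituting the time-$t$ mean $t\mu$ and standard deviation $\sigma\sqrt{t}$ into (14), and it approximates the limit in (20) by a large but finite $\sigma_q$. Function names are illustrative.

```python
import math

def psi(mu_p, sigma_p, mu_q, sigma_q):
    """Closed form (14) for 1-D Gaussians; assumes condition (16) holds."""
    root_pi = math.sqrt(math.pi)
    var_sum = sigma_p ** 2 + sigma_q ** 2
    lam = math.exp(-((mu_p - mu_q) ** 2) / (2.0 * var_sum)) / math.sqrt(math.pi * var_sum)
    norm_p = 1.0 / (2.0 * sigma_p * root_pi)  # squared L2-norm of p
    norm_q = 1.0 / (2.0 * sigma_q * root_pi)  # squared L2-norm of q
    numerator = 2.0 * norm_p - 2.0 * math.sqrt(2.0) * lam + 2.0 * norm_q
    return numerator / ((1.0 - norm_p) * (1.0 - norm_q))

def t_min(sigma_p, sigma_q):
    """Lower bound on t in (19)."""
    return max(1.0 / (4.0 * sigma_p ** 2 * math.pi), 1.0 / (4.0 * sigma_q ** 2 * math.pi))

def psi_brownian(t, mu_p, sigma_p, mu_q, sigma_q):
    """Psi at time t as in (19): means scale with t, standard deviations with sqrt(t)."""
    if t <= t_min(sigma_p, sigma_q):
        raise ValueError("t must exceed the bound in (19)")
    root_t = math.sqrt(t)
    return psi(t * mu_p, sigma_p * root_t, t * mu_q, sigma_q * root_t)

def psi_limit_large_sigma_q(sigma_p):
    """Right-hand side of the first limit in (20), as sigma_q tends to infinity."""
    root_pi = math.sqrt(math.pi)
    return 2.0 * sigma_p * root_pi / (2.0 * sigma_p ** 2 * math.pi - sigma_p * root_pi)

if __name__ == "__main__":
    # Divergence of two drifting Brownian motions at a few time points.
    for t in (0.5, 1.0, 5.0, 25.0):
        print(t, psi_brownian(t, 0.1, 1.0, -0.2, 1.5))
    # Finite-sigma_q approximation of the first limit in (20).
    print(psi(0.0, 1.0, 5.0, 1e8), psi_limit_large_sigma_q(1.0))
```

The large-$\sigma_q$ value should be close to the stated limit regardless of the chosen means, which is the point made after Corollary 2.7.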
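3 Conclusion

Given the hyperbolic geometric connection of (12) to the spherical squared-Hellinger distance, we are encouraged to communicate formula (14) as an alternative measure of divergence between Gaussian variables in agreement with our Proposal 1.1. We refer the reader to [3, 6] for applications of divergence measures in variational Bayesian inference. We also refer to [8] for quantum counterparts of the squared-Hellinger distance within quantum information theory, which lend themselves to future research in connection with the spherical-hyperbolic duality we presented in this paper.

Appendix

The closed-form expression in (14) can be generalised to multivariate Gaussian measures. Accordingly, let $\mathbb{P}$ and $\mathbb{Q}$ be Gaussian measures on $\mathbb{R}^n$, where $p : \mathbb{R}^n \to \mathbb{R}_{+}$ and $q : \mathbb{R}^n \to \mathbb{R}_{+}$ are the corresponding Gaussian probability density functions for some $n \geq 1$. Denote the parameters of $p$ and $q$ as $\Theta_p = \{\mu_p, \Sigma_p\}$ and $\Theta_q = \{\mu_q, \Sigma_q\}$, respectively. Hence, we have the density expressions

$$p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}\sqrt{|\Sigma_p|}}\exp\left(-\frac{1}{2}(x - \mu_p)^{\top}\Sigma_p^{-1}(x - \mu_p)\right) = \exp\left(\Lambda_p + \hat{\mu}_p^{\top}x - \frac{1}{2}x^{\top}\Sigma_p^{-1}x\right)$$

and

$$q(x) = \frac{1}{(2\pi)^{\frac{n}{2}}\sqrt{|\Sigma_q|}}\exp\left(-\frac{1}{2}(x - \mu_q)^{\top}\Sigma_q^{-1}(x - \mu_q)\right) = \exp\left(\Lambda_q + \hat{\mu}_q^{\top}x - \frac{1}{2}x^{\top}\Sigma_q^{-1}x\right),$$

given that

$$\hat{\mu}_i = \Sigma_i^{-1}\mu_i, \qquad \Lambda_i = -\frac{1}{2}\left(n\log(2\pi) - \log(|\Sigma_i^{-1}|) + \hat{\mu}_i^{\top}\Sigma_i\hat{\mu}_i\right),$$

for $i \in \{p, q\}$. Therefore, we can write the following:

$$\begin{aligned}
p(x)q(x) &= \exp\left(\Lambda_p + \Lambda_q + \left(\hat{\mu}_p^{\top} + \hat{\mu}_q^{\top}\right)x - \frac{1}{2}x^{\top}\left(\Sigma_p^{-1} + \Sigma_q^{-1}\right)x\right) \\
&= \exp\left(\Lambda_p + \Lambda_q + \hat{\mu}_{pq}^{\top}x - \frac{1}{2}x^{\top}\Sigma_{pq}^{-1}x\right) \\
&= \exp\left(\Lambda_p + \Lambda_q - \Lambda_{pq}\right)\exp\left(\Lambda_{pq} + \hat{\mu}_{pq}^{\top}x - \frac{1}{2}x^{\top}\Sigma_{pq}^{-1}x\right),
\end{aligned}$$

where, matching terms across the first two lines, $\hat{\mu}_{pq} = \hat{\mu}_p + \hat{\mu}_q$ and $\Sigma_{pq}^{-1} = \Sigma_p^{-1} + \Sigma_q^{-1}$, and where we have

$$\Lambda_{pq} = -\frac{1}{2}\left(n\log(2\pi) - \log(|\Sigma_{pq}^{-1}|) + \hat{\mu}_{pq}^{\top}\Sigma_{pq}\hat{\mu}_{pq}\right).$$

It follows that

$$\int_{\Omega}\mathrm{d}\mathbb{P}(\omega)\,\mathrm{d}\mathbb{Q}(\omega) = \exp\left(\Lambda_p + \Lambda_q - \Lambda_{pq}\right), \qquad \int_{\Omega}\mathrm{d}\mathbb{P}(\omega)^{2} = \exp\left(2\Lambda_p - \Lambda_{pp}\right), \qquad \int_{\Omega}\mathrm{d}\mathbb{Q}(\omega)^{2} = \exp\left(2\Lambda_q - \Lambda_{qq}\right).$$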
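The three integrals above can be sketched in code for the bivariate case $n = 2$. The example below is a hedged illustration, not the author's implementation: it hard-codes $2 \times 2$ linear algebra to stay dependency-free, evaluates each integral through the $\Lambda$ terms, and inserts them into (12). The helper `overlap_reference` uses the standard Gaussian product identity (an assumption not taken from the text) purely as a cross-check. All names are illustrative.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def matvec(a, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def log_normaliser(mu_hat, precision):
    """Lambda for natural parameters (mu_hat, precision = Sigma^{-1}), with n = 2."""
    cov = inv2(precision)
    return -0.5 * (2.0 * math.log(2.0 * math.pi)
                   - math.log(det2(precision))
                   + dot(mu_hat, matvec(cov, mu_hat)))

def overlap(mu_a, cov_a, mu_b, cov_b):
    """Integral of p_a * p_b over R^2, computed as exp(Lambda_a + Lambda_b - Lambda_ab)."""
    prec_a, prec_b = inv2(cov_a), inv2(cov_b)
    hat_a, hat_b = matvec(prec_a, mu_a), matvec(prec_b, mu_b)
    prec_ab = [[prec_a[i][j] + prec_b[i][j] for j in range(2)] for i in range(2)]
    hat_ab = [hat_a[0] + hat_b[0], hat_a[1] + hat_b[1]]
    return math.exp(log_normaliser(hat_a, prec_a)
                    + log_normaliser(hat_b, prec_b)
                    - log_normaliser(hat_ab, prec_ab))

def overlap_reference(mu_a, cov_a, mu_b, cov_b):
    """Cross-check: density of N(0, cov_a + cov_b) evaluated at mu_a - mu_b."""
    s = [[cov_a[i][j] + cov_b[i][j] for j in range(2)] for i in range(2)]
    d = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    quad = dot(d, matvec(inv2(s), d))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det2(s)))

def psi_2d(mu_p, cov_p, mu_q, cov_q):
    """Insert the three integrals into (12); requires both squared L2-norms below 1."""
    pp = overlap(mu_p, cov_p, mu_p, cov_p)
    qq = overlap(mu_q, cov_q, mu_q, cov_q)
    pq = overlap(mu_p, cov_p, mu_q, cov_q)
    if pp >= 1.0 or qq >= 1.0:
        raise ValueError("need ||p|| < 1 and ||q|| < 1 in L2")
    return 2.0 * (pp - 2.0 * pq + qq) / ((1.0 - pp) * (1.0 - qq))
```

For instance, with `cov_p = [[1.0, 0.3], [0.3, 2.0]]` and `cov_q = [[2.0, -0.5], [-0.5, 1.5]]`, `overlap` and `overlap_reference` should agree to floating-point precision. Hard-coding $n = 2$ keeps the sketch short; a general-$n$ version would replace the helpers with a linear-algebra library.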
Finally, as we insert these terms into (12), we get the following expression:

$$\Psi(\mathbb{P},\mathbb{Q}) = \frac{2\left(\exp\left(2\Lambda_p - \Lambda_{pp}\right) - 2\exp\left(\Lambda_p + \Lambda_q - \Lambda_{pq}\right) + \exp\left(2\Lambda_q - \Lambda_{qq}\right)\right)}{\left(1 - \exp\left(2\Lambda_p - \Lambda_{pp}\right)\right)\left(1 - \exp\left(2\Lambda_q - \Lambda_{qq}\right)\right)}, \tag{22}$$

as a multivariate generalisation of (14) in Proposition 2.5. The expression in (22) can be further simplified, which we leave to the interested reader.

References

[1] Anderson, J. W., Hyperbolic Geometry. Springer-Verlag (2005)
[2] Arwini, K., Dodson, C. T. J., Information Geometry, Near Randomness and Near Independence. Springer-Verlag, Berlin (2008)
[3] Bishop, C. M., Pattern Recognition and Machine Learning. Springer New York, NY (2006)
[4] Burbea, J., Rao, C. R., Entropy Differential Metric, Distance and Divergence Measures in Probability Spaces: A Unified Approach. Journal of Multivariate Analysis 12 (1982)
[5] Jacod, J., Shiryaev, A. N., Limit Theorems for Stochastic Processes. Springer Berlin, Heidelberg (2003)
[6] Rasmussen, C. E., Williams, C. K. I., Gaussian Processes for Machine Learning. The MIT Press (2005)
[7] Mengütürk, L. A., Gaussian Random Bridges and a Geometric Model for Information Equilibrium. Physica A: Statistical Mechanics and its Applications 494 (2018)
[8] Pitrik, J., Virosztek, D., Quantum Hellinger Distances Revisited. Letters in Mathematical Physics 110 (2020)
