A non-negative expansion for small Jensen-Shannon Divergences

In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.

Authors: - **Anil Raj** – Department of Applied Physics and Applied Mathematics, Columbia University
- **Chris H. Wiggins** – Department of Applied Physics and Applied Mathematics, Center for Computational Biology and Bioinformatics, Columbia University

Anil Raj,∗ Department of Applied Physics and Applied Mathematics, Columbia University, New York
Chris H. Wiggins,† Department of Applied Physics and Applied Mathematics, and Center for Computational Biology and Bioinformatics, Columbia University, New York
(Dated: October 13, 2021)

In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.

Keywords: entropy, JS divergence

I. INTRODUCTION

The Jensen-Shannon divergence (JSD) has been widely used as a dissimilarity measure between weighted probability distributions. The direct numerical evaluation of the exact expression for the JSD (involving a difference of logarithms), however, leads to numerical errors when the distributions are close to each other (small JSD); when the element-wise difference between the distributions is $O(10^{-1})$, this naive formula produces erroneous values (sometimes negative) when used for numerical calculations. In this report, we derive a provably non-negative series expansion for the JSD which can be used in the small-JSD limit, where the naive formula fails.

II. SERIES EXPANSION FOR JENSEN-SHANNON DIVERGENCE

Consider two discrete probability distributions $p^1$ and $p^2$ over a sample space $S$ of cardinality $N$, with relative normalized weights $\pi_1$ and $\pi_2$ between them. The JSD between the distributions is then defined as [1]

$$\Delta_{\mathrm{naive}}[p^1, p^2; \pi_1, \pi_2] = H[\pi_1 p^1 + \pi_2 p^2] - \left(\pi_1 H[p^1] + \pi_2 H[p^2]\right) \tag{1}$$

where the entropy (measured in nats) of a probability distribution is defined as

$$H[p] = \sum_{j=1}^{N} h(p_j) = -\sum_{j=1}^{N} p_j \log(p_j). \tag{2}$$
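Eq. (1) translates directly into code. Below is a minimal Python sketch of the naive evaluation (the paper's own implementation is in MATLAB; the function name `jsd_naive` and the zero-probability convention are ours):

```python
import numpy as np

def jsd_naive(pi1, p1, pi2, p2):
    """Direct evaluation of Eq. (1) in nats:
    H[pi1*p1 + pi2*p2] - (pi1*H[p1] + pi2*H[p2])."""
    def entropy(p):
        p = np.asarray(p, dtype=float)
        nz = p > 0                      # convention: 0*log(0) = 0
        return -np.sum(p[nz] * np.log(p[nz]))

    mixture = pi1 * np.asarray(p1, dtype=float) + pi2 * np.asarray(p2, dtype=float)
    return entropy(mixture) - (pi1 * entropy(p1) + pi2 * entropy(p2))
```

For well-separated distributions this direct form is accurate; the paper's point is that when $p^1 \approx p^2$ the two entropy terms nearly cancel, so floating-point round-off can dominate, or even flip the sign of, the result.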
Defining

$$\begin{aligned}
\bar{p}_j &= (p^1_j + p^2_j)/2; &\quad 0 \le \bar{p}_j \le 1; \quad \textstyle\sum_{j=1}^{N} \bar{p}_j = 1 \\
\eta_j &= (p^1_j - p^2_j)/2; &\quad \textstyle\sum_{j=1}^{N} \eta_j = 0 \\
\varepsilon_j &= \eta_j / \bar{p}_j; &\quad -1 \le \varepsilon_j \le 1 \\
\alpha &= \pi_1 - \pi_2; &\quad -1 \le \alpha \le 1
\end{aligned} \tag{3}$$

we have

$$h(\pi_1 p^1_j + \pi_2 p^2_j) = -\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right) \log\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right) = -\bar{p}_j (1 + \alpha\varepsilon_j)\left[\log(\bar{p}_j) + \log(1 + \alpha\varepsilon_j)\right] \tag{4}$$

and

$$\begin{aligned}
\pi_1 h(p^1_j) + \pi_2 h(p^2_j) &= -\pi_1(\bar{p}_j + \eta_j)\log(\bar{p}_j + \eta_j) - \pi_2(\bar{p}_j - \eta_j)\log(\bar{p}_j - \eta_j) \\
&= -\tfrac{1}{2}\bar{p}_j(1+\alpha)(1+\varepsilon_j)\log(\bar{p}_j(1+\varepsilon_j)) - \tfrac{1}{2}\bar{p}_j(1-\alpha)(1-\varepsilon_j)\log(\bar{p}_j(1-\varepsilon_j)) \\
&= -\bar{p}_j(1+\alpha\varepsilon_j)\log(\bar{p}_j) - \tfrac{1}{2}\bar{p}_j(1+\alpha\varepsilon_j)\log(1-\varepsilon_j^2) - \tfrac{1}{2}\bar{p}_j(\alpha+\varepsilon_j)\log\!\left(\frac{1+\varepsilon_j}{1-\varepsilon_j}\right).
\end{aligned} \tag{5}$$

Thus,

$$h(\pi_1 p^1_j + \pi_2 p^2_j) - \left(\pi_1 h(p^1_j) + \pi_2 h(p^2_j)\right) = \frac{1}{2}\bar{p}_j\left[(1+\alpha\varepsilon_j)\log\!\left(\frac{1-\varepsilon_j^2}{(1+\alpha\varepsilon_j)^2}\right) + (\alpha+\varepsilon_j)\log\!\left(\frac{1+\varepsilon_j}{1-\varepsilon_j}\right)\right]. \tag{6}$$

The Taylor series expansion of the logarithm function is given as

$$\log(1+x) = \sum_{i=1}^{\infty} c_i x^i; \qquad c_i = \frac{(-1)^{i+1}}{i}. \tag{7}$$

The logarithms in the expression for the J-S divergence can then be written as

$$\log(1+\varepsilon_j) = \sum_{i=1}^{\infty} c_i \varepsilon_j^i, \qquad \log(1-\varepsilon_j) = \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i, \qquad \log(1+\alpha\varepsilon_j) = \sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i. \tag{8}$$

We then have $\Delta = \frac{1}{2}\sum_{j=1}^{N} \bar{p}_j \delta_j$, with

$$\begin{aligned}
\delta_j &= (1+\alpha\varepsilon_j)\left[\log(1+\varepsilon_j) + \log(1-\varepsilon_j) - 2\log(1+\alpha\varepsilon_j)\right] + (\alpha+\varepsilon_j)\left[\log(1+\varepsilon_j) - \log(1-\varepsilon_j)\right] \\
&= (1+\alpha\varepsilon_j)\left[\sum_{i=1}^{\infty} c_i \varepsilon_j^i + \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i - 2\sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i\right] + (\alpha+\varepsilon_j)\left[\sum_{i=1}^{\infty} c_i \varepsilon_j^i - \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i\right] \\
&= \sum_{i=1}^{\infty} c_i \left(\varepsilon_j^i + \alpha\varepsilon_j^{i+1} + (-1)^i \varepsilon_j^i + (-1)^i \alpha\varepsilon_j^{i+1} - 2\alpha^i \varepsilon_j^i - 2\alpha^{i+1}\varepsilon_j^{i+1} + \alpha\varepsilon_j^i + \varepsilon_j^{i+1} + (-1)^{i+1}\alpha\varepsilon_j^i + (-1)^{i+1}\varepsilon_j^{i+1}\right) \\
&= \sum_{i=1}^{\infty} c_i \left[\left((-1)^i - 2\alpha^i + \alpha + (-1)^{i+1}\alpha + 1\right)\varepsilon_j^i + \left((-1)^i\alpha - 2\alpha^{i+1} + 1 + (-1)^{i+1} + \alpha\right)\varepsilon_j^{i+1}\right].
\end{aligned} \tag{9}$$

∗ Electronic address: ar2384@columbia.edu
† Electronic address: chris.wiggins@columbia.edu
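Before expanding in powers of $\varepsilon_j$, it is worth sanity-checking the algebra: Eq. (6) is an exact identity, so for any single element the bracketed form must reproduce the direct difference of entropy terms. A short Python check (the variable names are ours, chosen to mirror Eq. (3)):

```python
import numpy as np

def h(x):
    # elementwise entropy term: h(x) = -x*log(x)
    return -x * np.log(x)

# one element of each distribution, and the weights (pi1 + pi2 = 1)
p1j, p2j = 0.3, 0.2
pi1, pi2 = 0.6, 0.4

# quantities defined in Eq. (3)
pbar = (p1j + p2j) / 2    # \bar{p}_j
eta = (p1j - p2j) / 2     # \eta_j
eps = eta / pbar          # \varepsilon_j
alpha = pi1 - pi2         # \alpha

# left side: h(pi1*p1j + pi2*p2j) - (pi1*h(p1j) + pi2*h(p2j))
lhs = h(pi1 * p1j + pi2 * p2j) - (pi1 * h(p1j) + pi2 * h(p2j))
# right side: the bracketed expression of Eq. (6), times pbar/2
rhs = 0.5 * pbar * (
    (1 + alpha * eps) * np.log((1 - eps**2) / (1 + alpha * eps) ** 2)
    + (alpha + eps) * np.log((1 + eps) / (1 - eps))
)
print(abs(lhs - rhs))
```

The two sides agree to floating-point precision for any valid choice of the inputs, since no approximation has been made yet.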
When $i = 1$, $\mathrm{coeff}(\varepsilon_j) = c_1(-1 - 2\alpha + \alpha + \alpha + 1) = 0$. The first non-vanishing term in the expansion is then of order 2. Shifting indices of the first term in Eqn. (9) gives

$$\begin{aligned}
\delta_j &= \sum_{i=1}^{\infty} \left[c_{i+1}\left((-1)^{i+1} - 2\alpha^{i+1} + \alpha + (-1)^{i+2}\alpha + 1\right) + c_i\left((-1)^i\alpha - 2\alpha^{i+1} + 1 + (-1)^{i+1} + \alpha\right)\right]\varepsilon_j^{i+1} \\
&= \sum_{i=1}^{\infty} (c_i + c_{i+1})\left[(-1)^i\alpha - 2\alpha^{i+1} + \alpha + 1 + (-1)^{i+1}\right]\varepsilon_j^{i+1} \\
&= \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i(i+1)}\left[(-1)^i\alpha - 2\alpha^{i+1} + \alpha + 1 + (-1)^{i+1}\right]\varepsilon_j^{i+1} \\
&= \sum_{i=1}^{\infty} B_i \varepsilon_j^{i+1}
\end{aligned} \tag{10}$$

where

$$B_i = \frac{1 - \alpha + (-1)^{i+1}\left(1 + \alpha - 2\alpha^{i+1}\right)}{i(i+1)} = \begin{cases} 2\left(1-\alpha^{i+1}\right)/(i(i+1)) & i \text{ odd}, \\ -2\left(\alpha-\alpha^{i+1}\right)/(i(i+1)) & i \text{ even}. \end{cases} \tag{11}$$

FIG. 1: Plot comparing the naive and approximate formulae, truncated at different orders, for calculating JSD as a function of the normalized L2-distance ($\|\varepsilon\|$; see Section III) between pairs of randomly generated probability distributions. Best-fit slopes are: -2.05 ($k$ = 3), -5.89 ($k$ = 6), -8.14 ($k$ = 9), -11.91 ($k$ = 12) and -105.43 (comparing naive with $k$ = 100).

FIG. 2: Probability of obtaining (erroneous) negative values, when directly evaluating JSD using its exact expression, plotted as a function of $\|\varepsilon\|$. When implemented in MATLAB, we observe that the naive formula gives negative JSD when $\|\varepsilon\|$ is merely of $O(10^{-6})$.

This series expansion can be further simplified as

$$\delta_j = \sum_{i=1}^{\infty} \left(B_{2i-1} + B_{2i}\varepsilon_j\right)\varepsilon_j^{2i} = \sum_{i=1}^{\infty} B_{2i-1}\left(1 + \frac{B_{2i}}{B_{2i-1}}\varepsilon_j\right)\varepsilon_j^{2i}, \tag{12}$$

$$\frac{B_{2i}}{B_{2i-1}}\varepsilon_j = -\left(\frac{2i-1}{2i+1}\right)\alpha\varepsilon_j. \tag{13}$$

Since $-1 \le \alpha\varepsilon_j \le 1$, we have $-1 \le \frac{B_{2i}}{B_{2i-1}}\varepsilon_j \le 1$. Thus, for every $i$, $(B_{2i-1} + B_{2i}\varepsilon_j)\varepsilon_j^{2i} \ge 0$, making $\delta_j$, and hence the series expansion for $\Delta_{\mathrm{naive}}$, non-negative to all orders.

III. NUMERICAL RESULTS

The accuracy of the truncated series expansion can be compared with that of the naive formula by measuring the JSD between randomly generated probability distributions.
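The closed form for $B_i$ in Eq. (11) and the sign argument around Eqs. (12) and (13) are easy to verify numerically. Here is a Python sketch using exact rational arithmetic (the helper `B` and the sample values of $\alpha$ and $\varepsilon_j$ are ours, chosen for illustration):

```python
from fractions import Fraction

def B(i, alpha):
    # Eq. (11): B_i = (1 - alpha + (-1)^(i+1) * (1 + alpha - 2*alpha^(i+1))) / (i*(i+1))
    return (1 - alpha + (-1) ** (i + 1) * (1 + alpha - 2 * alpha ** (i + 1))) / (i * (i + 1))

alpha = Fraction(3, 10)  # example weight difference, |alpha| <= 1

# check the odd/even closed forms of Eq. (11)
for i in range(1, 8):
    if i % 2 == 1:
        expected = 2 * (1 - alpha ** (i + 1)) / (i * (i + 1))
    else:
        expected = -2 * (alpha - alpha ** (i + 1)) / (i * (i + 1))
    assert B(i, alpha) == expected

# each grouped term (B_{2i-1} + B_{2i}*eps) * eps^{2i} in Eq. (12) is non-negative
eps = Fraction(-9, 10)   # any value in [-1, 1]
for i in range(1, 6):
    term = (B(2 * i - 1, alpha) + B(2 * i, alpha) * eps) * eps ** (2 * i)
    assert term >= 0
```

Using `Fraction` keeps the check free of round-off, so the non-negativity observed is a property of the coefficients themselves, not of floating-point luck.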
Pairs of probability distributions with $-4 \le \log_{10}\|\varepsilon\| < 0$, where $\|\varepsilon\| = \sqrt{\sum_{j=1}^{N}\varepsilon_j^2 / N}$, were randomly generated, and the J-S divergence between each pair was calculated both by a direct evaluation of the exact expression ($\Delta_{\mathrm{naive}}$) and by the approximate expansion ($\Delta_k$; $k \in \{3, 6, 9, 12\}$), where

$$\Delta_k = \frac{1}{2}\sum_{j=1}^{N} \bar{p}_j \delta_{jk}; \qquad \delta_{jk} = \sum_{i=1}^{k} B_i \varepsilon_j^{i+1}. \tag{14}$$

The results shown in Fig. 1 suggest the series expansion to be the more numerically useful formula when the probability distributions differ by $\|\varepsilon\| \sim O(10^{-0.5})$. Fig. 2 further shows that when $\|\varepsilon\| \sim O(10^{-6})$, a direct evaluation of the exact formula for JSD gives negative values (when implemented in MATLAB).

APPENDIX

Here we include the MATLAB code used in the figures for approximate evaluation of JSD using its series expansion.

```matlab
function [JS,epsnorm] = JSapprx(pi1,p1,pi2,p2,order)
% [JS,epsnorm]=JSapprx(pi1,p1,pi2,p2,order) calculates JS
% divergence given two probability distributions and
% their relative weights. JSapprx uses an approximation
% to the JSD by expanding in powers of epsilon=(p1-p2)/(p1+p2)
% and truncating at an order input by the user.
%
% This calculation is described in the technical report
% "A non-negative expansion for small Jensen-Shannon Divergences"
% by Anil Raj and Chris H. Wiggins, October 2008

% average of distributions
pbar=(p1+p2)/2;
% difference of distributions
eta=(p1-p2)/2;
% ratio of difference to average
epsilon=eta./pbar;
% difference in biases, where pi1+pi2=1
alpha=pi1-pi2;

% calculate JS by summing up to order 'order'
js=zeros(size(pbar));
% denominator computed by summing, as well
denominator=0;
for i=2:order
    denominator=denominator+(i-1);
    % numerical coefficient
    c=(-1)^i*(1/denominator);
    Bi=c*(alpha^(mod(i,2))-alpha^i);
    js=js+Bi*(epsilon.^i);
end
% sum over 'j':
JS=pbar'*js/2;
% convert from nats to bits:
JS=JS/log(2);

% norm of epsilon reported as output
if nargout==2
    epsnorm=sqrt(sum(epsilon.^2)/length(pbar));
end
```

[1] J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, Jan 1991.
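For readers working outside MATLAB, the appendix code transcribes almost line for line into Python; the sketch below mirrors the MATLAB variable names, though the NumPy port itself is ours, not from the paper:

```python
import numpy as np

def js_apprx(pi1, p1, pi2, p2, order):
    """Truncated series evaluation of the JSD (Eq. 14), returned in bits,
    along with the norm of epsilon."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    pbar = (p1 + p2) / 2              # average of distributions
    eta = (p1 - p2) / 2               # difference of distributions
    epsilon = eta / pbar              # ratio of difference to average
    alpha = pi1 - pi2                 # difference in biases (pi1 + pi2 = 1)

    js = np.zeros_like(pbar)
    denominator = 0                   # accumulated by summing, as in the MATLAB code
    for i in range(2, order + 1):
        denominator += i - 1
        c = (-1) ** i / denominator   # numerical coefficient
        Bi = c * (alpha ** (i % 2) - alpha ** i)
        js += Bi * epsilon ** i
    JS = pbar @ js / 2                # sum over j
    JS /= np.log(2)                   # convert from nats to bits
    epsnorm = np.sqrt(np.sum(epsilon ** 2) / len(pbar))
    return JS, epsnorm
```

As a quick check on nearly equal distributions: for $p^1 = (0.501, 0.499)$ and $p^2 = (0.499, 0.501)$ with equal weights, the truncated series agrees closely with the exact JSD and, by construction, cannot go negative.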
