A non-negative expansion for small Jensen-Shannon Divergences

Reading time: 6 minutes

📝 Original Info

  • Title: A non-negative expansion for small Jensen-Shannon Divergences
  • ArXiv ID: 0810.5117
  • Date: 2008-10-29
  • Authors: Anil Raj, Chris H. Wiggins (Columbia University)

📝 Abstract

In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculation of the JSD when the probability distributions are nearly equal, a regime in which small numerical errors dominate the evaluation.


📄 Full Content

A non-negative expansion for small Jensen-Shannon Divergences

Anil Raj
Department of Applied Physics and Applied Mathematics, Columbia University, New York
∗Electronic address: ar2384@columbia.edu

Chris H. Wiggins
Department of Applied Physics and Applied Mathematics, Center for Computational Biology and Bioinformatics, Columbia University, New York
†Electronic address: chris.wiggins@columbia.edu

arXiv:0810.5117v1 [stat.ML] 28 Oct 2008 (Dated: October 13, 2021)

In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculation of the JSD when the probability distributions are nearly equal, a regime in which small numerical errors dominate the evaluation.

Keywords: entropy, JS divergence

I. INTRODUCTION

The Jensen-Shannon divergence (JSD) has been widely used as a dissimilarity measure between weighted probability distributions. The direct numerical evaluation of the exact expression for the JSD (involving a difference of logarithms), however, leads to numerical errors when the distributions are close to each other (small JSD); when the element-wise difference between the distributions is $O(10^{-1})$, this naive formula produces erroneous values (sometimes negative) when used for numerical calculations. In this report, we derive a provably non-negative series expansion for the JSD which can be used in the small-JSD limit, where the naive formula fails.

II. SERIES EXPANSION FOR JENSEN-SHANNON DIVERGENCE

Consider two discrete probability distributions $p_1$ and $p_2$ over a sample space $S$ of cardinality $N$, with relative normalized weights $\pi_1$ and $\pi_2$ between them. The JSD between the distributions is then defined as [1]

$$
\Delta_{\mathrm{naive}}[p_1, p_2; \pi_1, \pi_2] = H[\pi_1 p_1 + \pi_2 p_2] - \left( \pi_1 H[p_1] + \pi_2 H[p_2] \right) \tag{1}
$$

where the entropy (measured in nats) of a probability distribution is defined as

$$
H[p] = \sum_{j=1}^{N} h(p_j) = -\sum_{j=1}^{N} p_j \log(p_j). \tag{2}
$$
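To make the failure mode from the introduction concrete, here is a minimal Python sketch of the naive formula in Eqn. (1); the function names and the floating-point demonstration are our own illustration, not code from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, Eqn. (2); 0*log(0) is taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def jsd_naive(p1, p2, pi1=0.5, pi2=0.5):
    """Direct evaluation of Eqn. (1): a difference of entropies."""
    mix = pi1 * np.asarray(p1, float) + pi2 * np.asarray(p2, float)
    return entropy(mix) - (pi1 * entropy(p1) + pi2 * entropy(p2))

# Two nearly equal distributions: the true JSD is tiny and positive, but
# subtracting two almost-equal entropies can round to a negative value.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(100))
eta = 1e-9 * (rng.random(100) - 0.5)
eta -= eta.mean()            # keep q normalized: sum(q) == sum(p) == 1
q = p + eta
print(jsd_naive(p, q))       # may print a (wrong) negative number
```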

Now define

$$
\begin{aligned}
\bar{p}_j &= (p_{1j} + p_{2j})/2, \qquad 0 \leq \bar{p}_j \leq 1, \quad \textstyle\sum_{j=1}^{N} \bar{p}_j = 1, \\
\eta_j &= (p_{1j} - p_{2j})/2, \qquad \textstyle\sum_{j=1}^{N} \eta_j = 0, \\
\varepsilon_j &= \eta_j / \bar{p}_j, \qquad -1 \leq \varepsilon_j \leq 1, \\
\alpha &= \pi_1 - \pi_2, \qquad -1 \leq \alpha \leq 1.
\end{aligned} \tag{3}
$$
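In code, the change of variables in Eqn. (3) is only a few lines. The helper below (our own naming; it is reused in the later sketches) maps a pair of distributions to $(\bar{p}, \varepsilon, \alpha)$:

```python
import numpy as np

def reparametrize(p1, p2, pi1=0.5, pi2=0.5):
    """Map (p1, p2, pi1, pi2) to (p_bar, eps, alpha) as in Eqn. (3).

    Where p_bar_j == 0, both p1_j and p2_j vanish and the element
    contributes nothing, so eps_j is set to 0 there.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    p_bar = (p1 + p2) / 2.0
    eta = (p1 - p2) / 2.0
    eps = np.divide(eta, p_bar, out=np.zeros_like(eta), where=p_bar > 0)
    alpha = pi1 - pi2
    return p_bar, eps, alpha
```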

With these definitions (and using $\pi_1 + \pi_2 = 1$, so that $\pi_1 = (1+\alpha)/2$ and $\pi_2 = (1-\alpha)/2$), we have

$$
\begin{aligned}
h(\pi_1 p_{1j} + \pi_2 p_{2j}) &= -\left( \pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j) \right) \log\left( \pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j) \right) \\
&= -\bar{p}_j (1 + \alpha \varepsilon_j) \left[ \log(\bar{p}_j) + \log(1 + \alpha \varepsilon_j) \right] \tag{4}
\end{aligned}
$$

and

$$
\begin{aligned}
\pi_1 h(p_{1j}) + \pi_2 h(p_{2j}) &= -\pi_1 (\bar{p}_j + \eta_j) \log(\bar{p}_j + \eta_j) - \pi_2 (\bar{p}_j - \eta_j) \log(\bar{p}_j - \eta_j) \\
&= -\tfrac{1}{2} \bar{p}_j (1+\alpha)(1+\varepsilon_j) \log\left( \bar{p}_j (1+\varepsilon_j) \right) - \tfrac{1}{2} \bar{p}_j (1-\alpha)(1-\varepsilon_j) \log\left( \bar{p}_j (1-\varepsilon_j) \right) \\
&= -\bar{p}_j (1 + \alpha \varepsilon_j) \log(\bar{p}_j) - \tfrac{1}{2} \bar{p}_j (1 + \alpha \varepsilon_j) \log(1 - \varepsilon_j^2) - \tfrac{1}{2} \bar{p}_j (\alpha + \varepsilon_j) \log\left( \frac{1+\varepsilon_j}{1-\varepsilon_j} \right). \tag{5}
\end{aligned}
$$

Thus,

$$
h(\pi_1 p_{1j} + \pi_2 p_{2j}) - \left( \pi_1 h(p_{1j}) + \pi_2 h(p_{2j}) \right) = \frac{1}{2} \bar{p}_j \left[ (1 + \alpha \varepsilon_j) \log\left( \frac{1 - \varepsilon_j^2}{(1 + \alpha \varepsilon_j)^2} \right) + (\alpha + \varepsilon_j) \log\left( \frac{1+\varepsilon_j}{1-\varepsilon_j} \right) \right]. \tag{6}
$$

The Taylor series expansion of the logarithm function is given as

$$
\log(1+x) = \sum_{i=1}^{\infty} c_i x^i; \qquad c_i = \frac{(-1)^{i+1}}{i}. \tag{7}
$$

The logarithms in the expression for the JS divergence can then be written as

$$
\log(1+\varepsilon_j) = \sum_{i=1}^{\infty} c_i \varepsilon_j^i, \qquad
\log(1-\varepsilon_j) = \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i, \qquad
\log(1+\alpha \varepsilon_j) = \sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i. \tag{8}
$$

We then have $\Delta = \frac{1}{2} \sum_{j=1}^{N} \bar{p}_j \delta_j$, with

$$
\begin{aligned}
\delta_j &= (1 + \alpha \varepsilon_j) \left[ \log(1+\varepsilon_j) + \log(1-\varepsilon_j) - 2 \log(1+\alpha \varepsilon_j) \right] + (\alpha + \varepsilon_j) \left[ \log(1+\varepsilon_j) - \log(1-\varepsilon_j) \right] \\
&= (1 + \alpha \varepsilon_j) \left[ \sum_{i=1}^{\infty} c_i \varepsilon_j^i + \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i - 2 \sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i \right] + (\alpha + \varepsilon_j) \left[ \sum_{i=1}^{\infty} c_i \varepsilon_j^i - \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i \right] \\
&= \sum_{i=1}^{\infty} c_i \Big[ \varepsilon_j^i + \alpha \varepsilon_j^{i+1} + (-1)^i \varepsilon_j^i + (-1)^i \alpha \varepsilon_j^{i+1} - 2 \alpha^i \varepsilon_j^i - 2 \alpha^{i+1} \varepsilon_j^{i+1} + \alpha \varepsilon_j^i + \varepsilon_j^{i+1} + (-1)^{i+1} \alpha \varepsilon_j^i + (-1)^{i+1} \varepsilon_j^{i+1} \Big] \\
&= \sum_{i=1}^{\infty} c_i \Big[ \big( (-1)^i - 2\alpha^i + \alpha + (-1)^{i+1}\alpha + 1 \big) \varepsilon_j^i + \big( (-1)^i \alpha - 2\alpha^{i+1} + 1 + (-1)^{i+1} + \alpha \big) \varepsilon_j^{i+1} \Big]. \tag{9}
\end{aligned}
$$

When $i = 1$, $\operatorname{coeff}(\varepsilon_j) = c_1(-1 - 2\alpha + \alpha + \alpha + 1) = 0$. The first non-vanishing term in the expansion is then of order 2.
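As a quick sanity check (ours, not the paper's), the vanishing of the order-1 coefficient in Eqn. (9) can be verified symbolically, for arbitrary $\alpha$, with sympy:

```python
import sympy as sp

alpha = sp.symbols('alpha')
i = 1
c1 = sp.Rational((-1)**(i + 1), i)    # c_1 = 1 from Eqn. (7)
coeff = c1 * ((-1)**i - 2*alpha**i + alpha + (-1)**(i + 1)*alpha + 1)
print(sp.simplify(coeff))             # prints 0: the eps_j^1 term cancels
```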

Shifting indices of the first term in Eqn. (9) gives

$$
\begin{aligned}
\delta_j &= \sum_{i=1}^{\infty} \Big[ c_{i+1} \big( (-1)^{i+1} - 2\alpha^{i+1} + \alpha + (-1)^{i+2}\alpha + 1 \big) + c_i \big( (-1)^i \alpha - 2\alpha^{i+1} + 1 + (-1)^{i+1} + \alpha \big) \Big] \varepsilon_j^{i+1} \\
&= \sum_{i=1}^{\infty} (c_i + c_{i+1}) \big( (-1)^i \alpha - 2\alpha^{i+1} + \alpha + 1 + (-1)^{i+1} \big) \varepsilon_j^{i+1} \\
&= \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i(i+1)} \big( (-1)^i \alpha - 2\alpha^{i+1} + \alpha + 1 + (-1)^{i+1} \big) \varepsilon_j^{i+1} \\
&\equiv \sum_{i=1}^{\infty} B_i \, \varepsilon_j^{i+1}, \tag{10}
\end{aligned}
$$

where the third line uses $c_i + c_{i+1} = (-1)^{i+1}\left( \frac{1}{i} - \frac{1}{i+1} \right) = \frac{(-1)^{i+1}}{i(i+1)}$, and

$$
B_i = \frac{1 - \alpha + (-1)^{i+1} \left( 1 + \alpha - 2\alpha^{i+1} \right)}{i(i+1)} =
\begin{cases}
\phantom{-}2 \left( 1 - \alpha^{i+1} \right) / \big( i(i+1) \big), & i \text{ odd}, \\
-2 \left( \alpha - \alpha^{i+1} \right) / \big( i(i+1) \big), & i \text{ even}.
\end{cases} \tag{11}
$$
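Eqns. (10) and (11) translate directly into a truncated-series evaluator. The sketch below is one plausible implementation (the names `B` and `jsd_series`, and the truncation convention with `k` counting powers of $\varepsilon_j$, are our own); it reuses `reparametrize` from the earlier sketch.

```python
import numpy as np

def B(i, alpha):
    """Series coefficients B_i from Eqn. (11)."""
    if i % 2 == 1:                                   # i odd
        return 2.0 * (1.0 - alpha**(i + 1)) / (i * (i + 1))
    return -2.0 * (alpha - alpha**(i + 1)) / (i * (i + 1))

def jsd_series(p1, p2, pi1=0.5, pi2=0.5, k=20):
    """JSD via Eqn. (10), truncated at order eps^k:
    Delta ~= (1/2) * sum_j p_bar_j * sum_{i=1}^{k-1} B_i * eps_j**(i+1).
    """
    p_bar, eps, alpha = reparametrize(p1, p2, pi1, pi2)  # Eqn. (3) helper
    delta = np.zeros_like(p_bar)
    for i in range(1, k):                            # terms eps^2 .. eps^k
        delta += B(i, alpha) * eps**(i + 1)
    return 0.5 * np.sum(p_bar * delta)
```

Since the leading term is of order $\varepsilon_j^2$, even a modest truncation order converges quickly for small $\|\varepsilon\|$; the paper's Fig. 1 (caption below) reports how the truncation error scales with $k$.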

[FIG. 1: Plot comparing the naive and approximate formulae, truncated at different orders, for calculating the JSD as a function of the normalized L2-distance ($\|\varepsilon\|$; see Section III) between pairs of randomly generated probability distributions. Best-fit slopes are: -2.05 (k = 3), -5.89 (k = 6), -8.14 (k = 9), -11.91 (k = 12) and -105.43 (comparing naive with k = 100).]

[FIG. 2: Probability of obtaining (erroneous) negative values when directly evaluating the JSD using its exact expression, plotted as a function of $\|\varepsilon\|$. When implemented in MATLAB, we observe that the naive formula gives a negative JSD when $\|\varepsilon\|$ is merely of $O(10^{-6})$.]

This series expansion can be further simplified as

$$
\delta_j = \sum_{i=1}^{\infty} \left( B_{2i-1} + B_{2i} \varepsilon_j \right) \varepsilon_j^{2i} = \sum_{i=1}^{\infty} B_{2i-1} \left( 1 + \frac{B_{2i}}{B_{2i-1}} \varepsilon_j \right) \varepsilon_j^{2i}, \tag{12}
$$

where

$$
\frac{B_{2i}}{B_{2i-1}} \varepsilon_j = -\left( \frac{2i-1}{2i+1} \right) \alpha \varepsilon_j. \tag{13}
$$

Since $-1 \leq \alpha \varepsilon_j \leq 1$, we have $-1 \leq \frac{B_{2i}}{B_{2i-1}} \varepsilon_j \leq 1$. Thus, for every $i$, $\left( B_{2i-1} + B_{2i} \varepsilon_j \right) \varepsilon_j^{2i} \geq 0$, making $\delta_j$, and with it the series expansion for $\Delta_{\mathrm{naive}}$, non-negative up to all orders.
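Putting the sketches together, a small experiment in the spirit of the paper's Fig. 2 might look like the following; the trial count, seed, and distribution sizes are our own choices, not the paper's setup (which used MATLAB):

```python
import numpy as np

# jsd_naive and jsd_series are the functions from the earlier sketches.
rng = np.random.default_rng(1)
negatives = 0
for _ in range(1000):
    p = rng.dirichlet(np.ones(50))
    eta = 1e-9 * (rng.random(50) - 0.5)
    eta -= eta.mean()                 # keep q normalized
    q = p + eta
    if jsd_naive(p, q) < 0:           # naive formula: may round negative
        negatives += 1
    assert jsd_series(p, q) >= 0      # series: non-negative term by term
print(f"naive formula went negative in {negatives} of 1000 trials")
```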

III. NUMERICAL RESULTS

The accu

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
