📝 Original Info
- Title: A non-negative expansion for small Jensen-Shannon Divergences
- ArXiv ID: 0810.5117
- Date: 2008-10-29
- Authors: Anil Raj, Chris H. Wiggins
📝 Abstract
In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD, when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.
📄 Full Content
A non-negative expansion for small Jensen-Shannon Divergences
Anil Raj
Department of Applied Physics and Applied Mathematics
Columbia University, New York∗
Chris H. Wiggins
Department of Applied Physics and Applied Mathematics
Center for Computational Biology and Bioinformatics
Columbia University, New York†
(Dated: October 13, 2021)
In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD, when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.
Keywords: entropy, JS divergence
I. INTRODUCTION
The Jensen-Shannon divergence (JSD) has been widely used as a dissimilarity measure between weighted probability distributions. The direct numerical evaluation of the exact expression for the JSD (involving a difference of logarithms), however, leads to numerical errors when the distributions are close to each other (small JSD); when the element-wise difference between the distributions is O(10^{-1}), this naive formula produces erroneous values (sometimes negative) when used for numerical calculations. In this report, we derive a provably non-negative series expansion for the JSD which can be used in the small-JSD limit, where the naive formula fails.
II. SERIES EXPANSION FOR JENSEN-SHANNON DIVERGENCE
Consider two discrete probability distributions p_1 and p_2 over a sample space S of cardinality N, with relative normalized weights \pi_1 and \pi_2 (\pi_1 + \pi_2 = 1) between them. The JSD between the distributions is then defined as [1]

$$\Delta_{\text{naive}}[p_1, p_2; \pi_1, \pi_2] = H[\pi_1 p_1 + \pi_2 p_2] - (\pi_1 H[p_1] + \pi_2 H[p_2]) \qquad (1)$$

where the entropy (measured in nats) of a probability distribution is defined as

$$H[p] = \sum_{j=1}^{N} h(p_j) = -\sum_{j=1}^{N} p_j \log(p_j), \qquad h(x) = -x \log(x). \qquad (2)$$
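As a concrete reference point, Eqs. (1)-(2) can be evaluated directly in a few lines of NumPy. This is a sketch, not code from the paper; the names `entropy` and `jsd_naive`, and the convention 0 log 0 = 0, are our assumptions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, using the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # skip zero entries so log() is never evaluated at 0
    return -np.sum(p[nz] * np.log(p[nz]))

def jsd_naive(p1, p2, pi1=0.5, pi2=0.5):
    """Direct ("naive") evaluation of Eq. (1)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return entropy(pi1 * p1 + pi2 * p2) - (pi1 * entropy(p1) + pi2 * entropy(p2))
```

For disjoint distributions with equal weights this returns log 2, the maximum JSD in nats; the numerical trouble the paper addresses appears only when p_1 and p_2 are nearly equal.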
Defining

$$
\begin{aligned}
\bar{p}_j &= (p_{1j} + p_{2j})/2, &\quad 0 \le \bar{p}_j \le 1, &\quad \textstyle\sum_{j=1}^{N} \bar{p}_j = 1,\\
\eta_j &= (p_{1j} - p_{2j})/2, & &\quad \textstyle\sum_{j=1}^{N} \eta_j = 0,\\
\varepsilon_j &= \eta_j/\bar{p}_j, &\quad -1 \le \varepsilon_j \le 1, &\\
\alpha &= \pi_1 - \pi_2, &\quad -1 \le \alpha \le 1,
\end{aligned}
\qquad (3)
$$
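The change of variables in Eq. (3) is straightforward to compute and sanity-check numerically; a minimal sketch (the helper name `reparameterize` is ours):

```python
import numpy as np

def reparameterize(p1, p2, pi1=0.5, pi2=0.5):
    """Return (pbar, eta, eps, alpha) as defined in Eq. (3)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    pbar = (p1 + p2) / 2.0   # mean distribution; sums to 1
    eta = (p1 - p2) / 2.0    # half-difference; sums to 0
    # eps_j = eta_j / pbar_j, defined as 0 where pbar_j = 0 (then eta_j = 0 too)
    eps = np.divide(eta, pbar, out=np.zeros_like(pbar), where=pbar > 0)
    alpha = pi1 - pi2        # weight asymmetry, in [-1, 1]
    return pbar, eta, eps, alpha
```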
we have

$$
\begin{aligned}
h(\pi_1 p_{1j} + \pi_2 p_{2j}) &= -\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right) \log\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right)\\
&= -\bar{p}_j(1 + \alpha\varepsilon_j)\left[\log(\bar{p}_j) + \log(1 + \alpha\varepsilon_j)\right]
\end{aligned}
\qquad (4)
$$
∗Electronic address: ar2384@columbia.edu
†Electronic address: chris.wiggins@columbia.edu
arXiv:0810.5117v1 [stat.ML] 28 Oct 2008
and

$$
\begin{aligned}
\pi_1 h(p_{1j}) + \pi_2 h(p_{2j}) &= -\pi_1(\bar{p}_j + \eta_j)\log(\bar{p}_j + \eta_j) - \pi_2(\bar{p}_j - \eta_j)\log(\bar{p}_j - \eta_j)\\
&= -\tfrac{1}{2}\bar{p}_j(1 + \alpha)(1 + \varepsilon_j)\log\left(\bar{p}_j(1 + \varepsilon_j)\right) - \tfrac{1}{2}\bar{p}_j(1 - \alpha)(1 - \varepsilon_j)\log\left(\bar{p}_j(1 - \varepsilon_j)\right)\\
&= -\bar{p}_j(1 + \alpha\varepsilon_j)\log(\bar{p}_j) - \tfrac{1}{2}\bar{p}_j(1 + \alpha\varepsilon_j)\log(1 - \varepsilon_j^2) - \tfrac{1}{2}\bar{p}_j(\alpha + \varepsilon_j)\log\frac{1 + \varepsilon_j}{1 - \varepsilon_j}.
\end{aligned}
\qquad (5)
$$
Thus,

$$
h(\pi_1 p_{1j} + \pi_2 p_{2j}) - \left(\pi_1 h(p_{1j}) + \pi_2 h(p_{2j})\right) = \frac{1}{2}\bar{p}_j\left[(1 + \alpha\varepsilon_j)\log\!\left(\frac{1 - \varepsilon_j^2}{(1 + \alpha\varepsilon_j)^2}\right) + (\alpha + \varepsilon_j)\log\frac{1 + \varepsilon_j}{1 - \varepsilon_j}\right]. \qquad (6)
$$
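Equation (6) can be spot-checked numerically against its left-hand side computed directly from the definitions; a small sketch (the helper names `lhs_eq6` and `rhs_eq6` are ours):

```python
import numpy as np

def lhs_eq6(pbar, eta, pi1, pi2):
    """Left-hand side of Eq. (6) for one element j, with h(x) = -x log x,
    p1j = pbar + eta and p2j = pbar - eta."""
    h = lambda x: -x * np.log(x)
    mix = pi1 * (pbar + eta) + pi2 * (pbar - eta)
    return h(mix) - (pi1 * h(pbar + eta) + pi2 * h(pbar - eta))

def rhs_eq6(pbar, eta, pi1, pi2):
    """Right-hand side of Eq. (6), written via eps = eta/pbar, alpha = pi1 - pi2."""
    eps, alpha = eta / pbar, pi1 - pi2
    return 0.5 * pbar * (
        (1 + alpha * eps) * np.log((1 - eps**2) / (1 + alpha * eps)**2)
        + (alpha + eps) * np.log((1 + eps) / (1 - eps)))
```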
The Taylor series expansion of the logarithm (valid for |x| < 1) is given as

$$\log(1 + x) = \sum_{i=1}^{\infty} c_i x^i, \qquad c_i = \frac{(-1)^{i+1}}{i}. \qquad (7)$$
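As a quick numerical check (ours, not the paper's), the partial sums of Eq. (7) do converge to log(1 + x) for |x| < 1:

```python
import numpy as np

def log1p_partial(x, kmax=100):
    """Partial sum of Eq. (7): sum_{i=1}^{kmax} (-1)^{i+1} x^i / i."""
    i = np.arange(1, kmax + 1)
    return float(np.sum((-1.0) ** (i + 1) * x ** i / i))
```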
The logarithms in the expression for the JS divergence can then be written as

$$
\begin{aligned}
\log(1 + \varepsilon_j) &= \sum_{i=1}^{\infty} c_i \varepsilon_j^i\\
\log(1 - \varepsilon_j) &= \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i\\
\log(1 + \alpha\varepsilon_j) &= \sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i.
\end{aligned}
\qquad (8)
$$
We then have \Delta = \frac{1}{2}\sum_{j=1}^{N} \bar{p}_j \delta_j, with

$$
\begin{aligned}
\delta_j &= (1 + \alpha\varepsilon_j)\left[\log(1 + \varepsilon_j) + \log(1 - \varepsilon_j) - 2\log(1 + \alpha\varepsilon_j)\right] + (\alpha + \varepsilon_j)\left[\log(1 + \varepsilon_j) - \log(1 - \varepsilon_j)\right]\\
&= (1 + \alpha\varepsilon_j)\left[\sum_{i=1}^{\infty} c_i\varepsilon_j^i + \sum_{i=1}^{\infty} (-1)^i c_i\varepsilon_j^i - 2\sum_{i=1}^{\infty} c_i\alpha^i\varepsilon_j^i\right] + (\alpha + \varepsilon_j)\left[\sum_{i=1}^{\infty} c_i\varepsilon_j^i - \sum_{i=1}^{\infty} (-1)^i c_i\varepsilon_j^i\right]\\
&= \sum_{i=1}^{\infty} c_i\left[\varepsilon_j^i + \alpha\varepsilon_j^{i+1} + (-1)^i\varepsilon_j^i + (-1)^i\alpha\varepsilon_j^{i+1} - 2\alpha^i\varepsilon_j^i - 2\alpha^{i+1}\varepsilon_j^{i+1} + \alpha\varepsilon_j^i + \varepsilon_j^{i+1} + (-1)^{i+1}\alpha\varepsilon_j^i + (-1)^{i+1}\varepsilon_j^{i+1}\right]\\
&= \sum_{i=1}^{\infty} c_i\left[\left(1 + (-1)^i - 2\alpha^i + \alpha + (-1)^{i+1}\alpha\right)\varepsilon_j^i + \left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\right].
\end{aligned}
\qquad (9)
$$
When i = 1, the coefficient of \varepsilon_j is c_1(1 - 1 - 2\alpha + \alpha + \alpha) = 0, so the first non-vanishing term in the expansion is of order 2. Shifting the index of the first term in Eq. (9) gives

$$
\begin{aligned}
\delta_j &= \sum_{i=1}^{\infty}\left[c_{i+1}\left(1 + (-1)^{i+1} - 2\alpha^{i+1} + \alpha + (-1)^{i+2}\alpha\right) + c_i\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\right]\varepsilon_j^{i+1}\\
&= \sum_{i=1}^{\infty} \left(c_i + c_{i+1}\right)\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\\
&= \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i(i+1)}\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\\
&\equiv \sum_{i=1}^{\infty} B_i\,\varepsilon_j^{i+1}
\end{aligned}
\qquad (10)
$$
where

$$
B_i = \frac{1 - \alpha + (-1)^{i+1}\left(1 + \alpha - 2\alpha^{i+1}\right)}{i(i+1)} =
\begin{cases}
2\left(1 - \alpha^{i+1}\right)/\left(i(i+1)\right), & i \text{ odd},\\[2pt]
-2\left(\alpha - \alpha^{i+1}\right)/\left(i(i+1)\right), & i \text{ even}.
\end{cases}
\qquad (11)
$$
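Eqs. (10)-(11) translate directly into a truncated, series-based evaluation of the JSD. A sketch under stated assumptions: the names `B` and `jsd_series`, and the default truncation order `kmax = 20`, are our choices, not the paper's:

```python
import numpy as np

def B(i, alpha):
    """Series coefficient B_i from Eq. (11)."""
    if i % 2 == 1:  # i odd
        return 2.0 * (1.0 - alpha ** (i + 1)) / (i * (i + 1))
    return -2.0 * (alpha - alpha ** (i + 1)) / (i * (i + 1))  # i even

def jsd_series(p1, p2, pi1=0.5, pi2=0.5, kmax=20):
    """JSD via the expansion: Delta = (1/2) sum_j pbar_j delta_j, with
    delta_j = sum_{i=1}^{kmax} B_i eps_j^{i+1} as in Eq. (10)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    alpha = pi1 - pi2
    pbar = (p1 + p2) / 2.0
    eps = np.divide(p1 - p2, 2.0 * pbar, out=np.zeros_like(pbar), where=pbar > 0)
    delta = np.zeros_like(pbar)
    for i in range(1, kmax + 1):
        delta += B(i, alpha) * eps ** (i + 1)
    return 0.5 * float(np.sum(pbar * delta))
```

For nearly equal distributions the series converges very quickly (the leading term is O(eps^2)), which is exactly the regime where the naive difference of logarithms loses precision.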
FIG. 1: Plot comparing the naive and approximate formulae, truncated at different orders, for calculating the JSD as a function of the normalized L2-distance (∥ε∥; see Section III) between pairs of randomly generated probability distributions. Best-fit slopes are: -2.05 (k = 3), -5.89 (k = 6), -8.14 (k = 9), -11.91 (k = 12) and -105.43 (comparing naive with k = 100).

FIG. 2: Probability of obtaining (erroneous) negative values, when directly evaluating the JSD using its exact expression, plotted as a function of ∥ε∥. When implemented in MATLAB, we observe that the naive formula gives negative JSD when ∥ε∥ is merely of O(10^{-6}).
This series expansion can be further simplified as

$$
\delta_j = \sum_{i=1}^{\infty} \left(B_{2i-1} + B_{2i}\,\varepsilon_j\right)\varepsilon_j^{2i} = \sum_{i=1}^{\infty} B_{2i-1}\left(1 + \frac{B_{2i}}{B_{2i-1}}\,\varepsilon_j\right)\varepsilon_j^{2i}, \qquad (12)
$$

$$
\frac{B_{2i}}{B_{2i-1}}\,\varepsilon_j = -\frac{2i-1}{2i+1}\,\alpha\varepsilon_j. \qquad (13)
$$

Since -1 \le \alpha\varepsilon_j \le 1, we have -1 \le (B_{2i}/B_{2i-1})\varepsilon_j \le 1; together with B_{2i-1} \ge 0, this ensures that for every i, (B_{2i-1} + B_{2i}\varepsilon_j)\varepsilon_j^{2i} \ge 0, making \delta_j, and hence the series expansion for \Delta, non-negative at every order.
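The term-pairing argument of Eqs. (12)-(13) can be spot-checked numerically: for any alpha and eps_j in [-1, 1], each paired term is non-negative. A sketch (this check and the helper names `B` and `paired_terms` are ours, not the paper's):

```python
import numpy as np

def B(i, alpha):
    """Series coefficient B_i from Eq. (11)."""
    if i % 2 == 1:  # i odd
        return 2.0 * (1.0 - alpha ** (i + 1)) / (i * (i + 1))
    return -2.0 * (alpha - alpha ** (i + 1)) / (i * (i + 1))  # i even

def paired_terms(alpha, eps, imax=10):
    """Paired terms (B_{2i-1} + B_{2i} eps) eps^{2i} of Eq. (12), i = 1..imax.
    Each should be non-negative for alpha, eps in [-1, 1]."""
    return [(B(2 * i - 1, alpha) + B(2 * i, alpha) * eps) * eps ** (2 * i)
            for i in range(1, imax + 1)]
```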
III. NUMERICAL RESULTS
The accu
…(Full text truncated)…