📝 Original Info
- Title: A non-negative expansion for small Jensen-Shannon Divergences
- ArXiv ID: 0810.5117
- Date: 2008-10-29
- Authors: Anil Raj, Chris H. Wiggins
📝 Abstract
In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD, when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.
📄 Full Content
A non-negative expansion for small Jensen-Shannon Divergences
Anil Raj
Department of Applied Physics and Applied Mathematics
Columbia University, New York∗
Chris H. Wiggins
Department of Applied Physics and Applied Mathematics
Center for Computational Biology and Bioinformatics
Columbia University, New York†
(Dated: October 13, 2021)
In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) between two probability distributions. This series expansion is shown to be useful for numerical calculations of the JSD, when the probability distributions are nearly equal, and for which, consequently, small numerical errors dominate evaluation.
Keywords: entropy, JS divergence
I. INTRODUCTION
The Jensen-Shannon divergence (JSD) has been widely used as a dissimilarity measure between weighted probability distributions. The direct numerical evaluation of the exact expression for the JSD (involving a difference of logarithms), however, leads to numerical errors when the distributions are close to each other (small JSD); when the element-wise difference between the distributions is O(10^{-1}), this naive formula produces erroneous values (sometimes negative) when used for numerical calculations. In this report, we derive a provably non-negative series expansion for the JSD which can be used in the small-JSD limit, where the naive formula fails.
II. SERIES EXPANSION FOR JENSEN-SHANNON DIVERGENCE
Consider two discrete probability distributions p_1 and p_2 over a sample space S of cardinality N, with relative normalized weights \pi_1 and \pi_2 (\pi_1 + \pi_2 = 1) between them. The JSD between the distributions is then defined as [1]

$$\Delta_{\text{naive}}[p_1, p_2; \pi_1, \pi_2] = H[\pi_1 p_1 + \pi_2 p_2] - (\pi_1 H[p_1] + \pi_2 H[p_2]) \qquad (1)$$

where the entropy (measured in nats) of a probability distribution is defined as

$$H[p] = \sum_{j=1}^{N} h(p_j) = -\sum_{j=1}^{N} p_j \log(p_j), \qquad h(x) = -x \log(x). \qquad (2)$$
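As a concrete reference point, Eqs. (1)-(2) can be evaluated directly in a few lines of NumPy. This is a sketch, not code from the paper; the names `entropy` and `jsd_naive`, and the convention 0 log 0 = 0, are our assumptions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, using the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # skip zero entries so log() is never evaluated at 0
    return -np.sum(p[nz] * np.log(p[nz]))

def jsd_naive(p1, p2, pi1=0.5, pi2=0.5):
    """Direct ("naive") evaluation of Eq. (1)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return entropy(pi1 * p1 + pi2 * p2) - (pi1 * entropy(p1) + pi2 * entropy(p2))
```

For disjoint distributions with equal weights this returns log 2, the maximum JSD in nats; the numerical trouble the paper addresses appears only when p_1 and p_2 are nearly equal.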
Defining

$$
\begin{aligned}
\bar{p}_j &= (p_{1j} + p_{2j})/2, &\quad 0 \le \bar{p}_j \le 1, &\quad \textstyle\sum_{j=1}^{N} \bar{p}_j = 1,\\
\eta_j &= (p_{1j} - p_{2j})/2, & &\quad \textstyle\sum_{j=1}^{N} \eta_j = 0,\\
\varepsilon_j &= \eta_j/\bar{p}_j, &\quad -1 \le \varepsilon_j \le 1, &\\
\alpha &= \pi_1 - \pi_2, &\quad -1 \le \alpha \le 1,
\end{aligned}
\qquad (3)
$$
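The change of variables in Eq. (3) is straightforward to compute and sanity-check numerically; a minimal sketch (the helper name `reparameterize` is ours):

```python
import numpy as np

def reparameterize(p1, p2, pi1=0.5, pi2=0.5):
    """Return (pbar, eta, eps, alpha) as defined in Eq. (3)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    pbar = (p1 + p2) / 2.0   # mean distribution; sums to 1
    eta = (p1 - p2) / 2.0    # half-difference; sums to 0
    # eps_j = eta_j / pbar_j, defined as 0 where pbar_j = 0 (then eta_j = 0 too)
    eps = np.divide(eta, pbar, out=np.zeros_like(pbar), where=pbar > 0)
    alpha = pi1 - pi2        # weight asymmetry, in [-1, 1]
    return pbar, eta, eps, alpha
```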
we have

$$
\begin{aligned}
h(\pi_1 p_{1j} + \pi_2 p_{2j}) &= -\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right) \log\left(\pi_1(\bar{p}_j + \eta_j) + \pi_2(\bar{p}_j - \eta_j)\right)\\
&= -\bar{p}_j(1 + \alpha\varepsilon_j)\left[\log(\bar{p}_j) + \log(1 + \alpha\varepsilon_j)\right]
\end{aligned}
\qquad (4)
$$
∗Electronic address: ar2384@columbia.edu
†Electronic address: chris.wiggins@columbia.edu
arXiv:0810.5117v1 [stat.ML] 28 Oct 2008
and

$$
\begin{aligned}
\pi_1 h(p_{1j}) + \pi_2 h(p_{2j}) &= -\pi_1(\bar{p}_j + \eta_j)\log(\bar{p}_j + \eta_j) - \pi_2(\bar{p}_j - \eta_j)\log(\bar{p}_j - \eta_j)\\
&= -\tfrac{1}{2}\bar{p}_j(1 + \alpha)(1 + \varepsilon_j)\log\left(\bar{p}_j(1 + \varepsilon_j)\right) - \tfrac{1}{2}\bar{p}_j(1 - \alpha)(1 - \varepsilon_j)\log\left(\bar{p}_j(1 - \varepsilon_j)\right)\\
&= -\bar{p}_j(1 + \alpha\varepsilon_j)\log(\bar{p}_j) - \tfrac{1}{2}\bar{p}_j(1 + \alpha\varepsilon_j)\log(1 - \varepsilon_j^2) - \tfrac{1}{2}\bar{p}_j(\alpha + \varepsilon_j)\log\frac{1 + \varepsilon_j}{1 - \varepsilon_j}.
\end{aligned}
\qquad (5)
$$
Thus,

$$
h(\pi_1 p_{1j} + \pi_2 p_{2j}) - \left(\pi_1 h(p_{1j}) + \pi_2 h(p_{2j})\right) = \frac{1}{2}\bar{p}_j\left[(1 + \alpha\varepsilon_j)\log\!\left(\frac{1 - \varepsilon_j^2}{(1 + \alpha\varepsilon_j)^2}\right) + (\alpha + \varepsilon_j)\log\frac{1 + \varepsilon_j}{1 - \varepsilon_j}\right]. \qquad (6)
$$
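Equation (6) can be spot-checked numerically against its left-hand side computed directly from the definitions; a small sketch (the helper names `lhs_eq6` and `rhs_eq6` are ours):

```python
import numpy as np

def lhs_eq6(pbar, eta, pi1, pi2):
    """Left-hand side of Eq. (6) for one element j, with h(x) = -x log x,
    p1j = pbar + eta and p2j = pbar - eta."""
    h = lambda x: -x * np.log(x)
    mix = pi1 * (pbar + eta) + pi2 * (pbar - eta)
    return h(mix) - (pi1 * h(pbar + eta) + pi2 * h(pbar - eta))

def rhs_eq6(pbar, eta, pi1, pi2):
    """Right-hand side of Eq. (6), written via eps = eta/pbar, alpha = pi1 - pi2."""
    eps, alpha = eta / pbar, pi1 - pi2
    return 0.5 * pbar * (
        (1 + alpha * eps) * np.log((1 - eps**2) / (1 + alpha * eps)**2)
        + (alpha + eps) * np.log((1 + eps) / (1 - eps)))
```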
The Taylor series expansion of the logarithm (valid for |x| < 1) is given as

$$\log(1 + x) = \sum_{i=1}^{\infty} c_i x^i, \qquad c_i = \frac{(-1)^{i+1}}{i}. \qquad (7)$$
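As a quick numerical check (ours, not the paper's), the partial sums of Eq. (7) do converge to log(1 + x) for |x| < 1:

```python
import numpy as np

def log1p_partial(x, kmax=100):
    """Partial sum of Eq. (7): sum_{i=1}^{kmax} (-1)^{i+1} x^i / i."""
    i = np.arange(1, kmax + 1)
    return float(np.sum((-1.0) ** (i + 1) * x ** i / i))
```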
The logarithms in the expression for the JS divergence can then be written as

$$
\begin{aligned}
\log(1 + \varepsilon_j) &= \sum_{i=1}^{\infty} c_i \varepsilon_j^i\\
\log(1 - \varepsilon_j) &= \sum_{i=1}^{\infty} (-1)^i c_i \varepsilon_j^i\\
\log(1 + \alpha\varepsilon_j) &= \sum_{i=1}^{\infty} c_i \alpha^i \varepsilon_j^i.
\end{aligned}
\qquad (8)
$$
We then have \Delta = \frac{1}{2}\sum_{j=1}^{N} \bar{p}_j \delta_j, with

$$
\begin{aligned}
\delta_j &= (1 + \alpha\varepsilon_j)\left[\log(1 + \varepsilon_j) + \log(1 - \varepsilon_j) - 2\log(1 + \alpha\varepsilon_j)\right] + (\alpha + \varepsilon_j)\left[\log(1 + \varepsilon_j) - \log(1 - \varepsilon_j)\right]\\
&= (1 + \alpha\varepsilon_j)\left[\sum_{i=1}^{\infty} c_i\varepsilon_j^i + \sum_{i=1}^{\infty} (-1)^i c_i\varepsilon_j^i - 2\sum_{i=1}^{\infty} c_i\alpha^i\varepsilon_j^i\right] + (\alpha + \varepsilon_j)\left[\sum_{i=1}^{\infty} c_i\varepsilon_j^i - \sum_{i=1}^{\infty} (-1)^i c_i\varepsilon_j^i\right]\\
&= \sum_{i=1}^{\infty} c_i\left[\varepsilon_j^i + \alpha\varepsilon_j^{i+1} + (-1)^i\varepsilon_j^i + (-1)^i\alpha\varepsilon_j^{i+1} - 2\alpha^i\varepsilon_j^i - 2\alpha^{i+1}\varepsilon_j^{i+1} + \alpha\varepsilon_j^i + \varepsilon_j^{i+1} + (-1)^{i+1}\alpha\varepsilon_j^i + (-1)^{i+1}\varepsilon_j^{i+1}\right]\\
&= \sum_{i=1}^{\infty} c_i\left[\left(1 + (-1)^i - 2\alpha^i + \alpha + (-1)^{i+1}\alpha\right)\varepsilon_j^i + \left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\right].
\end{aligned}
\qquad (9)
$$
When i = 1, the coefficient of \varepsilon_j is c_1(1 - 1 - 2\alpha + \alpha + \alpha) = 0, so the first non-vanishing term in the expansion is of order 2. Shifting the index of the first term in Eq. (9) gives

$$
\begin{aligned}
\delta_j &= \sum_{i=1}^{\infty}\left[c_{i+1}\left(1 + (-1)^{i+1} - 2\alpha^{i+1} + \alpha + (-1)^{i+2}\alpha\right) + c_i\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\right]\varepsilon_j^{i+1}\\
&= \sum_{i=1}^{\infty} \left(c_i + c_{i+1}\right)\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\\
&= \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i(i+1)}\left(1 + (-1)^{i+1} + \alpha + (-1)^i\alpha - 2\alpha^{i+1}\right)\varepsilon_j^{i+1}\\
&\equiv \sum_{i=1}^{\infty} B_i\,\varepsilon_j^{i+1}
\end{aligned}
\qquad (10)
$$
where

$$
B_i = \frac{1 - \alpha + (-1)^{i+1}\left(1 + \alpha - 2\alpha^{i+1}\right)}{i(i+1)} =
\begin{cases}
2\left(1 - \alpha^{i+1}\right)/\left(i(i+1)\right), & i \text{ odd},\\[2pt]
-2\left(\alpha - \alpha^{i+1}\right)/\left(i(i+1)\right), & i \text{ even}.
\end{cases}
\qquad (11)
$$
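Eqs. (10)-(11) translate directly into a truncated, series-based evaluation of the JSD. A sketch under stated assumptions: the names `B` and `jsd_series`, and the default truncation order `kmax = 20`, are our choices, not the paper's:

```python
import numpy as np

def B(i, alpha):
    """Series coefficient B_i from Eq. (11)."""
    if i % 2 == 1:  # i odd
        return 2.0 * (1.0 - alpha ** (i + 1)) / (i * (i + 1))
    return -2.0 * (alpha - alpha ** (i + 1)) / (i * (i + 1))  # i even

def jsd_series(p1, p2, pi1=0.5, pi2=0.5, kmax=20):
    """JSD via the expansion: Delta = (1/2) sum_j pbar_j delta_j, with
    delta_j = sum_{i=1}^{kmax} B_i eps_j^{i+1} as in Eq. (10)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    alpha = pi1 - pi2
    pbar = (p1 + p2) / 2.0
    eps = np.divide(p1 - p2, 2.0 * pbar, out=np.zeros_like(pbar), where=pbar > 0)
    delta = np.zeros_like(pbar)
    for i in range(1, kmax + 1):
        delta += B(i, alpha) * eps ** (i + 1)
    return 0.5 * float(np.sum(pbar * delta))
```

For nearly equal distributions the series converges very quickly (the leading term is O(eps^2)), which is exactly the regime where the naive difference of logarithms loses precision.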
FIG. 1: Plot comparing the naive and approximate formulae, truncated at different orders, for calculating the JSD as a function of the normalized L2-distance (∥ε∥; see Section III) between pairs of randomly generated probability distributions. Best-fit slopes are: -2.05 (k = 3), -5.89 (k = 6), -8.14 (k = 9), -11.91 (k = 12) and -105.43 (comparing naive with k = 100).

FIG. 2: Probability of obtaining (erroneous) negative values, when directly evaluating the JSD using its exact expression, plotted as a function of ∥ε∥. When implemented in MATLAB, we observe that the naive formula gives negative JSD when ∥ε∥ is merely of O(10^{-6}).
This series expansion can be further simplified as

$$
\delta_j = \sum_{i=1}^{\infty} \left(B_{2i-1} + B_{2i}\,\varepsilon_j\right)\varepsilon_j^{2i} = \sum_{i=1}^{\infty} B_{2i-1}\left(1 + \frac{B_{2i}}{B_{2i-1}}\,\varepsilon_j\right)\varepsilon_j^{2i}, \qquad (12)
$$

$$
\frac{B_{2i}}{B_{2i-1}}\,\varepsilon_j = -\frac{2i-1}{2i+1}\,\alpha\varepsilon_j. \qquad (13)
$$

Since -1 \le \alpha\varepsilon_j \le 1, we have -1 \le (B_{2i}/B_{2i-1})\varepsilon_j \le 1; together with B_{2i-1} \ge 0, this ensures that for every i, (B_{2i-1} + B_{2i}\varepsilon_j)\varepsilon_j^{2i} \ge 0, making \delta_j, and hence the series expansion for \Delta, non-negative at every order.
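The term-pairing argument of Eqs. (12)-(13) can be spot-checked numerically: for any alpha and eps_j in [-1, 1], each paired term is non-negative. A sketch (this check and the helper names `B` and `paired_terms` are ours, not the paper's):

```python
import numpy as np

def B(i, alpha):
    """Series coefficient B_i from Eq. (11)."""
    if i % 2 == 1:  # i odd
        return 2.0 * (1.0 - alpha ** (i + 1)) / (i * (i + 1))
    return -2.0 * (alpha - alpha ** (i + 1)) / (i * (i + 1))  # i even

def paired_terms(alpha, eps, imax=10):
    """Paired terms (B_{2i-1} + B_{2i} eps) eps^{2i} of Eq. (12), i = 1..imax.
    Each should be non-negative for alpha, eps in [-1, 1]."""
    return [(B(2 * i - 1, alpha) + B(2 * i, alpha) * eps) * eps ** (2 * i)
            for i in range(1, imax + 1)]
```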
III. NUMERICAL RESULTS
The accu
…(Full text truncated)…