A Local Characterization of $f$-Divergences Yielding PSD Mutual-Information Matrices

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We study when the variable-indexed matrix of pairwise $f$-mutual informations $M^{(f)}_{ij} = I_f(X_i;X_j)$ is positive semidefinite (PSD). Let $f:(0,\infty)\to\mathbb R$ be convex with $f(1)=0$, finite in a neighborhood of $1$, and with $f(0)<\infty$ so that diagonal terms are finite. We give a sharp local characterization around independence: there exists $\delta=\delta(f)>0$ such that for every $n$ and every finite-alphabet family $(X_1,\ldots,X_n)$ whose pairwise joint-to-product ratios lie in $(1-\delta,1+\delta)$, the matrix $M^{(f)}$ is PSD if and only if $f$ is analytic at $1$ with a convergent expansion $f(t)=\sum_{m\ge 2} a_m (t-1)^m$ and $a_m\ge 0$ on a neighborhood of $1$. Consequently, any negative Taylor coefficient yields an explicit finite-alphabet counterexample under arbitrarily weak dependence, and non-analytic convex divergences (e.g., total variation) are excluded. This PSD requirement is distinct from Hilbertian or metric properties of divergences between distributions (e.g., $\sqrt{JS}$): we study PSD of the variable-indexed mutual-information matrix. The proof combines a replica embedding that turns monomial terms into Gram matrices with a replica-forcing reduction to positive-definite dot-product kernels, enabling an application of the Schoenberg–Berg–Christensen–Ressel classification.


💡 Research Summary

The paper investigates a fundamental question in information‑theoretic kernel methods: for which $f$‑divergences does the matrix of pairwise $f$‑mutual informations, $M^{(f)}_{ij}=I_f(X_i;X_j)$, define a positive‑semidefinite (PSD) kernel for any number of random variables? The authors focus on a local regime where all pairwise joint‑to‑product ratios lie in a narrow band $(1-\delta,1+\delta)$, i.e., the variables are only weakly dependent. Under this mild, dimension‑free assumption they obtain a sharp characterization.
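As a concrete sanity check (our illustration, not taken from the paper), the $\chi^2$-divergence $f(t)=(t-1)^2$ is analytic at $1$ with a single nonnegative coefficient $a_2=1$, so the characterization predicts a PSD mutual-information matrix in the weak-dependence regime. A minimal numerical sketch, using a small random perturbation of a uniform (independent) joint over three binary variables so that all pairwise joint-to-product ratios stay near $1$:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3  # three binary variables
# Joint over {0,1}^3: a small perturbation of the uniform (product) joint,
# keeping all pairwise joint-to-product ratios near 1 (weak dependence).
p = np.full((2,) * n, 1.0 / 2**n)
p += 0.02 * rng.uniform(-1, 1, size=p.shape)
p = np.clip(p, 1e-9, None)
p /= p.sum()

def pair_joint(p, i, j):
    """Joint distribution of (X_i, X_j); for i == j it is diag(marginal of X_i)."""
    if i == j:
        mi = p.sum(axis=tuple(k for k in range(p.ndim) if k != i))
        return np.diag(mi)
    m = p.sum(axis=tuple(k for k in range(p.ndim) if k not in (i, j)))
    return m if i < j else m.T

def chi2_mi(pij):
    """I_f with f(t) = (t-1)^2 (chi-square): sum_{x,y} pij^2 / (pi * pj) - 1."""
    pi = pij.sum(axis=1)
    pj = pij.sum(axis=0)
    return float((pij**2 / np.outer(pi, pj)).sum() - 1.0)

M = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        M[i, j] = chi2_mi(pair_joint(p, i, j))

print("min eigenvalue:", np.linalg.eigvalsh(M).min())  # nonnegative up to rounding
```

Note that each diagonal entry is exactly $|{\cal X}_i|-1=1$ for a binary alphabet, which is why the finiteness condition $f(0)<\infty$ is needed for finite diagonal terms.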

Main result (Theorem II.1).
Let $f:(0,\infty)\to\mathbb R$ be convex, $f(1)=0$, finite near $1$, and with $f(0)<\infty$ (so diagonal entries are finite). Then the following are equivalent:

  1. There exists a radius $\delta>0$ (depending only on $f$) such that for every $n$ and every finite‑alphabet collection $(X_1,\dots,X_n)$ that is $\delta$‑pairwise‑weakly‑dependent (all joint‑to‑product ratios belong to $(1-\delta,1+\delta)$), the matrix $M^{(f)}$ is PSD.

  2. $f$ is absolutely monotone at $t=1$: there is an interval $|t-1|<\eta$ on which $f$ admits the convergent analytic expansion
$$f(t)=\sum_{m\ge 2} a_m\,(t-1)^m, \qquad a_m \ge 0 \ \text{for all } m\ge 2.$$
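As a worked illustration (ours, not from the summary): the criterion admits the $\chi^2$-divergence but rules out KL. Expanding both around $t=1$:

```latex
% chi-square divergence: analytic at 1, single nonnegative coefficient
f_{\chi^2}(t) = (t-1)^2, \qquad a_2 = 1,\quad a_m = 0 \ (m \ge 3).

% KL divergence: with u = t-1,
% (1+u)\log(1+u) = \sum_{m \ge 2} \frac{(-1)^m}{m(m-1)}\, u^m, so
f_{\mathrm{KL}}(t) = \tfrac{1}{2}(t-1)^2 - \tfrac{1}{6}(t-1)^3 + \tfrac{1}{12}(t-1)^4 - \cdots
```

Here $a_3 = -\tfrac16 < 0$, so by the theorem the KL mutual-information matrix can fail to be PSD even under arbitrarily weak dependence, whereas the $\chi^2$ matrix is PSD in the local regime.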

