Non-parametric estimation of Fisher information from real data
The Fisher information matrix is a widely used measure in applications ranging from statistical inference and information geometry to experimental design and the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm for estimating it from real data. In this rapid communication we show how to estimate the Fisher information accurately and non-parametrically. We also develop a numerical procedure that minimizes the estimation error by choosing the step size of the finite-difference scheme used to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to estimate continuous probability density functions. We validate the method against the known Fisher information of the normal distribution and, as an example, compute the temperature component of the Fisher information matrix for the two-dimensional Ising model; it obeys the expected relation to the heat capacity and therefore peaks at the phase transition, at the correct critical temperature.
💡 Research Summary
The paper addresses a long-standing gap in statistical physics and data analysis: the lack of a reliable, non-parametric method for estimating the Fisher information matrix (FIM) directly from empirical data. Fisher information quantifies how sensitively a probability density function (PDF) p(x; θ) depends on its parameters θ, and it underpins a wide range of applications, from parameter-estimation limits (the Cramér-Rao bound) to information geometry, optimal experimental design, and the detection of phase transitions in complex systems. Existing non-parametric density estimators, especially kernel density estimators (KDE), suffer from the need to choose a bandwidth h, which controls smoothness: if h is too large the estimate oversmooths and underestimates the Fisher information, while an h that is too small creates spurious peaks and overestimates it. Moreover, the choice of h is largely heuristic, making the resulting FIM unreliable.
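The bandwidth dilemma is easiest to see for a pure location parameter, where the Fisher information reduces to ∫ p′(x)²/p(x) dx and can be read off a single density estimate with no finite differences. A minimal sketch with SciPy's `gaussian_kde` standing in for the density estimator (the function name and the specific bandwidths are illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import gaussian_kde

# For a location family p(x; mu) = f(x - mu), the Fisher information is
# g_{mu mu} = integral of f'(x)^2 / f(x) dx; the true value for N(0, 1) is 1.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 10_000)
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

def location_fisher(bw_method):
    """Plug-in estimate of g_{mu mu} from a KDE with the given bandwidth rule."""
    p = gaussian_kde(data, bw_method=bw_method)(x)
    dp = np.gradient(p, dx)
    return np.sum(dp**2 / (p + 1e-30)) * dx

g_rough = location_fisher(0.02)     # undersmoothed: spurious peaks inflate g
g_scott = location_fisher("scott")  # rule-of-thumb bandwidth: close to 1
g_over = location_fisher(1.0)       # oversmoothed: flattening suppresses g
```

On this synthetic data the undersmoothed estimate comes out far above 1 and the oversmoothed one well below it, bracketing the true value; the appeal of DEFT is that the data select the smoothness scale instead.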
The authors propose to use the recently introduced DEFT (Density Estimation using Field Theory) algorithm as the core density estimator. DEFT treats smoothness as a field‑theoretic ultraviolet cutoff ℓ, which is inferred from the data within a Bayesian framework. In the large‑N limit the data automatically select an optimal ℓ, thereby eliminating the need for an external bandwidth parameter. DEFT has been shown to be fast and accurate in one and two dimensions, and it avoids the “curse of dimensionality” that plagues many non‑parametric methods.
To compute the FIM, the authors replace the analytical derivatives in the definition
g_{μν}(θ) = ∫ p(x;θ) ∂_μ ln p(x;θ) ∂_ν ln p(x;θ) dx
with centered finite‑difference approximations. This introduces a new free parameter, the finite‑difference step Δθ_μ. The error in the FIM estimate now has two competing contributions: (i) truncation error of the finite‑difference scheme, scaling as O(Δθ²), which favours small Δθ; (ii) statistical error due to finite sample size N, which favours larger Δθ because the densities at θ±Δθ must be distinguishable given the data. The authors resolve this trade‑off by invoking Sanov’s theorem and the Kullback‑Leibler (KL) divergence. They define a “hypersphere” in parameter space whose radius ε (measured in units of Δθ) is set by the condition that the probability of mistaking a sample drawn from p(·;θ) for one drawn from p(·;θ+εΔθ) equals e⁻¹. Using a quadratic expansion of the KL divergence, they obtain
ε² = 2 N g_{μν}(θ) Δθ_μ Δθ_ν.
Thus, for a given N and desired ε (≈0.05 in practice), the optimal Δθ can be computed directly from an initial estimate of the FIM. The procedure can be iterated: start with a provisional Δθ, compute the FIM via DEFT, update ε, and adjust Δθ until convergence.
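For a single parameter the relation ε² = 2 N g Δθ² makes the iteration concrete. The sketch below isolates the step-selection and finite-difference logic by plugging in the exact normal log-density where the paper would use DEFT estimates of the densities at θ ± Δθ; all names and the starting guess are illustrative:

```python
import numpy as np

sigma, N, eps = 1.0, 10_000, 0.05  # true scale, sample size, target radius

def log_p(x, s):
    """Exact log-density of N(0, s); a DEFT estimate would replace this."""
    return -0.5 * (x / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
p = np.exp(log_p(x, sigma))

g = 1.0  # provisional FIM estimate to start the iteration
for _ in range(5):
    dth = eps / np.sqrt(2.0 * N * g)           # step from eps^2 = 2 N g dth^2
    # Centered finite difference for d/d(sigma) of ln p at sigma.
    score = (log_p(x, sigma + dth) - log_p(x, sigma - dth)) / (2.0 * dth)
    g = np.sum(p * score**2) * dx              # g_{sigma sigma} via quadrature
print(g)  # converges to ≈ 2/sigma^2 = 2
```

With exact densities the loop converges in one pass; with estimated densities the update of Δθ between passes is what balances truncation against statistical error.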
The methodology is validated on two fronts. First, synthetic data are drawn from a univariate normal distribution N(μ,σ). The analytic Fisher information for σ is g_{σσ}=2/σ². Using N=10⁴ samples, the authors compare DEFT with KDE (Scott’s rule for bandwidth). DEFT, combined with the finite‑difference formula (2a), yields unbiased estimates with a relative spread of 30‑40 %. KDE, by contrast, systematically overestimates the Fisher information by about 40 % and shows a much larger spread (≈100 %). The authors also explore the dependence of the relative error on ε, confirming that the error curve has a clear minimum near ε≈0.05, as predicted by the theory.
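The analytic benchmark g_{σσ} = 2/σ² follows from Fisher information being the expectation of the squared score, ∂_σ ln p = x²/σ³ − 1/σ. A quick Monte Carlo check (illustrative code, not the paper's) confirms the value and shows the ~1/√N statistical spread that any N = 10⁴ estimate carries:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, N = 1.0, 10_000
x = rng.normal(0.0, sigma, N)

# Score of N(0, sigma) with respect to sigma, evaluated at the samples.
score = x**2 / sigma**3 - 1.0 / sigma
g_mc = np.mean(score**2)                 # ≈ 2/sigma^2 = 2
mc_err = np.std(score**2) / np.sqrt(N)   # Monte Carlo standard error
```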
Second, the authors apply the method to the two‑dimensional Ising model, a canonical system exhibiting a continuous phase transition. The Gibbs distribution at zero external field is sampled via Metropolis‑Hastings Monte Carlo on a 25×25 lattice, collecting 15 000 energy measurements per temperature. Using DEFT to estimate the energy density p(E;T), the temperature component of the FIM, g_{TT}, is computed via the finite‑difference scheme. Theory predicts g_{TT}=C_h/T², where C_h is the specific heat at constant field. The numerical results show that g_{TT} peaks sharply at T_c≈2.269 (the Onsager critical temperature) and that the ratio g_{TT} T²/C_h stays close to unity across the temperature range, confirming the theoretical relationship. The optimal ε for this application is found to be ≈0.12, consistent with the earlier synthetic tests.
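The identity being tested, g_TT = C_h/T², holds because for a Boltzmann distribution ∂_T ln p(E;T) = (E − ⟨E⟩)/T², so g_TT = Var(E)/T⁴ while C_h = Var(E)/T². A self-contained consistency check on an exactly enumerable toy system (an open-boundary 2×2 Ising lattice, not the paper's 25×25 Monte Carlo study) compares the two routes:

```python
import numpy as np
from itertools import product

# Open-boundary 2x2 Ising model: 16 states, exact Boltzmann distribution.
def energy(s):
    s00, s01, s10, s11 = s
    return -(s00 * s01 + s10 * s11 + s00 * s10 + s01 * s11)

E = np.array([energy(s) for s in product([-1, 1], repeat=4)], dtype=float)

def boltzmann(T):
    w = np.exp(-E / T)
    return w / w.sum()

T, dT = 2.0, 1e-3
p = boltzmann(T)

# Route 1: specific heat.  C_h = Var(E)/T^2, hence g_TT = C_h/T^2 = Var(E)/T^4.
var_E = (p * E**2).sum() - (p * E).sum() ** 2
g_from_C = var_E / T**4

# Route 2: centered finite differences on ln p, as in the paper's estimator.
score = (np.log(boltzmann(T + dT)) - np.log(boltzmann(T - dT))) / (2.0 * dT)
g_fd = (p * score**2).sum()
# The two routes agree to finite-difference accuracy: g_TT * T^2 / C_h ≈ 1.
```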
Overall, the paper makes three substantive contributions: (1) it demonstrates that DEFT provides a robust, data‑driven non‑parametric density estimator suitable for Fisher information calculations; (2) it derives a principled, KL‑based criterion for selecting the finite‑difference step Δθ, balancing truncation and statistical errors; (3) it validates the approach on both analytically tractable and physically non‑trivial systems, showing that the estimated FIM correctly captures critical behaviour. The work opens the door to applying Fisher‑information‑based diagnostics in experimental settings where the underlying PDF is unknown, such as neuroscience, ecology, and complex network dynamics, without resorting to ad‑hoc smoothing parameters. Future extensions could address higher‑dimensional parameter spaces, adaptive Δθ selection across different parameters, and integration with Bayesian inference pipelines for simultaneous density and parameter estimation.