Evaluation of Mutual Information Estimators for Time Series
We study some of the most commonly used mutual information estimators, based on histograms of fixed or adaptive bin size, $k$-nearest neighbors and kernels, and focus on optimal selection of their free parameters. We examine the consistency of the estimators (convergence to a stable value with the increase of time series length) and the degree of deviation among the estimators. The optimization of parameters is assessed by quantifying the deviation of the estimated mutual information from its true or asymptotic value as a function of the free parameter. Moreover, some common-used criteria for parameter selection are evaluated for each estimator. The comparative study is based on Monte Carlo simulations on time series from several linear and nonlinear systems of different lengths and noise levels. The results show that the $k$-nearest neighbor is the most stable and less affected by the method-specific parameter. A data adaptive criterion for optimal binning is suggested for linear systems but it is found to be rather conservative for nonlinear systems. It turns out that the binning and kernel estimators give the least deviation in identifying the lag of the first minimum of mutual information from nonlinear systems, and are stable in the presence of noise.
💡 Research Summary
The paper conducts a systematic comparison of three widely used mutual information (MI) estimators for time‑series analysis: histogram‑based methods (both fixed‑width and data‑adaptive binning), k‑nearest‑neighbors (k‑NN), and kernel density estimators (KDE). The central focus is on how the free parameters of each estimator—number of bins, the neighbor count k, and kernel bandwidth—affect estimation accuracy, convergence, and robustness to noise. The authors first review the theoretical underpinnings of each technique and summarize common heuristic rules for parameter selection (e.g., Sturges, Scott, Freedman‑Diaconis for histograms; rule‑of‑thumb bandwidths for KDE). They then design a Monte‑Carlo simulation framework that generates synthetic time series from four benchmark systems: a linear autoregressive (AR(1)) process, a chaotic Lorenz system, a logistic map, and a noisy variant of the logistic map. Series lengths range from 1 000 to 50 000 samples, and additive Gaussian noise levels span 0 % to 30 % of the signal variance.
For each generated series the true MI is computed numerically using a high‑resolution reference method. The three estimators are then applied across a grid of parameter values, and performance is quantified by mean‑squared error (MSE), bias, variance, and the ability to correctly locate the first minimum of the MI curve—a common criterion for selecting an appropriate time‑lag in phase‑space reconstruction. The authors also assess “consistency,” i.e., whether the estimator converges to a stable value as the sample size increases.
Key findings emerge: (1) The k‑NN estimator exhibits remarkable stability across a wide range of k values (3–10). Its bias remains low and its variance shrinks rapidly with increasing sample size, making it the most robust estimator in the presence of moderate to high noise (up to 20 %). (2) Histogram estimators are highly sensitive to bin selection. Fixed‑width binning can be tuned to achieve low MSE for linear processes, but adaptive binning rules such as Freedman‑Diaconis tend to be overly conservative for chaotic dynamics, leading to systematic under‑estimation of MI. (3) KDE performance hinges on bandwidth choice. Cross‑validation (CV)–based bandwidth selection yields near‑optimal bias‑variance trade‑offs and maintains reasonable robustness to noise. (4) When the goal is to identify the lag corresponding to the first MI minimum, both histogram and KDE methods outperform k‑NN, delivering the smallest deviation from the true lag in nonlinear systems. k‑NN tends to shift the minimum slightly earlier, which could mislead embedding‑dimension selection.
Based on these observations the authors propose a data‑adaptive parameter selection scheme. For linear systems they recommend the Freedman‑Diaconis rule for histogram binning, while for nonlinear systems a more conservative rule (approximately √N bins, where N is the series length) yields better MI estimates. For k‑NN they suggest using k = 3–5 as a default, given its insensitivity to k and computational efficiency. For KDE they advocate CV‑driven bandwidth optimization, especially when noise levels are unknown.
The paper concludes with practical guidelines for practitioners: employ k‑NN for general MI estimation when robustness to noise and sample‑size variability is paramount; use histogram or KDE methods when the precise location of MI minima is critical for embedding‑delay selection in chaotic data; and apply the proposed adaptive rules to automate parameter tuning, thereby reducing the reliance on ad‑hoc heuristics. These recommendations are relevant across disciplines that rely on time‑series analysis, including neuroscience (e.g., spike‑train coupling), climatology (e.g., teleconnection patterns), and financial engineering (e.g., nonlinear dependence between asset returns).
Comments & Academic Discussion
Loading comments...
Leave a Comment