Exploring Memory Effects: Sparse Identification in Vector-Borne Diseases

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Predicting the human burden of vector-borne diseases from limited surveillance data remains a major challenge, particularly in the presence of nonlinear transmission dynamics and delayed effects arising from vector ecology and human behavior. We develop a data-driven framework based on an extension of Sparse Identification of Nonlinear Dynamics (SINDy) to systems with distributed memory, enabling discovery of transmission mechanisms directly from time series data. Using severe fever with thrombocytopenia syndrome (SFTS) as a case study, we show that this approach can uncover key features of tick-borne disease dynamics using only human incidence and local temperature data, without imposing predefined assumptions on human case reporting. We further demonstrate that predictive performance is substantially enhanced when the data-driven model is coupled with mechanistic representations of tick-host transmission pathways informed by empirical studies. The framework supports systematic sensitivity analysis of memory kernels and behavioral parameters, identifying those most influential for prediction accuracy. Although the approach prioritizes predictive accuracy over mechanistic transparency, it yields sparse, interpretable integral representations suitable for epidemiological forecasting. This hybrid methodology provides a scalable strategy for forecasting vector-borne disease risk and informing public health decision-making under data limitations.


💡 Research Summary

This paper introduces a novel data‑driven framework that extends the Sparse Identification of Nonlinear Dynamics (SINDy) methodology to systems with distributed memory, specifically renewal‑type integral equations, and applies it to forecasting the human burden of a vector‑borne disease. Traditional compartmental ODE models of tick‑borne diseases require detailed knowledge of tick life stages, host dynamics, and transmission pathways, which are rarely available in practice. By contrast, the proposed approach uses only two observable time series—monthly confirmed cases of Severe Fever with Thrombocytopenia Syndrome (SFTS) and ambient temperature—to automatically discover the underlying delayed transmission mechanisms.

The authors first reformulate the renewal equation \(x(t)=\int_{-\tau}^{0} g(s,x(t+s))\,ds\) as a weighted sum over a set of quadrature nodes \(s_k\) with weights \(w_k\). For each node they construct a candidate library \(\Theta_k\) that includes polynomial terms of the delayed state \(x(t+s_k)\), temperature‑dependent functions (e.g., \(e^{\alpha T(t+s_k)}\)), and cross‑terms up to a chosen degree. The full library is the weighted concatenation of all \(\Theta_k\). Sparse regression (LASSO or sequential thresholded least squares) is then performed to obtain a coefficient vector \(\xi\) that selects a parsimonious subset of library functions, thereby identifying a compact expression for the memory kernel \(g\). The regularization parameter is tuned by cross‑validation to balance sparsity and predictive performance.
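The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the unit-step rectangle quadrature (all \(w_k = 1\)), the fixed value of \(\alpha\), the four-term library per node, and the helper names `build_library` and `stlsq` are all assumptions made for illustration. The ground-truth model used to generate the data, \(x(t) = 0.3\,x(t-3) + 0.5\,e^{\alpha T(t-1)}\), is likewise hypothetical.

```python
import numpy as np

def build_library(x, E, delays, weights):
    """Weighted concatenation of per-node libraries.

    For each quadrature node s_k = -delay the candidate terms are
    [x_k, x_k^2, E_k, x_k * E_k], each multiplied by the weight w_k.
    """
    t = np.arange(max(delays), len(x))      # times with full history available
    cols, names = [], []
    for d, w in zip(delays, weights):
        xk, Ek = x[t - d], E[t - d]
        for name, feat in [("x", xk), ("x^2", xk**2), ("E", Ek), ("x*E", xk * Ek)]:
            cols.append(w * feat)
            names.append(f"{name}(t-{d})")
    return np.column_stack(cols), x[t], names

def stlsq(Theta, y, threshold=0.05, n_iter=10):
    """Sequential thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        xi[big] = np.linalg.lstsq(Theta[:, big], y, rcond=None)[0]
    return xi

# Synthetic monthly data (hypothetical, not the SFTS surveillance series).
rng = np.random.default_rng(0)
n = 240
months = np.arange(n)
T = 15 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 2, n)
alpha = 0.1                                  # assumed known in this sketch
E = np.exp(alpha * T)                        # temperature-dependent term
x = np.ones(n)
for t in range(3, n):
    x[t] = 0.3 * x[t - 3] + 0.5 * E[t - 1]   # hypothetical ground truth

delays = [1, 2, 3]                           # nodes s_k = -1, -2, -3 months
weights = np.ones(len(delays))               # rectangle rule, unit step
Theta, y, names = build_library(x, E, delays, weights)
xi = stlsq(Theta, y)

for name, coef in zip(names, xi):
    if coef != 0.0:
        print(f"{name:>10s}: {coef:.4f}")
```

In this noise-free setting the thresholding step should retain only the two terms used to generate the data. With real surveillance data, the threshold (or the LASSO penalty) would be chosen by cross-validation, as the summary notes, and \(\alpha\) would have to be estimated or scanned rather than fixed.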

Using monthly data from Dalian, Liaoning Province (2011‑2022), the method uncovers a kernel in which temperature exerts an exponential influence on the infection rate and the effect of past incidence is delayed by approximately two to three months. This structure reproduces the observed seasonal lag between warm periods and subsequent case spikes. When the data‑driven kernel is embedded into an existing mechanistic model (a 16‑state, 20‑equation tick‑host‑virus ODE system), the resulting hybrid model achieves substantially lower one‑step‑ahead forecast error (over 30% reduction in RMSE) compared with the pure mechanistic version.
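For readers unfamiliar with the metric, the relative RMSE reduction quoted above can be computed as in the following sketch. The function names and the toy numbers are purely illustrative assumptions; they are not the paper's data or results.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def rmse_reduction(observed, baseline_pred, hybrid_pred):
    """Fractional reduction in RMSE of the hybrid model relative to the baseline."""
    base = rmse(observed, baseline_pred)
    return (base - rmse(observed, hybrid_pred)) / base

# Toy one-step-ahead forecasts (hypothetical values, not from the paper).
observed = [2, 5, 9, 4]
mechanistic = [4, 7, 7, 6]   # every forecast is off by 2 cases
hybrid = [3, 6, 8, 5]        # every forecast is off by 1 case

reduction = rmse_reduction(observed, mechanistic, hybrid)
print(f"RMSE reduction: {reduction:.0%}")
```

A value above 0.30 for `reduction` would correspond to the "over 30%" improvement reported in the summary.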

A systematic sensitivity analysis reveals that (1) the shape of the delay distribution, (2) the temperature sensitivity parameter \(\alpha\), and (3) the human‑tick contact rate are the most influential factors for forecast accuracy. Wider delay kernels improve long‑term predictions, while higher \(\alpha\) values increase responsiveness to climate fluctuations.
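The summary does not specify how the sensitivity analysis was carried out. One common and simple option is a one-at-a-time perturbation, sketched below on a toy distributed-delay model. Everything here is an assumption for illustration: the model form, the Gaussian-shaped delay kernel, the parameter names `alpha`, `mean_delay`, and `contact`, their baseline values, and the 10% perturbation size.

```python
import numpy as np

def simulate_incidence(params, T, n_lags=6):
    """Toy model: x(t) = contact * sum_k K_k * exp(alpha * T(t - k))."""
    lags = np.arange(1, n_lags + 1)
    # Discrete delay kernel centred on mean_delay, normalised to sum to 1.
    kernel = np.exp(-0.5 * (lags - params["mean_delay"]) ** 2)
    kernel /= kernel.sum()
    forcing = np.exp(params["alpha"] * T)
    x = np.zeros(len(T))
    for t in range(n_lags, len(T)):
        x[t] = params["contact"] * kernel @ forcing[t - lags]
    return x[n_lags:]                        # drop the warm-up period

def oat_sensitivity(base, T, rel_step=0.10):
    """Perturb each parameter by rel_step and report the RMSE of the output change."""
    reference = simulate_incidence(base, T)
    scores = {}
    for name in base:
        perturbed = dict(base)
        perturbed[name] = base[name] * (1 + rel_step)
        diff = simulate_incidence(perturbed, T) - reference
        scores[name] = float(np.sqrt(np.mean(diff ** 2)))
    return scores

months = np.arange(120)
T = 15 + 10 * np.sin(2 * np.pi * months / 12)    # idealised seasonal temperature
base = {"alpha": 0.1, "mean_delay": 2.5, "contact": 0.5}
scores = oat_sensitivity(base, T)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10s}: {score:.3f}")
```

One-at-a-time perturbation ignores interactions between parameters; a variance-based (global) method would be needed to capture those, at a higher computational cost.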

The paper’s contributions are threefold: (i) it demonstrates that SINDy can be successfully adapted to infinite‑dimensional renewal equations, yielding sparse, interpretable integral representations of memory effects; (ii) it shows that limited surveillance data combined with simple environmental covariates are sufficient to recover key epidemiological mechanisms in a complex vector‑borne system; (iii) it provides a practical hybrid modeling strategy that leverages both data‑driven discovery and mechanistic insight, facilitating more reliable short‑term forecasts and informing public‑health interventions.

Limitations include the dependence on the predefined library (which may miss highly nonlinear interactions) and the potential for over‑regularization to suppress biologically relevant terms. The study is also confined to a single city and vector species, suggesting the need for multi‑regional, multi‑vector extensions. Future work is proposed to incorporate Bayesian sparse regression for uncertainty quantification, to explore adaptive libraries based on neural networks, and to apply the framework to climate‑change scenarios for long‑term risk assessment.

