QUICK REVIEW

[Paper Review] Exploring Memory Effects: Sparse Identification in Vector-Borne Diseases

Dimitri Breda, Muhammad Tanveer|arXiv (Cornell University)|Jan 28, 2026
Viral Infections and Vectors · Cited by 0
One-Sentence Summary

This paper extends SINDy (Sparse Identification of Nonlinear Dynamics) to discover distributed memory (delay) dynamics in vector-borne disease transmission, using severe fever with thrombocytopenia syndrome (SFTS) in Dalian as a case study with incidence and temperature data. It shows that sparse, data-driven integral representations improve short-term forecasting when coupled with mechanistic components.

ABSTRACT

Predicting the human burden of vector-borne diseases from limited surveillance data remains a major challenge, particularly in the presence of nonlinear transmission dynamics and delayed effects arising from vector ecology and human behavior. We develop a data-driven framework based on an extension of Sparse Identification of Nonlinear Dynamics (SINDy) to systems with distributed memory, enabling discovery of transmission mechanisms directly from time series data. Using severe fever with thrombocytopenia syndrome (SFTS) as a case study, we show that this approach can uncover key features of tick-borne disease dynamics using only human incidence and local temperature data, without imposing predefined assumptions on human case reporting. We further demonstrate that predictive performance is substantially enhanced when the data-driven model is coupled with mechanistic representations of tick-host transmission pathways informed by empirical studies. The framework supports systematic sensitivity analysis of memory kernels and behavioral parameters, identifying those most influential for prediction accuracy. Although the approach prioritizes predictive accuracy over mechanistic transparency, it yields sparse, interpretable integral representations suitable for epidemiological forecasting. This hybrid methodology provides a scalable strategy for forecasting vector-borne disease risk and informing public health decision-making under data limitations.

Motivation and Objectives

  • Address short-term forecasting of vector-borne diseases under limited data and uncertain memory (delay) effects.
  • Develop a data-driven extension of SINDy capable of discovering distributed memory kernels from time-series data.
  • Apply the framework to SFTS in Dalian using human incidence and local temperature, with mechanistic coupling to improve prediction.
  • Assess how sensitive predictions are to memory-kernel and behavioral parameters, to identify the drivers of predictive accuracy.
  • Provide interpretable, sparse integral representations suitable for epidemiological forecasting under data constraints.

Proposed Method

  • Extend SINDy to systems with distributed memory via renewal equations and integral kernels.
  • Construct quadrature-based approximations to transform distributed-delay integrals into weighted sums of candidate functions.
  • Build a library of candidate functions of the delay nodes and the delayed state trajectories, including nonautonomous terms such as the temperature-dependent outdoor-activity term e^{ωT} and polynomial interactions.
  • Solve a sparse regression problem (sequentially thresholded least squares, STLS, or LASSO) to identify the minimal set of active kernel terms for each state component.
  • Couple the data-driven kernel with mechanistic components (tick-host transmission paths) to improve predictive performance.
  • Evaluate the models on monthly SFTS incidence and temperature data and, in an extended analysis, add infected tick populations to enrich the library.
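
The quadrature and sparse-regression steps above can be illustrated with a minimal sketch. It assumes a scalar renewal equation x(t) = ∫_1^τ F(s, x(t−s), T(t−s)) ds with integer monthly delays, expands F(s, x, T) ≈ Σ_k ξ_k θ_k(s, x, T), and applies a composite trapezoidal rule so that x(t) ≈ Σ_k ξ_k Σ_j w_j θ_k(s_j, x(t−s_j), T(t−s_j)), which is linear in the unknown coefficients ξ_k. The candidate library (x, s·x, x², e^{ωT}), the helper names `trapezoid_rule`, `build_library` and `stls`, and the synthetic series are illustrative assumptions, not the authors' code or the Dalian data; STLS is used here, and LASSO would be a drop-in alternative.

```python
import numpy as np

def trapezoid_rule(tau: int) -> tuple[np.ndarray, np.ndarray]:
    """Delay nodes s_j = 1..tau (monthly lags) and composite trapezoidal weights w_j."""
    s = np.arange(1, tau + 1, dtype=float)
    w = np.ones(tau)
    w[0] = w[-1] = 0.5
    return s, w

# Hypothetical candidate library theta_k(s, x, T)
NAMES = ["x", "s*x", "x^2", "exp(omega*T)"]

def build_library(x, T, tau, omega):
    """Theta[t, k] = sum_j w_j * theta_k(s_j, x(t - s_j), T(t - s_j)) for t = tau..N-1."""
    s, w = trapezoid_rule(tau)
    rows = []
    for t in range(tau, len(x)):
        lag = (t - s).astype(int)          # indices t - s_j of the delayed samples
        xs, Ts = x[lag], T[lag]
        candidates = np.stack([
            xs,                             # linear memory term
            s * xs,                         # term that depends on the delay node
            xs ** 2,                        # quadratic interaction
            np.exp(omega * Ts),             # nonautonomous outdoor-activity term
        ])                                  # shape (n_candidates, tau)
        rows.append(candidates @ w)         # quadrature: weighted sum over the nodes
    return np.array(rows)

def stls(Theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        xi[active] = np.linalg.lstsq(Theta[:, active], y, rcond=None)[0]
    return xi

# Synthetic monthly series from a known renewal equation (illustrative only)
tau, omega, months = 3, 0.1, 120
T = 15 + 10 * np.sin(2 * np.pi * np.arange(months) / 12)   # seasonal "temperature"
s, w = trapezoid_rule(tau)
x = np.ones(months)                                         # initial history for t < tau
for t in range(tau, months):
    lag = (t - s).astype(int)
    x[t] = w @ (0.3 * x[lag] + 0.5 * np.exp(omega * T[lag]))

Theta = build_library(x, T, tau, omega)
xi = stls(Theta, x[tau:], threshold=0.05)
for name, coef in zip(NAMES, xi):
    print(f"{name:>14s}: {coef:+.4f}")
```

Because the noise-free series is generated with coefficients 0.3 on x and 0.5 on e^{ωT}, the regression should recover those two terms and set the s·x and x² coefficients to zero; with real surveillance data the threshold would need tuning against validation error.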

Experimental Results

Research Questions

  • RQ1: Can SINDy be extended to identify distributed memory kernels from time-series data on vector-borne disease transmission?
  • RQ2: How well can a data-driven kernel capture tick–host–human transmission dynamics using only limited human incidence and temperature data?
  • RQ3: Does incorporating nonautonomous memory terms such as e^{ωT} improve short-term forecasts of SFTS incidence?
  • RQ4: How does including tick infection data affect the identified memory kernel and predictive accuracy?
  • RQ5: How sensitive is SFTS forecast accuracy to the memory kernels and the reporting-window parameter σ?

Key Findings

  • A distributed-delay SINDy framework can identify sparse, interpretable kernels that link past incidence, temperature, and activity to current cases.
  • Two model variants were tested, a baseline memory model and a model incorporating an outdoor-activity term e^{ωT}, with polynomial libraries of degree 1 and 2.
  • Including the temperature-driven outdoor-activity term yields predictive performance comparable to the baseline, with degree-1 polynomial libraries performing best in some cases.
  • Extending the library with infected tick populations (I_Nq, I_Aq) further improves predictions and yields a sparse kernel that includes tick-state terms.
  • The approach captures the seasonality of SFTS incidence and achieves RMSE values that are reasonable given the limited training data (96 monthly samples).
  • Sensitivity analyses show how the outdoor-activity parameter ω and the integration window σ influence validation error, guiding robust parameter selection.
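
A sensitivity analysis over ω and σ amounts to refitting the model on a grid of parameter values and comparing validation errors. The sketch below uses a hypothetical one-coefficient toy model, not the paper's: incidence is assumed proportional to activity-weighted exposure summed over the previous σ months, f(t) = Σ_{j=1..σ} e^{ωT(t−j)}, with the coefficient fitted on the first 96 months and the RMSE computed on the remaining 24. Treating σ as a backward window length, the function names, the grid values and the synthetic series are all illustrative assumptions.

```python
import numpy as np

def window_feature(T, omega, sigma, start):
    """Hypothetical exposure f(t) = sum_{j=1..sigma} exp(omega * T(t - j)) for t = start..N-1."""
    return np.array([np.exp(omega * T[t - sigma:t]).sum() for t in range(start, len(T))])

def validation_rmse(y, T, omega, sigma, start, n_train):
    """Fit y ≈ c * f on training months, return RMSE on the held-out months."""
    f = window_feature(T, omega, sigma, start)
    target = y[start:]
    split = n_train - start
    f_tr, y_tr = f[:split], target[:split]
    c = (f_tr @ y_tr) / (f_tr @ f_tr)                # one-parameter least squares
    resid = target[split:] - c * f[split:]
    return float(np.sqrt(np.mean(resid ** 2)))

def sensitivity_grid(y, T, omegas, sigmas, n_train):
    """Validation RMSE for every (omega, sigma) pair on a common set of rows."""
    start = max(sigmas)                              # same rows for every sigma
    return {(om, sg): validation_rmse(y, T, om, sg, start, n_train)
            for om in omegas for sg in sigmas}

# Synthetic series generated with omega = 0.08 and sigma = 2 (illustrative only)
months, n_train = 120, 96
T = 15 + 10 * np.sin(2 * np.pi * np.arange(months) / 12)
y = np.zeros(months)
y[4:] = 0.5 * window_feature(T, 0.08, 2, start=4)

grid = sensitivity_grid(y, T, [0.0, 0.04, 0.08, 0.12], [1, 2, 3, 4], n_train)
best = min(grid, key=grid.get)
print("best (omega, sigma):", best, "validation RMSE:", grid[best])
```

Since the synthetic data are built from ω = 0.08 and σ = 2, the grid should select that pair with a validation RMSE near zero. Printing the full `grid` dictionary shows how quickly the error grows away from the optimum, which is the information used to pick robust parameter values.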

This review was generated by AI and reviewed by human editors.