QUICK REVIEW

[Paper Review] Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case

Neo Wu, Bradley Green|arXiv (Cornell University)|Jan 23, 2020
Anomaly Detection Techniques and Applications|References: 21|Cited by: 349
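One-Sentence Summary

The paper proposes a Transformer-based method for time series forecasting, evaluates it on influenza-like illness (ILI) data, and compares it with ARIMA, LSTM, and Seq2Seq models, showing competitive or better performance.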

ABSTRACT

In this paper, we present a new approach to time series forecasting. Time series data are prevalent in many scientific and engineering disciplines. Time series forecasting is a crucial task in modeling time series data, and is an important area of machine learning. In this work we developed a novel method that employs Transformer-based machine learning models to forecast time series data. This approach works by leveraging self-attention mechanisms to learn complex patterns and dynamics from time series data. Moreover, it is a generic framework and can be applied to univariate and multivariate time series data, as well as time series embeddings. Using influenza-like illness (ILI) forecasting as a case study, we show that the forecasting results produced by our approach are favorably comparable to the state-of-the-art.

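Research Motivation and Objectives

  • Motivation: time series forecasting is a key task in health monitoring and disease surveillance.
  • Propose a generic Transformer-based framework for univariate and multivariate time series forecasting.
  • Show that Transformers can model both the observed data and phase-space dynamics through embeddings.
  • Establish performance on the ILI forecasting task against ARIMA, LSTM, and Seq2Seq baselines.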

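Proposed Method

  • Use a Transformer architecture with an encoder and a decoder to forecast ILI ratios multiple steps ahead.
  • Map inputs through a fully connected layer, followed by positional encoding and four encoder/decoder layers.
  • Train with look-ahead masking and a custom learning-rate schedule, predicting 4 steps from a 10-step input window.
  • Evaluate with Pearson correlation and RMSE, and compare against ARIMA, LSTM, and Seq2Seq baselines.
  • Experiment with time delay embedding (TDE) to capture phase-space information, and assess its effect on performance.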
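The windowing and masking steps in the list above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `make_windows` and `look_ahead_mask` are hypothetical, the toy series is made up, and the exact way the paper arranges decoder inputs is not described in this review. The window sizes (10 inputs, 4 targets) are the ones stated above.

```python
from typing import List, Tuple


def make_windows(
    series: List[float], input_len: int = 10, horizon: int = 4
) -> List[Tuple[List[float], List[float]]]:
    """Slide a window over the series and return (input, target) pairs.

    Each input holds `input_len` consecutive points; each target holds the
    `horizon` points that immediately follow it.
    """
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        pairs.append((x, y))
    return pairs


def look_ahead_mask(size: int) -> List[List[bool]]:
    """Build a causal mask: entry [i][j] is True when position i may attend
    to position j, that is, only when j is not in the future (j <= i).

    Assumption: True means "allowed". Some frameworks use the opposite
    convention, so check before passing a mask like this to a library.
    """
    return [[j <= i for j in range(size)] for i in range(size)]


# Toy series of 20 weekly values (illustrative data, not from the paper).
series = [float(t) for t in range(20)]
pairs = make_windows(series)
mask = look_ahead_mask(4)

print(len(pairs))   # number of training pairs
print(pairs[0][1])  # first 4-step target
for row in mask:
    print(row)
```

With a 20-point series, this yields 7 training pairs, and the 4x4 mask is lower triangular: the first decoder position sees only itself, the last sees all four.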

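Experimental Results

Research Questions

  • RQ1: Can a Transformer-based model match or exceed state-of-the-art ILI forecasting methods on official CDC data?
  • RQ2: Does adding multivariate features (such as week number and first- and second-order differences) improve the Transformer's ILI forecasts?
  • RQ3: Do time delay embeddings improve the Transformer's performance by capturing the phase-space structure of the data?
  • RQ4: How does the Transformer compare with ARIMA, LSTM, and Seq2Seq with attention on one-step-ahead ILI forecasting?
  • RQ5: Can a single global Transformer model, trained across US states, generalize to country-level ILI forecasting?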
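To make RQ2 and RQ3 concrete, the sketch below builds first- and second-order differences and a time delay embedding from a scalar series. It is an illustration under assumptions: the helper names `diff` and `time_delay_embedding` are hypothetical, the ordering of components inside each embedded vector (most recent value first) is one common convention rather than something this review specifies, and the toy series is made up. The example uses dimension 3 for readability; the experiments summarized here report 8 as the best-performing dimension.

```python
from typing import List


def diff(series: List[float]) -> List[float]:
    """Return successive differences x[t] - x[t-1] (one element shorter)."""
    return [b - a for a, b in zip(series, series[1:])]


def time_delay_embedding(
    series: List[float], dim: int, delay: int = 1
) -> List[List[float]]:
    """Map each scalar x[t] to the vector
    [x[t], x[t - delay], ..., x[t - (dim - 1) * delay]].

    Only time steps with a full history are returned, so the output has
    len(series) - (dim - 1) * delay rows.
    """
    span = (dim - 1) * delay
    return [
        [series[t - k * delay] for k in range(dim)]
        for t in range(span, len(series))
    ]


# Toy values (illustrative only, not ILI data).
series = [1.0, 2.0, 4.0, 7.0, 11.0]

first_diff = diff(series)        # [1.0, 2.0, 3.0, 4.0]
second_diff = diff(first_diff)   # [1.0, 1.0, 1.0]
embedded = time_delay_embedding(series, dim=3)

print(first_diff)
print(second_diff)
print(embedded)  # [[4.0, 2.0, 1.0], [7.0, 4.0, 2.0], [11.0, 7.0, 4.0]]
```

Because differencing and embedding both shorten the series, the features have to be aligned on their last elements before being stacked into one multivariate input.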

Key Findings

  • Transformer-based forecasting achieves high correlation (e.g., US-level: 0.984) and low RMSE (0.3318) for one-step-ahead ILI prediction.
  • Transformer generally outperforms ARIMA in correlation and RMSE, and surpasses LSTM and Seq2Seq+attention in RMSE.
  • Including week number and difference features yields modest improvements, suggesting self-attention captures dependencies without extra features.
  • Time delay embeddings (TDE) with optimal dimensionality (8 in experiments) provide comparable or slightly improved performance over scalar inputs.
  • Compared with ARGONet, the Transformer achieves similar or slightly better correlation and RMSE on average across states.
  • A single global Transformer model trained on concatenated state data can generalize to country-level forecasting.
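The two metrics behind these findings, Pearson correlation and RMSE, can be computed as in the sketch below. The function names and the sample values are illustrative assumptions and are not taken from the paper; only the standard definitions of the two metrics are used.

```python
import math
from typing import List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(actual: List[float], predicted: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up values: the forecast tracks the shape exactly but is 0.5 too high.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(pearson(actual, predicted))
print(rmse(actual, predicted))
```

In this toy case the correlation is perfect while the RMSE is 0.5, which shows why the two metrics are reported together: correlation measures whether a forecast follows the trend, and RMSE measures how far off its values are.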


This review was generated by AI and checked by human editors.