QUICK REVIEW

[Paper Review] Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case

Neo Wu, Bradley Green|arXiv (Cornell University)|Jan 23, 2020

Anomaly Detection Techniques and Applications | References: 21 | Cited by: 349

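One-Sentence Summary

The paper proposes a Transformer-based method for time series forecasting, evaluates it on influenza-like illness (ILI) data, and compares it against ARIMA, LSTM, and Seq2Seq models, showing competitive or better performance.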

ABSTRACT

In this paper, we present a new approach to time series forecasting. Time series data are prevalent in many scientific and engineering disciplines. Time series forecasting is a crucial task in modeling time series data, and is an important area of machine learning. In this work we developed a novel method that employs Transformer-based machine learning models to forecast time series data. This approach works by leveraging self-attention mechanisms to learn complex patterns and dynamics from time series data. Moreover, it is a generic framework and can be applied to univariate and multivariate time series data, as well as time series embeddings. Using influenza-like illness (ILI) forecasting as a case study, we show that the forecasting results produced by our approach are favorably comparable to the state-of-the-art.

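Motivation and Objectives

Motivation: time series forecasting is a key task in health monitoring and disease surveillance.
Propose a generic Transformer-based framework for univariate and multivariate time series forecasting.
Show that Transformers can model both observed data and phase-space dynamics through embeddings.
Establish performance on the ILI forecasting task against ARIMA, LSTM, and Seq2Seq baselines.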

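Proposed Method

Use a Transformer architecture with an encoder and a decoder to forecast ILI ratios multiple steps ahead.
Map inputs through a fully connected layer, followed by positional encoding and four encoder/decoder layers.
Train with look-ahead masking and a custom learning rate schedule, predicting 4 steps from a 10-step input window.
Evaluate with Pearson correlation and RMSE, comparing against ARIMA, LSTM, and Seq2Seq baselines.
Experiment with time delay embedding (TDE) to capture phase-space information and assess its effect on performance.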
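
The windowing and look-ahead masking described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `make_windows` and `look_ahead_mask` are hypothetical, the series values are made-up placeholders rather than CDC data, and the mask convention (True means "may attend") is an assumption that differs between libraries.

```python
from typing import List, Tuple


def make_windows(series: List[float], n_in: int = 10, n_out: int = 4
                 ) -> List[Tuple[List[float], List[float]]]:
    """Slice a series into (input, target) pairs of lengths n_in and n_out."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]                   # 10-step input window
        y = series[start + n_in:start + n_in + n_out]    # next 4 steps to predict
        pairs.append((x, y))
    return pairs


def look_ahead_mask(size: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(size)] for i in range(size)]


# Placeholder weekly values (assumption: not real ILI ratios).
weekly_ili = [float(v) for v in range(20)]
windows = make_windows(weekly_ili)
mask = look_ahead_mask(4)

print(len(windows))    # number of (input, target) pairs
print(windows[0][1])   # first 4-step target
print(mask[1])         # what decoder position 1 is allowed to see
```

The mask keeps each decoder position from seeing later target values during training, which is what makes multi-step training consistent with step-by-step prediction at inference time.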

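Experimental Results

Research Questions

RQ1: Can a Transformer-based model match or exceed state-of-the-art ILI forecasting methods on official CDC data?
RQ2: Does adding multivariate features (such as week number and first- and second-order differences) improve the Transformer's ILI forecasts?
RQ3: Do time delay embeddings improve the Transformer's performance by capturing the phase-space structure of the data?
RQ4: How does the Transformer compare with ARIMA, LSTM, and Seq2Seq with attention on one-step-ahead ILI forecasting?
RQ5: Can a single global Transformer model generalize across US states to country-level ILI forecasting?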
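
The extra inputs behind RQ2 and RQ3 are simple to construct. The sketch below shows one possible way to build difference features and a time delay embedding; the helper names `diff` and `time_delay_embedding` are hypothetical, the default delay of 1 and the oldest-to-newest ordering inside each vector are assumptions, and the numbers are illustrative only.

```python
from typing import List


def diff(series: List[float]) -> List[float]:
    """First-order difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def time_delay_embedding(series: List[float], dim: int, delay: int = 1
                         ) -> List[List[float]]:
    """For each valid t, return [x(t-(dim-1)*delay), ..., x(t-delay), x(t)]."""
    span = (dim - 1) * delay
    return [[series[t - span + k * delay] for k in range(dim)]
            for t in range(span, len(series))]


# Illustrative values (assumption: not real ILI ratios).
series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
first = diff(series)     # first-order differences
second = diff(first)     # second-order differences
tde = time_delay_embedding(series, dim=3)

print(first)
print(second)
print(tde[0], tde[-1])
```

Each embedded vector replaces a scalar observation with a short history, so the model input at time t carries a point in a reconstructed phase space rather than a single value.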

Key Findings

Transformer-based forecasting achieves high correlation (e.g., US-level: 0.984) and low RMSE (0.3318) for one-step-ahead ILI prediction.
Transformer generally outperforms ARIMA in correlation and RMSE, and surpasses LSTM and Seq2Seq+attention in RMSE.
Including week number and difference features yields only modest improvements, suggesting that self-attention already captures these dependencies without the extra features.
Time delay embeddings (TDE) with optimal dimensionality (8 in experiments) provide comparable or slightly improved performance over scalar inputs.
Compared with ARGONet, the Transformer achieves similar or slightly better correlation and RMSE on average across states.
A single global Transformer model trained on concatenated state data can generalize to country-level forecasting.
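
The two metrics quoted in these findings can be computed as follows. This is a minimal sketch using only the standard library; the function names `rmse` and `pearson` are hypothetical, and the observed and predicted values are made up to show how the metrics differ, not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up values: predictions track the trend exactly but are offset by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(pearson(observed, predicted))
print(rmse(observed, predicted))
```

In this example the correlation is perfect while the RMSE is not zero, which is why reporting both is informative: correlation measures how well the forecast follows the shape of the curve, and RMSE measures how far off it is in absolute terms.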

This review was generated by AI and reviewed by human editors.