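[Paper Explainer] A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Introduces DA-RNN, which uses an input-attention encoder and a temporal-attention decoder to improve time series prediction by selecting relevant driving series and capturing long-term temporal dependencies; it achieves state-of-the-art results on the SML 2010 and NASDAQ 100 datasets.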
The Nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
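For reference, the NARX setting described in the abstract can be written compactly as below. This is a sketch of the problem statement only: $n$ is the number of driving series, $T$ is the window length, and $F$ is the nonlinear mapping the model has to learn.

```latex
% Inputs: n driving series over a window of length T, plus the target history
x_t = (x_t^1, x_t^2, \ldots, x_t^n)^\top \in \mathbb{R}^n, \qquad t = 1, \ldots, T
% Goal: predict the current target value from past targets and
% current and past driving series
\hat{y}_T = F\left(y_1, \ldots, y_{T-1},\; x_1, \ldots, x_T\right)
```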
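Research Motivation and Goals
- Motivation: time series prediction with multiple exogenous driving series (the NARX setting), where existing models struggle to select relevant inputs and to capture long-term dependencies.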
- Develop a model that can automatically select relevant input features (driving series) at each time step.
- Capture long-term temporal dependencies by selecting encoder hidden states across time steps.
- Provide an interpretable mechanism to understand which inputs and time steps influence predictions.
- Demonstrate robustness to noisy inputs and compare against state-of-the-art baselines.
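Proposed Method
- Proposes a dual-stage attention-based RNN (DA-RNN) built on LSTM units.
- The encoder applies an input attention mechanism to weight the driving series at each time step, producing the weighted input $\tilde{x}_t$ that is fed to the encoder LSTM.
- The decoder applies a temporal attention mechanism that takes a weighted sum of the encoder hidden states $h_i$ across time steps to obtain the context vector $c_t$.
- The prediction $\hat{y}_T$ is obtained by a linear mapping of the concatenated decoder state $d_T$ and context $c_T$.
- Training minimizes a mean squared error loss with the Adam optimizer, implemented in TensorFlow.
- Key equations:
  - Input attention: $e_t^k = v_e^\top \tanh(W_e [h_{t-1}; s_{t-1}] + U_e x^k)$, where $x^k = (x_1^k, \ldots, x_T^k)^\top$ is the $k$-th driving series, and $\alpha_t^k = \exp(e_t^k) / \sum_{i=1}^{n} \exp(e_t^i)$.
  - Weighted encoder input: $\tilde{x}_t = (\alpha_t^1 x_t^1, \ldots, \alpha_t^n x_t^n)^\top$; the encoder hidden state $h_t$ is then updated by an LSTM unit.
  - Temporal attention: $l_t^i = v_d^\top \tanh(W_d [d_{t-1}; s'_{t-1}] + U_d h_i)$ and $\beta_t^i = \exp(l_t^i) / \sum_{j=1}^{T} \exp(l_t^j)$.
  - Context vector: $c_t = \sum_{i=1}^{T} \beta_t^i h_i$.
  - Decoder update and output: $\tilde{y}_{t-1} = \tilde{w}^\top [y_{t-1}; c_{t-1}] + \tilde{b}$ is fed to the decoder LSTM to update $d_t$, and the final prediction is $\hat{y}_T = v_y^\top (W_y [d_T; c_T] + b_w) + b_v$.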
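The two attention stages can be illustrated with a minimal NumPy sketch. It is not the authors' TensorFlow implementation: the weights are random, the LSTM updates are omitted, and the encoder hidden states `H` are random placeholders, so only the attention arithmetic is shown. The function names and the sizes `T`, `n`, `m`, `p` are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def input_attention(X, t, h_prev, s_prev, W_e, U_e, v_e):
    """Stage 1: weight the n driving series at time step t.

    X is the (T, n) input window; column k is the k-th driving series x^k.
    Returns the attention weights alpha_t (n,) and the weighted input x_tilde_t (n,).
    """
    state = np.concatenate([h_prev, s_prev])             # [h_{t-1}; s_{t-1}], shape (2m,)
    # Column k of (U_e @ X) is U_e x^k; the state term is broadcast over columns.
    scores = v_e @ np.tanh((W_e @ state)[:, None] + U_e @ X)   # e_t^k, shape (n,)
    alpha = softmax(scores)
    return alpha, alpha * X[t]

def temporal_attention(H, d_prev, s_dec_prev, W_d, U_d, v_d):
    """Stage 2: weight the T encoder hidden states.

    H is (T, m); row i is h_i. Returns beta_t (T,) and the context vector c_t (m,).
    """
    state = np.concatenate([d_prev, s_dec_prev])         # [d_{t-1}; s'_{t-1}], shape (2p,)
    # Row i of (H @ U_d.T) is U_d h_i; the state term is broadcast over rows.
    scores = np.tanh(W_d @ state + H @ U_d.T) @ v_d      # l_t^i, shape (T,)
    beta = softmax(scores)
    return beta, beta @ H                                # c_t = sum_i beta_t^i h_i

# Toy sizes (assumed): window T, n driving series, encoder size m, decoder size p.
rng = np.random.default_rng(0)
T, n, m, p = 10, 5, 8, 8
X = rng.normal(size=(T, n))

alpha, x_tilde = input_attention(
    X, t=0, h_prev=np.zeros(m), s_prev=np.zeros(m),
    W_e=rng.normal(size=(T, 2 * m)), U_e=rng.normal(size=(T, T)), v_e=rng.normal(size=T),
)

H = rng.normal(size=(T, m))  # placeholder for the encoder hidden states h_1..h_T
beta, c = temporal_attention(
    H, d_prev=np.zeros(p), s_dec_prev=np.zeros(p),
    W_d=rng.normal(size=(m, 2 * p)), U_d=rng.normal(size=(m, m)), v_d=rng.normal(size=m),
)
```

Both `alpha` and `beta` are probability vectors, which is what makes the model inspectable: `alpha` shows which driving series are emphasized at a step, and `beta` shows which encoder time steps feed the context vector.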
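Experimental Results
Research Questions
- RQ1: Does dual-stage attention improve prediction accuracy on NARX-style time series with many exogenous inputs?
- RQ2: Does selecting input features at each time step improve robustness to noisy driving series?
- RQ3: Does temporal attention over encoder states effectively capture long-term dependencies in time series prediction?
- RQ4: How does DA-RNN compare with encoder-decoder and attention-based RNNs on standard time series benchmarks?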
Key Findings
| Model | SML MAE | SML MAPE | SML RMSE | NASDAQ MAE | NASDAQ MAPE | NASDAQ RMSE |
|---|---|---|---|---|---|---|
| ARIMA | 1.95 | 9.29 | 2.65 | 0.91 | 1.84 | 1.45 |
| NARX RNN | 1.79 | 8.64 | 2.34 | 0.75 | 1.51 | 0.98 |
| Encoder-Decoder (64) | 2.59 | 12.1 | 3.37 | 0.97 | 1.96 | 1.27 |
| Encoder-Decoder (128) | 1.91 | 9.00 | 2.52 | 0.72 | 1.46 | 1.00 |
| Attention RNN (64) | 1.78 | 8.46 | 2.32 | 0.76 | 1.54 | 1.00 |
| Attention RNN (128) | 1.77 | 8.45 | 2.33 | 0.71 | 1.43 | 0.96 |
| Input-Attn-RNN (64) | 1.88 | 8.89 | 2.50 | 0.28 | 0.57 | 0.41 |
| Input-Attn-RNN (128) | 1.70 | 8.09 | 2.24 | 0.26 | 0.53 | 0.39 |
| DA-RNN (64) | 1.53 | 7.31 | 2.02 | 0.21 | 0.43 | 0.31 |
| DA-RNN (128) | 1.50 | 7.14 | 1.97 | 0.22 | 0.45 | 0.33 |
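- DA-RNN achieves the best MAE, MAPE, and RMSE on both datasets compared with all baselines.
- Input attention selectively emphasizes the relevant driving series, which improves robustness to noisy inputs.
- Temporal attention exploits long-term dependencies by focusing on the salient encoder states across time steps.
- DA-RNN outperforms both Encoder-Decoder and Attention RNN, and combining input attention with temporal attention gives the strongest results.
- On NASDAQ 100, DA-RNN (128) reaches MAE 0.22, MAPE 0.45%, and RMSE 0.33 (the 64-unit variant is slightly better at 0.21, 0.43%, and 0.31); on SML 2010, DA-RNN (128) reaches MAE 1.50, MAPE 7.14%, and RMSE 1.97.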
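The three metrics reported above can be computed as in the following sketch, which uses the standard definitions of MAE, MAPE (as a percentage), and RMSE. The function names and the toy arrays are made up for illustration and are unrelated to the paper's data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values (hypothetical): absolute errors are 1, 2, and 0.
y_true = [10.0, 20.0, 40.0]
y_pred = [11.0, 18.0, 40.0]

toy_mae = mae(y_true, y_pred)
toy_mape = mape(y_true, y_pred)
toy_rmse = rmse(y_true, y_pred)
```

MAE and RMSE depend on the scale of the target, so they are not comparable across the two datasets; MAPE is scale-free, which is why it is reported alongside them.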
This explainer was generated by AI and reviewed by human editors.