QUICK REVIEW

[Paper Review] Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting

Amin Shabani, Amir H. Abdi|arXiv (Cornell University)|Jun 8, 2022

Time Series Analysis and Forecasting | Cited by 36

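One-Sentence Summary

Scaleformer introduces an iterative multi-scale refinement framework with cross-scale normalization that plugs into existing transformer-based time series models and yields significant MSE/MAE improvements at minimal additional overhead.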

ABSTRACT

The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to the state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). By iteratively refining a forecasted time series at multiple scales with shared weights, introducing architecture adaptations, and a specially-designed normalization scheme, we are able to achieve significant performance improvements, from 5.5% to 38.5% across datasets and transformer architectures, with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of each of our contributions across the architecture and methodology. Furthermore, our experiments on various public datasets demonstrate that the proposed improvements outperform their corresponding baseline counterparts. Our code is publicly available in https://github.com/BorealisAI/scaleformer.

Motivation and Objectives

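Motivate scale-aware processing for time series forecasting in order to capture dependencies across scales.
Propose a general, architecture-agnostic multi-scale refinement framework that can be applied to transformer backbones such as FEDformer and Autoformer.
Introduce cross-scale normalization to mitigate distribution shift across scales and windows during iterative refinement.
Demonstrate empirical gains across multiple datasets and backbones through ablation studies and comparisons.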

Proposed Method

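Define a set of temporal scales obtained by downsampling with factors that are powers of s, and refine the forecast iteratively from the coarsest scale to the original resolution.
Apply the same transformer module (shared weights) at every scale: the encoder input is the downsampled look-back window, and the decoder input is the upsampled output of the previous scale.
Introduce cross-scale normalization, which centers the encoder and decoder inputs using moving-average statistics to reduce distribution shift across scales.
Embed the inputs using value, temporal, and scale-aware fixed positional embeddings.
Train end to end with the adaptive loss f(x, alpha, c) of Barron (2019) in place of the standard MSE, which is more robust when outliers are present.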
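
The steps listed above can be made concrete with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. The helper names (avg_pool, upsample, blend_forecaster, multiscale_forecast, barron_loss) are hypothetical. The shared transformer is replaced by a trivial stand-in. The decoder input at the coarsest scale is assumed to start at zero. The centering statistic is assumed to be a plain mean over the concatenated encoder and decoder inputs, added back after the model call. The loss is the general form from Barron (2019) with alpha and c passed in as fixed values, although that formulation also allows them to be learned.

```python
import numpy as np


def avg_pool(x, s):
    """Downsample a (time, features) array by averaging non-overlapping windows of s steps."""
    t = (len(x) // s) * s  # largest usable length divisible by s
    # Keep the most recent t steps, so any leftover oldest steps are dropped.
    return x[len(x) - t:].reshape(-1, s, x.shape[1]).mean(axis=1)


def upsample(x, new_len):
    """Linearly interpolate a (time, features) array to new_len steps."""
    old = np.linspace(0.0, 1.0, len(x))
    new = np.linspace(0.0, 1.0, new_len)
    return np.stack([np.interp(new, old, x[:, j]) for j in range(x.shape[1])], axis=1)


def blend_forecaster(enc, dec):
    """Hypothetical stand-in for the shared transformer (NOT the paper's model):
    averages the current decoder guess with the last encoder value."""
    return 0.5 * dec + 0.5 * enc[-1]


def multiscale_forecast(lookback, horizon, forecaster, s=2, m=2):
    """Refine a forecast from the coarsest scale s**m down to the original resolution."""
    scales = [s ** k for k in range(m, -1, -1)]  # e.g. [4, 2, 1], coarsest first
    out = None
    for scale in scales:
        enc = avg_pool(lookback, scale)
        steps = horizon // scale
        if out is None:
            # Assumption of this sketch: zero decoder input at the coarsest scale.
            dec = np.zeros((steps, lookback.shape[1]))
        else:
            # Previous forecast, interpolated to the current scale.
            dec = upsample(out, steps)
        # Cross-scale normalization (assumed form): center both inputs on their
        # joint mean, call the same forecaster, then add the mean back.
        mu = np.concatenate([enc, dec]).mean(axis=0, keepdims=True)
        out = forecaster(enc - mu, dec - mu) + mu
    return out


def barron_loss(x, alpha, c):
    """General robust loss f(x, alpha, c) of Barron (2019), for alpha not in {0, 2}."""
    if alpha in (0, 2):
        raise ValueError("alpha = 0 and alpha = 2 are limit cases not handled in this sketch")
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)


lookback = np.full((96, 1), 5.0)  # constant toy series with one feature
forecast = multiscale_forecast(lookback, horizon=48, forecaster=blend_forecaster)
print(forecast.shape)
```

On this constant toy series the stand-in moves halfway from its current guess toward 5 at each scale (roughly 2.5, then 3.75, then 4.375). That only illustrates the coarse-to-fine flow of information between scales and says nothing about the accuracy of a real backbone.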

Experimental Results

Research Questions

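RQ1: Does iterative multi-scale refinement with shared weights improve forecasting accuracy across different transformer backbones?
RQ2: Does cross-scale normalization effectively stabilize training and prevent error propagation across scales?
RQ3: On diverse datasets, how much does multi-scale refinement improve baseline models such as FEDformer, Autoformer, Informer, Reformer, and Performer?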

Key Findings

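Applying Scaleformer to the backbone models reduces mean squared error (MSE) by 5.5% to 38.5%.
The average MSE improvement over the baselines is 5.6% for FEDformer, 13.5% for Autoformer, and 38.5% for Informer, with corresponding MAE improvements.
Cross-scale normalization is critical: without it, the multi-scale variants underperform in many cases, and applying the normalization at a single scale also helps some models.
Ablation studies indicate that combining multi-scale refinement with the adaptive loss achieves the best performance across all datasets.
The framework has a parameter count close to that of the baselines and a modest computational overhead, and it scales across multiple datasets (Electricity, Weather, Exchange-rate, Traffic, ILI).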
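
This summary does not state how the percentage improvements are computed. A common reading, assumed here, is the relative reduction of the error metric with respect to the baseline. The function name relative_improvement is hypothetical and the input values are made up for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up illustrative values, not results from the paper.
baseline_mse, refined_mse = 0.50, 0.40
print(f"{relative_improvement(baseline_mse, refined_mse):.1f}% lower MSE")  # prints "20.0% lower MSE"
```

Under this reading, a negative value would mean the multi-scale variant is worse than its baseline.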

This review was generated by AI and reviewed by human editors.