QUICK REVIEW

[Paper Review] Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou, Shanghang Zhang|arXiv (Cornell University)|Dec 14, 2020

Time Series Analysis and Forecasting|References: 57|Cited by: 462

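One-Sentence Summary

Informer introduces ProbSparse self-attention, self-attention distilling, and a generative-style decoder to make Transformer-based long sequence time-series forecasting efficient and scalable.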

ABSTRACT

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
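To make item (ii) of the abstract concrete, here is a minimal sketch of how halving the input of each successive layer shrinks the sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `distill_step` is hypothetical, and only a max-pooling step (kernel 3, stride 2, padding 1) is shown, with any convolution or activation that might precede it left out.

```python
import numpy as np

def distill_step(x):
    """Halve the time dimension of x (shape: length x features).

    Assumption: a plain max-pool with kernel 3, stride 2, padding 1.
    Any learned layers around the pooling are deliberately omitted.
    """
    length, _ = x.shape
    # Pad one step on each side with -inf so padding never wins the max.
    padded = np.pad(x, ((1, 1), (0, 0)), constant_values=-np.inf)
    out_len = (length - 1) // 2 + 1
    # Gather overlapping windows of 3 time steps, moving 2 steps at a time.
    windows = np.stack([padded[2 * i : 2 * i + 3] for i in range(out_len)])
    return windows.max(axis=1)

# Usage: a length-96 input shrinks to 48, then 24, across two steps.
x = np.random.default_rng(0).standard_normal((96, 8))
h1 = distill_step(x)
h2 = distill_step(h1)
print(x.shape, h1.shape, h2.shape)
```

Because each step roughly halves the length, the memory needed by later encoder layers falls geometrically with depth.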

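Motivation and Objectives

Advance long sequence time-series forecasting (LSTF) and address the demands it places on a model's prediction capacity.
Develop a Transformer-based model whose compute and memory costs are practical for LSTF.
Propose mechanisms that capture long-range dependencies better without incurring quadratic cost.
Demonstrate practical, scalable forecasting on large real-world datasets.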

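Proposed Method

Replace standard self-attention with ProbSparse self-attention, bringing time complexity and memory usage down to O(L log L).
Introduce self-attention distilling, which highlights the dominant attention and cuts memory by halving the input of each successive encoder layer.
Use a generative-style decoder that predicts the long output sequence in a single forward pass, reducing inference time and error accumulation.
Provide an encoder-decoder architecture tailored to LSTF, together with an input representation that strengthens global and local temporal context.
Train with a mean squared error (MSE) loss on the target sequence, and evaluate on univariate and multivariate forecasting tasks.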
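The ProbSparse idea can be sketched in a few lines of NumPy. This is a simplified, single-head illustration rather than the paper's implementation. The assumptions: each query is scored against a random subset of keys, queries are ranked by a max-minus-mean sparsity score, only the top `u` (about `c * ln L`) queries receive full softmax attention, and every other query falls back to the mean of `V`. The function name `probsparse_attention` and the constant `c=5` are hypothetical choices made for this example.

```python
import numpy as np

def probsparse_attention(Q, K, V, c=5, rng=None):
    """Simplified ProbSparse-style attention (illustrative sketch only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    L_Q, d = Q.shape
    L_K = K.shape[0]
    # Assumption: sample size and number of active queries both ~ c * ln(L).
    n_sample = max(1, min(L_K, int(np.ceil(c * np.log(L_K)))))
    u = max(1, min(L_Q, int(np.ceil(c * np.log(L_Q)))))

    # Score every query against a random subset of keys only.
    idx = rng.choice(L_K, size=n_sample, replace=False)
    sampled = Q @ K[idx].T / np.sqrt(d)            # (L_Q, n_sample)

    # Sparsity score: a query whose scores are far from uniform is "active".
    M = sampled.max(axis=1) - sampled.mean(axis=1)
    top = np.argsort(M)[-u:]                       # indices of active queries

    # "Lazy" queries default to the mean of V.
    out = np.tile(V.mean(axis=0), (L_Q, 1))

    # Active queries get ordinary softmax attention over all keys.
    scores = Q[top] @ K.T / np.sqrt(d)             # (u, L_K)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top

# Usage: with L = 96 only 23 queries attend over the full key set.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((96, 8)) for _ in range(3))
out, top = probsparse_attention(Q, K, V)
print(out.shape, len(top))
```

In this sketch the two matrix products cost on the order of L * ln L score evaluations instead of L * L, which is where the O(L log L) figure comes from.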

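Experimental Results

Research Questions

RQ1: Can a Transformer-style model be efficient in compute and memory for very long input and output sequences in time-series forecasting?
RQ2: Do ProbSparse self-attention, self-attention distilling, and the generative decoder together improve the accuracy and efficiency of LSTF?
RQ3: How does Informer perform on univariate and multivariate long-horizon forecasting on real-world datasets?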

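Key Findings

Informer significantly improves forecasting performance across the four large-scale datasets and a range of prediction lengths.
ProbSparse self-attention reduces computation and memory from quadratic to O(L log L), which is near-linear, while keeping dependency alignment comparable to full attention.
Self-attention distilling substantially reduces encoder memory while preserving or improving the handling of long-range information.
The generative-style decoder produces the full long output sequence in a single forward pass, which speeds up inference and limits error propagation.
Ablation studies show that the ProbSparse mechanism and the distilling operation are effective across a range of configurations.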
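The single-forward-pass decoding noted above depends on how the decoder input is assembled. The sketch below shows one plausible construction, assuming the decoder receives a "start token" slice taken from the end of the known history, followed by zero placeholders for the steps to be predicted. The names `build_decoder_input`, `label_len`, and `pred_len` are hypothetical labels for this example.

```python
import numpy as np

def build_decoder_input(history, label_len, pred_len):
    """Concatenate a start-token slice with zero placeholders.

    history:   array of shape (time, features) with known values
    label_len: how many trailing known steps serve as the start token
    pred_len:  how many future steps the decoder fills in at once
    """
    start_token = history[-label_len:]
    placeholder = np.zeros((pred_len, history.shape[1]), dtype=history.dtype)
    return np.concatenate([start_token, placeholder], axis=0)

# Usage: 4 known steps as context, 6 future steps predicted in one pass.
history = np.arange(20, dtype=float).reshape(10, 2)
dec_in = build_decoder_input(history, label_len=4, pred_len=6)
print(dec_in.shape)
```

Since all `pred_len` positions are present in the input, the model can emit every future step in one pass, so no prediction is fed back as input to a later step.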

This review was generated by AI and reviewed by a human editor.