QUICK REVIEW

[Paper Review] Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

Shiyang Li, Xiaoyong Jin | arXiv (Cornell University) | Jun 29, 2019

Time Series Analysis and Forecasting | References: 35 | Cited by: 1,006

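One-Sentence Summary

The paper proposes convolutional self-attention and the LogSparse Transformer, which make attention sensitive to local context and reduce memory cost, enabling Transformer-based time series forecasting to capture long-term dependencies under a constrained memory budget.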

ABSTRACT

Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation. In this paper, we propose to tackle such forecasting problem with Transformer [1]. Although impressed by its performance in our preliminary study, we found its two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: space complexity of canonical Transformer grows quadratically with sequence length $L$, making directly modeling long time series infeasible. In order to solve these two issues, we first propose convolutional self-attention by producing queries and keys with causal convolution so that local context can be better incorporated into attention mechanism. Then, we propose LogSparse Transformer with only $O(L(\log L)^{2})$ memory cost, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state-of-the-art.
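
The convolutional self-attention idea described in the abstract can be sketched in NumPy. This is a minimal single-head sketch under our own assumptions, not the authors' implementation: the function names `causal_conv1d` and `conv_self_attention`, the weight shapes, and the plain linear projection used for values are illustrative choices. Queries and keys come from a causal convolution with kernel size k, so each query and key summarizes a local window ending at its own time step, and a causal mask stops every position from attending to the future.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution (illustrative helper, not from the paper's code).
    x: (L, d_in) sequence; w: (k, d_in, d_out) kernel.
    Output row t depends only on x[t-k+1 .. t] (zero padding on the left)."""
    k, d_in, d_out = w.shape
    L = x.shape[0]
    xp = np.concatenate([np.zeros((k - 1, d_in)), x], axis=0)  # left padding only
    out = np.zeros((L, d_out))
    for t in range(L):
        window = xp[t:t + k]  # equals x[t-k+1 .. t], zeros before the start
        out[t] = np.einsum("ki,kio->o", window, w)
    return out

def conv_self_attention(x, wq, wk, wv):
    """Single-head masked self-attention with Q and K from causal convolutions.
    Assumption: values use an ordinary linear projection (wv: (d_in, d_v))."""
    L = x.shape[0]
    q = causal_conv1d(x, wq)
    k = causal_conv1d(x, wk)
    v = x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    future = np.triu(np.ones((L, L), dtype=bool), 1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d, k = 8, 4, 3  # sequence length, model width, kernel size
    x = rng.normal(size=(L, d))
    wq, wk = rng.normal(size=(k, d, d)), rng.normal(size=(k, d, d))
    wv = rng.normal(size=(d, d))
    out = conv_self_attention(x, wq, wk, wv)
    print(out.shape)  # (8, 4)
```

With kernel size 1 the convolutions collapse to ordinary linear projections, so the function falls back to canonical masked self-attention; larger kernels let the query-key match compare local shapes rather than single points.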

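Motivation and Goals

Advance the use of the Transformer architecture for time series forecasting so that both long-term and short-term dependencies can be captured.
Address the locality-agnostic behavior of point-wise self-attention by injecting local context through causal convolution.
Relieve the memory bottleneck of the canonical Transformer so that long, fine-grained time series can be modeled.
Demonstrate improved forecasting performance on synthetic and real-world datasets under a constrained memory budget.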

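Proposed Method

Introduce convolutional self-attention, which brings local context into the attention mechanism by producing queries and keys with causal convolution.
Generalize canonical self-attention to a kernel of size k; with k = 1 it reduces to canonical self-attention.
Propose the LogSparse Transformer, in which each cell attends only to O(log L) previous positions (itself plus cells at exponentially growing distances), so each layer needs O(L log L) memory and stacking O(log L) layers gives a total of O(L (log L)^2).
Prove that, with O(log L) stacked layers, information can flow from any earlier position to any current position.
Propose Local Attention and Restart Attention variants to further improve information flow and efficiency.
Compare against baselines on synthetic and real-world datasets, covering rolling-window forecasting and longer-horizon forecasting tasks.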
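
The LogSparse pattern and its O(log L) information-flow depth can be checked numerically with a sketch like the following. It is an illustration under stated assumptions: positions are 0-indexed, each cell attends to itself and to the cells 1, 2, 4, 8, ... steps back, and the helpers `logsparse_indices`, `local_logsparse_indices`, `restart_indices`, `build_mask` and `layers_to_connect_all` are hypothetical names. The two variants encode one plausible reading of Local Attention (a dense window over the previous k cells, then exponential steps) and Restart Attention (LogSparse applied independently inside fixed-length subsequences); the paper remains the authority on their exact definitions.

```python
import numpy as np

def logsparse_indices(t):
    """Cells position t (0-indexed) attends to: itself, then t-1, t-2, t-4, ...
    i.e. previous cells at exponentially growing distances."""
    idx, step = [t], 1
    while t - step >= 0:
        idx.append(t - step)
        step *= 2
    return idx

def local_logsparse_indices(t, k):
    """Assumed reading of Local Attention: dense over the previous k cells,
    then exponential steps counted from the far edge of that window."""
    idx = [t - i for i in range(k + 1) if t - i >= 0]
    edge, step = t - k, 1
    while edge - step >= 0:
        idx.append(edge - step)
        step *= 2
    return idx

def restart_indices(t, sub_len):
    """Assumed reading of Restart Attention: LogSparse applied independently
    inside consecutive subsequences of length sub_len."""
    start = (t // sub_len) * sub_len
    return [start + i for i in logsparse_indices(t - start)]

def build_mask(L, index_fn):
    """Boolean L x L attention mask; row t marks the cells t may attend to."""
    mask = np.zeros((L, L), dtype=bool)
    for t in range(L):
        mask[t, index_fn(t)] = True
    return mask

def layers_to_connect_all(mask):
    """Smallest number of stacked layers after which every cell t has a path
    from every earlier cell j <= t; None if that never happens."""
    L = mask.shape[0]
    target = np.tril(np.ones((L, L), dtype=bool))
    hop = mask.astype(np.int64)
    reach = mask.copy()  # reach[t, j]: input j can influence output t
    for layers in range(1, L + 1):
        if np.array_equal(reach, target):
            return layers
        reach = (hop @ reach.astype(np.int64)) > 0  # stack one more layer
    return None

if __name__ == "__main__":
    L = 16
    sparse = build_mask(L, logsparse_indices)
    print(sparse.sum(), L * (L + 1) // 2)  # stored scores: LogSparse vs full causal
    print(layers_to_connect_all(sparse))
```

For L = 16 this prints 65 stored attention scores for the LogSparse mask against 136 for full causal attention, and reports that 4 stacked layers connect every pair j ≤ t, consistent with the O(log L) depth claim. Under the restart reading, `layers_to_connect_all` returns None, since no path crosses a subsequence boundary within the attention pattern itself.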

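Experimental Results

Research Questions

RQ1: Does convolutional self-attention improve locality awareness and forecasting accuracy on time series relative to the canonical Transformer?
RQ2: Can the LogSparse Transformer maintain or improve forecasting performance on long, fine-grained time series while substantially reducing memory usage?
RQ3: How do kernel size and sparsity pattern affect learning dynamics and forecasting accuracy on datasets with different degrees of long-term dependency?
RQ4: How does locality-aware attention affect training convergence and model efficiency compared with full attention?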

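Key Findings

Convolutional self-attention improves forecasting accuracy by using local context when matching queries to keys.
The LogSparse Transformer needs only O(L (log L)^2) memory, making it possible to model long, fine-grained time series under a constrained memory budget.
Larger kernel sizes in convolutional self-attention yield significant gains on challenging datasets with strong long-term dependencies.
On both synthetic and real-world datasets, the proposed methods compare favorably with state-of-the-art baselines.
Convolutional self-attention speeds up training and lowers training loss, suggesting that the model is easier to optimize.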
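
The memory figure can be reproduced with a back-of-the-envelope count of stored attention scores (constant factors and the number of heads ignored, base-2 logarithms assumed):

```latex
\begin{aligned}
\text{canonical, one layer:}\quad & L \cdot L = O(L^{2}) \\
\text{LogSparse, one layer:}\quad & L \cdot O(\log L) = O(L \log L) \\
\text{LogSparse, } O(\log L) \text{ layers:}\quad & O(\log L) \cdot O(L \log L) = O\!\left(L (\log L)^{2}\right)
\end{aligned}
```

For example, with L = 1024 (so log2 L = 10), L^2 = 1,048,576 while L (log2 L)^2 = 102,400, roughly a tenfold reduction before constants.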

This review was generated by AI and checked by human editors.