QUICK REVIEW

[Paper Review] Resurrecting Recurrent Neural Networks for Long Sequences

Antonio Orvieto, Samuel Smith|arXiv (Cornell University)|Mar 11, 2023

Neural Networks and Applications|Cited by 43

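One-Sentence Summary

This paper shows that deep RNNs built from Linear Recurrent Units (LRUs), with careful linearization, diagonalization, stable initialization, and normalization, can match deep state-space models on long-range sequence tasks such as Long Range Arena while retaining their training efficiency.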

ABSTRACT

Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. In this paper, we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring proper normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency.

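Motivation and Objectives

Advance the use of RNNs for long-range sequence modeling, an area where Transformers and SSMs have already shown strengths.
Determine whether deep RNNs can catch up with deep SSMs on long-range tasks.
Isolate and ablate the architectural and initialization choices that affect long-range reasoning in RNNs.
Provide a principled, scalable RNN design (the LRU) that achieves competitive performance and efficiency on long sequences.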

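Proposed Method

Compare vanilla RNNs with SSMs such as S4 on the Long Range Arena benchmark.
Replace the SSM layers with linear RNN layers and stack them with nonlinear MLP blocks to form the Linear Recurrent Unit (LRU).
Show that removing the nonlinearity from the recurrence helps performance in deep architectures.
Introduce a complex-valued diagonal recurrent matrix and an exponential parameterization to stabilize learning and enable long-range modeling.
Apply normalization to the hidden activations to stabilize training on very long sequence tasks.
Provide ablation studies showing how diagonalization, the spectrum at initialization, and normalization affect task performance.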
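The ingredients listed above (a linear recurrence, a complex diagonal recurrent matrix, an exponential parameterization of its eigenvalues, and normalization of the hidden state) can be put together in a few lines. What follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `init_lru`, `eigenvalues`, and `lru_forward`, the default ring radii `r_min` and `r_max`, the scaling of `B` and `C`, and the normalization factor `gamma = sqrt(1 - |lambda|^2)` are all illustrative choices. The phase is stored directly rather than through a second exponential, which is a simplification.

```python
import numpy as np

def init_lru(rng, n_state, n_in, n_out, r_min=0.9, r_max=0.999, max_phase=2 * np.pi):
    """Sample parameters for one diagonal linear recurrent layer (illustrative)."""
    u1 = rng.uniform(size=n_state)
    u2 = rng.uniform(size=n_state)
    # Exponential parameterization: |lambda| = exp(-exp(nu_log)) always lies in (0, 1).
    # This particular nu_log places |lambda| on the ring [r_min, r_max).
    nu_log = np.log(-0.5 * np.log(u1 * (r_max**2 - r_min**2) + r_min**2))
    theta = max_phase * u2  # phase of each eigenvalue (assumed stored directly)
    B = (rng.standard_normal((n_state, n_in))
         + 1j * rng.standard_normal((n_state, n_in))) / np.sqrt(2 * n_in)
    C = (rng.standard_normal((n_out, n_state))
         + 1j * rng.standard_normal((n_out, n_state))) / np.sqrt(n_state)
    return nu_log, theta, B, C

def eigenvalues(params):
    """Recover the complex diagonal of the recurrent matrix."""
    nu_log, theta, _, _ = params
    return np.exp(-np.exp(nu_log) + 1j * theta)

def lru_forward(params, u):
    """Run x_k = lam * x_{k-1} + gamma * (B u_k), y_k = Re(C x_k) over a sequence."""
    _, _, B, C = params
    lam = eigenvalues(params)
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)  # assumed input normalization
    x = np.zeros(lam.shape, dtype=complex)
    ys = []
    for u_k in u:  # sequential loop for clarity; no nonlinearity in the recurrence
        x = lam * x + gamma * (B @ u_k)
        ys.append((C @ x).real)
    return np.stack(ys)

rng = np.random.default_rng(0)
params = init_lru(rng, n_state=8, n_in=4, n_out=2)
y = lru_forward(params, rng.standard_normal((50, 4)))
print(y.shape)
```

Because the recurrence is linear and elementwise, the loop can in principle be replaced by a parallel scan; the sequential form is kept here only because it is easier to read.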

Figure 1: (Left) Deep Linear Recurrent Unit (LRU) architecture introduced in this paper, inspired by S4 (Gu et al., 2021a). The model is a stack of LRU blocks, with nonlinear projections in between, and also uses skip connections and normalization methods like batch/layer normalization. We expand

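Experimental Results

Research Questions

RQ1: Can deep RNNs match the performance of deep continuous-time SSMs on long-range reasoning tasks?
RQ2: Which architectural and initialization changes are needed for RNNs to reach SSM-like performance?
RQ3: Can a linear recurrence with appropriate diagonalization and normalization enable efficient training on long sequences?
RQ4: How do eigenvalue stabilization strategies affect the LRU's ability to learn long-range dependencies?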

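Key Findings

On several Long Range Arena tasks, deep models with linear recurrences (no nonlinearity in the recurrence) can outperform RNN variants that keep the nonlinearity.
Diagonalizing the recurrence with a complex diagonal matrix speeds up training and can reach S4/S5-level performance on LRA tasks.
An exponential parameterization of the diagonal spectrum yields stable training and improves long-range reasoning, especially on harder tasks such as Pathfinder.
Normalizing the hidden activations in the forward pass is essential for closing the gap with deep SSMs on long-range tasks.
With a spectrum initialized near the boundary of the unit disk and combined with normalization, the LRU achieves performance competitive with deep SSMs on the LRA benchmark.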
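One way to see why normalization matters when the spectrum sits close to the unit circle: for a scalar recurrence x_k = λ x_{k-1} + γ u_k driven by unit-variance white noise, the stationary value of E|x_k|² is γ² / (1 − |λ|²), which grows without bound as |λ| approaches 1 unless γ compensates. The sketch below checks this numerically. The helper `final_state_power`, the choice |λ| = 0.99, and the factor γ = sqrt(1 − |λ|²) are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def final_state_power(lam, gamma, n_steps, n_trials, rng):
    """Average |x_n|^2 for x_k = lam * x_{k-1} + gamma * u_k with white-noise u."""
    x = np.zeros(n_trials, dtype=complex)
    for _ in range(n_steps):
        x = lam * x + gamma * rng.standard_normal(n_trials)
    return float(np.mean(np.abs(x) ** 2))

rng = np.random.default_rng(0)
lam = 0.99 * np.exp(1j * 0.3)  # one eigenvalue close to the unit circle (assumed)

# Without normalization the state power approaches 1 / (1 - |lam|^2), about 50 here.
raw = final_state_power(lam, 1.0, 2000, 4000, rng)

# With gamma = sqrt(1 - |lam|^2) the state power stays near 1.
normed = final_state_power(lam, np.sqrt(1 - abs(lam) ** 2), 2000, 4000, rng)

print(raw, normed)
```

The same argument applies per eigenvalue in a diagonal layer, so each hidden unit can be rescaled independently.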

Figure 4: Evolution of $x\in\mathbb{R}^{3}$ under impulse input $u=(1,0,0,\dots,0)\in\mathbb{R}^{16k}$. Plotted in different colors are the 3 components of $x$. $\Lambda$ has parameters $\nu_{j}=0.00005$ and $\theta_{j}$ sampled uniformly in $[0,2\pi]$ or with small phase $[0,\pi/50]$. For small
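The setting in this caption can be reproduced approximately in closed form. Under an impulse input and a diagonal recurrence, each state component is simply a power of its eigenvalue, x_k = λ^k. The sketch below assumes λ_j = exp(−ν_j + iθ_j), an input weight of 1 on every component, and reads "16k" as 16,000 steps; the function name `impulse_response` is hypothetical, and the real part is taken as the quantity one would plot.

```python
import numpy as np

def impulse_response(nu, theta, n_steps):
    """Return x_k = lam**k for k = 0..n_steps-1, with lam = exp(-nu + 1j * theta).

    Shape of the result: (n_steps, n_state).
    """
    k = np.arange(n_steps)[:, None]
    return np.exp(k * (-nu + 1j * theta)[None, :])

rng = np.random.default_rng(0)
n_steps = 16_000            # assumed meaning of "16k"
nu = np.full(3, 0.00005)    # decay rate from the caption

theta_wide = rng.uniform(0.0, 2 * np.pi, size=3)     # phases in [0, 2*pi]
theta_small = rng.uniform(0.0, np.pi / 50, size=3)   # small phases in [0, pi/50]

x_wide = impulse_response(nu, theta_wide, n_steps).real
x_small = impulse_response(nu, theta_small, n_steps).real

# The envelope depends only on nu: |x_k| = exp(-nu * k), regardless of the phase.
envelope_end = np.exp(-nu[0] * (n_steps - 1))
print(x_wide.shape, x_small.shape, envelope_end)
```

With such a small ν the envelope decays slowly over the whole sequence, so the phase range mainly controls how fast each component oscillates.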


This review was generated by AI and reviewed by human editors.