[Paper Summary] Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN
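IndRNN makes the neurons within a layer independent of one another. It regulates the recurrent weights and uses non-saturating activation functions such as ReLU, which lets it train longer and deeper RNNs. It outperforms traditional RNNs and LSTMs on long-sequence modeling and in deep architectures.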
Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems, and they find it hard to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and sigmoid activation functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as ReLU (rectified linear unit) and still be trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM. The code is available at https://github.com/Sunnydreamrain/IndRNN_Theano_Lasagne.
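Research Motivation and Goals
- Address the limits that vanishing and exploding gradients impose on traditional RNNs when they learn long-term dependencies.
- Propose IndRNN to improve trainability and interpretability. In IndRNN, neurons within the same layer are independent of each other and are connected across layers.
- Show that IndRNN can use non-saturating activations (such as ReLU) and supports deep and residual architectures.
- Demonstrate experimentally that IndRNN outperforms traditional RNNs and LSTMs on tasks that require long sequences and deep models.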
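Proposed Method
- IndRNN is defined by h_t = σ(W x_t + u ⊙ h_{t-1} + b).
  - u is the recurrent weight vector and ⊙ is the Hadamard (element-wise) product.
  - As a result, each neuron receives only its own previous state.
- Backpropagation through time in an IndRNN yields, for neuron n, gradients containing the factor u_n^{T-t} multiplied by a product of activation derivatives σ'. This makes the gradient flow explicitly controllable.
- Derives a suitable range for |u_n| that keeps memory over the required horizon while preventing vanishing and exploding gradients.
- Shows that multiple IndRNN layers, including layers with residual connections, can be stacked to build very deep networks.
- Extends IndRNN to a convolutional variant and incorporates batch normalization and residual blocks to improve stability and performance.
- Interprets the behavior of individual neurons, which the independence of neurons within a layer makes possible.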
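The recurrence above can be illustrated with a minimal pure-Python sketch. It makes the following assumptions:

- a ReLU activation and a zero initial state h_0;
- the function names `indrnn_layer` and `residual_indrnn_stack` are hypothetical, not taken from the authors' Theano/Lasagne code;
- the residual variant is simplified to an identity shortcut around each layer (y = x + IndRNN(x)) with no batch normalization, rather than the paper's exact residual block.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def indrnn_layer(xs, W, u, b):
    """Run one IndRNN layer, h_t = relu(W x_t + u * h_{t-1} + b), over a sequence.

    xs: list of T input vectors, each of length M
    W:  input weights as N rows of length M
    u:  N recurrent weights, one scalar per neuron
    b:  N biases
    Returns the hidden vectors h_1..h_T, each of length N (h_0 is all zeros).
    """
    n = len(u)
    h = [0.0] * n
    outputs = []
    for x in xs:
        # Neuron i mixes all inputs through row W[i], but its recurrence
        # only reads its own previous state h[i] (element-wise u * h).
        h = [
            relu(sum(w * xj for w, xj in zip(W[i], x)) + u[i] * h[i] + b[i])
            for i in range(n)
        ]
        outputs.append(h)
    return outputs


def residual_indrnn_stack(xs, layers):
    """Stack IndRNN layers with identity shortcuts, y = x + IndRNN(x).

    layers: list of (W, u, b) tuples; each layer must keep the feature size.
    Simplified: no batch normalization inside the residual block.
    """
    for W, u, b in layers:
        hs = indrnn_layer(xs, W, u, b)
        xs = [[xi + hi for xi, hi in zip(x, h)] for x, h in zip(xs, hs)]
    return xs


# Two independent neurons: neuron 0 only sees input 0, neuron 1 only input 1.
W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.9, 0.1]
b = [0.0, 0.0]
xs = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
hs = indrnn_layer(xs, W, u, b)
# Once the input stops, each neuron decays at its own rate u[i].
```

Because the recurrent term is a per-neuron scalar rather than a full matrix, neuron 1's state never influences neuron 0 within the layer. Interaction between neurons happens only through W in the next layer.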
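The gradient factor and the weight range described in the method bullets can be checked on a single neuron. The derivation runs as follows:

- Write h_k = σ(z_k) with z_k = w·x_k + u_n h_{k-1} + b.
- Each step contributes ∂h_k/∂h_{k-1} = σ'(z_k) u_n.
- Chaining the steps gives ∂h_T/∂h_t = u_n^{T-t} ∏_{k=t+1}^{T} σ'(z_k).
- For ReLU, σ' is 0 or 1. If the neuron stays active, the factor reduces to |u_n|^{T-t}.
- Requiring it to stay within [ε, γ] gives ε^{1/(T-t)} ≤ |u_n| ≤ γ^{1/(T-t)}.

The sketch below assumes ReLU and treats ε and γ as user-chosen thresholds. The function names are hypothetical.

```python
def relu_grad(z):
    # ReLU derivative, taken as 0 at z <= 0.
    return 1.0 if z > 0.0 else 0.0


def grad_hT_wrt_ht(zs, u, t):
    """d h_T / d h_t for one IndRNN neuron, accumulated by the chain rule.

    zs: pre-activations [z_1, ..., z_T] (zs[k - 1] holds z_k)
    u:  the neuron's scalar recurrent weight
    t:  earlier time step, 0 <= t <= T
    """
    T = len(zs)
    g = 1.0
    for k in range(t + 1, T + 1):
        g *= relu_grad(zs[k - 1]) * u  # d h_k / d h_{k-1}
    return g


def recurrent_weight_range(steps, eps, gamma):
    """Range of |u| keeping |u|**steps within [eps, gamma].

    Assumes the ReLU stays active on every step, so relu' = 1 throughout.
    steps: the horizon T - t over which memory should be kept.
    """
    return eps ** (1.0 / steps), gamma ** (1.0 / steps)


# Always-active neuron: the gradient reduces to u**(T - t).
zs = [1.0] * 10
g = grad_hT_wrt_ht(zs, u=0.9, t=3)

# For a 5000-step horizon both bounds sit very close to 1.
lo, hi = recurrent_weight_range(5000, eps=0.01, gamma=2.0)
```

Both bounds approach 1 as the horizon grows. This suggests that neurons meant to keep memory over thousands of steps need |u_n| very close to 1. A single inactive step (σ' = 0) cuts the gradient path for that neuron entirely.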
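Experimental Results
Research Questions
- RQ1: Can IndRNN retain longer-term dependencies than traditional RNNs and LSTMs?
- RQ2: Does making neurons within a layer independent and regulating the recurrent weights enable training of deeper and longer networks?
- RQ3: How do non-saturating activations (such as ReLU) affect gradient flow and the training robustness of IndRNN?
- RQ4: What performance gains does IndRNN achieve on tasks that require long sequences or deep architectures? The domains considered include language modeling, MNIST, and action recognition.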
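Key Findings
- IndRNN can process very long sequences (over 5000 time steps).
- IndRNN makes very deep networks trainable (up to 21 layers, shown in language modeling).
- IndRNN with ReLU trains robustly and outperforms traditional RNNs and LSTMs on several tasks.
- When suitably configured, a two-layer IndRNN with independent neurons can represent the behavior of a traditional RNN.
- The residual IndRNN architecture helps train deeper networks and improves performance.
- IndRNN achieves strong results on several benchmarks:
  - sequential MNIST;
  - character-level and word-level language modeling;
  - skeleton-based action recognition (NTU RGB+D).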
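A linear special case illustrates the two-layer finding. The linearity is an assumption of this sketch, not the paper's general setting. The construction uses these components:

- identity activations throughout;
- a traditional RNN h_t = W x_t + U h_{t-1};
- a diagonalizable recurrent matrix U = Q diag(λ) Q^{-1};
- a first IndRNN layer s_t = (Q^{-1} W) x_t + λ ⊙ s_{t-1};
- a second IndRNN layer with zero recurrent weights, h_t = Q s_t.

This pair reproduces the RNN exactly. Since s_{t-1} = Q^{-1} h_{t-1}, we get h_t = W x_t + Q diag(λ) Q^{-1} h_{t-1}. The helper names and the 2x2 example matrices are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2(A):
    # Inverse of a 2x2 matrix (assumes it is non-singular).
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def linear_rnn(xs, W, U):
    # Traditional RNN with identity activation: h_t = W x_t + U h_{t-1}.
    h = [0.0] * len(U)
    out = []
    for x in xs:
        h = [p + q for p, q in zip(matvec(W, x), matvec(U, h))]
        out.append(h)
    return out


def two_layer_indrnn(xs, W1, lam, Q):
    # Layer 1: IndRNN, identity activation, element-wise recurrence lam.
    # Layer 2: IndRNN with zero recurrent weights, i.e. the shared linear map Q.
    s = [0.0] * len(lam)
    out = []
    for x in xs:
        s = [p + l * si for p, l, si in zip(matvec(W1, x), lam, s)]
        out.append(matvec(Q, s))
    return out


Q = [[1.0, 1.0], [0.0, 1.0]]
lam = [0.5, 0.8]
U = matmul(matmul(Q, [[lam[0], 0.0], [0.0, lam[1]]]), inv2(Q))  # U = Q diag(lam) Q^-1
W = [[1.0, 0.0], [0.5, 2.0]]
W1 = matmul(inv2(Q), W)  # first-layer input weights Q^-1 W
xs = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

rnn_out = linear_rnn(xs, W, U)
ind_out = two_layer_indrnn(xs, W1, lam, Q)
```

Here U has a nonzero off-diagonal entry, so the traditional RNN genuinely couples its neurons. The two-layer IndRNN recovers that coupling through the cross-layer map Q, even though each of its layers is independent within itself.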
This summary was generated by AI and reviewed by human editors.