[Paper Review] Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN

Shuai Li, Wanqing Li | arXiv (Cornell University) | Mar 13, 2018
Neural Networks and Applications | References: 50 | Cited by: 125
One-Sentence Summary

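IndRNN makes the neurons within a layer independent of one another, regulates the recurrent weights, and uses non-saturating activations such as ReLU, which makes it possible to train much longer and deeper RNNs. It models longer sequences and supports deeper architectures than conventional RNNs and LSTMs.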

ABSTRACT

Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems and hard to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and the sigmoid action functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as relu (rectified linear unit) and be still trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM. The code is available at https://github.com/Sunnydreamrain/IndRNN_Theano_Lasagne.

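Motivation and Objectives

  • Motivate the work by the limits that vanishing and exploding gradients place on how well conventional RNNs learn long-term dependencies.
  • Propose IndRNN, which improves trainability and interpretability by making neurons in the same layer independent of each other and connecting them across layers.
  • Show that IndRNN can use non-saturating activation functions (e.g., ReLU) and supports deep residual architectures.
  • Show experimentally that IndRNN outperforms conventional RNNs and LSTMs on tasks that require long sequences and deep models.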

Proposed Method

  • Define IndRNN with h_t = sigma(W x_t + u ⊙ h_{t-1} + b), where u is a vector of recurrent weights and ⊙ is the Hadamard product.
  • Backpropagation through time for IndRNN yields gradients involving u^{T-t} and sigma' terms, enabling explicit regulation of gradient flow.
  • Derive memory retention bounds to prevent gradient vanishing and exploding by constraining each |u_n| to a range that depends on the number of time steps the gradient must travel.
  • Demonstrate stacking multiple IndRNN layers (including residual connections) to construct very deep networks.
  • Extend IndRNN to convolutional variants and integrate batch normalization and residual blocks for stability and performance.
  • Provide interpretations of neuron behavior due to independence within layers.
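
The recurrence in the first bullet can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' Theano/Lasagne code: the function names (`indrnn_step`, `run_indrnn`, `clip_recurrent_weights`) and the `gamma` parameter are hypothetical, ReLU is assumed as the activation, and the clipping bound gamma^(1/seq_len) is one possible way to keep |u|^seq_len below a chosen limit `gamma`.

```python
def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step with ReLU: h_t = relu(W x_t + u * h_prev + b).

    W is a list of rows (one row per neuron), u and b are per-neuron lists.
    Each neuron n only sees its own previous state h_prev[n].
    """
    h_t = []
    for n in range(len(u)):
        input_term = sum(W[n][m] * x_t[m] for m in range(len(x_t)))
        recurrent_term = u[n] * h_prev[n]  # element-wise (Hadamard) product
        h_t.append(max(0.0, input_term + recurrent_term + b[n]))
    return h_t


def run_indrnn(xs, W, u, b):
    """Run the layer over a sequence xs, starting from a zero state."""
    h = [0.0] * len(u)
    for x_t in xs:
        h = indrnn_step(x_t, h, W, u, b)
    return h


def clip_recurrent_weights(u, seq_len, gamma=2.0):
    """Clamp each recurrent weight so that |u_n| ** seq_len <= gamma (assumed bound)."""
    bound = gamma ** (1.0 / seq_len)
    return [max(-bound, min(bound, w)) for w in u]


# Small demo: two neurons, identity input weights, constant input.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
xs = [[1.0, 1.0]] * 4
u = clip_recurrent_weights([0.5, 3.0], seq_len=4)
print(run_indrnn(xs, W, u, b))
```

Because the recurrent term is element-wise, changing `u[1]` leaves the trajectory of neuron 0 untouched, which is the independence property the last bullet refers to.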

Experimental Results

Research Questions

  • RQ1: Can IndRNN maintain long-term dependencies beyond traditional RNNs and LSTMs?
  • RQ2: Does making neurons independent within a layer and regulating recurrent weights enable training of much deeper and longer networks?
  • RQ3: How do non-saturated activations (e.g., ReLU) impact gradient flow and training robustness in IndRNN?
  • RQ4: What performance gains do IndRNNs achieve on tasks requiring long sequences, deep architectures, and various domains (language modeling, MNIST, action recognition)?

Key Findings

  • IndRNN can process very long sequences (over 5000 time steps).
  • IndRNN enables very deep networks (up to 21 layers shown with language modeling).
  • IndRNN with ReLU trains robustly and outperforms traditional RNNs and LSTM on several tasks.
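  • A two-layer IndRNN can represent a traditional RNN with a linear activation and a diagonalizable recurrent weight matrix, so independence within a layer does not prevent neurons from interacting across layers.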
  • Residual IndRNN architectures facilitate training of deeper networks and improve performance.
  • IndRNN achieves strong results on sequential MNIST, character-level and word-level language modeling, and skeleton-based action recognition (NTU RGB+D).
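
To make the long-sequence finding concrete, the sketch below evaluates the weight-dependent part of the IndRNN gradient, |u|^(T-t), over 5000 steps for a few recurrent weights. It assumes the ReLU derivative is 1 along the active path, so only the power of u remains; the function name `recurrent_gradient_factor` is hypothetical and the chosen values of u are illustrative, not taken from the paper's experiments.

```python
def recurrent_gradient_factor(u, steps):
    """Return |u| ** steps, the factor by which one neuron's recurrent
    weight scales a gradient propagated back through `steps` time steps
    (ReLU derivative assumed to be 1 on the active path)."""
    return abs(u) ** steps


T = 5000
candidates = {
    "slightly below 1": 0.99,
    "exactly 1": 1.0,
    "2 ** (1 / T)": 2.0 ** (1.0 / T),
    "slightly above 1": 1.01,
}
for label, u in candidates.items():
    factor = recurrent_gradient_factor(u, T)
    print(f"{label:>18}: u = {u:.6f}, |u|^T = {factor:.3e}")
```

Running it shows that a weight only 1% away from 1 shrinks or grows the gradient by more than twenty orders of magnitude over 5000 steps, while a weight held at 2^(1/T) keeps the factor at about 2. That sensitivity is why per-neuron regulation of u matters at this sequence length.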

This review was generated by AI and checked by a human editor.