QUICK REVIEW

[Paper Review] Improving performance of recurrent neural network with relu nonlinearity

Sachin S. Talathi, Aniket Vartak|arXiv (Cornell University)|Nov 12, 2015

Neural Networks and Applications|23 references|66 citations

TL;DR

This paper proposes a normalized positive-definite weight initialization strategy for ReLU-based recurrent neural networks (np-RNN), motivated by a dynamical systems analysis of identity matrix initialization in IRNNs. The method reduces hidden state sensitivity to input perturbations, leading to more stable training and improved performance on long-range sequence tasks such as MNIST pixel-sequence classification. On the UCF-101 action recognition benchmark, np-RNN achieves 75.2% test accuracy, surpassing IRNN and iRNN and approaching LSTM performance with lower complexity.

ABSTRACT

In recent years significant progress has been made in successfully training recurrent neural networks (RNNs) on sequence learning problems involving long range temporal dependencies. The progress has been made on three fronts: (a) Algorithmic improvements involving sophisticated optimization techniques, (b) network design involving complex hidden layer nodes and specialized recurrent layer connections and (c) weight initialization methods. In this paper, we focus on recently proposed weight initialization with identity matrix for the recurrent weights in a RNN. This initialization is specifically proposed for hidden nodes with Rectified Linear Unit (ReLU) non linearity. We offer a simple dynamical systems perspective on weight initialization process, which allows us to propose a modified weight initialization strategy. We show that this initialization technique leads to successfully training RNNs composed of ReLUs. We demonstrate that our proposal produces comparable or better solution for three toy problems involving long range temporal structure: the addition problem, the multiplication problem and the MNIST classification problem using sequence of pixels. In addition, we present results for a benchmark action recognition problem.

Motivation & Objective

To investigate the dynamical systems behavior of identity matrix initialization in ReLU-based RNNs (IRNN) and its impact on training stability.
To address the sensitivity of IRNN hidden states to input perturbations, which increases hyperparameter dependence.
To propose a new weight initialization strategy that stabilizes hidden state dynamics by collapsing them to a one-dimensional manifold.
To evaluate the proposed np-RNN on toy problems and real-world benchmarks, comparing performance to IRNN, iRNN, and LSTM.
To develop a low-complexity RNN alternative to LSTMs with comparable performance on sequence learning tasks.

Proposed method

Proposes a normalized positive-definite weight matrix for recurrent weights in ReLU RNNs, derived from a dynamical systems analysis of identity initialization.
Analyzes the fixed-point dynamics of ReLU RNNs under identity initialization, identifying neutral stability and high sensitivity to input perturbations.
Designs the np-RNN initialization to reduce dynamical sensitivity by constraining the recurrent weight matrix to a normalized positive-definite form.
Employs RMSProp optimization with learning rate scheduling and dropout for regularization in all RNN models.
Uses features extracted from a GoogLeNet pre-trained on ImageNet as input for the UCF-101 action recognition benchmark.
Performs grid search over learning rates (10⁻⁵ to 10⁻²) and dropout rates (0.5, 0.7, 0.9) to tune hyperparameters.
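
The initialization described above can be sketched in a few lines. The review only states that the recurrent matrix is "normalized positive-definite"; the specific recipe below (a random Gaussian matrix R, the product RᵀR/N shifted by the identity, then division by the largest eigenvalue) is an assumed construction and should be checked against the paper. The function name `np_rnn_recurrent_init` and its parameters are hypothetical.

```python
import numpy as np

def np_rnn_recurrent_init(n_hidden, seed=0):
    """Sketch of a normalized positive-definite recurrent weight matrix.

    Assumed recipe (not confirmed by the review text):
    A = R^T R / N, then W = (A + I) / lambda_max(A + I).
    """
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((n_hidden, n_hidden))
    # R^T R / N is symmetric positive semi-definite.
    a = r.T @ r / n_hidden
    # Adding the identity makes every eigenvalue at least 1 (strictly positive).
    shifted = a + np.eye(n_hidden)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    largest = np.linalg.eigvalsh(shifted)[-1]
    # After normalization the largest eigenvalue is 1 and the rest lie in (0, 1].
    return shifted / largest

w = np_rnn_recurrent_init(64)
eigs = np.linalg.eigvalsh(w)
print("smallest eigenvalue:", eigs[0])
print("largest eigenvalue:", eigs[-1])
```

Under this construction a single eigenvalue sits at 1 while the others are generally smaller, which is consistent with the stated goal of collapsing the hidden state dynamics toward a one-dimensional manifold.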

Experimental results

Research questions

RQ1: How does identity matrix initialization in ReLU RNNs affect the dynamical stability of hidden states?
RQ2: Why is IRNN performance highly sensitive to hyperparameter choices, and can this be mitigated?
RQ3: Can a modified weight initialization strategy reduce hidden state sensitivity and improve training robustness?
RQ4: Does the proposed np-RNN perform better than or comparably to IRNN and iRNN on long-range temporal sequence tasks?
RQ5: Can np-RNN match LSTM performance on real-world benchmarks while maintaining lower model complexity?

Key findings

The np-RNN achieves 75.2% test accuracy on the UCF-101 action recognition benchmark, outperforming IRNN (67%) and iRNN (56.6%).
The np-RNN's performance is more robust to hyperparameter choices than IRNN and iRNN, as shown by validation accuracy plots across learning rate and dropout values.
On the three toy problems (the addition problem, the multiplication problem, and MNIST classification from a sequence of pixels), np-RNN performs comparably to or better than IRNN and iRNN.
The normalized positive-definite initialization reduces hidden state sensitivity to input perturbations, leading to more stable training dynamics.
The np-RNN comes close to LSTM performance (78.5% test accuracy) at lower computational complexity; the LSTM has four times more parameters.
The proposed method provides a stable, low-complexity alternative to LSTMs for sequence modeling on mobile platforms.
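
The sensitivity finding can be illustrated with a small numerical experiment. This is a simplified sketch, not the paper's analysis: it assumes an input-free, bias-free recurrence h ← ReLU(W h), reuses the assumed (A + I)/λ_max construction for the normalized positive-definite matrix, and uses a hypothetical helper `perturbation_gap`. With identity weights, any positive hidden state is a fixed point, so a small perturbation never decays. With normalized positive-definite weights, the spectral norm is 1 and ReLU is 1-Lipschitz, so the gap between the two trajectories cannot grow and in practice shrinks.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def perturbation_gap(w, h0, delta, steps):
    """Distance between two input-free ReLU trajectories started delta apart."""
    h_a, h_b = h0.copy(), h0 + delta
    for _ in range(steps):
        h_a = relu(w @ h_a)
        h_b = relu(w @ h_b)
    return np.linalg.norm(h_a - h_b)

rng = np.random.default_rng(0)
n = 32
h0 = rng.uniform(0.5, 1.0, n)          # strictly positive starting state
delta = 0.01 * rng.standard_normal(n)  # small perturbation of that state

# Identity initialization (IRNN-style recurrent weights).
w_identity = np.eye(n)

# Assumed normalized positive-definite construction: (R^T R / N + I) / lambda_max.
r = rng.standard_normal((n, n))
shifted = r.T @ r / n + np.eye(n)
w_np = shifted / np.linalg.eigvalsh(shifted)[-1]

gap_identity = perturbation_gap(w_identity, h0, delta, 200)
gap_np = perturbation_gap(w_np, h0, delta, 200)
print("initial gap:           ", np.linalg.norm(delta))
print("gap after 200 steps, I:", gap_identity)
print("gap after 200 steps, np:", gap_np)
```

The experiment ignores inputs, biases, and training, so it only shows how the two initial weight matrices propagate a perturbation, not how trained networks behave.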


This review was created by AI and reviewed by human editors.