[Paper Review] Gated Feedback Recurrent Neural Networks
This paper proposes Gated-Feedback Recurrent Neural Networks (GF-RNN), a novel deep RNN architecture that enhances stacked recurrent networks by introducing adaptive, learnable feedback connections from higher to lower layers via a global gating mechanism. The method improves modeling of long-term dependencies and hierarchical sequence structures, achieving state-of-the-art performance on character-level language modeling and Python program evaluation tasks with faster convergence and better generalization than standard stacked RNNs.
In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input. We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that, in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. We suggest that the improvement arises because the GF-RNN can adaptively assign different layers to different timescales and layer-to-layer interactions (including the top-down ones, which are not usually present in a stacked RNN) by learning to gate these interactions.
Motivation & Objective
- To address the challenge of modeling long-term dependencies in sequential data using deep recurrent networks.
- To improve the representational capacity of stacked RNNs by enabling adaptive, top-down feedback signals between layers.
- To investigate whether learnable gating of inter-layer feedback can enhance performance on complex sequence modeling tasks.
- To evaluate the scalability and efficiency of the proposed architecture on large-scale sequence modeling benchmarks.
Proposed method
- The GF-RNN architecture stacks multiple recurrent layers and introduces a global gating unit for each pair of layers to control the recurrent signals passed between them, including feedback from upper to lower layers.
- The gating mechanism adaptively modulates the strength of feedback connections based on the current input and previous hidden states, enabling dynamic control of inter-layer interactions.
- The model uses standard RNN units such as LSTM, GRU, or tanh, but extends them with gated feedback to allow top-down information flow not present in standard stacked RNNs.
- The feedback connections are fully differentiable and trained end-to-end using backpropagation through time, with the gating unit parameterized as a learned function of input and hidden states.
- The architecture supports both residual and non-residual connections, allowing for stable training of deep networks with feedback pathways.
- Experiments use Adam optimization with learning rate 0.001 and $\beta_1=0.9$, $\beta_2=0.99$ for training on character-level language modeling and Python program evaluation.
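To make the gating described in this list concrete, here is a minimal NumPy sketch of one time step of a GF-RNN with tanh units. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`init_params`, `gf_rnn_step`), the parameter layout, and the small random initialization are all hypothetical. The sketch assumes one scalar gate per ordered pair of layers (i to j), computed from the current input to layer j and the concatenation of all hidden states from the previous time step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, num_layers, seed=0):
    """Hypothetical parameter layout for a tanh GF-RNN (illustration only)."""
    rng = np.random.default_rng(seed)
    concat_dim = hidden_dim * num_layers  # size of all previous hidden states
    params = {"W": [], "U": [], "w_g": [], "u_g": []}
    for j in range(num_layers):
        in_dim = input_dim if j == 0 else hidden_dim
        # Feedforward weights from the layer below (or the input) into layer j.
        params["W"].append(rng.normal(0.0, 0.1, (hidden_dim, in_dim)))
        # U[j][i]: recurrent weights from layer i at t-1 into layer j at t.
        params["U"].append(
            rng.normal(0.0, 0.1, (num_layers, hidden_dim, hidden_dim)))
        # Gate weights: row i yields the scalar gate for the pair (i -> j).
        params["w_g"].append(rng.normal(0.0, 0.1, (num_layers, in_dim)))
        params["u_g"].append(rng.normal(0.0, 0.1, (num_layers, concat_dim)))
    return params


def gf_rnn_step(x_t, h_prev, params):
    """One time step. h_prev is a list with one hidden vector per layer.

    Returns the new hidden states and a matrix with gates[j, i] = g(i -> j).
    """
    num_layers = len(h_prev)
    h_star = np.concatenate(h_prev)  # all hidden states from time t-1
    gates = np.zeros((num_layers, num_layers))
    h_new = []
    below = x_t  # input to the current layer at time t
    for j in range(num_layers):
        # Global gates: scalars in (0, 1) computed from the current input to
        # layer j and every layer's previous hidden state.
        g = sigmoid(params["w_g"][j] @ below + params["u_g"][j] @ h_star)
        gates[j] = g
        # Gated sum of recurrent signals from every layer, including upper
        # layers (i > j), which is the top-down feedback path.
        recurrent = sum(
            g[i] * (params["U"][j][i] @ h_prev[i]) for i in range(num_layers))
        h_j = np.tanh(params["W"][j] @ below + recurrent)
        h_new.append(h_j)
        below = h_j
    return h_new, gates
```

A call such as `gf_rnn_step(x_t, h_prev, init_params(4, 3, 2))` returns one hidden vector per layer plus the gate matrix. In a conventional stacked RNN only the `i == j` recurrent term would be present, with no gate applied; the LSTM and GRU variants would apply the same gated sum inside their candidate-state computations, which this sketch omits.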
Experimental results
Research questions
- RQ1: Can adaptive feedback connections between stacked RNN layers improve modeling of long-term dependencies in sequential data?
- RQ2: Does the introduction of top-down feedback via learnable gates enhance performance on complex sequence tasks compared to standard stacked RNNs?
- RQ3: How does the GF-RNN architecture scale in performance and training efficiency when applied to deep networks with multiple layers?
- RQ4: What is the impact of different recurrent units (LSTM, GRU, tanh) when combined with gated feedback connections?
Key findings
- The GF-RNN outperformed standard stacked RNNs on character-level language modeling, achieving a test set BPC of 1.58 on the Hutter dataset, which is better than the previously reported best result of 1.60 from a multiplicative RNN.
- On the Python program evaluation task, GF-RNNs significantly outperformed stacked RNNs, especially on sequences with high nesting levels or long lengths, as shown by red and yellow regions in the accuracy gap heatmap indicating large performance gains.
- The GF-RNN with five stacked LSTM layers (700 units each) achieved state-of-the-art performance on character-level language modeling, demonstrating scalability and strong generalization.
- The model trained faster in wall-clock time than standard stacked RNNs with equivalent capacity, indicating improved training efficiency.
- The performance gain was most pronounced when using LSTM or GRU units, while the GF-RNN with tanh units showed a performance deterioration, suggesting sensitivity to activation function choice.
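For readers unfamiliar with the BPC (bits per character) figures quoted above, the following short sketch shows how the metric is typically computed: it is the average negative base-2 log probability that the model assigns to each correct next character, so lower is better. The function name `bits_per_character` is hypothetical and the snippet is not taken from the paper's evaluation code.

```python
import math


def bits_per_character(probs):
    """Average negative log2 probability of the correct characters.

    probs: probabilities the model assigned to each true next character.
    """
    if not probs:
        raise ValueError("need at least one prediction")
    return -sum(math.log2(p) for p in probs) / len(probs)
```

Under this definition, a model that always assigns probability 0.5 to the correct character scores exactly 1 BPC, and a uniform guess over 256 byte values scores 8 BPC. A score of 1.58 BPC corresponds to a geometric-mean probability of roughly one third for the correct character.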
This review was created by AI and reviewed by human editors.