QUICK REVIEW

[Paper Review] Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks

Ralf C. Staudemeyer, Eric Rothstein Morris|arXiv (Cornell University)|Sep 12, 2019

Neural Networks and Applications | Cited by 497

One-Sentence Summary

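Brief summary: A tutorial that reviews the evolution of LSTM-RNNs, unifies the notation, and explains the core concepts from the early publications, including FFNNs, backpropagation, and the training of recurrent networks.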

ABSTRACT

Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are one of the most powerful dynamic classifiers publicly known. The network itself and the related learning algorithms are reasonably well documented to get an idea how it works. This paper will shed more light into understanding how LSTM-RNNs evolved and why they work impressively well, focusing on the early, ground-breaking publications. We significantly improved documentation and fixed a number of errors and inconsistencies that accumulated in previous publications. To support understanding we as well revised and unified the notation used.

Research Motivation and Objectives

Clarify how LSTM-RNNs evolved from early perceptron-based models and why they work well.
Provide unified notation to reduce confusion from historical papers.
Explain foundational concepts: perceptron, sigmoid units, FFNNs, backpropagation, and recurrent architectures.
Describe training algorithms for RNNs (BPTT and RTRL) and address the vanishing gradient problem.
Support beginners in understanding LSTM and its extensions with detailed derivations.

Proposed Method

Review historical literature on neural networks leading to LSTM.
Present a unified notation for LSTM-related equations and variables.
Derive key equations for perceptron, delta rule, sigmoid units, and feed-forward networks.
Explain backpropagation in FFNNs and extend to recurrent architectures (RNNs).
Describe Backpropagation Through Time (BPTT) and Real-Time Recurrent Learning (RTRL) as training methods for RNNs.
Discuss the vanishing gradient problem and its implications for training RNNs.
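
As a rough illustration of the delta rule for a single sigmoid unit mentioned in the list above, the following minimal sketch trains one unit on the logical OR function with plain gradient descent on squared error. It is not taken from the paper: the function name `train_sigmoid_unit`, the learning rate, the epoch count, and the choice of OR as training data are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic activation used by the sigmoid unit."""
    return 1.0 / (1.0 + math.exp(-z))


def train_sigmoid_unit(samples, lr=0.5, epochs=5000):
    """Train one two-input sigmoid unit with the delta rule.

    Hypothetical helper for illustration; lr and epochs are arbitrary choices.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Error times the sigmoid derivative y * (1 - y).
            delta = (target - y) * y * (1.0 - y)
            w[0] += lr * delta * x[0]
            w[1] += lr * delta * x[1]
            b += lr * delta
    return w, b


# Assumed toy data set: logical OR, which is linearly separable.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b = train_sigmoid_unit(OR_SAMPLES)
predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in OR_SAMPLES]
print(predictions)
```

A single unit of this kind can only separate classes with a linear boundary; the non-linear decision surfaces discussed in the tutorial require stacking such units into a feed-forward network trained with backpropagation.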

Experimental Results

Research Questions

RQ1: What is the historical evolution of LSTM-RNNs from early publications?
RQ2: How can a unified notation improve understanding of LSTM and related networks?
RQ3: How do core learning rules (perceptron, delta rule, backpropagation) extend from FFNNs to RNNs?
RQ4: How are RNNs trained via BPTT and RTRL, and what challenges (e.g., vanishing gradients) arise?
RQ5: What extensions and variations of LSTM are covered by the tutorial, and how do they relate to foundational concepts?

Key Findings

The article provides a unified notation and descriptive figures to resolve inconsistencies across early LSTM publications.
It documents the evolution from perceptrons to vanilla LSTM and its extensions, including detailed derivations.
It reinforces that FFNNs can express non-linear decision surfaces via sigmoid units, contrasting with linear perceptrons.
It explains backpropagation in feed-forward networks and extends the methodology to recurrent architectures (RNNs).
It outlines two main RNN training approaches, BPTT and RTRL, and discusses their characteristics and use cases.
It discusses the vanishing gradient problem in RNNs and explains the mathematical basis behind it.
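
The vanishing gradient problem named in the last finding can be made concrete with a small numerical sketch. The example below assumes a scalar recurrent unit h_t = tanh(w * h_{t-1}) with no external input, which is a simplification chosen for this illustration and not the paper's own setup. By the chain rule, the gradient of h_t with respect to h_0 is a product of per-step factors w * (1 - h_t^2); when each factor has magnitude below 1, the product shrinks exponentially with the number of time steps.

```python
import math


def gradient_through_time(w, steps, h0=0.5):
    """Return |d h_t / d h_0| for t = 1..steps in a scalar RNN h_t = tanh(w * h_{t-1}).

    Hypothetical helper for illustration; w and h0 are arbitrary choices.
    """
    h = h0
    grad = 1.0
    magnitudes = []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule factor: d tanh(w * h_prev) / d h_prev = w * (1 - h^2).
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes


mags = gradient_through_time(w=0.9, steps=50)
print(f"after 1 step: {mags[0]:.3e}, after 50 steps: {mags[-1]:.3e}")
```

With a recurrent weight of 0.9, every factor is at most 0.9 in magnitude, so after 50 steps the gradient is bounded by 0.9^50, roughly 0.005. This is why error signals from distant time steps contribute little to weight updates in a plain RNN, which is the difficulty LSTM was designed to address.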

This review was generated by AI and reviewed by a human editor.