QUICK REVIEW

[Paper Review] Benefits of Depth for Long-Term Memory of Recurrent Networks

Yoav Levine, Or Sharir|arXiv (Cornell University)|Oct 25, 2017
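Parallel Computing and Optimization Techniques|35 references|Cited by 8
One-Sentence Summary

This paper introduces the Start-End separation rank as a measure of the long-term memory capacity of recurrent networks and proves that deep RNNs have an exponential advantage over shallow ones in expressing long-range temporal dependencies. By analyzing Recurrent Arithmetic Circuits with the tools of quantum Tensor Networks, it establishes depth as a fundamental driver of the expressive power needed to model sequential data over long time-scales.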

ABSTRACT

The key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data, is their ever-improving ability to model intricate long-term temporal dependencies. However, an adequate measure of RNNs' long-term memory capacity is lacking, and thus formal understanding of their ability to correlate data throughout time is limited. Though depth efficiency in convolutional networks is well established, it does not suffice in order to account for the success of deep RNNs on data of varying lengths, and the need to address their 'time-series expressive power' arises. In this paper, we analyze the effect of depth on the ability of recurrent networks to express correlations ranging over long time-scales. To meet the above need, we introduce a measure of the information flow across time supported by the network, referred to as the Start-End separation rank. This measure essentially reflects the distance of the function realized by the recurrent network from a function that models no interaction whatsoever between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are exponentially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies. Such analyses may be readily extended to other RNN architectures of interest, e.g. variants of LSTM networks. We obtain our results by considering a class of recurrent networks referred to as Recurrent Arithmetic Circuits (RACs), which merge the hidden state with the input via the Multiplicative Integration operation. Finally, we make use of the tool of quantum Tensor Networks to gain additional graphic insight regarding the complexity brought forth by depth in recurrent networks.
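
The measure described in the abstract can be written out explicitly. The notation below is assumed for illustration and is not given on this page: $y$ is the function realized by the network on a length-$T$ input, the "Start" half is $S=\{1,\dots,T/2\}$, the "End" half is $E=\{T/2+1,\dots,T\}$, and $g^{S}_{\nu}$, $g^{E}_{\nu}$ are arbitrary functions of the respective halves. Under these assumptions, the Start-End separation rank is the smallest number of separable terms needed to express $y$:

```latex
\operatorname{sep}_{(S,E)}(y) \;=\;
\min\Big\{ K \in \mathbb{N} \;:\;
y(x^{1},\dots,x^{T}) = \sum_{\nu=1}^{K}
g^{S}_{\nu}\big(x^{1},\dots,x^{T/2}\big)\,
g^{E}_{\nu}\big(x^{T/2+1},\dots,x^{T}\big) \Big\}
```

A separation rank of 1 means the function factorizes into a Start part times an End part, so it models no interaction between the two halves of the sequence. Larger values mean the function is further from that separable case.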

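Motivation and Objectives

  • To address the lack of a formal measure of the long-term memory capacity of recurrent networks.
  • To understand why deep RNNs outperform shallow networks on sequential tasks with variable-length sequences.
  • To quantify the expressive power of RNNs in correlating distant time steps of a sequence.
  • To establish a theoretical foundation for depth efficiency in recurrent architectures that goes beyond the known theory for convolutional networks.
  • To extend the analysis to practical RNN variants such as LSTMs through the proposed measure and framework.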

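Proposed Method

  • The paper introduces the Start-End separation rank as a formal measure of the information flow between the beginning and the end of a sequence in a recurrent network.
  • Recurrent networks are modeled as Recurrent Arithmetic Circuits (RACs), in which the hidden state is merged with the input via Multiplicative Integration.
  • The theoretical analysis shows that deep networks support separation ranks exponentially higher than those of shallow networks, where the separation rank grows only polynomially.
  • The framework uses quantum Tensor Networks to provide graphical and structural insight into the complexity that depth brings to RNNs.
  • By adapting the RAC formulation, the analysis can be extended to other RNN architectures, including LSTM variants.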
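
To make the Multiplicative Integration step concrete, here is a minimal NumPy sketch of a RAC forward pass. It is an illustration under assumptions, not the authors' implementation: the function name `rac_forward`, the weight shapes, the all-ones initial states, and the random weights are all hypothetical choices. What it shows is the single idea stated above: at each layer and time step, the transformed hidden state and the transformed input are combined by an elementwise product rather than by a sum followed by a nonlinearity.

```python
import numpy as np

def rac_forward(features, w_in, w_hid, w_out, h0):
    """Run a (possibly deep) Recurrent Arithmetic Circuit over one sequence.

    features: array (T, M), one feature vector per time step
    w_in:     list of L input matrices; w_in[0] is (R, M), the others (R, R)
    w_hid:    list of L hidden-to-hidden matrices, each (R, R)
    w_out:    output matrix (n_out, R)
    h0:       list of L initial hidden states, each of shape (R,)
    """
    hidden = [h.copy() for h in h0]
    outputs = []
    for x_t in features:
        layer_input = x_t
        for l in range(len(w_in)):
            # Multiplicative Integration: elementwise product of the two branches.
            hidden[l] = (w_hid[l] @ hidden[l]) * (w_in[l] @ layer_input)
            # The hidden state of layer l is the input of layer l + 1.
            layer_input = hidden[l]
        outputs.append(w_out @ layer_input)
    return np.stack(outputs)

# Hypothetical sizes: T time steps, M input features, R hidden channels, L layers.
rng = np.random.default_rng(0)
T, M, R, L = 6, 4, 3, 2
w_in = [rng.normal(size=(R, M))] + [rng.normal(size=(R, R)) for _ in range(L - 1)]
w_hid = [rng.normal(size=(R, R)) for _ in range(L)]
w_out = rng.normal(size=(1, R))
h0 = [np.ones(R) for _ in range(L)]
feats = rng.normal(size=(T, M))

y = rac_forward(feats, w_in, w_hid, w_out, h0)
print(y.shape)  # (6, 1): one output per time step
```

Because every operation is a matrix product or an elementwise product, the output is a polynomial in the input features, which is what makes this class of networks amenable to the tensor-based analysis the summary describes.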

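Experimental Results

Research Questions

  • RQ1: How can the long-term memory capacity of recurrent networks be formally measured?
  • RQ2: What quantitative advantage does depth provide in enabling RNNs to express long-range temporal dependencies?
  • RQ3: Why do deep RNNs outperform shallow networks on variable-length sequence tasks?
  • RQ4: Can a unified measure be defined for the expressive power of recurrent architectures across different time-scales?
  • RQ5: How does the Multiplicative Integration mechanism in RACs contribute to stronger information flow over long sequences?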

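Key Findings

  • Deep RNNs support Start-End separation ranks that are exponentially higher than those of shallow RNNs, indicating a fundamental advantage in modeling long-range dependencies.
  • In contrast, the separation rank of shallow RNNs grows only polynomially, which limits their ability to correlate distant time steps.
  • The exponentially higher separation rank of deep networks implies markedly greater expressive power for modeling complex temporal correlations.
  • Using Recurrent Arithmetic Circuits makes a precise theoretical analysis of information flow and depth efficiency in RNNs possible.
  • The quantum Tensor Network representation provides visual and structural intuition for the complexity that depth introduces in recurrent architectures.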
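
The gap between shallow and deep networks can be probed numerically on a toy problem. The sketch below is an assumption-laden illustration, not the paper's experiment or proof technique: it uses hypothetical helper names (`rac_output`, `start_end_rank`), one-hot input features over `M` discrete symbols, all-ones initial states, and random Gaussian weights. For inputs from a finite alphabet, the Start-End separation rank equals the matrix rank of the table whose rows index the first half of the sequence and whose columns index the second half, so a singular value decomposition gives a numerical estimate.

```python
import itertools
import numpy as np

def rac_output(seq, w_in, w_hid, w_out, M):
    """Scalar output at the last time step of a RAC fed one-hot features."""
    R = w_hid[0].shape[0]
    hidden = [np.ones(R) for _ in w_hid]  # assumed initial hidden states
    for symbol in seq:
        layer_input = np.eye(M)[symbol]   # one-hot feature vector (assumption)
        for l in range(len(w_hid)):
            # Multiplicative Integration at layer l
            hidden[l] = (w_hid[l] @ hidden[l]) * (w_in[l] @ layer_input)
            layer_input = hidden[l]
    return float(w_out @ layer_input)

def start_end_rank(depth, M=3, R=3, T=4, seed=0, rel_tol=1e-10):
    """Numerical Start-End separation rank of a randomly weighted RAC."""
    rng = np.random.default_rng(seed)
    w_in = [rng.normal(size=(R, M))] + [rng.normal(size=(R, R)) for _ in range(depth - 1)]
    w_hid = [rng.normal(size=(R, R)) for _ in range(depth)]
    w_out = rng.normal(size=R)
    # Every possible half-sequence over the alphabet {0, ..., M-1}.
    halves = list(itertools.product(range(M), repeat=T // 2))
    # Rows index the Start half, columns index the End half.
    grid = np.array([[rac_output(s + e, w_in, w_hid, w_out, M) for e in halves]
                     for s in halves])
    sv = np.linalg.svd(grid, compute_uv=False)
    # Count singular values above a relative threshold.
    return int(np.sum(sv > rel_tol * sv[0]))

shallow_rank = start_end_rank(depth=1)
deep_rank = start_end_rank(depth=2)
print("shallow:", shallow_rank, "deep:", deep_rank)
```

In the single-layer case the final output depends on the first half of the sequence only through the `R`-dimensional hidden state at the midpoint, so the estimated rank cannot exceed `R`. With a second layer, the second half also reads the first layer's midpoint state at every later step, which allows the rank to rise above `R`, up to the table size `M ** (T // 2)`.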


This review was generated by AI and checked by a human editor.