QUICK REVIEW

[Paper Review] Online Learning of Recurrent Neural Architectures by Locally Aligning Distributed Representations.

Alexander G. Ororbia, Ankur Mali|arXiv (Cornell University)|Oct 17, 2018
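Neural Networks and Reservoir Computing|Cited by 1
One-Sentence Summary

This paper proposes the Parallel Temporal Neural Coding Network, a biologically inspired recurrent neural network architecture trained with Local Representation Alignment, a local learning rule that avoids back-propagation through time. Because it needs neither unrolling over time steps nor differentiable activation functions, the model can be trained in parallel, and it outperforms other online alternatives to back-propagation through time on sequence modeling benchmarks (including Bouncing MNIST and Penn Treebank), in some instances even outperforming full back-propagation through time itself.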

ABSTRACT

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, to train these models, one relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process very difficult. In this work, we propose the Parallel Temporal Neural Coding Network, a biologically inspired model trained by the local learning algorithm known as Local Representation Alignment, that aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling nor the derivatives of its internal activation functions. We compare our model and learning procedure to other online back-propagation-through-time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, even outperform full back-propagation through time itself as well as variants such as sparse attentive back-tracking. Furthermore, we present promising experimental results that demonstrate our model's ability to conduct zero-shot adaptation.

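Motivation and Objectives

  • Address the computational inefficiency and inherently sequential nature of back-propagation through time (BPTT) in recurrent neural networks.
  • Overcome BPTT's limitations, such as the need to unroll the network and the reliance on differentiable activation functions.
  • Develop a training method that can be parallelized and that enables zero-shot adaptation in recurrent models.
  • Design a biologically plausible learning rule that avoids global credit assignment while maintaining high performance on sequence modeling tasks.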
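
To make the two BPTT limitations concrete, the standard gradient for a simple recurrent layer can be written out. The notation here is an assumption chosen for illustration and does not come from the paper: pre-activation a_t = W x_t + V h_{t-1}, state h_t = f(a_t), and total loss L as the sum of per-step losses L_t.

```latex
% Assumed notation (not from the source): a_t = W x_t + V h_{t-1}, h_t = f(a_t), L = \sum_t L_t.
\frac{\partial L}{\partial V}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
    \frac{\partial L_t}{\partial h_t}
    \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
    \frac{\partial^{+} h_k}{\partial V},
\qquad
\frac{\partial h_j}{\partial h_{j-1}} = \operatorname{diag}\!\big(f'(a_j)\big)\, V
```

The double sum over t and k is the unrolling: every step's loss depends on every earlier state, so the states must be stored and traversed backward in order. The factor f'(a_j) inside each Jacobian is where a non-differentiable activation would break the computation.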

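Proposed Method

  • Propose the Parallel Temporal Neural Coding Network, a recurrent architecture designed specifically for local learning rules.
  • Train the model with Local Representation Alignment, a local learning algorithm that does not rely on back-propagated gradients and that aligns distributed representations across time steps.
  • Remove the need to unroll the network over time steps, which allows parallel computation during training.
  • Avoid any dependence on the derivatives of the internal activation functions, which permits non-differentiable units.
  • Perform local credit assignment by aligning the hidden-state representations of consecutive time steps using local error signals.
  • Use a biologically inspired mechanism that updates weights from local correlations between representations rather than from globally back-propagated errors.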
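
The general idea in the list above can be illustrated with a small NumPy sketch. This is not the authors' architecture or their exact update rule. It is a minimal toy under several assumptions: a single hidden layer, a tanh activation, the transpose of the readout matrix used as the error-feedback pathway, and hypothetical names (`train_local`, `beta`, `W`, `V`, `U`). What it is meant to show is that every update uses only quantities available at steps t and t-1, and that no derivative of the activation is ever evaluated.

```python
import numpy as np

def train_local(seq, hidden=16, epochs=200, lr=0.05, beta=0.1, act=np.tanh, seed=0):
    """Online next-step prediction with local, derivative-free updates.

    Illustrative only: names, sizes, and the use of U.T as the feedback
    pathway are assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    dim = seq.shape[1]
    W = rng.normal(0.0, 0.5, (hidden, dim))     # input -> hidden
    V = rng.normal(0.0, 0.1, (hidden, hidden))  # hidden -> hidden (recurrent)
    U = np.zeros((dim, hidden))                 # hidden -> prediction of next input
    losses = []
    for _ in range(epochs):
        z_prev = np.zeros(hidden)
        total = 0.0
        for t in range(len(seq) - 1):
            x, x_next = seq[t], seq[t + 1]
            h = W @ x + V @ z_prev               # pre-activation at step t
            z = act(h)                           # hidden state at step t
            e_out = U @ z - x_next               # local prediction error
            # Hidden target: re-run the activation on a pre-activation nudged
            # against the fed-back error. No derivative of `act` is needed.
            z_target = act(h - beta * (U.T @ e_out))
            e_hid = z - z_target                 # local hidden-state mismatch
            # Each update is an outer product of a local error and a local input.
            U -= lr * np.outer(e_out, z)
            W -= lr * np.outer(e_hid, x)
            V -= lr * np.outer(e_hid, z_prev)
            total += float(e_out @ e_out)
            z_prev = z                           # only the previous state is kept
        losses.append(total / (len(seq) - 1))
    return losses

# Toy usage: predict the next symbol in a repeating cycle of four one-hot symbols.
seq = np.eye(4)[np.arange(41) % 4]
losses = train_local(seq)
print(f"mean squared error, first pass: {losses[0]:.4f}, last pass: {losses[-1]:.4f}")
```

Because the loop keeps only `z_prev`, memory does not grow with sequence length, in contrast to storing an unrolled computation graph. The `act` argument is never differentiated, so in principle a non-smooth function could be passed in its place, although how well this toy would then learn has not been checked here.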

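Experimental Results

Research Questions

  • RQ1: Can recurrent neural networks be trained effectively without relying on back-propagation through time or gradient computation?
  • RQ2: Can a local learning rule such as Local Representation Alignment match BPTT and its variants on sequence modeling benchmarks?
  • RQ3: Does the proposed model support zero-shot adaptation on sequence tasks?
  • RQ4: Can the model be parallelized efficiently while still performing well on long-term temporal dependencies?
  • RQ5: How does the model compare with established methods such as echo state networks, real-time recurrent learning, and unbiased online recurrent optimization?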

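Key Findings

  • On Bouncing MNIST and Bouncing NotMNIST, the model outperforms existing online alternatives to BPTT, namely real-time recurrent learning, echo state networks, and unbiased online recurrent optimization.
  • On the Penn Treebank language modeling benchmark, the model achieves competitive results and in some configurations even surpasses full back-propagation through time.
  • The model shows promising zero-shot adaptation, suggesting that it generalizes to unseen sequences.
  • Because it requires neither unrolling nor activation-function derivatives, training can be parallelized, improving computational efficiency relative to standard BPTT.
  • The model maintains high performance even with non-differentiable activation functions, which are incompatible with standard back-propagation.
  • Local Representation Alignment achieves effective credit assignment without a global error signal, supporting its potential as a biologically plausible training mechanism.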


This review was generated by AI and reviewed by human editors.