QUICK REVIEW

[Paper Review] Online Learning of Recurrent Neural Architectures by Locally Aligning Distributed Representations.

Alexander G. Ororbia, Ankur Mali|arXiv (Cornell University)|Oct 17, 2018

Neural Networks and Reservoir Computing | Cited by 1

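One-Sentence Summary

This paper proposes the Parallel Temporal Neural Coding Network, a biologically inspired recurrent architecture trained with Local Representation Alignment, a local learning rule that avoids back-propagation through time (BPTT) and requires neither unrolling over time steps nor the derivatives of internal activation functions, which makes training more amenable to parallelization; on sequence modeling benchmarks such as Bouncing MNIST and Penn Treebank it outperforms other online BPTT alternatives and, in some instances, full BPTT itself.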

ABSTRACT

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, to train these models, one relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process very difficult. In this work, we propose the Parallel Temporal Neural Coding Network, a biologically inspired model trained by the local learning algorithm known as Local Representation Alignment, that aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling nor the derivatives of its internal activation functions. We compare our model and learning procedure to other online back-propagation-through-time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, even outperform full back-propagation through time itself as well as variants such as sparse attentive back-tracking. Furthermore, we present promising experimental results that demonstrate our model's ability to conduct zero-shot adaptation.

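Research Motivation and Objectives

Address the computational inefficiency and inherently sequential nature of back-propagation through time (BPTT) in recurrent neural networks.
Overcome the limitations of BPTT, such as the need to unroll the network and the reliance on differentiable activation functions.
Develop a training method that supports parallelization and enables zero-shot adaptation in recurrent models.
Design a biologically plausible learning rule that avoids global credit assignment while retaining high performance on sequence modeling tasks.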

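Proposed Method

Proposes the Parallel Temporal Neural Coding Network, a recurrent architecture designed for local learning rules.
Trains the model with Local Representation Alignment, a local learning algorithm that does not back-propagate gradients through time and instead aligns distributed representations across time steps.
Removes the need to unroll the network over time steps, which allows computation to be parallelized during training.
Avoids any dependence on the derivatives of internal activation functions, so non-differentiable units can be used.
Performs credit assignment locally by using local error signals to align the hidden-state representations of consecutive time steps.
Uses a biologically inspired mechanism that updates weights from local correlations between representations rather than from a globally back-propagated error.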
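
To make the idea of a local, unrolling-free update more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' Local Representation Alignment algorithm, whose details are not given in this review. It is a toy target-based rule under stated assumptions: the function name `train_online`, the fixed random feedback matrix `E`, the step size `lr`, and the nudging factor `beta` are all hypothetical choices made for illustration. The sketch shows three properties the list above describes: the activation (`np.sign`) is non-differentiable, only the most recent hidden state is kept (no unrolling), and every weight update is a product of a local error and the activity feeding that weight.

```python
import numpy as np


def train_online(seq, hidden=16, lr=0.05, beta=0.1, seed=0):
    """Online next-step prediction with purely local updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_in = seq.shape[1]
    W = rng.normal(0.0, 0.5, (hidden, n_in))    # input -> hidden
    V = rng.normal(0.0, 0.5, (hidden, hidden))  # hidden -> hidden (recurrent)
    U = np.zeros((n_in, hidden))                # hidden -> prediction
    E = rng.normal(0.0, 0.5, (hidden, n_in))    # fixed error feedback (assumption)
    z_prev = np.zeros(hidden)
    losses = []
    for t in range(len(seq) - 1):
        x, x_next = seq[t], seq[t + 1]
        a = W @ x + V @ z_prev                  # hidden pre-activation
        z = np.sign(a)                          # non-differentiable activation
        y = U @ z                               # prediction of the next input
        e_out = x_next - y                      # local output error
        losses.append(float(e_out @ e_out))
        # Hypothetical local target: the state the hidden layer would have
        # taken if its pre-activation were nudged along the fed-back error.
        z_target = np.sign(a + beta * (E @ e_out))
        e_hid = z_target - z                    # local hidden error
        # Every update is (local error) x (activity feeding that weight);
        # no activation derivative and no earlier time step is needed.
        U += lr * np.outer(e_out, z)
        W += lr * np.outer(e_hid, x)
        V += lr * np.outer(e_hid, z_prev)
        z_prev = z                              # keep only the latest state
    return losses, (W, V, U)


# Usage: a repeating one-hot pattern 0, 1, 2, 3, 0, 1, ...
seq = np.eye(4)[np.arange(200) % 4]
losses, _ = train_online(seq)
print(f"first-step error: {losses[0]:.3f}, last-step error: {losses[-1]:.3f}")
```

Because each step touches only the current input, the previous hidden state, and the current errors, memory use does not grow with sequence length, which is the practical contrast with BPTT that the review emphasizes. How well such a toy rule learns depends on the assumed hyperparameters and says nothing about the performance reported in the paper.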

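Experimental Results

Research Questions

RQ1: Can recurrent neural networks be trained effectively without back-propagation through time or the gradient computations it requires?
RQ2: Can a local learning rule such as Local Representation Alignment match BPTT and its variants on sequence modeling benchmarks?
RQ3: Does the proposed model support zero-shot adaptation on sequence tasks?
RQ4: Can the model be parallelized efficiently while maintaining high performance on long-term temporal dependencies?
RQ5: How does the model compare with established methods such as echo state networks, real-time recurrent learning, and unbiased online recurrent optimization?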

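Key Findings

On Bouncing MNIST and Bouncing NotMNIST, the model outperforms existing online alternatives to BPTT, such as real-time recurrent learning, echo state networks, and unbiased online recurrent optimization.
On the Penn Treebank language modeling benchmark, the model achieves competitive results and, in some configurations, even surpasses full back-propagation through time.
The model shows promising zero-shot adaptation results, suggesting that it can generalize to unseen sequences.
Because it needs neither unrolling nor activation-function derivatives, training is easier to parallelize than standard BPTT, which is inherently sequential.
The model does not rely on activation-function derivatives, so it can accommodate non-differentiable activation functions, which are incompatible with standard back-propagation, while retaining high performance.
Local Representation Alignment performs effective credit assignment without a global error signal, supporting its potential as a biologically plausible training mechanism.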

This review was generated by AI and reviewed by a human editor.