QUICK REVIEW

[Paper Review] Decoupled Neural Interfaces using Synthetic Gradients

Max Jaderberg, Wojciech Marian Czarnecki | arXiv (Cornell University) | Aug 18, 2016

Advanced Neural Network Applications | References: 28 | Cited by: 76

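One-Sentence Summary

This paper proposes decoupled neural interfaces (DNI), which use synthetic gradients (models that learn to predict error gradients from a module's local activations alone) to break the sequential dependency of backpropagation, so that network modules can be trained asynchronously and independently. The main contribution is decoupling the forward and backward passes, which makes training of feed-forward, recurrent, and hierarchical networks faster and more scalable.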

ABSTRACT

Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one's future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass -- amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.

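Motivation and Objectives

Address update locking and backward locking in deep neural networks, which force training to run sequentially and synchronously.
Enable independent, asynchronous training of network modules by replacing backpropagated gradients with locally predicted synthetic gradients.
Remove forward locking as well by also predicting synthetic inputs, so that modules are fully decoupled in both the forward and backward passes.
Validate the approach on deep feed-forward networks, RNNs with long-term dependencies, and hierarchical multi-network systems.
Speed up training and improve scalability in distributed and multi-agent learning settings by removing synchronization bottlenecks.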

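Proposed Method

Replace standard backpropagation with synthetic gradients, which come from a model that learns to predict the error gradient from a module's current activations alone.
Train a synthetic gradient model (a small neural network) that takes only the module's current activations as input and predicts the true gradient of the loss with respect to those activations.
Update the module's weights immediately using the predicted synthetic gradient, without waiting for downstream modules to execute or for backpropagation to finish.
Introduce a synthetic input model that predicts a module's expected input, which fully decouples the forward and backward passes and allows fully asynchronous training.
Train the synthetic gradient and synthetic input models end-to-end with a differentiable loss that minimizes the difference between true and synthetic gradients, or between true and synthetic inputs.
Apply the framework to feed-forward networks, RNNs, and hierarchical multi-network systems, using shared or separate architectures as the task requires.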
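
The update scheme described in this list can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's setup: a toy regression task t = c * x, scalar weights, a frozen downstream module B, and a synthetic gradient model that is a single linear coefficient instead of a small neural network. The names train_with_synthetic_gradients, w1, w2 and a are hypothetical.

```python
import random


def train_with_synthetic_gradients(steps=2000, seed=0):
    """Toy decoupled training: module A learns only from predicted gradients."""
    rng = random.Random(seed)
    c = 2.0    # toy task (assumption): target t = c * x
    w1 = 0.5   # module A weight, updated only with synthetic gradients
    w2 = 1.0   # module B weight, frozen here to keep the sketch small
    a = 0.0    # synthetic gradient model (assumption): delta_hat = a * h
    lr, lr_sg = 0.1, 0.2

    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        t = c * x

        # Module A: forward pass, then an immediate weight update from the
        # predicted gradient. It does not wait for module B or for the loss.
        h = w1 * x
        delta_hat = a * h              # predicted dL/dh
        w1 -= lr * delta_hat * x       # chain rule with dh/dw1 = x

        # Module B runs afterwards and yields the true gradient w.r.t. h
        # for the loss L = 0.5 * (y - t) ** 2.
        y = w2 * h
        delta_true = (y - t) * w2

        # The synthetic gradient model regresses onto the true gradient
        # (squared-error loss between delta_hat and delta_true).
        a -= lr_sg * (delta_hat - delta_true) * h

    return w1, w2, a


if __name__ == "__main__":
    w1, w2, a = train_with_synthetic_gradients()
    print(f"w1 * w2 = {w1 * w2:.4f}, synthetic gradient coefficient = {a:.4f}")
```

In this sketch module A never sees the true error signal. Its only learning signal is delta_hat, while the true gradient is used solely as a regression target for the synthetic gradient model. The product w1 * w2 should approach c, and the coefficient a should shrink toward zero as the true gradient vanishes.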

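Experimental Results

Research Questions

RQ1: Can synthetic gradients enable asynchronous training of neural network modules without relying on full backpropagation?
RQ2: To what extent can synthetic gradients overcome the limits of truncated backpropagation through time (truncated BPTT) and extend the effective sequence length of RNNs?
RQ3: Can the framework be extended to remove forward locking as well, so that modules train fully independently?
RQ4: How do models trained with synthetic gradients compare with standard backpropagation in accuracy and training speed?
RQ5: Can synthetic gradients improve training efficiency in hierarchical or multi-agent neural network systems that operate at different timescales?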

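Key Findings

The DNI framework decouples training by letting each module update independently with synthetic gradients, which removes update locking and backward locking.
In feed-forward networks, models trained with synthetic gradients reach accuracy comparable to standard backpropagation while supporting fully asynchronous training.
In RNNs, synthetic gradients let the model capture sequences of up to 1000 steps, far beyond the 50–100 steps to which standard truncated BPTT is typically limited.
In the hierarchical RNN system with different timescales, the fast network trained with synthetic gradients learns markedly faster, cutting training time by up to a factor of 3 compared with synchronous training.
Synthetic input models fully decouple the forward and backward passes, so that networks co-learn independently and can be composed into a single functioning system.
The method remains stable and efficient across architectures, including CNNs on CIFAR-10 and character-level language models on Penn Treebank, with very little hyperparameter tuning.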
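
The RNN finding rests on replacing the gradient that truncated BPTT discards at the end of each window. The following is a hedged sketch of that idea in generic notation; the symbols h_T (hidden state at the truncation boundary T), L_tau (loss at step tau), M (the synthetic gradient model) and K (window length) are assumptions chosen for illustration, and derivatives are total derivatives through the unrolled window.

```latex
\begin{aligned}
% Exact gradient of all future losses at the truncation boundary
\delta_T &= \frac{\partial}{\partial h_T} \sum_{\tau > T} L_\tau \\
% Truncated BPTT drops this term; a synthetic gradient model predicts it
\delta_T^{\text{trunc}} &= 0, \qquad \hat{\delta}_T = M(h_T) \approx \delta_T \\
% Bootstrapped regression target built from the next window of K steps
\delta_T^{\text{target}} &= \sum_{\tau = T+1}^{T+K} \frac{\partial L_\tau}{\partial h_T}
  + \hat{\delta}_{T+K} \, \frac{\partial h_{T+K}}{\partial h_T}
\end{aligned}
```

Because the target itself ends in a predicted gradient, error information from beyond the next window can still reach step T, which is what would allow the effective horizon to exceed the truncation length.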


This review was generated by AI and checked by a human editor.