QUICK REVIEW

[Paper Review] meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Xu Sun, Xuancheng Ren | arXiv (Cornell University) | Jun 19, 2017

Neural Networks and Applications | References: 27 | Cited by: 84

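One-Sentence Summary

meProp sparsifies the back-propagated gradient by keeping only its top-k elements by magnitude, so each step updates only a small fraction of the weights (1–4%). Across LSTM and MLP models and several tasks, this typically yields a substantial speedup of back propagation and often improves accuracy.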

ABSTRACT

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp
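As a rough illustration of the linear cost reduction claimed in the abstract, consider a single layer $y = Wx$ with $W \in \mathbb{R}^{n \times m}$ and incoming gradient $g = \partial L / \partial y \in \mathbb{R}^{n}$. The symbols $W$, $x$, $g$, $\hat{g}$, $n$, and $m$ are notation assumed here for the sketch and are not taken from the abstract.

$$
\frac{\partial L}{\partial W} = g\,x^{\top}, \qquad
\frac{\partial L}{\partial x} = W^{\top} g
\qquad \text{(about } n \cdot m \text{ multiply-adds each)}
$$

$$
\hat{g}_i =
\begin{cases}
g_i & \text{if } |g_i| \text{ is among the } k \text{ largest magnitudes} \\
0 & \text{otherwise}
\end{cases}
\qquad \Longrightarrow \qquad
\frac{\partial L}{\partial W} \approx \hat{g}\,x^{\top}, \quad
\frac{\partial L}{\partial x} \approx W^{\top} \hat{g}
\qquad \text{(about } k \cdot m \text{ multiply-adds each)}
$$

Because $\hat{g}$ has only $k$ non-zero entries, only $k$ rows of $\hat{g}\,x^{\top}$ are non-zero and only $k$ rows of $W$ contribute to $W^{\top} \hat{g}$, so under these assumptions the cost ratio is roughly $k / n$.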

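Motivation and Objectives

Enable faster neural network training by reducing the cost of back propagation without sacrificing accuracy.
Introduce a top-k gradient selection mechanism that sparsifies back-propagation updates.
Show that, across different models and tasks, updating only a small number of weights can improve both generalization and training efficiency.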

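Proposed Method

Compute forward propagation as usual.
In back propagation, keep only the top-k components (by magnitude) of the gradient with respect to a layer's output, and set the remaining components to zero.
Update only the subset of parameters (rows or columns of the weight matrix) touched by the top-k gradient components.
Select the top-k components with a min-heap, which takes O(n log k) time and O(k) space.
Apply meProp to the hidden layers (and not always to the output layer), and discuss the choice of k for different layers.
Show that the method works with different optimizers (Adam, AdaGrad) and report speedups on CPU and GPU for LSTM and MLP models on POS tagging, parsing, and MNIST.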
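The steps above can be sketched for a single linear layer as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `top_k_indices` and `meprop_linear_backward`, the row-major layout of `W`, and the toy inputs are all hypothetical. The heap-based selection uses Python's standard `heapq.nlargest`.

```python
import heapq
import numpy as np


def top_k_indices(grad, k):
    """Return the indices of the k largest-magnitude entries of grad.

    heapq.nlargest scans all n entries while keeping a heap of at most k items.
    """
    return heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))


def meprop_linear_backward(W, x, grad_y, k):
    """Sparsified backward pass for y = W @ x, with W of shape (n, m).

    Assumption: rows of W correspond to output units, so keeping k output
    gradients means touching k rows of W.
    """
    idx = top_k_indices(grad_y, k)
    g_k = grad_y[idx]                   # the k gradient entries that are kept

    grad_W = np.zeros_like(W)
    grad_W[idx, :] = np.outer(g_k, x)   # only k rows receive a non-zero update
    grad_x = W[idx, :].T @ g_k          # only k rows of W are read
    return grad_W, grad_x, idx


# Toy usage with hypothetical values.
W = np.arange(12, dtype=float).reshape(4, 3)
x = np.array([1.0, 2.0, 3.0])
grad_y = np.array([0.1, -5.0, 0.2, 3.0])
grad_W, grad_x, idx = meprop_linear_backward(W, x, grad_y, k=2)
```

The dense `grad_W` array is returned here only to keep the sketch easy to inspect; an implementation aiming for the speedups discussed in this review would presumably apply the k-row update directly instead of materializing the zero rows.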

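Experimental Results

Research Questions

RQ1: Does sparsifying back propagation to the top-k gradient components reduce computational cost without degrading accuracy?
RQ2: How does top-k meProp affect training speed and convergence across different architectures and tasks?
RQ3: Is the observed accuracy gain due to reduced overfitting (similar to dropout) or to some other mechanism?
RQ4: What are practical guidelines for choosing k across different layers and tasks?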

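Key Findings

Updating only 1–4% of the weights in each back-propagation pass reduces the cost of back propagation.
meProp delivers substantial speedups: up to roughly 69x faster back propagation in some GPU matrix-multiplication benchmarks, and 18–31x in the reported settings, depending on k and the model.
Model accuracy generally improves with meProp across LSTM and MLP models, Adam and AdaGrad, and both natural language processing and image tasks.
Top-k gradient selection outperforms random sparsification, which suggests that the top-k elements carry the most important gradient information.
meProp is complementary to dropout, which suggests that the two reduce overfitting through different mechanisms.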

This review was generated by AI and reviewed by human editors.