QUICK REVIEW

[Paper Review] Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks

Andrew S. Davis, Itamar Arel|arXiv (Cornell University)|Dec 16, 2013

Sparse and Compressive Sensing Techniques | References: 11 | Cited by: 49

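One-Sentence Summary

This paper proposes a low-rank approximation method that enables conditional feedforward computation in deep neural networks by estimating the sign of the pre-activations in ReLU networks. By factorizing each weight matrix with SVD, the method predicts which neurons will output zero after the ReLU and skips their computation, which can yield considerable speed gains for sparse networks; experiments on MNIST and SVHN show that the resulting loss in accuracy is small.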

ABSTRACT

Scalability properties of deep neural networks raise key research questions, particularly as the problems considered become larger and more challenging. This paper expands on the idea of conditional computation introduced by Bengio et al., where the nodes of a deep network are augmented by a set of gating units that determine when a node should be calculated. By factorizing the weight matrix into a low-rank approximation, an estimation of the sign of the pre-nonlinearity activation can be efficiently obtained. For networks using rectified-linear hidden units, this implies that the computation of a hidden unit with an estimated negative pre-nonlinearity can be omitted altogether, as its value will become zero when the nonlinearity is applied. For sparse neural networks, this can result in considerable speed gains. Experimental results using the MNIST and SVHN data sets with a fully-connected deep neural network demonstrate the performance robustness of the proposed scheme with respect to the error introduced by the conditional computation process.

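Research Motivation and Objectives

Reduce the computational cost of deep neural networks by identifying and skipping unnecessary ReLU unit computations.
Exploit the sparsity induced by ReLU activations, together with the fact that weight matrices can be approximately reconstructed from low-rank factors, to improve inference efficiency.
Develop an inexpensive low-rank estimator that predicts, before the full computation, which hidden units will output zero.
Evaluate the trade-off between computational savings and model accuracy in fully connected networks.
Examine whether recomputing the SVD once per training epoch is a scalable and efficient way to estimate activation signs.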

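Proposed Method

The method uses SVD to factorize the weight matrix W into low-rank factors U and V such that W ≈ UV.
The activation estimator computes sgn(a_l * U * V) to predict the sign of each pre-activation in the next layer.
For a ReLU unit, a negative pre-activation gives an output of zero, so a unit whose estimated pre-activation is negative is treated as zero and its computation is skipped.
The estimator is refreshed with one SVD per training epoch so that the low-rank approximation keeps tracking the weight matrix.
Conditional computation is achieved by computing only the activations that the estimator predicts to be nonzero, which reduces FLOPs.
The method is applied to fully connected networks on MNIST and SVHN, with hyperparameters tuned on a validation set.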
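
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `low_rank_factors`, `conditional_relu_layer`, and `layer_cost` are hypothetical, the bias term is omitted, the per-example loop is written for clarity rather than speed, and cost is counted in multiply-adds for a layer with d inputs, h hidden units, and estimator rank k.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Truncated SVD of W (d x h): return U (d x k), V (k x h) with W ~= U @ V."""
    P, s, Qt = np.linalg.svd(W, full_matrices=False)
    U = P[:, :rank] * s[:rank]  # fold the singular values into the left factor
    V = Qt[:rank, :]
    return U, V

def conditional_relu_layer(a, W, U, V):
    """ReLU layer that evaluates only units with a positive estimated pre-activation.

    a: (n x d) input activations, W: (d x h) weights, U/V: low-rank factors of W.
    Units predicted non-positive are left at zero without touching W.
    """
    est = (a @ U) @ V              # cheap estimate of a @ W through the factors
    mask = est > 0                 # sgn(a U V) > 0: predicted-active units
    out = np.zeros((a.shape[0], W.shape[1]))
    for i in range(a.shape[0]):    # illustrative loop; one row per example
        cols = np.flatnonzero(mask[i])
        out[i, cols] = np.maximum(a[i] @ W[:, cols], 0.0)
    return out, mask

def layer_cost(d, h, k, active_fraction):
    """Multiply-adds per example: (full layer, estimator plus predicted-active units)."""
    full = d * h
    conditional = k * (d + h) + active_fraction * d * h
    return full, conditional
```

Under this cost model the conditional layer is cheaper than the full one only when k(d + h) + αdh < dh, where α is the fraction of units predicted active, so the scheme depends on both a small rank and sparse activations. In a training loop, `low_rank_factors` would be called once per epoch for each layer, matching the once-per-epoch SVD update described above.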

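Experimental Results

Research Questions

RQ1: Can a low-rank approximation of the weight matrix reliably predict the sign of the pre-activations in a ReLU network?
RQ2: How much can skipping ReLU units based on the predicted sign reduce inference time without degrading accuracy?
RQ3: How does the rank of the low-rank approximation affect the performance of the conditional computation scheme?
RQ4: Is a single SVD update per epoch enough to keep the estimator accurate between training epochs?
RQ5: Can the method be extended to other hard-thresholding activation functions or to architectures such as CNNs?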

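Key Findings

On MNIST, the full-rank network reaches a test error of 1.40%, and adding the low-rank estimator costs very little accuracy.
Even with a 10-10-5 rank estimator, the network's test error on MNIST is 2.28%, which indicates good robustness to the low-rank approximation.
The 50-35-25 and 25-25-25 rank configurations reach errors of 1.43% and 1.60%, respectively, which is 0.03 and 0.20 percentage points above the full-rank baseline.
On SVHN, the conditional computation scheme remains competitive across several network configurations.
Because the weights keep changing, the activation estimation error rises slightly from minibatch to minibatch within a training epoch, which suggests a need for an online low-rank update mechanism.
The authors observe that SVD-based estimation is suboptimal for the true objective of minimizing the difference in the ReLU network's output, which suggests that a better approximation objective exists.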

This review was generated by AI and reviewed by a human editor.