QUICK REVIEW

[Paper Review] Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

Matthieu Courbariaux, Itay Hubara | arXiv (Cornell University) | Feb 9, 2016

Advanced Neural Network Applications | References: 57 | Cited by: 2,193

One-Sentence Summary

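This paper introduces Binarized Neural Networks (BNNs), which constrain weights and activations to ±1 at run-time and use these binary values to compute parameter gradients during training. This makes the forward pass efficient and well suited to GPU and dedicated-hardware acceleration, while achieving near state-of-the-art results on MNIST, CIFAR-10, and SVHN.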

ABSTRACT

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency. To validate the effectiveness of BNNs we conduct two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10 and SVHN datasets. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available on-line.
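The claim that bit-wise operations can replace most arithmetic is easiest to see on a single dot product. The sketch below assumes ±1 values are encoded as bits (+1 as 1, -1 as 0) and packed into a Python int; the helpers `pack_signs` and `xnor_popcount_dot` are hypothetical illustrations, not the authors' kernel. For two length-n vectors, each position where the signs agree contributes +1 and each mismatch contributes -1, so the dot product equals 2·popcount(XNOR(a, b)) - n.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int, one bit per element (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +/-1 vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask    # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")      # popcount
    return 2 * matches - n              # agreements count +1, disagreements -1


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
print(xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a)))  # → 2
print(sum(x * y for x, y in zip(a, b)))                          # → 2
```

On real hardware the same idea is applied to whole machine words at once (for example 32 weights per register), which is where the speedup comes from; the Python version only demonstrates that the arithmetic identity holds.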

Research Motivation and Goals

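Motivate and enable the training of neural networks with binary weights and activations.
Show that binary constraints reduce memory use and allow bit-wise operations in the forward pass.
Demonstrate near state-of-the-art accuracy on MNIST, CIFAR-10, and SVHN with Torch7 and Theano implementations.
Outline the hardware-oriented advantages of BNNs and present an optimized GPU kernel for them.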
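The memory saving from binary weights is simple arithmetic: a weight stored as a 32-bit float takes 4 bytes, while its sign takes 1 bit, so packing signs eight per byte gives a 32× reduction. A minimal sketch using only the standard library, assuming sign(0) is treated as +1; `float32_bytes` and `packed_sign_bytes` are hypothetical helpers, and the example weights are arbitrary values, not data from the paper.

```python
import struct


def float32_bytes(weights):
    """Size in bytes of the weights serialized as little-endian 32-bit floats."""
    return len(struct.pack(f"<{len(weights)}f", *weights))


def packed_sign_bytes(weights):
    """Pack sign(w) eight weights per byte (w >= 0 -> bit 1, w < 0 -> bit 0)."""
    out = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:  # assumption: sign(0) is treated as +1
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# Arbitrary illustrative weights in [-1, 1].
weights = [((i * 37) % 101 - 50) / 50 for i in range(1024)]
print(float32_bytes(weights))           # → 4096
print(len(packed_sign_bytes(weights)))  # → 128
```

Fewer bytes also means fewer memory accesses per forward pass, which is the other half of the efficiency argument.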

Proposed Method

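Define deterministic and stochastic binarization functions for weights and activations.
Use the straight-through estimator to propagate gradients through the discretization (the sign function).
Use shift-based Batch Normalization and shift-based AdaMax to reduce the number of multiplications.
Keep real-valued weights clipped to [-1, 1] and binarize them in the forward pass; gradients are accumulated in the real-valued weights.
Implement a binary matrix multiplication GPU kernel based on XNOR and popcount, using SWAR (SIMD within a register) for further speedup.
Train and evaluate BNNs on MNIST, CIFAR-10, and SVHN with Torch7 and Theano.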
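The training-side pieces listed above can be sketched together for a single linear unit. This is a minimal illustration under stated assumptions, not the authors' Torch7/Theano code: deterministic binarization is sign(x) with sign(0) = +1; stochastic binarization outputs +1 with the "hard sigmoid" probability clip((x + 1) / 2, 0, 1); the straight-through estimator passes the gradient unchanged where |x| <= 1 and zeroes it elsewhere; latent weights are clipped to [-1, 1] after each update; and AP2(x) = sign(x)·2^round(log2|x|) is the approximate power of two that lets the shift-based variants replace multiplications with shifts. All function names and the squared-error toy loss are hypothetical.

```python
import math
import random


def binarize_deterministic(x):
    """Sign binarization: +1 if x >= 0, else -1."""
    return 1.0 if x >= 0 else -1.0


def hard_sigmoid(x):
    """clip((x + 1) / 2, 0, 1): probability of binarizing to +1."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def binarize_stochastic(x, rng):
    """+1 with probability hard_sigmoid(x), otherwise -1."""
    return 1.0 if rng.random() < hard_sigmoid(x) else -1.0


def ste_grad(x, grad_out):
    """Straight-through estimator: pass the gradient where |x| <= 1, zero it elsewhere."""
    return grad_out if abs(x) <= 1.0 else 0.0


def clip_weight(w):
    """Keep the real-valued (latent) weight inside [-1, 1]."""
    return max(-1.0, min(1.0, w))


def ap2(x):
    """Approximate power of two: sign(x) * 2**round(log2|x|); scaling by it is a bit shift."""
    if x == 0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(x))), x)


def sgd_step(latent_w, x, target, lr=0.1):
    """One toy SGD step on a linear unit with binarized weights and loss 0.5 * (y - target)**2."""
    wb = [binarize_deterministic(w) for w in latent_w]  # forward pass uses binary weights
    y = sum(w * xi for w, xi in zip(wb, x))
    g_y = y - target                                     # dL/dy
    new_w = []
    for w, xi in zip(latent_w, x):
        g_wb = g_y * xi                                  # gradient w.r.t. the binary weight
        g_w = ste_grad(w, g_wb)                          # straight-through to the latent weight
        new_w.append(clip_weight(w - lr * g_w))          # accumulate in the real weight, then clip
    return new_w, y


new_w, y = sgd_step([0.3, -0.05, 0.8, -0.6], [1, 1, -1, 1], target=2.0)
print(y)                                            # → -2.0
print([binarize_deterministic(w) for w in new_w])   # → [1.0, 1.0, 1.0, -1.0]
print(ap2(0.3))                                     # → 0.25
```

The design point the sketch tries to make visible is that the binary weights are only used for the forward pass and gradient computation, while small updates keep accumulating in the real-valued weights; without them, a single SGD step would rarely be large enough to flip a sign.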

Experimental Results

Research Questions

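RQ1: Can networks trained with binary weights and activations reach near state-of-the-art accuracy on standard benchmark datasets?
RQ2: Compared with full-precision networks, what memory and energy-efficiency gains do BNNs offer during the forward pass?
RQ3: How does binarization affect gradient propagation and training stability, and is the straight-through estimator sufficient?
RQ4: Which hardware optimizations (e.g., XNOR-popcount, SWAR) maximize BNN speedup on GPUs?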

Key Findings

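BNNs achieve near state-of-the-art results on MNIST, CIFAR-10, and SVHN in both the Torch7 and Theano implementations.
During the forward pass, BNNs drastically reduce memory size and memory accesses and replace most arithmetic operations with bit-wise operations.
A binary matrix multiplication GPU kernel runs the MNIST BNN 7 times faster than an unoptimized GPU kernel, without any loss in classification accuracy.
Shift-based Batch Normalization and shift-based AdaMax reduce the number of multiplications, with no accuracy loss observed in the experiments.
The hardware implications are significant: many filters in binarized convolutional networks repeat, which dedicated hardware could exploit for further speedups.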
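Filter repetition follows from counting: a binarized k×k 2D filter can take at most 2^(k·k) sign patterns, so 3×3 filters have at most 2^9 = 512 distinct patterns no matter how many filters a layer contains. A small sketch, assuming random Gaussian weights as stand-ins for trained ones (trained filters are not uniformly distributed, so real repetition rates will differ); `count_unique_binary_filters` is a hypothetical helper.

```python
import random


def count_unique_binary_filters(num_filters, k=3, seed=0):
    """Binarize random k x k filters and count how many distinct sign patterns occur."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(num_filters):
        # Each weight is drawn from N(0, 1) and binarized with sign (sign(0) -> +1).
        pattern = tuple(1 if rng.gauss(0.0, 1.0) >= 0 else -1 for _ in range(k * k))
        seen.add(pattern)
    return len(seen)


unique = count_unique_binary_filters(4096)
print(unique, "distinct 3x3 sign patterns out of 4096 filters (at most", 2 ** 9, ")")
```

Because the number of distinct patterns is bounded, hardware could in principle compute each unique filter's response once and reuse it for every repeated copy.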

This review was generated by AI and checked by human editors.