QUICK REVIEW

[Paper Review] Self-Binarizing Networks

Fayez Lahoud, Radhakrishna Achanta|arXiv (Cornell University)|Feb 2, 2019

Advanced Memory and Neural Computing | References: 34 | Cited by: 23

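One-Sentence Summary

This paper proposes Self-Binarizing Networks, a method for training deep neural networks so that their weights and activations evolve to become binary, using a smooth hyperbolic tangent (tanh) activation that can be sharpened instead of the non-differentiable sign function. By avoiding alternation between floating-point and binary modes and replacing batch normalization with a comparison-based operation, the method achieves state-of-the-art (SOTA) accuracy on CIFAR-10, CIFAR-100, and ImageNet while supporting fully binary inference on low-precision hardware.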

ABSTRACT

We present a method to train self-binarizing neural networks, that is, networks that evolve their weights and activations during training to become binary. To obtain similar binary networks, existing methods rely on the sign activation function. This function, however, has no gradients for non-zero values, which makes standard backpropagation impossible. To circumvent the difficulty of training a network relying on the sign activation function, these methods alternate between floating-point and binary representations of the network during training, which is sub-optimal and inefficient. We approach the binarization task by training on a unique representation involving a smooth activation function, which is iteratively sharpened during training until it becomes a binary representation equivalent to the sign activation function. Additionally, we introduce a new technique to perform binary batch normalization that simplifies the conventional batch normalization by transforming it into a simple comparison operation. This is unlike existing methods, which are forced to retain the conventional floating-point-based batch normalization. Our binary networks, apart from displaying advantages of lower memory and computation as compared to conventional floating-point and binary networks, also show higher classification accuracy than existing state-of-the-art methods on multiple benchmark datasets.

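Motivation and Objectives

Address the core challenge in training binarized neural networks: the sign activation function has zero gradient almost everywhere, which makes standard backpropagation infeasible.
Eliminate the need to switch between floating-point and binary representations during training, a process that is inefficient and introduces approximation error.
Replace conventional batch normalization, which requires floating-point arithmetic, with a binary-compatible, memory-efficient, comparison-based alternative.
Enable deployment of fully binarized networks on low-precision chips and microcontrollers by eliminating all floating-point computation.
Achieve higher classification accuracy than existing binarization methods while keeping memory and compute costs low.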

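Proposed Method

The method uses a scaled hyperbolic tangent, tanh(νx), where ν is a learnable scale factor that is gradually increased during training so that the activation progressively sharpens toward the sign function.
The network is trained end to end with backpropagation on a single continuous floating-point representation, which avoids switching between binary and floating-point modes.
A new Binary Batch Normalization (BinaryBN) layer replaces standard batch normalization with a simple comparison, output = (x > T) ? 1 : -1, where T is a learnable per-channel threshold.
The BinaryBN layer stores only one 8-bit threshold and one binary scale sign per channel, reducing memory use to 9c bits per layer (c being the number of channels), a 93% reduction relative to the 128c bits of standard BN.
By eliminating all floating-point operations, including those in batch normalization, the method achieves fully binary inference suitable for deployment on microcontrollers and low-precision chips.
The method is validated on VGG and AlexNet architectures using the CIFAR-10, CIFAR-100, and ImageNet benchmarks, with accuracy, memory footprint, and computation as the reported metrics.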
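
The sharpening behaviour of tanh(νx) described above can be illustrated with a minimal NumPy sketch. It is not the authors' training code: the function names, the sample inputs, and the three values of ν are hypothetical, and ν is simply set by hand here rather than learned or scheduled as in the paper. The sketch only shows that the gap between tanh(νx) and sign(x) shrinks as ν grows, while the gradient stays non-zero for small ν.

```python
import numpy as np

def sharpened_tanh(x, nu):
    """Smooth surrogate for sign(x): tanh(nu * x) with scale factor nu > 0."""
    return np.tanh(nu * x)

def sharpened_tanh_grad(x, nu):
    """Derivative of tanh(nu * x) with respect to x."""
    return nu * (1.0 - np.tanh(nu * x) ** 2)

def max_gap_to_sign(x, nu):
    """Largest absolute difference between tanh(nu * x) and sign(x) over x."""
    return float(np.max(np.abs(sharpened_tanh(x, nu) - np.sign(x))))

# Hypothetical sample inputs and nu values, chosen only for illustration.
x = np.array([-2.0, -0.5, -0.1, 0.1, 0.5, 2.0])
for nu in (1.0, 10.0, 100.0):
    # The gap to sign(x) shrinks as nu increases.
    print(nu, max_gap_to_sign(x, nu))
```

For small ν the function is smooth and every sample input receives a usable gradient; for large ν the output is numerically indistinguishable from ±1 on these inputs, which is the sense in which the representation "becomes" binary.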

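Experimental Results

Research Questions

RQ1: Can a deep neural network be made to binarize its own weights and activations by using a continuous, differentiable activation function that gradually becomes binary during training?
RQ2: Does avoiding alternating training between floating-point and binary representations improve training stability and the accuracy of the final model?
RQ3: Can batch normalization be replaced with a binary-compatible, comparison-based operation while preserving performance and eliminating floating-point computation?
RQ4: Can the resulting networks achieve state-of-the-art accuracy on standard benchmarks while being fully suitable for deployment on low-precision hardware?
RQ5: Compared with hard binarization using the sign function, how does soft binarization via a sharpened tanh behave in terms of weight-distribution evolution and final performance?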

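Key Findings

The proposed self-binarizing networks achieve higher classification accuracy than existing state-of-the-art binarization methods on the CIFAR-10, CIFAR-100, and ImageNet benchmarks.
On CIFAR-10, the method achieves 92.1% top-1 accuracy with VGG-16, outperforming prior methods such as XNOR-Net and BWN.
The BinaryBN layer reduces per-layer memory use to 9c bits, compared with 128c bits for standard BN, a 93% reduction in storage.
The BinaryBN layer performs 2chw operations using only comparisons and bit shifts, making its inference nearly an order of magnitude faster than standard BN and SBN.
By eliminating all floating-point operations, the method achieves fully binary inference, making it suitable for deployment on microcontrollers and low-precision integrated circuits.
Weight histograms show that soft binarization via tanh(νx) maintains a zero-centered distribution early in training and, compared with hard binarization using sign(x), yields better gradient flow and convergence.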
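
The BinaryBN figures above can be checked with a short sketch. It relies on standard algebra rather than on the paper's exact formulation: for a BN output y = γ(x − μ)/σ + β with γ ≠ 0 and σ > 0, the sign of y depends only on whether x exceeds T = μ − βσ/γ and on the sign of γ, which matches the "one threshold plus one sign bit per channel" description. All function names and parameter values are hypothetical, the threshold is kept in floating point (8-bit quantization is not modelled), and reading 128c as four 32-bit values per channel is an assumption.

```python
import numpy as np

def bn_then_sign(x, gamma, beta, mean, std):
    """Reference path: floating-point batch normalization followed by sign."""
    y = gamma * (x - mean) / std + beta
    return np.where(y > 0, 1, -1)

def threshold_compare(x, gamma, beta, mean, std):
    """Comparison path: one threshold and one sign bit per channel.

    Assumes gamma != 0 and std > 0. Exact ties (y == 0) may differ from the
    reference path when gamma < 0; the sample data below avoids ties.
    """
    threshold = mean - beta * std / gamma   # T, folded from the BN parameters
    flip = np.where(gamma > 0, 1, -1)       # sign of the BN scale
    return flip * np.where(x > threshold, 1, -1)

def standard_bn_bits(channels, values_per_channel=4, bits_per_value=32):
    """Assumed breakdown of 128c: four 32-bit values per channel."""
    return channels * values_per_channel * bits_per_value

def binary_bn_bits(channels, threshold_bits=8, sign_bits=1):
    """9c bits: an 8-bit threshold plus a 1-bit scale sign per channel."""
    return channels * (threshold_bits + sign_bits)

# Hypothetical per-channel parameters for 2 channels and 3 input rows.
gamma = np.array([2.0, -0.5])
beta = np.array([1.0, 0.25])
mean = np.array([0.0, 1.0])
std = np.array([1.0, 2.0])
x = np.array([[-1.0, 0.0], [0.0, 3.0], [1.0, 1.5]])

print(bn_then_sign(x, gamma, beta, mean, std))
print(threshold_compare(x, gamma, beta, mean, std))

c = 64  # hypothetical channel count; the ratio does not depend on it
print(1 - binary_bn_bits(c) / standard_bn_bits(c))  # 0.9296875, about 93%
```

Both paths produce the same ±1 outputs on the sample data, and the storage ratio 9/128 reproduces the reported reduction of roughly 93%.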

This review was generated by AI and reviewed by human editors.