QUICK REVIEW

[Paper Review] Forward and Backward Information Retention for Accurate Binary Neural Networks

Haotong Qin, Ruihao Gong|arXiv (Cornell University)|Sep 24, 2019

Advanced Neural Network Applications | References: 50 | Cited by: 25

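One-Sentence Summary

This paper proposes IR-Net, a framework that trains highly accurate binarized neural networks by minimizing information loss in both forward and backward propagation. It introduces Libra Parameter Binarization (Libra-PB), which preserves activation diversity by quantizing weights so that their information entropy is maximized, and the Error Decay Estimator (EDE), which adaptively approximates the sign function during backpropagation. Together they achieve state-of-the-art accuracy on CIFAR-10 and ImageNet with 1-bit weights and activations.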

ABSTRACT

Weight and activation binarization is an effective approach to deep neural network compression and can accelerate the inference by leveraging bitwise operations. Although many binarization methods have improved the accuracy of the model by minimizing the quantization error in forward propagation, there remains a noticeable performance gap between the binarized model and the full-precision one. Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) to retain the information that consists in the forward activations and backward gradients. IR-Net mainly relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB): simultaneously minimizing both quantization error and information loss of parameters by balanced and standardized weights in forward propagation; (2) Error Decay Estimator (EDE): minimizing the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering the updating ability and accurate gradients. We are the first to investigate both forward and backward processes of binary networks from the unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on CIFAR-10 and ImageNet datasets manifest that the proposed IR-Net can consistently outperform state-of-the-art quantization methods.

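Research Motivation and Objectives

Close the performance gap between full-precision and binarized neural networks that is caused by information loss in forward and backward propagation.
Study the forward and backward information flow in binarized networks from a unified information-theoretic perspective.
Develop a method that preserves model diversity in forward propagation while ensuring accurate and stable gradients in backward propagation.
Improve accuracy substantially over existing quantization methods while keeping inference efficient.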

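Proposed Method

Libra Parameter Binarization (Libra-PB) balances and standardizes the weights before binarization, minimizing quantization error and maximizing information entropy so that activation diversity is preserved.
The Error Decay Estimator (EDE) gradually approximates the sign function during backpropagation, reducing gradient mismatch and improving optimization stability.
EDE adjusts its approximation dynamically as training progresses, giving strong updating ability early in training and accurate gradients later.
The method integrates into a standard training pipeline without extra floating-point operations or complex modifications.
IR-Net is compatible with standard binarized neural network frameworks and supports 1-bit and mixed-precision settings.
The framework is designed for efficiency and adds very little computational overhead; in particular, it uses bit-shift operations at inference time.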
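
The two components above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`libra_pb`, `sign_entropy`, `ede_params`, `ede_grad`) are hypothetical, the power-of-two scale and the `k * tanh(t * x)` surrogate follow one reading of the paper, and the schedule bounds `t_min=0.1` and `t_max=10.0` are assumed defaults. It also assumes the weights are not all identical, since the standard deviation would then be zero.

```python
import math
import statistics


def libra_pb(weights):
    """Balance, standardize, then binarize a flat list of weights.

    Returns (signs, shift); the quantized weights are signs * 2**shift.
    Assumes the weights are not all equal (non-zero standard deviation).
    """
    mean = statistics.fmean(weights)
    centered = [w - mean for w in weights]        # zero mean -> balanced signs
    std = statistics.pstdev(centered)
    standardized = [w / std for w in centered]    # unit standard deviation
    signs = [1 if w >= 0 else -1 for w in standardized]
    # Power-of-two scale (assumption), so it can be applied as a bit shift.
    mean_abs = sum(abs(w) for w in standardized) / len(standardized)
    shift = round(math.log2(mean_abs))
    return signs, shift


def sign_entropy(signs):
    """Shannon entropy in bits of the +1/-1 distribution; 1.0 is the maximum."""
    p = sum(1 for s in signs if s > 0) / len(signs)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def ede_params(step, total_steps, t_min=0.1, t_max=10.0):
    """Schedule for the surrogate g(x) = k * tanh(t * x).

    t grows exponentially from t_min to t_max over training (assumed bounds).
    """
    t = t_min * 10 ** (step / total_steps * math.log10(t_max / t_min))
    k = max(1.0 / t, 1.0)
    return k, t


def ede_grad(x, k, t):
    """Derivative of k * tanh(t * x), used in place of the gradient of sign(x)."""
    return k * t * (1.0 - math.tanh(t * x) ** 2)


if __name__ == "__main__":
    # All-positive weights: a naive sign() would map every weight to +1.
    weights = [0.9, 1.1, 1.3, 0.7, 1.0, 1.2]
    signs, shift = libra_pb(weights)
    print(signs, shift, sign_entropy(signs))

    # Early in training the surrogate gradient is close to 1 everywhere;
    # late in training it is large near 0 and almost zero elsewhere.
    for step in (0, 100):
        k, t = ede_params(step, 100)
        print(step, ede_grad(0.0, k, t), ede_grad(1.0, k, t))
```

In this toy example, centering turns an all-positive weight vector into one with three positive and three negative signs, which is the balanced, maximum-entropy case that Libra-PB aims for. The EDE schedule starts close to an identity-like gradient and ends close to the gradient of a sharp sign approximation.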

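Experimental Results

Research Questions

RQ1: How does information loss in forward and backward propagation affect the accuracy of binarized neural networks?
RQ2: Can a unified information-theoretic perspective improve the design of binarization methods for deep networks?
RQ3: In a binarized network, how can quantization error be minimized while gradient information is preserved in backpropagation?
RQ4: Compared with fixed approximations such as STE, can adaptive gradient approximation improve training stability and final accuracy?
RQ5: To what extent can retaining information in forward and backward propagation narrow the accuracy gap between full-precision and binarized models?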

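Key Findings

On ImageNet with ResNet-18 in the 1W/1A setting, IR-Net reaches 58.1% Top-1 accuracy, outperforming Bi-Real Net (56.4%) and surpassing the TWIN method, which uses 2-bit weights.
On ImageNet in the 1W/32A setting, IR-Net reaches 66.5% Top-1 accuracy, surpassing BWHN (64.3%) and the 2-bit-weight SQ-TWN (63.8%).
On CIFAR-10, IR-Net with ResNet-18 reaches 91.5% accuracy in the 1W/1A setting, significantly outperforming the previous SOTA method (86.5%).
With VGG-Small on CIFAR-10, IR-Net reaches 90.4% accuracy in the 1W/1A setting, surpassing XNOR (89.8%) and BNN (89.9%) by at least 0.5 percentage points.
On a Raspberry Pi 3B, ResNet-18 inference with IR-Net using 1-bit weights takes 261.98 ms, significantly faster than higher-bit-width methods such as DSQ (551.22 ms) and NCNN (935.51 ms).
The IR-Net model is only 4.21 MB, and the overhead introduced by the bit-shift operations is negligible, confirming its efficiency in practical deployment.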
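
To put the latency and accuracy figures above in relative terms, the short calculation below derives speedup ratios and one accuracy gap. It uses only numbers quoted in the findings; the variable names are illustrative.

```python
# Raspberry Pi 3B latency for ResNet-18, in milliseconds (figures quoted above).
latency_ms = {"IR-Net": 261.98, "DSQ": 551.22, "NCNN": 935.51}

# How many times slower each baseline is than IR-Net.
speedup = {
    name: ms / latency_ms["IR-Net"]
    for name, ms in latency_ms.items()
    if name != "IR-Net"
}
for name, ratio in speedup.items():
    print(f"IR-Net is {ratio:.2f}x faster than {name}")
# IR-Net is 2.10x faster than DSQ
# IR-Net is 3.57x faster than NCNN

# ImageNet ResNet-18, 1W/1A: Top-1 gap to Bi-Real Net in percentage points.
gap_pp = round(58.1 - 56.4, 1)
print(f"Top-1 gain over Bi-Real Net: {gap_pp} points")
# Top-1 gain over Bi-Real Net: 1.7 points
```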


This review was generated by AI and reviewed by a human editor.