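[Paper Review] BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
BinaryDuo trains a binary activation network by decoupling a pre-trained ternary activation into two binary activations. This reduces gradient mismatch and yields higher accuracy than state-of-the-art BNNs on benchmarks such as CIFAR-10 and ImageNet-scale models.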
Binary Neural Networks (BNNs) have been garnering interest thanks to their compute cost reduction and memory savings. However, BNNs suffer from performance degradation mainly due to the gradient mismatch caused by binarizing activations. Previous works tried to address the gradient mismatch problem by reducing the discrepancy between activation functions used at forward pass and its differentiable approximation used at backward pass, which is an indirect measure. In this work, we use the gradient of smoothed loss function to better estimate the gradient mismatch in quantized neural network. Analysis using the gradient mismatch estimator indicates that using higher precision for activation is more effective than modifying the differentiable approximation of activation function. Based on the observation, we propose a new training scheme for binary activation networks called BinaryDuo in which two binary activations are coupled into a ternary activation during training. Experimental results show that BinaryDuo outperforms state-of-the-art BNNs on various benchmarks with the same amount of parameters and computing cost.
Motivation and Objectives
- Motivate and quantify gradient mismatch in binary activation networks during training.
- Propose a better gradient-mismatch estimation method using the gradient of a smoothed loss.
- Introduce BinaryDuo: a two-stage training scheme that decouples a ternary activation into two binary activations and later fine-tunes.
- Demonstrate that BinaryDuo achieves state-of-the-art or competitive accuracy on CIFAR-10 and on ImageNet (with AlexNet and ResNet-18) at similar parameter and compute budgets.
Proposed Method
- Estimate gradient mismatch using the gradient of a smoothed loss via Coordinate Discrete Gradient (CDG).
- Show that higher-precision activations (ternary or 2-bit) mitigate gradient mismatch more effectively than sophisticated STEs.
- Propose BinaryDuo: train a network with ternary activation, then decouple into two binary activations with specific BN bias shifts to mimic the ternary function.
- Because decoupling doubles the number of weights, reconfigure the layer widths proportionally so that the parameter count stays comparable, then fine-tune the decoupled binary network.
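The decoupling step above can be illustrated with a small sketch. It assumes a ternary activation with output levels 0, 1, 2 and thresholds at `-delta` and `+delta`; these levels, the value of `delta`, and all function names are hypothetical choices for illustration, not the paper's exact configuration. The shifts `+delta` and `-delta` play the role of the BN bias shifts mentioned in the list, and duplicating each downstream weight shows why the weight count doubles after decoupling.

```python
def binary_step(x):
    """Binary activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def ternary(x, delta=0.5):
    """Ternary activation with levels 0, 1, 2 (assumed thresholds at -delta, +delta)."""
    if x < -delta:
        return 0
    if x < delta:
        return 1
    return 2


def decoupled(x, delta=0.5):
    """Two binary activations whose inputs are shifted by +delta and -delta."""
    return binary_step(x + delta) + binary_step(x - delta)


def preactivation_ternary(xs, weights, delta=0.5):
    """Next-layer pre-activation when each unit uses the ternary activation."""
    return sum(w * ternary(x, delta) for x, w in zip(xs, weights))


def preactivation_decoupled(xs, weights, delta=0.5):
    """Same pre-activation after decoupling: each ternary unit becomes two
    binary units, and its downstream weight is duplicated for both of them."""
    binary_outputs = []
    duplicated_weights = []
    for x, w in zip(xs, weights):
        binary_outputs += [binary_step(x + delta), binary_step(x - delta)]
        duplicated_weights += [w, w]
    return sum(w * a for a, w in zip(binary_outputs, duplicated_weights))


if __name__ == "__main__":
    inputs = [-1.2, 0.1, 0.9]
    weights = [0.5, -1.0, 2.0]
    print(preactivation_ternary(inputs, weights))
    print(preactivation_decoupled(inputs, weights))
```

In this sketch the two pre-activations agree immediately after decoupling, so the decoupled network starts from the function the ternary network computed. Fine-tuning can then update the duplicated weights independently.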
Experimental Results
Research Questions
- RQ1: Can gradient mismatch be better estimated with the gradient of a smoothed loss rather than cumulative differences between activation and approximation?
- RQ2: Does a training scheme that leverages ternary activations during training and decouples to binary activations during inference improve BNN performance at equal cost?
- RQ3: How does BinaryDuo compare to state-of-the-art BNN methods on standard benchmarks like CIFAR-10 and ImageNet in terms of accuracy and efficiency?
Key Findings
| Network | Top-1 (%) | Top-5 (%) | Model size (Mbit) | FLOPs |
|---|---|---|---|---|
| AlexNet (BNN) | 41.8 | 67.1 | 62.3 | 82.3M |
| XNOR-Net | 44.2 | 69.2 | 191 | 126M |
| BNN+ | 46.1 | 75.7 | 191 | 126M |
| BinaryDuo | 52.7 | 76.0 | 189 | 119M |
| BinaryDuo(+sc)† | - | - | - | 164M |
| ResNet-18 (BNN with shortcut) | - | - | - | - |
| BinaryDuo(+sc)† | 60.9 | 82.6 | 31.9 | 164M |
- Cosine similarity between coarse gradients and Coordinate Discrete Gradient (CDG) degrades with binary activations, and is not improved by sophisticated STEs.
- Higher precision activations (ternary/2-bit) reduce gradient mismatch more effectively than improving the backward surrogate alone.
- Coupling two binary activations to emulate a ternary activation (BinaryDuo) followed by decoupling and fine-tuning yields superior accuracy under the same parameter and compute budget.
- On CIFAR-10 with VGG-7, the coupled ternary model reaches 89.69% test accuracy. After decoupling and fine-tuning, BinaryDuo reaches 90.44%, surpassing the 89.07% baseline binary model.
- On ImageNet with AlexNet and ResNet-18, BinaryDuo achieves top-1 accuracy of 52.7% (AlexNet) and 60.4% (ResNet-18), outperforming other BNN schemes at similar parameter and compute budgets; BinaryDuo(+sc) reaches 60.9% top-1 with shortcut on ResNet-18.
- BinaryDuo consistently outperforms state-of-the-art BNN methods across tested architectures while maintaining comparable model size and FLOPs.
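The cosine-similarity comparison in the first finding can be sketched on a toy problem. The block below is a minimal illustration, not the paper's procedure: a per-coordinate central finite difference with step `eps` stands in for the Coordinate Discrete Gradient (the paper's exact definition may differ), the coarse gradient uses a clipped-identity straight-through estimator, and the one-neuron model, data, and function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def finite_difference_gradient(loss, w, eps):
    """Central difference per coordinate (assumed stand-in for CDG)."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


# Hypothetical toy data: 2-D inputs with binary targets.
INPUTS = [(1.0, 0.5), (-0.5, 1.0), (0.3, -0.8), (-1.0, -0.2)]
TARGETS = [1, 0, 1, 0]


def pre_activation(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w):
    """Mean squared error of a single neuron with a binary step activation."""
    total = 0.0
    for x, y in zip(INPUTS, TARGETS):
        a = 1 if pre_activation(w, x) >= 0 else 0
        total += (a - y) ** 2
    return total / len(INPUTS)


def ste_gradient(w):
    """Coarse gradient: the step's derivative is replaced by 1 where |z| <= 1."""
    grad = [0.0] * len(w)
    for x, y in zip(INPUTS, TARGETS):
        z = pre_activation(w, x)
        a = 1 if z >= 0 else 0
        surrogate = 1.0 if abs(z) <= 1.0 else 0.0
        for i, xi in enumerate(x):
            grad[i] += 2 * (a - y) * surrogate * xi / len(INPUTS)
    return grad


if __name__ == "__main__":
    w = [0.2, 0.4]
    coarse = ste_gradient(w)
    reference = finite_difference_gradient(loss, w, eps=0.5)
    print(cosine_similarity(coarse, reference))
```

A similarity near 1 means the coarse gradient points in nearly the same direction as the finite-difference reference, while lower values indicate larger mismatch. The step size `eps` controls how much the piecewise-constant loss is smoothed, so the result depends on it.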
This review was generated by AI and checked by a human editor.