QUICK REVIEW

[Paper Review] Improved training of binary networks for human pose estimation and image recognition

Adrian Bulat, Georgios Tzimiropoulos|arXiv (Cornell University)|Apr 11, 2019

Human Pose and Action Recognition|44 references|40 citations

TL;DR

The paper improves binarized neural networks with a set of techniques (activation choices, reverse-order initialization, progressive quantization, and network stacking) and shows substantial accuracy gains on MPII pose estimation and ImageNet classification. It also studies how binarization can be combined with knowledge distillation.

ABSTRACT

Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin. However, under low memory and limited computational power constraints, the accuracy on the same problems drops considerably. In this paper, we propose a series of techniques that significantly improve the accuracy of binarized neural networks (i.e., networks where both the features and the weights are binary). We evaluate the proposed improvements on two diverse tasks: fine-grained recognition (human pose estimation) and large-scale image recognition (ImageNet classification). Specifically, we introduce a series of novel methodological changes including: (a) more appropriate activation functions, (b) reverse-order initialization, (c) progressive quantization, and (d) network stacking, and show that these additions significantly improve existing state-of-the-art network binarization techniques. Additionally, for the first time, we also investigate the extent to which network binarization and knowledge distillation can be combined. When tested on the challenging MPII dataset, our method shows a performance improvement of more than 4% in absolute terms. Finally, we further validate our findings by applying the proposed techniques for large-scale object recognition on the ImageNet dataset, on which we report a reduction of error rate by 4%.

Motivation & Objective

Enable highly accurate binary networks under low-memory, low-compute constraints for pose estimation and image recognition.
Propose and validate methodological improvements to binarization that surpass the prior state of the art on MPII and ImageNet.
Explore combining binarization with knowledge distillation to boost performance.
Demonstrate that the approach generalizes across tasks and architectures.

Proposed method

Adopt a strong binary baseline for HourGlass-based pose estimation, built from binary convolutional blocks.
Replace ReLU with PReLU to stabilize binarized training.
Use reverse-order initialization: binarize the features first and the weights second.
Apply smooth progressive quantization by approximating sgn with a tunable tanh-based function and gradually increasing its parameter lambda.
Stack multiple binary HourGlass networks to refine predictions.
Investigate knowledge distillation from real-valued or binary teachers to binary students using soft labels.
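
The progressive quantization step listed above can be illustrated with a minimal sketch. It assumes the smooth surrogate is tanh(lambda * x), which approaches sgn(x) as lambda grows. The geometric ramp in `lambda_schedule`, its start and end values, and all function names are hypothetical choices for illustration, not the schedule or code used in the paper.

```python
import math


def soft_sign(x: float, lam: float) -> float:
    """Smooth stand-in for sgn(x); tanh(lam * x) sharpens as lam increases."""
    return math.tanh(lam * x)


def hard_sign(x: float) -> float:
    """Binarization target. Mapping 0 to +1 is an assumed convention."""
    return 1.0 if x >= 0 else -1.0


def lambda_schedule(step: int, total_steps: int,
                    lam_start: float = 1.0, lam_end: float = 100.0) -> float:
    """Hypothetical geometric ramp from lam_start to lam_end over training."""
    t = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return lam_start * (lam_end / lam_start) ** t


def max_gap(values, lam: float) -> float:
    """Largest distance between the smooth surrogate and the hard sign."""
    return max(abs(soft_sign(v, lam) - hard_sign(v)) for v in values)


# Toy activations; the gap to the hard sign shrinks as lambda is ramped up.
activations = [-2.0, -0.5, -0.1, 0.1, 0.5, 2.0]
gaps = [max_gap(activations, lambda_schedule(s, 5)) for s in range(5)]

for step, gap in enumerate(gaps):
    print(f"step={step} lambda={lambda_schedule(step, 5):.2f} max_gap={gap:.2e}")
```

With a small lambda the surrogate is nearly linear around zero, so gradients flow easily; as lambda increases, the function saturates toward the hard sign that the final binary network uses.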

Experimental results

Research questions

RQ1 Can training binary networks with improved activations, initialization, progressive quantization, and stacking close the gap to real-valued networks on pose estimation and ImageNet?
RQ2 How does combining binarization with knowledge distillation affect performance?
RQ3 Are the proposed improvements task- and architecture-agnostic across pose estimation and large-scale image classification?
RQ4 What is the effect of progressively binarizing features and weights on training stability and accuracy?

Key findings

On MPII, the method improves PCKh by more than 4 percentage points in absolute terms over the state-of-the-art binary baseline.
Replacing ReLU with PReLU yields notable accuracy gains and improves training stability.
Reverse-order initialization (features first, weights second) adds about 0.8 percentage points in PCKh.
Progressive binarization provides an additional ~0.4 percentage points in PCKh.
Stacking two and three binary HourGlass networks yields 1.5 and 1.9 percentage point gains, respectively.
Combining binarization with distillation provides further improvements (up to 0.6% with a binary student and real-valued teacher; additional gains in multi-stack setups).
For ImageNet, the approach yields up to a 4% absolute reduction in error rate over the previous state-of-the-art for both AlexNet and ResNet-18 when using binary networks.
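
The distillation setup mentioned in the findings above (a student trained on a teacher's soft labels) can be sketched for the classification case as follows. This is the generic temperature-softened formulation, assumed here for illustration; the review does not specify the exact loss used in the paper, and pose estimation would operate on heatmaps rather than class logits. The function names, the temperature value, and the toy logits are all hypothetical.

```python
import math


def softmax(logits, temperature: float = 1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits,
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's soft labels and the student's output."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Toy logits: a student that agrees with the teacher gets a lower loss.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

In practice such a term is usually mixed with the ordinary loss on ground-truth labels; the weighting between the two is another tunable choice not covered by this sketch.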

This review was created by AI and reviewed by human editors.