QUICK REVIEW

[Paper Review] Improved training of binary networks for human pose estimation and image recognition

Adrian Bulat, Georgios Tzimiropoulos | arXiv (Cornell University) | Apr 11, 2019

Human Pose and Action Recognition | References: 44 | Cited by: 40

One-Sentence Summary

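The paper improves binarized neural networks with a set of techniques (better activation choice, reverse-order initialization, progressive quantization, and network stacking), reports significant accuracy gains on MPII pose estimation and ImageNet classification, and also explores a knowledge-distillation strategy.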

ABSTRACT

Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin. However, under low memory and limited computational power constraints, the accuracy on the same problems drops considerably. In this paper, we propose a series of techniques that significantly improve the accuracy of binarized neural networks (i.e., networks where both the features and the weights are binary). We evaluate the proposed improvements on two diverse tasks: fine-grained recognition (human pose estimation) and large-scale image recognition (ImageNet classification). Specifically, we introduce a series of novel methodological changes including: (a) more appropriate activation functions, (b) reverse-order initialization, (c) progressive quantization, and (d) network stacking, and show that these additions significantly improve existing state-of-the-art network binarization techniques. Additionally, for the first time, we also investigate the extent to which network binarization and knowledge distillation can be combined. When tested on the challenging MPII dataset, our method shows a performance improvement of more than 4% in absolute terms. Finally, we further validate our findings by applying the proposed techniques for large-scale object recognition on the ImageNet dataset, on which we report a reduction of error rate by 4%.

Motivation and Goals

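Enable high-accuracy binary networks for pose estimation and image recognition under resource constraints.
Propose and validate methodological improvements to binarization that surpass the previous state of the art on MPII and ImageNet.
Explore combining binarization with knowledge distillation to further improve performance.
Show that the approach generalizes across tasks and architectures.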

Proposed Method

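Adopt strong baselines: an HourGlass-based network for pose estimation and binary convolutional blocks.
Replace ReLU with PReLU to stabilize the training of binarized networks.
Use reverse-order initialization: binarize the features first, then the weights.
Achieve smooth, progressive quantization by approximating sign(·) with a tunable tanh-based function and gradually increasing λ.
Stack multiple binary HourGlass networks to refine the predictions.
Study soft-label knowledge distillation from a real-valued or a binary teacher to a binary student.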
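The binarization steps listed above (PReLU, reverse-order binarization, and progressive tanh-based quantization) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the surrogate tanh(λx), the geometric schedule in `lambda_schedule` with its `lam_start`/`lam_end` values, the fixed PReLU slope, the dense matrix product standing in for a binary convolution, and the helper names `quantize` and `binary_layer` are all hypothetical choices made for illustration.

```python
from typing import Optional

import numpy as np


def soft_sign(x: np.ndarray, lam: float) -> np.ndarray:
    """Smooth surrogate for sign(x): tanh(lam * x) tends to sign(x) as lam grows."""
    return np.tanh(lam * x)


def hard_sign(x: np.ndarray) -> np.ndarray:
    """Binarize to {-1, +1}; zeros are mapped to +1 in this sketch."""
    return np.where(x >= 0, 1.0, -1.0)


def lambda_schedule(step: int, total_steps: int,
                    lam_start: float = 1.0, lam_end: float = 100.0) -> float:
    """Hypothetical geometric schedule raising lambda from lam_start to lam_end."""
    t = min(step / max(total_steps - 1, 1), 1.0)
    return lam_start * (lam_end / lam_start) ** t


def prelu(x: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    """PReLU with a fixed slope here; in real training alpha would be learned."""
    return np.where(x >= 0, x, alpha * x)


def quantize(t: np.ndarray, lam: Optional[float]) -> np.ndarray:
    """lam=None keeps real values, lam=inf gives the hard sign, else the tanh surrogate."""
    if lam is None:
        return t
    if np.isinf(lam):
        return hard_sign(t)
    return soft_sign(t, lam)


def binary_layer(x: np.ndarray, w: np.ndarray,
                 lam_x: Optional[float], lam_w: Optional[float]) -> np.ndarray:
    """Dense stand-in for a binary conv block: quantize inputs and weights, multiply, PReLU."""
    return prelu(quantize(x, lam_x) @ quantize(w, lam_w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    w = rng.standard_normal((8, 3))
    steps = 5
    # Stage 1 (reverse order): progressively binarize the features; weights stay real.
    for step in range(steps):
        y = binary_layer(x, w, lam_x=lambda_schedule(step, steps), lam_w=None)
    # Stage 2: features are fully binary; weights are now binarized progressively.
    for step in range(steps):
        y = binary_layer(x, w, lam_x=np.inf, lam_w=lambda_schedule(step, steps))
    # A real training loop would compute a loss and update w at every step above.
    y_bin = binary_layer(x, w, lam_x=np.inf, lam_w=np.inf)  # fully binary inference
    print(y.shape, y_bin.shape)
```

Because tanh(λx) approaches sign(x) as λ grows, each quantized tensor moves gradually from real-valued to binary instead of switching abruptly; once a stage finishes, its tensor is held at the hard sign while the next one is quantized.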

Experimental Results

Research Questions

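RQ1: Can training binary networks with improved activations, initialization, progressive quantization, and stacking narrow the gap to real-valued networks on pose estimation and ImageNet?
RQ2: How does combining binarization with knowledge distillation affect performance?
RQ3: Are the proposed improvements task- and architecture-agnostic across pose estimation and large-scale image classification?
RQ4: How does progressively quantizing the features and weights affect training stability and accuracy?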

Key Findings

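On MPII, the method improves PCKh by up to 4.0 percentage points (absolute) over the state-of-the-art binary baseline.
Replacing ReLU with PReLU significantly improves accuracy and training stability.
Reverse-order initialization (features first, then weights) adds roughly 0.8 percentage points of PCKh.
Progressive binarization adds roughly 0.4 more percentage points of PCKh.
Stacking two and three binary HourGlass networks yields gains of 1.5 and 1.9 percentage points, respectively.
Combining binarization with distillation brings further gains (up to 0.6% with a binary student and a real-valued teacher, with additional gains in the multi-stack setting).
On ImageNet, binarized AlexNet and ResNet-18 trained with the method reduce the error rate by up to 4% in absolute terms compared with the previous state of the art.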
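The soft-label distillation mentioned in the findings can be illustrated with a standard Hinton-style loss. This is a sketch under stated assumptions: it uses a classification setting (as on ImageNet), the review does not give the paper's exact loss, and the function names `softmax` and `kd_loss` as well as the `temperature` and `beta` defaults are hypothetical. Heatmap-based pose estimation would need a heatmap-level loss, which this sketch does not cover.

```python
import numpy as np


def softmax(z: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
            labels: np.ndarray, temperature: float = 4.0, beta: float = 0.5) -> float:
    """Hinton-style distillation: hard-label cross-entropy mixed with a soft-label KL term.

    temperature and beta are hypothetical defaults, not values taken from the paper.
    """
    n = student_logits.shape[0]
    eps = 1e-12
    # Hard-label cross-entropy on the student's ordinary (T = 1) softmax.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + eps))
    # Soft labels: the teacher's softened distribution; KL(teacher || student).
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1))
    # The T^2 factor keeps the soft-label gradients on a comparable scale.
    return float((1.0 - beta) * ce + beta * temperature ** 2 * kl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.standard_normal((8, 10))  # e.g. logits of a binary student
    teacher = rng.standard_normal((8, 10))  # e.g. logits of a real-valued or binary teacher
    labels = rng.integers(0, 10, size=8)
    print(kd_loss(student, teacher, labels))
```

The loss only consumes the teacher's logits, so the same function applies whether the teacher is real-valued or binary; only the source of `teacher_logits` changes.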


This review was generated by AI and checked by human editors.