QUICK REVIEW

[Paper Review] Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Igor Gitman, Boris Ginsburg|arXiv (Cornell University)|Sep 24, 2017

Advanced Neural Network Applications | References: 18 | Cited by: 55

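One-Sentence Summary

This paper compares Batch Normalization (BN) and Weight Normalization (WN) methods for ResNet-50 on ImageNet. BN reaches a significantly higher test accuracy (by about 6 percentage points), even though WN trains faster and achieves higher training accuracy. The paper also documents stability problems and incomplete activation normalization for WN in deep networks.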

ABSTRACT

Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth bounded operation. Such a drawback of BN motivates us to explore recently proposed weight normalization algorithms (WN algorithms), i.e. weight normalization, normalization propagation and weight normalization with translated ReLU. These algorithms don't slow-down training iterations and were experimentally shown to outperform BN on relatively small networks and datasets. However, it is not clear if these algorithms could replace BN in practical, large-scale applications. We answer this question by providing a detailed comparison of BN and WN algorithms using ResNet-50 network trained on ImageNet. We found that although WN achieves better training accuracy, the final test accuracy is significantly lower ($\approx 6\%$) than that of BN. This result demonstrates the surprising strength of the BN regularization effect which we were unable to compensate for using standard regularization techniques like dropout and weight decay. We also found that training of deep networks with WN algorithms is significantly less stable compared to BN, limiting their practical applications.

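Motivation and Objectives

Compare BN and WN for large-scale image classification, motivated by the significant share of training run-time that BN consumes.
Assess whether WN algorithms can replace BN in practice for deep networks.
Study the stability and normalization behavior of WN in deep architectures.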

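Method

Train ResNet-50 on ImageNet with BN and with three WN variants: weight normalization, normalization propagation (NP), and weight normalization with translated ReLU (TReLU).
Use an equivalent training setup for all methods: SGD with momentum, 120 epochs, batch size 256, and identical data preprocessing, so that the comparison is fair.
Analyze training curves, convergence speed, and final test accuracy.
Inspect activation normalization and per-layer output norms during training to assess how effective the normalization is.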
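
To make the difference between the two families concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single fully connected layer with unit-Gaussian inputs; the function names (`weight_norm`, `batch_norm`, `normprop_relu`) and the shapes are hypothetical. Weight normalization reparameterizes each weight row as w = g * v / ||v||, while batch normalization standardizes activations using batch statistics. The ReLU correction uses the analytic mean and standard deviation of ReLU applied to a standard Gaussian, in the spirit of normalization propagation.

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: w = g * v / ||v||, one norm per output unit (row)."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms

def batch_norm(z, eps=1e-5):
    """Batch normalization without learned scale/shift:
    standardize every feature using statistics computed over the batch."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

# Analytic mean and standard deviation of ReLU(z) for z ~ N(0, 1).
RELU_MEAN = 1.0 / np.sqrt(2.0 * np.pi)
RELU_STD = np.sqrt(0.5 * (1.0 - 1.0 / np.pi))

def normprop_relu(z):
    """ReLU followed by a data-independent correction that restores
    zero mean and unit variance, assuming z is standard Gaussian."""
    return (np.maximum(z, 0.0) - RELU_MEAN) / RELU_STD

rng = np.random.default_rng(0)
x = rng.standard_normal((4096, 64))  # illustrative batch of unit-Gaussian inputs
v = rng.standard_normal((32, 64))    # unnormalized direction parameters
g = np.ones(32)                      # per-unit scale parameters

w = weight_norm(v, g)                # every row of w has norm g
z = x @ w.T                          # pre-activations, about N(0, 1) under the assumption
y_wn = normprop_relu(z)              # normalized by constants, no batch statistics needed
y_bn = batch_norm(np.maximum(z, 0.0))  # normalized by measured batch statistics
```

The weight-based path needs no pass over the batch to compute statistics, which is why it does not slow down training iterations. The trade-off is that its correction only holds while the Gaussian-input assumption holds, whereas BN re-measures the statistics at every layer.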

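Experimental Results

Research Questions

RQ1: Can WN algorithms match or exceed BN on large-scale image classification tasks?
RQ2: Do WN algorithms provide faster or more stable training for deep networks such as ResNet-50 on ImageNet?
RQ3: When WN is used, can the regularization effect of BN be reproduced with standard regularization techniques (dropout, weight decay)?
RQ4: Do WN methods fully normalize activations in deep networks, or do they allow per-layer output norms to diverge?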

Key Findings

Model	Dataset	Top-1 test accuracy
BN	ImageNet	~73%
WN	ImageNet	~67%

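On ImageNet, WN converges faster and reaches higher training accuracy than BN.
For ResNet-50 on ImageNet, the final top-1 test accuracy of WN is about 6 percentage points lower than that of BN.
BN provides a stronger regularization effect, which could not be reproduced for WN with dropout or increased weight decay.
WN is unstable in deep networks and does not fully normalize activations; output norms can grow from layer to layer.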
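
The last finding can be illustrated with a toy experiment. This is a hypothetical sketch, not the paper's measurement: it assumes a stack of residual blocks whose branch uses unit-norm weight rows and the analytic ReLU correction for standard-Gaussian inputs, with no renormalization after the residual sum. The helper names (`unit_rows`, `layer_rms`) and all sizes are made up for illustration.

```python
import numpy as np

# Analytic mean and standard deviation of ReLU(z) for z ~ N(0, 1).
RELU_MEAN = 1.0 / np.sqrt(2.0 * np.pi)
RELU_STD = np.sqrt(0.5 * (1.0 - 1.0 / np.pi))

def unit_rows(v):
    """Scale every row of v to unit Euclidean norm (weight normalization with g = 1)."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def layer_rms(depth=8, width=256, batch=512, seed=0):
    """Return the RMS activation at the input and after each residual block."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    rms = [float(np.sqrt(np.mean(x ** 2)))]
    for _ in range(depth):
        w = unit_rows(rng.standard_normal((width, width)))
        # The correction assumes unit-variance input, which stops being
        # true once the residual sum has increased the scale of x.
        branch = (np.maximum(x @ w.T, 0.0) - RELU_MEAN) / RELU_STD
        x = x + branch  # residual sum, no renormalization afterwards
        rms.append(float(np.sqrt(np.mean(x ** 2))))
    return rms

rms_per_layer = layer_rms()
for block, value in enumerate(rms_per_layer):
    print(f"after block {block}: RMS activation = {value:.2f}")
```

In this toy setting the activation scale starts near 1 and keeps growing with depth, because the fixed constants cannot adapt to the scale that accumulates through the skip connections. BN avoids this by re-estimating the statistics from each batch.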


This review was generated by AI and reviewed by a human editor.