QUICK REVIEW

[Paper Review] Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Igor Gitman, Boris Ginsburg|arXiv (Cornell University)|Sep 24, 2017

Advanced Neural Network Applications | References: 18 | Citations: 55

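One-line summary

This paper compares Batch Normalization (BN) and Weight Normalization (WN) for ResNet-50 on ImageNet. It finds that WN trains faster and reaches higher training accuracy, but BN achieves markedly higher test accuracy (by about 6 percentage points). It also reports that WN is less stable in deep networks and does not fully normalize activations.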

ABSTRACT

Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth bounded operation. Such a drawback of BN motivates us to explore recently proposed weight normalization algorithms (WN algorithms), i.e. weight normalization, normalization propagation and weight normalization with translated ReLU. These algorithms don't slow-down training iterations and were experimentally shown to outperform BN on relatively small networks and datasets. However, it is not clear if these algorithms could replace BN in practical, large-scale applications. We answer this question by providing a detailed comparison of BN and WN algorithms using ResNet-50 network trained on ImageNet. We found that although WN achieves better training accuracy, the final test accuracy is significantly lower ($\approx 6\%$) than that of BN. This result demonstrates the surprising strength of the BN regularization effect which we were unable to compensate for using standard regularization techniques like dropout and weight decay. We also found that training of deep networks with WN algorithms is significantly less stable compared to BN, limiting their practical applications.
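
To make the contrast in the abstract concrete, here is a minimal pure-Python sketch of the two operations. It assumes the standard formulations: training-mode BN normalizes each feature using statistics computed over the batch of activations, while weight normalization reparameterizes a weight vector as w = g * v / ||v|| and never touches batch statistics. The function names and the tiny example data are illustrative assumptions, not code from the paper.

```python
import math

def batch_norm(column, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode BN for one feature: statistics are taken over the batch."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in column]

def weight_norm(v, g=1.0):
    """Weight normalization: w = g * v / ||v||, independent of the batch."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

def linear(w, x):
    """Dot product of a weight vector with one input example."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy data: a batch of three 2-dimensional inputs.
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
v = [3.0, 4.0]

# WN: only the weights are rescaled, so the outputs keep a data-dependent scale.
w = weight_norm(v, g=2.0)
wn_out = [linear(w, x) for x in batch]

# BN: the pre-activations themselves are normalized across the batch.
bn_out = batch_norm([linear(v, x) for x in batch])

bn_mean = sum(bn_out) / len(bn_out)
wn_mean = sum(wn_out) / len(wn_out)
```

In this sketch the cost of `weight_norm` depends only on the number of weights, whereas `batch_norm` has to read every activation in the batch, which is consistent with the abstract's point that BN is memory-bandwidth bound. Note also that `bn_out` is centered by construction and `wn_out` is not.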

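Research motivation and objectives

Motivate a comparison of BN and WN on large-scale image classification.
Evaluate whether WN algorithms can replace BN in practice for deep networks.
Investigate the stability and normalization behavior of WN in deep architectures.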

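Proposed method

Train ResNet-50 on ImageNet with BN and with three WN variants: weight normalization, normalization propagation (NP), and WN with translated ReLU (TReLU).
Use one training setup for a fair comparison: SGD with momentum, 120 epochs, batch size 256, and identical data preprocessing.
Analyze training curves, convergence speed, and final test accuracy.
Track activation normalization and per-layer output norms throughout training to assess how well each method normalizes.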
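
As a rough sense of scale for the shared training setup, the sketch below turns the stated settings into an iteration count. The epoch count, batch size, and optimizer family come from this review. The training-set size of 1,281,167 images is an assumption (the usual ILSVRC-2012 training split), and the `TrainConfig` class is a hypothetical helper, not something defined in the paper. Learning rate, momentum value, and weight decay are not given here, so they are deliberately left out.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 120                 # from the review
    batch_size: int = 256             # from the review
    optimizer: str = "sgd_momentum"   # from the review; hyperparameters not stated
    train_images: int = 1_281_167     # ASSUMPTION: ILSVRC-2012 training split size

    def steps_per_epoch(self) -> int:
        # Assumes the final partial batch is kept rather than dropped.
        return math.ceil(self.train_images / self.batch_size)

    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch()

cfg = TrainConfig()
steps = cfg.steps_per_epoch()
total = cfg.total_steps()
```

Under these assumptions each run is on the order of several hundred thousand iterations, which is why a per-iteration cost difference between BN and WN matters for total training time.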

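Experimental results

Research questions

RQ1: Can weight normalization algorithms match or exceed batch normalization on a large-scale image classification task?
RQ2: Do WN algorithms train faster and more stably in deep networks such as ResNet-50 on ImageNet?
RQ3: Can the regularization effect of BN be reproduced under WN with standard regularization techniques (dropout, weight decay)?
RQ4: Does WN fully normalize activations in deep networks, or can the output norm diverge from layer to layer?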

Key findings

Normalization (ResNet-50)	Dataset	Top-1 test accuracy
BN	ImageNet	~73%
WN	ImageNet	~67%
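
The gap in the table can be read in more than one way, so the short calculation below separates the readings. It uses only the approximate values from the table (about 73% and 67%). The variable names are illustrative, and the results inherit the table's rounding.

```python
# Approximate top-1 test accuracies from the table above (percent).
bn_top1 = 73.0
wn_top1 = 67.0

# Absolute gap, in percentage points.
gap_points = bn_top1 - wn_top1

# The same gap relative to BN's accuracy, in percent.
relative_accuracy_drop = gap_points / bn_top1 * 100

# The same gap expressed as a relative increase in top-1 error.
bn_error = 100 - bn_top1
wn_error = 100 - wn_top1
relative_error_increase = (wn_error - bn_error) / bn_error * 100

print(f"gap: {gap_points:.1f} percentage points")
print(f"relative accuracy drop: {relative_accuracy_drop:.1f}%")
print(f"relative error increase: {relative_error_increase:.1f}%")
```

The difference is 6 percentage points in absolute terms, which is larger when expressed relative to BN's accuracy or to BN's error rate. Reading it as "6%" can therefore understate how far apart the two models are.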

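On the ImageNet training curves, WN converges faster than BN and reaches higher training accuracy.
For ResNet-50 on ImageNet, the final top-1 test accuracy of WN is about 6 percentage points lower than that of BN.
BN provides a stronger regularization effect, which could not be reproduced under WN by adding dropout or increasing weight decay.
WN is unstable in deep networks: activations are not fully normalized, and the output norm can grow from layer to layer.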
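
The last finding can be illustrated with a toy residual stack. This is a sketch of one possible mechanism under simplifying assumptions (random weights, gain g = 1, no training, non-negative input), not a reproduction of the paper's experiment or its numbers. Every row of each weight matrix is normalized to unit length, yet nothing constrains the activations, so the output norm can only stay equal or grow as blocks of the form y = x + relu(W x) are stacked.

```python
import math
import random

def unit_rows(rows):
    """Weight-normalize each row to unit L2 norm (gain g = 1)."""
    out = []
    for r in rows:
        n = math.sqrt(sum(x * x for x in r))
        out.append([x / n for x in r])
    return out

def residual_block(x, weights):
    """y = x + relu(W x), where W is already weight-normalized."""
    branch = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return [xi + bi for xi, bi in zip(x, branch)]

def l2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))

rng = random.Random(0)   # fixed seed so the run is repeatable
dim, depth = 16, 8       # hypothetical toy sizes

# Non-negative input, standing in for a post-ReLU activation.
x = [abs(rng.gauss(0.0, 1.0)) for _ in range(dim)]
norms = [l2(x)]

for _ in range(depth):
    W = unit_rows([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)])
    x = residual_block(x, W)
    norms.append(l2(x))  # per-layer output norm, the quantity tracked above

for layer, n in enumerate(norms):
    print(f"layer {layer}: output norm {n:.3f}")
```

Because both the input and the ReLU branch are non-negative here, each component of the output is at least as large as the corresponding input component, so the norm never shrinks. BN avoids this kind of drift by rescaling activations directly, whereas normalizing the weights alone gives no such guarantee.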


This review was generated by AI and checked by a human editor.