QUICK REVIEW

[Paper Review] Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

Siyuan Qiao, Huiyu Wang|arXiv (Cornell University)|Mar 25, 2019

References: 60 | Cited by: 123

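One-line summary

The paper introduces Weight Standardization (WS) and Batch-Channel Normalization (BCN) to enable effective micro-batch training, supporting them with a theoretical analysis of loss-landscape smoothing and with empirical gains across vision tasks. WS and BCN aim to reproduce BN-like benefits without requiring large batch sizes.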

ABSTRACT

Batch Normalization (BN) has become an out-of-box technique to improve deep network training. However, its effectiveness is limited for micro-batch training, i.e., each GPU typically has only 1-2 images for training, which is inevitable for many computer vision tasks, e.g., object detection and semantic segmentation, constrained by memory consumption. To address this issue, we propose Weight Standardization (WS) and Batch-Channel Normalization (BCN) to bring two success factors of BN into micro-batch training: 1) the smoothing effects on the loss landscape and 2) the ability to avoid harmful elimination singularities along the training trajectory. WS standardizes the weights in convolutional layers to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients; BCN combines batch and channel normalizations and leverages estimated statistics of the activations in convolutional layers to keep networks away from elimination singularities. We validate WS and BCN on comprehensive computer vision tasks, including image classification, object detection, instance segmentation, video recognition and semantic segmentation. All experimental results consistently show that WS and BCN improve micro-batch training significantly. Moreover, using WS and BCN with micro-batch training is even able to match or outperform the performances of BN with large-batch training.

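Motivation and objectives

Motivates the need for normalization techniques that work well even when each GPU processes a micro-batch of only 1-2 images.
Carries the BN-like benefits (smoothing of the loss landscape and avoidance of elimination singularities) over to the micro-batch regime.
Proposes WS, which standardizes convolutional weights, and introduces BCN, which combines batch and channel statistics, to improve training stability and performance.
Evaluates WS and BCN on a range of computer vision tasks to verify practical gains.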

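Proposed method

Proposes Weight Standardization (WS): the convolutional weights W are reparameterized as WS(W), standardizing W to zero mean and unit variance for each output channel.
Introduces Batch-Channel Normalization (BCN): combines batch statistics and per-channel statistics to estimate the mean and variance of the activations.
Provides a theoretical analysis showing that WS reduces the Lipschitz constants of the loss and its gradients, smoothing the loss landscape.
Analyzes elimination singularities and shows that BN keeps the network away from them; argues that WS/BCN extend the same property to the micro-batch setting.
Compares WS with Weight Normalization (WN) and Centered Weight Normalization (CWN).
Shows that WS+BCN matches or outperforms both large-batch BN and micro-batch GN.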
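
The WS reparameterization described above (zero mean and unit variance per output channel) can be illustrated with a minimal NumPy sketch. The function name standardize_weights, the eps value, and the (out_channels, in_channels, kh, kw) weight layout are assumptions made for illustration, not details taken from the paper's code.

```python
import numpy as np

def standardize_weights(weight, eps=1e-5):
    """Standardize conv weights per output channel (illustrative sketch).

    weight: array of shape (out_channels, in_channels, kh, kw); this layout
    is an assumption. Each output channel's weights are shifted to zero mean
    and scaled to (approximately) unit variance.
    """
    out_channels = weight.shape[0]
    flat = weight.reshape(out_channels, -1)       # one row per output channel
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    standardized = (flat - mean) / (std + eps)    # eps guards against zero std
    return standardized.reshape(weight.shape)

# Example: random weights with a non-zero mean and large scale.
rng = np.random.default_rng(0)
w = rng.normal(loc=0.3, scale=2.0, size=(8, 4, 3, 3))
w_hat = standardize_weights(w)
per_channel_mean = w_hat.reshape(8, -1).mean(axis=1)
per_channel_std = w_hat.reshape(8, -1).std(axis=1)
```

In a training loop the standardized weights would be recomputed from the raw weights on every forward pass, so that gradients flow through the standardization; this sketch shows only the forward computation.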

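Experimental results

Research questions

RQ1: Can WS and BCN reproduce the benefits of BN (loss-landscape smoothing and avoidance of elimination singularities) in micro-batch training?
RQ2: With small batch sizes, do WS and BCN improve training speed and final accuracy across diverse vision tasks?
RQ3: How do WS and BCN compare with existing normalization methods (GN/LN) and with large-batch BN?
RQ4: What is the theoretical effect of WS on Lipschitz constants and elimination singularities?
RQ5: Are WS and BCN effective when added on top of the standard normalization layers in common CNN architectures?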

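Key findings

WS lowers the Lipschitz constants of the loss and its gradients, smoothing the optimization landscape.
WS and BCN keep networks away from elimination singularities and improve training stability.
GN+WS can match or exceed large-batch BN on several tasks.
BCN provides additional gains over GN and BN in both large-batch and micro-batch settings.
The empirical evaluation covers image classification, object detection, instance segmentation, video recognition, and semantic segmentation, and shows consistent improvements with WS and BCN.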
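
The BCN idea referred to in these findings (combining batch and channel normalization, using estimated activation statistics when the batch is tiny) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's exact algorithm: the class name BatchChannelNormSketch, the momentum-based running estimates, the group count, and the NCHW layout are all hypothetical choices, and the learnable scale and shift parameters are omitted.

```python
import numpy as np

class BatchChannelNormSketch:
    """Hypothetical, simplified BCN-style layer for NCHW activations."""

    def __init__(self, num_channels, num_groups, momentum=0.1, eps=1e-5):
        assert num_channels % num_groups == 0
        self.num_groups = num_groups
        self.momentum = momentum
        self.eps = eps
        # Running estimates of per-channel statistics (assumed update rule).
        self.running_mean = np.zeros(num_channels)
        self.running_var = np.ones(num_channels)

    def __call__(self, x, training=True):
        n, c, h, w = x.shape
        if training:
            # Fold the current micro-batch statistics into the estimates.
            m = self.momentum
            batch_mean = x.mean(axis=(0, 2, 3))
            batch_var = x.var(axis=(0, 2, 3))
            self.running_mean = (1 - m) * self.running_mean + m * batch_mean
            self.running_var = (1 - m) * self.running_var + m * batch_var
        # Batch part: normalize with the estimated per-channel statistics.
        mean = self.running_mean.reshape(1, c, 1, 1)
        var = self.running_var.reshape(1, c, 1, 1)
        x = (x - mean) / np.sqrt(var + self.eps)
        # Channel part: per-sample normalization over groups of channels.
        xg = x.reshape(n, self.num_groups, -1)
        g_mean = xg.mean(axis=2, keepdims=True)
        g_var = xg.var(axis=2, keepdims=True)
        xg = (xg - g_mean) / np.sqrt(g_var + self.eps)
        return xg.reshape(n, c, h, w)

# Example: a micro-batch of 2 samples with 8 channels.
rng = np.random.default_rng(0)
layer = BatchChannelNormSketch(num_channels=8, num_groups=4)
x = rng.normal(loc=1.0, scale=3.0, size=(2, 8, 5, 5))
y = layer(x)
```

The batch part relies on estimates accumulated over many iterations rather than on the statistics of a single 1-2 image batch, which is the motivation the review gives for BCN in the micro-batch setting.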

This review was generated by AI and checked by a human editor.