QUICK REVIEW

[Paper Review] Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Guangyong Chen, Pengfei Chen | arXiv (Cornell University) | May 15, 2019

Advanced Neural Network Applications | References: 21 | Citations: 39

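One-Sentence Summary

This paper proposes the Independent-Component (IC) layer, which combines Batch Normalization and Dropout to make layer inputs more independent; placed before each weight layer, it speeds up convergence and improves training stability in experiments on CIFAR-10/100 and ImageNet.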

ABSTRACT

In this work, we propose a novel technique to boost training efficiency of a neural network. Our work is based on an excellent idea that whitening the inputs of neural networks can achieve a fast convergence speed. Given the well-known fact that independent components must be whitened, we introduce a novel Independent-Component (IC) layer before each weight layer, whose inputs would be made more independent. However, determining independent components is a computationally intensive task. To overcome this challenge, we propose to implement an IC layer by combining two popular techniques, Batch Normalization and Dropout, in a new manner that we can rigorously prove that Dropout can quadratically reduce the mutual information and linearly reduce the correlation between any pair of neurons with respect to the dropout layer parameter $p$. As demonstrated experimentally, the IC layer consistently outperforms the baseline approaches with more stable training process, faster convergence speed and better convergence limit on CIFAR10/100 and ILSVRC2012 datasets. The implementation of our IC layer makes us rethink the common practices in the design of neural networks. For example, we should not place Batch Normalization before ReLU since the non-negative responses of ReLU will make the weight layer updated in a suboptimal way, and we can achieve better performance by combining Batch Normalization and Dropout together as an IC layer.

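Motivation and Objectives

Motivate the work by the idea that whitening the inputs to each layer improves training efficiency.
Introduce an IC layer built from BatchNorm and Dropout to reduce the dependence between neurons.
Show that placing the IC layer before each weight layer improves convergence and generalization.
Show that training stability improves for ResNet-family networks on CIFAR-10/100 and ILSVRC2012.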

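Proposed Method

Define the Independent-Component (IC) layer as BatchNorm followed by Dropout, placed before each weight layer.
Prove theoretically that Dropout reduces the mutual information between neuron outputs by a factor of $p^2$ and their correlation by a factor of $p$.
Provide a formal analysis of how much information is retained after Dropout and how this affects training dynamics.
Restructure ResNet architectures by inserting IC layers in place of the conventional normalization/activation ordering.
Evaluate empirically on CIFAR-10/100 and ILSVRC2012 using modified ResNet and ResNet-B architectures with a controlled dropout parameter $p$ and training schedule.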
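
The IC layer described in this section can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names (`batch_norm`, `dropout`, `ic_layer`, `ic_block`) are hypothetical, the learnable scale and shift of BatchNorm are omitted, only training-mode behavior is shown, and `keep_prob` assumes that the dropout parameter denotes the probability of keeping a unit. The ReLU -> IC -> weight ordering in `ic_block` is likewise an assumption, chosen to be consistent with the abstract's advice not to place Batch Normalization before ReLU.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of the batch to zero mean and unit variance.

    Simplification: no learnable scale/shift and no running statistics.
    """
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [
        sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(row, means, variances)]
        for row in batch
    ]


def dropout(batch, keep_prob, rng):
    """Gate every activation independently: keep it with probability keep_prob, else zero it."""
    return [[v if rng.random() < keep_prob else 0.0 for v in row] for row in batch]


def ic_layer(batch, keep_prob, rng):
    """IC layer (training mode): BatchNorm followed by Dropout."""
    return dropout(batch_norm(batch), keep_prob, rng)


def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


def linear(batch, weights, bias):
    """Weight layer: each row of `weights` produces one output feature."""
    return [
        [sum(w * v for w, v in zip(w_row, row)) + b for w_row, b in zip(weights, bias)]
        for row in batch
    ]


def ic_block(batch, weights, bias, keep_prob, rng):
    """Assumed ordering: ReLU -> IC (BatchNorm -> Dropout) -> weight layer."""
    return linear(ic_layer(relu(batch), keep_prob, rng), weights, bias)


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[1.0, -2.0], [3.0, 6.0], [5.0, 10.0]]
    out = ic_block(batch, weights=[[1.0, -1.0]], bias=[0.5], keep_prob=0.9, rng=rng)
    print(len(out), len(out[0]))  # batch size and number of output features
```

The point of the sketch is the ordering: the weight layer only ever sees inputs that have already been normalized and randomly gated, which is how the summary describes the IC layer's role.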

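Experimental Results

Research Questions

RQ1: Does combining BatchNorm and Dropout as an IC layer before each weight layer improve training stability and convergence speed?
RQ2: How does the IC layer affect mutual information and inter-neuron correlation during training?
RQ3: Do IC-augmented ResNets outperform conventional ResNets in accuracy and convergence behavior on CIFAR-10/100 and ImageNet?
RQ4: What are the practical implications for where BatchNorm should be placed relative to ReLU and the weight layers?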

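Key Findings

The IC layer reduces the mutual information between neuron outputs by a factor of $p^2$ and the correlation between pairs of neurons by a factor of $p$.
Compared with the baselines, IC-augmented ResNets train more stably, converge faster, and reach a better convergence limit on CIFAR-10/100.
On ILSVRC2012, IC-enabled ResNet variants show better convergence and validation performance than comparable baselines under a standard training protocol.
The study challenges the conventional placement of BatchNorm before the activation and argues that the IC layer should be placed before the weight layer for faster updates and better optimization dynamics.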
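
The linear correlation claim can be checked numerically. As a rough argument, take two zero-mean, unit-variance inputs with correlation c and gate each with an independent Bernoulli variable that equals 1 with probability p: the covariance of the gated pair is p^2 c and each gated variance is p, so the gated correlation is p c. The sketch below simulates this with the standard library only. It assumes that p is the probability of keeping a unit (the summary only calls it the dropout parameter), the helper names are hypothetical, and it checks only the correlation claim, not the mutual-information one.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pair(rho, rng):
    """Draw two standard normal values whose correlation is rho."""
    a = rng.gauss(0.0, 1.0)
    b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return a, b


def correlation_ratio(rho, keep_prob, n_samples, seed=0):
    """Ratio of the correlation after independent gating to the correlation before it."""
    rng = random.Random(seed)
    raw_x, raw_y, gated_x, gated_y = [], [], [], []
    for _ in range(n_samples):
        a, b = correlated_pair(rho, rng)
        raw_x.append(a)
        raw_y.append(b)
        # Each neuron gets its own independent gate (assumption: keep with prob. keep_prob).
        gated_x.append(a if rng.random() < keep_prob else 0.0)
        gated_y.append(b if rng.random() < keep_prob else 0.0)
    return pearson(gated_x, gated_y) / pearson(raw_x, raw_y)


if __name__ == "__main__":
    for p in (0.9, 0.7, 0.5):
        # The printed ratio should land close to p, up to sampling noise.
        print(p, round(correlation_ratio(0.8, p, 100_000), 3))
```

The inputs here are already standardized, which mirrors the role BatchNorm plays inside the IC layer before Dropout is applied.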


This review was generated by AI and checked by a human editor.