QUICK REVIEW

[Paper Review] Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Guangyong Chen, Pengfei Chen|arXiv (Cornell University)|May 14, 2019

Advanced Neural Network Applications | Cited by 61

One-Sentence Summary

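Proposes an Independent-Component (IC) layer that combines Batch Normalization and Dropout and is placed before each weight layer to whiten that layer's inputs (make them more independent), yielding faster convergence and better generalization for ResNet variants on CIFAR and ImageNet.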

ABSTRACT

In this work, we propose a novel technique to boost training efficiency of a neural network. Our work is based on an excellent idea that whitening the inputs of neural networks can achieve a fast convergence speed. Given the well-known fact that independent components must be whitened, we introduce a novel Independent-Component (IC) layer before each weight layer, whose inputs would be made more independent. However, determining independent components is a computationally intensive task. To overcome this challenge, we propose to implement an IC layer by combining two popular techniques, Batch Normalization and Dropout, in a new manner that we can rigorously prove that Dropout can quadratically reduce the mutual information and linearly reduce the correlation between any pair of neurons with respect to the dropout layer parameter $p$. As demonstrated experimentally, the IC layer consistently outperforms the baseline approaches with more stable training process, faster convergence speed and better convergence limit on CIFAR10/100 and ILSVRC2012 datasets. The implementation of our IC layer makes us rethink the common practices in the design of neural networks. For example, we should not place Batch Normalization before ReLU since the non-negative responses of ReLU will make the weight layer updated in a suboptimal way, and we can achieve better performance by combining Batch Normalization and Dropout together as an IC layer.
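
The linear correlation claim can be checked in a few lines. This is a sketch under stated assumptions, not the paper's proof: the inputs $x_i, x_j$ ($i \neq j$) have zero mean, as BatchNorm approximately provides, with variances $\sigma_i^2, \sigma_j^2$; the masks $m_i, m_j$ are independent of each other and of $x$; and $p$ is taken to be the probability of keeping a neuron, so $p = 1$ means no dropout. The inverted-dropout rescaling by $1/p$ is omitted because it does not change a correlation. The mutual-information result is not reproduced here.

$$
\begin{aligned}
\hat{x}_i &= m_i x_i, \qquad m_i \sim \mathrm{Bernoulli}(p) \\
\mathbb{E}[\hat{x}_i] &= p\,\mathbb{E}[x_i] = 0 \\
\operatorname{Cov}(\hat{x}_i, \hat{x}_j) &= \mathbb{E}[m_i]\,\mathbb{E}[m_j]\,\mathbb{E}[x_i x_j] = p^2 \operatorname{Cov}(x_i, x_j) \\
\operatorname{Var}(\hat{x}_i) &= \mathbb{E}[m_i^2]\,\mathbb{E}[x_i^2] = p\,\sigma_i^2 \\
\operatorname{corr}(\hat{x}_i, \hat{x}_j) &= \frac{p^2 \operatorname{Cov}(x_i, x_j)}{\sqrt{p\,\sigma_i^2 \cdot p\,\sigma_j^2}} = p \operatorname{corr}(x_i, x_j)
\end{aligned}
$$

The zero-mean assumption is what lets the covariance and variance reduce to second moments, which is one way to see why BatchNorm is applied before Dropout inside the IC layer.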

Research Motivation and Goals

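Revive the idea of whitening by making layer inputs more independent, rather than by strictly decorrelating activations as a preprocessing step.
Develop a computationally efficient IC layer that combines BatchNorm and Dropout to reduce the mutual information and pairwise correlation between neurons.
Demonstrate the technique on modern convolutional architectures (ResNet variants) on CIFAR-10/100 and ILSVRC2012 (ImageNet).
Provide guidance for network design by rethinking where BatchNorm and the activation should sit relative to the weight layer.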

Proposed Method

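Define the Independent-Component (IC) layer as BatchNorm followed by Dropout, applied before each weight layer.
Prove theoretically that Dropout reduces the mutual information between any pair of neurons quadratically in the Dropout parameter p (by a factor of p^2) and their correlation linearly (by a factor of p).
Argue, and show empirically, that placing the IC layer before the weight layer gives faster convergence and more stable training than the conventional BatchNorm-before-activation setup.
Modify the ResNet architecture to incorporate IC layers while keeping the parameter count comparable, for a fair comparison.
Validate empirically on CIFAR-10/100 and ILSVRC2012, reporting training stability, convergence speed, and generalization.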
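
The IC layer definition above can be sketched in plain Python. This is a minimal training-mode illustration under assumptions, not the authors' implementation: `batch_norm`, `dropout`, `ic_layer`, and `dense` are hypothetical helpers; BatchNorm omits its learnable scale/shift and running statistics; Dropout is the common "inverted" variant with `keep_prob` as the probability of keeping a unit; and a dense weight layer stands in for the Conv2D layer used in the paper's ResNets.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance.

    Uses the current mini-batch statistics (training mode) and omits the
    learnable scale/shift and running averages of a real BatchNorm layer.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (col[i] - mean) / math.sqrt(var + eps)
    return out


def dropout(batch, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, scale by 1/keep_prob."""
    return [[v / keep_prob if rng.random() < keep_prob else 0.0 for v in row]
            for row in batch]


def ic_layer(batch, keep_prob, rng):
    """Hypothetical IC layer: BatchNorm followed by Dropout (training mode)."""
    return dropout(batch_norm(batch), keep_prob, rng)


def dense(batch, weights):
    """Weight layer: multiply each row (length d) by a d x k weight matrix."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
            for row in batch]


# Usage: ReLU -> IC -> weight layer (a dense layer standing in for Conv2D).
rng = random.Random(0)
x = [[rng.gauss(3.0, 2.0) for _ in range(4)] for _ in range(8)]     # 8 samples, 4 features
h = [[max(0.0, v) for v in row] for row in x]                       # ReLU
w = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]  # 4 -> 3 weights
y = dense(ic_layer(h, keep_prob=0.95, rng=rng), w)
print(len(y), len(y[0]))  # 8 3
```

The ordering in the usage line mirrors the ReLU-IC-Conv2D residual-unit configuration described in the findings: the weight layer sees centered, partially decorrelated inputs instead of the non-negative ReLU outputs directly.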

Experimental Results

Research Questions

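RQ1: Does combining BatchNorm and Dropout into an IC layer placed before the weight layer improve training stability and convergence speed compared with the standard scheme of placing BatchNorm before the activation?
RQ2: How does the IC layer affect the mutual information and correlation between neuron activations during training?
RQ3: Can the IC layer improve the performance of ResNet-family architectures on CIFAR-10/100 and ImageNet without significantly increasing model complexity?
RQ4: On large-scale datasets, what is the empirical effect of the IC layer on convergence behavior and final generalization?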

Key Findings

The IC layer can reduce mutual information between any two neurons by a factor of p^2 and reduce their correlation by a factor of p.
IC-based ResNet variants show more stable training, faster convergence, and better convergence limits on CIFAR-10/100 compared to baselines.
Among IC-augmented residual units, the ReLU-IC-Conv2D configuration often provides the most stable training and strongest accuracy gains on CIFAR datasets.
On ILSVRC2012 (ImageNet), the IC-layer implementation converges faster and shows better convergence behavior than a cited Dropout/BatchNorm baseline.
Overall, placing the IC layer before the weight layer, rather than before activation, yields practical training benefits and challenges the traditional BatchNorm-before-activation practice.
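
The first finding's correlation part can be sanity-checked with a small Monte Carlo simulation. This sketch rests on assumptions not taken from the paper: two zero-mean, unit-variance Gaussian "neurons" with correlation `rho` stand in for BatchNorm outputs, each neuron gets an independent Bernoulli mask with keep probability `keep_prob` (treated as the paper's p), and `dropout_correlation` is a hypothetical helper. It checks only the linear correlation claim, not the mutual-information one.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    return cov / math.sqrt(vx * vy)


def dropout_correlation(n=100_000, rho=0.8, keep_prob=0.5, seed=0):
    """Return (corr before dropout, corr after dropout) for two zero-mean neurons."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)                                        # neuron i ~ N(0, 1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)   # neuron j, corr rho with i
    # Independent Bernoulli(keep_prob) mask per neuron, with inverted-dropout scaling.
    xd = [a / keep_prob if rng.random() < keep_prob else 0.0 for a in xs]
    yd = [b / keep_prob if rng.random() < keep_prob else 0.0 for b in ys]
    return pearson(xs, ys), pearson(xd, yd)


before, after = dropout_correlation()
print(f"before={before:.3f}  after={after:.3f}  ratio={after / before:.3f}")
```

With `keep_prob=0.5` the printed ratio should land close to 0.5, matching a correlation scaled by p; with `keep_prob=1.0` the two correlations coincide because nothing is dropped.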


This review was generated by AI and reviewed by human editors.