QUICK REVIEW

[Paper Review] Effective and Efficient Dropout for Deep Convolutional Neural Networks

Shaofeng Cai, Jinyang Gao | arXiv (Cornell University) | Apr 6, 2019

Advanced Neural Network Applications | References: 49 | Cited by: 55

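One-Sentence Summary

This paper analyzes dropout variants for CNNs, identifies the conflict between Batch Normalization (BN) and dropout, and proposes Drop-Conv2d along with related building blocks (Drop-Neuron, Drop-Channel, Drop-Path) that regularize a range of CNN architectures more effectively at minimal overhead.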

ABSTRACT

Convolutional Neural networks (CNNs) based applications have become ubiquitous, where proper regularization is greatly needed. To prevent large neural network models from overfitting, dropout has been widely used as an efficient regularization technique in practice. However, many recent works show that the standard dropout is ineffective or even detrimental to the training of CNNs. In this paper, we revisit this issue and examine various dropout variants in an attempt to improve existing dropout-based regularization techniques for CNNs. We attribute the failure of standard dropout to the conflict between the stochasticity of dropout and its following Batch Normalization (BN), and propose to reduce the conflict by placing dropout operations right before the convolutional operation instead of BN, or totally address this issue by replacing BN with Group Normalization (GN). We further introduce a structurally more suited dropout variant Drop-Conv2d, which provides more efficient and effective regularization for deep CNNs. These dropout variants can be readily integrated into the building blocks of CNNs and implemented in existing deep learning platforms. Extensive experiments on benchmark datasets including CIFAR, SVHN and ImageNet are conducted to compare the existing building blocks and the proposed ones with dropout training. Results show that our building blocks improve over state-of-the-art CNNs significantly, which is mainly due to the better regularization and implicit model ensemble effect.

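Research Motivation and Objectives

Motivate robust regularization for deep CNNs to combat overfitting and improve generalization.
Systematically analyze dropout variants (neuron, channel, path) in CNNs and how they interact with Batch Normalization and data augmentation.
Develop unified convolutional building blocks that integrate dropout into common CNN architectures effectively and efficiently.
Introduce Drop-Conv2d as a scalable, plug-and-play regularization technique whose paths can be merged back at inference time.
Demonstrate broad empirical gains on standard benchmarks (CIFAR, SVHN, ImageNet) using the proposed blocks.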

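Proposed Method

Formulates CNN transformations in a split-transform-aggregate framework, focusing on channel-level operations.
Compares dropout variants (Drop-Neuron, Drop-Channel, Drop-Path) and analyzes how they interact with Batch Normalization and Group Normalization.
Places dropout before the convolution in each building block to reduce gradient variance and variance shift.
Proposes Drop-Conv2d, which replicates each channel connection into P paths, applies dropout to those paths, and re-aggregates them at inference.
Provides convolutional building blocks with integrated dropout (Drop-Neuron, Drop-Channel) for easy adoption in existing architectures.
Evaluates the proposed blocks on CIFAR, SVHN, and ImageNet to demonstrate the performance gains.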
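
The Drop-Conv2d idea described above (replicate each channel connection into P paths, drop paths during training, merge them at inference) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 1x1 kernels so that the convolution at one spatial location reduces to a channel-mixing matrix, it assumes inverted-dropout rescaling by 1/keep_prob, and every function name here (`make_paths`, `forward_train`, `merge_paths`, `forward_eval`) is hypothetical.

```python
import random

def make_paths(num_paths, c_out, c_in, rng):
    """One independent 1x1 filter bank per path: weights[p][o][i]."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(c_in)]
             for _ in range(c_out)]
            for _ in range(num_paths)]

def forward_train(weights, x, keep_prob, rng):
    """Training pass: each (path, out, in) connection is kept with
    probability keep_prob and rescaled by 1/keep_prob (assumed convention)."""
    c_out = len(weights[0])
    y = [0.0] * c_out
    for path in weights:
        for o, row in enumerate(path):
            for i, w in enumerate(row):
                if rng.random() < keep_prob:
                    y[o] += (w / keep_prob) * x[i]
    return y

def merge_paths(weights):
    """Inference: sum the P paths into a single ordinary weight matrix."""
    c_out, c_in = len(weights[0]), len(weights[0][0])
    return [[sum(path[o][i] for path in weights) for i in range(c_in)]
            for o in range(c_out)]

def forward_eval(merged, x):
    """Inference pass: one plain channel-mixing product, no extra paths."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in merged]

rng = random.Random(0)
weights = make_paths(num_paths=4, c_out=2, c_in=3, rng=rng)
x = [0.5, -1.0, 2.0]                     # channel values at one pixel
merged = merge_paths(weights)            # done once, after training
y_eval = forward_eval(merged, x)
y_train = forward_train(weights, x, keep_prob=0.5, rng=rng)
```

Because the merged matrix has the same shape as a single path, the inference cost in this sketch matches an ordinary layer regardless of P, which is one way to read the "little inference overhead" claim.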

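Experimental Results

Research Questions

RQ1: How does dropout interact with Batch Normalization in CNNs, and why can standard dropout be ineffective in convolutional blocks?
RQ2: Are channel-level and path-level dropout (Drop-Channel, Drop-Path) more effective than neuron-level dropout for regularizing CNNs?
RQ3: Can dropout be integrated into CNN blocks with lower overhead while providing better regularization?
RQ4: Does Drop-Conv2d improve the generalization of standard CNN architectures on benchmarks such as CIFAR, SVHN, and ImageNet?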
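
To make the distinction behind RQ2 concrete, the sketch below contrasts the mask granularity of neuron-level and channel-level dropout on a small channels x height x width feature map stored as nested lists. The helper names and the inverted-dropout rescaling are illustrative assumptions, not code from the paper.

```python
import random

def drop_neuron_mask(channels, height, width, keep_prob, rng):
    """Neuron level: an independent Bernoulli draw for every activation."""
    return [[[1 if rng.random() < keep_prob else 0 for _ in range(width)]
             for _ in range(height)]
            for _ in range(channels)]

def drop_channel_mask(channels, height, width, keep_prob, rng):
    """Channel level: one Bernoulli draw per channel, shared by the whole map."""
    mask = []
    for _ in range(channels):
        keep = 1 if rng.random() < keep_prob else 0
        mask.append([[keep] * width for _ in range(height)])
    return mask

def apply_mask(feature_map, mask, keep_prob):
    """Zero the dropped entries and rescale survivors by 1/keep_prob."""
    return [[[v * m / keep_prob for v, m in zip(row, mrow)]
             for row, mrow in zip(plane, mplane)]
            for plane, mplane in zip(feature_map, mask)]

rng = random.Random(0)
neuron_mask = drop_neuron_mask(3, 2, 2, keep_prob=0.5, rng=rng)
channel_mask = drop_channel_mask(3, 2, 2, keep_prob=0.5, rng=rng)
```

In the channel-level mask every feature map is either kept whole or removed whole, so spatially correlated neighbours cannot stand in for a dropped activation.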

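Key Findings

Drop-Channel and Drop-Path generally outperform Drop-Neuron in CNN training because they align better with the convolutional channel structure and with BN.
Placing dropout before the convolutional layer reduces variance shift and stabilizes BN, improving training efficiency.
Drop-Conv2d, which replicates channel connections into P paths and applies dropout to them, provides stronger regularization with little inference overhead.
BN can suffer variance shift under dropout; this is mitigated by placing dropout correctly or by replacing BN with Group Normalization.
Experiments show that the proposed building blocks yield significant accuracy gains over state-of-the-art CNNs on CIFAR, SVHN, and ImageNet.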
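
The variance shift mentioned in these findings can be reproduced numerically. With inverted dropout, an activation with mean mu and variance sigma^2 has training-time variance (sigma^2 + mu^2) / p - mu^2 for keep probability p, while at inference dropout is the identity and the variance is just sigma^2. A normalization layer that accumulates statistics during training therefore sees a different variance than it meets at test time. The sketch below checks this standard identity by simulation; the Gaussian input, the parameter values, and the function names are illustrative assumptions rather than anything taken from the paper.

```python
import math
import random

def variance(values):
    """Population variance computed with a numerically careful sum."""
    mean = math.fsum(values) / len(values)
    return math.fsum((v - mean) ** 2 for v in values) / len(values)

def dropout_variances(mu, sigma, keep_prob, n, seed=0):
    """Return (training variance, inference variance) of one activation."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    # Training: inverted dropout keeps x with prob keep_prob, scaled by 1/keep_prob.
    train = [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]
    # Inference: dropout is the identity, so the activations are xs themselves.
    return variance(train), variance(xs)

def predicted_train_variance(mu, sigma, keep_prob):
    """Closed form: Var(x * m / p) with m ~ Bernoulli(p) independent of x."""
    return (sigma ** 2 + mu ** 2) / keep_prob - mu ** 2

train_var, eval_var = dropout_variances(mu=1.0, sigma=1.0, keep_prob=0.5, n=100_000)
```

With these illustrative settings the closed form gives a training variance of 3.0 against an inference variance of 1.0, which is the kind of mismatch that moving dropout or switching to Group Normalization is meant to avoid.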


This review was generated by AI and reviewed by a human editor.