QUICK REVIEW

[Paper Review] Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Hyeonwoo Noh, Tackgeun You|arXiv (Cornell University)|Oct 14, 2017

Domain Adaptation and Few-Shot Learning|References: 38|Cited by: 78

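One-Sentence Summary

The paper interprets regularization by noise (e.g., dropout) as optimizing a lower bound of the marginal likelihood, and proposes importance weighted stochastic gradient descent (IWSGD), which uses multiple noise samples per training example to tighten that bound and improve generalization.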

ABSTRACT

Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is known as a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the presence of two conflicting objectives---optimizing to true data distribution and preventing overfitting by regularization. This paper addresses the above issues by 1) interpreting that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and 2) proposing a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications.

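Motivation and Objectives

Give a probabilistic interpretation of noise-based regularization as the optimization (maximization) of a lower bound of the marginal likelihood.
Introduce and derive importance weighted stochastic gradient descent (IWSGD), which uses multiple noise samples per training example.
Specialize the method to dropout and demonstrate better generalization on vision tasks.
Show that increasing the number of noise samples per training example tightens the bound, and reach results close to the state of the art on the CIFAR datasets.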
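
The bound behind these objectives can be written out as follows. This is a sketch in notation assumed for this review (x is the input, y the target, θ the parameters, ε the injected noise, S the number of noise samples per example); the paper's own symbols and equation numbering may differ.

```latex
\begin{align}
\log p(y \mid x, \theta)
  &= \log \mathbb{E}_{\epsilon}\!\left[ p(y \mid x, \epsilon, \theta) \right] \\
  &\ge \mathbb{E}_{\epsilon_1, \dots, \epsilon_S}\!\left[
       \log \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, \epsilon_s, \theta)
     \right] =: \mathcal{L}(S), \\
\mathcal{L}(S+1) &\ge \mathcal{L}(S) \ge \mathcal{L}(1)
  = \mathbb{E}_{\epsilon}\!\left[ \log p(y \mid x, \epsilon, \theta) \right].
\end{align}
```

Here L(1) corresponds to conventional training with one noise sample per example (e.g., standard dropout). Larger S gives a bound that is at least as tight, by Jensen's inequality applied to the average over samples.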

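Proposed Method

Treat hidden units with injected noise as stochastic activations and derive the marginal likelihood over the noise.
Apply the reparameterization trick to rewrite the objective as a marginalization over noise samples (Eq. 3).
Use multiple noise samples with normalized importance weights to derive the IWSGD objective as a lower bound of the marginal likelihood (Eq. 4).
Compute the gradient as a weighted average over the samples, with normalized importance weights as the weights (Eqs. 7 and 8).
At inference, use standard dropout-style scaling (no additional sampling at test time).
Specialize the method to dropout by sampling multiple dropout masks per training example and weighting their gradient contributions.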
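
The steps above can be illustrated with a minimal NumPy sketch for a single training example. It is not the authors' implementation: the two-layer network, the function name `iwsgd_objective`, the weight matrices `W1` and `W2`, and the use of inverted dropout (scaling by `1 / keep_prob` during training) are all assumptions made for illustration. The sketch draws S dropout masks, computes the log-likelihood of the label under each mask, and returns the multi-sample bound together with the normalized importance weights. Differentiating the bound gives a gradient equal to the weighted average of the per-sample log-likelihood gradients under those weights.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def logsumexp(a):
    # Numerically stable log(sum(exp(a))) for a 1-D array.
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def iwsgd_objective(x, y, W1, W2, num_samples, keep_prob, rng):
    """Multi-sample bound for one example (hypothetical two-layer net).

    Returns (bound, weights, log_liks), where
      log_liks[s] = log p(y | x, mask_s),
      bound       = log( (1/S) * sum_s p(y | x, mask_s) ),
      weights[s]  = p(y | x, mask_s) / sum_s' p(y | x, mask_s').
    """
    log_liks = np.empty(num_samples)
    for s in range(num_samples):
        h = np.maximum(W1 @ x, 0.0)                # ReLU hidden layer
        mask = rng.random(h.shape) < keep_prob     # one dropout mask per sample
        h = h * mask / keep_prob                   # inverted dropout (assumption)
        log_liks[s] = log_softmax(W2 @ h)[y]       # log-likelihood of the label
    total = logsumexp(log_liks)
    bound = total - np.log(num_samples)
    weights = np.exp(log_liks - total)             # normalized importance weights
    return bound, weights, log_liks
```

A possible call, with arbitrary shapes chosen only for the example:

```python
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W1 = rng.normal(size=(8, 5))
W2 = rng.normal(size=(3, 8))
bound, weights, log_liks = iwsgd_objective(x, 1, W1, W2, num_samples=8, keep_prob=0.5, rng=rng)
```

With `num_samples=1` the bound reduces to the single-sample log-likelihood and the only weight is 1, which matches ordinary dropout training. With more samples, masks under which the label is more likely receive larger weights.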

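Experimental Results

Research Questions

RQ1: Does injecting noise into hidden units optimize a lower bound of the true objective, and can that bound be tightened by using multiple noise samples per training example?
RQ2: Does importance weighting over multiple noise samples (IWSGD) generalize better than standard dropout training?
RQ3: Is the proposed training method easy to integrate with existing dropout-based models, and does it improve performance on vision tasks?
RQ4: Does increasing the number of noise samples per example consistently improve performance without architectural changes?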

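Key Findings

The paper interprets noise-injected hidden units as stochastic activations and shows that standard dropout optimizes a lower bound of the marginal likelihood.
It proposes and derives IWSGD, which uses multiple noise samples (S>1) to tighten the bound.
With multiple samples, IWSGD generally achieves higher accuracy than standard dropout and is insensitive to the dropout rate for Wide ResNet on CIFAR.
Applying IWSGD (S=8) to Wide ResNet on CIFAR-10/100 yields results close to the state of the art on CIFAR.
IWSGD improves results on VQA, image captioning, and action recognition benchmarks, with consistent gains as S increases across multiple experiments.
Simply increasing the number of training iterations (×4 iterations) does not consistently outperform multi-sample IWSGD.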


This review was generated by AI and reviewed by human editors.