QUICK REVIEW

[Paper Review] Adding noise to the input of a model trained with a regularized objective

Salah Rifai, Xavier Glorot|arXiv (Cornell University)|Apr 16, 2011

Gaussian Processes and Bayesian Inference | References: 10 | Cited by: 65

One-Sentence Summary

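The paper proposes a new regularization technique that improves the generalization of neural networks by adding noise to the input and explicitly penalizing the L2 norm of the model's Jacobian with respect to the input. Using a second-order Taylor expansion of the noisy objective, the method approximates and controls higher-order regularization terms (in particular a Hessian penalty) without computing them explicitly, which yields better robustness and higher test accuracy at minimal computational cost.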
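A minimal sketch of the kind of training criterion the summary describes: squared error plus a squared-Frobenius Jacobian penalty, both evaluated at a Gaussian-corrupted input. It assumes a one-hidden-layer MLP with sigmoid hidden units and linear outputs; squared error, the weight `lam`, and all function names are hypothetical choices for illustration, not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, b1, W2, b2, x):
    """One hidden layer of sigmoid units followed by a linear output layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(v * hj for v, hj in zip(row, h)) + c for row, c in zip(W2, b2)]
    return h, y


def input_jacobian(W1, b1, W2, b2, x):
    """Analytic dy/dx = W2 . diag(h * (1 - h)) . W1, returned as a list of output rows."""
    h, _ = forward(W1, b1, W2, b2, x)
    d = [hj * (1.0 - hj) for hj in h]  # sigmoid derivative at each hidden unit
    return [
        [sum(row2[j] * d[j] * W1[j][i] for j in range(len(h))) for i in range(len(x))]
        for row2 in W2
    ]


def noisy_jacobian_loss(W1, b1, W2, b2, x, t, sigma, lam, rng):
    """Squared error plus lam * ||J||_F^2, both at x + eps with eps ~ N(0, sigma^2 I)."""
    x_tilde = [xi + rng.gauss(0.0, sigma) for xi in x]  # corrupted input
    _, y = forward(W1, b1, W2, b2, x_tilde)
    sq_err = 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
    J = input_jacobian(W1, b1, W2, b2, x_tilde)
    jac_pen = sum(v * v for row in J for v in row)  # squared Frobenius norm
    return sq_err + lam * jac_pen
```

With `sigma = 0` and `lam = 0` this reduces to the plain squared error; raising `sigma` adds implicit noise regularization, while `lam` scales the explicit Jacobian penalty on top of it.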

ABSTRACT

Regularization is a well-studied problem in the context of neural networks. It is usually used to improve generalization performance when the number of training samples is relatively small or the samples are heavily contaminated with noise. The regularization of a parametric model can be achieved in different ways, such as early stopping (Morgan and Bourlard, 1990), weight decay, and output smoothing, all of which are used to avoid overfitting during the training of the considered model. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters (Krogh and Hertz, 1991). Using Bishop's approximation (Bishop, 1995) of the objective function when a restricted type of noise is added to the input of a parametric function, we derive the higher-order terms of the Taylor expansion and analyze the coefficients of the regularization terms induced by the noisy input. In particular, we study the effect of penalizing the Hessian of the mapping function with respect to the input in terms of generalization performance. We also show how this coefficient can be controlled independently by explicitly penalizing the Jacobian of the mapping function on corrupted inputs.
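For reference, the classical second-order result behind Bishop's approximation can be written out as follows. This assumes a scalar output f, squared error with target t, and isotropic Gaussian noise ε ~ N(0, σ²I); the notation is ours, and the higher-order terms the paper analyzes are only indicated by O(σ⁴).

```latex
\begin{aligned}
f(x+\varepsilon) &= f(x) + J(x)\,\varepsilon + \tfrac{1}{2}\,\varepsilon^{\top} H(x)\,\varepsilon + O(\|\varepsilon\|^{3}),
\qquad J = (\nabla_x f)^{\top},\quad H = \nabla_x^{2} f,\\
\mathbb{E}_{\varepsilon}\!\left[\tfrac{1}{2}\big(f(x+\varepsilon)-t\big)^{2}\right]
&= \tfrac{1}{2}\big(f(x)-t\big)^{2}
 + \frac{\sigma^{2}}{2}\Big(\|J(x)\|^{2} + \big(f(x)-t\big)\operatorname{tr} H(x)\Big)
 + O(\sigma^{4}).
\end{aligned}
```

In Bishop's argument, f(x) approaches E[t | x] at the minimum of the expected error, so the (f(x) - t) tr H(x) term vanishes on average to this order and the O(σ²) correction reduces to a Tikhonov penalty on J; terms involving the norm of H appear only at O(σ⁴).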

Motivation and Objectives

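Improve the generalization performance of neural networks trained on limited or noisy datasets.
Provide a theoretically grounded way to control the higher-order regularization terms induced by input noise.
Enable independent control of the Jacobian and Hessian norms of the model's mapping function.
Substantially reduce computational cost compared with explicitly computing higher-order derivatives, while retaining the regularization benefits.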

Proposed Method

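Approximate the effect of input corruption with a Taylor expansion of the loss around the clean input, in powers of the noise.
Derive the penalty terms induced by input noise as functions of the Jacobian and Hessian of the model output with respect to the input.
Explicitly penalize the L2 norm of the model's Jacobian with respect to the input, enforcing local invariance to small input perturbations.
Apply Bishop's (1995) approximation to relate input noise to an effective regularized objective.
Control the regularization strength by scaling the hyperparameters of the Jacobian and Hessian penalties independently.
Use a weak-noise (small-noise) limit to derive analytically tractable regularization terms without computing higher-order derivatives directly.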
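A short expansion indicates how an explicit Jacobian penalty evaluated on corrupted inputs brings in a Hessian term. It assumes a smooth mapping f: R^d to R^m with input Jacobian J, output Hessians H_k (k = 1, ..., m), and isotropic noise ε ~ N(0, σ²I); it is a sketch under these assumptions, not a transcription of the paper's equations.

```latex
\mathbb{E}_{\varepsilon}\!\left[\|J(x+\varepsilon)\|_F^{2}\right]
  = \|J(x)\|_F^{2}
  + \sigma^{2}\sum_{i=1}^{d}\Big(\big\|\partial_{x_i} J(x)\big\|_F^{2}
  + \big\langle J(x),\, \partial_{x_i}^{2} J(x)\big\rangle_F\Big)
  + O(\sigma^{4}),
\qquad
\sum_{i=1}^{d}\big\|\partial_{x_i} J(x)\big\|_F^{2} = \sum_{k=1}^{m}\big\|H_k(x)\big\|_F^{2}.
```

Weighted by a penalty coefficient λ, the corrupted-input penalty therefore contributes λ‖J(x)‖²_F plus a Hessian-norm term with coefficient λσ² (the inner-product term involves third derivatives), so the penalty weight and the noise level together set the two coefficients.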

Experimental Results

Research Questions

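RQ1: How does adding noise to the input affect the regularization terms in the objective function?
RQ2: Can higher-order derivatives of the model output (such as the Hessian) be regularized effectively without computing them explicitly?
RQ3: How does jointly penalizing the Jacobian and Hessian norms affect generalization and robustness?
RQ4: How does the method compare with standard regularization techniques such as weight decay or early stopping in terms of test error?
RQ5: Can the regularization be tuned independently to control the flatness of the loss surface around training points?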

Key Findings

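On MNIST, the method reaches a test error of 1.19%, outperforming a standard MLP (1.82%) and the other regularized variants.
On MNIST-BINARY, combining noise with Jacobian regularization reduces the error to 1.51%, compared with 2.01% for a standard MLP.
As shown in Figure 2, the regularized models are more robust to input corruption, with lower generalization error on noisy inputs.
Activation histograms on MNIST show that the regularized model's activations concentrate in the linear and saturated regimes, indicating a flatter and more stable representation.
The theoretical analysis confirms that input noise induces regularization terms involving the Jacobian and the Hessian, and that these terms can be controlled independently through the explicit penalty.
The method is a computationally efficient alternative to explicitly computing higher-order derivatives, adding only a small overhead relative to standard training.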
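To illustrate why first derivatives at corrupted inputs can stand in for explicit second derivatives, the sketch below compares an explicit finite-difference Hessian with a Monte Carlo estimate of its squared Frobenius norm that uses only gradients at noisy points. This difference-based estimator is a swapped-in illustration of the idea, not necessarily the paper's exact penalty; `grad`, `hessian_fd`, and `hessian_norm_sq_stochastic` are hypothetical names, and the function is assumed to be scalar-valued.

```python
import random


def frob_sq(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(v * v for row in M for v in row)


def hessian_fd(grad, x, h=1e-5):
    """Explicit Hessian from central differences of the gradient (2 * d gradient calls)."""
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        gp, gm = grad(xp), grad(xm)
        for j in range(d):
            H[j][i] = (gp[j] - gm[j]) / (2.0 * h)
    return H


def hessian_norm_sq_stochastic(grad, x, sigma, n_samples, rng):
    """Monte Carlo estimate of ||H||_F^2 from gradients at corrupted inputs.

    Relies on E||grad(x + eps) - grad(x)||^2 = sigma^2 ||H||_F^2 + O(sigma^4)
    for eps ~ N(0, sigma^2 I); exact in expectation when f is quadratic.
    """
    g0 = grad(x)
    total = 0.0
    for _ in range(n_samples):
        x_tilde = [xi + rng.gauss(0.0, sigma) for xi in x]  # corrupted input
        total += sum((a - b) ** 2 for a, b in zip(grad(x_tilde), g0))
    return total / (n_samples * sigma ** 2)
```

For a quadratic f(x) = ½ xᵀAx with symmetric A, `grad(x)` is Ax and both routines should agree with ‖A‖²_F. For a neural network, the stochastic estimate carries an O(σ²) bias plus Monte Carlo variance that shrinks as `n_samples` grows, but it never forms the d × d Hessian.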

This review was generated by AI and reviewed by human editors.