QUICK REVIEW

[Paper Review] To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout

Prateek Jain, Vivek Kulkarni|arXiv (Cornell University)|Mar 6, 2015
Privacy-Preserving Technologies in Data | References: 27 | Cited by: 25
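One-Sentence Summary

This paper lays a theoretical foundation for dropout in deep learning by proving that dropout improves robustness and consistency in one-hidden-layer neural networks and acts as a stabilizing regularizer in convex empirical risk minimization (ERM). It shows that dropout achieves fast generalization error rates and enables differentially private learning without relying on strong convexity, and that it outperforms L2 regularization in empirical evaluations on benchmark datasets.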

ABSTRACT

Training deep belief networks (DBNs) requires optimizing a non-convex function with an extremely large number of parameters. Naturally, existing gradient descent (GD) based methods are prone to arbitrarily poor local minima. In this paper, we rigorously show that such local minima can be avoided (up to an approximation error) by using the dropout technique, a widely used heuristic in this domain. In particular, we show that by randomly dropping a few nodes of a one-hidden-layer neural network, the training objective function, up to a certain approximation error, decreases by a multiplicative factor. On the flip side, we show that for training convex empirical risk minimizers (ERM), dropout in fact acts as a "stabilizer" or regularizer. That is, a simple dropout based GD method for convex ERMs is stable in the face of arbitrary changes to any one of the training points. Using the above assertion, we show that dropout provides fast rates for generalization error in learning (convex) generalized linear models (GLM). Moreover, using the above-mentioned stability properties of dropout, we design dropout based differentially private algorithms for solving ERMs. The learned GLM thus preserves privacy of each of the individual training points while providing accurate predictions for new test points. Finally, we empirically validate our stability assertions for dropout in the context of convex ERMs and show that, surprisingly, dropout significantly outperforms (in terms of prediction accuracy) the L2 regularization based methods for several benchmark datasets.

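Motivation and Objectives

  • Explain theoretically why dropout helps deep belief networks (DBNs) avoid poor local minima in non-convex optimization.
  • Establish a theoretical basis for dropout as a stabilizing regularizer in the convex ERM setting, ensuring robustness to perturbations of the training data.
  • Design a novel dropout-based differentially private learning algorithm for convex ERM that does not require a strong convexity assumption.
  • Empirically validate, across multiple datasets and model types, the advantage of dropout over L2 regularization in stability and generalization performance.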

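Proposed Method

  • Prove that, for one-hidden-layer neural networks, dropout decreases the objective function by a multiplicative factor with constant probability when the current iterate is far from the optimum.
  • Show that dropout in convex ERM induces a form of weighted L2 regularization, which yields fast excess risk rates.
  • Use the algorithmic stability of dropout under removal of training points (leave-one-out, or LOO, stability) to construct differentially private learning algorithms.
  • Obtain the privacy guarantee from only a lower bound on the expected minimum eigenvalue of the Hessian, avoiding reliance on strong convexity.
  • Compare deterministic and standard dropout variants in the experiments, measuring stability under random and adversarial removal of training data.
  • Measure stability by the marginal error in test error (the difference in error when part of the training data is removed), covering logistic regression, linear regression, and DBNs.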
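The training and stability-measurement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm or experimental protocol: it assumes logistic regression as the convex ERM, feature dropout with "inverted" rescaling by 1/keep_prob, plain SGD, and random (not adversarial) removal. The names `dropout_sgd_logreg`, `error_rate`, `removal_gap`, and `make_data` are hypothetical helpers introduced here.

```python
import numpy as np


def dropout_sgd_logreg(X, y, keep_prob=0.5, lr=0.05, epochs=20, seed=0):
    """SGD for logistic regression with feature dropout (assumed variant).

    Labels y are in {0, 1}. Each step masks the input features with an
    independent Bernoulli(keep_prob) vector and rescales by 1/keep_prob,
    so the learned weights are used unchanged at test time.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            mask = rng.random(d) < keep_prob          # which features survive
            x_drop = X[i] * mask / keep_prob          # dropped, rescaled input
            p = 1.0 / (1.0 + np.exp(-(x_drop @ w)))   # predicted probability
            w -= lr * (p - y[i]) * x_drop             # logistic-loss gradient step
    return w


def error_rate(w, X, y):
    """Fraction of points misclassified by the linear rule sign(<x, w>)."""
    return float(np.mean((X @ w > 0).astype(int) != y))


def removal_gap(X_tr, y_tr, X_te, y_te, frac_removed=0.5, seed=0):
    """Test error after randomly removing training data, minus full-data test error."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[0]
    keep = rng.permutation(n)[: int(round(n * (1.0 - frac_removed)))]
    w_full = dropout_sgd_logreg(X_tr, y_tr, seed=seed)
    w_sub = dropout_sgd_logreg(X_tr[keep], y_tr[keep], seed=seed)
    return error_rate(w_sub, X_te, y_te) - error_rate(w_full, X_te, y_te)


def make_data(n, seed):
    """Synthetic linearly separable data (illustrative only, not from the paper)."""
    rng = np.random.default_rng(seed)
    w_true = np.array([2.0, -1.0, 0.5, 1.5, -2.0])
    X = rng.standard_normal((n, 5))
    y = (X @ w_true > 0).astype(int)
    return X, y
```

A small gap from `removal_gap` indicates that the learned predictor changes little when part of the training set is removed, which is the sense of stability the experiments compare against L2 regularization.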

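Experimental Results

Research Questions

  • RQ1: Under what conditions does dropout prevent convergence to poor local minima in non-convex deep learning?
  • RQ2: How does dropout affect stability and generalization error in convex empirical risk minimization (ERM) problems?
  • RQ3: Can dropout be used to design differentially private learning algorithms that do not require strong convexity?
  • RQ4: How robust is dropout to perturbations of the training data compared with L2 regularization?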

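Key Findings

  • When the iterate is far from the optimum, dropout decreases the training objective of a one-hidden-layer network by a multiplicative factor with constant probability, which gives a theoretical basis for avoiding poor local minima.
  • In the convex ERM setting, dropout yields excess risk rates comparable to weighted L2 regularization, with generalization bounds tighter than those of prior work.
  • Dropout-based algorithms achieve differential privacy without relying on strong convexity, requiring only a lower bound on the expected minimum eigenvalue of the Hessian.
  • Empirically, dropout is more stable than L2 regularization on logistic regression and linear regression tasks under both random and adversarial data removal.
  • On the Atheist dataset, dropout-based models achieve higher accuracy than L2-regularized models, and the advantage persists even when up to 50% of the data is removed.
  • On MNIST, when only 50% of the training data is used, dropout improves test accuracy by 16% over standard SGD, showing strong robustness and generalization.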
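The link between dropout and weighted L2 regularization mentioned above can be checked numerically in the simplest case. The sketch below assumes squared loss on a single example and independent Bernoulli(keep_prob) feature masks without rescaling; it uses the standard bias-variance identity rather than the paper's own derivation, and the function names are hypothetical. In expectation over the mask b, (y - <x * b, w>)^2 equals (y - keep_prob * <x, w>)^2 plus keep_prob * (1 - keep_prob) * sum_j x_j^2 w_j^2, which is an L2 penalty weighted by the squared feature values.

```python
import itertools

import numpy as np


def expected_dropout_sq_loss(x, y, w, keep_prob):
    """Exact expectation of (y - <x * b, w>)^2 over b ~ Bernoulli(keep_prob)^d.

    Enumerates all 2^d masks, so it is only meant for small d.
    """
    d = len(x)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=d):
        b = np.array(bits)
        kept = int(b.sum())
        prob = keep_prob ** kept * (1.0 - keep_prob) ** (d - kept)
        total += prob * (y - (x * b) @ w) ** 2
    return total


def weighted_l2_form(x, y, w, keep_prob):
    """Closed form: loss at the shrunken prediction plus a data-weighted L2 penalty."""
    fit = (y - keep_prob * (x @ w)) ** 2
    penalty = keep_prob * (1.0 - keep_prob) * np.sum((x ** 2) * (w ** 2))
    return fit + penalty
```

The two functions agree for any x, y, w, and keep_prob, because the masked prediction has mean keep_prob * <x, w> and variance keep_prob * (1 - keep_prob) * sum_j x_j^2 w_j^2.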


This review was generated by AI and reviewed by human editors.