QUICK REVIEW

[Paper Review] The Pitfalls of Simplicity Bias in Neural Networks

Harshay Shah, Kaustav Tamuly | arXiv (Cornell University) | Jun 13, 2020

Adversarial Robustness in Machine Learning | References: 81 | Cited by: 88

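ONE-SENTENCE SUMMARY

This paper formalizes Simplicity Bias (SB) in neural networks trained with SGD, shows that networks can rely excessively on the simplest predictive feature (which leads to brittleness, poor robustness, and even worse generalization), and provides datasets and experiments that demonstrate these pitfalls across architectures and training methods.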

ABSTRACT

Several works have proposed Simplicity Bias (SB)---the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models---to justify why neural networks generalize well [Arpit et al. 2017, Nakkiran et al. 2019, Soudry et al. 2018]. However, the precise notion of simplicity remains vague. Furthermore, previous settings that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks---a widely observed phenomenon in practice [Goodfellow et al. 2014, Jo and Bengio 2017]. We attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by designing datasets that (a) incorporate a precise notion of simplicity, (b) comprise multiple predictive features with varying levels of simplicity, and (c) capture the non-robustness of neural networks trained on real data. Through theory and empirics on these datasets, we make four observations: (i) SB of SGD and variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features. (ii) The extreme aspect of SB could explain why seemingly benign distribution shifts and small adversarial perturbations significantly degrade model performance. (iii) Contrary to conventional wisdom, SB can also hurt generalization on the same data distribution, as SB persists even when the simplest feature has less predictive power than the more complex features. (iv) Common approaches to improve generalization and robustness---ensembles and adversarial training---can fail in mitigating SB and its pitfalls. Given the role of SB in training neural networks, we hope that the proposed datasets and methods serve as an effective testbed to evaluate novel algorithmic approaches aimed at avoiding the pitfalls of SB.

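MOTIVATION AND OBJECTIVES

Give a precise, tunable definition of feature simplicity and predictive power for studying SB.
Design modular synthetic datasets and image-based datasets that combine simple features with complex predictive features.
Demonstrate extreme SB, both theoretically and empirically, across architectures and optimizers.
Connect SB to non-robustness, distribution shift, and adversarial vulnerability, and evaluate common remedies.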

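PROPOSED METHOD

Introduce a formal notion of feature simplicity based on the minimum number of linear pieces in the decision boundary.
Construct multi-dimensional synthetic datasets (e.g., LMS-k, L̂MS-k, MS-(5,7), MS-5) and an MNIST-CIFAR image dataset that combines simple and complex features.
Prove that one-hidden-layer ReLU networks trained with SGD exhibit SB on the LSN dataset.
Demonstrate SB empirically for FCN, CNN, and GRU models under different optimizers and regularization methods.
Analyze robustness, confidence estimates, and generalization under SB, including an analysis of the transferability of universal adversarial perturbations (UAPs).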
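The dataset idea above can be illustrated with a minimal sketch. It assumes a two-coordinate toy dataset in which the first coordinate is a linear feature (one threshold separates the classes) and the second is a k-slab feature (k alternating bands, so k-1 thresholds are needed). The function names, margins, and slab layout are illustrative assumptions, not the authors' released code.

```python
import random

def sample_linear(y, margin=0.1, rng=random):
    """Linear feature: the sign of the coordinate equals the label."""
    return y * rng.uniform(margin, 1.0)

def slab_label(i):
    """Assumed layout: slabs alternate +1, -1, +1, ... from left to right."""
    return 1 if i % 2 == 0 else -1

def sample_slab(y, k, margin=0.05, rng=random):
    """k-slab feature: pick a slab carrying label y, then a point inside it."""
    width = 2.0 / k                       # k equal slabs tile [-1, 1]
    i = rng.choice([s for s in range(k) if slab_label(s) == y])
    lo = -1.0 + i * width + margin        # margin keeps points off slab borders
    hi = -1.0 + (i + 1) * width - margin
    return rng.uniform(lo, hi)

def slab_predict(x, k):
    """Classify a k-slab coordinate by locating its slab (k-1 thresholds)."""
    width = 2.0 / k
    i = min(int((x + 1.0) / width), k - 1)
    return slab_label(i)

def make_dataset(n, k, seed=0):
    """Return n pairs ([linear_coord, slab_coord], label) with labels in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        data.append(([sample_linear(y, rng=rng), sample_slab(y, k, rng=rng)], y))
    return data

data = make_dataset(200, k=5, seed=1)
```

In this sketch both coordinates are fully predictive on their own, so a classifier is free to use either one. That is the setting in which a preference for the simpler coordinate becomes observable.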

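EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: When multiple predictive features are present, do SGD-trained models prefer the simplest predictive feature?
RQ2: How extreme is SB across architectures and training settings, and does it persist when the simple feature has a small margin?
RQ3: How does SB affect robustness under distribution shift or adversarial perturbations, confidence estimates, and generalization?
RQ4: Can ensembles or adversarial training mitigate SB and its pitfalls?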

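KEY FINDINGS

SB can be extreme: neural networks can rely exclusively on the simplest feature and ignore complex predictive features.
Extreme SB is associated with poor robustness to adversarial perturbations and distribution shift.
SB also hurts generalization when the simple feature is less predictive than the complex features.
On the proposed datasets, ensembles and adversarial training do not reliably mitigate SB.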
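One way to probe the first finding is a coordinate-randomization check: shuffle one feature across examples so it no longer correlates with the label, then see whether accuracy changes. The sketch below is an assumption about how such a check could be written and is not taken from this summary. `simple_only_model` is a hypothetical stand-in for a trained network that looks only at the linear coordinate.

```python
import random

def accuracy(predict, data):
    """Fraction of examples whose prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def randomize_coordinate(data, j, seed=0):
    """Copy of data with coordinate j shuffled across examples."""
    rng = random.Random(seed)
    column = [x[j] for x, _ in data]
    rng.shuffle(column)
    return [(x[:j] + [v] + x[j + 1:], y) for (x, y), v in zip(data, column)]

# Toy data (assumed layout): coordinate 0 is linear, coordinate 1 is a 3-slab
# feature with label -1 in the middle slab and +1 in the two outer slabs.
rng = random.Random(0)
data = []
for _ in range(1000):
    y = rng.choice([-1, 1])
    x0 = y * rng.uniform(0.1, 1.0)
    if y == -1:
        x1 = rng.uniform(-0.2, 0.2)
    else:
        x1 = rng.choice([-1, 1]) * rng.uniform(0.5, 1.0)
    data.append(([x0, x1], y))

def simple_only_model(x):
    """Hypothetical model that uses only the linear coordinate."""
    return 1 if x[0] > 0 else -1

base_acc = accuracy(simple_only_model, data)
acc_complex_randomized = accuracy(simple_only_model, randomize_coordinate(data, 1))
acc_simple_randomized = accuracy(simple_only_model, randomize_coordinate(data, 0))
print(base_acc, acc_complex_randomized, acc_simple_randomized)
```

For a model with extreme SB, randomizing the complex coordinate leaves accuracy unchanged, while randomizing the simple coordinate drops accuracy to roughly chance level.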

This review was generated by AI and reviewed by human editors.