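[Paper Explained] Sensitivity and Generalization in Neural Networks: an Empirical Study
The paper empirically links neural network generalization to the sensitivity of the input-output Jacobian and to transitions between linear regions, showing that robustness near the data manifold correlates with better generalization across many architectures and settings.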
In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of complexity related to sensitivity to input perturbations. Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. We find that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that it correlates well with generalization. We further establish that factors associated with poor generalization (such as full-batch training or using random labels) correspond to lower robustness, while factors associated with good generalization (such as data augmentation and ReLU non-linearities) give rise to more robust functions. Finally, we demonstrate how the input-output Jacobian norm can be predictive of generalization at the level of individual test points.
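Motivation and Goals
- Investigate the tension between model capacity and generalization in over-parameterized networks.
- Define and evaluate sensitivity measures tied to input perturbations.
- Examine how sensitivity relates to generalization across architectures, optimizers, and hyper-parameters.
- Assess whether sensitivity measures can predict generalization at the level of individual test points.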
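Proposed Method
- Define two sensitivity measures for fully-connected networks: the norm of the Jacobian of the softmax output with respect to the input, and the number of linear-region transitions along an input trajectory.
- Compute the average Frobenius norm of the Jacobian at test points to measure local sensitivity.
- Count transitions between linear regions by encoding neuron activation patterns along trajectories close to the data manifold.
- Use circular trajectories and ellipses passing through training points to compare sensitivity on and off the data manifold.
- Analyze sensitivity under factors known to affect generalization (for example data augmentation, label quality, ReLU versus saturating activations, mini-batch versus full-batch training).
- Run large-scale experiments with thousands of fully-connected models on several image classification datasets.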
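The first measure above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the helper names (`init_mlp`, `forward`, `jacobian`, `mean_jacobian_norm`), the layer sizes, and the randomly initialized weights standing in for a trained network are all assumptions made here. It computes the Jacobian of the softmax output with respect to the input for a ReLU MLP via the chain rule, then averages its Frobenius norm over a batch of points.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random ReLU MLP parameters (hypothetical stand-in for a trained net)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def forward(params, x):
    """Softmax class probabilities for a single input vector x."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return softmax(h @ W + b)

def jacobian(params, x):
    """d softmax / d x with shape (n_classes, n_inputs), via the chain rule."""
    h = x
    J = np.eye(x.size)                       # d h / d x starts as the identity
    for W, b in params[:-1]:
        pre = h @ W + b
        mask = (pre > 0).astype(x.dtype)     # ReLU derivative (0 or 1 per unit)
        J = (W.T * mask[:, None]) @ J        # zero the rows of inactive units
        h = pre * mask
    W, b = params[-1]
    p = softmax(h @ W + b)
    dsoftmax = np.diag(p) - np.outer(p, p)   # d softmax / d logits
    return dsoftmax @ W.T @ J

def mean_jacobian_norm(params, xs):
    """Average Frobenius norm of the Jacobian over a batch of points."""
    return float(np.mean([np.linalg.norm(jacobian(params, x)) for x in xs]))

rng = np.random.default_rng(0)
params = init_mlp([20, 64, 64, 10], rng)     # sizes chosen only for illustration
test_points = rng.normal(size=(100, 20))     # synthetic inputs, not real data
print("mean Jacobian Frobenius norm:", mean_jacobian_norm(params, test_points))
```

In practice an automatic-differentiation library would replace the hand-written chain rule; the explicit version is shown only to make the quantity concrete.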
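Experimental Results
Research Questions
- RQ1: Is neural network generalization related to the sensitivity of the input-output Jacobian?
- RQ2: How do factors that affect generalization (for example data augmentation, labels, activation functions, batch size) affect sensitivity?
- RQ3: Is sensitivity predictive of generalization at the level of individual test points?
- RQ4: How do the sensitivity measures compare when evaluating models that differ in architecture and optimization hyper-parameters?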
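Key Findings
- The Jacobian norm correlates with generalization across diverse settings and datasets.
- Sensitivity is higher off the data manifold and lower near training points, indicating that the learned function is more robust in that region.
- Factors that improve generalization (correct labels, data augmentation, ReLU activations, mini-batch optimization) are generally accompanied by lower sensitivity.
- Transition density alone is not enough to compare networks of different sizes, because architecture size affects the number of transitions.
- The Jacobian norm at an individual test point correlates with that point's cross-entropy loss, suggesting per-point predictive utility for active learning and confidence estimation.
- The study provides extensive empirical evidence linking the local geometry of the learned function to generalization in image classification.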
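The transition count, and its dependence on network size, can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the helper names, the random untrained weights, the synthetic "training points", and the specific ellipse construction (the Steiner circumellipse of the triangle formed by three points). Transitions are estimated by sampling a closed path and counting consecutive samples whose hidden-unit on/off codes differ, so the result is a lower bound that depends on the sampling density.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random ReLU MLP parameters (hypothetical stand-in for a trained net)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)),
             rng.normal(0.0, 0.1, size=n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def activation_pattern(params, xs):
    """Binary on/off code of every hidden ReLU unit, one row per input."""
    h, codes = xs, []
    for W, b in params[:-1]:
        pre = h @ W + b
        codes.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(codes, axis=1)

def ellipse_through(a, b, c, n_samples):
    """Closed curve through a, b, c (Steiner circumellipse of the triangle)."""
    m = (a + b + c) / 3.0
    u, v = a - m, (b - c) / np.sqrt(3.0)
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    return m + np.outer(np.cos(t), u) + np.outer(np.sin(t), v)

def count_transitions(params, path):
    """Number of consecutive samples on a closed path whose codes differ."""
    codes = activation_pattern(params, path)
    changed = np.any(codes != np.roll(codes, -1, axis=0), axis=1)
    return int(changed.sum())

rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 20))           # synthetic points, not real data
path = ellipse_through(a, b, c, n_samples=2000)
for width in (8, 64):                        # widths chosen only for illustration
    params = init_mlp([20, width, width, 10], rng)
    print(f"width={width}: {count_transitions(params, path)} transitions")
```

Because a wider network has more hidden units whose activation boundaries the path can cross, raw counts from networks of different sizes are not directly comparable, which is the caveat noted in the findings above.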
This explainer was generated by AI and reviewed by human editors.