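[Paper Summary] Overfitting Mechanism and Avoidance in Deep Neural Networks
This paper analyzes overfitting in deep neural networks, attributing it to continued gradient updates and the scaling of softmax inputs. It proposes a consensus-based classification algorithm that uses multiple models to identify and reject ambiguous samples, improving accuracy when the training set is small.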
Assisted by the availability of data and high performance computing, deep learning techniques have achieved breakthroughs and surpassed human performance empirically in difficult tasks, including object recognition, speech recognition, and natural language processing. As they are being used in critical applications, understanding underlying mechanisms for their successes and limitations is imperative. In this paper, we show that overfitting, one of the fundamental issues in deep neural networks, is due to continuous gradient updating and scale sensitiveness of cross entropy loss. By separating samples into correctly and incorrectly classified ones, we show that they behave very differently, where the loss decreases in the correct ones and increases in the incorrect ones. Furthermore, by analyzing dynamics during training, we propose a consensus-based classification algorithm that enables us to avoid overfitting and significantly improve the classification accuracy especially when the number of training samples is limited. As each trained neural network depends on extrinsic factors such as initial values as well as training data, requiring consensus among multiple models reduces extrinsic factors substantially; for statistically independent models, the reduction is exponential. Compared to ensemble algorithms, the proposed algorithm avoids overgeneralization by not classifying ambiguous inputs. Systematic experimental results demonstrate the effectiveness of the proposed algorithm. For example, using only 1000 training samples from MNIST dataset, the proposed algorithm achieves 95% accuracy, significantly higher than any of the individual models, with 90% of the test samples classified.
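Motivation and objectives
- Explain how overfitting arises in deep neural networks, beyond simple arguments about the amount of data.
- Show that continued gradient updates and the scaling of softmax inputs drive the validation loss up.
- Propose a consensus-based classification algorithm that avoids overfitting by rejecting ambiguous samples.
- Show that consensus across multiple models reduces extrinsic factors and improves intrinsic accuracy, especially with small training sets.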
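The abstract's statement that the reduction in extrinsic factors is exponential for statistically independent models can be illustrated with a simple worked equation. The symbols q and n are assumptions introduced here for illustration, not notation taken from the paper: suppose each of n models, independently of the others, assigns one particular incorrect label to an input because of extrinsic factors with probability q.

```latex
P(\text{all } n \text{ models agree on that incorrect label}) = q^{n},
\qquad \text{e.g. } q = 0.1,\ n = 3 \ \Rightarrow\ q^{n} = 10^{-3}.
```

Under this idealized independence assumption, each additional model multiplies the chance of a spurious consensus by q. Real models trained on the same data are correlated, so the actual reduction would be weaker.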
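Proposed approach
- Empirically analyze the abundance of good solutions by interpolating between trained network solutions on MNIST.
- Observe and analyze the training dynamics, showing that the training loss falls while the validation loss rises because of the scaling effect on softmax inputs.
- Develop a consensus-based classification algorithm (Algorithm 1) that uses the probabilities from multiple models to decide whether to classify a sample or reject it as ambiguous.
- Run experiments across multiple architectures and datasets to assess the difference between intrinsic (consistent) and extrinsic (random-factor-driven) classification.
- Evaluate how different values of the threshold p_t affect intrinsic accuracy and the proportion of consistently classified samples (CCS).
- Compare against single-model performance and explore how dropout affects the CCS results.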
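The consensus idea in the list above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1, whose exact rule is not given in this summary. It assumes that a sample is classified only when every model predicts the same class with probability at least p_t, and is rejected otherwise. The names `consensus_classify`, `intrinsic_accuracy`, and `REJECT` are hypothetical.

```python
import numpy as np

REJECT = -1  # hypothetical marker for samples left unclassified


def consensus_classify(probs, p_t=0.9):
    """Assumed consensus rule (not necessarily the paper's Algorithm 1).

    probs: array of shape (n_models, n_samples, n_classes) holding each
    model's class probabilities. Returns an array of shape (n_samples,)
    with the agreed class, or REJECT where the models disagree or any
    model's top probability is below p_t.
    """
    probs = np.asarray(probs, dtype=float)
    top_class = probs.argmax(axis=2)  # (n_models, n_samples)
    top_prob = probs.max(axis=2)      # (n_models, n_samples)
    agree = (top_class == top_class[0]).all(axis=0)
    confident = (top_prob >= p_t).all(axis=0)
    return np.where(agree & confident, top_class[0], REJECT)


def intrinsic_accuracy(pred, labels):
    """Accuracy over classified samples, plus the fraction classified."""
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    classified = pred != REJECT
    if not classified.any():
        return float("nan"), 0.0
    acc = (pred[classified] == labels[classified]).mean()
    return float(acc), float(classified.mean())


# Two models, three samples, two classes (made-up probabilities).
probs = [
    [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]],
    [[0.92, 0.08], [0.30, 0.70], [0.05, 0.95]],
]
pred = consensus_classify(probs, p_t=0.9)
acc, ccs_fraction = intrinsic_accuracy(pred, labels=[0, 1, 0])
```

Raising p_t in this sketch rejects more samples, which is the trade-off between accuracy on classified samples and the fraction classified that the threshold study examines.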
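Experimental results
Research questions
- RQ1: Why do deep neural networks still overfit when they are overparameterized and many good solutions exist?
- RQ2: Can a consensus approach across multiple models identify and reject overgeneralized or ambiguous samples, and thereby improve generalization with limited data?
- RQ3: How do the training dynamics of correctly and incorrectly classified samples relate to softmax input scaling and the cross-entropy loss?
- RQ4: How do model diversity (different architectures) and regularization (dropout) affect intrinsic classification accuracy?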
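Key findings
- Overfitting can appear as a falling training loss alongside a rising validation loss, because continued gradient updates increase the magnitude of the softmax inputs.
- Incorrectly classified samples drive the rise in validation loss, while the loss on correctly classified samples keeps decreasing.
- The consensus-based method classifies the consistently classified samples and rejects ambiguous ones, improving intrinsic accuracy especially with small training sets.
- With the threshold parameter p_t, the method improves intrinsic accuracy and the proportion of consistently classified samples (CCS) relative to a single model.
- With limited data, the method yields a substantial accuracy gain and is robust across architectures. For example, with 1000 MNIST training samples the abstract reports 95% accuracy with 90% of the test samples classified.
- Dropout and ensemble-like dynamics affect the CCS, but the consensus method can still outperform single models when regularization varies.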
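The first two findings can be illustrated numerically. The sketch below is a simplification: it models continued training as multiplying a fixed softmax input (logit) vector by a growing factor, which is an assumption made here for illustration, and the logits and function names are made up. It shows that the cross-entropy loss falls for a sample whose label matches the largest logit and rises for one whose label does not.

```python
import numpy as np


def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer label."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shift for numerical stability; softmax is unchanged
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])


def loss_vs_scale(logits, label, scales):
    """Loss after multiplying the softmax inputs by each scale factor."""
    base = np.asarray(logits, dtype=float)
    return [cross_entropy(base * s, label) for s in scales]


logits = [2.0, 1.0, 0.0]  # illustrative values; the predicted class is 0
scales = [1.0, 2.0, 4.0]

correct = loss_vs_scale(logits, 0, scales)  # true label equals prediction
wrong = loss_vs_scale(logits, 1, scales)    # true label differs

print("correctly classified:", correct)   # shrinks as the scale grows
print("incorrectly classified:", wrong)   # grows as the scale grows
```

Because the predicted class does not change with the scale, accuracy stays constant in this toy setting while the loss on the misclassified sample keeps growing, which matches the pattern of rising validation loss described above.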
This summary was generated by AI and reviewed by a human editor.