QUICK REVIEW

[Paper Digest] An Empirical Study of Example Forgetting during Deep Neural Network Learning

Mariya Toneva, Alessandro Sordoni|arXiv (Cornell University)|Dec 12, 2018

Domain Adaptation and Few-Shot Learning | References: 37 | Cited by: 199

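One-Sentence Summary

This paper defines and analyzes forgetting events for individual training examples during SGD training. It finds that examples split into unforgettable and forgettable ones, that this split is stable across architectures, and that removing rarely forgotten examples often preserves generalization.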

ABSTRACT

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a 'forgetting event' to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.

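Motivation and Objectives

Investigate whether forgetting occurs while learning a single task, and whether it resembles catastrophic forgetting.
Characterize the distribution and properties of forgetting events across datasets and architectures.
Evaluate whether removing forgettable or unforgettable examples affects generalization and data efficiency.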

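Proposed Method

Define a forgetting event as the moment during SGD training when an example goes from being classified correctly to being classified incorrectly.
Compute per-example forgetting statistics online, as training proceeds, from the mini-batch updates.
Evaluate on MNIST, permuted MNIST, and CIFAR-10 using CNN, ResNet, and WideResNet architectures.
Analyze the correlation between forgetting events and the misclassification margin.
Remove a portion of the data, ordered by number of forgetting events, to test data efficiency.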
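
The per-example bookkeeping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the `ForgettingTracker` class and its method names are hypothetical, and it assumes that (a) an example counts as misclassified before it is first seen, and (b) an "unforgettable" example is one that was classified correctly at least once and never forgotten afterwards.

```python
from typing import List, Sequence


class ForgettingTracker:
    """Counts forgetting events per training example (hypothetical helper)."""

    def __init__(self, num_examples: int) -> None:
        # Assumption: every example starts out as "misclassified".
        self.prev_correct: List[bool] = [False] * num_examples
        self.forget_counts: List[int] = [0] * num_examples
        self.ever_learned: List[bool] = [False] * num_examples

    def update(self, indices: Sequence[int], correct: Sequence[bool]) -> None:
        """Record one mini-batch: dataset indices and per-example correctness."""
        for i, is_correct in zip(indices, correct):
            # A forgetting event is a correct -> incorrect transition.
            if self.prev_correct[i] and not is_correct:
                self.forget_counts[i] += 1
            if is_correct:
                self.ever_learned[i] = True
            self.prev_correct[i] = bool(is_correct)

    def unforgettable(self) -> List[int]:
        """Indices learned at least once and never forgotten (assumed definition)."""
        return [
            i
            for i, count in enumerate(self.forget_counts)
            if self.ever_learned[i] and count == 0
        ]


# Simulated correctness over three presentations of three examples.
tracker = ForgettingTracker(num_examples=3)
tracker.update([0, 1, 2], [True, True, False])
tracker.update([0, 1, 2], [True, False, False])
tracker.update([0, 1, 2], [True, True, False])
print(tracker.forget_counts)    # [0, 1, 0]
print(tracker.unforgettable())  # [0]
```

In a real training loop, `update` would be called once per mini-batch with the dataset indices of the batch and a flag for whether each prediction matched its label. Example 2 above is never learned, so it has zero forgetting events but is not counted as unforgettable.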

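Experimental Results

Research Questions

RQ1: Do neural networks exhibit forgetting events on individual training examples within a single task?
RQ2: Are some examples unforgettable across random seeds and architectures, and do forgetting patterns generalize across models?
RQ3: Can forgetting dynamics distinguish informative examples from noisy or outlier examples, and how does removing such examples affect generalization?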

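Key Findings

Many examples are unforgettable; this set is stable across random seeds and correlated across architectures.
The most forgotten examples tend to have noisy labels, or uncommon features and visual ambiguity.
Removing a large share of the least forgotten examples does not hurt generalization; removing the most forgotten examples causes only a small degradation when the data is selected strategically.
For CIFAR-10, up to 30-35% of the data can be removed based on forgetting statistics without a significant drop in performance.
Forgettable examples tend to lie near the decision boundary and behave much like the support vectors of an SVM.
Forgetting statistics remain stable across training epochs and architectures, so a forgetting ranking computed with one model can be transferred to another.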
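
As a rough sketch of the pruning experiment, the helper below drops the least forgotten examples first and returns the indices to keep. The function name `keep_after_pruning`, the tie-breaking by original index, and the toy counts are assumptions made for illustration; the summary does not specify how ties are ordered.

```python
from typing import List, Sequence


def keep_after_pruning(forget_counts: Sequence[int], remove_fraction: float) -> List[int]:
    """Return sorted indices kept after removing the least-forgotten examples."""
    if not 0.0 <= remove_fraction <= 1.0:
        raise ValueError("remove_fraction must be in [0, 1]")
    n_remove = int(round(len(forget_counts) * remove_fraction))
    # sorted() is stable, so ties keep their original index order.
    order = sorted(range(len(forget_counts)), key=lambda i: forget_counts[i])
    return sorted(order[n_remove:])


# Toy forgetting counts for ten examples (made-up numbers).
counts = [0, 3, 0, 1, 5, 0, 2, 0, 0, 4]
kept = keep_after_pruning(counts, remove_fraction=0.3)
print(kept)  # [1, 3, 4, 6, 7, 8, 9]
```

The returned indices could then be used to build a reduced training set, for example with a subset wrapper around the original dataset. Because the ranking is reported to transfer across architectures, the counts could in principle come from a cheaper model than the one trained on the pruned data.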

This digest was generated by AI and reviewed by a human editor.