QUICK REVIEW

[Paper Review] Early-Learning Regularization Prevents Memorization of Noisy Labels

Sheng Liu, Jonathan Niles‐Weed|arXiv (Cornell University)|Jun 30, 2020

Machine Learning and Data Classification | References: 53 | Cited by: 261

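One-Sentence Summary

This paper proposes early-learning regularization (ELR), which prevents memorization of noisy labels by exploiting early-learning dynamics together with a regularizer built on semi-supervised targets, and achieves competitive or even state-of-the-art results on noisy-label benchmarks.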

ABSTRACT

We propose a novel framework to perform classification via deep learning in the presence of noisy annotations. When trained on noisy labels, deep neural networks have been observed to first fit the training data with clean labels during an "early learning" phase, before eventually memorizing the examples with false labels. We prove that early learning and memorization are fundamental phenomena in high-dimensional classification tasks, even in simple linear models, and give a theoretical explanation in this setting. Motivated by these findings, we develop a new technique for noisy classification tasks, which exploits the progress of the early learning phase. In contrast with existing approaches, which use the model output during early learning to detect the examples with clean labels, and either ignore or attempt to correct the false labels, we take a different route and instead capitalize on early learning via regularization. There are two key elements to our approach. First, we leverage semi-supervised learning techniques to produce target probabilities based on the model outputs. Second, we design a regularization term that steers the model towards these targets, implicitly preventing memorization of the false labels. The resulting framework is shown to provide robustness to noisy annotations on several standard benchmarks and real-world datasets, where it achieves results comparable to the state of the art.

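Research Motivation and Objectives

Motivate the study of training with noisy labels, and establish early learning and memorization as fundamental phenomena in high-dimensional classification.
Develop a regularization-based method that exploits early learning to reduce memorization of false labels.
Use semi-supervised target estimation together with regularization to steer gradients away from memorizing false labels.
Empirically evaluate ELR and ELR+ on standard noisy-label benchmarks and real-world datasets, comparing against state-of-the-art methods.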

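Proposed Method

The authors analyze the gradient dynamics of cross-entropy under noisy labels and show that gradients from falsely labeled examples come to dominate late in training.
They propose a regularized loss L_ELR, which adds the term (λ/n) ∑_i log(1 − ⟨p_i, t_i⟩) to the cross-entropy, where p_i is the model output and t_i is the target probability estimate for example i.
The targets t_i are computed by temporal ensembling of past model outputs, and can be enhanced with weight averaging and a two-network architecture (ELR+).
They derive the gradient of the loss, showing that ELR adds a correction term g^i to the gradient, which boosts the influence of cleanly labeled examples and suppresses that of falsely labeled ones.
The target probabilities t_i are updated as a running average with momentum β, and the model outputs feeding the update can come from either temporal ensembling or weight averaging.
ELR+ additionally incorporates mixup data augmentation and an ensemble of networks to improve robustness.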
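
The loss and the target update described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `update_target`, `elr_loss`), the default values of `beta` and `lam`, the clamping constant `eps`, and the choice to initialise the targets at zero are all assumptions made for the example. Only the basic ELR term is covered; weight averaging, the second network, and mixup from ELR+ are left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_target(t, p, beta=0.7):
    """Temporal ensembling step: t <- beta * t + (1 - beta) * p.

    beta is the momentum; 0.7 is only a placeholder default.
    """
    return [beta * tk + (1.0 - beta) * pk for tk, pk in zip(t, p)]

def elr_loss(logits_batch, labels, targets, lam=3.0, eps=1e-4):
    """Mean cross-entropy plus (lam / n) * sum_i log(1 - <p_i, t_i>).

    eps clamps the arguments of both logarithms so they stay finite;
    the clamp is an assumption of this sketch.
    """
    n = len(logits_batch)
    ce, reg = 0.0, 0.0
    for logits, y, t in zip(logits_batch, labels, targets):
        p = softmax(logits)
        ce += -math.log(max(p[y], eps))                   # cross-entropy term
        inner = sum(pk * tk for pk, tk in zip(p, t))      # <p_i, t_i>
        reg += math.log(max(1.0 - inner, eps))            # ELR term (non-positive)
    return ce / n + lam * reg / n

# Toy batch: the second label disagrees with what the model predicts.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 1]
targets = [[0.0, 0.0, 0.0] for _ in logits]              # assumed zero initialisation
targets = [update_target(t, softmax(z)) for t, z in zip(targets, logits)]
loss = elr_loss(logits, labels, targets)
```

With all-zero targets the inner product is 0, so the added term vanishes and the loss reduces to plain cross-entropy. As the targets accumulate past predictions, the term becomes more negative the more the current output agrees with them, which is what rewards the model for staying close to its early-learning predictions.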

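Experimental Results

Research Questions

RQ1: Can early-learning dynamics be formalized as a fundamental phenomenon in high-dimensional classification with noisy labels?
RQ2: Can a regularization method aligned with early-learning dynamics prevent memorization of noisy labels without relying on sample selection?
RQ3: How can target probabilities derived from model outputs be incorporated into the gradient so that learning is biased toward cleanly labeled examples?
RQ4: Do ELR and its enhanced variant ELR+ achieve competitive performance relative to state-of-the-art noisy-label methods on CIFAR-10/100, Clothing1M, and WebVision?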

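Key Findings

ELR consistently improves robustness to symmetric and asymmetric label noise on CIFAR-10/100, outperforming several baselines.
ELR+ further improves performance by combining temporal ensembling, weight averaging, two networks, and mixup data augmentation; it achieves state-of-the-art performance on Clothing1M and competitive results on WebVision.
The theoretical analysis shows that gradients from cleanly labeled examples dominate early in training, but gradients from falsely labeled examples can take over later; ELR's gradient correction counteracts this memorization.
Ablation studies show that each component (temporal ensembling, weight averaging, two networks, mixup) contributes to the improvement on its own, especially at higher noise levels.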
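
The gradient correction mentioned in the findings can be worked out directly from the regularizer by differentiating through the softmax. The derivation below is a sketch under the assumption that p_i = softmax(z_i) for logits z_i; the symbols z_i, y_i (one-hot label), and δ_kc (Kronecker delta) are notation introduced here for illustration and do not appear in the summary.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{ELR}}
  &= \mathcal{L}_{\mathrm{CE}}
   + \frac{\lambda}{n}\sum_{i=1}^{n}\log\bigl(1-\langle p_i, t_i\rangle\bigr),
   \qquad p_i = \mathrm{softmax}(z_i), \\[4pt]
\frac{\partial}{\partial z_{i,c}}\log\bigl(1-\langle p_i, t_i\rangle\bigr)
  &= -\frac{1}{1-\langle p_i, t_i\rangle}
     \sum_{k} t_{i,k}\,p_{i,k}\,(\delta_{kc}-p_{i,c}) \\
  &= \frac{p_{i,c}}{1-\langle p_i, t_i\rangle}
     \bigl(\langle p_i, t_i\rangle - t_{i,c}\bigr) \\
  &= \frac{p_{i,c}}{1-\langle p_i, t_i\rangle}
     \sum_{k}(t_{i,k}-t_{i,c})\,p_{i,k}
   \;=:\; g^{i}_{c}, \\[4pt]
\frac{\partial \mathcal{L}_{\mathrm{ELR}}}{\partial z_{i}}
  &= \frac{1}{n}\Bigl[(p_i - y_i) + \lambda\, g^{i}\Bigr].
\end{align*}
```

The last step of the middle derivation uses ∑_k p_{i,k} = 1. If c is the class with the largest target entry, every difference t_{i,k} − t_{i,c} is non-positive, so g^i_c ≤ 0 and a gradient-descent step raises the logit of that class. When the given label y_i is false but the target t_i still points at the class learned early on, this term pulls against the cross-entropy part p_i − y_i, which is the sense in which the correction counteracts memorization.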

This review was generated by AI and checked by a human editor.