QUICK REVIEW

[Paper Review] Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

Lu Jiang, Di Huang|arXiv (Cornell University)|Nov 21, 2019

Machine Learning and Data Classification|Cited by 123

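One-Sentence Summary

The paper presents the first benchmark of controlled real-world (web) label noise, proposes MentorMix to handle both synthetic and real noisy labels, and reports a large-scale study of how deep neural networks learn from noisy labels across noise types, noise levels, architectures, and training settings.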

ABSTRACT

Performing controlled experiments on noisy data is essential in understanding deep learning across noise levels. Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting. This paper makes three contributions. First, we establish the first benchmark of controlled real-world label noise from the web. This new benchmark enables us to study the web label noise in a controlled setting for the first time. The second contribution is a simple but effective method to overcome both synthetic and real noisy labels. We show that our method achieves the best result on our dataset as well as on two public benchmarks (CIFAR and WebVision). Third, we conduct the largest study by far into understanding deep neural networks trained on noisy labels across different noise levels, noise types, network architectures, and training settings. The data and code are released at the following link: http://www.lujiang.info/cnlw.html

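Research Motivation and Objectives

Build a controlled benchmark of web (real-world) label noise by annotating web images at multiple noise levels.
Propose and validate a robust learning method (MentorMix) that handles both synthetic and real noisy labels without access to clean labels.
Empirically compare MentorMix with state-of-the-art baselines on public benchmarks with synthetic and real-world noisy labels.
Analyze how DNNs behave when trained on noisy labels across noise types, noise levels, architectures, and training settings, to deepen understanding of the field.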

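Proposed Method

Introduces MentorMix, a robust loss that builds on empirical vicinal risk minimization and incorporates curriculum learning.
Uses MentorNet to compute an optimal latent weight for each training example, and uses these weights to guide a weighted mixup of examples.
Uses a practical importance sampling scheme to select mixing partners, biased toward low-loss examples.
Uses a moving average of a percentile of the per-example loss to modulate example weights and stabilize training.
Demonstrates that MentorMix outperforms baselines on controlled red (web) and blue (synthetic) noise across datasets and training settings.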
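
The steps listed above can be sketched in NumPy to make the data flow concrete. This is only an illustration, not the authors' implementation: this summary does not spell out the exact weighting rule, so the hard 0/1 weight from a loss threshold, the softmax sampling distribution, the Beta-distributed mixing coefficient, and the names mentormix_batch, update_threshold, alpha, percentile, and decay are all assumptions made for the example.

```python
import numpy as np


def update_threshold(prev, losses, percentile=70.0, decay=0.9):
    """Exponential moving average of a percentile of the per-example loss.

    `percentile` and `decay` are illustrative values, not taken from the paper.
    """
    current = np.percentile(losses, percentile)
    return current if prev is None else decay * prev + (1.0 - decay) * current


def mentormix_batch(x, y_onehot, losses, loss_threshold, alpha=0.4, rng=None):
    """One weighted-mixup step over a mini-batch (hypothetical helper).

    x: (n, ...) inputs, y_onehot: (n, k) labels, losses: (n,) per-example loss.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]

    # Assumption: weight 1 for low-loss (likely clean) examples, 0 otherwise.
    v = (losses < loss_threshold).astype(np.float64)

    # Importance sampling: draw mixing partners from a softmax over the
    # weights, so low-loss examples are more likely to be chosen.
    p = np.exp(v - v.max())
    p /= p.sum()
    partners = rng.choice(n, size=n, p=p)

    # Mixup coefficient. A trusted example keeps the larger share of the mix;
    # an untrusted one hands the larger share to its sampled partner.
    lam = rng.beta(alpha, alpha, size=n)
    lam = v * np.maximum(lam, 1.0 - lam) + (1.0 - v) * np.minimum(lam, 1.0 - lam)

    # Broadcast lam over the non-batch dimensions of x.
    lam_x = lam.reshape((n,) + (1,) * (x.ndim - 1))
    x_mix = lam_x * x + (1.0 - lam_x) * x[partners]
    y_mix = lam[:, None] * y_onehot + (1.0 - lam[:, None]) * y_onehot[partners]
    return x_mix, y_mix, v, partners, lam
```

In a training loop, one would presumably call update_threshold once per step with the current batch losses, pass the result to mentormix_batch, and train on the mixed inputs and soft labels it returns.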

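Experimental Results

Research Questions

RQ1: Can controlled web label noise be constructed and used to study DNNs across multiple noise levels in a controlled setting?
RQ2: Does MentorMix provide robust performance on both synthetic and real-world noisy labels when no clean labels are available?
RQ3: How does MentorMix perform relative to state-of-the-art methods on public benchmarks with synthetic and real-world noisy labels?
RQ4: What broader behavioral patterns do DNNs show when trained on noisy labels across noise types, noise levels, architectures, and training schemes?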

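Key Findings

Establishes the first benchmark of controlled web label noise (red noise), covering 10 noise levels on Mini-ImageNet and Stanford Cars through human annotation.
MentorMix consistently outperforms baselines on both synthetic and web noisy labels, and achieves state-of-the-art results on the CIFAR and WebVision benchmarks.
On WebVision 1.0, MentorMix improves top-1 accuracy by about 3% over previous methods without using additional clean labels.
MentorMix improves substantially over vanilla training and existing robust methods, with large gains in noisy settings across multiple architectures.
The study confirms existing findings on neural networks trained with synthetic noise and offers new observations that challenge common intuitions about learning with noisy labels.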
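
For readers unfamiliar with what a controlled noise level means for the blue (synthetic) setting, the following sketch shows one common way to corrupt a fixed fraction of labels. It assumes uniform flips to a different class, which this summary does not confirm as the paper's exact protocol, and the function name inject_symmetric_noise is hypothetical. It does not reproduce red (web) noise, which according to the findings above comes from human-annotated web images.

```python
import numpy as np


def inject_symmetric_noise(labels, noise_level, num_classes, rng=None):
    """Flip a fixed fraction of integer labels to a different class.

    Returns the corrupted labels and a boolean mask marking flipped entries.
    Assumption: each flipped label is drawn uniformly from the other classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    n = labels.shape[0]

    # Choose exactly round(noise_level * n) examples to corrupt.
    num_noisy = int(round(noise_level * n))
    noisy_idx = rng.choice(n, size=num_noisy, replace=False)

    # An offset in [1, num_classes - 1] guarantees the new label differs.
    offsets = rng.integers(1, num_classes, size=num_noisy)
    noisy = labels.copy()
    noisy[noisy_idx] = (labels[noisy_idx] + offsets) % num_classes

    is_noisy = np.zeros(n, dtype=bool)
    is_noisy[noisy_idx] = True
    return noisy, is_noisy
```

Keeping the mask alongside the corrupted labels is what makes the setting controlled: the true noise level of every training set is known, so accuracy can be compared across levels.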

This review was generated by AI and reviewed by human editors.