QUICK REVIEW

[Paper Review] A Survey of Label-noise Representation Learning: Past, Present and Future

Bo Han, Quanming Yao|arXiv (Cornell University)|Nov 9, 2020

Machine Learning and Data Classification | References: 149 | Cited by: 101

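One-Sentence Summary

A comprehensive survey that defines Label-noise Representation Learning (LNRL), reviews the theory, taxonomy, and methods for robustly training deep models with noisy labels, and outlines future directions.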

ABSTRACT

Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise and demonstration-noise.

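Research Motivation and Goals

Define Label-noise Representation Learning (LNRL) and its scope.
Explain, from the perspectives of learning theory and empirical study, why noisy labels affect deep models.
Propose a unified taxonomy of LNRL methods based on data, objective, and optimization.
Review existing methods that use noise transition matrices, loss correction, and optimization techniques.
Propose future research directions within LNRL, such as new datasets, and directions beyond label noise.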

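Proposed Approach

Formalize LNRL with a general problem setting in which the training labels are corrupted.
Review the theoretical foundations from the perspectives of data, objective, and optimization.
Propose a unified taxonomy of methods covering data (noise transition matrix), objective (noise-tolerant losses), and optimization (memorization-based strategies).
Discuss anchor points, transition matrices, and loss correction as core tools.
Highlight the memorization effect and early stopping as guidance for optimization.
Summarize future directions within LNRL and possible research directions beyond it.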
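To make the role of the transition matrix and loss correction more concrete, here is a minimal sketch of forward loss correction, one well-known technique in this family. It is an illustration under stated assumptions, not the survey's own formulation: the noise is assumed to be symmetric and class-conditional, the matrix T is assumed to be known rather than estimated from anchor points, and the function names `symmetric_transition_matrix` and `forward_corrected_nll` are hypothetical.

```python
import numpy as np


def symmetric_transition_matrix(num_classes: int, noise_rate: float) -> np.ndarray:
    """Build T with T[i, j] = P(noisy label = j | clean label = i).

    Assumption: symmetric noise, i.e. a label flips to each of the other
    classes with equal probability noise_rate / (num_classes - 1).
    """
    off_diagonal = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diagonal)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T


def forward_corrected_nll(clean_probs: np.ndarray,
                          noisy_labels: np.ndarray,
                          T: np.ndarray) -> float:
    """Mean negative log-likelihood of the noisy labels.

    The model's clean-class probabilities are pushed through T first:
    P(noisy = j | x) = sum_i P(clean = i | x) * T[i, j].
    """
    noisy_probs = clean_probs @ T
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    # A small constant guards against log(0).
    return float(-np.mean(np.log(picked + 1e-12)))


if __name__ == "__main__":
    T = symmetric_transition_matrix(num_classes=3, noise_rate=0.3)
    # Hypothetical model outputs for two samples (each row sums to 1).
    probs = np.array([[0.9, 0.05, 0.05],
                      [0.2, 0.7, 0.1]])
    labels = np.array([0, 2])  # observed, possibly corrupted labels
    print(forward_corrected_nll(probs, labels, T))
```

The design choice here is to correct the prediction rather than the label: the model keeps predicting clean-class probabilities, and only the loss sees them through the noise model. In practice T is usually unknown and has to be estimated, which this sketch does not attempt.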

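Experimental Results

Research Questions

RQ1: What is the formal definition of LNRL, and what is its scope?
RQ2: From the perspectives of learning theory and empirical study, why do noisy labels affect deep models?
RQ3: How can LNRL methods be categorized, and what are the pros and cons of each category?
RQ4: What are the key components of robust LNRL and its future directions, including datasets and adversarial settings?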

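Main Findings

LNRL integrates data, objective, and optimization to achieve robust learning with noisy labels.
Estimating and exploiting the label-noise transition matrix is central to many methods.
Noise-tolerant losses and classifier-consistent estimators help bridge the noisy and clean distributions.
Optimization strategies that exploit the memorization effect and early stopping can improve robustness.
A unified taxonomy helps clarify the strengths, weaknesses, and trade-offs of different LNRL strategies.
Future directions include instance-dependent noise, adversarial LNRL, and learning under multiple noise modalities.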
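As an illustration of how the memorization effect can guide optimization, the sketch below shows the commonly used small-loss selection heuristic: because deep models tend to fit clean patterns before memorizing noisy ones, samples with small loss early in training are treated as probably clean. This is an assumed example rather than a method spelled out in this summary; the function name `select_small_loss` and the choice of `keep_ratio` (for instance, one minus an estimated noise rate) are hypothetical.

```python
import numpy as np


def select_small_loss(losses: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return the indices (in ascending order) of the samples with the smallest loss.

    keep_ratio is the fraction of the batch to keep; at least one sample
    is always kept.
    """
    num_keep = max(1, int(round(keep_ratio * len(losses))))
    order = np.argsort(losses, kind="stable")  # smallest loss first
    return np.sort(order[:num_keep])


if __name__ == "__main__":
    # Hypothetical per-sample losses for one mini-batch.
    batch_losses = np.array([0.1, 2.3, 0.2, 1.9, 0.05])
    kept = select_small_loss(batch_losses, keep_ratio=0.6)
    print(kept)  # indices of the samples that would be used for the update
```

Only the kept samples would contribute to the gradient step for that batch. The heuristic depends on selecting early enough, before the model has memorized the noisy labels, which is where early stopping and a schedule for `keep_ratio` come in.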


This review was generated by AI and checked by human editors.