QUICK REVIEW

[Paper Review] Selective-Supervised Contrastive Learning with Noisy Labels

Shikun Li, Xiaobo Xia | arXiv (Cornell University) | Mar 8, 2022

Music and Audio Processing | Cited by 23

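One-Sentence Summary

Sel-CL selects confident examples and confident pairs from noisily labeled data to drive supervised contrastive learning, producing robust representations and strong performance on noisy CIFAR and WebVision datasets.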

ABSTRACT

Deep networks have strong capacities for embedding data into latent representations and completing downstream tasks. However, these capacities largely come from high-quality annotated labels, which are expensive to collect. Noisy labels are more affordable, but they corrupt the learned representations and lead to poor generalization. To learn robust representations and handle noisy labels, we propose selective-supervised contrastive learning (Sel-CL) in this paper. Specifically, Sel-CL extends supervised contrastive learning (Sup-CL), which is powerful in representation learning but degrades when labels are noisy. Sel-CL tackles the direct cause of this problem: because Sup-CL works in a pair-wise manner, noisy pairs built from noisy labels mislead representation learning. To alleviate the issue, we select confident pairs out of noisy ones for Sup-CL without knowing noise rates. In the selection process, we first identify confident examples by measuring the agreement between learned representations and given labels, and use them to build confident pairs. Then, the representation similarity distribution of these confident pairs is used to identify more confident pairs among the noisy pairs. All obtained confident pairs are finally used for Sup-CL to enhance representations. Experiments on multiple noisy datasets demonstrate the robustness of the representations learned by our method, which achieves state-of-the-art performance. Source code is available at https://github.com/ShikunLi/Sel-CL
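The abstract's final step, running Sup-CL on confident pairs only, can be illustrated with a small sketch of the standard supervised contrastive loss in which positives are restricted to a supplied set of pairs. This is a plain-Python sketch under assumptions, not the Sel-CL code: the function name `selective_supcon_loss`, the temperature of 0.1, cosine similarity on L2-normalized features, and skipping anchors that have no confident positive are all illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def selective_supcon_loss(
    embeddings: Sequence[Vector],
    confident_pairs: Set[Tuple[int, int]],
    temperature: float = 0.1,
) -> float:
    """Supervised contrastive loss where only `confident_pairs` act as positives.

    Hypothetical helper: every other example in the batch appears in the
    denominator, as in standard Sup-CL; anchors with no confident positive
    are skipped.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    # Pairs are unordered, so register each one for both of its members.
    positives: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in confident_pairs:
        positives[i].add(j)
        positives[j].add(i)

    total, anchors = 0.0, 0
    for i in range(n):
        if not positives[i]:
            continue
        # Scaled cosine similarities from anchor i to all other examples.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Mean negative log-probability of anchor i's confident positives.
        total += -sum(logits[p] - log_denom for p in positives[i]) / len(positives[i])
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(selective_supcon_loss(emb, {(0, 1), (2, 3)}))  # same-cluster pairs: small loss
    print(selective_supcon_loss(emb, {(0, 2), (1, 3)}))  # cross-cluster pairs: large loss
```

With two well-separated clusters, pairing examples from the same cluster gives a loss near zero, while pairing across clusters, as noisy labels would, gives a large loss that pulls unrelated examples together. That distortion is what selecting confident pairs is meant to avoid.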

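Research Motivation and Goals

Learn robust representations when labels are noisy and clean annotations are costly to obtain.
Exploit the pair-wise nature of supervised contrastive learning to mitigate the effect of label noise.
Develop a dynamic method that identifies confident examples and confident pairs during training, without a hand-set threshold or knowledge of the noise rate.
Validate the method on synthetic and real-world noisy datasets and analyze the contribution of each component.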

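Proposed Method

Extend supervised contrastive learning (Sup-CL) by optimizing the contrastive objective only over selectively chosen confident pairs.
Identify confident examples by measuring the agreement between learned representations and the given noisy labels, using a neighbor-based pseudo-labeling approach.
Build confident pairs from the confident examples, then add pairs whose representations are highly similar, using a dynamic, data-driven threshold.
Optimize a mixed objective that combines the selective supervised contrastive loss, a classification loss on confident examples, and a similarity-based loss that exploits pairwise information.
Optionally, run a fine-tuning stage (Sel-CL+) that starts from the pre-trained encoder and applies a robust fine-tuning strategy.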
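The selection steps in the list above can be sketched in plain Python under explicit assumptions. Here a confident example is one whose given label matches the majority vote of its `k` nearest neighbours by cosine similarity, a simple stand-in for the paper's neighbour-based pseudo-labels. Extra pairs are same-label pairs whose similarity reaches the `q`-quantile of similarities among pairs built from confident examples, a hypothetical version of the dynamic threshold. The names `confident_examples`, `confident_pairs`, `k`, and `q` are illustrative and do not come from the Sel-CL repository.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def confident_examples(feats: Sequence[Vector], labels: Sequence[int], k: int = 3) -> Set[int]:
    """Keep example i when the majority label among its k most similar
    neighbours agrees with its given (possibly noisy) label."""
    chosen = set()
    for i, f in enumerate(feats):
        others = sorted((j for j in range(len(feats)) if j != i),
                        key=lambda j: cosine(f, feats[j]), reverse=True)
        votes = Counter(labels[j] for j in others[:k])
        # On tied counts, most_common keeps first-seen order (the closer neighbour's label).
        if votes and votes.most_common(1)[0][0] == labels[i]:
            chosen.add(i)
    return chosen


def confident_pairs(feats: Sequence[Vector], labels: Sequence[int],
                    confident: Set[int], q: float = 0.1) -> Set[Tuple[int, int]]:
    """Same-label pairs of confident examples, plus any other same-label pair whose
    similarity reaches a threshold read from those pairs' similarity distribution."""
    base = {(i, j) for i, j in combinations(sorted(confident), 2) if labels[i] == labels[j]}
    if not base:
        return base
    # Hypothetical dynamic threshold: the q-quantile of similarities inside `base`.
    sims = sorted(cosine(feats[i], feats[j]) for i, j in base)
    threshold = sims[min(int(q * len(sims)), len(sims) - 1)]
    extra = {(i, j) for i, j in combinations(range(len(feats)), 2)
             if labels[i] == labels[j] and (i, j) not in base
             and cosine(feats[i], feats[j]) >= threshold}
    return base | extra


if __name__ == "__main__":
    feats = [
        [1.0, 0.0], [0.95, 0.05], [0.9, 0.1],  # indices 0-2: class-0 cluster
        [0.0, 1.0], [0.05, 0.95],              # indices 3-4: class-1 cluster
        [0.97, 0.03],                          # index 5: lies in the class-0 cluster
        [0.1, 0.9],                            # index 6: class-1 cluster
    ]
    labels = [0, 0, 0, 1, 1, 1, 1]             # index 5 carries the noisy label 1
    conf = confident_examples(feats, labels, k=3)
    print(sorted(conf))  # [0, 1, 2, 3, 4, 6]
    print(sorted(confident_pairs(feats, labels, conf)))
    # [(0, 1), (0, 2), (1, 2), (3, 4), (3, 6), (4, 6)]
```

In the toy data, index 5 sits in the class-0 cluster but carries label 1; its neighbours vote for class 0, so it is rejected and no selected pair involves it. In actual training these sets would presumably be recomputed from the current encoder's features as learning progresses.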

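Experimental Results

Research Questions

RQ1: Does selective pairing reduce the negative impact of noisy labels on representation learning?
RQ2: How can confident examples and confident pairs be identified without knowing the noise rate?
RQ3: Compared with standard Sup-CL or other noisy-label methods, does dynamically selecting confident pairs at every epoch improve robustness and generalization?
RQ4: Do the proposed components (Mixup, MoCo-style augmentation, the classification loss on confident examples, and the similarity-based loss) each add to robustness and accuracy?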

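Key Findings

Under both symmetric and asymmetric noise on CIFAR-10/100, Sel-CL consistently outperforms strong baselines.
On the real-world WebVision-50 dataset, Sel-CL+ achieves results that are competitive with or better than contemporaneous methods.
The selective pairing strategy reduces the impact of label noise, and weighted kNN evaluation shows improved representation quality.
A positive feedback loop emerges: better confident pairs yield better representations, which in turn make it possible to find more confident pairs.
Ablation studies confirm that Mixup, the selection mechanism, and the similarity-based components each contribute to the performance gains.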
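The weighted kNN evaluation mentioned in the findings scores representations without training a classifier head: each test feature is labelled by a similarity-weighted vote of its nearest training features. The sketch below follows the common protocol from instance-discrimination work, with exp(similarity / temperature) weights; the function names, `k=5`, and `temperature=0.1` are assumptions rather than the settings used in the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def weighted_knn_predict(query: Vector, bank: Sequence[Vector], bank_labels: Sequence[int],
                         k: int = 5, temperature: float = 0.1) -> int:
    """Label `query` by the exp(similarity / temperature)-weighted vote of its
    k most similar features in `bank`."""
    neighbours = sorted(((cosine(query, f), y) for f, y in zip(bank, bank_labels)),
                        reverse=True)[:k]
    scores: Dict[int, float] = defaultdict(float)
    for sim, label in neighbours:
        scores[label] += math.exp(sim / temperature)
    return max(scores, key=scores.__getitem__)


def knn_accuracy(test_feats: Sequence[Vector], test_labels: Sequence[int],
                 bank: Sequence[Vector], bank_labels: Sequence[int],
                 k: int = 5, temperature: float = 0.1) -> float:
    """Fraction of test features whose weighted-kNN prediction equals the test label."""
    hits = sum(weighted_knn_predict(q, bank, bank_labels, k, temperature) == y
               for q, y in zip(test_feats, test_labels))
    return hits / len(test_feats)


if __name__ == "__main__":
    bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    bank_labels = [0, 0, 1, 1]
    print(knn_accuracy([[0.8, 0.2], [0.2, 0.8]], [0, 1], bank, bank_labels, k=3))  # 1.0
```

The exponential weighting lets one very close neighbour outvote several distant ones, unlike an unweighted majority vote, so the metric rewards embeddings in which same-class examples sit tightly together.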

This review was generated by AI and reviewed by human editors.