QUICK REVIEW

[Paper Review] Learning From Noisy Large-Scale Datasets With Minimal Supervision

Andreas Veit, Neil Alldrin|arXiv (Cornell University)|Jan 6, 2017

Domain Adaptation and Few-Shot Learning | References: 27 | Cited by: 54

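One-Sentence Summary

The paper proposes a semi-supervised multi-task model that uses a small verified subset to clean the noise in large-scale image annotations while jointly training a robust multi-label classifier, outperforming direct fine-tuning on Open Images.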

ABSTRACT

We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the information contained in the clean set. Thus, we demonstrate how to use the clean annotations to reduce the noise in the large dataset before fine-tuning the network using both the clean set and the full set with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy annotations and to accurately classify images. We evaluate our approach on the recently released Open Images dataset, containing ~9 million images, multiple annotations per image and over 6000 unique classes. For the small clean set of annotations we use a quarter of the validation set with ~40k images. Our results demonstrate that the proposed approach clearly outperforms direct fine-tuning across all major categories of classes in the Open Image dataset. Further, our approach is particularly effective for a large number of classes with wide range of noise in annotations (20-80% false positive annotations).

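Research Motivation and Objectives

Motivate learning a robust multi-label classifier when most annotations are noisy or only weakly supervised.
Propose a label cleaning network that maps noisy labels to clean labels, conditioned on image features.
Jointly optimize label cleaning and image classification to exploit both noisy and clean annotations.
Show that, on a large-scale noisy dataset, this outperforms conventional fine-tuning on clean labels.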

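Proposed Method

Introduce a multi-task architecture in which a label cleaning network g and an image classifier h share visual features.
Model g as a residual mapping from noisy labels y to cleaned labels c_hat, conditioned on image features f(I).
Train g on a small set V with verified labels v to predict clean labels, using the loss L_clean = sum_i |c_hat_i − v_i|.
Train h to predict image labels, using c_hat as the target for samples from the noisy set T and v for samples from V, with the cross-entropy loss L_classify.
Combine the two losses with weight 0.1 on L_clean and 1.0 on L_classify; each batch mixes samples from T and V at a 9:1 ratio.
Use Inception-v3 as the backbone, ending in a 6012-way sigmoid layer for multi-label classification.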
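
The loss described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `residual` stands in for the output of the cleaning network g, `pred` stands in for the sigmoid outputs of the classifier h, and clipping the cleaned labels to [0, 1] is an assumption that the summary does not state.

```python
import math

# Loss weights as stated in the summary above.
W_CLEAN, W_CLASSIFY = 0.1, 1.0


def clean_labels(noisy, residual):
    """Residual cleaning: add a predicted correction to the noisy labels y.

    `residual` is a stand-in for the output of g given y and f(I).
    Clipping to [0, 1] is an assumption of this sketch.
    """
    return [min(1.0, max(0.0, y + r)) for y, r in zip(noisy, residual)]


def l_clean(c_hat, verified):
    """L_clean = sum_i |c_hat_i - v_i|."""
    return sum(abs(c - v) for c, v in zip(c_hat, verified))


def l_classify(pred, target, eps=1e-7):
    """Per-class binary cross-entropy between sigmoid outputs and targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def sample_loss(noisy, residual, pred, verified=None):
    """Weighted loss for one image.

    Samples from T (no verified labels) use c_hat as the classification
    target; samples from V use v and also contribute L_clean.
    """
    c_hat = clean_labels(noisy, residual)
    if verified is None:
        return W_CLASSIFY * l_classify(pred, c_hat)
    return (W_CLEAN * l_clean(c_hat, verified)
            + W_CLASSIFY * l_classify(pred, verified))
```

In a real training loop the same quantities would be computed on tensors over batches mixing T and V at 9:1; how gradients flow between the two heads is not covered by this summary and is left out of the sketch.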

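Experimental Results

Research Questions

RQ1: Can a label-cleaning mapping learned from a small verified set reduce noise in a large-scale noisy dataset and thereby improve multi-label classification?
RQ2: Does jointly training label cleaning and image classification outperform direct fine-tuning on clean labels or fine-tuning on mixed labels?
RQ3: How does the proposed method perform across the distribution of label frequency and annotation quality in a large-scale dataset?
RQ4: How do pre-training the cleaning network and training it jointly compare in performance and practicality?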

Key Findings

Model	AP_all	MAP
Baseline	83.82	61.82
Misra et al. (visual classifier)	83.55	61.85
Misra et al. (relevance classifier)	83.79	61.89
Fine-Tuning with mixed labels	84.80	61.90
Fine-Tuning with clean labels	85.88	61.53
Our Approach with pre-training	87.68	62.36
Our Approach trained jointly	87.67	62.38
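
To make the margins in the table easier to read, the short script below computes each model's gain over a chosen reference row. The numbers are copied from the table; the helper name `gains_over` is hypothetical.

```python
# (AP_all, MAP) per model, copied from the results table.
RESULTS = {
    "Baseline": (83.82, 61.82),
    "Misra et al. (visual classifier)": (83.55, 61.85),
    "Misra et al. (relevance classifier)": (83.79, 61.89),
    "Fine-Tuning with mixed labels": (84.80, 61.90),
    "Fine-Tuning with clean labels": (85.88, 61.53),
    "Our Approach with pre-training": (87.68, 62.36),
    "Our Approach trained jointly": (87.67, 62.38),
}


def gains_over(reference, results=RESULTS):
    """Return (AP_all gain, MAP gain) of every other model over `reference`."""
    ref_ap, ref_map = results[reference]
    return {
        name: (round(ap - ref_ap, 2), round(m - ref_map, 2))
        for name, (ap, m) in results.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (d_ap, d_map) in gains_over("Baseline").items():
        print(f"{name}: AP_all {d_ap:+.2f}, MAP {d_map:+.2f}")
```

Against the baseline, the jointly trained model gains 3.85 points of AP_all and 0.56 of MAP, while fine-tuning with clean labels gains 2.06 of AP_all but loses 0.29 of MAP.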

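The proposed method outperforms direct fine-tuning on Open Images, both on the overall metrics and across the major categories.
With joint training, MAP rises to 62.38 from the 61.82 baseline (62.36 with pre-training).
Fine-tuning with clean labels alone can overfit and lower MAP (61.53 versus 61.82 for the baseline), whereas the proposed method keeps its gains on both common and rare classes.
For classes with 20%–80% false positive annotations, the method yields larger gains, reflecting its robustness to noisy labels.
The improvement is consistent across the high-level categories (vehicles, products, art, people, sports, food, animals, plants).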
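
The summary reports two metrics without defining them. The sketch below assumes that MAP is the mean of per-class average precision and that AP_all pools every annotation into a single ranked list regardless of class; both definitions, and all function names, are assumptions of this illustration rather than something stated above.

```python
def average_precision(scores, labels):
    """AP of one ranked list: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def map_and_ap_all(scores_by_class, labels_by_class):
    """Return (MAP, AP_all) under the assumed definitions.

    MAP averages AP over classes, so rare classes weigh as much as common
    ones. AP_all ranks all (image, class) scores together, so frequent
    classes dominate.
    """
    per_class = [average_precision(s, l)
                 for s, l in zip(scores_by_class, labels_by_class)]
    mean_ap = sum(per_class) / len(per_class)
    pooled_scores = [x for s in scores_by_class for x in s]
    pooled_labels = [x for l in labels_by_class for x in l]
    return mean_ap, average_precision(pooled_scores, pooled_labels)
```

Under these definitions a method can raise AP_all while lowering MAP if it helps frequent classes and hurts rare ones, which is one possible reading of the clean-label fine-tuning row.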


This review was generated by AI and checked by a human editor.