QUICK REVIEW

[Paper Review] TRAINING DEEP NEURAL NETWORKS ON NOISY LABELS WITH BOOTSTRAPPING

Scott Reed, Honglak Lee|arXiv (Cornell University)|Jan 1, 2015

Face recognition and analysis|References 34|Cited by 330

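ONE-SENTENCE SUMMARY

This paper proposes a bootstrapping method that makes deep neural networks more robust to noisy and incomplete labels by using deep feature embeddings to enforce consistent predictions across similar inputs. It shows robustness to label corruption on MNIST, achieves state-of-the-art results on subjective emotion recognition on the Toronto Face Database, and improves scalable object detection on ILSVRC2014; unlabeled data can be incorporated with no modification to the method.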

ABSTRACT

Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumption often does not hold; e.g. in recognition, class labels may be missing; in detection, objects in the image may not be localized; and in general, the labeling may be subjective. In this work we propose a generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency. We consider a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data. In experiments we demonstrate that our approach yields substantial robustness to label noise on several datasets. On MNIST handwritten digits, we show that our model is robust to label corruption. On the Toronto Face Database, we show that our model handles well the case of subjective labels in emotion recognition, achieving state-of-the-art results, and can also benefit from unlabeled face images with no modification to our method. On the ILSVRC2014 detection challenge data, we show that our approach extends to very deep networks, high resolution images and structured outputs, and results in improved scalable detection.

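MOTIVATION AND OBJECTIVES

Address a key limitation of deep learning: performance degrades significantly when labels are noisy or incomplete.
Develop a generic, architecture-agnostic method that improves model robustness without relying on clean labels.
Make effective use of unlabeled data and handle subjective annotation in real-world vision tasks.
Extend robust training to high-resolution images and structured-output tasks such as object detection.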

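PROPOSED METHOD

The method introduces a consistency objective that encourages inputs with similar deep features to receive the same prediction.
Similarity between inputs is measured by L2 distance in the deep feature space extracted by the network.
The model is trained end to end with a joint objective: the standard cross-entropy loss plus a consistency regularization term.
Consistent predictions are enforced between pairs of inputs whose feature distance falls within a learned threshold.
The method can be applied to existing models without modification, giving plug-and-play robustness to noisy labels.
Unlabeled data is used passively through the consistency objective, with no need for explicit data augmentation or model fine-tuning.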
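
The joint objective described in this section can be illustrated with a minimal, pure-Python sketch. It follows this review's wording only and is not the authors' implementation: the function names, the squared-difference penalty, the `weight` coefficient, and the fixed `threshold` argument (the review describes a learned threshold) are all assumptions made for illustration. Unlabeled examples are marked with `None` and contribute only through the consistency term.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def consistency_penalty(features, probs, threshold):
    """Mean squared prediction difference over pairs of feature-space neighbours.

    A pair counts as neighbours when its feature distance is <= threshold
    (a fixed hyperparameter here, which is an assumption of this sketch).
    """
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if l2_distance(features[i], features[j]) <= threshold:
                total += sum((p - q) ** 2 for p, q in zip(probs[i], probs[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

def joint_loss(logits, labels, features, threshold=1.0, weight=0.5):
    """Cross-entropy on labeled examples plus a weighted consistency term."""
    probs = [softmax(z) for z in logits]
    # Examples with label None are unlabeled and skip the cross-entropy term.
    labeled = [(p, y) for p, y in zip(probs, labels) if y is not None]
    ce = 0.0
    if labeled:
        ce = sum(-math.log(max(p[y], 1e-12)) for p, y in labeled) / len(labeled)
    return ce + weight * consistency_penalty(features, probs, threshold)

# Hypothetical toy batch: the second example is unlabeled but sits next to
# the first one in feature space, so it is still tied to it by the penalty.
example_loss = joint_loss(
    logits=[[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]],
    labels=[0, None, 1],
    features=[[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]],
)
```

In a real training setup the features and logits would come from the same network and the loss would be differentiated by a deep learning framework; the nested loop here is only meant to make the pairing rule explicit.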

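EXPERIMENTAL RESULTS

Research Questions

RQ1 Does consistency-based regularization improve the robustness of deep learning models to label noise in image classification?
RQ2 How does the method perform on datasets with subjective or ambiguous labels, such as facial emotion recognition?
RQ3 Does the method scale to high-resolution images and complex structured-output tasks such as object detection?
RQ4 Can the method benefit from unlabeled data without changes to the architecture or the training procedure?

Key Findings

With a label corruption rate of 50% on MNIST, the model's test error is only 1.8%, significantly better than the baseline.
On the Toronto Face Database, the method achieves state-of-the-art emotion recognition performance under subjective labeling.
On the ILSVRC2014 detection task, the model shows stronger generalization, indicating that the approach scales to deep networks and high-resolution inputs.
The method makes effective use of unlabeled face images, improving performance without any change to the training procedure.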
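
To make the notion of a label corruption rate concrete, here is a minimal sketch of one common way to simulate it: each label is replaced, with probability `noise_rate`, by a different class drawn uniformly at random. This review does not state how the MNIST labels were corrupted, so the uniform-flip scheme and the helper name `corrupt_labels` are assumptions, not the paper's protocol.

```python
import random

def corrupt_labels(labels, noise_rate, num_classes, seed=0):
    """Return a copy of labels where each entry is flipped with probability noise_rate.

    A flipped label is always replaced by a *different* class, chosen uniformly.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            choices = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(choices))
        else:
            noisy.append(y)
    return noisy

# Hypothetical usage on ten digit labels with a 50% corruption rate.
clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noisy = corrupt_labels(clean, noise_rate=0.5, num_classes=10)
```

A model trained on `noisy` and evaluated on clean test labels gives the kind of robustness measurement the findings above refer to.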


This review was generated by AI and reviewed by human editors.