QUICK REVIEW

[Paper Review] Fine-grained Categorization and Dataset Bootstrapping using Deep Metric Learning with Humans in the Loop

Yin Cui, Feng Zhou|arXiv (Cornell University)|Dec 16, 2015

Domain Adaptation and Few-Shot Learning | References: 32 | Cited by: 24

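One-Sentence Summary

This paper proposes an iterative, human-in-the-loop framework for fine-grained visual categorization that uses triplet-loss-based deep metric learning to learn a discriminative low-dimensional embedding for each category. By iteratively pulling high-confidence images from Instagram, having human labelers verify them, and training on both the true positives and the human-labeled hard negatives, the method gains 6.9% accuracy on a 620-category flower dataset and reports state-of-the-art results, which the authors attribute to better handling of intra-class variance and data scarcity.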

ABSTRACT

Existing fine-grained visual categorization methods often suffer from three challenges: lack of training data, large number of fine-grained categories, and high intra-class vs. low inter-class variance. In this work we propose a generic iterative framework for fine-grained categorization and dataset bootstrapping that handles these three challenges. Using deep metric learning with humans in the loop, we learn a low dimensional feature embedding with anchor points on manifolds for each category. These anchor points capture intra-class variances and remain discriminative between classes. In each round, images with high confidence scores from our model are sent to humans for labeling. By comparing with exemplar images, labelers mark each candidate image as either a "true positive" or a "false positive". True positives are added into our current dataset and false positives are regarded as "hard negatives" for our metric learning model. Then the model is retrained with an expanded dataset and hard negatives for the next round. To demonstrate the effectiveness of the proposed framework, we bootstrap a fine-grained flower dataset with 620 categories from Instagram images. The proposed deep metric learning scheme is evaluated on both our dataset and the CUB-200-2011 Birds dataset. Experimental evaluations show significant performance gain using dataset bootstrapping and demonstrate state-of-the-art results achieved by the proposed deep metric learning methods.
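
The labeling round described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `bootstrap_round`, the `score` and `human_says_positive` callables, and the confidence `threshold` are all hypothetical stand-ins, and the toy scores and labels are made up for illustration.

```python
from typing import Callable, List, Tuple


def bootstrap_round(
    candidates: List[str],
    score: Callable[[str], float],
    human_says_positive: Callable[[str], bool],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Split high-confidence candidates into true positives and hard negatives.

    All names here are hypothetical; the threshold rule is an assumption.
    """
    true_positives: List[str] = []
    hard_negatives: List[str] = []
    for image in candidates:
        if score(image) < threshold:
            continue  # low-confidence images are not sent to labelers
        if human_says_positive(image):
            true_positives.append(image)  # joins the dataset for the next round
        else:
            hard_negatives.append(image)  # a false positive, kept as a hard negative
    return true_positives, hard_negatives


# Toy data: model confidence and the (simulated) labeler verdict per image.
scores = {"a.jpg": 0.95, "b.jpg": 0.40, "c.jpg": 0.90}
truth = {"a.jpg": True, "b.jpg": True, "c.jpg": False}

tp, hn = bootstrap_round(list(scores), scores.get, truth.get, threshold=0.8)
```

In this toy run `b.jpg` never reaches a labeler because its score is below the threshold, so it lands in neither list. In the framework, retraining on the expanded dataset plus the hard negatives would follow each such round.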

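Motivation and Objectives

Address three problems in fine-grained visual categorization (FGVC): scarce training data, a large number of categories, and high intra-class but low inter-class variance.
Develop a scalable, end-to-end deep metric learning system that uses human feedback to improve generalization and robustness.
Bootstrap a dataset iteratively from web sources such as Instagram, using human-verified images and hard negatives.
Use newly added positives and human-labeled hard negatives effectively within a single metric learning framework to improve model performance.
Validate the framework on a large-scale 620-category flower dataset and the CUB-200-2011 birds dataset.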

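Proposed Method

Uses triplet-based deep metric learning to learn a low-dimensional feature embedding for each category, with multiple anchor points that capture intra-class variance while keeping classes separable.
Applies an online triplet sampling strategy: hard negatives are selected according to the margin loss, and positives are drawn from the nearest neighbors within a configurable local region.
Integrates human-in-the-loop feedback: high-confidence predictions are sent to labelers, who confirm true positives and flag false positives as hard negatives.
In each iteration, retrains the metric model on a dataset that combines the verified positives with both human-labeled and automatically sampled hard negatives.
Visualizes a 2-D embedding of the learned features using PCA to check qualitatively that the model groups intra-class variations together.
Compares the triplet model against softmax-based baselines that either merge all hard negatives into a single new class or treat them as multiple new classes.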
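
The triplet sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard hinge-form triplet loss on squared Euclidean distance, it operates on fixed toy vectors rather than a trained network, and the names `sq_dist`, `triplet_loss`, `sample_positives`, and `hard_negatives` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Standard hinge triplet loss (assumed form): max(0, d(a,p) - d(a,n) + margin)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def sample_positives(anchor: Vector, same_class: List[Vector], fraction: float) -> List[Vector]:
    """Keep the nearest `fraction` of same-class embeddings as candidate positives."""
    ranked = sorted(same_class, key=lambda p: sq_dist(anchor, p))
    k = max(1, round(fraction * len(ranked)))
    return ranked[:k]


def hard_negatives(anchor: Vector, positive: Vector, others: List[Vector], margin: float) -> List[Vector]:
    """Negatives that violate the margin, i.e. give a non-zero triplet loss."""
    return [n for n in others if triplet_loss(anchor, positive, n, margin) > 0.0]


# Toy 2-D embeddings (made up for illustration).
anchor = (0.0, 0.0)
same_class = [(0.1, 0.0), (0.0, 0.3), (1.0, 1.0), (0.2, 0.2), (2.0, 0.0)]
other_class = [(0.3, 0.0), (3.0, 0.0)]

positives = sample_positives(anchor, same_class, fraction=0.6)
hard = hard_negatives(anchor, positives[0], other_class, margin=0.2)
```

With `fraction=0.6` the three nearest same-class points are kept as positives, and only the negative at `(0.3, 0.0)` falls inside the margin. Restricting positives to a local neighborhood is what lets distant same-class samples, such as a different color variant, sit in a separate region of the embedding instead of being pulled toward the anchor.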

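Experimental Results

Research Questions

RQ1: Under data scarcity, can deep metric learning combined with human feedback effectively improve fine-grained visual categorization performance?
RQ2: During iterative dataset bootstrapping, how much do human-labeled hard negatives contribute to the performance gain compared with newly added positives?
RQ3: How does the proposed triplet-based metric learning framework handle high intra-class variance while keeping classes separable?
RQ4: Can the framework scale to large fine-grained categorization tasks with thousands of categories?
RQ5: Does integrating human-verified data and hard negatives significantly outperform standard softmax training, whether or not the softmax baseline handles hard negatives?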

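Key Findings

Compared with the baseline, the proposed framework achieves a 6.9% absolute accuracy gain on the flowers-620 + Ins dataset, of which 3.4% comes from the newly added positives and 3.5% from the human-labeled hard negatives.
The triplet model (Triplet-A + HN) reaches 73.7% accuracy on flowers-620 + Ins, ahead of the best softmax baseline (70.8% with HNM), which indicates that metric learning makes better use of hard negatives.
On both flowers-620 and CUB-200-2011, performance is best when positives are sampled from the nearest 60% of neighbors.
2-D PCA visualizations of the learned embedding show that the model captures intra-class variations (such as color differences within one flower species) and groups them together in feature space.
The framework added 11,567 new Instagram images to the training set, bringing the total to 27,004 training images, and collected 240,338 human-labeled hard negatives for refining the model.
The results indicate that hard negatives are as important as positives, and that the triplet loss exploits hard negatives more effectively than softmax-based methods.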
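
As a quick consistency check on the figures reported above, the arithmetic can be written out. The variable names are hypothetical, and the "implied original training set" size is only what the two reported counts imply by subtraction; it is not a number stated in this summary.

```python
from decimal import Decimal

# Accuracy gains in percentage points, as reported in the findings above.
gain_from_positives = Decimal("3.4")
gain_from_hard_negatives = Decimal("3.5")
total_gain = gain_from_positives + gain_from_hard_negatives

# Triplet-A + HN versus the best softmax baseline (with HNM).
triplet_acc = Decimal("73.7")
softmax_acc = Decimal("70.8")
margin_over_softmax = triplet_acc - softmax_acc

# Training-set sizes: total after bootstrapping minus the added Instagram images.
total_train = 27_004
new_instagram = 11_567
implied_original_train = total_train - new_instagram

print(total_gain, margin_over_softmax, implied_original_train)
```

`Decimal` is used instead of floats so the sums of one-decimal percentages are exact. The two contributions add up to the reported 6.9-point gain, and the triplet model leads the best softmax baseline by 2.9 points.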


This review was generated by AI and reviewed by human editors.