QUICK REVIEW

[Paper Review] Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

Weifeng Ge, Sibei Yang | arXiv (Cornell University) | Feb 26, 2018

Advanced Neural Network Applications | Cited by 27

One-Sentence Summary

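This paper proposes a weakly supervised curriculum learning pipeline that fuses and filters multi-evidence object instances and pixel-level predictions from multiple weakly supervised sources to improve multi-label classification, object detection, and semantic segmentation. By combining metric learning, density-based clustering, and attention map fusion, the method achieves state-of-the-art results in multi-label classification and weakly supervised object detection, and competitive results in semantic segmentation, on MS-COCO, PASCAL VOC 2007, and PASCAL VOC 2012, reaching 69.4% CorLoc on VOC 2012 and 72.8% F1-C on MS-COCO.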

ABSTRACT

Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012.

Motivation and Objectives

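Close the performance gap between weakly supervised and fully supervised models in object detection and semantic segmentation.
Overcome the precision and recall limits caused by incomplete image-level supervision.
Exploit the complementary outputs of diverse weakly supervised algorithms to improve robustness and accuracy.
Develop a unified curriculum learning pipeline that integrates image-level, object-level, and pixel-level supervision to achieve end-to-end training.
Using only image-level labels, achieve state-of-the-art results in multi-label classification and weakly supervised object detection, along with competitive semantic segmentation performance.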

Proposed Method

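Collect object localization results using both bottom-up and top-down weakly supervised detection algorithms.
Apply metric learning and density-based clustering to filter and fuse the detected object instances, reducing noise and outliers.
Train a single-label classifier on the filtered instances to assign final class labels to object proposals.
Fuse image-level attention maps, object-level attention maps, and detection heatmaps to produce clean, per-class pixel-level probability maps.
Train a fully convolutional network on the fused pixel maps to generate the final pixel-level label map for each training image.
Use the generated object instances and pixel maps as supervision to train task-specific networks for detection, segmentation, and multi-label classification via multi-task learning.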
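
The instance-filtering step listed above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes every detected instance of a class has already been mapped to an embedding vector (as a metric-learning model might produce), and it swaps in a simple neighbour-count density test in place of the paper's clustering procedure. The function name `filter_by_density` and the parameters `eps` and `min_neighbors` are hypothetical, and the embeddings are toy values.

```python
import math


def filter_by_density(embeddings, eps=0.5, min_neighbors=2):
    """Return indices of embeddings that lie in a dense region.

    An instance is kept when at least `min_neighbors` other instances
    fall within Euclidean distance `eps` of it. Isolated instances are
    treated as noisy detections and dropped. (Hypothetical helper, not
    the paper's algorithm.)
    """
    kept = []
    for i, a in enumerate(embeddings):
        neighbors = sum(
            1
            for j, b in enumerate(embeddings)
            if i != j and math.dist(a, b) <= eps
        )
        if neighbors >= min_neighbors:
            kept.append(i)
    return kept


# Toy 2-D embeddings for one class: four nearby points and one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
kept = filter_by_density(embeddings, eps=0.5, min_neighbors=2)
print(kept)
```

With these toy values the four clustered points are kept and the distant one is discarded. In practice the distance threshold would depend on how the embedding space was learned, which this sketch does not model.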

Experimental Results

Research Questions

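RQ1: Does fusing multiple evidence from diverse weakly supervised algorithms improve detection and segmentation performance compared with a single method?
RQ2: How effective is the combination of metric learning and density-based clustering at filtering noisy object instances obtained under weak supervision?
RQ3: To what extent does fusing image-level, object-level, and pixel-level attention maps improve the accuracy of pixel-level labeling?
RQ4: Can a curriculum learning pipeline that uses multi-level intermediate supervision signals achieve performance close to that of fully supervised models?
RQ5: How much does each component (e.g., instance filtering, pixel map fusion) contribute to the final performance of weakly supervised learning?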

Key Findings

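The proposed pipeline achieves 69.4% CorLoc on the PASCAL VOC 2012 validation set, an improvement of 3.8% over the previous best result.
On MS-COCO, the method achieves 72.8% F1-C, outperforming the ResNet-101 baseline and state-of-the-art methods on per-class F1.
Ablation experiments show that removing the object instance processing step lowers mAP by 3.1%, highlighting its critical role.
Removing the clustering and outlier detection step lowers mAP by 2.7%, confirming the importance of filtering noisy instances.
Assigning labels to all pixels regardless of confidence lowers mAP to 47.5%, indicating that uncertainty-aware labeling offers a clear advantage.
The two-branch multi-task network (classification + segmentation) achieves the highest F1-C, F1-O, and F1-C/top3 scores among all state-of-the-art methods on MS-COCO.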
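
For readers unfamiliar with the metrics quoted above, the sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the paper's evaluation code: CorLoc is taken as the fraction of images whose top-scoring box overlaps a ground-truth box with IoU of at least 0.5, F1-C as the harmonic mean of class-averaged precision and recall, and F1-O as the F1 score over counts pooled across classes. The function names and all numbers are hypothetical toy values.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(top_boxes, gt_boxes, threshold=0.5):
    """Fraction of images whose top-scoring box matches a ground-truth box."""
    hits = sum(
        1
        for pred, gts in zip(top_boxes, gt_boxes)
        if any(iou(pred, gt) >= threshold for gt in gts)
    )
    return hits / len(top_boxes)


def f1_scores(counts):
    """counts holds one (tp, fp, fn) triple per class; returns (f1_c, f1_o)."""

    def ratio(num, den):
        return num / den if den else 0.0

    def harmonic(p, r):
        return 2 * p * r / (p + r) if p + r > 0 else 0.0

    # Per-class view (assumed convention): average precision and recall
    # over classes, then take their harmonic mean.
    mean_p = sum(ratio(tp, tp + fp) for tp, fp, fn in counts) / len(counts)
    mean_r = sum(ratio(tp, tp + fn) for tp, fp, fn in counts) / len(counts)
    f1_c = harmonic(mean_p, mean_r)

    # Overall view: pool the counts across classes first.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    f1_o = harmonic(ratio(tp, tp + fp), ratio(tp, tp + fn))
    return f1_c, f1_o


# Toy data: two images for one class, and (tp, fp, fn) counts for two classes.
top_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_boxes = [[(0, 0, 10, 8)], [(20, 20, 30, 30)]]
counts = [(8, 2, 2), (1, 1, 3)]

score = corloc(top_boxes, gt_boxes)
f1_c, f1_o = f1_scores(counts)
print(score, f1_c, f1_o)
```

In the toy counts, the rare second class pulls F1-C below F1-O, which is why per-class and overall scores are usually reported side by side.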


This review was generated by AI and reviewed by human editors.