QUICK REVIEW

[Paper Review] Learning to Segment Object Candidates

Pedro O. Pinheiro, Ronan Collobert, Piotr Dollár | arXiv (Cornell University) | Jun 20, 2015

Advanced Neural Network Applications | References: 33 | Cited by: 522

One-Sentence Summary

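This paper proposes DeepMask, a convolutional network that predicts class-agnostic segmentation masks and objectness scores directly from raw image pixels, without relying on edges or superpixels. The model is trained on MS COCO with a joint objective and evaluated on PASCAL VOC and COCO. It achieves state-of-the-art object-proposal performance, reaching substantially higher recall than prior methods while using far fewer proposals. For example, Fast R-CNN with 100 DeepMask proposals reaches 68.2% mAP on PASCAL VOC 2007, compared with 66.9% when using 2000 SelectiveSearch proposals.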

ABSTRACT

Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier. Such approaches have been shown to be fast while achieving state-of-the-art detection performance. In this paper, we propose a new way to generate object proposals, introducing an approach based on a discriminative convolutional network. Our model is trained jointly with two objectives: given an image patch, the first part of the system outputs a class-agnostic segmentation mask, while the second part outputs the likelihood of the patch being centered on a full object. At test time, the model is efficiently applied on the whole test image and generates a set of segmentation masks, each assigned a corresponding object likelihood score. We show that our model yields significant improvements over state-of-the-art object proposal algorithms. In particular, compared to previous approaches, our model obtains substantially higher object recall using fewer proposals. We also show that our model is able to generalize to categories it has not seen during training. Unlike all previous approaches for generating object masks, we do not rely on edges, superpixels, or any other form of low-level segmentation.

Motivation and Goals

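Develop an object-proposal method that outperforms existing approaches in both recall and efficiency.
Remove the dependence on low-level cues such as edges, superpixels, or hand-crafted features during proposal generation.
Train a single convolutional network that jointly predicts segmentation masks and objectness scores.
Evaluate how well the model generalizes to object categories not seen during training.
Show that, when plugged into Fast R-CNN, fewer proposals can yield better detection performance.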

Proposed Method

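A shared convolutional trunk processes each image patch and produces the features used by both the segmentation and the objectness predictions.
The segmentation branch predicts a 56×56 class-agnostic mask from these features through a low-rank fully connected layer.
The objectness branch is a discriminative head on the same shared features that predicts whether the patch is centered on a full object; it is trained jointly with the segmentation branch, not separately.
The whole model is trained end to end with a joint loss that combines the mask-prediction and score-prediction objectives.
At inference, the network is applied densely over the whole image at multiple scales to produce a ranked set of segmentation proposals.
Inference is made efficient by batching across scales and running on a GPU.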
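
The two heads and the joint objective described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions and not the authors' code. It assumes per-pixel and per-patch binary logistic losses with labels in {-1, +1}. It also assumes a mask term that is averaged over the 56×56 output and counted only for positive patches, plus a weight `lam` on the score term (the default 1/32 is our assumption). The low-rank head assumes that one large weight matrix is factored into two linear maps with no nonlinearity in between. The names `logistic_loss`, `low_rank_mask_head`, and `joint_loss` are hypothetical.

```python
import numpy as np


def logistic_loss(logits, labels):
    """Stable log(1 + exp(-label * logit)) for labels in {-1, +1}."""
    return np.logaddexp(0.0, -labels * logits)


def low_rank_mask_head(features, u, v):
    """Hypothetical low-rank fully connected mask head.

    features: (K, C, H, W) trunk features for K patches.
    u: (C*H*W, r) and v: (r, S*S) stand in for one (C*H*W, S*S) weight
    matrix; no nonlinearity is applied between the two factors.
    Returns mask logits of shape (K, S, S).
    """
    k = features.shape[0]
    side = int(round(np.sqrt(v.shape[1])))
    if side * side != v.shape[1]:
        raise ValueError("v must map to a square S x S mask")
    flat = features.reshape(k, -1)
    return (flat @ u @ v).reshape(k, side, side)


def joint_loss(mask_logits, mask_labels, score_logits, score_labels, lam=1.0 / 32):
    """Joint mask + objectness loss summed over K patches.

    mask_logits, mask_labels: (K, S, S); score_logits, score_labels: (K,).
    All labels are in {-1, +1}. The mask term is averaged over the S*S
    pixels and gated so that only positive patches contribute to it;
    lam weights the objectness (score) term. lam=1/32 is an assumption.
    """
    k = mask_logits.shape[0]
    per_pixel = logistic_loss(mask_logits, mask_labels)   # (K, S, S)
    mask_term = per_pixel.reshape(k, -1).mean(axis=1)     # (K,) mean over pixels
    gate = (1.0 + score_labels) / 2.0                     # 1 for positives, 0 for negatives
    score_term = logistic_loss(score_logits, score_labels)  # (K,)
    return float(np.sum(gate * mask_term + lam * score_term))
```

With this gating, negative patches contribute only to the score loss. That is a sensible choice, since a patch that is not centered on an object has no single target mask to learn.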

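Experimental Results

Research Questions

RQ1: Can a deep convolutional network learn to generate high-quality object proposals directly from raw image pixels, without relying on low-level segmentation cues?
RQ2: Does training segmentation and objectness prediction jointly improve proposal quality compared with optimizing them separately?
RQ3: Does the model, and in particular its segmentation branch, generalize well to object categories not seen during training?
RQ4: Do fewer but higher-quality proposals lead to better downstream detection performance?
RQ5: How does the model perform across object sizes and IoU thresholds?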

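Key Findings

On PASCAL VOC 2007, Fast R-CNN with only 500 DeepMask proposals reaches 69.9% mAP, outperforming Fast R-CNN with 2000 SelectiveSearch proposals (66.9% mAP).
With just 100 proposals, DeepMask still reaches 68.2% mAP, above the 66.9% obtained with 2000 SelectiveSearch proposals.
On PASCAL VOC 2007, DeepMask's average recall with 1000 proposals (AR@1000) is 69.0%, ahead of MCG (63.4%) and SelectiveSearch (61.8%).
The model generalizes well: DeepMask20∗, trained on only the 20 PASCAL categories, performs comparably to the full DeepMask model when evaluated on all 80 COCO categories.
For IoU thresholds below 0.7, DeepMask achieves higher localization recall than all baselines. It falls slightly behind them only at very high IoU (≥0.9), which is attributed to its downsampled (56×56) mask output.
Inference takes 1.6 s per image on COCO (1.2 s on PASCAL), comparable to fast methods such as Geodesic (about 1 s) and far faster than MCG (about 30 s).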
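
AR@N, used in the findings above, measures how many ground-truth objects are covered by the top-N ranked proposals. The sketch below is a minimal per-image version for box proposals. It assumes that recall is averaged over the IoU thresholds 0.50, 0.55, ..., 0.95, which is a COCO-style grid chosen here as an assumption; the paper's exact protocol may differ. It also assumes that a ground-truth box counts as recalled at threshold t if some top-N proposal overlaps it with IoU ≥ t. The function names `box_iou` and `average_recall` are hypothetical.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format (x1, y1, x2, y2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def average_recall(proposals, scores, gt_boxes, top_n,
                   thresholds=np.linspace(0.5, 0.95, 10)):
    """AR@top_n for one image: recall of gt_boxes averaged over IoU thresholds.

    Proposals are ranked by descending score and only the top_n are kept.
    Assumes at least one ground-truth box.
    """
    proposals = np.asarray(proposals, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))[:top_n]
    if order.size == 0:
        return 0.0  # no proposals, so no ground-truth box is recalled
    best = box_iou(gt_boxes, proposals[order]).max(axis=1)  # best IoU per GT box
    return float(np.mean([(best >= t).mean() for t in thresholds]))
```

Ranking by score before truncating to `top_n` matters here. It means AR@10 or AR@100 rewards a good objectness head as well as accurate masks, which is why the score branch affects recall at small proposal budgets.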

This review was generated by AI and reviewed by human editors.