QUICK REVIEW

[Paper Review] Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

Jiwoon Ahn, Sunghyun Cho|arXiv (Cornell University)|Apr 10, 2019

Advanced Neural Network Applications | References: 51 | Cited by: 68

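One-Sentence Summary

This paper proposes IRNet, which learns a class-agnostic instance map and pixel-level affinities to generate pseudo instance segmentation labels from image-level supervision, so that a fully supervised model can be trained without extra object proposals or annotations.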

ABSTRACT

This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. Our approach generates pseudo instance segmentation labels of training images, which are used to train a fully supervised model. For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries. To this end, we propose IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. It thus enables to assign instance labels to the seeds and to propagate them within the boundaries so that the entire areas of instances can be estimated accurately. Furthermore, IRNet is trained with inter-pixel relations on the attention maps, thus no extra supervision is required. Our method with IRNet achieves an outstanding performance on the PASCAL VOC 2012 dataset, surpassing not only previous state-of-the-art trained with the same level of supervision, but also some of previous models relying on stronger supervision.

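Motivation and Objectives

Learn instance segmentation using only image-level class labels as supervision.
Develop a method that generates pseudo instance segmentation labels without external object proposals or additional supervision.
Use class attention maps to derive inter-pixel relations for accurate instance delineation.
Make it possible to train standard segmentation models (such as Mask R-CNN) on the pseudo labels.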

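Proposed Method

Seed instance regions with class attention maps (CAMs) from an image classifier.
Introduce IRNet with two branches: (i) a displacement field that predicts, for each pixel, a vector pointing to the centroid of its instance, and (ii) a class boundary detector that produces a boundary map.
Train IRNet with inter-pixel relations derived from the CAMs: displacements between pixel pairs in the same instance, and class-equivalence labels for neighboring pixel pairs.
Iteratively refine the displacement field so that it converges to the centroids, yielding a class-agnostic instance map.
Compute pixel-level semantic affinities from the boundary map and propagate CAM scores by random walk to form instance-aware CAMs.
Synthesize pseudo instance masks by combining the instance map with the refined instance-wise CAMs and affinities, then train a standard detector/segmenter on these pseudo labels.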
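
The two propagation steps above can be illustrated with a toy sketch on a 1D strip of pixels. This is not the authors' implementation: the integer displacements, the update rule d(x) <- d(x) + d(x + d(x)), the affinity defined as one minus the largest boundary value between two pixels, and the parameters `radius`, `beta`, and `steps` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
def refine_displacement(disp, iterations):
    """Assumed update rule: d(x) <- d(x) + d(x + d(x)), on a 1D integer field."""
    n = len(disp)
    for _ in range(iterations):
        refined = []
        for x in range(n):
            target = min(max(x + disp[x], 0), n - 1)  # clamp to the strip
            refined.append(disp[x] + disp[target])
        disp = refined
    return disp


def instance_map(disp):
    """Group pixels by the position their displacement points to."""
    n = len(disp)
    labels, ids = [], {}
    for x in range(n):
        centroid = min(max(x + disp[x], 0), n - 1)
        labels.append(ids.setdefault(centroid, len(ids)))
    return labels


def affinity_matrix(boundary, radius=2, beta=4):
    """Assumed affinity: (1 - max boundary value between i and j) ** beta."""
    n = len(boundary)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            lo, hi = min(i, j), max(i, j)
            A[i][j] = (1.0 - max(boundary[lo:hi + 1])) ** beta
    return A


def row_normalize(A):
    """Turn affinities into random-walk transition probabilities."""
    T = []
    for row in A:
        total = sum(row)
        T.append([v / total if total > 0 else 0.0 for v in row])
    return T


def random_walk(scores, T, steps):
    """Propagate scores: each step replaces a score by a weighted neighbor average."""
    n = len(scores)
    for _ in range(steps):
        scores = [sum(T[i][j] * scores[j] for j in range(n)) for i in range(n)]
    return scores


if __name__ == "__main__":
    # Two toy instances with centroids at positions 2 and 6.
    disp = refine_displacement([1, 1, 0, -1, 2, 1, 0, -1], iterations=2)
    print(instance_map(disp))  # [0, 0, 0, 0, 1, 1, 1, 1]

    # A strong boundary at pixel 3 blocks the seed at pixel 0 from spreading right.
    T = row_normalize(affinity_matrix([0, 0, 0, 1, 0, 0]))
    cam = random_walk([1.0, 0, 0, 0, 0, 0], T, steps=3)
    print([round(v, 3) for v in cam])  # [0.333, 0.333, 0.333, 0.0, 0.0, 0.0]
```

In the toy run, the seed score spreads evenly over the pixels on its own side of the boundary and never crosses it, which is the behavior the boundary-derived affinities are meant to produce.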

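Experimental Results

Research Questions

RQ1: Can per-instance segmentation be recovered from image-level labels without external object proposals?
RQ2: How can inter-pixel relations derived from CAMs be learned to generate reliable pseudo instance masks?
RQ3: Does combining class boundaries with the displacement field improve pseudo-label quality and downstream segmentation accuracy?
RQ4: How does the proposed method compare with state-of-the-art weakly supervised methods on PASCAL VOC 2012?
RQ5: Can pseudo labels produced with IRNet give competitive results for Mask R-CNN and DeepLab under weak supervision?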

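Key Findings

IRNet, which combines CAMs with inter-pixel relations, produces higher-quality pseudo instance labels than earlier methods based on image-level supervision (for example, CAMs alone).
Incorporating class boundaries substantially improves pseudo-label quality, raising APr50 by more than 25% in the authors' ablation study.
Adding the displacement field helps separate multiple instances of the same class and further improves performance.
Mask R-CNN trained on the pseudo labels outperforms several methods that use stronger supervision on PASCAL VOC 2012.
The pseudo semantic segmentation labels produced by IRNet exceed those of AffinityNet in mIoU on the PASCAL VOC 2012 train/val sets, which suggests better pixel-level affinities.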
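
For readers unfamiliar with the mIoU metric mentioned in the last finding, the following is a minimal sketch of how mean intersection-over-union can be computed from flattened label lists. The function name `mean_iou` and the toy labels are hypothetical, and the sketch scores a single label list, whereas benchmark evaluations typically accumulate counts over the whole dataset.

```python
def mean_iou(pred, gt, num_classes):
    """Average the per-class IoU over classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both label lists
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]  # toy pseudo labels
    gt = [0, 1, 1, 1, 2, 0]    # toy ground truth
    # Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2
    print(round(mean_iou(pred, gt, num_classes=3), 3))  # 0.5
```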

This review was generated by AI and reviewed by human editors.