QUICK REVIEW

[Paper Review] Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation

Jiwoon Ahn, Suha Kwak | arXiv (Cornell University) | Mar 28, 2018

Advanced Neural Network Applications | References: 40 | Cited by: 74

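One-sentence summary

This paper proposes AffinityNet, a CNN trained with image-level labels to predict pixel-level semantic affinity between adjacent image coordinates, so that random-walk propagation can refine class activation maps (CAMs) and synthesize segmentation labels for training a strong segmentation model without any additional annotation.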

ABSTRACT

The deficiency of segmentation labels is one of the main obstacles to semantic segmentation in the wild. To alleviate this issue, we present a novel framework that generates segmentation labels of images given their image-level class labels. In this weakly supervised setting, trained models have been known to segment local discriminative parts rather than the entire object area. Our solution is to propagate such local responses to nearby areas which belong to the same semantic entity. To this end, we propose a Deep Neural Network (DNN) called AffinityNet that predicts semantic affinity between a pair of adjacent image coordinates. The semantic propagation is then realized by random walk with the affinities predicted by AffinityNet. More importantly, the supervision employed to train AffinityNet is given by the initial discriminative part segmentation, which is incomplete as a segmentation annotation but sufficient for learning semantic affinities within small image areas. Thus the entire framework relies only on image-level class labels and does not require any extra data or annotations. On the PASCAL VOC 2012 dataset, a DNN learned with segmentation labels generated by our method outperforms previous models trained with the same level of supervision, and is even as competitive as those relying on stronger supervision.

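Motivation and objectives

Address the lack of pixel-level annotations by performing segmentation with image-level labels only.
Learn pixel-level semantic affinities that propagate local discriminative responses to the entire object region.
Develop a framework that synthesizes segmentation labels usable for training a segmentation model.
Demonstrate state-of-the-art performance under image-level supervision on the PASCAL VOC 2012 dataset.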

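Proposed method

Compute class activation maps (CAMs) from a classifier trained with image-level labels, to serve as initial seeds for object regions.
Train AffinityNet to predict the semantic affinity W_ij between adjacent coordinates, using a class-agnostic objective and CAM-derived supervision.
Generate reliable pairwise affinity labels by selecting confident object and background regions from CAMs refined with dense CRF (dCRF).
Propagate the CAMs by random walk using the learned affinity matrix, revising them into improved segmentation candidates.
Upsample the revised CAMs and refine them with dCRF to synthesize segmentation labels for training a semantic segmentation network.
Train the final segmentation model (e.g., Ours-ResNet38) on the synthesized labels.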
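
The random-walk propagation step can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the summary above does not state the exact formulas. The sketch assumes that the affinity between two nearby pixels is the exponential of the negative L1 distance between their feature vectors, that the affinity matrix is sharpened by an element-wise power `beta` and row-normalized into a transition matrix, and that the CAM is multiplied by that matrix `steps` times. The function names, the square neighborhood, the default values of `beta` and `steps`, and the plain feature array standing in for AffinityNet output are all hypothetical.

```python
import numpy as np


def affinity_matrix(features, radius=1):
    """Build a dense (H*W, H*W) affinity matrix from an (H, W, C) feature map.

    Assumed form: W_ij = exp(-||f_i - f_j||_1) for pixels inside a square
    window of the given radius, and 0 for all other pairs.
    """
    h, w, c = features.shape
    n = h * w
    flat = features.reshape(n, c)
    ys, xs = np.divmod(np.arange(n), w)  # row and column index of each pixel
    near = (np.abs(ys[:, None] - ys[None, :]) <= radius) & (
        np.abs(xs[:, None] - xs[None, :]) <= radius
    )
    l1 = np.abs(flat[:, None, :] - flat[None, :, :]).sum(axis=2)
    return np.where(near, np.exp(-l1), 0.0)


def random_walk_refine(cam, affinity, beta=8, steps=4):
    """Propagate a (K, H, W) CAM with a row-normalized transition matrix.

    beta sharpens the affinities (element-wise power); steps is the number
    of random-walk iterations. Both defaults are illustrative assumptions.
    """
    k, h, w = cam.shape
    sharpened = affinity ** beta
    # Each row sums to one; the diagonal is exp(0) = 1, so no row is all zero.
    transition = sharpened / sharpened.sum(axis=1, keepdims=True)
    scores = cam.reshape(k, h * w).T  # (N, K): one column per class
    for _ in range(steps):
        scores = transition @ scores
    return scores.T.reshape(k, h, w)


# Toy usage: a 1x4 image whose two left pixels share one feature value and
# whose two right pixels share another; the CAM fires on the leftmost pixel.
features = np.array([[[0.0], [0.0], [5.0], [5.0]]])
cam = np.array([[[1.0, 0.0, 0.0, 0.0]]])
refined = random_walk_refine(cam, affinity_matrix(features))
```

In the toy case the activation spreads to the second pixel, which has the same feature value, and is blocked at the feature boundary. The dense matrix is only practical for tiny inputs; a real pipeline would need a sparse representation.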

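Experimental results

Research questions

RQ1: Can pixel-level semantic affinity be learned from image-level labels to help recover complete object shapes?
RQ2: How effective is the learned affinity model at diffusing activation maps up to precise object boundaries via random walk?
RQ3: Can the synthesized segmentation labels yield competitive semantic segmentation under weak supervision?
RQ4: How close can the weakly supervised results come to fully supervised baselines on PASCAL VOC 2012?

Key findings

AffinityNet trained with image-level supervision produces meaningful pixel-level affinities.
Random walk with AffinityNet affinities substantially improves CAM-based segmentation masks.
Segmentation models trained on the synthesized labels outperform previous image-level supervised methods and are competitive with methods that use stronger supervision.
Ours-ResNet38 outperforms previous weakly supervised methods on PASCAL VOC 2012.
The method approaches the fully supervised baseline, recovering a substantial portion of its performance.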

This review was generated by AI and reviewed by a human editor.