QUICK REVIEW

[Paper Review] Weakly Supervised Semantic Segmentation with Convolutional Networks

Pedro H. O. Pinheiro, Ronan Collobert | arXiv (Cornell University) | Nov 23, 2014

Topic: Advanced Neural Network Applications | References: 5 | Cited by: 46

One-Sentence Summary

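This paper proposes a CNN-based method for weakly supervised semantic segmentation that uses only image-level class labels and is trained with a loss inspired by multiple instance learning (MIL), which pushes the model to focus on discriminative pixels. The authors report that it beats prior weakly supervised results on Pascal VOC by a large margin, needs only a few smoothing priors as post-processing, and transfers from ImageNet without fine-tuning.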

ABSTRACT

We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task. This problem could be viewed as a kind of weakly supervised segmentation task, and naturally fits the Multiple Instance Learning (MIL) framework: every training image is known to have (or not) at least one pixel corresponding to the image class label, and the segmentation task can be rewritten as inferring the pixels belonging to the class of the object (given one image, and its object class). We propose a Convolutional Neural Network-based model, which is constrained during training to put more weight on pixels which are important for classifying the image. We show that at test time, the model has learned to discriminate the right pixels well enough, such that it performs very well on an existing segmentation benchmark, by adding only few smoothing priors. Our system is trained using a subset of the Imagenet dataset and the segmentation experiments are performed on the challenging Pascal VOC dataset (with no fine-tuning of the model on Pascal VOC). Our model beats the state of the art results in weakly supervised object segmentation task by a large margin. We also compare the performance of our model with state of the art fully-supervised segmentation approaches.
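
The MIL framing in the abstract can be written compactly. The block below is a sketch under assumed notation, not a transcription of the paper's equations: $y^k_{i,j}$ is a hypothetical binary pixel label for class $k$, $s^k_{i,j}$ is the network's score for class $k$ at output location $(i,j)$ on an $h \times w$ grid, and $r > 0$ is a smoothness parameter. The first line states the "at least one pixel" constraint; the second is a Log-Sum-Exp relaxation of the max, one common way to make that constraint trainable.

```latex
% MIL constraint: the image carries class k iff at least one pixel does
y^k = \max_{i,j} \, y^k_{i,j}

% Smooth relaxation of the max over pixel scores (Log-Sum-Exp), r > 0
s^k = \frac{1}{r} \log\!\left[ \frac{1}{h\,w} \sum_{i,j} \exp\!\left( r \, s^k_{i,j} \right) \right]

% Limits: r \to \infty recovers \max_{i,j} s^k_{i,j};  r \to 0 recovers the mean of s^k_{i,j}
```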

Motivation and Objectives

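Address weakly supervised semantic segmentation using only image-level class annotations, with no bounding boxes or pixel-level masks.
Reduce the reliance on strong priors and hand-annotated segmentation masks during training.
Develop a deep learning model that learns to localize relevant object regions from class-level supervision alone.
Evaluate on the challenging Pascal VOC benchmark without fine-tuning on the target dataset.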

Proposed Method

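Cast segmentation as a multiple instance learning (MIL) problem in which each image is a bag and its pixels are the instances.
Train the CNN so that the pixels that matter most for classifying the image correctly receive more weight.
Use an image-level loss that pushes the model toward discriminative regions by emphasizing activation patterns in the final convolutional feature maps.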
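Aggregate the per-pixel class scores into image-level scores with a smooth-max (Log-Sum-Exp) pooling layer during training; at test time, the per-pixel score maps serve as coarse segmentation maps.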
Refine the predictions at test time with a few smoothing priors, keeping post-processing minimal.
Train on a subset of ImageNet using only image-level labels, then evaluate on Pascal VOC without any fine-tuning.
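
The MIL-style training described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes per-pixel scores are pooled with Log-Sum-Exp and passed to a softmax cross-entropy loss on the image label. The function names (`lse_pool`, `image_level_loss`, `pixel_argmax`), the value `r=5.0`, and the toy score maps are assumptions made for this example, and the per-pixel argmax ignores the smoothing priors applied at test time.

```python
import math

def lse_pool(score_map, r=5.0):
    """Smooth-max (Log-Sum-Exp) over a 2-D map of per-pixel scores for one class."""
    flat = [s for row in score_map for s in row]
    m = max(flat)  # subtract the max before exponentiating, for numerical stability
    mean_exp = sum(math.exp(r * (s - m)) for s in flat) / len(flat)
    return m + math.log(mean_exp) / r

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_level_loss(score_maps, label, r=5.0):
    """Negative log-likelihood of the image label, computed from pooled pixel scores."""
    pooled = [lse_pool(score_map, r) for score_map in score_maps]
    return -math.log(softmax(pooled)[label])

def pixel_argmax(score_maps):
    """Simplified inference: label each pixel with its highest-scoring class."""
    num_classes = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(num_classes), key=lambda k: score_maps[k][i][j]) for j in range(width)]
        for i in range(height)
    ]

# Toy example (assumed values): two classes on a 2x2 output grid.
score_maps = [
    [[4.0, 0.0], [0.0, 0.0]],  # class 0: one strongly activated pixel
    [[1.0, 1.0], [1.0, 1.0]],  # class 1: uniformly weak activation
]
pooled = [lse_pool(score_map) for score_map in score_maps]
loss = image_level_loss(score_maps, label=0)
labels = pixel_argmax(score_maps)
```

With a large `r` the pooled score approaches the maximum pixel score, and with a small `r` it approaches the mean, so `r` controls how many pixels share the training signal.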

Experimental Results

Research Questions

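RQ1: Can a CNN trained only on image-level labels learn to localize object regions precisely enough for high-quality semantic segmentation?
RQ2: How effective is the MIL training paradigm at learning spatially consistent object proposals without pixel-level supervision?
RQ3: How well does a model trained on ImageNet generalize to semantic segmentation on Pascal VOC without domain-specific fine-tuning?
RQ4: How does this weakly supervised method compare with fully supervised state-of-the-art methods?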

Key Findings

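The proposed method achieves state-of-the-art results for weakly supervised semantic segmentation on Pascal VOC.
It outperforms previous weakly supervised methods by a large margin, even though the model is never fine-tuned on Pascal VOC.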
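The model transfers from ImageNet to Pascal VOC without any fine-tuning on the target dataset, which indicates strong cross-dataset generalization.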
Adding only a few smoothing priors at test time is enough to produce good segmentation maps.
The model localizes object regions well, as reflected in its IoU scores on the benchmark.
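Although the model is trained only with image-level annotations, the paper also compares its performance against fully supervised state-of-the-art approaches.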
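
Since the findings above are stated in terms of IoU, a small sketch of the metric may help. This is a per-image simplification with assumed toy label maps and a hypothetical function name `mean_iou`; benchmark evaluations such as Pascal VOC typically accumulate intersections and unions per class over the whole dataset before averaging.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for one pair of label maps."""
    ious = []
    for k in range(num_classes):
        inter = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += int(p == k and t == k)  # pixel is class k in both maps
                union += int(p == k or t == k)   # pixel is class k in at least one map
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example (assumed values): 2x2 predicted and ground-truth label maps.
pred = [[0, 1], [1, 1]]
truth = [[0, 0], [1, 1]]
miou = mean_iou(pred, truth, num_classes=2)  # class 0: 1/2, class 1: 2/3
```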

This review was generated by AI and checked by a human editor.