QUICK REVIEW

[Paper Review] Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Yude Wang, Jie Zhang|arXiv (Cornell University)|Apr 9, 2020

Advanced Neural Network Applications | References: 38 | Cited by: 57

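One-Sentence Summary

SEAM introduces self-supervised equivariant regularization and a pixel correlation module to refine class activation maps using only image-level supervision, achieving state-of-the-art weakly supervised semantic segmentation on PASCAL VOC 2012.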

ABSTRACT

Image-level weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. Most of advanced solutions exploit class activation map (CAM). However, CAMs can hardly serve as the object mask due to the gap between full and weak supervisions. In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation. However, this constraint is lost on the CAMs trained by image-level supervision. Therefore, we propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning. Moreover, we propose a pixel correlation module (PCM), which exploits context appearance information and refines the prediction of current pixel by its similar neighbors, leading to further improvement on CAMs consistency. Extensive experiments on PASCAL VOC 2012 dataset demonstrate our method outperforms state-of-the-art methods using the same level of supervision. The code is released online.

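Research Motivation and Objectives

Motivation: the gap between full and weak supervision in semantic segmentation.
Propose self-supervised equivariant regularization to enforce CAM consistency across transformed inputs.
Introduce a pixel correlation module that refines CAMs through affinities carrying context information.
Propose a siamese network architecture with an equivariant cross regularization loss to train CAMs.
Demonstrate state-of-the-art performance on PASCAL VOC 2012 using only image-level labels.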

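Proposed Method

Embed equivariant regularization in a weight-sharing siamese network to enforce CAM consistency under affine transformations (ER loss).
Integrate a pixel correlation module (PCM) that refines CAMs using learned pixel affinities through a self-attention-like mechanism.
Combine CAM refinement with equivariant supervision through an equivariant cross regularization (ECR) loss between the branches.
At inference, handle the foreground-background balance with foreground and background scores and a background threshold.
Train with a multi-label soft margin loss, with OHEM applied to the ECR loss; CRF post-processing is optional.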

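Experimental Results

Research Questions

RQ1: Can CAM consistency under affine-transformed inputs provide self-supervision for WSSS without additional annotation?
RQ2: Does the pixel correlation module improve the stability of CAMs and their agreement with object shape under weak supervision?
RQ3: How does the combination of equivariant regularization and PCM affect CAM quality and segmentation performance?
RQ4: Can the proposed SEAM framework reach state-of-the-art results on PASCAL VOC 2012 using only image-level labels?
RQ5: How do different affine transformations affect the effectiveness of equivariant regularization?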

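Key Findings

SEAM improves CAM quality, achieving higher mIoU than the baseline on PASCAL VOC 2012 with only image-level supervision.
The full SEAM pipeline, comprising ER, PCM, OHEM, and optional CRF, reaches 56.83% mIoU under the VOC train/augmentation setting.
Using the refined CAMs with AffinityNet-based pseudo labels reaches 63.61% mIoU on the VOC train set, which supports strong segmentation results.
Under image-level supervision, SEAM achieves state-of-the-art performance on the VOC 2012 test set; the reported tables give about 64.5 mIoU on the validation set and 65.7 mIoU on the test set.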
CAMs from SEAM exhibit fewer under-activations and over-activations and are more consistent under multi-scale testing.
PCM learns boundary-sensitive affinities, producing more complete object activation coverage and more faithful shapes.
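The mIoU figures above are computed on label maps, so CAMs must first be turned into per-pixel labels. The sketch below shows one plausible way to do that with a constant background score, followed by a simplified mean IoU. The names `cam_to_labels` and `mean_iou` and the threshold value 0.25 are illustrative assumptions, not values taken from the paper, and the metric here is computed on a single image, whereas PASCAL VOC evaluation accumulates intersections and unions over the whole dataset.

```python
import numpy as np

def cam_to_labels(cam, bg_threshold=0.25):
    """Convert foreground CAMs to a label map.

    cam: (C, H, W) foreground scores scaled to [0, 1].
    Returns (H, W) integer labels: 0 = background, k = k-th foreground class.
    bg_threshold is a hypothetical constant background score.
    """
    bg = np.full((1,) + cam.shape[1:], bg_threshold)
    return np.concatenate([bg, cam], axis=0).argmax(axis=0)

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```

Raising the background threshold shrinks the foreground regions and lowering it grows them, which is why the foreground-background balance matters when comparing CAM quality by mIoU.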


This review was generated by AI and reviewed by a human editor.