QUICK REVIEW

[Paper Review] Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

Jinlong Li, Zequn Jie|arXiv (Cornell University)|Sep 16, 2022

Advanced Neural Network Applications|Cited by 25

One-Sentence Summary

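This paper proposes the Expansion and Shrinkage (ESOL) framework, which uses deformable convolutions with offset learning to first expand CAM-based localization so that it covers the whole object and then shrink it to improve precision, achieving state-of-the-art weakly-supervised semantic segmentation results on VOC2012 and COCO2014.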

ABSTRACT

Generating precise class-aware pseudo ground-truths, a.k.a. class activation maps (CAMs), is essential for weakly-supervised semantic segmentation. The original CAM method usually produces incomplete and inaccurate localization maps. To tackle this issue, this paper proposes an Expansion and Shrinkage scheme based on the offset learning in the deformable convolution, to sequentially improve the recall and precision of the located object in the two respective stages. In the Expansion stage, an offset learning branch in a deformable convolution layer, referred to as the "expansion sampler", seeks to sample increasingly less discriminative object regions, driven by an inverse supervision signal that maximizes the image-level classification loss. The more complete object located in the Expansion stage is then gradually narrowed down to the final object region during the Shrinkage stage. In the Shrinkage stage, the offset learning branch of another deformable convolution layer, referred to as the "shrinkage sampler", is introduced to exclude the false-positive background regions attended to in the Expansion stage, improving the precision of the localization maps. We conduct various experiments on PASCAL VOC 2012 and MS COCO 2014 that demonstrate the superiority of our method over other state-of-the-art methods for weakly-supervised semantic segmentation. Code will be made publicly available at https://github.com/TyroneLi/ESOL_WSSS.
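To make the term concrete, here is a minimal NumPy sketch of standard CAMs: the per-class weighted sum of the last feature maps, passed through ReLU and max-normalized, then turned into a seed label map with a background threshold. This is an illustration under assumptions, not ESOL's code; the function names and the background threshold `bg_threshold=0.15` are hypothetical choices for the example.

```python
import numpy as np

def compute_cams(features: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Standard CAM: per-class weighted sum of the final feature maps.

    features:   (K, H, W) maps from the last conv layer
    fc_weights: (C, K) weights of the classifier that follows global average pooling
    returns:    (C, H, W) maps, ReLU-ed and max-normalized to [0, 1] per class
    """
    cams = np.einsum("ck,khw->chw", fc_weights, features)  # sum_k w[c, k] * f_k(x, y)
    cams = np.maximum(cams, 0.0)                           # keep positive evidence only
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)     # per-class maximum
    peak = np.where(peak > 0, peak, 1.0)                   # avoid division by zero
    return cams / peak[:, None, None]

def cam_seed(cams: np.ndarray, bg_threshold: float = 0.15) -> np.ndarray:
    """Turn CAMs into a seed label map: 0 = background, c + 1 = class c.

    bg_threshold is a hypothetical value; a pixel is background unless
    some class activation exceeds it.
    """
    bg = np.full((1,) + cams.shape[1:], bg_threshold)
    return np.concatenate([bg, cams], axis=0).argmax(axis=0)
```

The seeds produced this way typically cover only the most discriminative parts of each object, which is the incompleteness the Expansion stage targets.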

Motivation and Goals

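Address the partial-localization problem of CAM-based weakly-supervised semantic segmentation (WSSS) trained with image-level labels.
Develop a two-stage training pipeline (Expansion, then Shrinkage) that first improves the recall and then the precision of object localization.
Use deformable convolutions with offset learning to sample less discriminative object regions and to exclude false positives.
Demonstrate state-of-the-art localization and segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.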
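The goals above rest on the sampling rule of a deformable convolution: each of the nine kernel taps reads the input at its regular grid position plus a learned offset, using bilinear interpolation for fractional positions. The following NumPy sketch evaluates one output location of a single-channel 3x3 deformable convolution. The offsets are passed in directly as an assumption; in ESOL they would be predicted by the offset branch of the expansion or shrinkage sampler. The function names are hypothetical, and in practice a library operator such as `torchvision.ops.deform_conv2d` would be used instead.

```python
import numpy as np

# Regular 3x3 sampling grid p_n as (row, col) displacements from the center.
GRID = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]

def bilinear_sample(img: np.ndarray, r: float, c: float) -> float:
    """Bilinearly interpolate a 2-D map at fractional (r, c); zero outside the map."""
    h, w = img.shape
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    total = 0.0
    for dr in (0, 1):
        for dc in (0, 1):
            rr, cc = r0 + dr, c0 + dc
            if 0 <= rr < h and 0 <= cc < w:
                weight = (1 - abs(r - rr)) * (1 - abs(c - cc))
                total += weight * img[rr, cc]
    return total

def deformable_conv_at(img: np.ndarray, kernel: np.ndarray,
                       offsets: np.ndarray, r0: int, c0: int) -> float:
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n) at one output location.

    kernel:  (3, 3) weights w(p_n)
    offsets: (9, 2) per-tap (d_row, d_col) offsets delta_p_n (assumed given here)
    """
    out = 0.0
    for n, (i, j) in enumerate(GRID):
        r = r0 + i + offsets[n, 0]
        c = c0 + j + offsets[n, 1]
        out += kernel[i + 1, j + 1] * bilinear_sample(img, r, c)
    return out
```

With zero offsets this reduces to an ordinary 3x3 convolution; shifting every offset by one column moves the whole receptive field one pixel to the right, which is how learned offsets let a sampler reach regions outside the fixed window.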

Proposed Method

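Insert an "expansion sampler" deformable convolution after the feature extractor; its offsets are learned under inverse image-level supervision so that the sampled object region progressively takes in less discriminative parts.
In the Expansion stage, use an inverse optimization signal that maximizes the image-level classification loss while the backbone features are frozen.
After Expansion, apply a feature clipping strategy to mitigate activation bias before Shrinkage.
In the Shrinkage stage, introduce a "shrinkage sampler" deformable convolution that excludes false-positive background regions, trained with a classification loss and a region loss.
Generate pseudo ground truths by refining the CAM seeds with IRN (or other refinement methods), then train the final segmentation model, DeepLab-v2 with a ResNet-101 backbone.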
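The training signals in this list can be sketched as plain loss functions. The NumPy code below is a hedged illustration only: the negated multi-label binary cross-entropy follows the summary's description of an inverse signal that maximizes the classification loss, but `clip_features` (a simple upper cap) and the area-style region term in `shrinkage_objective` (mean CAM activation of the present classes, weighted by a hypothetical `area_weight`) are assumptions, not the paper's exact formulation. A real implementation would compute these in an autograd framework so that gradients reach the sampler offsets.

```python
import numpy as np

def multilabel_bce(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean binary cross-entropy over classes (image-level, multi-label)."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, 1e-7, 1 - 1e-7)  # numerical safety for log
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def expansion_objective(logits: np.ndarray, labels: np.ndarray) -> float:
    """Inverse supervision: minimizing this value maximizes the classification loss.

    Per the summary, only the expansion sampler would be updated; the backbone is frozen.
    """
    return -multilabel_bce(logits, labels)

def clip_features(features: np.ndarray, max_value: float) -> np.ndarray:
    """Hypothetical feature clipping: cap activations so a few very strong
    responses cannot dominate the maps (the paper's exact rule may differ)."""
    return np.minimum(features, max_value)

def shrinkage_objective(logits: np.ndarray, labels: np.ndarray,
                        cams: np.ndarray, area_weight: float = 0.1) -> float:
    """Classification loss plus an assumed area-style region term that penalizes
    the mean activation of the CAMs (C, H, W) of classes present in the image."""
    present = labels > 0
    area = cams[present].mean() if present.any() else 0.0
    return multilabel_bce(logits, labels) + area_weight * float(area)
```

The sign flip is the key design point of the Expansion stage: the sampler is rewarded for finding regions that make classification harder, which pushes it away from the most discriminative parts. The extra term in the Shrinkage stage then pulls the enlarged maps back toward a tighter object region.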

Experimental Results

Research Questions

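RQ1: Can the Expansion stage recover the full extent of the target object beyond its most discriminative regions?
RQ2: Can the Shrinkage stage prune false-positive and background regions to improve localization precision?
RQ3: How does ESOL affect weakly-supervised semantic segmentation performance on VOC2012 and COCO2014 compared with state-of-the-art methods?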
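RQ1 and RQ2 map onto the recall and precision of localization. The sketch below, with a hypothetical function name, measures both for a binary localization mask against a ground-truth object mask, and shows why two stages are useful: a map covering only a discriminative part has high precision but low recall, while an over-expanded map reaches full recall at lower precision.

```python
import numpy as np

def localization_precision_recall(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Pixel-level precision and recall of a binary localization mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()            # correctly localized object pixels
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    return float(precision), float(recall)

gt = np.zeros((4, 4), bool)
gt[1:3, 1:3] = True                    # a 2x2 object
part = np.zeros((4, 4), bool)
part[1, 1] = True                      # only the most discriminative pixel
over = np.zeros((4, 4), bool)
over[0:3, 0:3] = True                  # object plus surrounding background
```

Here `localization_precision_recall(part, gt)` gives precision 1.0 and recall 0.25, while the over-expanded mask gives recall 1.0 and precision 4/9: the first case is what Expansion addresses, the second is what Shrinkage addresses.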

Key Findings

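On VOC2012, the Expansion stage improves the initial CAM seeds by about 5.2% mIoU over the original CAM baseline.
After refinement, the final pseudo ground truths reach higher mIoU scores (e.g., 66.4% with PSA and 68.7% with IRN on VOC2012).
When explicit saliency cues are used as supervision, segmentation reaches 71.1%/70.4% mIoU on the VOC2012 val/test sets.
On the MS COCO 2014 validation set, ESOL reaches 42.6% mIoU, 1.2 points higher than IRN.
On VOC2012, ESOL's initial seeds reach 53.6% mIoU before refinement, surpassing several previous methods.
Overall, ESOL is competitive with or better than contemporaneous WSSS methods on VOC2012 and COCO2014.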
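The findings above are all reported as mIoU. Here is a minimal NumPy sketch of the usual computation, assuming integer label maps and the VOC convention of label 255 for ignored pixels: accumulate a confusion matrix (over the whole dataset in practice), compute per-class IoU = TP / (TP + FP + FN), and average over the classes that occur. The function names are illustrative, not taken from the paper's code.

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray,
                     num_classes: int, ignore_index: int = 255) -> np.ndarray:
    """Rows = ground-truth class, columns = predicted class; ignored pixels skipped."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf: np.ndarray) -> float:
    """Mean of TP / (TP + FP + FN) over classes present in prediction or ground truth."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp   # TP + FP + FN
    valid = union > 0
    return float(np.mean(tp[valid] / union[valid]))
```

Summing confusion matrices across images before calling `mean_iou` gives dataset-level mIoU; averaging per-image mIoU instead would give a different, non-standard number.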


This summary was generated by AI and reviewed by human editors.