QUICK REVIEW

[Paper Review] Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation

Tianle Chen, Zheda Mai|arXiv (Cornell University)|May 9, 2023

Advanced Neural Network Applications|Cited by 36

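One-sentence summary

This paper proposes SEPL, a simple method that uses CAM-derived pseudo-labels as seeds to select and merge SAM masks. The resulting pseudo-labels are both class-aware and object-aware, and they consistently improve WSSS performance on PASCAL VOC 2012 and MS COCO 2014.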

ABSTRACT

Weakly supervised semantic segmentation (WSSS) aims to bypass the need for laborious pixel-level annotation by using only image-level annotation. Most existing methods rely on Class Activation Maps (CAM) to derive pixel-level pseudo-labels and use them to train a fully supervised semantic segmentation model. Although these pseudo-labels are class-aware, indicating the coarse regions for particular classes, they are not object-aware and fail to delineate accurate object boundaries. To address this, we introduce a simple yet effective method harnessing the Segment Anything Model (SAM), a class-agnostic foundation model capable of producing fine-grained instance masks of objects, parts, and subparts. We use CAM pseudo-labels as cues to select and combine SAM masks, resulting in high-quality pseudo-labels that are both class-aware and object-aware. Our approach is highly versatile and can be easily integrated into existing WSSS methods without any modification. Despite its simplicity, our approach shows consistent gain over the state-of-the-art WSSS methods on both PASCAL VOC and MS-COCO datasets.

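Motivation and goals

Improve WSSS pseudo-label quality by making pseudo-labels object-aware rather than only class-aware.
Use the Segment Anything Model (SAM) to inject accurate object boundaries into pseudo-labels.
Develop SEPL as a lightweight, plug-and-play method that requires no modification to existing WSSS methods.
Demonstrate improvements over existing WSSS baselines on standard benchmarks (PASCAL VOC 2012 and MS COCO 2014).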

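Proposed method

SEPL takes two inputs per image: the CAM-derived pseudo-labels for each class, used as seeds, and the set of SAM masks for that image.
Mask assignment: each mask is assigned to the class whose CAM pseudo-label it overlaps most.
Mask selection: masks are then selected using two overlap metrics, o_s (the fraction of the mask that overlaps the pseudo-label) and o_p (the fraction of the pseudo-label covered by the mask).
A mask is kept when o_s > 0.5 or o_p > 0.85. If no SAM mask qualifies for a class, the original CAM pseudo-label for that class is kept.
The final enhanced pseudo-label is obtained by merging the selected masks with a pixel-wise OR and assigning the class label to the nonzero region.
SEPL is designed to plug into existing WSSS pipelines without modifying the underlying method.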
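The assignment and selection steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The function name `sepl_enhance`, the convention that label 0 is background, and the rule that a higher class id overwrites a lower one where merged regions overlap are all assumptions made for the example. The thresholds 0.5 and 0.85 come from the description above.

```python
import numpy as np

def sepl_enhance(pseudo_label, sam_masks, t_s=0.5, t_p=0.85):
    """Sketch of SEPL-style mask assignment and selection.

    pseudo_label: (H, W) integer array, 0 = background (assumed convention).
    sam_masks: iterable of (H, W) boolean arrays, one per SAM mask.
    """
    classes = [c for c in np.unique(pseudo_label) if c != 0]
    selected = {c: np.zeros(pseudo_label.shape, dtype=bool) for c in classes}
    has_mask = {c: False for c in classes}

    for mask in sam_masks:
        mask = mask.astype(bool)
        area = mask.sum()
        if area == 0 or not classes:
            continue
        # Mask assignment: pick the class with the largest intersection.
        inter = {c: np.logical_and(mask, pseudo_label == c).sum() for c in classes}
        c = max(inter, key=inter.get)
        if inter[c] == 0:
            continue  # the mask touches no foreground pseudo-label
        # Mask selection: o_s is the fraction of the mask inside the
        # pseudo-label, o_p the fraction of the pseudo-label inside the mask.
        o_s = inter[c] / area
        o_p = inter[c] / (pseudo_label == c).sum()
        if o_s > t_s or o_p > t_p:
            selected[c] |= mask  # pixel-wise OR of the kept masks
            has_mask[c] = True

    out = np.zeros_like(pseudo_label)
    for c in classes:
        # Fall back to the original pseudo-label if no mask was kept.
        region = selected[c] if has_mask[c] else (pseudo_label == c)
        out[region] = c  # assumption: later classes overwrite earlier ones
    return out
```

Because the function only consumes a label map and a list of binary masks, it can run as a post-processing step on the output of any pseudo-label generator, which mirrors the plug-and-play property described above.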

Figure 1: Illustration of how SAM addresses partial and false activation on PASCAL VOC 2012 train set: (A) original images; (B) pseudo-labels generated by a SOTA image-level WSSS method, CLIMS [50]; (C) masks from SAM; (D) SAM enhanced pseudo-labels; (E) ground-truth labels.

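Experimental results

Research questions

RQ1: Can class-agnostic SAM masks, guided by class-aware CAM pseudo-labels, yield higher-quality, object-aware pseudo-labels for WSSS?
RQ2: Does downstream semantic segmentation trained on these pseudo-labels show measurable improvement on VOC 2012 and COCO 2014?
RQ3: How robust is SEPL across diverse baseline WSSS methods and datasets?
RQ4: What are SEPL's failure modes, and how can they be mitigated?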

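Key findings

SEPL consistently improves pseudo-label quality across a range of state-of-the-art WSSS baselines on VOC 2012 and COCO 2014.
Across multiple baselines (e.g., Recurseed, L2G, CLIPES, RCA, EPS, CLIMS, TransCAM, PPC+EPS, PPC+SEAM, SIPE, PuzzleCAM), DeepLab V2 (ResNet-101) trained on SEPL-enhanced pseudo-labels achieves higher IoU than when trained on the original pseudo-labels.
On the VOC 2012 train set, SEPL improves pseudo-label quality by 5.33% on average across eleven baselines; on the MS COCO 2014 train set, the average improvement is 3.12%.
These gains require no changes to the existing WSSS methods, which underscores SEPL's versatility and practicality.
The study is the first to explore applying SAM in the WSSS setting, and it points to broader potential for segmentation foundation models in computer vision tasks.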
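For readers who want to reproduce this kind of comparison, the sketch below shows one common way to score pseudo-labels against ground truth, assuming the quality figures above are mean IoU values. The function name `mean_iou` and the `ignore_index=255` default (the void label used in PASCAL VOC annotations) are assumptions of this example and are not taken from the paper.

```python
import numpy as np

def mean_iou(pairs, num_classes, ignore_index=255):
    """Dataset-level mean IoU over (prediction, ground_truth) pairs.

    Intersections and unions are accumulated per class over all images,
    then averaged over the classes that appear at least once.
    """
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for pred, gt in pairs:
        valid = gt != ignore_index  # skip void pixels
        for c in range(num_classes):
            p = (pred == c) & valid
            g = (gt == c) & valid
            inter[c] += (p & g).sum()
            union[c] += (p | g).sum()
    present = union > 0
    return float((inter[present] / union[present]).mean())
```

Scoring the original and the enhanced pseudo-labels of the same images with this function gives the kind of before-and-after difference that the averages above summarize.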

Figure 2: Illustration of the SEPL pipeline. SEPL comprises two stages, mask assignment and mask selection. Based on the intersection between each SAM mask and pseudo-labels, a mask is assigned to the class with the largest intersection. For each mask, two metrics are computed: $o_{s}$, the fracti


This review was generated by AI and reviewed by a human editor.