QUICK REVIEW

[论文解读] An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems

Weixuan Sun, Zheyuan Liu|arXiv (Cornell University)|May 2, 2023

Machine Learning and Data Classification被引用 15

一句话总结

本论文研究使用 Segment Anything Model (SAM) 作为弱监督语义分割的伪标签生成器，与传统 WSSS 方法在 PASCAL VOC 和 MS-COCO 上进行比较。分析性能、局限性和实际意义。

ABSTRACT

The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks. In this report, we explore the application of SAM in Weakly-Supervised Semantic Segmentation (WSSS). Particularly, we adapt SAM as the pseudo-label generation pipeline given only the image-level class labels. While we observed impressive results in most cases, we also identify certain limitations. Our study includes performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable improvements over the latest state-of-the-art methods on both datasets. We anticipate that this report encourages further explorations of adopting SAM in WSSS, as well as wider real-world applications.

研究动机与目标

探讨是否可以使用 SAM 作为仅利用类别标签的图像级 WSSS 的伪标签生成器。
评估基于 SAM 的伪标签相对于标准数据集上的先进 WSSS 方法的质量。
识别 SAM 在 WSSS 中的局限性，包括语义模糊性与实际部署考虑。
提供在现实世界场景中何时使用 SAM 驱动的 WSSS 流水线具有优势的指南。

提出的方法

使用 Grounded-DINO 将图像级类别标签通过文本提示转换为有锚点的边界框。
将锚点边界框输入 SAM (ViT-H) 以获得实例分割掩码。
将 SAM 掩码结合以生成用于训练的语义分割伪标签。
在 PASCAL VOC 和 MS-COCO 上用 DeepLab-v2 (ResNet-101) 评估伪标签质量和下游分割。
将基于 SAM 的伪标签和最终分割与一系列之前的 WSSS 方法及全监督进行比较。
讨论包括计算成本和数据-真值对齐等实际考虑因素。

Figure 1: SAM generated pseudo-labels compared to the ground-truth in PASCAL VOC. In most cases, SAM performs closely to the human annotations.

实验结果

研究问题

RQ1SAM 在不进行微调的前提下，是否在由文本引导提示的情况下能够生成高质量的 WSSS 伪标签？
RQ2SAM 基于伪标签与最先进的 WSSS 方法在 PASCAL VOC 和 MS-COCO 上的对比如何？
RQ3使用 SAM 进行 WSSS 的实际局限性（如语义模糊性、资源需求）有哪些？
RQ4SAM 能否在标准基准上接近完全监督分割的性能？

主要发现

方法	场地	带有显著性	验证集	测试集
NSRM	CVPR2021	✓	70.4	70.2
InferCam	WACV2022	✓	70.8	71.8
EDAM	CVPR2021	✓	70.9	70.6
EPS	CVPR2021	✓	71.0	71.8
DRS	AAAI2021	✓	71.2	71.4
L2G	CVPR2022	✓	72.1	71.7
Du et al.	CVPR2022	✓	72.6	73.6
PSA	CVPR2018	–	61.7	63.7
SEAM	CVPR2020	–	64.5	65.7
CDA	ICCV2021	–	66.1	66.8
ECS-Net	ICCV2021	–	66.6	67.6
Du et al.	CVPR2022	–	67.7	67.4
CPN	ICCV2021	–	67.8	68.5
AdvCAM	CVPR2021	–	68.1	68.0
Kweon et al.	ICCV2021	–	68.4	68.2
ReCAM	CVPR2022	–	68.5	68.4
SIPE	CVPR2022	–	68.8	69.7
URN	AAAI2022	–	69.5	69.7
ESOL	NeurIPS2022	–	69.9	69.3
PMM	ICCV2021	–	70.0	70.5
VWL-L	IJCV2022	–	70.6	70.7
Lee et al.	CVPR2022	–	70.7	70.1
MCTformer	CVPR2022	–	71.9	71.6
OCR	CVPR2023	–	72.7	72.0
CLIP-ES	CVPR2023	–	73.8	73.9
SAM	–	–	77.2	77.1
full-supervision	–	–	77.7	79.7

SAM 伪标签在 PASCAL VOC 的训练集上达到 88.3 mIoU，领先于先前的 WSSS 方法 13.3 mIoU。
基于 SAM 的最终分割在 PASCAL VOC 的验证集达到 77.2 mIoU，在测试集达到 77.1 mIoU，超越了先前的 SOTA 方法。
在 MS-COCO 上，SAM 达到伪标签 mIoU 66.8，最终分割 55.6 mIoU，显著超过现有的 WSSS 方法。
SAM 展示了不微调也具备竞争力的性能，凸显其作为基于基础模型的 WSSS 替代方案的潜力。
研究指出语义模糊性作为一个局限性，SAM 的粒度可能与人工注释不同，建议将来工作采用分层提示。
SAM 不是对 WSSS 的严格公平比较（因为它是在大规模、潜在完全标注的数据上训练的），但提供了一种实用、简化的 WSSS 替代方案。

Figure 2: We observe that in some cases SAM performs better than the human annotated ground-truth. Notably, SAM is able to capture crisp boundaries, more detailed structures and finer-grained semantic classes.

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。