QUICK REVIEW

[Paper Review] Semantic-SAM: Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang|arXiv (Cornell University)|Jul 10, 2023

Advanced Neural Network Applications | Cited by 51

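One-Sentence Summary

Semantic-SAM is a universal segmentation model that segments and recognizes objects at multiple granularities and produces semantic-aware outputs. It is jointly trained on seven datasets to achieve open-vocabulary, multi-granularity segmentation, and it uses many-to-many matching and decoupled object/part classification to attain semantic-awareness and granularity-abundance.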

ABSTRACT

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. This allows our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work represents the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves semantic-awareness and granularity-abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.

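Research Motivation and Goals

Build a universal segmentation model that is both semantic-aware and granularity-abundant.
Consolidate training data from multiple datasets across semantic levels and granularity levels.
Produce multi-granularity outputs from a single click through a many-to-many matching scheme.
Decouple object and part concepts so that part knowledge can transfer across objects.
Demonstrate improvements in panoptic and part segmentation through joint training with SA-1B.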

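Proposed Method

Use a query-based mask decoder to generate multi-granularity masks.
Represent each user click with multiple queries, one per granularity level (K=6).
Convert point and box prompts into anchor boxes, and feed the inputs with positional embeddings into a deformable decoder.
Use many-to-many Hungarian matching to align the multiple predicted masks for each click with the multiple ground-truth masks.
Decouple object and part classification using a shared text encoder, enabling joint object/part segmentation across datasets.
Train on seven datasets (SA-1B, COCO panoptic, ADE20k panoptic, Pascal Part, PACO, PartImageNet, Objects365), reorganizing the data formats to match the training objectives.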
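
The many-to-many matching step can be illustrated with a small sketch. It is not the authors' implementation: masks are toy sets of pixels, the helper names (`iou`, `rect`, `match_many_to_many`) are hypothetical, and a brute-force search over assignments stands in for the Hungarian algorithm, which would return the same optimal assignment on inputs this small. Using plain IoU as the matching score is also an assumption, since this summary does not state the matching cost.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two masks, each given as a set of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rect(r0, c0, r1, c1):
    """Toy rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def match_many_to_many(pred_masks, gt_masks):
    """Pair every ground-truth mask with a distinct prediction, maximizing total IoU.

    Brute-force stand-in for Hungarian matching (fine for K=6 predictions).
    Returns a list of (pred_index, gt_index, iou) tuples, one per ground truth.
    """
    if len(gt_masks) > len(pred_masks):
        raise ValueError("need at least as many predictions as ground-truth masks")
    best_score, best_assign = -1.0, ()
    for assign in permutations(range(len(pred_masks)), len(gt_masks)):
        score = sum(iou(pred_masks[p], gt_masks[g]) for g, p in enumerate(assign))
        if score > best_score:
            best_score, best_assign = score, assign
    return [(p, g, iou(pred_masks[p], gt_masks[g])) for g, p in enumerate(best_assign)]


# One click lands inside three nested ground-truth masks (part, object, whole scene).
gt = [rect(0, 0, 2, 2), rect(0, 0, 4, 4), rect(0, 0, 8, 8)]
# K=6 predicted masks produced by the six queries for that click (toy values).
preds = [
    rect(0, 0, 8, 8), rect(0, 0, 3, 3), rect(0, 0, 2, 2),
    rect(0, 0, 4, 5), rect(5, 5, 6, 6), rect(0, 0, 1, 1),
]
matches = match_many_to_many(preds, gt)
for pred_idx, gt_idx, score in matches:
    print(f"prediction {pred_idx} <-> ground truth {gt_idx}: IoU {score:.2f}")
```

In this toy case the 2x2, 4x4, and 8x8 ground-truth masks are paired with predictions 2, 3, and 0, so three of the six queries each receive their own target instead of all being supervised by a single mask, as would happen under many-to-one matching.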

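Experimental Results

Research Questions

RQ1: Can a single model perform open-vocabulary, multi-granularity segmentation and recognition across diverse datasets?
RQ2: Does joint training on semantically rich and granularity-rich data improve both generic segmentation and fine-grained part segmentation?
RQ3: Does the many-to-many matching strategy improve the multi-granularity outputs of a single click?
RQ4: Does decoupled object/part classification enable effective transfer of part concepts across objects?
RQ5: How do SA-1B and the other segmentation datasets affect panoptic and part segmentation?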

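Key Findings

Semantic-SAM achieves semantic-awareness and granularity-abundance through joint training on seven datasets.
Joint training on SA-1B and data such as COCO panoptic raises the interactive segmentation framework's box AP (+2.3) and mask AP (+1.2).
The multi-granularity outputs per click are richer and of higher quality than those of prior methods such as SAM, and they score better on 1-IoU@All Granularity.
Compared with many-to-one matching, many-to-many matching significantly improves the 1-IoU@All Granularity score.
Training with SA-1B data improves COCO evaluation results, particularly on small and medium objects (APs, APm).
Semantic-SAM demonstrates open-vocabulary and multi-granularity capability on both generic and part segmentation tasks.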
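
The open-vocabulary behavior relies on the decoupled object/part classification described under the proposed method. The sketch below shows the idea only and makes several assumptions: the one-hot `encode_text` function is a hypothetical stand-in for the shared text encoder, the vocabularies, the temperature value, and the hand-built query embedding are invented for illustration, and none of the function names come from the paper's code.

```python
import math

OBJECTS = ["dog", "cat", "car"]
PARTS = ["head", "leg", "wheel"]
VOCAB = OBJECTS + PARTS


def encode_text(name):
    """Hypothetical stand-in for the shared text encoder: one-hot vector per word."""
    return [1.0 if word == name else 0.0 for word in VOCAB]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(query, names, temperature=0.1):
    """Score a mask query against one vocabulary using the shared text embeddings."""
    logits = [
        sum(q * t for q, t in zip(query, encode_text(name))) / temperature
        for name in names
    ]
    probs = softmax(logits)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], probs[best]


def decoupled_classify(query):
    """Classify the same query twice: once over object names, once over part names."""
    return {"object": classify(query, OBJECTS), "part": classify(query, PARTS)}


# Toy query embedding for a mask that looks like the head of a cat.
query = [a + b for a, b in zip(encode_text("cat"), encode_text("head"))]
result = decoupled_classify(query)
print(result["object"][0], result["part"][0])
```

Because "head" is scored independently of the object name, the label space grows as the number of objects plus the number of parts rather than their product, and a part name learned on one object can in principle be recognized on another.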

This review was generated by AI and reviewed by a human editor.