QUICK REVIEW

[Paper Review] Fast Segment Anything

Xu Zhao, Wenchao Ding | arXiv (Cornell University) | Jun 21, 2023

Multimodal Machine Learning Applications | Cited by 33

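One-Sentence Summary

FastSAM proposes a real-time, CNN-based alternative to SAM for the segment-anything task: it runs all-instance segmentation with YOLOv8-seg and then applies prompt-guided selection, reaching comparable performance at roughly 50 times the speed.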

ABSTRACT

The recently proposed segment anything model (SAM) has made a significant influence in many computer vision tasks. It is becoming a foundation step for many high-level tasks, like image segmentation, image caption, and image editing. However, its huge computation costs prevent it from wider applications in industry scenarios. The computation mainly comes from the Transformer architecture at high-resolution inputs. In this paper, we propose a speed-up alternative method for this fundamental task with comparable performance. By reformulating the task as segments-generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert this task to the well-studied instance segmentation task and directly train the existing instance segmentation method using only 1/50 of the SA-1B dataset published by SAM authors. With our method, we achieve a comparable performance with the SAM method at 50 times higher run-time speed. We give sufficient experimental results to demonstrate its effectiveness. The codes and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.

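Motivation and Objectives

Enable real-time segment-anything applications in industry by reducing computational cost.
Investigate whether a CNN-based detector can match SAM's performance on the segment-anything task.
Demonstrate how the two-stage FastSAM framework (all-instance segmentation followed by prompt-guided selection) performs at significantly higher inference speed.
Evaluate FastSAM on zero-shot tasks, including edge detection, object proposal generation, and text-guided segmentation, to test its generalization.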

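Proposed Method

Reformulate segment-anything as a two-stage process: all-instance segmentation (AIS) followed by prompt-guided selection (PGS).
For AIS, use YOLOv8-seg, a detector with an instance segmentation branch (YOLACT-style prototypes), to segment every object in the image.
Train on 2% (1/50) of the SA-1B dataset so that the CNN-based detector learns robust masks.
For PGS, use point, box, and text prompts (the last via CLIP) to pick the target object out of the AIS masks.
Map prompts to mask selection with a simple prompt encoder/decoder rather than end-to-end Transformer-based segmentation.
Report inference-speed comparisons on an RTX 3090 showing FastSAM to be 50x faster than SAM across various prompt settings.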
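
The prompt-guided selection stage can be illustrated with a minimal sketch. It assumes the first stage has already produced a list of binary masks, and it uses two plausible selection rules that this summary does not spell out: a point prompt picks the smallest mask covering the point, and a box prompt picks the mask whose bounding box has the highest IoU with the prompt box. The names `select_by_point`, `select_by_box`, `mask_to_box`, and `box_iou` are hypothetical and are not taken from the FastSAM codebase; masks are plain nested lists here to keep the example dependency-free.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # rows of 0/1 values (a real pipeline would use arrays)
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), inclusive pixel coordinates


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box of the nonzero pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(iw, 0) * max(ih, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def select_by_point(masks: List[Mask], x: int, y: int) -> Optional[int]:
    """Index of the smallest mask covering pixel (x, y), or None if none does.

    Preferring the smallest mask is an assumption made for this sketch.
    """
    hits = [i for i, m in enumerate(masks) if m[y][x]]
    if not hits:
        return None
    return min(hits, key=lambda i: sum(map(sum, masks[i])))


def select_by_box(masks: List[Mask], box: Box) -> Optional[int]:
    """Index of the mask whose bounding box best overlaps the prompt box."""
    best, best_iou = None, 0.0
    for i, m in enumerate(masks):
        mb = mask_to_box(m)
        if mb is None:
            continue
        iou = box_iou(mb, box)
        if iou > best_iou:
            best, best_iou = i, iou
    return best
```

Because selection only compares prompts against masks that already exist, this stage involves no further network inference for point and box prompts, which is consistent with the decoupled design described above.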

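Experimental Results

Research Questions

RQ1: Can a CNN-based detector with an instance segmentation branch achieve segmentation performance comparable to SAM on the segment-anything task while running in real time?
RQ2: How does FastSAM compare with SAM on zero-shot tasks such as edge detection, object proposal generation, and text-guided segmentation?
RQ3: What are the advantages and limitations of decoupling segment-anything into AIS and PGS, compared with an end-to-end Transformer approach?
RQ4: Is training on a small fraction of SA-1B sufficient to obtain competitive results in real-world applications?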

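Key Findings

FastSAM runs about 50x faster than SAM on a single RTX 3090 (32×32 prompt mode) while maintaining comparable performance.
In the zero-shot setting on BSDS500, FastSAM achieves competitive edge detection results, with a higher R50 and an AP comparable to SAM's.
For object proposals on COCO, FastSAM reaches an AR1000 of 63.7, slightly higher than SAM with 32×32 prompts, while also being faster.
On LVIS v1, FastSAM performs strongly on bounding-box AR@1000, and its mask AR@1000 is competitive with SAM, particularly in the zero-shot setting.
Using boxes from ViTDet as prompts, FastSAM shows robust zero-shot instance segmentation, although in some settings its AP on COCO/LVIS falls below that of fully supervised methods and SAM.
Text-prompted segmentation is achievable in combination with CLIP, but the slow throughput of CLIP embedding forces a trade-off between flexibility and speed.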
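
To make the AR@1000 figures above easier to interpret, here is a simplified sketch of average recall for box proposals on a single image. It assumes the COCO-style convention of averaging recall over IoU thresholds from 0.50 to 0.95 in steps of 0.05; the summary does not restate the exact evaluation protocol, and the function names `iou` and `average_recall` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in continuous coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt_boxes: List[Box],
                   proposals: List[Tuple[float, Box]],
                   k: int = 1000) -> float:
    """AR@k for one image: mean recall over IoU thresholds 0.50, 0.55, ..., 0.95.

    `proposals` holds (score, box) pairs; only the k highest-scoring are used.
    """
    if not gt_boxes:
        return 0.0
    top = [box for _, box in sorted(proposals, key=lambda p: p[0], reverse=True)[:k]]
    # Best IoU that any kept proposal achieves for each ground-truth box.
    best = [max((iou(g, p) for p in top), default=0.0) for g in gt_boxes]
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    recalls = [sum(b >= t for b in best) / len(gt_boxes) for t in thresholds]
    return sum(recalls) / len(thresholds)
```

A full benchmark aggregates over every image in the dataset rather than one; the per-image version here only shows how the proposal budget k and the IoU thresholds enter the metric.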


This review was generated by AI and reviewed by a human editor.