QUICK REVIEW

[Paper Review] SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

Tianrun Chen, Lanyun Zhu | arXiv (Cornell University) | Apr 18, 2023

Visual Attention and Saliency Detection | Cited by 40

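One-Sentence Summary

SAM-Adapter adds lightweight, task-specific adapters to the Segment Anything Model (SAM) backbone. These adapters improve segmentation in challenging scenes such as camouflage, shadows, and medical images, and the method achieves state-of-the-art (SOTA) results on several datasets.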

ABSTRACT

The emergence of large models, also known as foundation models, has brought significant advancements to AI research. One such model is Segment Anything (SAM), which is designed for image segmentation tasks. However, as with other foundation models, our experimental findings suggest that SAM may fail or perform poorly in certain segmentation tasks, such as shadow detection and camouflaged object detection (concealed object detection). This study first paves the way for applying the large pre-trained image segmentation model SAM to these downstream tasks, even in situations where SAM performs poorly. Rather than fine-tuning the SAM network, we propose SAM-Adapter, which incorporates domain-specific information or visual prompts into the segmentation network by using simple yet effective adapters. By integrating task-specific knowledge with general knowledge learnt by the large model, SAM-Adapter can significantly elevate the performance of SAM in challenging tasks, as shown in extensive experiments. We can even outperform task-specific network models and achieve state-of-the-art performance in the tasks we tested: camouflaged object detection and shadow detection. We also tested polyp segmentation (medical image segmentation) and achieved better results. We believe our work opens up opportunities for utilizing SAM in downstream tasks, with potential applications in various fields, including medical image processing, agriculture, remote sensing, and more.

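Motivation and Goals

Evaluate performance on challenging segmentation tasks where SAM underperforms (camouflage, shadows, medical images).
Propose SAM-Adapter, which injects task-specific information into SAM through lightweight adapters without fine-tuning the backbone.
Demonstrate performance gains on camouflaged object detection, shadow detection, and polyp segmentation datasets.
Show that SAM-Adapter can outperform task-specific models and reach competitive or state-of-the-art results.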

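Proposed Method

Use SAM (ViT-H/16) as a frozen backbone. Its mask decoder is initialized and then lightly fine-tuned.
Introduce SAM-Adapter: a lightweight architecture of two MLP-based modules that generates task-specific visual prompts.
Feed task-specific information F^i (e.g., high-frequency components and patch embeddings) into the adapter to generate the prompt P^i for each SAM layer.
Attach the prompts P^i to the transformer layers to steer SAM toward the downstream task.
Allow F^i to be a flexible combination of guidance types, $F^i = \sum_j w_j F_j$, where each F_j is one kind of task-specific information and w_j is its adjustable weight.
Train on each dataset with standard losses (BCE and/or IoU loss, as applicable) and the AdamW optimizer.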
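The adapter described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation.

It follows our reading of the paper's formulation, P^i = MLP_up(GELU(MLP_tune^i(F^i))). Each adapted layer gets its own tuning MLP, and all layers share one up-projection. The sketch assumes the prompt is added element-wise to each token entering the frozen transformer layer.

All function names, dimensions, and the initialization are hypothetical. Random vectors stand in for real guidance features such as patch embeddings or high-frequency components.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Linear = Tuple[List[Vector], Vector]  # (weight rows [out][in], bias [out])


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, layer: Linear) -> Vector:
    """Dense layer y = W x + b applied to a single vector."""
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(in_dim: int, out_dim: int, rng: random.Random) -> Linear:
    """Hypothetical small uniform init with zero bias (not the paper's scheme)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def combine_guidance(sources: List[Vector], weights: Vector) -> Vector:
    """F^i = sum_j w_j F_j for one token: weighted sum of guidance vectors."""
    return [sum(w * src[k] for w, src in zip(weights, sources))
            for k in range(len(sources[0]))]


def adapter_prompt(f_i: Vector, tune: Linear, up: Linear) -> Vector:
    """P^i = MLP_up(GELU(MLP_tune^i(F^i))) for one token."""
    hidden = [gelu(h) for h in linear(f_i, tune)]
    return linear(hidden, up)


def apply_adapter(tokens: List[Vector], guidance: List[List[Vector]],
                  weights: Vector, tune: Linear, up: Linear) -> List[Vector]:
    """Add a per-token prompt P^i to the tokens entering a frozen SAM layer.

    guidance[j][n] is the j-th guidance source for token n (e.g. a
    patch-embedding or high-frequency feature). `tune` and `up` hold the
    trainable parameters, `weights` are the adjustable w_j, and the frozen
    layer that would consume the output is not modeled here.
    """
    prompted = []
    for n, tok in enumerate(tokens):
        f_i = combine_guidance([src[n] for src in guidance], weights)
        p_i = adapter_prompt(f_i, tune, up)
        prompted.append([t + p for t, p in zip(tok, p_i)])
    return prompted


# Toy usage: 3 tokens, 2 guidance sources; all sizes are hypothetical.
rng = random.Random(0)
D, G, R = 8, 4, 2  # token dim, guidance dim, adapter bottleneck dim
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(3)]
guidance = [[[rng.gauss(0.0, 1.0) for _ in range(G)] for _ in range(3)]
            for _ in range(2)]
tune_i = init_linear(G, R, rng)  # MLP_tune^i: one per adapted layer
up = init_linear(R, D, rng)      # MLP_up: shared by all adapted layers
out = apply_adapter(tokens, guidance, [0.5, 0.5], tune_i, up)
print(len(out), len(out[0]))     # prints: 3 8
```

With the zero biases used in this sketch, setting every w_j to zero makes the adapter an identity on the tokens. This is a quick check that the frozen backbone's input is recovered when no guidance is injected.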

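Experimental Results

Research Questions

RQ1: With a lightweight adapter, can SAM reach competitive performance on camouflaged object detection, shadow detection, and polyp segmentation?
RQ2: Which kinds of task-specific information (visual priors) are effective as adapter inputs?
RQ3: Without fine-tuning the SAM backbone, do SAM-Adapter's prompts generalize across datasets and tasks?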

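Key Findings

On its own, SAM underperforms on camouflaged object detection and shadow detection.
SAM-Adapter substantially improves SAM on camouflaged object detection (COD) datasets, raising Sα by roughly +17.9% over the baseline on CAMO/COD10K/CHAMELEON.
SAM-Adapter achieves near-SOTA results on camouflaged object detection (COD10K, CAMO, CHAMELEON) and shadow detection (ISTD), with strong scores and low MAE.
On polyp segmentation (medical imaging), SAM-Adapter improves mDice and mIoU over the SAM baseline.
Quantitative results show SAM-Adapter outperforming several task-specific methods and the original SAM across multiple tasks.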
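The findings above rely on MAE, Dice, and IoU. The following minimal plain-Python sketch computes these metrics on flattened toy masks and adds a relative-gain helper.

Several assumptions apply:
- mDice and mIoU are taken to be per-image averages. This is a common convention in polyp segmentation, but this summary does not confirm it.
- The sketch does not implement the structure measure Sα, which is considerably more involved.
- The summary does not say whether the reported +17.9% is a relative or absolute Sα change, so `relative_gain` shows only the relative reading.

All function names and the toy masks are hypothetical and are not data from the paper.

```python
from typing import Callable, Sequence


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a [0, 1] prediction map and a binary mask."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def dice(pred: Sequence[int], gt: Sequence[int], eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards empty masks."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return (2.0 * inter + eps) / (sum(pred) + sum(gt) + eps)


def iou(pred: Sequence[int], gt: Sequence[int], eps: float = 1e-8) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for binary masks; eps guards empty masks."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(pred) + sum(gt) - inter
    return (inter + eps) / (union + eps)


def mean_score(metric: Callable[[Sequence[int], Sequence[int]], float],
               preds: Sequence[Sequence[int]],
               gts: Sequence[Sequence[int]]) -> float:
    """mDice / mIoU under the per-image-average convention (an assumption)."""
    return sum(metric(p, g) for p, g in zip(preds, gts)) / len(gts)


def relative_gain(new: float, base: float) -> float:
    """Relative improvement in percent: (new - base) / base * 100."""
    return (new - base) / base * 100.0


# Toy 4-pixel masks (flattened); not data from the paper.
gt = [1, 0, 0, 0]
soft = [0.9, 0.6, 0.0, 0.1]                   # predicted probabilities
hard = [1 if p >= 0.5 else 0 for p in soft]   # threshold at 0.5 -> [1, 1, 0, 0]
print(round(mae(soft, gt), 3), round(dice(hard, gt), 3), round(iou(hard, gt), 3))
# prints: 0.2 0.667 0.5
```

Thresholding the soft map at 0.5 before computing Dice and IoU is an illustrative choice. Evaluation toolkits may use other thresholds or sweep over several.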

This review was generated by AI and reviewed by human editors.