QUICK REVIEW

[Paper Review] Personalize Segment Anything Model with One Shot

Renrui Zhang, Zhengkai Jiang|arXiv (Cornell University)|May 4, 2023

Visual Attention and Saliency Detection | Cited by 65

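One-Sentence Summary

PerSAM personalizes SAM from a single shot without any training, using positive/negative location priors and target semantics; PerSAM-F adds scale-aware fine-tuning with only two learnable weights to improve segmentation, achieving state-of-the-art results on personalized object segmentation and also benefiting DreamBooth.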

ABSTRACT

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at https://github.com/ZrrSkywalker/Personalize-SAM

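Motivation and Objectives

Enable personalized segmentation of user-specified visual concepts without relying on manual prompts.
Develop a training-free mechanism that injects high-level target semantics into SAM to obtain personalized masks.
Introduce a lightweight, scale-aware fine-tuning variant (PerSAM-F) to resolve ambiguity in mask scale.
Build the PerSeg dataset for evaluating personalized object segmentation.
Demonstrate applicability to one-shot video, semantic, and part segmentation, and assist DreamBooth in personalized image synthesis.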

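Proposed Method

Use features from the reference image and the test image to compute a location confidence map for the target in the test image.
Extract a positive-negative location prior from the confidence map to prompt SAM.
Apply target-guided attention: use the location map to bias all cross-attention layers toward foreground regions.
Apply target-semantic prompting by adding a global target embedding to all decoder input tokens.
Perform cascaded post-refinement, iteratively re-prompting the lightweight decoder to improve mask quality.
For PerSAM-F, freeze SAM and learn two mask weights that combine SAM's three-scale outputs into a final scale-aware mask (two learnable parameters, within 10 seconds on an A100).
Optionally, use PerSAM to improve DreamBooth by masking out background regions during diffusion-model fine-tuning.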
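
The first two steps above (confidence map, then positive-negative location prior) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the confidence map is the cosine similarity between a mask-averaged reference feature and each test-pixel feature, and that the positive and negative points are simply the highest- and lowest-confidence pixels. The function names and the toy 2x2 feature grids are hypothetical, and plain Python lists stand in for encoder feature tensors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_embedding(ref_feats, ref_mask):
    """Average reference features over the pixels the mask marks as target."""
    fg = [feat
          for feat_row, mask_row in zip(ref_feats, ref_mask)
          for feat, m in zip(feat_row, mask_row) if m]
    if not fg:
        raise ValueError("reference mask selects no pixels")
    return [sum(channel) / len(fg) for channel in zip(*fg)]


def confidence_map(test_feats, target):
    """Per-pixel similarity between test features and the target embedding."""
    return [[cosine(feat, target) for feat in row] for row in test_feats]


def location_priors(conf):
    """Return (positive, negative) points as (row, col) of max / min confidence."""
    cells = [(value, (i, j))
             for i, row in enumerate(conf)
             for j, value in enumerate(row)]
    positive = max(cells, key=lambda c: c[0])[1]
    negative = min(cells, key=lambda c: c[0])[1]
    return positive, negative


# Toy 2x2 feature grids with 2-channel features (hypothetical data).
ref_feats = [[[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.0, 1.0]]]
ref_mask = [[1, 0],
            [1, 0]]
test_feats = [[[0.0, 1.0], [0.9, 0.1]],
              [[-1.0, 0.0], [0.5, 0.5]]]

target = target_embedding(ref_feats, ref_mask)
conf = confidence_map(test_feats, target)
positive, negative = location_priors(conf)
print(positive, negative)  # prints (0, 1) (1, 0)
```

In a real pipeline the two points would be passed to SAM as a foreground and a background point prompt; that step is omitted here because it depends on the SAM interface in use.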

Experimental Results

Research Questions

RQ1: Can SAM be personalized for a specific object from just one reference image and a rough mask?
RQ2: How can high-level target semantics be efficiently injected into SAM without retraining?
RQ3: Does a light, scale-aware fine-tuning strategy improve segmentation when only one-shot data is available?
RQ4: Can PerSAM facilitate better personalized text-to-image synthesis (DreamBooth) by mitigating background disturbance?

Key Findings

Method	mIoU	bIoU	Param.
PerSAM	89.3	71.7	0
PerSAM-F	95.3	77.9	2
VP	65.9	25.5	383M
SEEM*	87.1	55.7	341M
SegGPT*	94.3	76.5	354M
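
For readers unfamiliar with the mIoU column, the sketch below shows one common way such a score is computed. It assumes binary masks flattened to lists, a plain average of per-pair IoU, and scaling to percent; how PerSeg itself aggregates scores (per image or per object) is not stated in this summary, and bIoU (boundary IoU) is not covered. The function names are hypothetical.

```python
def iou(pred, gt):
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) pairs, in percent."""
    scores = [iou(pred, gt) for pred, gt in pairs]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical masks: the first pair overlaps on 1 of 2 pixels, the second matches.
pairs = [([1, 1, 0, 0], [1, 0, 0, 0]),
         ([0, 1, 1, 0], [0, 1, 1, 0])]
print(mean_iou(pairs))  # prints 75.0
```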

On PerSeg, PerSAM reaches 89.3 mIoU without any training, a large gain over baselines such as VP (65.9 mIoU).
PerSAM-F achieves the best overall performance on PerSeg with mIoU of 95.3 and bIoU of 77.9, using only 2 trainable parameters.
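Among the compared baselines (VP, Painter, SEEM, SegGPT), training-free PerSAM (89.3 mIoU) outperforms VP (65.9) and SEEM (87.1) but trails SegGPT (94.3); PerSAM-F (95.3) exceeds all of the baselines listed in the table above.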
Two-step post-refinement and target-guided attention contribute notably to performance gains (up to +11.4% mIoU from refinement alone).
Scale-aware fine-tuning (PerSAM-F) provides a robust improvement by learning scale weights, outperforming other parameter-efficient methods (Prompt Tuning, Adapter, LoRA).
PerSAM-assisted DreamBooth yields higher-quality personalized text-to-image synthesis by focusing training on foreground regions.
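
The two-parameter idea behind PerSAM-F can be sketched as follows. This is a minimal illustration, not the released code. It assumes the third weight is fixed at 1 - w1 - w2 so that only two values are learned, that both weights start at 1/3, and that a squared-error loss with plain gradient descent stands in for whatever loss and optimizer the authors use. The flattened four-pixel masks and the function names are hypothetical.

```python
def combine(masks, w1, w2):
    """Weighted sum of three mask scales; the third weight is 1 - w1 - w2."""
    m1, m2, m3 = masks
    w3 = 1.0 - w1 - w2
    return [w1 * a + w2 * b + w3 * c for a, b, c in zip(m1, m2, m3)]


def fit_weights(masks, gt, lr=0.5, steps=200):
    """Learn w1 and w2 by gradient descent on mean squared error."""
    m1, m2, m3 = masks
    w1 = w2 = 1.0 / 3.0  # assumed initialization: equal weight per scale
    n = len(gt)
    for _ in range(steps):
        residual = [p - g for p, g in zip(combine(masks, w1, w2), gt)]
        # d(loss)/d(w1) = mean(2 * residual * (m1 - m3)); likewise for w2.
        g1 = sum(2 * r * (a - c) for r, a, c in zip(residual, m1, m3)) / n
        g2 = sum(2 * r * (b - c) for r, b, c in zip(residual, m2, m3)) / n
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2


# Hypothetical masks at three scales (whole, part, subpart), flattened.
masks = ([1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0])
gt = [1.0, 1.0, 0.0, 0.0]  # the reference mask matches the middle scale

w1, w2 = fit_weights(masks, gt)
print(round(w1, 3), round(w2, 3))
```

Because the reference mask here coincides with the middle scale, the fit drives w2 toward 1 and w1 toward 0, which is the behavior one would expect from a scale selector with only two free parameters.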


This review was generated by AI and reviewed by human editors.