QUICK REVIEW

[Paper Digest] Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Zihan Zhong, Zhiqiang Tang|arXiv (Cornell University)|Jan 31, 2024

Context-Aware Activity Recognition Systems|Cited by 12

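One-Sentence Summary

Conv-LoRA is a parameter-efficient fine-tuning method that injects lightweight convolutional priors into SAM's ViT encoder through a set of MoE-routed, scale-specific experts, improving segmentation across multiple domains while keeping most of SAM's weights frozen.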

ABSTRACT

The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity of learning high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Comprehensive experimentation across diverse benchmarks spanning multiple domains underscores Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.

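Motivation and Objectives

Improve SAM on domain-specific segmentation (medical imaging, remote sensing, and similar) where zero-shot SAM struggles.
Propose a parameter-efficient fine-tuning method that adds image-related local priors while preserving SAM's knowledge.
Develop Conv-LoRA, which extends LoRA with lightweight convolutions and a Mixture-of-Experts (MoE) to handle multi-scale features.
Show that Conv-LoRA outperforms other PEFT methods on natural-image, agricultural, remote-sensing, and medical datasets.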

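Proposed Method

Builds on LoRA: inserts a bottleneck around the transformer weights and adds lightweight convolutions to it (Conv-LoRA).
Uses a Mixture-of-Experts (MoE) to create multiple scale-specific convolutional experts, with a gating mechanism that dynamically selects the top-k experts during the forward pass.
Injects local priors at the appropriate feature scale: each expert upsamples the feature map, applies a convolution, and downsamples the result back to the ViT's default scale.
Removes the prompt encoder to enable end-to-end fine-tuning, and adds a lightweight classification branch to the mask decoder for multi-class segmentation.
Trains every method with a small number of trainable parameters while SAM's pretrained weights stay frozen; an auxiliary loss balances expert usage.
Compares Conv-LoRA against baselines on diverse datasets: decoder-only fine-tuning, BitFit, Adapter, SAM-Adapter, VPT, LST, SSF, and LoRA.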
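
The expert and gating steps listed above can be made concrete with a small sketch. It is not the authors' implementation: it works on a single-channel 2D feature map in plain Python, whereas the method operates on multi-channel features inside the LoRA bottleneck. The function names (`scale_expert`, `top_k_gate`, `moe_conv`, `balance_loss`), the nearest-neighbour upsampling, the average-pool downsampling, the 3x3 kernel size, and the specific balance loss (squared coefficient of variation of expert importance) are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def upsample_nearest(grid, s):
    """Nearest-neighbour upsampling of a 2D grid by an integer factor s."""
    return [[v for v in row for _ in range(s)] for row in grid for _ in range(s)]

def conv3x3(grid, kernel):
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * grid[y][x]
            out[i][j] = acc
    return out

def downsample_mean(grid, s):
    """Average-pool a 2D grid by an integer factor s."""
    h, w = len(grid) // s, len(grid[0]) // s
    return [[sum(grid[i * s + a][j * s + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(w)] for i in range(h)]

def scale_expert(grid, kernel, s):
    """One expert: upsample by s, convolve, then return to the original scale."""
    return downsample_mean(conv3x3(upsample_nearest(grid, s), kernel), s)

def top_k_gate(logits, k):
    """Keep the k largest gate logits and renormalise them with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return dict(zip(top, softmax([logits[i] for i in top])))

def moe_conv(grid, kernels, scales, gate_logits, k=1):
    """Gate-weighted sum of the selected experts' outputs."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for idx, weight in top_k_gate(gate_logits, k).items():
        y = scale_expert(grid, kernels[idx], scales[idx])
        for i in range(h):
            for j in range(w):
                out[i][j] += weight * y[i][j]
    return out

def balance_loss(gate_probs_per_sample):
    """Assumed auxiliary loss: squared coefficient of variation of the total
    gate probability each expert receives; 0 when usage is uniform."""
    n = len(gate_probs_per_sample[0])
    importance = [sum(p[e] for p in gate_probs_per_sample) for e in range(n)]
    mean = sum(importance) / n
    var = sum((v - mean) ** 2 for v in importance) / n
    return var / (mean ** 2 + 1e-10)

# Hypothetical toy input: a 2x2 feature map and two experts (scales 1 and 2).
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
features = [[1.0, 2.0], [3.0, 4.0]]
routed = moe_conv(features, [identity, identity], [1, 2], gate_logits=[0.1, 2.0], k=1)
```

With `k=1` only the highest-scoring expert runs, which is why dynamic selection of this kind can be cheaper than evaluating and fusing every scale.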

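Experimental Results

Research Questions

RQ1: Can PEFT, and Conv-LoRA in particular, restore and strengthen SAM's ability to learn high-level semantic information while preserving its segmentation knowledge?
RQ2: Does injecting multi-scale local priors through MoE-guided Conv-LoRA improve binary and multi-class semantic segmentation on natural-image, agricultural, remote-sensing, and medical datasets?
RQ3: How does Conv-LoRA compare with other PEFT methods in performance, parameter overhead, and training efficiency?
RQ4: Is end-to-end SAM viable for segmentation tasks when the prompt is held fixed and a multi-class decoding branch is added?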

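Key Findings

Conv-LoRA consistently outperforms other PEFT methods on natural-image, agricultural, remote-sensing, and medical benchmarks.
Compared with LoRA, Conv-LoRA adds very little parameter overhead while delivering a clear performance gain.
MoE-based dynamic scale selection trains faster and uses less memory than multi-scale fusion.
Fine-tuning the image encoder, even with PEFT alone, yields better segmentation quality (mIoU, Dice) than decoder-only fine-tuning.
SAM's pretraining on binary mask prediction limits its high-level semantic learning; Conv-LoRA helps recover it.
With simple structural changes and PEFT, SAM can be adapted end to end for multi-class segmentation.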
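
For readers unfamiliar with the two metrics named in the findings, the following sketch shows how per-class IoU, Dice, and mIoU are commonly computed from label maps. It uses the standard textbook definitions rather than the paper's evaluation code, and the function names and toy labels are hypothetical.

```python
def iou_and_dice(pred, target, cls):
    """IoU and Dice for one class, given flat lists of predicted and true labels."""
    pairs = list(zip(pred, target))
    tp = sum(p == cls and t == cls for p, t in pairs)  # true positives
    fp = sum(p == cls and t != cls for p, t in pairs)  # false positives
    fn = sum(p != cls and t == cls for p, t in pairs)  # false negatives
    if tp + fp + fn == 0:
        return None  # class absent from both prediction and ground truth
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)

def mean_iou(pred, target, classes):
    """Average IoU over the classes that occur in prediction or ground truth."""
    scores = [iou_and_dice(pred, target, c) for c in classes]
    ious = [s[0] for s in scores if s is not None]
    return sum(ious) / len(ious) if ious else float("nan")

# Hypothetical 4-pixel example with classes 0 (background) and 1 (foreground).
pred = [0, 1, 1, 0]
target = [0, 1, 0, 0]
fg_iou, fg_dice = iou_and_dice(pred, target, 1)  # IoU 1/2, Dice 2/3
miou = mean_iou(pred, target, [0, 1])            # (2/3 + 1/2) / 2
```

Dice weights true positives twice, so for the same prediction it is never lower than IoU.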

This digest was generated by AI and reviewed by a human editor.