QUICK REVIEW

[Paper Review] Towards Efficient Visual Adaption via Structural Re-parameterization

Gen Luo, Minglang Huang|arXiv (Cornell University)|Feb 16, 2023

Advanced Neural Network Applications | Cited by 28

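One-Sentence Summary

RepAdapter introduces a sequential, structurally re-parameterizable visual adapter for giant vision models that outperforms state-of-the-art PETL methods across 27 datasets while adding zero inference overhead.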

ABSTRACT

Parameter-efficient transfer learning (PETL) is an emerging research spot aimed at inexpensively adapting large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage costs for various pre-trained models by updating a small number of parameters instead of full tuning. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computational friendly adapter for giant vision models, called RepAdapter. Specifically, we first prove that common adaptation modules can also be seamlessly integrated into most giant vision models via our structural re-parameterization, thereby achieving zero-cost during inference. We then investigate the sparse design and effective placement of adapter structure, helping our RepAdapter obtain other advantages in terms of parameter efficiency and performance. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets of three vision tasks, i.e., image and video classifications and semantic segmentation. Experimental results show the superior performance and efficiency of RepAdapter than the state-of-the-art PETL methods. For instance, RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k. The generalization ability of RepAdapter is also well validated by a bunch of vision models. Our source code is released at https://github.com/luogen1996/RepAdapter.

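Motivation and Objectives

Advance parameter-efficient transfer learning (PETL) for large vision models so that deployment needs less storage and compute.
Show that common visual adapters can be merged into the pre-trained model through structural re-parameterization, adding no extra inference cost.
Investigate sparse adapter design and adapter placement to improve parameter efficiency and performance.
Demonstrate RepAdapter's effectiveness across diverse vision tasks (image/video classification, semantic segmentation) and model families.
Validate generalization on backbones such as ConvNeXt, ViT, Swin-Transformer, and CLIP.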

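Proposed Method

Propose RepAdapter, a lightweight adapter whose training-time additions can be re-parameterized into the nearby projection weights, giving zero inference cost.
Remove the non-linear activation from the adapter so that it is purely linear and collapses into an equivalent linear projection at inference.
Re-parameterize the sequential adapter block into the pre-trained weights (W0, b0), forming Wrep and brep, which can be merged into attention (MHA), FFN, and convolution layers.
Introduce a dense-sparse adapter design in which the up-projection is applied group-wise (Gs groups) to reduce the parameter count.
Systematically study adapter placement, showing that inserting the adapter before the neural modules (before MHA/FFN) performs better on ViT and other backbones.
Evaluate RepAdapter on three vision tasks (image/video classification, semantic segmentation) and multiple backbones (ViT, ConvNeXt, Swin-Transformer, CLIP).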
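
The merging step described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the row-vector convention (y = x W + b), the residual form x + s * up(down(x)), the scale s, and every function and variable name here are assumptions made for illustration. The sketch builds a linear adapter with a group-wise (block-diagonal) up-projection, folds it into the following projection (W0, b0), and compares the merged layer with the unmerged two-step computation.

```python
import numpy as np


def grouped_up_weight(blocks):
    """Assemble per-group up-projection blocks into one block-diagonal matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    w = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        w[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return w


def adapter_forward(x, w_down, b_down, w_up, b_up, scale):
    """Linear adapter with a residual connection and no activation (assumed form)."""
    return x + scale * ((x @ w_down + b_down) @ w_up + b_up)


def merge_adapter(w0, b0, w_down, b_down, w_up, b_up, scale):
    """Fold the adapter into the following projection y = x @ w0 + b0."""
    dim = w_down.shape[0]
    # adapter(x) = x @ w_ada + b_ada, because every step is linear
    w_ada = np.eye(dim) + scale * (w_down @ w_up)
    b_ada = scale * (b_down @ w_up + b_up)
    w_rep = w_ada @ w0
    b_rep = b_ada @ w0 + b0
    return w_rep, b_rep


def demo(seed=0):
    """Return the largest absolute gap between unmerged and merged outputs."""
    rng = np.random.default_rng(seed)
    dim, hidden, groups, dim_out, tokens = 8, 4, 2, 6, 5
    x = rng.normal(size=(tokens, dim))
    w_down = rng.normal(size=(dim, hidden))
    b_down = rng.normal(size=hidden)
    # each group maps hidden/groups features to dim/groups features
    blocks = [rng.normal(size=(hidden // groups, dim // groups))
              for _ in range(groups)]
    w_up = grouped_up_weight(blocks)
    b_up = rng.normal(size=dim)
    w0 = rng.normal(size=(dim, dim_out))
    b0 = rng.normal(size=dim_out)
    scale = 0.5

    unmerged = adapter_forward(x, w_down, b_down, w_up, b_up, scale) @ w0 + b0
    w_rep, b_rep = merge_adapter(w0, b0, w_down, b_down, w_up, b_up, scale)
    merged = x @ w_rep + b_rep
    return float(np.max(np.abs(unmerged - merged)))


if __name__ == "__main__":
    print("max abs difference:", demo())
```

Because the adapter contains no activation, the two computations differ only by floating-point rounding, and the merged layer has the same shape as the original (W0, b0). This is why the adapter adds nothing to the deployed model under these assumptions.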

Figure 1: Performance comparison of our RepAdapter and existing PETL methods [19, 16, 2, 18, 38] on VTAB-1K. The vision model is ViT-B/16 and the inference speed is measured on an NVIDIA 3090 GPU with a batch size of 1. Most existing PETL methods incur non-negligible GPU latency during inference…

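Experimental Results

Research Questions

RQ1: Can a linearized sequential adapter be re-parameterized into a pre-trained vision model without increasing inference cost?
RQ2: Does a sparse, group-wise adapter design preserve performance while reducing the parameter count?
RQ3: How does adapter placement affect large vision models, and which positions yield the largest gains?
RQ4: How well does RepAdapter generalize across architectures and tasks (image/video classification, segmentation, CLIP-based few-shot learning and domain generalization)?
RQ5: How does RepAdapter compare with existing PETL methods in accuracy and efficiency?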

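Key Findings

After re-parameterization, RepAdapter incurs no additional computation at inference.
Sequentially placed, linearized adapters can be merged into the pre-trained weights without degrading performance.
The sparse (group-wise) design cuts the parameter count by about 25% while maintaining or improving accuracy.
On ViT and other backbones, inserting the adapter before MHA/FFN performs better than inserting it after.
RepAdapter outperforms state-of-the-art PETL methods on VTAB-1k and generalizes well to CLIP, ConvNeXt, Swin, and ViT, as well as to video and segmentation tasks.
RepAdapter adds no extra FLOPs at inference, unlike many competing PETL methods, while achieving better or competitive accuracy.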
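
To make the parameter saving from a group-wise up-projection concrete, here is a small counting sketch. The hidden size of 8 and the choice of 2 groups are illustrative assumptions, not values stated in this review, and `adapter_param_count` is a hypothetical helper. The width 768 is the ViT-B/16 embedding dimension.

```python
def adapter_param_count(dim, hidden, groups=1, with_bias=True):
    """Count parameters of a down/up adapter whose up-projection is group-wise.

    The down-projection is dense (dim -> hidden). The up-projection is split
    into `groups` independent blocks, each mapping hidden/groups features to
    dim/groups features. groups=1 recovers a fully dense adapter.
    """
    if dim % groups or hidden % groups:
        raise ValueError("dim and hidden must be divisible by groups")
    down = dim * hidden
    up = groups * (hidden // groups) * (dim // groups)
    bias = (hidden + dim) if with_bias else 0
    return down + up + bias


def relative_saving(dim, hidden, groups, with_bias=True):
    """Fraction of parameters saved relative to the dense (groups=1) adapter."""
    dense = adapter_param_count(dim, hidden, 1, with_bias)
    sparse = adapter_param_count(dim, hidden, groups, with_bias)
    return (dense - sparse) / dense


if __name__ == "__main__":
    print(adapter_param_count(768, 8, groups=1, with_bias=False))
    print(adapter_param_count(768, 8, groups=2, with_bias=False))
    print(relative_saving(768, 8, groups=2, with_bias=False))
```

Under these assumed sizes, two groups halve the up-projection weights, so the adapter's weight count drops by a quarter when biases are ignored. That is in line with the roughly 25% reduction listed above, although the group count actually used in the paper is not given here.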

Figure 2: Comparison of existing PETL methods [2, 19, 18] and our RepAdapter. RepAdapter is deployed in a sequential manner, but it can be completely re-parameterized into the vision models during inference, enabling zero additional computational overhead. Its structure is also more lightweight…

This review was generated by AI and reviewed by human editors.