QUICK REVIEW

[Paper Review] Patching open-vocabulary models by interpolating weights

Gabriel Ilharco, Mitchell Wortsman|arXiv (Cornell University)|Aug 10, 2022

Multimodal Machine Learning Applications | Cited by 27

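One-sentence summary

PAINT linearly interpolates between the zero-shot and fine-tuned weights to raise accuracy on patching tasks while preserving performance on supporting tasks as far as possible, which enables multi-task patching and broad transfer.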

ABSTRACT

Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.

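Motivation and goals

The need to improve accuracy on specific tasks without harming the existing capabilities of open-vocabulary models.
Introduce PAINT, a simple two-step patching method based on interpolating between the weights before and after fine-tuning.
Demonstrate patching across multiple datasets and model scales, including multi-task and broad-transfer settings.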

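Proposed method

Fine-tune a zero-shot model on the patching task to obtain the fine-tuned (ft) weights.
Linearly interpolate between the zero-shot and fine-tuned weights with a mixing coefficient alpha to obtain the patched model.
Choose alpha using held-out validation data from both the patching task and the supporting tasks.
Apply PAINT to multiple patching tasks using joint, sequential, or parallel strategies, and compare their performance.
Use CLIP models at multiple scales, up to ViT-L/14, to study how patching effectiveness and model similarity (CKA) change as scale increases.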
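
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: weights are represented as plain dicts of arrays, `eval_patch` and `eval_support` are hypothetical callables that return held-out accuracy, and scoring each alpha by the plain average of the two accuracies is an assumption of this sketch. The `parallel_patch` helper shows one plausible form of the parallel strategy, in which each task is fine-tuned independently and the results are mixed with per-task coefficients.

```python
import numpy as np


def interpolate_weights(theta_zs, theta_ft, alpha):
    """Return (1 - alpha) * theta_zs + alpha * theta_ft for every parameter."""
    if theta_zs.keys() != theta_ft.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: (1.0 - alpha) * theta_zs[name] + alpha * theta_ft[name]
        for name in theta_zs
    }


def select_alpha(theta_zs, theta_ft, eval_patch, eval_support, alphas=None):
    """Grid-search alpha on held-out data.

    eval_patch and eval_support are hypothetical callables mapping a weight
    dict to held-out accuracy on the patching and supporting tasks. Scoring
    by their plain average is an assumption of this sketch.
    """
    if alphas is None:
        alphas = np.linspace(0.0, 1.0, 11)
    best_alpha, best_score = None, -np.inf
    for alpha in alphas:
        theta = interpolate_weights(theta_zs, theta_ft, alpha)
        score = 0.5 * (eval_patch(theta) + eval_support(theta))
        if score > best_score:
            best_alpha, best_score = float(alpha), score
    return best_alpha, best_score


def parallel_patch(theta_zs, thetas_ft, alphas):
    """Mix several independently fine-tuned checkpoints (assumed form):
    theta = (1 - sum(alphas)) * theta_zs + sum_i alphas[i] * thetas_ft[i].
    """
    if len(thetas_ft) != len(alphas):
        raise ValueError("need exactly one coefficient per fine-tuned checkpoint")
    rest = 1.0 - sum(alphas)
    return {
        name: rest * theta_zs[name]
        + sum(a * theta[name] for a, theta in zip(alphas, thetas_ft))
        for name in theta_zs
    }
```

With alpha = 0 the sketch returns the zero-shot weights unchanged and with alpha = 1 the fine-tuned weights, so the grid search always includes both endpoints as candidates. For a real model, the dicts would typically come from the checkpoint's parameter dictionary converted to arrays.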

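Experimental results

Research questions

RQ1: Can interpolating between zero-shot and fine-tuned weights improve performance on the patching task without reducing performance on supporting tasks?
RQ2: How does model scale affect the effectiveness and stability of patching by weight interpolation?
RQ3: Is it feasible to patch a single model on multiple tasks, and how does it compare with models specialized for each task?
RQ4: Does patching on one task yield broad transfer gains on related, or even different, tasks?
RQ5: In which practical case studies (for example typographic attacks, counting, VQA) does PAINT provide gains?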

Main findings

Task	Unpatched accuracy	Patched accuracy	(+Δ)
Cars	86.2	87.0	+0.8
DTD	64.9	66.1	+1.2
EuroSAT	79.9	87.2	+7.3
GTSRB	51.7	71.1	+19.4
KITTI	43.4	60.4	+17.0
MNIST	82.6	91.3	+8.7
RESISC45	73.4	74.2	+0.8
SUN397	76.9	79.3	+2.4
SVHN	72.8	88.9	+16.1

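PAINT improves accuracy on nine patching tasks by 15 to 60 percentage points while keeping ImageNet accuracy within one percentage point of the zero-shot model.
Patching becomes more effective as model scale increases; in larger models, the unpatched and fine-tuned models are closer in both weights and representations.
When several tasks are patched, a single patched model matches or comes close to several specialized models (within about 0.5 percentage points in average combined accuracy).
Broad transfer: patching on one task can improve accuracy on related tasks even when their class spaces are disjoint (for example EuroSAT/RESISC45 and MNIST/SVHN).
PAINT yields robust gains in the case studies: robustness to typographic attacks improves by up to 41 points; counting accuracy on unseen numbers rises from 59% to over 99% with little effect on ImageNet; and VQA performance improves by about 18 points with only a small loss on ImageNet.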
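
The scale finding above relies on a representation-similarity measure (CKA) between the unpatched and fine-tuned models. As a rough illustration of how such a score can be computed, here is a sketch of the linear variant of CKA on two feature matrices extracted from the same inputs. Using the linear variant, the function name, and the choice of features are all assumptions of this sketch; the summary does not say which CKA configuration the paper uses.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices x (n, d1) and y (n, d2).

    Rows are the same n examples passed through two models. Scores lie in
    [0, 1]; higher means more similar representations.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must contain the same number of examples")
    # Center each feature dimension over the examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so comparing a representation with a rotated, rescaled copy of itself gives a score of 1.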

This review was generated by AI and reviewed by a human editor.