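[Paper Walkthrough] TIES-Merging: Resolving Interference When Merging Models
TIES-Merging (TrIm, Elect Sign & Merge) merges multiple fine-tuned models by trimming redundant parameters, resolving sign conflicts, and merging only the parameters that agree with the elected sign. It outperforms existing merging methods on NLP and vision tasks.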
Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, number of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters, and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging
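Motivation and Objectives
- Explain why merging task-specific fine-tuned models is desirable for multitask capability and generalization.
- Identify the sources of interference in naive model merging and quantify their impact.
- Propose and validate a three-step merging procedure (trim, elect signs, disjoint merge) to improve multitask performance.
- Evaluate TIES-Merging across modalities, model sizes, and fine-tuning methods, in settings both with and without validation data.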
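Proposed Method
- Represent each task t by its task vector tau_t = theta_ft^t - theta_init, the difference between the fine-tuned parameters and the initial (pre-trained) parameters.
- Trim: in each tau_t, keep only the top-k% of values by magnitude and set the rest to zero.
- Elect: obtain the elected sign vector gamma_m by summing the trimmed task vectors across tasks and taking the sign of the sum.
- Disjoint Merge: for each parameter, average only the trimmed values whose sign matches the elected sign, ignoring zero entries.
- The merged model is theta_m = theta_init + lambda * tau_m, where tau_m is the disjoint-mean vector formed from the sign-aligned values.
- Compare against baselines such as Simple Averaging, Fisher Merging, RegMean, and Task Arithmetic on NLP and vision benchmarks.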
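The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked in the abstract). It assumes each model is flattened into a single array and that trimming is applied over the whole task vector. The function names `trim` and `ties_merge`, the parameters `k` and `lam`, and the toy numbers are all hypothetical.

```python
import numpy as np

def trim(tau, k=0.2):
    """Keep the top-k fraction of entries by magnitude; zero out the rest.

    Entries tied with the threshold are all kept, so slightly more than
    the requested fraction may survive (an assumption of this sketch).
    """
    flat = np.abs(tau).ravel()
    n_keep = int(round(k * flat.size))
    if n_keep <= 0:
        return np.zeros_like(tau)
    if n_keep >= flat.size:
        return tau.copy()
    kth = flat.size - n_keep
    threshold = np.partition(flat, kth)[kth]  # n_keep-th largest magnitude
    return np.where(np.abs(tau) >= threshold, tau, 0.0)

def ties_merge(theta_init, thetas_ft, k=0.2, lam=1.0):
    """Sketch of trim, elect sign, disjoint merge on flat parameter arrays."""
    # Step 1 (Trim): one trimmed task vector per fine-tuned model.
    taus = np.stack([trim(theta - theta_init, k) for theta in thetas_ft])
    # Step 2 (Elect): sign of the summed trimmed task vectors.
    gamma = np.sign(taus.sum(axis=0))
    # Step 3 (Disjoint Merge): average only nonzero, sign-aligned values.
    aligned = (np.sign(taus) == gamma) & (taus != 0)
    count = aligned.sum(axis=0)
    total = np.where(aligned, taus, 0.0).sum(axis=0)
    tau_m = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return theta_init + lam * tau_m

# Toy usage with made-up numbers: two "fine-tuned" models, five parameters.
theta_init = np.zeros(5)
theta_a = np.array([1.0, 0.1, -2.0, 0.0, 0.5])
theta_b = np.array([-1.5, 0.05, -1.0, 3.0, 0.02])
merged = ties_merge(theta_init, [theta_a, theta_b], k=0.4, lam=1.0)
print(merged)
```

In the toy example the two models disagree on the sign of the first parameter (1.0 versus -1.5). The summed value is negative, so only the -1.5 contribution is kept for that coordinate instead of being averaged against the opposing value.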
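Experimental Results
Research Questions
- RQ1: Can interference between model parameters during merging be characterized and mitigated?
- RQ2: When merging multiple fine-tuned models, does trimming redundant parameters and resolving sign conflicts improve multitask performance?
- RQ3: How does TIES-Merging compare with existing methods across modalities (language and vision), model sizes, and fine-tuning regimes (full fine-tuning vs. PEFT)?
- RQ4: Is sign resolution critical for preserving or increasing merged parameter magnitudes and task performance?
- RQ5: How robust is TIES-Merging when no validation set is available at merge time?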
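Key Findings
- TIES-Merging outperforms existing merging methods on NLP and vision tasks, both with and without a validation set.
- In in-domain evaluation, TIES-Merging improves over the strongest baseline by an average of 2.3% absolute on NLP and 1.7% absolute on vision.
- In out-of-domain generalization, TIES-Merging achieves absolute gains of 1.0% (T5-base) and 4.4% (T5-large) over the strongest baseline.
- Without validation data, TIES-Merging with a fixed recipe (trim to the top 20%, lambda = 1) still outperforms baselines such as Task Arithmetic on both vision and NLP tasks.
- Ablations show that removing the trimming, sign election, or disjoint mean component lowers performance, with scaling and the disjoint mean contributing substantially to the gains.
- Resolving sign interference is critical: flipping the direction of top-magnitude parameters can degrade performance catastrophically, which underscores the importance of estimating signs correctly.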
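To make the two sources of interference named in the abstract concrete, the sketch below compares a plain average with a sign-aligned disjoint mean on made-up values. The numbers and variable names are illustrative assumptions, not results from the paper.

```python
import numpy as np

# Rows are tasks, columns are parameters (made-up trimmed task-vector values).
# Column 0: only one task changed the parameter (redundant zeros elsewhere).
# Column 1: two tasks agree on the sign and one disagrees.
taus = np.array([
    [0.9,  0.9],
    [0.0,  0.8],
    [0.0, -0.7],
])

# A plain average lets zeros and opposing signs shrink the merged value.
simple_mean = taus.mean(axis=0)

# Disjoint mean: average only nonzero entries that match the elected sign.
gamma = np.sign(taus.sum(axis=0))
aligned = (np.sign(taus) == gamma) & (taus != 0)
disjoint_mean = np.where(aligned, taus, 0.0).sum(axis=0) / np.maximum(aligned.sum(axis=0), 1)

print("simple mean:  ", simple_mean)
print("disjoint mean:", disjoint_mean)
```

Here the plain average gives roughly 0.30 and 0.33, while the disjoint mean gives 0.90 and 0.85, so both influential values keep their magnitude after merging.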
This walkthrough was generated by AI and reviewed by a human editor.