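[Paper Explained] Towards a Unified View of Parameter-Efficient Transfer Learning
This paper unifies state-of-the-art parameter-efficient transfer learning methods by re-framing them as modifications to hidden states in a frozen pre-trained model, and presents new variants that achieve results comparable to full fine-tuning on several NLP tasks while tuning far fewer parameters.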
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pre-trained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pre-trained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune less parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
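Motivation and Objectives
- Break down existing parameter-efficient tuning methods and establish the connections between them.
- Identify the design elements that are critical to effectiveness across tasks.
- Propose a unified framework that allows design choices to be transferred between methods.
- Implement and evaluate new variants that use fewer parameters while preserving performance.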
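Proposed Method
- Re-frame parameter-efficient tuning methods as modifications to hidden representations in a frozen pre-trained language model.
- Define the design dimensions: the functional form of the modification, the position where it is applied, and how it is composed with the original representation.
- Derive connections between methods (e.g., an equivalent form of prefix tuning that relates it to adapters) and introduce variants such as the multi-head parallel adapter and the scaled parallel adapter.
- Instantiate new methods by transferring design elements across approaches, and evaluate them on several NLP tasks.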
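The link between prefix tuning and adapter-style methods rests on a simple identity: attending over prepended prefix vectors gives the same output as a gated mix of the original attention output and a prefix-only term. The following is a minimal sketch of that identity for a single query and a single head, not the paper's implementation. The function names (`attend`, `prefix_attention`, `gated_form`) and the toy inputs are assumptions made for illustration, and the usual 1/sqrt(d) score scaling is omitted because it does not affect the identity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    """Single-head softmax attention for one query vector (score scaling omitted)."""
    weights = [math.exp(dot(q, k)) for k in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def prefix_attention(q, keys, values, prefix_k, prefix_v):
    """Prefix tuning: learned key/value vectors are prepended to the frozen ones."""
    return attend(q, prefix_k + keys, prefix_v + values)

def gated_form(q, keys, values, prefix_k, prefix_v):
    """The same output written as h <- (1 - lam) * h + lam * delta_h."""
    h = attend(q, keys, values)              # output of the frozen model
    delta_h = attend(q, prefix_k, prefix_v)  # modification computed from the prefixes only
    z_prefix = sum(math.exp(dot(q, k)) for k in prefix_k)
    z_orig = sum(math.exp(dot(q, k)) for k in keys)
    lam = z_prefix / (z_prefix + z_orig)     # share of attention mass on the prefixes
    return [(1 - lam) * a + lam * b for a, b in zip(h, delta_h)], lam

# Toy inputs (hypothetical values, chosen only to exercise the functions).
q = [0.5, -1.0]
keys, values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prefix_k, prefix_v = [[0.2, 0.1], [-0.3, 0.4]], [[0.5, -0.5], [1.5, 0.0]]

direct = prefix_attention(q, keys, values, prefix_k, prefix_v)
gated, lam = gated_form(q, keys, values, prefix_k, prefix_v)
```

In this sketch `direct` and `gated` agree up to floating-point error, and `lam` lies strictly between 0 and 1. Read this way, prefix tuning fits the three design dimensions: the modification is computed by an attention-like function, applied at the attention output, and composed with the original state through a gate rather than a plain sum.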
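Experimental Results
Research Questions
- RQ1: How do parameter-efficient tuning methods relate to one another within a unified framework?
- RQ2: Which design elements are critical to the effectiveness of these methods?
- RQ3: Can useful elements be transferred between methods to create stronger variants?
- RQ4: Do the new variants outperform existing methods under different parameter budgets?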
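Key Findings
- Existing methods are competitive on some tasks while tuning less than 1% of the parameters, but a gap to full fine-tuning remains on higher-resource tasks such as XSum and en-ro MT.
- Parallel insertion (as in prefix tuning) generally outperforms sequential insertion: parallel adapters often beat sequential adapters.
- With a larger parameter budget, modifying the FFN consistently outperforms modifying attention, which suggests allocating more of the budget to FFN modifications.
- A multi-head parallel adapter (MH PA) and a Mix-And-Match adapter (MAM Adapter) perform strongly; the MAM Adapter matches full fine-tuning while tuning only about 6.7% of the parameters on XSum and MT, and about 0.5% on MNLI/SST2.
- Extending and combining design elements (e.g., prefix tuning together with a scaled modification at the FFN) yields state-of-the-art results within the unified framework.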
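The findings on insertion position, scaling, and budget allocation can be made concrete with a small sketch. It assumes a ReLU bottleneck for the modification and a stand-in `sublayer` callable for the frozen attention or FFN block; all function names, the scale `s`, and the sizes passed to `tunable_params` are hypothetical and are not taken from the paper's code. The parameter count is deliberately rough (biases and any prefix reparameterization are ignored, and one attention and one FFN sublayer per layer is assumed), so it is not meant to reproduce the reported percentages.

```python
def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*W)]

def bottleneck(z, W_down, W_up):
    """delta_h = ReLU(z W_down) W_up, a low-rank modification."""
    hidden = [max(0.0, a) for a in matvec(z, W_down)]
    return matvec(hidden, W_up)

def sequential_adapter(x, sublayer, W_down, W_up):
    """Sequential insertion: the modification is computed from the sublayer OUTPUT h."""
    h = sublayer(x)
    return [a + b for a, b in zip(h, bottleneck(h, W_down, W_up))]

def scaled_parallel_adapter(x, sublayer, W_down, W_up, s=1.0):
    """Parallel insertion: the modification is computed from the sublayer INPUT x,
    then scaled by s before being added to h."""
    h = sublayer(x)
    return [a + s * b for a, b in zip(h, bottleneck(x, W_down, W_up))]

def tunable_params(d_model, n_layers, l_attn, r_ffn):
    """Rough tunable-parameter count for a mixed setup: l_attn prefix key/value
    vectors at attention and a rank-r_ffn adapter at the FFN, per layer."""
    attn = n_layers * 2 * l_attn * d_model
    ffn = n_layers * 2 * d_model * r_ffn
    return attn, ffn

# Toy frozen sublayer and weights (hypothetical values).
frozen = lambda x: [2.0 * a for a in x]
W_down = [[1, 0], [0, 1], [0, 0], [0, 0]]   # d_model=4 -> r=2
W_up = [[1, 0, 0, 0], [0, 1, 0, 0]]         # r=2 -> d_model=4
x = [1.0, -2.0, 3.0, 0.5]

seq_out = sequential_adapter(x, frozen, W_down, W_up)
par_out = scaled_parallel_adapter(x, frozen, W_down, W_up, s=4.0)
attn_budget, ffn_budget = tunable_params(d_model=1024, n_layers=12, l_attn=30, r_ffn=512)
```

The two adapter functions differ only in which vector feeds the bottleneck and in the scale applied to its output, which is exactly the kind of design element the unified view lets one swap between methods. With the illustrative sizes above, the FFN adapter accounts for most of the tunable parameters, mirroring the suggestion to spend more of the budget on FFN modifications.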
This summary was generated by AI and reviewed by a human editor.