QUICK REVIEW

[Paper Review] LLM Augmented LLMs: Expanding Capabilities through Composition

Rachit Bansal, Bidisha Samanta|arXiv (Cornell University)|Jan 4, 2024

Topic Modeling|Cited by 10

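One-Sentence Summary

CALM composes an anchor LLM with a specialized augmenting model by learning a small cross-attention interface between them, enabling new capabilities without changing the weights of either model. It improves tasks such as low-resource language translation, arithmetic reasoning over key-value (KV) mappings, and code understanding and generation.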

ABSTRACT

Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.

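Motivation and Objectives

Enable efficient, practical composition of foundation models so that new capabilities can be obtained without fine-tuning the base models and without running into data-sharing restrictions.
Reuse existing models by learning a small set of trainable interactions between frozen models.
Demonstrate CALM across diverse domains, including language inclusivity, arithmetic over key-value mappings, and code-related tasks.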

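Proposed Method

Introduce CALM: a framework that learns a small set of trainable parameters over selected layers of two frozen models (an anchor model m_B and an augmenting model m_A).
Learn a projection f_proj that maps the representations of m_A to the dimensionality of m_B, which makes a cross-attention layer f_cross between the two models possible.
Apply cross-attention between keys/values taken from the projected m_A representations and queries taken from m_B, and pass the result to subsequent layers through a residual connection.
Train the composition parameters Θ_C on a small dataset D_C designed to exhibit the joint "combined skills" required by the target composition task C.
Select composition layers L_A and L_B, and apply the cross-attention mechanism iteratively over the selected layers.
Show that, because the base weights are left unchanged, CALM retains the existing capabilities of both models while enabling new ones.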
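
The projection, cross-attention, and residual steps listed above can be sketched for a single pair of composition layers as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses one attention head, omits any output projection, and works on plain nested lists to stay dependency-free. The names `compose_layer`, `w_proj`, `w_q`, `w_k`, and `w_v` are hypothetical, and the randomly initialized matrices stand in for the trainable parameters Θ_C.

```python
import math
import random


def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for a learned weight or a hidden state."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def compose_layer(h_b, h_a, params):
    """One composition step between a layer of m_B and a layer of m_A.

    h_b: anchor hidden states, shape (seq_b, d_b); these supply the queries.
    h_a: augmenting hidden states, shape (seq_a, d_a); these supply keys/values.
    """
    # f_proj: map m_A representations into m_B's dimensionality.
    h_a_proj = matmul(h_a, params["w_proj"])
    q = matmul(h_b, params["w_q"])
    k = matmul(h_a_proj, params["w_k"])
    v = matmul(h_a_proj, params["w_v"])
    # f_cross: scaled dot-product attention (single head, an assumption).
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
              for q_row in q]
    attn = [softmax(row) for row in scores]
    cross = matmul(attn, v)
    # Residual connection: the result is added to the anchor's hidden states.
    return [[hb + c for hb, c in zip(hb_row, c_row)]
            for hb_row, c_row in zip(h_b, cross)]


rng = random.Random(0)
d_a, d_b, seq_a, seq_b = 3, 4, 5, 2  # toy sizes, chosen only for illustration
params = {
    "w_proj": random_matrix(d_a, d_b, rng),
    "w_q": random_matrix(d_b, d_b, rng),
    "w_k": random_matrix(d_b, d_b, rng),
    "w_v": random_matrix(d_b, d_b, rng),
}
h_a = random_matrix(seq_a, d_a, rng, scale=1.0)
h_b = random_matrix(seq_b, d_b, rng, scale=1.0)
out = compose_layer(h_b, h_a, params)
print(len(out), len(out[0]))  # output keeps the anchor's shape: 2 4
```

In this sketch only the entries of `params` would be trained; `h_a` and `h_b` come from the frozen models. The output has the same shape as the anchor's hidden states, so it can be passed to the next layer of m_B, and the same step can be repeated at each selected layer pair.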

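Experimental Results

Research Questions

RQ1: Can an anchor LLM and a domain-specialized augmenting model be composed to achieve capabilities that neither model has on its own?
RQ2: Does CALM preserve the individual capabilities of the base models while enabling new tasks?
RQ3: How does CALM perform on low-resource language translation, arithmetic over key-value mappings, and code understanding and generation?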

Key Findings

Model	KV-Substitution	Numeric-Arithmetic	KV-Arithmetic
m_A	98.1	4.2	0.7
m_B	0.0	73.7	0.0
m_A⊕B	92.9	72.0	84.3
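
To make the three columns concrete, the sketch below builds one example of each task from a shared key-value mapping. The exact prompt format, the key syntax (`<K0>`), and the restriction to `+` and `-` are assumptions made for illustration; the summary does not specify how the dataset is constructed, and `build_tasks` is a hypothetical helper.

```python
def build_tasks(kv, keys, ops):
    """Return one (input, target) pair per task for a single expression.

    kv:   mapping from string keys to integer values (assumed format)
    keys: the keys appearing in the expression, in order
    ops:  the "+" or "-" operators placed between consecutive keys
    """
    if len(ops) != len(keys) - 1:
        raise ValueError("need exactly one operator between consecutive keys")
    if any(op not in ("+", "-") for op in ops):
        raise ValueError("this sketch only supports + and -")

    values = [kv[key] for key in keys]

    # Evaluate left to right, which is exact for + and -.
    result = values[0]
    for op, value in zip(ops, values[1:]):
        result = result + value if op == "+" else result - value

    def render(operands):
        # Interleave operands and operators into a space-separated string.
        parts = [str(operands[0])]
        for op, operand in zip(ops, operands[1:]):
            parts += [op, str(operand)]
        return " ".join(parts)

    key_expr, num_expr = render(keys), render(values)
    return {
        # Replace keys by values: needs the mapping, no arithmetic.
        "KV-Substitution": (key_expr, num_expr),
        # Evaluate a numeric expression: needs arithmetic, no mapping.
        "Numeric-Arithmetic": (num_expr, str(result)),
        # Evaluate an expression over keys: needs both skills.
        "KV-Arithmetic": (key_expr, str(result)),
    }


kv = {"<K0>": 10, "<K1>": 22, "<K2>": 5}
tasks = build_tasks(kv, ["<K0>", "<K1>", "<K2>"], ["+", "-"])
for name, (prompt, target) in tasks.items():
    print(f"{name}: {prompt} -> {target}")
# KV-Substitution: <K0> + <K1> - <K2> -> 10 + 22 - 5
# Numeric-Arithmetic: 10 + 22 - 5 -> 27
# KV-Arithmetic: <K0> + <K1> - <K2> -> 27
```

Read this way, the table shows m_A handling the mapping (KV-Substitution), m_B handling the arithmetic (Numeric-Arithmetic), and only the composed model handling the task that requires both.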

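The composed model m_A⊕B substantially outperforms both base models on the task that needs both skills: on KV-Arithmetic it reaches 84.3% accuracy, versus 0.7% for m_A and 0.0% for m_B. On KV-Substitution (92.9% vs. 98.1% for m_A) and Numeric-Arithmetic (72.0% vs. 73.7% for m_B) it stays close to the stronger base model.
For low-resource language translation, CALM clearly improves FLORES-200 translation-into-English scores over both base models and achieves a higher average across languages.
On code-related tasks, CALM yields notable gains over the anchor model on code completion and code-to-text (explanation) tasks, extending the model's capabilities without fine-tuning it.
Ablations show that replacing m_A with a vanilla or random model degrades performance, which indicates that the gains come from m_A's specialized knowledge and the CALM interaction rather than from the added parameters alone.
Compared with LoRA, CALM shows better task transfer and avoids the catastrophic forgetting observed when the base model is fine-tuned.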

Figure 2: Gains seen by the composed model $\mathbf{m}_{\text{A}\oplus\text{B}}$ over the anchor model, $\mathbf{m}_{\text{B}}$, for the complete set of FLORES-200 languages. The languages are sorted from low to high-resource.

This review was generated by AI and reviewed by human editors.