QUICK REVIEW

[Paper Review] LLM Augmented LLMs: Expanding Capabilities through Composition

Rachit Bansal, Bidisha Samanta | arXiv (Cornell University) | Jan 4, 2024

Topic Modeling | Citations: 10

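One-line summary

CALM composes an anchor LLM with a specialized augmenting model by learning a small cross-attention-based interface between them, enabling new capabilities without changing the weights of either model. It improves tasks such as low-resource language translation, arithmetic reasoning over key-value mappings, and code understanding and generation.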

ABSTRACT

Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.

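Motivation and objectives

Compose existing foundation models efficiently and practically to gain new capabilities, without the constraints that fine-tuning and data sharing impose.
Enable reuse of existing models by learning a small set of trainable interactions between frozen models.
Demonstrate CALM across several domains: language inclusivity, arithmetic over key-value mappings, and code-related tasks.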

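Proposed method

Introduce CALM, a framework that learns a small set of trainable parameters over selected layers of two frozen models: the anchor m_B and the augmenting model m_A.
Learn a projection f_proj that maps m_A's representations to m_B's dimensionality, so that a cross-attention layer f_cross can operate between the two models.
Apply cross-attention with the projected m_A representations as keys and values and m_B's representations as queries, and add the result to m_B's representation as a residual that flows to the next layer.
Train the composition parameters Θ_C on a small dataset D_C designed to depict the combined skills of the two models needed for the target composition task C.
Select composition layers L_A and L_B and apply the cross-attention mechanism repeatedly across the selected layers.
Show that CALM enables new capabilities without modifying the base weights while preserving the existing capabilities of both models.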
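
The composition step described above (project m_A's states, cross-attend from m_B, add a residual) can be sketched in a few lines of NumPy. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the function names, the parameter dictionary, the output matrix W_o, the zero initialization of W_o, the layer pairs, and all dimensions are hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_a, d_b, rng):
    # Trainable composition parameters for one layer pair (part of Theta_C).
    # Assumption: W_o starts at zero so the composed model initially
    # behaves exactly like the anchor m_B.
    scale = 0.02
    return {
        "W_proj": rng.normal(scale=scale, size=(d_a, d_b)),  # f_proj
        "W_q": rng.normal(scale=scale, size=(d_b, d_b)),
        "W_k": rng.normal(scale=scale, size=(d_b, d_b)),
        "W_v": rng.normal(scale=scale, size=(d_b, d_b)),
        "W_o": np.zeros((d_b, d_b)),
    }

def compose_layer(h_b, h_a, params):
    """One composition step (single head, for brevity).

    h_b: (T_b, d_b) hidden states from a selected layer of the anchor m_B.
    h_a: (T_a, d_a) hidden states from a selected layer of m_A.
    Neither model's own weights appear here; only `params` would be trained.
    """
    h_a_proj = h_a @ params["W_proj"]        # map m_A states into m_B's width
    q = h_b @ params["W_q"]                  # queries come from m_B
    k = h_a_proj @ params["W_k"]             # keys come from projected m_A
    v = h_a_proj @ params["W_v"]             # values come from projected m_A
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    cross = attn @ v @ params["W_o"]         # f_cross output, shape (T_b, d_b)
    return h_b + cross                       # residual passed to m_B's next layer

rng = np.random.default_rng(0)
d_a, d_b = 8, 16
layer_pairs = [(1, 2), (3, 6)]               # hypothetical (L_A, L_B) selections
theta_c = {pair: init_params(d_a, d_b, rng) for pair in layer_pairs}

h_a = rng.normal(size=(5, d_a))
h_b = rng.normal(size=(7, d_b))
h_ab = compose_layer(h_b, h_a, theta_c[(1, 2)])
print(h_ab.shape)  # (7, 16)
```

Because the output keeps m_B's shape and is added as a residual, the rest of m_B can run unchanged on the composed states, which is consistent with the claim that both base models stay frozen.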

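Experimental results

Research questions

RQ1: Can an anchor LLM be composed with a domain-specific augmenting model to achieve capabilities that neither model has on its own?
RQ2: Does CALM enable new tasks while preserving the individual capabilities of the base models?
RQ3: How does CALM perform on tasks such as low-resource language translation, arithmetic over key-value mappings, and code understanding and generation?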

Key findings

Accuracy (%) by task:
Model	KV-Substitution	Numeric-Arithmetic	KV-Arithmetic
m_A	98.1	4.2	0.7
m_B	0.0	73.7	0.0
m_A⊕B	92.9	72.0	84.3
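
To make the three task columns concrete, here is a toy sketch of what each task asks for. The review does not give the exact data format, so the key tokens, the value range, and the restriction to `+` and `-` are assumptions made purely for illustration.

```python
def kv_substitution(tokens, kv):
    # Replace every known key with its mapped value; leave operators alone.
    return [str(kv[t]) if t in kv else t for t in tokens]

def numeric_arithmetic(tokens):
    # Evaluate a flat "a + b - c ..." expression from left to right.
    total = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        total = total + int(num) if op == "+" else total - int(num)
    return total

def kv_arithmetic(tokens, kv):
    # Needs both skills: look up the keys, then do the arithmetic.
    return numeric_arithmetic(kv_substitution(tokens, kv))

# Hypothetical key-value mapping and expression.
kv = {"<K1>": 10, "<K2>": 22, "<K3>": 5}
expr = ["<K1>", "+", "<K2>", "-", "<K3>"]

print(kv_substitution(expr, kv))  # ['10', '+', '22', '-', '5']
print(kv_arithmetic(expr, kv))    # 27
```

Read against the table, m_A handles the lookup step and m_B handles the arithmetic step, and only the composed model performs well on the task that chains the two.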

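The composed model m_A⊕B solves the task that neither base model can: it reaches 84.3% accuracy on KV-Arithmetic, versus 0.7% for m_A and 0.0% for m_B, while staying close to each model on its own specialty (92.9% vs. 98.1% on KV-Substitution, 72.0% vs. 73.7% on Numeric-Arithmetic).
On low-resource language translation, CALM substantially improves FLORES-200 translation-into-English scores over both base models and achieves a higher average across languages.
On code-related tasks, CALM shows meaningful gains over the anchor on code completion and code-to-text (explanation) tasks, extending the model's capabilities without fine-tuning.
Ablations show that performance drops when m_A is replaced with a vanilla or a random model, which indicates that the gains come from m_A's specialized knowledge together with the CALM interaction.
Compared with LoRA, CALM transfers better across tasks and avoids the catastrophic forgetting observed when the base model is fine-tuned.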

Figure 2: Gains seen by the composed model $\mathbf{m}_{\text{A}\oplus\text{B}}$ over the anchor model, $\mathbf{m}_{\text{B}}$, for the complete set of FLORES-200 languages. The languages are sorted from low to high-resource.

This review was generated by AI and checked by a human editor.