QUICK REVIEW

[Paper Review] In-context Learning and Induction Heads

Catherine Olsson, Nelson Elhage|arXiv (Cornell University)|Sep 24, 2022

Domain Adaptation and Few-Shot Learning | Citations: 84

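One-Sentence Summary

This paper proposes that induction heads are the mechanistic source of in-context learning in transformers. It presents six complementary lines of evidence: causal evidence for small models and correlational evidence for larger ones.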

ABSTRACT

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.

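Research Motivation and Objectives

Examine whether induction heads implement a simple algorithm that completes token sequences such as [A][B] ... [A] -> [B].
Investigate whether induction heads are the primary mechanism of in-context learning in transformer models of various sizes.
Provide multiple lines of evidence to establish a causal or correlational relationship between induction heads and in-context learning performance.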
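
The [A][B] ... [A] -> [B] completion rule can be illustrated with a plain sequence lookup. This is a toy sketch of the input/output behavior only, not of how an attention head computes it. The function name `induction_predict` and the choice to match the most recent earlier occurrence are assumptions made for illustration.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the [A][B] ... [A] -> [B] completion rule.

    Finds the most recent earlier occurrence of the final token and
    returns the token that followed it. Returns None when the final
    token has not appeared earlier in the sequence.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # tokens[i + 1] always exists because i <= len(tokens) - 2.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

With the sequence `["A", "B", "C", "A"]`, the final `A` matches the first token, so the sketch returns the token that followed it.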

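Proposed Method

Identify induction heads as a candidate mechanism for in-context learning.
Present six complementary lines of evidence linking induction heads to in-context learning.
For small attention-only models, provide causal evidence that induction heads drive in-context learning.
For larger models with MLPs, present correlational evidence supporting the relationship.
Show that induction heads form at the same point in training as a sharp increase in in-context learning ability, which is visible as a bump in the training loss.
Synthesize the findings to argue that induction heads may be the mechanistic source of general in-context learning.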
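
The abstract characterizes in-context learning as decreasing loss at increasing token indices. One way to turn that into a single number is to compare the average loss at a late token position with the average loss at an early one. The sketch below assumes per-token losses are already available as nested lists; the function name and the `early_index` and `late_index` parameters are hypothetical, and the specific positions to compare are left to the caller.

```python
from statistics import mean
from typing import Sequence


def in_context_learning_score(
    per_token_loss: Sequence[Sequence[float]],
    early_index: int,
    late_index: int,
) -> float:
    """Mean loss at a late token position minus mean loss at an early one.

    per_token_loss[s][t] is the loss of sequence s at token position t.
    A negative result means loss is lower later in the context, i.e.
    the model benefits from the tokens it has already seen.
    """
    if not 0 <= early_index < late_index:
        raise ValueError("need 0 <= early_index < late_index")
    early = mean(seq[early_index] for seq in per_token_loss)
    late = mean(seq[late_index] for seq in per_token_loss)
    return late - early


if __name__ == "__main__":
    # Two made-up sequences whose loss falls along the context.
    losses = [[4.0, 3.0, 2.0], [4.0, 3.5, 3.0]]
    print(in_context_learning_score(losses, early_index=0, late_index=2))
```

Tracking such a score across training checkpoints is one way to look for the sudden change that the review associates with induction head formation.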

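Experimental Results

Research Questions

RQ1: Do induction heads implement the core algorithm of in-context learning in transformers?
RQ2: Are induction heads causally responsible for the in-context learning observed in small models, and are they correlated with it in large models?
RQ3: Do induction heads emerge at the same stage of training as the sharp gains in in-context learning ability?
RQ4: Do the six lines of evidence consistently support a mechanistic role for induction heads across model scales?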

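Key Findings

The formation of induction heads coincides with a sharp improvement in in-context learning, visible as a bump in the training loss.
For small attention-only models, the paper presents strong causal evidence that induction heads drive in-context learning.
For larger models with MLPs, the evidence is correlational but consistently in line with the induction head mechanism.
The timing of induction head formation matches the emergence of improved in-context learning ability.
Six complementary lines of evidence support induction heads as a general mechanism of in-context learning across transformer sizes.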

This review was generated by AI and checked by a human editor.