QUICK REVIEW

[Paper Review] In-context Learning and Induction Heads

Catherine Olsson, Nelson Elhage|arXiv (Cornell University)|2022. 09. 24.

Domain Adaptation and Few-Shot Learning | Citations: 84

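One-Line Summary

The paper argues that induction heads are the mechanistic source of in-context learning in transformers, drawing on six complementary lines of evidence: the evidence is causal for small models and correlational for large models.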

ABSTRACT

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.

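Research Motivation and Goals

Investigate whether induction heads implement a simple algorithm that completes token sequences such as [A][B] ... [A] -> [B].
Examine whether induction heads are the primary mechanism of in-context learning across transformer models of different sizes.
Present multiple lines of evidence to establish a causal or correlational link between induction heads and in-context learning performance.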
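
The [A][B] ... [A] -> [B] completion rule mentioned above can be illustrated with a small lookup function. This is only a minimal sketch of the input/output behavior, not of how an attention head computes it: the function name `induction_predict` and the choice to copy from the most recent earlier match are assumptions made for illustration and do not come from the paper.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the [A][B] ... [A] -> [B] rule.

    Looks for the most recent earlier occurrence of the final token and
    returns the token that followed it. Returns None when the final token
    has not appeared before (hypothetical helper, for illustration only).
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from right to left, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # tokens[i + 1] always exists because i <= len(tokens) - 2.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

For the sequence A, B, C, A the function copies the token that followed the earlier A. A real induction head reaches a similar result through attention rather than an explicit search.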

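Proposed Method

Identify induction heads as a candidate mechanism for in-context learning.
Present six complementary lines of evidence linking induction heads to in-context learning.
For small attention-only models, present causal evidence that induction heads drive the learning phenomenon.
For larger models that include MLPs, present correlational evidence supporting the link.
Show that induction heads form at the same point in training as a sudden, sharp increase in in-context learning ability, which is visible as a bump in the training loss.
Synthesize the findings to argue that induction heads are the mechanistic source of general in-context learning.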
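
The abstract describes in-context learning as decreasing loss at increasing token indices. One way to turn that description into a number is to compare the loss at a late position with the loss at an early position. The sketch below assumes per-token losses have already been averaged over many sequences. The function name `in_context_score` and the default indices 50 and 500 are illustrative assumptions, not values stated in this review.

```python
from typing import Sequence


def in_context_score(
    per_token_loss: Sequence[float],
    early: int = 50,   # assumed early position (illustrative)
    late: int = 500,   # assumed late position (illustrative)
) -> float:
    """Loss at a late token index minus loss at an early token index.

    A more negative value means the model predicts later tokens better
    than earlier ones, i.e. it benefits more from the preceding context.
    """
    if not (0 <= early < late < len(per_token_loss)):
        raise ValueError("need 0 <= early < late < len(per_token_loss)")
    return per_token_loss[late] - per_token_loss[early]


if __name__ == "__main__":
    # Synthetic losses that fall linearly with position, for demonstration only.
    losses = [5.0 - 0.004 * i for i in range(512)]
    print(in_context_score(losses))
```

Computing this score at successive training checkpoints would make an abrupt improvement show up as a sharp drop in the value.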

Experimental Results

Research Questions

RQ1: Do induction heads implement the core algorithm behind in-context learning in transformers?
RQ2: Are induction heads causally responsible for observed in-context learning in small models and correlationally linked in larger models?
RQ3: Do induction heads emerge at the same developmental stage as abrupt gains in in-context learning ability?
RQ4: Do six lines of evidence coherently support a mechanistic role for induction heads across model scales?

Key Results

The emergence of induction heads coincides with a sudden improvement in in-context learning, visible as a bump in the training loss.
In small attention-only models, there is strong causal evidence that induction heads drive in-context learning.
In larger models that include MLPs, evidence is correlational but consistently aligns with the induction head mechanism.
The timing of induction head development coincides with the emergence of enhanced in-context learning ability.
Six complementary evidence lines collectively support induction heads as a general mechanism for in-context learning across transformer sizes.


This review was generated by AI and reviewed by a human editor.