QUICK REVIEW

[Paper Review] The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Ziqian Zhong, Ziming Liu | arXiv (Cornell University) | 2023-06-30

Neural Networks and Applications | Citations: 11

One-Line Summary

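This paper shows that neural networks trained on modular addition can discover different algorithmic strategies (Clock, Pizza, and others) depending on architecture and hyperparameters, revealing algorithmic phase transitions in the mechanistic solutions these networks learn.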

ABSTRACT

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.

Motivation and Goals

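Motivates the study of algorithm discovery in neural networks on algorithmic tasks that admit more than one canonical solution.
Shows that the Clock and Pizza algorithms can both emerge in similar architectures under different hyperparameters.
Demonstrates that a network can ensemble several algorithm variants in parallel for robustness.
Introduces metrics for distinguishing these algorithms and for quantifying phase transitions in algorithm space.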

Proposed Method

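Trains one-layer transformers, with and without attention, on addition modulo p (p = 59).
Identifies Clock behavior from learned embeddings that form circles when projected onto their principal components (PCA).
Defines and computes gradient symmetricity and distance irrelevance as metrics for distinguishing Clock from Pizza.
Introduces circle isolation to analyze embedding representations within reduced subspaces.
Maps the algorithmic phase transition between Clock and Pizza by varying the architecture and a new attention-rate parameter.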
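
Distance irrelevance can be sketched in NumPy under an assumed formulation, which may differ from the paper's exact definition. For each distance d = (a - b) mod p, take the standard deviation of the correct-answer logit over all inputs at that distance. Average these over d and divide by the standard deviation over all inputs. The function name `distance_irrelevance` and the toy logit tables are hypothetical.

```python
import numpy as np

def distance_irrelevance(correct_logits: np.ndarray) -> float:
    """Assumed formulation (not necessarily the paper's exact definition).

    correct_logits[a, b] is the logit assigned to the correct answer
    (a + b) mod p for input (a, b). Returns the mean within-distance
    standard deviation divided by the overall standard deviation.
    """
    p = correct_logits.shape[0]
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    dist = (a - b) % p
    # Spread of the correct logit among inputs sharing the same distance d.
    within = [correct_logits[dist == d].std() for d in range(p)]
    return float(np.mean(within) / correct_logits.std())

p = 59
a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
# Toy Pizza-like table: the correct logit depends only on (a - b) mod p.
pizza_like = np.abs(np.cos(2 * np.pi * (a - b) / (2 * p)))
# Toy table that varies across inputs but carries no information about a - b.
clock_like = a.astype(float)
print(distance_irrelevance(pizza_like))  # close to 0
print(distance_irrelevance(clock_like))  # close to 1
```

Under this formulation, a value near 0 means a - b determines the correct logit (Pizza-like). A value near 1 means a - b says nothing about it (Clock-like). The ratio is undefined when every correct logit is identical, because the denominator is then zero.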

Figure 1: Illustration of the Clock and the Pizza Algorithm.

Experimental Results

Research Questions

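RQ1: Do neural networks trained on modular addition rediscover a familiar algorithm such as Clock, or do alternative strategies emerge under some conditions?
RQ2: What mechanisms (embeddings, gradients) actually distinguish Clock from Pizza?
RQ3: How do architecture (with or without attention) and hyperparameters affect which algorithm is learned?
RQ4: Do networks ensemble multiple algorithmic strategies, and how can this be detected and analyzed?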

Key Findings

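Clock and Pizza are both viable solutions to modular addition in similar networks.
Networks without attention (Pizza-leaning) show symmetric gradients and correct-answer logits that depend on a - b (low distance irrelevance), which indicates Pizza-like behavior.
The Pizza algorithm relies on averaging the input embeddings and on an absolute-value operation, which produces a logit pattern that depends on a - b.
The Clock algorithm uses circular embeddings, and its logits do not depend on a - b. Pizza's logits depend on a - b through an additional factor |cos(w_k(a - b)/2)|, where w_k = 2πk/p.
Model width and attention rate govern a sharp algorithmic phase transition between Clock and Pizza. Ensembles of algorithms make the network robust across inputs.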
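
Idealized single-frequency logit formulas make the contrast between the two algorithms easy to check. The sketch uses Clock logits cos(w(a + b - c)) and Pizza logits |cos(w(a - b)/2)| · cos(w(a + b - c)). It assumes p = 59 and one frequency k = 1 (w = 2π/p), and its helper names are hypothetical. Trained networks use several frequencies and follow these formulas only approximately.

```python
import numpy as np

P = 59              # modulus used in the review
W = 2 * np.pi / P   # assumption: a single frequency k = 1
C = np.arange(P)    # every candidate output c

def clock_logits(a, b):
    # Idealized Clock: depends on a and b only through a + b.
    return np.cos(W * (a + b - C))

def pizza_logits(a, b):
    # Idealized Pizza: same peak, rescaled by a factor that depends on a - b.
    return np.abs(np.cos(W * (a - b) / 2)) * np.cos(W * (a + b - C))

def accuracy(logit_fn):
    # Fraction of all P * P inputs whose argmax equals (a + b) mod P.
    hits = sum(
        int(np.argmax(logit_fn(a, b)) == (a + b) % P)
        for a in range(P)
        for b in range(P)
    )
    return hits / P**2

print(accuracy(clock_logits), accuracy(pizza_logits))  # → 1.0 1.0
# Correct-answer logit at distance a - b = 0 versus a - b = 29:
print(clock_logits(0, 0)[0], clock_logits(29, 0)[29])  # identical
print(pizza_logits(0, 0)[0], pizza_logits(29, 0)[29])  # much smaller at distance 29
```

Both rules recover (a + b) mod p, so accuracy alone cannot tell them apart. What separates them is whether the correct logit depends on a - b. Because p is odd, the Pizza scale factor never reaches exactly zero, so its argmax stays correct even though the margin shrinks.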

Figure 2: Gradients on the first six principal components of input embeddings. $(a,b,c)$ in the title stands for taking gradients on the output logit $c$ for input $(a,b)$. The x and y axes represent the gradients for the embeddings of the first and the second token. The dashed line $y=x$ signals a symmetric gradient.
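
The y = x symmetry test can be illustrated with idealized two-dimensional circular embeddings E(t) = (cos wt, sin wt). Both stand-ins below are hypothetical. The Clock-style logit multiplies the two embeddings as complex numbers and reads the product out against E_c. The Pizza-style logit, (s · E(c/2))^2 with s = E_a + E_b, sees the inputs only through their sum. Gradient symmetricity is taken here as the mean cosine similarity between the gradients with respect to the two input embeddings, evaluated at the correct output c = (a + b) mod p. That is an assumption, and the paper's exact sampling may differ.

```python
import numpy as np

P = 59
W = 2 * np.pi / P  # assumption: a single frequency k = 1

def emb(t):
    """Idealized circular embedding of token t (hypothetical)."""
    return np.array([np.cos(W * t), np.sin(W * t)])

def clock_grads(a, b, c):
    # Clock stand-in: logit = Re[e^{iWa} e^{iWb} e^{-iWc}] = cos(W(a + b - c)).
    # Analytic gradients: d/dE_a = E(c - b), d/dE_b = E(c - a).
    return emb(c - b), emb(c - a)

def pizza_grads(a, b, c):
    # Pizza stand-in: logit = (s . E(c/2))**2 with s = E_a + E_b.
    # It depends on the inputs only through s, so both gradients coincide.
    e = emb(c / 2)
    s = emb(a) + emb(b)
    g = 2 * (s @ e) * e
    return g, g

def gradient_symmetricity(grad_fn):
    # Mean cosine similarity between the two input-embedding gradients.
    sims = []
    for a in range(P):
        for b in range(P):
            ga, gb = grad_fn(a, b, (a + b) % P)
            sims.append(ga @ gb / (np.linalg.norm(ga) * np.linalg.norm(gb)))
    return float(np.mean(sims))

print(gradient_symmetricity(pizza_grads))  # close to 1
print(gradient_symmetricity(clock_grads))  # close to 0
```

The Pizza stand-in's two gradients are identical, so every point would lie on y = x. The Clock gradients E(c - b) and E(c - a) differ by a rotation of w(a - b), so their cosine similarity is cos(w(a - b)), which averages to zero over all inputs.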

This review was generated by AI and reviewed by a human editor.