QUICK REVIEW

[Paper Review] Insights on representational similarity in neural networks with canonical correlation

Ari S. Morcos, Maithra Raghu|arXiv (Cornell University)|2018. 06. 14.

Neural Networks and Applications|Citations: 104

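One-line summary

This paper introduces projection weighted CCA (PWCCA) to separate signal from noise in neural representations, then uses it to analyze whether CNNs and RNNs converge to similar or diverse representations under conditions such as generalization versus memorization, network width, and learning rate.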

ABSTRACT

Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.

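Motivation and goals

Motivate a robust method for comparing neural network representations that goes beyond superficial alignment.
Distinguish signal from noise in layer activations during training.
Characterize how generalization, model width, and learning rate affect representational similarity across networks.
Explore the temporal dynamics of CNN and RNN representations over training and during sequence processing.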

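Proposed method

Develop projection weighted CCA (PWCCA), which weights the canonical correlations by how much each canonical direction contributes to the layer's output.
Improve on SVCCA by separating signal from noise through comparisons of representations from early and intermediate stages of training.
Apply PWCCA to groups of CNNs trained on CIFAR-10 with true labels and with random labels to compare generalization against memorization.
Analyze the effect of network width on the converged representations.
Study RNN representations across training time and sequence steps to assess bottom-up convergence and variability over the course of a sequence.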
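
The projection weighting step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pwcca_similarity` and `_centered_and_basis` are hypothetical, CCA is computed through an SVD-based orthonormalization (one standard route among several), and the weights are taken from the first argument only, so the score is asymmetric.

```python
import numpy as np


def _centered_and_basis(acts, tol=1e-10):
    """Center activations (n_samples, n_neurons) and return them together
    with an orthonormal basis of their column span."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Drop directions with (numerically) zero variance.
    return centered, u[:, s > tol * s.max()]


def pwcca_similarity(x, y):
    """Return (projection-weighted, unweighted mean) CCA similarity.

    x, y: arrays of shape (n_samples, n_neurons_x) and (n_samples, n_neurons_y)
    recorded on the same inputs. Weights are derived from x (an assumption of
    this sketch), so pwcca_similarity(x, y) may differ from pwcca_similarity(y, x).
    """
    xc, qx = _centered_and_basis(x)
    _, qy = _centered_and_basis(y)
    # Singular values of qx^T qy are the canonical correlations.
    a, rho, _ = np.linalg.svd(qx.T @ qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)
    # Canonical variables of x, one column per canonical correlation.
    h = qx @ a
    # Weight each canonical variable by the total absolute projection of
    # x's neurons onto it, then normalize the weights to sum to one.
    alpha = np.abs(h.T @ xc).sum(axis=1)
    alpha /= alpha.sum()
    return float(alpha @ rho), float(rho.mean())
```

Given two activation matrices recorded on the same inputs, `pwcca_similarity(x, y)` returns both scores so the weighted and unweighted values can be compared directly; a distance can be taken as one minus the similarity. Directions that carry little of the layer's activity receive small weights, so low correlations along them lower the weighted score less than they lower the plain mean.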

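Experimental results

Research questions

RQ1: How do the converged representations of networks that generalize differ from those of networks that memorize when both are trained on the same data?
RQ2: Does increasing network width make the converged representations of independently initialized networks more similar?
RQ3: Do networks trained with different learning rates converge to distinct clusters of solutions, and can PWCCA reveal them?
RQ4: How do RNN representations evolve over training time and across sequence timesteps?

Key findings

Groups of CNNs that generalize converge to more similar representations in later layers than groups of networks that memorize.
Wider networks converge to more similar representations than narrower networks.
Across many initializations and learning rates, networks converge to distinguishable clusters of solutions, consistent with earlier ablation-based clustering results.
RNNs converge bottom-up over the course of training, and their representations vary substantially from one sequence step to the next.
Projection weighting makes PWCCA robust to noise, and PWCCA measures shared structure better than unweighted mean CCA and basic SVCCA.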
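
The difference between the unweighted mean and the projection weighted score can be written compactly. The notation below is assumed for illustration: $\rho^{(i)}$ are the $c$ canonical correlations, $h_i$ the corresponding canonical variables of the first layer, and $z_j$ the activation vectors of that layer's neurons over the dataset.

```latex
\bar{\rho}_{\mathrm{mean}} = \frac{1}{c}\sum_{i=1}^{c}\rho^{(i)},
\qquad
\rho_{\mathrm{PW}} = \sum_{i=1}^{c}\alpha_i\,\rho^{(i)},
\qquad
\alpha_i = \frac{\tilde{\alpha}_i}{\sum_{k=1}^{c}\tilde{\alpha}_k},
\qquad
\tilde{\alpha}_i = \sum_{j}\bigl|\langle h_i, z_j\rangle\bigr|
```

The mean treats every canonical direction equally, whereas the weighted sum discounts directions onto which the layer's neurons project only weakly. The corresponding distance is $1 - \rho_{\mathrm{PW}}$.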

This review was generated by AI and reviewed by a human editor.