QUICK REVIEW

[Paper Review] Insights on representational similarity in neural networks with canonical correlation

Ari S. Morcos, Maithra Raghu|arXiv (Cornell University)|Jun 14, 2018

Neural Networks and Applications | Citations: 104

One-sentence summary

This paper introduces projection weighted CCA (PWCCA) to distinguish signal from noise in neural representations and uses it to analyze how CNNs and RNNs converge to similar or diverse representations under various conditions, including generalization vs memorization, network width, and learning rate.

ABSTRACT

Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.

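Motivation and objectives

Motivate a robust method for comparing neural network representations that goes beyond superficial alignment.
Distinguish signal from noise in layer activations during training.
Characterize how generalization, model width, and learning rate affect representational similarity across networks.
Explore the temporal dynamics of CNN and RNN representations during training and sequence processing.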

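Proposed method

Develop projection weighted CCA (PWCCA), which weights each canonical correlation by how much its canonical vector contributes to the layer's output.
Improve on SVCCA by separating signal from noise, using comparisons between early and mid-training representations.
Apply PWCCA to groups of CNNs trained on CIFAR-10 with true labels and with random labels, to compare generalization with memorization.
Analyze the effect of network width on the converged representations.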
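Analyze RNN representations across training time and sequence timesteps: convergence over training proceeds bottom-up, and the hidden state varies substantially over the course of a sequence.
Projection weighting makes PWCCA robust to noise; when measuring shared structure, it outperforms both the unweighted mean CCA and basic SVCCA.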
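
The projection weighting described above can be written compactly. The notation is introduced here for illustration and does not appear in this review: z_j is the activation vector of neuron j in layer L_1 over the dataset, h_i is the i-th CCA vector for L_1, \rho_i is the corresponding canonical correlation, and c is the number of canonical directions.

```latex
\tilde{\alpha}_i = \sum_j \left| \langle h_i, z_j \rangle \right|, \qquad
\alpha_i = \frac{\tilde{\alpha}_i}{\sum_{k=1}^{c} \tilde{\alpha}_k}, \qquad
\rho_{\mathrm{PW}} = \sum_{i=1}^{c} \alpha_i \, \rho_i, \qquad
d(L_1, L_2) = 1 - \rho_{\mathrm{PW}}
```

With uniform weights \alpha_i = 1/c, \rho_{\mathrm{PW}} reduces to the mean CCA similarity, so the weighting is the only difference between the two measures. Because the weights are computed from L_1 alone, the distance is not symmetric in its arguments.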

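Experimental results

Research questions

RQ1: When trained on the same data, how do the converged representations of networks that generalize differ from those of networks that memorize?
RQ2: Across independently initialized networks, does increasing network width make the converged representations more similar?
RQ3: Do networks trained with different learning rates converge to distinct clusters of solutions, and can PWCCA reveal them?
RQ4: How do RNN representations change across training time and across sequence timesteps?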

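Key findings

Groups of CNNs that generalize converge to more similar representations in later layers than groups of networks that memorize.
Wider networks converge to more similar representations than narrow networks.
Across many initializations and learning rates, networks converge to identifiable clusters of solutions, consistent with earlier ablation-based clustering results.
RNNs show bottom-up convergence of representations over training time, and the representation varies widely from one sequence timestep to the next.
With projection weighting, PWCCA is robust to noise and measures shared structure better than the unweighted mean CCA and basic SVCCA.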
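
The last finding can be illustrated on synthetic data. The following is a minimal NumPy sketch, not the authors' implementation: the function names (`cca`, `mean_cca_similarity`, `pwcca_similarity`, `make_pair`) and the toy data (two "layers" that share a strong 2-dimensional signal plus independent per-neuron noise) are assumptions made for this example. It assumes more datapoints than neurons and full-rank activations.

```python
import numpy as np


def cca(x, y):
    """Canonical correlations between x (n, d1) and y (n, d2).

    Returns the correlations (descending) and the canonical variates of x
    as orthonormal columns. Assumes n > d1, d2 and full column rank.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # orthonormal basis of the column space of x
    qy, _ = np.linalg.qr(y)
    u, rho, _ = np.linalg.svd(qx.T @ qy, full_matrices=False)
    return np.clip(rho, 0.0, 1.0), qx @ u


def mean_cca_similarity(x, y):
    """Unweighted mean of the canonical correlations."""
    rho, _ = cca(x, y)
    return float(rho.mean())


def pwcca_similarity(x, y):
    """Projection weighted mean of the canonical correlations.

    Each correlation is weighted by how strongly the neurons of x project
    onto its canonical variate. Not symmetric in x and y.
    """
    rho, h = cca(x, y)
    xc = x - x.mean(axis=0)
    alpha = np.abs(h.T @ xc).sum(axis=1)  # sum over neurons of |<h_i, z_j>|
    alpha = alpha / alpha.sum()
    return float(alpha @ rho)


def make_pair(n=2000, d=10, k=2, scale=5.0, seed=0):
    """Two toy layers sharing a k-dim signal, each with independent noise."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal((n, k))
    x = scale * signal @ rng.standard_normal((k, d)) + rng.standard_normal((n, d))
    y = scale * signal @ rng.standard_normal((k, d)) + rng.standard_normal((n, d))
    return x, y


x, y = make_pair()
mean_sim = mean_cca_similarity(x, y)
pw_sim = pwcca_similarity(x, y)
print(f"mean CCA similarity: {mean_sim:.3f}")
print(f"PWCCA similarity:    {pw_sim:.3f}")
```

In this toy setup only two of the ten canonical directions carry shared signal, so the unweighted mean is pulled down by the eight noise directions, while the projection weights concentrate on the directions that account for most of the activations.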

This review was generated by AI and checked by a human editor.