QUICK REVIEW

[Paper Review] Insights on representational similarity in neural networks with canonical correlation

Ari S. Morcos, Maithra Raghu|arXiv (Cornell University)|Jun 14, 2018

Neural Networks and Applications|References: 16|Citations: 58

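One-sentence summary

This paper develops projection weighted Canonical Correlation Analysis (PWCCA), an improvement on SVCCA that distinguishes signal from noise in neural representations. Using it, the authors show that CNNs which generalize converge to more similar representations than CNNs which memorize, that width affects this convergence, that learning rate and initialization shape distinct clusters of solutions, and that RNNs converge bottom-up over the course of training.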

ABSTRACT

Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.

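Research motivation and objectives

Address the difficulty of comparing representations across neural networks, whose structure varies widely even among networks trained on the same task and over the course of training.
Improve the canonical correlation analysis (CCA) approach so that it separates signal from noise in representations.
Analyze how generalization, width, and training setup (initialization and learning rate) affect the representations that CNNs and RNNs converge to.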

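Proposed method

Develop projection weighted Canonical Correlation Analysis (PWCCA), which weights each CCA component by how much of the original representation it accounts for.
Distinguish signal from noise in a representation by comparing CCA vectors from early in training with those at the end of training, selecting stable components (S) and unstable components (B).
Apply PWCCA to compare representations across networks with identical topology but different initializations and learning rates.
Use PWCCA to quantify layer-wise representational convergence of CNNs trained on CIFAR-10 with true labels versus random labels (generalization vs. memorization).
Examine the representational dynamics of RNNs (LSTMs) across training time and sequence timesteps, assessing bottom-up convergence and variability across timesteps.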
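
The projection weighting described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `pwcca_similarity` and `pwcca_distance` are hypothetical, activations are assumed to be arranged as (datapoints, neurons), and the sketch assumes more datapoints than neurons and full column rank after centering. Canonical correlations are obtained from the singular values of the product of two orthonormal bases, and each component is weighted by the summed absolute projection of the first layer's neurons onto it.

```python
import numpy as np


def pwcca_similarity(X, Y):
    """Projection-weighted CCA similarity between two activation matrices.

    X: (n_samples, n_neurons_x), Y: (n_samples, n_neurons_y).
    Sketch assumption: n_samples exceeds both neuron counts and each
    centered matrix has full column rank.
    """
    # Center each neuron's activations over the datapoints.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Orthonormal bases for the two activation subspaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations.
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)

    # Canonical variates of X: one unit-norm vector over datapoints per component.
    H = Qx @ U

    # Weight each component by how much of X's neuron activity it accounts for.
    alpha = np.abs(H.T @ Xc).sum(axis=1)
    alpha = alpha / alpha.sum()

    return float(alpha @ rho)


def pwcca_distance(X, Y):
    """Distance form: 0 means identical up to an invertible linear map."""
    return 1.0 - pwcca_similarity(X, Y)


# Synthetic activations (made up for illustration only).
rng = np.random.default_rng(0)
acts = rng.standard_normal((500, 8))
mixing = rng.standard_normal((8, 8))  # invertible with probability 1

same_up_to_linear = pwcca_distance(acts, acts @ mixing)
unrelated = pwcca_distance(acts, rng.standard_normal((500, 8)))
```

Because the weights are taken from the first argument only, this measure is not symmetric: `pwcca_distance(X, Y)` and `pwcca_distance(Y, X)` can differ.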

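Experimental results

Research questions

RQ1: Do networks that generalize converge to more similar representations than networks that memorize?
RQ2: How does network width affect convergence to similar representations across a group of networks?
RQ3: Do networks with identical topology but different learning rates converge to distinct clusters of solutions?
RQ4: Do RNN representations converge bottom-up over training, and how do sequence timesteps affect representational similarity?
RQ5: In the presence of both signal and noise, does PWCCA reflect meaningful shared structure more accurately than unweighted mean SVCCA?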

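Key findings

Networks that generalize converge to more similar solutions than memorizing networks in the later CNN layers.
Wider networks converge to more similar solutions than narrow networks; greater width correlates with higher test accuracy and lower pairwise PWCCA distance.
Across many initializations and learning rates, networks converge to clusters of solutions that PWCCA can identify, and these match the clusters found by robustness to ablation.
RNNs show bottom-up convergence over training, which PWCCA brings out more sharply than the unweighted metric, while representations vary across sequence timesteps in ways that linear transforms do not account for.
PWCCA is more robust to the signal-to-noise ratio than unweighted mean SVCCA, and signal directions remain stable throughout training.
Networks that converge to similar solutions tend to generalize better.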
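
The signal-versus-noise finding can be illustrated with a toy experiment. The sketch below is built entirely on synthetic data and is not taken from the paper: two hypothetical layers share a three-dimensional signal carried by strongly active neurons, and each also has weak, independent noise neurons. The helper names (`canonical_correlations`, `make_layer`) and all sizes and scales are assumptions chosen for illustration, and the unweighted baseline here is a plain mean of CCA correlations without SVCCA's SVD pruning step.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return canonical correlations and projection weights (taken from X)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Summed absolute projection of X's neurons onto each canonical variate.
    weights = np.abs((Qx @ U).T @ Xc).sum(axis=1)
    return np.clip(rho, 0.0, 1.0), weights / weights.sum()


def make_layer(shared, n_noise, signal_scale, rng):
    """Hypothetical layer: strong 'signal' neurons plus weak noise neurons."""
    n, k = shared.shape
    # Signal neurons are a random linear mix of the shared structure.
    signal = signal_scale * shared @ rng.standard_normal((k, k))
    # Noise neurons are independent of everything else.
    noise = rng.standard_normal((n, n_noise))
    return np.hstack([signal, noise])


rng = np.random.default_rng(0)
shared = rng.standard_normal((2000, 3))  # structure common to both layers
layer_a = make_layer(shared, n_noise=17, signal_scale=50.0, rng=rng)
layer_b = make_layer(shared, n_noise=17, signal_scale=50.0, rng=rng)

rho, alpha = canonical_correlations(layer_a, layer_b)
mean_cca = float(rho.mean())  # every component counts equally
pwcca = float(alpha @ rho)    # components weighted by projection

print(f"unweighted mean CCA similarity: {mean_cca:.3f}")
print(f"projection-weighted similarity: {pwcca:.3f}")
```

In this setup the unweighted mean is pulled down by the many low-correlation noise components, whereas the projection weights concentrate on the shared directions, so the weighted score is the higher of the two.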

This review was generated by AI and checked by a human editor.