QUICK REVIEW

[Paper Review] Insights on representational similarity in neural networks with canonical correlation

Ari S. Morcos, Maithra Raghu|arXiv (Cornell University)|Jun 14, 2018

Neural Networks and Applications | Cited by 104

ONE-SENTENCE SUMMARY

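This paper proposes projection weighted CCA (PWCCA) to separate signal from noise in neural representations, and uses it to analyze how CNNs and RNNs converge to similar or diverse representations under different conditions, including generalization versus memorization, network width, and learning rate.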

ABSTRACT

Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.

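MOTIVATION AND OBJECTIVES

Motivate a robust method for comparing neural network representations that goes beyond surface-level alignment.
Distinguish signal from noise in layer activations over the course of training.
Characterize how generalization, model width, and learning rate affect representational similarity across networks.
Explore the temporal dynamics of CNN and RNN representations, both over training and during sequence processing.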

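PROPOSED METHOD

Develop projection weighted CCA (PWCCA), which weights each canonical correlation by how much its direction contributes to the layer's output.
Improve on SVCCA by separating signal from noise through comparisons between early and intermediate stages of training.
Apply PWCCA to groups of CNNs trained on CIFAR-10 with true labels and with random labels, to compare generalization with memorization.
Analyze the effect of network width on the representations that networks converge to.
Study how RNN representations change over training time and over sequence timesteps, to assess bottom-up convergence and how much the hidden state varies within a sequence.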
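
The projection weighting described in the first item of this list can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function name `cca_similarities`, the (samples, neurons) shape convention, and the QR-based way of computing CCA are assumptions of this sketch, and it assumes more samples than neurons and full-column-rank activations.

```python
import numpy as np


def cca_similarities(X, Y):
    """Return (mean_cca, pwcca) similarity for two activation matrices.

    X: (n_samples, n_neurons_x), Y: (n_samples, n_neurons_y), computed on the
    same inputs in the same row order. Assumes n_samples exceeds both neuron
    counts and that both matrices have full column rank (sketch assumption).
    """
    # Center each neuron's activations across the dataset.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Orthonormal bases for the two activation subspaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations.
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    rho = np.clip(rho, 0.0, 1.0)

    # Canonical variables of X: one unit-norm column per correlation.
    H = Qx @ U

    # Projection weights: how strongly X's neurons project onto each
    # canonical variable, summed over neurons and normalized to sum to 1.
    alpha = np.abs(H.T @ Xc).sum(axis=1)
    alpha = alpha / alpha.sum()

    return float(rho.mean()), float(alpha @ rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    signal = rng.normal(size=(n, 5))
    # Two hypothetical "layers": the same signal mixed differently, plus
    # independent low-variance noise neurons in each layer.
    X = np.hstack([signal @ rng.normal(size=(5, 5)), 0.01 * rng.normal(size=(n, 5))])
    Y = np.hstack([signal @ rng.normal(size=(5, 5)), 0.01 * rng.normal(size=(n, 5))])
    mean_cca, pwcca = cca_similarities(X, Y)
    print(f"mean CCA: {mean_cca:.3f}  PWCCA: {pwcca:.3f}")
```

In the synthetic example, half of the canonical directions are pure noise. The unweighted mean counts them equally with the shared signal, while the projection weights give them very little influence because they account for a tiny share of the layer's activity.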

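RESULTS

Research Questions

RQ1: When trained on the same data, how do the converged representations of networks that generalize differ from those of networks that memorize?
RQ2: Does increasing network width make the converged representations of networks with different random initializations more similar?
RQ3: Do networks trained with different learning rates converge to distinct clusters of solutions, and can PWCCA reveal them?
RQ4: How do RNN representations evolve over training time and over sequence timesteps?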

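Key Findings

Groups of CNNs that generalize converge to more similar representations in later layers than groups that memorize.
Wider networks converge to more similar representations than narrower networks.
Across multiple initializations and learning rates, networks converge to distinguishable clusters of solutions, consistent with earlier clustering results based on ablation.
RNN representations converge bottom-up over the course of training, and the representations at different sequence timesteps differ considerably.
Projection weighting makes PWCCA robust to noise, and PWCCA measures shared structure better than unweighted mean CCA and basic SVCCA.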
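
For reference, the two scores contrasted in the last finding can be written out as follows. The notation is chosen for this summary and may differ from the paper's: rho_i is the i-th canonical correlation, c is the number of canonical correlations, h_i is the corresponding canonical variable of the first layer, and z_j is the activation vector of neuron j in that layer over the dataset.

```latex
\bar{\rho}_{\mathrm{mean}} = \frac{1}{c} \sum_{i=1}^{c} \rho_i ,
\qquad
\rho_{\mathrm{PW}} = \sum_{i=1}^{c} \alpha_i \, \rho_i ,
\qquad
\alpha_i = \frac{\tilde{\alpha}_i}{\sum_{k=1}^{c} \tilde{\alpha}_k} ,
\qquad
\tilde{\alpha}_i = \sum_{j} \left| \langle h_i , z_j \rangle \right|
```

Both scores lie between 0 and 1, and a distance can be taken as one minus the score. The mean gives every canonical direction the same weight, whereas the weights alpha_i shrink the influence of directions that account for little of the layer's activity.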

This review was generated by AI and reviewed by a human editor.