QUICK REVIEW

[Paper Review] Insights on representational similarity in neural networks with canonical correlation

Ari S. Morcos, Maithra Raghu | arXiv (Cornell University) | Jun 14, 2018

Neural Networks and Applications | References: 16 | Cited by: 58

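ONE-SENTENCE SUMMARY

This paper develops projection weighted canonical correlation analysis (PWCCA), an improvement on SVCCA that separates signal from noise in neural representations, and uses it to show that CNNs which generalize converge to more similar representations than CNNs which memorize, that wider networks converge to more similar solutions, that networks with different initializations and learning rates form distinct clusters of solutions, and that RNNs converge bottom-up over the course of training.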

ABSTRACT

Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.

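MOTIVATION AND GOALS

Address the challenge of comparing representations across neural networks that differ in architecture and training history.
Improve canonical correlation analysis (CCA) based methods so that signal can be separated from noise in representations.
Analyze how generalization, network width, and training initialization affect the representations that CNNs and RNNs converge to.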

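PROPOSED METHOD

Develop projection weighted CCA (PWCCA), which weights each CCA component by its contribution to the original representation (the layer's outputs).
Distinguish signal from noise in representations by comparing CCA vectors from early in training with those at the end of training, separating stable components (S) from unstable components (B).
Apply PWCCA to compare representations across networks with identical topology but different initializations and learning rates.
Use PWCCA to quantify, layer by layer, the representational convergence of CNNs trained on CIFAR-10 with true labels versus random labels (generalization vs. memorization).
Examine representational dynamics in RNNs (LSTMs) across training time and across sequence timesteps, to assess bottom-up convergence and the variability of the hidden state over a sequence.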
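
The projection weighting step can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names (`cca`, `pwcca_distance`, `mean_cca_distance`), the QR-plus-SVD route to CCA, and the synthetic activations are assumptions made for illustration, and the sketch assumes activation matrices of shape (samples, neurons) with more samples than neurons and full column rank. Each canonical correlation is weighted by how strongly its canonical variable projects onto the neurons of the first network, so the resulting distance is not symmetric in its two arguments.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations between X and Y, plus X's canonical variables.

    X, Y: arrays of shape (n_samples, n_neurons). Assumes n_samples exceeds
    both neuron counts and that both matrices have full column rank.
    """
    X = X - X.mean(axis=0)              # center every neuron
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)             # orthonormal basis of span(X)
    Qy, _ = np.linalg.qr(Y)             # orthonormal basis of span(Y)
    U, rho, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    H = Qx @ U                          # canonical variables of X (columns)
    return np.clip(rho, 0.0, 1.0), H, X

def pwcca_distance(X, Y):
    """1 - projection weighted mean of the canonical correlations."""
    rho, H, Xc = cca(X, Y)
    # Weight of component i: sum over neurons j of |<h_i, x_j>|.
    alpha = np.abs(H.T @ Xc).sum(axis=1)
    alpha = alpha / alpha.sum()
    return 1.0 - float(alpha @ rho)

def mean_cca_distance(X, Y):
    """1 - unweighted mean of the canonical correlations (baseline)."""
    rho, _, _ = cca(X, Y)
    return 1.0 - float(rho.mean())

# Synthetic activations, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
same_subspace = X @ rng.standard_normal((10, 10))   # linear remix of X
unrelated = rng.standard_normal((2000, 10))         # independent of X

print("remixed  :", pwcca_distance(X, same_subspace))
print("unrelated:", pwcca_distance(X, unrelated))
```

Because CCA is invariant to invertible linear transforms, the remixed representation should come out at a distance close to zero, while the unrelated one should sit close to one.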

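EXPERIMENTAL RESULTS

Research Questions

RQ1: Do networks that generalize converge to more similar representations than networks that memorize?
RQ2: How does network width affect convergence to similar representations across a group of networks?
RQ3: Do networks with identical topology but different learning rates converge to distinct clusters of solutions?
RQ4: Do RNN representations converge bottom-up over the course of training, and how do sequence timesteps affect representational similarity?
RQ5: When both signal and noise are present, does PWCCA reflect meaningful shared structure more accurately than the unweighted mean used in SVCCA?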

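Key Findings

In later CNN layers, networks that generalize converge to more similar solutions than networks that memorize.
Wider networks converge to more similar solutions than narrower ones, and greater width is associated with higher test accuracy and lower pairwise PWCCA distance.
Across multiple initializations and learning rates, networks converge to distinguishable clusters of solutions that PWCCA can identify; these clusters agree with the clusters found via robustness to ablation.
RNNs converge bottom-up over training, with PWCCA showing sharper convergence than unweighted metrics, while representations across sequence timesteps vary in ways that linear transforms do not account for.
PWCCA is more robust than the unweighted SVCCA mean to the mix of signal and noise in a representation, and it picks out signal directions that remain stable over training.
Networks that converge to similar solutions tend to generalize better.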
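
The group-level comparisons above rest on pairwise distances within a set of networks. A minimal sketch of that bookkeeping follows, under stated assumptions: `pairwise_distances` and `mean_off_diagonal` are hypothetical helper names, the distance used here is a plain unweighted mean-CCA distance standing in for PWCCA to keep the example short, and the two "groups" are synthetic matrices rather than trained networks.

```python
import itertools
import numpy as np

def mean_cca_distance(X, Y):
    """Stand-in distance: 1 - unweighted mean canonical correlation."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return 1.0 - float(np.clip(rho, 0.0, 1.0).mean())

def pairwise_distances(reps, distance_fn):
    """Matrix D with D[i, j] = distance_fn(reps[i], reps[j]), zero diagonal."""
    n = len(reps)
    D = np.zeros((n, n))
    # Ordered pairs, so an asymmetric distance such as PWCCA also works.
    for i, j in itertools.permutations(range(n), 2):
        D[i, j] = distance_fn(reps[i], reps[j])
    return D

def mean_off_diagonal(D):
    """Average distance over all pairs of distinct networks."""
    mask = ~np.eye(D.shape[0], dtype=bool)
    return float(D[mask].mean())

# Synthetic stand-ins for two groups of four "networks" each.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 8))

def rotated_copy():
    # Same latent signal under a random orthogonal mixing, plus small noise.
    Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
    return latent @ Q + 0.1 * rng.standard_normal((1000, 8))

shared_group = [rotated_copy() for _ in range(4)]
unrelated_group = [rng.standard_normal((1000, 8)) for _ in range(4)]

print("shared   :", mean_off_diagonal(pairwise_distances(shared_group, mean_cca_distance)))
print("unrelated:", mean_off_diagonal(pairwise_distances(unrelated_group, mean_cca_distance)))
```

Passing a different `distance_fn` swaps the metric without touching the rest, and the resulting matrix can be handed to any clustering routine to look for clusters of solutions.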

This review was generated by AI and reviewed by a human editor.