QUICK REVIEW

[Paper Review] Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

Thao Nguyen, Maithra Raghu | arXiv (Cornell University) | May 3, 2021

Adversarial Robustness in Machine Learning | References: 45 | Cited by: 91

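One-Sentence Summary

This paper studies how width and depth affect the representations neural networks learn, and finds a characteristic block structure in the hidden representations of higher-capacity (wider or deeper) models. The block structure appears when model capacity is large relative to the size of the training set and reflects layers preserving and propagating the dominant principal component of their representations; it is unique to each model, whereas representations outside the block are often similar across architectures, and wide and deep models show different error patterns even when overall accuracy is similar.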

ABSTRACT

A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for features learned by different models, namely, representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when the overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.

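Motivation and Objectives

Understand how changing network depth and width affects the representations a neural network learns.
Investigate whether wide and deep networks learn similar or different features when their performance is comparable.
Identify the structural patterns that appear in hidden representations as model capacity increases.
Analyze how wide and deep networks differ in prediction errors and class-level behavior.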

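Proposed Method

Analyze the hidden representations of deep and wide networks trained on standard datasets.
Apply principal component analysis (PCA) to identify the dominant component that persists across layers.
Detect block structure in the representations, where a range of layers along the depth of the network maintains a shared dominant principal component.
Compare the representations and predictions of models of different widths and depths while controlling for accuracy.
Measure error patterns and class-level differences in model outputs to assess differences in generalization.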
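
The representation-analysis steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: linear centered kernel alignment (CKA) is assumed as the layer-similarity measure because the steps above do not name one, and the activations are synthetic stand-ins (the names `layers`, `shared`, and the sizes are hypothetical). Layers 2 to 4 are constructed to share one dominant direction across examples, so they should show up as a high-similarity block with a large first-principal-component fraction.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_examples, n_features)."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


def first_pc_fraction(x):
    """Fraction of total variance explained by the first principal component."""
    x = x - x.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2
    return variances[0] / variances.sum()


def similarity_matrix(layers):
    """Pairwise linear CKA between every pair of layers."""
    n = len(layers)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = linear_cka(layers[i], layers[j])
    return sim


# Synthetic stand-in for per-layer activations (assumption: 6 layers,
# 200 examples, 32 features per layer). Real use would record activations
# from a trained network on a fixed batch of inputs.
rng = np.random.default_rng(0)
n_examples, n_features = 200, 32
shared = rng.normal(size=(n_examples, 1))  # one dominant per-example score

layers = []
for k in range(6):
    noise = rng.normal(size=(n_examples, n_features))
    if 2 <= k <= 4:
        # Layers 2-4 carry the same dominant component in different bases.
        direction = rng.normal(size=(1, n_features))
        layers.append(10.0 * shared @ direction + noise)
    else:
        layers.append(noise)

sim = similarity_matrix(layers)
pc_frac = [first_pc_fraction(a) for a in layers]

print(np.round(sim, 2))      # similarity heatmap as a matrix
print(np.round(pc_frac, 2))  # first-PC variance fraction per layer
```

In this toy setup the block is visible as a square of near-one entries in `sim` for layers 2 to 4, and the same layers have a first-PC fraction close to one. Whether a real network shows such a block depends on its capacity relative to the training set, as the findings below describe.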

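Experimental Results

Research Questions

RQ1: To what extent do width and depth affect the structure of the representations a neural network learns?
RQ2: What structural patterns emerge in hidden representations when model capacity is large relative to the size of the training set?
RQ3: How much do wide and deep networks share, or differ in, the features they learn?
RQ4: How do per-class prediction errors differ between wide and deep networks when overall accuracy is similar?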

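Key Findings

A block structure appears in the hidden representations of high-capacity models, indicating that the dominant principal component is preserved and propagated across layers.
The block structure emerges when model capacity is large relative to the size of the training set, which suggests it is a capacity-driven representational phenomenon.
Representations outside the block structure are often similar between wide and deep networks, suggesting shared feature learning in the non-dominant components.
The block structure is unique to each model, so wide and deep networks learn different representations even when their accuracy is similar.
Despite similar overall accuracy, wide and deep networks show distinct error patterns and per-class variation, suggesting that their inductive biases differ.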
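
The last finding can be made concrete with a small per-class comparison. The sketch below is illustrative only: the labels and the two prediction arrays (`preds_wide`, `preds_deep`) are hypothetical toy data, not results from the paper. It shows how two models with identical overall accuracy can still differ in which classes they get wrong.

```python
import numpy as np


def per_class_accuracy(labels, preds, num_classes):
    """Accuracy computed separately for each class (NaN if a class is absent)."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).mean()
    return acc


# Hypothetical toy data: 3 classes, 4 examples each.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
preds_wide = np.array([0, 0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2])  # errs on class 1
preds_deep = np.array([0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2])  # errs on class 0

overall_wide = (preds_wide == labels).mean()
overall_deep = (preds_deep == labels).mean()

acc_wide = per_class_accuracy(labels, preds_wide, num_classes=3)
acc_deep = per_class_accuracy(labels, preds_deep, num_classes=3)
gap = acc_wide - acc_deep  # positive: the wide model is better on that class

# Fraction of examples on which the two models predict different labels.
disagreement = (preds_wide != preds_deep).mean()

print(overall_wide, overall_deep)
print(gap)
print(disagreement)
```

With real models, single-run differences like these can come from training noise, so a comparison would normally average per-class accuracy over several training runs of each architecture before attributing a gap to width or depth.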

This review was generated by AI and checked by a human editor.