QUICK REVIEW

[Paper Review] Identity Crisis: Memorization and Generalization under Extreme Overparameterization

Chiyuan Zhang, Samy Bengio|arXiv (Cornell University)|Feb 13, 2019

Generative Adversarial Networks and Image Synthesis | References: 36 | Cited by: 34

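One-Sentence Summary

This paper studies whether overparameterized neural networks memorize or generalize when trained on a single example to learn the identity mapping, comparing FCN and CNN architectures and highlighting their architecture-dependent inductive biases.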

ABSTRACT

We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. We examine fully-connected and convolutional networks (FCN and CNN), both linear and nonlinear, initialized randomly and then trained to minimize the reconstruction error. The trained networks stereotypically take one of two forms: the constant function (memorization) and the identity function (generalization). We formally characterize generalization in single-layer FCNs and CNNs. We show empirically that different architectures exhibit strikingly different inductive biases. For example, CNNs of up to 10 layers are able to generalize from a single example, whereas FCNs cannot learn the identity function reliably from 60k examples. Deeper CNNs often fail, but nonetheless do astonishing work to memorize the training output: because CNN biases are location invariant, the model must progressively grow an output pattern from the image boundaries via the coordination of many layers. Our work helps to quantify and visualize the sensitivity of inductive biases to architectural choices such as depth, kernel width, and number of channels.

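Motivation and Objectives

Study the balance between memorization and generalization in extremely overparameterized networks.
Examine how architecture (FCN vs. CNN, depth, kernel size) affects inductive bias on a single-example identity task.
Quantify and visualize how architectural choices affect the ability to approximate the identity function.
Provide formal results for simplified cases, together with empirical insights across network depths and configurations.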

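Proposed Method

Study a heavily overparameterized setting with a single training example and an identity-mapping target.
Train a range of architectures (linear and nonlinear, fully connected and convolutional) to minimize the reconstruction error.
Give a theoretical characterization of how single-layer FCNs and CNNs predict on unseen data.
Systematically vary architectural hyperparameters (depth, kernel width, number of channels) and the initialization to observe the resulting bias toward memorization or generalization.
Use qualitative visualizations and quantitative correlation measures to compare predictions against the identity function and the constant function.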
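The setup above can be sketched for the simplest case, a one-layer linear FCN trained on one example. This is a minimal illustration, not the authors' code: the dimension `d`, the Gaussian inputs, the learning rate, and the helper `corr` are all assumptions chosen for the example. It trains by plain gradient descent on the reconstruction error, then checks that the prediction on an unseen input equals the projection of that input onto the training example plus the random initial weights applied to the orthogonal remainder, which is the decomposition the summary attributes to the one-layer FCN analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # input dimension (assumed; e.g. a flattened 8x8 image)

x = rng.normal(size=d)                      # the single training example
W0 = rng.normal(size=(d, d)) / np.sqrt(d)   # random initialization (assumed scale)
W = W0.copy()

# Gradient descent on the reconstruction loss 0.5 * ||W x - x||^2.
# The gradient with respect to W is the rank-one matrix (W x - x) x^T.
lr = 0.5 / (x @ x)  # step size scaled by ||x||^2 so the residual halves each step
for _ in range(200):
    residual = W @ x - x
    W -= lr * np.outer(residual, x)

# Prediction on an unseen input z.
z = rng.normal(size=d)
pred = W @ z

# Split z into its component along x and the orthogonal remainder.
z_par = (x @ z) / (x @ x) * x
z_perp = z - z_par

# Every update only changes how W acts along x, so the trained network
# returns z_par unchanged and still applies the random W0 to z_perp.
decomp = z_par + W0 @ z_perp


def corr(a, b):
    """Pearson correlation between two vectors (hypothetical helper)."""
    return float(np.corrcoef(a, b)[0, 1])


corr_identity = corr(pred, z)  # similarity to the identity function's output
corr_constant = corr(pred, x)  # similarity to the constant (memorized) output

print("fits training example:", np.allclose(W @ x, x))
print("matches decomposition:", np.allclose(pred, decomp))
print("corr with identity:", corr_identity, "corr with constant:", corr_constant)
```

The two correlations play the role of the quantitative comparison described in the last item: a value near 1 for `corr_identity` would indicate generalization, and a value near 1 for `corr_constant` would indicate memorization.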

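Experimental Results

Research Questions

RQ1: How does extreme overparameterization with a single training example affect memorization and generalization in FCN and CNN architectures?
RQ2: Which architectural factors (depth, kernel size, number of channels) bias a model toward the identity function or toward the constant function?
RQ3: Can the behavior of simple networks under single-example learning be characterized formally, and does it account for the surprising generalization seen in CNNs and the memorization seen in FCNs?
RQ4: How do training dynamics and initialization schemes shape the inductive bias of deep overparameterized models?
RQ5: What bounds or qualitative explanations describe identity learning in single-layer CNNs and its dependence on the rank of the input patches?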

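Key Findings

CNNs up to several layers deep (up to 10 layers, per the abstract) can generalize from a single example, whereas FCNs tend to memorize or produce random outputs on unseen data.
Deeper linear networks are biased toward the constant function, whereas shallower networks may produce outputs resembling random noise away from the training example.
CNNs show architecture-dependent biases: CNNs of moderate depth can approximate the identity, while very deep CNNs tend to memorize the training output.
The theoretical results show that a one-layer FCN's prediction is dominated by the projection of the input onto the training example plus a random component in the orthogonal directions, which explains the memorization behavior.
For CNNs, the mean-squared-error bound depends on the number of parameters, the number of channels, the receptive field, and the rank of the local input patches, indicating a trade-off between capacity and learning the identity.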
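The role of patch rank in the last finding can be illustrated with a deliberately simplified sketch. The assumptions are: a single-channel, one-layer linear CNN with no bias, a 5x5 kernel, valid (unpadded) positions only, and a least-squares fit in place of gradient descent from random initialization. The helper `fit_identity_kernel` and both test images are hypothetical. It does not reproduce the paper's bound; it only shows that when the patches of the one training image span the full kernel space, the only kernel that reproduces the image is the delta (identity) kernel, while a rank-deficient image admits other kernels that fit the training image without being the identity.

```python
import numpy as np


def fit_identity_kernel(image, k):
    """Fit one k x k kernel so that each patch maps to its own center pixel.

    Returns (kernel, rank), where rank is the rank of the patch matrix.
    Hypothetical helper for illustration only.
    """
    # All k x k patches at valid positions, shape (H-k+1, W-k+1, k, k).
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    P = patches.reshape(-1, k * k)                       # one patch per row
    targets = patches[:, :, k // 2, k // 2].reshape(-1)  # identity target
    rank = int(np.linalg.matrix_rank(P))
    # Least-squares fit; for rank-deficient P this is the minimum-norm solution.
    kernel, *_ = np.linalg.lstsq(P, targets, rcond=None)
    return kernel.reshape(k, k), rank


k = 5
delta = np.zeros((k, k))
delta[k // 2, k // 2] = 1.0  # the kernel that implements the identity

rng = np.random.default_rng(0)

# A "rich" image: random pixels, so the patches span all k*k directions.
rich_image = rng.normal(size=(28, 28))
kernel_rich, rank_rich = fit_identity_kernel(rich_image, k)

# A "flat" image: every patch is identical, so the patch matrix has rank 1.
flat_image = np.ones((28, 28))
kernel_flat, rank_flat = fit_identity_kernel(flat_image, k)

print("rich image: rank", rank_rich, "recovers delta:",
      np.allclose(kernel_rich, delta, atol=1e-8))
print("flat image: rank", rank_flat, "recovers delta:",
      np.allclose(kernel_flat, delta, atol=1e-8))
```

On the flat image the fitted kernel is a uniform averaging filter: it reproduces the training image exactly but would blur any other input, which is one concrete way a low patch rank leaves the identity undetermined.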

This review was generated by AI and reviewed by a human editor.