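[Paper Summary] Critical Learning Periods in Deep Neural Networks
The paper shows that deep networks have critical learning periods during which a temporary deficit impairs final performance. It uses a Fisher Information analysis to reveal two phases of training and the notion of Information Plasticity, and it discusses the implications for transfer learning and for the robustness of representations.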
Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of "Information Plasticity". Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing.
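Motivation and Goals
- Motivate the study of early-training dynamics in deep neural networks by analogy with critical periods in biology.
- Investigate how temporary sensory deficits affect the final performance of deep neural networks.
- Use Fisher Information to quantify how the connectivity between network layers evolves during training.
- Connect early memorization to later generalization, and explore the potential benefits of forgetting for invariance of representations.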
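Proposed Method
- Apply image-degradation deficits (such as blurring) during early training epochs to induce critical periods in convolutional neural networks trained on CIFAR-10 and MNIST.
- Estimate the Fisher Information Matrix (FIM) of the weights with a tractable trace-based estimator to assess layer-wise connectivity.
- Define Information Plasticity as the redistribution of information across layers during training, and measure it.
- Compare different architectures, optimizers, and data distributions to assess how robust the critical-period phenomenon is.
- Use a sliding-window approach to analyze how the timing and duration of the deficit relate to sensitivity.
- Relate the FIM dynamics to bottlenecks in the loss landscape and to the memorization and forgetting phases.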
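The deficit-window protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `blur`, `in_deficit`, and `batch_for_epoch`, the default `onset`, `length`, and `factor` values, and the choice of average pooling followed by pixel repetition as the blur are all assumptions made for this example.

```python
import numpy as np

def blur(images: np.ndarray, factor: int = 4) -> np.ndarray:
    """Crude blur (assumed for illustration): average-pool each
    factor x factor block, then upsample back by pixel repetition.

    images has shape (N, H, W, C); H and W must be divisible by factor.
    """
    n, h, w, c = images.shape
    pooled = images.reshape(
        n, h // factor, factor, w // factor, factor, c
    ).mean(axis=(2, 4))
    return pooled.repeat(factor, axis=1).repeat(factor, axis=2)

def in_deficit(epoch: int, onset: int, length: int) -> bool:
    """True while the epoch falls inside the window [onset, onset + length)."""
    return onset <= epoch < onset + length

def batch_for_epoch(images, epoch, onset=0, length=40, factor=4):
    """Return degraded images inside the deficit window, clean images otherwise."""
    if in_deficit(epoch, onset, length):
        return blur(images, factor)
    return images
```

Sweeping `onset` with a fixed `length` corresponds to the sliding-window analysis of sensitivity, while fixing `onset=0` and varying `length` probes how late the deficit can be removed before the damage becomes permanent.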
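Experimental Results
Research Questions
- RQ1: Do deep neural networks exhibit critical learning periods when they encounter a temporary deficit during training?
- RQ2: How do the timing and duration of the deficit affect final performance across architectures and datasets?
- RQ3: What is the relationship between Fisher Information dynamics and the network's sensitivity to deficits (Information Plasticity)?
- RQ4: Can the layer-wise reorganization of information explain the observed critical periods and help interpret transfer-learning effects?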
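Key Findings
- Deep neural networks exhibit critical periods: a deficit that persists through an early window (roughly the first 40–60 training epochs) before being removed permanently impairs final performance.
- The drop in final accuracy is more pronounced when the blur deficit is introduced early; sensitivity peaks during the early phase of rapid learning.
- Fisher Information rises early and then falls during a consolidation phase, which maps onto a memorization phase followed by a forgetting and reorganization phase.
- Sensitivity to deficits tracks both the global and the layer-wise Fisher Information, indicating a loss of Information Plasticity under the deficit.
- Layer-wise analysis shows that the deficit shifts reliance toward higher layers, while early removal allows a partial reorganization toward the middle layers.
- Critical periods persist across architectures (All-CNN, ResNet), datasets (MNIST, CIFAR-10), and optimization schemes (SGD, Adam), although their shape and duration vary with depth and hyperparameters.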
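To make the Fisher Information quantity behind these findings concrete, the sketch below computes the trace of the FIM, the expectation over inputs x and labels y sampled from the model of the squared norm of the gradient of log p(y|x) with respect to the weights. It is only a toy illustration under stated assumptions: the model is a linear softmax classifier rather than the deep CNNs studied in the paper, the function names `softmax` and `fisher_trace` are hypothetical, and the expectation over labels is taken exactly here, whereas for a deep network it would typically have to be estimated by sampling.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fisher_trace(W: np.ndarray, X: np.ndarray) -> float:
    """Trace of the Fisher Information Matrix of the weights W for the
    toy model p(y|x) = softmax(W x).

    W has shape (K, D) and X has shape (N, D).
    For this model, grad_W log p(y|x) = (onehot(y) - p) x^T, so its
    squared norm is ||onehot(y) - p||^2 * ||x||^2.
    """
    P = softmax(X @ W.T)                                  # (N, K)
    # Entry [n, y] holds ||onehot(y) - p_n||^2 = 1 - 2 p_y + sum_k p_k^2.
    sq = 1.0 - 2.0 * P + (P ** 2).sum(axis=1, keepdims=True)
    expected = (P * sq).sum(axis=1)                       # exact E over y ~ p(.|x)
    x_norm_sq = (X ** 2).sum(axis=1)                      # ||x||^2 per sample
    return float((expected * x_norm_sq).mean())           # average over inputs
```

Logging this value once per epoch, for all weights or separately for each layer's weights, gives the kind of curve the findings refer to. In this toy model the trace is largest when predictions are uniform and shrinks as they become confident.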
This summary was generated by AI and reviewed by a human editor.