QUICK REVIEW

[Paper Review] An analytic theory of generalization dynamics and transfer learning in deep linear networks

Andrew K. Lampinen, Surya Ganguli|arXiv (Cornell University)|Sep 27, 2018
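Domain Adaptation and Few-Shot Learning | References: 22 | Cited by: 38
One-Sentence Summary

This paper builds an analytic theory of generalization and transfer learning in deep linear networks by using the singular value decomposition (SVD) to derive exact solutions for the weight dynamics. It shows that transfer performance is governed by the alignment between the singular vectors of the source and target tasks and by their singular values: transfer is most effective when the source task's singular vectors align with the target task's and the singular values are preserved.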

ABSTRACT

Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual tasks. However, we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks. We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. In particular, our theory provides analytic solutions to the training and testing error of deep networks as a function of training time, number of examples, network size and initialization, and the task structure and SNR. Our theory reveals that deep networks progressively learn the most important task structure first, so that generalization error at the early stopping time primarily depends on task structure and is independent of network size. This suggests any tight bound on generalization error must take into account task structure, and explains observations about real data being learned faster than random data. Intriguingly, our theory also reveals the existence of a learning algorithm that provably outperforms neural network training through gradient descent. Finally, for transfer learning, our theory reveals that knowledge transfer depends sensitively, but computably, on the SNRs and input feature alignments of pairs of tasks.
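The claim that deep networks "progressively learn the most important task structure first" can be illustrated with the well-known logistic solution for a single input-output mode of a two-layer linear network trained by gradient descent from a small, balanced initialization (Saxe, McClelland and Ganguli, 2014, on which this line of work builds). In that setting a mode whose target singular value is s obeys τ du/dt = 2u(s - u). The sketch below assumes this simplified setting (whitened inputs, no noise, hypothetical singular values and initial strength). It does not reproduce this paper's full train/test error formulas. It only shows that stronger modes reach half their final strength sooner.

```python
import numpy as np

def mode_strength(t, s, u0, tau=1.0):
    """Logistic solution of tau * du/dt = 2 * u * (s - u).

    u(t) is the strength of one input-output mode whose target singular value
    is s, starting from a small initial strength u0 (0 < u0 < s).
    """
    growth = np.exp(2.0 * s * t / tau)
    return s * growth / (growth - 1.0 + s / u0)

def half_learning_time(s, u0, tau=1.0):
    """Time at which u(t) reaches s / 2; requires 0 < u0 < s / 2."""
    return tau / (2.0 * s) * np.log(s / u0 - 1.0)

singular_values = np.array([4.0, 2.0, 1.0, 0.5])  # hypothetical, strongest first
u0 = 1e-3  # small initial mode strength (hypothetical)

t_half = half_learning_time(singular_values, u0)
print(np.round(t_half, 2))  # increases as s decreases: strong modes are learned first
```

Because the half-learning time scales roughly as (τ / 2s) ln(s / u0), halving a mode's singular value roughly doubles the time needed to learn it. This is why early stopping mostly captures the dominant task structure.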

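Motivation and Goals

  • Develop a rigorous analytic framework for understanding generalization and transfer learning in deep linear networks.
  • Identify the precise mathematical conditions under which transfer learning improves the performance of linear networks.
  • Use the SVD to characterize how the weight dynamics during training relate to generalization and transfer.
  • Quantify the roles of singular vectors and singular values in determining how effective transfer is.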

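Proposed Method

  • The authors model deep linear networks using the singular value decomposition (SVD), decomposing weight matrices into orthogonal matrices and singular values.
  • Using an SVD-based parameterization, they derive the exact dynamics of the network weights under gradient descent.
  • The analysis tracks the evolution of the left singular vectors, the right singular vectors and the singular values separately, showing that the singular vectors converge to data-dependent directions.
  • Transfer learning is analyzed by splitting the joint weight matrix into blocks for the source and target tasks and imposing orthogonality between the task-specific components.
  • The transfer effect is quantified by the similarity matrix Q = V̄_A^T V̄_B, which measures the alignment between the right singular vectors of the source and target tasks.
  • The theory shows that transfer performance depends only on the alignment of V̄_A and V̄_B and on whether the singular values are preserved, and is independent of the initial U matrices.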
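To make the similarity matrix concrete, the sketch below builds two hypothetical tasks as linear input-output maps with known SVDs. It takes each task's right singular vectors (input modes) as V̄_A and V̄_B and computes Q = V̄_A^T V̄_B. It assumes V̄ denotes the right singular vectors, with nonzero singular value, of each task's input-output map. The paper's exact definition may differ, for example in which matrix is decomposed. The squared Frobenius norm of Q measures how much of one task's input subspace lies inside the other's.

```python
import numpy as np

def input_modes(w, tol=1e-10):
    """Right singular vectors of an (outputs x inputs) map w with nonzero singular value."""
    _, sv, vt = np.linalg.svd(w, full_matrices=False)
    return vt[sv > tol].T  # shape (n_inputs, rank)

def similarity(w_a, w_b):
    """Q = V_A^T V_B: pairwise overlaps between the two tasks' input modes."""
    return input_modes(w_a).T @ input_modes(w_b)

def orthonormal(rng, n, k):
    """n x k matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

rng = np.random.default_rng(0)
n_in, n_out, rank = 8, 5, 2
s = np.diag([3.0, 1.5])  # hypothetical singular values

basis = orthonormal(rng, n_in, 2 * rank)
v_a, v_orth = basis[:, :rank], basis[:, rank:]
w_a = orthonormal(rng, n_out, rank) @ s @ v_a.T
w_b_aligned = orthonormal(rng, n_out, rank) @ s @ v_a.T  # different U, same V
w_b_orth = orthonormal(rng, n_out, rank) @ s @ v_orth.T  # different U, orthogonal V

q_aligned = similarity(w_a, w_b_aligned)
q_orth = similarity(w_a, w_b_orth)
print(np.linalg.norm(q_aligned) ** 2)  # ≈ 2.0 (= rank): input subspaces coincide
print(np.linalg.norm(q_orth) ** 2)     # ≈ 0.0: input subspaces are orthogonal
```

When the target task reuses the source's input modes with new left singular vectors, ||Q||_F^2 equals the rank. When the input modes are orthogonal, Q vanishes, which corresponds to the case where the summary predicts that transfer fails.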

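Experimental Results

Research Questions

  • RQ1: How do the singular vectors and singular values of the source and target tasks jointly determine transfer performance in deep linear networks?
  • RQ2: What is the exact analytic form of the generalization error of deep linear networks trained by gradient descent?
  • RQ3: How does orthogonality between the task-specific components of the weight matrices affect transfer learning?
  • RQ4: Under what conditions does transfer learning improve the generalization performance of linear networks?
  • RQ5: Why do some pretraining strategies yield better transfer in linear models?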

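Key Findings

  • Transfer performance depends on the alignment between the right singular vectors of the source and target tasks, measured by the similarity matrix Q = V̄_A^T V̄_B, together with the tasks' singular values.
  • The left singular vectors (U matrices) do not affect transfer performance, because they remain orthogonal and decoupled across tasks.
  • Optimal transfer is achieved when the source task's singular vectors align with those of the target task and the singular values are preserved across tasks.
  • The generalization error is derived exactly and shown to depend on the singular values and on the degree of singular-vector alignment.
  • The theory explains why some pretraining strategies are more effective: they align the source task's singular vectors with the structure of the target task.
  • The model predicts that when the singular vectors of the source and target tasks are orthogonal, transfer fails regardless of the magnitude of the singular values.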
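The aligned-versus-orthogonal findings can be checked in a toy simulation. The setup is an illustrative assumption and is not taken from the paper's experiments. A two-layer linear network with whitened inputs and no label noise, so that the loss is 0.5 · ||W2 W1 - T||_F^2, is first pretrained by full-batch gradient descent on a source map T_A. It is then given a fresh output head and fine-tuned for a short, fixed budget on a target map T_B. The target either reuses the source's input modes V with new left singular vectors U, or uses orthogonal input modes, and both targets have the same singular values. All sizes, learning rates, step counts and the helper names `train`, `loss` and `orthonormal` are hypothetical.

```python
import numpy as np

def train(w1, w2, target, steps, lr=0.05):
    """Full-batch gradient descent on 0.5 * ||w2 @ w1 - target||_F^2.

    Assumes whitened inputs (input covariance = identity) and no label noise,
    so the loss depends only on the product w2 @ w1.
    """
    for _ in range(steps):
        err = target - w2 @ w1
        step_w1 = w2.T @ err  # negative gradient w.r.t. w1
        step_w2 = err @ w1.T  # negative gradient w.r.t. w2
        w1, w2 = w1 + lr * step_w1, w2 + lr * step_w2
    return w1, w2

def loss(w1, w2, target):
    return 0.5 * np.linalg.norm(w2 @ w1 - target) ** 2

def orthonormal(rng, n, k):
    """n x k matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

rng = np.random.default_rng(1)
n_in, n_hidden, n_out, rank = 8, 8, 5, 2
s = np.diag([3.0, 1.5])  # shared singular values (hypothetical)

basis = orthonormal(rng, n_in, 2 * rank)
v_src, v_orth = basis[:, :rank], basis[:, rank:]
task_a = orthonormal(rng, n_out, rank) @ s @ v_src.T
tasks_b = {
    "aligned": orthonormal(rng, n_out, rank) @ s @ v_src.T,      # new U, same V
    "orthogonal": orthonormal(rng, n_out, rank) @ s @ v_orth.T,  # new U, orthogonal V
}

# Pretrain on the source task from a small random initialization.
w1 = 1e-2 * rng.standard_normal((n_hidden, n_in))
w2 = 1e-2 * rng.standard_normal((n_out, n_hidden))
w1, w2 = train(w1, w2, task_a, steps=2000)

# Fine-tune on each target task with the same fresh head and a short budget.
head = 1e-2 * rng.standard_normal((n_out, n_hidden))
results = {}
for name, task_b in tasks_b.items():
    w1_b, w2_b = train(w1.copy(), head.copy(), task_b, steps=40)
    results[name] = loss(w1_b, w2_b, task_b)

print(results)
```

With aligned input modes, the pretrained first layer already encodes the directions the target needs. Only the head has to be learned, so the loss should fall quickly even though U differs. With orthogonal input modes, the target modes must grow from near-zero strength, so the same budget is expected to leave a much larger loss.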


This review was generated by AI and reviewed by human editors.