QUICK REVIEW

[Paper Review] Kernel and Rich Regimes in Overparametrized Models

Blake Woodworth, Suriya Gunasekar | arXiv (Cornell University) | Jun 13, 2019
Stochastic Gradient Optimization Techniques | References: 33 | Cited by: 66
One-Sentence Summary

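This paper analyzes the kernel (lazy) and rich (active) regimes in overparametrized models, shows how the initialization scale controls the transition between them and affects generalization, and provides a detailed analysis of depth-D models together with empirical validation.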

ABSTRACT

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach, we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We also highlight an interesting role for the width of a model in the case that the predictor is not identically zero at initialization. We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
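To make the phrases "homogeneous model" and "behaves as a kernelized linear predictor" concrete, the standard definitions can be written as follows. This is a sketch using assumed notation ($f$, $w$, $w_0$, $K_{w_0}$ do not appear in the summary itself), not a restatement of the paper's theorems.

```latex
% D-homogeneity of the model in its parameters w (for any scalar c > 0):
f(c\,w, x) = c^{D} f(w, x)

% First-order (lazy) approximation around the initialization w_0:
f(w, x) \approx f(w_0, x) + \langle \nabla_w f(w_0, x),\; w - w_0 \rangle

% Tangent kernel induced by this linearization:
K_{w_0}(x, x') = \langle \nabla_w f(w_0, x),\; \nabla_w f(w_0, x') \rangle
```

When training stays close to this linearization, gradient descent behaves like kernel regression with $K_{w_0}$, which is the "kernel regime" the abstract refers to; the "rich regime" is the case where the linearization stops being accurate during training.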

Research Motivation and Objectives

  • Motivate the study of overparametrized neural networks beyond the kernel regime and explore how initialization affects regime behavior.
  • Characterize the transition between kernel and rich regimes in multilayer homogeneous models.
  • Provide a complete analysis for a family of simple depth-D models to reveal regime transitions.
  • Demonstrate the regime transition experimentally on matrix factorization models and multilayer networks.

Proposed Method

  • Build on the observation by Chizat and Bach that the initialization scale determines whether an overparametrized model trains in the kernel or the rich regime.
  • Develop a formal analysis for a family of depth-D models to capture the kernel–rich transition.
  • Analyze the implicit bias of gradient descent in each regime: the minimum RKHS norm solution in the kernel regime, and richer implicit biases that are not RKHS norms in the rich regime.
  • Extend the framework to matrix factorization and multilayer networks to show empirical evidence of the transition.
  • Examine the role of model width when the predictor is nonzero at initialization.
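The effect of initialization scale described in the list above can be sketched numerically. The example below is a minimal illustration, assuming a toy depth-2 "diagonal" model `beta = u*u - v*v` trained by gradient descent on an underdetermined least-squares problem. The function names, the step-size rule, the problem sizes, and the two `alpha` values are all illustrative assumptions and are not taken from this summary.

```python
import numpy as np


def train_diagonal_net(X, y, alpha, steps=20000, base_lr=0.01):
    """Gradient descent on the toy model beta = u*u - v*v, starting at u = v = alpha."""
    n, d = X.shape
    u = np.full(d, alpha)
    v = np.full(d, alpha)  # u == v at init, so the predictor starts at zero
    # Assumed step-size rule: shrink the step for large alpha to keep updates stable.
    lr = base_lr / max(1.0, alpha ** 2)
    for _ in range(steps):
        beta = u * u - v * v
        # Gradient of the loss (1 / (2n)) * ||X beta - y||^2 with respect to beta.
        grad_beta = X.T @ (X @ beta - y) / n
        # Chain rule: d(beta)/du = 2u and d(beta)/dv = -2v. The right-hand side
        # is evaluated before assignment, so both updates use the old u and v.
        u, v = u - lr * 2.0 * u * grad_beta, v + lr * 2.0 * v * grad_beta
    return u * u - v * v


def make_problem(n=20, d=40, seed=0):
    """Hypothetical sparse regression task with fewer samples than features."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d)
    beta_true[:3] = [1.0, -1.0, 1.0]
    return X, X @ beta_true


if __name__ == "__main__":
    X, y = make_problem()
    beta_min_l2 = np.linalg.pinv(X) @ y  # minimum Euclidean-norm interpolant
    for alpha in (1e-3, 10.0):
        beta = train_diagonal_net(X, y, alpha)
        print(
            f"alpha={alpha:g}  l1={np.abs(beta).sum():.3f}  "
            f"l2={np.linalg.norm(beta):.3f}  "
            f"dist_to_min_l2={np.linalg.norm(beta - beta_min_l2):.3e}"
        )
```

Under these assumptions, one would expect the large-`alpha` run to land close to the minimum Euclidean-norm interpolant (kernel-like behavior), while the small-`alpha` run should return a sparser predictor with a noticeably smaller l1 norm (rich behavior). The small-`alpha` run may not fit the training data to high precision within the fixed step budget, since the final phase of training is slow at small scales.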

Experimental Results

Research Questions

  • RQ1: How does initialization scale influence whether training operates in the kernel (lazy) or rich (active) regime?
  • RQ2: What are the theoretical implications of this transition for generalization in deep homogeneous models?
  • RQ3: How does model width affect regime behavior when the predictor is nonzero at initialization?
  • RQ4: Do simple depth-D models exhibit a meaningful transition that reflects kernel-to-rich dynamics, and can this be observed in more complex architectures?
  • RQ5: Do empirical results on matrix factorization and multilayer networks align with the proposed kernel–rich transition framework?

Key Findings

  • Initialization scale controls the transition between kernel and rich regimes in multilayer homogeneous models.
  • The width of a model can influence regime behavior when the predictor is not identically zero at initialization.
  • A complete analysis for a simple depth-D family of models reveals a meaningful kernel–rich transition.
  • Empirical demonstrations on matrix factorization models and multilayer non-linear networks show that the transition also appears in more complex models.


This review was generated by AI and reviewed by human editors.