QUICK REVIEW

[Paper Review] Kernel and Rich Regimes in Overparametrized Models

Blake Woodworth, Suriya Gunasekar | arXiv (Cornell University) | Jun 13, 2019

Stochastic Gradient Optimization Techniques | References: 33 | Cited by: 66

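One-Line Summary

This paper analyzes the kernel (lazy) and rich (active) regimes in overparametrized models, showing how the initialization scale controls the transition between them and how this affects generalization. It includes a detailed analysis of depth-D models and empirical validation.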

ABSTRACT

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach, we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We also highlight an interesting role for the width of a model in the case that the predictor is not identically zero at initialization. We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
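
The abstract's statement that training in the kernel regime finds the minimum RKHS norm solution has a simple special case that is easy to check. The following is a minimal sketch, assuming a plain linear predictor (identity feature map, i.e. a linear kernel, so the RKHS norm reduces to the Euclidean norm of the weights), zero initialization, and a hypothetical random underdetermined data set; none of these choices come from the paper. Gradient descent on the squared loss should then converge to the pseudo-inverse (minimum-norm) interpolant.

```python
import numpy as np

# Hypothetical underdetermined least-squares problem: 5 equations, 12 unknowns.
# Assumed linear kernel (identity features): the RKHS norm is just ||w||_2.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 12))
y = rng.standard_normal(5)

w = np.zeros(12)                        # zero initialization
lr = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / sigma_max(X)^2, below the 2 / sigma_max^2 stability limit
for _ in range(5_000):
    w -= lr * X.T @ (X @ w - y)         # gradient step on 0.5 * ||X w - y||^2

w_min_norm = np.linalg.pinv(X) @ y      # minimum Euclidean-norm interpolant
print("interpolates:", np.allclose(X @ w, y))
print("matches minimum-norm solution:", np.allclose(w, w_min_norm))
```

The reason is that every update `X.T @ (...)` lies in the row space of `X`, so iterates started at zero never leave it, and the only interpolant in that subspace is the minimum-norm one. With a nonzero initialization the limit would instead be the interpolant closest to the initial weights.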

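Motivation and Objectives

Motivates the study of overparametrized neural networks beyond the kernel regime and explores how initialization affects which regime training falls into.
Characterizes the transition between the kernel and rich regimes in multilayer homogeneous models.
Provides a complete analysis of a family of simple depth-D models that exposes the regime transition.
Demonstrates the regime transition experimentally in matrix factorization models and multilayer networks.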

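Proposed Method

Builds on the observation that, in overparametrized models, the initialization scale determines whether training behaves in the kernel or the rich regime.
Develops a formal analysis of a family of depth-D models to capture the kernel–rich transition.
Analyzes the implicit bias that gradient descent induces in each regime (the minimum RKHS norm in the kernel regime, implicit biases that are not RKHS norms in the rich regime) and how this bias affects generalization.
Extends the framework to matrix factorization and multilayer networks and presents empirical evidence of the transition.
Examines the role of model width when the predictor is nonzero at initialization.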
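
The central claim, that the initialization scale alone moves training between the two regimes, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the paper's experimental setup: it assumes a diagonal linear "depth-D" model with predictor `beta = w_plus**D - w_minus**D`, both weight vectors initialized to `alpha` (so the predictor is zero at initialization), squared loss, plain gradient descent, and a synthetic noiseless sparse regression problem. The helper `train_diagonal_net`, the data, and all hyperparameters are hypothetical. A small `alpha` is expected to give an interpolant with a small l1 norm (rich regime), while a large `alpha` should give one close to the minimum l2-norm interpolant (kernel regime).

```python
import numpy as np


def train_diagonal_net(X, y, alpha, depth=2, lr=0.01, steps=40_000):
    """Plain gradient descent on a hypothetical diagonal depth-D linear model.

    The predictor is beta = w_plus**depth - w_minus**depth. Both weight vectors
    start at alpha * ones, so the predictor is exactly zero at initialization
    and alpha alone sets the initialization scale.
    """
    n, d = X.shape
    w_plus = alpha * np.ones(d)
    w_minus = alpha * np.ones(d)
    # Curvature in weight space grows like alpha**(2 * (depth - 1)), so shrink the step accordingly.
    step = lr / (1.0 + alpha ** (2 * (depth - 1)))
    for _ in range(steps):
        beta = w_plus**depth - w_minus**depth
        # Gradient of the loss (1 / (2n)) * ||X beta - y||^2 with respect to beta.
        grad_beta = X.T @ (X @ beta - y) / n
        # Chain rule through beta = w_plus**depth - w_minus**depth.
        w_plus -= step * depth * w_plus ** (depth - 1) * grad_beta
        w_minus += step * depth * w_minus ** (depth - 1) * grad_beta
    return w_plus**depth - w_minus**depth


# Hypothetical synthetic problem: 20 noiseless measurements of a 3-sparse vector in 50 dimensions.
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [1.0, -1.0, 0.5]
y = X @ beta_star

beta_rich = train_diagonal_net(X, y, alpha=1e-3)    # small initialization scale
beta_kernel = train_diagonal_net(X, y, alpha=10.0)  # large initialization scale
beta_min_l2 = np.linalg.pinv(X) @ y                 # minimum l2-norm interpolant, for reference

for name, b in [("alpha=1e-3", beta_rich), ("alpha=10", beta_kernel), ("min-l2", beta_min_l2)]:
    print(f"{name:>10}: l1 = {np.abs(b).sum():.3f}, "
          f"l2 = {np.linalg.norm(b):.3f}, residual = {np.linalg.norm(X @ b - y):.2e}")
```

The step size is divided by `1 + alpha**(2*(D-1))` because the curvature of the loss in weight space grows with the initialization scale; a step that suits `alpha=1e-3` would likely make the `alpha=10` run diverge. The small-`alpha` run can converge noticeably more slowly near the end, so its residual is small but not at machine precision.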

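Experimental Results

Research Questions

RQ1: How does the initialization scale determine whether training operates in the kernel (lazy) regime or the rich (active) regime?
RQ2: What are the theoretical implications of this transition for generalization in deep homogeneous models?
RQ3: How does model width affect regime behavior when the predictor is nonzero at initialization?
RQ4: Does the simple depth-D model family exhibit a meaningful transition reflecting kernel-to-rich dynamics, and is this transition also observable in more complex architectures?
RQ5: Are the empirical results on matrix factorization and multilayer networks consistent with the proposed kernel–rich transition framework?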

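Key Findings

The initialization scale governs the transition between the kernel and rich regimes in multilayer homogeneous models.
Model width can affect regime behavior when the predictor is not identically zero at initialization.
The complete analysis of the simple depth-D family reveals a meaningful kernel–rich transition.
Empirical demonstrations on matrix factorization models support the existence of the transition in more complex networks.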
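
The analysis of the depth-D family can be made concrete for D = 2. As we recall the paper's result (treat the exact form as an assumption to check against the original), for the diagonal model with predictor beta = w_plus**2 - w_minus**2 and initialization w_plus = w_minus = alpha, gradient flow on the squared loss converges to the interpolant that minimizes Q_α(β) = α² Σ_i q(β_i/α²), with q(z) = 2 - sqrt(4 + z²) + z·arcsinh(z/2). The sketch below only evaluates this functional and checks its two limits numerically: for large α, 4α²·Q_α(β) approaches ‖β‖₂² (kernel regime), and for small α, Q_α(β)/log(1/α²) approaches ‖β‖₁ (rich regime); this last normalization is our own choice. The vector `beta` and the α grids are arbitrary.

```python
import numpy as np


def q(z):
    """q(z) = 2 - sqrt(4 + z^2) + z * arcsinh(z / 2); its derivative is arcsinh(z / 2)."""
    z = np.asarray(z, dtype=float)
    return 2.0 - np.sqrt(4.0 + z**2) + z * np.arcsinh(z / 2.0)


def Q(beta, alpha):
    """Q_alpha(beta) = alpha^2 * sum_i q(beta_i / alpha^2), the assumed depth-2 implicit bias."""
    beta = np.asarray(beta, dtype=float)
    return alpha**2 * np.sum(q(beta / alpha**2))


beta = np.array([1.0, -2.0, 0.5])  # arbitrary illustrative vector
l1 = np.abs(beta).sum()
l2_sq = np.sum(beta**2)

# Kernel side: for large alpha, 4 * alpha^2 * Q_alpha(beta) approaches ||beta||_2^2.
for alpha in [1.0, 10.0, 100.0]:
    print(f"alpha={alpha:g}: 4*alpha^2*Q = {4 * alpha**2 * Q(beta, alpha):.4f} (||beta||_2^2 = {l2_sq:.4f})")

# Rich side: for small alpha, Q_alpha(beta) / log(1 / alpha^2) approaches ||beta||_1, only logarithmically fast.
for alpha in [1e-2, 1e-4, 1e-8]:
    print(f"alpha={alpha:g}: Q/log(1/alpha^2) = {Q(beta, alpha) / np.log(1 / alpha**2):.4f} (||beta||_1 = {l1:.4f})")
```

The slow, logarithmic approach to the l1 norm is one way to see why the initialization must be very small before the implicit bias looks genuinely sparse.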

This review was generated by AI and checked by a human editor.