QUICK REVIEW

[Paper Review] The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Ben Adlam, Jeffrey Pennington|arXiv (Cornell University)|Aug 15, 2020

Stochastic Gradient Optimization Techniques | Cited by 35

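One-Sentence Summary

This paper analyzes kernel regression with the NTK (Neural Tangent Kernel) for neural networks in the high-dimensional limit and shows non-monotonic generalization behavior across multiple parameterization scales, which can include triple descent.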

ABSTRACT

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has non-monotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.

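Motivation and Objectives

Understand why overparameterized neural networks generalize beyond the classical regime.
Provide a precise high-dimensional asymptotic analysis of NTK ridge regression in a wide teacher network setting.
Identify multiple parameterization scales at which the test error behaves non-monotonically, including the linear and quadratic transitions.
Decompose the NTK into per-layer kernels to pinpoint the source of the non-monotonic generalization.
Provide empirical evidence of triple descent in finite-size networks trained with gradient descent.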

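Proposed Method

Model the learning task as kernel ridge regression with the Neural Tangent Kernel of a single-hidden-layer network.
Decompose the NTK into two per-layer kernels, K1 and K2, and analyze the contribution of each.
Derive the high-dimensional limit for m samples, n0 features, and n1 hidden units, with the ratios φ = n0/m and ψ = n0/n1 held fixed.
Linearize the nonlinear random feature matrices via Gaussian equivalence to obtain tractable expressions.
Use the linear pencil and random matrix techniques to express the test error as a rational function of the inverse kernel matrix.
Provide exact asymptotic formulas for E_train and E_test and analyze the limiting regimes.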
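
The first two steps above can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a bias-free network f(x) = w2 . tanh(W1 x / sqrt(n0)) / sqrt(n1), takes K1 and K2 to be the Gram matrices of the gradients with respect to the first-layer and second-layer weights at random initialization, and uses a hypothetical random tanh teacher to generate labels. The function names, the tanh activation, the teacher width, and the ridge value are all illustrative choices.

```python
import numpy as np


def tanh_grad(z):
    """Derivative of tanh, needed for the first-layer kernel."""
    return 1.0 - np.tanh(z) ** 2


def layer_kernels(Xa, Xb, W1, w2):
    """Per-layer NTK blocks between the rows of Xa and the rows of Xb.

    Assumed network: f(x) = w2 . tanh(W1 x / sqrt(n0)) / sqrt(n1).
    Shapes: Xa (ma, n0), Xb (mb, n0), W1 (n1, n0), w2 (n1,).
    """
    n1, n0 = W1.shape
    Za = Xa @ W1.T / np.sqrt(n0)  # pre-activations, shape (ma, n1)
    Zb = Xb @ W1.T / np.sqrt(n0)
    # K2: inner products of gradients w.r.t. the second-layer weights.
    K2 = np.tanh(Za) @ np.tanh(Zb).T / n1
    # K1: inner products of gradients w.r.t. the first-layer weights.
    Da = tanh_grad(Za) * w2
    Db = tanh_grad(Zb) * w2
    K1 = (Xa @ Xb.T / n0) * (Da @ Db.T / n1)
    return K1, K2


def ntk_ridge_predict(X_train, y_train, X_test, W1, w2, ridge):
    """Kernel ridge regression with the kernel K = K1 + K2."""
    K1, K2 = layer_kernels(X_train, X_train, W1, w2)
    alpha = np.linalg.solve(K1 + K2 + ridge * np.eye(len(y_train)), y_train)
    K1_t, K2_t = layer_kernels(X_test, X_train, W1, w2)
    return (K1_t + K2_t) @ alpha


def demo(m=200, n0=50, n1=100, m_test=500, ridge=1e-3, seed=0):
    """Test MSE for one random draw; here phi = n0/m and psi = n0/n1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n0))
    X_test = rng.standard_normal((m_test, n0))
    # Hypothetical teacher: a random single-hidden-layer tanh network.
    Wt = rng.standard_normal((4 * n0, n0))
    wt = rng.standard_normal(4 * n0)

    def teacher(A):
        return np.tanh(A @ Wt.T / np.sqrt(n0)) @ wt / np.sqrt(4 * n0)

    W1 = rng.standard_normal((n1, n0))
    w2 = rng.standard_normal(n1)
    pred = ntk_ridge_predict(X, teacher(X), X_test, W1, w2, ridge)
    return float(np.mean((pred - teacher(X_test)) ** 2))


if __name__ == "__main__":
    print("test MSE:", demo())
```

Sweeping n1 at fixed m and n0 in `demo` gives a rough empirical learning curve. Replacing the sum K1 + K2 with only one of the two blocks is one way to probe each layer's contribution separately.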

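Experimental Results

Research Questions

RQ1: How does NTK ridge regression generalize in high dimensions when the parameter count p scales as m and as m^2?
RQ2: Does the test error become non-monotonic deep in the overparameterized regime, and is the scale that produces it linear or quadratic?
RQ3: What are the relative contributions of the first-layer and second-layer kernels to generalization?
RQ4: Can the NTK regime exhibit triple descent and multi-scale learning curves in finite-width networks?
RQ5: What are the asymptotic expressions for the training and test errors in NTK regression and its limiting cases?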
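
As a worked step behind RQ1, the quadratic scale follows from the fixed ratios φ = n0/m and ψ = n0/n1. The count below assumes a bias-free single-hidden-layer network with scalar output, so that p is the n0·n1 first-layer weights plus the n1 second-layer weights. This counting convention is an assumption, since the review does not state it.

```latex
\begin{align}
  n_0 &= \phi m, \qquad n_1 = \frac{n_0}{\psi} = \frac{\phi}{\psi}\, m, \\
  p &= n_1 (n_0 + 1)
     = \frac{\phi}{\psi}\, m \,(\phi m + 1)
     = \frac{\phi^2}{\psi}\, m^2 + \frac{\phi}{\psi}\, m, \\
  \frac{p}{m^2} &\longrightarrow \frac{\phi^2}{\psi}
     \quad \text{as } m \to \infty \text{ with } \phi, \psi \text{ fixed.}
\end{align}
```

Under this assumption, holding φ and ψ fixed places the model on the p ~ m^2 scale, while a parameter count on the p ~ m scale requires n1 to stay of order one as m grows.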

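Key Findings

The test error behaves non-monotonically deep in the overparameterized regime.
The non-monotonicity can arise and persist when p scales quadratically with the dataset size m (p ~ m^2).
The non-monotonicity is attributed mainly to the kernel associated with the second-layer weights (K2).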
In the very wide (superabundant) regime, E_test decays rapidly, scaling as m^-2 in the noiseless case and roughly as m^-1 at finite SNR.
Triple descent and multi-scale phenomena are supported by the theoretical analysis and by empirical evidence from finite networks.

This review was generated by AI and checked by a human editor.