QUICK REVIEW

[Paper Review] The generalization error of random features regression: Precise asymptotics and double descent curve

Song Mei, Andrea Montanari | arXiv (Cornell University) | Aug 14, 2019

Random Matrices and Applications | References: 31 | Citations: 248

One-line summary

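This paper derives the precise high-dimensional asymptotics of random features ridge regression, exhibits a double descent generalization curve, and shows that the optimal test error is attained in the extremely overparametrized regime, with or without regularization.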

ABSTRACT

Deep learning methods operate in regimes that defy the traditional statistical mindset. Neural network architectures often contain more parameters than training samples, and are so rich that they can interpolate the observed labels, even if the latter are replaced by pure noise. Despite their huge complexity, the same architectures achieve small generalization error on real data. This phenomenon has been rationalized in terms of a so-called `double descent' curve. As the model complexity increases, the test error follows the usual U-shaped curve at the beginning, first decreasing and then peaking around the interpolation threshold (when the model achieves vanishing training error). However, it descends again as model complexity exceeds this threshold. The global minimum of the test error is found above the interpolation threshold, often in the extreme overparametrization regime in which the number of parameters is much larger than the number of samples. Far from being a peculiar property of deep neural networks, elements of this behavior have been demonstrated in much simpler settings, including linear regression with random covariates. In this paper we consider the problem of learning an unknown function over the $d$-dimensional sphere $\mathbb S^{d-1}$, from $n$ i.i.d. samples $(\boldsymbol x_i, y_i)\in \mathbb S^{d-1} \times \mathbb R$, $i\le n$. We perform ridge regression on $N$ random features of the form $\sigma(\boldsymbol w_a^{\mathsf T} \boldsymbol x)$, $a\le N$. This can be equivalently described as a two-layers neural network with random first-layer weights. We compute the precise asymptotics of the test error, in the limit $N,n,d \to \infty$ with $N/d$ and $n/d$ fixed. This provides the first analytically tractable model that captures all the features of the double descent phenomenon without assuming ad hoc misspecification structures.

Motivation and objectives

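Motivate and analyze the double descent phenomenon in a nontrivial nonparametric setting, random features regression.
Compute the exact asymptotics of the test error in the proportional regime, where N/d and n/d are held fixed.
Characterize how regularization and the signal-to-noise ratio affect generalization and the location of the interpolation threshold.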
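
To make the objectives above concrete, the setting can be written out as follows. This is only a sketch in the notation of the abstract ($\boldsymbol w_a$, $\sigma$, $N$, $n$, $d$). The coefficient name $\boldsymbol c$, the target name $f$, the $1/n$ loss normalization, and the unscaled penalty $\lambda\|\boldsymbol c\|_2^2$ are assumptions made here for illustration; the paper may normalize the features and the penalty differently.

```latex
% Ridge regression on N random features (normalizations are assumptions of this sketch)
\hat{\boldsymbol c}(\lambda)
  = \arg\min_{\boldsymbol c \in \mathbb R^{N}}
    \frac{1}{n} \sum_{i=1}^{n}
      \Big( y_i - \sum_{a=1}^{N} c_a \, \sigma(\boldsymbol w_a^{\mathsf T} \boldsymbol x_i) \Big)^{2}
    + \lambda \, \| \boldsymbol c \|_2^{2}

% Fitted predictor and its test error against the unknown target f
\hat f(\boldsymbol x) = \sum_{a=1}^{N} \hat c_a(\lambda) \, \sigma(\boldsymbol w_a^{\mathsf T} \boldsymbol x),
\qquad
R(\lambda) = \mathbb E_{\boldsymbol x} \big[ \big( f(\boldsymbol x) - \hat f(\boldsymbol x) \big)^{2} \big]

% Proportional regime: all three sizes diverge with fixed ratios
N, n, d \to \infty, \qquad N/d \ \text{and} \ n/d \ \text{fixed}
```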

Proposed method

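Model the learning problem as ridge regression on N random features with activation function sigma, trained on n samples drawn from the d-dimensional sphere.
Derive the precise asymptotics of the test error R_RF in the limit N, n, d -> ∞ with N/d -> psi1 and n/d -> psi2.
Express the prediction error as a function of psi1, psi2, lambda, and statistics of the data, via the Stieltjes transform of a block-structured random matrix.
Show that the random features model and a Gaussian covariates model are equivalent in the asymptotic limit, which provides intuition.
Provide simplifications for special cases, including the ridgeless limit (lambda -> 0) and the highly overparametrized regime.
Relate the results to the kernel viewpoint and discuss a self-induced regularization mechanism.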
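
The model described in the list above can be simulated at finite size. The following is a minimal sketch, not the authors' code: the function names `sample_sphere` and `rf_ridge`, the ReLU activation, the linear target, the sphere radius sqrt(d), the 1/sqrt(d) scaling inside the activation, and the normalization of the ridge penalty are all assumptions chosen for illustration.

```python
import numpy as np

def sample_sphere(rng, num, d):
    """Draw `num` points uniformly from the sphere of radius sqrt(d) in R^d."""
    g = rng.standard_normal((num, d))
    return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)

def rf_ridge(n=200, N=400, d=20, lam=1e-3, noise=0.1, seed=0):
    """Fit ridge regression on N random ReLU features; return (train_err, test_err).

    Assumptions of this sketch: linear target, ReLU activation,
    loss normalized by 1/n, penalty lam * ||c||^2.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)   # hypothetical linear target
    W = sample_sphere(rng, N, d)                 # random first-layer weights, never trained
    X_train = sample_sphere(rng, n, d)
    X_test = sample_sphere(rng, 2000, d)
    y_train = X_train @ beta + noise * rng.standard_normal(n)
    y_test = X_test @ beta                       # noiseless target for the test error

    def features(X):
        # sigma(<w_a, x> / sqrt(d)) with sigma = ReLU; output shape (num, N)
        return np.maximum(X @ W.T / np.sqrt(d), 0.0)

    Z_train, Z_test = features(X_train), features(X_test)
    # Closed-form ridge solution: (Z^T Z / n + lam I) c = Z^T y / n
    gram = Z_train.T @ Z_train / n + lam * np.eye(N)
    c = np.linalg.solve(gram, Z_train.T @ y_train / n)

    train_err = float(np.mean((Z_train @ c - y_train) ** 2))
    test_err = float(np.mean((Z_test @ c - y_test) ** 2))
    return train_err, test_err

if __name__ == "__main__":
    for lam in (1e-6, 1e-3, 1.0):
        tr, te = rf_ridge(lam=lam)
        print(f"lambda={lam:g}  train={tr:.4g}  test={te:.4g}")
```

Here N/d and n/d play the roles of psi1 and psi2. Only the second-layer coefficients are fitted, which is what makes the problem a linear ridge regression in the feature matrix.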

Experimental results

Research questions

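RQ1: What is the exact asymptotic prediction error of random features ridge regression in the proportional high-dimensional limit?
RQ2: How do model complexity (N/d and n/d) and regularization (lambda) interact to produce double descent in this nonparametric setting?
RQ3: Under what conditions does the random features model generalize optimally in the highly overparametrized regime?
RQ4: Can a Gaussian covariates surrogate reproduce the same asymptotic generalization behavior as random features?
RQ5: How do linear and nonlinear target functions affect the asymptotic behavior of the test error?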

Key findings

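The paper obtains the exact asymptotics of the test error in the proportional regime and captures all the features of the double descent phenomenon.
Above a critical signal-to-noise ratio, the minimum test error is achieved by extremely overparametrized interpolators whose training error is nearly zero.
Regularization can help or hurt depending on the SNR; the paper identifies a phase transition at a critical SNR, at which the optimal lambda shifts.
The ridgeless limit (lambda -> 0) often yields near-interpolators that are statistically optimal in the highly overparametrized regime.
The analysis shows that both the variance and the bias can peak at the interpolation threshold, and that double descent persists even in the noiseless setting.
The model shows that optimal generalization can occur without any specific misspecification assumption, and that strong overparametrization is beneficial under suitable conditions.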
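
The peak at the interpolation threshold and the second descent noted in the findings above can be observed in a small simulation. The sketch below is an illustration under assumptions, not a reproduction of the paper's curves: it uses the minimum-norm least-squares solution from `numpy.linalg.lstsq` as a stand-in for the ridgeless limit, and the names `sphere`, `ridgeless_test_error`, and `sweep`, the ReLU activation, the linear target, and the sizes n=200, d=20 are all hypothetical choices.

```python
import numpy as np

def sphere(rng, num, d):
    """Uniform samples on the sphere of radius sqrt(d) in R^d."""
    g = rng.standard_normal((num, d))
    return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)

def ridgeless_test_error(N, n=200, d=20, noise=0.5, seed=0):
    """Test error of the minimum-norm least-squares fit on N random ReLU features."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)    # hypothetical linear target
    W = sphere(rng, N, d)                         # random, untrained first layer
    X_tr, X_te = sphere(rng, n, d), sphere(rng, 2000, d)
    y_tr = X_tr @ beta + noise * rng.standard_normal(n)

    def feats(X):
        return np.maximum(X @ W.T / np.sqrt(d), 0.0)

    # lstsq returns the minimum-norm solution when the system is underdetermined,
    # which is used here as the lambda -> 0 limit of ridge regression.
    coef, *_ = np.linalg.lstsq(feats(X_tr), y_tr, rcond=None)
    return float(np.mean((feats(X_te) @ coef - X_te @ beta) ** 2))

def sweep(widths=(50, 200, 800), n=200):
    """Test error for several feature counts N at a fixed sample size n."""
    return {N: ridgeless_test_error(N, n=n) for N in widths}

if __name__ == "__main__":
    for N, err in sweep().items():
        print(f"N={N:4d}  N/n={N / 200:.2f}  test error={err:.4g}")
```

In this setup the error is expected to be largest near N = n, where the feature matrix is square and poorly conditioned, and to fall again as N grows past n. Adding a positive ridge penalty would be expected to damp the peak.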

This review was generated by AI and checked by a human editor.