QUICK REVIEW

[Paper Review] Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d’Ascoli, Maria Refinetti|arXiv (Cornell University)|Mar 2, 2020

Stochastic Gradient Optimization Techniques | References: 52 | Cited by: 53

One-line summary

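This paper builds a quantitative theory of double descent in the lazy learning regime using random features regression. It decomposes the test error into a bias term and several distinct sources of variance, and shows how ensembling suppresses the overfitting peak at the interpolation threshold.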

ABSTRACT

Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. 2019, we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.

Motivation and objectives

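Understand how the bias and the different sources of variance contribute to double descent in the lazy regime.
Provide an exact asymptotic decomposition of the test error for the random features model.
Separate the variance contributions coming from dataset sampling, label noise, and weight initialization.
Quantify the effect of ensembling on the bias and variance terms and on the double descent curve.
Relate the analytical findings to neural networks trained in the lazy regime and check them experimentally.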
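
The separation of variance sources listed above can be written with the law of total variance. The following is a sketch, not the paper's exact statement: the symbols X (training set), Θ (first-layer weights), ε (label noise) and f (noiseless target) are notation assumed here, the three sources are assumed independent, and the order in which they are conditioned on is one natural choice that may differ from the authors' convention.

```latex
\begin{align}
\underbrace{\mathbb{E}_{x}\,\mathbb{E}_{X,\Theta,\varepsilon}\big[(\hat f(x) - f(x))^2\big]}_{\text{test error}}
  &= \text{Bias} + \mathrm{Var}_{\text{noise}} + \mathrm{Var}_{\text{init}} + \mathrm{Var}_{\text{samp}}, \\
\text{Bias} &= \mathbb{E}_{x}\Big[\big(\mathbb{E}_{X,\Theta,\varepsilon}[\hat f(x)] - f(x)\big)^2\Big], \\
\mathrm{Var}_{\text{noise}} &= \mathbb{E}_{x}\,\mathbb{E}_{X,\Theta}\Big[\mathrm{Var}_{\varepsilon}\big(\hat f(x)\,\big|\,X,\Theta\big)\Big], \\
\mathrm{Var}_{\text{init}} &= \mathbb{E}_{x}\,\mathbb{E}_{X}\Big[\mathrm{Var}_{\Theta}\big(\mathbb{E}_{\varepsilon}[\hat f(x)\,|\,X,\Theta]\,\big|\,X\big)\Big], \\
\mathrm{Var}_{\text{samp}} &= \mathbb{E}_{x}\Big[\mathrm{Var}_{X}\big(\mathbb{E}_{\Theta,\varepsilon}[\hat f(x)\,|\,X]\big)\Big].
\end{align}
```

The three variance terms sum to the total variance of the estimator at a test point x, so together with the bias they account for the whole test error measured against the noiseless target.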

Proposed method

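The random features (RF) model serves as a proxy for a neural network in the lazy regime: the first-layer weights are random and kept fixed, and only the second layer is trained, by ridge regression. The RF output is f̂(x) = sum_i a_i σ(<θ_i, x>/√D), where the θ_i are drawn i.i.d. and σ is the activation function. In the high-dimensional limit (P features, N training samples and input dimension D all → ∞ with fixed ratios), the authors derive sharp asymptotic expressions for the bias-variance decomposition of the test error. The test error is split into the noise, initialization and sampling variances plus the bias, and is expressed in terms of the quantities Ψ1, Ψ2^v, Ψ3^v, Ψ2^e, Ψ3^e, Ψ2^d. The derivation uses the replica method from statistical physics: the problem is mapped to a Gaussian covariate model, and a mean-field analysis is carried out in terms of order parameters (the overlaps Q^αβ). The authors also analyze the ensemble obtained by averaging the outputs of K independently initialized RF estimators and derive its test error as a function of K.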
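
The RF setup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the tanh activation, the Gaussian inputs, the linear teacher `beta`, and the plain `lam * I` ridge penalty (the paper may scale the regularizer and the output differently) are all choices made here for illustration.

```python
import numpy as np

def rf_features(X, Theta, activation=np.tanh):
    """Map inputs X of shape (N, D) to random features of shape (N, P).

    Theta has shape (P, D) and stays fixed: it plays the role of the
    untrained first layer. Features are sigma(<theta_i, x> / sqrt(D)).
    """
    D = X.shape[1]
    return activation(X @ Theta.T / np.sqrt(D))

def fit_rf_ridge(X, y, Theta, lam=1e-3, activation=np.tanh):
    """Train only the second-layer weights a by ridge regression."""
    Z = rf_features(X, Theta, activation)            # (N, P)
    P = Z.shape[1]
    # Normal equations of ridge: (Z^T Z + lam * I) a = Z^T y
    return np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y)

def predict_rf(X, Theta, a, activation=np.tanh):
    """Evaluate f_hat(x) = sum_i a_i * sigma(<theta_i, x> / sqrt(D))."""
    return rf_features(X, Theta, activation) @ a

# Toy data (assumed for illustration): Gaussian inputs, noisy linear teacher.
rng = np.random.default_rng(0)
D, N, P = 20, 100, 50
beta = rng.standard_normal(D)
X_train = rng.standard_normal((N, D))
y_train = X_train @ beta / np.sqrt(D) + 0.1 * rng.standard_normal(N)

Theta = rng.standard_normal((P, D))                  # one random initialization
a = fit_rf_ridge(X_train, y_train, Theta, lam=1e-3)

X_test = rng.standard_normal((1000, D))
y_test = X_test @ beta / np.sqrt(D)                  # noiseless target
train_error = np.mean((predict_rf(X_train, Theta, a) - y_train) ** 2)
test_error = np.mean((predict_rf(X_test, Theta, a) - y_test) ** 2)
```

Sweeping P at fixed N and plotting `test_error` against P/N is the kind of experiment in which a peak near P = N would be expected, although this toy script makes no claim about reproducing the paper's curves.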

Experimental results

Research questions

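RQ1: What are the separate contributions to the test error (noise, initialization and sampling variances, and bias) in the lazy RF model, and how does each behave at the interpolation threshold?
RQ2: How does overparametrization (P/N) affect the bias and the variance terms in the lazy regime?
RQ3: Can an ensemble of independently initialized estimators suppress the double descent peak, and how does ensembling compare with increasing model size?
RQ4: How do overparametrization, ensembling and regularization relate to one another in controlling the test error?
RQ5: Do the analytical predictions extend qualitatively to realistic neural networks trained in the lazy regime?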

Main findings

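At the interpolation threshold, the noise and initialization variances peak, whereas the bias and the sampling variance show a kink followed by a plateau.
In the overparametrized region, the bias and the sampling variance stay essentially constant, so the decrease in test error comes from the shrinking noise and initialization variances.
Ensembling K independently initialized estimators suppresses the divergence at the interpolation threshold, and as K grows the test error approaches the ensemble (kernel) limit.
Overparametrization and ensembling are beneficial for ultimately the same reason: both spread the noise over a larger number of random features, which reduces the harmful variance.
Optimal regularization can complement ensembling, but in the regimes explored an infinitely ensembled system always outperforms an optimally regularized single model.
Numerical experiments on classic deep learning setups in the lazy regime, with CNN-like and fully connected networks, qualitatively support the theory.
The analysis gives intuition for the role that the variance from weight initialization plays in double descent, and connects the results to kernel methods in the infinite-feature limit.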
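
The ensembling finding can be illustrated with a small NumPy sketch that averages the predictions of K RF ridge estimators trained on the same data with independent first-layer weights. Everything specific here is an assumption for illustration: the helper name `rf_ridge_predict`, the tanh activation, the toy linear teacher, and the choice P = N with a small ridge `lam` to sit near the interpolation threshold.

```python
import numpy as np

def rf_ridge_predict(X_train, y_train, X_test, Theta, lam):
    """Fit second-layer ridge weights for fixed Theta and predict on X_test."""
    D = X_train.shape[1]
    Z_train = np.tanh(X_train @ Theta.T / np.sqrt(D))   # (N, P)
    Z_test = np.tanh(X_test @ Theta.T / np.sqrt(D))     # (N_test, P)
    P = Theta.shape[0]
    a = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(P),
                        Z_train.T @ y_train)
    return Z_test @ a

rng = np.random.default_rng(0)
D, N, P, K, lam = 20, 100, 100, 10, 1e-4   # P = N: near the interpolation threshold

# Toy data (assumed): Gaussian inputs, noisy linear teacher.
beta = rng.standard_normal(D)
X_train = rng.standard_normal((N, D))
y_train = X_train @ beta / np.sqrt(D) + 0.3 * rng.standard_normal(N)
X_test = rng.standard_normal((500, D))
y_test = X_test @ beta / np.sqrt(D)

# K estimators: same training set, independently drawn first-layer weights.
preds = np.stack([
    rf_ridge_predict(X_train, y_train, X_test, rng.standard_normal((P, D)), lam)
    for _ in range(K)
])                                                       # (K, N_test)

single_errors = np.mean((preds - y_test) ** 2, axis=1)   # one test MSE per estimator
ensemble_error = np.mean((preds.mean(axis=0) - y_test) ** 2)

print("mean single-model test error:", single_errors.mean())
print("ensemble test error (K=%d):" % K, ensemble_error)
```

By convexity of the squared error, the ensemble error can never exceed the average single-model error. How large the gap is depends on how much of the error comes from initialization variance, which is the quantity the paper tracks as a function of K.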

This review was generated by AI and checked by a human editor.