QUICK REVIEW

[Paper Review] Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d’Ascoli, Maria Refinetti | arXiv (Cornell University) | Mar 2, 2020

Stochastic Gradient Optimization Techniques | Citations: 57

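One-Sentence Summary

This paper analyzes double descent in the lazy learning regime using random features, derives a precise asymptotic bias-variance decomposition of the test error, and shows that ensembling and overparametrization suppress the overfitting peak at the interpolation threshold.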

ABSTRACT

Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. 2019, we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.

Motivation and Objectives

Understand the mechanisms driving double descent in the lazy regime of neural networks.
Disentangle the contributions to test error from noise, initialization, and sampling variances.
Provide a precise asymptotic formula for how ensembling affects these variances.
Compare effects of overparametrization, ensembling, and regularization on generalization.

Proposed Method

Model the neural network as a random features (RF) model: the first-layer weights are random and fixed, and only the second-layer weights are trained, by ridge regression.
Derive a bias-variance decomposition of the test error into noise, initialization, sampling, and bias terms.
Compute sharp asymptotic expressions for these terms in the high-dimensional limit using the replica method.
Analyze the effect of ensembling by averaging outputs of K independently initialized estimators and derive its impact on the test error.
Relate the RF results to kernel ridge regression in the limit of infinitely many random features (P→∞), and compare with empirical deep learning scenarios.
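
The model and the ensembling step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the linear teacher, the ReLU activation, the 1/sqrt(d) scaling, the ridge parameter, and all function names (make_data, random_features, fit_ridge, ensemble_test_error) are hypothetical choices made here for illustration. The paper's exact normalization and data model may differ.

```python
import numpy as np


def make_data(rng, n, d, beta, noise_std):
    """Toy linear teacher with additive Gaussian label noise (assumed target)."""
    X = rng.standard_normal((n, d))
    y = X @ beta / np.sqrt(d) + noise_std * rng.standard_normal(n)
    return X, y


def random_features(X, Theta):
    """ReLU random features of shape (n, P); Theta (P, d) is never trained."""
    return np.maximum(X @ Theta.T / np.sqrt(X.shape[1]), 0.0)


def fit_ridge(Z, y, lam):
    """Second-layer weights from the ridge normal equations (Z^T Z + lam I) a = Z^T y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)


def ensemble_test_error(rng, X, y, X_test, y_test, P, K, lam):
    """Mean squared test error of the averaged output of K independently initialized RF fits."""
    preds = np.zeros(X_test.shape[0])
    for _ in range(K):
        Theta = rng.standard_normal((P, X.shape[1]))  # fresh random first layer
        a = fit_ridge(random_features(X, Theta), y, lam)
        preds += random_features(X_test, Theta) @ a
    return float(np.mean((preds / K - y_test) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lam = 100, 20, 1e-3
    beta = rng.standard_normal(d)
    X, y = make_data(rng, n, d, beta, noise_std=0.5)
    # Test targets are generated without noise, so the error measures
    # the distance to the teacher function itself.
    X_test, y_test = make_data(rng, 2000, d, beta, noise_std=0.0)
    for P in (20, 100, 1000):  # fewer features than, as many as, and more than samples
        for K in (1, 10):
            err = ensemble_test_error(rng, X, y, X_test, y_test, P, K, lam)
            print(f"P={P:5d}  K={K:2d}  test error={err:.3f}")
```

Sweeping P across the number of training points n, for several values of K, is one way to look for the overfitting peak and its suppression by ensembling at finite size. A single run like this is noisy, so averaging over several seeds would be needed before comparing with the asymptotic predictions.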

Experimental Results

Research Questions

RQ1: What are the distinct sources of variance and bias contributing to the test error in lazy learning, and how do they behave around the interpolation threshold?
RQ2: How does ensembling influence the different variance components and the overall double-descent curve?
RQ3: How do overparametrization, ensembling, and regularization compare in mitigating the overfitting peak?
RQ4: To what extent do the RF/kernel results carry over to realistic lazy-learning neural networks and data?

Key Findings

The test error decomposes into a bias term and three variance terms (noise, initialization, and sampling), with the Bayes error as a residual term.
At the interpolation threshold, the noise and initialization variances diverge, while the sampling variance and the bias exhibit a kink followed by a plateau; regularization smooths both the divergence and the kink.
Beyond the interpolation threshold, bias and sampling variance remain essentially constant, and the benefits of overparametrization arise from reducing the noise and initialization variances.
Ensembling K independently initialized estimators reduces the divergence in the affected variance terms by a factor of 1/K, so that the test error remains constant beyond the interpolation threshold as K→∞.
Overparametrization and ensembling have similar qualitative effects in suppressing the double-descent peak, with analytic expressions quantifying their relative impacts.
Finite-size simulations validate the asymptotic predictions, and CNN/DNN experiments in lazy regimes show qualitative agreement with RF results.
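
The structure of such a decomposition, and the origin of a 1/K factor, can be sketched with the law of total variance. The notation below is an assumption made for illustration (X for the training inputs, \varepsilon for the label noise, \Theta for the random first-layer weights, all mutually independent, and f^* for the target function). The order of conditioning is one possible choice and is not claimed to match the paper's definitions term by term.

```latex
% Bias-variance split at a fixed test point x
\mathbb{E}_{X,\varepsilon,\Theta}\big[(\hat f(x) - f^*(x))^2\big]
  = \big(\mathbb{E}_{X,\varepsilon,\Theta}[\hat f(x)] - f^*(x)\big)^2
  + \mathrm{Var}_{X,\varepsilon,\Theta}\big(\hat f(x)\big)

% Sequential split of the variance (law of total variance applied twice)
\mathrm{Var}\big(\hat f\big)
  = \underbrace{\mathbb{E}_{X,\Theta}\big[\mathrm{Var}_{\varepsilon}(\hat f \mid X, \Theta)\big]}_{\text{noise}}
  + \underbrace{\mathbb{E}_{X}\big[\mathrm{Var}_{\Theta}\big(\mathbb{E}_{\varepsilon}[\hat f \mid X, \Theta] \,\big|\, X\big)\big]}_{\text{initialization}}
  + \underbrace{\mathrm{Var}_{X}\big(\mathbb{E}_{\varepsilon,\Theta}[\hat f \mid X]\big)}_{\text{sampling}}

% Ensemble of K estimators sharing (X, \varepsilon), with i.i.d. \Theta_1, \dots, \Theta_K
\bar f_K = \frac{1}{K} \sum_{k=1}^{K} \hat f_{\Theta_k},
\qquad
\mathrm{Var}_{\Theta}\big(\mathbb{E}_{\varepsilon}[\bar f_K] \,\big|\, X\big)
  = \frac{1}{K}\, \mathrm{Var}_{\Theta}\big(\mathbb{E}_{\varepsilon}[\hat f] \,\big|\, X\big)
```

Under these assumptions the bias and the sampling term are unchanged by ensembling, because averaging over i.i.d. initializations leaves \mathbb{E}_{\varepsilon,\Theta}[\hat f \mid X] untouched. The noise term is not covered by this elementary argument, since the K estimators share the same label noise; its behavior under ensembling requires the paper's asymptotic analysis.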

This review was generated by AI and checked by a human editor.