QUICK REVIEW

[Paper Review] Reconciling modern machine learning practice and the bias-variance trade-off

Mikhail Belkin, Daniel Hsu|arXiv (Cornell University)|Dec 28, 2018

Machine Learning and Data Classification | References: 38 | Citations: 83

One-sentence summary

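The paper introduces the double descent risk curve, explains how increasing model capacity beyond the interpolation threshold can lower test risk, and reconciles classical bias-variance theory with modern interpolating predictors across neural networks, random features, and ensemble methods.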

ABSTRACT

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

Motivation and objectives

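Motivate the study by the apparent mismatch between the bias-variance trade-off and modern interpolating models.
Propose and explain the double descent risk curve as a unified framework relating model capacity and generalization.
Show experimentally that double descent appears across neural networks, random features, and ensemble methods.
Provide insight into the inductive biases and optimization dynamics that give rise to this behavior.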

Proposed method

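Define the classical bias-variance framework and the interpolation threshold.
Introduce Random Fourier Features as a model class whose capacity can be controlled for study.
Train models by ERM with squared loss and compare across capacities N, covering both N < n and N ≥ n, where n is the number of training samples.
Show that the kernel / minimum-norm interpolator (H_∞) often generalizes better than the finite-N classes beyond interpolation.
Extend the observations to neural networks and ensemble methods (AdaBoost, Random Forests), which show similar double descent curves.
Offer the intuition that larger capacity makes it possible to find simpler, smaller-norm interpolating solutions that generalize more readily.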
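
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' experimental code: the synthetic data, the real-valued cosine form of the random Fourier features, the list of widths, and all function names (`make_data`, `rff_features`, `fit_min_norm`, `risk_curve`) are hypothetical choices made for this example. It relies on the documented behavior of `np.linalg.lstsq`, which returns the smallest 2-norm solution when several minimizers exist.

```python
import numpy as np


def make_data(n, d, rng, noise=0.1):
    """Synthetic regression data (an assumption; not the datasets used in the paper)."""
    X = rng.standard_normal((n, d))
    w = np.ones(d) / np.sqrt(d)
    y = np.sin(X @ w) + noise * rng.standard_normal(n)
    return X, y


def rff_features(X, V, b):
    """Real-valued random Fourier features: sqrt(2/N) * cos(X V + b)."""
    N = V.shape[1]
    return np.sqrt(2.0 / N) * np.cos(X @ V + b)


def fit_min_norm(Phi, y):
    """Squared-loss ERM over the coefficients.

    For N < n this is ordinary least squares. For N >= n (underdetermined),
    lstsq returns the minimum 2-norm solution among all minimizers.
    """
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a


def risk_curve(n=40, d=10, n_test=500,
               widths=(5, 10, 20, 40, 80, 160, 640), seed=0):
    """Return (N, train MSE, test MSE, coefficient norm) for each width N."""
    rng = np.random.default_rng(seed)
    X, y = make_data(n, d, rng)
    X_test, y_test = make_data(n_test, d, rng)
    rows = []
    for N in widths:
        V = rng.standard_normal((d, N))          # random frequencies
        b = rng.uniform(0.0, 2.0 * np.pi, N)     # random phases
        Phi = rff_features(X, V, b)
        a = fit_min_norm(Phi, y)
        train_mse = float(np.mean((Phi @ a - y) ** 2))
        test_mse = float(np.mean((rff_features(X_test, V, b) @ a - y_test) ** 2))
        rows.append((N, train_mse, test_mse, float(np.linalg.norm(a))))
    return rows


if __name__ == "__main__":
    for N, train_mse, test_mse, norm in risk_curve():
        print(f"N={N:4d}  train={train_mse:.2e}  test={test_mse:.3f}  ||a||={norm:.2f}")
```

Printing the rows lets one compare training error, test error, and coefficient norm below, at, and above N = n. Whether a pronounced peak shows up at N = n depends on the data, noise level, and seed, so the sketch should be read as a way to probe the curve rather than as a reproduction of the reported results.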

Experimental results

Research questions

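RQ1: Does a double descent risk curve appear once the interpolation threshold is crossed?
RQ2: Is double descent universal across model classes such as neural networks, random features, and tree-based ensembles?
RQ3: Which inductive biases or norms (e.g., minimum-norm solutions) underlie the better generalization seen beyond interpolation?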

Main findings

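Double descent generalization curve: test risk peaks around the interpolation threshold and then decreases again as capacity grows beyond it.
Minimum-norm interpolators (or smoothed, averaged interpolating solutions) generalize well beyond interpolation, which explains the second descent.
In the Random Fourier Features experiments, a peak appears at the interpolation threshold (N = n), and test performance improves for N > n.
Neural networks (including two-layer nets and multi-layer architectures) show qualitatively similar double descent patterns, and optimization dynamics affect how clearly the pattern can be observed.
Ensemble methods such as AdaBoost and Random Forests also exhibit double descent when built from highly interpolating decision trees, with averaging contributing to smoother generalization.
The kernel limit (H_∞) provides a benchmark that often outperforms finite-N random feature models, underscoring consistency with minimum-norm interpolation across settings.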
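
For reference, the minimum-norm interpolator mentioned in the findings can be written in closed form for a linear-in-features model. The notation here (the feature matrix Φ and the coefficient vector â) is introduced for illustration and is an assumption of this note, not taken from the review.

```latex
% \Phi \in \mathbb{R}^{n \times N}: feature matrix, one row per training point
% y \in \mathbb{R}^{n}: training labels; assumes N \ge n and \operatorname{rank}\Phi = n
\hat{a} \;=\; \arg\min_{a \in \mathbb{R}^{N}} \|a\|_2
\quad \text{subject to} \quad \Phi a = y ,
\qquad\Longrightarrow\qquad
\hat{a} \;=\; \Phi^{\top}\bigl(\Phi\Phi^{\top}\bigr)^{-1} y \;=\; \Phi^{+} y .
```

If features are added to a fixed set without rescaling, any earlier interpolant stays feasible (with zeros on the new coordinates), so the minimum norm can only stay the same or shrink. That is one way to read the intuition that larger capacity admits smaller-norm interpolating solutions.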

This review was generated by AI and checked by a human editor.