QUICK REVIEW

[Paper Review] Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation

Partha P. Mitra|arXiv (Cornell University)|Jun 9, 2019

Sparse and Compressive Sensing Techniques | References: 24 | Citations: 35

One-line summary

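This paper introduces MiSpaR (Misparametrized Sparse Regression) and analytically derives the training and generalization error curves for $l_2$ and $l_1$ penalized interpolation. In the high-dimensional setting, the overfitting peak does not strictly separate a classical regime from a modern regime; instead, the risk curves show when each penalty is suited to generalization.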

ABSTRACT

Traditionally in regression one minimizes the number of fitting parameters or uses smoothing/regularization to trade training (TE) and generalization error (GE). Driving TE to zero by increasing fitting degrees of freedom (dof) is expected to increase GE. However modern big-data approaches, including deep nets, seem to over-parametrize and send TE to zero (data interpolation) without impacting GE. Overparametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention. Regression and classification algorithms have been shown that interpolate data but also generalize optimally. An interesting related phenomenon has been noted: the existence of non-monotonic risk curves, with a peak in GE with increasing dof. It was suggested that this peak separates a classical regime from a modern regime where over-parametrization improves performance. Similar over-fitting peaks were reported previously (statistical physics approach to learning) and attributed to increased fitting model flexibility. We introduce a generative and fitting model pair ("Misparametrized Sparse Regression" or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dof's to match the data generative model and thus provides good generalization. This complicates the interpretation of overfitting peaks as separating a "classical" from a "modern" regime. Data interpolation itself cannot guarantee good generalization: we need to study the interpolation with different penalty terms. We present analytical formulae for GE curves for MiSpaR with $l_2$ and $l_1$ penalties, in the interpolating limit $\lambda \rightarrow 0$. These risk curves exhibit important differences and help elucidate the underlying phenomena.

Motivation and objectives

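Introduce the Misparametrized Sparse Regression (MiSpaR) framework, which decouples the number of measurements, the number of generative model parameters, and the fitting degrees of freedom.
Derive analytical expressions for training and generalization error in the interpolating limit under $l_2$ and $l_1$ penalties.
Show how the overfitting peak relates to the interpolation point and to the point at which the fitting model can match the true generative model, and how sparsity and noise affect generalization.
Clarify, by comparing the ridge ($l_2$) and sparse ($l_1$) penalties, whether regularization still improves generalization as overparametrization grows.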

Proposed method

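Propose MiSpaR, a generative and fitting model pair in which the number of fitting parameters $p$ can differ from both the number of generative parameters $n$ and the number of measurements $m$.
Work in the high-dimensional limit $m,p,n\to\infty$ with the ratios $\mu=p/m$ and $\alpha=m/n$ held fixed, and obtain analytical TE and GE for $l_2$ regression.
Provide an analytical GE expression for the $l_1$ penalty in the interpolating limit, given as a system of three equations to be solved numerically.
Show how the effective noise changes with undersampling/oversampling ($\alpha$, $\mu$) and sparsity ($\rho$) under both penalties.
Use self-averaging arguments and random matrix theory (the Marchenko-Pastur distribution) to evaluate the sums needed for the GE/TE expressions.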
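
The setup above can be probed numerically. The following is a minimal sketch, not the paper's code: it assumes i.i.d. standard Gaussian features, a Bernoulli-Gaussian generative vector normalized to unit norm, fitting on the first $p$ feature columns (with irrelevant extra columns when $p>n$), and GE normalized by the variance of $y$. The function name `mispar_l2_ge` and all default values are hypothetical. The $\lambda\to 0$ ridge fit is computed as the least-squares / minimum-norm solution.

```python
import numpy as np

def mispar_l2_ge(n, m, p, rho=0.2, sigma=0.1, trials=30, seed=0):
    """Median normalized GE of the lambda -> 0 ridge fit with p fitting parameters.

    n: generative parameters, m: measurements, p: fitting parameters.
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    width = max(n, p)  # number of feature columns needed in total
    results = []
    for _ in range(trials):
        # Sparse generative vector: each entry is nonzero with probability rho.
        beta = rng.standard_normal(n) * (rng.random(n) < rho)
        if not beta.any():
            beta[0] = 1.0  # avoid an all-zero vector
        beta /= np.linalg.norm(beta)  # unit signal variance
        X = rng.standard_normal((m, width))
        y = X[:, :n] @ beta + sigma * rng.standard_normal(m)
        # lstsq gives ordinary least squares for p < m and the
        # minimum-norm interpolant for p > m (the lambda -> 0 ridge limit).
        w, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
        # With independent unit-variance features, the expected squared test
        # error is ||w - beta||^2 (zero-padded to a common length) + sigma^2.
        err = np.zeros(width)
        err[:p] += w
        err[:n] -= beta
        results.append((err @ err + sigma**2) / (1.0 + sigma**2))
    return float(np.median(results))

n, m = 20, 40  # alpha = m / n = 2
for p in (10, 20, 30, 40, 60, 400):
    ge = mispar_l2_ge(n, m, p)
    print(f"p={p:4d}  mu={p / m:5.2f}  mu*alpha={p / n:5.2f}  GE={ge:.3f}")
```

With $\alpha=2$, the sweep passes $\mu\alpha=1$ ($p=n$) before it reaches the interpolation point $\mu=1$ ($p=m$), so the two values of $p$ can be compared directly in the printed table.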

Experimental results

Research questions

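RQ1: How do misparametrization and sparsity affect training and generalization error when data are interpolated with an $l_2$ versus an $l_1$ penalty?
RQ2: Where does the overfitting peak appear relative to the data interpolation point ($\mu=1$) and the onset of good generalization (e.g. $\mu\alpha=1$)?
RQ3: How do the $l_2$ and $l_1$ penalties differ in generalization ability in the highly overparametrized setting, particularly under low noise and strong sparsity?
RQ4: What are the exact analytical forms of GE and TE for both penalties in the interpolating limit, and how do they depend on $\alpha$, $\mu$, and $\rho$?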

Key findings

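In the interpolating limit ($\lambda\to 0$), the overfitting peak occurs at $\mu=1$ for both penalties, whereas good generalization can begin at $\mu\alpha=1$ and need not coincide with the interpolation point.
For very large overparametrization, generalization is lost under either penalty ($GE(\mu\to\infty)=1$), but the sparse $l_1$ penalty can generalize well over a wide range of $\mu$ when the noise ($\sigma^2$) and the sparsity parameter ($\rho$) are small.
In the regime of high overparametrization, low noise, and strong sparsity, the performance gap between $l_1$ and $l_2$ is pronounced: $l_1$ can generalize while $l_2$ fails.
Finite regularization ($\lambda>0$) suppresses the overfitting peak, which suggests that interpolation alone does not guarantee good generalization.
The analytical GE for $l_1$ involves a system of three equations linking $\tau$, $\hat{\rho}$, and $\sigma_{\xi}$, and exhibits the algorithmic phase transition known from sparse regression.
The study shows that generalization depends strongly on inductive bias (the choice of penalty) and is not intrinsic to data interpolation itself.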
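
The contrast between the two penalties in the overparametrized, low-noise, strongly sparse regime can be illustrated with a small simulation. This is a hedged sketch under assumptions that are not taken from the paper: i.i.d. standard Gaussian features, exactly `k` nonzero generative coefficients, and FISTA (accelerated proximal gradient) with a small but finite $\lambda$ as a stand-in for the $l_1$ interpolating limit. The names `fista_l1` and `compare_penalties` and every default value are hypothetical.

```python
import numpy as np

def fista_l1(X, y, lam, iters=3000):
    """FISTA for 0.5 * ||X w - y||^2 + lam * ||w||_1 (small lam approximates interpolation)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    v, t = w.copy(), 1.0
    for _ in range(iters):
        z = v - step * (X.T @ (X @ v - y))  # gradient step at the extrapolated point
        w_new = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        v = w_new + ((t - 1.0) / t_new) * (w_new - w)  # momentum extrapolation
        w, t = w_new, t_new
    return w

def compare_penalties(n=100, m=40, p=200, k=5, sigma=0.01, lam=0.1, trials=5, seed=0):
    """Median normalized GE of l2 (min-norm) and l1 (small-lam) fits; assumes p >= n."""
    rng = np.random.default_rng(seed)
    ge_l2, ge_l1 = [], []
    for _ in range(trials):
        beta = np.zeros(p)  # coefficients beyond index n stay zero (irrelevant features)
        support = rng.choice(n, size=k, replace=False)
        beta[support] = rng.standard_normal(k)
        beta /= np.linalg.norm(beta)  # unit signal variance
        X = rng.standard_normal((m, p))
        y = X @ beta + sigma * rng.standard_normal(m)
        w2, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm interpolant
        w1 = fista_l1(X, y, lam)
        for w, out in ((w2, ge_l2), (w1, ge_l1)):
            err = w - beta
            # Expected squared test error for independent unit-variance features.
            out.append((err @ err + sigma**2) / (1.0 + sigma**2))
    return float(np.median(ge_l2)), float(np.median(ge_l1))

ge_l2, ge_l1 = compare_penalties()
print(f"mu = 5, alpha = 0.4: l2 GE = {ge_l2:.3f}, l1 GE = {ge_l1:.3f}")
```

The defaults place the problem at $\mu=5$, $\alpha=0.4$ with 5 nonzero coefficients out of 100; raising `k` or `sigma` is one way to explore where the $l_1$ advantage shrinks.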


This review was generated by AI and checked by a human editor.