QUICK REVIEW

[Paper Review] Exact expressions for double descent and implicit regularization via surrogate random design

Michał Dereziński, Feynman Liang|arXiv (Cornell University)|Dec 10, 2019

Stochastic Gradient Optimization Techniques|References: 61|Cited by: 33

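One-sentence summary

Under a surrogate determinantal random design, the paper derives exact non-asymptotic MSE expressions for the Moore-Penrose estimator in linear regression, revealing double descent and an implicit ridge-like regularization, and shows that the surrogate design is asymptotically consistent with the i.i.d. design.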

ABSTRACT

Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing a special determinantal point process which we call surrogate random design, to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution. In our analysis we introduce a new mathematical tool of independent interest: the class of random matrices for which determinant commutes with expectation.

Research motivation and objectives

Explain double descent in linear regression with exact, non-asymptotic expressions for the minimum norm estimator.
Introduce surrogate random design via a determinantal point process to enable tractable analysis.
Characterize the implicit regularization of the minimum norm solution and its relation to ridge regression.
Demonstrate asymptotic consistency of surrogate design with standard i.i.d. designs for Gaussian-like data.

Proposed method

Construct surrogate random design S_mu^n using a determinantal point process with background measure mu.
Derive exact non-asymptotic expressions for MSE[ X_bar^dagger y_bar ], the mean squared error of the Moore-Penrose estimator, under the surrogate design (Theorem 1).
Define and compute implicit regularization through the expected Moore-Penrose estimator, linking it to ridge-regularized LS (Theorem 2).
Introduce determinant preserving random matrices to justify determinant-expectation commutation (Section 4).
Prove asymptotic consistency of the surrogate design with i.i.d. designs for sub-Gaussian rows and bounded covariance (Theorem 3).
Provide auxiliary lemmas for expectations of traces and projections under surrogate design (Lemmas 2 and 3).
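
The objects named in the list above can be written out as follows. This is a sketch in our own notation, based on our reading of the abstract and Theorem 2; the linear model with target $w^*$, the symbol $\Sigma_\mu$ for the covariance of $\mu$, and the trace condition defining $\lambda_n$ are assumptions to be checked against the paper's exact statements.

```latex
% Minimum norm (Moore-Penrose) estimator on a surrogate sample (\bar{X}, \bar{y}):
\hat{w} = \bar{X}^{\dagger}\bar{y}

% Mean squared error, assuming a linear model y = x^\top w^* + \xi:
\mathrm{MSE}[\hat{w}] = \mathbb{E}\,\lVert \hat{w} - w^* \rVert^2

% Implicit regularization for n < d (our reading of Theorem 2):
\mathbb{E}\big[\bar{X}^{\dagger}\bar{y}\big]
  = \operatorname*{argmin}_{w}\;
    \mathbb{E}_{\mu}\big[(x^\top w - y)^2\big] + \lambda_n \lVert w \rVert^2,
\qquad
n = \operatorname{tr}\!\big(\Sigma_\mu(\Sigma_\mu + \lambda_n I)^{-1}\big)
```

Read this way, the sample size n plays the role of a ridge effective dimension: fewer samples correspond to a larger implicit penalty lambda_n.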

Experimental results

Research questions

RQ1: How does the minimum-norm Moore-Penrose estimator perform in under- and over-parameterized linear regression when samples are drawn from a surrogate, determinantal design?
RQ2: Can we obtain exact non-asymptotic MSE expressions for the surrogate design and interpret their implicit regularization in terms of ridge regression?
RQ3: Do surrogate-design results asymptotically match those of standard i.i.d. designs across common data distributions?
RQ4: What mathematical tools (like determinant-preserving matrices) enable tractable analysis of random design determinants?
RQ5: How does eigenvalue decay of the data covariance affect double descent and implicit regularization in this setting?
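
The double descent shape behind RQ1 can be reproduced empirically with a small simulation. The sketch below is not the paper's surrogate design: it assumes a standard i.i.d. Gaussian design with identity covariance, a hypothetical target `w_star`, and unit noise, and it estimates the MSE of the minimum norm estimator by Monte Carlo. All names and parameter values are illustrative.

```python
import numpy as np


def min_norm_mse(n, d, sigma=1.0, trials=200, seed=0):
    """Monte Carlo estimate of E||pinv(X) y - w_star||^2 for an i.i.d. Gaussian design."""
    rng = np.random.default_rng(seed)
    # Hypothetical unit-norm target vector (an assumption for illustration).
    w_star = np.ones(d) / np.sqrt(d)
    errors = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))                   # i.i.d. rows, covariance I
        y = X @ w_star + sigma * rng.standard_normal(n)   # linear model plus noise
        w_hat = np.linalg.pinv(X) @ y                     # minimum norm least squares solution
        errors.append(np.sum((w_hat - w_star) ** 2))
    return float(np.mean(errors))


d = 20
# Sweep the sample size across the interpolation threshold n = d.
curve = {n: min_norm_mse(n, d) for n in (5, 10, 19, 20, 21, 40, 80)}
for n, mse in curve.items():
    print(f"n={n:3d}  d={d}  empirical MSE={mse:.3f}")
```

The empirical MSE should spike near n = d and fall off on either side, which is the peak that the exact surrogate-design expressions are meant to describe.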

Key findings

Exact non-asymptotic MSE formulas are obtained for the Moore-Penrose estimator under surrogate design (Theorem 1).
In the over-parameterized regime, the expected unregularized minimum norm estimator under the surrogate design corresponds exactly to a ridge-regularized least squares solution on the population distribution (Theorem 2).
The effective dimension and a related lambda_n parameter govern the MSE, linking to ridge-like regularization without explicit regularization.
Surrogate design yields MSE expressions that closely match the empirical MSE of i.i.d. designs for Gaussian-like mu, and remain accurate under various covariance structures.
The surrogate design is asymptotically consistent with i.i.d. designs for sub-Gaussian rows and bounded covariance as n/d converges to a constant (Theorem 3).
The analysis relies on determinant-preserving random matrices and Poisson-linked constructions to compute expectations of determinants (Section 4, Lemmas 4–6).
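
To make the role of the effective dimension and lambda_n concrete, the sketch below solves the condition n = tr(Sigma (Sigma + lambda I)^{-1}) for lambda by bisection, given the eigenvalues of the population covariance. The condition is our reading of the paper's implicit regularization result and should be checked against Theorem 2; the function names are hypothetical, and strictly positive eigenvalues with 0 < n < d are assumed.

```python
import numpy as np


def effective_dimension(eigvals, lam):
    """tr(Sigma (Sigma + lam I)^{-1}) computed from the eigenvalues of Sigma."""
    return float(np.sum(eigvals / (eigvals + lam)))


def solve_lambda_n(eigvals, n, tol=1e-12, max_iter=200):
    """Find lam > 0 with effective_dimension(eigvals, lam) == n (assumes 0 < n < d)."""
    eigvals = np.asarray(eigvals, dtype=float)
    d = eigvals.size
    if np.any(eigvals <= 0) or not (0 < n < d):
        raise ValueError("need strictly positive eigenvalues and 0 < n < d")
    lo, hi = 0.0, 1.0
    # The effective dimension decreases in lam, so grow hi until it drops to n or below.
    while effective_dimension(eigvals, hi) > n:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if effective_dimension(eigvals, mid) > n:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)


# Isotropic covariance: n = d / (1 + lam), so lam = d / n - 1.
iso = np.ones(20)
lam_iso = solve_lambda_n(iso, n=5)
shrinkage_iso = iso / (iso + lam_iso)   # per-direction ridge shrinkage factors

# Decaying spectrum (illustrative): eigenvalues 1/k for k = 1..20.
decaying = 1.0 / np.arange(1, 21)
lam_decay = solve_lambda_n(decaying, n=5)
print(lam_iso, lam_decay)
```

In the isotropic case the condition reduces to lambda_n = d/n - 1, so every direction is shrunk by the factor n/d. With a decaying spectrum the same n yields a different lambda_n, and low-variance directions are shrunk more heavily than high-variance ones.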


This review was generated by AI and reviewed by human editors.