QUICK REVIEW

[Paper Review] Exact expressions for double descent and implicit regularization via surrogate random design

Michał Dereziński, Feynman Liang|arXiv (Cornell University)|2019. 12. 10.

Stochastic Gradient Optimization Techniques | References: 61 | Citations: 33

One-line summary

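The paper derives exact non-asymptotic MSE expressions for the Moore-Penrose estimator under a surrogate determinantal random design, revealing double descent and an implicit ridge-like regularization, and shows that the surrogate design is asymptotically consistent with the standard i.i.d. design.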

ABSTRACT

Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing a special determinantal point process which we call surrogate random design, to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution. In our analysis we introduce a new mathematical tool of independent interest: the class of random matrices for which determinant commutes with expectation.

Motivation and objectives

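Explain double descent in linear regression through exact, non-asymptotic expressions for the minimum norm least squares estimator.
Introduce the surrogate random design, a determinantal point process that makes the analysis tractable.
Characterize the implicit regularization of the minimum norm solution and its relationship to ridge regression.
Show that the surrogate design is asymptotically consistent with the standard i.i.d. design for Gaussian-like data.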

Proposed method

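Construct the surrogate random design S_mu^n as a determinantal point process with background measure mu.
Derive an exact non-asymptotic expression for MSE[ X_bar^dagger y_bar ] under the surrogate design (Theorem 1).
Define and compute implicit regularization through the expected Moore-Penrose estimator, and connect it to ridge-regularized least squares on the population (Theorem 2).
Introduce determinant preserving random matrices to justify exchanging the determinant and the expectation (Section 4).
Prove asymptotic consistency between the surrogate design and the i.i.d. design for sub-Gaussian rows with bounded covariance (Theorem 3).
Provide auxiliary lemmas on the expectations of trace and projection terms used in the surrogate-design calculations (Lemmas 2 and 3).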
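
The idea of a determinant that commutes with expectation can be illustrated with a standard special case. The Python sketch below checks that, for a square random matrix whose rows are independent, multilinearity of the determinant gives E[det(A)] = det(E[A]). The discrete row distributions and the helper names `expected_det` and `det_of_expectation` are made up for illustration. This is not the paper's construction, only a simple instance of the property it builds on.

```python
import itertools
import numpy as np

# Hypothetical example: each row of a 3x3 matrix is drawn independently
# and uniformly from its own small set of candidate rows.
row_choices = [
    [np.array([1.0, 2.0, 0.0]), np.array([0.0, 1.0, 3.0])],
    [np.array([2.0, -1.0, 1.0]), np.array([1.0, 1.0, 1.0]), np.array([0.0, 4.0, -2.0])],
    [np.array([3.0, 0.0, 1.0]), np.array([-1.0, 2.0, 2.0])],
]

def expected_det(row_choices):
    """Exact E[det(A)] by enumerating every combination of independent rows."""
    combos = list(itertools.product(*row_choices))
    # Rows are uniform and independent, so every combination is equally likely.
    return sum(np.linalg.det(np.vstack(rows)) for rows in combos) / len(combos)

def det_of_expectation(row_choices):
    """det(E[A]): average each row's candidates, then take one determinant."""
    mean_matrix = np.vstack([np.mean(choices, axis=0) for choices in row_choices])
    return np.linalg.det(mean_matrix)

lhs = expected_det(row_choices)
rhs = det_of_expectation(row_choices)
print(lhs, rhs)  # the two values agree up to floating-point error
```

The equality depends on the independence of the rows. With dependent rows the two quantities generally differ, which is why a dedicated class of matrices is needed for the analysis.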

Experimental results

Research questions

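RQ1 How does the minimum norm Moore-Penrose estimator perform in under-parameterized and over-parameterized linear regression when samples are drawn from the surrogate determinantal design?
RQ2 Can exact non-asymptotic MSE expressions be obtained for the surrogate design and interpreted as implicit regularization related to ridge regression?
RQ3 Do results for the surrogate design asymptotically match those for the standard i.i.d. design under general data distributions?
RQ4 How do mathematical tools such as determinant preserving matrices enable the analysis of determinants under random designs?
RQ5 How does the eigenvalue decay of the data covariance affect double descent and implicit regularization in this setting?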
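
The first question can be probed empirically with a small simulation. The Python sketch below uses a standard i.i.d. Gaussian design with identity covariance, not the paper's surrogate design, and measures error as the squared distance between the estimate and the true weights. The dimension `d`, the noise level, the trial count, and the function names are illustrative assumptions. A peak in the estimated MSE near n = d is the usual double descent signature.

```python
import numpy as np

def min_norm_estimator(X, y):
    """Moore-Penrose (minimum norm least squares) estimator X^+ y."""
    return np.linalg.pinv(X) @ y

def simulate_mse(n, d, w_star, noise_std, trials, rng):
    """Monte Carlo estimate of E||X^+ y - w*||^2 under i.i.d. N(0, I) rows."""
    errors = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise_std * rng.standard_normal(n)
        w_hat = min_norm_estimator(X, y)
        errors.append(np.sum((w_hat - w_star) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
d = 20                                   # illustrative dimension
w_star = np.ones(d) / np.sqrt(d)         # illustrative unit-norm target
mse_by_n = {n: simulate_mse(n, d, w_star, 0.5, 200, rng) for n in (5, 10, 20, 40, 80)}
for n, mse in mse_by_n.items():
    print(f"n={n:3d}  estimated MSE={mse:.3f}")
```

Because the error at n = d is heavy-tailed, the value printed for n = 20 varies strongly with the seed. Only its size relative to the neighbouring sample sizes is meaningful.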

Key results

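An exact non-asymptotic MSE formula for the Moore-Penrose estimator is derived under the surrogate design (Theorem 1).
Through implicit regularization, the under-determined (over-parameterized) estimator corresponds in expectation to the ridge-regularized least squares solution on the population (Theorem 2).
The parameter lambda_n, which is tied to the effective dimension, governs the MSE and links the estimator to ridge-like regularization without any explicit regularizer.
For Gaussian-like mu, the surrogate design yields MSE expressions that match the empirical MSE of the i.i.d. design, and the match holds across a range of covariance structures.
The surrogate design is asymptotically consistent with the i.i.d. design for sub-Gaussian rows with bounded covariance as n/d converges to a constant (Theorem 3).
The analysis relies on determinant preserving random matrices and a Poisson-based construction to compute expectations of determinants (Section 4, Lemmas 4–6).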
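
The role of lambda_n can be made concrete with a short numerical sketch. It assumes that, for n < d, the implicit ridge parameter is defined by matching the effective dimension to the sample size, tr(Sigma (Sigma + lambda_n I)^{-1}) = n. That relation is our reading of the paper and is not stated explicitly in this summary. The eigenvalue profiles and the function names `effective_dimension` and `solve_lambda_n` are hypothetical.

```python
import numpy as np

def effective_dimension(eigvals, lam):
    """tr(Sigma (Sigma + lam I)^{-1}) computed from the eigenvalues of Sigma."""
    return float(np.sum(eigvals / (eigvals + lam)))

def solve_lambda_n(eigvals, n, tol=1e-12, max_iter=200):
    """Bisection for lam > 0 with effective_dimension(eigvals, lam) == n.

    Assumes positive eigenvalues and 0 < n < len(eigvals), so a unique
    root exists because the effective dimension decreases from d to 0.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    if not (0 < n < len(eigvals)) or np.any(eigvals <= 0):
        raise ValueError("need positive eigenvalues and 0 < n < d")
    lo, hi = 0.0, 1.0
    while effective_dimension(eigvals, hi) > n:   # grow until the root is bracketed
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if effective_dimension(eigvals, mid) > n:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)

# Isotropic check: Sigma = I gives d / (1 + lam) = n, so lam = d / n - 1.
lam_iso = solve_lambda_n(np.ones(10), n=4)
# Hypothetical decaying spectrum, to see how eigenvalue decay changes lambda_n.
lam_decay = solve_lambda_n(1.0 / np.arange(1, 11), n=4)
print(lam_iso, lam_decay)
```

Under this assumed definition, fewer samples relative to the dimension imply a larger lambda_n, which is one way to read the claim that the unregularized estimator behaves like a ridge estimator.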

This review was generated by AI and reviewed by a human editor.