QUICK REVIEW

[Paper Review] Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d’Ascoli, Maria Refinetti | arXiv (Cornell University) | 2020-03-02

Stochastic Gradient Optimization Techniques | Citations: 57

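One-Line Summary

This paper analyzes double descent in the lazy learning regime using random features regression, derives a precise bias-variance decomposition of the test error, and shows how ensembling and overparametrization suppress the overfitting peak at the interpolation threshold.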

ABSTRACT

Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. 2019, we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.

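Research Motivation and Goals

Understand the mechanism behind double descent in the lazy regime of neural networks.
Separate the contributions of label noise, weight initialization, and dataset sampling to the variance of the test error.
Give precise asymptotic expressions for how ensembling affects these variances.
Compare the effects of overparametrization, ensembling, and regularization on generalization.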

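Proposed Method

Model the neural network as a random features (RF) model: the first-layer weights are random and fixed, and the second-layer weights are trained by ridge regression.
Derive a bias-variance decomposition of the test error into noise variance, initialization variance, sampling variance, and bias.
Use the replica method to compute explicit asymptotic expressions for these terms in the high-dimensional limit.
Analyze ensembling by averaging the outputs of K independently initialized estimators, and derive its effect on the test error.
Relate the RF results to kernel ridge regression in the P→∞ limit, where P is the number of random features, and compare them with realistic deep learning scenarios.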
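The random features setup described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the ReLU activation, the linear teacher with Gaussian inputs, the scaling conventions, and all names (`random_features`, `fit_ridge`, `train_and_eval`) and default sizes are assumptions chosen for the example.

```python
import numpy as np

def random_features(X, Theta):
    """Map inputs (N, D) to features (N, P) through a fixed random first layer."""
    D = X.shape[1]
    P = Theta.shape[0]
    # ReLU nonlinearity is an assumption; any pointwise activation could be used.
    return np.maximum(X @ Theta.T / np.sqrt(D), 0.0) / np.sqrt(P)

def fit_ridge(Z, y, lam):
    """Closed-form ridge regression for the second-layer weights."""
    P = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y)

def make_data(rng, n, beta, noise_std):
    """Hypothetical linear teacher with additive Gaussian label noise."""
    D = beta.shape[0]
    X = rng.standard_normal((n, D))
    y = X @ beta / np.sqrt(D) + noise_std * rng.standard_normal(n)
    return X, y

def train_and_eval(P, lam, seed=0, N=40, D=10, noise_std=0.1):
    """Train one RF model and return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(D)
    X_train, y_train = make_data(rng, N, beta, noise_std)
    X_test, y_test = make_data(rng, 1000, beta, 0.0)  # noiseless test labels
    Theta = rng.standard_normal((P, D))               # first layer: drawn once, never trained
    Z_train = random_features(X_train, Theta)
    a = fit_ridge(Z_train, y_train, lam)              # second layer: ridge regression
    train_mse = np.mean((Z_train @ a - y_train) ** 2)
    test_mse = np.mean((random_features(X_test, Theta) @ a - y_test) ** 2)
    return train_mse, test_mse
```

Sweeping `P` through values around `N` with a small `lam` is one way to look for the overfitting peak near the interpolation threshold in a toy setting like this one.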

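Experimental Results

Research Questions

RQ1: What are the distinct sources of variance and bias that contribute to the test error in lazy learning, and how do they behave near the interpolation threshold?
RQ2: How does ensembling affect the different variance components and the overall double descent curve?
RQ3: How do overparametrization, ensembling, and regularization compare in mitigating the overfitting peak?
RQ4: How well do the RF/kernel results carry over to realistic lazy-learning neural networks and data?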
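The sources of error named in RQ1 can be made concrete with a nested decomposition. The sketch below is one consistent way to write it, assuming a predictor \hat f that depends on the training inputs X, the label noise \varepsilon, and the random first-layer weights \Theta, and a noiseless target f. The notation and the order in which the averages are taken are assumptions for illustration, and the paper's exact definitions may differ.

```latex
\begin{align*}
\mathbb{E}_{x}\,\mathbb{E}_{X,\varepsilon,\Theta}\big[(f(x)-\hat f(x))^2\big]
  &= \mathrm{Bias} + \mathrm{Var}_{\mathrm{noise}}
   + \mathrm{Var}_{\mathrm{init}} + \mathrm{Var}_{\mathrm{samp}}, \\
\mathrm{Bias}
  &= \mathbb{E}_{x}\big[(f(x)-\mathbb{E}_{X,\varepsilon,\Theta}\hat f(x))^2\big], \\
\mathrm{Var}_{\mathrm{noise}}
  &= \mathbb{E}_{x,X,\Theta}\big[\mathbb{E}_{\varepsilon}\hat f(x)^2
   - (\mathbb{E}_{\varepsilon}\hat f(x))^2\big], \\
\mathrm{Var}_{\mathrm{init}}
  &= \mathbb{E}_{x,X}\big[\mathbb{E}_{\Theta}(\mathbb{E}_{\varepsilon}\hat f(x))^2
   - (\mathbb{E}_{\Theta,\varepsilon}\hat f(x))^2\big], \\
\mathrm{Var}_{\mathrm{samp}}
  &= \mathbb{E}_{x}\big[\mathbb{E}_{X}(\mathbb{E}_{\Theta,\varepsilon}\hat f(x))^2
   - (\mathbb{E}_{X,\Theta,\varepsilon}\hat f(x))^2\big].
\end{align*}
```

The three variance terms telescope to the total variance of \hat f, so the identity holds for any predictor; if the test labels are themselves noisy, an additional constant term appears on the left-hand side.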

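Key Findings

The test error decomposes into noise variance, initialization variance, sampling variance, and bias, with the Bayes error remaining as a residual term.
At the interpolation threshold, the noise and initialization variances diverge, while the sampling variance and bias show a kink followed by a plateau; both effects are smoothed out by regularization.
Beyond the interpolation threshold, the bias and sampling variance remain essentially constant, so the benefit of overparametrization comes from reducing the noise and initialization variances.
An ensemble of K independently initialized estimators reduces the divergence of the affected variance terms by a factor of 1/K, and as K→∞ the test error stays constant beyond the interpolation threshold.
Overparametrization and ensembling have qualitatively similar effects in suppressing the double descent peak, and their relative impact is quantified by analytical expressions.
Finite-size simulations validate the asymptotic predictions, and CNN/DNN experiments in the lazy regime agree qualitatively with the RF results.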
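The ensembling result can be explored numerically with a small sketch like the one below. It is not the authors' experiment: the ReLU features, the linear teacher, the default sizes, and the names `features` and `ensemble_errors` are all assumptions. Each ensemble member shares the same training set and label noise and differs only in its random first-layer weights.

```python
import numpy as np

def features(X, Theta):
    """ReLU random features (the activation choice is an assumption)."""
    D = X.shape[1]
    P = Theta.shape[0]
    return np.maximum(X @ Theta.T / np.sqrt(D), 0.0) / np.sqrt(P)

def ensemble_errors(K, P, lam=1e-3, seed=0, N=40, D=10, noise_std=0.1):
    """Return (per-member test MSEs, test MSE of the averaged predictor)."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(D)                    # hypothetical linear teacher
    X_train = rng.standard_normal((N, D))
    y_train = X_train @ beta / np.sqrt(D) + noise_std * rng.standard_normal(N)
    X_test = rng.standard_normal((1000, D))
    y_test = X_test @ beta / np.sqrt(D)              # noiseless test labels

    preds = []
    for _ in range(K):
        Theta = rng.standard_normal((P, D))          # fresh initialization per member
        Z = features(X_train, Theta)
        a = np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y_train)
        preds.append(features(X_test, Theta) @ a)
    preds = np.stack(preds)                          # shape (K, n_test)

    single_mse = np.mean((preds - y_test) ** 2, axis=1)
    ensemble_mse = np.mean((preds.mean(axis=0) - y_test) ** 2)
    return single_mse, ensemble_mse
```

Calling `ensemble_errors(K=10, P=40)` places the models at P = N in this toy setup. By convexity of the squared error, the ensemble's MSE can never exceed the average MSE of its members; how large the gap is near the threshold depends on the assumed sizes and regularization.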


This review was generated by AI and reviewed by a human editor.