QUICK REVIEW

[Paper Review] Reconciling modern machine learning practice and the bias-variance trade-off

Mikhail Belkin, Daniel Hsu | arXiv (Cornell University) | 2018-12-28

Machine Learning and Data Classification | References: 38 | Citations: 83

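One-Line Summary

The paper introduces the double descent risk curve to explain how increasing model capacity can reduce test risk even beyond the interpolation threshold, reconciling classical bias-variance theory with modern interpolating predictors across neural networks, random features, and ensemble methods.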

ABSTRACT

Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

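Motivation and Objectives

Address the apparent conflict between the bias-variance trade-off and modern interpolating models.
Present and explain the double descent risk curve as a unified framework for model capacity and generalization.
Demonstrate experimentally that double descent appears broadly across neural networks, random features, and ensemble methods.
Provide insight into the inductive biases and optimization dynamics that drive this behavior.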

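Proposed Method

Define the classical bias-variance framework and the interpolation threshold.
Introduce Random Fourier Features as a model class whose capacity can be controlled directly.
Train by empirical risk minimization (ERM) with squared loss while varying the capacity N (the number of features), comparing the regimes N < n and N ≥ n, where n is the number of training examples.
Show that the kernel / minimum-norm interpolant (H_infty) often generalizes better than the finite-N classes in the interpolating regime.
Extend the study to neural networks and ensemble methods (AdaBoost, Random Forest) and show that they exhibit similar double descent curves.
Provide the intuition that greater capacity makes it possible to find simpler, smaller-norm interpolants, which in turn generalize better.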
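The ERM setup and the minimum-norm choice described above can be written out as follows. This is a sketch in notation assumed here, not taken from the review: features are treated as real-valued, \Phi is the n-by-N training feature matrix with entries \phi(x_i; v_k), y is the label vector, and a is the coefficient vector.

```latex
\begin{align}
  h_a(x) &= \sum_{k=1}^{N} a_k \, \phi(x; v_k), \\
  % ERM with squared loss over the N-feature class
  \hat{a} &\in \arg\min_{a \in \mathbb{R}^{N}}
      \frac{1}{n} \sum_{i=1}^{n} \bigl( h_a(x_i) - y_i \bigr)^2 , \\
  % for N >= n and \Phi of full row rank, many a attain zero training risk;
  % select the one with the smallest Euclidean norm
  \hat{a}_{\text{min-norm}} &= \arg\min_{a \,:\, \Phi a = y} \lVert a \rVert_2
      = \Phi^{\top} \bigl( \Phi \Phi^{\top} \bigr)^{-1} y .
\end{align}
```

For N < n the minimizer is generally unique and does not interpolate. For N ≥ n the closed form in the last line is the standard pseudoinverse solution, assuming \Phi \Phi^{\top} is invertible.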

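Experimental Results

Research Questions

RQ1: Does a double descent risk curve appear when model capacity is increased beyond the interpolation threshold?
RQ2: Is double descent common across model classes such as neural networks, random features, and tree-based ensembles?
RQ3: Which inductive biases or norms (e.g., the minimum-norm solution) underlie better generalization past interpolation?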

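Key Findings

Double descent generalization curve: as capacity grows, test risk first follows the classical U-shaped curve, peaks near the interpolation threshold, and then decreases again beyond it.
Minimum-norm interpolants (or smoother averaged/interpolating solutions) tend to generalize better past interpolation, which explains the second descent.
In the Random Fourier Features experiments, test risk peaks at the interpolation threshold (N = n) and test performance improves for N > n.
Neural networks, including two-layer networks, show a qualitatively similar double descent pattern, although optimization dynamics affect how clearly it can be observed.
Ensemble methods such as AdaBoost and Random Forest also exhibit double descent when built from highly interpolating trees, with averaging yielding smoother predictors that generalize better.
The kernel limit (H_infty) often outperforms finite-N random feature models, which is consistent with the minimum-norm interpolation explanation across regimes.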
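The Random Fourier Features finding above can be explored with a small capacity sweep. The following is a minimal sketch under stated assumptions, not the authors' experimental code: it uses synthetic data, real-valued cosine features with random phases, and illustrative function names (`rff_features`, `fit_least_squares`, `sweep_capacity`) and sizes. The fit relies on `np.linalg.lstsq`, which returns the minimum 2-norm solution when there are more features than training points.

```python
import numpy as np


def rff_features(X, V, b):
    """Real-valued random Fourier features cos(<v_k, x> + b_k), one column per feature."""
    return np.cos(X @ V + b)


def fit_least_squares(Phi, y):
    """Least-squares coefficients; np.linalg.lstsq returns the minimum
    2-norm solution when the system is underdetermined (N > n)."""
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef


def sweep_capacity(n=40, d=5, widths=(5, 20, 40, 80, 400), noise=0.1, seed=0):
    """Return (N, train_mse, test_mse, coef_norm) for each feature count N."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)  # hypothetical ground-truth direction (assumption)
    X_train = rng.normal(size=(n, d))
    X_test = rng.normal(size=(500, d))
    y_train = np.sin(X_train @ w_true) + noise * rng.normal(size=n)
    y_test = np.sin(X_test @ w_true)

    rows = []
    for N in widths:
        # Fresh random frequencies and phases for each capacity level
        V = rng.normal(size=(d, N))
        b = rng.uniform(0.0, 2.0 * np.pi, size=N)
        Phi_train = rff_features(X_train, V, b)
        Phi_test = rff_features(X_test, V, b)
        coef = fit_least_squares(Phi_train, y_train)
        train_mse = float(np.mean((Phi_train @ coef - y_train) ** 2))
        test_mse = float(np.mean((Phi_test @ coef - y_test) ** 2))
        rows.append((N, train_mse, test_mse, float(np.linalg.norm(coef))))
    return rows


if __name__ == "__main__":
    for N, train_mse, test_mse, norm in sweep_capacity():
        print(f"N={N:4d}  train={train_mse:.2e}  test={test_mse:.2e}  ||a||={norm:.2f}")
```

Training error should drop to numerical zero once N is comfortably above n, since the model then interpolates. Whether a clear test-risk peak near N = n shows up in a single run depends on the noise level, seed, and feature distribution, so averaging over several seeds would give a more stable curve.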

This review was generated by AI and reviewed by a human editor.