QUICK REVIEW

[Paper Review] Scaling and renormalization in high-dimensional regression

Alexander Atanasov, Jacob A. Zavatone-Veth|arXiv (Cornell University)|2024. 05. 01.

Bayesian Methods and Mixture Models | Citations: 6

One-line summary

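This paper uses the S-transform of free probability to derive the training and generalization errors of high-dimensional ridge regression, and offers a unified renormalization view of scaling, double descent, and sources of variance in linear and random feature models.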

ABSTRACT

From benign overfitting in overparameterized models to rich power-law scalings in performance, simple ridge regression displays surprising behaviors sometimes thought to be limited to deep neural networks. This balance of phenomenological richness with analytical tractability makes ridge regression the model system of choice in high-dimensional machine learning. In this paper, we present a unifying perspective on recent results on ridge regression using the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. We highlight the fact that statistical fluctuations in empirical covariance matrices can be absorbed into a renormalization of the ridge parameter. This `deterministic equivalence' allows us to obtain analytic formulas for the training and generalization errors in a few lines of algebra by leveraging the properties of the $S$-transform of free probability. From these precise asymptotics, we can easily identify sources of power-law scaling in model performance. In all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. This allows us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

Motivation and objectives

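Introduce tools from random matrix theory and free probability (the S-transform) to analyze high-dimensional ridge regression.
Derive exact training and generalization errors across linear, kernel, and random feature models in the limit of large N and P.
Present a renormalization perspective in which the S-transform absorbs the noise in the empirical covariance into a renormalized ridge parameter.
Characterize scaling behavior, the bias-variance decomposition, and sources of variance in the overparameterized setting.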
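
As a worked illustration of the renormalization perspective, the relations below sketch how the ridge parameter is renormalized in the simplest case of linear ridge regression with Gaussian covariates. The notation is an assumption of this sketch and is not defined in the summary above: $N$ is the number of features, $P$ the number of samples, $\Sigma$ the population covariance, $\hat{\Sigma}$ the empirical covariance, $\lambda$ the bare ridge, $\kappa$ the renormalized ridge, $E_g$ the generalization error, and $\sigma_\epsilon^2$ the label-noise variance.

```latex
% Deterministic equivalence: the fluctuations of \hat{\Sigma} are absorbed into \kappa
\lambda \, (\hat{\Sigma} + \lambda)^{-1} \simeq \kappa \, (\Sigma + \kappa)^{-1},
\qquad
\kappa = \frac{\lambda}{1 - \frac{N}{P} \, \mathrm{df}_1(\kappa)},
\qquad
\mathrm{df}_1(\kappa) = \frac{1}{N} \operatorname{Tr}\!\left[ \Sigma \, (\Sigma + \kappa)^{-1} \right].

% Train-test relation: the ratio \kappa / \lambda sets the generalization gap
E_{\mathrm{train}} \simeq \left( \frac{\lambda}{\kappa} \right)^{2} \left( E_g + \sigma_\epsilon^2 \right).
```

In this reading, the ratio $\kappa / \lambda$ plays the role that the abstract attributes to the $S$-transform: it measures how strongly sampling noise inflates the effective regularization, and its square relates the training error to the out-of-sample error.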

Proposed method

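Model empirical covariance matrices as random (Wishart / structured Wishart) ensembles and study their spectral properties through resolvents and Stieltjes transforms.
Use the R- and S-transforms to obtain deterministic equivalents for averages over random data and features.
Apply diagrammatic free probability to derive subordination relations and convert multiplicative noise into a renormalized ridge parameter.
Derive exact training and generalization errors for linear and kernel ridge regression, including bias-variance decompositions.
Extend to random feature models with structured covariates and feature noise, obtaining new scaling relations and behaviors.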
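
The deterministic-equivalence step above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Gaussian covariates with a diagonal population covariance, and the helper name `renormalized_ridge`, the spectrum `sigma`, and the sizes `N`, `P`, `lam` are all hypothetical choices made for illustration. It solves the self-consistent equation for the renormalized ridge `kappa` by bisection and compares the normalized trace of `lam * inv(Sigma_hat + lam)` with that of `kappa * inv(Sigma + kappa)`.

```python
import numpy as np


def renormalized_ridge(sigma, lam, P, n_iter=200):
    """Solve kappa = lam + kappa * (1/P) * sum_i sigma_i / (sigma_i + kappa).

    sigma holds the eigenvalues of the population covariance (assumed known here).
    Bisection is used because the residual h changes sign exactly once on [lo, hi].
    """
    sigma = np.asarray(sigma, dtype=float)

    def h(kappa):
        return kappa - lam - kappa * np.sum(sigma / (sigma + kappa)) / P

    lo, hi = lam, lam + np.sum(sigma) / P  # h(lo) <= 0 <= h(hi)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
N, P, lam = 200, 400, 0.1            # features, samples, bare ridge (illustrative)
sigma = np.linspace(0.5, 2.0, N)     # assumed anisotropic spectrum of Sigma

# Gaussian data with covariance diag(sigma), and its empirical covariance
X = rng.standard_normal((P, N)) * np.sqrt(sigma)
Sigma_hat = X.T @ X / P

# Left side: normalized trace of lam * (Sigma_hat + lam)^{-1}
empirical = lam * np.trace(np.linalg.inv(Sigma_hat + lam * np.eye(N))) / N

# Right side: normalized trace of kappa * (Sigma + kappa)^{-1}
kappa = renormalized_ridge(sigma, lam, P)
predicted = np.mean(kappa / (sigma + kappa))

print(f"kappa = {kappa:.4f} (bare ridge lam = {lam})")
print(f"empirical trace = {empirical:.4f}, deterministic equivalent = {predicted:.4f}")
```

The two traces should agree up to finite-size fluctuations, and `kappa` should exceed `lam`, which is the sense in which sampling noise acts as extra regularization.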

Results

Research questions

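RQ1: How does the S-transform encode the effect of multiplicative noise on the empirical covariance in ridge regression?
RQ2: What are the exact training and generalization errors of linear and kernel ridge regression in high dimensions?
RQ3: How do scaling laws and double descent emerge from renormalization effects in the overparameterized and underparameterized regimes?
RQ4: What are the bias-variance decomposition and scaling behavior of random feature models with structured covariates or feature noise?
RQ5: How does anisotropic weight structure affect finite-width corrections and their exponents in the overparameterized setting?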

Key findings

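The S-transform renormalizes the ridge parameter and gives a simple route to the train-test gap across models.
The exact asymptotics for training and generalization errors reproduce known results and unify them through the lens of multiplicative noise.
New bias-variance decompositions are obtained for a broad class of random feature models with structured covariates.
A variance-dominated scaling regime is identified in which feature-induced variance limits performance in the overparameterized setting.
Anisotropic weight structure can limit performance and lead to nontrivial finite-width exponents in the overparameterized regime.
The framework unifies earlier models of neural scaling laws and explains double descent as a renormalization effect.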
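
The train-test gap finding can be illustrated with the classical generalized-cross-validation (GCV) estimator, which the abstract says the S-transform result parallels. The sketch below is an assumption-laden toy and not the paper's experiment: it uses isotropic Gaussian covariates, a random linear teacher `w_true`, and illustrative values for `N`, `P`, `lam`, and `noise_std`. It estimates the gap factor from the training data alone and checks that its square rescales the training error to roughly the held-out error.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, lam, noise_std = 200, 400, 0.1, 0.5   # illustrative sizes and ridge

# Linear teacher with unit-scale signal, plus label noise
w_true = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((P, N))
y = X @ w_true + noise_std * rng.standard_normal(P)

# Ridge estimator in the convention w_hat = (Sigma_hat + lam)^{-1} X^T y / P
Sigma_hat = X.T @ X / P
w_hat = np.linalg.solve(Sigma_hat + lam * np.eye(N), X.T @ y / P)
train_err = np.mean((y - X @ w_hat) ** 2)

# Gap factor estimated from the data: 1 / (1 - df / P), where
# df = Tr[Sigma_hat (Sigma_hat + lam)^{-1}] is the effective degrees of freedom
eigs = np.linalg.eigvalsh(Sigma_hat)
df_over_P = np.sum(eigs / (eigs + lam)) / P
gap_factor = 1.0 / (1.0 - df_over_P)
gcv_estimate = gap_factor ** 2 * train_err

# Out-of-sample error (including label noise) on fresh data
n_test = 5000
X_test = rng.standard_normal((n_test, N))
y_test = X_test @ w_true + noise_std * rng.standard_normal(n_test)
test_err = np.mean((y_test - X_test @ w_hat) ** 2)

print(f"train error   = {train_err:.4f}")
print(f"GCV estimate  = {gcv_estimate:.4f}")
print(f"test error    = {test_err:.4f}")
```

Here `gap_factor` is the data-side counterpart of the ratio between the renormalized and bare ridge, so the GCV estimate and the held-out error should be close for these sizes, while the raw training error is noticeably smaller.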


This review was generated by AI and reviewed by a human editor.