QUICK REVIEW

[Paper Review] Scaling and renormalization in high-dimensional regression

Alexander Atanasov, Jacob A. Zavatone-Veth|arXiv (Cornell University)|May 1, 2024

Bayesian Methods and Mixture Models|Cited by 6

ONE-SENTENCE SUMMARY

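This paper uses the S-transform from free probability to derive the training and generalization errors of high-dimensional ridge regression, offering a unified renormalization perspective that explains scaling laws, double descent, and the sources of variance in linear and random feature models.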

ABSTRACT

From benign overfitting in overparameterized models to rich power-law scalings in performance, simple ridge regression displays surprising behaviors sometimes thought to be limited to deep neural networks. This balance of phenomenological richness with analytical tractability makes ridge regression the model system of choice in high-dimensional machine learning. In this paper, we present a unifying perspective on recent results on ridge regression using the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. We highlight the fact that statistical fluctuations in empirical covariance matrices can be absorbed into a renormalization of the ridge parameter. This `deterministic equivalence' allows us to obtain analytic formulas for the training and generalization errors in a few lines of algebra by leveraging the properties of the $S$-transform of free probability. From these precise asymptotics, we can easily identify sources of power-law scaling in model performance. In all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. This allows us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
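As an illustration of the renormalization described in the abstract, the standard deterministic-equivalence formulas for linear ridge regression are collected below. This is a sketch in commonly used notation, which may differ from the paper's own conventions: N samples, population covariance Σ, empirical covariance Σ̂ = XᵀX/N, ridge λ (added as λI), and targets y = w̄·x + ε with noise variance σ². The relations hold asymptotically as N and P grow at a fixed ratio, and E_g here includes the test-label noise.

```latex
\begin{aligned}
&\lambda(\hat\Sigma+\lambda)^{-1} \simeq \kappa(\Sigma+\kappa)^{-1},
\qquad \kappa = \lambda + \frac{\kappa}{N}\,\mathrm{df}_1(\kappa),
\qquad \mathrm{df}_1(\kappa) = \operatorname{tr}\!\big[\Sigma(\Sigma+\kappa)^{-1}\big],\\
&E_g = \frac{\kappa^2\,\bar w^\top\Sigma(\Sigma+\kappa)^{-2}\bar w + \sigma^2}{1-\gamma},
\qquad \gamma = \frac{1}{N}\operatorname{tr}\!\big[\Sigma^2(\Sigma+\kappa)^{-2}\big],
\qquad E_{\mathrm{tr}} = \Big(\frac{\lambda}{\kappa}\Big)^2 E_g .
\end{aligned}
```

Since κ ≥ λ, the training error underestimates E_g by the factor (λ/κ)², which is the train-test gap. Estimating κ/λ from data as 1/(1 − tr[Σ̂(Σ̂+λ)⁻¹]/N) turns E_tr into the classical generalized-cross-validation estimate. The abstract ties this gap to the S-transform; the precise mapping between κ/λ and the S-transform depends on the paper's conventions.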

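MOTIVATION AND GOALS

Bring random matrix theory and free-probability tools (the S-transform) to bear on high-dimensional ridge regression.
Derive exact training and generalization errors for linear, kernel, and random feature models in the limit of large N and P.
Offer a renormalization perspective in which, via the S-transform, fluctuations of the empirical covariance are absorbed into the ridge parameter.
Characterize scaling laws, bias-variance decompositions, and the sources of variance in the overparameterized regime.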

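PROPOSED METHOD

Model empirical covariance matrices as random (Wishart or structured Wishart) ensembles and study their spectra through the resolvent and the Stieltjes transform.
Use the R-transform and S-transform to obtain deterministic equivalents for averages over random data and features.
Apply (graphical) free-probability techniques to derive subordination relations that convert multiplicative noise into a renormalized ridge parameter.
Derive exact training and generalization errors for linear and kernel ridge regression, including bias-variance decompositions.
Extend the analysis to random feature models with structured covariates and feature noise, obtaining new scaling relations and regimes.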
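The deterministic-equivalence step in this list can be checked numerically. The sketch below assumes Gaussian covariates with a diagonal covariance Σ whose eigenvalues follow a hypothetical power law. It solves the standard self-consistent equation κ = λ + (κ/N) tr[Σ(Σ+κ)⁻¹] for the renormalized ridge κ by bisection, then compares λ tr[(Σ̂+λ)⁻¹]/P on a sampled dataset with κ tr[(Σ+κ)⁻¹]/P. The helper names `df1` and `renormalized_ridge`, the sizes, and the spectrum are illustrative choices, not the paper's code.

```python
import numpy as np


def df1(eigs, kappa):
    """Degrees of freedom df_1(kappa) = tr[Sigma (Sigma + kappa)^{-1}] for diagonal Sigma."""
    return np.sum(eigs / (eigs + kappa))


def renormalized_ridge(eigs, n_samples, lam, n_iter=200):
    """Solve kappa = lam + (kappa / N) * df_1(kappa) for kappa > 0 by bisection.

    h(kappa) = kappa - lam - kappa * df_1(kappa) / N is negative at 0 and
    non-negative at lam + tr(Sigma) / N, so the root is bracketed there.
    """
    lo, hi = 0.0, lam + np.sum(eigs) / n_samples
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid - lam - mid * df1(eigs, mid) / n_samples < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
P, N, lam = 400, 200, 1e-2                       # hypothetical sizes (P > N) and ridge
eigs = np.arange(1, P + 1, dtype=float) ** -1.5  # hypothetical power-law spectrum of Sigma

# Gaussian covariates with covariance diag(eigs); empirical covariance Sigma_hat = X^T X / N
X = rng.standard_normal((N, P)) * np.sqrt(eigs)
evals_hat = np.linalg.eigvalsh(X.T @ X / N)

kappa = renormalized_ridge(eigs, N, lam)

# Deterministic equivalent in trace form:
# lam * tr[(Sigma_hat + lam)^{-1}] / P  ~  kappa * tr[(Sigma + kappa)^{-1}] / P
lhs = lam * np.mean(1.0 / (evals_hat + lam))
rhs = kappa * np.mean(1.0 / (eigs + kappa))
print(f"kappa = {kappa:.4g}, empirical = {lhs:.4f}, deterministic = {rhs:.4f}")
```

Bisection is used instead of a plain fixed-point iteration because h(κ) is convex with h(0) < 0, so it has a single positive root that stays bracketed even when P > N and the sample covariance is rank-deficient.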

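RESULTS

RESEARCH QUESTIONS

RQ1: How does the S-transform encode the effect of multiplicative noise on the empirical covariance in ridge regression?
RQ2: What are the exact training and generalization errors of linear and kernel ridge regression in high dimensions?
RQ3: How do renormalization effects give rise to scaling laws and double descent in the overparameterized and underparameterized regimes?
RQ4: What bias-variance decompositions and scaling regimes arise for random feature models with structured covariates or feature noise?
RQ5: How does anisotropic weight structure affect finite-width corrections and scaling exponents in the overparameterized regime?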

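KEY FINDINGS

The S-transform gives a simple route to renormalizing the ridge parameter and to deriving the train-test gap across models.
Exact asymptotics for the training and generalization errors reproduce known results and unify them through the lens of multiplicative noise.
A new fine-grained bias-variance decomposition is obtained for a broad class of random feature models with structured covariates.
A variance-dominated scaling regime is identified, in which variance due to the random features limits performance in the overparameterized setting.
Anisotropic weight structure can produce nontrivial exponents for finite-width corrections and alter scaling in the overparameterized regime.
The framework unifies earlier models of neural scaling laws and interprets double descent as a renormalization effect.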
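To illustrate the train-test gap and the GCV analogue mentioned in these findings, the sketch below simulates linear ridge regression once and compares the training error, the exact test error, the GCV estimate E_tr/(1 − df̂₁/N)², and the standard deterministic-equivalent prediction E_g = [κ² w̄ᵀΣ(Σ+κ)⁻²w̄ + σ²]/(1 − γ) with γ = tr[Σ²(Σ+κ)⁻²]/N. The Gaussian data model, power-law spectrum, all-ones target `w_bar`, and noise level are hypothetical choices. This is not the paper's code, and it covers only the linear case, not random feature models.

```python
import numpy as np


def renormalized_ridge(eigs, n, lam, n_iter=200):
    """Bisection for kappa = lam + (kappa / n) * sum_i eigs_i / (eigs_i + kappa)."""
    lo, hi = 0.0, lam + eigs.sum() / n
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid - lam - mid * np.sum(eigs / (eigs + mid)) / n < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(1)
P, N, lam, sigma2 = 600, 300, 1e-2, 0.1          # hypothetical sizes, ridge, noise variance
eigs = np.arange(1, P + 1, dtype=float) ** -1.5  # hypothetical power-law spectrum of Sigma
w_bar = np.ones(P)                               # hypothetical target weights (eigenbasis)

# Training set y = X w_bar + noise, with x ~ N(0, diag(eigs))
X = rng.standard_normal((N, P)) * np.sqrt(eigs)
y = X @ w_bar + np.sqrt(sigma2) * rng.standard_normal(N)
Sigma_hat = X.T @ X / N

# Ridge estimator w_hat = (Sigma_hat + lam I)^{-1} X^T y / N
w_hat = np.linalg.solve(Sigma_hat + lam * np.eye(P), X.T @ y / N)

e_train = np.mean((y - X @ w_hat) ** 2)
e_test = np.sum(eigs * (w_hat - w_bar) ** 2) + sigma2  # exact population risk

# GCV: rescale the training error by (1 - df1_hat / N)^{-2}
evals_hat = np.linalg.eigvalsh(Sigma_hat)
df1_hat = np.sum(evals_hat / (evals_hat + lam))
e_gcv = e_train / (1.0 - df1_hat / N) ** 2

# Deterministic-equivalent prediction from the renormalized ridge kappa
kappa = renormalized_ridge(eigs, N, lam)
gamma = np.sum(eigs**2 / (eigs + kappa) ** 2) / N
bias = kappa**2 * np.sum(eigs * w_bar**2 / (eigs + kappa) ** 2)
e_theory = (bias + sigma2) / (1.0 - gamma)

print(f"train={e_train:.4f}  test={e_test:.4f}  GCV={e_gcv:.4f}  theory={e_theory:.4f}")
print(f"(lam/kappa)^2 * theory = {(lam / kappa) ** 2 * e_theory:.4f}")
```

For a single draw, the three test-error estimates should agree up to finite-size fluctuations, and the training error should sit close to (λ/κ)² times the predicted test error.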

This review was generated by AI and checked by human editors.