QUICK REVIEW

[Paper Review] Statistical Mechanics of Generalization in Kernel Regression

Abdülkadir Canatar, Blake Bordelon | arXiv (Cornell University) | Jun 23, 2020

Gaussian Processes and Bayesian Inference | References: 20 | Cited by: 6

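One-Sentence Summary

Using statistical mechanics, this paper derives an analytical expression for the generalization error of kernel regression that applies to any kernel. For rotation-invariant kernels it reveals multiple learning stages in high-dimensional data, each corresponding to learning the degenerate spectral modes associated with one kernel eigenvalue. In each stage the learning curve is governed by an effective regularizer and an effective noise: generalization is optimal when the effective regularization equals the effective noise variance, and each stage can exhibit double descent.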

ABSTRACT

Generalization beyond a training dataset is a main goal of machine learning. We investigate generalization error in kernel regression using statistical mechanics and derive an analytical expression for it applicable to any kernel. Focusing on the broad class of rotation invariant kernels, which is relevant to training deep neural networks in the infinite-width limit, we show several phenomena. When data is drawn from a spherically symmetric distribution and the number of input dimensions, $D$, is large, we find that multiple learning stages exist, one for each scaling of the number of training samples with $\mathcal{O}_D(D^K)$ with $K\in \mathbb{Z}^+$. In each stage $\mathcal{O}_D(D^K)$ degenerate spectral modes corresponding to the $K$-th kernel eigenvalue are learned. The mathematical analysis of a learning stage reduces to that of a solvable model with the dimensionality of the feature space extensive in the number of samples and a white kernel spectrum, including linear regression as a special case. The behavior of the learning curve in each stage is governed by an effective regularizer and an effective target noise that are related to the tail of the kernel and the target function spectra. When effective regularization is zero, we identify a first order phase transition that corresponds to a divergence in the generalization error. Each learning stage can exhibit sample-wise double-descent, where learning curves show non-monotonic sample size dependence. For each stage an optimal value of effective regularizer exists, equal to the effective noise variance, that gives minimum generalization error.
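The O_D(D^K) scaling of each stage comes from eigenvalue degeneracy. For data uniform on the sphere S^{D-1} (one spherically symmetric case), the eigenfunctions of a rotation-invariant kernel are spherical harmonics, and the degree-k eigenvalue is shared by all N(D, k) harmonics of that degree, with N(D, k) ≈ D^k / k! for large D. The Python sketch below uses the standard dimension formula for spherical harmonics, which this review does not state; the function names are illustrative.

```python
from math import comb, factorial


def harmonic_degeneracy(D: int, k: int) -> int:
    """Number of independent degree-k spherical harmonics on the sphere in R^D.

    Standard formula N(D, k) = C(D+k-1, k) - C(D+k-3, k-2), where the second
    term is absent for k < 2. Assumes D >= 2 and k >= 0.
    """
    if k < 2:
        return comb(D + k - 1, k)
    return comb(D + k - 1, k) - comb(D + k - 3, k - 2)


def leading_order(D: int, k: int) -> float:
    """Large-D approximation of the degeneracy: D^k / k!."""
    return D**k / factorial(k)


if __name__ == "__main__":
    # The ratio tends to 1 as D grows, so degree k carries about D^k / k! modes.
    for D in (10, 100, 1000):
        ratios = [harmonic_degeneracy(D, k) / leading_order(D, k) for k in (1, 2, 3)]
        print(D, [round(r, 3) for r in ratios])
```

Because all N(D, K) modes of degree K share one eigenvalue, they are learned together, which is why the K-th stage needs a number of samples on the order of D^K.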

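Research Motivation and Objectives

Understand the generalization error of kernel regression through statistical mechanics, with a focus on high-dimensional settings relevant to deep learning.
Analyze how learning curves and generalization depend on the spectral structure of rotation-invariant kernels.
Identify distinct learning stages in high-dimensional input spaces, each corresponding to a particular scaling of the number of training samples with the input dimension D.
Characterize the effective regularizer and effective noise that govern the learning curve in each stage.
Determine the optimal regularization that minimizes the generalization error, and identify phase transitions.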

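Proposed Method

The analysis uses statistical mechanics to model kernel regression in the high-dimensional limit, focusing on rotation-invariant kernels.
Learning stages are identified by how the number of training samples scales, as O_D(D^K) with K a positive integer.
Each stage reduces to a solvable model whose feature-space dimension is extensive in the number of samples and whose kernel spectrum is white; linear regression is a special case.
The effective regularizer and effective noise are determined by the tails of the kernel spectrum and of the target-function spectrum.
When the effective regularization vanishes, a first-order phase transition appears, at which the generalization error diverges.
The analysis shows that each stage can exhibit sample-wise double descent, i.e., a non-monotonic dependence of the generalization error on the number of training samples.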
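The white-spectrum solvable model can be made concrete with a short calculation. The sketch below assumes one particular normalization that this review does not specify: N features each carrying kernel eigenvalue 1/N, a target with unit total power, label-noise variance sigma2, ridge parameter lam, and alpha = P/N. Under these assumptions the self-consistent equation kappa = lam + kappa / (alpha + kappa) has a closed-form non-negative root, and the generalization error becomes (kappa^2 + sigma2 * alpha) / ((alpha + kappa)^2 - alpha). This is our reduction of the statistical-mechanics result to the white case, not code from the paper; it recovers the classical ridgeless linear-regression limits (1 - alpha) + sigma2 * alpha / (1 - alpha) for alpha < 1 and sigma2 / (alpha - 1) for alpha > 1. The function name is hypothetical.

```python
import numpy as np


def white_model_error(alpha: float, lam: float, sigma2: float) -> float:
    """Asymptotic test error of the white-spectrum model (illustrative normalization).

    Assumed setup: N features, each with kernel eigenvalue 1/N; target with unit
    total power; label-noise variance sigma2; alpha = P / N; ridge lam >= 0.
    """
    b = 1.0 + lam - alpha
    # kappa = lam + kappa / (alpha + kappa)  <=>  kappa^2 - b * kappa - lam * alpha = 0;
    # take the non-negative root.
    kappa = 0.5 * (b + np.sqrt(b * b + 4.0 * lam * alpha))
    # Equals (alpha + kappa)^2 * (1 - gamma) with gamma = alpha / (alpha + kappa)^2.
    denom = (alpha + kappa) ** 2 - alpha
    if denom <= 0.0:
        # Reached at lam = 0, alpha = 1: the divergence at the phase transition.
        return float("inf")
    return float((kappa**2 + sigma2 * alpha) / denom)


if __name__ == "__main__":
    sigma2 = 0.25
    alphas = (0.5, 0.9, 1.0, 1.1, 2.0, 4.0)
    for lam in (0.0, 1e-2, sigma2):
        curve = [white_model_error(a, lam, sigma2) for a in alphas]
        print(f"lam={lam}:", [round(e, 3) for e in curve])
```

With lam = 0 the error diverges at alpha = 1; a small lam turns the divergence into a finite peak (sample-wise double descent), and with lam = sigma2 the error decreases steadily across the sampled alpha values.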

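Experimental Results

Research Questions

RQ1: How does the generalization error of kernel regression behave when the input dimension D is large and the data are spherically symmetric?
RQ2: What are the distinct learning stages in high-dimensional kernel regression, and how do they depend on how the number of training samples scales with D?
RQ3: How do the effective regularizer and the effective noise govern the learning curve in each stage?
RQ4: Under what conditions does a first-order phase transition occur, and how does it affect the generalization error?
RQ5: In each learning stage, what value of the effective regularizer minimizes the generalization error?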

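Key Findings

Multiple learning stages emerge, one for each scaling of the training-set size as O_D(D^K) with K ∈ Z^+; in stage K, the O_D(D^K) degenerate spectral modes of the K-th kernel eigenvalue are learned.
Each learning stage reduces to a solvable model with an extensive feature space and a white kernel spectrum, which includes linear regression as a special case.
The learning curve in each stage is governed by an effective regularizer and an effective noise, both determined by the tails of the kernel and target-function spectra.
When the effective regularization vanishes, a first-order phase transition occurs and the generalization error diverges.
Each stage can exhibit sample-wise double descent: as the number of samples grows, the generalization error can rise to a peak before decreasing again.
In each stage, the effective regularizer that minimizes the generalization error equals that stage's effective noise variance.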
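The optimality of regularization equal to the noise variance can be checked numerically in the simplest linear special case. If the target weights are drawn from N(0, I) and features have covariance I/N (a Gaussian stand-in for the white spectrum, assumed here rather than taken from the paper), the ridge estimator with lam = sigma2 is the Bayesian posterior mean, so it should give the lowest average test error. The Monte Carlo sketch below compares a few ridge values; the function name, sizes, and trial count are hypothetical choices.

```python
import numpy as np


def ridge_test_error(n_features: int, n_samples: int, lam: float, sigma2: float,
                     n_trials: int = 200, seed: int = 0) -> float:
    """Average noise-free test error of ridge regression in a white-spectrum linear model."""
    # A fixed seed reuses the same random draws for every lam (common random numbers).
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        # Gaussian features with covariance I / N: each kernel eigenvalue equals 1/N.
        psi = rng.normal(0.0, 1.0 / np.sqrt(n_features), size=(n_samples, n_features))
        # Target weights from N(0, I), so the target power is about 1.
        w_star = rng.normal(size=n_features)
        y = psi @ w_star + rng.normal(0.0, np.sqrt(sigma2), size=n_samples)
        # Ridge solution w = (Psi^T Psi + lam I)^{-1} Psi^T y (same predictor as kernel ridge).
        w_hat = np.linalg.solve(psi.T @ psi + lam * np.eye(n_features), psi.T @ y)
        # E_x[(psi(x) . (w_hat - w_star))^2] = |w_hat - w_star|^2 / N.
        errors.append(np.sum((w_hat - w_star) ** 2) / n_features)
    return float(np.mean(errors))


if __name__ == "__main__":
    sigma2 = 0.5
    # alpha = P / N = 1, where a small ridge is most harmful.
    errs = {lam: ridge_test_error(50, 50, lam, sigma2) for lam in (0.05, sigma2, 5.0)}
    print(errs)
```

Too little regularization leaves the noise-driven peak near alpha = 1, too much shrinks the signal, and lam = sigma2 balances the two; this is a finite-size illustration of the linear special case only, not of the general effective quantities.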


This review was generated by AI and reviewed by human editors.