QUICK REVIEW

[Paper Review] Universality of empirical risk minimization

Andrea Montanari, Basil N. Saeed|arXiv (Cornell University)|Feb 17, 2022

Machine Learning in Materials Science · Cited by 23

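One-Sentence Summary

This paper proves a universality result. In high-dimensional ERM over a small number of projection (parameter) vectors, the training error, and under additional regularity conditions the test error of near-minimizers, depend on the feature distribution only through its mean and covariance, so they match a Gaussian equivalent model. The results apply to random features and neural tangent models.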
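To make the "Gaussian equivalent model" concrete, here is a minimal sketch in plain Python (standard library only). It takes feature samples, estimates their mean and covariance, and draws Gaussian vectors with those same two moments through a Cholesky factor. Using plug-in empirical moments (the theory refers to the population covariance), the diagonal jitter, and all function names (`empirical_mean_cov`, `cholesky`, `gaussian_equivalent`) are illustrative assumptions, not code from the paper.

```python
import math
import random


def empirical_mean_cov(samples):
    """Empirical mean and covariance (1/n normalization) of a list of feature vectors."""
    n, p = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for x in samples:
        d = [x[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / n
    return mean, cov


def cholesky(a, jitter=1e-10):
    """Lower-triangular L with L L^T ~= a; a tiny diagonal jitter guards against rank deficiency."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(a[i][i] + jitter - s, 0.0))
            elif low[j][j] > 0.0:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_equivalent(samples, m, rng):
    """Draw m vectors g = mu + L z, z ~ N(0, I), with mu and L L^T from the samples.

    Illustrative assumption: plug-in empirical moments stand in for the population ones.
    """
    mean, cov = empirical_mean_cov(samples)
    low = cholesky(cov)
    p = len(mean)
    draws = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        draws.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return draws
```

For example, `gaussian_equivalent(features, m, random.Random(0))` returns `m` Gaussian vectors whose empirical covariance should approach that of `features` as both sample sizes grow.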

ABSTRACT

Consider supervised learning from i.i.d. samples $\{\boldsymbol{x}_i, y_i\}_{i\le n}$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions that are parameterized by $\mathsf{k} = O(1)$ vectors $\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_{\mathsf k} \in \mathbb{R}^p$, and prove universality results both for the training and test error. Namely, under the proportional asymptotics $n,p\to\infty$, with $n/p = \Theta(1)$, we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors $\boldsymbol{x}_i$ are replaced by Gaussian vectors $\boldsymbol{g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors $\boldsymbol{x}_i$ with independent entries. Our results do not make any of these assumptions. Our assumptions are general enough to include feature vectors $\boldsymbol{x}_i$ that are produced by randomized featurization maps. In particular we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).
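For reference, the objects in the abstract can be written schematically as below. The notation (risk $\widehat R_n$, loss $\ell$, regularizer $r$, constraint set $\mathcal C_p$, Gaussian-model labels $\boldsymbol y^{G}$) and the bounded-Lipschitz test-function form of the universality statement are assumptions made for illustration; consult the paper for the exact definitions and hypotheses.

```latex
% Schematic only: notation and the test-function form are assumed, not quoted from the paper.
\begin{aligned}
&\Theta = (\boldsymbol\theta_1,\dots,\boldsymbol\theta_{\mathsf k}) \in \mathbb{R}^{p\times\mathsf k},
  \qquad \mathsf k = O(1),\\
&\widehat R_n(\Theta; X, \boldsymbol y) = \frac{1}{n}\sum_{i=1}^{n}
  \ell\big(\Theta^{\top}\boldsymbol x_i;\, y_i\big) + r(\Theta),\\
&\widehat R_n^{\star}(X,\boldsymbol y) = \min_{\Theta\in\mathcal C_p} \widehat R_n(\Theta; X, \boldsymbol y),
  \qquad n,p\to\infty,\ n/p = \Theta(1),\\
&\Big|\,\mathbb E\,\psi\big(\widehat R_n^{\star}(X,\boldsymbol y)\big)
  - \mathbb E\,\psi\big(\widehat R_n^{\star}(G,\boldsymbol y^{G})\big)\Big| \to 0
  \quad \text{for bounded Lipschitz } \psi,
\end{aligned}
```

Here the rows $\boldsymbol g_i$ of $G$ are Gaussian with the same covariance as the $\boldsymbol x_i$, so the training error of the original problem can be read off from the Gaussian one to leading order.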

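Research Motivation and Goals

Advance the theoretical understanding of empirical risk minimization (ERM) in high dimensions, for features produced by featurization maps and a small number k of parameter vectors.
Establish universality of the training error under proportional asymptotics, and universality of the test error under regularity conditions.
Build a proof framework that reduces the analysis of non-Gaussian features to an equivalent Gaussian problem.
Demonstrate applicability to random features models and the neural tangent regime.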

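Proposed Method

Formalize ERM with the feature matrix X and its Gaussian counterpart G in the proportional limit n, p → ∞ with n/p = Θ(1).
Define the universal limits under Gaussian replacement for the training error min_Θ R̂_n(Θ; X, y) and the test error R_n(Θ).
Introduce Assumptions 1–5 (loss function and labels, constraint set, distribution parameters, regularization, pointwise normality).
Prove universality of the training error by interpolating continuously between X and G along a sin/cos mixing path, combined with polynomial approximation techniques.
Prove universality of the test error over near-empirical risk minimizers (ERM_t), giving in Theorems 2 and 3 the conditions under which R_n^x and R_n^g agree asymptotically.
Apply the results to two featurization maps: random features and neural tangent models.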
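The sin/cos path in the method can be illustrated with a small sketch. Assuming X and G are independent with the same mean-zero covariance Σ, the path $U_t = \cos(t)X + \sin(t)G$ has covariance $\cos^2 t\,\Sigma + \sin^2 t\,\Sigma = \Sigma$ for every $t$, and $U_t$ is uncorrelated with its derivative $dU_t/dt$. The code checks both facts numerically on a toy X with i.i.d. Rademacher entries, which is not the dependent-feature setting of the paper. Which endpoint carries cos versus sin and the helper names are assumptions, and the actual proof step (controlling how the minimized risk changes along the path) is not reproduced.

```python
import math
import random


def interpolate(x_mat, g_mat, t):
    """Entrywise path U_t = cos(t) X + sin(t) G, so U_0 = X and U_{pi/2} = G (up to rounding)."""
    c, s = math.cos(t), math.sin(t)
    return [[c * a + s * b for a, b in zip(rx, rg)] for rx, rg in zip(x_mat, g_mat)]


def path_derivative(x_mat, g_mat, t):
    """Entrywise derivative dU_t/dt = -sin(t) X + cos(t) G."""
    c, s = math.cos(t), math.sin(t)
    return [[-s * a + c * b for a, b in zip(rx, rg)] for rx, rg in zip(x_mat, g_mat)]


def column_mean_product(u_mat, v_mat, j):
    """Empirical average of u_ij * v_ij over the rows i, for column j."""
    return sum(ru[j] * rv[j] for ru, rv in zip(u_mat, v_mat)) / len(u_mat)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 20000, 2
    # Toy non-Gaussian X: i.i.d. Rademacher entries (variance 1); G: independent N(0, 1) entries.
    X = [[rng.choice([-1.0, 1.0]) for _ in range(p)] for _ in range(n)]
    G = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for t in (0.0, math.pi / 6, math.pi / 4, math.pi / 2):
        U = interpolate(X, G, t)
        dU = path_derivative(X, G, t)
        # Second moment should stay near 1; the U * dU/dt average should stay near 0.
        print(round(t, 3), round(column_mean_product(U, U, 0), 3),
              round(column_mean_product(U, dU, 0), 3))
```

Keeping the covariance fixed along the whole path is what lets an interpolation argument compare the X and G problems one small step at a time.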
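The near-minimizer set behind the test-error result can be sketched over a finite candidate list: ERM_t keeps every parameter whose training risk is within t of the best value found, and the quantity of interest is the smallest test risk inside that set. The scalar grid, the two quadratic risks, and all function names are hypothetical stand-ins chosen so the effect of t is easy to see; this is not the paper's construction of the constraint set or of the risks.

```python
def near_minimizers(candidates, train_risk, t):
    """ERM_t over a finite candidate set: every theta with train_risk(theta) <= min + t."""
    risks = [train_risk(theta) for theta in candidates]
    best = min(risks)
    return [theta for theta, r in zip(candidates, risks) if r <= best + t]


def min_test_over_near_erm(candidates, train_risk, test_risk, t):
    """Smallest test risk attained by any t-near empirical risk minimizer."""
    return min(test_risk(theta) for theta in near_minimizers(candidates, train_risk, t))


def toy_train_risk(theta):
    """Hypothetical training risk, minimized at theta = 1.0."""
    return (theta - 1.0) ** 2


def toy_test_risk(theta):
    """Hypothetical test risk, minimized at theta = 1.2."""
    return (theta - 1.2) ** 2


if __name__ == "__main__":
    grid = [i / 10 for i in range(31)]  # scalar theta in [0, 3]: toy stand-in for the constraint set
    for t in (0.0, 0.05):
        print(t, min_test_over_near_erm(grid, toy_train_risk, toy_test_risk, t))
```

With t = 0 only the grid minimizer θ = 1.0 survives and the test risk is about 0.04; with t = 0.05 the set also admits θ = 1.2, and the minimum test risk drops to 0. The abstract's test-error statement concerns this minimum over near-minimizers, not the test error of one particular minimizer.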

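Results

Research Questions

RQ1: Does the ERM training error with multiple projections in high dimensions exhibit universality with respect to the feature distribution?
RQ2: Under what conditions does the test error over near-empirical risk minimizers also exhibit universality?
RQ3: Does universality extend to non-Gaussian features and to feature maps with dependent entries, such as random features and neural tangent representations?
RQ4: What practical conditions on the loss, regularizer, and data distribution ensure universality, and how can they be verified for common models?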

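Key Findings

Under Assumptions 1–5, the training error is universal: its asymptotic value matches that of the Gaussian equivalent model.
Under additional regularity conditions, the test error over near-minimizers is universal, so actual performance can be predicted by analyzing the Gaussian model.
Universality holds for random features maps and neural tangent models, without requiring strong convexity or independent feature entries.
Continuous interpolation and polynomial approximation are the key technical tools for handling non-convex ERM and deriving universality.
The results extend earlier universality work to settings that are neither strongly convex nor Gaussian, without relying on independent feature entries.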
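The two featurization maps named in the findings can be sketched as follows, assuming weights $\boldsymbol w_j$ drawn uniformly on the unit sphere and a tanh activation σ. The random features map outputs $x_j = \sigma(\langle \boldsymbol w_j, \boldsymbol z\rangle)$, so $p = N$. The neural tangent map stacks the blocks $\sigma'(\langle \boldsymbol w_j, \boldsymbol z\rangle)\,\boldsymbol z$, so $p = Nd$; these blocks are the gradients of $\sum_j \sigma(\langle \boldsymbol w_j, \boldsymbol z\rangle)$ with respect to the first-layer weights, i.e. a first-order Taylor expansion. The weight distribution, activation, absence of normalization factors, and function names are assumptions, not the paper's exact definitions.

```python
import math
import random


def unit_rows(n_rows, d, rng):
    """n_rows weight vectors drawn uniformly on the unit sphere in R^d (assumed distribution)."""
    rows = []
    for _ in range(n_rows):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        rows.append([v / norm for v in w])
    return rows


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_features(z, weights, act=math.tanh):
    """Random features map: x_j = act(<w_j, z>) for j = 1..N, so p = N."""
    return [act(dot(w, z)) for w in weights]


def tanh_prime(u):
    """Derivative of tanh."""
    return 1.0 - math.tanh(u) ** 2


def neural_tangent_features(z, weights, act_prime=tanh_prime):
    """Neural tangent map: concatenate act'(<w_j, z>) * z over j, so p = N * d.

    These are the gradients of sum_j act(<w_j, z>) with respect to the first-layer weights.
    """
    feats = []
    for w in weights:
        scale = act_prime(dot(w, z))
        feats.extend(scale * v for v in z)
    return feats


if __name__ == "__main__":
    rng = random.Random(0)
    d, n_neurons = 3, 4
    W = unit_rows(n_neurons, d, rng)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Feature dimensions: p = N for random features, p = N * d for neural tangent features.
    print(len(random_features(z, W)), len(neural_tangent_features(z, W)))
```

A quick way to confirm the linearization is a central finite difference: perturbing the weights by εΔW should change the network output by about ε times the inner product of the neural tangent features with the flattened ΔW.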


This review was generated by AI and reviewed by human editors.