QUICK REVIEW

[Paper Digest] On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

Tengyuan Liang, Alexander Rakhlin|arXiv (Cornell University)|Aug 27, 2019

Stochastic Gradient Optimization Techniques | Cited by 59

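One-Sentence Summary

This paper analyzes the risk of minimum-norm interpolants in an RKHS under various high-dimensional scalings and proves a restricted lower isometry property for high-dimensional kernel matrices, showing that the risk bound is non-monotone, with multiple descents, as the scaling of d relative to n varies.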

ABSTRACT

We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. Our upper bounds on the risk are of a multiple-descent shape for the various scalings of $d = n^{\alpha}$, $\alpha\in(0,1)$, for the input dimension $d$ and sample size $n$. Empirical evidence supports our finding that minimum-norm interpolants in RKHS can exhibit this unusual non-monotonicity in sample size; furthermore, locations of the peaks in our experiments match our theoretical predictions. Since gradient flow on appropriately initialized wide neural networks converges to a minimum-norm interpolant with respect to a certain kernel, our analysis also yields novel estimation and generalization guarantees for these over-parametrized models. At the heart of our analysis is a study of spectral properties of the random kernel matrix restricted to a filtration of eigen-spaces of the population covariance operator, and may be of independent interest.

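Motivation and Goals

Understand the generalization and consistency of minimum-norm interpolation in an RKHS (kernel "ridgeless" regression).
Characterize the risk under the high-dimensional scaling d ~ n^α with α ∈ (0,1).
Reveal the spectral properties of random kernel matrices through a restricted lower isometry analysis.
Connect the results to over-parametrized models trained by gradient flow and to NTK-like kernels.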

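Proposed Method

Study the minimum-norm interpolant f̂ in the RKHS defined by the kernel k(x,z)=h(x^Tz/d), where h is smooth with non-negative Taylor coefficients.
Use the closed form f̂(x)=k(x,X)^T K^{-1} Y together with a bias-variance decomposition conditional on X to analyze the variance and bias contributions of the interpolant.
Establish a restricted lower isometry property for the empirical kernel matrix on a filtration of eigenspaces of the population covariance operator.
Apply Gram-Schmidt orthogonalization to the polynomial features to control the covariance structure of the monomials, which yields the spectral lower bounds.
Apply a small-ball probability argument to bound the smallest eigenvalue of the sample covariance in high dimensions.
Extend the results to neural-network-inspired kernels, including Neural-Tangent-Type kernels, and derive generalization bounds.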
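
The closed form f̂(x)=k(x,X)^T K^{-1} Y can be illustrated with a short numerical sketch. This is not the authors' code: the choice h = exp (whose Taylor coefficients are all positive), the Gaussian inputs, the toy target, and the function names are assumptions made only for illustration.

```python
import numpy as np

def inner_product_kernel(A, B, d, h=np.exp):
    """Evaluate k(x, z) = h(x^T z / d) for every row x of A and row z of B.

    h = exp is an illustrative choice, not a choice taken from the paper.
    """
    return h(A @ B.T / d)

def min_norm_interpolant(X, Y, h=np.exp):
    """Return the map x -> k(x, X)^T K^{-1} Y for training data (X, Y)."""
    n, d = X.shape
    K = inner_product_kernel(X, X, d, h)   # n x n empirical kernel matrix
    coef = np.linalg.solve(K, Y)           # K^{-1} Y without forming the inverse

    def predict(X_new):
        return inner_product_kernel(X_new, X, d, h) @ coef

    return predict

# Toy data (hypothetical): Gaussian inputs and a noisy one-dimensional signal.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

f_hat = min_norm_interpolant(X, Y)
# The interpolant reproduces the training labels up to numerical error.
train_residual = np.max(np.abs(f_hat(X) - Y))
```

Solving the linear system instead of inverting K is only a numerical convenience; the predictor is the same as in the closed form.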

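Experimental Results

Research Questions

RQ1: How does the risk of the minimum-norm kernel interpolant behave when the dimension scales as d ~ n^α with α ∈ (0,1)?
RQ2: Can a restricted lower isometry property (RLIP) be established for high-dimensional kernel matrices, and how does it affect the variance and bias bounds?
RQ3: Do these RKHS results extend to neural networks trained under NTK-type kernels, giving guarantees for over-parametrized models?
RQ4: Where are the peaks of the risk (multiple descent) located, and how do they relate to the spectral properties of the population covariance?
RQ5: How does the generalization performance of the interpolant change between the noiseless and noisy settings?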

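Key Findings

The upper bound on the risk of the minimum-norm interpolant in an RKHS has a multiple-descent shape over the regime d ≈ n^α, α ∈ (0,1).
For each integer ι≥1 and α ∈ [1/(ι+1), 1/ι), the risk curve has a valley with fast convergence near d ≈ n^{1/(ι+1/2)} and peaks at the predicted scales.
Restricted to the filtration of population eigenspaces, the empirical kernel matrix satisfies a restricted lower isometry property, which enables sharp control of the variance and bias.
The variance bound contains a term proportional to d^ι/n and a term proportional to n/d^{ι+1}; depending on the Taylor coefficients of h, the results cover both non-polynomial and polynomial kernels.
The bias can be controlled by the variance term; the bound holds under the assumptions that the target function admits a representation through the kernel and that kernel values are bounded.
A corollary extends the main results to Neural-Tangent-Type kernels, giving estimation and generalization guarantees for wide neural networks that converge to the minimum-norm interpolant.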
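
The valley location can be checked from the two variance-type terms listed above. Substituting d = n^α gives d^ι/n = n^{ια-1} and n/d^{ι+1} = n^{1-(ι+1)α}, so their sum scales as n^{-β} with β = min(1-ια, (ι+1)α-1). The sketch below only evaluates this exponent; it ignores constants, any logarithmic factors, and the separate bias analysis, and the function name is hypothetical.

```python
import math

def rate_exponent(alpha):
    """Exponent beta such that d^i/n + n/d^(i+1) scales as n^(-beta) when d = n^alpha.

    Derived only from the two terms named in the summary; constants and
    logarithmic factors are ignored (an illustrative simplification).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    # Integer iota >= 1 with alpha in [1/(iota+1), 1/iota).
    iota = math.ceil(1 / alpha) - 1
    return min(1 - iota * alpha, (iota + 1) * alpha - 1)

valley_iota_1 = rate_exponent(2 / 3)  # alpha = 1/(1 + 1/2): the two terms balance
peak_at_half = rate_exponent(0.5)     # alpha = 1/2: the exponent drops to zero
valley_iota_2 = rate_exponent(0.4)    # alpha = 1/(2 + 1/2): the two terms balance
```

On each interval the exponent rises from zero at α = 1/(ι+1) to its maximum 1/(2ι+1) at α = 1/(ι+1/2) and then falls back toward zero as α approaches 1/ι, which is the repeated peak-and-valley pattern described in the findings.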

This digest was generated by AI and reviewed by a human editor.