QUICK REVIEW

[Paper Review] Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

Jason M. Klusowski, Andrew R. Barron|arXiv (Cornell University)|Jul 5, 2016

Model Reduction and Neural Networks|References: 25|Cited by: 37

One-Sentence Summary

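Using the spectral norm and the atomic norm, this paper establishes risk bounds for high-dimensional function estimation with linear combinations of ridge functions, including single-hidden-layer neural networks. When candidate fits are drawn from a continuous parameter space, the estimation error decays as $\left(v_{f^\star}^4 \frac{\log d}{n}\right)^{1/3}$, which improves on the classical bound $v_{f^\star}\left(\frac{d\log(n/d)}{n}\right)^{1/2}$ once the input dimension $d$ exceeds roughly $n^{1/3}$, and in particular in the high-dimensional regime $d \gg n$. The results cover smooth ridge activations such as sigmoid, ramp, and sinusoidal functions, and show that the risk stays small even when the input dimension is much larger than the sample size.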

ABSTRACT

Let $ f^{\star} $ be a function on $ \mathbb{R}^d $ with an assumption of a spectral norm $ v_{f^{\star}} $. For various noise settings, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^4_{f^{\star}}\frac{\log d}{n} \right)^{1/3} $, where $ n $ is the sample size and $ \hat{f} $ is either a penalized least squares estimator or a greedily obtained version of such using linear combinations of sinusoidal, sigmoidal, ramp, ramp-squared or other smooth ridge functions. The candidate fits may be chosen from a continuum of functions, thus avoiding the rigidity of discretizations of the parameter space. On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n} \right)^{2/5} $. This work bridges non-linear and non-parametric function estimation and includes single-hidden layer nets. Unlike past theory for such settings, our bound shows that the risk is small even when the input dimension $ d $ of an infinite-dimensional parameterized dictionary is much larger than the available sample size. When the dimension is larger than the cube root of the sample size, this quantity is seen to improve the more familiar risk bound of $ v_{f^{\star}}\left(\frac{d\log (n/d)}{n} \right)^{1/2} $, also investigated here.
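
The cube-root threshold in the abstract can be checked by dividing the two bounds. This is a back-of-the-envelope calculation that ignores the constants hidden in the paper's inequalities:

$$
\frac{\left(v_{f^{\star}}^4 \frac{\log d}{n}\right)^{1/3}}{v_{f^{\star}}\left(\frac{d \log(n/d)}{n}\right)^{1/2}}
= v_{f^{\star}}^{1/3}\,\frac{(\log d)^{1/3}}{\left(\log(n/d)\right)^{1/2}}\cdot\frac{n^{1/6}}{d^{1/2}},
\qquad
\frac{n^{1/6}}{d^{1/2}} < 1 \iff d > n^{1/3}.
$$

Up to the factor $v_{f^{\star}}^{1/3}$ and the logarithms, the new bound is the smaller one exactly when $d > n^{1/3}$. The factor $\log(n/d)$ is positive only for $d < n$, so the classical bound is uninformative once $d \ge n$, whereas the new bound depends on $d$ only through $\log d$.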

Motivation and Goals

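Derive generalization error bounds for high-dimensional function estimation with linear combinations of ridge functions.
Address the $d \gg n$ regime in nonparametric and nonlinear estimation, where classical bounds break down.
Show that the risk stays small even when the input dimension $d$, and hence the number of parameters, exceeds the sample size.
Unify and extend existing theory for single-hidden-layer neural networks and ridge function approximation.
Establish improved convergence rates through spectral-norm control and atomic-norm regularization.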

Proposed Method

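Use a penalized least squares estimator over a continuous parameter space of ridge functions of the form $f(x) = \sum_{k=1}^m c_k \phi(a_k \cdot x + b_k)$.
Define the atomic norm $\|f\|_{\mathcal{H}}$ as the smallest $\ell_1$ norm of the coefficients over all representations of $f$ in the dictionary $\mathcal{H}$.
Introduce the spectral norm $v_{f^\star,s} = \int_{\mathbb{R}^d} \|\omega\|_1^s |\widetilde{f}(\omega)|\, d\omega$, where $\widetilde{f}$ is the Fourier transform of $f^\star$, to quantify the smoothness and regularity of the target function $f^\star$.
Construct linear combinations of $\pm(\alpha \cdot x - t)_+$ by random sampling from a density proportional to $|\cos(\|\omega\|_1 t + b(\omega))|\, \|\omega\|_1^2 |\widetilde{f}(\omega)|$.
Derive the risk bounds by balancing approximation error against complexity, using Fubini's theorem and the Fourier integral representation of $f^\star$.
Extend the framework to higher-order Taylor expansions using squared ramp functions $(a_k \cdot x + b_k)^2_+$.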
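
The estimator's building blocks can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: it assumes the ramp activation $\phi(z) = (z)_+$ and a generic penalty weight `lam`, and the helper names `ridge_combination`, `l1_coefficient_norm`, and `penalized_least_squares` are hypothetical. The paper's exact penalty, dictionary normalization, and constants may differ. Because the atomic norm is the smallest $\ell_1$ norm over all representations, the $\ell_1$ norm of any one coefficient vector only upper-bounds $\|f\|_{\mathcal{H}}$.

```python
import numpy as np

def ramp(z):
    """Ramp activation (z)_+ = max(z, 0)."""
    return np.maximum(z, 0.0)

def ridge_combination(X, A, b, c, phi=ramp):
    """Evaluate f(x) = sum_k c_k * phi(a_k . x + b_k) at every row x of X.

    X: (n, d) inputs; A: (m, d) inner weights a_k;
    b: (m,) offsets b_k; c: (m,) outer coefficients c_k.
    """
    return phi(X @ A.T + b) @ c

def l1_coefficient_norm(c):
    """l1 norm of the outer coefficients.

    Assuming every atom phi(a_k . x + b_k) lies in the dictionary H, this
    upper-bounds the atomic norm ||f||_H (the minimum over representations).
    """
    return np.sum(np.abs(c))

def penalized_least_squares(X, y, A, b, c, lam, phi=ramp):
    """Hypothetical objective: mean squared residual + lam * l1 coefficient norm."""
    residual = y - ridge_combination(X, A, b, c, phi)
    return np.mean(residual ** 2) + lam * l1_coefficient_norm(c)

# Tiny usage example: two points in d = 2, m = 2 ramp atoms.
X = np.array([[1.0, 2.0], [-1.0, 0.5]])
A = np.eye(2)
b = np.array([0.0, -1.0])
c = np.array([2.0, -3.0])
print(ridge_combination(X, A, b, c))
print(penalized_least_squares(X, np.array([-1.0, 0.0]), A, b, c, lam=0.1))
```

The sketch only evaluates the objective. Minimizing it over $(A, b, c)$, for example with a gradient method or by adding one atom at a time as in the greedy variant mentioned in the abstract, would be one way to approximate the estimator.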

Results

Research Questions

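RQ1: Can risk bounds be established for high-dimensional function estimation when the input dimension $d$ exceeds the sample size $n$?
RQ2: How does the choice between a continuous and a discretized parameter space affect the convergence rate of ridge function estimators?
RQ3: What is the optimal convergence rate for linear combinations of ridge functions, particularly for neural-network-like models?
RQ4: Can the spectral norm $v_{f^\star,s}$ be used to control the approximation error, and does this control extend to high-dimensional settings?
RQ5: How do the atomic norm and the penalized least squares estimator work together to improve generalization in high dimensions?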

Key Findings

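For a continuous parameter space, the risk bound is $\mathbb{E}\|\hat{f} - f^\star\|^2 \leq \left(v_{f^\star}^4 \frac{\log d}{n}\right)^{1/3}$, which improves on the classical bound $v_{f^\star}\left(\frac{d\log(n/d)}{n}\right)^{1/2}$ once $d$ exceeds roughly $n^{1/3}$, and hence also when $d \gg n$.
For a discretized parameter space, the risk bound is $\mathbb{E}\|\hat{f} - f^\star\|^2 \leq \left(v_{f^\star}^3 \frac{\log d}{n}\right)^{2/5}$; compared with the continuous case, the exponent on $\frac{\log d}{n}$ is $2/5$ instead of $1/3$, and the power of $v_{f^\star}$ is $6/5$ instead of $4/3$.
The bounds hold for a broad class of activation functions, including sigmoid, ramp, ramp-squared, and sinusoidal functions, so the results apply to single-hidden-layer neural networks.
With $m$ ridge functions, the approximation error for $f^\star$ is bounded by $16v_{f^\star,2}^2 / m$ when using $\pm(\alpha \cdot x - t)_+$, and by $16v_{f^\star,3}^2 / m$ when using squared ramp functions for the second-order approximation.
The framework permits nonparametric estimation without discretizing the parameter space, which avoids the rigidity of a fixed grid and makes the fit more adaptive.
The results show that the generalization error stays small even when $d \gg n$, addressing a gap in the theoretical understanding of neural networks in high-dimensional settings.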
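
The $O(1/m)$ approximation bounds above rest on a random-sampling (Maurey-type) argument: a weighted sum or integral of atoms is replaced by an average of $m$ atoms drawn at random, and the squared error shrinks like $1/m$ regardless of $d$. The Python sketch below is a simplified, hypothetical illustration of that argument only. It swaps the paper's Fourier-based sampling density for a finite dictionary of ramp atoms $(a_j \cdot x - t_j)_+$ with random weights $w_j$, measures error in the empirical $L^2$ norm on sample points, and compares the average error with the elementary bound $V^2 B^2 / m$, where $V = \sum_j |w_j|$ and $B^2$ is the largest squared norm of any atom. It does not reproduce the paper's constant 16 or its spectral norms; `sampled_approximation` and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def ramp(z):
    """Ramp activation (z)_+ = max(z, 0)."""
    return np.maximum(z, 0.0)

def sampled_approximation(H, w, m, rng):
    """Random m-term approximation of f = H @ w (hypothetical helper).

    H[:, j] holds atom h_j(x) = ramp(a_j . x - t_j) at the sample points.
    Indices j_1..j_m are drawn i.i.d. with probability |w_j| / V, V = sum_j |w_j|,
    so f_m = (V / m) * sum_i sign(w_{j_i}) * h_{j_i} is an unbiased estimate of f.
    """
    V = np.sum(np.abs(w))
    idx = rng.choice(len(w), size=m, p=np.abs(w) / V)
    return H[:, idx] @ (V / m * np.sign(w[idx]))

rng = np.random.default_rng(0)
n, d, J = 200, 50, 500                      # sample points, input dimension, dictionary size
X = rng.uniform(-1.0, 1.0, size=(n, d))
A = rng.normal(size=(J, d))
A /= np.abs(A).sum(axis=1, keepdims=True)   # normalize so that ||a_j||_1 = 1
t = rng.uniform(-1.0, 1.0, size=J)
w = rng.normal(size=J) / J                  # outer weights of the target f
H = ramp(X @ A.T - t)                       # (n, J) matrix of atom values
f = H @ w

V = np.sum(np.abs(w))                       # l1 norm of the weights
B2 = np.max(np.mean(H ** 2, axis=0))        # largest squared empirical norm of an atom

results = {}
for m in (10, 100, 1000):
    # Average the empirical squared error over 100 independent draws.
    errs = [np.mean((f - sampled_approximation(H, w, m, rng)) ** 2) for _ in range(100)]
    results[m] = (np.mean(errs), V ** 2 * B2 / m)
    print(f"m={m:5d}  mean squared error={results[m][0]:.2e}  bound={results[m][1]:.2e}")
```

Since each sampled term is an unbiased estimate of $f$, the expected squared error equals the single-draw variance divided by $m$, which is at most $V^2 B^2 / m$. The input dimension $d$ enters only through the atom norms, not through the rate.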

This review was generated by AI and checked by human editors.