QUICK REVIEW

[Paper Review] Recovery Guarantees for One-hidden-layer Neural Networks

Kai Zhong, Zhao Song | arXiv (Cornell University) | Jun 10, 2017

Neural Networks and Applications | References: 19 | Cited by: 129

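One-Sentence Summary

The paper proves parameter-recovery and global-convergence guarantees for one-hidden-layer neural networks by combining a Hessian analysis near the ground-truth parameters with tensor initialization. Under mild assumptions, both the sample complexity and the computational complexity are linear in the input dimension.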

ABSTRACT

In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to $\mathit{local~strong~convexity}$ in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k,\lambda)$ for smooth homogeneous activations with high probability, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the targeted precision and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity $\mathit{linear}$ in the input dimension and $\mathit{logarithmic}$ in the precision.
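The local phase described in the abstract (gradient descent under a resampling rule near the ground truth) can be sketched in plain Python as below. This is an illustrative sketch, not the paper's implementation. Our assumptions: squared ReLU as the smooth activation, noiseless labels, output weights v held fixed at their true values, and an arbitrary step size and batch size. Tensor initialization is replaced by a small random perturbation of W*, a stand-in, because the sketch only demonstrates the local phase. All function names are hypothetical.

```python
import math
import random

def sq_relu(z):
    # Squared ReLU, phi(z) = max(z, 0)^2: homogeneous, with a Lipschitz derivative.
    return max(z, 0.0) ** 2

def sq_relu_grad(z):
    # phi'(z) = 2 * max(z, 0)
    return 2.0 * max(z, 0.0)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def predict(W, v, x):
    # 1NN output: f(x) = sum_i v_i * phi(w_i^T x); W holds the k rows w_i.
    return sum(vi * sq_relu(dot(wi, x)) for wi, vi in zip(W, v))

def param_error(W, W_star):
    # Frobenius norm ||W - W*||_F
    return math.sqrt(sum((a - b) ** 2
                         for wi, ws in zip(W, W_star) for a, b in zip(wi, ws)))

def gd_with_resampling(W0, v, W_star, n=200, steps=120, eta=0.05, seed=0):
    """Gradient descent on W (v fixed) for the loss (1/2n) * sum_j (f(x_j) - y_j)^2,
    drawing a fresh Gaussian batch x_j ~ N(0, I_d) at every step (resampling)."""
    rng = random.Random(seed)
    d = len(W0[0])
    W = [row[:] for row in W0]
    for _ in range(steps):
        grad = [[0.0] * d for _ in W]
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            r = predict(W, v, x) - predict(W_star, v, x)  # residual; labels are noiseless
            for i, (wi, vi) in enumerate(zip(W, v)):
                # d/dw_i of (1/2n) r^2 = (1/n) * r * v_i * phi'(w_i^T x) * x
                scale = r * vi * sq_relu_grad(dot(wi, x)) / n
                for j in range(d):
                    grad[i][j] += scale * x[j]
        W = [[w - eta * g for w, g in zip(wi, gi)] for wi, gi in zip(W, grad)]
    return W

# Ground truth with k = 2 hidden units in d = 4 dimensions.
W_star = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
v = [1.0, 1.0]
# Stand-in for tensor initialization: a small perturbation of W*.
init_rng = random.Random(1)
W0 = [[w + 0.1 * init_rng.gauss(0.0, 1.0) for w in row] for row in W_star]
W_hat = gd_with_resampling(W0, v, W_star)
print(param_error(W0, W_star), param_error(W_hat, W_star))
```

Because the labels are noiseless, every batch gradient vanishes at W*. The parameter error should therefore shrink roughly geometrically once the iterate is inside the locally strongly convex region.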

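Research Motivation and Goals

Advance the theoretical understanding of regression with one-hidden-layer neural networks (1NNs) under Gaussian inputs.
Identify conditions on the activation function under which the squared loss is locally strongly convex near the ground-truth parameters.
Develop a tensor-based initialization that places the parameters inside the basin of attraction (the locally strongly convex region).
Establish a globally convergent training procedure whose sample complexity is linear in the input dimension and logarithmic in the precision.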
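For reference, the objective these goals refer to can be written as below. This sketch makes simplifying assumptions: labels are noiseless outputs of the ground-truth network, and the output weights are held fixed at $v^*$. The constants $m$, $M$, $r$ are placeholders, not values stated in the paper. Here $W = [w_1, \dots, w_k] \in \mathbb{R}^{d \times k}$.

$$
f_{\mathcal{D}}(W) = \frac{1}{2}\,\mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\Big[\Big(\sum_{i=1}^{k} v_i^{*}\,\phi(w_i^{\top} x) - y\Big)^{2}\Big],
\qquad y = \sum_{i=1}^{k} v_i^{*}\,\phi(w_i^{*\top} x)
$$

$$
\text{local strong convexity:}\quad m\,I \preceq \nabla^{2} f_{\mathcal{D}}(W) \preceq M\,I \quad \text{for all } W \text{ with } \|W - W^{*}\|_F \le r
$$

Inside this ball, gradient descent with step size $1/M$ satisfies $\|W_{t+1} - W^{*}\|_F^{2} \le (1 - m/M)\,\|W_t - W^{*}\|_F^{2}$. This is the standard linear rate for strongly convex, smooth objectives, and it is why an initialization that lands in the ball suffices.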

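Proposed Method

Characterize activation properties (Properties 3.1–3.3) that guarantee a positive-definite Hessian near $W^*$.
Under these properties, prove local positive definiteness of the empirical Hessian and local linear convergence of gradient descent.
Introduce a tensor method that initializes $W$ and $v$ inside the strongly convex region (Algorithm 1).
Reduce the dimension dependence of the tensor-based initialization from cubic to linear. The method first estimates a second moment to recover the subspace $V$ spanned by the hidden weights. It then decomposes only the low-dimensional projected tensor $P_3(V,V,V)$ rather than a full $d \times d \times d$ tensor.
Give a globally convergent algorithm (Algorithm 2) that combines tensor initialization with iterative gradient descent, with convergence guarantees (Theorem 6.1).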
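The first stage of the dimension reduction, recovering the subspace $V$ from a second moment, can be sketched as follows. Our assumptions: squared ReLU, $v^* = (1, 1)$, orthonormal rows of $W^*$, and an estimator of the form $M_2 = \mathbb{E}[y\,(xx^\top - I)]$, which may differ from the paper's exact $P_2$. Plain orthogonal (block power) iteration stands in for whatever eigensolver the paper uses. For squared ReLU, a short calculation gives $M_2 = \sum_i v_i^* w_i^* w_i^{*\top}$, so its top-$k$ eigenspace is the span of the hidden weights. The third-order stage $P_3(V,V,V)$ is not shown. Function names are hypothetical.

```python
import math
import random

def sq_relu(z):
    # Squared ReLU, phi(z) = max(z, 0)^2 (homogeneous of degree 2).
    return max(z, 0.0) ** 2

def estimate_second_moment(W_star, v_star, n, rng):
    """Empirical M2 = (1/n) * sum_j y_j * (x_j x_j^T - I), with x_j ~ N(0, I_d)."""
    d = len(W_star[0])
    M = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(v * sq_relu(sum(a * b for a, b in zip(w, x)))
                for w, v in zip(W_star, v_star))
        for p in range(d):
            for q in range(d):
                M[p][q] += y * (x[p] * x[q] - (1.0 if p == q else 0.0)) / n
    return M

def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for vec in vectors:
        u = vec[:]
        for b in basis:
            proj = sum(p * q for p, q in zip(u, b))
            u = [ui - proj * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        basis.append([ui / norm for ui in u])
    return basis

def top_k_subspace(M, k, iters=50, seed=0):
    """Orthogonal iteration: basis for the eigenspace of the k largest-|eigenvalue| directions."""
    rng = random.Random(seed)
    d = len(M)
    Q = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    for _ in range(iters):
        Q = orthonormalize([[sum(M[p][q] * col[q] for q in range(d)) for p in range(d)]
                            for col in Q])
    return Q

W_star = [[0.6, 0.8, 0.0, 0.0, 0.0], [0.0, 0.0, 0.6, 0.0, 0.8]]  # k = 2, d = 5
v_star = [1.0, 1.0]
M2 = estimate_second_moment(W_star, v_star, n=20000, rng=random.Random(0))
V = top_k_subspace(M2, k=2)
# Squared norm of each unit-norm w_i* projected onto span(V); near 1 means recovered.
captured = [sum(sum(a * b for a, b in zip(w, col)) ** 2 for col in V) for w in W_star]
print(captured)
```

Once $V$ ($d \times k$) is known, later tensor work only involves $k \times k \times k$ objects, which is the source of the cubic-to-linear saving in $d$.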

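Results

Research Questions

RQ1: Under what conditions on the activation function is the 1NN squared loss locally strongly convex near the ground-truth parameters?
RQ2: Can tensor-based initialization place the parameters inside the basin of attraction, so that gradient methods are guaranteed to converge?
RQ3: Under Gaussian inputs, what sample and computational complexity are needed to recover the ground-truth parameters of a 1NN?
RQ4: Does the method extend to smooth homogeneous activations with provable global convergence?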

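Key Findings

With sufficiently many samples, the activation properties make the Hessian positive definite in a neighborhood of the ground-truth parameters.
For smooth activations, gradient descent with resampling converges linearly to the ground-truth parameters once initialized inside that neighborhood.
For homogeneous activations, tensor initialization places the hidden-layer weights and output weights inside the local strong convexity region. Its sample and time complexity are linear in the input dimension, up to $\mathrm{poly}(k,\lambda)$ factors.
Combining tensor initialization with gradient descent yields a globally convergent procedure that recovers the ground-truth parameters with high probability for smooth homogeneous activations, using $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k,\lambda)$ samples.
Under mild assumptions, the work gives (to the authors' knowledge) the first recovery guarantees for 1NNs whose sample and computational complexity are both linear in $d$ and logarithmic in $1/\epsilon$.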
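The helper below makes the stated sample-complexity scaling concrete. It evaluates $d \cdot \log(1/\epsilon)$ times the unspecified $\mathrm{poly}(k,\lambda)$ factor, which we treat as a fixed hypothetical constant. That treatment is an assumption, since the paper's polynomial is not reproduced here, so only ratios between calls are meaningful.

```python
import math

def sample_complexity(d, eps, poly_k_lambda=1.0):
    """Scaling n ~ d * log(1/eps) * poly(k, lambda).

    poly_k_lambda is a hypothetical placeholder for the poly(k, lambda) factor;
    absolute values are meaningless, only ratios between calls are.
    """
    return d * math.log(1.0 / eps) * poly_k_lambda

base = sample_complexity(d=100, eps=1e-3)
ratio_dim = sample_complexity(d=200, eps=1e-3) / base  # doubling d doubles n
ratio_eps = sample_complexity(d=100, eps=1e-6) / base  # 1000x finer precision: about 2x
print(ratio_dim, ratio_eps)
```

Tightening $\epsilon$ from $10^{-3}$ to $10^{-6}$ only doubles the $\log(1/\epsilon)$ factor, whereas doubling $d$ doubles the sample requirement outright.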

This review was generated by AI and reviewed by human editors.