QUICK REVIEW

[Paper Review] On the Complexity of Learning Neural Networks

Le Song, Santosh Vempala|arXiv (Cornell University)|Jul 14, 2017

Stochastic Gradient Optimization Techniques | References: 2 | Cited by: 24

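One-Sentence Summary

For inputs drawn from any logconcave distribution, this paper establishes fundamental lower bounds on the complexity of training neural networks with one hidden layer and a wide class of activation functions (including ReLU and sigmoid). It proves that any statistical query algorithm, which includes all known variants of SGD, needs an exponential number of queries to learn even simple, realizable functions, although the network is small and the data distribution is benign.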

ABSTRACT

The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We demonstrate here a comprehensive lower bound ruling out this possibility: for a wide class of activation functions (including all currently used), and inputs drawn from any logconcave distribution, there is a family of one-hidden-layer functions whose output is a sum gate, that are hard to learn in a precise sense: any statistical query algorithm (which includes all known variants of stochastic gradient descent with any loss function) needs an exponential number of queries even using tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable with a small (sublinear in dimension) number of activation units in the single hidden layer. The lower bound is also robust to small perturbations of the true weights. Systematic experiments illustrate a phase transition in the training error as predicted by the analysis.

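Motivation and Objectives

Understand why neural networks succeed in practice despite the lack of a rigorous theoretical explanation.
Investigate whether functions generated by one-hidden-layer networks with smooth activation functions and benign inputs can be learned efficiently.
Determine whether statistical query algorithms, which capture all known gradient-based training methods, can learn such functions efficiently.
Establish formal complexity barriers for neural network learning under realistic assumptions (smooth activation functions, logconcave inputs).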

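Proposed Method

Formalize neural network training as a statistical query (SQ) problem, in which each gradient update corresponds to a query for the expected value of the derivative of the loss.
Apply the SQ framework as generalized by Feldman et al., using the VSTAT(t) oracle to model the accuracy and tolerance of queries.
Construct a family of functions realizable by small one-hidden-layer networks (with sigmoid or ReLU activations) and a sum-gate output.
Show that the correlation structure among these functions yields a high statistical dimension, which implies exponential query complexity.
Use the Cauchy-Schwarz and Markov inequalities to upper-bound the number of functions that each query can rule out, which gives the exponential lower bound.
Show that the lower bound is robust to small perturbations of the true network weights, which strengthens its practical relevance.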
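To make the oracle model above concrete, here is a minimal Python sketch of a simulated VSTAT(t) oracle. It assumes the usual definition from Feldman et al.: for a query with values in [0, 1] and true mean p, the oracle may return any value within max(1/t, sqrt(p(1-p)/t)) of p. The names `vstat_tolerance` and `simulated_vstat` are hypothetical, and the empirical mean over a finite sample stands in for the true expectation; this illustrates the query model and is not the paper's code.

```python
import math
import random


def vstat_tolerance(p, t):
    """Allowed error of a VSTAT(t) answer for a query whose true mean is p in [0, 1]."""
    return max(1.0 / t, math.sqrt(p * (1.0 - p) / t))


def simulated_vstat(query, samples, t, rng):
    """Answer a [0, 1]-valued query with some value inside the VSTAT(t) tolerance band.

    Assumption of this sketch: the empirical mean over `samples` plays the role
    of the true expectation. A real oracle may answer adversarially anywhere in
    the band; here the perturbation is drawn uniformly at random.
    """
    p = sum(query(x, y) for x, y in samples) / len(samples)
    tau = vstat_tolerance(p, t)
    v = p + rng.uniform(-tau, tau)
    # Clipping to [0, 1] only moves the answer closer to p, so it stays in the band.
    return min(1.0, max(0.0, v))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: one-dimensional Gaussian inputs with a threshold label.
    samples = []
    for _ in range(1000):
        x = (rng.gauss(0.0, 1.0),)
        samples.append((x, 1.0 if x[0] > 0.5 else 0.0))
    # Hypothetical query: how often does a candidate threshold at 0 agree with the label?
    agree = lambda x, y: 1.0 if (x[0] > 0.0) == (y == 1.0) else 0.0
    print(simulated_vstat(agree, samples, t=100, rng=rng))
```

A gradient coordinate of an expected loss can be posed to such an oracle after rescaling it into [0, 1], which is the sense in which SGD variants fit the SQ model.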

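Results

Research Questions

RQ1: Can functions generated by one-hidden-layer neural networks with smooth activation functions and logconcave inputs be learned efficiently by statistical query algorithms?
RQ2: Does the empirical success of SGD in training deep networks contradict known complexity-theoretic lower bounds under realistic assumptions?
RQ3: What is the minimum number of queries that any statistical query algorithm needs to learn a function class realized by small one-hidden-layer networks?
RQ4: How does the correlation structure among the functions in a neural network function family affect the learnability of the target function?
RQ5: Is the hardness result robust to small perturbations of the true network weights, and does it therefore reflect real-world training noise?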

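Key Findings

Any statistical query algorithm needs an exponential number of queries to learn the family of one-hidden-layer neural network functions with smooth activation functions and logconcave inputs.
Under the same assumptions, the lower bound holds for all commonly used activation functions, including ReLU and sigmoid.
The hard function family can be realized with a number of hidden units that is sublinear in the input dimension, so the hardness does not stem from network size.
The lower bound is robust to small perturbations of the true network weights, which indicates that it still holds under realistic training noise.
The theoretical lower bound is supported by systematic experiments, which show a phase transition in the training error consistent with the analysis.
The statistical dimension of the learning problem is shown to be exponentially large, which means that no SQ algorithm can learn the function class efficiently.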
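For readers who want to see the shape of the functions these findings refer to, the following Python sketch evaluates a generic one-hidden-layer network whose output is a sum gate, f(x) = sum_j activation(<w_j, x> + b_j). It shows only the architecture named in the abstract. It does not reproduce the paper's specific hard family, and the function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def sum_gate_network(x, weights, biases, activation):
    """Evaluate f(x) = sum_j activation(<w_j, x> + b_j).

    `weights` holds one weight vector per hidden unit and `biases` one scalar
    per hidden unit. The output gate adds the unit outputs with no further
    weights or nonlinearity.
    """
    total = 0.0
    for w, b in zip(weights, biases):
        pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += activation(pre_activation)
    return total


if __name__ == "__main__":
    x = [2.0, 3.0]
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two hidden units in dimension 2
    biases = [0.0, -1.0]
    print(sum_gate_network(x, weights, biases, relu))
    print(sum_gate_network(x, weights, biases, sigmoid))
```

In the setting of the paper, the number of hidden units would be sublinear in the input dimension; the tiny sizes above are for illustration only.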

This review was generated by AI and reviewed by a human editor.