[Paper Summary] Spurious Valleys in Two-layer Neural Network Optimization Landscapes
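This paper defines an intrinsic dimension for one-hidden-layer networks and shows that finite intrinsic dimension rules out spurious valleys under sufficient overparametrisation, whereas infinite intrinsic dimension permits them; when they exist, spurious valleys are confined to low risk levels and become less likely as width increases.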
Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent spur in research attempting to characterize properties of their loss surface that may explain such success. In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient conditions for the absence of spurious valleys. More concretely, finite intrinsic dimension guarantees that for sufficiently overparametrised models no spurious valleys exist, independently of the data distribution. Conversely, infinite intrinsic dimension implies that spurious valleys do exist for certain data distributions, independently of model overparametrisation. Besides these positive and negative results, we show that, although spurious valleys may exist in general, they are confined to low risk levels and avoided with high probability on overparametrised models.
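Research Motivation and Goals
- Understand the non-convex loss landscape of neural networks.
- Characterise when spurious valleys exist in one-hidden-layer networks.
- Introduce a notion of intrinsic dimension that links network structure to the topology of the optimization landscape.
- Establish conditions under which overparametrisation eliminates spurious valleys for different activation functions.
- Compare empirical risk and population risk in terms of their landscape properties.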
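Proposed Method
- Define spurious valleys as connected components of sub-level sets that do not contain a global minimum.
- Introduce upper and lower intrinsic dimensions to quantify the complexity of the network's function space.
- Prove that finite intrinsic dimension guarantees the absence of spurious valleys for sufficiently wide networks.
- Show that infinite intrinsic dimension implies that spurious valleys exist for certain data distributions.
- Give specialised results, as corollaries, for polynomial activations and the ERM setting.
- Discuss improved bounds for linear and quadratic activations and relate them to tensor decomposition.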
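The definition in the first bullet can be written out as a short formula. This is a sketch based on the wording of the abstract; the symbols L (the loss), θ (the parameters), c (the level), Ω_L(c) and U are notation assumed here for illustration and may differ from the paper's own.

```latex
% Sub-level set of the loss L at level c
\Omega_L(c) = \{\, \theta : L(\theta) \le c \,\}, \qquad c \in \mathbb{R}.

% A spurious valley is a connected component U of some sub-level set
% that contains no global minimiser of L
U \subseteq \Omega_L(c) \ \text{connected component}, \qquad
U \cap \operatorname*{arg\,min}_{\theta} L(\theta) = \emptyset .
```

Read this way, a descent path that starts inside such a component U can never leave it without increasing the loss above c, which is why the absence of spurious valleys matters for descent methods.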
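Results
Research Questions
- RQ1: When do spurious valleys exist, and when do they vanish, in one-hidden-layer neural networks?
- RQ2: Under overparametrisation, how does the intrinsic dimension of the network affect the optimization landscape?
- RQ3: Do the results for population risk and empirical risk minimization differ across activation types?
- RQ4: For specific activation classes (such as polynomial, linear, quadratic), can overparametrisation guarantee a valley-free optimization landscape?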
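Key Findings
- When the hidden width p is at least the upper intrinsic dimension dim*(σ, X), and dim*(σ, X) is finite, no spurious valleys occur.
- For polynomial activations, no spurious valleys occur under sufficient overparametrisation, for both ERM and the population risk; for linear and quadratic activations, the results are tight up to constant factors.
- For non-polynomial, non-negative activations, spurious valleys can be constructed at any width by using adversarial data distributions.
- Overparametrised models may still have spurious valleys, but their measure decreases with width, and low-energy spurious valleys are avoided with high probability.
- For linear networks with the square loss, no spurious valleys occur at any depth; for quadratic activations with m = 1, p ≥ 2n + 1 suffices to avoid spurious valleys.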
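To make the width conditions above more concrete, the following sketch numerically estimates the dimension of the function space spanned by hidden units x ↦ σ(⟨w, x⟩), which is how this summary describes intrinsic dimension. It is an illustrative proxy, not the paper's definition or method: the function name `estimate_span_dim`, the Gaussian sampling of weights and inputs, and the use of a matrix rank as a stand-in for the dimension are all assumptions made here.

```python
import numpy as np


def estimate_span_dim(sigma, n, num_units=60, num_points=60, seed=0):
    """Estimate dim span{x -> sigma(<w, x>)} from random samples.

    Hypothetical helper: builds the matrix M[i, j] = sigma(<w_i, x_j>)
    for random hidden weights w_i and random inputs x_j in R^n, and
    returns its numerical rank. The rank is only a lower-bound proxy
    that saturates at min(num_units, num_points).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_units, n))   # sampled hidden-unit weights
    X = rng.standard_normal((num_points, n))  # sampled input points
    features = sigma(W @ X.T)                 # shape (num_units, num_points)
    return int(np.linalg.matrix_rank(features))


# Quadratic activation on R^3: the units span the homogeneous quadratic
# polynomials in 3 variables, a space of dimension 3 * 4 / 2 = 6.
quadratic_dim = estimate_span_dim(lambda z: z**2, n=3)

# Linear activation on R^3: the units span the linear functionals, dimension 3.
linear_dim = estimate_span_dim(lambda z: z, n=3)

print(quadratic_dim, linear_dim)
```

For a non-polynomial activation such as `np.tanh`, the same estimate would be expected to keep growing as `num_units` and `num_points` increase, which is consistent with the infinite-intrinsic-dimension case described above. A finite sample can only suggest this, not prove it.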
This summary was generated by AI and reviewed by a human editor.