QUICK REVIEW

[Paper Review] How degenerate is the parametrization of neural networks with the ReLU activation function

Dennis Elbrächter, Julius Berner|arXiv (Cornell University)|May 23, 2019

Neural Networks and Applications|Cited by 11

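One-Sentence Summary

This paper studies degeneracy in the parametrization of ReLU neural networks by analyzing the relationship between a network's parameters and the function it realizes. For shallow networks, it establishes inverse stability of the realization map with respect to a Sobolev norm on a restricted parameter space, which links local minima of the restricted parameter-space problem to almost optimal realizations and so allows the optimization to be analyzed in function space.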

ABSTRACT

Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally only depends on the realization of the neural network, i.e. the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, usual loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which themselves are non-convex. Approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that for sufficiently large networks local minima of a regularized optimization problem on the realization space are almost optimal. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in the parametrization space need not correspond to a local minimum in the realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of corresponding parametrizations. We present pathologies which prevent inverse stability in general, and, for shallow networks, proceed to establish a restricted space of parametrizations on which we have inverse stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets.

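Motivation and Goals

Understand the non-convex optimization landscape of ReLU neural networks by analyzing the map from parameters to realized functions.
Identify why a local minimum in parameter space need not correspond to a local minimum in realization space, and hence to a good solution: each realization has many, possibly degenerate, parametrizations.
Establish conditions under which optimizing over a restricted parameter space yields almost optimal realizations.
Show that optimizing over the restricted space retains everything that can be learned by optimizing over the unrestricted space.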
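
The non-uniqueness of parametrizations can be made concrete with a minimal sketch. It relies only on the positive homogeneity of ReLU, relu(lam * t) = lam * relu(t) for lam > 0, which is a well-known source of redundancy and not necessarily one of the pathologies the paper analyzes. The helper names `realize` and `rescale`, the network sizes, and the choice of the sup-norm on parameters are assumptions made for illustration.

```python
import numpy as np

def realize(A, b, c, X):
    """Realization of a shallow ReLU network: x -> sum_i c_i * relu(<A_i, x> + b_i)."""
    return np.maximum(X @ A.T + b, 0.0) @ c

def rescale(A, b, c, lam):
    """Scale every hidden neuron by lam > 0 and its outgoing weight by 1/lam."""
    return lam * A, lam * b, c / lam

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))    # 4 hidden neurons, 2 inputs (illustrative sizes)
b = rng.normal(size=4)
c = rng.normal(size=4)
X = rng.normal(size=(200, 2))  # sample points at which the functions are compared

A2, b2, c2 = rescale(A, b, c, lam=100.0)

# Largest difference between the two realizations on the sample points.
func_gap = float(np.max(np.abs(realize(A, b, c, X) - realize(A2, b2, c2, X))))

# Sup-norm distance between the two parameter tuples.
param_gap = float(max(np.max(np.abs(A - A2)),
                      np.max(np.abs(b - b2)),
                      np.max(np.abs(c - c2))))

print(f"function gap:  {func_gap:.2e}")
print(f"parameter gap: {param_gap:.2e}")
```

The two parameter tuples compute the same function up to floating-point error while lying far apart in parameter space, which is why statements about parameter space do not transfer to realization space without further assumptions.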

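Proposed Method

Analyze the realization map, which sends network parameters to the function they compute, with a focus on its inverse stability.
Measure the distance between realizations in a Sobolev norm, so that closeness of realizations can imply closeness of corresponding parametrizations.
Restrict the parameter space of shallow networks to exclude the pathological degeneracies that prevent inverse stability.
Show, using approximation arguments, that the restricted space can still learn every function learnable by unrestricted optimization.
Show that local minima of the regularized problem over the restricted parameter space correspond to almost optimal solutions in realization space, for sufficiently large networks.
Use the fact that the loss is convex as a function of the realization, although the set of realizations is itself non-convex, as the basis for these optimality guarantees.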
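
One way to write down the two central objects, the realization map of a shallow network and inverse stability, is sketched below. The notation (the symbols for the parameters, the set \Omega, and the constants s and \alpha) is an assumption chosen for readability, and the paper's exact statement may differ in details such as the treatment of biases or the norm placed on the parameters.

```latex
% Shallow ReLU network with m hidden neurons, where \rho(t) = \max\{t, 0\}
\mathcal{R}(\Theta)(x) = \sum_{i=1}^{m} c_i \, \rho\big(\langle a_i, x \rangle + b_i\big),
\qquad \Theta = \big( (a_i, b_i, c_i) \big)_{i=1}^{m}

% (s, \alpha) inverse stability of \mathcal{R} on a parameter set \Omega
% with respect to a norm \|\cdot\| on functions:
\forall\, \Gamma \in \Omega,\ \ \forall\, g \in \mathcal{R}(\Omega)\ \
\exists\, \Phi \in \Omega : \quad
\mathcal{R}(\Phi) = g
\quad \text{and} \quad
\|\Phi - \Gamma\|_{\infty} \le s \, \|g - \mathcal{R}(\Gamma)\|^{\alpha}
```

Read this way, inverse stability only asks that some parametrization of a nearby function lies nearby, not every one, so redundancy alone does not rule it out.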

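Results

Research Questions

RQ1: Why do local minima in the parameter space of ReLU networks not necessarily correspond to good solutions in function space?
RQ2: Under what conditions can inverse stability of the realization map be established for shallow ReLU networks?
RQ3: Does optimization over a restricted parameter space retain the same expressive power as unrestricted optimization?
RQ4: How does the choice of norm (for example, a Sobolev norm) affect the stability of the realization map?
RQ5: How does regularization in parameter space relate to optimality in function space?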

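Key Findings

In general, inverse stability of the realization map fails because of pathological degeneracies in the parametrization.
For shallow ReLU networks, inverse stability holds with respect to a Sobolev norm on a suitably restricted parameter space.
For sufficiently large networks, local minima of the regularized optimization problem over the restricted parameter space are almost optimal in realization space.
Restricting the parameter space loses no learning capacity: any function learnable by unrestricted optimization remains learnable.
Despite the reduced freedom in the parameters, optimization over the restricted space can therefore reach solutions comparable to those of unrestricted optimization.
Common loss functions such as mean squared error and categorical cross entropy are convex on the space of realizations; the realization space itself is non-convex, which the paper handles through the approximation capabilities of the networks.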
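
Why a Sobolev norm rather than the uniform norm is a natural choice can be illustrated with a small numerical sketch. The family g_k below is an assumption constructed for this illustration and is not reproduced from the paper: it is a two-neuron shallow ReLU network whose sup-norm is 1/k while its slope reaches k. By an elementary bound, the Lipschitz constant of a width-m shallow network is at most the sum of |c_i| |a_i|, so any parametrization of g_k needs a weight of magnitude at least sqrt(k/m), even though g_k tends to the zero function uniformly.

```python
import numpy as np

def g(x, k):
    """Two-neuron shallow ReLU network: k*relu(x) - k*relu(x - 1/k**2)."""
    return k * np.maximum(x, 0.0) - k * np.maximum(x - 1.0 / k**2, 0.0)

def sup_norm(values):
    """Largest absolute value on the grid."""
    return float(np.max(np.abs(values)))

def max_slope(values, x):
    """Finite-difference proxy for the sup-norm of the derivative."""
    return float(np.max(np.abs(np.diff(values) / np.diff(x))))

x = np.linspace(-1.0, 1.0, 200_001)  # grid spacing 1e-5

results = {}
for k in (2, 4, 8, 16):
    y = g(x, k)
    results[k] = (sup_norm(y), max_slope(y, x))
    print(f"k={k:2d}  sup-norm={results[k][0]:.4f}  max slope={results[k][1]:.4f}")
```

As k grows the functions shrink in the uniform norm while their derivatives grow, so closeness in the uniform norm cannot force closeness of parameters. A norm that also controls the derivative, such as a first-order Sobolev norm, keeps these functions far from zero.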

This review was generated by AI and checked by a human editor.