QUICK REVIEW

[Paper Review] On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks

Behnam Asadi, Hui Jiang|arXiv (Cornell University)|Feb 10, 2020

Neural Networks and Applications|References: 9|Cited by: 19

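ONE-SENTENCE SUMMARY

The paper proves that a sufficiently large single-hidden-layer feedforward network with ReLU activations can approximate any function in $L^1$, and that a sufficiently large network with a softmax output layer can approximate any indicator function in $L^1$ (that is, mutually exclusive class labels). Together, these results give a theoretical basis for using ReLU activations and softmax output layers in neural networks and help explain why both are so widely used for classification in modern deep learning.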

ABSTRACT

In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ up to any arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to mutually-exclusive class labels in any realistic multiple-class pattern classification problems. To the best of our knowledge, this work is the first theoretical justification for using the softmax output layers in neural networks for pattern classification.

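RESEARCH MOTIVATION AND GOALS

Extend universal approximation theory to neural networks that use the unbounded ReLU activation function.
Justify theoretically the nonlinear softmax output layer used in multi-class pattern classification.
Prove that a sufficiently large ReLU network can approximate any $L^1$ function to arbitrary precision.
Prove that a sufficiently large softmax network can approximate, in $L^1$, any indicator function, which is equivalent to a set of mutually exclusive class labels.
Provide a theoretical basis for the empirical success of ReLU and softmax in modern deep learning architectures.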

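PROPOSED METHOD

Proves that any function in $L^1(I_d)$ can be approximated by a ReLU network with a single hidden layer of sufficient width.
Constructs the transformation $f'_i(\mathbf{x}) = \frac{2m}{\epsilon}(f_i(\mathbf{x}) - 0.5)$, which maps the target function into a form suitable for ReLU approximation.
Uses the triangle inequality to bound the $L^1$ error between the network output and the target function by the sum of two terms, each kept below $\epsilon/2$.
Uses Lemma 1 to show that there exists a ReLU network $g(\mathbf{x})$ such that $\|\text{softmax}(g(\mathbf{x}))_i - \text{softmax}(f'(\mathbf{x}))_i\|_1 < \epsilon/2$.
Analyzes the behavior of the softmax function on indicator functions by partitioning the domain into the regions where $f_i = 1$ and where $f_i = 0$.
Uses the inequality $\exp(-x) \leq 1/x$ for $x > 0$ to bound the $L^1$ error of the softmax approximation by $\epsilon/2$, which completes the proof.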
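
The scaling step can be checked numerically. The sketch below is an illustration, not the paper's proof. It assumes that $m$ is the number of classes (the summary does not define $m$) and that the target is a one-hot label vector. Under that reading, the true class receives the logit $m/\epsilon$ and every other class receives $-m/\epsilon$, so the softmax output differs from the label by at most $(m-1)\exp(-2m/\epsilon)$, and $\exp(-x) \leq 1/x$ with $x = 2m/\epsilon$ brings this below $\epsilon/2$. The function names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    zmax = max(z)
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_logits(one_hot, eps):
    """Apply f'_i = (2m/eps) * (f_i - 0.5) to a one-hot label vector.

    Assumption: m is the number of classes, i.e. the vector length.
    """
    m = len(one_hot)
    scale = 2.0 * m / eps
    return [scale * (f - 0.5) for f in one_hot]


def worst_case_error(m, eps):
    """Largest per-class gap |softmax(f')_i - f_i| for a one-hot target."""
    one_hot = [1.0] + [0.0] * (m - 1)
    probs = softmax(scaled_logits(one_hot, eps))
    return max(abs(p - f) for p, f in zip(probs, one_hot))


if __name__ == "__main__":
    for m in (2, 5, 10):
        for eps in (0.5, 0.1):
            err = worst_case_error(m, eps)
            print(f"m={m:2d} eps={eps}: max error {err:.3e}, bound eps/2 = {eps / 2}")
```

Because the gap shrinks exponentially in $m/\epsilon$ while the bound shrinks only linearly in $\epsilon$, the observed error is typically far below $\epsilon/2$.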

RESULTS

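Research Questions

RQ1: Can a neural network with ReLU activations approximate any function in $L^1$?
RQ2: Can a neural network with a softmax output layer approximate any indicator function in $L^1$?
RQ3: Does the universal approximation property still hold when ReLU and softmax are used together in a single-hidden-layer network?
RQ4: Does the theoretical justification for ReLU and softmax apply to realistic multi-class classification problems with mutually exclusive labels?
RQ5: Can the approximation error be made arbitrarily close to zero by using a sufficiently large network?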

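Key Findings

A sufficiently large neural network with ReLU activations can approximate any function in $L^1(I_d)$ to arbitrary precision.
By increasing the network width, the approximation error of a ReLU network can be made smaller than any $\epsilon > 0$.
A sufficiently large network with a softmax output layer can approximate any indicator function in $L^1(I_d)$, which corresponds to mutually exclusive class labels in classification tasks.
Through the transformation and an exponential-decay analysis, the $L^1$ error between the softmax output and the target indicator function is kept below $\epsilon/2$.
The softmax approximation result does not depend on the activation function used in the hidden layer, as long as the hidden layer can approximate the transformed function.
The proof establishes a theoretical justification for using softmax in classification tasks, which the authors describe as the first of its kind to their knowledge.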
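
The first two findings can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the paper's construction. It builds a single-hidden-layer ReLU network on $[0, 1]$ whose weights are chosen by hand so that it linearly interpolates a target at $n$ evenly spaced knots, then estimates the $L^1$ error against a discontinuous indicator function. The target interval $[0.3, 0.7]$, the knot placement, and all function names are hypothetical choices made for this example.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def build_relu_net(f, n):
    """Return (bias, knots, coeffs) for g(x) = bias + sum_k coeffs[k] * relu(x - knots[k]).

    The n hidden units are placed so that g linearly interpolates f
    at the points k/n for k = 0..n.
    """
    knots = [k / n for k in range(n)]
    values = [f(k / n) for k in range(n + 1)]
    slopes = [(values[k + 1] - values[k]) * n for k in range(n)]
    # Each unit contributes the change in slope at its knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    return values[0], knots, coeffs


def evaluate(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - t) for c, t in zip(coeffs, knots))


def l1_error(f, net, samples=4000):
    """Midpoint Riemann-sum estimate of the L1 distance on [0, 1]."""
    h = 1.0 / samples
    mids = ((j + 0.5) * h for j in range(samples))
    return h * sum(abs(f(x) - evaluate(net, x)) for x in mids)


def indicator(x):
    """Indicator of the interval [0.3, 0.7], a discontinuous L1 target."""
    return 1.0 if 0.3 <= x <= 0.7 else 0.0


if __name__ == "__main__":
    for width in (10, 40, 160):
        net = build_relu_net(indicator, width)
        print(f"width={width:4d}  estimated L1 error={l1_error(indicator, net):.4f}")
```

A continuous network can never match the jump of an indicator uniformly, but the region where the two disagree narrows as the width grows, which is why the error is measured in $L^1$ rather than in the supremum norm.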


This review was generated by AI and reviewed by human editors.