QUICK REVIEW

[Paper Review] Deep Learning using Rectified Linear Units (ReLU)

Abien Fred Agarap|arXiv (Cornell University)|Mar 22, 2018

Neural Networks and Applications | References: 13 | Cited by: 2,484

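One-Sentence Summary

This study tests whether ReLU can serve as the classification function in deep neural networks, comparing DL-ReLU with DL-Softmax on the MNIST, Fashion-MNIST, and WDBC datasets using FFNN and CNN architectures.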

ABSTRACT

We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer $h_{n - 1}$ in a neural network, then multiply it by weight parameters $\theta$ to get the raw scores $o_{i}$. Afterwards, we threshold the raw scores $o_{i}$ by $0$, i.e. $f(o) = \max(0, o_{i})$, where $f(o)$ is the ReLU function. We provide class predictions $\hat{y}$ through argmax function, i.e. $\arg\max f(x)$.
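
The prediction rule described in the abstract can be sketched in plain Python. This is a minimal illustration and not the author's code: the function names (`relu`, `raw_scores`, `predict`) and the toy numbers are assumptions made for the example, and no bias term is included because the abstract does not mention one.

```python
def relu(scores):
    """Threshold each raw score at zero: f(o)_i = max(0, o_i)."""
    return [max(0.0, o) for o in scores]


def raw_scores(h, theta):
    """Compute o = h . theta for a penultimate activation h (length d)
    and a weight matrix theta (d rows, one column per class)."""
    num_classes = len(theta[0])
    return [sum(h[j] * theta[j][k] for j in range(len(h)))
            for k in range(num_classes)]


def predict(h, theta):
    """Return the predicted class: argmax over the ReLU-thresholded scores."""
    activated = relu(raw_scores(h, theta))
    return max(range(len(activated)), key=lambda k: activated[k])


# Toy example (hypothetical numbers): 2 hidden units, 3 classes.
h = [1.0, 2.0]
theta = [[0.5, -1.0, 0.2],
         [0.1, 0.3, 0.4]]
print(predict(h, theta))  # class 2 has the largest thresholded score
```

In this sketch, if every raw score is non-positive, all thresholded scores are zero and the argmax falls back to the first class index. That tie-breaking behavior belongs to this example only.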

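Motivation and Objectives

Motivate the use of ReLU as the final classifier in deep networks, as an alternative to Softmax.
Evaluate DL-ReLU against DL-Softmax on standard benchmarks.
Analyze training convergence, accuracy, and per-class metrics across architectures and datasets.
Identify potential drawbacks of ReLU-based classification and propose future improvements.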

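Proposed Method

Two network types (FFNN and CNN) are used, each with either a Softmax or a ReLU classifier as the final layer.
All models are trained with the Adam optimizer under identical hyperparameters for a fair comparison.
MNIST and Fashion-MNIST are preprocessed with normalization and PCA to reduce dimensionality.
The Softmax cross-entropy loss is replaced with a ReLU-based cross-entropy formulation, and gradients are backpropagated as usual.
Evaluation uses 10-fold cross-validation, test accuracy, precision, recall, F1-score, and confusion matrices.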
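
This summary does not spell out the ReLU-based cross-entropy. One plausible reading, assumed here, is that the ReLU output takes the place of the Softmax probability inside the logarithm, i.e. a loss of the form -sum_k y_k * log(max(0, o_k)). The sketch below contrasts that form with ordinary Softmax cross-entropy for a single example. The function names and the `EPS` constant are assumptions added for illustration and do not come from the source.

```python
import math

EPS = 1e-12  # assumption: small constant that keeps log() finite at zero


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(o - m) for o in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(scores, one_hot):
    """Standard cross-entropy between a one-hot label and softmax(scores)."""
    probs = softmax(scores)
    return -sum(y * math.log(p + EPS) for y, p in zip(one_hot, probs))


def relu_cross_entropy(scores, one_hot):
    """Assumed ReLU variant: log is applied to max(0, o_k), not to a probability."""
    activated = [max(0.0, o) for o in scores]
    return -sum(y * math.log(a + EPS) for y, a in zip(one_hot, activated))


scores = [0.7, -0.4, 1.0]
print(softmax_cross_entropy(scores, [0, 0, 1]))
print(relu_cross_entropy(scores, [0, 0, 1]))   # true class has a positive score
print(relu_cross_entropy(scores, [0, 1, 0]))   # true class was thresholded to zero
```

Under this assumed form the ReLU outputs are not normalized, so the loss is not bounded below by zero, and a true class whose score is thresholded to zero produces a very large loss that depends on the `EPS` guard.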

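Experimental Results

Research Questions

RQ1: Does replacing Softmax with ReLU as the classification layer achieve accuracy comparable to or higher than the Softmax baseline on MNIST, Fashion-MNIST, and WDBC?
RQ2: How does ReLU-based classification affect training convergence and learning dynamics in FFNN and CNN architectures?
RQ3: What per-class performance patterns (precision, recall, F1) appear when ReLU is the final classifier?
RQ4: Which limitations (such as dying ReLU) affect DL-ReLU performance, and how can they be mitigated?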
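
The dying ReLU issue named in RQ4 follows from the derivative of max(0, o): it is 1 for positive inputs and 0 otherwise, so a unit whose input stays non-positive passes no gradient back to its weights. The sketch below illustrates this with a hand-written derivative. The helper names `relu_grad` and `is_dead` are hypothetical, and taking the derivative at exactly zero to be 0 is a common convention assumed here.

```python
def relu_grad(scores):
    """Derivative of max(0, o) with respect to o (0 at o == 0 by convention)."""
    return [1.0 if o > 0 else 0.0 for o in scores]


def is_dead(scores):
    """True when no score is positive, so no gradient flows through the ReLU."""
    return all(g == 0.0 for g in relu_grad(scores))


print(relu_grad([-2.0, 0.0, 3.0]))
print(is_dead([-1.0, -0.5, 0.0]))
print(is_dead([0.3, -1.0]))
```

When ReLU is the final classification layer, an example whose raw scores are all non-positive would contribute no gradient through that layer in this simplified picture.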

Key Findings

Model	Dataset	Training cross-validation accuracy	Test accuracy	Precision	Recall	F1-score
FFNN-Softmax	MNIST	99.29%	97.98%	0.98	0.98	0.98
FFNN-ReLU	MNIST	98.22%	97.77%	0.98	0.98	0.98
CNN-Softmax	MNIST	97.23%	95.36%	0.95	0.95	0.95
CNN-ReLU	MNIST	73.53%	91.74%	0.92	0.92	0.92
FFNN-Softmax	Fashion-MNIST	98.87%	89.35%	0.89	0.89	0.89
FFNN-ReLU	Fashion-MNIST	92.23%	89.06%	0.89	0.89	0.89
CNN-Softmax	Fashion-MNIST	91.96%	86.08%	0.86	0.86	0.86
CNN-ReLU	Fashion-MNIST	83.24%	85.84%	0.86	0.86	0.86
FFNN-Softmax	WDBC	91.21%	92.40%	0.92	0.92	0.92
FFNN-ReLU	WDBC	87.96%	90.64%	0.91	0.91	0.90
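
The precision, recall, and F1 columns are standard quantities derived from a confusion matrix. As a minimal sketch of how per-class values can be computed, the function below assumes rows are true classes and columns are predicted classes. The function name and the toy matrix are hypothetical, and the summary does not state how the per-class values were averaged into the single numbers shown in the table.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall, f1) tuple for each class.

    confusion[i][j] counts examples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


# Toy two-class confusion matrix (hypothetical counts).
toy = [[8, 2],
       [1, 9]]
for label, (p, r, f1) in enumerate(per_class_metrics(toy)):
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```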

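DL-ReLU is often comparable to DL-Softmax: test accuracy is within 0.3 percentage points in three of the five comparisons, with larger gaps for the CNN on MNIST (3.62 points) and the FFNN on WDBC (1.76 points).
On MNIST, FFNN-ReLU nearly matches FFNN-Softmax in test accuracy (97.77% vs. 97.98%).
CNN-ReLU converges more slowly on MNIST, and its cross-validation accuracy trails CNN-Softmax (73.53% vs. 97.23%), yet it reaches 91.74% test accuracy.
On Fashion-MNIST, FFNN-ReLU comes close to FFNN-Softmax (89.06% vs. 89.35% test accuracy).
CNN-ReLU on Fashion-MNIST has lower cross-validation accuracy than CNN-Softmax (83.24% vs. 91.96%) but similar test accuracy (85.84% vs. 86.08%).
On WDBC, FFNN-ReLU falls below FFNN-Softmax in both cross-validation and testing (test accuracy 90.64% vs. 92.40%, F1 0.90 vs. 0.92).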
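
To make the size of each difference explicit, the short script below recomputes the Softmax-minus-ReLU test accuracy gap for every model and dataset pair, using only the test accuracy values from the table. The variable names are illustrative.

```python
# (Softmax test accuracy, ReLU test accuracy) in percent, copied from the table.
test_accuracy = {
    ("FFNN", "MNIST"): (97.98, 97.77),
    ("CNN", "MNIST"): (95.36, 91.74),
    ("FFNN", "Fashion-MNIST"): (89.35, 89.06),
    ("CNN", "Fashion-MNIST"): (86.08, 85.84),
    ("FFNN", "WDBC"): (92.40, 90.64),
}

# Gap in percentage points; positive means Softmax scored higher.
gaps = {key: round(soft - relu, 2)
        for key, (soft, relu) in test_accuracy.items()}

for (model, dataset), gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{model:5s} {dataset:14s} {gap:.2f}")

largest = max(gaps, key=gaps.get)
print("largest gap:", largest)
```

The gaps are 0.21, 0.24, and 0.29 points for FFNN on MNIST, CNN on Fashion-MNIST, and FFNN on Fashion-MNIST, against 1.76 points for FFNN on WDBC and 3.62 points for CNN on MNIST. Softmax is ahead in every pair.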


This review was generated by AI and reviewed by human editors.