QUICK REVIEW

[Paper Review] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, Thomas Unterthiner|arXiv (Cornell University)|Nov 23, 2015

Domain Adaptation and Few-Shot Learning | References: 41 | Citations: 2,311

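One-line summary

ELUs introduce activations that take negative values, which pushes mean activations closer to zero, speeds up learning, improves generalization across deep networks, and outperforms ReLU-family activations on CIFAR and ImageNet.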

ABSTRACT

We introduce the "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to the units with other activation functions. In contrast to ReLUs, ELUs have negative values which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect. While LReLUs and PReLUs have negative values, too, they do not ensure a noise-robust deactivation state. ELUs saturate to a negative value with smaller inputs and thereby decrease the forward propagated variation and information. Therefore, ELUs code the degree of presence of particular phenomena in the input, while they do not quantitatively model the degree of their absence. In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers. On CIFAR-100 ELUs networks significantly outperform ReLU networks with batch normalization while batch normalization does not improve ELU networks. ELU networks are among the top 10 reported CIFAR-10 results and yield the best published result on CIFAR-100, without resorting to multi-view evaluation or model averaging. On ImageNet, ELU networks considerably speed up learning compared to a ReLU network with the same architecture, obtaining less than 10% classification error for a single crop, single model network.

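Motivation and objectives

Motivate an activation function that pushes mean activations toward zero in order to reduce bias shift during learning.
Develop the ELU, which saturates to a negative value, to improve noise robustness and training stability.
Demonstrate the faster convergence and better generalization of ELU-based networks on standard vision benchmarks.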

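Proposed method

Define the ELU activation: f(x) = x for x > 0 and f(x) = α(exp(x) - 1) for x ≤ 0, where α > 0.
Analyze bias shift using the unit natural gradient, showing how the properties of an activation affect learning dynamics.
Compare ELUs with ReLU, Leaky ReLU, and Shifted ReLU.
Evaluate ELU networks with and without batch normalization.
Train deep autoencoders and convolutional networks to assess learning speed and generalization.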
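
The ELU definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation. The function names, the default alpha = 1.0, and the leaky ReLU slope of 0.1 are choices made for this example. The gradient for x ≤ 0 follows directly from the definition: α·exp(x), which equals f(x) + α.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # math.expm1(x) computes exp(x) - 1 accurately for small |x|.
    return x if x > 0 else alpha * math.expm1(x)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) for x <= 0."""
    return 1.0 if x > 0 else alpha * math.exp(x)


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.1) -> float:
    """Leaky ReLU; the slope value is an illustrative choice."""
    return x if x > 0 else slope * x


if __name__ == "__main__":
    # Compare the three activations on a few sample inputs.
    for x in (-5.0, -1.0, 0.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  "
              f"lrelu={leaky_relu(x):+.4f}  elu={elu(x):+.4f}  "
              f"elu'={elu_grad(x):.4f}")
```

For positive inputs all three functions agree. They differ only in how they treat negative inputs: ReLU outputs zero, leaky ReLU keeps decreasing linearly, and ELU levels off near -α.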

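Experimental results

Research questions

RQ1: Do ELUs speed up learning compared with ReLU-based activations across deep networks?
RQ2: Do ELUs improve generalization on standard vision benchmarks such as CIFAR-10/100 and ImageNet?
RQ3: How do ELUs interact with batch normalization compared with other activations?
RQ4: What role does the ELU's negative saturation play in robustness and representation quality?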

Key findings

Network	CIFAR-10 (test error %)	CIFAR-100 (test error %)
AlexNet	18.04	45.80
DSN	7.97	34.57
NiN	8.81	35.68
Maxout	9.38	38.57
All-CNN	7.25	33.71
Highway Network	7.60	32.24
Fract. Max-Pooling	4.50	27.62
ELU-Network	6.55	24.28

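In networks with more than 5 layers, ELU networks learn faster and generalize significantly better than ReLU and Leaky ReLU networks.
On CIFAR-100, ELU networks set a new state of the art (the best published result) without multi-view evaluation or model averaging.
ELU networks outperform ReLU networks with batch normalization on CIFAR-100 and in some CIFAR-10 settings.
On ImageNet, the ELU network converges faster than a comparable ReLU network, reaching 20% top-5 error sooner (160k vs. 200k iterations).
ELU networks maintain lower training and test loss than competing activations across the datasets studied.
ELUs saturate in the negative region, which reduces forward-propagated variation and yields more robust representations.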
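
Two of the properties listed above, mean activations closer to zero and saturation for strongly negative inputs, can be checked numerically on a toy example. The sketch below is not an experiment from the paper. It assumes a symmetric, zero-mean grid of inputs on [-3, 3], α = 1.0, and a leaky ReLU slope of 0.1, all chosen purely for illustration.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # The slope is an illustrative choice for this example.
    return x if x > 0 else slope * x


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * math.expm1(x)


def mean_activation(f, inputs):
    """Average output of activation f over the given inputs."""
    return sum(f(x) for x in inputs) / len(inputs)


# Toy assumption: evenly spaced, zero-mean pre-activations on [-3, 3].
inputs = [i / 100.0 for i in range(-300, 301)]

activations = {"relu": relu, "leaky_relu": leaky_relu, "elu": elu}
means = {name: mean_activation(f, inputs) for name, f in activations.items()}

# How much the output still changes between two strongly negative inputs.
elu_gap = abs(elu(-10.0) - elu(-20.0))
lrelu_gap = abs(leaky_relu(-10.0) - leaky_relu(-20.0))

if __name__ == "__main__":
    for name, value in means.items():
        print(f"mean {name} output: {value:.3f}")
    print(f"ELU output change from x=-10 to x=-20:        {elu_gap:.6f}")
    print(f"leaky ReLU output change from x=-10 to x=-20: {lrelu_gap:.6f}")
```

On this grid the ELU's mean output lies closer to zero than that of ReLU or leaky ReLU, because its negative outputs partly offset the positive ones. The ELU's output also barely changes between the two strongly negative inputs, whereas the leaky ReLU's output keeps changing in proportion to the input. That is the sense in which the ELU has a saturating deactivation state.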

This review was generated by AI and checked by a human editor.