QUICK REVIEW

[Paper Review] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, Thomas Unterthiner | arXiv (Cornell University) | November 23, 2015

Domain Adaptation and Few-Shot Learning | References: 41 | Citations: 2,311

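One-Line Summary

ELUs introduce negative activation values that push mean activations closer to zero, which speeds up learning, improves the generalization of deep networks, and outperforms ReLU variants on CIFAR and ImageNet.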

ABSTRACT

We introduce the "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to the units with other activation functions. In contrast to ReLUs, ELUs have negative values which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect. While LReLUs and PReLUs have negative values, too, they do not ensure a noise-robust deactivation state. ELUs saturate to a negative value with smaller inputs and thereby decrease the forward propagated variation and information. Therefore, ELUs code the degree of presence of particular phenomena in the input, while they do not quantitatively model the degree of their absence. In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers. On CIFAR-100 ELUs networks significantly outperform ReLU networks with batch normalization while batch normalization does not improve ELU networks. ELU networks are among the top 10 reported CIFAR-10 results and yield the best published result on CIFAR-100, without resorting to multi-view evaluation or model averaging. On ImageNet, ELU networks considerably speed up learning compared to a ReLU network with the same architecture, obtaining less than 10% classification error for a single crop, single model network.

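Motivation and Objectives

Argues for an activation function that pushes mean activations toward zero and thereby reduces the bias shift that slows learning.
Develops the ELU, which saturates to a negative value, to improve noise robustness and training stability.
Demonstrates faster convergence and better generalization of ELU-based networks on standard vision benchmarks.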
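
The zero-mean argument can be checked with a toy experiment. The sketch below is not the paper's analysis: it assumes pre-activations drawn from a standard normal distribution and α = 1 (both illustrative choices), and simply compares the average output of ReLU and ELU on the same inputs.

```python
import numpy as np

# Assumption: pre-activations are zero-mean, unit-variance Gaussian samples.
rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(100_000)

# ReLU discards the negative half, so its mean output is strictly positive.
relu_out = np.maximum(pre_activations, 0.0)

# ELU with alpha = 1 (illustrative): negative outputs offset part of the positive mass.
elu_out = np.where(
    pre_activations > 0,
    pre_activations,
    np.expm1(np.minimum(pre_activations, 0.0)),
)

relu_mean = relu_out.mean()
elu_mean = elu_out.mean()
print(f"mean ReLU output: {relu_mean:.3f}")
print(f"mean ELU output:  {elu_mean:.3f}")
```

Under these assumptions the ELU mean lands noticeably closer to zero than the ReLU mean, which is the effect the review attributes to negative activation values.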

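Proposed Method

ELU activation: f(x) = x if x > 0, and f(x) = α(exp(x) - 1) if x ≤ 0, with α > 0.
Analyzes bias shift through the unit natural gradient and shows how activation properties affect learning dynamics.
Compares ELUs with ReLU, Leaky ReLU, and Shifted ReLU on MNIST, CIFAR-10/100, and ImageNet.
Evaluates ELU networks with and without batch normalization.
Trains deep autoencoders and convolutional neural networks to assess learning speed and generalization.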
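
The ELU definition above translates directly into a few lines of NumPy. This is a minimal sketch rather than the authors' implementation; the function names `elu` and `elu_grad` and the default `alpha=1.0` are assumptions made for illustration.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0."""
    x = np.asarray(x, dtype=float)
    # Clamp the exponent argument at 0 so large positive inputs cannot overflow.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) for x <= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-10.0, -1.0, 0.0, 2.0])
y = elu(x)        # saturates toward -alpha on the far left, identity on the right
dy = elu_grad(x)  # gradient shrinks toward 0 as the unit saturates
print(y)
print(dy)
```

The positive branch has gradient 1, which is how the identity mapping alleviates vanishing gradients, while the negative branch approaches -α as the input decreases.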

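Experimental Results

Research Questions

RQ1: Do ELUs speed up learning in deep networks compared with ReLU-based activations?
RQ2: Do ELUs improve generalization on standard vision benchmarks such as CIFAR-10/100 and ImageNet?
RQ3: How do ELUs interact with batch normalization, compared with other activations?
RQ4: What role does the negative saturation of ELUs play in robustness and representation quality?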

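Key Findings

On networks with more than 5 layers, ELU networks learn faster and generalize significantly better than ReLU and Leaky ReLU networks.
On CIFAR-100, ELU networks set a new state of the art (the best published result) without relying on multi-view evaluation or model averaging.
ELU networks outperform ReLU networks with batch normalization in several CIFAR-100 and CIFAR-10 settings.
On ImageNet, the ELU network converges faster than an equivalent ReLU network, reaching the same top-5 error in fewer iterations (160k versus 200k).
ELUs achieve lower training and test loss than competing activations across several datasets.
ELUs saturate in the negative regime, which reduces forward-propagated variation and yields more robust representations.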
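
The saturation claim can be illustrated numerically. The sketch below compares how much a small input perturbation changes the output of a strongly negative unit under ELU and under a leaky ReLU. The base input of -5, the perturbation of ±0.5, α = 1, and the leaky slope of 0.1 are all hypothetical values chosen for illustration, not settings taken from the paper.

```python
import numpy as np

def elu(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def leaky_relu(x, slope=0.1):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, slope * x)

# A deactivated unit (strongly negative input) hit by small input noise.
base = -5.0
noisy_inputs = base + np.array([-0.5, 0.5])

# Output spread = how much of the input noise is propagated forward.
elu_values = elu(noisy_inputs)
lrelu_values = leaky_relu(noisy_inputs)
elu_spread = elu_values.max() - elu_values.min()
lrelu_spread = lrelu_values.max() - lrelu_values.min()

print(f"ELU output spread:        {elu_spread:.4f}")
print(f"Leaky ReLU output spread: {lrelu_spread:.4f}")
```

Because the ELU curve is nearly flat far below zero, the same input noise produces a much smaller output change than it does under the leaky ReLU, whose negative branch stays linear.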

This review was generated by AI and reviewed by a human editor.