QUICK REVIEW

[Paper Review] Learning Activation Functions: A new paradigm for understanding Neural Networks

Mohit Goyal, Rajan Goyal|arXiv (Cornell University)|Jun 23, 2019

Neural Networks and Applications | References: 16 | Citations: 51

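One-sentence summary

This paper introduces Self-Learnable Activation Functions (SLAF), uses learnable polynomials as the activation functions of SLAF-equipped neural networks (SLNNs), proves polynomial representability and universal approximation bounds, and benchmarks SLNNs against standard NNs on regression tasks, classification tasks, and standard datasets.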

ABSTRACT

The scope of research in the domain of activation functions remains limited and centered around improving the ease of optimization or generalization quality of neural networks (NNs). However, to develop a deeper understanding of deep learning, it becomes important to look at the nonlinear component of NNs more carefully. In this paper, we aim to provide a generic form of activation function along with appropriate mathematical grounding so as to allow for insights into the working of NNs in future. We propose "Self-Learnable Activation Functions" (SLAF), which are learned during training and are capable of approximating most of the existing activation functions. SLAF is given as a weighted sum of pre-defined basis elements which can serve for a good approximation of the optimal activation function. The coefficients for these basis elements allow a search in the entire space of continuous functions (consisting of all the conventional activations). We propose various training routines which can be used to achieve performance with SLAF-equipped neural networks (SLNNs). We prove that SLNNs can approximate any neural network with Lipschitz continuous activations, to any arbitrary error, highlighting their capacity and possible equivalence with standard NNs. Also, SLNNs can be completely represented as a collection of finite-degree polynomials up to the very last layer, obviating several hyperparameters like width and depth. Since the optimization of SLNNs is still a challenge, we show that using SLAF along with standard activations (like ReLU) can provide performance improvements with only a small increase in number of parameters.

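Motivation and objectives

Motivate a theoretical and practical rethinking of activation functions in neural networks.
Propose a general, learnable form of activation function built on a polynomial basis.
Examine how SLAF lets SLNNs approximate conventional networks and how such networks can be expressed as polynomial features.
Evaluate the effect on optimization and the empirical performance across regression and classification tasks.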

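Proposed method

Define SLAF as a weighted sum of pre-defined polynomial basis elements (Taylor, Chebyshev, EMFN) that approximates the optimal activation.
Apply mean-variance normalization to the basis elements to stabilize gradients during SGD optimization.
Prove that an SLNN with SLAF activations can approximate any network with Lipschitz-continuous activations (Theorem A and the related discussion).
View the SLNN as a polynomial representation and show that its output can be written as a polynomial of degree at most k in the input features (Theorem 2).
Present training routines and regularization strategies for managing optimization with adaptive polynomial activations.
Compare SLNNs with standard NNs on regression and classification benchmarks and analyze the effects on representation and optimization.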
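
As a rough illustration of the first two steps above (a weighted sum of polynomial basis elements, with mean-variance normalization of each element), here is a minimal NumPy sketch. It assumes a plain monomial (Taylor-style) basis, normalizes each power over the batch, and leaves the constant term unnormalized. The function name `slaf`, the `eps` constant, and the use of batch statistics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def slaf(x, coeffs, eps=1e-5):
    """Hypothetical SLAF forward pass: coeffs[0] + sum_i coeffs[i] * norm(x**i).

    Assumes a 1-D batch of pre-activations and at least one non-constant
    term (len(coeffs) >= 2).
    """
    x = np.asarray(x, dtype=float).ravel()
    coeffs = np.asarray(coeffs, dtype=float)
    k = coeffs.size - 1  # polynomial degree of the activation
    # Basis elements x^1 ... x^k, one column per power: shape (batch, k).
    powers = np.stack([x ** i for i in range(1, k + 1)], axis=1)
    # Mean-variance normalization per basis element (assumed batch-wise).
    normed = (powers - powers.mean(axis=0)) / (powers.std(axis=0) + eps)
    # Weighted sum; in training, coeffs would be updated by SGD.
    return coeffs[0] + normed @ coeffs[1:]

x = np.linspace(-2.0, 2.0, 9)
y = slaf(x, [0.5, 1.0, 0.0, 0.0])  # degree-3 SLAF with only the linear term active
```

Without the normalization, the columns `x ** i` grow or shrink rapidly with `i` once |x| moves away from 1, which is one plausible reason the review ties normalization to gradient stability.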

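Experimental results

Research questions

RQ1: Can SLAF effectively approximate the space of common activation functions using finite-degree polynomials?
RQ2: Do SLNNs perform on par with or better than standard NNs on benchmark tasks?
RQ3: How does the SLAF degree affect approximation, optimization, and generalization?
RQ4: How does the polynomial representation affect the learning dynamics and parameter efficiency of SLNNs?
RQ5: What are the theoretical bounds on the number of parameters needed to express an SLNN as polynomial features?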
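
To make RQ1 and RQ3 concrete, the sketch below fits finite-degree polynomials to a common activation (tanh) and reports how the worst-case error changes with degree. It uses an offline least-squares Chebyshev fit from NumPy rather than SGD training, so it only illustrates the approximation question, not the paper's training routine. The interval [-3, 3], the sample count, and the helper name `fit_activation` are assumptions chosen for the example.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fit_activation(fn, degree, lo=-3.0, hi=3.0, n=400):
    """Least-squares polynomial fit of `fn` on [lo, hi]; returns (poly, max_abs_error)."""
    x = np.linspace(lo, hi, n)
    poly = Chebyshev.fit(x, fn(x), degree)  # fit expressed in a Chebyshev basis
    max_err = float(np.max(np.abs(poly(x) - fn(x))))
    return poly, max_err

# Compare a few degrees on tanh.
errors = {}
for degree in (3, 5, 9):
    _, errors[degree] = fit_activation(np.tanh, degree)
    print(f"degree {degree}: max |error| = {errors[degree]:.4f}")
```

The fit is only meaningful on the chosen interval: outside it, any polynomial eventually diverges from a bounded activation, which is why keeping pre-activations in a controlled range matters for polynomial activations.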

Key findings

Algorithm	Degree/Description	Training RMSE	Testing RMSE
NNRELUBN	2 Hidden Layers	1.32	3.78
SLNN	k=8, k1=4, k2=2	2.09	3.98
LRSGD	Degree=8, Penalty=0.01	22.03	22.69
LLS	Degree=8, Penalty=0.01	1.59	3.06

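With a Taylor polynomial basis and learned coefficients, SLAF can approximate many existing activation functions.
SLNNs can perform on par with standard NNs on several tasks, with equal or better accuracy in some cases.
An SLNN admits a polynomial representation up to a degree determined by its SLAFs, and that degree bounds the effective feature space.
Normalizing the basis functions mitigates exploding and vanishing gradients and stabilizes training.
Using SLAF together with standard activations yields performance gains for only a small increase in parameters.
Across various settings, the paper reports competitive or better accuracy on MNIST, CIFAR-10, and Fashion-MNIST.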
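
The polynomial-representation finding can be checked on a toy case. Composing a degree-4 polynomial with a degree-2 polynomial gives degree 4 × 2 = 8, and the affine maps between layers do not raise the degree. This matches the SLNN table row (k=8, k1=4, k2=2) if k1 and k2 are read as per-layer SLAF degrees, a reading this review does not confirm. The sketch also counts the monomials of degree at most k in n inputs with the standard formula C(n + k, k); that is a general combinatorial fact, not necessarily the bound the paper states. The coefficients and the helper name `num_monomials` are made up for illustration.

```python
from math import comb
from numpy.polynomial import Polynomial

def num_monomials(n_inputs, degree):
    """Number of monomials of total degree <= `degree` in `n_inputs` variables."""
    return comb(n_inputs + degree, degree)

# Two hypothetical single-unit "layers": a degree-4 SLAF followed by a degree-2 SLAF.
inner = Polynomial([0.1, 1.0, 0.0, 0.0, 0.5])  # degree 4
outer_coeffs = [0.0, 1.0, 2.0]                 # degree 2, lowest power first

# Compose outer(inner(x)) with Horner's rule using polynomial arithmetic.
composed = Polynomial([0.0])
for c in reversed(outer_coeffs):
    composed = composed * inner + c

print(composed.degree())      # 8
print(num_monomials(2, 2))    # 6: 1, x, y, x^2, xy, y^2
print(num_monomials(10, 8))   # 43758
```

The last line shows how quickly the feature space grows: with 10 inputs and degree 8 there are already 43,758 candidate monomials, so the degree effectively sets the size of the feature space.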

This review was generated by AI and checked by a human editor.