QUICK REVIEW

[Paper Review] A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

Umut Şimşekli, Levent Sagun|arXiv (Cornell University)|Jan 17, 2019

Gaussian Processes and Bayesian Inference|References: 57|Citations: 69

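One-sentence summary

This paper shows that stochastic gradient noise in deep networks is heavy-tailed (α-stable), analyzes SGD as a Lévy-driven SDE, and confirms non-Gaussian tails and two distinct SGD phases in experiments.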

ABSTRACT

The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed $\alpha$-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the $\alpha$-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and admits heavy-tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
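
To make the abstract's two modeling steps concrete, the following sketch writes out the symmetric α-stable (SαS) law and the Lévy-driven SDE that it leads to. The notation ($\tilde{f}_k$ for the minibatch loss, $U_k$ for the gradient noise, $\sigma$ for a constant noise scale, $L^{\alpha}_t$ for the α-stable Lévy motion) is assumed here for illustration and is not taken from this review. The derivation is per coordinate and assumes the noise scale does not depend on the current iterate.

```latex
% Symmetric alpha-stable variable X with scale \sigma > 0 and tail index \alpha \in (0, 2],
% defined through its characteristic function:
\mathbb{E}\left[ e^{i \omega X} \right] = \exp\left( -|\sigma \omega|^{\alpha} \right)
% \alpha = 2 recovers a Gaussian with variance 2\sigma^2, \alpha = 1 is the Cauchy law,
% and for \alpha < 2 the variance is infinite (heavy tails).

% SGD with step size \eta, written as gradient descent plus gradient noise
% U_k = \nabla \tilde{f}_k(w_k) - \nabla f(w_k):
w_{k+1} = w_k - \eta \nabla f(w_k) - \eta U_k

% Assumption: U_k has the same law as \sigma S_k with S_k standard S\alpha S.
% An alpha-stable Levy motion has increments L^{\alpha}_{t+\eta} - L^{\alpha}_t
% distributed as \eta^{1/\alpha} S_k, so
%   \eta \sigma S_k = \eta^{(\alpha - 1)/\alpha} \sigma \cdot \eta^{1/\alpha} S_k,
% and the SGD recursion is an Euler step (step size \eta) of
\mathrm{d}W_t = -\nabla f(W_t)\,\mathrm{d}t
              + \eta^{(\alpha - 1)/\alpha}\, \sigma \,\mathrm{d}L^{\alpha}_t
```

For α = 2 the driving process is a scaled Brownian motion and the familiar Gaussian SDE is recovered; for α < 2 the paths of $L^{\alpha}_t$ are discontinuous, which is the source of the 'jumps' mentioned in the abstract.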

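Motivation and objectives

Question the Gaussian-noise assumption for SGD and the CLT-based SDE analyses built on it.
Propose and validate an α-stable (heavy-tailed) model of stochastic gradient noise.
Connect the tail behavior to SGD dynamics and, through metastability theory, to the tendency of SGD to find wide minima.
Empirically characterize how the tail index α changes with architecture, dataset, and minibatch size.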

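Proposed method

Adopts a symmetric α-stable (SαS) noise model, with tail index α, for the stochastic gradient noise.
Derives a Lévy-driven SDE as the continuous-time limit of SGD when α < 2.
Estimates α from gradient-noise samples using a tail-index estimator designed for α-stable distributions.
Runs extensive experiments with FCN and CNN architectures on MNIST, CIFAR-10, and CIFAR-100, varying depth, width, and minibatch size.
Analyzes metastability and first-exit behavior under Lévy noise, highlighting jumps and two SGD phases.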
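
The estimation step can be illustrated with a small sketch. This review does not say which estimator the authors used, so the code below is an assumed, illustrative choice: a block-sum log-moment estimator that relies on the stability property. If X_1, ..., X_m are i.i.d. SαS, their sum has the same law as m^(1/α) X_1, so E[log|sum|] - E[log|X|] = (1/α) log m. The function name `estimate_tail_index` and the parameter `block_size` are hypothetical.

```python
import numpy as np


def estimate_tail_index(samples, block_size):
    """Estimate alpha for (assumed) symmetric alpha-stable samples.

    Uses the identity E[log|sum of m samples|] - E[log|X|] = log(m) / alpha,
    which holds for i.i.d. symmetric alpha-stable variables.
    """
    x = np.asarray(samples, dtype=float).ravel()
    n_blocks = x.size // block_size
    if block_size < 2 or n_blocks < 1:
        raise ValueError("need block_size >= 2 and at least one full block")

    # Drop the remainder so the samples split evenly into blocks.
    x = x[: n_blocks * block_size]
    block_sums = x.reshape(n_blocks, block_size).sum(axis=1)

    # A tiny offset guards against log(0) for exactly-zero entries.
    tiny = np.finfo(float).tiny
    mean_log_sums = np.mean(np.log(np.abs(block_sums) + tiny))
    mean_log_x = np.mean(np.log(np.abs(x) + tiny))

    inv_alpha = (mean_log_sums - mean_log_x) / np.log(block_size)
    return 1.0 / inv_alpha


# Sanity check on two laws with known tail index:
# Cauchy is alpha-stable with alpha = 1, Gaussian with alpha = 2.
rng = np.random.default_rng(0)
alpha_cauchy = estimate_tail_index(rng.standard_cauchy(1_000_000), block_size=1000)
alpha_gauss = estimate_tail_index(rng.standard_normal(1_000_000), block_size=1000)
print(f"Cauchy: {alpha_cauchy:.2f}, Gaussian: {alpha_gauss:.2f}")
```

To apply this to a network, one plausible setup (again an assumption, not a detail from this review) is to take the difference between a minibatch gradient and the full-batch gradient at the same parameters, flatten it, and pass the resulting vector as `samples`. The estimate is only meaningful to the extent that the SαS assumption holds.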

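Experimental results

Research questions

RQ1: Is stochastic gradient noise in deep networks α-stable (heavy-tailed) rather than Gaussian?
RQ2: How does the tail index α vary with network size, architecture, dataset, and minibatch size?
RQ3: How does α-stable noise affect SGD dynamics, metastability, and the preference for wide minima?
RQ4: Do the dynamics in early iterations exhibit a jump in α, and does that jump correlate with accuracy gains?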

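Key findings

Stochastic gradient noise is highly non-Gaussian and heavy-tailed in every setting tested.
Increasing the minibatch size has almost no effect on the tail index α.
The tail index α depends on architecture, dataset, and network size, and it influences SGD dynamics.
Two distinct SGD phases are observed: α drops rapidly early on, then jumps, after which α stabilizes as accuracy improves.
The two-phase behavior supports metastability theory; the jump occurs when α is at its lowest value.
On the CIFAR datasets, α falls in the 1.0–1.2 range in many settings, indicating heavy tails.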
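
The role of jumps in the findings above can be illustrated with a toy simulation. This is not an experiment from the paper: it is a one-dimensional sketch, under assumed parameters, of the SDE dW = -W dt + eps dL^α (a quadratic basin centered at 0) discretized with an Euler scheme. It compares how often paths leave the interval (-1, 1) under Gaussian noise (α = 2) and under heavy-tailed noise (α = 1.5) of the same nominal scale. The helper names `sample_sas` and `exit_fraction` and all default values are hypothetical.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Standard symmetric alpha-stable draws (Chambers-Mallows-Stuck method)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def exit_fraction(alpha, eps=0.05, dt=0.01, n_steps=20_000,
                  n_paths=200, half_width=1.0, seed=0):
    """Fraction of simulated paths that leave (-half_width, half_width)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_paths)
    exited = np.zeros(n_paths, dtype=bool)
    for _ in range(n_steps):
        # Euler step: drift -w * dt, Levy increment scaled by dt**(1/alpha).
        noise = sample_sas(alpha, n_paths, rng)
        w = w - dt * w + eps * dt ** (1.0 / alpha) * noise
        exited |= np.abs(w) >= half_width
    return exited.mean()


frac_gauss = exit_fraction(alpha=2.0)
frac_heavy = exit_fraction(alpha=1.5)
print(f"exit fraction, alpha=2.0: {frac_gauss:.2f}")
print(f"exit fraction, alpha=1.5: {frac_heavy:.2f}")
```

With these settings the Gaussian paths stay close to the minimum, because the boundary is many standard deviations away, while a sizeable share of the heavy-tailed paths leave the basin through a single large increment. That is the qualitative mechanism the review attributes to Lévy-driven dynamics; the specific numbers depend entirely on the assumed parameters.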

This review was generated by AI and checked by a human editor.