QUICK REVIEW

[Paper Review] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Yoshua Bengio, Nicholas Léonard|arXiv (Cornell University)|Aug 15, 2013

Stochastic Gradient Optimization Techniques|References: 11|Citations: 2,001

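One-line summary

The authors compare four gradient estimation strategies for stochastic or non-smooth neurons and demonstrate their use in a conditional computation setting, where gating units switch parts of the network on or off.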

ABSTRACT

Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? I.e., can we "back-propagate" through these stochastic neurons? We examine this question, existing approaches, and compare four families of solutions, applicable in different settings. One of them is the minimum variance unbiased gradient estimator for stochastic binary neurons (a special case of the REINFORCE algorithm). A second approach, introduced here, decomposes the operation of a binary stochastic neuron into a stochastic binary part and a smooth differentiable part, which approximates the expected effect of the pure stochastic binary neuron to first order. A third approach involves the injection of additive or multiplicative noise in a computational graph that is otherwise differentiable. A fourth approach heuristically copies the gradient with respect to the stochastic output directly as an estimator of the gradient with respect to the sigmoid argument (we call this the straight-through estimator). To explore a context where these estimators are useful, we consider a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off in combinatorially many ways large chunks of the computation performed in the rest of the neural network. In this case, it is important that the gating units produce an actual 0 most of the time. The resulting sparsity can potentially be exploited to greatly reduce the computational cost of large deep networks for which conditional computation would be useful.

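Motivation and objectives

Motivates gradient estimation through stochastic or non-smooth neurons as a building block for conditional computation.
Reviews and compares four families of gradient estimators: an unbiased estimator, a stochastic-times-smooth decomposition, differentiable graphs with injected noise, and the straight-through method.
Shows that training is feasible with stochastic gates that selectively activate parts of a large network.
Evaluates the practical performance of the proposed methods on a gating/experts architecture with a sparsity constraint.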

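Proposed methods

Formulates stochastic neurons in the form h_i = f(a_i, z_i) and derives where gradients can flow.
Introduces four approaches: (i) an unbiased (REINFORCE-like) gradient estimator for stochastic binary neurons; (ii) a decomposition of the stochastic binary neuron into a stochastic binary part and a smooth first-order approximation; (iii) noise injected into an otherwise differentiable graph; (iv) a straight-through estimator that propagates gradients through binary/stochastic gates.
Proposes and analyzes the Noisy Rectifier, STS (Stochastic Times Smooth), ST (Straight-Through), and unbiased REINFORCE-based estimators.
Discusses variance reduction through centered estimators and unit-specific baselines.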
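
The unbiasedness of approach (i) can be checked by hand for a single stochastic binary neuron. The sketch below assumes h_i takes values in {0, 1} with P(h_i = 1) = sigm(a_i) = p, and writes L(1) and L(0) for the loss at each outcome with everything else held fixed; this notation is introduced here for illustration and is not taken from the review.

```latex
\begin{align*}
\mathbb{E}[L] &= p\,L(1) + (1-p)\,L(0), \qquad p = \operatorname{sigm}(a_i),\\
\frac{\partial\,\mathbb{E}[L]}{\partial a_i}
  &= \frac{\partial p}{\partial a_i}\,\bigl(L(1) - L(0)\bigr)
   = p(1-p)\,\bigl(L(1) - L(0)\bigr),\\
\mathbb{E}\bigl[(h_i - p)\,L\bigr]
  &= p\,(1-p)\,L(1) + (1-p)\,(0-p)\,L(0)
   = p(1-p)\,\bigl(L(1) - L(0)\bigr).
\end{align*}
```

Under these assumptions the REINFORCE-style quantity (h_i - p) L has the gradient of the expected loss as its mean. Replacing L by L - c for any constant c leaves that mean unchanged, because E[h_i - p] = 0, which is why a baseline can reduce variance without introducing bias.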

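Experimental results

Research questions

RQ1: Can gradients be propagated effectively through stochastic or non-smooth neurons?
RQ2: Which gradient estimators provide unbiased or low-variance updates for stochastic binary or gated units?
RQ3: Do stochastic gates enable meaningful conditional computation and a reduction in computational cost?
RQ4: How do these estimators perform in practice on a gating/experts network trained on MNIST?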

Key findings

Method	Train	Validation	Test
Noisy Rectifier	6.7e-4	1.52	1.87
Straight-through	3.3e-3	1.42	1.39
Smooth Times Stoch.	4.4e-3	1.86	1.96
Stoch. Binary Neuron	9.9e-3	1.78	1.89
Baseline Rectifier	6.3e-5	1.66	1.60
Baseline Sigmoid+Noise	1.8e-3	1.88	1.87
Baseline Sigmoid	3.2e-3	1.97	1.92

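The unbiased gradient estimator for stochastic binary neurons is proved to be unbiased with respect to the gradient of the expected loss.
STS units and the Noisy Rectifier show favorable properties and allow gradients to flow through stochastic gating.
Despite being biased, the straight-through estimator works surprisingly well in practice and often gives the best validation/test results in the experiments.
With stochastic gaters controlling the gating, computation can be reduced by gating roughly 10% of the units, with only a modest effect on performance.
All of the tested estimators allow training to proceed, and noise injection can improve both the training objective and generalization.
In the reported MNIST experiments, Straight-Through units achieve the best validation and test error (1.42 and 1.39 in the table above).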
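
The contrast between the unbiased estimator and the straight-through estimator can be illustrated with a small Monte Carlo check on one stochastic binary neuron. This is a minimal sketch, not the paper's experimental setup: the quadratic `toy_loss`, the pre-activation value, and all function names are assumptions made here for illustration, and the straight-through variant shown simply copies dL/dh as the gradient with respect to the sigmoid argument, as described in the abstract.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def toy_loss(h):
    # Hypothetical loss, chosen only for illustration.
    return (h - 0.8) ** 2


def toy_loss_grad(h):
    # Derivative of toy_loss with respect to h.
    return 2.0 * (h - 0.8)


def true_gradient(a, loss):
    # Exact d E[L] / da for one binary neuron with P(h = 1) = sigmoid(a).
    p = sigmoid(a)
    return p * (1.0 - p) * (loss(1.0) - loss(0.0))


def sample_neuron(a, rng):
    # Stochastic binary neuron: h = 1 with probability sigmoid(a).
    return 1.0 if rng.random() < sigmoid(a) else 0.0


def reinforce_estimate(a, loss, rng):
    # Unbiased estimator: (h - sigmoid(a)) * L(h).
    h = sample_neuron(a, rng)
    return (h - sigmoid(a)) * loss(h)


def straight_through_estimate(a, loss_grad, rng):
    # Straight-through: use dL/dh at the sampled h as the gradient w.r.t. a.
    h = sample_neuron(a, rng)
    return loss_grad(h)


def mean_estimate(estimator, n):
    # Average n independent draws of a zero-argument estimator.
    return sum(estimator() for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    a = 0.5
    print("exact gradient      ", true_gradient(a, toy_loss))
    print("REINFORCE mean      ",
          mean_estimate(lambda: reinforce_estimate(a, toy_loss, rng), 100000))
    print("straight-through mean",
          mean_estimate(lambda: straight_through_estimate(a, toy_loss_grad, rng), 100000))
```

On this toy problem the REINFORCE average should settle near the exact gradient, while the straight-through average converges to a different value with the same sign. That is consistent with the finding that a biased estimator can still give a useful descent direction, but a one-neuron example says nothing about the MNIST results in the table.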

This review was generated by AI and checked by a human editor.