QUICK REVIEW

[Paper Review] Swapout: Learning an ensemble of deep architectures

Saurabh Singh, Derek Hoiem|arXiv (Cornell University)|May 20, 2016

Advanced Neural Network Applications|References: 18|Citations: 105

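One-line summary

Swapout is a stochastic training method that generalizes dropout and stochastic depth: it samples an ensemble of architectures at both the unit level and the layer level, improves accuracy over ResNets of the same depth, and lets very wide, shallow networks match much deeper models.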

ABSTRACT

We describe Swapout, a new stochastic training method, that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100 matching state of the art accuracy. Remarkably, our 32 layer wider model performs similar to a 1001 layer ResNet model.

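Motivation and objectives

Motivate regularization and architectural diversity for deep networks beyond what dropout and stochastic depth provide.
Develop a generalized stochastic training framework (Swapout) that samples from a rich set of architectures.
Evaluate Swapout on CIFAR-10 and CIFAR-100 against ResNets and baseline stochastic methods.
Show that wider, shallower Swapout models can match or exceed very deep residual networks.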

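Proposed method

Define Swapout as a per-unit stochastic choice among several options: 0, X, F(X), and X+F(X).
Show that Swapout generalizes dropout and stochastic depth, recovering each as a special case.
Provide a Lipschitz-based stability argument that connects Swapout to the stability of SGD, indicating stability comparable to that of dropout.
Compare two inference approaches: deterministic (using expected values) and stochastic (sampling multiple network instances).
Run experiments on CIFAR-10/100 with ResNet-like blocks, using both deterministic and stochastic inference while varying network width and depth.
Present parameter-efficiency results showing that wider, shallower Swapout networks can match very deep ResNets.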
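
The per-unit choice among 0, X, F(X), and X+F(X) can be written as Y = Θ1 ⊙ X + Θ2 ⊙ F(X), where Θ1 and Θ2 are independent Bernoulli masks. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names and the parameters `p_skip` and `p_res` are hypothetical, chosen here for illustration only. Setting Θ1 to all zeros leaves a dropout-style mask on F(X); keeping Θ1 at one and sharing a single draw of Θ2 across the whole layer would correspond to stochastic depth.

```python
import numpy as np

def swapout(x, fx, p_skip, p_res, rng):
    """Sample one swapout output from input x and block output fx = F(x).

    p_skip and p_res (hypothetical names) are the probabilities that a unit
    keeps its identity path and its residual path, respectively.
    """
    theta1 = rng.random(x.shape) < p_skip  # per-unit mask on X
    theta2 = rng.random(x.shape) < p_res   # per-unit mask on F(X)
    return theta1 * x + theta2 * fx

def swapout_expected(x, fx, p_skip, p_res):
    """Deterministic variant: replace each mask by its expected value."""
    return p_skip * x + p_res * fx

rng = np.random.default_rng(0)
x = np.ones((2, 3))            # stand-in for a layer input X
fx = np.full((2, 3), 0.5)      # stand-in for the block output F(X)
y = swapout(x, fx, 0.5, 0.5, rng)
# Each entry of y is one of 0, X, F(X), or X + F(X) for that unit.
```

With `p_skip = p_res = 1.0` the sketch reduces to an ordinary residual connection, X + F(X).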

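Experimental results

Research questions

RQ1: Can Swapout improve accuracy over ResNets of comparable depth on CIFAR-10/CIFAR-100?
RQ2: Does widening the network under Swapout yield gains that rival deeper architectures?
RQ3: How do different stochastic training schedules (per-layer Bernoulli parameters) affect performance?
RQ4: Does stochastic inference (sampling and averaging multiple forward passes) benefit Swapout more than deterministic inference?
RQ5: How does the relationship between parameter efficiency and performance for Swapout compare with baseline methods?

Key findings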

Method	Parameters	Error (%)
DropConnect [20]	-	9.32
NIN [11]	-	8.81
FitNet(19) [15]	-	8.39
DSN [10]	-	7.97
Highway [18]	-	7.60
ResNet v1(110) [4]	1.7M	6.41
Stochastic Depth v1(110) [6]	1.7M	24.58
ResNet v2 Ours (20) [5]	1.7M	28.08
SwapOut v1(20) W×2	1.09M	6.58
ResNet v2 (1001) [5]	10.2M	4.92
SwapOut v2(32) W×4	7.43M	4.76
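
As a rough worked check of the parameter-efficiency comparison, using only the last two rows of the table above (ResNet v2 (1001) and SwapOut v2(32) W×4):

```latex
\frac{7.43\,\text{M}}{10.2\,\text{M}} \approx 0.73,
\qquad
4.92\% - 4.76\% = 0.16 \text{ percentage points}
```

By these figures, the 32-layer Swapout model uses roughly 27% fewer parameters while reporting a slightly lower error.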

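Swapout improves accuracy over the corresponding ResNet baselines on CIFAR-10 and CIFAR-100.
A wide 32-layer Swapout model matches the performance of a 1001-layer ResNet on both CIFAR datasets.
Increasing width with Swapout yields notable gains, in some cases surpassing deeper ResNets that have more parameters.
Stochastic inference (averaging predictions over multiple samples) consistently outperforms deterministic inference.
The choice of stochastic training schedule strongly affects performance; less randomness in the early layers generally works better.
Swapout achieves gains in parameter efficiency, sometimes outperforming deeper models with more parameters.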
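
The two findings on schedules and stochastic inference can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny network with F(x) = tanh(xW), the linear schedule endpoints (1.0 at the first block, 0.5 at the last), the sample count of 30, and all function names. It shows the mechanics only (less randomness in early blocks, then averaging predictions over sampled forward passes) and does not reproduce the paper's experiments.

```python
import numpy as np

def linear_schedule(num_blocks, first=1.0, last=0.5):
    """Per-block keep probability, interpolated linearly from first to last."""
    return np.linspace(first, last, num_blocks)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_forward(x, weights, keep_probs, rng):
    """One sampled pass through toy swapout blocks with F(x) = tanh(x @ W)."""
    for w, p in zip(weights, keep_probs):
        fx = np.tanh(x @ w)
        theta1 = rng.random(x.shape) < p   # mask on the identity path
        theta2 = rng.random(x.shape) < p   # mask on the residual path
        x = theta1 * x + theta2 * fx
    return softmax(x)

def stochastic_predict(x, weights, keep_probs, rng, num_samples=30):
    """Average class probabilities over several sampled forward passes."""
    samples = [toy_forward(x, weights, keep_probs, rng) for _ in range(num_samples)]
    return np.mean(samples, axis=0)

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(4, 4)) for _ in range(3)]
keep_probs = linear_schedule(len(weights))  # 1.0, 0.75, 0.5: early blocks are less random
x = rng.normal(size=(2, 4))
avg_probs = stochastic_predict(x, weights, keep_probs, rng)
```

Each row of `avg_probs` is still a probability distribution, since it is an average of softmax outputs.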

This review was generated by AI and checked by a human editor.