QUICK REVIEW

[Paper Review] Shake-Shake regularization

Xavier Gastaldi|arXiv (Cornell University)|May 21, 2017

Advanced Neural Network Applications | References: 19 | Citations: 313

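One-line summary

Shake-Shake replaces the standard summation of parallel branches in a multi-branch network with a stochastic affine combination during training, which improves generalization and achieves state-of-the-art results on CIFAR.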

ABSTRACT

The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake

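Motivation and objectives

Motivate regularization for multi-branch networks that goes beyond BatchNorm and dropout.
Propose a stochastic affine combination of the residual branches during training.
Evaluate Shake-Shake against recent baselines on CIFAR-10 and CIFAR-100.
Explore training-time versus inference-time behavior and the role of architectural elements (skip connections, BN).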

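Proposed method

During training, replace the residual sum x_{i+1}=x_i+F(x_i,W^{(1)})+F(x_i,W^{(2)}) with x_{i+1}=x_i+α_i F(x_i,W^{(1)})+(1−α_i) F(x_i,W^{(2)}), where α_i ∼ Uniform(0,1).
At test time, set every α_i to its expected value, 0.5.
Resample α_i independently before each forward pass and each backward pass, creating stochastic forward and backward flows (intended as a form of gradient augmentation).
Use 2-branch ResNets (and a variant in the 3-branch setting) to compare forward/backward strategies (Shake, Even, Keep) and update levels (Batch, Image).
Investigate regularization strength through the interaction between the backward coefficients β_i.j and the forward coefficients α_i.j, and analyze the alignment and correlation between residual branches.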
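
The update rule above can be illustrated with a minimal NumPy sketch. It is not the author's Torch implementation: the function names (sample_coeff, shake_forward, shake_backward) are hypothetical, the branch outputs are stand-in arrays rather than real convolutional branches, and the reading of "Keep" as "reuse the forward coefficient in the backward pass" is an assumption that this review does not spell out.

```python
import numpy as np

def sample_coeff(mode, rng, kept=None):
    """Return a branch-mixing coefficient for one pass.

    "Shake": fresh draw from Uniform(0, 1).
    "Even":  fixed at the expected value 0.5 (also the test-time setting).
    "Keep":  reuse the forward-pass coefficient (assumed interpretation).
    """
    if mode == "Shake":
        return float(rng.uniform(0.0, 1.0))
    if mode == "Even":
        return 0.5
    if mode == "Keep":
        if kept is None:
            raise ValueError("Keep needs the forward-pass coefficient")
        return kept
    raise ValueError(f"unknown mode: {mode}")

def shake_forward(x, f1, f2, alpha):
    """x_{i+1} = x_i + alpha * F1 + (1 - alpha) * F2."""
    return x + alpha * f1 + (1.0 - alpha) * f2

def shake_backward(grad_out, beta):
    """Split the upstream gradient between the two branch outputs,
    scaling by beta instead of the forward coefficient alpha."""
    return beta * grad_out, (1.0 - beta) * grad_out

rng = np.random.default_rng(0)
x = np.ones(4)            # stand-in for the block input x_i
f1 = np.full(4, 2.0)      # stand-in for F(x_i, W^(1))
f2 = np.full(4, 4.0)      # stand-in for F(x_i, W^(2))

alpha = sample_coeff("Shake", rng)             # forward pass: Shake
y_train = shake_forward(x, f1, f2, alpha)
beta = sample_coeff("Even", rng)               # backward pass: Even
g1, g2 = shake_backward(np.ones(4), beta)

y_test = shake_forward(x, f1, f2, 0.5)         # test time: alpha = 0.5
```

In an autograd framework, the same effect would typically be obtained with a custom function whose backward pass ignores the stored α and applies β; the explicit split here only makes the two coefficients visible.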

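Experimental results

Research questions

RQ1: Does stochastic affine blending of residual branches improve generalization on CIFAR-10/100 beyond standard regularization methods?
RQ2: How do the forward and backward perturbations (Shake vs Keep vs Even) and the level at which coefficients are applied (batch or image) affect performance?
RQ3: What role do architectural elements (skip connections, BatchNorm) play in making Shake-Shake regularization possible?
RQ4: How does Shake-Shake affect the correlation and alignment between branches across layers?
RQ5: What factors govern the strength and dynamics of the regularization, and how can they be tuned?

Key findings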

Forward	Backward	Level	CIFAR-10 Error % (avg) - 26 2x32d	CIFAR-10 Error % (avg) - 26 2x64d	CIFAR-10 Error % (avg) - 26 2x96d
Even	Even	n/a	4.27	3.76	3.58
Even	Shake	Batch	4.44	-	-
Shake	Keep	Batch	4.11	-	-
Shake	Even	Batch	3.47	3.30	-
Shake	Shake	Batch	3.67	3.07	-
Even	Shake	Image	4.11	-	-
Shake	Keep	Image	4.09	-	-
Shake	Even	Image	3.47	3.20	-
Shake	Shake	Image	3.55	2.98	2.86
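
The Level column distinguishes how many coefficients are drawn per mini-batch. As a hedged sketch, assuming "Batch" means one coefficient shared by every image in the mini-batch and "Image" means one coefficient per image, the difference comes down to the shape of the sampled tensor. The helper names sample_alpha and mix_branches are hypothetical, and the branch outputs are random stand-ins in (N, C, H, W) layout.

```python
import numpy as np

def sample_alpha(batch_size, level, rng):
    """Draw mixing coefficients that broadcast over (N, C, H, W) tensors."""
    if level == "Batch":
        shape = (1, 1, 1, 1)             # one coefficient for the whole mini-batch
    elif level == "Image":
        shape = (batch_size, 1, 1, 1)    # one coefficient per image
    else:
        raise ValueError(f"unknown level: {level}")
    return rng.uniform(0.0, 1.0, size=shape)

def mix_branches(f1, f2, alpha):
    """Affine combination of the two residual branch outputs."""
    return alpha * f1 + (1.0 - alpha) * f2

rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 3, 4, 4))   # stand-in output of branch 1
f2 = rng.normal(size=(8, 3, 4, 4))   # stand-in output of branch 2

a_batch = sample_alpha(8, "Batch", rng)
a_image = sample_alpha(8, "Image", rng)
mixed_batch = mix_branches(f1, f2, a_batch)
mixed_image = mix_branches(f1, f2, a_image)
```

With image-level sampling, each image in the mini-batch sees a different blend of the two branches within a single step, which is one plausible reason it acts as the stronger perturbation.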

Shake-Shake with 2x32d/64d/96d branches achieves 3.55%, 2.98%, and 2.86% CIFAR-10 error respectively (average of 3–5 runs), outperforming many single-shot baselines.
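Image-level coefficient application tends to yield lower error than batch-level application for the Shake forward configurations (e.g., 3.55% vs 3.67% for Shake-Shake on 26 2x32d), suggesting a stronger regularization effect.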
Removing skip connections or BatchNorm reveals that Shake-Shake can still regularize, but success depends on architecture and hyperparameters; some configurations diverge without BN or with too-strong coupling.
Correlation between the outputs of the two residual branches decreases under Shake-Shake, suggesting decorrelation promotes diverse learning between branches.
Backward-pass coefficient design critically affects learning; unintended configurations (e.g., β_i.j = 1−α_i.j) can drastically harm training, indicating sensitivity to coefficient alignment and timing.
CIFAR-100 results show Shake-Even-Image reduces error to 15.85% on a ResNeXt-29 2x4x64d variant, indicating cross-dataset regularization benefits.
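
The branch-decorrelation finding above could be checked with a simple measurement. The sketch below is one possible approach, not necessarily the procedure used in the paper: it flattens the two branch outputs and computes their Pearson correlation with np.corrcoef. The function name branch_correlation and the synthetic arrays are illustrative assumptions.

```python
import numpy as np

def branch_correlation(out1, out2):
    """Pearson correlation between two flattened branch outputs."""
    a = np.asarray(out1, dtype=float).ravel()
    b = np.asarray(out2, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
shared = rng.normal(size=(16, 8))    # synthetic stand-in for one branch output
noise = rng.normal(size=(16, 8))     # independent synthetic signal

# Two branches that compute nearly the same thing are highly correlated.
corr_similar = branch_correlation(shared, shared + 0.1 * noise)

# Two branches with unrelated outputs have correlation near zero.
corr_independent = branch_correlation(shared, noise)
```

In practice the two inputs would be the outputs of the paired residual branches collected over a held-out set, compared between a model trained with Even-Even and one trained with Shake-Shake.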

This review was generated by AI and checked by a human editor.