QUICK REVIEW

[Paper Review] Shake-Shake regularization

Xavier Gastaldi|arXiv (Cornell University)|2017. 05. 21.

Advanced Neural Network Applications | References: 19 | Citations: 313

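One-line summary

Shake-Shake replaces the standard summation of parallel branches in a multi-branch network with a stochastic affine combination during training, which improves generalization and achieves state-of-the-art results on CIFAR.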

ABSTRACT

The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake

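Motivation and objectives

Makes the case for regularizing multi-branch networks beyond Batch Normalization (BatchNorm) and dropout.
Proposes a stochastic affine combination of the residual branches during training.
Evaluates Shake-Shake on CIFAR-10 and CIFAR-100 against state-of-the-art baselines.
Explores the difference between training-time and inference-time behavior and the role of architectural components (skip connections, Batch Normalization).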

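Proposed method

During training, replace the residual sum x_{i+1} = x_i + F(x_i, W^{(1)}) + F(x_i, W^{(2)}) with x_{i+1} = x_i + α_i F(x_i, W^{(1)}) + (1 − α_i) F(x_i, W^{(2)}), where α_i ∼ Uniform(0, 1).
At test time, set every α_i to its expected value, 0.5.
Redraw the scaling coefficients independently before each forward pass and each backward pass, producing stochastic forward and backward flows (a form of gradient augmentation).
Experiment with 2-branch ResNets (and variants in the 3-branch setting), comparing forward/backward strategies (Shake, Even, Keep) and Batch-level versus Image-level coefficient updates.
Investigate regularization strength through the interaction between the backward coefficients β_{i.j} and the forward coefficients α_{i.j}, and analyze alignment and correlation between the residual branches.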
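The forward and backward rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's Torch implementation: the function names, the short lists standing in for tensors, and the example values are all hypothetical. The "Keep" rule (reuse the forward coefficient in the backward pass) reflects my reading of the paper's naming, since this review lists the modes without defining them.

```python
import random


def forward_coefficient(mode, rng, training=True):
    """Return alpha for one residual block.

    "Shake" draws alpha from Uniform(0, 1) during training; "Even" uses 0.5.
    At test time the expected value 0.5 is always used.
    """
    if training and mode == "Shake":
        return rng.random()
    return 0.5


def backward_coefficient(mode, alpha, rng):
    """Return beta for the backward pass (assumed mode semantics)."""
    if mode == "Shake":
        return rng.random()  # drawn independently of alpha
    if mode == "Keep":
        return alpha         # reuse the forward coefficient
    return 0.5               # "Even"


def shake_forward(x, branch1, branch2, alpha):
    """x_{i+1} = x_i + alpha * F1(x_i) + (1 - alpha) * F2(x_i), element-wise."""
    return [xi + alpha * b1 + (1.0 - alpha) * b2
            for xi, b1, b2 in zip(x, branch1, branch2)]


def shake_backward(grad_out, beta):
    """Split the upstream gradient between the two branches using beta.

    The skip connection would receive grad_out unchanged.
    """
    grad_b1 = [beta * g for g in grad_out]
    grad_b2 = [(1.0 - beta) * g for g in grad_out]
    return grad_b1, grad_b2


rng = random.Random(0)
x = [1.0, 2.0]
f1 = [0.5, -1.0]  # hypothetical output of branch 1
f2 = [1.5, 3.0]   # hypothetical output of branch 2

# Training step with Shake forward and Shake backward.
alpha = forward_coefficient("Shake", rng)
beta = backward_coefficient("Shake", alpha, rng)
train_out = shake_forward(x, f1, f2, alpha)
grad_f1, grad_f2 = shake_backward([1.0, 1.0], beta)

# Test time: alpha is fixed at its expected value 0.5.
test_alpha = forward_coefficient("Shake", rng, training=False)
test_out = shake_forward(x, f1, f2, test_alpha)
print(test_out)  # [2.0, 3.0]
```

In a real framework the same idea is usually expressed as a custom autograd operation whose backward pass ignores the forward coefficient and applies its own; the split into two functions here is only meant to make that separation visible.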

Experimental results

Research questions

RQ1: Does stochastic affine blending of residual branches improve generalization on CIFAR-10/100 beyond standard regularization methods?
RQ2: How do forward vs backward perturbations (Shake vs Keep vs Even) and where (layer, image) the coefficients are applied affect performance?
RQ3: What is the role of architectural elements (skip connections, BatchNorm) in enabling Shake-Shake regularization?
RQ4: How does Shake-Shake influence inter-branch correlation and alignment across layers?
RQ5: What controls the strength and dynamics of the regularization, and how can it be tuned?

Main results

Forward	Backward	Level	CIFAR-10 error % (mean) - 26 2x32d	CIFAR-10 error % (mean) - 26 2x64d	CIFAR-10 error % (mean) - 26 2x96d
Even	Even	n/a	4.27	3.76	3.58
Even	Shake	Batch	4.44	-	-
Shake	Keep	Batch	4.11	-	-
Shake	Even	Batch	3.47	3.30	-
Shake	Shake	Batch	3.67	3.07	-
Even	Shake	Image	4.11	-	-
Shake	Keep	Image	4.09	-	-
Shake	Even	Image	3.47	3.20	-
Shake	Shake	Image	3.55	2.98	2.86
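To put the table in perspective, the short calculation below compares the Shake-Shake-Image row with the Even-Even baseline row for the 26 2x32d, 26 2x64d, and 26 2x96d models. The numbers are copied from the table; the relative-reduction metric and the variable names are my own choice for illustration and do not come from the paper.

```python
# Mean CIFAR-10 test error (%) copied from the table above.
baseline_even_even = {"26 2x32d": 4.27, "26 2x64d": 3.76, "26 2x96d": 3.58}
shake_shake_image = {"26 2x32d": 3.55, "26 2x64d": 2.98, "26 2x96d": 2.86}


def relative_reduction(baseline, improved):
    """Relative error reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    model: round(relative_reduction(baseline_even_even[model],
                                    shake_shake_image[model]), 1)
    for model in baseline_even_even
}

for model, pct in reductions.items():
    print(f"{model}: {pct}% relative error reduction")
# 26 2x32d: 16.9% relative error reduction
# 26 2x64d: 20.7% relative error reduction
# 26 2x96d: 20.1% relative error reduction
```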

With Shake forward, Shake backward, and Image-level coefficients, the 26 2x32d, 26 2x64d, and 26 2x96d models reach 3.55%, 2.98%, and 2.86% CIFAR-10 error respectively (average of 3 to 5 runs), outperforming many single-shot baselines.
Applying the coefficients at the Image level tends to regularize more strongly than applying them at the Batch level.
Removing skip connections or BatchNorm reveals that Shake-Shake can still regularize, but success depends on architecture and hyperparameters; some configurations diverge without BN or with too-strong coupling.
Correlation between the outputs of the two residual branches decreases under Shake-Shake, suggesting decorrelation promotes diverse learning between branches.
Backward-pass coefficient design critically affects learning; unintended configurations (e.g., β_i.j = 1−α_i.j) can drastically harm training, indicating sensitivity to coefficient alignment and timing.
CIFAR-100 results show Shake-Even-Image reduces error to 15.85% on a ResNeXt-29 2x4x64d variant, indicating cross-dataset regularization benefits.

This review was generated by AI and reviewed by a human editor.