QUICK REVIEW

[Paper Review] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

Yeming Wen, Dustin Tran|arXiv (Cornell University)|2020. 02. 16.

Domain Adaptation and Few-Shot Learning|References 54|Citations 127

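One-Line Summary

BatchEnsemble is a parameter-efficient ensemble method that builds each member's weights as the Hadamard product of a shared weight matrix and a member-specific rank-1 perturbation, enabling fast, memory-efficient ensembles and scalable lifelong learning.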

ABSTRACT

Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having a much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet which involves 100 sequential learning tasks.

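Motivation and Objectives

Motivates the need for ensembles that stay effective while costing less compute and memory.
Introduces BatchEnsemble as a parameter-efficient alternative to traditional ensembles.
Demonstrates BatchEnsemble on classification, translation, and lifelong learning benchmarks.
Shows that BatchEnsemble provides calibrated predictions and competitive uncertainty estimates.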

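Proposed Method

Defines each ensemble member's weight as W_i = W ∘ (r_i s_i^T), where W is shared across members and r_i, s_i are member-specific vectors.
Vectorizes the computation so that several ensemble members are updated in parallel within a single mini-batch, which allows parallelism both across devices and within a device (Y = φ(((X ∘ R) W) ∘ S), where each row of R and S holds the r_i and s_i of the member assigned to that example).
At test time, repeats the mini-batch M times (from B to B·M examples) so that every member processes the same inputs in a single forward pass, then averages the members' predictions.
Applies BatchEnsemble to lifelong learning: the shared W and one pair of fast weights are trained on the first task, and only new fast weights are trained for each later task.
Evaluates uncertainty calibration and out-of-distribution performance against MC-dropout and naive ensembles.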
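
The weight definition, the vectorized forward pass, and the test-time averaging described above can be checked numerically for a single dense layer. The following is a minimal NumPy sketch, not the authors' implementation: the sizes, the ReLU activation, the random weights, and the helper names `member_forward` and `batch_forward` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): ensemble size, batch size, input dim, output dim.
M, B, m, n = 4, 3, 5, 2

W = rng.normal(size=(m, n))   # shared weight W
r = rng.normal(size=(M, m))   # row i is the member vector r_i
s = rng.normal(size=(M, n))   # row i is the member vector s_i
X = rng.normal(size=(B, m))   # one mini-batch of inputs


def phi(z):
    """Activation; ReLU is an assumption, the review does not fix phi."""
    return np.maximum(z, 0.0)


def member_forward(X, i):
    """Reference path: materialize W_i = W ∘ (r_i s_i^T), then apply it."""
    W_i = W * np.outer(r[i], s[i])
    return phi(X @ W_i)


def batch_forward(X_rep, R, S):
    """Vectorized path: Y = phi(((X ∘ R) W) ∘ S), W_i is never built."""
    return phi(((X_rep * R) @ W) * S)


# Test-time strategy: repeat the batch M times so every member sees every input.
X_rep = np.tile(X, (M, 1))    # shape (M*B, m), blocks of B rows
R = np.repeat(r, B, axis=0)   # shape (M*B, m), block i uses r_i
S = np.repeat(s, B, axis=0)   # shape (M*B, n), block i uses s_i

Y = batch_forward(X_rep, R, S).reshape(M, B, n)
Y_explicit = np.stack([member_forward(X, i) for i in range(M)])

# Average over the member axis to get one prediction per input.
ensemble_mean = Y.mean(axis=0)

print(np.allclose(Y, Y_explicit))
print(ensemble_mean.shape)
```

The two paths agree because entry (j, k) of W ∘ (r_i s_i^T) is W[j, k]·r_i[j]·s_i[k], so scaling the input by r_i and the output by s_i gives the same product without storing M separate weight matrices. In a classifier one would typically average the members' predicted probabilities rather than raw layer outputs; the mean here only shows the reshaping and averaging step.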

Experimental Results

Research Questions

RQ1: Can BatchEnsemble achieve competitive accuracy and uncertainty estimates with substantially lower memory and computation than traditional ensembles?
RQ2: How well does BatchEnsemble scale to lifelong learning with many sequential tasks?
RQ3: What is the impact of BatchEnsemble on calibration and out-of-distribution robustness?
RQ4: How does BatchEnsemble perform across vision, language, and translation tasks compared to standard baselines?

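Key Results

BatchEnsemble reaches accuracy and uncertainty estimates similar to traditional ensembles at much lower cost: with an ensemble of size 4, test-time speedup is 3X and memory reduction is about 3X.
In lifelong learning, BatchEnsemble achieves accuracy competitive with progressive neural networks while using far less memory and computation, and it scales to as many as 100 sequential tasks.
BatchEnsemble gives well-calibrated predictions on corrupted data and on corrupted variants of similar data; its calibration is competitive with dropout ensembles, and it can benefit from being combined with dropout.
On CIFAR-10/100, WMT14 EN-DE/EN-FR, and out-of-distribution tasks, BatchEnsemble shows strong performance, with faster convergence in the Transformer-based setting (encoder self-attention layers).
The diversity analysis shows that BatchEnsemble can reach diversity close to that of naive ensembles even with limited training data, and that it also benefits from larger networks.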
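
To see why adding members or tasks is cheap, it helps to count weights. The sketch below compares a naive ensemble, which copies every weight matrix per member, with the shared-W-plus-rank-1 scheme, which adds only the vectors r_i and s_i per member or per task. The layer shapes are hypothetical, biases and task-specific output heads are ignored, and the counts are not meant to reproduce the measured 3X figures above, which are end-to-end measurements rather than parameter counts.

```python
def naive_params(num_members, layer_shapes):
    """Weights when each member (or task) keeps a full copy of every layer."""
    return num_members * sum(m * n for m, n in layer_shapes)


def batchensemble_params(num_members, layer_shapes):
    """One shared W per layer plus an (r_i, s_i) pair per member per layer."""
    shared = sum(m * n for m, n in layer_shapes)
    fast = num_members * sum(m + n for m, n in layer_shapes)
    return shared + fast


# Hypothetical three-layer MLP, chosen only for illustration.
layer_shapes = [(512, 512), (512, 512), (512, 10)]

for members in (1, 4, 100):
    print(
        members,
        naive_params(members, layer_shapes),
        batchensemble_params(members, layer_shapes),
    )
```

For these assumed shapes, each extra member or task adds 2,570 weights under the rank-1 scheme versus 529,408 for a full copy, which is the mechanism behind scaling to many sequential tasks.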


This review was generated by AI and reviewed by a human editor.