QUICK REVIEW

[Paper Review] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

Yeming Wen, Dustin Tran|arXiv (Cornell University)|Feb 16, 2020

Domain Adaptation and Few-Shot Learning | References: 54 | Citations: 127

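One-line summary

BatchEnsemble is a parameter-efficient ensemble method in which each member's weight matrix is the Hadamard product of a shared weight and a per-member rank-one perturbation, which makes ensembling fast and memory-efficient and lets lifelong learning scale.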

ABSTRACT

Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having a much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet which involves 100 sequential learning tasks.

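Motivation and objectives

Motivate the need to reduce the computational and memory cost of effective ensembles.
Introduce BatchEnsemble as a parameter-efficient alternative to conventional ensembles.
Report BatchEnsemble's performance on classification, translation, and lifelong learning benchmarks.
Show that BatchEnsemble gives calibrated predictions and competitive uncertainty estimates.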

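Proposed method

Define each ensemble member's weight as the Hadamard product W_i = W ∘ (r_i s_i^T), where W is shared across members and r_i, s_i are per-member vectors (the "fast weights").
Vectorize the computation so that multiple members are updated in parallel within a single mini-batch, giving parallelism both across devices and within a device: Y = φ(((X ∘ R) W) ∘ S), where the rows of R and S hold the fast weights of the member assigned to each example.
At test time, repeat the mini-batch M times (size B·M) so that every member processes the same inputs in a single forward pass, then average the predictions across ensemble members.
Apply BatchEnsemble to lifelong learning by training the shared W together with a single pair of fast weights on the first task, then training only new fast weights for each later task.
Evaluate uncertainty calibration and out-of-distribution performance against MC-dropout and naive ensembles.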
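
The vectorized forward pass and the test-time tiling described above can be sketched in NumPy. This is a minimal illustration for a single dense layer, not the authors' implementation: the function names, the (M, B, m) tensor layout (equivalent to a flat batch of size B·M reshaped by member), the tanh activation, and the layer sizes are all assumptions chosen for readability. The sketch checks that the vectorized form gives the same result as building each W_i = W ∘ (r_i s_i^T) explicitly.

```python
import numpy as np

def batch_ensemble_dense(X, W, R, S, activation=np.tanh):
    """Vectorized BatchEnsemble dense layer (illustrative sketch).

    X: (M, B, m) inputs, one slice of B examples per ensemble member
    W: (m, n)    shared weight
    R: (M, m)    per-member fast weights r_i (input side)
    S: (M, n)    per-member fast weights s_i (output side)
    Returns activations of shape (M, B, n).
    """
    # phi(((X ∘ R) W) ∘ S); R and S are broadcast over the batch axis.
    return activation(((X * R[:, None, :]) @ W) * S[:, None, :])

def naive_member_dense(x, W, r, s, activation=np.tanh):
    """Reference path: materialize W_i = W ∘ (r s^T) for one member."""
    W_i = W * np.outer(r, s)  # shape (m, n)
    return activation(x @ W_i)

rng = np.random.default_rng(0)
M, B, m, n = 4, 8, 5, 3  # hypothetical ensemble size, batch size, layer dims
W = rng.normal(size=(m, n))
R = rng.normal(size=(M, m))
S = rng.normal(size=(M, n))

# Test-time tiling: every member sees the same B inputs in one forward pass.
x = rng.normal(size=(B, m))
X_tiled = np.broadcast_to(x, (M, B, m))
member_outputs = batch_ensemble_dense(X_tiled, W, R, S)

# Average over the member axis to get the ensemble output for this layer.
ensemble_mean = member_outputs.mean(axis=0)

# The vectorized form matches the explicit per-member weight matrices.
reference = np.stack([naive_member_dense(x, W, R[i], S[i]) for i in range(M)])
assert np.allclose(member_outputs, reference)
```

In a full classifier the average would be taken over the members' final predictions rather than over one hidden layer; the single layer here only shows that the M weight matrices never need to be materialized.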

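Experimental results

Research questions

RQ1: Can BatchEnsemble achieve competitive accuracy and uncertainty estimates with substantially less memory and computation than conventional ensembles?
RQ2: How well does BatchEnsemble scale to lifelong learning with many sequential tasks?
RQ3: How does BatchEnsemble affect calibration and out-of-distribution robustness?
RQ4: How does BatchEnsemble perform against standard baselines on vision, language, and translation tasks?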

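Key findings

BatchEnsemble matches conventional ensembles in accuracy and uncertainty while cutting cost sharply: at an ensemble size of 4, test-time speedup is about 3x and memory use drops by about 3x.
In lifelong learning, BatchEnsemble reaches accuracy competitive with progressive neural networks at much lower memory and computational cost, and it scales to as many as 100 sequential tasks.
BatchEnsemble gives well-calibrated predictions on corrupted and similar data; its calibration is competitive with dropout ensembles, and combining it with dropout may bring further gains.
Across CIFAR-10/100, WMT14 EN-DE/EN-FR, and out-of-distribution tasks, it shows strong performance, with faster convergence in Transformer-based settings that include the encoder self-attention layers.
In the diversity analysis, BatchEnsemble reaches diversity close to that of naive ensembles even with limited training data, and it benefits from larger networks.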
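
To give a rough sense of where the parameter savings come from, the sketch below counts the weights of a single dense layer under a naive ensemble and under the rank-one factorization. It is only back-of-the-envelope arithmetic under stated assumptions: the function name and the 512-by-512 layer size are hypothetical, biases are ignored, and the count does not reproduce the measured 3x memory reduction reported in the paper, which comes from the authors' experiments rather than from this formula.

```python
def dense_param_counts(m, n, num_members):
    """Weight counts for one m-by-n dense layer (biases ignored)."""
    # Naive ensemble: one full weight matrix per member.
    naive = num_members * m * n
    # BatchEnsemble: one shared W plus a pair (r_i, s_i) per member.
    batch_ensemble = m * n + num_members * (m + n)
    return naive, batch_ensemble

# Hypothetical layer size, with the ensemble size of 4 used in the review.
naive, batch_ensemble = dense_param_counts(m=512, n=512, num_members=4)
single = 512 * 512

print(naive)                    # 1048576
print(batch_ensemble)           # 266240
print(batch_ensemble / single)  # 1.015625
```

The same count suggests why the lifelong learning setup scales: under this factorization each additional task adds only m + n weights per layer, while the shared W is reused.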

This review was generated by AI and checked by a human editor.