QUICK REVIEW

[Paper Digest] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

Yeming Wen, Dustin Tran|arXiv (Cornell University)|Feb 16, 2020

Domain Adaptation and Few-Shot Learning | References: 54 | Cited by: 127

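One-Sentence Summary

BatchEnsemble is a parameter-efficient ensemble method in which each member's weight matrix is the Hadamard product of a shared weight matrix and a per-member rank-1 perturbation, enabling fast, memory-efficient ensembling and scalable lifelong learning.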

ABSTRACT

Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having a much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet which involves 100 sequential learning tasks.

Research Motivation and Objectives

Motivate the need for ensembles that stay effective while costing far less computation and memory.
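Introduce BatchEnsemble as a parameter-efficient alternative to traditional ensemble methods.
Demonstrate BatchEnsemble on classification, translation, and lifelong learning benchmarks.
Show that BatchEnsemble provides well-calibrated predictions and competitive uncertainty estimates.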

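Proposed Method

Define each ensemble member's weight as the Hadamard product W_i = W ∘ (r_i s_i^T), where W is shared across members and r_i, s_i are per-member vectors.
Vectorize the computation so that multiple ensemble members are updated in parallel within a single mini-batch, giving both across-device and within-device parallelism: Y = φ(((X ∘ R) W) ∘ S), where each row of R and S holds the r_i and s_i of the member assigned to that example.
At test time, repeat the mini-batch M times (effective batch size B·M) so that every member processes the same inputs in one forward pass, then average the predictions across members.
Apply BatchEnsemble to lifelong learning: train the shared W and one pair of fast weights on the first task, then train only a new pair of fast weights for each subsequent task.
Evaluate uncertainty calibration and out-of-distribution performance, comparing against MC-dropout and naive ensembles.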
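
The vectorized form works because the rank-1 factors can be pulled out of the matrix product: for a single input x, (W ∘ r_i s_i^T)^T x = (W^T (x ∘ r_i)) ∘ s_i, so the shared W is multiplied only once per example. The NumPy sketch below is an illustration under assumptions rather than the authors' implementation: a single dense layer, ReLU standing in for φ, and hypothetical sizes B, M, m, n. It checks that the vectorized pass matches building each W_i explicitly, and it uses the test-time strategy of repeating the batch M times before averaging. Averaging one layer's outputs here stands in for averaging a full model's predictions.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def naive_member_forward(x, W, r_i, s_i):
    """Reference path: materialize W_i = W ∘ (r_i s_i^T), then apply it."""
    W_i = W * np.outer(r_i, s_i)          # (m, n)
    return relu(x @ W_i)                  # (B, n)


def batch_ensemble_forward(x, W, r, s):
    """Vectorized path: Y = phi(((X ∘ R) W) ∘ S) over a batch of size B*M."""
    M, B = r.shape[0], x.shape[0]
    X = np.tile(x, (M, 1))                # (M*B, m): block i is a copy of x
    R = np.repeat(r, B, axis=0)           # (M*B, m): rows of block i equal r_i
    S = np.repeat(s, B, axis=0)           # (M*B, n): rows of block i equal s_i
    Y = relu(((X * R) @ W) * S)           # one matmul with the shared W
    return Y.reshape(M, B, -1)            # (M, B, n): outputs grouped by member


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
B, M, m, n = 5, 4, 8, 3
x = rng.normal(size=(B, m))
W = rng.normal(size=(m, n))               # shared ("slow") weight
r = rng.normal(size=(M, m))               # per-member fast weights
s = rng.normal(size=(M, n))

per_member = batch_ensemble_forward(x, W, r, s)
reference = np.stack(
    [naive_member_forward(x, W, r[i], s[i]) for i in range(M)]
)
ensemble_prediction = per_member.mean(axis=0)   # average across members

print(np.allclose(per_member, reference))
```

The per-member matrices W_i are never formed in the vectorized path; only elementwise scalings of the inputs and outputs are added around the shared matrix product.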

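Experimental Results

Research Questions

RQ1: Can BatchEnsemble match traditional ensembles in accuracy and uncertainty estimation while substantially reducing memory and computation?
RQ2: How does BatchEnsemble scale in lifelong learning with a large number of sequential tasks?
RQ3: How does BatchEnsemble affect calibration and out-of-distribution robustness?
RQ4: How does BatchEnsemble perform on vision, language, and translation tasks relative to standard baselines?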

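Key Findings

BatchEnsemble matches traditional ensembles in accuracy and uncertainty at much lower cost: at an ensemble size of 4, it achieves roughly a 3x test-time speedup and a 3x memory reduction.
In lifelong learning, BatchEnsemble reaches accuracy comparable to progressive neural networks with substantially less memory and computation, and scales to as many as 100 sequential tasks.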
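BatchEnsemble gives well-calibrated predictions on corrupted or similarly shifted data; its calibration is competitive with dropout ensembles and may improve further when combined with dropout.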
Across CIFAR-10/100, WMT14 EN-DE/EN-FR, and out-of-distribution tasks, BatchEnsemble performs strongly; in the Transformer-based setting (encoder self-attention layers) it also converges faster.
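Diversity analysis shows that, with limited training data, BatchEnsemble reaches diversity close to that of naive ensembles, and that it benefits from larger networks.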
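
To give a feel for where the savings come from, the short sketch below counts the weights stored for one dense layer under a naive ensemble versus the shared-plus-rank-1 parameterization. The 512 x 512 layer size and the helper names are hypothetical, and only one layer's weight matrix is counted, so this does not attempt to reproduce the measured whole-model 3x figure above. The same count applies to lifelong learning, where each additional task contributes one new pair of fast weights.

```python
def naive_ensemble_params(m, n, members):
    """Weights stored by `members` independent copies of an m x n dense layer."""
    return members * m * n


def batch_ensemble_params(m, n, members):
    """One shared m x n matrix plus one (r_i, s_i) pair, of sizes m and n, per member."""
    return m * n + members * (m + n)


m, n = 512, 512  # hypothetical layer size, not taken from the paper

for members in (1, 4, 100):
    naive = naive_ensemble_params(m, n, members)
    shared = batch_ensemble_params(m, n, members)
    print(
        f"members/tasks={members:>3}  naive={naive:>10,}  "
        f"batchensemble={shared:>8,}  ratio={naive / shared:.2f}"
    )
```

Each extra member or task costs m + n numbers per layer instead of m * n, which is why the count grows so slowly as the number of members or sequential tasks increases.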

This digest was generated by AI and reviewed by a human editor.