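[Paper Summary] BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
BatchEnsemble is an efficient, parameter-sharing ensemble method in which each member's weights are a shared weight matrix perturbed through a Hadamard product with a rank-one matrix. At an ensemble size of 4, it reaches accuracy and uncertainty estimates competitive with typical ensembles while running 3x faster at test time and using 3x less memory, and it scales to lifelong learning.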
Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields accuracy and uncertainties competitive with typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.
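Motivation and Objectives
- Identify the efficiency bottleneck of ensembles in deep learning, whose cost grows linearly with the number of members, and the need for scalable lifelong learning.
- Introduce BatchEnsemble as a parameter-efficient ensemble method.
- Demonstrate competitive accuracy and uncertainty on vision and language tasks.
- Demonstrate lifelong learning over a large number of sequential tasks with a reduced memory footprint.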
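Proposed Method
- Define each ensemble member's weight as the Hadamard product of a shared weight and a per-member rank-one fast weight.
- Vectorize the computation so that all ensemble members run in parallel within a single mini-batch.
- Train with a batch expansion technique, so that at test time predictions are averaged over all ensemble members in a single forward pass.
- Apply BatchEnsemble to lifelong learning: for each new task, train only the fast weights, keeping the shared weights fixed after the first task.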
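The first two points rest on a simple identity: multiplying the input by the member weight W ∘ (r sᵀ) gives the same result as scaling the input by r, applying the shared W, and scaling the output by s. The following is a minimal sketch of that identity for a single linear layer, written in plain Python so it runs without dependencies. The function names, the layer shape, and the random-sign initialization of r and s are assumptions made for illustration, not the authors' implementation, and the nonlinearity is omitted.

```python
import random


def matvec_T(W, x):
    """Return W^T x for W of shape (m, n) and x of length m."""
    m, n = len(W), len(W[0])
    return [sum(W[i][j] * x[i] for i in range(m)) for j in range(n)]


def explicit_member_forward(W, r, s, x):
    """Materialise the member weight W * (r s^T) elementwise, then apply it."""
    m, n = len(W), len(W[0])
    W_member = [[W[i][j] * r[i] * s[j] for j in range(n)] for i in range(m)]
    return matvec_T(W_member, x)


def fast_member_forward(W, r, s, x):
    """Same output without building the member weight: (W^T (x * r)) * s."""
    xr = [xi * ri for xi, ri in zip(x, r)]
    return [yj * sj for yj, sj in zip(matvec_T(W, xr), s)]


def ensemble_predict(W, R, S, x):
    """Average the outputs of all members for one input.

    This loops over members for readability; an array library would
    replace the loop with one batched matrix multiply.
    """
    outs = [fast_member_forward(W, r, s, x) for r, s in zip(R, S)]
    return [sum(col) / len(outs) for col in zip(*outs)]


random.seed(0)
m, n, members = 4, 3, 4  # hypothetical layer shape and ensemble size
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
# Assumption for this demo: fast weights start as random signs.
R = [[random.choice([-1.0, 1.0]) for _ in range(m)] for _ in range(members)]
S = [[random.choice([-1.0, 1.0]) for _ in range(n)] for _ in range(members)]
x = [random.gauss(0, 1) for _ in range(m)]

# Both forward paths agree for every member.
for r, s in zip(R, S):
    a = explicit_member_forward(W, r, s, x)
    b = fast_member_forward(W, r, s, x)
    assert all(abs(ai - bi) < 1e-9 for ai, bi in zip(a, b))

y = ensemble_predict(W, R, S, x)
```

Because only the elementwise scalings differ between members, the shared matrix multiply can process a mini-batch that mixes inputs assigned to different members. For lifelong learning, the same structure would apply with one (r, s) pair per task instead of per member.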
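Experimental Results
Research Questions
- RQ1: Can BatchEnsemble match standard ensembles in accuracy and uncertainty while significantly reducing computational and memory costs?
- RQ2: Can BatchEnsemble scale to lifelong learning over a large number of sequential tasks without catastrophic forgetting?
- RQ3: How does BatchEnsemble compare with traditional ensembles and MC-dropout on standard benchmarks (CIFAR-10/100, WMT translation)?
- RQ4: How do rank-one perturbations affect the diversity and calibration of predictions?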
Key Findings
| Metric | Vanilla | BatchEnsemble | DEN | PNN | RCL |
|---|---|---|---|---|---|
| Computational | 1 | 1.11 | 9.58 | 1.12 | 26.41 |
| Memory | 1 | 1.10 | 5.31 | 4.16 | 2.52 |
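- BatchEnsemble is competitive with naive ensembles in accuracy and uncertainty while delivering a significant speedup and memory reduction.
- On CIFAR-10/100 with ResNet-32 and on WMT14 EN-DE/EN-FR, BatchEnsemble offers a favorable trade-off among accuracy, speed, and memory.
- In lifelong learning, BatchEnsemble achieves accuracy comparable to progressive neural networks while using far less memory and computation, and it scales to 100 sequential tasks.
- BatchEnsemble gives calibrated predictions on corrupted data and is complementary to dropout ensembles, so combining the two can further improve performance.
- Diversity analysis shows that rank-one perturbations provide meaningful ensemble diversity, especially when training data is limited.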
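The memory argument behind these findings can be made concrete with a parameter count for one weight matrix. The sketch below compares k independent copies of an m x n matrix with one shared matrix plus k rank-one factor pairs. The 512 x 512 shape and the values of k are hypothetical, biases and other layers are ignored, and the ratios it prints are not the figures reported in the paper.

```python
def naive_ensemble_params(m, n, k):
    """Weights held by k independent copies of an m x n matrix."""
    return k * m * n


def batch_ensemble_params(m, n, k):
    """One shared m x n matrix plus, per member (or task),
    a vector r of length m and a vector s of length n."""
    return m * n + k * (m + n)


m, n = 512, 512  # hypothetical layer shape
for k in (1, 4, 100):
    naive = naive_ensemble_params(m, n, k)
    shared = batch_ensemble_params(m, n, k)
    # Last column: cost relative to a single m x n matrix.
    print(k, naive, shared, round(shared / (m * n), 3))
```

The per-member overhead grows with m + n rather than m * n, which is why adding members, or tasks in the lifelong-learning setting, costs little extra memory for this layer.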
This summary was generated by AI and reviewed by a human editor.