QUICK REVIEW

[Paper Review] When Ensembling Smaller Models is More Efficient than Single Large Models

Dan Kondratyuk, Mingxing Tan|arXiv (Cornell University)|May 1, 2020

Domain Adaptation and Few-Shot Learning | References: 11 | Cited by: 24

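One-Sentence Summary

The paper shows that an ensemble of several smaller models sharing one architecture can reach higher accuracy with fewer total FLOPs than a single large model, challenging the common belief that a larger model always beats an ensemble. The key finding is that ensembling becomes more efficient in the accuracy-speed trade-off as model size grows, an effect attributed to greater output diversity and reduced overfitting.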

ABSTRACT

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is commonly reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results from experiments on CIFAR-10 and ImageNet that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute, even when those individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become large. This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models, especially when the models approach the size of what their dataset can foster. Instead of using the common practice of tuning a single large model, one can use ensembles as a more flexible trade-off between a model's inference speed and accuracy. This also potentially eases hardware design, e.g., an easier way to parallelize the model across multiple workers for real-time or distributed inference.

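Motivation and Objectives

Challenge the assumption that a larger single model always beats an ensemble in both accuracy and efficiency.
Investigate whether ensembles of smaller models can surpass a single large model in both accuracy and FLOP efficiency.
Explore ensembling as a more flexible and hardware-friendly alternative to scaling up a single model.
Assess whether architectural diversity within an ensemble improves performance beyond that of an ensemble of identical models.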

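Proposed Method

Train multiple instances of the same model architecture from different random initializations (Wide ResNets on CIFAR-10, EfficientNets on ImageNet).
Form the ensemble by aggregating predictions with a geometric mean, that is, multiplying the outputs element-wise and taking the n-th root: $\mu = (y_1 y_2 \dots y_n)^{1/n}$.
Measure top-1 accuracy against total FLOPs for single models and ensembles across a range of model sizes.
Use neural architecture search (NAS) over a joint search space to explore ensembles of diverse architectures, penalizing the maximum latency so that the members can run in parallel.
Design the NAS reward to prioritize accuracy while keeping the latency of the slowest model in the ensemble within an acceptable bound, so that real-time inference stays feasible.
Train and evaluate each searched model for 10 epochs, short of full convergence, to compare performance under a fixed latency constraint.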
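The geometric-mean aggregation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names `geometric_mean_ensemble` and `predict`, the `eps` clamp, and the log-space computation are assumptions made here for numerical safety, and the input is assumed to be one list of class probabilities per model for the same example.

```python
import math


def geometric_mean_ensemble(member_probs, eps=1e-12):
    """Element-wise geometric mean of per-model class probabilities.

    member_probs: list of n lists, one per model, each holding the class
    probabilities that model assigns to the same input (assumed layout).
    """
    n = len(member_probs)
    num_classes = len(member_probs[0])
    combined = []
    for c in range(num_classes):
        # (y_1 * ... * y_n)^(1/n) computed as exp(mean(log y_i)).
        # Clamping at eps (an assumption) avoids log(0).
        log_sum = sum(math.log(max(probs[c], eps)) for probs in member_probs)
        combined.append(math.exp(log_sum / n))
    return combined


def predict(member_probs):
    """Return the index of the class with the highest ensemble score."""
    mu = geometric_mean_ensemble(member_probs)
    return max(range(len(mu)), key=mu.__getitem__)
```

The combined scores are not renormalized to sum to 1, which matches the formula as written; normalizing would not change the predicted class.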

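Experimental Results

Research Questions

RQ1: Can an ensemble of smaller models achieve higher accuracy than a single large model at lower FLOPs?
RQ2: Does the performance gap between ensembles and single models widen as model size increases?
RQ3: Does architectural diversity within an ensemble outperform an ensemble of identical models?
RQ4: In practical deployments with latency constraints, is ensembling a more efficient and scalable alternative to scaling up a model?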

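Key Findings

On CIFAR-10 and ImageNet, ensembles of smaller models achieve higher top-1 accuracy than single large models while using fewer total FLOPs.
The gap between ensembles and single models widens as model size increases, indicating that ensembling becomes more efficient at larger scales.
On CIFAR-10, ensembles of Wide ResNets with width factor k = 1, 2, 4, 8 outperform single models of equal or greater FLOPs in every case.
On ImageNet, EfficientNet ensembles of up to three models match or exceed the accuracy of single models at a similar maximum latency.
Despite an extensive NAS search over diverse architectures, ensembles of identical models consistently outperform ensembles of differing architectures under the same latency constraint.
The best-performing ensembles are obtained by replicating the most accurate single-model architecture, indicating that individual model accuracy matters more than architectural diversity in this setting.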
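The two cost measures used in these comparisons, total FLOPs and maximum latency, can be made concrete with a small sketch. It assumes that ensemble FLOPs are the sum over members and that members run in parallel on separate workers, so the ensemble's latency is that of its slowest member. The function name `ensemble_cost` and the dictionary keys are hypothetical, and any numbers passed in are placeholders rather than figures from the paper.

```python
def ensemble_cost(member_flops, member_latencies_ms):
    """Summarize the inference cost of an ensemble.

    member_flops: FLOPs per forward pass for each member.
    member_latencies_ms: measured latency of each member in milliseconds.
    """
    return {
        # Every member runs once per input, so the FLOPs add up.
        "total_flops": sum(member_flops),
        # With one worker per member (assumed), the slowest member
        # determines how long the ensemble takes.
        "parallel_latency_ms": max(member_latencies_ms),
        # On a single worker the members would run back to back.
        "serial_latency_ms": sum(member_latencies_ms),
    }


# Hypothetical example: three identical members on three workers.
cost = ensemble_cost([1.0e9, 1.0e9, 1.0e9], [10.0, 12.0, 11.0])
```

Comparing `total_flops` against a single larger model addresses the FLOP-efficiency question, while `parallel_latency_ms` is the quantity a maximum-latency constraint would bound.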

This review was generated by AI and checked by a human editor.