QUICK REVIEW

[Paper Review] BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

Jiahui Yu, Pengchong Jin|arXiv (Cornell University)|Mar 24, 2020

Advanced Neural Network Applications|References: 43|Cited by: 38

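One-sentence summary

BigNAS trains a single big single-stage model that directly yields high-quality child architectures from 200 MFLOPs to 1 GFLOP without retraining or post-processing, surpassing the state of the art across that range.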

ABSTRACT

Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieve top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present ablative study and analysis to further understand the proposed BigNASModels.

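Research motivation and goals

Reduce the compute cost and complexity of NAS by eliminating the retraining and post-processing of child models.
Develop a single-stage, weight-sharing model from which diverse, high-quality architectures can be sliced directly.
Provide deployment-ready architectures across a wide range of FLOPs and device budgets.
Systematically adapt training techniques to the joint optimization of small and big child models.
Provide a coarse-to-fine architecture selection strategy that picks architectures under specific resource constraints.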

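Proposed method

Train one big single-stage model with weight sharing that covers a wide architecture space (kernel size, channel count, depth, and input resolution).
At every training step, apply the sandwich rule: sample the smallest child model, the biggest child model, and several randomly chosen intermediate child models, then aggregate their gradients into one update.
Apply inplace distillation: the biggest (full) model is trained on the ground-truth labels, and every other child model learns from the biggest model's predictions.
Initialize each residual block with γ = 0 (the learnable batch-normalization scale) to stabilize training, and add explicit skip connections at stage transitions so that an identity mapping is available there.
Use an exponentially decaying learning rate that ends at a constant value, which balances the convergence of big and small child models.
Apply regularization (weight decay and dropout) only to the biggest (full) child model, which keeps the big model from overfitting without holding back the small models.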
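The per-step recipe above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the search-space values, the function names (`sample_sandwich`, `plan_step`, `lr_at`), and the schedule parameters are all hypothetical, and the sketch only plans which child models to run and how to supervise them rather than training a real network.

```python
import random

# Hypothetical search space; the option values are illustrative, not from the paper.
SEARCH_SPACE = {
    "resolution": [192, 224, 256, 288],
    "depth": [2, 3, 4],
    "width": [32, 48, 64],
    "kernel_size": [3, 5],
}


def sample_sandwich(space, n_random, rng):
    """Sandwich rule: biggest child, smallest child, then n_random random children."""
    biggest = {k: max(v) for k, v in space.items()}
    smallest = {k: min(v) for k, v in space.items()}
    randoms = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_random)]
    return [biggest, smallest] + randoms


def plan_step(space, n_random, rng):
    """Attach supervision and regularization settings to each sampled child."""
    biggest = {k: max(v) for k, v in space.items()}
    plan = []
    for cfg in sample_sandwich(space, n_random, rng):
        is_biggest = cfg == biggest
        plan.append({
            "config": cfg,
            # Inplace distillation: only the biggest child sees ground-truth labels;
            # every other child is trained on the biggest child's predictions.
            "target": "ground_truth" if is_biggest else "soft_labels_from_biggest",
            # Weight decay and dropout are applied to the biggest child only.
            "regularize": is_biggest,
        })
    return plan


def lr_at(step, base_lr, decay_rate, decay_steps, floor_ratio):
    """Exponential decay that stops at a constant floor of floor_ratio * base_lr."""
    decayed = base_lr * decay_rate ** (step / decay_steps)
    return max(decayed, floor_ratio * base_lr)
```

In a real training loop, the gradients of every child in the plan would be accumulated on the shared weights before a single optimizer update, using the learning rate returned by `lr_at` for that step.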

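Experimental results

Research questions

RQ1: Can a single big weight-sharing model be trained to produce high-quality, deployable child architectures without retraining or post-processing?
RQ2: How can the training dynamics of small and big child models be balanced within a single-stage model?
RQ3: Which initialization, regularization, and learning-rate strategies enable stable, high-accuracy training across a wide architecture space?
RQ4: Can a coarse-to-fine search over the trained BigNAS model efficiently identify architectures that fit a specific resource budget?
RQ5: How do sliced BigNAS architectures compare with state-of-the-art models in the 200 MFLOPs to 1 GFLOP range?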

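Main findings

A single-stage BigNAS model yields child networks from about 200 MFLOPs to about 1 GFLOP that surpass the state of the art in that range without retraining or post-processing.
BigNAS models reach ImageNet top-1 accuracies of 76.5% to 80.9% in this range, surpassing EfficientNets and Once-for-All networks at comparable FLOPs.
BigNAS-S, BigNAS-M, and BigNAS-L can be sliced directly from the trained single-stage model for deployment under different constraints.
A simple coarse-to-fine selection strategy identifies architectures with competitive accuracy that satisfy a given latency or FLOPs budget.
Targeted initialization and training schedules significantly improve the convergence and final accuracy of both small and big child models.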
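The coarse-to-fine selection mentioned in the findings can be illustrated with a small sketch. It assumes two hypothetical callables, `accuracy(config)` (evaluated with the shared weights, so no retraining is involved) and `flops(config)`, and it simplifies the fine stage to single-dimension mutations around the current best candidate. The function and parameter names are illustrative, and this is not the paper's exact procedure.

```python
import itertools
import random


def coarse_to_fine(coarse_space, fine_space, accuracy, flops, budget, n_mutations, rng):
    """Pick a config under a FLOPs budget: coarse grid first, then local mutations."""
    keys = list(coarse_space)
    # Coarse stage: exhaustive grid over a few options per dimension.
    grid = [
        dict(zip(keys, values))
        for values in itertools.product(*(coarse_space[k] for k in keys))
    ]
    feasible = [cfg for cfg in grid if flops(cfg) <= budget]
    if not feasible:
        return None  # nothing in the coarse grid fits the budget
    best = max(feasible, key=accuracy)
    # Fine stage: mutate one dimension at a time around the current best.
    for _ in range(n_mutations):
        candidate = dict(best)
        key = rng.choice(keys)
        candidate[key] = rng.choice(fine_space[key])
        if flops(candidate) <= budget and accuracy(candidate) > accuracy(best):
            best = candidate
    return best
```

Because each candidate is scored with the already-trained shared weights, the cost of this search is dominated by evaluation passes; in practice the `accuracy` results would typically be cached.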

This review was generated by AI and reviewed by a human editor.