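[Paper Summary] One-Shot Neural Architecture Search Through A Posteriori Distribution Guided Sampling.
This paper proposes a one-shot neural architecture search (NAS) method that guides sub-network sampling with an estimated joint posterior distribution over architectures and weights, improving both efficiency and accuracy. By modeling this distribution with variational inference and a hybrid network representation, the method cuts the number of sampled sub-networks by orders of magnitude. It reports the best trade-off between accuracy and search speed among NAS methods on CIFAR-10, CIFAR-100, and ImageNet; on CIFAR-10 the search is 20x faster and the resulting network is more accurate than the best one found by earlier NAS methods.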
The emergence of one-shot approaches has greatly advanced the research on neural architecture search (NAS). Recent approaches train an over-parameterized super-network (one-shot model) and then sample and evaluate a number of sub-networks, which inherit weights from the one-shot model. The overall searching cost is significantly reduced as training is avoided for sub-networks. However, the network sampling process is casually treated and the inherited weights from an independently trained super-network perform sub-optimally for sub-networks. In this paper, we propose a novel one-shot NAS scheme to address the above issues. The key innovation is to explicitly estimate the joint a posteriori distribution over network architecture and weights, and sample networks for evaluation according to it. This brings two benefits. First, network sampling under the guidance of a posteriori probability is more efficient than conventional random or uniform sampling. Second, the network architecture and its weights are sampled as a pair to alleviate the sub-optimal weights problem. Note that estimating the joint a posteriori distribution is not a trivial problem. By adopting variational methods and introducing a hybrid network representation, we convert the distribution approximation problem into an end-to-end neural network training problem which is neatly approached by variational dropout. As a result, the proposed method reduces the number of sampled sub-networks by orders of magnitude. We validate our method on the fundamental image classification task. Results on Cifar-10, Cifar-100 and ImageNet show that our method strikes the best trade-off between precision and speed among NAS methods. On Cifar-10, we speed up the searching process by 20x and achieve a higher precision than the best network found by existing NAS methods.
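The variational step mentioned in the abstract can be written in its generic textbook form. The notation here is an assumption made for illustration and is not taken from the paper: $a$ is an architecture, $w$ its weights, $\mathcal{D}$ the training data, and $q_\phi(a, w)$ a parameterized distribution that approximates the true joint posterior $p(a, w \mid \mathcal{D})$.

```latex
% Fit q_phi to the joint posterior by minimizing the KL divergence
\phi^{*} = \arg\min_{\phi}\;
  \mathrm{KL}\!\left( q_\phi(a, w) \,\|\, p(a, w \mid \mathcal{D}) \right)

% The log evidence splits into the ELBO plus that same KL term
\log p(\mathcal{D}) = \mathcal{L}(\phi)
  + \mathrm{KL}\!\left( q_\phi(a, w) \,\|\, p(a, w \mid \mathcal{D}) \right)

% so minimizing the KL is equivalent to maximizing the ELBO
\mathcal{L}(\phi) =
  \mathbb{E}_{q_\phi(a, w)}\!\left[ \log p(\mathcal{D} \mid a, w) \right]
  - \mathrm{KL}\!\left( q_\phi(a, w) \,\|\, p(a, w) \right)
```

Because $\log p(\mathcal{D})$ does not depend on $\phi$, maximizing $\mathcal{L}(\phi)$ by gradient ascent is an ordinary network training problem, which is the sense in which distribution approximation becomes end-to-end training.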
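Research Motivation and Goals
- Address the inefficiency and sub-optimal performance caused by random or uniform sub-network sampling in conventional one-shot NAS.
- Mitigate the problem that weights inherited from the super-network are sub-optimal for individual sub-networks.
- Develop a method that samples architectures and weights jointly from a learned posterior distribution, improving search efficiency and accuracy.
- Reduce the number of sub-network evaluations needed during the search while maintaining or improving final model performance.
Proposed Method
- Variational inference is used to estimate the joint posterior distribution over network architectures and weights.
- A hybrid network representation is introduced so that this joint distribution can be parameterized effectively.
- The distribution approximation problem is recast as an end-to-end training problem that is solved with variational dropout.
- Sub-networks are sampled according to the estimated posterior probability, with each architecture drawn together with its weights as a pair rather than inheriting weights from an independently trained super-network.
- The whole procedure is trained in a single end-to-end fashion, enabling an efficient, differentiable search.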
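The paired sampling step can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the search space (`CANDIDATE_OPS`, `NUM_EDGES`), the function names, the use of one scalar weight per operation, and the factorized form of the approximate posterior. The multiplicative Gaussian noise on the weights is the standard Gaussian variational dropout parameterization, used here as a stand-in; the summary does not confirm that the paper uses exactly this form.

```python
import math
import random

# Hypothetical toy search space: every edge picks one candidate operation.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
NUM_EDGES = 4


def softmax(logits):
    """Turn a list of logits into a categorical distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_pair(arch_logits, weight_mean, weight_log_alpha, rng):
    """Draw one (architecture, weights) pair from a factorized approximate posterior.

    arch_logits[e][k]      logit of operation k on edge e (assumed already learned)
    weight_mean[e][k]      mean of the scalar toy weight attached to that operation
    weight_log_alpha[e][k] log noise-to-signal ratio of that weight
    """
    arch, weights = [], []
    for e in range(len(arch_logits)):
        probs = softmax(arch_logits[e])
        # Architecture part: pick an operation in proportion to its probability.
        k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # Weight part: w = theta * (1 + sqrt(alpha) * eps), with eps ~ N(0, 1).
        theta = weight_mean[e][k]
        alpha = math.exp(weight_log_alpha[e][k])
        w = theta * (1.0 + math.sqrt(alpha) * rng.gauss(0.0, 1.0))
        arch.append(CANDIDATE_OPS[k])
        weights.append(w)
    return arch, weights


def sample_uniform(rng):
    """Baseline: choose every operation uniformly at random."""
    return [rng.choice(CANDIDATE_OPS) for _ in range(NUM_EDGES)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up posterior parameters that mildly favor the first operation.
    logits = [[2.0, 0.5, 0.0, 0.0] for _ in range(NUM_EDGES)]
    means = [[0.5, 0.3, 0.1, 0.1] for _ in range(NUM_EDGES)]
    log_alpha = [[-3.0] * len(CANDIDATE_OPS) for _ in range(NUM_EDGES)]
    print("guided :", sample_pair(logits, means, log_alpha, rng))
    print("uniform:", sample_uniform(rng))
```

In this sketch the guided sampler concentrates its draws on operations with high estimated probability and returns the weights alongside the architecture, whereas the uniform baseline spreads draws evenly and returns no weights at all. That contrast is the intuition behind needing fewer sampled sub-networks; the numbers used here are invented and say nothing about the paper's measured results.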
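Experimental Results
Research Questions
- RQ1: Can a posterior distribution over architectures and weights improve sampling efficiency in one-shot NAS?
- RQ2: How does posterior-guided sampling compare with random or uniform sampling in search efficiency and accuracy?
- RQ3: Can sampling architectures and weights jointly mitigate the sub-optimality of weights inherited from the super-network?
- RQ4: By how much can the number of sub-network evaluations be reduced while maintaining or improving search performance?
Key Findings
- Compared with conventional one-shot NAS, the proposed method reduces the number of sampled sub-networks by orders of magnitude.
- On CIFAR-10, the method achieves higher top-1 accuracy than the best network found by existing NAS methods.
- On CIFAR-10, the search is 20x faster while still delivering that higher accuracy.
- Across CIFAR-10, CIFAR-100, and ImageNet, the method achieves the best trade-off between search speed and accuracy among NAS methods.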
This summary was generated by AI and reviewed by a human editor.