QUICK REVIEW

[Paper Review] Simple And Efficient Architecture Search for Convolutional Neural Networks

Thomas Elsken, Jan-Hendrik Metzen|arXiv (Cornell University)|Nov 13, 2017

Advanced Neural Network Applications | References: 14 | Cited by: 185

One-Sentence Summary

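This paper introduces NASH, a neural architecture search method based on simple hill climbing that uses network morphisms to generate and evaluate CNNs cheaply, achieving competitive results on CIFAR-10/100 with compute comparable to training a single network.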

ABSTRACT

Neural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.
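The "short optimization runs by cosine annealing" in the abstract use the cosine learning-rate schedule from SGDR (Loshchilov & Hutter). A minimal sketch of one such run follows: the rate decays from lr_start to lr_end as lr(t) = lr_end + 0.5 * (lr_start - lr_end) * (1 + cos(pi * t / T)), with T = total_steps - 1. The function names and the example rates (0.05 down to 0.0 over 10 steps) are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List


def cosine_annealing_lr(t: int, total_steps: int, lr_start: float, lr_end: float) -> float:
    """Learning rate at step t (0-based) of one cosine annealing cycle of total_steps steps."""
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    progress = t / (total_steps - 1)         # 0.0 at the first step, 1.0 at the last
    progress = min(max(progress, 0.0), 1.0)  # clamp out-of-range steps
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * progress))


def cosine_schedule(total_steps: int, lr_start: float, lr_end: float) -> List[float]:
    """Schedule for one short run; each new candidate would begin a fresh cycle (the 'restart')."""
    return [cosine_annealing_lr(t, total_steps, lr_start, lr_end) for t in range(total_steps)]


# Hypothetical example values: 10 steps from 0.05 down to 0.0.
lrs = cosine_schedule(10, lr_start=0.05, lr_end=0.0)
print([round(lr, 4) for lr in lrs])
```

Restarting the schedule at lr_start for every candidate is what makes each run short yet still ends at a small learning rate, so candidates can be compared after only a few epochs.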

Research Motivation and Goals

Automate CNN architecture design to reduce manual trial-and-error effort.
Develop a lightweight search strategy with low computational cost.
Leverage network morphisms to initialize and expand architectures without retraining them from scratch.
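A network morphism changes the architecture while leaving the function the network computes unchanged, so a child network can inherit its parent's weights and keep training instead of starting from scratch. The sketch below illustrates two such operations, deepening with an identity layer and widening by duplicating a unit, on a tiny fully connected ReLU network in pure Python. This is a simplified assumption: the paper applies morphisms to CNN layers (and also adds skip connections), the widening rule here follows the Net2Net-style duplicate-and-halve trick, and all helper names are hypothetical.

```python
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix W, bias b); y = W @ x + b


def forward(layers: List[Layer], x: List[float]) -> List[float]:
    """Fully connected net with ReLU after every layer except the last."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def deepen(layers: List[Layer], i: int) -> List[Layer]:
    """Insert an identity layer (W = I, b = 0) after hidden layer i.
    Hidden outputs are post-ReLU and therefore >= 0, so ReLU(I h + 0) = h."""
    assert 0 <= i < len(layers) - 1, "only hidden layers can be followed by a new ReLU layer"
    n = len(layers[i][1])
    identity = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    return layers[: i + 1] + [(identity, [0.0] * n)] + layers[i + 1 :]


def widen(layers: List[Layer], i: int, j: int) -> List[Layer]:
    """Duplicate hidden unit j of layer i and halve its outgoing weights."""
    assert 0 <= i < len(layers) - 1
    W, b = layers[i]
    W_next, b_next = layers[i + 1]
    W_wide = [row[:] for row in W] + [W[j][:]]  # copy unit j's incoming weights
    b_wide = b[:] + [b[j]]
    W_next_wide = []
    for row in W_next:
        row = row[:]
        row[j] /= 2.0                           # the original unit keeps half ...
        W_next_wide.append(row + [row[j]])      # ... and the copy gets the other half
    return layers[:i] + [(W_wide, b_wide), (W_next_wide, b_next[:])] + layers[i + 2 :]


# Hypothetical 2-2-1 network and input.
net = [([[1.0, -2.0], [0.5, 1.5]], [0.25, -0.5]), ([[2.0, -1.0]], [0.5])]
x = [1.0, 2.0]
print(forward(net, x), forward(deepen(net, 0), x), forward(widen(net, 0, 1), x))
# → [-2.5] [-2.5] [-2.5]
```

All three networks produce the same output, which is the property that lets a morphed child start from its parent's trained weights.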

Proposed Method

Formalize network morphisms: architecture transformations that preserve the function the network computes.
Use a hill-climbing search (NASH) that repeatedly applies random network morphisms to the current best model and briefly trains the resulting child networks.
Train each new candidate for only a few epochs and select the best one by validation performance.
Use SGDR-style cosine annealing, restarting the learning-rate schedule for each candidate, to keep these inner-loop training runs short.
Optionally, ensemble snapshot models from multiple hill-climbing iterations to further improve performance.
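The search loop described above can be sketched as follows. Everything here is a toy stand-in under stated assumptions: an "architecture" is just a tuple of hidden-layer widths, `apply_random_morphism` and `toy_score` are hypothetical, and `train_and_score` replaces what in NASH would be a short cosine-annealing training run of a child that inherits its parent's weights, followed by evaluation on a validation set. Keeping the current best among the candidates is a design choice of this sketch; with a deterministic score it makes the best score non-decreasing.

```python
import random
from typing import Callable, List, Tuple

Arch = Tuple[int, ...]  # hypothetical toy architecture: the width of each hidden layer


def apply_random_morphism(arch: Arch, rng: random.Random) -> Arch:
    """Hypothetical stand-in for a random network morphism: deepen or widen one layer."""
    layers = list(arch)
    pos = rng.randrange(len(layers))
    if rng.random() < 0.5:
        layers.insert(pos + 1, layers[pos])  # deepen: new layer with the same width
    else:
        layers[pos] *= 2                     # widen: double this layer's width
    return tuple(layers)


def hill_climb(
    arch0: Arch,
    train_and_score: Callable[[Arch], float],  # trains briefly, returns a validation score
    n_steps: int,
    n_neigh: int,
    n_morph: int,  # hypothetical: morphisms applied per child
    rng: random.Random,
) -> Tuple[Arch, float]:
    """Greedy hill climbing over architectures using random morphisms."""
    best, best_score = arch0, train_and_score(arch0)
    for _ in range(n_steps):
        # Candidate set: the current best plus (n_neigh - 1) morphed children.
        candidates: List[Arch] = [best]
        for _ in range(n_neigh - 1):
            child = best
            for _ in range(n_morph):
                child = apply_random_morphism(child, rng)
            candidates.append(child)
        scored = [(train_and_score(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda sc: sc[0])  # ties keep the parent
    return best, best_score


def toy_score(arch: Arch) -> float:
    """Hypothetical objective standing in for validation accuracy."""
    return -abs(sum(arch) - 256) - 4.0 * len(arch)


best, score = hill_climb((16, 16), toy_score, n_steps=8, n_neigh=8, n_morph=3, rng=random.Random(0))
print(best, score)
```

The values n_steps=8 and n_neigh=8 mirror the hyperparameter names in the results table; in the real method each step costs n_neigh short training runs rather than cheap calls to a toy function.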

Experimental Results

Research Questions

RQ1: Can simple network morphisms effectively navigate CNN search spaces while keeping the training cost close to that of a single network?
RQ2: Does hill climbing with morphisms yield architectures competitive with hand-crafted designs and other automated methods?
RQ3: How does the approach scale with computational resources on CIFAR-10 and CIFAR-100?

Key Findings

CIFAR-10 results:
Model	Resources	Params (M)	Error (%)
Shake-Shake (Gastaldi, 2017)	2 days, 2 GPUs	26	2.9
WRN 28-10 (Loshchilov & Hutter, 2017)	1 day, 1 GPU	36.5	3.86
Baker et al. (2016)	8-10 days, 10 GPUs	11	6.9
Cai et al. (2017)	3 days, 5 GPUs	19.7	5.7
Zoph & Le (2017)	800 GPUs, ? days	37.5	3.65
Real et al. (2017)	250 GPUs, ? days	5.4	5.4
Saxena & Verbeek (2016)	?	21	7.4
Brock et al. (2017)	3 days, 1 GPU	16.0	4.0
Ours (random networks, n_steps=5, n_neigh=1)	4.5 hours	4.4	6.5
Ours (n_steps=5, n_neigh=8)	0.5 days, 1 GPU	5.7	5.7
Ours (n_steps=8, n_neigh=8)	1 day, 1 GPU	19.7	5.2
Ours (snapshot ensemble)	2 days, 1 GPU	57.8	4.7
Ours (ensemble across runs)	1 day, 4 GPUs	88	4.4

NASH finds and trains competitive CNNs at roughly the cost of training a single network.
On CIFAR-10, NASH achieves under 6% error in about 12 hours on one GPU and near 5% after one day.
On CIFAR-100, the method achieves under 24% error in one day and approaches 20% after two days.
Snapshot ensembles and cross-run ensembles lower the CIFAR-10 error further (4.7% and 4.4%), beating every other automated method in the table except Zoph & Le (2017) and Brock et al. (2017).
Retraining the discovered architectures from scratch gives similar final performance, indicating that inheriting weights through morphisms does not hurt the final result.
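Both ensemble variants combine the predictions of several trained networks. A common way to do this, assumed here because the summary does not state the combination rule, is to average the members' softmax probabilities and predict the class with the highest mean. The function names and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits: Sequence[Sequence[float]]) -> int:
    """Average the members' class probabilities and return the index of the top class."""
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Three hypothetical ensemble members scoring one input over three classes.
print(ensemble_predict([[0.0, 3.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]))  # → 1
```

Two of the three members rank class 0 first, so a majority vote would return 0; probability averaging returns 1 because the first member is far more confident. Averaging keeps that confidence information, which voting discards.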

This review was generated by AI and reviewed by human editors.