QUICK REVIEW

[Paper Review] Simple And Efficient Architecture Search for Convolutional Neural Networks

Thomas Elsken, Jan-Hendrik Metzen|arXiv (Cornell University)|Nov 13, 2017
Advanced Neural Network Applications | References: 14 | Cited by: 185
One-Sentence Summary

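This paper introduces NASH, a neural architecture search method based on simple hill climbing. It uses network morphisms to generate and evaluate CNNs cheaply, and achieves competitive CIFAR-10/100 results at a computational cost similar to that of training a single network.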

ABSTRACT

Neural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.

Motivation and Objectives

  • Automate CNN architecture design to reduce manual trial-and-error effort.
  • Develop a lightweight search strategy with low computational costs.
  • Leverage network morphisms to initialize and expand architectures without full retraining.
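
To make "expanding an architecture without full retraining" concrete, here is a minimal sketch of one function-preserving transformation: inserting an identity-initialized layer into a tiny fully connected ReLU network. The toy network and the helper names (`forward`, `deepen`) are assumptions made for illustration; the paper applies morphisms to convolutional networks, and this is not its implementation.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(layers, x):
    """Apply each weight matrix followed by ReLU."""
    for W in layers:
        x = relu(matvec(W, x))
    return x

def deepen(layers, index):
    """Insert an identity-initialized layer after position `index` (hypothetical helper).

    The preceding layer ends in ReLU, so its outputs h are non-negative and
    ReLU(I @ h) == h. The deeper network computes the same function.
    """
    width = len(layers[index])  # number of output units of that layer
    identity = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
    return layers[: index + 1] + [identity] + layers[index + 1 :]

random.seed(0)
parent = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],  # 3 -> 4 units
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],  # 4 -> 2 units
]
child = deepen(parent, 0)  # one extra layer, same outputs

x = [0.5, -1.0, 2.0]
print(forward(parent, x))
print(forward(child, x))
```

Because the child starts with the parent's outputs, it only needs a short training run to make use of the added capacity, rather than training from scratch.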

Proposed Method

  • Formalize network morphisms to enable architecture transformations with preserved function.
  • Use a hill-climbing search (NASH) that iteratively applies random morphisms to the current best model and briefly trains the resulting successors.
  • Train new candidates with short SGDR runs and select the best on a validation set.
  • Employ cosine annealing with restarts for efficient inner-loop training.
  • Optionally ensemble snapshots from multiple iterations to boost performance.
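
The steps above can be sketched on a toy problem. In this sketch the "network" is a polynomial, a "morphism" appends a higher-degree term with weight 0 (so the function is unchanged), and training is plain gradient descent with a cosine-annealed step size. All names, the toy task, and the hyperparameter values are assumptions for illustration. The sketch also assumes the current best model is kept as a candidate and trained further in each step.

```python
import math
import random

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate for one short training run."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def predict(coeffs, x):
    """Toy 'network': a polynomial whose degree stands in for architecture size."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def loss(coeffs, data):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in data) / len(data)

def apply_morphism(coeffs):
    """Function-preserving growth: a new highest-degree term with weight 0."""
    return coeffs + [0.0]

def short_train(coeffs, data, steps=20):
    """A short gradient-descent run with a cosine-annealed step size."""
    coeffs = list(coeffs)
    for t in range(steps):
        lr = cosine_lr(t, steps)
        grads = [0.0] * len(coeffs)
        for x, y in data:
            err = predict(coeffs, x) - y
            for i in range(len(coeffs)):
                grads[i] += 2 * err * x ** i / len(data)
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs

def hill_climb(model, train, val, n_steps=5, n_neigh=4, rng=None):
    rng = rng or random.Random(0)
    for _ in range(n_steps):
        candidates = [short_train(model, train)]  # keep training the parent too
        for _ in range(n_neigh - 1):
            child = model
            for _ in range(rng.randint(1, 2)):  # random number of morphisms
                child = apply_morphism(child)
            candidates.append(short_train(child, train))
        model = min(candidates, key=lambda m: loss(m, val))  # best on validation
    return model

def target(x):
    return math.sin(3 * x)

train = [(x, target(x)) for x in (-1 + 2 * i / 39 for i in range(40))]
val = [(x, target(x)) for x in (-0.975 + 2 * i / 39 for i in range(39))]

start = [0.0, 0.0]
best = hill_climb(start, train, val)
print(len(best), loss(start, val), loss(best, val))
```

Each restart of `short_train` begins again at `lr_max`, which loosely mirrors the warm restarts of SGDR. Every child inherits the parent's fitted coefficients, so no candidate is trained from scratch.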

Experimental Results

Research Questions

  • RQ1: Can simple network morphisms effectively navigate CNN search spaces while keeping training cost close to that of a single network?
  • RQ2: Does hill climbing with morphisms yield competitive architectures compared to hand-crafted or other automated methods?
  • RQ3: How does the approach scale on CIFAR-10 and CIFAR-100 relative to computational resources?

Key Findings

Model | Resources | Params (mil.) | Error (%)
Shake-Shake (Gastaldi, 2017) | 2 days, 2 GPUs | 26 | 2.9
WRN 28-10 (Loshchilov & Hutter, 2017) | 1 day, 1 GPU | 36.5 | 3.86
Baker et al. (2016) | 8-10 days, 10 GPUs | 11 | 6.9
Cai et al. (2017) | 3 days, 5 GPUs | 19.7 | 5.7
Zoph & Le (2017) | 800 GPUs, ? days | 37.5 | 3.65
Real et al. (2017) | 250 GPUs, ? days | 5.4 | 5.4
Saxena & Verbeek (2016) | ? | 21 | 7.4
Brock et al. (2017) | 3 days, 1 GPU | 16.0 | 4.0
Ours (random networks, n_steps=5, n_neigh=1) | 4.5 hours | 4.4 | 6.5
Ours (n_steps=5, n_neigh=8) | 0.5 days, 1 GPU | 5.7 | 5.7
Ours (n_steps=8, n_neigh=8) | 1 day, 1 GPU | 19.7 | 5.2
Ours (snapshot ensemble) | 2 days, 1 GPU | 57.8 | 4.7
Ours (ensemble across runs) | 1 day, 4 GPUs | 88 | 4.4

  • NASH finds and trains competitive CNNs at roughly the cost of training a single network.
  • On CIFAR-10, NASH achieves under 6% error in about 12 hours on one GPU and near 5% after one day.
  • On CIFAR-100, the method achieves under 24% error in one day and approaches 20% after two days.
  • Snapshot ensembles and cross-run ensembles further improve results, sometimes outperforming several baselines.
  • Retraining discovered architectures from scratch shows similar final performance, indicating weight inheritance via morphisms does not harm final outcomes.
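
As a rough illustration of how an ensemble of snapshots can be combined, the sketch below averages per-class probabilities from several models and picks the most likely class. This summary does not say how the paper combines predictions, so probability averaging is an assumption here, and the numbers are made up for the example.

```python
def ensemble_predict(prob_lists):
    """Average per-class probabilities across models; return (average, predicted class)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)

# Hypothetical softmax outputs of three snapshots for one image (3 classes).
snapshots = [
    [0.50, 0.40, 0.10],  # snapshot 1 prefers class 0
    [0.20, 0.70, 0.10],  # snapshot 2 prefers class 1
    [0.45, 0.45, 0.10],  # snapshot 3 is undecided
]
avg, label = ensemble_predict(snapshots)
print(avg, label)  # class 1 has the highest average probability
```

Averaging lets a confident snapshot outweigh uncertain ones, which is one reason ensembles often lower the error at the price of a larger total parameter count, as in the last two rows of the table.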

This review was generated by AI and reviewed by a human editor.