QUICK REVIEW

[Paper Review] Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham, Melody Y. Guan | arXiv (Cornell University) | Feb 9, 2018
Advanced Neural Network Applications | References: 26 | Cited by: 630
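One-Sentence Summary

ENAS speeds up neural architecture search by sharing weights among child models, achieving competitive results with more than 1000x fewer GPU-hours: a test perplexity of 55.8 on Penn Treebank and a test error of 2.89% on CIFAR-10.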

ABSTRACT

We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.

Motivation and Objectives

  • Reduce the computational cost of neural architecture search (NAS) while maintaining high performance.
  • Propose a framework where all candidate architectures share weights, enabling subgraph-based search within a larger DAG.
  • Develop a controller that learns to select subarchitectures via reinforcement learning while shared weights are trained.
  • Demonstrate ENAS effectiveness across language (Penn Treebank) and image (CIFAR-10) domains.

Proposed Method

  • Represent NAS search spaces as subgraphs of a single large computational DAG.
  • Train a controller RNN with policy gradient to sample subgraphs that maximize the expected reward on the validation set.
  • Share parameters among all child architectures to avoid retraining from scratch.
  • Alternate training: optimize shared weights ω for child models and controller parameters θ for architecture sampling.
  • Derive final architectures by sampling from the trained controller and retraining the best candidate from scratch.
  • Apply separate search spaces for recurrent cells and convolutional networks, including macro (full nets) and micro (cell) searches.
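
The alternating scheme above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the controller is a table of independent softmax logits rather than an RNN, the child models solve a small regression task with squared error instead of cross entropy, and every name (`run_enas`, `sample_arch`, `OPS`, and so on) is hypothetical. Each node picks one operation, and the weight `omega[n][k]` is shared by every architecture that picks operation `k` at node `n`.

```python
import math
import random

# Candidate operations per node (an illustrative search space, not the paper's).
OPS = {
    "identity": lambda x: x,
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}
OP_NAMES = list(OPS)
NUM_NODES = 2


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_arch(theta, rng):
    """Sample one op index per node from the controller policy."""
    idx = range(len(OP_NAMES))
    return [rng.choices(idx, weights=softmax(row))[0] for row in theta]


def predict(arch, omega, x):
    """Child model: uses only the shared weights selected by `arch`."""
    return sum(omega[n][k] * OPS[OP_NAMES[k]](x) for n, k in enumerate(arch))


def mse(arch, omega, data):
    return sum((predict(arch, omega, x) - y) ** 2 for x, y in data) / len(data)


def train_shared_weights(theta, omega, train, rng, lr=0.02):
    """Phase 1: controller fixed, SGD on shared weights of sampled children."""
    for x, y in train:
        arch = sample_arch(theta, rng)
        err = predict(arch, omega, x) - y
        for n, k in enumerate(arch):
            omega[n][k] -= lr * 2.0 * err * OPS[OP_NAMES[k]](x)


def train_controller(theta, omega, valid, rng, baseline,
                     steps=20, lr=0.1, decay=0.9):
    """Phase 2: shared weights fixed, REINFORCE with a moving-average baseline."""
    for _ in range(steps):
        arch = sample_arch(theta, rng)
        reward = -mse(arch, omega, valid)  # higher reward = lower validation loss
        advantage = 0.0 if baseline is None else reward - baseline
        baseline = reward if baseline is None else (
            decay * baseline + (1 - decay) * reward)
        for n, k in enumerate(arch):
            probs = softmax(theta[n])
            for j in range(len(probs)):
                # d log pi / d theta[n][j] = 1[j == k] - probs[j]
                indicator = 1.0 if j == k else 0.0
                theta[n][j] += lr * advantage * (indicator - probs[j])
    return baseline


def run_enas(epochs=30, seed=0):
    rng = random.Random(seed)

    def target(x):
        return 2.0 * math.tanh(x) + 0.5 * x

    train = [(x, target(x)) for x in (rng.uniform(-2, 2) for _ in range(64))]
    valid = [(x, target(x)) for x in (rng.uniform(-2, 2) for _ in range(32))]
    theta = [[0.0] * len(OP_NAMES) for _ in range(NUM_NODES)]  # controller
    omega = [[0.0] * len(OP_NAMES) for _ in range(NUM_NODES)]  # shared weights
    baseline = None
    for _ in range(epochs):
        train_shared_weights(theta, omega, train, rng)
        baseline = train_controller(theta, omega, valid, rng, baseline)
    # Derive a final architecture: sample candidates, keep the best on validation.
    candidates = [sample_arch(theta, rng) for _ in range(10)]
    best = min(candidates, key=lambda a: mse(a, omega, valid))
    return best, theta, omega, valid
```

Calling `run_enas()` returns the selected op indices along with the trained controller logits and shared weights. The sketch stops at selection; in the method summarized above, the chosen architecture would then be retrained from scratch.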

Experimental Results

Research Questions

  • RQ1: Can parameter sharing enable substantial efficiency gains in NAS without sacrificing performance?
  • RQ2: How does ENAS compare to traditional NAS and random search in terms of accuracy and search cost across language and vision tasks?
  • RQ3: What are the effects of different search spaces (recurrent cells, convolutional networks, convolutional cells) on ENAS performance?
  • RQ4: What is the practical training regime and reward design that yields good architectures under weight sharing?

Key Findings

| Table/Result | Details | Params (million) | Test perplexity (Penn Treebank) or test error in % (CIFAR-10) |
|---|---|---|---|
| ENAS (recurrent cells) | Test on Penn Treebank | 24 | 55.8 |
| NAS (Zoph & Le 2017) | Test on Penn Treebank (baseline for comparison) | 54 | 62.4 |
| ENAS macro (full networks) | CIFAR-10 test error with macro search | 21.3 | 4.23 |
| ENAS macro (more channels) | CIFAR-10 test error with macro search and more channels | 38.0 | 3.87 |
| ENAS micro (cells) | CIFAR-10 test error with micro search | 4.6 | 3.54 |
| ENAS micro (cells) + CutOut | CIFAR-10 test error with micro search and CutOut | 4.6 | 2.89 |

  • ENAS discovers competitive architectures while using far fewer GPU-hours (less than 16 hours on a single GTX 1080Ti), achieving over 1000x speedups versus NAS.
  • On Penn Treebank, ENAS achieves a test perplexity of 55.8, outperforming NAS (62.4) and reaching a new state-of-the-art among non-post-processed models.
  • On CIFAR-10, ENAS finds architectures with 4.23% test error in the macro space and 3.54% in the micro space without CutOut, and 2.89% with CutOut, approaching NASNet-A performance.
  • Across tasks, ENAS outperforms random search and a non-trained controller baseline, highlighting the importance of the learned controller.
  • The best ENAS models suggest that architecture search benefits from allowing skip connections and diverse operation choices; the discovered cells also behave like local optima, in that small changes to them hurt performance.
  • ENAS’s weight-sharing approach avoids the prohibitive cost of training each candidate from scratch, enabling scalable architecture discovery.


This review was generated by AI and reviewed by human editors.