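[Paper Summary] Efficient Neural Architecture Search via Parameter Sharing
ENAS speeds up neural architecture search by sharing weights among child models. It reaches competitive results with over 1000x fewer GPU-hours: 55.8 test perplexity on Penn Treebank and 2.89% test error on CIFAR-10.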
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
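The two training objectives described in the abstract can be written compactly as follows. This is a sketch of the formulation as we understand it from the paper. The notation is ours for this summary: m is a sampled child model, π(m; θ) is the controller's policy, L is the training cross-entropy, R is the validation reward, b is a moving-average baseline, and M is the number of children sampled per step.

```latex
% Shared weights \omega: minimize the expected training loss of sampled children,
% estimated by Monte Carlo over M samples (the paper reports M = 1 works well).
\nabla_{\omega}\, \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{L}(m;\omega)\big]
  \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{\omega} \mathcal{L}(m_i;\omega),
  \qquad m_i \sim \pi(m;\theta)

% Controller \theta: with \omega fixed, maximize the expected validation reward
% using REINFORCE, subtracting a moving-average baseline b to reduce variance.
\nabla_{\theta}\, \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{R}(m,\omega)\big]
  \approx \big(\mathcal{R}(m,\omega) - b\big)\, \nabla_{\theta} \log \pi(m;\theta)
```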
Research Motivation and Goals
- Reduce the computational cost of neural architecture search (NAS) while maintaining high performance.
- Propose a framework where all candidate architectures share weights, enabling subgraph-based search within a larger DAG.
- Develop a controller that learns to select subarchitectures via reinforcement learning while shared weights are trained.
- Demonstrate ENAS effectiveness across language (Penn Treebank) and image (CIFAR-10) domains.
Proposed Method
- Represent NAS search spaces as subgraphs of a single large computational DAG.
- Train a controller RNN with policy gradient to sample subgraphs that maximize the expected validation reward.
- Share parameters among all child architectures to avoid retraining from scratch.
- Alternate training: optimize shared weights ω for child models and controller parameters θ for architecture sampling.
- Derive final architectures by sampling from the trained controller and retraining the best candidate from scratch.
- Apply separate search spaces for recurrent cells and convolutional networks, including macro (full nets) and micro (cell) searches.
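The alternating procedure above can be sketched on a deliberately tiny, hypothetical search space. Everything concrete here is an illustrative assumption, not the paper's code: the two-node search space, the ops `zero`/`identity`/`square`, the per-node targets, the learning rates, and the replacement of the LSTM controller by independent per-node softmaxes. What the sketch tries to reproduce is the structure: a single shared weight table ω indexed by (node, op) that every sampled child reuses, an ω step on one sampled child (M = 1), and a θ step using REINFORCE with a moving-average baseline and reward equal to negative validation loss.

```python
import math
import random

# Hypothetical toy search space (not the paper's): each node picks one op.
# Node i of a child model outputs w[i][op] * OPS[op](x), scored against TARGETS[i](x).
OPS = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "square": lambda x: x * x,
}
OP_NAMES = list(OPS)
TARGETS = [lambda x: 2.0 * x, lambda x: 3.0 * x * x]  # hypothetical targets
NUM_NODES = len(TARGETS)


def child_loss(arch, w, x):
    """Squared error of one child model on one input, summed over nodes."""
    return sum((w[i][op] * OPS[op](x) - TARGETS[i](x)) ** 2
               for i, op in enumerate(arch))


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_arch(theta, rng):
    """Sample one op per node from independent softmaxes (a simplified
    stand-in for the LSTM controller). Returns op names and indices."""
    idx = [rng.choices(range(len(OP_NAMES)), weights=softmax(t))[0]
           for t in theta]
    return [OP_NAMES[k] for k in idx], idx


def search(steps=3000, lr_w=0.05, lr_theta=0.1, decay=0.95, seed=0):
    rng = random.Random(seed)
    train_xs = [rng.uniform(-1, 1) for _ in range(64)]
    valid_xs = [rng.uniform(-1, 1) for _ in range(32)]
    # Shared weights omega: ONE scalar per (node, op), reused by every child
    # that picks that op at that node, so no child is trained from scratch.
    w = [{op: 0.0 for op in OP_NAMES} for _ in range(NUM_NODES)]
    theta = [[0.0] * len(OP_NAMES) for _ in range(NUM_NODES)]
    baseline = None
    for _ in range(steps):
        # Phase 1 (omega): sample one child (M = 1) and take an SGD step on it.
        arch, _ = sample_arch(theta, rng)
        x = rng.choice(train_xs)
        for i, op in enumerate(arch):
            err = w[i][op] * OPS[op](x) - TARGETS[i](x)
            w[i][op] -= lr_w * 2.0 * err * OPS[op](x)
        # Phase 2 (theta): REINFORCE, reward = -validation loss of a sampled child.
        arch, idx = sample_arch(theta, rng)
        reward = -sum(child_loss(arch, w, v) for v in valid_xs) / len(valid_xs)
        baseline = reward if baseline is None else decay * baseline + (1 - decay) * reward
        advantage = reward - baseline
        for i, k in enumerate(idx):
            probs = softmax(theta[i])
            for j in range(len(OP_NAMES)):
                # d log softmax(theta_i)[k] / d theta_i[j] = 1[j == k] - p_j
                indicator = 1.0 if j == k else 0.0
                theta[i][j] += lr_theta * advantage * (indicator - probs[j])
    return w, theta


def best_arch(theta):
    """Simplified derivation step: the most likely op at each node."""
    return [OP_NAMES[max(range(len(t)), key=t.__getitem__)] for t in theta]


if __name__ == "__main__":
    w, theta = search()
    print(best_arch(theta))
```

Under these toy assumptions the controller should come to favor `identity` at node 0 and `square` at node 1, since those are the only ops able to fit their targets. `best_arch` simply takes the most likely op per node. The procedure summarized above instead samples several candidates, keeps the best by validation reward, and retrains it from scratch.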
Experimental Results
Research Questions
- RQ1: Can parameter sharing enable substantial efficiency gains in NAS without sacrificing performance?
- RQ2: How does ENAS compare to traditional NAS and random search in terms of accuracy and search cost across language and vision tasks?
- RQ3: What are the effects of different search spaces (recurrent cells, convolutional networks, convolutional cells) on ENAS performance?
- RQ4: What is the practical training regime and reward design that yields good architectures under weight sharing?
Key Findings
| Model | Setting | Params (million) | Test Perplexity (PTB) or Test Error % (CIFAR-10) |
|---|---|---|---|
| ENAS (recurrent cells) | Test on Penn Treebank | 24 | 55.8 |
| NAS (Zoph & Le 2017) | Test on Penn Treebank (baseline for comparison) | 54 | 62.4 |
| ENAS macro (full networks) | CIFAR-10 test error with macro search | 21.3 | 4.23 |
| ENAS macro (more channels) | CIFAR-10 test error with macro search and more channels | 38.0 | 3.87 |
| ENAS micro (cells) | CIFAR-10 test error with micro search | 4.6 | 3.54 |
| ENAS micro (cells) + CutOut | CIFAR-10 test error with micro search and CutOut | 4.6 | 2.89 |
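The Penn Treebank rows report perplexity, the exponential of the average per-token cross-entropy in nats, so lower is better. A minimal sketch of that relationship follows. The helper names are hypothetical, and in practice the token log-probabilities would come from the language model being evaluated.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def nats_per_token(ppl):
    """Invert perplexity back to the average cross-entropy per token."""
    return math.log(ppl)


# A model that spreads probability uniformly over a 10,000-word vocabulary
# (the size of the standard Penn Treebank vocabulary) has perplexity 10,000.
uniform = [math.log(1 / 10_000)] * 5
print(round(perplexity(uniform)))
# Average cross-entropy implied by the ENAS and NAS perplexities in the table.
print(round(nats_per_token(55.8), 3), round(nats_per_token(62.4), 3))
```

Read this way, the gap between ENAS (55.8) and NAS (62.4) on Penn Treebank is roughly 0.11 nats per token.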
- ENAS discovers competitive architectures while using far fewer GPU-hours (less than 16 hours on a single GTX 1080Ti), achieving over 1000x speedups versus NAS.
- On Penn Treebank, ENAS achieves a test perplexity of 55.8, outperforming NAS (62.4) and setting a new state of the art among methods without post-training processing.
- On CIFAR-10, ENAS finds architectures with 4.23% test error in the macro space and 3.54% in the micro space (both without CutOut); adding CutOut to the micro-search model lowers the error to 2.89%, approaching NASNet-A's 2.65%.
- Across tasks, ENAS outperforms random search and a non-trained controller baseline, highlighting the importance of the learned controller.
- The best ENAS models show that architecture search benefits from allowing skip connections and diverse operation choices, with observed local-minimum behavior in the discovered cells.
- ENAS’s weight-sharing approach avoids the prohibitive cost of training each candidate from scratch, enabling scalable architecture discovery.
This summary was generated by AI and reviewed by human editors.