QUICK REVIEW

[Paper Digest] Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham, Melody Y. Guan|arXiv (Cornell University)|Feb 9, 2018

Advanced Neural Network Applications|References: 26|Cited by: 630

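One-Sentence Summary

ENAS speeds up neural architecture search by sharing weights among child models, achieving competitive results with roughly 1000x fewer GPU-hours than standard NAS; it reaches a test perplexity of 55.8 on Penn Treebank and a test error of 2.89% on CIFAR-10.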

ABSTRACT

We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
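
The two objectives named in the abstract can be written out as a short sketch. The notation is assumed here for illustration and does not come from the abstract itself: m is a sampled child model, π(m; θ) is the controller's policy with parameters θ, ω denotes the shared weights, L is the cross-entropy training loss, R is the validation reward, M is the number of sampled child models, and b is a baseline of the kind commonly used to reduce variance in REINFORCE-style estimators.

```latex
% Shared weights (controller parameters \theta held fixed):
\min_{\omega} \; \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{L}(m;\omega)\big],
\qquad
\nabla_{\omega}\, \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{L}(m;\omega)\big]
\approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{\omega}\, \mathcal{L}(m_i;\omega)

% Controller (shared weights \omega held fixed):
\max_{\theta} \; \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{R}(m;\omega)\big],
\qquad
\nabla_{\theta}\, \mathbb{E}_{m \sim \pi(m;\theta)}\big[\mathcal{R}(m;\omega)\big]
\approx \frac{1}{M} \sum_{i=1}^{M} \big(\mathcal{R}(m_i;\omega) - b\big)\, \nabla_{\theta} \log \pi(m_i;\theta)
```

Both estimates are Monte Carlo averages over child models sampled from the current policy; the first is evaluated on training data and the second on validation data.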

Motivation and Objectives

Reduce the computational cost of neural architecture search (NAS) while maintaining high performance.
Propose a framework where all candidate architectures share weights, enabling subgraph-based search within a larger DAG.
Develop a controller that learns to select subarchitectures via reinforcement learning while shared weights are trained.
Demonstrate ENAS effectiveness across language (Penn Treebank) and image (CIFAR-10) domains.

Proposed Method

Represent NAS search spaces as subgraphs of a single large computational DAG.
Train a controller RNN (policy gradient) to sample subgraphs that maximize validation rewards.
Share parameters among all child architectures to avoid retraining from scratch.
Alternate training: optimize shared weights ω for child models and controller parameters θ for architecture sampling.
Derive final architectures by sampling from the trained controller and retraining the best candidate from scratch.
Apply separate search spaces for recurrent cells and convolutional networks, including macro (full nets) and micro (cell) searches.
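
The alternating procedure above can be illustrated with a deliberately small toy, offered as a sketch under stated assumptions rather than the authors' implementation. The assumptions: the controller is a table of independent logits instead of an RNN, each node of the graph activates one of three candidate operations, the task is scalar regression with squared error instead of cross-entropy, the reward is the negative validation error, and the final retraining from scratch is omitted. All names (`search`, `sample_child`, `omega`, `theta`) are hypothetical.

```python
import math
import random

OPS = {
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    "identity": lambda x: x,
}
OP_NAMES = list(OPS)
NUM_NODES = 2  # each node activates exactly one candidate op


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_child(theta, rng):
    """Sample one op index per node from the controller's distributions."""
    return [rng.choices(range(len(OP_NAMES)), weights=softmax(row))[0] for row in theta]


def predict(child, omega, x):
    """Run the sampled subgraph; weights are looked up in the shared table."""
    return sum(omega[(node, op)] * OPS[OP_NAMES[op]](x) for node, op in enumerate(child))


def mse(child, omega, data):
    return sum((predict(child, omega, x) - y) ** 2 for x, y in data) / len(data)


def search(rounds=200, seed=0):
    rng = random.Random(seed)
    target = lambda x: 2.0 * math.tanh(x) + 0.5 * x  # toy regression target (assumption)
    make = lambda n: [(x, target(x)) for x in (rng.uniform(-1, 1) for _ in range(n))]
    train, valid = make(64), make(32)

    # omega: ONE weight per (node, op) edge of the large graph, shared by all children.
    omega = {(n, o): 0.0 for n in range(NUM_NODES) for o in range(len(OP_NAMES))}
    # theta: controller logits, one row per node (stand-in for the controller RNN).
    theta = [[0.0] * len(OP_NAMES) for _ in range(NUM_NODES)]
    baseline = 0.0

    for _ in range(rounds):
        # Phase 1: fix theta, update the shared weights omega with SGD on training data.
        for x, y in train:
            child = sample_child(theta, rng)
            err = predict(child, omega, x) - y
            for node, op in enumerate(child):
                omega[(node, op)] -= 0.05 * 2.0 * err * OPS[OP_NAMES[op]](x)

        # Phase 2: fix omega, update theta with REINFORCE on the validation reward.
        for _ in range(8):
            child = sample_child(theta, rng)
            reward = -mse(child, omega, valid)
            baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
            for node, op in enumerate(child):
                probs = softmax(theta[node])
                for k in range(len(OP_NAMES)):
                    grad_logp = (1.0 if k == op else 0.0) - probs[k]
                    theta[node][k] += 0.1 * (reward - baseline) * grad_logp

    # Derive a final architecture: sample candidates, keep the best on validation.
    candidates = [sample_child(theta, rng) for _ in range(10)]
    best = min(candidates, key=lambda c: mse(c, omega, valid))
    return [OP_NAMES[o] for o in best], omega, theta
```

Calling `search()` returns the selected op names, the shared weight table, and the controller logits. The point of the sketch is the size of `omega`: it holds one entry per edge of the large graph, no matter how many child models are sampled, which is why no child needs to be trained from scratch during the search.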

Experimental Results

Research Questions

RQ1: Can parameter sharing enable substantial efficiency gains in NAS without sacrificing performance?
RQ2: How does ENAS compare to traditional NAS and random search in terms of accuracy and search cost across language and vision tasks?
RQ3: What are the effects of different search spaces (recurrent cells, convolutional networks, convolutional cells) on ENAS performance?
RQ4: What is the practical training regime and reward design that yields good architectures under weight sharing?

Key Findings

Model	Task and setting	Params (million)	Test perplexity (Penn Treebank) or test error in % (CIFAR-10)
ENAS (recurrent cells)	Test on Penn Treebank	24	55.8
NAS (Zoph & Le 2017)	Test on Penn Treebank (baseline for comparison)	54	62.4
ENAS macro (full networks)	CIFAR-10 test error with macro search	21.3	4.23
ENAS macro (more channels)	CIFAR-10 test error with macro search and more channels	38.0	3.87
ENAS micro (cells)	CIFAR-10 test error with micro search	4.6	3.54
ENAS micro (cells) + CutOut	CIFAR-10 test error with micro search and CutOut	4.6	2.89
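
To put the table in relative terms, the short calculation below derives percentage reductions from the reported figures. It is only arithmetic on the numbers in the table; the helper name `relative_reduction` is hypothetical.

```python
# Values copied from the table above.
enas_ppl, nas_ppl = 55.8, 62.4            # Penn Treebank test perplexity
enas_params, nas_params = 24, 54          # parameters, in millions
micro_err, micro_cutout_err = 3.54, 2.89  # CIFAR-10 test error, in %


def relative_reduction(new, old):
    """Fractional reduction of `new` relative to `old`."""
    return (old - new) / old


ppl_gain = relative_reduction(enas_ppl, nas_ppl)
param_saving = relative_reduction(enas_params, nas_params)
cutout_gain = relative_reduction(micro_cutout_err, micro_err)

print(f"Perplexity reduction vs. NAS: {ppl_gain:.1%}")
print(f"Parameter reduction vs. NAS: {param_saving:.1%}")
print(f"Error reduction from CutOut: {cutout_gain:.1%}")
```

On these figures, the ENAS recurrent cell lowers perplexity by about 10.6% relative to NAS with about 55.6% fewer parameters, and CutOut lowers the micro-search test error by about 18.4% in relative terms.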

ENAS discovers competitive architectures while using far fewer GPU-hours (less than 16 hours on a single GTX 1080Ti), achieving over 1000x speedups versus NAS.
On Penn Treebank, ENAS achieves a test perplexity of 55.8, outperforming NAS (62.4) and setting a new state of the art among methods without post-training processing.
On CIFAR-10, ENAS finds architectures with 4.23% test error in the macro space, 3.54% in the micro space without CutOut, and 2.89% with CutOut, which approaches NASNet's 2.65%.
Across tasks, ENAS outperforms random search and a non-trained controller baseline, highlighting the importance of the learned controller.
The best ENAS models suggest that architecture search benefits from skip connections and a diverse set of operation choices, and the discovered cells behave like local optima of the search space.
ENAS’s weight-sharing approach avoids the prohibitive cost of training each candidate from scratch, enabling scalable architecture discovery.

This digest was generated by AI and reviewed by a human editor.