QUICK REVIEW

[Paper Review] SNAS: Stochastic Neural Architecture Search

Sirui Xie, Hehui Zheng | arXiv (Cornell University) | Dec 24, 2018

Advanced Neural Network Applications | 36 references | 285 citations

TL;DR

SNAS is a differentiable, end-to-end neural architecture search framework that learns operation parameters and architecture distribution parameters simultaneously by relaxing discrete choices with a concrete distribution. It achieves competitive CIFAR-10 results and transfers to ImageNet at reduced computational cost.

ABSTRACT

We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes less epochs to find a cell architecture with state-of-the-art accuracy than non-differentiable evolution-based and reinforcement-learning-based NAS, which is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain the validation accuracy in searching, with which attention-based NAS requires parameter retraining to compete, exhibiting potentials to stride towards efficient NAS on big datasets. We have released our implementation at https://github.com/SNAS-Series/SNAS-Series.

Motivation & Objective

Build an efficient NAS framework that avoids the delayed-reward credit assignment of reinforcement-learning (RL) based NAS.
Reformulate NAS as learning a joint distribution over cell-level architectures.
Enable differentiable gradient-based updates for both operation parameters and architecture parameters.
Incorporate a global resource constraint to promote hardware-aware, compact architectures.
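
As a rough sketch, the objective outlined in the points above can be written in one line. The symbols p_alpha(Z), L_theta(Z), and C(Z) follow the notation used elsewhere in this review; the trade-off weight \eta is notation assumed here for illustration.

```latex
\min_{\alpha,\,\theta}\;
\mathbb{E}_{Z \sim p_\alpha(Z)}
\big[\, L_\theta(Z) + \eta\, C(Z) \,\big]
```

Under this reading, setting \eta = 0 gives the unconstrained search, and a larger \eta puts more weight on the resource cost.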

Proposed method

Represent the NAS search space of a cell as a DAG with one-hot architectural decisions per edge and a fully factorized joint distribution p(Z).
Relax discrete architectural choices using the concrete distribution to enable reparameterizable gradients (Gumbel-based reparameterization).
Derive a search gradient that corresponds to a policy-gradient-like credit assignment, but with differentiable rewards from the loss L_theta(Z).
Show that the search gradient optimizes the same objective as RL-based NAS in expectation, while assigning credit more efficiently and without delayed rewards.
Augment the objective with a global resource constraint that decomposes over edges to encourage smaller, faster architectures.
Optionally include a resource cost term C(Z) and show how to compute its expectation under p_alpha(Z) via tractable approximations.
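
The relaxation and the decomposable cost described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy operations, and the per-operation costs are hypothetical, and the cost model assumes a purely additive cost with independent one-hot decisions per edge, in which case the expectation has a simple closed form.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def sample_concrete(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector per edge.

    log_alpha: array of shape (num_edges, num_ops), the architecture logits.
    temperature: positive float; smaller values push samples toward one-hot.
    """
    u = rng.uniform(low=1e-10, high=1.0, size=log_alpha.shape)
    gumbel = -np.log(-np.log(u))  # standard Gumbel noise
    return softmax((log_alpha + gumbel) / temperature)


def mixed_edge_output(x, z_edge, ops):
    """Weight each candidate operation's output by its relaxed selection."""
    return sum(w * op(x) for w, op in zip(z_edge, ops))


def expected_cost(log_alpha, op_costs):
    """Expected additive cost when each edge picks op k with
    probability softmax(log_alpha)[k] (assumed cost model)."""
    return float((softmax(log_alpha) * op_costs).sum())


# Toy setup: 2 edges, 3 hypothetical candidate operations per edge.
rng = np.random.default_rng(0)
log_alpha = np.zeros((2, 3))
ops = [lambda x: 0.0 * x, lambda x: x, lambda x: 2.0 * x]
op_costs = np.array([0.0, 1.0, 2.0])  # made-up per-operation costs

z = sample_concrete(log_alpha, temperature=0.5, rng=rng)
y = mixed_edge_output(np.ones(4), z[0], ops)
cost = expected_cost(log_alpha, op_costs)
```

Because each sample z is a smooth function of log_alpha and the noise, a framework with automatic differentiation could backpropagate the loss through z to the architecture logits. Lowering the temperature moves the samples closer to discrete one-hot choices.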

Experimental results

Research questions

RQ1: Can a differentiable, stochastic NAS framework match or surpass RL/evolution-based NAS while reducing training time and avoiding delayed rewards?
RQ2: Does aligning architecture sampling with gradient-based optimization improve credit assignment and final performance compared to DARTS and ENAS?
RQ3: To what extent can a global resource constraint reduce model size and FLOPs without sacrificing accuracy, and is this constraint decomposable for scalable optimization?
RQ4: Are the learned cells transferable to larger datasets (e.g., ImageNet) while maintaining competitive accuracy and efficiency?

Key findings

SNAS achieves competitive CIFAR-10 results with 2.85% test error and 2.8M parameters under a mild constraint, outperforming 1st-order DARTS and ENAS and matching 2nd-order DARTS with fewer parameters.
The search process in SNAS maintains higher validation accuracy during search and yields more stable, less biased architectures than DARTS, with an 88% search validation accuracy observed in experiments.
SNAS-produced cells transfer to ImageNet (mobile setting) with 27.3% top-1 error, showing competitive performance relative to RL-based NAS while using substantially less computation (three orders of magnitude reduction).
Across CIFAR-10 experiments, SNAS with mild/moderate/aggressive resource constraints discovers diverse and increasingly sparse cell structures, illustrating controllable trade-offs between accuracy, parameter count, and search cost.
SNAS maintains high validation accuracy for the derived child networks without retraining, unlike DARTS where a substantial gap can appear between search and derived networks.

This review was created by AI and reviewed by human editors.