QUICK REVIEW

[Paper Review] SNAS: Stochastic Neural Architecture Search

Sirui Xie, Hehui Zheng | arXiv (Cornell University) | Dec 24, 2018
Advanced Neural Network Applications | 36 references | 285 citations
TL;DR

SNAS introduces a differentiable, end-to-end neural architecture search framework that learns operation parameters and architecture distribution parameters simultaneously by relaxing discrete choices with a concrete distribution, achieving competitive CIFAR-10 results that transfer to ImageNet at reduced computational cost.
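The concrete relaxation can be illustrated with a minimal Python sketch. It assumes a single edge whose candidate operations are scored by log alpha (the log of positive architecture parameters) and a temperature lambda; the function name sample_concrete is hypothetical and is not taken from the SNAS code release. The sample is a softmax of Gumbel-perturbed logits, so it is differentiable in alpha and approaches a one-hot vector as the temperature goes to zero.

```python
import math
import random

def sample_concrete(log_alpha, temperature, rng=random):
    """Draw a relaxed one-hot vector Z for one edge (hypothetical helper).

    log_alpha: unnormalized log-probabilities over the edge's candidate operations.
    temperature: lambda > 0; smaller values push samples toward exact one-hot vectors.
    """
    # Gumbel(0, 1) noise via the inverse CDF: G = -log(-log(U)), U ~ Uniform(0, 1)
    gumbels = []
    for _ in log_alpha:
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; avoid log(0)
            u = rng.random()
        gumbels.append(-math.log(-math.log(u)))
    # Perturb the logits, divide by the temperature, then apply a stable softmax
    scores = [(a + g) / temperature for a, g in zip(log_alpha, gumbels)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
log_alpha = [math.log(0.5), math.log(0.3), math.log(0.2)]
print(sample_concrete(log_alpha, 1.0, rng))   # soft sample: mass spread over operations
print(sample_concrete(log_alpha, 0.05, rng))  # lower temperature: pushed toward one-hot
```

Because the argmax of Gumbel-perturbed log alpha picks operation k with probability alpha_k / sum(alpha), a very cold sample behaves like a draw from the underlying categorical choice, while warmer samples keep gradients informative.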

ABSTRACT

We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in the same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes fewer epochs to find a cell architecture with state-of-the-art accuracy than non-differentiable evolution-based and reinforcement-learning-based NAS, and the cell is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain their validation accuracy during search, whereas attention-based NAS requires parameter retraining to compete, suggesting potential for efficient NAS on big datasets. We have released our implementation at https://github.com/SNAS-Series/SNAS-Series.
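To make the search gradient less abstract, the Python sketch below differentiates a loss through the relaxed sample Z = softmax((log alpha + G) / lambda) for one edge while holding the Gumbel noise G fixed (the reparameterization trick). It assumes the loss reaches Z only through a hypothetical upstream vector dL/dZ; inside a cell that vector would come from the edge output being a Z-weighted sum of operation outputs. The helper names are illustrative, and the closed form is derived by the chain rule and checked against finite differences, not copied from the released implementation.

```python
import math

def relaxed_one_hot(alpha, gumbels, temperature):
    """Concrete relaxation Z = softmax((log(alpha) + G) / temperature) for one edge."""
    scores = [(math.log(a) + g) / temperature for a, g in zip(alpha, gumbels)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def grad_wrt_alpha(alpha, gumbels, temperature, upstream):
    """Reparameterized gradient dL/dalpha_k with the Gumbel noise held fixed.

    upstream[i] is dL/dZ_i (a hypothetical input). The chain rule with
    dZ_i/dalpha_k = Z_i * (delta_ik - Z_k) / (temperature * alpha_k) gives
    dL/dalpha_k = Z_k * (upstream_k - sum_i upstream_i * Z_i) / (temperature * alpha_k).
    """
    z = relaxed_one_hot(alpha, gumbels, temperature)
    z_weighted_upstream = sum(g * zi for g, zi in zip(upstream, z))
    return [zk * (gk - z_weighted_upstream) / (temperature * ak)
            for zk, gk, ak in zip(z, upstream, alpha)]

# Made-up values for one edge with three candidate operations
alpha = [0.5, 0.3, 0.2]
gumbels = [0.1, -0.4, 0.7]
print(grad_wrt_alpha(alpha, gumbels, 0.8, [1.0, -2.0, 0.5]))
```

Since Z is unchanged when every alpha_k is multiplied by the same constant, sum_k alpha_k * dL/dalpha_k is always zero, which serves as a quick sanity check on the formula.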

Motivation & Objective

  • Develop an efficient NAS framework that avoids the delayed-reward credit-assignment problem of reinforcement learning (RL) based NAS.
  • Reformulate NAS as learning a joint distribution over cell-level architectures.
  • Enable differentiable gradient-based updates for both operation parameters and architecture parameters.
  • Incorporate a global resource constraint to promote hardware-aware, compact architectures.

Proposed method

  • Represent the NAS search space of a cell as a DAG with one-hot architectural decisions per edge and a fully factorized joint distribution p(Z).
  • Relax discrete architectural choices with the concrete distribution (a Gumbel-based reparameterization) so that gradients can pass through sampled architectures.
  • Derive a search gradient that corresponds to a policy-gradient-like credit assignment, but with differentiable rewards from the loss L_theta(Z).
  • Show equivalence to RL-based NAS objective in expectation, with more efficient credit assignment and no delayed rewards.
  • Augment the objective with a global resource constraint that decomposes over edges to encourage smaller, faster architectures.
  • Optionally include a resource cost term C(Z) and show how to compute its expectation under p_alpha(Z) via tractable approximations.
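When p_alpha(Z) is fully factorized over edges and C(Z) is a sum of per-edge costs, linearity of expectation turns E[C(Z)] into a per-edge weighted sum, which is the kind of decomposition that lets the constraint be handled edge by edge. The Python sketch below assumes that each edge picks operation k with categorical probability alpha_k / sum(alpha), that all edges share the same candidate operations, and that the operation costs are made-up numbers; it compares the decomposed formula against brute-force enumeration of every joint architecture.

```python
import itertools

def edge_probs(alpha):
    """Categorical probabilities p(Z^k = 1) for one edge (assumed alpha_k / sum(alpha))."""
    total = sum(alpha)
    return [a / total for a in alpha]

def expected_cost_decomposed(alphas, op_costs):
    """E[C(Z)] via linearity: sum over edges of sum_k p_k * cost_k."""
    return sum(sum(p * c for p, c in zip(edge_probs(a), op_costs)) for a in alphas)

def expected_cost_enumerated(alphas, op_costs):
    """Brute-force E[C(Z)]: enumerate every joint choice of one operation per edge."""
    per_edge = [edge_probs(a) for a in alphas]
    total = 0.0
    for choice in itertools.product(range(len(op_costs)), repeat=len(alphas)):
        prob, cost = 1.0, 0.0
        for edge, k in enumerate(choice):
            prob *= per_edge[edge][k]  # factorized: joint probability is a product over edges
            cost += op_costs[k]        # decomposable: total cost is a sum over edges
        total += prob * cost
    return total

# Three edges sharing three hypothetical candidate operations with made-up costs
alphas = [[1.0, 2.0, 1.0], [0.5, 0.5, 3.0], [2.0, 1.0, 1.0]]
op_costs = [0.0, 1.0, 4.0]
print(expected_cost_decomposed(alphas, op_costs))  # → 5.875
print(expected_cost_enumerated(alphas, op_costs))
```

The enumeration grows exponentially with the number of edges, while the decomposed form stays linear, which is why a per-edge treatment matters for realistic cells.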

Experimental results

Research questions

  • RQ1: Can a differentiable, stochastic NAS framework match or surpass RL/evolution-based NAS while reducing training time and avoiding delayed rewards?
  • RQ2: Does aligning architecture sampling with gradient-based optimization improve credit assignment and final performance compared to DARTS and ENAS?
  • RQ3: To what extent can a global resource constraint reduce model size and FLOPs without sacrificing accuracy, and does the constraint decompose in a way that keeps optimization scalable?
  • RQ4: Are the learned cells transferable to larger datasets (e.g., ImageNet) while maintaining competitive accuracy and efficiency?

Key findings

  • SNAS achieves competitive CIFAR-10 results with 2.85% test error and 2.8M parameters under a mild constraint, outperforming 1st-order DARTS and ENAS and matching 2nd-order DARTS with fewer parameters.
  • During search, SNAS maintains higher validation accuracy than DARTS (88% search validation accuracy observed in experiments) and yields more stable, less biased architectures.
  • SNAS-produced cells transfer to ImageNet (mobile setting) with 27.3% top-1 error, showing competitive performance relative to RL-based NAS while using substantially less computation (three orders of magnitude reduction).
  • Across CIFAR-10 experiments, SNAS with mild/moderate/aggressive resource constraints discovers diverse and increasingly sparse cell structures, illustrating controllable trade-offs between accuracy, parameter count, and search cost.
  • SNAS maintains high validation accuracy for the derived child networks without retraining, unlike DARTS where a substantial gap can appear between search and derived networks.


This review was created by AI and reviewed by human editors.