QUICK REVIEW

[Paper Review] SNAS: Stochastic Neural Architecture Search

Sirui Xie, Hehui Zheng|arXiv (Cornell University)|Dec 24, 2018

Advanced Neural Network Applications|References: 36|Cited by: 285

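One-Sentence Summary

SNAS introduces a differentiable, end-to-end neural architecture search framework that relaxes discrete choices with the concrete distribution, so that operation parameters and architecture distribution parameters are learned simultaneously; it achieves competitive results on CIFAR-10, transfers to ImageNet, and searches at lower computational cost than RL-based and evolution-based NAS.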

ABSTRACT

We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes less epochs to find a cell architecture with state-of-the-art accuracy than non-differentiable evolution-based and reinforcement-learning-based NAS, which is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain the validation accuracy in searching, with which attention-based NAS requires parameter retraining to compete, exhibiting potentials to stride towards efficient NAS on big datasets. We have released our implementation at https://github.com/SNAS-Series/SNAS-Series.

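Motivation and Objectives

Propose an efficient NAS framework that avoids the delayed-reward credit assignment of reinforcement-learning (RL) based NAS.
Reformulate NAS as learning a joint distribution over cell-level architectures.
Allow both operation parameters and architecture parameters to be updated with gradients of a differentiable loss.
Introduce a global resource constraint to encourage hardware-aware, compact architectures.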

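Proposed Method

Represent the NAS search space of a cell as a directed acyclic graph (DAG) in which each edge carries a one-hot architecture decision, with p(Z) a fully factorized joint distribution.
Relax the discrete architecture choices with the concrete distribution, so that gradients can pass through sampling via Gumbel-based reparameterization.
Derive a search gradient that resembles credit assignment in policy gradients, but whose reward comes from the differentiable loss L_theta(Z).
Prove that the objective is equivalent in expectation to that of RL-based NAS, while assigning credit more efficiently and without delayed rewards.
Add a global resource constraint to the objective, decomposed per edge, to encourage smaller and faster architectures.
Optionally include a resource cost term C(Z), and show how to compute its expectation under p_alpha(Z) with a tractable approximation.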
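
The relaxation and the resource term described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released SNAS implementation: the function names (`sample_relaxed_one_hot`, `expected_cost`), the temperature value, and the per-operation cost table are all hypothetical. The second function additionally assumes that C(Z) is a sum of per-edge, per-operation costs, in which case its expectation under a fully factorized p_alpha(Z) reduces to a per-edge weighted sum; the estimator used in the paper may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_relaxed_one_hot(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector for a single edge (hypothetical helper).

    Adds Gumbel(0, 1) noise to each log-weight and applies a
    temperature-scaled softmax (the concrete / Gumbel-softmax relaxation).
    Lower temperatures push the sample closer to a one-hot vector.
    """
    noisy = []
    for la in log_alpha:
        u = rng.random()
        while u == 0.0:  # rng.random() may return 0.0; avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        noisy.append((la + gumbel) / temperature)
    return softmax(noisy)


def expected_cost(log_alpha_per_edge, cost_per_op):
    """Expected resource cost under a fully factorized distribution.

    Assumption (not from the source): the cost is a sum over edges of the
    cost of the operation chosen on that edge, so the expectation is a
    per-edge sum of probability-weighted operation costs.
    """
    total = 0.0
    for log_alpha in log_alpha_per_edge:
        probs = softmax(log_alpha)
        total += sum(p * c for p, c in zip(probs, cost_per_op))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Two edges, three candidate operations each (illustrative values only).
    log_alphas = [[0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
    costs = [0.0, 1.0, 2.0]  # hypothetical per-operation costs
    for edge in log_alphas:
        print(sample_relaxed_one_hot(edge, temperature=0.5, rng=rng))
    print(expected_cost(log_alphas, costs))
```

Because every step in `sample_relaxed_one_hot` is a smooth function of `log_alpha`, an automatic-differentiation framework could propagate the loss gradient back to the architecture parameters through the sampled vector, which is the property the relaxation is meant to provide.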

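Experimental Results

Research Questions

RQ1: Can a differentiable, stochastic NAS framework match or even surpass RL-based and evolution-based NAS while reducing training time and avoiding delayed rewards?
RQ2: Does aligning architecture sampling with gradient-based optimization improve credit assignment and final performance compared with DARTS and ENAS?
RQ3: To what extent does a global resource constraint reduce model size and FLOPs without losing accuracy, and can it be decomposed to allow scalable optimization?
RQ4: Do the learned cells transfer to larger datasets such as ImageNet while maintaining competitive accuracy and efficiency?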

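Key Findings

Under the mild constraint, SNAS achieves a competitive CIFAR-10 result of 2.85% test error with 2.8M parameters, outperforming 1st-order DARTS and ENAS and matching 2nd-order DARTS with fewer parameters.
SNAS maintains a higher validation accuracy during the search stage and yields architectures that are more stable and less biased than those of DARTS; a search validation accuracy of 88% was observed in the experiments.
The cell found by SNAS, when transferred to ImageNet (mobile setting), reaches a top-1 error of 27.3%, which is competitive with RL-based NAS at a significantly lower computational cost (three orders of magnitude lower).
In the CIFAR-10 experiments, SNAS finds diverse and increasingly sparse cell structures under mild, moderate, and aggressive resource constraints, showing a controllable trade-off among accuracy, parameter count, and search cost.
Child networks derived by SNAS keep a high validation accuracy without retraining, unlike DARTS, where a significant gap can appear between the search network and the derived network.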

This review was generated by AI and reviewed by human editors.