QUICK REVIEW

[Paper Review] Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective

Wuyang Chen, Xinyu Gong|arXiv (Cornell University)|Feb 23, 2021

Advanced Neural Network Applications|References: 67|Cited by: 61

ONE-SENTENCE SUMMARY

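TE-NAS performs training-free neural architecture search: it ranks architectures by the spectrum of the neural tangent kernel (NTK) and by the number of linear regions, and combines this ranking with a pruning-based search strategy, achieving competitive NAS results at a greatly reduced cost.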

ABSTRACT

Neural Architecture Search (NAS) has been explosively studied to automate the discovery of top-performer neural networks. Current works require heavy training of supernet or intensive architecture evaluations, thus suffering from heavy resource consumption and often incurring search bias due to truncated training or approximations. Can we select the best neural architectures without involving any training and eliminate a drastic portion of the search cost? We provide an affirmative answer, by proposing a novel framework called training-free neural architecture search (TE-NAS). TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space. Both are motivated by recent theory advances in deep networks and can be computed without any training and any label. We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy. Further on, we design a pruning-based NAS mechanism to achieve a more flexible and superior trade-off between the trainability and expressivity during the search. In NAS-Bench-201 and DARTS search spaces, TE-NAS completes high-quality search but only costs 0.5 and 4 GPU hours with one 1080Ti on CIFAR-10 and ImageNet, respectively. We hope our work inspires more attempts in bridging the theoretical findings of deep networks and practical impacts in real NAS applications. Code is available at: https://github.com/VITA-Group/TENAS.

MOTIVATION AND OBJECTIVES

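Reduce the cost of NAS by eliminating training and relying instead on theoretical indicators of trainability and expressivity.
Identify training-free measures (the NTK spectrum and the number of linear regions) that correlate with test accuracy.
Develop a pruning-based NAS workflow that searches architectures efficiently while balancing trainability and expressivity.
Validate TE-NAS on NAS-Bench-201, on CIFAR-10 in the DARTS space, and on ImageNet in the DARTS space.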
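
To make the two training-free measures concrete, the sketch below computes an NTK condition number (largest over smallest eigenvalue of the NTK Gram matrix) and a sampled count of linear regions for a tiny ReLU MLP at random initialization. It is an illustration under stated assumptions, not the authors' implementation: the bias-free MLP, the scalar output, the helper names (`init_mlp`, `hidden_masks`, `per_sample_grad`, `ntk_condition_number`, `count_activation_patterns`), and the sample sizes are all hypothetical choices made for this example. Counting distinct activation patterns over sampled inputs only gives a lower-bound estimate of the true number of linear regions.

```python
import numpy as np

def init_mlp(widths, rng):
    """Random bias-free ReLU MLP (assumption); widths[-1] must be 1 (scalar output)."""
    return [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
            for m, n in zip(widths[:-1], widths[1:])]

def hidden_masks(weights, x):
    """Forward pass for one sample: returns each layer's input and the ReLU on/off masks."""
    inputs, masks = [x], []
    for W in weights[:-1]:
        z = inputs[-1] @ W
        masks.append(z > 0)
        inputs.append(np.maximum(z, 0.0))
    return inputs, masks

def per_sample_grad(weights, x):
    """Gradient of the scalar output w.r.t. all weights, flattened into one vector."""
    inputs, masks = hidden_masks(weights, x)
    delta = np.ones(1)  # derivative of the output with respect to itself
    grads = []
    for layer in range(len(weights) - 1, -1, -1):
        grads.append(np.outer(inputs[layer], delta).ravel())
        if layer > 0:
            # Backpropagate through the weights, then through the ReLU mask.
            delta = (weights[layer] @ delta) * masks[layer - 1]
    return np.concatenate(grads[::-1])

def ntk_condition_number(weights, X):
    """lambda_max / lambda_min of the empirical NTK Gram matrix J J^T."""
    J = np.stack([per_sample_grad(weights, x) for x in X])
    eigenvalues = np.linalg.eigvalsh(J @ J.T)  # returned in ascending order
    return eigenvalues[-1] / eigenvalues[0]

def count_activation_patterns(weights, X):
    """Distinct ReLU activation patterns hit by the samples (lower bound on regions)."""
    patterns = set()
    for x in X:
        _, masks = hidden_masks(weights, x)
        patterns.add(np.concatenate(masks).tobytes())
    return len(patterns)

rng = np.random.default_rng(0)
weights = init_mlp([4, 16, 16, 1], rng)
kappa = ntk_condition_number(weights, rng.standard_normal((8, 4)))
regions = count_activation_patterns(weights, rng.standard_normal((500, 4)))
print(f"NTK condition number: {kappa:.1f}, sampled linear regions: {regions}")
```

Neither quantity needs labels or a training step: both are read off a randomly initialized network and unlabeled inputs, which is what makes them usable as training-free indicators.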

PROPOSED METHOD

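Propose TE-NAS, a training-free NAS framework built on two indicators: the NTK condition number kappa_N, which reflects trainability, and the number of linear regions R_N, which reflects expressivity.
Measure kappa_N and R_N without any training or labels, and show empirically that they correlate with test accuracy.
Combine the two indicators through an equally weighted relative ranking to guide architecture selection.
Introduce an importance-based pruning mechanism that progressively reduces the supernet to a single-path architecture, which speeds up the search.
Validate the training-free search cost of TE-NAS on NAS-Bench-201 and in the DARTS space (on CIFAR-10 and ImageNet).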
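
The ranking and pruning steps above can be sketched as a small greedy loop. This is one plausible reading of the description, not the authors' code: the function names (`rank_ascending`, `prune_to_single_path`, `toy_measure`), the supernet representation (a dict from edge to candidate operators), and in particular the per-operator numbers in `TOY_OPS` are invented for the demo. In a real search, `measure` would return kappa_N and R_N computed on the actual supernet.

```python
def rank_ascending(values):
    """Rank position of each value (0 = smallest); ties broken by original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def prune_to_single_path(supernet, measure):
    """Each round, drop from every edge the operator whose removal hurts least."""
    supernet = {edge: list(ops) for edge, ops in supernet.items()}  # work on a copy
    while any(len(ops) > 1 for ops in supernet.values()):
        kappa0, regions0 = measure(supernet)
        candidates, kappa_rise, region_drop = [], [], []
        for edge, ops in supernet.items():
            if len(ops) == 1:
                continue
            for op in ops:
                trial = {e: [o for o in cand if (e, o) != (edge, op)]
                         for e, cand in supernet.items()}
                kappa, regions = measure(trial)
                candidates.append((edge, op))
                kappa_rise.append(kappa - kappa0)        # larger: op helps trainability
                region_drop.append(regions0 - regions)   # larger: op helps expressivity
        # Equal-weight combination of the two relative rankings.
        rank_k, rank_r = rank_ascending(kappa_rise), rank_ascending(region_drop)
        importance = {c: rank_k[i] + rank_r[i] for i, c in enumerate(candidates)}
        for edge, ops in supernet.items():
            if len(ops) > 1:
                ops.remove(min(ops, key=lambda op: importance[(edge, op)]))
    return {edge: ops[0] for edge, ops in supernet.items()}

# Hypothetical (trainability bonus, region bonus) per operator, made up for this demo.
TOY_OPS = {"none": (0.0, 0.0), "skip": (1.0, 1.0), "conv3x3": (0.5, 5.0)}

def toy_measure(supernet):
    """Stand-in for the real indicators: returns (toy kappa, toy region count)."""
    chosen = [TOY_OPS[op] for ops in supernet.values() for op in ops]
    kappa = 100.0 / (1.0 + sum(t for t, _ in chosen))
    regions = sum(r for _, r in chosen)
    return kappa, regions

supernet = {"edge0": ["none", "skip", "conv3x3"], "edge1": ["none", "skip", "conv3x3"]}
result = prune_to_single_path(supernet, toy_measure)
print(result)
```

Summing rank positions rather than raw values keeps the two indicators on a comparable scale, so neither the condition number nor the region count dominates simply because of its magnitude.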

EXPERIMENTAL RESULTS

Research Questions

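RQ1: Can training-free, label-free indicators (such as the NTK spectrum and the number of linear regions) effectively rank NAS architectures by their final test accuracy?
RQ2: Compared with training-based NAS methods, can a pruning-based, training-free NAS workflow find competitive architectures at lower cost?
RQ3: How do trainability (kappa_N) and expressivity (R_N) influence operator selection in different search spaces?
RQ4: When TE-NAS is applied to CIFAR-10 and ImageNet, what are the actual search-time savings and performance trade-offs?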

Key Findings

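Both training-free indicators correlate with performance: a lower NTK condition number kappa_N (trainability) and a larger number of linear regions R_N (expressivity) are associated with higher test accuracy.
TE-NAS achieves competitive NAS results with far less search time: only 0.5 GPU hours on CIFAR-10 and 4 GPU hours on ImageNet, using a single 1080Ti.
On NAS-Bench-201, TE-NAS achieves the highest accuracy among the reported methods under the training-free search setting on CIFAR-10, CIFAR-100, and ImageNet-16-120 (reported as mean and standard deviation).
In the DARTS space on CIFAR-10, TE-NAS reaches a test error of 2.63% at a (training-free) search cost of 0.05 GPU-days.
In the DARTS space on ImageNet under the mobile setting, TE-NAS reaches a top-1 error of 24.5% and a top-5 error of 7.5% at a (training-free) search cost of 0.17 GPU-days.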
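
The findings above quote search cost in both GPU hours and GPU-days, so a quick unit conversion helps when comparing them. The snippet below only restates the two DARTS-space figures in GPU hours; the dictionary keys are labels chosen for this example.

```python
HOURS_PER_DAY = 24

# Search costs quoted in GPU-days in the findings above.
costs_gpu_days = {
    "DARTS space, CIFAR-10": 0.05,
    "DARTS space, ImageNet": 0.17,
}

costs_gpu_hours = {name: days * HOURS_PER_DAY for name, days in costs_gpu_days.items()}
for name, hours in costs_gpu_hours.items():
    print(f"{name}: {hours:.2f} GPU hours")
# prints:
# DARTS space, CIFAR-10: 1.20 GPU hours
# DARTS space, ImageNet: 4.08 GPU hours
```

The ImageNet figure matches the roughly 4 GPU hours in the paper's title. The DARTS-space CIFAR-10 cost comes to about 1.2 GPU hours, so the 0.5 GPU-hour CIFAR-10 figure presumably refers to a different search space; the abstract appears to pair it with NAS-Bench-201.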


This review was generated by AI and checked by a human editor.