QUICK REVIEW

[Paper Review] Neural Architecture Optimization

Renqian Luo, Fei Tian|arXiv (Cornell University)|Aug 22, 2018

Advanced Neural Network Applications | Cited by 431

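One-sentence summary

NAO learns a continuous embedding of architectures with an encoder-predictor-decoder triple and optimizes architectures by taking gradient steps in the embedding space, achieving competitive NAS results at lower computational cost.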

ABSTRACT

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, no matter based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method to automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for image classification task on CIFAR-10 and language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significantly reduction of computational resources. Specifically we obtain 1.93% test set error rate for CIFAR-10 image classification task and 56.0 test set perplexity of PTB language modeling task. Furthermore, combined with the recent proposed weight sharing mechanism, we discover powerful architecture on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.

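Motivation and objectives

Advance automated neural architecture design by improving search efficiency over discrete-space reinforcement-learning and evolutionary methods.
Propose a continuous-space NAS framework (NAO) that embeds, predicts, and decodes architectures.
Show that gradient-based optimization in the embedding space yields architectures with strong performance and transferability.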

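Proposed method

Encode a neural architecture into a continuous embedding with a single-layer LSTM encoder.
Predict architecture performance with a regression model trained on dev-set accuracy.
Decode an embedding back into a discrete architecture string with an attention-based LSTM decoder.
Optimize the embedding by gradient ascent on the predictor's output, yielding a new embedding that is more likely to decode to a better architecture.
Train the encoder, predictor, and decoder jointly under a multi-task objective that combines the prediction loss with the architecture reconstruction loss.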
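
The gradient-ascent step and the combined training objective described above can be illustrated with a toy sketch. This is not the authors' implementation: the sigmoid-of-linear `predictor`, the weights `w` and `b`, the 3-dimensional embedding, `step_size`, and the `trade_off` weight are all hypothetical stand-ins for the learned LSTM encoder and regression predictor.

```python
import math

def predictor(e, w, b):
    """Toy stand-in for the performance predictor: sigmoid of a linear score."""
    z = sum(wi * ei for wi, ei in zip(w, e)) + b
    return 1.0 / (1.0 + math.exp(-z))

def predictor_grad(e, w, b):
    """Gradient of the toy predictor with respect to the embedding e."""
    p = predictor(e, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def ascend(e, w, b, step_size=1.0, steps=1):
    """Move the embedding along the predictor's gradient (gradient ascent)."""
    for _ in range(steps):
        g = predictor_grad(e, w, b)
        e = [ei + step_size * gi for ei, gi in zip(e, g)]
    return e

def joint_loss(pred_loss, rec_loss, trade_off=0.5):
    """Weighted sum of prediction and reconstruction losses.

    The trade_off weight is an assumption made for this sketch.
    """
    return trade_off * pred_loss + (1.0 - trade_off) * rec_loss

# Hypothetical predictor weights and starting embedding.
w, b = [0.5, -1.0, 2.0], 0.1
e_old = [0.2, 0.4, -0.1]
e_new = ascend(e_old, w, b, step_size=1.0, steps=5)

# The predicted score of the moved embedding is higher than the original.
print(predictor(e_old, w, b), predictor(e_new, w, b))
```

In the full method, the moved embedding would then be passed to the decoder to recover a discrete architecture string; that step is omitted here because it requires a trained decoder.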

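Experimental results

Research questions

RQ1: Can a continuous embedding of discrete architectures enable efficient gradient-based optimization for NAS?
RQ2: How well does the encoder-predictor-decoder triple predict and improve architecture performance on CIFAR-10, PTB, and transfer tasks?
RQ3: Can NAO produce architectures that are competitive with or better than those of previous NAS methods while using fewer computational resources?
RQ4: Do the discovered architectures transfer to other datasets (CIFAR-100, ImageNet, WikiText-2)?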

Key findings

Model	B	N	F	#op	Error (%)	#params	M	GPU days
DenseNet-BC	-	100	40	3	3.46	25.6M	/	/
ResNeXt-29	-	-	-	-	3.58	68.1M	/	/
NASNet-A	5	6	32	13	3.41	3.3M	20000	2000
NASNet-B	5	4	N/A	13	3.73	2.6M	20000	2000
NASNet-C	5	4	N/A	13	3.59	3.1M	20000	2000
Hier-EA	5	2	64	6	3.75	15.7M	7000	300
AmoebaNet-A	5	6	36	10	3.34	3.2M	20000	3150
AmoebaNet-B	5	6	36	19	3.37	2.8M	27000	3150
AmoebaNet-B (128)	5	6	128	19	2.98	34.9M	27000	3150
AmoebaNet-B (128) + Cutout	5	6	128	19	2.13	34.9M	27000	3150
PNAS	5	3	48	8	3.41	3.2M	1280	225
ENAS	5	5	36	5	3.54	4.6M	/	0.45
Random-WS	5	5	36	5	3.92	3.9M	/	0.25
DARTS + Cutout	5	6	36	7	2.83	4.6M	/	4
NAONet	5	6	36	11	3.18	10.6M	1000	200
NAONet	5	6	64	11	2.98	28.6M	1000	200
NAONet + Cutout	5	6	36	11	2.48	10.6M	1000	200
NAONet + Cutout	5	6	128	11	1.93	144.6M	1000	200
NAONet-WS	5	5	36	5	3.53	2.5M	/	0.3
NAONet-WS + Cutout	5	5	36	5	2.93	2.5M	/	0.3

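The architecture discovered by NAO reaches 1.93% test error on CIFAR-10 (with Cutout) and a perplexity of 56.0 on PTB, on par with or better than previous NAS methods.
With weight sharing, NAO reaches 2.93% error on CIFAR-10 and a perplexity of 56.6 on PTB using less than 10 GPU hours.
Transferring the architectures found by NAO to CIFAR-100 and ImageNet gives strong results (CIFAR-100: 14.75% error; ImageNet: 25.7% top-1 error).
NAO finds competitive architectures while evaluating far fewer models (for example, M = 1000 for NAONet versus M = 20000 for NASNet-A in the table above).
With roughly 500 training architectures, the encoder's performance predictions already reach a pairwise accuracy above 78%; the decoder recovers architectures almost exactly (average Hamming distance below 0.5 tokens).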
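
The two quality measures named in the last finding can be made concrete with a small sketch. The definitions below are a common reading of "pairwise accuracy" and "Hamming distance", not code from the paper; the scores and the operation token names are made-up illustration data.

```python
from itertools import combinations

def pairwise_accuracy(true_scores, predicted_scores):
    """Fraction of architecture pairs whose predicted ordering matches the true ordering."""
    pairs = list(combinations(range(len(true_scores)), 2))
    correct = sum(
        1
        for i, j in pairs
        if (true_scores[i] >= true_scores[j])
        == (predicted_scores[i] >= predicted_scores[j])
    )
    return correct / len(pairs)

def hamming_distance(tokens_a, tokens_b):
    """Number of positions at which two equal-length token sequences differ."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(tokens_a, tokens_b))

# Hypothetical true and predicted accuracies for four architectures.
true_acc = [0.90, 0.92, 0.95, 0.88]
pred_acc = [0.91, 0.90, 0.96, 0.85]
print(pairwise_accuracy(true_acc, pred_acc))  # 5 of 6 pairs are ordered correctly

# Hypothetical architecture strings before encoding and after decoding.
original = ["conv3x3", "pool", "conv5x5"]
decoded = ["conv3x3", "pool", "conv3x3"]
print(hamming_distance(original, decoded))  # one token differs
```

Pairwise accuracy only checks relative ordering, which is what matters when the predictor is used to decide in which direction to move an embedding. An average Hamming distance below 0.5 means that, on average, fewer than one token per decoded architecture string is wrong.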


This review was generated by AI and reviewed by a human editor.