QUICK REVIEW

[Paper Review] Neural Architecture Optimization

Renqian Luo, Fei Tian|arXiv (Cornell University)|Aug 22, 2018

Advanced Neural Network Applications | Cited by 431

One-line summary

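NAO learns continuous embeddings of architectures with three jointly trained components (an encoder, a performance predictor, and a decoder) and optimizes architectures by gradient ascent in the embedding space. This reduces the search cost while staying competitive with previous NAS results.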

ABSTRACT

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, no matter based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method to automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for image classification task on CIFAR-10 and language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significantly reduction of computational resources. Specifically we obtain 1.93% test set error rate for CIFAR-10 image classification task and 56.0 test set perplexity of PTB language modeling task. Furthermore, combined with the recent proposed weight sharing mechanism, we discover powerful architecture on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.

Motivation and objectives

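Motivate automatic neural architecture design and improve on the search efficiency of discrete-space methods based on reinforcement learning or evolutionary algorithms.
Propose NAO, a NAS framework that operates in a continuous space by embedding, predicting, and decoding architectures.
Show that gradient-based optimization in the embedding space yields architectures that perform well and transfer to other datasets.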

Proposed method

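Encode a neural architecture into a continuous embedding with a single-layer LSTM encoder.
Predict architecture performance with a regression model trained on dev-set accuracy.
Decode the embedding back into a discrete architecture with an attention-based LSTM decoder, recovering the architecture string.
Optimize the embedding by gradient ascent on the predictor output to obtain a new embedding that decodes to a better architecture.
Train the encoder, predictor, and decoder jointly with a multi-task objective that combines the prediction loss and the architecture reconstruction loss.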
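
In symbols, the joint objective and the embedding update can be written roughly as below. The notation is an assumption made for this summary rather than something stated in the list above: $E$, $f$, and $D$ denote the encoder, predictor, and decoder, $s_x$ is the measured dev-set performance of architecture $x$, $X$ is the set of evaluated architectures, $\lambda \in [0, 1]$ is a trade-off weight, and $\eta$ is a step size. The squared-error form of the prediction loss is likewise an assumption that fits the regression description.

$$
L = \lambda L_{pp} + (1 - \lambda) L_{rec}, \qquad
L_{pp} = \sum_{x \in X} \bigl(s_x - f(E(x))\bigr)^2, \qquad
L_{rec} = -\sum_{x \in X} \log P_D\bigl(x \mid E(x)\bigr)
$$

$$
e_{x^{\prime}} = e_x + \eta \, \frac{\partial f}{\partial e_x}
$$

The first line ties the three components together during training; the second is the gradient-ascent step applied at search time, after which $e_{x^{\prime}}$ is handed to the decoder.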

Figure 1: The general framework of NAO. Better viewed in color mode. The original architecture $x$ is mapped to continuous representation $e_{x}$ via encoder network. Then $e_{x}$ is optimized into $e_{x^{\prime}}$ via maximizing the output of performance predictor $f$ using gradient ascent (the gre
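
The gradient-ascent step described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the predictor is a hypothetical logistic model with hand-picked weights (`W`, `B`), and the step size `eta` and the number of steps are assumptions. In NAO the predictor is learned jointly with the LSTM encoder, and the optimized embedding is then passed to the decoder.

```python
import math

# Hypothetical predictor parameters, chosen only for illustration.
W = [0.8, -0.5, 0.3]
B = -0.1


def predict(e):
    """Toy performance predictor f(e): logistic function of a linear score."""
    z = sum(w * x for w, x in zip(W, e)) + B
    return 1.0 / (1.0 + math.exp(-z))


def predictor_grad(e):
    """Analytic gradient of f with respect to the embedding e."""
    p = predict(e)
    return [p * (1.0 - p) * w for w in W]


def ascend(e, eta=1.0, steps=5):
    """Move e along the predictor gradient: e <- e + eta * df/de."""
    for _ in range(steps):
        g = predictor_grad(e)
        e = [x + eta * gx for x, gx in zip(e, g)]
    return e


e_x = [0.2, 0.4, -0.1]   # made-up embedding of an existing architecture
e_x_new = ascend(e_x)    # embedding with a higher predicted score
print(predict(e_x), predict(e_x_new))
```

With a real learned predictor the gradient would come from automatic differentiation rather than a closed form, and a large step can leave the region where the predictor is reliable, so the step size matters.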

Experimental results

Research questions

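RQ1: Can continuous embeddings of discrete architectures make gradient-based optimization for NAS efficient?
RQ2: How well can the encoder-predictor-decoder trio predict and improve architecture performance across CIFAR-10, PTB, and the transfer tasks?
RQ3: Can NAO produce architectures that match or outperform previous NAS methods while using fewer computational resources?
RQ4: Do the discovered architectures transfer to other datasets (CIFAR-100, ImageNet, WikiText-2)?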

Key findings

Model	B	N	F	#ops	Error (%)	Params	M	GPU days
DenseNet-BC	-	100	40	3	3.46	25.6M	/	/
ResNeXt-29	-	-	-	-	3.58	68.1M	/	/
NASNet-A	5	6	32	13	3.41	3.3M	20000	2000
NASNet-B	5	4	N/A	13	3.73	2.6M	20000	2000
NASNet-C	5	4	N/A	13	3.59	3.1M	20000	2000
Hier-EA	5	2	64	6	3.75	15.7M	7000	300
AmoebaNet-A	5	6	36	10	3.34	3.2M	20000	3150
AmoebaNet-B	5	6	36	19	3.37	2.8M	27000	3150
AmoebaNet-B (128)	5	6	128	19	2.98	34.9M	27000	3150
AmoebaNet-B (128) + Cutout	5	6	128	19	2.13	34.9M	27000	3150
PNAS	5	3	48	8	3.41	3.2M	1280	225
ENAS	5	5	36	5	3.54	4.6M	/	0.45
Random-WS	5	5	36	5	3.92	3.9M	/	0.25
DARTS + Cutout	5	6	36	7	2.83	4.6M	/	4
NAONet	5	6	36	11	3.18	10.6M	1000	200
NAONet	5	6	64	11	2.98	28.6M	1000	200
NAONet + Cutout	5	6	36	11	2.48	10.6M	1000	200
NAONet + Cutout	5	6	128	11	1.93	144.6M	1000	200
NAONet-WS	5	5	36	5	3.53	2.5M	/	0.3
NAONet-WS + Cutout	5	5	36	5	2.93	2.5M	/	0.3
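
To make the search-cost gap in the table concrete, the short sketch below compares the M and GPU-days columns of a few rows against NAONet. The numbers are copied from the table; the dictionary layout and the helper name `relative_cost` are illustrative choices, not part of the paper.

```python
# (evaluated architectures M, search cost in GPU days), copied from the table.
search_cost = {
    "NASNet-A": (20000, 2000),
    "AmoebaNet-A": (20000, 3150),
    "PNAS": (1280, 225),
    "NAONet": (1000, 200),
}


def relative_cost(baseline, method="NAONet"):
    """Return (ratio of evaluated models, ratio of GPU days) of baseline vs. method."""
    m_base, days_base = search_cost[baseline]
    m_ref, days_ref = search_cost[method]
    return m_base / m_ref, days_base / days_ref


for name in ("NASNet-A", "AmoebaNet-A", "PNAS"):
    m_ratio, day_ratio = relative_cost(name)
    print(f"{name}: {m_ratio:.2f}x models, {day_ratio:.2f}x GPU days")
```

For NASNet-A this gives 20 times as many evaluated models and 10 times the GPU days compared with NAONet. Rows whose M entry is "/" (the weight-sharing methods) are left out because no count is reported for them.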

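NAO reaches 1.93% test error on CIFAR-10 (with cutout) and a test perplexity of 56.0 on PTB, on par with or better than previous NAS methods.
With weight sharing, NAO reaches 2.93% error on CIFAR-10 and 56.6 perplexity on PTB in under 10 GPU hours.
Transferring the architectures found by NAO to CIFAR-100 and ImageNet gives good results (CIFAR-100: 14.75% error; ImageNet: 25.7% top-1 error).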
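NAO finds competitive architectures while evaluating far fewer models (in the table, M = 1000 for NAONet versus 20000 for NASNet-A and AmoebaNet-A); the weight-sharing variant lowers the search cost further, to 0.3 GPU days.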
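In terms of prediction quality, the encoder and predictor reach a pairwise accuracy above 78% with about 500 training architectures; the decoder recovers architectures almost exactly (mean Hamming distance below 0.5 tokens).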
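
The two quality measures in the last finding can be computed as in the sketch below. It uses one common definition of pairwise accuracy (the fraction of architecture pairs ranked in the same order by the predictor and by the measured scores), which is assumed here rather than taken from the paper. The scores and token names are made up for illustration.

```python
from itertools import combinations


def pairwise_accuracy(true_scores, predicted_scores):
    """Fraction of architecture pairs whose ordering the predictor gets right."""
    pairs = list(combinations(range(len(true_scores)), 2))
    correct = sum(
        (true_scores[i] >= true_scores[j])
        == (predicted_scores[i] >= predicted_scores[j])
        for i, j in pairs
    )
    return correct / len(pairs)


def hamming_distance(tokens_a, tokens_b):
    """Number of positions at which two equal-length token sequences differ."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(tokens_a, tokens_b))


# Made-up values, for illustration only.
true_acc = [0.91, 0.94, 0.89, 0.93]
pred_acc = [0.90, 0.95, 0.92, 0.91]
print(pairwise_accuracy(true_acc, pred_acc))

# Hypothetical architecture strings before encoding and after decoding.
original = ["node1", "conv3x3", "node2", "maxpool"]
decoded = ["node1", "conv3x3", "node2", "avgpool"]
print(hamming_distance(original, decoded))
```

Pairwise accuracy measures ranking quality rather than absolute error, which is what matters when the predictor is only used to decide which direction in embedding space looks better.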


This review was generated by AI and checked by a human editor.