QUICK REVIEW

[Paper Review] Simple And Efficient Architecture Search for Convolutional Neural Networks

Thomas Elsken, Jan-Hendrik Metzen | arXiv (Cornell University) | Nov 13, 2017

Advanced Neural Network Applications | References: 14 | Citations: 185

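One-line summary

This paper introduces NASH, a neural architecture search method based on simple hill climbing with network morphisms. It generates and evaluates CNNs cheaply and reaches competitive results on CIFAR-10/100, while its compute cost stays in the same order of magnitude as training a single network.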

ABSTRACT

Neural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.

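Motivation and objectives

Automate CNN architecture design to reduce manual trial and error.
Develop a lightweight search strategy that runs at low computational cost.
Use network morphisms to initialize and extend architectures without retraining from scratch.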

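Proposed method

Formalize network morphisms so that architecture transformations preserve the function the network computes.
Run a hill-climbing search (NASH) that repeatedly applies random morphisms to the current best model to create child models, each of which is trained only briefly.
Train each new candidate with a short SGDR run and select the best one on the validation set.
Use cosine annealing with restarts to make training in the inner loop efficient.
Optionally ensemble snapshots from multiple iterations to improve performance.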
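
The search loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the callables `apply_random_morphism`, `train_briefly`, and `validation_score` are hypothetical placeholders, the toy "model" is just a list of layer widths, and letting the parent compete alongside its morphed children is an assumption of this sketch.

```python
import random
from typing import Callable, TypeVar

Model = TypeVar("Model")


def hill_climb(
    initial_model: Model,
    apply_random_morphism: Callable[[Model, random.Random], Model],
    train_briefly: Callable[[Model], Model],
    validation_score: Callable[[Model], float],
    n_steps: int = 5,
    n_neigh: int = 8,
    seed: int = 0,
) -> Model:
    """Greedy search: morph the best model, train the children briefly, keep the best."""
    rng = random.Random(seed)
    best = initial_model
    for _ in range(n_steps):
        # Create n_neigh children by applying random morphisms to the current best.
        children = [apply_random_morphism(best, rng) for _ in range(n_neigh)]
        # Assumption of this sketch: the parent also competes, so the score never drops.
        candidates = [train_briefly(m) for m in children + [best]]
        best = max(candidates, key=validation_score)
    return best


# Toy stand-ins (hypothetical): a "model" is a list of layer widths.
def toy_morphism(widths, rng):
    widths = list(widths)
    if rng.random() < 0.5:
        widths.insert(rng.randrange(len(widths) + 1), 16)  # insert a new layer
    else:
        widths[rng.randrange(len(widths))] *= 2  # widen an existing layer
    return widths


def toy_train(widths):
    return widths  # a real version would run a short SGDR training here


def toy_score(widths):
    # Purely illustrative objective: prefers 4 layers with a total width near 256.
    return -abs(len(widths) - 4) - abs(sum(widths) - 256) / 64


found = hill_climb([16], toy_morphism, toy_train, toy_score)
```

In a real setting `validation_score` would be accuracy on a held-out validation set, and `train_briefly` would be the expensive step, which is why the number of children and the length of each training run dominate the search cost.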

Figure 1: Visualization of our method. Based on the current best model, new models are generated and trained afterwards. The best model is then updated.
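
To make "function-preserving" concrete, here is a small sketch of one possible morphism: widening a hidden layer by adding units whose outgoing weights are zero. This review does not list the exact operators used in the paper, so the choice of operator, the two-layer ReLU network, and all function names below are assumptions made for illustration.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, W2, x):
    """Two-layer network: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def widen(W1, W2, extra_units, rng):
    """Add hidden units with random incoming weights and zero outgoing weights."""
    n_in = len(W1[0])
    new_rows = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(extra_units)]
    W1_wide = [row[:] for row in W1] + new_rows
    # Zero outgoing weights mean the new units contribute nothing to the output yet.
    W2_wide = [row[:] + [0.0] * extra_units for row in W2]
    return W1_wide, W2_wide


rng = random.Random(0)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(3)]  # 3 hidden units, 2 inputs
W2 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]  # 2 outputs
x = [0.5, -1.25]

before = forward(W1, W2, x)
W1_wide, W2_wide = widen(W1, W2, extra_units=4, rng=rng)
after = forward(W1_wide, W2_wide, x)  # same output, larger network
```

Because the widened network starts from the parent's function, a short training run is enough to judge whether the extra capacity helps, which is what keeps each hill-climbing step cheap.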

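Experimental results

Research questions

RQ1: Can simple network morphisms navigate the CNN search space effectively while keeping the training cost close to that of a single network?
RQ2: Does hill climbing with morphisms produce architectures that are competitive with hand-designed networks and other automated methods?
RQ3: How does the method scale with computational resources on CIFAR-10 and CIFAR-100?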

Key findings

Model	Resources spent	# Parameters (millions)	Error (%)
Shake-Shake (Gastaldi, 2017)	2 days, 2 GPUs	26	2.9
WRN 28-10 (Loshchilov & Hutter, 2017)	1 day, 1 GPU	36.5	3.86
Baker et al. (2016)	8-10 days, 10 GPUs	11	6.9
Cai et al. (2017)	3 days, 5 GPUs	19.7	5.7
Zoph & Le (2017)	800 GPUs, ? days	37.5	3.65
Real et al. (2017)	250 GPUs, ? days	5.4	5.4
Saxena & Verbeek (2016)	?	21	7.4
Brock et al. (2017)	3 days, 1 GPU	16.0	4.0
Ours (random networks, n_steps=5, n_neigh=1)	4.5 hours	4.4	6.5
Ours (n_steps=5, n_neigh=8)	0.5 days, 1 GPU	5.7	5.7
Ours (n_steps=8, n_neigh=8)	1 day, 1 GPU	19.7	5.2
Ours (snapshot ensemble)	2 days, 1 GPU	57.8	4.7
Ours (ensemble across runs)	1 day, 4 GPUs	88	4.4
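
The "Resources spent" column is easier to compare when converted to GPU-days (days multiplied by number of GPUs). The short calculation below does this for the rows where both values are given in the table; rows with a missing value are skipped. GPU-days are only a rough proxy, since they ignore differences in hardware, and the helper name `gpu_days` is introduced here for illustration.

```python
# (name, days_low, days_high, gpus), copied from the table rows with complete data
rows = [
    ("Shake-Shake", 2.0, 2.0, 2),
    ("WRN 28-10", 1.0, 1.0, 1),
    ("Baker et al.", 8.0, 10.0, 10),
    ("Cai et al.", 3.0, 3.0, 5),
    ("Brock et al.", 3.0, 3.0, 1),
    ("Ours (n_steps=5, n_neigh=8)", 0.5, 0.5, 1),
    ("Ours (n_steps=8, n_neigh=8)", 1.0, 1.0, 1),
    ("Ours (snapshot ensemble)", 2.0, 2.0, 1),
    ("Ours (ensemble across runs)", 1.0, 1.0, 4),
]


def gpu_days(days_low, days_high, gpus):
    """Return the (low, high) range of GPU-days for one table row."""
    return (days_low * gpus, days_high * gpus)


budget = {name: gpu_days(lo, hi, g) for name, lo, hi, g in rows}

for name, (lo, hi) in budget.items():
    label = f"{lo:g}" if lo == hi else f"{lo:g}-{hi:g}"
    print(f"{name}: {label} GPU-days")
```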

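NASH finds and trains competitive CNNs at roughly the cost of training a single network.
On CIFAR-10, a run of about 12 hours on one GPU reaches an error rate below 6%, and after one day the error approaches 5%.
On CIFAR-100, the error rate falls below 24% in one day and approaches 20% after two days.
Snapshot ensembles and ensembles across runs improve the results further, in some cases outperforming several baselines.
Retraining the discovered architectures from scratch often gives similar final performance, which suggests that inheriting weights through morphisms does not hurt the final result.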
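
The ensembling mentioned in the findings can be illustrated with a small sketch. This review does not say how the individual models' outputs are combined, so averaging predicted class probabilities is an assumption here (it is a common choice), and the function names and numbers are made up for the example.

```python
def average_probabilities(per_model_probs):
    """per_model_probs[m][i][c]: probability model m assigns to class c for example i."""
    n_models = len(per_model_probs)
    n_examples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(p[i][c] for p in per_model_probs) / n_models for c in range(n_classes)]
        for i in range(n_examples)
    ]


def predict(probs):
    """Pick the class with the highest probability for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Two hypothetical snapshots that disagree on both examples.
snapshot_a = [[0.60, 0.40], [0.55, 0.45]]
snapshot_b = [[0.30, 0.70], [0.60, 0.40]]

ensemble = average_probabilities([snapshot_a, snapshot_b])
labels = predict(ensemble)
```

Averaging lets the more confident snapshot decide each example, which is one reason an ensemble can beat every individual member. The cost is visible in the table: the ensemble rows have far more parameters than any single model.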

Figure 2: The best model found by Algorithm 1 tracked over time. With and without using SGDR for the training within the hill climbing (line 17). Final training (line 24) is not plotted. Red vertical lines highlight the times where network morphisms are applied (line 19).
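
For readers unfamiliar with SGDR, the learning-rate schedule it uses inside each short training run can be written as a small function. This is a sketch of cosine annealing with warm restarts using a fixed restart period. The default values for `lr_max`, `lr_min`, and `period` are placeholders, not the settings used in the paper.

```python
import math


def sgdr_learning_rate(epoch, lr_max=0.05, lr_min=0.0, period=10):
    """Cosine annealing from lr_max down to lr_min, restarting every `period` epochs."""
    t_cur = epoch % period  # epochs elapsed since the last restart
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / period))


# Learning rate over two restart cycles.
schedule = [sgdr_learning_rate(e) for e in range(20)]
```

The rate starts at `lr_max`, decays smoothly toward `lr_min`, and jumps back to `lr_max` at each restart. SGDR as originally proposed also allows the period to grow after each restart; that option is left out here for brevity.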

This review was generated by AI and checked by a human editor.