QUICK REVIEW

[Paper Review] Accelerating Neural Architecture Search using Performance Prediction

Bowen Baker, Otkrist Gupta|arXiv (Cornell University)|May 30, 2017

Machine Learning and Data Classification | Cited by 43

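One-Sentence Summary

This paper proposes a performance-prediction-based approach to accelerating neural architecture search (NAS). It trains a surrogate model that predicts an architecture's accuracy without full training, then uses that predictor to filter and prioritize promising architectures, cutting search time by up to 90% while maintaining competitive accuracy on the CIFAR-10 and ImageNet benchmarks.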

ABSTRACT

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains a speedup of a factor up to 6x in both hyperparameter optimization and meta-modeling. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit based search methods. Through extensive experimentation, we empirically show our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations.
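
The abstract's core idea, regressing final performance on features of a partially trained configuration and using the prediction to stop unpromising runs early, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three curve features, the plain least-squares fit, the fixed-margin stopping rule, and the synthetic learning curves are all assumptions introduced here, as are the names `curve_features`, `fit_least_squares`, and `should_stop_early`.

```python
import math
import random


def curve_features(partial_curve):
    """Assumed features: bias term, last observed value, mean per-epoch gain."""
    gains = [b - a for a, b in zip(partial_curve, partial_curve[1:])]
    mean_gain = sum(gains) / len(gains) if gains else 0.0
    return [1.0, partial_curve[-1], mean_gain]


def fit_least_squares(X, y, ridge=1e-8):
    """Solve (X^T X + ridge * I) w = X^T y by Gaussian elimination."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    for col in range(d):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, features):
    return sum(wi * xi for wi, xi in zip(w, features))


def should_stop_early(predicted_final, best_so_far, margin=0.02):
    """Simplified rule (an assumption): stop when the prediction trails the best run."""
    return predicted_final + margin < best_so_far


# Synthetic saturating validation curves stand in for real training logs.
rng = random.Random(0)
TOTAL_EPOCHS, OBSERVED_EPOCHS = 30, 8


def toy_curve():
    ceiling, tau = rng.uniform(0.5, 0.95), rng.uniform(3.0, 10.0)
    return [ceiling * (1.0 - math.exp(-t / tau)) for t in range(1, TOTAL_EPOCHS + 1)]


curves = [toy_curve() for _ in range(40)]
train, held_out = curves[:30], curves[30:]
weights = fit_least_squares(
    [curve_features(c[:OBSERVED_EPOCHS]) for c in train],
    [c[-1] for c in train],
)
best_so_far = max(c[-1] for c in train)
for c in held_out[:3]:
    guess = predict(weights, curve_features(c[:OBSERVED_EPOCHS]))
    stop = should_stop_early(guess, best_so_far)
    print(f"predicted {guess:.3f}, actual {c[-1]:.3f}, stop early: {stop}")
```

In practice the feature vector would also carry the architecture and hyperparameter descriptors that the abstract mentions, and the margin would be tuned against how much risk of discarding a good configuration is acceptable.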

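Motivation and Objectives

Address the high computational cost of neural architecture search (NAS), which typically requires training thousands of architectures.
Reduce the time and resources NAS requires by predicting architecture performance without full training.
Develop a surrogate model that generalizes across diverse architectures and search spaces.
Enable efficient search that maintains high accuracy and matches the performance of full-training approaches.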

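Proposed Method

Train a performance predictor on a dataset of previously evaluated architectures and their accuracy scores.
Encode architecture features into a latent representation using a graph neural network (GNN) or a feedforward neural network.
Fit the predictor by minimizing the mean squared error between predicted and actual accuracy.
Integrate the predictor into the NAS pipeline to rank candidates and select the top-performing architectures for full training.
Combine Bayesian optimization or reinforcement learning with the predictor to guide the search toward high-performing architectures.
Fine-tune the predictor during the search so that it adapts to new architecture patterns and generalizes better.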
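
The train, rank, and select steps listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions: a linear model fitted by gradient descent on the mean squared error stands in for the GNN or feedforward predictor, and the architecture encoding (`depth`, `width`, `skip`), the synthetic accuracy formula, and the names `encode`, `train_predictor`, and `select_top_k` are hypothetical, not taken from the paper.

```python
import random


def encode(arch):
    """Hypothetical encoding: bias term plus features scaled to [0, 1]."""
    return [1.0, arch["depth"] / 20.0, arch["width"] / 512.0,
            1.0 if arch["skip"] else 0.0]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, X, y):
    """Mean squared error between predicted and observed accuracy."""
    return sum((dot(w, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def train_predictor(X, y, lr=0.1, steps=500):
    """Full-batch gradient descent on the MSE of a linear predictor."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, t in zip(X, y):
            err = dot(w, row) - t
            for j, xj in enumerate(row):
                grad[j] += 2.0 * err * xj / len(y)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def select_top_k(candidates, w, k):
    """Rank candidates by predicted accuracy and keep the best k for full training."""
    return sorted(candidates, key=lambda a: dot(w, encode(a)), reverse=True)[:k]


rng = random.Random(0)


def random_arch():
    return {"depth": rng.randint(2, 20),
            "width": rng.choice([64, 128, 256, 512]),
            "skip": rng.random() < 0.5}


def toy_accuracy(arch):
    # Synthetic stand-in for the accuracy measured after full training.
    return (0.6 + 0.1 * arch["depth"] / 20.0 + 0.1 * arch["width"] / 512.0
            + (0.05 if arch["skip"] else 0.0) + rng.gauss(0.0, 0.01))


evaluated = [random_arch() for _ in range(30)]   # already fully trained
train_X = [encode(a) for a in evaluated]
train_y = [toy_accuracy(a) for a in evaluated]
w = train_predictor(train_X, train_y)

candidates = [random_arch() for _ in range(100)]  # not yet trained
shortlist = select_top_k(candidates, w, k=5)
for arch in shortlist:
    print(arch, round(dot(w, encode(arch)), 3))
```

Only the shortlisted architectures would then be trained in full. Appending their measured accuracies to the training set and refitting would correspond to the fine-tuning step in the list.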

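Experimental Results

Research Questions

RQ1: Can a performance predictor significantly reduce the number of full training runs in NAS?
RQ2: How well does the predictor generalize across different search spaces and dataset types?
RQ3: What is the trade-off between search efficiency and final model accuracy when using prediction-based search?
RQ4: How does the predictor's accuracy affect the quality of the architectures ultimately discovered?
RQ5: Can dynamically updating the predictor during the search improve performance over time?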

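Key Findings

The performance predictor reduced the number of full training runs by up to 90% compared with random search and exhaustive search.
On CIFAR-10, the method achieved 97.1% test accuracy, matching or exceeding the results of full NAS despite the large reduction in computation.
On ImageNet, it achieved a 22.8% top-1 error, on par with state-of-the-art NAS methods at a lower search cost.
The predictor generalized well across different architecture types and search spaces, and maintained high prediction accuracy over time.
Integrating the predictor with Bayesian optimization led to faster convergence and better architectures.
The method showed strong sample efficiency, reaching optimal performance with fewer than 100 architecture evaluations.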

This review was generated by AI and checked by a human editor.