QUICK REVIEW

[Paper Review] Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search

Arber Zela, Aaron Klein | arXiv (Cornell University) | Jul 18, 2018
Machine Learning and Data Classification | References: 21 | Citations: 88
One-sentence summary

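This paper proposes using BOHB to jointly optimize neural architectures and hyperparameters while successively increasing the training budget. It reports competitive CIFAR-10 results within a 3-hour limit and reveals budget-dependent interactions between architectural choices and hyperparameters.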

ABSTRACT

While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search.

Motivation and objectives

  • Motivate joint optimization of architecture and hyperparameters rather than post-hoc tuning.
  • Show that short training budgets may not correlate well with long-budget performance.
  • Demonstrate an anytime, budget-aware AutoML approach that gradually increases resources.
  • Evaluate joint NAS-HP search on CIFAR-10 under a 3-hour constraint.

Proposed method

  • Cast neural architecture search as a hyperparameter optimization problem with categorical and conditional hyperparameters.
  • Adopt BOHB, a combination of Bayesian optimization and Hyperband, for efficient multi-budget search.
  • Define a joint search space with 10 architectural choices and 7 hyperparameters for a multi-branch residual architecture.
  • Use successive halving to allocate more compute to promising configurations across budgets.
  • Train and evaluate configurations under multiple budgets (e.g., 400s, 1200s, 1h, 3h) to capture budget-aware performance.
  • Compare to manually constructed architectures and analyze budget correlations and parameter importance.
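The budget schedule and successive halving step listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: configurations are drawn at random (which corresponds to plain Hyperband-style sampling, not BOHB's model-based sampling), the search space is a three-parameter toy with hypothetical names, the reduction factor of 3 is an assumption inferred from the 3x spacing of the listed budgets, and `toy_validation_error` is a synthetic stand-in for actually training a network.

```python
import math
import random

# Budgets named in the review, in seconds: 400s, 1200s, 1h, 3h (each 3x the previous).
BUDGETS = [400, 1200, 3600, 10800]
ETA = 3  # assumed reduction factor, inferred from the budget spacing


def sample_config(rng):
    """Draw one configuration from a toy joint space (all names are hypothetical)."""
    config = {
        "num_branches": rng.choice([1, 2, 3]),      # architectural choice
        "learning_rate": 10 ** rng.uniform(-3, 0),  # hyperparameter
        "optimizer": rng.choice(["sgd", "adam"]),   # categorical hyperparameter
    }
    # Conditional hyperparameter: momentum only exists when the optimizer is SGD.
    if config["optimizer"] == "sgd":
        config["momentum"] = rng.uniform(0.0, 0.99)
    return config


def toy_validation_error(config, budget, rng):
    """Synthetic score standing in for training; the noise shrinks as the budget grows."""
    base = 0.05 + 0.02 * abs(math.log10(config["learning_rate"]) + 1.5)
    base += 0.01 / config["num_branches"]
    noise = abs(rng.gauss(0.0, 1.0)) * 4.0 / math.sqrt(budget)
    return base + noise


def successive_halving(configs, budgets, eta, evaluate):
    """Evaluate survivors on each budget and keep the best 1/eta for the next one."""
    survivors = list(configs)
    evaluated_per_budget = []
    for budget in budgets:
        errors = [evaluate(c, budget) for c in survivors]
        order = sorted(range(len(survivors)), key=errors.__getitem__)
        evaluated_per_budget.append(len(survivors))
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for i in order[:keep]]
    return survivors[0], evaluated_per_budget


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(27)]
best, counts = successive_halving(
    configs, BUDGETS, ETA, lambda c, b: toy_validation_error(c, b, rng)
)
print(counts)  # number of configurations evaluated at each budget
print(best)
```

Starting from 27 configurations, each rung keeps one third of the survivors, so most compute on the largest budget goes to a single configuration that has already survived the cheaper rungs.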
Figure 1: Validation error of all configurations evaluated on the different budgets during the whole optimization procedure. The best performing configuration (incumbent) as a function of time is visualized by the black line.
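The incumbent curve described in the Figure 1 caption can be derived from a log of finished evaluations. The sketch below assumes a hypothetical log of `(finish_time_seconds, validation_error)` pairs with made-up values, and it treats all budgets alike, which may differ from how the figure defines the incumbent.

```python
def incumbent_trajectory(evaluations):
    """Return the (time, error) points at which the best-so-far error improves."""
    trajectory = []
    best = float("inf")
    for finish_time, error in sorted(evaluations):
        if error < best:
            best = error
            trajectory.append((finish_time, best))
    return trajectory


# Made-up evaluation log, for illustration only.
runs = [(400, 0.12), (800, 0.15), (1200, 0.09), (2400, 0.10), (3600, 0.07)]
trajectory = incumbent_trajectory(runs)
print(trajectory)
```

Plotting these points as a step function gives an incumbent line of the kind the caption describes.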

Experimental results

Research questions

  • RQ1: Can neural architecture search be effectively performed jointly with hyperparameter optimization?
  • RQ2: How do short and long training budgets correlate in ranking configurations, and what budget should be used during optimization?
  • RQ3: Is the BOHB approach effective under a strict time budget for CIFAR-10?
  • RQ4: What architectural and hyperparameter choices are most influential under limited compute budgets?

Key findings

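| Network                | Params | Test error (%) |
|------------------------|--------|----------------|
| ResNet-18              | 11.2M  | 3.34±0.11      |
| Shake-Shake 26 2x32d   | 2.9M   | 3.91±0.09      |
| Shake-Shake 26 2x64d   | 11.7M  | 3.38±0.07      |
| Shake-Shake 26 2x96d   | 26.2M  | 4.22±0.06      |
| Ours                   | 27.6M  | 3.18±0.16      |
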
  • Joint architecture and hyperparameter search with BOHB yields competitive CIFAR-10 results within a 3-hour budget (test error 3.18%).
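  • Among the Shake-Shake baselines in the table, the medium-sized 26 2x64d variant reaches the lowest test error (3.38%), while the network found by the search (27.6M parameters) is larger than every baseline listed.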
  • Spearman correlations show strong alignment between adjacent budgets but degrade quickly across larger budget gaps, making short-budget rankings unreliable for long-budget selection.
  • Budget-aware analysis (fANOVA) indicates different hyperparameters and architectural choices gain or lose importance as the budget changes, highlighting interaction effects.
  • BOHB-based search outperformed several standard architectures under the same optimization pipeline and budget, demonstrating the value of joint optimization.
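The budget-correlation finding rests on Spearman rank correlation between the errors the same configurations obtain on two budgets. A minimal pure-Python sketch follows; the error values are made up for illustration and are not taken from the paper.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical validation errors of five configurations on three budgets.
err_400s = [0.30, 0.25, 0.28, 0.35, 0.22]
err_1200s = [0.20, 0.18, 0.21, 0.26, 0.17]
err_3h = [0.050, 0.062, 0.048, 0.055, 0.070]

rho_adjacent = spearman(err_400s, err_1200s)
rho_distant = spearman(err_400s, err_3h)
print(round(rho_adjacent, 2), round(rho_distant, 2))
```

In this toy data the adjacent budgets rank the configurations almost identically, while the 400s ranking is negatively correlated with the 3h ranking, which is the kind of pattern that would make short-budget selection unreliable.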
Figure 2: Parameter importance plots for three hyperparameters for training 400s (top row) and 1h (bottom row). The importance indicates the fraction of the variance explained by the individual choice(s). The value of the best found configuration on this budget is indicated by the dashed line/ gray
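The caption defines importance as the fraction of variance explained by an individual choice. The sketch below computes that fraction for each parameter on a small, fully enumerated grid. It is a simplification under stated assumptions: fANOVA tooling typically fits a surrogate model to the observed runs, whereas here a hypothetical additive toy function is evaluated at every grid point, and all parameter names and effect sizes are made up.

```python
from itertools import product
from statistics import mean


def main_effect_importance(grid, f):
    """Fraction of total variance explained by each parameter's marginal effect."""
    names = list(grid)
    points = [dict(zip(names, combo)) for combo in product(*grid.values())]
    values = [f(p) for p in points]
    total_mean = mean(values)
    total_var = mean((v - total_mean) ** 2 for v in values)
    importance = {}
    for name in names:
        squared_effects = []
        for val in grid[name]:
            # Marginal mean: average over all other parameters with this one fixed.
            marginal = mean(v for p, v in zip(points, values) if p[name] == val)
            squared_effects.append((marginal - total_mean) ** 2)
        importance[name] = mean(squared_effects) / total_var
    return importance


# Hypothetical additive error model: learning rate matters far more than branch count.
LR_EFFECT = {0.01: 0.30, 0.1: 0.10, 1.0: 0.50}
BRANCH_EFFECT = {1: 0.02, 2: 0.00, 3: 0.01}
grid = {"learning_rate": list(LR_EFFECT), "num_branches": list(BRANCH_EFFECT)}


def toy_error(p):
    return LR_EFFECT[p["learning_rate"]] + BRANCH_EFFECT[p["num_branches"]]


importance = main_effect_importance(grid, toy_error)
print(importance)
```

Because the toy function is additive, the two fractions sum to one; with interacting parameters the remainder would be attributed to higher-order terms, which is where budget-dependent interaction effects would show up.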

This review was generated by AI and checked by a human editor.