QUICK REVIEW

[Paper Review] Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search

Arber Zela, Aaron Klein|arXiv (Cornell University)|Jul 18, 2018

Machine Learning and Data Classification | References: 21 | Cited by: 88

One-Sentence Summary

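This paper applies BOHB to jointly optimize neural architectures and hyperparameters over progressively larger budgets, achieving competitive results on CIFAR-10 within a 3-hour limit and revealing budget-dependent interactions between architectural choices and hyperparameters.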

ABSTRACT

While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search.

Motivation and Goals

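Make the case for optimizing neural architectures and hyperparameters jointly rather than tuning hyperparameters after the fact.
Show that performance under short training budgets does not necessarily correlate with performance under long budgets, so the effect of the budget on results must be taken into account.
Demonstrate an anytime, budget-aware AutoML method that increases resources step by step.
Evaluate joint NAS and hyperparameter search on CIFAR-10 under a 3-hour constraint.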

Proposed Method

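Treat neural architecture search as a hyperparameter optimization problem with categorical and conditional hyperparameters.
Use BOHB, a combination of Bayesian optimization and Hyperband, for efficient multi-budget search.
Define a joint search space with 10 architectural choices and 7 hyperparameters for a multi-branch residual structure.
Use successive halving to allocate more compute to promising configurations at each budget.
Train and evaluate configurations on several budgets (e.g., 400s, 1200s, 1h, 3h) to capture budget-dependent performance.
Compare against hand-designed architectures, and analyze the correlation between budgets and the importance of individual parameters.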
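The budget schedule and the successive-halving step listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the reduction factor `eta=3` is inferred from the ratio between the listed budgets, the two-parameter search space (`num_branches`, `learning_rate`, `branch_merge`) and the function `toy_validation_error` are hypothetical stand-ins, and configurations are sampled at random, whereas BOHB proposes them with a model.

```python
import math
import random


def budget_ladder(min_budget, max_budget, eta=3):
    """Geometric sequence of budgets (seconds) from min_budget up to max_budget."""
    budgets = []
    budget = min_budget
    while budget <= max_budget:
        budgets.append(budget)
        budget *= eta
    return budgets


def sample_config(rng):
    """Draw one configuration from a toy joint space (all names are hypothetical)."""
    config = {
        "num_branches": rng.choice([1, 2, 3]),           # categorical, architectural
        "learning_rate": 10 ** rng.uniform(-3.0, 0.0),   # continuous, log scale
    }
    # Conditional hyperparameter: only defined for multi-branch networks.
    if config["num_branches"] > 1:
        config["branch_merge"] = rng.choice(["add", "concat"])
    return config


def toy_validation_error(config, budget_seconds):
    """Stand-in for training a network for budget_seconds (not a real model)."""
    lr_penalty = abs(math.log10(config["learning_rate"]) + 1.5)
    return lr_penalty + 100.0 / budget_seconds


def successive_halving(configs, budgets, evaluate, eta=3):
    """Evaluate survivors on each budget and keep the best 1/eta for the next one."""
    survivors = list(configs)
    history = []
    for budget in budgets:
        losses = [evaluate(config, budget) for config in survivors]
        order = sorted(range(len(survivors)), key=lambda i: losses[i])
        history.append((budget, len(survivors)))
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for i in order[:keep]]
    return survivors[0], history


if __name__ == "__main__":
    rng = random.Random(0)
    budgets = budget_ladder(400, 3 * 3600)   # 400s up to 3h
    configs = [sample_config(rng) for _ in range(27)]
    best, history = successive_halving(configs, budgets, toy_validation_error)
    print(history)   # (budget, number of configurations evaluated) per rung
    print(best)
```

Starting from 27 configurations, each rung keeps one third of the survivors, so only a single configuration is trained for the full 3 hours while most of the candidates are discarded after 400 seconds.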

Figure 1: Validation error of all configurations evaluated on the different budgets during the whole optimization procedure. The best performing configuration (incumbent) as a function of time is visualized by the black line.

Experimental Results

Research Questions

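RQ1: Can neural architecture search be carried out effectively together with hyperparameter optimization?
RQ2: How well do short and long budgets correlate in how they rank configurations, and which budgets should be used during optimization?
RQ3: Is BOHB effective under a strict time budget on CIFAR-10?
RQ4: Which architectural and hyperparameter choices matter most under a limited compute budget?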

Key Findings

Network	Params	Test error (%)
ResNet-18	11.2M	3.34±0.11
Shake-Shake 26 2x32d	2.9M	3.91±0.09
Shake-Shake 26 2x64d	11.7M	3.38±0.07
Shake-Shake 26 2x96d	26.2M	4.22±0.06
Ours	27.6M	3.18±0.16

Within a 3-hour budget, joint architecture and hyperparameter search with BOHB achieves competitive results on CIFAR-10 (test error 3.18%).
Among the Shake-Shake baselines trained within 3 hours, the mid-sized multi-branch residual network (26 2x64d) performs best.
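Spearman rank correlation is strong between adjacent budgets but drops quickly as the gap between budgets widens, which makes rankings from short budgets unreliable for choosing configurations at long budgets.
Budget-aware importance analysis (fANOVA) shows that the importance of individual hyperparameters and architectural choices rises or falls as the budget changes, revealing interaction effects.
Under the same optimization pipeline and budget, the BOHB-based search outperforms several standard architectures, which demonstrates the value of joint optimization.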
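To make the Spearman finding concrete, the sketch below computes the rank correlation between the validation errors of the same configurations on two budgets. The error values are invented for illustration and are not taken from the paper; the helper names are likewise hypothetical. The coefficient is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up validation errors of five configurations on three budgets.
errors_400s = [0.30, 0.25, 0.40, 0.35, 0.20]
errors_1200s = [0.22, 0.20, 0.30, 0.24, 0.18]
errors_3h = [0.06, 0.09, 0.05, 0.07, 0.08]

if __name__ == "__main__":
    print("400s vs 1200s:", spearman(errors_400s, errors_1200s))
    print("400s vs 3h:   ", spearman(errors_400s, errors_3h))
```

In this toy data the two short budgets rank the configurations identically, while the ranking at 400s is almost reversed at 3h, which is the kind of pattern that makes selecting on the shortest budget alone risky.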

Figure 2: Parameter importance plots for three hyperparameters for training 400s (top row) and 1h (bottom row). The importance indicates the fraction of the variance explained by the individual choice(s). The value of the best found configuration on this budget is indicated by the dashed line/ gray
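The importance measure in the caption, the fraction of variance explained by an individual choice, can be illustrated with a simplified calculation. The sketch below is not fANOVA itself, which estimates these quantities from a model fitted to the observed configurations; it instead evaluates a hypothetical loss function (`toy_loss`) on a full grid of a made-up two-parameter space and divides the variance of each parameter's marginal means by the total variance.

```python
from itertools import product
from statistics import fmean, pvariance


def main_effect_importance(space, loss_fn):
    """Fraction of the loss variance explained by each parameter on its own.

    space maps a parameter name to its list of choices; loss_fn takes a dict
    of parameter values. The loss is evaluated on the full grid of choices.
    """
    names = list(space)
    grid = list(product(*(space[name] for name in names)))
    losses = [loss_fn(dict(zip(names, point))) for point in grid]
    total_variance = pvariance(losses)
    importance = {}
    for idx, name in enumerate(names):
        # Mean loss for each choice, averaged over all other parameters.
        marginal_means = [
            fmean([loss for point, loss in zip(grid, losses) if point[idx] == choice])
            for choice in space[name]
        ]
        importance[name] = pvariance(marginal_means) / total_variance
    return importance


# Hypothetical search space and additive loss, for illustration only.
toy_space = {"learning_rate": [0.01, 0.1], "num_branches": [1, 2, 3]}


def toy_loss(config):
    lr_term = {0.01: 0.3, 0.1: 0.1}[config["learning_rate"]]
    branch_term = {1: 0.2, 2: 0.1, 3: 0.0}[config["num_branches"]]
    return lr_term + branch_term


if __name__ == "__main__":
    print(main_effect_importance(toy_space, toy_loss))
```

Because the toy loss is purely additive, the two fractions sum to one. When parameters interact, part of the variance is left unexplained by the individual effects, and repeating the calculation on losses measured at different budgets would show how the importances shift.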


This review was generated by AI and reviewed by a human editor.