QUICK REVIEW

[Paper Review] Practical Bayesian Optimization of Machine Learning Algorithms

Jasper Snoek, Hugo Larochelle, Ryan P. Adams | arXiv (Cornell University) | Jun 13, 2012

Gaussian Processes and Bayesian Inference | References: 23 | Citations: 5,635

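One-line summary

Applies fully Bayesian Bayesian optimization with a GP prior to hyperparameter tuning. With cost-aware and parallel acquisition functions, it achieves expert-level or better tuning on a range of machine learning problems.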

ABSTRACT

Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

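Motivation and objectives

Automate the tuning of hyperparameters, regularization terms, and optimization settings for ML algorithms.
Model a learning algorithm's generalization performance with a Gaussian process prior so that each experiment is chosen efficiently.
Build practical constraints, such as variable trial duration and parallel evaluation, into the optimization loop.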

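Proposed method

Place a Gaussian process prior on the unknown objective f(x) over hyperparameters.
Adopt an acquisition function such as expected improvement (EI) and compare it with GP-UCB.
Treat the GP hyperparameters in a fully Bayesian way by marginalizing over them with Markov chain Monte Carlo (MCMC), giving GP EI MCMC.
Model the evaluation duration c(x) with a GP as a cost model and optimize EI per second.
Enable parallel experiments by Monte Carlo averaging the acquisition function over the possible outcomes of pending evaluations.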
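
The first two steps above (a GP prior on f(x), then EI as the acquisition function) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a Matérn 5/2 kernel with fixed, hand-picked hyperparameters (no MCMC marginalization), a constant mean set to the average observation, a minimization objective, and a hypothetical `toy_objective` standing in for a real validation error. The function names are illustrative only.

```python
import math
import numpy as np

def matern52(a, b, length_scale=1.0, amplitude=1.0):
    """Matern 5/2 covariance between inputs a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    r = np.sqrt(np.sum(diff ** 2, axis=-1)) / length_scale
    s = math.sqrt(5.0) * r
    return amplitude * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(x_obs, y_obs, x_new, length_scale=1.0, amplitude=1.0, noise=1e-6):
    """Posterior mean and standard deviation of f at x_new (fixed hyperparameters)."""
    offset = y_obs.mean()  # assumption: constant prior mean at the average observation
    k_oo = matern52(x_obs, x_obs, length_scale, amplitude) + noise * np.eye(len(x_obs))
    k_on = matern52(x_obs, x_new, length_scale, amplitude)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs - offset))
    v = np.linalg.solve(chol, k_on)
    mean = offset + k_on.T @ alpha
    var = np.maximum(amplitude - np.sum(v ** 2, axis=0), 1e-12)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    """EI for minimization: std * (gamma * Phi(gamma) + phi(gamma))."""
    gamma = (best - mean) / std
    cdf = 0.5 * np.vectorize(math.erfc)(-gamma / math.sqrt(2.0))
    pdf = np.exp(-0.5 * gamma ** 2) / math.sqrt(2.0 * math.pi)
    return np.maximum(std * (gamma * cdf + pdf), 0.0)  # clip rounding noise

def toy_objective(x):
    """Hypothetical stand-in for a validation error as a function of one hyperparameter."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 0]

# Three completed experiments, then score a grid of candidate settings.
x_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = toy_objective(x_obs)
candidates = np.linspace(0.0, 1.0, 101)[:, None]

mean, std = gp_posterior(x_obs, y_obs, candidates, length_scale=0.3)
ei = expected_improvement(mean, std, y_obs.min())
x_next = candidates[np.argmax(ei)]  # the setting to evaluate next
print("next candidate:", x_next)
```

A fully Bayesian variant in the spirit of GP EI MCMC would repeat the last three lines for several sampled values of `length_scale`, `amplitude`, and `noise`, then average the resulting `ei` arrays before taking the argmax.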

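Experimental results

Research questions

RQ1: How does a fully Bayesian treatment of the GP hyperparameters affect the performance of Bayesian optimization?
RQ2: Can cost awareness (EI per second) and parallelism make hyperparameter tuning more efficient in practice?
RQ3: How does the choice of covariance function (for example, Matérn 5/2 versus the squared exponential kernel) affect the success of the optimization?
RQ4: How does integrating the acquisition function over pending evaluations affect the choice of the next point?
RQ5: Do these methods outperform human experts on real-world ML problems?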
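
For reference on RQ3, the two covariance functions being compared can be written as follows. The notation is an assumption made for this illustration: θ_0 is the covariance amplitude, θ_d is the length scale of input dimension d, and r² is the scaled squared distance between two inputs.

```latex
r^2(\mathbf{x}, \mathbf{x}') = \sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{\theta_d^2}

K_{\mathrm{SE}}(\mathbf{x}, \mathbf{x}') = \theta_0 \exp\!\left(-\tfrac{1}{2}\, r^2(\mathbf{x}, \mathbf{x}')\right)

K_{\mathrm{M52}}(\mathbf{x}, \mathbf{x}') = \theta_0 \left(1 + \sqrt{5\, r^2(\mathbf{x}, \mathbf{x}')} + \tfrac{5}{3}\, r^2(\mathbf{x}, \mathbf{x}')\right) \exp\!\left(-\sqrt{5\, r^2(\mathbf{x}, \mathbf{x}')}\right)
```

Sample functions under the squared exponential kernel are infinitely differentiable, whereas those under Matérn 5/2 are only twice differentiable, which is the sense in which the latter imposes a weaker smoothness assumption on the objective.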

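Key findings

GP EI MCMC, which integrates out the GP hyperparameters, outperforms strategies based on point estimates of the hyperparameters on the benchmarks.
EI per second improves wall-clock efficiency by favoring configurations that are cheap to evaluate relative to their expected improvement.
Parallelized GP EI MCMC (N × GP EI MCMC) finds better parameters faster than grid search on large-scale problems.
The choice of covariance function materially affects optimization success; the Matérn 5/2 kernel often yields more realistic function samples than the squared exponential.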
On CIFAR-10, the hyperparameters found by GP EI MCMC reach a test error of 14.98%, over 3% better than the expert-tuned configuration.
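Across tasks including LDA, structured SVMs, and CNNs, the proposed Bayesian optimization methods often outperform human experts and previous automatic methods.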
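
The effect of EI per second can be illustrated with a small arithmetic sketch. Everything numeric here is hypothetical: the posterior means, standard deviations, and predicted log durations are made-up values for four candidate configurations, standing in for the outputs of the objective GP and of a second GP over the log of c(x). The example only shows how dividing EI by predicted duration can change which candidate is selected.

```python
import math
import numpy as np

def expected_improvement(mean, std, best):
    """EI for minimization: std * (gamma * Phi(gamma) + phi(gamma))."""
    gamma = (best - mean) / std
    cdf = 0.5 * np.vectorize(math.erfc)(-gamma / math.sqrt(2.0))
    pdf = np.exp(-0.5 * gamma ** 2) / math.sqrt(2.0 * math.pi)
    return np.maximum(std * (gamma * cdf + pdf), 0.0)

# Hypothetical posterior summaries at four candidate configurations.
mean = np.array([0.20, 0.18, 0.25, 0.19])   # predicted validation error
std = np.array([0.05, 0.06, 0.02, 0.05])    # predictive standard deviation
log_seconds = np.log(np.array([600.0, 3600.0, 60.0, 300.0]))  # predicted ln c(x)
best = 0.21                                  # best validation error observed so far

ei = expected_improvement(mean, std, best)
ei_per_second = ei / np.exp(log_seconds)     # improvement per unit of wall-clock time

print("choice by EI:           ", int(np.argmax(ei)))
print("choice by EI per second:", int(np.argmax(ei_per_second)))
```

In this toy setup the two criteria disagree: plain EI prefers the candidate with the largest expected gain even though it is predicted to take an hour, while EI per second prefers a slightly less promising candidate that is predicted to finish in five minutes.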


This review was generated by AI and checked by a human editor.