QUICK REVIEW

[Paper Review] Bayesian Optimization with Gradients

Jian Wu, Matthias Poloczek|arXiv (Cornell University)|Mar 13, 2017

Advanced Bandit Algorithms Research|References: 34|Citations: 95

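One-Sentence Summary

The paper introduces d-KG, the derivative-enabled knowledge-gradient acquisition function, which uses gradient information (possibly noisy or partial) to reach good solutions in global optimization with fewer objective evaluations. It features a fast, discretization-free computation and comes with theoretical guarantees.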

ABSTRACT

Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.

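Motivation and Objectives

Improve the efficiency of Bayesian optimization by exploiting gradient information.
Develop the derivative-enabled knowledge-gradient (d-KG) acquisition function, which handles noisy or incomplete gradients.
Provide a fast, discretization-free method for computing and optimizing the d-KG acquisition function.
Prove theoretical properties: one-step Bayes-optimality, greater value of information (VOI) than in the derivative-free setting, and asymptotic consistency.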

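Proposed Method

Model the objective with a Gaussian process (GP) that jointly covers function-value and gradient observations.
Extend the GP to a multi-output process over (f(x), ∇f(x)) with mean function μ̃ and kernel K̃.
Define d-KG as the expected reduction in the minimum of the posterior mean after a batch of function-value and gradient observations.
Allow observations of function values only, of a derivative along a single direction, or of an incomplete set of derivatives.
Provide a discretization-free, unbiased estimator of the gradient of d-KG, which enables stochastic gradient ascent in the outer optimization of the acquisition function.
Treat hyperparameters in a fully Bayesian way by averaging d-KG over multiple samples of the GP hyperparameters.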
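
The joint model over (f(x), ∇f(x)) described above can be sketched in one dimension as follows. This is a minimal illustration, not the authors' implementation: it assumes a zero prior mean, a squared-exponential (RBF) kernel, and noise-free observations, and the names rbf, joint_kernel, and posterior_mean are hypothetical helpers introduced here. The blocks of the joint covariance are obtained by differentiating the kernel with respect to its arguments.

```python
import numpy as np

def rbf(x, y, ell=1.0, s2=1.0):
    """RBF kernel k(x, y) together with the pairwise differences r = x - y."""
    r = x[:, None] - y[None, :]
    return s2 * np.exp(-0.5 * (r / ell) ** 2), r

def joint_kernel(X, ell=1.0, s2=1.0):
    """Covariance of the stacked vector [f(X), f'(X)] (plays the role of K-tilde)."""
    k, r = rbf(X, X, ell, s2)
    k_fg = k * r / ell**2                       # cov(f(x_i), f'(x_j)) = dk/dx_j
    k_gg = k * (1.0 / ell**2 - r**2 / ell**4)   # cov(f'(x_i), f'(x_j))
    return np.block([[k, k_fg], [k_fg.T, k_gg]])

def posterior_mean(x_star, X, y, g, ell=1.0, s2=1.0, jitter=1e-10):
    """Posterior mean of f at x_star given values y and derivatives g observed at X."""
    K = joint_kernel(X, ell, s2) + jitter * np.eye(2 * len(X))
    k, r = rbf(x_star, X, ell, s2)
    k_star = np.hstack([k, k * r / ell**2])     # cov(f(x*), [f(X), f'(X)])
    return k_star @ np.linalg.solve(K, np.concatenate([y, g]))

# Toy data (an assumption for illustration): f(x) = sin(x), so f'(x) = cos(x).
X = np.array([0.0, 1.5, 3.0])
y, g = np.sin(X), np.cos(X)

# The posterior mean should reproduce the observed values at the inputs ...
mean_at_X = posterior_mean(X, X, y, g)

# ... and its slope (central finite difference) should match the observed derivatives.
h = 1e-4
slope_at_X = (posterior_mean(X + h, X, y, g) - posterior_mean(X - h, X, y, g)) / (2 * h)
```

Conditioning on derivatives constrains the slope of the posterior mean as well as its value at each input, which is the extra information d-KG exploits. Handling noisy or partial derivatives would amount to adding noise variances to the corresponding diagonal entries or dropping the unobserved rows and columns of the joint kernel.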

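Experimental Results

Research Questions

RQ1: How can derivative information (full, partial, or noisy) be incorporated into Bayesian optimization?
RQ2: Does the derivative-enabled knowledge gradient (d-KG) provide greater value of information than derivative-free methods?
RQ3: Can d-KG be computed efficiently over a continuous domain without discretization while remaining theoretically sound?
RQ4: What empirical advantages does d-KG offer on synthetic benchmarks and real-world ML tasks (kernel learning, logistic regression, deep learning, KNN)?
RQ5: How does d-KG perform in sequential and batch settings, and with directional-derivative selection?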

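Key Findings

Under mild conditions, d-KG has greater one-step value of information than derivative-free KG.
The acquisition function can be computed with a fast, discretization-free method, which enables scalable optimization.
d-KG is one-step Bayes-optimal and asymptotically consistent over a finite feasible set.
d-KG achieves state-of-the-art performance across synthetic benchmarks and real-world tasks (kernel learning, logistic regression, deep learning, KNN).
Using directional derivatives (possibly noisy or partial) improves performance over gradient-free methods on several benchmarks.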
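
The first finding, that a derivative observation adds one-step value of information, can be illustrated with a small Monte Carlo sketch. It is a deliberately simplified stand-in and not the paper's method: it evaluates the posterior mean on a fixed grid (whereas the paper's computation is discretization-free), works in one dimension, starts from a zero-mean GP prior with an RBF kernel, and uses a single candidate point. The function name voi_from_prior, the noise variances, and the antithetic pairing of the derivative noise are all assumptions made here for illustration.

```python
import numpy as np

def voi_from_prior(z, grid, ell=1.0, s2=1.0, noise_f=1e-2, noise_g=1e-2,
                   n_pairs=2000, seed=0):
    """Monte Carlo one-step value of information of sampling at z.

    Starts from a zero-mean GP prior with an RBF kernel, so the current
    minimum of the posterior mean is 0 and the value of information is
    -E[min_x mu_1(x)]. Returns (kg, dkg): value observation only, and
    value plus derivative observation.
    """
    rng = np.random.default_rng(seed)
    r = grid - z
    k = s2 * np.exp(-0.5 * (r / ell) ** 2)                  # cov(f(x), f(z))
    # For the RBF kernel, f(z) and f'(z) are uncorrelated at the same point,
    # so the two observations update the mean through independent normals.
    a = k / np.sqrt(s2 + noise_f)                           # response to the value
    b = (k * r / ell**2) / np.sqrt(s2 / ell**2 + noise_g)   # response to the derivative
    z1 = rng.standard_normal(n_pairs)
    z2 = rng.standard_normal(n_pairs)
    mu_val = np.outer(z1, a)        # updated mean on the grid, value only
    shift = np.outer(z2, b)         # extra update contributed by the derivative
    kg = -np.mean(mu_val.min(axis=1))
    # Antithetic pairs (+z2, -z2) share the same z1 draws as the value-only case.
    dkg = -0.5 * (np.mean((mu_val + shift).min(axis=1))
                  + np.mean((mu_val - shift).min(axis=1)))
    return kg, dkg

grid = np.linspace(-3.0, 3.0, 121)
kg, dkg = voi_from_prior(0.0, grid)
print(f"KG (value only): {kg:.3f}, d-KG (value + derivative): {dkg:.3f}")
```

Because each antithetic pair averages back to the value-only update, the derivative-enabled estimate in this sketch can never fall below the value-only estimate, which mirrors the ordering stated in the finding. The size of the gap depends on the kernel, the noise levels, and the grid chosen here.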

This review was generated by AI and checked by a human editor.