QUICK REVIEW

[Paper Review] Information Directed Sampling and Bandits with Heteroscedastic Noise

Johannes Kirschner, Andreas Krause|arXiv (Cornell University)|Jan 29, 2018

Advanced Bandit Algorithms Research | References: 16 | Citations: 73

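One-Line Summary

This paper proposes Information Directed Sampling (IDS) for stochastic bandits with heteroscedastic noise, where the observation noise depends on the evaluation point. It derives frequentist regret bounds in terms of the regret-information ratio, and develops IDS variants for the linear and RKHS settings together with concentration inequalities for online least squares.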

ABSTRACT

In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly on the domain; a restrictive assumption for many applications. In this work, we consider bandits with heteroscedastic noise, where we explicitly allow the noise distribution to depend on the evaluation point. We show that this leads to new trade-offs for information and regret, which are not taken into account by existing approaches like upper confidence bound algorithms (UCB) or Thompson Sampling. To address these shortcomings, we introduce a frequentist regret analysis framework, that is similar to the Bayesian framework of Russo and Van Roy (2014), and we prove a new high-probability regret bound for general, possibly randomized policies, which depends on a quantity we refer to as regret-information ratio. From this bound, we define a frequentist version of Information Directed Sampling (IDS) to minimize the regret-information ratio over all possible action sampling distributions. This further relies on concentration inequalities for online least squares regression in separable Hilbert spaces, which we generalize to the case of heteroscedastic noise. We then formulate several variants of IDS for linear and reproducing kernel Hilbert space response functions, yielding novel algorithms for Bayesian optimization. We also prove frequentist regret bounds, which in the homoscedastic case recover known bounds for UCB, but can be much better when the noise is heteroscedastic. Empirically, we demonstrate in a linear setting with heteroscedastic noise, that some of our methods can outperform UCB and Thompson Sampling, while staying competitive when the noise is homoscedastic.

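Motivation and Objectives

Motivate and formalize stochastic bandits whose observation noise depends on the evaluation point (heteroscedasticity).
Develop a frequentist regret framework analogous to the Bayesian framework of Russo and Van Roy (2014), and define the regret-information ratio.
Introduce a frequentist Information Directed Sampling (IDS) that minimizes the regret-information ratio over all action sampling distributions.
Extend concentration inequalities for online least squares to heteroscedastic noise, so that valid confidence intervals remain available.
Formulate IDS variants for linear and RKHS response functions, and derive the corresponding regret bounds and practical algorithms.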

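Proposed Method

Establishes a new regret bound for randomized policies that depends on the regret-information ratio and the total information gain (gamma_T).
Introduces a surrogate regret-information ratio Psi_t^+ built from the confidence-interval gap estimate Delta_t^+, which makes the IDS optimization computable.
Proves the existence and structural properties of minimizers of Psi_t^+(mu) (e.g., support on two actions) and derives deterministic IDS (DIDS) as a cheaper alternative.
Generalizes the online least squares concentration inequalities to heteroscedastic noise in the linear and RKHS settings, yielding confidence intervals for f.
Defines two information gain functions, I_t^F and I_t^UCB, that drive IDS in the RKHS/linear settings, and relates them to mutual information and posterior variance.
Outlines the algorithm variants and provides theoretical regret bounds that recover UCB-type results in the homoscedastic case and can improve on them under heteroscedastic noise.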
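
The steps above can be illustrated with a minimal sketch of one deterministic IDS (DIDS) round in a finite linear setting. This is not the authors' implementation. It assumes a known noise standard deviation `rho` per action, a noise-weighted least-squares estimate, a surrogate gap of the form "largest upper confidence bound minus own lower confidence bound", and an information gain of the form 0.5 * log(1 + sigma^2 / rho^2) modeled on Gaussian mutual information. The paper's exact I_t^F and I_t^UCB may differ in detail, and the names `dids_step`, `update`, and `beta` are hypothetical.

```python
import numpy as np

def dids_step(X, rho, V, b, beta):
    """Pick one action index by minimizing a surrogate regret-information ratio.

    X    : (K, d) array of candidate actions (rows assumed nonzero)
    rho  : (K,) array of known noise standard deviations per action
    V    : (d, d) regularized, noise-weighted design matrix
    b    : (d,) noise-weighted response vector
    beta : confidence-width multiplier (hypothetical tuning constant)
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                      # weighted least-squares estimate
    mean = X @ theta_hat
    # sigma[i] = sqrt(x_i^T V^{-1} x_i), the confidence width at action i
    sigma = np.sqrt(np.einsum("ij,jk,ik->i", X, V_inv, X))
    ucb = mean + beta * sigma
    lcb = mean - beta * sigma
    gap = ucb.max() - lcb                      # surrogate gap, nonnegative
    info = 0.5 * np.log1p(sigma**2 / rho**2)   # assumed information gain
    return int(np.argmin(gap**2 / info))

def update(V, b, x, y, rho_x):
    """Weighted least-squares update: each observation is scaled by 1 / rho(x)^2."""
    return V + np.outer(x, x) / rho_x**2, b + x * y / rho_x**2

# Two actions with equal uncertainty but very different noise levels.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
rho = np.array([0.1, 10.0])
choice = dids_step(X, rho, np.eye(2), np.zeros(2), beta=1.0)
```

In this toy call both actions have the same surrogate gap, so the ratio is decided by the information term and the low-noise action is preferred. A plain UCB rule would see the two actions as tied, which is the kind of trade-off the heteroscedastic setting is meant to expose.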

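Experimental Results

Research Questions

RQ1: How does heteroscedastic noise affect the exploration-exploitation trade-off in bandits?
RQ2: Can a frequentist regret framework based on the regret-information ratio bound the regret under heteroscedastic noise?
RQ3: Can IDS be adapted to the heteroscedastic setting so that it minimizes regret while maximizing information gain?
RQ4: How can concentration results for online least squares be extended to heteroscedastic noise in linear and RKHS models?
RQ5: Do IDS variants outperform UCB and Thompson Sampling under heteroscedastic noise, and how do they compare when the noise is homoscedastic?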

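Key Findings

A new high-probability regret bound for randomized policies that depends on the regret-information ratio and gamma_T.
A frequentist version of Information Directed Sampling that minimizes a surrogate regret-information ratio built from confidence intervals.
In a linear heteroscedastic setting, some IDS variants empirically outperform UCB and Thompson Sampling, and stay competitive when the noise is homoscedastic.
An extension of the online least squares concentration inequalities to heteroscedastic noise in both the finite-dimensional and RKHS settings.
A proof that an IDS minimizer can be supported on at most two actions, which aids computational tractability for continuous action spaces.
Regret bounds that recover known UCB-type bounds in the homoscedastic case and can be better under heteroscedastic noise.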
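
The two-action support result suggests a simple search, sketched below under stated assumptions. Here the ratio for a sampling distribution mu is taken to be the squared expected gap divided by the expected information gain, in the style of Russo and Van Roy (2014). The function name `ids_two_point`, the grid search over the mixing probability, and the toy numbers are all illustrative and not taken from the paper.

```python
import itertools
import numpy as np

def ids_two_point(gap, info, grid=101):
    """Search distributions supported on at most two actions.

    gap  : per-action surrogate gaps (nonnegative)
    info : per-action information gains (assumed strictly positive)
    Returns (i, j, p, ratio): play action i with probability p, else action j.
    """
    gap = np.asarray(gap, dtype=float)
    info = np.asarray(info, dtype=float)
    ps = np.linspace(0.0, 1.0, grid)           # candidate mixing probabilities
    best = None
    for i, j in itertools.combinations_with_replacement(range(len(gap)), 2):
        num = (ps * gap[i] + (1 - ps) * gap[j]) ** 2   # squared expected gap
        den = ps * info[i] + (1 - ps) * info[j]        # expected information
        ratio = num / den
        k = int(np.argmin(ratio))
        if best is None or ratio[k] < best[3]:
            best = (i, j, float(ps[k]), float(ratio[k]))
    return best

# Action 0: small gap, little information. Action 1: large gap, much information.
best = ids_two_point(gap=[0.1, 1.0], info=[0.01, 1.0])
```

In this toy instance each action played deterministically has ratio 1, while a mixture of the two achieves a strictly smaller ratio. This illustrates why the framework considers randomized policies, and why restricting the search to pairs of actions keeps it cheap.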

This review was generated by AI and checked by a human editor.