QUICK REVIEW

[Paper Review] Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

Sanath Kumar Krishnamurthy, Susan Athey|arXiv (Cornell University)|Jan 1, 2021

Advanced Bandit Algorithms Research | References: 37 | Citations: 1

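One-Sentence Summary

This paper proposes a new approach that reduces model selection in stochastic contextual bandits to offline model selection oracles. The reduction yields flexible, efficient algorithms whose computational cost is no worse than that of model selection for regression. When one of the candidate classes is realizable, the algorithm attains the optimal realizability-based regret bound for that class, up to a logarithmic dependence on the number of classes, and thereby adapts to the complexity of the unknown best class.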

ABSTRACT

We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general purpose algorithms with computational requirements no worse than those for model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, up to a logarithmic dependency on the number of classes, our algorithm attains optimal realizability-based regret bounds for that class under one of two conditions: if the time-horizon is large enough, or if an assumption that helps with detecting misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when this realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds and hence further establish the importance of model selection in contextual bandits.

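Research Motivation and Objectives

Address model selection in contextual bandits, where the algorithm must manage both the bias-variance trade-off and the exploration-exploitation trade-off.
Reduce model selection in contextual bandits to offline model selection oracles, enabling general-purpose algorithms.
Attain the optimal regret bound for the best model class when a realizable class exists, and adapt to the unknown complexity of that class.
Improve performance in early rounds by relying on simpler model classes before full adaptation sets in.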

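Proposed Method

The method reduces online model selection in contextual bandits to offline model selection oracles, so that existing regression-style model selection techniques can be reused.
It introduces a new algorithmic framework that selects dynamically among multiple model classes based on performance feedback.
It inherits the computational complexity of offline model selection, which keeps the algorithm efficient and avoids additional overhead.
It incorporates a mechanism for detecting model misspecification, which supports adaptation when an additional assumption that aids detection holds.
It uses a confidence-interval-based selection strategy to balance exploration and exploitation while preserving regret optimality.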
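The idea of an offline model selection oracle can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hold-out validation rule, one-dimensional contexts, a single arm's logged rewards, and nested polynomial model classes indexed by degree. The names `poly_features`, `fit_least_squares`, and `offline_model_selection_oracle` are hypothetical and introduced only for this example.

```python
import numpy as np


def poly_features(x, degree):
    """Map 1-D contexts to [1, x, ..., x^degree] (hypothetical model classes)."""
    return np.vander(x, degree + 1, increasing=True)


def fit_least_squares(features, rewards):
    """Ordinary least-squares fit; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    return coef


def offline_model_selection_oracle(x, r, degrees, val_fraction=0.5):
    """Assumed oracle: fit each class on a training split and return the
    (degree, validation_error, coefficients) with the lowest held-out
    squared error. Ties are broken in favor of the earlier (simpler) class."""
    n_train = int(len(x) * (1.0 - val_fraction))
    x_tr, r_tr = x[:n_train], r[:n_train]
    x_va, r_va = x[n_train:], r[n_train:]

    best = None
    for degree in degrees:
        coef = fit_least_squares(poly_features(x_tr, degree), r_tr)
        residual = poly_features(x_va, degree) @ coef - r_va
        val_err = float(np.mean(residual ** 2))
        if best is None or val_err < best[1]:
            best = (degree, val_err, coef)
    return best
```

In a bandit loop, such an oracle would be called on the data logged so far, and the selected class would then drive action selection. How the logged data is reweighted and how exploration is scheduled are left out here, since the review does not specify them.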

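Results

Research Questions

RQ1: Can model selection in contextual bandits be reduced to offline model selection oracles without sacrificing regret optimality?
RQ2: Under what conditions can an algorithm adapt to the complexity of the best class within the candidate set of model classes?
RQ3: How can using simpler model classes before full adaptation sets in improve performance in early rounds?
RQ4: When a realizable class exists, how does misspecification detection affect the regret guarantees?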

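Main Findings

When a realizable model class exists, the proposed algorithm attains the optimal realizability-based regret bound for that class, up to a logarithmic dependence on the number of classes.
This optimality holds under either of two conditions: the time horizon is large enough, or an assumption that helps with detecting misspecification holds.
The algorithm adapts to the complexity of the unknown best model class, so performance improves without prior knowledge of which class is optimal.
Even when the realizable class is known in advance, relying on simpler model classes in early rounds improves the regret guarantees for those rounds.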
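To make the phrase "optimal realizability-based regret bound" concrete, here is a hedged illustration rather than the paper's exact statement. It assumes finite model classes $\mathcal{F}_1, \dots, \mathcal{F}_M$, $K$ actions, horizon $T$, and $m^\star$ as the index of the realizable class; these symbols are introduced here and do not appear in the review. The first line is the standard rate from the broader literature for a single known finite realizable class. The second line is only a schematic of "up to a logarithmic dependence on the number of classes", and the precise way $M$ enters is an assumption.

```latex
\begin{align*}
\text{known class:} \quad
  & \mathrm{Reg}(T) = \tilde{O}\!\left(\sqrt{K\,T\,\log\lvert\mathcal{F}_{m^\star}\rvert}\right) \\
\text{model selection (schematic):} \quad
  & \mathrm{Reg}(T) = \tilde{O}\!\left(\sqrt{K\,T\,\log\lvert\mathcal{F}_{m^\star}\rvert}\right)
    \cdot \mathrm{polylog}(M)
\end{align*}
```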


This review was generated by AI and checked by a human editor.