QUICK REVIEW

[Paper Review] A Survey on Practical Applications of Multi-Armed and Contextual Bandits

Djallel Bouneffouf, Irina Rish|arXiv (Cornell University)|Apr 2, 2019

Advanced Bandit Algorithms Research|References: 20|Citations: 106

One-Sentence Summary

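This survey reviews practical applications of multi-armed and contextual bandits in healthcare, finance, pricing, recommender systems, and other domains, and discusses how bandit methods affect real-world decision-making and machine learning workflows.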

ABSTRACT

In recent years, multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize state-of-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field.

Motivation and Objectives

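Provide a taxonomy of real-world MAB and contextual bandit applications across domains.
Summarize the state-of-the-art algorithms used in each domain and their advantages.
Identify trends, gaps, and open problems that can guide future research.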

Approach

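Describe the standard MAB and contextual bandit frameworks and how they relate to real-world settings.
Review applications by domain together with the bandit formulation each one uses (MAB vs. CMAB, stationary vs. non-stationary).
Highlight notable algorithms and modeling techniques (e.g., LINUCB, CTS, Thompson Sampling, bandits with side information).
Discuss how bandits augment machine learning workflows, including hyperparameter tuning, feature selection, active learning, and RL orchestration.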
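
To make the MAB versus CMAB distinction concrete, here is a minimal Python sketch of two of the algorithms named above: Beta-Bernoulli Thompson Sampling for the context-free case and a disjoint-model LinUCB for the contextual case. It is an illustration only, not code from the survey. The function and class names, the arm probabilities, the one-hot toy contexts, and the `alpha` exploration weight are all assumptions chosen for the example.

```python
import math
import random


def thompson_sampling(true_probs, rounds, rng):
    """Beta-Bernoulli Thompson Sampling; returns the pull count of each arm."""
    k = len(true_probs)
    wins = [0] * k    # observed successes per arm
    losses = [0] * k  # observed failures per arm
    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # Bandit feedback: only the chosen arm's reward is observed.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[a] + losses[a] for a in range(k)]


class LinUCBArm:
    """One arm of disjoint LinUCB, storing A^{-1} and b of a ridge model."""

    def __init__(self, dim, alpha):
        self.alpha = alpha  # exploration weight (assumed value, tune per task)
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x):
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        var = sum(v * xi for v, xi in zip(self._a_inv_times(x), x))
        return mean + self.alpha * math.sqrt(max(0.0, var))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        v = self._a_inv_times(x)
        denom = 1.0 + sum(vi * xi for vi, xi in zip(v, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms, x):
    """Pick the arm with the highest upper confidence bound for context x."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x))


def run_linucb(rounds, rng, alpha=1.0):
    """Toy problem: arm i pays 1 only when the one-hot context is i."""
    arms = [LinUCBArm(dim=2, alpha=alpha) for _ in range(2)]
    for _ in range(rounds):
        user = rng.randrange(2)
        x = [1.0, 0.0] if user == 0 else [0.0, 1.0]
        chosen = choose(arms, x)
        reward = 1.0 if chosen == user else 0.0
        arms[chosen].update(x, reward)
    return arms


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], 2000, random.Random(0)))
    trained = run_linucb(200, random.Random(1))
    print(choose(trained, [1.0, 0.0]), choose(trained, [0.0, 1.0]))
```

The two routines differ only in what the policy conditions on: Thompson Sampling keeps one posterior per arm, while LinUCB scores each arm against the current context vector. Both sketches assume stationary rewards; the non-stationary formulations mentioned above would need discounting or windowing, which is not shown here.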

Results

Research Questions

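RQ1: In which real-world domains have MAB and CMAB been applied effectively?
RQ2: Which bandit formulations and algorithms have been most successful in each domain?
RQ3: What gaps have been identified, and what opportunities exist for future bandit research and cross-domain transfer?
RQ4: How can bandit methods enhance broader machine learning tasks such as hyperparameter optimization and active learning?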

Key Findings

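The taxonomy of practical MAB and CMAB applications is broad, spanning healthcare, finance, dynamic pricing, recommender systems, influence maximization, information retrieval, dialogue systems, anomaly detection, and telecommunications.
Contextual bandits and non-stationary variants are used in several domains, with specific algorithm choices such as LINUCB, CTS, and Thompson Sampling guiding the decisions.
Bandits are well suited to online decision-making where feedback is limited and exploration is required, supporting adaptive real-time experimentation and learning.
Research on cross-domain transfer and multi-task bandits remains limited, which points to opportunities for lifelong learning and transfer learning in bandit settings.
Bandits can augment machine learning pipelines through algorithm selection, hyperparameter optimization (e.g., Hyperband), feature selection, active learning, clustering, and online RL orchestration.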
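
As a rough illustration of the hyperparameter optimization use case, the sketch below implements successive halving, the budget-allocation subroutine that Hyperband builds on. It is not full Hyperband, which additionally runs several brackets with different starting budgets. The function names, the `eta` reduction factor, and the noisy toy objective are assumptions made for this example and do not come from the survey.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return the surviving config after repeated prune-and-extend rungs.

    Each rung evaluates every survivor at the current budget, keeps the
    best 1/eta of them (lower loss is better), and multiplies the budget
    by eta. `configs` must be non-empty.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


def make_noisy_loss(rng, optimum=0.3, noise=0.05):
    """Hypothetical objective: true loss plus noise that shrinks with budget."""
    def loss(config, budget):
        return (config - optimum) ** 2 + rng.gauss(0.0, noise) / budget
    return loss


if __name__ == "__main__":
    candidates = [i / 10 for i in range(10)]  # e.g. candidate learning rates
    print(successive_halving(candidates, make_noisy_loss(random.Random(0))))
```

The bandit connection is that each configuration acts as an arm and the training budget is the resource being allocated: weak arms are dropped early so that most of the budget goes to the promising ones.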

This review was generated by AI and checked by a human editor.