QUICK REVIEW

[Paper Review] A Statistically Reliable Optimization Framework for Bandit Experiments in Scientific Discovery

Tong Li, Travis Mandel|arXiv (Cornell University)|Mar 11, 2026

Advanced Bandit Algorithms Research | Citations: 0

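One-line summary

The paper proposes a general algorithm-induced test (AIT) correction that enables valid hypothesis testing under adaptive bandit sampling. It also introduces an objective function that captures the trade-off between reward and statistical power, together with an optimization framework that selects bandit parameters under a user-specified cost.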

ABSTRACT

Scientific experimentation is largely driven by statistical hypothesis testing to determine significant differences in interventions. Traditionally, experimenters allocate samples uniformly between each intervention. However, such an approach may lead to suboptimal outcomes; multi-armed bandits (MABs) address this problem by allocating samples adaptively to maximize outcomes. Yet, two challenges have hindered the use of MABs in scientific domains. First, common hypothesis tests (e.g., $t$-tests) become invalid under adaptive sampling without correction, leading to inflated type I and type II errors. This is an understudied problem, and prior solutions suffer from issues such as low statistical power which prevent adoption in many practical settings. Second, practitioners must explicitly balance cumulative reward with statistical efficiency, yet no general methodology exists to quantify this trade-off across algorithms. In this paper, we study assumption modification and critical region correction approaches for hypothesis testing that enable common tests to be applied to adaptively collected data. We provide heuristic justification for their power efficiency and show in simulation that they achieve higher power than existing approaches. Further, we derive a theoretically and practically motivated objective function for adaptive experiment evaluation, which we integrate into a unified experimental framework. Our framework asks experimenters to specify an experiment extension cost for their problem, and based on that enables our proposed optimization procedure to select the bandit algorithm that best balances reward and power in their setting. We show that our approach enables practitioners to improve outcomes with only slightly more steps than uniform randomization, while retaining statistical validity.

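Motivation and objectives

Motivate the use of adaptive (bandit) sampling to improve experimental outcomes while preserving valid statistical inference.
Provide a general test-correction approach that yields valid type I error control for arbitrary bandit algorithms and common tests.
Introduce an objective function that balances reward and statistical power under a user-defined horizon/cost.
Develop an optimization framework that recommends a bandit algorithm and an experiment length, subject to cost and power constraints.
Evaluate the proposed method in simulations using common bandit algorithms and hypothesis tests (Thompson sampling (TS), ε-TS, UCB, t-tests, ANOVA, Tukey's test, and others).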

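Proposed method

Propose the AIT correction, which estimates the null distribution of the test statistic by simulating data collection under the same adaptive algorithm.
Show that, for simple hypotheses, the AIT-corrected likelihood ratio (LR) test is the most powerful test under adaptive data collection.
Define and justify an experiment extension cost parameter w, and derive the objective function F(T,R,w)=R/T - w*log(T) to quantify the trade-off between reward and horizon.
Formalize PDE-based iso-value conditions that justify the chosen objective function and its desirable properties (monotonicity, scale and shift consistency).
Develop an optimization procedure that selects the bandit algorithm's parameters and the horizon so as to maximize the proposed objective subject to a power constraint.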
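The idea of estimating the null distribution by re-running the same adaptive algorithm can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes two Bernoulli arms, a Beta-Bernoulli Thompson sampling policy, a known common success rate `p0` under the null, and a simple z-type statistic. All function names are hypothetical, and the paper's actual statistic and simulation details may differ.

```python
import math
import random


def thompson_run(probs, horizon, rng):
    """Run Beta-Bernoulli Thompson sampling; return per-arm (pulls, successes)."""
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k
    for _ in range(horizon):
        # Draw a plausible mean for each arm from its Beta(1 + wins, 1 + losses) posterior.
        draws = [rng.betavariate(1 + wins[a], 1 + pulls[a] - wins[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


def z_statistic(pulls, wins):
    """Two-sample z-type statistic for arm 1 minus arm 0 (0.0 when undefined)."""
    if min(pulls) == 0:
        return 0.0
    m0, m1 = wins[0] / pulls[0], wins[1] / pulls[1]
    var = m0 * (1 - m0) / pulls[0] + m1 * (1 - m1) / pulls[1]
    if var == 0:
        return 0.0
    return (m1 - m0) / math.sqrt(var)


def simulated_critical_value(p0, horizon, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the (1 - alpha) quantile of |z| under the null hypothesis,
    with the data collected by the same bandit algorithm (assumed null: both arms have rate p0)."""
    rng = random.Random(seed)
    stats = sorted(
        abs(z_statistic(*thompson_run([p0, p0], horizon, rng)))
        for _ in range(n_sims)
    )
    idx = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    return stats[idx]
```

In this sketch, an observed statistic from an experiment run with the same policy and horizon would be compared against `simulated_critical_value(p0, horizon)` rather than the usual normal quantile. In practice `p0` is unknown, so some plug-in estimate would be needed; that choice is outside what this sketch covers.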

Figure 1. Screenshot of our optimization framework web application, showing the relative ECP-reward performance for the empirical study inspired simulation. Note the best setting for $\epsilon$-TS outperforms TS and UR near $w=0.01$.

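Experimental results

Research questions

RQ1: How should hypothesis tests be corrected so that they remain valid for arbitrary algorithms and tests under adaptive bandit data collection?
RQ2: How can the trade-off between cumulative reward and statistical power be quantified and optimized in adaptive experiments?
RQ3: Given a user-specified cost, which algorithmic framework best balances reward and horizon?
RQ4: How does the proposed correction differ from common approaches (e.g., ART) in terms of power and false positive rate (FPR)?
RQ5: Can the framework provide practical guidance for choosing bandit parameters and experiment length in real scientific settings?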

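Key findings

Across several algorithms, including TS, ε-greedy, and UCB, the AIT correction achieves higher power than prior approaches while keeping the empirical FPR near the nominal target (about 0.05).
In the simple-hypothesis setting, the AIT-corrected likelihood ratio test (LRT) is optimal under adaptive data collection.
The proposed ECP-reward objective F(T,R,w)=R/T - w*log(T) encodes the trade-off between average reward and experiment extension cost, and has useful monotonicity and scale-shift properties.
For a given w, the framework provides an optimization toolkit that recommends bandit parameters and a horizon that balance reward and statistical efficiency.
Simulation studies suggest that the approach achieves valid inference and improved practical performance with only a modest increase in steps relative to uniform randomization.
The method is demonstrated with common bandit algorithms (TS, ε-TS, UCB) and standard tests (t-tests, ANOVA, Tukey's test).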
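To make the objective concrete, the following sketch evaluates F(T,R,w)=R/T - w*log(T) and picks the best candidate that meets a power target. It reads R as cumulative reward over a horizon of T steps, which is an interpretation based on R/T being described as average reward. The function names, the dictionary keys, the 0.8 power threshold, and every number in `TOY_CANDIDATES` are made up for illustration and are not results from the paper.

```python
import math


def ecp_reward_objective(horizon, total_reward, w):
    """F(T, R, w) = R / T - w * log(T), as quoted in the review."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return total_reward / horizon - w * math.log(horizon)


def select_best(candidates, w, min_power=0.8):
    """Return the candidate with the largest objective among those meeting
    the power target, or None if no candidate is feasible.

    Each candidate is a dict with hypothetical keys:
    'name', 'horizon', 'reward' (cumulative), and 'power'.
    """
    feasible = [c for c in candidates if c["power"] >= min_power]
    if not feasible:
        return None
    return max(
        feasible,
        key=lambda c: ecp_reward_objective(c["horizon"], c["reward"], w),
    )


# Invented toy values, for illustration only (not taken from the paper).
TOY_CANDIDATES = [
    {"name": "UR", "horizon": 200, "reward": 100.0, "power": 0.85},
    {"name": "TS", "horizon": 400, "reward": 260.0, "power": 0.80},
    {"name": "eps-TS", "horizon": 260, "reward": 161.2, "power": 0.82},
]
```

Calling `select_best(TOY_CANDIDATES, w)` for a few values of `w` shows the intended behavior: a small `w` tolerates long horizons in exchange for higher average reward, while a large `w` favors short experiments. For fixed T the objective increases in R, and for R >= 0 and w > 0 it decreases in T, since the partial derivative with respect to T is -R/T^2 - w/T.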

Figure 2. Screenshot of our optimization framework web application user input page.


This review was generated by AI and checked by a human editor.