QUICK REVIEW

[Paper Review] On the Optimal Sample Complexity for Best Arm Identification

Lijie Chen, Jian Li | arXiv (Cornell University) | Nov 12, 2015

Advanced Bandit Algorithms Research | References: 25 | Cited by: 35

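One-Sentence Summary

This paper presents a new algorithm for best arm identification in stochastic multi-armed bandits that improves on several prior sample-complexity upper bounds, and it proves a new lower bound for the Sign-ξ problem. Through a reduction from Sign-ξ to Best-1-Arm, it establishes the first instance-dependent lower bound that goes beyond the Mannor-Tsitsiklis lower bound, and it proposes a conjecture on the optimal sample complexity.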

ABSTRACT

We study the best arm identification (BEST-1-ARM) problem, which is defined as follows. We are given $n$ stochastic bandit arms. The $i$th arm has a reward distribution $D_i$ with an unknown mean $μ_{i}$. Upon each play of the $i$th arm, we can get a reward, sampled i.i.d. from $D_i$. We would like to identify the arm with the largest mean with probability at least $1-δ$, using as few samples as possible. We provide a nontrivial algorithm for BEST-1-ARM, which improves upon several prior upper bounds on the same problem. We also study an important special case where there are only two arms, which we call the sign problem. We provide a new lower bound of sign, simplifying and significantly extending a classical result by Farrell in 1964, with a completely new proof. Using the new lower bound for sign, we obtain the first lower bound for BEST-1-ARM that goes beyond the classic Mannor-Tsitsiklis lower bound, by an interesting reduction from Sign to BEST-1-ARM. We propose an interesting conjecture concerning the optimal sample complexity of BEST-1-ARM from the perspective of instance-wise optimality.

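Research Motivation and Goals

Improve the sample-complexity upper bound for the Best-1-Arm problem in stochastic multi-armed bandits.
Establish a tighter lower bound for Sign-ξ, a basic two-arm testing problem, using a new proof technique based on the law of the iterated logarithm.
Derive a new instance-dependent lower bound for Best-1-Arm that goes beyond the classic Mannor-Tsitsiklis lower bound, via a reduction from Sign-ξ to Best-1-Arm.
Propose a conjecture on the optimal sample complexity of Best-1-Arm from the perspective of instance-wise optimality.
Unify and extend earlier results in pure-exploration bandits through new theoretical analysis, connecting the Sign-ξ and Best-1-Arm problems.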

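Proposed Method

Introduces a simulation-based algorithm, denoted SIM(𝒜_i, r_i), which runs multiple instances 𝒜_i of a base algorithm 𝒜 with decreasing failure probabilities δ/2^i and increasing simulation periods r_i = 2^i (instance 𝒜_i advances once every r_i rounds).
Uses a round-robin simulation strategy: in round r, every algorithm 𝒜_i such that r_i divides r is simulated, in increasing order of index.
Assigns each simulated algorithm its own independent stream of samples, which ensures statistical independence and the correctness of the simulation.
Partitions the probability space into events ℱ_i, where ℱ_i is the event that 𝒜_i is the first algorithm to terminate successfully, which makes the expected running time tractable to analyze.
Uses the property that, for a reasonable running-time bound T, T(δ/2^i, I) ≤ T(δ, I) · (ln δ^{-1} + i ln 2)/ln δ^{-1}, to bound the expected simulation time.
Derives an expected running-time bound of O(T(δ, I)) for the simulated algorithm, showing that it remains δ-correct while running in expected O(T) time.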
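The round-robin schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `toy_base` and `simulate`, the generator-based interface (one `yield` per sample), indexing the instances from i = 1, and the seeding scheme are all hypothetical choices made for the example. The toy base algorithm stalls forever with probability δ, which mimics an algorithm whose running time is only controlled on a good event.

```python
import math
import random


def toy_base(delta, rng, mean=0.6, threshold=0.5):
    """Hypothetical base algorithm (not from the paper), as a generator.

    Each `yield` stands for drawing one sample. With probability `delta`
    it never terminates; otherwise it averages n Bernoulli(mean) samples
    and reports whether the empirical mean exceeds `threshold`.
    """
    if rng.random() < delta:
        while True:
            yield  # bad event: the algorithm runs forever
    n = math.ceil(math.log(2.0 / delta) / (2.0 * (mean - threshold) ** 2))
    total = 0
    for _ in range(n):
        total += rng.random() < mean
        yield
    return total / n > threshold


def simulate(base, delta, seed=0, max_rounds=10**6):
    """Round-robin simulation: instance i runs with failure probability
    delta / 2**i and is advanced one step in every round divisible by 2**i.
    Returns (answer, index of the instance that finished, samples used)."""
    instances = {}
    samples = 0
    for r in range(1, max_rounds + 1):
        i = 1
        # 2**i dividing r implies 2**(i-1) divides r, so this loop visits
        # exactly the scheduled instances, in increasing index order.
        while r % (2 ** i) == 0:
            if i not in instances:
                # Each instance gets its own independent random stream.
                rng = random.Random(seed * 1000 + i)
                instances[i] = base(delta / 2 ** i, rng)
            try:
                next(instances[i])
                samples += 1
            except StopIteration as stop:
                return stop.value, i, samples
            i += 1
    raise RuntimeError("no instance terminated within max_rounds")


if __name__ == "__main__":
    answer, index, used = simulate(toy_base, delta=0.1)
    print(f"answer={answer}, finished by instance {index}, samples={used}")
```

Because instance i is advanced only once every 2^i rounds, a stalled low-index instance delays the later ones by at most a constant factor per level, which is the intuition behind trading a possibly unbounded running time for an expected one.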

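Experimental Results

Research Questions

RQ1: What is the optimal sample complexity of the Best-1-Arm problem under instance-wise optimality?
RQ2: Can a tighter lower bound be derived for the Sign-ξ problem, one that improves on the classic ∆^{-2} bound and captures the log-log correction term?
RQ3: How can the Sign-ξ problem serve as a building block for deriving new lower bounds for the Best-1-Arm problem?
RQ4: What role does the law of the iterated logarithm play in establishing non-asymptotic lower bounds for sequential testing problems?
RQ5: Is the KKS bound, O(∑Δ_i^{-2}(ln ln Δ_i^{-1} + ln δ^{-1})), instance-wise optimal for Best-1-Arm? If so, under what conditions?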
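To make the Sign-ξ setting in RQ2 concrete, here is a small sequential test. It assumes the formulation in which a single arm with rewards in [0, 1] has an unknown mean μ ≠ ξ and the task is to decide whether μ > ξ or μ < ξ. The function `sign_test` and its confidence radius are illustrative assumptions, not the paper's procedure: the radius comes from Hoeffding's inequality with a union bound over time, which costs roughly Δ^{-2}(ln δ^{-1} + ln Δ^{-1}) samples and is therefore looser than the Δ^{-2} ln ln Δ^{-1} rate that the research questions ask about.

```python
import math
import random


def sign_test(sample, xi, delta, max_samples=10**6):
    """Decide the sign of (mu - xi) for rewards in [0, 1].

    `sample` is a zero-argument callable returning one reward. Stops as
    soon as the empirical mean leaves the band xi +/- radius(t), where
    radius(t) = sqrt(ln(4 t^2 / delta) / (2 t)). By Hoeffding plus a union
    bound over t, the total error probability is below delta.
    Returns (+1 or -1, number of samples used).
    """
    total = 0.0
    for t in range(1, max_samples + 1):
        total += sample()
        radius = math.sqrt(math.log(4.0 * t * t / delta) / (2.0 * t))
        gap = total / t - xi
        if abs(gap) > radius:
            return (1 if gap > 0 else -1), t
    raise RuntimeError("undecided within max_samples")


if __name__ == "__main__":
    rng = random.Random(0)
    # Bernoulli arm with mean 0.6 tested against the threshold xi = 0.5.
    decision, used = sign_test(lambda: float(rng.random() < 0.6), 0.5, 0.05)
    print(f"decision={decision}, samples={used}")
```

The number of samples grows as the gap Δ = |μ − ξ| shrinks, and how fast it must grow is exactly what a lower bound for Sign-ξ quantifies.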

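Main Findings

The paper establishes a new lower bound for the Sign-ξ problem that adds a ln ln ∆^{-1} factor to the classic ∆^{-2} bound, with a completely new proof based on the law of the iterated logarithm.
It shows that the expected sample complexity of any δ-correct algorithm for Sign-ξ satisfies lim sup_{Δ→0} T_A[Δ]/(Δ^{-2} ln ln Δ^{-1}) > 0, confirming that the ln ln Δ^{-1} factor is necessary.
By reducing Sign-ξ to Best-1-Arm, the paper derives the first lower bound for Best-1-Arm that goes beyond the Mannor-Tsitsiklis bound, showing that Δ_{[2]}^{-2} ln ln Δ_{[2]}^{-1} is a necessary component of the sample complexity.
The proposed algorithm achieves a sample complexity of O(∑_{i=2}^n Δ_{[i]}^{-2}(ln ln Δ_{[i]}^{-1} + ln δ^{-1})), matching the best previously known upper bound (the KKS bound), which suggests that it is near-optimal.
The paper gives a simulation-based transformation that converts any δ-correct algorithm with weakly expected running time T into a δ-correct algorithm with expected running time O(T), which helps in constructing efficient and robust algorithms.
It proposes a conjecture: the optimal sample complexity of Best-1-Arm is Ω(∑_{i=2}^n Δ_{[i]}^{-2} (ln ln Δ_{[i]}^{-1} + ln δ^{-1})). If true, this would establish instance-wise optimality.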
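The quantities in these findings can be evaluated numerically for a given instance. The sketch below is only an illustration: the function names are hypothetical, constants hidden by the O(·) and Ω(·) notation are ignored, and the ln ln Δ^{-1} term is clipped to zero when Δ ≥ 1/e (where it would be negative or undefined), which is a convenience assumption rather than part of the source.

```python
import math


def sorted_gaps(means):
    """Gaps Delta_[i] = mu_[1] - mu_[i] for i = 2..n, means sorted descending."""
    ordered = sorted(means, reverse=True)
    return [ordered[0] - m for m in ordered[1:]]


def loglog_inv(gap):
    """ln ln (1/gap), clipped to 0 for gap >= 1/e (illustrative assumption)."""
    if gap >= 1.0 / math.e:
        return 0.0
    return math.log(math.log(1.0 / gap))


def mannor_tsitsiklis_term(means, delta):
    """Sum over suboptimal arms of Delta^-2 * ln(1/delta)."""
    return sum(g ** -2 * math.log(1.0 / delta) for g in sorted_gaps(means))


def kks_term(means, delta):
    """Sum over suboptimal arms of Delta^-2 * (ln ln Delta^-1 + ln(1/delta))."""
    return sum(
        g ** -2 * (loglog_inv(g) + math.log(1.0 / delta))
        for g in sorted_gaps(means)
    )


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5]
    delta = 0.05
    print("Mannor-Tsitsiklis-style term:", mannor_tsitsiklis_term(means, delta))
    print("KKS-style term:", kks_term(means, delta))
```

For small gaps the two quantities differ by roughly Δ^{-2} ln ln Δ^{-1} per arm, which is the extra term that the new lower bound shows to be necessary for the second-best arm.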


This review was generated by AI and reviewed by a human editor.