QUICK REVIEW

[Paper Review] On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

Emilie Kaufmann, Olivier Cappé, Aurélien Garivier | arXiv (Cornell University) | Jul 16, 2014

Advanced Bandit Algorithms Research | References: 38 | Cited by: 704

One-Sentence Summary

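This paper introduces information-theoretic complexity measures for best arm identification in multi-armed bandits under the fixed-confidence and fixed-budget settings, and gives the first distribution-dependent lower bound for the fixed-confidence setting that also covers identifying the m best arms. It shows that the fixed-budget complexity can be smaller than the fixed-confidence complexity, contrary to the familiar behavior when testing fully specified alternatives, and it provides matching algorithms together with improved sequential stopping rules that guarantee error control.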

ABSTRACT

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is to contribute to a better understanding of the performance in terms of identifying the m best arms. We introduce generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings. In the fixed-confidence setting, we provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m is larger than 1 under general assumptions. In the specific case of two-armed bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli bandit models. These results show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observed when testing fully specified alternatives. In addition, we also provide improved sequential stopping rules that have guaranteed error probabilities and shorter average running times. The proofs rely on two technical results that are of independent interest: a deviation lemma for self-normalized sums (Lemma 19) and a novel change of measure inequality for bandit models (Lemma 1).
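The "generic notions of complexity" mentioned in the abstract can be written out as follows. This is a reconstruction from memory of the paper's definitions, not a quotation, and the symbols $\tau_\delta$ (the stopping time of a $\delta$-PAC algorithm), $p_t(\nu)$ (the probability of a wrong recommendation after $t$ samples) and $\mathcal{A}$ (the algorithm) do not appear in this summary and are assumed here.

$$
\kappa_C(\nu) = \inf_{\mathcal{A}\ \delta\text{-PAC}} \ \limsup_{\delta \to 0} \frac{\mathbb{E}_\nu[\tau_\delta]}{\log(1/\delta)},
\qquad
\kappa_B(\nu) = \inf_{\mathcal{A}\ \text{consistent}} \left( \limsup_{t \to \infty} -\frac{1}{t} \log p_t(\nu) \right)^{-1}.
$$

Read informally, a fixed-confidence algorithm needs about $\kappa_C(\nu)\log(1/\delta)$ samples to reach error level $\delta$, while a fixed-budget algorithm reaches an error of roughly $\exp(-t/\kappa_B(\nu))$ after $t$ samples. Putting both on the same scale is what makes the comparison $\kappa_B(\nu) < \kappa_C(\nu)$ meaningful.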

Research Motivation and Objectives

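Formalize and compare the sample complexity of best arm identification in stochastic multi-armed bandit models under the two standard settings, fixed confidence and fixed budget.
For identifying the m best arms, derive the first distribution-dependent lower bound on the fixed-confidence complexity, expressed through information-theoretic divergences.
Show that the fixed-budget complexity can be strictly smaller than the fixed-confidence complexity, challenging the classical intuition from testing fully specified alternatives.
Design matching algorithms and improved sequential stopping rules that guarantee the error probability while shortening the expected running time.
Provide two new technical tools of independent interest: a deviation lemma for self-normalized sums and a change of measure inequality for bandit models.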

Proposed Method

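Introduce two complexity measures: $\kappa_C(\nu)$ for the fixed-confidence setting, defined through the asymptotic sample complexity, and $\kappa_B(\nu)$ for the fixed-budget setting, defined through the decay rate of the failure probability.
Derive a general lower bound on $\kappa_C(\nu)$ in terms of information-theoretic divergences; it holds for any $m \geq 1$ under mild assumptions.
For two-armed bandits, derive refined lower bounds in both settings and construct matching algorithms for Gaussian and Bernoulli models.
Introduce a new change of measure inequality (Lemma 1) that compares likelihoods under different bandit models and makes tight lower bounds possible.
Develop a deviation lemma for self-normalized sums (Lemma 19) to control tail probabilities in the sequential analysis.
Design improved sequential stopping rules that keep the failure probability at most $\delta$ while bringing the expected stopping time as close to optimal as possible.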
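To make the divergence-based lower bound concrete, here is a minimal Python sketch for Bernoulli arms. It assumes the bound has the form recalled from the paper (not stated in this summary): the expected number of samples is at least $\left[\sum_{a \in S^*} 1/\mathrm{KL}(\nu_a, \nu_{[m+1]}) + \sum_{a \notin S^*} 1/\mathrm{KL}(\nu_a, \nu_{[m]})\right] \log(1/(2.4\delta))$ for $\delta \leq 0.15$, where $S^*$ is the set of the m best arms. The function names `bernoulli_kl`, `complexity_term` and `fixed_confidence_lower_bound` are hypothetical, chosen for illustration.

```python
import math


def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats; p and q must lie strictly in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def complexity_term(means, m):
    """Sum of inverse KL divergences in the assumed lower bound.

    Arms in the top-m set are compared with the best excluded arm,
    all other arms with the worst included arm.
    """
    mu = sorted(means, reverse=True)
    if not 1 <= m < len(mu):
        raise ValueError("m must satisfy 1 <= m < number of arms")
    if mu[m - 1] == mu[m]:
        raise ValueError("the set of the m best arms must be unique")
    worst_included = mu[m - 1]   # m-th best mean
    best_excluded = mu[m]        # (m+1)-th best mean
    top = sum(1.0 / bernoulli_kl(a, best_excluded) for a in mu[:m])
    rest = sum(1.0 / bernoulli_kl(a, worst_included) for a in mu[m:])
    return top + rest


def fixed_confidence_lower_bound(means, m, delta):
    """Assumed lower bound on the expected sample size of a delta-PAC rule."""
    if not 0 < delta <= 0.15:
        raise ValueError("the assumed form of the bound requires 0 < delta <= 0.15")
    return complexity_term(means, m) * math.log(1.0 / (2.4 * delta))


if __name__ == "__main__":
    arms = [0.7, 0.5, 0.45, 0.2]
    print(fixed_confidence_lower_bound(arms, m=1, delta=0.05))
```

Arms whose means sit close to the boundary between the m-th and (m+1)-th best arm have small KL divergences and therefore dominate the sum, which matches the intuition that they are the hardest to classify.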

Experimental Results

Research Questions

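RQ1: What are the fundamental sample complexity limits for identifying the m best arms in a stochastic multi-armed bandit under the fixed-confidence and fixed-budget settings?
RQ2: Can the fixed-budget complexity be smaller than the fixed-confidence complexity, and if so, under what conditions?
RQ3: What is the tightest distribution-dependent lower bound on the sample complexity of best arm identification when $m \geq 1$?
RQ4: How can sequential stopping rules be designed to keep the failure probability at most $\delta$ while minimizing the expected running time?
RQ5: Which technical tools are key to deriving tight lower bounds in this setting?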

Main Findings

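The paper gives the first distribution-dependent lower bound on $\kappa_C(\nu)$ that holds for $m \geq 1$, expressed through information-theoretic divergences.
For two-armed bandits, the fixed-budget complexity $\kappa_B(\nu)$ can be strictly smaller than the fixed-confidence complexity $\kappa_C(\nu)$, in contrast with the classical behavior when testing fully specified alternatives.
Matching algorithms are constructed for Gaussian and Bernoulli bandits; they attain the derived lower bounds in both settings, confirming that the bounds are tight.
Improved sequential stopping rules keep the failure probability at most $\delta$ and achieve shorter average running times.
Where matching algorithms are available, the tightness of the lower bounds pins down the asymptotic sample complexity in both settings.
Two new technical tools are introduced, Lemma 19 (a deviation lemma for self-normalized sums) and Lemma 1 (a change of measure inequality), both of independent interest for bandit theory.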
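The idea of a sequential stopping rule with guaranteed error probability can be sketched for two Gaussian arms with a known, common standard deviation. The rule below is not the paper's refined rule: it is a plain union-bound threshold, chosen because its guarantee is easy to check. After $n$ draws of each arm the difference of empirical means has variance $2\sigma^2/n$, so stopping when it exceeds $\sqrt{4\sigma^2 \log(n(n+1)/\delta)/n}$ gives a wrong answer at round $n$ with probability at most $\delta/(n(n+1))$, and these terms sum to $\delta$. The names `threshold` and `identify_best_of_two` and the `max_rounds` cap are assumptions made for this illustration.

```python
import math
import random


def threshold(n, sigma, delta):
    """Union-bound stopping threshold after n draws of each arm (assumed form)."""
    return math.sqrt(4.0 * sigma ** 2 * math.log(n * (n + 1) / delta) / n)


def identify_best_of_two(means, sigma, delta, rng, max_rounds=100_000):
    """Sample both arms uniformly until the empirical gap clears the threshold.

    Returns (recommended_arm, total_pulls); recommended_arm is None if the
    max_rounds cap (a safeguard added for this sketch) is reached first.
    """
    sums = [0.0, 0.0]
    for n in range(1, max_rounds + 1):
        for arm in (0, 1):
            sums[arm] += rng.gauss(means[arm], sigma)
        gap = (sums[0] - sums[1]) / n
        if abs(gap) > threshold(n, sigma, delta):
            return (0 if gap > 0 else 1), 2 * n
    return None, 2 * max_rounds


if __name__ == "__main__":
    rng = random.Random(0)
    arm, pulls = identify_best_of_two((0.5, 0.0), sigma=1.0, delta=0.05, rng=rng)
    print("recommended arm:", arm, "after", pulls, "pulls")
```

Uniform sampling is used here because the two arms share the same variance; the improved rules described in the paper aim to stop earlier than a crude union bound like this one while keeping the same error guarantee.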


This review was generated by AI and reviewed by human editors.