QUICK REVIEW

[Paper Digest] Sample Complexity of Incentivized Exploration.

Mark Sellke, Aleksandrs Slivkins|arXiv (Cornell University)|Feb 3, 2020

Advanced Bandit Algorithms Research | Cited by 3

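One-Sentence Summary

This paper studies incentivized exploration in multi-armed bandits, where self-interested agents choose the actions and the algorithm can only issue recommendations. It shows that Thompson sampling becomes incentive-compatible once it is initialized with enough data points, and it gives upper and lower bounds on the resulting sample complexity that are typically polynomial in the number of arms K, making progress on the dependence on K and on the Bayesian prior.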

ABSTRACT

We consider incentivized exploration: a version of multi-armed bandits where the choice of actions is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work matches the optimal regret rates for bandits up to constant multiplicative factors determined by the Bayesian prior. However, the dependence on the prior in prior work could be arbitrarily large, and the dependence on the number of arms K could be exponential. The optimal dependence on the prior and K is very unclear. We make progress on these issues. Our first result is that Thompson sampling is incentive-compatible if initialized with enough data points. Thus, we reduce the problem of designing incentive-compatible algorithms to that of sample complexity: (i) How many data points are needed to incentivize Thompson sampling? (ii) How many rounds does it take to collect these samples? We address both questions, providing upper bounds on sample complexity that are typically polynomial in K and lower bounds that are polynomially matching.

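Motivation and Objectives

Clarify the dependence on the number of arms K and on the Bayesian prior, which prior work on incentivized exploration left open: the dependence on K could be exponential, and the dependence on the prior could be arbitrarily large.
Determine how many data points are needed to make Thompson sampling incentive-compatible.
Analyze how many rounds it takes to collect these initial data points.
Provide upper and lower bounds on the sample complexity of incentivized exploration that are polynomial in K and match up to polynomial factors.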

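Proposed Approach

Reduce the design of incentive-compatible algorithms to a sample-complexity problem by proving that Thompson sampling is incentive-compatible when initialized with enough data points.
Analyze how large the initial data set must be for Thompson sampling to remain incentive-compatible.
Derive upper bounds on the number of data points needed for incentive compatibility, showing a typically polynomial dependence on K.
Establish lower bounds on the sample complexity that match the upper bounds up to polynomial factors.
Use information-theoretic and game-theoretic analysis to characterize the minimum amount of data needed to induce exploration through information asymmetry.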
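A minimal sketch of what "Thompson sampling initialized with data" can look like, for illustration only. It assumes Bernoulli rewards and independent uniform Beta(1, 1) priors, which are simplifications chosen here and not the paper's general setting. The names `init_posterior`, `thompson_recommend`, `update`, `run`, and the per-arm warm-start size `n0` are hypothetical. The sketch does not check incentive compatibility and does not compute the number of initial samples the paper's bounds would require.

```python
import random


def init_posterior(initial_rewards):
    """Build Beta posteriors from warm-start data.

    initial_rewards[a] is a list of 0/1 rewards already observed for arm a.
    Assumes a uniform Beta(1, 1) prior per arm (an illustrative choice).
    """
    posterior = []
    for rewards in initial_rewards:
        successes = sum(rewards)
        failures = len(rewards) - successes
        posterior.append([1 + successes, 1 + failures])
    return posterior


def thompson_recommend(posterior, rng):
    """Draw one mean per arm from its posterior and recommend the argmax."""
    samples = [rng.betavariate(a, b) for a, b in posterior]
    return max(range(len(samples)), key=samples.__getitem__)


def update(posterior, arm, reward):
    """Fold one observed 0/1 reward into the recommended arm's posterior."""
    posterior[arm][0] += reward
    posterior[arm][1] += 1 - reward


def run(true_means, n0, horizon, seed=0):
    """Warm-start with n0 samples per arm, then run Thompson sampling.

    The warm-start samples are simulated here; how they are collected from
    self-interested agents is a separate question this sketch leaves out.
    """
    rng = random.Random(seed)
    initial = [[int(rng.random() < mu) for _ in range(n0)] for mu in true_means]
    posterior = init_posterior(initial)
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        arm = thompson_recommend(posterior, rng)
        # Agents are assumed to follow the recommendation.
        reward = int(rng.random() < true_means[arm])
        update(posterior, arm, reward)
        pulls[arm] += 1
    return posterior, pulls
```

For example, `run([0.3, 0.5, 0.7], n0=20, horizon=500)` returns the final posterior parameters and the number of times each arm was recommended after the warm start.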

Results

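Research Questions

RQ1: In multi-armed bandits, how many initial data points are needed to make Thompson sampling incentive-compatible?
RQ2: How does the required sample complexity scale with the number of arms K?
RQ3: How does the sample complexity depend on the Bayesian prior, and can this dependence be bounded?
RQ4: How many rounds does it take to collect the necessary initial data points?
RQ5: Are the upper and lower bounds on the sample complexity polynomial, and do they match?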
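The incentive-compatibility property these questions refer to is usually formalized in the incentivized-exploration literature as a Bayesian condition on recommendations. The statement below is a sketch in assumed notation (it is not quoted from the abstract): $\mu_a$ is the unknown mean reward of arm $a$, $A_t$ is the arm recommended in round $t$, and expectations are taken over the prior and the algorithm's randomness.

```latex
% Bayesian incentive compatibility (notation is an assumption, not from the abstract):
% following the recommendation is a best response for the agent in every round t.
\mathbb{E}\left[\, \mu_a - \mu_b \;\middle|\; A_t = a \,\right] \;\ge\; 0
\qquad \text{for all arms } a, b \text{ with } \Pr[A_t = a] > 0 .
```

Read this way, RQ1 asks how many initial samples are needed before the recommendations of Thompson sampling satisfy this inequality in every subsequent round.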

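Key Findings

Thompson sampling is incentive-compatible when initialized with enough data points.
The number of initial data points needed for incentive compatibility is typically polynomial in the number of arms K.
The upper bounds on sample complexity are typically polynomial in K, and the lower bounds match them up to polynomial factors.
The dependence on the Bayesian prior is bounded explicitly, which makes progress on a key limitation of prior work, where this dependence could be arbitrarily large.
The number of rounds needed to collect the necessary data is also bounded, typically polynomially in K.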

This digest was generated by AI and reviewed by a human editor.