QUICK REVIEW

[Paper Digest] Thompson sampling with the online bootstrap

Dean Eckles, Maurits Kaptein|arXiv (Cornell University)|Oct 15, 2014

Advanced Bandit Algorithms Research|References: 27|Cited by: 28

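One-Sentence Summary

This paper proposes Bootstrap Thompson Sampling (BTS), a computationally efficient alternative to Thompson sampling that replaces the posterior distribution with a bootstrap distribution obtained through online reweighting (for example, the double-or-nothing bootstrap). BTS performs competitively in Bernoulli and Gaussian bandit problems, scales better, and is more robust to model misspecification, particularly under heteroscedastic errors.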

ABSTRACT

Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstrap Thompson sampling (BTS), a heuristic method for solving bandit problems which modifies Thompson sampling by replacing the posterior distribution used in Thompson sampling by a bootstrap distribution. We first explain BTS and show that the performance of BTS is competitive to Thompson sampling in the well-studied Bernoulli bandit case. Subsequently, we detail why BTS using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution. BTS is an appealing modification of Thompson sampling, especially when samples from the posterior are otherwise not available or are costly.
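
For reference, the baseline that the abstract describes can be sketched for the Bernoulli case as follows. This is a minimal illustration of standard Beta-Bernoulli Thompson sampling, not code from the paper; the function name `thompson_step`, the uniform Beta(1, 1) prior, and the example counts are assumptions chosen for the sketch.

```python
import numpy as np


def thompson_step(successes, failures, rng):
    """Pick an arm by drawing once from each arm's Beta posterior."""
    # Beta(1 + successes, 1 + failures) is the posterior under an assumed
    # uniform Beta(1, 1) prior on each arm's success probability.
    draws = rng.beta(1 + successes, 1 + failures)
    # Playing the arm with the largest draw selects each arm with the
    # posterior probability that it is the best one.
    return int(np.argmax(draws))


rng = np.random.default_rng(0)
# Hypothetical history: arm 0 mostly failed, arm 1 mostly succeeded.
successes = np.array([2, 30])
failures = np.array([30, 2])

picks = [thompson_step(successes, failures, rng) for _ in range(1000)]
share_arm1 = np.mean(picks)  # fraction of rounds that chose arm 1
```

With counts this lopsided, nearly every draw favors arm 1; with sparser data the draws overlap more and both arms keep being explored.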

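Motivation and Objectives

Address the problem that Thompson sampling becomes computationally infeasible in large-scale bandit problems because posterior sampling via MCMC is too expensive.
Improve the robustness of Thompson sampling to model misspecification, particularly under non-i.i.d. or heteroscedastic error structures.
Develop a scalable online alternative that avoids full posterior computation by using bootstrap resampling.
Enable parallelization and real-time updates in streaming or high-throughput settings where reprocessing the full dataset is impractical.
Show that BTS remains competitive while reducing reliance on parametric assumptions and complex posterior sampling.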

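Proposed Method

In Thompson sampling, replace the Bayesian posterior P(θ|D) with a bootstrap distribution of the point estimate θ̂, obtained by reweighting observations rather than resampling them.
Use the double-or-nothing bootstrap (DoNB), in which each observation is assigned a weight of 0 or 2 with equal probability (or, in a rescaled variant, 0 or 1), which supports online updates.
For each bootstrap replicate j, compute a weighted estimate θ̂j from the current data and weights, then sample from the empirical distribution of these θ̂j to select an action.
Make the procedure online: when a new observation arrives, update each bootstrap replicate with probability 1/2, avoiding a full recomputation at every step.
Use the bootstrap distribution to determine the probability that each action is optimal, analogous to the exploration-exploitation trade-off in Thompson sampling.
Parallelize by distributing bootstrap replicates across multiple machines or cores, supporting high-throughput deployment in real systems.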
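
The steps above can be sketched for a Bernoulli bandit as follows. This is an illustrative reading of the method, not the authors' implementation: the class name `BernoulliBTS`, the helper `run`, the starting pseudo-count of 1 per arm, the random tie-breaking, and the default of 100 replicates are all assumptions made for the example.

```python
import numpy as np


class BernoulliBTS:
    """Bootstrap Thompson sampling sketch for a K-armed Bernoulli bandit."""

    def __init__(self, n_arms, n_replicates=100, seed=0):
        self.rng = np.random.default_rng(seed)
        # Each replicate keeps success/failure counts per arm, started at 1
        # (an assumed pseudo-count that avoids division by zero).
        self.alpha = np.ones((n_replicates, n_arms))
        self.beta = np.ones((n_replicates, n_arms))

    def select_arm(self):
        # Drawing one replicate uniformly stands in for a posterior draw.
        j = self.rng.integers(self.alpha.shape[0])
        est = self.alpha[j] / (self.alpha[j] + self.beta[j])
        best = np.flatnonzero(est == est.max())
        return int(self.rng.choice(best))  # break ties at random (assumed)

    def update(self, arm, reward):
        # Online bootstrap step: each replicate independently includes the
        # new observation with probability 1/2.
        mask = self.rng.random(self.alpha.shape[0]) < 0.5
        self.alpha[mask, arm] += reward
        self.beta[mask, arm] += 1 - reward


def run(true_means, n_rounds=2000, n_replicates=100, seed=0):
    """Simulate BTS on a synthetic bandit and return pulls per arm."""
    env_rng = np.random.default_rng(seed + 1)
    agent = BernoulliBTS(len(true_means), n_replicates, seed)
    pulls = np.zeros(len(true_means), dtype=int)
    for _ in range(n_rounds):
        arm = agent.select_arm()
        reward = int(env_rng.random() < true_means[arm])
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling `run([0.2, 0.8])` returns an array with the number of times each arm was played. Because `update` touches each replicate's row independently, the rows could in principle be split across workers, which is the property the parallelization point relies on.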

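Experimental Results

Research Questions

RQ1: Does bootstrap-based sampling approximate the posterior distribution in Thompson sampling closely enough to remain competitive in bandit problems?
RQ2: Under model misspecification, particularly with heteroscedastic errors, how does BTS compare with Thompson sampling in cumulative regret?
RQ3: How well does BTS scale to large datasets, and can it be updated efficiently in an online, streaming fashion?
RQ4: How much does the number of bootstrap replicates J affect the exploration-exploitation balance and overall performance of BTS?
RQ5: Can BTS be deployed in a distributed or parallel fashion without sacrificing performance or consistency?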

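Key Findings

In the correctly specified Bernoulli bandit setting, BTS achieves cumulative regret comparable to Thompson sampling, particularly when the number of bootstrap replicates is large enough (J=1000).
Under heteroscedastic error distributions, BTS clearly outperforms Thompson sampling, and the regret gap widens as the degree of heteroscedasticity (γ) increases.
BTS performance is sensitive to the number of bootstrap replicates: too few replicates lead to over-exploitation and higher regret.
BTS is computationally scalable because each bootstrap replicate can be updated independently and online, avoiding recomputation of the full posterior at every step.
The method is easy to parallelize and suited to large-scale real-time applications such as online advertising or A/B testing platforms.
BTS is robust to model misspecification, especially when the assumed likelihood (for example, Gaussian) does not match the true data-generating process.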
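
One intuition for the robustness finding is that the spread of the bootstrap replicates is driven by the observed data rather than by an assumed noise model. The sketch below illustrates this for a single arm's mean reward. It is not the paper's experiment: the function name `donb_replicate_means`, the synthetic heteroscedastic rewards, and the choice of 2000 replicates are assumptions made for the example.

```python
import numpy as np


def donb_replicate_means(stream, n_replicates=2000, seed=0):
    """Online double-or-nothing bootstrap of a running mean."""
    rng = np.random.default_rng(seed)
    weighted_sum = np.zeros(n_replicates)
    weight_total = np.zeros(n_replicates)
    for x in stream:
        # Each replicate weights the new observation by 0 or 2, equally likely.
        w = 2.0 * rng.integers(0, 2, size=n_replicates)
        weighted_sum += w * x
        weight_total += w
    # Replicates that never received any weight (vanishingly rare for long
    # streams) are dropped to avoid dividing by zero.
    seen = weight_total > 0
    return weighted_sum[seen] / weight_total[seen]


data_rng = np.random.default_rng(42)
# Hypothetical heteroscedastic rewards: the noise scale differs per observation.
scale = data_rng.uniform(0.5, 2.0, size=500)
rewards = 1.0 + scale * data_rng.standard_normal(500)

means = donb_replicate_means(rewards)
bootstrap_se = means.std(ddof=1)
classical_se = rewards.std(ddof=1) / np.sqrt(rewards.size)
print(f"bootstrap SE: {bootstrap_se:.4f}, classical SE: {classical_se:.4f}")
```

The two standard errors should come out close to each other even though no noise variance was supplied to the bootstrap. A posterior built on a fixed, wrongly assumed variance has no comparable way to adapt.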


This digest was generated by AI and reviewed by a human editor.