QUICK REVIEW

[Paper Review] Ensemble Sampling

Xiuyuan Lu, Benjamin Van Roy|arXiv (Cornell University)|May 20, 2017

Anomaly Detection Techniques and Applications | Cited by 25

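One-Sentence Summary

This paper proposes ensemble sampling, a tractable approximation to Thompson sampling that makes it applicable to complex models such as neural networks. By using an ensemble of models to approximate the posterior distribution, the method scales efficiently to high-dimensional, nonlinear models while aiming to retain the benefits of Thompson sampling.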

ABSTRACT

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands on the range of applications for which Thompson sampling is viable. We establish a theoretical basis that supports the approach and present computational results that offer further insight.

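Research Motivation and Objectives

Address the computational intractability of exact Thompson sampling in complex models such as neural networks.
Develop a scalable approximation that retains the theoretical advantages of Thompson sampling.
Make Thompson sampling applicable to practical online decision problems involving high-dimensional, nonlinear models.
Establish a theoretical basis for using ensemble sampling as a valid approximation to Thompson sampling.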

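Proposed Method

Use an ensemble of models to approximate the posterior distribution over model parameters.
Sample from the empirical distribution of the ensemble to emulate Thompson sampling.
Use the ensemble to estimate uncertainty and guide exploration in online decision tasks.
Apply the method to sequential decision problems such as contextual bandits and reinforcement learning.
Theoretical analysis indicates that, under mild regularity conditions, the ensemble approximation converges to the true posterior distribution.
Achieve computational efficiency by avoiding full Bayesian inference over complex models.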
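
The steps above can be made concrete with a small sketch. It is not the authors' code: it assumes a linear bandit with Gaussian reward noise, a zero-mean Gaussian prior, and ensemble members that are refit by regularised least squares on independently perturbed copies of each observed reward. The function name `run_ensemble_sampling` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def run_ensemble_sampling(actions, theta_true, n_models=10, n_steps=200,
                          noise_std=1.0, prior_std=1.0, seed=0):
    """Toy ensemble sampling on a linear bandit (illustrative assumptions only).

    actions: (K, d) array of feature vectors, one per action.
    theta_true: (d,) hidden parameter, used only to simulate rewards.
    Returns the (n_models, d) ensemble and the list of chosen action indices.
    """
    rng = np.random.default_rng(seed)
    n_actions, dim = actions.shape

    # Each ensemble member starts from its own draw from the
    # N(0, prior_std^2 I) prior (an assumed prior).
    prior_draws = rng.normal(0.0, prior_std, size=(n_models, dim))

    # Shared precision matrix and per-model right-hand sides of the
    # regularised least-squares normal equations.
    precision = np.eye(dim) / prior_std**2
    rhs = prior_draws / prior_std**2  # shape (n_models, dim)
    ensemble = prior_draws.copy()

    chosen = []
    for _ in range(n_steps):
        # Sample one model uniformly and act greedily with respect to it.
        m = rng.integers(n_models)
        a_idx = int(np.argmax(actions @ ensemble[m]))
        x = actions[a_idx]
        reward = x @ theta_true + rng.normal(0.0, noise_std)

        # Every model sees the same observation, but each one gets its own
        # independent perturbation of the reward.
        perturbed = reward + rng.normal(0.0, noise_std, size=n_models)
        precision += np.outer(x, x) / noise_std**2
        rhs += perturbed[:, None] * x[None, :] / noise_std**2
        ensemble = np.linalg.solve(precision, rhs.T).T
        chosen.append(a_idx)

    return ensemble, chosen


# Example: three one-hot actions, so each coordinate is one arm's mean reward.
actions = np.eye(3)
theta_true = np.array([0.1, 0.5, 0.9])
ensemble, chosen = run_ensemble_sampling(actions, theta_true)
```

The spread of the rows of `ensemble` plays the role of posterior uncertainty: where the members disagree about which action is best, different uniform draws lead to different actions, which is what drives exploration. For a neural network, the closed-form solve would be replaced by incremental training of each member, which this sketch does not attempt.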

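Experimental Results

Research Questions

RQ1: Can ensemble sampling provide a tractable alternative to exact Thompson sampling for complex models such as neural networks?
RQ2: In practice, how closely does ensemble sampling approximate the performance of exact Thompson sampling?
RQ3: What theoretical guarantees can be established for the ensemble approximation?
RQ4: How well does ensemble sampling scale in high-dimensional and nonlinear model spaces?
RQ5: How does ensemble sampling perform in practice on online decision tasks?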

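Key Findings

Ensemble sampling allows Thompson sampling to be applied effectively to complex models, such as neural networks, for which exact inference is intractable.
On benchmark online decision problems, the method achieves performance close to that of exact Thompson sampling.
Theoretical analysis supports the validity of the ensemble approximation under standard regularity conditions.
Computational results indicate good scalability and practical value in high-dimensional settings.
Ensemble sampling maintains a strong exploration-exploitation balance, which is essential for online learning.
The method is robust across a variety of contextual bandit and reinforcement learning tasks.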

This review was generated by AI and checked by a human editor.