QUICK REVIEW

[Paper Review] Thompson Sampling and Approximate Inference

My V. T. Phan, Yasin Abbasi Yadkori | arXiv (Cornell University) | Jan 1, 2019

Advanced Bandit Algorithms Research | Cited by 15

ONE-SENTENCE SUMMARY

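This paper studies how approximate inference affects the performance of Thompson sampling in the $k$-armed bandit problem. It shows that even a small constant inference error, measured in $\alpha$-divergence, can cause linear regret through under-exploration (for $\alpha < 1$) or over-exploration (for $\alpha > 0$). For $\alpha > 0$ this degradation is unavoidable, whereas for $\alpha \leq 0$ adding a small amount of forced exploration mitigates it even when the inference error is large.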

ABSTRACT

We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $\alpha$-divergence) can lead to poor performance (linear regret) due to under-exploration (for $\alpha < 1$) or over-exploration (for $\alpha > 0$) by the approximation. While for $\alpha > 0$ this is unavoidable, for $\alpha \leq 0$ the regret can be improved by adding a small amount of forced exploration even when the inference error is a large constant.
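
Thompson sampling is simple to state when the posterior is exact. Below is a minimal sketch for Bernoulli rewards with conjugate Beta(1, 1) priors; the arm means, horizon, seed, and function name are illustrative assumptions, not taken from the paper. Each round draws one sample per arm from its posterior and plays the arm with the largest sample. This sampling step is exactly what an approximate posterior would replace.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with exact conjugate posteriors.

    Returns the cumulative pseudo-regret and the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1.0] * k  # 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    pulls = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # One posterior sample per arm, then act greedily on the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update: the exact posterior remains a Beta distribution.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        regret += best - true_means[arm]
    return regret, pulls


regret, pulls = thompson_sampling([0.3, 0.5, 0.7], horizon=5000)
print(f"regret={regret:.1f}, pulls={pulls}")
```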

MOTIVATION AND OBJECTIVES

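Analyze how approximate posterior inference affects the performance of Thompson sampling in the $k$-armed bandit problem.
Identify the conditions under which inference error degrades performance, in particular to linear regret.
Examine whether forced exploration can improve regret when the posterior approximation is inaccurate.
Clarify the role of $\alpha$-divergence in quantifying inference error and its effect on exploration.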

PROPOSED METHOD

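Models Thompson sampling with an approximate posterior, measuring the inference error in $\alpha$-divergence.
Analyzes regret behavior across values of $\alpha$, distinguishing the cases $\alpha > 0$ and $\alpha \leq 0$.
For $\alpha \leq 0$, adds a small amount of forced exploration to offset the under-exploration caused by the approximation error.
Derives, through theoretical analysis, conditions under which regret stays sublinear despite a constant inference error.
Uses information-theoretic tools to relate $\alpha$-divergence to exploration efficiency.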
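
The summary does not restate the paper's definition of $\alpha$-divergence. As a sketch, the function below uses one common (Amari) parameterization for discrete distributions, assumed here: $D_\alpha(p \| q) = \frac{1 - \sum_x p(x)^\alpha q(x)^{1-\alpha}}{\alpha(1-\alpha)}$, with $p$ the exact posterior and $q$ the approximation. Under this convention $\alpha \to 1$ recovers $\mathrm{KL}(p \| q)$ and $\alpha \to 0$ recovers $\mathrm{KL}(q \| p)$; the paper's exact convention may differ, and the distributions `p` and `q` are made-up examples.

```python
import math


def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence D_alpha(p || q) for discrete distributions.

    Assumed convention (may differ from the paper's):
        D_alpha(p || q) = (1 - sum_x p(x)**alpha * q(x)**(1 - alpha)) / (alpha * (1 - alpha))
    p and q must be strictly positive and each sum to 1.
    """
    if alpha == 1:  # limit alpha -> 1: forward KL(p || q)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if alpha == 0:  # limit alpha -> 0: reverse KL(q || p)
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return (1 - s) / (alpha * (1 - alpha))


# p: a stand-in "exact" posterior; q: a more concentrated approximation of it.
p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]
for a in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    print(f"alpha={a:+.1f}: D = {alpha_divergence(p, q, a):.4f}")
```

Both limits are handled explicitly because the general formula divides by zero at $\alpha = 0$ and $\alpha = 1$; for every other $\alpha$ the numerator and denominator share a sign, so the result is non-negative and equals zero only when $p = q$.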

RESULTS

RESEARCH QUESTIONS

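RQ1: How does a constant error in $\alpha$-divergence affect the regret of Thompson sampling in the $k$-armed bandit problem?
RQ2: Why does approximate inference cause Thompson sampling to under-explore (for $\alpha < 1$) or over-explore (for $\alpha > 0$)?
RQ3: When the inference error is large and $\alpha \leq 0$, can forced exploration restore sublinear regret?
RQ4: What role does $\alpha$-divergence play in determining how robust Thompson sampling is to approximate inference?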

KEY FINDINGS

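For $\alpha > 0$, even a small constant inference error in $\alpha$-divergence can cause linear regret, and this degradation is unavoidable.
For $\alpha \leq 0$, adding a small amount of forced exploration keeps regret sublinear even under a large constant inference error.
For $\alpha > 0$, the degradation stems from the direction of the $\alpha$-divergence error, which allows the approximate posterior to over-explore.
For $\alpha \leq 0$, the structure of the $\alpha$-divergence limits the damage to under-exploration, which forced exploration can counteract.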
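
The mechanism behind these findings can be illustrated with a toy simulation. The sketch below is not the paper's construction: it imitates an under-dispersed approximation by sampling from Beta(c·a, c·b), which keeps each exact posterior mean but shrinks the variance by roughly a factor of c, and it adds forced exploration with a hypothetical schedule of probability min(1, 1/√t) at round t. The parameters `sharpen` and `forced`, the schedule, and the arm means are all assumptions chosen for illustration.

```python
import math
import random


def approx_thompson_sampling(true_means, horizon, sharpen=1.0, forced=False, seed=0):
    """Thompson sampling from a deliberately over-concentrated posterior.

    sharpen: hypothetical factor c >= 1; samples come from Beta(c*a, c*b), which
        keeps the exact posterior mean but shrinks its variance (roughly by 1/c).
    forced: if True, play a uniformly random arm with probability
        min(1, 1/sqrt(t)) at round t (a hypothetical forced-exploration schedule).
    """
    rng = random.Random(seed)
    k = len(true_means)
    a = [1.0] * k
    b = [1.0] * k
    pulls = [0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if forced and rng.random() < min(1.0, 1.0 / math.sqrt(t)):
            arm = rng.randrange(k)  # forced exploration step
        else:
            samples = [rng.betavariate(sharpen * a[i], sharpen * b[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # The exact conjugate posterior is still tracked; only the sampling is distorted.
        a[arm] += reward
        b[arm] += 1 - reward
        pulls[arm] += 1
        regret += best - true_means[arm]
    return regret, pulls


means = [0.3, 0.5, 0.7]
for label, kwargs in [
    ("exact TS", {}),
    ("sharpened TS", {"sharpen": 1000.0}),
    ("sharpened TS + forced exploration", {"sharpen": 1000.0, "forced": True}),
]:
    regrets = [approx_thompson_sampling(means, 5000, seed=s, **kwargs)[0] for s in range(10)]
    print(f"{label}: mean regret over 10 runs = {sum(regrets) / len(regrets):.1f}")
```

Outcomes vary with the seed, so the example averages over several runs instead of reporting a single number. Since the sum of 1/√t over T rounds is about 2√T, this schedule adds only O(√T) regret from forced rounds. Extra exploration can only compensate for under-exploration, however; it cannot correct an approximation that already over-explores, which mirrors the summary's distinction between $\alpha \leq 0$ and $\alpha > 0$.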

This review was generated by AI and checked by human editors.