QUICK REVIEW

[Paper Digest] Analysis of Thompson Sampling for Gaussian Process Optimization in the Bandit Setting

Kinjal Basu, Souvik Ghosh|arXiv (Cornell University)|May 18, 2017

Advanced Bandit Algorithms Research | Cited by 2

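One-Sentence Summary

This paper analyzes Thompson Sampling for Gaussian Process optimization in the continuum-armed bandit setting, where function evaluations are expensive and noisy. Under regularity conditions, it establishes an exponential rate at which the selected points converge to the global optimizer, offering an analysis of convergence speed that does not rely on regret.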

ABSTRACT

We consider the global optimization of a function over a continuous domain. At every evaluation attempt, we can observe the function at a chosen point in the domain and we reap the reward of the value observed. We assume that drawing these observations is expensive and noisy. We frame it as a continuum-armed bandit problem with a Gaussian Process prior on the function. In this regime, most algorithms have been developed to minimize some form of regret. Contrary to this popular norm, in this paper, we study the convergence of the sequential point $\boldsymbol{x}^t$ to the global optimizer $\boldsymbol{x}^*$ for the Thompson Sampling approach. Under some assumptions and regularity conditions, we show an exponential rate of convergence to the true optimal.

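Motivation and Objectives

Study the convergence behavior of Thompson Sampling in Gaussian Process bandit optimization, rather than focusing on regret minimization.
Analyze how quickly the sequence of selected points approaches the global optimizer in a continuous domain.
Establish theoretical convergence rates under regularity and smoothness assumptions on the objective function.
Provide an analysis of optimization performance that is not based on regret and that emphasizes convergence speed.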

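Proposed Method

Formulate the optimization problem as a continuum-armed bandit problem with a Gaussian Process prior on the unknown function.
Use Thompson Sampling to select points in the domain sequentially, based on samples drawn from the Gaussian Process posterior.
Impose regularity conditions, such as Lipschitz continuity and smoothness of the function, to ensure convergence.
Use Bayesian inference to update the Gaussian Process posterior after each noisy observation.
Analyze the probability that the point selected at time t lies within a given distance of the global optimizer.
Derive theoretical bounds on the convergence rate from properties of the Gaussian Process posterior variance and of the sampling mechanism.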
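
The selection loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the continuous domain is replaced by a finite 1-D grid, the kernel is a squared-exponential with a fixed length scale, and the names `rbf_kernel`, `gp_posterior`, and `thompson_sampling`, along with all default parameter values, are hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between 1-D point arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, grid, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP evaluated on the grid."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, grid)
    K_ss = rbf_kernel(grid, grid)
    L = np.linalg.cholesky(K)  # K is positive definite thanks to the noise term
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    return K_s.T @ alpha, K_ss - v.T @ v


def thompson_sampling(f, grid, n_iter=30, noise_var=1e-2, seed=0):
    """Run GP Thompson Sampling on a grid (assumed discretization of the domain)."""
    rng = np.random.default_rng(seed)
    noise_std = np.sqrt(noise_var)
    # Assumption: the first query point is picked uniformly at random.
    x_obs = np.array([rng.choice(grid)])
    y_obs = np.array([f(x_obs[0]) + rng.normal(0.0, noise_std)])
    for _ in range(n_iter - 1):
        # Bayesian update: posterior given all noisy observations so far.
        mean, cov = gp_posterior(x_obs, y_obs, grid, noise_var)
        # Draw one posterior sample path; eigenvalues are clipped at zero
        # to absorb round-off in the covariance matrix.
        w, V = np.linalg.eigh(cov)
        z = rng.standard_normal(len(grid))
        sample = mean + V @ (np.sqrt(np.clip(w, 0.0, None)) * z)
        # Thompson step: query the maximizer of the sampled function.
        x_next = grid[np.argmax(sample)]
        y_next = f(x_next) + rng.normal(0.0, noise_std)
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    return x_obs, y_obs
```

For instance, `thompson_sampling(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 101))` returns the queried points and their noisy rewards, so the distance of each queried point from the maximizer 0.3 can be inspected across iterations. The grid is only a convenience for drawing posterior samples; the paper itself works on a continuous domain.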

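Results

Research Questions

RQ1: In a continuous setting with noisy, expensive function evaluations, how fast does Thompson Sampling converge to the global optimizer?
RQ2: Can convergence be established without relying on regret minimization as the primary performance measure?
RQ3: Which regularity conditions on the function and the kernel are needed to guarantee exponential convergence?
RQ4: How does the posterior variance of the Gaussian Process model affect the convergence rate of the selected points?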

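Key Findings

Under mild regularity conditions, Thompson Sampling attains an exponential rate of convergence to the global optimizer.
The convergence rate is established independently of any regret-minimization objective, offering a new theoretical perspective on optimization performance.
The probability that the selected point lies farther than a fixed distance from the global optimizer decays exponentially with the number of iterations.
The analysis relies on the decay of the Gaussian Process posterior variance and on the ability of the sampling mechanism to explore efficiently.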
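
An exponential rate of convergence of $\boldsymbol{x}^t$ to $\boldsymbol{x}^*$ is commonly expressed as a tail bound of the schematic form below. The tolerance $\epsilon$ and the constants $C$ and $c$ are placeholder symbols introduced here for illustration; the paper's exact statement, its choice of norm, and its constants are not reproduced in this digest.

```latex
\[
\Pr\left( \lVert \boldsymbol{x}^t - \boldsymbol{x}^* \rVert > \epsilon \right)
\;\le\; C \, e^{-c\,t},
\qquad C > 0,\; c > 0 .
\]
```

Read this way, the chance that the point chosen at iteration $t$ falls outside an $\epsilon$-neighborhood of the optimizer shrinks geometrically in $t$.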

This digest was generated by AI and reviewed by a human editor.