QUICK REVIEW

[Paper Review] Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient

Tianbao Yang, Lijun Zhang|arXiv (Cornell University)|May 15, 2016

Advanced Bandit Algorithms Research|References: 12|Cited by: 71

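One-sentence summary

The paper introduces path variation, a measure of how slowly the optimal solutions change over time, and uses it to establish optimal dynamic regret bounds for online convex optimization under both true and noisy gradient feedback. It proposes algorithms that attain the minimax-optimal regret bounds, including for smooth loss functions under two-point bandit feedback, and shows that under favorable conditions bandit feedback can match the performance of full-information feedback.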

ABSTRACT

This work focuses on dynamic regret of online convex optimization that compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds of the dynamic regret under the true and noisy gradient feedback, which are optimal in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, to which we refer as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches what is achieved with full information.
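
For orientation, the two quantities the abstract describes in words can be written out as below. This is a sketch of the usual definitions, not a quotation from the paper. It assumes a convex domain $\Omega$, loss functions $f_t$, learner iterates $\mathbf{w}_t$, and minimizers $\mathbf{w}_t^{*} \in \arg\min_{\mathbf{w} \in \Omega} f_t(\mathbf{w})$. The symbol $R_T^{\mathrm{dyn}}$ is notation introduced here for illustration only. If the minimizers are not unique, a worst-case choice among them would presumably be needed.

```latex
\begin{aligned}
R_T^{\mathrm{dyn}} &= \sum_{t=1}^{T} f_t(\mathbf{w}_t) - \sum_{t=1}^{T} f_t(\mathbf{w}_t^{*}), \\
V_T^{p} &= \sum_{t=1}^{T-1} \lVert \mathbf{w}_{t+1}^{*} - \mathbf{w}_{t}^{*} \rVert_2 .
\end{aligned}
```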

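Research motivation and objectives

Develop tighter dynamic regret bounds for online learning problems in which the optimal decision changes slowly over time.
Analyze how the quality of gradient feedback (true, noisy, or bandit) affects dynamic regret.
Establish minimax-optimal regret bounds by introducing path variation as the key regularity measure.
Close the gap between existing upper bounds and theoretical lower bounds across feedback models.
Show that two-point bandit feedback can match full-information feedback for smooth loss functions.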

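Proposed methods

Introduces the path variation $V^{p}_{T}$ as a measure of the temporal change in the sequence of optimal solutions $\mathbf{w}_{t}^{*}$.
Derives a general lower bound on dynamic regret that depends only on $V^{p}_{T}$, establishing the theoretical limit.
Proposes a modified online gradient descent (OGD) algorithm with an adaptive step size for true gradient feedback, achieving $O(V^{p}_{T})$ regret for smooth functions.
Designs a two-point bandit feedback algorithm based on the META algorithm (Chiang et al., 2013), which estimates gradients from directional perturbations.
In the noisy feedback setting, uses stochastic gradient estimators with bounded norm $\|\hat{\mathbf{g}}_{t}\|_{2} \leq Gd$ to control variance.
Projects onto a shrunk feasible set via $\Pi_{(1-\xi)\Omega}$ to maintain stability and improve convergence.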
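
To make the path variation and the gradient-descent idea concrete, here is a minimal sketch rather than the paper's algorithm. It assumes quadratic losses $f_t(\mathbf{w}) = \frac{1}{2}\|\mathbf{w} - \mathbf{w}_t^{*}\|_2^2$, a unit-ball domain, and a constant step size in place of the adaptive one mentioned above. The function names (`project_ball`, `path_variation`, `ogd_dynamic_regret`) are hypothetical.

```python
import numpy as np

def project_ball(w, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius (assumed domain)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def path_variation(minimizers):
    """Sum of Euclidean distances between consecutive minimizers w_t^*."""
    diffs = np.diff(minimizers, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())

def ogd_dynamic_regret(minimizers, step=0.5, radius=1.0):
    """Run projected OGD on f_t(w) = 0.5 * ||w - w_t^*||^2 and return the dynamic regret."""
    w = np.zeros(minimizers.shape[1])
    regret = 0.0
    for w_star in minimizers:
        # f_t(w_t^*) = 0 for this loss, so the per-round regret is just f_t(w_t).
        regret += 0.5 * float(np.sum((w - w_star) ** 2))
        grad = w - w_star                      # true gradient of f_t at w_t
        w = project_ball(w - step * grad, radius)
    return regret

# Assumed toy data: a minimizer drifting slowly along a segment inside the unit ball.
T = 200
minimizers = np.stack([np.linspace(0.0, 0.5, T), np.zeros(T)], axis=1)
v_p = path_variation(minimizers)
regret = ogd_dynamic_regret(minimizers)
print(f"path variation = {v_p:.4f}, dynamic regret = {regret:.6f}")
```

In this toy run the minimizers stay strictly inside the domain, so the gradient vanishes at each minimizer, and the accumulated regret stays below the path variation.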

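Results

Research questions

RQ1: What is the optimal dynamic regret bound in terms of the path variation $V^{p}_{T}$ under true gradient feedback?
RQ2: How does noisy gradient feedback affect dynamic regret, and can the regret still be made optimal?
RQ3: Can two-point bandit feedback achieve regret comparable to full-information feedback?
RQ4: What is the dynamic regret bound for smooth loss functions with small gradient variation?
RQ5: Are the proposed upper bounds tight with respect to the derived lower bounds?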

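Main findings

The paper establishes a general lower bound on dynamic regret that depends only on the path variation $V^{p}_{T}$, showing that $O(V^{p}_{T})$ is the best achievable bound without additional assumptions.
For smooth loss functions whose gradients vanish in the feasible domain, the proposed algorithm achieves $O(V^{p}_{T})$ dynamic regret under true gradient feedback, which matches the lower bound and is therefore optimal.
Under two-point bandit feedback, the dynamic regret is bounded by $O(\max(\sqrt{V^{p}_{T}V^{g}_{T}}, V^{p}_{T}))$, where $V^{g}_{T}$ is the gradient variation. This matches the lower bound when $V^{g}_{T}$ is small, establishing optimality.
For Lipschitz continuous loss functions, the bandit feedback algorithm achieves $O(\sqrt{V^{p}_{T}T})$ regret, the same order as under stochastic gradient feedback.
Under bandit feedback, the dynamic regret bound for smooth functions is $O(\max(d^{2}\sqrt{S_{T}\max(B_{T},1)}, d^{3/2}\max(B_{T},1)))$, which matches the lower bound when $V^{p}_{T}$ dominates.
The results show that for smooth functions, under path-variation regularity, two-point bandit feedback attains regret of the same order as full-information feedback, so bandit feedback is not inherently worse in this setting.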
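
The two-point bandit findings rest on estimating a gradient from two function evaluations. The sketch below shows a standard two-point estimator and a projection onto a shrunk domain; it is an illustration under stated assumptions, not the paper's exact construction. It assumes a unit-ball domain $\Omega$, a quadratic test loss, and hypothetical names (`two_point_gradient`, `project_shrunk_ball`, `delta`, `xi`). With a unit-ball domain and $\xi \geq \delta$, projecting onto $(1-\xi)\Omega$ keeps both query points $\mathbf{w} \pm \delta\mathbf{u}$ inside $\Omega$.

```python
import numpy as np

def two_point_gradient(f, w, delta, rng):
    """Estimate the gradient of f at w from the two evaluations f(w + delta*u) and f(w - delta*u)."""
    d = w.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)   # normalized Gaussian: uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(w + delta * u) - f(w - delta * u)) * u

def project_shrunk_ball(w, xi, radius=1.0):
    """Project onto (1 - xi) * Omega, with Omega assumed to be the L2 ball of the given radius."""
    r = (1.0 - xi) * radius
    norm = np.linalg.norm(w)
    return w if norm <= r else w * (r / norm)

# Assumed test loss: f(w) = 0.5 * ||w - c||^2, whose true gradient is w - c.
c = np.array([0.3, -0.2, 0.1])
f = lambda w: 0.5 * float(np.sum((w - c) ** 2))

rng = np.random.default_rng(0)
w = project_shrunk_ball(np.array([0.0, 0.5, -0.4]), xi=0.05)   # already inside, left unchanged
estimates = np.array([two_point_gradient(f, w, 0.05, rng) for _ in range(20000)])
mean_estimate = estimates.mean(axis=0)
true_grad = w - c
print("mean estimate:", mean_estimate)
print("true gradient:", true_grad)
```

For this quadratic loss each estimate equals $d\,(\mathbf{u}^{\top}\mathbf{g})\,\mathbf{u}$ with $\mathbf{g}$ the true gradient. Its norm is therefore at most $d\,\|\mathbf{g}\|_2$, the same shape as the $Gd$ bound quoted for the estimator, and its average over directions approaches $\mathbf{g}$.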

This review was generated by AI and reviewed by human editors.