QUICK REVIEW

[Paper Review] The Many Faces of Exponential Weights in Online Learning

Dirk van der Hoeven, Tim van Erven|arXiv (Cornell University)|Feb 21, 2018

Advanced Bandit Algorithms Research | References: 27 | Cited by: 53

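One-Sentence Summary

This paper analyzes Exponential Weights with a Gaussian prior in online learning and proves that lazy and greedy Exponential Weights, run with a learning rate on linearized losses, produce Gaussian posteriors with the same covariance as the prior.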

ABSTRACT

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting Exponential Weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.

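Motivation and Objectives

Understand how Exponential Weights (EW) interacts with a Gaussian prior in online learning.
Characterize the distributions produced by lazy and greedy EW once the losses are linearized.
Establish whether the resulting posteriors preserve the prior covariance.
Compare the EW variants in terms of their posterior means and behavior.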

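Proposed Method

Use a Gaussian prior P_1(w) = N(w_1, σ^2 I).
Apply lazy and greedy Exponential Weights with learning rates η_t to the linearized losses.
Derive the resulting posteriors P_t and show that they are Gaussian with covariance σ^2 I.
Relate the posterior mean to the EW update rule (w_t or ~w_t).
Note that, in the EW variants analyzed, the posterior covariance matches the prior covariance.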
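
The covariance claim in the steps above can be checked by completing the square. The derivation below is a simplified sketch and not the paper's general statement: it assumes a fixed learning rate η (the summary uses η_t), an unconstrained domain, and linearized losses of the form ⟨w, g_s⟩, where g_s and the running sum G_t are notation introduced here for illustration. In this simplified setting the lazy and greedy updates coincide.

```latex
\begin{align*}
P_{t+1}(w)
  &\propto P_1(w)\,\exp\Big(-\eta \sum_{s=1}^{t} \langle w, g_s \rangle\Big) \\
  &\propto \exp\Big(-\frac{1}{2\sigma^2}\,\|w - w_1\|^2 - \eta\,\langle w, G_t \rangle\Big),
     \qquad G_t = \sum_{s=1}^{t} g_s, \\
  &\propto \exp\Big(-\frac{1}{2\sigma^2}\,\big\|w - (w_1 - \eta\sigma^2 G_t)\big\|^2\Big).
\end{align*}
% Hence P_{t+1} = N(w_1 - \eta \sigma^2 G_t, \sigma^2 I):
% only the mean moves, and the covariance stays \sigma^2 I.
```

Under these assumptions the posterior mean w_1 − ησ^2 G_t is the unprojected gradient descent iterate with step size ησ^2, which is consistent with the abstract's remark that Online Gradient Descent is recovered from EW with a Gaussian prior on linearized losses.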

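Results

Research Questions

RQ1: How do lazy and greedy Exponential Weights interact with a Gaussian prior in the online learning setting?
RQ2: What posterior distribution results when the losses are linearized?
RQ3: Under these EW schemes, does the posterior covariance remain equal to the prior covariance?
RQ4: In this framework, how does the EW update rule relate to the posterior mean?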

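Key Findings

Lazy EW and greedy EW on linearized losses both produce Gaussian posteriors P_t.
The resulting Gaussians have the same covariance σ^2 I as the Gaussian prior.
The posterior mean corresponds to either ~w_t or w_t, depending on the variant.
This highlights that certain EW schemes update the mean while leaving the prior covariance unchanged.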
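
The findings above can be checked numerically in one dimension. The sketch below is illustrative only and is not code from the paper: it assumes a fixed learning rate `eta`, an unconstrained domain, and linearized losses `w * g`, and the function names, the gradient values, and the grid are all hypothetical choices. It runs the EW update on a grid and compares the posterior mean and variance with an unprojected gradient descent iterate using step size `eta * sigma**2`.

```python
import numpy as np


def ew_gaussian_posterior_1d(grads, w1=0.0, sigma=1.0, eta=0.5, grid=None):
    """Run EW on a 1-D grid with a Gaussian prior N(w1, sigma^2).

    Assumes linearized losses w * g and a fixed learning rate eta
    (simplifying assumptions made for this sketch).
    Returns the posterior mean and variance computed on the grid.
    """
    if grid is None:
        grid = np.linspace(-20.0, 20.0, 40001)
    # Unnormalized log-density of the Gaussian prior.
    log_p = -0.5 * ((grid - w1) / sigma) ** 2
    for g in grads:
        # EW update: multiply the density by exp(-eta * loss(w)).
        log_p = log_p - eta * g * grid
    # Normalize in a numerically stable way.
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    mean = float(np.sum(p * grid))
    var = float(np.sum(p * (grid - mean) ** 2))
    return mean, var


def gradient_descent_iterate(grads, w1=0.0, step=0.5):
    """Unprojected gradient descent on the same linearized losses."""
    w = w1
    for g in grads:
        w = w - step * g
    return w


# Hypothetical gradients, chosen only for illustration.
grads = [0.3, -1.2, 0.7, 0.5]
sigma, eta = 1.5, 0.4

mean, var = ew_gaussian_posterior_1d(grads, w1=1.0, sigma=sigma, eta=eta)
gd = gradient_descent_iterate(grads, w1=1.0, step=eta * sigma**2)

print("EW posterior mean:", mean)
print("gradient descent iterate:", gd)
print("EW posterior variance:", var, "prior variance:", sigma**2)
```

In this setting the posterior mean should agree with the gradient descent iterate and the posterior variance with the prior variance, up to grid discretization error. The lazy and greedy variants coincide here because there is no projection onto a constrained domain.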

This review was generated by AI and checked by human editors.