QUICK REVIEW

[Paper Review] A Second-order Bound with Excess Losses

Pierre Gaillard, Gilles Stoltz|arXiv (Cornell University)|Feb 10, 2014

Advanced Bandit Algorithms Research | References: 24 | Cited by: 52

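One-sentence Summary

This paper proposes a second-order bound in terms of excess losses for online learning, and improves the regret analysis by bringing squared losses into the weight update rule. It derives a lower bound on the logarithm of the cumulative weights by induction and shows that the regret is controlled by a combination of instantaneous-regret terms and variance terms, which yields tighter performance guarantees in adversarial environments.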

ABSTRACT

We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates. These bounds are in terms of excess losses, the differences between the instantaneous losses suffered by the algorithm and the ones of a given expert. We then demonstrate the interest of these bounds in the context of experts that report their confidences as a number in the interval [0,1] using a generic reduction to the standard setting. We conclude by two other applications in the standard setting, which improve the known bounds in case of small excess losses and show a bounded regret against i.i.d. sequences of losses.

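Motivation and Objectives

Develop tighter regret bounds for online learning algorithms by bringing excess losses and squared losses into the analysis.
Extend the standard regret analysis with second-order terms that account for the variance of the losses.
Strengthen performance guarantees in adversarial environments through an improved lower bound on the logarithm of the cumulative weights.
Generalize the weight update rule by introducing terms that depend on the squared instantaneous regret.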

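Proposed Method

The method analyzes the weight update rule step by step and derives, by induction, a lower bound on the logarithm of the cumulative weights, ln W_T.
It defines the instantaneous regret as r_{k,s} = ℓ̂_s - ℓ_{k,s}, the difference at time s between the learner's loss and the loss of expert k.
The bound combines time-varying learning rates η_{k,t} with correction terms involving η_{k,s-1} r_{k,s}^2, which account for second-order effects.
The induction step relies on the algorithm's weight update rule, which adjusts each expert's weight according to cumulative regret and squared losses.
The analysis relates the growth of the log-weights to the learning-rate-weighted sums of regrets and squared losses over time.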
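
The steps above can be illustrated with a minimal sketch of a Prod-style update with one learning rate per expert. The summary does not give the exact algorithm, so several choices here are assumptions: the function name `ml_prod_sketch`, the uniform initial weights, the cap `eta_cap = 0.5`, the tuning η_{k,t} = min(eta_cap, sqrt(ln K / (1 + Σ r_{k,s}^2))), and the mixture proportional to η_{k,t-1} w_{k,t-1}. They should be checked against the paper before being relied on.

```python
import math


def ml_prod_sketch(losses, eta_cap=0.5):
    """Illustrative Prod-style aggregation with per-expert learning rates.

    losses: list of rounds, each a list of K >= 2 expert losses in [0, 1].
    Returns the mixture used in the last round and, for each expert k,
    the cumulative regret sum_s r_{k,s}.
    All tuning choices are assumptions made for this sketch.
    """
    K = len(losses[0])
    if K < 2:
        raise ValueError("need at least two experts (ln K must be positive)")

    w = [1.0 / K] * K          # w_{k,0}: uniform initial weights (assumed)
    eta = [eta_cap] * K        # eta_{k,0}
    sq = [0.0] * K             # running sums of r_{k,s}^2
    cum_regret = [0.0] * K     # running sums of r_{k,s}
    p = [1.0 / K] * K

    for loss in losses:
        # Mixture proportional to eta_{k,t-1} * w_{k,t-1}.
        z = sum(e * wk for e, wk in zip(eta, w))
        p = [e * wk / z for e, wk in zip(eta, w)]
        learner_loss = sum(pk * lk for pk, lk in zip(p, loss))

        for k in range(K):
            # Instantaneous regret r_{k,t} = learner loss - expert loss.
            r = learner_loss - loss[k]
            cum_regret[k] += r
            sq[k] += r * r

            # Learning rate shrinks as squared regrets accumulate (assumed tuning).
            new_eta = min(eta_cap, math.sqrt(math.log(K) / (1.0 + sq[k])))

            # Prod-style multiplicative step, then an exponent that
            # rescales the weight to the new learning rate.
            # 1 + eta * r >= 0.5 because eta <= 0.5 and r >= -1.
            w[k] = (w[k] * (1.0 + eta[k] * r)) ** (new_eta / eta[k])
            eta[k] = new_eta

    return p, cum_regret
```

For example, `ml_prod_sketch([[0.0, 1.0]] * 200)` runs the sketch on a toy sequence where the first expert is always right; the mixture should concentrate on that expert while the regret against it stays small.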

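Experimental Results

Research Questions

RQ1: Can a second-order regret bound be derived that accounts for the variance of the losses in online learning?
RQ2: How does bringing squared losses into the weight update improve the regret guarantees?
RQ3: What role do time-varying learning rates play in controlling the growth of the log-weights?
RQ4: Can an inductive lower bound on ln W_T be established that contains both linear and quadratic regret terms?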

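Key Findings

A lower bound on ln w_{k,t} is established by induction, showing that it grows at least as fast as the sum of the weighted regret and variance terms.
The bound contains a correction factor η_{k,t}/η_{k,0} that scales the initial weight, so the influence of the prior is preserved.
The analysis shows that the second-order terms η_{k,s-1} r_{k,s}^2 help control the growth of the cumulative weights, which leads to tighter regret control.
By including second-order effects, the result generalizes standard first-order regret bounds and improves performance in adversarial environments.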
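
The findings above can be made concrete with a worked induction step. This is a reconstruction under an assumption that the summary does not state explicitly: that the update has the Prod-style form w_{k,t} = (w_{k,t-1}(1 + η_{k,t-1} r_{k,t}))^{η_{k,t}/η_{k,t-1}} with η_{k,t-1} ≤ 1/2 and r_{k,t} ∈ [-1, 1]. It uses the elementary inequality ln(1+x) ≥ x - x^2 for x ≥ -1/2.

```latex
% Claim (for every expert k and every t >= 0):
\[
  \ln w_{k,t} \;\ge\; \frac{\eta_{k,t}}{\eta_{k,0}} \ln w_{k,0}
  \;+\; \eta_{k,t} \sum_{s=1}^{t} \bigl( r_{k,s} - \eta_{k,s-1}\, r_{k,s}^2 \bigr).
\]
% Base case t = 0: both sides equal \ln w_{k,0}.
% Induction step, using the assumed update and \ln(1+x) \ge x - x^2:
\begin{align*}
  \ln w_{k,t}
    &= \frac{\eta_{k,t}}{\eta_{k,t-1}}
       \Bigl( \ln w_{k,t-1} + \ln\bigl(1 + \eta_{k,t-1} r_{k,t}\bigr) \Bigr) \\
    &\ge \frac{\eta_{k,t}}{\eta_{k,t-1}}
       \Bigl( \ln w_{k,t-1} + \eta_{k,t-1} r_{k,t} - \eta_{k,t-1}^2 r_{k,t}^2 \Bigr) \\
    &\ge \frac{\eta_{k,t}}{\eta_{k,0}} \ln w_{k,0}
       + \eta_{k,t} \sum_{s=1}^{t-1} \bigl( r_{k,s} - \eta_{k,s-1}\, r_{k,s}^2 \bigr)
       + \eta_{k,t} \bigl( r_{k,t} - \eta_{k,t-1}\, r_{k,t}^2 \bigr).
\end{align*}
% The last line applies the induction hypothesis to \ln w_{k,t-1}
% and multiplies through by \eta_{k,t} / \eta_{k,t-1} > 0.
```

The factor η_{k,t}/η_{k,0} in front of ln w_{k,0} is the correction that rescales the initial weight, and the terms η_{k,s-1} r_{k,s}^2 are the second-order corrections mentioned in the findings.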


This review was generated by AI and checked by a human editor.