QUICK REVIEW

[Paper Review] Minimax Policies for Combinatorial Prediction Games

Jean-Yves Audibert, Sébastien Bubeck|arXiv (Cornell University)|May 24, 2011

Topic: Advanced Bandit Algorithms Research | References: 19 | Cited by: 41

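One-Sentence Summary

The paper establishes minimax regret bounds for combinatorial prediction games under full-information, semi-bandit, and bandit feedback, with $L_\infty$- and $L_2$-type constraints on the losses; the bounds are tight up to a constant factor in the full-information and semi-bandit settings. It proposes a unified potential-based gradient descent method combined with Bregman projections that recovers earlier results and gives new upper bounds for the semi-bandit game, and it shows that the exponentially weighted average forecaster is suboptimal against $L_\infty$ adversaries.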

ABSTRACT

We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the problem under three different assumptions for the feedback: full information, and the partial information models of the so-called "semi-bandit", and "bandit" problems. We consider both $L_\infty$-, and $L_2$-type of restrictions for the losses assigned by the adversary. We formulate a general strategy using Bregman projections on top of a potential-based gradient descent, which generalizes the ones studied in the series of papers Gyorgy et al. (2007), Dani et al. (2008), Abernethy et al. (2008), Cesa-Bianchi and Lugosi (2009), Helmbold and Warmuth (2009), Koolen et al. (2010), Uchiya et al. (2010), Kale et al. (2010) and Audibert and Bubeck (2010). We provide simple proofs that recover most of the previous results. We propose new upper bounds for the semi-bandit game. Moreover we derive lower bounds for all three feedback assumptions. With the only exception of the bandit game, the upper and lower bounds are tight, up to a constant factor. Finally, we answer a question asked by Koolen et al. (2010) by showing that the exponentially weighted average forecaster is suboptimal against $L_{\infty}$ adversaries.

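Motivation and Objectives

Characterize the minimax regret of combinatorial prediction games in which actions are binary vectors and losses are linear in the action.
Analyze regret under three feedback models (full information, semi-bandit, bandit), with $L_\infty$- and $L_2$-type constraints on the losses.
Determine the optimal order of the minimax regret for the worst-case action set $\mathcal{S} \subset \{0,1\}^d$.
Resolve the open question of whether the exponentially weighted average forecaster is optimal against $L_\infty$ adversaries.
Unify and generalize existing online linear optimization strategies via Bregman projections and potential-based gradient descent.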
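
For reference, the quantity behind these objectives can be written as below. This is the standard definition of expected regret in online linear optimization; the symbols $n$ (number of rounds), $V_t$ (action played at round $t$), and $\ell_t$ (loss vector chosen by the adversary at round $t$) are notation assumed here for illustration rather than taken from this review.

$$
R_n = \mathbb{E}\sum_{t=1}^{n} \ell_t^\top V_t \;-\; \min_{v \in \mathcal{S}} \mathbb{E}\sum_{t=1}^{n} \ell_t^\top v, \qquad V_t \in \mathcal{S} \subset \{0,1\}^d .
$$

The minimax regret is then the best guarantee on $R_n$ that a forecaster can obtain against the worst admissible adversary, considered for the worst possible action set.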

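Proposed Method

Proposes a general strategy based on potential-based gradient descent with Bregman projections, unifying the earlier algorithms of György et al. (2007), Dani et al. (2008), and others.
Uses Bregman projections to map each update back onto the convex hull of the action set (the simplex in the classical experts case), which enables an efficient regret analysis across the feedback models.
Applies Pinsker's inequality and the chain rule for the Kullback-Leibler divergence to derive information-theoretic lower bounds on the regret.
Constructs a hard instance by defining $\alpha$-adversaries over $d/2$ pairs of experts, with each expert's loss drawn from a Bernoulli distribution with parameter $1/2$ or $1/2+\varepsilon$.
Computes the KL divergence between the distributions induced by the $(-i,\alpha)$-adversary and the $\alpha$-adversary via the chain rule, and bounds it using Lemma 24, giving $\mathrm{KL} \leq \frac{16\varepsilon^2}{d} \mathbb{E}[\sum \mathbbm{1}_{I_{i,t}=\alpha_i}]$.
Derives the lower bound by averaging over all $\alpha \in \{1,2\}^{d/2}$ and applying the concavity of the square root to the KL terms.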
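
The gradient-plus-projection update described above can be illustrated in its simplest special case. The sketch below assumes that the action set is the $d$ standard basis vectors (the classical experts problem), that feedback is full information, and that the potential is the negative entropy. Under those assumptions the gradient step becomes a multiplicative update and the Bregman projection onto the simplex reduces to renormalization, which is the exponentially weighted average forecaster. The function names and the learning rate `eta` are illustrative and do not come from the paper.

```python
import math
import random


def entropic_mirror_descent(losses, eta):
    """Full-information sketch on the probability simplex.

    losses: list of per-round loss vectors (each a list of d floats in [0, 1]).
    eta: learning rate (hypothetical tuning parameter).
    Returns the list of weight vectors played, one per round.
    """
    d = len(losses[0])
    w = [1.0 / d] * d  # start from the uniform distribution
    played = []
    for loss in losses:
        played.append(w)
        # Gradient step in the dual space: with the negative-entropy potential,
        # the unconstrained update multiplies each weight by exp(-eta * loss_i).
        unnormalized = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        # Bregman (KL) projection back onto the simplex is a renormalization.
        total = sum(unnormalized)
        w = [u / total for u in unnormalized]
    return played


def regret(losses, played):
    """Cumulative expected loss of the forecaster minus that of the best single expert."""
    d = len(losses[0])
    forecaster = sum(
        sum(wi * li for wi, li in zip(w, loss)) for w, loss in zip(played, losses)
    )
    best_expert = min(sum(loss[i] for loss in losses) for i in range(d))
    return forecaster - best_expert


if __name__ == "__main__":
    random.seed(0)
    n, d = 200, 5
    demo_losses = [[random.random() for _ in range(d)] for _ in range(n)]
    demo_eta = math.sqrt(8 * math.log(d) / n)
    print(regret(demo_losses, entropic_mirror_descent(demo_losses, demo_eta)))
```

Other potentials and general action sets would need a genuine projection step in place of the renormalization, and the semi-bandit and bandit games would additionally need loss estimates; neither is covered by this sketch.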

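Results

Research Questions

RQ1: What are the minimax regret bounds for combinatorial prediction games with $L_\infty$ and $L_2$ loss constraints under full-information, semi-bandit, and bandit feedback?
RQ2: Is the exponentially weighted average forecaster optimal against $L_\infty$ adversaries in combinatorial prediction games?
RQ3: How do the upper and lower regret bounds compare across the three feedback models, and are they tight?
RQ4: Can the potential-based gradient descent framework unify and generalize earlier results in online linear optimization?
RQ5: What role does the action set $\mathcal{S}$ play in determining the worst-case regret, and how does the structure of $\mathcal{S}$ affect the minimax rate?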

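Key Findings

Against $L_2$-bounded adversaries, the minimax regret in the full-information and semi-bandit settings is $\Omega(\sqrt{dn})$, and the upper bounds match this up to a constant factor.
In the bandit setting, the minimax regret is $\Omega(\min(n, d\sqrt{n}))$; this is the one setting in which the upper and lower bounds do not match up to a constant factor.
The proposed potential-based gradient descent with Bregman projections recovers and generalizes earlier results from several lines of work on online linear optimization.
The exponentially weighted average forecaster is suboptimal against $L_\infty$ adversaries, which answers the question raised by Koolen et al. (2010).
The lower bounds are derived by constructing randomized adversaries over $d/2$ pairs of experts and combining Pinsker's inequality with the chain rule for the KL divergence.
The analysis shows that the worst-case regret is of order $\sqrt{dn}$ under $L_2$ constraints and $\min(n, d\sqrt{n})$ under $L_\infty$ constraints, with matching upper bounds confirming tightness in the full-information and semi-bandit settings.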
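
The two information-theoretic ingredients named in these findings can be checked numerically on the Bernoulli losses used by the hard instance. The sketch below is a minimal illustration, not the paper's proof: it computes the KL divergence between Bernoulli distributions with parameters $1/2$ and $1/2+\varepsilon$, compares the total variation distance with the Pinsker bound $\sqrt{\mathrm{KL}/2}$, and uses the fact that, for independent observations, the chain rule makes the KL divergence add up across rounds. All function names are hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def tv_bernoulli(p, q):
    """Total variation distance between Ber(p) and Ber(q)."""
    return abs(p - q)


def pinsker_bound(kl):
    """Upper bound on total variation given by Pinsker's inequality."""
    return math.sqrt(kl / 2)


def kl_independent_rounds(p, q, rounds):
    """Chain rule for i.i.d. observations: the per-round KL divergences add up."""
    return rounds * kl_bernoulli(p, q)


if __name__ == "__main__":
    for eps in (0.01, 0.1, 0.2):
        kl = kl_bernoulli(0.5, 0.5 + eps)
        print(eps, kl, tv_bernoulli(0.5, 0.5 + eps), pinsker_bound(kl))
```

Because the per-round KL divergence is of order $\varepsilon^2$, an adversary that shifts one parameter by $\varepsilon$ stays statistically hard to detect until roughly $1/\varepsilon^2$ informative observations have been collected, which is the mechanism that lower bounds of this type exploit.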

This review was generated by AI and reviewed by human editors.