QUICK REVIEW

[Paper Digest] Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

Mehrdad Mahdavi, Rong Jin|arXiv (Cornell University)|Nov 25, 2011

Advanced Bandit Algorithms Research|References: 25|Cited by: 167

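ONE-SENTENCE SUMMARY

This paper proposes an online convex optimization framework that relaxes the strict per-round feasibility requirement, asking only that the constraints be satisfied in the long run, and thereby trades regret for computational efficiency. By recasting the problem as a convex-concave optimization problem and running a modified online gradient descent with a penalty term, the method achieves O(√T) regret and O(T^{3/4}) constraint violation; when the constraints are linear, a Mirror Prox variant achieves O(T^{2/3}) bounds on both regret and violation.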

ABSTRACT

In this paper we propose a framework for solving constrained online convex optimization problem. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set $\mathcal{K}$ from which the decisions are made. While for simple shapes (e.g. Euclidean ball) the projection is straightforward, for arbitrary complex sets this is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring decisions belong to $\mathcal{K}$ for all rounds, we only require that the constraints which define the set $\mathcal{K}$ be satisfied in the long run. We show that our framework can be utilized to solve a relaxed version of online learning with side constraints addressed in \cite{DBLP:conf/colt/MannorT06} and \cite{DBLP:conf/aaai/KvetonYTM08}. By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm which achieves $\tilde{\mathcal{O}}(\sqrt{T})$ regret bound and $\tilde{\mathcal{O}}(T^{3/4})$ bound for the violation of constraints. Then we modify the algorithm in order to guarantee that the constraints are satisfied in the long run. This gain is achieved at the price of getting $\tilde{\mathcal{O}}(T^{3/4})$ regret bound. Our second algorithm is based on the Mirror Prox method \citep{nemirovski-2005-prox} to solve variational inequalities which achieves $\tilde{\mathcal{O}}(T^{2/3})$ bound for both regret and the violation of constraints when the domain $\mathcal{K}$ can be described by a finite number of linear constraints. Finally, we extend the result to the setting where we only have partial access to the convex set $\mathcal{K}$ and propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm.

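MOTIVATION AND OBJECTIVES

Address the computational bottleneck of projecting onto complex convex sets in online convex optimization.
Relax the requirement that every round's decision lie in the feasible set, and instead ensure that the constraints are satisfied in the long run.
Develop algorithms that achieve sublinear regret while keeping cumulative constraint violation small over a long horizon.
Extend the framework to partial-feedback settings such as bandit feedback while retaining favorable bounds.
Provide theoretical guarantees on regret and constraint violation for both general domains and domains defined by linear constraints.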
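
The two quantities these objectives refer to can be written out as follows. This is a sketch in standard online convex optimization notation; the symbols $f_t$ (loss at round $t$), $x_t$ (decision at round $t$), $g_i$ (convex constraint functions), $m$ and $d$ are notation assumed here for illustration, not taken from the digest.

$$
\mathcal{K} = \{x \in \mathbb{R}^d : g_i(x) \le 0,\ i = 1, \dots, m\},
\qquad
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),
\qquad
\sum_{t=1}^{T} g_i(x_t) \le 0 \quad \text{for } i = 1, \dots, m.
$$

Under the relaxed requirement, an individual $x_t$ may fall outside $\mathcal{K}$; only the cumulative constraint on the right has to hold, and when it does not, its positive part is the "constraint violation" that the bounds control.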

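PROPOSED METHOD

Recast the constrained online optimization problem as a convex-concave saddle-point problem by introducing a penalty term based on constraint violation.
Run online gradient descent on a modified loss that penalizes positive constraint violation, so that the updates need no projection onto $\mathcal{K}$ itself.
Parameterize the penalty function with δ to balance regret against constraint violation; tuning δ yields the desired trade-off.
Derive regret and violation bounds in terms of the problem parameters using Jensen's inequality and norm bounds on the gradients.
When the constraints are linear, adapt the Mirror Prox method for variational inequalities to obtain O(T^{2/3}) bounds on both regret and violation; this improves the general-case violation bound at the cost of a weaker regret bound.
Extend the framework to bandit feedback via a multipoint estimation strategy, retaining in expectation bounds similar to those of the full-information case.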
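
The first three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: it assumes a single constraint, a penalized function of the form L_t(x, λ) = f_t(x) + λ·g(x) − (δη/2)·λ², and a projection onto a simple Euclidean ball that contains the feasible set. The function names, the toy losses, and the values of `eta` and `delta` are all hypothetical choices made for the example.

```python
import math
import random


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x||_2 <= radius} (closed form, cheap)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def penalized_ogd(grad_losses, g, grad_g, dim, radius, eta, delta):
    """Gradient descent in x and ascent in lam on the assumed penalized function
    L_t(x, lam) = f_t(x) + lam * g(x) - (delta * eta / 2) * lam**2.
    Returns the list of decisions x_1, ..., x_T."""
    x, lam = [0.0] * dim, 0.0
    decisions = []
    for grad_f in grad_losses:
        decisions.append(x)
        gf, gg, gx = grad_f(x), grad_g(x), g(x)
        # Primal step: descend on f_t + lam * g, then project onto the ball only.
        step = [xi - eta * (a + lam * b) for xi, a, b in zip(x, gf, gg)]
        x = project_ball(step, radius)
        # Dual step: lam grows while g(x_t) > 0; the quadratic term damps it.
        lam = max(0.0, lam + eta * (gx - delta * eta * lam))
    return decisions


def run_demo(T=2000, seed=0):
    """Toy instance; every number here is an illustrative assumption."""
    rng = random.Random(seed)
    # Linear losses f_t(x) = c_t . x with c_t scattered around (-1, -1).
    costs = [[-1.0 + rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(T)]
    grad_losses = [lambda x, c=c: c for c in costs]
    g = lambda x: x[0] + x[1] - 1.0      # feasible set: x0 + x1 <= 1
    grad_g = lambda x: [1.0, 1.0]
    decisions = penalized_ogd(grad_losses, g, grad_g, dim=2, radius=2.0,
                              eta=1.0 / math.sqrt(T), delta=1.0)
    avg_violation = sum(g(x) for x in decisions) / T
    return decisions, avg_violation


if __name__ == "__main__":
    _, avg_violation = run_demo()
    print(f"average constraint value per round: {avg_violation:.4f}")
```

In this toy instance the losses pull the iterate toward the boundary of the ball, where the constraint would be badly violated; the multiplier `lam` rises in response and pushes the iterate back, so individual rounds may be infeasible while the running average of g(x_t) stays small. The only projection performed is onto the ball, which is the source of the computational saving.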

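RESULTS

RESEARCH QUESTIONS

RQ1: Can online convex optimization be made more efficient by relaxing per-round feasibility in favor of long-term feasibility?
RQ2: What is the trade-off between regret and constraint violation when projection is replaced by a penalty-based approach?
RQ3: Can an O(√T) regret bound be maintained while achieving sublinear long-term constraint violation?
RQ4: How do the bounds change when the domain is defined by linear constraints and the Mirror Prox method is applied?
RQ5: Can the algorithm be extended to partial-information bandit feedback while retaining favorable regret and violation bounds?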

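KEY FINDINGS

The proposed algorithm achieves O(√T) regret and O(T^{3/4}) constraint violation, a favorable trade-off between performance and computational cost.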
A modified version of the algorithm guarantees that the constraints are satisfied in the long run, at the price of an O(T^{3/4}) regret bound.
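For domains defined by linear constraints, the Mirror Prox based algorithm achieves O(T^{2/3}) bounds on both regret and constraint violation, improving on the general-case violation bound.
With the penalty parameter γ set to bT^{-1/3}, where b = 2√F, the algorithm guarantees zero long-term constraint violation for sufficiently large T.
The bandit extension achieves, in expectation, the same bounds as the full-information case, indicating robustness to partial feedback.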
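The theoretical analysis states the regret and violation bounds explicitly in terms of the problem parameters G, D, R, and F under the stated assumptions.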
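
To make the comparison between these rates concrete, the short script below divides each cumulative bound by T to get a per-round average. It is plain arithmetic on the exponents quoted above, ignoring constants and logarithmic factors; the helper name `per_round_rate` is a hypothetical choice for this example.

```python
def per_round_rate(T, exponent):
    """Per-round average of a cumulative quantity that grows like T**exponent."""
    return T ** exponent / T


# Exponents quoted in the findings (constants and log factors ignored).
EXPONENTS = {"sqrt(T)": 0.5, "T^(2/3)": 2.0 / 3.0, "T^(3/4)": 0.75}

if __name__ == "__main__":
    for T in (10**2, 10**4, 10**6):
        row = ", ".join(
            f"{name}/T = {per_round_rate(T, p):.4f}" for name, p in EXPONENTS.items()
        )
        print(f"T = {T:>7}: {row}")
```

All three per-round averages shrink toward zero as T grows, which is what "sublinear" means here; the smaller the exponent, the faster the average vanishes, so an O(T^{2/3}) violation bound is stronger than O(T^{3/4}) while an O(T^{2/3}) regret bound is weaker than O(√T).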

This digest was generated by AI and reviewed by human editors.