QUICK REVIEW

[Paper Review] A Low Complexity Algorithm with $O(\sqrt{T})$ Regret and $O(1)$ Constraint Violations for Online Convex Optimization with Long Term Constraints

Hao Yu, Michael J. Neely|arXiv (Cornell University)|Apr 8, 2016

Topic: Advanced Bandit Algorithms Research | References: 18 | Cited by: 19

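One-Sentence Summary

This paper proposes a new low-complexity algorithm for online convex optimization with long-term functional constraints that achieves $O(\sqrt{T})$ regret and $O(1)$ constraint violations. By using a dual-averaging-based update rule with an adaptive penalty parameter, the algorithm avoids expensive projections while keeping cumulative constraint violations bounded, improving on earlier methods whose constraint violations grow with $T$.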

ABSTRACT

This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\theta,1-\theta\}})$ regret and $O(T^{1-\theta/2})$ constraint violations where $\theta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with $O(1)$ constraint violations.

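Motivation and Objectives

Address the high computational cost of projection-based online convex optimization algorithms when the constraint set is complicated.
Make online algorithms practical to deploy in systems with complicated constraint sets, such as power systems or network scheduling.
Achieve sublinear regret over time while keeping constraint violations bounded, even when constraints may be violated in individual rounds.
Develop a method that avoids repeated projections onto the complicated constraint set while retaining strong theoretical performance guarantees.
Improve on existing algorithms whose constraint violations grow with $T$, for example at rates of $O(T^{3/4})$ or $O(T^{2/3})$.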
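
The two quantities these objectives refer to can be written out explicitly. The following uses the standard formulation for this problem class; the symbols $f_t$, $g_k$, $m$ and $x_t$ are notation assumed here for illustration, since the summary itself only names $\mathcal{X}_0$ and $T$.

$$
\text{Regret}(T) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),
\qquad
\mathcal{X} = \{\, x \in \mathcal{X}_0 : g_k(x) \le 0,\ k = 1, \dots, m \,\}
$$

$$
\text{Violation}_k(T) = \sum_{t=1}^{T} g_k(x_t), \qquad k = 1, \dots, m
$$

Under this reading, a single round may have $g_k(x_t) > 0$, and the long-term requirement is that the sum over all $T$ rounds stays small: sublinear in $T$ for the earlier methods, and bounded by a constant for the algorithm summarized here.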

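Proposed Method

Introduce a dual-averaging-based update rule that maintains a vector of dual variables to track long-term constraint violations.
Use a time-varying penalty parameter that scales as $\Theta(\sqrt{t})$ to balance regret against constraint violations.
Project only onto the simple base convex set $\mathcal{X}_0$, avoiding projection onto the complicated functional constraints.
Apply the doubling trick so the algorithm can run without knowing the time horizon $T$ in advance.
Derive the regret bound via subgradient descent with suitably chosen step sizes and penalty update rules.
Keep constraint violations bounded by a constant through the adaptive dual variable updates.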
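
To make the interplay between the dual variables and the cheap projection concrete, here is a minimal Python sketch of one round of a generic primal-dual update. It is not the update rule from the paper, which this summary does not reproduce. The function names, the box-shaped stand-in for $\mathcal{X}_0$, the parameters `V` and `alpha`, and the exact form of the dual update are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def project_box(x: Sequence[float], lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d, standing in for X_0."""
    return [min(max(xi, lo), hi) for xi in x]


def primal_dual_step(
    x: Vector,
    q: Vector,
    grad_f: Vector,
    g_vals: Vector,
    g_grads: List[Vector],
    V: float,
    alpha: float,
    lo: float,
    hi: float,
) -> Tuple[Vector, Vector]:
    """One round of a generic primal-dual update (illustrative, hypothetical).

    x       current decision
    q       dual variables, one per functional constraint g_k(x) <= 0
    grad_f  (sub)gradient of the current loss f_t at x
    g_vals  values g_k(x)
    g_grads (sub)gradients of each g_k at x
    V       weight on the loss gradient (assumed penalty parameter)
    alpha   sets the step size 1 / (2 * alpha) (assumed)
    """
    dim = len(x)
    # Direction: weighted loss gradient plus dual-weighted constraint gradients.
    direction = [
        V * grad_f[i] + sum(q[k] * g_grads[k][i] for k in range(len(q)))
        for i in range(dim)
    ]
    # Gradient step, then the cheap projection onto the box only.
    x_new = project_box(
        [x[i] - direction[i] / (2.0 * alpha) for i in range(dim)], lo, hi
    )
    # Dual update: accumulate the linearized constraint value, clipped at zero.
    q_new = []
    for k in range(len(q)):
        linearized = g_vals[k] + sum(
            g_grads[k][i] * (x_new[i] - x[i]) for i in range(dim)
        )
        q_new.append(max(q[k] + linearized, 0.0))
    return x_new, q_new


# Toy round: loss f(x) = -x, constraint g(x) = x - 0.5 <= 0, X_0 = [0, 1].
x1, q1 = primal_dual_step(
    x=[0.0], q=[0.0], grad_f=[-1.0], g_vals=[-0.5], g_grads=[[1.0]],
    V=2.0, alpha=1.0, lo=0.0, hi=1.0,
)
print(x1, q1)  # [1.0] [0.5]
```

In the toy round the loss pulls the decision to the edge of the box, which violates the functional constraint, so the dual variable rises from 0 to 0.5 and will push later decisions back toward the feasible region. The functional constraint is never projected onto; it only enters through its value and gradient.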

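Experimental Results

Research Questions

RQ1: Can a low-complexity online algorithm achieve $O(\sqrt{T})$ regret and $O(1)$ constraint violations in online convex optimization with long-term constraints?
RQ2: Is it possible to avoid expensive projections onto the complicated functional constraint set while retaining strong theoretical performance?
RQ3: How does the proposed algorithm compare with earlier algorithms whose constraint violations are $O(T^{3/4})$ or $O(T^{2/3})$?
RQ4: Can the algorithm be run without knowing the time horizon $T$ in advance?
RQ5: Can the adaptive penalty parameter scheme guarantee sublinear regret and bounded constraint violations at the same time?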

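Key Findings

The proposed algorithm achieves $O(\sqrt{T})$ regret, matching the best known bound for online convex optimization.
Constraint violations are bounded by a constant, i.e. $O(1)$, a significant improvement over the earlier $O(T^{3/4})$ and $O(T^{2/3})$ bounds.
By projecting only onto the base set $\mathcal{X}_0$, the algorithm avoids solving a complicated convex program in every round, which greatly reduces computational cost.
Numerical experiments show that the algorithm maintains low regret and bounded constraint violations across 1000 independent trials with $T=5000$.
In terms of regret, the algorithm is comparable to other $O(\sqrt{T})$-regret methods, while its constraint violations are significantly smaller.
The doubling trick lets the algorithm run without knowing $T$ in advance while retaining the same theoretical bounds.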
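
The doubling trick mentioned in the findings can be sketched as an epoch schedule: the learner is restarted at the start of each epoch and tuned as if the horizon were the epoch length, which doubles every time. The helper below is a hypothetical illustration of that schedule only. `total_rounds` is known to the simulation driver, not to the learner, and how the paper sets its parameters within each epoch is not stated in this summary.

```python
from typing import Iterator, Tuple


def doubling_epochs(total_rounds: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (epoch_horizon, first_round, last_round) for horizons 1, 2, 4, ...

    Rounds are numbered from 1. The final epoch is cut short when
    total_rounds is reached before the epoch's nominal horizon ends.
    """
    start = 1
    horizon = 1
    while start <= total_rounds:
        end = min(start + horizon - 1, total_rounds)
        yield horizon, start, end
        start = end + 1
        horizon *= 2


# Each epoch would restart the learner with T replaced by epoch_horizon.
for epoch_horizon, first, last in doubling_epochs(10):
    print(epoch_horizon, first, last)
# 1 1 1
# 2 2 3
# 4 4 7
# 8 8 10
```

Because the epoch lengths grow geometrically, only about $\log_2 T$ restarts occur over $T$ rounds, which is why per-epoch bounds of order $\sqrt{\text{horizon}}$ sum to a bound of order $\sqrt{T}$ overall.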


This review was generated by AI and reviewed by human editors.