QUICK REVIEW

[Paper Review] Online Convex Optimization with Time-Varying Constraints

Michael J. Neely, Hao Yu|arXiv (Cornell University)|Feb 15, 2017

Advanced Bandit Algorithms Research | References: 10 | Cited by: 71

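One-sentence summary

This paper presents an online algorithm that uses Lyapunov drift and virtual queues to solve online convex optimization with time-varying constraints. It achieves O(1/ε^2) convergence time under a Slater condition on a common subset, and extends to the stochastic i.i.d. setting.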

ABSTRACT

This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions $\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are gradually revealed over time. For a given $ε>0$, the goal is to choose points $x_t$ every step $t$, without knowing the $f_t$ and $g_{t,i}$ functions on that step, to achieve a time average at most $ε$ worse than the best fixed-decision that could be chosen with hindsight, subject to the time average of the constraint functions being nonpositive. It is known that this goal is generally impossible. This paper develops an online algorithm that solves the problem with $O(1/ε^2)$ convergence time in the special case when all constraint functions are nonpositive over a common subset of $\mathbb{R}^n$. Similar performance is shown in an expected sense when the common subset assumption is removed but the constraint functions are assumed to vary according to a random process that is independent and identically distributed (i.i.d.) over time slots $t \in \{0, 1, 2, \ldots\}$. Finally, in the special case when both the constraint and objective functions are i.i.d. over time slots $t$, the algorithm is shown to come within $ε$ of optimality with respect to the best (possibly time-varying) causal policy that knows the full probability distribution.
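One way to write the goal stated in the abstract as inequalities is sketched below. The symbols $X$ (the decision set), $A$ (the common subset), and $T_\varepsilon$ (the convergence time) are notation assumed for this sketch, and the $\varepsilon$ slack on the constraint average is likewise an assumption rather than something the abstract states.

```latex
% Assumed notation: X is the decision set, A the common subset, T_eps the convergence time.
A = \{\, x \in X : g_{t,i}(x) \le 0 \ \text{for all } t \ge 0,\ i \in \{1,\dots,k\} \,\}

% Target guarantee for every fixed x^* \in A and every T \ge T_\varepsilon = O(1/\varepsilon^2):
\frac{1}{T}\sum_{t=0}^{T-1} f_t(x_t) \;\le\; \frac{1}{T}\sum_{t=0}^{T-1} f_t(x^*) + \varepsilon,
\qquad
\frac{1}{T}\sum_{t=0}^{T-1} g_{t,i}(x_t) \;\le\; \varepsilon \quad \text{for all } i \in \{1,\dots,k\}.
```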

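Motivation and objectives

Motivate and formalize online convex optimization with time-varying constraint functions.
Develop an online algorithm that, under a Slater condition, competes with the best fixed decision in the common feasible subset.
Characterize convergence guarantees in the deterministic and stochastic (i.i.d.) settings.
Provide an implementation that requires no knowledge of Lagrange multipliers and uses only simple projections and virtual queues.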

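Proposed method

Introduce a virtual queue Q_i(t) for each constraint, together with a common feasible subset based on the Slater condition.
Build a drift-plus-penalty objective that combines Lyapunov drift with a weighted quadratic regularizer and subgradient terms.
Derive a per-slot decision rule that minimizes a linearized objective plus a quadratic penalty, which reduces to a projection onto X: X_t = P_X[X_{t-1} + W_t/(2α)].
Update rule: Q_i(t+1) = max{Q_i(t) + g_{t-1,i}(X_{t-1}) + g'_{t-1,i}(X_{t-1})^T(X_t - X_{t-1}), 0}.
Establish finite bounds on the drift, the queues, and the objective gap; the parameters V and α control the trade-off (V governs the objective error, α governs stability).
Show equivalence with a projection-based implementation, and derive the key lemmas and theorems that establish the performance bounds.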
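The projection step and the queue update listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It rests on several assumptions: the vector W_t is not defined in this review, so the sketch takes W_t = -(V f'_{t-1}(X_{t-1}) + Σ_i Q_i(t) g'_{t-1,i}(X_{t-1})), which is what minimizing the linearized objective plus α||x - X_{t-1}||^2 would give; the set X is taken to be a box so that the projection is a simple clip; and the function names `project_box` and `drift_plus_penalty_step` are hypothetical.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box X = [lo, hi]^n (X is an assumed choice)."""
    return np.clip(x, lo, hi)


def drift_plus_penalty_step(x_prev, Q, grad_f, g_vals, g_grads, V, alpha,
                            lo=-1.0, hi=1.0):
    """One slot of the update described above (hypothetical helper).

    x_prev  : (n,)   previous decision X_{t-1}
    Q       : (k,)   virtual queues Q_i(t)
    grad_f  : (n,)   subgradient f'_{t-1}(X_{t-1})
    g_vals  : (k,)   constraint values g_{t-1,i}(X_{t-1})
    g_grads : (k, n) constraint subgradients g'_{t-1,i}(X_{t-1})
    """
    # Assumed definition of W_t: negative of the weighted subgradient sum.
    W = -(V * grad_f + g_grads.T @ Q)
    # Decision rule: X_t = P_X[X_{t-1} + W_t / (2 * alpha)].
    x_new = project_box(x_prev + W / (2.0 * alpha), lo, hi)
    # Queue update: Q_i(t+1) = max{Q_i(t) + g + g'^T (X_t - X_{t-1}), 0}.
    Q_new = np.maximum(Q + g_vals + g_grads @ (x_new - x_prev), 0.0)
    return x_new, Q_new


if __name__ == "__main__":
    # Toy problem (assumed): f(x) = x_0 + x_1, one constraint g(x) = 0.5 - x_0 <= 0.
    x, Q = np.zeros(2), np.zeros(1)
    V, alpha = 10.0, 100.0
    for t in range(5):
        grad_f = np.array([1.0, 1.0])
        g_vals = np.array([0.5 - x[0]])
        g_grads = np.array([[-1.0, 0.0]])
        x, Q = drift_plus_penalty_step(x, Q, grad_f, g_vals, g_grads, V, alpha)
        print(t, x, Q)
```

In this sketch a larger V puts more weight on the objective subgradient, while a larger α shrinks the step taken in each slot, which mirrors the trade-off between the two parameters described above.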

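Results

Research questions

RQ1: Under time-varying constraints, can an online algorithm achieve an ε-approximation of the best fixed decision in the common feasible subset?
RQ2: Under a deterministic Slater condition, what convergence time can online convex optimization with time-varying constraints achieve?
RQ3: How do stochastic (i.i.d.) constraint and objective models affect the achievable performance guarantees?
RQ4: Can the online algorithm be implemented with simple projections, without explicit knowledge of the common feasible set A?
RQ5: What are the virtual-queue and regret bounds under the different modeling assumptions (deterministic versus i.i.d.)?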

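Main findings

Under the deterministic Slater condition and the common-subset assumption, the online algorithm achieves an O(ε) approximation with convergence time O(1/ε^2).
For an i.i.d. constraint process with arbitrary objective dynamics, the algorithm achieves an O(ε) approximation in expectation, with convergence time O(1/ε^2).
When both the objective and constraint functions are i.i.d., the algorithm comes within ε of the best causal policy that knows the distribution.
The method requires no knowledge of Lagrange multipliers and uses per-slot projections onto X that are simple to implement.
The queue bounds show that ||Q(t)|| scales as O(V) under suitable parameter choices, and the analysis ties drift and queue length to objective and constraint performance.
The lower bound for the unconstrained case is matched, showing that the convergence time is near-optimal in the online setting.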
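The link between an O(1/ε^2) convergence time and the more familiar regret scaling can be made explicit with a short calculation. The constant $c$ and the comparator $x^*$ below are notation assumed for this sketch, and the horizon $T$ is taken to be fixed in advance so that ε can be chosen as a function of it.

```latex
% Assume the time-average guarantee holds with a constant c > 0 for all T >= 1/eps^2:
\frac{1}{T}\sum_{t=0}^{T-1}\bigl[f_t(x_t) - f_t(x^*)\bigr] \;\le\; c\,\varepsilon
\qquad \text{for all } T \ge \frac{1}{\varepsilon^{2}}.

% Choosing eps = 1/sqrt(T) for a horizon T fixed in advance and multiplying by T:
\sum_{t=0}^{T-1}\bigl[f_t(x_t) - f_t(x^*)\bigr] \;\le\; c\,T\cdot\frac{1}{\sqrt{T}} \;=\; c\,\sqrt{T}.
```

Read this way, an O(1/ε^2) convergence time corresponds to O(√T) regret at that horizon, which is the order of the standard lower bound for unconstrained online convex optimization with general convex losses.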

This review was generated by AI and reviewed by a human editor.