QUICK REVIEW

[Paper Review] Online Convex Optimization with Time-Varying Constraints

Michael J. Neely, Hao Yu|arXiv (Cornell University)|Feb 15, 2017

Advanced Bandit Algorithms Research|References 10|Citations 71

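One-Line Summary

This paper proposes an online algorithm, based on Lyapunov drift and virtual queues, for online convex optimization with time-varying constraints. Under a Slater condition on a common feasible set it achieves O(1/ε^2) convergence time, and the analysis extends to a stochastic independent and identically distributed (i.i.d.) setting.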

ABSTRACT

This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions $\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are gradually revealed over time. For a given $ε>0$, the goal is to choose points $x_t$ every step $t$, without knowing the $f_t$ and $g_{t,i}$ functions on that step, to achieve a time average at most $ε$ worse than the best fixed-decision that could be chosen with hindsight, subject to the time average of the constraint functions being nonpositive. It is known that this goal is generally impossible. This paper develops an online algorithm that solves the problem with $O(1/ε^2)$ convergence time in the special case when all constraint functions are nonpositive over a common subset of $\mathbb{R}^n$. Similar performance is shown in an expected sense when the common subset assumption is removed but the constraint functions are assumed to vary according to a random process that is independent and identically distributed (i.i.d.) over time slots $t \in \{0, 1, 2, \ldots\}$. Finally, in the special case when both the constraint and objective functions are i.i.d. over time slots $t$, the algorithm is shown to come within $ε$ of optimality with respect to the best (possibly time-varying) causal policy that knows the full probability distribution.

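Motivation and Objectives

Motivate and formalize online convex optimization with time-varying constraint functions.
Develop an online algorithm that, under a Slater condition, competes with the best fixed decision in a common feasible subset.
Characterize the convergence guarantees in the deterministic and stochastic (i.i.d.) settings.
Provide an implementation that needs no knowledge of Lagrange multipliers and uses only simple projections and virtual queues.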
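
The goal stated in the abstract can be written compactly as follows. This is a paraphrase rather than a quotation from the paper: $x^*$ denotes the best fixed decision in the common feasible subset, and writing the constraint guarantee with an $O(\varepsilon)$ slack is an assumption about how the result is stated.

$$
\frac{1}{T}\sum_{t=0}^{T-1} f_t(x_t) \le \frac{1}{T}\sum_{t=0}^{T-1} f_t(x^*) + O(\varepsilon),
\qquad
\frac{1}{T}\sum_{t=0}^{T-1} g_{t,i}(x_t) \le O(\varepsilon) \quad (i = 1, \ldots, k),
\qquad \text{for } T \ge 1/\varepsilon^2 .
$$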

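Proposed Method

Introduce a virtual queue Q_i(t) for each constraint and posit a common feasible subset that satisfies a Slater condition.
Formulate a drift-plus-penalty objective that combines the Lyapunov drift with a weighted quadratic regularizer and subgradient terms.
Derive a one-slot decision rule that minimizes the linearized objective plus a quadratic penalty, which reduces to a projection onto X: X_t = P_X[X_{t-1} + W_t/(2α)].
Update rule: Q_i(t+1) = max{Q_i(t) + g_{t-1,i}(X_{t-1}) + g'_{t-1,i}(X_{t-1})^T(X_t - X_{t-1}), 0}.
Use the parameters V and α to control the trade-off, and give finite bounds on the drift, the queues, and the objective gap (V controls the objective error, α controls stability).
Show that the decision rule is equivalent to a projection-based implementation, and derive the main lemmas and theorems that establish the performance bounds.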
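
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the toy problem (linear objectives, one linear constraint, box set X = [-1, 1]^2), the function name `run_sketch`, the definition W_t = -(V f'_{t-1}(X_{t-1}) + Σ_i Q_i(t) g'_{t-1,i}(X_{t-1})), and the parameter choice V = 1/ε, α = 1/ε^2 are all assumptions made for the example. Only the projection step and the queue update follow the formulas in this review.

```python
import numpy as np

def run_sketch(T=2000, eps=0.05, seed=0):
    """Toy run of a virtual-queue / projection scheme (illustrative only)."""
    rng = np.random.default_rng(seed)
    V, alpha = 1.0 / eps, 1.0 / eps**2   # assumed parameter choice
    n, k = 2, 1                          # decision dimension, number of constraints
    x = np.zeros(n)                      # X_{t-1}, starts inside X = [-1, 1]^n
    Q = np.zeros(k)                      # virtual queues Q_i(t), start empty
    xs = np.empty((T, n))
    lin_sum = np.zeros(k)                # accumulated linearized constraint values

    for t in range(T):
        # Functions of the previous slot, revealed only after x was chosen.
        c = rng.uniform(-1.0, 1.0, size=n)      # f_{t-1}(x) = c^T x
        A = rng.uniform(0.0, 1.0, size=(k, n))  # g_{t-1,i}(x) = A[i]^T x - 0.5
        g_val = A @ x - 0.5                     # g_{t-1,i}(X_{t-1})

        # Assumed form of W_t: negative weighted (sub)gradient direction.
        W = -(V * c + A.T @ Q)

        # Decision rule: X_t = P_X[X_{t-1} + W_t / (2 alpha)], P_X = box clipping.
        x_new = np.clip(x + W / (2.0 * alpha), -1.0, 1.0)

        # Queue update: Q_i(t+1) = max{Q_i(t) + g + g'^T (X_t - X_{t-1}), 0}.
        lin = g_val + A @ (x_new - x)
        Q = np.maximum(Q + lin, 0.0)

        lin_sum += lin
        xs[t] = x_new
        x = x_new

    return xs, Q, lin_sum

if __name__ == "__main__":
    xs, Q, lin_sum = run_sketch()
    print("final decision:", xs[-1])
    print("final queue:", Q)
    print("average linearized constraint:", lin_sum / len(xs))
```

In this toy problem every constraint satisfies g(0) = -0.5 < 0, so a neighborhood of the origin plays the role of the common feasible subset. The algorithm itself never uses that subset; it only clips onto X.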

Experimental Results

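Research Questions

RQ1: Can an online algorithm achieve an ε-approximation to the best fixed decision in a common feasible subset under time-varying constraints?
RQ2: What convergence rate is achievable for online convex optimization under a deterministic Slater condition?
RQ3: How does a stochastic (i.i.d.) model for the constraints and objectives affect the achievable performance guarantees?
RQ4: Can the online algorithm be implemented with simple projections, without explicit knowledge of the common feasible set A?
RQ5: What are the virtual queue and regret bounds under the different modeling assumptions (deterministic vs. i.i.d.)?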

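Key Findings

Under the deterministic Slater condition and the common subset assumption, the online algorithm achieves an O(ε) approximation with O(1/ε^2) convergence time.
When the constraint process is i.i.d. and the objective functions vary arbitrarily, the algorithm achieves an O(ε) approximation in expectation with O(1/ε^2) convergence time.
When both the objective and the constraint functions are i.i.d., the algorithm comes within ε of the best causal policy that knows the distribution.
The method requires no knowledge of Lagrange multipliers and uses a per-slot projection onto X that is straightforward to implement.
The queue bounds scale as O(V) under a suitable parameter choice, and the analysis ties together the drift, the queue lengths, and the objective and constraint performance.
The convergence time matches the lower bound for the unconstrained case, which indicates a near-optimal convergence rate in the online setting.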
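
The link between queue length and constraint performance can be seen from the queue update alone. The short derivation below uses the update rule as written in this review and only the facts that max{a, 0} ≥ a and Q_i(1) ≥ 0. It is a sketch of the standard virtual-queue argument, and the paper's own handling of the linearization term is not reproduced here.

$$
Q_i(t+1) \ge Q_i(t) + g_{t-1,i}(X_{t-1}) + g'_{t-1,i}(X_{t-1})^T (X_t - X_{t-1})
$$

$$
\Longrightarrow \quad
\sum_{t=1}^{T} \left[ g_{t-1,i}(X_{t-1}) + g'_{t-1,i}(X_{t-1})^T (X_t - X_{t-1}) \right]
\le Q_i(T+1) - Q_i(1) \le Q_i(T+1).
$$

An O(V) bound on the queues therefore limits the accumulated linearized constraint value to O(V), which is O(V/T) per slot on average.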

This review was generated by AI and checked by a human editor.