QUICK REVIEW

[Paper Review] Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

Mehrdad Mahdavi, Rong Jin|arXiv (Cornell University)|Nov 25, 2011

Advanced Bandit Algorithms Research | References: 25 | Citations: 167

Summary at a glance

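This paper proposes a new online convex optimization framework that improves computational efficiency by relaxing the requirement of strict feasibility in every round and instead requiring the constraints to hold only in the long run. The problem is reformulated as a convex-concave optimization problem and solved with a modified online gradient descent method that includes a penalty term, achieving O(√T) regret and O(T^{3/4}) constraint violation. For linear constraints, Mirror Prox gives O(T^{2/3}) bounds on both regret and constraint violation.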

ABSTRACT

In this paper we propose a framework for solving constrained online convex optimization problem. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set $\mathcal{K}$ from which the decisions are made. While for simple shapes (e.g. Euclidean ball) the projection is straightforward, for arbitrary complex sets this is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring decisions belong to $\mathcal{K}$ for all rounds, we only require that the constraints which define the set $\mathcal{K}$ be satisfied in the long run. We show that our framework can be utilized to solve a relaxed version of online learning with side constraints addressed in \cite{DBLP:conf/colt/MannorT06} and \cite{DBLP:conf/aaai/KvetonYTM08}. By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm which achieves $\tilde{\mathcal{O}}(\sqrt{T})$ regret bound and $\tilde{\mathcal{O}}(T^{3/4})$ bound for the violation of constraints. Then we modify the algorithm in order to guarantee that the constraints are satisfied in the long run. This gain is achieved at the price of getting $\tilde{\mathcal{O}}(T^{3/4})$ regret bound. Our second algorithm is based on the Mirror Prox method \citep{nemirovski-2005-prox} to solve variational inequalities which achieves $\tilde{\mathcal{O}}(T^{2/3})$ bound for both regret and the violation of constraints when the domain $\mathcal{K}$ can be described by a finite number of linear constraints. Finally, we extend the result to the setting where we only have partial access to the convex set $\mathcal{K}$ and propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm.
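
The relaxed problem described in the abstract can be written out in symbols. The block below is a sketch in commonly used notation: the constraint functions $g_i$, their number $m$, the dimension $d$, the losses $f_t$, and the decisions $x_t$ are names assumed here for illustration and do not appear in the abstract itself.

```latex
% Assumed description of the domain by m convex constraints
\mathcal{K} = \{\, x \in \mathbb{R}^d : g_i(x) \le 0,\ i = 1, \dots, m \,\}

% Regret against the best fixed feasible decision in hindsight
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)

% Long-term constraint: individual rounds may violate g_i(x_t) <= 0,
% but the accumulated violation is required to stay small
\sum_{t=1}^{T} g_i(x_t) \le 0, \qquad i = 1, \dots, m

% Guarantees stated in the abstract for the first algorithm
\mathrm{Regret}_T = \tilde{\mathcal{O}}(\sqrt{T}), \qquad
\sum_{t=1}^{T} g_i(x_t) = \tilde{\mathcal{O}}(T^{3/4})
```

Under this reading, an algorithm may leave $\mathcal{K}$ in individual rounds as long as the summed constraint values grow sublinearly in $T$.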

Motivation and objectives

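Reduce the computational bottleneck of projecting onto complex convex sets in online convex optimization.
Relax the requirement that every decision belong to the feasible set in every round, and instead guarantee that the constraints are satisfied in the long run.
Develop algorithms that achieve sublinear regret while keeping constraint violation small over time.
Extend the framework to partial-feedback settings (e.g. bandit feedback) while preserving good bounds.
Provide theoretical guarantees on regret and constraint violation for both general and linearly constrained domains.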

Proposed method

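Reformulate the constrained online optimization problem as a convex-concave saddle-point problem by introducing a penalty term based on constraint violation.
Apply online gradient descent to a modified loss that penalizes positive constraint violation, so that the updates avoid projecting onto $\mathcal{K}$.
Use a penalty function parameterized by δ to balance regret against constraint violation; tuning δ yields the desired trade-off.
Derive the regret and violation bounds as functions of the problem parameters, using Jensen's inequality and bounds on the gradient norms.
When the constraints are linear, apply the Mirror Prox method adapted to variational inequalities, achieving O(T^{2/3}) bounds on both regret and violation.
Extend to bandit feedback with a multipoint estimation strategy, keeping bounds in expectation that match the full-information case.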
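
The first steps of the method can be sketched as a primal-dual online gradient descent. This is a minimal illustration, not the authors' reference implementation: it assumes a single constraint g(x) <= 0, a convex-concave function of the form L_t(x, λ) = f_t(x) + λ g(x) - (δη/2) λ², and a Euclidean ball as the simple set onto which projection is cheap. The names `primal_dual_ogd`, `project_ball`, `loss_grads`, `g`, and `g_grad` are hypothetical.

```python
import math


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}, available in closed form."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def primal_dual_ogd(loss_grads, g, g_grad, dim, radius, eta, delta):
    """Sketch of penalized primal-dual online gradient descent.

    loss_grads: list of callables; loss_grads[t](x) returns the gradient of f_t at x
    g, g_grad:  one constraint g(x) <= 0 and its gradient (assumed interface)
    eta, delta: step size and penalty parameter (illustrative values, not tuned)
    Returns the played decisions x_t and the multipliers lambda_t.
    """
    x = [0.0] * dim
    lam = 0.0
    iterates, multipliers = [], []
    for grad_f in loss_grads:
        iterates.append(x)
        multipliers.append(lam)
        gf, gg, gx = grad_f(x), g_grad(x), g(x)
        # Primal descent on L_t(x, lam) = f_t(x) + lam*g(x) - (delta*eta/2)*lam^2.
        # Only the cheap ball projection is used, never a projection onto K.
        x_new = project_ball(
            [xi - eta * (gfi + lam * ggi) for xi, gfi, ggi in zip(x, gf, gg)],
            radius,
        )
        # Dual ascent in lam, then projection onto lam >= 0.
        lam = max(0.0, lam + eta * (gx - delta * eta * lam))
        x = x_new
    return iterates, multipliers


# Illustrative run: linear losses f_t(x) = -(x0 + x1), constraint x0 + x1 <= 1.
T = 500
xs, lams = primal_dual_ogd(
    loss_grads=[lambda x: [-1.0, -1.0]] * T,
    g=lambda x: x[0] + x[1] - 1.0,
    g_grad=lambda x: [1.0, 1.0],
    dim=2, radius=2.0, eta=1.0 / math.sqrt(T), delta=4.0,
)
```

The multiplier grows while the constraint is violated and pushes the next decisions back toward the feasible region, and the quadratic term in λ keeps the multiplier from growing without bound. The step size and δ used above are placeholders; the choices that give the stated rates come from the paper's analysis.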

Results

Research questions

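RQ1: Can the computational efficiency of online convex optimization be improved by relaxing per-round feasibility in favor of long-term feasibility?
RQ2: How does the trade-off between regret and constraint violation change when a penalty-based method is used instead of projection?
RQ3: Can an O(√T) regret bound be kept while avoiding projection and achieving sublinear constraint violation?
RQ4: How do the bounds change when the constraints are linear and Mirror Prox is applied?
RQ5: Can the algorithm be extended to the partial-information bandit feedback setting while keeping good regret and violation bounds?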

Key findings

The proposed algorithm achieves O(√T) regret and O(T^{3/4}) constraint violation, a favorable trade-off between performance and computational cost.
By adjusting the penalty parameter δ, the modified algorithm guarantees that the constraints are satisfied in the long run, at the price of an O(T^{3/4}) regret bound.
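For linearly constrained domains, the Mirror Prox based algorithm achieves O(T^{2/3}) bounds on both regret and constraint violation, improving the violation bound of the general case.
Setting the penalty parameter γ to bT^{-1/3} with b = 2√F guarantees that, for sufficiently large T, the long-term constraint violation is zero.
The bandit extension achieves bounds in expectation that match the full-information case, so the guarantees carry over to partial feedback.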
The theoretical analysis derives the regret and violation bounds under the stated assumptions, with explicit dependence on the problem parameters G, D, R, and F.
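
The multipoint bandit extension relies on estimating a gradient from function values alone. The sketch below shows a standard two-point estimator from the bandit convex optimization literature; the paper's exact construction may differ, and the names `two_point_gradient`, `random_unit_vector`, and `smoothing` are hypothetical. The smoothing radius here is unrelated to the penalty parameter δ above.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform direction on the unit sphere, sampled as a normalized Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def two_point_gradient(g, x, smoothing, rng):
    """Estimate the gradient of g at x from two function evaluations.

    Queries g at x + smoothing*u and x - smoothing*u for a random unit
    direction u and returns dim * (difference / (2*smoothing)) * u.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    g_plus = g([xi + smoothing * ui for xi, ui in zip(x, u)])
    g_minus = g([xi - smoothing * ui for xi, ui in zip(x, u)])
    scale = dim * (g_plus - g_minus) / (2.0 * smoothing)
    return [scale * ui for ui in u]


# Illustrative check on a linear constraint g(x) = a.x - 1, whose gradient is a.
a = [1.0, 2.0, 3.0]
g = lambda x: sum(ai * xi for ai, xi in zip(a, x)) - 1.0
rng = random.Random(0)
n = 20000
avg = [0.0, 0.0, 0.0]
for _ in range(n):
    est = two_point_gradient(g, [0.0, 0.0, 0.0], 0.1, rng)
    avg = [s + e / n for s, e in zip(avg, est)]
# avg approximates a, because E[u u^T] = I / dim for a uniform unit vector u.
```

A single estimate is noisy, but its expectation equals the gradient of a smoothed version of g (and the exact gradient when g is linear), which is what allows the bounds to hold in expectation.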

This review was generated by AI and checked by a human editor.