QUICK REVIEW

[Paper Review] A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Mingyi Hong, Hoi To Wai|arXiv (Cornell University)|Jul 10, 2020

Adaptive Dynamic Programming Control|References: 62|Cited by: 52

One-Line Summary

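Introduces a two-timescale stochastic approximation (TTSA) algorithm for bilevel optimization, derives its convergence rates when the inner problem is unconstrained and strongly convex and the outer objective is smooth, and gives the convergence rate of TTSA when it is applied to two-timescale natural actor-critic policy optimization.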

ABSTRACT

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp. weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp. $\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.
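The problem class and the single-loop iteration described in the abstract can be written compactly as follows. This is a sketch in partly assumed notation: $h_g^k$ (the inner stochastic gradient), $\mathcal{P}_X$ (Euclidean projection onto $X$), the unbiasedness of $h_g^k$, and the ordering in which the $x$-step reads the freshly updated $y^{k+1}$ are illustrative choices, while $\alpha_k$, $\beta_k$, $h_f^k$ and $\overline{\nabla}_x f$ follow the symbols used elsewhere in this review.

```latex
\begin{aligned}
&\min_{x \in X \subseteq \mathbb{R}^{d_1}} \ \ell(x) := f\big(x, y^\star(x)\big)
  \quad \text{s.t.} \quad y^\star(x) = \arg\min_{y \in \mathbb{R}^{d_2}} g(x, y), \\[4pt]
&y^{k+1} = y^{k} - \beta_k\, h_g^{k}, \qquad
  \mathbb{E}\big[h_g^{k}\big] = \nabla_y g(x^{k}, y^{k}), \\
&x^{k+1} = \mathcal{P}_X\big(x^{k} - \alpha_k\, h_f^{k}\big), \qquad
  \mathbb{E}\big[h_f^{k}\big] \approx \overline{\nabla}_x f(x^{k}, y^{k+1}), \\
&\alpha_k \ll \beta_k \quad \text{(the outer variable moves on the slower timescale).}
\end{aligned}
```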

Motivation and Objectives

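Motivate and formalize bilevel optimization in which the inner problem is strongly convex and the outer objective is smooth.
Propose a single-loop TTSA algorithm that updates the inner and outer variables on different timescales.
Establish convergence rates of TTSA for strongly convex, convex, and weakly convex outer objectives.
Construct a surrogate gradient of the outer objective from the inner solution using implicit differentiation.
Demonstrate an application to reinforcement learning through a two-timescale natural actor-critic PPO framework.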
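The implicit-differentiation step listed above can be spelled out in four lines. This is the standard argument, given here as a sketch under stated assumptions: $f$ and $g$ are twice continuously differentiable, $\nabla y^\star(x) \in \mathbb{R}^{d_2 \times d_1}$ denotes the Jacobian of the inner solution, $\nabla_{xy}^2 g = (\nabla_{yx}^2 g)^\top \in \mathbb{R}^{d_1 \times d_2}$, and every derivative is evaluated at $(x, y^\star(x))$.

```latex
\begin{aligned}
&\nabla_y g\big(x, y^\star(x)\big) = 0 \quad \text{for all } x
  && \text{(optimality of the inner problem)} \\
&\nabla_{yx}^2 g + \nabla_{yy}^2 g \, \nabla y^\star(x) = 0
  && \text{(differentiate in } x\text{)} \\
&\nabla y^\star(x) = -\big[\nabla_{yy}^2 g\big]^{-1} \nabla_{yx}^2 g
  && (\nabla_{yy}^2 g \succ 0 \text{ by strong convexity}) \\
&\nabla \ell(x) = \nabla_x f + \nabla y^\star(x)^{\top} \nabla_y f
  = \nabla_x f - \nabla_{xy}^2 g \big[\nabla_{yy}^2 g\big]^{-1} \nabla_y f
  && \text{(chain rule, } \nabla_{yy}^2 g \text{ symmetric)}
\end{aligned}
```

Evaluating the same expression at an arbitrary $y$ instead of $y^\star(x)$ gives the surrogate $\overline{\nabla}_x f(x, y)$, so $\nabla \ell(x) = \overline{\nabla}_x f(x, y^\star(x))$.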

Proposed Method

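Formulate TTSA around a surrogate for the outer-objective gradient that is evaluated at the current inner iterate y: $\overline{\nabla}_x f(x,y) = \nabla_x f(x,y) - \nabla_{xy}^2 g(x,y)\,[\nabla_{yy}^2 g(x,y)]^{-1}\,\nabla_y f(x,y)$. The inner variable y is updated with a larger step size than x so that it tracks changes in x.
Use this y-based surrogate $\overline{\nabla}_x f$ in place of the true outer gradient, which would require the exact inner solution $y^\star(x)$; the two coincide when $y = y^\star(x)$.
Use stochastic estimates of the gradients and of the Hessian/Jacobian matrices whose bias and variance are controlled (Assumptions 3 and 7).
Exploit the strong convexity of the inner problem to build, from random samples, an outer-gradient estimator $h_f^k$ that approximates $\overline{\nabla}_x f$.
Establish convergence rates for the outer and inner iterates by analyzing coupled inequalities together with the tracking error $\Delta_y^k$, which measures how far the inner iterate is from the optimal inner solution.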
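To make the single-loop, two-timescale structure concrete, here is a minimal sketch on a hypothetical quadratic bilevel problem, chosen because $y^\star(x)$ and the surrogate gradient have closed forms. The problem data (`Q`, `A`, `b`, `lam`, the box radius `R`), the Gaussian gradient noise, and the step-size constants are assumptions for illustration, not the paper's setup; only the exponents $\alpha_k \propto k^{-1}$ and $\beta_k \propto k^{-2/3}$ follow the strongly convex row of the results table.

```python
import numpy as np


def ttsa_quadratic(K=20000, noise=0.1, seed=0):
    """Single-loop TTSA on a hypothetical quadratic bilevel problem.

    inner: g(x, y) = 0.5 * y^T Q y - y^T A x   (Q SPD, so strongly convex in y)
    outer: f(x, y) = 0.5 * ||y - b||^2 + 0.5 * lam * ||x||^2, x in X = [-R, R]^d1
    Here y*(x) = Q^{-1} A x, grad_xy^2 g = -A^T and grad_yy^2 g = Q.
    """
    rng = np.random.default_rng(seed)
    d1, d2, lam, R = 3, 4, 0.5, 10.0
    U, _ = np.linalg.qr(rng.standard_normal((d2, d2)))
    Q = U @ np.diag(rng.uniform(1.0, 2.0, d2)) @ U.T  # eigenvalues in [1, 2]
    A = rng.standard_normal((d2, d1)) / np.sqrt(d2)
    b = rng.standard_normal(d2)
    Q_inv = np.linalg.inv(Q)

    x, y = np.zeros(d1), np.zeros(d2)
    for k in range(K):
        alpha = 2.0 / (k + 20)           # outer step, O(k^-1): slow timescale
        beta = (k + 20) ** (-2.0 / 3.0)  # inner step, O(k^-2/3): fast timescale
        # Inner update: noisy gradient of g with respect to y.
        h_g = Q @ y - A @ x + noise * rng.standard_normal(d2)
        y = y - beta * h_g
        # Outer update: noisy surrogate gradient at the fresh y, then projection.
        # grad_x f - grad_xy^2 g [grad_yy^2 g]^{-1} grad_y f = lam*x + A^T Q^{-1} (y - b)
        h_f = lam * x + A.T @ Q_inv @ (y - b) + noise * rng.standard_normal(d1)
        x = np.clip(x - alpha * h_f, -R, R)  # projection onto the box X

    # Unconstrained minimizer of 0.5*||M x - b||^2 + 0.5*lam*||x||^2 with M = Q^{-1} A
    # (equal to the constrained solution whenever it lies inside the box).
    M = Q_inv @ A
    x_star = np.linalg.solve(M.T @ M + lam * np.eye(d1), M.T @ b)
    return x, y, x_star, M


x, y, x_star, M = ttsa_quadratic()
print("outer error:", np.linalg.norm(x - x_star))
print("tracking error:", np.linalg.norm(y - M @ x))
```

Because y is updated first and the outer step reads the fresh iterate, a single loop suffices: no inner loop has to solve for $y^\star(x)$ to high accuracy before each x-step.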

Results

Research Questions

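RQ1: Can a single-loop TTSA algorithm converge on bilevel problems whose inner problem is strongly convex and whose outer objective is smooth?
RQ2: What are the convergence rates of TTSA when the outer objective is strongly convex, convex, or weakly convex?
RQ3: How do the two-timescale dynamics affect the tracking error and overall convergence?
RQ4: Can TTSA be applied effectively to reinforcement learning frameworks such as natural actor-critic?
RQ5: Which surrogate-gradient formulation makes computing the outer-objective gradient practical in TTSA?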

Key Findings

Outer objective ℓ(x)	Constraint	Step sizes (α_k, β_k)	Rate (Outer)	Rate (Inner)
Strongly convex (SC)	X ⊆ ℝ^{d_1}	O(k^{-1}), O(k^{-2/3})	O(K_max^{-2/3}) †	O(K_max^{-2/3}) *
Convex (C)	X ⊆ ℝ^{d_1}	O(K_max^{-3/4}), O(K_max^{-1/2})	O(K_max^{-1/4}) Ψ	O(K_max^{-1/2}) *
Weakly convex (WC)	X ⊆ ℝ^{d_1}	O(K_max^{-3/5}), O(K_max^{-2/5})	O(K_max^{-2/5}) #	O(K_max^{-2/5}) *
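As a rough illustration of the schedules in the table, the hypothetical helper below returns (α_k, β_k) with the listed orders. The leading constants (all set to 1) and the use of k + 1 in the strongly convex case are assumptions, since the table gives only orders of magnitude. It also shows that the ratio α_k/β_k, which measures how strongly the two timescales are separated, goes to zero in every row.

```python
def ttsa_step_sizes(setting, k, K_max):
    """Hypothetical helper: (alpha_k, beta_k) with the orders listed in the table.

    All leading constants are set to 1 (an assumption; the table gives only orders),
    and k + 1 is used in the SC case so that k = 0 is well defined.
    """
    if setting == "SC":  # diminishing steps, depend on the iteration index k
        return (k + 1) ** -1.0, (k + 1) ** (-2.0 / 3.0)
    if setting == "C":   # constant steps, depend on the horizon K_max
        return K_max ** (-3.0 / 4.0), K_max ** (-1.0 / 2.0)
    if setting == "WC":  # constant steps, depend on the horizon K_max
        return K_max ** (-3.0 / 5.0), K_max ** (-2.0 / 5.0)
    raise ValueError(f"unknown setting: {setting}")


# Timescale ratio alpha/beta: SC (k+1)^(-1/3), C K_max^(-1/4), WC K_max^(-1/5).
for setting, k, K_max in [("SC", 7, None), ("C", 0, 10**4), ("WC", 0, 10**5)]:
    alpha, beta = ttsa_step_sizes(setting, k, K_max)
    print(setting, alpha, beta, alpha / beta)
```

The convex and weakly convex rows use constant step sizes tied to the horizon K_max, so the run length must be fixed in advance, whereas the strongly convex row decays with k.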

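For a strongly convex outer objective, TTSA with diminishing step sizes achieves O(K_max^{-2/3})-optimality.
For a weakly convex outer objective, TTSA achieves O(K_max^{-2/5})-stationarity.
For a convex outer objective, suitably chosen step sizes give an outer rate of O(K_max^{-1/4}) and an inner rate of O(K_max^{-1/2}).
The surrogate gradient based on implicit differentiation yields an estimate that is not exactly unbiased but has controlled bias and variance.
Applied to two-timescale natural actor-critic PPO, TTSA converges at a rate of O(K^{-1/4}) in terms of the gap in expected discounted reward relative to a globally optimal policy.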
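The finding that the surrogate-gradient estimate has controlled bias and variance can be illustrated with one common way of approximating the inverse-Hessian-vector product $[\nabla_{yy}^2 g]^{-1}\nabla_y f$ inside $\overline{\nabla}_x f$ from samples: a randomly truncated Neumann series. This is a sketch under assumptions: the review does not say which estimator the paper uses, and the function names, the curvature bound `L`, the truncation level `t`, and the sampler `sample_H` are hypothetical. Strong convexity ($\mu > 0$) is what makes the series converge, so the bias decays geometrically in `t`, while the $t/L$ scaling of the random-truncation estimator means its variance grows with `t`.

```python
import numpy as np


def neumann_inv_hvp(H, v, L, t):
    """Deterministic truncated Neumann series (1/L) * sum_{i<t} (I - H/L)^i v.

    For SPD H with eigenvalues in [mu, L], mu > 0, this approaches H^{-1} v and
    the truncation error is at most (1/mu) * (1 - mu/L)^t * ||v||.
    """
    out = np.zeros(len(v))
    term = np.asarray(v, dtype=float).copy()
    for _ in range(t):
        out += term
        term = term - H @ term / L
    return out / L


def stochastic_inv_hvp(sample_H, v, L, t, rng):
    """Random-truncation estimator (t/L) * prod_{i=1}^{p} (I - H_i/L) v, p ~ Uniform{0..t-1}.

    With independent, unbiased Hessian samples H_i, its mean equals
    neumann_inv_hvp(E[H_i], v, L, t): unbiased for the truncated series and
    biased for H^{-1} v only through the truncation term.
    """
    p = rng.integers(t)  # uniform over {0, ..., t-1}
    w = np.asarray(v, dtype=float).copy()
    for _ in range(p):
        w = w - sample_H(rng) @ w / L
    return t * w / L


H = np.diag([1.0, 4.0])          # mu = 1, L = 4
v = np.ones(2)
exact = np.linalg.solve(H, v)
for t in (5, 10, 20, 40):
    print(t, np.linalg.norm(neumann_inv_hvp(H, v, 4.0, t) - exact))
```

With H = diag(1, 4), L = 4 and v = (1, 1), the deterministic truncation error is exactly 0.75^t, all of it in the first coordinate, matching the geometric factor (1 - μ/L)^t.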


This review was generated by AI and checked by a human editor.