QUICK REVIEW

[Paper Review] A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Mingyi Hong, Hoi To Wai|arXiv (Cornell University)|2020. 07. 10.

Adaptive Dynamic Programming Control | References: 62 | Citations: 52

One-Line Summary

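The paper introduces a two-timescale stochastic approximation (TTSA) algorithm for bilevel optimization with an unconstrained, strongly convex inner problem and a smooth outer objective, derives its convergence rates, and applies TTSA to a two-timescale natural actor-critic policy optimization algorithm, for which it shows an $\mathcal{O}(K^{-1/4})$ convergence rate.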

ABSTRACT

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp.~weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp.~$\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.

Motivation and Objectives

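Motivate and formalize bilevel optimization in which the inner problem is strongly convex and the outer objective is smooth.
Propose a single-loop TTSA algorithm that updates the inner and outer variables on different timescales.
Establish convergence rates of TTSA for strongly convex, convex, and weakly convex outer objectives.
Construct a surrogate gradient of the outer objective through the inner solution, using implicit differentiation.
Demonstrate an application to reinforcement learning through a two-timescale natural actor-critic PPO framework.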
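The implicit-differentiation step behind the surrogate gradient can be spelled out in a few lines. This is a sketch under the assumptions that $g$ is twice differentiable and strongly convex in $y$; the symbols $y^*(x) = \arg\min_y g(x,y)$, $\ell(x) = f(x, y^*(x))$, and $J(x) = \partial y^*(x) / \partial x$ are notation introduced here for illustration. Differentiating the inner optimality condition with respect to $x$ gives the Jacobian of the inner solution:

$$
\nabla_y g(x, y^*(x)) = 0
\;\Longrightarrow\;
\big(\nabla_{xy}^2 g\big)^{\top} + \nabla_{yy}^2 g \, J(x) = 0
\;\Longrightarrow\;
J(x) = -\big[\nabla_{yy}^2 g\big]^{-1} \big(\nabla_{xy}^2 g\big)^{\top}.
$$

The chain rule, together with the symmetry of $\nabla_{yy}^2 g$, then yields

$$
\nabla \ell(x) = \nabla_x f + J(x)^{\top} \nabla_y f
= \nabla_x f - \nabla_{xy}^2 g \, \big[\nabla_{yy}^2 g\big]^{-1} \nabla_y f,
$$

with every term evaluated at $(x, y^*(x))$. Replacing the unknown $y^*(x)$ by the current inner iterate $y$ gives a surrogate that can be computed during the iterations.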

Proposed Method

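Formalize TTSA, which updates y with a larger step size and x with a smaller step size, so that y tracks y*(x) while x changes.
Use a surrogate for the gradient of the outer objective evaluated at the current y, namely $\overline{\nabla}_x f(x,y) = \nabla_x f(x,y) - \nabla_{xy}^2 g(x,y)\,[\nabla_{yy}^2 g(x,y)]^{-1}\,\nabla_y f(x,y)$.
Assume stochastic gradient and Hessian/Jacobian estimates with controlled bias and variance (Assumptions 3 and 7).
Propose a gradient estimator $h_f^k$, built from random samples, that approximates $\overline{\nabla}_x f$ while exploiting the strong convexity of the inner problem.
Prove convergence rates for the outer and inner recursions by analyzing coupled inequalities that involve the tracking error $\Delta_y^k$.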
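The updates described above can be illustrated on a small toy problem. The sketch below is not the authors' implementation: the quadratic objectives, the matrices `A` and `B_VEC`, the ball constraint, the Gaussian noise model, and the step-size constants are all assumptions chosen here so that the exact solution is known in closed form. The inner problem is $g(x,y) = \tfrac12\|y - Ax\|^2$ (so $y^*(x) = Ax$) and the outer objective is $f(x,y) = \tfrac12\|y - b\|^2 + \tfrac{\lambda}{2}\|x\|^2$.

```python
import numpy as np

# Hypothetical toy data (not from the paper).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B_VEC = np.array([1.0, 2.0, 3.0])
LAM = 1.0
RADIUS = 10.0  # radius of the assumed constraint set X = {x : ||x|| <= RADIUS}


def surrogate_grad(grad_x_f, grad_y_f, hess_xy_g, hess_yy_g):
    """Return grad_x f - hess_xy g @ inv(hess_yy g) @ grad_y f."""
    return grad_x_f - hess_xy_g @ np.linalg.solve(hess_yy_g, grad_y_f)


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def ttsa_toy(num_iters=5000, noise=0.1, seed=0):
    """Run a two-timescale loop on the toy problem; returns (x, y, x_star)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for k in range(1, num_iters + 1):
        beta = 1.0 / (k + 1) ** (2.0 / 3.0)  # larger step for the inner variable
        alpha = 1.0 / (k + 2)                # smaller step for the outer variable
        # Inner update: noisy gradient of g with respect to y.
        grad_y_g = (y - A @ x) + noise * rng.standard_normal(m)
        y = y - beta * grad_y_g
        # Outer update: noisy surrogate gradient, then projection onto X.
        grad_x_f = LAM * x + noise * rng.standard_normal(n)
        grad_y_f = (y - B_VEC) + noise * rng.standard_normal(m)
        # For this g: hess_xy g = -A^T and hess_yy g = identity.
        h_f = surrogate_grad(grad_x_f, grad_y_f, -A.T, np.eye(m))
        x = project_ball(x - alpha * h_f, RADIUS)
    # Closed-form minimizer of f(x, A x), used only for comparison.
    x_star = np.linalg.solve(A.T @ A + LAM * np.eye(n), A.T @ B_VEC)
    return x, y, x_star


if __name__ == "__main__":
    x, y, x_star = ttsa_toy()
    print("outer error:", np.linalg.norm(x - x_star))
    print("tracking error:", np.linalg.norm(y - A @ x))
```

Because the inner Hessian is the identity here, the linear solve is trivial; in a general problem the inverse-Hessian term is the expensive part and would itself have to be estimated from samples.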

Results

Research Questions

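RQ1: Can a single-loop TTSA algorithm converge on bilevel problems with a strongly convex inner problem and a smooth outer objective?
RQ2: What are the convergence rates of TTSA when the outer objective is strongly convex, convex, or weakly convex?
RQ3: How do the two-timescale dynamics affect the tracking error and convergence in practice?
RQ4: Can TTSA be applied effectively to reinforcement learning frameworks such as actor-critic methods?
RQ5: Which form of surrogate gradient makes the gradient of the outer objective practical to compute within TTSA?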

Key Findings

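With diminishing step sizes, TTSA finds an $\mathcal{O}(K^{-2/3})$-optimal solution for a strongly convex outer objective, where $K$ is the total number of iterations.
For a weakly convex outer objective, TTSA finds an $\mathcal{O}(K^{-2/5})$-stationary solution.
For a convex outer objective, with suitably chosen step sizes, TTSA attains an $\mathcal{O}(K^{-1/4})$ rate for the outer problem and an $\mathcal{O}(K^{-1/2})$ rate for the inner problem.
The surrogate gradient obtained by implicit differentiation admits stochastic estimators that are unbiased or nearly unbiased, with bias and variance kept at a controlled level.
Applied to two-timescale natural actor-critic PPO, the framework converges at a rate of $\mathcal{O}(K^{-1/4})$, measured as the gap in expected discounted reward relative to a globally optimal policy.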
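To get a feel for what these exponents mean, a rate of $K^{-p}$ reaches accuracy $\epsilon$ after roughly $K \approx \epsilon^{-1/p}$ iterations. The short sketch below does this arithmetic; it ignores all hidden constants, and the rates measure different quantities (optimality gap, stationarity, reward gap), so the numbers are only indicative. The helper name `iterations_for_accuracy` is introduced here for illustration.

```python
def iterations_for_accuracy(eps, exponent):
    """Iteration count K solving K**(-exponent) == eps, constants ignored."""
    return eps ** (-1.0 / exponent)


# Exponents quoted in the findings above.
RATES = {
    "strongly convex outer, K^(-2/3)": 2.0 / 3.0,
    "weakly convex outer, K^(-2/5)": 2.0 / 5.0,
    "convex outer / actor-critic, K^(-1/4)": 1.0 / 4.0,
}

if __name__ == "__main__":
    for name, p in RATES.items():
        k = iterations_for_accuracy(1e-2, p)
        print(f"{name}: K ~ {k:.3g} for eps = 0.01")
```

The gap between the exponents compounds quickly: halving the target accuracy multiplies the required iterations by $2^{1/p}$, which is about 2.8 for $p = 2/3$ and 16 for $p = 1/4$.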


This review was generated by AI and reviewed by a human editor.