QUICK REVIEW

[Paper Review] Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Junchi Yang, Negar Kiyavash | arXiv (Cornell University) | 2020-02-22

Stochastic Gradient Optimization Techniques · References: 31 · Citations: 33

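One-line summary

This paper shows that, under a two-sided Polyak-Łojasiewicz (PL) condition, alternating gradient descent ascent (AGDA) and its stochastic variant (Stoc-AGDA) converge globally on nonconvex-nonconcave minimax problems. It also proposes a variance-reduced method, VR-AGDA, that has a provably faster rate in the finite-sum setting.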

ABSTRACT

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.

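Motivation and Goals

Establish global convergence for nonconvex-nonconcave minimax problems without assuming convex-concave structure.
Identify a practical condition (two-sided PL) that guarantees global convergence of AGDA and Stoc-AGDA.
Develop and analyze a variance-reduced method, VR-AGDA, that achieves faster rates on finite-sum minimax problems.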

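Proposed Method

Generalize the Polyak-Łojasiewicz (PL) condition to a two-sided PL condition for minimax objectives, with separate PL constants for x (μ_1) and y (μ_2).
Analyze AGDA and Stoc-AGDA, which first update x and then update y using the new x, and prove their convergence under two-sided PL.
Introduce a potential function that combines two optimality gaps, and use it to prove the convergence rates.
Build VR-AGDA by incorporating SVRG-style variance reduction into the alternating updates, and prove that it converges linearly under two-sided PL.
Show that VR-AGDA has better complexity than AGDA in the finite-sum setting.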
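Written out in standard notation, the objects in the list above look as follows. This is our own transcription, not a quote from the paper. The PL inequalities and the AGDA updates follow the usual definitions. The exact form of the potential, including the weight λ, is an assumption about how the two optimality gaps are combined.

```latex
% Two-sided PL condition with constants \mu_1, \mu_2 > 0, required for all x, y:
\|\nabla_x f(x,y)\|^2 \ge 2\mu_1 \Bigl[ f(x,y) - \min_{x'} f(x',y) \Bigr],
\qquad
\|\nabla_y f(x,y)\|^2 \ge 2\mu_2 \Bigl[ \max_{y'} f(x,y') - f(x,y) \Bigr]

% AGDA: the y-step is evaluated at the freshly updated x_{t+1}
x_{t+1} = x_t - \tau_1 \nabla_x f(x_t, y_t),
\qquad
y_{t+1} = y_t + \tau_2 \nabla_y f(x_{t+1}, y_t)

% Potential mixing two nonnegative optimality gaps, with \Phi(x) = \max_y f(x,y)
% (assumed form; lambda is left symbolic)
P_t = \Bigl[ \Phi(x_t) - \min_x \Phi(x) \Bigr] + \lambda \Bigl[ \Phi(x_t) - f(x_t, y_t) \Bigr],
\qquad \lambda > 0
```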
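To make the alternating update concrete, here is a minimal Python sketch of AGDA. It assumes a toy objective of our own choosing, f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4y^2 - 10 sin^2(y). This function is nonconvex in x (∂²f/∂x² = -4 at (π/2, π/2)) and nonconcave in y (∂²f/∂y² = 12 at (0, π/2)). Even so, for every fixed y the x-gradient has the sign of x, and for every fixed x the y-gradient has the opposite sign of y, so (0, 0) is its only stationary point. We have not computed its two-sided PL constants. The function names `grad_x`, `grad_y` and `agda` and the step sizes are hypothetical: the paper's analysis ties the step sizes to the smoothness and PL constants, whereas these were picked by hand for this toy.

```python
import math


def grad_x(x, y):
    """d/dx of the toy objective f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4 y^2 - 10 sin^2(y)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x) * math.sin(y) ** 2


def grad_y(x, y):
    """d/dy of the same toy objective."""
    return 3.0 * math.sin(x) ** 2 * math.sin(2.0 * y) - 8.0 * y - 10.0 * math.sin(2.0 * y)


def agda(x, y, tau1, tau2, steps):
    """Alternating gradient descent ascent with constant (hand-picked) step sizes."""
    for _ in range(steps):
        x = x - tau1 * grad_x(x, y)  # descent step on x, evaluated at (x_t, y_t)
        y = y + tau2 * grad_y(x, y)  # ascent step on y, evaluated at (x_{t+1}, y_t)
    return x, y


x_final, y_final = agda(3.0, -2.0, tau1=0.01, tau2=0.02, steps=5000)
print(x_final, y_final)  # both end up very close to the stationary point (0, 0)
```

With these step sizes, each x-step shrinks |x| by a factor of at most about 0.993 and each y-step shrinks |y| by a factor of at most about 0.93, whatever the other variable is. This is why the iterates settle at the unique stationary point rather than cycling.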

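Results

Research Questions

RQ1: Under the two-sided PL condition, do AGDA and Stoc-AGDA converge globally on nonconvex-nonconcave minimax problems?
RQ2: Does the variance-reduced variant (VR-AGDA) improve the convergence rate on finite-sum minimax problems without requiring convex-concave structure?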

Key Results

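Notation: n is the number of component functions in the finite sum, κ is the condition number, ε is the target accuracy, and μ_2 is the PL constant in y.

Algorithm	Complexity (to reach ε accuracy)
AGDA	O(n κ^3 log(1/ε))
Stoc-AGDA	O(κ^5/(μ_2 ε))
VR-AGDA (n≤κ^9)	O(n^{2/3} κ^3 log(1/ε))
VR-AGDA (n≥κ^9)	O((n+κ^9) log(1/ε))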
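Since the AGDA and VR-AGDA bounds share the log(1/ε) factor, comparing them reduces to arithmetic on the table's expressions. The small sketch below evaluates those expressions with all hidden constants dropped, so only ratios between rows are meaningful. The helper names are hypothetical. In the n ≤ κ^9 regime, the AGDA/VR-AGDA ratio is nκ^3 / (n^{2/3}κ^3) = n^{1/3}. For example, n = 10^6 and κ = 10 give a ratio of 100.

```python
def agda_cost(n, kappa):
    """AGDA row of the table, O(n * kappa^3), ignoring constants and log(1/eps)."""
    return n * kappa ** 3


def vr_agda_cost(n, kappa):
    """VR-AGDA rows of the table (split at n = kappa^9), ignoring constants and log(1/eps)."""
    if n <= kappa ** 9:
        return n ** (2 / 3) * kappa ** 3
    return n + kappa ** 9


for n in (10**4, 10**6, 10**10):
    ratio = agda_cost(n, 10) / vr_agda_cost(n, 10)
    print(f"n={n:.0e}, kappa=10: AGDA / VR-AGDA bound ratio ~ {ratio:.1f}")
```

Because κ^9 grows very quickly, the n ≥ κ^9 row only applies to extremely large datasets unless κ is small.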

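Under two-sided PL, AGDA with suitable constant step sizes converges globally to a saddle point at a linear rate.
With decreasing step sizes chosen to account for the stochastic gradient variance, Stoc-AGDA converges to a saddle point at a sublinear O(1/t) rate.
VR-AGDA improves on AGDA, with total complexity O(n^{2/3} κ^3 log(1/ε)) when n ≤ κ^9 and O((n+κ^9) log(1/ε)) when n ≥ κ^9.
Under two-sided PL, three optimality notions are equivalent: saddle point, global minimax point, and stationary point.
Experiments on robust least squares and imitation learning for LQR show VR-AGDA performing best, with the advantage most pronounced when the condition number is high.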
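The idea behind VR-AGDA can be illustrated with a simplified SVRG-style variant of alternating updates. This is our own sketch, not the paper's pseudocode, which may differ in mini-batching, epoch length, and step-size choices. At each epoch we take a snapshot, compute the full gradient there once, and then correct every single-component gradient with that snapshot. The finite sum, its coefficients, the step sizes, and all function names are hypothetical. Some components are individually nonconvex in x or not strongly concave in y. Their average, however, is strongly convex in x and strongly concave in y (mean coefficients A = C = 1), so it satisfies two-sided PL with a unique saddle point at (0, 0). This toy is therefore not an example of a nonconvex-nonconcave average.

```python
import random

# Hypothetical toy finite sum: f(x, y) = (1/n) * sum_i f_i(x, y), scalar x and y, with
# f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2.
A_COEF = [3.0, -1.0, 2.0, 0.0]  # mean 1.0; component 1 is concave in x
B_COEF = [1.0, 0.0, 0.5, 0.5]   # mean 0.5
C_COEF = [2.0, 1.0, 0.0, 1.0]   # mean 1.0; component 2 is linear (not strongly concave) in y


def grad_i(i, x, y):
    """Gradient (d/dx, d/dy) of the i-th component f_i."""
    return A_COEF[i] * x + B_COEF[i] * y, B_COEF[i] * x - C_COEF[i] * y


def full_grad(x, y):
    """Exact gradient of the average f (costs n component-gradient calls)."""
    n = len(A_COEF)
    grads = [grad_i(i, x, y) for i in range(n)]
    return sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n


def vr_agda(x, y, tau1, tau2, epochs, inner_steps, seed=0):
    """SVRG-style alternating descent ascent (simplified sketch, not the paper's exact method)."""
    rng = random.Random(seed)
    n = len(A_COEF)
    for _ in range(epochs):
        # Snapshot point and its full gradient: the variance-reduction anchor.
        xs, ys = x, y
        gxs, gys = full_grad(xs, ys)
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced x-gradient at (x_t, y_t), then a descent step.
            gx_i, _ = grad_i(i, x, y)
            gx_snap, _ = grad_i(i, xs, ys)
            x = x - tau1 * (gx_i - gx_snap + gxs)
            # Variance-reduced y-gradient at the updated x (alternating), then an ascent step.
            _, gy_i = grad_i(i, x, y)
            _, gy_snap = grad_i(i, xs, ys)
            y = y + tau2 * (gy_i - gy_snap + gys)
    return x, y


x_final, y_final = vr_agda(2.0, -1.0, tau1=0.02, tau2=0.02, epochs=500, inner_steps=4)
print(x_final, y_final)  # both close to the saddle point (0, 0)
```

The correction term vanishes as the iterate approaches the snapshot, so the stochastic noise shrinks along with the error. This is what lets an SVRG-style method keep a constant step size and still converge linearly, where plain stochastic updates would need decreasing steps.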


This review was generated by AI and reviewed by a human editor.