QUICK REVIEW

[Paper Digest] Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Junchi Yang, Negar Kiyavash | arXiv (Cornell University) | Feb 22, 2020

Stochastic Gradient Optimization Techniques | References: 31 | Cited by: 33

ONE-SENTENCE SUMMARY

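This paper shows that, under a two-sided Polyak-Łojasiewicz (PL) condition, alternating gradient descent ascent (AGDA) and stochastic AGDA converge globally on nonconvex-nonconcave minimax problems, and it proposes a variance-reduced AGDA (VR-AGDA) that converges faster in the finite-sum setting.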

ABSTRACT

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.

MOTIVATION AND OBJECTIVES

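Establish global convergence for nonconvex-nonconcave minimax problems without assuming a convex-concave structure.
Identify a practical condition (the two-sided PL condition) that guarantees global convergence of AGDA and Stoc-AGDA.
Develop and analyze a variance-reduced AGDA (VR-AGDA) for finite-sum minimax problems that attains an improved convergence rate.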

PROPOSED METHOD

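Generalizes the Polyak-Łojasiewicz (PL) condition to a two-sided PL condition for minimax objectives, with separate PL constants for x and y.
Considers AGDA and Stoc-AGDA, which update x and y alternately, and analyzes their convergence under the two-sided PL condition.
Introduces a potential function that combines the optimality gaps in order to prove the convergence rates.
Builds VR-AGDA by combining SVRG-style variance reduction with alternating updates, and proves linear convergence under the two-sided PL condition.
Gives complexity results showing that VR-AGDA improves on AGDA in the finite-sum setting.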
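
The alternating update can be illustrated with a minimal sketch. The toy objective f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4y^2 - 10 sin^2(y) is an assumption made for illustration: it is nonconvex in x, nonconcave in y, and its only stationary point is the saddle point (0, 0). The function names and the step sizes `tau1` and `tau2` are also hypothetical; they are hand-picked for this example and are not the step sizes from the paper's analysis.

```python
import math


def grad_x(x, y):
    # Partial derivative in x of the toy objective
    # f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4 y^2 - 10 sin^2(y).
    return 2.0 * x + 3.0 * math.sin(2.0 * x) * math.sin(y) ** 2


def grad_y(x, y):
    # Partial derivative in y of the same toy objective.
    return (3.0 * math.sin(x) ** 2 * math.sin(2.0 * y)
            - 8.0 * y - 10.0 * math.sin(2.0 * y))


def agda(x, y, tau1=0.1, tau2=0.03, steps=500):
    """Alternating gradient descent ascent with constant step sizes."""
    for _ in range(steps):
        x = x - tau1 * grad_x(x, y)  # descent step on x
        y = y + tau2 * grad_y(x, y)  # ascent step on y, using the updated x
    return x, y


# Start away from the saddle point and run the alternating updates.
x_star, y_star = agda(2.0, 1.5)
```

The only difference from simultaneous GDA is that the ascent step evaluates the gradient at the already-updated x. On this toy problem the iterates approach (0, 0).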

EXPERIMENTAL RESULTS

Research Questions

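RQ1: Under the two-sided PL condition, can AGDA and Stoc-AGDA achieve global convergence on nonconvex-nonconcave minimax problems?
RQ2: Without requiring a convex-concave structure, can a variance-reduced variant (VR-AGDA) improve the convergence rate on finite-sum minimax problems?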

Key Findings

Algorithm	Complexity
AGDA	O(n κ^3 log(1/ε))
Stoc-AGDA	O(κ^5 /(μ_2 ε))
VR-AGDA (n≤κ^9)	O(n^{2/3} κ^3 log(1/ε))
VR-AGDA (n≥κ^9)	O((n+κ^9) log(1/ε))
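
To get a feel for the table, the two finite-sum bounds can be compared with all constants and the shared log(1/ε) factor dropped. The helper names below are hypothetical, and the values are order-of-magnitude comparisons only, not measured costs.

```python
def agda_cost(n, kappa):
    # AGDA bound from the table, without constants or the log(1/eps) factor.
    return n * kappa ** 3


def vr_agda_cost(n, kappa):
    # VR-AGDA bound from the table: two regimes split at n = kappa^9.
    if n <= kappa ** 9:
        return n ** (2.0 / 3.0) * kappa ** 3
    return n + kappa ** 9


# Regime n <= kappa^9: the ratio AGDA / VR-AGDA equals n^(1/3).
ratio_ill_conditioned = agda_cost(10 ** 6, 10) / vr_agda_cost(10 ** 6, 10)

# Regime n >= kappa^9: the VR-AGDA bound is n + kappa^9, here close to n itself.
ratio_well_conditioned = agda_cost(10 ** 6, 2) / vr_agda_cost(10 ** 6, 2)
```

In the first regime the saving grows as n^(1/3); in the second the VR-AGDA bound is dominated by n, so the saving is roughly a factor of κ^3.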

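Under the two-sided PL condition, AGDA with suitably chosen step sizes converges globally to a saddle point at a linear rate.
Stoc-AGDA with diminishing step sizes converges to a saddle point at a sublinear O(1/t) rate, with the bound accounting for the variance of the stochastic gradients.
VR-AGDA has total complexity O((n+κ^9) log(1/ε)) when n≥κ^9 and O(n^{2/3} κ^3 log(1/ε)) when n≤κ^9, an improvement over the O(n κ^3 log(1/ε)) of AGDA.
Under the two-sided PL condition, three notions of optimality (saddle point, global minimax point, stationary point) are equivalent.
Experiments on robust least squares and imitation learning for LQR show superior performance for VR-AGDA, especially when the condition number is large.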
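
For reference, the two-sided PL condition and the three optimality notions mentioned above can be written out as follows. This is a sketch in assumed notation (f for the objective, μ_1 and μ_2 for the PL constants of x and y, (x^*, y^*) for the candidate point); the paper's own statement may differ in presentation.

```latex
% Two-sided PL condition: there exist \mu_1, \mu_2 > 0 such that, for all (x, y),
\|\nabla_x f(x,y)\|^2 \ge 2\mu_1 \Bigl[ f(x,y) - \min_{x'} f(x',y) \Bigr],
\qquad
\|\nabla_y f(x,y)\|^2 \ge 2\mu_2 \Bigl[ \max_{y'} f(x,y') - f(x,y) \Bigr].

% Saddle point:
f(x^*, y) \le f(x^*, y^*) \le f(x, y^*) \quad \text{for all } x, y.

% Global minimax point:
f(x^*, y) \le f(x^*, y^*) \le \max_{y'} f(x, y') \quad \text{for all } x, y.

% Stationary point:
\nabla_x f(x^*, y^*) = 0, \qquad \nabla_y f(x^*, y^*) = 0.
```

In general a stationary point need not be a saddle point; the finding above says that the two-sided PL condition removes that gap.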

This digest was generated by AI and reviewed by human editors.