QUICK REVIEW

[Paper Review] Efficient Algorithms for Smooth Minimax Optimization

Kiran Koshy Thekumparampil, Prateek Jain|arXiv (Cornell University)|Jul 2, 2019

Sparse and Compressive Sensing Techniques | Cited by 53

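One-Sentence Summary

By combining Mirror-Prox with Nesterov's accelerated gradient descent (AGD), the paper proposes an efficient first-order algorithm for smooth minimax optimization that attains a global convergence rate of $ \widetilde{O}(1/k^2) $ when $ g(\cdot,y) $ is strongly convex; an inexact proximal point method then extends it to the nonconvex setting, where stationary points are found at a rate of $ \widetilde{O}(1/k^{1/3}) $, improving on the previously best-known $ O(1/k^{1/5}) $.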

ABSTRACT

This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex $g(\cdot, y), \forall y$, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find global optimum in $\widetilde{O}\left(1/k^2\right)$ iterations, improving over current state-of-the-art rate of $O(1/k)$. We use this result along with an inexact proximal point method to provide $\widetilde{O}\left(1/k^{1/3}\right)$ rate for finding stationary points in the nonconvex setting where $g(\cdot, y)$ can be nonconvex. This improves over current best-known rate of $O(1/k^{1/5})$. Finally, we instantiate our result for finite nonconvex minimax problems, i.e., $\min_x \max_{1\leq i\leq m} f_i(x)$, with nonconvex $f_i(\cdot)$, to obtain convergence rate of $O(m^{1/3}\sqrt{\log m}/k^{1/3})$.
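To make the gap between the quoted nonconvex rates concrete, the arithmetic below counts how many iterations a rate of the form $1/k^{1/p}$ needs to reach a target accuracy. This is only a back-of-the-envelope sketch: it ignores all constants and logarithmic factors, and the function name `iterations_needed` and the target accuracy are assumptions chosen for illustration.

```python
def iterations_needed(inv_eps: int, p: int) -> int:
    """Smallest k with 1 / k**(1/p) <= 1 / inv_eps, which is k = inv_eps**p.

    Constants and log factors hidden in the O-notation are ignored.
    """
    return inv_eps ** p


inv_eps = 10  # hypothetical target accuracy eps = 1/10

k_new = iterations_needed(inv_eps, 3)  # rate 1/k^(1/3)
k_old = iterations_needed(inv_eps, 5)  # rate 1/k^(1/5)

print(k_new, k_old, k_old // k_new)  # prints: 1000 100000 100
```

Under these simplifications, each extra digit of accuracy multiplies the iteration count by $10^3$ at the new rate and by $10^5$ at the old one.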

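Motivation and Objectives

Develop faster first-order methods for smooth minimax problems $\min_x \max_y g(x,y)$, where $ g $ is smooth and $ g(x,\cdot) $ is concave for each $ x $.
Improve the convergence rate when $ g(\cdot,y) $ is strongly convex for every $ y $.
Carry the improved rate over to the nonconvex setting, where $ g(\cdot,y) $ may be nonconvex.
Apply the results to finite minimax problems $\min_x \max_{1\leq i\leq m} f_i(x)$ with nonconvex $ f_i $.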
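The finite problem fits the smooth minimax template through a standard reformulation over the probability simplex. The symbol $\Delta_m$ is notation introduced here for illustration:

$$\max_{1\leq i\leq m} f_i(x) = \max_{y\in\Delta_m} \sum_{i=1}^{m} y_i f_i(x), \qquad \Delta_m = \Big\{ y\in\mathbb{R}^m : y_i \geq 0,\ \sum_{i=1}^{m} y_i = 1 \Big\}.$$

The right-hand side is linear, hence concave, in $y$, so $g(x,y)=\sum_{i} y_i f_i(x)$ satisfies the concavity assumption even when every $f_i$ is nonconvex.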

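Proposed Method

For strongly convex $ g(\cdot,y) $, propose a new direct optimal algorithm that combines Mirror-Prox with Nesterov's AGD.
Use this algorithm as a subroutine inside an inexact proximal point method to handle nonconvex $ g(\cdot,y) $.
Establish the $ \widetilde{O}(1/k^2) $ rate in the strongly convex setting by exploiting acceleration together with mirror-descent principles.
Derive the $ \widetilde{O}(1/k^{1/3}) $ rate for finding stationary points in the nonconvex setting by carefully controlling the error within the inexact proximal point framework.
Apply the general framework to finite minimax problems $ \min_x \max_{1\leq i\leq m} f_i(x) $ with nonconvex $ f_i $, obtaining a rate of $ O(m^{1/3}\sqrt{\log m}/k^{1/3}) $.
Derive tight complexity bounds under the assumptions that $ g $ is smooth, $ g(x,\cdot) $ is concave, and, in the first setting, $ g(\cdot,y) $ is strongly convex.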
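For readers unfamiliar with Mirror-Prox, the sketch below shows its Euclidean special case (the extragradient method) on a toy problem. It is not the paper's combined Mirror-Prox/AGD algorithm and does not achieve the accelerated rate; the objective, step size, starting point, and function names are all assumptions chosen for illustration.

```python
# Toy saddle problem (an assumption for illustration, not from the paper):
#   g(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2
# g is strongly convex in x, concave in y, and its saddle point is (0, 0).

def grad_x(x, y):
    """Partial derivative of g with respect to x."""
    return x + y


def grad_y(x, y):
    """Partial derivative of g with respect to y."""
    return x - y


def extragradient(x, y, step=0.3, iters=100):
    """Euclidean Mirror-Prox (extragradient): descend in x, ascend in y."""
    for _ in range(iters):
        # Look-ahead step taken from the current point.
        x_half = x - step * grad_x(x, y)
        y_half = y + step * grad_y(x, y)
        # The real update starts from the current point again,
        # but uses the gradients evaluated at the look-ahead point.
        x, y = (x - step * grad_x(x_half, y_half),
                y + step * grad_y(x_half, y_half))
    return x, y


x_final, y_final = extragradient(1.0, -2.0)
print(x_final, y_final)  # both values end up very close to the saddle point 0
```

The look-ahead gradient is what distinguishes this scheme from plain simultaneous gradient descent-ascent, which can cycle or diverge on bilinear coupling terms such as $x \cdot y$.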

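Results

Research Questions

RQ1: When $ g(\cdot,y) $ is strongly convex, can smooth minimax problems be solved at a rate faster than $ O(1/k) $?
RQ2: Can acceleration methods from convex optimization be carried over to the minimax setting to obtain better rates?
RQ3: What is the best achievable convergence rate for finding stationary points of nonconvex smooth minimax problems?
RQ4: For finite minimax problems with nonconvex components, how does the method scale with the number of functions $ m $?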

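Key Findings

For smooth minimax problems with strongly convex $ g(\cdot,y) $, the proposed algorithm attains a global convergence rate of $ \widetilde{O}(1/k^2) $, improving on the previous $ O(1/k) $ rate.
By combining the new algorithm with an inexact proximal point method, the paper attains a $ \widetilde{O}(1/k^{1/3}) $ rate for finding stationary points in the nonconvex setting, improving on the previous $ O(1/k^{1/5}) $ rate.
For finite nonconvex minimax problems $ \min_x \max_{1\leq i\leq m} f_i(x) $, the method attains a convergence rate of $ O(m^{1/3}\sqrt{\log m}/k^{1/3}) $.
The results show that, under the smoothness and concavity assumptions, acceleration techniques from convex optimization can be adapted effectively to the minimax setting.
By carefully balancing the approximation error across the inexact proximal point iterations, the analysis establishes tight complexity bounds.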
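A minimal sketch of why the strongly convex solver can serve as a subroutine in the nonconvex setting, assuming $g$ is $L$-smooth. The regularization weight $L$ and the iterate name $x_t$ are illustrative choices here and may differ from the paper's:

$$\tilde g_t(x,y) = g(x,y) + L\,\|x - x_t\|^2, \qquad x_{t+1} \approx \arg\min_x \max_y \tilde g_t(x,y).$$

Since $L$-smoothness bounds the curvature of $g(\cdot,y)$ below by $-L$, the added quadratic (curvature $2L$) makes $\tilde g_t(\cdot,y)$ $L$-strongly convex while leaving concavity in $y$ untouched. Each outer step is therefore a strongly-convex-concave problem that only needs to be solved approximately.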


This review was generated by AI and checked by a human editor.