QUICK REVIEW

[Paper Review] Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Panayotis Mertikopoulos, Bruno Lecouat|arXiv (Cornell University)|Jul 7, 2018

Advanced Numerical Methods in Computational Mathematics | References: 3 | Cited by: 111

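One-sentence summary

This paper analyzes mirror descent for non-monotone saddle-point problems under a condition it calls coherence, introduces optimistic mirror descent (OMD) with an extra-gradient step, proves convergence results, and validates the gains empirically on GANs and related models.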

ABSTRACT

Owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around convex-concave (or even linear) problems; however, making theoretical inroads towards efficient GAN training depends crucially on moving beyond this classic framework. To make piecemeal progress along these lines, we analyze the behavior of mirror descent (MD) in a class of non-monotone problems whose solutions coincide with those of a naturally associated variational inequality - a property which we call coherence. We first show that ordinary, "vanilla" MD converges under a strict version of this condition, but not otherwise; in particular, it may fail to converge even in bilinear models with a unique solution. We then show that this deficiency is mitigated by optimism: by taking an "extra-gradient" step, optimistic mirror descent (OMD) converges in all coherent problems. Our analysis generalizes and extends the results of Daskalakis et al. (2018) for optimistic gradient descent (OGD) in bilinear problems, and makes concrete headway for establishing convergence beyond convex-concave games. We also provide stochastic analogues of these results, and we validate our analysis by numerical experiments in a wide array of GAN models (including Gaussian mixture models, as well as the CelebA and CIFAR-10 datasets).

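Motivation and objectives

Motivate and analyze the limitations of vanilla mirror descent in non-monotone saddle-point problems that are coherent with an associated variational inequality.
Introduce optimistic mirror descent (OMD), which adds an extra-gradient step, to stabilize the iterates and ensure convergence.
Establish convergence guarantees for OMD under strict coherence and in stochastic settings.
Provide stochastic analogues of these results and validate the theory with GAN experiments on several datasets.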

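Proposed method

Model the saddle-point problem through a differentiable objective f of the variable x = (x1, x2).
Define the gradient field g(x) = (∇x1 f(x1, x2), -∇x2 f(x1, x2)) and study its coherence with the associated variational inequality (VI).
Use a distance-generating function h to define the Bregman divergence and the prox-mapping used in mirror descent (MD).
Show that under null coherence, vanilla MD may diverge or cycle even when the step size tends to 0.
Introduce optimistic mirror descent (OMD) by adding an extra-gradient step: first compute an intermediate point x+, then update using g(x+).
Give convergence results: under coherence, the Bregman distance D(x*, Xn) of the OMD iterates decreases monotonically; in the stochastic, strictly coherent setting, OMD converges almost surely; corollaries follow for bilinear and convex-concave problems.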
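The objects named in this list can be written out in their standard forms, as a rough sketch rather than a quotation of the paper. The step size \gamma_n, the feasible set \mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2, and the half-step index n+1/2 (for the intermediate point x+) are notation assumed here; the summary itself does not spell them out.

```latex
% Saddle-point problem and its gradient field
\min_{x_1 \in \mathcal{X}_1} \max_{x_2 \in \mathcal{X}_2} f(x_1, x_2),
\qquad
g(x) = \bigl(\nabla_{x_1} f(x_1, x_2),\; -\nabla_{x_2} f(x_1, x_2)\bigr)

% Associated variational inequality: find x^* such that
\langle g(x),\, x - x^{*} \rangle \ge 0 \quad \text{for all } x \in \mathcal{X}

% Bregman divergence induced by the distance-generating function h
D(p, x) = h(p) - h(x) - \langle \nabla h(x),\, p - x \rangle

% Prox-mapping
P_{x}(y) = \arg\min_{x' \in \mathcal{X}} \bigl\{ \langle y,\, x - x' \rangle + D(x', x) \bigr\}

% Mirror descent (MD)
X_{n+1} = P_{X_n}\bigl(-\gamma_n\, g(X_n)\bigr)

% Optimistic mirror descent (OMD): extra-gradient step, then update from X_n
X_{n+1/2} = P_{X_n}\bigl(-\gamma_n\, g(X_n)\bigr),
\qquad
X_{n+1} = P_{X_n}\bigl(-\gamma_n\, g(X_{n+1/2})\bigr)
```

With the Euclidean choice h(x) = \tfrac{1}{2}\|x\|^2 and no constraints, the prox-mapping reduces to P_x(y) = x + y, so MD becomes plain gradient descent-ascent and OMD becomes the classical extra-gradient update.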

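Experimental results

Research questions

RQ1: In non-monotone, coherent saddle-point problems, when does vanilla mirror descent converge?
RQ2: Can an extra-gradient (optimistic) step stabilize MD so that convergence is guaranteed under both coherence and strict coherence?
RQ3: What convergence guarantees does OMD have in stochastic saddle-point problems?
RQ4: Do the theoretical gains translate into practical improvements in GAN training and in other non-convex, bilinear, or multimodal settings?
RQ5: How does the type of coherence (strict vs. null) affect the behavior of MD and OMD?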

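Key findings

In null-coherent problems (for example, bilinear ones), vanilla MD may fail to converge or may cycle, even when the problem has a unique solution.
OMD with an extra-gradient step is guaranteed to converge in all coherent problems, including null-coherent ones, and the Bregman distance to the solution decreases monotonically.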
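In strictly coherent problems, OMD converges almost surely to a saddle point in the stochastic setting, and the Bregman distance to the solution decreases monotonically.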
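For bilinear problems, OMD is guaranteed to converge monotonically, whereas vanilla MD may diverge.
Experiments show that adding an extra-gradient step to Adam or RMSProp reduces cycling and oscillation in GAN training and improves the Inception score and Fréchet distance on CelebA and CIFAR-10.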
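The bilinear contrast between vanilla MD and OMD can be reproduced in a few lines. The sketch below is a minimal illustration, not the paper's experimental code: it assumes the toy objective f(x1, x2) = x1 * x2 with unique solution x* = (0, 0), the Euclidean setup h(x) = ||x||^2 / 2 (so MD is plain gradient descent-ascent), an unconstrained domain, and a constant step size. The function names `run_md` and `run_omd` are hypothetical.

```python
import numpy as np


def g(x):
    """Gradient field (df/dx1, -df/dx2) for the toy objective f(x1, x2) = x1 * x2."""
    x1, x2 = x
    return np.array([x2, -x1])


def bregman_to_origin(x):
    """Euclidean Bregman distance D(x*, x) = ||x* - x||^2 / 2 with x* = (0, 0)."""
    return 0.5 * float(np.dot(x, x))


def run_md(x0, step, n_iters):
    """Vanilla MD with h = ||x||^2 / 2 (assumed): x <- x - step * g(x)."""
    x = np.array(x0, dtype=float)
    dist = [bregman_to_origin(x)]
    for _ in range(n_iters):
        x = x - step * g(x)
        dist.append(bregman_to_origin(x))
    return x, np.array(dist)


def run_omd(x0, step, n_iters):
    """OMD / extra-gradient: take a leading step, then update the base point."""
    x = np.array(x0, dtype=float)
    dist = [bregman_to_origin(x)]
    for _ in range(n_iters):
        x_lead = x - step * g(x)   # intermediate point (the extra-gradient step)
        x = x - step * g(x_lead)   # update from the base point using g(x_lead)
        dist.append(bregman_to_origin(x))
    return x, np.array(dist)


x_md, d_md = run_md([1.0, 1.0], step=0.1, n_iters=200)
x_omd, d_omd = run_omd([1.0, 1.0], step=0.1, n_iters=200)

print("MD  distance to solution: start", d_md[0], "end", d_md[-1])
print("OMD distance to solution: start", d_omd[0], "end", d_omd[-1])
```

For this particular toy problem the behavior can be checked by hand: each MD step multiplies the squared norm by 1 + step^2, so the iterates spiral outward, while each OMD step multiplies it by 1 - step^2 + step^4, which is below 1 for any step size in (0, 1), so the distance to the solution shrinks at every iteration.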

This review was generated by AI and checked by human editors.