QUICK REVIEW

[Paper Review] A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Mingrui Liu, Wei Zhang|arXiv (Cornell University)|Oct 28, 2019

Human Pose and Action Recognition | References: 101 | Cited by: 34

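One-Sentence Summary

This paper proposes a gradient-based decentralized parallel algorithm (DPOSG) for training GANs that handles nonconvex-nonconcave min-max problems, comes with provable non-asymptotic convergence, and shows empirical speedups over centralized training.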

ABSTRACT

Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs to either directly communicate with the central node or indirectly communicate with all other workers in every iteration. However, when the network bandwidth is low or network latency is high, the performance would be significantly degraded. Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner. The main difficulty lies at handling the nonconvex-nonconcave min-max optimization and the decentralized communication simultaneously. In this paper, we address this difficulty by designing the first gradient-based decentralized parallel algorithm which allows workers to have multiple rounds of communications in one iteration and to update the discriminator and generator simultaneously, and this design makes it amenable for the convergence analysis of the proposed decentralized algorithm. Theoretically, our proposed decentralized algorithm is able to solve a class of non-convex non-concave min-max problems with provable non-asymptotic convergence to first-order stationary point. Experimental results on GANs demonstrate the effectiveness of the proposed algorithm.

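Research Motivation and Objectives

Motivate and address the bottleneck of centralized GAN training on large-scale, bandwidth-limited, or high-latency networks.
Propose a gradient-based decentralized parallel algorithm for the nonconvex-nonconcave min-max problem of GAN training.
Provide non-asymptotic convergence guarantees and analyze communication efficiency.
Demonstrate empirical speedups on GAN benchmarks with decentralized communication.
Examine how multiple rounds of local communication and simultaneous updates support convergence and scalability.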

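Proposed Method

Design a Decentralized Parallel Optimistic Stochastic Gradient (DPOSG) algorithm that updates the generator and discriminator simultaneously.
Allow multiple rounds of communication with local neighbors in each iteration, modeling the network topology with a doubly stochastic mixing matrix W.
The update rule extends optimistic stochastic gradient to the decentralized setting with two update sequences, applying t rounds of local averaging.
Provide a theoretical analysis showing non-asymptotic convergence to an epsilon-first-order stationary point under standard assumptions.
Introduce a random mixing strategy to reduce the effective spectral gap and improve practical performance.
Run experiments with Adam variants (DP-OAdam, Rand-DP-OAdam) on WGAN-GP/CIFAR-10 and Self-Attention GAN/ImageNet, comparing against the centralized CP-OAdam.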
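
The update pattern described above can be illustrated with a minimal NumPy sketch. It is one plausible reading of the summary, not the authors' reference implementation. Everything concrete here is an assumption made for illustration: the toy objective f(u, v) = u * v, the ring topology with weights 1/3, the Gaussian gradient noise, and the names ring_mixing_matrix, stochastic_operator, and dposg_toy. Each worker keeps two sequences (X and Z), mixes X with its neighbors through t rounds of averaging (W^t), and reuses the previous stochastic gradient for the optimistic step, so the generator-like variable u and discriminator-like variable v are updated together.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic W for a ring: weight 1/3 on self and each neighbor."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] += 1.0 / 3.0
    return W


def stochastic_operator(X, rng, sigma):
    """Noisy g(u, v) = (df/du, -df/dv) for the toy objective f(u, v) = u * v."""
    G = np.stack([X[:, 1], -X[:, 0]], axis=1)
    return G + sigma * rng.standard_normal(X.shape)


def dposg_toy(m=8, t=2, eta=0.1, steps=2000, sigma=0.05, seed=0):
    """Run the two-sequence optimistic update with t gossip rounds per iteration."""
    rng = np.random.default_rng(seed)
    Wt = np.linalg.matrix_power(ring_mixing_matrix(m), t)  # t rounds of averaging
    X = np.ones((m, 2)) + 0.5 * rng.standard_normal((m, 2))  # one row (u, v) per worker
    Z = X.copy()
    G_prev = stochastic_operator(Z, rng, sigma)
    for _ in range(steps):
        mixed = Wt @ X                      # neighbor averaging of the X sequence
        Z = mixed - eta * G_prev            # optimistic step reuses the last gradient
        G_prev = stochastic_operator(Z, rng, sigma)  # one new gradient per iteration
        X = mixed - eta * G_prev            # u and v are updated simultaneously
    x_bar = X.mean(axis=0)
    consensus_err = np.linalg.norm(X - x_bar)
    return x_bar, consensus_err


if __name__ == "__main__":
    x_bar, err = dposg_toy()
    print("average iterate:", x_bar, "consensus error:", err)
```

On this toy problem the only stationary point is (0, 0), so the average iterate should drift toward the origin while the workers stay close to one another. A real GAN would replace stochastic_operator with minibatch gradients of the generator and discriminator losses.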

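Experimental Results

Research Questions

RQ1: Can GAN training be carried out effectively on a decentralized network without a central parameter server?
RQ2: Does a gradient-based decentralized algorithm converge non-asymptotically to a first-order stationary point for nonconvex-nonconcave min-max GAN objectives?
RQ3: What are the communication and computation complexities of decentralized GAN optimization compared with centralized methods?
RQ4: Do decentralized variants with random mixing deliver empirical speedups on standard GAN benchmarks?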

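Key Findings

DPOSG achieves non-asymptotic convergence to an epsilon-first-order stationary point under standard assumptions.
Under certain conditions, given t rounds of local communication, the algorithm's communication complexity on the busiest node is logarithmic.
Empirically, on CIFAR-10 (WGAN-GP) and ImageNet (Self-Attention GAN), the decentralized variants beat centralized training in wall-clock time and scale as nodes are added.
Random mixing (Rand-DP-OAdam) gives a further speedup over DP-OAdam, matching or exceeding the centralized optimizer per epoch while reducing run time.
Experiments in a high-latency cloud environment show a significant run-time advantage for decentralized GAN training.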
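
To give some intuition for why a modest number of local communication rounds can suffice, the sketch below uses a standard fact about symmetric doubly stochastic matrices: t rounds of averaging shrink the disagreement between workers by at least rho^t, where rho is the second-largest eigenvalue magnitude of W. The number of rounds needed to reach a tolerance therefore grows only logarithmically in 1/tolerance. This is not the paper's complexity bound. The ring topology and the helper names (ring_mixing_matrix, second_largest_eig_magnitude, rounds_needed, consensus_error_after) are assumptions chosen for illustration.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic W for a ring: weight 1/3 on self and each neighbor."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] += 1.0 / 3.0
    return W


def second_largest_eig_magnitude(W):
    """rho: second-largest |eigenvalue| of a symmetric mixing matrix."""
    eig = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return float(eig[-2])


def rounds_needed(W, tol):
    """Smallest t with rho**t <= tol, i.e. ceil(log(tol) / log(rho))."""
    rho = second_largest_eig_magnitude(W)
    if rho < 1e-12:  # e.g. a fully connected graph averages in one round
        return 1
    return int(np.ceil(np.log(tol) / np.log(rho)))


def consensus_error_after(W, x, t):
    """Distance of the mixed worker states from their common average after t rounds."""
    x_bar = x.mean(axis=0)
    mixed = np.linalg.matrix_power(W, t) @ x
    return float(np.linalg.norm(mixed - x_bar))


if __name__ == "__main__":
    W = ring_mixing_matrix(8)
    x = np.random.default_rng(0).standard_normal((8, 4))  # 8 workers, 4 parameters each
    for tol in (1e-1, 1e-2, 1e-3):
        t = rounds_needed(W, tol)
        print(tol, t, consensus_error_after(W, x, t))
```

A better-connected topology has a smaller rho and needs fewer rounds, which is the trade-off between per-iteration communication and consensus quality that the t parameter controls.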


This review was generated by AI and reviewed by human editors.