QUICK REVIEW

[Paper Review] Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums

Hadrien Hendrikx, Francis Bach|arXiv (Cornell University)|Jan 28, 2019

Stochastic Gradient Optimization Techniques | References: 37 | Cited by: 22

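One-Sentence Summary

This paper proposes ADFS, a decentralized, asynchronous, and accelerated stochastic gradient method for minimizing strongly convex finite sums over a network of nodes. It converges linearly, obtains the expected $O(\sqrt{m})$ speed-up over distributed batch methods, and matches the best known rates for single-machine finite-sum optimization, while scaling efficiently in distributed settings with low communication overhead.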

ABSTRACT

In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes. We propose the decentralized and asynchronous algorithm ADFS to tackle the case when local functions are themselves finite sums with $m$ components. ADFS converges linearly when local functions are smooth, and matches the rates of the best known finite sum algorithms when executed on a single machine. On several machines, ADFS enjoys a $O (\sqrt{n})$ or $O(n)$ speed-up depending on the leading complexity term as long as the diameter of the network is not too big with respect to $m$. This also leads to a $\sqrt{m}$ speed-up over state-of-the-art distributed batch methods, which is the expected speed-up for finite sum algorithms. In terms of communication times and network parameters, ADFS scales as well as optimal distributed batch algorithms. As a side contribution, we give a generalized version of the accelerated proximal coordinate gradient algorithm using arbitrary sampling that we apply to a well-chosen dual problem to derive ADFS. Yet, ADFS uses primal proximal updates that only require solving one-dimensional problems for many standard machine learning applications. Finally, ADFS can be formulated for non-smooth objectives with equally good scaling properties. We illustrate the improvement of ADFS over state-of-the-art approaches with simulations.

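Motivation and Objectives

Address the challenge of minimizing a sum of strongly convex functions that is split across the nodes of a network when the data volume is large.
Bridge the gap between stochastic optimization and distributed optimization by supporting asynchronous, decentralized, and accelerated updates.
Match, in the distributed setting, the convergence rates of the best single-machine finite-sum algorithms while still scaling efficiently.
Provide a method that retains strong convergence guarantees and communication efficiency on networks of moderate diameter.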
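For reference, the problem described above can be written as follows. The symbols $x$, $d$, $f_i$, and $f_{i,j}$ are notational assumptions made here for illustration; only $n$ (number of nodes) and $m$ (components per local function) come from the abstract.

$$
\min_{x \in \mathbb{R}^d} \; \sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \sum_{j=1}^{m} f_{i,j}(x),
$$

where node $i$ holds the strongly convex local function $f_i$, itself a finite sum of $m$ components.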

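Proposed Method

ADFS is derived by applying an accelerated proximal coordinate gradient algorithm with arbitrary sampling to a well-chosen dual problem, which yields updates expressed in the primal.
Updates are asynchronous and decentralized: each node communicates only with its neighbors, which avoids the bottleneck of a central server.
For many standard machine learning problems, the proximal updates reduce to one-dimensional problems, which lowers the per-iteration computational cost.
Computation and communication are balanced by tuning the ratio of computation steps to communication steps according to network and computation parameters.
The convergence rate follows from a generalized analysis of the accelerated proximal method under arbitrary sampling, and includes bounds in terms of the spectral gap and the mixing time.
The method applies to both smooth and non-smooth objectives and retains strong scalability in both cases.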
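To illustrate why a proximal update can reduce to a one-dimensional problem, here is a minimal sketch for a logistic loss on a linear model. It is not the ADFS update itself: the function names (`prox_logistic`, `logistic_grad`, `dot`), the choice of logistic loss, and the bisection solver are all assumptions made for this example. The idea is that the proximal operator of $x \mapsto \eta\,\ell(a^\top x)$ at $v$ has the form $v - \eta\,\ell'(s)\,a$, where the scalar $s = a^\top x$ solves $s - a^\top v + \eta\|a\|^2 \ell'(s) = 0$.

```python
import math

def dot(u, w):
    """Inner product of two equal-length sequences."""
    return sum(ui * wi for ui, wi in zip(u, w))

def logistic_grad(s, y):
    """Derivative of log(1 + exp(-y * s)) with respect to s, for y in {-1, +1}."""
    return -y / (1.0 + math.exp(y * s))

def prox_logistic(v, a, y, eta, iters=80):
    """Hypothetical helper: prox of x -> eta * log(1 + exp(-y * a.x)) at v.

    The d-dimensional problem collapses to finding one scalar s = a.x.
    """
    c = dot(a, v)
    k = eta * dot(a, a)
    # phi(s) = s - c + k * loss'(s) is strictly increasing and |loss'| <= 1,
    # so its unique root lies in [c - k, c + k]; bisection finds it.
    lo, hi = c - k, c + k
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - c + k * logistic_grad(mid, y) > 0.0:
            hi = mid
        else:
            lo = mid
    s = 0.5 * (lo + hi)
    g = logistic_grad(s, y)
    # Map the scalar solution back to the full vector.
    return [vi - eta * g * ai for vi, ai in zip(v, a)]

# Usage with made-up data: one feature vector a, label y, step size eta.
v = [0.5, -1.0, 2.0]
a = [1.0, 0.3, -0.7]
x = prox_logistic(v, a, y=1.0, eta=0.7)
```

Bisection is used here only because it is guaranteed to converge for this monotone scalar equation; any one-dimensional root finder would serve the same purpose.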

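Experimental Results

Research Questions

RQ1: Can we design a decentralized, asynchronous, and accelerated stochastic gradient method that converges linearly on strongly convex finite sums?
RQ2: Does ADFS achieve the expected $O(\sqrt{m})$ speed-up over batch methods in the distributed setting?
RQ3: How does ADFS scale with the network diameter, the communication delay $\tau$, and the mixing time $\gamma^{-1}$?
RQ4: Can the algorithm maintain fast convergence and low communication overhead under asynchrony and partial updates?
RQ5: Does the method apply to non-smooth objectives while retaining favorable convergence and scalability?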

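Key Findings

ADFS converges linearly when the strongly convex local functions are smooth, and matches the best known rates for single-machine finite-sum optimization.
On several machines, ADFS obtains an $O(\sqrt{n})$ or $O(n)$ speed-up, depending on the leading complexity term, provided the network diameter is not too large relative to $m$.
ADFS obtains a $\sqrt{m}$ speed-up over state-of-the-art distributed batch methods, which is the expected speed-up for finite-sum algorithms.
In terms of communication times and network parameters, the algorithm scales as well as optimal distributed batch algorithms.
For non-smooth objectives, ADFS can be formulated with scaling properties as good as in the smooth case.
Simulations show that ADFS improves on state-of-the-art approaches in convergence speed and scalability.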


This review was generated by AI and reviewed by human editors.