QUICK REVIEW

[Paper Review] Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman, Francis Bach|arXiv (Cornell University)|Jun 1, 2018

Distributed Control Multi-Agent Systems|References: 11|Cited by: 79

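Summary in Brief

This paper derives optimal convergence rates for non-smooth distributed convex optimization under two regularity assumptions: Lipschitz continuity of the global objective and Lipschitz continuity of the local functions. It proposes MSPD as an optimal decentralized algorithm under local regularity and DRS, a smoothing-based algorithm, under global regularity, together with lower bounds that MSPD matches and that DRS reaches up to a dimension-dependent factor of $d^{1/4}$.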

ABSTRACT

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

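Motivation and Objectives

Advance the distributed optimization of non-smooth convex objectives over a network of computing units.
Derive optimal convergence rates under two regularity assumptions: global Lipschitz continuity and local Lipschitz continuity.
Provide algorithms that attain these rates: MSPD under local regularity and DRS under global regularity.
Establish lower bounds that demonstrate the optimality of the proposed methods, and discuss the trade-off between communication and computation.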

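Proposed Method

Model the problem as minimizing the average of local convex functions over a strongly connected graph.
Under local regularity, reformulate the problem as a saddle-point problem in the dual variables, and design the multi-step primal-dual (MSPD) algorithm with accelerated gossip to reach the optimal rate.
Under global regularity, apply distributed randomized smoothing (DRS), which is based on Gaussian smoothing of the objective, to obtain a fast communication rate, and analyze its convergence.
Prove lower bounds that match the MSPD rate under local regularity, and show that DRS is within a $d^{1/4}$ factor of optimal under global regularity.
Extend the decentralized method with Chebyshev acceleration so that MSPD reaches the optimal communication rate.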
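To illustrate the smoothing idea behind DRS, here is a minimal sketch of Gaussian randomized smoothing. It is not the paper's full algorithm. It estimates the gradient of the smoothed surrogate $f^{\sigma}(x) = \mathbb{E}[f(x + \sigma Z)]$, with $Z \sim \mathcal{N}(0, I)$, by averaging subgradients of $f$ at randomly perturbed points. The function names, the symbol `sigma`, the toy objective $\|x\|_1$, and the sample count are all assumptions chosen for illustration.

```python
import numpy as np

def smoothed_gradient(subgrad, x, sigma, num_samples, rng):
    """Monte Carlo estimate of the gradient of f_sigma(x) = E[f(x + sigma * Z)].

    `subgrad` returns a subgradient of the non-smooth function f at a point.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    d = x.shape[0]
    # One Gaussian perturbation per row.
    Z = rng.standard_normal((num_samples, d))
    # Average the subgradients of f evaluated at the perturbed points.
    grads = np.stack([subgrad(x + sigma * z) for z in Z])
    return grads.mean(axis=0)

def l1_subgrad(x):
    # A subgradient of the toy non-smooth objective f(x) = ||x||_1.
    return np.sign(x)

rng = np.random.default_rng(0)
x = np.array([0.0, 2.0, -3.0])
g = smoothed_gradient(l1_subgrad, x, sigma=0.1, num_samples=1000, rng=rng)
```

Far from the kink at 0, the estimate coincides with the ordinary subgradient. At the kink, the averaged signs come out close to 0, which is the smoothing effect that makes accelerated methods applicable.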

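Results

Research Questions

RQ1 Under global Lipschitz regularity, what is the optimal convergence rate for non-smooth distributed optimization?
RQ2 Under local Lipschitz regularity, what is the optimal convergence rate, and can we design algorithms that attain it?
RQ3 How do network topology and communication affect the rates in non-smooth distributed optimization?
RQ4 Can smoothing techniques yield dimension-dependent but near-optimal rates in the distributed setting?
RQ5 What are the fundamental lower bounds on computation and communication in decentralized non-smooth optimization?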

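Main Findings

Under global regularity, DRS reaches an approximation error of at most $\varepsilon$ in time $O\big(\frac{RL_g}{\varepsilon}(\Delta\tau+1)\,d^{1/4} + \big(\frac{RL_g}{\varepsilon}\big)^2\big)$.
Under local regularity, MSPD is optimal and reaches an $\varepsilon$-approximation in time $O\big(\frac{RL_\ell}{\varepsilon}\cdot\frac{\tau}{\sqrt{\gamma(W)}} + \big(\frac{RL_\ell}{\varepsilon}\big)^2\big)$.
Under local regularity, the dominant error term comes from local computation and is in $O(1/\sqrt{t})$, while the communication error decays as $O(1/t)$.
The lower bounds show that DRS is optimal in computation time and that its communication term is within a $d^{1/4}$ factor of optimal.
MSPD attains the optimal convergence rate by combining accelerated gossip with primal-dual updates.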
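The accelerated gossip ingredient can be sketched in isolation. The code below is a generic Chebyshev-accelerated averaging routine. It is a standard technique and an assumption-laden illustration, not the paper's exact MSPD subroutine. It assumes a symmetric positive semi-definite gossip matrix `W` (here the Laplacian of a 10-node ring, chosen arbitrarily) whose kernel is the set of constant vectors. The helper names are hypothetical, and the eigenvalues are computed centrally only for convenience.

```python
import numpy as np

def mixing_matrix(W):
    """Return M = I - c * W and the contraction factor rho.

    M has eigenvalue 1 on constant vectors and eigenvalues in [-rho, rho]
    on the rest. Assumes a connected graph with at least two distinct
    nonzero eigenvalues, so that 0 < rho < 1.
    """
    eigvals = np.linalg.eigvalsh(W)              # ascending order
    lam_min, lam_max = eigvals[1], eigvals[-1]   # smallest nonzero, largest
    M = np.eye(W.shape[0]) - 2.0 / (lam_max + lam_min) * W
    rho = (lam_max - lam_min) / (lam_max + lam_min)
    return M, rho

def chebyshev_gossip(M, rho, x, num_rounds):
    """Apply q_K(M) x with q_K(z) = T_K(z / rho) / T_K(1 / rho)."""
    if num_rounds == 0:
        return x.copy()
    # Three-term Chebyshev recurrence, run on the scalar normaliser a_k
    # and on the vector y_k at the same time.
    a_prev, a_curr = 1.0, 1.0 / rho
    y_prev, y_curr = x, (M @ x) / rho
    for _ in range(num_rounds - 1):
        a_prev, a_curr = a_curr, (2.0 / rho) * a_curr - a_prev
        y_prev, y_curr = y_curr, (2.0 / rho) * (M @ y_curr) - y_prev
    return y_curr / a_curr

# Ring graph on n nodes (illustrative topology).
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
W = np.diag(A.sum(axis=1)) - A

M, rho = mixing_matrix(W)
x = np.arange(n, dtype=float)
K = 10
x_cheb = chebyshev_gossip(M, rho, x, K)
x_plain = np.linalg.matrix_power(M, K) @ x       # K rounds of plain gossip

err_cheb = np.linalg.norm(x_cheb - x.mean())
err_plain = np.linalg.norm(x_plain - x.mean())
```

Each round costs one multiplication by `M`, that is, one exchange with neighbors. For the same number of rounds, the Chebyshev polynomial leaves a much smaller consensus error than plain repeated averaging, which is why the network dependence of accelerated methods scales with $1/\sqrt{\gamma(W)}$ rather than $1/\gamma(W)$.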

This review was generated by AI and reviewed by human editors.