QUICK REVIEW

[Paper Review] Accelerated, Parallel and Proximal Coordinate Descent

Olivier Fercoq, Peter Richtárik|arXiv (Cornell University)|Dec 20, 2013

Topic: Sparse and Compressive Sensing Techniques | References: 13 | Cited by: 44

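One-sentence summary

This paper proposes APPROX, the first stochastic coordinate descent method that is simultaneously accelerated, parallel, and proximal, and shows that it converges at the rate O(1/k²). It relies on new safe large stepsizes that yield an improved expected separable overapproximation (ESO), and it can be implemented without full-dimensional vector operations, which makes it well suited to large-scale convex problems with sparse structure.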

ABSTRACT

We propose a new stochastic coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of existing accelerated coordinate descent methods. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.
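
To make the rate in the abstract concrete, the following minimal Python sketch evaluates the bound and inverts it to estimate how many iterations suffice for a target accuracy. The function names and any example values are illustrative assumptions, not taken from the paper, and the bound applies only to the special case where the number of processors equals the number of coordinates.

```python
import math


def approx_rate_bound(omega_bar, L_bar, R, k):
    """Bound 2 * omega_bar * L_bar * R^2 / (k + 1)^2 after k iterations."""
    return 2.0 * omega_bar * L_bar * R ** 2 / (k + 1) ** 2


def iterations_for_accuracy(omega_bar, L_bar, R, eps):
    """Smallest k for which the bound above is at most eps.

    Solving 2 * omega_bar * L_bar * R^2 / (k + 1)^2 <= eps for k gives
    k >= sqrt(2 * omega_bar * L_bar * R^2 / eps) - 1.
    """
    k = math.sqrt(2.0 * omega_bar * L_bar * R ** 2 / eps) - 1.0
    return max(0, math.ceil(k))
```

Because the bound decays as 1/(k+1)², reducing the target accuracy by a factor of 100 increases the required iteration count by only about a factor of 10.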

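Motivation and objectives

Develop a coordinate descent method for large-scale convex optimization that combines acceleration, parallelism, and proximal updates.
Address the limitation that existing methods lack one or more of these three properties; in particular, no accelerated proximal method was available for the parallel setting.
Design safe stepsizes that are larger than those of earlier methods by relying on the average rather than the maximum degree of separability.
Eliminate the need for full-dimensional vector operations, the major bottleneck of existing accelerated coordinate descent methods.
Demonstrate convergence speed and scalability on real datasets such as kddb and Malicious URL.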

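Proposed method

Proposes APPROX, a stochastic coordinate descent algorithm that integrates Nesterov-style acceleration, parallel updates, and the proximal operator of a block-separable regularizer.
Derives new stepsizes that lead to an improved expected separable overapproximation (ESO), governed by the average degree of separability (ω̄) rather than the maximum (ω), so the stepsizes are larger while remaining safe.
Designs a three-stage update with auxiliary variables (x, y, u, z) that incorporates the momentum and proximal steps without full-dimensional vector operations.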
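Selects a random subset of blocks at each iteration with a uniform sampling; the expected number of blocks updated per iteration trades off convergence speed against computational cost.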
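Exploits sparsity and separability to implement the method without expensive full-dimensional vector operations, which lets it scale to large problems.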
Sets the stepsizes from the ESO parameters so that convergence is guaranteed while per-iteration progress stays large.
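
The steps above can be sketched for one concrete problem. The code below is a minimal illustration for the Lasso objective ½‖Ax − b‖² + λ‖x‖₁, not the authors' implementation. It assumes a sampling that picks τ coordinates uniformly at random without replacement, and it assumes the ESO stepsize parameters for this quadratic loss take the form v_i = Σ_j (1 + (ω_j − 1)(τ − 1)/max(1, n − 1)) A_ji², where ω_j is the number of nonzeros in row j. All function and variable names are hypothetical. This is the straightforward form that still combines full vectors; it does not show the efficient variant.

```python
import numpy as np


def soft_threshold(t, a):
    """Proximal operator of a * |.|, applied elementwise."""
    return np.sign(t) * np.maximum(np.abs(t) - a, 0.0)


def lasso_objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def eso_stepsizes(A, tau):
    """Assumed ESO parameters v_i built from per-row nonzero counts."""
    n = A.shape[1]
    omega_rows = np.count_nonzero(A, axis=1)
    beta_rows = 1.0 + (omega_rows - 1.0) * (tau - 1.0) / max(1, n - 1)
    return (beta_rows[:, None] * A ** 2).sum(axis=0)


def approx_lasso(A, b, lam, tau, iters, seed=0):
    """Accelerated parallel proximal coordinate descent sketch for Lasso.

    Assumes every column of A has at least one nonzero, so all v_i > 0.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    v = eso_stepsizes(A, tau)
    x = np.zeros(n)
    z = np.zeros(n)
    theta = tau / n
    for _ in range(iters):
        # Momentum step: extrapolate between x and z.
        y = (1.0 - theta) * x + theta * z
        # Uniform sampling of tau distinct coordinates.
        S = rng.choice(n, size=tau, replace=False)
        grad_S = A[:, S].T @ (A @ y - b)
        # Proximal step on the sampled coordinates of z.
        step = tau / (n * theta * v[S])
        z_new_S = soft_threshold(z[S] - step * grad_S, lam * step)
        # x_{k+1} = y_k + (n / tau) * theta_k * (z_{k+1} - z_k)
        x = y
        x[S] += (n / tau) * theta * (z_new_S - z[S])
        z[S] = z_new_S
        # theta_{k+1} = (sqrt(theta^4 + 4 theta^2) - theta^2) / 2
        theta = 0.5 * (np.sqrt(theta ** 4 + 4.0 * theta ** 2) - theta ** 2)
    return x
```

In a real parallel implementation the τ sampled coordinates would be processed by separate workers; here they are updated together in vectorized NumPy for brevity.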

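Experimental results

Research questions

RQ1: Can a coordinate descent method be designed that is simultaneously accelerated, parallel, and proximal while converging at the rate O(1/k²)?
RQ2: Can stepsizes significantly larger than those of earlier parallel methods be derived by using average-case rather than worst-case bounds?
RQ3: Can an accelerated method be implemented without full-dimensional vector operations, improving scalability on large problems?
RQ4: How does the method perform in practice on real-world sparse datasets compared with non-accelerated or non-parallel alternatives?
RQ5: Does using the average degree of separability (ω̄) instead of the maximum (ω) yield provably faster convergence that also shows up in practice?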
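
RQ2 and RQ5 hinge on the gap between average-case and worst-case stepsize parameters, which a small numerical comparison can illustrate. The sketch below assumes a quadratic loss ½‖Ax − b‖² and a uniform sampling of τ coordinates. It assumes the maximum-degree parameters are v_i = (1 + (ω − 1)(τ − 1)/max(1, n − 1)) Σ_j A_ji², and the per-row parameters are v_i = Σ_j (1 + (ω_j − 1)(τ − 1)/max(1, n − 1)) A_ji². Both formulas and all names are assumptions made for illustration; smaller v_i means a larger stepsize.

```python
import numpy as np


def stepsizes_max_degree(A, tau):
    """v_i driven by the maximum row degree omega = max_j omega_j."""
    n = A.shape[1]
    omega = np.count_nonzero(A, axis=1).max()
    beta = 1.0 + (omega - 1.0) * (tau - 1.0) / max(1, n - 1)
    return beta * (A ** 2).sum(axis=0)


def stepsizes_per_row_degree(A, tau):
    """v_i that weights each row j by its own degree omega_j."""
    n = A.shape[1]
    omega_rows = np.count_nonzero(A, axis=1)
    beta_rows = 1.0 + (omega_rows - 1.0) * (tau - 1.0) / max(1, n - 1)
    return (beta_rows[:, None] * A ** 2).sum(axis=0)


# Hypothetical data: mostly sparse rows plus one fully dense row,
# so the maximum degree equals n while the typical degree is small.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50)) * (rng.random((200, 50)) < 0.05)
A[0, :] = 1.0
tau = 8
ratio = stepsizes_max_degree(A, tau) / stepsizes_per_row_degree(A, tau)
print("v_max / v_per_row ranges from", ratio.min(), "to", ratio.max())
```

Since ω_j ≤ ω for every row, the per-row parameters are never larger than the maximum-degree ones; a single dense row inflates the latter for all coordinates, which is the situation where the average-based stepsizes help most.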

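Key findings

APPROX converges at the rate O(1/k²) without assuming strong convexity, matching the best known rate for accelerated methods.
On the kddb dataset, APPROX outperforms PCDM after the initial iterations, even though each PCDM iteration is cheaper, because APPROX converges faster in the later stages.
On the Malicious URL dataset, the duality gap of APPROX shrinks about twice as fast as that of SDCA, roughly a 2x speedup in time to convergence.
When the number of processors equals the number of coordinates, the method converges at the rate 2ω̄L̄R²/(k+1)², where k is the iteration counter, ω̄ is the average degree of separability, L̄ is the average Lipschitz constant, and R is the distance from the initial point to the minimizer.
The proposed stepsizes are larger than those of earlier ESO-based methods while remaining safe, and they give faster convergence especially when ω̄ ≪ ω.
The algorithm avoids full-dimensional vector operations, the main bottleneck of accelerated methods on sparse large-scale problems, which makes it efficient in that setting.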
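
One way to avoid full-dimensional vector operations, sketched here for the Lasso objective ½‖Ax − b‖² + λ‖x‖₁, is a change of variables: store two vectors u and z̃ with x = θ²u + z̃, update only their sampled coordinates, and maintain the residuals A u and A z̃ − b incrementally. The code below is an illustrative reconstruction under those assumptions, not the authors' code. All names are hypothetical, and the stepsize parameters `v` are passed in by the caller.

```python
import numpy as np


def soft_threshold(t, a):
    """Proximal operator of a * |.|, applied elementwise."""
    return np.sign(t) * np.maximum(np.abs(t) - a, 0.0)


def approx_lasso_no_full_ops(A, b, lam, tau, iters, v, seed=0):
    """Accelerated sketch that never forms the n-dimensional x or y in the loop.

    The implicit iterate is y_k = theta_k^2 * u_k + z_k; only the sampled
    coordinates of u and z change in each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    u = np.zeros(n)
    z = np.zeros(n)
    r_u = np.zeros(m)                      # maintained as A @ u
    r_z = -np.asarray(b, dtype=float)      # maintained as A @ z - b
    theta = tau / n
    theta_prev = theta
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)
        A_S = A[:, S]
        # Gradient at y_k, computed from the residuals only. With sparse
        # columns only their nonzero rows would be touched; dense NumPy
        # is used here for brevity.
        grad_S = A_S.T @ (theta ** 2 * r_u + r_z)
        step = tau / (n * theta * v[S])
        t = soft_threshold(z[S] - step * grad_S, lam * step) - z[S]
        du = -(1.0 - (n / tau) * theta) / theta ** 2 * t
        z[S] += t
        u[S] += du
        r_z += A_S @ t
        r_u += A_S @ du
        theta_prev = theta
        theta = 0.5 * (np.sqrt(theta ** 4 + 4.0 * theta ** 2) - theta ** 2)
    # Recover x once at the end: x_{k+1} = theta_k^2 * u_{k+1} + z_{k+1}.
    return theta_prev ** 2 * u + z
```

The identity θ_{k+1}² = (1 − θ_{k+1})θ_k², which follows from the θ recursion, is what keeps the representation x = θ²u + z̃ consistent from one iteration to the next. The division by θ² makes the entries of u grow over time, so a long run may need care with floating-point range.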

This review was generated by AI and reviewed by human editors.