QUICK REVIEW

[Paper Review] Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

Ruoyu Sun, Mingyi Hong|arXiv (Cornell University)|Dec 15, 2015

Sparse and Compressive Sensing Techniques | References: 17 | Cited by: 21

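One-Sentence Summary

This paper improves the iteration complexity bounds of cyclic block coordinate descent (BCD) for convex problems: for a family of quadratic nonsmooth problems, the bound for the proximal variant (BCPG) matches that of gradient descent (GD) and proximal gradient (PG) in its dependence on K up to a log²(K) factor, and the bound for classical BCD improves by an order of K, closing the K-fold gap in earlier bounds. The analysis does not depend on the update order within a cycle, so the tighter guarantees cover both cyclic and random-permutation BCD.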

ABSTRACT

The iteration complexity of block-coordinate descent (BCD) type algorithms has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an $\mathcal{O}(1/r)$ complexity ($r$ is the number of passes over all blocks). However, such bounds depend at least linearly on $K$ (the number of variable blocks), and are at least $K$ times worse than those of the gradient descent (GD) and proximal gradient (PG) methods. In this paper, we aim to close this theoretical performance gap between cyclic BCD and GD/PG. First, we show that for a family of quadratic nonsmooth problems, the complexity bounds for cyclic Block Coordinate Proximal Gradient (BCPG), a popular variant of BCD, can match those of GD/PG in terms of dependency on $K$ (up to a $\log^2(K)$ factor). For the same family of problems, we also improve the bounds of the classical BCD (with exact block minimization) by an order of $K$. Second, we establish an improved complexity bound of Coordinate Gradient Descent (CGD) for general convex problems which can match that of GD in certain scenarios. Our bounds are sharper than the known bounds, which are always at least $K$ times worse than those of GD. Our analyses do not depend on the update order of block variables inside each cycle, thus our results also apply to BCD methods with random permutation (random sampling without replacement, another popular variant).
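
For orientation, the setting described in the abstract can be written in the usual composite form. The notation below ($g$, $h_k$, $L_k$, the superscript $r$ for the pass index) is an assumption made for illustration and is not taken from this page; it is only a sketch of the standard cyclic updates, not the paper's exact statement.

```latex
% Composite convex problem with K variable blocks (assumed notation):
\[
  \min_{x = (x_1, \dots, x_K)} \; f(x) := g(x_1, \dots, x_K) + \sum_{k=1}^{K} h_k(x_k),
\]
% where g is smooth and convex (quadratic in the family studied) and each h_k
% is convex and possibly nonsmooth.

% Classical cyclic BCD: exact minimization over block k, for k = 1, ..., K in pass r+1.
\[
  x_k^{r+1} \in \arg\min_{x_k} \;
  f\bigl(x_1^{r+1}, \dots, x_{k-1}^{r+1}, \, x_k, \, x_{k+1}^{r}, \dots, x_K^{r}\bigr).
\]

% Cyclic BCPG: one proximal gradient step on block k, with block step size 1/L_k.
\[
  x_k^{r+1} = \operatorname{prox}_{h_k / L_k}\!\Bigl(
    x_k^{r} - \tfrac{1}{L_k} \nabla_k g\bigl(x_1^{r+1}, \dots, x_{k-1}^{r+1}, x_k^{r}, \dots, x_K^{r}\bigr)
  \Bigr).
\]
```

In both updates, block $k$ already sees the new values of blocks $1, \dots, k-1$ from the current pass; a random-permutation variant would visit the same $K$ blocks in a freshly shuffled order in each pass.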

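Motivation and Objectives

Close the theoretical performance gap between cyclic block coordinate descent (BCD) and gradient descent (GD) / proximal gradient (PG) methods: earlier BCD bounds depend at least linearly on K and are therefore at least K times worse.
Establish tighter iteration complexity bounds for cyclic BCD and its proximal variant (BCPG) on a family of quadratic nonsmooth problems, so that the BCPG bound matches the GD/PG rate up to a log²(K) factor.
Derive a general meta complexity bound for cyclic coordinate gradient descent (CGD) on general convex problems, and show that it can match the GD rate under certain conditions.
Show that the improved bounds do not depend on the order in which blocks are updated, so they extend to random block selection without replacement (random-permutation BCD).
Construct a tight lower-bound example showing that the K-fold gap in earlier bounds cannot be avoided in general, which supports the tightness of the new analysis.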

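Proposed Method

A new analysis framework for cyclic BCD and BCPG on quadratic nonsmooth problems, which exploits spectral properties of the Hessian and the block-wise convergence behavior.
A general meta iteration complexity bound for CGD on general convex problems, expressed through the spectral norm of a "moving-iterate Hessian" matrix.
A tight lower-bound example, built from a specific initial point, showing that the K-fold gap in earlier bounds cannot be avoided and thereby supporting the optimality of the new analysis.
A unified treatment of BCD variants: the analysis covers both exact block minimization (classical BCD) and approximate minimization via a gradient step (BCPG).
Complexity bounds that do not depend on the update order and therefore apply to both cyclic and random-permutation BCD.
A recursive update structure in the lower-bound example, used to show that the optimality gap after one iteration grows linearly with K, which supports the tightness of the new upper bound.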
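
To make the algorithmic objects concrete, here is a minimal sketch of BCPG with either a cyclic or a random-permutation block order. It assumes an L1-regularized least-squares instance as one example of a quadratic nonsmooth problem; the function names (`bcpg`, `soft_threshold`, `objective`), the block step size 1/L_k, and the test problem are illustrative choices, not the paper's code or experiments.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, lam, x):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 (assumed example problem)."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

def bcpg(A, b, lam, blocks, passes, order="cyclic", seed=0):
    """Block coordinate proximal gradient; returns (x, objective per pass).

    blocks: list of index arrays partitioning the coordinates.
    order:  "cyclic" keeps the block order fixed; "permuted" reshuffles
            the blocks at the start of every pass (without replacement).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    # Block Lipschitz constants: squared spectral norm of each column block.
    L = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    history = [objective(A, b, lam, x)]
    for _ in range(passes):
        idx = np.arange(len(blocks))
        if order == "permuted":
            idx = rng.permutation(len(blocks))
        for k in idx:
            blk = blocks[k]
            # Gradient of the quadratic part w.r.t. block k at the current x,
            # which already contains the blocks updated earlier in this pass.
            grad_k = A[:, blk].T @ (A @ x - b)
            x[blk] = soft_threshold(x[blk] - grad_k / L[k], lam / L[k])
        history.append(objective(A, b, lam, x))
    return x, history

# Small synthetic instance: 20 coordinates split into K = 5 blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
blocks = np.array_split(np.arange(20), 5)
_, hist_cyclic = bcpg(A, b, 0.1, blocks, passes=30, order="cyclic")
_, hist_permuted = bcpg(A, b, 0.1, blocks, passes=30, order="permuted")
```

Comparing `hist_cyclic` with `hist_permuted` gives an empirical view of the two update orders on one instance. It illustrates the objects the analysis covers and does not verify the complexity bounds.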

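Experimental Results

Research Questions

RQ1: Can the iteration complexity of cyclic BCD be improved to match that of gradient descent (GD) up to a logarithmic factor in the number of blocks K?
RQ2: Is the K-fold degradation of BCD in earlier bounds an artifact of loose analysis, or an inherent weakness of the method?
RQ3: Do the same improved bounds extend to the proximal variant of BCD (BCPG) for nonsmooth problems?
RQ4: Do the improved bounds also hold for random block selection (random permutation), not only for cyclic selection?
RQ5: Is the new bound for CGD on general convex problems tighter than existing bounds, and does it agree with the GD rate in the quadratic case?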

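Key Findings

For a family of quadratic nonsmooth problems, the BCPG bound is O(1/r) with a dependence on K that matches the GD/PG rate up to a log²(K) factor.
For classical cyclic BCD (exact block minimization), the bound improves on earlier results by an order of K on the same problem family.
For general smooth convex problems, the meta complexity bound for cyclic CGD is tighter than existing bounds; in the quadratic case with step size 1/L it matches the GD rate.
The improved bounds hold for both cyclic and random-permutation BCD, because the analysis does not depend on the update order.
A tight lower-bound example shows that the optimality gap after one iteration is at least Ω(K) times the squared initial norm, which supports the tightness of the new upper bound.
Taken together: for the quadratic nonsmooth family, the new analysis closes the K-fold gap in earlier bounds (up to a log²(K) factor for BCPG), while the lower-bound example indicates that a dependence on K cannot be removed in general.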
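
The quadratic-case finding with step size 1/L can be explored numerically with a short sketch. It assumes a smooth quadratic f(x) = 0.5 xᵀQx − cᵀx with Q symmetric positive semidefinite and takes L to be the largest eigenvalue of Q; the function names and the random test matrix are hypothetical, and the run only shows how cyclic CGD and GD behave on one instance, not the paper's bound.

```python
import numpy as np

def quad(Q, c, x):
    """f(x) = 0.5 * x^T Q x - c^T x."""
    return 0.5 * (x @ Q @ x) - c @ x

def gd(Q, c, x0, passes):
    """Gradient descent with step size 1/L, L = largest eigenvalue of Q."""
    L = np.linalg.eigvalsh(Q)[-1]
    x = x0.copy()
    vals = [quad(Q, c, x)]
    for _ in range(passes):
        x = x - (Q @ x - c) / L
        vals.append(quad(Q, c, x))
    return vals

def cyclic_cgd(Q, c, x0, passes):
    """Cyclic coordinate gradient descent using the same step size 1/L.

    One pass updates coordinates 0, 1, ..., n-1 in order; each update uses
    the partial derivative at the current, partly updated iterate.
    """
    L = np.linalg.eigvalsh(Q)[-1]
    x = x0.copy()
    vals = [quad(Q, c, x)]
    for _ in range(passes):
        for k in range(len(x)):
            x[k] -= (Q[k] @ x - c[k]) / L
        vals.append(quad(Q, c, x))
    return vals

# Synthetic positive definite quadratic with n = 10 coordinates.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
Q = A.T @ A
c = rng.standard_normal(10)
x0 = np.zeros(10)
vals_gd = gd(Q, c, x0, passes=50)
vals_cgd = cyclic_cgd(Q, c, x0, passes=50)
```

One pass of `cyclic_cgd` and one step of `gd` both touch every coordinate once, so the two value sequences can be compared pass by pass. Because each diagonal entry of Q is at most L, the step 1/L is a safe (conservative) choice for every coordinate.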


This review was generated by AI and checked by a human editor.