QUICK REVIEW

[Paper Review] Stochastic Dual Ascent for Solving Linear Systems

Robert M. Gower, Peter Richtárik|arXiv (Cornell University)|Dec 21, 2015

Stochastic Gradient Optimization Techniques | References: 70 | Cited by: 54

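One-Sentence Summary

The paper proposes Stochastic Dual Ascent (SDA), a randomized algorithm that solves linear systems by iteratively maximizing a non-strongly concave quadratic dual problem. SDA converges exponentially fast in expectation with no assumption beyond consistency of the system, and it unifies and improves upon known methods such as randomized Kaczmarz and coordinate descent, with a bound on the convergence rate that improves as the rank of the system drops.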

ABSTRACT

We develop a new randomized iterative algorithm---stochastic dual ascent (SDA)---for finding the projection of a given vector onto the solution space of a linear system. The method is dual in nature: with the dual being a non-strongly concave quadratic maximization problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully chosen point in a subspace spanned by the columns of a random matrix drawn independently from a fixed distribution. The distribution plays the role of a parameter of the method. Our complexity results hold for a wide family of distributions of random matrices, which opens the possibility to fine-tune the stochasticity of the method to particular applications. We prove that primal iterates associated with the dual process converge to the projection exponentially fast in expectation, and give a formula and an insightful lower bound for the convergence rate. We also prove that the same rate applies to dual function values, primal function values and the duality gap. Unlike traditional iterative methods, SDA converges under no additional assumptions on the system (e.g., rank, diagonal dominance) beyond consistency. In fact, our lower bound improves as the rank of the system matrix drops. Many existing randomized methods for linear systems arise as special cases of SDA, including randomized Kaczmarz, randomized Newton, randomized coordinate descent, Gaussian descent, and their variants. In special cases where our method specializes to a known algorithm, we either recover the best known rates, or improve upon them. Finally, we show that the framework can be applied to the distributed average consensus problem to obtain an array of new algorithms. The randomized gossip algorithm arises as a special case.
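For orientation, the primal-dual pair described in the abstract can be written out explicitly. This is a sketch under assumptions: $ c $ is the vector being projected and $ B $ is a symmetric positive definite matrix defining the norm, both taken from the update formula in the method description, and $ D $ is a name chosen here for the dual objective.

$$
\min_{x} \; \tfrac{1}{2}(x - c)^\top B (x - c) \quad \text{subject to} \quad Ax = b
$$

$$
\max_{y} \; D(y) = (b - Ac)^\top y - \tfrac{1}{2}\, y^\top A B^{-1} A^\top y
$$

Minimizing the Lagrangian $ \tfrac{1}{2}(x - c)^\top B (x - c) + y^\top (b - Ax) $ over $ x $ gives $ x = c + B^{-1} A^\top y $, and substituting back yields $ D(y) $ with gradient $ \nabla D(y) = b - A(c + B^{-1} A^\top y) $. The quadratic term is only positive semidefinite whenever $ A B^{-1} A^\top $ is singular, which is why the dual is concave but not strongly concave.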

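Motivation and Objectives

Develop a new randomized iterative method that solves linear systems by working in the dual space and converges fast under weak assumptions.
Unify existing randomized methods (such as randomized Kaczmarz, coordinate descent, and Newton) in a single framework.
Establish tight convergence rates, including an explicit lower bound, for the primal iterates, dual function values, duality gap, and residual.
Extend the framework to the distributed average consensus problem, recovering and generalizing the randomized gossip algorithm.
Show that, contrary to conventional intuition, convergence improves as the rank of the system drops, and confirm this with numerical experiments.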

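Proposed Method

SDA operates in the dual space by maximizing an unconstrained, non-strongly concave quadratic dual function.
In each iteration, a random matrix $ S $ is drawn independently from a fixed distribution $ \mathcal{D} $, and the dual iterate is updated via $ y^{k+1} = y^k + S(S^\top A B^{-1} A^\top S)^\dagger S^\top (b - A(c + B^{-1}A^\top y^k)) $, where $ \dagger $ denotes the Moore-Penrose pseudoinverse.
Writing the update as $ y^{k+1} = y^k + S\theta^k $, the step $ \theta^k $ is the least-norm solution of the subproblem of maximizing the dual function over the random subspace spanned by the columns of $ S $, so each iteration makes the best possible progress within that subspace.
The primal iterate is recovered through the affine map $ x^k = c + B^{-1}A^\top y^k $, which links the dual update to the primal solution.
Convergence is analyzed in expectation; the rate depends on the smallest positive eigenvalue of $ A^\top A $ and on the rank of the system.
The framework is shown to generalize known algorithms: a random coordinate vector $ S $ yields randomized coordinate descent, a random column submatrix of the identity yields randomized Newton, and a Gaussian vector $ S $ yields Gaussian descent.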
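The update and the primal recovery described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sda`, the callback `sample_S`, and the test problem (a rank-5 system with $ B = I $, $ c = 0 $, and uniformly sampled coordinate sketches) are all hypothetical choices made here. With $ B = I $ and a coordinate vector $ S $, the primal step reduces to a projection onto a single equation of the system.

```python
import numpy as np

def sda(A, b, c, B, sample_S, iters):
    """Sketch of the SDA iteration; returns (y, x) with x = c + B^{-1} A^T y."""
    B_inv_At = np.linalg.solve(B, A.T)           # B^{-1} A^T, shape (n, m)
    y = np.zeros(A.shape[0])                     # y^0 = 0, hence x^0 = c
    for _ in range(iters):
        S = sample_S()                           # random m x tau matrix
        x = c + B_inv_At @ y                     # primal iterate tied to y^k
        M = S.T @ A @ B_inv_At @ S               # S^T A B^{-1} A^T S, may be singular
        lam = np.linalg.pinv(M) @ (S.T @ (b - A @ x))  # least-norm subproblem solution
        y = y + S @ lam                          # dual ascent step inside range(S)
    return y, c + B_inv_At @ y

# Hypothetical test problem: consistent, rank-deficient system (rank 5, n = 10).
rng = np.random.default_rng(0)
m, n, r = 30, 10, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of the row space
A = rng.standard_normal((m, r)) @ Q.T
x_star = A.T @ rng.standard_normal(m)              # lies in range(A^T): projection of c = 0
b = A @ x_star
c = np.zeros(n)

def coordinate_sketch():
    """S = e_i for a uniformly random row index i (an assumed sampling rule)."""
    S = np.zeros((m, 1))
    S[rng.integers(m), 0] = 1.0
    return S

y, x = sda(A, b, c, np.eye(n), coordinate_sketch, iters=3000)
print("distance to projection:", np.linalg.norm(x - x_star))
```

Because `x_star` is built inside the range of $ A^\top $, it is the minimum-norm solution and therefore the projection of $ c = 0 $ onto the solution set, so the printed distance measures convergence to the point the method targets. Swapping `coordinate_sketch` for a function that returns several identity columns or a Gaussian vector exercises other choices of the distribution without touching `sda`.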

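Experimental Results

Research Questions

RQ1: Can a unified framework be developed that generalizes existing randomized iterative methods for solving linear systems?
RQ2: What convergence guarantees hold for a dual-based randomized method when the dual objective is not strongly concave?
RQ3: How does the convergence rate depend on the rank of the system matrix, and can it improve as the rank drops?
RQ4: Can the framework be extended to distributed problems such as average consensus, and does it recover known algorithms such as randomized gossip?
RQ5: What is the tightest possible lower bound on the convergence rate of such randomized dual methods?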

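Key Findings

Assuming only that the linear system is consistent, SDA converges exponentially fast in expectation for the primal iterates, dual function values, primal function values, duality gap, and residual.
The convergence rate is bounded below by $ 1 - 1/\text{Rank}(A) $. Since a smaller rate means faster convergence, this bound improves as the rank of $ A $ drops, a counterintuitive result that the experiments confirm.
When specialized to known algorithms, SDA either recovers the best known convergence rates, as for randomized Kaczmarz and randomized coordinate descent, or improves upon them.
For randomized Kaczmarz, the theory predicts, and the experiments confirm, convergence even for rank-deficient systems, provided no row of $ A $ is entirely zero.
In the numerical experiments, the empirical convergence rate closely matches the predicted rate $ \rho = 1 - \frac{\lambda_{\min}^{+}(A^\top A)}{\|A\|_F^2} $, where $ \lambda_{\min}^{+} $ denotes the smallest nonzero eigenvalue, and the method performs particularly well on low-rank systems.
The framework extends to the distributed average consensus problem: the randomized gossip algorithm arises as a special case, with complexity proportional to the number of edges and to the inverse of the smallest nonzero eigenvalue of the graph Laplacian.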
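The rate and its lower bound can be checked numerically on a small rank-deficient system. The sketch below is an illustration under assumptions rather than a reproduction of the paper's experiments: the helper names `kaczmarz_rate` and `randomized_kaczmarz`, the problem sizes, and the rule of sampling row $ i $ with probability $ \|a_i\|^2 / \|A\|_F^2 $ (the standard setting for the Kaczmarz rate formula) are choices made here.

```python
import numpy as np

def kaczmarz_rate(A, tol=1e-10):
    """Return rho = 1 - lambda_min^+(A^T A) / ||A||_F^2 and the bound 1 - 1/Rank(A)."""
    sq_sv = np.linalg.svd(A, compute_uv=False) ** 2   # nonzero eigenvalues of A^T A (plus numerical zeros)
    positive = sq_sv[sq_sv > tol * sq_sv.max()]       # discard numerically zero eigenvalues
    rho = 1.0 - positive.min() / sq_sv.sum()          # sum of squared singular values = ||A||_F^2
    lower_bound = 1.0 - 1.0 / positive.size           # positive.size is the numerical rank
    return rho, lower_bound

def randomized_kaczmarz(A, b, x0, iters, rng):
    """Project onto one equation per step; row i is drawn with probability ||a_i||^2 / ||A||_F^2."""
    row_sq = np.einsum("ij,ij->i", A, A)              # squared row norms
    probs = row_sq / row_sq.sum()
    x = x0.copy()
    for i in rng.choice(A.shape[0], size=iters, p=probs):
        x -= A[i] * (A[i] @ x - b[i]) / row_sq[i]
    return x

# Hypothetical consistent system of rank 4 with 12 unknowns.
rng = np.random.default_rng(1)
m, n, r = 40, 12, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))      # orthonormal basis of the row space
A = rng.standard_normal((m, r)) @ Q.T
x_star = A.T @ rng.standard_normal(m)                 # minimum-norm solution, in range(A^T)
b = A @ x_star

rho, lower_bound = kaczmarz_rate(A)
x_rk = randomized_kaczmarz(A, b, np.zeros(n), iters=1500, rng=rng)
print("rho:", rho, "lower bound:", lower_bound)
print("final error:", np.linalg.norm(x_rk - x_star))
```

For this rank-4 matrix the lower bound is $ 1 - 1/4 = 0.75 $, and the computed $ \rho $ cannot fall below it because the smallest nonzero eigenvalue of $ A^\top A $ is at most the average of the nonzero eigenvalues, $ \|A\|_F^2 / \text{Rank}(A) $. Rebuilding `A` with a larger `r` raises the bound toward 1, which is the rank dependence described in the findings.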

This review was generated by AI and checked by human editors.