QUICK REVIEW

[Paper Review] Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Meisam Razaviyayn, Mingyi Hong|arXiv (Cornell University)|Jun 13, 2014

Sparse and Compressive Sensing Techniques|References: 30|Cited by: 58

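One-Sentence Summary

This paper proposes a parallel inexact block coordinate descent method for nonsmooth nonconvex optimization in which multiple blocks are updated simultaneously using convex approximations of the objective function. It establishes non-asymptotic convergence guarantees for both cyclic and randomized block selection rules and reports efficiency gains over serial methods on a Lasso problem, most notably under cyclic selection.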

ABSTRACT

Consider the problem of minimizing the sum of a smooth (possibly non-convex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With the recent advances in the developments of the multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and non-convex objective functions. The numerical experiments suggest that for a special case of Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.
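
In symbols, the problem described in the abstract can be written as below. The notation (h, f, g_i, x_i, n) and the block-separable form of the nonsmooth term are assumptions made here for illustration; the abstract itself only states that the objective is the sum of a smooth and a convex function.

```latex
\min_{x = (x_1, \dots, x_n)} \; h(x) \;=\; f(x_1, \dots, x_n) \;+\; \sum_{i=1}^{n} g_i(x_i),
\qquad f \text{ smooth (possibly nonconvex)}, \quad g_i \text{ convex (possibly nonsmooth)}.
```

Under this notation, the Lasso problem mentioned in the abstract corresponds to f(x) = (1/2)||Ax - b||^2 and g_i(x_i) = \lambda |x_i|.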

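Motivation and Objectives

Develop a scalable parallel optimization method for the large-scale nonsmooth nonconvex problems that arise in machine learning and signal processing.
Overcome the limitations of sequential block coordinate descent by allowing multiple variable blocks to be updated simultaneously.
Provide both asymptotic and non-asymptotic convergence guarantees for parallel inexact BCD under general convex approximations.
Compare the performance of cyclic and randomized block selection rules in the parallel setting, particularly on Lasso-type problems.
Support practical deployment when Lipschitz constants are unknown by using constant and diminishing step sizes.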
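
The two step-size regimes named in the last objective can be illustrated with a minimal sketch. The function names and the specific diminishing formula gamma_r = gamma0 / (1 + decay * r) are assumptions chosen for illustration; this summary does not state which rule the paper uses.

```python
def constant_step(gamma0):
    """Return a rule that yields the same step size at every iteration."""
    def rule(r):
        return gamma0
    return rule


def diminishing_step(gamma0, decay=0.1):
    """Return a rule gamma_r = gamma0 / (1 + decay * r) (hypothetical choice).

    The steps shrink toward zero while their sum diverges, which is the
    usual requirement placed on diminishing step-size schemes.
    """
    def rule(r):
        return gamma0 / (1.0 + decay * r)
    return rule


const_rule = constant_step(0.5)
dim_rule = diminishing_step(1.0, decay=0.1)
first_steps = [dim_rule(r) for r in range(5)]
```

A diminishing rule of this kind needs no knowledge of a Lipschitz constant, whereas a constant step typically has to be small enough relative to the curvature of the smooth part.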

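Proposed Method

The algorithm uses successive convex approximation (SCA) to approximate the smooth part of the objective locally, forming a convex subproblem for each block.
At each iteration, a set of blocks is updated in parallel by minimizing a convex approximation of the full objective, via gradient-type updates based on a parameterized approximation function.
The method supports both cyclic and randomized block selection rules; the numerical experiments suggest that the cyclic rule can outperform randomized selection on the Lasso problem.
It adopts a general approximation framework that includes linear and proximal approximations as special cases, making it more flexible than earlier methods.
Convergence is analyzed by establishing a descent condition and bounding the norm of the proximal gradient, which yields non-asymptotic iteration-complexity bounds.
The algorithm is synchronous, which avoids the race conditions common in lock-free methods, and it is designed for high-performance multi-core architectures.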
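
The update scheme described above can be sketched on a small Lasso instance. This is a minimal illustration under stated assumptions, not the authors' implementation: each block is a single coordinate, the surrogate is the proximal-linear one with curvature L_i = ||A[:, i]||^2, the parallel step is simulated by a sequential loop, and all names (parallel_sca_lasso, select_blocks, q, gamma) are hypothetical.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(A, b, x, lam):
    """0.5 * ||A x - b||^2 + lam * ||x||_1, with A stored as a list of rows."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xi) for xi in x)


def select_blocks(n, q, r, rule, rng):
    """Pick q of the n scalar blocks: a sliding cyclic window or a uniform sample."""
    if rule == "cyclic":
        return [(r * q + k) % n for k in range(q)]
    return rng.sample(range(n), q)


def parallel_sca_lasso(A, b, lam, q, gamma, iters, rule="cyclic", seed=0):
    """Jacobi-style block update using a proximal-linear convex surrogate."""
    n = len(A[0])
    rng = random.Random(seed)
    # Per-coordinate curvature L_i = ||A[:, i]||^2 (assumed nonzero here).
    L = [sum(row[i] ** 2 for row in A) for i in range(n)]
    x = [0.0] * n
    history = [lasso_objective(A, b, x, lam)]
    for r in range(iters):
        residual = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
        chosen = select_blocks(n, q, r, rule, rng)
        # Every chosen block solves its own convex subproblem at the same point x,
        # so these computations are independent and could run on separate cores.
        x_hat = {}
        for i in chosen:
            grad_i = sum(row[i] * rj for row, rj in zip(A, residual))
            x_hat[i] = soft_threshold(x[i] - grad_i / L[i], lam / L[i])
        # Move only part of the way toward the subproblem solutions.
        for i in chosen:
            x[i] += gamma * (x_hat[i] - x[i])
        history.append(lasso_objective(A, b, x, lam))
    return x, history


# Small made-up problem, used only to exercise the sketch.
A = [[1.0, 0.5, 0.0, 0.2],
     [0.0, 1.0, 0.3, 0.0],
     [0.4, 0.0, 1.0, 0.1],
     [0.0, 0.2, 0.0, 1.0],
     [0.3, 0.0, 0.5, 0.0]]
b = [1.0, -2.0, 0.5, 0.0, 1.5]
q = 2
x_cyc, hist_cyc = parallel_sca_lasso(A, b, lam=0.1, q=q, gamma=1.0 / q, iters=200, rule="cyclic")
x_rnd, hist_rnd = parallel_sca_lasso(A, b, lam=0.1, q=q, gamma=1.0 / q, iters=200, rule="random")
```

The step size gamma = 1/q is a conservative choice made for this sketch: because the Lasso objective is convex and each single-coordinate update cannot increase it, averaging q such updates cannot increase it either. The toy data says nothing about which selection rule is faster in general.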

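Experimental Results

Research Questions

RQ1: Can a parallel inexact block coordinate descent method achieve non-asymptotic convergence on nonsmooth nonconvex problems under general convex approximations?
RQ2: How do cyclic and randomized block selection rules differ in convergence speed and efficiency in the parallel setting?
RQ3: Does using general convex approximations (beyond linear and proximal ones) lead to better convergence behavior in practice?
RQ4: Can the algorithm remain efficient with diminishing step sizes when Lipschitz constants are unknown?
RQ5: How do communication overhead and the number of processors affect convergence speed in large-scale parallel implementations?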

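Key Findings

The proposed parallel inexact BCD method achieves non-asymptotic convergence for both convex and nonconvex problems, with iteration-complexity bounds provided.
On the Lasso minimization problem, the cyclic block selection rule outperforms the randomized rule, indicating better convergence behavior on structured problems.
Convergence is guaranteed under both constant and diminishing step sizes, which improves robustness in practical use.
Thanks to parallel updates, the algorithm converges faster than serial BCD, although communication overhead makes the speedup sublinear.
General convex approximations allow tighter local approximations than linear or proximal ones, which improves convergence efficiency.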
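
Iteration-complexity bounds for nonsmooth problems are commonly stated in terms of a stationarity measure such as the norm of the proximal gradient. The sketch below computes one standard form of that measure, ||x - prox(x - grad f(x))||, for a Lasso objective. The unit prox step and the function names are assumptions for illustration and may differ from the exact quantity bounded in the paper.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_gradient_norm(A, b, x, lam):
    """Norm of x - prox_{lam*||.||_1}(x - grad f(x)), with f(x) = 0.5*||A x - b||^2.

    The value is zero exactly when x is a stationary point of the Lasso objective.
    """
    n = len(x)
    residual = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
    grad = [sum(row[i] * rj for row, rj in zip(A, residual)) for i in range(n)]
    mapped = [soft_threshold(x[i] - grad[i], lam) for i in range(n)]
    return math.sqrt(sum((x[i] - mapped[i]) ** 2 for i in range(n)))


# With A equal to the identity, the Lasso minimizer is soft_threshold(b, lam).
A_id = [[1.0, 0.0], [0.0, 1.0]]
b_id = [3.0, 0.5]
at_zero = prox_gradient_norm(A_id, b_id, [0.0, 0.0], lam=1.0)
at_optimum = prox_gradient_norm(A_id, b_id, [2.0, 0.0], lam=1.0)
```

In this toy case the measure equals 2 at the origin and 0 at the minimizer [2, 0], so it can serve as a stopping criterion: iterate until it falls below a chosen tolerance.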

This review was generated by AI and reviewed by human editors.