QUICK REVIEW

[Paper Review] Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis

Bo Jiang, Tianyi Lin|arXiv (Cornell University)|May 9, 2016

Sparse and Compressive Sensing Techniques | References: 48 | Cited by: 22

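One-Sentence Summary

This paper proposes first-order algorithms for structured nonconvex and nonsmooth optimization problems with block variables and affine constraints: proximal variants of the alternating direction method of multipliers (ADMM) and a generalized conditional gradient method. It establishes an O(1/ε²) iteration complexity bound for the ADMM variants to reach an ε-stationary solution, and numerical experiments on tensor robust principal component analysis (tensor robust PCA) indicate that the ADMM-type methods reach high-quality (global) solutions more often than block coordinate descent.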
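As a rough illustration of what an O(1/ε²) iteration bound means in practice, suppose (a generic assumption, not the paper's exact statement) that the analysis controls a stationarity residual θ_k so that its smallest value over the first K iterations is at most C/√K, where the constant C depends only on the problem data and the starting point. Solving for K gives the iteration budget:

```latex
\[
\frac{C}{\sqrt{K}} \le \varepsilon
\iff K \ge \frac{C^{2}}{\varepsilon^{2}},
\qquad\text{so}\qquad
K_{\varepsilon} = \left\lceil \frac{C^{2}}{\varepsilon^{2}} \right\rceil = O\!\left(\frac{1}{\varepsilon^{2}}\right)
\ \text{iterations guarantee}\ \min_{0 \le k < K_{\varepsilon}} \theta_k \le \varepsilon .
\]
```

For instance, C = 1 and ε = 10⁻² give K_ε = 10⁴, and halving ε multiplies the budget by four.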

ABSTRACT

Nonconvex and nonsmooth optimization problems are frequently encountered in much of statistics, business, science and engineering, but they are not yet widely recognized as a technology in the sense of scalability. A reason for this relatively low degree of popularity is the lack of a well-developed system of theory and algorithms to support the applications, as is the case for its convex counterpart. This paper aims to take one step in the direction of disciplined nonconvex and nonsmooth optimization. In particular, we consider in this paper some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the case without coupled constraints, we show a sublinear rate of convergence to an $ε$-stationary solution in the form of variational inequality for a generalized conditional gradient method, where the convergence rate is shown to be dependent on the Hölderian continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding $ε$-stationarity conditions, and apply two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization-minimization step is implemented. We show an iteration complexity bound of $O(1/ε^2)$ to reach an $ε$-stationary solution for both algorithms. Moreover, we show that the same iteration complexity of a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA.
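To make the generalized conditional gradient (GCG) step and the variational-inequality stationarity measure concrete, here is a minimal Python sketch under assumptions of our own, not taken from the paper: the feasible set is the box [-1, 1]^n, the nonsmooth term is r(x) = lam·||x||_1 (convex, so the linear subproblem has a closed form), f(x) = 0.5·xᵀQx + cᵀx with Q possibly indefinite or negative definite (so f may be nonconvex), and the step size is the standard short step for a Lipschitz gradient (Hölder exponent 1) rather than the paper's Hölder-dependent rule. The function names are hypothetical. The returned gap, max over y in the box of ⟨∇f(x), x − y⟩ + r(x) − r(y), is nonnegative and vanishes exactly at points satisfying the first-order variational inequality.

```python
import numpy as np


def lmo_box_l1(grad, lam):
    # Closed-form solution of  min_{y in [-1,1]^n} <grad, y> + lam*||y||_1, coordinate-wise:
    # jump to the face opposite the gradient only where |grad_i| exceeds the l1 weight.
    return np.where(np.abs(grad) > lam, -np.sign(grad), 0.0)


def generalized_conditional_gradient(Q, c, lam, x0, eps=1e-8, max_iter=10_000):
    """Sketch of GCG for F(x) = 0.5 x^T Q x + c^T x + lam*||x||_1 over the box [-1,1]^n.

    Q may be indefinite or negative definite, so f is allowed to be nonconvex.
    Returns (x, gap, number_of_iterations).
    """
    def r(z):
        return lam * np.abs(z).sum()

    L = max(np.linalg.norm(Q, 2), 1e-12)  # Lipschitz constant of grad f (Hölder exponent 1)
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        grad = Q @ x + c
        y = lmo_box_l1(grad, lam)
        # VI-type stationarity gap: <grad, x - y> + r(x) - r(y) >= 0, zero iff x is stationary.
        gap = grad @ (x - y) + r(x) - r(y)
        if gap <= eps:
            return x, gap, k
        d = y - x  # nonzero here, since d = 0 would give gap = 0
        # Short step: minimizes the upper bound -a*gap + a^2 * L * ||d||^2 / 2 over a in [0, 1].
        alpha = min(1.0, gap / (L * (d @ d)))
        x = x + alpha * d
    return x, gap, max_iter
```

For example, with Q = -I (a concave, hence nonconvex, f), c = (0.3, -0.2), lam = 0.1 and x0 = 0, the iterates reach the vertex (-1, 1) after three steps, where the gap is zero up to rounding. The short step guarantees that the objective never increases, which is the usual route to a bound on the smallest gap seen so far.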

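Motivation and Goals

Address the lack of a systematic theoretical and algorithmic framework for scalable nonconvex and nonsmooth optimization in practical applications.
Develop first-order algorithms with provable convergence rates for structured nonconvex problems with block variables and coupled affine constraints.
Establish iteration complexity bounds for reaching an ε-stationary solution, assuming a Hölder continuous gradient and implementable proximal updates.
Validate the proposed algorithms on tensor robust PCA, comparing their convergence behavior and the quality of the solutions they reach.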

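Proposed Methods

For nonconvex problems without coupled constraints, a generalized conditional gradient method that converges at a sublinear rate to an ε-stationary solution (in variational inequality form), with the rate governed by the Hölder continuity of the gradient.
For problems with coupled affine constraints, two proximal-type ADMM variants in which only the last block variable is updated by a gradient step or a majorization-minimization step.
A proximal block coordinate descent (BCD) method, whose O(1/ε²) iteration complexity (the same as for the ADMM variants) follows directly from the ADMM analysis.
ε-stationarity conditions defined for nonconvex problems with affine constraints, which make a convergence analysis possible despite nonconvexity and nonsmoothness.
Proximal updates for all block variables in ADMM except the last, which are assumed to be implementable; this keeps the methods practical while preserving the convergence guarantees.
A structured optimization model with a smooth, possibly nonconvex f and nonsmooth, possibly nonconvex r_i, subject to coupled affine constraints and convex-set constraints on the individual blocks.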
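The update order of a proximal ADMM with a gradient step on the last block can be sketched on a two-block toy problem. Everything concrete here is an assumption for illustration: the model min mu·||x1||_1 + f(x2) subject to x1 + x2 = b (so A_1 = A_2 = I), the sign convention of the multiplier, the proximal weight h, the step gamma and the penalty beta. The paper allows nonconvex r_i and only assumes their proximal updates are implementable; the ℓ1 norm is used here because its proximal map (soft-thresholding) is simple, and the demo f(x2) = 0.5·||x2||² is convex so the answer can be checked in closed form. The function names are hypothetical, and the parameters are not the paper's step-size rule.

```python
import numpy as np


def soft_threshold(v, tau):
    # Proximal map of tau*||.||_1 (coordinate-wise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def proximal_admm_g(grad_f, b, mu, beta=1.0, h=1.0, gamma=0.4, iters=500):
    """Two-block sketch: min mu*||x1||_1 + f(x2)  s.t.  x1 + x2 = b.

    Augmented Lagrangian (sign convention assumed here):
      L(x1, x2, lam) = mu*||x1||_1 + f(x2) - <lam, x1 + x2 - b> + beta/2 * ||x1 + x2 - b||^2
    """
    b = np.asarray(b, dtype=float)
    x1, x2, lam = np.zeros_like(b), np.zeros_like(b), np.zeros_like(b)
    for _ in range(iters):
        # Block 1 (nonsmooth): exact proximal update,
        #   argmin_x1 L(x1, x2, lam) + h/2 * ||x1 - x1_old||^2, solved by soft-thresholding.
        v = (lam + beta * (b - x2) + h * x1) / (beta + h)
        x1 = soft_threshold(v, mu / (beta + h))
        # Last block (smooth): a single gradient step on L in x2, no exact minimization.
        # gamma should be small relative to 1 / (Lipschitz constant of grad f + beta).
        x2 = x2 - gamma * (grad_f(x2) - lam + beta * (x1 + x2 - b))
        # Multiplier update using the new primal residual.
        lam = lam - beta * (x1 + x2 - b)
    return x1, x2, lam
```

With grad_f(z) = z, b = (3, 0.2, -1.5) and mu = 1, the iterates approach x1 = (2, 0, -0.5) and x2 = (1, 0.2, -1), the soft-thresholding solution of the toy problem. Replacing the gradient step by an exact minimization in x2 would give a standard proximal ADMM; the gradient step is what keeps the last block cheap when its subproblem has no closed form.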

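Experimental Results

Research Questions

RQ1: What is the iteration complexity of first-order methods for structured nonconvex and nonsmooth optimization problems with coupled affine constraints?
RQ2: Can proximal ADMM variants converge to an ε-stationary solution when only the last block variable is updated by a gradient step or a majorization-minimization step?
RQ3: How does the convergence rate of the generalized conditional gradient method depend on the Hölder continuity of the gradient of the smooth part of the objective?
RQ4: On tensor robust PCA, how do the proposed algorithms compare in practice in terms of convergence speed and global solution quality?
RQ5: Under the same assumptions, does the proximal BCD method inherit the iteration complexity of the proximal ADMM variants?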

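Key Findings

The generalized conditional gradient method converges to an ε-stationary solution at a sublinear rate that depends on the Hölder continuity of the gradient of the smooth part of the objective.
Both proximal ADMM variants attain an O(1/ε²) iteration complexity bound for reaching an ε-stationary solution, assuming the proximal updates for all blocks except the last can be implemented.
The same O(1/ε²) iteration complexity for the proximal BCD method follows directly from the ADMM analysis.
On tensor robust PCA, BCD methods often converge to local solutions, whereas ADMM and its variants reach the global (low-error) solution more often.
When a larger basis size is allowed in the tensor decomposition (R = R_CP + ⌈0.2·R_CP⌉), the algorithms achieve lower relative errors and converge faster, and proximal BCD typically needs fewer iterations than the ADMM variants.
Across all test cases, the number of solutions with relative error below 0.01 (Num) is higher for ADMM and its variants than for standard BCD, indicating that the ADMM-type methods find the global solution more often.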
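The quantities in these findings can be computed as follows. The review does not define the relative error, so the sketch assumes the usual Frobenius-norm ratio ||X̂ − X||_F / ||X||_F; the helper names are hypothetical. The enlarged basis size R = R_CP + ⌈0.2·R_CP⌉ is computed in integer arithmetic, since ⌈0.2·R_CP⌉ = ⌈R_CP/5⌉, which avoids floating-point rounding inside the ceiling.

```python
import numpy as np


def overestimated_rank(r_cp: int) -> int:
    # R = R_CP + ceil(0.2 * R_CP); ceil(r / 5) is computed as -(-r // 5) in exact integers.
    return r_cp + (-(-r_cp // 5))


def relative_error(x_hat: np.ndarray, x_true: np.ndarray) -> float:
    # Assumed definition: ||X_hat - X||_F / ||X||_F, for arrays/tensors of any order.
    return float(np.linalg.norm((x_hat - x_true).ravel()) / np.linalg.norm(x_true.ravel()))


def count_successes(estimates, x_true: np.ndarray, tol: float = 0.01) -> int:
    # "Num": number of runs whose recovered tensor has relative error below tol.
    return sum(1 for e in estimates if relative_error(e, x_true) < tol)
```

For example, overestimated_rank(10) returns 12, and three runs with relative errors 0.005, 0.02 and 0 yield Num = 2.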


This review was generated by AI and reviewed by human editors.