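[Paper Summary] Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step
This paper proposes ICGALP, an inexact and stochastic variant of the CGALP algorithm, for composite convex optimization problems with an affine constraint. The algorithm tolerates errors in the computation of gradients, proximal terms, and linear minimization oracles. The Lagrangian values converge almost surely to the optimal value, the affine constraint Ax = b is satisfied asymptotically almost surely, and, under mild conditions on the parameters, the optimality and feasibility gaps of the ergodic iterates converge at a rate of O(1/(k+1)^0.24).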
In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in [34], which we denote ICGALP, that allow for errors in the computation of several important quantities. In particular, this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, e.g., in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower semicontinuous functions subject to an affine constraint of the form Ax = b for some bounded linear operator A. Only one of the functions in the objective is assumed to be differentiable; the other two are assumed to have an accessible proximal operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian values (so-called convergence in the Bregman sense) and asymptotic feasibility of the affine constraint as well as strong convergence of the sequence of dual variables to a solution of the dual problem, in an almost sure sense. Almost sure convergence rates are given for the Lagrangian values and the feasibility gap for the ergodic primal variables. Rates in expectation are given for the Lagrangian values and the feasibility gap subsequentially in the pointwise sense. Numerical experiments verifying the predicted rates of convergence are shown as well.
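Research Motivation and Goals
- Develop a practical algorithm for high- or infinite-dimensional convex optimization problems in which exact computation of gradients, proximal terms, or linear minimization oracles is infeasible.
- Extend the CGALP algorithm to allow deterministic or stochastic errors in its key components while preserving convergence guarantees.
- Establish that the Lagrangian values converge almost surely to the optimal value and that the affine constraint Ax = b is satisfied asymptotically almost surely.
- Derive worst-case convergence rates for the optimality and feasibility gaps in the inexact and stochastic settings.
- Validate the theoretical results through numerical experiments on risk minimization and projection problems, covering different error sources and batch sizes.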
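Proposed Method
- Propose ICGALP, an inexact and stochastic extension of CGALP for problems of the form min_x {f(x) + g(Tx) + h(x) : Ax = b}, with three convex functions and one affine constraint.
- Update the dual variable through an augmented Lagrangian scheme combined with a proximal step, so that the dual iterates converge strongly, almost surely, to a solution of the dual problem.
- Compute ∇f, prox_{βg}, and the linear minimization oracle inexactly, with stochastic-gradient or deterministic errors, provided the error sequences satisfy summability conditions.
- Use Cesàro-averaged (ergodic) iterates to derive global convergence rates that are robust to noise and errors.
- Apply variance reduction and progressively increasing batch sizes to meet the summability conditions required under stochastic errors.
- Establish convergence for abstract open-loop parameter sequences that do not depend on the iterates, which adds flexibility in practice.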
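The ingredients listed above can be illustrated with a small sketch. This is not the paper's exact update rule: it is a simplified loop in the spirit of a conditional gradient method with an augmented Lagrangian, written under several assumptions. Here f(x) = 0.5‖x − c‖², h is the indicator of the box [0, 1]^n (so its linear minimization oracle picks a vertex), the g(Tx) term and its proximal step are omitted, the step-size exponents and the noise decay are arbitrary choices, and the names `lmo_box` and `inexact_cg_al` are hypothetical.

```python
import numpy as np

def lmo_box(z):
    """Linear minimization oracle over the box [0, 1]^n: a minimizer of <z, s>."""
    return (z < 0).astype(float)

def inexact_cg_al(c, A, b, iters=500, rho=1.0, sigma=0.1, seed=0):
    """Toy inexact conditional-gradient / augmented-Lagrangian loop (assumed form)."""
    rng = np.random.default_rng(seed)
    n = c.size
    x = np.full(n, 0.5)            # primal iterate, starts inside the box
    mu = np.zeros(A.shape[0])      # dual variable for the constraint Ax = b
    x_weighted = np.zeros(n)       # running sum for the ergodic average
    weight_sum = 0.0
    for k in range(iters):
        gamma = 1.0 / (k + 1) ** 0.67   # open-loop step size (assumed schedule)
        theta = gamma                    # dual step size (assumed)
        # Inexact gradient of f: exact gradient plus noise that shrinks with k,
        # mimicking a stochastic gradient with a growing batch size.
        noise = sigma / (k + 1) * rng.standard_normal(n)
        grad = (x - c) + noise
        # Direction from the augmented Lagrangian: dual term plus quadratic penalty.
        z = grad + A.T @ (mu + rho * (A @ x - b))
        s = lmo_box(z)
        x = x + gamma * (s - x)          # convex combination, so x stays in the box
        mu = mu + theta * (A @ x - b)    # dual ascent on the feasibility residual
        x_weighted += gamma * x
        weight_sum += gamma
    return x, x_weighted / weight_sum, mu

c = np.array([0.9, 0.2, 0.4])
A = np.ones((1, 3))
b = np.array([1.0])
x_last, x_ergodic, mu_last = inexact_cg_al(c, A, b)
print("feasibility gap (ergodic):", abs((A @ x_ergodic - b)[0]))
```

Because every step size here lies in (0, 1] and the oracle returns a point of the box, both the last iterate and the weighted average remain in the box regardless of the gradient noise. Only the affine constraint is handled through the dual variable and the penalty.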
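Experimental Results
Research Questions
- RQ1: Can the CGALP algorithm be extended to allow inexact or stochastic errors in the computation of gradients, proximal terms, and linear minimization oracles while preserving convergence guarantees?
- RQ2: Under what conditions on the error sequences and algorithm parameters do the Lagrangian values converge almost surely to the optimal value, with the affine constraint Ax = b satisfied asymptotically almost surely?
- RQ3: What are the worst-case convergence rates of the optimality and feasibility gaps in the inexact and stochastic settings, and how do they compare with the exact case?
- RQ4: In practice, how can variance reduction or increasing batch sizes be used to meet the summability conditions required under stochastic errors?
- RQ5: Does the inexact variant retain the same parameter-dependent convergence rates as the original CGALP algorithm?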
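Key Findings
- Even with inexact or stochastic errors in the computation of its key components, the algorithm achieves almost sure convergence of the Lagrangian values to the optimal value.
- The primal iterates asymptotically satisfy the affine constraint Ax = b almost surely, ensuring feasibility in the limit.
- Under the same error conditions, the dual iterates converge strongly, almost surely, to a solution of the dual problem.
- The optimality and feasibility gaps both converge ergodically at a rate of O(1/(k+1)^0.24), matching the rate of the original CGALP algorithm.
- Numerical experiments confirm the predicted convergence rates across different batch sizes, for both deterministic sweeping and stochastic variance-reduction methods.
- The framework accommodates practical error sources such as stochastic gradients and deterministic errors; convergence is preserved as long as the error sequences satisfy the summability conditions.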
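A rate such as O(1/(k+1)^0.24) can be compared against recorded gaps by fitting a slope on a log-log scale. The sketch below shows one way a reader might do this. It is not the paper's experimental protocol, and both the helper name `fit_decay_exponent` and the synthetic gap sequence are assumptions made for illustration.

```python
import numpy as np

def fit_decay_exponent(gaps):
    """Least-squares estimate of p in gap_k ~ C / (k + 1)^p (gaps must be positive)."""
    gaps = np.asarray(gaps, dtype=float)
    k = np.arange(gaps.size)
    # Fit log(gap_k) = intercept + slope * log(k + 1); the decay exponent is -slope.
    slope, _intercept = np.polyfit(np.log(k + 1.0), np.log(gaps), 1)
    return -slope

# Synthetic gaps that decay exactly at the rate quoted above (illustrative data only).
k = np.arange(10_000)
synthetic_gaps = 3.0 / (k + 1.0) ** 0.24
estimated_exponent = fit_decay_exponent(synthetic_gaps)
print(f"estimated exponent: {estimated_exponent:.3f}")
```

Since the O(·) statement is an upper bound, a fitted exponent at or above the predicted value is consistent with it. Measured gaps are also noisy, so the fit is usually more informative when restricted to later iterations.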
This summary was generated by AI and reviewed by a human editor.