QUICK REVIEW

[Paper Review] Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization

Filip Hanzely, Peter Richtárik | arXiv (Cornell University) | Aug 9, 2018

Sparse and Compressive Sensing Techniques | References: 28 | Cited by: 28

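ONE-SENTENCE SUMMARY

This paper proposes accelerated Bregman proximal gradient (ABPG) methods for relatively smooth convex optimization. Using the triangle scaling exponent (TSE) $\gamma$ of the Bregman distance, these methods attain an $O(k^{-\gamma})$ convergence rate with $\gamma \in (0,2]$. The authors show that the intrinsic TSE always equals 2. This lets adaptive variants reach empirical $O(k^{-2})$ rates in practice, backed by numerical certificates, even where theoretical guarantees are lacking.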

ABSTRACT

We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. We investigate a triangle scaling property of the Bregman distance generated by the reference convex function and present accelerated Bregman proximal gradient (ABPG) methods that attain an $O(k^{-\gamma})$ convergence rate, where $\gamma \in (0,2]$ is the triangle scaling exponent (TSE) of the Bregman distance. For the Euclidean distance, we have $\gamma=2$ and recover the convergence rate of Nesterov's accelerated gradient methods. For non-Euclidean Bregman distances, the TSE can be much smaller (say $\gamma \leq 1$), but we show that a relaxed definition of intrinsic TSE is always equal to 2. We exploit the intrinsic TSE to develop adaptive ABPG methods that converge much faster in practice. Although theoretical guarantees on a fast convergence rate seem to be out of reach in general, our methods obtain empirical $O(k^{-2})$ rates in numerical experiments on several applications and provide posterior numerical certificates for the fast rates.
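For reference, the triangle scaling property mentioned in the abstract can be written out as follows. This is our reading of the paper's definition; the quantifiers and the $\operatorname{rint}\operatorname{dom} h$ notation are standard convex-analysis conventions that we assume here. The two worked cases show why the Euclidean distance has $\gamma = 2$. They also show why any jointly convex Bregman distance, such as the KL divergence, satisfies the property with $\gamma = 1$.

```latex
% Triangle scaling property with exponent \gamma (our reading of the definition)
D_h\big((1-\theta)x + \theta z,\; (1-\theta)x + \theta \tilde z\big) \le \theta^{\gamma}\, D_h(z, \tilde z),
\qquad \forall\, x, z, \tilde z \in \operatorname{rint}\operatorname{dom} h,\; \theta \in [0,1].

% Euclidean case: h(x) = \tfrac12\|x\|_2^2, so D_h(u,v) = \tfrac12\|u-v\|_2^2 and
D_h\big((1-\theta)x + \theta z,\; (1-\theta)x + \theta \tilde z\big)
  = \tfrac12\|\theta(z - \tilde z)\|_2^2 = \theta^{2} D_h(z, \tilde z)
  \quad\Rightarrow\quad \gamma = 2.

% Jointly convex D_h (e.g. KL divergence), using D_h(x, x) = 0:
D_h\big((1-\theta)x + \theta z,\; (1-\theta)x + \theta \tilde z\big)
  \le (1-\theta)\, D_h(x, x) + \theta\, D_h(z, \tilde z) = \theta\, D_h(z, \tilde z)
  \quad\Rightarrow\quad \gamma = 1.
```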

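MOTIVATION AND OBJECTIVES

Develop accelerated first-order methods for convex optimization problems that are relatively smooth with respect to a reference convex function $h$ and its Bregman distance $D_h$.
Characterize the convergence rates of these methods through the triangle scaling exponent (TSE) of the Bregman distance.
Introduce the notion of an intrinsic TSE, which always equals 2, as a basis for adaptive acceleration strategies.
Design adaptive ABPG methods that converge quickly in experiments even when the theoretical $O(k^{-2})$ rate cannot be proven.
Provide a numerical certificate of $O(k^{-2})$ convergence via the geometric mean $\overline{G}_k$ of the gains $G_k$ observed in practical runs.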
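The certificate in the last objective is a running geometric mean. Below is a minimal Python sketch. It assumes only that $\overline{G}_k = (G_0 G_1 \cdots G_{k-1})^{1/k}$ is computed from the positive per-iteration gains reported by an adaptive run (this indexing is our assumption). The function names and the `cap` threshold in `looks_like_k_minus_2` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def running_geometric_means(gains: Sequence[float]) -> List[float]:
    """Return [Gbar_1, ..., Gbar_k] with Gbar_j = (G_0 * ... * G_{j-1}) ** (1/j).

    Accumulates logarithms, so long runs of gains far from 1 do not
    overflow or underflow the running product.
    """
    means: List[float] = []
    log_sum = 0.0
    for j, g in enumerate(gains, start=1):
        if g <= 0:
            raise ValueError("gains must be positive")
        log_sum += math.log(g)
        means.append(math.exp(log_sum / j))
    return means


def looks_like_k_minus_2(gains: Sequence[float], cap: float) -> bool:
    """Hypothetical check: accept the run as an O(k^-2) certificate if the
    running geometric mean never exceeds the user-chosen threshold `cap`."""
    return all(m <= cap for m in running_geometric_means(gains))
```

For example, gains of 2.0, 0.5, 1.0 and 0.25 give running means of 2.0, 1.0, 1.0 and about 0.707. Working in log space is a deliberate choice: a product of many gains can underflow or overflow long before the mean itself does.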

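PROPOSED METHODS

Introduce the triangle scaling exponent (TSE) $\gamma \in (0,2]$ as a measure of how the Bregman distance between two points shrinks when both points are contracted toward a common point.
Propose accelerated Bregman proximal gradient (ABPG) methods that achieve an $O(k^{-\gamma})$ convergence rate under the $\gamma$-triangle-scaling condition.
Define the intrinsic TSE as the supremum of all $\gamma$ satisfying a relaxed triangle scaling inequality, and prove that it always equals 2.
Develop adaptive ABPG variants (e.g., ABPG-g and ABPG-e) that dynamically adjust the acceleration parameter $\gamma_k$ according to the observed gains $G_k$.
Use the geometric mean $\overline{G}_k$ of the gains $G_k$ as an a posteriori numerical certificate of $O(k^{-2})$ convergence.
In the proximal subproblem, use the Bregman distance $D_h(x,y)$ as the proximity measure instead of the squared Euclidean distance used by standard methods.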
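To make the update scheme concrete, here is a minimal pure-Python sketch of one common form of accelerated Bregman proximal gradient iteration. We assume this form matches ABPG with a fixed exponent $\gamma$; it uses no adaptive gains and no restarts. Each iteration performs four steps:

1. Form $y_k = (1-\theta_k)x_k + \theta_k z_k$.
2. Compute $z_{k+1}$ with a Bregman proximal step weighted by $\theta_k^{\gamma-1}L$.
3. Set $x_{k+1} = (1-\theta_k)x_k + \theta_k z_{k+1}$.
4. Choose $\theta_{k+1}$ as the smallest value with $(1-\theta_{k+1})/\theta_{k+1}^{\gamma} \le 1/\theta_k^{\gamma}$.

Several pieces are our own illustrative assumptions, not the paper's experiments: the reference function (negative entropy on the probability simplex, so $D_h$ is the KL divergence), the toy least-squares objective, and all names.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def kl_prox_step(z: Sequence[float], grad: Sequence[float], weight: float) -> Vector:
    """argmin over the simplex of <grad, u> + weight * KL(u, z): a multiplicative update."""
    expo = [-g / weight for g in grad]
    shift = max(expo)  # subtract the largest exponent to avoid overflow in exp
    w = [zi * math.exp(e - shift) for zi, e in zip(z, expo)]
    total = sum(w)
    return [wi / total for wi in w]


def next_theta(theta: float, gamma: float) -> float:
    """Smallest t in (0, 1] with (1 - t) / t**gamma <= 1 / theta**gamma, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid ** gamma + theta ** gamma * mid - theta ** gamma >= 0:
            hi = mid  # mid satisfies the inequality, so the boundary is at or below it
        else:
            lo = mid
    return hi


def abpg(grad_f: Callable[[Vector], Vector], x0: Vector, L: float,
         gamma: float, iters: int) -> Vector:
    """Hypothetical fixed-gamma ABPG sketch with h = negative entropy on the simplex."""
    x, z, theta = list(x0), list(x0), 1.0
    for _ in range(iters):
        y = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
        z = kl_prox_step(z, grad_f(y), theta ** (gamma - 1) * L)
        x = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
        theta = next_theta(theta, gamma)
    return x


# Toy instance (assumed, not from the paper): f(x) = 0.5 * ||Ax - b||^2 on the simplex.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
b = [1.0, 2.0]
n = len(A[0])


def residual(x: Sequence[float]) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]


def f(x: Sequence[float]) -> float:
    return 0.5 * sum(r * r for r in residual(x))


def grad_f(x: Vector) -> Vector:
    r = residual(x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]


# f is L-smooth w.r.t. ||.||_1 with L = max |(A^T A)_ij|; by Pinsker's inequality it is
# then L-smooth relative to the negative entropy on the simplex.
AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)] for i in range(n)]
L = max(abs(v) for row in AtA for v in row)

x0 = [1.0 / n] * n
x_final = abpg(grad_f, x0, L, gamma=1.0, iters=1000)
print(f(x0), f(x_final))
```

With `gamma=1.0`, the KL geometry is covered by the joint-convexity bound $\gamma = 1$, so an $O(k^{-1})$ guarantee applies. Passing `gamma=2.0` runs the same loop with a Nesterov-style $\theta_k$ schedule, but that bound no longer yields a guarantee. This is the gap the adaptive variants described above target.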

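EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: When the triangle scaling exponent of the Bregman distance is small ($\gamma < 2$), can accelerated Bregman proximal gradient methods still achieve an $O(k^{-\gamma})$ convergence rate?
RQ2: Is there a universal measure of acceleration potential across Bregman distances that does not depend on their individual TSE values?
RQ3: Can adaptive strategies that estimate the TSE on the fly converge faster in practice than methods with fixed parameters?
RQ4: In non-Euclidean settings where $O(k^{-2})$ convergence cannot be guaranteed theoretically, which numerical indicators can serve as reliable certificates of $O(k^{-2})$ convergence?
RQ5: On non-Euclidean, relatively smooth problems, how do the performance and convergence behavior of ABPG methods compare with standard BPG and BPG-LS?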

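KEY FINDINGS

The intrinsic triangle scaling exponent (TSE) always equals 2, independent of the underlying Bregman distance, which provides a universal basis for acceleration.
Adaptive ABPG methods (e.g., ABPG-g and ABPG-e) achieve empirical $O(k^{-2})$ convergence rates in numerical experiments, even where theoretical guarantees are unavailable.
The geometric mean $\overline{G}_k$ of the gains $G_k$ stays small throughout the iterations (e.g., $\overline{G}_k \ll 1$), serving as a numerical certificate of $O(k^{-2})$ convergence.
On the D-optimal design problem, ABPG with $\gamma=2$ converges at a rate comparable to or better than Nesterov's accelerated method.
On nonnegative regression with relative entropy, ABPG with restarts and adaptive $\gamma_k$ outperforms standard BPG and BPG-LS, especially in high dimensions.
In ABPG-e, the effective gain $\widehat{G}_k$ drops markedly as $\gamma_k$ decreases from 3 to 2.8, indicating convergence behavior consistent with an intrinsic TSE of 2.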
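Here is a quick numerical illustration of why the intrinsic exponent is 2 even when the worst-case TSE is smaller. For twice-differentiable $h$, a Taylor expansion gives

$D_h\big((1-\theta)x+\theta z, (1-\theta)x+\theta\tilde z\big) \approx \tfrac12\theta^2 (z-\tilde z)^\top \nabla^2 h(x)(z-\tilde z)$ as $\theta \to 0$.

So at fixed points, the ratio to $D_h(z,\tilde z)$ decays like $\theta^2$. The Python sketch below is our heuristic check, not the paper's proof. It measures the local log-log slope of that ratio for the generalized KL divergence on the positive orthant, at three arbitrarily chosen points. It also checks that the ratio stays below $\theta$, consistent with the $\gamma = 1$ bound from joint convexity.

```python
import math
from typing import List, Sequence


def kl(u: Sequence[float], v: Sequence[float]) -> float:
    """Generalized KL divergence D_h(u, v) for h(x) = sum(x log x - x) on the positive orthant."""
    return sum(ui * math.log(ui / vi) - ui + vi for ui, vi in zip(u, v))


def scaling_ratio(x: Sequence[float], z: Sequence[float], zt: Sequence[float],
                  theta: float) -> float:
    """D_h((1-theta)x + theta z, (1-theta)x + theta zt) / D_h(z, zt)."""
    a: List[float] = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
    c: List[float] = [(1 - theta) * xi + theta * ti for xi, ti in zip(x, zt)]
    return kl(a, c) / kl(z, zt)


def local_exponent(x: Sequence[float], z: Sequence[float], zt: Sequence[float],
                   theta: float, factor: float = 0.5) -> float:
    """Slope of log(ratio) against log(theta), measured between theta and factor * theta."""
    r1 = scaling_ratio(x, z, zt, theta)
    r2 = scaling_ratio(x, z, zt, factor * theta)
    return math.log(r1 / r2) / math.log(1 / factor)


# Arbitrary positive test points (assumed for illustration).
x, z, zt = [1.0, 2.0, 0.5], [3.0, 0.2, 1.0], [0.5, 1.5, 2.0]
for theta in (0.5, 1e-1, 1e-2, 1e-3):
    print(theta, scaling_ratio(x, z, zt, theta), local_exponent(x, z, zt, theta))
```

The slope approaches 2 as $\theta$ shrinks, but the constant in front depends on $x$, $z$ and $\tilde z$ through $\nabla^2 h(x)$. We take this point dependence to be why exponent-2 behavior appears only under the relaxed, intrinsic definition rather than as a uniform TSE.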


This review was generated by AI and reviewed by human editors.