QUICK REVIEW

[Paper Review] Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

Kaiyi Ji, Zhe Wang|arXiv (Cornell University)|Oct 26, 2019

Stochastic Gradient Optimization Techniques | References: 38 | Citations: 24

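ONE-SENTENCE SUMMARY

This paper proposes two zeroth-order variance-reduced algorithms for nonconvex optimization: ZO-SVRG-Coord-Rand, which improves on the function query complexity of existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD, and ZO-SPIDER-Coord, which uses coordinate-wise gradient estimation to avoid generating Gaussian random variables, allows a large constant stepsize while matching the convergence rate and query complexity of SPIDER-SZO, and converges linearly in a local PL region without restart.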

ABSTRACT

Two types of zeroth-order stochastic algorithms have recently been designed for nonconvex optimization respectively based on the first-order techniques SVRG and SARAH/SPIDER. This paper addresses several important issues that are still open in these methods. First, all existing SVRG-type zeroth-order algorithms suffer from worse function query complexities than either zeroth-order gradient descent (ZO-GD) or stochastic gradient descent (ZO-SGD). In this paper, we propose a new algorithm ZO-SVRG-Coord-Rand and develop a new analysis for an existing ZO-SVRG-Coord algorithm proposed in Liu et al. 2018b, and show that both ZO-SVRG-Coord-Rand and ZO-SVRG-Coord (under our new analysis) outperform other existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD. Second, the existing SPIDER-type algorithm SPIDER-SZO (Fang et al. 2018) has superior theoretical performance, but suffers from the generation of a large number of Gaussian random variables as well as a $\sqrt{\epsilon}$-level stepsize in practice. In this paper, we develop a new algorithm ZO-SPIDER-Coord, which is free from Gaussian variable generation and allows a large constant stepsize while maintaining the same convergence rate and query complexity, and we further show that ZO-SPIDER-Coord automatically achieves a linear convergence rate as the iterate enters into a local PL region without restart and algorithmic modification.

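MOTIVATION AND OBJECTIVES

Address the problem that existing SVRG-type zeroth-order algorithms have worse function query complexity than either ZO-GD or ZO-SGD.
Eliminate the need to generate Gaussian random variables in SPIDER-type methods while keeping the same convergence rate and query complexity.
Allow a constant stepsize in SPIDER-type algorithms without sacrificing convergence performance.
Achieve linear convergence in a local Polyak-Łojasiewicz (PL) region without restart or algorithmic modification.
Provide a tighter theoretical analysis of the existing ZO-SVRG-Coord algorithm that improves its query complexity and convergence rate.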
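For readers unfamiliar with the PL region mentioned above, the condition is commonly written as follows. This is the textbook form with an assumed constant \mu > 0 and minimum value f^*; the paper's normalization and its local (rather than global) statement may differ, and \rho is a placeholder rate introduced here only for illustration.

```latex
% Polyak-Lojasiewicz (PL) inequality, assumed to hold for all x in the region:
\frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \,\bigl( f(x) - f^{*} \bigr)

% "Linear convergence" then means a geometric decrease of the optimality gap,
% for some rate \rho \in (0, 1) that depends on the stepsize and on \mu:
\mathbb{E}\bigl[ f(x_{k}) - f^{*} \bigr] \;\le\; (1 - \rho)^{k} \,\bigl( f(x_{0}) - f^{*} \bigr)
```

The inequality needs no convexity: it only says that a small gradient implies a small optimality gap, which is why it can hold locally in a nonconvex problem.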

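PROPOSED METHODS

Proposes ZO-SVRG-Coord-Rand, a randomized variant of ZO-SVRG-Coord built on coordinate-wise gradient estimation, with improved convergence guarantees.
Develops a new analysis of ZO-SVRG-Coord that permits a constant stepsize and yields an O(1/K) convergence rate, improving on prior work.
Proposes ZO-SPIDER-Coord, a new SPIDER-type algorithm that avoids generating Gaussian random variables and supports a constant stepsize.
Uses coordinate-wise gradient estimators to reduce variance and improve query efficiency in the nonconvex setting.
Uses a telescoping-sum argument in the analysis to bound the expected gradient norm, which establishes convergence to a stationary point.
Introduces adaptive choices of batch size and epoch length to balance query complexity against convergence speed.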
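The coordinate-wise gradient estimator that these methods rely on can be sketched as below. This is a minimal illustration, assuming a central-difference form with a smoothing parameter `delta`; the function name `coord_grad_estimate`, the value of `delta`, and the toy objective `quadratic` are hypothetical and not taken from the paper.

```python
import numpy as np

def coord_grad_estimate(f, x, delta=1e-5):
    """Estimate grad f(x) from function values only, one coordinate at a time.

    Costs 2*d queries of f for a d-dimensional x and draws no random numbers,
    in contrast to estimators that sample Gaussian directions.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    grad = np.zeros(d)
    for i in range(d):
        e_i = np.zeros(d)
        e_i[i] = 1.0  # i-th standard basis vector
        # Central difference along coordinate i.
        grad[i] = (f(x + delta * e_i) - f(x - delta * e_i)) / (2.0 * delta)
    return grad

def quadratic(x):
    # Toy objective (assumption for illustration); its gradient is 2*x + 1.
    return float(np.sum(x ** 2) + np.sum(x))

x0 = np.array([1.0, -2.0, 0.5])
g_hat = coord_grad_estimate(quadratic, x0)
g_true = 2.0 * x0 + 1.0
```

The estimator is deterministic given `f` and `x`, so the only randomness left in an algorithm built on it comes from sampling which component functions to query. The price is a query cost that grows linearly with the dimension d.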

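EXPERIMENTAL RESULTS

Research Questions

RQ1: Can SVRG-type zeroth-order algorithms achieve better function query complexity than ZO-GD and ZO-SGD?
RQ2: Can SPIDER-type zeroth-order methods avoid generating Gaussian random variables while keeping the same convergence rate?
RQ3: Can SPIDER-type algorithms use a constant stepsize without degrading performance?
RQ4: Can ZO-SPIDER-Coord achieve linear convergence in a local PL region without restart?
RQ5: Can a new theoretical analysis improve the convergence rate and query complexity of the existing ZO-SVRG-Coord algorithm?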

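Key Findings

ZO-SVRG-Coord-Rand, and ZO-SVRG-Coord under the new analysis, achieve a function query complexity of O(min{d n^(2/3)/ε, d/ε^(5/3)}), improving on ZO-GD and ZO-SGD.
ZO-SPIDER-Coord achieves the same convergence rate and query complexity as SPIDER-SZO without generating Gaussian random variables.
ZO-SPIDER-Coord supports a constant stepsize, whereas SPIDER-SZO uses a √ε-level stepsize in practice.
ZO-SPIDER-Coord automatically attains linear convergence once the iterates enter a local PL region, without restart or algorithmic modification.
The new analysis of ZO-SVRG-Coord gives an O(1/K) convergence rate with a constant stepsize and substantially lowers the query complexity relative to earlier SVRG-type methods.
The proposed algorithms achieve a function query complexity of O(d min{n, 1/ε} log(1/ε)), matching or improving on existing state-of-the-art methods.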
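To make the ZO-SPIDER-Coord findings above more concrete, here is a rough sketch of a SPIDER-style recursion that uses only function values, a coordinate-wise estimator, and a constant stepsize. It is not the paper's exact algorithm: the function names, the stepsize, epoch length, batch size, smoothing parameter, the full pass at each epoch start, and the toy least-squares data are all assumptions chosen for illustration.

```python
import numpy as np

def coord_grad(f, x, delta=1e-5):
    """Deterministic central-difference gradient estimate (2*d queries of f)."""
    g = np.zeros(x.size)
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0
        g[i] = (f(x + delta * e_i) - f(x - delta * e_i)) / (2.0 * delta)
    return g

def zo_spider_coord_sketch(fns, x0, step=0.25, epoch_len=4, batch=2,
                           iters=80, seed=0):
    """SPIDER-style recursion driven only by function values.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    n = len(fns)
    x = np.array(x0, dtype=float)
    x_prev = x.copy()
    v = np.zeros_like(x)
    for k in range(iters):
        if k % epoch_len == 0:
            # Epoch start: refresh the estimate with a pass over all components.
            v = np.mean([coord_grad(f, x) for f in fns], axis=0)
        else:
            # Inner step: correct the previous estimate with a mini-batch
            # difference between the current and the previous iterate.
            idx = rng.choice(n, size=batch, replace=False)
            v = v + np.mean(
                [coord_grad(fns[i], x) - coord_grad(fns[i], x_prev) for i in idx],
                axis=0,
            )
        x_prev = x.copy()
        x = x - step * v  # constant stepsize; only indices are sampled
    return x

# Toy finite-sum least-squares problem (hypothetical data) with minimizer x_star.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
x_star = np.array([1.0, -2.0])
b = A @ x_star
fns = [lambda x, a=a, t=t: 0.5 * (a @ x - t) ** 2 for a, t in zip(A, b)]

x_final = zo_spider_coord_sketch(fns, x0=np.zeros(2))
```

On this small strongly convex example the iterate approaches `x_star` with a fixed `step`. That shows only the mechanics of the recursion and says nothing about the nonconvex guarantees stated in the findings.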

This review was generated by AI and checked by a human editor.