[Paper Review] Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
Introduces ZO-SVRG, a zeroth-order variance-reduced optimization method for nonconvex problems, analyzes its convergence and bias, and demonstrates enhanced performance with two accelerated variants and practical black-box applications.
As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms, and strike a balance between the convergence rate and the function query complexity.
Motivation & Objective
- Motivate variance reduction for zeroth-order (gradient-free) nonconvex optimization.
- Develop ZO-SVRG by blending SVRG with zeroth-order gradient estimators.
- Analyze convergence and error terms introduced by zeroth-order estimates.
- Propose accelerated variants to improve iteration complexity.
- Demonstrate effectiveness on black-box material classification and black-box adversarial attack generation.
Proposed method
- Formulate nonconvex finite-sum problem and adopt two-point zeroth-order gradient estimators.
- Introduce ZO-SVRG by replacing true gradients with blended zeroth-order gradient estimates in SVRG (Algorithm 2).
- Derive second-moment bounds for the blended gradient estimator and identify an O(d/b) error term when b<n.
- Provide convergence results showing E[||∇f(x̄)||^2] ≤ … with explicit terms including sampling and smoothing parameters.
- Propose acceleration via Avg-RandGradEst and CoordGradEst to improve iteration complexity.
- Compare query complexity and convergence across ZO-SVRG, ZO-SVRG-Ave, ZO-SVRG-Coord, ZO-SGD, and ZO-SVRC.
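The blended estimate described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's reference implementation. It assumes the common two-point form (d/μ)·[f(x + μu) − f(x)]·u with u drawn uniformly from the unit sphere. It reuses the same directions at the current iterate and at the snapshot, and it returns the last snapshot rather than a randomly chosen iterate. The names `zo_svrg`, `batch_estimate`, and `two_point_estimate` and all parameter names are hypothetical.

```python
import math
import random


def unit_sphere_direction(d, rng):
    """Draw a direction uniformly at random from the unit sphere in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def two_point_estimate(f, x, u, mu):
    """Two-point estimate (d / mu) * (f(x + mu * u) - f(x)) * u."""
    d = len(x)
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (d / mu) * (f(x_plus) - f(x))
    return [scale * ui for ui in u]


def batch_estimate(fs, idx, dirs, x, mu):
    """Average the two-point estimates of components idx along directions dirs."""
    total = [0.0] * len(x)
    for i, u in zip(idx, dirs):
        g = two_point_estimate(fs[i], x, u, mu)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / len(idx) for t in total]


def zo_svrg(fs, x0, eta, mu, b, epochs, inner_steps, seed=0):
    """SVRG loop in which every gradient is replaced by a two-point estimate."""
    rng = random.Random(seed)
    n, d = len(fs), len(x0)
    snapshot = list(x0)
    for _ in range(epochs):
        # Full zeroth-order estimate at the snapshot (2 * n function queries).
        all_idx = list(range(n))
        dirs = [unit_sphere_direction(d, rng) for _ in all_idx]
        g_full = batch_estimate(fs, all_idx, dirs, snapshot, mu)
        x = list(snapshot)
        for _ in range(inner_steps):
            # Mini-batch of size b, sampled with replacement.
            idx = [rng.randrange(n) for _ in range(b)]
            dirs = [unit_sphere_direction(d, rng) for _ in idx]
            g_x = batch_estimate(fs, idx, dirs, x, mu)
            g_snap = batch_estimate(fs, idx, dirs, snapshot, mu)
            # Blended estimate: mini-batch difference plus the snapshot estimate.
            v = [a - c + g for a, c, g in zip(g_x, g_snap, g_full)]
            x = [xi - eta * vi for xi, vi in zip(x, v)]
        snapshot = x
    return snapshot
```

Each entry of `fs` is a black-box component function that takes a list of floats and returns a float. Only function values are queried; no gradient is ever evaluated.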
Experimental results
Research questions
- RQ1: Can variance reduction techniques be effectively adapted to zeroth-order optimization for nonconvex objectives?
- RQ2: What is the impact of using two-point zeroth-order estimators on SVRG-type convergence guarantees?
- RQ3: How do accelerated zeroth-order variance-reduced variants compare in iteration and query complexity?
- RQ4: Do these methods perform well on real-world black-box problems such as material classification and black-box adversarial attacks?
Key findings
- ZO-SVRG achieves a convergence rate similar to SVRG with an additional O(d/b) error term due to zeroth-order estimation.
- Two accelerated variants, ZO-SVRG-Ave and ZO-SVRG-Coord, can reach best-known ZO convergence bounds in iterations.
- Avg-RandGradEst reduces the O(d/b) error to O(d/(bq)) with a moderate number of directions q, speeding up convergence.
- CoordGradEst offers the fastest iteration rate but queries the function along every coordinate (on the order of d queries per gradient estimate), increasing the overall query cost.
- Empirical results show the ZO-SVRG family outperforming ZO-SGD and ZO-SVRC in black-box chemical material classification and black-box DNN adversarial attack tasks.
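The query trade-off between the two accelerated estimators can be made concrete with a small sketch. It assumes that Avg-RandGradEst averages q two-point estimates along independent unit-sphere directions and that CoordGradEst takes a central difference along each coordinate. The `CountingOracle` wrapper, the function names, and the choice to query the base value f(x) only once are assumptions made for illustration.

```python
import math
import random


class CountingOracle:
    """Wrap a black-box function and count how often it is queried."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.f(x)


def avg_rand_grad_est(f, x, mu, q, rng):
    """Average q two-point random estimates; costs q + 1 queries here."""
    d = len(x)
    f_x = f(x)  # shared base value, queried once
    total = [0.0] * d
    for _ in range(q):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        u = [v / norm for v in g]  # uniform direction on the unit sphere
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (d / (mu * q)) * (f(x_plus) - f_x)
        total = [t + scale * ui for t, ui in zip(total, u)]
    return total


def coord_grad_est(f, x, mu):
    """Central difference along every coordinate; costs 2 * d queries."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad
```

Wrapping a function in `CountingOracle` and reading its `queries` attribute after each call shows the cost directly: the coordinate-wise estimate scales with the dimension d, while the averaged estimate scales with the chosen number of directions q.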
This review was created by AI and reviewed by human editors.