QUICK REVIEW

[Paper Review] Stochastic Zeroth-order Optimization via Variance Reduction method

Liu Liu, Minhao Cheng|arXiv (Cornell University)|May 30, 2018

Stochastic Gradient Optimization Techniques | References: 17 | Cited by: 18

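One-Sentence Summary

This paper proposes SZVR-G, a stochastic zeroth-order optimization method that combines Gaussian smoothing with variance reduction to tackle high-dimensional black-box optimization. By reducing variance in both the sample space and the search direction, it reports a query complexity of $O(d^{5/3}B^{1/3}/\varepsilon^{11/3})$, which the paper describes as sublinear in the dimension $d$ and strictly better than prior methods such as RGF and RSG in both the smooth and non-smooth settings.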

ABSTRACT

Derivative-free optimization has become an important technique used in machine learning for optimizing black-box models. To conduct updates without explicitly computing gradient, most current approaches iteratively sample a random search direction from Gaussian distribution and compute the estimated gradient along that direction. However, due to the variance in the search direction, the convergence rates and query complexities of existing methods suffer from a factor of $d$, where $d$ is the problem dimension. In this paper, we introduce a novel Stochastic Zeroth-order method with Variance Reduction under Gaussian smoothing (SZVR-G) and establish the complexity for optimizing non-convex problems. With variance reduction on both sample space and search space, the complexity of our algorithm is sublinear to $d$ and is strictly better than current approaches, in both smooth and non-smooth cases. Moreover, we extend the proposed method to the mini-batch version. Our experimental results demonstrate the superior performance of the proposed method over existing derivative-free optimization techniques. Furthermore, we successfully apply our method to conduct a universal black-box attack to deep neural networks and present some interesting results.

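Motivation and Objectives

Address the high query complexity of existing derivative-free methods, which stems from the $d$-dependent variance of the random search direction.
Develop a variance reduction framework for stochastic zeroth-order optimization that applies to both the sample space and the search-direction space.
Obtain convergence rates and query complexities for non-convex optimization that grow sublinearly in the dimension $d$.
Extend the method to the mini-batch setting so that the query complexity grows sublinearly in the batch size $B$.
Demonstrate practical value by carrying out a universal black-box adversarial attack on deep neural networks.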

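Proposed Method

Introduces a two-level variance reduction scheme: within each epoch, a set of Gaussian random vectors is held fixed and used to estimate the average gradient direction, which reduces the variance in the search direction.
Uses a stochastic zeroth-order oracle (SZO) to estimate the gradient by a finite difference along a random direction: $G_\mu(x,u,\xi) = \frac{F(x+\mu u,\xi) - F(x,\xi)}{\mu} u$.
Brings variance reduction techniques from first-order optimization (such as SVRG) into the zeroth-order setting by reusing gradient estimates across iterations.
Designs an outer loop that periodically recomputes the average gradient over a set of $D$ Gaussian vectors, and an inner loop that samples from this set to compute each update.
Introduces a mini-batch variant that processes several samples per iteration, with query complexity growing sublinearly in the batch size $B$.
Derives theoretical bounds on the step size $\eta$, the smoothing parameter $\mu$, and the number of iterations $K$ that guarantee convergence to $\|\nabla f(x)\|^2 \leq \varepsilon^2$.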
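
The two-loop structure described above can be illustrated with a minimal Python sketch. It is not the authors' exact algorithm: it simply combines the finite-difference estimator $G_\mu(x,u,\xi)$ with an SVRG-style control variate over a fixed set of Gaussian directions. The names `zo_grad`, `szvr_sketch`, `num_dirs` (playing the role of $D$), `inner_steps`, and all default values are assumptions made for illustration.

```python
import numpy as np


def zo_grad(F, x, u, xi, mu):
    """Finite-difference estimate (F(x + mu*u, xi) - F(x, xi)) / mu * u."""
    return (F(x + mu * u, xi) - F(x, xi)) / mu * u


def szvr_sketch(F, x0, samples, mu=1e-3, eta=1e-2, epochs=20,
                inner_steps=50, num_dirs=20, seed=0):
    """Illustrative SVRG-style zeroth-order loop (assumed structure).

    F(x, xi) is a black-box loss for parameter x and data sample xi;
    only function values are queried, never gradients.
    """
    rng = np.random.default_rng(seed)
    x_snap = np.array(x0, dtype=float)
    d = x_snap.size
    for _ in range(epochs):
        # Outer loop: fix a set of Gaussian directions for this epoch and
        # average the estimator over all directions and all samples.
        U = rng.standard_normal((num_dirs, d))
        g_snap = np.mean(
            [zo_grad(F, x_snap, u, xi, mu) for u in U for xi in samples],
            axis=0,
        )
        x = x_snap.copy()
        for _ in range(inner_steps):
            # Inner loop: draw one direction from the fixed set and one sample.
            u = U[rng.integers(num_dirs)]
            xi = samples[rng.integers(len(samples))]
            # Control variate: the same (u, xi) is evaluated at the current
            # point and at the snapshot, then corrected by the snapshot average.
            v = zo_grad(F, x, u, xi, mu) - zo_grad(F, x_snap, u, xi, mu) + g_snap
            x = x - eta * v
        x_snap = x
    return x_snap
```

On a toy problem such as `F = lambda x, xi: 0.5 * np.sum((x - xi) ** 2)`, whose minimizer is the mean of the samples, calling `szvr_sketch(F, np.zeros(5), samples)` should move the iterate toward that mean using function evaluations only. The step size must be kept small relative to the dimension, since a single-direction estimate has second moment that grows with $d$.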

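Experimental Results

Research Questions

RQ1: Can variance reduction techniques from first-order optimization be adapted to zeroth-order stochastic optimization to reduce the dependence on $d$?
RQ2: In high-dimensional zeroth-order optimization, what size $D$ of the per-epoch set of Gaussian vectors minimizes the query complexity?
RQ3: How does the query complexity of the proposed method compare with RGF and RSG on smooth and non-smooth non-convex problems?
RQ4: Does the method extend effectively to the mini-batch setting with complexity that grows sublinearly in the batch size?
RQ5: Given the lower query cost, can the method carry out black-box adversarial attacks on deep neural networks more efficiently?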

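Key Findings

The reported query complexity of SZVR-G is $O(d^{5/3}B^{1/3}/\varepsilon^{11/3})$, which the paper describes as sublinear in $d$ and better than that of RGF and RSG.
Applying variance reduction in both the sample space and the search-direction space substantially reduces the $d$ factor in the convergence rate.
For the mini-batch variant, the query complexity grows sublinearly in the batch size $B$, whereas it grows linearly for RGF and RSG.
The theoretical analysis shows convergence to $\|\nabla f(x)\|^2 \leq \varepsilon^2$ when the step size is $\eta = O(\varepsilon^{5/3}/(d^{5/3}B^{1/3}))$ and the smoothing parameter satisfies $\mu \leq O(\varepsilon/(L_0 d^{1/2}))$.
Experiments show better performance on logistic regression tasks, and the method is successfully applied to universal black-box adversarial attacks on deep neural networks with fewer queries.
The method also parallelizes better: a larger mini-batch size reduces the total number of iterations while the query cost grows only sublinearly.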
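
To make the stated scalings concrete, the following sketch evaluates the step size, smoothing parameter, and query-complexity expressions quoted above with every hidden constant set to 1. That choice is an assumption: the $O(\cdot)$ bounds do not fix the constants, so the returned values indicate growth rates only, not usable hyperparameters. The function name `szvr_g_scales` is hypothetical, and `L0` stands for the constant $L_0$ in the bound on $\mu$.

```python
def szvr_g_scales(d, B, eps, L0=1.0):
    """Evaluate the quoted scalings with all O(.) constants assumed to be 1."""
    eta = eps ** (5 / 3) / (d ** (5 / 3) * B ** (1 / 3))      # step size scale
    mu = eps / (L0 * d ** 0.5)                                # smoothing parameter scale
    queries = d ** (5 / 3) * B ** (1 / 3) / eps ** (11 / 3)   # query complexity scale
    return eta, mu, queries
```

Because the batch size enters only as $B^{1/3}$, raising $B$ from 1 to 8 doubles the query scale rather than multiplying it by 8, which is the sublinear growth in $B$ noted above.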


This review was generated by AI and checked by human editors.