QUICK REVIEW

[Paper Review] ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

Xiangyi Chen, Sijia Liu|arXiv (Cornell University)|Oct 15, 2019

Stochastic Gradient Optimization Techniques|Cited by 34

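One-Sentence Summary

ZO-AdaMM extends the adaptive momentum method to zeroth-order (gradient-free) optimization, analyzes the role of Mahalanobis-distance projection in establishing convergence, and shows faster convergence than six state-of-the-art zeroth-order methods on black-box adversarial attacks on ImageNet.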

ABSTRACT

The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm, that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is problem size. In particular, we provide a deep understanding on why Mahalanobis distance matters in convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis makes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications, designing per-image and universal adversarial attacks from black-box neural networks, respectively. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy compared with $6$ state-of-the-art ZO optimization methods.

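Motivation and Objectives

Extend AdaMM to zeroth-order (gradient-free) constrained optimization.
Provide a convergence analysis for the nonconvex and constrained settings with Mahalanobis-distance-based projection.
Quantify how the dimension d affects convergence, and compare against state-of-the-art zeroth-order methods.
Demonstrate practical effectiveness on black-box adversarial attacks on ImageNet.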

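Proposed Method

Define a zeroth-order gradient estimator using a forward difference along a random unit direction.
Integrate the estimator into the AdaMM framework with momentum and adaptive learning rates (AMSGrad-type update).
Project onto the feasible set using a Mahalanobis-distance-based projection to ensure convergence in the constrained setting.
Introduce a Mahalanobis-distance-based gradient mapping as the convergence measure, which connects to the transformed coordinates.
Give theoretical convergence results for the nonconvex unconstrained and constrained settings, and discuss variance-reduced estimators for handling projection bias in constrained problems.
Compare ZO-AdaMM with six state-of-the-art zeroth-order methods on black-box adversarial attack tasks on ImageNet.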
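The first three steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' reference implementation: the `d / mu` scaling of the estimator, the hyperparameter defaults, the box-shaped feasible set, and the function names `zo_gradient` and `zo_adamm` are all assumptions made for this example. For a box constraint with a diagonal scaling matrix, the Mahalanobis projection separates per coordinate and reduces to clipping, which is why `np.clip` is used here.

```python
import numpy as np


def zo_gradient(f, x, mu, rng):
    """Forward-difference gradient estimate along one random unit direction.

    Uses only two function evaluations; the d / mu scaling is an assumption
    of this sketch (a common choice for sphere-sampled directions).
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalise to a direction on the unit sphere
    return (d / mu) * (f(x + mu * u) - f(x)) * u


def zo_adamm(f, x0, steps=2000, alpha=0.01, beta1=0.9, beta2=0.99,
             mu=1e-3, lo=-1.0, hi=1.0, seed=0):
    """AMSGrad-type update driven by zeroth-order gradient estimates.

    Feasible set (assumed): the box [lo, hi]^d.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)            # first-moment (momentum) estimate
    v = np.zeros_like(x)            # second-moment estimate
    v_hat = np.full_like(x, 1e-8)   # running max of v, kept strictly positive
    for _ in range(steps):
        g = zo_gradient(f, x, mu, rng)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)        # AMSGrad: non-decreasing scaling
        x = x - alpha * m / np.sqrt(v_hat)  # adaptive step
        # Projection onto the box; with a diagonal metric this coincides
        # with the Mahalanobis projection because the problem is separable.
        x = np.clip(x, lo, hi)
    return x


if __name__ == "__main__":
    # Hypothetical black-box objective: only function values are queried.
    objective = lambda x: float(np.sum((x - 0.5) ** 2))
    x_start = np.zeros(10)
    x_end = zo_adamm(objective, x_start)
    print(objective(x_start), objective(x_end))
```

Each iteration costs two function queries regardless of the dimension, which is the trade-off behind the dimension-dependent slowdown: a single random direction carries only a small share of the full gradient information.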

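Experimental Results

Research Questions

RQ1 How can adaptive momentum methods be generalized to zeroth-order (gradient-free) optimization?
RQ2 In the constrained setting, what role does Mahalanobis-distance-based projection play in the convergence of ZO-AdaMM?
RQ3 What are the convergence rates of ZO-AdaMM in unconstrained and constrained nonconvex optimization, and how do they scale with the problem dimension d?
RQ4 How does ZO-AdaMM perform against existing zeroth-order methods on practical black-box problems such as adversarial attacks?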

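Key Findings

In the nonconvex setting, ZO-AdaMM's convergence rate is roughly a factor of $O(\sqrt{d})$ worse than that of first-order AdaMM, highlighting a dimension-dependent slowdown.
Mahalanobis-distance-based projection is necessary for convergence; Euclidean projection can lead to non-convergence in constrained problems.
With appropriate parameter choices, ZO-AdaMM obtains nonconvex convergence guarantees that scale in a controlled way with the dimension d.
On black-box adversarial attacks on ImageNet, ZO-AdaMM converges faster to high-accuracy solutions and yields smaller perturbations than six competing zeroth-order methods on both per-image and universal perturbation tasks.
The analysis introduces a Mahalanobis-distance-based convergence measure that connects to an equivalent gradient-descent view in transformed (y) coordinates, which helps the nonconvex constrained analysis.
Variance-reduction methods can further mitigate projection bias in constrained zeroth-order optimization.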
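To make the difference between Euclidean and Mahalanobis projection concrete, the sketch below projects a point onto a half-space under a diagonal metric. It is only an illustration: the half-space constraint, the weights `h` (standing in for the diagonal of the adaptive scaling matrix), and the helper names `project_halfspace` and `mahalanobis_sq` are assumptions of this example, not taken from the paper. The closed form follows from the Lagrangian of minimizing the weighted squared distance subject to `c @ x <= b`.

```python
import numpy as np


def project_halfspace(a, c, b, h=None):
    """Project a onto {x : c @ x <= b} under the metric diag(h).

    With h=None the weights are all ones, giving the Euclidean projection.
    """
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    h = np.ones_like(a) if h is None else np.asarray(h, dtype=float)
    violation = c @ a - b
    if violation <= 0:
        return a.copy()              # already feasible
    c_scaled = c / h                 # H^{-1} c for diagonal H
    return a - (violation / (c @ c_scaled)) * c_scaled


def mahalanobis_sq(x, a, h):
    """Squared distance (x - a)^T diag(h) (x - a)."""
    return float(np.sum(h * (x - a) ** 2))


if __name__ == "__main__":
    point = np.array([2.0, 2.0])
    normal = np.array([1.0, 1.0])
    weights = np.array([1.0, 9.0])   # second coordinate is "expensive" to move
    print(project_halfspace(point, normal, 2.0))           # Euclidean
    print(project_halfspace(point, normal, 2.0, weights))  # Mahalanobis
```

Both projections land on the constraint boundary, but the weighted one moves mostly along the cheaply weighted coordinate. An adaptive method that scales its step by one metric and projects with another can therefore end up at a different feasible point than the one its own geometry calls for.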

This review was generated by AI and reviewed by human editors.