QUICK REVIEW

[Paper Digest] Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach

Minhao Cheng, Thong Le | arXiv (Cornell University) | Jul 12, 2018

Adversarial Robustness in Machine Learning | References: 14 | Cited by: 104

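ONE-SENTENCE SUMMARY

This paper reformulates the hard-label black-box attack as a continuous, real-valued optimization problem and solves it with gradient-free methods, producing adversarial examples query-efficiently on CNNs and remaining effective even on non-differentiable models such as GBDT.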

ABSTRACT

We study the problem of attacking a machine learning model in the hard-label black-box setting, where no model information is revealed except that the attacker can make queries to probe the corresponding hard-label decisions. This is a very challenging problem since the direct extension of state-of-the-art white-box attacks (e.g., CW or PGD) to the hard-label black-box setting will require minimizing a non-continuous step function, which is combinatorial and cannot be solved by a gradient-based optimizer. The only current approach is based on random walk on the boundary, which requires lots of queries and lacks convergence guarantees. We propose a novel way to formulate the hard-label black-box attack as a real-valued optimization problem which is usually continuous and can be solved by any zeroth order optimization algorithm. For example, using the Randomized Gradient-Free method, we are able to bound the number of iterations needed for our algorithm to achieve stationary points. We demonstrate that our proposed method outperforms the previous random walk approach to attacking convolutional neural networks on MNIST, CIFAR, and ImageNet datasets. More interestingly, we show that the proposed algorithm can also be used to attack other discrete and non-continuous machine learning models, such as Gradient Boosting Decision Trees (GBDT).

MOTIVATION AND OBJECTIVES

Motivate and formalize attacking models when only hard-label decisions are observable in a black-box setting.
Propose a continuous real-valued reformulation of the hard-label attack objective to enable zeroth-order optimization.
Apply the Randomized Gradient-Free (RGF) method to solve the reformulated problem, with convergence guarantees.
Demonstrate effectiveness and query efficiency on MNIST, CIFAR-10, ImageNet, and Gradient Boosting Decision Trees (GBDT).
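
The reformulation named in the objectives above can be written out as a short sketch. The symbols f (the hard-label classifier), x_0 (the original input), y_0 (its true label), t (the target label), and \lambda (the step length along the direction) are notation assumed here for illustration; only g(\theta) appears in this digest.

```latex
% Untargeted attack: distance from x_0 to the nearest label change along direction theta
g(\theta) = \min_{\lambda > 0} \lambda
  \quad \text{s.t.} \quad
  f\!\left(x_0 + \lambda \frac{\theta}{\lVert \theta \rVert}\right) \neq y_0

% Targeted attack: same search, but the label must become the target t
g(\theta) = \min_{\lambda > 0} \lambda
  \quad \text{s.t.} \quad
  f\!\left(x_0 + \lambda \frac{\theta}{\lVert \theta \rVert}\right) = t

% The attack then searches over directions and reads off the adversarial example
\theta^{*} = \arg\min_{\theta} g(\theta),
  \qquad
  x^{*} = x_0 + g(\theta^{*}) \frac{\theta^{*}}{\lVert \theta^{*} \rVert}
```

Under this reading, the label f(x) is a step function of x, while g(\theta) changes gradually as the direction changes, which is what makes zeroth-order optimization applicable.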

PROPOSED METHOD

Formulate untargeted and targeted hard-label attacks as minimizing g(theta), the distance from the original input to the decision boundary along the search direction theta; g is real-valued and usually continuous.
Compute g(theta) via boundary-search (fine-grained search followed by binary search) using only hard-label queries.
Solve min_theta g(theta) with Randomized Gradient-Free (RGF): estimate the gradient from the finite difference of g at theta and theta + beta u, where u is a random Gaussian direction, then move theta along the negative estimated gradient.
Estimate gradients using multiple Gaussian directions to reduce noise, and apply backtracking line search for stability.
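Provide convergence guarantees: if g has a Lipschitz-continuous gradient and the error in evaluating g is controlled by epsilon, the iteration complexity is O(d/delta^2), where d is the input dimension and delta is the target accuracy.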
Extend the framework to non-differentiable/discrete models (e.g., GBDT) and demonstrate query-efficiency against prior decision-based attacks.
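
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `hard_label` is a hypothetical two-dimensional linear classifier standing in for the black-box model, the coarse search doubles the distance instead of using fine-grained steps, and all parameter values (`q`, `beta`, `eta`, `tol`) are chosen for this toy problem only.

```python
import numpy as np

def hard_label(x):
    # Hypothetical stand-in for the black-box model: returns only a label (0 or 1).
    w = np.array([1.0, 2.0])
    return int(w @ x - 4.0 > 0)

def g(theta, x0, y0, predict, step=0.1, max_dist=100.0, tol=1e-6):
    """Distance from x0 to the first detected label change along theta."""
    d = theta / np.linalg.norm(theta)
    # Coarse search (assumption: doubling steps) until the label changes.
    hi = step
    while predict(x0 + hi * d) == y0:
        hi *= 2.0
        if hi > max_dist:
            return max_dist  # no boundary found within the search range
    lo = hi / 2.0 if hi > step else 0.0
    # Binary search between a same-label point (lo) and a changed-label point (hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x0 + mid * d) == y0:
            lo = mid
        else:
            hi = mid
    return hi

def rgf_attack(x0, predict, theta0, iters=30, q=10, beta=0.005, eta=0.2, seed=0):
    """RGF descent on g with averaged Gaussian directions and backtracking."""
    rng = np.random.default_rng(seed)
    y0 = predict(x0)
    theta = theta0 / np.linalg.norm(theta0)
    dist = g(theta, x0, y0, predict)
    for _ in range(iters):
        # Average q finite-difference estimates to reduce noise.
        grad = np.zeros_like(theta)
        for _ in range(q):
            u = rng.standard_normal(theta.shape)
            grad += (g(theta + beta * u, x0, y0, predict) - dist) / beta * u
        grad /= q
        # Backtracking: halve the step until g decreases, otherwise keep theta.
        step = eta
        for _ in range(10):
            cand = theta - step * grad
            cand = cand / np.linalg.norm(cand)
            cand_dist = g(cand, x0, y0, predict)
            if cand_dist < dist:
                theta, dist = cand, cand_dist
                break
            step *= 0.5
    return theta, dist

x0 = np.array([0.0, 0.0])
theta, dist = rgf_attack(x0, hard_label, np.array([1.0, 0.0]))
x_adv = x0 + dist * (theta / np.linalg.norm(theta))
print("distortion:", dist, "adversarial label:", hard_label(x_adv))
```

Every call to `predict` is one hard-label query, so the cost of an attack is dominated by the number of `g` evaluations per iteration (here `q` for the gradient estimate plus a few for the line search). Renormalizing `theta` after each step is a convenience of this sketch, since `g` depends only on the direction.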

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can hard-label black-box attacks be effectively formulated as a real-valued optimization problem amenable to zeroth-order methods?
RQ2: What are the convergence guarantees and query complexities when using a randomized gradient-free approach under hard-label constraints?
RQ3: How does the proposed method perform in terms of distortion and query count compared to existing decision-based black-box attacks on CNNs across MNIST, CIFAR-10, and ImageNet?
RQ4: Is the approach applicable to non-differentiable models such as Gradient Boosting Decision Trees (GBDT), and what adversarial distortions are achievable?

KEY FINDINGS

The proposed g(theta)-based reformulation yields a continuous objective suitable for zeroth-order optimization in hard-label black-box settings.
RGF with boundary-based g evaluations achieves adversarial examples with fewer queries than the state-of-the-art decision-based attack across MNIST, CIFAR-10, and ImageNet.
The method attains comparable or better distortion with substantially fewer queries than prior black-box approaches for untargeted attacks and often faster convergence for targeted attacks.
The approach successfully attacks non-differentiable models like Gradient Boosting Decision Trees (GBDT) using only hard-label queries.
Theoretical results show convergence to stationary points under Lipschitz gradient assumptions and controlled evaluation precision, with O(d/delta^2) iterations for the desired accuracy delta.
Empirical results demonstrate effectiveness on CNNs and GBDTs, highlighting robustness concerns for widely used models.
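
The convergence finding above can be made concrete with a short sketch. The update rule follows the RGF description in this digest; the step size \eta_k, the iteration count N, and the particular stationarity measure (average expected squared gradient norm) are assumptions made here for illustration and are not stated in the digest.

```latex
% Single-direction RGF gradient estimate with smoothing parameter beta
\hat{\nabla} g(\theta_k) = \frac{g(\theta_k + \beta u) - g(\theta_k)}{\beta}\, u,
  \qquad u \sim \mathcal{N}(0, I_d)

% Descent step with step size eta_k (assumed notation)
\theta_{k+1} = \theta_k - \eta_k \, \hat{\nabla} g(\theta_k)

% Assumed form of the guarantee: an average-gradient bound reached in O(d / delta^2) iterations
\frac{1}{N+1} \sum_{k=0}^{N} \mathbb{E}\!\left[ \lVert \nabla g(\theta_k) \rVert^{2} \right] \le \delta^{2}
  \quad \text{for} \quad N = O\!\left(\frac{d}{\delta^{2}}\right)
```

Read this way, the bound grows linearly with the input dimension d, and each iteration costs several hard-label queries because g itself is evaluated by a boundary search.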

This digest was generated by AI and reviewed by human editors.