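[Paper Summary] Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
This paper proposes a bandit optimization framework that incorporates gradient priors to perform black-box adversarial attacks more efficiently. Compared with existing methods, it needs fewer queries and fails less often.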
We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available. We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and we demonstrate that the current state-of-the-art methods are optimal in a natural sense. Despite this optimality, we show how to improve black-box attacks by bringing a new element into the problem: gradient priors. We give a bandit optimization-based algorithm that allows us to seamlessly integrate any such priors, and we explicitly identify and incorporate two examples. The resulting methods use two to four times fewer queries and fail two to five times less often than the current state-of-the-art.
Motivation and Objectives
- Formalize gradient estimation as the central problem in query-efficient black-box attacks, and show that least-squares estimation is optimal when no prior information about the gradient is used.
- Introduce gradient priors (time-dependent and data-dependent) to exploit structure in gradients.
- Develop a bandit optimization framework to integrate priors into gradient estimation for adversarial example generation.
- Demonstrate substantially better query efficiency and lower failure rates than the prior state of the art on ImageNet classifiers.
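As a baseline for the gradient-estimation problem described above, here is a generic NES-style estimator with antithetic Gaussian sampling, which the summary says is equivalent to least-squares estimation. It is a minimal illustration under stated assumptions, not the authors' code: the function name `nes_gradient`, the smoothing scale `sigma`, the sample count, and the plain-list vector representation are hypothetical choices. Each sample spends two loss-oracle queries.

```python
import random


def nes_gradient(loss, x, sigma=0.01, n_samples=50, rng=None):
    """NES gradient estimate with antithetic Gaussian sampling (sketch).

    loss: black-box loss oracle mapping a list of floats to a float.
    Uses 2 * n_samples oracle queries.
    """
    rng = rng or random.Random(0)
    d = len(x)
    grad = [0.0] * d
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [xi + sigma * ui for xi, ui in zip(x, u)]
        minus = [xi - sigma * ui for xi, ui in zip(x, u)]
        # Antithetic (central) finite difference along u: two queries.
        slope = (loss(plus) - loss(minus)) / (2.0 * sigma)
        for i in range(d):
            grad[i] += slope * u[i]
    # Averaging slope * u estimates the gradient, since E[u u^T] = I.
    return [g / n_samples for g in grad]
```

On a linear toy loss the finite difference along each direction is exact, so the estimate approaches the true gradient as `n_samples` grows. On a real classifier, `sigma` trades curvature bias against numerical noise.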
Proposed Method
- Model the gradient estimation task as a bandit problem where actions are gradient estimates and losses measure inner-product alignment with the true gradient.
- Show that least-squares gradient estimation is equivalent to NES and is optimal in the underdetermined regime (fewer queries than input dimensions).
- Propose two gradient priors: time-dependent (gradient correlation along optimization path) and data-dependent (spatial gradient similarity via tiling).
- Use a two-query spherical gradient estimator inside a bandit update: the bandit algorithm A maintains and updates a latent vector v_t, and the gradient estimate g_t used by the attack is a projection of v_t.
- Turn the bandit-optimized gradient into an iterative adversarial attack: step the input along the current gradient estimate, then project it back onto the perturbation set.
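The method bullets above can be sketched as a single attack loop. This is a hypothetical, simplified ℓ2 version for a 1-D input, written under explicit assumptions rather than as the paper's implementation. The latent vector `v` is kept across iterations (time-dependent prior). `v` lives on a coarse grid and is upsampled by repeating each entry `tile` times (a 1-D stand-in for the tiling data prior). Each step spends two queries on a spherical finite-difference estimate, `v` is updated by plain gradient ascent, and the input moves along the normalized upsampled `v` before being projected back onto an ℓ2 ball. All names (`bandits_td_l2`, `_upsample`, `_norm`) and hyperparameter values are illustrative assumptions, and pixel-range clipping is omitted.

```python
import math
import random


def _norm(vec):
    # Euclidean norm, guarded against division by zero.
    return math.sqrt(sum(a * a for a in vec)) or 1e-12


def _upsample(coarse, tile):
    # Hypothetical 1-D data prior: each coarse entry controls `tile`
    # adjacent input coordinates, so nearby coordinates share a value.
    return [a for a in coarse for _ in range(tile)]


def bandits_td_l2(loss, x0, eps=1.0, steps=200, tile=4, delta=0.1,
                  fd_eta=0.1, online_lr=0.1, image_lr=0.05, seed=0):
    """Simplified l2 bandit attack with time and data priors (sketch).

    loss: black-box loss oracle (list of floats -> float) to be maximized.
    Returns (x_adv, number_of_loss_queries).
    """
    if len(x0) % tile != 0:
        raise ValueError("input length must be divisible by tile")
    rng = random.Random(seed)
    k = len(x0) // tile
    v = [0.0] * k  # time-dependent prior: v is carried over between steps
    x = list(x0)
    queries = 0
    for _ in range(steps):
        # Spherical exploration direction on the coarse grid.
        u = [rng.gauss(0.0, 1.0) for _ in range(k)]
        un = _norm(u)
        u = [b / un for b in u]
        q1 = _upsample([a + delta * b for a, b in zip(v, u)], tile)
        q2 = _upsample([a - delta * b for a, b in zip(v, u)], tile)
        n1, n2 = _norm(q1), _norm(q2)
        # Two loss queries at x perturbed along the normalized q1 and q2.
        l1 = loss([xi + fd_eta * qi / n1 for xi, qi in zip(x, q1)])
        l2 = loss([xi + fd_eta * qi / n2 for xi, qi in zip(x, q2)])
        queries += 2
        # Finite-difference estimate of how the alignment between the true
        # gradient and the direction of v changes along u.
        deriv = (l1 - l2) / (fd_eta * delta)
        # Bandit step: gradient ascent on v (an assumed l2-style update).
        v = [a + online_lr * deriv * b for a, b in zip(v, u)]
        # Attack step: move x along the normalized upsampled prior.
        g = _upsample(v, tile)
        gn = _norm(g)
        x = [xi + image_lr * gi / gn for xi, gi in zip(x, g)]
        # Project back onto the l2 ball of radius eps around x0.
        diff = [xi - oi for xi, oi in zip(x, x0)]
        dn = _norm(diff)
        if dn > eps:
            x = [oi + di * eps / dn for oi, di in zip(x0, diff)]
    return x, queries
```

With a toy linear loss whose weights are constant within each tile, the loss should rise while the perturbation norm stays within `eps`. Reusing `v` lets the estimate keep improving across iterations instead of restarting from zero at every step.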
Experimental Results
Research Questions
- RQ1: Can gradient estimation for black-box adversarial attacks be made more efficient by exploiting priors?
- RQ2: Which priors on gradients (time-dependent, data-dependent) improve query efficiency and reduce failure rates?
- RQ3: Is there a principled bandit framework for incorporating gradient priors into black-box adversarial example generation?
- RQ4: How do priors affect performance in ℓ2- and ℓ∞-constrained ImageNet attacks?
Key Findings
- Bandits TD (time + data priors) reduces the failure rate by 2–5x compared with the prior state of the art.
- The bandit methods use 2–4x fewer queries than NES while maintaining or increasing success rates.
- Under both ℓ∞ and ℓ2 constraints on ImageNet, Bandits TD needs substantially fewer queries on average per successful attack.
- The two priors (time-correlated gradients along the optimization path and data-dependent spatial similarity of gradients) substantially improve gradient prediction quality.
- The least-squares gradient estimator remains optimal in the standard setting, but priors enable further gains beyond this baseline.
This summary was generated by AI and reviewed by human editors.