QUICK REVIEW

[Paper Review] Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability

Nathan Inkawhich, Kevin J Liang|arXiv (Cornell University)|Apr 29, 2020

Adversarial Robustness in Machine Learning | References: 37 | Cited by: 30

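One-Sentence Summary

This paper proposes a multi-layer feature-space adversarial attack framework (FDA variants) that perturbs the intermediate representations of a DNN to achieve state-of-the-art targeted blackbox transferability, including in cross-distribution scenarios and with a query-based extension.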

ABSTRACT

We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs. We also show the superiority of our feature space methods under a relaxation of the common assumption that the source and target models are trained on the same dataset and label space, in some instances achieving a $10\times$ increase in targeted success rate relative to other blackbox transfer methods. Finally, we analyze why the proposed methods outperform existing attack strategies and show an extension of the method in the case when limited queries to the blackbox model are allowed.

Motivation and Objectives

Motivate and develop stronger targeted blackbox transfer attacks for DNN image classifiers.
Move beyond output-layer perturbations by perturbing multi-layer feature representations across the network depth.
Model layer-wise and class-wise feature distributions with auxiliary networks to guide adversarial noise.
Demonstrate state-of-the-art targeted transfer performance on ImageNet models and evaluate cross-distribution transfer.
Explore a query-efficient extension by combining FDA-based priors with a gradient-estimation attack.

Proposed Method

Define a set of layers and classes and train auxiliary models $g_{l,y}$ to estimate $p(y \mid f_l(x))$ for each layer $l$ and class $y$.
Extend the FDA objective to multi-layer perturbations and optionally include a cross-entropy term to align the whitebox output with the target class.
Optimize perturbations $\delta$ under $L_p$ constraints by maximizing a sum of per-layer target probabilities and feature-disruption terms across the chosen layers (FDA(N)); optionally include the +xent extension.
Use PyTorch autograd to assemble the attack and apply iterative projected gradient descent with momentum to produce adversarial examples.
Evaluate standard blackbox transfers on ImageNet models, conduct cross-distribution experiments with restricted datasets (RINet), and test a query-based extension by using FDA-based priors in P-RGF.
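
The optimization described in the steps above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the paper's implementation: each "layer" is a fixed random linear map, each auxiliary model is a logistic regressor, all layers are weighted equally, the +xent term is omitted, and the names `W`, `a`, `b`, `eta`, `mu`, and `momentum_attack` are hypothetical. It shows only the general shape of the procedure: sum a target-class probability and a relative feature-disruption term over several layers, then ascend that objective with momentum while projecting onto an $L_\infty$ ball.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, LAYERS = 16, 8, 3  # input dim, feature dim, number of attacked layers

# Toy stand-ins (assumptions, not the paper's networks): layer l's feature map is
# f_l(x) = W[l] @ x, and the auxiliary model for the target class is a logistic
# regressor p(y_tgt | f_l(x)) = sigmoid(a[l] . f_l(x) + b[l]).
W = rng.normal(size=(LAYERS, F, D)) / 4.0
a = rng.normal(size=(LAYERS, F))
b = rng.normal(size=LAYERS)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective_and_grad(x_clean, delta, eta=0.1):
    """Sum over layers of target probability + eta * relative feature disruption,
    returned together with its gradient with respect to delta."""
    total, grad = 0.0, np.zeros_like(delta)
    for l in range(LAYERS):
        f_clean = W[l] @ x_clean
        f_adv = W[l] @ (x_clean + delta)
        p = sigmoid(a[l] @ f_adv + b[l])
        diff = f_adv - f_clean
        dist = np.linalg.norm(diff)
        total += p + eta * dist / np.linalg.norm(f_clean)
        grad += p * (1.0 - p) * (W[l].T @ a[l])
        if dist > 0:  # the norm is not differentiable at delta = 0; skip it there
            grad += eta * (W[l].T @ diff) / (dist * np.linalg.norm(f_clean))
    return total, grad


def momentum_attack(x_clean, eps=0.05, steps=10, mu=1.0, eta=0.1):
    """Sign-gradient ascent with momentum, projected onto the L_inf ball of radius eps."""
    alpha = eps / steps
    delta = np.zeros_like(x_clean)
    g = np.zeros_like(x_clean)
    for _ in range(steps):
        _, grad = objective_and_grad(x_clean, delta, eta)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalized momentum
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
        delta = np.clip(x_clean + delta, 0.0, 1.0) - x_clean  # stay in valid pixel range
    return delta


x = rng.uniform(0.3, 0.7, size=D)
delta = momentum_attack(x)
before, _ = objective_and_grad(x, np.zeros(D))
after, _ = objective_and_grad(x, delta)
print(f"objective before: {before:.3f}, after: {after:.3f}")
```

In a real setting the linear maps would be replaced by a whitebox DNN's intermediate activations and the gradients would come from automatic differentiation rather than the hand-written expressions used here.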

Experimental Results

Research Questions

RQ1: How does perturbing across multiple intermediate layers affect targeted transferability in blackbox attacks?
RQ2: Do multi-layer feature-space attacks outperform single-layer or output-layer attacks in standard and strict blackbox settings?
RQ3: Can cross-distribution training data and label space differences be mitigated by feature-space attacks?
RQ4: What is the impact of adding a cross-entropy term and multi-layer perturbations on transfer success and disruption in feature space?
RQ5: Can a limited-query extension leveraging transfer priors improve query-efficient targeted attacks?

Key Findings

Multi-layer FDA attacks substantially improve targeted transfer rates compared to output-layer baselines and single-layer FDA, with gains often exceeding 10 percentage points in targeted success rate (tSuc).
Incorporating a cross-entropy term (FDA +xent) yields large gains in targeted success rate across many settings, and it complements multi-layer perturbations.
Increasing the number of layers N in FDA(N) nearly doubles the targeted success rate in many transfers, outperforming ensemble-based approaches.
Across cross-distribution scenarios, FDA-based methods remain superior to TMIM and TMIM+SGM, with notable gains even when whitebox and blackbox datasets or label spaces differ.
A query-based extension using FDA(5) +xent as a prior in P-RGF yields substantial gains in targeted success with limited queries, achieving over 90% tSuc with enough queries.
Distal transfers show FDA(N) +xent markedly higher transferability than baselines, underscoring the strength of feature-hierarchy perturbations.
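
The findings above are reported in terms of targeted success rate (tSuc). This summary does not spell out the metric, so the sketch below assumes the usual meaning: the fraction of adversarial examples that the blackbox model assigns to the attacker's chosen target class. The function name and the sample labels are hypothetical and purely illustrative.

```python
def targeted_success_rate(predictions, targets):
    """Fraction of adversarial examples the blackbox model labels as the target class."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(int(p == t) for p, t in zip(predictions, targets))
    return hits / len(predictions)


# Hypothetical blackbox predictions for five adversarial examples and their target labels.
blackbox_preds = [3, 7, 7, 1, 4]
target_labels = [3, 7, 2, 1, 9]
tsuc = targeted_success_rate(blackbox_preds, target_labels)
print(f"tSuc = {tsuc:.0%}")  # 3 of 5 predictions match the target
```

Under this definition, a gain of "10 percentage points" means, for example, moving from 20% to 30% of adversarial examples landing on the target class.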

This review was generated by AI and reviewed by a human editor.