QUICK REVIEW

[Paper Review] Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

Lei Qi, Lingfei Wu|arXiv (Cornell University)|Dec 1, 2018

Adversarial Robustness in Machine Learning|Cited by 57

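One-sentence summary

The paper casts discrete adversarial attacks on text as a set function optimization problem, proves that under specific conditions the attack objective for common neural text classifiers is submodular, and develops a gradient-guided greedy paraphrasing method that combines sentence-level and word-level substitutions to raise attack effectiveness while preserving semantics.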

ABSTRACT

Adversarial examples are carefully constructed modifications to an input that completely change the output of a classifier but are imperceptible to humans. Despite these successful attacks for continuous data (such as image and audio samples), generating adversarial examples for discrete structures such as text has proven significantly more challenging. In this paper we formulate the attacks with discrete input on a set function as an optimization task. We prove that this set function is submodular for some popular neural network text classifiers under simplifying assumption. This finding guarantees a $1-1/e$ approximation factor for attacks that use the greedy algorithm. Meanwhile, we show how to use the gradient of the attacked classifier to guide the greedy search. Empirical studies with our proposed optimization scheme show significantly improved attack ability and efficiency, on three different text classification tasks over various baselines. We also use a joint sentence and word paraphrasing technique to maintain the original semantics and syntax of the text. This is validated by a human subject evaluation in subjective metrics on the quality and semantic coherence of our generated adversarial text.

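Research motivation and goals

Motivate and formalize adversarial attacks on discrete text inputs as a set function optimization problem.
Identify the conditions under which the attack objective is submodular, so that greedy approximation becomes efficient.
Develop a gradient-guided, paraphrase-based attack algorithm that preserves semantics.
Empirically validate attack effectiveness across multiple text classification tasks and models.
Provide a framework that extends to discrete domains beyond text (e.g., malware detection, spam filtering).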

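Proposed method

Formulate the attack as maximizing $C_y(V(T_l(x)))$ over a sparse set of feature transformations, subject to $\|l\|_0 \le m$.
Define the set function $f(S) = \max_{\mathrm{supp}(l) \subseteq S} C_y(V(T_l(x)))$ and prove that maximizing it is NP-hard in general.
If $f$ is monotone and submodular, the greedy algorithm achieves a $1-1/e$ approximation.
Prove submodularity for two classes of neural networks: a simplified Word CNN (without dropout or softmax) and, under certain conditions, an RNN with one-dimensional hidden units.
Introduce gradient-guided greedy word paraphrasing (inspired by the Gauss–Southwell rule) to select high-impact words and search their replacements efficiently.
Propose joint sentence and word paraphrasing, combining semantic (Word Mover Distance) and syntactic constraints to preserve meaning, using paraphrase corpora (Paragram-SL999 for words, Para-nmt-50m for sentences).
Provide three algorithms: joint sentence and word paraphrasing (Algorithm 1), greedy sentence paraphrasing (Algorithm 2), and gradient-guided greedy word paraphrasing (Algorithm 3).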
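
The greedy step behind the $1-1/e$ guarantee can be illustrated with a minimal sketch. This is not the authors' code: the function name `greedy_maximize`, the toy `coverage` objective, and the `COVERS` table are assumptions made for illustration. A coverage function stands in for the attack objective $f$ because it is a standard example of a monotone submodular set function; in the attack setting, each item would be a position in the text that is allowed to change.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical toy data: each candidate position "covers" some abstract features.
COVERS: Dict[int, Set[str]] = {
    0: {"a", "b"},
    1: {"b", "c", "d"},
    2: {"d"},
    3: {"e"},
}


def coverage(selected: FrozenSet[int]) -> int:
    """Toy monotone submodular objective: number of distinct features covered."""
    covered: Set[str] = set()
    for item in selected:
        covered |= COVERS[item]
    return len(covered)


def greedy_maximize(
    ground_set: List[int],
    f: Callable[[FrozenSet[int]], float],
    m: int,
) -> List[int]:
    """Pick at most m items, each time adding the one with the largest marginal gain."""
    selected: Set[int] = set()
    order: List[int] = []
    for _ in range(m):
        current = f(frozenset(selected))
        best_gain, best_item = 0, None
        for item in ground_set:
            if item in selected:
                continue
            gain = f(frozenset(selected | {item})) - current
            if gain > best_gain:  # strict: ties keep the earlier item
                best_gain, best_item = gain, item
        if best_item is None:  # no remaining item improves the objective
            break
        selected.add(best_item)
        order.append(best_item)
    return order


if __name__ == "__main__":
    picked = greedy_maximize([0, 1, 2, 3], coverage, m=2)
    print(picked, coverage(frozenset(picked)))
```

With a budget of `m=2`, the sketch first picks item 1 (three features) and then item 0 (one new feature). The guarantee applies only when the objective is monotone and submodular, which the paper establishes for the attack objective under simplifying assumptions.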

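Experimental results

Research questions

RQ1: Can discrete text attacks be formulated as a set function optimization problem to which submodular optimization guarantees apply?
RQ2: Under what conditions is the attack objective submodular for common text classifiers (e.g., WCNN, RNN)?
RQ3: Does gradient-guided greedy search improve attack efficiency and effectiveness over existing baselines?
RQ4: How can semantics-preserving paraphrasing be incorporated into adversarial text generation without sacrificing attack success rate?
RQ5: Does the proposed method generalize across text classification tasks such as fake news detection, spam filtering, and sentiment analysis?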

Key findings

Dataset	WCNN Origin	WCNN ADV(ours)	WCNN ADV [19]	LSTM Origin	LSTM ADV(ours)	LSTM ADV [19]	Note
News	93.1%	35.4%	71.0%	93.3%	16.5%	37.0%	70.5%* and 22.8%* respectively
Trec07p	99.1%	48.6%	64.5%	99.7%	31.1%	39.8%	63.5%* and 37.6%* respectively
Yelp	93.6%	23.1%	39.0%	96.4%	30.0%	24.0%	41.2%* and 29.2%* respectively

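For several text classifiers, under specific modeling assumptions, the attack objective $f$ is monotone and submodular, so the greedy method achieves a $1-1/e$ approximation.
Gradient-guided greedy word paraphrasing efficiently identifies high-impact replacements by prioritizing the words with the largest gradient norm.
Joint sentence and word paraphrasing significantly improves attack success rate over word-only methods across datasets and models.
Empirically, the proposed method achieves higher attack success than the baselines on fake news detection, spam filtering, and sentiment analysis, including large drops in classifier accuracy with only a small number of replacements.
Results are consistent across the News, Trec07p, and Yelp datasets for both WCNN and LSTM models, and are compared in detail against the previous baseline.
The authors make code for reproducing the attacks publicly available online.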
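
The gradient-guided word selection described in the findings above can be sketched on a toy model. The block is a rough illustration under stated assumptions, not the paper's Algorithm 3. The classifier is a made-up RNN with a one-dimensional hidden state, $h_t = \tanh(U h_{t-1} + V e_t)$. The weights `U` and `V`, the scalar embeddings in `EMB`, and the paraphrase table `CANDIDATES` are all hypothetical. The semantic and syntactic constraints from the paper (such as Word Mover Distance) are omitted. At each step the sketch picks the replaceable position whose embedding gradient has the largest magnitude, then searches only that position's candidates.

```python
import math
from typing import Dict, List

U, V = 0.5, 1.0  # hypothetical recurrent and input weights of the toy RNN

# Hypothetical scalar embeddings and paraphrase candidates (illustration only).
EMB: Dict[str, float] = {
    "food": 0.0, "meal": 0.1, "was": 0.0,
    "awful": -2.0, "bad": -1.0, "mediocre": -0.3,
}
CANDIDATES: Dict[str, List[str]] = {"food": ["meal"], "awful": ["bad", "mediocre"]}


def forward(tokens: List[str]) -> List[float]:
    """Hidden states h_1..h_T of the recurrence h_t = tanh(U*h_{t-1} + V*e_t)."""
    h, states = 0.0, []
    for tok in tokens:
        h = math.tanh(U * h + V * EMB[tok])
        states.append(h)
    return states


def target_score(tokens: List[str]) -> float:
    """Score of the class the attacker wants; here simply the last hidden state."""
    return forward(tokens)[-1]


def embedding_grads(tokens: List[str]) -> List[float]:
    """Backpropagate d(score)/d(e_t) through the scalar recurrence."""
    states = forward(tokens)
    grads = [0.0] * len(tokens)
    upstream = 1.0  # d(score)/d(h_T)
    for t in range(len(tokens) - 1, -1, -1):
        local = 1.0 - states[t] ** 2  # derivative of tanh at step t
        grads[t] = upstream * local * V
        upstream *= local * U
    return grads


def with_swap(tokens: List[str], pos: int, word: str) -> List[str]:
    return tokens[:pos] + [word] + tokens[pos + 1:]


def gradient_guided_attack(tokens: List[str], m: int) -> List[str]:
    """Replace at most m words, choosing positions by gradient magnitude."""
    tokens = list(tokens)
    changed = set()
    for _ in range(m):
        grads = embedding_grads(tokens)
        open_positions = [i for i, tok in enumerate(tokens)
                          if i not in changed and CANDIDATES.get(tok)]
        if not open_positions:
            break
        pos = max(open_positions, key=lambda i: abs(grads[i]))
        best = max(CANDIDATES[tokens[pos]],
                   key=lambda w: target_score(with_swap(tokens, pos, w)))
        if target_score(with_swap(tokens, pos, best)) <= target_score(tokens):
            break  # the best candidate does not help; stop early
        tokens[pos] = best
        changed.add(pos)
    return tokens


if __name__ == "__main__":
    print(gradient_guided_attack(["food", "was", "awful"], m=1))
```

Ranking positions by gradient first means only one position's candidates are scored per step, instead of every candidate at every position. That is the efficiency argument the summary attributes to the gradient-guided variant.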

This review was generated by AI and reviewed by human editors.