QUICK REVIEW

[Paper Review] Natural Language Adversarial Attacks and Defenses in Word Level

Xiaosen Wang, Hao Jin|arXiv (Cornell University)|Sep 15, 2019

Adversarial Robustness in Machine Learning|References: 9|Cited by: 66

One-Sentence Summary

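This paper proposes the Synonyms Encoding Method (SEM), a defense against word-level adversarial attacks that substitute synonyms so as to preserve semantics and grammar. SEM places a neural encoder before the input layer to learn robust representations, reducing adversarial perturbations with minimal loss of accuracy on clean examples. The paper also introduces an Improved Genetic Algorithm (IGA) as a strong attack baseline for evaluation.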

ABSTRACT

Over the past two years, inspired by the large body of research on adversarial examples in computer vision, there has been growing interest in adversarial attacks for Natural Language Processing (NLP). Only a few works on adversarial defense for NLP have followed. However, there exists no defense method against the successful synonym-substitution-based attacks, which aim to satisfy all lexical, grammatical, and semantic constraints and are thus hard for humans to perceive. To fill this gap, we postulate that the generalization of the model leads to the existence of adversarial examples, and propose an adversarial defense method called Synonyms Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend against the current best synonym-substitution-based adversarial attacks with almost no decay in accuracy on benign examples. Besides, to better evaluate SEM, we also propose a strong attack method called Improved Genetic Algorithm (IGA) that applies the genetic metaheuristic to synonym-substitution-based attacks. Compared with the existing genetic-based adversarial attack, the proposed IGA achieves a higher attack success rate while maintaining the transferability of adversarial examples.

Research Motivation and Objectives

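Address the lack of effective defenses in natural language processing (NLP) against synonym-substitution adversarial attacks that preserve linguistic constraints.
Identify model generalization as the root cause of adversarial vulnerability under word-level attacks.
Develop a defense mechanism that neutralizes adversarial perturbations while maintaining high accuracy on benign inputs.
Propose a strong attack method to better evaluate the robustness of the proposed defense.
Establish a benchmark for evaluating adversarial robustness in NLP, using transferable adversarial examples that are hard for humans to perceive.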

Proposed Method

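Propose the Synonyms Encoding Method (SEM), which inserts a neural encoder before the input layer of an NLP model to learn invariant representations.
Train the model end to end on adversarial examples to reduce its sensitivity to synonym substitutions.
Use a structure in SEM resembling a sequence-to-sequence autoencoder, which encodes the input sentence into a latent space to suppress adversarial noise.
Design the Improved Genetic Algorithm (IGA) as a metaheuristic attack that evolves synonym substitutions to maximize the attack success rate.
Introduce a fitness function in IGA that balances attack success rate, semantic similarity, and grammatical correctness.
Ensure the transferability of adversarial examples across models by optimizing for generalization within the search space.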
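
To make the genetic search described in the last three items more concrete, here is a minimal sketch of a genetic synonym-substitution attack. It is not the paper's IGA: the synonym table, the word weights, the toy scoring model (`positive_score`), the threshold, and the fitness function are all hypothetical stand-ins. In particular, the fitness here only trades off the model's score against the number of substituted words, as a crude proxy for the semantic and grammatical terms mentioned above.

```python
import random

# All names and numbers below are hypothetical, chosen only for illustration.
SYNONYMS = {
    "good": ["fine", "decent", "great"],
    "enjoyed": ["liked", "loved"],
    "movie": ["film", "picture"],
}
WEIGHTS = {"good": 0.9, "great": 1.0, "fine": 0.1, "decent": 0.05,
           "enjoyed": 0.8, "liked": 0.6, "loved": 0.85}
THRESHOLD = 1.0  # the toy model predicts "positive" when the score exceeds this


def positive_score(words):
    """Toy stand-in for a victim model: sum of per-word sentiment weights."""
    return sum(WEIGHTS.get(w, 0.0) for w in words)


def fitness(candidate, original):
    """Reward a low positive score, lightly penalizing each substituted word."""
    changed = sum(c != o for c, o in zip(candidate, original))
    return -positive_score(candidate) - 0.05 * changed


def mutate(words, original, rng):
    """Set one substitutable position to the original word or one of its synonyms."""
    positions = [i for i, w in enumerate(original) if w in SYNONYMS]
    child = list(words)
    if positions:
        i = rng.choice(positions)
        child[i] = rng.choice(SYNONYMS[original[i]] + [original[i]])
    return child


def crossover(a, b, rng):
    """Single-point crossover between two candidates of equal length."""
    cut = rng.randint(0, len(a))
    return a[:cut] + b[cut:]


def genetic_attack(original, pop_size=8, generations=20, seed=0):
    """Return a candidate that flips the toy model's label, or None if none is found."""
    rng = random.Random(seed)
    population = [mutate(original, original, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, original), reverse=True)
        best = population[0]
        if positive_score(best) <= THRESHOLD:
            return best
        parents = population[: pop_size // 2]  # keep the fitter half as parents
        children = [best]                      # elitism: carry the best forward
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), original, rng))
        population = children
    best = max(population, key=lambda c: fitness(c, original))
    return best if positive_score(best) <= THRESHOLD else None


if __name__ == "__main__":
    sentence = ["i", "enjoyed", "this", "good", "movie"]
    print(genetic_attack(sentence))
```

Every mutation draws from the synonyms of the original word at that position, so a candidate never drifts further than one substitution step from the source sentence. That is one simple way to keep the search inside a synonym-only space.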

Experimental Results

Research Questions

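RQ1: Can model generalization be exploited as the root cause of word-level adversarial examples?
RQ2: Can a neural-encoder-based defense effectively neutralize synonym-substitution adversarial attacks without reducing accuracy on clean examples?
RQ3: How efficient is the proposed Improved Genetic Algorithm (IGA) at generating transferable adversarial examples with a high success rate?
RQ4: How well does SEM preserve model performance on benign inputs under adversarial training?
RQ5: Can the combination of IGA and SEM establish a sound benchmark for evaluating adversarial robustness in NLP?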

Key Findings

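SEM defends strongly against state-of-the-art synonym-substitution attacks with almost no loss of accuracy on clean examples.
The proposed Improved Genetic Algorithm (IGA) achieves a higher attack success rate than the existing genetic-based method while maintaining transferability.
Adversarial examples generated by IGA transfer well across different models, indicating strong robustness and generalization.
By learning robust sentence representations in a latent space, SEM effectively reduces the impact of adversarial perturbations.
Extensive experiments show that SEM maintains high performance on benign inputs, supporting its practicality in real-world NLP applications.
The combination of IGA and SEM provides a strong benchmark for evaluating adversarial robustness in NLP, particularly for word-level attacks.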
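
The intuition behind the SEM findings above can be illustrated with a small sketch of an encoder that sits in front of the model and collapses synonyms onto one shared token. Note the substitution: this summary describes a learned neural encoder, whereas the code below uses a fixed lookup table built from synonym clusters, purely as a simplified illustration. The synonym pairs, the word frequencies, and the function names `build_encoder` and `encode` are hypothetical, and merging pairs with union-find treats synonymy as transitive, which is an assumption made here for brevity.

```python
from collections import Counter

# Hypothetical synonym pairs and corpus frequencies, for illustration only.
SYNONYM_PAIRS = [("good", "fine"), ("good", "decent"),
                 ("movie", "film"), ("film", "picture")]
FREQUENCY = Counter({"good": 50, "fine": 20, "decent": 5,
                     "movie": 40, "film": 30, "picture": 10})


def build_encoder(pairs, frequency):
    """Map every word in a synonym cluster to the cluster's most frequent word."""
    parent = {}

    def find(word):
        # Union-find lookup with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    # Merge the two clusters of each synonym pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group words by cluster root.
    clusters = {}
    for word in list(parent):
        clusters.setdefault(find(word), []).append(word)

    # Pick one canonical word per cluster (ties broken alphabetically).
    encoder = {}
    for members in clusters.values():
        canonical = max(members, key=lambda w: (frequency[w], w))
        for word in members:
            encoder[word] = canonical
    return encoder


def encode(words, encoder):
    """Replace each word by its cluster's canonical word; unknown words pass through."""
    return [encoder.get(w, w) for w in words]


if __name__ == "__main__":
    encoder = build_encoder(SYNONYM_PAIRS, FREQUENCY)
    clean = ["a", "good", "movie"]
    perturbed = ["a", "decent", "picture"]
    # Both sentences reach the downstream model as the same token sequence.
    print(encode(clean, encoder), encode(perturbed, encoder))
```

Because the clean sentence and its synonym-substituted variant encode to the same sequence, a model trained on encoded inputs cannot be pushed to a different prediction by swaps inside a cluster. The cost is that distinctions between words in the same cluster are no longer visible to the model.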

This review was generated by AI and reviewed by human editors.