QUICK REVIEW

[Paper Review] Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Steffen Eger, Gözde Gül Şahin|arXiv (Cornell University)|Mar 27, 2019

Adversarial Robustness in Machine Learning|Cited by 49

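One-Sentence Summary

This paper presents VIPER, a visual text perturber, shows that visual character perturbations substantially degrade NLP models across multiple tasks, and evaluates shielding methods that improve robustness. Humans are largely unaffected by such perturbations, which highlights the gap between human and machine text processing.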

ABSTRACT

Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.

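Research Motivation and Objectives

Motivate and formalize visual perturbations of text as a realistic threat model for NLP.
Assess the impact of visual attacks on state-of-the-art NLP models across character-, word-, and sentence-level tasks.
Explore shielding techniques that improve robustness to visual perturbations.
Compare human perceptual robustness with machine vulnerability under visual perturbations.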

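Proposed Method

Introduce VIPER, a visual perturber that replaces characters with visually similar neighbors in a visual embedding space.
Define three character embedding spaces, image-based (ICES), description-based (DCES), and easy (ECES), that supply the visual neighbors used for perturbation.
Extend ELMo into SELMo (standard ELMo) and VELMo (visually informed ELMo) to study the integration of visual information.
Run human annotation experiments to measure how well perturbed text can be recovered.
Evaluate NLP tasks (grapheme-to-phoneme conversion (G2P), POS tagging, chunking, toxic comment classification) under visual perturbation and shielding.
Analyze shielding via adversarial training, visual embeddings, and rule-based recovery, and compare against unperturbed baselines.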
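
The character-replacement idea behind VIPER can be pictured with a minimal Python sketch. It assumes a small hand-written neighbor table (`VISUAL_NEIGHBORS`, hypothetical) in place of a learned character embedding space, a per-character replacement probability `p`, and uniform sampling among neighbors. This summary does not specify how neighbors are sampled, so treat those choices as illustrative rather than as the authors' implementation.

```python
import random

# Hypothetical neighbor table, hand-picked for illustration only.
# The paper derives neighbors from character embedding spaces instead.
VISUAL_NEIGHBORS = {
    "a": ["à", "á", "â"],
    "e": ["è", "é", "ê"],
    "i": ["ì", "í", "î"],
    "o": ["ò", "ó", "ô"],
    "u": ["ù", "ú", "û"],
}


def perturb(text, p, rng=None):
    """Replace each character that has neighbors with probability p.

    Neighbors are drawn uniformly, which is an assumption of this sketch.
    Characters without an entry in the table are left unchanged.
    """
    rng = rng or random.Random()
    out = []
    for ch in text:
        neighbors = VISUAL_NEIGHBORS.get(ch)
        if neighbors and rng.random() < p:
            out.append(rng.choice(neighbors))
        else:
            out.append(ch)
    return "".join(out)


# Usage: a seeded generator makes the perturbation reproducible.
print(perturb("you are an idiot", p=0.5, rng=random.Random(0)))
```

Raising `p` toward 1.0 perturbs more characters while keeping the string length unchanged, which is why such text stays readable to humans but no longer matches the tokens a model saw in training.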

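Experimental Results

Research Questions

RQ1: How do visual perturbations of text affect state-of-the-art NLP models across character-, word-, and sentence-level tasks?
RQ2: Are humans robust to visually perturbed text, and how does the perturbation type affect recoverability?
RQ3: Do shielding methods (adversarial training, visual embeddings, rule-based recovery) improve robustness to visual attacks?
RQ4: How large is the relative performance gap between attacked models and humans, and how does domain shift affect shielding?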

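Key Findings

NLP models suffer substantial performance drops under VIPER attacks, by up to 82% on some tasks.
Humans are affected only slightly or not at all, showing strong robustness relative to machines.
Adversarial training (AT) and visual character embeddings (CE) markedly improve robustness; AT often adds to the gains from CE, and the AT+CE combination outperforms either method alone.
Rule-based recovery offers strong protection in some settings, especially against ECES perturbations, but at high perturbation levels no shielding method fully restores performance to clean-data levels.
G2P, POS tagging, and chunking are more vulnerable to visual perturbations than toxic comment classification; character-level tasks are hit hardest.
DCES perturbations are more challenging than ECES perturbations in realistic attacks, and shielding effectiveness varies by task and perturbation type.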
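
To make the rule-based recovery finding concrete, here is a minimal Python sketch of one possible recovery rule for diacritic-style perturbations. It uses Unicode decomposition from the standard library as a stand-in. The paper's actual recovery rule is not described in this summary, so the function `recover` and its approach are assumptions, not the authors' method.

```python
import unicodedata


def recover(text):
    """Map accented characters back to their base form (illustrative rule)."""
    # NFKD splits a character such as "ò" into "o" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (Unicode category "Mn").
    stripped = "".join(
        c for c in decomposed if unicodedata.category(c) != "Mn"
    )
    # Recompose whatever remains into canonical form.
    return unicodedata.normalize("NFC", stripped)


print(recover("yòû áre ân ídìôt"))  # prints "you are an idiot"
```

A rule like this only undoes perturbations that decompose into a base letter plus a combining mark. Look-alikes taken from other scripts or from digits would pass through unchanged, which is in line with the finding that recovery works best on the simpler perturbation type.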

This review was generated by AI and checked by a human editor.