QUICK REVIEW

[Paper Review] Adversarial Texts with Gradient Methods

Zhitao Gong, Wenlu Wang|arXiv (Cornell University)|2018. 01. 22.

Adversarial Robustness in Machine Learning | Citations: 55

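One-Line Summary

This paper adapts gradient-based adversarial attacks from the image domain to text by searching for adversarial examples in the embedding space and reconstructing text via nearest-neighbor search. It uses Word Mover's Distance (WMD) to measure the quality of the results, and shows on the IMDB and Reuters datasets that FGM and DeepFool produce high-quality adversarial texts that differ from the originals by only a few words.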

ABSTRACT

Adversarial samples for images have been extensively studied in the literature. Among many of the attacking methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework to adapt the gradient attacking methods on images to text domain. The main difficulties for generating adversarial texts with gradient methods are i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the measurement of the quality of the adversarial texts is difficult. We tackle the first problem by searching for adversarials in the embedding space and then reconstruct the adversarial texts via nearest neighbor search. For the latter problem, we employ the Word Mover's Distance (WMD) to quantify the quality of adversarial texts. Through extensive experiments on three datasets, IMDB movie reviews, Reuters-2 and Reuters-5 newswires, we show that our framework can leverage gradient attacking methods to generate very high-quality adversarial texts that are only a few words different from the original texts. There are many cases where we can change one word to alter the label of the whole piece of text. We successfully incorporate FGM and DeepFool into our framework. In addition, we empirically show that WMD is closely related to the quality of adversarial texts.

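Motivation and Objectives

Address the problem of applying gradient attacks to discrete text inputs.
Develop a framework that operates in the embedding space and reconstructs text via nearest-neighbor search.
Quantify the quality of adversarial texts with Word Mover's Distance (WMD).
Demonstrate that gradient methods such as FGM and DeepFool can be incorporated into the framework.
Show that changing only a few words can flip a text's label on standard datasets.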

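Proposed Method

Search for adversarial examples in the embedding space to sidestep the discrete-input problem.
Reconstruct the adversarial text by mapping each perturbed embedding back to a word via nearest-neighbor search.
Quantify the quality of the adversarial text with Word Mover's Distance.
Incorporate gradient-based attacks such as FGM and DeepFool into the text framework.
Evaluate effectiveness on the IMDB, Reuters-2, and Reuters-5 datasets.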
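The first two steps above (perturb in embedding space, then map back to words) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy 2-D `EMBEDDINGS` table, the linear classifier weights `W` and `B`, and the helper names `fgm_step`, `nearest_word`, and `attack` are hypothetical, and the gradient step is a generic L2-normalized FGM-style update.

```python
import math

# Hypothetical toy vocabulary with 2-D embeddings (illustrative values only).
EMBEDDINGS = {
    "great": (1.0, 0.8),
    "good": (0.8, 0.5),
    "okay": (0.1, 0.0),
    "bad": (-0.8, -0.5),
    "awful": (-1.0, -0.9),
    "movie": (0.0, 0.3),
    "plot": (0.05, -0.2),
}

# Hypothetical linear classifier on the mean embedding:
# p(positive) = sigmoid(W . mean(vectors) + B)
W = (2.0, 1.0)
B = 0.0


def predict_proba(vectors):
    """Probability of the positive class for a list of embedding vectors."""
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(len(W))]
    z = sum(w * m for w, m in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(vectors, label):
    """Gradient of the cross-entropy loss w.r.t. each embedding vector.

    For this toy model it has the closed form (p - y) * W / n per word.
    """
    p = predict_proba(vectors)
    n = len(vectors)
    return [[(p - label) * w / n for w in W] for _ in vectors]


def fgm_step(vectors, label, eps):
    """One FGM-style step: move each embedding by eps along its
    L2-normalized loss gradient (increasing the loss)."""
    perturbed = []
    for v, g in zip(vectors, loss_gradient(vectors, label)):
        norm = math.sqrt(sum(x * x for x in g)) or 1.0
        perturbed.append(tuple(x + eps * gx / norm for x, gx in zip(v, g)))
    return perturbed


def nearest_word(vector):
    """Map an embedding back to the closest vocabulary word (Euclidean)."""
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], vector))


def attack(words, label, eps):
    """Perturb in embedding space, then reconstruct discrete text."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [nearest_word(v) for v in fgm_step(vectors, label, eps)]


original = ["great", "movie"]
adversarial = attack(original, label=1, eps=2.0)
print(original, "->", adversarial)
```

In this toy setup a small `eps` leaves most words unchanged after reconstruction, because the perturbed vector still lies closest to the original word; only a sufficiently large step crosses into another word's neighborhood. That is one way to see why the reconstructed text can differ from the original by only a few words.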

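Experimental Results

Research Questions

RQ1: How can gradient-based adversarial attacks be applied to the discrete text domain?
RQ2: How effective are embedding-space adversarial examples once they are reconstructed as text?
RQ3: What is the relationship between Word Mover's Distance and the perceived quality of adversarial texts?
RQ4: How many word changes are typically needed to flip a text's label on standard datasets?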

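Key Findings

The framework can generate high-quality adversarial texts by changing only a few words.
There are many cases where changing a single word is enough to flip the label of the whole text.
WMD is closely related to the perceived quality of adversarial texts.
FGM and DeepFool can be successfully incorporated into the framework.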
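To make the WMD-based quality measure more concrete, the sketch below compares an original text with two one-word variants. It is only an approximation under stated assumptions: it computes the relaxed WMD lower bound (each word sends all of its mass to the nearest word in the other text), not the exact optimal-transport WMD, and the toy embeddings `EMB` and the function name `relaxed_wmd` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy 2-D embeddings (illustrative values only).
EMB = {
    "great": (1.0, 0.8),
    "good": (0.8, 0.5),
    "awful": (-1.0, -0.9),
    "movie": (0.0, 0.3),
}


def relaxed_wmd(doc_a, doc_b, emb=EMB):
    """Relaxed Word Mover's Distance: a lower bound on exact WMD.

    Each word (weighted by its normalized frequency) is moved entirely to
    its nearest word in the other document; the bound is the larger of the
    two one-sided costs.
    """

    def one_sided(src, dst):
        counts = Counter(src)
        targets = set(dst)
        return sum(
            (count / len(src))
            * min(math.dist(emb[word], emb[other]) for other in targets)
            for word, count in counts.items()
        )

    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


original = ["great", "movie"]
close_variant = ["good", "movie"]   # near-synonym substitution
far_variant = ["awful", "movie"]    # meaning-changing substitution

print(relaxed_wmd(original, close_variant))
print(relaxed_wmd(original, far_variant))
```

Under these toy embeddings the near-synonym substitution yields a smaller distance than the meaning-changing one, which matches the intuition that a lower WMD corresponds to an adversarial text closer to the original.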


This review was generated by AI and checked by a human editor.