QUICK REVIEW

[Paper Review] Spatially Transformed Adversarial Examples

Chaowei Xiao, Jun-Yan Zhu|arXiv (Cornell University)|2018. 01. 08.

Advanced Malware Detection Techniques | Citations: 240

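One-Line Summary

The paper presents adversarial examples generated through spatial transformation (flow fields) rather than changes to pixel values, and shows that they are perceptually realistic and robust against standard defenses.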

ABSTRACT

Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the $\mathcal{L}_p$ distance for penalizing perturbations. Researchers have explored different defense methods to defend against such adversarial attacks. While the effectiveness of $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, in this paper we will instead focus on a different type of perturbation, namely spatial transformation, as opposed to manipulating the pixel values directly as in prior works. Perturbations generated through spatial transformation could result in large $\mathcal{L}_p$ distance measures, but our extensive experiments show that such spatially transformed adversarial examples are perceptually realistic and more difficult to defend against with existing defense systems. This potentially provides a new direction in adversarial example generation and the design of corresponding defenses. We visualize the spatial transformation based perturbation for different examples and show that our technique can produce realistic adversarial examples with smooth image deformation. Finally, we visualize the attention of deep networks with different types of adversarial examples to better understand how these examples are interpreted.

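Research Motivation and Goals

Motivates the study of perturbations beyond Lp-bounded distortions in pixel space.
Introduces a geometric attack that achieves misclassification while minimizing local spatial distortion.
Demonstrates the perceptual realism of spatially transformed adversarial examples on the MNIST, CIFAR-10, and ImageNet datasets.
Analyzes defense robustness and network attention under stAdv perturbations.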

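Proposed Method

Represents the adversarial perturbation as a per-pixel flow field f that maps each pixel of the adversarial image to a location in the original image, and computes the pixel values with bilinear interpolation.
Defines the attack objective by combining a misclassification loss with a flow regularization term that encourages locally smooth deformation.
Performs targeted attacks with a Carlini-Wagner-style loss, optimizing g(x_adv) toward the specified target class.
Regularizes the flow with a total-variation-like loss (L_flow) to enforce locally smooth transformations.
Optimizes with L-BFGS with backtracking to obtain the optimal flow field, which yields the adversarial image.
Visualizes the flow fields to show that the distortions are localized and concentrated around edges.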
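The components listed above (flow-based warping with bilinear interpolation, a Carlini-Wagner-style targeted loss, and a total-variation-like flow penalty) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(2, H, W)` flow layout, the weight `tau`, and the margin `kappa` are all hypothetical, and the flow penalty counts each neighboring pixel pair once. The L-BFGS optimization step is omitted; in practice an autodiff framework would supply the gradients with respect to the flow.

```python
import numpy as np

def warp_bilinear(image, flow):
    """Warp a grayscale image (H, W) with a per-pixel flow field (2, H, W).

    flow[0] and flow[1] hold the row and column displacement from each
    output (adversarial) pixel to the input location it samples from.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates in the input image, clamped to the image bounds.
    src_r = np.clip(rows + flow[0], 0, h - 1)
    src_c = np.clip(cols + flow[1], 0, w - 1)
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = src_r - r0  # fractional offsets act as interpolation weights
    wc = src_c - c0
    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom

def flow_loss(flow):
    """Total-variation-like penalty on displacement differences
    between vertically and horizontally adjacent pixels."""
    d_down = flow[:, 1:, :] - flow[:, :-1, :]    # shape (2, H-1, W)
    d_right = flow[:, :, 1:] - flow[:, :, :-1]   # shape (2, H, W-1)
    loss = np.sqrt((d_down ** 2).sum(axis=0)).sum()
    loss += np.sqrt((d_right ** 2).sum(axis=0)).sum()
    return float(loss)

def targeted_margin_loss(logits, target, kappa=0.0):
    """C&W-style targeted loss: max(max_{i != t} z_i - z_t, -kappa)."""
    other = np.delete(logits, target)
    return float(max(other.max() - logits[target], -kappa))

def objective(logits, flow, target, tau=0.05):
    """Combined objective; tau is an assumed trade-off weight."""
    return targeted_margin_loss(logits, target) + tau * flow_loss(flow)
```

A zero flow reproduces the input image, and any constant flow has zero `flow_loss`, so the penalty only discourages displacements that differ between neighboring pixels. The logits passed to `objective` would come from the attacked classifier evaluated on the warped image.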

Figure 1: Generating adversarial examples with spatial transformation: the blue point denotes the coordinate of a pixel in the output adversarial image and the green point is its corresponding pixel in the input image. Red flow field represents the displacement from pixels in adversarial image to pi

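Experimental Results

Research Questions

RQ1: Can spatially transformed perturbations degrade classifier accuracy while preserving perceptual realism?
RQ2: How do stAdv adversarial examples perform against standard defenses such as adversarial training, compared with attacks such as FGSM and C&W?
RQ3: Do stAdv perturbations shift the network's attention, and how do robust models respond?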

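Key Findings

stAdv generates perceptually realistic adversarial examples on MNIST, CIFAR-10, and ImageNet by smoothly deforming the image instead of changing pixel values.
The optimized flow fields are locally smooth and often concentrate on object edges or regions important for recognition.
stAdv achieves high attack success rates and remains challenging for several defense strategies, including adversarial training variants.
CAM visualizations show that stAdv can redirect the network's attention, and robust models remain vulnerable to stAdv attacks.
The mean blur defense offers more limited protection against stAdv than against other attacks, and adaptive attacks can restore the attack's effectiveness against it.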
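For readers unfamiliar with the mean blur defense mentioned above, the following is a minimal sketch of such a preprocessing step: the input is smoothed with a small averaging filter before it reaches the classifier. The function name, the default kernel size `k=3`, and the edge-padding choice are assumptions made for illustration; the review does not state the exact settings used in the paper.

```python
import numpy as np

def mean_blur(image, k=3):
    """Apply a k x k mean filter (odd k) to a grayscale image (H, W).

    Borders are handled by repeating edge pixels, an assumed choice.
    """
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted views of the padded image, then average.
    for dr in range(k):
        for dc in range(k):
            out += padded[dr:dr + h, dc:dc + w]
    return out / (k * k)
```

Averaging of this kind tends to wash out small, high-frequency pixel-value perturbations, whereas a smooth geometric deformation changes where content sits rather than adding noise, which is one plausible reading of why the review reports limited protection against stAdv.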


This review was generated by AI and reviewed by a human editor.