QUICK REVIEW

[Paper Review] Delving into Transferable Adversarial Examples and Black-box Attacks

Yanpei Liu, Xinyun Chen | arXiv (Cornell University) | 2016-11-08

Adversarial Robustness in Machine Learning | Citations: 819

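One-line summary

This paper studies the transferability of adversarial examples across ImageNet-scale models. Non-targeted transfer is common, targeted transfer is rare unless ensemble-based methods are used, and black-box transfer is demonstrated on Clarifai.com.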

ABSTRACT

An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understanding the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.

Motivation and Goals

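Evaluate how well non-targeted adversarial examples transfer among large ImageNet models.
Investigate targeted adversarial transferability and how often it occurs.
Develop ensemble-based methods that improve the transferability of targeted attacks.
Analyze geometric properties of large models to understand why transfer happens.
Demonstrate black-box transferability against a real-world service (Clarifai.com).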

Proposed Method

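Compare optimization-based, fast gradient (FG), and fast gradient sign (FGS) methods for non-targeted and targeted attacks across multiple architectures.
Evaluate transferability on 100 ImageNet validation images that every examined model classifies correctly.
Introduce ensemble-based optimization that generates adversarial examples against multiple models simultaneously.
Analyze gradient directions and decision boundaries to understand why adversarial examples transfer.
Test the resulting adversarial examples against a real black-box service (Clarifai.com).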
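The single-model and ensemble attacks listed above can be illustrated on toy models. This is a minimal sketch under stated assumptions, not the authors' code: the hypothetical `LinearSoftmax` class stands in for a white-box CNN so that input gradients have a closed form, FG is taken as an L2-normalized gradient step and FGS as a sign step, distortion is measured as RMSD, and the ensemble objective (roughly, minimize the negative log of the weighted-average target-class probability plus a squared-L2 distortion penalty weighted by the assumed parameter `lam`) is optimized with plain gradient descent for simplicity.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()


class LinearSoftmax:
    """Hypothetical stand-in for a white-box classifier: p(x) = softmax(W x + b)."""

    def __init__(self, W, b):
        self.W, self.b = W, b

    def probs(self, x):
        return softmax(self.W @ x + self.b)

    def ce_grad(self, x, label):
        """Gradient of the cross-entropy -log p(x)[label] with respect to the input x."""
        p = self.probs(x)
        p[label] -= 1.0  # p - onehot(label)
        return self.W.T @ p


def fg(model, x, label, eps, targeted=False):
    """Fast Gradient (FG): a single L2-normalized gradient step of length eps."""
    g = model.ce_grad(x, label)
    step = eps * g / np.linalg.norm(g)
    # Non-targeted: raise the loss of the true label. Targeted: lower the loss of the target label.
    return x - step if targeted else x + step


def fgs(model, x, label, eps, targeted=False):
    """Fast Gradient Sign (FGS): move every input coordinate by +eps or -eps."""
    step = eps * np.sign(model.ce_grad(x, label))
    return x - step if targeted else x + step


def rmsd(x_adv, x):
    """Root-mean-square distortion between adversarial and original inputs."""
    return float(np.sqrt(np.mean((x_adv - x) ** 2)))


def ensemble_loss_and_grad(models, alphas, x_adv, x, target, lam):
    """Loss -log(sum_i alpha_i * p_i(x_adv)[target]) + lam * ||x_adv - x||^2 and its gradient."""
    q, dq = 0.0, np.zeros_like(x_adv)
    for m, a in zip(models, alphas):
        p = m.probs(x_adv)
        e_t = np.zeros_like(p)
        e_t[target] = 1.0
        q += a * p[target]
        # For a linear softmax model: d p[t] / dx = W^T (p[t] * (e_t - p)).
        dq += a * (m.W.T @ (p[target] * (e_t - p)))
    diff = x_adv - x
    loss = -np.log(q) + lam * float(diff @ diff)
    grad = -dq / q + 2.0 * lam * diff
    return loss, grad


def ensemble_attack(models, alphas, x, target, lam=0.01, lr=0.05, steps=300):
    """Targeted attack against all ensemble members at once (plain gradient descent)."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = ensemble_loss_and_grad(models, alphas, x_adv, x, target, lam)
        x_adv = x_adv - lr * g
    return x_adv


# Usage on three random toy models with 5 input features and 3 classes.
rng = np.random.default_rng(0)
models = [LinearSoftmax(0.5 * rng.standard_normal((3, 5)), np.zeros(3)) for _ in range(3)]
alphas = [1 / 3] * 3  # equal ensemble weights (an assumption)
x = rng.uniform(0.2, 0.8, size=5)
x_adv = ensemble_attack(models, alphas, x, target=2)
print([int(np.argmax(m.probs(x_adv))) for m in models], rmsd(x_adv, x))
```

Optimizing one loss over the whole ensemble, instead of attacking each model separately, is the core idea of the ensemble approach. To mimic a black-box test, one would craft examples on some models and evaluate them on a model left out of the ensemble. A real image attack would also clip `x_adv` to the valid pixel range.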

Experimental Results

Research Questions

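RQ1: How transferable are non-targeted adversarial examples among large ImageNet models?
RQ2: How well do targeted adversarial examples transfer between models, and can that transfer be improved?
RQ3: Do ensemble-based approaches increase targeted transferability across multiple models?
RQ4: Which geometric properties of large CNNs explain transferability (e.g., gradient orthogonality, decision-boundary alignment)?
RQ5: Can adversarial examples transfer to a black-box online classifier whose model and training data are unknown (Clarifai.com)?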
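RQ1 and RQ2 both ask "how transferable", which needs a concrete measure. A minimal sketch, assuming two common metrics: for non-targeted attacks, the evaluated model's top-1 accuracy on the adversarial inputs (lower means stronger transfer), and for targeted attacks, the matching rate (the fraction classified as the attacker's target label). The helper `transfer_table` and the integer "images" and function "models" in the usage are hypothetical placeholders, not the paper's data.

```python
from typing import Callable, Dict, List, Sequence


def top1_accuracy(preds: Sequence[int], true_labels: Sequence[int]) -> float:
    """Non-targeted transfer: accuracy on adversarial inputs (lower = stronger transfer)."""
    if len(preds) != len(true_labels) or not preds:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(p == y for p, y in zip(preds, true_labels)) / len(preds)


def matching_rate(preds: Sequence[int], target_labels: Sequence[int]) -> float:
    """Targeted transfer: fraction of adversarial inputs classified as the target label."""
    if len(preds) != len(target_labels) or not preds:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(p == t for p, t in zip(preds, target_labels)) / len(preds)


def transfer_table(
    models: Dict[str, Callable[[object], int]],
    adv_sets: Dict[str, List[object]],
    labels: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
) -> Dict[str, Dict[str, float]]:
    """Rows: model the examples were crafted on. Columns: model they are evaluated on."""
    return {
        src: {dst: metric([model(x) for x in adv_sets[src]], labels) for dst, model in models.items()}
        for src in adv_sets
    }


# Toy stand-ins: "images" are ints and "models" are simple functions (purely illustrative).
models = {"m1": lambda x: x % 3, "m2": lambda x: (x + 1) % 3}
labels = [0, 1, 2]
adv_sets = {"m1": [1, 2, 0], "m2": [0, 1, 2]}
table = transfer_table(models, adv_sets, labels, top1_accuracy)
print(table)  # {'m1': {'m1': 0.0, 'm2': 0.0}, 'm2': {'m1': 1.0, 'm2': 0.0}}
```

Diagonal cells correspond to the white-box setting; off-diagonal cells measure transfer. For a targeted table, pass `matching_rate` and the target labels instead of the true labels.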

Key Findings

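Non-targeted adversarial examples transfer substantially among ImageNet models, and the degree of transfer varies with the source/target model pair.
Targeted adversarial examples generated with existing single-model methods transfer poorly, whereas ensemble-based generation substantially increases targeted transferability across models.
Ensemble-based targeted attacks achieve high transfer rates against several models, but transfer is not universal and depends on the target model.
Gradient directions of different models are largely orthogonal, while their decision boundaries are aligned, which partly explains why adversarial examples transfer.
Adversarial examples generated with ensemble methods can transfer to a real black-box service (Clarifai.com) even though its training data and label set are unknown.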
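The gradient-orthogonality finding rests on comparing input-gradient directions of two models at the same input via cosine similarity. The sketch below shows that measurement on two independently initialized toy linear softmax models; `input_gradient`, the dimension `d`, and the class count `k` are illustrative assumptions. Because the toy models are independent random maps, the near-zero cosine here mainly reflects that random directions in 10,000 dimensions are nearly orthogonal; it demonstrates the measurement, not the paper's result on ImageNet CNNs.

```python
import numpy as np


def input_gradient(W, x, label):
    """Cross-entropy gradient w.r.t. x for a hypothetical linear softmax model softmax(W x)."""
    z = W @ x
    p = np.exp(z - z.max())  # stable softmax
    p /= p.sum()
    p[label] -= 1.0  # p - onehot(label)
    return W.T @ p


def cosine(u, v):
    """Cosine similarity between two gradient directions (0 means orthogonal)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


d, k = 10_000, 10  # input dimension and number of classes (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=d)
W_a = rng.standard_normal((k, d)) / np.sqrt(d)  # two independently initialized toy models
W_b = rng.standard_normal((k, d)) / np.sqrt(d)
g_a = input_gradient(W_a, x, label=0)
g_b = input_gradient(W_b, x, label=0)
print(round(cosine(g_a, g_b), 4))
```

A cosine near 1 would mean both models are pushed by nearly the same perturbation; values near 0 mean their steepest-ascent directions differ, which is why orthogonal gradients alone do not explain transfer and the decision-boundary analysis is needed as well.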

This review was generated by AI and reviewed by a human editor.