QUICK REVIEW

[Paper Review] Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Wieland Brendel, Jonas Rauber | arXiv (Cornell University) | 2017-12-12

Adversarial Robustness in Machine Learning · Citations: 889

One-Line Summary

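The Boundary Attack is a simple and effective decision-based adversarial attack that starts from a large adversarial perturbation and gradually shrinks it while following the decision boundary. On standard vision tasks it performs comparably to gradient-based attacks.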

ABSTRACT

Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks), neither of which are available in most real-world scenarios. In many such cases one currently needs to retreat to transfer-based attacks which rely on cumbersome substitute models, need access to the training data and can be defended against. Here we emphasise the importance of attacks which solely rely on the final model decision. Such decision-based attacks are (1) applicable to real-world black-box models such as autonomous cars, (2) need less knowledge and are easier to apply than transfer-based attacks and (3) are more robust to simple defences than gradient- or score-based attacks. Previous attacks in this category were limited to simple models or simple datasets. Here we introduce the Boundary Attack, a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial. The attack is conceptually simple, requires close to no hyperparameter tuning, does not rely on substitute models and is competitive with the best gradient-based attacks in standard computer vision tasks like ImageNet. We apply the attack on two black-box algorithms from Clarifai.com. The Boundary Attack in particular and the class of decision-based attacks in general open new avenues to study the robustness of machine learning models and raise new questions regarding the safety of deployed machine learning systems. An implementation of the attack is available as part of Foolbox at https://github.com/bethgelab/foolbox .

Research Motivation and Goals

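Highlights the importance of decision-based attacks against real-world black-box models.
Introduces the Boundary Attack as the first effective decision-based method on complex datasets.
Shows that decision-based attacks can break certain defense strategies.
Demonstrates applicability to a real-world black-box API (Clarifai) and to standard vision benchmarks.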

Proposed Method

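Proposes the Boundary Attack, which uses rejection sampling to move along the target model's decision boundary: it starts from an adversarial example and walks along the boundary toward the smallest possible perturbation.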
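Uses a simple proposal distribution: draw a random direction from a Gaussian, project the candidate onto a sphere around the original input, then move toward the original input. There are two adjustable step sizes: the orthogonal step along the sphere and the step toward the original input.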
Supports arbitrary adversarial criteria and needs only the model's final decision, with no confidence scores or gradients.
Dynamically adjusts the perturbation length and the step sizes in a trust-region-like manner, based on the local geometry of the decision boundary.
Evaluated on MNIST, CIFAR-10 and ImageNet with common architectures (VGG-19, ResNet-50, Inception-v3) in both untargeted and targeted settings.
Compared with gradient-based attacks (FGSM, DeepFool, Carlini & Wagner) in terms of the size of the minimal adversarial perturbation found and robustness to defenses.
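The propose-and-reject loop described above can be sketched in plain Python. This is a minimal toy version under stated assumptions, not the authors' implementation (theirs ships with Foolbox): `boundary_attack` and `dist` are hypothetical names, the model is reached only through a yes/no `is_adversarial` callback, clipping to the valid pixel range is omitted, and the adaptation thresholds (0.2 and 0.5) and factor 1.5 are illustrative choices that mimic the idea of tuning both step sizes from recent success rates.

```python
import math
import random


def dist(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def boundary_attack(is_adversarial, original, start, steps=2000,
                    delta=0.1, eps=0.1, adapt_every=10, seed=0):
    """Toy Boundary Attack that only ever calls is_adversarial(x) -> bool."""
    if not is_adversarial(start):
        raise ValueError("start must already be adversarial")
    rng = random.Random(seed)
    x = list(start)
    orth_ok = orth_n = step_ok = step_n = 0
    for _ in range(steps):
        d = dist(x, original)
        # Orthogonal step: random Gaussian direction scaled to length delta * d ...
        eta = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(e * e for e in eta))
        cand = [xi + delta * d * e / norm for xi, e in zip(x, eta)]
        # ... then projected back onto the sphere of radius d around the original.
        dc = dist(cand, original)
        cand = [o + (c - o) * d / dc for c, o in zip(cand, original)]
        orth_n += 1
        if is_adversarial(cand):
            orth_ok += 1
            # Step toward the original by a fraction eps of the remaining distance.
            cand = [c + eps * (o - c) for c, o in zip(cand, original)]
            step_n += 1
            if is_adversarial(cand):  # rejection sampling: accept only adversarial points
                step_ok += 1
                x = cand
        # Every adapt_every trials, tune the step sizes from recent success rates.
        if orth_n == adapt_every:
            p_orth = orth_ok / orth_n
            if p_orth > 0.5:    # most orthogonal candidates stay adversarial: enlarge
                delta = min(delta * 1.5, 1.0)
            elif p_orth < 0.2:  # most orthogonal candidates fail: shrink
                delta /= 1.5
            if step_n:
                p_step = step_ok / step_n
                if p_step > 0.5:
                    eps = min(eps * 1.5, 0.5)
                elif p_step < 0.2:
                    eps /= 1.5
            orth_ok = orth_n = step_ok = step_n = 0
    return x


# Toy decision-only "model": an input is adversarial iff x[0] > 1, so the
# closest adversarial point to the origin is at distance exactly 1.
def is_adv(x):
    return x[0] > 1.0


original = [0.0] * 5
start = [3.0] * 5
adv = boundary_attack(is_adv, original, start)
print(f"start distance {dist(start, original):.3f}, final distance {dist(adv, original):.3f}")
```

In this toy setup the distance starts at about 6.7 and can only shrink toward 1, since every accepted point is adversarial and no adversarial point lies closer than 1. Each iteration costs up to two model queries and no gradients.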

Experimental Results

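Research Questions

RQ1: Can decision-based attacks consistently generate adversarial examples against real-world models that expose neither gradients nor confidence scores?
RQ2: How does the Boundary Attack perform compared with gradient-based methods on MNIST, CIFAR-10 and ImageNet, in both untargeted and targeted scenarios?
RQ3: How robust is the Boundary Attack against defenses such as gradient masking and defensive distillation?
RQ4: Does the Boundary Attack work effectively against black-box APIs such as Clarifai, where only the final decision is observable?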

Key Results

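Median perturbation size of the minimal adversarial examples found, untargeted setting (lower is better; VGG-19, ResNet-50 and Inception-v3 are ImageNet models):

Attack	Type	MNIST	CIFAR-10	VGG-19	ResNet-50	Inception-v3
FGSM	gradient-based	4.2e-02	2.5e-05	1.0e-06	1.0e-06	9.7e-07
DeepFool	gradient-based	4.3e-03	5.8e-06	1.9e-07	7.5e-08	5.2e-08
Carlini & Wagner	gradient-based	2.2e-03	7.5e-06	5.7e-07	2.2e-07	7.6e-08
Boundary (ours)	decision-based	3.6e-03	5.6e-06	2.9e-07	1.0e-07	6.5e-08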
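A minimal sketch of how such a median summary could be computed, assuming that each sample's size is the squared L2 norm of the perturbation divided by the number of input values (an MSE); the review itself only states that a median is reported. The names `mse`, `median_perturbation_size`, `COLUMNS` and `UNTARGETED` are hypothetical. The second half re-reads the table to show, per column, which attack found the smallest median perturbation and how far the Boundary Attack is from the best gradient-based attack.

```python
import statistics


def mse(original, adversarial):
    """Squared L2 norm of the perturbation divided by the number of input values."""
    return sum((a - o) ** 2 for o, a in zip(original, adversarial)) / len(original)


def median_perturbation_size(originals, adversarials):
    """Median of the per-sample MSE over a set of attacked inputs (lower = stronger attack)."""
    return statistics.median(mse(o, a) for o, a in zip(originals, adversarials))


# Untargeted medians copied from the table.
COLUMNS = ["MNIST", "CIFAR-10", "VGG-19", "ResNet-50", "Inception-v3"]
UNTARGETED = {
    "FGSM": [4.2e-02, 2.5e-05, 1.0e-06, 1.0e-06, 9.7e-07],
    "DeepFool": [4.3e-03, 5.8e-06, 1.9e-07, 7.5e-08, 5.2e-08],
    "Carlini & Wagner": [2.2e-03, 7.5e-06, 5.7e-07, 2.2e-07, 7.6e-08],
    "Boundary (ours)": [3.6e-03, 5.6e-06, 2.9e-07, 1.0e-07, 6.5e-08],
}

# Attack with the smallest median perturbation in each column.
best = {col: min(UNTARGETED, key=lambda name: UNTARGETED[name][i])
        for i, col in enumerate(COLUMNS)}

# Boundary Attack median divided by the best gradient-based median per column.
ratio = {col: UNTARGETED["Boundary (ours)"][i]
         / min(v[i] for k, v in UNTARGETED.items() if k != "Boundary (ours)")
         for i, col in enumerate(COLUMNS)}

print(best)
print({col: round(r, 2) for col, r in ratio.items()})
```

On these numbers the Boundary Attack has the smallest median only on CIFAR-10, and elsewhere stays within a factor of about 1.7 of the best gradient-based attack, which is the sense in which its results are "competitive".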

In the untargeted setting, the Boundary Attack finds minimal perturbations that are competitive with those of gradient-based attacks across MNIST, CIFAR and ImageNet.
In the untargeted setting, Boundary (ours) achieves median perturbation sizes of 3.6e-03 on MNIST, 5.6e-06 on CIFAR, 2.9e-07 on VGG-19, 1.0e-07 on ResNet-50 and 6.5e-08 on Inception-v3.
In the targeted setting, Boundary (ours) yields perturbation sizes of 6.5e-03 on MNIST, 3.3e-05 on CIFAR and 9.9e-06 on ImageNet with VGG-19.
The Boundary Attack remains effective against defenses such as defensive distillation and is unaffected by gradient masking.
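On two Clarifai black-box models (brand recognition and celebrity recognition), the Boundary Attack generated adversarial examples with perturbations mostly in the 1e-2 to 1e-3 range, although some samples needed larger perturbations to be misclassified.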
Because it relies only on the model's decisions rather than on gradients, the attack needs zero backward passes but many more forward passes than gradient-based attacks.
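The query accounting above can be made concrete with a label-only interface. In the sketch below, a hypothetical `DecisionOracle` wrapper exposes only the predicted label and counts calls, so every query is one forward pass and no gradient is ever computed. The binary search helper `boundary_point_on_segment` is an illustrative decision-only routine (locating the boundary between a clean and an adversarial input), not a step taken from the paper, and the toy classifier is an assumption.

```python
class DecisionOracle:
    """Hypothetical wrapper: exposes only the top-1 label and counts forward passes."""

    def __init__(self, predict_label):
        self._predict_label = predict_label
        self.forward_passes = 0  # backward passes are never needed

    def label(self, x):
        self.forward_passes += 1
        return self._predict_label(x)

    def is_adversarial(self, x, original_label):
        # Untargeted criterion: any label other than the original one counts.
        return self.label(x) != original_label


def boundary_point_on_segment(oracle, original, adversarial, iters=20):
    """Binary search on the segment original -> adversarial using label queries only.

    Returns t such that original + t * (adversarial - original) is adversarial
    and lies within 2**-iters (in t) of the decision boundary.
    """
    y0 = oracle.label(original)
    if not oracle.is_adversarial(adversarial, y0):
        raise ValueError("endpoint must be adversarial")
    lo, hi = 0.0, 1.0  # lo: known non-adversarial, hi: known adversarial
    for _ in range(iters):
        mid = (lo + hi) / 2
        point = [o + mid * (a - o) for o, a in zip(original, adversarial)]
        if oracle.is_adversarial(point, y0):
            hi = mid
        else:
            lo = mid
    return hi


# Toy classifier: label 1 iff x[0] > 1, otherwise label 0.
oracle = DecisionOracle(lambda x: int(x[0] > 1.0))
t = boundary_point_on_segment(oracle, [0.0, 0.0], [2.0, 0.0])
print(t, oracle.forward_passes)
```

Here 22 queries (one for the original label, one to confirm the endpoint, 20 bisection steps) locate the boundary to within about 1e-6 along the segment. A full Boundary Attack run issues many such label-only queries, which is why its forward-pass count far exceeds that of gradient-based attacks.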

This review was generated by AI and checked by a human editor.