QUICK REVIEW

[Paper Review] Robust Physical-World Attacks on Deep Learning Models

Kevin Eykholt, Ivan Evtimov | arXiv (Cornell University) | 2017-07-27

Adversarial Robustness in Machine Learning | References: 39 | Citations: 506

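One-Line Summary

This paper introduces Robust Physical Perturbations (RP2), an algorithm that generates perturbations causing targeted misclassification of physical objects across a range of distances and angles, and evaluates them in lab and field tests on road signs and other objects.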

ABSTRACT

Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.

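Motivation and Objectives

Demonstrate that physical perturbations can reliably mislead DNN classifiers under dynamic real-world conditions.
Develop RP2 to generate perturbations that remain robust to changes in distance, angle, and lighting.
Propose a two-stage evaluation methodology (lab tests followed by field tests) for physical adversarial examples.
Evaluate the perturbations against standard road sign classifiers and show that they generalize to other objects.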

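Proposed Method

Model the distribution of physical transformations (distance, angle, lighting) and optimize the perturbation over samples drawn from both real and synthetically transformed images.
Use a mask M_x to restrict the perturbation to the surface of the target object, and apply the perturbation through an alignment function T_i so that it follows each transformed instance of the object.
Add a Non-Printability Score (NPS) term to the optimization objective to account for printer color-reproduction error.
Solve a relaxed objective that combines Lp regularization with the expected loss over transformed instances: $\operatorname{argmin}_{\delta}\ \lambda \lVert M_x \cdot \delta \rVert_p + \mathrm{NPS} + \mathbb{E}_{x_i \sim X^V}\, J\big(f_\theta(x_i + T_i(M_x \cdot \delta)),\ y^*\big)$, where y* is the attacker's target class.
Optimize with ADAM, then fabricate the perturbation on a stop sign as black-and-white stickers or as a graffiti-style poster.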
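
The objective above can be made concrete with a small sketch. This is not the authors' implementation: it only evaluates the three terms (masked Lp regularizer, NPS, and the average targeted loss over transformed instances) on toy data and leaves out the ADAM optimization loop. Images are flattened grayscale lists, `toy_classifier` is a hypothetical two-class stand-in for f_theta, the two lambdas are hypothetical stand-ins for the alignment functions T_i, and the printable palette (black and white) and lambda = 0.1 are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]  # toy flattened grayscale image, values in [0, 1]


def masked(delta: Image, mask: Image) -> Image:
    """M_x . delta: keep the perturbation only where the mask is 1."""
    return [m * d for m, d in zip(mask, delta)]


def lp_norm(v: Sequence[float], p: float = 2.0) -> float:
    """Lp norm used as the perturbation-size regularizer."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def nps(colors: Sequence[float], printable: Sequence[float]) -> float:
    """Non-printability score: for each perturbation color, multiply its
    distances to every printable color, then sum over perturbation colors."""
    total = 0.0
    for c in colors:
        prod = 1.0
        for q in printable:
            prod *= abs(c - q)
        total += prod
    return total


def rp2_objective(
    delta: Image,
    mask: Image,
    images: Sequence[Image],
    transforms: Sequence[Callable[[Image], Image]],
    classifier: Callable[[Image], List[float]],
    target: int,
    lam: float = 0.1,                           # assumed weight, not from the source
    p: float = 2.0,
    printable: Sequence[float] = (0.0, 1.0),    # assumed palette: black and white
) -> float:
    m_delta = masked(delta, mask)
    reg = lam * lp_norm(m_delta, p)
    on_surface = [d for d, m in zip(m_delta, mask) if m > 0]
    printability = nps(on_surface, printable)
    loss = 0.0
    for x, t in zip(images, transforms):
        # x_i + T_i(M_x . delta): perturbation aligned to this instance
        adv = [xi + di for xi, di in zip(x, t(m_delta))]
        probs = classifier(adv)
        loss += -math.log(probs[target] + 1e-12)  # cross-entropy toward y*
    return reg + printability + loss / len(images)


def toy_classifier(x: Image) -> List[float]:
    """Hypothetical model: probability of class 1 rises with mean brightness."""
    z = 4.0 * (sum(x) / len(x) - 0.5)
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]


mask = [1.0, 1.0, 0.0, 0.0]                      # first two pixels are "on the sign"
images = [[0.0, 0.0, 0.2, 0.2], [0.0, 0.0, 0.4, 0.4]]
transforms = [lambda d: d, lambda d: [0.8 * v for v in d]]  # identity, dimming

no_attack = rp2_objective([0.0] * 4, mask, images, transforms, toy_classifier, target=1)
attack = rp2_objective([1.0] * 4, mask, images, transforms, toy_classifier, target=1)
print(attack < no_attack)
```

In this toy setting a white patch on the masked pixels lowers the objective relative to no perturbation, because the drop in targeted loss outweighs the regularizer and the patch color is exactly printable. An optimizer such as ADAM would search over delta for this kind of trade-off.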

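Experimental Results

Research Questions

RQ1: Can physical perturbations applied to real objects cause targeted misclassification across varying distances and viewing angles?
RQ2: Do robust perturbations constrained to the object's surface remain effective under environmental changes and fabrication limits?
RQ3: How do lab (stationary) tests and field (drive-by) tests differ when evaluating physical adversarial perturbations?
RQ4: Do RP2 perturbations generalize to classifiers and objects beyond road signs?
RQ5: How does the perturbation type (poster versus sticker) affect attack success and visibility?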

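Key Findings

With the poster attack, RP2 perturbations achieved 100% targeted success on a stop sign against LISA-CNN.
In drive-by tests, the camouflage sticker attack achieved targeted success rates of 84.8% on LISA-CNN and 87.5% on GTSRB-CNN.
In lab tests, poster and sticker attacks showed high targeted success rates at distances up to 40 feet and angles up to 60 degrees.
Against Inception-v3, sticker attacks achieved 90% targeted success in misclassifying a microwave as a phone and 71.4% in misclassifying a coffee mug as a cash machine.
For the stop sign to Speed Limit 80 attack, GTSRB-CNN showed 80% targeted success in the stationary test and 87.5% in the drive-by test.
The approach generalizes beyond road signs to other objects, showing that image classifiers are broadly vulnerable to robust physical perturbations.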
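
The percentages above are targeted success rates: the share of captured images or video frames that the classifier assigns to the attacker's target class. A minimal sketch of that bookkeeping follows, assuming a simplified definition that does not check how the clean sign is classified under the same condition. The records are hypothetical placeholders, not measurements from the paper, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Condition = Tuple[int, int]  # (distance in feet, angle in degrees)


def targeted_success_rate(predictions: Sequence[str], target: str) -> float:
    """Fraction of predictions equal to the attacker's target label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(1 for label in predictions if label == target) / len(predictions)


def success_by_condition(
    records: Iterable[Tuple[int, int, str]], target: str
) -> Dict[Condition, float]:
    """Group lab-test predictions by (distance, angle) and score each group."""
    grouped = defaultdict(list)
    for distance_ft, angle_deg, label in records:
        grouped[(distance_ft, angle_deg)].append(label)
    return {cond: targeted_success_rate(labels, target)
            for cond, labels in grouped.items()}


# Hypothetical lab-test records: (distance_ft, angle_deg, predicted_label).
lab_records = [
    (5, 0, "speed_limit_80"),
    (5, 0, "speed_limit_80"),
    (10, 30, "speed_limit_80"),
    (10, 30, "stop"),
]

rates = success_by_condition(lab_records, target="speed_limit_80")
overall = targeted_success_rate([r[2] for r in lab_records], "speed_limit_80")
print(rates, overall)
```

A field (drive-by) test can reuse `targeted_success_rate` directly on the per-frame predictions, since frames are not grouped by a fixed distance and angle.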

This review was generated by AI and reviewed by a human editor.