QUICK REVIEW

[Paper Review] Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks, Thomas G. Dietterich | arXiv (Cornell University) | 2019. 03. 28.

Adversarial Robustness in Machine Learning | Citations: 785

One-Line Summary

Introduces the ImageNet-C and ImageNet-P benchmarks to evaluate image classifier robustness to common corruptions and perturbations, compares architectures, and presents ways to improve robustness beyond clean accuracy.

ABSTRACT

In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.

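Motivation and Objectives

Motivate the need for robustness benchmarks beyond adversarial examples.
Define corruption robustness and perturbation robustness for image classification.
Create and release the ImageNet-C (corruption) and ImageNet-P (perturbation) datasets.
Propose metrics that quantify corruption and perturbation robustness, along with baselines for a range of architectures.
Demonstrate methods that improve robustness and expose their interaction with adversarial defenses.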

Clean Error, mCE, and per-corruption CE on ImageNet-C (all values in %; CE values are normalized so that AlexNet = 100)

Architecture | Clean Error | mCE | Gaussian Noise | Shot Noise | Impulse Noise | Defocus Blur | Frosted Glass Blur | Motion Blur | Zoom Blur | Snow | Frost | Fog | Brightness | Contrast | Elastic | Pixelation | JPEG
AlexNet | 43.5 | 100.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
SqueezeNet | 41.8 | 104.4 | 107 | 106 | 105 | 100 | 103 | 101 | 100 | 101 | 103 | 97 | 97 | 98 | 106 | 109 | 134
VGG-11 | 31.0 | 93.5 | 97 | 97 | 100 | 92 | 99 | 93 | 91 | 92 | 91 | 84 | 75 | 86 | 97 | 107 | 100
VGG-19 | 27.6 | 88.9 | 89 | 91 | 95 | 89 | 98 | 90 | 90 | 89 | 86 | 75 | 68 | 80 | 97 | 102 | 94
VGG-19+BN | 25.8 | 81.6 | 82 | 83 | 88 | 82 | 94 | 84 | 86 | 80 | 78 | 69 | 61 | 74 | 94 | 85 | 83
ResNet-18 | 30.2 | 84.7 | 87 | 88 | 91 | 84 | 91 | 87 | 89 | 86 | 84 | 78 | 69 | 78 | 90 | 80 | 85
ResNet-50 | 23.9 | 76.7 | 80 | 82 | 83 | 75 | 89 | 78 | 80 | 78 | 75 | 66 | 57 | 71 | 85 | 77 | 77
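As a quick sanity check on the figures reported above, mCE can be read as the plain average of a model's 15 per-corruption CE values. The sketch below recomputes it from the rounded table entries; the dictionary name `REPORTED` and the helper `mean_ce` are illustrative names, not from the paper. Because the inputs are already rounded to whole percentages, the recomputed values are expected to differ slightly from the reported mCE.

```python
# Per-corruption CE values (%, AlexNet-normalized) copied from the results above.
# Each entry maps a model name to (reported mCE, list of 15 CE values).
REPORTED = {
    "AlexNet":    (100.0, [100] * 15),
    "SqueezeNet": (104.4, [107, 106, 105, 100, 103, 101, 100, 101, 103, 97, 97, 98, 106, 109, 134]),
    "VGG-11":     (93.5,  [97, 97, 100, 92, 99, 93, 91, 92, 91, 84, 75, 86, 97, 107, 100]),
    "VGG-19":     (88.9,  [89, 91, 95, 89, 98, 90, 90, 89, 86, 75, 68, 80, 97, 102, 94]),
    "VGG-19+BN":  (81.6,  [82, 83, 88, 82, 94, 84, 86, 80, 78, 69, 61, 74, 94, 85, 83]),
    "ResNet-18":  (84.7,  [87, 88, 91, 84, 91, 87, 89, 86, 84, 78, 69, 78, 90, 80, 85]),
    "ResNet-50":  (76.7,  [80, 82, 83, 75, 89, 78, 80, 78, 75, 66, 57, 71, 85, 77, 77]),
}


def mean_ce(ce_values):
    """Return mCE as the unweighted mean of the per-corruption CE values."""
    return sum(ce_values) / len(ce_values)


for name, (reported_mce, ce_values) in REPORTED.items():
    recomputed = mean_ce(ce_values)
    print(f"{name:10s} reported={reported_mce:6.1f} recomputed={recomputed:6.2f}")
```

The small gaps between the two columns come from averaging rounded CE values rather than the unrounded ones.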

Proposed Method

Define corruption robustness as the average performance across all 75 corruption settings (15 corruption types at five severity levels each) on ImageNet validation data.
Define perturbation robustness via perturbation sequences (ImageNet-P) and metrics like Flip Rate and Top-5 Distance.
Introduce ImageNet-C with 15 corruption types across four categories (noise, blur, weather, digital) and five severity levels.
Introduce ImageNet-P with temporally sequenced perturbations across selected perturbation types and evaluation metrics.
Evaluate multiple architectures (e.g., AlexNet, SqueezeNet, VGG, ResNet, DenseNet, ResNeXt) to assess robustness trends.
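Explore robustness enhancements (Contrast Limited Adaptive Histogram Equalization (CLAHE) preprocessing, multiscale networks, larger feature-aggregating networks, stylization augmentation, and the Adversarial Logit Pairing (ALP) adversarial defense) and report how adversarial defenses interact with common perturbation robustness.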
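The two families of metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and all numbers are hypothetical, the Corruption Error (CE) follows the AlexNet-normalized form used in the benchmark, and the flip computation covers only the consecutive-frame case (the paper treats noise sequences differently).

```python
def corruption_error(model_err, alexnet_err):
    """CE for one corruption type, in %.

    Both arguments hold top-1 error rates at the five severity levels.
    The model's summed error is normalized by AlexNet's summed error.
    """
    if len(model_err) != len(alexnet_err):
        raise ValueError("expected one error rate per severity level for both models")
    return 100.0 * sum(model_err) / sum(alexnet_err)


def flip_probability(sequences):
    """Fraction of consecutive-frame pairs whose top-1 prediction changes.

    `sequences` is a list of prediction sequences, one per perturbation
    sequence, each a list of predicted class ids in frame order.
    """
    flips = 0
    pairs = 0
    for preds in sequences:
        for prev, cur in zip(preds, preds[1:]):
            flips += int(prev != cur)
            pairs += 1
    return flips / pairs


def flip_rate(model_fp, alexnet_fp):
    """Flip Rate in %: the model's flip probability normalized by AlexNet's."""
    return 100.0 * model_fp / alexnet_fp


# Hypothetical error rates for one corruption type at severities 1..5.
model_fog = [0.30, 0.35, 0.45, 0.55, 0.70]
alexnet_fog = [0.45, 0.55, 0.65, 0.75, 0.90]
ce_fog = corruption_error(model_fog, alexnet_fog)

# Hypothetical predictions on two 5-frame perturbation sequences.
seqs = [[3, 3, 3, 7, 7], [1, 1, 1, 1, 1]]
fp = flip_probability(seqs)  # one flip out of eight consecutive pairs

print(f"CE (fog): {ce_fog:.1f}%  flip probability: {fp:.3f}")
```

Averaging `corruption_error` over the 15 corruption types gives mCE, and averaging `flip_rate` over perturbation types gives the mean Flip Rate.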

Experimental Results

Research Questions

RQ1: How do common corruptions and perturbations affect image classifier performance across architectures?
RQ2: Do improvements in clean accuracy translate into robustness to corruptions and perturbations?
RQ3: Can specific architectural or preprocessing changes improve corruption and perturbation robustness without sacrificing accuracy?
RQ4: What is the relationship between adversarial defenses and robustness to common perturbations?
RQ5: What baseline metrics best capture robustness and enable fair comparisons across models?

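Key Findings

Architectural progress from AlexNet to ResNet yields little gain in relative corruption robustness: mCE falls from 100.0 (AlexNet) to 76.7 (ResNet-50), but clean error falls alongside it, from 43.5 to 23.9, so most of the improvement tracks clean accuracy.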
Perturbation robustness is underexplored and often deteriorates even for strong models; top-5 predictions can be unstable under common perturbations.
Multiscale and feature-aggregating architectures (DenseNets, ResNeXts, Multigrid) show notable gains in corruption robustness over vanilla ResNets.
Larger, more monolithic models with greater redundancy can improve robustness to noise and distortions beyond pure accuracy gains.
CLAHE preprocessing modestly improves corruption robustness; stylization-based augmentation and ALP defense can also enhance robustness to common perturbations.
Robustness improvements can come from both architectural changes and targeted preprocessing/augmentation, with certain adversarial defenses providing cross-robustness benefits.
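To make the link between clean error and mCE concrete, the following sketch computes a Pearson correlation over the seven (clean error, mCE) pairs reported in this review. It is an illustrative check added here, not an analysis from the paper; the names `RESULTS` and `pearson` are hypothetical, and seven data points are far too few for a firm statistical claim.

```python
from math import sqrt

# (clean error %, mCE %) per architecture, copied from the reported results.
RESULTS = {
    "AlexNet":    (43.5, 100.0),
    "SqueezeNet": (41.8, 104.4),
    "VGG-11":     (31.0, 93.5),
    "VGG-19":     (27.6, 88.9),
    "VGG-19+BN":  (25.8, 81.6),
    "ResNet-18":  (30.2, 84.7),
    "ResNet-50":  (23.9, 76.7),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


clean = [clean_err for clean_err, _ in RESULTS.values()]
mce = [m for _, m in RESULTS.values()]
r = pearson(clean, mce)
print(f"Pearson r between clean error and mCE: {r:.2f}")
```

A coefficient close to 1 is consistent with the finding that lower mCE in newer architectures largely reflects lower clean error rather than a separate robustness gain.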


This review was generated by AI and reviewed by a human editor.