QUICK REVIEW

[Paper Review] In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation

Julian Bitterwolf, Maximilian A. Müller|arXiv (Cornell University)|2023. 06. 01.

Adversarial Robustness in Machine Learning|Citations: 11

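One-Line Summary

This paper shows that many ImageNet-1K OOD datasets contain ID contamination, introduces NINCO (No ImageNet Class Objects) with 64 OOD classes and 5,879 cleaned images, analyzes a broad range of OOD detectors across diverse architectures, and highlights the effect of pretraining and the need for class-wise evaluation and OOD unit tests.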

ABSTRACT

Out-of-distribution (OOD) detection is the problem of identifying inputs which are unrelated to the in-distribution task. The OOD detection performance when the in-distribution (ID) is ImageNet-1K is commonly being tested on a small range of test OOD datasets. We find that most of the currently used test OOD datasets, including datasets from the open set recognition (OSR) literature, have severe issues: In some cases more than 50% of the dataset contains objects belonging to one of the ID classes. These erroneous samples heavily distort the evaluation of OOD detectors. As a solution, we introduce with NINCO a novel test OOD dataset, each sample checked to be ID free, which with its fine-grained range of OOD classes allows for a detailed analysis of an OOD detector's strengths and failure modes, particularly when paired with a number of synthetic "OOD unit-tests". We provide detailed evaluations across a large set of architectures and OOD detection methods on NINCO and the unit-tests, revealing new insights about model weaknesses and the effects of pretraining on OOD detection performance. We provide code and data at https://github.com/j-cb/NINCO.

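Motivation and Objectives

Identify and quantify ID contamination in widely used ImageNet-1K OOD test datasets.
Present a clean, challenging OOD test set (NINCO) with class-wise evaluation to better understand detector weaknesses.
Analyze the performance of different OOD detection methods across diverse architectures and pretraining schemes.
Introduce OOD unit tests to probe detector weaknesses beyond natural images.
Offer recommendations for fair evaluation and reporting of OOD detectors.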

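Proposed Method

Manually and systematically inspect 400 random samples to measure ID contamination in commonly used OOD datasets.
Construct NINCO, containing 5,879 images across 64 OOD classes, each manually verified to be ID-free, and additionally provide 17 synthetic OOD unit tests.
Evaluate 11 OOD detection methods on multiple architectures (ViT, ConvNets) with various pretraining regimes (IN-21K, CLIP, JFT, etc.).
Analyze the MSP baseline, feature-based detectors (Maha, RMaha, ViM), and other methods (MaxLogit, Energy, KL-Matching, KNN, ReAct, etc.), along with whether each method uses pre-logit features.
Assess the effect of pretraining on OOD detection performance and the difference in reliability between aggregate and class-wise metrics.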
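To make the logit-based detectors named above more concrete, here is a minimal sketch of three of them (MSP, MaxLogit, Energy) in their commonly used textbook form. It is not taken from the NINCO repository; the function names and example logits are illustrative assumptions, and the convention that a higher score means "more ID-like" is a choice made for this sketch.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Maximum softmax probability; higher means more ID-like."""
    return max(softmax(logits))

def max_logit_score(logits):
    """Largest raw logit; higher means more ID-like."""
    return max(logits)

def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(logits / T); higher means more ID-like."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))

# Hypothetical logits over three classes, for illustration only.
confident = [8.0, 1.0, 0.5]   # one class clearly dominates
flat = [1.0, 1.0, 1.0]        # no class stands out

for name, logits in [("confident", confident), ("flat", flat)]:
    print(name, msp_score(logits), max_logit_score(logits), energy_score(logits))
```

All three scores need only the classifier's output logits, which is what separates them from the feature-based detectors (Maha, RMaha, ViM) that additionally read pre-logit features.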

Figure 3: OOD-detection before and after removing samples with ID-objects: We show FPR (lower is better) of two OOD detectors (MSP and Mahalanobis distance) for a ViT, evaluated on cleaned and full subsets of four popular OOD datasets.
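The effect described in this caption can be illustrated with a small sketch of FPR at a fixed true positive rate. The 95% operating point is the common convention and is assumed here rather than stated in the caption; the helper name `fpr_at_tpr` and all scores are made up for illustration, with higher scores meaning "more ID-like".

```python
import math

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Share of OOD samples accepted as ID when the threshold keeps at least `tpr` of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of top-ranked ID samples that reaches the target TPR
    # (the small epsilon guards against floating-point rounding).
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)

# Hypothetical scores, for illustration only.
id_scores = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45]
clean_ood = [0.1, 0.2, 0.3, 0.6]
# OOD-labeled images that actually show ID objects: a good detector scores them as ID.
contaminated = [0.95, 0.9]

fpr_clean = fpr_at_tpr(id_scores, clean_ood, tpr=0.9)
fpr_full = fpr_at_tpr(id_scores, clean_ood + contaminated, tpr=0.9)
print(fpr_clean, fpr_full)
```

In this toy setup the contaminated samples are counted as false positives even though the detector treats them correctly, so the FPR on the full set is higher than on the cleaned set.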

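Experimental Results

Research Questions

RQ1: How heavily are existing ImageNet-1K OOD test datasets contaminated with in-distribution objects?
RQ2: Can a cleaned, ID-free OOD test set (NINCO) provide a more reliable evaluation of OOD detectors across architectures?
RQ3: How do the type of pretraining and the use of features affect OOD detection performance?
RQ4: Do synthetic OOD unit tests expose weaknesses that natural image datasets do not reveal?
RQ5: Which evaluation practices (class-wise distributions, unit tests) support fair benchmarking of OOD detectors?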

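Key Findings

Many OOD datasets widely used for IN-1K contain substantial ID contamination, in some cases (such as the Places and Species datasets) exceeding 50%.
ID contamination unfairly penalizes strong detectors and inflates false positives: a detector that correctly recognizes ID content in a supposedly OOD sample is counted as having made an error.
NINCO provides 64 manually verified OOD classes and 5,879 ID-free images, enabling detailed analysis of detector strengths and failure modes, and adds 17 synthetic unit tests for pinpointing weaknesses.
Pretraining on large datasets generally improves OOD detection performance, and methods based on pre-logit features tend to outperform MSP, though the benefit depends heavily on the model and the pretraining.
Feature-based detectors that explicitly use pre-logit features tend to show more consistent improvements across models, while zero-shot CLIP-based methods perform worse on NINCO than IN-1K classifiers.
The mean-FPR improvements of advanced detectors are more pronounced on NINCO, and class-wise analysis shows that performance varies widely across OOD classes.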
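The class-wise view can be sketched as follows: compute one FPR per OOD class at a fixed ID threshold, then look at both the mean over classes and the share of classes that stay at or below a given FPR. This is a minimal illustration under assumed inputs; the class names, scores, and threshold are hypothetical and do not come from NINCO.

```python
def per_class_fpr(ood_scores_by_class, threshold):
    """FPR per OOD class: share of that class's samples scoring at or above the ID threshold."""
    return {
        name: sum(1 for s in scores if s >= threshold) / len(scores)
        for name, scores in ood_scores_by_class.items()
    }

def mean_class_fpr(fpr_by_class):
    """Unweighted mean of the per-class FPR values."""
    return sum(fpr_by_class.values()) / len(fpr_by_class)

def share_of_classes_at_or_below(fpr_by_class, x):
    """Empirical CDF value: fraction of classes whose FPR is at most x."""
    return sum(1 for f in fpr_by_class.values() if f <= x) / len(fpr_by_class)

# Hypothetical class names and detector scores, for illustration only.
scores_by_class = {
    "class_a": [0.1, 0.2, 0.3, 0.4],    # easy: nothing passes the threshold
    "class_b": [0.6, 0.7, 0.2, 0.9],    # hard: most samples pass
    "class_c": [0.8, 0.9, 0.95, 0.7],   # failure mode: every sample passes
}
fprs = per_class_fpr(scores_by_class, threshold=0.5)
print(fprs)
print(mean_class_fpr(fprs))
print(share_of_classes_at_or_below(fprs, 0.75))
```

A single aggregate number hides the spread seen here: one class is rejected perfectly while another is accepted entirely, which is the kind of failure mode a per-class breakdown is meant to surface.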

Figure 5: Cumulative distribution of the % of NINCO-classes for which an FPR at least as low as a given x-value is achieved. The area over this curve corresponds to the mean FPR. The further in the top left corner, the better. The best methods explicitly access pre-logit features (Left): Different O


This review was generated by AI and reviewed by a human editor.