QUICK REVIEW

[Paper Review] SSD: A Unified Framework for Self-Supervised Outlier Detection

Vikash Sehwag, Mung Chiang|arXiv (Cornell University)|2021. 03. 22.

Anomaly Detection Techniques and Applications|References: 53|Citations: 44

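One-Line Summary

SSD detects out-of-distribution samples using only unlabeled in-distribution data by combining self-supervised representation learning with the Mahalanobis distance, and it achieves strong performance with extensions for few-shot OOD detection and label-assisted detection.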

ABSTRACT

We ask the following question: what training information is required to design an effective outlier/out-of-distribution (OOD) detector, i.e., detecting samples that lie far away from the training distribution? Since unlabeled data is easily accessible for many applications, the most compelling approach is to develop detectors based on only unlabeled in-distribution data. However, we observe that most existing detectors based on unlabeled data perform poorly, often equivalent to a random prediction. In contrast, existing state-of-the-art OOD detectors achieve impressive performance but require access to fine-grained data labels for supervised training. We propose SSD, an outlier detector based on only unlabeled in-distribution data. We use self-supervised representation learning followed by a Mahalanobis distance based detection in the feature space. We demonstrate that SSD outperforms most existing detectors based on unlabeled data by a large margin. Additionally, SSD even achieves performance on par, and sometimes even better, with supervised training based detectors. Finally, we expand our detection framework with two key extensions. First, we formulate few-shot OOD detection, in which the detector has access to only one to five samples from each class of the targeted OOD dataset. Second, we extend our framework to incorporate training data labels, if available. We find that our novel detection framework based on SSD displays enhanced performance with these extensions, and achieves state-of-the-art performance. Our code is publicly available at https://github.com/inspire-group/SSD.

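Research Motivation and Objectives

Enable outlier (OOD) detection using only unlabeled in-distribution data.
Develop a self-supervised, cluster-conditioned Mahalanobis detector that operates on in-distribution features.
Provide extensions for few-shot OOD detection and for incorporating labels when they are available.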

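Proposed Method

Train a feature extractor on unlabeled in-distribution data with contrastive self-supervised learning (NT-Xent loss).
Partition the in-distribution features into clusters and model each cluster in feature space with the Mahalanobis distance.
Compute the outlier score, used to detect OOD samples, as the minimum Mahalanobis distance over the clusters.
For few-shot OOD detection, estimate in-distribution and OOD statistics using shrunk covariance estimates and data augmentation, and score samples by the difference between the two Mahalanobis terms.
Optionally incorporate labels through a supervised contrastive loss (SSD+), which achieves state-of-the-art detection without additional tuning parameters.
Evaluate on datasets including CIFAR-10/100, STL-10, and ImageNet using AUROC, FPR at TPR=95%, and AUPR.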
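
The cluster-conditioned scoring step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the features have already been extracted and that cluster assignments are supplied by some clustering routine. The names `fit_cluster_stats` and `outlier_score` and the small ridge term added to each covariance are hypothetical choices made for this example.

```python
import numpy as np


def fit_cluster_stats(features, assignments):
    """Estimate a mean and an inverse covariance for each cluster.

    features:    (n_samples, dim) array of in-distribution features.
    assignments: (n_samples,) array of cluster ids (assumed to be given).
    """
    stats = []
    for cluster_id in np.unique(assignments):
        z = features[assignments == cluster_id]
        mean = z.mean(axis=0)
        cov = np.atleast_2d(np.cov(z, rowvar=False))
        # Small ridge term: an assumption of this sketch, added only so the
        # covariance stays invertible.
        cov = cov + 1e-6 * np.eye(cov.shape[0])
        stats.append((mean, np.linalg.inv(cov)))
    return stats


def outlier_score(z, stats):
    """Minimum squared Mahalanobis distance from each row of z to any cluster."""
    z = np.atleast_2d(z)
    per_cluster = []
    for mean, inv_cov in stats:
        d = z - mean
        # Row-wise quadratic form d^T * inv_cov * d.
        per_cluster.append(np.einsum("ij,jk,ik->i", d, inv_cov, d))
    return np.min(np.stack(per_cluster, axis=0), axis=0)
```

A higher score means the sample lies farther from every in-distribution cluster, so thresholding the score gives a detector. With a single cluster this reduces to an ordinary Mahalanobis distance to the mean of the training features.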

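Experimental Results

Research Questions

RQ1: Can an outlier detector trained only on unlabeled in-distribution data match or surpass supervised detectors on image OOD tasks?
RQ2: How effective is a cluster-conditioned Mahalanobis detector for OOD detection with self-supervised representations?
RQ3: Does statistics-based adaptation through covariance shrinkage and data augmentation benefit few-shot OOD scenarios?
RQ4: Does incorporating labels through a supervised contrastive loss improve OOD detection performance without parameter tuning?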
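
To make the statistics-based adaptation in RQ3 concrete, the sketch below scores a sample by the difference between its Mahalanobis distance to the in-distribution features and its distance to the few available OOD features. It is an illustration under stated assumptions rather than the paper's code: shrinkage is a simple blend toward a scaled identity with a fixed, hand-picked coefficient `alpha`, the function names are hypothetical, and augmented views of the few OOD samples are assumed to be passed in as extra rows of `ood_feats` (at least two rows are needed to form a covariance).

```python
import numpy as np


def shrunk_covariance(z, alpha=0.1):
    """Blend the sample covariance with a scaled identity matrix.

    alpha is an assumed, fixed shrinkage coefficient for this sketch.
    """
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    dim = cov.shape[0]
    target = (np.trace(cov) / dim) * np.eye(dim)
    return (1.0 - alpha) * cov + alpha * target


def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance of each row of z to (mean, cov)."""
    d = np.atleast_2d(z) - mean
    return np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)


def few_shot_score(z, in_feats, ood_feats, alpha=0.1):
    """Distance to the in-distribution statistics minus distance to the OOD statistics."""
    s_in = mahalanobis_sq(z, in_feats.mean(axis=0), shrunk_covariance(in_feats, alpha))
    s_ood = mahalanobis_sq(z, ood_feats.mean(axis=0), shrunk_covariance(ood_feats, alpha))
    return s_in - s_ood
```

The score grows when a sample is far from the in-distribution features and close to the OOD features. Shrinkage matters mostly for the OOD term, where the handful of samples would otherwise give a poorly conditioned covariance estimate.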

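Key Results

SSD outperforms most unsupervised outlier detectors by a large margin on standard image datasets.
SSD performs on par with, and sometimes better than, supervised detectors that use labeled in-distribution data.
The few-shot OOD extension (SSD_k) uses a small number of samples from the target OOD dataset, together with data augmentation and shrunk covariance, to deliver notable gains.
SSD+, which incorporates labels through a supervised contrastive loss, delivers state-of-the-art performance without additional tuning parameters.
Across many dataset pairs, self-supervised representations often outperform supervised representations for OOD detection, with large AUROC gains in several cases.
When SSD+ is combined with five-shot OOD detection and label incorporation, it can outperform previous supervised methods on several benchmarks (e.g., CIFAR-100 vs. CIFAR-10).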
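
For readers who want to reproduce the kind of numbers summarized above, two of the reported metrics can be computed from raw outlier scores as sketched below. This is a minimal example with hypothetical function names. It assumes that OOD samples are the positive class and that a higher score means "more likely OOD"; the paper's exact convention may differ, and AUPR is omitted here.

```python
import math

import numpy as np


def auroc(in_scores, ood_scores):
    """Probability that a random OOD sample scores above a random
    in-distribution sample, counting ties as one half."""
    in_scores = np.asarray(in_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    diff = ood_scores[:, None] - in_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def fpr_at_tpr(in_scores, ood_scores, tpr=0.95):
    """Fraction of in-distribution samples flagged as OOD at the highest
    threshold that still detects at least `tpr` of the OOD samples."""
    in_scores = np.asarray(in_scores, dtype=float)
    ood_desc = np.sort(np.asarray(ood_scores, dtype=float))[::-1]
    # Number of OOD samples that must be detected to reach the target TPR.
    k = max(1, math.ceil(tpr * len(ood_desc)))
    threshold = ood_desc[k - 1]
    return float(np.mean(in_scores >= threshold))
```

Both functions take the scores of in-distribution test samples and OOD test samples as separate arrays. The pairwise comparison in `auroc` uses memory proportional to the product of the two array lengths, so a rank-based implementation would be preferable for large test sets.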


This review was generated by AI and reviewed by a human editor.