QUICK REVIEW

[Paper Review] Ensemble Distribution Distillation

Andrey Malinin, Bruno Mlodozeniec|arXiv (Cornell University)|2019. 04. 30.

Anomaly Detection Techniques and Applications | References: 37 | Citations: 82

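One-Line Summary

Ensemble Distribution Distillation (EnD2) distills the distribution of an ensemble's predictions into a single Prior Network that models it as a Dirichlet, preserving ensemble diversity, improving uncertainty estimation and OOD detection, and approaching ensemble performance on CIFAR and TinyImageNet.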

ABSTRACT

Ensembles of models often yield improvements in system performance. These ensemble approaches have also been empirically shown to yield robust measures of uncertainty, and are capable of distinguishing between different forms of uncertainty. However, ensembles come at a computational and memory cost which may be prohibitive for many applications. There has been significant work done on the distillation of an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve an accuracy comparable to that of an ensemble. However, information about the diversity of the ensemble, which can yield estimates of different forms of uncertainty, is lost. This work considers the novel task of Ensemble Distribution Distillation (EnD2): distilling the distribution of the predictions from an ensemble, rather than just the average prediction, into a single model. EnD2 enables a single model to retain both the improved classification performance of ensemble distillation as well as information about the diversity of the ensemble, which is useful for uncertainty estimation. A solution for EnD2 based on Prior Networks, a class of models which allow a single neural network to explicitly model a distribution over output distributions, is proposed in this work. The properties of EnD2 are investigated on both an artificial dataset, and on the CIFAR-10, CIFAR-100 and TinyImageNet datasets, where it is shown that EnD2 can approach the classification performance of an ensemble, and outperforms both standard DNNs and Ensemble Distillation on the tasks of misclassification and out-of-distribution input detection.

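Motivation and Objectives

Motivate the work: ensembles improve accuracy and uncertainty estimation, but they are expensive in computation and memory.
Define Ensemble Distribution Distillation (EnD2) as the task of preserving an ensemble's diversity in a single model.
Propose an approach that uses Prior Networks to model a distribution over output distributions.
Evaluate EnD2 on an artificial dataset and on standard vision datasets, comparing it against Ensemble Distillation (EnD) and Prior Network (PN) baselines.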

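Proposed Method

Treat the ensemble members' outputs as samples from a distribution over output distributions, parameterized as a Dirichlet.
Use a Prior Network to parameterize the distribution over categorical distributions through Dirichlet concentration parameters.
Train the EnD2 model on a transfer dataset derived from the ensemble's predictions, applying temperature annealing to stabilize training.
Minimize the negative log-likelihood (NLL) of the ensemble members' predictions under the model's Dirichlet output on the transfer dataset.
Compute uncertainty measures (total, data, knowledge) from the Dirichlet output via entropy and mutual information.
Optionally use auxiliary data to better capture OOD behavior and ensemble diversity.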
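
The training objective in the steps above can be sketched for a single input as follows. This is a minimal illustration and not the authors' implementation. It assumes that the student's concentration parameters are obtained as alpha_k = exp(z_k / T) and that the same temperature T softens each ensemble member's softmax. The function names and the linear schedule in `annealed_temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_log_pdf(probs, alphas, eps=1e-12):
    """Log-density of the categorical distribution `probs` under Dir(alphas)."""
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    log_kernel = sum(
        (a - 1.0) * math.log(max(p, eps)) for a, p in zip(alphas, probs)
    )
    return log_norm + log_kernel


def end2_nll(student_logits, ensemble_logits, temperature=1.0):
    """Mean NLL of the ensemble members' predictions under the student's
    Dirichlet, for one input.

    Assumption: alpha_k = exp(z_k / T), with the same T applied to each member.
    """
    alphas = [math.exp(z / temperature) for z in student_logits]
    member_probs = [softmax(z, temperature) for z in ensemble_logits]
    log_liks = [dirichlet_log_pdf(p, alphas) for p in member_probs]
    return -sum(log_liks) / len(log_liks)


def annealed_temperature(step, total_steps, start):
    """Hypothetical linear schedule from `start` down to 1.0."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (1.0 - start) * frac
```

In a training loop, `end2_nll` would be averaged over the transfer dataset and evaluated with `temperature=annealed_temperature(step, total_steps, start)`, so that early updates see smoother targets and the final ones see the unmodified ensemble predictions.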

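Experimental Results

Research Questions

RQ1: Can EnD2 accurately reproduce the ensemble's predictive distribution, including the split of uncertainty into knowledge and data components?
RQ2: Compared with standard distillation, does EnD2 retain the ensemble's classification performance and improve misclassification and OOD detection?
RQ3: How does auxiliary data affect EnD2's calibration, NLL, and OOD detection performance?
RQ4: How does EnD2 compare with conventional Ensemble Distillation and Prior Networks on CIFAR-10/100 and TinyImageNet?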

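Key Findings

EnD2 largely retains the ensemble's classification performance and improves misclassification detection relative to individual models.
On in-domain data, EnD2 can decompose total uncertainty into data uncertainty and knowledge uncertainty, much as the ensemble does.
With auxiliary data, EnD2 generally matches or exceeds Ensemble Distillation on OOD detection.
On calibration metrics (NLL and ECE), EnD2 often shows an advantage, but the size of the advantage varies with the dataset and with whether auxiliary data is used.
Prior Networks alone, even with auxiliary data, fall short of the ensemble-based methods on PRR and some calibration metrics, which underscores EnD2's advantage in capturing ensemble diversity.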
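
The split of total uncertainty into data and knowledge components has a closed form for a Dirichlet output: total uncertainty is the entropy of the expected categorical distribution, expected data uncertainty is the expected entropy of categoricals drawn from the Dirichlet, and knowledge uncertainty is their difference (the mutual information). The sketch below is illustrative only. The function names are hypothetical, and `digamma` is a small stdlib-only approximation included so that the example is self-contained.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward until the asymptotic series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def dirichlet_uncertainties(alphas):
    """Total, data, and knowledge uncertainty (in nats) for Dir(alphas)."""
    alpha0 = sum(alphas)
    mean = [a / alpha0 for a in alphas]
    # Entropy of the expected categorical distribution.
    total = -sum(p * math.log(p) for p in mean)
    # Expected entropy of categoricals sampled from the Dirichlet.
    data = -sum(
        p * (digamma(a + 1.0) - digamma(alpha0 + 1.0))
        for p, a in zip(mean, alphas)
    )
    # Mutual information between the label and the categorical parameters.
    return {"total": total, "data": data, "knowledge": total - data}
```

A sharp Dirichlet with a flat mean, such as alphas = [1000, 1000], gives high data uncertainty and almost no knowledge uncertainty. A flat, low-concentration Dirichlet such as alphas = [0.1, 0.1] has the same total uncertainty but assigns most of it to knowledge uncertainty, which is the signal used for OOD detection.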

This review was generated by AI and reviewed by a human editor.