QUICK REVIEW

[Paper Review] Self-Distillation as Instance-Specific Label Smoothing

Zhilu Zhang, Mert R. Sabuncu | arXiv (Cornell University) | 2020-06-09

Machine Learning and Data Classification | References: 41 | Citations: 52

One-Line Summary

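The paper interprets self-distillation as instance-specific regularization within a MAP framework, connects distillation to label smoothing, and introduces Beta smoothing, which promotes confidence diversity without a separate teacher.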

ABSTRACT

It has been recently demonstrated that multi-generational self-distillation can improve generalization. Despite this intriguing observation, reasons for the enhancement remain poorly understood. In this paper, we first demonstrate experimentally that the improved performance of multi-generational self-distillation is in part associated with the increasing diversity in teacher predictions. With this in mind, we offer a new interpretation for teacher-student training as amortized MAP estimation, such that teacher predictions enable instance-specific regularization. Our framework allows us to theoretically relate self-distillation to label smoothing, a commonly used technique that regularizes predictive uncertainty, and suggests the importance of predictive diversity in addition to predictive uncertainty. We present experimental results using multiple datasets and neural network architectures that, overall, demonstrate the utility of predictive diversity. Finally, we propose a novel instance-specific label smoothing technique that promotes predictive diversity without the need for a separately trained teacher model. We provide an empirical evaluation of the proposed method, which, we find, often outperforms classical label smoothing.
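The link between distillation and label smoothing comes down to how the training target is built. The sketch below is a minimal illustration, assuming the common teacher-student target that mixes the one-hot label with the teacher's prediction (weight beta) and uses no distillation temperature; the function names and toy numbers are illustrative, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def log_softmax(logits: Vector) -> Vector:
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(target: Vector, logits: Vector) -> float:
    """Cross-entropy H(target, softmax(logits)) for a single example."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def one_hot(label: int, num_classes: int) -> Vector:
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def label_smoothing_target(label: int, num_classes: int, eps: float) -> Vector:
    """Classical label smoothing: the same mix with the uniform distribution for every example."""
    return [(1.0 - eps) * y + eps / num_classes for y in one_hot(label, num_classes)]


def distillation_target(label: int, teacher_probs: Vector, beta: float) -> Vector:
    """Teacher-student target: mix the one-hot label with this example's teacher prediction."""
    return [(1.0 - beta) * y + beta * t
            for y, t in zip(one_hot(label, len(teacher_probs)), teacher_probs)]


# Toy data: two examples whose (hypothetical) teacher predictions differ.
labels = [0, 2]
teacher = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
ls_targets = [label_smoothing_target(y, 3, eps=0.3) for y in labels]
kd_targets = [distillation_target(y, t, beta=0.3) for y, t in zip(labels, teacher)]
```

Both targets move mass away from the one-hot label, but label smoothing keeps 0.8 on the true class of every example, while the teacher mix keeps 0.91 and 0.88. That per-example variation is the predictive diversity the abstract refers to. Because cross-entropy is linear in the target, training on the mixed target is equivalent to a weighted sum of the label loss and the teacher loss.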

Motivation and Objectives

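Investigate why multi-generational self-distillation improves generalization.
Provide a MAP-based interpretation of teacher-student training.
Relate distillation to label smoothing and highlight the role of predictive diversity.
Propose Beta smoothing as an efficient instance-specific regularization technique.
Demonstrate improved calibration through regularization on the probability simplex.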
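The MAP reading of teacher-student training can be written out in a few lines. This is a derivation sketch under a stated assumption: a Dirichlet prior on the softmax output z (a point on the probability simplex) with concentration α_k(x) = β·p̂_k(x) + 1 built from the teacher prediction p̂(x). That parameterization is chosen for illustration and may differ from the paper's exact one.

```latex
\begin{aligned}
\log p(z \mid y, x) &= \log z_y + \sum_{k=1}^{K} \bigl(\alpha_k(x) - 1\bigr)\log z_k + \text{const} \\
&= \log z_y + \beta \sum_{k=1}^{K} \hat p_k(x)\,\log z_k + \text{const}, \\
-\log p(z \mid y, x) &= \mathrm{CE}(e_y, z) + \beta\,\mathrm{CE}\bigl(\hat p(x), z\bigr) + \text{const} \\
&= (1+\beta)\,\mathrm{CE}\!\left(\tfrac{1}{1+\beta}\,e_y + \tfrac{\beta}{1+\beta}\,\hat p(x),\; z\right) + \text{const}
\end{aligned}
```

Here CE(q, z) = -Σ_k q_k log z_k and e_y is the one-hot label. Setting p̂(x) to the uniform distribution for every input recovers classical label smoothing with ε = β/(1+β), whereas a teacher whose predictions vary across inputs yields an instance-specific prior.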

Proposed Method

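Model the distillation process as amortized MAP estimation of the softmax outputs.
Link teacher predictions to an instance-specific prior over the output distribution.
Compare self-distillation with classical label smoothing through systematic experiments.
Introduce Beta smoothing to implement an instance-specific prior without a separate teacher.
Analyze predictive uncertainty and confidence diversity using entropy-based metrics.
Evaluate calibration improvements with expected calibration error (ECE) across multiple datasets.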
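The key property of Beta smoothing, an instance-specific target that needs no separately trained teacher, can be illustrated with a toy construction. This is a hypothetical sketch, not the paper's exact rule: it assumes the true-class mass for each example is drawn from a Beta(a, b) distribution (floored at 1/K so the true class stays most likely) and spreads the remainder uniformly. The function name and the parameters a and b are illustrative.

```python
import random
from typing import List


def beta_smoothed_targets(labels: List[int], num_classes: int,
                          a: float, b: float, seed: int = 0) -> List[List[float]]:
    """HYPOTHETICAL instance-specific label smoothing (not the paper's exact rule).

    For each example, the mass kept on the true class is drawn from Beta(a, b)
    and floored at 1/num_classes so the true class remains the most likely;
    the remaining mass is spread uniformly over the other classes.
    """
    rng = random.Random(seed)  # seeded so the targets are reproducible
    floor = 1.0 / num_classes
    targets = []
    for y in labels:
        keep = max(rng.betavariate(a, b), floor)  # per-example true-class mass
        rest = (1.0 - keep) / (num_classes - 1)   # shared by the other classes
        targets.append([keep if k == y else rest for k in range(num_classes)])
    return targets


# Usage: Beta(8, 2) has mean 0.8, so the true-class mass averages about 0.8
# but differs from row to row.
labels = [0, 1, 2, 1]
targets = beta_smoothed_targets(labels, num_classes=3, a=8.0, b=2.0, seed=42)
for y, row in zip(labels, targets):
    print(y, [round(p, 3) for p in row])
```

Classical label smoothing would give every row the same true-class mass; here each row gets its own, which is the kind of confidence diversity the paper argues matters. The review does not say how the paper sets the per-example Beta parameters, so the random draw only stands in for that step.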

Experimental Results

Research Questions

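RQ1: Does increasing diversity in teacher predictions improve student performance in self-distillation?
RQ2: Can self-distillation be theoretically linked to label smoothing through a MAP framework?
RQ3: Does instance-specific regularization (including Beta smoothing) outperform conventional label smoothing?
RQ4: Does Beta smoothing provide calibration benefits comparable to those of self-distillation?
RQ5: What role does predictive diversity play in improving generalization and calibration?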
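RQ4 concerns calibration, which the review measures with expected calibration error (ECE). A minimal sketch of the standard equal-width-bin ECE follows; the default of 15 bins is an assumed setting, not a value taken from the paper.

```python
from typing import List


def expected_calibration_error(confidences: List[float], correct: List[bool],
                               n_bins: int = 15) -> float:
    """Expected calibration error with equal-width confidence bins.

    ECE = sum over bins of (|bin| / n) * |accuracy(bin) - mean confidence(bin)|.
    """
    n = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; c == 1.0 joins the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        accuracy = sum(1.0 for _, ok in members if ok) / len(members)
        mean_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Usage: a model that reports 0.9 confidence but is right only half the time
# is overconfident, so its ECE is large.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False]))
```

A lower ECE means stated confidence tracks empirical accuracy more closely, which is the sense in which a regularizer "improves calibration".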

Key Findings

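Sequential self-distillation improves test accuracy and calibration across generations.
Greater diversity in teacher predictions is associated with better student performance.
Label smoothing increases predictive uncertainty but may not produce diversity; instance-specific priors help close that gap.
Beta smoothing generally outperforms classical label smoothing and can match self-distillation without a separate teacher.
The MAP perspective explains distillation as a form of instance-specific regularization, which improves calibration.
Temperature-scaled teacher predictions can markedly improve student accuracy by controlling both uncertainty and diversity.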
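The findings about uncertainty, diversity, and temperature can be made concrete with a small sketch. The metrics are assumptions, not necessarily the paper's definitions: mean entropy stands in for uncertainty, and the population standard deviation of each example's max confidence stands in for diversity. The teacher logits are made-up toy values.

```python
import math
import statistics
from typing import List, Tuple


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: T > 1 flattens the distribution, T < 1 sharpens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


def uncertainty_and_diversity(preds: List[List[float]]) -> Tuple[float, float]:
    """Assumed stand-in metrics (the paper's exact definitions may differ):
    uncertainty = mean predictive entropy over examples,
    diversity   = population std. dev. of each example's max confidence."""
    mean_entropy = statistics.fmean([entropy(p) for p in preds])
    diversity = statistics.pstdev([max(p) for p in preds])
    return mean_entropy, diversity


# Toy teacher logits for three examples (made-up values).
teacher_logits = [[4.0, 1.0, 0.0], [2.0, 1.5, 0.0], [3.0, 0.0, 0.0]]
for T in (1.0, 4.0):
    preds = [softmax(z, T) for z in teacher_logits]
    print(f"T={T}: uncertainty, diversity =", uncertainty_and_diversity(preds))

# Classical label smoothing gives every example the same confidence: zero diversity.
ls_targets = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print("label smoothing:", uncertainty_and_diversity(ls_targets))
```

Raising T flattens every prediction and so increases mean entropy, while the spread of confidences also shifts with T. Classical label smoothing, by contrast, raises uncertainty with zero spread, which matches the observation that it may not deliver diversity.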

This review was generated by AI and reviewed by a human editor.