QUICK REVIEW

[Paper Review] When Does Label Smoothing Help?

Rafael Rios Müller, Simon Kornblith|arXiv (Cornell University)|2019. 06. 06.

Time Series Analysis and Forecasting | References: 17 | Citations: 884

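One-Line Summary

This paper analyzes how label smoothing affects generalization, calibration, and knowledge distillation. It finds that label smoothing improves calibration and generalization but can hurt distillation, because information is lost in the logits.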

ABSTRACT

The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation and speech recognition. Despite its widespread use, label smoothing is still poorly understood. Here we show empirically that in addition to improving generalization, label smoothing improves model calibration which can significantly improve beam-search. However, we also observe that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective. To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.
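The soft targets described in the abstract (a weighted average of the hard targets and the uniform distribution over labels) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `smooth_labels` and the default `alpha=0.1` are assumptions chosen for the example.

```python
def smooth_labels(true_class, num_classes, alpha=0.1):
    """Return soft targets: (1 - alpha) * one_hot + alpha * uniform.

    `alpha` is the smoothing weight; alpha=0.0 gives back the hard
    one-hot target. The default of 0.1 is an assumed example value.
    """
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class must be in [0, num_classes)")
    # Every class receives an equal share of the smoothing mass.
    uniform_share = alpha / num_classes
    targets = [uniform_share] * num_classes
    # The true class additionally keeps the remaining (1 - alpha) mass.
    targets[true_class] += 1.0 - alpha
    return targets


# Example: 4 classes, true class index 2.
soft = smooth_labels(true_class=2, num_classes=4, alpha=0.1)
print(soft)
```

With these soft targets the network is no longer pushed toward assigning probability 1 to the true class, which is the over-confidence the abstract refers to.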

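Motivation and Objectives

Investigate why and when label smoothing improves neural network performance.
Characterize how label smoothing changes penultimate-layer representations.
Evaluate the effect of label smoothing on model calibration across a range of tasks.
Examine how label smoothing affects knowledge distillation and the transfer of information.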

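Proposed Method

Introduces a projection-based method for visualizing penultimate-layer activations.
Quantifies calibration with expected calibration error (ECE) and reliability diagrams.
Evaluates calibration and accuracy on image classification and translation tasks, with and without label smoothing.
Analyzes the effect of label smoothing on knowledge distillation in a teacher-student setup.
Studies information preservation by estimating the mutual information between the input and the logits under label smoothing.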
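As a rough illustration of the calibration metric mentioned above, ECE can be computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is an assumption-laden example rather than the paper's evaluation code: the function name, the default of 15 equal-width bins, and the half-open bin boundaries are all choices made here.

```python
def expected_calibration_error(confidences, correct, num_bins=15):
    """Estimate ECE from per-example confidences and correctness flags.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     1 (or True) if the prediction was right, else 0.
    num_bins:    number of equal-width bins (assumed default).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * num_bins
    hit_sum = [0.0] * num_bins
    count = [0] * num_bins

    for conf, hit in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += float(hit)
        count[idx] += 1

    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue
        accuracy = hit_sum[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        # Weight each bin's calibration gap by its share of the examples.
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece


# Toy data (made up): the model says 0.9 four times but is right three times.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0], num_bins=10))
```

A reliability diagram plots the same per-bin accuracy against per-bin confidence; a perfectly calibrated model lies on the diagonal and has an ECE of zero.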

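Experimental Results

Research Questions

RQ1: Does label smoothing improve model calibration in a way that benefits downstream procedures such as beam search?
RQ2: How does label smoothing restructure penultimate-layer representations?
RQ3: Why does label smoothing hinder knowledge distillation even though it improves teacher accuracy?
RQ4: What is the relationship between label smoothing, mutual information, and information compression in the network?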

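Key Findings

Label smoothing can improve calibration and reduce over-confident predictions.
Label smoothing produces tighter, equidistant clusters in penultimate-layer activations, which suggests that it erases information about similarities between classes.
On translation, label smoothing improves BLEU and calibration, but yields worse NLL than hard targets.
Distilling from a teacher trained with label smoothing can perform worse than distilling from a teacher trained with hard targets, because information is lost in the logits.
The mutual information between the input and the difference of logits decreases under label smoothing, which indicates a loss of information in the representation.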
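The distillation finding can be made concrete with a toy example. The sketch below uses the temperature-scaled softmax and the soft-target cross-entropy term commonly used in knowledge distillation; it is not the paper's training setup. The logit values are invented for illustration, and the names `softmax` and `distillation_loss` are hypothetical helpers.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Invented logits for one example whose true class is 0.
# Hard-target teacher: class 1 scores higher than class 2, hinting that
# this example resembles class 1 more than class 2.
hard_teacher = [6.0, 3.0, 0.5]
# Label-smoothing teacher (idealized): both wrong classes score the same,
# mirroring the equidistant clusters described above.
smooth_teacher = [6.0, 1.0, 1.0]

p_hard = softmax(hard_teacher, temperature=2.0)
p_smooth = softmax(smooth_teacher, temperature=2.0)

print("hard teacher soft targets:  ", p_hard)
print("smooth teacher soft targets:", p_smooth)
```

In this toy case the hard-target teacher's soft targets still rank the wrong classes, while the smoothed teacher assigns them identical probability, so a student trained on the latter receives no signal about which wrong class is closer.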

This review was generated by AI and reviewed by a human editor.