QUICK REVIEW

[Paper Review] Self-training with Noisy Student improves ImageNet classification

Qizhe Xie, Minh-Thang Luong | arXiv (Cornell University) | November 11, 2019

Advanced Neural Network Applications | References: 99 | Citations: 240

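One-line summary

Noisy Student Training trains an equal-or-larger student model, with noise injected, on pseudo labels produced by a teacher, and uses unlabeled data to substantially improve ImageNet accuracy and robustness.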

ABSTRACT

We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher. Models are available at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet. Code is available at https://github.com/google-research/noisystudent.
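The iteration the abstract describes (train a teacher, pseudo-label, train a noised student, promote the student to teacher) can be sketched structurally as below. This is only a toy under stated assumptions, not the authors' implementation: a 1-D threshold classifier stands in for EfficientNet, bounded input jitter stands in for RandAugment, dropout, and stochastic depth, and the toy has no notion of model size, so the "equal-or-larger student" aspect is not represented. The names `train_threshold` and `noisy_student` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]      # (1-D input, class label 0 or 1)
Model = Callable[[float], int]   # maps an input to a predicted class


def train_threshold(data: Sequence[Example], noised: bool,
                    rng: random.Random) -> Model:
    """Toy stand-in for training: threshold halfway between the class means.

    When `noised` is True every input is jittered, a stand-in for the noise
    applied to the student during learning.
    """
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in data:
        if noised:
            x += rng.uniform(-0.2, 0.2)
        sums[y] += x
        counts[y] += 1
    threshold = (sums[0] / counts[0] + sums[1] / counts[1]) / 2
    return lambda value: int(value > threshold)


def noisy_student(labeled: Sequence[Example], unlabeled: Sequence[float],
                  rounds: int, seed: int = 0) -> Model:
    rng = random.Random(seed)
    # Train the first teacher on labeled data only.
    teacher = train_threshold(labeled, noised=False, rng=rng)
    for _ in range(rounds):
        # The teacher assigns pseudo labels to the unlabeled inputs.
        pseudo: List[Example] = [(x, teacher(x)) for x in unlabeled]
        # Train a student, with noise, on labeled plus pseudo-labeled data.
        student = train_threshold(list(labeled) + pseudo, noised=True, rng=rng)
        # Put the student back as the teacher for the next round.
        teacher = student
    return teacher


labeled = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
unlabeled = [0.5, 1.5, 3.5, 4.5]
model = noisy_student(labeled, unlabeled, rounds=3)
print(model(0.2), model(4.8))  # prints "0 1"
```

Because the jitter is bounded by 0.2, the learned threshold stays within 0.2 of 2.5 in every round, so the printed predictions do not depend on the seed.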

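Motivation and objectives

Use unlabeled images to raise ImageNet accuracy beyond what labeled data alone achieves.
Develop a semi-supervised learning framework that outperforms earlier self-training and distillation by using an equal-or-larger student model and noise injection.
Demonstrate robustness gains beyond the standard ImageNet metrics, on ImageNet-A, ImageNet-C, and ImageNet-P.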

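Proposed method

Train a teacher on labeled data and use it to generate pseudo labels for unlabeled data.
Train an equal-or-larger student on the combination of labeled and pseudo-labeled data, with noise applied to the student (RandAugment on the input; dropout and stochastic depth on the model).
Iterate by replacing the teacher with the best student so far, generating new pseudo labels, and training a new student.
Apply data filtering and balancing so that the per-class distribution of the unlabeled data matches that of the labeled data.
Compare soft and hard pseudo labels, and ablate each noise component to measure its effect.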
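The pseudo-labeling, filtering, and balancing steps listed above can be illustrated with a small sketch. It assumes the teacher outputs raw logits per image; the function names and the parameters `min_conf` and `per_class` are hypothetical, and the values used in the example are illustrative rather than the paper's settings. Padding under-represented classes by repeating their images is one possible way to balance, shown here as an assumption.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, soft=True):
    """Soft label: the full distribution. Hard label: one-hot on the argmax."""
    probs = softmax(logits)
    if soft:
        return probs
    top = probs.index(max(probs))
    return [1.0 if i == top else 0.0 for i in range(len(probs))]


def filter_and_balance(teacher_logits, min_conf, per_class):
    """Drop low-confidence images, then give each predicted class exactly
    `per_class` image indices.

    Classes with too many images keep the most confident ones; classes with
    too few are padded by repeating their images. Classes that receive no
    confident image are absent from the result.
    """
    by_class = defaultdict(list)
    for idx, logits in enumerate(teacher_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= min_conf:
            by_class[probs.index(conf)].append((conf, idx))
    balanced = {}
    for cls, items in by_class.items():
        items.sort(reverse=True)  # most confident first
        picked = [idx for _, idx in items[:per_class]]
        i = 0
        while len(picked) < per_class:
            picked.append(picked[i])  # repeat images in order to fill the quota
            i += 1
        balanced[cls] = picked
    return balanced


# Illustrative teacher outputs for five unlabeled images and three classes.
teacher_logits = [
    [4.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.1, 0.0, 0.05],  # near-uniform prediction, filtered out below
]
balanced = filter_and_balance(teacher_logits, min_conf=0.5, per_class=2)
print(balanced)  # {0: [0, 1], 1: [3, 3]}
print(pseudo_label(teacher_logits[3], soft=False))  # [0.0, 1.0, 0.0]
```

In this example class 0 has three confident images and keeps the top two, class 1 has one and is padded by duplication, and the fifth image is discarded because its highest probability is below `min_conf`.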

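Experimental results

Research questions

RQ1: Can unlabeled data pseudo-labeled by a strong teacher push ImageNet accuracy beyond state-of-the-art supervised learning?
RQ2: Do noise injection and an equal-or-larger student model strengthen learning from pseudo labels?
RQ3: How does Noisy Student Training affect robustness on ImageNet-A, ImageNet-C, and ImageNet-P?
RQ4: How does iterative training affect final performance?
RQ5: How do soft and hard pseudo labels compare within this framework?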

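Main results

Model	Params	Extra data	Top-1 Acc.	Top-5 Acc.
Noisy Student Training (EfficientNet-L2)	480M	300M unlabeled images from JFT	88.4%	98.7%

Noisy Student Training reaches 88.4% top-1 accuracy on ImageNet using 300M unlabeled images, outperforming the previous state-of-the-art model, which relied on 3.5B weakly labeled Instagram images.
Robustness: ImageNet-A top-1 accuracy rises from 61.0% to 83.7%; ImageNet-C mean corruption error falls from 45.7 to 28.3; ImageNet-P mean flip rate falls from 27.8 to 12.2.
With EfficientNet-L2, Noisy Student Training reaches 88.4% top-1 and 98.7% top-5 accuracy on ImageNet (Table 2).
Iterative training (teacher -> student -> new teacher), run with a progressively larger ratio of unlabeled to labeled batch size, reaches 87.6%, then 88.1%, and finally 88.4% top-1 accuracy.
Noise is critical: removing data augmentation, stochastic depth, or dropout each degrades performance. A large amount of unlabeled data also helps.
Noisy Student Training also improves adversarial robustness under FGSM/PGD attacks, even though it was not optimized for adversarial robustness.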
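To put the three robustness figures on a common footing, the absolute and relative changes can be computed directly from the reported before and after values. This is plain arithmetic on the numbers quoted above; the helper `change` is a hypothetical name introduced only for this illustration.

```python
# (before, after) pairs as reported in the robustness summary above.
metrics = {
    "ImageNet-A top-1 accuracy (%)": (61.0, 83.7),
    "ImageNet-C mean corruption error": (45.7, 28.3),
    "ImageNet-P mean flip rate": (27.8, 12.2),
}


def change(before, after):
    """Return (absolute change, relative change in percent of `before`)."""
    delta = after - before
    return delta, 100.0 * delta / before


for name, (before, after) in metrics.items():
    delta, rel = change(before, after)
    print(f"{name}: {before} -> {after} "
          f"({delta:+.1f} absolute, {rel:+.1f}% relative)")
```

Accuracy is higher-is-better while corruption error and flip rate are lower-is-better, so a positive change on the first line and negative changes on the other two all indicate improvement.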

This review was generated by AI and checked by a human editor.