QUICK REVIEW

[Paper Review] Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Linfeng Zhang, Jiebo Song|arXiv (Cornell University)|2019. 05. 17.

Advanced Neural Network Applications · References: 40 · Citations: 83

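One-Line Summary

A training framework that introduces self distillation: the deeper part of a network teaches the shallower parts of the same model, improving accuracy without increasing inference cost. It yields an average accuracy gain of about 2.65% on CIFAR100 and enables depth-adaptive inference.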

ABSTRACT

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings with them the exponential increment of the computational and storage cost, delaying the responding time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. Different from traditional knowledge distillation - a knowledge transformation methodology among networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks, the proposed self distillation framework distills knowledge within network itself. The networks are firstly divided into several sections. Then the knowledge in the deeper portion of the networks is squeezed into the shallow ones. Experiments further prove the generalization of the proposed self distillation framework: enhancement of accuracy at average level is 2.65%, varying from 0.61% in ResNeXt as minimum to 4.07% in VGG19 as maximum. In addition, it can also provide flexibility of depth-wise scalable inference on resource-limited edge devices. Our codes will be released on github soon.

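Motivation and Objectives

Aims to improve accuracy for accuracy-crucial applications without the added computation and storage cost of deeper or wider CNNs.
Proposes the self distillation framework, which divides a single network into several sections, attaches a classifier to each section, and distills knowledge within the network itself.
Distillation from the deepest classifier (the teacher) improves the accuracy of every shallow classifier, so the model gains accuracy at no additional inference cost.
Shows that the method also enables depth-wise scalable inference in resource-constrained environments.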

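Proposed Method

Divide the target CNN into several sections according to depth.
Attach a bottleneck and a fully connected classifier after each section (the shallow classifiers are needed only during training and can be removed for standard inference).
Train the shallow classifiers simultaneously as students that receive distillation from the deepest classifier (the teacher).
Each shallow classifier uses three sources of loss: (1) cross entropy with the labels, (2) KL divergence between the shallow classifier and the deepest classifier, and (3) an L2 hint loss that aligns the shallow feature maps with the deepest feature maps through the bottleneck layers.
The per-classifier losses are summed, with alpha and lambda weighting the three supervision signals; the deepest classifier relies on label supervision only.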
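
The loss described above can be sketched in plain Python for a single training example. This is a minimal illustration under stated assumptions, not the authors' implementation: the (1 - alpha) weight on cross entropy, the KL direction (teacher to student), the softmax temperature, and the default values of alpha, lam, and temperature are all assumptions made for the example, and every function name is hypothetical. Feature vectors are assumed to have already passed through the bottleneck so that they share the teacher's shape.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with an optional temperature (assumed).
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

def kl_divergence(teacher_probs, student_probs):
    # KL(teacher || student); the direction is an assumption of this sketch.
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def l2_hint(student_feat, teacher_feat):
    # Squared L2 distance between (already bottlenecked) feature vectors.
    return sum((a - b) ** 2 for a, b in zip(student_feat, teacher_feat))

def self_distillation_loss(logits_per_clf, feats_per_clf, label,
                           alpha=0.3, lam=0.03, temperature=3.0):
    # The last entry is the deepest classifier, which acts as the teacher.
    # In a real framework its outputs would be treated as constants here.
    teacher_probs = softmax(logits_per_clf[-1], temperature)
    teacher_feat = feats_per_clf[-1]
    last = len(logits_per_clf) - 1
    total = 0.0
    for i, (logits, feat) in enumerate(zip(logits_per_clf, feats_per_clf)):
        ce = cross_entropy(softmax(logits), label)
        if i == last:
            total += ce  # deepest classifier: label supervision only
            continue
        kl = kl_divergence(teacher_probs, softmax(logits, temperature))
        hint = l2_hint(feat, teacher_feat)
        total += (1 - alpha) * ce + alpha * kl + lam * hint
    return total

# Toy usage: two classifiers, two classes, true label 0.
example = self_distillation_loss(
    logits_per_clf=[[0.5, 0.1], [2.0, 0.0]],
    feats_per_clf=[[0.9, 1.2], [1.0, 1.0]],
    label=0,
)
print(example)
```

When a shallow classifier already matches the teacher's outputs and features, the KL and hint terms vanish and only the weighted cross-entropy terms remain, which is a quick sanity check for any implementation of this loss.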

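Experimental Results

Research Questions

RQ1: Does self distillation improve accuracy across different CNN architectures and datasets without increasing inference cost?
RQ2: Do the shallow classifiers benefit from distillation from the deepest classifier, and how does this affect overall model performance and training efficiency?
RQ3: How does self distillation compare with traditional distillation and deeply supervised nets in terms of accuracy, training time, and practicality on edge devices?
RQ4: Does the approach enable scalable, depth-aware inference in resource-constrained environments?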

Key Results

Neural Networks	Baseline	Classifier 1/4	Classifier 2/4	Classifier 3/4	Classifier 4/4	Ensemble
VGG19(BN)	64.47	63.59	67.04	68.03	67.73	68.54
ResNet18	77.09	67.85	74.57	78.23	78.64	79.67
ResNet50	77.68	68.23	74.21	75.23	80.56	81.04
ResNet101	77.98	69.45	77.29	81.17	81.23	82.03
ResNet152	79.21	68.84	78.72	81.43	81.61	82.29
ResNeXt29-8	81.29	71.15	79.00	81.48	81.51	81.90
WideResNet20-8	79.76	68.85	78.15	80.98	80.92	81.38
WideResNet44-8	79.93	72.54	81.15	81.96	82.09	82.61
WideResNet28-12	80.07	71.21	80.86	81.58	81.59	82.09
PyramidNet101-240	81.12	69.23	78.15	80.98	82.30	83.51
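
The headline gains can be recomputed from the table above. The short sketch below copies the Baseline and Ensemble columns and takes their difference per network; the variable names are illustrative only. Under that reading, the average, minimum, and maximum line up with the 2.65%, 0.61% (ResNeXt), and 4.07% (VGG19) figures quoted in the abstract.

```python
# (baseline, ensemble) accuracy in percent, copied from the table above.
results = {
    "VGG19(BN)": (64.47, 68.54),
    "ResNet18": (77.09, 79.67),
    "ResNet50": (77.68, 81.04),
    "ResNet101": (77.98, 82.03),
    "ResNet152": (79.21, 82.29),
    "ResNeXt29-8": (81.29, 81.90),
    "WideResNet20-8": (79.76, 81.38),
    "WideResNet44-8": (79.93, 82.61),
    "WideResNet28-12": (80.07, 82.09),
    "PyramidNet101-240": (81.12, 83.51),
}

# Gain of the ensemble over the baseline, in percentage points.
gains = {name: round(ens - base, 2) for name, (base, ens) in results.items()}
average_gain = round(sum(gains.values()) / len(gains), 2)
smallest = min(gains, key=gains.get)
largest = max(gains, key=gains.get)

print(f"average gain: {average_gain}")
print(f"smallest gain: {smallest} ({gains[smallest]})")
print(f"largest gain: {largest} ({gains[largest]})")
```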

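Self distillation improves accuracy on CIFAR100 by 2.65% on average across the tested networks, ranging from 0.61% (ResNeXt) to 4.07% (VGG19); these figures match the gap between the Ensemble and Baseline columns in the table above.
On ImageNet, the average accuracy gain across the evaluated networks is 2.02%.
Deeper networks tend to gain more from self distillation (for example, the Ensemble-over-Baseline gain is 4.05 points for ResNet101 versus 2.58 points for ResNet18).
Self distillation supports depth-wise scalable inference: using a shallow classifier at inference time yields a meaningful speedup at some cost in accuracy.
Compared with traditional distillation, self distillation often delivers equal or better accuracy gains, needs no separate teacher model, and trains faster (for example, 4.6 times faster in the CIFAR100 experiments).
In every reported case, shallow classifiers trained with self distillation outperform those trained with Deep Supervision.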
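
One way to picture depth-wise scalable inference is as a choice among exits under a compute budget. The sketch below is a hypothetical illustration, not a procedure taken from the paper: the accuracies are the ResNet18 row of the table above, while the relative cost values (0.25 to 1.0) and the pick_exit helper are placeholders invented for the example and are not measured numbers.

```python
def pick_exit(exits, budget):
    """Return the most accurate exit whose relative cost fits the budget, or None."""
    feasible = [e for e in exits if e["cost"] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda e: e["accuracy"])

# Accuracy (%) from the ResNet18 row of the table above.
# "cost" is a made-up relative compute cost, for illustration only.
resnet18_exits = [
    {"name": "Classifier 1/4", "accuracy": 67.85, "cost": 0.25},
    {"name": "Classifier 2/4", "accuracy": 74.57, "cost": 0.50},
    {"name": "Classifier 3/4", "accuracy": 78.23, "cost": 0.75},
    {"name": "Classifier 4/4", "accuracy": 78.64, "cost": 1.00},
]

# A device that can afford about 60% of the full forward pass.
choice = pick_exit(resnet18_exits, budget=0.6)
print(choice["name"], choice["accuracy"])
```

With real per-exit latency or FLOP measurements in place of the placeholder costs, the same selection logic would let a device trade accuracy for speed without retraining.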


This review was generated by AI and checked by a human editor.