QUICK REVIEW

[Paper Review] Feature-map-level Online Adversarial Knowledge Distillation

Inseop Chung, Seonguk Park|arXiv (Cornell University)|2020. 02. 05.

Adversarial Robustness in Machine Learning | References: 21 | Citations: 77

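One-line summary

This paper proposes an online knowledge distillation method that transfers both class-probability knowledge and feature-map distributions, the latter through adversarial training. It also introduces a cyclic learning scheme for training more than two networks together, and reports gains especially for pairs of a small and a large network.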

ABSTRACT

Feature maps contain rich information about image intensity and spatial correlation. However, previous online knowledge distillation methods only utilize the class probabilities. Thus in this paper, we propose an online knowledge distillation method that transfers not only the knowledge of the class probabilities but also that of the feature map using the adversarial training framework. We train multiple networks simultaneously by employing discriminators to distinguish the feature map distributions of different networks. Each network has its corresponding discriminator which discriminates the feature map from its own as fake while classifying that of the other network as real. By training a network to fool the corresponding discriminator, it can learn the other network's feature map distribution. We show that our method performs better than the conventional direct alignment method such as L1 and is more suitable for online distillation. Also, we propose a novel cyclic learning scheme for training more than two networks together. We have applied our method to various network architectures on the classification task and discovered a significant improvement of performance especially in the case of training a pair of a small network and a large one.
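
The discriminator roles described in the abstract can be written as a GAN-style objective. The block below is a sketch under assumptions, not the paper's exact notation: $F_1$ and $F_2$ stand for the feature extractors of the two networks, $D_1$ is the discriminator attached to network 1 with output in $(0, 1)$, and the expectation is taken over training inputs $x$.

```latex
\begin{aligned}
\max_{D_1}\; & \mathbb{E}_{x}\big[\log D_1(F_2(x))\big]
             + \mathbb{E}_{x}\big[\log\big(1 - D_1(F_1(x))\big)\big] \\
\min_{F_1}\; & \mathbb{E}_{x}\big[\log\big(1 - D_1(F_1(x))\big)\big]
\end{aligned}
```

The first line trains $D_1$ to label the other network's feature map as real and its own network's feature map as fake; the second trains network 1 to fool $D_1$. The corresponding objectives for network 2 and $D_2$ follow by swapping the indices.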

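Motivation and Objectives

Improve online knowledge distillation by exploiting intermediate feature maps in addition to logits.
Propose a method that adversarially distills feature-map distributions between collaboratively trained networks.
Develop a cyclic learning scheme that efficiently trains more than two networks in the online setting.
Evaluate across architectures and network sizes, including on ImageNet, to demonstrate effectiveness.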

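Proposed Method

Distills both logit-level knowledge (cross-entropy (CE) plus KL-based mutual distillation) and feature-map-level knowledge (adversarial distillation).
Attaches a discriminator to each network; it learns to distinguish that network's own feature-map distribution from the other network's.
Trains each network to fool its discriminator, which aligns the feature-map distributions.
Uses transfer layers to handle feature-map channel mismatches caused by architectural differences between networks.
Introduces a cyclic learning framework that extends online distillation beyond two networks, reducing the number of discriminators and the computation, with a sequential distillation flow 1→2→…→K→1.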
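
The loss terms and the cyclic flow listed above can be sketched in plain Python. This is an illustrative sketch under assumptions, not the authors' implementation: every function name is hypothetical, the adversarial terms use the standard GAN log-loss on scalar discriminator outputs in (0, 1), and the all-pairs count assumes one discriminator per directed transfer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def logit_level_loss(own_logits, peer_logits, label):
    """CE on the true label plus KL(peer || own), the mutual-distillation term."""
    own, peer = softmax(own_logits), softmax(peer_logits)
    return cross_entropy(own, label) + kl_divergence(peer, own)

def discriminator_loss(d_own, d_peer):
    """Assumed GAN log-loss: peer feature map is 'real', own feature map is 'fake'."""
    return -(math.log(d_peer) + math.log(1.0 - d_own))

def network_adversarial_loss(d_own):
    """Decreases as d_own approaches 1, i.e. as the network fools its discriminator."""
    return math.log(1.0 - d_own)

def cyclic_pairs(k):
    """Directed transfers 1->2->...->K->1, using 1-indexed network ids."""
    return [(i, i % k + 1) for i in range(1, k + 1)]

def all_directed_pairs(k):
    """Every ordered pair of distinct networks, for comparison with the cycle."""
    return [(i, j) for i in range(1, k + 1) for j in range(1, k + 1) if i != j]

print(cyclic_pairs(3))  # [(1, 2), (2, 3), (3, 1)]
print(len(cyclic_pairs(4)), len(all_directed_pairs(4)))  # 4 12
```

Under the stated assumption, the cyclic flow needs K directed transfers where an all-pairs scheme would need K(K-1), which is one way to read the claimed reduction in discriminators and computation.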

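Experimental Results

Research Questions

RQ1: Does transferring feature-map distributions in addition to logits benefit online distillation?
RQ2: In the online setting, does adversarial feature-map distillation outperform direct feature-map alignment losses (L1/L2)?
RQ3: Does the cyclic learning scheme effectively extend online distillation beyond two networks?
RQ4: Do the proposed methods generalize to same- and cross-architecture network pairs, and do they scale to ImageNet?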

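Key Findings

Adversarial feature-map distillation (AFD) outperforms direct feature-map alignment methods (L1, L1+KD) in both online and offline settings.
AFD yields gains when a small network is paired with a large one, and across same- and cross-architecture pairings.
Clear improvements are observed on CIFAR-100 across several architectures, and on ImageNet with DML as the baseline.
The cyclic learning framework enables efficient online training of three or more networks and achieves competitive or better performance.
Additional experiments confirm that both logit-level mutual distillation and adversarial feature-map distillation contribute to the performance gains.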

This review was generated by AI and reviewed by a human editor.