QUICK REVIEW

[Paper Review] ShrinkTeaNet: Million-scale Lightweight Face Recognition via Shrinking Teacher-Student Networks

Chi Nhan Duong, Khoa Luu|arXiv (Cornell University)|May 25, 2019

Face recognition and analysis · References: 52 · Cited by: 26

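ONE-SENTENCE SUMMARY

ShrinkTeaNet proposes a knowledge distillation framework that uses an Angular Distillation Loss to transfer feature-direction and sample-distribution knowledge from a large teacher network to a lightweight student network for million-scale face recognition. The method reaches 99.77% accuracy on LFW and 95.64% on MegaFace, clearly outperforming prior methods in the open-set setting.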

ABSTRACT

Large-scale face recognition in the wild has recently achieved mature performance in many real-world applications. However, such systems are built on GPU platforms and mostly deploy heavy deep network architectures. Given a high-performance heavy network as a teacher, this work presents a simple and elegant teacher-student learning paradigm, namely ShrinkTeaNet, to train a portable student network that has significantly fewer parameters and competitive accuracy against the teacher network. Unlike prior teacher-student frameworks, which mainly focus on accuracy and compression ratios in closed-set problems, our proposed teacher-student network is shown to be more robust on the open-set problem, i.e., large-scale face recognition. In addition, this work introduces a novel Angular Distillation Loss for distilling the feature direction and the sample distributions of the teacher's hypersphere to its student. The ShrinkTeaNet framework can then efficiently guide the student's learning process with the teacher's knowledge present in both the intermediate and last stages of the feature embedding. Evaluations on LFW, CFP-FP, AgeDB, IJB-B and IJB-C Janus, and MegaFace with one million distractors have demonstrated the efficiency of the proposed approach to learn robust student networks that have satisfying accuracy and compact sizes. Our ShrinkTeaNet is able to support a lightweight architecture achieving high performance, with 99.77% on LFW and 95.64% on the large-scale MegaFace protocols.

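MOTIVATION AND OBJECTIVES

Address the challenge of deploying large-scale face recognition on mobile and embedded devices with limited computational resources.
Improve robustness in open-set face recognition, where test classes differ from training classes, by transferring knowledge beyond the classification logits alone.
Develop a method that preserves the geometric structure of the teacher's hypersphere, specifically feature directions and sample distributions, to achieve better generalization.
Perform distillation efficiently at all stages of the feature embedding, not just the final layer, to improve the student network's performance.
Achieve competitive accuracy with compact, lightweight architectures while remaining robust under large-scale, open-set conditions.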

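PROPOSED METHOD

Propose an Angular Distillation Loss that drives the student to match the direction and distribution of features on the teacher's hypersphere rather than the exact feature values.
Apply distillation at every stage of the feature embedding, transferring knowledge from both the intermediate and final layers of the teacher.
Use a high-performance heavy network as the teacher and lightweight architectures (e.g., MobileNetV1, MobileNetV2, MobileFaceNet) as students, trained via knowledge distillation.
Use a soft constraint based on the angular similarity between feature vectors, which is more flexible than conventional ℓ₂ or cross-entropy losses.
Design a multi-stage distillation framework that jointly optimizes the student's representation learning using both the final classifier and the intermediate feature maps.
Use the class distributions and feature geometry learned by the teacher to guide the formation of the student's decision boundaries, improving generalization to unseen classes.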
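The angular loss and its multi-stage use can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-sample loss takes the squared cosine-distance form (1 - cos(f_t, f_s))², that the features from each stage have already been flattened and brought to matching dimensions, and that stage losses are combined with hand-chosen weights. The names `angular_distillation_loss` and `multi_stage_loss` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical aliases: a batch is a list of flattened feature vectors.
Vector = Sequence[float]
Batch = Sequence[Vector]


def l2_normalize(v: Vector, eps: float = 1e-12) -> List[float]:
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def angular_distillation_loss(teacher: Batch, student: Batch) -> float:
    """Mean of (1 - cos(f_t, f_s))^2 over a batch (assumed form).

    Only the direction of each feature matters: rescaling a student
    vector by a positive factor leaves the loss unchanged.
    """
    if len(teacher) != len(student) or not teacher:
        raise ValueError("batches must be non-empty and equal in size")
    total = 0.0
    for f_t, f_s in zip(teacher, student):
        if len(f_t) != len(f_s):
            # Assumption: student features were already projected to the teacher's size.
            raise ValueError("feature dimensions must match")
        cos = sum(a * b for a, b in zip(l2_normalize(f_t), l2_normalize(f_s)))
        total += (1.0 - cos) ** 2
    return total / len(teacher)


def multi_stage_loss(stages: Sequence[Tuple[Batch, Batch]], weights: Sequence[float]) -> float:
    """Weighted sum of angular losses over (teacher, student) pairs from several stages.

    The choice of stages and weights is an illustrative assumption, e.g. a few
    intermediate blocks plus the final embedding.
    """
    if len(stages) != len(weights):
        raise ValueError("need one weight per stage")
    return sum(w * angular_distillation_loss(t, s) for w, (t, s) in zip(weights, stages))


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[3.0, 0.0], [1.0, 0.0]]  # first: same direction, second: orthogonal
    print(angular_distillation_loss(teacher, student))  # 0.5
```

In a real training loop these would be framework tensors with automatic differentiation; plain lists keep the sketch dependency-free while showing that only feature directions enter the loss.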

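EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1 Can knowledge distillation be effectively applied to open-set face recognition, where test classes differ from training classes?
RQ2 Compared with standard distillation based on logits or ℓ₂ losses, does distilling feature directions and sample distributions from the teacher's hypersphere improve the student's generalization?
RQ3 Can a lightweight student network achieve competitive performance on large-scale benchmarks such as MegaFace while keeping its parameter count very low?
RQ4 How does multi-stage distillation across the feature embedding layers affect student performance compared with distilling only at the final layer?
RQ5 Does the proposed Angular Distillation Loss reduce overfitting and training instability in lightweight models compared with ℓ₂-based distillation?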

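KEY FINDINGS

ShrinkTeaNet reaches 99.77% accuracy on the LFW benchmark, showing strong performance even with a lightweight student network.
On the large-scale MegaFace protocol with one million distractors, ShrinkTeaNet-MFNR reaches 95.64% accuracy, only 1.71% below ArcFace.
The teacher-student performance gap shrinks to 0.05% on LFW, 1.83% on CFP-FP, and 0.74% on AgeDB, significantly better than ℓ₂-based distillation.
On the IJB-B and IJB-C protocols, ShrinkTeaNet improves student performance over the baseline by 1.9% to 3.64%, and on IJB-C it trails ArcFace by only 0.016.
Training with the Angular Distillation Loss is more stable than ℓ₂-based distillation, which often suffers from over-regularization and instability in lightweight models.
The authors present ShrinkTeaNet as the first distillation framework designed and validated specifically for open-set, large-scale face recognition, and it shows strong robustness to distribution shift.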
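A small numeric comparison illustrates one plausible reason for the stability finding on ℓ₂-based distillation: ℓ₂ feature matching also forces the student to reproduce the teacher's feature magnitude, while an angular loss constrains only direction. This is an illustrative assumption about the mechanism, using hypothetical helper names and the same assumed squared-cosine form; it does not reproduce the paper's experiments.

```python
import math
from typing import Sequence


def l2_feature_loss(f_t: Sequence[float], f_s: Sequence[float]) -> float:
    """Squared Euclidean distance: penalizes both direction and magnitude."""
    return sum((a - b) ** 2 for a, b in zip(f_t, f_s))


def angular_feature_loss(f_t: Sequence[float], f_s: Sequence[float]) -> float:
    """(1 - cosine similarity)^2: penalizes direction only (assumed form).

    Assumes both vectors are non-zero.
    """
    n_t = math.sqrt(sum(a * a for a in f_t))
    n_s = math.sqrt(sum(b * b for b in f_s))
    cos = sum(a * b for a, b in zip(f_t, f_s)) / (n_t * n_s)
    return (1.0 - cos) ** 2


if __name__ == "__main__":
    teacher = [3.0, 4.0]  # norm 5
    student = [0.6, 0.8]  # same direction, norm 1
    print(l2_feature_loss(teacher, student))       # about 16.0
    print(angular_feature_loss(teacher, student))  # about 0.0, up to rounding
```

Under ℓ₂ matching, a student whose embedding already points in the teacher's direction is still penalized for its smaller norm; the angular form treats it as already aligned.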

This review was generated by AI and reviewed by human editors.