QUICK REVIEW

[Paper Digest] Distribution Distillation Loss: Generic Approach for Improving Face Recognition from Hard Samples

Yuge Huang, Pengcheng Shen|arXiv (Cornell University)|Feb 10, 2020

Face recognition and analysis|References: 35|Cited by: 3

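One-Sentence Summary

This paper proposes the Distribution Distillation Loss, a generic method that improves face recognition on hard samples by distilling similarity distributions from easy samples (teacher) and hard samples (student). A new loss function aligns the student distribution with the teacher distribution, which reduces the overlap between positive and negative pairs and substantially improves performance under challenging variations such as pose, race, and resolution, outperforming ArcFace and CosFace on large-scale benchmarks.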

ABSTRACT

Large facial variations are the main challenge in face recognition. To this end, previous variation-specific methods make full use of task-related prior to design special network losses, which are typically not general among different tasks and scenarios. In contrast, the existing generic methods focus on improving the feature discriminability to minimize the intra-class distance while maximizing the interclass distance, which perform well on easy samples but fail on hard samples. To improve the performance on those hard samples for general tasks, we propose a novel Distribution Distillation Loss to narrow the performance gap between easy and hard samples, which is a simple, effective and generic for various types of facial variations. Specifically, we first adopt state-of-the-art classifiers such as ArcFace to construct two similarity distributions: teacher distribution from easy samples and student distribution from hard samples. Then, we propose a novel distribution-driven loss to constrain the student distribution to approximate the teacher distribution, which thus leads to smaller overlap between the positive and negative pairs in the student distribution. We have conducted extensive experiments on both generic large-scale face benchmarks and benchmarks with diverse variations on race, resolution and pose. The quantitative results demonstrate the superiority of our method over strong baselines, e.g., Arcface and Cosface.

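Motivation and Objectives

Close the performance gap between easy and hard samples in face recognition, especially under large facial variations.
Overcome the limited generalization of task-specific losses across different variation types.
Improve feature discriminability on hard samples without relying on task-specific prior knowledge.
Develop a generic loss function applicable to diverse face recognition scenarios and variation types.
Reduce intra-class variance and inter-class overlap on hard samples through distribution-level knowledge distillation.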

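Proposed Method

Use a state-of-the-art classifier such as ArcFace to construct two similarity distributions: one from easy samples (teacher) and one from hard samples (student).
Define a distribution-driven loss that constrains the student distribution to approximate the teacher distribution.
Construct the distillation loss so that it reduces the overlap between positive and negative pairs in the student distribution.
Train the student network end to end with the proposed Distribution Distillation Loss in addition to the standard classification loss.
Use the well-separated distributions of the teacher to guide the student toward more robust representations of hard samples.
Keep the method generic by avoiding task-specific designs and variation-specific priors.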
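The steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this digest does not say how the similarity distributions are estimated or which divergence is used, so the triangular-kernel soft histogram, the KL divergence, the histogram-intersection overlap measure, and the weight in `joint_loss` are all assumptions made for illustration. Every function name is hypothetical, and the similarities are synthetic rather than taken from a face model.

```python
import numpy as np


def cosine_similarities(a, b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def soft_histogram(sims, num_bins=50, eps=1e-8):
    """Estimate a similarity distribution over [-1, 1].

    Assumption: a triangular kernel, so each similarity is shared linearly
    between its two nearest bin centers. `eps` keeps every bin positive.
    """
    centers = np.linspace(-1.0, 1.0, num_bins)
    width = centers[1] - centers[0]
    dist = np.abs(sims[:, None] - centers[None, :]) / width
    weights = np.clip(1.0 - dist, 0.0, None)
    hist = weights.sum(axis=0) + eps
    return hist / hist.sum()


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return float(np.sum(p * np.log(p / q)))


def distribution_distillation_loss(t_pos, t_neg, s_pos, s_neg, num_bins=50):
    """Sum of KL(teacher || student) over positive and negative pairs.

    Assumption: KL divergence stands in for the distribution-driven loss.
    """
    loss = 0.0
    for teacher, student in ((t_pos, s_pos), (t_neg, s_neg)):
        p = soft_histogram(teacher, num_bins)
        q = soft_histogram(student, num_bins)
        loss += kl_divergence(p, q)
    return loss


def overlap(pos, neg, num_bins=50):
    """Histogram intersection of positive and negative pair distributions."""
    p = soft_histogram(pos, num_bins)
    n = soft_histogram(neg, num_bins)
    return float(np.minimum(p, n).sum())


def joint_loss(classification_loss, ddl, weight=0.1):
    """Classification loss plus a weighted distillation term.

    The weight 0.1 is a hypothetical placeholder, not a value from the paper.
    """
    return classification_loss + weight * ddl


# Synthetic similarities: easy (teacher) pairs are well separated,
# hard (student) pairs overlap heavily.
rng = np.random.default_rng(0)
teacher_pos = np.clip(rng.normal(0.8, 0.05, 1000), -1.0, 1.0)
teacher_neg = np.clip(rng.normal(0.0, 0.05, 1000), -1.0, 1.0)
student_pos = np.clip(rng.normal(0.4, 0.15, 1000), -1.0, 1.0)
student_neg = np.clip(rng.normal(0.2, 0.15, 1000), -1.0, 1.0)

ddl = distribution_distillation_loss(
    teacher_pos, teacher_neg, student_pos, student_neg
)
print("distillation term:", ddl)
print("teacher overlap:", overlap(teacher_pos, teacher_neg))
print("student overlap:", overlap(student_pos, student_neg))
```

In an actual training loop the same computation would be written in an autograd framework so that gradients flow through the student similarities. A soft histogram is used here instead of hard binning because hard bin counts have zero gradient almost everywhere.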

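Experimental Results

Research Questions

RQ1: Can a generic loss function improve face recognition on hard samples without any task-specific design?
RQ2: How does distilling similarity distributions from easy and hard samples affect feature discriminability?
RQ3: To what extent can distribution distillation narrow the performance gap between easy and hard samples under different facial variations?
RQ4: Does the proposed method generalize across benchmarks that vary in race, resolution, and pose?
RQ5: How does the Distribution Distillation Loss compare with existing state-of-the-art losses such as ArcFace and CosFace on hard samples?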

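Key Findings

The proposed Distribution Distillation Loss significantly improves face recognition accuracy on hard samples across multiple types of facial variation.
The method outperforms strong baselines such as ArcFace and CosFace on large-scale face recognition benchmarks.
The performance gap between easy and hard samples narrows noticeably, most clearly under challenging conditions such as extreme pose and low resolution.
The loss yields consistent gains across multiple benchmarks, including those with diverse race and resolution variations.
The method generalizes well across network architectures and data distributions without task-specific tuning.
Ablation experiments confirm that the gains come from the distribution distillation mechanism itself rather than merely from using ArcFace as the teacher model.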


This digest was generated by AI and reviewed by a human editor.