QUICK REVIEW

[Paper Review] Partial FC: Training 10 Million Identities on a Single Machine

Xiang An, Xuhan Zhu|arXiv (Cornell University)|Oct 11, 2020

Face recognition and analysis|References: 23|Cited by: 32

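ONE-SENTENCE SUMMARY

The paper proposes a PPRN-based softmax approximation and a distributed training strategy that make it possible to train face recognition models with tens of millions of identities on limited hardware, using only 10% of the class centers in each iteration while still achieving state-of-the-art results.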

ABSTRACT

Face recognition has been an active and vital topic among computer vision community for a long time. Previous researches mainly focus on loss functions used for facial feature extraction network, among which the improvements of softmax-based loss functions greatly promote the performance of face recognition. However, the contradiction between the drastically increasing number of face identities and the shortage of GPU memories is gradually becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training massive identities. We find that the importance of negative classes in softmax function in face representation learning is not as high as we previously thought. The experiment demonstrates no loss of accuracy when training with only 10% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks. We also implement a very efficient distributed sampling algorithm, taking into account model accuracy and training efficiency, which uses only eight NVIDIA RTX2080Ti to complete classification tasks with tens of millions of identities. The code of this paper has been made available https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc.

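MOTIVATION AND OBJECTIVES

Address the challenge of training softmax-based losses over an extremely large number of identities under GPU memory constraints
Propose a sampling-based softmax approximation that preserves accuracy while using only a subset of the class centers
Develop a distributed training strategy that reduces the communication and memory overhead of very large-scale classification
Introduce and release Glint360K, a large, clean face recognition dataset that supports large-scale experiments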
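
To make the memory constraint in the first objective concrete, here is a back-of-the-envelope estimate; it is not a figure from the paper. It assumes fp32 values, a 512-dimensional embedding, class centers split evenly across GPUs, and logits computed for the whole global batch on every GPU. The helper name per_gpu_bytes and all parameter values are illustrative assumptions. Gradients, optimizer state, and backbone activations are ignored, so real usage would be higher.

```python
def per_gpu_bytes(num_classes, num_gpus, embed_dim, global_batch,
                  sample_rate=1.0, bytes_per_value=4):
    """Rough per-GPU memory for the classification layer (hypothetical helper).

    Returns (weight_bytes, logit_bytes). Assumes class centers are split
    evenly across GPUs and that only a `sample_rate` fraction of each
    shard takes part in the logit computation.
    """
    classes_per_gpu = -(-num_classes // num_gpus)  # ceiling division
    weight_bytes = classes_per_gpu * embed_dim * bytes_per_value
    active = round(classes_per_gpu * sample_rate)  # centers used this step
    logit_bytes = global_batch * active * bytes_per_value
    return weight_bytes, logit_bytes


# Assumed setting: 10M identities, 8 GPUs, 512-d embedding, global batch 512.
for rate in (1.0, 0.1):
    w, z = per_gpu_bytes(10_000_000, 8, 512, 512, sample_rate=rate)
    print(f"rate={rate}: weights {w / 2**30:.2f} GiB, logits {z / 2**30:.2f} GiB")
```

Under these assumptions the logit buffer grows linearly with the number of identities, just like the weight shard, and it is the term that a sampling-based approximation shrinks.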

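PROPOSED METHOD

Formulate the softmax loss under the assumption that features and weights have fixed norms, focusing on the angular separation between features and class centers
Propose Positive Plus Randomly Negative (PPRN) sampling: always include the positive class centers and randomly sample the negative classes; the approach is shown to be robust to the sampling rate
Implement a distributed approximation of the full softmax by storing non-overlapping subsets of W on each GPU and aggregating local logits, which reduces communication overhead
Provide a memory and throughput analysis comparing model parallelism with the proposed method, showing that it scales to 10M+ identities in multi-GPU settings
Describe the training setup (ResNet backbone, CosFace/ArcFace losses, batch size 512, learning rate, etc.) and report performance on multiple benchmarks and large-scale datasets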
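
The PPRN sampling step described above can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: the function names pprn_sample and sampled_cross_entropy, the logit scale of 64, and the toy inputs are all assumptions, and no margin term (CosFace/ArcFace) is applied. It keeps every class that appears in the batch, fills the rest of the budget with random negatives, and remaps labels into the sampled index space before a softmax over the subset.

```python
import math
import random


def pprn_sample(labels, num_classes, sample_rate, rng):
    """Pick class centers for one step: all positives plus random negatives.

    Returns (sampled, new_labels), where `sampled` is a sorted list of
    class ids and `new_labels` indexes into `sampled`.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    positives = set(labels)
    # Never sample fewer centers than there are positives in the batch.
    num_sample = max(int(num_classes * sample_rate), len(positives))
    # O(num_classes) pool: fine for a sketch, too slow for real training.
    pool = [c for c in range(num_classes) if c not in positives]
    negatives = rng.sample(pool, num_sample - len(positives))
    sampled = sorted(positives | set(negatives))
    remap = {c: i for i, c in enumerate(sampled)}
    return sampled, [remap[y] for y in labels]


def sampled_cross_entropy(cosines, new_label, scale=64.0):
    """Cross-entropy over the sampled centers only.

    `cosines` holds the cosine similarity between one normalized feature
    and each sampled, normalized class center.
    """
    logits = [scale * c for c in cosines]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[new_label]


# Toy usage: 100 classes, 10% sampling, a batch with labels 3, 7, 3.
rng = random.Random(0)
sampled, new_labels = pprn_sample([3, 7, 3], 100, 0.1, rng)
cosines = [0.0] * len(sampled)
cosines[new_labels[0]] = 0.8  # pretend the first sample matches its center
print(len(sampled), sampled_cross_entropy(cosines, new_labels[0]))
```

Because the positive classes are always kept, every sample in the batch still has its own class center in the denominator; only the set of negatives changes from step to step.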

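EXPERIMENTAL RESULTS

Research questions

RQ1: Can softmax-based face recognition losses maintain accuracy when only a small fraction of the class centers is used in each iteration?
RQ2: How can extremely large-scale softmax classification (millions of identities) be trained efficiently without exhausting GPU memory?
RQ3: Does the distributed approximation strategy with partial class centers maintain state-of-the-art performance on standard benchmarks?
RQ4: How does the sampling strategy (PPRN versus random) affect model accuracy and convergence?
RQ5: Can the newly released large-scale dataset (Glint360K) yield competitive results with minimal use of class centers?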

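Key findings

Using only 10% of the class centers achieves accuracy close to that of the full-class softmax on mainstream benchmarks
PPRN sampling (the positive classes plus randomly selected negative classes) is robust at sampling rates of 0.1, 0.5, and 1.0
The proposed distributed approximation reduces communication and memory overhead, scaling to 10 million identities on eight GPUs and to more than 100 million with more GPUs
Trained on Glint360K, the method achieves state-of-the-art verification results on MegaFace and competitive or better results on IJB-B and IJB-C, while sampling only 10% of the class centers
The method makes it practical to train very large-scale face recognition models on conventional hardware (e.g., 8 to 64 GPUs), with a significant speedup over traditional model-parallel softmax
Glint360K is released as a large, cleaned dataset to support future large-scale face recognition research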
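
The scaling result above depends on each GPU holding a disjoint shard of class centers and combining locally computed logits. One common way to do this for a model-parallel softmax, shown here as a single-process sketch and not necessarily the authors' implementation, is to exchange only a per-sample maximum and a sum of exponentials between shards instead of the logits themselves. The function names are hypothetical, and plain Python lists stand in for per-GPU tensors.

```python
import math


def local_stats(logits):
    """Per-shard statistics: local max and sum of exp shifted by that max."""
    m = max(logits)
    return m, sum(math.exp(z - m) for z in logits)


def combine(stats):
    """Merge per-shard (max, sum) pairs into the global log-sum-exp.

    In a real multi-GPU job this would correspond to one max-reduction
    and one sum-reduction across devices (assumption).
    """
    g = max(m for m, _ in stats)
    total = sum(s * math.exp(m - g) for m, s in stats)
    return g + math.log(total)


def sharded_cross_entropy(shard_logits, target_shard, target_index):
    """Cross-entropy for one sample whose logits are split across shards."""
    lse = combine([local_stats(logits) for logits in shard_logits])
    return lse - shard_logits[target_shard][target_index]


# Toy usage: four classes split over two shards; the target is class 2,
# which lives at index 0 of shard 1.
shards = [[1.0, 2.0], [3.0, 0.5]]
print(sharded_cross_entropy(shards, target_shard=1, target_index=0))
```

The value matches a softmax cross-entropy computed over the concatenated logits, while each shard only contributes two scalars per sample to the exchange.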


This review was generated by AI and reviewed by human editors.