QUICK REVIEW

[Paper Review] CosFace: Large Margin Cosine Loss for Deep Face Recognition

Hao Wang, Yitong Wang|arXiv (Cornell University)|Jan 29, 2018

Face recognition and analysis|29 references|271 citations

TL;DR

CosFace introduces Large Margin Cosine Loss (LMCL) by normalizing features and weights and adding a cosine margin, achieving state-of-the-art results on LFW, YTF, and MegaFace benchmarks.

ABSTRACT

Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by $L_2$ normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, YouTube Faces (YTF) and Labeled Faces in the Wild (LFW). We achieve the state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.

Motivation & Objective

Motivate the need for stronger discriminative features in face recognition beyond traditional softmax.
Propose LMCL to maximize inter-class variance and minimize intra-class variance in cosine space.
Show how normalization of both features and weights creates a hyperspherical feature distribution with a large angular margin.
Demonstrate state-of-the-art performance on standard benchmarks (LFW, YTF, MegaFace) using CosFace.
Provide theoretical insights into the margin parameter and normalization effects on learning dynamics.

Proposed method

Reformulate softmax as a cosine loss by L2-normalizing both features and class weight vectors.
Introduce a cosine margin m to create a large-margin decision boundary in cosine space.
Define the LMCL objective: $L_{lmc} = \frac{1}{N} \sum_{i} -\log \frac{e^{s(\cos(\theta_{y_i,i}) - m)}}{e^{s(\cos(\theta_{y_i,i}) - m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j,i})}}$, with W and x $L_2$-normalized and the resulting cosines scaled by s.
Normalize both features and weights to place features on a hypersphere and use a fixed scaling parameter s (set to 64).
Compare LMCL against Softmax, Normalized Softmax (NSL), and A-Softmax, highlighting consistent cosine-space margins and robustness to perturbations.
Provide theoretical analysis of margin m bounds and discuss geometric interpretation of LMCL on hyperspherical feature space.
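
The LMCL objective listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `lmcl_loss`, the array shapes, and the default margin `m=0.35` are assumptions made for the example (only s = 64 comes from this review), and the sketch computes the forward loss only, with no gradients or training loop.

```python
import numpy as np

def lmcl_loss(features, weights, labels, s=64.0, m=0.35):
    """Large Margin Cosine Loss for one batch (forward pass only).

    features: (N, d) raw embeddings x_i
    weights:  (C, d) raw class weight vectors W_j
    labels:   (N,) integer class ids y_i
    m=0.35 is an illustrative default, not a value stated in this review.
    """
    # L2-normalize both sides so each dot product equals cos(theta_{j,i}).
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T  # shape (N, C)

    # Scale every cosine by s, and subtract the margin m from the
    # target-class cosine only.
    logits = s * cos
    rows = np.arange(len(labels))
    logits[rows, labels] = s * (cos[rows, labels] - m)

    # Numerically stable log-softmax, then mean negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

Setting `m=0` in this sketch reduces it to the normalized softmax loss (NSL) that the method is compared against, and rescaling `features` leaves the loss unchanged because the radial component is removed by normalization.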

Figure 1: An overview of the proposed CosFace framework. In the training phase, the discriminative face features are learned with a large margin between different classes. In the testing phase, the testing data is fed into CosFace to extract face features which are later used to compute the cosine s

Experimental results

Research questions

RQ1: Does LMCL improve discrimination by enforcing a fixed cosine margin in the angular space, compared to previously proposed angular or Euclidean margins?
RQ2: What is the effect of jointly normalizing features and weights on training dynamics and the resulting feature geometry on a hypersphere?
RQ3: How does LMCL perform on standard face recognition benchmarks (LFW, YTF, MegaFace) relative to state-of-the-art loss functions?
RQ4: What guidance can be provided for choosing the cosine margin parameter m and scaling s in practice?
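
For RQ4, one way to reason about an upper limit on m is geometric. If the C class weight vectors are spread as evenly as possible on the unit hypersphere in K dimensions, the largest usable margin is 1 minus the cosine between neighboring class centers. The sketch below encodes that idealized argument; the function name `max_cosine_margin` and its return convention are hypothetical, and the result should be checked against the bounds given in the paper before being relied on.

```python
import math

def max_cosine_margin(num_classes, feature_dim):
    """Idealized upper bound on the cosine margin m.

    Assumes class centers are spread as evenly as possible on the unit
    hypersphere. Returns (bound, is_tight); is_tight is False when the
    bound is only a loose ceiling and a practical m must be much smaller.
    """
    C, K = num_classes, feature_dim
    if C < 2 or K < 2:
        raise ValueError("need at least 2 classes and 2 feature dimensions")
    if K == 2:
        # Centers evenly spaced on a circle: adjacent angle is 2*pi/C.
        return 1.0 - math.cos(2.0 * math.pi / C), True
    # For C <= K + 1 the centers can form a regular simplex, whose
    # pairwise cosine is -1/(C-1), giving 1 + 1/(C-1) = C/(C-1).
    # For C > K + 1 the same value is only a loose ceiling.
    return C / (C - 1), C <= K + 1

print(max_cosine_margin(3, 2))
print(max_cosine_margin(10, 512))
print(max_cosine_margin(10000, 512))
```

With far more identities than feature dimensions, which is the usual face recognition setting, the second return value is `False`: the ceiling is not attainable and m has to be chosen well below it, typically by validation.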

Key findings

LMCL yields competitive and often superior results across major benchmarks (LFW, YTF, MegaFace) compared to prior losses.
With feature normalization and a fixed scale s=64, LMCL achieves 99.33% on LFW and 96.1% on YTF in the cited experiments (trained on CASIA-WebFace).
LMCL outperforms A-Softmax with feature normalization on YTF and MegaFace in the reported comparisons.
CosFace results on LFW and YTF surpass many competing methods, including single-patch and ensemble configurations, demonstrating strong generalization.
Feature normalization significantly improves MegaFace performance, as shown by the reported MF1 scores (Rank1 and Veri).
CosFace achieves state-of-the-art accuracy on LFW (99.73%) and YTF (97.6%) in the larger benchmark experiments.

Figure 2: The comparison of decision margins for different loss functions in the binary-class scenario. The dashed line represents the decision boundary, and the gray areas are decision margins.
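
The binary-class comparison in Figure 2 can be made concrete with the class-1 conditions of the four losses. The sketch below is illustrative only: the function name `accepts_class1` and its parameters are hypothetical, `m=0.35` and `angular_m=2` are example values, and the A-Softmax branch uses the plain cos(m·θ) form, which is only meaningful for θ1 in [0, π/m].

```python
import math

def accepts_class1(theta1, theta2, loss, m=0.35, angular_m=2,
                   w1_norm=1.0, w2_norm=1.0):
    """Return True if a sample satisfies the class-1 condition of `loss`.

    theta1, theta2: angles (radians) between the feature and W1, W2.
    """
    c1, c2 = math.cos(theta1), math.cos(theta2)
    if loss == "softmax":      # boundary depends on weight norms
        return w1_norm * c1 >= w2_norm * c2
    if loss == "nsl":          # normalized softmax: no margin
        return c1 >= c2
    if loss == "a-softmax":    # multiplicative angular margin
        return math.cos(angular_m * theta1) >= c2
    if loss == "lmcl":         # additive cosine margin
        return c1 - m >= c2
    raise ValueError(f"unknown loss: {loss}")

# A sample 40 degrees from W1 and 60 degrees from W2.
t1, t2 = math.radians(40), math.radians(60)
print(accepts_class1(t1, t2, "nsl"))   # True
print(accepts_class1(t1, t2, "lmcl"))  # False
```

The sample is on the correct side of the NSL boundary (cos 40° ≈ 0.766 ≥ 0.5) but still inside the LMCL margin region (0.766 − 0.35 < 0.5), so LMCL would keep pushing it closer to W1 during training.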

This review was created by AI and reviewed by human editors.