[Paper Review] CosFace: Large Margin Cosine Loss for Deep Face Recognition
CosFace introduces Large Margin Cosine Loss (LMCL) by normalizing features and weights and adding a cosine margin, achieving state-of-the-art results on LFW, YTF, and MegaFace benchmarks.
Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, several loss functions such as center loss, large margin softmax loss, and angular softmax loss have recently been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by $L_2$ normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, YouTube Faces (YTF) and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.
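The normalization step can be made concrete with a short example. The NumPy sketch below is illustrative only and is not the authors' code. The helper names `l2_normalize` and `cosine_logits` and the random data are hypothetical. It contrasts two kinds of logits:

- Ordinary softmax logits $W_j^T x_i$ scale with the feature norm.
- Cosine logits, computed after $L_2$-normalizing both features and weights, do not.

This norm dependence is the "radial variation" that the abstract says normalization removes.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    # Divide each vector by its L2 norm; eps guards against division by zero.
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)

def cosine_logits(x, W):
    """Cosines between features x (N, d) and class weights W (C, d).

    Both sides are L2-normalized, so each entry is cos(theta_{j,i}) in [-1, 1]
    and no longer depends on the feature or weight norms.
    """
    return l2_normalize(x) @ l2_normalize(W).T

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 hypothetical feature vectors, 8 dims
W = rng.normal(size=(3, 8))   # 3 hypothetical class weight vectors

plain = x @ W.T                         # ordinary softmax logits depend on ||x||
scaled_plain = (5.0 * x) @ W.T          # rescaling the features rescales the logits
cos = cosine_logits(x, W)
scaled_cos = cosine_logits(5.0 * x, W)  # radial rescaling has no effect

print(np.allclose(scaled_plain, 5.0 * plain))  # True
print(np.allclose(cos, scaled_cos))            # True
```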
Motivation & Objective
- Motivate the need for stronger discriminative features in face recognition beyond traditional softmax.
- Propose LMCL to maximize inter-class variance and minimize intra-class variance in cosine space.
- Show how normalizing both features and weights places the features on a hypersphere, where the cosine margin enforces a large angular separation between classes.
- Demonstrate state-of-the-art performance on standard benchmarks (LFW, YTF, MegaFace) using CosFace.
- Provide theoretical insights into the margin parameter and normalization effects on learning dynamics.
Proposed method
- Reformulate softmax as a cosine loss by L2-normalizing both features and class weight vectors.
- Introduce a cosine margin m to create a large-margin decision boundary in cosine space.
- Define the LMCL objective: $L_{lmc} = \frac{1}{N}\sum_{i} -\log \frac{e^{s(\cos(\theta_{y_i,i}) - m)}}{e^{s(\cos(\theta_{y_i,i}) - m)} + \sum_{j \neq y_i} e^{s\cos(\theta_{j,i})}}$, with $W$ and $x$ $L_2$-normalized and the logits scaled by $s$.
- Normalize both features and weights to place features on a hypersphere and use a fixed scaling parameter s (set to 64).
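- Compare the binary-case decision boundaries of four losses:
  - Softmax: $\|W_1\|\cos\theta_1 = \|W_2\|\cos\theta_2$
  - Normalized Softmax (NSL): $\cos\theta_1 = \cos\theta_2$
  - A-Softmax: $\cos(m\theta_1) = \cos\theta_2$
  - LMCL: $\cos\theta_1 - m = \cos\theta_2$

  Only LMCL gives a margin that is constant in cosine space. This makes it more robust than NSL, where any small perturbation near the boundary can flip the decision.
- Provide a theoretical analysis of the bounds on the margin m and discuss the geometric interpretation of LMCL on the hyperspherical feature space.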
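The LMCL objective above can be sketched in a few lines of NumPy. This is not the authors' implementation. It assumes a batch of features `x`, class weights `W`, and integer `labels`, all hypothetical names. The scale `s=64` follows the review. The margin `m=0.35` is an illustrative default assumed here and is not a value stated in this review. Setting `m=0` recovers NSL, which makes the effect of the margin easy to check.

```python
import numpy as np

def lmcl_loss(x, W, labels, s=64.0, m=0.35):
    """Sketch of L_lmc: x (N, d) features, W (C, d) class weights,
    labels (N,) integer class ids. m=0.35 is an assumed illustrative value."""
    # L2-normalize features and class weights so that x_i . W_j = cos(theta_{j,i}).
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = xn @ Wn.T                                   # shape (N, C)

    rows = np.arange(len(labels))
    logits = s * cos
    # Subtract the margin from the target-class cosine only.
    logits[rows, labels] = s * (cos[rows, labels] - m)

    # Numerically stable log-softmax over the classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()

# Usage on random, hypothetical data.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
W = rng.normal(size=(5, 16))
labels = rng.integers(0, 5, size=8)

nsl = lmcl_loss(x, W, labels, m=0.0)    # m = 0 reduces LMCL to NSL
lmcl = lmcl_loss(x, W, labels, m=0.35)
print(nsl, lmcl)  # lmcl is larger: the margin makes the target harder to satisfy
```

The margin is subtracted only from the target-class logit. Any $m > 0$ therefore lowers the target's log-probability, so the loss on the same batch is strictly higher than with NSL. Training compensates by pushing $\cos\theta_{y_i,i}$ further above the other cosines.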

Experimental results
Research questions
- RQ1: Does LMCL improve discrimination by enforcing a fixed cosine margin in the angular space, compared to previously proposed angular or Euclidean margins?
- RQ2: What is the effect of jointly normalizing features and weights on training dynamics and the resulting feature geometry on a hypersphere?
- RQ3: How does LMCL perform on standard face recognition benchmarks (LFW, YTF, MegaFace) relative to state-of-the-art loss functions?
- RQ4: What guidance can be provided for choosing the cosine margin parameter m and scaling s in practice?
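As a rough aid for RQ4, the sketch below evaluates two bounds in the form we understand the paper's analysis to give them. Both assume the normalized class weights are spread evenly, so that they sum to zero.

- A lower bound on the scale: $s \ge \frac{C-1}{C}\log\frac{(C-1)P_W}{1-P_W}$. Here $C$ is the number of classes and $P_W$ is the desired minimum posterior probability at a class centre.
- An upper bound on the margin: $m \le 1 - \max_{i \ne j} W_i^T W_j$.

The function names, the choice $P_W = 0.9$, and the toy weights are hypothetical. Check the formulas against the original paper before relying on them.

```python
import numpy as np

def min_scale(num_classes, p_w):
    """Assumed lower bound on s: (C - 1) / C * log((C - 1) * p_w / (1 - p_w)).

    Derived assuming evenly spread class weights; p_w is the target posterior
    probability at a class centre (a hypothetical choice below)."""
    C = num_classes
    return (C - 1) / C * np.log((C - 1) * p_w / (1 - p_w))

def max_margin(W):
    """Assumed upper bound on m: 1 - max_{i != j} W_i^T W_j for class weights W (C, d)."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = Wn @ Wn.T                  # pairwise cosines between class weights
    np.fill_diagonal(G, -np.inf)   # exclude the i == j terms
    return 1.0 - G.max()

# 10,575 identities (the size of CASIA-WebFace) with a hypothetical p_w = 0.9.
print(min_scale(10575, 0.9))  # about 11.46, well below s = 64

# Toy weights: 3 unit vectors 120 degrees apart in 2-D (a regular simplex).
angles = 2 * np.pi * np.arange(3) / 3
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(max_margin(W))  # about 1.5 = C / (C - 1) = 1 - cos(2*pi/3)
```

In the toy case both forms of the margin bound agree, $C/(C-1) = 1 - \cos(2\pi/3) = 1.5$, because three classes in two dimensions already form a regular simplex.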
Key findings
- LMCL yields competitive and often superior results across major benchmarks (LFW, YTF, MegaFace) compared to prior losses.
- With normalized features and a fixed scale s = 64, LMCL reaches 99.33% accuracy on LFW and 96.1% on YTF when trained on CASIA-WebFace.
- LMCL outperforms A-Softmax with feature normalization on YTF and MegaFace in the reported comparisons.
- CosFace results on LFW and YTF surpass many competing methods, including single-patch and ensemble configurations, demonstrating strong generalization.
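- Feature normalization significantly improves MegaFace performance, as shown by the reported MegaFace Challenge 1 (MF1) scores: Rank1 (rank-1 identification accuracy) and Veri (verification TAR at $10^{-6}$ FAR).
- CosFace achieves state-of-the-art accuracy on LFW (99.73%) and YTF (97.6%) in the experiments with the larger training set.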

This review was created by AI and reviewed by human editors.