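[Paper Digest] Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
This paper proposes the congenerous cosine (COCO) loss, which uses class centroids and cosine similarity to jointly optimize intra-class polymerization and inter-class discrimination, enabling stable end-to-end training for large-scale recognition.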
Feature matters. How to train a deep network to acquire discriminative features across categories and polymerized features within classes has always been at the core of many computer vision tasks, especially for large-scale recognition systems where test identities are unseen during training and the number of classes could be at million scale. In this paper, we address this problem based on the simple intuition that the cosine distance of features in high-dimensional space should be close enough within one class and far away across categories. To this end, we propose the congenerous cosine (COCO) algorithm to simultaneously optimize the cosine similarity among data. It inherits the softmax property to make inter-class features discriminative as well as shares the idea of class centroid in metric learning. Unlike previous work where the center is a temporal, statistical variable within one mini-batch during training, the formulated centroid is responsible for clustering inner-class features to enforce them polymerized around the network truncus. COCO is bundled with discriminative training and learned end-to-end with stable convergence. Experiments on five benchmarks have been extensively conducted to verify the effectiveness of our approach on both small-scale classification task and large-scale human recognition problem.
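Research Motivation and Goals
- Very large-scale recognition requires features that are discriminative across categories and polymerized (tightly clustered) within each class.
- Propose a new loss (COCO) that achieves both goals by optimizing the cosine similarity between features and class centroids.
- Ensure end-to-end trainability and stable convergence on benchmarks ranging from small to large scale.
Proposed Method
- Define the cosine similarity between a feature and each class centroid.
- Formulate the COCO loss as a cross-entropy over normalized, scaled features and centroids.
- Update the class centroids jointly with the network parameters during training (there is no separate center-loss term).
- Provide gradients with respect to both features and centroids so that the loss can be backpropagated in a standard CNN pipeline.
- Relate the scaling factor alpha theoretically to the network and the number of classes, and derive an optimal lower bound for it.
- Demonstrate advantages over triplet loss and center loss in stability and convergence.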
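The formulation in the bullets above can be illustrated with a small forward-pass sketch. It is not the authors' implementation: it assumes the feature is L2-normalized and multiplied by alpha, each centroid is L2-normalized, and the loss is the softmax cross-entropy over their dot products. The names `l2_normalize` and `coco_loss` are hypothetical, and the gradient computation and centroid updates mentioned above are left out.

```python
import math


def l2_normalize(vec, scale=1.0):
    """Rescale vec so that its L2 norm equals `scale`."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [scale * v / norm for v in vec]


def coco_loss(feature, centroids, label, alpha):
    """Cross-entropy over alpha-scaled cosine similarities (illustrative sketch).

    feature   -- list of floats, one sample's raw feature vector
    centroids -- list of per-class centroid vectors (assumed learnable elsewhere)
    label     -- index of the ground-truth class
    alpha     -- scaling factor applied to the normalized feature
    """
    f_hat = l2_normalize(feature, scale=alpha)
    # Each logit is alpha * cos(feature, centroid_k).
    logits = [
        sum(c * f for c, f in zip(l2_normalize(centroid), f_hat))
        for centroid in centroids
    ]
    # Numerically stable log-sum-exp for the softmax normalizer.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


# Usage: a feature aligned with centroid 0 is cheap for label 0, costly for label 1.
toy_centroids = [[1.0, 0.0], [0.0, 1.0]]
print(coco_loss([1.0, 0.0], toy_centroids, label=0, alpha=4.0))
print(coco_loss([1.0, 0.0], toy_centroids, label=1, alpha=4.0))
```

Because both vectors are normalized, the loss in this sketch depends only on the angle between the feature and the centroids, not on the feature's magnitude; alpha alone controls how sharp the softmax is.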
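Experimental Results
Research Questions
- RQ1: Can a cosine-based, centroid-guided objective achieve both tight intra-class clustering and large inter-class margins with an extremely large number of classes?
- RQ2: Can COCO be trained stably end to end and scale better than existing metric-learning losses (such as triplet loss and center loss) on large-scale recognition tasks?
Key Findings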
| Method | MNIST error rate (%) | CIFAR-10 error rate (%) |
|---|---|---|
| Softmax | 0.36 | 6.70 |
| Center loss + softmax | 0.32 | 6.66 |
| Triplet loss | 1.45 | 12.69 |
| Triplet loss + softmax | 0.38 | 6.73 |
| COCO | 0.30 | 6.25 |
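- On MNIST and CIFAR-10, without data augmentation, COCO achieves the lowest error rates among the compared methods (0.30% and 6.25%, respectively).
- On large-scale face recognition benchmarks, COCO achieves state-of-the-art or competitive results on verification and identification tasks (e.g., LFW and MegaFace).
- In cosine-distance visualizations, COCO shows clearer intra-class polymerization and larger inter-class separation than softmax and triplet loss.
- An optimal scaling factor alpha can be determined, and the derived lower bound guides practical settings (alpha ≈ 0.5 log(K-1) + 3, where K is the number of classes).
- The method converges stably and avoids the training instability sometimes observed with triplet loss when the number of classes is large.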
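As a quick way to see how the suggested scaling factor grows with the number of classes, the rule of thumb quoted above can be evaluated directly. This is a minimal sketch: the helper name `suggested_alpha` is hypothetical, and the logarithm is assumed to be the natural logarithm, which the digest does not state.

```python
import math


def suggested_alpha(num_classes, offset=3.0):
    """Evaluate alpha = 0.5 * log(K - 1) + offset for K = num_classes.

    The natural logarithm is an assumption; the digest only writes "log".
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return 0.5 * math.log(num_classes - 1) + offset


# Usage: compare a 10-class task with a million-class task.
for k in (10, 1_000_000):
    print(k, round(suggested_alpha(k), 2))
```

Under these assumptions the value is about 4.10 for 10 classes and grows only logarithmically, so even a million-class problem stays below 10.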
This digest was generated by AI and reviewed by a human editor.