QUICK REVIEW

[Paper Review] Learning Deep Features via Congenerous Cosine Loss for Person Recognition

Yu Liu, Hongyang Li|arXiv (Cornell University)|Feb 22, 2017

Video Surveillance and Tracking Methods | References: 17 | Cited by: 56

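One-sentence summary

This paper proposes the Congenerous Cosine (COCO) loss, which trains region-based deep features by maximizing intra-class similarity and inter-class variation through a centroid-based softmax, so that a single training stage suffices and no fine-tuning is needed at test time.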

ABSTRACT

Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. The intuition is that we directly compare and optimize the cosine distance between two features - enlarging inter-class distinction as well as alleviating inner-class variance. We propose a congenerous cosine loss by minimizing the cosine distance between samples and their cluster centroid in a cooperative way. Such a design reduces the complexity and could be implemented via softmax with normalized inputs. Our method also differs from previous work in person recognition that we do not conduct a second training on the test subset. The identity of a person is determined by measuring the similarity from several body regions in the reference set. Experimental results show that the proposed approach achieves better classification accuracy against previous state-of-the-arts.

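Motivation and objectives

Enable robust person recognition across time and space in unconstrained scenes.
Develop a loss that directly optimizes cosine similarity, reducing intra-class variance and enlarging inter-class separation.
Enable a multi-region, alignment-based recognition pipeline that avoids a second round of training on test data.
Show that single-stage training with the COCO loss achieves competitive or superior results on PIPA.
Provide an analysis of region contributions and of alignment as a way to reduce overfitting.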

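Proposed method

Define the cosine similarity between a sample and its class centroid within a mini-batch.
Introduce the COCO loss, which applies a softmax over normalized features and centroids to optimize intra-class compactness and inter-class separability.
Align four region patches (face, head, upper body, full body) to a base position via an affine transformation to reduce variance.
Train four region-specific COCO models on the PIPA training set and combine the region scores at inference.
Merge the region similarities through logistic-regression normalization and weighted averaging to predict identities on test_1 without retraining on test_0.
Detail the backpropagation of the COCO loss through the normalized features and centroids, enabling end-to-end training.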
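
The centroid-based cosine softmax described above can be illustrated with a minimal NumPy sketch of the forward pass. It is not the authors' implementation. The function name `coco_loss`, the `scale` factor applied to the cosine logits, and the choice to average the loss over the batch are assumptions made for illustration; the summary only states that the loss is a softmax over normalized features and mini-batch centroids.

```python
import numpy as np

def coco_loss(features, labels, scale=10.0, eps=1e-12):
    """Illustrative centroid-based cosine softmax loss for one mini-batch.

    `scale` is an assumed hyperparameter that sharpens the softmax;
    it is not taken from the summarized text.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalize features so dot products equal cosine similarities.
    f_hat = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)

    # Map each label to an index among the classes present in this batch.
    classes, class_index = np.unique(labels, return_inverse=True)
    class_index = class_index.reshape(-1)

    # Centroid of each class = mean of its normalized features, re-normalized.
    centroids = np.stack(
        [f_hat[class_index == k].mean(axis=0) for k in range(len(classes))]
    )
    c_hat = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)

    # Scaled cosine similarity of every sample to every centroid = logits.
    logits = scale * (f_hat @ c_hat.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of each sample's own centroid, batch-averaged.
    return -log_probs[np.arange(len(class_index)), class_index].mean()


# Toy batch: two tight clusters in 2-D, first with matching labels,
# then with labels that mix the clusters.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_separated = coco_loss(toy_features, [0, 0, 1, 1])
loss_mixed = coco_loss(toy_features, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the labels follow the clusters than when each class straddles both clusters, which is the behavior the loss is meant to reward. In a real training setup the same computation would be written in an autodiff framework so that gradients flow through both the normalization and the centroids.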

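Experimental results

Research questions

RQ1: Does a centroid-based cosine loss improve intra-class compactness and inter-class separation in unconstrained images, and thereby improve person recognition?
RQ2: Does aligning region patches to a base position reduce intra-class variance and improve recognition accuracy?
RQ3: Under single-stage training, how does pooling cues from multiple body regions affect recognition performance on PIPA?
RQ4: Is it feasible to recognize persons across test splits by comparing features directly, without retraining on test_0?
RQ5: How does the COCO loss affect feature visualization and discrimination compared with a standard softmax?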

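Main findings

The COCO loss enlarges inter-class distances in feature space while reducing intra-class variance.
Aligning the region patches markedly improves performance in every region.
The face and head regions provide the strongest signal; the contribution of the full-body region depends on the degree of alignment.
Combining all four regions yields the best accuracy on the original split (and improves the other splits) compared with any single-region cue.
On the original, album, time, and day splits of the PIPA dataset, the method surpasses previous state-of-the-art approaches.
Region-based scores can be merged through normalization and weighted averaging to produce the final identity without retraining at test time.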
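
The score fusion mentioned in the last finding can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. The function name `fuse_region_scores`, the per-region logistic parameters `(a, b)`, the region weights, and the similarity values are all hypothetical; in practice the logistic parameters would be fitted by logistic regression and the weights chosen on held-out data.

```python
import numpy as np

def fuse_region_scores(region_scores, logistic_params, weights):
    """Fuse per-region similarity scores into one score per reference identity.

    region_scores:   dict region -> similarities of one query to each identity
    logistic_params: dict region -> (a, b) of an assumed fitted logistic map
    weights:         dict region -> non-negative fusion weight
    """
    total_weight = sum(weights[region] for region in region_scores)
    fused = 0.0
    for region, scores in region_scores.items():
        a, b = logistic_params[region]
        s = np.asarray(scores, dtype=np.float64)
        # Logistic normalization puts every region's scores on a common (0, 1) scale.
        normalized = 1.0 / (1.0 + np.exp(-(a * s + b)))
        fused = fused + weights[region] * normalized
    fused = fused / total_weight  # weighted average across regions
    # The predicted identity is the reference with the highest fused score.
    return fused, int(np.argmax(fused))


# Hypothetical similarities of one query against two reference identities.
example_scores = {"face": [0.9, 0.2], "upper_body": [0.4, 0.5]}
example_params = {"face": (5.0, -2.5), "upper_body": (5.0, -2.5)}
example_weights = {"face": 0.7, "upper_body": 0.3}
fused_scores, predicted = fuse_region_scores(
    example_scores, example_params, example_weights
)
```

Because fusion operates only on similarity scores between query and reference features, nothing has to be retrained when the reference set changes, which matches the claim that no second training on a test split is required.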


This review was generated by AI and checked by a human editor.