QUICK REVIEW

[Paper Review] Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding

José Lezama, Qiang Qiu|arXiv (Cornell University)|Nov 21, 2016

Face recognition and analysis|References: 38|Cited by: 19

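One-sentence summary

The paper proposes a method that lets a deep face recognition model pretrained on visible-spectrum (VIS) images handle near-infrared (NIR) images effectively without fine-tuning. It combines cross-spectral hallucination, in which a convolutional neural network (CNN) generates synthetic VIS faces from NIR input, with a low-rank embedding that aligns features across the two spectra, and reaches a state-of-the-art (SOTA) rank-1 accuracy of 96.41% on the CASIA NIR-VIS v2.0 dataset.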

ABSTRACT

Surveillance cameras today often capture NIR (near infrared) images in low-light environments. However, most face datasets accessible for training and verification are only collected in the VIS (visible light) spectrum. It remains a challenging problem to match NIR to VIS face images due to the different light spectrum. Recently, breakthroughs have been made for VIS face recognition by applying deep learning on a huge amount of labeled VIS face samples. The same deep learning approach cannot be simply applied to NIR face recognition for two main reasons: First, much limited NIR face images are available for training compared to the VIS spectrum. Second, face galleries to be matched are mostly available only in the VIS spectrum. In this paper, we propose an approach to extend the deep learning breakthrough for VIS face recognition to the NIR spectrum, without retraining the underlying deep models that see only VIS faces. Our approach consists of two core components, cross-spectral hallucination and low-rank embedding, to optimize respectively input and output of a VIS deep model for cross-spectral face recognition. Cross-spectral hallucination produces VIS faces from NIR images through a deep learning approach. Low-rank embedding restores a low-rank structure for faces deep features across both NIR and VIS spectrum. We observe that it is often equally effective to perform hallucination to input NIR images or low-rank embedding to output deep features for a VIS deep model for cross-spectral recognition. When hallucination and low-rank embedding are deployed together, we observe significant further improvement; we obtain state-of-the-art accuracy on the CASIA NIR-VIS v2.0 benchmark, without the need at all to re-train the recognition system.

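Research motivation and objectives

Address the challenge of matching NIR face images against VIS face galleries when NIR training data is limited.
Allow state-of-the-art VIS face recognition models to generalize to the NIR domain without fine-tuning or retraining.
Overcome the spectral domain shift between VIS and NIR by modifying the input and output of a pretrained VIS deep neural network (DNN).
Develop a transfer learning framework that extends the model to cross-spectral recognition while preserving its performance.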

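Proposed method

Apply a patch-based CNN to hallucinate a high-resolution visible-spectrum face from the input NIR image, preserving facial detail.
Blend the hallucinated luminance channel with the original NIR image using a learned blending parameter (α ≈ 0.6–0.7) to reduce artifacts.
Use a pretrained VIS DNN (e.g., VGG-S, VGG-face, COTS) as a fixed feature extractor on the hallucinated VIS input.
Apply a low-rank transform to the DNN's deep features (penultimate layer) to enforce a shared low-dimensional subspace across the NIR and VIS spectra.
Learn a 1024×1024 low-rank embedding matrix that aligns features of the same person while separating features of different people.
Match gallery (VIS) features against probe features (NIR, after hallucination and embedding) using cosine similarity.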
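
The blending, embedding, and matching steps listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the direction of the blend (α weighting the hallucinated luminance) is assumed because the summary does not state it, an identity matrix stands in for the learned low-rank embedding, and random toy vectors stand in for DNN features.

```python
import numpy as np


def blend_luminance(hallucinated_y, nir, alpha=0.65):
    """Convex blend of two same-sized images.

    Assumption: alpha weights the hallucinated luminance and (1 - alpha)
    weights the NIR image; the summary does not state the direction.
    """
    hallucinated_y = np.asarray(hallucinated_y, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    if hallucinated_y.shape != nir.shape:
        raise ValueError("images must have the same shape")
    return alpha * hallucinated_y + (1.0 - alpha) * nir


def embed(features, transform):
    """Apply a linear transform to row-wise features, then L2-normalise rows."""
    z = np.asarray(features, dtype=np.float64) @ transform.T
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1e-12)


def cosine_match(probe, gallery):
    """Return, per probe row, the index of the most similar gallery row."""
    sims = probe @ gallery.T  # rows are unit length, so this is cosine similarity
    return sims.argmax(axis=1), sims


# Toy data standing in for DNN features (hypothetical values).
rng = np.random.default_rng(0)
dim = 8                      # toy size; the summary cites a 1024x1024 matrix
transform = np.eye(dim)      # placeholder for the learned low-rank embedding
gallery_feats = rng.normal(size=(3, dim))  # stand-in for VIS gallery features
# Stand-in for NIR probes: noisy copies of gallery identities 2 and 0.
probe_feats = gallery_feats[[2, 0]] + 0.01 * rng.normal(size=(2, dim))

best, sims = cosine_match(embed(probe_feats, transform),
                          embed(gallery_feats, transform))
print("best gallery index per probe:", best)
```

The recognition network itself is deliberately absent from the sketch: the approach leaves it fixed and only touches its input (the blend) and its output (the embedding).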

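Experimental results

Research questions

RQ1: Can a pretrained VIS face recognition model be adapted effectively to NIR face recognition without retraining?
RQ2: Does cross-spectral hallucination of NIR images into the VIS domain significantly improve recognition performance?
RQ3: Does a low-rank embedding of DNN features across the NIR and VIS spectra improve cross-spectral feature alignment?
RQ4: How does combining hallucination and low-rank embedding affect recognition accuracy?
RQ5: How robust is the method to changes in the hallucination blending parameter (α) and in the choice of model?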

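Key findings

The proposed method achieves 96.41% rank-1 accuracy on the CASIA NIR-VIS v2.0 benchmark, setting a new SOTA.
Cross-spectral hallucination alone raises the rank-1 accuracy of VGG-S from 75.04% to 95.72%, a substantial gain.
Low-rank embedding alone raises the accuracy of VGG-S from 57.53% to 82.07%, demonstrating its effectiveness for feature-space alignment.
Combining hallucination and low-rank embedding gives the largest improvement, reaching 96.41% rank-1 accuracy with COTS.
The best blending parameter for hallucinated image reconstruction is α ≈ 0.6–0.7, which balances detail preservation against artifact reduction.
The method is model-agnostic: it performs well across different pretrained VIS DNNs (VGG-S, VGG-face, COTS), confirming its generalization ability.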
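
Rank-1 accuracy, the metric quoted in these findings, is the fraction of probe images whose highest-scoring gallery entry carries the correct identity. The sketch below shows that computation on a hand-made similarity matrix; the function name, identity labels, and scores are hypothetical, and the benchmark's official evaluation protocol is not reproduced here.

```python
import numpy as np


def rank1_accuracy(sims, probe_ids, gallery_ids):
    """Fraction of probes whose top-scoring gallery entry has the same identity.

    sims[i, j] is the similarity between probe i and gallery entry j.
    """
    sims = np.asarray(sims, dtype=np.float64)
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    if sims.shape != (probe_ids.size, gallery_ids.size):
        raise ValueError("sims must have shape (num_probes, num_gallery)")
    best = sims.argmax(axis=1)  # top-1 gallery index for each probe
    return float(np.mean(gallery_ids[best] == probe_ids))


# Toy example with made-up scores: three probes, three gallery identities.
toy_sims = [
    [0.9, 0.1, 0.2],  # probe of identity 10, top match is gallery 10
    [0.3, 0.2, 0.8],  # probe of identity 12, top match is gallery 12
    [0.4, 0.5, 0.1],  # probe of identity 10, top match is gallery 11 (a miss)
]
toy_accuracy = rank1_accuracy(toy_sims, probe_ids=[10, 12, 10],
                              gallery_ids=[10, 11, 12])
print(f"rank-1 accuracy: {toy_accuracy:.4f}")
```

Two of the three toy probes retrieve the right identity, so the function returns 2/3 for this input.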

This review was generated by AI and reviewed by a human editor.