QUICK REVIEW

[Paper Review] Occlusion Robust Face Recognition Based on Mask Learning with Pairwise Differential Siamese Network

Lingxue Song, Dihong Gong|arXiv (Cornell University)|Aug 17, 2019

Face recognition and analysis | References: 32 | Cited by: 46

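ONE-SENTENCE SUMMARY

The paper proposes a Pairwise Differential Siamese Network (PDSN) that learns which deep CNN feature elements are corrupted by occlusion, builds a dictionary of Feature Discarding Masks (FDMs), and applies those masks to make face recognition robust to occlusion.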

ABSTRACT

Deep Convolutional Neural Networks (CNNs) have been pushing the frontier of the face recognition research in the past years. However, existing general CNN face models generalize poorly to the scenario of occlusions on variable facial areas. Inspired by the fact that a human visual system explicitly ignores occlusions and only focuses on non-occluded facial areas, we propose a mask learning strategy to find and discard the corrupted feature elements for face recognition. A mask dictionary is firstly established by exploiting the differences between the top convoluted features of occluded and occlusion-free face pairs using an innovatively designed Pairwise Differential Siamese Network (PDSN). Each item of this dictionary captures the correspondence between occluded facial areas and corrupted feature elements, which is named Feature Discarding Mask (FDM). When dealing with a face image with random partial occlusions, we generate its FDM by combining relevant dictionary items and then multiply it with the original features to eliminate those corrupted feature elements. Comprehensive experiments on both synthesized and realistic occluded face datasets show that the proposed approach significantly outperforms the state-of-the-arts.

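MOTIVATION AND OBJECTIVES

Improve the robustness of face recognition under random partial occlusions, where occluded regions corrupt the deep features.
Use a Pairwise Differential Siamese Network to learn the correspondence between occluded face blocks and corrupted CNN feature elements.
Build a dictionary of Feature Discarding Masks (FDMs) that is used at test time to remove the corrupted features.
Demonstrate occlusion robustness and generalization on both synthesized and real-world occluded face datasets.
Show that masking corrupted features does not degrade performance on non-occluded faces while improving performance under occlusion.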

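PROPOSED METHOD

Divide the aligned face into N x N blocks to localize the effect of occlusion.
Use a trunk CNN (built on ArcFace) together with a mask generator M_theta that produces an element-wise mask for the top convolutional features: f̃ = M_theta(·) ⊙ f, where ⊙ is the element-wise product.
Train M_theta with a joint loss: L_theta = sum_i [ L_cls( f̃(x_j^i), y^i ) + lambda L_diff( f̃(x_j^i), f̃(x^i) ) ], where x^i is a clean face, x_j^i is the same face with block j occluded, y^i is its identity label, and L_diff is a pairwise difference loss on the masked features.
L_diff = || M_theta(·) ⊙ f(x^i) − M_theta(·) ⊙ f(x_j^i) ||_1, which encourages the masked features of an occluded face to match those of its non-occluded counterpart.
Staged training: 1) train the trunk CNN on CASIA-WebFace; 2) train block-specific mask generators on occluded/clean face pairs; 3) build the mask dictionary by binarizing the masks averaged over a large number of occluded samples.
Building the mask dictionary: for each block j, compute the mean mask m̄_j over roughly 200k pairs, then binarize it into M_j by setting the τ*K smallest values to 0 (discarded) and the rest to 1, where K = C×W×H.
Occlusion at test time: derive the probe's FDM M by combining the masks M_j of the occluded blocks with a logical AND, then apply M to the top convolutional features before comparison.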
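
The dictionary-building and test-time steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: features and masks are flattened to lists of length K, the function names (`binarize_mean_mask`, `combine_fdm`, `apply_mask`, `masked_l1_diff`) are hypothetical, and the toy values for the mean masks and for τ are made up for the example.

```python
from typing import Dict, List, Sequence


def binarize_mean_mask(mean_mask: Sequence[float], tau: float) -> List[int]:
    """Set the tau*K smallest entries of a flattened mean mask to 0, the rest to 1."""
    k = len(mean_mask)
    n_drop = int(tau * k)  # number of elements treated as corrupted
    # Indices ordered by mask value, smallest first (ties keep index order).
    order = sorted(range(k), key=lambda idx: mean_mask[idx])
    binary = [1] * k
    for idx in order[:n_drop]:
        binary[idx] = 0
    return binary


def combine_fdm(dictionary: Dict[int, List[int]],
                occluded_blocks: Sequence[int]) -> List[int]:
    """Logical AND of the dictionary masks M_j of all occluded blocks."""
    k = len(next(iter(dictionary.values())))
    fdm = [1] * k  # with no occluded block, every element is kept
    for block in occluded_blocks:
        fdm = [a & b for a, b in zip(fdm, dictionary[block])]
    return fdm


def apply_mask(features: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Element-wise product of features and a binary mask."""
    return [f * m for f, m in zip(features, mask)]


def masked_l1_diff(clean: Sequence[float], occluded: Sequence[float],
                   mask: Sequence[float]) -> float:
    """L1 distance between masked clean and masked occluded features (L_diff)."""
    return sum(abs(m * c - m * o) for c, o, m in zip(clean, occluded, mask))


# Toy dictionary with two blocks and K = 4 (all values are illustrative).
mean_masks = {0: [0.9, 0.1, 0.8, 0.7], 1: [0.2, 0.9, 0.6, 0.95]}
dictionary = {j: binarize_mean_mask(m, tau=0.25) for j, m in mean_masks.items()}

# A probe with blocks 0 and 1 occluded: AND the two masks, then mask the features.
fdm = combine_fdm(dictionary, occluded_blocks=[0, 1])
print(fdm)                                     # [0, 0, 1, 1]
print(apply_mask([0.5, 1.2, 2.0, 0.3], fdm))   # [0.0, 0.0, 2.0, 0.3]
```

In this sketch, an element is discarded as soon as any occluded block marks it as corrupted, which is what the AND combination implies. In a real pipeline the features would be C×W×H tensors and the same mask would presumably be applied to both sides of a comparison.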

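EXPERIMENTAL RESULTS

Research Questions

RQ1: When a facial block is occluded, how can the corrupted CNN feature elements be identified?
RQ2: Can a mask generator trained on pairs of occluded and clean faces reveal a correspondence between occluded regions and corrupted feature elements that is consistent across identities?
RQ3: Does discarding corrupted features improve recognition under partial occlusion without harming performance on non-occluded faces?
RQ4: Can a fixed mask dictionary generalize to arbitrary occlusions at test time, without paired occlusion data?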

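Key Findings

The method is strongly robust on both synthesized and real occluded faces, with a peak rank-1 accuracy of 98.26% on the AR dataset for some occlusion types after mask binarization.
On the occluded probe set of MegaFace Challenge 1 (MF1occ), the method raises occluded face identification performance to 56.34%, up from a lower baseline.
The method maintains or improves verification performance on LFW and, compared with the trunk CNN baseline, is more robust to real-life occlusions in the AR dataset, such as sunglasses and scarves.
The differential supervision strategy (pairwise loss plus classification loss) yields more stable and more interpretable mean masks than classification loss alone, which improves occlusion localization in the feature maps.
Binary masking is preferable to soft weighting: it gives better performance and is more efficient in computation and storage.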
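
To make the storage point about binary masks concrete, the sketch below packs a binary mask into one bit per element. It is only an illustration under assumptions: the helper names `pack_mask` and `unpack_mask` and the size K = 1000 are hypothetical, and the paper does not state how its masks are stored.

```python
from typing import List, Sequence


def pack_mask(mask: Sequence[int]) -> bytes:
    """Pack a 0/1 mask into bytes, one bit per element, first element first."""
    value = 0
    for bit in mask:
        value = (value << 1) | (bit & 1)
    return value.to_bytes((len(mask) + 7) // 8, "big")


def unpack_mask(packed: bytes, length: int) -> List[int]:
    """Recover the original 0/1 list from its packed form."""
    value = int.from_bytes(packed, "big")
    return [(value >> (length - 1 - i)) & 1 for i in range(length)]


K = 1000                                   # hypothetical number of feature elements
mask = [i % 3 != 0 for i in range(K)]      # arbitrary binary pattern for the demo
mask = [int(b) for b in mask]

packed = pack_mask(mask)
print(len(packed))                         # 125 bytes, versus 4 * K = 4000 for float32
assert unpack_mask(packed, K) == mask      # the packing is lossless
```

A soft mask would need one floating-point weight per element, whereas a packed binary mask needs one bit, and applying it reduces to keeping or zeroing each element.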

This review was generated by AI and reviewed by a human editor.