QUICK REVIEW

[Paper Review] Multicolumn Networks for Face Recognition

Weidi Xie, Andrew Zisserman|arXiv (Cornell University)|Jul 24, 2018

Face recognition and analysis|References: 17|Cited by: 79

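One-Sentence Summary

The paper proposes the Multicolumn Network, which weights each image in a set by its visual quality and then recalibrates those weights by content relevance within the set, outperforming previous methods on the IJB benchmarks.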

ABSTRACT

The objective of this work is set-based face recognition, i.e. to decide if two sets of images of a face are of the same person or not. Conventionally, the set-wise feature descriptor is computed as an average of the descriptors from individual face images within the set. In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification). To this end, we propose a Multicolumn Network (MN) that takes a set of images (the number in the set can vary) as input, and learns to compute a fix-sized feature descriptor for the entire set. To encourage high-quality representations, each individual input image is first weighted by its "visual" quality, determined by a self-quality assessment module, and followed by a dynamic recalibration based on "content" qualities relative to the other images within the set. Both of these qualities are learnt implicitly during training for set-wise classification. Comparing with the previous state-of-the-art architectures trained with the same dataset (VGGFace2), our Multicolumn Networks show an improvement of between 2-6% on the IARPA IJB face recognition benchmarks, and exceed the state of the art for all methods on these benchmarks.

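Motivation and Objectives

Address set-based face verification by learning a quality-aware aggregation instead of simple average pooling.
Introduce a visual quality control module that down-weights low-quality images.
Introduce a content quality control module that re-weights images by their relative discriminative importance within the set.
Show that the proposed MN architecture improves IJB-A/B/C verification performance on a backbone trained with VGGFace2.
Demonstrate that MN delivers consistent gains while adding only minimal parameter overhead to ResNet50.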

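Proposed Method

Embed each image with a shared ResNet50 backbone to obtain a per-image descriptor.
Compute a self-assessed visual quality weight for each image with a sigmoid-activated fully connected layer.
Compute a content quality weight by relating each image to the set's mean face and passing the result through a second sigmoid-activated fully connected layer.
Combine the visual and content weights and form the set descriptor as a weighted average of the image descriptors.
Pre-train on VGGFace2 with single-image classification, then fine-tune end to end with set-level classification.
Evaluate on the IJB-A/B/C benchmarks using the cosine similarity between set descriptors.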
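
The aggregation steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`aggregate_set`, `cosine_similarity`), the use of a quality-weighted mean as the "mean face", and the choice to concatenate each descriptor with that mean before the second layer are all assumptions made for the example. The descriptors are taken as given (in the paper they come from the shared ResNet50 backbone).

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def aggregate_set(descriptors, w_visual, b_visual, w_content, b_content):
    """Aggregate an (N, D) array of image descriptors into one (D,) set descriptor.

    w_visual has shape (D,) and w_content has shape (2 * D,); both stand in for
    the two sigmoid-activated fully connected layers (hypothetical parameter names).
    """
    # Visual quality: one scalar in (0, 1) per image, computed from the image alone.
    alpha = sigmoid(descriptors @ w_visual + b_visual)              # (N,)

    # Mean face of the set, weighted by visual quality (assumed form).
    mean_face = (alpha[:, None] * descriptors).sum(axis=0) / alpha.sum()

    # Content quality: relate each image to the mean face. Concatenation is an
    # assumption; the review only says each image is "related" to the mean.
    tiled_mean = np.broadcast_to(mean_face, descriptors.shape)      # (N, D)
    paired = np.concatenate([tiled_mean, descriptors], axis=1)      # (N, 2D)
    beta = sigmoid(paired @ w_content + b_content)                  # (N,)

    # Combine both weights and take the weighted average of the descriptors.
    weights = alpha * beta
    return (weights[:, None] * descriptors).sum(axis=0) / weights.sum()


def cosine_similarity(a, b):
    """Cosine similarity between two set descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16                                    # toy dimension for the demo
    w_v, w_c = rng.normal(size=dim), rng.normal(size=2 * dim)
    set_a = rng.normal(size=(3, dim))           # sets may differ in size
    set_b = rng.normal(size=(7, dim))
    desc_a = aggregate_set(set_a, w_v, 0.0, w_c, 0.0)
    desc_b = aggregate_set(set_b, w_v, 0.0, w_c, 0.0)
    print(desc_a.shape, desc_b.shape, cosine_similarity(desc_a, desc_b))
```

With all weights and biases set to zero, every image gets the same weight and the function reduces to plain average pooling, which is the baseline the method is meant to improve on. If the descriptor is 2048-dimensional (an assumption about the ResNet50 output here), the two layers in this sketch hold 3 x 2048 + 2 = 6146 parameters, which is in line with the roughly 6K overhead reported in the findings.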

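Experimental Results

Research Questions

RQ1: Can the set-level face descriptor be improved by making each image's contribution depend on both its absolute image quality and its content quality relative to the set?
RQ2: On unconstrained face benchmarks, does combining visual and content quality control outperform simple average pooling and prior attention-based aggregation?
RQ3: With MN, how much do visual-only quality control and visual plus content quality control each improve performance on the IJB-A/B/C benchmarks?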

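Key Findings

With the same backbone, MN-v (visual quality only) surpasses the previous state of the art on the IJB benchmarks.
Adding content quality control (MN-vc) yields further gains on IJB-B and IJB-C.
Compared with the ResNet50 baseline, MN adds about 6K parameters and achieves roughly 2-6% absolute improvement on IJB-B and IJB-C.
On IJB-B, the TAR of MN-v/MN-vc at FAR = 1e-5…1e-1 is 0.683/0.708, 0.818/0.831, 0.902/0.909, 0.955/0.958, 0.984/0.985.
On IJB-C, the TAR of MN-v/MN-vc at the same FARs is 0.755/0.771, 0.852/0.862, 0.920/0.927, 0.965/0.968, 0.988/0.989.
The gains are largest at very low FAR (1e-5 to 1e-3), which the review attributes to better suppression of outlier images and stronger emphasis on discriminative views.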
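
For readers unfamiliar with the metric, the TAR-at-FAR figures above can be read through the following sketch. It uses the generic definition (true accept rate on same-identity pairs at the threshold that admits at most a given fraction of different-identity pairs) and is not the official IJB evaluation code; the function name `tar_at_far` and the strict-threshold convention are assumptions made for the example.

```python
import numpy as np


def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at a threshold whose false accept rate is at most `far`.

    genuine_scores: similarity scores of same-identity pairs.
    impostor_scores: similarity scores of different-identity pairs.
    A pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending

    # Largest number of impostor pairs we are allowed to accept.
    k = int(np.floor(far * impostor.size))

    # Accepting only scores strictly above the (k+1)-th highest impostor score
    # lets in at most k impostor pairs. If every impostor may be accepted,
    # the threshold drops to minus infinity.
    threshold = impostor[k] if k < impostor.size else -np.inf

    return float(np.mean(genuine > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy cosine-like scores: genuine pairs score higher on average.
    genuine = rng.normal(loc=0.6, scale=0.15, size=1_000)
    impostor = rng.normal(loc=0.1, scale=0.15, size=100_000)
    for far in (1e-3, 1e-2, 1e-1):
        print(far, tar_at_far(genuine, impostor, far))
```

Lower FAR targets push the threshold higher, so a few badly aggregated set descriptors can cost many true accepts. That is why aggregation quality matters most in the low-FAR columns.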


This review was generated by AI and checked by human editors.