QUICK REVIEW

[Paper Review] Multi-Level Factorisation Net for Person Re-Identification

Xiaobin Chang, Timothy M. Hospedales|arXiv (Cornell University)|Mar 24, 2018

Video Surveillance and Tracking Methods|References 11|Cited by 74

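One-Sentence Summary

MLFN automatically discovers latent discriminative factors and dynamically selects among them at multiple semantic levels, fusing a compact Factor Signature with deep features to achieve state-of-the-art person re-identification results.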

ABSTRACT

Key to effective person re-identification (Re-ID) is modelling discriminative and view-invariant factors of person appearance at both high and low semantic levels. Recently developed deep Re-ID models either learn a holistic single semantic level feature representation and/or require laborious human annotation of these factors as attributes. We propose Multi-Level Factorisation Net (MLFN), a novel network architecture that factorises the visual appearance of a person into latent discriminative factors at multiple semantic levels without manual annotation. MLFN is composed of multiple stacked blocks. Each block contains multiple factor modules to model latent factors at a specific level, and factor selection modules that dynamically select the factor modules to interpret the content of each input image. The outputs of the factor selection modules also provide a compact latent factor descriptor that is complementary to the conventional deeply learned features. MLFN achieves state-of-the-art results on three Re-ID datasets, as well as compelling results on the general object categorisation CIFAR-100 dataset.

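Research Motivation and Goals

Model discriminative, view-invariant factors of person appearance across multiple semantic levels for Re-ID.
Propose a deep architecture that discovers latent factors without manual attribute annotation.
Obtain a compact multi-level factor representation and fuse it with conventional deep features to improve recognition.
Provide deep supervision of the learned factors through shortcut connections.
Demonstrate state-of-the-art performance on the major Re-ID benchmarks and show applicability to CIFAR-100.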

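Proposed Method

Introduce the Multi-Level Factorisation Net (MLFN), built from stacked blocks; each block contains multiple Factor Modules (FMs) and one Factor Selection Module (FSM).
The FSMs dynamically activate a subset of the FMs to model latent factors at a specific semantic level.
Concatenate the FSM outputs from all blocks to produce the Factor Signature (FS), which represents the multi-level factors.
Fuse the last block's features with the FS by projecting both into a shared space, forming the final representation R.
Train end to end with an identity classification loss; use skip connections and FS-based deep supervision to make the factors more discriminative.
Interpret MLFN as a generalisation of ResNeXt and Mixture-of-Experts, with dynamic factor selection and a compact semantic descriptor.
Optionally use the FS alone for attribute-like matching, which exposes correlations with latent attributes.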
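
The block structure described above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the FMs are stand-in linear maps rather than convolutional sub-networks, the sigmoid gating, the identity shortcut inside each block, and the averaging in `fuse` are assumptions, and the names `Block`, `mlfn_forward`, and `fuse` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class Block:
    """One block: several Factor Modules (FMs) and one Factor Selection Module (FSM)."""

    def __init__(self, dim, num_fms, rng):
        # Assumption: each FM is a toy linear map standing in for a sub-network.
        self.fms = [random_matrix(dim, dim, rng) for _ in range(num_fms)]
        # The FSM maps the block input to one gate value per FM.
        self.fsm = random_matrix(num_fms, dim, rng)

    def forward(self, x):
        # Assumption: sigmoid gates give a soft, input-dependent selection.
        gates = [sigmoid(g) for g in linear(self.fsm, x)]
        y = list(x)  # assumption: identity shortcut, as in ResNeXt-style blocks
        for gate, fm in zip(gates, self.fms):
            out = linear(fm, x)
            y = [yi + gate * oi for yi, oi in zip(y, out)]
        return y, gates


def mlfn_forward(blocks, x):
    """Run all blocks; the Factor Signature concatenates every FSM output."""
    signature = []
    for block in blocks:
        x, gates = block.forward(x)
        signature.extend(gates)
    return x, signature


def fuse(deep_feature, signature, proj_deep, proj_fs):
    """Assumption: project both inputs to one dimension and average them."""
    a = linear(proj_deep, deep_feature)
    b = linear(proj_fs, signature)
    return [(ai + bi) / 2.0 for ai, bi in zip(a, b)]


rng = random.Random(0)
blocks = [Block(dim=4, num_fms=3, rng=rng) for _ in range(2)]
deep, fs = mlfn_forward(blocks, [0.5, -0.2, 0.1, 0.9])
r = fuse(deep, fs, random_matrix(5, 4, rng), random_matrix(5, 6, rng))
print(len(deep), len(fs), len(r))
```

With two blocks of three FMs each, the Factor Signature has six entries, one gate per FM per block, which is what keeps it compact relative to the deep feature.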

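Experimental Results

Research Questions

RQ1: Can latent multi-level appearance factors be discovered automatically without attribute annotation?
RQ2: Do the Factor Modules (FMs) dynamically selected for each input provide discriminative, view-invariant features at each semantic level?
RQ3: Does combining the compact Factor Signature with the final deep features improve Re-ID performance over conventional deep features alone?
RQ4: Do the learned latent factors correspond to interpretable attributes and help generalisation across datasets?
RQ5: Can the method reach state-of-the-art results on the major person Re-ID benchmarks as well as on general object categorisation?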

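Key Findings

MLFN achieves state-of-the-art results on the Market-1501, CUHK03, and DukeMTMC-reID datasets.
On Market-1501, MLFN reaches R1=90.0 and mAP=74.3 for single query (SQ), and R1=92.3 and mAP=82.4 for multi-query (MQ).
On CUHK03 Setting 1 (detected bounding boxes), MLFN reaches 82.8% R1 and 89.2% mAP; under the stronger setting on detected data, it reaches 89.2% R1 and above.
On CUHK03 Setting 2, MLFN reaches 54.7% R1 and 49.2% mAP (labelled), and 52.8% R1 and 47.8% mAP (detected).
On DukeMTMC-reID, MLFN reaches 81.0% R1 and 62.8% mAP.
MLFN-Fusion (which includes the FS) outperforms the ResNeXt and ResNet baselines; FSM-based dynamic factor selection gives a significant gain over the ablated variants.
The FS alone delivers competitive, attribute-like performance, and fusing the FS with deep features improves the final representation R.
The discovered latent factors align visually with semantic attributes across levels, progressing from colour and texture to clothing style and gender, without any attribute supervision.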
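
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how Rank-1 accuracy (R1) and mean average precision (mAP) are typically computed from ranked retrieval lists. It is a simplification: benchmark protocols such as Market-1501 also discard same-camera and junk gallery images, which is omitted here, and the function names and toy data are hypothetical.

```python
def average_precision(ranked_matches):
    """AP for one query.

    ranked_matches[i] is True when the i-th ranked gallery image
    shares the query's identity.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked_matches):
    """Return (Rank-1 accuracy, mAP) over a list of per-query ranked lists."""
    n = len(all_ranked_matches)
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    return rank1, mean_ap


# Toy data: two queries, three gallery images each.
queries = [
    [True, False, True],   # correct match at rank 1 and rank 3
    [False, True, False],  # first correct match at rank 2
]
rank1, mean_ap = rank1_and_map(queries)
print(rank1, round(mean_ap, 4))
```

R1 only checks the top-ranked result, while mAP rewards placing every correct match early, which is why a method can score well on one and lower on the other.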

This review was generated by AI and reviewed by human editors.