QUICK REVIEW

[Paper Review] CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.

Chaoyou Fu, Yibo Hu|arXiv (Cornell University)|Jan 21, 2021

Video Surveillance and Tracking Methods | Cited by 3

One-Sentence Summary

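This paper proposes CM-NAS, a neural architecture search (NAS) framework for visible-infrared person re-identification (VI-ReID) that searches for the best way to split batch normalization (BN) layers between modalities in order to reduce the modality discrepancy. By introducing a BN-oriented search space and a Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss, the method achieves state-of-the-art performance on the SYSU-MM01 and RegDB datasets, improving Rank-1/mAP by 6.70%/6.13% and 12.17%/11.23%, respectively.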

ABSTRACT

Visible-Infrared person re-identification (VI-ReID) aims at matching cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environments. In order to mitigate the impact of large modality discrepancy, existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations. Such a manual design routine, however, highly depends on massive experiments and empirical practice, which is time-consuming and labor-intensive. In this paper, we systematically study the manually designed architectures, and identify that appropriately splitting Batch Normalization (BN) layers to learn modality-specific representations will bring a great boost towards cross-modality matching. Based on this observation, the essential objective is to find the optimal splitting scheme for each BN layer. To this end, we propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS). It consists of a BN-oriented search space in which the standard optimization can be fulfilled subject to the cross-modality task. Besides, in order to better guide the search process, we further formulate a new Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss. Apart from the modality discrepancy, it also concerns the similarity correlations, which have been overlooked before, in the two modalities. Resorting to these advantages, our method outperforms state-of-the-art counterparts in extensive experiments, improving the Rank-1/mAP by 6.70%/6.13% on SYSU-MM01 and 12.17%/11.23% on RegDB. The source code will be released soon.
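The abstract describes C3MMD only at a high level. The sketch below is a simplified, hypothetical stand-in written in plain Python, not the authors' formulation. It assumes a linear kernel, so the class-specific MMD for one identity reduces to the squared distance between that identity's mean visible feature and mean infrared feature. It also assumes the correlation-consistency term compares cosine similarities between class centers computed within each modality. The function names and the weight `lam` are illustrative.

```python
import math


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def class_specific_mmd(vis, ir):
    """Linear-kernel MMD per identity (squared distance of class means), averaged.

    vis, ir: dict mapping identity label -> list of feature vectors.
    """
    labels = sorted(set(vis) & set(ir))
    return sum(sq_dist(mean_vec(vis[c]), mean_vec(ir[c])) for c in labels) / len(labels)


def correlation_consistency(vis, ir):
    """Hypothetical term: squared gap between within-modality class-center similarities."""
    labels = sorted(set(vis) & set(ir))
    cv = {c: mean_vec(vis[c]) for c in labels}
    ci = {c: mean_vec(ir[c]) for c in labels}
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    if not pairs:
        return 0.0
    return sum((cosine(cv[a], cv[b]) - cosine(ci[a], ci[b])) ** 2 for a, b in pairs) / len(pairs)


def c3mmd_sketch(vis, ir, lam=1.0):
    """Illustrative combination: class-specific MMD + lam * correlation consistency."""
    return class_specific_mmd(vis, ir) + lam * correlation_consistency(vis, ir)


vis = {0: [[1.0, 0.0], [1.0, 0.2]], 1: [[0.0, 1.0], [0.2, 1.0]]}
ir = {0: [[0.8, 0.0], [0.8, 0.2]], 1: [[0.0, 0.8], [0.2, 0.8]]}
print(class_specific_mmd(vis, ir))  # about 0.04
print(c3mmd_sketch(vis, ir))
print(c3mmd_sketch(vis, vis))  # 0.0: identical modalities are already aligned
```

Feeding the same features for both modalities yields a loss of zero, as an alignment loss should. A practical version would operate on mini-batch tensors and would likely use a nonlinear kernel such as a Gaussian RBF rather than the linear kernel assumed here.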

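Research Motivation and Objectives

Reduce the high time and labor cost of manually designing two-stream architectures for visible-infrared person re-identification.
Identify the optimal splitting scheme for batch normalization layers to strengthen modality-specific representation learning.
Reduce the modality discrepancy by jointly optimizing the architecture search and correlation-aware feature alignment.
Design a search space and loss function tailored to the cross-modality ReID task.
Achieve state-of-the-art performance on benchmark datasets without relying on extensive manual tuning.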

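Proposed Method

A BN-oriented search space in which each batch normalization layer can either be shared by both modalities or split into modality-specific components that learn distinct representations.
A Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss that aligns features across modalities while preserving the correlation structure between classes.
Differentiable architecture search with gradient-based optimization to explore the search space efficiently.
The search is guided by minimizing the C3MMD loss, which captures both the modality discrepancy and the consistency of similarity correlations.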
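A two-stream backbone with modality-specific normalization, which improves feature representations while preserving modality-specific information.
Standard optimization techniques are applied within the proposed search space to find the architecture configuration best suited to cross-modality matching.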
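The BN-oriented search can be pictured as choosing, for every BN layer, between two candidate operations. The plain-Python sketch below assumes a DARTS-style continuous relaxation: each layer holds two hypothetical architecture weights `alpha` that are softmax-normalized to mix a shared BN (statistics computed over both modalities together) and a split BN (statistics computed per modality). After search, each layer keeps the candidate with the larger weight. Learnable affine parameters, per-channel statistics, and the gradient updates of `alpha` are omitted, so this illustrates the shape of the search space rather than reproducing CM-NAS.

```python
import math

# Candidate operations for one BN layer. Activations are scalars here; a real BN
# layer normalizes each channel and applies learnable scale/shift parameters.


def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalar activations with its own mean and variance."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return [(v - mu) / math.sqrt(var + eps) for v in values]


def shared_bn(vis, ir):
    """Shared BN: one set of statistics computed over both modalities."""
    joint = batch_norm(vis + ir)
    return joint[: len(vis)], joint[len(vis):]


def split_bn(vis, ir):
    """Split BN: modality-specific statistics for visible and infrared inputs."""
    return batch_norm(vis), batch_norm(ir)


CANDIDATES = [shared_bn, split_bn]  # index 0 = shared, index 1 = split


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_bn(vis, ir, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of the candidate outputs."""
    weights = softmax(alpha)
    out_vis, out_ir = [0.0] * len(vis), [0.0] * len(ir)
    for w, op in zip(weights, CANDIDATES):
        v, r = op(vis, ir)
        out_vis = [o + w * x for o, x in zip(out_vis, v)]
        out_ir = [o + w * x for o, x in zip(out_ir, r)]
    return out_vis, out_ir


def derive_architecture(alphas):
    """Discretize: for each BN layer, keep the candidate with the larger weight."""
    return ["split" if a[1] > a[0] else "shared" for a in alphas]


vis_batch = [1.0, 2.0, 3.0]     # toy visible activations
ir_batch = [10.0, 11.0, 12.0]   # toy infrared activations, offset by a modality gap
print(mixed_bn(vis_batch, ir_batch, [0.0, 0.0]))
print(derive_architecture([[0.3, 1.2], [0.9, -0.4]]))  # ['split', 'shared']
```

With equal weights the mixed output is the average of the two candidates. Note how shared BN keeps the modality gap (all visible values negative, all infrared values positive) while split BN removes it; the discretization step turns the learned weights into a concrete split-or-shared decision per layer.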

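Experimental Results

Research Questions

RQ1: How can splitting batch normalization layers improve cross-modality representation learning in visible-infrared ReID?
RQ2: What is the optimal BN splitting strategy for reducing the modality discrepancy while preserving identity correlations?
RQ3: How can the similarity correlations between the features of the two modalities be modeled explicitly to improve alignment?
RQ4: Can a differentiable search space focused on BN splitting outperform manually designed two-stream architectures?
RQ5: How does a correlation-aware loss function affect ReID performance on benchmark datasets?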

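Key Findings

Compared with state-of-the-art methods, CM-NAS improves Rank-1 accuracy by 6.70% and mAP by 6.13% on the SYSU-MM01 dataset.
On the RegDB dataset, the method improves Rank-1 by 12.17% and mAP by 11.23%, showing that the gains carry over to a second benchmark.
The proposed C3MMD loss effectively reduces the modality discrepancy while preserving the class-specific correlation structure of cross-modality features.
The BN-oriented search space enables efficient and effective architecture discovery without manual trial and error.
Ablation studies confirm that the optimal BN splitting significantly improves representation learning and matching performance.
The authors state that the source code will be released to support reproducibility and further research.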
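Rank-1 and mAP, the metrics behind the gains reported above, are standard retrieval measures: for each query (for example, an infrared image), the gallery of the other modality is ranked by feature distance. The minimal sketch below computes both from a query-by-gallery distance matrix. It omits the protocol-specific rules of SYSU-MM01 and RegDB, such as how the gallery is constructed and which cameras are filtered out, and the function name is illustrative.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP from a query-by-gallery distance matrix (smaller = closer)."""
    hits, aps = 0, []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])  # gallery sorted by distance
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # no true match in the gallery: query is skipped
        hits += int(matches[0])  # Rank-1: is the closest gallery image the right identity?
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each correct hit
        aps.append(sum(precisions) / len(precisions))  # average precision of this query
    if not aps:
        return 0.0, 0.0
    return hits / len(aps), sum(aps) / len(aps)


dist = [[0.1, 0.5, 0.9], [0.2, 0.3, 0.1]]
print(rank1_and_map(dist, query_ids=[0, 1], gallery_ids=[0, 1, 1]))
# Rank-1 = 1.0, mAP = 11/12 (about 0.917)
```

Rank-1 only checks the top result, while mAP also rewards retrieving every correct match early, which is why papers in this area report both.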

This review was generated by AI and reviewed by human editors.