QUICK REVIEW

[Paper Review] CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification.

Chaoyou Fu, Yibo Hu | arXiv (Cornell University) | 2021-01-21

Video Surveillance and Tracking Methods | Citations: 3

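One-Line Summary

This paper proposes CM-NAS, a neural architecture search framework for visible-infrared person re-identification that reduces the cross-modality discrepancy by optimizing how Batch Normalization (BN) layers are split between the two modalities. By introducing a BN-oriented search space and a Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss, it achieves state-of-the-art (SOTA) performance, improving Rank-1/mAP over the previous best methods by 6.70%/6.13% on SYSU-MM01 and by 12.17%/11.23% on RegDB.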

ABSTRACT

Visible-Infrared person re-identification (VI-ReID) aims at matching cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment. In order to mitigate the impact of large modality discrepancy, existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations. Such a manual design routine, however, highly depends on massive experiments and empirical practice, which is time consuming and labor intensive. In this paper, we systematically study the manually designed architectures, and identify that appropriately splitting Batch Normalization (BN) layers to learn modality-specific representations will bring a great boost towards cross-modality matching. Based on this observation, the essential objective is to find the optimal splitting scheme for each BN layer. To this end, we propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS). It consists of a BN-oriented search space in which the standard optimization can be fulfilled subject to the cross-modality task. Besides, in order to better guide the search process, we further formulate a new Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss. Apart from the modality discrepancy, it also concerns the similarity correlations, which have been overlooked before, in the two modalities. Resorting to these advantages, our method outperforms state-of-the-art counterparts in extensive experiments, improving the Rank-1/mAP by 6.70%/6.13% on SYSU-MM01 and 12.17%/11.23% on RegDB. The source code will be released soon.
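The abstract's core idea, deciding for each BN layer whether both modalities share one set of normalization statistics or each modality gets its own, can be illustrated with a small NumPy sketch. It assumes a DARTS-style continuous relaxation: a softmax over two hypothetical architecture logits (`alpha`) mixes a shared BN and a modality-split BN during search, and the option with the larger weight is kept afterwards. The class name `SearchableBN`, the `alpha` parameterization, and the training-mode-only BN are illustrative assumptions, not the paper's code.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode BN over the batch axis (x: [N, C]); no running stats."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


class SearchableBN:
    """Hypothetical relaxed choice for ONE BN layer.

    Candidate 0: a single BN shared by visible and infrared inputs.
    Candidate 1: a separate BN (statistics + affine params) per modality.
    """

    def __init__(self, channels):
        self.shared = (np.ones(channels), np.zeros(channels))  # (gamma, beta)
        self.vis = (np.ones(channels), np.zeros(channels))
        self.ir = (np.ones(channels), np.zeros(channels))
        self.alpha = np.zeros(2)  # architecture logits, learned during search

    def forward(self, x_vis, x_ir):
        w = softmax(self.alpha)
        # Shared option: normalize the concatenated batch with one set of stats.
        both = batch_norm(np.concatenate([x_vis, x_ir]), *self.shared)
        shared_vis, shared_ir = both[: len(x_vis)], both[len(x_vis):]
        # Split option: each modality uses its own stats and affine params.
        split_vis = batch_norm(x_vis, *self.vis)
        split_ir = batch_norm(x_ir, *self.ir)
        # Weighted mixture of the two candidates (continuous relaxation).
        out_vis = w[0] * shared_vis + w[1] * split_vis
        out_ir = w[0] * shared_ir + w[1] * split_ir
        return out_vis, out_ir

    def derive(self):
        """After search, keep the candidate with the larger logit."""
        return "split" if self.alpha[1] > self.alpha[0] else "shared"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = SearchableBN(channels=4)
    x_vis = rng.normal(2.0, 1.0, size=(8, 4))   # toy "visible" features
    x_ir = rng.normal(-1.0, 3.0, size=(8, 4))   # toy "infrared" features
    out_vis, out_ir = layer.forward(x_vis, x_ir)
    print(out_vis.shape, out_ir.shape, layer.derive())
```

With the split option, each modality is centered and scaled by its own statistics, so visible and infrared features with very different distributions are normalized independently; with the shared option, only the combined batch is normalized. Repeating this choice per BN layer gives a search space over splitting schemes.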

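Research Motivation and Goals

To address the high time and labor cost of manually designing two-stream architectures for visible-infrared person re-identification.
To identify the optimal strategy for splitting Batch Normalization (BN) layers to improve modality-specific representation learning.
To reduce the cross-modality discrepancy by jointly optimizing architecture search and correlation-aware feature alignment.
To develop a search space and a loss function tailored to cross-modality ReID.
To achieve state-of-the-art performance on benchmark datasets without relying on extensive manual tuning.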

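Proposed Method

Proposes a BN-oriented search space in which each Batch Normalization layer can be split into modality-specific components, so that separate representations can be learned for each modality.
Introduces the Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss, which preserves class-wise correlation structure while aligning features across modalities.
Explores the search space efficiently through differentiable architecture search with gradient-based optimization.
Guides the search process by minimizing the C3MMD loss, which accounts for both the modality discrepancy and the consistency of similarity correlations.
Uses a two-stream backbone with modality-specific normalization to improve feature representations while retaining features shared across modalities.
Applies standard optimization within the proposed search space to find the optimal architecture configuration for cross-modality matching.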
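To make the C3MMD idea more concrete, here is a minimal NumPy illustration under stated assumptions; it is not the paper's exact formulation. The class-specific MMD term uses a linear kernel, so for each identity it reduces to the squared distance between the visible and infrared class means. The correlation-consistency term is an assumed stand-in: it penalizes differences between the cosine-similarity matrices of class centers computed separately in each modality. The function names and the weight `lam` are hypothetical.

```python
import numpy as np


def class_means(feats, labels, classes):
    """Mean feature vector of each class in `classes` (rows follow `classes`)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in classes])


def cosine_sim_matrix(centers):
    """Pairwise cosine similarity between class centers."""
    normed = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return normed @ normed.T


def c3mmd_sketch(f_vis, y_vis, f_ir, y_ir, lam=1.0):
    """Hypothetical C3MMD-style loss: class-wise MMD + correlation consistency."""
    # Only identities observed in both modalities can be aligned.
    classes = np.intersect1d(y_vis, y_ir)
    mu_vis = class_means(f_vis, y_vis, classes)
    mu_ir = class_means(f_ir, y_ir, classes)
    # Class-specific MMD with a linear kernel: squared distance between the
    # visible and infrared means of the same identity, averaged over classes.
    mmd = np.mean(np.sum((mu_vis - mu_ir) ** 2, axis=1))
    # Correlation consistency (assumed form): the inter-class similarity
    # structure should look the same in both modalities.
    corr = np.mean((cosine_sim_matrix(mu_vis) - cosine_sim_matrix(mu_ir)) ** 2)
    return mmd + lam * corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 4)  # 3 identities, 4 images each
    f_vis = rng.normal(size=(12, 8))
    f_ir = f_vis + rng.normal(scale=0.5, size=(12, 8))
    print(c3mmd_sketch(f_vis, labels, f_ir, labels))
```

Scaling all infrared features by a positive constant leaves the cosine-similarity matrices unchanged, so in that case only the MMD term is nonzero; this separates "the distributions are far apart" from "the identities relate to each other differently", the two aspects the review attributes to C3MMD.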

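Experimental Results

Research Questions

RQ1: How can splitting Batch Normalization (BN) layers improve cross-modality representation learning in visible-infrared ReID?
RQ2: What is the optimal strategy for splitting BN layers to reduce the cross-modality discrepancy while preserving identity correlations?
RQ3: Can explicitly modeling the similarity correlations between features of the two modalities improve alignment?
RQ4: Can a differentiable search space focused on BN splitting outperform manually designed two-stream architectures?
RQ5: How does incorporating a correlation-aware loss function affect ReID performance across benchmarks?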

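Key Findings

Compared with state-of-the-art methods, CM-NAS improves Rank-1 accuracy by 6.70% and mAP by 6.13% on the SYSU-MM01 dataset.
On the RegDB dataset, it improves Rank-1 by 12.17% and mAP by 11.23%, showing that the gains carry over to a second benchmark.
The proposed C3MMD loss reduces the modality discrepancy in cross-modality features while preserving class-specific correlation structure.
The BN-oriented search space enables efficient and effective architecture search without manual trial and error.
Ablation studies confirm that the optimal BN splitting strategy substantially improves representation learning and matching performance.
The source code is to be released to support reproducibility and future research.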


This review was generated by AI and reviewed by a human editor.