QUICK REVIEW

[Paper Review] Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

Mang Ye, Jianbing Shen|arXiv (Cornell University)|2020. 07. 18.

Video Surveillance and Tracking Methods | References: 61 | Citations: 32

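One-Line Summary

The paper proposes DDAG, a two-stream VI-ReID framework that combines intra-modality weighted-part aggregation with cross-modality graph structured attention, and uses a parameter-free dynamic dual aggregation strategy to integrate the two components progressively during training.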

ABSTRACT

Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Due to the large intra-class variations and cross-modality discrepancy with large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods instead tend to learn global representations, which have limited discriminability and weak robustness to noisy images. In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. We propose an intra-modality weighted-part attention module to extract discriminative part-aggregated features, by imposing the domain knowledge on the part relationship mining. To enhance robustness against noisy samples, we introduce cross-modality graph structured attention to reinforce the representation with the contextual relations across the two modalities. We also develop a parameter-free dynamic dual aggregation learning strategy to adaptively integrate the two components in a progressive joint training manner. Extensive experiments demonstrate that DDAG outperforms the state-of-the-art methods under various settings.

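Motivation and Objectives

Identify and address the key challenges of VI-ReID: large intra-class variations, the cross-modality discrepancy, and substantial sample noise.
Learn discriminative part-level features within each modality to improve robustness to background noise.
Exploit cross-modality graph relations to reinforce representations across visible and thermal images.
Propose a parameter-free dynamic learning strategy to jointly optimize the two attention components.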

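Proposed Method

The first block is learned separately for each modality, while the deeper blocks are shared to learn modality-sharable mid-level features.
Intra-modality Weighted-Part Aggregation (IWPA) learns part-level attention within each modality: a non-local-style mechanism computes attention over p parts, and the attended parts are fused through a residual BatchNorm (RBN) weighted sum.
Cross-modality Graph Structured Attention (CGSA) builds a cross-modality graph over a batch of 2mn images and applies multi-head graph attention to capture cross-modality neighbor relations, outputting graph-attended features.
Dynamic dual aggregation learning prioritizes instance-level part-aggregation learning ($L_P$) and progressively adds graph-level cross-modality learning ($L_g$), combining them with the parameter-free schedule $L^t = L_P^t + \frac{1}{1 + E[L_P^{t-1}]} L_g^t$.
**Key equations:** the intra-modality part attention map $\alpha^p_{i,j} = \frac{f(x^p_i, x^p_j)}{\sum_j f(x^p_i, x^p_j)}$ with $f(x^p_i, x^p_j) = \exp\big(u(x^p_i)^\top v(x^p_j)\big)$; the residual BN aggregation $x^* = \mathrm{BN}(x^o) + \sum_i w^p_i \bar{x}^p_i$; the graph attention $\alpha^g_{i,j}$, computed with multiple heads from the transformed features $h(x^o_i)$ and $h(x^o_j)$; and the final graph feature $x^g_i = \phi\big(\mathrm{concat}_{l} \sum_j \alpha^{g,l}_{i,j} h^l(x^o_j)\big)$.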
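The part-attention and residual BN equations above can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The function names are hypothetical, the embeddings `u` and `v` are passed in as plain callables, `bn_global` stands in for an already batch-normalized $\mathrm{BN}(x^o)$, and the attended part feature is assumed to be $\bar{x}^p_i = \sum_j \alpha^p_{i,j} x^p_j$, which the summary does not spell out.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def part_attention(parts, u, v):
    """alpha[i][j] = exp(u(x_i)^T v(x_j)) / sum_j exp(u(x_i)^T v(x_j))."""
    ux = [u(x) for x in parts]
    vx = [v(x) for x in parts]
    alpha = []
    for a in ux:
        scores = [dot(a, b) for b in vx]
        m = max(scores)  # subtracting the max leaves the ratio unchanged
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha.append([e / total for e in exps])
    return alpha


def weighted_part_aggregation(parts, bn_global, part_weights, u, v):
    """x* = BN(x^o) + sum_i w_i * x_bar_i, with BN(x^o) given as bn_global."""
    alpha = part_attention(parts, u, v)
    dim = len(bn_global)
    fused = list(bn_global)
    for i, w in enumerate(part_weights):
        # Assumption: x_bar_i is the attention-weighted sum of the raw parts.
        x_bar = [
            sum(alpha[i][j] * parts[j][d] for j in range(len(parts)))
            for d in range(dim)
        ]
        for d in range(dim):
            fused[d] += w * x_bar[d]
    return fused


# Toy example: p = 3 parts with 2-dimensional features, identity embeddings.
identity = lambda x: x
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = part_attention(parts, identity, identity)
fused = weighted_part_aggregation(parts, [0.5, -0.5], [0.2, 0.3, 0.5], identity, identity)
```

Each row of `alpha` sums to one, and setting every part weight to zero returns `bn_global` unchanged, which is the residual behaviour the RBN formula describes.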

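Experimental Results

Research Questions

RQ1: Can intra-modality part-level attention improve discriminability and robustness to noise in VI-ReID?
RQ2: Does introducing cross-modality graph structured attention improve cross-modality feature learning and reduce the modality gap?
RQ3: Does the dynamic, parameter-free aggregation strategy stably integrate the two attention components during training?
RQ4: How does the proposed DDAG framework compare with state-of-the-art VI-ReID methods on the SYSU-MM01 and RegDB datasets?

Key Results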

Method	r=1	r=5	r=10	r=20	mAP
B (Baseline)	48.18	75.81	85.73	93.52	47.64
B+P (Baseline+IWPA)	53.69	81.16	88.38	94.56	51.37
B+G (Baseline+CGSA)	50.75	78.43	86.71	93.62	49.73
B+P+G (DDAG)	54.75	82.31	90.39	95.81	53.02
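To make the per-component gains in the table explicit, the short sketch below subtracts the baseline row from each variant. The numbers are copied from the table above; the names `METRICS`, `RESULTS`, and `gains_over_baseline` are illustrative only.

```python
METRICS = ("r=1", "r=5", "r=10", "r=20", "mAP")

# Rows copied from the ablation table above.
RESULTS = {
    "B": (48.18, 75.81, 85.73, 93.52, 47.64),
    "B+P": (53.69, 81.16, 88.38, 94.56, 51.37),
    "B+G": (50.75, 78.43, 86.71, 93.62, 49.73),
    "B+P+G": (54.75, 82.31, 90.39, 95.81, 53.02),
}


def gains_over_baseline(results, baseline="B"):
    """Return each variant's absolute improvement over the baseline row."""
    base = results[baseline]
    return {
        name: tuple(round(v - b, 2) for v, b in zip(values, base))
        for name, values in results.items()
        if name != baseline
    }


gains = gains_over_baseline(RESULTS)
for name, deltas in gains.items():
    print(name, dict(zip(METRICS, deltas)))
```

At rank-1 the gains are 5.51 for B+P, 2.57 for B+G, and 6.57 for the full model; for mAP they are 3.73, 2.09, and 5.38. The combined gain is larger than either component alone but smaller than the sum of the two.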

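DDAG outperforms state-of-the-art VI-ReID methods on SYSU-MM01 and RegDB under multiple settings.
IWPA, with residual BN and learnable part weights, improves rank-1, rank-5, rank-10, rank-20, and mAP over the baseline.
CGSA exploits cross-modality neighbor relations and stabilizes training, further improving performance.
Without additional hyperparameters, dynamic dual aggregation effectively combines instance-level and graph-level learning, yielding further gains.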
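The parameter-free combination can be sketched from the schedule given in the method section, $L^t = L_P^t + \frac{1}{1 + E[L_P^{t-1}]} L_g^t$. The code below is a minimal illustration, not the authors' training loop. It assumes that $E[\cdot]$ is the mean part loss over the previous epoch and that the caller supplies that value; the function names are hypothetical.

```python
def graph_loss_weight(prev_mean_part_loss):
    """Weight 1 / (1 + E[L_P^{t-1}]) applied to the graph-level loss."""
    return 1.0 / (1.0 + prev_mean_part_loss)


def dynamic_total_loss(part_loss, graph_loss, prev_mean_part_loss):
    """L^t = L_P^t + weight * L_g^t, with no tunable hyperparameter."""
    return part_loss + graph_loss_weight(prev_mean_part_loss) * graph_loss


def epoch_mean(values):
    """Assumed form of E[.]: the arithmetic mean of per-step part losses."""
    return sum(values) / len(values)


# Early training: the part loss is still high, so the graph term is damped.
early = dynamic_total_loss(part_loss=3.5, graph_loss=2.0, prev_mean_part_loss=4.0)

# Later training: the part loss has dropped, so the graph term counts for more.
late = dynamic_total_loss(part_loss=0.3, graph_loss=2.0, prev_mean_part_loss=0.25)
```

Because the weight depends only on the previous part loss, the graph-level term is phased in automatically as part-level learning converges: here the weight rises from 0.2 to 0.8.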

This review was generated by AI and reviewed by a human editor.