QUICK REVIEW

[Paper Review] Relation-Aware Global Attention for Person Re-identification

Zhizheng Zhang, Cuiling Lan|arXiv (Cornell University)|2019. 04. 05.

Video Surveillance and Tracking Methods|References: 67|Citations: 38

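One-Line Summary

This paper introduces the Relation-Aware Global Attention (RGA) module, which learns global structural relations for each feature node to generate spatial and channel attention, and reports state-of-the-art re-identification performance on CUHK03, Market1501, and MSMT17.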

ABSTRACT

For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve the state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.

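Motivation and Objectives

Motivate attention learning for person re-identification that exploits global structural information beyond local receptive fields.
Propose a compact mechanism that infers semantics for each feature node from its global relations.
Develop spatial (RGA-S) and channel (RGA-C) relation-aware global attention modules and demonstrate their effectiveness.
Show that combining RGA-S and RGA-C yields state-of-the-art results on major re-identification benchmarks.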

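Proposed Method

Pairwise relations (affinities) between feature nodes are modeled and stacked to form a global relation vector for each node.
For spatial RGA (RGA-S), the pairwise relation r_{i,j} = f_s(x_i, x_j) is computed on features embedded with 1x1 convolutions, the relation vector r_i = [R_s(i,:), R_s(:,i)] is formed, and it is combined with x_i to predict the attention a_i through a small two-layer convnet.
For channel RGA (RGA-C), each channel is treated as a node; r_{i,j} is computed analogously on embedded features to form r_i, and the channel attention a_i is derived in the same way as in the spatial case.
The local feature (after embedding) and the global relation vector are combined into a relation-aware feature, from which the attention is produced by a two-layer convnet with a sigmoid output.
A ResNet-50 variant with RGA modules integrated into the backbone is evaluated on CUHK03, Market1501, and MSMT17.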
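
The steps listed in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked in the abstract). The following are assumptions made for the sketch: a dot-product affinity between two differently embedded features, ReLU after each embedding, averaging the embedded local feature into a single value, random weights standing in for learned ones, omission of any normalization layers, and all function names and dimension sizes. A 1x1 convolution is written as a matrix product because it applies the same linear map at every position.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(node_dim, num_nodes, rng, embed_dim=4, rel_dim=6, hidden_dim=3):
    """Random stand-ins for learned weights (all sizes are hypothetical)."""
    def w(rows, cols):
        return 0.1 * rng.standard_normal((rows, cols))
    return {
        "theta": w(embed_dim, node_dim),   # embeds x_i for the relation
        "phi": w(embed_dim, node_dim),     # embeds x_j; separate weights, so R is asymmetric
        "psi": w(embed_dim, node_dim),     # embeds the local feature
        "rel": w(rel_dim, 2 * num_nodes),  # embeds the stacked relation vector
        "w1": w(hidden_dim, 1 + rel_dim),  # first attention layer
        "w2": w(1, hidden_dim),            # second attention layer (one value per node)
    }

def relation_vectors(R):
    """Column i of the result is r_i = [R(i, :), R(:, i)]."""
    return np.concatenate([R.T, R], axis=0)

def relation_attention(nodes, p):
    """nodes: (node_dim, num_nodes), one column per node. Returns (num_nodes,)."""
    theta = relu(p["theta"] @ nodes)
    phi = relu(p["phi"] @ nodes)
    R = theta.T @ phi                                   # R[i, j] = theta(x_i) . phi(x_j)
    rel = relu(p["rel"] @ relation_vectors(R))          # global relation part
    local = relu(p["psi"] @ nodes).mean(axis=0, keepdims=True)  # local appearance part
    y = np.concatenate([local, rel], axis=0)            # relation-aware feature
    a = sigmoid(p["w2"] @ relu(p["w1"] @ y))            # two layers, sigmoid output
    return a.ravel()

def rga_spatial(x, p):
    """x: (C, H, W). Nodes are the H*W spatial positions in raster-scan order."""
    C, H, W = x.shape
    a = relation_attention(x.reshape(C, H * W), p)
    return x * a.reshape(1, H, W), a

def rga_channel(x, p):
    """x: (C, H, W). Nodes are the C channels, each described by its H*W map."""
    C, H, W = x.shape
    a = relation_attention(x.reshape(C, H * W).T, p)
    return x * a.reshape(C, 1, 1), a

# Sequential spatial-then-channel use (the RGA-SC ordering) on a toy feature map
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 3))
x_s, a_s = rga_spatial(x, init_params(node_dim=8, num_nodes=12, rng=rng))
x_sc, a_c = rga_channel(x_s, init_params(node_dim=12, num_nodes=8, rng=rng))
```

Because the two relation embeddings use separate weights, R(i, j) and R(j, i) generally differ, which is why both the row and the column of the affinity matrix are kept in r_i.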

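Experimental Results

Research Questions

RQ1: Can the global structure and pairwise relations among feature positions be used to improve attention for person re-identification?
RQ2: Do spatial and channel relation-aware attention complement each other to produce more discriminative features?
RQ3: Does RGA outperform existing attention mechanisms (local attention, non-local, CBAM, etc.) on standard re-identification benchmarks?
RQ4: How do the choice of embedding and the placement of the module within the backbone affect performance?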

Key Results

Model	CUHK03(L) R1	CUHK03(L) mAP	Market1501 R1	Market1501 mAP	MSMT17 R1	MSMT17 mAP
Baseline	73.8	69.0	94.2	83.7	-	-
RGA-S w/o Rel.	76.8	72.3	94.3	83.8	-	-
RGA-S w/o Ori.	78.2	74.0	95.4	86.7	-	-
RGA-S	79.3	74.7	96.0	87.5	-	-
RGA-C w/o Rel.	77.8	73.7	94.7	84.8	-	-
RGA-C w/o Ori.	78.1	74.9	95.4	87.1	-	-
RGA-C	79.3	75.6	95.9	87.9	-	-
RGA-S//C	77.3	73.4	95.3	86.6	-	-
RGA-CS	78.6	75.5	95.3	87.8	-	-
RGA-SC	81.1	77.4	96.1	88.4	80.3	57.5

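RGA-S and RGA-C each improve substantially over the baseline on CUHK03 and Market1501.
Combining spatial and channel attention (RGA-SC) gives the best results: on CUHK03 it raises mAP by 8.4 percentage points over the baseline (69.0 to 77.4), and it also performs strongly on Market1501 and MSMT17.
RGA-S and RGA-C outperform several attention baselines, including CBAM, FC-S/FC-C, SE, and NL, on both Rank-1 and mAP.
Asymmetric embedding gives an additional benefit for relation modeling and works better than symmetric embedding or no embedding.
RGA-SC achieves state-of-the-art results among the reported methods on CUHK03, Market1501, and MSMT17, with a notable improvement over the second-best method.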
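
As a quick arithmetic check of the gains discussed here, the sketch below recomputes the differences against the baseline from the values in the results table. The dictionary layout and the function name are illustrative choices; only the numbers come from the table. MSMT17 is left out because the table reports no baseline for it.

```python
# (Rank-1, mAP) pairs copied from the results table
results = {
    "Baseline": {"CUHK03(L)": (73.8, 69.0), "Market1501": (94.2, 83.7)},
    "RGA-S": {"CUHK03(L)": (79.3, 74.7), "Market1501": (96.0, 87.5)},
    "RGA-C": {"CUHK03(L)": (79.3, 75.6), "Market1501": (95.9, 87.9)},
    "RGA-SC": {"CUHK03(L)": (81.1, 77.4), "Market1501": (96.1, 88.4)},
}

def gain_over_baseline(model, dataset):
    """Return (Rank-1 gain, mAP gain) in percentage points, rounded to 0.1."""
    r1, m_ap = results[model][dataset]
    base_r1, base_map = results["Baseline"][dataset]
    return round(r1 - base_r1, 1), round(m_ap - base_map, 1)

print(gain_over_baseline("RGA-SC", "CUHK03(L)"))   # (7.3, 8.4)
print(gain_over_baseline("RGA-SC", "Market1501"))  # (1.9, 4.7)
```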

This review was generated by AI and reviewed by a human editor.