QUICK REVIEW

[Paper Review] Rethinking of Pedestrian Attribute Recognition: Realistic Datasets with Efficient Method

Jian Jia, Houjing Huang | arXiv (Cornell University) | May 25, 2020

Video Surveillance and Tracking Methods | References: 36 | Citations: 30

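One-Line Summary

This paper shows that existing pedestrian attribute datasets allow identity overlap between the train and test sets, which inflates reported results. It presents the zero-shot datasets PETA$_{zs}$ and RAPv2$_{zs}$ and proposes a strong baseline that outperforms recent SOTA methods without any localization tuning.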

ABSTRACT

Despite various methods are proposed to make progress in pedestrian attribute recognition, a crucial problem on existing datasets is often neglected, namely, a large number of identical pedestrian identities in train and test set, which is not consistent with practical application. Thus, images of the same pedestrian identity in train set and test set are extremely similar, leading to overestimated performance of state-of-the-art methods on existing datasets. To address this problem, we propose two realistic datasets PETA$_{zs}$ and RAPv2$_{zs}$ following zero-shot setting of pedestrian identities based on PETA and RAPv2 datasets. Furthermore, compared to our strong baseline method, we have observed that recent state-of-the-art methods can not make performance improvement on PETA, RAPv2, PETA$_{zs}$ and RAPv2$_{zs}$. Thus, through solving the inherent attribute imbalance in pedestrian attribute recognition, an efficient method is proposed to further improve the performance. Experiments on existing and proposed datasets verify the superiority of our method by achieving state-of-the-art performance.

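Motivation and Objectives

Highlights the unrealistic identity overlap in current pedestrian attribute datasets and its effect on evaluation.
Proposes the zero-shot datasets PETA$_{zs}$ and RAPv2$_{zs}$ to reflect real deployment scenarios.
Introduces a strong baseline that challenges the notion that SOTA gains depend on attribute localization modules.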

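Proposed Method

Identifies train-test identity overlap as a dataset problem and demonstrates the performance gap between test images of common identities (also seen in training) and test images of unique identities (unseen in training).
Constructs two zero-shot datasets, PETA$_{zs}$ and RAPv2$_{zs}$, by redistributing identities so that no test identity appears in training.
Proposes a strong baseline that places a linear classifier on a ResNet50 backbone with task-specific weight regularization, and evaluates it with mA, Accuracy, Precision, Recall, and F1.
Reimplements the SOTA methods MsVAA, VAC, and ALM on the same baseline and backbone for a fair comparison.
Analyzes attribute localization with Grad-CAM and shows that the baseline can implicitly localize attribute regions without an explicit localization module.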
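
The identity redistribution described above can be sketched as an identity-level split. This is a minimal illustration, not the authors' actual procedure: the `(image_id, person_id)` record layout, the `zero_shot_split` and `identity_overlap` helpers, the 20% test fraction, and the fixed seed are all assumptions made for this example.

```python
import random
from collections import defaultdict


def zero_shot_split(records, test_fraction=0.2, seed=0):
    """Split (image_id, person_id) records so no identity is in both sets.

    Hypothetical helper: it splits at the identity level rather than the
    image level, so every image of a given person lands on one side only.
    """
    by_identity = defaultdict(list)
    for image_id, person_id in records:
        by_identity[person_id].append(image_id)

    identities = sorted(by_identity)  # deterministic order before shuffling
    random.Random(seed).shuffle(identities)

    n_test = max(1, round(len(identities) * test_fraction))
    test_ids = set(identities[:n_test])

    train = [(img, pid) for pid in identities if pid not in test_ids
             for img in by_identity[pid]]
    test = [(img, pid) for pid in identities if pid in test_ids
            for img in by_identity[pid]]
    return train, test


def identity_overlap(train, test):
    """Return the set of identities that appear in both splits."""
    return {pid for _, pid in train} & {pid for _, pid in test}


# Toy data, made up for illustration: 10 identities with 3 images each.
records = [(f"img_{pid}_{k}", pid) for pid in range(10) for k in range(3)]
train, test = zero_shot_split(records)
print(len(train), len(test), identity_overlap(train, test))
```

On this toy input the split yields 24 training images and 6 test images with an empty identity overlap. A plain image-level shuffle gives no such guarantee, which is the leakage the review describes.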

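Experimental Results

Research Questions

RQ1: Do current pedestrian attribute datasets overestimate model performance because of identity overlap between the train and test sets?
RQ2: How does performance change under a zero-shot setting, where test identities are strictly unseen during training?
RQ3: Can a strong baseline achieve competitive results on attribute-specific regions without an explicit localization module?
RQ4: Do state-of-the-art methods provide consistent improvements over a conventional baseline on the zero-shot datasets?
RQ5: How do the common-identity and unique-identity image subsets affect evaluation metrics across datasets?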
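
RQ1 and RQ5 come down to computing the same metrics on two slices of the test set. The sketch below assumes the label-based mA and the instance-based Accuracy, Precision, Recall, and F1 definitions commonly used in the pedestrian attribute recognition literature. The review does not spell these formulas out, so the definitions, the helper names, and the toy labels are assumptions, and the printed numbers carry no experimental meaning.

```python
def mean_accuracy(y_true, y_pred):
    """Label-based mA: mean of TPR and TNR per attribute, averaged over attributes."""
    n_attr = len(y_true[0])
    per_attr = []
    for j in range(n_attr):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        pos = sum(t[j] for t in y_true)
        neg = len(y_true) - pos
        tpr = tp / pos if pos else 0.0
        tnr = tn / neg if neg else 0.0
        per_attr.append((tpr + tnr) / 2)
    return sum(per_attr) / n_attr


def instance_metrics(y_true, y_pred):
    """Instance-based accuracy, precision, and recall averaged over samples, plus F1."""
    acc = prec = rec = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
        union = sum(1 for a, b in zip(t, p) if a == 1 or b == 1)
        acc += inter / union if union else 0.0
        prec += inter / sum(p) if sum(p) else 0.0
        rec += inter / sum(t) if sum(t) else 0.0
    n = len(y_true)
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def split_by_identity(test_pids, train_pids):
    """Indices of test samples whose identity is seen / unseen in training."""
    seen = set(train_pids)
    common = [i for i, pid in enumerate(test_pids) if pid in seen]
    unique = [i for i, pid in enumerate(test_pids) if pid not in seen]
    return common, unique


# Toy data, made up for illustration: 4 test images, 3 binary attributes.
train_pids = [1, 2, 3]
test_pids = [1, 2, 7, 8]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 1]]

common, unique = split_by_identity(test_pids, train_pids)
for name, idx in (("common", common), ("unique", unique)):
    yt = [y_true[i] for i in idx]
    yp = [y_pred[i] for i in idx]
    print(name, round(mean_accuracy(yt, yp), 3), instance_metrics(yt, yp))
```

Reporting the two subsets separately, rather than one pooled score, is what makes an identity-overlap effect visible: a large gap between the common-identity and unique-identity numbers suggests the pooled score is inflated by leakage.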

Key Findings

Existing datasets show large overlap of identities between train and test sets, leading to inflated performance estimates.
The zero-shot datasets PETA$_{zs}$ and RAPv2$_{zs}$ remove this overlap and reveal significant performance drops for current methods.
A strong baseline with a ResNet50 backbone outperforms several SOTA methods when evaluated under zero-shot settings.
SOTA methods reimplemented on the same baseline yield comparable or smaller gains, indicating that the strength of the baseline, rather than the novel modules, drives performance.
Localization-focused attention modules offer limited or no additional gains on the strong baseline, suggesting localization may not be the key factor for improvement.
The proposed datasets and baseline together reveal that improvements claimed by SOTA methods on existing datasets may be artifacts of data leakage and inadequate baselines.

This review was generated by AI and reviewed by a human editor.