QUICK REVIEW

[Paper Review] Rethinking the competition between detection and ReID in Multi-Object Tracking

Chao Liang, Zhipeng Zhang | arXiv (Cornell University) | October 23, 2020

Video Surveillance and Tracking Methods | Citations: 32

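One-Line Summary

This paper presents CSTrack, a one-shot MOT framework equipped with a Reciprocal Network (REN) and a Scale-aware Attention Network (SAAN). It reduces the competition between detection and ReID, achieves state-of-the-art performance on MOT16/17/20, and runs at 16.4 FPS (34.6 FPS for the lightweight version).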

ABSTRACT

Due to balanced accuracy and speed, one-shot models which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT). However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm. This leads to inferior performance compared with existing two-stage methods. In this paper, we first dissect the reasoning process for these two tasks, which reveals that the competition between them inevitably would destroy task-dependent representations learning. To tackle this problem, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design so that to impel each branch to better learn task-dependent representations. The proposed model aims to alleviate the deleterious tasks competition, meanwhile improve the cooperation between detection and ReID. Furthermore, we introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings. By integrating the two delicately designed networks into a one-shot online MOT system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves the state-of-the-art performance on MOT16, MOT17 and MOT20 datasets, without other bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS on a single modern GPU, and its lightweight version even runs at 34.6 FPS. The complete code has been released at https://github.com/JudasDie/SOTS.

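Motivation and Objectives

Analyze why detection and ReID compete in one-shot MOT frameworks, and use that analysis to motivate the design.
Develop mechanisms that strengthen task-specific representation learning and cross-task cooperation.
Improve ID embeddings by preventing semantic misalignment across scales.
Build the online MOT tracker CSTrack and demonstrate state-of-the-art performance and efficiency.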

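Proposed Method

Introduce REN (Reciprocal Network), which uses self-relation and cross-relation to decouple task-specific features and exchange them between the detection and ReID branches.
Introduce SAAN (Scale-aware Attention Network), which fuses multi-resolution features through spatial and channel attention to obtain robust ID embeddings.
Integrate REN and SAAN into CSTrack, which is built on a JDE-like baseline.
Train with a joint loss that combines the detection loss (classification + CIoU-based regression) with the ReID loss, balanced through weights.
Perform online tracking with a JDE-inspired cascade matching strategy for data association.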
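
The joint loss described above can be illustrated with a small sketch. The CIoU term follows the standard Complete-IoU definition (IoU minus a normalized center-distance penalty and an aspect-ratio penalty). The fixed scalar weights `w_det` and `w_id`, the function names, and the box format `(x1, y1, x2, y2)` are assumptions made for illustration; this summary does not state how CSTrack actually sets its loss weights, so this is not the authors' implementation.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Return 1 - CIoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Center-distance penalty: squared distance between box centers,
    # normalized by the squared diagonal of the smallest enclosing box.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    enclose_w = max(x2, gx2) - min(x1, gx1)
    enclose_h = max(y2, gy2) - min(y1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio penalty and its trade-off coefficient.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


def joint_loss(det_cls, det_reg, reid, w_det=1.0, w_id=1.0):
    """Weighted sum of detection and ReID losses (weights are hypothetical)."""
    return w_det * (det_cls + det_reg) + w_id * reid


# Usage: a slightly shifted prediction against its ground-truth box.
reg = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.5, 0.5, 2.5, 2.5))
total = joint_loss(det_cls=0.3, det_reg=reg, reid=1.2, w_det=1.0, w_id=0.5)
```

Two properties of the regression term are visible here: perfectly overlapping boxes give a loss near zero, and disjoint boxes give a loss above 1 because the center-distance penalty still applies when the IoU is zero.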

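Experimental Results

Research Questions

RQ1: How does the competition between detection and ReID affect one-shot MOT representations and performance?
RQ2: Can REN mitigate this competition and improve task-dependent representation learning?
RQ3: Does SAAN reduce semantic misalignment across scales and thereby improve ID embeddings?
RQ4: How does CSTrack compare with state-of-the-art online MOT methods on MOT16, MOT17, and MOT20 in terms of accuracy and speed?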

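Key Results

Replacing the YOLOv3 detection baseline with YOLOv5 yields a strong performance gain and establishes a solid baseline.
REN improves MOTA by 1.9 points and IDF1 by 2.4 points, and reduces ID switches from 1798 to 1365.
SAAN improves the alignment of ID embeddings across scales, raising IDF1 by 8.6 points.
With REN and SAAN, CSTrack achieves MOTA 72.9, IDF1 71.6, and 1121 ID switches on MOT16, outperforming vanilla JDE.
Compared with one-shot baselines, CSTrack shows clear advantages in MOTA and IDF1 while retaining the efficiency of online tracking.
Overall, CSTrack achieves state-of-the-art or competitive performance on MOT16, MOT17, and MOT20, with a reported speed of 16.4 FPS on a single GPU (34.6 FPS for the lightweight version).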
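
For readers less familiar with the metrics above, the sketch below shows how MOTA is computed under the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, and works out the relative ID-switch reduction reported for REN. The function names and the counts passed to `mota` are made up for illustration and are not taken from the paper; only the 1798 and 1365 figures come from the results listed here.

```python
def mota(fn, fp, idsw, num_gt):
    """Multi-Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Hypothetical counts, only to show how the three error types combine.
example = mota(fn=300, fp=100, idsw=20, num_gt=1000)

# ID switches reported for REN in the results above: 1798 -> 1365.
print(f"{relative_reduction(1798, 1365):.1%}")  # 24.1%
```

Because ID switches are usually far less frequent than missed and false detections, a large relative drop in ID switches moves MOTA only slightly. This is why IDF1 is reported alongside it as the identity-focused metric.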

This review was generated by AI and reviewed by a human editor.