QUICK REVIEW

[Paper Review] Rethinking the competition between detection and ReID in Multi-Object Tracking

Chao Liang, Zhipeng Zhang|arXiv (Cornell University)|Oct 23, 2020

Video Surveillance and Tracking Methods | Cited by 32

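One-Sentence Summary

The paper proposes CSTrack, a one-shot MOT framework that combines a Reciprocal Network (REN) with a Scale-aware Attention Network (SAAN) to reduce the competition between detection and ReID; it reports state-of-the-art results on MOT16, MOT17, and MOT20 while running at a high frame rate.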

ABSTRACT

Due to balanced accuracy and speed, one-shot models which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT). However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm. This leads to inferior performance compared with existing two-stage methods. In this paper, we first dissect the reasoning process for these two tasks, which reveals that the competition between them inevitably would destroy task-dependent representations learning. To tackle this problem, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design so that to impel each branch to better learn task-dependent representations. The proposed model aims to alleviate the deleterious tasks competition, meanwhile improve the cooperation between detection and ReID. Furthermore, we introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings. By integrating the two delicately designed networks into a one-shot online MOT system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves the state-of-the-art performance on MOT16, MOT17 and MOT20 datasets, without other bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS on a single modern GPU, and its lightweight version even runs at 34.6 FPS. The complete code has been released at https://github.com/JudasDie/SOTS.

Motivation and Objectives

Analyze why detection and ReID compete in one-shot MOT frameworks.
Develop mechanisms to learn task-specific representations and improve cross-task collaboration.
Prevent semantic misalignment across scales to improve ID embeddings.
Build an online MOT tracker, CSTrack, and demonstrate state-of-the-art performance and efficiency.

Proposed Method

Propose Reciprocal Network (REN) with self-relation and cross-relation to decouple and then exchange task-specific features.
Introduce Scale-aware Attention Network (SAAN) to fuse multi-resolution features with spatial and channel attention for robust ID embeddings.
Integrate REN and SAAN into a one-shot MOT framework, CSTrack, built on a JDE-like baseline.
Train with a joint loss that combines a detection loss (classification plus CIoU-based regression) with a ReID loss, balanced by a tunable weight.
Perform online tracking with a cascade matching strategy inspired by JDE for data association.
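
The joint objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `ciou_loss` and `joint_loss`, the `reid_weight` parameter and its default of 0.5, and the choice to pass the classification and ReID losses in as precomputed scalars are all assumptions made for the example. The CIoU term follows the standard definition (IoU minus a normalized center-distance penalty and an aspect-ratio penalty).

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    center_dist2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
                    + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0

    # Squared diagonal of the smallest box enclosing both inputs.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    diag2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    ciou = iou - center_dist2 / diag2 - alpha * v
    return 1.0 - ciou


def joint_loss(cls_loss, pred_box, gt_box, reid_loss, reid_weight=0.5):
    """Detection loss (classification + CIoU regression) plus weighted ReID loss.

    reid_weight is a hypothetical balancing factor; its value here is
    illustrative and not taken from the paper.
    """
    det_loss = cls_loss + ciou_loss(pred_box, gt_box)
    return det_loss + reid_weight * reid_loss
```

When the predicted and ground-truth boxes coincide, the regression term is close to zero, so `joint_loss(0.2, box, box, 1.0)` reduces to roughly the classification loss plus the weighted ReID loss.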

Experimental Results

Research Questions

RQ1: How does competition between detection and ReID affect one-shot MOT representations and performance?
RQ2: Can REN mitigate this competition and improve task-dependent representation learning?
RQ3: Does SAAN mitigate semantic misalignment across scales to improve ID embeddings?
RQ4: How does CSTrack compare to state-of-the-art online MOT methods on MOT16, MOT17, and MOT20 in terms of accuracy and speed?

Key Findings

Replacing the baseline's YOLOv3 detector with YOLOv5 yields strong performance gains, establishing a solid baseline.
REN improves MOTA by 1.9 points and IDF1 by 2.4 points, and reduces ID switches from 1798 to 1365.
SAAN provides a substantial boost to IDF1 (+8.6 points) by improving ID embedding alignment across scales.
With REN and SAAN, CSTrack achieves MOTA 72.9 and IDF1 71.6 on MOT16, with 1121 ID switches, outperforming the vanilla JDE setup.
Compared to one-shot baselines, CSTrack delivers notable gains in MOTA and IDF1 while maintaining efficient online tracking performance.
Overall, CSTrack achieves state-of-the-art or competitive results on MOT16, MOT17, and MOT20, with a reported speed of 16.4 FPS on a single GPU (34.6 FPS for the lightweight version).
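
For readers less familiar with the metrics quoted above, the following sketch shows how MOTA and IDF1 are conventionally defined, together with the relative change implied by the reported ID-switch counts. The function names are assumptions made for this example, and the functions return fractions (papers usually report them as percentages). Only the counts 1798 and 1365 come from the findings above.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# ID switches reported with and without REN (values from the findings above).
ren_idsw_reduction = relative_reduction(1798, 1365)
```

Here `ren_idsw_reduction` works out to roughly 0.24, meaning REN removes about a quarter of the baseline's identity switches. Because ID switches enter the MOTA numerator directly, a drop of this kind raises MOTA as well as IDF1.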

This review was generated by AI and checked by a human editor.