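[Paper Explained] Rethinking the competition between detection and ReID in Multi-Object Tracking
This paper proposes CSTrack, a one-shot MOT framework with a Reciprocal Network (REN) and a Scale-aware Attention Network (SAAN) that reduces the competition between detection and ReID. It achieves state-of-the-art results on MOT16/17/20 while running at a high frame rate.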
Due to balanced accuracy and speed, one-shot models which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT). However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm. This leads to inferior performance compared with existing two-stage methods. In this paper, we first dissect the reasoning process for these two tasks, which reveals that the competition between them inevitably would destroy task-dependent representations learning. To tackle this problem, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design so that to impel each branch to better learn task-dependent representations. The proposed model aims to alleviate the deleterious tasks competition, meanwhile improve the cooperation between detection and ReID. Furthermore, we introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings. By integrating the two delicately designed networks into a one-shot online MOT system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves the state-of-the-art performance on MOT16, MOT17 and MOT20 datasets, without other bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS on a single modern GPU, and its lightweight version even runs at 34.6 FPS. The complete code has been released at https://github.com/JudasDie/SOTS.
Research Motivation and Objectives
- Analyze why detection and ReID compete in one-shot MOT frameworks.
- Develop mechanisms to learn task-specific representations and improve cross-task collaboration.
- Prevent semantic misalignment across scales to improve ID embeddings.
- Build an online MOT tracker CSTrack and demonstrate state-of-the-art performance and efficiency.
Proposed Method
- Propose Reciprocal Network (REN) with self-relation and cross-relation to decouple and then exchange task-specific features.
- Introduce Scale-aware Attention Network (SAAN) to fuse multi-resolution features with spatial and channel attention for robust ID embeddings.
- Integrate REN and SAAN into a one-shot MOT framework CSTrack built on a JDE-like baseline.
- Train with a joint loss combining detection loss (classification + CIoU-based regression) and ReID loss, balanced by a tunable weight.
- Perform online tracking with a cascade matching strategy inspired by JDE for data association.
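The joint training objective described above can be illustrated with a minimal sketch. It assumes the standard Complete-IoU (CIoU) definition for the regression term and a simple weighted sum for combining the terms. The function names `ciou_loss` and `joint_loss`, the `(x1, y1, x2, y2)` box format, the default `reid_weight=0.5`, and the choice to put the weight on the ReID term are all illustrative assumptions, not details taken from the paper or its released code.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper: follows the standard CIoU definition,
    CIoU = IoU - rho^2 / c^2 - alpha * v.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared distance between box centers.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off factor.
    v = (4.0 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)


def joint_loss(cls_loss, reg_loss, reid_loss, reid_weight=0.5):
    """Combine detection and ReID terms; reid_weight is an assumed default."""
    return cls_loss + reg_loss + reid_weight * reid_loss
```

In this sketch, a larger `reid_weight` pushes the shared features toward identity discrimination and a smaller one favors detection, which is the trade-off the tunable weight is meant to control.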
Experimental Results
Research Questions
- RQ1: How does competition between detection and ReID affect one-shot MOT representations and performance?
- RQ2: Can REN mitigate this competition and improve task-dependent representation learning?
- RQ3: Does SAAN mitigate semantic misalignment across scales to improve ID embeddings?
- RQ4: How does CSTrack compare to state-of-the-art online MOT methods on MOT16, MOT17, and MOT20 in terms of accuracy and speed?
Key Findings
- Replacing the detector baseline with YOLOv5 yields strong performance gains over YOLOv3, establishing a solid baseline.
- REN improves MOTA by 1.9 points and IDF1 by 2.4 points, and reduces ID switches from 1798 to 1365.
- SAAN provides a substantial boost to IDF1 (+8.6 points) by improving ID embedding alignment across scales.
- With REN and SAAN, CSTrack achieves MOTA 72.9 and IDF1 71.6 on MOT16, with 1121 ID switches, outperforming the vanilla JDE setup.
- Compared to one-shot baselines, CSTrack delivers notable gains in MOTA and IDF1 while maintaining efficient online tracking performance.
- Overall CSTrack achieves state-of-the-art/competitive results on MOT16, MOT17, and MOT20, with reported FPS of 16.4 on a single GPU (and 34.6 for a lightweight version).
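For readers less familiar with the metrics reported above, the following sketch shows how MOTA and IDF1 are conventionally computed from error counts, following the standard CLEAR MOT and identity-metric definitions. The function names and the example counts in the usage note are hypothetical and are not taken from the paper's results.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

With made-up counts such as `mota(20, 10, 5, 100)`, every missed object, false alarm, and ID switch lowers the score equally. This is why an improvement that mainly reduces ID switches tends to show up more strongly in IDF1 than in MOTA.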
This explanation was generated by AI and reviewed by human editors.