QUICK REVIEW

[Paper Review] SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

Zelin Liu, Xinggang Wang|arXiv (Cornell University)|Jun 8, 2023

Video Surveillance and Tracking Methods | Cited by 30

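One-Sentence Summary

SparseTrack introduces pseudo-depth-based scene decomposition and depth cascading matching for IoU-only data association in crowded MOT scenes, delivering competitive results on MOT17, MOT20, and DanceTrack.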

ABSTRACT

Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions still pose challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step to enhance the performance of associating occluded targets. To this end, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets and perform data association on these sparse target subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective for solving the challenging crowded scene MOT problem. Only using IoU matching, SparseTrack achieves comparable performance with the state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks. Code and models are publicly available at https://github.com/hustvl/SparseTrack.

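Motivation and Objectives

Improve the robustness of data association in crowded MOT scenes where occlusion is pervasive.
Propose a lightweight, IoU-only tracker that uses depth-based scene decomposition to mitigate the effects of occlusion.
Introduce pseudo-depth, which estimates the relative depth of targets from 2D images under a simple ground-plane prior.
Develop depth cascading matching (DCM), which performs hierarchical association across depth subsets.
Show that the proposed method is comparable to state-of-the-art methods on standard MOT benchmarks.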

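Proposed Method

Compute a pseudo-depth value for each detection and each track from the 2D image, using a ground-plane prior.
Partition the scene into depth-based subsets according to the pseudo-depth values.
Apply depth cascading matching, performing IoU-based association on each depth subset in order from near to far.
Use a Kalman filter for motion prediction and IoU distance for matching, combined with a split of detections into high-score and low-score groups that guides matching across the depth levels.
DCM is plug-and-play and can be integrated into other IoU-based trackers to improve occlusion handling.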
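
The first two steps above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that pseudo-depth is the distance from a box's bottom edge to the bottom of the image (one simple reading of the ground-plane prior) and that the observed depth range is cut into equal-width levels. The function names `pseudo_depth` and `split_by_depth` and the sample boxes are hypothetical.

```python
def pseudo_depth(box, image_height):
    """Assumed pseudo-depth: distance from the box's bottom edge to the image bottom.

    box is (x1, y1, x2, y2) in pixels with y growing downward, so a box that
    sits lower in the frame gets a smaller value and is treated as nearer.
    """
    return image_height - box[3]


def split_by_depth(depths, num_levels):
    """Group indices into num_levels equal-width depth bins, nearest bin first."""
    subsets = [[] for _ in range(num_levels)]
    if not depths:
        return subsets
    lo, hi = min(depths), max(depths)
    for i, d in enumerate(depths):
        if hi == lo:
            level = 0  # all targets share one depth, so one level suffices
        else:
            # Scale to [0, num_levels] and clamp the farthest target into the last bin.
            level = min(int((d - lo) / (hi - lo) * num_levels), num_levels - 1)
        subsets[level].append(i)
    return subsets


# Four made-up boxes in a 1080-pixel-high frame, ordered from near to far.
boxes = [
    (100, 700, 200, 1000),
    (300, 700, 380, 950),
    (500, 500, 560, 700),
    (640, 200, 680, 300),
]
depths = [pseudo_depth(b, image_height=1080) for b in boxes]
subsets = split_by_depth(depths, num_levels=3)
print(depths)   # [80, 130, 380, 780]
print(subsets)  # [[0, 1], [2], [3]]
```

Each sublist holds fewer, less-overlapping targets than the full frame, which is the sense in which a dense scene becomes several sparse ones.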

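Experimental Results

Research Questions

RQ1: Does pseudo-depth derived from 2D images reliably reveal relative depth, enabling effective depth-based scene decomposition?
RQ2: Does IoU-based data association within depth-sliced subsets (via DCM) reduce occlusion-induced errors in crowded MOT scenes?
RQ3: How does SparseTrack perform against baseline IoU-based trackers on the standard MOT benchmarks (MOT17, MOT20) and on the challenging DanceTrack dataset?
RQ4: Can depth cascading matching generalize as a drop-in replacement module for other trackers?
RQ5: How does the number of pseudo-depth levels affect association performance in dense scenes?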

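Key Findings

Dataset	Tracker	HOTA↑	MOTA↑	IDF1↑	FP↓	FN↓	IDs↓	FPS↑
MOT17	SparseTrack (IoU-only, ours)	65.1	81.0	80.1	23904	81927	1170	19.9
MOT17	ByteTrack	63.1	80.3	77.3	25491	83721	2196	29.6
MOT17	BoT-SORT-ReID	65.0	80.5	80.2	22521	86037	1212	4.5
MOT20	SparseTrack (IoU-only, ours)	63.4	78.2	77.3	25108	86720	1116	12.5
MOT20	ByteTrack	61.3	77.8	75.2	26249	87594	1223	17.5
MOT20	BoT-SORT	62.6	77.7	76.3	22521	86037	1212	6.6
DanceTrack	SparseTrack (IoU-only, ours)	55.5	91.3	58.3	39.1	78.9	-	12.5

Note: in the DanceTrack row, 39.1 and 78.9 cannot be FP and FN counts; they appear to be AssA and DetA scores carried over from a table with different columns.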
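
To make the MOT17 comparison concrete, the gaps between SparseTrack and ByteTrack can be computed directly from the rows above. The snippet below is only arithmetic on the reported numbers; the dictionary layout and the helper name `compare` are illustrative choices, not part of the paper.

```python
# MOT17 rows copied from the table above.
mot17 = {
    "SparseTrack": {"HOTA": 65.1, "MOTA": 81.0, "IDF1": 80.1, "IDs": 1170, "FPS": 19.9},
    "ByteTrack": {"HOTA": 63.1, "MOTA": 80.3, "IDF1": 77.3, "IDs": 2196, "FPS": 29.6},
}


def compare(ours, base):
    """Return point gains for the accuracy metrics and relative changes for IDs and FPS."""
    return {
        "HOTA_gain": round(ours["HOTA"] - base["HOTA"], 1),
        "MOTA_gain": round(ours["MOTA"] - base["MOTA"], 1),
        "IDF1_gain": round(ours["IDF1"] - base["IDF1"], 1),
        # Percentage of the baseline's identity switches that disappear.
        "IDs_reduction_pct": round(100 * (base["IDs"] - ours["IDs"]) / base["IDs"], 1),
        # Speed relative to the baseline (1.0 would mean equal speed).
        "FPS_ratio": round(ours["FPS"] / base["FPS"], 2),
    }


summary = compare(mot17["SparseTrack"], mot17["ByteTrack"])
print(summary)
```

On these numbers SparseTrack gains 2.0 HOTA, 0.7 MOTA, and 2.8 IDF1 over ByteTrack and cuts identity switches by roughly 47%, while running at about two thirds of ByteTrack's frame rate.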

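With IoU-only data association, SparseTrack achieves competitive results on MOT17: 65.1 HOTA, 81.0 MOTA, and 80.1 IDF1 on the MOT17 test set.
On MOT20, SparseTrack reaches 63.4 HOTA, 78.2 MOTA, and 77.3 IDF1, outperforming the IoU-based baseline ByteTrack (61.3 HOTA, 77.8 MOTA, 75.2 IDF1).
On DanceTrack, SparseTrack reaches 55.5 HOTA, 91.3 MOTA, and 58.3 IDF1, a strong result for an IoU-only method and a significant improvement over the baseline.
Depth-based scene decomposition through pseudo-depth and DCM consistently improves association metrics across multiple baselines, in some cases matching or approaching SOTA without any appearance features.
The DCM module is plug-and-play: integrating it into other trackers that rely on IoU-based data association improves their performance.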
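
The near-to-far cascade that the findings refer to can be sketched as a single function that an IoU-based tracker could call in place of one global matching step. This is a simplified illustration under several assumptions that the summary does not confirm: pseudo-depth is taken as the distance from a box's bottom edge to the image bottom, depth levels are equal-width bins, tracks and detections left unmatched at one level are carried into the next, and a greedy highest-IoU-first matcher stands in for whatever assignment solver the tracker uses. Kalman prediction and the high-score/low-score detection split are omitted, and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(t_ids, d_ids, tracks, dets, iou_thresh):
    """Pair tracks and detections greedily, highest IoU first (a stand-in solver)."""
    scored = sorted(((iou(tracks[t], dets[d]), t, d) for t in t_ids for d in d_ids),
                    reverse=True)
    used_t, used_d, pairs = set(), set(), []
    for score, t, d in scored:
        if score < iou_thresh:
            break  # every remaining candidate pair is below the threshold
        if t not in used_t and d not in used_d:
            used_t.add(t)
            used_d.add(d)
            pairs.append((t, d))
    return pairs


def depth_cascade_match(tracks, dets, image_height, num_levels=3, iou_thresh=0.3):
    """Match level by level from near to far; return (matches, unmatched tracks, unmatched dets)."""
    depths = [image_height - b[3] for b in list(tracks) + list(dets)]
    if not depths:
        return [], [], []
    lo, hi = min(depths), max(depths)

    def level_of(box):
        # Equal-width bin index of the box's assumed pseudo-depth.
        if hi == lo:
            return 0
        return min(int((image_height - box[3] - lo) / (hi - lo) * num_levels),
                   num_levels - 1)

    t_level = [level_of(b) for b in tracks]
    d_level = [level_of(b) for b in dets]
    matches, left_t, left_d = [], [], []
    for level in range(num_levels):
        # Leftovers from nearer levels get another chance at this level.
        cand_t = left_t + [i for i, lv in enumerate(t_level) if lv == level]
        cand_d = left_d + [j for j, lv in enumerate(d_level) if lv == level]
        found = greedy_match(cand_t, cand_d, tracks, dets, iou_thresh)
        matches.extend(found)
        hit_t = {t for t, _ in found}
        hit_d = {d for _, d in found}
        left_t = [t for t in cand_t if t not in hit_t]
        left_d = [d for d in cand_d if d not in hit_d]
    return matches, left_t, left_d


# Two overlapping people: a near one lower in the frame and a far one behind.
tracks = [(10, 40, 30, 90), (14, 30, 30, 70)]
dets = [(15, 31, 31, 71), (11, 41, 31, 91)]
matches, lost_tracks, new_dets = depth_cascade_match(tracks, dets, image_height=100,
                                                     num_levels=2)
print(matches)  # [(0, 1), (1, 0)]
```

In the example the near pair is resolved at the first level and the far pair at the second, so neither box competes with the other's detection during matching.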

This review was generated by AI and reviewed by human editors.