QUICK REVIEW

[Paper Review] MOT20: A benchmark for multi object tracking in crowded scenes

Patrick Dendorfer, Hamid Rezatofighi|arXiv (Cornell University)|Mar 19, 2020

Video Surveillance and Tracking Methods | References: 16 | Cited by: 510

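One-sentence summary

MOT20 extends MOTChallenge with eight highly crowded pedestrian sequences and provides standardized annotations, public detections, and an evaluation protocol for stress-testing trackers in extremely crowded scenes.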

ABSTRACT

Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of multiple object tracking methods. The challenge focuses on multiple people tracking, since pedestrians are well studied in the tracking community, and precise tracking and detection has high practical relevance. Since the first release, MOT15, MOT16, and MOT17 have tremendously contributed to the community by introducing a clean dataset and precise framework to benchmark multi-object trackers. In this paper, we present our MOT20 benchmark, consisting of 8 new sequences depicting very crowded challenging scenes. The benchmark was presented first at the 4th BMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2019, and gives the chance to evaluate state-of-the-art methods for multiple object tracking when handling extremely crowded scenarios.

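Motivation and objectives

Provide a challenging, standardized benchmark for multi-object tracking in crowded scenes.
Extend the earlier MOTChallenge releases with higher pedestrian density to test generalization and robustness.
Supply careful annotations, public detections, and a consistent evaluation framework so that trackers can be compared fairly.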

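Proposed approach

Define the target classes and annotation rules, focusing on moving pedestrians while excluding distractors from the evaluation.
Curate eight high-density sequences (up to 246 pedestrians per frame) covering indoor/outdoor and day/night conditions.
Provide a train/test split with ground-truth annotations for training, plus public detections for tracking evaluation.
Use a standardized data format (CSV) for detections and annotations, with per-sequence results submitted together in a ZIP archive.
Adopt the CLEAR metrics and track-quality measures (MOTA, MOTP, MT/PT/ML, ID switches, fragmentation) for a comprehensive evaluation.
Present a Faster R-CNN (ResNet101) detector trained on the MOT20 training data as the public baseline.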
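The CSV format mentioned above can be read with a few lines of Python. The sketch below is illustrative only: it assumes the column order commonly used in MOTChallenge files (frame, id, left, top, width, height, confidence, followed by extra columns that are ignored here), which this review does not spell out. The names `Box`, `parse_mot_csv`, `boxes_per_frame`, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Box(NamedTuple):
    frame: int
    track_id: int  # assumed convention: -1 when the row carries no identity
    left: float
    top: float
    width: float
    height: float
    conf: float


def parse_mot_csv(text: str) -> list[Box]:
    """Parse comma-separated rows; only the first seven columns are used."""
    boxes = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        frame, track_id = int(row[0]), int(row[1])
        left, top, width, height, conf = map(float, row[2:7])
        boxes.append(Box(frame, track_id, left, top, width, height, conf))
    return boxes


def boxes_per_frame(boxes: list[Box]) -> Counter:
    """Count boxes in each frame, a simple proxy for crowd density."""
    return Counter(b.frame for b in boxes)


# Made-up detection rows, not taken from the dataset.
SAMPLE = """\
1,-1,100.0,200.0,40.0,90.0,0.98,-1,-1,-1
1,-1,300.0,210.0,38.0,85.0,0.91,-1,-1,-1
2,-1,102.0,201.0,40.0,90.0,0.97,-1,-1,-1
"""

boxes = parse_mot_csv(SAMPLE)
density = boxes_per_frame(boxes)
print(density[1], density[2])
```

Counting boxes per frame in this way is how a density figure such as "246 pedestrians per frame" could be checked against an annotation file, under the column-order assumption stated above.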

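Experimental results

Research questions

RQ1: How do state-of-the-art trackers perform in extremely crowded scenes?
RQ2: Do detectors and trackers generalize to scenes and conditions not seen during training?
RQ3: In dense crowds, how do the conventional MOT metrics (MOTA, MOTP) relate to trajectory-quality measures (MT/PT/ML, ID switches)?
RQ4: How does using public versus private detections affect tracker evaluation?
RQ5: How robust are tracking methods to occlusion and high-density scenes?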
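To make the contrast in RQ3 concrete, here is a minimal sketch of MOTA next to the MT/PT/ML labels. It uses the standard CLEAR definition MOTA = 1 - (FN + FP + IDSW) / GT and the commonly used 80% / 20% coverage thresholds for mostly tracked and mostly lost; the review itself does not state these thresholds, so treat them as an assumption. The two trackers and all counts are invented for illustration.

```python
from collections import Counter


def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    """CLEAR MOT accuracy with all counts summed over every frame."""
    if gt <= 0:
        raise ValueError("gt must be positive")
    return 1.0 - (fn + fp + idsw) / gt


def classify_track(tracked_frames: int, total_frames: int) -> str:
    """Label one ground-truth track by the fraction of its life span covered."""
    ratio = tracked_frames / total_frames
    if ratio >= 0.8:  # assumed threshold for "mostly tracked"
        return "MT"
    if ratio < 0.2:   # assumed threshold for "mostly lost"
        return "ML"
    return "PT"


# Hypothetical setup: 10 ground-truth tracks of 100 frames each (GT = 1000).
TRACK_LEN = 100
tracker_a = [70] * 10            # every track covered for 70 of 100 frames
tracker_b = [100] * 7 + [0] * 3  # seven fully covered, three never found

for name, covered in (("A", tracker_a), ("B", tracker_b)):
    fn = sum(TRACK_LEN - c for c in covered)
    score = mota(fn=fn, fp=0, idsw=0, gt=TRACK_LEN * len(covered))
    labels = Counter(classify_track(c, TRACK_LEN) for c in covered)
    print(name, round(score, 2), dict(labels))
```

Both hypothetical trackers miss 300 boxes and therefore get the same MOTA, yet A yields ten partially tracked trajectories while B yields seven mostly tracked and three mostly lost. This is why the two families of measures are reported side by side.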

Key findings

Sequence	AP	Rcll	Prcn	FAR	GT	TP	FP	FN	MODA	MODP
MOT20-01	0.82	86.5	99.5	0.14	12945	11199	58	1746	86.06	91.61
MOT20-02	0.82	85.9	99.5	0.15	93107	79971	421	13136	85.44	92.13
MOT20-03	0.54	59.0	98.4	1.10	278148	163988	2653	114160	58.00	86.00
MOT20-05	0.63	64.2	99.4	0.60	528037	338826	1979	189211	63.79	87.59
MOT20-04	0.63	69.7	98.0	1.55	230729	160783	3230	69946	68.29	81.41
MOT20-06	0.43	57.9	74.4	12.64	63889	37002	12745	26887	37.97	73.67
MOT20-07	0.78	83.6	92.5	1.89	16298	13627	1106	2671	76.83	79.11
MOT20-08	0.38	55.2	61.6	13.93	32608	17998	11230	14610	20.76	71.55
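The recall, precision, and MODA columns can be recomputed from the GT, TP, FP, and FN counts in the table. The sketch below assumes the usual definitions (recall = TP / GT, precision = TP / (TP + FP), MODA = 1 - (FN + FP) / GT), which the review does not state explicitly; recomputed values may differ from the printed ones in the last digit because of rounding. Only three rows are copied in to keep the example short.

```python
# (GT, TP, FP, FN) copied from the table above.
ROWS = {
    "MOT20-01": (12945, 11199, 58, 1746),
    "MOT20-06": (63889, 37002, 12745, 26887),
    "MOT20-08": (32608, 17998, 11230, 14610),
}


def detection_scores(gt: int, tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (recall %, precision %, MODA %) under the assumed definitions."""
    recall = 100.0 * tp / gt
    precision = 100.0 * tp / (tp + fp)
    moda = 100.0 * (1.0 - (fn + fp) / gt)
    return round(recall, 1), round(precision, 1), round(moda, 2)


for name, (gt, tp, fp, fn) in ROWS.items():
    assert tp + fn == gt  # every ground-truth box is either matched or missed
    print(name, detection_scores(gt, tp, fp, fn))
```

The gap between MOT20-01 and MOT20-08 comes mostly from false positives: with similar orders of magnitude of ground-truth boxes, MOT20-08 has 11230 false positives against 58, which pulls both precision and MODA down sharply.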

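The MOT20 dataset contains 8 sequences from 3 scenes, with densities reaching up to 246 pedestrians per frame.
Public Faster R-CNN detections (trained on the MOT20 training data) are provided as the baseline for tracking evaluation.
The training sequences contain 1,134,614 boxes in total; the test sequences contain 517,426 boxes across their frames.
The reported scores vary across sequences: among MOT20-01, -02, -03, and -05, AP ranges from 0.54 to 0.82, and MT/ML vary as well, reflecting the difficulty of crowded scenes.
The number of detections per sequence differs widely, from about 12k to 381k, and the minimum/maximum box heights indicate diverse scales.
Across the eight sequences, the detection baseline reaches AP between 0.38 and 0.82, with MODA from 20.76 to 86.06 and MODP from 71.55 to 92.13, which shows how challenging the crowded scenes are.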


This review was generated by AI and reviewed by a human editor.