QUICK REVIEW

[Paper Review] SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking

Ziqi Pang, Zhichao Li | arXiv (Cornell University) | Nov 18, 2021
Video Surveillance and Tracking Methods | 43 references | 28 citations
TL;DR

The paper deconstructs 3D MOT into four modules, analyzes failure modes, and proposes SimpleTrack, a strong, simple baseline with improvements like stricter NMS, Generalized IoU for association, and a two-stage association to extend track life.

ABSTRACT

3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm. Despite their progress and usefulness, an in-depth analysis of their strengths and weaknesses is not yet available. In this paper, we summarize current 3D MOT methods into a unified framework by decomposing them into four constituent parts: pre-processing of detection, association, motion model, and life cycle management. We then ascribe the failure cases of existing algorithms to each component and investigate them in detail. Based on the analyses, we propose corresponding improvements which lead to a strong yet simple baseline: SimpleTrack. Comprehensive experimental results on Waymo Open Dataset and nuScenes demonstrate that our final method could achieve new state-of-the-art results with minor modifications. Furthermore, we take additional steps and rethink whether current benchmarks authentically reflect the ability of algorithms for real-world challenges. We delve into the details of existing benchmarks and find some intriguing facts. Finally, we analyze the distribution and causes of remaining failures in SimpleTrack and propose future directions for 3D MOT. Our code is available at https://github.com/TuSimple/SimpleTrack.

Motivation & Objective

  • Decouple tracking-by-detection 3D MOT into pre-processing, motion model, association, and life cycle management to identify failure points.
  • Propose simple yet effective improvements for each module to build a strong baseline.
  • Evaluate SimpleTrack on Waymo Open Dataset and nuScenes to establish state-of-the-art performance.
  • Reassess benchmarks and propose directions for future 3D MOT research and evaluation.

Proposed method

  • Decompose 3D MOT into four modules: pre-processing of detections, motion model, association, and life cycle management.
  • Apply stricter non-maximum suppression (NMS) in pre-processing to improve precision while preserving recall.
  • Use Generalized IoU (GIoU) as the association metric to better handle IoU- and distance-based failures.
  • Adopt a two-stage association to extend track life by matching low-score detections to unmatched tracklets after a high-threshold pass.
  • Compare motion models (Kalman Filter vs. Constant Velocity) and show that they perform similarly overall, with advantages that depend on context.
  • Integrate tracklet interpolation and motion-model-based predictions to improve recall and output scoring, tailored to evaluation protocols.
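
To make the GIoU association metric above concrete, here is a minimal sketch. It assumes axis-aligned 2D boxes given as (x1, y1, x2, y2) with positive area, which is a simplification: the paper works with 3D boxes, and the function name `giou` is a hypothetical helper, not the authors' code. The point it illustrates is that plain IoU is 0 for any pair of non-overlapping boxes, while GIoU still ranks a nearby box above a distant one.

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Simplifying assumption: boxes are axis-aligned, 2D, and have positive area.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # IoU minus the fraction of the enclosing box not covered by the union.
    return inter / union - (hull - union) / hull


track = (0, 0, 1, 1)
same = giou(track, (0, 0, 1, 1))    # identical boxes
near = giou(track, (2, 0, 3, 1))    # disjoint, one unit away
far = giou(track, (10, 0, 11, 1))   # disjoint, nine units away
```

Both `near` and `far` have an IoU of 0 with `track`, but their GIoU values differ, so an association step can still prefer the closer detection.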

Experimental results

Research questions

  • RQ1: What are the main failure modes of current 3D MOT methods across the four pipeline components?
  • RQ2: Can simple, well-founded changes (NMS, GIoU, two-stage association) yield state-of-the-art performance on major benchmarks?
  • RQ3: How do detection frequency and interpolation-based evaluation influence 3D MOT performance on nuScenes and Waymo Open Dataset?
  • RQ4: What are the upper-bound limits and remaining challenges for tracking-by-detection approaches in 3D MOT?

Key findings

  • SimpleTrack achieves results on Waymo Open Dataset and nuScenes that are competitive with or at the state of the art, using only modest modifications.
  • Stricter NMS improves precision substantially with relatively small recall loss.
  • GIoU-based association mitigates both IoU and distance-based failures and performs well with both bipartite matching and greedy strategies.
  • Two-stage association dramatically reduces ID switches by better maintaining track lifecycles, with minimal impact on MOTA.
  • Motion model choice (KF vs. CV) yields context-dependent gains; KF is generally helpful in higher-frequency settings, while CV can be more robust at lower frequencies.
  • Using motion-model predictions for high-frequency frames and then extending lifecycles with low-score detections improves AMOTA and reduces ID switches on nuScenes, particularly under 10 Hz settings.
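
The two-stage association described in the findings above can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: the function names, the greedy matcher, the score thresholds (`score_hi`, `score_lo`), and the 1D negative-distance affinity are all assumptions chosen to keep the example short. In practice the affinity would be something like GIoU between predicted and detected boxes.

```python
def greedy_match(track_ids, det_ids, affinity, min_affinity):
    """Greedily pair tracks and detections in order of descending affinity."""
    pairs = sorted(
        ((affinity(t, d), t, d) for t in track_ids for d in det_ids),
        key=lambda p: p[0],
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for score, t, d in pairs:
        if score < min_affinity:
            break  # all remaining pairs are even worse
        if t in used_tracks or d in used_dets:
            continue
        matches[t] = d
        used_tracks.add(t)
        used_dets.add(d)
    return matches


def two_stage_associate(track_ids, det_ids, det_score, affinity,
                        score_hi=0.5, score_lo=0.1, min_affinity=-1.0):
    """Stage 1 matches confident detections; stage 2 lets low-score
    detections keep otherwise unmatched tracks alive.
    Threshold values are illustrative assumptions."""
    high = [d for d in det_ids if det_score[d] >= score_hi]
    low = [d for d in det_ids if score_lo <= det_score[d] < score_hi]

    confirmed = greedy_match(track_ids, high, affinity, min_affinity)
    leftover = [t for t in track_ids if t not in confirmed]
    kept_alive = greedy_match(leftover, low, affinity, min_affinity)
    return confirmed, kept_alive


# Toy 1D example: track "B" only has a weak detection nearby.
track_pos = {"A": 0.0, "B": 10.0}
det_pos = {0: 0.2, 1: 10.3}
det_score = {0: 0.9, 1: 0.2}


def affinity(t, d):
    # Stand-in affinity: negative distance between track and detection.
    return -abs(track_pos[t] - det_pos[d])


confirmed, kept_alive = two_stage_associate(
    list(track_pos), list(det_pos), det_score, affinity
)
```

Here track "A" is matched in the first stage, while track "B" would have been left unmatched by a single high-threshold pass. The second stage pairs it with the low-score detection, so its life cycle continues instead of ending and later restarting under a new ID.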

This review was created by AI and reviewed by human editors.