QUICK REVIEW

[Paper Review] StrongSORT: Make DeepSORT Great Again

Yunhao Du, Zhicheng Zhao | arXiv (Cornell University) | Feb 28, 2022
Video Surveillance and Tracking Methods | 20 citations
TL;DR

The paper revisits DeepSORT to build StrongSORT, a strong and fair MOT baseline, and adds two lightweight plug-in modules (AFLink and GSI) to create StrongSORT++ with state-of-the-art results on MOT17, MOT20, DanceTrack, and KITTI.

ABSTRACT

Recently, Multi-Object Tracking (MOT) has attracted rising attention, and accordingly, remarkable progress has been achieved. However, the existing methods tend to use various basic models (e.g., detector and embedding model) and different training or inference tricks. As a result, the construction of a good baseline for a fair comparison is essential. In this paper, a classic tracker, i.e., DeepSORT, is first revisited, and then is significantly improved from multiple perspectives such as object detection, feature embedding, and trajectory association. The proposed tracker, named StrongSORT, contributes a strong and fair baseline for the MOT community. Moreover, two lightweight and plug-and-play algorithms are proposed to address two inherent "missing" problems of MOT: missing association and missing detection. Specifically, unlike most methods, which associate short tracklets into complete trajectories at high computation complexity, we propose an appearance-free link model (AFLink) to perform global association without appearance information, and achieve a good balance between speed and accuracy. Furthermore, we propose a Gaussian-smoothed interpolation (GSI) based on Gaussian process regression to relieve the missing detection. AFLink and GSI can be easily plugged into various trackers with a negligible extra computational cost (1.7 ms and 7.1 ms per image, respectively, on MOT17). Finally, by fusing StrongSORT with AFLink and GSI, the final tracker (StrongSORT++) achieves state-of-the-art results on multiple public benchmarks, i.e., MOT17, MOT20, DanceTrack and KITTI. Code is available at https://github.com/dyhBUPT/StrongSORT and https://github.com/open-mmlab/mmtracking.
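
The abstract describes GSI as Gaussian process regression applied to each trajectory. Below is a minimal NumPy sketch for one box coordinate of one track: it first fills missing frames by linear interpolation, then replaces the whole sequence with the GP posterior mean under an RBF kernel. The helper names are hypothetical, and the adaptive length-scale rule (tau * log(tau**3 / n) with n frames and tau = 10), its clipping range, and the noise variance are assumptions for illustration rather than values stated in this review.

```python
import numpy as np


def linear_fill(frames, values):
    """Linearly interpolate one box coordinate over every frame between the
    first and last observation (frames must be sorted and unique)."""
    frames = np.asarray(frames, dtype=float)
    values = np.asarray(values, dtype=float)
    full = np.arange(frames[0], frames[-1] + 1)
    return full, np.interp(full, frames, values)


def rbf_kernel(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two 1-D arrays of frame indices."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gsi_1d(frames, values, tau=10.0, noise=1.0):
    """Fill gaps linearly, then smooth with the GP posterior mean.

    ASSUMPTIONS: the length-scale rule, its clipping range, and the noise
    variance are illustrative choices, not confirmed settings.
    """
    t, y = linear_fill(frames, values)
    # Longer trajectories get a shorter length scale (less aggressive smoothing).
    length = np.clip(tau * np.log(tau ** 3 / len(t)), 1.0 / tau, tau ** 2)
    k = rbf_kernel(t, t, length)
    mean = y.mean()  # zero-mean GP prior on the centered signal
    weights = np.linalg.solve(k + noise * np.eye(len(t)), y - mean)
    return t, mean + k @ weights
```

In a full tracker this would run separately for x, y, width, and height of every track ID; the linear fill handles the missing detections, and the GP step smooths the jitter that plain linear interpolation leaves in place.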

Motivation & Objective

  • Provide a strong, fair baseline for tracking-by-detection MOT methods to enable fair comparisons.
  • Improve DeepSORT by upgrading detector, embedding, and inference tricks while maintaining efficiency.
  • Address two typical MOT problems, missing association and missing detection, via lightweight, plug-and-play modules (AFLink and GSI).
  • Demonstrate state-of-the-art performance on multiple public MOT benchmarks.
  • Offer open-source code to facilitate adoption in academia and industry.

Proposed method

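  • Upgrade DeepSORT with a stronger detector (YOLOX-X) and a stronger appearance embedding model (BoT).
  • Replace the feature bank with an exponential moving average (EMA) update of each track's appearance feature to reduce detection noise.
  • Add camera motion compensation via ECC (enhanced correlation coefficient maximization) and replace the vanilla Kalman filter with the NSA Kalman filter, which adapts the measurement noise to detection confidence.
  • Use a motion-aware cost that combines appearance and motion: $C = \lambda A_a + (1-\lambda) A_m$, where $A_a$ is the appearance cost, $A_m$ the motion cost, and $\lambda$ a weighting factor.
  • Replace the matching cascade with vanilla global linear assignment, because the cascade's prior becomes overly restrictive once the tracker itself is strong.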
  • Introduce AFLink, an appearance-free global tracklet linking model that uses only spatiotemporal information to predict, as a binary classifier, whether two tracklets belong to the same object.
  • Introduce Gaussian-smoothed interpolation (GSI) based on Gaussian process regression to interpolate missing detections with an adaptive smoothness parameter, improving trajectory localization.
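
The EMA appearance update replaces DeepSORT's gallery of past features with a single running vector per track. A minimal sketch, assuming L2-normalized embeddings and a momentum of alpha = 0.9 (an assumed illustrative value); the function names are hypothetical:

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def ema_update(track_feat, det_feat, alpha=0.9):
    """Return the track's new appearance state after matching a detection.

    track_feat: previous EMA state (None for a newly created track).
    det_feat:   embedding of the matched detection.
    alpha:      momentum on the old state (0.9 is an ASSUMED value).
    """
    det_feat = l2_normalize(np.asarray(det_feat, dtype=float))
    if track_feat is None:
        return det_feat
    blended = alpha * track_feat + (1.0 - alpha) * det_feat
    return l2_normalize(blended)  # re-normalize so cosine distances stay comparable


def cosine_distance(track_feat, det_feat):
    """Appearance cost between a track state and a detection embedding."""
    det_unit = l2_normalize(np.asarray(det_feat, dtype=float))
    return 1.0 - float(np.dot(track_feat, det_unit))
```

Because a single noisy detection only moves the state by a factor of (1 - alpha), an occluded or blurred crop cannot overwrite the track's appearance, and matching compares one vector per track instead of a whole gallery.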
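
Camera motion compensation means moving each track's predicted box by the frame-to-frame camera warp before association. The sketch below only applies a given 2x3 affine matrix; in practice such a matrix can be estimated with OpenCV's `cv2.findTransformECC`, but here it is simply assumed to map previous-frame coordinates to current-frame coordinates (check the convention of whatever estimator you use). `warp_box` is a hypothetical helper, and taking the axis-aligned bound of the four warped corners is one illustrative choice, not necessarily StrongSORT's exact procedure.

```python
import numpy as np


def warp_box(box, warp):
    """Map an (x1, y1, x2, y2) box through a 2x3 affine warp and return the
    axis-aligned box enclosing the four warped corners."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1.0],
                        [x2, y1, 1.0],
                        [x1, y2, 1.0],
                        [x2, y2, 1.0]])
    warped = corners @ np.asarray(warp, dtype=float).T  # shape (4, 2)
    (min_x, min_y), (max_x, max_y) = warped.min(axis=0), warped.max(axis=0)
    return np.array([min_x, min_y, max_x, max_y])
```

For example, a pure camera pan of +5 px horizontally and -3 px vertically, `warp_box((10, 20, 30, 40), [[1, 0, 5], [0, 1, -3]])`, shifts the box to (15, 17, 35, 37), so the motion cost compares detections against where the object should appear after the camera moved.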
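
The fused cost and the switch from a matching cascade to one global assignment can be illustrated together. DeepSORT's cascade solves a sequence of smaller assignments, giving priority to recently updated tracks; the global variant solves a single problem over all tracks at once. This sketch assumes the appearance and motion cost matrices are already computed, uses lam = 0.98 and `max_cost` purely as illustrative values, and finds the optimum by exhaustive search so it needs no extra dependencies. A real tracker would use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`.

```python
import itertools

import numpy as np


def fused_cost(appearance, motion, lam=0.98):
    """C = lam * A_a + (1 - lam) * A_m (lam = 0.98 is an ASSUMED value)."""
    appearance = np.asarray(appearance, dtype=float)
    motion = np.asarray(motion, dtype=float)
    return lam * appearance + (1.0 - lam) * motion


def global_assignment(cost, max_cost):
    """Minimum-total-cost one-to-one matching of tracks (rows) to detections
    (columns) by exhaustive search, then drop pairs costlier than max_cost.
    Exponential time: suitable only for tiny illustrative matrices."""
    cost = np.asarray(cost, dtype=float)
    n_rows, n_cols = cost.shape
    if n_rows > n_cols:
        # Solve the transposed problem and swap the indices back.
        return [(r, c) for c, r in global_assignment(cost.T, max_cost)]
    best_total, best_pairs = np.inf, []
    for cols in itertools.permutations(range(n_cols), n_rows):
        pairs = list(zip(range(n_rows), cols))
        total = sum(cost[r, c] for r, c in pairs)
        if total < best_total:
            best_total, best_pairs = total, pairs
    # Gate: an optimal pairing can still be a bad match.
    return [(r, c) for r, c in best_pairs if cost[r, c] <= max_cost]
```

With two tracks and two detections, `global_assignment(fused_cost([[0.1, 0.9], [0.8, 0.2]], [[1.0, 5.0], [6.0, 1.5]]), 0.5)` pairs track 0 with detection 0 and track 1 with detection 1, since that combination has the lowest total fused cost.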

Experimental results

Research questions

  • RQ1: How can a classic MOT framework (DeepSORT) be re-engineered to serve as a strong, fair baseline for modern MOT methods?
  • RQ2: Can lightweight, appearance-free linkage (AFLink) and Gaussian-process-based interpolation (GSI) improve association and restoration of trajectories without heavy computational cost?
  • RQ3: Do AFLink and GSI generalize across different trackers to yield consistent performance gains?
  • RQ4: What is the impact of replacing cascade matching with vanilla global assignment on stronger trackers?
  • RQ5: Do StrongSORT and StrongSORT++ achieve state-of-the-art results on MOT17, MOT20, DanceTrack, and KITTI?

Key findings

Mode | Method | Ref. | HOTA (↑) | IDF1 (↑) | MOTA (↑) | AssA (↑) | DetA (↑) | IDs (↓) | FPS (↑)
online | SORT | [3] | 34.0 | 39.8 | 43.1 | 31.8 | 37.0 | 4,852 | 143.3
online | StrongSORT | ours | 63.5 | 78.5 | 78.3 | 63.7 | 63.6 | 1,446 | 7.5
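
HOTA is defined at each localization threshold as the geometric mean of detection accuracy and association accuracy, then averaged over thresholds, so a row's HOTA should land close to sqrt(DetA × AssA). A quick sanity check on the table's DetA and AssA columns (the averaging makes this an approximation, not an exact identity):

```python
import math

rows = {
    "SORT": {"DetA": 37.0, "AssA": 31.8},
    "StrongSORT": {"DetA": 63.6, "AssA": 63.7},
}


def approx_hota(det_a, ass_a):
    """Geometric mean of DetA and AssA, approximating the reported HOTA."""
    return math.sqrt(det_a * ass_a)


for name, m in rows.items():
    print(f"{name}: approx HOTA = {approx_hota(m['DetA'], m['AssA']):.2f}")
```
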
  • Replacing DeepSORT components with stronger detector and embedding improves IDF1 and related metrics.
  • EMA-based appearance updates, ECC, and the NSA Kalman filter provide incremental gains in IDF1 and MOTA, and EMA additionally improves speed.
  • Adding the motion cost (MC) to the appearance cost in matching improves association; abandoning the matching cascade can further boost performance for stronger baselines.
  • AFLink yields notable gains in IDF1 and HOTA across trackers, especially those with missing associations.
  • GSI provides improvements in IDF1, MOTA, and HOTA by smoothing trajectories with Gaussian-process-based interpolation, while maintaining reasonable FPS.
  • StrongSORT++ (StrongSORT with AFLink and GSI) achieves state-of-the-art results on MOT17, MOT20, DanceTrack, and KITTI in various settings.
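
The NSA Kalman filter mentioned in the findings changes only the measurement noise: confident detections are trusted more. A minimal sketch, assuming the scaling rule R' = (1 - c) * R with detection confidence c in [0, 1], plugged into a standard Kalman measurement update; the function names and the 1-D toy state are hypothetical.

```python
import numpy as np


def nsa_noise(R, confidence):
    """Shrink the measurement-noise covariance for confident detections:
    R' = (1 - c) * R (ASSUMED scaling rule)."""
    return (1.0 - confidence) * np.asarray(R, dtype=float)


def kalman_update(x, P, z, H, R):
    """Standard Kalman measurement update."""
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ (z - H @ x)     # correct the state toward the measurement
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new


x, P = np.array([0.0]), np.array([[1.0]])
z, H, R = np.array([1.0]), np.array([[1.0]]), np.array([[1.0]])
low, _ = kalman_update(x, P, z, H, nsa_noise(R, 0.0))
high, _ = kalman_update(x, P, z, H, nsa_noise(R, 0.9))
```

With prior and measurement equally uncertain, a zero-confidence detection moves the state halfway to the measurement (0.5), while a detection with confidence 0.9 pulls it to about 0.91.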

This review was created by AI and reviewed by human editors.