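[Paper Review] Tracklets Predicting Based Adaptive Graph Tracking
TPAGT proposes motion-based re-extraction of tracklet features and an adaptive graph neural network that fuses location, appearance, and historical information to achieve robust multi-object tracking. It reports state-of-the-art MOTA scores on MOT16/17.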
Most of the existing tracking methods link the detected boxes to the tracklets using a linear combination of feature cosine distances and box overlap. But the problem of inconsistent features of an object in two different frames still exists. In addition, when extracting features, only appearance information is utilized, neither the location relationship nor the information of the tracklets is considered. We present an accurate and end-to-end learning framework for multi-object tracking, namely TPAGT. It re-extracts the features of the tracklets in the current frame based on motion predicting, which is the key to solve the problem of features inconsistent. The adaptive graph neural network in TPAGT is adopted to fuse locations, appearance, and historical information, and plays an important role in distinguishing different objects. In the training phase, we propose the balanced MSE LOSS to successfully overcome the unbalanced samples. Experiments show that our method reaches state-of-the-art performance. It achieves 76.5% MOTA on the MOT16 challenge and 76.2% MOTA on the MOT17 challenge.
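Research Motivation and Objectives
- Achieve robust multi-object tracking by resolving the inconsistency of an object's features across frames.
- Develop a method that re-extracts tracklet features in the current frame based on motion prediction.
- Fuse location, appearance, and historical information through an adaptive graph neural network.
- Use a balanced MSE loss to address sample imbalance during training.
- Demonstrate state-of-the-art performance on the MOT16 and MOT17 benchmarks.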
Proposed Method
- Tracklet-prediction-based feature re-extraction: predict each tracklet's motion with pyramidal Lucas-Kanade (LK) optical flow and re-extract its features in the current frame, so that tracklet and detection features are aligned.
- Adaptive Graph Neural Network (AGNN): treat detections and tracklets as the two node sets of a bipartite graph; adaptively weight edge information using IoU and feature similarity to update node embeddings.
- Similarity computation: compute similarities between normalized embeddings and assemble them into an output similarity matrix for matching.
- Balanced MSE Loss (BMSE): balance the contributions of positive and negative samples to address data imbalance during training.
- Inference: augment the similarity matrix with a margin and solve data association with the Hungarian algorithm.
- Ablation studies: compare backbones, motion estimation methods, and AGNN variants to assess the contribution of each component.
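The similarity, loss, and association steps listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the `pos_weight` parameter, the two-term form of the balanced loss, and the choice of one dummy column per detection (scored at `margin`) are all assumptions made for illustration, and the paper's actual loss and augmentation may differ.

```python
import math

def cosine_similarity_matrix(det_emb, trk_emb):
    """Cosine similarity between every detection (row) and tracklet (column)."""
    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        return [x / norm for x in vec]
    dets = [unit(v) for v in det_emb]
    trks = [unit(v) for v in trk_emb]
    return [[sum(a * b for a, b in zip(d, t)) for t in trks] for d in dets]

def balanced_mse(pred, target, pos_weight=0.5):
    """Assumed form: average the squared error separately over positive
    (target 1) and negative (target 0) entries, then mix the two means so
    the far more numerous negatives cannot dominate."""
    pos, neg = [], []
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            (pos if t == 1 else neg).append((p - t) ** 2)
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_weight * pos_term + (1.0 - pos_weight) * neg_term

def hungarian_min(cost):
    """Minimum-cost assignment of every row to a distinct column (rows <= cols).
    Hungarian algorithm with row/column potentials; returns (row, col) pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the virtual column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]

def associate(sim, margin=0.5):
    """Match detections (rows) to tracklets (columns). Detections that the
    optimal assignment sends to a dummy column are reported as new tracks."""
    if not sim:
        return [], []
    n_det, n_trk = len(sim), len(sim[0])
    augmented = [row + [margin] * n_det for row in sim]  # assumed augmentation
    cost = [[-s for s in row] for row in augmented]  # maximize total similarity
    pairs = hungarian_min(cost)
    matches = sorted((d, t) for d, t in pairs if t < n_trk)
    new_tracks = sorted(d for d, t in pairs if t >= n_trk)
    return matches, new_tracks

sim = [[0.9, 0.2], [0.1, 0.3], [0.8, 0.85]]
matches, new_tracks = associate(sim, margin=0.5)
print(matches, new_tracks)  # [(0, 0), (2, 1)] [1]
```

In this toy case the second detection has no tracklet similarity above the margin, so it is assigned to a dummy column and would start a new track. In practice a library solver such as SciPy's `linear_sum_assignment` would likely replace the hand-written `hungarian_min`.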
Experimental Results
Research Questions
- RQ1: How can tracklet features be aligned with current-frame detections to reduce cross-frame feature inconsistency?
- RQ2: Can adaptive graph neural networks effectively fuse location, appearance, and historical tracklet information to improve association?
- RQ3: Does balancing the loss help mitigate the unbalanced sample distribution in MOT data association?
- RQ4: What is the impact of tracklet motion-based feature re-extraction on overall MOT performance?
- RQ5: How does TPAGT compare to state-of-the-art trackers on MOT16 and MOT17 under public and private detections?
Key Findings
| Method | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | IDSw ↓ |
|---|---|---|---|---|---|---|---|
| Ours (Public MOT16) | 62.7 | 60.3 | 28.5 | 26.9 | 5077 | 61952 | 978 |
| Ours (Public MOT17) | 62.0 | 59.5 | 27.8 | 31.5 | 15114 | 196672 | 2621 |
- On MOT16 with public detections, TPAGT achieves 62.7% MOTA and 60.3% IDF1.
- On MOT17 with public detections, TPAGT achieves 62.0% MOTA and 59.5% IDF1.
- With private detections, TPAGT attains 76.5% MOTA on MOT16 and 76.2% MOTA on MOT17, with IDF1 of 68.6% and 68.0% respectively, surpassing several prior methods.
- Re-extracting tracklet features in the current frame (motion-based alignment) significantly improves MOTA and IDF1.
- The adaptive graph neural network (AGNN) in TPAGT substantially outperforms non-adaptive GNN variants.
- Balanced MSE Loss yields better results than Triplet Loss in the ablation study.
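As a sanity check on the public-detection rows of the table, MOTA can be recomputed from the error counts using the standard CLEAR MOT definition, MOTA = 1 - (FP + FN + IDSw) / GT. The sketch below assumes the ground-truth box totals commonly cited for the MOT16 and MOT17 test sets (182,326 and 564,228); these totals are not stated in this review, and the function name `mota` is illustrative.

```python
def mota(fp, fn, idsw, num_gt):
    """CLEAR MOT accuracy: 1 - (FP + FN + IDSw) / number of ground-truth boxes."""
    return 1.0 - (fp + fn + idsw) / num_gt

# Ground-truth totals are assumptions (commonly cited test-set sizes),
# not values taken from the review itself.
MOT16_GT = 182_326
MOT17_GT = 564_228

# FP, FN, and IDSw are copied from the table rows above.
mot16 = mota(fp=5077, fn=61952, idsw=978, num_gt=MOT16_GT)
mot17 = mota(fp=15114, fn=196672, idsw=2621, num_gt=MOT17_GT)
print(f"{mot16:.1%} {mot17:.1%}")  # 62.7% 62.0%
```

Under these assumed totals the recomputed values agree with the MOTA column, which suggests the FP, FN, and IDSw figures in the table are internally consistent.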
This review was generated by AI and checked by a human editor.