QUICK REVIEW

[Paper Review] DEFT: Detection Embeddings for Tracking

Mohamed Chaabane, Peter Zhang | arXiv (Cornell University) | Feb 3, 2021

Anomaly Detection Techniques and Applications | References: 53 | Citations: 52

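One-Line Summary

DEFT jointly learns detection embeddings inside the detector backbone to perform online multi-object tracking. Using an appearance-based matching head and a motion model, it tracks objects robustly across frames even under occlusion and large inter-frame displacement. It achieves strong results on 2D benchmarks and substantially improves monocular 3D tracking on nuScenes.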

ABSTRACT

Most modern multiple object tracking (MOT) systems follow the tracking-by-detection paradigm, consisting of a detector followed by a method for associating detections into tracks. There is a long history in tracking of combining motion and appearance features to provide robustness to occlusions and other challenges, but typically this comes with the trade-off of a more complex and slower implementation. Recent successes on popular 2D tracking benchmarks indicate that top-scores can be achieved using a state-of-the-art detector and relatively simple associations relying on single-frame spatial offsets -- notably outperforming contemporary methods that leverage learned appearance features to help re-identify lost tracks. In this paper, we propose an efficient joint detection and tracking model named DEFT, or "Detection Embeddings for Tracking." Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network. An LSTM is also added to capture motion constraints. DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards while having significant advantages in robustness when applied to more challenging tracking data. DEFT raises the bar on the nuScenes monocular 3D tracking challenge, more than doubling the performance of the previous top method. Code is publicly available.

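Research Motivation and Objectives

Motivate tracking-by-detection with a simpler and more robust association mechanism.
Develop a network that performs detection and tracking jointly, reusing detector features for embedding-based matching.
Incorporate a motion model that constrains the plausibility of object trajectories during association.
Evaluate DEFT on 2D and 3D tracking benchmarks to demonstrate robustness to occlusion and large inter-frame displacement.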

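Proposed Method

For each detected object, extract an appearance embedding from multiple feature maps of the detector backbone.
Train a shared detection and matching network so that the embeddings are optimized for both detection and cross-frame association.
Use a matching head that computes pairwise similarity between current detections and track embeddings with a network of 1x1 convolutions.
Keep a memory of track embeddings over recent frames to enable long-range association and occlusion handling.
Apply an LSTM-based motion prediction module to constrain the plausibility of associations and rule out implausible matches.
Perform online data association with the Hungarian algorithm, incorporating an unmatched score to handle newly appearing or departing objects.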
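
The association steps above can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions: cosine similarity stands in for the learned matching head, a track's score against a detection is taken as the average over the embeddings in that track's memory, a boolean `gate` matrix stands in for the LSTM motion constraint, and a brute-force search over assignments stands in for the Hungarian algorithm (it finds the same optimum on tiny inputs; real code would typically use a solver such as `scipy.optimize.linear_sum_assignment`). The names `track_score`, `associate`, `unmatched_score`, and `gate` are hypothetical.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def track_score(memory, det_emb):
    """Average similarity between one detection and a track's stored embeddings."""
    return sum(cosine(m, det_emb) for m in memory) / len(memory)


def associate(track_memories, det_embs, unmatched_score=0.5, gate=None):
    """Assign tracks to detections, letting any track stay unmatched.

    track_memories: one list of past embeddings per track (assumed layout).
    gate: optional boolean matrix; gate[i][j] == False forbids pairing
          track i with detection j (stand-in for a motion constraint).
    Returns (matches, lost_tracks, new_detections).
    """
    n_t, n_d = len(track_memories), len(det_embs)
    scores = []
    for i, memory in enumerate(track_memories):
        row = []
        for j, det in enumerate(det_embs):
            allowed = gate is None or gate[i][j]
            row.append(track_score(memory, det) if allowed else -math.inf)
        # One dummy "unmatched" column per track, so every track can opt out.
        row.extend([unmatched_score] * n_t)
        scores.append(row)

    # Brute-force search for the highest-scoring one-to-one assignment.
    best_cols, best_total = (), -math.inf
    for cols in permutations(range(n_d + n_t), n_t):
        total = sum(scores[i][c] for i, c in enumerate(cols))
        if total > best_total:
            best_cols, best_total = cols, total

    matches = {i: c for i, c in enumerate(best_cols) if c < n_d}
    lost = [i for i, c in enumerate(best_cols) if c >= n_d]
    new = [j for j in range(n_d) if j not in matches.values()]
    return matches, lost, new


# Toy data: track 0 remembers two embeddings, track 1 remembers one.
track_memories = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]]
det_embs = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches, lost, new = associate(track_memories, det_embs)
print(matches, lost, new)
```

In this toy example, track 0 pairs with detection 1, track 1 pairs with detection 0, and detection 2 matches nothing, so it would start a new track. Forbidding the pair (track 0, detection 1) through `gate` leaves track 0 unmatched, because its remaining candidates score below `unmatched_score`.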

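Experimental Results

Research Questions

RQ1: Can detector backbone embeddings be effectively reused for appearance-based data association in online MOT?
RQ2: Does learning detection and tracking jointly improve both detection quality and tracking robustness compared with separate stages?
RQ3: In difficult conditions such as occlusion and large inter-frame displacement, how does a learned motion model (LSTM) interact with embedding-based matching?

Key Findings

Joint training in DEFT yields competitive 2D tracking performance on the MOT and KITTI benchmarks.
DEFT substantially improves robustness to occlusion and large inter-frame displacement, outperforming prior methods on challenging data, nuScenes in particular.
The learned detection embeddings provide a strong signal for linking identities across frames while keeping efficiency comparable to simpler trackers.
The LSTM motion model provides additional gains, most notably on difficult sequences, and can outperform Kalman-filter-style approaches in this setting.
Across benchmarks, the results show that sharing features between detection and matching outperforms methods that treat detection and association separately.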

This review was generated by AI and checked by a human editor.