QUICK REVIEW

[Paper Review] SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

Zelin Liu, Xinggang Wang|arXiv (Cornell University)|Jun 8, 2023

Video Surveillance and Tracking Methods|Cited by 30

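One-line summary

SparseTrack introduces pseudo-depth-based scene decomposition and depth cascading matching (DCM) to perform IoU-only data association in crowded MOT scenes, achieving competitive results on MOT17, MOT20, and DanceTrack.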

ABSTRACT

Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions still pose challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step to enhance the performance of associating occluded targets. To this end, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets and perform data association on these sparse target subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective for solving the challenging crowded scene MOT problem. Only using IoU matching, SparseTrack achieves comparable performance with the state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks. Code and models are publicly available at https://github.com/hustvl/SparseTrack.

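Motivation and objectives

Motivated by the need for robust data association in crowded MOT scenes with frequent occlusion.
Proposes a lightweight IoU-only tracker that uses depth-based scene decomposition to reduce the impact of occlusion.
Introduces pseudo-depth, which estimates relative depth from 2D images under a simple ground-plane prior.
Develops depth cascading matching (DCM) to perform hierarchical association across depth subsets.
Shows that the proposed method is on par with or better than state-of-the-art methods on standard MOT benchmarks.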

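Proposed method

Compute a pseudo-depth value for each detection and track from the 2D image, using a ground-plane prior.
Use the pseudo-depth values to split the scene into depth-based subsets.
Apply depth cascading matching: perform IoU-based association within each depth subset, proceeding from near to far.
Use a Kalman filter for motion prediction and IoU distance for matching, and split detections into high- and low-score groups before matching across depth levels.
DCM is plug-and-play and can be integrated into other IoU-based trackers to improve occlusion handling.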
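
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: pseudo-depth is assumed to be the distance from a box's bottom edge to the bottom of the image, depth levels are assumed to be equal-width bins over the observed pseudo-depth range, and unmatched tracks and detections are assumed to carry over to the next (farther) level. Greedy IoU matching stands in for an optimal assignment solver for brevity, and Kalman prediction and the high/low score split are omitted, so the track boxes are taken to be already predicted for the current frame. All function names and parameters (`pseudo_depth`, `depth_levels`, `depth_cascade_match`, `k`, `thresh`) are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), origin at top-left


def pseudo_depth(box: Box, img_h: float) -> float:
    # Assumption: distance from the box bottom to the image bottom (smaller = nearer).
    return img_h - box[3]


def iou(a: Box, b: Box) -> float:
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def depth_levels(boxes: Dict[int, Box], img_h: float, k: int) -> List[List[int]]:
    # Split ids into k equal-width pseudo-depth bins, ordered near to far.
    levels: List[List[int]] = [[] for _ in range(k)]
    if not boxes:
        return levels
    depths = {i: pseudo_depth(b, img_h) for i, b in boxes.items()}
    lo, hi = min(depths.values()), max(depths.values())
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all depths are equal
    for i, d in depths.items():
        levels[min(int((d - lo) / width), k - 1)].append(i)
    return levels


def greedy_iou_match(tracks, dets, t_ids, d_ids, thresh):
    # Simplification: take pairs in order of decreasing IoU instead of solving
    # an optimal assignment problem.
    pairs = sorted(
        ((iou(tracks[t], dets[d]), t, d) for t in t_ids for d in d_ids),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, t, d in pairs:
        if score < thresh:
            break
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return matches


def depth_cascade_match(tracks, dets, img_h, k=3, thresh=0.3):
    # Match level by level from near to far; leftovers join the next level.
    t_levels = depth_levels(tracks, img_h, k)
    d_levels = depth_levels(dets, img_h, k)
    matches, carry_t, carry_d = [], [], []
    for t_lvl, d_lvl in zip(t_levels, d_levels):
        t_ids, d_ids = carry_t + t_lvl, carry_d + d_lvl
        found = greedy_iou_match(tracks, dets, t_ids, d_ids, thresh)
        matches += found
        matched_t = {t for t, _ in found}
        matched_d = {d for _, d in found}
        carry_t = [t for t in t_ids if t not in matched_t]
        carry_d = [d for d in d_ids if d not in matched_d]
    return matches, carry_t, carry_d


# Toy frame (image height 100): a near person partly occluding a far one,
# plus one new detection that has no corresponding track.
tracks = {1: (10, 60, 30, 100), 2: (12, 30, 28, 60)}
dets = {10: (11, 61, 31, 100), 11: (13, 31, 29, 61), 12: (70, 10, 80, 30)}
matches, lost_tracks, new_dets = depth_cascade_match(tracks, dets, img_h=100, k=2)
```

In the toy frame, the near track and the far track fall into different depth levels, so each is compared only against detections at a similar pseudo-depth; the detection without a track is left over at the end and would typically be used to start a new track.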

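Experimental results

Research questions

RQ1: Can pseudo-depth derived from 2D images capture relative depth well enough to enable effective depth-based scene decomposition?
RQ2: Does performing IoU-based data association within depth-sliced subsets (via DCM) reduce occlusion-induced errors in crowded MOT scenes?
RQ3: How does SparseTrack compare with baseline IoU-based trackers on standard MOT benchmarks (MOT17, MOT20) and on a more challenging dataset (DanceTrack)?
RQ4: Does the depth cascading matching approach generalize as a drop-in module for other trackers?
RQ5: How does the number of pseudo-depth levels affect association performance in dense scenes?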

Key findings

MOT17 (test set)
Tracker	HOTA↑	MOTA↑	IDF1↑	FP↓	FN↓	IDs↓	FPS↑
SparseTrack (IoU-only)	65.1	81.0	80.1	23904	81927	1170	19.9
ByteTrack	63.1	80.3	77.3	25491	83721	2196	29.6
BoT-SORT-ReID	65.0	80.5	80.2	22521	86037	1212	4.5

MOT20
Tracker	HOTA↑	MOTA↑	IDF1↑	FP↓	FN↓	IDs↓	FPS↑
SparseTrack (IoU-only)	63.4	78.2	77.3	25108	86720	1116	12.5
ByteTrack	61.3	77.8	75.2	26249	87594	1223	17.5
BoT-SORT	62.6	77.7	76.3	22521	86037	1212	6.6

DanceTrack
Tracker	HOTA↑	MOTA↑	IDF1↑	AssA↑	DetA↑	IDs↓	FPS↑
SparseTrack (IoU-only)	55.5	91.3	58.3	39.1	78.9	-	12.5

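With IoU-only data association, SparseTrack achieves competitive results on the MOT17 test set: 65.1 HOTA, 81.0 MOTA, and 80.1 IDF1.
On MOT20, SparseTrack achieves 63.4 HOTA, 78.2 MOTA, and 77.3 IDF1, outperforming the IoU-based baseline ByteTrack (61.3 HOTA, 77.8 MOTA, 75.2 IDF1).
On DanceTrack, SparseTrack achieves 55.5 HOTA, 91.3 MOTA, and 58.3 IDF1, substantially outperforming the baseline as a strong IoU-only method.
Even without appearance features, depth-based scene decomposition with pseudo-depth and DCM consistently improves association metrics over various baselines, and in some cases matches or approaches SOTA.
The DCM module is plug-and-play and improves performance when integrated into other trackers that rely on IoU-based data association.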


This review was generated by AI and checked by a human editor.