QUICK REVIEW

[Paper Review] You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking

Xiyang Wang, Chunyun Fu | arXiv (Cornell University) | Apr 18, 2023

Video Surveillance and Tracking Methods | Cited by 9

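One-Line Summary

Summary: Proposes an end-to-end multi-modal 3D MOT framework that uses only a 2D detector and a 3D detector and eliminates the data association step to improve robustness.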

ABSTRACT

In the classical tracking-by-detection (TBD) paradigm, detection and tracking are separately and sequentially conducted, and data association must be properly performed to achieve satisfactory tracking performance. In this paper, a new end-to-end multi-object tracking framework is proposed, which integrates object detection and multi-object tracking into a single model. The proposed tracking framework eliminates the complex data association process in the classical TBD paradigm, and requires no additional training. Secondly, the regression confidence of historical trajectories is investigated, and the possible states of a trajectory (weak object or strong object) in the current frame are predicted. Then, a confidence fusion module is designed to guide non-maximum suppression for trajectories and detections to achieve ordered and robust tracking. Thirdly, by integrating historical trajectory features, the regression performance of the detector is enhanced, which better reflects the occlusion and disappearance patterns of objects in real world. Lastly, extensive experiments are conducted on the commonly used KITTI and Waymo datasets. The results show that the proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector, and it is proven more accurate than many of the state-of-the-art TBD-based multi-modal tracking methods. The source codes of the proposed method are available at https://github.com/wangxiyang2022/YONTD-MOT.

Research Motivation and Objectives

Simplify multi-modal 3D MOT by avoiding the explicit data association step.
Develop an end-to-end framework that fuses detection and tracking in a single model.
Investigate regression confidence of historical trajectories to predict object states in current frames.
Enhance detector regression using historical trajectory features to reflect occlusion and disappearance patterns.

Proposed Method

Integrate object detection and multi-object tracking into a single end-to-end model.
Eliminate the traditional data association step of tracking-by-detection (TBD).
Introduce a confidence fusion module to guide non-maximum suppression for trajectories and detections.
Predict possible trajectory states (weak vs strong) in the current frame based on regression confidence of history.
Incorporate historical trajectory features to improve detector regression and handle occlusions.
Evaluate on KITTI and Waymo to demonstrate robustness with only a 2D detector and a 3D detector.
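
The idea of letting fused confidence order trajectories and detections in a single non-maximum suppression pass can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). Everything here is an assumption made for illustration: the names `Candidate`, `fused_confidence`, `trajectory_state`, and `confidence_guided_nms` are hypothetical, the fusion rule is a plain weighted average, the weak/strong split is a single threshold, and boxes are axis-aligned 2D rectangles rather than the oriented 3D boxes a real tracker would use.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x1, y1, x2, y2); axis-aligned 2D boxes are an illustrative simplification.
Box = Tuple[float, float, float, float]


@dataclass
class Candidate:
    box: Box
    conf_3d: float      # score from the 3D detector, assumed to lie in [0, 1]
    conf_2d: float      # score from the 2D detector, assumed to lie in [0, 1]
    track_id: int = -1  # -1: fresh detection; >= 0: historical trajectory


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fused_confidence(c: Candidate, w_3d: float = 0.5) -> float:
    """Hypothetical fusion rule: weighted average of the two detector scores."""
    return w_3d * c.conf_3d + (1.0 - w_3d) * c.conf_2d


def trajectory_state(conf: float, strong_thr: float = 0.6) -> str:
    """Label a trajectory as a strong or weak object (threshold is assumed)."""
    return "strong" if conf >= strong_thr else "weak"


def confidence_guided_nms(
    candidates: List[Candidate], iou_thr: float = 0.5
) -> List[Candidate]:
    """Greedy NMS over trajectories and detections, ordered by fused score."""
    ordered = sorted(candidates, key=fused_confidence, reverse=True)
    kept: List[Candidate] = []
    for cand in ordered:
        # Keep a candidate only if it does not overlap a higher-scoring one.
        if all(iou(cand.box, k.box) < iou_thr for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    pool = [
        Candidate((0, 0, 10, 10), conf_3d=0.9, conf_2d=0.7, track_id=3),
        Candidate((1, 1, 11, 11), conf_3d=0.6, conf_2d=0.6),
        Candidate((20, 20, 30, 30), conf_3d=0.5, conf_2d=0.5),
    ]
    for c in confidence_guided_nms(pool):
        score = fused_confidence(c)
        print(c.track_id, round(score, 2), trajectory_state(score))
```

In this toy pool, the detection that overlaps trajectory 3 is suppressed, so the trajectory keeps its identity without an explicit matching step, while the non-overlapping detection survives and could seed a new track. How the actual method handles a detection that outranks an overlapping trajectory is not described in this review; the sketch simply keeps the higher-scoring box.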

Experimental Results

Research Questions

RQ1: Can end-to-end joint detection and tracking remove the need for complex data association in multi-modal 3D MOT?
RQ2: How does regression confidence of historical trajectories influence current-frame state prediction and suppression decisions?
RQ3: Does incorporating historical trajectory features improve detector regression under occlusion and disappearance patterns?

Key Findings

The framework achieves robust multi-modal 3D MOT using only a 2D detector and a 3D detector.
According to the authors' reported results on KITTI and Waymo, it is more accurate than many state-of-the-art TBD-based multi-modal tracking methods.
A confidence fusion module guides non-maximum suppression to yield ordered and robust tracking results.
Historical trajectory features improve regression performance of the detector, better reflecting real-world occlusion and disappearance patterns.

This review was generated by AI and checked by a human editor.