QUICK REVIEW

[Paper Review] The Interstate-24 3D Dataset: a new benchmark for 3D multi-camera vehicle tracking

Derek Gloudemans, Yanbing Wang|arXiv (Cornell University)|Aug 28, 2023

Video Surveillance and Tracking Methods|Cited by 8

One-line summary

Introduces I24-3D: video from 16–17 overhead cameras recorded along I-24, 877k 3D vehicle annotations, and baseline benchmarks of multi-camera tracking pipelines.

ABSTRACT

This work presents a novel video dataset recorded from overlapping highway traffic cameras along an urban interstate, enabling multi-camera 3D object tracking in a traffic monitoring context. Data is released from 3 scenes containing video from at least 16 cameras each, totaling 57 minutes in length. 877,000 3D bounding boxes and corresponding object tracklets are fully and accurately annotated for each camera field of view and are combined into a spatially and temporally continuous set of vehicle trajectories for each scene. Lastly, existing algorithms are combined to benchmark a number of 3D multi-camera tracking pipelines on the dataset, with results indicating that the dataset is challenging due to the difficulty of matching objects traveling at high speeds across cameras and heavy object occlusion, potentially for hundreds of frames, during congested traffic. This work aims to enable the development of accurate and automatic vehicle trajectory extraction algorithms, which will play a vital role in understanding impacts of autonomous vehicle technologies on the safety and efficiency of traffic.

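Motivation and objectives

Provide a high-quality 3D multi-camera vehicle dataset for vehicle tracking in a real-world highway environment.
Promote research on cross-camera 3D tracking, re-identification, and trajectory fusion in congested traffic.
Establish the baseline difficulty of the dataset for existing multi-camera tracking pipelines, to inform future method development.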

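Proposed method

Collect and annotate 3D vehicle bounding boxes across 3 scenes, totaling 57 minutes, recorded by 16–17 overlapping highway cameras.
Assign 3D bounding boxes in a roadway-aligned coordinate system, along with IDs for re-identification and tracking.
Benchmark multiple existing 3D MOT pipelines by varying the detector, the tracker, and the cross-camera fusion strategy.
Create cross-camera tracks using 3D detections from monocular detectors, several tracking/association methods, and two fusion strategies (detection fusion and trajectory fusion).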
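
The association and detection-fusion steps listed above depend on comparing boxes in the shared roadway-aligned frame. A minimal sketch of one such comparison, footprint IoU, is given here. The `RoadBox` fields, the foot units, and the assumption that footprints are axis-aligned in the road frame are illustrative choices, not the dataset's actual annotation schema or the matching rule of the benchmarked trackers.

```python
from dataclasses import dataclass


@dataclass
class RoadBox:
    """Hypothetical 3D box in a road-aligned frame (units assumed: feet).

    x runs along the roadway, y across it; (x, y) is the footprint center.
    """
    x: float
    y: float
    length: float  # extent along x
    width: float   # extent along y
    height: float  # extent above the road surface


def _overlap(c1: float, e1: float, c2: float, e2: float) -> float:
    """1D overlap of two intervals given their centers and extents."""
    lo = max(c1 - e1 / 2, c2 - e2 / 2)
    hi = min(c1 + e1 / 2, c2 + e2 / 2)
    return max(0.0, hi - lo)


def footprint_iou(a: RoadBox, b: RoadBox) -> float:
    """IoU of two footprints, treated as axis-aligned in the road frame."""
    inter = _overlap(a.x, a.length, b.x, b.length) * _overlap(a.y, a.width, b.y, b.width)
    union = a.length * a.width + b.length * b.width - inter
    return inter / union if union > 0 else 0.0


# Two detections of what may be the same vehicle, seen by overlapping cameras.
det_cam_a = RoadBox(x=0.0, y=0.0, length=10.0, width=4.0, height=5.0)
det_cam_b = RoadBox(x=5.0, y=0.0, length=10.0, width=4.0, height=5.0)
score = footprint_iou(det_cam_a, det_cam_b)
```

A fusion step could treat pairs whose score exceeds some threshold as the same vehicle; the threshold itself would need tuning and is not given in this review.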

Figure 1: Example annotated (green boxes) frames from each camera field of view for one scene of the I24-3D Dataset. The approximate field of view for each camera is shown on the overhead roadway diagram below (some cameras shown in unique colors as examples). Regions outside of the considered field

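Experimental results

Research questions

RQ1: Can 3D multi-camera tracking be achieved effectively on an overhead highway camera network using a roadway coordinate system?
RQ2: How well do current 3D MOT pipelines perform on dense, heavily occluded, high-speed traffic spanning many cameras?
RQ3: How much does the cross-camera fusion strategy affect 3D tracking performance in a traffic monitoring context?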

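Key findings

I24-3D provides 877k 3D vehicle bounding boxes and 720 unique vehicle trajectories across 3 scenes recorded by 16–17 cameras.
The best-performing pipeline (Dual3D detector + KIOU tracker + trajectory fusion) achieves 44.8% HOTA and 63.8% Mostly Tracked objects. With ground-truth inputs HOTA rises (59.6% with GT detections, 61.6% with GT trajectories), approaching what ideal detection and tracking would give.
Even with ground-truth inputs, HOTA stays below 0.6 (60%) in many settings, showing how hard cross-camera tracking is given limited localization accuracy and occlusion in dense traffic.
Localization error is not negligible: averaged over all cameras, the position offset is 1.24 ft and the dimension error is 0.5 ft.
Cross-camera alignment and fusion noticeably affect performance. Trajectory fusion often gives the stronger results, but no method yet reaches the level needed for fine-grained traffic analysis (e.g., HOTA > 0.75 with balanced MT/ML).
Tracking every vehicle in dense scenes remains a major challenge, particularly under occlusion and for faster-moving objects.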
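
Mostly Tracked (MT), reported above, is conventionally the share of ground-truth trajectories matched for at least 80% of their lifespan, and Mostly Lost (ML) the share matched for at most 20%. A minimal sketch under that conventional definition follows. The input format and function name are hypothetical, and this review does not state the exact thresholds used in the paper's evaluation.

```python
def mostly_tracked_lost(coverage_counts, mt_thresh=0.8, ml_thresh=0.2):
    """Return (MT, ML) as fractions of ground-truth trajectories.

    coverage_counts maps a ground-truth ID to a pair
    (frames_matched, frames_total). The thresholds default to the
    conventional 80% / 20% values, which are an assumption here.
    """
    if not coverage_counts:
        raise ValueError("need at least one ground-truth trajectory")
    ratios = [matched / total for matched, total in coverage_counts.values()]
    mt = sum(r >= mt_thresh for r in ratios) / len(ratios)
    ml = sum(r <= ml_thresh for r in ratios) / len(ratios)
    return mt, ml


# Toy example with four made-up trajectories (not dataset values).
toy_counts = {
    "veh_a": (9, 10),  # tracked in 90% of its frames
    "veh_b": (5, 10),  # partially tracked
    "veh_c": (1, 10),  # mostly lost
    "veh_d": (8, 10),  # exactly at the MT threshold
}
mt, ml = mostly_tracked_lost(toy_counts)
```

Trajectories between the two thresholds count toward neither figure, which is why MT and ML need not sum to 1.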

Figure 2: Example single annotation. The annotation is stored in roadway coordinates (left) but can be projected into cameras 5 and 6 on pole 1 (p1c5 and p1c6).
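
To illustrate how an annotation stored in roadway coordinates might be drawn in a camera view, the sketch below projects the eight corners of a road-aligned box through a 3x4 projection matrix. The matrix `P_TOY` (a made-up camera looking straight down from 30 ft), the box parameterization, and the function names are all assumptions for illustration; the dataset's own calibration format is not described in this review.

```python
from itertools import product


def box_corners(x, y, length, width, height):
    """Eight corners of a road-aligned box; (x, y) is the footprint center."""
    return [
        (x + sx * length / 2, y + sy * width / 2, z)
        for sx, sy, z in product((-1, 1), (-1, 1), (0.0, height))
    ]


def project(P, point):
    """Apply a 3x4 projection matrix (nested lists) to a 3D point.

    Returns pixel coordinates (u, v). Assumes the homogeneous scale w is
    nonzero, i.e. the point does not lie in the camera's principal plane.
    """
    X = (point[0], point[1], point[2], 1.0)
    u, v, w = (sum(p * c for p, c in zip(row, X)) for row in P)
    return u / w, v / w


# Hypothetical camera: focal length 100 px, principal point (50, 50),
# mounted 30 ft above the road origin and looking straight down.
P_TOY = [
    [100.0, 0.0, -50.0, 1500.0],
    [0.0, 100.0, -50.0, 1500.0],
    [0.0, 0.0, -1.0, 30.0],
]

corners_3d = box_corners(x=0.0, y=0.0, length=6.0, width=2.0, height=10.0)
corners_px = [project(P_TOY, c) for c in corners_3d]
```

With a separate matrix per camera, the same eight roadway-frame corners yield one image-space outline per view.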


This review was generated by AI and checked by a human editor.