QUICK REVIEW

[Paper Review] Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Haibao Yu, Yingjuan Tang|arXiv (Cornell University)|Nov 3, 2023

Advanced Neural Network Applications | Cited by 10

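One-sentence summary

FFNet introduces flow-based intermediate fusion for VIC3D: it predicts and compresses infrastructure feature flow, aligns it with ego-vehicle data, and achieves state-of-the-art mAP at about 1/100 of the transmission cost of raw data.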

ABSTRACT

Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities. However, the uncertain temporal asynchrony and limited communication conditions can lead to fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework. FFNet is a flow-based feature fusion framework that uses a feature flow prediction module to predict future features and compensate for asynchrony. Instead of transmitting feature maps extracted from still-images, FFNet transmits feature flow, leveraging the temporal coherence of sequential infrastructure frames. Furthermore, we introduce a self-supervised training approach that enables FFNet to generate feature flow with feature prediction ability from raw infrastructure sequences. Experimental results demonstrate that our proposed method outperforms existing cooperative detection methods while only requiring about 1/100 of the transmission cost of raw data and covers all latency in one model on the DAIR-V2X dataset. The code is available at https://github.com/haibao-yu/FFNet-VIC3D.

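Motivation and objectives

Address temporal asynchrony and limited bandwidth in VIC3D object detection.
Propose FFNet, a flow-based feature fusion framework that predicts future infrastructure features and aligns them with ego-vehicle data.
Develop a self-supervised training scheme that learns feature flow without manual labeling.
Demonstrate state-of-the-art detection on DAIR-V2X while reducing transmission cost.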

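Proposed method

Define the VIC3D input as LiDAR data from the ego vehicle and the infrastructure, subject to uncertain latency.
Generate infrastructure feature flow as a first-order (linear) prediction from two consecutive infrastructure frames, and use it to predict F_i(t_i + k).
Compress and transmit the feature flow (the feature and its derivative), optionally with an attention mask and b-bit quantization, to lower the AB (transmission cost in bytes).
On the vehicle side, fuse the predicted infrastructure feature with the ego-vehicle feature in BEV space and run an SSD-based 3D detection head.
Train FFNet in two stages: first an end-to-end fusion baseline, then the self-supervised feature flow generator, using a cosine-similarity loss that enforces temporal consistency.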
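The first-order prediction and b-bit quantization steps listed above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`make_feature_flow`, `quantize`, `dequantize`, `predict_feature`), the finite-difference derivative, and the uniform min-max quantizer are all assumptions chosen for illustration, and the toy feature map stands in for a real BEV feature.

```python
import numpy as np


def make_feature_flow(feat_prev, feat_curr, dt):
    """Infrastructure side: return the latest feature and its finite-difference derivative."""
    derivative = (feat_curr - feat_prev) / dt
    return feat_curr, derivative


def quantize(x, bits):
    """Uniform b-bit quantization (bits <= 16). Returns integer codes, offset, and step size."""
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / step).astype(np.uint16)
    return codes, lo, step


def dequantize(codes, lo, step):
    """Vehicle side: map integer codes back to approximate float values."""
    return codes.astype(np.float64) * step + lo


def predict_feature(feat, derivative, k):
    """First-order extrapolation: F(t + k) ~= F(t) + k * dF/dt."""
    return feat + k * derivative


# Toy feature map (channels, height, width) that changes linearly in time,
# so the first-order prediction should be exact up to rounding.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 8, 8))
rate = rng.standard_normal((4, 8, 8))


def feature_at(t):
    return base + t * rate


# Two consecutive frames at t = 0.0 and t = 0.1 give the feature flow at t = 0.1.
feat, deriv = make_feature_flow(feature_at(0.0), feature_at(0.1), dt=0.1)
pred = predict_feature(feat, deriv, k=0.2)  # predicted feature at t = 0.3
exact_error = np.abs(pred - feature_at(0.3)).max()

# Round trip through 8-bit quantization, as if transmitting the feature.
codes, lo, step = quantize(feat, bits=8)
quant_error = np.abs(dequantize(codes, lo, step) - feat).max()
```

Because the toy feature is exactly linear in time, the extrapolation error is at rounding level; real features are only approximately linear over short horizons, which is why the prediction horizon k matters. The quantization error is bounded by half a step, so a larger b trades bytes for fidelity.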

Experimental results

Research questions

RQ1: Can a flow-based intermediate fusion mitigate temporal misalignment in VIC3D object detection under uncertain latency?
RQ2: Does predicting and compressing feature flow reduce transmission cost without sacrificing detection accuracy?
RQ3: How effective is self-supervised training of the feature flow generator using infrastructure sequences alone?
RQ4: Is FFNet robust across a range of latency conditions (e.g., 100–500 ms) on DAIR-V2X?

Key findings

Model	Fusion Type	Latency (ms)	mAP@3D IoU=0.5	mAP@3D IoU=0.7	mAP@BEV IoU=0.5	mAP@BEV IoU=0.7	AB (Byte)
PointPillars	non-fusion	/	48.06	-	52.24	-	0
AutoAlignV2	non-fusion	/	50.32	-	53.88	-	0
Early Fusion	early	200	54.63	38.23	61.08	50.06	1.4e6
Late Fusion	late	200	52.43	36.54	58.10	49.25	5.1e2
DiscoNet	middle	200	50.76	28.57	58.20	48.90	1.2e5
V2VNet	middle	200	49.67	26.96	56.02	46.32	1.2e5
FFNet (Ours)	middle	200	55.37	31.66	63.20 (+9.32)	54.69	1.2e5
FFNet-C1 (Ours)	middle	200	55.17	31.20	62.87 (+8.99)	54.28	1.7e4
Early Fusion	early	300	51.37	37.25	58.28	49.81	1.4e6
Late Fusion	late	300	51.35	36.24	56.89	48.79	5.1e2
DiscoNet	middle	300	49.03	27.39	55.81	47.28	1.2e5
V2VNet	middle	300	48.51	27.00	55.81	46.32	1.2e5
FFNet (Ours)	middle	300	53.46	30.42	61.20	52.44	1.2e5
FFNet-C1 (Ours)	middle	300	54.10	29.87	60.76	53.28	1.7e4
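The transmission-cost comparison in the table can be checked with a few lines of arithmetic. The sketch below copies the AB values from the table and treats the early fusion row as the raw-data reference, which is an assumption (early fusion is taken here to transmit raw sensor data). It also checks that the parenthesized gains in the FFNet rows are consistent with a difference against the 53.88 non-fusion entry; that reading is inferred from the numbers, not stated in the table.

```python
# AB values in bytes, copied from the table.
ab_bytes = {
    "Early Fusion": 1.4e6,
    "Late Fusion": 5.1e2,
    "FFNet": 1.2e5,
    "FFNet-C1": 1.7e4,
}

# Assumption: early fusion sends raw data, so its AB is the raw-data cost.
raw_cost = ab_bytes["Early Fusion"]


def reduction_factor(name):
    """How many times cheaper than the raw-data reference a method is."""
    return raw_cost / ab_bytes[name]


ffnet_factor = reduction_factor("FFNet")        # uncompressed feature flow
ffnet_c1_factor = reduction_factor("FFNet-C1")  # compressed feature flow

# mAP@BEV IoU=0.5 at 200 ms, minus the 53.88 non-fusion entry.
gain_ffnet = round(63.20 - 53.88, 2)
gain_ffnet_c1 = round(62.87 - 53.88, 2)
```

With these inputs, the compressed FFNet-C1 is roughly 82 times cheaper than the raw-data reference, which is the order of magnitude behind the "about 1/100" claim, while uncompressed FFNet is roughly 12 times cheaper.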

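On DAIR-V2X, FFNet achieves state-of-the-art mAP@BEV with competitive mAP@3D at about 1/100 of the transmission cost of raw data.
At 200 ms latency, FFNet reaches 63.20 mAP@BEV (IoU=0.5) and 55.37 mAP@3D (IoU=0.5), the best of the listed methods on both metrics; at the stricter 3D IoU=0.7, early and late fusion remain ahead (38.23 and 36.54 versus 31.66).
FFNet-C1 (the compressed variant) reaches 62.87 mAP@BEV (IoU=0.5) while keeping the transmission cost far below the early fusion baseline (1.7e4 bytes AB versus 1.4e6).
Without feature flow prediction, latency degrades performance substantially, which indicates that FFNet's feature prediction is important for compensating temporal asynchrony.
FFNet is robust to latencies of 100–500 ms, with only a small drop in BEV mAP, whereas the variant without prediction degrades more noticeably.
Self-supervised training (FFNet-V4) benefits from additional infrastructure sequences, improving performance even without cooperative-view labels.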
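The self-supervised objective behind these results needs no labels: the predicted future feature is compared with the feature actually extracted from a later infrastructure frame. A minimal sketch of such a cosine-similarity loss follows. The exact form used in the paper is not given in this review, so the flattening of the whole feature map, the `1 - cos` formulation, and the name `cosine_similarity_loss` are assumptions.

```python
import numpy as np


def cosine_similarity_loss(pred, target, eps=1e-8):
    """Return 1 - cosine similarity between two flattened feature maps.

    The loss is near 0 when the maps point in the same direction
    and near 2 when they point in opposite directions.
    """
    p = pred.reshape(-1)
    t = target.reshape(-1)
    cos = (p @ t) / (np.linalg.norm(p) * np.linalg.norm(t) + eps)
    return 1.0 - cos


# Toy check: the "target" plays the role of the feature extracted from a
# later infrastructure frame, so no manual annotation is involved.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 8))
loss_same = cosine_similarity_loss(target, target)
loss_opposite = cosine_similarity_loss(-target, target)
loss_scaled = cosine_similarity_loss(3.0 * target, target)
```

Cosine similarity ignores feature magnitude, so a prediction that is a scaled copy of the target still gets a near-zero loss; whether that invariance is desirable depends on how the downstream fusion uses the feature.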


This review was generated by AI and checked by a human editor.