QUICK REVIEW

[Paper Digest] Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

Haibao Yu, Yingjuan Tang | arXiv (Cornell University) | Mar 19, 2023

Advanced Neural Network Applications | Cited by 14

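One-Sentence Summary

FFNet introduces a feature flow prediction module that aligns infrastructure features with ego-vehicle features under temporal asynchrony, improving VIC3D detection performance while reducing transmission cost.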

ABSTRACT

Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities. However, temporal asynchrony and limited wireless communication in traffic environments can lead to fusion misalignment and impact detection performance. This paper proposes Feature Flow Net (FFNet), a novel cooperative detection framework that uses a feature flow prediction module to address these issues in vehicle-infrastructure cooperative 3D object detection. Rather than transmitting feature maps extracted from still-images, FFNet transmits feature flow, which leverages the temporal coherence of sequential infrastructure frames to predict future features and compensate for asynchrony. Additionally, we introduce a self-supervised approach to enable FFNet to generate feature flow with feature prediction ability. Experimental results demonstrate that our proposed method outperforms existing cooperative detection methods while requiring no more than 1/10 transmission cost of raw data on the DAIR-V2X dataset when temporal asynchrony exceeds 200 ms. The code is available at https://github.com/haibao-yu/FFNet-VIC3D.

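Motivation and Objectives

Advance VIC3D object detection by using infrastructure and ego-vehicle sensors together, overcoming the limits of ego-vehicle-only perception.
Address temporal asynchrony and limited communication bandwidth, which lead to fusion misalignment.
Propose a scalable intermediate-fusion framework that predicts future infrastructure features so that they align with the vehicle timestamp.
Reduce transmission cost by transmitting compressed feature flow instead of raw feature maps.
Demonstrate robustness to uncertain latency and state-of-the-art performance on a real-world dataset.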

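Proposed Method

Introduce Feature Flow Net (FFNet), which transmits compressed feature flow rather than raw features.
Define the feature flow as F_i(t_i) together with its first-order derivative F_i'(t_i), which is used to predict future infrastructure features.
Use self-supervised training with a contrastive cosine-similarity loss to learn F_i'(t_i) from consecutive infrastructure frames.
Compress the infrastructure feature F_i(P_i(t_i)) and its derivative, reducing the cost of each transmission to 0.12 MB.
On the vehicle side, decompress the feature flow, use a linear approximation to predict the infrastructure feature aligned to timestamp t_v, and fuse it with the vehicle feature for the 3D detection head.
Evaluate on DAIR-V2X, comparing FFNet against no-fusion, early-fusion, late-fusion, and intermediate-fusion baselines.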
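
The linear approximation and the cosine-similarity objective listed above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the feature shape, the toy linearly drifting feature map, and the helper names (predict_feature, similarity_loss, true_feature) are assumptions made for illustration, and the loss is written as 1 minus cosine similarity, which may differ from the exact form used in the paper.

```python
import numpy as np

def predict_feature(feat_ti, flow_ti, t_i, t_v):
    """Linear approximation: F(t_v) ~= F(t_i) + (t_v - t_i) * F'(t_i)."""
    return feat_ti + (t_v - t_i) * flow_ti

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature maps, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def similarity_loss(predicted, target):
    """Assumed training signal: 1 - cosine similarity (near 0 when aligned)."""
    return 1.0 - cosine_similarity(predicted, target)

# Toy stand-in for an infrastructure feature map that drifts linearly in time.
# Shape (channels, H, W) = (4, 8, 8) is hypothetical.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8, 8))
velocity = rng.normal(size=(4, 8, 8))  # plays the role of F_i'(t_i), per second

def true_feature(t):
    return base + t * velocity

t_i, t_v = 0.0, 0.2  # infrastructure frame is 200 ms older than the vehicle frame
stale = true_feature(t_i)    # what the vehicle would fuse without prediction
target = true_feature(t_v)   # feature the vehicle actually needs at t_v
predicted = predict_feature(stale, velocity, t_i, t_v)

loss_stale = similarity_loss(stale, target)
loss_predicted = similarity_loss(predicted, target)
print(f"loss without prediction: {loss_stale:.4f}")
print(f"loss with prediction:    {loss_predicted:.4f}")
```

In this toy setting the drift is exactly linear, so the first-order prediction recovers the target feature; real feature maps change nonlinearly, which is why the derivative has to be learned from consecutive frames rather than assumed.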

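Experimental Results

Research Questions

RQ1: Can feature flow prediction mitigate the fusion misalignment caused by temporal asynchrony in VIC3D?
RQ2: What is the transmission-cost trade-off of compressed feature flow relative to raw data or other fusion schemes?
RQ3: How robust is FFNet to varying and uncertain latency between the infrastructure and ego-vehicle sensors?
RQ4: Can self-supervised learning train the feature flow generator effectively without manual annotation?
RQ5: How does FFNet compare with state-of-the-art cooperative perception methods on real-world data?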

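Key Findings

On DAIR-V2X, FFNet achieves state-of-the-art performance among intermediate-fusion methods at 200 ms latency.
At 100 ms latency, FFNet reaches mAP@3D IoU0.5 of 55.48 and mAP@BEV IoU0.5 of 63.14, with an AB of 1.2e5 bytes.
At 200 ms latency, FFNet reaches mAP@3D IoU0.5 of 55.37 and mAP@BEV IoU0.5 of 63.20, with an AB of 1.2e5 bytes.
At both 100 ms and 200 ms latency, FFNet significantly outperforms DiscoNet and V2VNet (intermediate fusion), at a transmission cost of no more than 1/10 of raw data.
The feature-flow approach significantly mitigates the performance drop caused by latency, and at higher latency it outperforms the FFNet variant without prediction.
Self-supervised training effectively learns the feature flow predictor from consecutive infrastructure frames, enabling alignment that is robust to uncertain latency.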
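
As a quick sanity check on the numbers quoted above, the short Python sketch below converts the reported AB to megabytes, derives the lower bound on raw-data size implied by the "no more than 1/10" claim, and computes the change in accuracy between the two latencies. The variable names are illustrative, and the raw-data bound is an inference from the stated ratio, not a figure reported in the digest.

```python
# Figures quoted in the findings above.
ffnet_bytes = 1.2e5                      # AB per transmission, in bytes
map_3d = {100: 55.48, 200: 55.37}        # mAP@3D IoU0.5 by latency (ms)
map_bev = {100: 63.14, 200: 63.20}       # mAP@BEV IoU0.5 by latency (ms)

# 1.2e5 bytes expressed in decimal megabytes, matching the 0.12 MB figure.
ffnet_mb = ffnet_bytes / 1e6

# "No more than 1/10 of raw data" implies raw data is at least 10x this payload.
min_raw_bytes = 10 * ffnet_bytes

# Change in accuracy when latency doubles from 100 ms to 200 ms.
drop_3d = round(map_3d[100] - map_3d[200], 2)
change_bev = round(map_bev[200] - map_bev[100], 2)

print(f"FFNet payload: {ffnet_mb} MB")
print(f"Implied raw-data size: at least {min_raw_bytes:.0f} bytes")
print(f"mAP@3D drop from 100 ms to 200 ms: {drop_3d}")
print(f"mAP@BEV change from 100 ms to 200 ms: {change_bev}")
```

The small change in both metrics between the two latencies is consistent with the claim that feature flow prediction limits the degradation caused by delay.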

This digest was generated by AI and reviewed by a human editor.