QUICK REVIEW

[Paper Review] FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

Tengju Ye, Jing Wei|arXiv (Cornell University)|Aug 2, 2023

Advanced Neural Network Applications|Cited by 11

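One-Sentence Summary

FusionAD proposes a BEV-based multi-sensor (camera + LiDAR) fusion framework that jointly optimizes perception, prediction, and planning. Equipped with the FMSPnP module, it achieves state-of-the-art prediction and planning results on nuScenes while also outperforming camera-only methods on perception tasks.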

ABSTRACT

Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in perception tasks of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuses the information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task. Concretely, we first build a transformer-based multi-modality fusion network to effectively produce fusion-based features. In contrast to the camera-based end-to-end method UniAD, we then establish fusion-aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP, that take advantage of multi-modality features. We conduct extensive experiments on the commonly used nuScenes benchmark dataset. FusionAD achieves state-of-the-art performance, surpassing baselines by an average of 15% on perception tasks such as detection and tracking and by 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score, and reducing the collision rate from 0.31% to only 0.12%.
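To put the two headline before/after pairs from the abstract in relative terms, the short calculation below may help. It uses only the numbers quoted above; the helper name `relative_reduction` is a hypothetical name chosen for illustration, not something from the paper.

```python
def relative_reduction(before, after):
    """Fractional decrease when a lower-is-better metric goes from `before` to `after`."""
    return (before - after) / before


# Figures quoted in the abstract.
ade = relative_reduction(0.708, 0.389)        # prediction error (ADE)
collision = relative_reduction(0.31, 0.12)    # collision rate, in percent

print(f"ADE reduction: {ade:.1%}")                        # ADE reduction: 45.1%
print(f"Collision-rate reduction: {collision:.1%}")       # Collision-rate reduction: 61.3%
```

Read this way, the reported ADE falls by roughly 45% and the reported collision rate by roughly 61% relative to the baseline values.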

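Research Motivation and Goals

Motivate unified end-to-end learning for autonomous driving that fuses camera and LiDAR for perception, prediction, and planning.
Develop a BEV fusion Transformer that produces unified multi-modal features for downstream tasks.
Introduce FMSPnP (fusion-aided modality-aware prediction and status-aware planning) to exploit the fused features for better prediction and planning.
Demonstrate improvements in perception, motion prediction, occupancy prediction, and planning on the nuScenes benchmark.
Provide ablation studies and qualitative analysis to validate the contributions of the fusion and planning components.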

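Proposed Method

A BEV encoder that fuses camera and LiDAR features in a Transformer-based multi-modal temporal fusion setup, using points cross-attention, image cross-attention, and temporal self-attention.
A fusion-aided prediction module (modality self-attention and a refinement network) that improves multi-modal trajectory learning and prediction quality.
A planning module that uses learnable ego command embeddings and BEV features, with a differentiable collision loss and inference-time trajectory optimization.
A three-stage training strategy: (i) train the BEV encoder and the perception tasks, (ii) freeze the encoder and train perception + prediction + planning, (iii) an optional stage that further trains occupancy and planning.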
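The three attention steps named for the BEV encoder can be illustrated with a toy sketch. This is not the paper's implementation: it assumes plain scaled dot-product attention with no learned projections, residual additions after each step, a shared feature dimension for all tokens, and an ordering (temporal, then points, then image) that the summary does not specify. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mean of `values`."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise residual addition of two equally shaped token lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_bev(bev_queries, prev_bev, lidar_tokens, image_tokens):
    """Toy fusion layer; the step order here is an assumption."""
    # Temporal self-attention: queries attend to current and previous BEV features.
    history = bev_queries + prev_bev
    x = add(bev_queries, attend(bev_queries, history, history))
    # Points cross-attention: queries attend to LiDAR tokens.
    x = add(x, attend(x, lidar_tokens, lidar_tokens))
    # Image cross-attention: queries attend to camera tokens.
    x = add(x, attend(x, image_tokens, image_tokens))
    return x


bev = [[1.0, 0.0], [0.0, 1.0]]          # two BEV grid queries, feature dim 2
prev = [[0.5, 0.5], [0.5, 0.5]]         # BEV features from the previous frame
lidar = [[1.0, 1.0]]                    # one LiDAR token
image = [[0.0, 2.0], [2.0, 0.0]]        # two camera tokens
fused = fuse_bev(bev, prev, lidar, image)
print(len(fused), len(fused[0]))        # one fused feature vector per BEV query
```

The point of the sketch is only the data flow: a single set of BEV queries is refined by each modality in turn, so downstream prediction and planning heads consume one fused feature map rather than separate camera and LiDAR maps.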

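Experimental Results

Research Questions

RQ1: Can BEV-based camera and LiDAR fusion improve end-to-end joint perception, prediction, and planning beyond camera-only methods?
RQ2: How should modality-aware prediction and status-aware planning be designed to make effective use of multi-modal features?
RQ3: Do fusion-based features improve downstream planning safety metrics, such as collision rate in autonomous driving scenarios?
RQ4: How do temporal and cross-modal attention mechanisms affect multi-task autonomous driving performance?
RQ5: How does the proposed method perform on perception, prediction, occupancy, and planning tasks on the nuScenes benchmark?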

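Key Findings

FusionAD achieves state-of-the-art results across multiple tasks on nuScenes, improving perception, prediction, occupancy prediction, and planning over the baselines.
Prediction error drops from 0.708 to 0.389 in ADE, and the collision rate falls from 0.31% to 0.12%.
Motion prediction improves markedly, with minADE and minFDE reaching 0.388 and 0.617, respectively (lower is better).
Occupancy prediction improves on both IoU and VPQ, with the clearest gains at longer range.
Planning achieves lower average and trajectory collision rates while keeping displacement error competitive.
Ablation studies identify the refinement network and modality attention as the key contributors to the prediction gains, and show that the ego-status-aware planning component is essential to the planning improvements.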
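For readers unfamiliar with the displacement metrics cited in the findings, here is a minimal sketch of how ADE, FDE, and their "min" variants are commonly computed for 2D trajectories. Exact definitions vary slightly between benchmarks (for example, how the best mode is selected), so treat this as an illustration rather than the evaluation code behind the reported numbers; the function names are hypothetical.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) for one predicted trajectory against the ground truth.

    ADE is the mean L2 distance over all timesteps; FDE is the L2 distance
    at the final timestep.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(modes, gt):
    """minADE and minFDE over K candidate trajectories.

    Assumption: each minimum is taken independently across the modes.
    """
    errors = [displacement_errors(mode, gt) for mode in modes]
    return min(e[0] for e in errors), min(e[1] for e in errors)


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
mode_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # drifts sideways
mode_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]   # close to ground truth

print(displacement_errors(mode_a, gt))           # (1.0, 2.0)
print(min_ade_fde([mode_a, mode_b], gt))
```

Because a multi-modal predictor is scored on its best candidate, minADE and minFDE reward covering the true future with at least one mode, which is why they are the usual metrics for motion prediction on benchmarks such as nuScenes.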


This review was generated by AI and reviewed by a human editor.