QUICK REVIEW

[Paper Review] Vision Language Models in Autonomous Driving: A Survey and Outlook

Xingcheng Zhou, Mingyu Liu | arXiv (Cornell University) | Oct 22, 2023

Multimodal Machine Learning Applications | Cited by 17

One-Sentence Summary

This paper surveys research on Vision-Language Models (VLMs) in Autonomous Driving (AD) and Intelligent Transportation Systems (ITS), organizing it by models, datasets, applications, and future challenges.

ABSTRACT

The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By incorporating language data, driving systems can gain a better understanding of real-world environments, thereby enhancing driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in vision language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks in AD and the commonly utilized metrics. Additionally, we review current studies and applications in various areas and summarize the existing language-enhanced autonomous driving datasets thoroughly. Lastly, we discuss the benefits and challenges of VLMs in AD and provide researchers with the current research gaps and future trends.

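Motivation and Objectives

Provide a comprehensive overview of the applications of Vision-Language Models in autonomous driving and intelligent transportation systems.
Categorize VLM architectures by input-output modality (M2T, M2V, V2T) and by cross-modal strategy (VTF vs VTM).
Summarize the existing datasets and tasks that use VLMs in AD/ITS.
Identify current challenges, gaps, and future research directions to guide subsequent work on VLMs in AD/ITS.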

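Proposed Approach

Introduce the background on autonomous driving, ITS, LLMs, and VLMs.
Propose a taxonomy of VLMs in AD/ITS based on input-output modality and cross-modal connection (VTF vs VTM).
Systematically review existing work that uses VLMs in AD (perception, navigation, decision-making, end-to-end driving, data generation) and in ITS (perception, ITS management).
Summarize the datasets and tasks used in the field (e.g., image/video, text, and point cloud data) and the types of analysis performed.
Discuss challenges, research gaps, and future directions to provide a framework for VLM research and applications in AD/ITS.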

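Experimental Results

Research Questions

RQ1: Which Vision-Language Model architectures and input-output modalities are currently used in autonomous driving and intelligent transportation systems?
RQ2: How are VLMs integrated into AD/ITS tasks such as perception, navigation, planning, decision-making, end-to-end driving, and data generation?
RQ3: Which datasets, tasks, and benchmarks are most commonly used to evaluate VLMs in AD/ITS?
RQ4: Which major challenges and gaps hinder the adoption and progress of VLMs in AD/ITS, and which directions for future research look promising?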

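Key Findings

This work provides the first comprehensive survey of Vision-Language Models in autonomous driving and ITS.
It systematically summarizes and analyzes existing VLM work and datasets in AD and ITS.
It identifies potential applications and technical advances of VLMs in AD and ITS.
It discusses the challenges and research gaps in the field to guide future exploration and development.
It clarifies the categorization of VLMs (M2T, M2V, V2T) and the cross-modal strategies (Vision-Text-Fusion vs Vision-Text-Matching).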
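
To make the distinction between the two cross-modal strategies more concrete, here is a minimal toy sketch in Python. It is not taken from the survey: the function names (`match_caption`, `fuse`), the hand-written embedding vectors, and the use of cosine similarity and concatenation are all assumptions chosen for illustration. Real VLMs obtain embeddings from learned encoders and typically fuse modalities with learned layers rather than plain concatenation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_caption(image_emb, caption_embs):
    """Matching-style use (hypothetical helper): pick the caption whose
    embedding is most similar to the image embedding."""
    return max(caption_embs, key=lambda text: cosine(image_emb, caption_embs[text]))


def fuse(image_emb, text_emb):
    """Fusion-style use (hypothetical helper): combine both modalities into
    one joint representation. Concatenation is a stand-in for a learned
    fusion module."""
    return list(image_emb) + list(text_emb)


# Toy embeddings, written by hand purely for illustration.
image = [1.0, 0.0]
captions = {
    "a pedestrian crossing": [0.9, 0.1],
    "an empty road": [0.0, 1.0],
}

best = match_caption(image, captions)
joint = fuse(image, captions[best])
```

In the matching style, vision and text stay in separate embedding spaces that are only compared; in the fusion style, a downstream model consumes a single joint representation.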

This review was generated by AI and reviewed by human editors.