QUICK REVIEW

[Paper Review] Vision Language Models in Autonomous Driving: A Survey and Outlook

Xingcheng Zhou, Mingyu Liu|arXiv (Cornell University)|Oct 22, 2023

Multimodal Machine Learning Applications | Citations: 17

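One-Sentence Summary

This paper surveys Vision-Language Models (VLMs) in autonomous driving (AD) and intelligent transportation systems (ITS), categorizing models, datasets, applications, and future challenges.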

ABSTRACT

The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By incorporating language data, driving systems can gain a better understanding of real-world environments, thereby enhancing driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in vision language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks in AD and the commonly utilized metrics. Additionally, we review current studies and applications in various areas and summarize the existing language-enhanced autonomous driving datasets thoroughly. Lastly, we discuss the benefits and challenges of VLMs in AD and provide researchers with the current research gaps and future trends.

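Motivation and Objectives

Provide an overview of how Vision-Language Models are applied in autonomous driving and intelligent transportation systems.
Classify VLM architectures by input-output modality (M2T, M2V, V2T) and by inter-modality strategy (Vision-Text-Fusion, VTF, vs. Vision-Text-Matching, VTM).
Summarize the existing datasets and tasks used in VLM-based AD/ITS.
Identify current challenges, gaps, and future research directions for VLM-enabled AD/ITS to guide subsequent work.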

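Proposed Approach

Introduce the foundational background of autonomous driving, ITS, LLMs, and VLMs.
Propose a taxonomy of VLMs in AD/ITS based on input-output modality and inter-modality connection (VTF vs. VTM).
Systematically review existing work that applies VLMs to AD (perception, navigation, decision-making, end-to-end driving, data generation) and to ITS (perception, ITS management).
Summarize the datasets and tasks used in the field (e.g., image/video, text, point-cloud data) and the types of analysis performed.
Discuss challenges, research gaps, and future directions to shape ongoing and future research on VLMs in AD/ITS.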
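
The two-axis taxonomy described above (input-output modality and inter-modality connection) can be pictured as a small data structure. The following is only an illustrative sketch: the class names, the `area` field, and the example entries (`model_a`, `model_b`, `model_c`) are hypothetical placeholders and not models or code from the paper. The spelled-out forms of M2T, M2V, and V2T are given as commonly expanded and are not stated in this review.

```python
from dataclasses import dataclass
from enum import Enum


class IOModality(Enum):
    # Expansions are assumptions; the review itself only gives the abbreviations.
    M2T = "multimodal-to-text"
    M2V = "multimodal-to-vision"
    V2T = "vision-to-text"


class Connection(Enum):
    VTF = "vision-text-fusion"
    VTM = "vision-text-matching"


@dataclass(frozen=True)
class VLMEntry:
    """One surveyed model, tagged along both taxonomy axes (hypothetical schema)."""
    name: str
    io: IOModality
    connection: Connection
    area: str  # e.g. "perception", "navigation", "data generation"


def group_by_area(entries):
    """Group model names by application area, preserving input order."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.area, []).append(entry.name)
    return grouped


# Placeholder entries for illustration only; not taken from the survey.
entries = [
    VLMEntry("model_a", IOModality.M2T, Connection.VTF, "perception"),
    VLMEntry("model_b", IOModality.V2T, Connection.VTM, "perception"),
    VLMEntry("model_c", IOModality.M2V, Connection.VTF, "data generation"),
]

by_area = group_by_area(entries)
fusion_models = [e.name for e in entries if e.connection is Connection.VTF]
```

Keeping the two axes as separate enums means a model can be filtered by either one independently, which mirrors how the review lists them as two distinct classification criteria.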

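Results

Research Questions

RQ1: Which Vision-Language Model architectures and input-output modalities are currently used in autonomous driving and intelligent transportation systems?
RQ2: How are VLMs integrated into AD/ITS tasks for perception, navigation, planning, decision-making, end-to-end driving, and data generation?
RQ3: Which datasets, tasks, and benchmarks are most commonly used to evaluate VLMs in AD/ITS?
RQ4: What are the main challenges and gaps hindering the adoption and development of VLMs in AD/ITS, and which directions for future research are promising?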

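Key Findings

The study provides the first comprehensive survey of Vision-Language Models in autonomous driving and ITS.
It systematically summarizes and analyzes existing VLM research and datasets across AD and ITS.
It identifies potential applications and technical advances of VLMs in AD and ITS.
It discusses the field's challenges and research gaps to guide future exploration and development.
It clarifies a taxonomy of VLMs by input-output modality (M2T, M2V, V2T) and by inter-modality strategy (Vision-Text-Fusion vs. Vision-Text-Matching).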
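
To make the distinction between the two inter-modality strategies more concrete, here is a toy sketch. It assumes that Vision-Text-Matching scores an image embedding against candidate text embeddings by similarity (as contrastive models typically do), and that Vision-Text-Fusion combines both embeddings into one joint representation. Plain concatenation is used here only as the simplest stand-in for fusion. The function names and the three-dimensional vectors are hypothetical; real systems would obtain embeddings from trained encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_caption(image_embedding, caption_embeddings):
    """Matching-style use: pick the caption whose embedding is closest to the image."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


def fuse(image_embedding, text_embedding):
    """Fusion-style use (simplified): concatenate both embeddings into one vector."""
    return list(image_embedding) + list(text_embedding)


# Toy embeddings, invented for illustration.
image = [0.9, 0.1, 0.0]
captions = {
    "a pedestrian crossing the road": [1.0, 0.0, 0.0],
    "an empty highway": [0.0, 1.0, 0.0],
}

best = match_caption(image, captions)
joint = fuse(image, captions[best])
```

In the matching case the two modalities stay in separate encoders and only meet at the similarity score, whereas in the fusion case a downstream model consumes the combined vector.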

This review was generated by AI and checked by a human editor.