QUICK REVIEW

[Paper Review] LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Zhenjie Yang, Xiaosong Jia|arXiv (Cornell University)|Nov 2, 2023

Topic Modeling | Citations: 18

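One-line summary

A comprehensive survey of how large language models (LLMs) are used in autonomous driving (LLM4AD), covering planning, perception, question answering, and generation approaches, along with benchmarks, datasets, and future directions.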

ABSTRACT

Autonomous driving technology, a catalyst for revolutionizing transportation and urban mobility, tends to transition from rule-based systems to data-driven strategies. Traditional module-based systems are constrained by cumulative errors among cascaded modules and inflexible pre-set rules. In contrast, end-to-end autonomous driving systems have the potential to avoid error accumulation due to their fully data-driven training process, although they often lack transparency due to their "black box" nature, complicating the validation and traceability of decisions. Recently, large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers. A natural thought is to utilize these abilities to empower autonomous driving. Combining LLMs with foundation vision models could open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems are lacking. In this paper, we systematically review a research line about Large Language Models for Autonomous Driving (LLM4AD). This study evaluates the current state of technological advancements, distinctly outlining the principal challenges and prospective directions for the field. For the convenience of researchers in academia and industry, we provide real-time updates on the latest advances in the field as well as relevant open-source resources via the designated link: https://github.com/Thinklab-SJTU/Awesome-LLM4AD.

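Research motivation and objectives

Survey the current state of LLM-enabled autonomous driving (LLM4AD).
Analyze the core principles, methods, and implementation workflows.
Identify the main challenges in transparency, validation, and generalization.
Summarize the available datasets, benchmarks, and evaluation protocols.
Outline future research directions and open-source resources.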

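Proposed approach

Classifies existing work into four areas: Planning & Control, Perception, Question Answering, and Generation.
Examines fine-tuning and prompt design as the two main ways of adapting LLMs to AD tasks.
Summarizes the datasets and benchmarks used in LLM4AD research (e.g., LangAuto, LingoQA, and nuScenes-derived data).
Reviews evaluation metrics and benchmarks across tasks, highlighting the lack of unified evaluation.
Provides real-time updates and resources through the linked repository: https://github.com/Thinklab-SJTU/Awesome-LLM4AD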
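
To illustrate the prompt-design route listed above, here is a minimal Python sketch that serializes a scene description into a text prompt for an LLM. The function name `build_driving_prompt`, the field names `label`, `distance_m`, and `bearing`, and the prompt wording are all hypothetical; they are not taken from the survey or from any specific LLM4AD system. Sending the prompt to a model is intentionally left out.

```python
from typing import Dict, List


def build_driving_prompt(
    ego_speed_mps: float,
    objects: List[Dict[str, object]],
    instruction: str,
) -> str:
    """Turn a (hypothetical) structured scene into a plain-text prompt."""
    lines = [
        "You are a driving assistant. Describe the next maneuver and explain why.",
        f"Ego speed: {ego_speed_mps:.1f} m/s",
        "Detected objects:",
    ]
    # One line per detected object, so the model sees the scene as text.
    for obj in objects:
        lines.append(
            f"- {obj['label']} at {float(obj['distance_m']):.1f} m, {obj['bearing']}"
        )
    lines.append(f"Instruction: {instruction}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    scene = [
        {"label": "pedestrian", "distance_m": 12.0, "bearing": "front-left"},
        {"label": "car", "distance_m": 30.5, "bearing": "ahead"},
    ]
    print(build_driving_prompt(8.3, scene, "Turn right at the next intersection."))
```

Prompt design of this kind leaves the model weights untouched, whereas fine-tuning would instead update the model on driving-specific data; the sketch covers only the former.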

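Experimental results

Research questions

RQ1: What is the current state of the art in applying LLMs to autonomous driving across planning, perception, QA, and generation?
RQ2: Which datasets, benchmarks, and evaluation metrics are used to evaluate LLM4AD methods?
RQ3: What are the main methodological approaches (fine-tuning versus prompt design), and what trade-offs do they carry in the AD context?
RQ4: Which challenges in transparency, safety, and generalization remain for LLM-powered autonomous driving?
RQ5: What are the most promising future directions and open-source resources for researchers and practitioners?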

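Key findings

LLMs are being integrated with foundation vision models, enabling open-world understanding and few-shot learning in AD.
The research is organized around four areas (planning, perception, QA, and generation), and both fine-tuning and prompt-design approaches are examined.
Diverse datasets and benchmarks (e.g., LangAuto, LingoQA, and nuScenes-derived data) support evaluation, but unified planning metrics are lacking.
Evaluation protocols span driving scores, RMSE-like metrics, BLEU/METEOR/CIDEr for language tasks, and task-specific driving metrics.
Generation-based methods use generative models such as diffusion models to synthesize driving scenarios and videos, which are then used for data augmentation and safety testing.
Open-source resources and real-time updates are provided through a designated GitHub repository to accelerate community collaboration.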
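
The two metric families mentioned in the findings can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the evaluation code of any benchmark: `rmse` compares scalar predictions against references, and `unigram_precision` is only the clipped unigram precision that underlies BLEU-1, without the brevity penalty or higher-order n-grams that full BLEU uses. Both function names and the whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision (the core of BLEU-1, no brevity penalty)."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    # Each candidate token is credited at most as often as it appears
    # in the reference ("clipping").
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / len(cand_tokens)


if __name__ == "__main__":
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(unigram_precision("the car stops", "the car stops at the light"))
```

A numeric metric such as RMSE suits trajectory or control outputs, while n-gram overlap metrics suit free-text answers, which is one reason a single unified score across LLM4AD tasks is hard to define.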


This review was generated by AI and checked by a human editor.