QUICK REVIEW

[Paper Review] PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

Yupeng Zheng, Zebin Xing|arXiv (Cornell University)|Jun 3, 2024

Natural Language Processing Techniques | Citations: 5

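One-Line Summary

PlanAgent uses a multi-modal large language model (MLLM) with three modules (Environment Transformation, Reasoning Engine, and Reflection) to perform closed-loop, mid-to-mid planning for autonomous driving. It achieves state-of-the-art results on nuPlan Val14 and generalizes well to Test14-hard.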

ABSTRACT

Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). MLLM is used as a cognitive agent to introduce human-like knowledge, interpretability, and common-sense reasoning into the closed-loop planning. Specifically, PlanAgent leverages the power of MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description from the environment as inputs. Second, a Reasoning Engine module introduces a hierarchical chain-of-thought from scene understanding to lateral and longitudinal motion instructions, culminating in planner code generation. Last, a Reflection module is integrated to simulate and evaluate the generated planner for reducing MLLM's uncertainty. PlanAgent is endowed with the common-sense reasoning and generalization capability of MLLM, which empowers it to effectively tackle both common and complex long-tailed scenarios. Our proposed PlanAgent is evaluated on the large-scale and challenging nuPlan benchmarks. A comprehensive set of experiments convincingly demonstrates that PlanAgent outperforms the existing state-of-the-art in the closed-loop motion planning task. Codes will be soon released.

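Motivation and Objectives

Bridge the gap between rule-based and learning-based planning by using an MLLM for common-sense reasoning in closed-loop autonomous driving.
Introduce an efficient Environment Transformation that converts complex scenes into a BEV map and a lane-graph-based textual description.
Adopt a Reasoning Engine that uses a hierarchical chain-of-thought (CoT) to generate IDM-based planner code suited to the scenario.
Add a Reflection module that simulates and scores the generated planner to reduce MLLM uncertainty and improve safety.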

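Proposed Method

The Environment Transformation module builds a multi-modal prompt from a BEV map (global information) and a lane-graph textual description (local information).
The Reasoning Engine generates IDM-based planner code through in-context learning, using a predefined system prompt and a hierarchical CoT.
The generated planner code calls IDM with the parameters c, la, v0, acc, and dec, and is executed by the Reflection module.
Reflection simulates the generated planner, scores it on safety and efficiency, and, when the score falls below the threshold lambda (0.75), may trigger re-thinking up to max_exec = 3 times.
By representing the lane graph as text, PlanAgent obtains a token-efficient scene description and uses fewer prompt tokens than competing LLM-based methods.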
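
As a rough illustration of the last two steps above, the sketch below pairs the textbook Intelligent Driver Model (IDM) acceleration rule with a Reflection-style retry loop. It is not the authors' code. The review does not define c and la, so they are omitted. Reading v0, acc, and dec as desired speed, maximum acceleration, and comfortable deceleration is an assumption based on the standard IDM. `IDMParams`, `idm_acceleration`, `reflect`, `min_gap`, `headway`, `delta`, and the `propose` and `score` callables are hypothetical names that stand in for the MLLM call and the simulator. Only the threshold 0.75 and the budget of 3 attempts come from the text. What PlanAgent does when every attempt scores below the threshold is not stated here, so the sketch simply returns the best-scoring candidate.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class IDMParams:
    v0: float              # desired speed [m/s] (assumed meaning)
    acc: float             # maximum acceleration [m/s^2] (assumed meaning)
    dec: float             # comfortable deceleration [m/s^2] (assumed meaning)
    min_gap: float = 2.0   # standstill gap s0 [m] (hypothetical default)
    headway: float = 1.5   # desired time headway T [s] (hypothetical default)
    delta: float = 4.0     # free-road acceleration exponent


def idm_acceleration(p: IDMParams, v: float, gap: float, dv: float) -> float:
    """Textbook IDM. v: ego speed, gap: distance to leader, dv: v - v_leader."""
    dynamic = v * p.headway + v * dv / (2.0 * math.sqrt(p.acc * p.dec))
    s_star = p.min_gap + max(0.0, dynamic)  # desired gap
    return p.acc * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


def reflect(
    propose: Callable[[Optional[str]], IDMParams],
    score: Callable[[IDMParams], float],
    threshold: float = 0.75,
    max_exec: int = 3,
) -> Tuple[IDMParams, float, int]:
    """Propose, score, and re-think while the score stays below the threshold.

    Returns (best params, best score, number of attempts used).
    """
    if max_exec < 1:
        raise ValueError("max_exec must be at least 1")
    feedback: Optional[str] = None
    best: Optional[Tuple[IDMParams, float]] = None
    attempt = 0
    for attempt in range(1, max_exec + 1):
        params = propose(feedback)       # stand-in for the Reasoning Engine
        s = score(params)                # stand-in for simulate-and-score
        if best is None or s > best[1]:
            best = (params, s)
        if s >= threshold:
            break                        # planner accepted
        feedback = f"attempt {attempt} scored {s:.2f}, below {threshold}"
    assert best is not None
    return best[0], best[1], attempt


if __name__ == "__main__":
    candidates = iter([
        IDMParams(v0=20.0, acc=1.5, dec=2.5),
        IDMParams(v0=12.0, acc=1.5, dec=2.5),
    ])
    # Toy score: pretend the faster candidate is judged unsafe.
    toy_score = lambda p: 0.5 if p.v0 > 15.0 else 0.9
    print(reflect(lambda feedback: next(candidates), toy_score))
```

Passing the scorer in as a callable keeps the loop independent of any particular simulator. In a real setup, `score` would wrap a closed-loop rollout instead of the toy rule used here.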

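Experimental Results

Research Questions

RQ1: Can an MLLM-based agent achieve robust closed-loop motion planning in both common and long-tailed driving scenarios?
RQ2: Do Environment Transformation and hierarchical reasoning enable reliable planner code generation for IDM-based planning?
RQ3: Do Reflection-based safety checks improve planning reliability in the presence of MLLM uncertainty?
RQ4: How does PlanAgent compare with state-of-the-art rule-based, learning-based, and LLM-based planners on the nuPlan Val14 and Test14-hard benchmarks?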

Key Findings

Type	Planner	Val14 NR-CLS	Val14 R-CLS	Test14-hard NR-CLS	Test14-hard R-CLS
Expert	Log-replay	94.03	75.86	85.96	68.80
Rule-based	IDM [16]	70.39	72.42	56.16	62.26
Rule-based	PDM-Closed [2]	92.51	91.79	65.07	75.18
Learning-based	RasterModel [1]	69.66	67.54	49.47	52.16
Learning-based	UrbanDriver [44]	63.27	61.02	51.54	49.07
Learning-based	GC-PGP [45]	55.99	51.39	43.22	39.63
Learning-based	PDM-Open [2]	52.80	57.23	33.51	35.83
Learning-based	GameFormer [46]	80.80	79.31	66.59	68.83
Learning-based	PlanTF [3]	84.83	76.78	72.68	61.70
Learning-based	DTPP [4]	89.64	89.78	59.44	62.94
LLM-based	LLM-ASSIST UNC * [33]	90.11	90.32	-	-
LLM-based	LLM-ASSIST PAR * [33]	93.05	92.20	-	-
LLM-based	PlanAgent (Ours)	93.26	92.75	72.51	76.82

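PlanAgent achieves state-of-the-art NR-CLS and R-CLS on nuPlan Val14 (93.26 and 92.75), ahead of every non-expert planner in the table.
PlanAgent generalizes to long-tailed scenarios, reaching NR-CLS 72.51 and R-CLS 76.82 on Test14-hard.
In token usage, PlanAgent needs an average of 141.32 tokens for the scene description, roughly one third of what GPT-Driver (448.66) and LLM-ASSIST (425.81) use.
Ablation studies show that adding the BEV map and the lane graph improves NR-CLS by 1.5 to 2.0 points and R-CLS by 1.2 to 1.6 points.
Removing the Reflection module lowers NR-CLS substantially (by up to 2.67 points), which indicates the safety benefit of the simulation and scoring stage.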
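
To make the size of these gaps concrete, the short script below recomputes the token savings and the Test14-hard margins from the figures reported above. The numbers are copied from this review. The helper names `token_reduction` and `margins` are illustrative only, and nothing here is taken from the paper's code.

```python
# Average scene-description tokens, as reported in the findings.
tokens = {"PlanAgent": 141.32, "GPT-Driver": 448.66, "LLM-ASSIST": 425.81}

# Test14-hard scores as (NR-CLS, R-CLS), copied from the results table.
test14_hard = {
    "PDM-Closed": (65.07, 75.18),
    "PlanTF": (72.68, 61.70),
    "PlanAgent": (72.51, 76.82),
}


def token_reduction(ours: float, other: float) -> float:
    """Percentage of tokens saved relative to another method."""
    return 100.0 * (1.0 - ours / other)


def margins(table: dict, ours: str = "PlanAgent") -> dict:
    """Score difference (ours minus baseline) for NR-CLS and R-CLS."""
    nr, r = table[ours]
    return {
        name: (round(nr - b_nr, 2), round(r - b_r, 2))
        for name, (b_nr, b_r) in table.items()
        if name != ours
    }


if __name__ == "__main__":
    for name in ("GPT-Driver", "LLM-ASSIST"):
        saved = token_reduction(tokens["PlanAgent"], tokens[name])
        print(f"vs {name}: {saved:.1f}% fewer tokens")
    for name, (d_nr, d_r) in margins(test14_hard).items():
        print(f"vs {name}: NR-CLS {d_nr:+.2f}, R-CLS {d_r:+.2f}")
```

With the values as listed, the savings come to about 68.5% against GPT-Driver and 66.8% against LLM-ASSIST. On Test14-hard, PlanAgent leads PDM-Closed on both metrics, while PlanTF's NR-CLS is 0.17 points higher than PlanAgent's.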

This review was generated by AI and checked by a human editor.