QUICK REVIEW

[Paper Review] A Language Agent for Autonomous Driving

Jiageng Mao, Junjie Ye|arXiv (Cornell University)|Nov 17, 2023

Multimodal Machine Learning Applications | Cited by 14

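One-Sentence Summary

Agent-Driver uses a large language model as an autonomous driving agent that combines a tool library, cognitive memory, and a reasoning engine; on the nuScenes dataset it outperforms state-of-the-art methods while also offering interpretability and few-shot learning.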

ABSTRACT

Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver significantly outperforms the state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability to these methods.

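Research Motivation and Goals

Shift autonomous driving from the perception-prediction-planning pipeline to an LLM-driven agent paradigm by drawing on human priors and reasoning ability.
Introduce a modular architecture that unifies neural modules with a language-based interface through a tool library, cognitive memory, and a reasoning engine.
Show that LLM-based reasoning improves planning quality, safety, and interpretability on a large-scale driving benchmark.
Demonstrate few-shot learning ability, as well as robustness to swapping modules and to different LLMs.
Provide ablation studies that clarify the contribution of each architectural component.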

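Proposed Method

Recast the traditional autonomous driving pipeline as an LLM-guided agent architecture with a text interface.
Build a tool library that converts neural module outputs (detection, prediction, occupancy, map) into text messages and supports dynamic function calls.
Add a cognitive memory holding common sense and past experience, and retrieve relevant rules and previous scenarios through two-stage retrieval (embedding-based K-NN, then LLM-based ranking).
Use a reasoning engine for chain-of-thought reasoning, task planning, motion planning (as text generation), and self-reflection (collision checking and trajectory refinement).
Fine-tune the motion planning LLM on human driving trajectories and use in-context learning for the reasoning and planning modules; convert text trajectories back into physical trajectories for execution.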
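
The last two steps of the method, text-based motion planning and self-reflection, can be pictured with a minimal Python sketch. The "(x, y)" text format, the helper names `parse_trajectory` and `first_collision`, and the circular safety radius are all assumptions made for illustration; the summary does not specify how the paper encodes trajectories or checks collisions.

```python
import math
import re

# Hypothetical text format: the planner LLM emits waypoints as "(x, y)" pairs in metres.
_WAYPOINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(text):
    """Convert a textual trajectory into a list of (x, y) float waypoints."""
    return [(float(x), float(y)) for x, y in _WAYPOINT.findall(text)]


def first_collision(trajectory, obstacles, safety_radius=1.0):
    """Return the index of the first waypoint closer than safety_radius to any
    obstacle centre, or None if the trajectory is clear.

    Assumption: obstacles are static points; a real check would more likely use
    predicted object boxes or an occupancy map per timestep.
    """
    for i, (x, y) in enumerate(trajectory):
        for ox, oy in obstacles:
            if math.hypot(x - ox, y - oy) < safety_radius:
                return i
    return None


planned = parse_trajectory("[(0.0, 1.5), (0.1, 3.0), (0.2, 4.6)]")
hit = first_collision(planned, obstacles=[(0.0, 4.5)])
```

In this toy run `hit` is the index of the waypoint that would trigger trajectory refinement; a collision-free plan returns `None`.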

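Experimental Results

Research Questions

RQ1: How does an LLM-based cognitive agent integrate human priors and experiential knowledge into autonomous driving decisions?
RQ2: Compared with traditional pipelines, do the tool library and memory-based reasoning improve safety, planning accuracy, and interpretability?
RQ3: Can Agent-Driver achieve strong few-shot learning performance and remain stable across different neural modules and LLMs?
RQ4: What do the ablation studies reveal about each component's effect on motion planning performance and collision rate?
RQ5: How does two-stage memory retrieval (embedding + LLM ranking) improve decision quality?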
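
The two-stage memory retrieval raised in RQ5 could look roughly like the sketch below: an embedding-based K-NN search narrows the memory to a few candidates, then a ranker picks one. Everything named here (`cosine`, `knn_candidates`, `retrieve`, `stub_ranker`, the 2-D vectors) is a hypothetical stand-in; in particular, the ranker is a plain Python stub where the paper uses an LLM.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_candidates(query_vec, memory, k):
    """Stage 1: return the k stored scenarios most similar to the query embedding."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]


def retrieve(query_vec, memory, k, rank_fn):
    """Stage 2: let a ranker (an LLM in the paper, a stub here) pick one candidate."""
    candidates = knn_candidates(query_vec, memory, k)
    return rank_fn(candidates)


# Toy memory: hand-written 2-D "embeddings" stand in for real scenario encodings.
memory = [
    {"id": "left_turn", "vec": [1.0, 0.0]},
    {"id": "lane_keep", "vec": [0.0, 1.0]},
    {"id": "left_turn_with_pedestrian", "vec": [0.9, 0.1]},
]


def stub_ranker(candidates):
    """Placeholder for the LLM ranking call: prefers the longest scenario id."""
    return max(candidates, key=lambda item: len(item["id"]))


best = retrieve([1.0, 0.05], memory, k=2, rank_fn=stub_ranker)
```

Keeping the ranker as a passed-in function separates the cheap vector search from the expensive model call, so the second stage only sees `k` candidates.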

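Key Findings

Compared with state-of-the-art methods, Agent-Driver markedly improves motion planning on nuScenes, with lower L2 error and a lower collision rate than prior methods.
Under the ST-P3 metrics, Agent-Driver achieves the lowest average L2 error and reduces average collisions by about 35.7% relative to the second-best method.
Under the UniAD metrics, Agent-Driver reaches an L2 error of 0.74 m and a collision rate of 0.21%, improving on the second-best method by about 11.9% in L2 and 32.3% in collision rate.
The system shows strong few-shot learning: with only 0.1% of the training data it remains competitive, and with 1% of the data it surpasses full-data baselines on collision rate.
Ablation studies indicate that every component (tool library, common-sense memory, experience memory, reasoning, task planning, and self-reflection) contributes to performance, with self-reflection markedly reducing the collision rate.
Agent-Driver remains compatible with different neural modules and LLMs, and its outputs stay highly stable even with limited training data.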
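
For readers unfamiliar with the reported quantities, here is a small sketch of an average L2 error and of a relative improvement on a lower-is-better metric. The function names and the input values are illustrative assumptions, not the paper's evaluation code or data, and the sketch ignores how the ST-P3 and UniAD conventions aggregate over the planning horizon.

```python
import math


def average_l2(planned, ground_truth):
    """Mean Euclidean distance (metres) between matching waypoints."""
    if len(planned) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(planned, ground_truth)
    ]
    return sum(dists) / len(dists)


def relative_improvement(baseline, ours):
    """Percentage reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - ours) / baseline * 100.0


# Toy values only; these are not numbers from the paper.
err = average_l2([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
gain = relative_improvement(baseline=2.0, ours=1.5)
```

With these toy inputs the second waypoint is 5 m off, so `err` is 2.5, and dropping a metric from 2.0 to 1.5 gives a `gain` of 25.0 percent.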

This review was generated by AI and reviewed by a human editor.