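[Paper Digest] CRASH: Cognitive Reasoning Agent for Safety Hazards in Autonomous Driving
CRASH is an LLM-based agent that analyzes 2,168 NHTSA AV incident reports (2021–2025) to attribute each incident's primary cause and the AV's contribution, delivering scalable, structured safety analysis that aligns closely with expert judgment.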
As AVs grow in complexity and diversity, identifying the root causes of operational failures has become increasingly complex. The heterogeneity of system architectures across manufacturers, ranging from end-to-end to modular designs, together with variations in algorithms and integration strategies, limits the standardization of incident investigations and hinders systematic safety analysis. This work examines real-world AV incidents reported in the NHTSA database. We curate a dataset of 2,168 cases reported between 2021 and 2025, representing more than 80 million miles driven. To process this data, we introduce CRASH, Cognitive Reasoning Agent for Safety Hazards, an LLM-based agent that automates reasoning over crash reports by leveraging both standardized fields and unstructured narrative descriptions. CRASH operates on a unified representation of each incident to generate concise summaries, attribute a primary cause, and assess whether the AV materially contributed to the event. Our findings show that (1) CRASH attributes 64% of incidents to perception or planning failures, underscoring the importance of reasoning-based analysis for accurate fault attribution; and (2) approximately 50% of reported incidents involve rear-end collisions, highlighting a persistent and unresolved challenge in autonomous driving deployment. We further validate CRASH with five domain experts, achieving 86% accuracy in attributing AV system failures. Overall, CRASH demonstrates strong potential as a scalable and interpretable tool for automated crash analysis, providing actionable insights to support safety research and the continued development of autonomous driving systems.
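Research Motivation and Goals
- Automate structured reasoning over AV incident narratives at scale, going beyond manual review.
- Attribute a primary cause to each incident report and identify the AV subsystem that failed.
- Assess whether the autonomous vehicle materially contributed to each incident.
- Provide interpretable summaries and data-ready outputs to support safety research and policy insights.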
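Proposed Method
- Curate a dataset of 2,168 NHTSA AV incident reports from 2021–2025 (about 80 million miles driven).
- Design a three-stage CRASH pipeline: preprocessing, processing (LLM reasoning), and postprocessing, producing analysis-ready and simulation-ready outputs.
- Use a constrained, prompt-based LLM approach, supplemented with domain-specific rules and a one-shot example, so that outputs are reliable and formatted as JSON.
- Build a taxonomy of AV incident causes with three broad categories: system failures, human factors, and environmental conditions.
- Validate outputs through human-in-the-loop expert review and compare against two NLP baselines (majority class and keyword rules).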
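A constrained JSON output is only useful if a postprocessing step rejects responses that break the expected shape. The following is a minimal sketch of such a check, not the authors' implementation. The field names (`summary`, `primary_cause`, `av_contributed`) and the label strings are assumptions made for illustration; only the three broad cause categories come from the taxonomy described above.

```python
import json

# Hypothetical label strings for the three broad categories in the taxonomy.
CAUSE_CATEGORIES = {"system_failure", "human_factors", "environmental_conditions"}

# Hypothetical output schema; these field names are illustrative only.
REQUIRED_FIELDS = {"summary": str, "primary_cause": str, "av_contributed": bool}


def parse_llm_output(raw):
    """Parse one JSON response and raise ValueError if it violates the schema."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    # Reject any cause label that falls outside the fixed taxonomy.
    if record["primary_cause"] not in CAUSE_CATEGORIES:
        raise ValueError(f"unknown cause category: {record['primary_cause']}")
    return record
```

A response that fails this check could be retried or routed to manual review; which of those a real pipeline would do is a design choice that this digest does not specify.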
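Experimental Results
Research Questions
- RQ1: Can CRASH reliably attribute AV responsibility from heterogeneous incident narratives?
- RQ2: How does CRASH compare with expert judgment and baseline NLP methods on key causal dimensions?
- RQ3: Is the CRASH pipeline scalable and efficient enough to process large incident datasets?
- RQ4: What systemic safety patterns (such as delays and perception failures) emerge from large-scale narrative analysis?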
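Key Findings
- 64% of incidents are attributed to perception or planning failures (system-related reasoning).
- About 50% of reported incidents involve rear-end collisions.
- Validated against expert judgment, CRASH reaches 86% accuracy on AV responsibility, 84% on late AI detection, 76% on primary-cause attribution, and 46% on failed-subsystem attribution.
- Inference takes about 30 seconds per case on two NVIDIA A4500 GPUs, which is faster than manual review.
- CRASH outperforms the baselines (majority class and keyword rules) on all four dimensions: AV failure, late AI, cause, and system failure (Table 4).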
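To make the baseline comparison concrete, here is a rough sketch of how a majority-class baseline, a keyword-rule baseline, and accuracy against expert labels might be computed. The keyword lists, label strings, and default label are hypothetical, since the actual rules are not given in this digest.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda narrative: most_common_label


# Hypothetical keyword lists, checked in insertion order.
KEYWORD_RULES = {
    "system_failure": ("did not detect", "failed to brake", "disengag"),
    "environmental_conditions": ("rain", "fog", "glare"),
}


def keyword_rule_baseline(narrative, default="human_factors"):
    """Return the first label whose keywords appear in the narrative."""
    text = narrative.lower()
    for label, keywords in KEYWORD_RULES.items():
        # Plain substring matching: simple, but "rain" also matches "train".
        if any(keyword in text for keyword in keywords):
            return label
    return default


def accuracy(predictions, expert_labels):
    """Fraction of predictions that match the expert labels."""
    if len(predictions) != len(expert_labels):
        raise ValueError("predictions and expert_labels differ in length")
    if not expert_labels:
        raise ValueError("no labels to score")
    correct = sum(p == e for p, e in zip(predictions, expert_labels))
    return correct / len(expert_labels)
```

Scoring the LLM outputs and both baselines with the same `accuracy` function against the same expert labels keeps the comparison like for like on each dimension.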
This digest was generated by AI and reviewed by a human editor.